How we install depends on what we're choosing to keep around. My concern is getting core Metron's scope down to a supportable level. This entire conversation is probably just a thought experiment until we properly limit the rest of our scope. It's putting the cart before the horse. I want to emphasize this, because we're having a discussion about how to install something that in many ways doesn't actually exist yet.
A lot of the install complexity comes from managing so many moving parts at once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure we need a big installer to manage everything. Plenty of projects trust people to be able to run convenience scripts and shell commands. Again, I think this is an academic discussion until we figure out our overall project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen <n...@nickallen.org> wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron. This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all deployment solution. It's possible, but very difficult to get a solution that doesn't take close monitoring and manual workarounds when attempting to use it across environments of different sizes and shapes. In terms of usability, the Ambari MPack was a big step up in my opinion.
>
> > perhaps a dedicated docker image that is designed to connect with other dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would be able to use community supported containers for most of our underlying platform needs. Unfortunately, this alone would not help anyone deploy Metron on a cluster.
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom <tom.ye...@ubc.ca> wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you or anyone have enough experience to judge if it is possible to leverage Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution that would work?
> > I don't have much experience with docker, perhaps a dedicated docker image that is designed to connect with other dockerized applications such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen" <n...@nickallen.org> wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with in my own mind yet. I'll have more to add, but I just want to chime in on the topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a repository of compiled packages (RPMs, DEBs, etc.). To install the underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we relied on binary packages that were made freely available by Cloudera/Hortonworks. As of this past January, those packages are now behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now effectively dead. I am not sure if legacy versions of Kafka, HBase, Storm, etc will continue to be freely available, but even if so, we cannot continue to rely on this mechanism if new versions and security updates will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or packages either. We do make the code freely available to allow users to build and publish their own Metron packages. But even with this capability, unless you have a means to install the underlying platform dependencies via Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's Ambari MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts contributors also. Our primary development environment relies on that Ambari MPack.
> > To continue development on any of the components of Apache Metron, we would need to build an alternative development environment that can function despite the paywall. That could take many shapes, but in my opinion it would be a blocker for continuing any development on Apache Metron, unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an alternative approach that would allow the current Ambari MPack to remain viable.
> >
> > On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov <dimdr...@gmail.com> wrote:
> >
> > > - Dropping Ambari.
> > >
> > > I like the progress that Apache did with Ambari in 2.7. And I don't know a better installer/manager for all the services (we use other Hadoop eco services besides Metron).
> > >
> > > Sometimes it's buggy, agents get stuck or server needs reboot from time to time, mpacks break some functionality. But overall I feel this is the direction for central management and orchestration.
> > >
> > > - Dima
> > >
> > > On Wed, Apr 15, 2020, 12:45 Justin Leet <justinjl...@gmail.com> wrote:
> > >
> > > > This is a bit off the top of my head, but I'd agree with pretty much all of the points on what's bringing a lot of overhead. There's probably also a worthwhile discussion about what value we're shooting for the project to provide to people that influences what stays/goes.
> > > >
> > > > Thinking out loud a bit
> > > >
> > > > - Dropping Storm and moving to Spark drops the very hard to tune/manage/troubleshoot Storm.
> > > > - Dropping the UIs (and making SQL the external interface) pretty much implies dropping the REST APIs and ES/Solr. ES/Solr have been a giant source of dev heartache on the project and they exist primarily for the real time use case.
> > > >   People can build whatever UIs or use existing tools against Parquet/Hive/whatever.
> > > > - Dropping Ambari. It's a complex beast to install because of how many components we have. Dropping the above makes our install much easier and should alleviate the need for a complex installer.
> > > >
> > > > At that point, we're basically left with
> > > >
> > > > - Some Spark for parse -> enrich -> output
> > > > - The profiler
> > > > - Stellar
> > > > - Probably some other misc stuff (sensors, bro kafka plugin, etc.)
> > > >
> > > > At a glance, that seems almost an order of magnitude smaller than what we currently try to handle.
> > > >
> > > > I'm not really sure what an appropriate way to handle the profiler is. I've barely touched the code for it, so anything I say is a vague guess.
> > > >
> > > > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom <tom.ye...@ubc.ca> wrote:
> > > >
> > > > > To me Metron is big and broad in the scope of technology required to get it running. If things were more modular that would go a long way to reducing the learning curve or at least putting it into smaller bites (and it might encourage more people to get involved).
> > > > >
> > > > > If the UI were an add-on module in another project, it would have made it easier for me and it could also encourage my hypothetical buddy who is a web developer expert to get involved since he could focus on the web-ui module instead of trying to tackle all the other pieces that are probably not part of his bailiwick.
> > > > >
> > > > > Stellar is very intriguing, maybe that is not unique to Metron? The architecture of Metron with respect to parsing, enriching, etc., makes a lot of sense to anyone I talk with.
> > > > > These two aspects of Metron seem like standout examples that make for a powerful platform to develop on.
> > > > >
> > > > > Thanks for continuing this discussion,
> > > > >
> > > > > Tom.
> > > > >
> > > > > On 2020-04-08 15:32:46-07:00 Casey Stella wrote:
> > > > >
> > > > > As far as I know there is no minimum bar of development activity to keep a project open. I think we would all be grateful for any investment that you or your organization would want to make.
> > > > > It also occurs to me that your observation is absolutely spot on: we have a LOT of moving parts.
> > > > > I see some deficiencies here:
> > > > >
> > > > > * We depend on a lot of the various hadoop ecosystem projects and they have to work together very precisely:
> > > > >     * This makes for a system that is hard to install.
> > > > >     * This also makes for a system which is hard to tune/manage
> > > > > * We have a large surface area of coverage
> > > > >     * We have an installer, backend system and front-end UI, which stretches our developers a bit thin, especially since there isn't even interest in those systems
> > > > >
> > > > > Perhaps a reconsideration of the scope and technologies that we use would be merited? If we were to decide to, for instance:
> > > > >
> > > > > * Consolidate scope: focus on a viable backend/API rather than a UI
> > > > > * Consolidate technology: reposition ourselves on top of Spark as a consolidated streaming/batch system
> > > > > * Make SQL our external interface: write out to parquet + the Hive metastore and let users pin up presto tables or hive tables as they see fit
> > > > >
> > > > > This might reduce some of our surface area and make it more viable to get started?
> > > > > Anyway, just some thoughts.
> > > > > Casey
> > > > >
> > > > > On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom <tom.ye...@ubc.ca <mailto:tom.ye...@ubc.ca>> wrote:
> > > > > Hi Casey,
> > > > >
> > > > > I'm new here and new to contributing to an open source project. Thus far my contribution has been questions, however the steep learning curve has had me working to understand all the moving parts for the last 18 months and I see that as a big investment by my organization.
> > > > >
> > > > > What is a level that would be viable?
> > > > >
> > > > > If my organization were to contribute I don't know that it would be soon enough or at the volume that is recognized as viable, which is why I ask the question.
> > > > >
> > > > > On 2020-04-08 15:05:51-07:00 Casey Stella wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > When composing the board report today, I realized that we have effectively had no development in the last quarter on this project. Please be aware that I say this without a shred of blame or judgement (especially so considering I have not contributed in a long time). That being said, I would like to pose the question to the community:
> > > > >
> > > > > Do we feel that this project is viable? If so, how are we going to spur new contributions? If not, then should we begin the process to fold the project?
> > > > >
> > > > > Best,
> > > > >
> > > > > Casey