(This is only brainstorming.)

I like Metron's documentation. There has been effort and care taken there.

Ambari is nice, but given that the mpack is moving behind a paywall, it seems
that the groups benefiting from the paywall can chip in to build Metron mpacks
at their leisure.

Elasticsearch is popular, so I can see an argument to keep it. On the flip side,
Elasticsearch is not really trivial to run. Replacing it with a simple
template that writes data to a file, as an example of how to write an IndexDAO
(pardon if my terminology is incorrect), and splitting Elasticsearch into another
repo to be maintained by ELK enthusiasts would reduce the core workload further.
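
To make the file-backed idea a bit more concrete, here is a rough sketch of
what such a template could look like. It is written in Python for brevity
rather than Metron's Java, and the names `FileIndexWriter`, `index`, `search`
and `demo` are hypothetical; they are not Metron's actual IndexDao interface,
only an illustration of "write each document to a file, scan it to search".

```python
import json
import tempfile
from pathlib import Path


class FileIndexWriter:
    """Hypothetical stand-in for an index backend: one JSON document per line."""

    def __init__(self, path):
        self.path = Path(path)

    def index(self, doc):
        # Append the document to the file as a single JSON line.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(doc, sort_keys=True) + "\n")

    def search(self, field, value):
        # Linear scan of the file; fine for an example, not for real volumes.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            docs = [json.loads(line) for line in fh if line.strip()]
        return [d for d in docs if d.get(field) == value]


def demo():
    # Field names below are illustrative sample data, not a required schema.
    with tempfile.TemporaryDirectory() as tmp:
        writer = FileIndexWriter(Path(tmp) / "index.jsonl")
        writer.index({"guid": "a1", "source.type": "bro", "ip_src_addr": "10.0.0.1"})
        writer.index({"guid": "b2", "source.type": "snort", "ip_src_addr": "10.0.0.2"})
        return writer.search("source.type", "bro")


if __name__ == "__main__":
    print(demo())
```

The point of something this small would be to show the shape of an index
backend without requiring anyone to stand up a search cluster first.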

HBase might be another piece that can be moved into another project, with a
simpler example that relies on something like SQLite written to replace it.
SQLite is relatively trivial to set up and run.
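
As a rough sketch of how little setup the SQLite route needs, the following
uses Python's standard-library `sqlite3` module as a simple key/value store.
The class `SqliteKeyValueStore`, its `put`/`get` methods, the `kv` table and
the sample row are all hypothetical, and Python is used only for brevity; this
is not how Metron's HBase access is actually written.

```python
import json
import sqlite3


class SqliteKeyValueStore:
    """Hypothetical key/value store standing in for an HBase-backed table."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps everything in RAM; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "row_key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, row_key, value):
        # Upsert: replace the stored value if the key already exists.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (row_key, value) VALUES (?, ?)",
                (row_key, json.dumps(value)),
            )

    def get(self, row_key):
        # Return the decoded value, or None when the key is absent.
        row = self.conn.execute(
            "SELECT value FROM kv WHERE row_key = ?", (row_key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def demo():
    store = SqliteKeyValueStore()
    store.put("10.0.0.1", {"country": "CA"})
    store.put("10.0.0.1", {"country": "US"})  # overwrites the first value
    return store.get("10.0.0.1"), store.get("missing")


if __name__ == "__main__":
    print(demo())
```

There is no server to install or tune here, which is the whole appeal for a
development-sized example.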

Deployment could stay in Ansible, along with maintaining the development build,
with some documentation work on how to add modules such as Elasticsearch and
HBase for "bigger" development work.

--T.


On 2020-04-21 10:12:52-07:00 Otto Fowler wrote:

I think the difference is between maintaining the core of Metron, which *has*
to be done, and other things that may still be done but will be worked on for
their merits or by community need and not be required for everything.

On April 21, 2020 at 10:29:24, Justin Leet (justinjl...@gmail.com) wrote:

How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level. This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope. It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen <n...@nickallen.org> wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to
> > leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual workarounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step up in my opinion.
>
>
> > perhaps a dedicated docker image that is designed to connect with other
> > dockerized applications such as Storm, Kafka, etc.?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom <tom.ye...@ubc.ca> wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution that
> > would work? I don't have much experience with docker, perhaps a dedicated
> > docker image that is designed to connect with other dockerized applications
> > such as Storm, Kafka, etc.?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen" <n...@nickallen.org> wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with in my
> > own mind yet. I'll have more to add, but I just want to chime in on the
> > topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.). To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc.) we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now
> > effectively dead. I am not sure if legacy versions of Kafka, HBase, Storm,
> > etc. will continue to be freely available, but even if so, we cannot
> > continue to rely on this mechanism if new versions and security updates
> > will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or packages
> > either. We do make the code freely available to allow users to build and
> > publish their own Metron packages. But even with this capability, unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts contributors
> > also. Our primary development environment relies on that Ambari MPack. To
> > continue development on any of the components of Apache Metron, we would
> > need to build an alternative development environment that can function
> > despite the paywall. That could take many shapes, but in my opinion it
> > would be a blocker for continuing any development on Apache Metron,
> > unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >
> >
> > On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov <dimdr...@gmail.com> wrote:
> >
> > > - Dropping Ambari.
> > >
> > > I like the progress that Apache made with Ambari in 2.7. And I don't know a
> > > better installer/manager for all the services (we use other Hadoop eco
> > > services besides Metron).
> > >
> > > Sometimes it's buggy, agents get stuck or the server needs a reboot from
> > > time to time, mpacks break some functionality. But overall I feel this is
> > > the direction for central management and orchestration.
> > >
> > > - Dima
> > >
> > > On Wed, Apr 15, 2020, 12:45 Justin Leet <justinjl...@gmail.com> wrote:
> > >
> > > > This is a bit off the top of my head, but I'd agree with pretty much all
> > > > of the points on what's bringing a lot of overhead. There's probably also
> > > > a worthwhile discussion about what value we're shooting for the project
> > > > to provide to people that influences what stays/goes.
> > > >
> > > > Thinking out loud a bit
> > > >
> > > > - Dropping Storm and moving to Spark drops the very hard to
> > > >   tune/manage/troubleshoot Storm.
> > > > - Dropping the UIs (and making SQL the external interface) pretty much
> > > >   implies dropping the REST APIs and ES/Solr. ES/Solr have been a giant
> > > >   source of dev heartache on the project and they exist primarily for
> > > >   the real time use case. People can build whatever UIs or use existing
> > > >   tools against Parquet/Hive/whatever.
> > > > - Dropping Ambari. It's a complex beast to install because of how many
> > > >   components we have. Dropping the above makes our install much easier
> > > >   and should alleviate the need for a complex installer.
> > > >
> > > > At that point, we're basically left with
> > > >
> > > > - Some Spark for parse -> enrich -> output
> > > > - The profiler
> > > > - Stellar
> > > > - Probably some other misc stuff (sensors, bro kafka plugin, etc.)
> > > >
> > > > At a glance, that seems almost an order of magnitude smaller than what we
> > > > currently try to handle.
> > > >
> > > > I'm not really sure what an appropriate way to handle the profiler is.
> > > > I've barely touched the code for it, so anything I say is a vague guess.
> > > >
> > > > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom <tom.ye...@ubc.ca> wrote:
> > > >
> > > > > To me Metron is big and broad in the scope of technology required to get
> > > > > it running. If things were more modular that would go a long way to
> > > > > reducing the learning curve or at least putting it into smaller bites
> > > > > (and it might encourage more people to get involved).
> > > > >
> > > > > If the UI were an add-on module in another project, it would have made it
> > > > > easier for me and it could also encourage my hypothetical buddy who is a
> > > > > web developer expert to get involved since he could focus on the web-ui
> > > > > module instead of trying to tackle all the other pieces that are probably
> > > > > not part of his bailiwick.
> > > > >
> > > > > Stellar is very intriguing, maybe that is not unique to Metron? The
> > > > > architecture of Metron with respect to parsing, enriching, etc., makes a
> > > > > lot of sense to anyone I talk with. These two aspects of Metron seem like
> > > > > standout examples that make for a powerful platform to develop on.
> > > > >
> > > > > Thanks for continuing this discussion,
> > > > >
> > > > > Tom.
> > > > >
> > > > >
> > > > > On 2020-04-08 15:32:46-07:00 Casey Stella wrote:
> > > > >
> > > > > As far as I know there is no minimum bar of development activity to keep
> > > > > a project open. I think we would all be grateful for any investment that
> > > > > you or your organization would want to make.
> > > > > It also occurs to me that your observation is absolutely spot on: we have
> > > > > a LOT of moving parts.
> > > > > I see some deficiencies here:
> > > > >
> > > > > * We depend on a lot of the various hadoop ecosystem projects and they
> > > > >   have to work together very precisely:
> > > > >   * This makes for a system that is hard to install.
> > > > >   * This also makes for a system which is hard to tune/manage
> > > > > * We have a large surface area of coverage
> > > > >   * We have an installer, backend system and front-end UI, which
> > > > >     stretches our developers a bit thin, especially since there isn't
> > > > >     even interest in those systems
> > > > >
> > > > > Perhaps a reconsideration of the scope and technologies that we use would
> > > > > be merited? If we were to decide to, for instance:
> > > > >
> > > > > * Consolidate scope: focus on a viable backend/API rather than a UI
> > > > > * Consolidate technology: reposition ourselves on top of Spark as a
> > > > >   consolidated streaming/batch system
> > > > > * Make SQL our external interface: write out to parquet + the Hive
> > > > >   metastore and let users pin up presto tables or hive tables as they see
> > > > >   fit
> > > > >
> > > > > This might reduce some of our surface area and make it more viable to get
> > > > > started?
> > > > > Anyway, just some thoughts.
> > > > > Casey
> > > > >
> > > > > On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom <tom.ye...@ubc.ca> wrote:
> > > > > Hi Casey,
> > > > >
> > > > > I'm new here and new to contributing to an open source project. Thus far
> > > > > my contribution has been questions, however the steep learning curve has
> > > > > had me working to understand all the moving parts for the last 18 months
> > > > > and I see that as a big investment by my organization.
> > > > >
> > > > > What is a level that would be viable?
> > > > >
> > > > > If my organization were to contribute I don't know that it would be soon
> > > > > enough or at the volume that is recognized as viable, which is why I ask
> > > > > the question.
> > > > >
> > > > >
> > > > > On 2020-04-08 15:05:51-07:00 Casey Stella wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > When composing the board report today, I realized that we have
> > > > > effectively had no development in the last quarter on this project.
> > > > > Please be aware that I say this without a shred of blame or judgement
> > > > > (especially so considering I have not contributed in a long time). That
> > > > > being said, I would like to pose the question to the community:
> > > > >
> > > > > Do we feel that this project is viable? If so, how are we going to spur
> > > > > new contributions? If not, then should we begin the process to fold the
> > > > > project?
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Casey
> > > > >
> > > > >
> > > >
> > >
> >
>