Me likey.  +1

On Wed, Nov 9, 2016 at 5:15 PM, James Sirota <jsir...@apache.org> wrote:

> Guys,
>
> You know, looking at the release, I think the changes due to the Storm &
> Kafka upgrade were significant enough to justify moving it to a non-point
> release. Generally, point releases are reserved for patches or maintenance
> releases. I think this release is more than just a maintenance release. I
> suggest we consider 0.3.0.
>
> 04.11.2016, 18:27, "Kyle Richardson" <kylerichards...@gmail.com>:
> > I'm a little late to the party but thought I would go ahead and throw my
> > two cents into the mix.
> >
> > I share the concern around an upgrade / migration path. While I would love
> > to see the BETA dropped sooner rather than later, to me, this is a game
> > changer for people implementing Metron. I think there is a silent
> > expectation of no data loss after dropping the BETA tag.
> >
> > Even if there is not a direct upgrade path for a few releases, is there
> > documentation that we could provide to ensure a data migration path for
> > users? I'm not thinking of anything automated, just some instructions on
> > what to do.
> >
> > -Kyle
> >
> > On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella <ceste...@gmail.com> wrote:
> >
> >>  Jon,
> >>
> >>  Thank you for your thoughts; they are appreciated and you should keep them
> >>  coming. This kind of discussion is exactly why I sent out this thread. I
> >>  think it's safe to say that the entire community shares your desire for
> >>  Metron to be as easy to use as possible and a "data analysis platform for
> >>  the masses." We should hold ourselves to a high standard, no doubt.
> >>
> >>  Casey
> >>
> >>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>
> >>  > Please understand that my points mostly relate to perception and ease
> >>  > of use, not what's technically possible or available. I'm coming at this
> >>  > from the view that Metron should be a data analysis platform for the
> >>  > masses.
> >>  >
> >>  > METRON-517/542 - While I'm willing to let this one go, it depends on your
> >>  > definition of "non-issue." I personally believe that data (in every
> >>  > location that it exists) needs to be obvious and have ultra-high
> >>  > integrity. I'm not concerned that the correct data won't exist somewhere
> >>  > in the cluster; I'm focusing on it being easily accessible by an
> >>  > operations team that may consist of entry-level analysts. Once 517 is
> >>  > done and merged, I would consider that a short-term mitigation is in
> >>  > place.
> >>  >
> >>  > I feel like the project should stick to certain principles, and one I'd
> >>  > suggest is that data access be easy, accurate, and obvious. Do we have
> >>  > anything like this that was agreed upon, discussed, or documented?
> >>  > Probably a discussion for a different thread.
> >>  >
> >>  > METRON-485/470/etc. were mostly meant to illustrate a consistency issue,
> >>  > and resolving them would give a better first impression (assuming that
> >>  > people monitoring the project will start using it more once it's non-BETA
> >>  > software). First impressions are big in my book and could affect initial
> >>  > adoption.
> >>  >
> >>  > Regarding 485 - Otto may be able to clarify, but I thought somebody else
> >>  > saw this issue as well. I think the finger is currently being pointed at
> >>  > Monit timeouts and not Storm. It also doesn't happen every single time; I
> >>  > only run into it while the cluster is under load and after dozens of
> >>  > topology restarts that I do when tuning parallelism in Storm. I'm going
> >>  > to update to Storm 1.0.x to see if this still happens. Again, this
> >>  > relates to ease of use/load testing/tuning.
> >>  >
> >>  > Agree with the upgrade comments - as long as it's supported at some
> >>  > defined point (IMHO this is when a project leaves BETA, but others are
> >>  > welcome to disagree).
> >>  >
> >>  > Finally, I know this doesn't come across well in email, but I'm just
> >>  > mentioning items which I think are important, not attempting to demand
> >>  > that they be fixed or that this doesn't leave BETA. Thanks,
> >>  >
> >>  > Jon
> >>  >
> >>  > On Thu, Nov 3, 2016, 16:44 James Sirota <jsir...@apache.org> wrote:
> >>  >
> >>  >
> >>  > Hi Jon,
> >>  >
> >>  > Here are my thoughts around your objections.
> >>  >
> >>  > METRON-517/METRON-542
> >>  >
> >>  > I think the mechanism currently exists within Metron to make this a
> >>  > non-issue. I believe you can solve it with a combination of a Stellar
> >>  > statement and ES templates. As you mentioned, we can truncate the string
> >>  > and then include the relevant metadata in the message (original length,
> >>  > hash, etc.). Cramming really long strings into ES is generally a bad
> >>  > thing, which is why this limitation exists. The metadata in the indexed
> >>  > message, along with the timestamp, allows you to pull data from HDFS
> >>  > should you need to recover the full string.
> >>  >
> >>  > METRON-485
> >>  >
> >>  > We cannot replicate this issue in our environment, but if it is indeed
> >>  > an issue, it is an issue with Storm, and a Jira should be filed against
> >>  > Storm rather than Metron. My hunch, though, is that it's probably
> >>  > something in your environment. I just tried stopping all topologies on
> >>  > my AWS cluster, then went to all Storm nodes and didn't see any workers
> >>  > left behind.
> >>  >
> >>  > METRON-470
> >>  >
> >>  > I think this is mainly a consistency issue. I don't think this impacts
> >>  > the stability or function of the software. I think this is a
> >>  > nice-to-have, maybe in the next few releases, but I don't think we
> >>  > absolutely have to have this to drop BETA.
> >>  >
> >>  > With respect to upgrades, here are my thoughts. There is really no way
> >>  > to upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a
> >>  > change of HDP. The new build will only be compatible with HDP 2.5 and
> >>  > not 2.4. So you have to lay down a new cluster regardless. We can
> >>  > document how to get the configs off of your old Metron and plug them
> >>  > into your new Metron so that it works the same. That shouldn't be a
> >>  > problem.
> >>  >
> >>  > Our upgrade path for future releases will revolve around the Ambari
> >>  > Metron management pack that is available with the upcoming build. Right
> >>  > now the install capability is available, and the upgrade capability
> >>  > will come in incrementally within the next few releases. We will
> >>  > additionally deprecate Monit and switch that functionality to Ambari as
> >>  > well. Finally, we will also use Ambari for metrics monitoring. There is
> >>  > lots to do, so we will triage and prioritize Jiras as a community to see
> >>  > which parts we want to tackle first. This is why your participation in
> >>  > the community is so valuable.
> >>  >
> >>  > Thanks,
> >>  > James
> >>  >
> >>  >
> >>  >
> >>  > 03.11.2016, 11:07, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  > > I agree that we can split METRON-517 into a short-term and a long-term
> >>  > > fix. I have attempted to organize my thoughts regarding the long-term
> >>  > > fix into METRON-542 and can get a PR out for METRON-517 soon to close
> >>  > > that out.
> >>  > >
> >>  > > This leaves cluster tuning and a valid upgrade path for users, the
> >>  > > latter of which is my predominant concern. If the team is willing to
> >>  > > say that starting with 0.2.2 there will be a valid upgrade path to
> >>  > > future releases, I think that removing the BETA tag at 0.2.2 is
> >>  > > reasonable. That said, this is just following my perception of what
> >>  > > the BETA tag represents.
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > > On Thu, Nov 3, 2016 at 11:50 AM Casey Stella <ceste...@gmail.com> wrote:
> >>  > >
> >>  > >> Ok, regarding METRON-517, I've thought about this a bit, having read
> >>  > >> your really great and detailed JIRA as well as the discussion around
> >>  > >> this on the dev list between you and Matt Foley. I want to separate
> >>  > >> the discussion of what the correct long-term solution for this issue
> >>  > >> is from what an acceptable solution is.
> >>  > >>
> >>  > >> In terms of an acceptable work-around, my opinion is that because we
> >>  > >> allow the user to modify the ES template, they can:
> >>  > >>
> >>  > >> - Adjust the template to specify ignore_above
> >>  > >>   <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
> >>  > >>   on fields which they feel are likely to be large (maybe every
> >>  > >>   string field)
> >>  > >> - The combination of timestamp and ip_src_addr should be sufficient
> >>  > >>   for picking out the raw data in question from the HDFS store
> >>  > >> - A Stellar enrichment can be used to tag the messages with large
> >>  > >>   URIs, and that tag can factor into threat triage or even be used to
> >>  > >>   filter in Kibana
> >>  > >> - As you say, you can use the profiler to track counts of such
> >>  > >>   messages if you so desire and factor that into threat alerting or
> >>  > >>   filtering in Kibana.
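A minimal sketch of the first bullet. It assumes an Elasticsearch 2.x-style index template (the `string` type with `"index": "not_analyzed"`; on ES 5.x and later the rough equivalent is the `keyword` type). The index pattern `bro_index*`, the doc type `bro_doc`, the dynamic-template name, and the helper `build_template` are illustrative assumptions, not Metron's shipped templates. `ignore_above` counts characters, but Lucene's 32766 limit is in bytes, and a UTF-8 character can take up to 4 bytes. The sketch therefore uses 32766 // 4 = 8191.

```python
import json

# Lucene rejects any single indexed term longer than 32766 bytes.
LUCENE_MAX_TERM_BYTES = 32766
# ignore_above counts characters; a UTF-8 character is at most 4 bytes,
# so this character limit can never exceed Lucene's byte limit.
SAFE_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // 4


def build_template(index_pattern: str, doc_type: str) -> dict:
    """Return an ES 2.x-style index template (names are hypothetical) that
    applies ignore_above to every string field via a dynamic template."""
    return {
        "template": index_pattern,
        "mappings": {
            doc_type: {
                "dynamic_templates": [
                    {
                        "strings_ignore_above": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                                "ignore_above": SAFE_IGNORE_ABOVE,
                            },
                        }
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the _template endpoint.
    print(json.dumps(build_template("bro_index*", "bro_doc"), indent=2))
```

With a template like this in place, an over-long value is still kept in `_source` but is left out of the index, so the document is no longer rejected. The timestamp and ip_src_addr can still lead back to the raw record in HDFS, as the second bullet says.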
> >>  > >>
> >>  > >> Ultimately, I believe we have exposed the appropriate set of tooling
> >>  > >> to provide an acceptable solution for the moment. Now, as for the
> >>  > >> best long-term solution, I will let the good discussion on the mailing
> >>  > >> list and JIRA continue and contribute my thoughts on the JIRA
> >>  > >> <https://issues.apache.org/jira/browse/METRON-517>.
> >>  > >>
> >>  > >> Of course, this is just $0.02 :)
> >>  > >>
> >>  > >> Apologies to Dave; I wanted to mark this aspect of the discussion on
> >>  > >> this thread, as it is relevant to the criteria for removing the BETA
> >>  > >> tag.
> >>  > >>
> >>  > >> Best,
> >>  > >>
> >>  > >> Casey
> >>  > >>
> >>  > >> On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>  > >>
> >>  > >> > To clarify, it only needs to truncate fields > 32766 which need a
> >>  > >> > full/exact string match search to be run on them (analyzed fields
> >>  > >> > generally would not hit this limitation, but I guess in theory they
> >>  > >> > could). However, that's probably every field which can get > 32766,
> >>  > >> > because I'm assuming those will all be strings.
> >>  > >> >
> >>  > >> > I also think using the profiler to monitor the truncation action
> >>  > >> > could be a useful default.
> >>  > >> >
> >>  > >> > Jon
> >>  > >> >
> >>  > >> > On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>  > >> >
> >>  > >> > > That would break searching on uri entirely, because it's not
> >>  > >> > > analyzed, unless you knew to truncate your query at 32766. I don't
> >>  > >> > > like pushing that complication to the end user.
> >>  > >> > >
> >>  > >> > > I would suggest truncation in the indexingBolt (not using Stellar,
> >>  > >> > > because you'd want this across the board) for all fields > 32766
> >>  > >> > > (how do we make sure this gets updated if the limitation changes in
> >>  > >> > > Lucene?) and adding metadata key-value pairs (pre-truncation
> >>  > >> > > length, hash, truncated bool, etc.). In the URI scenario I would
> >>  > >> > > also suggest doing a multifield mapping by default because of the
> >>  > >> > > way that data is useful (not sure which analyzer to use, though;
> >>  > >> > > maybe write or find a good URI analyzer?). Since timestamp is a
> >>  > >> > > required field for all messages (I'm pretty sure?), I'm OK with
> >>  > >> > > using the timestamp and field value as the UID, but would prefer
> >>  > >> > > something better.
> >>  > >> > >
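A minimal sketch of the truncation Jon describes, written in Python for brevity rather than as a Storm bolt. It assumes a parsed message is a flat dict and counts the limit in UTF-8 bytes, since Lucene's 32766 limit is a byte limit. The metadata key names (`<field>_original_length`, `<field>_sha256`, `<field>_truncated`) and the function names are hypothetical. Under this assumption the full message would still be written to HDFS; only the copy sent to ES is truncated.

```python
import hashlib

# Lucene rejects any single indexed term longer than this many UTF-8 bytes.
LUCENE_MAX_TERM_BYTES = 32766


def truncate_utf8(value: str, max_bytes: int) -> str:
    """Cut value to at most max_bytes of UTF-8 without splitting a character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncate_message(message: dict, max_bytes: int = LUCENE_MAX_TERM_BYTES) -> dict:
    """Return a copy of message in which every string field over max_bytes is
    truncated and annotated with (hypothetical) metadata keys."""
    out = dict(message)
    for field, value in message.items():
        if not isinstance(value, str):
            continue  # only strings can exceed the term limit
        encoded = value.encode("utf-8")
        if len(encoded) <= max_bytes:
            continue
        out[field] = truncate_utf8(value, max_bytes)
        out[field + "_original_length"] = len(encoded)  # pre-truncation size in bytes
        out[field + "_sha256"] = hashlib.sha256(encoded).hexdigest()
        out[field + "_truncated"] = True
    return out


if __name__ == "__main__":
    msg = {"timestamp": 1478131200000, "ip_src_addr": "10.0.0.1",
           "uri": "/" + "a" * 40000}
    indexed = truncate_message(msg)
    print(len(indexed["uri"].encode("utf-8")), indexed["uri_original_length"],
          indexed["uri_truncated"])  # prints: 32766 40001 True
```

Keeping the limit in one named constant is one way to handle Jon's question about Lucene changing the limit later. Whether this logic lives in the indexing bolt or in a Stellar statement is the trade-off being debated in this thread; the sketch only shows the per-message logic.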
> >>  > >> > > Jon
> >>  > >> > >
> >>  > >> > > On Wed, Nov 2, 2016, 20:33 James Sirota <jsir...@apache.org> wrote:
> >>  > >> > >
> >>  > >> > > Jon,
> >>  > >> > >
> >>  > >> > > For METRON-517, would it suffice to have a Stellar statement take a
> >>  > >> > > URI string and truncate it to a length of 32766 in the ES writer,
> >>  > >> > > but still write the actual string to HDFS? You can then search
> >>  > >> > > against ES on the truncated portion, but retrieve the actual string
> >>  > >> > > from HDFS. It's easy to do because you know the timestamp from the
> >>  > >> > > original message, so you know which logs in HDFS to search through
> >>  > >> > > to find the data.
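A minimal sketch of the HDFS side of this lookup. It assumes the raw records for the relevant time window have already been read out of HDFS as one JSON object per line. It also assumes the indexed document carries the timestamp plus a SHA-256 of the full field value (the "hash" metadata discussed in this thread). The function `find_original` and the choice of SHA-256 are illustrative assumptions. How the files are located in HDFS is not shown.

```python
import hashlib
import json
from typing import Iterable, Optional


def find_original(lines: Iterable[str], timestamp: int, field: str,
                  sha256_hex: str) -> Optional[dict]:
    """Return the first raw record whose timestamp matches and whose field
    value hashes to sha256_hex, or None if no record matches."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("timestamp") != timestamp:
            continue  # the timestamp narrows the search, as described above
        value = record.get(field)
        if not isinstance(value, str):
            continue
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        if digest == sha256_hex:
            return record
    return None


if __name__ == "__main__":
    full_uri = "/" + "a" * 40000
    raw = [
        json.dumps({"timestamp": 1478131200000, "uri": "/short"}),
        json.dumps({"timestamp": 1478131200000, "uri": full_uri}),
    ]
    target = hashlib.sha256(full_uri.encode("utf-8")).hexdigest()
    match = find_original(raw, 1478131200000, "uri", target)
    print(len(match["uri"]) if match else "not found")
```

Matching on the hash as well as the timestamp disambiguates records that share a timestamp. This addresses the same concern Jon raises about using timestamp plus field value as a UID.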
> >>  > >> > >
> >>  > >> > > 02.11.2016, 14:12, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>  > >> > > > I personally would like to see the following things done before
> >>  > >> > > > the project leaves BETA:
> >>  > >> > > > (1) Address data integrity concerns (specifically thinking of
> >>  > >> > > > METRON-370, METRON-517)
> >>  > >> > > > (2) Make cluster tuning easier and more consistent (METRON-485,
> >>  > >> > > > METRON-470, and the "[DISCUSS] moving parsers back to flux"
> >>  > >> > > > thread, which I can't find a JIRA for).
> >>  > >> > > >
> >>  > >> > > > I would also want to see the upgrade path (as opposed to a
> >>  > >> > > > rebuild) be more thoroughly and regularly tested once things
> >>  > >> > > > leave BETA. From my perspective, I think the project is very
> >>  > >> > > > close but not yet ready.
> >>  > >> > > >
> >>  > >> > > > Jon
> >>  > >> > > >
> >>  > >> > > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <ceste...@gmail.com> wrote:
> >>  > >> > > >
> >>  > >> > > > Hello Everyone,
> >>  > >> > > >
> >>  > >> > > > Now that the discussion around the next release has started, it
> >>  > >> > > > has been proposed (and I agree) that this is a good time to
> >>  > >> > > > discuss what to name this next release. Previously, we have used
> >>  > >> > > > the BETA suffix. I think it might be time to drop it and call the
> >>  > >> > > > next release 0.2.2.
> >>  > >> > > >
> >>  > >> > > > Thoughts?
> >>  > >> > > >
> >>  > >> > > > Best,
> >>  > >> > > >
> >>  > >> > > > Casey
> >>  > >> > > >
> >>  > >> > > > --
> >>  > >> > > >
> >>  > >> > > > Jon
> >>  > >> > >
> >>  > >> > > -------------------
> >>  > >> > > Thank you,
> >>  > >> > >
> >>  > >> > > James Sirota
> >>  > >> > > PPMC- Apache Metron (Incubating)
> >>  > >> > > jsirota AT apache DOT org
> >>  > >> > >
> >>  > >> > > --
> >>  > >> > >
> >>  > >> > > Jon
> >>  > >> > >
> >>  > >> > --
> >>  > >> >
> >>  > >> > Jon
> >>  > >> >
> >>  > > --
> >>  > >
> >>  > > Jon
> >>  >
> >>  > -------------------
> >>  > Thank you,
> >>  >
> >>  > James Sirota
> >>  > PPMC- Apache Metron (Incubating)
> >>  > jsirota AT apache DOT org
> >>  >
> >>  > --
> >>  >
> >>  > Jon
> >>  >
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>



-- 
Nick Allen <n...@nickallen.org>
