Metron builds against Apache artifacts by default (Storm 1.x, HBase 1.x,
Kafka 0.10), so the bits can run on other Hadoop installations that conform
to those versions, but our Ansible deployment uses HDP 2.5 as the base
Hadoop distribution. What James meant was that upgrade instructions for
Metron start with the Hadoop distribution's upgrade instructions.

On Sat, Nov 5, 2016 at 00:53 Dima Kovalyov <dima.koval...@sstech.us> wrote:

> Hello James,
>
> Does that mean Metron 0.2.2 goes with HDP 2.5 by default?
>
> - Dima
>
> On 11/05/2016 06:26 AM, James Sirota wrote:
> > Hi Kyle,
> >
> > The HDP upgrade guide can be found here:
> >
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html
> >
> > After executing these instructions you get to HDP 2.5 with no data
> loss.  After that, upgrading Metron is as simple as saving the old configs,
> ES templates, grok statements from HDFS, and NiFi flows from your 0.2.1
> build, installing 0.2.2 (via Ambari management pack), and putting the
> configs back into zookeeper, copying the ES templates and Grok files back,
> and restarting your NiFi flows.  I agree that we should automate most of
> this eventually, and we will, but I don't think this is necessarily a show
> stopper for dropping BETA.  Would you agree?
> >
> > Thanks,
> > James
> >
> > 04.11.2016, 18:27, "Kyle Richardson" <kylerichards...@gmail.com>:
> >> I'm a little late to the party but thought I would go ahead and throw my
> >> two cents into the mix.
> >>
> >> I share the concern around an upgrade / migration path. While I would
> love
> >> to see the BETA dropped sooner rather than later, to me, this is a game changer
> >> for people implementing Metron. I think there is a silent expectation
> of no
> >> data loss after dropping the BETA tag.
> >>
> >> Even if there is not a direct upgrade path for a few releases, is there
> >> documentation that we could provide to ensure a data migration path for
> >> users? I'm not thinking anything automated just some instructions on
> what
> >> to do.
> >>
> >> -Kyle
> >>
> >> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella <ceste...@gmail.com>
> wrote:
> >>
> >>>  Jon,
> >>>
> >>>  Thank you for your thoughts; they are appreciated and you should keep
> them
> >>>  coming. This kind of discussion is exactly why I sent out this
> thread. I
> >>>  think it's safe to say that the entire community shares your desire
> for
> >>>  Metron to be as easy to use as possible and a "data analysis platform
> for
> >>>  the masses." We should hold ourselves to a high standard, no doubt.
> >>>
> >>>  Casey
> >>>
> >>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
> >>>
> >>>  > Please understand that my points mostly relate to perception and
> ease of
> >>>  > use, not what's technically possible or available. I'm coming at
> this as
> >>>  > Metron should be a data analysis platform for the masses.
> >>>  >
> >>>  > METRON-517/542 - While I'm willing to let this one go it depends on
> your
> >>>  > definition of non-issue. I personally believe that data (in every
> >>>  location
> >>>  > that it exists) needs to be obvious and have ultra high integrity.
> I'm
> >>>  not
> >>>  > concerned that the correct data won't exist somewhere in the
> cluster, I'm
> >>>  > focusing on it being easily accessible by an operations team that
> may
> >>>  > consist of entry level analysts. Once 517 is done and merged I would
> >>>  > consider that a short term mitigation is in place.
> >>>  >
> >>>  > I feel like the project should stick to certain principles and a
> >>>  suggestion
> >>>  > is that data access is easy, accurate, and obvious. Do we have
> anything
> >>>  > like this that was agreed upon, discussed, or documented? Probably a
> >>>  > discussion for a different thread.
> >>>  >
> >>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue,
> >>>  > and resolving them would give a better first impression (assuming that
> people
> >>>  > monitoring the project will start using it more once it's non-BETA
> >>>  > software). First impressions are big in my book and could affect
> initial
> >>>  > adoption.
> >>>  >
> >>>  > Regarding 485 - Otto may be able to clarify but I thought somebody
> else
> >>>  saw
> >>>  > this issue as well. I think the finger is currently being pointed at
> >>>  monit
> >>>  > timeouts and not storm. It also doesn't happen every single time, I
> only
> >>>  > run into it while the cluster is under load and after dozens of
> topology
> >>>  > restarts that I do when tuning parallelism in storm. I'm going to be
> >>>  > updating to storm 1.0.x in order to see if this still exists. Again,
> >>>  this
> >>>  > relates to ease of use/load testing/tuning.
> >>>  >
> >>>  > Agree with the upgrade comments - as long as it's supported at some
> >>>  defined
> >>>  > point (IMHO this is when a project leaves BETA but others are
> welcome to
> >>>  > disagree).
> >>>  >
> >>>  > Finally, I know this doesn't come across well in email but I'm just
> >>>  > mentioning items which I think are important, not attempting to
> demand
> >>>  that
> >>>  > they be fixed or that this doesn't leave beta. Thanks,
> >>>  >
> >>>  > Jon
> >>>  >
> >>>  > On Thu, Nov 3, 2016, 16:44 James Sirota <jsir...@apache.org> wrote:
> >>>  >
> >>>  >
> >>>  > Hi Jon,
> >>>  >
> >>>  > Here are my thoughts around your objections.
> >>>  >
> >>>  > METRON-517/METRON-542
> >>>  >
> >>>  > I think the mechanism currently exists within Metron to make this a
> >>>  > non-issue. I believe you can solve it with a combination of a
> Stellar
> >>>  > statement and ES templates. As you mentioned, we can truncate the
> string
> >>>  > and then include the relevant meta data in the message (original
> length,
> >>>  > hash, etc). Cramming really long strings into ES is generally a bad
> >>>  thing,
> >>>  > which is why this limitation exists. The metadata in the indexed
> >>>  message
> >>>  > along with the timestamp allows you to pull data from HDFS should
> you
> >>>  need
> >>>  > to recover the full string.
> >>>  >
> >>>  > METRON-485
> >>>  >
> >>>  > We cannot replicate this issue in our environment, but if this is
> indeed
> >>>  an
> >>>  > issue this is an issue with Storm. A Jira should be filed against
> Storm
> >>>  > and not against Metron. My hunch, though, is that it's probably
> >>>  something
> >>>  > in your environment. I just tried stopping all topologies on my AWS
> >>>  > cluster and then went to all Storm nodes and didn't see any workers
> left
> >>>  > behind.
> >>>  >
> >>>  > METRON-470
> >>>  >
> >>>  > I think this is mainly a consistency issue. I don't think this
> impacts
> >>>  the
> >>>  > stability or function of the software. I think this is a nice to
> have,
> >>>  > maybe in the next few releases, but I don't think we absolutely
> have to
> >>>  > have this to drop BETA.
> >>>  >
> >>>  > With respect to upgrades, here are my thoughts. There is really no
> way
> >>>  to
> >>>  > upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a
> >>>  change
> >>>  > of HDP. The new build will only be compatible with HDP 2.5 and not
> 2.4.
> >>>  > So you have to lay down a new cluster regardless. We can document
> how to
> >>>  > get the configs off of your old Metron and plug them into your new
> Metron
> >>>  > so that it works the same. That shouldn't be a problem.
> >>>  >
> >>>  > Our upgrade path for future releases will revolve around the Ambari
> >>>  Metron
> >>>  > management pack that is available with the upcoming build. Right
> now the
> >>>  > install capability is available and the upgrade capability will
> come in
> >>>  > incrementally within the next few releases. We will additionally
> >>>  deprecate
> >>>  > Monit and switch that functionality to Ambari as well. Finally, we
> will
> >>>  > also use Ambari for metrics monitoring. There is lots to do so we
> will
> >>>  > triage and prioritize Jiras as a community to see which parts we
> want to
> >>>  > tackle first. This is why your participation in the community is so
> >>>  > valuable.
> >>>  >
> >>>  > Thanks,
> >>>  > James
> >>>  >
> >>>  >
> >>>  >
> >>>  > 03.11.2016, 11:07, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>>  > > I agree that we can split METRON-517 into a short term and long
> term
> >>>  fix.
> >>>  > > I have attempted to organize my thoughts regarding the long term
> fix
> >>>  into
> >>>  > > METRON-542 and can get a PR out for METRON-517 soon to close that
> out.
> >>>  > >
> >>>  > > This leaves cluster tuning and a valid upgrade path for users, the
> >>>  latter
> >>>  > of
> >>>  > > which is my predominant concern. If the team is willing to say
> that
> >>>  > > starting with 0.2.2 there will be a valid upgrade path to future
> >>>  releases
> >>>  > I
> >>>  > > think that removing the BETA tag at 0.2.2 is reasonable. That
> said,
> >>>  this
> >>>  > > is just following my perception of what the BETA tag represents.
> >>>  > >
> >>>  > > Jon
> >>>  > >
> >>>  > > On Thu, Nov 3, 2016 at 11:50 AM Casey Stella <ceste...@gmail.com>
> >>>  wrote:
> >>>  > >
> >>>  > >> Ok, regarding METRON-517, I've thought about this a bit having
> read
> >>>  > your
> >>>  > >> really great and detailed JIRA as well as the discussion around
> this
> >>>  on
> >>>  > the
> >>>  > >> dev list between you and Matt Foley. I want to separate the
> >>>  discussion
> >>>  > >> between what is the correct long-term solution for this issue
> versus
> >>>  > what
> >>>  > >> is an acceptable solution.
> >>>  > >>
> >>>  > >> In terms of an acceptable work-around, my opinion is that
> because we
> >>>  > allow
> >>>  > >> the user to modify the ES template they can
> >>>  > >>
> >>>  > >> - Adjust the template to specify ignore_above
> >>>  > >> <
> >>>  > >>
> >>>  > https://www.elastic.co/guide/en/elasticsearch/reference/
> >>>  > current/ignore-above.html
> >>>  > >> >
> >>>  > >> on
> >>>  > >> fields which they feel are likely to be large (maybe every string
> >>>  > field)
> >>>  > >> - The combination of timestamp and ip_src_addr should be
> >>>  sufficient
> >>>  > for
> >>>  > >> picking out the raw data in question from the HDFS store
> >>>  > >> - A stellar enrichment can be used to tag the messages with large
> >>>  > URIs
> >>>  > >> and that can factor into the threat triage or even be used to
> >>>  filter
> >>>  > in
> >>>  > >> kibana
> >>>  > >> - As you say, you can use the profiler to track counts of such
> >>>  > messages
> >>>  > >> if you so desire and factor that into threat alerting or
> filtering
> >>>  > in
> >>>  > >> kibana.
> >>>  > >>
> >>>  > >> Ultimately, I believe we have exposed the appropriate set of
> tooling
> >>>  to
> >>>  > >> provide an acceptable solution for the moment. Now, as for the
> best
> >>>  > >> long-term solution, I will let the good discussion on the mailing
> >>>  list
> >>>  > and
> >>>  > >> JIRA continue and contribute my thoughts on the JIRA
> >>>  > >> <https://issues.apache.org/jira/browse/METRON-517>.
> >>>  > >>
> >>>  > >> Of course, this is just $0.02 :)
> >>>  > >>
> >>>  > >> Apologies to Dave, I wanted to mark this aspect of the
> discussion on
> >>>  > this
> >>>  > >> thread as it is relevant to sufficient criteria to remove the
> BETA
> >>>  tag.
> >>>  > >>
> >>>  > >> Best,
> >>>  > >>
> >>>  > >> Casey
> >>>  > >>
> >>>  > >> On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com <
> zeo...@gmail.com>
> >>>  > wrote:
> >>>  > >>
> >>>  > >> > To clarify, it only needs to truncate fields > 32766 which
> need a
> >>>  > >> > full/exact string match search to be run on them (analyzed
> fields
> >>>  > >> generally
> >>>  > >> > would not hit this limitation but I guess in theory they
> could).
> >>>  > >> However,
> >>>  > >> > that's probably every field which can get > 32766 because I'm
> >>>  > assuming
> >>>  > >> > those will all be strings.
> >>>  > >> >
> >>>  > >> > I also think using the profiler to monitor the truncation
> action
> >>>  > could
> >>>  > >> be a
> >>>  > >> > useful default.
> >>>  > >> >
> >>>  > >> > Jon
> >>>  > >> >
> >>>  > >> > On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com <zeo...@gmail.com>
> >>>  > wrote:
> >>>  > >> >
> >>>  > >> > > That would break searching on uri entirely unless you
> queried and
> >>>  > knew
> >>>  > >> to
> >>>  > >> > > truncate at 32766 because it's not analyzed. I don't like
> pushing
> >>>  > that
> >>>  > >> > > complication to the end user.
> >>>  > >> > >
> >>>  > >> > > I would suggest truncation in the indexingBolt (not using
> stellar
> >>>  > >> because
> >>>  > >> > > you'd want this across the board) for all fields > 32766
> (how do
> >>>  we
> >>>  > >> make
> >>>  > >> > > sure this gets updated if the limitation changes in Lucene?)
> and
> >>>  > adding
> >>>  > >> > > metadata key-value pairs (pre-trunc length, hash, truncated
> bool,
> >>>  > >> etc.).
> >>>  > >> > > In the URI scenario I would also suggest doing a multifield
> >>>  mapping
> >>>  > by
> >>>  > >> > > default because of the way that data is useful (not sure
> which
> analyzer
> >>>  > >> > to
> >>>  > >> > > use though - maybe write or find a good URI analyzer?). Since
> >>>  > >> timestamp
> >>>  > >> > is
> >>>  > >> > > a required field for all messages (I'm pretty sure?) I'm ok
> with
> >>>  > >> > timestamp
> >>>  > >> > > and field value used as the UID, but would prefer something
> >>>  better.
> >>>  > >> > >
> >>>  > >> > > Jon
> >>>  > >> > >
> >>>  > >> > > On Wed, Nov 2, 2016, 20:33 James Sirota <jsir...@apache.org>
> >>>  > wrote:
> >>>  > >> > >
> >>>  > >> > > Jon,
> >>>  > >> > >
> >>>  > >> > > For METRON-517 would it suffice to have a stellar statement
> to
> >>>  take
> >>>  > a
> >>>  > >> URI
> >>>  > >> > > string and truncate it to length of 32766 in the ES writer?
> But
> >>>  > still
> >>>  > >> > > write the actual string to HDFS? You can then search against
> ES
> >>>  on
> >>>  > the
> >>>  > >> > > truncated portion, but retrieve the actual string from
> HDFS.
> >>>  > It's
> >>>  > >> > easy
> >>>  > >> > > to do because you know the timestamp from the original
> message.
> >>>  So
> >>>  > you
> >>>  > >> > > know which logs in HDFS to search through to find the data.
> >>>  > >> > >
> >>>  > >> > > 02.11.2016, 14:12, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>>  > >> > > > I personally would like to see the following things done
> before
> >>>  > >> things
> >>>  > >> > > > leave BETA:
> >>>  > >> > > > (1) Address data integrity concerns (Specifically thinking
> of
> >>>  > >> > METRON-370,
> >>>  > >> > > > METRON-517)
> >>>  > >> > > > (2) Make cluster tuning easier and more consistent
> (METRON-485,
> >>>  > >> > > METRON-470,
> >>>  > >> > > > and the "[DISCUSS] moving parsers back to flux" which I
> can't
> >>>  > find a
> >>>  > >> > JIRA
> >>>  > >> > > > for).
> >>>  > >> > > >
> >>>  > >> > > > I would also want to see the upgrade path (as opposed to
> >>>  rebuild)
> >>>  > be
> >>>  > >> > more
> >>>  > >> > > > thoroughly and regularly tested once things leave BETA.
> From my
> >>>  > >> > > > perspective I think the project is very close but not yet
> >>>  ready.
> >>>  > >> > > >
> >>>  > >> > > > Jon
> >>>  > >> > > >
> >>>  > >> > > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <
> >>>  ceste...@gmail.com>
> >>>  > >> > wrote:
> >>>  > >> > > >
> >>>  > >> > > > Hello Everyone,
> >>>  > >> > > >
> >>>  > >> > > > Now that the discussion around the next release has
> started, it
> >>>  > has
> >>>  > >> > been
> >>>  > >> > > > proposed and I think it's a good time to discuss what to
> name
> >>>  > this
> >>>  > >> next
> >>>  > >> > > > release. Before, we have adopted the BETA suffix. I think
> it
> >>>  > might be
> >>>  > >> > > > time to drop it and call the next release 0.2.2.
> >>>  > >> > > >
> >>>  > >> > > > Thoughts?
> >>>  > >> > > >
> >>>  > >> > > > Best,
> >>>  > >> > > >
> >>>  > >> > > > Casey
> >>>  > >> > > >
> >>>  > >> > > > --
> >>>  > >> > > >
> >>>  > >> > > > Jon
> >>>  > >> > >
> >>>  > >> > > -------------------
> >>>  > >> > > Thank you,
> >>>  > >> > >
> >>>  > >> > > James Sirota
> >>>  > >> > > PPMC- Apache Metron (Incubating)
> >>>  > >> > > jsirota AT apache DOT org
> >>>  > >> > >
> >>>  > >> > > --
> >>>  > >> > >
> >>>  > >> > > Jon
> >>>  > >> > >
> >>>  > >> > --
> >>>  > >> >
> >>>  > >> > Jon
> >>>  > >> >
> >>>  > > --
> >>>  > >
> >>>  > > Jon
> >>>  >
> >>>  > -------------------
> >>>  > Thank you,
> >>>  >
> >>>  > James Sirota
> >>>  > PPMC- Apache Metron (Incubating)
> >>>  > jsirota AT apache DOT org
> >>>  >
> >>>  > --
> >>>  >
> >>>  > Jon
> >>>  >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>
>
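
For reference, the truncate-and-tag idea discussed for METRON-517 in the
quoted thread (truncate oversized string fields before indexing, and add
metadata such as the pre-truncation length, a hash, and a truncated flag)
could be sketched roughly as below. This is only an illustrative Python
sketch, not Metron code: the function name truncate_fields and the metadata
key suffixes are hypothetical, and it assumes the 32766 limit applies to the
UTF-8 encoded bytes of the field value.

```python
import hashlib

# Limit cited in the thread; assumed here to be measured in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def truncate_fields(message, limit=MAX_TERM_BYTES):
    """Return a copy of message with oversized string fields truncated.

    For each truncated field, hypothetical metadata keys are added so the
    full value can later be located in the raw store (for example by
    timestamp) and verified against the hash.
    """
    out = dict(message)
    for key, value in message.items():
        if not isinstance(value, str):
            continue  # only string fields are considered
        raw = value.encode("utf-8")
        if len(raw) <= limit:
            continue  # small enough to index as-is
        # Cut at the byte limit; drop any partial trailing UTF-8 sequence.
        out[key] = raw[:limit].decode("utf-8", errors="ignore")
        out[key + "_truncated"] = True
        out[key + "_original_length"] = len(raw)
        out[key + "_sha256"] = hashlib.sha256(raw).hexdigest()
    return out


if __name__ == "__main__":
    msg = {"timestamp": 1478100000000, "ip_src_addr": "10.0.0.1",
           "uri": "a" * 40000}
    indexed = truncate_fields(msg)
    print(len(indexed["uri"]), indexed["uri_original_length"])
```

The untruncated message would still be written to HDFS unchanged; only the
copy sent to the index is modified, which matches the split between short
term mitigation and long term fix discussed in the thread.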
