Re: [DISCUSS] Next Release Name

2016-11-09 Thread Otto Fowler
+ 1


On November 9, 2016 at 17:15:16, James Sirota (jsir...@apache.org) wrote:

Guys,

You know, looking at the release I think the changes were significant
enough due to the storm & kafka upgrade to justify moving it to a non-point
release. Generally point releases are reserved for patches or maintenance
releases. I think this release is more than just a maintenance release. I
suggest we consider 0.3.0

04.11.2016, 18:27, "Kyle Richardson":
> I'm a little late to the party but thought I would go ahead and throw my
> two cents into the mix.
>
> I share the concern around an upgrade / migration path. While I would love
> to see the BETA dropped sooner than later, to me, this is a game changer
> for people implementing Metron. I think there is a silent expectation of no
> data loss after dropping the BETA tag.
>
> Even if there is not a direct upgrade path for a few releases, is there
> documentation that we could provide to ensure a data migration path for
> users? I'm not thinking anything automated just some instructions on what
> to do.
>
> -Kyle
>
> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella wrote:
>
>>  Jon,
>>
>>  Thank you for your thoughts; they are appreciated and you should keep them
>>  coming. This kind of discussion is exactly why I sent out this thread. I
>>  think it's safe to say that the entire community shares your desire for
>>  Metron to be as easy to use as possible and a "data analysis platform for
>>  the masses." We should hold ourselves to a high standard, no doubt.
>>
>>  Casey
>>
>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com wrote:
>>
>>  > Please understand that my points mostly relate to perception and ease of
>>  > use, not what's technically possible or available. I'm coming at this as
>>  > Metron should be a data analysis platform for the masses.
>>  >
>>  > METRON-517/542 - While I'm willing to let this one go, it depends on your
>>  > definition of non-issue. I personally believe that data (in every location
>>  > that it exists) needs to be obvious and have ultra high integrity. I'm not
>>  > concerned that the correct data won't exist somewhere in the cluster; I'm
>>  > focusing on it being easily accessible by an operations team that may
>>  > consist of entry level analysts. Once 517 is done and merged I would
>>  > consider that a short term mitigation is in place.
>>  >
>>  > I feel like the project should stick to certain principles and a suggestion
>>  > is that data access is easy, accurate, and obvious. Do we have anything
>>  > like this that was agreed upon, discussed, or documented? Probably a
>>  > discussion for a different thread.
>>  >
>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
>>  > resolving them would give a better first impression (assuming that people
>>  > monitoring the project will start using it more once it's non-BETA
>>  > software). First impressions are big in my book and could affect initial
>>  > adoption.
>>  >
>>  > Regarding 485 - Otto may be able to clarify but I thought somebody else saw
>>  > this issue as well. I think the finger is currently being pointed at monit
>>  > timeouts and not storm. It also doesn't happen every single time; I only
>>  > run into it while the cluster is under load and after dozens of topology
>>  > restarts that I do when tuning parallelism in storm. I'm going to be
>>  > updating to storm 1.0.x in order to see if this still exists. Again, this
>>  > relates to ease of use/load testing/tuning.
>>  >
>>  > Agree with the upgrade comments - as long as it's supported at some defined
>>  > point (IMHO this is when a project leaves BETA but others are welcome to
>>  > disagree).
>>  >
>>  > Finally, I know this doesn't come across well in email but I'm just
>>  > mentioning items which I think are important, not attempting to demand that
>>  > they be fixed or that this doesn't leave beta. Thanks,
>>  >
>>  > Jon
>>  >
>>  > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
>>  >
>>  >
>>  > Hi Jon,
>>  >
>>  > Here are my thoughts around your objections.
>>  >
>>  > METRON-517/METRON-542
>>  >
>>  > I think the mechanism currently exists within Metron to make this a
>>  > non-issue. I believe you can solve it with a combination of a Stellar
>>  > statement and ES templates. As you mentioned, we can truncate the string
>>  > and then include the relevant metadata in the message (original length,
>>  > hash, etc). Cramming really long strings into ES is generally a bad thing,
>>  > which is why this limitation exists. The metadata in the indexed message
>>  > along with the timestamp allows you to pull data from HDFS should you need
>>  > to recover the full string.
>>  >
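A minimal sketch of the truncate-and-annotate idea described above, written in Python rather than as a Stellar statement. The function name, the `_original_length` and `_sha256` key names, and the 1024-character default are assumptions made for illustration only; they are not Metron, Stellar, or Elasticsearch APIs.

```python
import hashlib


def truncate_with_metadata(message, field, max_len=1024):
    """Return a copy of `message` with `field` truncated to `max_len` characters.

    When truncation happens, two hypothetical metadata keys are added:
    the original length and a SHA-256 hash of the full value, so the
    complete string can later be located and verified in long-term storage.
    """
    value = message.get(field)
    # Leave non-string or short values untouched.
    if not isinstance(value, str) or len(value) <= max_len:
        return dict(message)

    result = dict(message)
    result[field] = value[:max_len]
    result[field + "_original_length"] = len(value)
    result[field + "_sha256"] = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return result
```

With the indexed message carrying the timestamp, original length, and hash, someone who needs the full value could look up the matching record in HDFS and confirm it by recomputing the hash.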
>>  > METRON-485
>>  >
>>  > We cannot replicate this issue in our environment, but if this is indeed an
>>  > issue this is an issue with Storm. A Jira should 

Re: [DISCUSS] Next Release Name

2016-11-09 Thread Kyle Richardson
Makes sense to me. +1

-Kyle

> On Nov 9, 2016, at 5:50 PM, Nick Allen  wrote:
> 
> Me likey.  +1
> 
>> On Wed, Nov 9, 2016 at 5:15 PM, James Sirota  wrote:
>> 
>> Guys,
>> 
>> You know, looking at the release I think the changes were significant
>> enough due to the storm & kafka upgrade to justify moving it to a non-point
>> release.  Generally point releases are reserved for patches or maintenance
>> releases.  I think this release is more than just a maintenance release.  I
>> suggest we consider 0.3.0
>> 

Re: [DISCUSS] Next Release Name

2016-11-09 Thread Nick Allen
Me likey.  +1

On Wed, Nov 9, 2016 at 5:15 PM, James Sirota  wrote:

> Guys,
>
> You know, looking at the release I think the changes were significant
> enough due to the storm & kafka upgrade to justify moving it to a non-point
> release.  Generally point releases are reserved for patches or maintenance
> releases.  I think this release is more than just a maintenance release.  I
> suggest we consider 0.3.0
>

Re: [DISCUSS] Next Release Name

2016-11-09 Thread Ryan Merriman
+1

On Wed, Nov 9, 2016 at 4:30 PM, Casey Stella  wrote:

> Agreed, +1 to 0.3.0
>
> On Wed, Nov 9, 2016 at 5:28 PM, zeo...@gmail.com  wrote:
>
> > That sounds very reasonable to me.
> >
> > Jon
> >
> > On Wed, Nov 9, 2016, 17:15 James Sirota  wrote:
> >
> > Guys,
> >
> > You know, looking at the release I think the changes were significant
> > enough due to the storm & kafka upgrade to justify moving it to a non-point
> > release.  Generally point releases are reserved for patches or maintenance
> > releases.  I think this release is more than just a maintenance release.  I
> > suggest we consider 0.3.0
> >

Re: [DISCUSS] Next Release Name

2016-11-09 Thread Casey Stella
Agreed, +1 to 0.3.0

On Wed, Nov 9, 2016 at 5:28 PM, zeo...@gmail.com  wrote:

> That sounds very reasonable to me.
>
> Jon
>
> On Wed, Nov 9, 2016, 17:15 James Sirota  wrote:
>
> Guys,
>
> You know, looking at the release I think the changes were significant
> enough due to the storm & kafka upgrade to justify moving it to a non-point
> release.  Generally point releases are reserved for patches or maintenance
> releases.  I think this release is more than just a maintenance release.  I
> suggest we consider 0.3.0
>

Re: [DISCUSS] Next Release Name

2016-11-09 Thread zeo...@gmail.com
That sounds very reasonable to me.

Jon

On Wed, Nov 9, 2016, 17:15 James Sirota  wrote:

Guys,

You know, looking at the release I think the changes were significant
enough due to the storm & kafka upgrade to justify moving it to a non-point
release.  Generally point releases are reserved for patches or maintenance
releases.  I think this release is more than just a maintenance release.  I
suggest we consider 0.3.0


Re: [DISCUSS] Next Release Name

2016-11-09 Thread James Sirota
Guys,

You know, looking at the release I think the changes were significant enough 
due to the storm & kafka upgrade to justify moving it to a non-point release.  
Generally point releases are reserved for patches or maintenance releases.  I 
think this release is more than just a maintenance release.  I suggest we 
consider 0.3.0

04.11.2016, 18:27, "Kyle Richardson" :
> I'm a little late to the party but thought I would go ahead and throw my
> two cents into the mix.
>
> I share the concern around an upgrade / migration path. While I would love
> to see the BETA dropped sooner than later, to me, this is a game changer
> for people implementing Metron. I think there is a silent expectation of no
> data loss after dropping the BETA tag.
>
> Even if there is not a direct upgrade path for a few releases, is there
> documentation that we could provide to ensure a data migration path for
> users? I'm not thinking anything automated just some instructions on what
> to do.
>
> -Kyle
>
> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella  wrote:
>
>>  Jon,
>>
>>  Thank you for your thoughts; they are appreciated and you should keep them
>>  coming. This kind of discussion is exactly why I sent out this thread. I
>>  think it's safe to say that the entire community shares your desire for
>>  Metron to be as easy to use as possible and a "data analysis platform for
>>  the masses." We should hold ourselves to a high standard, no doubt.
>>
>>  Casey
>>
>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  wrote:
>>
>>  > Please understand that my points mostly relate to perception and ease of
>>  > use, not what's technically possible or available. I'm coming at this as
>>  > Metron should be a data analysis platform for the masses.
>>  >
>>  > METRON-517/542 - While I'm willing to let this one go it depends on your
>>  > definition of non-issue. I personally believe that data (in every
>>  location
>>  > that it exists) needs to be obvious and have ultra high integrity. I'm
>>  not
>>  > concerned that the correct data won't exist somewhere in the cluster, I'm
>>  > focusing on it being easily accessible by an operations team that may
>>  > consist of entry level analysts. Once 517 is done and merged I would
>>  > consider that a short term mitigation is in place.
>>  >
>>  > I feel like the project should stick to certain principles and a
>>  suggestion
>>  > is that data access is easy, accurate, and obvious. Do we have anything
>>  > like this that was agreed upon, discussed, or documented? Probably a
>>  > discussion for a different thread.
>>  >
>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
>>  > resolving them would give a better first impression (assuming that people
>>  > monitoring the project will start using it more once it's non-BETA
>>  > software). First impressions are big in my book and could affect initial
>>  > adoption.
>>  >
>>  > Regarding 485 - Otto may be able to clarify but I thought somebody else
>>  saw
>>  > this issue as well. I think the finger is currently being pointed at
>>  monit
>>  > timeouts and not storm. It also doesn't happen every single time, I only
>>  > run into it while the cluster is under load and after dozens of topology
>>  > restarts that I do when tuning parallelism in storm. I'm going to be
>>  > updating to storm 1.0.x in order to see if this still exists. Again,
>>  this
>>  > relates to ease of use/load testing/tuning.
>>  >
>>  > Agree with the upgrade comments - as long as it's supported at some
>>  defined
>>  > point (IMHO this is when a project leaves BETA but others are welcome to
>>  > disagree).
>>  >
>>  > Finally, I know this doesn't come across well in email but I'm just
>>  > mentioning items which I think are important, not attempting to demand
>>  that
>>  > they be fixed or that this doesn't leave beta. Thanks,
>>  >
>>  > Jon
>>  >
>>  > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
>>  >
>>  >
>>  > Hi Jon,
>>  >
>>  > Here are my thoughts around your objections.
>>  >
>>  > METRON-517/METRON-542
>>  >
>>  > I think the mechanism currently exists within Metron to make this a
>>  > non-issue. I believe you can solve it with a combination of a Stellar
>>  > statement and ES templates. As you mentioned, we can truncate the string
>>  > and then include the relevant meta data in the message (original length,
>>  > hash, etc). Cramming really long strings into ES is generally a bad
>>  thing,
>>  > which is why this limitation exists. The metadata in the indexed
>>  message
>>  > along with the timestamp allows you to pull data from HDFS should you
>>  need
>>  > to recover the full string.
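
The truncate-and-annotate idea described above can be sketched in plain Python. This is only an illustration of the technique, not Stellar and not Metron code: the function name, the `_original_length` and `_sha256` field suffixes, and the default `max_len` are all assumptions made for the example.

```python
import hashlib


def truncate_with_metadata(message, field, max_len=1024):
    """Return a copy of message with message[field] cut to max_len characters.

    When truncation happens, the copy also records the original length and a
    SHA-256 hash of the full value, so the full string can later be located
    and verified in long-term storage. Field suffixes are hypothetical.
    """
    value = message[field]
    result = dict(message)  # leave the caller's message untouched
    if len(value) <= max_len:
        return result
    result[field] = value[:max_len]
    result[field + "_original_length"] = len(value)
    result[field + "_sha256"] = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return result
```

With the hash and the timestamp in the indexed record, an analyst has something concrete to match against when pulling the untruncated value from the archive.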
>>  >
>>  > METRON-485
>>  >
>>  > We cannot replicate this issue in our environment, but if this is indeed
>>  an
>>  > issue this is an issue with Storm. A Jira should be filed against Storm
>>  > and not against Metron. My hunch, though, is 

Re: [DISCUSS] Next Release Name

2016-11-05 Thread zeo...@gmail.com
Agreed.  I could also contribute to that doc.

On Sat, Nov 5, 2016, 11:41 Kyle Richardson 
wrote:

> Thanks, James. Very helpful information. Based on that, I agree the path is
> there and I have no issues with it being manual at this point. I would
> suggest we add a simple UPGRADING.md outlining the steps you have with a
> little more detail to make it easy for the user. I'd be happy to take this
> on if folks agree it would be useful.
>
> -Kyle
>
> On Sat, Nov 5, 2016 at 7:56 AM, Casey Stella  wrote:
>
> > I agree. I think the upgrade path is clear, however manual, right now. Going
> > forward we will need to prioritize making it more automated, but I think
> > the path is there.
> >
> > On Sat, Nov 5, 2016 at 00:26 James Sirota  wrote:
> >
> > > Hi Kyle,
> > >
> > > The HDP upgrade guide can be found here:
> > >
> > > https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/
> > bk_command-line-upgrade/content/ch_upgrade_2_4.html
> > >
> > > After executing these instructions you get to HDP 2.5 with no data
> loss.
> > > After that, upgrading Metron is as simple as saving the old configs, ES
> > > templates, grok statements from HDFS, and NiFi flows from your 0.2.1
> > build,
> > > installing 0.2.2 (via Ambari management pack), and putting the configs
> > back
> > > into zookeeper, copying the ES templates and Grok files back, and
> > > restarting your NiFi flows.  I agree that we should automate most of
> this
> > > eventually, and we will, but I don't think this is necessarily a show
> > > stopper for dropping BETA.  Would you agree?
> > >
> > > Thanks,
> > > James
> > >
> > > 04.11.2016, 18:27, "Kyle Richardson" :
> > > > I'm a little late to the party but thought I would go ahead and throw
> > my
> > > > two cents into the mix.
> > > >
> > > > I share the concern around an upgrade / migration path. While I would
> > > love
> > > > to see the BETA dropped sooner than later, to me, this is a game
> > changer
> > > > for people implementing Metron. I think there is a silent expectation
> > of
> > > no
> > > > data loss after dropping the BETA tag.
> > > >
> > > > Even if there is not a direct upgrade path for a few releases, is
> there
> > > > documentation that we could provide to ensure a data migration path
> for
> > > > users? I'm not thinking anything automated just some instructions on
> > what
> > > > to do.
> > > >
> > > > -Kyle
> > > >
> > > > On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella 
> > wrote:
> > > >
> > > >>  Jon,
> > > >>
> > > >>  Thank you for your thoughts; they are appreciated and you should
> keep
> > > them
> > > >>  coming. This kind of discussion is exactly why I sent out this
> > thread.
> > > I
> > > >>  think it's safe to say that the entire community shares your desire
> > for
> > > >>  Metron to be as easy to use as possible and a "data analysis
> platform
> > > for
> > > >>  the masses." We should hold ourselves to a high standard, no doubt.
> > > >>
> > > >>  Casey
> > > >>
> > > >>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  >
> > > wrote:
> > > >>
> > > >>  > Please understand that my points mostly relate to perception and
> > > ease of
> > > >>  > use, not what's technically possible or available. I'm coming at
> > > this as
> > > >>  > Metron should be a data analysis platform for the masses.
> > > >>  >
> > > >>  > METRON-517/542 - While I'm willing to let this one go it depends
> on
> > > your
> > > >>  > definition of non-issue. I personally believe that data (in every
> > > >>  location
> > > >>  > that it exists) needs to be obvious and have ultra high
> integrity.
> > > I'm
> > > >>  not
> > > >>  > concerned that the correct data won't exist somewhere in the
> > > cluster, I'm
> > > >>  > focusing on it being easily accessible by an operations team that
> > may
> > > >>  > consist of entry level analysts. Once 517 is done and merged I
> > would
> > > >>  > consider that a short term mitigation is in place.
> > > >>  >
> > > >>  > I feel like the project should stick to certain principles and a
> > > >>  suggestion
> > > >>  > is that data access is easy, accurate, and obvious. Do we have
> > > anything
> > > >>  > like this that was agreed upon, discussed, or documented?
> Probably
> > a
> > > >>  > discussion for a different thread.
> > > >>  >
> > > >>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> > > >>  > resolving them would give a better first impression (assuming that people
> > > >>  > monitoring the project will start using it more once it's non-BETA
> > > >>  > software). First impressions are big in my book and could affect initial
> > > >>  > adoption.
> > > >>  >
> > > >>  > Regarding 485 - Otto may be able to clarify but I thought
> somebody
> > > else
> > > >>  saw
> > > >>  > this issue as well. I think the finger is currently being pointed

Re: [DISCUSS] Next Release Name

2016-11-05 Thread Kyle Richardson
Thanks, James. Very helpful information. Based on that, I agree the path is
there and I have no issues with it being manual at this point. I would
suggest we add a simple UPGRADING.md outlining the steps you have with a
little more detail to make it easy for the user. I'd be happy to take this
on if folks agree it would be useful.

-Kyle

On Sat, Nov 5, 2016 at 7:56 AM, Casey Stella  wrote:

> I agree. I think the upgrade path is clear, however manual, right now. Going
> forward we will need to prioritize making it more automated, but I think
> the path is there.
>
> On Sat, Nov 5, 2016 at 00:26 James Sirota  wrote:
>
> > Hi Kyle,
> >
> > The HDP upgrade guide can be found here:
> >
> > https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/
> bk_command-line-upgrade/content/ch_upgrade_2_4.html
> >
> > After executing these instructions you get to HDP 2.5 with no data loss.
> > After that, upgrading Metron is as simple as saving the old configs, ES
> > templates, grok statements from HDFS, and NiFi flows from your 0.2.1
> build,
> > installing 0.2.2 (via Ambari management pack), and putting the configs
> back
> > into zookeeper, copying the ES templates and Grok files back, and
> > restarting your NiFi flows.  I agree that we should automate most of this
> > eventually, and we will, but I don't think this is necessarily a show
> > stopper for dropping BETA.  Would you agree?
> >
> > Thanks,
> > James
> >
> > 04.11.2016, 18:27, "Kyle Richardson" :
> > > I'm a little late to the party but thought I would go ahead and throw
> my
> > > two cents into the mix.
> > >
> > > I share the concern around an upgrade / migration path. While I would
> > love
> > > to see the BETA dropped sooner than later, to me, this is a game
> changer
> > > for people implementing Metron. I think there is a silent expectation
> of
> > no
> > > data loss after dropping the BETA tag.
> > >
> > > Even if there is not a direct upgrade path for a few releases, is there
> > > documentation that we could provide to ensure a data migration path for
> > > users? I'm not thinking anything automated just some instructions on
> what
> > > to do.
> > >
> > > -Kyle
> > >
> > > On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella 
> wrote:
> > >
> > >>  Jon,
> > >>
> > >>  Thank you for your thoughts; they are appreciated and you should keep
> > them
> > >>  coming. This kind of discussion is exactly why I sent out this
> thread.
> > I
> > >>  think it's safe to say that the entire community shares your desire
> for
> > >>  Metron to be as easy to use as possible and a "data analysis platform
> > for
> > >>  the masses." We should hold ourselves to a high standard, no doubt.
> > >>
> > >>  Casey
> > >>
> > >>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com 
> > wrote:
> > >>
> > >>  > Please understand that my points mostly relate to perception and
> > ease of
> > >>  > use, not what's technically possible or available. I'm coming at
> > this as
> > >>  > Metron should be a data analysis platform for the masses.
> > >>  >
> > >>  > METRON-517/542 - While I'm willing to let this one go it depends on
> > your
> > >>  > definition of non-issue. I personally believe that data (in every
> > >>  location
> > >>  > that it exists) needs to be obvious and have ultra high integrity.
> > I'm
> > >>  not
> > >>  > concerned that the correct data won't exist somewhere in the
> > cluster, I'm
> > >>  > focusing on it being easily accessible by an operations team that
> may
> > >>  > consist of entry level analysts. Once 517 is done and merged I
> would
> > >>  > consider that a short term mitigation is in place.
> > >>  >
> > >>  > I feel like the project should stick to certain principles and a
> > >>  suggestion
> > >>  > is that data access is easy, accurate, and obvious. Do we have
> > anything
> > >>  > like this that was agreed upon, discussed, or documented? Probably
> a
> > >>  > discussion for a different thread.
> > >>  >
> > >>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> > >>  > resolving them would give a better first impression (assuming that people
> > >>  > monitoring the project will start using it more once it's non-BETA
> > >>  > software). First impressions are big in my book and could affect initial
> > >>  > adoption.
> > >>  >
> > >>  > Regarding 485 - Otto may be able to clarify but I thought somebody
> > else
> > >>  saw
> > >>  > this issue as well. I think the finger is currently being pointed
> at
> > >>  monit
> > >>  > timeouts and not storm. It also doesn't happen every single time, I
> > only
> > >>  > run into it while the cluster is under load and after dozens of
> > topology
> > >>  > restarts that I do when tuning parallelism in storm. I'm going to
> be
> > >>  > updating to storm 1.0.x in order to see if this still exists.
> Again,
> > >>  this
> > >>  > relates to ease of 

Re: [DISCUSS] Next Release Name

2016-11-05 Thread Casey Stella
I agree. I think the upgrade path is clear, however manual, right now. Going
forward we will need to prioritize making it more automated, but I think
the path is there.

On Sat, Nov 5, 2016 at 00:26 James Sirota  wrote:

> Hi Kyle,
>
> The HDP upgrade guide can be found here:
>
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html
>
> After executing these instructions you get to HDP 2.5 with no data loss.
> After that, upgrading Metron is as simple as saving the old configs, ES
> templates, grok statements from HDFS, and NiFi flows from your 0.2.1 build,
> installing 0.2.2 (via Ambari management pack), and putting the configs back
> into zookeeper, copying the ES templates and Grok files back, and
> restarting your NiFi flows.  I agree that we should automate most of this
> eventually, and we will, but I don't think this is necessarily a show
> stopper for dropping BETA.  Would you agree?
>
> Thanks,
> James
>
> 04.11.2016, 18:27, "Kyle Richardson" :
> > I'm a little late to the party but thought I would go ahead and throw my
> > two cents into the mix.
> >
> > I share the concern around an upgrade / migration path. While I would
> love
> > to see the BETA dropped sooner than later, to me, this is a game changer
> > for people implementing Metron. I think there is a silent expectation of
> no
> > data loss after dropping the BETA tag.
> >
> > Even if there is not a direct upgrade path for a few releases, is there
> > documentation that we could provide to ensure a data migration path for
> > users? I'm not thinking anything automated just some instructions on what
> > to do.
> >
> > -Kyle
> >
> > On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella  wrote:
> >
> >>  Jon,
> >>
> >>  Thank you for your thoughts; they are appreciated and you should keep
> them
> >>  coming. This kind of discussion is exactly why I sent out this thread.
> I
> >>  think it's safe to say that the entire community shares your desire for
> >>  Metron to be as easy to use as possible and a "data analysis platform
> for
> >>  the masses." We should hold ourselves to a high standard, no doubt.
> >>
> >>  Casey
> >>
> >>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com 
> wrote:
> >>
> >>  > Please understand that my points mostly relate to perception and
> ease of
> >>  > use, not what's technically possible or available. I'm coming at
> this as
> >>  > Metron should be a data analysis platform for the masses.
> >>  >
> >>  > METRON-517/542 - While I'm willing to let this one go it depends on
> your
> >>  > definition of non-issue. I personally believe that data (in every
> >>  location
> >>  > that it exists) needs to be obvious and have ultra high integrity.
> I'm
> >>  not
> >>  > concerned that the correct data won't exist somewhere in the
> cluster, I'm
> >>  > focusing on it being easily accessible by an operations team that may
> >>  > consist of entry level analysts. Once 517 is done and merged I would
> >>  > consider that a short term mitigation is in place.
> >>  >
> >>  > I feel like the project should stick to certain principles and a
> >>  suggestion
> >>  > is that data access is easy, accurate, and obvious. Do we have
> anything
> >>  > like this that was agreed upon, discussed, or documented? Probably a
> >>  > discussion for a different thread.
> >>  >
> >>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> >>  > resolving them would give a better first impression (assuming that people
> >>  > monitoring the project will start using it more once it's non-BETA
> >>  > software). First impressions are big in my book and could affect initial
> >>  > adoption.
> >>  >
> >>  > Regarding 485 - Otto may be able to clarify but I thought somebody
> else
> >>  saw
> >>  > this issue as well. I think the finger is currently being pointed at
> >>  monit
> >>  > timeouts and not storm. It also doesn't happen every single time, I
> only
> >>  > run into it while the cluster is under load and after dozens of
> topology
> >>  > restarts that I do when tuning parallelism in storm. I'm going to be
> >>  > updating to storm 1.0.x in order to see if this still exists. Again,
> >>  this
> >>  > relates to ease of use/load testing/tuning.
> >>  >
> >>  > Agree with the upgrade comments - as long as it's supported at some
> >>  defined
> >>  > point (IMHO this is when a project leaves BETA but others are
> welcome to
> >>  > disagree).
> >>  >
> >>  > Finally, I know this doesn't come across well in email but I'm just
> >>  > mentioning items which I think are important, not attempting to
> demand
> >>  that
> >>  > they be fixed or that this doesn't leave beta. Thanks,
> >>  >
> >>  > Jon
> >>  >
> >>  > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
> >>  >
> >>  >
> >>  > Hi Jon,
> >>  >
> >>  > Here are my thoughts around your objections.
> 

Re: [DISCUSS] Next Release Name

2016-11-05 Thread Casey Stella
Metron builds against Apache artifacts by default (Storm 1.x, HBase 1.x,
Kafka 0.10), so the bits can run on other Hadoop installations that conform
to those versions, but our Ansible deployment uses HDP 2.5 as its base Hadoop
distribution. What James meant was that the upgrade instructions for Metron
start with the Hadoop distribution's upgrade instructions.

On Sat, Nov 5, 2016 at 00:53 Dima Kovalyov  wrote:

> Hello James,
>
> Does that mean Metron 0.2.2 goes with HDP 2.5 by default?
>
> - Dima
>
> On 11/05/2016 06:26 AM, James Sirota wrote:
> > Hi Kyle,
> >
> > The HDP upgrade guide can be found here:
> >
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html
> >
> > After executing these instructions you get to HDP 2.5 with no data
> loss.  After that, upgrading Metron is as simple as saving the old configs,
> ES templates, grok statements from HDFS, and NiFi flows from your 0.2.1
> build, installing 0.2.2 (via Ambari management pack), and putting the
> configs back into zookeeper, copying the ES templates and Grok files back,
> and restarting your NiFi flows.  I agree that we should automate most of
> this eventually, and we will, but I don't think this is necessarily a show
> stopper for dropping BETA.  Would you agree?
> >
> > Thanks,
> > James
> >
> > 04.11.2016, 18:27, "Kyle Richardson" :
> >> I'm a little late to the party but thought I would go ahead and throw my
> >> two cents into the mix.
> >>
> >> I share the concern around an upgrade / migration path. While I would
> love
> >> to see the BETA dropped sooner than later, to me, this is a game changer
> >> for people implementing Metron. I think there is a silent expectation
> of no
> >> data loss after dropping the BETA tag.
> >>
> >> Even if there is not a direct upgrade path for a few releases, is there
> >> documentation that we could provide to ensure a data migration path for
> >> users? I'm not thinking anything automated just some instructions on
> what
> >> to do.
> >>
> >> -Kyle
> >>
> >> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella 
> wrote:
> >>
> >>>  Jon,
> >>>
> >>>  Thank you for your thoughts; they are appreciated and you should keep
> them
> >>>  coming. This kind of discussion is exactly why I sent out this
> thread. I
> >>>  think it's safe to say that the entire community shares your desire
> for
> >>>  Metron to be as easy to use as possible and a "data analysis platform
> for
> >>>  the masses." We should hold ourselves to a high standard, no doubt.
> >>>
> >>>  Casey
> >>>
> >>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com 
> wrote:
> >>>
> >>>  > Please understand that my points mostly relate to perception and
> ease of
> >>>  > use, not what's technically possible or available. I'm coming at
> this as
> >>>  > Metron should be a data analysis platform for the masses.
> >>>  >
> >>>  > METRON-517/542 - While I'm willing to let this one go it depends on
> your
> >>>  > definition of non-issue. I personally believe that data (in every
> >>>  location
> >>>  > that it exists) needs to be obvious and have ultra high integrity.
> I'm
> >>>  not
> >>>  > concerned that the correct data won't exist somewhere in the
> cluster, I'm
> >>>  > focusing on it being easily accessible by an operations team that
> may
> >>>  > consist of entry level analysts. Once 517 is done and merged I would
> >>>  > consider that a short term mitigation is in place.
> >>>  >
> >>>  > I feel like the project should stick to certain principles and a
> >>>  suggestion
> >>>  > is that data access is easy, accurate, and obvious. Do we have
> anything
> >>>  > like this that was agreed upon, discussed, or documented? Probably a
> >>>  > discussion for a different thread.
> >>>  >
> >>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> >>>  > resolving them would give a better first impression (assuming that people
> >>>  > monitoring the project will start using it more once it's non-BETA
> >>>  > software). First impressions are big in my book and could affect initial
> >>>  > adoption.
> >>>  >
> >>>  > Regarding 485 - Otto may be able to clarify but I thought somebody
> else
> >>>  saw
> >>>  > this issue as well. I think the finger is currently being pointed at
> >>>  monit
> >>>  > timeouts and not storm. It also doesn't happen every single time, I
> only
> >>>  > run into it while the cluster is under load and after dozens of
> topology
> >>>  > restarts that I do when tuning parallelism in storm. I'm going to be
> >>>  > updating to storm 1.0.x in order to see if this still exists. Again,
> >>>  this
> >>>  > relates to ease of use/load testing/tuning.
> >>>  >
> >>>  > Agree with the upgrade comments - as long as it's supported at some
> >>>  defined
> >>>  > point (IMHO this is when a project leaves BETA but others are
> welcome to
> >>>  > disagree).
> >>>  >
> >>>  > Finally, I know 

Re: [DISCUSS] Next Release Name

2016-11-04 Thread Dima Kovalyov
Hello James,

Does that mean Metron 0.2.2 goes with HDP 2.5 by default?

- Dima

On 11/05/2016 06:26 AM, James Sirota wrote:
> Hi Kyle,
>
> The HDP upgrade guide can be found here:
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html
>
> After executing these instructions you get to HDP 2.5 with no data loss.  
> After that, upgrading Metron is as simple as saving the old configs, ES 
> templates, grok statements from HDFS, and NiFi flows from your 0.2.1 build, 
> installing 0.2.2 (via Ambari management pack), and putting the configs back 
> into zookeeper, copying the ES templates and Grok files back, and restarting 
> your NiFi flows.  I agree that we should automate most of this eventually, 
> and we will, but I don't think this is necessarily a show stopper for 
> dropping BETA.  Would you agree?
>
> Thanks,
> James 
>
> 04.11.2016, 18:27, "Kyle Richardson" :
>> I'm a little late to the party but thought I would go ahead and throw my
>> two cents into the mix.
>>
>> I share the concern around an upgrade / migration path. While I would love
>> to see the BETA dropped sooner than later, to me, this is a game changer
>> for people implementing Metron. I think there is a silent expectation of no
>> data loss after dropping the BETA tag.
>>
>> Even if there is not a direct upgrade path for a few releases, is there
>> documentation that we could provide to ensure a data migration path for
>> users? I'm not thinking anything automated just some instructions on what
>> to do.
>>
>> -Kyle
>>
>> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella  wrote:
>>
>>>  Jon,
>>>
>>>  Thank you for your thoughts; they are appreciated and you should keep them
>>>  coming. This kind of discussion is exactly why I sent out this thread. I
>>>  think it's safe to say that the entire community shares your desire for
>>>  Metron to be as easy to use as possible and a "data analysis platform for
>>>  the masses." We should hold ourselves to a high standard, no doubt.
>>>
>>>  Casey
>>>
>>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  wrote:
>>>
>>>  > Please understand that my points mostly relate to perception and ease of
>>>  > use, not what's technically possible or available. I'm coming at this as
>>>  > Metron should be a data analysis platform for the masses.
>>>  >
>>>  > METRON-517/542 - While I'm willing to let this one go it depends on your
>>>  > definition of non-issue. I personally believe that data (in every
>>>  location
>>>  > that it exists) needs to be obvious and have ultra high integrity. I'm
>>>  not
>>>  > concerned that the correct data won't exist somewhere in the cluster, I'm
>>>  > focusing on it being easily accessible by an operations team that may
>>>  > consist of entry level analysts. Once 517 is done and merged I would
>>>  > consider that a short term mitigation is in place.
>>>  >
>>>  > I feel like the project should stick to certain principles and a
>>>  suggestion
>>>  > is that data access is easy, accurate, and obvious. Do we have anything
>>>  > like this that was agreed upon, discussed, or documented? Probably a
>>>  > discussion for a different thread.
>>>  >
>>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
>>>  > resolving them would give a better first impression (assuming that people
>>>  > monitoring the project will start using it more once it's non-BETA
>>>  > software). First impressions are big in my book and could affect initial
>>>  > adoption.
>>>  >
>>>  > Regarding 485 - Otto may be able to clarify but I thought somebody else
>>>  saw
>>>  > this issue as well. I think the finger is currently being pointed at
>>>  monit
>>>  > timeouts and not storm. It also doesn't happen every single time, I only
>>>  > run into it while the cluster is under load and after dozens of topology
>>>  > restarts that I do when tuning parallelism in storm. I'm going to be
>>>  > updating to storm 1.0.x in order to see if this still exists. Again,
>>>  this
>>>  > relates to ease of use/load testing/tuning.
>>>  >
>>>  > Agree with the upgrade comments - as long as it's supported at some
>>>  defined
>>>  > point (IMHO this is when a project leaves BETA but others are welcome to
>>>  > disagree).
>>>  >
>>>  > Finally, I know this doesn't come across well in email but I'm just
>>>  > mentioning items which I think are important, not attempting to demand
>>>  that
>>>  > they be fixed or that this doesn't leave beta. Thanks,
>>>  >
>>>  > Jon
>>>  >
>>>  > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
>>>  >
>>>  >
>>>  > Hi Jon,
>>>  >
>>>  > Here are my thoughts around your objections.
>>>  >
>>>  > METRON-517/METRON-542
>>>  >
>>>  > I think the mechanism currently exists within Metron to make this a
>>>  > non-issue. I believe you can solve it with a combination of a Stellar
>>>  > statement and ES templates. As 

Re: [DISCUSS] Next Release Name

2016-11-04 Thread James Sirota
Hi Kyle,

The HDP upgrade guide can be found here:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-upgrade/content/ch_upgrade_2_4.html

After executing these instructions you get to HDP 2.5 with no data loss. After
that, upgrading Metron is as simple as saving the old configs, ES templates,
Grok statements from HDFS, and NiFi flows from your 0.2.1 build; installing
0.2.2 (via the Ambari management pack); and then putting the configs back into
ZooKeeper, copying the ES templates and Grok files back, and restarting your
NiFi flows. I agree that we should automate most of this eventually, and we
will, but I don't think this is necessarily a showstopper for dropping BETA.
Would you agree?
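
The save-then-restore ordering above could be scripted roughly as follows. This is a minimal sketch, not a Metron tool: it assumes the configs, ES templates, Grok files, and NiFi flows have already been exported to local directories by whatever means each system provides, and the directory names in `ARTIFACTS` are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical directory names. Each is assumed to be a local export taken
# from its real home (ZooKeeper, Elasticsearch, HDFS, NiFi) before upgrading.
ARTIFACTS = ("zookeeper_configs", "es_templates", "grok_patterns", "nifi_flows")


def save_artifacts(export_root, backup_root, artifacts=ARTIFACTS):
    """Copy every exported artifact directory into backup_root."""
    export_root, backup_root = Path(export_root), Path(backup_root)
    missing = [name for name in artifacts if not (export_root / name).is_dir()]
    if missing:
        # Stop before anything is installed if the backup would be incomplete.
        raise FileNotFoundError("missing exports: " + ", ".join(missing))
    for name in artifacts:
        shutil.copytree(export_root / name, backup_root / name)
    return list(artifacts)


def restore_artifacts(backup_root, restore_root, artifacts=ARTIFACTS):
    """Copy the saved artifacts into a staging directory after the new install."""
    backup_root, restore_root = Path(backup_root), Path(restore_root)
    for name in artifacts:
        shutil.copytree(backup_root / name, restore_root / name, dirs_exist_ok=True)
    return list(artifacts)
```

Run `save_artifacts` before installing the new version and `restore_artifacts` afterwards; loading the staged files back into ZooKeeper, Elasticsearch, HDFS, and NiFi is still a separate manual step per system.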

Thanks,
James 

04.11.2016, 18:27, "Kyle Richardson" :
> I'm a little late to the party but thought I would go ahead and throw my
> two cents into the mix.
>
> I share the concern around an upgrade / migration path. While I would love
> to see the BETA dropped sooner than later, to me, this is a game changer
> for people implementing Metron. I think there is a silent expectation of no
> data loss after dropping the BETA tag.
>
> Even if there is not a direct upgrade path for a few releases, is there
> documentation that we could provide to ensure a data migration path for
> users? I'm not thinking anything automated just some instructions on what
> to do.
>
> -Kyle
>
> On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella  wrote:
>
>>  Jon,
>>
>>  Thank you for your thoughts; they are appreciated and you should keep them
>>  coming. This kind of discussion is exactly why I sent out this thread. I
>>  think it's safe to say that the entire community shares your desire for
>>  Metron to be as easy to use as possible and a "data analysis platform for
>>  the masses." We should hold ourselves to a high standard, no doubt.
>>
>>  Casey
>>
>>  On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  wrote:
>>
>>  > Please understand that my points mostly relate to perception and ease of
>>  > use, not what's technically possible or available. I'm coming at this as
>>  > Metron should be a data analysis platform for the masses.
>>  >
>>  > METRON-517/542 - While I'm willing to let this one go it depends on your
>>  > definition of non-issue. I personally believe that data (in every
>>  location
>>  > that it exists) needs to be obvious and have ultra high integrity. I'm
>>  not
>>  > concerned that the correct data won't exist somewhere in the cluster, I'm
>>  > focusing on it being easily accessible by an operations team that may
>>  > consist of entry level analysts. Once 517 is done and merged I would
>>  > consider that a short term mitigation is in place.
>>  >
>>  > I feel like the project should stick to certain principles and a
>>  suggestion
>>  > is that data access is easy, accurate, and obvious. Do we have anything
>>  > like this that was agreed upon, discussed, or documented? Probably a
>>  > discussion for a different thread.
>>  >
>>  > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
>>  > resolving them would give a better first impression (assuming that people
>>  > monitoring the project will start using it more once it's non-BETA
>>  > software). First impressions are big in my book and could affect initial
>>  > adoption.
>>  >
>>  > Regarding 485 - Otto may be able to clarify but I thought somebody else
>>  saw
>>  > this issue as well. I think the finger is currently being pointed at
>>  monit
>>  > timeouts and not storm. It also doesn't happen every single time, I only
>>  > run into it while the cluster is under load and after dozens of topology
>>  > restarts that I do when tuning parallelism in storm. I'm going to be
>>  > updating to storm 1.0.x in order to see if this still exists. Again,
>>  this
>>  > relates to ease of use/load testing/tuning.
>>  >
>>  > Agree with the upgrade comments - as long as it's supported at some
>>  defined
>>  > point (IMHO this is when a project leaves BETA but others are welcome to
>>  > disagree).
>>  >
>>  > Finally, I know this doesn't come across well in email but I'm just
>>  > mentioning items which I think are important, not attempting to demand
>>  that
>>  > they be fixed or that this doesn't leave beta. Thanks,
>>  >
>>  > Jon
>>  >
>>  > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
>>  >
>>  >
>>  > Hi Jon,
>>  >
>>  > Here are my thoughts around your objections.
>>  >
>>  > METRON-517/METRON-542
>>  >
>>  > I think the mechanism currently exists within Metron to make this a
>>  > non-issue. I believe you can solve it with a combination of a Stellar
>>  > statement and ES templates. As you mentioned, we can truncate the string
>>  > and then include the relevant meta data in the message (original length,
>>  > hash, etc). Cramming really long strings into ES is generally a bad
>>  thing,
>>  > which is why this limitation exists. 

Re: [DISCUSS] Next Release Name

2016-11-04 Thread Kyle Richardson
I'm a little late to the party but thought I would go ahead and throw my
two cents into the mix.

I share the concern around an upgrade / migration path. While I would love
to see the BETA dropped sooner than later, to me, this is a game changer
for people implementing Metron. I think there is a silent expectation of no
data loss after dropping the BETA tag.

Even if there is not a direct upgrade path for a few releases, is there
documentation that we could provide to ensure a data migration path for
users? I'm not thinking anything automated just some instructions on what
to do.

-Kyle

On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella  wrote:

> Jon,
>
> Thank you for your thoughts; they are appreciated and you should keep them
> coming.  This kind of discussion is exactly why I sent out this thread.  I
> think it's safe to say that the entire community shares your desire for
> Metron to be as easy to use as possible and a "data analysis platform for
> the masses."  We should hold ourselves to a high standard, no doubt.
>
> Casey
>
> On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  wrote:
>
> > Please understand that my points mostly relate to perception and ease of
> > use, not what's technically possible or available.  I'm coming at this as
> > Metron should be a data analysis platform for the masses.
> >
> > METRON-517/542 - While I'm willing to let this one go it depends on your
> > definition of non-issue.  I personally believe that data (in every location
> > that it exists) needs to be obvious and have ultra high integrity.  I'm not
> > concerned that the correct data won't exist somewhere in the cluster, I'm
> > focusing on it being easily accessible by an operations team that may
> > consist of entry level analysts.  Once 517 is done and merged I would
> > consider that a short term mitigation is in place.
> >
> > I feel like the project should stick to certain principles and a suggestion
> > is that data access is easy, accurate, and obvious. Do we have anything
> > like this that was agreed upon, discussed, or documented? Probably a
> > discussion for a different thread.
> >
> > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> > resolving them would give a better first impression (assuming that people
> > monitoring the project will start using it more once it's non-BETA
> > software).  First impressions are big in my book and could affect initial
> > adoption.
> >
> > Regarding 485 - Otto may be able to clarify but I thought somebody else saw
> > this issue as well.  I think the finger is currently being pointed at monit
> > timeouts and not storm.  It also doesn't happen every single time, I only
> > run into it while the cluster is under load and after dozens of topology
> > restarts that I do when tuning parallelism in storm.  I'm going to be
> > updating to storm 1.0.x in order to see if this still exists.  Again, this
> > relates to ease of use/load testing/tuning.
> >
> > Agree with the upgrade comments - as long as it's supported at some defined
> > point (IMHO this is when a project leaves BETA but others are welcome to
> > disagree).
> >
> > Finally, I know this doesn't come across well in email but I'm just
> > mentioning items which I think are important, not attempting to demand that
> > they be fixed or that this doesn't leave beta.  Thanks,
> >
> > Jon
> >
> > On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
> >
> >
> > Hi Jon,
> >
> > Here are my thoughts around your objections.
> >
> > METRON-517/METRON-542
> >
> > I think the mechanism currently exists within Metron to make this a
> > non-issue.  I believe you can solve it with a combination of a Stellar
> > statement and ES templates.  As you mentioned, we can truncate the string
> > and then include the relevant meta data in the message (original length,
> > hash, etc).  Cramming really long strings into ES is generally a bad thing,
> > which is why this limitation exists.   The metadata in the indexed message
> > along with the timestamp allows you to pull data from HDFS should you need
> > to recover the full string.
> >
> > METRON-485
> >
> > We cannot replicate this issue in our environment, but if this is indeed an
> > issue this is an issue with Storm.  A Jira should be filed against Storm
> > and not against Metron.  My hunch, though, is that it's probably something
> > in your environment.  I just tried stopping all topologies on my AWS
> > cluster and then went to all Storm nodes and didn't see any workers left
> > behind.
> >
> > METRON-470
> >
> > I think this is mainly a consistency issue.  I don't think this impacts the
> > stability or function of the software.  I think this is a nice to have,
> > maybe in the next few releases, but I don't think we absolutely have to
> > have this to drop BETA
> >
> > With respect to upgrades, here are my thoughts.  There is really no way to
> > upgrade Metron

Re: [DISCUSS] Next Release Name

2016-11-04 Thread Casey Stella
Jon,

Thank you for your thoughts; they are appreciated and you should keep them
coming.  This kind of discussion is exactly why I sent out this thread.  I
think it's safe to say that the entire community shares your desire for
Metron to be as easy to use as possible and a "data analysis platform for
the masses."  We should hold ourselves to a high standard, no doubt.

Casey

On Fri, Nov 4, 2016 at 6:30 AM, zeo...@gmail.com  wrote:

> Please understand that my points mostly relate to perception and ease of
> use, not what's technically possible or available.  I'm coming at this as
> Metron should be a data analysis platform for the masses.
>
> METRON-517/542 - While I'm willing to let this one go it depends on your
> definition of non-issue.  I personally believe that data (in every location
> that it exists) needs to be obvious and have ultra high integrity.  I'm not
> concerned that the correct data won't exist somewhere in the cluster, I'm
> focusing on it being easily accessible by an operations team that may
> consist of entry level analysts.  Once 517 is done and merged I would
> consider that a short term mitigation is in place.
>
> I feel like the project should stick to certain principles and a suggestion
> is that data access is easy, accurate, and obvious. Do we have anything
> like this that was agreed upon, discussed, or documented? Probably a
> discussion for a different thread.
>
> METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> resolving them would give a better first impression (assuming that people
> monitoring the project will start using it more once it's non-BETA
> software).  First impressions are big in my book and could affect initial
> adoption.
>
> Regarding 485 - Otto may be able to clarify but I thought somebody else saw
> this issue as well.  I think the finger is currently being pointed at monit
> timeouts and not storm.  It also doesn't happen every single time, I only
> run into it while the cluster is under load and after dozens of topology
> restarts that I do when tuning parallelism in storm.  I'm going to be
> updating to storm 1.0.x in order to see if this still exists.  Again, this
> relates to ease of use/load testing/tuning.
>
> Agree with the upgrade comments - as long as it's supported at some defined
> point (IMHO this is when a project leaves BETA but others are welcome to
> disagree).
>
> Finally, I know this doesn't come across well in email but I'm just
> mentioning items which I think are important, not attempting to demand that
> they be fixed or that this doesn't leave beta.  Thanks,
>
> Jon
>
> On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:
>
>
> Hi Jon,
>
> Here are my thoughts around your objections.
>
> METRON-517/METRON-542
>
> I think the mechanism currently exists within Metron to make this a
> non-issue.  I believe you can solve it with a combination of a Stellar
> statement and ES templates.  As you mentioned, we can truncate the string
> and then include the relevant meta data in the message (original length,
> hash, etc).  Cramming really long strings into ES is generally a bad thing,
> which is why this limitation exists.   The metadata in the indexed message
> along with the timestamp allows you to pull data from HDFS should you need
> to recover the full string.
>
> METRON-485
>
> We cannot replicate this issue in our environment, but if this is indeed an
> issue this is an issue with Storm.  A Jira should be filed against Storm
> and not against Metron.  My hunch, though, is that it's probably something
> in your environment.  I just tried stopping all topologies on my AWS
> cluster and then went to all Storm nodes and didn't see any workers left
> behind.
>
> METRON-470
>
> I think this is mainly a consistency issue.  I don't think this impacts the
> stability or function of the software.  I think this is a nice to have,
> maybe in the next few releases, but I don't think we absolutely have to
> have this to drop BETA
>
> With respect to upgrades, here are my thoughts.  There is really no way to
> upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a change
> of HDP.  The new build will only be compatible with HDP 2.5 and not 2.4.
> So you have to lay down a new cluster regardless.  We can document how to
> get the configs off of your old Metron and plug them into your new Metron
> so that it works the same.  That shouldn't be a problem.
>
> Our upgrade path for future releases will revolve around the Ambari Metron
> management pack that is available with the upcoming build.  Right now the
> install capability is available and the upgrade capability will come in
> incrementally within the next few releases.  We will additionally deprecate
> Monit and switch that functionality to Ambari as well.  Finally, we will
> also use Ambari for metrics monitoring.  There is lots to do so we will
> triage and prioritize Jiras as a community to see which parts we want to
> 

Re: [DISCUSS] Next Release Name

2016-11-04 Thread Otto Fowler
Re: METRON-485

I believe that there are a couple of issues here.

1. We don’t use the -w timeout parameter when killing the topologies,
which means technically we may not get out cleanly.  We should change this.
2. Beyond the storm timeouts, monit itself has timeouts and will ‘kill’ the
scripts itself if they don’t complete.

I believe that I have seen this happening in resource constrained testing
done with the Storm 1.0 work.

Two competing timeouts/settings here are a real yellow flag.
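
One way to keep the two timeouts from fighting each other, sketched in Python. It assumes the stop script shells out to the Storm CLI, whose `storm kill <topology> -w <secs>` option sets how long Storm waits before tearing the topology down. The function names, the 30-second margin, and the topology name are made up for illustration; the only point is that the budget of whatever supervises the script (monit today) is derived from the Storm wait instead of being set independently.

```python
import subprocess


def build_kill_command(topology, wait_secs):
    """Return the argv for `storm kill <topology> -w <wait_secs>`."""
    if wait_secs < 0:
        raise ValueError("wait_secs must be non-negative")
    return ["storm", "kill", topology, "-w", str(wait_secs)]


def outer_timeout(wait_secs, margin_secs=30):
    """Budget for the supervising process (hypothetical helper).

    It is always larger than the Storm wait, so the supervisor does not
    give up on the script before Storm has had its full wait time.
    """
    return wait_secs + margin_secs


def kill_topology(topology, wait_secs=10, margin_secs=30):
    """Run the kill command; raises if the CLI fails or hangs past the budget."""
    cmd = build_kill_command(topology, wait_secs)
    return subprocess.run(
        cmd, check=True, timeout=outer_timeout(wait_secs, margin_secs)
    )
```

With this shape, changing the Storm wait in one place moves the outer budget with it, so the two settings cannot drift apart.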

If the long term fix is the move from monit to ambari, that is fine.  But
in the meantime, getting something in to make this issue better (along
with other work done for the quick and full recently) is worth doing in my
opinion.

On November 4, 2016 at 06:31:04, zeo...@gmail.com (zeo...@gmail.com) wrote:

Please understand that my points mostly relate to perception and ease of
use, not what's technically possible or available. I'm coming at this as
Metron should be a data analysis platform for the masses.

METRON-517/542 - While I'm willing to let this one go it depends on your
definition of non-issue. I personally believe that data (in every location
that it exists) needs to be obvious and have ultra high integrity. I'm not
concerned that the correct data won't exist somewhere in the cluster, I'm
focusing on it being easily accessible by an operations team that may
consist of entry level analysts. Once 517 is done and merged I would
consider that a short term mitigation is in place.

I feel like the project should stick to certain principles and a suggestion
is that data access is easy, accurate, and obvious. Do we have anything
like this that was agreed upon, discussed, or documented? Probably a
discussion for a different thread.

METRON-485/470/etc. were mostly to illustrate a consistency issue, and
resolving them would give a better first impression (assuming that people
monitoring the project will start using it more once it's non-BETA
software). First impressions are big in my book and could affect initial
adoption.

Regarding 485 - Otto may be able to clarify but I thought somebody else saw
this issue as well. I think the finger is currently being pointed at monit
timeouts and not storm. It also doesn't happen every single time, I only
run into it while the cluster is under load and after dozens of topology
restarts that I do when tuning parallelism in storm. I'm going to be
updating to storm 1.0.x in order to see if this still exists. Again, this
relates to ease of use/load testing/tuning.

Agree with the upgrade comments - as long as it's supported at some defined
point (IMHO this is when a project leaves BETA but others are welcome to
disagree).

Finally, I know this doesn't come across well in email but I'm just
mentioning items which I think are important, not attempting to demand that
they be fixed or that this doesn't leave beta. Thanks,

Jon

On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:


Hi Jon,

Here are my thoughts around your objections.

METRON-517/METRON-542

I think the mechanism currently exists within Metron to make this a
non-issue. I believe you can solve it with a combination of a Stellar
statement and ES templates. As you mentioned, we can truncate the string
and then include the relevant meta data in the message (original length,
hash, etc). Cramming really long strings into ES is generally a bad thing,
which is why this limitation exists. The metadata in the indexed message
along with the timestamp allows you to pull data from HDFS should you need
to recover the full string.

METRON-485

We cannot replicate this issue in our environment, but if this is indeed an
issue this is an issue with Storm. A Jira should be filed against Storm
and not against Metron. My hunch, though, is that it's probably something
in your environment. I just tried stopping all topologies on my AWS
cluster and then went to all Storm nodes and didn't see any workers left
behind.

METRON-470

I think this is mainly a consistency issue. I don't think this impacts the
stability or function of the software. I think this is a nice to have,
maybe in the next few releases, but I don't think we absolutely have to
have this to drop BETA

With respect to upgrades, here are my thoughts. There is really no way to
upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a change
of HDP. The new build will only be compatible with HDP 2.5 and not 2.4.
So you have to lay down a new cluster regardless. We can document how to
get the configs off of your old Metron and plug them into your new Metron
so that it works the same. That shouldn't be a problem.

Our upgrade path for future releases will revolve around the Ambari Metron
management pack that is available with the upcoming build. Right now the
install capability is available and the upgrade capability will come in
incrementally within the next few releases. We will additionally deprecate
Monit and switch that functionality to Ambari as well. 

Re: [DISCUSS] Next Release Name

2016-11-04 Thread zeo...@gmail.com
Please understand that my points mostly relate to perception and ease of
use, not what's technically possible or available.  I'm coming at this as
Metron should be a data analysis platform for the masses.

METRON-517/542 - While I'm willing to let this one go it depends on your
definition of non-issue.  I personally believe that data (in every location
that it exists) needs to be obvious and have ultra high integrity.  I'm not
concerned that the correct data won't exist somewhere in the cluster, I'm
focusing on it being easily accessible by an operations team that may
consist of entry level analysts.  Once 517 is done and merged I would
consider that a short term mitigation is in place.

I feel like the project should stick to certain principles and a suggestion
is that data access is easy, accurate, and obvious. Do we have anything
like this that was agreed upon, discussed, or documented? Probably a
discussion for a different thread.

METRON-485/470/etc. were mostly to illustrate a consistency issue, and
resolving them would give a better first impression (assuming that people
monitoring the project will start using it more once it's non-BETA
software).  First impressions are big in my book and could affect initial
adoption.

Regarding 485 - Otto may be able to clarify but I thought somebody else saw
this issue as well.  I think the finger is currently being pointed at monit
timeouts and not storm.  It also doesn't happen every single time, I only
run into it while the cluster is under load and after dozens of topology
restarts that I do when tuning parallelism in storm.  I'm going to be
updating to storm 1.0.x in order to see if this still exists.  Again, this
relates to ease of use/load testing/tuning.

Agree with the upgrade comments - as long as it's supported at some defined
point (IMHO this is when a project leaves BETA but others are welcome to
disagree).

Finally, I know this doesn't come across well in email but I'm just
mentioning items which I think are important, not attempting to demand that
they be fixed or that this doesn't leave beta.  Thanks,

Jon

On Thu, Nov 3, 2016, 16:44 James Sirota  wrote:


Hi Jon,

Here are my thoughts around your objections.

METRON-517/METRON-542

I think the mechanism currently exists within Metron to make this a
non-issue.  I believe you can solve it with a combination of a Stellar
statement and ES templates.  As you mentioned, we can truncate the string
and then include the relevant meta data in the message (original length,
hash, etc).  Cramming really long strings into ES is generally a bad thing,
which is why this limitation exists.   The metadata in the indexed message
along with the timestamp allows you to pull data from HDFS should you need
to recover the full string.

METRON-485

We cannot replicate this issue in our environment, but if this is indeed an
issue this is an issue with Storm.  A Jira should be filed against Storm
and not against Metron.  My hunch, though, is that it's probably something
in your environment.  I just tried stopping all topologies on my AWS
cluster and then went to all Storm nodes and didn't see any workers left
behind.

METRON-470

I think this is mainly a consistency issue.  I don't think this impacts the
stability or function of the software.  I think this is a nice to have,
maybe in the next few releases, but I don't think we absolutely have to
have this to drop BETA

With respect to upgrades, here are my thoughts.  There is really no way to
upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a change
of HDP.  The new build will only be compatible with HDP 2.5 and not 2.4.
So you have to lay down a new cluster regardless.  We can document how to
get the configs off of your old Metron and plug them into your new Metron
so that it works the same.  That shouldn't be a problem.

Our upgrade path for future releases will revolve around the Ambari Metron
management pack that is available with the upcoming build.  Right now the
install capability is available and the upgrade capability will come in
incrementally within the next few releases.  We will additionally deprecate
Monit and switch that functionality to Ambari as well.  Finally, we will
also use Ambari for metrics monitoring.  There is lots to do so we will
triage and prioritize Jiras as a community to see which parts we want to
tackle first.  This is why your participation in the community is so
valuable.

Thanks,
James



03.11.2016, 11:07, "zeo...@gmail.com" :
> I agree that we can split METRON-517 into a short term and long term fix.
> I have attempted to organize my thoughts regarding the long term fix into
> METRON-542 and can get a PR out for METRON-517 soon to close that out.
>
> This leaves cluster tuning and a valid upgrade path for users, the latter of
> which is my predominant concern. If the team is willing to say that
> starting with 0.2.2 there will be a valid upgrade path to future releases I
> think

Re: [DISCUSS] Next Release Name

2016-11-03 Thread James Sirota

Hi Jon,

Here are my thoughts around your objections.  

METRON-517/METRON-542

I think the mechanism currently exists within Metron to make this a non-issue.
I believe you can solve it with a combination of a Stellar statement and ES 
templates.  As you mentioned, we can truncate the string and then include the 
relevant meta data in the message (original length, hash, etc).  Cramming 
really long strings into ES is generally a bad thing, which is why this 
limitation exists.   The metadata in the indexed message along with the 
timestamp allows you to pull data from HDFS should you need to recover the full 
string.  
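
A rough Python sketch of the truncate-and-annotate idea described above, not Metron code. The 32766 figure is the limit quoted in this thread; the metadata field names (`_truncated`, `_original_length`, `_sha256`) and the sample field `uri` are hypothetical. The hash and length are computed over the full value so the untruncated string can later be matched against what was written to HDFS.

```python
import hashlib

MAX_TERM_BYTES = 32766  # limit quoted in this thread for a single indexed term


def truncate_field(message, field, limit=MAX_TERM_BYTES):
    """Return a copy of `message` with `field` cut to `limit` UTF-8 bytes.

    When truncation happens, metadata about the original value is added so
    the full string can be located elsewhere (e.g. the raw copy in HDFS).
    Messages with a short or non-string field are returned unchanged.
    """
    value = message.get(field)
    if not isinstance(value, str):
        return dict(message)
    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return dict(message)
    out = dict(message)
    # errors="ignore" drops a multi-byte character cut in half at the boundary
    out[field] = raw[:limit].decode("utf-8", errors="ignore")
    out[field + "_truncated"] = True
    out[field + "_original_length"] = len(raw)  # length in bytes, not characters
    out[field + "_sha256"] = hashlib.sha256(raw).hexdigest()
    return out
```

The message timestamp is left untouched, so it can still be used together with the hash to narrow down which HDFS files hold the full value.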

METRON-485

We cannot replicate this issue in our environment, but if this is indeed an 
issue this is an issue with Storm.  A Jira should be filed against Storm and 
not against Metron.  My hunch, though, is that it's probably something in your 
environment.  I just tried stopping all topologies on my AWS cluster and then 
went to all Storm nodes and didn't see any workers left behind.

METRON-470

I think this is mainly a consistency issue.  I don't think this impacts the 
stability or function of the software.  I think this is a nice to have, maybe 
in the next few releases, but I don't think we absolutely have to have this to 
drop BETA

With respect to upgrades, here are my thoughts.  There is really no way to 
upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a change of 
HDP.  The new build will only be compatible with HDP 2.5 and not 2.4.  So you 
have to lay down a new cluster regardless.  We can document how to get the 
configs off of your old Metron and plug them into your new Metron so that it 
works the same.  That shouldn't be a problem.  

Our upgrade path for future releases will revolve around the Ambari Metron 
management pack that is available with the upcoming build.  Right now the 
install capability is available and the upgrade capability will come in 
incrementally within the next few releases.  We will additionally deprecate
Monit and switch that functionality to Ambari as well.  Finally, we will also 
use Ambari for metrics monitoring.  There is lots to do so we will triage and 
prioritize Jiras as a community to see which parts we want to tackle first.  
This is why your participation in the community is so valuable.  

Thanks,
James 



03.11.2016, 11:07, "zeo...@gmail.com" :
> I agree that we can split METRON-517 into a short term and long term fix.
> I have attempted to organize my thoughts regarding the long term fix into
> METRON-542 and can get a PR out for METRON-517 soon to close that out.
>
> This leaves cluster tuning and a valid upgrade path for users, the latter of
> which is my predominant concern. If the team is willing to say that
> starting with 0.2.2 there will be a valid upgrade path to future releases I
> think that removing the BETA tag at 0.2.2 is reasonable. That said, this
> is just following my perception of what the BETA tag represents.
>
> Jon
>
> On Thu, Nov 3, 2016 at 11:50 AM Casey Stella  wrote:
>
>>  Ok, regarding METRON-517, I've thought about this a bit having read your
>>  really great and detailed JIRA as well as the discussion around this on the
>>  dev list between you and Matt Foley. I want to separate the discussion
>>  between what is the correct long-term solution for this issue versus what
>>  is an acceptable solution.
>>
>>  In terms of an acceptable work-around, my opinion is that because we allow
>>  the user to modify the ES template they can
>>
>>    - Adjust the template to specify ignore_above
>>    <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
>>    on fields which they feel are likely to be large (maybe every string field)
>>    - The combination of timestamp and ip_src_addr should be sufficient for
>>    picking out the raw data in question from the HDFS store
>>    - A stellar enrichment can be used to tag the messages with large URIs
>>    and that can factor into the threat triage or be used to filter in
>>    kibana
>>    - As you say, you can use the profiler to track counts of such messages
>>    if you so desire and factor that into threat alerting or filtering in
>>    kibana.
>>
>>  Ultimately, I believe we have exposed the appropriate set of tooling to
>>  provide an acceptable solution for the moment. Now, as for the best
>>  long-term solution, I will let the good discussion on the mailing list and
>>  JIRA continue and contribute my thoughts on the JIRA.
>>
>>  Of course, this is just $0.02 :)
>>
>>  Apologies to Dave, I wanted to mark this aspect of the discussion on this
>>  thread as it is relevant to sufficient criteria to remove the BETA tag.
>>
>>  Best,
>>
>>  Casey
>>
>>  On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com  wrote:
>>
>>  > To clarify, it only needs to truncate fields > 32766 which 

Re: [DISCUSS] Next Release Name

2016-11-03 Thread zeo...@gmail.com
I agree that we can split METRON-517 into a short term and long term fix.
I have attempted to organize my thoughts regarding the long term fix into
METRON-542 and can get a PR out for METRON-517 soon to close that out.

This leaves cluster tuning and a valid upgrade path for users, the latter of
which is my predominant concern.  If the team is willing to say that
starting with 0.2.2 there will be a valid upgrade path to future releases I
think that removing the BETA tag at 0.2.2 is reasonable.  That said, this
is just following my perception of what the BETA tag represents.

Jon

On Thu, Nov 3, 2016 at 11:50 AM Casey Stella  wrote:

> Ok, regarding METRON-517, I've thought about this a bit having read your
> really great and detailed JIRA as well as the discussion around this on the
> dev list between you and Matt Foley.  I want to separate the discussion
> between what is the correct long-term solution for this issue versus what
> is an acceptable solution.
>
> In terms of an acceptable work-around, my opinion is that because we allow
> the user to modify the ES template they can
>
>   - Adjust the template to specify ignore_above
>   <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
>   on fields which they feel are likely to be large (maybe every string field)
>   - The combination of timestamp and ip_src_addr should be sufficient for
>   picking out the raw data in question from the HDFS store
>   - A stellar enrichment can be used to tag the messages with large URIs
>   and that can factor into the threat triage or be used to filter in
>   kibana
>   - As you say, you can use the profiler to track counts of such messages
>   if you so desire and factor that into threat alerting or filtering in
>   kibana.
>
> Ultimately, I believe we have exposed the appropriate set of tooling to
> provide an acceptable solution for the moment.  Now, as for the best
> long-term solution, I will let the good discussion on the mailing list and
> JIRA continue and contribute my thoughts on the JIRA.
>
> Of course, this is just $0.02 :)
>
> Apologies to Dave, I wanted to mark this aspect of the discussion on this
> thread as it is relevant to sufficient criteria to remove the BETA tag.
>
> Best,
>
> Casey
>
> On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com  wrote:
>
> > To clarify, it only needs to truncate fields > 32766 which need a
> > full/exact string match search to be run on them (analyzed fields generally
> > would not hit this limitation but I guess in theory they could).  However,
> > that's probably every field which can get > 32766 because I'm assuming
> > those will all be strings.
> >
> > I also think using the profiler to monitor the truncation action could be a
> > useful default.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com  wrote:
> >
> > > That would break searching on uri entirely unless you queried and knew to
> > > truncate at 32766 because it's not analyzed.  I don't like pushing that
> > > complication to the end user.
> > >
> > > I would suggest truncation in the indexingBolt (not using stellar because
> > > you'd want this across the board) for all fields > 32766 (how do we make
> > > sure this gets updated if the limitation changes in Lucene?) and adding
> > > metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> > > In the URI scenario I would also suggest doing a multifield mapping by
> > > default because of the way that data is useful (not sure which analyser to
> > > use though - maybe write or find a good URI analyzer?).  Since timestamp is
> > > a required field for all messages (I'm pretty sure?) I'm ok with timestamp
> > > and field value used as the UID, but would prefer something better.
> > >
> > > Jon
> > >
> > > On Wed, Nov 2, 2016, 20:33 James Sirota  wrote:
> > >
> > > Jon,
> > >
> > > For METRON-517 would it suffice to have a stellar statement to take a URI
> > > string and truncate it to length of 32766 in the ES writer?  But still
> > > write the actual string to HDFS? You can then search against ES on the
> > > truncated portion, but retrieve the actual timestamp from HDFS.  It's easy
> > > to do because you know the timestamp from the original message.  So you
> > > know which logs in HDFS to search through to find the data.
> > >
> > > 02.11.2016, 14:12, "zeo...@gmail.com" :
> > > > I personally would like to see the following things done before things
> > > > leave BETA:
> > > > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> > > > METRON-517)
> > > > (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> > > > and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> > > > for).
> > > >
> > > > I would also want to 

Re: [DISCUSS] Next Release Name

2016-11-03 Thread Casey Stella
Ok, regarding METRON-517, I've thought about this a bit having read your
really great and detailed JIRA as well as the discussion around this on the
dev list between you and Matt Foley.  I want to separate the discussion
between what is the correct long-term solution for this issue versus what
is an acceptable solution.

In terms of an acceptable work-around, my opinion is that because we allow
the user to modify the ES template they can

   - Adjust the template to specify ignore_above
   <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
   on fields which they feel are likely to be large (maybe every string field)
   - The combination of timestamp and ip_src_addr should be sufficient for
   picking out the raw data in question from the HDFS store
   - A stellar enrichment can be used to tag the messages with large URIs
   and that can factor into the threat triage or be used to filter in
   kibana
   - As you say, you can use the profiler to track counts of such messages
   if you so desire and factor that into threat alerting or filtering in
   kibana.
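
To make the first bullet concrete, here is a small Python sketch that adds `ignore_above` to the string fields of a mapping before it is loaded as a template. It is an illustration under assumptions, not the shipped Metron template: the helper name, the sample field names, and the choice of 8191 are mine. The 8191 comes from the Elasticsearch note that `ignore_above` counts characters while the Lucene limit (32766) is in bytes, and a UTF-8 character can take up to 4 bytes.

```python
import json

LUCENE_MAX_TERM_BYTES = 32766
# ignore_above counts characters; Lucene's limit is in bytes. A UTF-8
# character is at most 4 bytes, so this is a conservative character limit.
SAFE_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // 4  # 8191


def add_ignore_above(properties, limit=SAFE_IGNORE_ABOVE):
    """Return a copy of a mapping's `properties` with ignore_above set on
    every string-like field that does not already specify it."""
    out = {}
    for name, spec in properties.items():
        spec = dict(spec)  # copy so the caller's mapping is not modified
        if spec.get("type") in ("string", "keyword"):
            spec.setdefault("ignore_above", limit)
        out[name] = spec
    return out


if __name__ == "__main__":
    # Hypothetical fragment of a sensor mapping.
    props = {
        "uri": {"type": "string", "index": "not_analyzed"},
        "ip_src_addr": {"type": "ip"},
        "timestamp": {"type": "date"},
    }
    print(json.dumps(add_ignore_above(props), indent=2))
```

Values longer than the limit are then skipped at index time for that field instead of failing the whole document, which is why the timestamp and ip_src_addr bullet matters for finding the raw record afterwards.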

Ultimately, I believe we have exposed the appropriate set of tooling to
provide an acceptable solution for the moment.  Now, as for the best
long-term solution, I will let the good discussion on the mailing list and
JIRA continue and contribute my thoughts on the JIRA.

Of course, this is just $0.02 :)

Apologies to Dave, I wanted to mark this aspect of the discussion on this
thread as it is relevant to sufficient criteria to remove the BETA tag.

Best,

Casey

On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com  wrote:

> To clarify, it only needs to truncate fields > 32766 which need a
> full/exact string match search to be run on them (analyzed fields generally
> would not hit this limitation but I guess in theory they could).  However,
> that's probably every field which can get > 32766 because I'm assuming
> those will all be strings.
>
> I also think using the profiler to monitor the truncation action could be a
> useful default.
>
> Jon
>
> On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com  wrote:
>
> > That would break searching on uri entirely unless you queried and knew to
> > truncate at 32766 because it's not analyzed.  I don't like pushing that
> > complication to the end user.
> >
> > I would suggest truncation in the indexingBolt (not using stellar because
> > you'd want this across the board) for all fields > 32766 (how do we make
> > sure this gets updated if the limitation changes in Lucene?) and adding
> > metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> > In the URI scenario I would also suggest doing a multifield mapping by
> > default because of the way that data is useful (not sure which analyser
> > to use though - maybe write or find a good URI analyzer?).  Since
> > timestamp is a required field for all messages (I'm pretty sure?) I'm ok
> > with timestamp and field value used as the UID, but would prefer
> > something better.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016, 20:33 James Sirota  wrote:
> >
> > Jon,
> >
> > For METRON-517 would it suffice to have a stellar statement to take a URI
> > string and truncate it to length of 32766 in the ES writer?  But still
> > write the actual string to HDFS? You can then search against ES on the
> > truncated portion, but retrieve the actual timestamp from HDFS.  It's
> > easy to do because you know the timestamp from the original message.
> > So you know which logs in HDFS to search through to find the data.
> >
> > 02.11.2016, 14:12, "zeo...@gmail.com" :
> > > I personally would like to see the following things done before things
> > > leave BETA:
> > > (1) Address data integrity concerns (Specifically thinking of
> > > METRON-370, METRON-517)
> > > (2) Make cluster tuning easier and more consistent (METRON-485,
> > > METRON-470, and the "[DISCUSS] moving parsers back to flux" which I
> > > can't find a JIRA for).
> > >
> > > I would also want to see the upgrade path (as opposed to rebuild) be
> > > more thoroughly and regularly tested once things leave BETA. From my
> > > perspective I think the project is very close but not yet ready.
> > >
> > > Jon
> > >
> > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella wrote:
> > >
> > > Hello Everyone,
> > >
> > > Now that the discussion around the next release has started, it has
> > > been proposed and I think it's a good time to discuss what to name this
> > > next release. Before, we have adopted the BETA suffix. I think it might
> > > be time to drop it and call the next release 0.2.2
> > >
> > > Thoughts?
> > >
> > > Best,
> > >
> > > Casey
> > >
> > > --
> > >
> > > Jon
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> 

Re: [DISCUSS] Next Release Name

2016-11-03 Thread David Lyle
Hi,

This is an interesting discussion. Would you mind moving it to the JIRA or
its own DISCUSS thread?

Thanks!

-D...


On Thu, Nov 3, 2016 at 7:26 AM, zeo...@gmail.com  wrote:

> To clarify, it only needs to truncate fields > 32766 which need a
> full/exact string match search to be run on them (analyzed fields generally
> would not hit this limitation but I guess in theory they could).  However,
> that's probably every field which can get > 32766 because I'm assuming
> those will all be strings.
>
> I also think using the profiler to monitor the truncation action could be a
> useful default.
>
> Jon
>
> On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com  wrote:
>
> > That would break searching on uri entirely unless you queried and knew to
> > truncate at 32766 because it's not analyzed.  I don't like pushing that
> > complication to the end user.
> >
> > I would suggest truncation in the indexingBolt (not using stellar because
> > you'd want this across the board) for all fields > 32766 (how do we make
> > sure this gets updated if the limitation changes in Lucene?) and adding
> > metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> > In the URI scenario I would also suggest doing a multifield mapping by
> > default because of the way that data is useful (not sure which analyser
> > to use though - maybe write or find a good URI analyzer?).  Since
> > timestamp is a required field for all messages (I'm pretty sure?) I'm ok
> > with timestamp and field value used as the UID, but would prefer
> > something better.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016, 20:33 James Sirota  wrote:
> >
> > Jon,
> >
> > For METRON-517 would it suffice to have a stellar statement to take a URI
> > string and truncate it to length of 32766 in the ES writer?  But still
> > write the actual string to HDFS? You can then search against ES on the
> > truncated portion, but retrieve the actual timestamp from HDFS.  It's
> > easy to do because you know the timestamp from the original message.
> > So you know which logs in HDFS to search through to find the data.
> >
> > 02.11.2016, 14:12, "zeo...@gmail.com" :
> > > I personally would like to see the following things done before things
> > > leave BETA:
> > > (1) Address data integrity concerns (Specifically thinking of
> > > METRON-370, METRON-517)
> > > (2) Make cluster tuning easier and more consistent (METRON-485,
> > > METRON-470, and the "[DISCUSS] moving parsers back to flux" which I
> > > can't find a JIRA for).
> > >
> > > I would also want to see the upgrade path (as opposed to rebuild) be
> > > more thoroughly and regularly tested once things leave BETA. From my
> > > perspective I think the project is very close but not yet ready.
> > >
> > > Jon
> > >
> > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella wrote:
> > >
> > > Hello Everyone,
> > >
> > > Now that the discussion around the next release has started, it has
> > > been proposed and I think it's a good time to discuss what to name this
> > > next release. Before, we have adopted the BETA suffix. I think it might
> > > be time to drop it and call the next release 0.2.2
> > >
> > > Thoughts?
> > >
> > > Best,
> > >
> > > Casey
> > >
> > > --
> > >
> > > Jon
> >
> > ---
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> >
> --
>
> Jon
>


Re: [DISCUSS] Next Release Name

2016-11-03 Thread zeo...@gmail.com
To clarify, it only needs to truncate fields > 32766 which need a
full/exact string match search to be run on them (analyzed fields generally
would not hit this limitation but I guess in theory they could).  However,
that's probably every field which can get > 32766 because I'm assuming
those will all be strings.

I also think using the profiler to monitor the truncation action could be a
useful default.

Jon

On Wed, Nov 2, 2016, 21:08 zeo...@gmail.com  wrote:

> That would break searching on uri entirely unless you queried and knew to
> truncate at 32766 because it's not analyzed.  I don't like pushing that
> complication to the end user.
>
> I would suggest truncation in the indexingBolt (not using stellar because
> you'd want this across the board) for all fields > 32766 (how do we make
> sure this gets updated if the limitation changes in Lucene?) and adding
> metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> In the URI scenario I would also suggest doing a multifield mapping by
> default because of the way that data is useful (not sure which analyser to
> use though - maybe write or find a good URI analyzer?).  Since timestamp is
> a required field for all messages (I'm pretty sure?) I'm ok with timestamp
> and field value used as the UID, but would prefer something better.
>
> Jon
>
> On Wed, Nov 2, 2016, 20:33 James Sirota  wrote:
>
> Jon,
>
> For METRON-517 would it suffice to have a stellar statement to take a URI
> string and truncate it to length of 32766 in the ES writer?  But still
> write the actual string to HDFS? You can then search against ES on the
> truncated portion, but retrieve the actual timestamp from HDFS.  It's easy
> to do because you know the timestamp from the original message.  So you
> know which logs in HDFS to search through to find the data.
>
> 02.11.2016, 14:12, "zeo...@gmail.com" :
> > I personally would like to see the following things done before things
> > leave BETA:
> > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> > METRON-517)
> > (2) Make cluster tuning easier and more consistent (METRON-485,
> > METRON-470, and the "[DISCUSS] moving parsers back to flux" which I
> > can't find a JIRA for).
> >
> > I would also want to see the upgrade path (as opposed to rebuild) be more
> > thoroughly and regularly tested once things leave BETA. From my
> > perspective I think the project is very close but not yet ready.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
> >
> > Hello Everyone,
> >
> > Now that the discussion around the next release has started, it has been
> > proposed and I think it's a good time to discuss what to name this next
> > release. Before, we have adopted the BETA suffix. I think it might be
> > time to drop it and call the next release 0.2.2
> >
> > Thoughts?
> >
> > Best,
> >
> > Casey
> >
> > --
> >
> > Jon
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
> --
>
> Jon
>
-- 

Jon


Re: [DISCUSS] Next Release Name

2016-11-02 Thread zeo...@gmail.com
That would break searching on uri entirely unless you queried and knew to
truncate at 32766 because it's not analyzed.  I don't like pushing that
complication to the end user.

I would suggest truncation in the indexingBolt (not using stellar because
you'd want this across the board) for all fields > 32766 (how do we make
sure this gets updated if the limitation changes in Lucene?) and adding
metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
In the URI scenario I would also suggest doing a multifield mapping by
default because of the way that data is useful (not sure which analyser to
use though - maybe write or find a good URI analyzer?).  Since timestamp is
a required field for all messages (I'm pretty sure?) I'm ok with timestamp
and field value used as the UID, but would prefer something better.
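
The truncation idea above could be sketched roughly as follows. This is a
minimal Python illustration, not Metron's indexing bolt code: the function
names and the metadata key suffixes (_truncated, _original_length,
_original_sha256) are made up for the example. It assumes the limit applies
to UTF-8 bytes, and keeps the limit as a parameter so it can be changed if
the Lucene limit ever changes.

```python
import hashlib

LUCENE_MAX_TERM_BYTES = 32766  # limit discussed in this thread


def truncate_utf8(value, max_bytes):
    """Cut a string to at most max_bytes of UTF-8 without splitting a character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a partial multi-byte character left at the cut point.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncate_message(message, max_bytes=LUCENE_MAX_TERM_BYTES):
    """Truncate oversized string fields and record metadata about the original."""
    out = {}
    for key, value in message.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > max_bytes:
            out[key] = truncate_utf8(value, max_bytes)
            out[key + "_truncated"] = True
            out[key + "_original_length"] = len(value)
            out[key + "_original_sha256"] = hashlib.sha256(
                value.encode("utf-8")
            ).hexdigest()
        else:
            out[key] = value
    return out


# Hypothetical message, for illustration only.
indexed = truncate_message({"timestamp": 1478131200000, "uri": "a" * 40000})
print(sorted(indexed.keys()))
```

The hash of the full value gives a way to match the truncated indexed record
against the untruncated copy stored elsewhere, which is one option for the
"something better" than timestamp plus field value.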

Jon

On Wed, Nov 2, 2016, 20:33 James Sirota  wrote:

> Jon,
>
> For METRON-517 would it suffice to have a stellar statement to take a URI
> string and truncate it to length of 32766 in the ES writer?  But still
> write the actual string to HDFS? You can then search against ES on the
> truncated portion, but retrieve the actual timestamp from HDFS.  It's easy
> to do because you know the timestamp from the original message.  So you
> know which logs in HDFS to search through to find the data.
>
> 02.11.2016, 14:12, "zeo...@gmail.com" :
> > I personally would like to see the following things done before things
> > leave BETA:
> > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> > METRON-517)
> > (2) Make cluster tuning easier and more consistent (METRON-485,
> > METRON-470, and the "[DISCUSS] moving parsers back to flux" which I
> > can't find a JIRA for).
> >
> > I would also want to see the upgrade path (as opposed to rebuild) be more
> > thoroughly and regularly tested once things leave BETA. From my
> > perspective I think the project is very close but not yet ready.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
> >
> > Hello Everyone,
> >
> > Now that the discussion around the next release has started, it has been
> > proposed and I think it's a good time to discuss what to name this next
> > release. Before, we have adopted the BETA suffix. I think it might be
> > time to drop it and call the next release 0.2.2
> >
> > Thoughts?
> >
> > Best,
> >
> > Casey
> >
> > --
> >
> > Jon
>
> ---
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
-- 

Jon


Re: [DISCUSS] Next Release Name

2016-11-02 Thread James Sirota
Jon,

For METRON-517 would it suffice to have a stellar statement to take a URI 
string and truncate it to length of 32766 in the ES writer?  But still write 
the actual string to HDFS? You can then search against ES on the truncated 
portion, but retrieve the actual timestamp from HDFS.  It's easy to do because 
you know the timestamp from the original message.  So you know which logs in 
HDFS to search through to find the data.

02.11.2016, 14:12, "zeo...@gmail.com" :
> I personally would like to see the following things done before things
> leave BETA:
> (1) Address data integrity concerns (Specifically thinking of METRON-370,
> METRON-517)
> (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> for).
>
> I would also want to see the upgrade path (as opposed to rebuild) be more
> thoroughly and regularly tested once things leave BETA. From my
> perspective I think the project is very close but not yet ready.
>
> Jon
>
> On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
>
> Hello Everyone,
>
> Now that the discussion around the next release has started, it has been
> proposed and I think it's a good time to discuss what to name this next
> release. Before, we have adopted the BETA suffix. I think it might be
> time to drop it and call the next release 0.2.2
>
> Thoughts?
>
> Best,
>
> Casey
>
> --
>
> Jon

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org


Re: [DISCUSS] Next Release Name

2016-11-02 Thread James Sirota
Sounds to me that we should close METRON-370 as non-discrepant.  If we want a 
feature later on to be able to replay enrichments we should add a Jira on that 
instead.  I don't see why you would do this, though.  If the enrichment fails 
you want to pass the message through unenriched.  Otherwise if you keep 
replaying you will back up your ingest pipeline.
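
A rough sketch of the pass-through behavior described above, in Python for
brevity. It is not how Metron's enrichment topology is written: the enrich
function, the enrichment_errors key, and the two example enrichers are
hypothetical. The point is only that a failed enrichment is recorded and the
message moves on, rather than being replayed.

```python
def enrich(message, enrichers):
    """Apply each enricher; on failure keep the message and note the error."""
    enriched = dict(message)
    errors = []
    for name, fn in enrichers.items():
        try:
            enriched.update(fn(message))
        except Exception as exc:
            # Do not retry: record the failure and pass the message through.
            errors.append(name + ": " + str(exc))
    if errors:
        enriched["enrichment_errors"] = errors
    return enriched


# Hypothetical enrichers, for illustration only.
def geo_lookup(message):
    return {"geo_country": "US"}


def whois_lookup(message):
    raise RuntimeError("timeout")


result = enrich(
    {"ip_src_addr": "10.0.0.1"},
    {"geo": geo_lookup, "whois": whois_lookup},
)
print(result)
```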

02.11.2016, 15:05, "Michael Miklavcic" :
> Hi Jon, I have commented on 370 -
> https://issues.apache.org/jira/browse/METRON-370
>
> Best,
> Mike
>
> On Wed, Nov 2, 2016 at 3:11 PM, zeo...@gmail.com  wrote:
>
>>  I personally would like to see the following things done before things
>>  leave BETA:
>>  (1) Address data integrity concerns (Specifically thinking of METRON-370,
>>  METRON-517)
>>  (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
>>  and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
>>  for).
>>
>>  I would also want to see the upgrade path (as opposed to rebuild) be more
>>  thoroughly and regularly tested once things leave BETA. From my
>>  perspective I think the project is very close but not yet ready.
>>
>>  Jon
>>
>>  On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
>>
>>  Hello Everyone,
>>
>>  Now that the discussion around the next release has started, it has been
>>  proposed and I think it's a good time to discuss what to name this next
>>  release. Before, we have adopted the BETA suffix. I think it might be
>>  time to drop it and call the next release 0.2.2
>>
>>  Thoughts?
>>
>>  Best,
>>
>>  Casey
>>
>>  --
>>
>>  Jon

--- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org


Re: [DISCUSS] Next Release Name

2016-11-02 Thread Michael Miklavcic
Hi Jon, I have commented on 370 -
https://issues.apache.org/jira/browse/METRON-370

Best,
Mike

On Wed, Nov 2, 2016 at 3:11 PM, zeo...@gmail.com  wrote:

> I personally would like to see the following things done before things
> leave BETA:
> (1) Address data integrity concerns (Specifically thinking of METRON-370,
> METRON-517)
> (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> for).
>
> I would also want to see the upgrade path (as opposed to rebuild) be more
> thoroughly and regularly tested once things leave BETA.  From my
> perspective I think the project is very close but not yet ready.
>
> Jon
>
> On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
>
> Hello Everyone,
>
> Now that the discussion around the next release has started, it has been
> proposed and I think it's a good time to discuss what to name this next
> release.  Before, we have adopted the BETA suffix.  I think it might be
> time to drop it and call the next release 0.2.2
>
> Thoughts?
>
> Best,
>
> Casey
>
> --
>
> Jon
>


Re: [DISCUSS] Next Release Name

2016-11-02 Thread David Lyle
I'm +1 dropping the BETA. Correct me if I'm wrong, but there are users in
production. I think leaving it in the 0 major is appropriate until the
concerns Jon mentioned are addressed.

-D...


On Wed, Nov 2, 2016 at 5:11 PM, zeo...@gmail.com  wrote:

> I personally would like to see the following things done before things
> leave BETA:
> (1) Address data integrity concerns (Specifically thinking of METRON-370,
> METRON-517)
> (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> for).
>
> I would also want to see the upgrade path (as opposed to rebuild) be more
> thoroughly and regularly tested once things leave BETA.  From my
> perspective I think the project is very close but not yet ready.
>
> Jon
>
> On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:
>
> Hello Everyone,
>
> Now that the discussion around the next release has started, it has been
> proposed and I think it's a good time to discuss what to name this next
> release.  Before, we have adopted the BETA suffix.  I think it might be
> time to drop it and call the next release 0.2.2
>
> Thoughts?
>
> Best,
>
> Casey
>
> --
>
> Jon
>


Re: [DISCUSS] Next Release Name

2016-11-02 Thread zeo...@gmail.com
I personally would like to see the following things done before things
leave BETA:
(1) Address data integrity concerns (Specifically thinking of METRON-370,
METRON-517)
(2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
for).

I would also want to see the upgrade path (as opposed to rebuild) be more
thoroughly and regularly tested once things leave BETA.  From my
perspective I think the project is very close but not yet ready.

Jon

On Wed, Nov 2, 2016 at 4:44 PM Casey Stella  wrote:

Hello Everyone,

Now that the discussion around the next release has started, it has been
proposed and I think it's a good time to discuss what to name this next
release.  Before, we have adopted the BETA suffix.  I think it might be
time to drop it and call the next release 0.2.2

Thoughts?

Best,

Casey

-- 

Jon