Re: [DISCUSS] Project reorganization

2016-04-20 Thread Ryan Merriman
Sheetal, 

Thank you for the input.  We appreciate all the hard work you and others
put into OpenSOC to get us to where we are today.

To your points:

- Agreed on reevaluating the bolts that now ship with Storm.  I believe
the HDFS and HBase bolts didn’t quite provide all the functionality needed,
which is the reason for the custom implementations, but I will defer to others
who actually worked on those tasks.
- Agreed on changing HbaseConverter to HBaseConverter.  I will update the
spreadsheet.
- Agreed on a common package for HBase related classes.  We should look
more closely at this; any suggestions are welcome, of course.
- There is a reason Solr and Elasticsearch classes ended up in separate
projects.  The supported version of Elasticsearch (1.7.4) is a couple
years old and the supported version of Solr is recent.  Locating these in
the same project is challenging because they both depend on very different
versions of Lucene.  Once we update Elasticsearch to a more recent version
such that it depends on the same Lucene version as Solr, keeping them
together in the same project should be much easier.
- Parsers and Enrichments are now decoupled, whereas before they were
included in the same topology.  Now they run in different topologies and
are deployed in separate jars.
- Agreed on the categories.  I believe some in your list are already
represented in the proposed project structure.  “Data Acquisition” is
analogous to the top level “metron-sensors” project. “Data Access” is
represented by the top level “metron-ui” project and the “metron-api”
project within the top level “metron-platform” project.  I like your idea
of having “Active Analysis” and “Deep Analytics” projects as well.  The
real-time pieces are represented in various sub projects in
“metron-platform” but I think there will eventually be a need for a “Deep
Analytics” project which is missing.  Maybe we should include a
“metron-analytics” project under “metron-platform”?  If not now, in the
future when we deliver more functionality in this area?
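
To illustrate the Lucene point above, here is a minimal sketch, assuming a
simplified model in which each module pins one version of each library.  The
module names and the "older"/"newer" labels are placeholders, not the actual
Lucene versions used by either engine.  It only shows why placing both writers
in one project surfaces a conflict.

```python
from collections import defaultdict
from typing import Dict, List


def version_conflicts(requirements: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each library to its required versions when modules disagree."""
    versions = defaultdict(set)
    for deps in requirements.values():
        for library, version in deps.items():
            versions[library].add(version)
    # Keep only libraries that are required in more than one version.
    return {lib: sorted(found) for lib, found in versions.items() if len(found) > 1}


# Placeholder labels, not the real Lucene versions of either engine.
same_project = {
    "elasticsearch-writer": {"lucene-core": "older"},
    "solr-writer": {"lucene-core": "newer"},
}
conflicts = version_conflicts(same_project)
print(conflicts)
```

Once both engines require the same Lucene version, the same check would
report nothing, which is the condition for merging them described above.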

Ryan Merriman

On 4/19/16, 3:48 PM, "Sheetal Dolas"  wrote:

>Some of the HBase bolt related classes were created in OpenSOC because at that time
>Storm's HBase bolt did not have all the necessary features (ability to add
>custom configs, enable/disable WAL, easy tuple mapping etc.). It should be
>re-evaluated to see if we can leverage these components from Storm
>itself so as to avoid additional maintenance.
>
>Some observations and pointers for more thoughts:
>* HbaseConverter should be H*B*aseConverter to match other cases.
>* org.apache.metron.enrichment.bolt.HBaseBolt.java is in bolt package but
>other hbase components are in hbase package.
>* It may be better to base the project structure on functional grouping rather than a
>mix of function + implementation choices; for example, solr and es probably
>could be packages rather than sub modules. (Unless the intention is to support
>more such "pluggable" indexing mechanisms at any given point.)
>* parsers/enrichments, are they expected to be reused across multiple
>projects? If yes, are they different from common? If not, should they be
>packages instead?
>* From a deployment perspective there are essentially the following broader
>categories:
>1. Data Acquisition (pcap, nifi, flume, kafka writer etc.)
>2. Active Analysis (real time pieces - kafka, storm topology, bolts,
>parsers, enrichments, alerts etc)
>3. Deep Analytics (historic data analysis using ML, MR/Hive/tez/Spark
>related components)
>4. Data Access (apis, UI etc)
>
>Would it make sense to create project structure in such functional
>groupings?
>
>
>On Mon, Apr 18, 2016 at 1:46 PM, James Sirota 
>wrote:
>
>> Hi Ryan,
>>
>> This is great.  You should attach this to the Jira when you are ready to
>> commit the reorg so we know which parts shifted.
>>
>> Thanks,
>> James
>>
>>
>>
>>
>> On 4/18/16, 1:30 PM, "Ryan Merriman"  wrote:
>>
>> >Thanks Frank.  I’ve updated those in the spreadsheet.
>> >
>> >On 4/18/16, 3:27 PM, "Frank Lu"  wrote:
>> >
>> >>As of now, I think the following classes are not used:
>> >>
>> >>
>> >>
>> >>
>> >>Metron-EnrichmentAdapters
>> >>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
>> >>
>> >>
>> >>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
>> >>
>> >>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
>> >>
>> >>
>> >>Metron-DataLoads
>> >>org.apache.metron.dataloads.cif.HBaseTableLoad.java
>> >>
>> >>
>> >>Thanks,
>> >>Frank Lu
>> >>
>> >>
>> >>
>> >>
>> >>On 4/18/16, 3:05 PM, "Ryan Merriman" 
>>wrote:
>> >>
>> >>>All,
>> >>>
>> >>>I put together a list of all the project java assets that details
>>where
>> >>>they will be moved (or potentially deleted) as part of the project
>> >>>reorganization.  Feedback welcome.
>> >>>
>> >>>Ryan Merriman
>> >>>
>> >>>On 4/13/16, 9:42 AM, "James Sirota"  wrote:
>> >>>
>> I would not have configs as a project but rather as a folder structure
>>that
>> other modules can point to
>> 
>> Thanks,
>> James

Re: [DISCUSS] Project reorganization

2016-04-19 Thread Sheetal Dolas
Some of the HBase bolt related classes were created in OpenSOC because at that time
Storm's HBase bolt did not have all the necessary features (ability to add
custom configs, enable/disable WAL, easy tuple mapping etc.). It should be
re-evaluated to see if we can leverage these components from Storm
itself so as to avoid additional maintenance.

Some observations and pointers for more thoughts:
* HbaseConverter should be H*B*aseConverter to match other cases.
* org.apache.metron.enrichment.bolt.HBaseBolt.java is in bolt package but
other hbase components are in hbase package.
* It may be better to base the project structure on functional grouping rather than a
mix of function + implementation choices; for example, solr and es probably
could be packages rather than sub modules. (Unless the intention is to support
more such "pluggable" indexing mechanisms at any given point.)
* parsers/enrichments, are they expected to be reused across multiple
projects? If yes, are they different from common? If not, should they be
packages instead?
* From a deployment perspective there are essentially the following broader categories:
1. Data Acquisition (pcap, nifi, flume, kafka writer etc.)
2. Active Analysis (real time pieces - kafka, storm topology, bolts,
parsers, enrichments, alerts etc)
3. Deep Analytics (historic data analysis using ML, MR/Hive/tez/Spark
related components)
4. Data Access (apis, UI etc)

Would it make sense to create project structure in such functional
groupings?


On Mon, Apr 18, 2016 at 1:46 PM, James Sirota 
wrote:

> Hi Ryan,
>
> This is great.  You should attach this to the Jira when you are ready to
> commit the reorg so we know which parts shifted.
>
> Thanks,
> James
>
>
>
>
> On 4/18/16, 1:30 PM, "Ryan Merriman"  wrote:
>
> >Thanks Frank.  I’ve updated those in the spreadsheet.
> >
> >On 4/18/16, 3:27 PM, "Frank Lu"  wrote:
> >
> >>As of now, I think the following classes are not used:
> >>
> >>
> >>
> >>
> >>Metron-EnrichmentAdapters
> >>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
> >>
> >>
> >>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
> >>
> >>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
> >>
> >>
> >>Metron-DataLoads
> >>org.apache.metron.dataloads.cif.HBaseTableLoad.java
> >>
> >>
> >>Thanks,
> >>Frank Lu
> >>
> >>
> >>
> >>
> >>On 4/18/16, 3:05 PM, "Ryan Merriman"  wrote:
> >>
> >>>All,
> >>>
> >>>I put together a list of all the project java assets that details where
> >>>they will be moved (or potentially deleted) as part of the project
> >>>reorganization.  Feedback welcome.
> >>>
> >>>Ryan Merriman
> >>>
> >>>On 4/13/16, 9:42 AM, "James Sirota"  wrote:
> >>>
> I would not have configs as a project but rather as a folder structure that
> other modules can point to
> 
> Thanks,
> James
> 
> 
> 
> 
> On 4/13/16, 7:32 AM, "Ryan Merriman" 
> wrote:
> 
> >James brings up a good point.  I propose adding another project under
> >metron-platform called metron-configuration.  This would be a fairly
> >lightweight project that would contain anything related to
> >configuration
> >(property files, json files, flux files, etc).
> >
> >On 4/13/16, 8:56 AM, "James Sirota"  wrote:
> >
> >>+1 from me.
> >>
> >>I would also like to address the configs and make sure the configs
> are
> >>in
> >>the same place.  Do you have ideas on where we would put those?
> >>
> >>Thanks,
> >>James
> >>
> >>
> >>
> >>On 4/13/16, 6:50 AM, "Ryan Merriman" 
> >>wrote:
> >>
> >>>Thank you for all the feedback everyone.  I will attempt to
> summarize
> >>>all
> >>>the input we've received and update my initial proposal.  We can
> >>>discuss
> >>>further if anyone is still unclear and I will volunteer to capture
> >>>all
> >>>the
> >>>details in a document of some kind once we all come to a consensus.
> >>>
> >>>Looks like everyone is in agreement for the top level projects.
> Nick
> >>>is
> >>>working on a task that will require an addition top level project so
> >>>I
> >>>am
> >>>going to add that in as well:
> >>>
> >>>metron-deployment
> >>>metron-platform
> >>>metron-ui
> >>>metron-sensors
> >>>
> >>>All of these except metron-platform are well understood and don't
> >>>warrant
> >>>any more discussion.  For metron-platform there seem to be 2 areas
> >>>that
> >>>are not as clear:
> >>>
> >>>- whether we need a common project
> >>>- how do we organize test related code
> >>>
> >>>I agree with David and others that a common project will likely get
> >>>misused and could become unnecessary bloated.  But I suspect there
> >>>will
> >>>be
> >>>cases where we have common code being used across multiple projects
> >>>(is
> >>>already happening).  In this case we will either need this common
> >>>project
> >>>or we wil

Re: [DISCUSS] Project reorganization

2016-04-18 Thread James Sirota
Hi Ryan,

This is great.  You should attach this to the Jira when you are ready to commit 
the reorg so we know which parts shifted.

Thanks,
James 




On 4/18/16, 1:30 PM, "Ryan Merriman"  wrote:

>Thanks Frank.  I’ve updated those in the spreadsheet.
>
>On 4/18/16, 3:27 PM, "Frank Lu"  wrote:
>
>>As of now, I think the following classes are not used:
>>
>>
>> 
>> 
>>Metron-EnrichmentAdapters
>>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
>> 
>> 
>>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
>>
>>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
>>
>>
>>Metron-DataLoads
>>org.apache.metron.dataloads.cif.HBaseTableLoad.java
>>  
>>
>>Thanks,
>>Frank Lu
>>
>>
>>
>>
>>On 4/18/16, 3:05 PM, "Ryan Merriman"  wrote:
>>
>>>All,
>>>
>>>I put together a list of all the project java assets that details where
>>>they will be moved (or potentially deleted) as part of the project
>>>reorganization.  Feedback welcome.
>>>
>>>Ryan Merriman 
>>>
>>>On 4/13/16, 9:42 AM, "James Sirota"  wrote:
>>>
I would not have configs as a project but rather as a folder structure that
other modules can point to

Thanks,
James 




On 4/13/16, 7:32 AM, "Ryan Merriman"  wrote:

>James brings up a good point.  I propose adding another project under
>metron-platform called metron-configuration.  This would be a fairly
>lightweight project that would contain anything related to
>configuration
>(property files, json files, flux files, etc).
>
>On 4/13/16, 8:56 AM, "James Sirota"  wrote:
>
>>+1 from me.
>>
>>I would also like to address the configs and make sure the configs are
>>in
>>the same place.  Do you have ideas on where we would put those?
>>
>>Thanks,
>>James 
>>
>>
>>
>>On 4/13/16, 6:50 AM, "Ryan Merriman" 
>>wrote:
>>
>>>Thank you for all the feedback everyone.  I will attempt to summarize
>>>all
>>>the input we've received and update my initial proposal.  We can
>>>discuss
>>>further if anyone is still unclear and I will volunteer to capture
>>>all
>>>the
>>>details in a document of some kind once we all come to a consensus.
>>>
>>>Looks like everyone is in agreement for the top level projects.  Nick
>>>is
>>>working on a task that will require an addition top level project so
>>>I
>>>am
>>>going to add that in as well:
>>>
>>>metron-deployment
>>>metron-platform
>>>metron-ui
>>>metron-sensors
>>>
>>>All of these except metron-platform are well understood and don't
>>>warrant
>>>any more discussion.  For metron-platform there seem to be 2 areas
>>>that
>>>are not as clear:
>>>
>>>- whether we need a common project
>>>- how do we organize test related code
>>>
>>>I agree with David and others that a common project will likely get
>>>misused and could become unnecessary bloated.  But I suspect there
>>>will
>>>be
>>>cases where we have common code being used across multiple projects
>>>(is
>>>already happening).  In this case we will either need this common
>>>project
>>>or we will have to keep common code in one of the other projects and
>>>have
>>>all other projects extend that. For the latter, an example would be
>>>keeping common code in enrichment and having parsers declare
>>>enrichment
>>>as
>>>a dependency.  There are a couple downsides I see with this approach:
>>>
>>>- parser topology jars now bring along all the enrichment
>>>dependencies
>>>- since more code from various projects are being packaged together,
>>>version conflicts are more likely and poms become more complicated
>>>due
>>>to
>>>all the necessary exclusions
>>>
>>>My thinking is that any jar file being deployed should only contain
>>>what
>>>it needs.  Curious what others think here.  My vote would be to
>>>maintain
>>>a
>>>common project (or whatever we want to call it) and be diligent about
>>>not
>>>letting project-specific code slip in there.
>>>
>>>I believe Nick was the first person to ask the question about
>>>projects
>>>related to test code and why we would need separate test and
>>>integration
>>>test.  The reason for this is that our integration-test classes
>>>currently
>>>depend on other projects (not surprising since they are integration
>>>tests).  If there are utilities we want make available to all
>>>projects
>>>(mock classes, utilities for reading sample data, etc) then it can't
>>>live
>>>in integration-test because that will introduce circular
>>>dependencies.
>>>If
>>>it is possible to refactor our current Metron-Testing project so that
>>>it
>>>doesn't depend on any other projects, then we can keep utilities
>>>here.
>>

Re: [DISCUSS] Project reorganization

2016-04-18 Thread Ryan Merriman
Thanks Frank.  I’ve updated those in the spreadsheet.

On 4/18/16, 3:27 PM, "Frank Lu"  wrote:

>As of now, I think the following classes are not used:
>
>
> 
> 
>Metron-EnrichmentAdapters
>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
> 
> 
>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
>
>org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
>
>
>Metron-DataLoads
>org.apache.metron.dataloads.cif.HBaseTableLoad.java
>   
>
>Thanks,
>Frank Lu
>
>
>
>
>On 4/18/16, 3:05 PM, "Ryan Merriman"  wrote:
>
>>All,
>>
>>I put together a list of all the project java assets that details where
>>they will be moved (or potentially deleted) as part of the project
>>reorganization.  Feedback welcome.
>>
>>Ryan Merriman 
>>
>>On 4/13/16, 9:42 AM, "James Sirota"  wrote:
>>
>>>I would not have configs as a project but rather as a folder structure that
>>>other modules can point to
>>>
>>>Thanks,
>>>James 
>>>
>>>
>>>
>>>
>>>On 4/13/16, 7:32 AM, "Ryan Merriman"  wrote:
>>>
James brings up a good point.  I propose adding another project under
metron-platform called metron-configuration.  This would be a fairly
lightweight project that would contain anything related to
configuration
(property files, json files, flux files, etc).

On 4/13/16, 8:56 AM, "James Sirota"  wrote:

>+1 from me.
>
>I would also like to address the configs and make sure the configs are
>in
>the same place.  Do you have ideas on where we would put those?
>
>Thanks,
>James 
>
>
>
>On 4/13/16, 6:50 AM, "Ryan Merriman" 
>wrote:
>
>>Thank you for all the feedback everyone.  I will attempt to summarize
>>all
>>the input we've received and update my initial proposal.  We can
>>discuss
>>further if anyone is still unclear and I will volunteer to capture
>>all
>>the
>>details in a document of some kind once we all come to a consensus.
>>
>>Looks like everyone is in agreement for the top level projects.  Nick
>>is
>>working on a task that will require an addition top level project so
>>I
>>am
>>going to add that in as well:
>>
>>metron-deployment
>>metron-platform
>>metron-ui
>>metron-sensors
>>
>>All of these except metron-platform are well understood and don't
>>warrant
>>any more discussion.  For metron-platform there seem to be 2 areas
>>that
>>are not as clear:
>>
>>- whether we need a common project
>>- how do we organize test related code
>>
>>I agree with David and others that a common project will likely get
>>misused and could become unnecessary bloated.  But I suspect there
>>will
>>be
>>cases where we have common code being used across multiple projects
>>(is
>>already happening).  In this case we will either need this common
>>project
>>or we will have to keep common code in one of the other projects and
>>have
>>all other projects extend that. For the latter, an example would be
>>keeping common code in enrichment and having parsers declare
>>enrichment
>>as
>>a dependency.  There are a couple downsides I see with this approach:
>>
>>- parser topology jars now bring along all the enrichment
>>dependencies
>>- since more code from various projects are being packaged together,
>>version conflicts are more likely and poms become more complicated
>>due
>>to
>>all the necessary exclusions
>>
>>My thinking is that any jar file being deployed should only contain
>>what
>>it needs.  Curious what others think here.  My vote would be to
>>maintain
>>a
>>common project (or whatever we want to call it) and be diligent about
>>not
>>letting project-specific code slip in there.
>>
>>I believe Nick was the first person to ask the question about
>>projects
>>related to test code and why we would need separate test and
>>integration
>>test.  The reason for this is that our integration-test classes
>>currently
>>depend on other projects (not surprising since they are integration
>>tests).  If there are utilities we want make available to all
>>projects
>>(mock classes, utilities for reading sample data, etc) then it can't
>>live
>>in integration-test because that will introduce circular
>>dependencies.
>>If
>>it is possible to refactor our current Metron-Testing project so that
>>it
>>doesn't depend on any other projects, then we can keep utilities
>>here.
>>Otherwise we need a separate project for testing utilities.  I
>>suspect
>>removing other project dependencies from Metron-Testing will prove
>>more
>>difficult than it's worth so my vote would be to have 2 test related
>>projects.
>>
>>So here is where our metron-platform organization stands:
>>
>>metron-commo

Re: [DISCUSS] Project reorganization

2016-04-18 Thread Frank Lu
As of now, I think the following classes are not used:


 
 
Metron-EnrichmentAdapters 
  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
 
 
  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java

org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java


Metron-DataLoads
org.apache.metron.dataloads.cif.HBaseTableLoad.java


Thanks,
Frank Lu




On 4/18/16, 3:05 PM, "Ryan Merriman"  wrote:

>All,
>
>I put together a list of all the project java assets that details where
>they will be moved (or potentially deleted) as part of the project
>reorganization.  Feedback welcome.
>
>Ryan Merriman 
>
>On 4/13/16, 9:42 AM, "James Sirota"  wrote:
>
>>I would not have configs as a project but rather as a folder structure that
>>other modules can point to
>>
>>Thanks,
>>James 
>>
>>
>>
>>
>>On 4/13/16, 7:32 AM, "Ryan Merriman"  wrote:
>>
>>>James brings up a good point.  I propose adding another project under
>>>metron-platform called metron-configuration.  This would be a fairly
>>>lightweight project that would contain anything related to configuration
>>>(property files, json files, flux files, etc).
>>>
>>>On 4/13/16, 8:56 AM, "James Sirota"  wrote:
>>>
+1 from me.

I would also like to address the configs and make sure the configs are
in
the same place.  Do you have ideas on where we would put those?

Thanks,
James 



On 4/13/16, 6:50 AM, "Ryan Merriman"  wrote:

>Thank you for all the feedback everyone.  I will attempt to summarize
>all
>the input we've received and update my initial proposal.  We can
>discuss
>further if anyone is still unclear and I will volunteer to capture all
>the
>details in a document of some kind once we all come to a consensus.
>
>Looks like everyone is in agreement for the top level projects.  Nick
>is
>working on a task that will require an addition top level project so I
>am
>going to add that in as well:
>
>metron-deployment
>metron-platform
>metron-ui
>metron-sensors
>
>All of these except metron-platform are well understood and don't
>warrant
>any more discussion.  For metron-platform there seem to be 2 areas that
>are not as clear:
>
>- whether we need a common project
>- how do we organize test related code
>
>I agree with David and others that a common project will likely get
>misused and could become unnecessary bloated.  But I suspect there will
>be
>cases where we have common code being used across multiple projects (is
>already happening).  In this case we will either need this common
>project
>or we will have to keep common code in one of the other projects and
>have
>all other projects extend that. For the latter, an example would be
>keeping common code in enrichment and having parsers declare enrichment
>as
>a dependency.  There are a couple downsides I see with this approach:
>
>- parser topology jars now bring along all the enrichment dependencies
>- since more code from various projects are being packaged together,
>version conflicts are more likely and poms become more complicated due
>to
>all the necessary exclusions
>
>My thinking is that any jar file being deployed should only contain
>what
>it needs.  Curious what others think here.  My vote would be to
>maintain
>a
>common project (or whatever we want to call it) and be diligent about
>not
>letting project-specific code slip in there.
>
>I believe Nick was the first person to ask the question about projects
>related to test code and why we would need separate test and
>integration
>test.  The reason for this is that our integration-test classes
>currently
>depend on other projects (not surprising since they are integration
>tests).  If there are utilities we want make available to all projects
>(mock classes, utilities for reading sample data, etc) then it can't
>live
>in integration-test because that will introduce circular dependencies.
>If
>it is possible to refactor our current Metron-Testing project so that
>it
>doesn't depend on any other projects, then we can keep utilities here.
>Otherwise we need a separate project for testing utilities.  I suspect
>removing other project dependencies from Metron-Testing will prove more
>difficult than it's worth so my vote would be to have 2 test related
>projects.
>
>So here is where our metron-platform organization stands:
>
>metron-common *
>metron-integration-test *
>metron-test-utilities *
>metron-data-management
>metron-pcap
>metron-parsers
>metron-enrichment
>   metron-solr
>   metron-elasticsearch
>metron-api
>
>* may or may not change depending on the outcome of this discussion
>
>Thoughts?
>
>Rya

Re: [DISCUSS] Project reorganization

2016-04-18 Thread Ryan Merriman
All,

I put together a list of all the project java assets that details where
they will be moved (or potentially deleted) as part of the project
reorganization.  Feedback welcome.

Ryan Merriman 

On 4/13/16, 9:42 AM, "James Sirota"  wrote:

>I would not have configs as a project but rather as a folder structure that
>other modules can point to
>
>Thanks,
>James 
>
>
>
>
>On 4/13/16, 7:32 AM, "Ryan Merriman"  wrote:
>
>>James brings up a good point.  I propose adding another project under
>>metron-platform called metron-configuration.  This would be a fairly
>>lightweight project that would contain anything related to configuration
>>(property files, json files, flux files, etc).
>>
>>On 4/13/16, 8:56 AM, "James Sirota"  wrote:
>>
>>>+1 from me.
>>>
>>>I would also like to address the configs and make sure the configs are
>>>in
>>>the same place.  Do you have ideas on where we would put those?
>>>
>>>Thanks,
>>>James 
>>>
>>>
>>>
>>>On 4/13/16, 6:50 AM, "Ryan Merriman"  wrote:
>>>
Thank you for all the feedback everyone.  I will attempt to summarize
all
the input we've received and update my initial proposal.  We can
discuss
further if anyone is still unclear and I will volunteer to capture all
the
details in a document of some kind once we all come to a consensus.

Looks like everyone is in agreement for the top level projects.  Nick
is
working on a task that will require an additional top level project so I
am
going to add that in as well:

metron-deployment
metron-platform
metron-ui
metron-sensors

All of these except metron-platform are well understood and don't
warrant
any more discussion.  For metron-platform there seem to be 2 areas that
are not as clear:

- whether we need a common project
- how do we organize test related code

I agree with David and others that a common project will likely get
misused and could become unnecessarily bloated.  But I suspect there will
be
cases where we have common code being used across multiple projects (is
already happening).  In this case we will either need this common
project
or we will have to keep common code in one of the other projects and
have
all other projects extend that. For the latter, an example would be
keeping common code in enrichment and having parsers declare enrichment
as
a dependency.  There are a couple downsides I see with this approach:

- parser topology jars now bring along all the enrichment dependencies
- since more code from various projects is being packaged together,
version conflicts are more likely and poms become more complicated due
to
all the necessary exclusions

My thinking is that any jar file being deployed should only contain
what
it needs.  Curious what others think here.  My vote would be to
maintain
a
common project (or whatever we want to call it) and be diligent about
not
letting project-specific code slip in there.
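
The trade-off above can be sketched in a few lines.  This is only an
illustration under stated assumptions: the dependency graphs are hypothetical,
and "hbase-client" and "geoip-lib" are stand-in names for whatever third-party
libraries enrichment actually pulls in.  It compares what a parser jar would
carry when shared code lives in enrichment versus in a small common project.

```python
from typing import Dict, List, Set


def transitive_deps(module: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return everything reachable from `module` by following dependencies."""
    seen: Set[str] = set()
    stack = list(graph.get(module, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


# Hypothetical layout A: shared code is kept in metron-enrichment.
shared_in_enrichment = {
    "metron-parsers": ["metron-enrichment"],
    "metron-enrichment": ["hbase-client", "geoip-lib"],
}

# Hypothetical layout B: shared code is kept in a small metron-common.
shared_in_common = {
    "metron-parsers": ["metron-common"],
    "metron-enrichment": ["metron-common", "hbase-client", "geoip-lib"],
    "metron-common": [],
}

parsers_a = transitive_deps("metron-parsers", shared_in_enrichment)
parsers_b = transitive_deps("metron-parsers", shared_in_common)
print(sorted(parsers_a))
print(sorted(parsers_b))
```

In layout A the parser jar inherits every enrichment dependency; in layout B
it inherits only the common project, as long as common itself stays lean.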

I believe Nick was the first person to ask the question about projects
related to test code and why we would need separate test and
integration
test.  The reason for this is that our integration-test classes
currently
depend on other projects (not surprising since they are integration
tests).  If there are utilities we want to make available to all projects
(mock classes, utilities for reading sample data, etc) then it can't
live
in integration-test because that will introduce circular dependencies.
If
it is possible to refactor our current Metron-Testing project so that
it
doesn't depend on any other projects, then we can keep utilities here.
Otherwise we need a separate project for testing utilities.  I suspect
removing other project dependencies from Metron-Testing will prove more
difficult than it's worth so my vote would be to have 2 test related
projects.
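
The circular dependency concern can be shown with a small cycle check.  This
is a sketch only: the graphs below are hypothetical simplifications of the
module layout discussed here, and find_cycle is an illustrative helper, not
part of any build tool.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical: shared mocks live in integration-test, which parsers then need.
utilities_in_integration_test = {
    "metron-integration-test": ["metron-parsers", "metron-enrichment"],
    "metron-parsers": ["metron-integration-test"],
    "metron-enrichment": [],
}

# Hypothetical: shared mocks live in a separate project with no dependencies.
separate_test_utilities = {
    "metron-test-utilities": [],
    "metron-parsers": ["metron-test-utilities"],
    "metron-enrichment": ["metron-test-utilities"],
    "metron-integration-test": [
        "metron-parsers", "metron-enrichment", "metron-test-utilities",
    ],
}

print(find_cycle(utilities_in_integration_test))
print(find_cycle(separate_test_utilities))
```

The first layout reports a loop between integration-test and parsers; the
second has none, which is the argument for two test related projects.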

So here is where our metron-platform organization stands:

metron-common *
metron-integration-test *
metron-test-utilities *
metron-data-management
metron-pcap
metron-parsers
metron-enrichment
metron-solr
metron-elasticsearch
metron-api

* may or may not change depending on the outcome of this discussion

Thoughts?

Ryan Merriman


On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:

>If you load up your Irc client just type
>/join #apache-metron-dev
>
>Sent from my iPhone
>
>> On Apr 11, 2016, at 12:06 PM, James Sirota 
>>wrote:
>> 
>> Great, thanks, Debo.  Where can I find instructions on how to get to
>>it?
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" 
>>>wrote:
>>> 
>>> Hi James 
>>> 
>>> Ok set it up and ack ...
>>> 
>>> Thx
>>

Re: [DISCUSS] Project reorganization

2016-04-13 Thread James Sirota
I would not have configs as a project but rather as a folder structure that other
modules can point to 

Thanks,
James 




On 4/13/16, 7:32 AM, "Ryan Merriman"  wrote:

>James brings up a good point.  I propose adding another project under
>metron-platform called metron-configuration.  This would be a fairly
>lightweight project that would contain anything related to configuration
>(property files, json files, flux files, etc).
>
>On 4/13/16, 8:56 AM, "James Sirota"  wrote:
>
>>+1 from me.
>>
>>I would also like to address the configs and make sure the configs are in
>>the same place.  Do you have ideas on where we would put those?
>>
>>Thanks,
>>James 
>>
>>
>>
>>On 4/13/16, 6:50 AM, "Ryan Merriman"  wrote:
>>
>>>Thank you for all the feedback everyone.  I will attempt to summarize all
>>>the input we've received and update my initial proposal.  We can discuss
>>>further if anyone is still unclear and I will volunteer to capture all
>>>the
>>>details in a document of some kind once we all come to a consensus.
>>>
>>>Looks like everyone is in agreement for the top level projects.  Nick is
>>>working on a task that will require an addition top level project so I am
>>>going to add that in as well:
>>>
>>>metron-deployment
>>>metron-platform
>>>metron-ui
>>>metron-sensors
>>>
>>>All of these except metron-platform are well understood and don't warrant
>>>any more discussion.  For metron-platform there seem to be 2 areas that
>>>are not as clear:
>>>
>>>- whether we need a common project
>>>- how do we organize test related code
>>>
>>>I agree with David and others that a common project will likely get
>>>misused and could become unnecessary bloated.  But I suspect there will
>>>be
>>>cases where we have common code being used across multiple projects (is
>>>already happening).  In this case we will either need this common project
>>>or we will have to keep common code in one of the other projects and have
>>>all other projects extend that. For the latter, an example would be
>>>keeping common code in enrichment and having parsers declare enrichment
>>>as
>>>a dependency.  There are a couple downsides I see with this approach:
>>>
>>>- parser topology jars now bring along all the enrichment dependencies
>>>- since more code from various projects are being packaged together,
>>>version conflicts are more likely and poms become more complicated due to
>>>all the necessary exclusions
>>>
>>>My thinking is that any jar file being deployed should only contain what
>>>it needs.  Curious what others think here.  My vote would be to maintain
>>>a
>>>common project (or whatever we want to call it) and be diligent about not
>>>letting project-specific code slip in there.
>>>
>>>I believe Nick was the first person to ask the question about projects
>>>related to test code and why we would need separate test and integration
>>>test.  The reason for this is that our integration-test classes currently
>>>depend on other projects (not surprising since they are integration
>>>tests).  If there are utilities we want make available to all projects
>>>(mock classes, utilities for reading sample data, etc) then it can't live
>>>in integration-test because that will introduce circular dependencies.
>>>If
>>>it is possible to refactor our current Metron-Testing project so that it
>>>doesn't depend on any other projects, then we can keep utilities here.
>>>Otherwise we need a separate project for testing utilities.  I suspect
>>>removing other project dependencies from Metron-Testing will prove more
>>>difficult than it's worth so my vote would be to have 2 test related
>>>projects.
>>>
>>>So here is where our metron-platform organization stands:
>>>
>>>metron-common *
>>>metron-integration-test *
>>>metron-test-utilities *
>>>metron-data-management
>>>metron-pcap
>>>metron-parsers
>>>metron-enrichment
>>> metron-solr
>>> metron-elasticsearch
>>>metron-api
>>>
>>>* may or may not change depending on the outcome of this discussion
>>>
>>>Thoughts?
>>>
>>>Ryan Merriman
>>>
>>>
>>>On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:
>>>
If you load up your Irc client just type
/join #apache-metron-dev

Sent from my iPhone

> On Apr 11, 2016, at 12:06 PM, James Sirota 
>wrote:
> 
> Great, thanks, Debo.  Where can I find instructions on how to get to
>it?
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" 
>>wrote:
>> 
>> Hi James 
>> 
>> Ok set it up and ack …..
>> 
>> Thx
>> 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 6:31 PM, "James Sirota"  wrote:
>>> 
>>> Hi Debo,
>>> 
>>> I think it would be great if you set it up
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
 
 I have set it up for another open source effort in the past and it
was not very hard. Am happy to voluntee

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Ryan Merriman
James brings up a good point.  I propose adding another project under
metron-platform called metron-configuration.  This would be a fairly
lightweight project that would contain anything related to configuration
(property files, JSON files, Flux files, etc.).

On 4/13/16, 8:56 AM, "James Sirota"  wrote:

>+1 from me.
>
>I would also like to address the configs and make sure the configs are in
>the same place.  Do you have ideas on where we would put those?
>
>Thanks,
>James 
>
>
>
>On 4/13/16, 6:50 AM, "Ryan Merriman"  wrote:
>
>>Thank you for all the feedback everyone.  I will attempt to summarize all
>>the input we’ve received and update my initial proposal.  We can discuss
>>further if anyone is still unclear and I will volunteer to capture all
>>the
>>details in a document of some kind once we all come to a consensus.
>>
>>Looks like everyone is in agreement for the top level projects.  Nick is
>>working on a task that will require an additional top-level project so I am
>>going to add that in as well:
>>
>>metron-deployment
>>metron-platform
>>metron-ui
>>metron-sensors
>>
>>All of these except metron-platform are well understood and don’t warrant
>>any more discussion.  For metron-platform there seem to be 2 areas that
>>are not as clear:
>>
>>- whether we need a common project
>>- how do we organize test related code
>>
>>I agree with David and others that a common project will likely get
>>misused and could become unnecessarily bloated.  But I suspect there will
>>be
>>cases where we have common code being used across multiple projects (is
>>already happening).  In this case we will either need this common project
>>or we will have to keep common code in one of the other projects and have
>>all other projects extend that. For the latter, an example would be
>>keeping common code in enrichment and having parsers declare enrichment
>>as
>>a dependency.  There are a couple downsides I see with this approach:
>>
>>- parser topology jars now bring along all the enrichment dependencies
>>- since more code from various projects is being packaged together,
>>version conflicts are more likely and poms become more complicated due to
>>all the necessary exclusions
>>
>>My thinking is that any jar file being deployed should only contain what
>>it needs.  Curious what others think here.  My vote would be to maintain
>>a
>>common project (or whatever we want to call it) and be diligent about not
>>letting project-specific code slip in there.
>>
>>I believe Nick was the first person to ask the question about projects
>>related to test code and why we would need separate test and integration
>>test.  The reason for this is that our integration-test classes currently
>>depend on other projects (not surprising since they are integration
>>tests).  If there are utilities we want to make available to all projects
>>(mock classes, utilities for reading sample data, etc.) then they can’t live
>>in integration-test because that will introduce circular dependencies.  If
>>it is possible to refactor our current Metron-Testing project so that it
>>doesn’t depend on any other projects, then we can keep utilities here.
>>Otherwise we need a separate project for testing utilities.  I suspect
>>removing other project dependencies from Metron-Testing will prove more
>>difficult than it’s worth, so my vote would be to have 2 test-related
>>projects.
>>
>>So here is where our metron-platform organization stands:
>>
>>metron-common *
>>metron-integration-test *
>>metron-test-utilities *
>>metron-data-management
>>metron-pcap
>>metron-parsers
>>metron-enrichment
>>  metron-solr
>>  metron-elasticsearch
>>metron-api
>>
>>* may or may not change depending on the outcome of this discussion
>>
>>Thoughts?
>>
>>Ryan Merriman
>>
>>
>>On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:
>>
>>>If you load up your Irc client just type
>>>/join #apache-metron-dev
>>>
>>>Sent from my iPhone
>>>
 On Apr 11, 2016, at 12:06 PM, James Sirota 
wrote:
 
 Great, thanks, Debo.  Where can I find instructions on how to get to
it?
 
 Thanks,
 James 
 
 
 
 
> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" 
>wrote:
> 
> Hi James 
> 
> Ok set it up and ack …..
> 
> Thx
> 
> 
> 
> 
> 
>> On 4/10/16, 6:31 PM, "James Sirota"  wrote:
>> 
>> Hi Debo,
>> 
>> I think it would be great if you set it up
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>>> 
>>> I have set it up for another open source effort in the past and it
>>>was not very hard. Am happy to volunteer if needed.
>>> 
>>> Thx 
>>> Debo
>>> 
>>> Sent from my iPhone
>>> 
 On Apr 10, 2016, at 5:53 PM, James Sirota

wrote:
 
 I’d be open to an IRC channel.  Does anyone know if Apache allows
this?  If yes, does anyone know how to set one up

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Nick Allen
+1 I like it.

On Wed, Apr 13, 2016 at 9:59 AM, Ryan Merriman 
wrote:

> To answer a couple of other questions people asked:
>
> Debo, agreed having clear extension points is going to be extremely
> important for us.  Currently we have well defined interfaces for parsers
> and enrichment adapters as well as the ability to load data into and drive
> enrichments (threat intels) from HBase tables with well defined key
> structures.  Eventually we will want to extend this to models.  Maybe an
> analytical project makes sense when we get to that point?
>
> Debo and James, yes my vision for the metron-api project is a standard
> interface for interacting with Metron.  This would include everything from
> data access (pcap service) to security and beyond.
>
> David, let’s explore the best way to leverage the dependencyManagement
> section in our top level pom.  I think you’re on to something there.  Our
> Maven implementation needs a thorough review as well.
>
> Ryan Merriman
>
>
>
> On 4/13/16, 8:50 AM, "Ryan Merriman"  wrote:
>
> >Thank you for all the feedback everyone.  I will attempt to summarize all
> >the input we’ve received and update my initial proposal.  We can discuss
> >further if anyone is still unclear and I will volunteer to capture all the
> >details in a document of some kind once we all come to a consensus.
> >
> >Looks like everyone is in agreement for the top level projects.  Nick is
> >working on a task that will require an additional top-level project so I am
> >going to add that in as well:
> >
> >metron-deployment
> >metron-platform
> >metron-ui
> >metron-sensors
> >
> >All of these except metron-platform are well understood and don’t warrant
> >any more discussion.  For metron-platform there seem to be 2 areas that
> >are not as clear:
> >
> >- whether we need a common project
> >- how do we organize test related code
> >
> >I agree with David and others that a common project will likely get
> >misused and could become unnecessarily bloated.  But I suspect there will be
> >cases where we have common code being used across multiple projects (is
> >already happening).  In this case we will either need this common project
> >or we will have to keep common code in one of the other projects and have
> >all other projects extend that. For the latter, an example would be
> >keeping common code in enrichment and having parsers declare enrichment as
> >a dependency.  There are a couple downsides I see with this approach:
> >
> >- parser topology jars now bring along all the enrichment dependencies
> >- since more code from various projects is being packaged together,
> >version conflicts are more likely and poms become more complicated due to
> >all the necessary exclusions
> >
> >My thinking is that any jar file being deployed should only contain what
> >it needs.  Curious what others think here.  My vote would be to maintain a
> >common project (or whatever we want to call it) and be diligent about not
> >letting project-specific code slip in there.
> >
> >I believe Nick was the first person to ask the question about projects
> >related to test code and why we would need separate test and integration
> >test.  The reason for this is that our integration-test classes currently
> >depend on other projects (not surprising since they are integration
> >tests).  If there are utilities we want to make available to all projects
> >(mock classes, utilities for reading sample data, etc.) then they can’t live
> >in integration-test because that will introduce circular dependencies.  If
> >it is possible to refactor our current Metron-Testing project so that it
> >doesn’t depend on any other projects, then we can keep utilities here.
> >Otherwise we need a separate project for testing utilities.  I suspect
> >removing other project dependencies from Metron-Testing will prove more
> >difficult than it’s worth, so my vote would be to have 2 test-related
> >projects.
> >
> >So here is where our metron-platform organization stands:
> >
> >metron-common *
> >metron-integration-test *
> >metron-test-utilities *
> >metron-data-management
> >metron-pcap
> >metron-parsers
> >metron-enrichment
> >   metron-solr
> >   metron-elasticsearch
> >metron-api
> >
> >* may or may not change depending on the outcome of this discussion
> >
> >Thoughts?
> >
> >Ryan Merriman
> >
> >
> >On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:
> >
> >>If you load up your Irc client just type
> >>/join #apache-metron-dev
> >>
> >>Sent from my iPhone
> >>
> >>> On Apr 11, 2016, at 12:06 PM, James Sirota 
> >>>wrote:
> >>>
> >>> Great, thanks, Debo.  Where can I find instructions on how to get to
> >>>it?
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>>
> >>>
> >>>
>  On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" 
> wrote:
> 
>  Hi James
> 
>  Ok set it up and ack …..
> 
>  Thx
> 
> 
> 
> 
> 
> > On 4/10/16, 6:31 PM, "James Sirota"  wrote:
> >
> > Hi Debo,
> >
> > I think it would be 

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Ryan Merriman
To answer a couple of other questions people asked:

Debo, agreed having clear extension points is going to be extremely
important for us.  Currently we have well defined interfaces for parsers
and enrichment adapters as well as the ability to load data into and drive
enrichments (threat intels) from HBase tables with well defined key
structures.  Eventually we will want to extend this to models.  Maybe an
analytical project makes sense when we get to that point?

Debo and James, yes my vision for the metron-api project is a standard
interface for interacting with Metron.  This would include everything from
data access (pcap service) to security and beyond.

David, let’s explore the best way to leverage the dependencyManagement
section in our top level pom.  I think you’re on to something there.  Our
Maven implementation needs a thorough review as well.

Ryan Merriman



On 4/13/16, 8:50 AM, "Ryan Merriman"  wrote:

>Thank you for all the feedback everyone.  I will attempt to summarize all
>the input we’ve received and update my initial proposal.  We can discuss
>further if anyone is still unclear and I will volunteer to capture all the
>details in a document of some kind once we all come to a consensus.
>
>Looks like everyone is in agreement for the top level projects.  Nick is
>working on a task that will require an additional top-level project so I am
>going to add that in as well:
>
>metron-deployment
>metron-platform
>metron-ui
>metron-sensors
>
>All of these except metron-platform are well understood and don’t warrant
>any more discussion.  For metron-platform there seem to be 2 areas that
>are not as clear: 
>
>- whether we need a common project
>- how do we organize test related code
>
>I agree with David and others that a common project will likely get
>misused and could become unnecessarily bloated.  But I suspect there will be
>cases where we have common code being used across multiple projects (is
>already happening).  In this case we will either need this common project
>or we will have to keep common code in one of the other projects and have
>all other projects extend that. For the latter, an example would be
>keeping common code in enrichment and having parsers declare enrichment as
>a dependency.  There are a couple downsides I see with this approach:
>
>- parser topology jars now bring along all the enrichment dependencies
>- since more code from various projects is being packaged together,
>version conflicts are more likely and poms become more complicated due to
>all the necessary exclusions
>
>My thinking is that any jar file being deployed should only contain what
>it needs.  Curious what others think here.  My vote would be to maintain a
>common project (or whatever we want to call it) and be diligent about not
>letting project-specific code slip in there.
>
>I believe Nick was the first person to ask the question about projects
>related to test code and why we would need separate test and integration
>test.  The reason for this is that our integration-test classes currently
>depend on other projects (not surprising since they are integration
>tests).  If there are utilities we want to make available to all projects
>(mock classes, utilities for reading sample data, etc.) then they can’t live
>in integration-test because that will introduce circular dependencies.  If
>it is possible to refactor our current Metron-Testing project so that it
>doesn’t depend on any other projects, then we can keep utilities here.
>Otherwise we need a separate project for testing utilities.  I suspect
>removing other project dependencies from Metron-Testing will prove more
>difficult than it’s worth, so my vote would be to have 2 test-related
>projects.
>
>So here is where our metron-platform organization stands:
>
>metron-common *
>metron-integration-test *
>metron-test-utilities *
>metron-data-management
>metron-pcap
>metron-parsers
>metron-enrichment
>   metron-solr
>   metron-elasticsearch
>metron-api
>
>* may or may not change depending on the outcome of this discussion
>
>Thoughts?
>
>Ryan Merriman
>
>
>On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:
>
>>If you load up your Irc client just type
>>/join #apache-metron-dev
>>
>>Sent from my iPhone
>>
>>> On Apr 11, 2016, at 12:06 PM, James Sirota 
>>>wrote:
>>> 
>>> Great, thanks, Debo.  Where can I find instructions on how to get to
>>>it?
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:
 
 Hi James 
 
 Ok set it up and ack …..
 
 Thx
 
 
 
 
 
> On 4/10/16, 6:31 PM, "James Sirota"  wrote:
> 
> Hi Debo,
> 
> I think it would be great if you set it up
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>> 
>> I have set it up for another open source effort in the past and it
>>was not very hard. Am happy to volunteer if needed.
>> 
>> Thx 
>> Debo
>

Re: [DISCUSS] Project reorganization

2016-04-13 Thread James Sirota
+1 from me.

I would also like to address the configs and make sure the configs are in the 
same place.  Do you have ideas on where we would put those?

Thanks,
James 



On 4/13/16, 6:50 AM, "Ryan Merriman"  wrote:

>Thank you for all the feedback everyone.  I will attempt to summarize all
>the input we’ve received and update my initial proposal.  We can discuss
>further if anyone is still unclear and I will volunteer to capture all the
>details in a document of some kind once we all come to a consensus.
>
>Looks like everyone is in agreement for the top level projects.  Nick is
>working on a task that will require an additional top-level project so I am
>going to add that in as well:
>
>metron-deployment
>metron-platform
>metron-ui
>metron-sensors
>
>All of these except metron-platform are well understood and don’t warrant
>any more discussion.  For metron-platform there seem to be 2 areas that
>are not as clear: 
>
>- whether we need a common project
>- how do we organize test related code
>
>I agree with David and others that a common project will likely get
>misused and could become unnecessarily bloated.  But I suspect there will be
>cases where we have common code being used across multiple projects (is
>already happening).  In this case we will either need this common project
>or we will have to keep common code in one of the other projects and have
>all other projects extend that. For the latter, an example would be
>keeping common code in enrichment and having parsers declare enrichment as
>a dependency.  There are a couple downsides I see with this approach:
>
>- parser topology jars now bring along all the enrichment dependencies
>- since more code from various projects is being packaged together,
>version conflicts are more likely and poms become more complicated due to
>all the necessary exclusions
>
>My thinking is that any jar file being deployed should only contain what
>it needs.  Curious what others think here.  My vote would be to maintain a
>common project (or whatever we want to call it) and be diligent about not
>letting project-specific code slip in there.
>
>I believe Nick was the first person to ask the question about projects
>related to test code and why we would need separate test and integration
>test.  The reason for this is that our integration-test classes currently
>depend on other projects (not surprising since they are integration
>tests).  If there are utilities we want to make available to all projects
>(mock classes, utilities for reading sample data, etc.) then they can’t live
>in integration-test because that will introduce circular dependencies.  If
>it is possible to refactor our current Metron-Testing project so that it
>doesn’t depend on any other projects, then we can keep utilities here.
>Otherwise we need a separate project for testing utilities.  I suspect
>removing other project dependencies from Metron-Testing will prove more
>difficult than it’s worth, so my vote would be to have 2 test-related
>projects.
>
>So here is where our metron-platform organization stands:
>
>metron-common *
>metron-integration-test *
>metron-test-utilities *
>metron-data-management
>metron-pcap
>metron-parsers
>metron-enrichment
>   metron-solr
>   metron-elasticsearch
>metron-api
>
>* may or may not change depending on the outcome of this discussion
>
>Thoughts?
>
>Ryan Merriman
>
>
>On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:
>
>>If you load up your Irc client just type
>>/join #apache-metron-dev
>>
>>Sent from my iPhone
>>
>>> On Apr 11, 2016, at 12:06 PM, James Sirota 
>>>wrote:
>>> 
>>> Great, thanks, Debo.  Where can I find instructions on how to get to it?
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:
 
 Hi James 
 
 Ok set it up and ack …..
 
 Thx
 
 
 
 
 
> On 4/10/16, 6:31 PM, "James Sirota"  wrote:
> 
> Hi Debo,
> 
> I think it would be great if you set it up
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>> 
>> I have set it up for another open source effort in the past and it
>>was not very hard. Am happy to volunteer if needed.
>> 
>> Thx 
>> Debo
>> 
>> Sent from my iPhone
>> 
>>> On Apr 10, 2016, at 5:53 PM, James Sirota 
>>>wrote:
>>> 
>>> I’d be open to an IRC channel.  Does anyone know if Apache allows
>>>this?  If yes, does anyone know how to set one up?
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
 
 Hi Nick 
 
 I like your suggestions. For the enrichment layer do you think it
would also include any advanced analytics. Else we might want to
have an analytics layer.
 
 It would be good to have an arch which could be extended for new
functionality.
 

Re: [DISCUSS] Project reorganization

2016-04-13 Thread Ryan Merriman
Thank you for all the feedback everyone.  I will attempt to summarize all
the input we’ve received and update my initial proposal.  We can discuss
further if anyone is still unclear and I will volunteer to capture all the
details in a document of some kind once we all come to a consensus.

Looks like everyone is in agreement for the top level projects.  Nick is
working on a task that will require an additional top-level project so I am
going to add that in as well:

metron-deployment
metron-platform
metron-ui
metron-sensors

All of these except metron-platform are well understood and don’t warrant
any more discussion.  For metron-platform there seem to be 2 areas that
are not as clear: 

- whether we need a common project
- how do we organize test related code

I agree with David and others that a common project will likely get
misused and could become unnecessarily bloated.  But I suspect there will be
cases where we have common code being used across multiple projects (is
already happening).  In this case we will either need this common project
or we will have to keep common code in one of the other projects and have
all other projects extend that. For the latter, an example would be
keeping common code in enrichment and having parsers declare enrichment as
a dependency.  There are a couple downsides I see with this approach:

- parser topology jars now bring along all the enrichment dependencies
- since more code from various projects is being packaged together,
version conflicts are more likely and poms become more complicated due to
all the necessary exclusions

My thinking is that any jar file being deployed should only contain what
it needs.  Curious what others think here.  My vote would be to maintain a
common project (or whatever we want to call it) and be diligent about not
letting project-specific code slip in there.
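
To make the jar-size downside concrete, here is a rough Python sketch that
computes what a module drags in transitively under each option.  The
metron-* names are from this thread; the two third-party library names and
the exact dependency edges are made up for illustration only.

```python
# Hypothetical module graphs: keys are modules, values are direct dependencies.
# The third-party names (geo-lookup-lib, hbase-client-lib) are placeholders.

def transitive_deps(graph, module):
    """Return every dependency reachable from `module`, direct or indirect."""
    seen = set()
    stack = list(graph.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen

# Option 1: common code lives in metron-enrichment and parsers depend on it.
without_common = {
    "metron-parsers": ["metron-enrichment"],
    "metron-enrichment": ["geo-lookup-lib", "hbase-client-lib"],
}

# Option 2: shared code is pulled out into a small metron-common project.
with_common = {
    "metron-parsers": ["metron-common"],
    "metron-enrichment": ["metron-common", "geo-lookup-lib", "hbase-client-lib"],
    "metron-common": [],
}

print(sorted(transitive_deps(without_common, "metron-parsers")))
print(sorted(transitive_deps(with_common, "metron-parsers")))
```

Under these assumed edges, the first option pulls every enrichment
dependency into the parser jar, while the second pulls in only
metron-common.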

I believe Nick was the first person to ask the question about projects
related to test code and why we would need separate test and integration
test.  The reason for this is that our integration-test classes currently
depend on other projects (not surprising since they are integration
tests).  If there are utilities we want to make available to all projects
(mock classes, utilities for reading sample data, etc.) then they can’t live
in integration-test because that will introduce circular dependencies.  If
it is possible to refactor our current Metron-Testing project so that it
doesn’t depend on any other projects, then we can keep utilities here.
Otherwise we need a separate project for testing utilities.  I suspect
removing other project dependencies from Metron-Testing will prove more
difficult than it’s worth, so my vote would be to have 2 test-related
projects.
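
The circular dependency problem can be sketched with a small cycle check
in Python.  The dependency edges below are assumptions chosen to mirror
the two layouts being discussed, not the actual contents of our poms, and
find_cycle is a hypothetical helper written for this example.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Assumed layout 1: shared test utilities live inside metron-integration-test.
single_test_project = {
    "metron-integration-test": ["metron-parsers", "metron-enrichment"],
    "metron-parsers": ["metron-integration-test"],
    "metron-enrichment": ["metron-integration-test"],
}

# Assumed layout 2: utilities move to a dependency-free metron-test-utilities.
two_test_projects = {
    "metron-integration-test": ["metron-parsers", "metron-enrichment",
                                "metron-test-utilities"],
    "metron-parsers": ["metron-test-utilities"],
    "metron-enrichment": ["metron-test-utilities"],
    "metron-test-utilities": [],
}

print(find_cycle(single_test_project))
print(find_cycle(two_test_projects))
```

With these edges the first layout reports a loop between
metron-integration-test and metron-parsers, and the second layout reports
none, which is the argument for keeping two test-related projects.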

So here is where our metron-platform organization stands:

metron-common *
metron-integration-test *
metron-test-utilities *
metron-data-management
metron-pcap
metron-parsers
metron-enrichment
metron-solr
metron-elasticsearch
metron-api

* may or may not change depending on the outcome of this discussion

Thoughts?

Ryan Merriman


On 4/11/16, 4:15 PM, "Debojyoti Dutta"  wrote:

>If you load up your Irc client just type
>/join #apache-metron-dev
>
>Sent from my iPhone
>
>> On Apr 11, 2016, at 12:06 PM, James Sirota 
>>wrote:
>> 
>> Great, thanks, Debo.  Where can I find instructions on how to get to it?
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:
>>> 
>>> Hi James 
>>> 
>>> Ok set it up and ack …..
>>> 
>>> Thx
>>> 
>>> 
>>> 
>>> 
>>> 
 On 4/10/16, 6:31 PM, "James Sirota"  wrote:
 
 Hi Debo,
 
 I think it would be great if you set it up
 
 Thanks,
 James 
 
 
 
 
> On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
> 
> I have set it up for another open source effort in the past and it
>was not very hard. Am happy to volunteer if needed.
> 
> Thx 
> Debo
> 
> Sent from my iPhone
> 
>> On Apr 10, 2016, at 5:53 PM, James Sirota 
>>wrote:
>> 
>> I’d be open to an IRC channel.  Does anyone know if Apache allows
>>this?  If yes, does anyone know how to set one up?
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>>> 
>>> Hi Nick 
>>> 
>>> I like your suggestions. For the enrichment layer do you think it
>>>would also include any advanced analytics. Else we might want to
>>>have an analytics layer.
>>> 
>>> It would be good to have an arch which could be extended for new
>>>functionality.
>>> 
>>> However Ryan's suggestion of the ui API and deployer also makes
>>>sense. 
>>> 
>>> Should we have an IRC channel to discuss this or maybe etherpad?
>>> 
>>> Debo
>>> 
>>> Sent from my iPhone
>>> 
 On Apr 10, 2016, at 4:36 PM, Nick Allen 
wrote:
 
 It might help to think of our code base as 

Re: [DISCUSS] Project reorganization

2016-04-11 Thread Debojyoti Dutta
If you load up your IRC client, just type
/join #apache-metron-dev

Sent from my iPhone

> On Apr 11, 2016, at 12:06 PM, James Sirota  wrote:
> 
> Great, thanks, Debo.  Where can I find instructions on how to get to it?
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:
>> 
>> Hi James 
>> 
>> Ok set it up and ack ….. 
>> 
>> Thx
>> 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 6:31 PM, "James Sirota"  wrote:
>>> 
>>> Hi Debo,
>>> 
>>> I think it would be great if you set it up
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
 
 I have set it up for another open source effort in the past and it was not 
 very hard. Am happy to volunteer if needed. 
 
 Thx 
 Debo
 
 Sent from my iPhone
 
> On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
> 
> I’d be open to an IRC channel.  Does anyone know if Apache allows this?  
> If yes, does anyone know how to set one up?
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>> 
>> Hi Nick 
>> 
>> I like your suggestions. For the enrichment layer do you think it would 
>> also include any advanced analytics. Else we might want to have an 
>> analytics layer. 
>> 
>> It would be good to have an arch which could be extended for new 
>> functionality. 
>> 
>> However Ryan's suggestion of the ui API and deployer also makes sense. 
>> 
>> Should we have an IRC channel to discuss this or maybe etherpad?
>> 
>> Debo
>> 
>> Sent from my iPhone
>> 
>>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>>> 
>>> It might help to think of our code base as four separate types of
>>> functionality.  This is primarily meant to give us a framework to think
>>> about the organization of Metron (and drive more discussion), rather 
>>> than
>>> my proposal for a specific structure.
>>> 
>>> - Sensor - Anything that captures external, non-streaming data and
>>> presents it in a form ready for stream processing.
>>> - Input - Responsible for preparing streaming data for enrichment.  The
>>> existing "parsers" fit neatly into this space.
>>> - Enrichment - Responsible for enriching an incoming data feed like
>>> geoip, asset enrichment, threat intel lookups, etc.
>>> - Output - Responsible for persisting data that has been processed by
>>> Metron which obviously means search indexers or data stores.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>>> 
>>> wrote:
>>> 
 All,
 
 I would like to propose a review and refactor of the current project
 organization within Metron.  Much of the way the legacy code was 
 organized
 does not make sense anymore and could be designed so that it is easier 
 to
 navigate and understand.  Our test coverage has increased 
 substantially so
 I believe we can do this with confidence.
 
 First off, I think we should agree on a naming convention.  I see some
 projects (YARN and Storm for example) that prepend the sub-project 
 with the
 name of the top-level project (storm-core for example).  Metron also
 currently does this (Metron-Common).  I think that's fine, although in 
 the
 case of Metron, I feel like having "Metron" prepended is redundant.
 Regardless of whether we decide to stick with that approach, I propose 
 that
 project names be uniform and lowercase.  For example, under these
 assumptions "Metron-Common" would change to "common".
 
 The first level of organization makes sense to me.  Only change I would
 make would be to project names:
 
 *   deployment
 *   streaming
 *   ui
 
 Or if we want to keep metron in project names:
 
 *   metron-deployment
 *   metron-streaming
 *   metron-ui
 
 For now I don't see any changes necessary in deployment or ui
 organization.  I see the streaming project structure primarily driven 
 by 2
 things:  the Maven dependency tree and deployment targets.  For 
 example,
 solr and elasticsearch code should be separated (because their 
 dependency
 on lucene conflicts) but both will depend on common enrichment code.  
 Also,
 now that parser, enrichment and pcap topologies are separate, code for
 those topologies will be deployed as separate jars.  No reason to 
 include
 parser code in enrichment topologies and vice-versa.  Any other
 considerations I'm missing?
 
 With that being said, here is my initial proposal:
 
>>>

Re: [DISCUSS] Project reorganization

2016-04-11 Thread Debo Dutta (dedutta)
I just registered two channels under my nick (you need to be registered with
freenode to create a channel). I am on a Mac now and Textual 5 works for me.
These are open channels.


[13:10:20]  -ChanServ-  #apache-metron is now registered to ddutta. 

[13:10:20]  -ChanServ-  

[13:10:20]  -ChanServ-  Channel guidelines can be found on the freenode 
website 

[13:10:20]  -ChanServ-  (http://freenode.net/channel_guidelines.shtml). 

[13:10:20]  -ChanServ-  This is a primary namespace channel as per  

[13:10:20]  -ChanServ-  http://freenode.net/policy.shtml#primarychannels

[13:10:20]  -ChanServ-  If you do not own this name, please consider

[13:10:20]  -ChanServ-  dropping #apache-metron and using 
##apache-metron instead.




[13:11:19]  register #apache-metron-dev

[13:11:19]  -ChanServ-  #apache-metron-dev is now registered to ddutta. 

[13:11:19]  -ChanServ-  

[13:11:19]  -ChanServ-  Channel guidelines can be found on the freenode 
website 

[13:11:19]  -ChanServ-  (http://freenode.net/channel_guidelines.shtml). 

[13:11:19]  -ChanServ-  This is a primary namespace channel as per  

[13:11:19]  -ChanServ-  http://freenode.net/policy.shtml#primarychannels

[13:11:19]  -ChanServ-  If you do not own this name, please consider

[13:11:19]  -ChanServ-  dropping #apache-metron-dev and using 
##apache-metron-dev instead.











On 4/11/16, 12:06 PM, "James Sirota"  wrote:

>Great, thanks, Debo.  Where can I find instructions on how to get to it?
>
>Thanks,
>James 
>
>
>
>
>On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:
>
>>Hi James 
>>
>>Ok set it up and ack ….. 
>>
>>Thx
>>
>>
>>
>>
>>
>>On 4/10/16, 6:31 PM, "James Sirota"  wrote:
>>
>>>Hi Debo,
>>>
>>>I think it would be great if you set it up
>>>
>>>Thanks,
>>>James 
>>>
>>>
>>>
>>>
>>>On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>>>
>>>>I have set it up for another open source effort in the past and it was not
>>>>very hard. Am happy to volunteer if needed.
>>>>
>>>>Thx
>>>>Debo
>>>>
>>>>Sent from my iPhone
>>>>
>>>>> On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
>>>>>
>>>>> I’d be open to an IRC channel.  Does anyone know if Apache allows this?
>>>>> If yes, does anyone know how to set one up?
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>>>>>>
>>>>>> Hi Nick
>>>>>>
>>>>>> I like your suggestions. For the enrichment layer do you think it would
>>>>>> also include any advanced analytics. Else we might want to have an
>>>>>> analytics layer.
>>>>>>
>>>>>> It would be good to have an arch which could be extended for new
>>>>>> functionality.
>>>>>>
>>>>>> However Ryan's suggestion of the ui API and deployer also makes sense.
>>>>>>
>>>>>> Should we have an IRC channel to discuss this or maybe etherpad?
>>>>>>
>>>>>> Debo
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>>>>>>>
>>>>>>> It might help to think of our code base as four separate types of
>>>>>>> functionality.  This is primarily meant to give us a framework to think
>>>>>>> about the organization of Metron (and drive more discussion), rather than
>>>>>>> my proposal for a specific structure.
>>>>>>>
>>>>>>>  - Sensor - Anything that captures external, non-streaming data and
>>>>>>>  presents it in a form ready for stream processing.
>>>>>>>  - Input - Responsible for preparing streaming data for enrichment.  The
>>>>>>>  existing "parsers" fit neatly into this space.
>>>>>>>  - Enrichment - Responsible for enriching an incoming data feed like
>>>>>>>  geoip, asset enrichment, threat intel lookups, etc.
>>>>>>>  - Output - Responsible for persisting data that has been processed by
>>>>>>>  Metron which obviously means search indexers or data stores.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman
>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> I would like to propose a review and refactor of the current project
>>>>>>>> organization within Metron.  Much of the way the legacy code was organized
>>>>>>>> does not make sense anymore and could be designed so that it is easier to
>>>>>>>> navigate and understand.  Our test coverage has increased substantially so
>>>>>>>> I believe we can do this with confidence.
>>>>>>>>
>>>>>>>> First off, I think we should agree on a naming convention.  I see some
>>>>>>>> projects (YARN and Storm for example) that prepend the sub-project with the
>>>>>>>> name of the top-level project (storm-core for example).  Metron also
>>>>>>>> currently does this (Metron-Common).  I think that's fine, although in the
>>>>>>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>>>>>>> Regardless of wheth

Re: [DISCUSS] Project reorganization

2016-04-11 Thread James Sirota
Great, thanks, Debo.  Where can I find instructions on how to get to it?

Thanks,
James 




On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)"  wrote:

>Hi James 
>
>Ok set it up and ack ….. 
>
>Thx
>
>
>
>
>
>On 4/10/16, 6:31 PM, "James Sirota"  wrote:
>
>>Hi Debo,
>>
>>I think it would be great if you set it up
>>
>>Thanks,
>>James 
>>
>>
>>
>>
>>On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>>
>>>I have set it up for another open source effort in the past and it was not 
>>>very hard. Am happy to volunteer if needed. 
>>>
>>>Thx 
>>>Debo
>>>
>>>Sent from my iPhone
>>>
 On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
 
 I’d be open to an IRC channel.  Does anyone know if Apache allows this?  
 If yes, does anyone know how to set one up?
 
 Thanks,
 James 
 
 
 
 
> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
> 
> Hi Nick 
> 
> I like your suggestions. For the enrichment layer do you think it would 
> also include any advanced analytics. Else we might want to have an 
> analytics layer. 
> 
> It would be good to have an arch which could be extended for new 
> functionality. 
> 
> However Ryan's suggestion of the ui API and deployer also makes sense. 
> 
> Should we have an IRC channel to discuss this or maybe etherpad?
> 
> Debo
> 
> Sent from my iPhone
> 
>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>> 
>> It might help to think of our code base as four separate types of
>> functionality.  This is primarily meant to give us a framework to think
>> about the organization of Metron (and drive more discussion), rather than
>> my proposal for a specific structure.
>> 
>>  - Sensor - Anything that captures external, non-streaming data and
>>  presents it in a form ready for stream processing.
>>  - Input - Responsible for preparing streaming data for enrichment.  The
>>  existing "parsers" fit neatly into this space.
>>  - Enrichment - Responsible for enriching an incoming data feed like
>>  geoip, asset enrichment, threat intel lookups, etc.
>>  - Output - Responsible for persisting data that has been processed by
>>  Metron which obviously means search indexers or data stores.
>> 
>> 
>> 
>> 
>> 
>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>> wrote:
>> 
>>> All,
>>> 
>>> I would like to propose a review and refactor of the current project
>>> organization within Metron.  Much of the way the legacy code was 
>>> organized
>>> does not make sense anymore and could be designed so that it is easier 
>>> to
>>> navigate and understand.  Our test coverage has increased substantially 
>>> so
>>> I believe we can do this with confidence.
>>> 
>>> First off, I think we should agree on a naming convention.  I see some
>>> projects (YARN and Storm for example) that prepend the sub-project with 
>>> the
>>> name of the top-level project (storm-core for example).  Metron also
>>> currently does this (Metron-Common).  I think that's fine, although in 
>>> the
>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>> Regardless of whether we decide to stick with that approach, I propose 
>>> that
>>> project names be uniform and lowercase.  For example, under these
>>> assumptions "Metron-Common" would change to "common".
>>> 
>>> The first level of organization makes sense to me.  Only change I would
>>> make would be to project names:
>>> 
>>> *   deployment
>>> *   streaming
>>> *   ui
>>> 
>>> Or if we want to keep metron in project names:
>>> 
>>> *   metron-deployment
>>> *   metron-streaming
>>> *   metron-ui
>>> 
>>> For now I don't see any changes necessary in deployment or ui
>>> organization.  I see the streaming project structure primarily driven 
>>> by 2
>>> things:  the Maven dependency tree and deployment targets.  For example,
>>> solr and elasticsearch code should be separated (because their 
>>> dependency
>>> on lucene conflicts) but both will depend on common enrichment code.  
>>> Also,
>>> now that parser, enrichment and pcap topologies are separate, code for
>>> those topologies will be deployed as separate jars.  No reason to 
>>> include
>>> parser code in enrichment topologies and vice-versa.  Any other
>>> considerations I'm missing?
>>> 
>>> With that being said, here is my initial proposal:
>>> 
>>> *   common -  Any common code that all topologies depend on
>>> (configuration classes, generic writers for example).  No dependencies 
>>> on
>>> other Metron projects.
>>> *   test - Contains utilities for writing unit tests, sample configs and
>>> sample data.  Will depend on common.
>>> *   integration-test - Contains utilit

Re: [DISCUSS] Project reorganization

2016-04-11 Thread Debo Dutta (dedutta)
Hi James 

Ok set it up and ack ….. 

Thx





On 4/10/16, 6:31 PM, "James Sirota"  wrote:

>Hi Debo,
>
>I think it would be great if you set it up
>
>Thanks,
>James 
>
>
>
>
>On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>
>>I have set it up for another open source effort in the past and it was not 
>>very hard. Am happy to volunteer if needed. 
>>
>>Thx 
>>Debo
>>
>>Sent from my iPhone
>>
>>> On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
>>> 
>>> I’d be open to an IRC channel.  Does anyone know if Apache allows this?  If 
>>> yes, does anyone know how to set one up?
>>> 
>>> Thanks,
>>> James 
>>> 
>>> 
>>> 
>>> 
 On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
 
 Hi Nick 
 
 I like your suggestions. For the enrichment layer do you think it would 
 also include any advanced analytics. Else we might want to have an 
 analytics layer. 
 
 It would be good to have an arch which could be extended for new 
 functionality. 
 
 However Ryan's suggestion of the ui API and deployer also makes sense. 
 
 Should we have an IRC channel to discuss this or maybe etherpad?
 
 Debo
 
 Sent from my iPhone
 
> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
> 
> It might help to think of our code base as four separate types of
> functionality.  This is primarily meant to give us a framework to think
> about the organization of Metron (and drive more discussion), rather than
> my proposal for a specific structure.
> 
>  - Sensor - Anything that captures external, non-streaming data and
>  presents it in a form ready for stream processing.
>  - Input - Responsible for preparing streaming data for enrichment.  The
>  existing "parsers" fit neatly into this space.
>  - Enrichment - Responsible for enriching an incoming data feed like
>  geoip, asset enrichment, threat intel lookups, etc.
>  - Output - Responsible for persisting data that has been processed by
>  Metron which obviously means search indexers or data stores.
> 
> 
> 
> 
> 
> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
> wrote:
> 
>> All,
>> 
>> I would like to propose a review and refactor of the current project
>> organization within Metron.  Much of the way the legacy code was 
>> organized
>> does not make sense anymore and could be designed so that it is easier to
>> navigate and understand.  Our test coverage has increased substantially 
>> so
>> I believe we can do this with confidence.
>> 
>> First off, I think we should agree on a naming convention.  I see some
>> projects (YARN and Storm for example) that prepend the sub-project with 
>> the
>> name of the top-level project (storm-core for example).  Metron also
>> currently does this (Metron-Common).  I think that's fine, although in 
>> the
>> case of Metron, I feel like having "Metron" prepended is redundant.
>> Regardless of whether we decide to stick with that approach, I propose 
>> that
>> project names be uniform and lowercase.  For example, under these
>> assumptions "Metron-Common" would change to "common".
>> 
>> The first level of organization makes sense to me.  Only change I would
>> make would be to project names:
>> 
>> *   deployment
>> *   streaming
>> *   ui
>> 
>> Or if we want to keep metron in project names:
>> 
>> *   metron-deployment
>> *   metron-streaming
>> *   metron-ui
>> 
>> For now I don't see any changes necessary in deployment or ui
>> organization.  I see the streaming project structure primarily driven by 
>> 2
>> things:  the Maven dependency tree and deployment targets.  For example,
>> solr and elasticsearch code should be separated (because their dependency
>> on lucene conflicts) but both will depend on common enrichment code.  
>> Also,
>> now that parser, enrichment and pcap topologies are separate, code for
>> those topologies will be deployed as separate jars.  No reason to include
>> parser code in enrichment topologies and vice-versa.  Any other
>> considerations I'm missing?
>> 
>> With that being said, here is my initial proposal:
>> 
>> *   common -  Any common code that all topologies depend on
>> (configuration classes, generic writers for example).  No dependencies on
>> other Metron projects.
>> *   test - Contains utilities for writing unit tests, sample configs and
>> sample data.  Will depend on common.
>> *   integration-test - Contains utilities and classes needed to run our
>> integration tests (in memory components for example).  Will depend on
>> common and test.
>> *   dataload - Contains all code related to data loading.  Will also
>> include any property files needed and integration tests.  Will depend on
>> common, test (test scope

Re: [DISCUSS] Project reorganization

2016-04-11 Thread Casey Stella
Also, +1 for more intelligent use of dependencyManagement and having each
sub-module build independently. I think the top level pom should build
metron streaming as well as the C component.

I also tend to favor smaller projects for common code rather than these
grab-bag common projects, but I am not strongly opposed to them.

Sorry for typos; commenting from my phone at the airport.

Casey
On Mon, Apr 11, 2016 at 11:57 Casey Stella  wrote:

> I'm in general in favor of keeping an integration test project only for
> integration test infrastructure (i.e. The inmemory components) and having
> the integration tests live in the projects that have the components that
> are being tested.
>
> On Mon, Apr 11, 2016 at 11:36 David Lyle  wrote:
>
>> I think I was thinking along the same lines as James, let me read it back
>> to make sure:
>>
>> Metron
>>   Platform
>>     Common (*)
>>     Integration-Test (*)
>>     DataManagement
>>     PCAP
>>     Parsers
>>     Enrichment
>>       Solr
>>       Elasticsearch
>>   Deployment
>>   Streaming
>>   UI
>>
>> For Common and Integration-Test, I'd be interested in a little more
>> discussion around keeping them. I lean toward not having them. I
>> understand
>> and support the goal of reuse, but I've found these catch-all projects
>> don't always facilitate that aim. We may be better served in the long run
>> by aligning these classes with their initial users. For example, wouldn't
>> all the bolt interfaces and abstract classes be better homed in
>> Enrichment?
>> Configuration classes may be best as a separate project under Platform?
>> The
>> classes in Metron-Testing may have to stick around as a separate project-
>> but perhaps not, they seem to be tightly aligned with enrichment type
>> integration testing.
>>
>> Also- since we're going to have to refactor the poms as part of this
>> effort, there are some first order principles that'd I'd be interested in
>> hearing other's thoughts about:
>>
>> 1) mvn (whatever) should run from the top level and each sub-module.
>> 2) The top level pom should use a dependencyManagement section to avoid
>> global_version type variables.
>> 3) All plugins and dependencies should have a specified version (fwiw, I
>> think we're pretty good here, but it's worth a look)
>> 4) Versioning- master/trunk should be version-SNAPSHOT.
>> 5) Other thoughts?
>>
>>
>> -D...
>>
>>
>> On Sun, Apr 10, 2016 at 8:31 PM, James Sirota 
>> wrote:
>>
>> > Hi Debo,
>> >
>> > I think it would be great if you set it up
>> >
>> > Thanks,
>> > James
>> >
>> >
>> >
>> >
>> > On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>> >
>> > >I have set it up for another open source effort in the past and it was
>> > not very hard. Am happy to volunteer if needed.
>> > >
>> > >Thx
>> > >Debo
>> > >
>> > >Sent from my iPhone
>> > >
>> > >> On Apr 10, 2016, at 5:53 PM, James Sirota 
>> > wrote:
>> > >>
>> > >> I’d be open to an IRC channel.  Does anyone know if Apache allows
>> > this?  If yes, does anyone know how to set one up?
>> > >>
>> > >> Thanks,
>> > >> James
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>> > >>>
>> > >>> Hi Nick
>> > >>>
>> > >>> I like your suggestions. For the enrichment layer do you think it
>> > would also include any advanced analytics. Else we might want to have an
>> > analytics layer.
>> > >>>
>> > >>> It would be good to have an arch which could be extended for new
>> > functionality.
>> > >>>
>> > >>> However Ryan's suggestion of the ui API and deployer also makes
>> sense.
>> > >>>
>> > >>> Should we have an IRC channel to discuss this or maybe etherpad?
>> > >>>
>> > >>> Debo
>> > >>>
>> > >>> Sent from my iPhone
>> > >>>
>> >  On Apr 10, 2016, at 4:36 PM, Nick Allen 
>> wrote:
>> > 
>> >  It might help to think of our code base as four separate types of
>> >  functionality.  This is primarily meant to give us a framework to
>> > think
>> >  about the organization of Metron (and drive more discussion),
>> rather
>> > than
>> >  my proposal for a specific structure.
>> > 
>> >   - Sensor - Anything that captures external, non-streaming data and
>> >   presents it in a form ready for stream processing.
>> >   - Input - Responsible for preparing streaming data for enrichment.
>> > The
>> >   existing "parsers" fit neatly into this space.
>> >   - Enrichment - Responsible for enriching an incoming data feed
>> like
>> >   geoip, asset enrichment, threat intel lookups, etc.
>> >   - Output - Responsible for persisting data that has been
>> processed by
>> >   Metron which obviously means search indexers or data stores.
>> > 
>> > 
>> > 
>> > 
>> > 
>> >  On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman <
>> > rmerri...@hortonworks.com>
>> >  wrote:
>> > 
>> > > All,
>> > >
>> > > I would like to propose a review and refactor of the current
>> project
>> > > organizatio

Re: [DISCUSS] Project reorganization

2016-04-11 Thread Casey Stella
I'm in general in favor of keeping an integration test project only for
integration test infrastructure (i.e. The inmemory components) and having
the integration tests live in the projects that have the components that
are being tested.
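
A minimal sketch of that layout, assuming Python 3.9+ for graphlib. The
module names come from this thread, but the dependency edges are assumptions
made for illustration, not the project's actual poms. It treats
integration-test as shared infrastructure only and derives one valid build
order for the projects that use it.

```python
from graphlib import CycleError, TopologicalSorter

# module -> modules it depends on. These edges are assumed for illustration.
DEPENDS_ON = {
    "common": set(),
    "integration-test": {"common"},               # in-memory components only
    "parsers": {"common", "integration-test"},    # parser tests live here
    "enrichment": {"common", "integration-test"}, # enrichment tests live here
}


def build_order(depends_on):
    """Return one valid build order, dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


def is_buildable(depends_on):
    """Return False if the module graph contains a dependency cycle."""
    try:
        build_order(depends_on)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))
```

Because the integration tests sit next to the components they exercise,
integration-test never has to depend on parsers or enrichment, so the graph
stays acyclic.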

On Mon, Apr 11, 2016 at 11:36 David Lyle  wrote:

> I think I was thinking along the same lines as James, let me read it back
> to make sure:
>
> Metron
>   Platform
>     Common (*)
>     Integration-Test (*)
>     DataManagement
>     PCAP
>     Parsers
>     Enrichment
>       Solr
>       Elasticsearch
>   Deployment
>   Streaming
>   UI
>
> For Common and Integration-Test, I'd be interested in a little more
> discussion around keeping them. I lean toward not having them. I understand
> and support the goal of reuse, but I've found these catch-all projects
> don't always facilitate that aim. We may be better served in the long run
> by aligning these classes with their initial users. For example, wouldn't
> all the bolt interfaces and abstract classes be better homed in Enrichment?
> Configuration classes may be best as a separate project under Platform? The
> classes in Metron-Testing may have to stick around as a separate project-
> but perhaps not, they seem to be tightly aligned with enrichment type
> integration testing.
>
> Also- since we're going to have to refactor the poms as part of this
> effort, there are some first order principles that'd I'd be interested in
> hearing other's thoughts about:
>
> 1) mvn (whatever) should run from the top level and each sub-module.
> 2) The top level pom should use a dependencyManagement section to avoid
> global_version type variables.
> 3) All plugins and dependencies should have a specified version (fwiw, I
> think we're pretty good here, but it's worth a look)
> 4) Versioning- master/trunk should be version-SNAPSHOT.
> 5) Other thoughts?
>
>
> -D...
>
>
> On Sun, Apr 10, 2016 at 8:31 PM, James Sirota 
> wrote:
>
> > Hi Debo,
> >
> > I think it would be great if you set it up
> >
> > Thanks,
> > James
> >
> >
> >
> >
> > On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
> >
> > >I have set it up for another open source effort in the past and it was
> > not very hard. Am happy to volunteer if needed.
> > >
> > >Thx
> > >Debo
> > >
> > >Sent from my iPhone
> > >
> > >> On Apr 10, 2016, at 5:53 PM, James Sirota 
> > wrote:
> > >>
> > >> I’d be open to an IRC channel.  Does anyone know if Apache allows
> > this?  If yes, does anyone know how to set one up?
> > >>
> > >> Thanks,
> > >> James
> > >>
> > >>
> > >>
> > >>
> > >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
> > >>>
> > >>> Hi Nick
> > >>>
> > >>> I like your suggestions. For the enrichment layer do you think it
> > would also include any advanced analytics. Else we might want to have an
> > analytics layer.
> > >>>
> > >>> It would be good to have an arch which could be extended for new
> > functionality.
> > >>>
> > >>> However Ryan's suggestion of the ui API and deployer also makes
> sense.
> > >>>
> > >>> Should we have an IRC channel to discuss this or maybe etherpad?
> > >>>
> > >>> Debo
> > >>>
> > >>> Sent from my iPhone
> > >>>
> >  On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
> > 
> >  It might help to think of our code base as four separate types of
> >  functionality.  This is primarily meant to give us a framework to
> > think
> >  about the organization of Metron (and drive more discussion), rather
> > than
> >  my proposal for a specific structure.
> > 
> >   - Sensor - Anything that captures external, non-streaming data and
> >   presents it in a form ready for stream processing.
> >   - Input - Responsible for preparing streaming data for enrichment.
> > The
> >   existing "parsers" fit neatly into this space.
> >   - Enrichment - Responsible for enriching an incoming data feed like
> >   geoip, asset enrichment, threat intel lookups, etc.
> >   - Output - Responsible for persisting data that has been processed
> by
> >   Metron which obviously means search indexers or data stores.
> > 
> > 
> > 
> > 
> > 
> >  On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman <
> > rmerri...@hortonworks.com>
> >  wrote:
> > 
> > > All,
> > >
> > > I would like to propose a review and refactor of the current
> project
> > > organization within Metron.  Much of the way the legacy code was
> > organized
> > > does not make sense anymore and could be designed so that it is
> > easier to
> > > navigate and understand.  Our test coverage has increased
> > substantially so
> > > I believe we can do this with confidence.
> > >
> > > First off, I think we should agree on a naming convention.  I see
> > some
> > > projects (YARN and Storm for example) that prepend the sub-project
> > with the
> > > name of the top-level project (storm-core for example).  Metron
> also
> > > currently does this (Metron-Common).  I think that's fi

Re: [DISCUSS] Project reorganization

2016-04-11 Thread David Lyle
I think I was thinking along the same lines as James. Let me read it back
to make sure:

Metron
  Platform
    Common (*)
    Integration-Test (*)
    DataManagement
    PCAP
    Parsers
    Enrichment
      Solr
      Elasticsearch
  Deployment
  Streaming
  UI

For Common and Integration-Test, I'd be interested in a little more
discussion around keeping them. I lean toward not having them. I understand
and support the goal of reuse, but I've found these catch-all projects
don't always facilitate that aim. We may be better served in the long run
by aligning these classes with their initial users. For example, wouldn't
all the bolt interfaces and abstract classes be better homed in Enrichment?
Configuration classes may be best as a separate project under Platform? The
classes in Metron-Testing may have to stick around as a separate project,
but perhaps not; they seem to be tightly aligned with enrichment-type
integration testing.

Also, since we're going to have to refactor the poms as part of this
effort, there are some first-order principles that I'd be interested in
hearing others' thoughts about:

1) mvn (whatever) should run from the top level and each sub-module.
2) The top level pom should use a dependencyManagement section to avoid
global_version type variables.
3) All plugins and dependencies should have a specified version (fwiw, I
think we're pretty good here, but it's worth a look)
4) Versioning: master/trunk should be version-SNAPSHOT.
5) Other thoughts?
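
To make point 2 concrete, here is a toy Python model of what a
dependencyManagement section does. It is not Maven's real resolver. The
Elasticsearch version is the one mentioned elsewhere in this thread; the
second artifact and its versions are hypothetical placeholders. The idea
shown is that the top-level pom pins each version once, sub-modules declare
dependencies without a version, and an explicit version in a sub-module
still takes precedence.

```python
# Toy model of Maven's dependencyManagement, not Maven's real resolver.
# Versions pinned once, as the top-level pom would do.
MANAGED_VERSIONS = {
    "org.elasticsearch:elasticsearch": "1.7.4",
    "org.example:some-library": "1.0.0",  # hypothetical artifact and version
}


def resolve(declared, managed=MANAGED_VERSIONS):
    """Return {artifact: version} for one sub-module.

    `declared` maps artifact -> version or None. None means the sub-module
    omitted the version and expects the parent to supply it.
    """
    resolved = {}
    for artifact, version in declared.items():
        if version is None:
            if artifact not in managed:
                raise KeyError(f"no managed version for {artifact}")
            version = managed[artifact]
        resolved[artifact] = version
    return resolved


if __name__ == "__main__":
    # A sub-module that declares its dependency without a version.
    print(resolve({"org.elasticsearch:elasticsearch": None}))
```

With this arrangement a version bump is a one-line change in the top-level
pom, with no shared global_version property threaded through each sub-module.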


-D...


On Sun, Apr 10, 2016 at 8:31 PM, James Sirota 
wrote:

> Hi Debo,
>
> I think it would be great if you set it up
>
> Thanks,
> James
>
>
>
>
> On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:
>
> >I have set it up for another open source effort in the past and it was
> not very hard. Am happy to volunteer if needed.
> >
> >Thx
> >Debo
> >
> >Sent from my iPhone
> >
> >> On Apr 10, 2016, at 5:53 PM, James Sirota 
> wrote:
> >>
> >> I’d be open to an IRC channel.  Does anyone know if Apache allows
> this?  If yes, does anyone know how to set one up?
> >>
> >> Thanks,
> >> James
> >>
> >>
> >>
> >>
> >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
> >>>
> >>> Hi Nick
> >>>
> >>> I like your suggestions. For the enrichment layer do you think it
> would also include any advanced analytics. Else we might want to have an
> analytics layer.
> >>>
> >>> It would be good to have an arch which could be extended for new
> functionality.
> >>>
> >>> However Ryan's suggestion of the ui API and deployer also makes sense.
> >>>
> >>> Should we have an IRC channel to discuss this or maybe etherpad?
> >>>
> >>> Debo
> >>>
> >>> Sent from my iPhone
> >>>
>  On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
> 
>  It might help to think of our code base as four separate types of
>  functionality.  This is primarily meant to give us a framework to
> think
>  about the organization of Metron (and drive more discussion), rather
> than
>  my proposal for a specific structure.
> 
>   - Sensor - Anything that captures external, non-streaming data and
>   presents it in a form ready for stream processing.
>   - Input - Responsible for preparing streaming data for enrichment.
> The
>   existing "parsers" fit neatly into this space.
>   - Enrichment - Responsible for enriching an incoming data feed like
>   geoip, asset enrichment, threat intel lookups, etc.
>   - Output - Responsible for persisting data that has been processed by
>   Metron which obviously means search indexers or data stores.
> 
> 
> 
> 
> 
>  On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman <
> rmerri...@hortonworks.com>
>  wrote:
> 
> > All,
> >
> > I would like to propose a review and refactor of the current project
> > organization within Metron.  Much of the way the legacy code was
> organized
> > does not make sense anymore and could be designed so that it is
> easier to
> > navigate and understand.  Our test coverage has increased
> substantially so
> > I believe we can do this with confidence.
> >
> > First off, I think we should agree on a naming convention.  I see
> some
> > projects (YARN and Storm for example) that prepend the sub-project
> with the
> > name of the top-level project (storm-core for example).  Metron also
> > currently does this (Metron-Common).  I think that's fine, although
> in the
> > case of Metron, I feel like having "Metron" prepended is redundant.
> > Regardless of whether we decide to stick with that approach, I
> propose that
> > project names be uniform and lowercase.  For example, under these
> > assumptions "Metron-Common" would change to "common".
> >
> > The first level of organization makes sense to me.  Only change I
> would
> > make would be to project names:
> >
> > *   deployment
> > *   streaming
> > *   ui
> >
> > Or if we want to keep metron in proje

Re: [DISCUSS] Project reorganization

2016-04-10 Thread James Sirota
Hi Debo,

I think it would be great if you set it up

Thanks,
James 




On 4/10/16, 6:25 PM, "Debojyoti Dutta"  wrote:

>I have set it up for another open source effort in the past and it was not 
>very hard. Am happy to volunteer if needed. 
>
>Thx 
>Debo
>
>Sent from my iPhone
>
>> On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
>> 
>> I’d be open to an IRC channel.  Does anyone know if Apache allows this?  If 
>> yes, does anyone know how to set one up?
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>>> 
>>> Hi Nick 
>>> 
>>> I like your suggestions. For the enrichment layer do you think it would 
>>> also include any advanced analytics. Else we might want to have an 
>>> analytics layer. 
>>> 
>>> It would be good to have an arch which could be extended for new 
>>> functionality. 
>>> 
>>> However Ryan's suggestion of the ui API and deployer also makes sense. 
>>> 
>>> Should we have an IRC channel to discuss this or maybe etherpad?
>>> 
>>> Debo
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>>>>
>>>> It might help to think of our code base as four separate types of
>>>> functionality.  This is primarily meant to give us a framework to think
>>>> about the organization of Metron (and drive more discussion), rather than
>>>> my proposal for a specific structure.
>>>>
>>>>  - Sensor - Anything that captures external, non-streaming data and
>>>>  presents it in a form ready for stream processing.
>>>>  - Input - Responsible for preparing streaming data for enrichment.  The
>>>>  existing "parsers" fit neatly into this space.
>>>>  - Enrichment - Responsible for enriching an incoming data feed like
>>>>  geoip, asset enrichment, threat intel lookups, etc.
>>>>  - Output - Responsible for persisting data that has been processed by
>>>>  Metron which obviously means search indexers or data stores.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman
>>>> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I would like to propose a review and refactor of the current project
>>>>> organization within Metron.  Much of the way the legacy code was organized
>>>>> does not make sense anymore and could be designed so that it is easier to
>>>>> navigate and understand.  Our test coverage has increased substantially so
>>>>> I believe we can do this with confidence.
>>>>>
>>>>> First off, I think we should agree on a naming convention.  I see some
>>>>> projects (YARN and Storm for example) that prepend the sub-project with the
>>>>> name of the top-level project (storm-core for example).  Metron also
>>>>> currently does this (Metron-Common).  I think that's fine, although in the
>>>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>>>> Regardless of whether we decide to stick with that approach, I propose that
>>>>> project names be uniform and lowercase.  For example, under these
>>>>> assumptions "Metron-Common" would change to "common".
>>>>>
>>>>> The first level of organization makes sense to me.  Only change I would
>>>>> make would be to project names:
>>>>>
>>>>> *   deployment
>>>>> *   streaming
>>>>> *   ui
>>>>>
>>>>> Or if we want to keep metron in project names:
>>>>>
>>>>> *   metron-deployment
>>>>> *   metron-streaming
>>>>> *   metron-ui
>>>>>
>>>>> For now I don't see any changes necessary in deployment or ui
>>>>> organization.  I see the streaming project structure primarily driven by 2
>>>>> things:  the Maven dependency tree and deployment targets.  For example,
>>>>> solr and elasticsearch code should be separated (because their dependency
>>>>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>>>>> now that parser, enrichment and pcap topologies are separate, code for
>>>>> those topologies will be deployed as separate jars.  No reason to include
>>>>> parser code in enrichment topologies and vice-versa.  Any other
>>>>> considerations I'm missing?
>>>>>
>>>>> With that being said, here is my initial proposal:
>>>>>
>>>>> *   common -  Any common code that all topologies depend on
>>>>> (configuration classes, generic writers for example).  No dependencies on
>>>>> other Metron projects.
>>>>> *   test - Contains utilities for writing unit tests, sample configs and
>>>>> sample data.  Will depend on common.
>>>>> *   integration-test - Contains utilities and classes needed to run our
>>>>> integration tests (in memory components for example).  Will depend on
>>>>> common and test.
>>>>> *   dataload - Contains all code related to data loading.  Will also
>>>>> include any property files needed and integration tests.  Will depend on
>>>>> common, test (test scope), and integration-test (test scope).
>>>>> *   parser - All code specific to the parser topologies.  Would also
>>>>> include scripts, property files, flux files and parser topology integration
>>>>> tests.  This project will depend on common, t

Re: [DISCUSS] Project reorganization

2016-04-10 Thread Debojyoti Dutta
I have set it up for another open source effort in the past and it was not very 
hard. Am happy to volunteer if needed. 

Thx 
Debo

Sent from my iPhone

> On Apr 10, 2016, at 5:53 PM, James Sirota  wrote:
> 
> I’d be open to an IRC channel.  Does anyone know if Apache allows this?  If 
> yes, does anyone know how to set one up?
> 
> Thanks,
> James 
> 
> 
> 
> 
>> On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:
>> 
>> Hi Nick 
>> 
>> I like your suggestions. For the enrichment layer do you think it would also 
>> include any advanced analytics. Else we might want to have an analytics 
>> layer. 
>> 
>> It would be good to have an arch which could be extended for new 
>> functionality. 
>> 
>> However Ryan's suggestion of the ui API and deployer also makes sense. 
>> 
>> Should we have an IRC channel to discuss this or maybe etherpad?
>> 
>> Debo
>> 
>> Sent from my iPhone
>> 
>>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>>> 
>>> It might help to think of our code base as four separate types of
>>> functionality.  This is primarily meant to give us a framework to think
>>> about the organization of Metron (and drive more discussion), rather than
>>> my proposal for a specific structure.
>>> 
>>>  - Sensor - Anything that captures external, non-streaming data and
>>>  presents it in a form ready for stream processing.
>>>  - Input - Responsible for preparing streaming data for enrichment.  The
>>>  existing "parsers" fit neatly into this space.
>>>  - Enrichment - Responsible for enriching an incoming data feed like
>>>  geoip, asset enrichment, threat intel lookups, etc.
>>>  - Output - Responsible for persisting data that has been processed by
>>>  Metron which obviously means search indexers or data stores.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>>> wrote:
>>> 
>>>> All,
>>>>
>>>> I would like to propose a review and refactor of the current project
>>>> organization within Metron.  Much of the way the legacy code was organized
>>>> does not make sense anymore and could be designed so that it is easier to
>>>> navigate and understand.  Our test coverage has increased substantially so
>>>> I believe we can do this with confidence.
>>>>
>>>> First off, I think we should agree on a naming convention.  I see some
>>>> projects (YARN and Storm for example) that prepend the sub-project with the
>>>> name of the top-level project (storm-core for example).  Metron also
>>>> currently does this (Metron-Common).  I think that's fine, although in the
>>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>>> Regardless of whether we decide to stick with that approach, I propose that
>>>> project names be uniform and lowercase.  For example, under these
>>>> assumptions "Metron-Common" would change to "common".
>>>>
>>>> The first level of organization makes sense to me.  Only change I would
>>>> make would be to project names:
>>>>
>>>> *   deployment
>>>> *   streaming
>>>> *   ui
>>>>
>>>> Or if we want to keep metron in project names:
>>>>
>>>> *   metron-deployment
>>>> *   metron-streaming
>>>> *   metron-ui
>>>>
>>>> For now I don't see any changes necessary in deployment or ui
>>>> organization.  I see the streaming project structure primarily driven by 2
>>>> things:  the Maven dependency tree and deployment targets.  For example,
>>>> solr and elasticsearch code should be separated (because their dependency
>>>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>>>> now that parser, enrichment and pcap topologies are separate, code for
>>>> those topologies will be deployed as separate jars.  No reason to include
>>>> parser code in enrichment topologies and vice-versa.  Any other
>>>> considerations I'm missing?
>>>>
>>>> With that being said, here is my initial proposal:
>>>>
>>>> *   common -  Any common code that all topologies depend on
>>>> (configuration classes, generic writers for example).  No dependencies on
>>>> other Metron projects.
>>>> *   test - Contains utilities for writing unit tests, sample configs and
>>>> sample data.  Will depend on common.
>>>> *   integration-test - Contains utilities and classes needed to run our
>>>> integration tests (in memory components for example).  Will depend on
>>>> common and test.
>>>> *   dataload - Contains all code related to data loading.  Will also
>>>> include any property files needed and integration tests.  Will depend on
>>>> common, test (test scope), and integration-test (test scope).
>>>> *   parser - All code specific to the parser topologies.  Would also
>>>> include scripts, property files, flux files and parser topology integration
>>>> tests.  This project will depend on common, test (test scope), and
>>>> integration-testing (test scope).
>>>> *   enrichment - All code specific to the enrichment topologies (except
>>>> solr and elasticsearch).  Would also include scripts, property files, flux
>>>> files and enrichment topology integration t

Re: [DISCUSS] Project reorganization

2016-04-10 Thread James Sirota
I’d be open to an IRC channel.  Does anyone know if Apache allows this?  If 
yes, does anyone know how to set one up?

Thanks,
James 




On 4/10/16, 4:52 PM, "Debojyoti Dutta"  wrote:

>Hi Nick 
>
>I like your suggestions. For the enrichment layer do you think it would also 
>include any advanced analytics. Else we might want to have an analytics layer. 
>
>It would be good to have an arch which could be extended for new 
>functionality. 
>
>However Ryan's suggestion of the ui API and deployer also makes sense. 
>
>Should we have an IRC channel to discuss this or maybe etherpad?
>
>Debo
>
>Sent from my iPhone
>
>> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
>> 
>> It might help to think of our code base as four separate types of
>> functionality.  This is primarily meant to give us a framework to think
>> about the organization of Metron (and drive more discussion), rather than
>> my proposal for a specific structure.
>> 
>>   - Sensor - Anything that captures external, non-streaming data and
>>   presents it in a form ready for stream processing.
>>   - Input - Responsible for preparing streaming data for enrichment.  The
>>   existing "parsers" fit neatly into this space.
>>   - Enrichment - Responsible for enriching an incoming data feed like
>>   geoip, asset enrichment, threat intel lookups, etc.
>>   - Output - Responsible for persisting data that has been processed by
>>   Metron which obviously means search indexers or data stores.
>> 
>> 
>> 
>> 
>> 
>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>> wrote:
>> 
>>> All,
>>> 
>>> I would like to propose a review and refactor of the current project
>>> organization within Metron.  Much of the way the legacy code was organized
>>> does not make sense anymore and could be designed so that it is easier to
>>> navigate and understand.  Our test coverage has increased substantially so
>>> I believe we can do this with confidence.
>>> 
>>> First off, I think we should agree on a naming convention.  I see some
>>> projects (YARN and Storm for example) that prepend the sub-project with the
>>> name of the top-level project (storm-core for example).  Metron also
>>> currently does this (Metron-Common).  I think that's fine, although in the
>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>> Regardless of whether we decide to stick with that approach, I propose that
>>> project names be uniform and lowercase.  For example, under these
>>> assumptions "Metron-Common" would change to "common".
>>> 
>>> The first level of organization makes sense to me.  Only change I would
>>> make would be to project names:
>>> 
>>>  *   deployment
>>>  *   streaming
>>>  *   ui
>>> 
>>> Or if we want to keep metron in project names:
>>> 
>>>  *   metron-deployment
>>>  *   metron-streaming
>>>  *   metron-ui
>>> 
>>> For now I don't see any changes necessary in deployment or ui
>>> organization.  I see the streaming project structure primarily driven by 2
>>> things:  the Maven dependency tree and deployment targets.  For example,
>>> solr and elasticsearch code should be separated (because their dependency
>>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>>> now that parser, enrichment and pcap topologies are separate, code for
>>> those topologies will be deployed as separate jars.  No reason to include
>>> parser code in enrichment topologies and vice-versa.  Any other
>>> considerations I'm missing?
>>> 
>>> With that being said, here is my initial proposal:
>>> 
>>>  *   common -  Any common code that all topologies depend on
>>> (configuration classes, generic writers for example).  No dependencies on
>>> other Metron projects.
>>>  *   test - Contains utilities for writing unit tests, sample configs and
>>> sample data.  Will depend on common.
>>>  *   integration-test - Contains utilities and classes needed to run our
>>> integration tests (in memory components for example).  Will depend on
>>> common and test.
>>>  *   dataload - Contains all code related to data loading.  Will also
>>> include any property files needed and integration tests.  Will depend on
>>> common, test (test scope), and integration-test (test scope).
>>>  *   parser - All code specific to the parser topologies.  Would also
>>> include scripts, property files, flux files and parser topology integration
>>> tests.  This project will depend on common, test (test scope), and
>>> integration-testing (test scope).
>>>  *   enrichment - All code specific to the enrichment topologies (except
>>> solr and elasticsearch).  Would also include scripts, property files, flux
>>> files and enrichment topology integration tests.  This project will depend
>>> on common, test (test scope), and integration-test (test scope).
>>>  *   elasticsearch - All Elasticsearch related code.  Will depend on
>>> enrichment.
>>>  *   solr - All Solr related code.  Will depend on enrichment.
>>>  *   pcap - All code specific to the topology dedicated to pcap.  Would

Re: [DISCUSS] Project reorganization

2016-04-10 Thread James Sirota
I would put the integration test framework into common (since all modules share 
it).  I would also put a unit test framework that other projects can extend 
into common.  Each individual module would then extend the frameworks from 
common.  I don’t think I would want the tests broken out into their own 
project that lives separately from the modules.  

Thanks,
James 




On 4/10/16, 4:10 PM, "Nick Allen"  wrote:

>Is there any reason to keep the "test" and "integration-test" code
>separate?
>
>
>
>
>
>
>On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>wrote:
>
>> All,
>>
>> I would like to propose a review and refactor of the current project
>> organization within Metron.  Much of the way the legacy code was organized
>> does not make sense anymore and could be designed so that it is easier to
>> navigate and understand.  Our test coverage has increased substantially so
>> I believe we can do this with confidence.
>>
>> First off, I think we should agree on a naming convention.  I see some
>> projects (YARN and Storm for example) that prepend the sub-project with the
>> name of the top-level project (storm-core for example).  Metron also
>> currently does this (Metron-Common).  I think that's fine, although in the
>> case of Metron, I feel like having "Metron" prepended is redundant.
>> Regardless of whether we decide to stick with that approach, I propose that
>> project names be uniform and lowercase.  For example, under these
>> assumptions "Metron-Common" would change to "common".
>>
>> The first level of organization makes sense to me.  Only change I would
>> make would be to project names:
>>
>>   *   deployment
>>   *   streaming
>>   *   ui
>>
>> Or if we want to keep metron in project names:
>>
>>   *   metron-deployment
>>   *   metron-streaming
>>   *   metron-ui
>>
>> For now I don't see any changes necessary in deployment or ui
>> organization.  I see the streaming project structure primarily driven by 2
>> things:  the Maven dependency tree and deployment targets.  For example,
>> solr and elasticsearch code should be separated (because their dependency
>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>> now that parser, enrichment and pcap topologies are separate, code for
>> those topologies will be deployed as separate jars.  No reason to include
>> parser code in enrichment topologies and vice-versa.  Any other
>> considerations I'm missing?
>>
>> With that being said, here is my initial proposal:
>>
>>   *   common -  Any common code that all topologies depend on
>> (configuration classes, generic writers for example).  No dependencies on
>> other Metron projects.
>>   *   test - Contains utilities for writing unit tests, sample configs and
>> sample data.  Will depend on common.
>>   *   integration-test - Contains utilities and classes needed to run our
>> integration tests (in memory components for example).  Will depend on
>> common and test.
>>   *   dataload - Contains all code related to data loading.  Will also
>> include any property files needed and integration tests.  Will depend on
>> common, test (test scope), and integration-test (test scope).
>>   *   parser - All code specific to the parser topologies.  Would also
>> include scripts, property files, flux files and parser topology integration
>> tests.  This project will depend on common, test (test scope), and
>> integration-testing (test scope).
>>   *   enrichment - All code specific to the enrichment topologies (except
>> solr and elasticsearch).  Would also include scripts, property files, flux
>> files and enrichment topology integration tests.  This project will depend
>> on common, test (test scope), and integration-test (test scope).
>>   *   elasticsearch - All Elasticsearch related code.  Will depend on
>> enrichment.
>>   *   solr - All Solr related code.  Will depend on enrichment.
>>   *   pcap - All code specific to the topology dedicated to pcap.  Would
>> also include scripts, property files, flux files and pcap integration
>> test.  This project will depend on common, test (test scope) and
>> integration-test (test scope).
>>   *   api - This will serve as a generic replacement for
>> Metron-Pcap_Service.  Will contain all code to build a Metron web service
>> middle layer that can expose APIs through REST or other client protocols.
>> Could possibly depend on all other projects or separated further if version
>> conflicts arise (separate api projects for solr and elasticsearch for
>> example).
>>
>> Looking forward to hearing everyone's feedback and great ideas.
>>
>> Ryan Merriman
>>
>
>
>
>-- 
>Nick Allen 
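
For reference, the dependency rules in the quoted proposal can be written down
as a small graph and checked mechanically.  This is only a sketch under stated
assumptions: it ignores Maven scopes (test-scope and compile-scope edges are
treated alike), it reads "integration-testing" as the integration-test project,
and it leaves out api because the proposal leaves its dependencies open.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each project maps to the projects it depends on, as listed in the proposal.
# Assumption: test-scope and compile-scope dependencies are not distinguished.
DEPENDS_ON = {
    "common": set(),
    "test": {"common"},
    "integration-test": {"common", "test"},
    "dataload": {"common", "test", "integration-test"},
    "parser": {"common", "test", "integration-test"},
    "enrichment": {"common", "test", "integration-test"},
    "elasticsearch": {"enrichment"},
    "solr": {"enrichment"},
    "pcap": {"common", "test", "integration-test"},
}


def depends_transitively(project, target, graph=DEPENDS_ON):
    """Return True if `project` reaches `target` by following dependency edges."""
    stack, seen = list(graph[project]), set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return False


# A valid build order lists every project after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(build_order)

# The separations the proposal asks for: solr and elasticsearch never pull
# each other in, and parser code never pulls in enrichment code.
print(depends_transitively("solr", "elasticsearch"))
print(depends_transitively("parser", "enrichment"))
```

A check like depends_transitively is one way to confirm that the Lucene
conflict stays isolated: solr and elasticsearch share enrichment as a parent
but neither appears in the other's dependency closure.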


Re: [DISCUSS] Project reorganization

2016-04-10 Thread James Sirota
Hi Ryan, 

Here are my thoughts.  I agree with the first level of breakdown.  Deployment, 
Streaming, UI.  That makes sense.  Although we may re-think Streaming because 
it will now contain a PCAP MR job, which is batch.  I would probably just call 
it Metron-Platform or something like that.

Under Metron-Platform I would have the following projects:

*   Common - agree with you we need it for the reasons you described.  This will 
help us with code reuse and standardization 
*   DataManagement - contains data loaders (enrichment, threat intel) + data 
cleanup and rotation scripts
*   PCAP - PCAP Storm topology + PCAP Service + MR job to back the service
*   Parsers - Parser topology + parser bolt + parser modules/grok expressions.  I 
think this should be broken up like this to make the incremental cost of adding 
new topologies as low as possible.  To add a new topology we only want a user 
to build and deploy this jar and we want this jar to be as light as possible to 
only contain code for adding additional sources.
*   Enrichment - enrichment topology + threat intel + alerts 

Next level down under Enrichment I would include the Elasticsearch and Solr 
indexing projects as modules. I don’t think they warrant their own project, but 
they can be sub-modules of enrichment.

API - I am in agreement with you that we need this.  However, I think this API 
should wrap the PCAP service + introduce additional services for security and 
multi tenancy (discuss threads are going around right now).  We want our 
security model to be consistently enforced so we should build it into this 
module and expose it as REST services.
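
To make the layout above easier to compare against the other proposals, here 
is a rough sketch of it as a nested structure.  The project names are the ones 
used in this message; placing API directly under Metron-Platform is an 
assumption, and module_paths is just a hypothetical helper for printing the 
tree.

```python
# Layout as described in this message.  Assumption: API sits directly under
# Metron-Platform alongside the other projects.
LAYOUT = {
    "Metron-Platform": {
        "Common": {},
        "DataManagement": {},
        "PCAP": {},
        "Parsers": {},
        "Enrichment": {"Elasticsearch": {}, "Solr": {}},
        "API": {},
    }
}


def module_paths(tree, prefix=""):
    """Yield every module path in the tree, parents before children."""
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path
        yield from module_paths(children, path)


paths = list(module_paths(LAYOUT))
for path in paths:
    print(path)
```

The point of the nesting is that Elasticsearch and Solr only ever appear 
beneath Enrichment, so they are built as sub-modules rather than as 
stand-alone projects.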

What do you think?

Thanks,
James 



On 4/8/16, 1:46 PM, "Ryan Merriman"  wrote:

>All,
>
>I would like to propose a review and refactor of the current project 
>organization within Metron.  Much of the way the legacy code was organized 
>does not make sense anymore and could be designed so that it is easier to 
>navigate and understand.  Our test coverage has increased substantially so I 
>believe we can do this with confidence.
>
>First off, I think we should agree on a naming convention.  I see some 
>projects (YARN and Storm for example) that prepend the sub-project with the 
>name of the top-level project (storm-core for example).  Metron also currently 
>does this (Metron-Common).  I think that's fine, although in the case of 
>Metron, I feel like having "Metron" prepended is redundant.  Regardless of 
>whether we decide to stick with that approach, I propose that project names be 
>uniform and lowercase.  For example, under these assumptions "Metron-Common" 
>would change to "common".
>
>The first level of organization makes sense to me.  Only change I would make 
>would be to project names:
>
>  *   deployment
>  *   streaming
>  *   ui
>
>Or if we want to keep metron in project names:
>
>  *   metron-deployment
>  *   metron-streaming
>  *   metron-ui
>
>For now I don't see any changes necessary in deployment or ui organization.  I 
>see the streaming project structure primarily driven by 2 things:  the Maven 
>dependency tree and deployment targets.  For example, solr and elasticsearch 
>code should be separated (because their dependency on lucene conflicts) but 
>both will depend on common enrichment code.  Also, now that parser, enrichment 
>and pcap topologies are separate, code for those topologies will be deployed 
>as separate jars.  No reason to include parser code in enrichment topologies 
>and vice-versa.  Any other considerations I'm missing?
>
>With that being said, here is my initial proposal:
>
>  *   common -  Any common code that all topologies depend on (configuration 
> classes, generic writers for example).  No dependencies on other Metron 
> projects.
>  *   test - Contains utilities for writing unit tests, sample configs and 
> sample data.  Will depend on common.
>  *   integration-test - Contains utilities and classes needed to run our 
> integration tests (in memory components for example).  Will depend on common 
> and test.
>  *   dataload - Contains all code related to data loading.  Will also include 
> any property files needed and integration tests.  Will depend on common, test 
> (test scope), and integration-test (test scope).
>  *   parser - All code specific to the parser topologies.  Would also include 
> scripts, property files, flux files and parser topology integration tests.  
> This project will depend on common, test (test scope), and 
> integration-testing (test scope).
>  *   enrichment - All code specific to the enrichment topologies (except solr 
> and elasticsearch).  Would also include scripts, property files, flux files 
> and enrichment topology integration tests.  This project will depend on 
> common, test (test scope), and integration-test (test scope).
>  *   elasticsearch - All Elasticsearch related code.  Will depend on 
> enrichment.
>  *   solr - All Solr related code.  Will depend on enrichment.
>  *   pcap - All code specific to the topology dedic

Re: [DISCUSS] Project reorganization

2016-04-10 Thread James Sirota
Hi Nick,

Threat intel is almost like an enrichment.  A telemetry feed gets 
cross-referenced against a threat intel feed (think pivot tables), but threat 
intel in itself is not a telemetry.  Metron’s Storm topologies parse out 
individual attributes from telemetries like IDS alerts, OS logs, etc. and these 
attributes may be user agents, IPs, ports, protocols, etc.  Then the threat 
intel bolts cross-reference that information with anything we have in our 
threat intel feeds to see if any values for these attributes are contained in 
the feeds.  So if we have an IP we check to see if that IP is in our list of 
malicious IPs. If we have a user agent we check if we have any information on 
it in our threat feeds.  If we do we tag a telemetry with is_alert=true to 
indicate that a message received a hit against threat intel and append whatever 
the threat intel data was that it hit against. 
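
As a rough illustration of the lookup described above (not the actual bolt 
code), the cross-reference could look something like the sketch below.  The 
feed contents, the tag_threat_intel function and the threat_intel_hits field 
are all made up for the example; only the is_alert flag comes from the 
description.

```python
# Hypothetical threat intel feeds, keyed by the telemetry attribute they cover.
THREAT_INTEL = {
    "ip": {"203.0.113.7", "198.51.100.23"},
    "user_agent": {"EvilBot/1.0"},
}


def tag_threat_intel(message, feeds=THREAT_INTEL):
    """Return a copy of a parsed telemetry message, tagged if any attribute hits a feed."""
    tagged = dict(message)
    # Keep only the attributes whose value appears in the feed for that attribute.
    hits = {
        field: value
        for field, value in message.items()
        if value in feeds.get(field, ())
    }
    if hits:
        tagged["is_alert"] = True
        tagged["threat_intel_hits"] = hits  # hypothetical field name
    return tagged


clean = tag_threat_intel({"ip": "192.0.2.10", "port": 443})
alert = tag_threat_intel({"ip": "203.0.113.7", "user_agent": "curl/7.47"})
print(clean)
print(alert)
```

The telemetry message itself is never treated as a feed here: it is only read, 
compared against the feeds, and annotated, which is why threat intel behaves 
like an enrichment rather than another telemetry source.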

Thanks,
James 



On 4/10/16, 4:54 PM, "Nick Allen"  wrote:

>I had a thought after going through this exercise.  Why treat threat intel
>any different than Netflow, Snort or YAF data?  All input should have the
>opportunity to be enriched using the generic tools that Metron provides.
>Is there any reason to treat threat intel differently from other data
>sources?
>
>
>On Sun, Apr 10, 2016 at 7:36 PM, Nick Allen  wrote:
>
>> It might help to think of our code base as four separate types of
>> functionality.  This is primarily meant to give us a framework to think
>> about the organization of Metron (and drive more discussion), rather than
>> my proposal for a specific structure.
>>
>>- Sensor - Anything that captures external, non-streaming data and
>>presents it in a form ready for stream processing.
>>- Input - Responsible for preparing streaming data for enrichment.
>>The existing "parsers" fit neatly into this space.
>>- Enrichment - Responsible for enriching an incoming data feed like
>>geoip, asset enrichment, threat intel lookups, etc.
>>- Output - Responsible for persisting data that has been processed by
>>Metron which obviously means search indexers or data stores.
>>
>>
>>
>>
>>
>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
>> wrote:
>>
>>> All,
>>>
>>> I would like to propose a review and refactor of the current project
>>> organization within Metron.  Much of the way the legacy code was organized
>>> does not make sense anymore and could be designed so that it is easier to
>>> navigate and understand.  Our test coverage has increased substantially so
>>> I believe we can do this with confidence.
>>>
>>> First off, I think we should agree on a naming convention.  I see some
>>> projects (YARN and Storm for example) that prepend the sub-project with the
>>> name of the top-level project (storm-core for example).  Metron also
>>> currently does this (Metron-Common).  I think that's fine, although in the
>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>> Regardless of whether we decide to stick with that approach, I propose that
>>> project names be uniform and lowercase.  For example, under these
>>> assumptions "Metron-Common" would change to "common".
>>>
>>> The first level of organization makes sense to me.  Only change I would
>>> make would be to project names:
>>>
>>>   *   deployment
>>>   *   streaming
>>>   *   ui
>>>
>>> Or if we want to keep metron in project names:
>>>
>>>   *   metron-deployment
>>>   *   metron-streaming
>>>   *   metron-ui
>>>
>>> For now I don't see any changes necessary in deployment or ui
>>> organization.  I see the streaming project structure primarily driven by 2
>>> things:  the Maven dependency tree and deployment targets.  For example,
>>> solr and elasticsearch code should be separated (because their dependency
>>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>>> now that parser, enrichment and pcap topologies are separate, code for
>>> those topologies will be deployed as separate jars.  No reason to include
>>> parser code in enrichment topologies and vice-versa.  Any other
>>> considerations I'm missing?
>>>
>>> With that being said, here is my initial proposal:
>>>
>>>   *   common -  Any common code that all topologies depend on
>>> (configuration classes, generic writers for example).  No dependencies on
>>> other Metron projects.
>>>   *   test - Contains utilities for writing unit tests, sample configs
>>> and sample data.  Will depend on common.
>>>   *   integration-test - Contains utilities and classes needed to run our
>>> integration tests (in memory components for example).  Will depend on
>>> common and test.
>>>   *   dataload - Contains all code related to data loading.  Will also
>>> include any property files needed and integration tests.  Will depend on
>>> common, test (test scope), and integration-test (test scope).
>>>   *   parser - All code specific to the parser topologies.  Would also
>>> include scripts, property files, flux files and parser topolo

Re: [DISCUSS] Project reorganization

2016-04-10 Thread Nick Allen
I had a thought after going through this exercise.  Why treat threat intel
any different than Netflow, Snort or YAF data?  All input should have the
opportunity to be enriched using the generic tools that Metron provides.
Is there any reason to treat threat intel differently from other data
sources?


On Sun, Apr 10, 2016 at 7:36 PM, Nick Allen  wrote:

> It might help to think of our code base as four separate types of
> functionality.  This is primarily meant to give us a framework to think
> about the organization of Metron (and drive more discussion), rather than
> my proposal for a specific structure.
>
>- Sensor - Anything that captures external, non-streaming data and
>presents it in a form ready for stream processing.
>- Input - Responsible for preparing streaming data for enrichment.
>The existing "parsers" fit neatly into this space.
>- Enrichment - Responsible for enriching an incoming data feed like
>geoip, asset enrichment, threat intel lookups, etc.
>- Output - Responsible for persisting data that has been processed by
>Metron which obviously means search indexers or data stores.
>
>
>
>
>
> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
> wrote:
>
>> All,
>>
>> I would like to propose a review and refactor of the current project
>> organization within Metron.  Much of the way the legacy code was organized
>> does not make sense anymore and could be designed so that it is easier to
>> navigate and understand.  Our test coverage has increased substantially so
>> I believe we can do this with confidence.
>>
>> First off, I think we should agree on a naming convention.  I see some
>> projects (YARN and Storm for example) that prepend the sub-project with the
>> name of the top-level project (storm-core for example).  Metron also
>> currently does this (Metron-Common).  I think that's fine, although in the
>> case of Metron, I feel like having "Metron" prepended is redundant.
>> Regardless of whether we decide to stick with that approach, I propose that
>> project names be uniform and lowercase.  For example, under these
>> assumptions "Metron-Common" would change to "common".
>>
>> The first level of organization makes sense to me.  Only change I would
>> make would be to project names:
>>
>>   *   deployment
>>   *   streaming
>>   *   ui
>>
>> Or if we want to keep metron in project names:
>>
>>   *   metron-deployment
>>   *   metron-streaming
>>   *   metron-ui
>>
>> For now I don't see any changes necessary in deployment or ui
>> organization.  I see the streaming project structure primarily driven by 2
>> things:  the Maven dependency tree and deployment targets.  For example,
>> solr and elasticsearch code should be separated (because their dependency
>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>> now that parser, enrichment and pcap topologies are separate, code for
>> those topologies will be deployed as separate jars.  No reason to include
>> parser code in enrichment topologies and vice-versa.  Any other
>> considerations I'm missing?
>>
>> With that being said, here is my initial proposal:
>>
>>   *   common -  Any common code that all topologies depend on
>> (configuration classes, generic writers for example).  No dependencies on
>> other Metron projects.
>>   *   test - Contains utilities for writing unit tests, sample configs
>> and sample data.  Will depend on common.
>>   *   integration-test - Contains utilities and classes needed to run our
>> integration tests (in memory components for example).  Will depend on
>> common and test.
>>   *   dataload - Contains all code related to data loading.  Will also
>> include any property files needed and integration tests.  Will depend on
>> common, test (test scope), and integration-test (test scope).
>>   *   parser - All code specific to the parser topologies.  Would also
>> include scripts, property files, flux files and parser topology integration
>> tests.  This project will depend on common, test (test scope), and
>> integration-testing (test scope).
>>   *   enrichment - All code specific to the enrichment topologies (except
>> solr and elasticsearch).  Would also include scripts, property files, flux
>> files and enrichment topology integration tests.  This project will depend
>> on common, test (test scope), and integration-test (test scope).
>>   *   elasticsearch - All Elasticsearch related code.  Will depend on
>> enrichment.
>>   *   solr - All Solr related code.  Will depend on enrichment.
>>   *   pcap - All code specific to the topology dedicated to pcap.  Would
>> also include scripts, property files, flux files and pcap integration
>> test.  This project will depend on common, test (test scope) and
>> integration-test (test scope).
>>   *   api - This will serve as a generic replacement for
>> Metron-Pcap_Service.  Will contain all code to build a Metron web service
>> middle layer that can expose APIs through REST or other client protocols.
>> Could 

Re: [DISCUSS] Project reorganization

2016-04-10 Thread Debojyoti Dutta
Hi Nick 

I like your suggestions. For the enrichment layer, do you think it would also 
include any advanced analytics? If not, we might want to have an analytics layer. 

It would be good to have an arch which could be extended for new functionality. 

However Ryan's suggestion of the ui API and deployer also makes sense. 

Should we have an IRC channel to discuss this or maybe etherpad?

Debo

Sent from my iPhone

> On Apr 10, 2016, at 4:36 PM, Nick Allen  wrote:
> 
> It might help to think of our code base as four separate types of
> functionality.  This is primarily meant to give us a framework to think
> about the organization of Metron (and drive more discussion), rather than
> my proposal for a specific structure.
> 
>   - Sensor - Anything that captures external, non-streaming data and
>   presents it in a form ready for stream processing.
>   - Input - Responsible for preparing streaming data for enrichment.  The
>   existing "parsers" fit neatly into this space.
>   - Enrichment - Responsible for enriching an incoming data feed like
>   geoip, asset enrichment, threat intel lookups, etc.
>   - Output - Responsible for persisting data that has been processed by
>   Metron which obviously means search indexers or data stores.
> 
> 
> 
> 
> 
> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman 
> wrote:
> 
>> All,
>> 
>> I would like to propose a review and refactor of the current project
>> organization within Metron.  Much of the way the legacy code was organized
>> does not make sense anymore and could be designed so that it is easier to
>> navigate and understand.  Our test coverage has increased substantially so
>> I believe we can do this with confidence.
>> 
>> First off, I think we should agree on a naming convention.  I see some
>> projects (YARN and Storm for example) that prepend the sub-project with the
>> name of the top-level project (storm-core for example).  Metron also
>> currently does this (Metron-Common).  I think that's fine, although in the
>> case of Metron, I feel like having "Metron" prepended is redundant.
>> Regardless of whether we decide to stick with that approach, I propose that
>> project names be uniform and lowercase.  For example, under these
>> assumptions "Metron-Common" would change to "common".
>> 
>> The first level of organization makes sense to me.  Only change I would
>> make would be to project names:
>> 
>>  *   deployment
>>  *   streaming
>>  *   ui
>> 
>> Or if we want to keep metron in project names:
>> 
>>  *   metron-deployment
>>  *   metron-streaming
>>  *   metron-ui
>> 
>> For now I don't see any changes necessary in deployment or ui
>> organization.  I see the streaming project structure primarily driven by 2
>> things:  the Maven dependency tree and deployment targets.  For example,
>> solr and elasticsearch code should be separated (because their dependency
>> on lucene conflicts) but both will depend on common enrichment code.  Also,
>> now that parser, enrichment and pcap topologies are separate, code for
>> those topologies will be deployed as separate jars.  No reason to include
>> parser code in enrichment topologies and vice-versa.  Any other
>> considerations I'm missing?
>> 
>> With that being said, here is my initial proposal:
>> 
>>  *   common -  Any common code that all topologies depend on
>> (configuration classes, generic writers for example).  No dependencies on
>> other Metron projects.
>>  *   test - Contains utilities for writing unit tests, sample configs and
>> sample data.  Will depend on common.
>>  *   integration-test - Contains utilities and classes needed to run our
>> integration tests (in memory components for example).  Will depend on
>> common and test.
>>  *   dataload - Contains all code related to data loading.  Will also
>> include any property files needed and integration tests.  Will depend on
>> common, test (test scope), and integration-test (test scope).
>>  *   parser - All code specific to the parser topologies.  Would also
>> include scripts, property files, flux files and parser topology integration
>> tests.  This project will depend on common, test (test scope), and
>> integration-test (test scope).
>>  *   enrichment - All code specific to the enrichment topologies (except
>> solr and elasticsearch).  Would also include scripts, property files, flux
>> files and enrichment topology integration tests.  This project will depend
>> on common, test (test scope), and integration-test (test scope).
>>  *   elasticsearch - All Elasticsearch related code.  Will depend on
>> enrichment.
>>  *   solr - All Solr related code.  Will depend on enrichment.
>>  *   pcap - All code specific to the topology dedicated to pcap.  Would
>> also include scripts, property files, flux files and pcap integration
>> test.  This project will depend on common, test (test scope) and
>> integration-test (test scope).
>>  *   api - This will serve as a generic replacement for
>> Metron-Pcap_Service.  Will contain all code to build 

Re: [DISCUSS] Project reorganization

2016-04-10 Thread Nick Allen
It might help to think of our code base as four separate types of
functionality.  This is primarily meant to give us a framework to think
about the organization of Metron (and drive more discussion), rather than
my proposal for a specific structure.

   - Sensor - Anything that captures external, non-streaming data and
   presents it in a form ready for stream processing.
   - Input - Responsible for preparing streaming data for enrichment.  The
   existing "parsers" fit neatly into this space.
   - Enrichment - Responsible for enriching an incoming data feed with, for
   example, geoip, asset, and threat intel lookups.
   - Output - Responsible for persisting data that has been processed by
   Metron, which obviously means search indexers or data stores.





On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman wrote:

> All,
>
> I would like to propose a review and refactor of the current project
> organization within Metron.  Much of the way the legacy code was organized
> does not make sense anymore and could be designed so that it is easier to
> navigate and understand.  Our test coverage has increased substantially so
> I believe we can do this with confidence.
>
> First off, I think we should agree on a naming convention.  I see some
> projects (YARN and Storm for example) that prepend the sub-project with the
> name of the top-level project (storm-core for example).  Metron also
> currently does this (Metron-Common).  I think that's fine, although in the
> case of Metron, I feel like having "Metron" prepended is redundant.
> Regardless of whether we decide to stick with that approach, I propose that
> project names be uniform and lowercase.  For example, under these
> assumptions "Metron-Common" would change to "common".
>
> The first level of organization makes sense to me.  Only change I would
> make would be to project names:
>
>   *   deployment
>   *   streaming
>   *   ui
>
> Or if we want to keep metron in project names:
>
>   *   metron-deployment
>   *   metron-streaming
>   *   metron-ui
>
> For now I don't see any changes necessary in deployment or ui
> organization.  I see the streaming project structure primarily driven by 2
> things:  the Maven dependency tree and deployment targets.  For example,
> solr and elasticsearch code should be separated (because their dependency
> on lucene conflicts) but both will depend on common enrichment code.  Also,
> now that parser, enrichment and pcap topologies are separate, code for
> those topologies will be deployed as separate jars.  No reason to include
> parser code in enrichment topologies and vice-versa.  Any other
> considerations I'm missing?
>
> With that being said, here is my initial proposal:
>
>   *   common -  Any common code that all topologies depend on
> (configuration classes, generic writers for example).  No dependencies on
> other Metron projects.
>   *   test - Contains utilities for writing unit tests, sample configs and
> sample data.  Will depend on common.
>   *   integration-test - Contains utilities and classes needed to run our
> integration tests (in memory components for example).  Will depend on
> common and test.
>   *   dataload - Contains all code related to data loading.  Will also
> include any property files needed and integration tests.  Will depend on
> common, test (test scope), and integration-test (test scope).
>   *   parser - All code specific to the parser topologies.  Would also
> include scripts, property files, flux files and parser topology integration
> tests.  This project will depend on common, test (test scope), and
> integration-test (test scope).
>   *   enrichment - All code specific to the enrichment topologies (except
> solr and elasticsearch).  Would also include scripts, property files, flux
> files and enrichment topology integration tests.  This project will depend
> on common, test (test scope), and integration-test (test scope).
>   *   elasticsearch - All Elasticsearch related code.  Will depend on
> enrichment.
>   *   solr - All Solr related code.  Will depend on enrichment.
>   *   pcap - All code specific to the topology dedicated to pcap.  Would
> also include scripts, property files, flux files and pcap integration
> test.  This project will depend on common, test (test scope) and
> integration-test (test scope).
>   *   api - This will serve as a generic replacement for
> Metron-Pcap_Service.  Will contain all code to build a Metron web service
> middle layer that can expose APIs through REST or other client protocols.
> Could possibly depend on all other projects or separated further if version
> conflicts arise (separate api projects for solr and elasticsearch for
> example).
>
> Looking forward to hearing everyone's feedback and great ideas.
>
> Ryan Merriman
>



-- 
Nick Allen 


Re: [DISCUSS] Project reorganization

2016-04-10 Thread Nick Allen
I agree that we should stick to some sort of naming convention.  Personally,
I prefer to keep the "metron" identifier and alter deployment so that it
matches the others: metron-deployment.

   - metron-deployment
   - metron-streaming
   - metron-ui
   - ...


On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman wrote:

> All,
>
> I would like to propose a review and refactor of the current project
> organization within Metron.  Much of the way the legacy code was organized
> does not make sense anymore and could be designed so that it is easier to
> navigate and understand.  Our test coverage has increased substantially so
> I believe we can do this with confidence.
>
> First off, I think we should agree on a naming convention.  I see some
> projects (YARN and Storm for example) that prepend the sub-project with the
> name of the top-level project (storm-core for example).  Metron also
> currently does this (Metron-Common).  I think that's fine, although in the
> case of Metron, I feel like having "Metron" prepended is redundant.
> Regardless of whether we decide to stick with that approach, I propose that
> project names be uniform and lowercase.  For example, under these
> assumptions "Metron-Common" would change to "common".
>
> The first level of organization makes sense to me.  Only change I would
> make would be to project names:
>
>   *   deployment
>   *   streaming
>   *   ui
>
> Or if we want to keep metron in project names:
>
>   *   metron-deployment
>   *   metron-streaming
>   *   metron-ui
>
> For now I don't see any changes necessary in deployment or ui
> organization.  I see the streaming project structure primarily driven by 2
> things:  the Maven dependency tree and deployment targets.  For example,
> solr and elasticsearch code should be separated (because their dependency
> on lucene conflicts) but both will depend on common enrichment code.  Also,
> now that parser, enrichment and pcap topologies are separate, code for
> those topologies will be deployed as separate jars.  No reason to include
> parser code in enrichment topologies and vice-versa.  Any other
> considerations I'm missing?
>
> With that being said, here is my initial proposal:
>
>   *   common -  Any common code that all topologies depend on
> (configuration classes, generic writers for example).  No dependencies on
> other Metron projects.
>   *   test - Contains utilities for writing unit tests, sample configs and
> sample data.  Will depend on common.
>   *   integration-test - Contains utilities and classes needed to run our
> integration tests (in memory components for example).  Will depend on
> common and test.
>   *   dataload - Contains all code related to data loading.  Will also
> include any property files needed and integration tests.  Will depend on
> common, test (test scope), and integration-test (test scope).
>   *   parser - All code specific to the parser topologies.  Would also
> include scripts, property files, flux files and parser topology integration
> tests.  This project will depend on common, test (test scope), and
> integration-test (test scope).
>   *   enrichment - All code specific to the enrichment topologies (except
> solr and elasticsearch).  Would also include scripts, property files, flux
> files and enrichment topology integration tests.  This project will depend
> on common, test (test scope), and integration-test (test scope).
>   *   elasticsearch - All Elasticsearch related code.  Will depend on
> enrichment.
>   *   solr - All Solr related code.  Will depend on enrichment.
>   *   pcap - All code specific to the topology dedicated to pcap.  Would
> also include scripts, property files, flux files and pcap integration
> test.  This project will depend on common, test (test scope) and
> integration-test (test scope).
>   *   api - This will serve as a generic replacement for
> Metron-Pcap_Service.  Will contain all code to build a Metron web service
> middle layer that can expose APIs through REST or other client protocols.
> Could possibly depend on all other projects or separated further if version
> conflicts arise (separate api projects for solr and elasticsearch for
> example).
>
> Looking forward to hearing everyone's feedback and great ideas.
>
> Ryan Merriman
>



-- 
Nick Allen 


Re: [DISCUSS] Project reorganization

2016-04-10 Thread Nick Allen
Is there any reason to keep the "test" and "integration-test" code
separate?






On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman wrote:

> All,
>
> I would like to propose a review and refactor of the current project
> organization within Metron.  Much of the way the legacy code was organized
> does not make sense anymore and could be designed so that it is easier to
> navigate and understand.  Our test coverage has increased substantially so
> I believe we can do this with confidence.
>
> First off, I think we should agree on a naming convention.  I see some
> projects (YARN and Storm for example) that prepend the sub-project with the
> name of the top-level project (storm-core for example).  Metron also
> currently does this (Metron-Common).  I think that's fine, although in the
> case of Metron, I feel like having "Metron" prepended is redundant.
> Regardless of whether we decide to stick with that approach, I propose that
> project names be uniform and lowercase.  For example, under these
> assumptions "Metron-Common" would change to "common".
>
> The first level of organization makes sense to me.  Only change I would
> make would be to project names:
>
>   *   deployment
>   *   streaming
>   *   ui
>
> Or if we want to keep metron in project names:
>
>   *   metron-deployment
>   *   metron-streaming
>   *   metron-ui
>
> For now I don't see any changes necessary in deployment or ui
> organization.  I see the streaming project structure primarily driven by 2
> things:  the Maven dependency tree and deployment targets.  For example,
> solr and elasticsearch code should be separated (because their dependency
> on lucene conflicts) but both will depend on common enrichment code.  Also,
> now that parser, enrichment and pcap topologies are separate, code for
> those topologies will be deployed as separate jars.  No reason to include
> parser code in enrichment topologies and vice-versa.  Any other
> considerations I'm missing?
>
> With that being said, here is my initial proposal:
>
>   *   common -  Any common code that all topologies depend on
> (configuration classes, generic writers for example).  No dependencies on
> other Metron projects.
>   *   test - Contains utilities for writing unit tests, sample configs and
> sample data.  Will depend on common.
>   *   integration-test - Contains utilities and classes needed to run our
> integration tests (in memory components for example).  Will depend on
> common and test.
>   *   dataload - Contains all code related to data loading.  Will also
> include any property files needed and integration tests.  Will depend on
> common, test (test scope), and integration-test (test scope).
>   *   parser - All code specific to the parser topologies.  Would also
> include scripts, property files, flux files and parser topology integration
> tests.  This project will depend on common, test (test scope), and
> integration-test (test scope).
>   *   enrichment - All code specific to the enrichment topologies (except
> solr and elasticsearch).  Would also include scripts, property files, flux
> files and enrichment topology integration tests.  This project will depend
> on common, test (test scope), and integration-test (test scope).
>   *   elasticsearch - All Elasticsearch related code.  Will depend on
> enrichment.
>   *   solr - All Solr related code.  Will depend on enrichment.
>   *   pcap - All code specific to the topology dedicated to pcap.  Would
> also include scripts, property files, flux files and pcap integration
> test.  This project will depend on common, test (test scope) and
> integration-test (test scope).
>   *   api - This will serve as a generic replacement for
> Metron-Pcap_Service.  Will contain all code to build a Metron web service
> middle layer that can expose APIs through REST or other client protocols.
> Could possibly depend on all other projects or separated further if version
> conflicts arise (separate api projects for solr and elasticsearch for
> example).
>
> Looking forward to hearing everyone's feedback and great ideas.
>
> Ryan Merriman
>



-- 
Nick Allen 


[DISCUSS] Project reorganization

2016-04-08 Thread Ryan Merriman
All,

I would like to propose a review and refactor of the current project 
organization within Metron.  Much of the way the legacy code was organized does 
not make sense anymore and could be designed so that it is easier to navigate 
and understand.  Our test coverage has increased substantially so I believe we 
can do this with confidence.

First off, I think we should agree on a naming convention.  I see some projects 
(YARN and Storm for example) that prepend the sub-project with the name of the 
top-level project (storm-core for example).  Metron also currently does this 
(Metron-Common).  I think that's fine, although in the case of Metron, I feel 
like having "Metron" prepended is redundant.  Regardless of whether we decide 
to stick with that approach, I propose that project names be uniform and 
lowercase.  For example, under these assumptions "Metron-Common" would change 
to "common".
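
To make the two naming options concrete, here is a minimal sketch of the
rule in Python.  The function name and its parameters are hypothetical and
only illustrate the convention described above (uniform, lowercase, with
the "metron" prefix either dropped or kept); nothing like this exists in
the code base.

```python
def normalize_project_name(name, keep_prefix=False, prefix="metron"):
    """Lowercase a project name and drop or keep the top-level prefix.

    Hypothetical helper: only lowercasing and prefix handling are covered,
    since those are the only rules stated in the proposal.
    """
    lowered = name.strip().lower()
    marker = prefix + "-"
    # Strip the prefix if it is already present so it is never doubled.
    bare = lowered[len(marker):] if lowered.startswith(marker) else lowered
    return marker + bare if keep_prefix else bare


print(normalize_project_name("Metron-Common"))                    # common
print(normalize_project_name("Metron-Common", keep_prefix=True))  # metron-common
```

Either option gives a uniform result; the only open question is the value
of keep_prefix.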

The first level of organization makes sense to me.  Only change I would make 
would be to project names:

  *   deployment
  *   streaming
  *   ui

Or if we want to keep metron in project names:

  *   metron-deployment
  *   metron-streaming
  *   metron-ui

For now I don't see any changes necessary in deployment or ui organization.  I 
see the streaming project structure primarily driven by 2 things:  the Maven 
dependency tree and deployment targets.  For example, solr and elasticsearch 
code should be separated (because their dependency on lucene conflicts) but 
both will depend on common enrichment code.  Also, now that parser, enrichment 
and pcap topologies are separate, code for those topologies will be deployed as 
separate jars.  No reason to include parser code in enrichment topologies and 
vice-versa.  Any other considerations I'm missing?

With that being said, here is my initial proposal:

  *   common -  Any common code that all topologies depend on (configuration 
classes, generic writers for example).  No dependencies on other Metron 
projects.
  *   test - Contains utilities for writing unit tests, sample configs and 
sample data.  Will depend on common.
  *   integration-test - Contains utilities and classes needed to run our 
integration tests (in memory components for example).  Will depend on common 
and test.
  *   dataload - Contains all code related to data loading.  Will also include 
any property files needed and integration tests.  Will depend on common, test 
(test scope), and integration-test (test scope).
  *   parser - All code specific to the parser topologies.  Would also include 
scripts, property files, flux files and parser topology integration tests.  
This project will depend on common, test (test scope), and integration-test 
(test scope).
  *   enrichment - All code specific to the enrichment topologies (except solr 
and elasticsearch).  Would also include scripts, property files, flux files and 
enrichment topology integration tests.  This project will depend on common, 
test (test scope), and integration-test (test scope).
  *   elasticsearch - All Elasticsearch related code.  Will depend on 
enrichment.
  *   solr - All Solr related code.  Will depend on enrichment.
  *   pcap - All code specific to the topology dedicated to pcap.  Would also 
include scripts, property files, flux files and pcap integration test.  This 
project will depend on common, test (test scope) and integration-test (test 
scope).
  *   api - This will serve as a generic replacement for Metron-Pcap_Service.  
Will contain all code to build a Metron web service middle layer that can 
expose APIs through REST or other client protocols.  Could possibly depend on 
all other projects or separated further if version conflicts arise (separate 
api projects for solr and elasticsearch for example).
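
As a sanity check on the list above, the proposed dependencies can be
written down as a graph and sorted.  This is only a sketch in Python (3.9+
for graphlib), not part of the proposal.  The module names come from the
list; the helper names are made up for illustration.  Assumptions: Maven
scopes are ignored (test-scope dependencies are treated like any other
edge), and "api" is left out because its dependencies are still open.

```python
from graphlib import TopologicalSorter

# Module -> modules it depends on, transcribed from the proposal.
# Assumption: test-scope dependencies are treated as ordinary edges.
PROPOSED_DEPS = {
    "common": set(),
    "test": {"common"},
    "integration-test": {"common", "test"},
    "dataload": {"common", "test", "integration-test"},
    "parser": {"common", "test", "integration-test"},
    "enrichment": {"common", "test", "integration-test"},
    "elasticsearch": {"enrichment"},
    "solr": {"enrichment"},
    "pcap": {"common", "test", "integration-test"},
}


def build_order(deps):
    """Return one valid build order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


def transitive_deps(deps, module):
    """Return every module reachable from `module` through dependency edges."""
    seen = set()
    stack = list(deps[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return seen


order = build_order(PROPOSED_DEPS)
print(order)

# solr and elasticsearch never reach each other, so their conflicting
# lucene versions would not end up on the same classpath.
print("solr" in transitive_deps(PROPOSED_DEPS, "elasticsearch"))  # False
print("elasticsearch" in transitive_deps(PROPOSED_DEPS, "solr"))  # False
```

The graph has no cycles, "common" always builds first, and solr and
elasticsearch share only enrichment and what sits below it.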

Looking forward to hearing everyone's feedback and great ideas.

Ryan Merriman