Re: [DISCUSS] Project reorganization
Sheetal,

Thank you for the input. We appreciate all the hard work you and others put into OpenSOC to get us to where we are today. To your points:

- Agreed on reevaluating the bolts that now ship with Storm. I believe the HDFS and HBase bolts didn’t quite provide all the functionality needed, which is the reason for the custom implementations, but I will defer to others who actually worked on those tasks.
- Agreed on changing HbaseConverter to HBaseConverter. I will update the spreadsheet.
- Agreed on a common package for HBase related classes. We should look more closely at this; any suggestions are welcome, of course.
- There is a reason Solr and Elasticsearch classes ended up in separate projects. The supported version of Elasticsearch (1.7.4) is a couple of years old and the supported version of Solr is recent. Locating these in the same project is challenging because they depend on very different versions of Lucene. Once we update Elasticsearch to a more recent version that depends on the same Lucene version as Solr, keeping them together in the same project should be much easier.
- Parsers and Enrichments are now decoupled. Before, they were included in the same topology; now they run in different topologies and are deployed in separate jars.
- Agreed on the categories. I believe some in your list are already represented in the proposed project structure. “Data Acquisition” is analogous to the top level “metron-sensors” project. “Data Access” is represented by the top level “metron-ui” project and the “metron-api” project within the top level “metron-platform” project. I like your idea of having “Active Analysis” and “Deep Analytics” projects as well. The real-time pieces are represented in various sub projects in “metron-platform”, but I think there will eventually be a need for a “Deep Analytics” project, which is missing. Maybe we should include a “metron-analytics” project under “metron-platform”? If not now, then in the future when we deliver more functionality in this area?

Ryan Merriman
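As a rough illustration of the Lucene point in the reply above: when two modules are merged into one project, any shared library they request at different versions becomes a conflict that has to be resolved. The sketch below is not Metron code. The class name and the version strings are placeholders chosen for illustration (the thread gives no Lucene or Solr version numbers), and a real build would rely on Maven's own dependency resolution rather than a hand-written check.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: report shared libraries requested at more than one version.
public class VersionConflictSketch {

    // modules: module name -> (library name -> requested version)
    static Map<String, Set<String>> conflicts(Map<String, Map<String, String>> modules) {
        Map<String, Set<String>> versionsByLibrary = new TreeMap<>();
        for (Map<String, String> deps : modules.values()) {
            for (Map.Entry<String, String> dep : deps.entrySet()) {
                versionsByLibrary
                    .computeIfAbsent(dep.getKey(), k -> new TreeSet<>())
                    .add(dep.getValue());
            }
        }
        // Keep only libraries that were requested at two or more distinct versions.
        versionsByLibrary.values().removeIf(versions -> versions.size() < 2);
        return versionsByLibrary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> modules = new TreeMap<>();

        // Placeholder version strings, not the real Lucene versions.
        Map<String, String> elasticsearchDeps = new TreeMap<>();
        elasticsearchDeps.put("lucene-core", "older-version");
        modules.put("metron-elasticsearch", elasticsearchDeps);

        Map<String, String> solrDeps = new TreeMap<>();
        solrDeps.put("lucene-core", "newer-version");
        modules.put("metron-solr", solrDeps);

        System.out.println("Conflicts if merged into one project: " + conflicts(modules));
    }
}
```

Keeping the two indexing modules separate means each one only ever sees its own Lucene version, so the returned map stays empty for each module taken alone.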
Re: [DISCUSS] Project reorganization
Some of the HBase bolt related classes were created in OpenSOC because at that time Storm's HBase bolt did not have all the necessary features (ability to add custom configs, enable/disable WAL, easy tuple mapping, etc.). They should be re-evaluated to see if we can leverage these components from Storm itself so as to avoid additional maintenance.

Some observations and pointers for more thoughts:

* HbaseConverter should be H*B*aseConverter to match other cases.
* org.apache.metron.enrichment.bolt.HBaseBolt.java is in the bolt package but other HBase components are in the hbase package.
* It may be better to base the project structure on functional grouping rather than a mix of function + implementation choices. For example, solr and es could probably be packages rather than sub modules (unless the intention is to support more such "pluggable" indexing mechanisms at any given point).
* parsers/enrichments: are they expected to be reused across multiple projects? If yes, are they different from common? If not, should they be packages instead?
* From a deployment perspective there are essentially the following broader categories:
  1. Data Acquisition (pcap, nifi, flume, kafka writer etc.)
  2. Active Analysis (real time pieces - kafka, storm topology, bolts, parsers, enrichments, alerts etc.)
  3. Deep Analytics (historic data analysis using ML, MR/Hive/Tez/Spark related components)
  4. Data Access (APIs, UI etc.)

Would it make sense to create the project structure in such functional groupings?
Re: [DISCUSS] Project reorganization
Hi Ryan,

This is great. You should attach this to the Jira when you are ready to commit the reorg so we know which parts shifted.

Thanks,
James
Re: [DISCUSS] Project reorganization
Thanks Frank. I’ve updated those in the spreadsheet.
Re: [DISCUSS] Project reorganization
As of now, I think the following classes are not used:

Metron-EnrichmentAdapters
  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
  org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java

Metron-DataLoads
  org.apache.metron.dataloads.cif.HBaseTableLoad.java

Thanks,
Frank Lu
Re: [DISCUSS] Project reorganization
All,

I put together a list of all the project java assets that details where they will be moved (or potentially deleted) as part of the project reorganization. Feedback welcome.

Ryan Merriman
Re: [DISCUSS] Project reorganization
I would not have configs as a project, but rather as a folder structure that other modules can point to.

Thanks,
James
>>>So here is where our metron-platform organization stands:
>>>
>>>metron-common *
>>>metron-integration-test *
>>>metron-test-utilities *
>>>metron-data-management
>>>metron-pcap
>>>metron-parsers
>>>metron-enrichment
>>>  metron-solr
>>>  metron-elasticsearch
>>>metron-api
>>>
>>>* may or may not change depending on the outcome of this discussion
>>>
>>>Thoughts?
>>>
>>>Ryan Merriman
>>>
>>>On 4/11/16, 4:15 PM, "Debojyoti Dutta" wrote:
>>>
>>>> If you load up your Irc client just type
>>>> /join #apache-metron-dev
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Apr 11, 2016, at 12:06 PM, James Sirota wrote:
>>>>>
>>>>> Great, thanks, Debo. Where can I find instructions on how to get to it?
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" wrote:
>>>>>>
>>>>>> Hi James
>>>>>>
>>>>>> Ok set it up and ack ...
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>>> On 4/10/16, 6:31 PM, "James Sirota" wrote:
>>>>>>>
>>>>>>> Hi Debo,
>>>>>>>
>>>>>>> I think it would be great if you set it up
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>>> On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote:
>>>>>>>>
>>>>>>>>> I have set it up for another open source effort in the past and
>>>>>>>>> it was not very hard. Am happy to voluntee
Re: [DISCUSS] Project reorganization
James brings up a good point. I propose adding another project under metron-platform called metron-configuration. This would be a fairly lightweight project that would contain anything related to configuration (property files, json files, flux files, etc). On 4/13/16, 8:56 AM, "James Sirota" wrote: >+1 from me. > >I would also like to address the configs and make sure the configs are in >the same place. Do you have ideas on where we would put those? > >Thanks, >James
Re: [DISCUSS] Project reorganization
+1 I like it. On Wed, Apr 13, 2016 at 9:59 AM, Ryan Merriman wrote: > To answer a couple of other questions people asked: > > Debo, agreed having clear extension points is going to be extremely > important for us. Currently we have well defined interfaces for parsers > and enrichment adapters as well as the ability to load data into and drive > enrichments (threat intels) from HBase tables with well defined key > structures. Eventually we will want to extend this to models. Maybe an > analytical project makes sense when we get to that point? > > Debo and James, yes my vision for the metron-api project is a standard > interface for interacting with Metron. This would include everything from > data access (pcap service) to security and beyond. > > David, let’s explore the best way to leverage the dependencyManagement > section in our top level pom. I think you’re on to something there. Our > maven implementation needs a thorough review as well. > > Ryan Merriman
Re: [DISCUSS] Project reorganization
To answer a couple of other questions people asked: Debo, agreed having clear extension points is going to be extremely important for us. Currently we have well defined interfaces for parsers and enrichment adapters as well as the ability to load data into and drive enrichments (threat intels) from HBase tables with well defined key structures. Eventually we will want to extend this to models. Maybe an analytical project makes sense when we get to that point? Debo and James, yes my vision for the metron-api project is a standard interface for interacting with Metron. This would include everything from data access (pcap service) to security and beyond. David, let’s explore the best way to leverage the dependencyManagement section in our top level pom. I think you’re on to something there. Our maven implementation needs a thorough review as well. Ryan Merriman On 4/13/16, 8:50 AM, "Ryan Merriman" wrote: >Thank you for all the feedback everyone. I will attempt to summarize all >the input we’ve received and update my initial proposal. We can discuss >further if anyone is still unclear and I will volunteer to capture all the >details in a document of some kind once we all come to a consensus. > >Looks like everyone is in agreement for the top level projects. Nick is >working on a task that will require an addition top level project so I am >going to add that in as well: > >metron-deployment >metron-platform >metron-ui >metron-sensors > >All of these except metron-platform are well understood and don’t warrant >any more discussion. For metron-platform there seem to be 2 areas that >are not as clear: > >- whether we need a common project >- how do we organize test related code > >I agree with David and others that a common project will likely get >misused and could become unnecessary bloated. But I suspect there will be >cases where we have common code being used across multiple projects (is >already happening).
In this case we will either need this common project >or we will have to keep common code in one of the other projects and have >all other projects extend that. For the latter, an example would be >keeping common code in enrichment and having parsers declare enrichment as >a dependency. There are a couple downsides I see with this approach: > >- parser topology jars now bring along all the enrichment dependencies >- since more code from various projects are being packaged together, >version conflicts are more likely and poms become more complicated due to >all the necessary exclusions > >My thinking is that any jar file being deployed should only contain what >it needs. Curious what others think here. My vote would be to maintain a >common project (or whatever we want to call it) and be diligent about not >letting project-specific code slip in there. > >I believe Nick was the first person to ask the question about projects >related to test code and why we would need separate test and integration >test. The reason for this is that our integration-test classes currently >depend on other projects (not surprising since they are integration >tests). If there are utilities we want make available to all projects >(mock classes, utilities for reading sample data, etc) then it can’t live >in integration-test because that will introduce circular dependencies. If >it is possible to refactor our current Metron-Testing project so that it >doesn’t depend on any other projects, then we can keep utilities here. >Otherwise we need a separate project for testing utilities. I suspect >removing other project dependencies from Metron-Testing will prove more >difficult than it’s worth so my vote would be to have 2 test related >projects.
> >So here is where our metron-platform organization stands: > >metron-common * >metron-integration-test * >metron-test-utilities * >metron-data-management >metron-pcap >metron-parsers >metron-enrichment > metron-solr > metron-elasticsearch >metron-api > >* may or may not change depending on the outcome of this discussion > >Thoughts? > >Ryan Merriman
Re: [DISCUSS] Project reorganization
+1 from me. I would also like to address the configs and make sure the configs are in the same place. Do you have ideas on where we would put those? Thanks, James On 4/13/16, 6:50 AM, "Ryan Merriman" wrote: >Thank you for all the feedback everyone. I will attempt to summarize all >the input we’ve received and update my initial proposal. We can discuss >further if anyone is still unclear and I will volunteer to capture all the >details in a document of some kind once we all come to a consensus. > >Looks like everyone is in agreement for the top level projects. Nick is >working on a task that will require an addition top level project so I am >going to add that in as well: > >metron-deployment >metron-platform >metron-ui >metron-sensors > >All of these except metron-platform are well understood and don’t warrant >any more discussion. For metron-platform there seem to be 2 areas that >are not as clear: > >- whether we need a common project >- how do we organize test related code > >I agree with David and others that a common project will likely get >misused and could become unnecessary bloated. But I suspect there will be >cases where we have common code being used across multiple projects (is >already happening). In this case we will either need this common project >or we will have to keep common code in one of the other projects and have >all other projects extend that. For the latter, an example would be >keeping common code in enrichment and having parsers declare enrichment as >a dependency. There are a couple downsides I see with this approach: > >- parser topology jars now bring along all the enrichment dependencies >- since more code from various projects are being packaged together, >version conflicts are more likely and poms become more complicated due to >all the necessary exclusions > >My thinking is that any jar file being deployed should only contain what >it needs. Curious what others think here.
My vote would be to maintain a >common project (or whatever we want to call it) and be diligent about not >letting project-specific code slip in there. > >I believe Nick was the first person to ask the question about projects >related to test code and why we would need separate test and integration >test. The reason for this is that our integration-test classes currently >depend on other projects (not surprising since they are integration >tests). If there are utilities we want make available to all projects >(mock classes, utilities for reading sample data, etc) then it can’t live >in integration-test because that will introduce circular dependencies. If >it is possible to refactor our current Metron-Testing project so that it >doesn’t depend on any other projects, then we can keep utilities here. >Otherwise we need a separate project for testing utilities. I suspect >removing other project dependencies from Metron-Testing will prove more >difficult than it’s worth so my vote would be to have 2 test related >projects. > >So here is where our metron-platform organization stands: > >metron-common * >metron-integration-test * >metron-test-utilities * >metron-data-management >metron-pcap >metron-parsers >metron-enrichment > metron-solr > metron-elasticsearch >metron-api > >* may or may not change depending on the outcome of this discussion > >Thoughts? > >Ryan Merriman
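To make the test-project question from Ryan's summary (quoted in this message) a bit more concrete, here is a minimal sketch of the circular-dependency concern. It is not Metron code: the `ModuleGraph` class and the dependency edges below are assumptions chosen only for illustration, and the real poms may declare different dependencies. The idea is that shared test helpers placed in a project that already depends on parsers and enrichment would force those projects to depend back on it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: a tiny module dependency graph (not part of Metron). */
class ModuleGraph {
    // module name -> modules it depends on directly
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    ModuleGraph dependsOn(String module, String... on) {
        deps.computeIfAbsent(module, k -> new ArrayList<>()).addAll(Arrays.asList(on));
        for (String d : on) {
            deps.computeIfAbsent(d, k -> new ArrayList<>());
        }
        return this;
    }

    /** Everything a module pulls in, directly or transitively. */
    Set<String> transitiveDeps(String module) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending =
            new ArrayDeque<>(deps.getOrDefault(module, Collections.emptyList()));
        while (!pending.isEmpty()) {
            String m = pending.pop();
            if (seen.add(m)) {
                pending.addAll(deps.getOrDefault(m, Collections.emptyList()));
            }
        }
        return seen;
    }

    /** True if any module can reach itself through its dependencies. */
    boolean hasCycle() {
        for (String m : deps.keySet()) {
            if (transitiveDeps(m).contains(m)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed layout A: helpers live in a separate metron-test-utilities project.
        ModuleGraph separate = new ModuleGraph()
            .dependsOn("metron-parsers", "metron-common", "metron-test-utilities")
            .dependsOn("metron-enrichment", "metron-common", "metron-test-utilities")
            .dependsOn("metron-integration-test", "metron-parsers", "metron-enrichment");
        System.out.println("separate utilities, cycle: " + separate.hasCycle());

        // Assumed layout B: helpers live in metron-integration-test itself,
        // so parsers must depend on the project that already depends on parsers.
        ModuleGraph combined = new ModuleGraph()
            .dependsOn("metron-integration-test", "metron-parsers")
            .dependsOn("metron-parsers", "metron-integration-test");
        System.out.println("combined project, cycle: " + combined.hasCycle());
    }
}
```

The same `transitiveDeps` walk also illustrates the other point in the summary: if parsers declared enrichment as a dependency just to reach common code, everything enrichment depends on would show up in the parser jar's dependency set as well.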
Re: [DISCUSS] Project reorganization
Thank you for all the feedback everyone. I will attempt to summarize all the input we’ve received and update my initial proposal. We can discuss further if anyone is still unclear and I will volunteer to capture all the details in a document of some kind once we all come to a consensus.

Looks like everyone is in agreement for the top level projects. Nick is working on a task that will require an additional top level project so I am going to add that in as well:

metron-deployment
metron-platform
metron-ui
metron-sensors

All of these except metron-platform are well understood and don’t warrant any more discussion. For metron-platform there seem to be 2 areas that are not as clear:

- whether we need a common project
- how do we organize test related code

I agree with David and others that a common project will likely get misused and could become unnecessarily bloated. But I suspect there will be cases where we have common code being used across multiple projects (this is already happening). In this case we will either need this common project or we will have to keep common code in one of the other projects and have all other projects extend that. For the latter, an example would be keeping common code in enrichment and having parsers declare enrichment as a dependency. There are a couple downsides I see with this approach:

- parser topology jars now bring along all the enrichment dependencies
- since more code from various projects is being packaged together, version conflicts are more likely and poms become more complicated due to all the necessary exclusions

My thinking is that any jar file being deployed should only contain what it needs. Curious what others think here. My vote would be to maintain a common project (or whatever we want to call it) and be diligent about not letting project-specific code slip in there.

I believe Nick was the first person to ask the question about projects related to test code and why we would need separate test and integration test projects. The reason for this is that our integration-test classes currently depend on other projects (not surprising since they are integration tests). If there are utilities we want to make available to all projects (mock classes, utilities for reading sample data, etc) then they can’t live in integration-test because that will introduce circular dependencies. If it is possible to refactor our current Metron-Testing project so that it doesn’t depend on any other projects, then we can keep utilities here. Otherwise we need a separate project for testing utilities. I suspect removing other project dependencies from Metron-Testing will prove more difficult than it’s worth so my vote would be to have 2 test related projects.

So here is where our metron-platform organization stands:

metron-common *
metron-integration-test *
metron-test-utilities *
metron-data-management
metron-pcap
metron-parsers
metron-enrichment
  metron-solr
  metron-elasticsearch
metron-api

* may or may not change depending on the outcome of this discussion

Thoughts?

Ryan Merriman

On 4/11/16, 4:15 PM, "Debojyoti Dutta" wrote:

>If you load up your Irc client just type
>/join #apache-metron-dev
>
>Sent from my iPhone
Re: [DISCUSS] Project reorganization
If you load up your Irc client just type /join #apache-metron-dev Sent from my iPhone > On Apr 11, 2016, at 12:06 PM, James Sirota wrote: > > Great, thanks, Debo. Where can I find instructions on how to get to it? > > Thanks, > James
Re: [DISCUSS] Project reorganization
I just registered 2 channels under my nick (you need to be registered with freenode to create a channel) …. I am on a mac now and textual5 works for me. These are open channels.

[13:10:20] -ChanServ- #apache-metron is now registered to ddutta.
[13:10:20] -ChanServ-
[13:10:20] -ChanServ- Channel guidelines can be found on the freenode website
[13:10:20] -ChanServ- (http://freenode.net/channel_guidelines.shtml).
[13:10:20] -ChanServ- This is a primary namespace channel as per
[13:10:20] -ChanServ- http://freenode.net/policy.shtml#primarychannels
[13:10:20] -ChanServ- If you do not own this name, please consider
[13:10:20] -ChanServ- dropping #apache-metron and using ##apache-metron instead.
[13:11:19] register #apache-metron-dev
[13:11:19] -ChanServ- #apache-metron-dev is now registered to ddutta.
[13:11:19] -ChanServ-
[13:11:19] -ChanServ- Channel guidelines can be found on the freenode website
[13:11:19] -ChanServ- (http://freenode.net/channel_guidelines.shtml).
[13:11:19] -ChanServ- This is a primary namespace channel as per
[13:11:19] -ChanServ- http://freenode.net/policy.shtml#primarychannels
[13:11:19] -ChanServ- If you do not own this name, please consider
[13:11:19] -ChanServ- dropping #apache-metron-dev and using ##apache-metron-dev instead.

On 4/11/16, 12:06 PM, "James Sirota" wrote:

>Great, thanks, Debo. Where can I find instructions on how to get to it?
>
>Thanks,
>James
Re: [DISCUSS] Project reorganization
Great, thanks, Debo. Where can I find instructions on how to get to it? Thanks, James On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" wrote: >Hi James > >Ok set it up and ack ….. > >Thx > > > > > >On 4/10/16, 6:31 PM, "James Sirota" wrote: > >>Hi Debo, >> >>I think it would be great if you set it up >> >>Thanks, >>James >> >> >> >> >>On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: >> >>>I have set it up for another open source effort in the past and it was not >>>very hard. Am happy to volunteer if needed. >>> >>>Thx >>>Debo >>> >>>Sent from my iPhone >>> On Apr 10, 2016, at 5:53 PM, James Sirota wrote: I’d be open to an IRC channel. Does anyone know if Apache allows this? If yes, does anyone know how to set one up? Thanks, James > On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: > > Hi Nick > > I like your suggestions. For the enrichment layer do you think it would > also include any advanced analytics. Else we might want to have an > analytics layer. > > It would be good to have an arch which could be extended for new > functionality. > > However Ryan's suggestion of the ui API and deployer also makes sense. > > Should we have an IRC channel to discuss this or maybe etherpad? > > Debo > > Sent from my iPhone > >> On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: >> >> It might help to think of our code base as four separate types of >> functionality. This is primarily meant to give us a framework to think >> about the organization of Metron (and drive more discussion), rather than >> my proposal for a specific structure. >> >> - Sensor - Anything that captures external, non-streaming data and >> presents it in a form ready for stream processing. >> - Input - Responsible for preparing streaming data for enrichment. The >> existing "parsers" fit neatly into this space. >> - Enrichment - Responsible for enriching an incoming data feed like >> geoip, asset enrichment, threat intel lookups, etc. 
>> - Output - Responsible for persisting data that has been processed by >> Metron which obviously means search indexers or data stores. >> >> >> >> >> >> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman >> wrote: >> >>> All, >>> >>> I would like to propose a review and refactor of the current project >>> organization within Metron. Much of the way the legacy code was >>> organized >>> does not make sense anymore and could be designed so that it is easier >>> to >>> navigate and understand. Our test coverage has increased substantially >>> so >>> I believe we can do this with confidence. >>> >>> First off, I think we should agree on a naming convention. I see some >>> projects (YARN and Storm for example) that prepend the sub-project with >>> the >>> name of the top-level project (storm-core for example). Metron also >>> currently does this (Metron-Common). I think that's fine, although in >>> the >>> case of Metron, I feel like having "Metron" prepended is redundant. >>> Regardless of whether we decide to stick with that approach, I propose >>> that >>> project names be uniform and lowercase. For example, under these >>> assumptions "Metron-Common" would change to "common". >>> >>> The first level of organization makes sense to me. Only change I would >>> make would be to project names: >>> >>> * deployment >>> * streaming >>> * ui >>> >>> Or if we want to keep metron in project names: >>> >>> * metron-deployment >>> * metron-streaming >>> * metron-ui >>> >>> For now I don't see any changes necessary in deployment or ui >>> organization. I see the streaming project structure primarily driven >>> by 2 >>> things: the Maven dependency tree and deployment targets. For example, >>> solr and elasticsearch code should be separated (because their >>> dependency >>> on lucene conflicts) but both will depend on common enrichment code. >>> Also, >>> now that parser, enrichment and pcap topologies are separate, code for >>> those topologies will be deployed as separate jars. 
No reason to >>> include >>> parser code in enrichment topologies and vice-versa. Any other >>> considerations I'm missing? >>> >>> With that being said, here is my initial proposal: >>> >>> * common - Any common code that all topologies depend on >>> (configuration classes, generic writers for example). No dependencies >>> on >>> other Metron projects. >>> * test - Contains utilities for writing unit tests, sample configs and >>> sample data. Will depend on common. >>> * integration-test - Contains utilit
Re: [DISCUSS] Project reorganization
Hi James Ok set it up and ack ….. Thx On 4/10/16, 6:31 PM, "James Sirota" wrote: >Hi Debo, > >I think it would be great if you set it up > >Thanks, >James > > > > >On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: > >>I have set it up for another open source effort in the past and it was not >>very hard. Am happy to volunteer if needed. >> >>Thx >>Debo >> >>Sent from my iPhone >> >>> On Apr 10, 2016, at 5:53 PM, James Sirota wrote: >>> >>> I’d be open to an IRC channel. Does anyone know if Apache allows this? If >>> yes, does anyone know how to set one up? >>> >>> Thanks, >>> James >>> >>> >>> >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: Hi Nick I like your suggestions. For the enrichment layer do you think it would also include any advanced analytics. Else we might want to have an analytics layer. It would be good to have an arch which could be extended for new functionality. However Ryan's suggestion of the ui API and deployer also makes sense. Should we have an IRC channel to discuss this or maybe etherpad? Debo Sent from my iPhone > On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: > > It might help to think of our code base as four separate types of > functionality. This is primarily meant to give us a framework to think > about the organization of Metron (and drive more discussion), rather than > my proposal for a specific structure. > > - Sensor - Anything that captures external, non-streaming data and > presents it in a form ready for stream processing. > - Input - Responsible for preparing streaming data for enrichment. The > existing "parsers" fit neatly into this space. > - Enrichment - Responsible for enriching an incoming data feed like > geoip, asset enrichment, threat intel lookups, etc. > - Output - Responsible for persisting data that has been processed by > Metron which obviously means search indexers or data stores. 
> > > > > > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman > wrote: > >> All, >> >> I would like to propose a review and refactor of the current project >> organization within Metron. Much of the way the legacy code was >> organized >> does not make sense anymore and could be designed so that it is easier to >> navigate and understand. Our test coverage has increased substantially >> so >> I believe we can do this with confidence. >> >> First off, I think we should agree on a naming convention. I see some >> projects (YARN and Storm for example) that prepend the sub-project with >> the >> name of the top-level project (storm-core for example). Metron also >> currently does this (Metron-Common). I think that's fine, although in >> the >> case of Metron, I feel like having "Metron" prepended is redundant. >> Regardless of whether we decide to stick with that approach, I propose >> that >> project names be uniform and lowercase. For example, under these >> assumptions "Metron-Common" would change to "common". >> >> The first level of organization makes sense to me. Only change I would >> make would be to project names: >> >> * deployment >> * streaming >> * ui >> >> Or if we want to keep metron in project names: >> >> * metron-deployment >> * metron-streaming >> * metron-ui >> >> For now I don't see any changes necessary in deployment or ui >> organization. I see the streaming project structure primarily driven by >> 2 >> things: the Maven dependency tree and deployment targets. For example, >> solr and elasticsearch code should be separated (because their dependency >> on lucene conflicts) but both will depend on common enrichment code. >> Also, >> now that parser, enrichment and pcap topologies are separate, code for >> those topologies will be deployed as separate jars. No reason to include >> parser code in enrichment topologies and vice-versa. Any other >> considerations I'm missing? 
>> >> With that being said, here is my initial proposal: >> >> * common - Any common code that all topologies depend on >> (configuration classes, generic writers for example). No dependencies on >> other Metron projects. >> * test - Contains utilities for writing unit tests, sample configs and >> sample data. Will depend on common. >> * integration-test - Contains utilities and classes needed to run our >> integration tests (in memory components for example). Will depend on >> common and test. >> * dataload - Contains all code related to data loading. Will also >> include any property files needed and integration tests. Will depend on >> common, test (test scope
Re: [DISCUSS] Project reorganization
Also, +1 for more intelligent use of dependencyManagement and having each sub-module build independently. I think the top-level POM should build metron streaming as well as the C component. I also tend to favor smaller projects for common code rather than these grab-bag common projects, but I am not strongly opposed. Sorry for typos; commenting from my phone at the airport. Casey On Mon, Apr 11, 2016 at 11:57 Casey Stella wrote: > I'm in general in favor of keeping an integration test project only for > integration test infrastructure (i.e. The inmemory components) and having > the integration tests live in the projects that have the components that > are being tested. > > On Mon, Apr 11, 2016 at 11:36 David Lyle wrote: > >> I think I was thinking along the same lines as James, let me read it back >> to make sure: >> >> Metron >> Platform >> Common (*) >> Integration-Test (*) >> DataManagement >> PCAP >> Parsers >> Enrichment >>Solr >>Elasticsearch >> Deployment >> Streaming >> UI >> >> For Common and Integration-Test, I'd be interested in a little more >> discussion around keeping them. I lean toward not having them. I >> understand >> and support the goal of reuse, but I've found these catch-all projects >> don't always facilitate that aim. We may be better served in the long run >> by aligning these classes with their initial users. For example, wouldn't >> all the bolt interfaces and abstract classes be better homed in >> Enrichment? >> Configuration classes may be best as a separate project under Platform? >> The >> classes in Metron-Testing may have to stick around as a separate project- >> but perhaps not, they seem to be tightly aligned with enrichment type >> integration testing.
>> >> Also- since we're going to have to refactor the poms as part of this >> effort, there are some first order principles that'd I'd be interested in >> hearing other's thoughts about: >> >> 1) mvn (whatever) should run from the top level and each sub-module. >> 2) The top level pom should use a dependencyManagement section to avoid >> global_version type variables. >> 3) All plugins and dependencies should have a specified version (fwiw, I >> think we're pretty good here, but it's worth a look) >> 4) Versioning- master/trunk should be version-SNAPSHOT. >> 5) Other thoughts? >> >> >> -D... >> >> >> On Sun, Apr 10, 2016 at 8:31 PM, James Sirota >> wrote: >> >> > Hi Debo, >> > >> > I think it would be great if you set it up >> > >> > Thanks, >> > James >> > >> > >> > >> > >> > On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: >> > >> > >I have set it up for another open source effort in the past and it was >> > not very hard. Am happy to volunteer if needed. >> > > >> > >Thx >> > >Debo >> > > >> > >Sent from my iPhone >> > > >> > >> On Apr 10, 2016, at 5:53 PM, James Sirota >> > wrote: >> > >> >> > >> I’d be open to an IRC channel. Does anyone know if Apache allows >> > this? If yes, does anyone know how to set one up? >> > >> >> > >> Thanks, >> > >> James >> > >> >> > >> >> > >> >> > >> >> > >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: >> > >>> >> > >>> Hi Nick >> > >>> >> > >>> I like your suggestions. For the enrichment layer do you think it >> > would also include any advanced analytics. Else we might want to have an >> > analytics layer. >> > >>> >> > >>> It would be good to have an arch which could be extended for new >> > functionality. >> > >>> >> > >>> However Ryan's suggestion of the ui API and deployer also makes >> sense. >> > >>> >> > >>> Should we have an IRC channel to discuss this or maybe etherpad? 
>> > >>> >> > >>> Debo >> > >>> >> > >>> Sent from my iPhone >> > >>> >> > On Apr 10, 2016, at 4:36 PM, Nick Allen >> wrote: >> > >> > It might help to think of our code base as four separate types of >> > functionality. This is primarily meant to give us a framework to >> > think >> > about the organization of Metron (and drive more discussion), >> rather >> > than >> > my proposal for a specific structure. >> > >> > - Sensor - Anything that captures external, non-streaming data and >> > presents it in a form ready for stream processing. >> > - Input - Responsible for preparing streaming data for enrichment. >> > The >> > existing "parsers" fit neatly into this space. >> > - Enrichment - Responsible for enriching an incoming data feed >> like >> > geoip, asset enrichment, threat intel lookups, etc. >> > - Output - Responsible for persisting data that has been >> processed by >> > Metron which obviously means search indexers or data stores. >> > >> > >> > >> > >> > >> > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman < >> > rmerri...@hortonworks.com> >> > wrote: >> > >> > > All, >> > > >> > > I would like to propose a review and refactor of the current >> project >> > > organizatio
Re: [DISCUSS] Project reorganization
I'm in general in favor of keeping an integration test project only for integration test infrastructure (i.e., the in-memory components) and having the integration tests live in the projects that contain the components being tested. On Mon, Apr 11, 2016 at 11:36 David Lyle wrote: > I think I was thinking along the same lines as James, let me read it back > to make sure: > > Metron > Platform > Common (*) > Integration-Test (*) > DataManagement > PCAP > Parsers > Enrichment >Solr >Elasticsearch > Deployment > Streaming > UI > > For Common and Integration-Test, I'd be interested in a little more > discussion around keeping them. I lean toward not having them. I understand > and support the goal of reuse, but I've found these catch-all projects > don't always facilitate that aim. We may be better served in the long run > by aligning these classes with their initial users. For example, wouldn't > all the bolt interfaces and abstract classes be better homed in Enrichment? > Configuration classes may be best as a separate project under Platform? The > classes in Metron-Testing may have to stick around as a separate project- > but perhaps not, they seem to be tightly aligned with enrichment type > integration testing. > > Also- since we're going to have to refactor the poms as part of this > effort, there are some first order principles that'd I'd be interested in > hearing other's thoughts about: > > 1) mvn (whatever) should run from the top level and each sub-module. > 2) The top level pom should use a dependencyManagement section to avoid > global_version type variables. > 3) All plugins and dependencies should have a specified version (fwiw, I > think we're pretty good here, but it's worth a look) > 4) Versioning- master/trunk should be version-SNAPSHOT. > 5) Other thoughts? > > > -D...
> > > On Sun, Apr 10, 2016 at 8:31 PM, James Sirota > wrote: > > > Hi Debo, > > > > I think it would be great if you set it up > > > > Thanks, > > James > > > > > > > > > > On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: > > > > >I have set it up for another open source effort in the past and it was > > not very hard. Am happy to volunteer if needed. > > > > > >Thx > > >Debo > > > > > >Sent from my iPhone > > > > > >> On Apr 10, 2016, at 5:53 PM, James Sirota > > wrote: > > >> > > >> I’d be open to an IRC channel. Does anyone know if Apache allows > > this? If yes, does anyone know how to set one up? > > >> > > >> Thanks, > > >> James > > >> > > >> > > >> > > >> > > >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: > > >>> > > >>> Hi Nick > > >>> > > >>> I like your suggestions. For the enrichment layer do you think it > > would also include any advanced analytics. Else we might want to have an > > analytics layer. > > >>> > > >>> It would be good to have an arch which could be extended for new > > functionality. > > >>> > > >>> However Ryan's suggestion of the ui API and deployer also makes > sense. > > >>> > > >>> Should we have an IRC channel to discuss this or maybe etherpad? > > >>> > > >>> Debo > > >>> > > >>> Sent from my iPhone > > >>> > > On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: > > > > It might help to think of our code base as four separate types of > > functionality. This is primarily meant to give us a framework to > > think > > about the organization of Metron (and drive more discussion), rather > > than > > my proposal for a specific structure. > > > > - Sensor - Anything that captures external, non-streaming data and > > presents it in a form ready for stream processing. > > - Input - Responsible for preparing streaming data for enrichment. > > The > > existing "parsers" fit neatly into this space. > > - Enrichment - Responsible for enriching an incoming data feed like > > geoip, asset enrichment, threat intel lookups, etc. 
> > - Output - Responsible for persisting data that has been processed > by > > Metron which obviously means search indexers or data stores. > > > > > > > > > > > > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman < > > rmerri...@hortonworks.com> > > wrote: > > > > > All, > > > > > > I would like to propose a review and refactor of the current > project > > > organization within Metron. Much of the way the legacy code was > > organized > > > does not make sense anymore and could be designed so that it is > > easier to > > > navigate and understand. Our test coverage has increased > > substantially so > > > I believe we can do this with confidence. > > > > > > First off, I think we should agree on a naming convention. I see > > some > > > projects (YARN and Storm for example) that prepend the sub-project > > with the > > > name of the top-level project (storm-core for example). Metron > also > > > currently does this (Metron-Common). I think that's fi
Re: [DISCUSS] Project reorganization
I think I was thinking along the same lines as James, let me read it back to make sure: Metron Platform Common (*) Integration-Test (*) DataManagement PCAP Parsers Enrichment Solr Elasticsearch Deployment Streaming UI For Common and Integration-Test, I'd be interested in a little more discussion around keeping them. I lean toward not having them. I understand and support the goal of reuse, but I've found these catch-all projects don't always facilitate that aim. We may be better served in the long run by aligning these classes with their initial users. For example, wouldn't all the bolt interfaces and abstract classes be better homed in Enrichment? Configuration classes may be best as a separate project under Platform? The classes in Metron-Testing may have to stick around as a separate project, but perhaps not; they seem to be tightly aligned with enrichment-type integration testing. Also, since we're going to have to refactor the poms as part of this effort, there are some first-order principles that I'd be interested in hearing others' thoughts about: 1) mvn (whatever) should run from the top level and each sub-module. 2) The top level pom should use a dependencyManagement section to avoid global_version type variables. 3) All plugins and dependencies should have a specified version (fwiw, I think we're pretty good here, but it's worth a look) 4) Versioning: master/trunk should be version-SNAPSHOT. 5) Other thoughts? -D... On Sun, Apr 10, 2016 at 8:31 PM, James Sirota wrote: > Hi Debo, > > I think it would be great if you set it up > > Thanks, > James > > > > > On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: > > >I have set it up for another open source effort in the past and it was > not very hard. Am happy to volunteer if needed. > > > >Thx > >Debo > > > >Sent from my iPhone > > > >> On Apr 10, 2016, at 5:53 PM, James Sirota > wrote: > >> > >> I’d be open to an IRC channel. Does anyone know if Apache allows > this? If yes, does anyone know how to set one up?
> >> > >> Thanks, > >> James > >> > >> > >> > >> > >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: > >>> > >>> Hi Nick > >>> > >>> I like your suggestions. For the enrichment layer do you think it > would also include any advanced analytics. Else we might want to have an > analytics layer. > >>> > >>> It would be good to have an arch which could be extended for new > functionality. > >>> > >>> However Ryan's suggestion of the ui API and deployer also makes sense. > >>> > >>> Should we have an IRC channel to discuss this or maybe etherpad? > >>> > >>> Debo > >>> > >>> Sent from my iPhone > >>> > On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: > > It might help to think of our code base as four separate types of > functionality. This is primarily meant to give us a framework to > think > about the organization of Metron (and drive more discussion), rather > than > my proposal for a specific structure. > > - Sensor - Anything that captures external, non-streaming data and > presents it in a form ready for stream processing. > - Input - Responsible for preparing streaming data for enrichment. > The > existing "parsers" fit neatly into this space. > - Enrichment - Responsible for enriching an incoming data feed like > geoip, asset enrichment, threat intel lookups, etc. > - Output - Responsible for persisting data that has been processed by > Metron which obviously means search indexers or data stores. > > > > > > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman < > rmerri...@hortonworks.com> > wrote: > > > All, > > > > I would like to propose a review and refactor of the current project > > organization within Metron. Much of the way the legacy code was > organized > > does not make sense anymore and could be designed so that it is > easier to > > navigate and understand. Our test coverage has increased > substantially so > > I believe we can do this with confidence. > > > > First off, I think we should agree on a naming convention. 
I see > some > > projects (YARN and Storm for example) that prepend the sub-project > with the > > name of the top-level project (storm-core for example). Metron also > > currently does this (Metron-Common). I think that's fine, although > in the > > case of Metron, I feel like having "Metron" prepended is redundant. > > Regardless of whether we decide to stick with that approach, I > propose that > > project names be uniform and lowercase. For example, under these > > assumptions "Metron-Common" would change to "common". > > > > The first level of organization makes sense to me. Only change I > would > > make would be to project names: > > > > * deployment > > * streaming > > * ui > > > > Or if we want to keep metron in proje
Re: [DISCUSS] Project reorganization
Hi Debo, I think it would be great if you set it up Thanks, James On 4/10/16, 6:25 PM, "Debojyoti Dutta" wrote: >I have set it up for another open source effort in the past and it was not >very hard. Am happy to volunteer if needed. > >Thx >Debo > >Sent from my iPhone > >> On Apr 10, 2016, at 5:53 PM, James Sirota wrote: >> >> I’d be open to an IRC channel. Does anyone know if Apache allows this? If >> yes, does anyone know how to set one up? >> >> Thanks, >> James >> >> >> >> >>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: >>> >>> Hi Nick >>> >>> I like your suggestions. For the enrichment layer do you think it would >>> also include any advanced analytics. Else we might want to have an >>> analytics layer. >>> >>> It would be good to have an arch which could be extended for new >>> functionality. >>> >>> However Ryan's suggestion of the ui API and deployer also makes sense. >>> >>> Should we have an IRC channel to discuss this or maybe etherpad? >>> >>> Debo >>> >>> Sent from my iPhone >>> On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: It might help to think of our code base as four separate types of functionality. This is primarily meant to give us a framework to think about the organization of Metron (and drive more discussion), rather than my proposal for a specific structure. - Sensor - Anything that captures external, non-streaming data and presents it in a form ready for stream processing. - Input - Responsible for preparing streaming data for enrichment. The existing "parsers" fit neatly into this space. - Enrichment - Responsible for enriching an incoming data feed like geoip, asset enrichment, threat intel lookups, etc. - Output - Responsible for persisting data that has been processed by Metron which obviously means search indexers or data stores. On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman wrote: > All, > > I would like to propose a review and refactor of the current project > organization within Metron. 
Much of the way the legacy code was organized > does not make sense anymore and could be designed so that it is easier to > navigate and understand. Our test coverage has increased substantially so > I believe we can do this with confidence. > > First off, I think we should agree on a naming convention. I see some > projects (YARN and Storm for example) that prepend the sub-project with > the > name of the top-level project (storm-core for example). Metron also > currently does this (Metron-Common). I think that's fine, although in the > case of Metron, I feel like having "Metron" prepended is redundant. > Regardless of whether we decide to stick with that approach, I propose > that > project names be uniform and lowercase. For example, under these > assumptions "Metron-Common" would change to "common". > > The first level of organization makes sense to me. Only change I would > make would be to project names: > > * deployment > * streaming > * ui > > Or if we want to keep metron in project names: > > * metron-deployment > * metron-streaming > * metron-ui > > For now I don't see any changes necessary in deployment or ui > organization. I see the streaming project structure primarily driven by 2 > things: the Maven dependency tree and deployment targets. For example, > solr and elasticsearch code should be separated (because their dependency > on lucene conflicts) but both will depend on common enrichment code. > Also, > now that parser, enrichment and pcap topologies are separate, code for > those topologies will be deployed as separate jars. No reason to include > parser code in enrichment topologies and vice-versa. Any other > considerations I'm missing? > > With that being said, here is my initial proposal: > > * common - Any common code that all topologies depend on > (configuration classes, generic writers for example). No dependencies on > other Metron projects. > * test - Contains utilities for writing unit tests, sample configs and > sample data. 
Will depend on common. > * integration-test - Contains utilities and classes needed to run our > integration tests (in memory components for example). Will depend on > common and test. > * dataload - Contains all code related to data loading. Will also > include any property files needed and integration tests. Will depend on > common, test (test scope), and integration-test (test scope). > * parser - All code specific to the parser topologies. Would also > include scripts, property files, flux files and parser topology > integration > tests. This project will depend on common, t
Re: [DISCUSS] Project reorganization
I have set it up for another open source effort in the past and it was not very hard. Am happy to volunteer if needed. Thx Debo Sent from my iPhone > On Apr 10, 2016, at 5:53 PM, James Sirota wrote: > > I’d be open to an IRC channel. Does anyone know if Apache allows this? If > yes, does anyone know how to set one up? > > Thanks, > James > > > > >> On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote: >> >> Hi Nick >> >> I like your suggestions. For the enrichment layer do you think it would also >> include any advanced analytics. Else we might want to have an analytics >> layer. >> >> It would be good to have an arch which could be extended for new >> functionality. >> >> However Ryan's suggestion of the ui API and deployer also makes sense. >> >> Should we have an IRC channel to discuss this or maybe etherpad? >> >> Debo >> >> Sent from my iPhone >> >>> On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: >>> >>> It might help to think of our code base as four separate types of >>> functionality. This is primarily meant to give us a framework to think >>> about the organization of Metron (and drive more discussion), rather than >>> my proposal for a specific structure. >>> >>> - Sensor - Anything that captures external, non-streaming data and >>> presents it in a form ready for stream processing. >>> - Input - Responsible for preparing streaming data for enrichment. The >>> existing "parsers" fit neatly into this space. >>> - Enrichment - Responsible for enriching an incoming data feed like >>> geoip, asset enrichment, threat intel lookups, etc. >>> - Output - Responsible for persisting data that has been processed by >>> Metron which obviously means search indexers or data stores. >>> >>> >>> >>> >>> >>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman >>> wrote: >>> All, I would like to propose a review and refactor of the current project organization within Metron. 
Much of the way the legacy code was organized does not make sense anymore and could be designed so that it is easier to navigate and understand. Our test coverage has increased substantially so I believe we can do this with confidence. First off, I think we should agree on a naming convention. I see some projects (YARN and Storm for example) that prepend the sub-project with the name of the top-level project (storm-core for example). Metron also currently does this (Metron-Common). I think that's fine, although in the case of Metron, I feel like having "Metron" prepended is redundant. Regardless of whether we decide to stick with that approach, I propose that project names be uniform and lowercase. For example, under these assumptions "Metron-Common" would change to "common". The first level of organization makes sense to me. Only change I would make would be to project names: * deployment * streaming * ui Or if we want to keep metron in project names: * metron-deployment * metron-streaming * metron-ui For now I don't see any changes necessary in deployment or ui organization. I see the streaming project structure primarily driven by 2 things: the Maven dependency tree and deployment targets. For example, solr and elasticsearch code should be separated (because their dependency on lucene conflicts) but both will depend on common enrichment code. Also, now that parser, enrichment and pcap topologies are separate, code for those topologies will be deployed as separate jars. No reason to include parser code in enrichment topologies and vice-versa. Any other considerations I'm missing? With that being said, here is my initial proposal: * common - Any common code that all topologies depend on (configuration classes, generic writers for example). No dependencies on other Metron projects. * test - Contains utilities for writing unit tests, sample configs and sample data. Will depend on common. 
* integration-test - Contains utilities and classes needed to run our integration tests (in memory components for example). Will depend on common and test. * dataload - Contains all code related to data loading. Will also include any property files needed and integration tests. Will depend on common, test (test scope), and integration-test (test scope). * parser - All code specific to the parser topologies. Would also include scripts, property files, flux files and parser topology integration tests. This project will depend on common, test (test scope), and integration-testing (test scope). * enrichment - All code specific to the enrichment topologies (except solr and elasticsearch). Would also include scripts, property files, flux files and enrichment topology integration t
Re: [DISCUSS] Project reorganization
I’d be open to an IRC channel. Does anyone know if Apache allows this? If yes, does anyone know how to set one up?

Thanks,
James

On 4/10/16, 4:52 PM, "Debojyoti Dutta" wrote:

>Hi Nick
>
>I like your suggestions. For the enrichment layer do you think it would also
>include any advanced analytics. Else we might want to have an analytics layer.
>
>It would be good to have an arch which could be extended for new
>functionality.
>
>However Ryan's suggestion of the ui API and deployer also makes sense.
>
>Should we have an IRC channel to discuss this or maybe etherpad?
>
>Debo
>
>Sent from my iPhone
>
>> On Apr 10, 2016, at 4:36 PM, Nick Allen wrote:
>>
>> It might help to think of our code base as four separate types of
>> functionality. This is primarily meant to give us a framework to think
>> about the organization of Metron (and drive more discussion), rather than
>> my proposal for a specific structure.
>>
>> - Sensor - Anything that captures external, non-streaming data and
>> presents it in a form ready for stream processing.
>> - Input - Responsible for preparing streaming data for enrichment. The
>> existing "parsers" fit neatly into this space.
>> - Enrichment - Responsible for enriching an incoming data feed like
>> geoip, asset enrichment, threat intel lookups, etc.
>> - Output - Responsible for persisting data that has been processed by
>> Metron which obviously means search indexers or data stores.
>>
>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman
>> wrote:
>>
>>> All,
>>>
>>> I would like to propose a review and refactor of the current project
>>> organization within Metron. Much of the way the legacy code was organized
>>> does not make sense anymore and could be designed so that it is easier to
>>> navigate and understand. Our test coverage has increased substantially so
>>> I believe we can do this with confidence.
>>>
>>> First off, I think we should agree on a naming convention.
Re: [DISCUSS] Project reorganization
I would put integration test framework into common (since all modules share this). I would also put a unit test framework that other projects can extend into common as well. I would then have each individual module extend the frameworks from common. I don’t think I would want the tests broken up in their own project that live separate from the modules. Thanks, James On 4/10/16, 4:10 PM, "Nick Allen" wrote: >Is there any reason to keep the "test" and "integration-test" code >separate? > > > > > > >On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman >wrote: > >> All, >> >> I would like to propose a review and refactor of the current project >> organization within Metron. Much of the way the legacy code was organized >> does not make sense anymore and could be designed so that it is easier to >> navigate and understand. Our test coverage has increased substantially so >> I believe we can do this with confidence. >> >> First off, I think we should agree on a naming convention. I see some >> projects (YARN and Storm for example) that prepend the sub-project with the >> name of the top-level project (storm-core for example). Metron also >> currently does this (Metron-Common). I think that's fine, although in the >> case of Metron, I feel like having "Metron" prepended is redundant. >> Regardless of whether we decide to stick with that approach, I propose that >> project names be uniform and lowercase. For example, under these >> assumptions "Metron-Common" would change to "common". >> >> The first level of organization makes sense to me. Only change I would >> make would be to project names: >> >> * deployment >> * streaming >> * ui >> >> Or if we want to keep metron in project names: >> >> * metron-deployment >> * metron-streaming >> * metron-ui >> >> For now I don't see any changes necessary in deployment or ui >> organization. I see the streaming project structure primarily driven by 2 >> things: the Maven dependency tree and deployment targets. 
For example, >> solr and elasticsearch code should be separated (because their dependency >> on lucene conflicts) but both will depend on common enrichment code. Also, >> now that parser, enrichment and pcap topologies are separate, code for >> those topologies will be deployed as separate jars. No reason to include >> parser code in enrichment topologies and vice-versa. Any other >> considerations I'm missing? >> >> With that being said, here is my initial proposal: >> >> * common - Any common code that all topologies depend on >> (configuration classes, generic writers for example). No dependencies on >> other Metron projects. >> * test - Contains utilities for writing unit tests, sample configs and >> sample data. Will depend on common. >> * integration-test - Contains utilities and classes needed to run our >> integration tests (in memory components for example). Will depend on >> common and test. >> * dataload - Contains all code related to data loading. Will also >> include any property files needed and integration tests. Will depend on >> common, test (test scope), and integration-test (test scope). >> * parser - All code specific to the parser topologies. Would also >> include scripts, property files, flux files and parser topology integration >> tests. This project will depend on common, test (test scope), and >> integration-testing (test scope). >> * enrichment - All code specific to the enrichment topologies (except >> solr and elasticsearch). Would also include scripts, property files, flux >> files and enrichment topology integration tests. This project will depend >> on common, test (test scope), and integration-test (test scope). >> * elasticsearch - All Elasticsearch related code. Will depend on >> enrichment. >> * solr - All Solr related code. Will depend on enrichment. >> * pcap - All code specific to the topology dedicated to pcap. Would >> also include scripts, property files, flux files and pcap integration >> test. 
This project will depend on common, test (test scope) and >> integration-test (test scope). >> * api - This will serve as a generic replacement for >> Metron-Pcap_Service. Will contain all code to build a Metron web service >> middle layer that can expose APIs through REST or other client protocols. >> Could possibly depend on all other projects or separated further if version >> conflicts arise (separate api projects for solr and elasticsearch for >> example). >> >> Looking forward to hearing everyone's feedback and great ideas. >> >> Ryan Merriman >> > > > >-- >Nick Allen
Re: [DISCUSS] Project reorganization
Hi Ryan,

Here are my thoughts. I agree with the first level of breakdown: Deployment, Streaming, UI. That makes sense, although we may want to rethink Streaming because it will now contain a PCAP MR job, which is batch. I would probably just call it Metron-Platform or something like that. Under Metron-Platform I would have the following projects:

Common - agree with you we need it for the reasons you described. This will help us with code reuse and standardization.

DataManagement - contains data loaders (enrichment, threat intel) + data cleanup and rotation scripts.

PCAP - PCAP Storm topology + PCAP Service + MR job to back the service.

Parsers - Parser topology + parser bolt + parser modules/grok expressions. I think this should be broken up like this to make the incremental cost of adding new topologies as low as possible. To add a new topology we only want a user to build and deploy this jar, and we want this jar to be as light as possible, containing only the code for adding additional sources.

Enrichment - enrichment topology + threat intel + alerts. At the next level down under Enrichment I would include the Elasticsearch and Solr indexing projects as modules. I don’t think they warrant their own project, but they can be sub-modules of enrichment.

API - I am in agreement with you that we need this. However, I think this API should wrap the PCAP service and introduce additional services for security and multi-tenancy (discuss threads are going around right now). We want our security model to be consistently enforced, so we should build it into this module and expose it as REST services.

What do you think?

Thanks,
James

On 4/8/16, 1:46 PM, "Ryan Merriman" wrote:

>All,
>
>I would like to propose a review and refactor of the current project
>organization within Metron. Much of the way the legacy code was organized
>does not make sense anymore and could be designed so that it is easier to
>navigate and understand.
Re: [DISCUSS] Project reorganization
Hi Nick,

Threat intel is almost like an enrichment. A telemetry feed gets cross-referenced against a threat intel feed (think pivot tables), but threat intel in itself is not telemetry. Metron’s Storm topologies parse out individual attributes from telemetries like IDS alerts, OS logs, etc., and these attributes may be user agents, IPs, ports, protocols, etc. Then the threat intel bolts cross-reference that information with anything we have in our threat intel feeds to see if any values for these attributes are contained in the feeds. So if we have an IP, we check to see if that IP is in our list of malicious IPs. If we have a user agent, we check if we have any information on it in our threat feeds. If we do, we tag the telemetry message with is_alert=true to indicate that it received a hit against threat intel, and append whatever threat intel data it hit against.

Thanks,
James

On 4/10/16, 4:54 PM, "Nick Allen" wrote:

>I had a thought after going through this exercise. Why treat threat intel
>any different than Netflow, Snort or YAF data? All input should have the
>opportunity to be enriched using the generic tools that Metron provides.
>Is there any reason to treat threat intel differently from other data
>sources?
>
>
>On Sun, Apr 10, 2016 at 7:36 PM, Nick Allen wrote:
>
>> It might help to think of our code base as four separate types of
>> functionality. This is primarily meant to give us a framework to think
>> about the organization of Metron (and drive more discussion), rather than
>> my proposal for a specific structure.
>>
>>- Sensor - Anything that captures external, non-streaming data and
>>presents it in a form ready for stream processing.
>>- Input - Responsible for preparing streaming data for enrichment.
>>The existing "parsers" fit neatly into this space.
>>- Enrichment - Responsible for enriching an incoming data feed like
>>geoip, asset enrichment, threat intel lookups, etc.
>>- Output - Responsible for persisting data that has been processed by >>Metron which obviously means search indexers or data stores. >> >> >> >> >> >> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman >> wrote: >> >>> All, >>> >>> I would like to propose a review and refactor of the current project >>> organization within Metron. Much of the way the legacy code was organized >>> does not make sense anymore and could be designed so that it is easier to >>> navigate and understand. Our test coverage has increased substantially so >>> I believe we can do this with confidence. >>> >>> First off, I think we should agree on a naming convention. I see some >>> projects (YARN and Storm for example) that prepend the sub-project with the >>> name of the top-level project (storm-core for example). Metron also >>> currently does this (Metron-Common). I think that's fine, although in the >>> case of Metron, I feel like having "Metron" prepended is redundant. >>> Regardless of whether we decide to stick with that approach, I propose that >>> project names be uniform and lowercase. For example, under these >>> assumptions "Metron-Common" would change to "common". >>> >>> The first level of organization makes sense to me. Only change I would >>> make would be to project names: >>> >>> * deployment >>> * streaming >>> * ui >>> >>> Or if we want to keep metron in project names: >>> >>> * metron-deployment >>> * metron-streaming >>> * metron-ui >>> >>> For now I don't see any changes necessary in deployment or ui >>> organization. I see the streaming project structure primarily driven by 2 >>> things: the Maven dependency tree and deployment targets. For example, >>> solr and elasticsearch code should be separated (because their dependency >>> on lucene conflicts) but both will depend on common enrichment code. Also, >>> now that parser, enrichment and pcap topologies are separate, code for >>> those topologies will be deployed as separate jars. 
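The threat intel lookup James describes at the top of this message can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Metron's bolt code: the feed contents, the field names (`ip_src_addr`, `user_agent`) and the `threat_intel_hits` key are hypothetical, and messages are assumed to be flat dictionaries of hashable values. Only the `is_alert` tag comes from the description itself.

```python
# Hypothetical threat intel feeds, keyed by the attribute they cover.
THREAT_INTEL = {
    "ip_src_addr": {"203.0.113.7", "198.51.100.23"},
    "user_agent": {"EvilBot/1.0"},
}


def tag_threat_hits(message, feeds=THREAT_INTEL):
    """Return a copy of message, tagged if any attribute value is in a feed."""
    # Collect every attribute whose value appears in the matching feed.
    hits = {
        field: value
        for field, value in message.items()
        if value in feeds.get(field, ())
    }
    tagged = dict(message)  # leave the caller's message untouched
    if hits:
        tagged["is_alert"] = True           # the message hit threat intel
        tagged["threat_intel_hits"] = hits  # append what it hit against
    return tagged


alert = tag_threat_hits({"ip_src_addr": "203.0.113.7", "ip_dst_port": 443})
clean = tag_threat_hits({"ip_src_addr": "192.0.2.1"})
print(alert)
print(clean)
```

The point of the sketch is that the threat intel feed is consulted as a lookup table for values already parsed out of telemetry, which is why it behaves like an enrichment rather than like another telemetry source.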
Re: [DISCUSS] Project reorganization
I had a thought after going through this exercise. Why treat threat intel any different than Netflow, Snort or YAF data? All input should have the opportunity to be enriched using the generic tools that Metron provides. Is there any reason to treat threat intel differently from other data sources? On Sun, Apr 10, 2016 at 7:36 PM, Nick Allen wrote: > It might help to think of our code base as four separate types of > functionality. This is primarily meant to give us a framework to think > about the organization of Metron (and drive more discussion), rather than > my proposal for a specific structure. > >- Sensor - Anything that captures external, non-streaming data and >presents it in a form ready for stream processing. >- Input - Responsible for preparing streaming data for enrichment. >The existing "parsers" fit neatly into this space. >- Enrichment - Responsible for enriching an incoming data feed like >geoip, asset enrichment, threat intel lookups, etc. >- Output - Responsible for persisting data that has been processed by >Metron which obviously means search indexers or data stores. > > > > > > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman > wrote: > >> All, >> >> I would like to propose a review and refactor of the current project >> organization within Metron. Much of the way the legacy code was organized >> does not make sense anymore and could be designed so that it is easier to >> navigate and understand. Our test coverage has increased substantially so >> I believe we can do this with confidence. >> >> First off, I think we should agree on a naming convention. I see some >> projects (YARN and Storm for example) that prepend the sub-project with the >> name of the top-level project (storm-core for example). Metron also >> currently does this (Metron-Common). I think that's fine, although in the >> case of Metron, I feel like having "Metron" prepended is redundant. 
Re: [DISCUSS] Project reorganization
Hi Nick I like your suggestions. For the enrichment layer do you think it would also include any advanced analytics. Else we might want to have an analytics layer. It would be good to have an arch which could be extended for new functionality. However Ryan's suggestion of the ui API and deployer also makes sense. Should we have an IRC channel to discuss this or maybe etherpad? Debo Sent from my iPhone > On Apr 10, 2016, at 4:36 PM, Nick Allen wrote: > > It might help to think of our code base as four separate types of > functionality. This is primarily meant to give us a framework to think > about the organization of Metron (and drive more discussion), rather than > my proposal for a specific structure. > > - Sensor - Anything that captures external, non-streaming data and > presents it in a form ready for stream processing. > - Input - Responsible for preparing streaming data for enrichment. The > existing "parsers" fit neatly into this space. > - Enrichment - Responsible for enriching an incoming data feed like > geoip, asset enrichment, threat intel lookups, etc. > - Output - Responsible for persisting data that has been processed by > Metron which obviously means search indexers or data stores. > > > > > > On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman > wrote: > >> All, >> >> I would like to propose a review and refactor of the current project >> organization within Metron. Much of the way the legacy code was organized >> does not make sense anymore and could be designed so that it is easier to >> navigate and understand. Our test coverage has increased substantially so >> I believe we can do this with confidence. >> >> First off, I think we should agree on a naming convention. I see some >> projects (YARN and Storm for example) that prepend the sub-project with the >> name of the top-level project (storm-core for example). Metron also >> currently does this (Metron-Common). 
Re: [DISCUSS] Project reorganization
It might help to think of our code base as four separate types of functionality. This is primarily meant to give us a framework to think about the organization of Metron (and drive more discussion), rather than my proposal for a specific structure.

- Sensor - Anything that captures external, non-streaming data and presents it in a form ready for stream processing.
- Input - Responsible for preparing streaming data for enrichment. The existing "parsers" fit neatly into this space.
- Enrichment - Responsible for enriching an incoming data feed: geoip, asset enrichment, threat intel lookups, etc.
- Output - Responsible for persisting data that has been processed by Metron, which means search indexers or data stores.

On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman wrote:

> All,
>
> I would like to propose a review and refactor of the current project organization within Metron. Much of the way the legacy code was organized does not make sense anymore and could be redesigned so that it is easier to navigate and understand. Our test coverage has increased substantially so I believe we can do this with confidence.
>
> First off, I think we should agree on a naming convention. I see some projects (YARN and Storm for example) that prepend the sub-project with the name of the top-level project (storm-core for example). Metron also currently does this (Metron-Common). I think that's fine, although in the case of Metron, I feel like having "Metron" prepended is redundant. Regardless of whether we decide to stick with that approach, I propose that project names be uniform and lowercase. For example, under these assumptions "Metron-Common" would change to "common".
>
> The first level of organization makes sense to me. The only change I would make is to project names:
>
> * deployment
> * streaming
> * ui
>
> Or if we want to keep metron in project names:
>
> * metron-deployment
> * metron-streaming
> * metron-ui
>
> For now I don't see any changes necessary in deployment or ui organization. I see the streaming project structure primarily driven by two things: the Maven dependency tree and deployment targets. For example, Solr and Elasticsearch code should be separated (because their dependencies on Lucene conflict) but both will depend on common enrichment code. Also, now that parser, enrichment and pcap topologies are separate, code for those topologies will be deployed as separate jars. No reason to include parser code in enrichment topologies and vice-versa. Any other considerations I'm missing?
>
> With that being said, here is my initial proposal:
>
> * common - Any common code that all topologies depend on (configuration classes, generic writers for example). No dependencies on other Metron projects.
> * test - Contains utilities for writing unit tests, sample configs and sample data. Will depend on common.
> * integration-test - Contains utilities and classes needed to run our integration tests (in-memory components for example). Will depend on common and test.
> * dataload - Contains all code related to data loading. Will also include any property files needed and integration tests. Will depend on common, test (test scope), and integration-test (test scope).
> * parser - All code specific to the parser topologies. Would also include scripts, property files, flux files and parser topology integration tests. This project will depend on common, test (test scope), and integration-test (test scope).
> * enrichment - All code specific to the enrichment topologies (except Solr and Elasticsearch). Would also include scripts, property files, flux files and enrichment topology integration tests. This project will depend on common, test (test scope), and integration-test (test scope).
> * elasticsearch - All Elasticsearch related code. Will depend on enrichment.
> * solr - All Solr related code. Will depend on enrichment.
> * pcap - All code specific to the topology dedicated to pcap. Would also include scripts, property files, flux files and pcap integration tests. This project will depend on common, test (test scope) and integration-test (test scope).
> * api - This will serve as a generic replacement for Metron-Pcap_Service. Will contain all code to build a Metron web service middle layer that can expose APIs through REST or other client protocols. Could possibly depend on all other projects or be separated further if version conflicts arise (separate api projects for Solr and Elasticsearch for example).
>
> Looking forward to hearing everyone's feedback and great ideas.
>
> Ryan Merriman

--
Nick Allen
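One way to read the four types above against the module list in the proposal is sketched here in Python, purely as an illustration. Only modules that the thread places fairly directly are mapped (parsers as Input, the search indexers as Output); treating the enrichment module as the Enrichment type is an assumption based on the name alone. `Kind`, `MODULE_KIND`, `classify` and `group_by_kind` are hypothetical names, not anything in Metron.

```python
from enum import Enum


class Kind(Enum):
    """The four functional types described in the message above."""
    SENSOR = "sensor"
    INPUT = "input"
    ENRICHMENT = "enrichment"
    OUTPUT = "output"


# Hypothetical mapping from proposed module names to a functional type.
# Modules the thread does not clearly place (common, test, pcap, api, ...)
# are deliberately left out rather than guessed at.
MODULE_KIND = {
    "parser": Kind.INPUT,            # "parsers" fit neatly into Input
    "enrichment": Kind.ENRICHMENT,   # assumption: matched by name only
    "elasticsearch": Kind.OUTPUT,    # search indexer
    "solr": Kind.OUTPUT,             # search indexer
}


def classify(module):
    """Return the functional type of a module, or None if it is not mapped."""
    return MODULE_KIND.get(module)


def group_by_kind(modules):
    """Group module names by functional type; unmapped modules go under None."""
    groups = {}
    for name in modules:
        groups.setdefault(classify(name), []).append(name)
    return groups
```

Calling `group_by_kind` on the full module list makes the unplaced modules visible under the `None` key, which may be a useful starting point for the discussion.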
Re: [DISCUSS] Project reorganization
I agree that we should stick to some sort of naming convention. Personally I prefer to keep the "metron" identifier and alter deployment so that it matches the others: metron-deployment.

- metron-deployment
- metron-streaming
- metron-ui
- ...

--
Nick Allen
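The two naming options discussed in this thread (uniform lowercase names, with or without the "metron" prefix) can be sketched as a small helper. This is only an illustration in Python; `normalize_project_name` and its parameters are hypothetical and not part of any Metron tooling.

```python
def normalize_project_name(name, keep_prefix=True, prefix="metron"):
    """Lowercase a project name and apply the prefix convention consistently.

    keep_prefix=True  gives the "metron-common" style.
    keep_prefix=False gives the bare "common" style.
    """
    lowered = name.strip().lower()
    marker = prefix + "-"
    # Drop an existing prefix first so it is never doubled.
    bare = lowered[len(marker):] if lowered.startswith(marker) else lowered
    return marker + bare if keep_prefix else bare
```

With the prefix kept, "Metron-Common" becomes "metron-common" and "deployment" becomes "metron-deployment"; with `keep_prefix=False`, "Metron-Common" becomes "common".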
Re: [DISCUSS] Project reorganization
Is there any reason to keep the "test" and "integration-test" code separate?

--
Nick Allen
[DISCUSS] Project reorganization
All,

I would like to propose a review and refactor of the current project organization within Metron. Much of the way the legacy code was organized does not make sense anymore and could be redesigned so that it is easier to navigate and understand. Our test coverage has increased substantially so I believe we can do this with confidence.

First off, I think we should agree on a naming convention. I see some projects (YARN and Storm for example) that prepend the sub-project with the name of the top-level project (storm-core for example). Metron also currently does this (Metron-Common). I think that's fine, although in the case of Metron, I feel like having "Metron" prepended is redundant. Regardless of whether we decide to stick with that approach, I propose that project names be uniform and lowercase. For example, under these assumptions "Metron-Common" would change to "common".

The first level of organization makes sense to me. The only change I would make is to project names:

* deployment
* streaming
* ui

Or if we want to keep metron in project names:

* metron-deployment
* metron-streaming
* metron-ui

For now I don't see any changes necessary in deployment or ui organization. I see the streaming project structure primarily driven by two things: the Maven dependency tree and deployment targets. For example, Solr and Elasticsearch code should be separated (because their dependencies on Lucene conflict) but both will depend on common enrichment code. Also, now that parser, enrichment and pcap topologies are separate, code for those topologies will be deployed as separate jars. No reason to include parser code in enrichment topologies and vice-versa. Any other considerations I'm missing?

With that being said, here is my initial proposal:

* common - Any common code that all topologies depend on (configuration classes, generic writers for example). No dependencies on other Metron projects.
* test - Contains utilities for writing unit tests, sample configs and sample data. Will depend on common.
* integration-test - Contains utilities and classes needed to run our integration tests (in-memory components for example). Will depend on common and test.
* dataload - Contains all code related to data loading. Will also include any property files needed and integration tests. Will depend on common, test (test scope), and integration-test (test scope).
* parser - All code specific to the parser topologies. Would also include scripts, property files, flux files and parser topology integration tests. This project will depend on common, test (test scope), and integration-test (test scope).
* enrichment - All code specific to the enrichment topologies (except Solr and Elasticsearch). Would also include scripts, property files, flux files and enrichment topology integration tests. This project will depend on common, test (test scope), and integration-test (test scope).
* elasticsearch - All Elasticsearch related code. Will depend on enrichment.
* solr - All Solr related code. Will depend on enrichment.
* pcap - All code specific to the topology dedicated to pcap. Would also include scripts, property files, flux files and pcap integration tests. This project will depend on common, test (test scope) and integration-test (test scope).
* api - This will serve as a generic replacement for Metron-Pcap_Service. Will contain all code to build a Metron web service middle layer that can expose APIs through REST or other client protocols. Could possibly depend on all other projects or be separated further if version conflicts arise (separate api projects for Solr and Elasticsearch for example).

Looking forward to hearing everyone's feedback and great ideas.

Ryan Merriman
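The dependency structure in the proposal above can be checked with a short sketch. This is Python purely for illustration (the real structure would live in the Maven poms), and it does not distinguish test scope from compile scope. The edges are transcribed from the list; `api` is left out because its dependencies are not pinned down in the proposal. `DEPENDS_ON`, `build_order` and `transitive_deps` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# module -> modules it depends on, transcribed from the proposal.
# Test-scope and compile-scope dependencies are treated alike here.
DEPENDS_ON = {
    "common": set(),
    "test": {"common"},
    "integration-test": {"common", "test"},
    "dataload": {"common", "test", "integration-test"},
    "parser": {"common", "test", "integration-test"},
    "enrichment": {"common", "test", "integration-test"},
    "elasticsearch": {"enrichment"},
    "solr": {"enrichment"},
    "pcap": {"common", "test", "integration-test"},
}


def build_order(graph):
    """Return one valid build order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def transitive_deps(graph, module):
    """Return every module reachable from `module` through its dependencies."""
    seen = set()
    stack = list(graph[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen
```

Under these assumptions the graph has no cycles, `transitive_deps(DEPENDS_ON, "enrichment")` does not contain "parser", and neither of "solr" and "elasticsearch" reaches the other, which matches the two stated goals of separate topology jars and isolated Lucene dependencies.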