Re: [DISCUSS] Project reorganization

James Sirota Sun, 10 Apr 2016 18:31:30 -0700

Hi Debo,

I think it would be great if you set it up.


Thanks,
James 




On 4/10/16, 6:25 PM, "Debojyoti Dutta" <ddu...@gmail.com> wrote:

>I have set it up for another open source effort in the past and it was not 
>very hard. Am happy to volunteer if needed. 
>
>Thx 
>Debo
>
>Sent from my iPhone
>
>> On Apr 10, 2016, at 5:53 PM, James Sirota <jsir...@hortonworks.com> wrote:
>> 
>> I’d be open to an IRC channel.  Does anyone know if Apache allows this?  If 
>> yes, does anyone know how to set one up?
>> 
>> Thanks,
>> James 
>> 
>> 
>> 
>> 
>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" <ddu...@gmail.com> wrote:
>>> 
>>> Hi Nick 
>>> 
>>> I like your suggestions. For the enrichment layer, do you think it would 
>>> also include any advanced analytics? Otherwise we might want to have an 
>>> analytics layer. 
>>> 
>>> It would be good to have an arch which could be extended for new 
>>> functionality. 
>>> 
>>> However, Ryan's suggestion of the UI, API and deployer also makes sense. 
>>> 
>>> Should we have an IRC channel to discuss this, or maybe an Etherpad?
>>> 
>>> Debo
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 10, 2016, at 4:36 PM, Nick Allen <n...@nickallen.org> wrote:
>>>> 
>>>> It might help to think of our code base as four separate types of
>>>> functionality.  This is primarily meant to give us a framework to think
>>>> about the organization of Metron (and drive more discussion), rather than
>>>> my proposal for a specific structure.
>>>> 
>>>>  - Sensor - Anything that captures external, non-streaming data and
>>>>  presents it in a form ready for stream processing.
>>>>  - Input - Responsible for preparing streaming data for enrichment.  The
>>>>  existing "parsers" fit neatly into this space.
>>>>  - Enrichment - Responsible for enriching an incoming data feed with
>>>>  geoip, asset enrichment, threat intel lookups, etc.
>>>>  - Output - Responsible for persisting data that has been processed by
>>>>  Metron, which means search indexers or data stores.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman <rmerri...@hortonworks.com>
>>>> wrote:
>>>> 
>>>>> All,
>>>>> 
>>>>> I would like to propose a review and refactor of the current project
>>>>> organization within Metron.  Much of the way the legacy code was organized
>>>>> does not make sense anymore and could be designed so that it is easier to
>>>>> navigate and understand.  Our test coverage has increased substantially so
>>>>> I believe we can do this with confidence.
>>>>> 
>>>>> First off, I think we should agree on a naming convention.  I see some
>>>>> projects (YARN and Storm for example) that prepend the sub-project with
>>>>> the name of the top-level project (storm-core for example).  Metron also
>>>>> currently does this (Metron-Common).  I think that's fine, although in the
>>>>> case of Metron, I feel like having "Metron" prepended is redundant.
>>>>> Regardless of whether we decide to stick with that approach, I propose
>>>>> that project names be uniform and lowercase.  For example, under these
>>>>> assumptions "Metron-Common" would change to "common".
>>>>> 
>>>>> The first level of organization makes sense to me.  Only change I would
>>>>> make would be to project names:
>>>>> 
>>>>> *   deployment
>>>>> *   streaming
>>>>> *   ui
>>>>> 
>>>>> Or if we want to keep metron in project names:
>>>>> 
>>>>> *   metron-deployment
>>>>> *   metron-streaming
>>>>> *   metron-ui
>>>>> 
>>>>> For now I don't see any changes necessary in deployment or ui
>>>>> organization.  I see the streaming project structure primarily driven by 2
>>>>> things:  the Maven dependency tree and deployment targets.  For example,
>>>>> Solr and Elasticsearch code should be separated (because their
>>>>> dependencies on Lucene conflict) but both will depend on common
>>>>> enrichment code.  Also, now that parser, enrichment and pcap topologies
>>>>> are separate, code for those topologies will be deployed as separate
>>>>> jars.  No reason to include parser code in enrichment topologies and
>>>>> vice-versa.  Any other considerations I'm missing?
>>>>> 
>>>>> With that being said, here is my initial proposal:
>>>>> 
>>>>> *   common -  Any common code that all topologies depend on
>>>>> (configuration classes, generic writers for example).  No dependencies on
>>>>> other Metron projects.
>>>>> *   test - Contains utilities for writing unit tests, sample configs and
>>>>> sample data.  Will depend on common.
>>>>> *   integration-test - Contains utilities and classes needed to run our
>>>>> integration tests (in memory components for example).  Will depend on
>>>>> common and test.
>>>>> *   dataload - Contains all code related to data loading.  Will also
>>>>> include any property files needed and integration tests.  Will depend on
>>>>> common, test (test scope), and integration-test (test scope).
>>>>> *   parser - All code specific to the parser topologies.  Would also
>>>>> include scripts, property files, flux files and parser topology
>>>>> integration tests.  This project will depend on common, test (test
>>>>> scope), and integration-test (test scope).
>>>>> *   enrichment - All code specific to the enrichment topologies (except
>>>>> solr and elasticsearch).  Would also include scripts, property files, flux
>>>>> files and enrichment topology integration tests.  This project will depend
>>>>> on common, test (test scope), and integration-test (test scope).
>>>>> *   elasticsearch - All Elasticsearch related code.  Will depend on
>>>>> enrichment.
>>>>> *   solr - All Solr related code.  Will depend on enrichment.
>>>>> *   pcap - All code specific to the topology dedicated to pcap.  Would
>>>>> also include scripts, property files, flux files and pcap integration
>>>>> tests.  This project will depend on common, test (test scope) and
>>>>> integration-test (test scope).
>>>>> *   api - This will serve as a generic replacement for
>>>>> Metron-Pcap_Service.  Will contain all code to build a Metron web service
>>>>> middle layer that can expose APIs through REST or other client protocols.
>>>>> Could possibly depend on all other projects or be separated further if
>>>>> version conflicts arise (separate api projects for solr and
>>>>> elasticsearch for example).
>>>>> 
>>>>> Looking forward to hearing everyone's feedback and great ideas.
>>>>> 
>>>>> Ryan Merriman
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Nick Allen <n...@nickallen.org>
>>> 
>
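
The module dependencies listed in Ryan's proposal above can be written down
as a small graph, which makes it easy to check that the layout has no cycles
and that parser and enrichment code stay out of each other's jars.  This is
only a rough sketch, not anything that exists in Metron: the names
DEPENDENCIES, build_order and transitive_deps are hypothetical, test-scope
and compile-scope dependencies are treated alike, and api is left out
because the proposal leaves its dependencies open.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Module -> modules it depends on, copied from the proposal in the thread.
# Assumption: test-scope dependencies are treated the same as compile-scope.
DEPENDENCIES = {
    "common": set(),
    "test": {"common"},
    "integration-test": {"common", "test"},
    "dataload": {"common", "test", "integration-test"},
    "parser": {"common", "test", "integration-test"},
    "enrichment": {"common", "test", "integration-test"},
    "elasticsearch": {"enrichment"},
    "solr": {"enrichment"},
    "pcap": {"common", "test", "integration-test"},
}


def build_order(deps):
    """Return one valid build order (dependencies first).

    Raises graphlib.CycleError if the modules depend on each other in a loop.
    """
    return list(TopologicalSorter(deps).static_order())


def transitive_deps(module, deps):
    """Return every module that `module` depends on, directly or indirectly."""
    seen = set()
    stack = list(deps[module])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


order = build_order(DEPENDENCIES)
print("one possible build order:", order)
print("solr pulls in:", sorted(transitive_deps("solr", DEPENDENCIES)))
```

With the graph in this form, adding the api module later (or splitting it per
search backend) is a matter of adding entries and re-running the same checks.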
