All,

I would like to propose a review and refactor of the current project 
organization within Metron.  Much of the legacy code's organization no longer 
makes sense, and it could be restructured so that it is easier to navigate and 
understand.  Our test coverage has increased substantially, so I believe we can 
do this with confidence.

First off, I think we should agree on a naming convention.  Some projects 
(YARN and Storm, for example) prefix sub-project names with the name of the 
top-level project (storm-core, for example).  Metron currently does this as 
well (Metron-Common).  I think that's fine, although in Metron's case I feel 
that prepending "Metron" is redundant.  Regardless of whether we decide to keep 
that approach, I propose that project names be uniform and lowercase.  For 
example, under these assumptions "Metron-Common" would become "common".

The first level of organization makes sense to me.  The only change I would 
make is to the project names:

  *   deployment
  *   streaming
  *   ui

Or, if we want to keep "metron" in project names:

  *   metron-deployment
  *   metron-streaming
  *   metron-ui

For now I don't see any changes necessary to the deployment or ui 
organization.  I see the streaming project structure as driven primarily by two 
things: the Maven dependency tree and deployment targets.  For example, solr and 
elasticsearch code should be separated (because their Lucene dependencies 
conflict), but both will depend on common enrichment code.  Also, now that the 
parser, enrichment and pcap topologies are separate, code for those topologies 
will be deployed as separate jars.  There is no reason to include parser code in 
enrichment topologies and vice versa.  Are there any other considerations I'm 
missing?

With that said, here is my initial proposal:

  *   common - Any common code that all topologies depend on (configuration 
classes and generic writers, for example).  No dependencies on other Metron 
projects.
  *   test - Contains utilities for writing unit tests, sample configs and 
sample data.  Will depend on common.
  *   integration-test - Contains utilities and classes needed to run our 
integration tests (in-memory components, for example).  Will depend on common 
and test.
  *   dataload - Contains all code related to data loading.  Will also include 
any property files needed and integration tests.  Will depend on common, test 
(test scope), and integration-test (test scope).
  *   parser - All code specific to the parser topologies.  Would also include 
scripts, property files, flux files and parser topology integration tests.  
This project will depend on common, test (test scope), and integration-test 
(test scope).
  *   enrichment - All code specific to the enrichment topologies (except solr 
and elasticsearch).  Would also include scripts, property files, flux files and 
enrichment topology integration tests.  This project will depend on common, 
test (test scope), and integration-test (test scope).
  *   elasticsearch - All Elasticsearch related code.  Will depend on 
enrichment.
  *   solr - All Solr related code.  Will depend on enrichment.
  *   pcap - All code specific to the topology dedicated to pcap.  Would also 
include scripts, property files, flux files and pcap integration tests.  This 
project will depend on common, test (test scope), and integration-test (test 
scope).
  *   api - This will serve as a generic replacement for Metron-Pcap_Service.  
It will contain all code needed to build a Metron web service middle layer that 
can expose APIs through REST or other client protocols.  It could possibly 
depend on all other projects, or be split further if version conflicts arise 
(separate api projects for solr and elasticsearch, for example).
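To sanity-check the dependency tree above, here is a rough Python sketch that 
encodes the proposed modules as a graph, derives a build order (failing on any 
cycle), and confirms that no module ends up with both elasticsearch and solr on 
its compile classpath, which is the Lucene conflict mentioned earlier.  This is 
only an illustration under a few assumptions: the scope of test and 
integration-test on common is not stated above, so I assumed Maven's default 
compile scope; api is left out because its dependencies are still open; and 
build_order/compile_closure are hypothetical helpers, not part of any build 
tooling.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Proposed modules -> {dependency: Maven scope}, transcribed from the list above.
# Assumption: "test" and "integration-test" use compile scope on their deps.
# "api" is omitted because its dependencies are not decided yet.
MODULES = {
    "common": {},
    "test": {"common": "compile"},
    "integration-test": {"common": "compile", "test": "compile"},
    "dataload": {"common": "compile", "test": "test", "integration-test": "test"},
    "parser": {"common": "compile", "test": "test", "integration-test": "test"},
    "enrichment": {"common": "compile", "test": "test", "integration-test": "test"},
    "elasticsearch": {"enrichment": "compile"},
    "solr": {"enrichment": "compile"},
    "pcap": {"common": "compile", "test": "test", "integration-test": "test"},
}


def build_order(modules):
    """Return modules so each one comes after its dependencies.

    TopologicalSorter takes node -> predecessors and raises
    graphlib.CycleError if the graph contains a cycle.
    """
    graph = {name: set(deps) for name, deps in modules.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_closure(modules, name):
    """Transitive compile-scope dependencies of one module.

    Only compile edges are followed, since Maven does not propagate
    test-scope dependencies to downstream modules.
    """
    seen = set()
    stack = [name]
    while stack:
        for dep, scope in modules[stack.pop()].items():
            if scope == "compile" and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(build_order(MODULES))
    assert MODULES["common"] == {}, "common must not depend on other modules"
    for module in MODULES:
        classpath = compile_closure(MODULES, module)
        # No module may pull in both Lucene-based backends at compile scope.
        assert not {"elasticsearch", "solr"} <= classpath, module
```

If api later depends on both elasticsearch and solr, the classpath check would 
flag it, which is one way to decide whether api needs to be split.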

Looking forward to hearing everyone's feedback and great ideas.

Ryan Merriman
