Re: [DISCUSS][PROPOSAL] Side Loading and Installation of telemetry sources [METRON-258]

James Sirota Sat, 18 Feb 2017 09:07:52 -0800

I like the idea of having each parser as its own Maven module and having an
archetype for it.  In my vision, when you click to create a Maven "metronParser"
archetype, what you would get is a module consisting of a blank parser template
with a parse() method that a dev would have to fill in, the associated test
template with some rudimentary tests pre-filled, and two test resource files to
populate with raw data and parsed data.  I think it's clean and extensible.
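A minimal sketch of what such an archetype might generate, assuming a simplified, hypothetical `SketchMessageParser` contract (not Metron's actual parser interface). `TemplateParser` is the blank template whose parse() body a developer would replace, and `TemplateParserCheck` stands in for the pre-filled test that compares raw lines against expected parsed values (in the archetype these would come from the two resource files). The `original_string` and `timestamp` field names are illustrative placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified parser contract for this sketch only; it is not
// Metron's actual parser interface.
interface SketchMessageParser {
    void init();
    List<Map<String, Object>> parse(byte[] rawMessage);
}

// Roughly what a generated "metronParser" template could contain: a parser
// whose parse() body the developer replaces with real field extraction.
class TemplateParser implements SketchMessageParser {
    @Override
    public void init() {
        // Compile patterns, load lookup tables, etc.
    }

    @Override
    public List<Map<String, Object>> parse(byte[] rawMessage) {
        String text = new String(rawMessage, StandardCharsets.UTF_8);
        Map<String, Object> message = new HashMap<>();
        // Placeholder: keep the raw text; a real parser would extract fields here.
        message.put("original_string", text);
        message.put("timestamp", System.currentTimeMillis());
        List<Map<String, Object>> messages = new ArrayList<>();
        messages.add(message);
        return messages;
    }
}

// Sketch of the pre-filled test: raw line i (from the raw-data resource file)
// should yield expected value i (from the parsed-data resource file).
class TemplateParserCheck {
    static boolean matches(SketchMessageParser parser, List<String> rawLines, List<String> expected) {
        if (rawLines.size() != expected.size()) {
            return false;
        }
        parser.init();
        for (int i = 0; i < rawLines.size(); i++) {
            List<Map<String, Object>> out =
                parser.parse(rawLines.get(i).getBytes(StandardCharsets.UTF_8));
            if (out.size() != 1 || !expected.get(i).equals(out.get(0).get("original_string"))) {
                return false;
            }
        }
        return true;
    }
}
```

The developer's only required change would be the body of parse(); the check keeps working as long as the expected resource file is updated to match.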


I think one thing we would have to worry about with this approach is classpath
issues.  If there is no longer a top-level POM, then you increase the
chances of different parser modules pulling in different versions of the same
library.
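To make the classpath concern concrete, here is a small sketch, assuming each parser module's resolved dependencies are available as an `artifact -> version` map (the module and artifact names used with it are hypothetical). It reports every artifact that appears with more than one version across modules, which is exactly the situation a shared top-level POM helps prevent.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Given each parser module's resolved dependencies (artifact -> version),
// report the artifacts that are pulled in at more than one version.
class DependencyConflicts {
    static Map<String, Set<String>> find(Map<String, Map<String, String>> moduleDeps) {
        Map<String, Set<String>> versionsByArtifact = new TreeMap<>();
        for (Map<String, String> deps : moduleDeps.values()) {
            for (Map.Entry<String, String> dep : deps.entrySet()) {
                versionsByArtifact
                    .computeIfAbsent(dep.getKey(), k -> new TreeSet<>())
                    .add(dep.getValue());
            }
        }
        // Keep only the artifacts whose versions disagree between modules.
        versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
        return versionsByArtifact;
    }
}
```

In Maven itself, the usual mitigations are a shared parent POM with a `dependencyManagement` section, or the Maven Enforcer Plugin's `dependencyConvergence` rule, which fails the build on this kind of mismatch.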

Thanks,
James 



18.02.2017, 07:24, "Otto Fowler" <ottobackwa...@gmail.com>:
> Thanks for taking the time Matt,
>
> It is likely that I am not seeing your point clearly, could you elaborate
> how Spring or Guice would be applicable to this proposal if there is no
> intent to change the parser’s composition or run-time functionality, but
> rather its deployment and external management? I will admit that I am
> starting with the idea that I don’t want to change how the parsers work, so
> I may be limiting my thinking. This is also based on my limited
> understanding of what we need to deliver to Storm. Even if the parsers, etc.,
> were using Spring or Guice at runtime, wouldn’t we still have to deliver
> the right uber jar to Storm?
>
> With regard to the configuration, the idea would be that the current
> runtime configurations would stay exactly as they are now and would only be
> delivered differently. So ZK->Parser would be the same.
>
> On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:
>
> Outstanding write-up, Otto! As Casey said, don’t expect this to be a
> coherent response, but some possibly useful thoughts:
>
> 1. It’s clear that because parsers, enrichers, and indexers are all
> specialized per sensor, that “adding a new sensor” is necessarily a complex
> operation. You’ve thrown a lasso around it all, and suggested
> auto-generation of the generic parts. Excellent start.
>
> In my fuzzy computer-sciencey way, your sketch makes me view this as an
> Inversion of Control scenario
> (https://en.wikipedia.org/wiki/Inversion_of_control). I know I don’t have
> to define this for our readers, but allow me to quote one paragraph from
> this article:
> http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
>
> “[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be
> added and tested independently of other objects, because they don't depend
> on anything other than what you pass them. When using traditional
> dependencies, to test an object you have to create an environment where all
> of its dependencies exist and are reachable before you can test it. With
> [IoC or] DI, it's possible to test the object in isolation passing it mock
> objects for the ones you don't want or need to create. Likewise, adding a
> class to a project is facilitated because the class is self-contained, so
> this avoids the ‘big hairball’ that large projects often evolve into.”
>
> Surely part of what we want, no? Does it make sense to use Spring or Guice
> to drive the integration (and design) of this extensibility capability? I
> know this could be viewed as an implementation issue, but you said you’re
> starting to prototype, and these things are best integrated from the
> beginning.
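The loose-coupling point in the quoted paragraph can be sketched in plain Java with constructor injection, assuming a hypothetical `ConfigSource` collaborator (in production it might be backed by ZooKeeper). The parser never constructs its own dependency, so a test can hand it a stub and exercise it in isolation. Spring or Guice would automate this wiring; the pattern itself needs no framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical dependency the parser needs; a real one might read ZooKeeper.
interface ConfigSource {
    Map<String, Object> parserConfig(String sensorType);
}

// The collaborator arrives through the constructor instead of being created
// inside the parser, so tests can substitute a stub.
class InjectedParser {
    private final ConfigSource configSource;
    private final String sensorType;

    InjectedParser(ConfigSource configSource, String sensorType) {
        this.configSource = configSource;
        this.sensorType = sensorType;
    }

    // Falls back to a comma when no delimiter is configured.
    String delimiter() {
        Object d = configSource.parserConfig(sensorType).get("delimiter");
        return d == null ? "," : d.toString();
    }

    String[] parse(String line) {
        // Quote the delimiter so characters like "|" are matched literally.
        return line.split(Pattern.quote(delimiter()));
    }
}

// Stub used in place of the real configuration source during a unit test.
class StubConfigSource implements ConfigSource {
    private final Map<String, Object> config = new HashMap<>();

    StubConfigSource with(String key, Object value) {
        config.put(key, value);
        return this;
    }

    @Override
    public Map<String, Object> parserConfig(String sensorType) {
        return config;
    }
}
```

Note that this concerns how a parser is composed and tested; it does not by itself change what has to be packaged and shipped to Storm.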
>
> 2. Regarding configuration, consider that some (dynamic config parameters)
> will be dynamically read during runtime and some (static config parameters)
> will require restarting (or re-instantiating) the components. Config params
> that want to be read dynamically should definitely go in ZK so they can
> take advantage of Curator notifications. Static config params, that can
> only usefully be set at startup or instantiation, could either go in ZK or
> be handled the traditional way in Ambari as files on all configured hosts.
> If you choose to put static params also in ZK, note that separating static
> and dynamic configs into different znodes makes the process of monitoring
> changes in the dynamic configs more efficient, and this is unrelated to the
> human-readable grouping of params the user sees in a UI.
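One way to picture the static/dynamic split is a small partitioning step, assuming the component declares which parameter keys are dynamic. The znode paths used here (`/metron/sensors/<sensor>/static` and `/dynamic`) are hypothetical, not Metron's actual layout. Because dynamic parameters land in their own znode, a watcher on that znode is not triggered by edits to static parameters.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Partitions a sensor's parameters into static and dynamic groups, each
// destined for its own (hypothetical) znode path.
class ConfigSplitter {
    static Map<String, Map<String, Object>> split(
            String sensor, Map<String, Object> params, Set<String> dynamicKeys) {
        String base = "/metron/sensors/" + sensor;
        Map<String, Object> staticParams = new TreeMap<>();
        Map<String, Object> dynamicParams = new TreeMap<>();
        for (Map.Entry<String, Object> e : params.entrySet()) {
            if (dynamicKeys.contains(e.getKey())) {
                dynamicParams.put(e.getKey(), e.getValue());
            } else {
                staticParams.put(e.getKey(), e.getValue());
            }
        }
        // znode path -> parameters to write there
        Map<String, Map<String, Object>> byZnode = new TreeMap<>();
        byZnode.put(base + "/static", staticParams);
        byZnode.put(base + "/dynamic", dynamicParams);
        return byZnode;
    }
}
```

In a real deployment, only the dynamic znode would need a Curator cache recipe (for example `NodeCache`) attached to it; the static znode is read once at startup or instantiation.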
>
> I am talking with Ambari engineers about implementing an ability for Ambari
> to manage config parameters in ZK, at the option of the component
> implementor, and expect to be opening Apache Ambari jiras soon. At the
> Ambari UI level there should be no difference; at the implementation level
> a json or other config file could be written once to a ZK znode instead of
> to filesystem files on all configured hosts. The usages could be mixed,
> with the component implementation deciding which config files get written
> to which target.
>
> 3. Yes I read that far :-)
>
> Again, great draft.
> Thanks,
> --Matt
>
> On 2/17/17, 1:07 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>
> RE:
> * One Module - yes, I think grouping for the base parsers is good, I just
> don’t want them to stay in -common; they should ‘live’ in the Metron lib. I
> think a grouped set of the primitive parsers is correct, still as its own module.
> * ES Templates - they don’t *have* to be there, but if they are, they will
> be used. The idea that I’m having is “someone writing a parser should be
> able to produce one thing, in one place”. We are talking with Simon on a
> different thread about the types of indexing templates we could have. I
> think we could have anything from *nothing* to ES- or Solr-specific to something new.
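The "optional, but used if present" rule for index templates can be sketched as a package model where the template is an `Optional`. All names here (`SourcePackage`, the file names in the test) are hypothetical. A non-indexed source, such as the streaming-enrichment case Casey raises, simply omits the template and gets no index-template step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a self-contained telemetry source package: the shaded
// jar and default parser config are required; the index template is optional.
class SourcePackage {
    final String name;
    final String shadedJar;
    final String parserConfig;
    final Optional<String> indexTemplate;

    SourcePackage(String name, String shadedJar, String parserConfig, Optional<String> indexTemplate) {
        this.name = name;
        this.shadedJar = shadedJar;
        this.parserConfig = parserConfig;
        this.indexTemplate = indexTemplate;
    }

    // Lists deployment steps; the template step appears only when one is packaged.
    List<String> deploymentSteps() {
        List<String> steps = new ArrayList<>();
        steps.add("copy " + shadedJar);
        steps.add("write parser config " + parserConfig);
        indexTemplate.ifPresent(t -> steps.add("install index template " + t));
        return steps;
    }
}
```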
>
> As we discuss we can come up with the mv-pr.
>
> On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:
>
> OK, this is a long one, so don't expect a coherent response just yet, but I
> will give some initial impressions:
>
> - I strongly agree with the premise of this idea. Making Metron
> extensible is and should be among the top of our priorities and at the
> moment, it's painful to develop a new parser.
> - One maven module per parser may be overkill here as the shading is
> costly and I think it may make some sense to group based on characteristics
> in some way (e.g. json and csv may get grouped together).
> - The notion of instance vs parser is a good one
> - Binding ES templates and parsers may not be a good idea. You can have
> non-indexed parsers (e.g. streaming enrichments).
>
> Can we start small here and then iterate toward the complete vision? I'd
> recommend
>
> - Splitting the parsers up into some coherent organization with common
> bits separated from the parser itself
> - Having a maven archetype
>
> as the two most valuable and achievable parts of this idea, since they are
> the bits required to enable users to create parsers without forking Metron.
>
> On Fri, Feb 17, 2017 at 11:54 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>>  The ability for implementors and developers building on the project to
>>  ‘side load’, that is, to build, maintain, and install telemetry sources
>>  into the system without having to actually develop within METRON itself,
>>  is very important.
>>
>>  If done properly, it gives developers an easier and more manageable
>>  proposition for extending METRON to suit their needs in what may be the
>>  most common extension case. It also may reduce the necessity to create
>>  and maintain forks of METRON.
>>
>>  I would like to put forward a proposal on a way to move this forward, and
>>  ask the community for feedback and assistance in reaching an acceptable
>>  approach and raising the issues that I have surely missed.
>>
>>  Conceptually what I would like to propose is the following:
>>
>>  * What is currently metron-parsers should be broken apart such that each
>>  parser is its own individual component
>>  * Each of these components should be completely self-contained (or produce
>>  a self-contained package)
>>  * These packages will include the shaded jar for the parser, default
>>  configurations for the parser and enrichment, default elasticsearch
>>  template, and a default log-rotate script
>>  * These packages will be deployed to disk in a new library directory under
>>  metron
>>  * Zookeeper should have a new telemetry or source area where all
>>  ‘installed’ sources exist
>>  * This area would host the default configurations, rules, templates, and
>>  scripts and metadata
>>  * Installed sources can be instantiated as named instances
>>  * Instantiating an instance will move the default configurations to what is
>>  currently the enrichment and parser areas for the instance name
>>  * It will also deploy the elasticsearch template for the instance
>>  name
>>  * It will deploy the log-rotate scripts
>>  * Installed and instantiated sources can be ‘redeployed’ from disk to
>>  upgrade
>>  * Installed sources are available for selection in ambari
>>  * question on post selection configuration, but we have that problem
>>  already
>>  * Instantiation is exposed through REST
>>  * the UI can install a new package
>>  * the UI can allow a workflow to edit the configurations and templates
>>  before finalizing
>>  * are there three states here? Installed | Edited | Instantiated?
>>  * the UI can edit existing and redeploy
>>  * possibly re-deploy ES template after adding fields, or account for fields
>>  added by enrichment, manually or automatically?
>>  * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
>>  with only configuration
>>  * The installation and instantiation should be exposed through the Stellar
>>  management console
>>  * Starting a topology will now start the parser’s shaded jar found through
>>  the parser type (which may need to be added to the configurations) and the
>>  library
>>  * A Maven Archetype should be created for a parser | telemetry source
>>  project that allows the proper setup of a development project outside the
>>  METRON source tree
>>  * should be published
>>  * should have a useful default set
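The install / edit / instantiate flow above can be sketched with an in-memory stand-in for the ZooKeeper areas, assuming the three states floated in the list (Installed | Edited | Instantiated) are tracked per source type. `SourceRegistry` and the `<libDir>/<type>/<type>-parser.jar` layout are hypothetical. Instantiating copies the installed defaults into the parser and enrichment areas under the instance name, and the jar lookup mirrors "found through the parser type and the library".

```java
import java.util.HashMap;
import java.util.Map;

// States floated in the proposal: Installed | Edited | Instantiated.
enum SourceState { INSTALLED, EDITED, INSTANTIATED }

// Hypothetical in-memory stand-in for the proposed ZooKeeper "source" area
// plus the existing parser and enrichment areas.
class SourceRegistry {
    private final Map<String, String> defaultParserConfigs = new HashMap<>();
    private final Map<String, String> defaultEnrichmentConfigs = new HashMap<>();
    private final Map<String, SourceState> states = new HashMap<>();
    final Map<String, String> parserArea = new HashMap<>();     // instance name -> parser config
    final Map<String, String> enrichmentArea = new HashMap<>(); // instance name -> enrichment config

    void install(String type, String parserDefaults, String enrichmentDefaults) {
        defaultParserConfigs.put(type, parserDefaults);
        defaultEnrichmentConfigs.put(type, enrichmentDefaults);
        states.put(type, SourceState.INSTALLED);
    }

    // Edits the installed defaults before any instance is finalized.
    void editParserDefaults(String type, String parserConfig) {
        requireInstalled(type);
        defaultParserConfigs.put(type, parserConfig);
        states.put(type, SourceState.EDITED);
    }

    // Copies the defaults into the parser and enrichment areas for the instance name.
    void instantiate(String type, String instanceName) {
        requireInstalled(type);
        parserArea.put(instanceName, defaultParserConfigs.get(type));
        enrichmentArea.put(instanceName, defaultEnrichmentConfigs.get(type));
        states.put(type, SourceState.INSTANTIATED);
    }

    SourceState state(String type) {
        return states.get(type);
    }

    // Hypothetical library layout used to locate a source's shaded jar.
    static String shadedJarPath(String libDir, String type) {
        return libDir + "/" + type + "/" + type + "-parser.jar";
    }

    private void requireInstalled(String type) {
        if (!states.containsKey(type)) {
            throw new IllegalStateException("source not installed: " + type);
        }
    }
}
```

Whether Edited should be a distinct persisted state, or just an in-progress UI step, is one of the open questions in the list; the sketch simply records it.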
>>
>>  So the developer’s workflow:
>>
>>  * Create a new project from the archetype outside of the metron tree
>>  * edit the configurations, templates, rules etc in the project
>>  * code or modify the sample
>>  * build
>>  * run the installer script or the ui to upload/deploy the package
>>  * use the console or ui to create an instance
>>
>>  QUESTIONS:
>>  * it seems strange to have this as ‘parsers’ when conceptually parsers are
>>  a part of the whole; should we introduce something like ‘source’ that is
>>  all of it?
>>  * should configurations, etc., be in ZK, on disk, in HDFS, or all of the
>>  above?
>>  * did you read this far? good!
>>  * I am sure that after hitting send I will think of 10 things that are
>>  missing from this
>>
>>  I have started a POC of this, and thus far have created
>>  metron-parsers-common and started breaking out metron-parser-asa.
>>  I will continue to work through some of this here
>>  https://github.com/ottobackwards/incubator-metron/tree/METRON-258
>>
>>  Again, thank you for your time and feedback.

------------------- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
