I like the idea of having each parser as its own Maven module, with an archetype for it. In my vision, when you create a project from a Maven "metronParser" archetype, you would get a module consisting of a blank parser template with a parse() method for the developer to fill in, the associated test template with some rudimentary tests pre-filled, and two test resource files to populate with raw data and parsed data. I think it's clean and extensible.
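To make that a bit more concrete, here is a rough sketch of what the generated module might contain once the template has been filled in. Treat it as an illustration only: TelemetryParser, SampleParser, and SampleParserCheck are names made up for this example and are not Metron's actual classes, and the whitespace-separated key=value format is just a stand-in for whatever the sensor really emits.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser contract for this sketch; NOT Metron's real parser interface.
interface TelemetryParser {
    Map<String, Object> parse(byte[] rawMessage);
}

// What the "blank" template could look like after a developer fills in parse()
// for a toy format of whitespace-separated key=value tokens.
class SampleParser implements TelemetryParser {
    @Override
    public Map<String, Object> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8).trim();
        Map<String, Object> fields = new LinkedHashMap<>();
        // Keep the raw line alongside the extracted fields.
        fields.put("original_string", line);
        for (String token : line.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
            // Tokens without a key are ignored in this sketch.
        }
        return fields;
    }
}

// Rudimentary pre-filled check: one raw record in, one expected parsed record out.
// In a generated module the two arguments would presumably be loaded from the
// two test resource files (raw data and parsed data).
class SampleParserCheck {
    static boolean matches(TelemetryParser parser, String raw, Map<String, Object> expected) {
        Map<String, Object> actual = parser.parse(raw.getBytes(StandardCharsets.UTF_8));
        return expected.equals(actual);
    }
}
```

The point of keeping the check data-driven is that a developer adding a new sensor would only need to edit parse() and drop sample records into the two resource files; the test itself would not have to change.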
I think one thing we would have to worry about with this approach is classpath issues. If there is no longer a top-level POM, you increase the chances of different parser modules pulling in different versions of the same library.

Thanks,
James

18.02.2017, 07:24, "Otto Fowler" <ottobackwa...@gmail.com>:
> Thanks for taking the time Matt,
>
> It is likely that I am not seeing your point clearly, could you elaborate
> how Spring or Guice would be applicable to this proposal if there is no
> intent to change the parser’s composition or run-time functionality, but
> rather it’s deployment and external management? I will admit that I am
> starting with the idea that I don’t want to change how the parsers work, so
> I may be limiting my thinking. This is also based on my limited
> understanding of what we need to deliver to storm. Even if the parsers etc
> were using spring or guice at runtime, wouldn’t we still have to deliver
> the right uber jar to storm?
>
> With regards to the configuration, the idea would be that the current
> runtime configurations would stay exactly how they are now, only be
> delivered differently. So ZK->Parser would be the same.
>
> On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:
>
> Outstanding write-up, Otto! As Casey said, don’t expect this to be a
> coherent response, but some possibly useful thoughts:
>
> 1. It’s clear that because parsers, enrichers, and indexers are all
> specialized per sensor, that “adding a new sensor” is necessarily a complex
> operation. You’ve thrown a lasso around it all, and suggested
> auto-generation of the generic parts. Excellent start.
>
> In my fuzzy computer-sciencey way, your sketch makes me view this as an
> Inversion of Control scenario (
> https://en.wikipedia.org/wiki/Inversion_of_control ).
> I know I don’t have to define this for our readers, but allow me to quote
> one paragraph, from article
> http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html
> :
>
> “[IoC (or DI)] delivers a key advantage: loose coupling. Objects can be
> added and tested independently of other objects, because they don't depend
> on anything other than what you pass them. When using traditional
> dependencies, to test an object you have to create an environment where all
> of its dependencies exist and are reachable before you can test it. With
> [IoC or] DI, it's possible to test the object in isolation passing it mock
> objects for the ones you don't want or need to create. Likewise, adding a
> class to a project is facilitated because the class is self-contained, so
> this avoids the ‘big hairball’ that large projects often evolve into.”
>
> Surely part of what we want, no? Does it make sense to use Spring or Guice
> to drive the integration (and design) of this extensibility capability? I
> know this could be viewed as an implementation issue, but you said you’re
> starting to prototype, and these things are best integrated from the
> beginning.
>
> 2. Regarding configuration, consider that some (dynamic config parameters)
> will be dynamically read during runtime and some (static config parameters)
> will require restarting (or re-instantiating) the components. Config params
> that want to be read dynamically should definitely go in ZK so they can
> take advantage of Curator notifications. Static config params, that can
> only usefully be set at startup or instantiation, could either go in ZK or
> be handled the traditional way in Ambari as files on all configured hosts.
> If you choose to put static params also in ZK, note that separating static
> and dynamic configs into different znodes makes the process of monitoring
> changes in the dynamic configs more efficient, and this is unrelated to the
> human-readable grouping of params the user sees in a UI.
>
> I am talking with Ambari engineers about implementing an ability for Ambari
> to manage config parameters in ZK, at the option of the component
> implementor, and expect to be opening Apache Ambari jiras soon. At the
> Ambari UI level there should be no difference; at the implementation level
> a json or other config file could be written once to a ZK znode instead of
> to filesystem files on all configured hosts. The usages could be mixed,
> with the component implementation deciding which config files get written
> to which target.
>
> 3. Yes I read that far :-)
>
> Again, great draft.
> Thanks,
> --Matt
>
> On 2/17/17, 1:07 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>
> RE:
> * One Module - yes, I think grouping for the base parsers is good, I just
>   don’t want them to stay in -common, it should ‘live’ in the metron lib. I
>   think a grouped set of the primitive parsers is correct, still it’s own.
> * ES Templates - they don’t *have* to be there, but if they are they will
>   be used. The idea that I’m having is “ someone writing a parser should be
>   able to produce 1 thing, in one place”. We are talking with Simon on a
>   different thread about the types of indexing templates we could have. I
>   think we could have from *nothing to es or solr specific to something new
>
> As we discuss we can come up with the mv-pr.
>
> On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:
>
> Ok, This is a long one, so don't expect a coherent response just yet, but I
> will give some initial impressions:
>
> - I strongly agree with the premise of this idea.
>   Making Metron extensible is and should be among the top of our
>   priorities and at the moment, it's painful to develop a new parser.
> - One maven module per parser may be overkill here as the shading is
>   costly and I think it may make some sense to group based on characteristics
>   in some way (e.g. json and csv may get grouped together).
> - The notion of instance vs parser is a good one
> - Binding ES templates and parsers may not be a good idea. You can have
>   non-indexed parsers (e.g. streaming enrichments).
>
> Can we start small here and then iterate toward the complete vision? I'd
> recommend
>
> - Splitting the parsers up into some coherent organization with common
>   bits separated from the parser itself
> - Having a maven archetype
>
> As the two most valuable and achievable parts of this idea since they are
> the bits required to enable users to create parsers without forking Metron.
>
> On Fri, Feb 17, 2017 at 11:54 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> The ability for implementors and developers building on the project to
>> ‘side load’, that is to build, maintain, and install, telemetry sources
>> into the system without having to actually develop within METRON itself
>> is very important.
>>
>> If done properly it gives developers and easier and more manageable
>> proposition for extending METRON to suit their needs in what may be the
>> most common extension case. It also may reduce the necessity to create
>> and maintain forks of METRON.
>>
>> I would like to put forward a proposal on a way to move this forward, and
>> ask the community for feedback and assistance in reaching an acceptable
>> approach and raising the issues that I have surely missed.
>>
>> Conceptually what I would like to propose is the following:
>>
>> * What is currently metron-parsers should be broken apart such that each
>> parser is it’s own individual component
>> * Each of these components should be completely self contained ( or
>> produce a self contained package )
>> * These packages will include the shaded jar for the parser, default
>> configurations for the parser and enrichment, default elasticsearch
>> template, and a default log-rotate script
>> * These packages will be deployed to disk in a new library directory
>> under metron
>> * Zookeeper should have a new telemetry or source area where all
>> ‘installed’ sources exist
>> * This area would host the default configurations, rules, templates, and
>> scripts and metadata
>> * Installed sources can be instantiated as named instances
>> * Instantiating an instance will move the default configurations to what
>> is currently the enrichment and parser areas for the instance name
>> * It will also deploy the elasticsearch template for the instance
>> name
>> * It will deploy the log-rotate scripts
>> * Installed and instantiated sources can be ‘redeployed’ from disk to
>> upgrade
>> * Installed sources are available for selection in ambari
>> * question on post selection configuration, but we have that problem
>> already
>> * Instantiation is exposed through REST
>> * the UI can install a new package
>> * the UI can allow a workflow to edit the configurations and templates
>> before finalizing
>> * are there three states here? Installed | Edited | Instantiated ?
>> * the UI can edit existing and redeploy
>> * possibly re-deploy ES template after adding fields or account for
>> fields added by enrichment…. manually or automatically?
>> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv )
>> with only configuration
>> * The installation and instantiation should be exposed through the
>> Stellar management console
>> * Starting a topology will now start the parser’s shaded jar found
>> through the parser type ( which may need to added to the configurations )
>> and the library
>> * A Maven Archetype should be created for a parser | telemetry source
>> project that allows the proper setup of a development project outside the
>> METRON source tree
>> * should be published
>> * should have a useful default set
>>
>> So the developer’s workflow:
>>
>> * Create a new project from the archetype outside of the metron tree
>> * edit the configurations, templates, rules etc in the project
>> * code or modify the sample
>> * build
>> * run the installer script or the ui to upload/deploy the package
>> * use the console or ui to create an instance
>>
>> QUESTIONS:
>> * it seems strange to have this as ‘parsers’ when conceptually parsers
>> are a part of the whole, should we introduce something like ‘source’ that
>> is all of it?
>> * should configurations etc be in ZK or on disk? or HDFS? or All of the
>> above?
>> * did you read this far? good!
>> * I am sure that after hitting send I will think of 10 things that are
>> missing from this
>>
>> I have started a POC of this, and thus far have created
>> metron-parsers-common and started breaking out metron-parser-asa.
>> I will continue to work through some of this here
>> https://github.com/ottobackwards/incubator-metron/tree/METRON-258
>>
>> Again, thank you for your time and feedback.

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org