Awesome write-up and ideas, Otto. I also strongly support this idea. As someone who has the development of a few parsers quickly approaching the top of their to-do list, I will happily beta test this for you when it's far enough along for that. Until then I will attempt to take a look at your branch and come up to speed more thoroughly.
Regarding the storage of configurations, I'm in favor of ZK largely for the reasons that mattf mentioned, but also for organizational reasons. Finding configurations should be intuitive, and to do that I think storing them in a common area is reasonable and makes them easier to audit. I'll leave my specific comments regarding management of indexing templates for the other thread, but I think that getting our arms around a solution where modifications in one part of the stack account for updates to other places will be key in improving adoption.

Also, James, I have had direct requests regarding the templating parser assistance that you outlined, so I know that parts of the community are looking for that exact feature. It gets a big +1 from me.

Jon

On Sat, Feb 18, 2017, 12:51 PM Otto Fowler <ottobackwa...@gmail.com> wrote:

I plan on looking at the NiFi archetype to see if there is something there about this and other things. I think this is very similar to the nar.

On February 18, 2017 at 12:07:35, James Sirota (jsir...@apache.org) wrote:

I like the idea of having each parser as its own maven module and having an archetype for it. In my vision when you click to create a maven "metronParser" archetype what you would get is a module consisting of a blank parser template with a parse() method that a dev would have to fill in, the associated test template with some rudimentary tests pre-filled, and two test resource files to populate with raw data and parsed data. I think it's clean and extensible. I think one thing we would have to worry about with this approach is classpath issues. If there is not a top-level POM anymore then you are increasing chances of different parser modules pulling in different versions of the same library.
Thanks,
James

18.02.2017, 07:24, "Otto Fowler" <ottobackwa...@gmail.com>:
> Thanks for taking the time Matt,
>
> It is likely that I am not seeing your point clearly, could you elaborate how Spring or Guice would be applicable to this proposal if there is no intent to change the parser’s composition or run-time functionality, but rather its deployment and external management? I will admit that I am starting with the idea that I don’t want to change how the parsers work, so I may be limiting my thinking. This is also based on my limited understanding of what we need to deliver to storm. Even if the parsers etc were using spring or guice at runtime, wouldn’t we still have to deliver the right uber jar to storm?
>
> With regards to the configuration, the idea would be that the current runtime configurations would stay exactly how they are now, only be delivered differently. So ZK->Parser would be the same.
>
> On February 17, 2017 at 19:06:16, Matt Foley (ma...@apache.org) wrote:
>
> Outstanding write-up, Otto! As Casey said, don’t expect this to be a coherent response, but some possibly useful thoughts:
>
> 1. It’s clear that because parsers, enrichers, and indexers are all specialized per sensor, that “adding a new sensor” is necessarily a complex operation. You’ve thrown a lasso around it all, and suggested auto-generation of the generic parts. Excellent start.
>
> In my fuzzy computer-sciencey way, your sketch makes me view this as an Inversion of Control scenario ( https://en.wikipedia.org/wiki/Inversion_of_control ). I know I don’t have to define this for our readers, but allow me to quote one paragraph, from article http://www.javaworld.com/article/2071914/excellent-explanation-of-dependency-injection--inversion-of-control-.html :
>
> “[IoC (or DI)] delivers a key advantage: loose coupling.
> Objects can be added and tested independently of other objects, because they don't depend on anything other than what you pass them. When using traditional dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable before you can test it. With [IoC or] DI, it's possible to test the object in isolation passing it mock objects for the ones you don't want or need to create. Likewise, adding a class to a project is facilitated because the class is self-contained, so this avoids the ‘big hairball’ that large projects often evolve into.”
>
> Surely part of what we want, no? Does it make sense to use Spring or Guice to drive the integration (and design) of this extensibility capability? I know this could be viewed as an implementation issue, but you said you’re starting to prototype, and these things are best integrated from the beginning.
>
> 2. Regarding configuration, consider that some (dynamic config parameters) will be dynamically read during runtime and some (static config parameters) will require restarting (or re-instantiating) the components. Config params that want to be read dynamically should definitely go in ZK so they can take advantage of Curator notifications. Static config params, that can only usefully be set at startup or instantiation, could either go in ZK or be handled the traditional way in Ambari as files on all configured hosts. If you choose to put static params also in ZK, note that separating static and dynamic configs into different znodes makes the process of monitoring changes in the dynamic configs more efficient, and this is unrelated to the human-readable grouping of params the user sees in a UI.
>
> I am talking with Ambari engineers about implementing an ability for Ambari to manage config parameters in ZK, at the option of the component implementor, and expect to be opening Apache Ambari jiras soon.
> At the Ambari UI level there should be no difference; at the implementation level a json or other config file could be written once to a ZK znode instead of to filesystem files on all configured hosts. The usages could be mixed, with the component implementation deciding which config files get written to which target.
>
> 3. Yes I read that far :-)
>
> Again, great draft.
> Thanks,
> --Matt
>
> On 2/17/17, 1:07 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>
> RE:
> * One Module - yes, I think grouping for the base parsers is good, I just don’t want them to stay in -common, it should ‘live’ in the metron lib. I think a grouped set of the primitive parsers is correct, still its own.
> * ES Templates - they don’t *have* to be there, but if they are they will be used. The idea that I’m having is “someone writing a parser should be able to produce 1 thing, in one place”. We are talking with Simon on a different thread about the types of indexing templates we could have. I think we could have from *nothing to es or solr specific to something new
>
> As we discuss we can come up with the mv-pr.
>
> On February 17, 2017 at 15:47:57, Casey Stella (ceste...@gmail.com) wrote:
>
> Ok, This is a long one, so don't expect a coherent response just yet, but I will give some initial impressions:
>
> - I strongly agree with the premise of this idea. Making Metron extensible is and should be among the top of our priorities and at the moment, it's painful to develop a new parser.
> - One maven module per parser may be overkill here as the shading is costly and I think it may make some sense to group based on characteristics in some way (e.g. json and csv may get grouped together).
> - The notion of instance vs parser is a good one
> - Binding ES templates and parsers may not be a good idea. You can have non-indexed parsers (e.g. streaming enrichments).
>
> Can we start small here and then iterate toward the complete vision?
> I'd recommend
>
> - Splitting the parsers up into some coherent organization with common bits separated from the parser itself
> - Having a maven archetype
>
> As the two most valuable and achievable parts of this idea since they are the bits required to enable users to create parsers without forking Metron.
>
> On Fri, Feb 17, 2017 at 11:54 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
>> The ability for implementors and developers building on the project to ‘side load’, that is to build, maintain, and install, telemetry sources into the system without having to actually develop within METRON itself is very important.
>>
>> If done properly it gives developers an easier and more manageable proposition for extending METRON to suit their needs in what may be the most common extension case. It also may reduce the necessity to create and maintain forks of METRON.
>>
>> I would like to put forward a proposal on a way to move this forward, and ask the community for feedback and assistance in reaching an acceptable approach and raising the issues that I have surely missed.
>>
>> Conceptually what I would like to propose is the following:
>>
>> * What is currently metron-parsers should be broken apart such that each parser is its own individual component
>> * Each of these components should be completely self contained ( or produce a self contained package )
>> * These packages will include the shaded jar for the parser, default configurations for the parser and enrichment, default elasticsearch template, and a default log-rotate script
>> * These packages will be deployed to disk in a new library directory under metron
>> * Zookeeper should have a new telemetry or source area where all ‘installed’ sources exist
>> * This area would host the default configurations, rules, templates, and scripts and metadata
>> * Installed sources can be instantiated as named instances
>> * Instantiating an instance will move the default configurations to what is currently the enrichment and parser areas for the instance name
>> * It will also deploy the elasticsearch template for the instance name
>> * It will deploy the log-rotate scripts
>> * Installed and instantiated sources can be ‘redeployed’ from disk to upgrade
>> * Installed sources are available for selection in ambari
>> * question on post selection configuration, but we have that problem already
>> * Instantiation is exposed through REST
>> * the UI can install a new package
>> * the UI can allow a workflow to edit the configurations and templates before finalizing
>> * are there three states here? Installed | Edited | Instantiated?
>> * the UI can edit existing and redeploy
>> * possibly re-deploy ES template after adding fields or account for fields added by enrichment…. manually or automatically?
>> * a script can be made to instantiate a ‘base’ parser ( json, grok, csv ) with only configuration
>> * The installation and instantiation should be exposed through the Stellar management console
>> * Starting a topology will now start the parser’s shaded jar found through the parser type ( which may need to be added to the configurations ) and the library
>> * A Maven Archetype should be created for a parser | telemetry source project that allows the proper setup of a development project outside the METRON source tree
>> * should be published
>> * should have a useful default set
>>
>> So the developer’s workflow:
>>
>> * Create a new project from the archetype outside of the metron tree
>> * edit the configurations, templates, rules etc in the project
>> * code or modify the sample
>> * build
>> * run the installer script or the ui to upload/deploy the package
>> * use the console or ui to create an instance
>>
>> QUESTIONS:
>> * it seems strange to have this as ‘parsers’ when conceptually parsers are a part of the whole, should we introduce something like ‘source’ that is all of it?
>> * should configurations etc be in ZK or on disk? or HDFS? or All of the above?
>> * did you read this far? good!
>> * I am sure that after hitting send I will think of 10 things that are missing from this
>>
>> I have started a POC of this, and thus far have created metron-parsers-common and started breaking out metron-parser-asa.
>> I will continue to work through some of this here
>> https://github.com/ottobackwards/incubator-metron/tree/METRON-258
>>
>> Again, thank you for your time and feedback.

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

--

Jon

Sent from my mobile device
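
For anyone trying to picture the "installed source" versus "named instance" split in the original proposal, here is a rough Python sketch of how instantiation might copy a package's defaults into the per-instance parser and enrichment areas. It is only an illustration under stated assumptions: the store is a plain in-memory dict standing in for ZK, and the path prefixes, the SourceRegistry class, and the sourceType field are all hypothetical names, not anything from Metron or from the POC branch.

```python
from copy import deepcopy

# Hypothetical path layout, for illustration only. These are NOT Metron's
# actual ZooKeeper paths.
SOURCES_ROOT = "/metron/sources"
INSTANCE_ROOTS = {
    "parser": "/metron/topology/parsers",
    "enrichment": "/metron/topology/enrichments",
}


class SourceRegistry:
    """In-memory stand-in for the config store (ZK in the proposal)."""

    def __init__(self):
        self.store = {}  # path -> config dict

    def install(self, source_type, defaults):
        """Record a package's default configs in the 'installed' area."""
        for kind, config in defaults.items():
            self.store[f"{SOURCES_ROOT}/{source_type}/{kind}"] = deepcopy(config)

    def instantiate(self, source_type, instance_name):
        """Copy installed defaults into the per-instance areas."""
        sources = {
            kind: f"{SOURCES_ROOT}/{source_type}/{kind}" for kind in INSTANCE_ROOTS
        }
        # Check everything up front so a failure leaves no half-made instance.
        missing = [kind for kind, path in sources.items() if path not in self.store]
        if missing:
            raise KeyError(f"source '{source_type}' has no installed {missing}")
        for kind, path in sources.items():
            config = deepcopy(self.store[path])
            # Hypothetical field: remember which installed package backs this
            # instance, so a topology could later find the matching jar.
            config["sourceType"] = source_type
            self.store[f"{INSTANCE_ROOTS[kind]}/{instance_name}"] = config


registry = SourceRegistry()
registry.install("asa", {
    "parser": {"sensorTopic": "asa"},
    "enrichment": {"index": "asa"},
})
registry.instantiate("asa", "asa_dmz")
```

Because the instance gets its own copy, editing the configuration for asa_dmz leaves the installed defaults untouched, which is roughly what would make a later 'redeploy' from the package possible.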