I like the flexibility and extensibility of having some kind of internal representation (generic definition) of the names and types of the fields produced by a parser.
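As a rough illustration of what such a generic definition could look like, the Python sketch below pairs each field name with a backend-neutral type. It then translates that list into the "properties" section of an Elasticsearch mapping. This is a minimal sketch under stated assumptions, not Metron code. FieldType, FieldDef, to_es_properties and the example fields are hypothetical. The Elasticsearch type names assume a 5.x-style mapping (older versions use string rather than keyword), and timestamps are assumed to be epoch milliseconds.

```python
import json
from dataclasses import dataclass
from enum import Enum


class FieldType(Enum):
    """Backend-neutral field types (a hypothetical, minimal set)."""
    STRING = "string"
    IP = "ip"
    TIMESTAMP = "timestamp"
    LONG = "long"
    DOUBLE = "double"


@dataclass(frozen=True)
class FieldDef:
    """One entry in a parser's generic definition: a field name and its type."""
    name: str
    type: FieldType


# Hypothetical generic definition a parser could ship instead of an ES template.
EXAMPLE_PARSER_FIELDS = [
    FieldDef("ip_src_addr", FieldType.IP),
    FieldDef("ip_dst_addr", FieldType.IP),
    FieldDef("protocol", FieldType.STRING),
    FieldDef("timestamp", FieldType.TIMESTAMP),
    FieldDef("bytes_in", FieldType.LONG),
]

# Elasticsearch translation table, assuming 5.x-style type names and
# epoch-millisecond timestamps. Another indexer would supply its own table.
ES_TYPES = {
    FieldType.STRING: {"type": "keyword"},
    FieldType.IP: {"type": "ip"},
    FieldType.TIMESTAMP: {"type": "date", "format": "epoch_millis"},
    FieldType.LONG: {"type": "long"},
    FieldType.DOUBLE: {"type": "double"},
}


def to_es_properties(fields):
    """Build the 'properties' section of an ES mapping from a generic definition."""
    # Copy each entry so callers can modify the result without touching ES_TYPES.
    return {"properties": {f.name: dict(ES_TYPES[f.type]) for f in fields}}


if __name__ == "__main__":
    print(json.dumps(to_es_properties(EXAMPLE_PARSER_FIELDS), indent=2))
```

Under this design, supporting another indexer would mean adding a translation table alongside ES_TYPES, while the parser's definition stays unchanged.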
Rather than shipping with an ES template, a parser would ship with a generic definition of the field names and data types that it adds to a message. The Elasticsearch indexer would then translate this generic definition into an Elasticsearch template. Each additional indexer (Solr, etc.) would know how to consume the same generic definition and produce whatever artifacts it needs to index the data accurately.

On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org> wrote:

> I am not sure I agree with packaging source-specific templates with the parser. I think that would make it harder to add additional storage sources. For example, what happens if I have 50 parsers with Solr and ES schemas defined, but now I want to add Druid? Now I have to add 50 schemas to all my existing parsers, which I don't think makes sense. I think what we should have instead is tuple mappers that map some internal representation of our schema to whatever schema the tool uses. We have already started down this path somewhat, with Kyle defining the schema enum for his ASA parser PR and Simon defining a JSON schema for his CEF parser PR. I think we need to unify these approaches and then propagate them to all the parsers. I think what has to happen is the following:
>
> We have to introduce a partial schema for Metron messages, where you can enforce a schema on the part of a message you want while leaving the rest of the message flexible. What I mean by that is that you should enforce a schema for things like ip, protocol, timestamp, etc., but have a fully flexible structure outside of that.
>
> After you do that, you can map the partial schema you defined to ES, Solr, Druid, etc. For the fields you don't have a schema for, you just assume they are strings.
> To add an additional storage/indexing source to Metron, all you do is define a mapper to that source's schema and load it into our indexing bolt.
>
> Thanks,
> James
>
> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
> > I think this is a good direction to move things toward: packaging indexing templates with parsers (using multiple tiered options), which are then merged with the possible enrichment fields before being added to the indexing technology in use. Now, to read the proposal thread...
> >
> > Jon
> >
> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >
> >> I’d broadly agree with that tiered approach.
> >>
> >> That is, the version where the parser emits a generic schema and enrichments contribute generic schema chunks to it, which are combined into an indexer-specific template generated at the end of the flow. So yes, pretty much in line with your proposal. (I did read through it; apologies if I missed any of the detail, my brain is still a little bit post-RSA!)
> >>
> >> Simon
> >>
> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >> >
> >> > We already make them do this now, or they get the defaults. So this is no different.
> >> > Having parsers emit names, types, etc. would be another step, or it could actually be the ‘generic schema’ as implemented.
> >> >
> >> > A tiered approach, from:
> >> > * you give nothing with the parser: you get whatever ES guesses at, but you don’t care, do you?
> >> > * you give the schema
> >> > * you give the types and we figure it out for you
> >> >
> >> > would be the best to move to.
> >> >
> >> > Also, we could use the names-and-types method tied to enrichment to generate (or rather, derive) indexing templates for enrichment types, which I mention in my proposal.
> >> >
> >> > I’m starting to think you haven’t rushed out to read it, Simon ;)
> >> >
> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball (si...@simonellistonball.com) wrote:
> >> >
> >> >> I like that, to an extent… Forcing the provision of an explicit schema might be a bit of a burden for parser development. I’m assuming that custom parsers would be pushed towards the same packaging approach.
> >> >>
> >> >> Would it make sense to require the parser to emit the field names and types it expects, then for us to provide a means of creating the templates for supported indices, and push the actual template management to the index layer rather than the parsing layer? Schema is, after all, determined not just by a parser but also by the combination of enrichments and models applied.
> >> >>
> >> >> We could also, of course, provide an override option within your proposed parser package model to allow any destination-specific configuration of the indexing template.
> >> >>
> >> >> Simon
> >> >>
> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >> >> >
> >> >> > I think we can get there from my proposal.
> >> >> > A source may package:
> >> >> > * explicit schemas (ES, Solr, FOO)
> >> >> > * a generic, to-be-invented schema for a to-be-invented pluggable indexing component :)
> >> >> > and we’ll be able to handle it.
> >> >> >
> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson (kylerichards...@gmail.com) wrote:
> >> >> >
> >> >> > I personally like the idea of a typed schema per parser that we could translate to multiple targets. This would allow us a lot more modularity and extensibility in indexing down the road.
> >> >> >
> >> >> > -Kyle
> >> >> >
> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >> >> >
> >> >> >> That sounds like a great idea, Otto. Do you have any early design on that we can look at? Also, rather than just Elastic templates, do you think we should have some sort of typed schema we could translate to multiple targets (Solr, Elastic, ur... other...), or are you thinking of packaging specific schema assets like template JSON with the parser?
> >> >> >>
> >> >> >> Simon
> >> >> >>
> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >> >> >>>
> >> >> >>> Not to jump the gun, but I’m crafting a proposal about parsers, and one of the things I am going to propose relates to having the ES template for a given parser installed or packaged with the parser. We could load the template from there, edit, save, deploy, etc. We can extend that concept more and more later (drafts, versioning, etc.).
> >> >> >>>
> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (si...@simonellistonball.com) wrote:
> >> >> >>>>
> >> >> >>>> A little while ago the issue of managing Elastic templates for new sensor configs came up, and we didn’t quite put it to bed.
> >> >> >>>>
> >> >> >>>> When creating new sensors, I almost invariably find that the auto-generated schemas for Elastic pick some incorrect types. I also find I have to recreate indexes every time to push in the proper dynamic templates for things like geo enrichment fields.
> >> >> >>>>
> >> >> >>>> So, my questions are:
> >> >> >>>> How should we handle Elastic templates for new sensors?
> >> >> >>>> Do we have circumstances where we would need to configure types, or can we get away with inferring them?
> >> >> >>>> Should we just add some additional dynamic templates to cover our common fields like timestamp (the most common culprit I find for incorrect typing)?
> >> >> >>>>
> >> >> >>>> I’d also like to think about ways we can generalise this. Does anyone have any thoughts on what sort of additional index schemas we should want to infer? Solr seems an obvious one; any others?
> >> >> >>>>
> >> >> >>>> Thoughts on a well-typed, schemaed and easily indexed postcard, please :)
> >> >> >>>>
> >> >> >>>> Simon
> >>
> >> --
> >
> > Jon
> >
> > Sent from my mobile device
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
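The partial-schema approach James describes has three parts: enforce types for a few core fields, treat every other field as a string, and keep one mapper per storage target. It can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than an existing Metron API. PARTIAL_SCHEMA, validate, resolve_types, MAPPERS and build_mapping are invented names, and the field names are examples. Timestamps are assumed to be non-negative epoch milliseconds, and the Elasticsearch type names assume a 5.x-style mapping.

```python
import ipaddress

# Hypothetical partial schema: only these fields have enforced types; any other
# field in a message is accepted as-is.
PARTIAL_SCHEMA = {
    "ip_src_addr": "ip",
    "ip_dst_addr": "ip",
    "protocol": "string",
    "timestamp": "timestamp",
}


def _is_ip(value):
    """True if value is a string holding a valid IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def _is_timestamp(value):
    """True for a non-negative integer (assumed epoch milliseconds)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# One type check per neutral type used in the partial schema.
CHECKS = {
    "ip": _is_ip,
    "string": lambda v: isinstance(v, str),
    "timestamp": _is_timestamp,
}


def validate(message, schema=PARTIAL_SCHEMA):
    """Names of declared fields present in the message with the wrong type.

    Undeclared fields are never rejected, and absent declared fields are not
    reported (this sketch checks types, not presence)."""
    return [name for name, ftype in schema.items()
            if name in message and not CHECKS[ftype](message[name])]


def resolve_types(message, schema=PARTIAL_SCHEMA):
    """Neutral type for every field in the message; undeclared fields are strings."""
    return {name: schema.get(name, "string") for name in message}


# Per-target mappers from neutral types to target types. Supporting a new
# storage target would mean adding one entry here, not editing every parser.
MAPPERS = {
    "elasticsearch": {"ip": "ip", "string": "keyword", "timestamp": "date"},
}


def build_mapping(target, field_types):
    """Translate resolved neutral types into the target's field types."""
    mapper = MAPPERS[target]
    return {name: mapper[ftype] for name, ftype in field_types.items()}
```

For example, a message carrying an extra custom_field with value 42 passes validate, resolves to string, and maps to keyword for Elasticsearch. Supporting Druid would mean adding a "druid" entry to MAPPERS rather than touching each parser.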