Taking this a step further, I think this challenge goes beyond just parsers. We would also need to solve this problem for enrichments. When I add an enrichment, I want the enriched data to be indexed accurately. How can we make that happen?
- As part of defining an enrichment, should I also be able to specify the fields and types using this same generic definition?
- Or could this be inferred somehow via an extension to Stellar?

On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:
> I like the flexibility and extensibility of having some kind of internal representation (generic definition) of the names and types of the fields produced by a parser.
>
> Rather than shipping with an ES template, a parser would ship with a generic definition of the field names and data types that it adds to a message. The Elasticsearch indexer would then take this generic definition and translate it to an Elasticsearch template.
>
> Each new indexer (Solr, etc.) would know how to consume this generic definition and produce whatever artifacts it needs to index the data accurately.
>
> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org> wrote:
>
>> I am not sure I agree with packaging source-specific templates with the parser. I think that would make it harder to add additional storage sources. For example, what happens if I have 50 parsers with Solr and ES schemas defined, but now I want to add Druid? Now I have to add 50 schemas to all my existing parsers, which I don't think makes sense. I think what we should have instead is tuple mappers that map some internal representation of our schema to whatever schema the tool uses. We have already started to move somewhat down this path, with Kyle defining the schema enum for his ASA parser PR and Simon defining a JSON schema for his CEF parser PR. I think we need to unify these approaches and then propagate them to all the parsers.
>>
>> I think what has to happen is the following:
>>
>> We have to introduce a partial schema for Metron messages, where you can enforce a schema on the parts of a message you care about while leaving the rest of the message flexible. What I mean by that is that you should enforce a schema for things like IP, protocol, timestamp, etc., but have a fully flexible structure outside of that.
>>
>> After you do that, you can map the partial schema you defined to ES, Solr, Druid, etc. For the fields you don't have a schema for, you just assume they are strings. To add an additional storage/indexing source to Metron, all you do is define a mapper to that source's schema and load it into our indexing bolt.
>>
>> Thanks,
>> James
>>
>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
>> > I think this is a good direction to move things toward: moving indexing templates to be packaged with parsers (using multiple tiered options) that are then merged with the possible enrichment fields before getting added to the indexing technology in use. Now, to read the proposal thread...
>> >
>> > Jon
>> >
>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <si...@simonellistonball.com> wrote:
>> >
>> >> I’d broadly agree with that tiered approach.
>> >>
>> >> That is, the version where the parser emits a generic schema and enrichments contribute generic schema chunks to it, which get combined into an indexer-specific template generated at the end of the flow. So yes, pretty much in line with your proposal. (I did read through it; apologies if I missed any of the detail, my brain is still a little bit post-RSA!)
>> >>
>> >> Simon
>> >>
>> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com> wrote:
>> >> >
>> >> > We already make them do this now, or they get the defaults. So this is no different.
>> >> >
>> >> > Having parsers emit names and types, etc., would be another step, or it could actually be how the ‘generic schema’ is implemented.
>> >> >
>> >> > A tiered approach, from:
>> >> > * you give nothing with the parser: you get whatever ES guesses, but you don’t care, do you?
>> >> > * you give the schema
>> >> > * you give the types and we figure it out for you
>> >> >
>> >> > would be the best to move to.
>> >> >
>> >> > Also, we could use the names-and-types method, tied to enrichments, to generate (or rather derive) indexing templates for enrichment types, which I mention in my proposal.
>> >> >
>> >> > I’m starting to think you haven’t rushed out to read it, Simon ;)
>> >> >
>> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball (si...@simonellistonball.com) wrote:
>> >> >
>> >> >> I like that, to an extent… Forcing the provision of an explicit schema might be a bit of a load on parser development. I’m assuming that custom parsers would be pushed towards the same packaging approach.
>> >> >>
>> >> >> Would it make sense to require the parser to emit the field names and types it expects, and then for us to provide a means of creating the templates for supported indices, pushing the actual template management to the index layer rather than the parsing layer? Schema is, after all, determined not just by a parser, but also by the combination of enrichments and models applied.
>> >> >>
>> >> >> We could also, of course, provide an override option within your proposed parser package model to allow any destination-specific configuration of the indexing template.
>> >> >>
>> >> >> Simon
>> >> >>
>> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com> wrote:
>> >> >> >
>> >> >> > I think we can get there from my proposal.
>> >> >> >
>> >> >> > A source may package:
>> >> >> > * explicit schemas (ES, Solr, FOO)
>> >> >> > * a generic, to-be-invented schema for a to-be-invented pluggable indexing component :)
>> >> >> > and we’ll be able to handle it.
>> >> >> >
>> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson (kylerichards...@gmail.com) wrote:
>> >> >> >
>> >> >> > I personally like the idea of a typed schema per parser that we could translate to multiple targets. This would allow us a lot more modularity and extensibility in indexing down the road.
>> >> >> >
>> >> >> > -Kyle
>> >> >> >
>> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
>> >> >> >
>> >> >> >> That sounds like a great idea, Otto. Do you have any early design on that we can look at? Also, rather than just Elastic templates, do you think we should have some sort of typed schema we could translate to multiple targets (Solr, Elastic, ur... other...), or are you thinking of packaging specific schema assets like template JSON with the parser?
>> >> >> >>
>> >> >> >> Simon
>> >> >> >>
>> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com> wrote:
>> >> >> >>>
>> >> >> >>> Not to jump the gun, but I’m crafting a proposal about parsers, and one of the things I am going to propose relates to having the ES template for a given parser installed or packaged with the parser. We could load the template from there, edit, save, deploy, etc.
We can extend >> that >> >> >> > concept >> >> >> >> more and more later (drafts, versioning etc ) >> >> >> >>> >> >> >> >>> >> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball ( >> >> >> >> si...@simonellistonball.com <mailto:si...@simonellistonball.com >> >) >> >> wrote: >> >> >> >>>> >> >> >> >>>> A little while ago the issue of managing Elastic templates >> for new >> >> >> >> sensor configs came up, and we didn’t quite put it to bed. >> >> >> >>>> >> >> >> >>>> When creating new sensors, I almost invariably find the >> >> auto-generated >> >> >> >> schemas for elastic pick some incorrect types. I also find I >> have to >> >> >> >> recreate indexes every time to push in the proper dynamic >> templates >> >> for >> >> >> >> things like geo enrichment fields. >> >> >> >>>> >> >> >> >>>> So, my questions are: >> >> >> >>>> How should we address elastic template for new sensors? >> >> >> >>>> Do we have circumstances where we would need to configure >> types, or >> >> >> > can >> >> >> >> we get away with inferring them? >> >> >> >>>> Should we just add some additional dynamic templates to cover >> our >> >> >> >> common fields like timestamp (the most common culprit I find for >> >> >> > incorrect >> >> >> >> typing)? >> >> >> >>>> >> >> >> >>>> I’d also like to think about ways we can generalise this. Does >> >> anyone >> >> >> >> have any thoughts on what sort of additional index schemes we >> should >> >> want >> >> >> >> to infer (solr seems an obvious one, any others?). >> >> >> >>>> >> >> >> >>>> Thoughts on a well typed, schemaed and easily indexed postcard >> >> please >> >> >> > :) >> >> >> >>>> >> >> >> >>>> Simon >> >> >> >> >> >> >> >> -- >> > >> > Jon >> > >> > Sent from my mobile device >> >> ------------------- >> Thank you, >> >> James Sirota >> PPMC- Apache Metron (Incubating) >> jsirota AT apache DOT org >> > >