I'm just blowing smoke at 10,000 feet here. :) I think we could engineer it to be performant in some manner.
Per your thoughts, it would make sense to have this sort of thing hooked to
configuration changes, not a check for every message that comes in.

On Mon, Feb 20, 2017 at 2:51 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> I think that would be interesting to do,
>
> validateFields()
> updateIndexIf()
> insert()
>
> But do you want to take that hit every message? I’m not sure.
>
>
> What if we instead hooked to configuration such that when you ‘commit’ a
> configuration change it recalculates and fixes up the index instead? So
> don’t do it in the indexer, but have good lifecycle management in the
> configuration.
>
> There are issues there with timing the switchover I’d want to think
> through, but I like that better than putting that in the stream.
>
>
> On February 20, 2017 at 14:39:57, Nick Allen (n...@nickallen.org) wrote:
>
> Since enrichments, and even parsers, can be added on-the-fly, should the ES
> indexer be intelligent enough to manage the index templates on-the-fly
> also? Ideally, I should never have to manually install something like an
> ES template. The indexer should just take care of all that.
>
> In the case of the Elasticsearch indexer, if it notices a new field added
> by an enrichment, or a new source of telemetry, then it should update the
> ES template on-the-fly also. Ideally, we would never have to manually
> create/deploy an ES template. It should all happen seamlessly and remain
> in-sync with whatever enrichments, etc exist.
>
>
> On Mon, Feb 20, 2017 at 2:36 PM, Nick Allen <n...@nickallen.org> wrote:
>
> > Taking this a step further, I think this challenge goes beyond just
> > parsers. We would also need to solve this problem for enrichments. When I
> > add an enrichment, I want the enriched data to be indexed accurately. How
> > can we make that happen?
> >
> > - As part of defining an enrichment, should I also be able to specify
> > the fields and types using this same generic definition?
> > - Or could this be inferred somehow via an extension to Stellar?
> >
> >
> > On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:
> >
> >> I like the flexibility and extensibility of having some kind of internal
> >> representation (generic definition) of the names and types of the fields
> >> produced by a parser.
> >>
> >> Rather than shipping with an ES template, a parser would ship with a
> >> generic definition of the field names and data types that it adds to a
> >> message. The Elasticsearch indexer would then take this generic definition
> >> and translate it to an Elasticsearch template.
> >>
> >> Each new indexer (Solr, etc) would know how to consume this generic
> >> definition and produce whatever artifacts that it needs to index the data
> >> accurately.
> >>
> >>
> >> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org>
> >> wrote:
> >>
> >>> I am not sure I agree with packaging source-specific templates with the
> >>> parser. I think that would make it harder to add additional storage
> >>> sources. For example, what happens if I have 50 parsers with Solr and ES
> >>> schemas defined, but now I want to add druid? Now I have to add 50 schemas
> >>> to all my existing parsers, which I don't think makes sense. I think what
> >>> we should have instead is tuple mappers that map some internal
> >>> representation of our schema to whatever schema the tool uses. We already
> >>> somewhat started to move down this path with Kyle defining the schema enum
> >>> for his ASA parser PR and Simon defining a JSON schema for his CEF parser
> >>> PR. I think we need to unify these approaches and then propagate them to
> >>> all the parsers.
> >>> I think what has to happen is the following:
> >>>
> >>> We have to introduce a partial schema for Metron messages where you can
> >>> enforce a schema on a part of a message you want, but at the same time
> >>> allow enough flexibility for the rest of the message to be flexible. What
> >>> I mean by that is that you should enforce a schema for things like ip,
> >>> protocol, timestamp, etc, but have a fully flexible structure outside of
> >>> that.
> >>>
> >>> After you do that then you can map the partial schema you defined to es,
> >>> solr, druid, etc, etc. For the fields you don't have a schema for you just
> >>> assume they are strings. To add additional storage/indexing source to
> >>> Metron all you do is define a mapper to that source's schema and load that
> >>> into our indexing bolt.
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>> > I think this is a good direction to move things toward - moving
> >>> > indexing templates to be packaged with parsers (using multiple tiered
> >>> > options) that are then merged with the possible enrich fields before
> >>> > getting added to the indexing technology in use. Now, to read the
> >>> > proposal thread...
> >>> >
> >>> > Jon
> >>> >
> >>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <
> >>> > si...@simonellistonball.com> wrote:
> >>> >
> >>> >> I’d broadly agree with that tiered approach.
> >>> >>
> >>> >> The version where the parser emits a generic schema, and enrichments
> >>> >> contribute generic schema chunks to that which get combined into an
> >>> >> indexer specific template generated at the end of the flow, so yes,
> >>> >> pretty much in line with your proposal. (I did read through it,
> >>> >> apologies if I missed any of the detail, brain is still a little bit
> >>> >> post-RSA!)
> >>> >>
> >>> >> Simon
> >>> >>
> >>> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>> >> >
> >>> >> > We already make them do this now, or they get the defaults. So
> >>> >> > this is no different.
> >>> >> > Having parsers emit names and types etc, that would be another
> >>> >> > step - or it could be the ‘generic schema’ as implemented actually.
> >>> >> >
> >>> >> > A tiered approach - from
> >>> >> > * you give nothing with the parser - you get whatever ES guesses
> >>> >> >   at, but you don’t care, do you
> >>> >> > * you give the schema
> >>> >> > * you give the types and we figure it out for you
> >>> >> >
> >>> >> > would be the best to move to.
> >>> >> >
> >>> >> > Also, we could use the names and types method tied to enrichment to
> >>> >> > generate indexing templates for enrichment types or deriving them
> >>> >> > rather, which I mention in my proposal.
> >>> >> >
> >>> >> > I’m starting to think you haven’t rushed out to read it Simon ;)
> >>> >> >
> >>> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball
> >>> >> > (si...@simonellistonball.com) wrote:
> >>> >> >
> >>> >> >> I like that, to an extent… Forcing the provision of explicit
> >>> >> >> schema might be a bit of a load for parser development. I’m
> >>> >> >> assuming that custom parsers would be pushed towards the same
> >>> >> >> packaging approach.
> >>> >> >>
> >>> >> >> Would it make sense to require the parser to emit field names and
> >>> >> >> types expected, and then for us to provide a means of creating the
> >>> >> >> templates for supported indices, and push the actual template
> >>> >> >> management to the index layer rather than the parsing layer?
> >>> >> >> Schema is after all determined not just by a parser, but also by
> >>> >> >> the combination of enrichments and models applied.
> >>> >> >>
> >>> >> >> We could also of course provide an override option within your
> >>> >> >> proposed parser package model to allow any destination specific
> >>> >> >> configuration of the indexing template.
> >>> >> >>
> >>> >> >> Simon
> >>> >> >>
> >>> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>> >> >> >
> >>> >> >> > I think we can get there from my proposal.
> >>> >> >> > A source may package:
> >>> >> >> > * explicit schemas ( ES, SOLR, FOO )
> >>> >> >> > * a generic to be invented schema for a to be invented pluggable
> >>> >> >> >   indexing component :)
> >>> >> >> > and we’ll be able to handle it.
> >>> >> >> >
> >>> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson
> >>> >> >> > (kylerichards...@gmail.com) wrote:
> >>> >> >> >
> >>> >> >> > I personally like the idea of a typed schema per parser that we
> >>> >> >> > could translate to multiple targets. This would allow us a lot
> >>> >> >> > more modularity and extensibility in indexing down the road.
> >>> >> >> >
> >>> >> >> > -Kyle
> >>> >> >> >
> >>> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <
> >>> >> >> > si...@simonellistonball.com> wrote:
> >>> >> >> >
> >>> >> >> >> That sounds like a great idea Otto. Do you have any early
> >>> >> >> >> design on that we can look at? Also, rather than just elastic
> >>> >> >> >> templates do you think we should have some sort of typed schema
> >>> >> >> >> we could translate to multiple targets (solr, elastic, ur...
> >>> >> >> >> other...) or are you thinking of packaging specific schema
> >>> >> >> >> assets like template json with the parser?
> >>> >> >> >>
> >>> >> >> >> Simon
> >>> >> >> >>
> >>> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>> >> >> >>>
> >>> >> >> >>> Not to jump the gun, but I’m crafting a proposal about
> >>> >> >> >>> parsers and one of the things I am going to propose relates to
> >>> >> >> >>> having the ES Template for a given parser installed or packaged
> >>> >> >> >>> with the parser. We could load the template from there, edit,
> >>> >> >> >>> save and deploy etc. We can extend that concept more and more
> >>> >> >> >>> later (drafts, versioning etc)
> >>> >> >> >>>
> >>> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball
> >>> >> >> >>>> (si...@simonellistonball.com) wrote:
> >>> >> >> >>>>
> >>> >> >> >>>> A little while ago the issue of managing Elastic templates
> >>> >> >> >>>> for new sensor configs came up, and we didn’t quite put it to
> >>> >> >> >>>> bed.
> >>> >> >> >>>>
> >>> >> >> >>>> When creating new sensors, I almost invariably find the
> >>> >> >> >>>> auto-generated schemas for elastic pick some incorrect types.
> >>> >> >> >>>> I also find I have to recreate indexes every time to push in
> >>> >> >> >>>> the proper dynamic templates for things like geo enrichment
> >>> >> >> >>>> fields.
> >>> >> >> >>>>
> >>> >> >> >>>> So, my questions are:
> >>> >> >> >>>> How should we address elastic template for new sensors?
> >>> >> >> >>>> Do we have circumstances where we would need to configure
> >>> >> >> >>>> types, or can we get away with inferring them?
> >>> >> >> >>>> Should we just add some additional dynamic templates to
> >>> >> >> >>>> cover our common fields like timestamp (the most common
> >>> >> >> >>>> culprit I find for incorrect typing)?
> >>> >> >> >>>>
> >>> >> >> >>>> I’d also like to think about ways we can generalise this.
> >>> >> >> >>>> Does anyone have any thoughts on what sort of additional index
> >>> >> >> >>>> schemes we should want to infer (solr seems an obvious one,
> >>> >> >> >>>> any others?).
> >>> >> >> >>>>
> >>> >> >> >>>> Thoughts on a well typed, schemaed and easily indexed
> >>> >> >> >>>> postcard please :)
> >>> >> >> >>>>
> >>> >> >> >>>> Simon
> >>> >> >> >>
> >>> >>
> >>> >> --
> >>> >
> >>> > Jon
> >>> >
> >>> > Sent from my mobile device
> >>>
> >>> -------------------
> >>> Thank you,
> >>>
> >>> James Sirota
> >>> PPMC- Apache Metron (Incubating)
> >>> jsirota AT apache DOT org
> >>>
> >>
> >
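
For anyone skimming the thread, here is a rough sketch in Python of the "generic definition translated per indexer" idea discussed above. Nothing here is Metron code: the generic type names, `merge_definitions`, `to_es_template`, and the `<sensor>_index*` pattern are all made up for illustration, and the template layout only loosely follows an Elasticsearch index template (the exact body depends on the ES version in use). Fields with no known type fall back to strings, as James suggested.

```python
from typing import Dict

# Hypothetical generic type vocabulary mapped to ES field mappings.
# The generic names on the left are assumptions, not a Metron API.
ES_TYPES: Dict[str, Dict[str, str]] = {
    "string": {"type": "keyword"},
    "ip": {"type": "ip"},
    "integer": {"type": "integer"},
    "timestamp": {"type": "date", "format": "epoch_millis"},
    "geo_point": {"type": "geo_point"},
}


def merge_definitions(*definitions: Dict[str, str]) -> Dict[str, str]:
    """Combine parser and enrichment field definitions; later ones win."""
    merged: Dict[str, str] = {}
    for definition in definitions:
        merged.update(definition)
    return merged


def to_es_template(sensor: str, fields: Dict[str, str]) -> dict:
    """Translate a generic field definition into an ES-style template dict."""
    properties = {
        # Unknown generic types are treated as strings.
        name: dict(ES_TYPES.get(generic_type, ES_TYPES["string"]))
        for name, generic_type in sorted(fields.items())
    }
    return {
        "index_patterns": [f"{sensor}_index*"],  # illustrative naming only
        "mappings": {
            # Fields outside the partial schema stay flexible; any string
            # values that show up are indexed as plain keywords.
            "dynamic_templates": [
                {
                    "strings_default": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                }
            ],
            "properties": properties,
        },
    }


# Example definitions a parser and a geo enrichment might contribute.
parser_fields = {
    "ip_src_addr": "ip",
    "ip_dst_addr": "ip",
    "timestamp": "timestamp",
    "protocol": "string",
}
geo_enrichment_fields = {
    "enrichments:geo:ip_dst_addr:location_point": "geo_point",
}

template = to_es_template(
    "bro", merge_definitions(parser_fields, geo_enrichment_fields)
)
```

A Solr or Druid mapper would presumably have the same shape: consume the merged definition and emit whatever artifact that store needs. In line with Otto's suggestion, `to_es_template` would be called when a configuration change is committed rather than for every message in the stream.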