Based on my experience, I think we will come to the point of needing to work on a configuration, test it, approve it, and deploy it for this system to be usable.
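A minimal sketch of such a lifecycle, in Python. Everything in it is hypothetical: the `SensorConfig` class, the state names, the generic type names and the `GENERIC_TO_INDEX_TYPE` table are assumptions made for illustration, not existing Metron code or a real Elasticsearch template format. It only illustrates the idea discussed in this thread: index artifacts are recalculated when a configuration is deployed, not for every message in the stream, and fields without a known type are treated as strings.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    TESTED = "tested"
    APPROVED = "approved"
    DEPLOYED = "deployed"


# Each state may only advance to the next one; there are no shortcuts.
NEXT_STATE = {
    State.DRAFT: State.TESTED,
    State.TESTED: State.APPROVED,
    State.APPROVED: State.DEPLOYED,
}

# Illustrative generic-type -> index-type table (an assumption, not a
# real Metron or Elasticsearch artifact).
GENERIC_TO_INDEX_TYPE = {"ip": "ip", "timestamp": "date", "integer": "integer"}


def build_index_mapping(fields):
    """Translate generic field types; unknown types fall back to strings."""
    return {name: GENERIC_TO_INDEX_TYPE.get(kind, "string")
            for name, kind in fields.items()}


@dataclass
class SensorConfig:
    sensor: str
    fields: dict  # field name -> generic type
    state: State = State.DRAFT

    def edit(self, name, generic_type):
        # Any change invalidates earlier testing and approval.
        self.fields[name] = generic_type
        self.state = State.DRAFT

    def advance(self, on_deploy=None):
        # Move one step forward; run the hook only when reaching DEPLOYED.
        if self.state not in NEXT_STATE:
            raise ValueError(f"{self.sensor} is already deployed")
        self.state = NEXT_STATE[self.state]
        if self.state is State.DEPLOYED and on_deploy is not None:
            on_deploy(build_index_mapping(self.fields))
        return self.state


# Usage: the index mapping is rebuilt once, at deploy time.
applied = []
cfg = SensorConfig("asa", {"ip_src_addr": "ip", "timestamp": "timestamp", "msg": "text"})
cfg.advance()                          # DRAFT -> TESTED
cfg.advance()                          # TESTED -> APPROVED
cfg.advance(on_deploy=applied.append)  # APPROVED -> DEPLOYED, hook runs
```

The switchover timing concern raised in the thread is not addressed here; the hook simply runs synchronously when the configuration reaches the deployed state.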
On February 20, 2017 at 14:55:39, Nick Allen (n...@nickallen.org) wrote:

> I'm just blowing smoke at 10,000 feet here. :) I think we could engineer
> it to be performant in some manner.
>
> Per your thoughts, It would make sense to have this sort of thing hooked
> to configuration changes, not a check for every message that comes in.
>
> On Mon, Feb 20, 2017 at 2:51 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> I think that would be interesting to do,
>>
>> validateFields()
>> updateIndexIf()
>> insert()
>>
>> But do you want to take that hit every message? I’m not sure.
>>
>> What if we instead hooked to configuration such that when you ‘commit’ a
>> configuration change it recalculates and fixes up the index instead? So
>> don’t do it in the indexer, but have good lifecycle management in the
>> configuration.
>>
>> There are issues there with timing the switchover I’d want to think
>> through, but I like that better than putting that in the stream.
>>
>> On February 20, 2017 at 14:39:57, Nick Allen (n...@nickallen.org) wrote:
>>
>> Since enrichments, and even parsers, can be added on-the-fly, should the
>> ES indexer be intelligent enough to manage the index templates on-the-fly
>> also? Ideally, I should never have to manually install something like an
>> ES template. The indexer should just take care of all that.
>>
>> In the case of the Elasticsearch indexer, if it notices a new field added
>> by an enrichment, or a new source of telemetry, then it should update the
>> ES template on-the-fly also. Ideally, we would never have to manually
>> create/deploy an ES template. It should all happen seamlessly and remain
>> in-sync with whatever enrichments, etc exist.
>>
>> On Mon, Feb 20, 2017 at 2:36 PM, Nick Allen <n...@nickallen.org> wrote:
>>
>> > Taking this a step further, I think this challenge goes beyond just
>> > parsers. We would also need to solve this problem for enrichments.
>> > When I add an enrichment, I want the enriched data to be indexed
>> > accurately. How can we make that happen?
>> >
>> > - As part of defining an enrichment, should I also be able to specify
>> >   the fields and types using this same generic definition?
>> > - Or could this be inferred somehow via an extension to Stellar?
>> >
>> > On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:
>> >
>> >> I like the flexibility and extensibility of having some kind of
>> >> internal representation (generic definition) of the names and types of
>> >> the fields produced by a parser.
>> >>
>> >> Rather than shipping with an ES template, a parser would ship with a
>> >> generic definition of the field names and data types that it adds to a
>> >> message. The Elasticsearch indexer would then take this generic
>> >> definition and translate it to an Elasticsearch template.
>> >>
>> >> Each new indexer (Solr, etc) would know how to consume this generic
>> >> definition and produce whatever artifacts that it needs to index the
>> >> data accurately.
>> >>
>> >> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org>
>> >> wrote:
>> >>
>> >>> I am not sure I agree with packaging source-specific templates with
>> >>> the parser. I think that would make it harder to add additional
>> >>> storage sources. For example, what happens if I have 50 parsers with
>> >>> Solr and ES schemas defined, but now I want to add druid? Now I have
>> >>> to add 50 schemas to all my existing parsers, which I don't think
>> >>> makes sense. I think what we should have instead is tuple mappers
>> >>> that map some internal representation of our schema to whatever
>> >>> schema the tool uses. We already somewhat started to move down this
>> >>> path with Kyle defining the schema enum for his ASA parser PR and
>> >>> Simon defining a JSON schema for his CEF parser PR.
>> >>> I think we need to unify these approaches and then propagate them to
>> >>> all the parsers. I think what has to happen is the following:
>> >>>
>> >>> We have to introduce a partial schema for Metron messages where you
>> >>> can enforce a schema on a part of a message you want, but at the same
>> >>> time allow enough flexibility for the rest of the message to be
>> >>> flexible. What I mean by that is that you should enforce a schema for
>> >>> things like ip, protocol, timestamp, etc, but have a fully flexible
>> >>> structure outside of that.
>> >>>
>> >>> After you do that then you can map the partial schema you defined to
>> >>> es, solr, druid, etc, etc. For the fields you don't have a schema for
>> >>> you just assume they are strings. To add additional storage/indexing
>> >>> source to Metron all you do is define a mapper to that source's schema
>> >>> and load that into our indexing bolt.
>> >>>
>> >>> Thanks,
>> >>> James
>> >>>
>> >>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
>> >>> > I think this is a good direction to move things toward - moving
>> >>> > indexing templates to be packaged with parsers (using multiple tiered
>> >>> > options) that are then merged with the possible enrich fields before
>> >>> > getting added to the indexing technology in use. Now, to read the
>> >>> > proposal thread...
>> >>> >
>> >>> > Jon
>> >>> >
>> >>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <
>> >>> > si...@simonellistonball.com> wrote:
>> >>> >
>> >>> >> I’d broadly agree with that tiered approach.
>> >>> >>
>> >>> >> The version where the parser emits a generic schema, and enrichments
>> >>> >> contribute generic schema chunks to that which get combined into an
>> >>> >> indexer specific template generated at the end of the flow, so yes,
>> >>> >> pretty much inline with your proposal.
>> >>> >> (I did read though it, apologies if I missed any of the detail,
>> >>> >> brain is still a little bit post-RSA!)
>> >>> >>
>> >>> >> Simon
>> >>> >>
>> >>> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com>
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> > We already make them do this now, or they get the defaults. So
>> >>> >> > this is no different.
>> >>> >> > Having parsers emit names and types etc, that would be another
>> >>> >> > step - or it could be the ‘generic schema’ as implemented actually.
>> >>> >> >
>> >>> >> > A tiered approach - from
>> >>> >> > * you give nothing with the parser - you get whatever ES guesses
>> >>> >> >   at but you don’t care do you
>> >>> >> > * you give the schema
>> >>> >> > * you give the types and we figure it out for you
>> >>> >> >
>> >>> >> > would be the best to move to.
>> >>> >> >
>> >>> >> > Also, we could use the names and types method tied to enrichment
>> >>> >> > to generate indexing templates for enrichment types or deriving
>> >>> >> > them rather, which i mention in my proposal.
>> >>> >> >
>> >>> >> > I’m starting to think you haven’t rushed out to read it Simon ;)
>> >>> >> >
>> >>> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball (
>> >>> >> > si...@simonellistonball.com <mailto:si...@simonellistonball.com>)
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> >> I like that, to an extent… Forcing the provision of explicit
>> >>> >> >> schema might be a bit of a load for parser development. I’m
>> >>> >> >> assuming that custom parsers would be pushed towards the same
>> >>> >> >> packaging approach.
>> >>> >> >>
>> >>> >> >> Would it make sense to require the parser to emit field names and
>> >>> >> >> types expected, and then for us to provide a means of creating the
>> >>> >> >> templates for supported indices, and push the actual template
>> >>> >> >> management to the index layer rather than the parsing layer.
>> >>> >> >> Schema is after all determined not just by a parser, but also by
>> >>> >> >> the combination of enrichments and models applied.
>> >>> >> >>
>> >>> >> >> We could also of course provide an override option within your
>> >>> >> >> proposed parser package model to allow any destination specific
>> >>> >> >> configuration of the indexing template.
>> >>> >> >>
>> >>> >> >> Simon
>> >>> >> >>
>> >>> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com
>> >>> >> >> > <mailto:ottobackwa...@gmail.com>> wrote:
>> >>> >> >> >
>> >>> >> >> > I think we can get there from my proposal.
>> >>> >> >> > A source may package:
>> >>> >> >> > * explicit schemas ( ES, SOLR, FOO )
>> >>> >> >> > * a generic to be invented schema for a to be invented pluggable
>> >>> >> >> >   indexing component :)
>> >>> >> >> > and we’ll be able to handle it.
>> >>> >> >> >
>> >>> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson (
>> >>> >> >> > kylerichards...@gmail.com <mailto:kylerichards...@gmail.com>)
>> >>> >> >> > wrote:
>> >>> >> >> >
>> >>> >> >> > I personally like the idea of a typed schema per parser that we
>> >>> >> >> > could translate to multiple targets. This would allow us a lot
>> >>> >> >> > more modularity and extensibility in indexing down the road.
>> >>> >> >> >
>> >>> >> >> > -Kyle
>> >>> >> >> >
>> >>> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <
>> >>> >> >> > si...@simonellistonball.com <mailto:simon@simonellistonball.com>>
>> >>> >> >> > wrote:
>> >>> >> >> >
>> >>> >> >> >> That sounds like a great idea Otto. Do you have any early design
>> >>> >> >> >> on that we can look at. Also, rather than just elastic templates
>> >>> >> >> >> do you think we should have some sort of typed schema we could
>> >>> >> >> >> translate to multiple targets (solr, elastic, ur... other...)
>> >>> >> >> >> or are you thinking of packaging specific scheme assets like
>> >>> >> >> >> template json with the parser?
>> >>> >> >> >>
>> >>> >> >> >> Simon
>> >>> >> >> >>
>> >>> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com
>> >>> >> >> >>> <mailto:ottobackwa...@gmail.com>> wrote:
>> >>> >> >> >>>
>> >>> >> >> >>> Not to jump the gun, but I’m crafting a proposal about parsers
>> >>> >> >> >>> and one of the things I am going to propose relates to having
>> >>> >> >> >>> the ES Template for a given parser installed or packaged with
>> >>> >> >> >>> the parser. We could load the template from there, edit, save
>> >>> >> >> >>> and deploy etc. We can extend that concept more and more later
>> >>> >> >> >>> (drafts, versioning etc )
>> >>> >> >> >>>
>> >>> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (
>> >>> >> >> >>>> si...@simonellistonball.com <mailto:simon@simonellistonball.com>)
>> >>> >> >> >>>> wrote:
>> >>> >> >> >>>>
>> >>> >> >> >>>> A little while ago the issue of managing Elastic templates for
>> >>> >> >> >>>> new sensor configs came up, and we didn’t quite put it to bed.
>> >>> >> >> >>>>
>> >>> >> >> >>>> When creating new sensors, I almost invariably find the
>> >>> >> >> >>>> auto-generated schemas for elastic pick some incorrect types. I
>> >>> >> >> >>>> also find I have to recreate indexes every time to push in the
>> >>> >> >> >>>> proper dynamic templates for things like geo enrichment fields.
>> >>> >> >> >>>>
>> >>> >> >> >>>> So, my questions are:
>> >>> >> >> >>>> How should we address elastic template for new sensors?
>> >>> >> >> >>>> Do we have circumstances where we would need to configure
>> >>> >> >> >>>> types, or can we get away with inferring them?
>> >>> >> >> >>>> Should we just add some additional dynamic templates to cover
>> >>> >> >> >>>> our common fields like timestamp (the most common culprit I
>> >>> >> >> >>>> find for incorrect typing)?
>> >>> >> >> >>>>
>> >>> >> >> >>>> I’d also like to think about ways we can generalise this. Does
>> >>> >> >> >>>> anyone have any thoughts on what sort of additional index
>> >>> >> >> >>>> schemes we should want to infer (solr seems an obvious one, any
>> >>> >> >> >>>> others?).
>> >>> >> >> >>>>
>> >>> >> >> >>>> Thoughts on a well typed, schemaed and easily indexed postcard
>> >>> >> >> >>>> please :)
>> >>> >> >> >>>>
>> >>> >> >> >>>> Simon
>> >>> > --
>> >>> >
>> >>> > Jon
>> >>> >
>> >>> > Sent from my mobile device
>> >>>
>> >>> -------------------
>> >>> Thank you,
>> >>>
>> >>> James Sirota
>> >>> PPMC- Apache Metron (Incubating)
>> >>> jsirota AT apache DOT org