Re: [DISCUSS] Management of Elastic and other index schemas

Otto Fowler Mon, 20 Feb 2017 16:43:10 -0800

Based on my experience, I think we will come to the point of needing to work on
a configuration, then test, approve, and deploy it, for this system to be usable.


On February 20, 2017 at 14:55:39, Nick Allen (n...@nickallen.org) wrote:

> I'm just blowing smoke at 10,000 feet here. :) I think we could engineer
> it to be performant in some manner.
>
> Per your thoughts, it would make sense to have this sort of thing hooked
> to configuration changes, not a check for every message that comes in.
>
> On Mon, Feb 20, 2017 at 2:51 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> I think that would be interesting to do,
>>
>> validateFields()
>> updateIndexIf()
>> insert()
>>
>> But do you want to take that hit every message?  I’m not sure.
>>
>>
>> What if we instead hooked to configuration such that when you ‘commit’ a
>> configuration
>> change it recalculates and fixes up the index instead?  So don’t do it in
>> the indexer, but have
>> good lifecycle management in the configuration.
>>
>> There are issues there with timing the switchover I’d want to think
>> through, but I like that better
>> than putting that in the stream.
>>
>>
>> On February 20, 2017 at 14:39:57, Nick Allen (n...@nickallen.org) wrote:
>>
>> Since enrichments, and even parsers, can be added on-the-fly, should the
>> ES
>> indexer be intelligent enough to manage the index templates on-the-fly
>> also? Ideally, I should never have to manually install something like an
>> ES template. The indexer should just take care of all that.
>>
>> In the case of the Elasticsearch indexer, if it notices a new field added
>> by an enrichment, or a new source of telemetry, then it should update the
>> ES template on-the-fly also. Ideally, we would never have to manually
>> create/deploy an ES template. It should all happen seamlessly and remain
>> in-sync with whatever enrichments, etc exist.
>>
>>
>>
>>
>>
>> On Mon, Feb 20, 2017 at 2:36 PM, Nick Allen <n...@nickallen.org> wrote:
>>
>> >
>> > Taking this a step further, I think this challenge goes beyond just
>> > parsers. We would also need to solve this problem for enrichments. When
>> I
>> > add an enrichment, I want the enriched data to be indexed accurately.
>> How
>> > can we make that happen?
>> >
>> > - As part of defining an enrichment, should I also be able to specify
>> > the fields and types using this same generic definition?
>> > - Or could this be inferred somehow via an extension to Stellar?
>> >
>> >
>> >
>> > On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:
>> >
>> >> I like the flexibility and extensibility of having some kind of
>> internal
>> >> representation (generic definition) of the names and types of the
>> fields
>> >> produced by a parser.
>> >>
>> >> Rather than shipping with an ES template, a parser would ship with a
>> >> generic definition of the field names and data types that it adds to a
>> >> message. The Elasticsearch indexer would then take this generic
>> definition
>> >> and translate it to an Elasticsearch template.
>> >>
>> >> Each new indexer (Solr, etc) would know how to consume this generic
>> >> definition and produce whatever artifacts that it needs to index the
>> data
>> >> accurately.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org>
>> >> wrote:
>> >>
>> >>> I am not sure I agree with packaging source-specific templates with
>> the
>> >>> parser. I think that would make it harder to add additional storage
>> >>> sources. For example, what happens if I have 50 parsers with Solr and
>> ES
>> >>> schemas defined, but now I want to add druid? Now I have to add 50
>> schemas
>> >>> to all my existing parsers, which I don't think makes sense. I think
>> what
>> >>> we should have instead is tuple mappers that map some internal
>> >>> representation of our schema to whatever schema the tool uses. We
>> already
>> >>> somewhat started to move down this path with Kyle defining the schema
>> enum
>> >>> for his ASA parser PR and Simon defining a JSON schema for his CEF
>> parser
>> >>> PR. I think we need to unify these approaches and then propagate them
>> to
>> >>> all the parsers. I think what has to happen is the following:
>> >>>
>> >>> We have to introduce a partial schema for Metron messages where you
>> can
>> >>> enforce a schema on a part of a message you want, but at the same time
>> >>> allow enough flexibility for the rest of the message to be flexible.
>> What
>> >>> I mean by that is that you should enforce a schema for things like ip,
>> >>> protocol, timestamp, etc, but have a fully flexible structure outside
>> of
>> >>> that.
>> >>>
>> >>> After you do that then you can map the partial schema you defined to
>> es,
>> >>> solr, druid, etc, etc. For the fields you don't have a schema for you
>> just
>> >>> assume they are strings. To add additional storage/indexing source to
>> >>> Metron all you do is define a mapper to that source's schema and load
>> that
>> >>> into our indexing bolt.
>> >>>
>> >>>
>> >>>
>> >>> Thanks,
>> >>> James
>> >>>
>> >>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
>> >>> > I think this is a good direction to move things toward - moving
>> >>> indexing
>> >>> > templates to be packaged with parsers (using multiple tiered
>> options)
>> >>> that
>> >>> > are then merged with the possible enrich fields before getting added
>> >>> to the
>> >>> > indexing technology in use. Now, to read the proposal thread...
>> >>> >
>> >>> > Jon
>> >>> >
>> >>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <
>> >>> > si...@simonellistonball.com> wrote:
>> >>> >
>> >>> >> I’d broadly agree with that tiered approach.
>> >>> >>
>> >>> >> The version where the parser emits a generic schema, and
>> enrichments
>> >>> >> contribute generic schema chunks to that which get combined into an
>> >>> indexer
>> >>> >> specific template generated at the end of the flow, so yes, pretty
>> >>> much
>> >>> >> in line with your proposal. (I did read through it, apologies if I
>> >>> missed any
>> >>> >> of the detail, brain is still a little bit post-RSA!)
>> >>> >>
>> >>> >> Simon
>> >>> >>
>> >>> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com>
>> >>> wrote:
>> >>> >> >
>> >>> >> > We already make them do this now, or they get the defaults. So
>> >>> this is
>> >>> >> no different.
>> >>> >> > Having parsers emit names and types etc, that would be another
>> >>> step - or
>> >>> >> it could be the ‘generic schema’ as implemented actually.
>> >>> >> >
>> >>> >> > A tiered approach - from
>> >>> >> > * you give nothing with the parser - you get whatever ES guesses
>> >>> at but
>> >>> >> you don’t care do you
>> >>> >> > * you give the schema
>> >>> >> > * you give the types and we figure it out for you
>> >>> >> >
>> >>> >> > would be the best to move to.
>> >>> >> >
>> >>> >> > Also, we could use the names and types method tied to enrichment
>> to
>> >>> >> generate indexing templates for enrichment types or deriving them
>> >>> rather,
>> >>> >> which I mention in my proposal.
>> >>> >> >
>> >>> >> > I’m starting to think you haven’t rushed out to read it Simon ;)
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball (
>> >>> >> si...@simonellistonball.com <mailto:si...@simonellistonball.com>)
>> >>> wrote:
>> >>> >> >
>> >>> >> >> I like that, to an extent… Forcing the provision of explicit
>> >>> schema
>> >>> >> might be a bit of a load for parser development. I’m assuming that
>> >>> custom
>> >>> >> parsers would be pushed towards the same packaging approach.
>> >>> >> >>
>> >>> >> >> Would it make sense to require the parser to emit field names
>> and
>> >>> types
>> >>> >> expected, and then for us to provide a means of creating the
>> >>> templates for
>> >>> >> supported indices, and push the actual template management to the
>> >>> index
>> >>> >> layer rather than the parsing layer. Schema is after all determined
>> >>> not
>> >>> >> just by a parser, but also by the combination of enrichments and
>> >>> models
>> >>> >> applied.
>> >>> >> >>
>> >>> >> >> We could also of course provide an override option within your
>> >>> proposed
>> >>> >> parser package model to allow any destination specific
>> configuration
>> >>> of the
>> >>> >> indexing template.
>> >>> >> >>
>> >>> >> >> Simon
>> >>> >> >>
>> >>> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <
>> ottobackwa...@gmail.com
>> >>> >> <mailto:ottobackwa...@gmail.com>> wrote:
>> >>> >> >> >
>> >>> >> >> > I think we can get there from my proposal.
>> >>> >> >> > A source may package:
>> >>> >> >> > * explicit schemas ( ES, SOLR, FOO )
>> >>> >> >> > * a generic to be invented schema for a to be invented
>> pluggable
>> >>> >> indexing
>> >>> >> >> > component :)
>> >>> >> >> > and we’ll be able to handle it.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson (
>> >>> >> kylerichards...@gmail.com <mailto:kylerichards...@gmail.com>)
>> >>> >> >> > wrote:
>> >>> >> >> >
>> >>> >> >> > I personally like the idea of a typed schema per parser that
>> we
>> >>> could
>> >>> >> >> > translate to multiple targets. This would allow us a lot more
>> >>> >> modularity
>> >>> >> >> > and extensibility in indexing down the road.
>> >>> >> >> >
>> >>> >> >> > -Kyle
>> >>> >> >> >
>> >>> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <
>> >>> >> >> > si...@simonellistonball.com <mailto:si...@simonellistonball.com>>
>> >>> >> wrote:
>> >>> >> >> >
>> >>> >> >> >> That sounds like a great idea Otto. Do you have any early
>> >>> design on
>> >>> >> that
>> >>> >> >> >> we can look at? Also, rather than just elastic templates do
>> you
>> >>> >> think we
>> >>> >> >> >> should have some sort of typed schema we could translate to
>> >>> multiple
>> >>> >> >> >> targets (solr, elastic, ur... other...) or are you thinking
>> of
>> >>> >> packaging
>> >>> >> >> >> specific scheme assets like template json with the parser?
>> >>> >> >> >>
>> >>> >> >> >> Simon
>> >>> >> >> >>
>> >>> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <
>> >>> ottobackwa...@gmail.com
>> >>> >> <mailto:ottobackwa...@gmail.com>> wrote:
>> >>> >> >> >>>
>> >>> >> >> >>>
>> >>> >> >> >>> Not to jump the gun, but I’m crafting a proposal about
>> >>> parsers and
>> >>> >> one
>> >>> >> >> >> of the things I am going to propose relates to having the ES
>> >>> >> Template for
>> >>> >> >> > a
>> >>> >> >> >> given parser installed or packaged with the parser. We could
>> >>> load the
>> >>> >> >> >> template from there, edit, save and deploy etc. We can extend
>> >>> that
>> >>> >> >> > concept
>> >>> >> >> >> more and more later (drafts, versioning etc )
>> >>> >> >> >>>
>> >>> >> >> >>>
>> >>> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (
>> >>> >> >> >> si...@simonellistonball.com <mailto:si...@simonellistonball.com>)
>> >>> >> wrote:
>> >>> >> >> >>>>
>> >>> >> >> >>>> A little while ago the issue of managing Elastic templates
>> >>> for new
>> >>> >> >> >> sensor configs came up, and we didn’t quite put it to bed.
>> >>> >> >> >>>>
>> >>> >> >> >>>> When creating new sensors, I almost invariably find the
>> >>> >> auto-generated
>> >>> >> >> >> schemas for elastic pick some incorrect types. I also find I
>> >>> have to
>> >>> >> >> >> recreate indexes every time to push in the proper dynamic
>> >>> templates
>> >>> >> for
>> >>> >> >> >> things like geo enrichment fields.
>> >>> >> >> >>>>
>> >>> >> >> >>>> So, my questions are:
>> >>> >> >> >>>> How should we address elastic template for new sensors?
>> >>> >> >> >>>> Do we have circumstances where we would need to configure
>> >>> types, or
>> >>> >> >> > can
>> >>> >> >> >> we get away with inferring them?
>> >>> >> >> >>>> Should we just add some additional dynamic templates to
>> >>> cover our
>> >>> >> >> >> common fields like timestamp (the most common culprit I find
>> >>> for
>> >>> >> >> > incorrect
>> >>> >> >> >> typing)?
>> >>> >> >> >>>>
>> >>> >> >> >>>> I’d also like to think about ways we can generalise this.
>> >>> Does
>> >>> >> anyone
>> >>> >> >> >> have any thoughts on what sort of additional index schemes we
>> >>> should
>> >>> >> want
>> >>> >> >> >> to infer (solr seems an obvious one, any others?).
>> >>> >> >> >>>>
>> >>> >> >> >>>> Thoughts on a well typed, schemaed and easily indexed
>> >>> postcard
>> >>> >> please
>> >>> >> >> > :)
>> >>> >> >> >>>>
>> >>> >> >> >>>> Simon
>> >>> >> >> >>
>> >>> >>
>> >>> >> --
>> >>> >
>> >>> > Jon
>> >>> >
>> >>> > Sent from my mobile device
>> >>>
>> >>> -------------------
>> >>> Thank you,
>> >>>
>> >>> James Sirota
>> >>> PPMC- Apache Metron (Incubating)
>> >>> jsirota AT apache DOT org
>> >>>
>> >>
>> >>
>> >
>>
>>
>
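To make the "generic definition" idea from this thread concrete, here is a
minimal sketch in Python. Everything in it is an assumption for illustration:
the generic type names, the ES_TYPES table, merge_schemas, to_es_template, and
the "<sensor>_index*" pattern are hypothetical and not taken from Metron, and
the template layout is simplified rather than checked against any particular
Elasticsearch version. It shows a parser's field/type map being merged with
enrichment chunks and then translated by one indexer-specific mapper, with
anything unknown falling back to strings.

```python
# Hypothetical generic field types -> Elasticsearch mapping types.
# The table is illustrative; real type names depend on the ES version in use.
ES_TYPES = {
    "string": "keyword",
    "integer": "integer",
    "float": "float",
    "ip": "ip",
    "timestamp": "date",
}


def merge_schemas(parser_fields, *enrichment_chunks):
    """Combine a parser's {field: generic_type} map with enrichment chunks.

    Raises ValueError if two contributors disagree on a field's type.
    """
    merged = dict(parser_fields)
    for chunk in enrichment_chunks:
        for name, ftype in chunk.items():
            if merged.get(name, ftype) != ftype:
                raise ValueError(f"conflicting types for field {name!r}")
            merged[name] = ftype
    return merged


def to_es_template(sensor, fields):
    """Translate a generic field map into a simplified ES-style template dict."""
    properties = {
        # Unknown generic types fall back to a string-like type.
        name: {"type": ES_TYPES.get(ftype, "keyword")}
        for name, ftype in fields.items()
    }
    return {
        "index_patterns": [f"{sensor}_index*"],  # assumed naming convention
        "mappings": {
            # Assumed catch-all: fields with no declared schema become strings.
            "dynamic_templates": [
                {"default_strings": {"match": "*", "mapping": {"type": "keyword"}}}
            ],
            "properties": properties,
        },
    }


if __name__ == "__main__":
    fields = merge_schemas(
        {"ip_src_addr": "ip", "timestamp": "timestamp"},  # from the parser
        {"geo_country": "string"},                        # from an enrichment
    )
    print(to_es_template("bro", fields))
```

A Solr or Druid mapper would consume the same merged dict and emit its own
artifact, so adding a new index target would not require touching any parser.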

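The alternative raised in the thread, hooking index maintenance to a
configuration "commit" instead of checking every message in the stream, could
look roughly like the sketch below. ConfigStore, commit, diff_fields, and the
push_template callback are hypothetical names, not Metron APIs. Rejecting type
changes outright is also an assumption; it stands in for the unresolved
question of how to time the switchover when an existing field changes.

```python
def diff_fields(old, new):
    """Return (added, changed) between two {field: type} maps."""
    added = {n: t for n, t in new.items() if n not in old}
    changed = {
        n: (old[n], t) for n, t in new.items() if n in old and old[n] != t
    }
    return added, changed


class ConfigStore:
    """Holds the committed field map per sensor and updates the index on commit."""

    def __init__(self, push_template):
        # push_template(sensor, fields) is whatever deploys the index artifact.
        self._push_template = push_template
        self._fields = {}

    def commit(self, sensor, fields):
        """Commit a new field map; push a template only if fields were added."""
        old = self._fields.get(sensor, {})
        added, changed = diff_fields(old, fields)
        if changed:
            # Changing the type of an already indexed field usually needs a
            # new index, so this sketch refuses rather than guessing.
            raise ValueError(f"type change needs a reindex: {sorted(changed)}")
        if added:
            self._push_template(sensor, dict(fields))
        self._fields[sensor] = dict(fields)
        return sorted(added)


if __name__ == "__main__":
    pushed = []
    store = ConfigStore(lambda sensor, fields: pushed.append((sensor, fields)))
    store.commit("bro", {"timestamp": "timestamp"})
    store.commit("bro", {"timestamp": "timestamp", "geo_country": "string"})
    print(len(pushed), "template pushes")
```

The indexer itself then only inserts; the cost of comparing fields is paid once
per configuration change rather than once per message.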