Re: [DISCUSS] Management of Elastic and other index schemas

Nick Allen Mon, 20 Feb 2017 11:56:59 -0800

I'm just blowing smoke at 10,000 feet here. :) I think we could engineer it
to be performant in some manner.


Per your thoughts, it would make sense to have this sort of thing hooked to
configuration changes, not a check for every message that comes in.

On Mon, Feb 20, 2017 at 2:51 PM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> I think that would be interesting to do,
>
> validateFields()
> updateIndexIf()
> insert()
>
> But do you want to take that hit on every message? I’m not sure.
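Below is a minimal sketch of the three-step flow above. It assumes the indexer caches the set of fields it has already mapped, so the per-message cost is a set difference rather than a round trip to the index. `OnTheFlyIndexer` and its method names are hypothetical stand-ins for `validateFields()`/`updateIndexIf()`/`insert()`. The "template" and "index" are in-memory placeholders, not an Elasticsearch client.

```python
from typing import Any, Dict, List, Set


class OnTheFlyIndexer:
    """Hypothetical indexer following the per-message flow
    validateFields() -> updateIndexIf() -> insert().

    The template and index are in-memory stand-ins; nothing here talks to a
    real Elasticsearch cluster.
    """

    def __init__(self) -> None:
        self.known_fields: Set[str] = set()
        self.template_updates = 0              # times the template would change
        self.documents: List[Dict[str, Any]] = []

    def validate_fields(self, message: Dict[str, Any]) -> Set[str]:
        # The per-message "hit": a set difference against fields already mapped.
        return set(message) - self.known_fields

    def update_index_if(self, new_fields: Set[str]) -> None:
        # Only touch the template when a previously unseen field shows up.
        if new_fields:
            self.known_fields |= new_fields
            self.template_updates += 1

    def insert(self, message: Dict[str, Any]) -> None:
        self.documents.append(message)

    def index(self, message: Dict[str, Any]) -> None:
        self.update_index_if(self.validate_fields(message))
        self.insert(message)


if __name__ == "__main__":
    idx = OnTheFlyIndexer()
    idx.index({"ip_src_addr": "10.0.0.1", "timestamp": 1487620619000})
    idx.index({"ip_src_addr": "10.0.0.2", "timestamp": 1487620620000})
    idx.index({"ip_src_addr": "10.0.0.3", "enrichments:geo:ip_src_addr:city": "Raleigh"})
    print(idx.template_updates)  # 2
```

Even with the cache, every message still pays for the check. The sketch also tracks only field names, so a field whose type changes would slip through.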
>
>
> What if we instead hooked to configuration such that when you ‘commit’ a
> configuration change it recalculates and fixes up the index instead? So
> don’t do it in the indexer, but have good lifecycle management in the
> configuration.
>
> There are issues there with timing the switchover I’d want to think
> through, but I like that better than putting that in the stream.
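Below is one way to read the commit-time alternative, sketched under stated assumptions. `ConfigLifecycle`, `build_template` and `deploy` are hypothetical names, and the switchover timing mentioned above is deliberately not handled. The indexer does no per-message checking. The template is recomputed only when a configuration is committed, and redeployed only if it actually changed.

```python
import json
from typing import Callable, Dict

FieldTypes = Dict[str, str]   # field name -> generic type name (assumed)


class ConfigLifecycle:
    """Hypothetical commit hook: templates are recomputed when a sensor's
    configuration is committed, never while messages are being indexed."""

    def __init__(self, build_template: Callable[[FieldTypes], dict],
                 deploy: Callable[[str, dict], None]) -> None:
        self._build_template = build_template
        self._deploy = deploy
        self._deployed: Dict[str, str] = {}   # sensor -> canonical JSON of last template

    def commit(self, sensor: str, field_types: FieldTypes) -> bool:
        """Return True if a new template was deployed for this sensor."""
        template = self._build_template(field_types)
        canonical = json.dumps(template, sort_keys=True)
        if self._deployed.get(sensor) == canonical:
            return False                      # nothing index-relevant changed
        self._deploy(sensor, template)
        self._deployed[sensor] = canonical
        return True


if __name__ == "__main__":
    deployed = []
    lifecycle = ConfigLifecycle(
        build_template=lambda ft: {"properties": {f: {"type": t} for f, t in ft.items()}},
        deploy=lambda sensor, tpl: deployed.append((sensor, tpl)),
    )
    print(lifecycle.commit("bro", {"timestamp": "date"}))                       # True
    print(lifecycle.commit("bro", {"timestamp": "date"}))                       # False
    print(lifecycle.commit("bro", {"timestamp": "date", "ip_src_addr": "ip"}))  # True
```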
>
>
> On February 20, 2017 at 14:39:57, Nick Allen (n...@nickallen.org) wrote:
>
> Since enrichments, and even parsers, can be added on-the-fly, should the ES
> indexer be intelligent enough to manage the index templates on-the-fly
> also? Ideally, I should never have to manually install something like an
> ES template. The indexer should just take care of all that.
>
> In the case of the Elasticsearch indexer, if it notices a new field added
> by an enrichment, or a new source of telemetry, then it should update the
> ES template on-the-fly also. Ideally, we would never have to manually
> create/deploy an ES template. It should all happen seamlessly and remain
> in sync with whatever enrichments, etc., exist.
>
>
>
>
>
> On Mon, Feb 20, 2017 at 2:36 PM, Nick Allen <n...@nickallen.org> wrote:
>
> >
> > Taking this a step further, I think this challenge goes beyond just
> > parsers. We would also need to solve this problem for enrichments. When I
> > add an enrichment, I want the enriched data to be indexed accurately. How
> > can we make that happen?
> >
> > - As part of defining an enrichment, should I also be able to specify
> > the fields and types using this same generic definition?
> > - Or could this be inferred somehow via an extension to Stellar?
> >
> >
> >
> > On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:
> >
> >> I like the flexibility and extensibility of having some kind of internal
> >> representation (generic definition) of the names and types of the fields
> >> produced by a parser.
> >>
> >> Rather than shipping with an ES template, a parser would ship with a
> >> generic definition of the field names and data types that it adds to a
> >> message. The Elasticsearch indexer would then take this generic definition
> >> and translate it to an Elasticsearch template.
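Below is a minimal sketch of that split. It assumes a small set of generic types; the `FieldType` values are invented for illustration and are not an existing Metron enum. The parser ships only a list of `FieldDef`s, and the Elasticsearch side owns the translation. The target types assume Elasticsearch 5.x conventions (`keyword` rather than a `not_analyzed` string) and epoch-millisecond timestamps. The top-level `template` key is the 5.x index-template pattern field; 6.x and later use `index_patterns`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class FieldType(Enum):
    """Hypothetical generic types a parser could declare."""
    STRING = "string"
    LONG = "long"
    DOUBLE = "double"
    TIMESTAMP = "timestamp"
    IP = "ip"
    GEO_POINT = "geo_point"
    BOOLEAN = "boolean"


@dataclass(frozen=True)
class FieldDef:
    name: str
    type: FieldType


# Assumed Elasticsearch 5.x-style mapping types; adjust for your ES version.
ES_TYPES: Dict[FieldType, dict] = {
    FieldType.STRING: {"type": "keyword"},
    FieldType.LONG: {"type": "long"},
    FieldType.DOUBLE: {"type": "double"},
    FieldType.TIMESTAMP: {"type": "date", "format": "epoch_millis"},
    FieldType.IP: {"type": "ip"},
    FieldType.GEO_POINT: {"type": "geo_point"},
    FieldType.BOOLEAN: {"type": "boolean"},
}


def to_es_template(index_pattern: str, doc_type: str, fields: List[FieldDef]) -> dict:
    """Translate a generic field definition into an ES index template body."""
    return {
        "template": index_pattern,
        "mappings": {
            doc_type: {
                # Copy each mapping so callers can't mutate the shared table.
                "properties": {f.name: dict(ES_TYPES[f.type]) for f in fields}
            }
        },
    }


if __name__ == "__main__":
    bro_fields = [FieldDef("ip_src_addr", FieldType.IP),
                  FieldDef("timestamp", FieldType.TIMESTAMP)]
    tpl = to_es_template("bro_index*", "bro_doc", bro_fields)
    print(tpl["mappings"]["bro_doc"]["properties"]["timestamp"])
    # {'type': 'date', 'format': 'epoch_millis'}
```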
> >>
> >> Each new indexer (Solr, etc.) would know how to consume this generic
> >> definition and produce whatever artifacts it needs to index the data
> >> accurately.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org> wrote:
> >>
> >>> I am not sure I agree with packaging source-specific templates with the
> >>> parser. I think that would make it harder to add additional storage
> >>> sources. For example, what happens if I have 50 parsers with Solr and ES
> >>> schemas defined, but now I want to add Druid? Now I have to add 50 schemas
> >>> to all my existing parsers, which I don't think makes sense. I think what
> >>> we should have instead is tuple mappers that map some internal
> >>> representation of our schema to whatever schema the tool uses. We already
> >>> somewhat started to move down this path with Kyle defining the schema enum
> >>> for his ASA parser PR and Simon defining a JSON schema for his CEF parser
> >>> PR. I think we need to unify these approaches and then propagate them to
> >>> all the parsers. I think what has to happen is the following:
> >>>
> >>> We have to introduce a partial schema for Metron messages, where you can
> >>> enforce a schema on the part of a message you want while still allowing
> >>> the rest of the message to remain flexible. What I mean by that is that
> >>> you should enforce a schema for things like IP, protocol, timestamp,
> >>> etc., but have a fully flexible structure outside of that.
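Below is a minimal sketch of a partial schema. It assumes the enforced fields are required and typed with plain Python types. The field names in `CORE_SCHEMA`, and the choice to treat a missing core field as an error, are assumptions, not settled Metron behavior. Everything outside the schema passes through unchecked.

```python
from typing import Any, Dict, List

# Hypothetical partial schema: only these core fields are enforced.
CORE_SCHEMA: Dict[str, type] = {
    "ip_src_addr": str,
    "protocol": str,
    "timestamp": int,
}


def _matches(value: Any, expected: type) -> bool:
    # bool is a subclass of int in Python; don't let True pass as a timestamp.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate_partial(message: Dict[str, Any],
                     schema: Dict[str, type] = CORE_SCHEMA) -> List[str]:
    """Check only the fields named in the schema; every other field in the
    message is accepted untouched, whatever its shape."""
    errors: List[str] = []
    for name, expected in schema.items():
        if name not in message:
            errors.append(f"missing field {name!r}")
        elif not _matches(message[name], expected):
            errors.append(f"field {name!r} should be {expected.__name__}, "
                          f"got {type(message[name]).__name__}")
    return errors


if __name__ == "__main__":
    ok = {"ip_src_addr": "10.0.0.1", "protocol": "tcp",
          "timestamp": 1487620619000, "custom:anything": {"nested": [1, 2]}}
    print(validate_partial(ok))   # []
    bad = {"ip_src_addr": "10.0.0.1", "protocol": "tcp", "timestamp": "yesterday"}
    print(validate_partial(bad))  # ["field 'timestamp' should be int, got str"]
```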
> >>>
> >>> After you do that, you can map the partial schema you defined to ES,
> >>> Solr, Druid, etc. For the fields you don't have a schema for, you just
> >>> assume they are strings. To add an additional storage/indexing source to
> >>> Metron, all you do is define a mapper to that source's schema and load
> >>> that into our indexing bolt.
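The mapper idea can be sketched as a registry keyed by target name. The sketch assumes hypothetical generic type names such as `ip` and `timestamp`. Each mapper here is just a lookup table. Any field without a schema, and any generic type a target doesn't know, falls back to that target's string type, as proposed above. The Solr names (`plong`, `pdate`) follow the point-field types in Solr 7's default configset, and the Elasticsearch names follow 5.x. Both are assumptions to check against the real targets.

```python
from typing import Callable, Dict, Iterable

Schema = Dict[str, str]        # field name -> generic type (hypothetical names)
Mapper = Callable[[str], str]  # generic type -> target-specific type

MAPPERS: Dict[str, Mapper] = {}


def register_mapper(target: str, type_map: Dict[str, str]) -> None:
    """Adding a storage/indexing target means registering one mapper.
    type_map must contain a "string" entry, used as the fallback."""
    MAPPERS[target] = lambda generic: type_map.get(generic, type_map["string"])


def map_fields(target: str, schema: Schema, observed_fields: Iterable[str]) -> Dict[str, str]:
    """Map schema'd fields through the target's mapper; any observed field
    without a schema is assumed to be a string."""
    mapper = MAPPERS[target]
    result = {name: mapper(generic) for name, generic in schema.items()}
    for name in observed_fields:
        result.setdefault(name, mapper("string"))
    return result


# Assumed target type names; check them against your ES version / Solr schema.
register_mapper("elasticsearch", {"string": "keyword", "ip": "ip",
                                  "long": "long", "timestamp": "date"})
register_mapper("solr", {"string": "string", "ip": "string",
                         "long": "plong", "timestamp": "pdate"})


if __name__ == "__main__":
    schema = {"ip_src_addr": "ip", "timestamp": "timestamp"}
    seen = ["ip_src_addr", "timestamp", "user_agent"]
    print(map_fields("solr", schema, seen))
    # {'ip_src_addr': 'string', 'timestamp': 'pdate', 'user_agent': 'string'}
```

Adding a new store such as Druid would then be one more `register_mapper` call, with no change to any parser.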
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> James
> >>>
> >>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
> >>> > I think this is a good direction to move things toward: moving indexing
> >>> > templates to be packaged with parsers (using multiple tiered options)
> >>> > that are then merged with the possible enrichment fields before getting
> >>> > added to the indexing technology in use. Now, to read the proposal
> >>> > thread...
> >>> >
> >>> > Jon
> >>> >
> >>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >>> >
> >>> >> I’d broadly agree with that tiered approach.
> >>> >>
> >>> >> The version where the parser emits a generic schema and enrichments
> >>> >> contribute generic schema chunks to it, which get combined into an
> >>> >> indexer-specific template generated at the end of the flow, is what I
> >>> >> have in mind. So yes, pretty much in line with your proposal. (I did
> >>> >> read through it; apologies if I missed any of the detail, brain is
> >>> >> still a little bit post-RSA!)
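Below is a minimal sketch of that flow. It assumes each schema (the parser's, and each enrichment's chunk) is a flat map from field name to a generic type name. All names here, including the geo field naming, are illustrative. The chunks are merged first, and only the final step knows anything about Elasticsearch. Rejecting conflicting types is a choice made for this sketch; a precedence rule would be an equally valid design.

```python
from typing import Dict, List

Schema = Dict[str, str]  # field name -> generic type name (hypothetical)

# Assumed Elasticsearch 5.x-style types for the final translation step.
ES_TYPES = {"string": "keyword", "ip": "ip", "long": "long",
            "double": "double", "timestamp": "date", "geo_point": "geo_point"}


def combine_schemas(parser_schema: Schema, enrichment_chunks: List[Schema]) -> Schema:
    """Fold each enrichment's schema chunk into the parser's schema.
    A field declared twice with different types is an error in this sketch."""
    combined = dict(parser_schema)  # leave the parser's schema untouched
    for chunk in enrichment_chunks:
        for name, ftype in chunk.items():
            existing = combined.get(name)
            if existing is not None and existing != ftype:
                raise ValueError(f"conflicting types for {name!r}: {existing} vs {ftype}")
            combined[name] = ftype
    return combined


def to_es_properties(schema: Schema) -> Dict[str, dict]:
    """The indexer-specific step, run once at the end of the flow.
    Unknown generic types fall back to keyword."""
    return {name: {"type": ES_TYPES.get(ftype, "keyword")} for name, ftype in schema.items()}


if __name__ == "__main__":
    parser = {"ip_dst_addr": "ip", "timestamp": "timestamp"}
    geo = {"enrichments:geo:ip_dst_addr:location_point": "geo_point",
           "enrichments:geo:ip_dst_addr:city": "string"}
    print(to_es_properties(combine_schemas(parser, [geo])))
```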
> >>> >>
> >>> >> Simon
> >>> >>
> >>> >> > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>> >> >
> >>> >> > We already make them do this now, or they get the defaults, so this
> >>> >> > is no different.
> >>> >> > Having parsers emit names, types, etc. would be another step, or it
> >>> >> > could actually be the ‘generic schema’ as implemented.
> >>> >> >
> >>> >> > A tiered approach, from
> >>> >> > * you give nothing with the parser: you get whatever ES guesses at,
> >>> >> >   but you don’t care, do you?
> >>> >> > * you give the schema
> >>> >> > * you give the types and we figure it out for you
> >>> >> >
> >>> >> > would be the best to move to.
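The three tiers can be sketched as a single resolution function. The sketch assumes that an explicit template, when present, takes precedence over generated types, in the spirit of the override option discussed in this thread. The type table and the function name are hypothetical, and the generated template is a bare `properties` block rather than a full index template.

```python
from typing import Dict, Optional

# Hypothetical generic -> Elasticsearch type table used by the "types" tier.
ES_TYPES = {"string": "keyword", "ip": "ip", "long": "long", "timestamp": "date"}


def resolve_template(explicit_template: Optional[dict] = None,
                     field_types: Optional[Dict[str, str]] = None) -> Optional[dict]:
    """Pick a template using the tiers from the thread, most specific first:
    1. an explicit, destination-specific template packaged with the parser
    2. field names and generic types, from which a template is generated
    3. nothing at all: return None and let the index guess (ES dynamic mapping)
    """
    if explicit_template is not None:
        return explicit_template
    if field_types:
        return {"properties": {name: {"type": ES_TYPES.get(t, "keyword")}
                               for name, t in field_types.items()}}
    return None


if __name__ == "__main__":
    print(resolve_template())                                    # None
    print(resolve_template(field_types={"ts": "timestamp"}))     # generated
    print(resolve_template({"properties": {}}, {"ts": "long"}))  # explicit wins
```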
> >>> >> >
> >>> >> > Also, we could use the names-and-types method tied to enrichments to
> >>> >> > generate (or rather, derive) indexing templates for enrichment types,
> >>> >> > which I mention in my proposal.
> >>> >> >
> >>> >> > I’m starting to think you haven’t rushed out to read it Simon ;)
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On February 17, 2017 at 15:24:37, Simon Elliston Ball (
> >>> >> > si...@simonellistonball.com <mailto:si...@simonellistonball.com>) wrote:
> >>> >> >
> >>> >> >> I like that, to an extent… Forcing the provision of an explicit
> >>> >> >> schema might be a bit of a load for parser development. I’m
> >>> >> >> assuming that custom parsers would be pushed towards the same
> >>> >> >> packaging approach.
> >>> >> >>
> >>> >> >> Would it make sense to require the parser to emit the expected field
> >>> >> >> names and types, and then for us to provide a means of creating the
> >>> >> >> templates for supported indices, pushing the actual template
> >>> >> >> management to the index layer rather than the parsing layer? Schema
> >>> >> >> is, after all, determined not just by a parser, but also by the
> >>> >> >> combination of enrichments and models applied.
> >>> >> >>
> >>> >> >> We could also, of course, provide an override option within your
> >>> >> >> proposed parser package model to allow any destination-specific
> >>> >> >> configuration of the indexing template.
> >>> >> >>
> >>> >> >> Simon
> >>> >> >>
> >>> >> >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com
> >>> >> >> > <mailto:ottobackwa...@gmail.com>> wrote:
> >>> >> >> >
> >>> >> >> > I think we can get there from my proposal.
> >>> >> >> > A source may package:
> >>> >> >> > * explicit schemas (ES, Solr, FOO)
> >>> >> >> > * a generic, to-be-invented schema for a to-be-invented pluggable
> >>> >> >> >   indexing component :)
> >>> >> >> > and we’ll be able to handle it.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > On February 17, 2017 at 14:39:07, Kyle Richardson (
> >>> >> >> > kylerichards...@gmail.com <mailto:kylerichards...@gmail.com>) wrote:
> >>> >> >> >
> >>> >> >> > I personally like the idea of a typed schema per parser that we
> >>> >> >> > could translate to multiple targets. This would allow us a lot
> >>> >> >> > more modularity and extensibility in indexing down the road.
> >>> >> >> >
> >>> >> >> > -Kyle
> >>> >> >> >
> >>> >> >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <
> >>> >> >> > si...@simonellistonball.com <mailto:simon@simonellistonball.com>> wrote:
> >>> >> >> >
> >>> >> >> >> That sounds like a great idea, Otto. Do you have any early design
> >>> >> >> >> on that we can look at? Also, rather than just Elastic templates,
> >>> >> >> >> do you think we should have some sort of typed schema we could
> >>> >> >> >> translate to multiple targets (Solr, Elastic, ur... other...), or
> >>> >> >> >> are you thinking of packaging specific schema assets like template
> >>> >> >> >> JSON with the parser?
> >>> >> >> >>
> >>> >> >> >> Simon
> >>> >> >> >>
> >>> >> >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com
> >>> >> >> >>> <mailto:ottobackwa...@gmail.com>> wrote:
> >>> >> >> >>>
> >>> >> >> >>>
> >>> >> >> >>> Not to jump the gun, but I’m crafting a proposal about parsers,
> >>> >> >> >>> and one of the things I am going to propose relates to having the
> >>> >> >> >>> ES template for a given parser installed or packaged with the
> >>> >> >> >>> parser. We could load the template from there, edit, save, deploy,
> >>> >> >> >>> etc. We can extend that concept more and more later (drafts,
> >>> >> >> >>> versioning, etc.).
> >>> >> >> >>>
> >>> >> >> >>>
> >>> >> >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (
> >>> >> >> >>>> si...@simonellistonball.com <mailto:simon@simonellistonball.com>) wrote:
> >>> >> >> >>>>
> >>> >> >> >>>> A little while ago the issue of managing Elastic templates for
> >>> >> >> >>>> new sensor configs came up, and we didn’t quite put it to bed.
> >>> >> >> >>>>
> >>> >> >> >>>> When creating new sensors, I almost invariably find the
> >>> >> >> >>>> auto-generated schemas for Elastic pick some incorrect types. I
> >>> >> >> >>>> also find I have to recreate indexes every time to push in the
> >>> >> >> >>>> proper dynamic templates for things like geo enrichment fields.
> >>> >> >> >>>>
> >>> >> >> >>>> So, my questions are:
> >>> >> >> >>>> How should we address Elastic templates for new sensors?
> >>> >> >> >>>> Do we have circumstances where we would need to configure types,
> >>> >> >> >>>> or can we get away with inferring them?
> >>> >> >> >>>> Should we just add some additional dynamic templates to cover our
> >>> >> >> >>>> common fields like timestamp (the most common culprit I find for
> >>> >> >> >>>> incorrect typing)?
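For the narrower question of common fields, below is a sketch of shared dynamic templates. It assumes the Elasticsearch 5.x index-template layout and epoch-millisecond timestamps. The rule names and match patterns (including the `enrichments:geo:*:location_point` naming) are assumptions, not Metron's shipped templates. `preview_mapping` is a hypothetical helper that approximates Elasticsearch's simple `*` wildcard matching with `fnmatchcase`. It relies on the fact that Elasticsearch applies the first matching dynamic template.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Hypothetical dynamic templates for fields common to many sensors; adjust the
# patterns to the real field names produced by parsers and enrichments.
DYNAMIC_TEMPLATES: List[dict] = [
    {"timestamps": {"match": "*timestamp",
                    "mapping": {"type": "date", "format": "epoch_millis"}}},
    {"geo_location_points": {"match": "enrichments:geo:*:location_point",
                             "mapping": {"type": "geo_point"}}},
]


def index_template(index_pattern: str, doc_type: str) -> dict:
    """Template body carrying the shared dynamic templates (ES 5.x layout)."""
    return {"template": index_pattern,
            "mappings": {doc_type: {"dynamic_templates": DYNAMIC_TEMPLATES}}}


def preview_mapping(field_name: str) -> Optional[dict]:
    """Approximate which rule a newly seen field would hit: the first
    matching template wins. fnmatchcase only approximates ES's '*' matching."""
    for entry in DYNAMIC_TEMPLATES:
        (rule,) = entry.values()
        if fnmatchcase(field_name, rule["match"]):
            return rule["mapping"]
    return None


if __name__ == "__main__":
    for name in ("timestamp", "enrichments:geo:ip_dst_addr:location_point", "ip_src_addr"):
        print(name, "->", preview_mapping(name))
```

Dynamic templates only apply when a field is first added to a mapping; existing field mappings are not changed. Installing them in the index template, before the index is created, is what avoids recreating indexes.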
> >>> >> >> >>>>
> >>> >> >> >>>> I’d also like to think about ways we can generalise this. Does
> >>> >> >> >>>> anyone have any thoughts on what sort of additional index schemas
> >>> >> >> >>>> we should want to infer (Solr seems an obvious one; any others?).
> >>> >> >> >>>>
> >>> >> >> >>>> Thoughts on a well-typed, schemaed and easily indexed postcard,
> >>> >> >> >>>> please :)
> >>> >> >> >>>>
> >>> >> >> >>>> Simon
> >>> >> >> >>
> >>> >>
> >>> >> --
> >>> >
> >>> > Jon
> >>> >
> >>> > Sent from my mobile device
> >>>
> >>> -------------------
> >>> Thank you,
> >>>
> >>> James Sirota
> >>> PPMC- Apache Metron (Incubating)
> >>> jsirota AT apache DOT org
> >>>
> >>
> >>
> >
>
>
