Re: [DISCUSS] Management of Elastic and other index schemas

Nick Allen Mon, 20 Feb 2017 11:36:26 -0800

Taking this a step further, I think this challenge goes beyond just
parsers.  We would also need to solve this problem for enrichments.  When I
add an enrichment, I want the enriched data to be indexed accurately.   How
can we make that happen?


   - As part of defining an enrichment, should I also be able to specify
   the fields and types using this same generic definition?
   - Or could this be inferred somehow via an extension to Stellar?
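
To make the first option concrete, here is a rough sketch, not a proposed
implementation. It assumes a made-up set of generic type names and a
hypothetical enrichment field name; nothing here exists in Metron today. The
parser and each enrichment contribute a map of field name to generic type,
the maps are merged, and the Elasticsearch indexer translates the result into
the "properties" part of a mapping.

```python
# Hypothetical generic type names mapped to Elasticsearch field types.
# The generic names on the left are assumptions made for this sketch.
GENERIC_TO_ES = {
    "string": {"type": "keyword"},
    "integer": {"type": "integer"},
    "float": {"type": "float"},
    "ip": {"type": "ip"},
    "timestamp": {"type": "date", "format": "epoch_millis"},
    "geo_point": {"type": "geo_point"},
}


def merge_fields(parser_fields, *enrichment_fields):
    """Combine the parser's field definitions with those of each enrichment.

    Raises ValueError if two contributors disagree on the type of a field.
    """
    merged = dict(parser_fields)
    for fields in enrichment_fields:
        for name, generic_type in fields.items():
            if name in merged and merged[name] != generic_type:
                raise ValueError(
                    "conflicting types for %r: %r vs %r"
                    % (name, merged[name], generic_type)
                )
            merged[name] = generic_type
    return merged


def to_es_properties(fields):
    """Translate generic field definitions into ES mapping properties."""
    return {name: dict(GENERIC_TO_ES[t]) for name, t in fields.items()}


# Illustrative definitions only; the field names are examples.
parser_fields = {"ip_src_addr": "ip", "timestamp": "timestamp", "protocol": "string"}
geo_enrichment = {"enrichments:geo:ip_src_addr:location_point": "geo_point"}

es_properties = to_es_properties(merge_fields(parser_fields, geo_enrichment))
```

A Solr indexer would supply its own translation function over the same merged
definition, so neither the parser nor the enrichment needs to know which
index is in use.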



On Mon, Feb 20, 2017 at 2:29 PM, Nick Allen <n...@nickallen.org> wrote:

> I like the flexibility and extensibility of having some kind of internal
> representation (generic definition) of the names and types of the fields
> produced by a parser.
>
> Rather than shipping with an ES template, a parser would ship with a
> generic definition of the field names and data types that it adds to a
> message.  The Elasticsearch indexer would then take this generic definition
> and translate it to an Elasticsearch template.
>
> Each new indexer (Solr, etc) would know how to consume this generic
> definition and produce whatever artifacts that it needs to index the data
> accurately.
>
>
>
>
>
>
>
> On Sat, Feb 18, 2017 at 12:26 PM, James Sirota <jsir...@apache.org> wrote:
>
>> I am not sure I agree with packaging source-specific templates with the
>> parser.  I think that would make it harder to add additional storage
>> sources.  For example, what happens if I have 50 parsers with Solr and ES
>> schemas defined, but now I want to add Druid?  Now I have to add 50 schemas
>> to all my existing parsers, which I don't think makes sense.  I think what
>> we should have instead is tuple mappers that map some internal
>> representation of our schema to whatever schema the tool uses.  We already
>> somewhat started to move down this path with Kyle defining the schema enum
>> for his ASA parser PR and Simon defining a JSON schema for his CEF parser
>> PR.  I think we need to unify these approaches and then propagate them to
>> all the parsers.  I think what has to happen is the following:
>>
>> We have to introduce a partial schema for Metron messages, where you can
>> enforce a schema on the part of a message you care about while leaving
>> the rest of the message flexible.  What I mean by that is that you should
>> enforce a schema for things like ip, protocol, timestamp, etc., but have a
>> fully flexible structure outside of that.
>>
>> After you do that, you can map the partial schema you defined to es,
>> solr, druid, etc.  For the fields you don't have a schema for, you just
>> assume they are strings.  To add an additional storage/indexing source to
>> Metron, all you do is define a mapper to that source's schema and load that
>> into our indexing bolt.
>>
>>
>>
>> Thanks,
>> James
>>
>> 17.02.2017, 16:36, "zeo...@gmail.com" <zeo...@gmail.com>:
>> > I think this is a good direction to move things toward - moving indexing
>> > templates to be packaged with parsers (using multiple tiered options) that
>> > are then merged with the possible enrich fields before getting added to the
>> > indexing technology in use. Now, to read the proposal thread...
>> >
>> > Jon
>> >
>> > On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <
>> > si...@simonellistonball.com> wrote:
>> >
>> >>  I’d broadly agree with that tiered approach.
>> >>
>> >>  The version where the parser emits a generic schema, and enrichments
>> >>  contribute generic schema chunks to that, which get combined into an
>> >>  indexer-specific template generated at the end of the flow. So yes, pretty
>> >>  much in line with your proposal. (I did read through it, apologies if I
>> >>  missed any of the detail, brain is still a little bit post-RSA!)
>> >>
>> >>  Simon
>> >>
>> >>  > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwa...@gmail.com> wrote:
>> >>  >
>> >>  > We already make them do this now, or they get the defaults. So this is
>> >>  > no different.
>> >>  > Having parsers emit names and types etc, that would be another step - or
>> >>  > it could be the ‘generic schema’ as implemented actually.
>> >>  >
>> >>  > A tiered approach - from
>> >>  > * you give nothing with the parser - you get whatever ES guesses at but
>> >>  >   you don’t care do you
>> >>  > * you give the schema
>> >>  > * you give the types and we figure it out for you
>> >>  >
>> >>  > would be the best to move to.
>> >>  >
>> >>  > Also, we could use the names and types method tied to enrichment to
>> >>  > generate indexing templates for enrichment types or deriving them rather,
>> >>  > which I mention in my proposal.
>> >>  >
>> >>  > I’m starting to think you haven’t rushed out to read it Simon ;)
>> >>  >
>> >>  >
>> >>  >
>> >>  > On February 17, 2017 at 15:24:37, Simon Elliston Ball (
>> >>  > si...@simonellistonball.com <mailto:si...@simonellistonball.com>) wrote:
>> >>  >
>> >>  >> I like that, to an extent… Forcing the provision of explicit schema
>> >>  >> might be a bit of a load for parser development. I’m assuming that custom
>> >>  >> parsers would be pushed towards the same packaging approach.
>> >>  >>
>> >>  >> Would it make sense to require the parser to emit field names and types
>> >>  >> expected, and then for us to provide a means of creating the templates for
>> >>  >> supported indices, and push the actual template management to the index
>> >>  >> layer rather than the parsing layer? Schema is after all determined not
>> >>  >> just by a parser, but also by the combination of enrichments and models
>> >>  >> applied.
>> >>  >>
>> >>  >> We could also of course provide an override option within your proposed
>> >>  >> parser package model to allow any destination specific configuration of the
>> >>  >> indexing template.
>> >>  >>
>> >>  >> Simon
>> >>  >>
>> >>  >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwa...@gmail.com
>> >>  <mailto:ottobackwa...@gmail.com>> wrote:
>> >>  >> >
>> >>  >> > I think we can get there from my proposal.
>> >>  >> > A source may package:
>> >>  >> > * explicit schemas ( ES, SOLR, FOO )
>> >>  >> > * a generic to be invented schema for a to be invented pluggable
>> >>  indexing
>> >>  >> > component :)
>> >>  >> > and we’ll be able to handle it.
>> >>  >> >
>> >>  >> >
>> >>  >> >
>> >>  >> > On February 17, 2017 at 14:39:07, Kyle Richardson (
>> >>  kylerichards...@gmail.com <mailto:kylerichards...@gmail.com>)
>> >>  >> > wrote:
>> >>  >> >
>> >>  >> > I personally like the idea of a typed schema per parser that we could
>> >>  >> > translate to multiple targets. This would allow us a lot more modularity
>> >>  >> > and extensibility in indexing down the road.
>> >>  >> >
>> >>  >> > -Kyle
>> >>  >> >
>> >>  >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <
>> >>  >> > si...@simonellistonball.com <mailto:si...@simonellistonball.com
>> >>
>> >>  wrote:
>> >>  >> >
>> >>  >> >> That sounds like a great idea Otto. Do you have any early design on that
>> >>  >> >> we can look at? Also, rather than just elastic templates do you think we
>> >>  >> >> should have some sort of typed schema we could translate to multiple
>> >>  >> >> targets (solr, elastic, ur... other...) or are you thinking of packaging
>> >>  >> >> specific schema assets like template json with the parser?
>> >>  >> >>
>> >>  >> >> Simon
>> >>  >> >>
>> >>  >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwa...@gmail.com
>> >>  <mailto:ottobackwa...@gmail.com>> wrote:
>> >>  >> >>>
>> >>  >> >>>
>> >>  >> >>> Not to jump the gun, but I’m crafting a proposal about parsers and one
>> >>  >> >>> of the things I am going to propose relates to having the ES Template for
>> >>  >> >>> a given parser installed or packaged with the parser. We could load the
>> >>  >> >>> template from there, edit, save and deploy etc. We can extend that concept
>> >>  >> >>> more and more later (drafts, versioning etc )
>> >>  >> >>>
>> >>  >> >>>
>> >>  >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (
>> >>  >> >> si...@simonellistonball.com <mailto:si...@simonellistonball.com
>> >)
>> >>  wrote:
>> >>  >> >>>>
>> >>  >> >>>> A little while ago the issue of managing Elastic templates for new
>> >>  >> >>>> sensor configs came up, and we didn’t quite put it to bed.
>> >>  >> >>>>
>> >>  >> >>>> When creating new sensors, I almost invariably find the auto-generated
>> >>  >> >>>> schemas for elastic pick some incorrect types. I also find I have to
>> >>  >> >>>> recreate indexes every time to push in the proper dynamic templates for
>> >>  >> >>>> things like geo enrichment fields.
>> >>  >> >>>>
>> >>  >> >>>> So, my questions are:
>> >>  >> >>>> How should we address elastic template for new sensors?
>> >>  >> >>>> Do we have circumstances where we would need to configure types, or can
>> >>  >> >>>> we get away with inferring them?
>> >>  >> >>>> Should we just add some additional dynamic templates to cover our
>> >>  >> >>>> common fields like timestamp (the most common culprit I find for incorrect
>> >>  >> >>>> typing)?
>> >>  >> >>>>
>> >>  >> >>>> I’d also like to think about ways we can generalise this. Does anyone
>> >>  >> >>>> have any thoughts on what sort of additional index schemes we should want
>> >>  >> >>>> to infer (solr seems an obvious one, any others?).
>> >>  >> >>>>
>> >>  >> >>>> Thoughts on a well typed, schemaed and easily indexed postcard please :)
>> >>  >> >>>> Simon
>> >>  >> >>
>> >>
>> >>  --
>> >
>> > Jon
>> >
>> > Sent from my mobile device
>>
>> -------------------
>> Thank you,
>>
>> James Sirota
>> PPMC- Apache Metron (Incubating)
>> jsirota AT apache DOT org
>>
>
>
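
For the partial-schema idea James describes in the quoted thread (enforce
types for a few core fields, treat everything else as a string, and plug in
one mapper per index), a minimal sketch might look like the following. The
core field list, the generic type names, and the Solr-side type names are all
assumptions made for illustration. They are not existing Metron or Solr
definitions.

```python
# Fields whose types are enforced; every other field falls back to "string".
# This core list and its generic type names are hypothetical.
CORE_SCHEMA = {
    "ip_src_addr": "ip",
    "ip_dst_addr": "ip",
    "protocol": "string",
    "timestamp": "timestamp",
}


def resolve_types(field_names, schema=CORE_SCHEMA):
    """Return a generic type for every field, defaulting to 'string'."""
    return {name: schema.get(name, "string") for name in field_names}


def es_mapper(fields):
    """Map generic types to Elasticsearch mapping properties."""
    es_types = {
        "ip": {"type": "ip"},
        "timestamp": {"type": "date"},
        "string": {"type": "keyword"},
    }
    return {name: es_types[t] for name, t in fields.items()}


def solr_mapper(fields):
    """Map generic types to Solr-style field entries.

    The Solr type names here are placeholders; real names depend on the
    field types declared in the target Solr schema.
    """
    solr_types = {"ip": "string", "timestamp": "date", "string": "string"}
    return [{"name": name, "type": solr_types[t]} for name, t in fields.items()]


# Supporting a new index (Druid, for example) means adding one entry here,
# rather than adding a schema to every parser.
MAPPERS = {"elasticsearch": es_mapper, "solr": solr_mapper}


def build_index_schema(target, field_names):
    """Produce the target-specific schema for the fields seen in a message."""
    return MAPPERS[target](resolve_types(field_names))
```

With 50 parsers, each parser only declares its field names and any types
beyond the core set; the per-index work stays in the mappers.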
