Hi Xavier,
Thanks for the pointer.
Would an "includeHeaders" option in the spec make sense?
Users could then indicate that they wish to include headers as fields.

My idea for the spec API was:
{
  "timestampSpec": ...,
  "dimensionsSpec": ...,
  ...
  "includeHeaders": true,
  "headersPrefix": "hd_",
  "keyPrefix": "k_",
  "keySpec": {
    ... an existing spec for the key input format, for example JSONParseSpec
    (or a subset of the existing one)
  },
  "valueSpec": {
    ... an existing spec for the value input format, for example JSONParseSpec
    (or a subset of the existing one)
  },
  "keepNullColumns": false,
  ...
}

The part that might be confusing in this setup is that today's input specs
contain settings, like timestampSpec, that only make sense in the root spec.
Those could simply be ignored if provided (not very user-friendly, but
overcoming it would require a bigger refactoring).
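A rough Java sketch may help pin down the prefix semantics proposed above. The class and method names (KafkaRowMerger, merge) are hypothetical, not existing Druid APIs, and two behaviors are assumptions rather than settled design: value fields win on a name collision, and header bytes are decoded as UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Druid API): flattens the parsed value, the parsed key
 * and the Kafka record headers into the single namespace later stages expect.
 * The prefixes mirror the proposed "headersPrefix" / "keyPrefix" options.
 */
public class KafkaRowMerger {

  public static Map<String, Object> merge(
      Map<String, Object> valueFields,   // fields parsed by the "valueSpec" format
      Map<String, Object> keyFields,     // fields parsed by the "keySpec" format
      Map<String, byte[]> headers,       // raw Kafka header values (may be null)
      boolean includeHeaders,
      String headersPrefix,
      String keyPrefix) {
    // Assumption: value fields go in first and take precedence on a collision.
    Map<String, Object> row = new LinkedHashMap<>(valueFields);

    // Key fields are namespaced with keyPrefix to make collisions less likely.
    for (Map.Entry<String, Object> e : keyFields.entrySet()) {
      addIfMissing(row, keyPrefix + e.getKey(), e.getValue());
    }

    if (includeHeaders) {
      // Assumption: header bytes are UTF-8 text; null header values stay null.
      for (Map.Entry<String, byte[]> e : headers.entrySet()) {
        byte[] raw = e.getValue();
        Object decoded = raw == null ? null : new String(raw, StandardCharsets.UTF_8);
        addIfMissing(row, headersPrefix + e.getKey(), decoded);
      }
    }
    return row;
  }

  // containsKey (rather than putIfAbsent) so an existing null-valued field is not overwritten.
  private static void addIfMissing(Map<String, Object> row, String name, Object value) {
    if (!row.containsKey(name)) {
      row.put(name, value);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> value = new LinkedHashMap<>();
    value.put("user", "alice");
    value.put("k_id", "from-value"); // deliberately collides with the prefixed key field "id"

    Map<String, Object> key = new LinkedHashMap<>();
    key.put("id", "42");
    key.put("region", "eu");

    Map<String, byte[]> headers = new LinkedHashMap<>();
    headers.put("trace", "abc".getBytes(StandardCharsets.UTF_8));

    // Prints {user=alice, k_id=from-value, k_region=eu, hd_trace=abc}
    System.out.println(merge(value, key, headers, true, "hd_", "k_"));
  }
}
```

In the example the key field "id" is silently dropped because the value already has a "k_id" field, which is the kind of collision Gian mentions below; failing loudly or letting the key win would be equally valid choices.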

On 2021/04/21 17:13:45, Xavier Léauté <xav...@confluent.io.INVALID> wrote: 
> Thanks Noam,
> 
> I think this would be a great addition. We should not limit this to keys
> though, and include Kafka record headers as well, since the semantics are
> similar.
> Header names are already well-defined strings, so an API for keys would
> likely apply easily to headers as well.
> 
> As Gian mentioned, the support to access those Kafka record fields from
> InputFormat is already there, but we lack a good user-facing API,
> so if you would like to help figure that out that'd be awesome.
> 
> - Xavier
> 
> On Wed, Apr 21, 2021 at 8:40 AM Gian Merlino <g...@apache.org> wrote:
> 
> > Hey Noam,
> >
> > I think this would certainly be useful, and thank you for your interest in
> > contributing!
> >
> > I think the toughest part will be designing a good API (meaning: what would
> > users specify in the kafka supervisor json spec in order to activate and
> > configure this feature?). So a good way to proceed would be to propose some
> > API, gather some community feedback on the design of the API, and then
> > start working on a patch.
> >
> > Some thoughts on API design:
> >
> > 1) https://github.com/apache/druid/pull/10730 adds some related
> > functionality that you would want to hook into. This patch added Java APIs
> > that can be used in extensions, but didn't add any JSON APIs that can be
> > used by regular users. But you could build some JSON APIs on top of this.
> >
> > 2) Some keys are "formatted" (like the examples you gave: json and
> > delimited). Formatted keys should be parsed and fields extracted from them
> > somehow, using their own InputFormat. Maybe we should call it the
> > "keyInputFormat". We need to figure out what semantics make the most sense
> > for presenting the parsed key to later stages of the system (which expect a
> > single namespace). Merging the parsed key map with the parsed value map
> > seems like a bad idea, since there might be field name collisions. So maybe
> > we should prefix them with some string like "__key.". There could still be
> > collisions, but they'd be less likely if we choose an uncommon prefix. At
> > some point, we may also need to let users specify their own prefix, or even
> > something fancier like an explicit mapping. But I think we won't need that
> > feature on day 1.
> >
> > 3) There are also unformatted keys that might be simple strings or byte
> > arrays. These unformatted keys should become a single field. I’m not sure
> > which is more prevalent, or which one we should build first, but I think
> > ultimately we’ll want to support both styles.
> >
> > On Fri, Apr 16, 2021 at 3:36 PM noam shaish <noamsha...@gmail.com> wrote:
> >
> > > Hi,
> > > I would like to try adding an InputFormat for Kafka that also supports
> > > fields coming from the event key.
> > > In my scenario there are two options:
> > > 1. both key and value are JSON
> > > 2. the key is a delimited string and the value is JSON.
> > >
> > > Would such a feature be welcome as a contribution, or should I keep it
> > > on my own fork?
> > >
> > > Thanks,
> > > Noam
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > For additional commands, e-mail: dev-h...@druid.apache.org
> > >
> > >
> >
> 
