Hi Gian,

I have looked at the code you mentioned, but there are several points I am
currently struggling with:
1. InputFormats are currently not very composable, so I cannot reuse
functionality from the existing InputFormats and have to replicate it in my
implementation (which I really don't like doing).
I could extract the common functionality into shared utilities, but that would
require a big refactoring.
2. Since I receive the records as "InputEntity", which is generic, I am using
"IntermediateRowParsingReader" to cast the records to KeyValueEntity. But what
should the behaviour be for records that are not key-value?
I could just return an "empty" key that provides no additional fields, but
that doesn't feel right. A better approach would be to prevent this
combination from being configured at all. I guess this can be enforced in the
UI, but that's outside my skill set (backend developer...).
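To make point 2 concrete, here is a minimal sketch of the two options (fail
fast vs. empty key). Entity, KeyValueRecord and ValueOnlyRecord are
hypothetical stand-ins for illustration only, not Druid's real InputEntity
classes, and the real reader would plug this check in wherever it casts the
entity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the two options for records that carry no key.
// Entity, KeyValueRecord and ValueOnlyRecord are hypothetical stand-ins,
// NOT Druid's InputEntity classes.
class KeyFieldsSketch {
  interface Entity {}

  static final class KeyValueRecord implements Entity {
    final Map<String, Object> key;
    final Map<String, Object> value;

    KeyValueRecord(Map<String, Object> key, Map<String, Object> value) {
      this.key = key;
      this.value = value;
    }
  }

  static final class ValueOnlyRecord implements Entity {
    final Map<String, Object> value;

    ValueOnlyRecord(Map<String, Object> value) {
      this.value = value;
    }
  }

  // Option A: fail fast, so pairing a key-aware format with records that have
  // no key is reported as an error instead of being silently ignored.
  static Map<String, Object> keyFieldsStrict(Entity entity) {
    if (!(entity instanceof KeyValueRecord)) {
      throw new IllegalArgumentException(
          "Key-aware format requires key/value records, got " + entity.getClass().getSimpleName());
    }
    return ((KeyValueRecord) entity).key;
  }

  // Option B: treat a missing key as empty, so the row only gets value fields.
  static Map<String, Object> keyFieldsLenient(Entity entity) {
    if (entity instanceof KeyValueRecord) {
      return ((KeyValueRecord) entity).key;
    }
    return Collections.emptyMap();
  }

  public static void main(String[] args) {
    Map<String, Object> key = new TreeMap<>();
    key.put("userId", "u1");
    Entity kv = new KeyValueRecord(key, Collections.emptyMap());
    Entity plain = new ValueOnlyRecord(Collections.emptyMap());

    System.out.println(keyFieldsStrict(kv));     // {userId=u1}
    System.out.println(keyFieldsLenient(plain)); // {}
    try {
      keyFieldsStrict(plain);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Option A is closer to "prevent this combination from existing" without
needing UI changes, since a misconfigured spec would surface as a parse error.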

Regarding your comments:
1. Thanks for the pointer. My plan was to simply define dedicated InputFormats
for the 2 use cases I need and generalize later. First get it working :)
2 + 3. I could solve this by having a structured spec with "sub specs" for
the key and the value.
But as I mentioned before, InputFormats are not composable, meaning I would
need to replicate all the specs unless I do a huge refactoring. Therefore I
thought to start with dedicated formats, JsonJson and DelimitedJson, to cover
the use cases I need for now.
Then I guess a pattern will reveal itself for any further additions :)
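A rough sketch of what the "sub specs" idea could look like: one format
composed from two independent parsers, one for the key and one for the value,
with key fields prefixed by "__key." as Gian suggests below. RowParser,
delimited(...) and ComposedKeyValueFormat are hypothetical names for
illustration only, not Druid's InputFormat API. A JSON sub spec would need a
JSON library (e.g. Jackson) and is left out, so both parts here are delimited.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a key/value format built from two "sub spec" parsers.
// RowParser is NOT Druid's InputFormat; it only illustrates composition.
class ComposedKeyValueFormat {
  interface RowParser {
    Map<String, Object> parse(String raw);
  }

  // A delimited sub spec: splits on a literal delimiter and names the columns.
  static RowParser delimited(String delimiter, List<String> columns) {
    return raw -> {
      String[] parts = raw.split(Pattern.quote(delimiter), -1);
      Map<String, Object> row = new LinkedHashMap<>();
      for (int i = 0; i < columns.size() && i < parts.length; i++) {
        row.put(columns.get(i), parts[i]);
      }
      return row;
    };
  }

  private final RowParser keyParser;
  private final RowParser valueParser;
  private final String keyPrefix;

  ComposedKeyValueFormat(RowParser keyParser, RowParser valueParser, String keyPrefix) {
    this.keyParser = keyParser;
    this.valueParser = valueParser;
    this.keyPrefix = keyPrefix;
  }

  // Parses both parts into a single namespace; key fields get the prefix so
  // they cannot collide with value fields of the same name. A null key simply
  // contributes no fields.
  Map<String, Object> parse(String rawKey, String rawValue) {
    Map<String, Object> row = new LinkedHashMap<>(valueParser.parse(rawValue));
    if (rawKey != null) {
      for (Map.Entry<String, Object> e : keyParser.parse(rawKey).entrySet()) {
        row.put(keyPrefix + e.getKey(), e.getValue());
      }
    }
    return row;
  }

  public static void main(String[] args) {
    ComposedKeyValueFormat format = new ComposedKeyValueFormat(
        delimited("|", Arrays.asList("tenant", "userId")),
        delimited(",", Arrays.asList("event", "userId")),
        "__key.");
    System.out.println(format.parse("acme|u1", "click,u1"));
    // {event=click, userId=u1, __key.tenant=acme, __key.userId=u1}
  }
}
```

If something along these lines worked for the real InputFormats, JsonJson and
DelimitedJson would become two configurations of one format rather than two
separate implementations.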

On 2021/04/21 15:40:28, Gian Merlino <g...@apache.org> wrote: 
> Hey Noam,
> 
> I think this would certainly be useful, and thank you for your interest in
> contributing!
> 
> I think the toughest part will be designing a good API (meaning: what would
> users specify in the kafka supervisor json spec in order to activate and
> configure this feature?). So a good way to proceed would be to propose some
> API, gather some community feedback on the design of the API, and then
> start working on a patch.
> 
> Some thoughts on API design:
> 
> 1) https://github.com/apache/druid/pull/10730 adds some related
> functionality that you would want to hook into. This patch added Java APIs
> that can be used in extensions, but didn't add any JSON APIs that can be
> used by regular users. But you could build some JSON APIs on top of this.
> 
> 2) Some keys are "formatted" (like the examples you gave: json and
> delimited). Formatted keys should be parsed and fields extracted from them
> somehow, using their own InputFormat. Maybe we should call it the
> "keyInputFormat". We need to figure out what semantics make the most sense
> for presenting the parsed key to later stages of the system (which expect a
> single namespace). Merging the parsed key map with the parsed value map
> seems like a bad idea, since there might be field name collisions. So maybe
> we should prefix them with some string like "__key.". There could still be
> collisions, but they'd be less likely if we choose an uncommon prefix. At
> some point, we may also need to let users specify their own prefix, or even
> something fancier like an explicit mapping. But I think we won't need that
> feature on day 1.
> 
> 3) There are also unformatted keys that might be simple strings or byte
> arrays. These unformatted keys should become a single field. I’m not sure
> which is more prevalent, or which one we should build first, but I think
> ultimately we’ll want to support both styles.
> 
> On Fri, Apr 16, 2021 at 3:36 PM noam shaish <noamsha...@gmail.com> wrote:
> 
> > Hi,
> > I would like to try adding an InputFormat for Kafka that also supports
> > fields coming from the event key.
> > In my scenario there are two options:
> > 1. both key and value are json
> > 2. key is delimited string and the value is json.
> >
> > Would such a feature be welcome as a contribution, or should I keep it in
> > my own fork?
> >
> > Thanks,
> > Noam
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
> 
