Hi Gian,

I have seen the mentioned code, but there are a few points I am currently struggling with:

1. InputFormats are currently not very composable, so I cannot reuse functionality from the existing InputFormats and have to replicate it in my implementation (I really don't like doing that). I could extract the common functionality into shared utilities, but that would mean a big refactoring.

2. Since I receive the records as "InputEntity", which is generic, I am using "IntermediateRowParsingReader" to cast the records to KeyValueEntity. But what should the behaviour be for records that are not key-value? I could just return an "empty" key that provides no additional fields, but that doesn't feel right. A better approach would be to prevent this combination from existing at all. I guess this can be achieved in the UI, but that's outside my skill set (backend developer.....)
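
To make the question in point 2 concrete, here is a rough sketch of the two behaviours being weighed. It is written in Python only for brevity and is not Druid code: `Entity`, `KeyValueEntity`, `key_or_empty` and `key_or_fail` are hypothetical stand-ins for illustration, not the real classes or methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a generic input entity: value payload only."""
    value: bytes


@dataclass
class KeyValueEntity(Entity):
    """Hypothetical stand-in for a Kafka record that also carries a key."""
    key: Optional[bytes] = None


def key_or_empty(entity: Entity) -> bytes:
    """Option A: anything without a key is treated as having an empty key."""
    if isinstance(entity, KeyValueEntity) and entity.key is not None:
        return entity.key
    return b""


def key_or_fail(entity: Entity) -> bytes:
    """Option B: reject entities that are not key-value records."""
    if not isinstance(entity, KeyValueEntity):
        raise TypeError(
            f"expected a key-value entity, got {type(entity).__name__}"
        )
    # A key-value record with a null key still yields an empty key here.
    return entity.key if entity.key is not None else b""
```

Option A silently produces rows with no key fields, while option B fails fast at parse time. Rejecting the combination earlier (when the spec is validated, or in the UI) would make option B unreachable in practice.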

Regarding your comments:

1. Thanks for the pointer. My plan was to simply define dedicated InputFormats for the 2 use cases I require and generalize later. First get it working :)

2 + 3. I could solve this by having a structured spec that has "sub specs" for the key and the value. But as I mentioned before, InputFormats are not composable, meaning I would need to replicate all the specs unless I do a huge refactor. Therefore I thought I would start with dedicated formats, JsonJson and DelimitedJson, to cover the use cases I need for now. Then I guess some pattern will reveal itself for any additions :)

On 2021/04/21 15:40:28, Gian Merlino <g...@apache.org> wrote:
> Hey Noam,
>
> I think this would certainly be useful, and thank you for your interest in
> contributing!
>
> I think the toughest part will be designing a good API (meaning: what would
> users specify in the kafka supervisor json spec in order to activate and
> configure this feature?). So a good way to proceed would be to propose some
> API, gather some community feedback on the design of the API, and then
> start working on a patch.
>
> Some thoughts on API design:
>
> 1) https://github.com/apache/druid/pull/10730 adds some related
> functionality that you would want to hook into. This patch added Java APIs
> that can be used in extensions, but didn't add any JSON APIs that can be
> used by regular users. But you could build some JSON APIs on top of this.
>
> 2) Some keys are "formatted" (like the examples you gave: json and
> delimited). Formatted keys should be parsed and fields extracted from them
> somehow, using their own InputFormat. Maybe we should call it the
> "keyInputFormat". We need to figure out what semantics make the most sense
> for presenting the parsed key to later stages of the system (which expect a
> single namespace). Merging the parsed key map with the parsed value map
> seems like a bad idea, since there might be field name collisions.
> So maybe we should prefix them with some string like "__key.". There could
> still be collisions, but they'd be less likely if we choose an uncommon
> prefix. At some point, we may also need to let users specify their own
> prefix, or even something fancier like an explicit mapping. But I think we
> won't need that feature on day 1.
>
> 3) There are also unformatted keys that might be simple strings or byte
> arrays. These unformatted keys should become a single field. I’m not sure
> which is more prevalent, or which one we should build first, but I think
> ultimately we’ll want to support both styles.
>
> On Fri, Apr 16, 2021 at 3:36 PM noam shaish <noamsha...@gmail.com> wrote:
>
> > Hi,
> > I would like to try to add an InputFormat for Kafka that also supports
> > fields coming from the event key.
> > In my scenario there are two options:
> > 1. both key and value are json
> > 2. key is a delimited string and the value is json.
> >
> > Would such a feature be welcome as a contribution, or should I keep it
> > on my own fork?
> >
> > Thanks,
> > Noam
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
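
For reference, the semantics discussed in points 2 and 3 of the quoted message (prefixing parsed key fields with "__key.", and turning an unformatted key into a single field) can be sketched roughly as below. This is an illustration in Python, not Druid code. It assumes both key and value are JSON objects in the formatted case, and the function names and the single-field name "__key" are hypothetical choices made only for this example.

```python
import json
from typing import Any, Dict

KEY_PREFIX = "__key."    # prefix suggested in the quoted message
RAW_KEY_FIELD = "__key"  # hypothetical field name for an unformatted key


def merge_formatted_key(
    key_bytes: bytes, value_bytes: bytes, prefix: str = KEY_PREFIX
) -> Dict[str, Any]:
    """Parse a JSON key and a JSON value into one flat row.

    Key fields are added under `prefix` so they do not overwrite value
    fields with the same name.
    """
    key_fields = json.loads(key_bytes)
    row = dict(json.loads(value_bytes))
    for name, val in key_fields.items():
        # Assumption: if the value already has a field with the prefixed
        # name, the key field wins. A real design would have to decide this.
        row[prefix + name] = val
    return row


def merge_unformatted_key(
    key_bytes: bytes, value_bytes: bytes, field: str = RAW_KEY_FIELD
) -> Dict[str, Any]:
    """Expose an unformatted (plain string) key as a single field."""
    row = dict(json.loads(value_bytes))
    row[field] = key_bytes.decode("utf-8")
    return row
```

With a key of `{"id": 7}` and a value of `{"id": 1, "name": "a"}`, the formatted variant keeps the value's `id` and adds the key's as `__key.id`, which is the collision the prefix is meant to avoid.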