Re: Flatten JSON processor
Mike,

Thanks for the contribution! I was just talking to someone about flattening JSON :) Another great processor would be FlattenRecord, which would handle any format coming in (using a specified RecordReader). I'm fairly busy at the moment, but hope to review your PR when I get a chance. In the meantime, anyone else please feel free to take it!

Cheers,
Matt

On Thu, Nov 30, 2017 at 11:51 AM, Mike Thomsen wrote:
> I just submitted a PR for a JSON flattener processor. It may prove useful
> for people who need to ingest complex JSON documents into HBase or
> something like that.
>
> https://github.com/apache/nifi/pull/2307
>
> The library it uses is ASL according to the author's repo.
>
> Thanks,
>
> Mike
Flatten JSON processor
I just submitted a PR for a JSON flattener processor. It may prove useful for people who need to ingest complex JSON documents into HBase or something like that.

https://github.com/apache/nifi/pull/2307

The library it uses is ASL according to the author's repo.

Thanks,

Mike
Re: "Flatten" JSON
Created an issue for this functionality [1]. Please change issue properties and comment as necessary.

-Nick

[1] https://issues.apache.org/jira/browse/NIFI-4398

On Sat, Sep 16, 2017 at 4:55 PM, Matt Burgess <mattyb...@apache.org> wrote:
> +1 for FlattenRecord as well. In the meantime you can use
> ExecuteScript or InvokeScriptedProcessor, I have a Groovy script
> (albeit for a different product) that does the flatten [1].
>
> Regards,
> Matt
>
> [1] http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
Re: "Flatten" JSON
+1 for FlattenRecord as well. In the meantime you can use ExecuteScript or InvokeScriptedProcessor, I have a Groovy script (albeit for a different product) that does the flatten [1].

Regards,
Matt

[1] http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html

On Fri, Sep 15, 2017 at 9:33 AM, Kevin Doran <kdoran.apa...@gmail.com> wrote:
> +1 for adding a FlattenRecord processor. I can think of a few scenarios in
> which it would be quite useful, and it would be convenient if it could be
> accomplished without JOLT.
>
> Thanks,
> Kevin
Re: "Flatten" JSON
+1 for adding a FlattenRecord processor. I can think of a few scenarios in which it would be quite useful, and it would be convenient if it could be accomplished without JOLT.

Thanks,
Kevin

On 9/15/17, 09:16, "Nicholas Hughes" <nicholasmhug...@gmail.com on behalf of nicholasmhughes.n...@gmail.com> wrote:
> Mark,
>
> I'm definitely for making the processor as generic as possible. I don't
> mind chaining together a few simple processors to get a job done (such as
> convert JSON to Avro > infer schema > flatten records)... I just don't want
> steps get super complex... and the Jolt Transform processor does seem very
> powerful and very complex.
>
> If there's some support for a "FlattenRecord" processor, I can submit the
> Jira containing the meat of this thread.
>
> -Nick
Re: "Flatten" JSON
Mark,

I'm definitely for making the processor as generic as possible. I don't mind chaining together a few simple processors to get a job done (such as convert JSON to Avro > infer schema > flatten records)... I just don't want steps get super complex... and the Jolt Transform processor does seem very powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the Jira containing the meat of this thread.

-Nick

On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <marka...@hotmail.com> wrote:
> Nick,
>
> I do believe that there's a way to do what you're asking with Jolt,
> without knowing any kind of schema.
> That said, Jolt can get complex pretty quickly and I don't know it well
> :) Personally, I have no problem with having a
> FlattenRecord processor. I guess the question here, though, is are you
> using Record-oriented processors,
> or are you using JSON-specific processors?
>
> Personally, I'd like to see a FlattenRecord processor, rather than
> FlattenJSON, because that would allow
> the transformation to apply to Avro as well (and as soon as we get an XML
> reader built, XML also). However,
> the Record-oriented processors would expect that a schema be given (though
> it could also be inferred using
> another existing processor).
>
> -Mark
Re: "Flatten" JSON
Nick,

I do believe that there's a way to do what you're asking with Jolt, without knowing any kind of schema. That said, Jolt can get complex pretty quickly and I don't know it well :) Personally, I have no problem with having a FlattenRecord processor. I guess the question here, though, is are you using Record-oriented processors, or are you using JSON-specific processors?

Personally, I'd like to see a FlattenRecord processor, rather than FlattenJSON, because that would allow the transformation to apply to Avro as well (and as soon as we get an XML reader built, XML also). However, the Record-oriented processors would expect that a schema be given (though it could also be inferred using another existing processor).

-Mark

> On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <nicholasmhughes.n...@gmail.com> wrote:
>
> Is there an easy way to "flatten" arbitrary JSON within NiFi?
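To make the schema point concrete, here is a minimal Python sketch of inferring a schema from an already-flattened record. It is not NiFi's schema inference and does not use any NiFi API; the function name `infer_flat_schema` and the type labels (loosely modeled on Avro primitive type names) are assumptions made purely for illustration.

```python
# Hypothetical type labels, loosely modeled on Avro primitive type names.
TYPE_NAMES = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_flat_schema(flat_record):
    """Map each field name of an already-flat record to a simple type label."""
    schema = {}
    for name, value in flat_record.items():
        if value is None:
            schema[name] = "null"
        else:
            # Types not listed in TYPE_NAMES fall back to "string" in this sketch.
            schema[name] = TYPE_NAMES.get(type(value), "string")
    return schema


# A few fields taken from the desired output in the original question.
flat_record = {
    "query.count": 1,
    "query.created": "2017-09-15T11:20:26Z",
    "query.lang": "en-US",
    "query.results.channel.item.condition.temp": "63",
}
schema = infer_flat_schema(flat_record)
print(schema)
```

Because the record is already flat, every field maps to a primitive type, which is what would let the same step apply to data read from JSON, Avro, or any other format a reader can parse.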
"Flatten" JSON
Is there an easy way to "flatten" arbitrary JSON within NiFi?

For input data like that shown below from Yahoo [1]

{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}

...I'd like to end up with output something like this:

{
  "query.count": 1,
  "query.created": "2017-09-15T11:20:26Z",
  "query.lang": "en-US",
  "query.results.channel.item.condition.code": "33",
  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
  "query.results.channel.item.condition.temp": "63",
  "query.results.channel.item.condition.text": "Mostly Clear"
}

I checked out the JoltTransformJSON processor and some examples, such as the nested data to "prefix soup" demo [2], but it seems as though I need to enter information about the schema for the incoming data in order to transform it. Ideally, I'd like to have a processor "just figure it out" without explicit entry of a schema.

Is there any way to accomplish this in a generic way with JoltTransformJSON (or another native processor)?

If not, would a ticket requesting a "Field Flattener" processor much like the one included in StreamSets Data Collector [3] be worthwhile?

Thanks in advance!

-Nick

[1] https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

[2] http://jolt-demo.appspot.com/#bucketToPrefixSoup

[3] https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
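For reference, the transformation asked for above can be sketched in a few lines of standalone Python. This is only an illustration of the dotted-key flattening, not a NiFi processor and not the code from any PR or script mentioned in this thread; the function name `flatten`, its `sep` parameter, and the choice to leave arrays and empty objects untouched are all assumptions.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining key paths with `sep`.

    Assumption for this sketch: lists and empty dicts are kept as values
    rather than expanded, since the thread does not say how they should
    be handled.
    """
    flat = {}
    for key, child in record.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(child, dict) and child:
            # Recurse into non-empty nested objects, carrying the path prefix.
            flat.update(flatten(child, path, sep))
        else:
            flat[path] = child
    return flat


# The sample document from the question above.
doc = json.loads("""
{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}
""")

flat = flatten(doc)
print(json.dumps(flat, indent=2))
```

No schema is needed here because the function walks whatever structure it is given, which is the "just figure it out" behavior described in the question.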