Mark,

I'm definitely for making the processor as generic as possible. I don't mind chaining together a few simple processors to get a job done (such as convert JSON to Avro > infer schema > flatten records)... I just don't want the steps to get super complex... and the Jolt Transform processor does seem very powerful and very complex.
If there's some support for a "FlattenRecord" processor, I can submit the Jira containing the meat of this thread.

-Nick

On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <marka...@hotmail.com> wrote:

> Nick,
>
> I do believe that there's a way to do what you're asking with Jolt,
> without knowing any kind of schema.
> That said, Jolt can get complex pretty quickly and I don't know it well :)
> Personally, I have no problem with having a FlattenRecord processor.
> I guess the question here, though, is are you using Record-oriented
> processors, or are you using JSON-specific processors?
>
> Personally, I'd like to see a FlattenRecord processor, rather than
> FlattenJSON, because that would allow the transformation to apply to
> Avro as well (and as soon as we get an XML reader built, XML also).
> However, the Record-oriented processors would expect that a schema be
> given (though it could also be inferred using another existing processor).
>
> -Mark
>
>
> On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <nicholasmhughes.n...@gmail.com> wrote:
>
> > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> >
> > For input data like that shown below from Yahoo [1]
> >
> > {
> >   "query": {
> >     "count": 1,
> >     "created": "2017-09-15T11:20:26Z",
> >     "lang": "en-US",
> >     "results": {
> >       "channel": {
> >         "item": {
> >           "condition": {
> >             "code": "33",
> >             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >             "temp": "63",
> >             "text": "Mostly Clear"
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > ...I'd like to end up with output something like this:
> >
> > {
> >   "query.count": 1,
> >   "query.created": "2017-09-15T11:20:26Z",
> >   "query.lang": "en-US",
> >   "query.results.channel.item.condition.code": "33",
> >   "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >   "query.results.channel.item.condition.temp": "63",
> >   "query.results.channel.item.condition.text": "Mostly Clear"
> > }
> >
> > I checked out the JoltTransformJSON processor and some examples, such as
> > the nested data to "prefix soup" demo [2], but it seems as though I need to
> > enter information about the schema for the incoming data in order to
> > transform it. Ideally, I'd like to have a processor "just figure it out"
> > without explicit entry of a schema.
> >
> > Is there any way to accomplish this in a generic way with JoltTransformJSON
> > (or another native processor)?
> >
> > If not, would a ticket requesting a "Field Flattener" processor much like
> > the one included in StreamSets Data Collector [3] be worthwhile?
> >
> > Thanks in advance!
> >
> > -Nick
> >
> >
> > [1] https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> >
> > [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> >
> > [3] https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
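
For reference, the schema-free flattening asked about in the original message can be sketched outside NiFi in a few lines of Python. This is only an illustration of the logic, not how JoltTransformJSON or any proposed FlattenRecord processor works. The function name `flatten` and its `sep` parameter are hypothetical, and indexing list elements by position is an assumption, since the thread does not say how arrays should be handled.

```python
import json


def flatten(value, prefix="", sep="."):
    """Return a flat dict that maps joined key paths to leaf values.

    No schema is needed: the structure is discovered by walking the data.
    Assumption: list elements are keyed by their position (e.g. "a.0").
    Empty objects and empty lists contribute no keys to the result.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else key
            flat.update(flatten(child, path, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, path, sep))
    else:
        # Leaf value (string, number, boolean, or null): record it as-is.
        flat[prefix] = value
    return flat


if __name__ == "__main__":
    # Trimmed version of the Yahoo response quoted in this thread.
    doc = json.loads("""
    {"query": {"count": 1, "lang": "en-US",
               "results": {"channel": {"item": {"condition":
                   {"code": "33", "temp": "63"}}}}}}
    """)
    print(json.dumps(flatten(doc), indent=2))
```

A record-oriented version would walk typed record fields instead of parsed JSON, but the recursion would presumably have the same shape.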