Re: "Flatten" JSON

Nicholas Hughes Fri, 15 Sep 2017 06:17:04 -0700

Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
the steps to get super complex, and the Jolt Transform processor does seem
very powerful and very complex.
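As a rough illustration of the last step in that chain, here is a Python sketch of how a flatten step working from an inferred Avro schema might derive its output fields by expanding nested records. It rests on several assumptions. `flatten_avro_schema` and `flatten_avro_fields` are hypothetical helpers, not NiFi APIs. The example schema is a hand-written, trimmed-down version of the Yahoo response quoted further down. Only inline nested records are expanded; unions, arrays, maps and named-type references pass through unchanged. It joins names with "_" rather than "." because Avro names may only contain letters, digits and underscores.

```python
import json


def flatten_avro_fields(record_schema, prefix="", sep="_"):
    """Expand nested inline "record" fields into a flat list of fields.

    Hypothetical helper: non-record types (primitives, unions, arrays, maps,
    named-type references) are copied through unchanged.
    """
    flat = []
    for field in record_schema["fields"]:
        name = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            # Descend into the nested record, extending the name prefix.
            flat.extend(flatten_avro_fields(ftype, name + sep, sep))
        else:
            flat.append({"name": name, "type": ftype})
    return flat


def flatten_avro_schema(record_schema, sep="_"):
    """Return a new top-level record schema with all nested records inlined."""
    return {
        "type": "record",
        "name": record_schema["name"],
        "fields": flatten_avro_fields(record_schema, "", sep),
    }


# Hand-written, trimmed-down schema for the Yahoo weather example (illustrative only).
nested = {
    "type": "record",
    "name": "weather",
    "fields": [
        {"name": "query", "type": {
            "type": "record",
            "name": "query",
            "fields": [
                {"name": "count", "type": "int"},
                {"name": "lang", "type": "string"},
                {"name": "condition", "type": {
                    "type": "record",
                    "name": "condition",
                    "fields": [{"name": "temp", "type": "string"}],
                }},
            ],
        }},
    ],
}

print(json.dumps(flatten_avro_schema(nested), indent=2))
```

This only covers the schema side; the record data itself would then need to be flattened with the same path rules.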


If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick


On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <marka...@hotmail.com> wrote:

> Nick,
>
> I do believe that there's a way to do what you're asking with Jolt
> without knowing any kind of schema. That said, Jolt can get complex pretty
> quickly and I don't know it well :)  Personally, I have no problem with
> having a FlattenRecord processor. The question here, though, is whether
> you are using Record-oriented processors or JSON-specific processors.
>
> I'd prefer a FlattenRecord processor over a FlattenJSON one, because that
> would allow the transformation to apply to Avro as well (and, as soon as we
> get an XML reader built, to XML too). However, the Record-oriented
> processors expect a schema to be given (though it could also be inferred
> using another existing processor).
>
> -Mark
>
>
>
> > On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <nicholasmhughes.n...@gmail.com> wrote:
> >
> > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> >
> > For input data like that shown below from Yahoo [1]:
> >
> > {
> >  "query": {
> >    "count": 1,
> >    "created": "2017-09-15T11:20:26Z",
> >    "lang": "en-US",
> >    "results": {
> >      "channel": {
> >        "item": {
> >          "condition": {
> >            "code": "33",
> >            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >            "temp": "63",
> >            "text": "Mostly Clear"
> >          }
> >        }
> >      }
> >    }
> >  }
> > }
> >
> >
> > ...I'd like to end up with output something like this:
> >
> > {
> >  "query.count": 1,
> >  "query.created": "2017-09-15T11:20:26Z",
> >  "query.lang": "en-US",
> >  "query.results.channel.item.condition.code": "33",
> >  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >  "query.results.channel.item.condition.temp": "63",
> >  "query.results.channel.item.condition.text": "Mostly Clear"
> > }
> >
> >
> > I checked out the JoltTransformJSON processor and some examples, such as
> > the nested data to "prefix soup" demo [2], but it seems as though I need to
> > enter information about the schema of the incoming data in order to
> > transform it. Ideally, I'd like a processor that can "just figure it out"
> > without explicit entry of a schema.
> >
> > Is there any way to accomplish this in a generic way with JoltTransformJSON
> > (or another native processor)?
> >
> > If not, would a ticket requesting a "Field Flattener" processor much like
> > the one included in StreamSets Data Collector [3] be worthwhile?
> >
> > Thanks in advance!
> >
> > -Nick
> >
> >
> > [1] https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> >
> > [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> >
> > [3] https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
>
>
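The schema-free flattening asked about in the original question joins nested object keys with "." until it reaches a non-object value. That idea can be sketched outside NiFi in a few lines of Python. This is only an illustration under stated assumptions, not how Jolt, NiFi or StreamSets implement it. `flatten_json` is a hypothetical helper. Arrays are kept as leaf values rather than expanded, and empty objects contribute no keys.

```python
import json


def flatten_json(value, prefix="", sep="."):
    """Flatten nested JSON objects into one level of sep-joined keys.

    Hypothetical helper: only objects (dicts) are descended into; arrays and
    scalars become leaf values, and empty objects contribute no keys.
    """
    if not isinstance(value, dict):
        # Leaf value: store it under the path built so far.
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten_json(child, path, sep))
    return flat


doc = json.loads("""
{"query": {"count": 1, "created": "2017-09-15T11:20:26Z", "lang": "en-US",
  "results": {"channel": {"item": {"condition": {
    "code": "33", "date": "Fri, 15 Sep 2017 06:00 AM EDT",
    "temp": "63", "text": "Mostly Clear"}}}}}}
""")

print(json.dumps(flatten_json(doc), indent=1))
```

Because keys are joined blindly, an input that already contains a literal "query.count" key alongside a nested "query" object would collide, and the later value would win. A real processor would need an explicit policy for that case.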
