Re: Flatten JSON processor

2017-11-30 Thread Matt Burgess
Mike,

Thanks for the contribution!  I was just talking to someone about
flattening JSON :) Another great processor would be FlattenRecord,
which would handle any format coming in (using a specified
RecordReader). I'm fairly busy at the moment, but hope to review your
PR when I get a chance. In the meantime, anyone else please feel free
to take it!

Cheers,
Matt

On Thu, Nov 30, 2017 at 11:51 AM, Mike Thomsen  wrote:
> I just submitted a PR for a JSON flattener processor. It may prove useful
> for people who need to ingest complex JSON documents into HBase or
> something like that.
>
> https://github.com/apache/nifi/pull/2307
>
> The library it uses is ASL according to the author's repo.
>
> Thanks,
>
> Mike


Flatten JSON processor

2017-11-30 Thread Mike Thomsen
I just submitted a PR for a JSON flattener processor. It may prove useful
for people who need to ingest complex JSON documents into HBase or
something like that.

https://github.com/apache/nifi/pull/2307

The library it uses is ASL according to the author's repo.

Thanks,

Mike


Re: "Flatten" JSON

2017-09-19 Thread Nicholas Hughes
Created an issue for this functionality [1]. Please change issue properties
and comment as necessary.

-Nick

[1] https://issues.apache.org/jira/browse/NIFI-4398


On Sat, Sep 16, 2017 at 4:55 PM, Matt Burgess <mattyb...@apache.org> wrote:

> +1 for FlattenRecord as well. In the meantime you can use
> ExecuteScript or InvokeScriptedProcessor, I have a Groovy script
> (albeit for a different product) that does the flatten [1].
>
> Regards,
> Matt
>
> [1] http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
>
> On Fri, Sep 15, 2017 at 9:33 AM, Kevin Doran <kdoran.apa...@gmail.com>
> wrote:
> > +1 for adding a FlattenRecord processor. I can think of a few scenarios
> in which it would be quite useful, and it would be convenient if it could
> be accomplished without JOLT.
> >
> > Thanks,
> > Kevin
> >
> > On 9/15/17, 09:16, "Nicholas Hughes" <nicholasmhug...@gmail.com on
> behalf of nicholasmhughes.n...@gmail.com> wrote:
> >
> > Mark,
> >
> > I'm definitely for making the processor as generic as possible. I
> don't
> > mind chaining together a few simple processors to get a job done
> (such as
> > convert JSON to Avro > infer schema > flatten records)... I just
> don't want
> > steps get super complex... and the Jolt Transform processor does
> seem very
> > powerful and very complex.
> >
> > If there's some support for a "FlattenRecord" processor, I can
> submit the
> > Jira containing the meat of this thread.
> >
> > -Nick
> >
> >
> > On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <marka...@hotmail.com>
> wrote:
> >
> > > Nick,
> > >
> > > I do believe that there's a way to do what you're asking with Jolt,
> > > without knowing any kind of schema.
> > > That said, Jolt can get complex pretty quickly and I don't know it
> well
> > > :)  Personally, I have no problem with having a
> > > FlattenRecord processor. I guess the question here, though, is are
> you
> > > using Record-oriented processors,
> > > or are you using JSON-specific processors?
> > >
> > > Personally, I'd like to see a FlattenRecord processor, rather than
> > > FlattenJSON, because that would allow
> > > the transformation to apply to Avro as well (and as soon as we get
> an XML
> > > reader built, XML also). However,
> > > the Record-oriented processors would expect that a schema be given
> (though
> > > it could also be inferred using
> > > another existing processor).
> > >
> > > -Mark
> > >
> > >
> > >
> > > > On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <
> > > nicholasmhughes.n...@gmail.com> wrote:
> > > >
> > > > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> > > >
> > > > For input data like that shown below from Yahoo [1]
> > > >
> > > > {
> > > >   "query": {
> > > >     "count": 1,
> > > >     "created": "2017-09-15T11:20:26Z",
> > > >     "lang": "en-US",
> > > >     "results": {
> > > >       "channel": {
> > > >         "item": {
> > > >           "condition": {
> > > >             "code": "33",
> > > >             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> > > >             "temp": "63",
> > > >             "text": "Mostly Clear"
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > >
> > > > ...I'd like to end up with output something like this:
> > > >
> > > > {
> > > >  "query.count": 1,
> > > >  "query.created": "2017-09-15T11:20:26Z",
> > > >  "query.lang": "en-US",
> > > >  "query.results.channel.item.condition.code": "33",
> > > >  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017
> 06:00
> > > AM EDT&q

Re: "Flatten" JSON

2017-09-16 Thread Matt Burgess
+1 for FlattenRecord as well. In the meantime you can use
ExecuteScript or InvokeScriptedProcessor; I have a Groovy script
(albeit for a different product) that does the flattening [1].

Regards,
Matt

[1] http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html

On Fri, Sep 15, 2017 at 9:33 AM, Kevin Doran <kdoran.apa...@gmail.com> wrote:
> +1 for adding a FlattenRecord processor. I can think of a few scenarios in 
> which it would be quite useful, and it would be convenient if it could be 
> accomplished without JOLT.
>
> Thanks,
> Kevin
>
> On 9/15/17, 09:16, "Nicholas Hughes" <nicholasmhug...@gmail.com on behalf of 
> nicholasmhughes.n...@gmail.com> wrote:
>
> Mark,
>
> I'm definitely for making the processor as generic as possible. I don't
> mind chaining together a few simple processors to get a job done (such as
> convert JSON to Avro > infer schema > flatten records)... I just don't want
> steps to get super complex... and the Jolt Transform processor does seem very
> powerful and very complex.
>
> If there's some support for a "FlattenRecord" processor, I can submit the
> Jira containing the meat of this thread.
>
> -Nick
>
>
> On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <marka...@hotmail.com> wrote:
>
> > Nick,
> >
> > I do believe that there's a way to do what you're asking with Jolt,
> > without knowing any kind of schema.
> > That said, Jolt can get complex pretty quickly and I don't know it well
> > :)  Personally, I have no problem with having a
> > FlattenRecord processor. I guess the question here, though, is are you
> > using Record-oriented processors,
> > or are you using JSON-specific processors?
> >
> > Personally, I'd like to see a FlattenRecord processor, rather than
> > FlattenJSON, because that would allow
> > the transformation to apply to Avro as well (and as soon as we get an 
> XML
> > reader built, XML also). However,
> > the Record-oriented processors would expect that a schema be given 
> (though
> > it could also be inferred using
> > another existing processor).
> >
> > -Mark
> >
> >
> >
> > > On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <
> > nicholasmhughes.n...@gmail.com> wrote:
> > >
> > > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> > >
> > > For input data like that shown below from Yahoo [1]
> > >
> > > {
> > >   "query": {
> > >     "count": 1,
> > >     "created": "2017-09-15T11:20:26Z",
> > >     "lang": "en-US",
> > >     "results": {
> > >       "channel": {
> > >         "item": {
> > >           "condition": {
> > >             "code": "33",
> > >             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> > >             "temp": "63",
> > >             "text": "Mostly Clear"
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > >
> > > ...I'd like to end up with output something like this:
> > >
> > > {
> > >  "query.count": 1,
> > >  "query.created": "2017-09-15T11:20:26Z",
> > >  "query.lang": "en-US",
> > >  "query.results.channel.item.condition.code": "33",
> > >  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
> > >  "query.results.channel.item.condition.temp": "63",
> > >  "query.results.channel.item.condition.text": "Mostly Clear"
> > > }
> > >
> > >
> > > I checked out the JoltTransformJSON processor and some examples, such 
> as
> > > the nested data to "prefix soup" demo [2], but it seems as though I 
> need
> > to
> > > enter information about the schema for the incoming data in order to
> > > transform it. Ideally, I'd like to have a processor "just figure it 
> out"
> > > without explicit entry of a schema.
> > >
> > > Is there any way to accomplish this in a generic way with
> > JoltTransformJSON
> > > (or another native processor)?
> > >
> > > If not, would a ticket requesting a "Field Flattener" processor much 
> like
> > > the one included in StreamSets Data Collector [3] be worthwhile?
> > >
> > > Thanks in advance!
> > >
> > > -Nick
> > >
> > >
> > > [1]
> > > > https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> > >
> > > [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> > >
> > > [3]
> > > > https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
> >
> >
>
>
>


Re: "Flatten" JSON

2017-09-15 Thread Kevin Doran
+1 for adding a FlattenRecord processor. I can think of a few scenarios in 
which it would be quite useful, and it would be convenient if it could be 
accomplished without JOLT.

Thanks,
Kevin

On 9/15/17, 09:16, "Nicholas Hughes"  wrote:

Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
steps to get super complex... and the Jolt Transform processor does seem very
powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick


On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne  wrote:

> Nick,
>
> I do believe that there's a way to do what you're asking with Jolt,
> without knowing any kind of schema.
> That said, Jolt can get complex pretty quickly and I don't know it well
> :)  Personally, I have no problem with having a
> FlattenRecord processor. I guess the question here, though, is are you
> using Record-oriented processors,
> or are you using JSON-specific processors?
>
> Personally, I'd like to see a FlattenRecord processor, rather than
> FlattenJSON, because that would allow
> the transformation to apply to Avro as well (and as soon as we get an XML
> reader built, XML also). However,
> the Record-oriented processors would expect that a schema be given (though
> it could also be inferred using
> another existing processor).
>
> -Mark
>
>
>
> > On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <
> nicholasmhughes.n...@gmail.com> wrote:
> >
> > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> >
> > For input data like that shown below from Yahoo [1]
> >
> > {
> >   "query": {
> >     "count": 1,
> >     "created": "2017-09-15T11:20:26Z",
> >     "lang": "en-US",
> >     "results": {
> >       "channel": {
> >         "item": {
> >           "condition": {
> >             "code": "33",
> >             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >             "temp": "63",
> >             "text": "Mostly Clear"
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> >
> > ...I'd like to end up with output something like this:
> >
> > {
> >  "query.count": 1,
> >  "query.created": "2017-09-15T11:20:26Z",
> >  "query.lang": "en-US",
> >  "query.results.channel.item.condition.code": "33",
> >  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >  "query.results.channel.item.condition.temp": "63",
> >  "query.results.channel.item.condition.text": "Mostly Clear"
> > }
> >
> >
> > I checked out the JoltTransformJSON processor and some examples, such as
> > the nested data to "prefix soup" demo [2], but it seems as though I need
> to
> > enter information about the schema for the incoming data in order to
> > transform it. Ideally, I'd like to have a processor "just figure it out"
> > without explicit entry of a schema.
> >
> > Is there any way to accomplish this in a generic way with
> JoltTransformJSON
> > (or another native processor)?
> >
> > If not, would a ticket requesting a "Field Flattener" processor much like
> > the one included in StreamSets Data Collector [3] be worthwhile?
> >
> > Thanks in advance!
> >
> > -Nick
> >
> >
> > [1]
> > https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> >
> > [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> >
> > [3]
> > https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
>
>





Re: "Flatten" JSON

2017-09-15 Thread Nicholas Hughes
Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
steps to get super complex... and the Jolt Transform processor does seem very
powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick


On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne  wrote:

> Nick,
>
> I do believe that there's a way to do what you're asking with Jolt,
> without knowing any kind of schema.
> That said, Jolt can get complex pretty quickly and I don't know it well
> :)  Personally, I have no problem with having a
> FlattenRecord processor. I guess the question here, though, is are you
> using Record-oriented processors,
> or are you using JSON-specific processors?
>
> Personally, I'd like to see a FlattenRecord processor, rather than
> FlattenJSON, because that would allow
> the transformation to apply to Avro as well (and as soon as we get an XML
> reader built, XML also). However,
> the Record-oriented processors would expect that a schema be given (though
> it could also be inferred using
> another existing processor).
>
> -Mark
>
>
>
> > On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <
> nicholasmhughes.n...@gmail.com> wrote:
> >
> > Is there an easy way to "flatten" arbitrary JSON within NiFi?
> >
> > For input data like that shown below from Yahoo [1]
> >
> > {
> >   "query": {
> >     "count": 1,
> >     "created": "2017-09-15T11:20:26Z",
> >     "lang": "en-US",
> >     "results": {
> >       "channel": {
> >         "item": {
> >           "condition": {
> >             "code": "33",
> >             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >             "temp": "63",
> >             "text": "Mostly Clear"
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> >
> > ...I'd like to end up with output something like this:
> >
> > {
> >  "query.count": 1,
> >  "query.created": "2017-09-15T11:20:26Z",
> >  "query.lang": "en-US",
> >  "query.results.channel.item.condition.code": "33",
> >  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
> >  "query.results.channel.item.condition.temp": "63",
> >  "query.results.channel.item.condition.text": "Mostly Clear"
> > }
> >
> >
> > I checked out the JoltTransformJSON processor and some examples, such as
> > the nested data to "prefix soup" demo [2], but it seems as though I need
> to
> > enter information about the schema for the incoming data in order to
> > transform it. Ideally, I'd like to have a processor "just figure it out"
> > without explicit entry of a schema.
> >
> > Is there any way to accomplish this in a generic way with
> JoltTransformJSON
> > (or another native processor)?
> >
> > If not, would a ticket requesting a "Field Flattener" processor much like
> > the one included in StreamSets Data Collector [3] be worthwhile?
> >
> > Thanks in advance!
> >
> > -Nick
> >
> >
> > [1]
> > https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> >
> > [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> >
> > [3]
> > https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener
>
>


Re: "Flatten" JSON

2017-09-15 Thread Mark Payne
Nick,

I do believe that there's a way to do what you're asking with Jolt, without 
knowing any kind of schema.
That said, Jolt can get complex pretty quickly and I don't know it well :)  
Personally, I have no problem with having a
FlattenRecord processor. I guess the question here, though, is are you using 
Record-oriented processors,
or are you using JSON-specific processors?

Personally, I'd like to see a FlattenRecord processor, rather than FlattenJSON, 
because that would allow
the transformation to apply to Avro as well (and as soon as we get an XML 
reader built, XML also). However,
the Record-oriented processors would expect that a schema be given (though it 
could also be inferred using
another existing processor).

-Mark



> On Sep 15, 2017, at 7:43 AM, Nicholas Hughes  
> wrote:
> 
> Is there an easy way to "flatten" arbitrary JSON within NiFi?
> 
> For input data like that shown below from Yahoo [1]
> 
> {
>   "query": {
>     "count": 1,
>     "created": "2017-09-15T11:20:26Z",
>     "lang": "en-US",
>     "results": {
>       "channel": {
>         "item": {
>           "condition": {
>             "code": "33",
>             "date": "Fri, 15 Sep 2017 06:00 AM EDT",
>             "temp": "63",
>             "text": "Mostly Clear"
>           }
>         }
>       }
>     }
>   }
> }
> 
> 
> ...I'd like to end up with output something like this:
> 
> {
>  "query.count": 1,
>  "query.created": "2017-09-15T11:20:26Z",
>  "query.lang": "en-US",
>  "query.results.channel.item.condition.code": "33",
>  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
>  "query.results.channel.item.condition.temp": "63",
>  "query.results.channel.item.condition.text": "Mostly Clear"
> }
> 
> 
> I checked out the JoltTransformJSON processor and some examples, such as
> the nested data to "prefix soup" demo [2], but it seems as though I need to
> enter information about the schema for the incoming data in order to
> transform it. Ideally, I'd like to have a processor "just figure it out"
> without explicit entry of a schema.
> 
> Is there any way to accomplish this in a generic way with JoltTransformJSON
> (or another native processor)?
> 
> If not, would a ticket requesting a "Field Flattener" processor much like
> the one included in StreamSets Data Collector [3] be worthwhile?
> 
> Thanks in advance!
> 
> -Nick
> 
> 
> [1]
> https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
> 
> [2] http://jolt-demo.appspot.com/#bucketToPrefixSoup
> 
> [3]
> https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener



"Flatten" JSON

2017-09-15 Thread Nicholas Hughes
Is there an easy way to "flatten" arbitrary JSON within NiFi?

For input data like that shown below from Yahoo [1]

{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}


...I'd like to end up with output something like this:

{
  "query.count": 1,
  "query.created": "2017-09-15T11:20:26Z",
  "query.lang": "en-US",
  "query.results.channel.item.condition.code": "33",
  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
  "query.results.channel.item.condition.temp": "63",
  "query.results.channel.item.condition.text": "Mostly Clear"
}
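
For illustration only, the transformation described above could be sketched
in plain Python as follows. This is a minimal sketch, not a NiFi processor
and not the approach taken by any existing processor: the function name
`flatten`, the `sep` parameter, and the choice to key array elements by
their index are all assumptions that do not come from this thread.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts/lists into one level, joining keys with `sep`.

    Hypothetical helper for illustration; no schema is needed because the
    structure is discovered while walking the parsed document.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Assumption: array elements are keyed by position, e.g. "a.0".
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        # Leaf value: store it under the path accumulated so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten(child, path, sep))
    # Note: empty objects and arrays contribute no keys in this sketch.
    return flat


doc = json.loads("""
{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}
""")

print(json.dumps(flatten(doc), indent=2))
```

Because the walk is driven by the parsed document itself, nothing about the
incoming structure has to be entered up front, which is the "just figure it
out" behavior asked about below.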


I checked out the JoltTransformJSON processor and some examples, such as
the nested data to "prefix soup" demo [2], but it seems as though I need to
enter information about the schema for the incoming data in order to
transform it. Ideally, I'd like to have a processor "just figure it out"
without explicit entry of a schema.

Is there any way to accomplish this in a generic way with JoltTransformJSON
(or another native processor)?

If not, would a ticket requesting a "Field Flattener" processor much like
the one included in StreamSets Data Collector [3] be worthwhile?

Thanks in advance!

-Nick


[1]
https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

[2] http://jolt-demo.appspot.com/#bucketToPrefixSoup

[3]
https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener