Re: JOLTTransformRecord problem

Mark Woodcock Tue, 10 Oct 2023 14:32:25 -0700

Matt,

1) Yea, I definitely muddled what you meant by "just JSON" before. I'm
definitely looking to get a bunch of records out, because I want to convert
each of them to Avro later.


2) I'm sure I've also messed up the threading in the issues.  Apparently, I
should have subscribed before today (which I've done now); hopefully that
will get better.

3) I'll see if I can make any of your suggestions work.

thx,

mew


On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <mattyb...@apache.org> wrote:

> For some reason I don't have the original thread, I must've
> inadvertently deleted it. IIRC your example input was a single JSON
> object and I said if that were the case you could use
> JoltTransformJSON instead. However if that is NOT the case (which is
> your point c above) then you have a couple of options:
>
> 1) To continue using JoltTransformJSON with a top-level array you need
> to surround your spec with
> "*": { <your_current_spec> }
> and will need to use "[&1]." in front of all the output fields. This
> will output the transformation to the same index in the array as it
> was in the input.
>
> 2) One major difference between JoltTransformJSON and
> JoltTransformRecord is that the former reads the entire thing into
> memory, whereas JoltTransformRecord reads one record at a time. So your
> current spec should work with JoltTransformRecord, but if you are
> still getting the original error, can you provide (or re-provide if
> you already did, I can't find the original thread) sample input that
> represents the "real" input (not just one JSON object if you'll be
> getting multiple records or if the top-level is an array even with
> only one object in it), desired output, and the error with full stack
> trace? I'm guessing there is an inference error with complex fields.
> If you know what the input and output schemas are, you can provide them
> to the Reader and Writer respectively instead of using "Infer schema".
> That should work around any inference issues.
> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor,
> you can try that before your JoltTransformRecord processor with the
> same reader and see what it comes out with as a schema. Then you can
> manually alter it to better match your data and use that in the Reader
> specified in JoltTransformRecord.
>
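
As a rough illustration of option 1 above, the wrapping can be done
mechanically for a flat shift spec. The Python sketch below only generates
the spec text; it is not part of NiFi or Jolt. The field names (id, name,
recordId, fullName) and the helper wrap_for_top_level_array are hypothetical
placeholders, and it assumes every output path is a plain string one level
deep. Nested specs would likely need a larger number than 1 in the "&"
reference.

```python
import json

# Hypothetical per-record shift spec; field names are placeholders only.
record_spec = {"id": "recordId", "name": "fullName"}


def wrap_for_top_level_array(spec):
    """Wrap a flat shift spec in "*" and prefix each output path with "[&1]."."""
    return {"*": {field: "[&1]." + target for field, target in spec.items()}}


wrapped = wrap_for_top_level_array(record_spec)

# Full Jolt chain (a list of operations) that could be pasted into the processor.
chain = [{"operation": "shift", "spec": wrapped}]
print(json.dumps(chain, indent=2))
```

The printed JSON is what would go into the Jolt specification property; the
"*" matches each array index and "[&1]" writes the result back to that index.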
> Regards,
> Matt
>
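
For the explicit-schema suggestion in option 2 above, a minimal sketch of
what a hand-written Avro schema might look like when only some records carry
a complex inner field. Everything here is an assumption for illustration:
the record name ExampleRecord and the fields id, details, and tags do not
come from the thread. The inner record is declared as a nullable union so
that records without it still conform.

```python
import json

# Hypothetical Avro schema: "details" stands in for the complicated inner field.
reader_schema = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [
        {"name": "id", "type": "string"},
        {
            "name": "details",
            # Union with "null" first, so the field may be absent from a record.
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "Details",
                    "fields": [
                        {"name": "tags", "type": {"type": "array", "items": "string"}},
                    ],
                },
            ],
            "default": None,
        },
    ],
}

# Schema text that could be supplied to the Reader instead of inferring it.
schema_text = json.dumps(reader_schema, indent=2)
print(schema_text)
```

The output schema for the Writer would be written the same way, using the
field names produced by the Jolt spec.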
> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <woodc...@usna.edu.invalid>
> wrote:
> >
> > Chris,
> >
> > 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
> >
> > 2) I've tested the JoltTransformRecord with
> > a) JsonTreeReader w/ InferredSchema
> > b) JsonRecordSetWriter w/ InheritsSchema
> > c) a GetFile processor which grabs a text file with the various bits of
> > test data
> >
> > It appears that your suspicions are correct:
> > i) if I test with just that single record as the entire content of the
> > file, the processor is successful.
> > ii) if I test with multiple records, none of which have the complicated
> > inner field, all is successful.
> > iii) if I test with multiple records, where at least one has the
> > complicated inner field, I get the earlier noted error.
> >
> > IOW, yep, it only happens with *more* data.
> >
> > bummer,
> >
> > mew
> >
> > On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
> > <chris.samp...@naimuri.com.invalid> wrote:
> >
> > > Using your example (single JSON Object and Jolt Spec) seems to work fine
> > > in both JoltTransformJSON and JoltTransformRecord when run on the current
> > > main branch (which is for the upcoming 2.0.0 release).
> > >
> > > To test, I set up a GenerateFlowFile processor to output the example JSON
> > > you gave, then sent that through both of the Jolt processors using a
> > > JsonTreeReader with “Inferred Schema”, and a JsonRecordSetWriter that
> > > “Inherits Schema” for the Record processor.
> > >
> > > If you run *just* your example from this email chain through the Jolt
> > > processors on the version of NiFi you’re using, do you see the errors you
> > > mention, or does that only happen with more data?
> > >
> > >
> > > Cheers,
> > >
> > > ---
> > > Chris Sampson
> > > IT Consultant
> > > chris.samp...@naimuri.com
> > >
> > >
> > > > On 10 Oct 2023, at 15:45, Mark Woodcock <woodc...@usna.edu.INVALID> wrote:
> > > >
> > > > Hmmmm,
> > > >
> > > > > One small problem: while JOLTTransformJSON is quite lovely (a) it has
> > > > > a great "advanced" interface that allows one to test their spec and
> > > > > JSON inputs and (b) it actually works for the cases that I noted...it
> > > > > treats the input as a single blob of JSON. Unfortunately, my input
> > > > > files are collections of JSON records (which, less the noted problem,
> > > > > JOLTTransformRecord does quite nicely with); that's literally how they
> > > > > arrive, not the result of me formatting them at all.
> > > >
> > > > Is there a way to get JTJ to treat the input as records?
> > > > Does 1.22 or 1.23 have the fix for JTR?
> > > >
> > > > thx,
> > > >
> > > > mew
> > > >
> > > >
> > > > On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <woodc...@usna.edu> wrote:
> > > >
> > > >> confirmed:  version 1.21.
> > > >> How recent is the fix?
> > > >>
> > > >> thx,
> > > >>
> > > >> mew
> > > >>
> > > >>
> > > >> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <woodc...@usna.edu> wrote:
> > > >>
> > > >>> Matt,
> > > >>>
> > > >>> Unfortunately (at home now) the details are all at work at the moment,
> > > >>> but I know that I didn't start this work until April (at the earliest),
> > > >>> so I'm surely using at least 1.21; is the fix more recent than that?
> > > >>> {If so, perhaps there is a bug.}
> > > >>>
> > > >>> Fortunately, yea, JSON out is the intent; I need the data to be in that
> > > >>> format to set up a subsequent transform to AVRO, so it seems there are
> > > >>> two possible ways out (depending on which version I'm running): upgrade
> > > >>> or change processors.  So, at least there is a path.
> > > >>>
> > > >>> thx,
> > > >>>
> > > >>> mew
> > > >>>
> > > >>>
> > >
> > >
>
