For some reason I don't have the original thread; I must've
inadvertently deleted it. IIRC your example input was a single JSON
object, and I said that if that were the case you could use
JoltTransformJSON instead. However, if that is NOT the case (which is
your point c above), then you have a couple of options:
1) To keep using JoltTransformJSON with a top-level array, you need
to wrap your spec in
"*": { <your_current_spec> }
and put "[&1]." in front of all the output fields. This writes each
transformed object to the same array index it had in the input.
2) One major difference between JoltTransformJSON and
JoltTransformRecord is that the former reads the entire FlowFile
content into memory, whereas JoltTransformRecord reads one record at a
time. So your current spec should work with JoltTransformRecord as-is.
If you are still getting the original error, can you provide (or
re-provide, if you already did; I can't find the original thread)
sample input that represents the "real" input (not just one JSON
object if you'll be getting multiple records, or if the top level is
an array even with only one object in it), the desired output, and the
error with the full stack trace? I'm guessing there is a schema
inference error with complex fields. If you know what the input and
output schemas are, you can provide them to the Reader and Writer,
respectively, instead of using "Infer schema". That should work around
any inference issues.
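To illustrate option 1, here is a minimal sketch assuming a
hypothetical original spec that shifts "name" to "fullName" and
"address.city" to "city" (these field names are made up for the
example, not taken from your data). The original spec body would have
been {"name": "fullName", "address": {"city": "city"}}; below it is
wrapped in "*" so it runs once per array element. Note that "[&1]"
only works for fields directly under each array element: each &N
walks N levels up from the current match, so a field nested one level
deeper (like "city" under "address") needs "[&2]" to reach the array
index.

```json
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "name": "[&1].fullName",
        "address": {
          "city": "[&2].city"
        }
      }
    }
  }
]
```

With input such as [{"name": "Ann", "address": {"city": "Oslo"}},
{"name": "Bo", "address": {"city": "Rome"}}], this should produce
[{"fullName": "Ann", "city": "Oslo"}, {"fullName": "Bo", "city": "Rome"}],
keeping each element at its original index.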
With NiFi 1.23.2 you also have the new ExtractRecordSchema processor.
You can put it before your JoltTransformRecord processor, using the
same Reader, and see what schema it comes up with. Then you can
manually alter that schema to better match your data and use it in the
Reader specified in JoltTransformRecord.
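As a sketch of what a hand-edited schema might look like, here is a
hypothetical Avro schema (the record and field names are invented for
illustration, not taken from your data) in which the complicated inner
field is a nullable nested record with a null default, so records that
lack it still match the same schema. You would paste something like
this into the Reader's "Schema Text" property after setting its
"Schema Access Strategy" to "Use 'Schema Text' Property".

```json
{
  "type": "record",
  "name": "InputRecord",
  "fields": [
    { "name": "id", "type": ["null", "string"], "default": null },
    {
      "name": "details",
      "type": [
        "null",
        {
          "type": "record",
          "name": "Details",
          "fields": [
            { "name": "code", "type": ["null", "string"], "default": null },
            {
              "name": "tags",
              "type": ["null", { "type": "array", "items": "string" }],
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
```

Putting "null" first in each union is what allows the null default,
since Avro requires a union field's default to match the union's first
branch.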
Regards,
Matt
On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <[email protected]> wrote:
>
> Chris,
>
> 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
>
> 2) I've tested the JoltTransformRecord with
> a) JsonTreeReader w/ InferredSchema
> b) JsonRecordSetWriter w/ InheritsSchema
> c) a GetFile processor which grabs a text file with the various bits of
> test data
>
> It appears that your suspicions are correct:
> i) if I test with just that single record as the entire content of the
> file, the processor is successful.
> ii) if I test with multiple records, none of which have the complicated
> inner field, all is successful.
> c) if I test with multiple records, where at least one has the complicated
> inner field, I get the earlier noted error.
>
> IOW, yep, it only happens with *more* data.
>
> bummer,
>
> mew
>
> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson <[email protected]> wrote:
>
> > Your example (a single JSON object and the Jolt spec) seems to work fine
> > in both JoltTransformJSON and JoltTransformRecord when run on the current
> > main branch (which is for the upcoming 2.0.0 release).
> >
> > To test, I set up a GenerateFlowFile processor to output the example JSON
> > you gave, then sent that through both of the Jolt processors, using a
> > JsonTreeReader with “Inferred Schema” and a JsonRecordSetWriter that
> > “Inherits Schema” for the Record processor.
> >
> > If you run *just* your example from this email chain through the Jolt
> > processors on the version of NiFi you’re using, do you see the errors you
> > mention, or does that only happen with more data?
> >
> >
> > Cheers,
> >
> > ---
> > Chris Sampson
> > IT Consultant
> > [email protected]
> >
> >
> > > On 10 Oct 2023, at 15:45, Mark Woodcock <[email protected]> wrote:
> > >
> > > Hmmmm,
> > >
> > > One small problem: while JoltTransformJSON is quite lovely ((a) it has a
> > > great "advanced" interface that lets one test a spec against JSON
> > > input, and (b) it actually works for the cases I noted), it treats the
> > > input as a single blob of JSON. Unfortunately, my input files are
> > > collections of JSON records (which, apart from the noted problem,
> > > JoltTransformRecord handles quite nicely). That's literally how they
> > > arrive; I don't format them that way.
> > >
> > > Is there a way to get JTJ to treat the input as records?
> > > Does 1.22 or 1.23 have the fix for JTR?
> > >
> > > thx,
> > >
> > > mew
> > >
> > >
> > > On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <[email protected]> wrote:
> > >
> > >> confirmed: version 1.21.
> > >> How recent is the fix?
> > >>
> > >> thx,
> > >>
> > >> mew
> > >>
> > >>
> > >> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <[email protected]> wrote:
> > >>
> > >>> Matt,
> > >>>
> > >>> Unfortunately, I'm at home now and the details are all at work, but I
> > >>> know that I didn't start this work until April (at the earliest), so
> > >>> I'm surely using at least 1.21. Is the fix more recent than that? (If
> > >>> not, perhaps there is a bug.)
> > >>>
> > >>> Fortunately, yes, JSON output is the intent; I need the data in that
> > >>> format to set up a subsequent transform to Avro, so there seem to be
> > >>> two possible ways out (depending on which version I'm running): upgrade
> > >>> or change processors. So, at least there is a path.
> > >>>
> > >>> thx,
> > >>>
> > >>> mew
> > >>>
> > >>>
> >
> >