Chris,

1) well, reassuring to learn that I've found an actual bug; and pleasing to
know that I constructed an effective and illuminating test.  hurrah.

2) So, I can certainly use the ReplaceText processor (is there a better
choice?) to ditch any field that looks like "whatever": "", (and I have
successfully implemented that), but unfortunately when I pass the resulting
JSON on to another processor (e.g. a ConvertRecord, so I can spit out Avro),
the fact that the records now have different schemas causes an error.  Is
this just kicking the can down the road?
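
For reference, here is a minimal sketch in plain Python (outside NiFi) of the
kind of pruning I mean, done on parsed JSON rather than with a regex. The
field names ("id", "detail", "code", "note") are made up for illustration,
and drop_empty_fields is a hypothetical helper, not anything NiFi provides:

```python
import json


def drop_empty_fields(value):
    """Recursively remove dict entries whose value is None or an empty string."""
    if isinstance(value, dict):
        return {
            key: drop_empty_fields(item)
            for key, item in value.items()
            if item is not None and item != ""
        }
    if isinstance(value, list):
        return [drop_empty_fields(item) for item in value]
    return value


# Hypothetical records: "detail" is an empty string in one and an object in the other.
records = [
    {"id": 1, "detail": ""},
    {"id": 2, "detail": {"code": "x", "note": ""}},
]

cleaned = [drop_empty_fields(record) for record in records]
for record in cleaned:
    print(json.dumps(record))
```

After pruning, the first record has no "detail" field at all while the second
still does, which is exactly the "different schemas" situation described
above.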

thx,

mew


On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
<chris.samp...@naimuri.com.invalid> wrote:

> FYI - original thread in the archives for reference [1].
>
> Thanks for your more complete example, this does indeed fail with the
> error you indicate. I think it’s related to NIFI-8135 [2], which identified
> a deficiency in the way Records are converted to Java Maps, particularly
> where CHOICE types are involved.
>
> The example data you’ve provided does indeed have a mix of String and
> Record (JSON Object) values for the affected fields - this is a little
> unusual, but certainly nothing that’s banned in the world of JSON, so
> should probably be handled better by NiFi.
>
> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I’d
> been struggling to re-create the error for the ticket, but I think your
> example does it nicely, so provides a good test for whether the problem is
> fixed - unfortunately, when I run this example data against my branch, it
> still fails albeit with a different error:
>
> java.lang.ClassCastException: class java.lang.String cannot be cast to
> class org.apache.nifi.serialization.record.Record (java.lang.String is in
> module java.base of loader 'bootstrap';
> org.apache.nifi.serialization.record.Record is in unnamed module of loader
> org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
>         at
> org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
>         at
> org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
>         ...
>
> So it seems there’s a little more debugging and work to do for NIFI-8135
> yet.
>
> One way to work around this in your example would be to remove empty
> fields from your JSON before passing it through the JOLT processors,
> e.g. remove the field completely if it’s null/empty.
>
>
> [1] https://lists.apache.org/thread/kcnsxvbbdfwfhj0tdsyn53x8ljhgdt1v
>
> [2] https://issues.apache.org/jira/browse/NIFI-8135
>
> [3] https://github.com/apache/nifi/pull/7746
>
>
>
> Cheers,
>
> ---
> Chris Sampson
> IT Consultant
> chris.samp...@naimuri.com
>
>
> > On 10 Oct 2023, at 22:31, Mark Woodcock <woodc...@usna.edu.INVALID> wrote:
> >
> > Matt,
> >
> > 1) Yea, I definitely muddled what you meant by "just JSON" before.   I'm
> > definitely looking to get a bunch of records out...because I want to AVRO
> > each of them later.
> >
> > 2) I'm sure I've also messed up the threading in the issues.  Apparently,
> > I should have subscribed before today (which I've done now); hopefully
> > that will get better.
> >
> > 3) I'll see if I can make any of your suggestions work.
> >
> > thx,
> >
> > mew
> >
> >
> > On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <mattyb...@apache.org> wrote:
> >
> >> For some reason I don't have the original thread, I must've
> >> inadvertently deleted it. IIRC your example input was a single JSON
> >> object and I said if that were the case you could use
> >> JoltTransformJSON instead. However if that is NOT the case (which is
> >> your point c above) then you have a couple of options:
> >>
> >> 1) To continue using JoltTransformJSON with a top-level array you need
> >> to surround your spec with
> >> "*": { <your_current_spec> }
> >> and will need to use "[&1]." in front of all the output fields. This
> >> will output the transformation to the same index in the array as it
> >> was in the input.
> >>
> >> 2) One major difference between JoltTransformJSON and
> >> JoltTransformRecord is that the former reads the entire thing into
> >> memory, where JoltTransformRecord reads one record at a time. So your
> >> current spec should work with JoltTransformRecord, but if you are
> >> still getting the original error, can you provide (or re-provide if
> >> you already did, I can't find the original thread) sample input that
> >> represents the "real" input (not just one JSON object if you'll be
> >> getting multiple records or if the top-level is an array even with
> >> only one object in it), desired output, and the error with full stack
> >> trace? I'm guessing there is an inference error with complex fields;
> >> if you know what the input and output schemas are, you can provide them
> >> to the Reader and Writer respectively instead of using "Infer schema".
> >> That should work around any inference issues.
> >> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor;
> >> you can try that before your JoltTransformRecord processor with the
> >> same reader and see what it comes out with as a schema. Then you can
> >> manually alter it to better match your data and use that in the Reader
> >> specified in JoltTransformRecord.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <woodc...@usna.edu.invalid>
> >> wrote:
> >>>
> >>> Chris,
> >>>
> >>> 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
> >>>
> >>> 2) I've tested the JoltTransformRecord with
> >>> a) JsonTreeReader w/ InferredSchema
> >>> b) JsonRecordSetWriter w/ InheritsSchema
> >>> c) a GetFile processor which grabs a text file with the various bits of
> >>> test data
> >>>
> >>> It appears that your suspicions are correct:
> >>> i) if I test with just that single record as the entire content of the
> >>> file, the processor is successful.
> >>> ii) if I test with multiple records, none of which have the complicated
> >>> inner field, all is successful.
> >>> iii) if I test with multiple records, where at least one has the
> >>> complicated inner field, I get the earlier noted error.
> >>>
> >>> IOW, yep, it only happens with *more* data.
> >>>
> >>> bummer,
> >>>
> >>> mew
> >>>
> >>> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
> >>> <chris.samp...@naimuri.com.invalid> wrote:
> >>>
> >>>> Using your example (single JSON Object and Jolt Spec) seems to work fine
> >>>> in both JoltTransformJSON and JoltTransformRecord when run on the current
> >>>> main branch (which is for the upcoming 2.0.0 release).
> >>>>
> >>>> To test, I set up a GenerateFlowFile processor to output the example JSON
> >>>> you gave, then sent that through both of the Jolt processors using a
> >>>> JsonTreeReader with “Inferred Schema”, and a JsonRecordSetWriter that
> >>>> “Inherits Schema” for the Record processor.
> >>>>
> >>>> If you run *just* your example from this email chain through the Jolt
> >>>> processors on the version of NiFi you’re using, do you see the errors you
> >>>> mention, or does that only happen with more data?
> >>>>
> >>>>
> >>>> Cheers,
> >>>>
> >>>> ---
> >>>> Chris Sampson
> >>>> IT Consultant
> >>>> chris.samp...@naimuri.com
> >>>>
> >>>>
> >>>>> On 10 Oct 2023, at 15:45, Mark Woodcock <woodc...@usna.edu.INVALID> wrote:
> >>>>>
> >>>>> Hmmmm,
> >>>>>
> >>>>> One small problem:  While JoltTransformJSON is quite lovely ((a) it has a
> >>>>> great "advanced" interface that allows one to test their spec and JSON
> >>>>> inputs and (b) it actually works for the cases that I noted), it treats
> >>>>> the input as a single blob of JSON.  Unfortunately, my input files are
> >>>>> collections of JSON records (which, less the noted problem,
> >>>>> JoltTransformRecord handles quite nicely); that's literally how they
> >>>>> arrive, not the result of me formatting them at all.
> >>>>>
> >>>>> Is there a way to get JTJ to treat the input as records?
> >>>>> Does 1.22 or 1.23 have the fix for JTR?
> >>>>>
> >>>>> thx,
> >>>>>
> >>>>> mew
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <woodc...@usna.edu> wrote:
> >>>>>
> >>>>>> confirmed:  version 1.21.
> >>>>>> How recent is the fix?
> >>>>>>
> >>>>>> thx,
> >>>>>>
> >>>>>> mew
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <woodc...@usna.edu> wrote:
> >>>>>>
> >>>>>>> Matt,
> >>>>>>>
> >>>>>>> Unfortunately (at home now) the details are all at work at the moment,
> >>>>>>> but I know that I didn't start this work until April (at the earliest),
> >>>>>>> so I'm surely using at least 1.21; is the fix more recent than that?
> >>>>>>> {If so, perhaps there is a bug.}
> >>>>>>>
> >>>>>>> Fortunately, yea, JSON out is the intent; I need the data to be in that
> >>>>>>> format to set up a subsequent transform to AVRO, so it seems there are
> >>>>>>> two possible ways out (depending on which version I'm running): upgrade
> >>>>>>> or change processors.  So, at least there is a path.
> >>>>>>>
> >>>>>>> thx,
> >>>>>>>
> >>>>>>> mew
> >>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >>
>
>
