You could use an AvroWriter to output the results of the JoltTransformRecord; it doesn’t need to be JSON (in or out). That’s one of the great things about NiFi’s Record processors: if there’s a Reader and a Writer for the formats you want, you can use that data, and the Writer doesn’t need to be the same format as the Reader.
Good news: I’ve identified the problem in my NIFI-8135 PR [1] by adding a cut-down version of your example as a unit test for the JoltTransformRecord processor. However, I’m not so sure the output is quite what you were expecting: see the nifi-nar-bundles/nifi-jolt-record-bundle/nifi-jolt-record-processors/src/test/resources/TestJoltTransformRecord/flattenedOutput.json file in the linked PR, where the “Eta”’s “value” field appears as a Java Map serialised as a String. I imagine you were wanting this to be a nested Object? If so, I think we’re then running into NIFI-8134 [2], for which I have a separate PR ready for review [3].

[1]: https://github.com/apache/nifi/pull/7746/files
[2]: https://issues.apache.org/jira/browse/NIFI-8134
[3]: https://github.com/apache/nifi/pull/7745

Cheers,

---
Chris Sampson
IT Consultant
[email protected]

> On 11 Oct 2023, at 19:22, Mark Woodcock <[email protected]> wrote:
>
> Chris,
>
> 1) well, reassuring to learn that I've found an actual bug; and pleasing to
> know that I constructed an effective and illuminating test. hurrah.
>
> 2) So, I can certainly use the ReplaceText (is there a better choice?)
> processor to ditch any field that looks like "whatever": "", (and I
> successfully implemented it), but unfortunately when I pass the resulting
> json onto another processor (e.g. a ConvertRecord, so I can spit out AVRO),
> the fact that the data now has different schemas causes an error. Is this
> just kicking the can down the road?
>
> thx,
>
> mew
>
>
> On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
> <[email protected]> wrote:
>
>> FYI - original thread in the archives for reference [1].
>>
>> Thanks for your more complete example, this does indeed fail with the
>> error you indicate. I think it’s related to NIFI-8135 [2], which identified
>> a deficiency in the way Records are converted to Java Maps, particularly
>> where CHOICE types are involved.
>>
>> The example data you’ve provided does indeed have a mix of String and
>> Record (JSON Object) values for the affected fields - this is a little
>> unusual, but certainly nothing that’s banned in the world of JSON, so
>> should probably be handled better by NiFi.
>>
>> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I’d
>> been struggling to re-create the error for the ticket, but I think your
>> example does it nicely, so provides a good test for whether the problem is
>> fixed - unfortunately, when I run this example data against my branch, it
>> still fails albeit with a different error:
>>
>> java.lang.ClassCastException: class java.lang.String cannot be cast to
>> class org.apache.nifi.serialization.record.Record (java.lang.String is in
>> module java.base of loader 'bootstrap';
>> org.apache.nifi.serialization.record.Record is in unnamed module of loader
>> org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
>>     at org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
>>     at org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
>>     ...
>>
>> So it seems there’s a little more debugging and work to do for NIFI-8135
>> yet,
>>
>> One way of you working around this in your example would be to remove
>> empty fields from your JSON before passing it through the JOLT processors,
>> e.g. remove the field completely if it’s null/empty.
>>
>>
>> [1] https://lists.apache.org/thread/kcnsxvbbdfwfhj0tdsyn53x8ljhgdt1v
>>
>> [2] https://issues.apache.org/jira/browse/NIFI-8135
>>
>> [3] https://github.com/apache/nifi/pull/7746
>>
>>
>>
>> Cheers,
>>
>> ---
>> Chris Sampson
>> IT Consultant
>> [email protected]
>>
>>
>>> On 10 Oct 2023, at 22:31, Mark Woodcock <[email protected]> wrote:
>>>
>>> Matt,
>>>
>>> 1) Yea, I definitely muddled what you meant by "just JSON" before.
>>> I'm definitely looking to get a bunch of records out...because I want to AVRO
>>> each of them later.
>>>
>>> 2) I'm sure I've also messed up the threading in the issues. Apparently, I
>>> should have subscribed before today (which I've done now); hopefully that
>>> will get better.
>>>
>>> 3) I'll see if I can make any of your suggestions work.
>>>
>>> thx,
>>>
>>> mew
>>>
>>>
>>> On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <[email protected]> wrote:
>>>
>>>> For some reason I don't have the original thread, I must've
>>>> inadvertently deleted it. IIRC your example input was a single JSON
>>>> object and I said if that were the case you could use
>>>> JoltTransformJSON instead. However if that is NOT the case (which is
>>>> your point c above) then you have a couple of options:
>>>>
>>>> 1) To continue using JoltTransformJSON with a top-level array you need
>>>> to surround your spec with
>>>> "*": { <your_current_spec> }
>>>> and will need to use "[&1]." in front of all the output fields. This
>>>> will output the transformation to the same index in the array as it
>>>> was in the input.
>>>>
>>>> 2) One major difference between JoltTransformJSON and
>>>> JoltTransformRecord is that the former reads the entire thing into
>>>> memory, where JoltTransformRecord reads one record at a time. So your
>>>> current spec should work with JoltTransformRecord, but if you are
>>>> still getting the original error, can you provide (or re-provide if
>>>> you already did, I can't find the original thread) sample input that
>>>> represents the "real" input (not just one JSON object if you'll be
>>>> getting multiple records or if the top-level is an array even with
>>>> only one object in it), desired output, and the error with full stack
>>>> trace? I'm guessing there is an inference error with complex fields,
>>>> if you know what the input and output schemas are you can provide them
>>>> to the Reader and Writer respectively instead of using "Infer schema".
>>>> That should work around any inference issues.
>>>> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor,
>>>> you can try that before your JoltTransformRecord processor with the
>>>> same reader and see what it comes out with as a schema. Then you can
>>>> manually alter it to better match your data and use that in the Reader
>>>> specified in JoltTransformRecord.
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <[email protected]> wrote:
>>>>>
>>>>> Chris,
>>>>>
>>>>> 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
>>>>>
>>>>> 2) I've tested the JoltTransformRecord with
>>>>> a) JsonTreeReader w/ InferredSchema
>>>>> b) JsonRecordSetWriter w/ InheritsSchema
>>>>> c) a GetFile processor which grabs a text file with the various bits of
>>>>> test data
>>>>>
>>>>> It appears that your suspicions are correct:
>>>>> i) if I test with just that single record as the entire content of the
>>>>> file, the processor is successful.
>>>>> ii) if I test with multiple records, none of which have the complicated
>>>>> inner field, all is successful.
>>>>> c) if I test with multiple records, where at least one has the complicated
>>>>> inner field, I get the earlier noted error.
>>>>>
>>>>> IOW, yep, it only happens with *more* data.
>>>>>
>>>>> bummer,
>>>>>
>>>>> mew
>>>>>
>>>>> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Using your example (single JSON Object and Jolt Spec) seems to work fine
>>>>>> in both JoltTransformJSON and JoltTransformRecord when run on the current
>>>>>> main branch (which is for the upcoming 2.0.0 release).
>>>>>>
>>>>>> To test, I setup a GenerateFlowFile processor to output the example JSON
>>>>>> you gave, then sent that through both of the Jolt processors using a
>>>>>> JsonTreeReader with “Inferred Schema”, and a JsonRecordSetWriter that
>>>>>> “Inherits Schema” for the Record processor.
>>>>>>
>>>>>> If you run *just* your example from this email chain through the Jolt
>>>>>> processors on the version of NiFi you’re using, do you see the errors you
>>>>>> mention, or does that only happen with more data?
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> ---
>>>>>> Chris Sampson
>>>>>> IT Consultant
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>>> On 10 Oct 2023, at 15:45, Mark Woodcock <[email protected]> wrote:
>>>>>>>
>>>>>>> Hmmmm,
>>>>>>>
>>>>>>> One small problem: While JOLTTransformJSON is quite lovely (a) it has a
>>>>>>> great "advanced" interface that allows one to test their spec and json
>>>>>>> inputs and (b) it actually works for the cases that I noted...it treats the
>>>>>>> input a single blob of JSON. Unfortunately, my input files are collections
>>>>>>> of JSON records (which--less the noted problem--JOLTTransformRecord does
>>>>>>> quite nicely with)--that's literally how they arrive, not the result of me
>>>>>>> formatting them at all.
>>>>>>>
>>>>>>> Is there a way to get JTJ to treat the input as records?
>>>>>>> Does 1.22 or 1.23 have the fix for JTR?
>>>>>>>
>>>>>>> thx,
>>>>>>>
>>>>>>> mew
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <[email protected]> wrote:
>>>>>>>
>>>>>>>> confirmed: version 1.21.
>>>>>>>> How recent is the fix?
>>>>>>>>
>>>>>>>> thx,
>>>>>>>>
>>>>>>>> mew
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Matt,
>>>>>>>>>
>>>>>>>>> Unfortunately (at home now) the details are all at work at the moment,
>>>>>>>>> but I know that I didn't start this work until April (at the earliest), so
>>>>>>>>> I'm surely using at least 1.21; is the fix more recent than that? {If so,
>>>>>>>>> perhaps there is a bug.}
>>>>>>>>>
>>>>>>>>> Fortunately, yea, JSON out is the intent; I need the data to be in that
>>>>>>>>> format to set up a subsequent transform to AVRO, so it seems there are two
>>>>>>>>> possible ways out (depending on which version I'm running): upgrade or
>>>>>>>>> change processors. So, at least there is a path.
>>>>>>>>>
>>>>>>>>> thx,
>>>>>>>>>
>>>>>>>>> mew
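
For reference, the workaround suggested in the thread (removing a field completely when it is null/empty before the data reaches the Jolt processors) could be sketched roughly as below. This is only an illustration in plain Python, not a NiFi processor or script from the thread: the function name `drop_empty_fields` and the sample records are hypothetical, and only the "Eta"/"value" field names come from the discussion. As noted in the thread, dropping fields means records no longer all share the same set of fields, so a downstream Reader may need an explicit schema rather than an inferred one.

```python
import json


def drop_empty_fields(value):
    """Recursively remove dict entries whose value is None or an empty string.

    Hypothetical helper for illustration; other falsy values such as 0 or
    False are kept, and list elements are cleaned but never removed.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if item is None or item == "":
                continue  # drop the field entirely, as suggested in the thread
            cleaned[key] = drop_empty_fields(item)
        return cleaned
    if isinstance(value, list):
        return [drop_empty_fields(item) for item in value]
    return value


# Hypothetical sample data: "Eta" is an empty String in one record and a
# nested Object in the other, the mix that triggered the error in the thread.
records = [
    {"id": 1, "Eta": ""},
    {"id": 2, "Eta": {"value": "12:00", "note": None}},
]

print(json.dumps(drop_empty_fields(records)))
```

The same logic could presumably be applied record by record inside whatever scripting or text-replacement step sits ahead of JoltTransformRecord; that wiring is left out here because it depends on the flow.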
