Re: JOLTTransformRecord problem

Chris Sampson Wed, 11 Oct 2023 13:35:25 -0700

You could use an Avro Writer (e.g. AvroRecordSetWriter) to output the results
of JoltTransformRecord; the data doesn’t need to be JSON (in or out). That’s
one of the great things about NiFi’s Record processors: if there’s a Reader
and a Writer for the formats you want, you can use that data, and the Writer
doesn’t need to use the same format as the Reader.


Good news: I’ve identified the problem in my NIFI-8135 PR [1] by adding a 
cut-down version of your example as a unit test for the JoltTransformRecord 
processor.

However, I’m not so sure the output is quite what you were expecting. See the 
nifi-nar-bundles/nifi-jolt-record-bundle/nifi-jolt-record-processors/src/test/resources/TestJoltTransformRecord/flattenedOutput.json
file in the linked PR: the “value” field of “Eta” appears as a Java Map 
serialised as a String, whereas I imagine you wanted a nested Object?

If so, I think we’re then running into NIFI-8134 [2], for which I have 
a separate PR ready for review [3].
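
A minimal sketch of the difference between the two shapes, using only plain
Java collections. The inner keys ("low", "high") are invented for illustration
and are not taken from the test file in the PR, and the sketch makes no claim
about how NiFi itself produces the String; it only shows what a Map rendered
via toString() looks like next to a Map kept as a nested value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapAsStringSketch {

    // Hypothetical nested content for the "value" field; the keys are invented.
    static Map<String, Object> nestedValue() {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("low", 1);
        inner.put("high", 2);
        return inner;
    }

    // Presumably desired shape: "value" stays a nested Map
    // (which a JSON writer would emit as a nested Object).
    static Map<String, Object> asNestedObject() {
        Map<String, Object> eta = new LinkedHashMap<>();
        eta.put("value", nestedValue());
        return eta;
    }

    // Shape described above: "value" holds the Map's toString() text instead.
    static Map<String, Object> asStringifiedMap() {
        Map<String, Object> eta = new LinkedHashMap<>();
        eta.put("value", String.valueOf(nestedValue()));
        return eta;
    }

    public static void main(String[] args) {
        System.out.println(asNestedObject().get("value") instanceof Map);      // true
        System.out.println(asStringifiedMap().get("value") instanceof String); // true
        System.out.println(asStringifiedMap().get("value"));                   // {low=1, high=2}
    }
}
```

In the second shape a downstream Writer only sees a String field, so the inner
keys are no longer addressable as record fields.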

[1]: https://github.com/apache/nifi/pull/7746/files

[2]: https://issues.apache.org/jira/browse/NIFI-8134

[3]: https://github.com/apache/nifi/pull/7745


Cheers,

---
Chris Sampson
IT Consultant
[email protected]


> On 11 Oct 2023, at 19:22, Mark Woodcock <[email protected]> wrote:
> 
> Chris,
> 
> 1) well, reassuring to learn that I've found an actual bug; and pleasing to
> know that I constructed an effective and illuminating test.  hurrah.
> 
> 2) So, I can certainly use the ReplaceText (is there a better choice?)
> processor to ditch any field that looks like "whatever": "", (and I
> successfully implemented it), but unfortunately when I pass the resulting
> json onto another processor (e.g. a ConvertRecord, so I can spit out AVRO),
> the fact that the data now has different schemas causes an error.  Is this
> just kicking the can down the road?
> 
> thx,
> 
> mew
> 
> 
> On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
> <[email protected]> wrote:
> 
>> FYI - original thread in the archives for reference [1].
>> 
>> Thanks for your more complete example, this does indeed fail with the
>> error you indicate. I think it’s related to NIFI-8135 [2], which identified
>> a deficiency in the way Records are converted to Java Maps, particularly
>> where CHOICE types are involved.
>> 
>> The example data you’ve provided does indeed have a mix of String and
>> Record (JSON Object) values for the affected fields - this is a little
>> unusual, but certainly nothing that’s banned in the world of JSON, so
>> should probably be handled better by NiFi.
>> 
>> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I’d
>> been struggling to re-create the error for the ticket, but I think your
>> example does it nicely, so provides a good test for whether the problem is
>> fixed - unfortunately, when I run this example data against my branch, it
>> still fails albeit with a different error:
>> 
>> java.lang.ClassCastException: class java.lang.String cannot be cast to
>> class org.apache.nifi.serialization.record.Record (java.lang.String is in
>> module java.base of loader 'bootstrap';
>> org.apache.nifi.serialization.record.Record is in unnamed module of loader
>> org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
>>        at
>> org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
>>        at
>> org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
>>        ...
>> 
>> So it seems there’s a little more debugging and work to do for NIFI-8135
>> yet.
>> 
>> One way for you to work around this in your example would be to remove
>> empty fields from your JSON before passing it through the JOLT processors,
>> i.e. remove a field completely if it’s null/empty.
>> 
>> 
>> [1] https://lists.apache.org/thread/kcnsxvbbdfwfhj0tdsyn53x8ljhgdt1v
>> 
>> [2] https://issues.apache.org/jira/browse/NIFI-8135
>> 
>> [3] https://github.com/apache/nifi/pull/7746
>> 
>> 
>> 
>> Cheers,
>> 
>> ---
>> Chris Sampson
>> IT Consultant
>> [email protected]
>> 
>> 
>>> On 10 Oct 2023, at 22:31, Mark Woodcock <[email protected]> wrote:
>>> 
>>> Matt,
>>> 
>>> 1) Yea, I definitely muddled what you meant by "just JSON" before.   I'm
>>> definitely looking to get a bunch of records out...because I want to AVRO
>>> each of them later.
>>> 
>>> 2) I'm sure I've also messed up the threading in the issues.  Apparently, I
>>> should have subscribed before today (which I've done now); hopefully that
>>> will get better.
>>> 
>>> 3) I'll see if I can make any of your suggestions work.
>>> 
>>> thx,
>>> 
>>> mew
>>> 
>>> 
>>> On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <[email protected]> wrote:
>>> 
>>>> For some reason I don't have the original thread, I must've
>>>> inadvertently deleted it. IIRC your example input was a single JSON
>>>> object and I said if that were the case you could use
>>>> JoltTransformJSON instead. However if that is NOT the case (which is
>>>> your point c above) then you have a couple of options:
>>>> 
>>>> 1) To continue using JoltTransformJSON with a top-level array you need
>>>> to surround your spec with
>>>> "*": { <your_current_spec> }
>>>> and will need to use "[&1]." in front of all the output fields. This
>>>> will output the transformation to the same index in the array as it
>>>> was in the input.
>>>> 
>>>> 2) One major difference between JoltTransformJSON and
>>>> JoltTransformRecord is that the former reads the entire thing into
>>>> memory, where JoltTransformRecord reads one record at a time. So your
>>>> current spec should work with JoltTransformRecord, but if you are
>>>> still getting the original error, can you provide (or re-provide if
>>>> you already did, I can't find the original thread) sample input that
>>>> represents the "real" input (not just one JSON object if you'll be
>>>> getting multiple records or if the top-level is an array even with
>>>> only one object in it), desired output, and the error with full stack
>>>> trace? I'm guessing there is an inference error with complex fields; if
>>>> you know what the input and output schemas are, you can provide them to
>>>> the Reader and Writer respectively instead of using "Infer schema".
>>>> That should work around any inference issues.
>>>> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor,
>>>> you can try that before your JoltTransformRecord processor with the
>>>> same reader and see what it comes out with as a schema. Then you can
>>>> manually alter it to better match your data and use that in the Reader
>>>> specified in JoltTransformRecord.
>>>> 
>>>> Regards,
>>>> Matt
>>>> 
>>>> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <[email protected]> wrote:
>>>>> 
>>>>> Chris,
>>>>> 
>>>>> 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
>>>>> 
>>>>> 2) I've tested the JoltTransformRecord with
>>>>> a) JsonTreeReader w/ InferredSchema
>>>>> b) JsonRecordSetWriter w/ InheritsSchema
>>>>> c) a GetFile processor which grabs a text file with the various bits of
>>>>> test data
>>>>> 
>>>>> It appears that your suspicions are correct:
>>>>> i) if I test with just that single record as the entire content of the
>>>>> file, the processor is successful.
>>>>> ii) if I test with multiple records, none of which have the complicated
>>>>> inner field, all is successful.
>>>>> c) if I test with multiple records, where at least one has the
>>>>> complicated inner field, I get the earlier noted error.
>>>>> 
>>>>> IOW, yep, it only happens with *more* data.
>>>>> 
>>>>> bummer,
>>>>> 
>>>>> mew
>>>>> 
>>>>> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
>>>>> <[email protected]> wrote:
>>>>> 
>>>>>> Using your example (single JSON Object and Jolt Spec) seems to work fine
>>>>>> in both JoltTransformJSON and JoltTransformRecord when run on the current
>>>>>> main branch (which is for the upcoming 2.0.0 release).
>>>>>> 
>>>>>> To test, I set up a GenerateFlowFile processor to output the example JSON
>>>>>> you gave, then sent that through both of the Jolt processors using a
>>>>>> JsonTreeReader with “Inferred Schema”, and a JsonRecordSetWriter that
>>>>>> “Inherits Schema” for the Record processor.
>>>>>> 
>>>>>> If you run *just* your example from this email chain through the Jolt
>>>>>> processors on the version of NiFi you’re using, do you see the errors you
>>>>>> mention, or does that only happen with more data?
>>>>>> 
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> ---
>>>>>> Chris Sampson
>>>>>> IT Consultant
>>>>>> [email protected]
>>>>>> 
>>>>>> 
>>>>>>> On 10 Oct 2023, at 15:45, Mark Woodcock <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hmmmm,
>>>>>>> 
>>>>>>> One small problem:  While JOLTTransformJSON is quite lovely ((a) it has a
>>>>>>> great "advanced" interface that allows one to test their spec and JSON
>>>>>>> inputs and (b) it actually works for the cases that I noted), it treats
>>>>>>> the input as a single blob of JSON.  Unfortunately, my input files are
>>>>>>> collections of JSON records (which, less the noted problem,
>>>>>>> JOLTTransformRecord does quite nicely with); that's literally how they
>>>>>>> arrive, not the result of me formatting them at all.
>>>>>>> 
>>>>>>> Is there a way to get JTJ to treat the input as records?
>>>>>>> Does 1.22 or 1.23 have the fix for JTR?
>>>>>>> 
>>>>>>> thx,
>>>>>>> 
>>>>>>> mew
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <[email protected]> wrote:
>>>>>>> 
>>>>>>>> confirmed:  version 1.21.
>>>>>>>> How recent is the fix?
>>>>>>>> 
>>>>>>>> thx,
>>>>>>>> 
>>>>>>>> mew
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Matt,
>>>>>>>>> 
>>>>>>>>> Unfortunately (at home now) the details are all at work at the moment,
>>>>>>>>> but I know that I didn't start this work until April (at the earliest),
>>>>>>>>> so I'm surely using at least 1.21; is the fix more recent than that?
>>>>>>>>> {If so, perhaps there is a bug.}
>>>>>>>>> 
>>>>>>>>> Fortunately, yea, JSON out is the intent; I need the data to be in that
>>>>>>>>> format to set up a subsequent transform to AVRO, so it seems there are
>>>>>>>>> two possible ways out (depending on which version I'm running): upgrade
>>>>>>>>> or change processors.  So, at least there is a path.
>>>>>>>>> 
>>>>>>>>> thx,
>>>>>>>>> 
>>>>>>>>> mew
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 
>>
