FYI - original thread in the archives for reference [1].
Thanks for your more complete example; this does indeed fail with the error you
indicate. I think it's related to NIFI-8135 [2], which identified a deficiency
in the way Records are converted to Java Maps, particularly where CHOICE types
are involved.
The example data you've provided does indeed have a mix of String and Record
(JSON Object) values for the affected fields - this is a little unusual, but
certainly nothing that's banned in the world of JSON, so it should probably be
handled better by NiFi.
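To make the shape concrete (the field names here are illustrative rather than
taken from your data), the problem case is the same field holding a String in
one record and an Object in another, e.g.:

[
  { "affectedField": "a plain string value" },
  { "affectedField": { "nested": "now an object value" } },
  { "affectedField": null }
]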
I've had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I'd been
struggling to re-create the error for the ticket, but I think your example does
it nicely, so it provides a good test for whether the problem is fixed -
unfortunately, when I run this example data against my branch, it still fails,
albeit with a different error:
java.lang.ClassCastException: class java.lang.String cannot be cast to class
org.apache.nifi.serialization.record.Record (java.lang.String is in module
java.base of loader 'bootstrap'; org.apache.nifi.serialization.record.Record is
in unnamed module of loader org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
at
org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
at
org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
...
So it seems there's a little more debugging and work to do for NIFI-8135 yet.
One way you could work around this in your example would be to remove empty
fields from your JSON before passing it through the JOLT processors, e.g.
remove the field completely if it's null/empty.
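If it helps, one way to do that removal is an extra JoltTransformJSON step
ahead of your existing transforms, using Jolt's modify-overwrite-beta
squash-nulls functions - a minimal sketch, assuming your input is the array of
records and the empty values arrive as JSON nulls:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": "=recursivelySquashNulls"
    }
  }
]

Here "*" applies the function to each record in the array, and
recursivelySquashNulls should drop null-valued entries at any depth below it -
I haven't tested this against your data, though, so treat it as a starting
point.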
[1] https://lists.apache.org/thread/kcnsxvbbdfwfhj0tdsyn53x8ljhgdt1v
[2] https://issues.apache.org/jira/browse/NIFI-8135
[3] https://github.com/apache/nifi/pull/7746
Cheers,
---
Chris Sampson
IT Consultant
[email protected]
> On 10 Oct 2023, at 22:31, Mark Woodcock <[email protected]> wrote:
>
> Matt,
>
> 1) Yea, I definitely muddled what you meant by "just JSON" before. I'm
> definitely looking to get a bunch of records out...because I want to AVRO
> each of them later.
>
> 2) I'm sure I've also messed up the threading in the issues. Apparently, I
> should have subscribed before today (which I've done now); hopefully that
> will get better.
>
> 3) I'll see if I can make any of your suggestions work.
>
> thx,
>
> mew
>
>
> On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <[email protected]> wrote:
>
>> For some reason I don't have the original thread; I must've
>> inadvertently deleted it. IIRC your example input was a single JSON
>> object, and I said if that were the case you could use
>> JoltTransformJSON instead. However, if that is NOT the case (which is
>> your point c above), then you have a couple of options:
>>
>> 1) To continue using JoltTransformJSON with a top-level array, you need
>> to surround your spec with
>> "*": { <your_current_spec> }
>> and will need to use "[&1]." in front of all the output fields. This
>> writes each transformed object to the same index in the output array as
>> it had in the input.
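>> As a rough illustration (the field names are made up, not your actual
>> spec), a shift spec wrapped that way looks like:
>>
>> [
>>   {
>>     "operation": "shift",
>>     "spec": {
>>       "*": {
>>         "someInputField": "[&1].someOutputField"
>>       }
>>     }
>>   }
>> ]
>>
>> The "*" matches each index of the top-level array, and "[&1]." in the
>> output path sends the result back to that same index.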
>>
>> 2) One major difference between JoltTransformJSON and
>> JoltTransformRecord is that the former reads the entire thing into
>> memory, whereas JoltTransformRecord reads one record at a time. So your
>> current spec should work with JoltTransformRecord, but if you are
>> still getting the original error, can you provide (or re-provide if
>> you already did, I can't find the original thread) sample input that
>> represents the "real" input (not just one JSON object if you'll be
>> getting multiple records, or if the top-level is an array even with
>> only one object in it), desired output, and the error with full stack
>> trace? I'm guessing there is an inference error with complex fields;
>> if you know what the input and output schemas are, you can provide them
>> to the Reader and Writer respectively instead of using "Infer schema".
>> That should work around any inference issues.
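>> For instance, a hand-written Avro schema for the Reader could declare
>> the tricky field as a union of the types you actually see (the names
>> below are placeholders, not your real fields):
>>
>> {
>>   "type": "record",
>>   "name": "inputRecord",
>>   "fields": [
>>     { "name": "id", "type": ["null", "string"] },
>>     { "name": "affectedField", "type": ["null", "string",
>>       { "type": "record", "name": "affectedFieldObject", "fields": [
>>         { "name": "nested", "type": ["null", "string"] }
>>       ] }
>>     ] }
>>   ]
>> }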
>> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor;
>> you can try that before your JoltTransformRecord processor with the
>> same reader and see what it comes out with as a schema. Then you can
>> manually alter it to better match your data and use that in the Reader
>> specified in JoltTransformRecord.
>>
>> Regards,
>> Matt
>>
>> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <[email protected]>
>> wrote:
>>>
>>> Chris,
>>>
>>> 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
>>>
>>> 2) I've tested the JoltTransformRecord with
>>> a) JsonTreeReader w/ InferredSchema
>>> b) JsonRecordSetWriter w/ InheritsSchema
>>> c) a GetFile processor which grabs a text file with the various bits of
>>> test data
>>>
>>> It appears that your suspicions are correct:
>>> i) if I test with just that single record as the entire content of the
>>> file, the processor is successful.
>>> ii) if I test with multiple records, none of which have the complicated
>>> inner field, all is successful.
>>> iii) if I test with multiple records, where at least one has the
>>> complicated inner field, I get the earlier noted error.
>>>
>>> IOW, yep, it only happens with *more* data.
>>>
>>> bummer,
>>>
>>> mew
>>>
>>> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
>>> <[email protected]> wrote:
>>>
>>>> Using your example (single JSON Object and Jolt Spec) seems to work
>>>> fine in both JoltTransformJSON and JoltTransformRecord when run on the
>>>> current main branch (which is for the upcoming 2.0.0 release).
>>>>
>>>> To test, I set up a GenerateFlowFile processor to output the example
>>>> JSON you gave, then sent that through both of the Jolt processors using
>>>> a JsonTreeReader with "Inferred Schema", and a JsonRecordSetWriter that
>>>> "Inherits Schema" for the Record processor.
>>>>
>>>> If you run *just* your example from this email chain through the Jolt
>>>> processors on the version of NiFi you're using, do you see the errors
>>>> you mention, or does that only happen with more data?
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> ---
>>>> Chris Sampson
>>>> IT Consultant
>>>> [email protected]
>>>>
>>>>
>>>>> On 10 Oct 2023, at 15:45, Mark Woodcock <[email protected]> wrote:
>>>>>
>>>>> Hmmmm,
>>>>>
>>>>> One small problem: while JoltTransformJSON is quite lovely ((a) it
>>>>> has a great "advanced" interface that allows one to test their spec
>>>>> and JSON inputs, and (b) it actually works for the cases that I
>>>>> noted), it treats the input as a single blob of JSON. Unfortunately,
>>>>> my input files are collections of JSON records (which, less the noted
>>>>> problem, JoltTransformRecord handles quite nicely); that's literally
>>>>> how they arrive, not the result of me formatting them at all.
>>>>>
>>>>> Is there a way to get JTJ to treat the input as records?
>>>>> Does 1.22 or 1.23 have the fix for JTR?
>>>>>
>>>>> thx,
>>>>>
>>>>> mew
>>>>>
>>>>>
>>>>> On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <[email protected]> wrote:
>>>>>
>>>>>> confirmed: version 1.21.
>>>>>> How recent is the fix?
>>>>>>
>>>>>> thx,
>>>>>>
>>>>>> mew
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <[email protected]> wrote:
>>>>>>
>>>>>>> Matt,
>>>>>>>
>>>>>>> Unfortunately (at home now) the details are all at work at the
>>>>>>> moment, but I know that I didn't start this work until April (at
>>>>>>> the earliest), so I'm surely using at least 1.21; is the fix more
>>>>>>> recent than that? {If so, perhaps there is a bug.}
>>>>>>>
>>>>>>> Fortunately, yea, JSON out is the intent; I need the data to be in
>>>>>>> that format to set up a subsequent transform to AVRO, so it seems
>>>>>>> there are two possible ways out (depending on which version I'm
>>>>>>> running): upgrade or change processors. So, at least there is a
>>>>>>> path.
>>>>>>>
>>>>>>> thx,
>>>>>>>
>>>>>>> mew
>>>>>>>
>>>>>>>
>>>>
>>>>
>>