I've had similar issues with the different processing done by
JoltTransformJSON and JoltTransformRecord, so I threw my hands in the air and
now just use ExecuteScript to call out to some Python that transforms the
data correctly.

Try to minimise content transformations so the content repository doesn't
bloat with interim forms.
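
A minimal sketch of the kind of Python this could be (an illustration, not
the actual script): it assumes the flowfile content is a JSON array of
records and drops any field whose value is null or an empty string, which is
the workaround suggested further down the thread. The function names and the
sample "eta" / "dimensions" values are made up for illustration, and the
ExecuteScript wiring (reading and writing the flowfile content) is left out.

```python
import json


def drop_empty_fields(value):
    """Recursively remove dict keys whose value is None or an empty string."""
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            item = drop_empty_fields(item)
            if item is None or item == "":
                continue  # skip the field entirely instead of keeping a blank
            cleaned[key] = item
        return cleaned
    if isinstance(value, list):
        return [drop_empty_fields(item) for item in value]
    return value


def clean_records(text):
    """Parse a JSON array of records and return it with empty fields removed."""
    records = json.loads(text)
    return json.dumps([drop_empty_fields(record) for record in records])


if __name__ == "__main__":
    # Hypothetical sample: one "short" record and one with nested values.
    sample = json.dumps([
        {"id": 1, "eta": "", "dimensions": None},
        {"id": 2, "eta": {"value": "x"}, "dimensions": {"length": 10}},
    ])
    print(clean_records(sample))
```

Records cleaned this way no longer all have the same fields, so a downstream
Writer may still need an explicit schema with optional fields rather than an
inferred one.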

On Fri, 13 Oct 2023, 01:26 Mark Woodcock, <woodc...@usna.edu.invalid> wrote:

> Chris,
>
> I guess I was so wrapped up in the ConvertRecord, it didn't occur to me to
> have the JOLT processor just emit a different format; not sure how I missed
> that.
> So, I re-rigged my flow to have  ReplaceText -> JOLTTransform ->
> UpdateAttribute -> etc.   Unfortunately, I got the attached error.
>
> This is running against the set of data that I provided earlier (that is
> the test case for the bug).  Its first 3 records all lack dimension and
> eta fields (which my ReplaceText process has removed all evidence of);
> followed by one record that has record values for both fields.  The error
> seems to be saying that those early records (at the time of writing to
> Avro) don't have the appropriate format, because they have "null" values
> (in the ETA and dimensions fields) which are not valid for the JSON schema
> which has been inferred.
>
> Why is it creating a single JSON schema for the whole file instead of
> managing each record independently?  (and not that it really matters, but
> hey, the short records are coming first, why precompute the schema before
> finishing the processing of any record?)
>
> And, is there something else I can try?
>
> thx,
>
> mew
>
>
>
>
> On Wed, Oct 11, 2023 at 4:35 PM Chris Sampson
> <chris.samp...@naimuri.com.invalid> wrote:
>
>> You could use an AvroWriter to output the results of the
>> JoltTransformRecord - it doesn’t need to be JSON (in or out), that’s one of
>> the great things of NiFi’s Record processors - if there’s a Reader and
>> Writer in the format you want, you can use that data and the Writer doesn’t
>> need to be the same format as the Reader.
>>
>> Good news: I’ve identified the problem in my NIFI-8135 PR [1] by adding a
>> cut-down version of your example as a unit test for the JoltTransformRecord
>> processor.
>>
>> However, I’m not so sure the output is quite what you were expecting -
>> see the
>> nifi-nar-bundles/nifi-jolt-record-bundle/nifi-jolt-record-processors/src/test/resources/TestJoltTransformRecord/flattenedOutput.json
>> file in the linked PR, the “Eta”’s “value” field appears as a Java Map
>> serialised as a String, I imagine you were wanting this to be a nested
>> Object?
>>
>> If the latter, I think we’re then running into NIFI-8134 [2], for which I
>> have a separate PR ready for review [3].
>>
>> [1]: https://github.com/apache/nifi/pull/7746/files
>>
>> [2]: https://issues.apache.org/jira/browse/NIFI-8134
>>
>> [3]: https://github.com/apache/nifi/pull/7745
>>
>>
>> Cheers,
>>
>> ---
>> Chris Sampson
>> IT Consultant
>> chris.samp...@naimuri.com
>>
>>
>> > On 11 Oct 2023, at 19:22, Mark Woodcock <woodc...@usna.edu.INVALID>
>> wrote:
>> >
>> > Chris,
>> >
>> > 1) well, reassuring to learn that I've found an actual bug; and
>> > pleasing to know that I constructed an effective and illuminating
>> > test.  hurrah.
>> >
>> > 2) So, I can certainly use the ReplaceText (is there a better choice?)
>> > processor to ditch any field that looks like "whatever": "", (and I
>> > successfully implemented it), but unfortunately when I pass the
>> > resulting json onto another processor (e.g. a ConvertRecord, so I can
>> > spit out AVRO), the fact that the data now has different schemas causes
>> > an error.  Is this just kicking the can down the road?
>> >
>> > thx,
>> >
>> > mew
>> >
>> >
>> > On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
>> > <chris.samp...@naimuri.com.invalid> wrote:
>> >
>> >> FYI - original thread in the archives for reference [1].
>> >>
>> >> Thanks for your more complete example, this does indeed fail with the
>> >> error you indicate. I think it’s related to NIFI-8135 [2], which
>> >> identified a deficiency in the way Records are converted to Java Maps,
>> >> particularly where CHOICE types are involved.
>> >>
>> >> The example data you’ve provided does indeed have a mix of String and
>> >> Record (JSON Object) values for the affected fields - this is a little
>> >> unusual, but certainly nothing that’s banned in the world of JSON, so
>> >> should probably be handled better by NiFi.
>> >>
>> >> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3].
>> >> I’d been struggling to re-create the error for the ticket, but I think
>> >> your example does it nicely, so provides a good test for whether the
>> >> problem is fixed - unfortunately, when I run this example data against
>> >> my branch, it still fails albeit with a different error:
>> >>
>> >> java.lang.ClassCastException: class java.lang.String cannot be cast to
>> >> class org.apache.nifi.serialization.record.Record (java.lang.String is
>> >> in module java.base of loader 'bootstrap';
>> >> org.apache.nifi.serialization.record.Record is in unnamed module of
>> >> loader org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
>> >>        at org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
>> >>        at org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
>> >>        ...
>> >>
>> >> So it seems there’s a little more debugging and work to do for
>> >> NIFI-8135 yet.
>> >>
>> >> One way for you to work around this in your example would be to remove
>> >> empty fields from your JSON before passing it through the JOLT
>> >> processors, e.g. remove the field completely if it’s null/empty.
>> >>
>> >>
>> >> [1] https://lists.apache.org/thread/kcnsxvbbdfwfhj0tdsyn53x8ljhgdt1v
>> >>
>> >> [2] https://issues.apache.org/jira/browse/NIFI-8135
>> >>
>> >> [3] https://github.com/apache/nifi/pull/7746
>> >>
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> ---
>> >> Chris Sampson
>> >> IT Consultant
>> >> chris.samp...@naimuri.com
>> >>
>> >>
>> >>> On 10 Oct 2023, at 22:31, Mark Woodcock <woodc...@usna.edu.INVALID>
>> >> wrote:
>> >>>
>> >>> Matt,
>> >>>
>> >>> 1) Yea, I definitely muddled what you meant by "just JSON" before.
>> >>> I'm definitely looking to get a bunch of records out...because I want
>> >>> to AVRO each of them later.
>> >>>
>> >>> 2) I'm sure I've also messed up the threading in the issues.
>> >>> Apparently, I should have subscribed before today (which I've done
>> >>> now); hopefully that will get better.
>> >>>
>> >>> 3) I'll see if I can make any of your suggestions work.
>> >>>
>> >>> thx,
>> >>>
>> >>> mew
>> >>>
>> >>>
>> >>> On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <mattyb...@apache.org>
>> >> wrote:
>> >>>
>> >>>> For some reason I don't have the original thread, I must've
>> >>>> inadvertently deleted it. IIRC your example input was a single JSON
>> >>>> object and I said if that were the case you could use
>> >>>> JoltTransformJSON instead. However if that is NOT the case (which is
>> >>>> your point c above) then you have a couple of options:
>> >>>>
>> >>>> 1) To continue using JoltTransformJSON with a top-level array you
>> >>>> need to surround your spec with
>> >>>> "*": { <your_current_spec> }
>> >>>> and will need to use "[&1]." in front of all the output fields. This
>> >>>> will output the transformation to the same index in the array as it
>> >>>> was in the input.
>> >>>>
>> >>>> 2) One major difference between JoltTransformJSON and
>> >>>> JoltTransformRecord is that the former reads the entire thing into
>> >>>> memory, where JoltTransformRecord reads one record at a time. So your
>> >>>> current spec should work with JoltTransformRecord, but if you are
>> >>>> still getting the original error, can you provide (or re-provide if
>> >>>> you already did, I can't find the original thread) sample input that
>> >>>> represents the "real" input (not just one JSON object if you'll be
>> >>>> getting multiple records or if the top-level is an array even with
>> >>>> only one object in it), desired output, and the error with full stack
>> >>>> trace? I'm guessing there is an inference error with complex fields;
>> >>>> if you know what the input and output schemas are you can provide
>> >>>> them to the Reader and Writer respectively instead of using "Infer
>> >>>> schema". That should work around any inference issues.
>> >>>> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor;
>> >>>> you can try that before your JoltTransformRecord processor with the
>> >>>> same reader and see what it comes out with as a schema. Then you can
>> >>>> manually alter it to better match your data and use that in the
>> >>>> Reader specified in JoltTransformRecord.
>> >>>>
>> >>>> Regards,
>> >>>> Matt
>> >>>>
>> >>>> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock
>> <woodc...@usna.edu.invalid
>> >>>
>> >>>> wrote:
>> >>>>>
>> >>>>> Chris,
>> >>>>>
>> >>>>> 1) I've upgraded to 1.23.2  (which appears to be the latest and
>> >>>>> greatest).
>> >>>>>
>> >>>>> 2) I've tested the JoltTransformRecord with
>> >>>>> a) JsonTreeReader w/ InferredSchema
>> >>>>> b) JsonRecordSetWriter w/ InheritsSchema
>> >>>>> c) a GetFile processor which grabs a text file with the various bits
>> >>>>> of test data
>> >>>>>
>> >>>>> It appears that your suspicions are correct:
>> >>>>> i) if I test with just that single record as the entire content of
>> >>>>> the file, the processor is successful.
>> >>>>> ii) if I test with multiple records, none of which have the
>> >>>>> complicated inner field, all is successful.
>> >>>>> c) if I test with multiple records, where at least one has the
>> >>>>> complicated inner field, I get the earlier noted error.
>> >>>>>
>> >>>>> IOW, yep, it only happens with *more* data.
>> >>>>>
>> >>>>> bummer,
>> >>>>>
>> >>>>> mew
>> >>>>>
>> >>>>> On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
>> >>>>> <chris.samp...@naimuri.com.invalid> wrote:
>> >>>>>
>> >>>>>> Using your example (single JSON Object and Jolt Spec) seems to work
>> >>>>>> fine in both JoltTransformJSON and JoltTransformRecord when run on
>> >>>>>> the current main branch (which is for the upcoming 2.0.0 release).
>> >>>>>>
>> >>>>>> To test, I set up a GenerateFlowFile processor to output the example
>> >>>>>> JSON you gave, then sent that through both of the Jolt processors
>> >>>>>> using a JsonTreeReader with “Inferred Schema”, and a
>> >>>>>> JsonRecordSetWriter that “Inherits Schema” for the Record processor.
>> >>>>>>
>> >>>>>> If you run *just* your example from this email chain through the
>> >>>>>> Jolt processors on the version of NiFi you’re using, do you see the
>> >>>>>> errors you mention, or does that only happen with more data?
>> >>>>>>
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>>
>> >>>>>> ---
>> >>>>>> Chris Sampson
>> >>>>>> IT Consultant
>> >>>>>> chris.samp...@naimuri.com
>> >>>>>>
>> >>>>>>
>> >>>>>>> On 10 Oct 2023, at 15:45, Mark Woodcock <woodc...@usna.edu.INVALID
>> >
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Hmmmm,
>> >>>>>>>
>> >>>>>>> One small problem:  While JOLTTransformJSON is quite lovely (a) it
>> >>>>>>> has a great "advanced" interface that allows one to test their spec
>> >>>>>>> and json inputs and (b) it actually works for the cases that I
>> >>>>>>> noted...it treats the input as a single blob of JSON.
>> >>>>>>> Unfortunately, my input files are collections of JSON records
>> >>>>>>> (which--less the noted problem--JOLTTransformRecord does quite
>> >>>>>>> nicely with)--that's literally how they arrive, not the result of
>> >>>>>>> me formatting them at all.
>> >>>>>>>
>> >>>>>>> Is there a way to get JTJ to treat the input as records?
>> >>>>>>> Does 1.22 or 1.23 have the fix for JTR?
>> >>>>>>>
>> >>>>>>> thx,
>> >>>>>>>
>> >>>>>>> mew
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <woodc...@usna.edu>
>> >>>> wrote:
>> >>>>>>>
>> >>>>>>>> confirmed:  version 1.21.
>> >>>>>>>> How recent is the fix?
>> >>>>>>>>
>> >>>>>>>> thx,
>> >>>>>>>>
>> >>>>>>>> mew
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <woodc...@usna.edu
>> >
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Matt,
>> >>>>>>>>>
>> >>>>>>>>> Unfortunately (at home now) the details are all at work at the
>> >>>>>>>>> moment, but I know that I didn't start this work until April (at
>> >>>>>>>>> the earliest), so I'm surely using at least 1.21; is the fix more
>> >>>>>>>>> recent than that?  {If so, perhaps there is a bug.}
>> >>>>>>>>>
>> >>>>>>>>> Fortunately, yea, JSON out is the intent; I need the data to be in
>> >>>>>>>>> that format to set up a subsequent transform to AVRO, so it seems
>> >>>>>>>>> there are two possible ways out (depending on which version I'm
>> >>>>>>>>> running): upgrade or change processors.  So, at least there is a
>> >>>>>>>>> path.
>> >>>>>>>>>
>> >>>>>>>>> thx,
>> >>>>>>>>>
>> >>>>>>>>> mew
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>
>> >>
>>
>>
