Matt,

1) Yea, I definitely muddled what you meant by "just JSON" before. I'm definitely looking to get a bunch of records out...because I want to AVRO each of them later.
2) I'm sure I've also messed up the threading in the issues. Apparently, I should have subscribed before today (which I've done now); hopefully that will get better.
3) I'll see if I can make any of your suggestions work.

thx,

mew

On Tue, Oct 10, 2023 at 5:21 PM Matt Burgess <mattyb...@apache.org> wrote:

> For some reason I don't have the original thread, I must've
> inadvertently deleted it. IIRC your example input was a single JSON
> object and I said if that were the case you could use
> JoltTransformJSON instead. However if that is NOT the case (which is
> your point c above) then you have a couple of options:
>
> 1) To continue using JoltTransformJSON with a top-level array you need
> to surround your spec with
> "*": { <your_current_spec> }
> and will need to use "[&1]." in front of all the output fields. This
> will output the transformation to the same index in the array as it
> was in the input.
>
> 2) One major difference between JoltTransformJSON and
> JoltTransformRecord is that the former reads the entire thing into
> memory, where JoltTransformRecord reads one record at a time. So your
> current spec should work with JoltTransformRecord, but if you are
> still getting the original error, can you provide (or re-provide if
> you already did, I can't find the original thread) sample input that
> represents the "real" input (not just one JSON object if you'll be
> getting multiple records or if the top-level is an array even with
> only one object in it), desired output, and the error with full stack
> trace? I'm guessing there is an inference error with complex fields,
> if you know what the input and output schemas are you can provide them
> to the Reader and Writer respectively instead of using "Infer schema".
> That should work around any inference issues.
> With NiFi 1.23.2 you also have the new ExtractRecordSchema processor,
> you can try that before your JoltTransformRecord processor with the
> same reader and see what it comes out with as a schema. Then you can
> manually alter it to better match your data and use that in the Reader
> specified in JoltTransformRecord.
>
> Regards,
> Matt
>
> On Tue, Oct 10, 2023 at 4:47 PM Mark Woodcock <woodc...@usna.edu.invalid> wrote:
> >
> > Chris,
> >
> > 1) I've upgraded to 1.23.2 (which appears to be the latest and greatest).
> >
> > 2) I've tested the JoltTransformRecord with
> > a) JsonTreeReader w/ InferredSchema
> > b) JsonRecordSetWriter w/ InheritsSchema
> > c) a GetFile processor which grabs a text file with the various bits of
> > test data
> >
> > It appears that your suspicions are correct:
> > i) if I test with just that single record as the entire content of the
> > file, the processor is successful.
> > ii) if I test with multiple records, none of which have the complicated
> > inner field, all is successful.
> > c) if I test with multiple records, where at least one has the complicated
> > inner field, I get the earlier noted error.
> >
> > IOW, yep, it only happens with *more* data.
> >
> > bummer,
> >
> > mew
> >
> > On Tue, Oct 10, 2023 at 2:43 PM Chris Sampson
> > <chris.samp...@naimuri.com.invalid> wrote:
> >
> > > Using your example (single JSON Object and Jolt Spec) seems to work fine
> > > in both JoltTransformJSON and JoltTransformRecord when run on the current
> > > main branch (which is for the upcoming 2.0.0 release).
> > >
> > > To test, I set up a GenerateFlowFile processor to output the example JSON
> > > you gave, then sent that through both of the Jolt processors using a
> > > JsonTreeReader with “Inferred Schema”, and a JsonRecordSetWriter that
> > > “Inherits Schema” for the Record processor.
> > >
> > > If you run *just* your example from this email chain through the Jolt
> > > processors on the version of NiFi you’re using, do you see the errors you
> > > mention, or does that only happen with more data?
> > >
> > >
> > > Cheers,
> > >
> > > ---
> > > Chris Sampson
> > > IT Consultant
> > > chris.samp...@naimuri.com
> > >
> > >
> > > > On 10 Oct 2023, at 15:45, Mark Woodcock <woodc...@usna.edu.INVALID> wrote:
> > > >
> > > > Hmmmm,
> > > >
> > > > One small problem: While JOLTTransformJSON is quite lovely (a) it has a
> > > > great "advanced" interface that allows one to test their spec and json
> > > > inputs and (b) it actually works for the cases that I noted...it treats the
> > > > input as a single blob of JSON. Unfortunately, my input files are collections
> > > > of JSON records (which--less the noted problem--JOLTTransformRecord does
> > > > quite nicely with)--that's literally how they arrive, not the result of me
> > > > formatting them at all.
> > > >
> > > > Is there a way to get JTJ to treat the input as records?
> > > > Does 1.22 or 1.23 have the fix for JTR?
> > > >
> > > > thx,
> > > >
> > > > mew
> > > >
> > > >
> > > > On Mon, Oct 9, 2023 at 3:21 PM Mark Woodcock <woodc...@usna.edu> wrote:
> > > >
> > > >> confirmed: version 1.21.
> > > >> How recent is the fix?
> > > >>
> > > >> thx,
> > > >>
> > > >> mew
> > > >>
> > > >>
> > > >> On Sun, Oct 8, 2023 at 11:39 PM Mark Woodcock <woodc...@usna.edu> wrote:
> > > >>
> > > >>> Matt,
> > > >>>
> > > >>> Unfortunately (at home now) the details are all at work at the moment,
> > > >>> but I know that I didn't start this work until April (at the earliest), so
> > > >>> I'm surely using at least 1.21; is the fix more recent than that?
> > > >>> {If so, perhaps there is a bug.}
> > > >>>
> > > >>> Fortunately, yea, JSON out is the intent; I need the data to be in that
> > > >>> format to set up a subsequent transform to AVRO, so it seems there are two
> > > >>> possible ways out (depending on which version I'm running): upgrade or
> > > >>> change processors. So, at least there is a path.
> > > >>>
> > > >>> thx,
> > > >>>
> > > >>> mew
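
For anyone reading this thread later: Matt's option 1 (surround the shift spec with "*" and put "[&1]." in front of the output fields) might be sketched roughly as below. This is only an illustration, not something posted in the thread. The helper name wrap_shift_spec_for_array and the field names (id, details, name) are made up, since the real spec was never included here. It also assumes a plain shift spec whose right-hand sides are output paths (strings, or lists of strings). To my understanding, a rule nested one level deeper needs "[&2]." rather than "[&1]." to reach the array index, so the sketch counts levels.

```python
import json


def wrap_shift_spec_for_array(spec):
    """Return a copy of a Jolt shift spec adapted for a top-level array.

    Hypothetical helper: wraps the spec in "*" (matching every array index)
    and prefixes each output path with "[&N].", where N is how many levels
    the rule sits below the "*" key.
    """
    def prefix(value, level):
        if isinstance(value, str):
            # Output path: write into the same array index as the input.
            return f"[&{level}].{value}"
        if isinstance(value, list):
            # One input field shifted to several output paths.
            return [prefix(item, level) for item in value]
        if isinstance(value, dict):
            # Nested rules sit one level further from the array index.
            return {key: prefix(item, level + 1) for key, item in value.items()}
        return value  # anything else (e.g. null) is left untouched

    return {"*": {key: prefix(item, 1) for key, item in spec.items()}}


if __name__ == "__main__":
    # Made-up spec for illustration; substitute the real one.
    current_spec = {"id": "identifier", "details": {"name": "person.name"}}
    print(json.dumps(wrap_shift_spec_for_array(current_spec), indent=2))
```

With the made-up spec above, the "id" rule becomes "[&1].identifier" and the nested "name" rule becomes "[&2].person.name". Rules that use special left-hand keys such as "@" or "#" are not handled by this sketch and would need checking by hand in the JoltTransformJSON "advanced" interface.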