Hey Matt, I'm exploring the NiFi code and I feel I'm really close. This
code will probably work great for me, but I'm getting a failure because
it's seeing a struct TypeInfo once I delegate to:

    row[i] = OrcUtils.convertToORCObject(OrcUtils.getOrcField(fieldSchema), o);
*Java Code:*

    Schema avroSchema = record.getSchema();
    TypeInfo orcInfo = OrcUtils.getOrcField(avroSchema);
    TypeDescription orcSchema = TypeDescription.fromString(orcInfo.getTypeName());
    Writer orcWriter = OrcWriter.createWriter(orcSchema, driver.generateTmpPath());

*Sample Data:*

    struct<businessDayDate:string,anotherField:string,anotherFieldAgain:string>

On Wed, Jul 15, 2020 at 11:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> It's good for you, Ryan, because there are many alternatives.
>
> FYI, Apache Spark 3.0.0 is using Apache Hive 2.3.7.
> And everything is running in local mode on the single container.
>
> > Would the entire thing run within the same container and then I
> > leverage the Spark APIs from that in local mode?
>
> More simply, you can generate a minimal Scala script at runtime like
> the following and run it via the Spark shell in that container:
>
>     $ cat hello.scala
>     print("a")
>     $ bin/spark-shell -I hello.scala
>
> Bests,
> Dongjoon.
>
> On Wed, Jul 15, 2020 at 7:34 PM Matt Burgess <mattyb...@apache.org> wrote:
>
> > Ryan,
> >
> > It's possible there are some changes that would cause that code not
> > to compile for Hive 2, but I have done some work porting similar
> > processors to Hive 2, and as I recall it was mostly API-type breaking
> > changes rather than behavioral changes; more of a Maven and
> > Java-package-name kind of thing.
> >
> > Regards,
> > Matt
> >
> > On Wed, Jul 15, 2020 at 8:39 PM Ryan Schachte
> > <coderyanschac...@gmail.com> wrote:
> > >
> > > Great, thanks Matt! Looking at this code now and I feel it will
> > > really help me a lot. Anything you think would break using this
> > > logic for Hive 2.3.5?
> > >
> > > On Wed, Jul 15, 2020 at 5:04 PM Matt Burgess <mattyb...@apache.org>
> > > wrote:
> > >
> > > > Ryan,
> > > >
> > > > In Apache NiFi we have a ConvertAvroToORC processor [1]; you may
> > > > find code there that you can use in your Java program (take a
> > > > look at line 212 and down). We had to create our own
> > > > OrcFileWriter because the one in Apache ORC writes to a
> > > > FileSystem, whereas we needed to write to our own FlowFile
> > > > component. But all the relevant code should be there (you can
> > > > replace the createWriter() call with the normal ORC one). One
> > > > caveat is that it's for Apache Hive 1.2; you may need to make
> > > > changes if you're using Hive 3 libraries, for example.
> > > >
> > > > Regards,
> > > > Matt
> > > >
> > > > [1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java
> > > >
> > > > On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte
> > > > <coderyanschac...@gmail.com> wrote:
> > > > >
> > > > > I'm writing a standalone Java process and am interested in
> > > > > converting the consumed Avro messages to ORC. I've seen a
> > > > > plethora of examples of writing to ORC, but I can't seem to
> > > > > find many examples of converting Avro to ORC.
> > > > >
> > > > > This is just a standard Java process running inside of a
> > > > > container.
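For reference, the per-field pattern Matt points at (ConvertAvroToORC,
line 212 and down) looks roughly like the sketch below. This is a sketch
only, not a verified implementation: it assumes the NiFi Hive bundle
(built against Hive 1.2) plus Apache Avro are on the classpath, and
`OrcUtils` is the helper named in the thread (the class appears as
NiFiOrcUtils in some NiFi versions, so the import path depends on your
NiFi release and is omitted here).

```java
// Sketch, assuming NiFi's Hive-1.2 bundle and Avro on the classpath.
// The OrcUtils import is intentionally omitted; its package varies by
// NiFi version (it may be named NiFiOrcUtils in the NiFi sources).
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

import java.util.List;

public class AvroToOrcRow {

    // Convert one Avro record into an Object[] row, one entry per
    // field, following the per-field pattern used by ConvertAvroToORC.
    static Object[] toOrcRow(GenericRecord record) {
        Schema avroSchema = record.getSchema();
        List<Schema.Field> fields = avroSchema.getFields();
        Object[] row = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            // Resolve the Hive TypeInfo for this single field, not the
            // whole record schema; convertToORCObject recurses into
            // nested types, so a field whose TypeInfo is itself a
            // struct is handled here rather than at the top level.
            Schema fieldSchema = fields.get(i).schema();
            TypeInfo fieldType = OrcUtils.getOrcField(fieldSchema);
            row[i] = OrcUtils.convertToORCObject(fieldType, record.get(i));
        }
        return row;
    }
}
```

One thing that may be worth checking in the snippet at the top of the
thread: it mixes Hive's TypeInfo-based path with Apache ORC's
TypeDescription/Writer API, and ORC core's Writer consumes
VectorizedRowBatch objects rather than Object[] rows, which could be
where the struct TypeInfo surfaces unexpectedly.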