Hey Matt,
I'm exploring the NiFi code and I feel I'm really close. This code will
probably work well for me, but I'm getting a failure because a TypeInfo
struct shows up once I delegate to:
row[i] = OrcUtils.convertToORCObject(OrcUtils.getOrcField(fieldSchema), o);

*Java Code:*

> Schema avroSchema = record.getSchema();
> TypeInfo orcInfo = OrcUtils.getOrcField(avroSchema);
> TypeDescription orcSchema = TypeDescription.fromString(orcInfo.getTypeName());
> Writer orcWriter = OrcWriter.createWriter(orcSchema, driver.generateTmpPath());


*Sample Data:*
struct<businessDayDate:string,anotherField:string,anotherFieldAgain:string>
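For comparison, here is a minimal sketch of writing that same struct with the plain Apache ORC core writer, bypassing NiFi's OrcUtils entirely (assuming orc-core and its Hadoop dependencies are on the classpath; the output path and row values below are placeholders, not from the real data):

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcStructWriteSketch {
    public static void main(String[] args) throws Exception {
        // The same struct type string as in the sample data above
        TypeDescription schema = TypeDescription.fromString(
            "struct<businessDayDate:string,anotherField:string,anotherFieldAgain:string>");

        Configuration conf = new Configuration();
        Writer writer = OrcFile.createWriter(
            new Path("/tmp/out.orc"),                      // placeholder path
            OrcFile.writerOptions(conf).setSchema(schema));

        VectorizedRowBatch batch = schema.createRowBatch();

        // One example row; real code would iterate over the consumed Avro
        // GenericRecords and copy each field into the matching column vector.
        int row = batch.size++;
        String[] values = {"2020-07-15", "foo", "bar"};    // placeholder values
        for (int col = 0; col < values.length; col++) {
            ((BytesColumnVector) batch.cols[col])
                .setVal(row, values[col].getBytes(StandardCharsets.UTF_8));
        }

        writer.addRowBatch(batch);
        writer.close();
    }
}
```

The point of the sketch is that TypeDescription, VectorizedRowBatch, and OrcFile.createWriter() all come from the same (Apache ORC) API, whereas mixing Hive's TypeInfo with ORC's TypeDescription is where things tend to get confusing.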

On Wed, Jul 15, 2020 at 11:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> That's good for you, Ryan, because there are many alternatives.
>
> FYI, Apache Spark 3.0.0 is using Apache Hive 2.3.7.
> And, everything is running in a local mode on the single container.
>
> > Would the entire thing run within the same container and then I leverage
> the Spark APIs from that in local mode?
>
> More simply, you can generate a minimal Scala script at runtime, like the
> following, and run it via the Spark shell in that container.
>
>     $ cat hello.scala
>     print("a")
>     $ bin/spark-shell -I hello.scala
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 15, 2020 at 7:34 PM Matt Burgess <mattyb...@apache.org> wrote:
>
> > Ryan,
> >
> > It's possible there are some changes that would cause that code not to
> > compile for Hive 2. I have done some work porting similar processors to
> > Hive 2, though, and as I recall the breaking changes were mostly at the
> > API level rather than in behavior, more of a Maven and Java-package-name
> > kind of thing.
> >
> > Regards,
> > Matt
> >
> > On Wed, Jul 15, 2020 at 8:39 PM Ryan Schachte
> > <coderyanschac...@gmail.com> wrote:
> > >
> > > Great, thanks Matt! Looking at this code now and feel this will really
> > help
> > > me a lot. Anything you think would break using this logic for Hive
> 2.3.5?
> > >
> > > On Wed, Jul 15, 2020 at 5:04 PM Matt Burgess <mattyb...@apache.org>
> > wrote:
> > >
> > > > Ryan,
> > > >
> > > > In Apache NiFi we have a ConvertAvroToOrc processor [1]; you may find
> > > > code there that you can use in your Java program (take a look at line
> > > > 212 and down). We had to create our own OrcFileWriter because the one
> > > > in Apache ORC writes to a FileSystem, whereas we needed to write to
> > > > our own FlowFile component. But all the relevant code should be there
> > > > (you can replace the createWriter() call with the normal ORC one).
> > > > One caveat: it's written for Apache Hive 1.2, so you may need to make
> > > > changes if you're using Hive 3 libraries, for example.
> > > >
> > > > Regards,
> > > > Matt
> > > >
> > > > [1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java
> > > >
> > > > On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte
> > > > <coderyanschac...@gmail.com> wrote:
> > > > >
> > > > > I'm writing a standalone Java process and am interested in
> > > > > converting the consumed Avro messages to ORC. I've seen plenty of
> > > > > examples of writing to ORC, but I can't seem to find many examples
> > > > > of converting Avro to ORC.
> > > > >
> > > > > This is just a standard Java process running inside of a container.
> > > >
> >
>
