Re: ORC 2.0

David Tue, 31 Aug 2021 06:12:31 -0700

Hello,

Thank you for your interest.


I am proposing that we tag the 1.x line and reserve it for JDK 8, and that
we move the 'main' branch to build on a minimum of JDK 11.

Note that Oracle's Premier Support for JDK 8 expires in March 2022.

https://www.oracle.com/java/technologies/java-se-support-roadmap.html

I suspect that there are new APIs in JDK 11 that will improve the
performance of ORC.  In particular, I see a number of improvements around
comparing byte arrays (which ORC does quite a bit of).

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Arrays.java#L2700
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L228
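
For illustration, here is a minimal sketch of the kind of change these APIs
allow.  It compares a hand-written loop, of the sort needed on JDK 8, with
Arrays.compareUnsigned and Arrays.mismatch, both available since JDK 9.  The
class name, method names, and sample data are hypothetical, and the choice of
unsigned lexicographic ordering is an assumption made for the example, not a
statement about how ORC orders its bytes.

```java
import java.util.Arrays;

public class ByteCompareSketch {

    // Hand-written unsigned lexicographic comparison, as required on JDK 8.
    public static int compareLoop(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;   // widen to int, treating the byte as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // Same ordering, delegated to the JDK (Arrays.compareUnsigned, JDK 9+).
    public static int compareJdk(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] left = {0x01, 0x7f, (byte) 0x80};
        byte[] right = {0x01, 0x7f, 0x02};

        // Both comparisons should agree in sign.
        System.out.println(Integer.signum(compareLoop(left, right)));
        System.out.println(Integer.signum(compareJdk(left, right)));

        // Index of the first differing byte, or -1 if the arrays are equal.
        System.out.println(Arrays.mismatch(left, right));
    }
}
```

Whether this is actually faster for ORC's workloads would need to be
confirmed by benchmarking.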

I am then proposing that we begin iteratively removing the Hadoop
dependencies.  At a minimum, ORC 2.0 would be released once that work is
completed.

Thanks.

On Mon, Aug 30, 2021 at 3:45 PM Dongjoon Hyun <[email protected]>
wrote:

> Thank you for sending an email.
> Could you elaborate more about your background?
>
> The following is my opinion at first glance.
>
> For (1), Apache ORC supports Java 8/11/17
> without any problem, as you can see from our CI test coverage.
> I'm -1 for dropping Java 8 support because
> we still have lots of customers who are on JDK 8.
> Specifically, Apache Spark distribution should be built with JDK8.
>
> For (2), there is Owen's PR in the community.
>
>     https://github.com/apache/orc/pull/641
>     ORC-508 remove hadoop dependency
>
> So, I'm wondering if you are
>     A. Proposing a new PR, or
>     B. Taking over Owen's PR
>
> Thanks,
> Dongjoon.
>
> On Mon, Aug 30, 2021 at 6:11 AM David <[email protected]> wrote:
>
> > Hello Gang,
> >
> > Thank you for being very accommodating and welcoming to my sometimes
> > tedious pull requests.
> >
> > I'm not sure of the capacity of the participants of the project, but I
> > would like to propose starting on ORC v2 with the following objectives:
> >
> > 1. Moving to JDK 11 (LTS)
> > 2. Removing the direct dependencies on Hadoop from core ORC (and scrubbing
> > many of the mentions of "Hadoop" from the website).
> >
> > Thanks,
> > David (Belugabehr)
> >
>
