Hi Micah,

I agree with your reasoning. If supporting BE in some languages (e.g.
Java) is impractical due to performance regressions on LE platforms,
then I don't think it's worth it. But if it can be handled at compile
time or without runtime overhead, and tested / maintained properly on
an ongoing basis, then it seems reasonable to me. It seems that the
number of Arrow stakeholders will only increase from here so I would
hope that there will be more people invested in helping maintain BE in
the future.
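
For illustration only, here is a minimal Java sketch of what "without
runtime overhead" could look like. It assumes the JIT treats a static final
boolean as a constant and drops the dead branch on LE platforms, which would
still need to be confirmed with micro benchmarks. EndianSketch,
nativeToLittleEndian and readLongLE are hypothetical names, not existing
Arrow APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow: shows one way to confine the
// endianness decision to a single constant.
final class EndianSketch {
    // Computed once at class initialization. The assumption is that the JIT
    // folds branches on this static final field, so LE platforms pay nothing.
    static final boolean LITTLE_ENDIAN =
        ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    private EndianSketch() {}

    // Converts a value read in native byte order to its little-endian
    // interpretation (a no-op on LE platforms, a byte swap on BE platforms).
    static long nativeToLittleEndian(long nativeValue) {
        return LITTLE_ENDIAN ? nativeValue : Long.reverseBytes(nativeValue);
    }

    // Reads a little-endian long at the given index, mimicking a native-order
    // memory access followed by the conversion above.
    static long readLongLE(ByteBuffer buf, int index) {
        // duplicate() leaves the caller's byte-order setting untouched.
        long raw = buf.duplicate().order(ByteOrder.nativeOrder()).getLong(index);
        return nativeToLittleEndian(raw);
    }
}
```

Whether the branch really disappears on LE would have to be checked per call
site, which is why the benchmarks Micah mentions matter.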

- Wes

On Tue, Aug 25, 2020 at 11:33 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> I'm expanding the scope of this thread since it looks like work has also
> started for making golang support BigEndian architectures.
>
> I think as a community we should come to a consensus on whether we want to
> support Big Endian architectures in general.  I don't think it is a good
> outcome if some implementations accept PRs for Big Endian fixes and some
> don't.
>
> But maybe this is OK with others?
>
> My current opinion on the matter is that we should support it under the
> following conditions:
>
> 1.  As long as there is CI in place to catch regressions (right now I think
> the CI is fairly unreliable?)
> 2.  No degradation in performance for little-endian architectures (verified
> by additional micro benchmarks)
> 3.  Not a large amount of invasive code to distinguish between platforms.
>
> Kazuaki Ishizaki, I asked this previously, but could you give some data
> points on:
> 1.  The current state of C++ support (how much code needed to change)?
> 2.  How many more PRs you expect to need for Java (and approximate size)?
>
> I think this would help me and others in the decision-making process.
>
> Thanks,
> Micah
>
> On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > My thoughts on the points raised so far:
> >
> > * Does supporting Big Endian increase the reach of Arrow by a lot?
> >
> > Probably not a significant amount, but it does provide one more avenue of
> > adoption.
> >
> > * Does it increase code complexity?
> >
> > Yes.  I agree this is a concern.  The PR in question did not seem too bad
> > to me but this is subjective.  I think the remaining question is how many
> > more places need to be fixed up in the code base and how invasive are the
> > changes.  In C++ IIUC it turned out to be a relatively small number of
> > places.
> >
> > Kazuaki Ishizaki, have you been able to get the Java implementation working
> > fully locally?  How many additional PRs will be needed and what do
> > they look like (I think there are already a few more in the queue)?
> >
> > * Will it introduce performance regressions?
> >
> > If done properly I suspect no, but I think if we continue with BigEndian
> > support the places that need to be touched should have benchmarks added to
> > confirm this (including for PRs already merged).
> >
> > Thanks,
> > Micah
> >
> > On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> >> Thanks to Kazuaki Ishizaki for working on this.
> >> IMO, supporting big-endian would be a large change, as in many
> >> places of the code base, we have implicitly assumed a little-endian
> >> platform (e.g.
> >> https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/ByteFunctionHelpers.java
> >> ).
> >> Supporting big-endian platforms may introduce branches (or virtual
> >> calls) in such places, which will affect performance.
> >> So it would be helpful to evaluate the performance impact.
> >>
> >> Best,
> >> Liya Fan
> >>
> >>
> >> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <jacq...@apache.org>
> >> wrote:
> >>
> >>> Hey Micah, thanks for starting the discussion.
> >>>
> >>> I just skimmed that thread and it isn't entirely clear that there was a
> >>> conclusion that the overhead was worth it. I think everybody agrees that
> >>> it would be nice to have the code work on both platforms. On the
> >>> flipside, the code noise for a rare case makes the cost-benefit
> >>> questionable.
> >>>
> >>> In the Java code, we wrote the code to explicitly disallow big-endian
> >>> platforms and put precondition checks in. I definitely think if we want
> >>> to support this, it should be done holistically across the code with an
> >>> appropriate test plan (both functional and perf).
> >>>
> >>> To me, the question is really about how many use cases are blocked by
> >>> this. I'm not sure I've heard anyone say that the limiting factor to
> >>> leveraging Java Arrow was the block on endianness. Keep in mind that
> >>> until very recently, using any Arrow Java code would fail a precondition
> >>> check before you could even get started on big-endian and I don't think
> >>> we've seen a bunch of messages on that exception. Adding if conditions
> >>> throughout the codebase like this patch: [1] isn't exactly awesome and it
> >>> can also risk performance impacts depending on how carefully it is done.
> >>>
> >>> If there isn't a preponderance of evidence of many users being blocked by
> >>> this capability, I don't think we should accept the code. We already
> >>> have a backlog of items that we need to address just to ensure existing
> >>> use cases work well. Expanding to new use cases that there is no clear
> >>> demand for will likely just increase code development cost at little
> >>> benefit.
> >>>
> >>> What do others think?
> >>>
> >>> [1] https://github.com/apache/arrow/pull/7923#issuecomment-674311119
> >>>
> >>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield <emkornfi...@gmail.com>
> >>> wrote:
> >>>
> >>> > Kazuaki Ishizaki has started working on Big Endian support in Java
> >>> > (including setting up CI for it).  Thank you!
> >>> >
> >>> > We previously discussed support for Big Endian architectures in C++ [1]
> >>> > and generally agreed that it was a reasonable thing to do.
> >>> >
> >>> > Similar to C++ I think as long as we have a working CI setup it is
> >>> > reasonable for Java to support Big Endian machines.
> >>> >
> >>> > But I think there might be differing opinions so it is worth a
> >>> > discussion to see if there are technical blockers or other reasons for
> >>> > not supporting Big Endian architectures in the existing Java
> >>> > implementation.
> >>> >
> >>> > Thanks,
> >>> > Micah
> >>> >
> >>> >
> >>> > [1] https://lists.apache.org/thread.html/rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019%40%3Cdev.arrow.apache.org%3E
> >>> >
> >>>
> >>
