Hi Micah, I agree with your reasoning. If supporting BE in some languages (e.g. Java) is impractical due to performance regressions on LE platforms, then I don't think it's worth it. But if it can be handled at compile time or without runtime overhead, and tested / maintained properly on an ongoing basis, then it seems reasonable to me. It seems that the number of Arrow stakeholders will only increase from here, so I would hope that there will be more people invested in helping maintain BE in the future.
- Wes

On Tue, Aug 25, 2020 at 11:33 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> I'm expanding the scope of this thread since it looks like work has also started on making golang support BigEndian architectures.
>
> I think as a community we should come to a consensus on whether we want to support Big Endian architectures in general. I don't think it is a good outcome if some implementations accept PRs for Big Endian fixes and some don't.
>
> But maybe this is OK with others?
>
> My current opinion on the matter is that we should support it under the following conditions:
>
> 1. As long as there is CI in place to catch regressions (right now I think the CI is fairly unreliable?)
> 2. No degradation in performance for little-endian architectures (verified by additional micro benchmarks)
> 3. Not a large amount of invasive code to distinguish between platforms.
>
> Kazuaki Ishizaki, I asked this question previously, but could you give some data points around:
> 1. The current state of C++ support (how much code needed to change)?
> 2. How many more PRs you expect to need for Java (and approximate size)?
>
> I think this would help myself and others in the decision making process.
>
> Thanks,
> Micah
>
> On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > My thoughts on the points raised so far:
> >
> > * Does supporting Big Endian increase the reach of Arrow by a lot?
> >
> > Probably not a significant amount, but it does provide one more avenue of adoption.
> >
> > * Does it increase code complexity?
> >
> > Yes. I agree this is a concern. The PR in question did not seem too bad to me but this is subjective. I think the remaining question is how many more places need to be fixed up in the code base and how invasive the changes are. In C++ IIUC it turned out to be a relatively small number of places.
> >
> > Kazuaki Ishizaki, have you been able to get the Java implementation working fully locally? How many additional PRs will be needed and what do they look like (I think there are already a few more in the queue)?
> >
> > * Will it introduce performance regressions?
> >
> > If done properly I suspect no, but I think if we continue with BigEndian support the places that need to be touched should have benchmarks added to confirm this (including for PRs already merged).
> >
> > Thanks,
> > Micah
> >
> > On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> >> Thanks to Kazuaki Ishizaki for working on this.
> >> IMO, supporting big-endian would be a large change, as in many places of the code base we have implicitly assumed a little-endian platform (e.g.
> >> https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/ByteFunctionHelpers.java
> >> ).
> >> Supporting the big-endian platform may introduce branches (or virtual calls) in such places, which will affect performance.
> >> So it would be helpful to evaluate the performance impact.
> >>
> >> Best,
> >> Liya Fan
> >>
> >> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <jacq...@apache.org> wrote:
> >>
> >>> Hey Micah, thanks for starting the discussion.
> >>>
> >>> I just skimmed that thread and it isn't entirely clear that there was a conclusion that the overhead was worth it. I think everybody agrees that it would be nice to have the code work on both platforms. On the flip side, the code noise for a rare case makes the cost-benefit questionable.
> >>>
> >>> In the Java code, we wrote the code to explicitly disallow big endian platforms and put precondition checks in. I definitely think if we want to support this, it should be done holistically across the code with an appropriate test plan (both functional and perf).
> >>>
> >>> To me, the question is really about how many use cases are blocked by this. I'm not sure I've heard anyone say that the limiting factor to leveraging Java Arrow was the block on endianness. Keep in mind that until very recently, using any Arrow Java code would fail a preconditions check before you could even get started on big-endian and I don't think we've seen a bunch of messages on that exception. Adding if conditions throughout the codebase like this patch [1] isn't exactly awesome and it can also risk performance impacts depending on how carefully it is done.
> >>>
> >>> If there isn't a preponderance of evidence of many users being blocked by this capability, I don't think we should accept the code. We already have a backlog of items that we need to address just to ensure existing use cases work well. Expanding to new use cases that there is no clear demand for will likely just increase code development cost at little benefit.
> >>>
> >>> What do others think?
> >>>
> >>> [1] https://github.com/apache/arrow/pull/7923#issuecomment-674311119
> >>>
> >>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>
> >>> > Kazuaki Ishizaki has started working on Big Endian support in Java (including setting up CI for it). Thank you!
> >>> >
> >>> > We previously discussed support for Big Endian architectures in C++ [1] and generally agreed that it was a reasonable thing to do.
> >>> >
> >>> > Similar to C++ I think as long as we have a working CI setup it is reasonable for Java to support Big Endian machines.
> >>> >
> >>> > But I think there might be differing opinions so it is worth a discussion to see if there are technical blockers or other reasons for not supporting Big Endian architectures in the existing Java implementation.
> >>> >
> >>> > Thanks,
> >>> > Micah
> >>> >
> >>> > [1] https://lists.apache.org/thread.html/rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019%40%3Cdev.arrow.apache.org%3E