Hi Micah,

Thanks for your summary. Your proposal sounds reasonable to me.

Best,
Liya Fan

On Tue, Sep 22, 2020 at 1:16 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I wanted to give this thread a bump, does the proposal I made below sound
> reasonable?
>
> On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > If I read the responses so far it seems like the following might be a good
> > compromise/summary:
> >
> > 1. It does not seem too invasive to support native endianness in
> > implementation libraries. As long as there is appropriate performance
> > testing and CI infrastructure to demonstrate the changes work.
> > 2. It is up to implementation maintainers if they wish to accept PRs that
> > handle byte swapping between different architectures. (Right now it sounds
> > like C++ is potentially OK with it and for Java at least Jacques is opposed
> > to it?
> >
> > Testing changes that break big-endian can be a potential drag on developer
> > productivity but there are methods to run locally (at least on more recent
> > OSes).
> >
> > Thoughts?
> >
> > Thanks,
> > Micah
> >
> > On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <liya.fa...@gmail.com> wrote:
> >
> >> Thank Kazuaki for the survey and thank Micah for starting the discussion.
> >>
> >> I do not oppose supporting BE. In fact, I am in general optimistic about
> >> the performance impact (for Java).
> >> IMO, this is going to be a painful way (many byte order related problems
> >> are tricky to debug), so I hope we can make it short.
> >>
> >> It is good that someone is willing to take this on, and I would like to
> >> provide help if needed.
> >>
> >> Best,
> >> Liya Fan
> >>
> >> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <cutl...@gmail.com> wrote:
> >>
> >> > I also think this would be a worthwhile addition and help the project
> >> > expand in more areas. Beyond the Apache Spark optimization use case, having
> >> > Arrow interoperability with the Python data science stack on BE would be
> >> > very useful. I have looked at the remaining PRs for Java and they seem
> >> > pretty minimal and straightforward. Implementing the equivalent record
> >> > batch swapping as done in C++ at [1] would be a little more involved, but
> >> > still reasonable. Would it make sense to create a branch to apply all
> >> > remaining changes with CI to get a better picture before deciding on
> >> > bringing into master branch? I could help out with shepherding this effort
> >> > and assist in maintenance, if we decide to accept.
> >> >
> >> > Bryan
> >> >
> >> > [1] https://github.com/apache/arrow/pull/7507
> >> >
> >> > On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >> >
> >> > > I think it's well within the right of an implementation to reject BE
> >> > > data (or non-native-endian), but if an implementation chooses to
> >> > > implement and maintain the endianness conversions, then it does not
> >> > > seem so bad to me.
> >> > >
> >> > > On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <jacq...@apache.org> wrote:
> >> > > >
> >> > > > And yes, for those of you looking closely, I commented on ARROW-245 when it
> >> > > > was committed. I just forgot about it.
> >> > > >
> >> > > > It looks like I had mostly the same concerns then that I do now :) Now I'm
> >> > > > just more worried about format sprawl...
> >> > > >
> >> > > > On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <jacq...@apache.org> wrote:
> >> > > >
> >> > > > > What do you mean? The Endianness field (a Big|Little enum) was added 4
> >> > > > >> years ago:
> >> > > > >> https://issues.apache.org/jira/browse/ARROW-245
> >> > > > >
> >> > > > > I didn't realize that was done, my bad. Good example of format rot from my
> >> > > > > pov.