I agree with Bryan. If it's acceptable to have another job to test with Python 3.5 and pyarrow 0.10.0, I am leaning towards upgrading Arrow.
Arrow 0.10.0 has tons of bug fixes and improvements over 0.8.0, including important memory leak fixes such as https://issues.apache.org/jira/browse/ARROW-1973. I think releasing with 0.10.0 will improve the overall experience of Arrow-related features quite a bit.

I also think it's a good idea to test against newer Python versions. But I don't know how difficult it is, or whether it's feasible to resolve that between branch cut and RC cut.

On Fri, Aug 10, 2018 at 5:44 PM, shane knapp <skn...@berkeley.edu> wrote:

> see: https://github.com/apache/spark/pull/21939#issuecomment-412154343
>
> yes, i can set up a build. have some Qs in the PR about building the
> spark package before running the python tests.
>
> On Fri, Aug 10, 2018 at 10:41 AM, Bryan Cutler <cutl...@gmail.com> wrote:
>
>> I agree that we should hold off on the Arrow upgrade if it requires major
>> changes to our testing. I did have another thought that maybe we could just
>> add another job to test against Python 3.5 and pyarrow 0.10.0 and keep all
>> current testing the same? I'm not sure how doable that is right now and
>> don't want to make a ton of extra work, so no objections from me to hold
>> off on things for now.
>>
>> On Fri, Aug 10, 2018 at 9:48 AM, shane knapp <skn...@berkeley.edu> wrote:
>>
>>> On Fri, Aug 10, 2018 at 9:47 AM, Wenchen Fan <cloud0...@gmail.com>
>>> wrote:
>>>
>>>> It seems safer to skip the arrow 0.10.0 upgrade for Spark 2.4 and leave
>>>> it to Spark 3.0, so that we have more time to test. Any objections?
>>>>
>>>
>>> none here.
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu