+1 on doing this in 3.0.

On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> I'm +1 if 3.0
>
> *From:* Sean Owen <srowen@gmail.com>
> *Sent:* Monday, March 25, 2019 6:48 PM
> *To:* Hyukjin Kwon
> *Cc:* dev; Bryan Cutler; Takuya UESHIN; shane knapp
> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>
> I don't know a lot about Arrow here, but this seems reasonable. Is this
> for Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
> seems right.
>
> On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon <gurwls223@gmail.com> wrote:
>
>> Hi all,
>>
>> We really need to upgrade the minimal PyArrow version soon. It is
>> slowing down PySpark development; for instance, we currently sometimes
>> have to test against the full matrix of Arrow and Pandas versions, and
>> supporting old versions requires some awkward hacks and ugly code. Some
>> bugs exist in lower versions, and some features are not supported in old
>> PyArrow.
>>
>> Per Bryan's recommendation (he is an Apache Arrow and Spark committer,
>> FWIW), and in my opinion as well, we should increase the minimal version
>> to 0.12.x. (Also note that the Pandas <> Arrow integration is an
>> experimental feature.)
>>
>> So Bryan and I will proceed with this in a few days if there are no
>> objections, assuming we are fine with increasing it to 0.12.x. Please
>> let me know if there are any concerns.
>>
>> For clarification, this requires some Jenkins jobs to upgrade their
>> minimal version of PyArrow (I cc'ed Shane as well).
>>
>> PS: I heard that Shane is busy with some work stuff, but this is
>> important from my perspective.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
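[Editor's note: the version bump discussed above is typically enforced by a runtime check that fails fast when the installed PyArrow is too old. The sketch below is a minimal, hypothetical illustration of such a gate; PySpark's real check lives elsewhere in its codebase and differs in detail.]

```python
def require_minimum_pyarrow_version(installed, minimum="0.12.0"):
    """Raise ImportError if `installed` is older than `minimum`.

    A minimal sketch of a version gate like the one the proposed bump to
    0.12.x would rely on; not PySpark's actual implementation.
    """
    def as_tuple(version):
        # Compare only numeric release components, e.g. "0.12.1" -> (0, 12, 1).
        return tuple(int(p) for p in version.split(".") if p.isdigit())

    if as_tuple(installed) < as_tuple(minimum):
        raise ImportError(
            "PyArrow >= %s must be installed; found %s" % (minimum, installed))


# Example: a 0.12.x install passes, an older one raises ImportError.
require_minimum_pyarrow_version("0.12.1")
```

With a single supported baseline, a check like this replaces per-version workarounds scattered through the code.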