Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Hyukjin Kwon
Hi all,

We really need to upgrade the minimum PyArrow version soon. Supporting old
versions is slowing down PySpark development: for instance, we sometimes
need to test the whole matrix of Arrow and Pandas versions, and it requires
adding some weird hacks and ugly code. Some bugs exist in lower versions,
and some features are not supported in older PyArrow at all.

Per Bryan's recommendation (he is an Apache Arrow and Spark committer,
FWIW), and in my opinion as well, we should increase the minimum version to
0.12.x. (Also, note that the Pandas <> Arrow integration is an experimental
feature.)

So Bryan and I will proceed with this in a few days if there are no
objections, assuming we're fine with increasing it to 0.12.x. Please let me
know if you have any concerns.

For clarification, this requires upgrading the PyArrow version used by some
Jenkins jobs (I cc'ed Shane as well).

PS: I heard that Shane is busy with other work, but this is fairly
important from my perspective.
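
A minimal sketch of the kind of minimum-version gate being discussed,
assuming a simple numeric comparison of version strings is enough. The
constant and helper names below are hypothetical and are not PySpark's
actual code (PySpark keeps its own helper, `require_minimum_pyarrow_version`,
in `pyspark.sql.utils`).

```python
import re

# Floor proposed in this thread; the constant name is hypothetical.
MINIMUM_PYARROW_VERSION = "0.12.0"


def parse_version(version):
    """Return the leading numeric components of a version string as ints.

    '0.12.1' -> (0, 12, 1); '0.13.0.dev42' -> (0, 13, 0). Suffixes such as
    'rc1' or 'dev42' are ignored, so '0.12.0rc1' compares equal to '0.12.0'.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # a purely non-numeric component such as 'dev42'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # a numeric prefix followed by a suffix, e.g. '0rc1'
    return tuple(parts)


def is_at_least(installed, minimum):
    """True if `installed` >= `minimum`, padding missing parts with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def require_minimum_version(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Raise ImportError when the installed PyArrow is older than `minimum`."""
    if not is_at_least(installed, minimum):
        raise ImportError(
            "PyArrow >= %s must be installed; however, your version was %s."
            % (minimum, installed)
        )
```

In practice the installed version string would come from
`pyarrow.__version__`. With a single floor, raising it to 0.12.x means
changing one constant rather than keeping version-specific workarounds for
every older release.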


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread shane knapp
thanks for the heads up...  i'll test deploy this tomorrow and see what
gotchas turn up.  we may need to upgrade from python 3.4 to 3.5 IIRC.


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Sean Owen
I don't know a lot about Arrow here, but this seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest version for Spark 3
seems right.


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Felix Cheung
I’m +1 if 3.0






Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Reynold Xin
+1 on doing this in 3.0.


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-26 Thread Bryan Cutler
Thanks Hyukjin.  The plan is to get this done for 3.0 only.  Here is a link
to the JIRA: https://issues.apache.org/jira/browse/SPARK-27276.  Shane is
also correct that newer versions of pyarrow have dropped support for
Python 3.4, so we should probably have Jenkins test against 2.7 and 3.5.



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-26 Thread shane knapp
i'm pretty certain that i've got a solid python 3.5 conda environment ready
to be deployed, but this isn't a minor change to the build system and there
might be some bugs to iron out.

another problem is that the current python 3.4 environment is hard-coded
into both the build scripts on jenkins (all over the place) and the
codebase (thankfully in only one spot):

    export PATH=/home/anaconda/envs/py3k/bin:$PATH

this means that every branch (master, 2.x, etc) will test against whatever
version of python lives in that conda environment.  if we upgrade to 3.5,
all branches will test against this version.  changing the build and test
infra to support testing against 2.7, 3.4 or 3.5 based on branch is
definitely non-trivial...

thoughts?
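
One way to make the Python version a per-branch decision, instead of a
single hard-coded PATH, is a lookup table that the build scripts consult.
This is a hypothetical sketch: the `py36` environment name and the
branch-to-environment mapping are assumptions for illustration, not the
actual Jenkins configuration.

```python
import posixpath

# Root of the conda environments on the Jenkins workers (from the thread).
CONDA_ROOT = "/home/anaconda/envs"

# Hypothetical branch -> conda environment mapping. 'py3k' is the existing
# environment; 'py36' is an assumed name for a newer one.
BRANCH_ENVS = {
    "branch-2.3": "py3k",
    "branch-2.4": "py3k",
    "master": "py36",
}
DEFAULT_ENV = "py3k"


def python_bin_dir(branch):
    """Return the bin directory of the conda env assigned to `branch`."""
    env = BRANCH_ENVS.get(branch, DEFAULT_ENV)
    return posixpath.join(CONDA_ROOT, env, "bin")


def build_path(branch, current_path=""):
    """Prepend the branch's Python bin dir to a POSIX-style PATH string.

    ':' is hard-coded as the separator because the Jenkins workers discussed
    here are Linux machines.
    """
    bin_dir = python_bin_dir(branch)
    if not current_path:
        return bin_dir  # avoid a trailing ':' (which would mean "current dir")
    return bin_dir + ":" + current_path
```

A build script could print `build_path(branch, os.environ["PATH"])` and
export the result, so the per-branch choice lives in one table rather than
being repeated across jobs.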






Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
That’s not necessarily bad. I don’t know if we plan to ever release any 
new 2.2.x or 2.3.x at this point, and we can announce this change in the 
“supported version” of Python for any new 2.4 release.

Besides, we could still support Python 3.4; it’s just more complicated to 
test manually without Jenkins coverage.





Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Hyukjin Kwon
Bryan, was there an actual change in PyArrow that dropped Python 3.4? If
not, I think we might be able to increase the minimum Arrow version
separately.
If there was, it looks inevitable that we upgrade Jenkins's Python from 3.4
to 3.5.



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread shane knapp
> If there was, it looks inevitable to upgrade Jenkins's Python from 3.4 to
> 3.5.

this is inevitable.  3.4's final release was 10 days ago
(https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread shane knapp
looks like the same for 3.5...   https://www.python.org/dev/peps/pep-0478/

let's pick a python version and start testing.



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
3.4 is end of life, but 3.5 is not. From your link:

“we expect to release Python 3.5.8 around September 2019.”






Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Bryan Cutler
PyArrow dropping Python 3.4 was mainly due to support going away at
Conda-Forge and other dependencies also dropping it.  I think we'd better
upgrade Jenkins's Python while we are at it.  Would anyone object to
jumping to Python 3.6 so we are not in the same boat in September?



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread shane knapp
i'm not opposed to 3.6 at all.



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Felix Cheung
I don’t read that as Sept 2019 being the end of life for Python 3.5, though. 
It’s just saying when the next release is.

In any case, I think it would be great to get more Python 3.x test coverage 
in the next release.






Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
i'd much prefer that we minimize the number of python versions that we test
against...  would 2.7 and 3.6 be sufficient?



Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
well now!  color me completely surprised...  i decided to whip up a fresh
python3.6.8 conda environment this morning to "see if things just worked".

well, apparently they do!  :)

regardless, this is pretty awesome news as i will be able to easily update
the 'py3k' python3.4 environment to a fresh, less bloated, but still
package-complete python3.6.8 environment (including pyarrow 0.12.0, pandas
0.24.2, scipy 1.2.1).
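
Before switching every worker over, a quick check that the new environment
actually provides the expected versions can catch surprises early. The
sketch below is hypothetical: the floors simply echo the versions mentioned
in this message, and `check_environment` / `collect_installed` are
illustrative names, not part of any Jenkins or Spark tooling.

```python
import importlib
import sys

# Hypothetical floors echoing the versions mentioned in this message.
EXPECTED = {
    "python": (3, 6),
    "pyarrow": (0, 12, 0),
    "pandas": (0, 24),
    "scipy": (1, 2),
}


def to_tuple(version):
    """Leading all-digit components of a version string: '0.24.2' -> (0, 24, 2)."""
    nums = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        nums.append(int(piece))
    return tuple(nums)


def older_than(version, floor):
    """Compare a version string with a floor tuple, padding with zeros."""
    v = to_tuple(version)
    width = max(len(v), len(floor))
    v += (0,) * (width - len(v))
    f = tuple(floor) + (0,) * (width - len(floor))
    return v < f


def collect_installed(names=("pyarrow", "pandas", "scipy")):
    """Gather versions from the running interpreter; missing packages are skipped."""
    found = {"python": "%d.%d.%d" % tuple(sys.version_info[:3])}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        found[name] = getattr(module, "__version__", "0")
    return found


def check_environment(installed, expected=EXPECTED):
    """Return human-readable problems; an empty list means the env looks OK."""
    problems = []
    for name, floor in sorted(expected.items()):
        version = installed.get(name)
        if version is None:
            problems.append("%s is not installed" % name)
        elif older_than(version, floor):
            floor_text = ".".join(str(n) for n in floor)
            problems.append("%s %s is older than %s" % (name, version, floor_text))
    return problems
```

Running `print(check_environment(collect_installed()))` inside the new
environment on each worker (ubuntu and centos alike) should print an empty
list once everything is in place.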

i tested this pretty extensively today on both the ubuntu and centos
workers, and i think i'm ready to pull the trigger for a build-system-wide
upgrade...   however, i'll be out wednesday through friday this week and
don't want to make a massive change before disappearing for a few days.

so:  how does early next week sound for the python upgrade?  :)

shane

On Mon, Apr 1, 2019 at 8:58 AM shane knapp  wrote:

> i'd much prefer that we minimize the number of python versions that we
> test against...  would 2.7 and 3.6 be sufficient?
>
> On Fri, Mar 29, 2019 at 10:23 PM Felix Cheung 
> wrote:
>
>> I don’t take it as Sept 2019 is end of life for python 3.5 tho. It’s just
>> saying the next release.
>>
>> In any case I think in the next release it will be great to get more
>> Python 3.x release test coverage.
>>
>>
>>
>> --
>> *From:* shane knapp 
>> *Sent:* Friday, March 29, 2019 4:46 PM
>> *To:* Bryan Cutler
>> *Cc:* Felix Cheung; Hyukjin Kwon; dev
>> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>>
>> i'm not opposed to 3.6 at all.
>>
>> On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler  wrote:
>>
>>> PyArrow dropping Python 3.4 was mainly due to support going away at
>>> Conda-Forge and other dependencies also dropping it.  I think we better
>>> upgrade Jenkins Python while we are at it.  Are you all against jumping to
>>> Python 3.6 so we are not in the same boat in September?
>>>
>>> On Thu, Mar 28, 2019 at 7:58 PM Felix Cheung 
>>> wrote:
>>>
>>>> 3.4 is end of life but 3.5 is not. From your link
>>>>
>>>> we expect to release Python 3.5.8 around September 2019.
>>>>
>>>>
>>>>
>>>> --
>>>> *From:* shane knapp 
>>>> *Sent:* Thursday, March 28, 2019 7:54 PM
>>>> *To:* Hyukjin Kwon
>>>> *Cc:* Bryan Cutler; dev; Felix Cheung
>>>> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x
>>>> [SPARK-27276]
>>>>
>>>> looks like the same for 3.5...
>>>> https://www.python.org/dev/peps/pep-0478/
>>>>
>>>> let's pick a python version and start testing.
>>>>
>>>> On Thu, Mar 28, 2019 at 7:52 PM shane knapp 
>>>> wrote:
>>>>
>>>>>
>>>>>> If there was, it looks inevitable to upgrade Jenkins's Python from
>>>>>> 3.4 to 3.5.
>>>>>>
>>>>> this is inevitable.  3.4's final release was 10 days ago
>>>>> (https://www.python.org/dev/peps/pep-0429/), so we're basically EOL.
>>>>>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-02 Thread Bryan Cutler
Nice work Shane! That all sounds good to me.  We might want to use pyarrow
0.12.1, though; a major bug was fixed in that release, but we can discuss
it in the PR.  I will put up the code changes in the next few days.

Felix, I think you're right about Python 3.5: they just list one upcoming
release, and that's not necessarily the last. Comparing the release
histories, though, its end of life might still come soon. I think using 3.6
will be fine; as a point of reference, pyarrow CI uses 2.7 and 3.6.

On Mon, Apr 1, 2019 at 3:09 PM shane knapp  wrote:

> well now!  color me completely surprised...  i decided to whip up a fresh
> python3.6.8 conda environment this morning to "see if things just worked".
>
> well, apparently they do!  :)
>
> regardless, this is pretty awesome news as i will be able to easily update
> the 'py3k' python3.4 environment to a fresh, less bloated, but still
> package-complete python3.6.8 environment (including pyarrow 0.12.0, pandas
> 0.24.2, scipy 1.2.1).
>
> i tested this pretty extensively today on both the ubuntu and centos
> workers, and i think i'm ready to pull the trigger for a build-system-wide
> upgrade...   however, i'll be out wednesday through friday this week and
> don't want to make a massive change before disappearing for a few days.
>
> so:  how does early next week sound for the python upgrade?  :)
>
> shane
>


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-02 Thread shane knapp
i am totally fine w/waiting a few days for the latest arrow release...  not
at all a problem.

On Tue, Apr 2, 2019 at 9:14 AM Bryan Cutler  wrote:

> Nice work Shane! That all sounds good to me.  We might want to use pyarrow
> 0.12.1, though; a major bug was fixed in that release, but we can discuss
> it in the PR.  I will put up the code changes in the next few days.
>
> Felix, I think you're right about Python 3.5: they just list one upcoming
> release, and that's not necessarily the last. Comparing the release
> histories, though, its end of life might still come soon. I think using 3.6
> will be fine; as a point of reference, pyarrow CI uses 2.7 and 3.6.
>

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu