> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
> while continuing to use a vanilla `python` executable on the executors
Whoops, just to be clear, this should actually read "while continuing to
use a vanilla `python` 2.7 executable".

On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <joshro...@databricks.com> wrote:

> Yep, the driver and executors need to have compatible Python versions. I
> think that there are some bytecode-level incompatibilities between 2.6 and
> 2.7 which would impact the deserialization of Python closures, so I think
> you need to be running the same 2.x version for all communicating Spark
> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
> driver while continuing to use a vanilla `python` executable on the
> executors (we have environment variables which allow you to control these
> separately).
>
> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> I think all the slaves need the same (or a compatible) version of Python
>> installed since they run Python code in PySpark jobs natively.
>>
>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> interesting i didnt know that!
>>>
>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> even if python 2.7 was needed only on this one machine that launches
>>>> the app we can not ship it with our software because its gpl licensed
>>>>
>>>> Not to nitpick, but maybe this is important. The Python license is
>>>> GPL-compatible but not GPL <https://docs.python.org/3/license.html>:
>>>>
>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>> version without making your changes open source. The GPL-compatible
>>>> licenses make it possible to combine Python with other software that is
>>>> released under the GPL; the others don’t.
>>>>
>>>> Nick
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> i do not think so.
>>>>>
>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>> not have direct access to those.
>>>>>
>>>>> also, spark is easy for us to ship with our software since its apache
>>>>> 2 licensed, and it only needs to be present on the machine that launches
>>>>> the app (thanks to yarn).
>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>> the client would have to download it and install it themselves, and this
>>>>> would mean its an independent install which has to be audited and approved
>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshro...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>>>> imagine that they're also capable of installing a standalone Python
>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>>> python version on large company clusters. our current reality for the
>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated
>>>>>>> that is.
>>>>>>>
>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>
>>>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>> dropped.
>>>>>>> no point in developing something that doesnt run for majority of
>>>>>>> customers.
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>
>>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>>> until 2020. So I'm assuming these large companies will have the option
>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>
>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>> 2.6 for the next several years? Even though the core Python devs
>>>>>>>> stopped supporting it in 2013?
>>>>>>>>
>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>> drop support? What are the criteria?
>>>>>>>>
>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But
>>>>>>>> balancing that concern against the maintenance burden on this project,
>>>>>>>> I would say that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a
>>>>>>>> reasonable position to take. There are many tiny annoyances one has to
>>>>>>>> put up with to support 2.6.
>>>>>>>>
>>>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>> ju...@esbet.es> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>
>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>
>>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>> philosophy IMO.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jan 5, 2016, at 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>
>>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>>
>>>>>>>>> so i think its a bad idea
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good
>>>>>>>>>>> enough reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>
>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>> allenzhang...@126.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>
>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> At 2016-01-05 18:11:45, "Meethu Mathew" <meethu.mat...@flytxt.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>> r...@databricks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries
>>>>>>>>>>>>> that Spark depends on stopped supporting 2.6. We can still convince
>>>>>>>>>>>>> the library maintainers to support 2.6, but it will be extra work.
>>>>>>>>>>>>> I'm curious if anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
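
For readers of the archive: the thread mentions environment variables that let the driver and the executors use different Python executables, but does not name them. The sketch below assumes they are `PYSPARK_DRIVER_PYTHON` and `PYSPARK_PYTHON`, as described in Spark's configuration documentation. The helper names `pyspark_env` and `same_minor_version` are hypothetical, made up for illustration, and the version check is only a rough stand-in for the "same 2.x version" rule discussed above.

```python
import os


def pyspark_env(driver_python, executor_python, base_env=None):
    """Return a copy of the environment with the PySpark interpreter variables set.

    Assumption: PYSPARK_DRIVER_PYTHON selects the driver's interpreter and
    PYSPARK_PYTHON the interpreter used elsewhere (the executors).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = driver_python
    env["PYSPARK_PYTHON"] = executor_python
    return env


def same_minor_version(driver_version, executor_version):
    """True when both version tuples share the same major.minor pair, e.g. (2, 7)."""
    return tuple(driver_version[:2]) == tuple(executor_version[:2])


# Example: an ipython driver with plain python2.7 executors.
env = pyspark_env("ipython", "python2.7", base_env={})
ok = same_minor_version((2, 7, 11), (2, 7, 5))
mismatch = same_minor_version((2, 6, 6), (2, 7, 5))
```

The returned dictionary could be passed as the `env` argument when launching a job with `subprocess`; whether a given pair of interpreters is actually compatible still has to be checked on the cluster itself.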