[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090451#comment-15090451 ]

Reynold Xin commented on SPARK-922:
-----------------------------------

cc [~shivaram]

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Reporter: Josh Rosen
> Priority: Blocker
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169331#comment-14169331 ]

Nicholas Chammas commented on SPARK-922:
----------------------------------------

[~joshrosen] - Do you mean [this script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? It doesn't seem to have anything related to Python 2.7.

Anyway, what I meant was whether you were open to holding off on updating the Spark AMIs until we had also figured out how to automate that process per [SPARK-3821]. I should have something for that as soon as this week or next.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169380#comment-14169380 ]

Josh Rosen commented on SPARK-922:
----------------------------------

[~nchammas] - I don't think that there's an urgent rush to update the AMIs before the next round of releases, so I'm fine with waiting to incorporate this into SPARK-3821.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168787#comment-14168787 ]

Andrew Davidson commented on SPARK-922:
---------------------------------------

Wow, upgrading matplotlib was a bear. The following worked for me. The trick was getting the correct version of the source code.

The recipe below is not 100% correct: I have not figured out how to use pssh with yum, because yum prompts you y/n before downloading.

    pip2.7 install six
    pssh -t0 -h /root/spark-ec2/slaves pip2.7 install six

    pip2.7 install python-dateutil
    pssh -t0 -h /root/spark-ec2/slaves pip2.7 install python-dateutil

    pip2.7 install pyparsing
    pssh -t0 -h /root/spark-ec2/slaves pip2.7 install pyparsing

    yum install yum-utils
    wget https://github.com/matplotlib/matplotlib/archive/master.tar.gz
    tar -zxvf master.tar.gz
    cd matplotlib-master/
    yum install freetype-devel
    yum install libpng-devel
    python2.7 setup.py build
    python2.7 setup.py install

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
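Editor's sketch, not part of the original comment: the recipe above installs six, python-dateutil and pyparsing before building matplotlib from source. A small check like the following, run with the target interpreter on each node, could confirm that those modules are importable before the build starts. The helper name `missing_modules` is hypothetical, and the module list is taken only from the commands in the comment.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # These are import names, not pip package names:
    # the python-dateutil package is imported as "dateutil".
    prerequisites = ["six", "dateutil", "pyparsing"]
    print("missing: %s" % missing_modules(prerequisites))
```

An empty list suggests the prerequisites are visible to the interpreter that ran the script. It says nothing about the system libraries (freetype-devel, libpng-devel) that the build also needs.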
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168788#comment-14168788 ]

Andrew Davidson commented on SPARK-922:
---------------------------------------

I also forgot to mention that there are a couple of steps on http://nbviewer.ipython.org/gist/JoshRosen/6856670 that are important in the upgrade process:

    #
    # restart spark
    #
    /root/spark/sbin/stop-all.sh
    /root/spark/sbin/start-all.sh

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168931#comment-14168931 ]

Josh Rosen commented on SPARK-922:
----------------------------------

[~nchammas]: It would be great to include Python 2.7 in the next AMI; I think our current AMI shell script has it, though.

[~aedwip]:

{quote}
I have not figured out how to use pssh with yum. yum prompts you y/n before downloading
{quote}

Try {{yum install -y}}.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168459#comment-14168459 ]

Nicholas Chammas commented on SPARK-922:
----------------------------------------

[~joshrosen] Are you open to having this resolved as part of [SPARK-3821]?

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168018#comment-14168018 ]

Davies Liu commented on SPARK-922:
----------------------------------

We don't use json heavily in PySpark, and users have several choices of JSON library in Python, so I think this should not be an issue.

We definitely need to upgrade to Python 2.7 (as the default). If some users need Python 2.6, it's easy to select it with PYSPARK_PYTHON.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
> Reporter: Josh Rosen
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159284#comment-14159284 ]

Josh Rosen commented on SPARK-922:
----------------------------------

I chatted with someone who had a job that ran ~20x slower on Python 2.6 than on 2.7, likely due to changes to the json library (in 2.7+, json is implemented as a C extension rather than in pure Python). Maybe we should add a note on this to the docs.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.2.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
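Editor's sketch, not part of the original comment: one rough way to compare json performance across interpreters is to time a dump/load round trip under each of them. The function names and the record shape below are made up for illustration; the accelerator check reads `json.encoder.c_make_encoder` through `getattr`, on the assumption that an interpreter without the C encoder either lacks that attribute or sets it to `None`.

```python
import json
import timeit


def has_c_json_accelerator():
    """Report whether the json module found a C-implemented encoder."""
    return getattr(json.encoder, "c_make_encoder", None) is not None


def roundtrip_seconds(num_records=1000, repeats=5):
    """Best-of-`repeats` time to dump and reload a list of small records."""
    records = [
        {"id": i, "name": "user%d" % i, "tags": ["a", "b"]}
        for i in range(num_records)
    ]
    timer = timeit.Timer(lambda: json.loads(json.dumps(records)))
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    print("C accelerator available: %s" % has_c_json_accelerator())
    print("round trip: %.6f s" % roundtrip_seconds())
```

Running the same file under each interpreter on the same machine gives two numbers to compare. A synthetic round trip like this will not necessarily reproduce the ~20x gap reported for a real job.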
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150695#comment-14150695 ]

Andrew Davidson commented on SPARK-922:
---------------------------------------

I must have missed something. I am running an IPython notebook over an ssh tunnel, and it is still using the old version. I made sure to

    export PYSPARK_PYTHON=python2.7

I also tried

    export PYSPARK_PYTHON=/usr/bin/python2.7

    import IPython
    print IPython.sys_info()
    {'commit_hash': '858d539',
     'commit_source': 'installation',
     'default_encoding': 'UTF-8',
     'ipython_path': '/usr/lib/python2.6/site-packages/ipython-0.13.2-py2.6.egg/IPython',
     'ipython_version': '0.13.2',
     'os_name': 'posix',
     'platform': 'Linux-3.4.37-40.44.amzn1.x86_64-x86_64-with-glibc2.2.5',
     'sys_executable': '/usr/bin/python2.6',
     'sys_platform': 'linux2',
     'sys_version': '2.6.9 (unknown, Sep 13 2014, 00:25:11) \n[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)]'}

Here is how I am launching the IPython notebook. I am running as the ec2-user:

    IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" $SPARK_HOME/bin/pyspark

Below are all the upgrade commands I ran. Any idea what I missed?

Andy

    yum install -y pssh
    yum install -y python27 python27-devel
    pssh -h /root/spark-ec2/slaves yum install -y python27 python27-devel
    wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python27
    pssh -h /root/spark-ec2/slaves "wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python27"
    easy_install-2.7 pip
    pssh -h /root/spark-ec2/slaves easy_install-2.7 pip
    pip2.7 install numpy
    pssh -t0 -h /root/spark-ec2/slaves pip2.7 install numpy
    pip2.7 install ipython[all]
    printf "\n# Set Spark Python version\nexport PYSPARK_PYTHON=/usr/bin/python2.7\n" >> /root/spark/conf/spark-env.sh
    source /root/spark/conf/spark-env.sh

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.2.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
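Editor's sketch, not part of the original comment: the pasted `sys_info()` output reports `sys_executable` as `/usr/bin/python2.6` and an IPython installed under the Python 2.6 site-packages, so the notebook process itself appears to be running on 2.6. A check like the one below, run in a notebook cell, shows which interpreter is executing and what PYSPARK_PYTHON is set to in that process. The function names are hypothetical.

```python
import os
import sys


def interpreter_report(environ=None):
    """Summarize the running interpreter and the PYSPARK_PYTHON setting it sees."""
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "pyspark_python": environ.get("PYSPARK_PYTHON"),
    }


def is_at_least(major, minor):
    """True if the running interpreter is at least version major.minor."""
    return tuple(sys.version_info[:2]) >= (major, minor)


if __name__ == "__main__":
    report = interpreter_report()
    for key in sorted(report):
        print("%s: %s" % (key, report[key]))
    print("at least 2.7: %s" % is_at_least(2, 7))
```

If `executable` and `pyspark_python` disagree, the front end was started by a different interpreter than the one the environment variable names. That may be what happened here, though the thread does not confirm the cause.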
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135860#comment-14135860 ]

Josh Rosen commented on SPARK-922:
----------------------------------

[~nchammas] In the long run, it might be nice to automate the AMI creation / upgrade process so that changes like this can be done in build configuration files. We might have some internal tooling for this; I'll ask around and find out.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.2.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134800#comment-14134800 ]

Nicholas Chammas commented on SPARK-922:
----------------------------------------

[~joshrosen] By the way, as part of this work to update the AMIs, can we also have them include the latest security patches and updates? It's a good practice, and we also suspect that it would [improve our EC2 startup time|https://github.com/apache/spark/pull/2339#issuecomment-55483793].

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.2.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115636#comment-14115636 ]

Nicholas Chammas commented on SPARK-922:
----------------------------------------

FYI, I believe the line to install numpy on the slaves should read:

{code}
pssh -t0 -h /root/spark-ec2/slaves pip2.7 install numpy
{code}

i.e. change the position of the {{-t0}}.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.1.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098824#comment-14098824 ]

Josh Rosen commented on SPARK-922:
----------------------------------

Updated script, which also updates numpy:

{code}
yum install -y pssh
yum install -y python27 python27-devel
pssh -h /root/spark-ec2/slaves yum install -y python27 python27-devel
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python27
pssh -h /root/spark-ec2/slaves "wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python27"
easy_install-2.7 pip
pssh -h /root/spark-ec2/slaves easy_install-2.7 pip
pip2.7 install numpy
pssh -h /root/spark-ec2/slaves pip2.7 install numpy
{code}

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.1.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
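Editor's sketch, not part of the original comment: the script above repeats each master command on the slaves through pssh, and one of those commands contains a pipe that has to run remotely rather than in the local shell. A helper like the hypothetical `pssh_argv` below builds the argument list with the options ahead of the command and keeps the remote command as a single argument. It relies only on pssh's `-h` (hosts file) and `-t` (timeout, where 0 means no timeout) options.

```python
def pssh_argv(hosts_file, remote_command, timeout=None):
    """Build a pssh argument list; options come before the remote command."""
    argv = ["pssh"]
    if timeout is not None:
        argv += ["-t", str(timeout)]
    # The remote command stays one argument, so any pipe inside it
    # is interpreted on the remote hosts, not locally.
    argv += ["-h", hosts_file, remote_command]
    return argv


if __name__ == "__main__":
    slaves = "/root/spark-ec2/slaves"
    print(pssh_argv(slaves, "pip2.7 install numpy", timeout=0))
    print(pssh_argv(slaves, "yum install -y python27 python27-devel"))
```

Passing the resulting list to `subprocess.call` without `shell=True` avoids local shell interpretation altogether; whether that suits a given cluster setup is left to the reader.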
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099049#comment-14099049 ]

Nicholas Chammas commented on SPARK-922:
----------------------------------------

Josh, at the end of your updated script do we still also need the step to edit {{spark-env.sh}}?

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.1.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099089#comment-14099089 ]

Josh Rosen commented on SPARK-922:
----------------------------------

Yeah, you still need to set PYSPARK_PYTHON since this doesn't overwrite the system Python. I was updating this to brain-dump the script I'm using for a Python 2.6 vs Python 2.7 benchmark.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 0.9.1, 1.0.0
> Reporter: Josh Rosen
> Fix For: 1.1.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.
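Editor's sketch, not part of the original comment: because the new interpreter is installed alongside the system Python, PYSPARK_PYTHON has to be set for whatever process launches PySpark. Besides editing spark-env.sh, the variable can be supplied per launch. The helper name `env_with_pyspark_python` is hypothetical; the interpreter path is the one used elsewhere in this thread.

```python
import os


def env_with_pyspark_python(python_path, base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON set to `python_path`."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_path
    return env


if __name__ == "__main__":
    env = env_with_pyspark_python("/usr/bin/python2.7")
    print("PYSPARK_PYTHON=%s" % env["PYSPARK_PYTHON"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.call` when starting a launcher script, which leaves the parent process's own environment untouched.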
[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7
[ https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985219#comment-13985219 ]

Patrick Wendell commented on SPARK-922:
---------------------------------------

This is no longer a blocker now that we've downgraded the Python dependency, but it would still be nice to have.

> Update Spark AMI to Python 2.7
> ------------------------------
>
> Key: SPARK-922
> URL: https://issues.apache.org/jira/browse/SPARK-922
> Project: Spark
> Issue Type: Task
> Components: EC2, PySpark
> Affects Versions: 0.9.0, 1.0.0, 0.9.1
> Reporter: Josh Rosen
> Fix For: 1.1.0
>
> Many Python libraries only support Python 2.7+, so we should make Python 2.7 the default Python on the Spark AMIs.