[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2016-07-04 Thread Takao Magoori (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362015#comment-15362015
 ] 

Takao Magoori commented on SPARK-6764:
--

Sorry. It seems there is no isolated site-packages directory; the work directory
is just added to sys.path.
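
For reference, a minimal sketch of how one might confirm this from a running
application (it assumes an existing SparkContext; the tiny two-partition job is
only there to run code on the executors):
{code}
import sys

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Collect sys.path from the executors: --py-files entries and the executor work
# directory show up as plain path entries, not as an isolated site-packages dir.
executor_paths = sc.parallelize(range(2), 2).map(lambda _: list(sys.path)).collect()
for paths in executor_paths:
    print(paths)
{code}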

> Add wheel package support for PySpark
> -
>
> Key: SPARK-6764
> URL: https://issues.apache.org/jira/browse/SPARK-6764
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, PySpark
>Reporter: Takao Magoori
>Priority: Minor
>  Labels: newbie
>
> We can do _spark-submit_ with one or more Python packages (.egg, .zip and
> .jar) via the *--py-files* option.
> h4. zip packaging
> Spark puts the zip file in its working directory and adds the absolute path to
> Python's sys.path. When the user program imports it,
> [zipimport|https://docs.python.org/2.7/library/zipimport.html] is
> automatically invoked under the hood. That is, data files and dynamic
> modules (.pyd, .so) cannot be used, since zipimport supports only .py, .pyc and
> .pyo.
> h4. egg packaging
> Spark puts the egg file in its working directory and adds the absolute path to
> Python's sys.path. Unlike zipimport, an egg can handle data files and dynamic
> modules as long as the package author uses the [pkg_resources
> API|https://pythonhosted.org/setuptools/formats.html#other-technical-considerations]
> properly. But many Python packages do not use the pkg_resources API, which
> causes "ImportError" or "No such file" errors. Moreover, creating eggs of
> dependencies and their transitive dependencies is a troublesome job.
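> For example, a package that reads its bundled data files through the
> pkg_resources API keeps working from inside a zip/egg, while a plain open()
> on a filesystem path does not. A minimal sketch (the package and file names
> are only examples):
> {code}
> import pkg_resources
>
> # Works even when 'yourpkg' lives inside a zip or egg archive, because
> # pkg_resources extracts the resource transparently if needed.
> config_text = pkg_resources.resource_string('yourpkg', 'data/config.json')
> {code}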
> h4. wheel packaging
> Supporting the new standard Python package format
> "[wheel|https://wheel.readthedocs.org/en/latest/]" would be nice. With wheels,
> we can do spark-submit with complex dependencies simply, as follows.
> 1. Write a requirements.txt file.
> {noformat}
> SQLAlchemy
> MySQL-python
> requests
> simplejson>=3.6.0,<=3.6.5
> pydoop
> {noformat}
> 2. Do the wheel packaging with a single command. All dependencies are wheel-ed.
> {noformat}
> $ your_pip_dir/pip wheel --wheel-dir /tmp/wheelhouse --requirement requirements.txt
> {noformat}
> 3. Do spark-submit
> {noformat}
> your_spark_home/bin/spark-submit --master local[4] --py-files $(find /tmp/wheelhouse/ -name "*.whl" -print0 | sed -e 's/\x0/,/g') your_driver.py
> {noformat}
> If your PySpark driver is a package that consists of many modules:
> 1. Write a setup.py for your PySpark driver package.
> {noformat}
> from setuptools import (
>     find_packages,
>     setup,
> )
> setup(
>     name='yourpkg',
>     version='0.0.1',
>     packages=find_packages(),
>     install_requires=[
>         'SQLAlchemy',
>         'MySQL-python',
>         'requests',
>         'simplejson>=3.6.0,<=3.6.5',
>         'pydoop',
>     ],
> )
> {noformat}
> 2. Do the wheel packaging with a single command. Your driver package and all
> dependencies are wheel-ed.
> {noformat}
> your_pip_dir/pip wheel --wheel-dir /tmp/wheelhouse your_driver_package/.
> {noformat}
> 3. Do spark-submit
> {noformat}
> your_spark_home/bin/spark-submit --master local[4] --py-files $(find /tmp/wheelhouse/ -name "*.whl" -print0 | sed -e 's/\x0/,/g') your_driver_bootstrap.py
> {noformat}






[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2016-07-04 Thread Takao Magoori (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361922#comment-15361922
 ] 

Takao Magoori commented on SPARK-6764:
--

Sorry all, I have been busy with my projects for a long time.

[~gae...@xeberon.net] Thanks for your work.

I checked SPARK-13587. Although virtualenv can create an isolated Python
environment and makes installing all dependencies simple (via
"requirements.txt"), my first impression is that supporting a plain virtualenv
is unnecessary, since each Spark worker already has its own site-packages
directory, much like a virtualenv. Moreover, the feature should support
installing dependencies without an internet connection.
But this is only my first impression. Since I have not followed the Spark
source code and JIRA recently, I will check the current code and the other
tickets.
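
As a concrete illustration of the offline requirement (the wheelhouse path and
requirements file below are only examples), pre-built wheels can be installed
without any internet connection:
{code}
import subprocess
import sys

# --no-index prevents any access to PyPI, so every requirement must be
# resolvable from the .whl files already present in the local wheelhouse.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--no-index",
    "--find-links", "/tmp/wheelhouse",
    "--requirement", "requirements.txt",
])
{code}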




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2016-07-04 Thread Semet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361408#comment-15361408
 ] 

Semet commented on SPARK-6764:
--

Hello,
I am working on a new proposal for complete wheel support, along with
virtualenv. I think this will solve many dependency problems with Python
packages.




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2016-03-01 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174945#comment-15174945
 ] 

Jeff Zhang commented on SPARK-6764:
---

[~msukmanowsky] Can SPARK-13587 solve your issue? I am working on it; any
comments are welcome.




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2016-03-01 Thread Mike Sukmanowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174899#comment-15174899
 ] 

Mike Sukmanowsky commented on SPARK-6764:
-

Just bumping this issue up. We use Spark (PySpark) pretty extensively and would 
love the ability to use wheels in addition to eggs with spark-submit.




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2015-06-26 Thread Punya Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603785#comment-14603785
 ] 

Punya Biswal commented on SPARK-6764:
-

Some packages need to be installed on the workers; it's not enough to just put
archived versions on the PYTHONPATH. Is there a reason to avoid using pip on
the workers?
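
As a small illustration of the first point (the egg path below is hypothetical):
an archive that contains compiled extension modules cannot simply be put on the
path, because the .so files inside it cannot be loaded from the archive.
{code}
import sys

# Hypothetical egg containing C extension modules, merely placed on sys.path.
sys.path.insert(0, '/tmp/work/numpy-1.10.4-py2.7-linux-x86_64.egg')

try:
    import numpy  # usually fails: zipimport cannot load the compiled modules
except ImportError as exc:
    print('an archive on PYTHONPATH is not enough:', exc)
{code}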




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2015-05-21 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554825#comment-14554825
 ] 

Davies Liu commented on SPARK-6764:
---

My first question is: can we use wheel packages just like egg files (without
pip on the workers)? Package or download them on the driver, then send them to
the workers.
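
One possible answer, sketched under the assumption that running pip on the
workers is off the table (the file names below are only examples): a .whl is a
plain zip archive, so it can be unpacked on the worker and the unpacked
directory placed on sys.path, skipping pip entirely (at the cost of the
install-time steps pip would normally perform, such as script generation).
{code}
import sys
import zipfile

wheel_path = '/tmp/wheelhouse/simplejson-3.6.5-cp27-none-linux_x86_64.whl'
target_dir = '/tmp/unpacked_wheels'

# A wheel is a zip archive laid out so that its top level is directly importable.
with zipfile.ZipFile(wheel_path) as whl:
    whl.extractall(target_dir)

sys.path.insert(0, target_dir)
import simplejson  # importable from the extracted wheel, no pip involved
{code}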




[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2015-04-07 Thread Takao Magoori (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484786#comment-14484786
 ] 

Takao Magoori commented on SPARK-6764:
--

Note:
*core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala* should be
modified, since it adds all py-file paths to the environment variable
"PYTHONPATH" before creating the Python (driver?) process.

But I could not modify it myself, since I am not familiar with Scala/Java and
Spark internals.

Basically, I think the following steps are required:
* Add the paths of all Python package files except .whl (that is, all .egg,
.zip and .jar files) to PYTHONPATH
* Install the .whl files by invoking a subprocess with the command
{code}
pythonExec -m pip install --quiet --upgrade --no-deps --no-index --target TEMPORARY_SITE_PACKAGES_DIRECTORY SPACE_SEPARATED_WHL_FILE_PATHS...
{code}
* Then, add the _TEMPORARY_SITE_PACKAGES_DIRECTORY_ to PYTHONPATH.

Is there someone who could modify it?
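
In Python terms (the directory names and the sample file list are hypothetical),
the three steps above would look roughly like this before the driver process is
started:
{code}
import os
import subprocess
import sys
import tempfile

py_files = ['/tmp/deps.egg', '/tmp/libs.zip',
            '/tmp/wheelhouse/requests-2.9.1-py2.py3-none-any.whl']
archives = [p for p in py_files if not p.endswith('.whl')]
wheels = [p for p in py_files if p.endswith('.whl')]

# Wheels are pip-installed into a temporary site-packages directory...
site_dir = tempfile.mkdtemp(prefix='spark-wheelhouse-')
if wheels:
    subprocess.check_call(
        [sys.executable, '-m', 'pip', 'install', '--quiet', '--upgrade',
         '--no-deps', '--no-index', '--target', site_dir] + wheels)

# ...while the other archives plus that directory go onto PYTHONPATH.
existing = os.environ.get('PYTHONPATH')
os.environ['PYTHONPATH'] = os.pathsep.join(
    archives + [site_dir] + ([existing] if existing else []))
{code}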





[jira] [Commented] (SPARK-6764) Add wheel package support for PySpark

2015-04-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484758#comment-14484758
 ] 

Apache Spark commented on SPARK-6764:
-

User 'takaomag' has created a pull request for this issue:
https://github.com/apache/spark/pull/5408
