[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471014#comment-17471014 ]

jingxiong zhong commented on SPARK-37708:
-----------------------------------------

[~hyukjin.kwon] In the end, we found that the operating system was different and that Python would not run in the image. If we use a CentOS-based system, it works normally.

> pyspark adding third-party Dependencies on k8s
> ----------------------------------------------
>
> Key: SPARK-37708
> URL: https://issues.apache.org/jira/browse/SPARK-37708
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark
> Affects Versions: 3.2.0
> Environment: pyspark3.2
> Reporter: jingxiong zhong
> Priority: Major
>
> I have a question about how to add my Python dependencies to a Spark job, as follows:
> {code:sh}
> spark-submit \
>     --archives s3a://path/python3.6.9.tgz#python3.6.9 \
>     --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \
>     --conf "spark.pyspark.python=python3.6.9/bin/python3" \
>     --name "piroottest" \
>     ./examples/src/main/python/pi.py 10
> {code}
> This does not run my job successfully; it throws an error:
> {code:sh}
> Traceback (most recent call last):
>   File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in <module>
>     from pyspark.sql import SparkSession
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in <module>
>     async def _ag():
>   File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in <module>
>     from _ctypes import Union, Structure, Array
> ImportError: libffi.so.6: cannot open shared object file: No such file or directory
> {code}
> Or is there another way to add Python dependencies?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
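A quick way to confirm the mismatch described above is to try importing ctypes with the interpreter inside the running container: importing ctypes loads libffi, so the import fails with exactly this libffi error when the interpreter was built against libraries the image does not ship. A minimal sketch (the relative interpreter path is taken from the --archives example in the issue; running these commands inside the container is assumed):

{code:sh}
# If ctypes imports cleanly, the libffi shared-library dependency is
# satisfied for this interpreter on this system.
python3 -c "import ctypes; print('ctypes OK')"

# To list which shared libraries a bundled interpreter cannot resolve:
# ldd python3.6.9/bin/python3 | grep "not found"
{code}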
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471012#comment-17471012 ]

Apache Spark commented on SPARK-37708:
--------------------------------------

User 'zhongjingxiong' has created a pull request for this issue:
https://github.com/apache/spark/pull/35142
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465293#comment-17465293 ]

Hyukjin Kwon commented on SPARK-37708:
--------------------------------------

[~zhongjingxiong] If it works, we should try. I think it should better match the OS in general.
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465060#comment-17465060 ]

jingxiong zhong commented on SPARK-37708:
-----------------------------------------

[~hyukjin.kwon] I found that some packages, such as pandas and NLTK, had to be downloaded. Can I change the default operating system of the Dockerfile from Debian to CentOS 6/7?
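For reference, the Spark distribution's docker-image-tool.sh accepts a custom Dockerfile for the PySpark image via its -p option, so a CentOS-based image can be built without patching the bundled Debian Dockerfile. A sketch only; the repository name, tag, and Dockerfile path are examples, not from this thread:

{code:sh}
# Build a PySpark image from a custom (e.g. CentOS-based) Dockerfile.
# -r: target repository, -t: image tag, -p: Dockerfile for the PySpark image.
./bin/docker-image-tool.sh -r myrepo -t 3.2.0-centos \
    -p ./centos-python/Dockerfile build
{code}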
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465058#comment-17465058 ]

jingxiong zhong commented on SPARK-37708:
-----------------------------------------

I used wget to download and compile the Python source code, but it seems Python 3.6 is not supported by Spark 3.2.
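Note that when compiling CPython from source, the _ctypes extension is only built correctly if the libffi development headers are present at configure time; without them the interpreter builds but "import ctypes" fails later, much as in the traceback above. A sketch of such a build, assuming a CentOS-style host (package names and the install prefix vary by distribution and are examples):

{code:sh}
# Install libffi headers before configuring CPython, otherwise _ctypes
# (and therefore "import ctypes") will be missing or broken at runtime.
yum install -y libffi-devel        # Debian/Ubuntu: apt-get install libffi-dev

./configure --prefix=/opt/python3.6.9
make -j"$(nproc)"
make install
{code}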
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464194#comment-17464194 ]

Hyukjin Kwon commented on SPARK-37708:
--------------------------------------

How was "python3.6.9.tgz" created?
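For context, the PySpark documentation describes packing a complete Python environment for --archives with conda-pack (or venv-pack for virtualenvs). A sketch of that workflow; the environment name and package list here are examples, not the reporter's actual setup:

{code:sh}
# Create an environment with the needed third-party packages, then pack it
# into a relocatable tarball suitable for spark-submit --archives.
conda create -y -n pyspark_env python=3.6 pandas nltk
pip install conda-pack
conda pack -n pyspark_env -f -o python3.6.9.tgz
{code}

Because conda-pack rewrites interpreter paths for relocation, the resulting tarball still depends on the system shared libraries of the OS it was built on, which is exactly why the base OS of the image matters here.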