Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files 
http://10.75.145.25:80/Spark/getNN.py” are being downloaded and are available 
in the container at 
“/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I guess we need to add this path to PYTHONPATH as well, with the following 
change in entrypoint.sh:


if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
    # <download-dir> is the directory where the dependent files are downloaded
    # and available in the container, for example
    # /var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/
    PYTHONPATH="$PYTHONPATH:<download-dir>"
fi
Let me know if this approach is fine, and please correct me if my 
understanding is wrong.
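
To illustrate why the PYTHONPATH entry would need to be the download directory rather than the --py-files value itself, here is a minimal Python 3 sketch. It assumes that PYSPARK_FILES holds the value exactly as submitted (a URL or a path to a single .py file); the helper name can_import and the temporary directory standing in for the container path are hypothetical. Python only searches directories (and zip archives) listed in PYTHONPATH, so a URL or a plain .py file entry is ignored.

```python
import os
import subprocess
import sys
import tempfile


def can_import(module, pythonpath_entry):
    """Return True if `module` imports in a fresh interpreter whose
    PYTHONPATH is set to `pythonpath_entry`."""
    env = dict(os.environ, PYTHONPATH=pythonpath_entry)
    # Run from an empty directory so the current directory cannot
    # satisfy the import by accident.
    empty_cwd = tempfile.mkdtemp()
    result = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        cwd=empty_cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical stand-in for the download directory inside the container.
download_dir = tempfile.mkdtemp(prefix="spark-")
with open(os.path.join(download_dir, "getNN.py"), "w") as f:
    f.write("VALUE = 1\n")

# PYTHONPATH set to the URL as passed to --py-files.
url_entry_works = can_import("getNN", "http://10.75.145.25:80/Spark/getNN.py")
# PYTHONPATH set to the downloaded file itself.
file_entry_works = can_import("getNN", os.path.join(download_dir, "getNN.py"))
# PYTHONPATH set to the directory that contains the downloaded file.
dir_entry_works = can_import("getNN", download_dir)

print("URL entry:", url_entry_works)
print("file entry:", file_entry_works)
print("directory entry:", dir_entry_works)
```

Only the directory entry makes getNN importable, which is consistent with the proposed change, although this sketch does not show what entrypoint.sh actually receives in PYSPARK_FILES.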

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko <i...@cornell.edu>; liyinan...@gmail.com
Cc: Spark dev list <d...@spark.apache.org>; user@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/Yinan,
Yes, my test case is similar to the one described in 
https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster \
    --master k8s://https://10.75.145.23:8443 \
    --conf spark.app.name=spark-py \
    --properties-file /tmp/program_files/spark_py.conf \
    --py-files http://10.75.145.25:80/Spark/getNN.py \
    http://10.75.145.25:80/Spark/test.py

The following error is observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
  File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
    from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory 
/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same behaviour as described in 
https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and 
available in the pod).
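
One way to confirm where the file ends up relative to the interpreter's search path is to print that information from the main script before the failing import. The following is a minimal sketch, assuming it is pasted at the top of test.py; the function names dirs_providing and report are hypothetical and not part of Spark.

```python
import os
import sys


def dirs_providing(module_name, search_path=None):
    """List the search-path entries that contain `<module_name>.py`."""
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(entry or os.getcwd(), module_name + ".py")
        if os.path.isfile(candidate):
            hits.append(entry)
    return hits


def report(module_name):
    """Print PYTHONPATH, sys.path, and the entries that hold the module."""
    print("PYTHONPATH = %s" % os.environ.get("PYTHONPATH", "<unset>"))
    for entry in sys.path:
        print("sys.path entry: %r" % entry)
    print("entries providing %s: %r" % (module_name, dirs_providing(module_name)))


# Hypothetical use at the top of test.py, before `from getNN import *`:
# report("getNN")
```

An empty list in the last line of the report would indicate that the downloaded getNN.py sits in a directory the interpreter never searches.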

The same happens with local files:

./spark-submit --deploy-mode cluster \
    --master k8s://https://10.75.145.23:8443 \
    --conf spark.app.name=spark-py \
    --properties-file /tmp/program_files/spark_py.conf \
    --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py has dependencies from getNN.py.
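
Until the image resolves the dependency itself, a possible stopgap inside test.py is to locate the downloaded file and add its directory to sys.path before the import. This is only a sketch under assumptions: the helper add_dir_containing is hypothetical, and the /var/data root is taken from the container path quoted at the top of this thread and may differ in other images.

```python
import os
import sys


def add_dir_containing(filename, roots):
    """Prepend to sys.path the first directory under `roots` that holds
    `filename`; return that directory, or None if nothing was found."""
    for root in roots:
        # os.walk yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                if dirpath not in sys.path:
                    sys.path.insert(0, dirpath)
                return dirpath
    return None


# Hypothetical use at the top of test.py:
# add_dir_containing("getNN.py", ["/var/data"])
# from getNN import *
```

Walking the directory tree at startup is a workaround rather than a fix; setting PYTHONPATH correctly in the image would avoid it.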


However, the same works on the Spark 2.2 k8s branch.


Regards
Surya

From: Ilan Filonenko <i...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan...@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) 
<suryanarayana.garlap...@nokia.com>; Spark dev list <d...@spark.apache.org>; 
user@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <liyinan...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own 
image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - 
IN/Bangalore) <suryanarayana.garlap...@nokia.com> wrote:
Hi,
I am trying to run Spark Python test cases on k8s, based on the tag 
spark-2.4-rc1. When dependent files are passed through the --py-files option, 
they are not resolved by the main Python script. Is this a known issue?

Regards
Surya
