Yesha Vora created SPARK-19095:
----------------------------------

             Summary: virtualenv example does not work in yarn cluster mode
                 Key: SPARK-19095
                 URL: https://issues.apache.org/jira/browse/SPARK-19095
             Project: Spark
          Issue Type: Bug
            Reporter: Yesha Vora
            Priority: Critical

Steps:
* Install virtualenv on all nodes
* Create requirement1.txt with "numpy > requirement1.txt"
* Run kmeans.py application in yarn-cluster mode.
{code}
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.pyspark.virtualenv.enabled=true" \
  --conf "spark.pyspark.virtualenv.type=native" \
  --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
  --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv" \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar \
  kmeans.py /tmp/in/kmeans_data.txt 3
{code}

The application fails to find numpy.
{code}
LogType:stdout
Log Upload Time:Thu Jan 05 20:05:49 +0000 2017
LogLength:134
Log Contents:
Traceback (most recent call last):
  File "kmeans.py", line 27, in <module>
    import numpy as np
ImportError: No module named numpy
End of LogType:stdout
{code}
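
One way to narrow down a failure like the ImportError above is to record which Python interpreter the driver runs under in cluster mode, and whether that interpreter can see the module at all. The snippet below is a minimal diagnostic sketch and is not part of the original report: `describe_python_env` is a hypothetical helper name, it assumes Python 3.4 or later (for `importlib.util.find_spec`), and it uses only the standard library. It does not touch any Spark API; the idea would be to call it at the top of the application script before the failing import.

```python
import importlib.util
import sys


def describe_python_env(module_name):
    """Report the running interpreter and whether module_name can be found.

    Hypothetical helper for debugging; it only inspects the current process.
    """
    # Inside a virtualenv/venv, sys.prefix differs from the base installation
    # prefix. Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix

    # find_spec returns None when a top-level module is not on sys.path.
    importable = importlib.util.find_spec(module_name) is not None

    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    # Printed to stdout, so it would show up in the same YARN stdout log
    # as the traceback.
    for key, value in describe_python_env("numpy").items():
        print("{}: {}".format(key, value))
```

If the reported executable is the system Python rather than one inside a virtualenv directory, that would suggest the driver was not launched from the virtualenv; this is only a hint, not a confirmed diagnosis of this issue.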