Supreeth Sharma created SPARK-23600:
---------------------------------------
Summary: conda_panda_example test fails to import panda lib with Spark 2.3
Key: SPARK-23600
URL: https://issues.apache.org/jira/browse/SPARK-23600
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 2.3.0
Environment: ambari-server --version 2.7.0.2-64
HDP-3.0.0.2-132
Reporter: Supreeth Sharma
Fix For: 2.3.0
With Spark 2.3, the conda panda test fails to import pandas.
Python version: 2.7.5
1) Create the requirements file.
virtual_env_type: native
{code:java}
packaging==16.8
panda==0.3.1
pyparsing==2.1.10
requests==2.13.0
six==1.10.0
numpy==1.12.0
pandas==0.19.2
python-dateutil==2.6.0
pytz==2016.10
{code}
virtual_env_type: conda
{code:java}
mkl=2017.0.1=0
numpy=1.12.0=py27_0
openssl=1.0.2k=0
pandas=0.19.2=np112py27_1
pip=9.0.1=py27_1
python=2.7.13=0
python-dateutil=2.6.0=py27_0
pytz=2016.10=py27_0
readline=6.2=2
setuptools=27.2.0=py27_0
six=1.10.0=py27_0
sqlite=3.13.0=0
tk=8.5.18=0
wheel=0.29.0=py27_0
zlib=1.2.8=3
{code}
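The two files above use different pin syntaxes: pip's `name==version` for the native virtualenv and conda's `name=version=build` for the conda environment. A minimal sketch for sanity-checking them before a run is shown below. The helper `parse_pins` is hypothetical (not part of Spark or HDP). It reads either format and confirms that both files pin `pandas` 0.19.2, the module the failing script imports. Note that `panda` and `pandas` are distinct package names; only the `pandas` pin matters for `import pandas`.

```python
from __future__ import print_function

import re

# pip pins (virtual_env_type native): "name==version"
PIP_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^\s=]+)$")
# conda pins (virtual_env_type conda): "name=version=build"
CONDA_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)=([^\s=]+)=([^\s=]+)$")

# Contents copied from the two requirement lists in step 1.
NATIVE_REQUIREMENTS = """
packaging==16.8
panda==0.3.1
pyparsing==2.1.10
requests==2.13.0
six==1.10.0
numpy==1.12.0
pandas==0.19.2
python-dateutil==2.6.0
pytz==2016.10
"""

CONDA_SPEC = """
mkl=2017.0.1=0
numpy=1.12.0=py27_0
openssl=1.0.2k=0
pandas=0.19.2=np112py27_1
pip=9.0.1=py27_1
python=2.7.13=0
python-dateutil=2.6.0=py27_0
pytz=2016.10=py27_0
readline=6.2=2
setuptools=27.2.0=py27_0
six=1.10.0=py27_0
sqlite=3.13.0=0
tk=8.5.18=0
wheel=0.29.0=py27_0
zlib=1.2.8=3
"""


def parse_pins(text):
    """Hypothetical helper: return {package: version} from pip or conda pin lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = PIP_PIN.match(line) or CONDA_PIN.match(line)
        if match is None:
            raise ValueError("unrecognised requirement line: %r" % line)
        pins[match.group(1).lower()] = match.group(2)
    return pins


native = parse_pins(NATIVE_REQUIREMENTS)
conda = parse_pins(CONDA_SPEC)
print(native["pandas"], conda["pandas"])  # 0.19.2 0.19.2
```

The check only verifies what the files request; it says nothing about whether the environment Spark builds from them is the one the driver actually uses.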
2) Run the conda panda test.
{code:java}
spark-submit --master yarn-client \
  --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.0.0.2-132.jar \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.type=native \
  --conf spark.pyspark.virtualenv.requirements=/tmp/requirements.txt \
  --conf spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv \
  /hwqe/hadoopqe/tests/spark/data/conda_panda_example.py 2>&1 \
  | tee /tmp/1/Spark_clientLogs/pyenv_conda_panda_example_native_yarn-client.log
{code}
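When the reproduction is scripted rather than typed, building the argument vector as a list avoids the line-wrapping problems that mailed commands tend to suffer. The following is a minimal sketch using a hypothetical helper `build_submit_args`; the option names and values are copied verbatim from the command above, and nothing here adds new Spark settings.

```python
from __future__ import print_function

# Values copied from the reproduction command in step 2.
APP = "/hwqe/hadoopqe/tests/spark/data/conda_panda_example.py"
JARS = ["/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.0.0.2-132.jar"]
VIRTUALENV_CONF = [
    ("spark.pyspark.virtualenv.enabled", "true"),
    ("spark.pyspark.virtualenv.type", "native"),
    ("spark.pyspark.virtualenv.requirements", "/tmp/requirements.txt"),
    ("spark.pyspark.virtualenv.bin.path", "/usr/bin/virtualenv"),
]


def build_submit_args(app, jars, conf, master="yarn-client"):
    """Hypothetical helper: assemble a spark-submit argv list, one token per element."""
    args = ["spark-submit", "--master", master]
    if jars:
        # --jars takes a comma-separated list of paths.
        args += ["--jars", ",".join(jars)]
    for key, value in conf:
        # Each setting becomes its own "--conf key=value" pair.
        args += ["--conf", "%s=%s" % (key, value)]
    args.append(app)  # the application script comes after all options
    return args


if __name__ == "__main__":
    print(" ".join(build_submit_args(APP, JARS, VIRTUALENV_CONF)))
```

The resulting list can be passed to `subprocess.call`. The `2>&1 | tee ...` part of the original command is shell syntax, so it would have to be handled separately, for example through the `stdout`/`stderr` arguments.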
3) The application fails to import pandas.
{code:java}
2018-03-05 13:43:31,493|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|Traceback (most recent call last):
2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|File "/hwqe/hadoopqe/tests/spark/data/conda_panda_example.py", line 5, in <module>
2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|import pandas as pd
2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|ImportError: No module named pandas
2018-03-05 13:43:31,547|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO BlockManagerMasterEndpoint: Registering block manager ctr-e138-1518143905142-67599-01-000005.hwx.site:44861 with 366.3 MB RAM, BlockManagerId(2, ctr-e138-1518143905142-67599-01-000005.hwx.site, 44861, None)
{code}
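The traceback shows that the `ImportError` is raised by the driver-side script itself (`conda_panda_example.py`, line 5, at module level), not inside an executor task. One way to narrow this down is to check which interpreter is actually running the script and where, if anywhere, it finds `pandas`. The following is a hypothetical diagnostic helper, not part of the test suite. It could be run with the same interpreter or placed ahead of the failing import.

```python
from __future__ import print_function

import importlib
import sys


def describe_module(name):
    """Hypothetical diagnostic: report the running interpreter and whether `name` imports."""
    info = {"python": sys.executable, "version": sys.version.split()[0]}
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Same failure mode as the traceback above.
        info["error"] = str(exc)
    else:
        # Built-in modules have no __file__; fall back to None.
        info["file"] = getattr(module, "__file__", None)
        info["module_version"] = getattr(module, "__version__", None)
    return info


if __name__ == "__main__":
    print(describe_module("pandas"))
```

If the reported `python` path is the system interpreter rather than one inside the virtualenv built from `/tmp/requirements.txt`, that would point at the environment not being activated for the driver, rather than at a bad pin.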
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)