[ https://issues.apache.org/jira/browse/SPARK-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-19096.
-------------------------------
    Resolution: Duplicate

> Kmeans.py application fails with virtualenv due to parse error
> --------------------------------------------------------------
>
>                 Key: SPARK-19096
>                 URL: https://issues.apache.org/jira/browse/SPARK-19096
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>            Reporter: Yesha Vora
>
> Spark version: 2
> Steps:
> * Install virtualenv (pip install virtualenv)
> * Create requirements.txt (pip freeze > /tmp/requirements.txt)
> * Start the kmeans.py application in yarn-client mode.
> The application fails with a RuntimeException:
> {code:title=app log}
> 17/01/05 19:49:59 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 17/01/05 19:49:59 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> Invalid requirement: 'pip freeze'
> Traceback (most recent call last):
>   File "/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0006/container_1483592608863_0006_01_000002/virtualenv_application_1483592608863_0006_0/lib/python2.7/site-packages/pip/req/req_install.py", line 82, in __init__
>     req = Requirement(req)
>   File "/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0006/container_1483592608863_0006_01_000002/virtualenv_application_1483592608863_0006_0/lib/python2.7/site-packages/pip/_vendor/packaging/requirements.py", line 96, in __init__
>     requirement_string[e.loc:e.loc + 8]))
> InvalidRequirement: Invalid requirement, parse error at "u'freeze'"
> 17/01/05 19:50:03 WARN BlockManager: Putting block rdd_3_0 failed due to an exception
> 17/01/05 19:50:03 WARN BlockManager: Block rdd_3_0 could not be removed as it was not found on disk or in memory
> 17/01/05 19:50:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> {code}
> {code:title=job client log}
> 17/01/05 19:50:07 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, xxx.site, executor 1): java.lang.RuntimeException: Fail to run command: virtualenv_application_1483592608863_0006_1/bin/python -m pip --cache-dir /home/yarn install -r requirements.txt
> 	at org.apache.spark.api.python.PythonWorkerFactory.execCommand(PythonWorkerFactory.scala:142)
> 	at org.apache.spark.api.python.PythonWorkerFactory.setupVirtualEnv(PythonWorkerFactory.scala:128)
> 	at org.apache.spark.api.python.PythonWorkerFactory.<init>(PythonWorkerFactory.scala:70)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
> 	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
> 	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
> 	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
> 	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
> 	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
> 	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:99)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
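The "Invalid requirement: 'pip freeze'" failure above indicates that the requirements.txt shipped to the executors contained the literal text `pip freeze` rather than that command's output (version-pinned entries such as `numpy==1.16.0`). As a rough illustration of why pip rejects that line, the sketch below approximates a PEP 508 requirement check with a stdlib regex; the `looks_like_requirement` helper and its pattern are illustrative assumptions, not pip's actual parser (pip uses the vendored `packaging.requirements.Requirement` seen in the traceback).

```python
import re

# Very rough approximation of a single PEP 508 requirement line:
# a package name, optional extras, and an optional version specifier.
# Illustrative only -- real pip parses with packaging.requirements.Requirement.
_REQ_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"               # package name (no spaces allowed)
    r"(\[[A-Za-z0-9._,-]+\])?"                   # optional extras, e.g. [ssl]
    r"\s*([<>=!~]=?\s*[A-Za-z0-9.*+!_-]+)?\s*$"  # optional version specifier
)

def looks_like_requirement(line: str) -> bool:
    """Return True if `line` resembles one requirements.txt entry."""
    return bool(_REQ_RE.match(line.strip()))

print(looks_like_requirement("pip freeze"))     # the bad line from the report
print(looks_like_requirement("numpy==1.16.0"))  # a typical `pip freeze` entry
```

A space-separated token like `pip freeze` cannot be a package name plus version specifier, so the check fails, consistent with the log's parse error at `"u'freeze'"`. Running `pip freeze > /tmp/requirements.txt` in a shell, as the reported steps describe, would normally produce only pinned entries; the error suggests the command string itself ended up in the file instead.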