Python couldn't find your module on the worker that ran the task. Is the tables package installed on each worker node? It needs to be present on every worker, not just on the driver.
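
A quick way to see which workers are missing it is to run a small job that attempts the import on the executors. A minimal sketch, assuming a live SparkContext named sc (probe_import is a hypothetical helper written for this check, not a PySpark API):

    def probe_import(_):
        # Runs on whichever executor picks up the task: try the import
        # there and report the host name plus whether it succeeded.
        import socket
        try:
            import tables  # the PyTables package the job needs
            ok = True
        except ImportError:
            ok = False
        return (socket.gethostname(), ok)

    # Use many small partitions so the probe lands on several executors.
    results = sc.parallelize(range(100), 20).map(probe_import).collect()
    print(sorted(set(results)))

Any host that comes back with False is missing the library.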
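
As for getting it onto the workers: as Davies notes below, pure-Python dependencies can be shipped with the job via spark-submit --py-files, but that route cannot carry compiled extensions, and tables (PyTables) wraps the HDF5 C library, so it really does have to be installed on every node. A rough sketch of the two options, with deps.zip as a placeholder archive of pure-Python modules:

    # Option 1 (required for C-extension packages such as PyTables):
    # install the library on every worker node, e.g. by running on each
    # node (command illustrative):
    #     pip install tables
    #
    # Option 2 (pure-Python dependencies only): ship them with the job,
    # either at submit time:
    #     spark-submit --py-files deps.zip my_job.py
    # or at runtime from the driver:
    sc.addPyFile("deps.zip")  # placeholder name; its modules become importable in tasks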
--- Original Message ---
From: "Davies Liu" <dav...@databricks.com>
Sent: January 22, 2015 9:12 PM
To: "Mohit Singh" <mohit1...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Using third party libraries in pyspark

You need to install these libraries on all the slaves, or submit them via spark-submit:

    spark-submit --py-files xxx

On Thu, Jan 22, 2015 at 11:23 AM, Mohit Singh <mohit1...@gmail.com> wrote:
> Hi,
> I might be asking something very trivial, but what's the recommended way
> of using third-party libraries?
> I am using tables to read an hdf5 format file, and here is the error
> trace:
>
>     print rdd.take(2)
>   File "/tmp/spark/python/pyspark/rdd.py", line 1111, in take
>     res = self.context.runJob(self, takeUpToNumLeft, p, True)
>   File "/tmp/spark/python/pyspark/context.py", line 818, in runJob
>     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd,
>       javaPartitions, allowLocal)
>   File "/tmp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>       line 538, in __call__
>   File "/tmp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>       line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
> in stage 0.0 (TID 3, srv-108-23.720.rdio):
> org.apache.spark.api.python.PythonException:
> Traceback (most recent call last):
>   File "/hadoop/disk3/mapred/local/filecache/540/spark-assembly-1.2.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py",
>       line 90, in main
>     command = pickleSer._read_with_length(infile)
>   File "/hadoop/disk3/mapred/local/filecache/540/spark-assembly-1.2.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py",
>       line 151, in _read_with_length
>     return self.loads(obj)
>   File "/hadoop/disk3/mapred/local/filecache/540/spark-assembly-1.2.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py",
>       line 396, in loads
>     return cPickle.loads(obj)
>   File "/hadoop/disk3/mapred/local/filecache/540/spark-assembly-1.2.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/cloudpickle.py",
>       line 825, in subimport
>     __import__(name)
> ImportError: ('No module named tables', <function subimport at 0x47e1398>,
> ('tables',))
>
> Though import tables works fine in the local Python shell, it seems like
> everything is being pickled. Are we expected to send all the files as
> helper files? That doesn't seem right.
> Thanks
>
> --
> Mohit
>
> "When you want success as badly as you want the air, then you will get
> it. There is no other secret of success."
> -Socrates

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org