Diana Carroll created SPARK-8795:
------------------------------------

             Summary: pySpark wholeTextFiles error when mapping string
                 Key: SPARK-8795
                 URL: https://issues.apache.org/jira/browse/SPARK-8795
             Project: Spark
          Issue Type: Bug
          Components: PySpark
         Environment: CentOS 6.6, Python 2.7, CDH 5.4.1
            Reporter: Diana Carroll

I created a test directory with two tiny text files. This call works:
{code}sc.wholeTextFiles("testdata").map(lambda (fname,x): len(x)).collect(){code}

This call does not:
{code}sc.wholeTextFiles("testdata").map(lambda (fname,x): x.islower()).collect(){code}

In fact, any attempt to call a string method on x, or to pass x to a function that requires a string, fails the same way.

The main error is
{code}
  File "/usr/lib/spark/python/pyspark/worker.py", line 101, in main
    process()
  File "/usr/lib/spark/python/pyspark/worker.py", line 96, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 236, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<ipython-input-107-5192d18d0e4c>", line 1, in <lambda>
TypeError: 'bool' object is not callable
{code}

Will attach full log.
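
One way to narrow this down is to check what each record value really is before calling any method on it. This is only a sketch, not a confirmed diagnosis or fix. The helper name describe_value is hypothetical and not part of PySpark. It reports the value's type and whether its islower attribute is callable, since the TypeError suggests something non-callable is sitting where the method should be.

```python
def describe_value(x):
    """Summarize a record value without calling any of its methods.

    Hypothetical diagnostic helper (not a PySpark API): it only inspects
    the object, so it cannot itself raise "'bool' object is not callable".
    """
    attr = getattr(x, "islower", None)
    return {
        "type": type(x).__name__,           # e.g. "str" or "unicode"
        "has_islower": attr is not None,    # is the attribute present at all?
        "islower_is_callable": callable(attr),
    }


# Assumed usage against the failing job (needs a live SparkContext `sc`):
#   sc.wholeTextFiles("testdata").map(lambda kv: describe_value(kv[1])).collect()
```

If the collected dictionaries show a normal string type with a callable islower, the value itself is probably fine and the problem is more likely in how the lambda is shipped to the workers.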