Hi all,

Recently we have been comparing Spark SQL with Hive on MR. I tried to run a Spark SQL (Spark 1.6 RC2) query with a script transformation, and the Spark job failed with an error like:
16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code generated in 19.054534 ms
16/06/26 11:01:28 ERROR execution.ScriptTransformationWriterThread: /bin/bash: test.py: command not found
16/06/26 11:01:28 ERROR util.Utils: Uncaught exception in thread Thread-ScriptTransformation-Feed
java.io.IOException: Stream closed
        at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
        at java.io.OutputStream.write(OutputStream.java:116)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:277)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:255)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:255)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1801)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:244)
16/06/26 11:01:28 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Thread-ScriptTransformation-Feed,5,main]
java.io.IOException: Stream closed
        (identical stack trace repeated)

The command is:

> spark-1.6/bin/spark-sql -f transform.sql

The SQL and Python script are as follows.

transform.sql (which runs successfully on Hive):

> add file /tmp/spark_sql_test/test.py;
> select transform(cityname) using 'test.py' as (new_cityname)
> from test.spark2_orc where dt='20160622' limit 5;

test.py:

> #!/usr/bin/env python
> #coding=utf-8
> import sys
> import string
> reload(sys)
> sys.setdefaultencoding('utf8')
> for line in sys.stdin:
>     cityname = line.strip("\n").split("\t")[0]
>     lt = []
>     lt.append(cityname + "_zlx")
>     print "\t".join(lt)

After making two modifications:

(1) chmod +x test.py
(2) in transform.sql, changing using 'test.py' to using './test.py'

the SQL
executed successfully. Is this the expected way to run a script transformation in Spark SQL? Has anyone else run into this problem?
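For what it's worth, the failure seems consistent with how the transform command is launched: `add file` ships test.py into each executor's working directory, which is not on the PATH, so /bin/bash cannot resolve a bare `test.py`; prefixing `./` points bash at the working directory, and the file then also has to be executable to be invoked directly. Invoking the interpreter explicitly (e.g. using 'python test.py') should avoid the chmod as well. As a side note, the script above is Python 2 only; a minimal Python 3 sketch of the same logic (the helper name `transform_line` is mine, not from the original):

```python
#!/usr/bin/env python3
import sys

def transform_line(line):
    # Mirror the original test.py: take the first tab-separated
    # field of the row and append the "_zlx" suffix.
    cityname = line.rstrip("\n").split("\t")[0]
    return cityname + "_zlx"

if __name__ == "__main__":
    # Spark feeds the transform script one tab-separated record
    # per line on stdin; we emit one output row per input row.
    for line in sys.stdin:
        print(transform_line(line))
```

A quick local check before submitting the job (e.g. piping a sample row into `./test.py` from a shell) would have surfaced the chmod problem without a cluster round-trip.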