Hi all,

Recently we have been comparing Spark SQL with Hive on MR. I tried to run a Spark SQL (Spark 1.6 RC2) query with a script transformation, and the Spark job failed with an error like:
16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code generated in 19.054534 ms
16/06/26 11:01:28 ERROR execution.ScriptTransformationWriterThread: /bin/bash: test.py: command not found
16/06/26 11:01:28 ERROR util.Utils: Uncaught exception in thread Thread-ScriptTransformation-Feed
java.io.IOException: Stream closed
        at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
        at java.io.OutputStream.write(OutputStream.java:116)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:277)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:255)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:255)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1801)
        at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:244)
16/06/26 11:01:28 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Thread-ScriptTransformation-Feed,5,main]
java.io.IOException: Stream closed
        (identical stack trace repeated)

The command is:

> spark-1.6/bin/spark-sql -f transform.sql

The SQL and Python script are as follows.

transform.sql (which runs successfully on Hive):

> add file /tmp/spark_sql_test/test.py;
> select transform(cityname) using 'test.py' as (new_cityname)
> from test.spark2_orc where dt='20160622' limit 5;

test.py:

> #!/usr/bin/env python
> #coding=utf-8
> import sys
> import string
> reload(sys)
> sys.setdefaultencoding('utf8')
> for line in sys.stdin:
>     cityname = line.strip("\n").split("\t")[0]
>     lt = []
>     lt.append(cityname + "_zlx")
>     print "\t".join(lt)

After making two modifications:

(1) chmod +x test.py
(2) in transform.sql, changing using 'test.py' to using './test.py'

the SQL
executed successfully. Is this the expected way to run a script transformation in Spark SQL? Has anyone else run into this problem?
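For what it's worth, the failure seems consistent with how the transform command is launched: `add file` ships test.py into each executor's working directory, which is not on the PATH, so /bin/bash cannot resolve a bare `test.py`; prefixing `./` points bash at the working directory, and the file then also has to be executable to be invoked directly. Invoking the interpreter explicitly (e.g. using 'python test.py') should avoid the chmod as well. As a side note, the script above is Python 2 only; a minimal Python 3 sketch of the same logic (the helper name `transform_line` is mine, not from the original):

```python
#!/usr/bin/env python3
import sys

def transform_line(line):
    # Mirror the original test.py: take the first tab-separated
    # field of the row and append the "_zlx" suffix.
    cityname = line.rstrip("\n").split("\t")[0]
    return cityname + "_zlx"

if __name__ == "__main__":
    # Spark feeds the transform script one tab-separated record
    # per line on stdin; we emit one output row per input row.
    for line in sys.stdin:
        print(transform_line(line))
```

A quick local check before submitting the job (e.g. piping a sample row into `./test.py` from a shell) would have surfaced the chmod problem without a cluster round-trip.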