Yeah, I don't think this feature was designed to work on systems that don't have bash: the transform command is launched through /bin/bash, which doesn't exist on Windows (that's the "Cannot run program \"/bin/bash\"" line in your trace). You could open a JIRA.
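For reference, the frames at ScriptTransformation.scala:58-61 in your trace are where the child process is started. A rough sketch of that launch, going from the stack trace and my memory of the 1.3 source rather than the literal code:

    // Approximate sketch of how ScriptTransformation starts the USING
    // command on each executor; the hardcoded "/bin/bash" is what fails
    // on Windows.
    object LaunchSketch {
      def main(args: Array[String]): Unit = {
        val script = "NSSGraphHelper.exe" // the command from USING '...'
        val proc = new ProcessBuilder("/bin/bash", "-c", script).start()
        // On Windows there is no /bin/bash, so start() throws
        // java.io.IOException: CreateProcess error=2.
        proc.waitFor()
      }
    }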
On Tue, Oct 20, 2015 at 10:36 AM, Yang Wu (Tata Consultancy Services) <v-wuy...@microsoft.com> wrote:

> Yes. We are trying to run a custom script written in C# using TRANSFORM, but
> cannot get it to work. The query and error are below. Any suggestions? Thank you!
>
> Spark version: 1.3
>
> Here is how we add and invoke the script:
>
>     scala> hiveContext.sql("""ADD FILE wasb://…/NSSGraphHelper.exe""")
>     …
>     scala> hiveContext.sql("""SELECT TRANSFORM (dc, attribute, key, time, value) USING 'NSSGraphHelper.exe' FROM SourceTable""").collect()
>
> The query throws an exception that it cannot find the file specified:
>
>     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0
>     failed 4 times, most recent failure: Lost task 0.3 in stage 16.0 (TID 1273,
>     workernode1.nsssparkcluster.g10.internal.cloudapp.net): java.io.IOException:
>     Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>         at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:61)
>         at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:58)
>         at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>         at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
>         at java.lang.ProcessImpl.create(Native Method)
>         at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:136)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>         ... 16 more
>
>     Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> From: Michael Armbrust [mailto:mich...@databricks.com]
> Sent: Tuesday, October 20, 2015 10:21 AM
> To: Yang Wu (Tata Consultancy Services) <v-wuy...@microsoft.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Hive custom transform scripts in Spark?
>
> We support TRANSFORM. Are you having a problem using it?
>
> On Tue, Oct 20, 2015 at 8:21 AM, wuyangjack <v-wuy...@microsoft.com> wrote:
>
> How do we reuse Hive custom transform scripts written in Python or C++?
> These scripts process data from stdin and print to stdout in Spark. They
> use the TRANSFORM syntax in Hive:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
>
> Example in Hive:
>
>     SELECT TRANSFORM(stuff)
>     USING 'script.exe'
>     AS thing1, thing2
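For anyone finding this in the archive: a TRANSFORM script is just a child process that reads tab-separated rows on stdin and writes tab-separated rows on stdout, one line per row. A minimal sketch in Scala (the object name and output columns here are hypothetical, just to show the stdin/stdout contract):

    // Reads tab-separated input rows, emits tab-separated output rows.
    object MyTransform {
      def main(args: Array[String]): Unit = {
        for (line <- scala.io.Source.stdin.getLines()) {
          val cols = line.split("\t", -1) // -1 keeps trailing empty fields
          // Example: emit the first column upper-cased plus the column count.
          println(cols(0).toUpperCase + "\t" + cols.length)
        }
      }
    }

Packaged as an executable and shipped with ADD FILE, it would be invoked exactly like 'script.exe' in the Hive example above. But note that Spark still wraps the command in /bin/bash -c, so per this thread it will only run on nodes where bash exists.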