Yeah, I don't think this feature was designed to work on systems that don't have bash: the transform command is launched through /bin/bash, which doesn't exist on Windows (that's the "Cannot run program \"/bin/bash\"" line in your trace). You could open a JIRA.
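For reference, the frames at ScriptTransformation.scala:58-61 in your trace are where the child process is started. A rough sketch of that launch, going from the stack trace and my memory of the 1.3 source rather than the literal code:

    // Approximate sketch of how ScriptTransformation starts the USING
    // command on each executor; the hardcoded "/bin/bash" is what fails
    // on Windows.
    object LaunchSketch {
      def main(args: Array[String]): Unit = {
        val script = "NSSGraphHelper.exe" // the command from USING '...'
        val proc = new ProcessBuilder("/bin/bash", "-c", script).start()
        // On Windows there is no /bin/bash, so start() throws
        // java.io.IOException: CreateProcess error=2.
        proc.waitFor()
      }
    }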
On Tue, Oct 20, 2015 at 10:36 AM, Yang Wu (Tata Consultancy Services) <v-wuy...@microsoft.com> wrote:

> Yes. We are trying to run a custom script written in C# using TRANSFORM, but
> cannot get it to work. The query and error are below. Any suggestions? Thank you!
>
> Spark version: 1.3
>
> Here is how we add and invoke the script:
>
>     scala> hiveContext.sql("""ADD FILE wasb://…/NSSGraphHelper.exe""")
>     …
>     scala> hiveContext.sql("""SELECT TRANSFORM (dc, attribute, key, time, value) USING 'NSSGraphHelper.exe' FROM SourceTable""").collect()
>
> The query throws an exception that it cannot find the file specified:
>
>     org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0
>     failed 4 times, most recent failure: Lost task 0.3 in stage 16.0 (TID 1273,
>     workernode1.nsssparkcluster.g10.internal.cloudapp.net): java.io.IOException:
>     Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>         at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:61)
>         at org.apache.spark.sql.hive.execution.ScriptTransformation$$anonfun$1.apply(ScriptTransformation.scala:58)
>         at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>         at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>     Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
>         at java.lang.ProcessImpl.create(Native Method)
>         at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:136)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>         ... 16 more
>
>     Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> From: Michael Armbrust [mailto:mich...@databricks.com]
> Sent: Tuesday, October 20, 2015 10:21 AM
> To: Yang Wu (Tata Consultancy Services) <v-wuy...@microsoft.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Hive custom transform scripts in Spark?
>
> We support TRANSFORM. Are you having a problem using it?
>
> On Tue, Oct 20, 2015 at 8:21 AM, wuyangjack <v-wuy...@microsoft.com> wrote:
>
> How do we reuse Hive custom transform scripts written in Python or C++?
> These scripts process data from stdin and print to stdout in Spark. They
> use the TRANSFORM syntax in Hive:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
>
> Example in Hive:
>
>     SELECT TRANSFORM(stuff)
>     USING 'script.exe'
>     AS thing1, thing2
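For anyone finding this in the archive: a TRANSFORM script is just a child process that reads tab-separated rows on stdin and writes tab-separated rows on stdout, one line per row. A minimal sketch in Scala (the object name and output columns here are hypothetical, just to show the stdin/stdout contract):

    // Reads tab-separated input rows, emits tab-separated output rows.
    object MyTransform {
      def main(args: Array[String]): Unit = {
        for (line <- scala.io.Source.stdin.getLines()) {
          val cols = line.split("\t", -1) // -1 keeps trailing empty fields
          // Example: emit the first column upper-cased plus the column count.
          println(cols(0).toUpperCase + "\t" + cols.length)
        }
      }
    }

Packaged as an executable and shipped with ADD FILE, it would be invoked exactly like 'script.exe' in the Hive example above. But note that Spark still wraps the command in /bin/bash -c, so per this thread it will only run on nodes where bash exists.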