[ https://issues.apache.org/jira/browse/PIG-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925123#comment-15925123 ]
liyunzhang_intel commented on PIG-5181: --------------------------------------- [~nkollar]: commit to spark branch, thanks! > JobManagement_3 is failing with spark exec type > ----------------------------------------------- > > Key: PIG-5181 > URL: https://issues.apache.org/jira/browse/PIG-5181 > Project: Pig > Issue Type: Sub-task > Components: spark > Reporter: Nandor Kollar > Assignee: Nandor Kollar > Fix For: spark-branch > > Attachments: PIG-5181_1.patch, PIG-5181_2.patch > > > Task failed on executor node: > {code} > java.lang.RuntimeException: Error while processing 'perl PigStreaming.pl - - > nameMap > (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)' > failed with exit status: 2 > at > org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.readNext(OutputConsumerIterator.java:85) > at > org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.hasNext(OutputConsumerIterator.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29) > at > org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.readNext(OutputConsumerIterator.java:57) > at > org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.hasNext(OutputConsumerIterator.java:96) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > It seems that in SparkLauncher the cached file is copied from the file system > to a temporary file (which case is the original file name + some random > string) > {code} > File tmpFile = File.createTempFile(fileName, ".tmp"); > {code} > and this file is added to SparkContext. It is added with the name of the temp > file. This is a problem, since this cached file can't be used with the > original name, like in JobManagement_3: > {code} > define CMD `perl PigStreaming.pl - - nameMap` > ship(':SCRIPTHOMEPATH:/PigStreaming.pl') > cache(':INPATH:/nameMap/part-00000#nameMap'); > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)