[ https://issues.apache.org/jira/browse/PIG-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4438:
----------------------------------
    Attachment: PIG-4438_1.patch

PIG-4438_1.patch is the initial patch. I met some problems when running the script from the bug description and need more time to figure them out. The error info is:
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: java.lang.Byte cannot be cast to java.util.Iterator
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:85)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}

> Can not work when in "limit after sort" situation in spark mode
> ---------------------------------------------------------------
>
>                 Key: PIG-4438
>                 URL: https://issues.apache.org/jira/browse/PIG-4438
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4438_1.patch
>
> When a pig script executes "order" before "limit" in spark mode, the results will be wrong.
>
> cat testlimit.txt:
> 1 orange
> 3 coconut
> 5 grape
> 6 pear
> 2 apple
> 4 mango
>
> testlimit.pig:
> a = load './testlimit.txt' as (x:int, y:chararray);
> b = order a by x;
> c = limit b 1;
> store c into './testlimit.out';
>
> the actual result:
> 1 orange
> 2 apple
> 3 coconut
> 4 mango
> 5 grape
> 6 pear
>
> the correct result should be:
> 1 orange

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
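For illustration, here is a minimal, hypothetical Java sketch of the distributed-plan pitfall the issue title describes. It does not use Pig's actual operator classes; the class and method names (LimitAfterSort, limitPerPartition, limitGlobal) are invented, and plain lists stand in for the sorted Spark partitions. The point is only the shape of the bug: applying LIMIT n inside each partition keeps up to n rows per partition, while the correct plan takes n rows from the merged, globally sorted stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: sorted rows of testlimit.txt split across
// three simulated partitions (range-partitioned by the sort key).
public class LimitAfterSort {
    static final List<List<String>> PARTITIONS = List.of(
            List.of("1 orange", "2 apple"),
            List.of("3 coconut", "4 mango"),
            List.of("5 grape", "6 pear"));

    // Buggy shape: LIMIT n is pushed into each partition independently,
    // so up to n rows survive from every partition instead of n overall.
    static List<String> limitPerPartition(int n) {
        return PARTITIONS.stream()
                .flatMap(p -> p.stream().limit(n))
                .collect(Collectors.toList());
    }

    // Correct shape: merge the already-sorted partitions in order,
    // then take n rows globally.
    static List<String> limitGlobal(int n) {
        return PARTITIONS.stream()
                .flatMap(List::stream)
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

With LIMIT 1, the per-partition form returns three rows (one per partition), while the global form returns only "1 orange", matching the expected output above.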