[jira] [Updated] (PIG-4438) Can not work when in limit after sort situation in spark mode
[ https://issues.apache.org/jira/browse/PIG-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4438:
----------------------------------
    Attachment: PIG-4438_1.patch

PIG-4438_1.patch is the initial patch. I ran into a problem when running the script from the bug description and need more time to figure it out. The error is:
{code}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: java.lang.Byte cannot be cast to java.util.Iterator
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:85)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:30)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:987)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

Can not work when in limit after sort situation in spark mode
-------------------------------------------------------------

Key: PIG-4438
URL: https://issues.apache.org/jira/browse/PIG-4438
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
Fix For: spark-branch
Attachments: PIG-4438_1.patch

When a Pig script executes ORDER before LIMIT in spark mode, the results are wrong.

cat testlimit.txt
    1 orange
    3 coconut
    5 grape
    6 pear
    2 apple
    4 mango

testlimit.pig:
    a = load './testlimit.txt' as (x:int, y:chararray);
    b = order a by x;
    c = limit b 1;
    store c into './testlimit.out';

The result:
    1 orange
    2 apple
    3 coconut
    4 mango
    5 grape
    6 pear

The correct result should be:
    1 orange

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
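The expected ORDER-then-LIMIT behaviour from the PIG-4438 description can be illustrated with a small standalone sketch. It is written in Python for brevity and is not what PIG-4438_1.patch does, since the thread does not show the patch. The function name `order_then_limit` and the two-partition split of testlimit.txt are assumptions made only for this illustration. The idea is that each partition keeps at most n candidate rows, and the limit is then applied once more over the merged, globally ordered candidates.

```python
import heapq
import itertools
from typing import List, Tuple

Row = Tuple[int, str]  # (x, y) as in the testlimit.pig schema


def order_then_limit(partitions: List[List[Row]], n: int) -> List[Row]:
    """Return the first n rows of the data when ordered by x.

    Hypothetical helper for illustration only; not part of Pig or Spark.
    """
    # Step 1: every partition keeps only its n smallest rows, sorted by x.
    local_tops = [
        heapq.nsmallest(n, part, key=lambda row: row[0]) for part in partitions
    ]
    # Step 2: merge the sorted candidate lists and apply the limit globally.
    merged = heapq.merge(*local_tops, key=lambda row: row[0])
    return list(itertools.islice(merged, n))


# The rows of testlimit.txt, split into two partitions (the split is assumed).
partitions = [
    [(1, "orange"), (3, "coconut"), (5, "grape")],
    [(6, "pear"), (2, "apple"), (4, "mango")],
]
print(order_then_limit(partitions, 1))
```

With a limit of 1 this yields only the row `1 orange`, which matches the correct result stated in the issue, whereas the reported output contained all six rows.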
Need JIRA/UNIX Admin that knows confluence --- Warren, NJ----$55 per hr on C2C
Thanks,
Mohan Gadde
HCL Global Systems, Inc
24543 Indoplex Circle, Suite# 220
Farmington Hills, MI 48335
mo...@hclglobal.com
Phone # 248 473 0720 EXT 197
Cell: 302 983 8001

-Original Message-
From: HCL GLOBAL [mailto:mo...@hclglobal.com]
Sent: Monday, March 16, 2015 12:06 PM
To: dev@pig.apache.org
Cc: j...@apache.org
Subject: I need JIRA/UNIX admin that knows confluence

Wireless in Warren- $55 per hr on C2C

-Original Message-
From: j...@apache.org [mailto:j...@apache.org]
Sent: Sunday, March 15, 2015 3:01 AM
To: dev@pig.apache.org
Subject: [jira] Subscription: PIG patch available
Re: Need JIRA/UNIX Admin that knows confluence --- Warren, NJ----$55 per hr on C2C
Stop spamming the list

On Monday, March 16, 2015, HCL GLOBAL mo...@hclglobal.com wrote:

-Original Message-
From: HCL GLOBAL [mailto:mo...@hclglobal.com]
Sent: Monday, March 16, 2015 12:06 PM
To: dev@pig.apache.org
Cc: j...@apache.org
Subject: I need JIRA/UNIX admin that knows confluence

Wireless in Warren- $55 per hr on C2C

--
Thank You,
/andi

Andi Levin
Indeed.com | West Coast Engineering Talent
a...@indeed.com | (425) 954-7085
Indeed | How the World Works.™
100% of the talent in our new ad was hired on Indeed! Watch how we made it happen
https://www.youtube.com/watch?v=aa6hoIgYjtQ&list=PL6qIzGkkiXFFktKEuZ-rCdPMWdgbOrmZB
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (23 issues)

Subscriber: pigdaily

Key         Summary
PIG-4458    Support UDFs in a FOREACH Before a Merge Join
            https://issues.apache.org/jira/browse/PIG-4458
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4452    Embedded SQL using SQL instead of sql fails with string index out of range: -1 error
            https://issues.apache.org/jira/browse/PIG-4452
PIG-4422    Implement visitMergeJoin in SparkCompiler
            https://issues.apache.org/jira/browse/PIG-4422
PIG-4377    Skewed outer join produce wrong result in some cases
            https://issues.apache.org/jira/browse/PIG-4377
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4193    Make collected group work with Spark
            https://issues.apache.org/jira/browse/PIG-4193
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3294    Allow Pig use Hive UDFs
            https://issues.apache.org/jira/browse/PIG-3294

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384
[jira] [Commented] (PIG-4422) Implement visitMergeJoin in SparkCompiler
[ https://issues.apache.org/jira/browse/PIG-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362873#comment-14362873 ]

liyunzhang_intel commented on PIG-4422:
---------------------------------------

[~praveenr019], merge join is a big feature. Can you submit a detailed design doc on how to implement it in Spark, similar to https://wiki.apache.org/pig/PigMergeJoin (a design for implementing it in MR)?

Implement visitMergeJoin in SparkCompiler
-----------------------------------------

Key: PIG-4422
URL: https://issues.apache.org/jira/browse/PIG-4422
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
Assignee: Praveen Rachabattuni
Fix For: spark-branch

In PIG-4374_6.patch, SparkCompiler#visitMergeJoin is marked TODO.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
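For readers unfamiliar with the term, the core of a merge join is a single pass over two inputs that are already sorted on the join key. The sketch below is a generic, in-memory textbook version in Python. It is neither the MR design on the wiki page nor a proposed Spark design, and the function name `merge_join` and the sample rows are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, value)


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner-join two lists that are both sorted by key (illustrative only)."""
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # left key is too small, advance the left side
        elif lkey > rkey:
            j += 1  # right key is too small, advance the right side
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lkey, lval, rval))
            i, j = i_end, j_end
    return out


left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left_rows, right_rows))
```

Because both inputs are consumed in key order, neither side has to be shuffled or hashed. A distributed design still has to decide how the sorted inputs are partitioned and aligned, which is the kind of detail the requested design doc would need to cover.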