[jira] [Updated] (PIG-4569) Fix e2e test Rank_1 failure

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4569:

Attachment: PIG-4569-1.patch

Attached a patch. JobConf.getNumMapTasks() cannot get the right number of maps; it 
returns the default value of 2. Changed to use counters to get the number of maps.
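
For illustration, here is a minimal sketch of the counter-based approach; the choice of 
counter and the wrapper class are assumptions for this example, not the contents of 
PIG-4569-1.patch:

{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class MapCountFromCounters {
    // Read the actual number of launched map tasks from the job counters after
    // the job has run, instead of trusting JobConf.getNumMapTasks().
    // TOTAL_LAUNCHED_MAPS is an assumed choice of counter; the committed patch
    // may read a different (Pig-specific) counter.
    static long getNumMaps(Job completedJob) throws IOException {
        return completedJob.getCounters()
                           .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS)
                           .getValue();
    }
}
{code}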

> Fix e2e test Rank_1 failure
> ---
>
> Key: PIG-4569
> URL: https://issues.apache.org/jira/browse/PIG-4569
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4569-1.patch
>
>
> It fails on Hadoop 1, but the issue could exist on Hadoop 2 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4569) Fix e2e test Rank_1 failure

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4569:

Description: 
It fails on Hadoop 1, but the issue could exist on Hadoop 2 as well.

Error message:
{code}
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while 
executing ForEach at [C[6,4]]
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:279)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: Unable to read counter 
pig.counters.counter_6929257954538808410_2
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getRankCounterOffset(PORank.java:185)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:160)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNextTuple(PORank.java:141)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
... 11 more
{code}

  was:It fails on Hadoop 1, but the issue could exist on Hadoop 2 as well.


> Fix e2e test Rank_1 failure
> ---
>
> Key: PIG-4569
> URL: https://issues.apache.org/jira/browse/PIG-4569
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4569-1.patch
>
>
> It fails on Hadoop 1, but the issue could exist on Hadoop 2 as well.
> Error message:
> {code}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while 
> executing ForEach at [C[6,4]]
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:279)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.RuntimeException: Unable to read counter 
> pig.counters.counter_6929257954538808410_2
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getRankCounterOffset(PORank.java:185)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNextTuple(PORank.java:141)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
>   ... 11 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4569) Fix e2e test Rank_1 failure

2015-05-21 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4569:
---

 Summary: Fix e2e test Rank_1 failure
 Key: PIG-4569
 URL: https://issues.apache.org/jira/browse/PIG-4569
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.15.0


It fails on Hadoop 1, but the issue could exist on Hadoop 2 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4568) Fix unit test failure in TestSecondarySortSpark

2015-05-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4568:
--
Attachment: PIG-4568.patch

[~mohitsabharwal],[~xuefuz],[~praveenr019]:
PIG-4568.patch fixes the following unit test failures:
1. TestSecondarySortSpark#testNestedDistinctEndToEnd1
2. TestSecondarySortSpark#testNestedDistinctEndToEnd2

Changes are:
Compare the sorted GROUP results against the expected results.
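
Below is a minimal sketch of that comparison strategy, using a hypothetical helper rather 
than the actual code in TestSecondarySortSpark: sort both the actual GROUP output and the 
expected output before asserting equality, since Spark does not guarantee output order.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import static org.junit.Assert.assertEquals;

// Hypothetical helper: sort both sides before comparing, because the order of
// GROUP output is not deterministic in Spark mode.
public class SortedResultAssert {
    public static void assertEqualsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> sortedExpected = new ArrayList<>(expected);
        List<String> sortedActual = new ArrayList<>(actual);
        Collections.sort(sortedExpected);
        Collections.sort(sortedActual);
        assertEquals(sortedExpected, sortedActual);
    }
}
{code}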

> Fix unit test failure in TestSecondarySortSpark 
> 
>
> Key: PIG-4568
> URL: https://issues.apache.org/jira/browse/PIG-4568
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4568.patch
>
>
> https://builds.apache.org/job/Pig-spark/191/ shows two new regression unit 
> test failures:
> 1. TestSecondarySortSpark#testNestedDistinctEndToEnd1
> 2. TestSecondarySortSpark#testNestedDistinctEndToEnd2
> These two tests fail because the result of GROUP is not sorted in Spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4568) Fix unit test failure in TestSecondarySortSpark

2015-05-21 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-4568:
-

 Summary: Fix unit test failure in TestSecondarySortSpark 
 Key: PIG-4568
 URL: https://issues.apache.org/jira/browse/PIG-4568
 Project: Pig
  Issue Type: Sub-task
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel


https://builds.apache.org/job/Pig-spark/191/ shows two new regression unit test 
failures:
1. TestSecondarySortSpark#testNestedDistinctEndToEnd1
2. TestSecondarySortSpark#testNestedDistinctEndToEnd2

These two tests fail because the result of GROUP is not sorted in Spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Subscription: PIG patch available

2015-05-21 Thread jira
Issue Subscription
Filter: PIG patch available (28 issues)

Subscriber: pigdaily

Key Summary
PIG-4565Support custom MR partitioners for Spark engine 
https://issues.apache.org/jira/browse/PIG-4565
PIG-4549Set CROSS operation parallelism for Spark engine
https://issues.apache.org/jira/browse/PIG-4549
PIG-4539New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4526Make setting up the build environment easier
https://issues.apache.org/jira/browse/PIG-4526
PIG-4490MIN/MAX builtin UDFs return wrong results when accumulating for 
strings
https://issues.apache.org/jira/browse/PIG-4490
PIG-4468Pig's jackson version conflicts with that of hadoop 2.6.0
https://issues.apache.org/jira/browse/PIG-4468
PIG-4455Should use DependencyOrderWalker instead of DepthFirstWalker in 
MRPrinter
https://issues.apache.org/jira/browse/PIG-4455
PIG-4429Add Pig alias information and Pig script to the DAG view in Tez UI
https://issues.apache.org/jira/browse/PIG-4429
PIG-4418NullPointerException in JVMReuseImpl
https://issues.apache.org/jira/browse/PIG-4418
PIG-4417Pig's register command should support automatic fetching of jars 
from repo.
https://issues.apache.org/jira/browse/PIG-4417
PIG-4373Implement Optimize the use of DistributedCache(PIG-2672) and 
PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4365TOP udf should implement Accumulator interface
https://issues.apache.org/jira/browse/PIG-4365
PIG-4341Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4111Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4004Upgrade the Pigmix queries from the (old) mapred API to mapreduce
https://issues.apache.org/jira/browse/PIG-4004
PIG-4002Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3851Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668COR built-in function when atleast one of the coefficient values is 
NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3587add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Commented] (PIG-4066) An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555621#comment-14555621
 ] 

Daniel Dai commented on PIG-4066:
-

[~cheolsoo], no problem, I will take care of it.

> An optimization for ROLLUP operation in Pig
> ---
>
> Key: PIG-4066
> URL: https://issues.apache.org/jira/browse/PIG-4066
> Project: Pig
>  Issue Type: Improvement
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Quang-Nhat HOANG-XUAN
>  Labels: hybrid-irg, optimization, rollup
> Fix For: 0.15.0
>
> Attachments: Current Rollup vs Our Rollup.jpg, PIG-4066-revert.patch, 
> PIG-4066.2.patch, PIG-4066.3.patch, PIG-4066.4.patch, PIG-4066.5.patch, 
> PIG-4066.patch, TechnicalNotes.2.pdf, TechnicalNotes.pdf, UserGuide.pdf
>
>
> This patch aims at addressing the current limitation of the ROLLUP operator 
> in PIG: most of the work is done in the Map phase of the underlying MapReduce 
> job to generate all possible intermediate keys that the reducer uses to 
> aggregate and produce the ROLLUP output. Based on our previous work: 
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of 
> MapReduce ROLLUP aggregates” 
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we 
> show that the design space for a ROLLUP implementation allows for a different 
> approach (in-reducer grouping, IRG), in which less work is done in the Map 
> phase and the grouping is done in the Reduce phase. This patch presents the 
> most efficient implementation we designed (Hybrid IRG), which allows defining 
> a parameter to balance between parallelism (in the reducers) and 
> communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { 
> CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]}...]
> In case there are multiple ROLLUP operators in one CUBE clause, the last ROLLUP 
> operator will be executed with our approach (IRG, Hybrid IRG) while the 
> preceding ROLLUP operators will be executed with the default approach.
> We have already run some experiments comparing our ROLLUP 
> implementation against the current ROLLUP. More information can be found here: 
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> The patch can be reviewed here: https://reviews.apache.org/r/23804/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555620#comment-14555620
 ] 

Daniel Dai commented on PIG-4567:
-

Thanks [~prkommireddi], I will roll an RC by the end of the week.

> Allow UDFs to specify a counter increment other than default of 1
> -
>
> Key: PIG-4567
> URL: https://issues.apache.org/jira/browse/PIG-4567
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
> Fix For: 0.16.0
>
>
> Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
> to report counters which increments by 1. 
> {code}
> public final void warn(String msg, Enum warningEnum)
> {code}
> It would be more flexible to have an additional method that takes in an 
> argument to increment the counter by.
> {code}
> public final void warn(String msg, Enum warningEnum, long incr)
> {code}
> This will be useful when you might have, for instance, several fields within 
> the same row that are bad and you want the counter to reflect that. Making 
> repetitive "warn" calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4066) An optimization for ROLLUP operation in Pig

2015-05-21 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555615#comment-14555615
 ] 

Cheolsoo Park commented on PIG-4066:


[~daijy], sorry for the trouble and thanks for the clean-up.

> An optimization for ROLLUP operation in Pig
> ---
>
> Key: PIG-4066
> URL: https://issues.apache.org/jira/browse/PIG-4066
> Project: Pig
>  Issue Type: Improvement
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Quang-Nhat HOANG-XUAN
>  Labels: hybrid-irg, optimization, rollup
> Fix For: 0.15.0
>
> Attachments: Current Rollup vs Our Rollup.jpg, PIG-4066-revert.patch, 
> PIG-4066.2.patch, PIG-4066.3.patch, PIG-4066.4.patch, PIG-4066.5.patch, 
> PIG-4066.patch, TechnicalNotes.2.pdf, TechnicalNotes.pdf, UserGuide.pdf
>
>
> This patch aims at addressing the current limitation of the ROLLUP operator 
> in PIG: most of the work is done in the Map phase of the underlying MapReduce 
> job to generate all possible intermediate keys that the reducer uses to 
> aggregate and produce the ROLLUP output. Based on our previous work: 
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of 
> MapReduce ROLLUP aggregates” 
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we 
> show that the design space for a ROLLUP implementation allows for a different 
> approach (in-reducer grouping, IRG), in which less work is done in the Map 
> phase and the grouping is done in the Reduce phase. This patch presents the 
> most efficient implementation we designed (Hybrid IRG), which allows defining 
> a parameter to balance between parallelism (in the reducers) and 
> communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { 
> CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]}...]
> In case there are multiple ROLLUP operators in one CUBE clause, the last ROLLUP 
> operator will be executed with our approach (IRG, Hybrid IRG) while the 
> preceding ROLLUP operators will be executed with the default approach.
> We have already run some experiments comparing our ROLLUP 
> implementation against the current ROLLUP. More information can be found here: 
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> The patch can be reviewed here: https://reviews.apache.org/r/23804/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4549) Set CROSS operation parallelism for Spark engine

2015-05-21 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated PIG-4549:
-
Attachment: PIG-4549.2.patch

> Set CROSS operation parallelism for Spark engine
> 
>
> Key: PIG-4549
> URL: https://issues.apache.org/jira/browse/PIG-4549
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Affects Versions: spark-branch
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4549.1.patch, PIG-4549.2.patch, PIG-4549.patch
>
>
> The Spark engine should set the parallelism to be used for the CROSS operation by 
> the GFCross UDF.
> If not set, GFCross throws an exception:
> {code}
> String s = cfg.get(PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey);
> if (s == null) {
>     throw new IOException("Unable to get parallelism hint from job conf");
> }
> {code}
> Estimating parallelism for the Spark engine is a TBD item. Until that is done, 
> for CROSS to work, we should use the default parallelism value in GFCross.
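
For reference, a minimal sketch of how a launcher could supply that hint before submitting 
the job; the helper class, crossKey suffix, parallelism value, and the PigImplConstants 
package location are assumptions here, and the actual Spark-side change in PIG-4549.2.patch 
may differ:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.impl.PigImplConstants;

public class CrossParallelismHint {
    // Set the property that GFCross reads back from the job conf (see the
    // snippet quoted above). crossKey and parallelism are hypothetical inputs.
    static void setHint(Configuration conf, String crossKey, int parallelism) {
        conf.set(PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey,
                 Integer.toString(parallelism));
    }
}
{code}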



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462
 ] 

Prashant Kommireddi commented on PIG-4567:
--

Hi [~daijy], I changed it to 0.16. When are you planning to roll the 0.15 RC? I 
will try to make it before then; if not, I will aim for 0.16.

> Allow UDFs to specify a counter increment other than default of 1
> -
>
> Key: PIG-4567
> URL: https://issues.apache.org/jira/browse/PIG-4567
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
> Fix For: 0.16.0
>
>
> Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
> to report counters which increments by 1. 
> {code}
> public final void warn(String msg, Enum warningEnum)
> {code}
> It would be more flexible to have an additional method that takes in an 
> argument to increment the counter by.
> {code}
> public final void warn(String msg, Enum warningEnum, long incr)
> {code}
> This will be useful when you might have, for instance, several fields within 
> the same row that are bad and you want the counter to reflect that. Making 
> repetitive "warn" calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-4567:
-
Fix Version/s: (was: 0.15.0)
   0.16.0

> Allow UDFs to specify a counter increment other than default of 1
> -
>
> Key: PIG-4567
> URL: https://issues.apache.org/jira/browse/PIG-4567
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
> Fix For: 0.16.0
>
>
> Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
> to report counters which increments by 1. 
> {code}
> public final void warn(String msg, Enum warningEnum)
> {code}
> It would be more flexible to have an additional method that takes in an 
> argument to increment the counter by.
> {code}
> public final void warn(String msg, Enum warningEnum, long incr)
> {code}
> This will be useful when you might have, for instance, several fields within 
> the same row that are bad and you want the counter to reflect that. Making 
> repetitive "warn" calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555337#comment-14555337
 ] 

Daniel Dai commented on PIG-4567:
-

[~prkommireddi], it might not be able to make it into 0.15.0. May I change the fix 
version to 0.16?

> Allow UDFs to specify a counter increment other than default of 1
> -
>
> Key: PIG-4567
> URL: https://issues.apache.org/jira/browse/PIG-4567
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Prashant Kommireddi
> Fix For: 0.15.0
>
>
> Current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method 
> to report counters which increments by 1. 
> {code}
> public final void warn(String msg, Enum warningEnum)
> {code}
> It would be more flexible to have an additional method that takes in an 
> argument to increment the counter by.
> {code}
> public final void warn(String msg, Enum warningEnum, long incr)
> {code}
> This will be useful when you might have, for instance, several fields within 
> the same row that are bad and you want the counter to reflect that. Making 
> repetitive "warn" calls is not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4400) Documentation for RollupHIIOptimizer (PIG-4066)

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4400:

Fix Version/s: (was: 0.15.0)
   0.16.0

> Documentation for RollupHIIOptimizer (PIG-4066)
> ---
>
> Key: PIG-4400
> URL: https://issues.apache.org/jira/browse/PIG-4400
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Daniel Dai
>Priority: Critical
>  Labels: hybrid-irg, rollup
> Fix For: 0.16.0
>
> Attachments: PIG-4400-1.patch
>
>
> Adding documentation for RollupHIIOptimizer.
> Please refer to PIG-4066.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PIG-4400) Documentation for RollupHIIOptimizer (PIG-4066)

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-4400:
---

Assignee: Daniel Dai

> Documentation for RollupHIIOptimizer (PIG-4066)
> ---
>
> Key: PIG-4400
> URL: https://issues.apache.org/jira/browse/PIG-4400
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Daniel Dai
>Priority: Critical
>  Labels: hybrid-irg, rollup
> Fix For: 0.15.0
>
> Attachments: PIG-4400-1.patch
>
>
> Adding documentation for RollupHIIOptimizer.
> Please refer to PIG-4066.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4566) Reimplement PIG-4066: An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4566:

Attachment: PIG-4566-1.patch

Did some major cleanup on the original patch:
1. The new execution path only kicks in with the "pivot" keyword.
2. Limited "rollup pivot" to be the last element in the cube statement.
3. Lots of cleanup in the logical/physical/MR plans.

Still to do:
1. Bug fixes and cleanup in the engine part.
2. Port it to Tez.

> Reimplement PIG-4066: An optimization for ROLLUP operation in Pig
> -
>
> Key: PIG-4566
> URL: https://issues.apache.org/jira/browse/PIG-4566
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4566-1.patch
>
>
> There are some issues in the original implementation of PIG-4066. Since the 
> fix will touch most of the patch, I'd like to roll back PIG-4066 and 
> reimplement it here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4066) An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4066:

Attachment: PIG-4066-revert.patch

Patch reverted on the 0.15 branch and trunk. Attaching the revert patch.

> An optimization for ROLLUP operation in Pig
> ---
>
> Key: PIG-4066
> URL: https://issues.apache.org/jira/browse/PIG-4066
> Project: Pig
>  Issue Type: Improvement
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Quang-Nhat HOANG-XUAN
>  Labels: hybrid-irg, optimization, rollup
> Fix For: 0.15.0
>
> Attachments: Current Rollup vs Our Rollup.jpg, PIG-4066-revert.patch, 
> PIG-4066.2.patch, PIG-4066.3.patch, PIG-4066.4.patch, PIG-4066.5.patch, 
> PIG-4066.patch, TechnicalNotes.2.pdf, TechnicalNotes.pdf, UserGuide.pdf
>
>
> This patch aims at addressing the current limitation of the ROLLUP operator 
> in PIG: most of the work is done in the Map phase of the underlying MapReduce 
> job to generate all possible intermediate keys that the reducer uses to 
> aggregate and produce the ROLLUP output. Based on our previous work: 
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of 
> MapReduce ROLLUP aggregates” 
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we 
> show that the design space for a ROLLUP implementation allows for a different 
> approach (in-reducer grouping, IRG), in which less work is done in the Map 
> phase and the grouping is done in the Reduce phase. This patch presents the 
> most efficient implementation we designed (Hybrid IRG), which allows defining 
> a parameter to balance between parallelism (in the reducers) and 
> communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { 
> CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]}...]
> In case there are multiple ROLLUP operators in one CUBE clause, the last ROLLUP 
> operator will be executed with our approach (IRG, Hybrid IRG) while the 
> preceding ROLLUP operators will be executed with the default approach.
> We have already run some experiments comparing our ROLLUP 
> implementation against the current ROLLUP. More information can be found here: 
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> The patch can be reviewed here: https://reviews.apache.org/r/23804/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-3953) Ordering on date doesn't work

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3953:

Fix Version/s: (was: 0.12.1)
   0.16.0

> Ordering on date doesn't work
> -
>
> Key: PIG-3953
> URL: https://issues.apache.org/jira/browse/PIG-3953
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Ben Vermeersch
> Fix For: 0.16.0
>
>
> I get the following error when trying to order by a datetime datatype:
> java.lang.Exception: java.lang.NoSuchMethodError: 
> org.joda.time.DateTime.compareTo(Lorg/joda/time/ReadableInstant;)I
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
> Caused by: java.lang.NoSuchMethodError: 
> org.joda.time.DateTime.compareTo(Lorg/joda/time/ReadableInstant;)I
>   at 
> org.apache.pig.backend.hadoop.DateTimeWritable$Comparator.compare(DateTimeWritable.java:105)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigDateTimeRawComparator.compare(PigDateTimeRawComparator.java:82)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1248)
>   at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:74)
>   at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1582)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
>   at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
>   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Can be reproduced using the following dates.txt:
>  
> {quote}
> 12-01-2013
> 13-04-2014
> 12-01-2013
> 01-01-2012
> 02-12-2011
> {quote}
> And the following script:
> {quote}
> records = LOAD 'dates.txt' as (date:chararray);
> dates = foreach records generate ToDate(date, 'dd-MM-');
> orderedDates = order dates by $0;
> DUMP orderedDates;
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-3920) Case statement support

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3920:

Fix Version/s: (was: 0.12.1)
   0.16.0

> Case statement support
> --
>
> Key: PIG-3920
> URL: https://issues.apache.org/jira/browse/PIG-3920
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.12.1
>Reporter: llddy
> Fix For: 0.16.0
>
>
> CASE WHEN with IS NOT NULL conditions, expression functions, and nested 
> functions are not currently supported; please help solve this, thank you very much.
> Combining multiple conditions with "and" and functions such as CONCAT are also not supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4567) Allow UDFs to specify a counter increment other than default of 1

2015-05-21 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-4567:


 Summary: Allow UDFs to specify a counter increment other than 
default of 1
 Key: PIG-4567
 URL: https://issues.apache.org/jira/browse/PIG-4567
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Prashant Kommireddi
 Fix For: 0.15.0


The current APIs (EvalFunc, LoadFunc and StoreFunc) have a default *warn* method for 
reporting counters, which increments the counter by 1. 
{code}
public final void warn(String msg, Enum warningEnum)
{code}

It would be more flexible to have an additional method that takes in an 
argument to increment the counter by.
{code}
public final void warn(String msg, Enum warningEnum, long incr)
{code}

This will be useful when, for instance, several fields within the same row are bad 
and you want the counter to reflect that; making repetitive "warn" calls is not ideal.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4566) Reimplement PIG-4066: An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4566:

Issue Type: New Feature  (was: Bug)

> Reimplement PIG-4066: An optimization for ROLLUP operation in Pig
> -
>
> Key: PIG-4566
> URL: https://issues.apache.org/jira/browse/PIG-4566
> Project: Pig
>  Issue Type: New Feature
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.16.0
>
>
> There are some issues in the original implementation of PIG-4066. Since the 
> fix will touch most of the patch, I'd like to roll back PIG-4066 and 
> reimplement it here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4566) Reimplement PIG-4066: An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4566:
---

 Summary: Reimplement PIG-4066: An optimization for ROLLUP 
operation in Pig
 Key: PIG-4566
 URL: https://issues.apache.org/jira/browse/PIG-4566
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.16.0


There are some issues in the original implementation of PIG-4066. Since the fix 
will touch most of the patch, I'd like to roll back PIG-4066 and reimplement it 
here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4066) An optimization for ROLLUP operation in Pig

2015-05-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555201#comment-14555201
 ] 

Daniel Dai commented on PIG-4066:
-

Looking at the patch while trying to document it: the idea is good and simple; however, 
there are a couple of issues in the implementation:
1. Some basic queries do not work, e.g. "cubed_and_rolled = CUBE salesinp BY 
CUBE(product,year), ROLLUP(region, state, city) pivot 1;"
2. Even when there is no "pivot" keyword, the implementation still uses the new 
pivot code.
3. All scripts go through RollupHIIOptimizer, and it is on by default. Together, #2 
and #3 make it impossible to ship this as just an experimental feature.
4. The logic of RollupHII should be wrapped inside the new operator, not propagated 
to cogroup/UserFuncExpression, etc.
5. There is a lot of redundant code that needs to be cleaned up.
6. Not a showstopper, but I would like to port it to Tez as well.

I have already done quite a bit of cleanup. Since the fix will touch a major part of the 
original patch, I'd like to roll back the patch completely first and then redo it, to 
keep the commit history less confusing.

> An optimization for ROLLUP operation in Pig
> ---
>
> Key: PIG-4066
> URL: https://issues.apache.org/jira/browse/PIG-4066
> Project: Pig
>  Issue Type: Improvement
>Reporter: Quang-Nhat HOANG-XUAN
>Assignee: Quang-Nhat HOANG-XUAN
>  Labels: hybrid-irg, optimization, rollup
> Fix For: 0.15.0
>
> Attachments: Current Rollup vs Our Rollup.jpg, PIG-4066.2.patch, 
> PIG-4066.3.patch, PIG-4066.4.patch, PIG-4066.5.patch, PIG-4066.patch, 
> TechnicalNotes.2.pdf, TechnicalNotes.pdf, UserGuide.pdf
>
>
> This patch aims at addressing the current limitation of the ROLLUP operator 
> in PIG: most of the work is done in the Map phase of the underlying MapReduce 
> job to generate all possible intermediate keys that the reducer uses to 
> aggregate and produce the ROLLUP output. Based on our previous work: 
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of 
> MapReduce ROLLUP aggregates” 
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we 
> show that the design space for a ROLLUP implementation allows for a different 
> approach (in-reducer grouping, IRG), in which less work is done in the Map 
> phase and the grouping is done in the Reduce phase. This patch presents the 
> most efficient implementation we designed (Hybrid IRG), which allows defining 
> a parameter to balance between parallelism (in the reducers) and 
> communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { 
> CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]}...]
> In case there are multiple ROLLUP operators in one CUBE clause, the last ROLLUP 
> operator will be executed with our approach (IRG, Hybrid IRG) while the 
> preceding ROLLUP operators will be executed with the default approach.
> We have already run some experiments comparing our ROLLUP 
> implementation against the current ROLLUP. More information can be found here: 
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> The patch can be reviewed here: https://reviews.apache.org/r/23804/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-4556.
-
Resolution: Fixed

PIG-4556-3.patch committed to both the 0.15 branch and trunk. Thanks Thejas and Rohini 
for the quick review!

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch, PIG-4556-3.patch
>
>
> HExecutionEngine.getS3Conf is wrong: it should only return the s3 config. 
> Currently it returns all the properties, including *-site.xml, even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555174#comment-14555174
 ] 

Rohini Palaniswamy commented on PIG-4556:
-

+1

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch, PIG-4556-3.patch
>
>
> HExecutionEngine.getS3Conf is wrong: it should only return the s3 config. 
> Currently it returns all the properties, including *-site.xml, even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4556:

Attachment: PIG-4556-3.patch

Looking at PIG-4247, it only needs to add the s3 properties in local mode, so yes, we 
can simplify. Attaching another patch as Rohini suggested.

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch, PIG-4556-3.patch
>
>
> HExecutionEngine.getS3Conf is wrong: it should only return the s3 config. 
> Currently it returns all the properties, including *-site.xml, even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555170#comment-14555170
 ] 

Rohini Palaniswamy commented on PIG-4556:
-

bq. I believe s3 is used in non-local modes as well.
   Yes, but the additional getS3Conf() code is only needed in local mode, as 
core-site.xml is not loaded there. Non-local modes load core-site.xml and so 
already have the s3 properties.

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong: it should only return the s3 config. 
> Currently it returns all the properties, including *-site.xml, even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555165#comment-14555165
 ] 

Thejas M Nair commented on PIG-4556:


I believe s3 is used in non-local modes as well.
The change looks good to me. (For my reference, since it took me a few minutes to figure 
out: the real change is the order in which the parameters are passed to 
ConfigurationUtil.mergeConf.)


> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong: it should only return the s3 config. 
> Currently it returns all the properties, including *-site.xml, even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", which ends up in the local mode 
> config, and we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555160#comment-14555160
 ] 

Rohini Palaniswamy commented on PIG-4556:
-

Do we need the s3 conf in non-local mode? It can be simplified to the code below.

{code}
JobConf jc;
if (!this.pigContext.getExecType().isLocal()) {
    jc = getExecConf(properties);

    // Trick to ...
} else {
    // If we are running in local mode we dont read the hadoop conf file.
    properties.setProperty(ALTERNATIVE_FILE_SYSTEM_LOCATION, "file:///");

    jc = getLocalConf();
    // Pick s3 properties from core-site.xml and add to local conf
    JobConf s3Jc = getS3Conf();
    ConfigurationUtil.mergeConf(jc, s3Jc);
}
{code}

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong. It should only return s3 config. 
> Currently it will return all the properties, including *-site.xml even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", this will going to the local mode 
> config, thus we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4556:

Attachment: PIG-4556-2.patch

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong. It should only return s3 config. 
> Currently it will return all the properties, including *-site.xml even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", this will going to the local mode 
> config, thus we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (PIG-4556) Local mode is broken in some case by PIG-4247

2015-05-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-4556:
-

This fix breaks MR mode. In particular, Pig fails on a secure cluster. One more 
fix is needed.

> Local mode is broken in some case by PIG-4247
> -
>
> Key: PIG-4556
> URL: https://issues.apache.org/jira/browse/PIG-4556
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4556-1.patch, PIG-4556-2.patch
>
>
> HExecutionEngine.getS3Conf is wrong. It should only return s3 config. 
> Currently it will return all the properties, including *-site.xml even in 
> local mode. In one particular case, mapred-site.xml contains 
> "mapreduce.application.framework.path", this will going to the local mode 
> config, thus we see the exception:
> {code}
> Message: java.io.FileNotFoundException: File 
> file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
>   at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
>   at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
>   at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
>   at 
> org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
>   at 
> org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Pig-trunk-commit #2138

2015-05-21 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-4562: Typo in DataType.toDateTime

[daijy] PIG-4563: Upgrade to released Tez 0.7.0

--
[...truncated 4417 lines...]
[junit] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.914 sec
[junit] Running org.apache.pig.test.TestNewPlanImplicitSplit
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.12 sec
[junit] Running org.apache.pig.test.TestNewPlanListener
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.368 sec
[junit] Running org.apache.pig.test.TestNewPlanLogToPhyTranslationVisitor
[junit] Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.792 sec
[junit] Running org.apache.pig.test.TestNewPlanLogicalOptimizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.389 sec
[junit] Running org.apache.pig.test.TestNewPlanOperatorPlan
[junit] Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.009 sec
[junit] Running org.apache.pig.test.TestNewPlanPruneMapKeys
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.958 sec
[junit] Running org.apache.pig.test.TestNewPlanPushDownForeachFlatten
[junit] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.879 sec
[junit] Running org.apache.pig.test.TestNewPlanPushUpFilter
[junit] Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.965 sec
[junit] Running org.apache.pig.test.TestNewPlanRule
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.332 sec
[junit] Running org.apache.pig.test.TestNotEqualTo
[junit] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.445 sec
[junit] Running org.apache.pig.test.TestNull
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.437 sec
[junit] Running org.apache.pig.test.TestNullConstant
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.473 sec
[junit] Running org.apache.pig.test.TestNumberOfReducers
[junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
511.52 sec
[junit] Running org.apache.pig.test.TestOptimizeLimit
[junit] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.346 sec
[junit] Running org.apache.pig.test.TestOrderBy3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.258 sec
[junit] Running org.apache.pig.test.TestPOBinCond
[junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.456 sec
[junit] Running org.apache.pig.test.TestPOCast
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.878 sec
[junit] Running org.apache.pig.test.TestPODistinct
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.388 sec
[junit] Running org.apache.pig.test.TestPOGenerate
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.362 sec
[junit] Running org.apache.pig.test.TestPOMapLookUp
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.352 sec
[junit] Running org.apache.pig.test.TestPONegative
[junit] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.711 sec
[junit] Running org.apache.pig.test.TestPOPartialAgg
[junit] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.929 sec
[junit] Running org.apache.pig.test.TestPOPartialAggPlan
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
0.245 sec
[junit] Running org.apache.pig.test.TestPORegexp
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.372 sec
[junit] Running org.apache.pig.test.TestPOSort
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.444 sec
[junit] Running org.apache.pig.test.TestPOSplit
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.364 sec
[junit] Running org.apache.pig.test.TestPOUserFunc
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.446 sec
[junit] Running org.apache.pig.test.TestPackage
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.787 sec
[junit] Running org.apache.pig.test.TestParamSubPreproc
[junit] Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.352 sec
[junit] Running org.apache.pig.test.TestParser
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.996 sec
[junit] Running org.apache.pig.test.TestPhyOp
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.433 sec
[junit] Running org.apache.pig.test.TestPhyPatternMatch
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.352 sec
[junit] R

Review Request 34537: PIG-4365 TOP udf should implement Accumulator interface

2015-05-21 Thread Eyal Allweil

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34537/
---

Review request for pig.


Bugs: PIG-4365
https://issues.apache.org/jira/browse/PIG-4365


Repository: pig


Description
---

I think the implementation of Accumulator is pretty straightforward - I 
extended AccumulatorEvalFunc, so very few changes were needed.
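
To make the approach concrete, below is a minimal sketch of a top-N UDF built on 
AccumulatorEvalFunc. It is not the diff under review; the class name, argument 
positions, and ordering are illustrative assumptions, and the real TOP handles 
more (nulls, output schema, algebraic mode).

{code}
import java.io.IOException;
import java.util.Comparator;
import java.util.PriorityQueue;

import org.apache.pig.AccumulatorEvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;

// Keeps the N tuples with the largest value in a given column; the input bag
// arrives in chunks through accumulate() instead of all at once through exec().
public class SimpleTop extends AccumulatorEvalFunc<DataBag> {
    private PriorityQueue<Tuple> store; // min-heap on the sort column, so the smallest entry is evicted
    private int limit;

    @Override
    public void accumulate(Tuple input) throws IOException {
        // The accumulate input mirrors the UDF arguments: (n, columnIndex, chunk-of-rows)
        limit = DataType.toInteger(input.get(0));
        final int column = DataType.toInteger(input.get(1));
        DataBag chunk = (DataBag) input.get(2);

        if (store == null) {
            store = new PriorityQueue<Tuple>(limit + 1, new Comparator<Tuple>() {
                @Override
                public int compare(Tuple a, Tuple b) {
                    try {
                        return DataType.compare(a.get(column), b.get(column));
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        for (Tuple t : chunk) {
            store.add(t);
            if (store.size() > limit) {
                store.poll(); // drop the current minimum, keeping only the top N
            }
        }
    }

    @Override
    public DataBag getValue() {
        DataBag result = BagFactory.getInstance().newDefaultBag();
        if (store != null) {
            for (Tuple t : store) {
                result.add(t);
            }
        }
        return result;
    }

    @Override
    public void cleanup() {
        store = null; // reset state so the next group starts fresh
    }
}
{code}

The point of the change is that Pig can then stream each grouped bag into the UDF 
in batches rather than materializing the whole bag in memory before calling exec().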


Diffs
-

  trunk/src/org/apache/pig/builtin/TOP.java 1680807 
  trunk/test/org/apache/pig/builtin/TestTOP.java 1680807 
  trunk/test/org/apache/pig/test/TestAccumulator.java 1680807 

Diff: https://reviews.apache.org/r/34537/diff/


Testing
---


Thanks,

Eyal Allweil