[jira] [Commented] (PIG-4385) testDefaultBootup fails because it cannot find pig.properties

2015-02-25 Thread Martin Kudlej (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336454#comment-14336454
 ] 

Martin Kudlej commented on PIG-4385:


I use Pig distributed by Hortonworks, and in that case PIG_HOME points to
/usr/lib/pig. This directory is owned by root:root with permissions rwxr-xr-x, so
the pig user (or any other user) cannot write there. As a result, this test
cannot write pig.properties into that directory and fails. I would still like
to run the Pig smoke tests with this distribution package, which is why I
created this patch.

 testDefaultBootup fails because it cannot find pig.properties
 ---

 Key: PIG-4385
 URL: https://issues.apache.org/jira/browse/PIG-4385
 Project: Pig
  Issue Type: Test
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Martin Kudlej
 Attachments: 0001-PIG-4385.patch


 testDefaultBootup fails because Pig cannot find the created pig.properties
 file. I think the test of using the pig.properties file should be
 separated from testDefaultBootup. The only clean way to do that is to pass
 a Properties object when starting PigServer.
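A minimal sketch of that idea, assuming the test only needs the key/value pairs that would otherwise be written to pig.properties: the content is loaded into an in-memory java.util.Properties object instead of a file under PIG_HOME, so a read-only installation such as /usr/lib/pig no longer matters. The class name and property key are hypothetical; handing the result to PigServer (for example through a PigServer(ExecType, Properties) constructor) is assumed and not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: builds a test's startup properties in memory instead of
// writing a pig.properties file into a (possibly read-only) PIG_HOME.
class InMemoryPigProperties {

    // Parses text in the same key=value format a pig.properties file uses.
    static Properties fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical key, for illustration only.
        Properties props = fromText("example.key=example.value\n");
        // Assumption: a real test would now pass `props` to PigServer at startup
        // rather than relying on a pig.properties file under PIG_HOME.
        System.out.println(props.getProperty("example.key"));
    }
}
```

Keeping the properties in memory means the test has no filesystem side effects, which is what lets it run against a packaged installation.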



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor

2015-02-25 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336462#comment-14336462
 ] 

Brian Johnson commented on PIG-4408:


Without the change, the included test case fails with the wrong count. I can't
remember exactly now, but I believe that with only one record on the LHS of the
merge, it will not produce any output at all. It still eventually gets to EOP
now, but, as in PIG-4166, this forces another pass to get any records that are
cached waiting for the LHS key to change before being output. I think there was
actually a bug in it before this change.
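The buffering behavior described above can be illustrated with a simplified streaming merge join. This is a hypothetical in-memory sketch, not Pig's merge join operator: LHS records sharing a key are cached and joined only when a record with a different key arrives, and the right side is scanned linearly for brevity. It shows why the last cached group must be flushed when input ends (the EOP case); otherwise an LHS with a single record produces no output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a streaming merge join. Records are {key, value};
// both inputs are assumed to be sorted by key.
class BufferedMergeJoin {

    static List<String> join(List<String[]> left, List<String[]> right, boolean flushAtEnd) {
        List<String> out = new ArrayList<>();
        List<String[]> cached = new ArrayList<>();
        for (String[] rec : left) {
            // Key changed: the cached group is complete and can be joined.
            if (!cached.isEmpty() && !cached.get(0)[0].equals(rec[0])) {
                emit(cached, right, out);
                cached.clear();
            }
            cached.add(rec);
        }
        // End of input (EOP): without this flush the last group, which is the
        // only group when the LHS has a single key, is never emitted.
        if (flushAtEnd && !cached.isEmpty()) {
            emit(cached, right, out);
        }
        return out;
    }

    // Joins one cached LHS group against the matching RHS records.
    private static void emit(List<String[]> cached, List<String[]> right, List<String> out) {
        String key = cached.get(0)[0];
        for (String[] l : cached) {
            for (String[] r : right) {
                if (r[0].equals(key)) {
                    out.add(key + ":" + l[1] + "," + r[1]);
                }
            }
        }
    }
}
```

With `flushAtEnd` set to false, a one-record LHS yields an empty result, matching the symptom described in the comment.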

 Merge join should support replicated join as a predecessor
 --

 Key: PIG-4408
 URL: https://issues.apache.org/jira/browse/PIG-4408
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Brian Johnson
Assignee: Brian Johnson
 Fix For: 0.14.1

 Attachments: patch


 Since a replicated join doesn't trigger a reduce or change the output
 ordering, a merge join should work after it.
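The reasoning can be checked with a small in-memory model: a replicated (fragment-replicate) join loads the small relation into a hash map and streams the large relation once, so output rows appear in the large input's order. If that input was already sorted on the merge-join key, the result is still sorted and can feed a merge join. Everything below is a hypothetical plain-Java sketch, not Pig's physical operators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a map-side replicated join.
class ReplicatedJoinOrder {

    // large rows: {mergeKey, joinKey}; small rows: {joinKey, payload}.
    static List<String[]> replicatedJoin(List<String[]> large, List<String[]> small) {
        // The small relation is held entirely in memory, keyed by join key.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] s : small) {
            table.computeIfAbsent(s[0], k -> new ArrayList<>()).add(s[1]);
        }
        List<String[]> out = new ArrayList<>();
        // Streaming the large relation once, with no shuffle, preserves its order.
        for (String[] row : large) {
            for (String payload : table.getOrDefault(row[1], Collections.emptyList())) {
                out.add(new String[] {row[0], row[1], payload});
            }
        }
        return out;
    }

    // True if rows are non-decreasing on column 0, i.e. still valid merge-join input.
    static boolean sortedOnMergeKey(List<String[]> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1)[0].compareTo(rows.get(i)[0]) > 0) {
                return false;
            }
        }
        return true;
    }
}
```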





[jira] [Commented] (PIG-4426) RowNumber(simple) Rank not producing correct results

2015-02-25 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336605#comment-14336605
 ] 

Koji Noguchi commented on PIG-4426:
---

+1 on moving away from the counter-based approach. (I believe the Tez version of
Rank already does that.)
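For context on the counter-based approach: a simple (row-number) RANK needs each record's global position, which can be derived from the number of records in every preceding partition. The sketch below assumes those per-partition counts are already known (how they are collected, via Hadoop counters or otherwise, is outside its scope) and shows only the offset arithmetic. Class and method names are hypothetical.

```java
// Hypothetical sketch: global row numbers from per-partition record counts.
class RowNumberOffsets {

    // offsets[i] = 1 + total records in partitions 0..i-1 (row numbers start at 1).
    static long[] offsets(long[] countsPerPartition) {
        long[] offsets = new long[countsPerPartition.length];
        long running = 1;
        for (int i = 0; i < countsPerPartition.length; i++) {
            offsets[i] = running;
            running += countsPerPartition[i];
        }
        return offsets;
    }

    // Row number of the localIndex-th record (0-based) within a partition.
    static long rowNumber(long[] offsets, int partition, long localIndex) {
        return offsets[partition] + localIndex;
    }
}
```

If any per-partition count is wrong or missing, every later offset shifts, which is why the reliability of the count-collection mechanism matters.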

 RowNumber(simple) Rank not producing correct results
 

 Key: PIG-4426
 URL: https://issues.apache.org/jira/browse/PIG-4426
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
 Fix For: 0.15.0

 Attachments: pig-4426-v01.txt, pig-4426-v02.txt, pig-4426-v03.txt


 After PIG-4392, I started seeing TestRank3.testRankWithSplitInMap (and some
 others) failing with
 {noformat}
 Comparing actual and expected results.  expected:[(1,1,2), (1,1,2), (1,3,1), 
 (2,1,2), (3,1,2), (3,2,3), (3,2,4), (4,2,3), (5,2,4), (5,3,1)] but 
 was:[(1,1,2), (1,1,2), (3,2,3), (3,2,4), (5,3,1)]
 {noformat}





[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work

2015-02-25 Thread Kevin J. Price (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336632#comment-14336632
 ] 

Kevin J. Price commented on PIG-4433:
-

Thanks, Daniel! Will do.

 Loading bigdecimal in nested tuple does not work
 

 Key: PIG-4433
 URL: https://issues.apache.org/jira/browse/PIG-4433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0, 0.14.1, 0.15.0
Reporter: Kevin J. Price
Assignee: Kevin J. Price
 Fix For: 0.15.0

 Attachments: PIG-4433-1.patch


 The parsing of BigDecimal data types in a nested tuple, as implemented by
 Utf8StorageConverter.java, does not work: a break; statement is missing from
 a switch statement.
 Code example that demonstrates the problem:
 === input.txt ===
 (17,1234567890.0987654321)
 === pig_script ===
 inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
 STORE inp INTO 'output';
 === output ===
 (17,)
 With the patch, the output becomes the expected:
 (17,1234567890.0987654321)
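The class of bug described above, a missing break; in a switch, can be reproduced in a few lines. This is a hypothetical parser, not the actual Utf8StorageConverter code: the type tags, the NULL_FIELD case, and the method names are made up, and only the fall-through behavior is the point.

```java
import java.math.BigDecimal;

// Hypothetical field parser showing how a missing `break;` in a switch lets a
// correctly parsed value be overwritten by the following case.
class FallThroughDemo {

    static final int BIGDECIMAL = 1;
    static final int NULL_FIELD = 2;   // hypothetical case that yields null

    static Object parseBuggy(int type, String text) {
        Object result = null;
        switch (type) {
            case BIGDECIMAL:
                result = new BigDecimal(text);
                // Missing break: execution falls through into the next case.
            case NULL_FIELD:
                result = null;
                break;
            default:
                throw new IllegalArgumentException("unknown type " + type);
        }
        return result;
    }

    static Object parseFixed(int type, String text) {
        Object result = null;
        switch (type) {
            case BIGDECIMAL:
                result = new BigDecimal(text);
                break;   // the fix: stop here so the parsed value survives
            case NULL_FIELD:
                result = null;
                break;
            default:
                throw new IllegalArgumentException("unknown type " + type);
        }
        return result;
    }
}
```

The buggy variant returns null for the BigDecimal field, which mirrors the empty second field in the `(17,)` output above.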





[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor

2015-02-25 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336850#comment-14336850
 ] 

Brian Johnson commented on PIG-4408:


It looks like only one of them needs to change to STATUS_NULL at line 402. I'm
running all the tests again, and then I'll resubmit the patch.

 Merge join should support replicated join as a predecessor
 --

 Key: PIG-4408
 URL: https://issues.apache.org/jira/browse/PIG-4408
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Brian Johnson
Assignee: Brian Johnson
 Fix For: 0.14.1

 Attachments: patch


 Since a replicated join doesn't trigger a reduce or change the output
 ordering, a merge join should work after it.





[jira] [Created] (PIG-4434) Improve auto-parallelism for tez

2015-02-25 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4434:
---

 Summary: Improve auto-parallelism for tez
 Key: PIG-4434
 URL: https://issues.apache.org/jira/browse/PIG-4434
 Project: Pig
  Issue Type: Improvement
  Components: tez
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.15.0


Tez auto-parallelism currently has some limitations:
1. ShuffledVertexManager can only decrease parallelism, not increase it.
2. Pig currently overestimates parallelism at the frontend, so
ShuffledVertexManager might get an initial parallelism far larger than what is
actually needed, which can be costly.

Instead, we can gradually adjust the initial vertex parallelism at runtime
once the upstream vertices finish.
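One way to picture the proposal: once the upstream vertices have finished, the bytes they actually produced are known, and downstream parallelism can be derived from that rather than from a frontend estimate. The formula below (ceiling of output bytes over a target bytes-per-task, clamped to a range) is a common heuristic and an assumption here, not the algorithm used by Pig or Tez; the class and parameter names are hypothetical.

```java
// Hypothetical runtime parallelism estimate from observed upstream output size.
class RuntimeParallelism {

    static int estimate(long upstreamOutputBytes, long bytesPerTask, int maxTasks) {
        if (bytesPerTask <= 0 || maxTasks < 1) {
            throw new IllegalArgumentException("bytesPerTask and maxTasks must be positive");
        }
        // Ceiling division without floating point.
        long tasks = (upstreamOutputBytes + bytesPerTask - 1) / bytesPerTask;
        // The result may be higher or lower than any initial guess; clamp to [1, maxTasks].
        return (int) Math.max(1, Math.min(maxTasks, tasks));
    }
}
```

Because the estimate is computed from observed sizes, it can move in either direction, unlike a manager that only shrinks an inflated initial value.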





[jira] [Updated] (PIG-4408) Merge join should support replicated join as a predecessor

2015-02-25 Thread Brian Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Johnson updated PIG-4408:
---
Attachment: patch

 Merge join should support replicated join as a predecessor
 --

 Key: PIG-4408
 URL: https://issues.apache.org/jira/browse/PIG-4408
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Brian Johnson
Assignee: Brian Johnson
 Fix For: 0.14.1

 Attachments: patch, patch


 Since a replicated join doesn't trigger a reduce or change the output
 ordering, a merge join should work after it.





[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor

2015-02-25 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336906#comment-14336906
 ] 

Brian Johnson commented on PIG-4408:


Updated the patch with a much smaller-scoped change.

 Merge join should support replicated join as a predecessor
 --

 Key: PIG-4408
 URL: https://issues.apache.org/jira/browse/PIG-4408
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Brian Johnson
Assignee: Brian Johnson
 Fix For: 0.14.1

 Attachments: patch, patch


 Since a replicated join doesn't trigger a reduce or change the output
 ordering, a merge join should work after it.





[jira] [Updated] (PIG-3294) Allow Pig use Hive UDFs

2015-02-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3294:

Attachment: PIG-3294-3.patch

PIG-3294-2.patch does not compile under Hadoop 1. Attaching a new patch.

 Allow Pig use Hive UDFs
 ---

 Key: PIG-3294
 URL: https://issues.apache.org/jira/browse/PIG-3294
 Project: Pig
  Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
  Labels: gsoc2013, java
 Fix For: 0.15.0

 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
 PIG-3294-before-refactory.patch


 It would be nice if Pig provided some interoperability with Hive. We can wrap
 Hive UDFs in Pig so that they can be used from Pig scripts.
 This is a candidate project for Google Summer of Code 2013. More information
 about the program can be found at
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013
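The wrapping idea is essentially an adapter: a Pig-side function object that delegates each call to a Hive-side function. The two interfaces below are hypothetical stand-ins, deliberately much simpler than Hive's UDF classes and Pig's EvalFunc, so the sketch runs without either library on the classpath.

```java
import java.util.List;

// Hypothetical stand-in for a Hive-style UDF: takes positional arguments.
interface HiveStyleFunction {
    Object evaluate(Object... args);
}

// Hypothetical stand-in for a Pig-style function: receives one tuple.
interface PigStyleFunction {
    Object exec(List<Object> tuple);
}

// Adapter: exposes a Hive-style function through the Pig-style interface.
class HiveFunctionWrapper implements PigStyleFunction {
    private final HiveStyleFunction delegate;

    HiveFunctionWrapper(HiveStyleFunction delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object exec(List<Object> tuple) {
        // Unpack the tuple's fields into the positional argument list.
        return delegate.evaluate(tuple.toArray());
    }
}
```

A real wrapper would also have to translate types and schemas between the two systems; that conversion is omitted here.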





[jira] [Updated] (PIG-4385) testDefaultBootup fails because it cannot find pig.properties

2015-02-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4385:

   Resolution: Fixed
Fix Version/s: 0.15.0
 Assignee: Martin Kudlej
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Makes sense. I also added back the declaration of out; otherwise compilation
fails.

Patch committed to trunk. Thanks Martin!

 testDefaultBootup fails because it cannot find pig.properties
 ---

 Key: PIG-4385
 URL: https://issues.apache.org/jira/browse/PIG-4385
 Project: Pig
  Issue Type: Test
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Martin Kudlej
Assignee: Martin Kudlej
 Fix For: 0.15.0

 Attachments: 0001-PIG-4385.patch


 testDefaultBootup fails because Pig cannot find the created pig.properties
 file. I think the test of using the pig.properties file should be
 separated from testDefaultBootup. The only clean way to do that is to pass
 a Properties object when starting PigServer.





[jira] Subscription: PIG patch available

2015-02-25 Thread jira
Issue Subscription
Filter: PIG patch available (25 issues)

Subscriber: pigdaily

Key       Summary
PIG-4432  Built-in VALUELIST and VALUESET UDFs do not preserve the schema when the map value type is a complex type
https://issues.apache.org/jira/browse/PIG-4432
PIG-4412  Race condition in writing multiple outputs from STREAM op
https://issues.apache.org/jira/browse/PIG-4412
PIG-4408  Merge join should support replicated join as a predecessor
https://issues.apache.org/jira/browse/PIG-4408
PIG-4389  Flag to run selected test suites in e2e tests
https://issues.apache.org/jira/browse/PIG-4389
PIG-4377  Skewed outer join produce wrong result in some cases
https://issues.apache.org/jira/browse/PIG-4377
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4300  Enable unit test TestSample for spark
https://issues.apache.org/jira/browse/PIG-4300
PIG-4287  Enable unit test TestLimitVariable for spark
https://issues.apache.org/jira/browse/PIG-4287
PIG-4264  Port TestAvroStorage to tez local mode
https://issues.apache.org/jira/browse/PIG-4264
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4193  Make collected group work with Spark
https://issues.apache.org/jira/browse/PIG-4193
PIG-4111  Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3851  Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328filterId=12322384


[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor

2015-02-25 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337996#comment-14337996
 ] 

Daniel Dai commented on PIG-4408:
-

The included test case succeeds even without the STATUS_NULL change. Is there
any other precondition?

 Merge join should support replicated join as a predecessor
 --

 Key: PIG-4408
 URL: https://issues.apache.org/jira/browse/PIG-4408
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Brian Johnson
Assignee: Brian Johnson
 Fix For: 0.14.1

 Attachments: patch, patch


 Since a replicated join doesn't trigger a reduce or change the output
 ordering, a merge join should work after it.





[jira] [Created] (PIG-4435) TPC-DI queries for Pig

2015-02-25 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4435:
---

 Summary: TPC-DI queries for Pig
 Key: PIG-4435
 URL: https://issues.apache.org/jira/browse/PIG-4435
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai


Migrate TPC-DI queries to Pig so we can compare performance with other tools.





[jira] [Created] (PIG-4436) TPC-DS queries for Pig

2015-02-25 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4436:
---

 Summary: TPC-DS queries for Pig
 Key: PIG-4436
 URL: https://issues.apache.org/jira/browse/PIG-4436
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai


Migrate TPC-DS queries to Pig so we can compare performance with other tools.





[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit

2015-02-25 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337976#comment-14337976
 ] 

Daniel Dai commented on PIG-3114:
-

The original fix does not apply to the new function introduced in PIG-2692. I
cannot find an easy fix for that. We do need to register the query multiple
times to get the alias mock working, but that causes the Duplicated macro name
error.

One simple workaround is to change the test to register the macro separately:
{code}
String[] script = {
    "data = LOAD 'input' AS (query:CHARARRAY);",
    "queries_group = GROUP data BY query;",
    "queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;",
    "queries_ordered = my_macro_1(queries_count, query);",
    "queries_limit = LIMIT queries_ordered 2;",
    "STORE queries_limit INTO 'output';"
};
PigTest test = new PigTest(script, new String[]{});
// Define the macro once, outside the script that PigTest re-registers.
test.getPigServer().registerQuery("DEFINE my_macro_1 (QUERY, A) RETURNS C { " +
    "$C = ORDER $QUERY BY total DESC, $A; " +
    "} ;");
test.assertOutput("data", new String[] {"1"},
    "queries_limit", new String[] {"(1)"});
{code}
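The failure mode can be modeled with a toy registry that, like the parser error reported in this issue, rejects a second definition of the same macro name. Re-registering a script that contains the DEFINE fails on the second pass, while defining the macro once up front and re-registering only the statements that use it succeeds. The class is hypothetical and does not reproduce Pig's QueryParserDriver.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a query registry that rejects duplicate macro names.
class MacroRegistry {
    private final Set<String> macroNames = new HashSet<>();

    // Mirrors the parser check: a macro name may be defined only once.
    void define(String name) {
        if (!macroNames.add(name)) {
            throw new IllegalStateException("Duplicated macro name '" + name + "'");
        }
    }

    // Registers a script `times` times. The macro is either part of the script
    // (re-defined on every registration) or defined once before the loop.
    static boolean registersCleanly(int times, boolean macroInScript) {
        MacroRegistry registry = new MacroRegistry();
        try {
            if (!macroInScript) {
                registry.define("my_macro_1");
            }
            for (int i = 0; i < times; i++) {
                if (macroInScript) {
                    registry.define("my_macro_1");
                }
                // The non-macro statements would be registered here.
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```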

 Duplicated macro name error when using pigunit
 --

 Key: PIG-3114
 URL: https://issues.apache.org/jira/browse/PIG-3114
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.11
Reporter: Chetan Nadgire
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3114-2.patch, PIG-3114-3.patch, PIG-3114.patch, 
 PIG-3114.patch, PatchedPigTest.java, bug_pig.tbz2


 I'm using PigUnit to test a Pig script in which a macro is defined.
 The script runs fine on the cluster, but I get a parsing error with PigUnit.
 So I tried a very basic Pig script with a macro and got a similar error.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
   at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
   at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
   at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
   at