[jira] [Commented] (PIG-4385) testDefaultBootup fails because it cannot find pig.properties
[ https://issues.apache.org/jira/browse/PIG-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336454#comment-14336454 ] Martin Kudlej commented on PIG-4385: I use Pig as distributed by Hortonworks, and in that case PIG_HOME points to /usr/lib/pig. This directory is owned by root:root with mode rwxr-xr-x, so neither the pig user nor any other non-root user can write there. As a result, this test cannot write pig.properties into that directory and fails. I would still like to run the Pig smoke tests against this distribution package, which is why I created this patch. testDefaultBootup fails because it cannot find pig.properties --- Key: PIG-4385 URL: https://issues.apache.org/jira/browse/PIG-4385 Project: Pig Issue Type: Test Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Martin Kudlej Attachments: 0001-PIG-4385.patch testDefaultBootup fails because Pig cannot find the created file pig.properties. I think the test of the pig.properties file should be separated from testDefaultBootup. The only clean way to do it is to start PigServer with a Properties object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
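A minimal, dependency-free sketch of the approach suggested above, assuming the test only needs its settings to be visible to the server it starts: build a java.util.Properties in memory (layered over defaults) instead of writing pig.properties under a possibly read-only PIG_HOME. The class, method and property keys are hypothetical. With Pig on the classpath, the resulting object could be passed to a PigServer constructor that accepts Properties; that call appears only in a comment so the sketch runs on its own.

```java
import java.util.Properties;

// Hypothetical helper (not part of Pig): builds the configuration a bootup test
// needs in memory, so nothing has to be written under PIG_HOME.
class InMemoryTestProperties {

    // Copies the defaults first and the overrides second, so test values win.
    // Neither input is modified.
    static Properties build(Properties defaults, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("example.setting", "default");    // hypothetical key

        Properties overrides = new Properties();
        overrides.setProperty("example.setting", "from-test"); // hypothetical key

        Properties props = build(defaults, overrides);
        // With Pig on the classpath (not exercised here), for example:
        //   PigServer pig = new PigServer(ExecType.LOCAL, props);
        System.out.println(props.getProperty("example.setting")); // prints from-test
    }
}
```

Keeping the configuration in memory also keeps the test independent of file permissions on the installation directory.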
[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor
[ https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336462#comment-14336462 ] Brian Johnson commented on PIG-4408: Without the change, the included test case fails with the wrong count. I can't remember exactly now, but I believe that with only one record on the LHS of the merge it will not produce any output at all. It does still eventually reach EOP now, but, as in PIG-4166, this forces another pass to emit any records that are cached while waiting for the LHS key to change. I think there was actually a bug in it before this change. Merge join should support replicated join as a predecessor -- Key: PIG-4408 URL: https://issues.apache.org/jira/browse/PIG-4408 Project: Pig Issue Type: New Feature Affects Versions: 0.14.0 Reporter: Brian Johnson Assignee: Brian Johnson Fix For: 0.14.1 Attachments: patch Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work after it
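The buffering behaviour described above can be illustrated with a simplified, dependency-free merge join over two key-sorted inputs. This is not Pig's merge join operator; the class, the {key, value} row layout and the flushAtEnd switch are hypothetical. Left-side rows with the same key are cached until the key changes, so the last cached group is only emitted by an explicit flush when the input ends (the analogue of EOP). Without that final flush, a left input with a single record produces no output at all.

```java
import java.util.ArrayList;
import java.util.List;

class MergeJoinSketch {

    // Joins two inputs that are both sorted by key; each row is {key, value}.
    // Left rows are cached until a new key arrives, mimicking an operator that
    // only emits a key group once it knows the group is complete.
    static List<String> join(int[][] left, int[][] right, boolean flushAtEnd) {
        List<String> out = new ArrayList<>();
        List<int[]> cached = new ArrayList<>();
        int rightPos = 0;
        for (int[] row : left) {
            if (!cached.isEmpty() && cached.get(0)[0] != row[0]) {
                rightPos = emit(cached, right, rightPos, out); // key changed: group complete
                cached.clear();
            }
            cached.add(row);
        }
        // End of input (the analogue of EOP): the last group is still cached.
        if (flushAtEnd && !cached.isEmpty()) {
            emit(cached, right, rightPos, out);
        }
        return out;
    }

    // Emits the cached left group joined with matching right rows and returns
    // the position where the next right-side scan should start.
    private static int emit(List<int[]> group, int[][] right, int pos, List<String> out) {
        int key = group.get(0)[0];
        while (pos < right.length && right[pos][0] < key) {
            pos++;
        }
        int end = pos;
        while (end < right.length && right[end][0] == key) {
            for (int[] l : group) {
                out.add("(" + key + "," + l[1] + "," + right[end][1] + ")");
            }
            end++;
        }
        return end;
    }

    public static void main(String[] args) {
        int[][] left = {{1, 10}};               // a single record on the LHS
        int[][] right = {{1, 100}, {2, 200}};
        System.out.println(join(left, right, false)); // prints []
        System.out.println(join(left, right, true));  // prints [(1,10,100)]
    }
}
```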
[jira] [Commented] (PIG-4426) RowNumber(simple) Rank not producing correct results
[ https://issues.apache.org/jira/browse/PIG-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336605#comment-14336605 ] Koji Noguchi commented on PIG-4426: +1 on moving away from the counter-based approach. (I believe the Tez version of Rank already does that.) RowNumber(simple) Rank not producing correct results Key: PIG-4426 URL: https://issues.apache.org/jira/browse/PIG-4426 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Koji Noguchi Assignee: Koji Noguchi Fix For: 0.15.0 Attachments: pig-4426-v01.txt, pig-4426-v02.txt, pig-4426-v03.txt After PIG-4392, we started seeing TestRank3.testRankWithSplitInMap (and some others) failing with
{noformat}
Comparing actual and expected results.
expected:[(1,1,2), (1,1,2), (1,3,1), (2,1,2), (3,1,2), (3,2,3), (3,2,4), (4,2,3), (5,2,4), (5,3,1)] but was:[(1,1,2), (1,1,2), (3,2,3), (3,2,4), (5,3,1)]
{noformat}
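Assuming, as the comment implies, that the current MapReduce plan collects per-task record counts through counters, the arithmetic those counts feed is the same whatever mechanism gathers them: each partition's starting row number is the sum of the record counts of all partitions before it (an exclusive prefix sum). The sketch below is a hypothetical, dependency-free illustration with partitions modelled as plain lists; it is not Pig's RANK code.

```java
import java.util.ArrayList;
import java.util.List;

class RowNumberSketch {

    // Assigns 1-based global row numbers to records spread across ordered partitions.
    // Step 1: count each partition. Step 2: an exclusive prefix sum of those counts
    // gives each partition's starting offset. Step 3: number records locally.
    static List<List<Long>> rowNumbers(List<List<String>> partitions) {
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;                 // records in all earlier partitions
            running += partitions.get(p).size();
        }
        List<List<Long>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<Long> numbers = new ArrayList<>();
            for (int i = 0; i < partitions.get(p).size(); i++) {
                numbers.add(offsets[p] + i + 1);
            }
            result.add(numbers);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("a", "b"), List.of(), List.of("c", "d", "e"));
        System.out.println(rowNumbers(parts)); // prints [[1, 2], [], [3, 4, 5]]
    }
}
```

Note that an empty partition still needs its count (zero) reported; if a partition's count goes missing, every later offset is wrong.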
[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work
[ https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336632#comment-14336632 ] Kevin J. Price commented on PIG-4433: Thanks, Daniel! Will do. Loading bigdecimal in nested tuple does not work Key: PIG-4433 URL: https://issues.apache.org/jira/browse/PIG-4433 Project: Pig Issue Type: Bug Affects Versions: 0.14.0, 0.14.1, 0.15.0 Reporter: Kevin J. Price Assignee: Kevin J. Price Fix For: 0.15.0 Attachments: PIG-4433-1.patch The parsing of BigDecimal data types in a nested tuple, as implemented by Utf8StorageConverter.java, does not work: a break; is missing from a switch statement. Code example that demonstrates the problem:
=== input.txt ===
(17,1234567890.0987654321)
=== pig_script ===
inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
STORE inp INTO 'output';
=== output ===
(17,)
With the patch, the output becomes the expected:
(17,1234567890.0987654321)
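The bug class described above, a switch case that is missing its break; and therefore falls through into the next case, can be reproduced in isolation. This is a hypothetical, simplified field converter, not the actual Utf8StorageConverter code: with the buggy flag set, the bigdecimal case falls through into a case that overwrites the parsed value with null, leaving an empty field just like the (17,) output in the report.

```java
import java.math.BigDecimal;

// Hypothetical, simplified field converter illustrating the missing-break bug class;
// it is not the actual Utf8StorageConverter code.
class FallThroughSketch {

    static Object parseField(String type, String text, boolean buggy) {
        Object value = null;
        switch (type) {
            case "long":
                value = Long.parseLong(text);
                break;
            case "bigdecimal":
                value = new BigDecimal(text);
                if (!buggy) {
                    break; // the fix: leave the switch once the value is parsed
                }
                // buggy variant: no break, so execution falls through to the next case
            case "unknown":
                value = null; // overwrites the BigDecimal parsed just above
                break;
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
        return value;
    }

    // Renders a null field as empty, the way it appears in the stored tuple.
    static String fmt(Object o) {
        return o == null ? "" : o.toString();
    }

    public static void main(String[] args) {
        String text = "1234567890.0987654321";
        System.out.println("(17," + fmt(parseField("bigdecimal", text, true)) + ")");  // prints (17,)
        System.out.println("(17," + fmt(parseField("bigdecimal", text, false)) + ")"); // prints (17,1234567890.0987654321)
    }
}
```

Compiling with javac -Xlint:fallthrough flags this kind of fall-through, which can help catch it before it reaches a test.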
[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor
[ https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336850#comment-14336850 ] Brian Johnson commented on PIG-4408: It looks like only one of them needs to change to STATUS_NULL at line 402. I'm running all the tests again, and then I'll resubmit the patch. Merge join should support replicated join as a predecessor -- Key: PIG-4408 URL: https://issues.apache.org/jira/browse/PIG-4408 Project: Pig Issue Type: New Feature Affects Versions: 0.14.0 Reporter: Brian Johnson Assignee: Brian Johnson Fix For: 0.14.1 Attachments: patch Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work after it
[jira] [Created] (PIG-4434) Improve auto-parallelism for tez
Daniel Dai created PIG-4434: Summary: Improve auto-parallelism for tez Key: PIG-4434 URL: https://issues.apache.org/jira/browse/PIG-4434 Project: Pig Issue Type: Improvement Components: tez Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.15.0 Tez auto-parallelism currently has some limitations: 1. ShuffleVertexManager can only decrease parallelism, not increase it. 2. Pig currently overestimates parallelism at the frontend, so ShuffleVertexManager may start from an initial parallelism far larger than what is actually needed, which is costly. Instead, we can gradually adjust the initial vertex parallelism at runtime once upstream vertices finish.
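The proposal is to derive a vertex's parallelism from what its upstream vertices actually produced rather than from a frontend estimate. A minimal sketch of that arithmetic, assuming the total upstream output size is known once those vertices finish, and using a bytes-per-task target and an upper bound in the spirit of Pig's pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max settings. The class and method names and the sample sizes are illustrative, not Pig's or Tez's implementation.

```java
class ParallelismSketch {

    // Ceiling division of observed bytes by the per-task target, clamped to [1, maxTasks].
    // Because it starts from the observed size, the result can be higher or lower
    // than whatever initial parallelism was guessed up front.
    static int estimateParallelism(long upstreamOutputBytes, long bytesPerTask, int maxTasks) {
        if (bytesPerTask <= 0 || maxTasks <= 0) {
            throw new IllegalArgumentException("bytesPerTask and maxTasks must be positive");
        }
        long tasks = (upstreamOutputBytes + bytesPerTask - 1) / bytesPerTask;
        return (int) Math.max(1, Math.min(tasks, maxTasks));
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L;
        System.out.println(estimateParallelism(0L, oneGb, 999));                 // prints 1
        System.out.println(estimateParallelism(2_500_000_000L, oneGb, 999));     // prints 3
        System.out.println(estimateParallelism(5_000_000_000_000L, oneGb, 999)); // prints 999
    }
}
```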
[jira] [Updated] (PIG-4408) Merge join should support replicated join as a predecessor
[ https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Johnson updated PIG-4408: Attachment: patch Merge join should support replicated join as a predecessor -- Key: PIG-4408 URL: https://issues.apache.org/jira/browse/PIG-4408 Project: Pig Issue Type: New Feature Affects Versions: 0.14.0 Reporter: Brian Johnson Assignee: Brian Johnson Fix For: 0.14.1 Attachments: patch, patch Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work after it
[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor
[ https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336906#comment-14336906 ] Brian Johnson commented on PIG-4408: Updated the patch with a much smaller-scoped change. Merge join should support replicated join as a predecessor -- Key: PIG-4408 URL: https://issues.apache.org/jira/browse/PIG-4408 Project: Pig Issue Type: New Feature Affects Versions: 0.14.0 Reporter: Brian Johnson Assignee: Brian Johnson Fix For: 0.14.1 Attachments: patch, patch Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work after it
[jira] [Updated] (PIG-3294) Allow Pig use Hive UDFs
[ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3294: Attachment: PIG-3294-3.patch PIG-3294-2.patch does not compile under Hadoop 1. Attaching a new patch. Allow Pig use Hive UDFs --- Key: PIG-3294 URL: https://issues.apache.org/jira/browse/PIG-3294 Project: Pig Issue Type: New Feature Reporter: Daniel Dai Assignee: Daniel Dai Labels: gsoc2013, java Fix For: 0.15.0 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, PIG-3294-before-refactory.patch It would be nice if Pig provided some interoperability with Hive. We can wrap Hive UDFs in Pig so that they can be used from Pig. This is a candidate project for Google Summer of Code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013
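Wrapping a Hive UDF in Pig is, at its core, an adapter: a Pig function receives a tuple, unpacks its fields and forwards them to the Hive function. The sketch assumes Hive's "simple" UDF style, in which the logic lives in an evaluate method that is located by reflection. UpperUdf and HiveStyleUdfWrapper are hypothetical stand-ins, and a List takes the place of Pig's Tuple, so no Hive or Pig classes are needed.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Hive "simple" UDF: its logic lives in a public
// evaluate method that the wrapper finds by reflection (no Hive classes used).
class UpperUdf {
    public String evaluate(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }
}

// Hypothetical Pig-side adapter: exec() plays the role of EvalFunc.exec(Tuple),
// with the tuple's fields passed as a plain List.
class HiveStyleUdfWrapper {
    private final Object udf;
    private final Method evaluate;

    HiveStyleUdfWrapper(Object udf, int arity) {
        this.udf = udf;
        this.evaluate = findEvaluate(udf.getClass(), arity);
    }

    // Picks the first public evaluate method with the requested number of parameters.
    private static Method findEvaluate(Class<?> cls, int arity) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals("evaluate") && m.getParameterCount() == arity) {
                return m;
            }
        }
        throw new IllegalArgumentException(
                "no evaluate method with " + arity + " parameter(s) on " + cls.getName());
    }

    // Unpacks the fields and forwards them to the wrapped evaluate method.
    Object exec(List<?> fields) {
        try {
            return evaluate.invoke(udf, fields.toArray());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("wrapped UDF call failed", e);
        }
    }

    public static void main(String[] args) {
        HiveStyleUdfWrapper upper = new HiveStyleUdfWrapper(new UpperUdf(), 1);
        System.out.println(upper.exec(Arrays.asList("pig"))); // prints PIG
    }
}
```

A real wrapper would also have to convert between Pig and Hive value types and cover Hive's generic UDF interface; those parts are left out here.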
[jira] [Updated] (PIG-4385) testDefaultBootup fails because it cannot find pig.properties
[ https://issues.apache.org/jira/browse/PIG-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-4385: Resolution: Fixed Fix Version/s: 0.15.0 Assignee: Martin Kudlej Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Makes sense. I also added back the declaration of out; otherwise compilation fails. Patch committed to trunk. Thanks Martin! testDefaultBootup fails because it cannot find pig.properties --- Key: PIG-4385 URL: https://issues.apache.org/jira/browse/PIG-4385 Project: Pig Issue Type: Test Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Martin Kudlej Assignee: Martin Kudlej Fix For: 0.15.0 Attachments: 0001-PIG-4385.patch testDefaultBootup fails because Pig cannot find the created file pig.properties. I think the test of the pig.properties file should be separated from testDefaultBootup. The only clean way to do it is to start PigServer with a Properties object.
[jira] Subscription: PIG patch available
Issue Subscription Filter: PIG patch available (25 issues)
Subscriber: pigdaily

Key Summary
PIG-4432 Built-in VALUELIST and VALUESET UDFs do not preserve the schema when the map value type is a complex type https://issues.apache.org/jira/browse/PIG-4432
PIG-4412 Race condition in writing multiple outputs from STREAM op https://issues.apache.org/jira/browse/PIG-4412
PIG-4408 Merge join should support replicated join as a predecessor https://issues.apache.org/jira/browse/PIG-4408
PIG-4389 Flag to run selected test suites in e2e tests https://issues.apache.org/jira/browse/PIG-4389
PIG-4377 Skewed outer join produce wrong result in some cases https://issues.apache.org/jira/browse/PIG-4377
PIG-4341 Add CMX support to pig.tmpfilecompression.codec https://issues.apache.org/jira/browse/PIG-4341
PIG-4323 PackageConverter hanging in Spark https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark https://issues.apache.org/jira/browse/PIG-4313
PIG-4300 Enable unit test TestSample for spark https://issues.apache.org/jira/browse/PIG-4300
PIG-4287 Enable unit test TestLimitVariable for spark https://issues.apache.org/jira/browse/PIG-4287
PIG-4264 Port TestAvroStorage to tez local mode https://issues.apache.org/jira/browse/PIG-4264
PIG-4251 Pig on Storm https://issues.apache.org/jira/browse/PIG-4251
PIG-4193 Make collected group work with Spark https://issues.apache.org/jira/browse/PIG-4193
PIG-4111 Make Pig compiles with avro-1.7.7 https://issues.apache.org/jira/browse/PIG-4111
PIG-4004 Upgrade the Pigmix queries from the (old) mapred API to mapreduce https://issues.apache.org/jira/browse/PIG-4004
PIG-4002 Disable combiner when map-side aggregation is used https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema https://issues.apache.org/jira/browse/PIG-3911
PIG-3877 Getting Geo Latitude/Longitude from Address Lines https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine https://issues.apache.org/jira/browse/PIG-3873
PIG-3866 Create ThreadLocal classloader per PigContext https://issues.apache.org/jira/browse/PIG-3866
PIG-3851 Upgrade jline to 2.11 https://issues.apache.org/jira/browse/PIG-3851
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN https://issues.apache.org/jira/browse/PIG-3668
PIG-3635 Fix e2e tests for Hadoop 2.X on Windows https://issues.apache.org/jira/browse/PIG-3635
PIG-3587 add functionality for rolling over dates https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328filterId=12322384
[jira] [Commented] (PIG-4408) Merge join should support replicated join as a predecessor
[ https://issues.apache.org/jira/browse/PIG-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337996#comment-14337996 ] Daniel Dai commented on PIG-4408: The included test case succeeds even without the STATUS_NULL change. Is there any other precondition? Merge join should support replicated join as a predecessor -- Key: PIG-4408 URL: https://issues.apache.org/jira/browse/PIG-4408 Project: Pig Issue Type: New Feature Affects Versions: 0.14.0 Reporter: Brian Johnson Assignee: Brian Johnson Fix For: 0.14.1 Attachments: patch, patch Since a replicated join doesn't trigger a reduce or change the output ordering a merge join should work after it
[jira] [Created] (PIG-4435) TPC-DI queries for Pig
Daniel Dai created PIG-4435: Summary: TPC-DI queries for Pig Key: PIG-4435 URL: https://issues.apache.org/jira/browse/PIG-4435 Project: Pig Issue Type: Improvement Reporter: Daniel Dai Migrate the TPC-DI queries to Pig so we can compare performance with other tools.
[jira] [Created] (PIG-4436) TPC-DS queries for Pig
Daniel Dai created PIG-4436: Summary: TPC-DS queries for Pig Key: PIG-4436 URL: https://issues.apache.org/jira/browse/PIG-4436 Project: Pig Issue Type: Improvement Reporter: Daniel Dai Migrate the TPC-DS queries to Pig so we can compare performance with other tools.
[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit
[ https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337976#comment-14337976 ] Daniel Dai commented on PIG-3114: The original fix does not apply to the new function introduced in PIG-2692, and I cannot find an easy fix for that. We do need to register the query multiple times to get alias mocking working, but that causes the duplicated macro error. One simple workaround is to change the test to register the macro separately:
{code}
String[] script = {
    "data = LOAD 'input' AS (query:CHARARRAY);",
    "queries_group = GROUP data BY query;",
    "queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;",
    "queries_ordered = my_macro_1(queries_count, query);",
    "queries_limit = LIMIT queries_ordered 2;",
    "STORE queries_limit INTO 'output';"
};
PigTest test = new PigTest(script, new String[]{});
// Define the macro once on the underlying PigServer instead of inside the script,
// so re-registering the script does not redefine it.
test.getPigServer().registerQuery("DEFINE my_macro_1 (QUERY, A) RETURNS C { " +
    "$C = ORDER $QUERY BY total DESC, $A; " +
    "} ;");
test.assertOutput("data", new String[] {"1"}, "queries_limit", new String[] {"(1)"});
{code}
Duplicated macro name error when using pigunit -- Key: PIG-3114 URL: https://issues.apache.org/jira/browse/PIG-3114 Project: Pig Issue Type: Bug Components: parser Affects Versions: 0.11 Reporter: Chetan Nadgire Assignee: Daniel Dai Fix For: 0.12.0 Attachments: PIG-3114-2.patch, PIG-3114-3.patch, PIG-3114.patch, PIG-3114.patch, PatchedPigTest.java, bug_pig.tbz2 I'm using PigUnit to test a Pig script in which a macro is defined. The script runs fine on the cluster, but I get a parsing error with PigUnit. So I tried a very basic Pig script with a macro and got a similar error.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null.
Reason: Duplicated macro name 'my_macro_1'
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
    at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
    at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
    at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
    at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
    at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1'
    at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
    at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
    at