[jira] [Commented] (PIG-2114) Enhancements to PIG HBaseStorage Load Store Func with extra scan configurations
[ https://issues.apache.org/jira/browse/PIG-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082955#comment-13082955 ] Dmitriy V. Ryaboy commented on PIG-2114:

Apologies again for taking a while to review. Thanks, that looks like a fair bit of work. First, just a couple of procedural notes:

1) Make sure the new files don't have @author annotations and do have the Apache headers.
2) There is already a TestHBaseStorage. Why add a new one in util?
3) This is sort of a PITA request, especially as there are plenty of places in the codebase that don't adhere to this practice, but can you make sure to do things like put spaces after commas (as in Map<family,Map<qualifier,Map<timestamp,value>>>) and before opening parens (as in for(Map.Entry valueEntry: ...)), wrap lines to a reasonable length, etc.?

My major concern with the patch is as follows. In getNext() you inserted a completely new flow that is used if timestamps are used. It bypasses all the existing logic for how results are created, and as far as I can see, does not respect things like projection pushdown. It also makes it so any future work on the HBase loader logic has to happen in two places. Let's not do that. Isn't loading a single-version row just a special case of loading multiple versions (with n = 1)? We should be able to do this in one go.

Since there is so much stuff mixed in here, I propose we get the smaller stuff like PIG-2115 in first. Some of the things you are doing here are also pretty non-controversial, like omitNulls and prefix filters; we can get those in pretty easily. Let's factor out the multiple-versions changes and add them to PIG-1832, leaving this (blessedly unspecifically titled :)) ticket to deal with the smaller stuff.
Enhancements to PIG HBaseStorage Load Store Func with extra scan configurations - Key: PIG-2114 URL: https://issues.apache.org/jira/browse/PIG-2114 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.9.0 Reporter: Hariprasad Kuppuswamy Assignee: Hariprasad Kuppuswamy Priority: Minor Labels: hbase, storage Fix For: 0.10 Attachments: Enhancments-to-enable-timestampversion-based-row-scan.patch
- Added capability to specify a scan based on timestamps (Hariprasad Kuppuswamy)
- Ability to specify the number of versions to be fetched with the current scan (Hariprasad Kuppuswamy)
- Configure the rowkey-prefix filter for the scan (Hariprasad Kuppuswamy)
- Added ability to omit nulls when dealing with HBase storage (Greg Bowyer)
- Added ability to specify Put timestamps on insertion (Hariprasad Kuppuswamy)
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
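The whitespace conventions requested in the review above (a space after each comma, a space between a keyword and its opening paren) can be shown with a small, self-contained Java snippet; the map contents and names here are hypothetical stand-ins, not code from the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WhitespaceStyleDemo {
    public static void main(String[] args) {
        // Hypothetical column map standing in for the patch's
        // (family, qualifier, timestamp, value) handling.
        Map<String, Integer> columns = new LinkedHashMap<String, Integer>();
        columns.put("cf:a", 1);
        columns.put("cf:b", 2);

        int sum = 0;
        // Per the review: spaces after commas in the generic parameters,
        // and a space between 'for' and its opening paren.
        for (Map.Entry<String, Integer> valueEntry : columns.entrySet()) {
            sum += valueEntry.getValue();
        }
        System.out.println(sum);
    }
}
```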
Failing tests after parser change?
HBaseStorage is failing, and it's not something we did to HBaseStorage... Looks like the parser. Any takers? Testcase: testStoreToHBase_2_with_projection took 0.34 sec Caused an ERROR Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.PigServer.registerQuery(PigServer.java:553) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706) Caused by: Failed to parse: line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
[jira] [Commented] (PIG-2174) HBaseStorage column filters miss some fields
[ https://issues.apache.org/jira/browse/PIG-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082958#comment-13082958 ] Dmitriy V. Ryaboy commented on PIG-2174: +1 assuming test-patch passes. Sadly, at the moment TestHBaseStorage doesn't pass in trunk even without this patch.

HBaseStorage column filters miss some fields Key: PIG-2174 URL: https://issues.apache.org/jira/browse/PIG-2174 Project: Pig Issue Type: Bug Reporter: Bill Graham Assignee: Bill Graham Attachments: PIG-2174_1.patch

When mixing static and dynamic column mappings, {{HBaseStorage}} sometimes doesn't pick up the static column values and nulls are returned. I believe this bug has been masked by HBase being a bit over-eager when it comes to respecting column filters (i.e. HBase is returning more columns than it should). For example, this query returns nulls for the {{sc}} column, even when it contains data:
{noformat}
a = LOAD 'hbase://pigtable_1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('pig:sc pig:prefixed_col_*','-loadKey') AS (rowKey:chararray, sc:chararray, pig_cf_map:map[]);
{noformat}
What is very strange (about HBase) is that the same script will return values just fine if {{sc}} is instead {{col_a}}, assuming of course that both columns contain data:
{noformat}
a = LOAD 'hbase://pigtable_1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('pig:col_a pig:prefixed_col_*','-loadKey') AS (rowKey:chararray, col_a:chararray, pig_cf_map:map[]);
{noformat}
Potential HBase issues aside, I think there is a bug in the logic on the Pig side. Patch to follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
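To make the static-vs-prefix mixing concrete, here is a rough, hypothetical Java sketch of the *intended* routing: a column matching a static mapping exactly feeds that field, while a prefix match feeds the map field. The method name and rule here are illustrative assumptions, not Pig's actual HBaseStorage code; the bug above is precisely that the real logic sometimes misses the exact-match case when both kinds of mapping are present.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnRoutingSketch {
    // Hypothetical routing rule: exact matches go to their static field,
    // prefix matches go into the map field, anything else is filtered out.
    static String route(String column, List<String> staticCols, List<String> prefixes) {
        if (staticCols.contains(column)) {
            return "static:" + column;
        }
        for (String prefix : prefixes) {
            if (column.startsWith(prefix)) {
                return "map:" + column;
            }
        }
        return "filtered";
    }

    public static void main(String[] args) {
        // Mappings from the query in the issue: 'pig:sc pig:prefixed_col_*'
        List<String> staticCols = Arrays.asList("pig:sc");
        List<String> prefixes = Arrays.asList("pig:prefixed_col_");
        System.out.println(route("pig:sc", staticCols, prefixes));
        System.out.println(route("pig:prefixed_col_1", staticCols, prefixes));
        System.out.println(route("pig:other", staticCols, prefixes));
    }
}
```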
Build failed in Jenkins: Pig-trunk #1060
See https://builds.apache.org/job/Pig-trunk/1060/changes Changes: [thejas] PIG-2176: add logical plan assumption checker -- [...truncated 37928 lines...] [junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) [junit] [junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /tmp/TestStore-output--716161521724243.txt_cleanupOnFailure_succeeded by DFSClient_325773412 [junit] at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449) [junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) [junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) [junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) [junit] [junit] at org.apache.hadoop.ipc.Client.call(Client.java:740) [junit] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) [junit] at $Proxy0.complete(Unknown Source) [junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) [junit] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) [junit] at 
$Proxy0.complete(Unknown Source) [junit] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264) [junit] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188) [junit] at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1043) [junit] at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:237) [junit] at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269) [junit] at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:83) [junit] at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77) [junit] at org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68) [junit] at org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:127) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37) [junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220) [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) [junit] Shutting 
down the Mini HDFS Cluster [junit] 11/08/11 10:31:57 WARN hdfs.StateChange: DIR* NameSystem.completeFile: failed to complete /tmp/TestStore-output--7612399809477939598.txt_cleanupOnFailure_succeeded1 because dir.getFileBlocks() is null and pendingFile is null [junit] Shutting down DataNode 3 [junit] 11/08/11 10:31:57 INFO ipc.Server: IPC Server handler 4 on 34636, call complete(/tmp/TestStore-output--7612399809477939598.txt_cleanupOnFailure_succeeded1, DFSClient_325773412) from 127.0.0.1:45206: error: java.io.IOException: Could not complete write
[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083081#comment-13083081 ] Zhijie Shen commented on PIG-1429: -- Hi Daniel, Thanks for your review; I've fixed 1, 3, and 4. For 5, I'll add the comments and some end-to-end tests. For 2, if we want to stick with boolean as the name of the Boolean type, we'd better revise the grammar, changing the keyword from bool to boolean. Otherwise, Utils.getSchemaFromString() will be broken if the supplied schema string uses bool. And the name used in Pig Latin commands should be consistent with that in the displayed plan/schema.

Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Zhijie Shen Labels: boolean, gsoc2011, pig, type Attachments: PIG-1429_1.patch, PIG-1429_2.patch, PIG-1429_3.patch, working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h Pig needs a Boolean data type. PIG-1097 is dependent on doing this. I volunteer. Is there anything beyond the work in src/org/apache/pig/data/ plus unit tests to make this work? This is a candidate project for Google Summer of Code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2215) Newlines in function arguments still cause exceptions to be thrown
[ https://issues.apache.org/jira/browse/PIG-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Warrington updated PIG-2215: - Attachment: PIG-2215-0.patch

Newlines in function arguments still cause exceptions to be thrown -- Key: PIG-2215 URL: https://issues.apache.org/jira/browse/PIG-2215 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Adam Warrington Attachments: PIG-2215-0.patch

PIG-1749 was an attempt to allow newlines in function arguments. It appears that the AstValidator and the LogicalPlanGenerator grammars were not updated, so the following exception and stack trace will be thrown when executing a script that has newlines in function arguments: ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3) Failed to parse: Pig script failed to parse: MismatchedTokenException(93!=3) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178) at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595) at org.apache.pig.PigServer.registerQuery(PigServer.java:583) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) at org.apache.pig.Main.run(Main.java:487) at org.apache.pig.Main.main(Main.java:108) Caused by: MismatchedTokenException(93!=3) at org.apache.pig.parser.AstValidator.recoverFromMismatchedToken(AstValidator.java:209) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.pig.parser.AstValidator.func_clause(AstValidator.java:3497) at org.apache.pig.parser.AstValidator.load_clause(AstValidator.java:2464) at
org.apache.pig.parser.AstValidator.op_clause(AstValidator.java:934) at org.apache.pig.parser.AstValidator.general_statement(AstValidator.java:574) at org.apache.pig.parser.AstValidator.statement(AstValidator.java:396) at org.apache.pig.parser.AstValidator.query(AstValidator.java:306) at org.apache.pig.parser.QueryParserDriver.validateAst(QueryParserDriver.java:236) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168) ... 10 more -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2215) Newlines in function arguments still cause exceptions to be thrown
[ https://issues.apache.org/jira/browse/PIG-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083215#comment-13083215 ] Adam Warrington commented on PIG-2215: -- This patch updates the LogicalPlanGenerator and AstValidator grammars, and adds 2 unit tests to test the new functionality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Failing tests after parser change?
This looks like the intermittent Antlr bug we're seeing (https://issues.apache.org/jira/browse/PIG-2055). We're testing other versions of Antlr to try to fix this, but until we find one that addresses the issue the only solution is to do ant clean, and then rebuild and see if it goes away. We have also noticed it happens more often when built on Mac than on Linux, if you happen to have a Linux box you could build on. Alan. On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote: HBaseStorage is failing, and it's not something we did to HBaseStorage... Looks like the parser. Any takers? Testcase: testStoreToHBase_2_with_projection took 0.34 sec Caused an ERROR Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.PigServer.registerQuery(PigServer.java:553) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706) Caused by: Failed to parse: line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083219#comment-13083219 ] Daniel Dai commented on PIG-1429: - You are right. But I mean change the keyword to boolean as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2216) deprecate use of type in as clause of foreach statement
deprecate use of type in as clause of foreach statement --- Key: PIG-2216 URL: https://issues.apache.org/jira/browse/PIG-2216 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Fix For: 0.10

In the as clause of a foreach statement, a type can be specified, but that type is actually not used (i.e., it does not result in a cast). This behavior is misleading.
{code}
F = foreach INP generate c1 as (name : chararray);
{code}
Pig 0.8 produces an error if c1 in the above example is not of the same type as specified in the as clause. In 0.9, that check seems to have been lost in the parser migration. It also results in the logical plan thinking that the type of c1 is the one specified in the as clause. That can cause errors such as ClassCastException.

One way to be consistent here would have been to add a cast for the as clause as well. But having two ways to cast complicates things. So long term, I think the use of types in the as clause should be removed. For 0.10, I think the check present in 0.8 should be added back, and the syntax should be deprecated (resulting in a warning when used). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
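The ClassCastException failure mode described above can be illustrated with a plain-Java analogy (not Pig code): the plan trusts the type declared in the as clause, no cast is actually inserted, and a later blind cast then meets the field's original runtime type.

```java
public class AsClauseTypeDemo {
    public static void main(String[] args) {
        // The field's real runtime type: a chararray-like String.
        Object c1 = "42";
        // Downstream code trusts the declared type (int, say) and casts
        // blindly -- the kind of ClassCastException the issue warns about.
        try {
            Integer asInt = (Integer) c1;
            System.out.println("cast ok: " + asInt);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```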
[jira] [Updated] (PIG-2055) inconsistent behavior in parser generated during build
[ https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-2055: --- Attachment: PIG-2055.2.patch PIG-2055.2.patch - the stringtemplate.jar 4.0.4 was released under a different artifact name (ST4). This patch uses the new artifact and version 4.0.4.

inconsistent behavior in parser generated during build - Key: PIG-2055 URL: https://issues.apache.org/jira/browse/PIG-2055 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Thejas M Nair Attachments: PIG-2055.1.patch, PIG-2055.2.patch

On certain builds, I see that Pig fails to support this syntax:
{code}
grunt> l = load 'x' using PigStorage(':');
2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: line 1, column 29 mismatched input '(' expecting SEMI_COLON
Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
{code}
I seem to be the only one who has seen this behavior, and I have seen it on occasion when I build on a Mac. It could be a problem with the interaction between Antlr and the Apple JVM. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Failing tests after parser change?
Dmitriy, You don't realize how lucky you are! ;) I have been trying hard to reproduce this problem, so that I can check if the patch in PIG-2055 actually fixes the issue. I ran build+ (small)test in a loop for 2000+ times, and this hasn't happened yet. If this is happening (almost) consistently, can you try the patch in PIG-2055 and see if that helps? Thanks, Thejas On 8/11/11 9:44 AM, Alan Gates wrote: This looks like the intermittent Antlr bug we're seeing (https://issues.apache.org/jira/browse/PIG-2055). We're testing other versions of Antlr to try to fix this, but until we find one that addresses the issue the only solution is to do ant clean, and then rebuild and see if it goes away. We have also noticed it happens more often when built on Mac than on Linux, if you happen to have a Linux box you could build on. Alan. On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote: HBaseStorage is failing, and it's not something we did to HBaseStorage... Looks like the parser. Any takers? Testcase: testStoreToHBase_2_with_projection took 0.34 sec Caused an ERROR Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.PigServer.registerQuery(PigServer.java:553) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706) Caused by: Failed to parse: line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222) at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
[jira] [Commented] (PIG-2209) JsonMetadata fails to find schema for glob paths
[ https://issues.apache.org/jira/browse/PIG-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083378#comment-13083378 ] Dmitriy V. Ryaboy commented on PIG-2209: Slight correction: this only happens when the items referred to in the glob are directories. It works fine when they are files or parts of file names. JsonMetadata fails to find schema for glob paths Key: PIG-2209 URL: https://issues.apache.org/jira/browse/PIG-2209 Project: Pig Issue Type: Bug Affects Versions: 0.10 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy JsonMetadata, used in PigStorage to work with serialized schemas, does not correctly interpret paths like '/foo/bar/{1,2,3}' and throws an exception: {code} Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: Could not find schema file for file:///foo/bar/{1,2} at org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:217) at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:186) at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150) ... 17 more Caused by: java.io.IOException: Unable to read file:///foo/bar/z/{1,2} at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106) at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:183) ... 19 more Caused by: java.net.URISyntaxException: Illegal character in path at index 36: file:///foo/bar/{1,2} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
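The root cause in the trace above is reproducible with java.net.URI alone: unescaped curly braces are not legal URI path characters, so building a URI straight from a glob string fails before any filesystem work happens. A minimal demonstration (the glob string mirrors the one in the issue):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriDemo {
    public static void main(String[] args) {
        String glob = "file:///foo/bar/{1,2}";
        try {
            // java.net.URI parses per RFC 2396; '{' and '}' are illegal in
            // a path, so this throws rather than returning a URI.
            new URI(glob);
            System.out.println("parsed ok");
        } catch (URISyntaxException e) {
            System.out.println("URISyntaxException: " + e.getReason());
        }
    }
}
```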
Re: Failing tests after parser change?
Fwiw, I believe I've also been hitting the same bug that Dmitriy described. In my case, I was running Cloudera's CDH3u0 build on a Mac. I'll also try to recreate today or tomorrow. Norbert

On Thu, Aug 11, 2011 at 2:39 PM, Thejas Nair the...@hortonworks.com wrote: Dmitriy, You don't realize how lucky you are! ;) I have been trying hard to reproduce this problem, so that I can check if the patch in PIG-2055 actually fixes the issue. I ran build+ (small)test in a loop for 2000+ times, and this hasn't happened yet. If this is happening (almost) consistently, can you try the patch in PIG-2055 and see if that helps? Thanks, Thejas

On 8/11/11 9:44 AM, Alan Gates wrote: This looks like the intermittent Antlr bug we're seeing (https://issues.apache.org/jira/browse/PIG-2055). We're testing other versions of Antlr to try to fix this, but until we find one that addresses the issue the only solution is to do ant clean, and then rebuild and see if it goes away. We have also noticed it happens more often when built on Mac than on Linux, if you happen to have a Linux box you could build on. Alan.

On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote: HBaseStorage is failing, and it's not something we did to HBaseStorage... Looks like the parser. Any takers? Testcase: testStoreToHBase_2_with_projection took 0.34 sec Caused an ERROR Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.PigServer.registerQuery(PigServer.java:553) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706) Caused by: Failed to parse: line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
[jira] [Updated] (PIG-2174) HBaseStorage column filters miss some fields
[ https://issues.apache.org/jira/browse/PIG-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-2174: --- Resolution: Fixed Fix Version/s: 0.10 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Bill! -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Pig-trunk #1061
See https://builds.apache.org/job/Pig-trunk/1061/changes

Changes:

[dvryaboy] PIG-2174: HBaseStorage column filters miss some fields

--
[...truncated 38358 lines...]
[junit]     at javax.security.auth.Subject.doAs(Subject.java:396)
[junit]     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 11/08/11 22:32:34 ERROR hdfs.DFSClient: Exception closing file /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 : org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 by DFSClient_1110622717
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit]     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit]     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit]     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit]     at java.security.AccessController.doPrivileged(Native Method)
[junit]     at javax.security.auth.Subject.doAs(Subject.java:396)
[junit]     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit]
[junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 by DFSClient_1110622717
[junit]     at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit]     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit]     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit]     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit]     at java.security.AccessController.doPrivileged(Native Method)
[junit]     at javax.security.auth.Subject.doAs(Subject.java:396)
[junit]     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit]
[junit]     at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit]     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit]     at $Proxy0.complete(Unknown Source)
[junit]     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
[junit]     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
[junit]     at $Proxy0.complete(Unknown Source)
[junit]     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
[junit]     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
[junit]     at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1043)
[junit]     at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:237)
[junit]     at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
[junit]     at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:83)
[junit]     at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit]     at org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit]     at org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:127)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] Shutting down the Mini HDFS Cluster
[junit]     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit]     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] Shutting down DataNode 3
[junit]     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
Re: Build failed in Jenkins: Pig-trunk #1061
Looks like my change to use warn() instead of log.warn is causing issues for Jenkins.

[junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided to UDF: org.apache.pig.builtin.SUBSTRING. java.lang.NullPointerException
[junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided to UDF: org.apache.pig.builtin.SUBSTRING. java.lang.StringIndexOutOfBoundsException: String index out of range: -2
[junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided to UDF: org.apache.pig.builtin.SUBSTRING. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
[junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided to UDF: org.apache.pig.builtin.SUBSTRING. java.lang.StringIndexOutOfBoundsException: String index out of range: -8
[junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided to UDF: org.apache.pig.builtin.SUBSTRING. java.lang.StringIndexOutOfBoundsException: String index out of range: -2
[junit] 11/08/11 22:32:36 WARN builtin.INDEXOF: No logger object provided to UDF: org.apache.pig.builtin.INDEXOF. Failed to process input; error - null
[junit] 11/08/11 22:32:36 WARN builtin.LAST_INDEX_OF: No logger object provided to UDF: org.apache.pig.builtin.LAST_INDEX_OF. Failed to process input; error - null

Any idea if this is an environment thing or a me thing?

D

On Thu, Aug 11, 2011 at 3:32 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:

See https://builds.apache.org/job/Pig-trunk/1061/changes

Changes:

[dvryaboy] PIG-2174: HBaseStorage column filters miss some fields

--
[...truncated 38358 lines...]
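The warnings above come from calling a UDF's warn() when no aggregating logger has been wired up. A minimal null-guard for that situation can be sketched as follows; the interface and class names here are stand-ins for illustration, not Pig's actual API:

```java
import java.util.logging.Logger;

public class SafeWarn {
    // Stand-in for the task-scoped logger that aggregates UDF warnings;
    // in Pig this is only available when the UDF runs inside a task.
    interface PigLoggerLike {
        void warn(Object source, String message, Enum<?> warningEnum);
    }

    private static final Logger LOG = Logger.getLogger(SafeWarn.class.getName());
    private final PigLoggerLike pigLogger; // may be null outside a task context

    public SafeWarn(PigLoggerLike pigLogger) {
        this.pigLogger = pigLogger;
    }

    /** Route to the aggregating logger when present; otherwise fall back to plain logging. */
    public String warn(String message) {
        if (pigLogger != null) {
            pigLogger.warn(this, message, null);
            return "aggregated";
        }
        LOG.warning("No logger object provided to UDF: " + message);
        return "fallback";
    }
}
```

Guarding the null case turns the NullPointerException path into an ordinary log line, which is roughly what the Jenkins environment would need here.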
[jira] [Updated] (PIG-2213) Pig 0.10 Documentation
[ https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel updated PIG-2213:
---------------------------------
    Attachment: pig-2213-patch-1.patch

Patch for: PIG-2126 (PigStorage), PIG-2211 (exec command)

Pig 0.10 Documentation
----------------------
    Key: PIG-2213
    URL: https://issues.apache.org/jira/browse/PIG-2213
    Project: Pig
    Issue Type: Task
    Reporter: Corinne Chandel
    Attachments: pig-2213-patch-1.patch

Doc JIRA for Pig 0.10 release.
[jira] [Updated] (PIG-2213) Pig 0.10 Documentation
[ https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel updated PIG-2213:
---------------------------------
    Assignee: Daniel Dai
    Affects Version/s: 0.10, 0.9.1
    Status: Patch Available  (was: Open)

Apply pig-2213-patch-1.patch to TRUNK and branch-09.

Pig 0.10 Documentation
----------------------
    Key: PIG-2213
    URL: https://issues.apache.org/jira/browse/PIG-2213
    Project: Pig
    Issue Type: Task
    Affects Versions: 0.9.1, 0.10
    Reporter: Corinne Chandel
    Assignee: Daniel Dai
    Attachments: pig-2213-patch-1.patch

Doc JIRA for Pig 0.10 release.
[jira] [Resolved] (PIG-2211) update documentation for use of exec with no-args
[ https://issues.apache.org/jira/browse/PIG-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel resolved PIG-2211.
----------------------------------
    Resolution: Fixed

Fix for this JIRA included in PIG-2213 (pig-2213-patch-1.patch). Add any comments to PIG-2213.

update documentation for use of exec with no-args
-------------------------------------------------
    Key: PIG-2211
    URL: https://issues.apache.org/jira/browse/PIG-2211
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.9.0
    Reporter: Thejas M Nair
    Assignee: Corinne Chandel
    Fix For: 0.9.1, 0.10

In the description of the arguments of the exec command, it shows that the script argument is compulsory:

{code}
exec [-param param_name = param_value] [-param_file file_name] script
{code}

should be

{code}
exec [-param param_name = param_value] [-param_file file_name] [script]
{code}
[jira] [Resolved] (PIG-2126) Pig doc needs to describe how to load complex data for PigStorage
[ https://issues.apache.org/jira/browse/PIG-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel resolved PIG-2126.
----------------------------------
    Resolution: Fixed

Fix for this JIRA included in PIG-2213 (pig-2213-patch-1.patch). Add any comments to PIG-2213.

Pig doc needs to describe how to load complex data for PigStorage
-----------------------------------------------------------------
    Key: PIG-2126
    URL: https://issues.apache.org/jira/browse/PIG-2126
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.9.1, 0.10
    Reporter: Daniel Dai
    Assignee: Corinne Chandel
    Fix For: 0.9.1, 0.10

Need to describe how to load bag, tuple, and map data.
[jira] [Commented] (PIG-2196) Test harness should be independent of Pig
[ https://issues.apache.org/jira/browse/PIG-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083778#comment-13083778 ]

Alan Gates commented on PIG-2196:
---------------------------------

In default.conf, changing the default of testconfigpath from $ENV{PH_CLUSTER}/conf to $ENV{PH_CLUSTER} breaks the tests. Without the patch 'ant -Dpig.harness.old.pig=$HOME/grid/pig-0.8.1/ -Dpig.harness.cluster=$HOME/grid/max' works, but with the patch it fails.

Also, why not change the names from pig.harness.old.pig and pig.harness.cluster to harness.old.pig and harness.cluster, since you're trying to remove the piggyness from the harness?

Test harness should be independent of Pig
-----------------------------------------
    Key: PIG-2196
    URL: https://issues.apache.org/jira/browse/PIG-2196
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.10
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Fix For: 0.10
    Attachments: pig-2196.patch

The test harness is designed to be independent of Pig, but currently it still makes Pig-specific assumptions.
[jira] [Updated] (PIG-2177) e2e test harness should not assume hadoop dir structure
[ https://issues.apache.org/jira/browse/PIG-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2177:
----------------------------
    Resolution: Duplicate
    Status: Resolved  (was: Patch Available)

The same code change is included in PIG-2196.

e2e test harness should not assume hadoop dir structure
-------------------------------------------------------
    Key: PIG-2177
    URL: https://issues.apache.org/jira/browse/PIG-2177
    Project: Pig
    Issue Type: Improvement
    Components: tools
    Affects Versions: 0.10
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Priority: Trivial
    Fix For: 0.10
    Attachments: pig-2177.patch

The testconfigpath variable assumes a conf/ dir exists in the $PH_CLUSTER location, but it may or may not. If it exists, it's better to provide the full path including conf/; if it doesn't, the full path to the dir containing the *.xml files can be provided.
[jira] [Updated] (PIG-2213) Pig 0.10 Documentation
[ https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corinne Chandel updated PIG-2213:
---------------------------------
    Attachment: pig-2113-patch-2.patch

Use this patch for PIG-2126 and PIG-2211.

Pig 0.10 Documentation
----------------------
    Key: PIG-2213
    URL: https://issues.apache.org/jira/browse/PIG-2213
    Project: Pig
    Issue Type: Task
    Affects Versions: 0.9.1, 0.10
    Reporter: Corinne Chandel
    Assignee: Daniel Dai
    Attachments: pig-2113-patch-2.patch, pig-2213-patch-1.patch

Doc JIRA for Pig 0.10 release.
[jira] [Updated] (PIG-2208) Restrict number of Pig generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2208:
------------------------------
    Attachment: PIG-2208.patch

This patch implements option 2. Augmenting the Pig grammar would be more involved and could be done later.

Restrict number of Pig generated Hadoop counters
------------------------------------------------
    Key: PIG-2208
    URL: https://issues.apache.org/jira/browse/PIG-2208
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.8.1, 0.9.0
    Reporter: Richard Ding
    Assignee: Richard Ding
    Fix For: 0.9.1
    Attachments: PIG-2208.patch

Pig 0.8 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389, PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943) and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of Pig-generated counters. Here are two options:

1. Add an integer property (e.g., pig.counter.limit) to the pig property file (e.g., 20). If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled.

2. Add a boolean property (e.g., pig.disable.counters) to the pig property file (default: false). If this property is set to true, the Pig-generated counters are disabled.
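Option 2 above amounts to a single boolean switch gating every counter update. A minimal sketch, assuming the property name pig.disable.counters from the ticket and reading it through java.util.Properties for illustration (Pig reads its properties through its own configuration plumbing):

```java
import java.util.Properties;

public class CounterSwitch {
    // Read the proposed boolean property; absent means counters stay enabled.
    static boolean countersDisabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty("pig.disable.counters", "false"));
    }

    // Only touch the (limited) Hadoop counters when the switch is off.
    static void maybeIncrement(Properties props, Runnable incrementCounter) {
        if (!countersDisabled(props)) {
            incrementCounter.run();
        }
    }
}
```

The trade-off is coarse: one flag turns off all record counters rather than capping them at a limit as option 1 would.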
[jira] [Commented] (PIG-2208) Restrict number of Pig generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083882#comment-13083882 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

This is just trading one issue for another. If we use too many counters, the job is killed by limits. If we don't, we spam the logs and the tasks are killed for using too much local disk. We should at least do local aggregation -- keep counters local to the task (a simple map), and log what we would otherwise put in counters.

Restrict number of Pig generated Hadoop counters
------------------------------------------------
    Key: PIG-2208
    URL: https://issues.apache.org/jira/browse/PIG-2208
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.8.1, 0.9.0
    Reporter: Richard Ding
    Assignee: Richard Ding
    Fix For: 0.9.1
    Attachments: PIG-2208.patch

Pig 0.8 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389, PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943) and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of Pig-generated counters. Here are two options:

1. Add an integer property (e.g., pig.counter.limit) to the pig property file (e.g., 20). If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled.

2. Add a boolean property (e.g., pig.disable.counters) to the pig property file (default: false). If this property is set to true, the Pig-generated counters are disabled.
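The local-aggregation idea in this comment can be sketched as follows: accumulate per-task counts in a plain map, publish at most a fixed number of them as real Hadoop counters when the task finishes, and log the overflow instead of risking the per-job counter cap. All names here are hypothetical stand-ins, not Pig's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class LocalCounters {
    // In-memory aggregation; insertion order decides which counters get published first.
    private final Map<String, Long> counts = new LinkedHashMap<>();
    private final int limit;

    public LocalCounters(int limit) {
        this.limit = limit;
    }

    // Cheap map update instead of a Hadoop counter increment per record.
    public void increment(String name, long delta) {
        counts.merge(name, delta, Long::sum);
    }

    /** At task end: the first `limit` entries go to `publish` (e.g. the task's
     *  reporter); the rest go to `log` so nothing is silently dropped. */
    public void flush(BiConsumer<String, Long> publish, BiConsumer<String, Long> log) {
        int published = 0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (published < limit) {
                publish.accept(e.getKey(), e.getValue());
                published++;
            } else {
                log.accept(e.getKey(), e.getValue());
            }
        }
    }
}
```

This keeps per-record cost to a map update and defers the counter-limit decision to a single flush at task end.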
[jira] [Created] (PIG-2217) POStore.getSchema() returns null if I don't have a schema defined at load statement
POStore.getSchema() returns null if I don't have a schema defined at load statement
-----------------------------------------------------------------------------------
    Key: PIG-2217
    URL: https://issues.apache.org/jira/browse/PIG-2217
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.9.0, 0.8.1
    Reporter: Vivek Padmanabhan

If I don't specify a schema definition in the load statement, POStore.getSchema() returns null, because of which PigOutputCommitter is not storing the schema. For example, if I run the below script, the .pig_header and .pig_schema files won't be saved.

load_1 = LOAD 'i1' USING PigStorage();
ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
STORE ordered_data_1 INTO 'myout' using org.apache.pig.piggybank.storage.PigStorageSchema();

This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is not getting invoked for these cases.
[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I don't have a schema defined at load statement
[ https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083939#comment-13083939 ]

Vivek Padmanabhan commented on PIG-2217:
----------------------------------------

For the above mentioned script, the schema is marked as null from the logical layer itself, i.e. LOStore.getSchema() returns null. Since the schema is derived from predecessor operators, and the schema object for LOLoad itself is null, this scenario occurs for all scripts that do not define a schema in the load statement.

In Pig 0.7, even if the schema value is null from the logical layer, it is wrapped with an empty schema while translating. For example, in LogToPhyTranslationVisitor:

public void visit(LOStore loStore) throws VisitorException {
    store.setSchema(new Schema(loStore.getSchema()));

Hence the files will look like below:

.pig_header (empty file)
.pig_schema --- {fields:[],version:0,sortKeys:[-1],sortKeyOrders:[ASCENDING]}

But from 0.8 (new logical plan) onwards, the null value is returned directly, because of which the metadata is not saved. This change in behaviour came with the new logical plan introduced in Pig 0.8 and carried over into Pig 0.9. Disabling the new logical plan in 0.8 (pig -useversion 0.8 -Dpig.usenewlogicalplan=false) will produce the .pig_header and .pig_schema files.

POStore.getSchema() returns null if I don't have a schema defined at load statement
-----------------------------------------------------------------------------------
    Key: PIG-2217
    URL: https://issues.apache.org/jira/browse/PIG-2217
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.8.1, 0.9.0
    Reporter: Vivek Padmanabhan

If I don't specify a schema definition in the load statement, POStore.getSchema() returns null, because of which PigOutputCommitter is not storing the schema. For example, if I run the below script, the .pig_header and .pig_schema files won't be saved.

load_1 = LOAD 'i1' USING PigStorage();
ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
STORE ordered_data_1 INTO 'myout' using org.apache.pig.piggybank.storage.PigStorageSchema();

This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is not getting invoked for these cases.
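The 0.7-era behavior quoted in the comment above boils down to never handing a null schema downstream: an unknown schema becomes an empty one, so storeSchema still runs and an empty .pig_schema gets written. A minimal sketch of that guard, with `Schema` as a simplified stand-in for Pig's real Schema class:

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaGuard {
    // Minimal stand-in for Pig's Schema: just a list of field descriptions.
    static class Schema {
        final List<String> fields = new ArrayList<>();
    }

    /** Substitute an empty schema when the logical plan yields null,
     *  mirroring the new Schema(loStore.getSchema()) wrapping in 0.7. */
    static Schema orEmpty(Schema fromLogicalPlan) {
        return fromLogicalPlan != null ? fromLogicalPlan : new Schema();
    }
}
```

With this guard in the translation step, PigStorageSchema would again receive a (possibly empty) schema instead of skipping metadata output entirely.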