[jira] [Commented] (PIG-2114) Enhancements to PIG HBaseStorage Load & Store Func with extra scan configurations

2011-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082955#comment-13082955
 ] 

Dmitriy V. Ryaboy commented on PIG-2114:


Apologies again for taking a while to review.

Thanks, that looks like a fair bit of work.

First, just a couple of procedural notes:
1) make sure the new files don't have @author annotations and do have the 
apache headers
2) there is already a TestHBaseStorage. Why add a new one in util?
3) This is sort of a PITA request, especially as there are plenty of places in 
the codebase that don't adhere to this practice, but can you make sure to do 
things like put spaces after commas (as in 
Map<family,Map<qualifier,Map<timestamp,value>>>) and before opening parens (as 
in for(Map.Entry valueEntry: ...)), wrap lines to a reasonable length, etc.?

My major concern with the patch is as follows.

In getNext() you inserted a completely new flow that kicks in when timestamps 
are used. It bypasses all the existing logic for how results are created and, 
as far as I can see, does not respect things like projection pushdown. It also 
means any future work on the HBase loader logic has to happen in two places. 
Let's not do that. Isn't loading a single-version row just a special case of 
loading multiple versions (with n = 1)? We should be able to do this in one go.

There being so much stuff mixed in here, I propose we get the smaller stuff 
like PIG-2115 in. Some of the things you are doing here are also pretty 
non-controversial, like omitNulls and prefix filters; we can get those in 
pretty easily. Let's factor out the multiple-versions changes and add them to 
PIG-1832, leaving this (blessedly unspecifically titled :)) ticket to deal with 
the smaller stuff.

 Enhancements to PIG HBaseStorage Load & Store Func with extra scan 
 configurations
 -

 Key: PIG-2114
 URL: https://issues.apache.org/jira/browse/PIG-2114
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.9.0
Reporter: Hariprasad Kuppuswamy
Assignee: Hariprasad Kuppuswamy
Priority: Minor
  Labels: hbase, storage
 Fix For: 0.10

 Attachments: 
 Enhancments-to-enable-timestampversion-based-row-scan.patch


 - Added capability to specify scan based on timestamps (Hariprasad 
 Kuppuswamy)
 - Ability to specify number of versions to be fetched with current scan 
 (Hariprasad Kuppuswamy)
 - Configure the rowkey prefixes filter for the scan (Hariprasad Kuppuswamy)
 - Added ability to omit nulls when dealing with hbase storage (Greg Bowyer)
 - Added ability to specify Put timestamps while insertion (Hariprasad 
 Kuppuswamy)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Failing tests after parser change?

2011-08-11 Thread Dmitriy Ryaboy
HBaseStorage is failing, and it's not something we did to HBaseStorage...
Looks like the parser.

Any takers?

Testcase: testStoreToHBase_2_with_projection took 0.34 sec
Caused an ERROR
Error during parsing. line 1, column 84  mismatched input '(' expecting
SEMI_COLON
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
parsing. line 1, column 84  mismatched input '(' expecting SEMI_COLON
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:553)
at
org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771)
at
org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767)
at
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706)
Caused by: Failed to parse: line 1, column 84  mismatched input '('
expecting SEMI_COLON
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222)
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)


[jira] [Commented] (PIG-2174) HBaseStorage column filters miss some fields

2011-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082958#comment-13082958
 ] 

Dmitriy V. Ryaboy commented on PIG-2174:


+1 assuming test-patch passes.

Sadly at the moment TestHBaseStorage doesn't pass in trunk even without this 
patch..

 HBaseStorage column filters miss some fields
 

 Key: PIG-2174
 URL: https://issues.apache.org/jira/browse/PIG-2174
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Bill Graham
 Attachments: PIG-2174_1.patch


 When mixing static and dynamic column mappings, {{HBaseStorage}} sometimes 
 doesn't pick up the static column values and nulls are returned. I believe 
 this bug has been masked by HBase being a bit over-eager when it comes to 
 respecting column filters (i.e. HBase is returning more columns than it 
 should).
 For example, this query returns nulls for the {{sc}} column, even when it 
 contains data:
 {noformat}
 a = LOAD 'hbase://pigtable_1' USING
   org.apache.pig.backend.hadoop.hbase.HBaseStorage
   ('pig:sc pig:prefixed_col_*','-loadKey') AS
   (rowKey:chararray, sc:chararray, pig_cf_map:map[]);
 {noformat}
 What is very strange (about HBase) is that the same script returns 
 values just fine if {{sc}} is instead {{col_a}}, assuming of course that both 
 columns contain data:
 {noformat}
 a = LOAD 'hbase://pigtable_1' USING
   org.apache.pig.backend.hadoop.hbase.HBaseStorage
   ('pig:col_a pig:prefixed_col_*','-loadKey') AS
   (rowKey:chararray, col_a:chararray, pig_cf_map:map[]);
 {noformat}
 Potential HBase issues aside, I think there is a bug in the logic on the Pig 
 side. Patch to follow. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Jenkins: Pig-trunk #1060

2011-08-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1060/changes

Changes:

[thejas] PIG-2176: add logical plan assumption checker

--
[...truncated 37928 lines...]
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could 
not complete write to file 
/tmp/TestStore-output--716161521724243.txt_cleanupOnFailure_succeeded by 
DFSClient_325773412
[junit] at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1043)
[junit] at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:237)
[junit] at 
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:83)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:127)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] Shutting down the Mini HDFS Cluster
[junit] 11/08/11 10:31:57 WARN hdfs.StateChange: DIR* 
NameSystem.completeFile: failed to complete 
/tmp/TestStore-output--7612399809477939598.txt_cleanupOnFailure_succeeded1 
because dir.getFileBlocks() is null  and pendingFile is null
[junit] Shutting down DataNode 3
[junit] 11/08/11 10:31:57 INFO ipc.Server: IPC Server handler 4 on 34636, 
call 
complete(/tmp/TestStore-output--7612399809477939598.txt_cleanupOnFailure_succeeded1,
 DFSClient_325773412) from 127.0.0.1:45206: error: java.io.IOException: Could 
not complete write 

[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig

2011-08-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083081#comment-13083081
 ] 

Zhijie Shen commented on PIG-1429:
--

Hi Daniel,

Thanks for your review; I've fixed 1, 3, and 4. For 5, I'll add the comments and 
some end-to-end tests. For 2, if we want to stick with boolean as the name of 
the Boolean type, we'd better revise the grammar, changing the keyword from 
bool to boolean. Otherwise, Utils.getSchemaFromString() will be broken if the 
supplied schema string uses bool. And the name used in Pig Latin commands 
should be consistent with the one shown in the displayed plan/schema.
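
For reference, a minimal sketch of the Pig Latin this would enable, assuming the 
keyword ends up being boolean rather than bool (the field names and input path 
are made up for illustration):

{code}
-- assumes 'boolean' is accepted as a schema keyword once PIG-1429 lands
a = LOAD 'users' USING PigStorage(',') AS (name:chararray, active:boolean);
DESCRIBE a;
{code}

The same keyword would have to be accepted by Utils.getSchemaFromString(), so a 
schema string such as 'name:chararray, active:boolean' stays parseable.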

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: boolean, gsoc2011, pig, type
 Attachments: PIG-1429_1.patch, PIG-1429_2.patch, PIG-1429_3.patch, 
 working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  
 This is a candidate project for Google summer of code 2011. More information 
 about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2215) Newlines in function arguments still cause exceptions to be thrown

2011-08-11 Thread Adam Warrington (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Warrington updated PIG-2215:
-

Attachment: PIG-2215-0.patch

 Newlines in function arguments still cause exceptions to be thrown
 --

 Key: PIG-2215
 URL: https://issues.apache.org/jira/browse/PIG-2215
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Adam Warrington
 Attachments: PIG-2215-0.patch


 PIG-1749 was an attempt to allow newlines in function arguments. It appears 
 that the AstValidator and the LogicalPlanGenerator grammars were not updated, 
 so the following exception and stack trace will be thrown when executing a 
 script that has newlines in function arguments:
 ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)
 Failed to parse: Pig script failed to parse: MismatchedTokenException(93!=3)
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
 at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
 at org.apache.pig.Main.run(Main.java:487)
 at org.apache.pig.Main.main(Main.java:108)
 Caused by: MismatchedTokenException(93!=3)
 at 
 org.apache.pig.parser.AstValidator.recoverFromMismatchedToken(AstValidator.java:209)
 at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
 at 
 org.apache.pig.parser.AstValidator.func_clause(AstValidator.java:3497)
 at 
 org.apache.pig.parser.AstValidator.load_clause(AstValidator.java:2464)
 at org.apache.pig.parser.AstValidator.op_clause(AstValidator.java:934)
 at 
 org.apache.pig.parser.AstValidator.general_statement(AstValidator.java:574)
 at org.apache.pig.parser.AstValidator.statement(AstValidator.java:396)
 at org.apache.pig.parser.AstValidator.query(AstValidator.java:306)
 at 
 org.apache.pig.parser.QueryParserDriver.validateAst(QueryParserDriver.java:236)
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
 ... 10 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2215) Newlines in function arguments still cause exceptions to be thrown

2011-08-11 Thread Adam Warrington (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083215#comment-13083215
 ] 

Adam Warrington commented on PIG-2215:
--

This patch updates the LogicalPlanGenerator and AstValidator grammars, and adds 
2 unit tests to test the new functionality.
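
For context, a minimal sketch of the kind of script the fix (and presumably the 
new tests) should accept, with a newline inside the function's argument list; 
the input path and delimiter are made up:

{code}
a = LOAD 'input' USING PigStorage(
        ',');
DUMP a;
{code}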

 Newlines in function arguments still cause exceptions to be thrown
 --

 Key: PIG-2215
 URL: https://issues.apache.org/jira/browse/PIG-2215
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Adam Warrington
 Attachments: PIG-2215-0.patch


 PIG-1749 was an attempt to allow newlines in function arguments. It appears 
 that the AstValidator and the LogicalPlanGenerator grammars were not updated, 
 so the following exception and stack trace will be thrown when executing a 
 script that has newlines in function arguments:
 ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)
 Failed to parse: Pig script failed to parse: MismatchedTokenException(93!=3)
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
 at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
 at org.apache.pig.Main.run(Main.java:487)
 at org.apache.pig.Main.main(Main.java:108)
 Caused by: MismatchedTokenException(93!=3)
 at 
 org.apache.pig.parser.AstValidator.recoverFromMismatchedToken(AstValidator.java:209)
 at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
 at 
 org.apache.pig.parser.AstValidator.func_clause(AstValidator.java:3497)
 at 
 org.apache.pig.parser.AstValidator.load_clause(AstValidator.java:2464)
 at org.apache.pig.parser.AstValidator.op_clause(AstValidator.java:934)
 at 
 org.apache.pig.parser.AstValidator.general_statement(AstValidator.java:574)
 at org.apache.pig.parser.AstValidator.statement(AstValidator.java:396)
 at org.apache.pig.parser.AstValidator.query(AstValidator.java:306)
 at 
 org.apache.pig.parser.QueryParserDriver.validateAst(QueryParserDriver.java:236)
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
 ... 10 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Failing tests after parser change?

2011-08-11 Thread Alan Gates
This looks like the intermittent Antlr bug we're seeing 
(https://issues.apache.org/jira/browse/PIG-2055).  We're testing other versions 
of Antlr to try to fix this, but until we find one that addresses the issue the 
only solution is to do ant clean, and then rebuild and see if it goes away.  We 
have also noticed it happens more often when built on Mac than on Linux, if you 
happen to have a Linux box you could build on.

Alan.

On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote:

 HBaseStorage is failing, and it's not something we did to HBaseStorage...
 Looks like the parser.
 
 Any takers?
 
 Testcase: testStoreToHBase_2_with_projection took 0.34 sec
Caused an ERROR
 Error during parsing. line 1, column 84  mismatched input '(' expecting
 SEMI_COLON
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
 parsing. line 1, column 84  mismatched input '(' expecting SEMI_COLON
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:553)
at
 org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771)
at
 org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767)
at
 org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706)
 Caused by: Failed to parse: line 1, column 84  mismatched input '('
 expecting SEMI_COLON
at
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222)
at
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)



[jira] [Commented] (PIG-1429) Add Boolean Data Type to Pig

2011-08-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083219#comment-13083219
 ] 

Daniel Dai commented on PIG-1429:
-

You are right. But I mean we should change the keyword to boolean as well.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: boolean, gsoc2011, pig, type
 Attachments: PIG-1429_1.patch, PIG-1429_2.patch, PIG-1429_3.patch, 
 working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  
 This is a candidate project for Google summer of code 2011. More information 
 about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2216) deprecate use of type in as clause of foreach statement

2011-08-11 Thread Thejas M Nair (JIRA)
deprecate use of type in as clause of foreach statement
---

 Key: PIG-2216
 URL: https://issues.apache.org/jira/browse/PIG-2216
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
 Fix For: 0.10


In the as clause of a foreach statement, a type can be specified, but that type 
is actually not used (i.e., it does not result in a cast). This behavior is 
misleading.

{code}
F = foreach INP generate c1 as (name : chararray);
{code}
Pig 0.8 produces an error if c1 in the above example is not of the same type as 
specified in the as clause.
In 0.9, that check seems to have been lost in the parser migration. It also 
results in the logical plan thinking that the type of c1 is the one specified in 
the as clause. That can cause errors such as ClassCastException.

One way to be consistent here would have been to add a cast for the as clause as 
well. But having two ways to cast complicates things. So long term, I think the 
use of types in the as clause should be removed.
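
 For illustration, a minimal sketch of the explicit-cast form that would remain 
 the single way to cast (reusing the relation and field names from the example 
 above):
 {code}
 F = foreach INP generate (chararray) c1 as name;
 {code}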

For 0.10, I think the check present in 0.8 should be added back, and the syntax 
should be deprecated (resulting in a warning when used).


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2055) inconsistentcy behavior in parser generated during build

2011-08-11 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2055:
---

Attachment: PIG-2055.2.patch

PIG-2055.2.patch - the stringtemplate.jar 4.0.4 was released under a different 
artifact name (ST4). This patch uses the new artifact and version 4.0.4.

 inconsistentcy behavior in parser generated during build 
 -

 Key: PIG-2055
 URL: https://issues.apache.org/jira/browse/PIG-2055
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
 Attachments: PIG-2055.1.patch, PIG-2055.2.patch


 On certain builds, I see that Pig fails to support this syntax -
 {code}
 grunt> l = load 'x' using PigStorage(':');
 2011-05-10 09:21:41,565 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1200: line 1, column 29  mismatched input '(' expecting SEMI_COLON
 Details at logfile: /Users/tejas/pig_trunk_cp/trunk/pig_1305044484712.log
 {code}
 I seem to be the only one who has seen this behavior, and I have seen it on 
 occasion when I build on Mac. It could be a problem with the interaction 
 between Antlr and the Apple JVM. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Failing tests after parser change?

2011-08-11 Thread Thejas Nair

Dmitriy,
You don't realize how lucky you are! ;)
I have been trying hard to reproduce this problem, so that I can check 
if the patch in PIG-2055 actually fixes the issue. I ran build+ 
(small)test in a loop for 2000+ times, and this hasn't happened yet.


If this is happening (almost) consistently, can you try the patch in 
PIG-2055 and see if that helps ?


Thanks,
Thejas



On 8/11/11 9:44 AM, Alan Gates wrote:

This looks like the intermittent Antlr bug we're seeing 
(https://issues.apache.org/jira/browse/PIG-2055).  We're testing other versions 
of Antlr to try to fix this, but until we find one that addresses the issue the 
only solution is to do ant clean, and then rebuild and see if it goes away.  We 
have also noticed it happens more often when built on Mac than on Linux, if you 
happen to have a Linux box you could build on.

Alan.

On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote:


HBaseStorage is failing, and it's not something we did to HBaseStorage...
Looks like the parser.

Any takers?

Testcase: testStoreToHBase_2_with_projection took 0.34 sec
Caused an ERROR
Error during parsing.line 1, column 84   mismatched input '(' expecting
SEMI_COLON
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
parsing.line 1, column 84   mismatched input '(' expecting SEMI_COLON
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:553)
at
org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771)
at
org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767)
at
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706)
Caused by: Failed to parse:line 1, column 84   mismatched input '('
expecting SEMI_COLON
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222)
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)






[jira] [Commented] (PIG-2209) JsonMetadata fails to find schema for glob paths

2011-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083378#comment-13083378
 ] 

Dmitriy V. Ryaboy commented on PIG-2209:


Slight correction: this only happens when the items referred to in the glob are 
directories. It works fine when they are files or parts of file names.
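
To make the distinction concrete, a short sketch (the second path is made up): 
the first LOAD, whose glob expands to directories, triggers the exception quoted 
in the description; the second, whose glob matches parts of file names, picks up 
the schema fine.

{code}
-- glob items are directories: JsonMetadata cannot resolve the schema file
a = LOAD '/foo/bar/{1,2,3}' USING PigStorage();

-- glob items are parts of file names: works as expected
b = LOAD '/foo/bar/part-{1,2,3}' USING PigStorage();
{code}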

 JsonMetadata fails to find schema for glob paths
 

 Key: PIG-2209
 URL: https://issues.apache.org/jira/browse/PIG-2209
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy

 JsonMetadata, used in PigStorage to work with serialized schemas, does not 
 correctly interpret paths like '/foo/bar/{1,2,3}' and throws an exception:
 {code}
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: 
 Could not find schema file for file:///foo/bar/{1,2}
   at 
 org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:217)
   at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:186)
   at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438)
   at 
 org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
   ... 17 more
 Caused by: java.io.IOException: Unable to read file:///foo/bar/z/{1,2}
   at 
 org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
   at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:183)
   ... 19 more
 Caused by: java.net.URISyntaxException: Illegal character in path at index 
 36: file:///foo/bar/{1,2}
   at java.net.URI$Parser.fail(URI.java:2809)
   at java.net.URI$Parser.checkChars(URI.java:2982)
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Failing tests after parser change?

2011-08-11 Thread Norbert Burger
Fwiw, I believe I've also been hitting the same bug that Dmitriy described.
In my case, I was running Cloudera's CDH3u0 build on a Mac.  I'll also try
to recreate today or tomorrow.

Norbert

On Thu, Aug 11, 2011 at 2:39 PM, Thejas Nair the...@hortonworks.com wrote:

 Dmitriy,
 You don't realize how lucky you are! ;)
 I have been trying hard to reproduce this problem, so that I can check if
 the patch in PIG-2055 actually fixes the issue. I ran build+ (small)test in
 a loop for 2000+ times, and this hasn't happened yet.

 If this is happening (almost) consistently, can you try the patch in
 PIG-2055 and see if that helps ?

 Thanks,
 Thejas




 On 8/11/11 9:44 AM, Alan Gates wrote:

 This looks like the intermittent Antlr bug we're seeing
 (https://issues.apache.org/jira/browse/PIG-2055).
  We're testing other versions of Antlr to try to fix this, but until we find
 one that addresses the issue the only solution is to do ant clean, and then
 rebuild and see if it goes away.  We have also noticed it happens more often
 when built on Mac than on Linux, if you happen to have a Linux box you could
 build on.

 Alan.

 On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote:

  HBaseStorage is failing, and it's not something we did to HBaseStorage...
 Looks like the parser.

 Any takers?

 Testcase: testStoreToHBase_2_with_projection took 0.34 sec
Caused an ERROR
 Error during parsing. line 1, column 84  mismatched input '(' expecting
 SEMI_COLON
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during
 parsing. line 1, column 84  mismatched input '(' expecting SEMI_COLON
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:553)
at
 org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771)
at
 org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767)
at
 org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706)
 Caused by: Failed to parse: line 1, column 84  mismatched input '('
 expecting SEMI_COLON
at
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222)
at
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)






[jira] [Updated] (PIG-2174) HBaseStorage column filters miss some fields

2011-08-11 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2174:
---

   Resolution: Fixed
Fix Version/s: 0.10
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Bill!

 HBaseStorage column filters miss some fields
 

 Key: PIG-2174
 URL: https://issues.apache.org/jira/browse/PIG-2174
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.10

 Attachments: PIG-2174_1.patch


 When mixing static and dynamic column mappings, {{HBaseStorage}} sometimes 
 doesn't pick up the static column values and nulls are returned. I believe 
 this bug has been masked by HBase being a bit over-eager when it comes to 
 respecting column filters (i.e. HBase is returning more columns than it 
 should).
 For example, this query returns nulls for the {{sc}} column, even when it 
 contains data:
 {noformat}
 a = LOAD 'hbase://pigtable_1' USING
   org.apache.pig.backend.hadoop.hbase.HBaseStorage
   ('pig:sc pig:prefixed_col_*','-loadKey') AS
   (rowKey:chararray, sc:chararray, pig_cf_map:map[]);
 {noformat}
 What is very strange (about HBase) is that the same script returns 
 values just fine if {{sc}} is instead {{col_a}}, assuming of course that both 
 columns contain data:
 {noformat}
 a = LOAD 'hbase://pigtable_1' USING
   org.apache.pig.backend.hadoop.hbase.HBaseStorage
   ('pig:col_a pig:prefixed_col_*','-loadKey') AS
   (rowKey:chararray, col_a:chararray, pig_cf_map:map[]);
 {noformat}
 Potential HBase issues aside, I think there is a bug in the logic on the Pig 
 side. Patch to follow. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Jenkins: Pig-trunk #1061

2011-08-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1061/changes

Changes:

[dvryaboy] PIG-2174: HBaseStorage column filters miss some fields

--
[...truncated 38358 lines...]
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 11/08/11 22:32:34 ERROR hdfs.DFSClient: Exception closing file 
/tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 : 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete 
write to file 
/tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 by 
DFSClient_1110622717
[junit] at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could 
not complete write to file 
/tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 by 
DFSClient_1110622717
[junit] at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 
[junit] at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
[junit] at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
[junit] at 
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1043)
[junit] at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:237)
[junit] at 
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:83)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:127)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] Shutting down the Mini HDFS Cluster
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] Shutting down DataNode 3
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
 

Re: Build failed in Jenkins: Pig-trunk #1061

2011-08-11 Thread Dmitriy Ryaboy
Looks like my change to use warn() instead of log.warn is causing issues for
Jenkins.

 [junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object provided
to UDF: org.apache.pig.builtin.
SUBSTRING. java.lang.NullPointerException
   [junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object
provided to UDF: org.apache.pig.builtin.SUBSTRING.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
   [junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object
provided to UDF: org.apache.pig.builtin.SUBSTRING.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
   [junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object
provided to UDF: org.apache.pig.builtin.SUBSTRING.
java.lang.StringIndexOutOfBoundsException: String index out of range: -8
   [junit] 11/08/11 22:32:36 WARN builtin.SUBSTRING: No logger object
provided to UDF: org.apache.pig.builtin.SUBSTRING.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
   [junit] 11/08/11 22:32:36 WARN builtin.INDEXOF: No logger object provided
to UDF: org.apache.pig.builtin.INDEXOF. Failed to process input; error -
null
   [junit] 11/08/11 22:32:36 WARN builtin.LAST_INDEX_OF: No logger object
provided to UDF: org.apache.pig.builtin.LAST_INDEX_OF. Failed to process
input; error - null

Any idea if this is an environment thing or a me thing?

D


On Thu, Aug 11, 2011 at 3:32 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 See https://builds.apache.org/job/Pig-trunk/1061/changes

 Changes:

 [dvryaboy] PIG-2174: HBaseStorage column filters miss some fields

 --
 [...truncated 38358 lines...]
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit] 11/08/11 22:32:34 ERROR hdfs.DFSClient: Exception closing file
 /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2 :
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
 complete write to file
 /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2
 by DFSClient_1110622717
[junit] at
 org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown
 Source)
[junit] at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native
 Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit]
[junit] org.apache.hadoop.ipc.RemoteException: java.io.IOException:
 Could not complete write to file
 /tmp/TestStore-output--2391036896539643097.txt_cleanupOnFailure_succeeded2
 by DFSClient_1110622717
[junit] at
 org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown
 Source)
[junit] at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
[junit] at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
[junit] at
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
[junit] at java.security.AccessController.doPrivileged(Native
 Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
[junit]
[junit] at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown
 Source)
[junit] at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
[junit] at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
[junit] at $Proxy0.complete(Unknown Source)
[junit] at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
[junit] at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
[junit] at
 

[jira] [Updated] (PIG-2213) Pig 0.10 Documentation

2011-08-11 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-2213:
-

Attachment: pig-2213-patch-1.patch

Patch for:
 PIG-2126 (PigStorage)
 PIG-2211 (exec command)

 Pig 0.10 Documentation 
 ---

 Key: PIG-2213
 URL: https://issues.apache.org/jira/browse/PIG-2213
 Project: Pig
  Issue Type: Task
Reporter: Corinne Chandel
 Attachments: pig-2213-patch-1.patch


 Doc JIRA for Pig 0.10 release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2213) Pig 0.10 Documentation

2011-08-11 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-2213:
-

 Assignee: Daniel Dai
Affects Version/s: 0.10
   0.9.1
   Status: Patch Available  (was: Open)

Apply pig-2213-patch-1.patch to TRUNK and branch-09

 Pig 0.10 Documentation 
 ---

 Key: PIG-2213
 URL: https://issues.apache.org/jira/browse/PIG-2213
 Project: Pig
  Issue Type: Task
Affects Versions: 0.9.1, 0.10
Reporter: Corinne Chandel
Assignee: Daniel Dai
 Attachments: pig-2213-patch-1.patch


 Doc JIRA for Pig 0.10 release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (PIG-2211) update documentation for use of exec with no-args

2011-08-11 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-2211.
--

Resolution: Fixed

Fix for this JIRA included in PIG-2213 (pig-2213-patch-1.patch). Add any 
comments to PIG-2213.

 update documentation for use of exec with no-args
 -

 Key: PIG-2211
 URL: https://issues.apache.org/jira/browse/PIG-2211
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Corinne Chandel
 Fix For: 0.9.1, 0.10


 In the description of the arguments of the exec commands, it shows that the 
 script argument is compulsory.
 {code}
 exec [-param param_name = param_value] [-param_file file_name] script  
 {code}
 should be
 {code} 
 exec [-param param_name = param_value] [-param_file file_name] [script]  
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (PIG-2126) Pig doc need to describe how to load complex data for PigStorage

2011-08-11 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-2126.
--

Resolution: Fixed

Fix for this JIRA included in PIG-2213 (pig-2213-patch-1.patch). Add any 
comments to PIG-2213.

 Pig doc need to describe how to load complex data for PigStorage
 

 Key: PIG-2126
 URL: https://issues.apache.org/jira/browse/PIG-2126
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.1, 0.10
Reporter: Daniel Dai
Assignee: Corinne Chandel
 Fix For: 0.9.1, 0.10


 Need to describe how to load bag, tuple, map
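
 As a starting point, a minimal sketch of loading complex fields with 
 PigStorage; the input path and field names are placeholders:
 {code}
 a = LOAD 'data' USING PigStorage('\t')
       AS (t:tuple(x:int, y:chararray),
           b:bag{r:tuple(v:int)},
           m:map[]);
 DESCRIBE a;
 {code}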

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2196) Test harness should be independent of Pig

2011-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083778#comment-13083778
 ] 

Alan Gates commented on PIG-2196:
-

In default.conf, changing the default of testconfigpath from 
$ENV{PH_CLUSTER}/conf to $ENV/{PH_CLUSTER} breaks the tests.  Without the patch 
'ant -Dpig.harness.old.pig=$HOME/grid/pig-0.8.1/ 
-Dpig.harness.cluster=$HOME/grid/max' works, but with the patch it fails.

Also, why not change the names from pig.harness.old.pig and pig.harness.cluster 
to harness.old.pig and harness.cluster, since you're trying to remove the 
piggyness from the harness?

 Test harness should be independent of Pig
 -

 Key: PIG-2196
 URL: https://issues.apache.org/jira/browse/PIG-2196
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.10

 Attachments: pig-2196.patch


 Test harness is designed to be independent of Pig, but currently it makes 
 those assumptions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2177) e2e test harness should not assume hadoop dir structure

2011-08-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2177:


Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Same code change is included in PIG-2196.

 e2e test harness should not assume hadoop dir structure
 ---

 Key: PIG-2177
 URL: https://issues.apache.org/jira/browse/PIG-2177
 Project: Pig
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.10
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Trivial
 Fix For: 0.10

 Attachments: pig-2177.patch


 The testconfigpath variable assumes a conf/ dir exists in the $PH_CLUSTER 
 location. It may or may not exist. If it exists, it's better to provide the 
 full path including conf/. If it doesn't exist, the full path to the dir 
 leading to the *.xml files can be provided. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2213) Pig 0.10 Documentation

2011-08-11 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-2213:
-

Attachment: pig-2113-patch-2.patch

Use this patch for PIG-2126 and PIG-2211

 Pig 0.10 Documentation 
 ---

 Key: PIG-2213
 URL: https://issues.apache.org/jira/browse/PIG-2213
 Project: Pig
  Issue Type: Task
Affects Versions: 0.9.1, 0.10
Reporter: Corinne Chandel
Assignee: Daniel Dai
 Attachments: pig-2113-patch-2.patch, pig-2213-patch-1.patch


 Doc JIRA for Pig 0.10 release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-11 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2208:
--

Attachment: PIG-2208.patch

This patch implements option 2. Augmenting Pig grammar will be more involved 
and could be done later.
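
For what it's worth, under option 2 a script (or pig.properties) could turn the 
counters off like this; the property name comes from the description below, the 
rest is just a sketch:

{code}
-- disable the Pig-generated per-input/per-output record counters
SET pig.disable.counters true;
{code}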

 Restrict number of PIG generated Haddop counters 
 -

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2208.patch


 Pig 0.8 implemented Hadoop counters to track the number of records read for 
 each input and the number of records written for each output (PIG-1389 and 
 PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters 
 (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit.
 Therefore we need a way to cap the number of Pig-generated counters.
 Here are the two options:
 1. Add an integer property (e.g., pig.counter.limit) to the pig property file 
 (e.g., 20). If the number of inputs of a job exceeds this number, the input 
 counters are disabled. Similarly, if the number of outputs of a job exceeds 
 this number, the output counters are disabled.
 2. Add a boolean property (e.g., pig.disable.counters) to the pig property 
 file (default: false). If this property is set to true, then the Pig-generated 
 counters are disabled.
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083882#comment-13083882
 ] 

Dmitriy V. Ryaboy commented on PIG-2208:


This is just trading one issue for another. If we use too many counters, the 
job is killed by limits. If we don't, we spam the logs and the tasks are killed 
for using too much local disk.  We should at least do local aggregation -- keep 
counters local to task (a simple map), and log what we would otherwise put in 
counters. 

 Restrict number of PIG generated Haddop counters 
 -

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2208.patch


 Pig 0.8 implemented Hadoop counters to track the number of records read for 
 each input and the number of records written for each output (PIG-1389 and 
 PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters 
 (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit.
 Therefore we need a way to cap the number of Pig-generated counters.
 Here are the two options:
 1. Add an integer property (e.g., pig.counter.limit) to the pig property file 
 (e.g., 20). If the number of inputs of a job exceeds this number, the input 
 counters are disabled. Similarly, if the number of outputs of a job exceeds 
 this number, the output counters are disabled.
 2. Add a boolean property (e.g., pig.disable.counters) to the pig property 
 file (default: false). If this property is set to true, then the Pig-generated 
 counters are disabled.
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement

2011-08-11 Thread Vivek Padmanabhan (JIRA)
POStore.getSchema() returns null if I dont have a schema defined at load 
statement
--

 Key: PIG-2217
 URL: https://issues.apache.org/jira/browse/PIG-2217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0, 0.8.1
Reporter: Vivek Padmanabhan


If I don't specify a schema definition in the load statement, then 
POStore.getSchema() returns null, because of which PigOutputCommitter does not 
store the schema.

For example, if I run the script below, the .pig_header and .pig_schema files 
won't be saved.


load_1 =  LOAD 'i1' USING PigStorage();
ordered_data_1 =  ORDER load_1 BY * ASC PARALLEL 1;
STORE ordered_data_1 INTO 'myout' using 
org.apache.pig.piggybank.storage.PigStorageSchema();


This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is 
not getting invoked for these cases.
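
Until this is fixed, a possible workaround sketch: declaring a schema in the 
load statement gives POStore a non-null schema to pass along (the column names 
below are made up):

{code}
load_1 = LOAD 'i1' USING PigStorage() AS (f1:chararray, f2:chararray);
ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
STORE ordered_data_1 INTO 'myout' USING
    org.apache.pig.piggybank.storage.PigStorageSchema();
{code}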




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement

2011-08-11 Thread Vivek Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083939#comment-13083939
 ] 

Vivek Padmanabhan commented on PIG-2217:


For the above-mentioned script the schema is marked as null from the logical 
layer itself, i.e. LOStore.getSchema() returns null.
Since an operator's schema is derived from its predecessor operators, and the 
schema object for LOLoad itself is null, this scenario will happen for every 
script that does not define a schema in the load statement.


In Pig 0.7, even if the schema value coming from the logical layer is null, it 
is wrapped in an empty schema during translation. For example, in 
LogToPhyTranslationVisitor:

   public void visit(LOStore loStore) throws VisitorException {

       store.setSchema(new Schema(loStore.getSchema()));

Hence the files will look like this:
.pig_header (empty file)
.pig_schema
---
{fields:[],version:0,sortKeys:[-1],sortKeyOrders:[ASCENDING]}


But from 0.8 (new logical plan) onwards, the null value is returned directly, 
because of which the metadata is not saved.


This change in behaviour came with the new logical plan introduced in Pig 0.8, 
which also carried over into Pig 0.9.
Disabling the new logical plan in 0.8 (pig -useversion 0.8 
-Dpig.usenewlogicalplan=false) will produce the .pig_header and .pig_schema 
files.

 POStore.getSchema() returns null if I dont have a schema defined at load 
 statement
 --

 Key: PIG-2217
 URL: https://issues.apache.org/jira/browse/PIG-2217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Vivek Padmanabhan

 If I don't specify a schema definition in the load statement, then 
 POStore.getSchema() returns null, because of which PigOutputCommitter does not 
 store the schema.
 For example, if I run the script below, the .pig_header and .pig_schema files 
 won't be saved.
 load_1 =  LOAD 'i1' USING PigStorage();
 ordered_data_1 =  ORDER load_1 BY * ASC PARALLEL 1;
 STORE ordered_data_1 INTO 'myout' using 
 org.apache.pig.piggybank.storage.PigStorageSchema();
 This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is 
 not getting invoked for these cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira