[jira] Commented: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004944#comment-13004944
 ] 

Mridul Muralidharan commented on PIG-1618:
--

Not sure what the scope of this JIRA is, but it would be a good idea to get rid 
of the pre-processor and integrate it into the parser.
This would lead to consistent error messages (so no need to muck around with 
-debug, etc.) while allowing for easier integration of Pig in embedded mode.

Regards,
Mridul

> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.2.patch, NewParser-23.patch, NewParser-24.patch, 
> NewParser-3.patch, NewParser-3.patch, NewParser-4.patch, NewParser-5.patch, 
> NewParser-6.patch, NewParser-7.patch, NewParser-8.patches, NewParser-9.patch, 
> antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error 
> messages.  After review of JavaCC we feel these will be difficult to address 
> using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
> and maintain.  
> ANTLR is being reviewed as the most likely choice to move to, but other 
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1618:
---

Attachment: NewParser-23.2.patch

NewParser-23.2.patch - NewParser-23.patch with fixes for unit tests (support 
for project-star with alias).


> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.2.patch, NewParser-23.patch, NewParser-24.patch, 
> NewParser-3.patch, NewParser-3.patch, NewParser-4.patch, NewParser-5.patch, 
> NewParser-6.patch, NewParser-7.patch, NewParser-8.patches, NewParser-9.patch, 
> antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error 
> messages.  After review of JavaCC we feel these will be difficult to address 
> using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
> and maintain.  
> ANTLR is being reviewed as the most likely choice to move to, but other 
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.



[jira] Assigned: (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1890:
---

Assignee: Jakob Homan  (was: Daniel Dai)

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Commented: (PIG-596) Anonymous tuples in bags create ParseExceptions

2011-03-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004910#comment-13004910
 ] 

Xuefu Zhang commented on PIG-596:
-

The old parser requires a name for the tuple type in a bag type. This 
requirement is dropped in the new parser. Thus, the following query works:

One = load 'one.txt' using PigStorage() as ( one: int );
LabelledTupleInBag = foreach One generate { ( 1, 2 ) } as mybag : { tuplelabel: 
tuple ( a, b ) };
AnonymousTupleInBag = foreach One generate { ( 2, 3 ) } as mybag : { ( a, b ) };

However, you do need to separate the field alias and its type with a ":", as 
shown above.

> Anonymous tuples in bags create ParseExceptions
> ---
>
> Key: PIG-596
> URL: https://issues.apache.org/jira/browse/PIG-596
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: David Ciemiewicz
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> {code}
> One = load 'one.txt' using PigStorage() as ( one: int );
> LabelledTupleInBag = foreach One generate { ( 1, 2 ) } as mybag { tuplelabel: 
> tuple ( a, b ) };
> AnonymousTupleInBag = foreach One generate { ( 2, 3 ) } as mybag { tuple ( a, 
> b ) }; -- Anonymous tuple creates bug
> Tuples = union LabelledTupleInBag, AnonymousTupleInBag;
> dump Tuples;
> {code}
> java.io.IOException: Encountered "{ tuple" at line 6, column 66.
> Was expecting one of:
> "parallel" ...
> ";" ...
> "," ...
> ":" ...
> "(" ...
> "{"  ...
> "{" "}" ...
> "[" ...
> 
> at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: 
> Encountered "{ tuple" at line 6, column 66.
> Why can't there be an anonymous tuple at the top level of a bag?



[jira] Commented: (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004906#comment-13004906
 ] 

Daniel Dai commented on PIG-1890:
-

PIG-1890-1.patch fixes the first issue. I have temporarily commented out all 
test cases in TestAvroStorage.

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Updated: (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1890:


Attachment: (was: PIG-1890-1.patch)

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Updated: (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1890:


Attachment: PIG-1890-1.patch

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Updated: (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1890:


Description: 
TestAvroStorage fails on trunk. There are two reasons:
1. After PIG-1680, we call LoadFunc.setLocation one more time.
2. The schema for AvroStorage seems to be wrong. For example, in the first test 
case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
{PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
issue was hidden until PIG-1188 was checked in.

  was:Two piggybank tests TestAllLoader, TestAvroStorage fail on trunk. We need 
to fix them.

Summary: Fix piggybank unit test TestAvroStorage  (was: Fix piggybank 
unit test TestAllLoader, TestAvroStorage)

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Commented: (PIG-1152) bincond operator throws parser error

2011-03-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004900#comment-13004900
 ] 

Xuefu Zhang commented on PIG-1152:
--

Actually, the problem is caused neither by the bincond operator nor by the 
parser, but by a limitation of Pig's grammar. Data fields in a literal bag can 
only be literals. By Pig's grammar, -1 is not a literal but an expression.

Thus, the parser is happy with C = foreach B generate group, 
flatten(((COUNT(A) < 1L) ? {(1)} : A.x));.

To solve the problem, Pig's grammar needs to be extended.

> bincond operator throws parser error
> 
>
> Key: PIG-1152
> URL: https://issues.apache.org/jira/browse/PIG-1152
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The bincond operator throws a parser error when the true branch contains a 
> constant bag with one tuple containing a single int field with a negative value. 
> Here is the script to reproduce the issue:
> A = load 'A' as (s: chararray, x: int, y: int);
> B = group A by s;
> C = foreach B generate group, flatten(((COUNT(A) < 1L) ? {(-1)} : A.x));
> dump C;



[jira] Updated: (PIG-1874) Make PigServer work in a multithreading environment

2011-03-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1874:
--

Attachment: PIG-1874.patch

Attaching patch for review.

This patch removed the static variables from the PigServer and PigContext 
classes. It also made the UDFContext instance thread-local.

To avoid sharing a PigContext object, users should use the following 
constructors to create a PigServer instance in each thread:

{code}
public PigServer(ExecType execType) throws ExecException;

public PigServer(ExecType execType, Properties properties) throws ExecException;
{code} 
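
The one-instance-per-thread pattern described above can be sketched roughly as 
follows. This is only an illustration that runs without Pig on the classpath: 
ServerStandIn and ContextStandIn are hypothetical stand-ins for PigServer and 
UDFContext, not the real classes, and the actual patch may be structured 
differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a context object that is kept per thread
// (the role UDFContext plays in the comment above).
class ContextStandIn {
    private static final ThreadLocal<ContextStandIn> PER_THREAD =
            ThreadLocal.withInitial(ContextStandIn::new);

    // Each calling thread gets its own lazily created instance.
    static ContextStandIn get() {
        return PER_THREAD.get();
    }
}

// Hypothetical stand-in for PigServer: no static state, so separate
// instances do not interfere with each other.
class ServerStandIn {
    private final String name;

    ServerStandIn(String name) {
        this.name = name;
    }

    String describe() {
        return name;
    }
}

class PerThreadServerDemo {
    // Starts the given number of threads. Each thread creates its own server
    // and records its thread-local context. Returns the number of distinct
    // context instances that were observed.
    static int distinctContexts(int threads) {
        Set<ContextStandIn> contexts = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // One server per thread, never shared.
                ServerStandIn server = new ServerStandIn("server-" + id);
                server.describe();
                contexts.add(ContextStandIn.get());
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return contexts.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctContexts(4)); // one context per thread
    }
}
```

With real Pig classes, each worker thread would call one of the two 
constructors listed above instead of creating a ServerStandIn.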

> Make PigServer work in a multithreading environment
> ---
>
> Key: PIG-1874
> URL: https://issues.apache.org/jira/browse/PIG-1874
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1874.patch
>
>
> This means that PigServers should work if one creates separate PigServer 
> instances for each thread (PigServers are not synchronized). 



[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2011-03-09 Thread Alexander Lehmann (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004883#comment-13004883
 ] 

Alexander Lehmann commented on PIG-366:
---

So far it is not in the Pig Subversion repository; I assume adding it would be 
a first step for a new "owner".
I have tried to get it to compile with Pig 0.8.0, but that turns out to be 
rather complicated due to API changes (or maybe because I'm not an Eclipse 
expert ...).
I'll try to file a new issue if I get it working.



> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
>  Labels: gsoc, mentor
> Attachments: PigPen.tgz, org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen-0.7.4.tar.gz, 
> org.apache.pig.pigpen-0.7.5.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, org.apache.pig.pigpen_0.7.4.jar, 
> org.apache.pig.pigpen_0.7.5.jar, pigPen.patch, pigpen.patch
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.



[jira] Commented: (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2011-03-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004873#comment-13004873
 ] 

Alan Gates commented on PIG-1891:
-

It sounds like what you want is a way for the storage function to inject code 
into OutputCommitter.cleanupJob.  (See 
http://hadoop.apache.org/common/docs/r0.20.2/api/index.html for details.  This 
is a final task that Hadoop runs after all reduces have finished.)  

At this point since this is already offered by Hadoop's OutputFormat we have 
left these things there, rather than mimic the interface in Pig.  So the way to 
do this would be to have the OutputFormat you are using return an 
OutputCommitter that would do the commit (or whatever) in cleanupJob.  You do 
not have to write a whole new OutputFormat for this.  You can extend whatever 
OutputFormat you are using and the associated OutputCommitter it returns.  Your 
extended OutputFormat should return your OutputCommitter in getOutputCommitter. 
 Your OutputCommitter should only change cleanupJob, which should call 
super.cleanupJob and then do whatever you want to do.
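
The shape of that extension can be sketched as below. To keep the sketch 
runnable without Hadoop, the classes are hypothetical stand-ins 
(BaseFormatStandIn and BaseCommitterStandIn play the roles of the existing 
OutputFormat and its OutputCommitter); the real Hadoop methods take context 
arguments, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the OutputCommitter returned by the
// OutputFormat you already use.
class BaseCommitterStandIn {
    protected final List<String> log = new ArrayList<>();

    void cleanupJob() {
        log.add("base cleanup");
    }
}

// Extended committer: only cleanupJob changes. It calls the parent first
// and then performs the job-level action (for example, a final commit).
class UploadingCommitterStandIn extends BaseCommitterStandIn {
    @Override
    void cleanupJob() {
        super.cleanupJob();
        log.add("job-level commit");
    }
}

// Hypothetical stand-in for the existing OutputFormat.
class BaseFormatStandIn {
    BaseCommitterStandIn getOutputCommitter() {
        return new BaseCommitterStandIn();
    }
}

// Extended format: the only change is which committer it returns.
class UploadingFormatStandIn extends BaseFormatStandIn {
    @Override
    BaseCommitterStandIn getOutputCommitter() {
        return new UploadingCommitterStandIn();
    }
}

class CommitterDemo {
    // Simulates the framework asking the format for its committer and
    // running the final job cleanup step.
    static List<String> runJobCleanup(BaseFormatStandIn format) {
        BaseCommitterStandIn committer = format.getOutputCommitter();
        committer.cleanupJob();
        return committer.log;
    }

    public static void main(String[] args) {
        System.out.println(runJobCleanup(new UploadingFormatStandIn()));
    }
}
```

The storage function would then return the extended format in place of the 
original one, so nothing else in the script has to change.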


> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alex Rovner
>
> We are in the process of using Pig for various data processing and component 
> integration. Here is where we feel Pig storage funcs are lacking:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system:
> DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which, for each task, does the 
> following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes commit or rollback depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at 
> the job level. 
> If certain tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DBs once the Pig job is successful. 
> While the approach works, it's not really integrated into Pig.



Jenkins build is back to normal : Pig-trunk-commit #693

2011-03-09 Thread Apache Hudson Server
See 




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2011-03-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004872#comment-13004872
 ] 

Alan Gates commented on PIG-366:


By ownership here Olga didn't mean taking the code out of Pig.  Pig owns the 
code.  She meant someone who would be an expert in the area and work on it.  

Just adding a few patches for now would be a great start.  File issues on JIRA 
and attach your patches.  The Pig committers will work with you to get them in.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
>  Labels: gsoc, mentor
> Attachments: PigPen.tgz, org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen-0.7.4.tar.gz, 
> org.apache.pig.pigpen-0.7.5.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, org.apache.pig.pigpen_0.7.4.jar, 
> org.apache.pig.pigpen_0.7.5.jar, pigPen.patch, pigpen.patch
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.



[jira] Created: (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2011-03-09 Thread Alex Rovner (JIRA)
Enable StoreFunc to make intelligent decision based on job success or failure
-

 Key: PIG-1891
 URL: https://issues.apache.org/jira/browse/PIG-1891
 Project: Pig
  Issue Type: New Feature
Reporter: Alex Rovner


We are in the process of using Pig for various data processing and component 
integration. Here is where we feel Pig storage funcs are lacking:

They are not aware of whether the overall job has succeeded. This creates a 
problem for storage funcs which need to "upload" results into another system:

DB, FTP, another file system, etc.

I looked at the DBStorage in the piggybank 
(http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
 and what I see is essentially a mechanism which, for each task, does the 
following:

1. Creates a RecordWriter (in this case, opens a connection to the DB).
2. Opens a transaction.
3. Writes records into a batch.
4. Executes commit or rollback depending on whether the task was successful.

While this approach works great at the task level, it does not work at all at 
the job level. 

If certain tasks succeed but the overall job fails, partial records are going 
to get uploaded into the DB.

Any ideas on a workaround? 

Our current workaround is fairly ugly: we created a Java wrapper that launches 
Pig jobs and then uploads to the DBs once the Pig job is successful. While the 
approach works, it's not really integrated into Pig.





[jira] Commented: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004868#comment-13004868
 ] 

Thejas M Nair commented on PIG-1618:


Review of NewParser-24.patch:
In SchemaAliasVisitor.validate(), the check could be done in a single loop if 
each alias is stored in a HashSet and its presence is checked first. That will 
reduce the complexity.
I will make the change in a new patch.
NewParser-23.patch has a unit test failure; I will submit a new patch with the 
fix, which will include this change.
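
A minimal sketch of the single-loop check suggested above, assuming the 
aliases are available as a list of strings. AliasDuplicateCheck and 
findDuplicateAlias are hypothetical names used for illustration; this is not 
the actual SchemaAliasVisitor code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the duplicate-alias check; it only shows the
// HashSet technique, not the real visitor.
class AliasDuplicateCheck {

    // Returns the first alias that occurs a second time, or null if all
    // aliases are unique. Null entries (fields without an alias) are skipped.
    static String findDuplicateAlias(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            if (alias == null) {
                continue;
            }
            // Set.add returns false when the element is already present,
            // so one pass replaces a nested pairwise comparison.
            if (!seen.add(alias)) {
                return alias;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicateAlias(Arrays.asList("a", "b", "a")));
        System.out.println(findDuplicateAlias(Arrays.asList("a", "b", "c")));
    }
}
```

Compared with comparing every pair of aliases, this visits each alias once and 
relies on hash lookups for the membership test.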


> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.patch, NewParser-24.patch, NewParser-3.patch, NewParser-3.patch, 
> NewParser-4.patch, NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, 
> NewParser-8.patches, NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error 
> messages.  After review of JavaCC we feel these will be difficult to address 
> using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
> and maintain.  
> ANTLR is being reviewed as the most likely choice to move to, but other 
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.



[jira] Commented: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004869#comment-13004869
 ] 

Thejas M Nair commented on PIG-1618:


NewParser-24.patch committed to trunk.

> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.patch, NewParser-24.patch, NewParser-3.patch, NewParser-3.patch, 
> NewParser-4.patch, NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, 
> NewParser-8.patches, NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error 
> messages.  After review of JavaCC we feel these will be difficult to address 
> using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
> and maintain.  
> ANTLR is being reviewed as the most likely choice to move to, but other 
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.



Review Request: Fix piggybank unit test TestAllLoader, TestAvroStorage

2011-03-09 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/485/
---

Review request for pig, Jakob Homan and Richard Ding.


Summary
---

Two piggybank tests, TestAllLoader and TestAvroStorage, fail on trunk. We need 
to fix them.

TestAvroStorage is broken after PIG-1680. We now call LoadFunc.setLocation one 
more time. The original author needs to take a look.


This addresses bug PIG-1890.
https://issues.apache.org/jira/browse/PIG-1890


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 1079597 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestAllLoader.java
 1079597 

Diff: https://reviews.apache.org/r/485/diff


Testing
---

All piggybank tests pass. Since there is no change to the Pig core code, the 
Pig core tests are skipped.


Thanks,

Daniel



[jira] Resolved: (PIG-1188) Padding nulls to the input tuple according to input schema

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1188.
-

  Resolution: Fixed
Release Note: If a load statement specifies a schema, Pig will truncate the 
tuple or pad it with nulls to make sure the loaded data has exactly the number 
of fields specified in the load statement.
Hadoop Flags: [Reviewed]

Patch committed to trunk
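
The truncate-or-pad rule from the release note can be illustrated with a small 
stand-alone sketch. TuplePadding and fitToSchema are hypothetical names, and a 
tuple is modelled as a plain list rather than Pig's Tuple class, so this shows 
the rule only, not the committed implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that models a loaded row as a list of field values.
class TuplePadding {

    // Returns a copy of fields with exactly schemaSize entries: extra
    // trailing fields are dropped and missing fields are filled with null.
    static List<Object> fitToSchema(List<Object> fields, int schemaSize) {
        List<Object> result = new ArrayList<>(schemaSize);
        for (int i = 0; i < schemaSize; i++) {
            result.add(i < fields.size() ? fields.get(i) : null);
        }
        return result;
    }

    public static void main(String[] args) {
        // The three input rows from the issue description, with a schema
        // of two fields (a0, a1).
        System.out.println(fitToSchema(Arrays.<Object>asList(1, 2), 2));    // [1, 2]
        System.out.println(fitToSchema(Arrays.<Object>asList(1, 2, 3), 2)); // [1, 2]
        System.out.println(fitToSchema(Arrays.<Object>asList(1), 2));       // [1, null]
    }
}
```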

> Padding nulls to the input tuple according to input schema
> --
>
> Key: PIG-1188
> URL: https://issues.apache.org/jira/browse/PIG-1188
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1188-1.patch, PIG-1188-2.patch
>
>
> Currently, the number of fields in the input tuple is determined by the data. 
> When we have a schema, we should generate input data according to the schema, 
> padding with nulls if necessary. Here is one example:
> Pig script:
> {code}
> a = load '1.txt' as (a0, a1);
> dump a;
> {code}
> Input file:
> {code}
> 1   2
> 1   2   3
> 1
> {code}
> Current result:
> {code}
> (1,2)
> (1,2,3)
> (1)
> {code}
> Desired result:
> {code}
> (1,2)
> (1,2)
> (1, null)
> {code}



[jira] Updated: (PIG-1876) Typed map for Pig

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1876:


Release Note: 
Users can specify a typed map in place of an untyped map using the syntax:
map[type]

Untyped map still works as before.

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch, PIG-1876_3.patch
>
>
> Currently the Pig map type is untyped, which means a map value is always of 
> bytearray (i.e. unknown) type. In PIG-1277, we allowed an unknown type to be a 
> shuffle key, which somewhat relieves the problem. However, a typed map is 
> still beneficial in that:
> 1. Users can make semantic use of the map value type. Currently, users need to 
> explicitly cast map values, which is ugly.
> 2. Though PIG-1277 allows an unknown type to be a shuffle key, the performance 
> suffers. We don't have a raw comparator for the unknown type; instead, we 
> need to instantiate the value object and invoke its comparator.
> Here is the proposed syntax for a typed map:
> map[type]
> A typed map can be used anywhere an untyped map could occur. For example:
> a = load '1.txt' as(map[int]);
> b = foreach a generate (map[(i:int)])a0;  -- Map value is tuple
> b = stream a through `cat` as (m:map[{(i:int,j:chararray)}]);  -- Map value is bag
> A map lookup on a typed map will result in the datatype of the map value.
> a = load '1.txt' as(map[int]);
> b = foreach a generate $0#'key';
> Schema for b:
> b: {int}
> The behavior of untyped maps will remain the same.



[jira] Resolved: (PIG-1876) Typed map for Pig

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1876.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk.

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch, PIG-1876_3.patch
>
>


[jira] Commented: (PIG-1876) Typed map for Pig

2011-03-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004858#comment-13004858
 ] 

Daniel Dai commented on PIG-1876:
-

Review notes: https://reviews.apache.org/r/472/

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch, PIG-1876_3.patch
>
>


[jira] Updated: (PIG-1876) Typed map for Pig

2011-03-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1876:
--

Attachment: PIG-1876_3.patch

Added a few unit tests for macro.

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch, PIG-1876_3.patch
>
>


[jira] Commented: (PIG-1876) Typed map for Pig

2011-03-09 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004830#comment-13004830
 ] 

Richard Ding commented on PIG-1876:
---

+1

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch, PIG-1876_3.patch
>
>


[jira] Commented: (PIG-1188) Padding nulls to the input tuple according to input schema

2011-03-09 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004826#comment-13004826
 ] 

Richard Ding commented on PIG-1188:
---

+1

> Padding nulls to the input tuple according to input schema
> --
>
> Key: PIG-1188
> URL: https://issues.apache.org/jira/browse/PIG-1188
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1188-1.patch, PIG-1188-2.patch
>
>
> Currently, the number of fields in the input tuple is determined by the data. 
> When we have a schema, we should generate input data according to the schema, 
> padding with nulls if necessary. Here is one example:
> Pig script:
> {code}
> a = load '1.txt' as (a0, a1);
> dump a;
> {code}
> Input file:
> {code}
> 1   2
> 1   2   3
> 1
> {code}
> Current result:
> {code}
> (1,2)
> (1,2,3)
> (1)
> {code}
> Desired result:
> {code}
> (1,2)
> (1,2)
> (1, null)
> {code}
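
The desired behavior in the example above (extra fields dropped, missing fields filled with null) can be sketched in Python as follows. This is a minimal illustration under the assumption that the schema only fixes the number of fields; fit_to_schema is a hypothetical helper, not part of Pig or of the attached patches.

```python
def fit_to_schema(fields, schema_size):
    """Truncate extra fields and pad missing ones with None (standing in for null)."""
    fitted = list(fields[:schema_size])
    fitted.extend([None] * (schema_size - len(fitted)))
    return tuple(fitted)


# The three input rows from the example, loaded "as (a0, a1)", i.e. two fields.
rows = [("1", "2"), ("1", "2", "3"), ("1",)]
for row in rows:
    print(fit_to_schema(row, 2))
```

The second row loses its third field and the third row gains a null, which is the "Desired result" listed in the issue.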



[jira] Updated: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1618:
-

Attachment: NewParser-24.patch

Added SchemaAliasVisitor.
Improved new parser error messages and handling.

Unit tests and end-to-end tests passed.

test-patch run results (javacc warnings are caused by generated code):

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] -1 javac.  The applied patch generated 863 javac compiler warnings (more than the trunk's current 860 warnings).
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.patch, NewParser-24.patch, NewParser-3.patch, NewParser-3.patch, 
> NewParser-4.patch, NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, 
> NewParser-8.patches, NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>
> There are many bugs in Pig related to the parser, particularly to bad error 
> messages.  After review of JavaCC we feel these will be difficult to address 
> using that tool.  Also, the .jjt files used by JavaCC are hard to understand 
> and maintain.  
> ANTLR is being reviewed as the most likely choice to move to, but other 
> parsers will be reviewed as well.
> This JIRA will act as an umbrella issue for other parser issues.



[jira] Updated: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1618:
-

Attachment: (was: NewParser-23.patch)

> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.patch, NewParser-3.patch, NewParser-3.patch, NewParser-4.patch, 
> NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, NewParser-8.patches, 
> NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>


[jira] Created: (PIG-1890) Fix piggybank unit test TestAllLoader, TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)
Fix piggybank unit test TestAllLoader, TestAvroStorage
--

 Key: PIG-1890
 URL: https://issues.apache.org/jira/browse/PIG-1890
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.9.0
 Attachments: PIG-1890-1.patch

Two piggybank tests, TestAllLoader and TestAvroStorage, fail on trunk. We need to 
fix them.



[jira] Updated: (PIG-1890) Fix piggybank unit test TestAllLoader, TestAvroStorage

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1890:


Attachment: PIG-1890-1.patch

> Fix piggybank unit test TestAllLoader, TestAvroStorage
> --
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>


[jira] Updated: (PIG-1618) Switch to new parser generator technology

2011-03-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1618:
-

Attachment: NewParser-23.patch

> Switch to new parser generator technology
> -
>
> Key: PIG-1618
> URL: https://issues.apache.org/jira/browse/PIG-1618
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Alan Gates
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: NewParser-1.patch, NewParser-10.patch, 
> NewParser-11.patch, NewParser-12.patch, NewParser-13.2.patch, 
> NewParser-13.patch, NewParser-14.patch, NewParser-15.patch, 
> NewParser-18.patch, NewParser-19.3.patch, NewParser-19.patch, 
> NewParser-2.patch, NewParser-21.patch, NewParser-22.patch, 
> NewParser-23.patch, NewParser-23.patch, NewParser-3.patch, NewParser-3.patch, 
> NewParser-4.patch, NewParser-5.patch, NewParser-6.patch, NewParser-7.patch, 
> NewParser-8.patches, NewParser-9.patch, antlr-3.2.jar, javadoc.patch
>
>


[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2011-03-09 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004808#comment-13004808
 ] 

Alex Rovner commented on PIG-366:
-

What would ownership entail? I would be glad to take ownership and submit a few 
fixes etc. What would be a good place to host this project? I couldn't find a 
way to email the owners to find out how to submit patches etc.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Robert Gibbon
>Priority: Minor
>  Labels: gsoc, mentor
> Attachments: PigPen.tgz, org.apache.pig.pigpen-0.7.0.tar.gz, 
> org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen-0.7.4.tar.gz, 
> org.apache.pig.pigpen-0.7.5.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> org.apache.pig.pigpen_0.7.2.jar, org.apache.pig.pigpen_0.7.4.jar, 
> org.apache.pig.pigpen_0.7.5.jar, pigPen.patch, pigpen.patch
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.



Review Request: Padding nulls to the input tuple according to input schema

2011-03-09 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/484/
---

Review request for pig and Richard Ding.


Summary
---

Currently, the number of fields in the input tuple is determined by the data. 
When we have schema, we should generate input data according to the schema, and 
padding nulls if necessary.


This addresses bug PIG-1188.
https://issues.apache.org/jira/browse/PIG-1188


Diffs
-

  
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/TypeCastInserter.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMergeForEachOptimization.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMultiQueryCompiler.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanFilterAboveForeach.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanFilterRule.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanLogicalOptimizer.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanPushDownForeachFlatten.java 1079597
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPartitionFilterPushDown.java 1079597

Diff: https://reviews.apache.org/r/484/diff


Testing
---

test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 24 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit test:
all pass

End to end test:
all pass


Thanks,

Daniel



[jira] Updated: (PIG-1780) Update the Documentation in PIG site

2011-03-09 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1780:
-

Attachment: pig-1780.patch

Patch

Apply to Branch-8 only.



> Update the Documentation in PIG site
> 
>
> Key: PIG-1780
> URL: https://issues.apache.org/jira/browse/PIG-1780
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.8.0
>Reporter: Charles Ferreira Gonçalves
>Assignee: Corinne Chandel
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: pig-1780.patch
>
>
> The Tutorial, Setup, and other documentation on the Pig site has instructions 
> that do not work and are outdated.
> It is desirable that this documentation introduce new users quickly by 
> working out of the box.



[jira] Updated: (PIG-1876) Typed map for Pig

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1876:


Attachment: PIG-1876-2.patch

PIG-1876-2.patch resync with trunk.

> Typed map for Pig
> -
>
> Key: PIG-1876
> URL: https://issues.apache.org/jira/browse/PIG-1876
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1876-1.patch, PIG-1876-2.patch
>
>


[jira] Updated: (PIG-1188) Padding nulls to the input tuple according to input schema

2011-03-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1188:


Attachment: PIG-1188-2.patch

PIG-1188-2.patch fixes unit test failures.

> Padding nulls to the input tuple according to input schema
> --
>
> Key: PIG-1188
> URL: https://issues.apache.org/jira/browse/PIG-1188
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1188-1.patch, PIG-1188-2.patch
>
>


Re: PIG-671

2011-03-09 Thread Scott Carey


On 3/7/11 9:55 PM, "deepak kumar v"  wrote:

>Hi Pig Developers,
>This is my first dive into open source contribution and I hope to dive
>deep.
>
>I was going through https://issues.apache.org/jira/browse/PIG-671 and
>observed the following with COUNT.java
>
>COUNT.exec() always retrieves the first item from the input tuple, which it
>assumes is a bag, and counts the number of items in the bag.
>Even if we pass multiple arguments to COUNT(), it will always pick the
>first argument.
>
>There are a few ways we could go about this:
>a) Leave it as is, because it returns the correct result for counting the
>number of items in the first argument.
>OR
>b) Check the size of the input tuple in COUNT.exec() and, if it is
>not 1, throw ExecException() or IllegalArgumentException {might be
>correct}, which will cause the Map job to fail.

What about:

c) Count the number of non-null tuples in the bag (same as COUNT_STAR as
long as null tuples are not inserted somehow).  This is what users seem to
expect; I've seen several bugs due to users doing COUNT(FOO) and not
expecting it to be equivalent to COUNT(FOO.$0).
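
To make the difference between these options concrete, here is a rough Python sketch. It is not COUNT.java: bags are modeled as lists of tuples, None stands in for a Pig null, and all three function names are hypothetical.

```python
def count_first_field_non_null(bag):
    """Models the COUNT(FOO.$0)-like behavior: skip tuples whose first field is null."""
    return sum(1 for t in bag if t is not None and len(t) > 0 and t[0] is not None)


def count_non_null_tuples(bag):
    """Option (c): count every non-null tuple, whatever its fields contain."""
    return sum(1 for t in bag if t is not None)


def count_strict(args):
    """Option (b): reject any call that does not pass exactly one argument."""
    if len(args) != 1:
        raise ValueError("COUNT expects exactly one argument, got %d" % len(args))
    return count_non_null_tuples(args[0])


foo = [(1, "a"), (None, "b"), None, (3, "c")]
print(count_first_field_non_null(foo))  # 2
print(count_non_null_tuples(foo))       # 3
```

The two counts diverge as soon as a tuple has a null first field, which is the surprise described in (c).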

>
>Let me know how we should go about it.
>
>
>Regards,
>Deepak