[jira] Created: (PIG-1301) Problem pruning columns with UDF
Problem pruning columns with UDF
Key: PIG-1301 URL: https://issues.apache.org/jira/browse/PIG-1301 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Andrew Groh

I just upgraded to pig 0.6.0. I have a pig file like:

{code}
raw = load 'foo.csv' using PigStorage() as (field1:chararray, field2:chararray);
define contains com.mycompany.pig.Contains();
rawactions = foreach raw generate contains(field1, field2) as junk, field1;
reqcnt = foreach rawactions generate field1;
dump reqcnt;
{code}

When I try to run this code, I get an error:

{code}
Problem with input: (Name: Project 1-40 Projections: [1] Overloaded: false Operator Key: 1-40) of User-defined function: (Name: UserFunc 1-39 function: com.mycompany.pig.Contains Operator Key: 1-39)
Thrown from line 98 of LOUserFunction.java
This was caused by another FrontEndException
Attempt to access field: 1 from schema: {field1: chararray} from Schema.java
{code}

I also investigated changing the pig code. If you change

{code}
rawactions = foreach raw generate contains(field1, field2) as junk, field1;
{code}

to

{code}
rawactions = foreach raw generate contains(field2, field2) as junk, field1;
{code}

or if you change

{code}
reqcnt = foreach rawactions generate field1;
{code}

to

{code}
reqcnt = foreach rawactions generate field1, junk;
{code}

it all works correctly. The problem appears to be that the optimizer prunes out field2 but then gets confused and does not prune the plan associated with the UDF contains, since field1 is not pruned. So if the UDF references only field2, its plan gets removed; if it references only field1, the field has not been pruned and the script can run. The failure shows up when the UDF references both a pruned and an unpruned column.
I eventually tracked this down to the code around line 947 of LOForEach.java:

{code}
for (LOProject loProject : projectFinder.getProjectSet()) {
    Pair<Integer, Integer> pair = new Pair<Integer, Integer>(0, loProject.getCol());
    if (!columns.contains(pair)) {
        allPruned = false;
        break;
    }
}
if (allPruned) {
    planToRemove.add(i);
}
{code}

In the example pig, allPruned is false for the plan associated with the UDF. This is because field1 is both a column for the UDF and for the ForEach in general. Since field1 is not pruned, the plan is not removed and bad things happen later. I don't really understand the pruning code all that well, so I don't have a fix for it. I hope that it will be clear to someone who understands this code better. I can provide a better test case for this if necessary.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
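The all-columns-pruned check described above can be contrasted with an any-column-pruned variant in a standalone sketch (hypothetical code, not the actual LOForEach internals; the column sets are made up to mirror the example script, where the UDF's inner plan projects columns 0 and 1 and only column 1 is pruned):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the real LOForEach code: the UDF's inner plan
// projects columns 0 (field1) and 1 (field2); only column 1 is pruned.
public class PruneCheckSketch {

    // 0.6.0 behavior: remove the plan only if EVERY projected column is pruned
    static boolean allPruned(Set<Integer> projected, Set<Integer> pruned) {
        for (int col : projected) {
            if (!pruned.contains(col)) {
                return false;
            }
        }
        return true;
    }

    // alternative: remove the plan if ANY projected column is pruned
    static boolean anyPruned(Set<Integer> projected, Set<Integer> pruned) {
        for (int col : projected) {
            if (pruned.contains(col)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> udfColumns = new HashSet<>(Arrays.asList(0, 1));
        Set<Integer> prunedColumns = Collections.singleton(1);
        // false: the plan is kept even though its field2 input is gone
        System.out.println(allPruned(udfColumns, prunedColumns));
        // true: the plan is removed, avoiding the dangling field2 reference
        System.out.println(anyPruned(udfColumns, prunedColumns));
    }
}
```

This is exactly the situation in the report: field1 keeps allPruned false, so the broken plan survives.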
[jira] Updated: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1292: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked-in with changes suggested in previous comment. Core test failure reported by hudson was transient. It passed on my machine. Interface Refinements - Key: PIG-1292 URL: https://issues.apache.org/jira/browse/PIG-1292 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1292.patch, pig-interfaces.patch A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both are abstract classes instead of being interfaces. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846027#action_12846027 ] Benjamin Reed commented on PIG-1257:
-
excellent work pradeep. just one minor thing: you always append a \n before inputData in your test case, so you never test the case when you end with just \r

PigStorage per the new load-store redesign should support splitting of bzip files
Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2

PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Attachment: PIG-1258.patch

[zebra] Number of sorted input splits is unusually high
Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch

Number of sorted input splits is unusually high if the projections are on multiple column groups, or a union of tables, or column group(s) that hold many small tfiles. In one test, the number is about 100 times bigger than that from unsorted input splits on the same input tables.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846038#action_12846038 ] Pradeep Kamath commented on PIG-1257:
-
In the following case in inputData the record will end with \r won't it? (notice the \r in the middle after 2)
{code}
1\t2\r3\t4, // '\r' case - this will be split into two tuples
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
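The splitting behavior under discussion can be shown with a toy stand-in (this is not Hadoop's LineReader, only a mimic of the rule that '\n', '\r', and "\r\n" each terminate a record):

```java
// Toy stand-in for the record-splitting rule under discussion: '\n',
// '\r', and "\r\n" each terminate a record, so "1\t2\r3\t4" holds two
// records even though it contains no '\n'. Not Hadoop's actual LineReader.
public class CrSplitDemo {

    static String[] splitRecords(String data) {
        // order matters: match "\r\n" before the single-character endings
        return data.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        String[] records = splitRecords("1\t2\r3\t4");
        System.out.println(records.length); // 2
    }
}
```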
Branching for Pig 0.7.0 release
Hi, It has been a few weeks since we merged the Load-Store redesign changes into the trunk. We have been doing a lot of testing and fixing bugs. I think it is time to branch the code in preparation for the Pig 0.7.0 release. Unless I hear objections, I will do this next Monday, 3/22. Olga
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846080#action_12846080 ] Pradeep Kamath commented on PIG-1257:
-
I ran all unit tests on my local machines and also the test-patch ant target:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 12 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846083#action_12846083 ] Benjamin Reed commented on PIG-1257:
-
+1 you are right. thanx pradeep. i think it is ready to commit.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1301) Problem pruning columns with UDF
[ https://issues.apache.org/jira/browse/PIG-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-1301. - Resolution: Fixed Fix Version/s: 0.7.0

Problem pruning columns with UDF
Key: PIG-1301 URL: https://issues.apache.org/jira/browse/PIG-1301 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Andrew Groh Fix For: 0.7.0
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1301) Problem pruning columns with UDF
[ https://issues.apache.org/jira/browse/PIG-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846087#action_12846087 ] Daniel Dai commented on PIG-1301:
-
Thanks for reporting. I tried the script on trunk and it seems we have already fixed this there. The code you mentioned does have the problem you describe, but on trunk we have already changed it to:

{code}
boolean anyPruned = false;
for (LOProject loProject : projectFinder.getProjectSet()) {
    Pair<Integer, Integer> pair = new Pair<Integer, Integer>(0, loProject.getCol());
    if (columns.contains(pair)) {
        anyPruned = true;
        break;
    }
}
{code}

The fix will come with the next Pig release.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1289) PIG Join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1289: Fix Version/s: 0.7.0

PIG Join fails while doing a filter on joined data
Key: PIG-1289 URL: https://issues.apache.org/jira/browse/PIG-1289 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Karim Saadah Assignee: Daniel Dai Priority: Minor Fix For: 0.7.0 Attachments: PIG-1289-1.patch

PIG Join fails while doing a filter on joined data. Here are the steps to reproduce it:

{code}
-bash-3.1$ pig -latest -x local
grunt> a = load 'first.dat' using PigStorage('\u0001') as (f1:int, f2:chararray);
grunt> DUMP a;
(1,A)
(2,B)
(3,C)
(4,D)
grunt> b = load 'second.dat' using PigStorage() as (f3:chararray);
grunt> DUMP b;
(A)
(D)
(E)
grunt> c = join a by f2 LEFT OUTER, b by f3;
grunt> DUMP c;
(1,A,A)
(2,B,)
(3,C,)
(4,D,D)
grunt> describe c;
c: {a::f1: int,a::f2: chararray,b::f3: chararray}
grunt> d = filter c by (f3 is null or f3 == '');
grunt> dump d;
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for b
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for b
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for a
2010-03-03 15:00:37,130 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for a
2010-03-03 15:00:37,130 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias d
{code}

This one is failing too:

{code}
grunt> d = filter c by (b::f3 is null or b::f3 == '');
{code}

And this one does not return results as expected:

{code}
grunt> d = foreach c generate f1 as f1, f2 as f2, f3 as f3;
grunt> e = filter d by (f3 is null or f3 == '');
grunt> DUMP e;
(1,A,)
(2,B,)
(3,C,)
(4,D,)
{code}

while the expected result is

{code}
(2,B,)
(3,C,)
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846111#action_12846111 ] Alok Singh commented on PIG-1284:
-
Hi. As mentioned earlier, I have run the test locally and it is passing. The timeout issue is not related to this. Can a moderator review my patch and commit it? Thanks, Alok

pig UDF is lacking XMLLoader. Plan to add the XMLLoader
Key: PIG-1284 URL: https://issues.apache.org/jira/browse/PIG-1284 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Alok Singh Fix For: 0.7.0 Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch Original Estimate: 168h Remaining Estimate: 168h

Hi All, We are planning to add the XMLLoader UDF in the piggybank repository. Here is the proposal with the user docs:

The load function to load the XML file. This implements the LoadFunc interface, which is used to parse records from a dataset. It takes an xmlTag as the argument, which it uses to split the input dataset into multiple records. For example, if the input xml (input.xml) is like this:

{code}
<configuration>
  <property>
    <name>foobar</name>
    <value>barfoo</value>
  </property>
  <ignoreProperty>
    <name>foo</name>
  </ignoreProperty>
  <property>
    <name>justname</name>
  </property>
</configuration>
{code}

And your pig script is like this:

{code}
--load the jar files
register loader.jar;
-- load the dataset using XMLLoader
-- A is the bag containing the tuple which contains one atom i.e doc, see output
A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
--dump the result
dump A;
{code}

Then you will get the output:

{code}
(<property><name>foobar</name><value>barfoo</value></property>)
(<property><name>justname</name></property>)
{code}

Where each () indicates one record.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
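The record-splitting idea in the proposal can be sketched outside Pig (hypothetical illustration only - the actual XMLLoader in the patch parses the input stream, not an in-memory string via regex):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the idea behind XMLLoader: every
// <tag>...</tag> span becomes one record. The real patch works on the
// input stream; this regex version only demonstrates the split.
public class XmlTagSplitter {

    static List<String> split(String xml, String tag) {
        List<String> records = new ArrayList<>();
        Pattern p = Pattern.compile("<" + tag + ">.*?</" + tag + ">", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>foobar</name><value>barfoo</value></property>"
                + "<ignoreProperty><name>foo</name></ignoreProperty>"
                + "<property><name>justname</name></property>"
                + "</configuration>";
        // only the two <property> spans are emitted; <ignoreProperty> is skipped
        for (String record : split(xml, "property")) {
            System.out.println("(" + record + ")");
        }
    }
}
```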
[jira] Created: (PIG-1302) Include zebra's
Include zebra's Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1302: Description: There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Summary: Include zebra's pigtest ant target as a part of pig's ant test target (was: Include zebra's ) Include zebra's pigtest ant target as a part of pig's ant test target --- Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0 There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846130#action_12846130 ] Alan Gates commented on PIG-1284:
-
I'll take a look at the patch.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846179#action_12846179 ] Alan Gates commented on PIG-1302:
-
-1. Pig must build independent of its contrib projects. I'm fine with changing the hudson process to run some of Zebra's tests as well. But ant test at the Pig level should not invoke Zebra.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846184#action_12846184 ] Olga Natkovich commented on PIG-1302:
-
That's the approach we initially favored, but according to Giri this is not the way hadoop is doing this, and we wanted to be consistent with them.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846194#action_12846194 ] Alan Gates commented on PIG-1302:
-
I still maintain my -1. It just seems wrong for main projects to depend on their contribs. 99% of Pig users (counting by organization, not by individual users) don't care about Zebra. Making them test Zebra in addition to Pig is not helpful for them. Perhaps we could add a test-stack target or something that tests Pig plus its contrib projects and have hudson call that.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
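A test-stack target along the lines Alan suggests might look like the following hypothetical build.xml fragment (the target name and contrib path are assumptions, not anything committed):

```xml
<!-- Hypothetical sketch only: runs Pig's own tests first, then delegates
     to zebra's pigtest target, so plain "ant test" stays contrib-free. -->
<target name="test-stack" depends="test"
        description="Run Pig tests plus contrib project tests">
  <ant dir="${basedir}/contrib/zebra" target="pigtest" inheritAll="false"/>
</target>
```

Hudson could then call test-stack while developers keep using test.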
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846197#action_12846197 ] Olga Natkovich commented on PIG-1302:
-
The idea is that we don't want to commit things that break contrib projects, and that's why integrating it into test rather than another target makes sense. I am fine re-visiting this issue with Giri and just adding it to the test-patch process, though it seems that the end result is exactly the same - you can't commit patches that break contrib projects.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Attachment: PIG-1287-2.patch The new patch also fixes warning aggregation in PigHadoopLogger to use the counter support now available in hadoop 0.20.2 Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
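The patch itself is not shown in the thread; purely as a hypothetical illustration of the warning-aggregation idea (tally each warning kind instead of logging every occurrence - the real PigHadoopLogger reports through Hadoop 0.20.2's counter API, not a local map, and the warning names below are invented):

```java
import java.util.EnumMap;

// Hypothetical illustration of warning aggregation: count warnings by
// kind rather than emitting each one. The real PigHadoopLogger feeds
// such tallies into Hadoop counters; this local map is only a stand-in.
public class WarningCounters {

    enum Warn { DIVIDE_BY_ZERO, FIELD_DISCARDED } // invented kinds

    private final EnumMap<Warn, Long> counts = new EnumMap<>(Warn.class);

    void warn(Warn kind) {
        counts.merge(kind, 1L, Long::sum);
    }

    long get(Warn kind) {
        return counts.getOrDefault(kind, 0L);
    }

    public static void main(String[] args) {
        WarningCounters logger = new WarningCounters();
        logger.warn(Warn.DIVIDE_BY_ZERO);
        logger.warn(Warn.DIVIDE_BY_ZERO);
        logger.warn(Warn.FIELD_DISCARDED);
        System.out.println(logger.get(Warn.DIVIDE_BY_ZERO)); // 2
    }
}
```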
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Status: Patch Available (was: Open) Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1289) PIG Join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846222#action_12846222 ] Daniel Dai commented on PIG-1289:
-
Yes, it is safe not to push a filter up a branch that will be producing nulls. I might be wrong, but what I did is try to be a little bit more aggressive. Since the only extra value an outer join will produce is null, if the filter is not testing for null we can still push it up even if it is on the inner branch. Eg:

{code}
A = load 'foo' as (q, r, s);
B = load 'bar' as (t, u, v);
C = join A on q outer, B on t;
D = filter C by t > 0;
{code}

The production C consists of two parts: A + B, and A + null. If we do a filter after the join, it is a union on these two parts: filter(A + B) union filter(A + null). If we are not testing nullability (eg, t > 0), then filter(A + null) will not have any production, so filter(A + B) union filter(A + null) = filter(A + B). In this case, the outer join is equivalent to a regular join (since all generated null B records are filtered away), so we can still push the filter up.
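The argument above can be checked with a toy join (illustration only, not Pig code; single-column relations stand in for A and B, and the predicate t > 0 rejects nulls):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the reasoning above: filtering a left outer join with a
// null-rejecting predicate keeps exactly the rows an inner join would
// keep, so the filter can be pushed past the outer join.
public class OuterJoinFilterSketch {

    // left outer join of a's keys against b: each row is (q, t-or-null)
    static List<Integer[]> leftOuterJoin(List<Integer> a, Set<Integer> b) {
        List<Integer[]> rows = new ArrayList<>();
        for (int q : a) {
            rows.add(new Integer[] { q, b.contains(q) ? q : null });
        }
        return rows;
    }

    // "filter C by t > 0": rejects the A + null part entirely
    static List<Integer[]> filterPositiveT(List<Integer[]> rows) {
        List<Integer[]> kept = new ArrayList<>();
        for (Integer[] row : rows) {
            if (row[1] != null && row[1] > 0) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3);
        Set<Integer> b = new HashSet<>(Arrays.asList(2, 3));
        // only the matched rows (the A + B part) survive the filter
        System.out.println(filterPositiveT(leftOuterJoin(a, b)).size()); // 2
    }
}
```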
PIG Join fails while doing a filter on joined data
--------------------------------------------------
                Key: PIG-1289
                URL: https://issues.apache.org/jira/browse/PIG-1289
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.6.0
           Reporter: Karim Saadah
           Assignee: Daniel Dai
           Priority: Minor
            Fix For: 0.7.0
        Attachments: PIG-1289-1.patch

PIG Join fails while doing a filter on joined data. Here are the steps to reproduce it:

-bash-3.1$ pig -latest -x local
grunt> a = load 'first.dat' using PigStorage('\u0001') as (f1:int, f2:chararray);
grunt> DUMP a;
(1,A)
(2,B)
(3,C)
(4,D)
grunt> b = load 'second.dat' using PigStorage() as (f3:chararray);
grunt> DUMP b;
(A)
(D)
(E)
grunt> c = join a by f2 LEFT OUTER, b by f3;
grunt> DUMP c;
(1,A,A)
(2,B,)
(3,C,)
(4,D,D)
grunt> describe c;
c: {a::f1: int,a::f2: chararray,b::f3: chararray}
grunt> d = filter c by (f3 is null or f3 =='');
grunt> dump d;
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for b
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for b
2010-03-03 15:00:37,129 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for a
2010-03-03 15:00:37,130 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for a
2010-03-03 15:00:37,130 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias d

This one is failing too:

grunt> d = filter c by (b::f3 is null or b::f3 =='');

or this one not returning results as expected:

grunt> d = foreach c generate f1 as f1, f2 as f2, f3 as f3;
grunt> e = filter d by (f3 is null or f3 =='');
grunt> DUMP e;
(1,A,)
(2,B,)
(3,C,)
(4,D,)

while the expected result is

(2,B,)
(3,C,)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
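Daniel's equivalence argument above can be checked outside Pig: when the predicate rejects nulls, filtering after a left outer join yields the same records as filtering after an inner join. A small Python sketch with made-up data (an illustration only, not Pig's implementation):

```python
def left_outer_join(left, right, key_l, key_r):
    """Left outer join of two lists of dicts; unmatched left rows get a None field."""
    out = []
    for l in left:
        matches = [r for r in right if r[key_r] == l[key_l]]
        if matches:
            out.extend({**l, **r} for r in matches)   # inner part: A + B
        else:
            out.append({**l, key_r: None})            # outer part: A + null
    return out

def inner_join(left, right, key_l, key_r):
    return [{**l, **r} for l in left for r in right if r[key_r] == l[key_l]]

A = [{"q": 1}, {"q": 2}]
B = [{"t": 1}, {"t": 3}]

# Null-rejecting predicate: evaluates False (not True) when t is null/None,
# so the "A + null" part contributes nothing after filtering.
pred = lambda row: row["t"] is not None and row["t"] > 0

outer_then_filter = [r for r in left_outer_join(A, B, "q", "t") if pred(r)]
inner_then_filter = [r for r in inner_join(A, B, "q", "t") if pred(r)]
assert outer_then_filter == inner_then_filter
```

This is exactly why the pushdown is safe only for null-rejecting predicates; a predicate like `t is null` would keep the outer part and break the equivalence.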
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846224#action_12846224 ]

Pradeep Kamath commented on PIG-1205:
-------------------------------------

Jeff, if the only issue blocking the commit is the javac warning: unless the warning is due to use of a deprecated hadoop API, we should fix it; if it is due to a deprecated hadoop API, then it's ok to ignore. Very soon trunk will be branched for Pig 0.7.0, so if this feature is to be in Pig 0.7.0, we should get it committed soon.

Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
                Key: PIG-1205
                URL: https://issues.apache.org/jira/browse/PIG-1205
            Project: Pig
         Issue Type: Sub-task
   Affects Versions: 0.7.0
           Reporter: Jeff Zhang
           Assignee: Jeff Zhang
            Fix For: 0.7.0
        Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1293) pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
[ https://issues.apache.org/jira/browse/PIG-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846226#action_12846226 ]

Alan Gates commented on PIG-1293:
---------------------------------

Allen, I'm having trouble reproducing this issue, so I'm not sure how to test your fix. If I take top of trunk and install it, then do:

{code}
gates> echo $PIG_HOME
gates> PATH=/usr/bin:/usr/local/bin:/bin:./bin which pig
/home/gates/tmp/pig-0.7.0-dev/bin/pig
gates> PATH=/usr/bin:/usr/local/bin:/bin:./bin pig -x local ~/pig/scripts/Checkin_2.local.pig
10/03/16 17:09:24 INFO pig.Main: Logging error messages to: /home/gates/tmp/pig-0.7.0-dev/pig_1268784564902.log
2010-03-16 17:09:25,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2010-03-16 17:09:26,047 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
...
{code}

What am I doing wrong here?

pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
                Key: PIG-1293
                URL: https://issues.apache.org/jira/browse/PIG-1293
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.6.0
           Reporter: Allen Wittenauer
        Attachments: PIG-1293.txt

If PIG_HOME isn't set and pig is in the path, the pig wrapper script can't find its home. Setting PIG_HOME makes it hard to support multiple versions of pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1289) PIG Join fails while doing a filter on joined data
[ https://issues.apache.org/jira/browse/PIG-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846228#action_12846228 ]

Alan Gates commented on PIG-1289:
---------------------------------

In the case of D = filter C by t > 0, the filter will evaluate to null when t is null. By definition, filters return only records that evaluate to true. So t > 0 will have the effect of filtering out all outer records of A, because t will be null for every one of them. That is, it turns the join into an inner join. However, if the filter is pushed above the join, it will remain an outer join, since it will only filter the records from B where t > 0 and not the outer records from A. Thus this transformation is not output neutral.

PIG Join fails while doing a filter on joined data
                Key: PIG-1289
                URL: https://issues.apache.org/jira/browse/PIG-1289
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.6.0
           Reporter: Karim Saadah
           Assignee: Daniel Dai
           Priority: Minor
            Fix For: 0.7.0
        Attachments: PIG-1289-1.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846229#action_12846229 ]

Olga Natkovich commented on PIG-1150:
-------------------------------------

We would like to cut the release branch next Monday. This means that the code needs to be committed by the end of the week. Is this likely to happen? If not, I would like to unlink this from the 0.7.0 release and leave it for inclusion in one of the future releases when the patch is ready.

VAR() Variance UDF
                Key: PIG-1150
                URL: https://issues.apache.org/jira/browse/PIG-1150
            Project: Pig
         Issue Type: New Feature
   Affects Versions: 0.5.0
        Environment: UDF, written in Pig 0.5 contrib/
           Reporter: Russell Jurney
            Fix For: 0.7.0
        Attachments: var.patch

I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
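The parallel scheme the UDF description references (combine per-partition count, sum, and sum of squares, per the linked Wikipedia algorithm) can be sketched in Python. This illustrates the algorithm only, not the actual var.patch code:

```python
def partial(xs):
    # Per-partition statistics computed map-side: (count, sum, sum of squares)
    return (len(xs), sum(xs), sum(x * x for x in xs))

def combine(p1, p2):
    # Partials combine by component-wise addition; this associativity is
    # what lets the UDF implement the Algebraic interface.
    return (p1[0] + p2[0], p1[1] + p2[1], p1[2] + p2[2])

def variance(p):
    # Finalize: population variance = E[X^2] - E[X]^2
    n, s, ss = p
    return ss / n - (s / n) ** 2

data = [1.0, 2.0, 3.0, 4.0]
# Split across two "mappers", combine, then finalize
v = variance(combine(partial(data[:2]), partial(data[2:])))
assert abs(v - variance(partial(data))) < 1e-9  # same as single-pass result
```

As the report notes, taking the square root of this value yields the standard deviation. (Numerically, the E[X^2] - E[X]^2 form can lose precision for large means; the Wikipedia page also gives a more stable pairwise formulation.)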
[jira] Updated: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1284:
--------------------------------

Since we are planning to branch for release next Monday, 3/22, it needs to be ready to be committed by the end of the week. Otherwise, we should schedule it for the next release. Please update the target version accordingly.

pig UDF is lacking XMLLoader. Plan to add the XMLLoader
                Key: PIG-1284
                URL: https://issues.apache.org/jira/browse/PIG-1284
            Project: Pig
         Issue Type: New Feature
   Affects Versions: 0.7.0
           Reporter: Alok Singh
            Fix For: 0.7.0
        Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch
  Original Estimate: 168h
 Remaining Estimate: 168h

Hi All,
We are planning to add the XMLLoader UDF in the piggybank repository. Here is the proposal with the user docs:

The load function to load the XML file. This implements the LoadFunc interface, which is used to parse records from a dataset. It takes an xmlTag as the argument, which it uses to split the input dataset into multiple records. For example, if the input xml (input.xml) is like this:

<configuration>
  <property>
    <name>foobar</name>
    <value>barfoo</value>
  </property>
  <ignoreProperty>
    <name>foo</name>
  </ignoreProperty>
  <property>
    <name>justname</name>
  </property>
</configuration>

And your pig script is like this:

--load the jar files
register loader.jar;
-- load the dataset using XMLLoader
-- A is the bag containing the tuple which contains one atom i.e. doc (see output)
A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
--dump the result
dump A;

Then you will get the output

(<property><name>foobar</name><value>barfoo</value></property>)
(<property><name>justname</name></property>)

where each () indicates one record.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
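The record-splitting behavior the proposal describes (cut the input at a given tag, emit each tagged span as one record) can be illustrated with a short Python sketch. This mimics the documented output on an in-memory string; the actual piggybank XMLLoader operates on input splits and is not this code:

```python
import re

def split_by_tag(text, tag):
    # Return every <tag>...</tag> span as one record, non-greedy so that
    # consecutive records are not merged into a single match.
    pattern = r"<%s>.*?</%s>" % (re.escape(tag), re.escape(tag))
    return re.findall(pattern, text, re.DOTALL)

doc = ("<configuration><property><name>foobar</name><value>barfoo</value></property>"
       "<ignoreProperty><name>foo</name></ignoreProperty>"
       "<property><name>justname</name></property></configuration>")

records = split_by_tag(doc, "property")
# Two records, matching the output shown in the proposal; <ignoreProperty>
# is skipped because its tag name does not match exactly.
```

Note that tag matching is exact and case-sensitive here, which reproduces why the `ignoreProperty` element is not emitted as a record in the example.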
[jira] Commented: (PIG-1293) pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
[ https://issues.apache.org/jira/browse/PIG-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846236#action_12846236 ]

Allen Wittenauer commented on PIG-1293:
---------------------------------------

You likely have PIG_HOME configured. Unset it, then try running bash -x pig and the message about being unable to find pig-env.sh won't be hidden by bash. BTW, the hadoop equiv jira is HADOOP-6630, as it suffers from the same problem.

pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
                Key: PIG-1293
                URL: https://issues.apache.org/jira/browse/PIG-1293
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.6.0
           Reporter: Allen Wittenauer
        Attachments: PIG-1293.txt

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Branching for Pig 0.7.0
Hi, If you have an issue assigned to you for Pig 0.7.0 release, please, make sure that it can be committed by the end of the week since we are aiming to branch for the release by next Monday, 3/22. If you don't think the issue can be addressed by then but feel strongly that it needs to be in Pig 0.7.0, please, respond with your reasoning. Otherwise, please, unlink from the release any issues that will not meet the deadline. Thanks, Olga
[jira] Commented: (PIG-1293) pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
[ https://issues.apache.org/jira/browse/PIG-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846243#action_12846243 ]

Allen Wittenauer commented on PIG-1293:
---------------------------------------

Err, not PIG_HOME, PIG_CONF_DIR.

pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
                Key: PIG-1293
                URL: https://issues.apache.org/jira/browse/PIG-1293
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.6.0
           Reporter: Allen Wittenauer
        Attachments: PIG-1293.txt

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846273#action_12846273 ]

Dmitriy V. Ryaboy commented on PIG-1205:
----------------------------------------

You can suppress the unchecked warning with @SuppressWarnings("unchecked"), and comment why it's ok to suppress the warning.

I've been playing with using HBase through pig using the 0.6 loader, and I must say, it's very far from being ready for prime time. I don't know whether we need to exert too much effort to get this in under the wire when it won't really be usable anyway until much further love is applied.

-D

Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
                Key: PIG-1205
                URL: https://issues.apache.org/jira/browse/PIG-1205
            Project: Pig
         Issue Type: Sub-task
   Affects Versions: 0.7.0
           Reporter: Jeff Zhang
           Assignee: Jeff Zhang
            Fix For: 0.7.0
        Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846287#action_12846287 ]

Hadoop QA commented on PIG-1287:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438984/PIG-1287-2.patch
against trunk revision 924034.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 5 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/240/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/240/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/240/console

This message is automatically generated.

Use hadoop-0.20.2 with pig 0.7.0 release
                Key: PIG-1287
                URL: https://issues.apache.org/jira/browse/PIG-1287
            Project: Pig
         Issue Type: Task
   Affects Versions: 0.7.0
           Reporter: Pradeep Kamath
           Assignee: Pradeep Kamath
            Fix For: 0.7.0
        Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.