[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777565#action_12777565
 ] 

Alan Gates commented on PIG-1090:
-

Patch looks good.

What's the ReadToEndLoader?

What's the plan for BinStorage?  Are we going to write input and output formats 
for it?  If we have to do that, is there an existing binary storage format with 
existing input and output formats that we can use (like Avro or something)?

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090.patch


 There have been some changes (as recorded in the "Nov 2 2009" subsection of 
 the Changes section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) 
 to the load/store interfaces. This JIRA tracks the task of making those 
 changes under src. Changes under test will be addressed in a different JIRA.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1077) [Zebra] to support record(row)-based file split in Zebra's TableInputFormat

2009-11-13 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1077:
---

Attachment: patch_Pig1077

 [Zebra] to support record(row)-based file split in Zebra's TableInputFormat
 ---

 Key: PIG-1077
 URL: https://issues.apache.org/jira/browse/PIG-1077
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: patch_Pig1077


 TFile currently supports splitting by record sequence number (see JIRA 
 HADOOP-6218). We want to use this to provide record(row)-based input 
 split support in Zebra.
 One prominent benefit is that, for very large data files, we can create 
 much more fine-grained input splits than before, when we could only create 
 one big split per big file.
 In more detail, the new row-based getSplits() works as follows by default 
 (i.e., when the user does not specify the number of splits to generate): 
 1) Select the biggest column group in terms of data size, split all of its 
 TFiles according to the HDFS block size (64 MB or 128 MB), and get a list of 
 physical byte offsets as the output per TFile. For example, assume that for 
 the 1st TFile we get offset1, offset2, ..., offset10; 
 2) Invoke TFile.getRecordNumNear(long offset) to get the RecordNum of a 
 key-value pair near each byte offset. For the example above, say we get 
 recordNum1, recordNum2, ..., recordNum10; 
 3) For each of the record ranges [0, recordNum1], [recordNum1+1, recordNum2], 
 ..., [recordNum9+1, recordNum10], [recordNum10+1, lastRecordNum], stitch 
 together the corresponding splits of all column groups, forming 11 
 record-based input splits for the 1st TFile. 
 4) For each input split, create a TFile scanner through 
 TFile.createScannerByRecordNum(long beginRecNum, long endRecNum). 
 Note: the conversion from byte offset to record number will be done by each 
 mapper rather than during the job initialization phase, because of 
 performance concerns: the conversion incurs some TFile reading overhead.
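 The range arithmetic in step 3 can be sketched as follows. This is a 
 hypothetical illustration, not Zebra's code: the class RecordRangeSplitter 
 and its stitch method are invented names, and the boundary record numbers 
 are assumed to come from something like TFile.getRecordNumNear(long offset) 
 as described above. Where this runs (job setup or mapper) is outside the 
 sketch; it only turns boundaries into inclusive [begin, end] ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 3: turn the record numbers found near each
// block boundary into inclusive {begin, end} ranges, one per input split.
final class RecordRangeSplitter {

    /**
     * @param boundaryRecNums record numbers near each boundary, ascending
     *                        (recordNum1..recordNum10 in the example)
     * @param lastRecordNum   record number of the last record in the TFile
     * @return inclusive {begin, end} pairs, one per record-based split
     */
    static List<long[]> stitch(long[] boundaryRecNums, long lastRecordNum) {
        List<long[]> ranges = new ArrayList<>();
        long begin = 0;
        for (long boundary : boundaryRecNums) {
            // Skip empty ranges, e.g. when two byte offsets map to the same record.
            if (boundary >= begin) {
                ranges.add(new long[] {begin, boundary});
                begin = boundary + 1;
            }
        }
        // Tail range: [recordNum10+1, lastRecordNum], if any records remain.
        if (begin <= lastRecordNum) {
            ranges.add(new long[] {begin, lastRecordNum});
        }
        return ranges;
    }
}
```

 With ten distinct boundaries and records beyond the last one, this yields 
 the 11 ranges described in step 3.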




Re: FYI - forking TFile off Hadoop into Zebra

2009-11-13 Thread Alan Gates


On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote:


On Wed, Nov 11, 2009 at 18:26, Chao Wang ch...@yahoo-inc.com wrote:


Last, we would like to point out that this is a short-term solution for
Zebra, and we plan to:
1) port all changes to Zebra TFile back into Hadoop TFile.
2) in the long run, have a single unified solution for this.

Just for clarity: in the long run, as Zebra stabilizes and Pig adopts
hadoop-0.22, Zebra will get rid of this fork?


I think the promise is they'll get rid of the fork at some point, not
necessarily at 0.22 though.


Alan.



Ashutosh




[jira] Assigned: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-13 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1091:
-

Assignee: Yan Zhou

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor

 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.
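 For context, a map-key projection such as f2#{a} is expected to return, for 
 each row, only the listed keys of the map column f2. The sketch below shows 
 that intended result with a plain java.util.Map. MapKeyProjection is a 
 hypothetical helper, not Zebra's API; it assumes keys absent from a row's 
 map are simply omitted, and it does not reproduce the reported exception.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the expected result of a map-key
// projection like f2#{a}; not Zebra's implementation.
final class MapKeyProjection {

    // Keeps only the requested keys (e.g. ["a"] for f2#{a}).
    // Assumption: keys missing from the row's map are omitted, not set to null.
    static <V> Map<String, V> project(Map<String, V> row, Collection<String> keys) {
        Map<String, V> out = new LinkedHashMap<>();
        for (String k : keys) {
            if (row.containsKey(k)) {
                out.put(k, row.get(k));
            }
        }
        return out;
    }
}
```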




[jira] Created: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-13 Thread Yan Zhou (JIRA)
[zebra] Exception when load with projection of map keys on a map column that is 
not map split 
--

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Priority: Minor


With a schema of f1:string, f2:map and storage info of [f1]; [f2], a projection 
of f2#{a} throws an exception.




[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777589#action_12777589
 ] 

Daniel Dai commented on PIG-1038:
-

Continuing from the last comment:

4. Strip secondary keys from the value

5. Write a byte version of OutputKeyComparator

 Optimize nested distinct/sort to use secondary key
 --

 Key: PIG-1038
 URL: https://issues.apache.org/jira/browse/PIG-1038
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.4.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch, 
 PIG-1038-4.patch, PIG-1038-5.patch


 If a nested foreach plan contains a sort or distinct, it is possible to use 
 Hadoop's secondary sort instead of SortedDataBag and DistinctDataBag to 
 optimize the query. 
 Example 1:
 A = load 'mydata';
 B = group A by $0;
 C = foreach B {
     D = order A by $1;
     generate group, D;
 }
 store C into 'myresult';
 We can specify a secondary sort on A.$1 and drop "order A by $1".
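 The idea behind Example 1 can be illustrated outside Hadoop: if records are 
 sorted on the composite key ($0, $1) but grouped only on $0, each group's bag 
 already arrives ordered by $1, so the nested "order" becomes unnecessary. 
 The sketch below simulates that with plain Java collections; 
 SecondarySortDemo and the two-column int[] rows are assumptions for 
 illustration, not Pig's implementation (which would rely on Hadoop's sort 
 and grouping comparators during the shuffle).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a secondary sort: sort on ($0, $1), group on $0.
final class SecondarySortDemo {

    // Each row is {$0, $1}. Returns $0 -> list of $1 values, already sorted.
    static Map<Integer, List<Integer>> groupWithSecondarySort(List<int[]> rows) {
        List<int[]> sorted = new ArrayList<>(rows);
        // Composite sort key: primary on $0, secondary on $1 (what the shuffle would do).
        sorted.sort(Comparator.<int[]>comparingInt(r -> r[0])
                              .thenComparingInt(r -> r[1]));
        // Group on $0 only; values inside each group keep the $1 ordering.
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int[] r : sorted) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return groups;
    }
}
```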
 Example 2:
 A = load 'mydata';
 B = group A by $0;
 C = foreach B {
     D = A.$1;
     E = distinct D;
     generate group, E;
 }
 store C into 'myresult';
 We can specify a secondary sort key on A.$1 and simplify "D = A.$1; E = distinct 
 D" to a special version of distinct that does not do the sorting.
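 For Example 2, once a group's values arrive sorted on $1, distinct reduces to 
 dropping adjacent duplicates in a single pass, with no extra sorting or 
 hashing. A minimal sketch of such a sort-free distinct, assuming the input 
 list is already sorted (the class name SortedDistinct is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sort-free distinct for input that is already sorted.
final class SortedDistinct {

    // Because 'sorted' is ordered (e.g. by a secondary sort key), equal values
    // are adjacent, so comparing each value with its predecessor is enough.
    static <T> List<T> distinctSorted(List<T> sorted) {
        List<T> out = new ArrayList<>();
        T prev = null;
        boolean first = true;
        for (T v : sorted) {
            if (first || !Objects.equals(v, prev)) {
                out.add(v);
            }
            prev = v;
            first = false;
        }
        return out;
    }
}
```

 Note that this only gives a correct distinct when the input really is 
 sorted; on unsorted input, non-adjacent duplicates would survive.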




[jira] Commented: (PIG-1093) pig.properties file is missing from distributions

2009-11-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777598#action_12777598
 ] 

Alan Gates commented on PIG-1093:
-

This also affects the 0.6 release, and should be repaired before that release.

 pig.properties file is missing from distributions
 -

 Key: PIG-1093
 URL: https://issues.apache.org/jira/browse/PIG-1093
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.5.0, 0.6.0
Reporter: Alan Gates

 pig.properties (in fact the entire conf directory) is not included in the 
 jars distributed as part of the 0.5 release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777599#action_12777599
 ] 

Pradeep Kamath commented on PIG-1090:
-

bq. What's the ReadToEndLoader?
This is an internal utility LoadFunc I wrote to make it easy to read side files. 
It wraps the real loader. Though it is implemented as a LoadFunc, the only 
LoadFunc method that is truly implemented is getNext(). The usage pattern is to 
construct an instance using the constructor that takes a reference to the true 
LoadFunc (which can read the side file data) and then repeatedly call getNext() 
until it returns null. The implementation of ReadToEndLoader hides the work of 
getting InputSplits from the underlying InputFormat and then processing each 
split by getting its RecordReader and reading the data in that split before 
moving on to the next.
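The control flow described above can be sketched as follows. The interface 
RecordSource and the class ReadToEndSketch are hypothetical stand-ins for a 
split's RecordReader and for ReadToEndLoader; they are not Pig's or Hadoop's 
actual APIs. The point is that the caller only sees getNext(), which moves 
through the splits one after another and returns null once all are consumed.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a per-split record reader.
interface RecordSource<T> {
    /** Returns the next record in this split, or null when the split is exhausted. */
    T next();
}

// Hypothetical sketch of a "read to end" wrapper over a list of splits.
final class ReadToEndSketch<T> {
    private final Iterator<RecordSource<T>> splits;
    private RecordSource<T> current;

    ReadToEndSketch(List<RecordSource<T>> splitReaders) {
        this.splits = splitReaders.iterator();
    }

    /** Returns the next record across all splits, or null when all are consumed. */
    T getNext() {
        while (true) {
            if (current == null) {
                if (!splits.hasNext()) {
                    return null;          // every split has been read
                }
                current = splits.next();  // open the next split's reader
            }
            T rec = current.next();
            if (rec != null) {
                return rec;
            }
            current = null;               // this split is done; advance
        }
    }
}
```

Empty splits are skipped transparently, so a caller can simply loop until 
getNext() returns null.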

bq. What's the plan for BinStorage?
An input and output format for BinStorage has already been created and checked 
in to this branch.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1090:


  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to load-store-redesign branch




[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1064:


Attachment: PIG-1064-4.patch

Attaching a patch to fix the TestSecondarySort unit test failure.

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064.patch


 I have two tab-separated files, 1.txt and 2.txt:
 $ cat 1.txt 
 
 1   2
 2   3
 
 $ cat 2.txt 
 1   2
 2   3
 I use Pig's COGROUP feature in the following way:
 $java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt' as (b0, b1);
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1012: Each COGroup input has to have the same number of inner plans
 Details at logfile: pig_1256845224752.log
 ==
 If I reverse the order of the schemas:
 {code}
 grunt> A = load '1.txt' as (a0, a1);
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1013: Grouping attributes can either be star (*) or a list of expressions, 
 but not both.
 Details at logfile: pig_1256845224752.log
 ==
 Now running without any schema:
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 grunt> dump C;
 {code}
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
 stored result in: file:/tmp/temp-319926700/tmp-1990275961
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
 written : 2
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
 : 154
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 ((1,2),{(1,2)},{(1,2)})
 ((2,3),{(2,3)},{(2,3)})
 ==
 Is this a bug or a feature?
 Viraj
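 For reference, the schema-less run groups each input on its entire tuple. 
 The sketch below reproduces the shape of that dump with plain Java 
 collections; CogroupByStar and its string rendering are assumptions for 
 illustration only, not Pig's implementation, and it does not model the 
 1012/1013 errors seen when only one input has a schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "cogroup A by *, B by *" on schema-less input:
// the group key is the whole tuple.
final class CogroupByStar {

    // Tuples from each input that share a given key.
    private static final class Bags {
        final List<List<String>> fromA = new ArrayList<>();
        final List<List<String>> fromB = new ArrayList<>();
    }

    // Renders each group as (key,{tuples from A},{tuples from B}), first-seen order.
    static List<String> cogroup(List<List<String>> a, List<List<String>> b) {
        Map<List<String>, Bags> groups = new LinkedHashMap<>();
        for (List<String> t : a) {
            groups.computeIfAbsent(t, k -> new Bags()).fromA.add(t);
        }
        for (List<String> t : b) {
            groups.computeIfAbsent(t, k -> new Bags()).fromB.add(t);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<List<String>, Bags> e : groups.entrySet()) {
            out.add("(" + tuple(e.getKey()) + "," + bag(e.getValue().fromA)
                    + "," + bag(e.getValue().fromB) + ")");
        }
        return out;
    }

    private static String tuple(List<String> t) {
        return "(" + String.join(",", t) + ")";
    }

    private static String bag(List<List<String>> tuples) {
        List<String> parts = new ArrayList<>();
        for (List<String> t : tuples) {
            parts.add(tuple(t));
        }
        return "{" + String.join(",", parts) + "}";
    }
}
```

 Feeding it the rows (1,2) and (2,3) for both inputs yields 
 ((1,2),{(1,2)},{(1,2)}) and ((2,3),{(2,3)},{(2,3)}), matching the dump above.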




[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1064:


Status: Patch Available  (was: Open)




[jira] Created: (PIG-1094) Fix unit tests corresponding to source changes so far

2009-11-13 Thread Pradeep Kamath (JIRA)
Fix unit tests corresponding to source changes so far
-

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath







[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777699#action_12777699
 ] 

Hadoop QA commented on PIG-1064:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424878/PIG-1064-4.patch
  against trunk revision 835499.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/console

This message is automatically generated.




[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1247#action_1247
 ] 

Pradeep Kamath commented on PIG-1064:
-

I can't tell what is wrong with the unit tests from the report above. I am 
running them all on my local box and will update with the results.




[jira] Assigned: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2009-11-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1072:
-

Assignee: Richard Ding

 ReversibleLoadStoreFunc interface should be removed to enable different load 
 and store implementation classes to be used in a reversible manner
 ---

 Key: PIG-1072
 URL: https://issues.apache.org/jira/browse/PIG-1072
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Richard Ding






[jira] Updated: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1062:
---

Attachment: PIG-1062.patch.3

New patch after merging the latest changes to the load-store-redesign branch. 
It is incompatible with trunk.
Pasting the output of test-patch (test cases have not been updated):

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec]                     Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


 load-store-redesign branch: change SampleLoader and subclasses to work with 
 new LoadFunc interface 
 ---

 Key: PIG-1062
 URL: https://issues.apache.org/jira/browse/PIG-1062
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1062.patch, PIG-1062.patch.3


 This is part of the effort to implement new load store interfaces as laid out 
 in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
 PigStorage and BinStorage are now working.
 SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need 
 to be changed to work with the new LoadFunc interface.  
 Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
 PoissonSampleLoader is used by skew join. 




[jira] Updated: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1062:
---

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1294#action_1294
 ] 

Hadoop QA commented on PIG-1062:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424927/PIG-1062.patch.3
  against trunk revision 835499.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/156/console

This message is automatically generated.




[jira] Commented: (PIG-1077) [Zebra] to support record(row)-based file split in Zebra's TableInputFormat

2009-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1293#action_1293
 ] 

Hadoop QA commented on PIG-1077:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424874/patch_Pig1077
  against trunk revision 835499.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 104 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/console

This message is automatically generated.
