date:20091113

[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777565#action_12777565
 ] 

Alan Gates commented on PIG-1090:
-

Patch looks good.

What's the ReadToEndLoader?

What's the plan for BinStorage?  Are we going to write input and output formats 
for it?  If we have to do that, is there an existing binary storage format with 
existing input and output formats that we can use (like Avro or something)?

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1077) [Zebra] to support record(row)-based file split in Zebra's TableInputFormat

2009-11-13 Thread Chao Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1077:
---

Attachment: patch_Pig1077

 [Zebra] to support record(row)-based file split in Zebra's TableInputFormat
 ---

 Key: PIG-1077
 URL: https://issues.apache.org/jira/browse/PIG-1077
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: patch_Pig1077


 TFile currently supports split by record sequence number (see Jira 
 HADOOP-6218). We want to utilize this to provide record(row)-based input 
 split support in Zebra.
 One prominent benefit: for very large data files, we can create much more 
 fine-grained input splits than before, when we could only create one big 
 split per big file.
 In more detail, the new row-based getSplits() works as follows by default 
 (that is, when the user does not specify the number of splits to generate): 
 1) Select the biggest column group in terms of data size, split all of its 
 TFiles according to hdfs block size (64 MB or 128 MB) and get a list of 
 physical byte offsets as the output per TFile. For example, let us assume for 
 the 1st TFile we get offset1, offset2, ..., offset10; 
 2) Invoke TFile.getRecordNumNear(long offset) to get the RecordNum of a 
 key-value pair near a byte offset. For the example above, say we get 
 recordNum1, recordNum2, ..., recordNum10; 
 3) Stitch [0, recordNum1], [recordNum1+1, recordNum2], ..., [recordNum9+1, 
 recordNum10], [recordNum10+1, lastRecordNum] splits of all column groups, 
 respectively to form 11 record-based input splits for the 1st TFile. 
 4) For each input split, we need to create a TFile scanner through: 
 TFile.createScannerByRecordNum(long beginRecNum, long endRecNum). 
 Note: the conversion from byte offset to record number will be done by each 
 mapper rather than at the job initialization phase. This is for performance 
 reasons, since the conversion incurs some TFile reading overhead.
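
 Step 3 above can be sketched in Java as follows. This is only an illustration of the stitching arithmetic, not code from the patch: the class RecordRangeSplitter and its stitch method are hypothetical names, and the ranges are closed on both ends simply because the description writes them that way. The description does not say whether the real scanner treats endRecNum as inclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Zebra or TFile): turns boundary record numbers into ranges. */
final class RecordRangeSplitter {
    /**
     * @param boundaries    ascending record numbers found near each block-aligned byte offset
     *                      (recordNum1 .. recordNumN in the description)
     * @param lastRecordNum number of the final record in the file
     * @return {begin, end} pairs, both ends inclusive, covering 0 .. lastRecordNum
     */
    static List<long[]> stitch(long[] boundaries, long lastRecordNum) {
        List<long[]> ranges = new ArrayList<>();
        long begin = 0;
        for (long boundary : boundaries) {
            long end = Math.min(boundary, lastRecordNum);
            // Skip empty ranges, e.g. when two byte offsets map to the same record.
            if (end >= begin) {
                ranges.add(new long[] {begin, end});
                begin = end + 1;
            }
        }
        // The tail range [lastBoundary + 1, lastRecordNum], if any records remain.
        if (begin <= lastRecordNum) {
            ranges.add(new long[] {begin, lastRecordNum});
        }
        return ranges;
    }
}
```

 With ten distinct boundaries that all fall before the last record, this yields the 11 ranges described in step 3.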


Re: FYI - forking TFile off Hadoop into Zebra

2009-11-13 Thread Alan Gates



On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote:

> On Wed, Nov 11, 2009 at 18:26, Chao Wang ch...@yahoo-inc.com wrote:
>
>> Last, we would like to point out that this is a short term solution for
>> Zebra and we plan to:
>> 1) port all changes to Zebra TFile back into Hadoop TFile.
>> 2) in the long run have a single unified solution for this.
>
> Just for clarity, in long run as Zebra stabilizes and Pig adopts
> hadoop-0.22, Zebra will get rid of this fork?

I think the promise is they'll get rid of the fork at some point, not
necessarily at 0.22 though.

Alan.

> Ashutosh

[jira] Assigned: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-13 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1091:
-

Assignee: Yan Zhou

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor

 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.


[jira] Created: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-13 Thread Yan Zhou (JIRA)

[zebra] Exception when load with projection of map keys on a map column that is 
not map split 
--

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Priority: Minor


With a schema of f1:string, f2:map and storage info of [f1]; [f2], a projection 
of f2#{a} throws an exception.


[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-13 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777589#action_12777589
 ] 

Daniel Dai commented on PIG-1038:
-

Continuing from the last comment:

4. Strip secondary keys from the value

5. Write a byte version of OutputKeyComparator

 Optimize nested distinct/sort to use secondary key
 --

 Key: PIG-1038
 URL: https://issues.apache.org/jira/browse/PIG-1038
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.4.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch, 
 PIG-1038-4.patch, PIG-1038-5.patch


 If nested foreach plan contains sort/distinct, it is possible to use hadoop 
 secondary sort instead of SortedDataBag and DistinctDataBag to optimize the 
 query. 
 Eg1:
 A = load 'mydata';
 B = group A by $0;
 C = foreach B {
     D = order A by $1;
     generate group, D;
 }
 store C into 'myresult';
 We can specify a secondary sort on A.$1, and drop order A by $1.
 Eg2:
 A = load 'mydata';
 B = group A by $0;
 C = foreach B {
     D = A.$1;
     E = distinct D;
     generate group, E;
 }
 store C into 'myresult';
 We can specify a secondary sort key on A.$1, and simplify D=A.$1; E=distinct 
 D to a special version of distinct, which does not do the sorting.
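
 The "special version of distinct, which does not do the sorting" in Eg2 relies on one property: once the secondary sort has ordered the values, equal values are adjacent, so duplicates can be dropped in a single pass. A minimal Java sketch of that idea follows. SortedDistinct is a hypothetical name used only for illustration, not a class from the patch, and the sketch assumes its input is already sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical illustration: distinct over input that is already sorted. */
final class SortedDistinct {
    /** Keeps the first value of every run of equal values; no sorting or hashing needed. */
    static <T> List<T> distinct(Iterable<T> sorted) {
        List<T> out = new ArrayList<>();
        boolean first = true;
        T previous = null;
        for (T value : sorted) {
            // Because the input is sorted, a duplicate can only equal its predecessor.
            if (first || !Objects.equals(previous, value)) {
                out.add(value);
            }
            previous = value;
            first = false;
        }
        return out;
    }
}
```

 Only the previous value has to be remembered, which is what makes it possible to avoid materializing a DistinctDataBag in this sketch of the idea.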


[jira] Commented: (PIG-1093) pig.properties file is missing from distributions

2009-11-13 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777598#action_12777598
 ] 

Alan Gates commented on PIG-1093:
-

This also affects the 0.6 release, and should be repaired before that release.

 pig.properties file is missing from distributions
 -

 Key: PIG-1093
 URL: https://issues.apache.org/jira/browse/PIG-1093
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.5.0, 0.6.0
Reporter: Alan Gates

 pig.properties (in fact the entire conf directory) is not included in the 
 jars distributed as part of the 0.5 release.


[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Pradeep Kamath (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777599#action_12777599
]

Pradeep Kamath commented on PIG-1090:
-

bq. What's the ReadToEndLoader?
This is an internal utility LoadFunc I wrote to make it easy to read side files.
It wraps the real loader. Although it is implemented as a LoadFunc, the only
LoadFunc method it truly implements is getNext(). The usage pattern is to
construct an instance with the constructor that takes a reference to the true
LoadFunc (the one that can read the side file data) and then call getNext()
repeatedly until it returns null. The implementation of ReadToEndLoader hides
the work of getting InputSplits from the underlying InputFormat and then
processing each split in turn: getting its RecordReader and reading the data in
the split before moving to the next.
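
The pattern described above can be sketched in plain Java. This is not Pig's ReadToEndLoader and it does not use the LoadFunc, InputFormat, or RecordReader types: ReadToEndSketch is a hypothetical stand-in in which each Iterable plays the role of one split and its Iterator plays the role of that split's reader.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the wrapper pattern; not the actual ReadToEndLoader. */
final class ReadToEndSketch<T> {
    private final Iterator<? extends Iterable<T>> splits; // stands in for the InputSplits
    private Iterator<T> reader;                            // stands in for the current RecordReader

    ReadToEndSketch(List<? extends Iterable<T>> splits) {
        this.splits = splits.iterator();
    }

    /** Returns the next record, or null once every split has been read to the end. */
    T getNext() {
        // Move to the next split whenever the current one is exhausted (or not yet opened).
        while (reader == null || !reader.hasNext()) {
            if (!splits.hasNext()) {
                return null;
            }
            reader = splits.next().iterator();
        }
        return reader.next();
    }
}
```

A caller only ever sees the loop "call getNext() until it returns null"; the hand-off from one split to the next, including empty splits, stays inside the wrapper.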

bq. What's the plan for BinStorage?
An input format and an output format have already been created and checked in
to this branch for BinStorage.

Update sources to reflect recent changes in load-store interfaces
-

Key: PIG-1090
URL: https://issues.apache.org/jira/browse/PIG-1090
Project: Pig
Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Attachments: PIG-1090.patch


[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-13 Thread Pradeep Kamath (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1090:


  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to load-store-redesign branch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090.patch



[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1064:


Attachment: PIG-1064-4.patch

Attaching a patch to fix the TestSecondarySort unit test failure.

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064.patch


 I have 2 tab-separated files, 1.txt and 2.txt:
 $ cat 1.txt
 1   2
 2   3
 $ cat 2.txt
 1   2
 2   3
 I use the COGROUP feature of Pig in the following way:
 $ java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt' as (b0, b1);
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1012: Each COGroup input has to have the same number of inner plans
 Details at logfile: pig_1256845224752.log
 ==
 If I reverse the order of the schemas:
 {code}
 grunt> A = load '1.txt' as (a0, a1);
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 {code}
 2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1013: Grouping attributes can either be star (*) or a list of expressions, 
 but not both.
 Details at logfile: pig_1256845224752.log
 ==
 Now running without any schema:
 {code}
 grunt> A = load '1.txt';
 grunt> B = load '2.txt';
 grunt> C = cogroup A by *, B by *;
 grunt> dump C;
 {code}
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
 stored result in: file:/tmp/temp-319926700/tmp-1990275961
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
 written : 2
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
 : 154
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-10-29 12:55:37,202 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 ((1,2),{(1,2)},{(1,2)})
 ((2,3),{(2,3)},{(2,3)})
 ==
 Is this a bug or a feature?
 Viraj
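
 For reference, the result printed in the schema-less case is what grouping both inputs on the whole tuple would be expected to give. The Java sketch below reproduces that shape on the sample data. It is only an illustration of the expected semantics, not Pig's implementation, and CogroupByStarSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "cogroup A by *, B by *" on two in-memory relations. */
final class CogroupByStarSketch {
    /** Maps each whole tuple (the group key) to two bags: matches from a, then matches from b. */
    static Map<List<Integer>, List<List<List<Integer>>>> cogroup(
            List<List<Integer>> a, List<List<Integer>> b) {
        Map<List<Integer>, List<List<List<Integer>>>> groups = new LinkedHashMap<>();
        collect(groups, a, 0);
        collect(groups, b, 1);
        return groups;
    }

    private static void collect(Map<List<Integer>, List<List<List<Integer>>>> groups,
                                List<List<Integer>> relation, int slot) {
        for (List<Integer> tuple : relation) {
            List<List<List<Integer>>> bags = groups.get(tuple);
            if (bags == null) {
                // One (possibly empty) bag per input relation.
                bags = new ArrayList<>();
                bags.add(new ArrayList<>());
                bags.add(new ArrayList<>());
                groups.put(tuple, bags);
            }
            bags.get(slot).add(tuple);
        }
    }
}
```

 For the two sample files this produces the key (1,2) with bags {(1,2)} and {(1,2)}, and the key (2,3) with bags {(2,3)} and {(2,3)}, matching the dump above.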


[jira] Updated: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1064:


Status: Patch Available  (was: Open)

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064.patch



[jira] Created: (PIG-1094) Fix unit tests corresponding to source changes so far

2009-11-13 Thread Pradeep Kamath (JIRA)

Fix unit tests corresponding to source changes so far
-

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath





[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777699#action_12777699
 ] 

Hadoop QA commented on PIG-1064:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424878/PIG-1064-4.patch
  against trunk revision 835499.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/155/console

This message is automatically generated.

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064.patch



[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using * operator

2009-11-13 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1247#action_1247
 ] 

Pradeep Kamath commented on PIG-1064:
-

I can't tell from the report above what is wrong with the unit tests. I am 
running them all on my local box and will update with the results.

 Behvaiour of COGROUP with and without schema when using * operator
 

 Key: PIG-1064
 URL: https://issues.apache.org/jira/browse/PIG-1064
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1064-2.patch, PIG-1064-3.patch, PIG-1064-4.patch, 
 PIG-1064.patch



[jira] Assigned: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2009-11-13 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1072:
-

Assignee: Richard Ding

 ReversibleLoadStoreFunc interface should be removed to enable different load 
 and store implementation classes to be used in a reversible manner
 ---

 Key: PIG-1072
 URL: https://issues.apache.org/jira/browse/PIG-1072
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Richard Ding




[jira] Updated: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1062:
---

Attachment: PIG-1062.patch.3

New patch after merging with the latest changes to the load-store-redesign branch. 
Incompatible with trunk.
Pasting the output of test-patch (test cases have not been updated):

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 load-store-redesign branch: change SampleLoader and subclasses to work with 
 new LoadFunc interface 
 ---

 Key: PIG-1062
 URL: https://issues.apache.org/jira/browse/PIG-1062
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1062.patch, PIG-1062.patch.3


 This is part of the effort to implement new load store interfaces as laid out 
 in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
 PigStorage and BinStorage are now working.
 SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need 
 to be changed to work with the new LoadFunc interface.
 Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
 PoissonSampleLoader is used by skew join. 


[jira] Updated: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1062:
---

Status: Patch Available  (was: Open)

 load-store-redesign branch: change SampleLoader and subclasses to work with 
 new LoadFunc interface 
 ---

 Key: PIG-1062
 URL: https://issues.apache.org/jira/browse/PIG-1062
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1062.patch, PIG-1062.patch.3



[jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface

2009-11-13 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1294#action_1294
]

Hadoop QA commented on PIG-1062:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424927/PIG-1062.patch.3
against trunk revision 835499.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

-1 patch. The patch command could not apply the patch.

Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/156/console

This message is automatically generated.

load-store-redesign branch: change SampleLoader and subclasses to work with
new LoadFunc interface
---

Key: PIG-1062
URL: https://issues.apache.org/jira/browse/PIG-1062
Project: Pig
Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: PIG-1062.patch, PIG-1062.patch.3


[jira] Commented: (PIG-1077) [Zebra] to support record(row)-based file split in Zebra's TableInputFormat

2009-11-13 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293
]

Hadoop QA commented on PIG-1077:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424874/patch_Pig1077
against trunk revision 835499.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 104 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/49/console

This message is automatically generated.

[Zebra] to support record(row)-based file split in Zebra's TableInputFormat
---

Key: PIG-1077
URL: https://issues.apache.org/jira/browse/PIG-1077
Project: Pig
Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
Fix For: 0.6.0

Attachments: patch_Pig1077

