from:"Raghu Angadi"

[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770336#action_12770336
 ] 

Raghu Angadi commented on PIG-1053:
---

A big +1.

From a Pig developer's point of view, it is understandable to be annoyed by 
beginners complaining about run time with toy local inputs. Maybe a clear 
heads-up in the tutorial would reduce those complaints.

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-16 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766841#action_12766841
 ] 

Raghu Angadi commented on PIG-993:
--


I think the test needs to be fixed. It deletes 6 column groups from 6 
different threads. The spec explicitly states that read accesses and parallel 
deletions are expected to fail, but that the table is always left in a 
consistent state. The rationale is that in practice these tables are accessed 
from different machines, and it would add unnecessary complication to 
coordinate all the readers and writers (especially with no locking support 
on HDFS). Zebra tables have no state outside the directory. This applies to 
writing as well.

One option I see is to have each thread make multiple attempts in case of 
errors.
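A minimal sketch of that retry idea, assuming each thread's drop call can be wrapped in a `java.util.concurrent.Callable`. `RetrySketch` and `withRetries` are hypothetical names, not part of Zebra, and a bounded number of attempts with a fixed pause between them is just one reasonable policy:

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not a Zebra API): retries a task that may fail because
// another thread is concurrently modifying the same table.
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Runs task up to maxAttempts times, sleeping backoffMillis between
     * attempts. Returns the first successful result, or rethrows the last
     * failure once every attempt has been used.
     */
    static <T> T withRetries(Callable<T> task, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // e.g. a deletion that raced with another thread
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```

Each test thread would then call something like `RetrySketch.withRetries(() -> { drop(cgName); return null; }, 5, 100L)`, where `drop(cgName)` is a placeholder for whatever deletion call the test actually makes.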
  

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>    Reporter: Raghu Angadi
>    Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 

[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-11 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764552#action_12764552
 ] 

Raghu Angadi commented on PIG-993:
--

This patch depends on PIG-992. It is not a functional dependency and can be 
removed if required.

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>    Reporter: Raghu Angadi
>    Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 

[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group names to Zebra and make them first-class citizens in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group names in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups that do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within the table: CG0, CG1, CG2 ... Note: if a name CGx is 
> used by the user, it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that the keyword "AS" is case-insensitive.
> 3) Column group names are unique within one table and are case-sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group name (this will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load V1-created tables and run dropColumnGroup on them.
> 7) Renaming is NOT supported.
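To illustrate items 1 to 3, here is a minimal Java sketch of how default names could be assigned. It is not Zebra's storage-hint parser; `CgNameSketch` and `assignNames` are hypothetical, and the regular expression only looks at the `[columns] as name` prefix of each group, ignoring the `secure by` and `compress by` clauses:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the naming rules, not Zebra's implementation.
final class CgNameSketch {
    private CgNameSketch() {}

    // "[columns]" optionally followed by "as <name>"; the AS keyword is case-insensitive.
    private static final Pattern GROUP =
            Pattern.compile("\\[([^\\]]*)\\]\\s*(?:(?i:as)\\s+(\\w+))?");

    /** Returns one name per column group, in declaration order. */
    static List<String> assignNames(String storageHint) {
        List<String> names = new ArrayList<>();
        Set<String> used = new HashSet<>(); // String equality is case-sensitive: "c1" != "C1"
        Matcher m = GROUP.matcher(storageHint);
        while (m.find()) {
            String name = m.group(2); // null when the group has no AS clause
            if (name != null && !used.add(name)) {
                throw new IllegalArgumentException("Duplicate column group name: " + name);
            }
            names.add(name);
        }
        // Fill unnamed groups with CG0, CG1, ... skipping names the user already chose.
        int next = 0;
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i) == null) {
                while (used.contains("CG" + next)) {
                    next++;
                }
                String generated = "CG" + next++;
                used.add(generated);
                names.set(i, generated);
            }
        }
        return names;
    }
}
```

For the hint `[a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo; [a5]; [a6] AS CG0; [a7]` the sketch returns `[PI, General, CG1, CG0, CG2]`: because the user already took CG0, the unnamed groups get CG1 and CG2.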

[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Patch Available  (was: Open)

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group names to Zebra and make them first-class citizens in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group names in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups that do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within the table: CG0, CG1, CG2 ... Note: if a name CGx is 
> used by the user, it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that the keyword "AS" is case-insensitive.
> 3) Column group names are unique within one table and are case-sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group name (this will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load V1-created tables and run dropColumnGroup on them.
> 7) Renaming is NOT supported.

[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Open  (was: Patch Available)

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group names to Zebra and make them first-class citizens in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group names in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups that do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within the table: CG0, CG1, CG2 ... Note: if a name CGx is 
> used by the user, it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that the keyword "AS" is case-insensitive.
> 3) Column group names are unique within one table and are case-sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group name (this will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load V1-created tables and run dropColumnGroup on them.
> 7) Renaming is NOT supported.

[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs-2.patch, Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In the JavaCC file SchemaParser.jjt, the package name was wrong: it used the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate the 
> TableSchemaParser and TableStorageParser Java code;
> 5) Support for column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support in the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of PIG-987.
> 6) and 7) a couple of issues reported in Jira917.
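Items 1 and 2 together amount to a small naming rule for compressors. A hedged sketch, assuming only "gz" and "lzo" are valid names and that an absent name falls back to the default (`CompressorSketch` and `resolve` are hypothetical, not Zebra code):

```java
// Hypothetical sketch of compressor-name handling, not Zebra code.
final class CompressorSketch {
    private CompressorSketch() {}

    // The default changed from "lzo" to "gz" (gzip).
    static final String DEFAULT_COMPRESSOR = "gz";

    /** Returns a valid compressor name, applying the default when none is given. */
    static String resolve(String requested) {
        if (requested == null || requested.trim().isEmpty()) {
            return DEFAULT_COMPRESSOR;
        }
        String name = requested.trim();
        switch (name) {
            case "gz":
            case "lzo": // the LZO compressor is named "lzo", not "lzo2"
                return name;
            default:
                throw new IllegalArgumentException("Unsupported compressor: " + requested);
        }
    }
}
```

Making "gz" the default means a storage hint without a `compress by` clause no longer depends on LZO being installed.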

[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan!

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 
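Because permissions are attached to whole column groups, the check described above reduces to looking up each selected column's group. The sketch below assumes the set of readable groups has already been determined (in Zebra this ultimately comes from the HDFS permissions on each column group's data); `CgAccessSketch`, `checkSelection`, and the exception message are hypothetical. It only rejects the case the description covers, where every selected column is inaccessible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of group-level access control, not Zebra's API.
// A column is readable exactly when the column group storing it is readable.
final class CgAccessSketch {
    private CgAccessSketch() {}

    static void checkSelection(List<String> selected,
                               Map<String, String> columnToGroup,
                               Set<String> readableGroups) {
        List<String> denied = new ArrayList<>();
        for (String column : selected) {
            String group = columnToGroup.get(column);
            if (group == null) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            if (!readableGroups.contains(group)) {
                denied.add(column);
            }
        }
        // The described rule: fail when every selected column is inaccessible.
        if (!selected.isEmpty() && denied.size() == selected.size()) {
            throw new SecurityException("Permission denied for accessing column(s) " + denied);
        }
    }
}
```

The description does not say what should happen when a selection mixes accessible and inaccessible columns, so the sketch deliberately lets that case through.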

[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Attachment: Bugs-2.patch

I am committing a slightly modified patch. I removed the following lines that 
modified the top-level build.xml. Please ask one of the Pig committers to 
commit that change.

The part that was removed:
{noformat}
@@ -940,4 +942,13 @@

  

+
+
+
+
+
 
{noformat}

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs-2.patch, Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In the JavaCC file SchemaParser.jjt, the package name was wrong: it used the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate the 
> TableSchemaParser and TableStorageParser Java code;
> 5) Support for column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support in the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of PIG-987.
> 6) and 7) a couple of issues reported in Jira917.

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763836#action_12763836
 ] 

Raghu Angadi commented on PIG-987:
--

Thanks Yan. It might be better to remove gauravj as well, since it is ignored 
anyway. 

This implies column access control is not tested in this patch, right?

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763516#action_12763516
 ] 

Raghu Angadi commented on PIG-987:
--

> Can you chgrp a local FS file to a group called "users" on your box?
No.

It's the same problem. I don't have a group called "users", and I don't think 
we can require others to have it.

I didn't know the owner is ignored. Is it still allowed by the storage hint?

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-07 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763346#action_12763346
 ] 

Raghu Angadi commented on PIG-987:
--

I finally got some time to look into this. Yes, I think it should be fixed in 
the tests. TestColumnGroup.java says:
{noformat}
ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted,
"pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8), 
false, conf);
{noformat}

This uses the local FS. How can we expect users to have a user named "gauravj" on 
their machines and to run as superuser :)? It just cannot be done.

If the test needs to run with these permissions, we should:
 a) use HDFS (MiniDFSCluster) rather than the local filesystem; the tester has all 
the permissions on a MiniDFSCluster.
 b) (minor) use a generic name rather than gauravj.
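On a local filesystem, a non-superuser cannot chown files to another user and can only chgrp to groups they belong to, so hard-coded names like "gauravj" and "users" make the test environment-dependent. A small sketch of helpers a test could use instead; `PermSketch` and its methods are hypothetical. It reads the current account from the standard `user.name` system property and parses an octal mode the same way the snippet above does (`Short.parseShort("755", 8)` already returns a `short`, so the extra cast there is redundant):

```java
// Hypothetical test helpers, not part of Zebra or Hadoop.
final class PermSketch {
    private PermSketch() {}

    /** Parses an octal permission string such as "755" (0755 == 493 decimal). */
    static short parseOctalPerm(String octal) {
        short perm = Short.parseShort(octal, 8);
        if (perm < 0 || perm > 0777) {
            throw new IllegalArgumentException("Not a permission mode: " + octal);
        }
        return perm;
    }

    /** The account running the test, instead of a fixed name like "gauravj". */
    static String currentUser() {
        return System.getProperty("user.name");
    }
}
```

The group argument still has to be one the tester belongs to; this sketch does not try to discover it, which is one reason running against a MiniDFSCluster is the more robust option.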


> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-07 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

Attachment: tmp-987-plus-991.patch
TEST-org.apache.hadoop.zebra.io.TestCheckin.txt

Attachments:
   # tmp-987-plus-991.patch : latest patch here + patch for PIG-991
   # TEST-org.apache.hadoop.zebra.io.TestCheckin.txt : output of the failed 
tests.

Yan, it looks like the lzo-related errors are fixed with the combined patch, but 
there are still some failures. I think some of these failures exist on trunk as 
well.

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762871#action_12762871
 ] 

Raghu Angadi commented on PIG-987:
--

Even with PIG-991 included, I am seeing lzo-related failures. Could you run the 
tests on a clean checkout? If you didn't see the errors before, you 
probably have lzo set up in your environment, which is not a requirement. 



> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Fix Version/s: 0.6.0

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>        Reporter: Raghu Angadi
>    Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This JIRA provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762829#action_12762829
 ] 

Raghu Angadi commented on PIG-987:
--

Not sure if this is related to PIG. When I applied PIG-991 over this, the tests 
passed (except the ones that fail on trunk).


> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

Attachment: TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt

I am attaching {{mapred.TestCheckin.txt}}, which passes without the patch.

btw, not all tests pass even without the patch. What environment is 
required? I did a fresh checkout and ran 'ant test'.

I guess the test failures on trunk are related to lzo, but I didn't expect 
more failures with the patch.

It looks like PIG-991 removes the lzo dependency. I will try with that patch 
included.

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 

[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Release Note:   (was: Patch should be applied after that of Jira987.)

bq. Patch should be applied after that of Jira987.

[moved above comment from 'Release Notes' to this comment].

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In the JavaCC file SchemaParser.jjt, the package name was wrong: it used the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate the 
> TableSchemaParser and TableStorageParser Java code;
> 5) Support for column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support in the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of PIG-987.
> 6) and 7) a couple of issues reported in Jira917.

[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762812#action_12762812
 ] 

Raghu Angadi commented on PIG-987:
--

I tried to commit this patch. 'ant test' says all the tests fail, whereas only 
one or two tests fail without the patch.

Does Hudson actually run the Zebra tests?


> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> Security is ultimately enforced by the corresponding HDFS permissions on the 
> stored data.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 


[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761769#action_12761769
 ] 

Raghu Angadi commented on PIG-993:
--

> zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.
I meant to say PIG-986.


> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>    Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.5.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 


[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Attachment: zebra-drop-cg.patch
DropColumnGroupExample.java

Attachments:

  DropColumnGroupExample.java : a simple example to illustrate the 
functionality.

  zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.

  Some of the tests included in the patch were written by Jing Huang. Jing also helped 
with testing the patch on real clusters with various errors. Yan Zhou helped 
with correctly handling missing column groups.



> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>        Reporter: Raghu Angadi
>    Assignee: Raghu Angadi
> Fix For: 0.5.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch

[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761767#action_12761767
 ] 

Raghu Angadi commented on PIG-993:
--

Deletion procedure:

   # Check whether a column group with the given name exists, and throw an error if there is no such group.
   # If the column group is already deleted, return normally.
  ** If a column group is already marked deleted but the corresponding physical directory still exists, try to remove the column group data again. An earlier attempt might not have removed the directory.
   # Create an empty file ".deleted-CGNAME" in the top-level directory.
   # If the creation fails, check whether the file already exists. This can happen when two users try to delete the same column group concurrently. If the CG is marked deleted after this, return success. An exception is thrown for any other error.
   # Delete the column group directory.
   # An exception is thrown if the deletion fails. Note that the column group is already marked deleted even though the deletion of its directory failed. A subsequent deletion of such a column group will again try to delete the directory.
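The deletion procedure can be sketched in Java. This is a hypothetical illustration, not the code in zebra-drop-cg.patch. It uses java.nio.file on a local directory instead of Hadoop's FileSystem API so that it runs standalone. The set of known column group names is an assumed stand-in for the table metadata that the real code consults.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DropColumnGroupSketch {

    /** Marker file ".deleted-CGNAME" in the table's top-level directory. */
    static Path deletedMarker(Path tableDir, String cgName) {
        return tableDir.resolve(".deleted-" + cgName);
    }

    /**
     * Hypothetical local-filesystem version of the deletion steps.
     * knownGroups is an assumed stand-in for the table's metadata.
     */
    static void dropColumnGroup(Path tableDir, String cgName, Set<String> knownGroups)
            throws IOException {
        // Step 1: the column group must exist.
        if (!knownGroups.contains(cgName)) {
            throw new IOException("No such column group: " + cgName);
        }
        Path cgDir = tableDir.resolve(cgName);
        Path marker = deletedMarker(tableDir, cgName);

        // Step 2: already marked deleted, so retry removing leftover data and return.
        if (Files.exists(marker)) {
            deleteRecursively(cgDir);
            return;
        }

        // Steps 3-4: create the empty marker; tolerate a concurrent deleter.
        try {
            Files.createFile(marker);
        } catch (IOException e) {
            if (Files.exists(marker)) {
                return; // another caller marked it deleted first
            }
            throw e;
        }

        // Steps 5-6: remove the data. If this throws, the CG stays marked
        // deleted and a later call retries via step 2.
        deleteRecursively(cgDir);
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("zebra-table");
        Files.createDirectories(table.resolve("CG1"));
        Files.write(table.resolve("CG1").resolve("part-00000"), new byte[] {1});
        dropColumnGroup(table, "CG1", Set.of("CG0", "CG1"));
        System.out.println("CG1 dir exists: " + Files.exists(table.resolve("CG1")));
        System.out.println("marked deleted: " + Files.exists(deletedMarker(table, "CG1")));
    }
}
```

The marker file is created before any data is removed. As a result, a failed or concurrent deletion still leaves the group marked deleted, and a repeated call only retries removing the directory.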

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.5.0

[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761766#action_12761766
 ] 

Raghu Angadi commented on PIG-993:
--


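The API is pretty simple: {code}
package org.apache.hadoop.zebra.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class BasicTable {
  /** See the patch for the JavaDoc and the attached example for usage. */
  public static void dropColumnGroup(Path path, Configuration conf, String cgName)
      throws IOException { ... }
}
{code}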

  * The table schema is not modified.
  * This API takes the name of a column group. PIG-986 adds explicit names for CGs.
  * Once a CG is deleted, NULL is returned for the fields that were stored in that CG.
 ** This is the main difference between just manually deleting a directory on the filesystem and 'properly' deleting a CG.
 ** Many of the changes made in other parts of Zebra are related to handling missing CGs.
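The "NULL for deleted CGs" rule can be illustrated with a small, self-contained sketch. It is not Zebra's reader code. The field-to-group map, the set of deleted groups, and the method name `project` are hypothetical stand-ins for what the real reader tracks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeletedGroupProjectionSketch {

    /**
     * Hypothetical projection: returns the stored value of each requested field,
     * or null when that field's column group has been dropped. The field list
     * (the schema) itself is not changed.
     */
    static List<Object> project(List<String> fields,
                                Map<String, String> fieldToGroup,
                                Set<String> deletedGroups,
                                Map<String, Object> storedValues) {
        List<Object> row = new ArrayList<>();
        for (String field : fields) {
            String group = fieldToGroup.get(field);
            if (group == null) {
                throw new IllegalArgumentException("Field not in schema: " + field);
            }
            row.add(deletedGroups.contains(group) ? null : storedValues.get(field));
        }
        return row;
    }

    public static void main(String[] args) {
        List<Object> row = project(
                Arrays.asList("s1", "s2", "m1"),
                Map.of("s1", "CG0", "s2", "CG0", "m1", "CG1"),
                Set.of("CG1"),
                Map.<String, Object>of("s1", true, "s2", 42));
        System.out.println(row);
    }
}
```

Readers keep the same row shape after a drop, which is why the schema can stay untouched. Manually deleting the directory would instead leave readers looking for data that no longer exists.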


> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.5.0

[jira] Created: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)

[zebra] Abitlity to drop a column group in a table
--

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0



A Zebra table is stored as multiple sub-tables, each containing a set of columns 
called a column group (CG). The user specifies how these columns are grouped 
while creating a table through the _storage hint_.

For some large tables, it might be necessary for users to remove a set 
of columns and retain the rest. This jira provides a way for users to delete an 
entire column group. 

The following comments will have more details on the API and the semantics. 


[jira] Commented: (PIG-985) [zebra] Make necessary changes to build scripts to accommodate new zebra features plus other improvement.

2009-09-30 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761045#action_12761045
 ] 

Raghu Angadi commented on PIG-985:
--

> 5) drop column group change (Raghu Angadi)
> 6) schema package separation change (Yan Zhou)

Just to clarify, this patch does not contain the above two features. It only 
contains a couple of minor changes to build.xml made as part of these features. 
Separate jiras will be filed for these two and other features soon. 


> [zebra] Make necessary changes to build scripts to accommodate new zebra 
> features plus other improvement.
> -
>
> Key: PIG-985
> URL: https://issues.apache.org/jira/browse/PIG-985
> Project: Pig
>  Issue Type: Task
>  Components: build
>Reporter: Chao Wang
>Assignee: Chao Wang
> Attachments: patch
>
>
> The whole task consists of a series of steps as follows:
> 1) nightly test change - prevent checkin tests from running twice in nightly 
> (Chao Wang)
> 2) row based block splits for tables change (Raghu Angadi)
> 3) add clover target (Jing Huang)
> 4) add findbugs target (Chao Wang)
> 5) drop column group change (Raghu Angadi) 
> 6) schema package separation change (Yan Zhou)


[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759789#action_12759789
 ] 

Raghu Angadi commented on PIG-949:
--

I just committed this. Thanks Yan for the fix and Jing for the test!

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.5.0
>
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch
>
>
> Hi,
> The storage hint specification plays an important part in whether the
> output table is readable or not.
> Say we have the map 'map'.
> One can split the map into a column group using [map#{k1}, map#{k2}...]; 
> however, the remaining map fields will automatically be added to the default 
> group.
> If the user tries to create a new column group for the remaining fields as follows:
> [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group,
> the table writer will create the table.
> However, if one tries to load the created table via Pig or via MapReduce 
> using TableInputFormat, the reader has problems reading the map.
> We get the following stack trace:
> 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
> java.io.IOException: getValue() failed: null
> at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Alok


[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

   Resolution: Fixed
Fix Version/s: (was: 0.4.0)
   Status: Resolved  (was: Patch Available)

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.5.0
>
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch

Re: [VOTE] Release Pig 0.4.0 (candidate 2)

2009-09-22 Thread Raghu Angadi



+1. I ran 'ant test-core'.

contrib/zebra: 'ant test' passed after following the suggested directions: 
I got a patch from PIG-660 and hadoop20.jar from PIG-833. For clarity, 
we might attach a patch for PIG-660 that is suitable for 0.4.


Raghu.

Olga Natkovich wrote:

Hi,

The new version is available in
http://people.apache.org/~olga/pig-0.4.0-candidate-2/.

I see one failure in a unit test in piggybank (contrib.) but it is not
related to the functions themselves but seems to be an issue with
MiniCluster and I don't feel we need to chase this down. I made sure
that the same test runs ok with Hadoop 20.

Please, vote by end of day on Thursday, 9/24.

Olga

-----Original Message-----
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Thursday, September 17, 2009 12:09 PM

To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 1)

Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-----Original Message-----
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, September 14, 2009 2:06 PM

To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

I created a candidate build for the Pig 0.4.0 release. The highlights of
this release are:

-  Performance improvements, especially in the area of JOIN
support, where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of sorted data sets.

-  Support for outer join.

-  Works with Hadoop 18.

I ran the release audit and the RAT report looked fine. The relevant part is
attached below.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.

Should we release this? Vote closes on Thursday, 9/17.

Olga

 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
 [java]  !?

[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Fix Version/s: 0.5.0
   0.4.0
   Status: Patch Available  (was: Open)

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.4.0, 0.5.0
>
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch

[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Status: Open  (was: Patch Available)

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch

[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758328#action_12758328
 ] 

Raghu Angadi commented on PIG-949:
--

Yan, please include the test case in the patch. 

Also, I would suggest a regular name for the test case file, something like 
'TestMapAcrossMultipleCGs.java' or something shorter. Inside the file you could 
mention the JIRA number in a comment.

Raghu.

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch

[jira] Assigned: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-03 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi reassigned PIG-918:


Assignee: Yan Zhou

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang: an improper range of random numbers used as the index 
> into the array of column groups always skips the first element, so if no 
> other column group is used, the loop keeps running without a chance 
> to break.  
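The hang described above can be reproduced with a small hypothetical sketch. This is not the actual Zebra code, which is in the attached pig-zebra.patch. It assumes the off-by-one took the form `1 + nextInt(n - 1)`, which can never yield index 0. A loop that waits for a column group used by the projection therefore never ends when only the first group is used. The `maxTries` bound is added only so the demo terminates; the loop described in the report had no such bound.

```java
import java.util.Random;

public class PickColumnGroupSketch {

    // Buggy range (assumed form): 1 + nextInt(n - 1) yields 1..n-1, never 0.
    // Requires n >= 2, since nextInt(0) throws IllegalArgumentException.
    static int buggyIndex(Random rnd, int n) {
        return 1 + rnd.nextInt(n - 1);
    }

    // Correct range: nextInt(n) yields 0..n-1, covering every column group.
    static int fixedIndex(Random rnd, int n) {
        return rnd.nextInt(n);
    }

    /**
     * Draws random indexes until one refers to a column group used by the
     * projection. Returns that index, or -1 if all maxTries draws miss.
     */
    static int pickUsedGroup(boolean[] used, Random rnd, boolean buggy, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            int idx = buggy ? buggyIndex(rnd, used.length) : fixedIndex(rnd, used.length);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Only the first column group is referenced by the projection.
        boolean[] used = {true, false, false};
        System.out.println("fixed: " + pickUsedGroup(used, new Random(1), false, 1000));
        System.out.println("buggy: " + pickUsedGroup(used, new Random(1), true, 1000));
    }
}
```

When only group 0 is used, the fixed range finds it almost immediately. The buggy range can never produce 0, so it exhausts its attempts and returns -1. Without the bound, it would spin forever, which is the reported hang.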


[jira] Resolved: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-03 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi resolved PIG-918.
--

Resolution: Fixed

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch

[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750055#action_12750055
 ] 

Raghu Angadi commented on PIG-918:
--

I just committed this. Thanks Yan.

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch

[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Affects Version/s: (was: 0.3.0)
   0.4.0

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch

[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Attachment: pig-zebra.patch

When you generate a patch with 'git diff', please use 'git diff --no-prefix' so 
that the patch applies with the 'patch -p0' command. I am updating the attached 
patch with this change.


> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang: the random index into the array of column groups is 
> drawn from an improper range that always skips the first element, so if none 
> of the other column groups is used, the loop keeps running without a chance 
> to break.  


[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745219#action_12745219
 ] 

Raghu Angadi commented on PIG-833:
--

Thanks Jing. There are some PIG examples listed at the bottom of the Zebra wiki: 
http://wiki.apache.org/pig/zebra (the wiki is still under construction).

Here are the Java strings from Jing's comment without Jira formatting:

{noformat}
final static String STR_SCHEMA = 
 "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, " +
 "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), " +
 "m1:map(string),m2:map(map(int)), c:collection(f13:double, f14:float, " +
 "f15:bytes)";

final static String STR_STORAGE = 
  "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
  "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
{noformat}
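To make the storage string concrete, the sketch below splits it into its column groups. It is a naive illustration, not Zebra's storage-syntax parser. It assumes each group is written as `[...]`, groups are separated by `;`, and columns within a group are separated by `,`, which happens to hold for this example. The output shows that fields of the same record (`r1.f1`, `r1.f2`) and keys of the same map (`m1#{a}`, `m1#{b}`) can be placed in different column groups.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageSpecSplit {
    // Naive illustration only, not Zebra's storage-syntax parser. It assumes
    // each column group is written as "[...]", groups are separated by ';',
    // and columns inside a group are separated by ','. These assumptions hold
    // for the STR_STORAGE example but not necessarily for every storage spec.
    static List<List<String>> columnGroups(String storage) {
        List<List<String>> groups = new ArrayList<>();
        for (String part : storage.split(";")) {
            String body = part.trim();
            if (body.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            if (!body.startsWith("[") || !body.endsWith("]")) {
                throw new IllegalArgumentException("not a column group: " + body);
            }
            List<String> columns = new ArrayList<>();
            for (String col : body.substring(1, body.length() - 1).split(",")) {
                columns.add(col.trim());
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        String storage =
            "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
            "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
        List<List<String>> groups = columnGroups(storage);
        for (int i = 0; i < groups.size(); i++) {
            // Prints one line per column group, e.g. "CG0: [s1, s2]".
            System.out.println("CG" + i + ": " + groups.get(i));
        }
    }
}
```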

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


Re: Proposal to create a branch for contrib project Zebra

2009-08-18 Thread Raghu Angadi



Right. I just noticed the mails on Pig 0.4.0; I only joined the pig-dev list 
yesterday. Waiting for 0.4.0 might be good enough if it is just a couple 
of weeks. I will keep a watch on it.


I think we will wait for a few days and attach any new feature patches 
to jiras. Those patches can certainly wait there. For interdependencies 
between the patches, we might maintain a private git repository.


Raghu.

Santhosh Srinivasan wrote:

I would recommend that zebra wait for Pig 0.4.0 (a couple of weeks?). A
branch will be created for the 0.4.0 release and zebra will
automatically benefit.

Santhosh

-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Tuesday, August 18, 2009 9:49 AM

To: pig-dev@hadoop.apache.org
Subject: Re: Proposal to create a branch for contrib project Zebra

Milind A Bhandarkar wrote:

Since zebra.jar is not included in pig.jar (I hope not), I can still use
stable zebra jar (binary) with latest pig compiled in trunk.


The problem is that though the current version is "expected to be" 
stable, it would still require some bug fixes. We essentially need to 
maintain another branch (official or a private git) to provide version 
0.1 jar with critical bug fixes.


In that sense, would it be better if we created a "zebra-v1" branch and 
commit the new features to trunk? Maybe for regular users we can create 
Pig.jar and zebra.jar from different lines.

Raghu.


Also, build failure in zebra need not impact pig release, since the other
contrib, i.e. Piggybank is also "build-optional".

I think that creating a branch results in too many changes on that branch
before a mainline merge happens. Each of the feature additions you mention
would be very highly desirable even in the absence of others.

Just my 2 non-binding cents.

- milind

Re: Proposal to create a branch for contrib project Zebra

2009-08-18 Thread Raghu Angadi


Milind A Bhandarkar wrote:


Since zebra.jar is not included in pig.jar (I hope not), I can still use
stable zebra jar (binary) with latest pig compiled in trunk.


The problem is that though the current version is "expected to be" 
stable, it would still require some bug fixes. We essentially need to 
maintain another branch (official or a private git) to provide version 
0.1 jar with critical bug fixes.


In that sense, would it be better if we created a "zebra-v1" branch and 
commit the new features to trunk? Maybe for regular users we can create 
Pig.jar and zebra.jar from different lines.


Raghu.


Also, build failure in zebra need not impact pig release, since the other
contrib, i.e. Piggybank is also "build-optional".

I think that creating a branch results in too many changes on that branch
before a mainline merge happens. Each of the feature additions you mention
would be very highly desirable even in the absence of others.

Just my 2 non-binding cents.

- milind

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi


Raghu Angadi wrote:

Hi Santhosh,

There are two separate things :
  (a) voting a contributor as a committer
  (b) committing to a contrib project.


[...]

Reason for (a) is simple scalability. We can not monitor everything. If 


I meant to say "Reason for (b)" (why contrib commits are treated a bit 
differently).


Our motivation is not to bypass any oversight; it is just that we don't 
want to burden PIG committers too much. We are happy if a PIG committer 
volunteers to oversee and commit.


Raghu.

you or another PIG developer volunteers to commit zebra patches, we are 
more than happy to let you do it. Please let us know. Or at any stage, 
if you feel we may be violating normal conventions (like breaking builds 
or committing some PIG changes), please raise the issue. We have not 
seen serious problems in this regard with any other project; I think we 
should get the benefit of the doubt.


I have not addressed the reason for a new branch here. I will pitch for it 
in another mail.


Raghu.

Santhosh Srinivasan wrote:

Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh
-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM

To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, the first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.


In short, Zebra is a table storage layer built for use in PIG and 
other Hadoop applications.


While we are stabilizing current version V1 in the trunk, we plan to add
more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and
in the new branch. We will merge the branch when it is ready. We 
expect the changes to affect only 'contrib/zebra' directory.


As a regular contributor to Hadoop, I will be the initial committer 
for Zebra. As more patches are contributed by other Zebra developers, 
there might be more committers added through normal Hadoop/Apache 
procedure.


I would like to create a branch called 'zebra-v2' with approval from the PIG
team.

Thanks,
Raghu.

Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi



The reason for a branch is purely based on a fair number of improvements 
we are planning for Zebra and our desire to have a stable Zebra 
implementation for users to use along with PIG on Hadoop-0.20.


New features planned (jiras will be filed soon) :
   * Column security (different permissions for different columns)
   * Ability to drop columns
   * Ability to address "column groups" by name
   * Support for sorted tables and map-side joins
   * ...

Many of these changes involve changes to table metadata, schema syntax, 
and the on-disk format of the metadata (all of these will be backward 
compatible).


If Zebra was a project of its own, one would have made a 0.1.0 branch 
and worked on new features in the trunk. The new proposed branch is for 
achieving the same by keeping PIG and stable Zebra together. PIG branch 
0.4.0 will be made when it is appropriate for PIG. Generally, a contrib 
project should not influence that decision.


Is there an alternative to creating a branch? Would you prefer we commit 
new features to a line that is being used by users?


Raghu.

Milind A Bhandarkar wrote:

IANAC, but my (non-binding) vote is also -1. I think all the improvements
and feature additions to zebra should be available through pig trunk. The
codebase is not big enough to justify creating a branch. If the reason is
Pig's dependence on a checked in hadoop jar, the shims proposal by Dmitry
should be taken up asap, so that those who want to use zebra can use pig
trunk with hadoop 0.20.

- milind


On 8/17/09 5:14 PM, "Yiping Han"  wrote:


+1


On 8/18/09 7:11 AM, "Olga Natkovich"  wrote:


+1

-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, the first version of contrib project Zebra
(PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other
Hadoop applications.

While we are stabilizing current version V1 in the trunk, we plan to add
more new features to it. We would like to create an svn branch for the
new features. We will be responsible for managing zebra in PIG trunk and
in the new branch. We will merge the branch when it is ready. We expect
the changes to affect only 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for
Zebra. As more patches are contributed by other Zebra developers, there
might be more committers added through normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG
team.

Thanks,
Raghu.

--
Yiping Han
F-3140 
(408)349-4403

y...@yahoo-inc.com

[jira] Commented: (PIG-833) Storage access layer

2009-08-17 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744361#action_12744361
 ] 

Raghu Angadi commented on PIG-833:
--


I will try to get some initial docs attached to this jira asap. I think the 
current plan is to have proper wiki pages (and attach them here). This is part of 
the reason why we would like to keep this jira open.

The bulk initial dump is certainly not desirable, but it has been fairly common for 
many contrib projects in Hadoop. The rush to get this committed to contrib 
is in part to avoid such large changes happening again: the longer we delay, the larger 
the patch is going to get. We want to move the subsequent patches and 
discussions to public jiras asap, and we are already doing that.

I would like to clarify that this is not a PIG feature but rather a contrib 
project. We would not want this commit to be generalized to PIG commits. All 
the responsibility is with the Zebra team. This patch is the initial version. It 
does include many tests. 






> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


Re: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi


Hi Santhosh,

There are two separate things :
  (a) voting a contributor as a committer
  (b) committing to a contrib project.

(b):
My experience with Hadoop is that "Contrib" by definition is very 
loosely coupled with core. By convention, we as committers to core 
(hdfs, mapred, etc) did not have to monitor changes to contrib as 
thoroughly as we would monitor core changes. It is the responsibility of 
contrib developers to make sure they are not breaking builds etc. 
Contrib changes get reviewed by people interested in the project.


(a):
Voting takes place when a contributor is being blessed as a committer. 
It involves some legal stuff as well. Although a committer has 
permissions to commit to any part of a project, it is expected that they 
don't misuse it. e.g. if I have a patch for core Map/Reduce, I would 
certainly wait for a regular MR contributor to review it and possibly 
commit it. It does not matter how many patches I might have contributed 
to say HDFS.


Reason for (a) is simple scalability. We can not monitor everything. If 
you or another PIG developer volunteers to commit zebra patches, we are 
more than happy to let you do it. Please let us know. Or at any stage, 
if you feel we may be violating normal conventions (like breaking builds 
or committing some PIG changes), please raise the issue. We have not 
seen serious problems in this regard with any other project; I think we 
should get the benefit of the doubt.


I have not addressed the reason for a new branch here. I will pitch for it 
in another mail.


Raghu.

Santhosh Srinivasan wrote:

Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh 


-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM

To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, the first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.


In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.


While we are stabilizing current version V1 in the trunk, we plan to add
more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and
in the new branch. We will merge the branch when it is ready. We expect 
the changes to affect only 'contrib/zebra' directory.


As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more committers added through normal Hadoop/Apache procedure.


I would like to create a branch called 'zebra-v2' with approval from the PIG
team.

Thanks,
Raghu.

Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Raghu Angadi



Thanks to the PIG team, the first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.


In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.


While we are stabilizing current version V1 in the trunk, we plan to add 
more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and 
in the new branch. We will merge the branch when it is ready. We expect 
the changes to affect only 'contrib/zebra' directory.


As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more committers added through normal Hadoop/Apache procedure.


I would like to create a branch called 'zebra-v2' with approval from PIG 
team.


Thanks,
Raghu.

[jira] Commented: (PIG-833) Storage access layer

2009-08-12 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742435#action_12742435
 ] 

Raghu Angadi commented on PIG-833:
--

>  this means Pig contrib/ is no longer compatible with Hadoop 18.

This is not desirable and is expected to be temporary until PIG-660 is committed. 
PIG-660 has other dependencies and a different schedule. We thought committing zebra 
now would make zebra builds and subsequent patches easier. 

As such, PIG does not build contrib from the top level ('ant test-contrib' is a 
no-op), so each contrib project needs to be built explicitly anyway. This is 
different from the Hadoop build. So this patch should not fail existing automated 
builds.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742069#action_12742069
 ] 

Raghu Angadi commented on PIG-833:
--

Alan, in order to run unit tests you need to build pig test-core.

As mentioned in the instructions above please run {{'ant -Dtestcase=none 
test-core'}} under top level directory before running 'ant test' under 
contrib/zebra.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

Updated patch. The only change is that ant prints a descriptive error to the user if 
hadoop20.jar does not exist in the top level lib directory. The error lists the basic 
steps to get this built until PIG-660 is committed.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736998#action_12736998
 ] 

Raghu Angadi commented on PIG-833:
--

There will be benchmark results attached either to this jira or to a subsequent 
jira.

I would like to compare against SequenceFiles and the new format in Hive. We expect 
to see on-par performance.

Major performance benefits come from commonly used projections (through column 
groups) and map-side joins of sorted tables. An important part of the motivation is 
features such as column security and the ability to delete entire columns. 

We are running some larger scale benchmarks internally, but these run on 
Yahoo's internal data sources.
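The projection benefit comes from reading only the column groups a query needs. As a rough sketch, the set of groups to read is the set of groups that own at least one projected column. The names are hypothetical, not a Zebra API, and the sketch assumes every column is stored in exactly one column group. The group layout in `main` reuses the `STR_STORAGE` example from Jing's comment on this jira.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class ProjectionPlanner {
    // Hypothetical helper, not a Zebra API: returns the indexes of the column
    // groups that hold at least one projected column. Assumes each column is
    // stored in exactly one group, so all other groups can be skipped.
    static SortedSet<Integer> groupsToRead(List<List<String>> groups,
                                           Collection<String> projection) {
        Map<String, Integer> owner = new HashMap<>();
        for (int i = 0; i < groups.size(); i++) {
            for (String col : groups.get(i)) {
                owner.put(col, i);
            }
        }
        SortedSet<Integer> toRead = new TreeSet<>();
        for (String col : projection) {
            Integer g = owner.get(col);
            if (g == null) {
                throw new IllegalArgumentException("unknown column: " + col);
            }
            toRead.add(g);
        }
        return toRead;
    }

    public static void main(String[] args) {
        List<List<String>> groups = new ArrayList<>();
        groups.add(Arrays.asList("s1", "s2"));
        groups.add(Arrays.asList("m1#{a}"));
        groups.add(Arrays.asList("r1.f1"));
        groups.add(Arrays.asList("s3", "s4", "r2.r3.f3"));
        groups.add(Arrays.asList("s5", "s6", "m2#{x|y}"));
        groups.add(Arrays.asList("r1.f2", "m1#{b}"));
        groups.add(Arrays.asList("r2.r3.f4", "m2#{z}"));

        // Projecting s1 and s3 touches only 2 of the 7 column groups.
        System.out.println(groupsToRead(groups, Arrays.asList("s1", "s3"))); // prints [0, 3]
    }
}
```

In this model, grouping columns that are commonly projected together is what keeps the returned set small.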


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: zebra-javadoc.tgz

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch

The first cut of contrib/zebra. The patch is very large; we should probably 
compress subsequent versions of it.

More documentation on design and usage will be added to the jira.

How to compile :
--
 * check out latest PIG trunk
 * Apply the latest patch from PIG-660
 * copy attached hadoop20.jar to ./lib
 * run '{{ant jar}}' (and {{'ant -Dtestcase=none test-core'}} for zebra tests).
 * cd contrib/zebra
 * ant jar
 * ant test (for tests).

Currently there are compile-time deprecation warnings related to the use of the 
deprecated mapred API (JobConf). These will be fixed later.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736424#action_12736424
 ] 

Raghu Angadi commented on PIG-833:
--


Will surely look at Hive's storage layer and SerDe. I will be able to comment 
better on specifics once I get a better handle on them. In the meanwhile, I will 
attach the work that has already been done on Zebra. 

This is currently a contrib in PIG. Based on these experiences we could 
probably provide a common storage layer more widely suitable for multiple 
Hadoop related projects.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: hadoop20.jar.bz2

Attaching hadoop20.jar, which needs to be placed in the lib/ directory under the 
top level PIG directory. I will include specific instructions later in the jira.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-660:
-

Attachment: PIG-660_6.patch

Updated patch fixes two minor conflicts with the current pig trunk.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and
> reduce task in a MapReduce job. This will allow better error reporting. Other
> items that could become Hadoop feature requests or bug reports are
> documented here for tracking:
> 1. Hadoop should return objects instead of strings when exceptions are thrown.
> 2. The JobControl should handle all exceptions and report them appropriately.
> For example, when the JobControl fails to launch jobs, it should handle the
> exceptions and provide APIs to query that state, i.e., the failure to launch
> jobs.
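The two requested behaviours (exceptions kept as objects, and a query API for launch failures) can be sketched as follows. This is a minimal illustration under stated assumptions: `SafeJobController`, `Launcher` and `LaunchFailure` are hypothetical names, not Hadoop's `JobControl` API, and the launch hook stands in for real job submission.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical record of a job that could not be launched. */
final class LaunchFailure {
    final String jobName;
    final Exception cause; // kept as an object, not flattened to a string

    LaunchFailure(String jobName, Exception cause) {
        this.jobName = jobName;
        this.cause = cause;
    }
}

/** Hypothetical controller that records every launch exception per job. */
final class SafeJobController {
    /** Hypothetical launch hook; a real controller would submit a MapReduce job. */
    interface Launcher {
        void launch(String jobName) throws Exception;
    }

    private final Launcher launcher;
    private final List<String> launched = new ArrayList<>();
    private final List<LaunchFailure> failures = new ArrayList<>();

    SafeJobController(Launcher launcher) {
        this.launcher = launcher;
    }

    /** Tries to launch every job; a failure is recorded and does not stop the rest. */
    void launchAll(List<String> jobNames) {
        for (String name : jobNames) {
            try {
                launcher.launch(name);
                launched.add(name);
            } catch (Exception e) {
                failures.add(new LaunchFailure(name, e));
            }
        }
    }

    /** Query API for the "failed to launch" state. */
    List<LaunchFailure> getLaunchFailures() {
        return Collections.unmodifiableList(failures);
    }

    List<String> getLaunchedJobs() {
        return Collections.unmodifiableList(launched);
    }
}
```

Because each `LaunchFailure` carries the original exception, a caller such as Pig can inspect its type and cause chain instead of parsing a message string.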

[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736297#action_12736297
 ] 

Raghu Angadi commented on PIG-660:
--

Thanks, Olga and Santhosh.

The build.xml change is already in the patch.

I will attach a hadoop20.jar that works with Pig. It is useful for anyone who
wants to try out the patch, and it will also be used by Zebra (PIG-833).
Please commit the jar file to the Pig trunk. It can be updated later with a
build from a newer revision of the hadoop-0.20 branch.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and
> reduce task in a MapReduce job. This will allow better error reporting. Other
> items that could become Hadoop feature requests or bug reports are
> documented here for tracking:
> 1. Hadoop should return objects instead of strings when exceptions are thrown.
> 2. The JobControl should handle all exceptions and report them appropriately.
> For example, when the JobControl fails to launch jobs, it should handle the
> exceptions and provide APIs to query that state, i.e., the failure to launch
> jobs.

[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736264#action_12736264
 ] 

Raghu Angadi commented on PIG-660:
--

Currently, the Hadoop 0.18 jar under lib/ is called hadoop18.jar. Should we
change build.xml to use hadoop20.jar instead of hadoop18.jar?

I can file a JIRA to commit hadoop20.jar. That jar might be replaced by an
updated one when this JIRA is committed.
