[jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850484#action_12850484 ]
Hadoop QA commented on PIG-1306:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439937/PIG-1306.patch against trunk revision 928080.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 35 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/console
This message is automatically generated.
[zebra] Support of locally sorted input splits
Key: PIG-1306
URL: https://issues.apache.org/jira/browse/PIG-1306
Project: Pig
Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0
Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch
Currently, Zebra supports sorted or unsorted input splits on sorted tables or unions of sorted tables. The sorted input splits are based on non-overlapping key ranges, so they are globally sorted: each split is locally sorted, and no two splits have overlapping key ranges.
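To make the key-range scheme concrete, here is a minimal Python sketch. It is not Zebra code: `key_range_splits` is a hypothetical helper and an in-memory list of keys stands in for a sorted table. It cuts the sorted keys into roughly equal contiguous pieces but only cuts between two different keys, which keeps the splits globally sorted. The sample data also shows how a long run of one duplicated key cannot be divided.

```python
from bisect import bisect_right


def key_range_splits(sorted_keys, num_splits):
    """Hypothetical helper: cut a sorted key list into contiguous,
    non-overlapping key ranges.

    Cuts are only placed between two different keys, so every row with a
    given key lands in the same split and the splits are globally sorted.
    """
    n = len(sorted_keys)
    rows_per_split = max(1, -(-n // num_splits))  # ceiling division
    splits, start = [], 0
    while start < n:
        end = min(start + rows_per_split, n)
        if end < n:
            # Move the cut past the rest of the run of the boundary key.
            end = bisect_right(sorted_keys, sorted_keys[end - 1], end)
        splits.append(sorted_keys[start:end])
        start = end
    return splits


if __name__ == "__main__":
    # Skewed data: key 7 accounts for 10 of the 15 rows.
    keys = [1, 2, 3] + [7] * 10 + [8, 9]
    for split in key_range_splits(keys, 5):
        print(len(split), split[0], split[-1])
```

With five splits requested, only three come back (sizes 3, 10 and 2), and the split holding key 7 carries 10 of the 15 rows no matter how many mappers are available.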
The biggest problem with key-range splits is the performance hit they suffer when the data are skewed, particularly when a key range consists solely of one duplicated key. That chunk of duplicate-key data is then virtually unsplittable no matter how many mappers are available: it has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is using a Zebra sorted table as the probe table in a map-side merge inner join.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
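The merge-join case can be sketched in Python as well, under stated assumptions: the probe table is the input divided among mappers, and the other join input is a fully sorted table that a mapper can seek into by key. `fixed_size_splits` and `merge_join_split` are hypothetical names, and this is not Pig or Zebra code. Splits are cut by size, even through a run of duplicate keys, so they are only locally sorted; each one is still joined correctly on its own by seeking once and then merging forward.

```python
from bisect import bisect_left


def fixed_size_splits(rows, rows_per_split):
    """Hypothetical helper: cut key-sorted rows into equal-sized chunks, even
    inside a run of equal keys. Each chunk is sorted, but neighbouring chunks
    may share a key, so the chunks are only locally sorted."""
    return [rows[i:i + rows_per_split] for i in range(0, len(rows), rows_per_split)]


def merge_join_split(probe_split, build_keys, build_vals):
    """Inner-join one locally sorted probe split against a fully sorted table.

    probe_split: (key, value) pairs sorted by key.
    build_keys, build_vals: parallel lists for the other input, sorted by key.
    """
    out = []
    if not probe_split:
        return out
    # Seek once to the first candidate row for this split's smallest key.
    j = bisect_left(build_keys, probe_split[0][0])
    for key, pval in probe_split:
        # Probe keys never decrease within a split, so the cursor only moves forward.
        while j < len(build_keys) and build_keys[j] < key:
            j += 1
        k = j
        while k < len(build_keys) and build_keys[k] == key:
            out.append((key, pval, build_vals[k]))
            k += 1
    return out


if __name__ == "__main__":
    probe = [(1, "p1")] + [(7, f"p{i}") for i in range(2, 12)] + [(9, "p12")]
    build_keys, build_vals = [1, 7, 8, 9], ["b1", "b7", "b8", "b9"]
    # Each split would go to a different mapper; key 7 is spread over all four.
    splits = fixed_size_splits(probe, 3)
    joined = [row for s in splits for row in merge_join_split(s, build_keys, build_vals)]
    print(len(splits), len(joined))
```

The duplicated key 7 is now shared by four splits instead of being confined to one, yet concatenating the per-split results gives the same 12 rows as a join over the whole probe table.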
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850485#action_12850485 ]
Hadoop QA commented on PIG-1331:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439938/owl.contrib.3.tgz against trunk revision 928080.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/253/console
Owl Hadoop Table Management Service
Key: PIG-1331
URL: https://issues.apache.org/jira/browse/PIG-1331
Project: Pig
Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
Attachments: owl.contrib.3.tgz
This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and to abstract away the complexities of reading and writing huge amounts of data from and to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat, and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata store. Owl integrates with different storage modules, such as Zebra, through a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850638#action_12850638 ]
Jay Tang commented on PIG-1331:
Owl's data access API, OwlInputFormat, provides a uniform API to access data stored in different storage formats such as Zebra, RCFile, and SequenceFile. It's a single data access abstraction on top of disparate data.
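Jay Tang's comment describes OwlInputFormat as one data access abstraction over several storage formats. The Python sketch below illustrates that idea under loudly stated assumptions: `StorageDriver`, `CsvDriver`, `KeyValueDriver` and `TableReader` are hypothetical names, not Owl's actual API; two toy text formats stand in for Zebra, RCFile and SequenceFile; and a plain dictionary stands in for Owl's metadata store. It follows the two-step pattern of Hadoop's InputFormat (enumerate splits, then read records from each split).

```python
# Hypothetical sketch of a format-agnostic reader; not Owl's API.
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, str]


class StorageDriver(ABC):
    """Hypothetical plug-in point: one subclass per on-disk format."""

    def __init__(self, blobs: Dict[str, str]):
        self.blobs = blobs  # split path -> raw contents (text here, for brevity)

    def get_splits(self, location: str) -> List[str]:
        # InputFormat-style step 1: enumerate the splits under a location.
        return sorted(p for p in self.blobs if p.startswith(location + "/"))

    @abstractmethod
    def read_split(self, split: str) -> Iterator[Record]:
        """InputFormat-style step 2: decode one split into records."""


class CsvDriver(StorageDriver):
    """Toy format: comma-separated text with a header line."""

    def read_split(self, split):
        header, *lines = self.blobs[split].strip().splitlines()
        columns = header.split(",")
        for line in lines:
            yield dict(zip(columns, line.split(",")))


class KeyValueDriver(StorageDriver):
    """Toy format: one 'col=value;col=value' record per line."""

    def read_split(self, split):
        for line in self.blobs[split].strip().splitlines():
            yield dict(pair.split("=", 1) for pair in line.split(";"))


class TableReader:
    """Single access point: callers name a table, never a storage format."""

    def __init__(self):
        # Stand-in for a metadata store: table -> (location, driver).
        self._catalog: Dict[str, Tuple[str, StorageDriver]] = {}

    def register(self, table: str, location: str, driver: StorageDriver) -> None:
        self._catalog[table] = (location, driver)

    def read(self, table: str) -> Iterator[Record]:
        location, driver = self._catalog[table]
        for split in driver.get_splits(location):
            yield from driver.read_split(split)


if __name__ == "__main__":
    reader = TableReader()
    reader.register("users_csv", "/t1",
                    CsvDriver({"/t1/part-0": "id,name\n1,a\n2,b", "/t1/part-1": "id,name\n3,c"}))
    reader.register("users_kv", "/t2",
                    KeyValueDriver({"/t2/part-0": "id=1;name=a\nid=2;name=b\nid=3;name=c"}))
    print(list(reader.read("users_csv")) == list(reader.read("users_kv")))
```

The caller only ever names a table; which driver decodes the bytes is decided by the catalog, so adding a format means adding a driver rather than changing every consumer.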
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850639#action_12850639 ]
Carl Steinbach commented on PIG-1331:
bq. Owl's data access API, OwlInputFormat, provides a uniform API to access data stored in different storage formats such as Zebra, RCFile, and SequenceFile. It's a single data access abstraction on top of disparate data.
This sounds like Hive's SerDe interface. Are there any differences?