[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773453#action_12773453 ]
Hadoop QA commented on PIG-997:
-------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12423995/SortedTable.patch
  against trunk revision 832599.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 177 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/138/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/138/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/138/console

This message is automatically generated.

> [zebra] Sorted Table Support by Zebra
> -------------------------------------
>
>                 Key: PIG-997
>                 URL: https://issues.apache.org/jira/browse/PIG-997
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>             Fix For: 0.6.0
>
>         Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch,
>                      SortedTable.patch
>
>
> This new feature is for Zebra to support sorted data in storage. As a storage
> library, Zebra will not sort the data by itself. But it will support creation
> and use of sorted data either through PIG or through map/reduce tasks that
> use Zebra as storage format.
>
> The sorted table keeps the data in a "totally sorted" manner across all
> TFiles created by potentially all mappers or reducers.
>
> For sorted data creation through PIG's STORE operator, if the input data is
> sorted through "ORDER BY", the new Zebra table will be marked as sorted on
> the sorted columns.
>
> For sorted data creation through Map/Reduce tasks, three new static methods
> of the BasicTableOutput class will be provided to allow or help the user to
> achieve the goal. "setSortInfo" allows the user to specify the sorted columns
> of the input tuple to be stored; "getSortKeyGenerator" and "getSortKey" help
> the user to generate the key acceptable by Zebra as a sorted key based upon
> the schema, sorted columns and the input tuple.
>
> For sorted data read through PIG's LOAD operator, pass the string "sorted" as
> an extra argument to the TableLoader constructor to ask for a sorted table to
> be loaded.
>
> For sorted data read through Map/Reduce tasks, a new static method of the
> TableInputFormat class, requireSortedTable, can be called to ask for a sorted
> table to be read. Additionally, an overloaded version of the new method can
> be called to ask for a sorted table on specified sort columns and comparator.
>
> For this release, sorted tables support sorting in ascending order only, not
> in descending order. In addition, the sort keys must be of simple types, not
> complex types such as RECORD, COLLECTION and MAP.
>
> Multiple-key sorting is supported, but the ordering of the multiple sort keys
> is significant: the first sort column is the primary sort key, the second is
> the secondary sort key, and so on.
>
> In this release, the sort keys are stored along with the sort columns from
> which the keys were created, resulting in some data storage redundancy.
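
The ordering rules in the description (ascending only, first sort column as
primary key, second as secondary key, and a total order across all files) can
be illustrated with a minimal standalone Java sketch. It does not use the Zebra
API at all: Row, BY_NAME_THEN_AGE, sorted and isTotallySorted are hypothetical
names made up for this illustration, and plain in-memory lists stand in for
TFiles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedTableSketch {
    /** Hypothetical two-column row used only for this illustration; not a Zebra type. */
    static final class Row {
        final String name;
        final int age;

        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + ":" + age;
        }
    }

    /** Ascending on name (primary sort key), then ascending on age (secondary sort key). */
    static final Comparator<Row> BY_NAME_THEN_AGE =
            Comparator.comparing((Row r) -> r.name).thenComparingInt(r -> r.age);

    /** Returns a sorted copy of the rows, leaving the input list untouched. */
    static List<Row> sorted(List<Row> rows, Comparator<Row> order) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    /**
     * Returns true if the rows are in order inside every file and also across
     * file boundaries, i.e. the concatenation of all files is sorted.
     */
    static boolean isTotallySorted(List<List<Row>> files, Comparator<Row> order) {
        Row previous = null;
        for (List<Row> file : files) {
            for (Row row : file) {
                if (previous != null && order.compare(previous, row) > 0) {
                    return false;
                }
                previous = row;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("bob", 30), new Row("alice", 41), new Row("bob", 25));
        // Ties on the primary key (name) are broken by the secondary key (age).
        System.out.println(sorted(rows, BY_NAME_THEN_AGE)); // prints [alice:41, bob:25, bob:30]
    }
}
```

Swapping the two keys in the comparator produces a different order for the same
rows, which is why the order in which sort columns are listed matters.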