[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to "order by" clause of RDBMS)

Anil Gupta (JIRA) Thu, 10 Jan 2013 01:02:36 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549452#comment-13549452
 ]


Anil Gupta commented on HBASE-7474:
-----------------------------------

License headers in SortingClient.java and 
BigDecimalSortingColumnInterpreter.java are not properly formatted.
Anil: Pending
Some log statements, such as the following, can be at debug level.

+      log.info("Querying only one region for sorting");
Anil: Done

+            if (sortDecreasing) return instance.sortDecreasing(scan, 
columnFamily, columnQualifier,
+              colInterpreter, startIndex, pageSize, true);
+            else return instance.sortIncreasing(scan, columnFamily, 
columnQualifier,

'else' keyword is not needed above.
Anil: Done. But, personally i like to explicitly use the "else" keyword for 
readability purpose. I am curious to know if there is any technical reason for 
not using "else" in the above case?

+   * This method is used to do the merge sort the rows from multiple regions 
and produce the final output

Remove 'do the'. Wrap long line.
Anil: Done

+    for (Map.Entry<byte[], Result[]> regionResultsEntryMap : 
regionResultMap.entrySet()) {

regionResultsEntryMap -> regionResultsEntry or regionResultsMapEntry
Anil: Done

+    if(totalNoOfRows < startIndex)
+    {

Normally left brace is on the same line as if statement. Insert a space between 
if and (.
Anil: Done

currentMaxorMinValueRegion and maxOrMin are used in the if / else blocks. You 
can move them inside if / else block and give them names that are clearer in 
meaning.
Anil: Done

+        for (Result[] regionResult : regionResults) {
+          if ((regionResult.length - 1) < arrayIndex[regionNum]) {

regionResults and arrayIndex are both arrays. So you can use the same index to 
access them - in my opinion the code is more readable.
Anil:Pending

+          finalResult[finalResultCurrentSize++] = 
regionResults[currentMaxorMinValueRegion][arrayIndex[currentMaxorMinValueRegion]];

Wrap long line above.
Anil: Done

+          if (colInterpreter.compare(tmp, maxOrMin) > 0) {

If I read the code correctly, the above comparison is the major difference 
between ascending and descending sorting. A little abstraction would allow you 
to unify the two cases.
Anil: IMHO, the only way to do that is to put an If(sortDecresing) condition 
and then either do ">" or "<" comparison on the basis of sortDecreasing. I am 
worried that this abstraction will make the implementation a tab more slow 
since the worst case complexity of this sorting is O(n*n). I would prefer 
performance over few extra lines of code. Let me know your views.

Looking at SortingColumnInterpreter, this is the only method which is not 
present in ColumnInterpreter:

+  T getValue(KeyValue kv) throws IOException;

The following method is already provided by ColumnInterpreter:

  public abstract T getValue(byte[] colFamily, byte[] colQualifier, KeyValue kv)
      throws IOException;
Anil: I am thinking of adding the missing method "T getValue(KeyValue kv) 
throws IOException;" in ColumnInterpreter. Is that fine? I dont understand why 
we need colFamily and colQualifier in getValue method when only a KeyValue is 
passed.

Please consider dropping SortingColumnInterpreter

Thanks a lot for doing the code review.
~Anil Gupta
Software Engineer II, Intuit, Inc

                
> Endpoint Implementation to support Scans with Sorting of Rows based on column 
> values(similar to "order by" clause of RDBMS)
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7474
>                 URL: https://issues.apache.org/jira/browse/HBASE-7474
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors, Scanners
>    Affects Versions: 0.94.3
>            Reporter: Anil Gupta
>            Assignee: Anil Gupta
>            Priority: Minor
>              Labels: coprocessors, scan, sort
>             Fix For: 0.94.5
>
>         Attachments: hbase-7474.patch, hbase-7474-v2.patch, 
> SortingEndpoint_high_level_flowchart.pdf
>
>
> Recently, i have developed an Endpoint which can sort the Results(rows) on 
> the basis of column values. This functionality is similar to "order by" 
> clause of RDBMS. I will be submitting this Patch for HBase0.94.3
> I am almost done with the initial development and testing of feature. But, i 
> need to write the JUnits for this. I will also try to make design doc.
> Thanks,
> Anil Gupta
> Software Engineer II, Intuit, inc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to "order by" clause of RDBMS)

Reply via email to