[jira] [Commented] (HBASE-13721) Improve shell scan performances when using LIMIT

Jean-Marc Spaggiari (JIRA) Wed, 20 May 2015 08:12:07 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552460#comment-14552460
 ]


Jean-Marc Spaggiari commented on HBASE-13721:
---------------------------------------------

Same thing in other places:
{code}
    
    #----------------------------------------------------------------------------------------------
    # Count rows in a table
    def _count_internal(interval = 1000, caching_rows = 10)
      # We can safely set scanner caching with the first key only filter
      scan = org.apache.hadoop.hbase.client.Scan.new
      scan.setCacheBlocks(false)
      scan.setCaching(caching_rows)
      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)

      # Run the scanner
      scanner = @table.getScanner(scan)
      count = 0
      iter = scanner.iterator

      # Iterate results
      while iter.hasNext
        row = iter.next
        count += 1
        next unless (block_given? && count % interval == 0)
        # Allow command modules to visualize counting process
        yield(count,
              org.apache.hadoop.hbase.util.Bytes::toStringBinary(row.getRow))
      end

      # Return the counter
      return count
    end
{code}

Here again, the scanner is not closed. I will open a JIRA and look into that.
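
For illustration only, here is a minimal sketch of how the counting loop could release its scanner with begin/ensure. It is not the actual patch. FakeScanner and count_with_close are hypothetical names, and the scanner is a stand-in so the snippet runs without HBase; in the shell it would be the object returned by @table.getScanner(scan).

```ruby
# Hypothetical stand-in for an HBase ResultScanner, so the sketch runs
# without a cluster. It only tracks whether close was called.
class FakeScanner
  attr_reader :closed

  def initialize(rows)
    @rows = rows
    @closed = false
  end

  def each(&block)
    @rows.each(&block)
  end

  def close
    @closed = true
  end
end

# Count rows and always close the scanner, even if the caller's block raises.
# (Hypothetical helper, modelled on _count_internal above.)
def count_with_close(scanner, interval = 1000)
  count = 0
  begin
    scanner.each do |row|
      count += 1
      next unless block_given? && (count % interval).zero?
      # Allow the caller to visualize the counting process
      yield(count, row)
    end
  ensure
    # Runs whether iteration completed or an exception was raised
    scanner.close
  end
  count
end
```

The ensure clause is the point here: the scanner is released on the normal path and on the error path alike, which the quoted method does not do.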

> Improve shell scan performances when using LIMIT
> ------------------------------------------------
>
>                 Key: HBASE-13721
>                 URL: https://issues.apache.org/jira/browse/HBASE-13721
>             Project: HBase
>          Issue Type: Bug
>          Components: shell
>    Affects Versions: 1.1.0
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>         Attachments: HBASE-13721-v0-trunk.txt
>
>
> When doing a scan which is expected to return the exact same number of rows 
> as the LIMIT we give, we still scan the entire table until we return the 
> row(s) and then test the number of rows we have. This can take a lot of time.
> Example:
> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], 
> STARTROW => '000a', LIMIT => 1 }
> This is because we break on the limit condition AFTER we ask for the 
> next row. If there is none, we scan the entire table and then exit.
> The goal of this patch is to handle this specific case without impacting the 
> others.
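
The ordering problem described above can be modelled with a small sketch. This is a simplified illustration, not the shell's actual scan code: CountingIterator, scan_check_after and scan_check_before are hypothetical names, and has_next? stands in for the potentially expensive request for the next row.

```ruby
# Hypothetical iterator that records how often the next row was requested.
class CountingIterator
  attr_reader :fetches

  def initialize(rows)
    @rows = rows
    @pos = 0
    @fetches = 0
  end

  # Stands in for the costly "is there another row?" request
  def has_next?
    @fetches += 1
    @pos < @rows.length
  end

  def next_row
    row = @rows[@pos]
    @pos += 1
    row
  end
end

# Limit tested only AFTER asking for the next row (the behaviour described).
def scan_check_after(iter, limit)
  out = []
  while iter.has_next?
    break if out.length >= limit
    out << iter.next_row
  end
  out
end

# Limit tested BEFORE asking for the next row.
def scan_check_before(iter, limit)
  out = []
  out << iter.next_row while out.length < limit && iter.has_next?
  out
end
```

Both variants return the same rows. The second simply avoids the extra request once the limit is reached, which is where the full-table scan would otherwise happen when no further matching row exists.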



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
