[
https://issues.apache.org/jira/browse/HBASE-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552460#comment-14552460
]
Jean-Marc Spaggiari commented on HBASE-13721:
---------------------------------------------
The same thing happens in other places, for example:
{code}
#----------------------------------------------------------------------------------------------
# Count rows in a table
def _count_internal(interval = 1000, caching_rows = 10)
  # We can safely set scanner caching with the first key only filter
  scan = org.apache.hadoop.hbase.client.Scan.new
  scan.setCacheBlocks(false)
  scan.setCaching(caching_rows)
  scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)

  # Run the scanner
  scanner = @table.getScanner(scan)
  count = 0
  iter = scanner.iterator

  # Iterate results
  while iter.hasNext
    row = iter.next
    count += 1
    next unless (block_given? && count % interval == 0)
    # Allow command modules to visualize counting process
    yield(count,
          org.apache.hadoop.hbase.util.Bytes::toStringBinary(row.getRow))
  end

  # Return the counter
  return count
end
{code}
Here again, the scanner is never closed. I will open a JIRA and look into it.
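For illustration, one way to guarantee the close is to wrap the iteration in begin/ensure. This is only a minimal sketch, not the actual fix: FakeScanner and count_rows are hypothetical names made up for this example, and FakeScanner stands in for the real scanner so the snippet runs without a cluster.

```ruby
# Hypothetical stand-in for the real scanner; it only records whether close was called.
class FakeScanner
  attr_reader :closed

  def initialize(rows)
    @rows = rows
    @closed = false
  end

  # Returns an enumerator over the rows, close enough to scanner.iterator for this sketch.
  def iterator
    @rows.each
  end

  def close
    @closed = true
  end
end

# Count rows and release the scanner even if the iteration or the caller's block raises.
def count_rows(scanner, interval = 1000)
  count = 0
  begin
    scanner.iterator.each do |row|
      count += 1
      next unless block_given? && count % interval == 0
      # Let the caller visualize the counting progress
      yield(count, row)
    end
  ensure
    # Runs on both normal exit and exceptions
    scanner.close
  end
  count
end

scanner = FakeScanner.new(%w[r1 r2 r3 r4 r5])
total = count_rows(scanner, 2) { |n, row| puts "#{n} rows, current row: #{row}" }
puts "total=#{total} closed=#{scanner.closed}"
```

The ensure clause is what matters here: the scanner is released whether the loop finishes normally or the progress block raises.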
> Improve shell scan performances when using LIMIT
> ------------------------------------------------
>
> Key: HBASE-13721
> URL: https://issues.apache.org/jira/browse/HBASE-13721
> Project: HBase
> Issue Type: Bug
> Components: shell
> Affects Versions: 1.1.0
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Attachments: HBASE-13721-v0-trunk.txt
>
>
> When doing a scan which is expected to return exactly the same number of rows
> as the LIMIT we give, we still scan the entire table until we return the
> row(s) and only then test the number of rows we have. This can take a lot of time.
> Example:
> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'],
> STARTROW => '000a', LIMIT => 1 }
> This is because we break on the limit condition only AFTER we ask for the
> next row. If there is none, we scan the entire table and then exit.
> The goal of this patch is to handle this specific case without impacting the
> others.
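The ordering problem described in the issue can be sketched as follows. This is not the attached patch, only an illustration under stated assumptions: CountingIterator and scan_with_limit are hypothetical names, and the counter merely shows how often the loop asks for another row (in the real shell, that request is what triggers the expensive scan).

```ruby
# Hypothetical iterator that counts how many times the caller asks for another row.
class CountingIterator
  attr_reader :has_next_calls

  def initialize(rows)
    @rows = rows
    @pos = 0
    @has_next_calls = 0
  end

  def has_next?
    @has_next_calls += 1
    @pos < @rows.length
  end

  def next_row
    row = @rows[@pos]
    @pos += 1
    row
  end
end

# Test the limit BEFORE asking the iterator for another row, so that reaching
# the limit never triggers one more lookup.
def scan_with_limit(iter, limit)
  results = []
  while results.length < limit && iter.has_next?
    results << iter.next_row
  end
  results
end

iter = CountingIterator.new(%w[000a 000b 000c])
rows = scan_with_limit(iter, 1)
puts "rows=#{rows.inspect} has_next_calls=#{iter.has_next_calls}"
```

Because the limit test comes first in the loop condition, the iterator is not consulted again once enough rows have been collected.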
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)