Andrew Purtell wrote:
..
Maybe a map of MapFile to row count estimations can be stored in the FS next to 
the MapFiles and can be updated appropriately during compactions. Then a client 
can iterate over the regions of a table, ask the regionservers involved for row 
count estimations, the regionservers can consult the estimation-map and send 
the largest count found there for the table plus the largest memcache count for 
the table, and finally the client can total all of the results.
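The aggregation described above can be sketched in plain Java. This is only an illustration of the arithmetic, not any actual HBase API: the class names (`RegionEstimate`, `estimateTableRows`) and the sample numbers are hypothetical. Each regionserver would answer with its largest per-MapFile estimate plus its memcache count, and the client sums the per-region answers:

```java
import java.util.*;

public class RowCountEstimator {
    // Hypothetical per-region data: one stored estimate per MapFile, plus
    // the count of rows sitting in the memcache awaiting a flush.
    static class RegionEstimate {
        final List<Long> mapFileEstimates;
        final long memcacheCount;
        RegionEstimate(List<Long> mapFileEstimates, long memcacheCount) {
            this.mapFileEstimates = mapFileEstimates;
            this.memcacheCount = memcacheCount;
        }
        // What a regionserver would send back for this region: the largest
        // MapFile estimate plus the memcache count.
        long estimate() {
            long largest = 0;
            for (long e : mapFileEstimates) largest = Math.max(largest, e);
            return largest + memcacheCount;
        }
    }

    // The client totals the answers from every region of the table.
    static long estimateTableRows(Collection<RegionEstimate> regions) {
        long total = 0;
        for (RegionEstimate r : regions) total += r.estimate();
        return total;
    }

    public static void main(String[] args) {
        List<RegionEstimate> regions = Arrays.asList(
            new RegionEstimate(Arrays.asList(1000L, 1200L), 50L),
            new RegionEstimate(Arrays.asList(800L), 10L));
        // max(1000,1200)+50 = 1250, 800+10 = 810, total 2060
        System.out.println(estimateTableRows(regions)); // prints 2060
    }
}
```

Since the per-MapFile numbers would only be refreshed at compaction time, the result is naturally an estimate, which fits the caveat about sparsely populated columns.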

I like this idea. Suggest sticking it in the issue. Each store already has an accompanying 'meta' file under the sympathetic 'info' dir. Could stuff estimates in here. An estimate of rows would also help size bloom filters when the 'enable-bloomfilters' switch is thrown. We'd have to be clear that this count is an estimate, particularly when rows have sparsely populated columns.

St.Ack


   - Andy

From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
Subject: Re: any chance to get the size of a table?
To: [email protected]
Date: Monday, July 21, 2008, 6:43 AM

Zhao,

Yes, the only way is to use a scanner, but it will take a
_long_ time. HBASE-32 is about adding a row count
estimator. For those who want to know why it's so slow:
a scanner that visits each row of a table has to
do a disk read for each one of them (except
for the stuff in the memcache that is waiting to be flushed).
If you have 6 500 000 rows, like I saw last week on the IRC
channel, it may well take over 80 minutes (it depends on the
cpu/io/network load, hardware, etc).
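The 80-minute figure works out to under a millisecond of read latency per row. A quick check of that arithmetic, where the 0.75 ms/row figure is an assumed average (amortized disk seek plus RPC overhead), not a measured number:

```java
public class ScanTimeEstimate {
    public static void main(String[] args) {
        long rows = 6_500_000L;
        double msPerRow = 0.75; // assumed average per-row read cost
        // total milliseconds -> seconds -> minutes
        double minutes = rows * msPerRow / 1000.0 / 60.0;
        System.out.printf("~%.0f minutes%n", minutes); // ~81 minutes
    }
}
```

So even a sub-millisecond per-row cost puts a full scan of a table that size over the 80-minute mark, which is why an estimator that avoids the scan entirely is attractive.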

J-D



