Andrew Purtell wrote:
..
> Maybe a map of MapFile to row count estimations can be stored in the FS next to
> the MapFiles and can be updated appropriately during compactions. Then a client
> can iterate over the regions of a table, ask the regionservers involved for row
> count estimations, the regionservers can consult the estimation-map and send
> the largest count found there for the table plus the largest memcache count for
> the table, and finally the client can total all of the results.
I like this idea. Suggest sticking it in the issue. Each store already
has an accompanying 'meta' file under the sympathetic 'info' dir. Could
stuff estimates in here. An estimate of row count would also help sizing bloom
filters when the 'enable-bloomfilters' switch is thrown. We'd have to
be clear this count is an estimate, particularly when rows have sparsely
populated columns.
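The aggregation Andy describes could be sketched roughly as below. This is purely illustrative plain Java, not HBase API: the class and method names are made up, and the per-MapFile estimates and memcache counts stand in for whatever the regionservers would read out of the proposed 'meta' files.

```java
import java.util.*;

// Hypothetical sketch of the proposed client-side aggregation: per region,
// take the largest per-MapFile row-count estimate, add that region's
// memcache row count, then total across all regions of the table.
public class RowCountEstimate {

    // Largest MapFile estimate for a region, plus its memcache rows.
    static long estimateRegion(List<Long> mapFileEstimates, long memcacheRows) {
        long largest = 0;
        for (long e : mapFileEstimates) {
            largest = Math.max(largest, e);
        }
        return largest + memcacheRows;
    }

    // Client totals the per-region estimates over the whole table.
    static long estimateTable(Map<String, List<Long>> estimatesByRegion,
                              Map<String, Long> memcacheByRegion) {
        long total = 0;
        for (Map.Entry<String, List<Long>> e : estimatesByRegion.entrySet()) {
            Long mem = memcacheByRegion.get(e.getKey());
            total += estimateRegion(e.getValue(), mem == null ? 0L : mem);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> est = new HashMap<>();
        est.put("regionA", Arrays.asList(1200L, 900L));
        est.put("regionB", Arrays.asList(3000L));
        Map<String, Long> mem = new HashMap<>();
        mem.put("regionA", 50L);
        mem.put("regionB", 10L);
        // regionA: max(1200, 900) + 50 = 1250; regionB: 3000 + 10 = 3010
        System.out.println(estimateTable(est, mem)); // prints 4260
    }
}
```

As St.Ack notes, this stays an estimate: rows spread across sparsely populated columns can appear in several MapFiles, so taking the largest per-store count rather than a sum is what keeps the figure from wildly overcounting.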
St.Ack
- Andy
From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
Subject: Re: any chance to get the size of a table?
To: [email protected]
Date: Monday, July 21, 2008, 6:43 AM
Zhao,
Yes, the only way is to use a scanner, but it will take a
_long_ time. HBASE-32 is about adding a row count
estimator. For those who want to know why it's so slow:
a scanner that visits each row of a table requires
a read request on disk for each one of them (except
for the stuff in the memcache that waits to be flushed).
If you have 6,500,000 rows, as I saw last week on the IRC
channel, it may take well over 80 minutes (it depends on the
cpu/io/network load, hardware, etc).
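The back-of-envelope arithmetic behind that figure can be sketched as below. The per-row cost of 0.75 ms is an illustrative assumption, not a measured number; the real cost depends on the load and hardware J-D mentions.

```java
// Why a full-scan row count is slow: if each row costs roughly one disk
// read (assumed here at ~0.75 ms/row, an illustrative figure), then
// 6,500,000 rows take over 80 minutes of wall-clock time.
public class ScanTimeEstimate {

    // Total scan time in minutes for the given row count and per-row cost.
    static double scanMinutes(long rows, double msPerRow) {
        return rows * msPerRow / 1000.0 / 60.0;
    }

    public static void main(String[] args) {
        // 6,500,000 rows * 0.75 ms = 4,875,000 ms = 4,875 s = 81.25 min
        System.out.println(scanMinutes(6_500_000L, 0.75)); // prints 81.25
    }
}
```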
J-D