RowCount is great example code for an introduction to MapReduce programs over HBase. I found it very beneficial to my understanding of HBase to reimplement a RowCount job from the ground up as an exercise.
-Daniel

On Mon, Jul 21, 2008 at 11:36 AM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> Having spent many years in the RDBMS world, the straight answer to that is,
> it depends.
>
> Postgres was notorious for being a poor performer when it came to count().
> That's because Postgres fetches each row off of disk while doing the count,
> for safety reasons. MySQL, on the other hand, "trusts" its indexes and
> therefore can perform a full table count() just by pulling an index off of
> disk.
>
> The safer the database, the longer a row count will take.
>
> However, most RDBMSs keep table statistics used in query planning. If you
> want "rough" row counts, you can also do straight select queries into the
> statistics tables.
>
> As stack suggests, our solution for doing row counting is running MR jobs.
> In postgres we used to have a TRIGGER system that would maintain
> pre-computed counts for things we need to aggregate. Things like this tend
> to be a nightmare any way you slice it :)
>
> Jon
>
> -----Original Message-----
> From: ZhaoWei [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 21, 2008 7:15 AM
> To: [email protected]; [EMAIL PROTECTED]
> Subject: Re: any chance to get the size of a table?
>
> Thanks J-D, that sounds annoying. Should the row count be a piece of
> metadata?
> What does an RDBMS do when one types "select count(xxx) from xxx"?
>
>> Zhao,
>>
>> Yes, the only way is to use a scanner but it will take a _long_ time. HBASE-32
>> <https://issues.apache.org/jira/browse/HBASE-32> is about adding a row count
>> estimator. For those who want to know why it's so slow, having a scanner
>> that goes over each row of a table requires doing a read request on disk for
>> each one of them (except for the stuff in the memcache that waits to be
>> flushed). If you have 6,500,000 rows like I saw last week on the IRC
>> channel, it may take well over 80 minutes (it depends on the cpu/io/network
>> load, hardware, etc).
>>
>> J-D
>>
>> On Mon, Jul 21, 2008 at 5:21 AM, ZhaoWei <[EMAIL PROTECTED]> wrote:
>>
>> > Hi J-D,
>> > How do I get the row count of a table? Only with a scanner?
>> >
>> >
>> > Thanks!
>> >
>> > > Daniel,
>> > >
>> > > Sorry, this feature is still missing in HBase. For the moment, the best
>> > > you can do is to use the HDFS web UI. If you would like to see this in a
>> > > future release, feel free to file a Jira:
>> > > https://issues.apache.org/jira/browse/HBASE
>> > >
>> > > J-D
>> > >
>> > > On Sat, Jul 19, 2008 at 5:58 PM, Daniel <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > hi all,
>> > > > it's a bit strange, but i can't find any class or method to get the
>> > > > 'size' of a created table - maybe the total size of all the HStores?
>> > > > or is there any command in HQL that can do this?
>> > > > Thanks.
>> > > >
>> > > > Daniel
>> > > >
>> > >
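
For anyone trying the exercise, the shape of the row count discussed in this thread (scan every row, emit a 1 for each, sum the results) can be sketched without a cluster. This is only an illustration, not the RowCount code that ships with HBase: the regions here are plain in-memory lists, and the names map_region, reduce_counts and row_count are made up for the example. A real job would read rows through an HBase scanner, which is where the per-row disk reads come from.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: each "region" is just a list of row keys held in memory.
# In HBase the rows would come from a scanner over the table instead.
Region = List[str]


def map_region(region: Region) -> Iterator[Tuple[str, int]]:
    """Map phase: emit ("rows", 1) once for every row in the region."""
    for _row_key in region:
        yield ("rows", 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: sum the emitted values for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def row_count(regions: Iterable[Region]) -> int:
    """Run the map step over every region, then reduce to a single total."""
    pairs = (pair for region in regions for pair in map_region(region))
    return reduce_counts(pairs).get("rows", 0)


if __name__ == "__main__":
    # Three hypothetical regions, one of them empty.
    table = [["row-001", "row-002"], ["row-003"], []]
    print(row_count(table))
```

Every row is still touched exactly once, so the total work is the same as a single scanner pass; the MR version only spreads the map step across regions.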
