Out of curiosity, why is it necessary to store the family and row with every cell? Aren't all the contents of a family confined to the same file, and couldn't a row length be stored at the beginning of each row or in a block index? Is this true for values in the caches and memstore as well?
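
For a rough sense of scale, here is a back-of-envelope sketch in Python. It assumes each cell is serialized as a key length, a value length, then row length, row, family length, family, qualifier, timestamp and key type, followed by the value; that is my understanding of the KeyValue layout, not something I have checked against the source. The row key, family and qualifier in the example are made up, and compression, block indexes, extra versions and the HDFS trash are all ignored.

```python
# Back-of-envelope estimate of per-cell storage cost.
# ASSUMPTION: each cell is serialized as
#   key length (4 bytes) + value length (4 bytes)
#   + row length (2) + row + family length (1) + family + qualifier
#   + timestamp (8) + key type (1) + value
# Compression, block indexes, extra versions and the HDFS trash are ignored.

FIXED_BYTES_PER_CELL = 4 + 4 + 2 + 1 + 8 + 1  # lengths, timestamp, key type


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Estimated serialized size of one cell, in bytes."""
    return FIXED_BYTES_PER_CELL + len(row) + len(family) + len(qualifier) + len(value)


def blowup(row: bytes, family: bytes, qualifier: bytes, value: bytes,
           replication: int = 3) -> float:
    """Bytes on HDFS per byte of raw value (value must be non-empty)."""
    return cell_size(row, family, qualifier, value) * replication / len(value)


if __name__ == "__main__":
    # Hypothetical cell: a long row key, a verbose family name, a tiny value.
    row, family, qualifier, value = b"customer-0000012345", b"details", b"status", b"OK"
    print(cell_size(row, family, qualifier, value))  # 54
    print(blowup(row, family, qualifier, value))     # 81.0
```

Under those assumptions a 2-byte value costs 54 bytes per cell, or about 81 bytes on HDFS per byte of value at a replication factor of 3.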
Storing the full key with every cell could have drastic implications for storing rows with many small values but with long keys, long column names, and innocently verbose column family names.

Matt

2010/3/31 alex kamil <[email protected]>

> I would also suggest checking the dfs.replication setting in HDFS (in
> /conf/hdfs-site.xml).
>
> A-K
>
> 2010/3/31 Jean-Daniel Cryans <[email protected]>
>
> > HBase is column-oriented; every cell is stored with the row, family,
> > qualifier and timestamp, so every piece of data brings larger disk
> > usage. Without any knowledge of your keys, I can't comment much
> > more.
> >
> > Then HDFS keeps a trash, so every compacted file will end up there...
> > if you just did the import, there will be a lot of these.
> >
> > Finally, if you imported the data more than once, HBase keeps 3
> > versions by default.
> >
> > So in short, is it reasonable? Answer: it depends!
> >
> > J-D
> >
> > 2010/3/31 <[email protected]>:
> > > Hi,
> > >
> > > We've dumped Oracle data to files, then put these files into
> > > different HBase tables.
> > > The size of these files is 35G; we saw the HDFS usage go up to 562G
> > > after putting them into HBase.
> > > Is that reasonable?
> > > Thanks
> > >
> > > Fleming Chiu(邱宏明)
