On Fri, Nov 13, 2009 at 7:19 AM, Cedric McDougal <[email protected]> wrote:
> Hi,
>
> I'm using HBase for a project in which I have very few columns in each
> table with greatly varying lengths. For example, in one table I might
> have one column with 1 million rows of data and one column with 100. In
> other words, there will be a lot of null cells in each table.
>
> What I'm wondering is how these null cells are treated when the table is
> read into memory using the scan operation? I'm assuming they are read
> into a buffer, found to be null, then discarded, but I'm not really sure
> what is happening within the system during the scan. Will a large number
> of null cells noticeably slow down the scan or are they handled very
> quickly? Would it be too expensive to have a single table with a lot of
> nulls vs. having multiple tables with very few?

Nulls do not cost. There is no 'null' signifier stored per row to mark an
absence. If you have a table with a couple of columns where one column has
an entry in each of 1M rows and the other has only 10 entries across the
same 1M rows, only the ten values of the second column are stored.

St.Ack
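To make the point concrete, here is a toy sketch in Python (not actual HBase code; the structure and names are illustrative only) of the storage model described above: cells are stored only where they exist, so a sparse column adds nothing for its missing rows, and a scan never touches a "null" cell.

```python
# Toy model of sparse cell storage: a table is {row: {column: value}}.
# Absent cells simply are not present in the map, so they occupy no space.

def build_sparse_table(dense_rows, sparse_rows):
    """Hypothetical helper: column 'a' has a value in every row,
    column 'b' has values in only the first few rows."""
    table = {}
    for r in range(dense_rows):
        table.setdefault(r, {})["a"] = "value-a-%d" % r
    for r in range(sparse_rows):
        table[r]["b"] = "value-b-%d" % r
    return table

def scan(table, column):
    """A scan yields only stored cells; there are no null cells to
    read, inspect, and discard."""
    for row in sorted(table):
        if column in table[row]:
            yield row, table[row][column]

table = build_sparse_table(dense_rows=1000, sparse_rows=10)
cells_stored = sum(len(cols) for cols in table.values())
print(cells_stored)                 # 1010 cells, not 2000: nulls cost nothing
print(len(list(scan(table, "b"))))  # 10 results when scanning column 'b'
```

Under this model, whether the sparse column lives in the same table or a separate one makes no difference to storage: only the ten actual values exist either way.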
