Hi all,
I have a table which may contain very large rows, e.g. rows with
millions of cells, 1.5 GB in size.
I now have a problem writing data into the table, probably
because these very large rows are too big for my regionserver (which has
only a 1 GB heap). The region that contains the big row cannot be split,
since there is only one row in that region.
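To put rough numbers on the mismatch, here is a minimal back-of-the-envelope sketch in plain Java (no HBase API involved). The class and method names are made up for illustration, and the cell count of 2 million is only an assumed figure, since all I know is that the row has "millions" of cells.

```java
public class RowSizeCheck {
    static final long MB = 1024L * 1024L;

    // True if a single row of rowBytes cannot fit into a heap of heapBytes all at once.
    static boolean rowExceedsHeap(long rowBytes, long heapBytes) {
        return rowBytes > heapBytes;
    }

    // Average size in bytes of one cell, for a row of rowBytes spread over cellCount cells.
    static long avgCellBytes(long rowBytes, long cellCount) {
        return rowBytes / cellCount;
    }

    public static void main(String[] args) {
        long rowBytes = 1536 * MB;       // the 1.5 GB row
        long heapBytes = 1024 * MB;      // the 1 GB regionserver heap
        long assumedCells = 2_000_000L;  // ASSUMPTION: stands in for "millions of cells"

        System.out.println("row exceeds heap: " + rowExceedsHeap(rowBytes, heapBytes));
        System.out.println("avg bytes per cell: " + avgCellBytes(rowBytes, assumedCells));
    }
}
```

So the individual cells are small; it is only the row as a whole that is larger than the heap.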
My questions:
When a regionserver is asked to open a region, will it load the
whole region into memory? Or does it just use memory as a cache, swapping
between RAM and disk to return the requested cells?
In the regionserver web UI I see a list of "online regions", with
each region's size printed. Which part of these online regions' data is
held in memory: only the index data, or both the index and the "real"
data?
When HBase does a read, will it load the whole region's or row's data
into memory?
When HBase does a major compaction, it loads the HFiles from HDFS, merges
them with what is in the HStore, and then writes the result to one file in
HDFS, right?
Thanks in advance for your answers!