Maybe someone here can explain this to me.

Setup
I am running a bulk import of large columns (web page source), averaging about 15KB per record.
I have one region server with only one region; no splits yet.
I have one other server running the thrift server, and that same server runs a single-threaded import process.

At the start I see about 60-80 records inserted per 3 seconds, as reported by the master GUI, but once I hit my 64MB memcache limit on the region server it blocks and flushes the column. Immediately after that I see an insert rate of about 600-700 records per 3 seconds in the master GUI, and this lasts until I am done inserting, slowing down only for more flushes 20-25 seconds later, after which it continues to speed along.
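
As a rough sanity check on these numbers, here is a small sketch of the arithmetic. It assumes every record is exactly 15KB and ignores row key, timestamp, and other per-cell overhead, so the results are only approximate; the function names are made up for illustration and are not part of HBase.

```python
# Back-of-the-envelope check: how long should it take to fill the memcache?
# Assumptions (not measured): fixed 15KB records, no per-cell overhead.

MEMCACHE_LIMIT_BYTES = 64 * 1024 * 1024  # 64MB flush threshold from the post
AVG_RECORD_BYTES = 15 * 1024             # ~15KB average record from the post


def records_per_flush(limit_bytes=MEMCACHE_LIMIT_BYTES,
                      record_bytes=AVG_RECORD_BYTES):
    """Number of whole records that fit before the flush threshold is hit."""
    return limit_bytes // record_bytes


def seconds_to_fill(records_per_3s,
                    limit_bytes=MEMCACHE_LIMIT_BYTES,
                    record_bytes=AVG_RECORD_BYTES):
    """Seconds needed to reach the threshold at a given rate per 3 seconds."""
    rate_per_second = records_per_3s / 3.0
    return records_per_flush(limit_bytes, record_bytes) / rate_per_second


if __name__ == "__main__":
    print("records per flush:", records_per_flush())
    for rate in (60, 80, 600, 700):
        print(rate, "per 3s ->", round(seconds_to_fill(rate), 1), "s to fill")
```

Under these assumptions about 4,369 records fit in 64MB, so at 600-700 records per 3 seconds the memcache fills in roughly 19-22 seconds, which is in line with the flushes I see 20-25 seconds apart. At the initial 60-80 per 3 seconds the first fill would take around 3 minutes.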

Any idea why it starts slow and jumps to such a high insert rate after the memcache flush? Again, this is all single threaded, so no MR job or anything like that. I have run this and seen it happen each time, with the flushes happening at different points in the import and the same results, so that rules out smaller data in the second half.

So I am wondering whether this is related to the region server or the thrift server.

hadoop 0.17.0, r652576
hbase 0.2.0-dev, r654653

Billy Pearson

