Hi Dhruba,

> another bottleneck that I am seeing is that all transactions need to come to
> a halt when rolling hlogs, the reason being that all transactions need to be
> drained before we can close the hlog

I didn't measure the rate, but I'd expect it happens quite often given a
constant as-many-writes-as-we-can-push workload. Even so, the performance
limitation in the memstore, suggested by the impact of flushes, was the
dominant factor here.
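If I follow, the roll is a drain-and-swap gate, something like the sketch
below. The names and the read-write lock are my guess at the pattern, not
the actual HLog code:

  import java.io.IOException;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  // Sketch of the drain-and-swap pattern described above. Writer and
  // openNewWriter() are illustrative stand-ins, not HLog internals.
  abstract class RollableLog {
    interface Writer {
      void append(byte[] edit) throws IOException;
      void close() throws IOException;
    }

    private final ReentrantReadWriteLock rollLock = new ReentrantReadWriteLock();
    private Writer current;

    abstract Writer openNewWriter() throws IOException;

    void append(byte[] edit) throws IOException {
      rollLock.readLock().lock();   // many transactions append concurrently
      try {
        current.append(edit);
      } finally {
        rollLock.readLock().unlock();
      }
    }

    void roll() throws IOException {
      // Acquiring the write lock blocks until every in-flight append has
      // drained; that wait is exactly the stall you describe.
      rollLock.writeLock().lock();
      try {
        if (current != null) {
          current.close();          // safe: no appender holds the old writer
        }
        current = openNewWriter();  // swap in the new hlog
      } finally {
        rollLock.writeLock().unlock();
      }
    }
  }

If that's roughly right, then with writers saturated, every roll serializes
the whole write path through that gate.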
> InitialOccupancyFactor
> what is the size of ur NewGen?

This is what I'm testing with:

  -Xmx4000m -Xms4000m -Xmn400m \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC \
  -XX:+CMSParallelRemarkEnabled -XX:MaxGCPauseMillis=100 \
  -XX:+UseMembar

> how many client threads

20 single-threaded mapreduce clients

> and how many region server handler threads are u using?

100 per rs

> For increment operation, I introduced the concept of a
> ModifyableKeyValue whereby every increment actually updates
> the same KeyValue record if found in the MemStore (instead
> of creating a new KeyValue record and re-inserting
> it into memstore).

Patch! Patch! Patch! :-) :-)

(I'd ... consider ... trying it.)
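Until the patch shows up, here is how I read the idea, a hypothetical sketch
with string keys standing in for real KeyValues, not Dhruba's actual code:

  import java.util.concurrent.ConcurrentSkipListMap;
  import java.util.concurrent.atomic.AtomicLong;

  // Sketch of the update-in-place idea: mutate the value of an existing
  // memstore entry instead of inserting a new KeyValue per increment.
  class InPlaceIncrementStore {
    private final ConcurrentSkipListMap<String, AtomicLong> memstore =
        new ConcurrentSkipListMap<>();

    long increment(String row, long amount) {
      AtomicLong cell = memstore.get(row);
      if (cell == null) {
        // First increment for this row: insert once; racing inserts
        // converge on whichever entry won via putIfAbsent.
        AtomicLong fresh = new AtomicLong(0);
        AtomicLong prior = memstore.putIfAbsent(row, fresh);
        cell = (prior != null) ? prior : fresh;
      }
      // Later increments mutate the existing entry in place: no remove and
      // re-insert, so the skip list stops growing for hot rows.
      return cell.addAndGet(amount);
    }
  }

The win would be that hot counters stop growing the skip list, so n stays
bounded and O(log n) stays cheap.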
- Andy

--- On Sat, 3/19/11, Dhruba Borthakur <[email protected]> wrote:

> From: Dhruba Borthakur <[email protected]>
> Subject: Re: minor compaction bug (was: upsert case performance problem)
> To: [email protected], [email protected]
> Date: Saturday, March 19, 2011, 10:24 PM
>
> Hi andrew,
>
> I have been doing a set of experiments for the last one month on a workload
> that is purely "increments". I too have seen that the performance drops when
> the memstore fills up. My guess is that although the complexity is O(logn),
> still when n is large the time needed to insert/lookup could be large. It
> would have been nice if it were a hashMap instead of a tree, but the
> tradeoff is that we would have to sort it while writing to hfile.
>
> another bottleneck that I am seeing is that all transactions need to come to
> a halt when rolling hlogs, the reason being that all transactions need to be
> drained before we can close the hlog. how frequently is this occurring in ur
> case?
>
> how much GC are u seeing and what is the InitialOccupancyFactor for the JVM,
> I have set InitialOccupancyFactor to 40 in my case. what is the size of ur
> NewGen?
>
> how many client threads and how many region server handler threads are u
> using?
>
> For increment operation, I introduced the concept of a ModifyableKeyValue
> whereby every increment actually updates the same KeyValue record if found
> in the MemStore (instead of creating a new KeyValue record and re-inserting
> it into memstore).
>
> I am very interested in exchanging notes and what else u find,
> thanks,
> dhruba
>
> On Sat, Mar 19, 2011 at 11:15 AM, Andrew Purtell <[email protected]> wrote:
>
> > See below.
> >
> > Doing some testing on that I let the mapreduce program and an hbase shell
> > flushing every 60 seconds run overnight. The result on two tables was:
> >
> > 562 store files!
> >
> >   ip-10-170-34-18.us-west-1.compute.internal:60020 1300494200158
> >     requests=51, regions=1, usedHeap=1980, maxHeap=3960
> >     akamai.ip,,1300494562755.1b0614eaecca0d232d7315ff4a3ebb87.
> >       stores=1, storefiles=562, storefileSizeMB=310,
> >       memstoreSizeMB=1, storefileIndexSizeMB=2
> >
> > 528 store files!
> >
> >   ip-10-170-49-35.us-west-1.compute.internal:60020 1300494214101
> >     requests=79, regions=1, usedHeap=1830, maxHeap=3960
> >     akamai.domain,,1300494560898.af85225ae650574dbc4caa34df8b6a35.
> >       stores=1, storefiles=528, storefileSizeMB=460,
> >       memstoreSizeMB=3, storefileIndexSizeMB=3
> >
> > ... so that killed performance after a while ...
> >
> > Here's something else.
> >
> > - Andy
> >
> > --- On Sat, 3/19/11, Andrew Purtell <[email protected]> wrote:
> >
> > From: Andrew Purtell <[email protected]>
> > Subject: upsert case performance problem (doubts about
> > ConcurrentSkipListMap)
> > To: [email protected]
> > Date: Saturday, March 19, 2011, 11:10 AM
> >
> > I have a mapreduce task put together for experimentation which does a lot
> > of Increments over three tables and Puts to another. I set writeToWAL to
> > false. My HBase includes the patch that fixes serialization of writeToWAL
> > for Increments. MemstoreLAB is enabled but is probably not a factor, but
> > still need to test to exclude it.
> >
> > After starting a job up on a test cluster on EC2 with 20 mappers over 10
> > slaves I see initially 10-15K/ops/sec/server. This performance drops over
> > a short time to stabilize around 1K/ops/sec/server. So I flush the tables
> > with the shell. Immediately after flushing the tables, performance is back
> > up to 10-15K/ops/sec/server. If I don't flush, performance remains low
> > indefinitely. If I flush only the table receiving the Gets, performance
> > remains low.
> >
> > If I set the shell to flush in a loop every 60 seconds, performance
> > repeatedly drops during that interval, then recovers after flushing.
> >
> > When Gary and I went to NCHC in Taiwan, we saw a guy from PhiCloud present
> > something similar to this regarding 0.89DR. He measured the performance of
> > the memstore for a get-and-put use case over time and graphed it, looked
> > like time increased on a staircase with a trend to O(n). This was a
> > surprising result. ConcurrentSkipListMap#put is supposed to run in
> > O(log n). His workaround was to flush after some fixed number of
> > gets+puts, 1000 I think. At the time we weren't sure what was going on
> > given the language barrier.
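That staircase is what I want to reproduce in isolation first. A standalone
probe along these lines should show it, a sketch that assumes each upsert
adds a fresh entry (as new KeyValues with new timestamps would), not the
memstore code itself:

  import java.util.Random;
  import java.util.concurrent.ConcurrentSkipListMap;

  // Probe for the staircase effect: time fixed-size batches of get-then-put
  // as the map grows. Every put uses a fresh key, mimicking how each upsert
  // inserts a new KeyValue into the memstore.
  public class UpsertProbe {
    public static void main(String[] args) {
      ConcurrentSkipListMap<Long, Long> map = new ConcurrentSkipListMap<>();
      Random rnd = new Random(42);
      long next = 0;
      final int batch = 50_000;
      for (int b = 0; b < 30; b++) {
        long start = System.nanoTime();
        for (int i = 0; i < batch; i++) {
          long probe = (next == 0) ? 0 : (long) (rnd.nextDouble() * next);
          map.get(probe);        // the "get" half of get-and-put
          map.put(next++, 1L);   // the "put" half always adds a new entry
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        // O(log n) predicts a slow, flattening rise per batch; a staircase
        // toward O(n) would match the PhiCloud graph.
        System.out.printf("batch %d: size=%d, %d ms%n", b, map.size(), ms);
      }
    }
  }

If per-batch time climbs like that graph even though each operation should be
O(log n), the problem is something beyond the skip list's asymptotics.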
> > Sound familiar?
> >
> > I don't claim to really understand what is going on, but need to get to
> > the bottom of this. Going to look at it in depth starting Monday.
> >
> > - Andy
>
> --
> Connect to me at http://www.facebook.com/dhruba