Re: LRU lock per slab class

2014-08-04 Thread Byung-chul Hong
Thanks for the answer.


On Monday, August 4, 2014 at 2:34:10 PM UTC+9, Dormando wrote:

  Hello Dormando,
  Thanks for the answer.

  The LRU fiddling only happens once a minute per item, so hot items don't
  affect the lock as much. The more you lean toward hot items the better it
  scales as-is.
  = For the linked-list traversal, threads acquire an item-partitioned lock,
  but they acquire a global lock for the LRU update.
  So every GET command that finds the requested item in the hash table tries
  to acquire the same lock. I therefore think the total hit rate affects
  lock contention more than how often each item is touched for the LRU
  update. Did I miss something?

 The GET command only acquires the LRU lock if it's been more than a minute 
 since the last time it was retrieved. That's all there is to it. 

= Oh, you meant the 'ITEM_UPDATE_INTERVAL' parameter. Okay, I understand 
what you mean!
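
(For concreteness, a minimal, self-contained sketch of that check; it
mirrors the ITEM_UPDATE_INTERVAL logic discussed above, but the item type
and helper below are simplified stand-ins, not memcached's actual code:)

    #include <stdio.h>
    #include <time.h>

    #define ITEM_UPDATE_INTERVAL 60  /* seconds, as in memcached */

    typedef struct { time_t last_bump; } item;

    /* Returns 1 when a GET should relink the item at the LRU head
       (and therefore take the LRU lock), 0 when it can skip it. */
    static int needs_lru_bump(const item *it, time_t now) {
        return it->last_bump < now - ITEM_UPDATE_INTERVAL;
    }

    int main(void) {
        time_t now = time(NULL);
        item hot  = { .last_bump = now - 5 };   /* hit 5s ago: skip lock */
        item cold = { .last_bump = now - 120 }; /* 2min ago: bump + lock */
        printf("hot: %d, cold: %d\n",
               needs_lru_bump(&hot, now), needs_lru_bump(&cold, now));
        return 0;
    }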
 


  I don't think anything stops it. Rebalance tends to stay within one
  class. It was on my list of scalability fixes to work on, but I postponed
  it for a few reasons.
  One is that most tend to have over half of their requests in one slab
  class. So splitting the lock doesn't give as much of a long term benefit.
  So, I wanted to come back to it later and see what other options were
  plausible for scaling the lru within a single slab class. Nobody's
  complained about the performance since the last round of work, either, so
  it stays low priority.
  Are your objects always only hit once per minute? What kind of
  performance are you seeing and what do you need to get out of it?
  = Thanks for your comments. I was trying to determine the proper network
  speed (1Gb or 10Gb) for current memcached operation. I saw the best
  performance at around 4~6 threads (1.1M rps) with the help of multi-get.

 With the LRU out of the way it does go up to 12-16 threads. Also if you 
 use numactl to pin it to one node it seems to do better... but most people 
 just don't hit it that hard, so it doesn't matter? 

= Yes, my test also showed better scalability after changing the LRU lock.
By numactl, do you mean running multiple instances of memcached (i.e. one 
instance per node)?
I agree that multiple instances with numactl will increase scalability and 
performance.
I will give it a try in NUMA mode. Thanks a lot.
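
(For anyone who wants to try the same thing, pinning one instance per NUMA
node looks roughly like this; the port numbers and thread counts are just
example values:)

    # pin CPU scheduling and memory allocation to NUMA node 0
    numactl --cpunodebind=0 --membind=0 memcached -p 11211 -t 8
    # a second instance pinned to node 1
    numactl --cpunodebind=1 --membind=1 memcached -p 11212 -t 8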


  
  On Saturday, August 2, 2014 at 8:19:59 AM UTC+9, Dormando wrote: 
  
  
  On Jul 31, 2014, at 10:01 AM, Byung-chul Hong byungch...@gmail.com 
 wrote: 
  
  Hello,
  I'm testing the scalability of memcached 1.4.20 in a GET-dominated
  system.
  The linked-list traversal in the hash table (do_item_get) is protected
  by an interleaved lock (per bucket), so it showed very high scalability.
  But after the linked-list traversal, the LRU update is protected by a
  global lock (cache_lock), so scalability was limited to around 4~6
  threads by the LRU update's global lock on a Xeon server system (10Gb
  Ethernet).
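
  (For readers following along, the interleaved per-bucket lock works
  roughly like the self-contained sketch below: a power-of-two pool of
  mutexes striped across hash values, in the spirit of memcached's
  item_locks, though the names and sizes here are illustrative:)

      #include <pthread.h>
      #include <stdint.h>

      #define LOCK_POWER 13                   /* 8192 lock stripes */
      #define LOCK_COUNT (1u << LOCK_POWER)
      #define hashmask(n) ((1u << (n)) - 1)

      static pthread_mutex_t bucket_locks[LOCK_COUNT];

      void locks_init(void) {
          for (unsigned int i = 0; i < LOCK_COUNT; i++)
              pthread_mutex_init(&bucket_locks[i], NULL);
      }

      /* Two GETs contend only if their key hashes land in the same stripe. */
      void item_lock(uint32_t hv) {
          pthread_mutex_lock(&bucket_locks[hv & hashmask(LOCK_POWER)]);
      }

      void item_unlock(uint32_t hv) {
          pthread_mutex_unlock(&bucket_locks[hv & hashmask(LOCK_POWER)]);
      }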
  
  
  The LRU fiddling only happens once a minute per item, so hot items don't
  affect the lock as much. The more you lean toward hot items the better it
  scales as-is.
  
  
  
  As far as I know, the LRU is maintained per slab class, so an LRU update
  modifies only items contained in the same class.
  So I think the global lock for LRU updates could be changed to an
  interleaved lock per slab class.
  With concurrent SET commands, stores and removals of items in the same
  class can happen at the same time, but the SET operation can also be
  changed to take the slab-class lock before adding/removing items to/from
  the slab class.
  Storing/removing a linked item in the hash table (which may reside in a
  different slab class) only updates the h_next value of the current item
  and does not touch the LRU pointers (next, prev).
  So I think it would be safe to change to an interleaved lock.
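
  (A minimal sketch of what the proposed per-class split could look like;
  the array size, field names, and item layout below are simplified
  stand-ins for memcached's real structures, shown only to illustrate the
  idea, not as a patch:)

      #include <stdint.h>
      #include <pthread.h>

      #define MAX_SLAB_CLASSES 64  /* illustrative cap */

      typedef struct _item {
          struct _item *next, *prev;  /* LRU links */
          uint8_t slabs_clsid;        /* slab class that owns this item */
      } item;

      /* One LRU list and one lock per slab class, replacing the single
         global cache_lock for LRU link/unlink operations. */
      static item *heads[MAX_SLAB_CLASSES];
      static item *tails[MAX_SLAB_CLASSES];
      static pthread_mutex_t lru_locks[MAX_SLAB_CLASSES];

      static void lru_locks_init(void) {
          for (int i = 0; i < MAX_SLAB_CLASSES; i++)
              pthread_mutex_init(&lru_locks[i], NULL);
      }

      /* Insert at the head of this class's LRU, under its own lock. */
      static void item_link_q(item *it) {
          uint8_t c = it->slabs_clsid;
          pthread_mutex_lock(&lru_locks[c]);
          it->prev = NULL;
          it->next = heads[c];
          if (it->next) it->next->prev = it;
          heads[c] = it;
          if (tails[c] == NULL) tails[c] = it;
          pthread_mutex_unlock(&lru_locks[c]);
      }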
  
  Are there any other reasons, which I may have missed, that the LRU
  update requires a global lock?
  (I'm not using slab rebalance, I give a large enough initial hash power
  value, and my clients only use GET and SET commands.)
  
  
  I don't think anything stops it. Rebalance tends to stay within one
  class. It was on my list of scalability fixes to work on, but I postponed
  it for a few reasons.

  One is that most tend to have over half of their requests in one slab
  class. So splitting the lock doesn't give as much of a long term benefit.

  So, I wanted to come back to it later and see what other options were
  plausible for scaling the lru within a single slab class. Nobody's
  complained about the performance since the last round of work, either, so
  it stays low priority.

  Are your objects always only hit once per minute? What kind of
  performance are you seeing and what do you need to get out of it?
  
  Any comments would be highly appreciated!
  

Re: LRU lock per slab class

2014-08-03 Thread Byung-chul Hong
Hello Dormando,

Thanks for the answer.

The LRU fiddling only happens once a minute per item, so hot items don't 
affect the lock as much. The more you lean toward hot items the better it 
scales as-is.
= For the linked-list traversal, threads acquire an item-partitioned lock,
but they acquire a global lock for the LRU update.
So every GET command that finds the requested item in the hash table tries
to acquire the same lock. I therefore think the total hit rate affects lock
contention more than how often each item is touched for the LRU update. Did
I miss something?

I don't think anything stops it. Rebalance tends to stay within one class. 
It was on my list of scalability fixes to work on, but I postponed it for a 
few reasons.
One is that most tend to have over half of their requests in one slab 
class. So splitting the lock doesn't give as much of a long term benefit.
So, I wanted to come back to it later and see what other options were
plausible for scaling the lru within a single slab class. Nobody's
complained about the performance since the last round of work, either, so
it stays low priority.
Are your objects always only hit once per minute? What kind of performance 
are you seeing and what do you need to get out of it?
= Thanks for your comments. I was trying to determine the proper network
speed (1Gb or 10Gb) for current memcached operation.
I saw the best performance at around 4~6 threads (1.1M rps) with the help
of multi-get.


On Saturday, August 2, 2014 at 8:19:59 AM UTC+9, Dormando wrote:



 On Jul 31, 2014, at 10:01 AM, Byung-chul Hong byungch...@gmail.com
 wrote:

 Hello,

 I'm testing the scalability of memcached 1.4.20 in a GET-dominated
 system.
 The linked-list traversal in the hash table (do_item_get) is protected by
 an interleaved lock (per bucket), so it showed very high scalability.
 But after the linked-list traversal, the LRU update is protected by a
 global lock (cache_lock), so scalability was limited to around 4~6 threads
 by the LRU update's global lock on a Xeon server system (10Gb Ethernet).


 The LRU fiddling only happens once a minute per item, so hot items don't 
 affect the lock as much. The more you lean toward hot items the better it 
 scales as-is. 



 As far as I know, the LRU is maintained per slab class, so an LRU update
 modifies only items contained in the same class.
 So I think the global lock for LRU updates could be changed to an
 interleaved lock per slab class.
 With concurrent SET commands, stores and removals of items in the same
 class can happen at the same time, but the SET operation can also be
 changed to take the slab-class lock before adding/removing items to/from
 the slab class.

 Storing/removing a linked item in the hash table (which may reside in a
 different slab class) only updates the h_next value of the current item
 and does not touch the LRU pointers (next, prev).
 So I think it would be safe to change to an interleaved lock.

 Are there any other reasons, which I may have missed, that the LRU update
 requires a global lock?
 (I'm not using slab rebalance, I give a large enough initial hash power
 value, and my clients only use GET and SET commands.)


 I don't think anything stops it. Rebalance tends to stay within one class. 
 It was on my list of scalability fixes to work on, but I postponed it for a 
 few reasons.

 One is that most tend to have over half of their requests in one slab 
 class. So splitting the lock doesn't give as much of a long term benefit.

 So, I wanted to come back to it later and see what other options were
 plausible for scaling the lru within a single slab class. Nobody's
 complained about the performance since the last round of work, either, so
 it stays low priority.

 Are your objects always only hit once per minute? What kind of performance 
 are you seeing and what do you need to get out of it?


 Any comments would be highly appreciated!






Re: LRU lock per slab class

2014-08-03 Thread dormando
 Hello Dormando,
 Thanks for the answer.

 The LRU fiddling only happens once a minute per item, so hot items don't
 affect the lock as much. The more you lean toward hot items the better it
 scales as-is.
 = For the linked-list traversal, threads acquire an item-partitioned lock,
 but they acquire a global lock for the LRU update.
 So every GET command that finds the requested item in the hash table tries
 to acquire the same lock. I therefore think the total hit rate affects
 lock contention more than how often each item is touched for the LRU
 update. Did I miss something?

The GET command only acquires the LRU lock if it's been more than a minute
since the last time it was retrieved. That's all there is to it.

 I don't think anything stops it. Rebalance tends to stay within one class. It 
 was on my list of scalability fixes to work on,
 but I postponed it for a few reasons.
 One is that most tend to have over half of their requests in one slab class. 
 So splitting the lock doesn't give as much of a
 long term benefit.
 So, I wanted to come back to it later and see what other options were
 plausible for scaling the lru within a single slab class.
 Nobody's complained about the performance since the last round of work,
 either, so it stays low priority.
 Are your objects always only hit once per minute? What kind of performance
 are you seeing and what do you need to get out of it?
 = Thanks for your comments. I was trying to determine the proper network
 speed (1Gb or 10Gb) for current memcached operation.
 I saw the best performance at around 4~6 threads (1.1M rps) with the help
 of multi-get.

With the LRU out of the way it does go up to 12-16 threads. Also if you
use numactl to pin it to one node it seems to do better... but most people
just don't hit it that hard, so it doesn't matter?


 On Saturday, August 2, 2014 at 8:19:59 AM UTC+9, Dormando wrote:


 On Jul 31, 2014, at 10:01 AM, Byung-chul Hong byungch...@gmail.com wrote:

   Hello,
 I'm testing the scalability of memcached 1.4.20 in a GET-dominated
 system.
 The linked-list traversal in the hash table (do_item_get) is protected by
 an interleaved lock (per bucket), so it showed very high scalability.
 But after the linked-list traversal, the LRU update is protected by a
 global lock (cache_lock), so scalability was limited to around 4~6 threads
 by the LRU update's global lock on a Xeon server system (10Gb Ethernet).


 The LRU fiddling only happens once a minute per item, so hot items don't 
 affect the lock as much. The more you lean toward
 hot items the better it scales as-is. 



 As far as I know, the LRU is maintained per slab class, so an LRU update
 modifies only items contained in the same class.
 So I think the global lock for LRU updates could be changed to an
 interleaved lock per slab class.
 With concurrent SET commands, stores and removals of items in the same
 class can happen at the same time, but the SET operation can also be
 changed to take the slab-class lock before adding/removing items to/from
 the slab class.

 Storing/removing a linked item in the hash table (which may reside in a
 different slab class) only updates the h_next value of the current item
 and does not touch the LRU pointers (next, prev).
 So I think it would be safe to change to an interleaved lock.

 Are there any other reasons, which I may have missed, that the LRU update
 requires a global lock?
 (I'm not using slab rebalance, I give a large enough initial hash power
 value, and my clients only use GET and SET commands.)


 I don't think anything stops it. Rebalance tends to stay within one class. It 
 was on my list of scalability fixes to work
 on, but I postponed it for a few reasons.

 One is that most tend to have over half of their requests in one slab class. 
 So splitting the lock doesn't give as much of
 a long term benefit.

 So, I wanted to come back to it later and see what other options were
 plausible for scaling the lru within a single slab class. Nobody's
 complained about the performance since the last round of work, either, so
 it stays low priority.

 Are your objects always only hit once per minute? What kind of performance 
 are you seeing and what do you need to get out
 of it?

 Any comments would be highly appreciated!






Re: LRU lock per slab class

2014-08-01 Thread Dormando


 On Jul 31, 2014, at 10:01 AM, Byung-chul Hong byungchul.h...@gmail.com 
 wrote:
 
 Hello,
 
 I'm testing the scalability of memcached 1.4.20 in a GET-dominated
 system.
 The linked-list traversal in the hash table (do_item_get) is protected by
 an interleaved lock (per bucket), so it showed very high scalability.
 But after the linked-list traversal, the LRU update is protected by a
 global lock (cache_lock), so scalability was limited to around 4~6 threads
 by the LRU update's global lock on a Xeon server system (10Gb Ethernet).

The LRU fiddling only happens once a minute per item, so hot items don't affect 
the lock as much. The more you lean toward hot items the better it scales as-is.

 
 As far as I know, the LRU is maintained per slab class, so an LRU update
 modifies only items contained in the same class.
 So I think the global lock for LRU updates could be changed to an
 interleaved lock per slab class.
 With concurrent SET commands, stores and removals of items in the same
 class can happen at the same time, but the SET operation can also be
 changed to take the slab-class lock before adding/removing items to/from
 the slab class.

 Storing/removing a linked item in the hash table (which may reside in a
 different slab class) only updates the h_next value of the current item
 and does not touch the LRU pointers (next, prev), as the struct sketch
 below illustrates.
 So I think it would be safe to change to an interleaved lock.
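
 (To make the h_next point concrete, here is the relevant slice of the
 item header; it mirrors the pointer fields of memcached's _stritem, but
 is trimmed and simplified for illustration, not copied from the source:)

     #include <time.h>
     #include <stdint.h>

     /* Trimmed item header: the hash chain (h_next) and the LRU links
        (next/prev) are separate fields, so relinking an item in the hash
        table never touches the LRU pointers, and vice versa. */
     typedef struct _stritem {
         struct _stritem *next;    /* LRU: next item in this class's queue */
         struct _stritem *prev;    /* LRU: previous item */
         struct _stritem *h_next;  /* hash chain: next item in this bucket */
         time_t time;              /* least recent access */
         uint8_t slabs_clsid;      /* slab class the item belongs to */
     } item;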
 
 Are there any other reasons, which I may have missed, that the LRU update
 requires a global lock?
 (I'm not using slab rebalance, I give a large enough initial hash power
 value, and my clients only use GET and SET commands.)

I don't think anything stops it. Rebalance tends to stay within one class. It 
was on my list of scalability fixes to work on, but I postponed it for a few 
reasons.

One is that most tend to have over half of their requests in one slab class. So 
splitting the lock doesn't give as much of a long term benefit.

So, I wanted to come back to it later and see what other options were plausible
for scaling the lru within a single slab class. Nobody's complained about the
performance since the last round of work, either, so it stays low priority.

Are your objects always only hit once per minute? What kind of performance are 
you seeing and what do you need to get out of it?
 
 Any comments would be highly appreciated!
 
