Re: LRU lock per slab class
Thanks for the answer.

On Monday, August 4, 2014 at 2:34:10 PM UTC+9, Dormando wrote:

Hello Dormando, Thanks for the answer.

The LRU fiddling only happens once a minute per item, so hot items don't affect the lock as much. The more you lean toward hot items the better it scales as-is.

= For linked-list traversal, threads acquire a per-bucket (item-partitioned) lock. But threads acquire a global lock for the LRU update. So every GET command that finds the requested item in the hash table tries to acquire the same lock. I therefore think the total hit rate affects lock contention more than how often each item is touched for the LRU update. Am I missing something?

The GET command only acquires the LRU lock if it's been more than a minute since the last time it was retrieved. That's all there is to it.

= Oh, you meant the 'ITEM_UPDATE_INTERVAL' parameter. Okay, now I understand what you mean!

I don't think anything stops it. Rebalance tends to stay within one class. It was on my list of scalability fixes to work on, but I postponed it for a few reasons. One is that most users tend to have over half of their requests in one slab class, so splitting the lock doesn't give as much of a long-term benefit. I wanted to come back to it later and see what other options were plausible for scaling the LRU within a single slab class. Nobody has complained about the performance since the last round of work, so it stays low priority. Are your objects always only hit once per minute? What kind of performance are you seeing, and what do you need to get out of it?

= Thanks for your comments. I was trying to find a proper network speed (1Gb or 10Gb) for current memcached operation. I saw the best performance at around 4~6 threads (1.1M rps) with the help of multi-get.

With the LRU out of the way it does go up to 12-16 threads. Also, if you use numactl to pin it to one node it seems to do better... but most people just don't hit it that hard, so it doesn't matter?
= Yes, my test also showed better scalability after changing the LRU lock. By numactl, do you mean multiple instances of memcached (i.e., one instance per node)? I agree that multiple instances with numactl would increase scalability and performance. I will give NUMA mode a try. Thanks a lot.

On Saturday, August 2, 2014 at 8:19:59 AM UTC+9, Dormando wrote:

On Jul 31, 2014, at 10:01 AM, Byung-chul Hong byungch...@gmail.com wrote:

Hello, I'm testing the scalability of memcached 1.4.20 in a GET-dominated system. Linked-list traversal in the hash table (do_item_get) is protected by an interleaved (per-bucket) lock, so it showed very high scalability. But after the linked-list traversal, the LRU update is protected by a global lock (cache_lock), so scalability was limited to around 4~6 threads by the LRU update's global lock on a Xeon server system (10Gb Ethernet).

The LRU fiddling only happens once a minute per item, so hot items don't affect the lock as much. The more you lean toward hot items the better it scales as-is.

As far as I know, the LRU is maintained per slab class, so an LRU update only modifies items in the same class. So I think the global lock for the LRU update could be changed to an interleaved lock per slab class. With concurrent SET commands, stores and removals of items in the same class can happen at the same time, but the SET operation could also be changed to acquire the slab-class lock before adding/removing items to/from the slab class. Storing/removing a linked item in the hash table (which may reside in a different slab class) only updates the current item's h_next value and does not touch the LRU pointers (next, prev). So I think it would be safe to change to an interleaved lock. Are there any other reasons, which I have missed, that the LRU update requires a global lock? (I'm not using slab rebalance, I give a large enough initial hash power value, and clients only use GET and SET commands.)

I don't think anything stops it. Rebalance tends to stay within one class. It was on my list of scalability fixes to work on, but I postponed it for a few reasons. One is that most users tend to have over half of their requests in one slab class, so splitting the lock doesn't give as much of a long-term benefit. I wanted to come back to it later and see what other options were plausible for scaling the LRU within a single slab class. Nobody has complained about the performance since the last round of work, so it stays low priority. Are your objects always only hit once per minute? What kind of performance are you seeing, and what do you need to get out of it?

Any comments would be highly appreciated!