[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016238#comment-13016238 ] Hudson commented on HBASE-3694: ---

Integrated in HBase-TRUNK #1831 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1831/])

high multiput latency due to checking global mem store size in a synchronized function
--------------------------------------------------------------------------------------

                 Key: HBASE-3694
                 URL: https://issues.apache.org/jira/browse/HBASE-3694
             Project: HBase
          Issue Type: Improvement
            Reporter: Liyin Tang
            Assignee: Liyin Tang
             Fix For: 0.92.0
         Attachments: 3694-cliffs-counter.txt, Hbase-3694[r1085306], Hbase-3694[r1085306]_2.patch, Hbase-3694[r1085306]_3.patch, Hbase-3694[r1085508]_4.patch, Hbase-3694[r1085592]_7.patch, Hbase-3694[r1085593]_5.patch, Hbase-3694[r1085593]_6.patch

The problem is that we found the multiput latency to be very high. In our case we have about 22 regions in each RS, and no flushes happened during these puts. After investigation, we believe the root cause is the function getGlobalMemStoreSize, which checks the high-water mark of the memstore. When we instrumented the code with some metrics, this function took almost 40% of the total execution time of a multiput; the actual percentage may be even higher. The execution time is spent on synchronization contention.

One solution is to keep a static variable in HRegion holding the global memstore size instead of recalculating it every time. Why use a static variable? Since all the HRegion objects in the same JVM share the same memory heap, they need to share fate as well. The static variable, globalMemStoreSize, naturally shows the total memory usage of this shared heap for this JVM. If multiple RSs need to run in the same JVM, they still need only one globalMemStoreSize. If multiple RSs run on different JVMs, everything is fine.

After this change, in our case the average multiput latency decreased from 60ms to 10ms. I will submit a patch based on the current trunk.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
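The fix described above can be sketched as a shared per-JVM counter: instead of a synchronized walk over every region's memstore, each region atomically adjusts one AtomicLong, and the high-water-mark check becomes a lock-free read. This is a minimal illustration of the idea, not the committed patch; the class and method names here are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the shared-counter idea (names are illustrative, not
// the actual HBase patch): one AtomicLong per JVM tracks the total memstore
// size, updated incrementally on every put/flush instead of being recomputed
// under a lock on each check.
class GlobalMemstoreAccounting {
    private static final AtomicLong globalMemstoreSize = new AtomicLong(0);

    /** Called when a region grows or shrinks its memstore; delta may be negative. */
    static long addAndGetGlobalMemstoreSize(long delta) {
        return globalMemstoreSize.addAndGet(delta);
    }

    /** Lock-free read used by the high-water-mark check. */
    static long getGlobalMemstoreSize() {
        return globalMemstoreSize.get();
    }
}
```

With this shape, a multiput never takes a lock just to learn the global memstore size, which is where the thread reports the 40% of execution time was going.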
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015236#comment-13015236 ] stack commented on HBASE-3694: --

@Liyin Then your last posted patch is good to go?
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015075#comment-13015075 ] Liyin Tang commented on HBASE-3694: ---

I think using AtomicLong is pretty safe here :)
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011589#comment-13011589 ] Ted Yu commented on HBASE-3694: ---

I agree with Todd's comment about the increment call, for two reasons:
1. The return value is not used - after switching the return type to void, the code compiles cleanly.
2. It somewhat exposes an implementation detail of the underlying class (in this case AtomicLong).

I am attaching a patch that utilizes the Cliff Click Counter which Stack mentioned at 25/Mar/11 22:44. Thanks for the great work Liyin.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011675#comment-13011675 ] Liyin Tang commented on HBASE-3694: ---

+1 on changing the method name to addAndGetMemstoreSize. But the Cliff Click Counter is not thread safe. Are you sure we should use it? We want everything in the RegionServerAccounting to be accurate, something that is crucial for correct operation.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011682#comment-13011682 ] stack commented on HBASE-3694: --

Use AtomicLong if the alternative is not thread safe. The name should be addMemstoreSize, not addAndGetMemstoreSize, if it does not return a value (as per Todd and Ted above). Thanks for being persistent Liyin.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011686#comment-13011686 ] stack commented on HBASE-3694: --

bq. But Cliff Click Counter is not thread safe.

I thought the whole point of the CC Counters was that they were (lockless) threadsafe.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011694#comment-13011694 ] Liyin Tang commented on HBASE-3694: ---

Thanks stack and Ted. I thought the CC Counters were thread safe for add, since they keep an array of counters internally to avoid cache contention, but it looks like they are not thread safe for get. From their javadoc:
{code}
public long get()
    Current value of the counter. Since other threads are updating furiously
    the value is only approximate, but it includes all counts made by the
    current thread. Requires a pass over the internally striped counters.
{code}
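Liyin's reading of the javadoc can be illustrated with a toy striped counter: add() is lossless because each update hits a single stripe atomically, but get() sums the stripes one at a time, so a concurrent reader can observe a partially-applied total. This is a simplified sketch of the idea behind the Cliff Click Counter, not its actual implementation.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Toy striped counter illustrating why get() is only approximate under
// concurrency: updates to individual stripes are atomic, but summing the
// stripes is not one atomic operation.
class StripedCounter {
    private final AtomicLongArray stripes = new AtomicLongArray(16);

    void add(long x) {
        // Spread threads across stripes to reduce cache-line contention.
        int idx = (int) (Thread.currentThread().getId() & 15);
        stripes.addAndGet(idx, x);
    }

    long get() {
        // Reads stripes one by one; concurrent adds that land between reads
        // make the returned sum approximate (though no count is ever lost).
        long sum = 0;
        for (int i = 0; i < stripes.length(); i++) {
            sum += stripes.get(i);
        }
        return sum;
    }
}
```

With no concurrent writers the sum is exact; the approximation only appears when adds race with the summing loop, which is exactly the case that worried Liyin for accounting.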
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011703#comment-13011703 ] Ted Yu commented on HBASE-3694: ---

Here is the javadoc for add_if_mask(), which is called by add():
{code}
  // The sum can overflow or 'x' can contain bits in the mask. Value is CAS'd
  // so no counts are lost. The CAS is retried until it succeeds or bits are
  // found under the mask.
{code}
Here a mask of 0 is used, meaning no failure. Looking further into the failure case inside add_if_mask(), we can verify the above assumption:
{code}
  if( (old&mask) != 0 ) return old; // Failed for bit-set under mask
{code}
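The retry loop Ted describes is the standard lock-free compare-and-swap pattern. A simplified version over a plain AtomicLong (the real counter CASes an individual stripe rather than one shared word) might look like this; the names are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified CAS-retry add with a mask check, mirroring the structure of
// add_if_mask(): the add either fails fast when bits are set under the mask,
// or retries the CAS until it succeeds, so no counts are lost.
class CasAdd {
    static final AtomicLong counter = new AtomicLong(0);

    static long addIfMask(long x, long mask) {
        while (true) {
            long old = counter.get();
            if ((old & mask) != 0) {
                return old;                       // failed: bit set under mask
            }
            if (counter.compareAndSet(old, old + x)) {
                return old + x;                   // CAS succeeded
            }
            // CAS lost to a concurrent update; loop and retry.
        }
    }
}
```

With mask 0 the fail-fast branch never fires, matching Ted's observation that the accounting path cannot lose an update.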
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011717#comment-13011717 ] stack commented on HBASE-3694: --

@Liyin Approx count on get is fine by me. If you need it to be 'exact', go w/ AtomicLong.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011414#comment-13011414 ] stack commented on HBASE-3694: --

Patch looks good but I stumble when I come to this:
{code}
+  /**
+   * @return the global mem store size in the region server
+   */
+  public AtomicLong getGlobalMemstoreSize();
{code}
Here we are adding the getting of a single value to the RSS Interface. RSS is usually about more macro-type services than a single data member's value. Rarely would a user of RSS be interested in this single value. More useful, I'd think, would be if the RSS returned a class that allowed the client a (read-only) view on multiple RS values; e.g. above there is talk of a MemoryAccountingManager, which I imagine would have this memstore size among other values. We could change getRpcMetrics to a generic getMetrics, and it would return a RegionServerMetrics instance that would include the instance of HBaseRpcMetrics and the current state of the above counter?
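One way to read stack's suggestion is an accounting object handed out by the region server whose interface exposes a plain read-only long, rather than the AtomicLong itself (which Ted flagged as leaking an implementation detail). A hypothetical sketch, with invented names, under those assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shape for the accounting view discussed above: the interface
// exposes plain values, keeping AtomicLong as a private implementation detail.
interface RegionServerAccounting {
    /** @return the global memstore size in the region server, in bytes */
    long getGlobalMemstoreSize();
}

class SimpleRegionServerAccounting implements RegionServerAccounting {
    private final AtomicLong memstoreSize = new AtomicLong(0);

    // void return, per the review feedback: callers never use the result.
    void addMemstoreSize(long delta) {
        memstoreSize.addAndGet(delta);
    }

    @Override
    public long getGlobalMemstoreSize() {
        return memstoreSize.get();
    }
}
```

Returning one view object also leaves room to add further accounting values later without widening the RSS interface each time.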
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011421#comment-13011421 ] Liyin Tang commented on HBASE-3694: ---

Thanks Stack. I think adding globalMemstoreSize into RegionServerMetrics makes more sense than adding a new class MemoryAccountingManager?
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011437#comment-13011437 ] Todd Lipcon commented on HBASE-3694:

I don't want to conflate metrics (things that get exported for monitoring purposes) with internal accounting (things which must be correct and up-to-date for proper functioning of the server). Some internal accounting may be exposed as metrics, but the two subsystems are quite separate in my mind. Does that make sense?
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011444#comment-13011444 ] Ted Yu commented on HBASE-3694:
---
memstoreSizeMB is a member of RegionServerMetrics and is set every hbase.regionserver.msginterval. See line 1162 in HRegionServer.java:
{code}
this.metrics.memstoreSizeMB.set((int) (memstoreSize / (1024 * 1024)));
{code}
memstoreSizeMB is of type MetricsIntValue, a subclass of MetricsBase, which stores its value in:
{code}
private int value;
{code}
We could create a MetricsAtomicLongValue class with the following signature:
{code}
public class MetricsAtomicLongValue extends MetricsBase {
  private AtomicLong value;
  private boolean changed;
{code}
If we reach agreement on adding this method to RegionServerServices (which is available in HRegionServer and is used by MemStoreFlusher):
{code}
/**
 * @return Region server metrics instance.
 */
public RegionServerMetrics getMetrics() {
{code}
then we can change memstoreSizeMB to memstoreSize, of type MetricsAtomicLongValue, and blend Liyin's changes onto memstoreSize.
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
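A standalone sketch of the MetricsAtomicLongValue idea above (the real class would extend hadoop's MetricsBase, whose constructor and pushMetric plumbing are omitted here; the class name is taken from the proposal, the rest is illustrative). The point is only that an AtomicLong lets readers and writers proceed without synchronized blocks:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: the real class would extend MetricsBase and hook into the
// hadoop metrics push cycle. Here we show just the lock-free value handling.
public class MetricsAtomicLongValueSketch {
    private final AtomicLong value = new AtomicLong(0);
    private volatile boolean changed = false;

    // Writers update the value without taking a monitor lock.
    public void set(long newValue) {
        value.set(newValue);
        changed = true;
    }

    // Readers see the latest value, again without synchronization.
    public long get() {
        return value.get();
    }

    public boolean isChanged() {
        return changed;
    }
}
```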
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011449#comment-13011449 ] Liyin Tang commented on HBASE-3694:
---
The internal accounting makes sense. I just think MemoryAccountingManager is too specific. We need something more general that we can reuse in the future: RegionServerAccountingManager. Thoughts?
Liyin
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011452#comment-13011452 ] Jonathan Gray commented on HBASE-3694:
--
Do we really want to put things like this into RegionServerMetrics? That class is a mess and is currently used only for publishing our metrics (not for internal state tracking). And we should avoid the hadoop Metrics* classes like the plague... heavily synchronized and generally confusing.
My vote would be to add a new class, maybe {{RegionServerHeapManager}} or something like that... it might be a good opportunity to clean up and centralize the code related to that. But it could just hold this one AtomicLong for now.
Agree that adding a new interface method just for the long is not ideal, since it buys us nothing down the road. Better to add something new that we can use later.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011457#comment-13011457 ] Todd Lipcon commented on HBASE-3694:
+1 to jgray's suggestion. Please, please, please let's not conflate metrics with something that is crucial to correct operation.
In terms of overall design, I would love to see RegionServerServices evolve into something like an IoC container - it would just be used to provide wiring between the different components that make up a running RS. That makes mocking easier and should help with general modularity.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011461#comment-13011461 ] stack commented on HBASE-3694:
--
bq. In terms of overall design, I would love to see RegionServerServices evolve into something like an IOC container
Yeah, that's the plan. Need to keep it macro though.
Args on why this is not 'metrics' are good. I go along. Just say no to atomic long counters now that we have Cliff Click counters in our CLASSPATH.
bq. The internal accounting makes sense. I just think MemoryAccountingManager is too specific. We need something more general to reuse it in the future, RegionServerAccountingManager.
Agreed. Should be more than just about memory accounting (and agree w/ Jon that it could be a path out of our hairball HRegionServer class). For you Liyin and this patch, I think just make a class named RegionServerAccounting -- drop Manager I'd say, that might be a little megalomaniacal -- and put just this one counter in it (as per Jon). Add getRegionServerAccounting to the RSS interface.
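A minimal sketch of the single-counter class described above (the name and method follow the discussion; the class that eventually landed in trunk may differ in detail):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a RegionServerAccounting class holding just the one counter:
// a running total of all memstore sizes on this region server, in bytes.
public class RegionServerAccounting {
    private final AtomicLong atomicGlobalMemstoreSize = new AtomicLong(0);

    // Regions call this as their memstores grow (positive delta) or are
    // flushed (negative delta). Lock-free, so concurrent multiputs no longer
    // contend on a synchronized getGlobalMemStoreSize() summation.
    public long addAndGetGlobalMemstoreSize(long memStoreSize) {
        return atomicGlobalMemstoreSize.addAndGet(memStoreSize);
    }

    // Read the high-water-mark check input without touching any region.
    public long getGlobalMemstoreSize() {
        return atomicGlobalMemstoreSize.get();
    }
}
```

Because the counter is an instance field rather than a static, each region server (e.g. several in one minicluster JVM) keeps its own total.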
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011467#comment-13011467 ] Todd Lipcon commented on HBASE-3694:
Sounds good to me.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011561#comment-13011561 ] stack commented on HBASE-3694:
--
Please do not use HBaseClusterTestCase as the basis for your test. It's been deprecated: '* @deprecated Use junit4 and {@link HBaseTestingUtility}'. Sorry about that. We should have made sure you got the memo on that one. The alternative, HBaseTestingUtility, has cleaner means of creating a multiregion table.
Fix the copyright on your test -- also, the javadoc is copy/pasted from elsewhere -- and in your accounting class. It's 2011!
RegionServerAccounting needs a bit of class javadoc to say what the class is for. I'd write 'private final AtomicLong atomicGlobalMemstoreSize = new AtomicLong(0);' rather than wait to assign in the constructor (no need for a constructor then). I'd rename incGlobalMemstoreSize to addAndGetGlobalMemstoreSize, as in AtomicLong, and I'd return the current value as AtomicLong does (why not?). I'd also call it getAndAddMemstoreSize rather than incMemoryUsage.
Otherwise the patch looks great, Liyin. Thanks for doing this.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011570#comment-13011570 ] Todd Lipcon commented on HBASE-3694:
I don't think we should return the current value from the increment call unless it's necessary. For striped counters and such, a blind increment can often be cheaper than an increment-and-get. Isn't this the case with the Cliff Click counters?
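An illustration of this blind-increment point, using java.util.concurrent.atomic.LongAdder (JDK 8) as a stand-in for the Cliff Click striped counters mentioned earlier; the contrast, not the specific class, is what the comment is about. A striped counter makes blind increments cheap by spreading updates across cells, but there is no cheap "add and get" because reading requires summing every cell:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterStyles {
    static final AtomicLong atomic = new AtomicLong();
    static final LongAdder striped = new LongAdder();

    // Single CAS returning the new value; every thread contends on one word.
    static long atomicAdd(long delta) {
        return atomic.addAndGet(delta);
    }

    // Blind increment: under contention, updates land on per-thread cells,
    // so there is no single hot word and no return value to compute.
    static void stripedAdd(long delta) {
        striped.add(delta);
    }

    // Reads are the expensive side of a striped counter: sum() walks cells.
    static long stripedSum() {
        return striped.sum();
    }
}
```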
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010974#comment-13010974 ] stack commented on HBASE-3694:
--
RSS has:
{code}
/**
 * Returns a reference to the RPC server metrics.
 */
public HBaseRpcMetrics getRpcMetrics();
{code}
Could you add your counter to the HBaseRpcMetrics class, or would that be weird?
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010980#comment-13010980 ] Ted Yu commented on HBASE-3694:
---
How about piggybacking on HServerInfo:
{code}
public HServerInfo getServerInfo();
{code}
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010981#comment-13010981 ] Todd Lipcon commented on HBASE-3694:
RpcMetrics seems like the wrong spot to me.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010983#comment-13010983 ] Jonathan Gray commented on HBASE-3694:
--
Neither of these seems right. Any issue with adding another method for this?
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010984#comment-13010984 ] stack commented on HBASE-3694:
--
We could add a new method. I'm just trying to keep the methods to a minimum, because mocking the interface becomes a pain if there are a million methods to fill in (it looks ugly in tests too). But go for it. Add a getter for a Counts class or something.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010986#comment-13010986 ] Ted Yu commented on HBASE-3694:
---
@Liyin, can you run your test after incorporating HBASE-3654? I just wonder how much influence the synchronization on onlineRegions might have on this issue.
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011029#comment-13011029 ] Ted Yu commented on HBASE-3694:
---
HRegion already has this:
{code}
final RegionServerServices rsServices;
{code}
You can reuse it instead of adding an HRegionServer reference directly.
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010376#comment-13010376 ] Todd Lipcon commented on HBASE-3694: I don't think a static variable is the way to go. In minicluster tests you want to count memory separately for each RS, even though they share the same heap. Instead, I think we should add it to HRegionServer, or to a new class like 'MemoryAccountingManager' that is accessible through HRegionServer. Thoughts?
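Todd's suggested 'MemoryAccountingManager' could be little more than a thin wrapper around an AtomicLong: a lock-free running total that each put adjusts by a delta, replacing the synchronized sum over all regions. A minimal sketch under that assumption (the class and method names here are illustrative, not actual HBase API):

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-region-server memstore accounting. Instead of a
 * synchronized getGlobalMemStoreSize() that re-sums every region's
 * memstore, each put/flush applies a delta to one atomic counter.
 */
class MemoryAccountingManager {
    private final AtomicLong globalMemStoreSize = new AtomicLong(0);

    /** Apply a size change in bytes (positive on put, negative on flush). */
    long addAndGetGlobalMemStoreSize(long delta) {
        return globalMemStoreSize.addAndGet(delta);
    }

    /** Current server-wide memstore total, read without any locking. */
    long getGlobalMemStoreSize() {
        return globalMemStoreSize.get();
    }
}
```

Because each region server instance would own its own manager, two RS sharing a JVM in a minicluster test keep separate totals, which addresses the objection to a static field.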
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010492#comment-13010492 ] Liyin Tang commented on HBASE-3694: --- We initially tried adding this variable to the RS and passing it to HRegion via its constructor. However, since HRegion is not created by the RS, that is hard to implement, and in most cases it would pass NULL to the HRegion constructor. Of course, we could set the RegionServer reference on the regions every time, but that would make the code much more complicated. Since this change only conflicts with some unit tests, we can make it work for that case; for example, we can write a function in the minicluster tests to get the global memstore size for a given region server. Any thoughts? :) Liyin
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010513#comment-13010513 ] Todd Lipcon commented on HBASE-3694: It just seems to me that static state is the Java equivalent of ugly global variables; they always come back to bite us in some way or another later on. I don't have the code handy at the moment (booted into Windows to work on a ppt :( ), but it seems like there has to be some way the HRegion can get at the region server. I thought it had a RegionServerServices instance somewhere inside?
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010518#comment-13010518 ] ryan rawson commented on HBASE-3694: Let's avoid the static if at all possible. Ditto Todd, it makes life hard later.
[jira] [Commented] (HBASE-3694) high multiput latency due to checking global mem store size in a synchronized function
[ https://issues.apache.org/jira/browse/HBASE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010521#comment-13010521 ] Ted Yu commented on HBASE-3694: --- HRegion has a reference to RegionServerServices, and HRegionServer is the only implementer of RegionServerServices.
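Ted's observation suggests the wiring: the counter can live on the server side, and each HRegion can update it through its existing RegionServerServices reference, so no static is needed. A rough sketch under those assumptions (these are simplified stand-ins, not the real HBase interfaces or method signatures):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for the interface HRegion already holds.
interface RegionServerServices {
    /** Adjust the server-wide memstore total by delta bytes; returns the new total. */
    long addAndGetGlobalMemstoreSize(long delta);
}

// The region server owns the counter: one total per RS instance, not per JVM.
class HRegionServer implements RegionServerServices {
    private final AtomicLong globalMemstoreSize = new AtomicLong(0);

    public long addAndGetGlobalMemstoreSize(long delta) {
        return globalMemstoreSize.addAndGet(delta);
    }
}

class HRegion {
    final RegionServerServices rsServices; // already present, per the comment above

    HRegion(RegionServerServices rsServices) {
        this.rsServices = rsServices;
    }

    /** On a put, bump the shared counter instead of re-summing every region. */
    long applyPut(long bytesAdded) {
        return rsServices.addAndGetGlobalMemstoreSize(bytesAdded);
    }
}
```

Regions belonging to the same server share one counter through the interface, while two servers in one JVM (the minicluster case) keep independent totals.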