[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831719#comment-13831719 ] stack commented on HBASE-10015: --- On mac I see this w/ jvm1.8: Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:3032.9 sigma:60.501983438561744 Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2322.4 sigma:119.92430946226041 Let me try another box. How can I be sure there is no regression for wide tables? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831721#comment-13831721 ] Andrew Purtell commented on HBASE-10015: JDK 7, 16 vcores, SSD, xfs Without: 5 runs, Mean: 14.4136 Sigma: 0.17013712116995514 With the 0.94 -lock patch: 5 runs, Mean: 10.614 Sigma: 0.18904179432072687 Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831740#comment-13831740 ] Andrew Purtell commented on HBASE-10015: This is the same system with TestScanFilterPerformance: Without: 10 runs mean:2248.7 sigma:36.59795076230362 With the 0.94 -lock patch: 10 runs mean:1860.3 sigma:29.397448868906977 Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831742#comment-13831742 ] Ted Yu commented on HBASE-10015: On Linux with jdk 1.7 : with patch: Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2297.5 sigma:23.29484921608208 without: Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:3242.4 sigma:732.0478399667605 Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831814#comment-13831814 ] Lars Hofhansl commented on HBASE-10015: --- So it seems this is universal. Our performance dude has actually confirmed that this is an expected outcome. I'll gather some more CPU metrics as I find time today. [~stack], these locks are taken in StoreScanner, which is row based (unlike StoreFileScanner and MemstoreScanner). So the frequent locks/unlocks there we'd mostly see per row. Wider rows won't be slower, but the speedup effect would be proportionally less. To be sure I'll validate with wider tables (maybe 20 CQs). I will also double check that biased locking is in effect. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832029#comment-13832029 ] Lars Hofhansl commented on HBASE-10015: --- 10 columns, 100 byte values, 1st and 4th selected. 5 runs, Mean: 13.756 Sigma: 0.04210938137755054, with lock patch 5 runs, Mean: 14.2028 Sigma: 0.08452313292821084, without 10 columns, all selected 5 runs, Mean: 8.577 Sigma: 0.09060463564299566 5 runs, Mean: 9.9846 Sigma: 0.1023749969474969, without Per KV cost now is dominant. In no scenario have I observed this to be slower. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832032#comment-13832032 ] Ted Yu commented on HBASE-10015: +1 Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832107#comment-13832107 ] Lars Hofhansl commented on HBASE-10015: --- I force enabled biased locking. No improvement with intrinsic locking; according to the docs it is enabled in JDK6+ anyway, was just making sure. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832147#comment-13832147 ] Enis Soztutar commented on HBASE-10015: --- If we do ref counting, can't we still finish the compaction, but not archive the files as long as there are scanners against them. We can do it by decoupling the store file archiving from compaction. At the worst case on RS crash, we might end up already compacted files which won't affect semantics. Regardless, +1 on -lock patch just for the gains. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832162#comment-13832162 ] Lars Hofhansl commented on HBASE-10015: --- Yeah, that's the idea. We'd delay archiving any HFile until no scanners are referring to it any more. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831001#comment-13831001 ] stack commented on HBASE-10015: --- So you are thinking this cannot be completed until after we add delayed clean up of no-longer-used files? Only then can we safely remove synchronizations? How many threads we talking anyways? It should be uncontended. The only thread is the current handler asking to return scan results -- this changes as different handlers come in on each bulk next invocation -- and then an incidental update readers request..and that is it? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1383#comment-1383 ] Lars Hofhansl commented on HBASE-10015: --- I think so (to your first point). This is (almost) never contented, but StoreScanner.peek() is called *very* frequently (including the compares in KeyValueHeap) and the memory fences enforced by synchronized cause a slowdown. I did notice that during flushed and compactions we do *not* register any listeners for changed readers, so my earlier idea of just synchronizing on the RegionScannerImpl should work after all. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831165#comment-13831165 ] Lars Hofhansl commented on HBASE-10015: --- Was just looking at making a trunk patch. In trunk StoreScanners for compaction *do* register a change observer? But not in 0.94?! So - sigh - the patch I just made for 0.94 will not work in trunk. And maybe more importantly: Is there a lingering bug in 0.94? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831191#comment-13831191 ] Lars Hofhansl commented on HBASE-10015: --- Continuing my monologue here... :) To my surprise I get a 25-30% perf improvement when I just replace intrinsic locking (synchronized) with ReentrantLock - again for very tall tables only. synchronized with biased locking should outperform the ReentrantLock in uncontended cases, but it does not (all my tests are against a real RegionServer, so the initial delay for BiasedLocking has not effect here). Something is going on. This is on JDK7 on old'ish 2 core machine. Will try on some other machines as well. If this bears out on other machines as well it would be safe and quick win. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13831202#comment-13831202 ] Lars Hofhansl commented on HBASE-10015: --- Verified on different machine architecture with JDK6 (using the attached unittest). 4400ms vs 3200ms (JDK7, 2 core machine) 2600mx vs 2000ms (JDK6, 12 core machine) Again, if somebody is still reading and could verify the latest patch with the earlier unittest on their hardware that would be greatly appreciated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-lock.txt, 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830719#comment-13830719 ] Lars Hofhansl commented on HBASE-10015: --- Something like this. On order for the compaction to make progress we could assign scanners to epochs, where everytime we change the readers for a store we go to a new epoch. If all scanners for an epoch are either done or have switched to the new epoch, we can retire the readers of that epoch. In any case, just commit this change and keep working on it? As Vladimir points out there are other issues with more serious performance implications. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830735#comment-13830735 ] Andrew Purtell commented on HBASE-10015: bq. On order for the compaction to make progress we could assign scanners to epochs, where everytime we change the readers for a store we go to a new epoch. If all scanners for an epoch are either done or have switched to the new epoch, we can retire the readers of that epoch. I haven't thought about this nearly as much as you have recently but that sounds promising. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830739#comment-13830739 ] stack commented on HBASE-10015: --- +1 on committing what is done already. epoch sounds like refcounting? Yeah, no hurry deleting the old stuff as long as it is done eventually. Accounting would be easier if we could move/rename files under the scanner as long is it does not disrupt (maybe I can try this). I like your idea of lockless scanning. Would be good to put it up as a goal even if hard to attain, if only to orientate which way progress lies. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830786#comment-13830786 ] Lars Hofhansl commented on HBASE-10015: --- Epoch would be like reference counting per distinct sweet of readers. With just a reference count of scanners I'd be worried that compaction would never make any progress. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830849#comment-13830849 ] Lars Hofhansl commented on HBASE-10015: --- Actually this is not quite right in 0.94, because it does not have HBASE-6499. (seek does not call checkReseek). I'll make that change as well, also checkUpdatedReaders can be folded into checkReseek for better readability. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830857#comment-13830857 ] Lars Hofhansl commented on HBASE-10015: --- No longer sure that patch is good. The problem was that StoreScanner.peek() is synchronized. With the patch peek() will use the old scanner stack, looking at MemstoreScanner and StoreFileScanner it *is* correct (both scanners just return the current KV and StoreFileScanner does not touch its reader during peek), but it is fragile, it also requires StoreScanner to hang on to all StoreFileScanners and the MemstoreScanner, until checkReseek is called or the scanner is closed. This in turn means that we keep references to the readers open, etc. I keep doing that... Filing jiras with patches and then finding that the fix is a bad idea. Well, at least I learned about modern CPU and that synchronized in tight loops is very expensive. I will keep thinking about this. Unscheduling for now. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829824#comment-13829824 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615283/10015-trunk-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830174#comment-13830174 ] Lars Hofhansl commented on HBASE-10015: --- Thanks Stack. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830243#comment-13830243 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615356/10015-trunk-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830296#comment-13830296 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615371/10015-trunk-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830309#comment-13830309 ] Lars Hofhansl commented on HBASE-10015: --- Need to look at the find bugs thing Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830333#comment-13830333 ] stack commented on HBASE-10015: --- +1 on commit. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830464#comment-13830464 ] Lars Hofhansl commented on HBASE-10015: --- There's more stuff to do. I got a crash course on CPU design from Hussam Mousa, one of our performance experts. We saw a lot of CPU frontend and backend stalls during scanning, so there is potential for a lot more improvements. I'm still looking for ways to remove all synchronization from StoreScanner and RegionScannerImpl. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830467#comment-13830467 ] Lars Hofhansl commented on HBASE-10015: --- I'll look at the findbugs issue, do some more tests, and then commit. An interesting metric we have to start to pay more attention is the scan cost per KV. For example scanning through 1m rows with one CQ is *much* slower than scanning through 100k rows with 10 CQs, even though it touches the same number of KVs. This patch helps a bit to even that out. As [~stack] and I said in the comments here, it should be possible to remove all synchronization form StoreScanner and RegionScannerImpl. It would require some refactoring. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830481#comment-13830481 ] stack commented on HBASE-10015: --- bq. It would require some refactoring. Lets make a plan. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830495#comment-13830495 ] Lars Hofhansl commented on HBASE-10015: --- The findbugs warning is a dud, it complains about StoreScanner.heap being locked 79% of the cases. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830496#comment-13830496 ] Vladimir Rodionov commented on HBASE-10015: --- {quote} so there is potential for a lot more improvements {quote} KV creation (new) is one of the serious bottlenecks, but I have no idea how to not create new instances on *next*. I have done some HBase internal hacks to get Maximum possible performance from scan. It is the multi-threaded application (in my case - 8HT threads) and scans on StoreFileScanner directly (data is cached 100%). The table was tall and narrow. Stock HBase was able to reach 50M KV per sec Stock with KV reuse (hack) - 90M KV per sec. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830498#comment-13830498 ] Lars Hofhansl commented on HBASE-10015: --- Here's another idea: * we already lock the RegionScannerImpl.next/seek/reseek, etc. * what if we pass the RegionScannerImpl instance as a lock object to StoreScanner * in updateReaders() we'd then lock on that RegionScannerImpl instance, or we'd call a special synchronized method in RegionScannerImpl and have it update the readers. The lock scope would be broader, but it should be correct, and it would allow us to remove all locking from StoreScanner. I'll experiment with that. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830499#comment-13830499 ] Lars Hofhansl commented on HBASE-10015: --- [~vrodionov], yeah, need to work on that too. Is that with block encoding? When using block encoding we need to copy the underlying byte[]. When no block encoding is used making a new KV is just a few dozen bytes. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830578#comment-13830578 ] Lars Hofhansl commented on HBASE-10015: --- bq. pass the RegionScannerImpl instance as a lock object to StoreScanner Alas, that works fine for scanning, but not for compactions/flushes (which also use StoreScanner, and can actually override the StoreScanner via a coprocessor hook). Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830580#comment-13830580 ] stack commented on HBASE-10015: --- bq. but not for compactions/flushes Tell us more? Am interested. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830588#comment-13830588 ] Lars Hofhansl commented on HBASE-10015: --- I thought, since we lock all operations at the RegioScannerImpl level anyway, we could exclude a Store's updateReaders at that level too. That works fine if we have a RegionScanner, but in the case of compactions and flushes we don't one, so I have no way to protect the StoreScanner reading on behalf of a flush or a compaction. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830597#comment-13830597 ] stack commented on HBASE-10015: --- And it would be silly creating a regionscannerimpl to do a storescan. We need to do update readers differently. Compactions and flushes are rare and background tasks. They should defer to the ongoing scans. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830600#comment-13830600 ] Lars Hofhansl commented on HBASE-10015: --- Yeah, I'm coming around to that. :) We have to figure out how to do that without excessive synchronization. At the minimum we have to guarantee that nobody is currently using the readers in question before we retire them. Alternatively we wait for all running scanners to finish. How do we do that? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830605#comment-13830605 ] Lars Hofhansl commented on HBASE-10015: --- I can come up with various schemes for that, but nothing that is cheaper than just synchronizing all the methods. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830612#comment-13830612 ] stack commented on HBASE-10015: --- + Scans run free for N seconds or nanoseconds since checking this should be cheap(?) and then they go to a checkpoint where they look to see if they should reset. ChangedReaders blocks until checkpoint has been cleared. + Scans check for closing being set on each op (I suppose this check of a volatile would be just as bad as a synchronization). + We refcount outstanding scanners and only delete compacted files when refcount goes to zero. Not sure how we'd switch in flushes unless we prevent the flush happening while ongoing scan (eek). Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829403#comment-13829403 ] Lars Hofhansl commented on HBASE-10015: --- Thanks [~vrodionov], we'll look at RegionScanner next. :) My evil plans is to eventually get rid of all synchronization during scanning and provide exclusion by containment instead. The overall improvement is real. Validated on various different machines. The effect of memory stalls is probably more pronounced in the running server. In any case, I think we agree that this patch can't make things worse (provide it is correct, of course). I also tried with count(*) queries in Phoenix on tall tables (to rule out some anomalies with Filters). I see a 90% improvement there - again only on tall tables. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829302#comment-13829302 ] Lars Hofhansl commented on HBASE-10015: --- I'll make a trunk patch later today. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829205#comment-13829205 ] Ted Yu commented on HBASE-10015: I tried to run the unit test but got: {code} testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance) Time elapsed: 0.007 sec ERROR! org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/tyu/hbase/hbase.version could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) {code} Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829416#comment-13829416 ] Lars Hofhansl commented on HBASE-10015: --- Some more results: 10 runs mean:1851.0 sigma:127.71374240855992 vs (without StoreScanner patch) 10 runs mean:2795.8 sigma:57.52182194611015 Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829423#comment-13829423 ] Lars Hofhansl commented on HBASE-10015: --- {code} 0: jdbc:phoenix:localhost select count(*) from tableTall; +--+ | COUNT(1) | +--+ | 2000 | +--+ 1 row selected (3.946 seconds) {code} With patch: {code} 0: jdbc:phoenix:localhost select count(*) from tableTall; +--+ | COUNT(1) | +--+ | 2000 | +--+ 1 row selected (2.915 seconds) {code} Phoenix is using FAST_DIFF encoding and the ExplicitColumnTracker, so the relative gain is only 25% there (one CQ per row) Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829485#comment-13829485 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615205/10015-trunk.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestHRegion org.apache.hadoop.hbase.util.TestMergeTable org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829212#comment-13829212 ] Lars Hofhansl commented on HBASE-10015: --- That would be related to your setup I think. Seems like it can't start the mini cluster. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829225#comment-13829225 ] Andrew Purtell commented on HBASE-10015: [~te...@apache.org] that's probably HBASE-5711, do 'umask 022' first. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829365#comment-13829365 ] Vladimir Rodionov commented on HBASE-10015: --- I ran RegionScanner and see ~10% improvement on on a very narrow rows only (8 store files in a region). This is 0.94.6. By the way the performance difference between StoreScanner and RegionScanner is huge (almost 3x times). Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829268#comment-13829268 ] Ted Yu commented on HBASE-10015: 200 iterations of TestStoreScanner, TestAtomicOperation and TestAcidGuarantees passed based on 0.94 patch. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829118#comment-13829118 ] Lars Hofhansl commented on HBASE-10015: --- If somebody could run the attached unit test that would be greatly appreciated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829223#comment-13829223 ] Andrew Purtell commented on HBASE-10015: Without: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2224.3 sigma:24.178709642989634 With: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:1373.3 sigma:14.920120642943875 This is on an EC2 c3.2xlarge, which uses 16 of 20 available hardware threads of a two socket board with Xeon E5-2680s, backed by SSDs. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829547#comment-13829547 ] Lars Hofhansl commented on HBASE-10015: --- The failures are all NPEs. Two of them could be related, the last is during DFS cluster shutdown. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829552#comment-13829552 ] Lars Hofhansl commented on HBASE-10015: --- Good together with some of our hardware exports. We measured some of a perf counters and found that with the patch we see 50% less branch-misses, which is significant. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829576#comment-13829576 ] Lars Hofhansl commented on HBASE-10015: --- 10 runs mean:2770.5 sigma:85.37827592543668, without 10 runs mean:1791.2 sigma:50.81495842761264, with Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829603#comment-13829603 ] Lars Hofhansl commented on HBASE-10015: --- Still getting the NPE in trunk in TestHRegion with v2, though (0.94 is fine with v2)... Looking. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829230#comment-13829230 ] Ted Yu commented on HBASE-10015: @Andy: Thanks for the hint. I ran 'umask 022' first but got the same result on Mac. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829624#comment-13829624 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615249/10015-trunk-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829645#comment-13829645 ] stack commented on HBASE-10015: --- The numbers are great. Patch looks good given the premise. Was going to suggest volatile instead of AtomicBoolean but looks like the getAndSet is fundamental so withdraw this nit. Is test failure because of this patch? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829649#comment-13829649 ] Lars Hofhansl commented on HBASE-10015: --- Unfortunately I am no longer sure it actually works. I identified the problem: In HStore.completeCompaction we call notifyChangedReadersObservers, which calls updateReaders on all StoreScanners. Before this patch, this method would block if there was a StoreScanner with a next/seek/reseek method. So before that patch one would guarantee that after notifyChangedReadersObservers returns it is safe to remove the compated files. That is no longer true. I'll see if I can come up with something. It is a shame that we have to synchronize and get 10's of millions of branch mispredictions per second, just so we can compact a few times a day. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829670#comment-13829670 ] Lars Hofhansl commented on HBASE-10015: --- A possible solution is keeping updateReaders() all methods that actually call checkReseek synchronized. So peek() could still go without synchronization. Will test the performance with that. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829673#comment-13829673 ] Lars Hofhansl commented on HBASE-10015: --- That seems to be correct. updateReaders now waits until all operations that are effected by the reader changes to finish. peek() does not need to be synchronized as long as we do not change scanner stack from under its feet, which we avoid by deferring to the scanner thread to do that. The unittest now yields this: 10 runs mean:4449.2 sigma:24.395901295094635, without patch 10 runs mean:2609.2 sigma:67.94085663280968, with patch So, still a significant win, albeit to quite what it was before. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829679#comment-13829679 ] stack commented on HBASE-10015: --- Can we have it so no changing of readers during a scan? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829686#comment-13829686 ] Lars Hofhansl commented on HBASE-10015: --- I think it would be hard to # push reader switching to the scanning thread # guarantee that the readers will eventually be switched to the compaction can proceed and finished What if a scanner isn't closed? Or nobody ever calls next() on it? Then we have to wait for the lease to expire before the compaction can finish. next()/seek()/reseek() are not as critical as they are not called remotely as often as peek. peek() is also a very small method and (according to our perf expert here) small synchronized methods have the greatest potential to throw the branch prediction off. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829695#comment-13829695 ] Lars Hofhansl commented on HBASE-10015: --- Phoenix with v3 (different machines than above, so do not compare absolute numbers). Without patch: {code} 0: jdbc:phoenix:localhost select count(*) from myNew1; +--+ | COUNT(1) | +--+ | 3000 | +--+ 1 row selected (18.417 seconds) {code} With patch: {code} 0: jdbc:phoenix:localhost select count(*) from myNew1; +--+ | COUNT(1) | +--+ | 3000 | +--+ 1 row selected (13.319 seconds) {code} Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829698#comment-13829698 ] stack commented on HBASE-10015: --- bq. Then we have to wait for the lease to expire before the compaction can finish. I was thinking this might not be the end of the world. What if we closed and reopened the current scanner when readers are changed. It is a rare event. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829700#comment-13829700 ] Lars Hofhansl commented on HBASE-10015: --- Actually now it does not even need the AtomicBoolean anymore. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829701#comment-13829701 ] Lars Hofhansl commented on HBASE-10015: --- The closing and reopening is what needs the synchronization :) A thread might still be using the scanner. If we can make it so that we do not have to synchronize all methods in StoreScanner, and still do it safely I'd be happy... But frankly ATM I do not see how without other expensive synchronization. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829732#comment-13829732 ] stack commented on HBASE-10015: --- bq. The closing and reopening is what needs the synchronization I'm not helping. This looks like an issue: +if (shouldUpdateReaders) { + shouldUpdateReaders = false; It is happening outside a sync block and shouldUpdateReaders is not volatile. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829736#comment-13829736 ] Lars Hofhansl commented on HBASE-10015: --- checkUpdateReaders is called from checkReseek, which is only called from synchronized methods. :) You are helping. Looking at this from different angles is good. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829751#comment-13829751 ] stack commented on HBASE-10015: --- OK. Would suggest running hadoopqa a few times. Tests around changing readers are usually pretty good at finding issues if any. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829754#comment-13829754 ] Lars Hofhansl commented on HBASE-10015: --- Yep. Found issues with the first two versions of this. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829767#comment-13829767 ] Hadoop QA commented on HBASE-10015: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615278/10015-trunk-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestLogRolling Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//console This message is automatically generated. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 10015-0.94-v2.txt, 10015-0.94-v3.txt, 10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828074#comment-13828074 ] Lars Hofhansl commented on HBASE-10015: --- The 3x improvement I see for tall tables. For wider tables the improvement is less pronounced (for 5 columns I see a 20% or so improvement). Also a note as to why I think this should be correct: Even with the current synchronized, a next/peek/reseek that has already started, will finish with the old reader. That has not changed. Need to look closer at races between close() and next/peek/reseek, as close could be called from another thread (I think), when the lease expired. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828082#comment-13828082 ] Lars Hofhansl commented on HBASE-10015: --- Actually the main difference is between ScanWildcardColumnTracker and ExplicitColumnTracker. With ExplicitColumnTracker there are still many reseeks that outweigh the performance improvement seen here. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828088#comment-13828088 ] Vladimir Rodionov commented on HBASE-10015: --- It is interesting. Java synchronization w/o thread contention cost is close to zero. You would see the difference only when you run multiple threads accessing the same StoreScanner. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828103#comment-13828103 ] Lars Hofhansl commented on HBASE-10015: --- That is true as far as actual thread synchronization goes. Every synchronize still places a read and write memory fence, though, and I think that is the effect we're seeing. I was a bit surprised about the magnitude of this myself. I can try switching one of my cores off and measure this again, if the issues is memory fencing we should not see any improvement with one core only. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828131#comment-13828131 ] Lars Hofhansl commented on HBASE-10015: --- Nope. I see the same effect with just one core. Will on some other machines and different versions of the JDK. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828130#comment-13828130 ] Vladimir Rodionov commented on HBASE-10015: --- I just ran mys tests and found no difference at all, but they were single-thread StoreScanner. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828278#comment-13828278 ] Lars Hofhansl commented on HBASE-10015: --- Close is fine, since it is only triggered via RegionScannerImpl, which is synchronized. The overhead I measured *could* explain the numbers I've seen here: https://issues.apache.org/jira/browse/HBASE-9440?focusedCommentId=13767047page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13767047 (Compared to the raw scan speed in an HFile I found medium overhead per column and a lot of overhead per rows, and StoreScanner.peek()/next(), etc, are mostly called per row). Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828289#comment-13828289 ] Andrew Purtell commented on HBASE-10015: The patch doesn't look wrong, but I'm surprised at the improvement. If not too onerous, perhaps you could attach a test case that reproduces it? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828288#comment-13828288 ] Lars Hofhansl commented on HBASE-10015: --- bq. I just ran mys tests and found no difference at all, but they were single-thread StoreScanner. Hmm... All my tests are single threaded. Different JVM versions? Are you testing with tall tables? Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828297#comment-13828297 ] Lars Hofhansl commented on HBASE-10015: --- It'll be hard to disentangle the test. Basically I am testing with a table with a single column family, and filtering all data at the server with a ValueFilter. Lemme have an extra look at the code. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828307#comment-13828307 ] Nick Dimiduk commented on HBASE-10015: -- Could the test be isolated in a variant of HFilePerfEval? I've had luck isolating the BlockCache in a similar variant (HBASE-9806). Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828332#comment-13828332 ] Vladimir Rodionov commented on HBASE-10015: --- {quote} Different JVM versions? {quote} 1.6_56 Mac OSX 10.7.5. I concur with myself one more time: the cost of synchronized is very low when there is no thread contention. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828346#comment-13828346 ] Lars Hofhansl commented on HBASE-10015: --- Strange... On my desktop machine (12 cores, JDK6) I also measure no difference. My Laptop has 2 cores and OpenJDK7. I wonder whether I am seeing an OpenJDK bug. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828361#comment-13828361 ] Lars Hofhansl commented on HBASE-10015: --- Confirmed on my laptop with OpenJDK, 2.5x improvement over 10 runs very low standard deviation. Going to do some microbenchmarks. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828365#comment-13828365 ] Lars Hofhansl commented on HBASE-10015: --- Wait... No, I do see the same improvement in the JDK6, 12 core box (I forgot to switch the server over). I'll attach a quick and dirty benchmark. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828410#comment-13828410 ] Lars Hofhansl commented on HBASE-10015: --- bq. I concur with myself one more time: the cost of synchronized is very low when there is no thread contention. Well, you're wrong twice then :) Just try it... Call a synchronized method in a loop a few 100 million times. Then remove the synchronized. Make sure the method returns something, such as a reference to a member, so it is not optimized immediately. On my test machines (JDK6 and JDK7) the latter is at least 40x faster on some machines it's 63x slower. All just a single thread. As I said before, synchronized does more than exclusion. # it barres JVM from reordering instructions # it places memory fences (both read and write), which -depending on exact hw- disallows instruction reordering of the CPU # it may flush cache lines Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828418#comment-13828418 ] Lars Hofhansl commented on HBASE-10015: --- If somebody else could run the test I attached that'd be great. I ran against a local single node HBase on top of a single node HDFS cluster, with all data in the blockcache. Maybe a unittest against a mini cluster would be better. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828439#comment-13828439 ] Vladimir Rodionov commented on HBASE-10015: --- May be I am wrong (empty synchronized method call cost on my laptop is 25 ns) but my own tests on StoreScanner show 0 improvement. Code is simple: create region, populate with data (make sure data is in a cache) , then {code} LOG.info(Test store scanner); Scan scan = new Scan(); scan.setStartRow(region.getStartKey()); scan.setStopRow(region.getEndKey()); Store store = region.getStore(CF); StoreScanner scanner = new StoreScanner(store, store.getScanInfo(), scan, null); long start = System.currentTimeMillis(); int total = 0; ListKeyValue result = new ArrayListKeyValue(); while(scanner.next(result)){ total++; result.clear(); } LOG.info(Test store scanner finished. Found +total + in +(System.currentTimeMillis() - start)+ms); {code} This test shows exact the same time for both: default StoreScanner and *unsynchronized* StoreScanner. The scan is not very fast: 1-1.5M rows per sec (rows are relatively small: 1 CF + 5 CQ, ~ 120 bytes ) Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828466#comment-13828466 ] Vladimir Rodionov commented on HBASE-10015: --- For all microbenchmarks, please add the following command -line args: {code} -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0 {code} Oracle JVM does not enable biased locking (single thread lock optimization) first 4s of program execution. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828472#comment-13828472 ] Lars Hofhansl commented on HBASE-10015: --- Can you try at the RegionScanner level, which maintains a heap of StoreScanners and calls peek() quite frequently. In my sampling sampling profiler I saw StoreScanner.peek() come us as some of the top method where the time is spent (and does nothing but doing a compare on a local member then calls peek on its heap). Also 25ns are more than 100 cycles on modern HW, in line what I would expect from memory stalls caused by the fences. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828473#comment-13828473 ] Lars Hofhansl commented on HBASE-10015: --- Will add those options. All the tests run at least 15s, though. Oh, also rereading your comment above... I saw the most significant improvement per row (i.e. with 5 CQ you'd see less of an improvement). This very much looks like it is an interaction between KeyValueHeap and StoreScanner, where the cost seems to be mostly per row, rather than KeyValue. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828477#comment-13828477 ] Vladimir Rodionov commented on HBASE-10015: --- {quote} Also 25ns are more than 100 cycles on modern HW, in line what I would expect from memory stalls caused by the fences. {quote} With biased locking on, its ~ 2ns. I will try RegionScanner. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828547#comment-13828547 ] Lars Hofhansl commented on HBASE-10015: --- One last note before I call it quits today. The win appears to reduce the per row overhead by 50-60%. With 5 CQs this shrinks proportionally to 10-15% or so. Major performance improvement: Avoid synchronization in StoreScanner Key: HBASE-10015 URL: https://issues.apache.org/jira/browse/HBASE-10015 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Attachments: 10015-0.94-withtest.txt, 10015-0.94.txt, TestLoad.java Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there. It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time. So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work. The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios. Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass. Will attach a sample patch. -- This message was sent by Atlassian JIRA (v6.1#6144)