[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110448#comment-17110448 ] Wei-Chiu Chuang commented on HDFS-15202: Reverted the breaking commit, and squashed the original commit with the addendum patch into one. Verified the branch compiles in trunk and branch-3.3. Thanks guys! > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110449#comment-17110449 ] Hudson commented on HDFS-15202: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18270 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18270/]) Revert "HDFS-15202 Boost short circuit cache (rebase PR-1884) (#2016)" (weichiu: rev 4525292d41482330a86f1cc3935e072f9f67c308) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderLocal.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java HDFS-15202 Boost short circuit cache (rebase PR-1884) (#2016) (weichiu: rev 2abcf7762ae74b936e1cedb60d5d2b4cc4ee86ea) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderLocal.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) ---
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110439#comment-17110439 ] Danil Lipovoy commented on HDFS-15202: -- I'am so sorry, my bad( > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110431#comment-17110431 ] Xiaoqiao He commented on HDFS-15202: [^HDFS-15202-Addendum-01.patch] works fine for me. +1 from my side. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110428#comment-17110428 ] Wei-Chiu Chuang commented on HDFS-15202: Thanks. Working on that. Given me a few minutes. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110421#comment-17110421 ] Ayush Saxena commented on HDFS-15202: - [~weichiu] can you check is the code compiling with this? Wasn't happening for me. I attached the changes I did for my compilation to work. Give a check if the compilation is broken for you too.. If the code change is itself wrong you can revert itself or see if the addendum works for you. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110419#comment-17110419 ] Mikhail Pryakhin commented on HDFS-15202: - [~weichiu] it seems that this patch breaks test compilation at trunk > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15202-Addendum-01.patch, HDFS_CPU_full_cycle.png, > cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110319#comment-17110319 ] Hudson commented on HDFS-15202: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18267 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18267/]) HDFS-15202 Boost short circuit cache (rebase PR-1884) (#2016) (github: rev 86e6aa8eec538e142044e2b6415ec1caff5e9cbd) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientContext.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/client/impl/TestBlockReaderLocal.java > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110317#comment-17110317 ] Wei-Chiu Chuang commented on HDFS-15202: Merged the PR to trunk, and cherrypicked to branch-3.3. The commit doesn't apply cleanly to branch-3.2 and lower branches. If that is desired please raise a new PR against those branches. Thanks [~pustota] [~leosun08] > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098310#comment-17098310 ] Danil Lipovoy commented on HDFS-15202: -- [~leosun08] Thank you!) > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098280#comment-17098280 ] Lisheng Sun commented on HDFS-15202: Sorry [~pustota] Since there are many things recently, I will review this patch in the coming week. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081718#comment-17081718 ] Danil Lipovoy commented on HDFS-15202: -- [~leosun08] Could you take a look, please? > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076149#comment-17076149 ] Danil Lipovoy commented on HDFS-15202: -- [~leosun08] Sorry for waiting, was a little bit busy) I added few UT, all passed: [[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 116.58 s - in org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal|http://example.com] > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059985#comment-17059985 ] Danil Lipovoy commented on HDFS-15202: -- [~quanli] I did 2 tests: 1. Via HBase. It was massive reading huge tables: 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total Random read in 100, 200,... 500 threads via YCSB (good utility special for DB performance testing) It show +25% increase perforormance. Don't so much because the bottle neck is here HBase . 2. Direct reading by function (I provided more details above): FSDataInputStream in = fileSystem.open(path); int res = in.read(position, byteBuffer, 0, 65536); This function working in separate thread. Each thread is reading separate file ~1Gb. I run 10, 20, ... 200 threads and it means were reading 10, 20, ... 200 files. Importdant notice - this performance possible only with fast SSD or when data cached by Linux buff/cache (my case). In this scenario increase performance about +500%. Environment: 4 nodes E5-2698 v4 @ 2.20GHz (40 cores each), 700 Gb Mem. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059763#comment-17059763 ] Quan Li commented on HDFS-15202: What procedure is followed for the performance test Danil? and what is the environment details > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059653#comment-17059653 ] Lisheng Sun commented on HDFS-15202: UT mean unit test. the current unit test is that only the case that clientShortCircuitNum = 1, we also need to add the case that clientShortCircuitNum != 1. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059648#comment-17059648 ] Danil Lipovoy commented on HDFS-15202: -- >>Yes, you can develop it step by step. Cool) >>I think you need add UT when clientShortCircuitNum != 1. Could you explain what "UT" mean, please? > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059643#comment-17059643 ] Lisheng Sun commented on HDFS-15202: Yes, you can develop it step by step. I think you need add UT when clientShortCircuitNum != 1. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059638#comment-17059638 ] Danil Lipovoy commented on HDFS-15202: -- Yes, you are right, there are ways improve performance even more. On the other hand it would by more difficult. Can we go step by step, something like agile way? Any way, since it is better then before, can we merge it? It is important for me) I would test some CRC32 for example later. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059624#comment-17059624 ] Lisheng Sun commented on HDFS-15202: Two situations: 1. when clientShortCircuitNum is 10, The ShortCircuitCache's for the blockId is not very uniform. 2. For example if clientShortCircuitNum is 3, when a lot of blockids of SSR are ***1, ***4, ***7, this situation will fall into a ShortCircuitCache. So i suggest, is it possible to design a strategy which is allocated to a specific ShortCircuitCache. This should improve performance even more. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058076#comment-17058076 ] Danil Lipovoy commented on HDFS-15202: -- [~leosun08] Thanks for interest!) Let me provide more details. I added some logging: {code:java} private BlockReader getBlockReaderLocal() throws InvalidToken { ... *LOG.info("SSC blockId: " + block.getBlockId());* ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId()); {code} And run reading test via HBase. We can see the log output: cat hbase-cmf-hbase-REGIONSERVER-wx1122-02.ru.log.out |grep "SSC blockId" 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256236 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251488 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256526 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110252104 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256969 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256965 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256382 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251751 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256871 2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251769 2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256825 2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256241 2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256548 2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251488 2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256808 2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110252097 2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110257035 2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403 2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256779 2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256331 2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251691 2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251521 2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835 2020-03-12 18:47:32,452 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251691 2020-03-12 18:47:32,455 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251769 We can see that last digit is evenly distributed. There are: 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403 -> 3 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835 -> 5 2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256236 -> 6 Then I collected some statistics: cat hbase-cmf-hbase-REGIONSERVER-hw2288-32.cloud.dev.df.sbrf.ru.log.out.2 |grep "SSC blockId"|awk '{print substr($0,length,1)}' >> ~/ids.txt cat ~/ids.txt | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}'|sort 0 645813 1 559617 2 532624 3 551147 4 484945 5 465928 6 570635 7 473285 8 525565 9 447981 It means the logs the last digit 0 found 645813 times Digit 1 - 559617 times etc. Quite evenly. When we divide blockId by modulus the blocks will cached evenly too. For example if clientShortCircuitNum = 3, then last digts will go to: blockId *0 -> shortCircuitCaches[0] blockId *1 -> shortCircuitCaches[1] blockId *2 -> shortCircuitCaches[2] blockId *3 -> shortCircuitCaches[0] blockId *4 -> shortCircuitCaches[1] blockId *5 -> shortCircuitCaches[2] blockId *6 -> shortCircuitCaches[0] blockId *7 -> shortCircuitCaches[1] blockId *8 -> shortCircuitCaches[2] blockId *9 -> shortCircuitCaches[0] There a little bit more [0] then [1] or [2], but not too much and works good) > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057937#comment-17057937 ] Lisheng Sun commented on HDFS-15202: Thanx [~pustota] for your report. I understand your PR is allocating more ShortCircuitCaches, to utilize each ShortCircuitCache's own lock without blocking each othe to solve performance problems. Therefore, we need to consider distributing the blocks as evenly as possible in more caches. The current code is simply taking the remainder through the blockId and clientShortCircuitNum. This should also have hot cache issues. Please correct me i was wrong. Thank you. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054902#comment-17054902 ] Danil Lipovoy commented on HDFS-15202: -- How to read the picture: the execution time of 100 thousand readings in blocks of 64 KB with one cache (SSC=1) take 78 seconds. When SCC=5 caches this is done in 17 seconds. It means acceleration of 4.5 times. The best results relative CPU utilization when SCC=2 or SCC=3. If we compare SCC=3 increase performance = ~3.3 in average but CPU utilization only ~2.8. !hdfs_scc_3_test.png! > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, > hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054897#comment-17054897 ] Danil Lipovoy commented on HDFS-15202: -- Done more tests. There were test via HBase. More clean just read direct: {code:java} conf.set("dfs.client.read.shortcircuit", "true"); conf.set("dfs.client.read.shortcircuit.buffer.size", "65536"); // by default 1 Mb and it is too much conf.set("dfs.client.short.circuit.num", num); // from 1 to 10 ... FSDataInputStream in = fileSystem.open(path); for (int i = 0; i < count; i++) { position += 65536; if (position > 9) position = 0L; int res = in.read(position, byteBuffer, 0, 65536); } {code} This code is executed in separate threads and we will increase the number of simultaneously reading files (from 10 to 200 - the horizontal axis) and the number of caches (from 1 to 10 - lines). The vertical axis shows the acceleration that gives an increase in SSC relative to the case when there is only one cache. !hdfs_scc_test_full-cycle.png! !HDFS_CPU_full_cycle.png! > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, > hdfs_cpu.png, hdfs_reads.png, hdfs_scc_test_full-cycle.png, locks.png, > requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054059#comment-17054059 ] Hadoop QA commented on HDFS-15202: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 58s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s{color} | {color:green} trunk passed {color} | | {color:orange}-0{color} | {color:orange} patch {color} | {color:orange} 3m 14s{color} | {color:orange} Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 38s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 58s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 39s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 39s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 54s{color} | {color:orange} hadoop-hdfs-project: The patch generated 6 new + 272 unchanged - 0 fixed = 278 total (was 272) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 39s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 1m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m 2s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 39s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 59s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 41s{color} | {color:red} hadoop-hdfs-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 0s
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054040#comment-17054040 ] Danil Lipovoy commented on HDFS-15202: -- Here we can see the locks when the cache is only 1. !locks.png! Thats why we have more fast reading when use few caches. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, > locks.png, requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054022#comment-17054022 ] Danil Lipovoy commented on HDFS-15202: -- !cpu_SSC2.png! > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, > requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054021#comment-17054021 ] Danil Lipovoy commented on HDFS-15202: -- Done more tests. Moved out generators on others hosts. And added more threads (by 100) every 5 minutes. It is help to see were the limits of performance. And now CPU usage grows like performance about +30% !cpu_SSC.png! !requests_SSC.png! > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: cpu_SSC.png, hdfs_cpu.png, hdfs_reads.png, > requests_SSC.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054018#comment-17054018 ] Danil Lipovoy commented on HDFS-15202: -- [~weichiu] [Done|https://github.com/apache/hadoop/pull/1884] now) > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: hdfs_cpu.png, hdfs_reads.png > > > ТотI want to propose how to improve reading performance HDFS-client. The > idea: create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051452#comment-17051452 ] Danil Lipovoy commented on HDFS-15202: -- [~weichiu] ok, [done|https://github.com/apache/hadoop-common/pull/51]) > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: hdfs_cpu.png, hdfs_reads.png > > > I want to propose how to improve reading performance HDFS-client. The idea: > create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050520#comment-17050520 ] Wei-Chiu Chuang commented on HDFS-15202: [~pustota] could you raise a PR against https://github.com/apache/hadoop (trunk branch) I'm sorry it must be confusing to you. > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: hdfs_cpu.png, hdfs_reads.png > > > I want to propose how to improve reading performance HDFS-client. The idea: > create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050518#comment-17050518 ] Wei-Chiu Chuang commented on HDFS-15202: [~openinx] [~leosun08] are you guys interested in reviewing this improvement since this concerns SCR performance? > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: hdfs_cpu.png, hdfs_reads.png > > > I want to propose how to improve reading performance HDFS-client. The idea: > create few instances ShortCircuit caches instead of one. > The key points: > 1. Create array of caches (set by > clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15202) HDFS-client: boost ShortCircuit Cache
[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050472#comment-17050472 ] Danil Lipovoy commented on HDFS-15202: -- Hi [~weichiu]! Of course, I am newbe here, don't know how to do better) > HDFS-client: boost ShortCircuit Cache > - > > Key: HDFS-15202 > URL: https://issues.apache.org/jira/browse/HDFS-15202 > Project: Hadoop HDFS > Issue Type: Improvement > Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem. > 8 RegionServers (2 by host) > 8 tables by 64 regions by 1.88 Gb data in each = 1200 Gb total > Random read in 800 threads via YCSB and a little bit updates (10% of reads) >Reporter: Danil Lipovoy >Assignee: Danil Lipovoy >Priority: Minor > Attachments: hdfs_cpu.png, hdfs_reads.png > > > I want to propose how to improve reading performance HDFS-client. The idea: > create few instances SchortCircuit caches instead of one. > The key points: > 1. Create array of caches (see *dfs.client.short.circuit.num* in the pull > requests below): > {code:java} > private ClientContext(String name, DfsClientConf conf, Configuration config) { > ... > shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum]; > for (int i = 0; i < this.clientShortCircuitNum; i++) { > this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf); > } > {code} > 2 Then divide blocks by caches: > {code:java} > public ShortCircuitCache getShortCircuitCache(long idx) { > return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; > } > {code} > 3. And how to call it: > {code:java} > ShortCircuitCache cache = > clientContext.getShortCircuitCache(block.getBlockId()); > {code} > The last number of offset evenly distributed from 0 to 9 - that's why all > caches will full approximately the same. > It is good for performance. Below the attachment, it is load test reading > HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that > performance grows ~30%, CPU usage about +15%. > Hope it is interesting for someone. > Ready to explain some unobvious things. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org