[ https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058076#comment-17058076 ]
Danil Lipovoy edited comment on HDFS-15202 at 3/12/20, 4:40 PM:
----------------------------------------------------------------

[~leosun08] Thanks for your interest!) Let me provide more details.

I added some logging:
{code:java}
private BlockReader getBlockReaderLocal() throws InvalidToken {
  ...
  LOG.info("SSC blockId: " + block.getBlockId());
  ShortCircuitCache cache =
      clientContext.getShortCircuitCache(block.getBlockId());
{code}
Then I ran a read test via HBase. The log output looks like this:
{code}
cat hbase-cmf-hbase-REGIONSERVER-wx1122-02.ru.log.out | grep "SSC blockId"
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256236
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251488
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256526
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110252104
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256969
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256965
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256382
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251751
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256871
2020-03-12 18:47:32,447 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251769
2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256825
2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256241
2020-03-12 18:47:32,448 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256548
2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251488
2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256808
2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110252097
2020-03-12 18:47:32,449 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110257035
2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403
2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256779
2020-03-12 18:47:32,450 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256331
2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251691
2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251521
2020-03-12 18:47:32,451 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835
2020-03-12 18:47:32,452 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251691
2020-03-12 18:47:32,455 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251769
{code}
We can see that the last digit is evenly distributed. For example:
{code}
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256403 -> 3
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110251835 -> 5
2020-03-12 18:47:32,446 INFO org.apache.hadoop.hdfs.BlockReaderFactory: SSC blockId: 1110256236 -> 6
{code}
Then I collected some statistics:
{code}
cat hbase-cmf-hbase-REGIONSERVER-wx1122-02.ru.out | grep "SSC blockId" | awk '{print substr($0,length,1)}' >> ~/ids.txt
cat ~/ids.txt | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort
0       645813
1       559617
2       532624
3       551147
4       484945
5       465928
6       570635
7       473285
8       525565
9       447981
{code}
This means that in the logs the last digit 0 occurs 645813 times, digit 1 occurs 559617 times, and so on. Quite even.
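The bucketing this tally motivates can be sketched in a few lines. Below is a minimal standalone demo (the class name and helper method are invented for illustration; the sample IDs are taken from the log excerpt above) of how a block ID is mapped onto one of clientShortCircuitNum caches by the modulo in the proposed getShortCircuitCache():

```java
// Hypothetical sketch, not part of the HDFS patch itself.
public class CacheIndexDemo {
    // Mirrors the proposed getShortCircuitCache(): pick a cache slot
    // by blockId modulo the number of caches (clientShortCircuitNum).
    static int cacheIndex(long blockId, int numCaches) {
        return (int) (blockId % numCaches);
    }

    public static void main(String[] args) {
        int numCaches = 3; // clientShortCircuitNum = 3
        int[] hits = new int[numCaches];
        // A few block IDs from the log excerpt above.
        long[] ids = {1110256403L, 1110251835L, 1110256236L,
                      1110251488L, 1110256526L, 1110252104L};
        for (long id : ids) {
            hits[cacheIndex(id, numCaches)]++;
        }
        // Show how many of the sampled blocks land in each cache.
        for (int i = 0; i < numCaches; i++) {
            System.out.println("shortCircuitCaches[" + i + "]: " + hits[i]);
        }
    }
}
```

Note that for numCaches = 3 the result depends on the whole blockId, not only its last digit; since block IDs are assigned roughly sequentially, the residues still cycle evenly across the caches.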
When we take blockId modulo clientShortCircuitNum, the blocks are cached evenly too. For example, if clientShortCircuitNum = 3 and we bucket by the last digit, the digits map to:
{code}
blockId *0 -> shortCircuitCaches[0]
blockId *1 -> shortCircuitCaches[1]
blockId *2 -> shortCircuitCaches[2]
blockId *3 -> shortCircuitCaches[0]
blockId *4 -> shortCircuitCaches[1]
blockId *5 -> shortCircuitCaches[2]
blockId *6 -> shortCircuitCaches[0]
blockId *7 -> shortCircuitCaches[1]
blockId *8 -> shortCircuitCaches[2]
blockId *9 -> shortCircuitCaches[0]
{code}
Cache [0] gets a little more than [1] or [2], but not by much, and it works well)


> HDFS-client: boost ShortCircuit Cache
> -------------------------------------
>
>          Key: HDFS-15202
>          URL: https://issues.apache.org/jira/browse/HDFS-15202
>      Project: Hadoop HDFS
>   Issue Type: Improvement
>   Components: dfsclient
>  Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 8 RegionServers (2 per host)
> 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total
> Random read in 800 threads via YCSB and a few updates (10% of reads)
>     Reporter: Danil Lipovoy
>     Assignee: Danil Lipovoy
>     Priority: Minor
>  Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png
>
> I want to propose how to improve the reading performance of the HDFS client. The idea: create several ShortCircuit cache instances instead of one.
> The key points:
> 1. Create an array of caches (sized by clientShortCircuitNum = *dfs.client.short.circuit.num*, see the pull requests below):
> {code:java}
> private ClientContext(String name, DfsClientConf conf, Configuration config) {
>   ...
>   shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
>   for (int i = 0; i < this.clientShortCircuitNum; i++) {
>     this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
>   }
> {code}
> 2. Then divide the blocks between the caches:
> {code:java}
> public ShortCircuitCache getShortCircuitCache(long idx) {
>   return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
> }
> {code}
> 3. And how to call it:
> {code:java}
> ShortCircuitCache cache =
>     clientContext.getShortCircuitCache(block.getBlockId());
> {code}
> The last digit of the block ID is evenly distributed from 0 to 9, so all caches fill up at approximately the same rate.
> This is good for performance. The attachments show a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs 3. Performance grows by ~30%, with CPU usage up about 15%.
> Hope this is interesting for someone.
> Ready to explain some unobvious things.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)