[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
[ https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424499#comment-17424499 ]

Janus Chow edited comment on HDFS-13821 at 10/6/21, 7:58 AM:
---

The performance bottleneck is likely related to the bug reported in [https://github.com/google/guava/issues/1055].

We can work around this issue by setting initialCapacity to maxCacheSize (as described in [https://unportant.info/chasing-down-guava-cache-slowness.html]).

Branch 2.10 uses Guava 11.0.2, which is still affected.

was (Author: symious):
The performance bottleneck should be related to the bug mentioned in [https://github.com/google/guava/issues/1055.]

We can work around this issue by setting initialCapacity to maxCacheSize (mentioned in [https://unportant.info/chasing-down-guava-cache-slowness.html).]

In branch 2.10, the guava version is 11.0.2, it's still affected.

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
> ---
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Hui Fei
>            Assignee: Hui Fei
>            Priority: Major
>             Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
>         Attachments: HDFS-13821.001.patch, HDFS-13821.002.patch, HDFS-13821.003.patch, HDFS-13821.004.patch, HDFS-13821.005.patch, HDFS-13821.006.patch, HDFS-13821.007.patch, HDFS-13821.008.patch, LocalCacheTest.java, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> ProxyAvgTime as reported by Ganglia was very high, so I ran jstack on the Router and got the following stack frames:
> {quote}
> java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0005c264acd8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache, ProxyAvgTime dropped, as shown below:
> !image-2018-08-13-11-27-49-023.png!

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
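
The option named in the issue title, dfs.federation.router.mount-table.cache.enable, makes the lookup cache optional so that requests can skip it when it becomes a bottleneck. The idea can be sketched in plain Java as below. This is only an illustration under stated assumptions: ToggleableResolver and its lookup function are hypothetical names, the cache is a plain ConcurrentHashMap rather than the Guava cache seen in the stack trace, and none of this is taken from the attached patches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical resolver whose lookup cache can be switched off by a flag. */
final class ToggleableResolver {
    /** The uncached resolution step (stands in for a mount table lookup). */
    private final Function<String, String> lookup;
    /** Null when caching is disabled, so no cache lock is ever touched. */
    private final Map<String, String> cache;

    ToggleableResolver(Function<String, String> lookup, boolean cacheEnabled) {
        this.lookup = lookup;
        this.cache = cacheEnabled ? new ConcurrentHashMap<>() : null;
    }

    String resolve(String path) {
        if (cache == null) {
            // Cache disabled: every call goes straight to the lookup.
            return lookup.apply(path);
        }
        // Cache enabled: the lookup runs only on the first request for a path.
        return cache.computeIfAbsent(path, lookup);
    }

    public static void main(String[] args) {
        Function<String, String> lookup = p -> "ns0->" + p; // made-up destination format
        ToggleableResolver cached = new ToggleableResolver(lookup, true);
        ToggleableResolver uncached = new ToggleableResolver(lookup, false);
        System.out.println(cached.resolve("/user/a"));
        System.out.println(uncached.resolve("/user/a"));
    }
}
```

Both resolvers return the same destination; the flag only decides whether the shared cache structure is consulted at all.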
[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
[ https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588405#comment-16588405 ]

Yiqun Lin edited comment on HDFS-13821 at 8/22/18 5:21 AM:
---

Committed this to trunk, branch-3.1, branch-3.0, branch-2 and branch-2.9.

Thanks [~ferhui] for the contribution and thanks [~elgoiri] for the review. [~ferhui], I also added you to the HDFS contributor role. Congratulations on your first HDFS patch :).

was (Author: linyiqun):
Committed this to trunk, branch-3.1, branch-3.0, branch-2 and branch-2.9.

Thanks [~ferhui] for the contribution and thanks [~ferhui] for the review. [~ferhui], I also added you as a HDFS contributor role. Congratulates for your first HDFS patch, :).

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
> ---
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>             Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
>         Attachments: HDFS-13821.001.patch, HDFS-13821.002.patch, HDFS-13821.003.patch, HDFS-13821.004.patch, HDFS-13821.005.patch, HDFS-13821.006.patch, HDFS-13821.007.patch, HDFS-13821.008.patch, LocalCacheTest.java, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
[ https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580749#comment-16580749 ]

Yiqun Lin edited comment on HDFS-13821 at 8/15/18 6:12 AM:
---

Thanks [~ferhui] for providing the test results!

As [~ferhui] pointed out, the bottleneck seems to be in the LocalCache. I looked into the LocalCache instance: it uses a reentrant lock (not a read/write lock) for thread safety. So the problem is that when many read/write operations hit the cache concurrently, the cache may perform badly.
{quote}
Improve the locking model. From the trace Fei Hui posted, I'm guessing that the issue is that we are holding the write lock a lot.
{quote}
[~elgoiri], improving the locking model in MountTableResolver may not help much if the bottleneck is in the LocalCache.

was (Author: linyiqun):
Thanks [~ferhui] for providing the test results!

As [~ferhui] pointed that the bottleneck seems in the localcache. I look into the localcache instance, it uses reentrant lock (not read/write lock) for the thread-safe operation. So here the problem is that when multiple read/write operations are doing for the cache, the cache maybe looks bad.
{quote}
{quote}

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
> ---
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Fei Hui
>            Priority: Major
>         Attachments: HDFS-13821.001.patch, LocalCacheTest.java, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
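
The comment above contrasts a reentrant lock with a read/write lock. The general difference between the two can be sketched with the JDK lock classes: an exclusive ReentrantLock turns away a second thread even if both only want to read, while the read side of a ReentrantReadWriteLock admits it. LockModelDemo and secondThreadCanAcquire are hypothetical names chosen for this illustration, and the sketch does not model Guava's LocalCache internals or any code in MountTableResolver.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo comparing an exclusive lock with a read/write lock. */
final class LockModelDemo {

    /**
     * Holds {@code held} on the calling thread and reports whether a second
     * thread can acquire {@code wanted} at that moment without waiting.
     */
    static boolean secondThreadCanAcquire(Lock held, Lock wanted) {
        ExecutorService other = Executors.newSingleThreadExecutor();
        held.lock();
        try {
            return other.submit(() -> {
                boolean acquired = wanted.tryLock(); // non-blocking attempt
                if (acquired) {
                    wanted.unlock();
                }
                return acquired;
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            held.unlock();
            other.shutdown();
        }
    }

    public static void main(String[] args) {
        ReentrantLock exclusive = new ReentrantLock();
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

        // Exclusive lock: the second thread is turned away.
        System.out.println(secondThreadCanAcquire(exclusive, exclusive));         // false
        // Read lock: a second reader is admitted while the first still holds it.
        System.out.println(secondThreadCanAcquire(rw.readLock(), rw.readLock())); // true
        // Write lock: readers are excluded while a writer holds the lock.
        System.out.println(secondThreadCanAcquire(rw.writeLock(), rw.readLock())); // false
    }
}
```

Whether a read/write lock would actually help here depends on how often requests take the locked path, which this sketch does not measure.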