[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2021-10-06 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424499#comment-17424499
 ] 

Janus Chow edited comment on HDFS-13821 at 10/6/21, 7:58 AM:
-

The performance bottleneck should be related to the bug mentioned in 
[https://github.com/google/guava/issues/1055] .

We can work around this issue by setting initialCapacity to maxCacheSize 
(mentioned in [https://unportant.info/chasing-down-guava-cache-slowness.html] )

In branch 2.10, the guava version is 11.0.2, it's still affected.


was (Author: symious):
The performance bottleneck should be related to the bug mentioned in 
[https://github.com/google/guava/issues/1055.] 

We can work around this issue by setting initialCapacity to maxCacheSize 
(mentioned in [https://unportant.info/chasing-down-guava-cache-slowness.html).] 

In branch 2.10, the guava version is 11.0.2, it's still affected.

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-13821.001.patch, HDFS-13821.002.patch, 
> HDFS-13821.003.patch, HDFS-13821.004.patch, HDFS-13821.005.patch, 
> HDFS-13821.006.patch, HDFS-13821.007.patch, HDFS-13821.008.patch, 
> LocalCacheTest.java, image-2018-08-13-11-27-49-023.png
>
>
> When i test rbf, if found performance problem.
> I found that ProxyAvgTime From Ganglia is so high, i run jstack on Router and 
> get the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads blocked on *LocalCache*
> After disable the cache, ProxyAvgTime is down as follow showed
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-21 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588405#comment-16588405
 ] 

Yiqun Lin edited comment on HDFS-13821 at 8/22/18 5:21 AM:
---

Committed this to trunk, branch-3.1, branch-3.0, branch-2 and branch-2.9.
Thanks [~ferhui] for the contribution and thanks [~elgoiri] for the review.
[~ferhui], I also added you as a HDFS contributor role. Congratulates for your 
first HDFS patch, :).


was (Author: linyiqun):
Committed this to trunk, branch-3.1, branch-3.0, branch-2 and branch-2.9.
Thanks [~ferhui] for the contribution and thanks [~ferhui] for the review.
[~ferhui], I also added you as a HDFS contributor role. Congratulates for your 
first HDFS patch, :).

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-13821.001.patch, HDFS-13821.002.patch, 
> HDFS-13821.003.patch, HDFS-13821.004.patch, HDFS-13821.005.patch, 
> HDFS-13821.006.patch, HDFS-13821.007.patch, HDFS-13821.008.patch, 
> LocalCacheTest.java, image-2018-08-13-11-27-49-023.png
>
>
> When i test rbf, if found performance problem.
> I found that ProxyAvgTime From Ganglia is so high, i run jstack on Router and 
> get the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads blocked on *LocalCache*
> After disable the cache, ProxyAvgTime is down as follow showed
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-15 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580749#comment-16580749
 ] 

Yiqun Lin edited comment on HDFS-13821 at 8/15/18 6:12 AM:
---

Thanks [~ferhui] for providing the test results!
As [~ferhui] pointed that the bottleneck seems in the localcache. I look into 
the localcache instance, it uses reentrant lock (not read/write lock) for the 
thread-safe operation. So here the problem is that when multiple read/write 
operations are doing  for the cache, the cache maybe looks bad.

{quote}
Improve the locking model. From the trace Fei Hui posted, I'm guessing that the 
issue is that we are holding the write lock a lot.
{quote}
[~elgoiri], improve the locking model in MountTableResolver maybe not help us a 
lot if bottleneck is in the localcache.


was (Author: linyiqun):
Thanks [~ferhui] for providing the test results!
As [~ferhui] pointed that the bottleneck seems in the localcache. I look into 
the localcache instance, it uses reentrant lock (not read/write lock) for the 
thread-safe operation. So here the problem is that when multiple read/write 
operations are doing  for the cache, the cache maybe looks bad.

{quote}
{quote}

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13821.001.patch, LocalCacheTest.java, 
> image-2018-08-13-11-27-49-023.png
>
>
> When i test rbf, if found performance problem.
> I found that ProxyAvgTime From Ganglia is so high, i run jstack on Router and 
> get the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads blocked on *LocalCache*
> After disable the cache, ProxyAvgTime is down as follow showed
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org