[ https://issues.apache.org/jira/browse/HDFS-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791877#comment-17791877 ]

ASF GitHub Bot commented on HDFS-17241:
---------------------------------------

shuaiqig commented on PR #6236:
URL: https://github.com/apache/hadoop/pull/6236#issuecomment-1835368742

   > > When rollEditLog() is called, the ANN writes seen_txid to both 
dfs.namenode.name.dir and dfs.namenode.edits.dir (regardless of whether they are 
isolated from each other) while holding the write lock. If disk I/O utilization 
is high, even writing the small seen_txid file takes a long time, which 
indirectly causes the ANN to hold the write lock for a long time.
   > 
   > Back to this PR. In an HA-mode cluster, if `dfs.namenode.name.dir` and 
`dfs.namenode.edits.dir` are placed on the same storage device, that device can 
become heavily loaded, especially in a large cluster, and degrade ANN 
performance. [HDFS-12733](https://issues.apache.org/jira/browse/HDFS-12733), 
proposed years ago, tried to disable the local edits directory in HA mode with 
shared edit dirs. (NOTE: that is a draft patch that was never checked in to 
trunk; it cannot be checked in smoothly now and needs careful review before 
being used as a reference.) Hope it could solve this issue. Thanks.
   
   I have set different paths for `dfs.namenode.name.dir` and 
`dfs.namenode.edits.dir`, but they are still on the same storage device. I will 
try putting them on different storage devices later. Thanks for your help.
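
   To illustrate the separation discussed above, here is a minimal hdfs-site.xml 
sketch that puts the two directories on different devices. The mount points 
`/mnt/disk1` and `/mnt/disk2` are hypothetical placeholders for independently 
backed storage; only the property names come from HDFS itself:

```xml
<!-- fsimage (and its seen_txid copy) on one device -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/mnt/disk1/hadoop/name</value>
</property>
<!-- edit logs (and their seen_txid copy) on a different device -->
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>/mnt/disk2/hadoop/edits</value>
</property>
```

   Note that distinct paths alone are not enough: they must resolve to different 
physical devices, otherwise the two writes still contend for the same I/O.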




> long write lock on active NN from rollEditLog()
> -----------------------------------------------
>
>                 Key: HDFS-17241
>                 URL: https://issues.apache.org/jira/browse/HDFS-17241
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.2
>            Reporter: shuaiqi.guo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-17241.patch
>
>
> when the standby NN triggers a log roll on the active NN while simultaneously 
> sending an fsimage to it, the active NN will hold a long write lock, which 
> blocks almost all requests, like:
> {code:java}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
> lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:422)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  {code}
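
The "write lock held for 27179 ms" message in the stack trace above is emitted 
by FSNamesystemLock's lock-hold reporting, whose threshold is configurable. A 
sketch of the relevant setting (5000 ms is believed to be the default; the 
value is only an example, useful for surfacing long rollEditLog() holds earlier):

```xml
<property>
  <!-- Log a stack trace when the FSNamesystem write lock is held longer than this (ms) -->
  <name>dfs.namenode.write-lock-reporting-threshold-ms</name>
  <value>5000</value>
</property>
```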



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
