[jira] [Updated] (HDFS-17631) Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17631:
---
Description:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes via QuorumJournalManager.selectInputStreams(), and RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.

The problem is that when reading edit logs through RedundantEditLogInputStream.nextOp(), if the first stream's skipUntil() throws an IOException (network errors, hardware problems, etc.), the state stays State.OK rather than becoming State.STREAM_FAILED.

The proper, fault-tolerant state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL -> State.OK

was:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes via QuorumJournalManager.selectInputStreams(), and RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.

The problem is that when reading edit logs through RedundantEditLogInputStream.nextOp(), if the first stream's skipUntil() throws an IOException (network errors, hardware problems, etc.), the state stays State.OK rather than becoming State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL -> State.OK

> Fix RedundantEditLogInputStream.nextOp() state error when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
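To make the intended transition concrete, here is a minimal, self-contained Java sketch of the corrected failover loop. It only models the states named above; it is an illustration, not the actual Hadoop patch:

// Illustrative sketch (not the Hadoop source): a stream that fails during
// SKIP_UNTIL is demoted to STREAM_FAILED so the next stream is tried.
import java.io.IOException;
import java.util.List;

public class SkipUntilStateDemo {
  enum State { SKIP_UNTIL, STREAM_FAILED, OK }

  interface EditLogStream {            // stand-in for EditLogInputStream
    void skipUntil(long txId) throws IOException;
  }

  static int readWithFailover(List<EditLogStream> streams, long txId) throws IOException {
    State state = State.SKIP_UNTIL;
    int cur = 0;
    while (true) {
      switch (state) {
        case SKIP_UNTIL:
          try {
            streams.get(cur).skipUntil(txId);
            state = State.OK;             // only reach OK when skipUntil succeeds
          } catch (IOException e) {
            state = State.STREAM_FAILED;  // the fix: record the failure first
          }
          break;
        case STREAM_FAILED:
          if (++cur >= streams.size()) {
            throw new IOException("all redundant streams failed");
          }
          state = State.SKIP_UNTIL;       // retry the skip on the next stream
          break;
        case OK:
          return cur;                     // index of the stream we can read from
      }
    }
  }
}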
[jira] [Updated] (HDFS-17631) Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17631:
---
Description:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes via QuorumJournalManager.selectInputStreams(), and RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.

The problem is that when reading edit logs through RedundantEditLogInputStream.nextOp(), if the first stream's skipUntil() throws an IOException (network errors, hardware problems, etc.), the state stays State.OK rather than becoming State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL -> State.OK

was:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes via QuorumJournalManager.selectInputStreams(), and RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.

Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL

> Fix RedundantEditLogInputStream.nextOp() state error when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17631) Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17631:
---
Description:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes via QuorumJournalManager.selectInputStreams(), and RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.

Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL

was:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes.

Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL

> Fix RedundantEditLogInputStream.nextOp() state error when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17631) Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17631:
---
Description:
In NameNode HA mode, the standby NameNode loads edit logs from the JournalNodes.

Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL

was:
Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.

The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL

> Fix RedundantEditLogInputStream.nextOp() state error when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17631) Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17631:
---
Summary: Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException (was: RedundantEditLogInputStream.nextOp() will be State.STREAM_FAILED when EditLogInputStream.skipUntil() throw IOException)

> Fix RedundantEditLogInputStream.nextOp() state error when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.
> The proper state transition should be:
> State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL
[jira] [Assigned] (HDFS-17631) RedundantEditLogInputStream.nextOp() will be State.STREAM_FAILED when EditLogInputStream.skipUntil() throw IOException
[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua reassigned HDFS-17631:
---
Assignee: liuguanghua

> RedundantEditLogInputStream.nextOp() will be State.STREAM_FAILED when
> EditLogInputStream.skipUntil() throw IOException
> ---
>
> Key: HDFS-17631
> URL: https://issues.apache.org/jira/browse/HDFS-17631
> Project: Hadoop HDFS
> Issue Type: Bug
> Environment: Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.
> The proper state transition should be:
> State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
[jira] [Created] (HDFS-17631) RedundantEditLogInputStream.nextOp() will be State.STREAM_FAILED when EditLogInputStream.skipUntil() throw IOException
liuguanghua created HDFS-17631:
---
Summary: RedundantEditLogInputStream.nextOp() will be State.STREAM_FAILED when EditLogInputStream.skipUntil() throw IOException
Key: HDFS-17631
URL: https://issues.apache.org/jira/browse/HDFS-17631
Project: Hadoop HDFS
Issue Type: Bug
Environment: Now, when EditLogInputStream.skipUntil() throws an IOException in RedundantEditLogInputStream.nextOp(), the state still goes to State.OK rather than State.STREAM_FAILED.
The proper state transition should be:
State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL
Reporter: liuguanghua
[jira] [Updated] (HDFS-17592) FastCopy support data copy in different nameservices without federation
[ https://issues.apache.org/jira/browse/HDFS-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17592:
---
Description:
FastCopy is a faster data copy tool. In a federation cluster or within a single cluster, FastCopy copies blocks via hardlinks, which is much faster than the original copy.

FastCopy can also support data copy via transfer between different nameservices without federation. In theory, this removes one IO transfer and cuts the copy time almost in half.

Test data:
blocksize 128M
1 TB EC files + 1 TB 3-replica files

|distcp map=20|DistCp via FastCopy(HardLink)|DistCp via FastCopy(Transfer)|DistCp (original)|
|Time Spent|5m6.687s|22m44.094s|38m17.024s|

was:
FastCopy is a faster data copy tool. In a federation cluster or within a single cluster, FastCopy copies blocks via hardlinks, which is much faster than the original copy.

FastCopy can support data copy via transfer between different nameservices without federation. In theory, it could save almost half the time compared to the original copy.

> FastCopy support data copy in different nameservices without federation
> ---
>
> Key: HDFS-17592
> URL: https://issues.apache.org/jira/browse/HDFS-17592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: FastCopy via Transfer.jpg
[jira] [Updated] (HDFS-17592) FastCopy support data copy in different nameservices without federation
[ https://issues.apache.org/jira/browse/HDFS-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17592:
---
Attachment: FastCopy via Transfer.jpg

> FastCopy support data copy in different nameservices without federation
> ---
>
> Key: HDFS-17592
> URL: https://issues.apache.org/jira/browse/HDFS-17592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: FastCopy via Transfer.jpg
>
> FastCopy is a faster data copy tool. In a federation cluster or within a single cluster, FastCopy copies blocks via hardlinks, which is much faster than the original copy.
> FastCopy can support data copy via transfer between different nameservices without federation. In theory, it could save almost half the time compared to the original copy.
[jira] [Assigned] (HDFS-17592) FastCopy support data copy in different nameservices without federation
[ https://issues.apache.org/jira/browse/HDFS-17592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua reassigned HDFS-17592:
---
Assignee: liuguanghua

> FastCopy support data copy in different nameservices without federation
> ---
>
> Key: HDFS-17592
> URL: https://issues.apache.org/jira/browse/HDFS-17592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
>
> FastCopy is a faster data copy tool. In a federation cluster or within a single cluster, FastCopy copies blocks via hardlinks, which is much faster than the original copy.
> FastCopy can support data copy via transfer between different nameservices without federation. In theory, it could save almost half the time compared to the original copy.
[jira] [Created] (HDFS-17592) FastCopy support data copy in different nameservices without federation
liuguanghua created HDFS-17592:
---
Summary: FastCopy support data copy in different nameservices without federation
Key: HDFS-17592
URL: https://issues.apache.org/jira/browse/HDFS-17592
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: liuguanghua

FastCopy is a faster data copy tool. In a federation cluster or within a single cluster, FastCopy copies blocks via hardlinks, which is much faster than the original copy.

FastCopy can support data copy via transfer between different nameservices without federation. In theory, it could save almost half the time compared to the original copy.
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867676#comment-17867676 ]

liuguanghua commented on HDFS-2139:
---
-- "We already have hadoop distcp, why do we need hdfs dfs -fastcp?"

[~zeekling], the dfs -fastcp command works like dfs -cp, whereas distcp is a MapReduce program that relies on YARN. In some cases we can use fastcp with only HDFS, without distcp.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for the destination file.
> 4) For each location of the block, instruct the datanode to make a local copy of that block.
> 5) Once each datanode has copied over its respective blocks, they report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
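For readers following the six steps quoted above, here is a rough, self-contained Java sketch of that flow. The NameNodeClient and DataNodeClient interfaces are hypothetical stand-ins for illustration only, not real Hadoop APIs:

// Hypothetical sketch of the fast-copy flow; interface names are invented.
import java.io.IOException;
import java.util.List;

public class FastCopySketch {
  record BlockInfo(long blockId, List<String> datanodes) {}

  interface NameNodeClient {
    List<BlockInfo> getBlocks(String src) throws IOException;      // steps 1-2
    BlockInfo addEmptyBlock(String dst) throws IOException;        // step 3
    boolean allBlocksReported(String dst) throws IOException;      // steps 5-6
  }

  interface DataNodeClient {
    // step 4: local copy, or a hardlink when src and dst live on the same DN
    void copyBlockLocally(String datanode, BlockInfo src, BlockInfo dst) throws IOException;
  }

  static void fastCopy(NameNodeClient nn, DataNodeClient dn, String src, String dst)
      throws IOException, InterruptedException {
    for (BlockInfo srcBlock : nn.getBlocks(src)) {       // 1) query source metadata
      BlockInfo dstBlock = nn.addEmptyBlock(dst);        // 3) add empty block for dst
      for (String location : srcBlock.datanodes()) {     // 2) each replica location
        dn.copyBlockLocally(location, srcBlock, dstBlock); // 4) copy on that datanode
      }
    }
    while (!nn.allBlocksReported(dst)) {                 // 5-6) wait for block reports
      Thread.sleep(100);
    }
  }
}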
[jira] [Comment Edited] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867636#comment-17867636 ]

liuguanghua edited comment on HDFS-2139 at 7/22/24 2:52 AM:
---
[~hexiaoqiao], thanks for the reply.

Regarding point 3: FastCopy can be used in a federation cluster, in a single cluster, and between two different clusters without federation. The difference is that FastCopy uses hardlinks within a federation cluster or a single cluster, and uses transfer between two different clusters without federation.

Test data:
blocksize 128M
1 TB EC files + 1 TB 3-replica files

|distcp map=20|DistCp via FastCopy(HardLink)|DistCp via FastCopy(Transfer)|DistCp (original)|
|Time Spent|5m6.687s|22m44.094s|38m17.024s|

[~zeekling], FastCopy can improve data copy efficiency.

was (Author: liuguanghua):
[~hexiaoqiao], thanks for the reply.

Regarding point 3: FastCopy can be used in a federation cluster, in a single cluster, and between two different clusters without federation. The difference is that FastCopy uses hardlinks within a federation cluster or a single cluster, and uses transfer between two different clusters without federation.

Test data:
blocksize 128M
1 TB EC files + 1 TB 3-replica files

|distcp map=20|DistCp via FastCopy(HardLink)|DistCp via FastCopy(Transfer)|DistCp (original)|
|Time|5m6.687s|22m44.094s|38m17.024s|

[~zeekling], FastCopy can improve data copy efficiency.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for the destination file.
> 4) For each location of the block, instruct the datanode to make a local copy of that block.
> 5) Once each datanode has copied over its respective blocks, they report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867636#comment-17867636 ]

liuguanghua commented on HDFS-2139:
---
[~hexiaoqiao], thanks for the reply.

Regarding point 3: FastCopy can be used in a federation cluster, in a single cluster, and between two different clusters without federation. The difference is that FastCopy uses hardlinks within a federation cluster or a single cluster, and uses transfer between two different clusters without federation.

Test data:
blocksize 128M
1 TB EC files + 1 TB 3-replica files

|distcp map=20|DistCp via FastCopy(HardLink)|DistCp via FastCopy(Transfer)|DistCp (original)|
|Time|5m6.687s|22m44.094s|38m17.024s|

[~zeekling], FastCopy can improve data copy efficiency.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for the destination file.
> 4) For each location of the block, instruct the datanode to make a local copy of that block.
> 5) Once each datanode has copied over its respective blocks, they report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
[jira] [Updated] (HDFS-17581) Add FastCopy tool and support dfs -fastcp command
[ https://issues.apache.org/jira/browse/HDFS-17581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17581:
---
Description:
Add a FastCopy tool:
(1) support data copy for replicated files
(2) support data copy for EC files

Also add an hdfs dfs -fastcp command that copies files using FastCopy; fastcp is similar to the cp command.

This depends on HDFS-16757.

was:
Add a FastCopy tool:
(1) support data copy for replicated files
(2) support data copy for EC files

Also add an hdfs dfs -fastcp command that copies files using FastCopy; fastcp is similar to the cp command.

> Add FastCopy tool and support dfs -fastcp command
> ---
>
> Key: HDFS-17581
> URL: https://issues.apache.org/jira/browse/HDFS-17581
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
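Assuming the command lands as proposed, its usage would presumably mirror cp, for example: hdfs dfs -fastcp /src/dir/file /dst/dir/file. The exact option set is not specified in this issue, so treat this invocation as illustrative.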
[jira] [Created] (HDFS-17582) Distcp support fastcopy
liuguanghua created HDFS-17582:
---
Summary: Distcp support fastcopy
Key: HDFS-17582
URL: https://issues.apache.org/jira/browse/HDFS-17582
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs
Reporter: liuguanghua
Assignee: liuguanghua

Support FastCopy in DistCp for distributed data copy within the same nameservice or across different nameservices in an HDFS federation cluster.

This depends on:
# HDFS-16757
# HDFS-17581
[jira] [Created] (HDFS-17581) Add FastCopy tool and support dfs -fastcp command
liuguanghua created HDFS-17581:
---
Summary: Add FastCopy tool and support dfs -fastcp command
Key: HDFS-17581
URL: https://issues.apache.org/jira/browse/HDFS-17581
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs
Reporter: liuguanghua
Assignee: liuguanghua

Add a FastCopy tool:
(1) support data copy for replicated files
(2) support data copy for EC files

Also add an hdfs dfs -fastcp command that copies files using FastCopy; fastcp is similar to the cp command.
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847165#comment-17847165 ]

liuguanghua commented on HDFS-17509:
---
Thanks [~xuzq_zander]

> RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
> ---
>
> Key: HDFS-17509
> URL: https://issues.apache.org/jira/browse/HDFS-17509
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.0
>
> hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2
> When /tmp/merge is an empty file, this command throws an NPE via DFSRouter.
[jira] [Assigned] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua reassigned HDFS-17509:
---
Assignee: liuguanghua

> RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
> ---
>
> Key: HDFS-17509
> URL: https://issues.apache.org/jira/browse/HDFS-17509
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.0
>
> hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2
> When /tmp/merge is an empty file, this command throws an NPE via DFSRouter.
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846574#comment-17846574 ]

liuguanghua commented on HDFS-2139:
---
[~haiyang Hu] Hello, are you still working on this now? I am interested in contributing to this work. [~xuzq_zander] And I will use FastCopy in a production environment.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for the destination file.
> 4) For each location of the block, instruct the datanode to make a local copy of that block.
> 5) Once each datanode has copied over its respective blocks, they report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844497#comment-17844497 ]

liuguanghua commented on HDFS-2139:
---
[~xuzq_zander] Hello. The design doc cannot be viewed because of permissions:
[https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing]
Could you upload a new version to Attachments? Thanks very much.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for the destination file.
> 4) For each location of the block, instruct the datanode to make a local copy of that block.
> 5) Once each datanode has copied over its respective blocks, they report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-the-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc: https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing
[jira] [Updated] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17509:
---
Summary: RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. (was: RBF: ClientProtocol.concat will throw NPE if tgr is a empty file.)

> RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
> ---
>
> Key: HDFS-17509
> URL: https://issues.apache.org/jira/browse/HDFS-17509
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Priority: Minor
>
> hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2
> When /tmp/merge is an empty file, this command throws an NPE via DFSRouter.
[jira] [Updated] (HDFS-17509) RBF: ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17509:
---
Description:
hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2
When /tmp/merge is an empty file, this command throws an NPE via DFSRouter.

> RBF: ClientProtocol.concat will throw NPE if tgr is a empty file.
> ---
>
> Key: HDFS-17509
> URL: https://issues.apache.org/jira/browse/HDFS-17509
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Priority: Minor
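As a client-side illustration of the failing scenario, here is a small Java sketch using the public org.apache.hadoop.fs.FileSystem.concat() API. The empty-target guard is a hypothetical workaround for affected routers, not the committed fix, and the paths are examples:

// Sketch: concat into an empty target is the case that tripped the router NPE.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatGuard {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path target = new Path("/tmp/merge");
    Path[] srcs = { new Path("/tmp/t1"), new Path("/tmp/t2") };
    if (fs.getFileStatus(target).getLen() == 0) {
      // An empty target has no last block for concat to extend; via DFSRouter
      // this surfaced as an NPE per this issue.
      throw new IllegalArgumentException("concat target must be non-empty");
    }
    fs.concat(target, srcs);  // FileSystem#concat(Path trg, Path[] psrcs)
  }
}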
[jira] [Created] (HDFS-17509) RBF: ClientProtocol.concat will throw NPE if tgr is a empty file.
liuguanghua created HDFS-17509:
---
Summary: RBF: ClientProtocol.concat will throw NPE if tgr is a empty file.
Key: HDFS-17509
URL: https://issues.apache.org/jira/browse/HDFS-17509
Project: Hadoop HDFS
Issue Type: Bug
Reporter: liuguanghua
[jira] [Comment Edited] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829445#comment-17829445 ]

liuguanghua edited comment on HDFS-16016 at 3/21/24 9:13 AM:
---
Thanks for the reply, [~zhangxiping]. IBRs contain DELETED_BLOCK, RECEIVED_BLOCK, and RECEIVING_BLOCK. Mis-ordering of IBR and FBR affects more than just the to_remove blocks, and the NN should remove blocks that the FBR does not contain when disk damage has lost blocks.

was (Author: liuguanghua):
Thanks for the reply. IBRs contain DELETED_BLOCK, RECEIVED_BLOCK, and RECEIVING_BLOCK. Mis-ordering of IBR and FBR affects more than just the to_remove blocks, and the NN should remove blocks that the FBR does not contain when disk damage has lost blocks.

> BPServiceActor add a new thread to handle IBR
> ---
>
> Key: HDFS-16016
> URL: https://issues.apache.org/jira/browse/HDFS-16016
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: JiangHua Zhu
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.3.6
>
> Attachments: image-2023-11-03-18-11-54-502.png, image-2023-11-06-10-53-13-584.png, image-2023-11-06-10-55-50-939.png, image-2024-03-20-18-31-23-937.png, image-2024-03-21-16-20-46-746.png
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Now BPServiceActor#offerService() is doing many things: FBR, IBR, heartbeat. We can handle IBR independently to improve the performance of heartbeat and FBR.
[jira] [Commented] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829445#comment-17829445 ]

liuguanghua commented on HDFS-16016:
---
Thanks for the reply. IBRs contain DELETED_BLOCK, RECEIVED_BLOCK, and RECEIVING_BLOCK. Mis-ordering of IBR and FBR affects more than just the to_remove blocks, and the NN should remove blocks that the FBR does not contain when disk damage has lost blocks.

> BPServiceActor add a new thread to handle IBR
> ---
>
> Key: HDFS-16016
> URL: https://issues.apache.org/jira/browse/HDFS-16016
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: JiangHua Zhu
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.3.6
>
> Attachments: image-2023-11-03-18-11-54-502.png, image-2023-11-06-10-53-13-584.png, image-2023-11-06-10-55-50-939.png, image-2024-03-20-18-31-23-937.png, image-2024-03-21-16-20-46-746.png
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Now BPServiceActor#offerService() is doing many things: FBR, IBR, heartbeat. We can handle IBR independently to improve the performance of heartbeat and FBR.
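As an illustration of the "handle IBR independently" idea this issue implemented, the following self-contained Java sketches a dedicated thread that drains incremental block reports so the heartbeat loop never blocks on IBR RPCs. It is a sketch of the concept, not the actual BPServiceActor code, and sendToNameNode() merely stands in for the blockReceivedAndDeleted RPC:

// Sketch: a dedicated IBR worker decoupled from the heartbeat loop.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IbrWorker implements Runnable {
  public enum Kind { DELETED_BLOCK, RECEIVED_BLOCK, RECEIVING_BLOCK }
  public record Ibr(long blockId, Kind kind) {}

  private final BlockingQueue<Ibr> pending = new LinkedBlockingQueue<>();

  public void queue(Ibr report) { pending.add(report); }  // called by DN internals

  @Override public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        List<Ibr> batch = new ArrayList<>();
        batch.add(pending.take());   // block until at least one IBR arrives
        pending.drainTo(batch);      // then batch whatever else is queued
        sendToNameNode(batch);       // one RPC per batch, off the heartbeat path
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  private void sendToNameNode(List<Ibr> batch) {
    // placeholder for the actual report RPC to the NameNode
  }
}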
[jira] [Commented] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829403#comment-17829403 ]

liuguanghua commented on HDFS-16016:
---
In step 4, does it work the following way?
(1) In a loop: Heartbeat -> IBR (if needed) -> FBR (every 6h)
(2) The DN keeps all blocks (FBR) in memory and merges every IBR.
[~zhangxiping], thanks.

> BPServiceActor add a new thread to handle IBR
> ---
>
> Key: HDFS-16016
> URL: https://issues.apache.org/jira/browse/HDFS-16016
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: JiangHua Zhu
> Assignee: Viraj Jasani
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.3.6
>
> Attachments: image-2023-11-03-18-11-54-502.png, image-2023-11-06-10-53-13-584.png, image-2023-11-06-10-55-50-939.png, image-2024-03-20-18-31-23-937.png
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Now BPServiceActor#offerService() is doing many things: FBR, IBR, heartbeat. We can handle IBR independently to improve the performance of heartbeat and FBR.
[jira] [Updated] (HDFS-17357) NioInetPeer.close() should close socket connection.
[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17357:
---
Summary: NioInetPeer.close() should close socket connection. (was: EC: NioInetPeer.close() should close socket connection.)

> NioInetPeer.close() should close socket connection.
> ---
>
> Key: HDFS-17357
> URL: https://issues.apache.org/jira/browse/HDFS-17357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> NioInetPeer.close() currently does not close the socket connection.
> I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
> 2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer
> When any exception occurs in DataXceiverServer, it executes closeStream:
> IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
> But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.
> This issue can be reproduced as follows:
> (1) A client writes data to HDFS.
> (2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
> (3) The client crashes, so no new data is added and client.close() is never executed.
> (4) Socket connections leak between DataNodes.
> The connection leakage looks like this:
> dn1
> dn1:57042 dn2:50010 ESTABLISHED
> dn2
> dn2:50010 dn1:57042 ESTABLISHED
[jira] [Updated] (HDFS-17357) EC: NioInetPeer.close() should close socket connection.
[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17357:
---
Description:
NioInetPeer.close() currently does not close the socket connection.

I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

This issue can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
(3) The client crashes, so no new data is added and client.close() is never executed.
(4) Socket connections leak between DataNodes.

The connection leakage looks like this:
dn1
dn1:57042 dn2:50010 ESTABLISHED
dn2
dn2:50010 dn1:57042 ESTABLISHED

was:
NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

This issue can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
(3) The client crashes, so no new data is added and client.close() is never executed.
(4) Socket connections leak between DataNodes.

The connection leakage looks like this:
dn1
dn1:57042 dn2:50010 ESTABLISHED
dn2
dn2:50010 dn1:57042 ESTABLISHED

> EC: NioInetPeer.close() should close socket connection.
> ---
>
> Key: HDFS-17357
> URL: https://issues.apache.org/jira/browse/HDFS-17357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
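For illustration, a Peer-like close() that actually tears down the socket might look like the following self-contained sketch. It mirrors the behavior the report attributes to EncryptedPeer, DomainPeer, and BasicInetPeer; it is not the Hadoop source:

// Sketch: close the streams, then always close the underlying socket.
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class SocketPeer implements Closeable {
  private final Socket socket;
  private final InputStream in;
  private final OutputStream out;

  public SocketPeer(Socket socket) throws IOException {
    this.socket = socket;
    this.in = socket.getInputStream();
    this.out = socket.getOutputStream();
  }

  @Override public void close() throws IOException {
    try {
      in.close();
      out.close();
    } finally {
      socket.close();  // the missing step: without this the TCP connection leaks
    }
  }
}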
[jira] [Updated] (HDFS-17357) NioInetPeer.close() should close socket connection.
[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17357:
---
Description:
NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

This issue can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
(3) The client crashes, so no new data is added and client.close() is never executed.
(4) Socket connections leak between DataNodes.

The connection leakage looks like this:
dn1
dn1:57042 dn2:50010 ESTABLISHED
dn2
dn2:50010 dn1:57042 ESTABLISHED

was:
NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

This issue can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
(3) The client crashes, so no new data is added and client.close() is never executed.
(4) Socket connections leak between DataNodes.

> NioInetPeer.close() should close socket connection.
> ---
>
> Key: HDFS-17357
> URL: https://issues.apache.org/jira/browse/HDFS-17357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17357) NioInetPeer.close() should close socket connection.
[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HDFS-17357:
---
Description:
NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

This issue can be reproduced as follows:
(1) A client writes data to HDFS.
(2) The DataNode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails and throws an IOException, and the socket is not released.
(3) The client crashes, so no new data is added and client.close() is never executed.
(4) Socket connections leak between DataNodes.

was:
NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.

> NioInetPeer.close() should close socket connection.
> ---
>
> Key: HDFS-17357
> URL: https://issues.apache.org/jira/browse/HDFS-17357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (HDFS-17357) NioInetPeer.close() should close socket connection.
liuguanghua created HDFS-17357:
---
Summary: NioInetPeer.close() should close socket connection.
Key: HDFS-17357
URL: https://issues.apache.org/jira/browse/HDFS-17357
Project: Hadoop HDFS
Issue Type: Bug
Reporter: liuguanghua

NioInetPeer.close() currently does not close the socket connection.

In my environment, all data were stored with EC.
I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer

When any exception occurs in DataXceiverServer, it executes closeStream:
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.
[jira] [Assigned] (HDFS-17357) NioInetPeer.close() should close socket connection.
[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua reassigned HDFS-17357:
---
Assignee: liuguanghua

> NioInetPeer.close() should close socket connection.
> ---
>
> Key: HDFS-17357
> URL: https://issues.apache.org/jira/browse/HDFS-17357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
>
> NioInetPeer.close() currently does not close the socket connection.
> In my environment, all data were stored with EC.
> I found 30,000+ leaked connections on a DataNode, along with many warning messages like the one below:
> 2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer
> When any exception occurs in DataXceiverServer, it executes closeStream:
> IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
> But NioInetPeer.close() does not close the socket connection, which leads to connection leakage. The close() of every other Peer subclass is implemented with socket.close(); see EncryptedPeer, DomainPeer, BasicInetPeer.
[jira] [Updated] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.
[ https://issues.apache.org/jira/browse/HDFS-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17311: --- Description: In the Router, I found the log below. 2023-12-29 15:18:54,799 ERROR org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add more than 2048 connections at the same time The log indicates that ConnectionManager.creatorQueue is full at a certain point. But my cluster does not have enough users to reach 2048 pairs of . This may be due to the following reasons: # ConnectionManager.creatorQueue is a queue to which a ConnectionPool is offered when it does not have enough ConnectionContexts. # The ConnectionCreator thread consumes from creatorQueue and creates more ConnectionContexts for a ConnectionPool. # Clients may concurrently invoke ConnectionManager.getConnection() for the same user, which can add the same ConnectionPool to ConnectionManager.creatorQueue many times. # When creatorQueue is full, a new ConnectionPool cannot be added and this error is logged, so a genuinely new ConnectionPool may be unable to produce more ConnectionContexts for a new user. So this PR tries to ensure creatorQueue never holds the same ConnectionPool more than once. was: 2023-12-29 15:18:54,799 ERROR org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add more than 2048 connections at the same time In my environment, ConnectionManager.creatorQueue is full, but the cluster does not have enough users to reach 2048 pairs of in the router. Under heavy concurrency, creatorQueue may be offered the same pool more than once. > RBF: ConnectionManager creatorQueue should offer a pool that is not already > in creatorQueue. > > > Key: HDFS-17311 > URL: https://issues.apache.org/jira/browse/HDFS-17311 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available
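A minimal sketch of the deduplication idea, with illustrative names (the actual patch may differ): only offer a pool if it is not already pending in the queue, so concurrent getConnection() calls for the same user cannot fill the queue with duplicates.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class CreatorQueueSketch<P> {
  private final BlockingQueue<P> creatorQueue;

  CreatorQueueSketch(int capacity) {
    this.creatorQueue = new ArrayBlockingQueue<>(capacity);
  }

  // Returns true if the pool is queued for creation (newly or already pending).
  synchronized boolean offerIfAbsent(P pool) {
    if (creatorQueue.contains(pool)) {
      return true; // already pending; do not enqueue a duplicate
    }
    return creatorQueue.offer(pool);
  }
}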
[jira] [Assigned] (HDFS-17300) [SBN READ] A rpc call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time.
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17300: -- Assignee: liuguanghua > [SBN READ] A rpc call in Observer should throw > ObserverRetryOnActiveException if its stateid is always lower than client > stateid for a configured time. > > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > > Now when the Observer is enabled, the Observer updates its stateid via the EditLogTailer tailing the editlog from the Active Namenode in near real time. If an RPC call's stateid is lower than the client stateid (which may have been updated from the active namenode with msync), the call is requeued into the callqueue. > This PR intends that if an RPC call's stateid stays lower than the client stateid for a configured time, the call should throw ObserverRetryOnActiveException so that the client goes to the active namenode for processing.
[jira] [Updated] (HDFS-17300) [SBN READ] A rpc call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time.
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Summary: [SBN READ] A rpc call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time. (was: [SBN READ] A rcp call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time.) > [SBN READ] A rpc call in Observer should throw > ObserverRetryOnActiveException if its stateid is always lower than client > stateid for a configured time. > > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > Labels: pull-request-available > > Now when the Observer is enabled, the Observer updates its stateid via the EditLogTailer tailing the editlog from the Active Namenode in near real time. If an RPC call's stateid is lower than the client stateid (which may have been updated from the active namenode with msync), the call is requeued into the callqueue. > This PR intends that if an RPC call's stateid stays lower than the client stateid for a configured time, the call should throw ObserverRetryOnActiveException so that the client goes to the active namenode for processing.
[jira] [Updated] (HDFS-17300) [SBN READ] A rcp call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time.
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Description: Now when the Observer is enabled, the Observer updates its stateid via the EditLogTailer tailing the editlog from the Active Namenode in near real time. If an RPC call's stateid is lower than the client stateid (which may have been updated from the active namenode with msync), the call is requeued into the callqueue. This PR intends that if an RPC call's stateid stays lower than the client stateid for a configured time, the call should throw ObserverRetryOnActiveException so that the client goes to the active namenode for processing. was: Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. So the Observer should throw ObserverRetryOnActiveException if its stateid always lags the Active Namenode for a configured time. > [SBN READ] A rcp call in Observer should throw > ObserverRetryOnActiveException if its stateid is always lower than client > stateid for a configured time. > > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > Labels: pull-request-available
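The proposed behavior, as a hedged sketch (the method and parameter names are assumptions, not the actual patch): track how long a call's client stateid has stayed ahead of the observer's stateid, and fail over to the active once a configured threshold is exceeded instead of requeueing indefinitely.

import org.apache.hadoop.ipc.ObserverRetryOnActiveException;

class ObserverRequeueGuardSketch {
  private final long maxLagWaitMs; // assumed configurable threshold

  ObserverRequeueGuardSketch(long maxLagWaitMs) {
    this.maxLagWaitMs = maxLagWaitMs;
  }

  void checkCall(long clientStateId, long observerStateId,
      long firstSeenMs, long nowMs) throws ObserverRetryOnActiveException {
    if (clientStateId <= observerStateId) {
      return; // observer has caught up; serve the read
    }
    if (nowMs - firstSeenMs > maxLagWaitMs) {
      // Stop requeueing; tell the client to retry on the active namenode.
      throw new ObserverRetryOnActiveException(
          "Observer stateid lagged behind client stateid for too long");
    }
    // otherwise: requeue the call into the callqueue (omitted in this sketch)
  }
}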
[jira] [Updated] (HDFS-17300) [SBN READ] A rcp call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time.
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Summary: [SBN READ] A rcp call in Observer should throw ObserverRetryOnActiveException if its stateid is always lower than client stateid for a configured time. (was: [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time) > [SBN READ] A rcp call in Observer should throw > ObserverRetryOnActiveException if its stateid is always lower than client > stateid for a configured time. > > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > Labels: pull-request-available > > Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. > So the Observer should throw ObserverRetryOnActiveException if its stateid always lags the Active Namenode for a configured time.
[jira] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309 ] liuguanghua deleted comment on HDFS-17309: was (Author: liuguanghua): [~slfan1989] OK, I will submit new PRs according to this format in the future. Thank you. > RBF: Fix Router Safemode check condition error > > > Key: HDFS-17309 > URL: https://issues.apache.org/jira/browse/HDFS-17309 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HDFS-17116, the Router safemode check condition uses monotonicNow(). > The code in RouterSafemodeService.periodicInvoke(): > long now = monotonicNow(); > long cacheUpdateTime = stateStore.getCacheUpdateTime(); > boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; > The function monotonicNow() is implemented with System.nanoTime(). > From the System.nanoTime() javadoc: > This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). > The following situation may exist: > if refreshCaches does not succeed at the beginning, cacheUpdateTime will be 0, and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be either true or false.
[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803508#comment-17803508 ] liuguanghua commented on HDFS-17309: [~slfan1989] OK, I will submit new PRs according to this format in the future. Thank you. > RBF: Fix Router Safemode check condition error > > > Key: HDFS-17309 > URL: https://issues.apache.org/jira/browse/HDFS-17309 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HDFS-17116, the Router safemode check condition uses monotonicNow(). > The code in RouterSafemodeService.periodicInvoke(): > long now = monotonicNow(); > long cacheUpdateTime = stateStore.getCacheUpdateTime(); > boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; > The function monotonicNow() is implemented with System.nanoTime(). > From the System.nanoTime() javadoc: > This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). > The following situation may exist: > if refreshCaches does not succeed at the beginning, cacheUpdateTime will be 0, and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be either true or false.
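To make the pitfall concrete, here is a sketch (assumed names, not necessarily the actual fix) of a staleness check that handles the never-refreshed case explicitly, instead of subtracting an arbitrary-origin monotonic value from a default of 0:

class SafemodeStaleCheckSketch {
  private final long staleIntervalMs;

  SafemodeStaleCheckSketch(long staleIntervalMs) {
    this.staleIntervalMs = staleIntervalMs;
  }

  boolean isCacheStale(long nowMonotonicMs, long cacheUpdateTimeMs) {
    if (cacheUpdateTimeMs <= 0) {
      return true; // cache never refreshed successfully; treat as stale explicitly
    }
    // Both values come from the same monotonic clock, so their difference is a
    // valid elapsed time even though each value alone has an arbitrary origin.
    return (nowMonotonicMs - cacheUpdateTimeMs) > staleIntervalMs;
  }
}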
[jira] [Assigned] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md
[ https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17325: -- Assignee: liuguanghua > Doc: Fix the documentation of fs expunge command in FileSystemShell.md > -- > > Key: HDFS-17325 > URL: https://issues.apache.org/jira/browse/HDFS-17325 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > > Fix the documentation in FileSystemShell.md: hadoop fs -expunge --immediate should be hadoop fs -expunge -immediate. > Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ]
[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md
[ https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17325: --- Description: Fix the documentation in FileSystemShell.md: hadoop fs -expunge --immediate should be hadoop fs -expunge -immediate. Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ] > Doc: Fix the documentation of fs expunge command in FileSystemShell.md > -- > > Key: HDFS-17325 > URL: https://issues.apache.org/jira/browse/HDFS-17325 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > Labels: pull-request-available
[jira] [Created] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md
liuguanghua created HDFS-17325: -- Summary: Doc: Fix the documentation of fs expunge command in FileSystemShell.md Key: HDFS-17325 URL: https://issues.apache.org/jira/browse/HDFS-17325 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua
[jira] [Assigned] (HDFS-17324) RBF: Router should not return nameservices that not enable observer read in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17324: -- Assignee: liuguanghua > RBF: Router should not return nameservices that not enable observer read in > RpcResponseHeaderProto > -- > > Key: HDFS-17324 > URL: https://issues.apache.org/jira/browse/HDFS-17324 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > > Router observer reads are controlled by RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES. > If a nameservice does not enable observer reads in the Router, the Router should not return it in RpcResponseHeaderProto.
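As an illustration of the filtering idea (the names below are assumptions; the real change operates on the router's federated namespace state): drop stateids for nameservices that are not observer-read enabled before they go into the response header.

import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ObserverHeaderFilterSketch {
  // Keep only the nameservices where observer reads are enabled, so the
  // response header does not carry stateids the client cannot use.
  static Map<String, Long> filterStateIds(Map<String, Long> stateIdsByNs,
      Set<String> observerReadEnabledNs) {
    return stateIdsByNs.entrySet().stream()
        .filter(e -> observerReadEnabledNs.contains(e.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }
}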
[jira] [Updated] (HDFS-17324) RBF: Router should not return nameservices that not enable observer read in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17324: --- Description: Router observer reads are controlled by RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES. If a nameservice does not enable observer reads in the Router, the Router should not return it in RpcResponseHeaderProto. was: Router observer reads are controlled by RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES. If a nameservice does not enable observer reads in the router, > RBF: Router should not return nameservices that not enable observer read in > RpcResponseHeaderProto > -- > > Key: HDFS-17324 > URL: https://issues.apache.org/jira/browse/HDFS-17324 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major
[jira] [Updated] (HDFS-17324) RBF: Router should not return nameservices that not enable observer read in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17324: --- Description: Router observer reads are controlled by RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES. If a nameservice does not enable observer reads in the router, > RBF: Router should not return nameservices that not enable observer read in > RpcResponseHeaderProto > -- > > Key: HDFS-17324 > URL: https://issues.apache.org/jira/browse/HDFS-17324 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major
[jira] [Created] (HDFS-17324) RBF: Router should not return nameservices that not enable observer read in RpcResponseHeaderProto
liuguanghua created HDFS-17324: -- Summary: RBF: Router should not return nameservices that not enable observer read in RpcResponseHeaderProto Key: HDFS-17324 URL: https://issues.apache.org/jira/browse/HDFS-17324 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua
[jira] [Updated] (HDFS-17321) RBF: Add RouterAutoMsyncService for auto msync in Router
[ https://issues.apache.org/jira/browse/HDFS-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17321: --- Description: The Router should have the ability to auto-msync to a nameservice, ensuring the router periodically refreshes its record of a namespace's state. Different from HDFS-17027, this is controlled by the router itself, without configuring an AbstractNNFailoverProxyProvider. And HDFS-16890 may lead to many read requests hitting the active NN at the same time. This PR provides a new way to implement auto msync in the Router. > RBF: Add RouterAutoMsyncService for auto msync in Router > > > Key: HDFS-17321 > URL: https://issues.apache.org/jira/browse/HDFS-17321 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > Labels: pull-request-available
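A minimal sketch of the service idea, under stated assumptions: it uses a plain ScheduledExecutorService rather than the RBF PeriodicService base class, and assumes the router holds a list of ClientProtocol proxies, one per nameservice. It periodically calls msync() so the router's record of each namespace's state stays fresh.

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;

class AutoMsyncSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(List<ClientProtocol> namenodeProxies, long intervalMs) {
    scheduler.scheduleWithFixedDelay(() -> {
      for (ClientProtocol proxy : namenodeProxies) {
        try {
          proxy.msync(); // refresh this nameservice's state id
        } catch (Exception e) {
          // log and move on; one lagging nameservice should not block the rest
        }
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}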
[jira] [Created] (HDFS-17321) RBF: Add RouterAutoMsyncService for auto msync in Router
liuguanghua created HDFS-17321: -- Summary: RBF: Add RouterAutoMsyncService for auto msync in Router Key: HDFS-17321 URL: https://issues.apache.org/jira/browse/HDFS-17321 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua
[jira] [Created] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.
liuguanghua created HDFS-17311: -- Summary: RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue. Key: HDFS-17311 URL: https://issues.apache.org/jira/browse/HDFS-17311 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua 2023-12-29 15:18:54,799 ERROR org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add more than 2048 connections at the same time In my environment, ConnectionManager.creatorQueue is full, but the cluster does not have enough users to reach 2048 pairs of in the router. Under heavy concurrency, creatorQueue may be offered the same pool more than once.
[jira] [Created] (HDFS-17309) Fix Router Safemode check condition error
liuguanghua created HDFS-17309: -- Summary: Fix Router Safemode check condition error Key: HDFS-17309 URL: https://issues.apache.org/jira/browse/HDFS-17309 Project: Hadoop HDFS Issue Type: Bug Reporter: liuguanghua With HDFS-17116, the Router safemode check condition uses monotonicNow(). The code in RouterSafemodeService.periodicInvoke(): long now = monotonicNow(); long cacheUpdateTime = stateStore.getCacheUpdateTime(); boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; The function monotonicNow() is implemented with System.nanoTime(). From the System.nanoTime() javadoc: This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The following situation may exist: if refreshCaches does not succeed at the beginning, cacheUpdateTime will be 0, and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be either true or false.
[jira] [Created] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto
liuguanghua created HDFS-17306: -- Summary: RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto Key: HDFS-17306 URL: https://issues.apache.org/jira/browse/HDFS-17306 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua If a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer nodes, and clients communicate with the namenodes via DFSRouter, then with DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enabled the client will receive all nameservices in RpcResponseHeaderProto. We should reduce the RPC response size when nameservices do not enable observer nodes.
[jira] [Updated] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a period of time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Description: Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. So the Observer should throw ObserverRetryOnActiveException if its stateid always lags the Active Namenode for a configured time. was: Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. So > [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is > always delayed with Active Namenode for a period of time > -- > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major
[jira] [Updated] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Summary: [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time (was: [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a period of time) > [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is > always delayed with Active Namenode for a configured time > > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major > > Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. > So the Observer should throw ObserverRetryOnActiveException if its stateid always lags the Active Namenode for a configured time.
[jira] [Updated] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a period of time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17300: --- Description: Now when the Observer NN is used, if the stateid is delayed, the RPC call will be requeued into the callqueue. If the EditLogTailer is broken or something else goes wrong, the call will be requeued again and again. So > [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is > always delayed with Active Namenode for a period of time > -- > > Key: HDFS-17300 > URL: https://issues.apache.org/jira/browse/HDFS-17300 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Major
[jira] [Created] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a period of time
liuguanghua created HDFS-17300: -- Summary: [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a period of time Key: HDFS-17300 URL: https://issues.apache.org/jira/browse/HDFS-17300 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua
[jira] [Resolved] (HDFS-17170) add metrics for datanode in function processQueueMessages and reportTo
[ https://issues.apache.org/jira/browse/HDFS-17170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua resolved HDFS-17170. Resolution: Not A Problem > add metrics for datanode in function processQueueMessages and reportTo > - > > Key: HDFS-17170 > URL: https://issues.apache.org/jira/browse/HDFS-17170 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Minor > Labels: pull-request-available > > Add two metrics for the datanode: > (1) BPServiceActorAction.reportTo executes errorReport and reportBadBlocks; record their counts. > (2) BPServiceActor.processQueueMessages runs in the heartbeat loop to the NN; record its numOps and time.
[jira] [Updated] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17285: --- Summary: RBF: Add a safe mode check period configuration (was: [RBF] Decrease dfsrouter safe mode check period.) > RBF: Add a safe mode check period configuration > --- > > Key: HDFS-17285 > URL: https://issues.apache.org/jira/browse/HDFS-17285 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Minor > Labels: pull-request-available > > When the dfsrouter starts, it enters safe mode, and it takes about 1 min to leave. The log is below: > 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 3 ms > 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 18 ms without reaching the State Store > 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode > 14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds... > 14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds > This depends on these configs: > DFS_ROUTER_SAFEMODE_EXTENSION 30s > DFS_ROUTER_SAFEMODE_EXPIRATION 3min > DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this is the period of the safe mode check) > Because the dfsrouter rejects write requests while in safe mode, the check period should be shorter once refreshCaches has completed, and we should remove DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.
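A sketch of the configuration split the retitled issue suggests. The key name and default below are hypothetical, chosen only for illustration; the actual name is decided by the patch. The point is to give the safemode service its own, shorter check period instead of reusing DFS_ROUTER_CACHE_TIME_TO_LIVE_MS.

import org.apache.hadoop.conf.Configuration;

class SafemodeCheckPeriodSketch {
  // Hypothetical key; not an existing Hadoop configuration property.
  static final String SAFEMODE_CHECK_PERIOD_KEY =
      "dfs.federation.router.safemode.checkperiod";
  static final long SAFEMODE_CHECK_PERIOD_DEFAULT_MS = 5000L;

  static long getCheckPeriodMs(Configuration conf) {
    return conf.getLong(SAFEMODE_CHECK_PERIOD_KEY,
        SAFEMODE_CHECK_PERIOD_DEFAULT_MS);
  }
}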
[jira] [Updated] (HDFS-17285) [RBF] Decrease dfsrouter safe mode check period.
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17285: --- Description: When the dfsrouter starts, it enters safe mode, and it takes about 1 min to leave. The log is below: 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 3 ms 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 18 ms without reaching the State Store 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode 14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds... 14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds This depends on these configs: DFS_ROUTER_SAFEMODE_EXTENSION 30s DFS_ROUTER_SAFEMODE_EXPIRATION 3min DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this is the period of the safe mode check) Because the dfsrouter rejects write requests while in safe mode, the check period should be shorter once refreshCaches has completed, and we should remove DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService. was: When the dfsrouter starts, it enters safe mode, and it takes about 1 min to leave. The log is below: 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 3 ms 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 18 ms without reaching the State Store 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode 14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds... 14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds This depends on these configs: DFS_ROUTER_SAFEMODE_EXTENSION 30s DFS_ROUTER_SAFEMODE_EXPIRATION 3min DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this is the period of the safe mode check) Because the dfsrouter rejects write requests while in safe mode, the check period should be shorter once refreshCaches has completed, and we should separate DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService. > [RBF] Decrease dfsrouter safe mode check period. > > > Key: HDFS-17285 > URL: https://issues.apache.org/jira/browse/HDFS-17285 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Priority: Minor
[jira] [Created] (HDFS-17285) [RBF] Decrease dfsrouter safe mode check period.
liuguanghua created HDFS-17285: -- Summary: [RBF] Decrease dfsrouter safe mode check period. Key: HDFS-17285 URL: https://issues.apache.org/jira/browse/HDFS-17285 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua When the dfsrouter starts, it enters safe mode, and it takes about 1 min to leave. The log is below: 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 3 ms 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 18 ms without reaching the State Store 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode 14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds... 14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds This depends on these configs: DFS_ROUTER_SAFEMODE_EXTENSION 30s DFS_ROUTER_SAFEMODE_EXPIRATION 3min DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this is the period of the safe mode check) Because the dfsrouter rejects write requests while in safe mode, the check period should be shorter once refreshCaches has completed, and we should separate DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.
[jira] [Assigned] (HDFS-17269) [RBF] Make listStatus user root trash dir will return all subclusters trash subdirs if user has any mount points in nameservice.
[ https://issues.apache.org/jira/browse/HDFS-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17269: -- Assignee: liuguanghua > [RBF] Make listStatus user root trash dir will return all subclusters trash > subdirs if user has any mount points in nameservice. > > > Key: HDFS-17269 > URL: https://issues.apache.org/jira/browse/HDFS-17269 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Minor > > Same scenario as HDFS-17263. > If the user trash config fs.trash.checkpoint.interval is set to 10 min on the namenodes, the trash root dir /user/$USER/.Trash/Current will be renamed every 10 min to /user/$USER/.Trash/timestamp. > When a user runs ls on /user/$USER/.Trash, it should return the entries below: > /user/$USER/.Trash/Current > /user/$USER/.Trash/timestamp (this is invisible now) > So we should make sure that listing the trash root dir shows all trash subdirs in every nameservice where the user has a mount point.
[jira] [Created] (HDFS-17269) [RBF] Make listStatus user root trash dir will return all subclusters trash subdirs if user has any mount points in nameservice.
liuguanghua created HDFS-17269: -- Summary: [RBF] Make listStatus user root trash dir will return all subclusters trash subdirs if user has any mount points in nameservice. Key: HDFS-17269 URL: https://issues.apache.org/jira/browse/HDFS-17269 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua Same scenario as HDFS-17263. If the user trash config fs.trash.checkpoint.interval is set to 10 min on the namenodes, the trash root dir /user/$USER/.Trash/Current will be renamed every 10 min to /user/$USER/.Trash/timestamp. When a user runs ls on /user/$USER/.Trash, it should return the entries below: /user/$USER/.Trash/Current /user/$USER/.Trash/timestamp (this is invisible now) So we should make sure that listing the trash root dir shows all trash subdirs in every nameservice where the user has a mount point.
[jira] [Assigned] (HDFS-17263) RBF: Fix client ls trash path cannot get except default nameservices trash path
[ https://issues.apache.org/jira/browse/HDFS-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17263: -- Assignee: liuguanghua > RBF: Fix client ls trash path cannot get except default nameservices trash > path > --- > > Key: HDFS-17263 > URL: https://issues.apache.org/jira/browse/HDFS-17263 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HDFS-16024, renaming data to the Trash is based on the src locations. That is great for my usage, but after a period of use I found it causes an issue. > There are two nameservices, ns0 and ns1, and ns0 is the default nameservice. > (1) Add mount table entries: > /home/data -> (ns0, /home/data) > /data1/test1 -> (ns1, /data1/test1) > /data2/test2 -> (ns1, /data2/test2) > (2) mv files to trash: > ns0: /user/test-user/.Trash/Current/home/data/file1 > ns1: /user/test-user/.Trash/Current/data1/test1/file1 > (3) A client ls via DFSRouter will not see /user/test-user/.Trash/Current/data1. > (4) A client ls of /user/test-user/.Trash/Current/data2/test2 will throw an exception.
[jira] [Created] (HDFS-17263) [RBF] Fix client ls trash path cannot get except default nameservices trash path
liuguanghua created HDFS-17263: -- Summary: [RBF] Fix client ls trash path cannot get except default nameservices trash path Key: HDFS-17263 URL: https://issues.apache.org/jira/browse/HDFS-17263 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua With HDFS-16024, renaming data to the Trash is based on the src locations. That is great for my usage, but after a period of use I found it causes an issue. There are two nameservices, ns0 and ns1, and ns0 is the default nameservice. (1) Add mount table entries: /home/data -> (ns0, /home/data) /data1/test1 -> (ns1, /data1/test1) /data2/test2 -> (ns1, /data2/test2) (2) mv files to trash: ns0: /user/test-user/.Trash/Current/home/data/file1 ns1: /user/test-user/.Trash/Current/data1/test1/file1 (3) A client ls via DFSRouter will not see /user/test-user/.Trash/Current/data1. (4) A client ls of /user/test-user/.Trash/Current/data2/test2 will throw an exception.
[jira] [Updated] (HDFS-17261) Fix getFileInfo return wrong path when get mountTable path which multi-level
[ https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17261: --- Description: With DFSRouter, suppose there are two nameservices, ns0 and ns1. # Add the mount table entry /testgetfileinfo/ns1/dir -> (ns1 -> /testgetfileinfo/ns1/dir) # An hdfs client via DFSRouter accesses a directory: hdfs dfs -ls -d /testgetfileinfo # It returns the wrong path: /testgetfileinfo/testgetfileinfo > Fix getFileInfo return wrong path when get mountTable path which multi-level > > > Key: HDFS-17261 > URL: https://issues.apache.org/jira/browse/HDFS-17261 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Created] (HDFS-17261) Fix getFileInfo return wrong path when get mountTable path which multi-level
liuguanghua created HDFS-17261: -- Summary: Fix getFileInfo return wrong path when get mountTable path which multi-level Key: HDFS-17261 URL: https://issues.apache.org/jira/browse/HDFS-17261 Project: Hadoop HDFS Issue Type: Bug Reporter: liuguanghua
[jira] [Updated] (HDFS-17248) invalid
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Labels: (was: pull-request-available) > invalid > --- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Assigned] (HDFS-17249) TestDFSUtil.testIsValidName() run failure
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17249: -- Assignee: liuguanghua > TestDFSUtil.testIsValidName() run failure > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Assignee: liuguanghua > Priority: Minor > Labels: pull-request-available > > TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. > Add test cases to TestDFSUtil.testIsValidName.
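For reference, a JUnit-style sketch of the kind of assertions involved. The cases below are illustrative only; the expected result for "/foo/:/bar" is exactly what changed with HDFS-17246, so it is deliberately left out of this sketch.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.apache.hadoop.hdfs.DFSUtil;
import org.junit.Test;

public class TestIsValidNameSketch {
  @Test
  public void testIsValidName() {
    assertTrue(DFSUtil.isValidName("/foo/bar"));
    assertFalse(DFSUtil.isValidName("foo/bar"));     // must be absolute
    assertFalse(DFSUtil.isValidName("/foo/../bar")); // no relative components
  }
}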
[jira] [Updated] (HDFS-17249) TestDFSUtil.testIsValidName() run failure
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17249: --- Summary: TestDFSUtil.testIsValidName() run failure (was: Add test case for DFSUtil.isValidName) > TestDFSUtil.testIsValidName() run failure > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor > > TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. > Add test cases to TestDFSUtil.testIsValidName.
[jira] [Updated] (HDFS-17249) Add test case for DFSUtil.isValidName
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17249: --- Description: TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. Add test cases to TestDFSUtil.testIsValidName. was: TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. Add test cases to TestDFSUtil.testIsValidName for HDFS-17246. > Add test case for DFSUtil.isValidName > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Updated] (HDFS-17249) Add test case for DFSUtil.isValidName
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17249: --- Description: TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. Add test cases to TestDFSUtil.testIsValidName for HDFS-17246. was: TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. Add test cases to TestDFSUtil.testIsValidName for HDFS-17246. > Add test case for DFSUtil.isValidName > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Updated] (HDFS-17249) Add test case for DFSUtil.isValidName
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17249: --- Description: TestDFSUtil.testIsValidName fails at assertFalse(DFSUtil.isValidName("/foo/:/bar")); fix it. Add test cases to TestDFSUtil.testIsValidName for HDFS-17246. > Add test case for DFSUtil.isValidName > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Updated] (HDFS-17249) Add test case for DFSUtil.isValidName
[ https://issues.apache.org/jira/browse/HDFS-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17249: --- Summary: Add test case for DFSUtil.isValidName (was: Fix test case for HDFS-17246) > Add test case for DFSUtil.isValidName > - > > Key: HDFS-17249 > URL: https://issues.apache.org/jira/browse/HDFS-17249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor
[jira] [Created] (HDFS-17249) Fix test case for HDFS-17246
liuguanghua created HDFS-17249: -- Summary: Fix test case for HDFS-17246 Key: HDFS-17249 URL: https://issues.apache.org/jira/browse/HDFS-17249 Project: Hadoop HDFS Issue Type: Bug Reporter: liuguanghua
[jira] [Updated] (HDFS-17248) Fix
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Description: was: Add a test case for HDFS-17246; assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is merged. > Fix > > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor > Labels: pull-request-available
[jira] [Updated] (HDFS-17248) invalid
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Summary: invalid (was: Fix ) > invalid > --- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor > Labels: pull-request-available
[jira] [Updated] (HDFS-17248) Fix
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Summary: Fix (was: Add Test case for for building Hadoop on Windows and ) > Fix > > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: liuguanghua > Priority: Minor > Labels: pull-request-available > > Add a test case for HDFS-17246; assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is merged.
[jira] [Updated] (HDFS-17246) Fix shaded client for building Hadoop on Windows
[ https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17246: --- Summary: Fix shaded client for building Hadoop on Windows (was: Fix DFSUtilClient.ValidName ERROR) > Fix shaded client for building Hadoop on Windows > > > Key: HDFS-17246 > URL: https://issues.apache.org/jira/browse/HDFS-17246 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client > Affects Versions: 3.4.0 > Environment: Windows 10 > Reporter: Gautham Banasandra > Assignee: Gautham Banasandra > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2023-11-03-17-31-14-990.png > > > Currently, the *shaded client* Yetus personality in Hadoop fails to build on Windows - https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615. This happens due to the integration test failures in Hadoop client modules - https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests. There are several issues that need to be addressed in order to get the integration tests working - > # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by the integration tests. > # Add Hadoop binaries to PATH, so that winutils.exe can be located. > # Create a new user with Symlink privilege in the Docker image. This is needed for the proper working of the Mini YARN cluster spawned by the integration tests. > # Fix a bug in DFSUtilClient.java that prevents colon ( *:* ) in the path. The colon is used as a delimiter for the PATH variable while specifying multiple paths. However, this isn't a delimiter in the case of Windows and must be handled appropriately.
[jira] [Updated] (HDFS-17246) Fix DFSUtilClient.ValidName ERROR
[ https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17246: --- Summary: Fix DFSUtilClient.ValidName ERROR (was: Fix shaded client for building Hadoop on Windows) > Fix DFSUtilClient.ValidName ERROR > - > > Key: HDFS-17246 > URL: https://issues.apache.org/jira/browse/HDFS-17246 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.4.0 > Environment: Windows 10 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2023-11-03-17-31-14-990.png > > > Currently, the *shaded client* Yetus personality in Hadoop fails to build on > Windows - > https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615. > This happens due to the integration test failures in Hadoop client modules - > https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests. > There are several issues that need to be addressed in order to get the > integration tests working - > # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by > the integration tests. > # Add Hadoop binaries to PATH, so that winutils.exe can be located. > # Create a new user with Symlink privilege in the Docker image. This is > needed for the proper working of the Mini YARN cluster spawned by the > integration tests. > # Fix a bug in DFSUtilClient.java that prevents a colon ( *:* ) in the path. > The colon is used as a delimiter for the PATH variable while specifying > multiple paths. However, it isn't a delimiter in the case of Windows and must > be handled appropriately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17248) Add Test case for for building Hadoop on Windows and
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Description: Add a test case for HDFS-17246; assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is merged. was:assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is merged. > Add Test case for for building Hadoop on Windows and > -- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > > Add a test case for HDFS-17246; > assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is > merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-17248) Add Test case for for building Hadoop on Windows and
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua resolved HDFS-17248. Resolution: Invalid > Add Test case for for building Hadoop on Windows and > -- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > > Add a test case for HDFS-17246; > assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is > merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17248) Add Test case for for building Hadoop on Windows and
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Summary: Add Test case for for building Hadoop on Windows and (was: Add Test for for building Hadoop on Windows) > Add Test case for for building Hadoop on Windows and > -- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is > merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17248) Add Test for for building Hadoop on Windows
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Summary: Add Test for for building Hadoop on Windows (was: Fix DFSUtilClient.ValidName ERROR) > Add Test for for building Hadoop on Windows > --- > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is > merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17248) Fix DFSUtilClient.ValidName ERROR
liuguanghua created HDFS-17248: -- Summary: Fix DFSUtilClient.ValidName ERROR Key: HDFS-17248 URL: https://issues.apache.org/jira/browse/HDFS-17248 Project: Hadoop HDFS Issue Type: Bug Reporter: liuguanghua assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17248) Fix DFSUtilClient.ValidName ERROR
[ https://issues.apache.org/jira/browse/HDFS-17248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17248: --- Priority: Minor (was: Major) > Fix DFSUtilClient.ValidName ERROR > - > > Key: HDFS-17248 > URL: https://issues.apache.org/jira/browse/HDFS-17248 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > > assertFalse(DFSUtil.isValidName("/foo/:/bar")); will fail once HDFS-17246 is > merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14500) NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished
[ https://issues.apache.org/jira/browse/HDFS-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-14500: --- Description: When testing out a cluster with the edit log tailing fast path feature enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in safe mode for an extended period of time, preventing the NameNode from fully completing its startup sequence. We noticed that the Startup Progress web UI displayed many edit log segments (millions of them). I traced this problem back to {{StartupProgress}}. Within {{FSEditLogLoader}}, the loader continually tries to update the startup progress with a new {{Step}} any time that it loads edits. Per the Javadoc for {{StartupProgress}}, this should be a no-op once startup is completed: {code:java|title=StartupProgress.java} * After startup completes, the tracked data is frozen. Any subsequent updates * or counter increments are no-ops. {code} However, {{StartupProgress}} only implements that logic once the _entire_ startup sequence has been completed. When {{FSEditLogLoader}} calls {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: {code:java|title=FSEditLogLoader.java} StartupProgress prog = NameNode.getStartupProgress(); Step step = createStartupProgressStep(edits); prog.beginStep(Phase.LOADING_EDITS, step); {code} This phase, in our case, ended long before, so it is nonsensical to continue to add steps to it. I believe it is a bug that {{StartupProgress}} accepts such steps instead of ignoring them; once a phase is complete, it should no longer change. was: When testing out a cluster with the edit log tailing fast path feature enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in safe mode for an extended period of time, preventing the NameNode from fully completing its startup sequence. We noticed that the Startup Progress web UI displayed many edit log segments (millions of them). I traced this problem back to {{StartupProgress}}. Within {{FSEditLogLoader}}, the loader continually tries to update the startup progress with a new {{Step}} any time that it loads edits. Per the Javadoc for {{StartupProgress}}, this should be a no-op once startup is completed: {code:title=StartupProgress.java} * After startup completes, the tracked data is frozen. Any subsequent updates * or counter increments are no-ops. {code} However, {{StartupProgress}} only implements that logic once the _entire_ startup sequence has been completed. When {{FSEditLogLoader}} calls {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: {code:title=FSEditLogLoader.java} StartupProgress prog = NameNode.getStartupProgress(); Step step = createStartupProgressStep(edits); prog.beginStep(Phase.LOADING_EDITS, step); {code} This phase, in our case, ended long before, so it is nonsensical to continue to add steps to it. I believe it is a bug that {{StartupProgress}} accepts such steps instead of ignoring them; once a phase is complete, it should no longer change. 
> NameNode StartupProgress continues to report edit log segments after the > LOADING_EDITS phase is finished > > > Key: HDFS-14500 > URL: https://issues.apache.org/jira/browse/HDFS-14500 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14500-branch-2.001.patch, HDFS-14500.000.patch, > HDFS-14500.001.patch > > > When testing out a cluster with the edit log tailing fast path feature > enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in > safe mode for an extended period of time, preventing the NameNode from fully > completing its startup sequence. We noticed that the Startup Progress web UI > displayed many edit log segments (millions of them). > I traced this problem back to {{StartupProgress}}. Within > {{FSEditLogLoader}}, the loader continually tries to update the startup > progress with a new {{Step}} any time that it loads edits. Per the Javadoc > for {{StartupProgress}}, this should be a no-op once startup is completed: > {code:java|title=StartupProgress.java} > * After startup completes, the tracked data is frozen. Any subsequent > updates > * or counter increments are no-ops. > {code} > However, {{StartupProgress}} only implements that logic once the _entire_ > startup sequence has been completed. When {{FSEditLogLoader}} calls > {{addStep()}}, it adds it into the {{LOADING_EDITS}} phase: > {code:java|title=FSEditLogLoader.java} > StartupProgress prog = NameNode.getStartupProgress(); > Step step = createStartupProgressStep(edits); > prog.beginStep(Phase.LOADING_EDITS, step); > {code} > This phase, in our case, ended long before, so it is nonsensical to continue > to add steps to it. I believe it is a bug that {{StartupProgress}} accepts > such steps instead of ignoring them; once a phase is complete, it should no > longer change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
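A minimal sketch of the guard this report argues for, assuming nothing about the real StartupProgress internals beyond beginStep() and Phase; the Status enum, field names, and endPhase() here are illustrative only:

{code:java}
import java.util.EnumMap;
import java.util.Map;

public class FrozenPhaseProgress {
  enum Phase { LOADING_FSIMAGE, LOADING_EDITS, SAVING_CHECKPOINT }
  enum Status { PENDING, RUNNING, COMPLETE }

  private final Map<Phase, Status> phaseStatus = new EnumMap<>(Phase.class);

  void endPhase(Phase phase) {
    phaseStatus.put(phase, Status.COMPLETE);
  }

  void beginStep(Phase phase, String step) {
    // The proposed behavior: once a phase is COMPLETE it is frozen, so
    // steps from tailed edit segments no longer pile up in the startup UI.
    if (phaseStatus.get(phase) == Status.COMPLETE) {
      return;
    }
    // ... otherwise track the step as before ...
  }
}
{code}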
[jira] [Created] (HDFS-17186) NamenodeProtocol.versionRequest() threw null in DFSRouter service.
liuguanghua created HDFS-17186: -- Summary: NamenodeProtocol.versionRequest() threw null in DFSRouter service. Key: HDFS-17186 URL: https://issues.apache.org/jira/browse/HDFS-17186 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.3.2 Reporter: liuguanghua In the DFSRouter service, I found the errors below. It seems that NamenodeProtocol.versionRequest() got errors, and then, on retry, the client CallId was found to be null. 2023-09-09 04:00:50,822 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with hdfs1-nn1:hdfsmaster1-001:8022: null java.lang.IllegalStateException at org.apache.hadoop.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:494) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:119) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:162) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy17.versionRequest(Unknown Source) at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:270) at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:218) at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:172) at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
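One detail behind the ": null" in that log line: the stack trace shows Preconditions.checkState(boolean) failing in Client.setCallIdAndRetryCount(), and that overload throws an IllegalStateException with no message, so logging e.getMessage() prints null. A tiny standalone demonstration:

{code:java}
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;

public class CheckStateDemo {
  public static void main(String[] args) {
    try {
      Preconditions.checkState(false); // no error-message argument
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "null"
    }
  }
}
{code}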
[jira] [Updated] (HDFS-17182) DataSetLockManager.lockLeakCheck() is not thread-safe.
[ https://issues.apache.org/jira/browse/HDFS-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17182: --- Summary: DataSetLockManager.lockLeakCheck() is not thread-safe. (was: DataSetLockManager.threadCountMap is not thread-safe. ) > DataSetLockManager.lockLeakCheck() is not thread-safe. > --- > > Key: HDFS-17182 > URL: https://issues.apache.org/jira/browse/HDFS-17182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > > threadCountMap is not thread-safe. The other functions that access it are > protected by synchronized, except lockLeakCheck(). Add synchronized to > lockLeakCheck(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
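A minimal sketch of the described fix, assuming threadCountMap is an unsynchronized HashMap as the report implies; the class below is a simplification for illustration, not the real DataSetLockManager:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LockLeakChecker {
  private final Map<String, Integer> threadCountMap = new HashMap<>();

  public synchronized void addLockCount(String lockName) {
    threadCountMap.merge(lockName, 1, Integer::sum);
  }

  // The fix: synchronize lockLeakCheck() like the other accessors so it
  // cannot iterate threadCountMap while another thread is mutating it.
  public synchronized void lockLeakCheck() {
    threadCountMap.forEach((name, count) -> {
      if (count > 0) {
        System.err.println("possible lock leak: " + name + " x" + count);
      }
    });
  }
}
{code}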
[jira] [Updated] (HDFS-17182) DataSetLockManager.threadCountMap is not thread-safe.
[ https://issues.apache.org/jira/browse/HDFS-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17182: --- Summary: DataSetLockManager.threadCountMap is not thread-safe. (was: threadCountMap is not thread-safe. ) > DataSetLockManager.threadCountMap is not thread-safe. > -- > > Key: HDFS-17182 > URL: https://issues.apache.org/jira/browse/HDFS-17182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > > threadCountMap is not thread-safe. The other functions that access it are > protected by synchronized, except lockLeakCheck(). Add synchronized to > lockLeakCheck(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-17182) threadCountMap is not thread-safe.
[ https://issues.apache.org/jira/browse/HDFS-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17182: -- Assignee: liuguanghua > threadCountMap is not thread-safe. > --- > > Key: HDFS-17182 > URL: https://issues.apache.org/jira/browse/HDFS-17182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > > threadCountMap is not thread-safe. The other functions that access it are > protected by synchronized, except lockLeakCheck(). Add synchronized to > lockLeakCheck(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17182) threadCountMap is not thread-safe.
liuguanghua created HDFS-17182: -- Summary: threadCountMap is not thread-safe. Key: HDFS-17182 URL: https://issues.apache.org/jira/browse/HDFS-17182 Project: Hadoop HDFS Issue Type: Bug Reporter: liuguanghua threadCountMap is not thread-safe. The other functions that access it are protected by synchronized, except lockLeakCheck(). Add synchronized to lockLeakCheck(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-17129) mis-order of ibr and fbr on datanode
[ https://issues.apache.org/jira/browse/HDFS-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reassigned HDFS-17129: -- Assignee: liuguanghua > mis-order of ibr and fbr on datanode > - > > Key: HDFS-17129 > URL: https://issues.apache.org/jira/browse/HDFS-17129 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 > Environment: hdfs3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > > HDFS-16016 provides a new thread to handle IBRs. That is a great > improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] (HDFS-17129) mis-order of ibr and fbr on datanode
[ https://issues.apache.org/jira/browse/HDFS-17129 ] liuguanghua deleted comment on HDFS-17129: was (Author: liuguanghua): merge into HDFS-17121 > mis-order of ibr and fbr on datanode > - > > Key: HDFS-17129 > URL: https://issues.apache.org/jira/browse/HDFS-17129 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 > Environment: hdfs3.4.0 >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > HDFS-16016 provides a new thread to handle IBRs. That is a great > improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-17129) mis-order of ibr and fbr on datanode
[ https://issues.apache.org/jira/browse/HDFS-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua reopened HDFS-17129: > mis-order of ibr and fbr on datanode > - > > Key: HDFS-17129 > URL: https://issues.apache.org/jira/browse/HDFS-17129 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 > Environment: hdfs3.4.0 >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > HDFS-16016 provides a new thread to handle IBRs. That is a great > improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17170) add metrics for datanode in function processQueueMessages and reportTo
liuguanghua created HDFS-17170: -- Summary: add metrics for datanode in function processQueueMessages and reportTo Key: HDFS-17170 URL: https://issues.apache.org/jira/browse/HDFS-17170 Project: Hadoop HDFS Issue Type: Improvement Reporter: liuguanghua Add two metrics for the DataNode: (1) BPServiceActorAction.reportTo executes errorReport and reportBadBlocks; record the count of these. (2) BPServiceActor.processQueueMessages runs in the heartbeat loop to the NN; record its numOps and time. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
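A minimal sketch of what such metrics could look like with Hadoop's standard metrics2 annotations; the class and metric names below are illustrative, not the names any eventual patch uses:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "BPServiceActor reporting metrics", context = "dfs")
public class BPServiceActorActionMetrics {
  // (1) count of BPServiceActorAction.reportTo executions
  // (errorReport and reportBadBlocks)
  @Metric("Count of reportTo actions sent to the NameNode")
  MutableCounterLong reportToOps;

  // (2) numOps and average time of processQueueMessages in the
  // heartbeat loop (MutableRate tracks both)
  @Metric("Rate of processQueueMessages calls")
  MutableRate processQueueMessages;
}
{code}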
[jira] [Resolved] (HDFS-17129) mis-order of ibr and fbr on datanode
[ https://issues.apache.org/jira/browse/HDFS-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua resolved HDFS-17129. Resolution: Abandoned merged into HDFS-17121 > mis-order of ibr and fbr on datanode > - > > Key: HDFS-17129 > URL: https://issues.apache.org/jira/browse/HDFS-17129 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 > Environment: hdfs3.4.0 >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > HDFS-16016 provides a new thread to handle IBRs. That is a great > improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17129) mis-order of ibr and fbr on datanode
[ https://issues.apache.org/jira/browse/HDFS-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747716#comment-17747716 ] liuguanghua commented on HDFS-17129: Yes, the FBR should hold the lock the whole time. And blockReport() will send the IBR and then the FBR, so it can prevent the mis-order. > mis-order of ibr and fbr on datanode > - > > Key: HDFS-17129 > URL: https://issues.apache.org/jira/browse/HDFS-17129 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 > Environment: hdfs3.4.0 >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > HDFS-16016 provides a new thread to handle IBRs. That is a great > improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
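An illustration of the ordering argument in that comment — purely a sketch, not the DataNode code: if pending IBRs are flushed and the FBR is sent under the same lock, the NameNode can never see an FBR older than a queued IBR:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class ReportOrdering {
  private final Deque<String> pendingIbr = new ArrayDeque<>();
  private final Object reportLock = new Object();

  void queueIbr(String blockChange) {
    synchronized (reportLock) {
      pendingIbr.add(blockChange);
    }
  }

  void fullBlockReport() {
    synchronized (reportLock) {
      // Drain incremental reports first...
      while (!pendingIbr.isEmpty()) {
        send("IBR " + pendingIbr.poll());
      }
      // ...then send a full report built after the flush, so it can
      // never predate an IBR still sitting in the queue.
      send("FBR snapshot");
    }
  }

  private void send(String report) {
    System.out.println("sent: " + report);
  }
}
{code}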
[jira] [Created] (HDFS-17129) mis-order of ibr and fbr on datanode
liuguanghua created HDFS-17129: -- Summary: mis-order of ibr and fbr on datanode Key: HDFS-17129 URL: https://issues.apache.org/jira/browse/HDFS-17129 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.4.0 Environment: hdfs3.4.0 Reporter: liuguanghua HDFS-16016 provides a new thread to handle IBRs. That is a great improvement, but it may cause a mis-order of IBR and FBR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17121) BPServiceActor to provide new thread to handle FBR
[ https://issues.apache.org/jira/browse/HDFS-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HDFS-17121: --- Description: After HDFS-16016, the IBR is handled in a separate thread to avoid the heartbeat blocking on an IBR that requires the read lock in the DataNode. Now the FBR should do the same. The reasons are: (1) the heartbeat may block because the FBR holds the read lock in the DataNode; (2) the FBR may only return a FinalizeCommand. was: # After HDFS-16016, the IBR is handled in a separate thread to avoid the heartbeat blocking on an IBR that requires the read lock in the DataNode. # Now the FBR should do the same. > BPServiceActor to provide new thread to handle FBR > -- > > Key: HDFS-17121 > URL: https://issues.apache.org/jira/browse/HDFS-17121 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: liuguanghua >Priority: Minor > > After HDFS-16016, the IBR is handled in a separate thread to avoid the > heartbeat blocking on an IBR that requires the read lock in the DataNode. > Now the FBR should do the same. The reasons are: > (1) the heartbeat may block because the FBR holds the read lock in the > DataNode; > (2) the FBR may only return a FinalizeCommand. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
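A minimal sketch of the idea, assuming nothing about the real BPServiceActor internals: move the full block report onto its own thread so a long FBR holding the dataset read lock cannot delay heartbeats; all names below are illustrative:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FbrOffHeartbeatThread {
  private final ExecutorService fbrSender =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "fbr-sender");
        t.setDaemon(true);
        return t;
      });

  // Called from the heartbeat loop: enqueue only, never block.
  void maybeTriggerFullBlockReport() {
    fbrSender.submit(this::sendFullBlockReport);
  }

  private void sendFullBlockReport() {
    // The real code would scan the dataset under the read lock and send
    // the report to the NameNode; only "fbr-sender" pays that cost here.
    System.out.println("building and sending FBR...");
  }
}
{code}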
[jira] [Created] (HDFS-17121) BPServiceActor to provide new thread to handle FBR
liuguanghua created HDFS-17121: -- Summary: BPServiceActor to provide new thread to handle FBR Key: HDFS-17121 URL: https://issues.apache.org/jira/browse/HDFS-17121 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: liuguanghua # After HDFS-16016, the IBR is handled in a separate thread to avoid the heartbeat blocking on an IBR that requires the read lock in the DataNode. # Now the FBR should do the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org