[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780172#comment-17780172
 ] 

ASF GitHub Bot commented on HDFS-17024:
---

aajisaka commented on PR #6223:
URL: https://github.com/apache/hadoop/pull/6223#issuecomment-1782324754

   Thank you @jojochuang!




> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.
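
For readers less familiar with this failure mode, the sketch below is an illustration only, not the actual DataStreamer code: it shows how a check-then-act race on a shared, non-volatile field can surface as an NPE in a thread that has already passed the null check.

{code:java}
// Illustrative sketch of a check-then-act data race (assumed names, not Hadoop code).
public class DataRaceSketch {
  // Shared, non-volatile field touched by two threads without synchronization.
  private static Exception lastException = new Exception("pending");

  public static void main(String[] args) throws InterruptedException {
    Thread closer = new Thread(() -> {
      lastException = null; // e.g. a closing thread clearing shared state
    });
    Thread waiter = new Thread(() -> {
      if (lastException != null) {
        // The closer may null the field between the check above and the use below,
        // producing a NullPointerException like the one in the stack trace.
        System.out.println(lastException.getMessage());
      }
    });
    closer.start();
    waiter.start();
    closer.join();
    waiter.join();
  }
}
{code}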






[jira] [Updated] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-17024:
---
Fix Version/s: 3.3.9

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Assigned] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-17024:
--

Assignee: Segawa Hiroaki

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Resolved] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-17024.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780171#comment-17780171
 ] 

ASF GitHub Bot commented on HDFS-17024:
---

jojochuang merged PR #6223:
URL: https://github.com/apache/hadoop/pull/6223




> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Assigned] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-17024:
--

Assignee: (was: Wei-Chiu Chuang)

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Updated] (HDFS-17241) long write lock on active NN from rollEditLog()

2023-10-26 Thread shuaiqi.guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuaiqi.guo updated HDFS-17241:
---
Description: 
when the standby NN triggers a log roll on the active NN and sends the fsimage to the 
active NN at the same time, the active NN will hold the write lock for a long time, which 
blocks almost all requests, like:
{code:java}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 {code}

  was:
when standby NN triggering log roll on active NN and sending fsimage to active 
NN at the same time, the active NN will hive a long write lock, which blocks 
almost all requests. like:
{code:java}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 {code}


> long write lock on active NN from rollEditLog()
> ---
>
> Key: HDFS-17241
> URL: https://issues.apache.org/jira/browse/HDFS-17241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: shuaiqi.guo
>Priority: Major
>
> when the standby NN triggers a log roll on the active NN and sends the fsimage to 
> the active NN at the same time, the active NN will hold the write lock for a long time, 
> which blocks almost all requests, like:
> {code:java}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
> lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(N

[jira] [Updated] (HDFS-17241) long write lock on active NN from rollEditLog()

2023-10-26 Thread shuaiqi.guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuaiqi.guo updated HDFS-17241:
---
Description: 
when the standby NN triggers a log roll on the active NN and sends the fsimage to the 
active NN at the same time, the active NN will hold the write lock for a long time, which 
blocks almost all requests, like:
{code:java}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 {code}

  was:
when standby NN triggering log roll on active NN and sending fsimage to active 
NN at the same time, the active NN while hive a long write lock, which blocks 
almost all requests. like:
{code:java}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 {code}


> long write lock on active NN from rollEditLog()
> ---
>
> Key: HDFS-17241
> URL: https://issues.apache.org/jira/browse/HDFS-17241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.2
>Reporter: shuaiqi.guo
>Priority: Major
>
> when the standby NN triggers a log roll on the active NN and sends the fsimage to 
> the active NN at the same time, the active NN will hold the write lock for a long time, 
> which blocks almost all requests, like:
> {code:java}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
> lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(

[jira] [Created] (HDFS-17241) long write lock on active NN from rollEditLog()

2023-10-26 Thread shuaiqi.guo (Jira)
shuaiqi.guo created HDFS-17241:
--

 Summary: long write lock on active NN from rollEditLog()
 Key: HDFS-17241
 URL: https://issues.apache.org/jira/browse/HDFS-17241
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.1.2
Reporter: shuaiqi.guo


when the standby NN triggers a log roll on the active NN and sends the fsimage to the 
active NN at the same time, the active NN will hold the write lock for a long time, which 
blocks almost all requests, like:
{code:java}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write 
lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 {code}






[jira] [Commented] (HDFS-17063) Datanode configures different Capacity Reserved for each disk

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780140#comment-17780140
 ] 

ASF GitHub Bot commented on HDFS-17063:
---

qijiale76 commented on PR #5793:
URL: https://github.com/apache/hadoop/pull/5793#issuecomment-1782188296

   > LGTM.
   
   Thank you.




> Datanode configures different Capacity Reserved for each disk
> -
>
> Key: HDFS-17063
> URL: https://issues.apache.org/jira/browse/HDFS-17063
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
>Affects Versions: 3.3.6
>Reporter: Jiale Qi
>Assignee: Jiale Qi
>Priority: Minor
>  Labels: pull-request-available
>
> Now _dfs.datanode.du.reserved_ takes effect for every directory of a datanode.
> This issue allows the cluster administrator to configure 
> {_}dfs.datanode.du.reserved./data/hdfs1/data{_}, which only takes effect for a 
> specific directory.
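
A hedged illustration of what such a per-volume override might look like in hdfs-site.xml; the property name follows the pattern quoted above, and the byte values are made up for the example:

{code:xml}
<!-- Illustrative values only: reserve 10 GB on every volume by default... -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
<!-- ...but reserve 50 GB on the /data/hdfs1/data volume specifically. -->
<property>
  <name>dfs.datanode.du.reserved./data/hdfs1/data</name>
  <value>53687091200</value>
</property>
{code}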






[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780123#comment-17780123
 ] 

ASF GitHub Bot commented on HDFS-17024:
---

aajisaka commented on PR #6223:
URL: https://github.com/apache/hadoop/pull/6223#issuecomment-1782081868

   @jojochuang Could you take a look?




> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-15865, we found that the client aborted due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. File this jira to investigate 
> and eliminate the data race.






[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service

2023-10-26 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780041#comment-17780041
 ] 

Wei-Chiu Chuang commented on HDFS-15273:


+1. Sorry for the very late review.

IMO the second sleep isn't really needed.

> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -
>
> Key: HDFS-15273
> URL: https://issues.apache.org/jira/browse/HDFS-15273
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch, 
> HDFS-15273.003.patch
>
>
> CacheReplicationMonitor scans the cache directives and the cached block map 
> periodically. If we add more and more cache directives, 
> CacheReplicationMonitor will take a very long time to rescan all of the cache 
> directives and cached blocks. Meanwhile, the scan operation holds the global write 
> lock, so during the scan period the NameNode cannot process other requests.
> So I think we should warn end users who turn on the CacheManager 
> feature about this risk until this implementation is improved.
> {code:java}
>   private void rescan() throws InterruptedException {
> scannedDirectives = 0;
> scannedBlocks = 0;
> try {
>   namesystem.writeLock();
>   try {
> lock.lock();
> if (shutdown) {
>   throw new InterruptedException("CacheReplicationMonitor was " +
>   "shut down.");
> }
> curScanCount = completedScanCount + 1;
>   } finally {
> lock.unlock();
>   }
>   resetStatistics();
>   rescanCacheDirectives();
>   rescanCachedBlockMap();
>   blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
> } finally {
>   namesystem.writeUnlock();
> }
>   }
> {code}
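
One possible mitigation pattern, sketched below purely for illustration (this is not the committed patch, and the names are assumptions): split the rescan into batches and release the global write lock between batches so other NameNode operations can make progress.

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedRescanSketch {
  private static final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  // Rescan directives in fixed-size batches, dropping the write lock between batches.
  static void rescanInBatches(List<String> directives, int batchSize) {
    for (int start = 0; start < directives.size(); start += batchSize) {
      nsLock.writeLock().lock();
      try {
        int end = Math.min(start + batchSize, directives.size());
        for (String directive : directives.subList(start, end)) {
          directive.length(); // placeholder for the per-directive rescan work
        }
      } finally {
        nsLock.writeLock().unlock(); // let queued readers and writers run before the next batch
      }
    }
  }

  public static void main(String[] args) {
    rescanInBatches(List.of("d1", "d2", "d3", "d4", "d5"), 2);
  }
}
{code}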






[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15273:
---
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -
>
> Key: HDFS-15273
> URL: https://issues.apache.org/jira/browse/HDFS-15273
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch, 
> HDFS-15273.003.patch
>
>
> CacheReplicationMonitor scans the cache directives and the cached block map 
> periodically. If we add more and more cache directives, 
> CacheReplicationMonitor will take a very long time to rescan all of the cache 
> directives and cached blocks. Meanwhile, the scan operation holds the global write 
> lock, so during the scan period the NameNode cannot process other requests.
> So I think we should warn end users who turn on the CacheManager 
> feature about this risk until this implementation is improved.
> {code:java}
>   private void rescan() throws InterruptedException {
> scannedDirectives = 0;
> scannedBlocks = 0;
> try {
>   namesystem.writeLock();
>   try {
> lock.lock();
> if (shutdown) {
>   throw new InterruptedException("CacheReplicationMonitor was " +
>   "shut down.");
> }
> curScanCount = completedScanCount + 1;
>   } finally {
> lock.unlock();
>   }
>   resetStatistics();
>   rescanCacheDirectives();
>   rescanCachedBlockMap();
>   blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
> } finally {
>   namesystem.writeUnlock();
> }
>   }
> {code}






[jira] [Commented] (HDFS-17232) RBF: Fix NoNamenodesAvailableException for a long time, when use observer

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780040#comment-17780040
 ] 

ASF GitHub Bot commented on HDFS-17232:
---

hadoop-yetus commented on PR #6208:
URL: https://github.com/apache/hadoop/pull/6208#issuecomment-1781581422

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  
hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 0 new + 0 unchanged - 
1 fixed = 0 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  23m 17s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 168m 13s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6208 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c27973c4abac 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4535f16424a5663abf262562660ebbf073f08e54 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/testReport/ |
   | Max. process+thread count | 2422 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   

[jira] [Commented] (HDFS-16849) Terminate SNN when failing to perform EditLogTailing

2023-10-26 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780007#comment-17780007
 ] 

Srinivasu Majeti commented on HDFS-16849:
-

CCing [~weichiu] to review this.

> Terminate SNN when failing to perform EditLogTailing
> 
>
> Key: HDFS-16849
> URL: https://issues.apache.org/jira/browse/HDFS-16849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Karthik Palanisamy
>Priority: Major
>
> We should terminate the SNN if edit log tailing cannot reach a sufficient number of JNs. 
> We found this after a Kerberos error. 
> {code:java}
> 2022-10-14 10:53:16,796 INFO 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms 
> (timeout=2 ms) for a response for selectStreamingInputStreams. Exceptions 
> so far: [:8485:  DestHost:destPort :8485 , LocalHost:localPort 
> /:0. Failed on local exception: 
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> hdfs/  javax.security.auth.login.LoginException: Client not found in 
> Kerberos database (6)]
> 2022-10-14 10:53:30,796 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
> streams from QJM to [:8485, :8485, :8485]. Skipping.
> java.io.IOException: Timed out waiting 2ms for a quorum of nodes to 
> respond.
>         at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427)
>  {code}
>  
> We have no check on whether a sufficient number of JNs responded: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280]
> So we should implement a check similar to this one:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]
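
A minimal sketch of the kind of quorum check being asked for, under the assumption that a simple majority of JournalNodes is required; the class and method names are invented for illustration and this is not the actual Hadoop code:

{code:java}
public class QuorumCheckSketch {
  // Throw if fewer than a simple majority of JournalNodes responded.
  static void checkQuorum(int journalCount, int responded) {
    int required = journalCount / 2 + 1;
    if (responded < required) {
      // The real NameNode would treat this as fatal instead of skipping edit tailing.
      throw new IllegalStateException("Only " + responded + " of " + journalCount
          + " JournalNodes responded; need a quorum of " + required);
    }
  }

  public static void main(String[] args) {
    checkQuorum(3, 2); // quorum met, returns normally
    try {
      checkQuorum(3, 1); // quorum not met
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}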






[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service

2023-10-26 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780003#comment-17780003
 ] 

Srinivasu Majeti commented on HDFS-15273:
-

CCing [~weichiu] to review this and approve.

> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -
>
> Key: HDFS-15273
> URL: https://issues.apache.org/jira/browse/HDFS-15273
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch, 
> HDFS-15273.003.patch
>
>
> CacheReplicationMonitor scans the cache directives and the cached block map 
> periodically. If we add more and more cache directives, 
> CacheReplicationMonitor will take a very long time to rescan all of the cache 
> directives and cached blocks. Meanwhile, the scan operation holds the global write 
> lock, so during the scan period the NameNode cannot process other requests.
> So I think we should warn end users who turn on the CacheManager 
> feature about this risk until this implementation is improved.
> {code:java}
>   private void rescan() throws InterruptedException {
> scannedDirectives = 0;
> scannedBlocks = 0;
> try {
>   namesystem.writeLock();
>   try {
> lock.lock();
> if (shutdown) {
>   throw new InterruptedException("CacheReplicationMonitor was " +
>   "shut down.");
> }
> curScanCount = completedScanCount + 1;
>   } finally {
> lock.unlock();
>   }
>   resetStatistics();
>   rescanCacheDirectives();
>   rescanCachedBlockMap();
>   blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
> } finally {
>   namesystem.writeUnlock();
> }
>   }
> {code}






[jira] [Commented] (HDFS-17240) Fix a typo in DataStorage.java

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780002#comment-17780002
 ] 

ASF GitHub Bot commented on HDFS-17240:
---

yuw1 opened a new pull request, #6226:
URL: https://github.com/apache/hadoop/pull/6226

   ### Description of PR
   Fix a typo in DataStorage.java
   
  /**
   -   * Analize which and whether a transition of the fs state is required
   +   * Analyze which and whether a transition of the fs state is required
   * and perform it if necessary.
   * 
   
   ### How was this patch tested?
   This patch only modifies comments and does not require adding new test cases
   
   




> Fix a typo in DataStorage.java
> --
>
> Key: HDFS-17240
> URL: https://issues.apache.org/jira/browse/HDFS-17240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Yu Wang
>Priority: Trivial
>
> Fix a typo in DataStorage.java
>  
> {code:java}
>    /**
> -   * Analize which and whether a transition of the fs state is required
> +   * Analyze which and whether a transition of the fs state is required
>     * and perform it if necessary.
>     * {code}
>  






[jira] [Updated] (HDFS-17240) Fix a typo in DataStorage.java

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17240:
--
Labels: pull-request-available  (was: )

> Fix a typo in DataStorage.java
> --
>
> Key: HDFS-17240
> URL: https://issues.apache.org/jira/browse/HDFS-17240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Yu Wang
>Priority: Trivial
>  Labels: pull-request-available
>
> Fix a typo in DataStorage.java
>  
> {code:java}
>    /**
> -   * Analize which and whether a transition of the fs state is required
> +   * Analyze which and whether a transition of the fs state is required
>     * and perform it if necessary.
>     * {code}
>  






[jira] [Commented] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-10-26 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780001#comment-17780001
 ] 

Wei-Chiu Chuang commented on HDFS-16950:


Karthik said the missing edit logs caused data loss, and it's reproducible.

A workaround would be to put the NN in safe mode and take a checkpoint before 
proceeding with the migration.
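
A minimal sketch of that workaround, assuming shell access with dfsadmin privileges on the active NameNode (commands only; the exact migration steps are omitted):

{noformat}
# Force a fresh checkpoint before touching the shared edits / JournalNodes.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
# ... perform the JournalNode migration / -initializeSharedEdits here ...
hdfs dfsadmin -safemode leave
{noformat}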


> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Critical
>
> The Namenode failed in the production cluster when the JN role was migrated. 
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start 
> namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid xx, but got txid xx. {code}
> InitializeSharedEdits issued as part of the role migration step. Note, no 
> checkpoint is performed in the past few hours. 
> InitializeSharedEdits created a new log segment from the edit_inprogres 
> transaction and deleted all old transactions. 
> My ask here is to delete any edit transaction older than the fimage 
> transaction. But currently, it deletes all transactions and no check is 
> enforced in JNStorage#format(). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17240) Fix a typo in DataStorage.java

2023-10-26 Thread Yu Wang (Jira)
Yu Wang created HDFS-17240:
--

 Summary: Fix a typo in DataStorage.java
 Key: HDFS-17240
 URL: https://issues.apache.org/jira/browse/HDFS-17240
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Yu Wang


Fix a typo in DataStorage.java

 
{code:java}
   /**
-   * Analize which and whether a transition of the fs state is required
+   * Analyze which and whether a transition of the fs state is required
    * and perform it if necessary.
    * {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16950:
---
Priority: Critical  (was: Major)

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Critical
>
> Namenode failed in the production cluster when the JN role was migrated. 
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start 
> namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid xx, but got txid xx. {code}
> InitializeSharedEdits was issued as part of the role migration step. Note that 
> no checkpoint had been performed in the past few hours. 
> InitializeSharedEdits created a new log segment from the edits_inprogress 
> transaction and deleted all old transactions. 
> My ask here is to delete only edit transactions older than the fsimage 
> transaction. But currently, it deletes all transactions and no check is 
> enforced in JNStorage#format(). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16950) Gap in edits after -initializeSharedEdits

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16950:
---
Issue Type: Bug  (was: Improvement)

> Gap in edits after -initializeSharedEdits
> -
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, namenode
>Reporter: Karthik Palanisamy
>Priority: Critical
>
> Namenode failed in the production cluster when the JN role was migrated. 
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start 
> namenode.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid xx, but got txid xx. {code}
> InitializeSharedEdits was issued as part of the role migration step. Note that 
> no checkpoint had been performed in the past few hours. 
> InitializeSharedEdits created a new log segment from the edits_inprogress 
> transaction and deleted all old transactions. 
> My ask here is to delete only edit transactions older than the fsimage 
> transaction. But currently, it deletes all transactions and no check is 
> enforced in JNStorage#format(). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779965#comment-17779965
 ] 

ASF GitHub Bot commented on HDFS-17239:
---

hadoop-yetus commented on PR #6225:
URL: https://github.com/apache/hadoop/pull/6225#issuecomment-1781406625

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  3s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 52 unchanged - 
0 fixed = 54 total (was 52)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | -1 :x: |  spotbugs  |   3m 23s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 
total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  37m 25s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 227m  3s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 374m 29s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  Possible null pointer dereference of info in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock,
 FsVolumeImpl)  Dereferenced at FsDatasetImpl.java:info in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock,
 FsVolumeImpl)  Dereferenced at FsDatasetImpl.java:[line 2466] |
   |  |  Possible null pointer derefere

[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779963#comment-17779963
 ] 

ASF GitHub Bot commented on HDFS-17239:
---

hadoop-yetus commented on PR #6225:
URL: https://github.com/apache/hadoop/pull/6225#issuecomment-1781403809

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m  9s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  3s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 52 unchanged - 
0 fixed = 54 total (was 52)  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | -1 :x: |  spotbugs  |   3m 26s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 
total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  35m  9s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 232m 52s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 375m 35s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  Possible null pointer dereference of info in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock,
 FsVolumeImpl)  Dereferenced at FsDatasetImpl.java:info in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock,
 FsVolumeImpl)  Dereferenced at FsDatasetImpl.java:[line 2466] |
   |  |  Possible null pointer derefere

[jira] [Commented] (HDFS-17232) RBF: Fix NoNamenodesAvailableException for a long time, when use observer

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779946#comment-17779946
 ] 

ASF GitHub Bot commented on HDFS-17232:
---

KeeProMise commented on code in PR #6208:
URL: https://github.com/apache/hadoop/pull/6208#discussion_r1373336587


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestNoNamenodesAvailableLongTime.java:
##
@@ -0,0 +1,432 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import static org.apache.hadoop.fs.permission.AclEntryType.USER;
+import static org.apache.hadoop.fs.permission.FsAction.ALL;
+import static org.apache.hadoop.fs.permission.AclEntryScope.DEFAULT;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.AclEntry;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster;
+import 
org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster.RouterContext;
+import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder;
+import org.apache.hadoop.hdfs.server.federation.StateStoreDFSCluster;
+import org.apache.hadoop.hdfs.server.federation.metrics.FederationRPCMetrics;
+import 
org.apache.hadoop.hdfs.server.federation.resolver.FederationNamenodeContext;
+import 
org.apache.hadoop.hdfs.server.federation.resolver.FederationNamenodeServiceState;
+import org.apache.hadoop.hdfs.server.namenode.NameNode;
+import org.apache.hadoop.ipc.RemoteException;
+import org.apache.hadoop.util.Lists;
+import org.junit.After;
+import org.junit.Test;
+
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.List;
+
+import static org.apache.hadoop.ha.HAServiceProtocol.HAServiceState.ACTIVE;
+import static 
org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster.DEFAULT_HEARTBEAT_INTERVAL_MS;
+import static org.apache.hadoop.hdfs.server.namenode.AclTestHelpers.aclEntry;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * When failover occurs, the router may record that the ns has no active 
namenode
+ * even if there is actually an active namenode.
+ * Only when the router updates the cache next time can the memory status be 
updated,
+ * causing the router to report NoNamenodesAvailableException for a long time,
+ *
+ * @see 
org.apache.hadoop.hdfs.server.federation.router.NoNamenodesAvailableException
+ */
+public class TestNoNamenodesAvailableLongTime {
+
+  // router load cache interval 10s
+  private static final long CACHE_FLUSH_INTERVAL_MS = 1;
+  private StateStoreDFSCluster cluster;
+  private FileSystem fileSystem;
+  private RouterContext routerContext;
+  private FederationRPCMetrics rpcMetrics;
+
+  @After
+  public void cleanup() throws IOException {
+rpcMetrics = null;
+routerContext = null;
+if (fileSystem != null) {
+  fileSystem.close();
+  fileSystem = null;
+}
+if (cluster != null) {
+  cluster.shutdown();
+  cluster = null;
+}
+  }
+
+  /**
+   * Set up state store cluster.
+   *
+   * @param numNameservices number of name services
+   * @param numberOfObserver number of observer
+   * @param useObserver whether to use observer
+   */
+  private void setupCluster(int numNameservices, int numberOfObserver, boolean 
useObserver)
+  throws Exception {
+if (!useObserver) {
+  numberOfObserver = 0;
+}
+int numberOfNamenode = 2 + numberOfObserver;
+cluster = new StateStoreDFSCluster(true, numNameservices, numberOfNamenode,
+DEFAULT_HEARTBEAT_INTERVAL_MS, CACHE_FLUSH_INTERVAL_MS);
+Configuration routerConf = new RouterConfigBuilder()
+.stateStore()
+.metrics()
+.admin()
+.rpc()
+.heartbeat()
+.build();
+
+// Set router observer related configs
+if (useObserver) {
+  
routerConf.setB

[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17239:
--
Labels: pull-request-available  (was: )

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log statements 
> within the scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779817#comment-17779817
 ] 

ASF GitHub Bot commented on HDFS-17239:
---

hfutatzhanghb opened a new pull request, #6225:
URL: https://github.com/apache/hadoop/pull/6225

   ### Description of PR
   In the method FsDatasetImpl#removeReplicaFromMem, there are some log statements 
within the scope of the BLOCK_POOL write lock. We should move them out of the write lock.
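   As a rough illustration of the pattern (not the actual FsDatasetImpl code; the class and 
names below are hypothetical), the change amounts to collecting whatever needs to be reported 
while holding the lock and emitting the log line only after the lock is released:
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of the "log outside the lock" pattern.
public class DeferredLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(DeferredLoggingSketch.class);
  private final ReentrantReadWriteLock blockPoolLock = new ReentrantReadWriteLock();

  void removeReplica(String blockId) {
    String removedInfo;
    blockPoolLock.writeLock().lock();
    try {
      // Mutate the in-memory replica state while holding the write lock.
      removedInfo = doRemove(blockId);
    } finally {
      blockPoolLock.writeLock().unlock();
    }
    // Logging happens after the lock is released, so string formatting and
    // appender I/O no longer extend the lock hold time.
    LOG.info("Removed replica {} ({})", blockId, removedInfo);
  }

  private String doRemove(String blockId) {
    return "meta for " + blockId; // placeholder for the real removal logic
  }
}
{code}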
   




> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log statements 
> within the scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread farmmamba (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779796#comment-17779796
 ] 

farmmamba commented on HDFS-17239:
--

[~haiyang Hu] Sir, what's your opinion? Thanks.

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log statements 
> within the scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba updated HDFS-17239:
-
Description: In method FsDatasetImpl#removeReplicaFromMem,  there exists 
some loggings in the range of BLOCK_POOl write lock.  We should move them out 
of write lock.  (was: In method FsDatasetImpl#invalidate,  there exists some 
loggings in the range of BLOCK_POOl write lock.  We should move them out of 
write lock.)

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log statements 
> within the scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2023-10-26 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba updated HDFS-17239:
-
Summary: Remove logging of method removeReplicaFromMem which is in 
BLOCK_POOl write lock.   (was: Remove logging of method invalidate which is in 
BLOCK_POOl write lock. )

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> In the method FsDatasetImpl#invalidate, there are some log statements within the 
> scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread Jim Halfpenny (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779789#comment-17779789
 ] 

Jim Halfpenny commented on HDFS-17238:
--

In your example you've set dfs.blocksize to 1.25 TiB, a value well outside the 
bounds of what would be considered typical or useful. This isn't a bug; it is a 
configuration error.

Do any of your data volumes have 1.25 TiB available? If not, then 
BlockManager.chooseTarget4NewBlock() is working as designed since it cannot 
find a data node with sufficient capacity for the new block.
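
If it helps to confirm that, a quick client-side check of whether any datanode could even hold 
a single block of the configured size might look like the sketch below (not part of Hadoop; it 
assumes fs.defaultFS points at the cluster in question):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Sketch: compare each datanode's remaining capacity with the configured block size.
public class BlockSizeSanityCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    long blockSize = conf.getLongBytes(DFSConfigKeys.DFS_BLOCK_SIZE_KEY,
        DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT);
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      boolean fitsOneBlock = dn.getRemaining() >= blockSize;
      System.out.printf("%s remaining=%d bytes, can hold one %d-byte block: %b%n",
          dn.getHostName(), dn.getRemaining(), blockSize, fitsOneBlock);
    }
  }
}
{code}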

> Setting the value of "dfs.blocksize" too large will cause HDFS to be unable 
> to write to files
> -
>
> Key: HDFS-17238
> URL: https://issues.apache.org/jira/browse/HDFS-17238
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.6
>Reporter: ECFuzz
>Priority: Major
>
> My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.
> core-site.xml like below.
> {code:java}
> 
>   
>         fs.defaultFS
>         hdfs://localhost:9000
>     
>     
>         hadoop.tmp.dir
>         /home/hadoop/Mutil_Component/tmp
>     
>    
> {code}
> hdfs-site.xml like below.
> {code:java}
> 
>    
>         dfs.replication
>         1
>     
> 
>         dfs.blocksize
>         134217728
>     
>    
> {code}
> And then format the namenode, and start the hdfs. HDFS is running normally.
> {code:java}
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs namenode -format
> x(many info)
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> sbin/start-dfs.sh
> Starting namenodes on [localhost]
> Starting datanodes
> Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code}
> Finally, use dfs to place a file. 
> {code:java}
> bin/hdfs dfs -mkdir -p /user/hadoop
> bin/hdfs dfs -mkdir input
> bin/hdfs dfs -put etc/hadoop/*.xml input {code}
> Then the following exception is thrown.
> {code:java}
> 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 
> 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 
> node(s) are excluded in this operation.
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1513)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
>         at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.

[jira] [Commented] (HDFS-17239) Remove logging of method invalidate which is in BLOCK_POOl write lock.

2023-10-26 Thread farmmamba (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779785#comment-17779785
 ] 

farmmamba commented on HDFS-17239:
--

[~hexiaoqiao] [~zhangshuyan]  [~tomscut]  [~qinyuren]  Sir, could you please 
take a look at this when you have free time?

> Remove logging of method invalidate which is in BLOCK_POOl write lock. 
> ---
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> In the method FsDatasetImpl#invalidate, there are some log statements within the 
> scope of the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17239) Remove logging of method invalidate which is in BLOCK_POOl write lock.

2023-10-26 Thread farmmamba (Jira)
farmmamba created HDFS-17239:


 Summary: Remove logging of method invalidate which is in 
BLOCK_POOl write lock. 
 Key: HDFS-17239
 URL: https://issues.apache.org/jira/browse/HDFS-17239
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.4.0
Reporter: farmmamba
Assignee: farmmamba


In the method FsDatasetImpl#invalidate, there are some log statements within the scope of 
the BLOCK_POOL write lock. We should move them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17238:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}
And then format the namenode, and start the hdfs. HDFS is running normally.
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs namenode -format
x(many info)
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code}
Finally, use dfs to place a file. 
{code:java}
bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input {code}
Then the following exception is thrown.
{code:java}
2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 
of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) 
are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
        at org.apache.hadoop.ipc.Client.call(Client.java:1513)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
        at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915)
        at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717)
        at org.apache.hadoop.

[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17238:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}
And then format the namenode, and start the hdfs. HDFS is running normally.
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs namenode -format
x(many info)
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code}
Finally, use dfs to place a file. 
{code:java}
bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input {code}
Then the following exception is thrown.
{code:java}
2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 
of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) 
are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
        at org.apache.hadoop.ipc.Client.call(Client.java:1513)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
        at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915)
        at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717)
        at org.apache.hadoop.

[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17238:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}
And then format the namenode, and start the hdfs. HDFS is running normally.
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs namenode -format
x(many info)
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code}
Finally, use dfs to place a file. 
{code:java}
bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input {code}
Then the following exception is thrown.
{code:java}
2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 
of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) 
are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
        at org.apache.hadoop.ipc.Client.call(Client.java:1513)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
        at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
        at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915)
        at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717)
        at org.apache.hadoop.

[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17238:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}
Then format the name node and start hdfs.

 

 

 

  was:
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}




> Setting the value of "dfs.blocksize" too large will cause HDFS to be unable 
> to write to files
> -
>
> Key: HDFS-17238
> URL: https://issues.apache.org/jira/browse/HDFS-17238
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.6
>Reporter: ECFuzz
>Priority: Major
>
> My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.
> core-site.xml like below.
> {code:java}
> 
>   
>         fs.defaultFS
>         hdfs://localhost:9000
>     
>     
>         hadoop.tmp.dir
>         /home/hadoop/Mutil_Component/tmp
>     
>    
> {code}
> hdfs-site.xml like below.
> {code:java}
> 
>    
>         dfs.replication
>         1
>     
> 
>         dfs.blocksize
>         134217728
>     
>    
> {code}
> Then format the name node and start hdfs.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17069) The documentation and implementation of "dfs.blocksize" are inconsistent.

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17069:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        128k
    
   
{code}
Then format the namenode and start HDFS.
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs namenode -format
x(many info)
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996]{code}
Finally, use dfs to put a file. Then I got a message meaning that 128k is less than 1M.

 
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs dfs -mkdir -p /user/hadoop
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs dfs -mkdir input
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hdfs dfs -put etc/hadoop/hdfs-site.xml input
put: Specified block size is less than configured minimum value 
(dfs.namenode.fs-limits.min-block-size): 131072 < 1048576
{code}
But I found that, according to the documentation in hdfs-default.xml, dfs.blocksize can be set to 128k and other such values.
{code:java}
The default block size for new files, in bytes. You can use the following 
suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) 
to specify the size (such as 128k, 512m, 1g, etc.), Or provide complete size in 
bytes (such as 134217728 for 128 MB).{code}
So, is there a problem with the documentation here? Or should users be warned to set this configuration to a value larger than 1M?
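
The interaction between the two settings can be seen in a small sketch (illustrative only; 
lowering dfs.namenode.fs-limits.min-block-size is something one would normally only do on a 
test cluster):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

// Sketch: dfs.blocksize accepts suffixes such as 128k, but the NameNode rejects any
// value below dfs.namenode.fs-limits.min-block-size (default 1048576 bytes).
public class MinBlockSizeCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, "128k");  // what the documentation says is allowed
    long blockSize = conf.getLongBytes(DFSConfigKeys.DFS_BLOCK_SIZE_KEY,
        DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT);           // 131072
    long minBlockSize = conf.getLong(DFSConfigKeys.DFS_NAMENODE_MIN_BLOCK_SIZE_KEY,
        DFSConfigKeys.DFS_NAMENODE_MIN_BLOCK_SIZE_DEFAULT);  // 1048576
    System.out.println("blockSize=" + blockSize + ", minBlockSize=" + minBlockSize
        + ", accepted by the NameNode: " + (blockSize >= minBlockSize));
  }
}
{code}
So the 128k example from the documentation only passes the NameNode check if 
dfs.namenode.fs-limits.min-block-size is lowered as well, which may be the caveat worth adding.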

 

In addition, I started YARN and ran the given MapReduce job.
{code:java}
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
sbin/start-yarn.sh 
hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep 
input output 'dfs[a-z.]+'{code}
And the shell throws some exceptions, as shown below.
{code:java}
2023-07-12 15:12:29,964 INFO client.DefaultNoHARMFailoverProxyProvider: 
Connecting to ResourceManager at /0.0.0.0:8032
2023-07-12 15:12:30,430 INFO mapreduce.JobResourceUploader: Disabling Erasure 
Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001
2023-07-12 15:12:30,542 INFO mapreduce.JobSubmitter: Cleaning up the staging 
area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block 
size is less than configured minimum value 
(dfs.namenode.fs-limits.min-block-size): 131072 < 1048576
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2690)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2625)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:807)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:496)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
        at org.apache.hadoop.ipc.Client.call(Client.java:1513)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
        at com.sun.proxy.$Proxy9.create(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:383)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at

[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ECFuzz updated HDFS-17238:
--
Description: 
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}



  was:
My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}


> Setting the value of "dfs.blocksize" too large will cause HDFS to be unable 
> to write to files
> -
>
> Key: HDFS-17238
> URL: https://issues.apache.org/jira/browse/HDFS-17238
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.6
>Reporter: ECFuzz
>Priority: Major
>
> My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.
> core-site.xml like below.
> {code:java}
> 
>   
>         fs.defaultFS
>         hdfs://localhost:9000
>     
>     
>         hadoop.tmp.dir
>         /home/hadoop/Mutil_Component/tmp
>     
>    
> {code}
> hdfs-site.xml like below.
> {code:java}
> 
>    
>         dfs.replication
>         1
>     
> 
>         dfs.blocksize
>         134217728
>     
>    
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-10-26 Thread ECFuzz (Jira)
ECFuzz created HDFS-17238:
-

 Summary: Setting the value of "dfs.blocksize" too large will cause 
HDFS to be unable to write to files
 Key: HDFS-17238
 URL: https://issues.apache.org/jira/browse/HDFS-17238
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.3.6
Reporter: ECFuzz


My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.

core-site.xml like below.
{code:java}

  
        fs.defaultFS
        hdfs://localhost:9000
    
    
        hadoop.tmp.dir
        /home/hadoop/Mutil_Component/tmp
    
   
{code}
hdfs-site.xml like below.
{code:java}

   
        dfs.replication
        1
    

        dfs.blocksize
        134217728
    
   
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org