[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780172#comment-17780172 ] ASF GitHub Bot commented on HDFS-17024: --- aajisaka commented on PR #6223: URL: https://github.com/apache/hadoop/pull/6223#issuecomment-1782324754 Thank you @jojochuang! > Potential data race introduced by HDFS-15865 > > > Key: HDFS-17024 > URL: https://issues.apache.org/jira/browse/HDFS-17024 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient >Affects Versions: 3.3.1 >Reporter: Wei-Chiu Chuang >Assignee: Segawa Hiroaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > After HDFS-15865, we found the client aborted due to an NPE. > {noformat} > 2023-04-10 16:07:43,409 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region > server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server > shutdown * > org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM > RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f. > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880) > at > org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781) > at > org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687) > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78) > at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806) > {noformat} > This is only possible if a data race happened. Filing this jira to investigate and eliminate the data race.
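For readers landing here from the stack trace above: an NPE at DataStreamer.waitForAckedSeqno is characteristic of a check-then-act race on shared state. The sketch below is a minimal illustration of that failure mode and of the usual fix, not the actual DataStreamer internals; the class and field names are hypothetical.

{code:java}
// Illustrative check-then-act race (hypothetical names, not the real
// DataStreamer internals). Thread A runs waitForAckRacy() while thread B
// runs close(); the null check and the dereference are not atomic.
class StreamerSketch {
  private volatile Object ackedState = new Object();

  // RACY: ackedState can become null between the check and the use.
  void waitForAckRacy() {
    if (ackedState != null) {
      System.out.println(ackedState.hashCode()); // may NPE under contention
    }
  }

  // SAFER: snapshot the field once, then work only with the local copy.
  void waitForAckSafe() {
    Object snapshot = ackedState;
    if (snapshot != null) {
      System.out.println(snapshot.hashCode()); // cannot NPE
    }
  }

  void close() {
    ackedState = null; // concurrent close clears the shared state
  }
}
{code}

The safe variant works because a local snapshot cannot be cleared by another thread once read; refactorings that change where or how often a shared field is read are a common way to introduce exactly this race.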
[jira] [Updated] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-17024: --- Fix Version/s: 3.3.9
[jira] [Assigned] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reassigned HDFS-17024: -- Assignee: Segawa Hiroaki
[jira] [Resolved] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-17024. Fix Version/s: 3.4.0 Resolution: Fixed
[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780171#comment-17780171 ] ASF GitHub Bot commented on HDFS-17024: --- jojochuang merged PR #6223: URL: https://github.com/apache/hadoop/pull/6223
[jira] [Assigned] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reassigned HDFS-17024: -- Assignee: (was: Wei-Chiu Chuang)
[jira] [Updated] (HDFS-17241) long write lock on active NN from rollEditLog()
[ https://issues.apache.org/jira/browse/HDFS-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shuaiqi.guo updated HDFS-17241: --- Description: when the standby NN triggers a log roll on the active NN and sends the fsimage to the active NN at the same time, the active NN will hold a long write lock, which blocks almost all requests, like: {code:java} INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559) org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663) org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292) org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146) org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974) org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) java.security.AccessController.doPrivileged(Native Method) javax.security.auth.Subject.doAs(Subject.java:422) org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) {code}
[jira] [Updated] (HDFS-17241) long write lock on active NN from rollEditLog()
[ https://issues.apache.org/jira/browse/HDFS-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shuaiqi.guo updated HDFS-17241: --- Description: when the standby NN triggers a log roll on the active NN and sends the fsimage to the active NN at the same time, the active NN will hold a long write lock, which blocks almost all requests, like: {code:java} INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559) org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663) org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292) org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146) org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974) org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) java.security.AccessController.doPrivileged(Native Method) javax.security.auth.Subject.doAs(Subject.java:422) org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) {code}
[jira] [Created] (HDFS-17241) long write lock on active NN from rollEditLog()
shuaiqi.guo created HDFS-17241: -- Summary: long write lock on active NN from rollEditLog() Key: HDFS-17241 URL: https://issues.apache.org/jira/browse/HDFS-17241 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.1.2 Reporter: shuaiqi.guo when the standby NN triggers a log roll on the active NN and sends the fsimage to the active NN at the same time, the active NN will hold a long write lock, which blocks almost all requests, like: {code:java} INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 27179 ms via java.lang.Thread.getStackTrace(Thread.java:1559) org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:273) org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:235) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1617) org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4663) org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292) org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146) org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12974) org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) java.security.AccessController.doPrivileged(Native Method) javax.security.auth.Subject.doAs(Subject.java:422) org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) {code}
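The trace above shows rollEditLog() holding the FSNamesystem write lock for roughly 27 seconds. As a rough illustration of the pattern involved, the sketch below (a hypothetical class, not the real FSNamesystem) keeps only the cheap metadata mutation inside the write lock, logs long hold times, and leaves slow I/O such as an fsimage transfer outside the critical section:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the locking pattern at issue (not the real
// FSNamesystem): cheap mutation under the lock, slow I/O outside it,
// and a warning when the lock was held too long.
class NamesystemLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private static final long WARN_THRESHOLD_MS = 5000;

  void rollEditLog() {
    lock.writeLock().lock();
    long acquired = System.currentTimeMillis();
    try {
      endCurrentSegmentAndStartNew(); // cheap metadata mutation only
    } finally {
      long held = System.currentTimeMillis() - acquired;
      lock.writeLock().unlock();
      if (held > WARN_THRESHOLD_MS) {
        System.err.println("write lock held for " + held + " ms");
      }
    }
    // Slow work (e.g. receiving or serving an fsimage) belongs out here;
    // if it runs under the write lock, almost every other RPC blocks.
  }

  private void endCurrentSegmentAndStartNew() { /* elided */ }
}
{code}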
[jira] [Commented] (HDFS-17063) Datanode configures different Capacity Reserved for each disk
[ https://issues.apache.org/jira/browse/HDFS-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780140#comment-17780140 ] ASF GitHub Bot commented on HDFS-17063: --- qijiale76 commented on PR #5793: URL: https://github.com/apache/hadoop/pull/5793#issuecomment-1782188296 > LGTM. Thank you. > Datanode configures different Capacity Reserved for each disk > - > > Key: HDFS-17063 > URL: https://issues.apache.org/jira/browse/HDFS-17063 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, hdfs >Affects Versions: 3.3.6 >Reporter: Jiale Qi >Assignee: Jiale Qi >Priority: Minor > Labels: pull-request-available > > Now _dfs.datanode.du.reserved_ takes effect for all directories of a datanode. > This issue allows a cluster administrator to configure > {_}dfs.datanode.du.reserved./data/hdfs1/data{_}, which only takes effect for a > specific directory.
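For readers unfamiliar with the feature being discussed, the configuration would presumably look like the hdfs-site.xml fragment below. The per-directory key suffix follows the pattern quoted in the issue description; the byte values are arbitrary examples, not recommendations:

{code:xml}
<!-- Global default: reserve 10 GB of non-DFS space on every DataNode volume. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
<!-- Per-directory override for one volume, per the key pattern named in
     this issue (50 GB here is an arbitrary example). -->
<property>
  <name>dfs.datanode.du.reserved./data/hdfs1/data</name>
  <value>53687091200</value>
</property>
{code}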
[jira] [Commented] (HDFS-17024) Potential data race introduced by HDFS-15865
[ https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780123#comment-17780123 ] ASF GitHub Bot commented on HDFS-17024: --- aajisaka commented on PR #6223: URL: https://github.com/apache/hadoop/pull/6223#issuecomment-1782081868 @jojochuang Could you take a look?
[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780041#comment-17780041 ] Wei-Chiu Chuang commented on HDFS-15273: +1 sorry for the very late review. IMO the second sleep isn't really needed. > CacheReplicationMonitor hold lock for long time and lead to NN out of service > - > > Key: HDFS-15273 > URL: https://issues.apache.org/jira/browse/HDFS-15273 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch, > HDFS-15273.003.patch > > > CacheReplicationMonitor scans the cache directives and the cached block map > periodically. If we add more and more cache directives, > CacheReplicationMonitor will take a very long time to rescan all of the cache > directives and cached blocks. Meanwhile, the scan operation holds the global write > lock; during the scan period, the NameNode cannot process other requests. > So I think we should warn end users who turn on the CacheManager > feature about this risk before improving this implementation. > {code:java} > private void rescan() throws InterruptedException { > scannedDirectives = 0; > scannedBlocks = 0; > try { > namesystem.writeLock(); > try { > lock.lock(); > if (shutdown) { > throw new InterruptedException("CacheReplicationMonitor was " + > "shut down."); > } > curScanCount = completedScanCount + 1; > } finally { > lock.unlock(); > } > resetStatistics(); > rescanCacheDirectives(); > rescanCachedBlockMap(); > blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime(); > } finally { > namesystem.writeUnlock(); > } > } > {code}
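The rescan() snippet above holds namesystem.writeLock() across the entire scan. A common mitigation for this shape of problem is to process the work in bounded batches and release the lock between batches; the sketch below illustrates that idea with hypothetical names and is not the committed HDFS-15273 patch:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the usual mitigation for a long scan under a global lock:
// bounded batches, with the lock released between batches so queued
// readers/writers can interleave. Hypothetical types and names.
class ChunkedRescanSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private static final int BATCH_SIZE = 1000;

  void rescan(List<String> directives) throws InterruptedException {
    for (int from = 0; from < directives.size(); from += BATCH_SIZE) {
      int to = Math.min(from + BATCH_SIZE, directives.size());
      fsLock.writeLock().lock();
      try {
        for (String d : directives.subList(from, to)) {
          rescanDirective(d);
        }
      } finally {
        fsLock.writeLock().unlock();
      }
      // Lock is free here; other operations get a chance to run.
      Thread.sleep(1); // optional back-off between batches
    }
  }

  private void rescanDirective(String d) { /* elided */ }
}
{code}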
[jira] [Updated] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15273: --- Fix Version/s: 3.4.0 Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-17232) RBF: Fix NoNamenodesAvailableException for a long time, when use observer
[ https://issues.apache.org/jira/browse/HDFS-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780040#comment-17780040 ] ASF GitHub Bot commented on HDFS-17232: --- hadoop-yetus commented on PR #6208: URL: https://github.com/apache/hadoop/pull/6208#issuecomment-1781581422 :confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 48s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 50m 8s | | trunk passed |
| +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 28s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 41s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 21s | | trunk passed |
| +1 :green_heart: | shadedclient | 39m 48s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 32s | | the patch passed |
| +1 :green_heart: | compile | 0m 34s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 34s | | the patch passed |
| +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 29s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 18s | | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 22s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 20s | | the patch passed |
| +1 :green_heart: | shadedclient | 39m 14s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| +1 :green_heart: | unit | 23m 17s | | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | | 168m 13s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6208 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c27973c4abac 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4535f16424a5663abf262562660ebbf073f08e54 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/testReport/ |
| Max. process+thread count | 2422 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6208/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
[jira] [Commented] (HDFS-16849) Terminate SNN when failing to perform EditLogTailing
[ https://issues.apache.org/jira/browse/HDFS-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780007#comment-17780007 ] Srinivasu Majeti commented on HDFS-16849: - CCing [~weichiu] to review this. > Terminate SNN when failing to perform EditLogTailing > > > Key: HDFS-16849 > URL: https://issues.apache.org/jira/browse/HDFS-16849 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Karthik Palanisamy >Priority: Major > > We should terminate the SNN if we fail edit log tailing for a sufficient number of JNs. We found > this after a Kerberos error. > {code:java} > 2022-10-14 10:53:16,796 INFO > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms > (timeout=2 ms) for a response for selectStreamingInputStreams. Exceptions > so far: [:8485: DestHost:destPort :8485 , LocalHost:localPort > /:0. Failed on local exception: > org.apache.hadoop.security.KerberosAuthException: Login failure for user: > hdfs/ javax.security.auth.login.LoginException: Client not found in > Kerberos database (6)] > 2022-10-14 10:53:30,796 WARN > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input > streams from QJM to [:8485, :8485, :8485]. Skipping. > java.io.IOException: Timed out waiting 2ms for a quorum of nodes to > respond. > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectStreamingInputStreams(QuorumJournalManager.java:605) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:523) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:269) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1706) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:311) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:464) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:414) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:431) > at java.base/java.security.AccessController.doPrivileged(Native > Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:361) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:427) > {code} > > We have no check on whether a sufficient number of JNs was reached: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L280] > So we should implement a check similar to this one: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java#L395]
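The proposal amounts to: count responsive JournalNodes during tailing, and if a quorum cannot be reached repeatedly, terminate the standby NameNode instead of silently skipping the tail. A simplified, hypothetical sketch of that guard follows; in Hadoop the termination would normally go through ExitUtil.terminate rather than an exception:

{code:java}
// Hypothetical sketch of the proposed guard, not the actual patch:
// fail fast when edit log tailing repeatedly misses a JN quorum.
class TailerQuorumSketch {
  private final int journalCount;
  private int consecutiveFailures = 0;
  private static final int MAX_CONSECUTIVE_FAILURES = 3;

  TailerQuorumSketch(int journalCount) {
    this.journalCount = journalCount;
  }

  void onTailResult(int respondingJournals) {
    int quorum = journalCount / 2 + 1; // majority quorum
    if (respondingJournals >= quorum) {
      consecutiveFailures = 0;
      return;
    }
    if (++consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
      // In Hadoop this would be ExitUtil.terminate(...) so the process
      // exits cleanly and a supervisor can restart it.
      throw new IllegalStateException("Could not reach a quorum of JournalNodes ("
          + respondingJournals + "/" + journalCount + "); terminating SNN");
    }
  }
}
{code}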
[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780003#comment-17780003 ] Srinivasu Majeti commented on HDFS-15273: - CCing [~weichiu] to review this and approve.
[jira] [Commented] (HDFS-17240) Fix a typo in DataStorage.java
[ https://issues.apache.org/jira/browse/HDFS-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780002#comment-17780002 ] ASF GitHub Bot commented on HDFS-17240: --- yuw1 opened a new pull request, #6226: URL: https://github.com/apache/hadoop/pull/6226 ### Description of PR Fix a typo in DataStorage.java /** - * Analize which and whether a transition of the fs state is required + * Analyze which and whether a transition of the fs state is required * and perform it if necessary. * ### How was this patch tested? This patch only modifies comments and does not require adding new test cases.
[jira] [Updated] (HDFS-17240) Fix a typo in DataStorage.java
[ https://issues.apache.org/jira/browse/HDFS-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17240: -- Labels: pull-request-available (was: )
[jira] [Commented] (HDFS-16950) Gap in edits after -initializeSharedEdits
[ https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780001#comment-17780001 ] Wei-Chiu Chuang commented on HDFS-16950: Karthik said the missing edit logs caused data loss, and it's reproducible. A workaround would be to put the NN in safe mode and take a checkpoint before proceeding with the migration. > Gap in edits after -initializeSharedEdits > - > > Key: HDFS-16950 > URL: https://issues.apache.org/jira/browse/HDFS-16950 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node, namenode >Reporter: Karthik Palanisamy >Priority: Critical > > The Namenode failed in the production cluster when the JN role was migrated. > {code:java} > ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start > namenode. > java.io.IOException: There appears to be a gap in the edit log. We expected > txid xx, but got txid xx. {code} > InitializeSharedEdits was issued as part of the role migration step. Note, no > checkpoint had been performed in the past few hours. > InitializeSharedEdits created a new log segment from the edits_inprogress > transaction and deleted all old transactions. > My ask here is to delete only edit transactions older than the fsimage > transaction. But currently, it deletes all transactions and no check is > enforced in JNStorage#format().
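The workaround described above maps to standard HDFS admin commands; saveNamespace requires safe mode, which is why both steps appear. A hedged sketch of the sequence, to be run against the active NameNode before migrating the JN role:

{noformat}
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace    # writes a fresh fsimage at the latest txid
hdfs dfsadmin -safemode leave
# ...then proceed with the JN migration / -initializeSharedEdits
{noformat}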
[jira] [Created] (HDFS-17240) Fix a typo in DataStorage.java
Yu Wang created HDFS-17240: -- Summary: Fix a typo in DataStorage.java Key: HDFS-17240 URL: https://issues.apache.org/jira/browse/HDFS-17240 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Yu Wang Fix a typo in DataStorage.java {code:java} /** - * Analize which and whether a transition of the fs state is required + * Analyze which and whether a transition of the fs state is required * and perform it if necessary. * {code}
[jira] [Updated] (HDFS-16950) Gap in edits after -initializeSharedEdits
[ https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-16950: --- Priority: Critical (was: Major)
[jira] [Updated] (HDFS-16950) Gap in edits after -initializeSharedEdits
[ https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-16950: --- Issue Type: Bug (was: Improvement) > Gap in edits after -initializeSharedEdits > - > > Key: HDFS-16950 > URL: https://issues.apache.org/jira/browse/HDFS-16950 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node, namenode >Reporter: Karthik Palanisamy >Priority: Critical > > Namenode failed in the production cluster when the JN role was migrated. > {code:java} > ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start > namenode. > java.io.IOException: There appears to be a gap in the edit log. We expected > txid xx, but got txid xx. {code} > InitializeSharedEdits was issued as part of the role migration step. Note that no > checkpoint had been performed in the past few hours. > InitializeSharedEdits created a new log segment from the edit_inprogress > transaction and deleted all old transactions. > My ask here is to delete only edit transactions older than the fsimage > transaction. But currently, it deletes all transactions, and no check is > enforced in JNStorage#format(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779965#comment-17779965 ] ASF GitHub Bot commented on HDFS-17239: --- hadoop-yetus commented on PR #6225: URL: https://github.com/apache/hadoop/pull/6225#issuecomment-1781406625 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 47m 22s | | trunk passed | | +1 :green_heart: | compile | 1m 22s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 15s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 25s | | trunk passed | | +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 3m 22s | | trunk passed | | +1 :green_heart: | shadedclient | 36m 12s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 16s | | the patch passed | | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 14s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 1m 14s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 1m 3s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 52 unchanged - 0 fixed = 54 total (was 52) | | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed | | +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | -1 :x: | spotbugs | 3m 23s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html) | hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | shadedclient | 37m 25s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 227m 3s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. | | | | 374m 29s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Possible null pointer dereference of info in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock, FsVolumeImpl) Dereferenced at FsDatasetImpl.java:info in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock, FsVolumeImpl) Dereferenced at FsDatasetImpl.java:[line 2466] | | | Possible null pointer derefere
[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779963#comment-17779963 ] ASF GitHub Bot commented on HDFS-17239: --- hadoop-yetus commented on PR #6225: URL: https://github.com/apache/hadoop/pull/6225#issuecomment-1781403809 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 45m 26s | | trunk passed | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 11s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 34s | | trunk passed | | +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 3m 29s | | trunk passed | | +1 :green_heart: | shadedclient | 36m 10s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 13s | | the patch passed | | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 9s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 1m 9s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 1m 3s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 52 unchanged - 0 fixed = 54 total (was 52) | | +1 :green_heart: | mvnsite | 1m 12s | | the patch passed | | +1 :green_heart: | javadoc | 0m 56s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | -1 :x: | spotbugs | 3m 26s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html) | hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | shadedclient | 35m 9s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 232m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6225/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. | | | | 375m 35s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Possible null pointer dereference of info in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock, FsVolumeImpl) Dereferenced at FsDatasetImpl.java:info in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeReplicaFromMem(ExtendedBlock, FsVolumeImpl) Dereferenced at FsDatasetImpl.java:[line 2466] | | | Possible null pointer derefere
[jira] [Commented] (HDFS-17232) RBF: Fix NoNamenodesAvailableException for a long time, when use observer
[ https://issues.apache.org/jira/browse/HDFS-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779946#comment-17779946 ] ASF GitHub Bot commented on HDFS-17232: --- KeeProMise commented on code in PR #6208: URL: https://github.com/apache/hadoop/pull/6208#discussion_r1373336587 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestNoNamenodesAvailableLongTime.java: ## @@ -0,0 +1,432 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hdfs.server.federation.router; + +import static org.apache.hadoop.fs.permission.AclEntryType.USER; +import static org.apache.hadoop.fs.permission.FsAction.ALL; +import static org.apache.hadoop.fs.permission.AclEntryScope.DEFAULT; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.permission.AclEntry; +import org.apache.hadoop.fs.permission.FsPermission; +import org.apache.hadoop.hdfs.DFSConfigKeys; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster.RouterContext; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.apache.hadoop.hdfs.server.federation.StateStoreDFSCluster; +import org.apache.hadoop.hdfs.server.federation.metrics.FederationRPCMetrics; +import org.apache.hadoop.hdfs.server.federation.resolver.FederationNamenodeContext; +import org.apache.hadoop.hdfs.server.federation.resolver.FederationNamenodeServiceState; +import org.apache.hadoop.hdfs.server.namenode.NameNode; +import org.apache.hadoop.ipc.RemoteException; +import org.apache.hadoop.util.Lists; +import org.junit.After; +import org.junit.Test; + + +import java.io.IOException; +import java.util.Collection; +import java.util.List; + +import static org.apache.hadoop.ha.HAServiceProtocol.HAServiceState.ACTIVE; +import static org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster.DEFAULT_HEARTBEAT_INTERVAL_MS; +import static org.apache.hadoop.hdfs.server.namenode.AclTestHelpers.aclEntry; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotEquals; +import static org.junit.Assert.assertTrue; + +/** + * When failover occurs, the router may record that the ns has no active namenode + * even if there is actually an active namenode. 
+ * Only when the router updates the cache next time can the memory status be updated, + * causing the router to report NoNamenodesAvailableException for a long time, + * + * @see org.apache.hadoop.hdfs.server.federation.router.NoNamenodesAvailableException + */ +public class TestNoNamenodesAvailableLongTime { + + // router load cache interval 10s + private static final long CACHE_FLUSH_INTERVAL_MS = 1; + private StateStoreDFSCluster cluster; + private FileSystem fileSystem; + private RouterContext routerContext; + private FederationRPCMetrics rpcMetrics; + + @After + public void cleanup() throws IOException { +rpcMetrics = null; +routerContext = null; +if (fileSystem != null) { + fileSystem.close(); + fileSystem = null; +} +if (cluster != null) { + cluster.shutdown(); + cluster = null; +} + } + + /** + * Set up state store cluster. + * + * @param numNameservices number of name services + * @param numberOfObserver number of observer + * @param useObserver whether to use observer + */ + private void setupCluster(int numNameservices, int numberOfObserver, boolean useObserver) + throws Exception { +if (!useObserver) { + numberOfObserver = 0; +} +int numberOfNamenode = 2 + numberOfObserver; +cluster = new StateStoreDFSCluster(true, numNameservices, numberOfNamenode, +DEFAULT_HEARTBEAT_INTERVAL_MS, CACHE_FLUSH_INTERVAL_MS); +Configuration routerConf = new RouterConfigBuilder() +.stateStore() +.metrics() +.admin() +.rpc() +.heartbeat() +.build(); + +// Set router observer related configs +if (useObserver) { + routerConf.setB
[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17239: -- Labels: pull-request-available (was: ) > Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write > lock. > - > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > > In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779817#comment-17779817 ] ASF GitHub Bot commented on HDFS-17239: --- hfutatzhanghb opened a new pull request, #6225: URL: https://github.com/apache/hadoop/pull/6225 ### Description of PR In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside the BLOCK_POOl write lock. We should move them out of the write lock. > Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write > lock. > - > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
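The change the PR describes follows a common pattern: capture whatever the log message needs while holding the lock, release the lock, then log. A minimal self-contained sketch of that pattern with hypothetical names (not the actual FsDatasetImpl code); the null guard is the kind of check the SpotBugs findings in the CI runs above ask for:
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of "move logging out of the write lock":
// mutate state under the lock, log only after releasing it.
public class RemoveReplicaSketch {
  private final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
  private String replicaInfo = "replica-1"; // stands in for the in-memory replica

  public void removeReplicaFromMem(String blockId) {
    String removed;
    bpLock.writeLock().lock();
    try {
      removed = replicaInfo; // may be null if the replica was already removed
      replicaInfo = null;    // the actual state change guarded by the lock
    } finally {
      bpLock.writeLock().unlock();
    }
    // Logging is now outside the critical section; the null check avoids a
    // "possible null pointer dereference" once the log call no longer runs
    // under the same lock that guaranteed the replica existed.
    if (removed != null) {
      System.out.println("Removed " + removed + " for block " + blockId);
    } else {
      System.out.println("No in-memory replica found for block " + blockId);
    }
  }
}
{code}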
[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779796#comment-17779796 ] farmmamba commented on HDFS-17239: -- [~haiyang Hu] Sir, what's your opinion? Thanks. > Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write > lock. > - > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] farmmamba updated HDFS-17239: - Description: In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside the BLOCK_POOl write lock. We should move them out of the write lock. (was: In method FsDatasetImpl#invalidate, there are some logging statements inside the BLOCK_POOl write lock. We should move them out of the write lock.) > Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write > lock. > - > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In method FsDatasetImpl#removeReplicaFromMem, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] farmmamba updated HDFS-17239: - Summary: Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock. (was: Remove logging of method invalidate which is in BLOCK_POOl write lock. ) > Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write > lock. > - > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In method FsDatasetImpl#invalidate, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779789#comment-17779789 ] Jim Halfpenny commented on HDFS-17238: -- In your example you've set dfs.blocksize to 1.25 TiB, a value well outside the bounds of what would be considered typical or useful. This isn't a bug, it is a configuration error. Do any of your data volumes have 1.25 TiB available? If not, then BlockManager.chooseTarget4NewBlock() is working as designed since it cannot find a data node with sufficient capacity for the new block. > Setting the value of "dfs.blocksize" too large will cause HDFS to be unable > to write to files > - > > Key: HDFS-17238 > URL: https://issues.apache.org/jira/browse/HDFS-17238 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.6 >Reporter: ECFuzz >Priority: Major > > My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. > core-site.xml like below. > {code:java} > > > fs.defaultFS > hdfs://localhost:9000 > > > hadoop.tmp.dir > /home/hadoop/Mutil_Component/tmp > > > {code} > hdfs-site.xml like below. > {code:java} > > > dfs.replication > 1 > > > dfs.blocksize > 134217728 > > > {code} > And then format the namenode, and start the hdfs. HDFS is running normally. > {code:java} > hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ > bin/hdfs namenode -format > x(many info) > hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ > sbin/start-dfs.sh > Starting namenodes on [localhost] > Starting datanodes > Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code} > Finally, use dfs to place a file. > {code:java} > bin/hdfs dfs -mkdir -p /user/hadoop > bin/hdfs dfs -mkdir input > bin/hdfs dfs -put etc/hadoop/*.xml input {code} > Discovering Exception Throwing. > {code:java} > 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to > 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 > node(s) are excluded in this operation. 
> at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) > at org.apache.hadoop.ipc.Client.call(Client.java:1513) > at org.apache.hadoop.ipc.Client.call(Client.java:1410) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) > at com.sun.proxy.$Proxy9.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.
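Following Jim Halfpenny's point above, a quick way to test the capacity premise on a live cluster is to compare each datanode's remaining space with the block size a client will request. Both commands below are stock HDFS tooling:
{code:java}
# Per-datanode Configured Capacity / DFS Remaining
hdfs dfsadmin -report
# The effective client-side block size, in bytes
hdfs getconf -confKey dfs.blocksize
{code}
If no datanode's DFS Remaining can hold one full block, chooseTarget4NewBlock() will exclude every node, which matches the "0 of the 1 minReplication nodes" error in the report.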
[jira] [Commented] (HDFS-17239) Remove logging of method invalidate which is in BLOCK_POOl write lock.
[ https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779785#comment-17779785 ] farmmamba commented on HDFS-17239: -- [~hexiaoqiao] [~zhangshuyan] [~tomscut] [~qinyuren] Sir, could you please take a look at this when you have free time? > Remove logging of method invalidate which is in BLOCK_POOl write lock. > --- > > Key: HDFS-17239 > URL: https://issues.apache.org/jira/browse/HDFS-17239 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > > In method FsDatasetImpl#invalidate, there are some logging statements inside > the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17239) Remove logging of method invalidate which is in BLOCK_POOl write lock.
farmmamba created HDFS-17239: Summary: Remove logging of method invalidate which is in BLOCK_POOl write lock. Key: HDFS-17239 URL: https://issues.apache.org/jira/browse/HDFS-17239 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.4.0 Reporter: farmmamba Assignee: farmmamba In method FsDatasetImpl#invalidate, there are some logging statements inside the BLOCK_POOl write lock. We should move them out of the write lock. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17238: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} And then format the namenode, and start the hdfs. HDFS is running normally. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs namenode -format x(many info) hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ sbin/start-dfs.sh Starting namenodes on [localhost] Starting datanodes Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code} Finally, use dfs to place a file. {code:java} bin/hdfs dfs -mkdir -p /user/hadoop bin/hdfs dfs -mkdir input bin/hdfs dfs -put etc/hadoop/*.xml input {code} Discovering Exception Throwing. {code:java} 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at org.apache.hadoop.ipc.Client.call(Client.java:1513) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088) at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717) at org.apache.hadoop.
[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17238: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} And then format the namenode, and start the hdfs. HDFS is running normally. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs namenode -format x(many info) hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ sbin/start-dfs.sh Starting namenodes on [localhost] Starting datanodes Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code} Finally, use dfs to place a file. {code:java} bin/hdfs dfs -mkdir -p /user/hadoop bin/hdfs dfs -mkdir input bin/hdfs dfs -put etc/hadoop/*.xml input {code} Discovering Exception Throwing. {code:java} 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at org.apache.hadoop.ipc.Client.call(Client.java:1513) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088) at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717) at org.apache.hadoop.
[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17238: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} And then format the namenode, and start the hdfs. HDFS is running normally. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs namenode -format x(many info) hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ sbin/start-dfs.sh Starting namenodes on [localhost] Starting datanodes Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code} Finally, use dfs to place a file. {code:java} bin/hdfs dfs -mkdir -p /user/hadoop bin/hdfs dfs -mkdir input bin/hdfs dfs -put etc/hadoop/*.xml input {code} Discovering Exception Throwing. {code:java} 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at org.apache.hadoop.ipc.Client.call(Client.java:1513) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088) at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717) at org.apache.hadoop.
[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17238: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} Then format the name node and start hdfs. was: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} > Setting the value of "dfs.blocksize" too large will cause HDFS to be unable > to write to files > - > > Key: HDFS-17238 > URL: https://issues.apache.org/jira/browse/HDFS-17238 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.6 >Reporter: ECFuzz >Priority: Major > > My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. > core-site.xml like below. > {code:java} > > > fs.defaultFS > hdfs://localhost:9000 > > > hadoop.tmp.dir > /home/hadoop/Mutil_Component/tmp > > > {code} > hdfs-site.xml like below. > {code:java} > > > dfs.replication > 1 > > > dfs.blocksize > 134217728 > > > {code} > Then format the name node and start hdfs. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17069) The documentation and implementation of "dfs.blocksize" are inconsistent.
[ https://issues.apache.org/jira/browse/HDFS-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17069: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 128k {code} Then format the namenode, and start hdfs. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs namenode -format x(many info) hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ sbin/start-dfs.sh Starting namenodes on [localhost] Starting datanodes Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996]{code} Finally, use dfs to put a file. Then I get the following message, which says that 128k is less than 1M. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs dfs -mkdir -p /user/hadoop hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs dfs -mkdir input hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hdfs dfs -put etc/hadoop/hdfs-site.xml input put: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 131072 < 1048576 {code} But I find that in the documentation, dfs.blocksize can be set to 128k and other values in hdfs-default.xml. {code:java} The default block size for new files, in bytes. You can use the following suffix (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), Or provide complete size in bytes (such as 134217728 for 128 MB).{code} So, is there an issue with the documentation here? Or should users be warned that this configuration must be set to a value larger than 1M? In addition, I start YARN and run the given mapreduce job. {code:java} hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ sbin/start-yarn.sh hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep input output 'dfs[a-z.]+'{code} And the shell throws some exceptions, as shown below. {code:java} 2023-07-12 15:12:29,964 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032 2023-07-12 15:12:30,430 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001 2023-07-12 15:12:30,542 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1689145947338_0001 org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 131072 < 1048576 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2690) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2625) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:807) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:496) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at org.apache.hadoop.ipc.Client.call(Client.java:1513) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at com.sun.proxy.$Proxy9.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:383) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
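The 131072 < 1048576 rejection above comes from dfs.namenode.fs-limits.min-block-size, whose default is 1048576 bytes (1 MiB). A sketch of an hdfs-site.xml fragment that would let the 128k value in this report pass validation, suitable for test setups only:
{code:java}
<!-- Test-only sketch: lower the NameNode's minimum accepted block size
     so that dfs.blocksize=128k (131072 bytes) passes the fs-limits check. -->
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>131072</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>128k</value>
</property>
{code}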
[jira] [Updated] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
[ https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ECFuzz updated HDFS-17238: -- Description: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} was: My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} > Setting the value of "dfs.blocksize" too large will cause HDFS to be unable > to write to files > - > > Key: HDFS-17238 > URL: https://issues.apache.org/jira/browse/HDFS-17238 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.6 >Reporter: ECFuzz >Priority: Major > > My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. > core-site.xml like below. > {code:java} > > > fs.defaultFS > hdfs://localhost:9000 > > > hadoop.tmp.dir > /home/hadoop/Mutil_Component/tmp > > > {code} > hdfs-site.xml like below. > {code:java} > > > dfs.replication > 1 > > > dfs.blocksize > 134217728 > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files
ECFuzz created HDFS-17238: - Summary: Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files Key: HDFS-17238 URL: https://issues.apache.org/jira/browse/HDFS-17238 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.3.6 Reporter: ECFuzz My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation. core-site.xml like below. {code:java} fs.defaultFS hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/Mutil_Component/tmp {code} hdfs-site.xml like below. {code:java} dfs.replication 1 dfs.blocksize 134217728 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
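For reference, the dfs.blocksize value quoted in this description, 134217728 bytes, works out to 128 × 1024 × 1024 = 128 MiB, which is the stock default; with the documented suffixes the same value can be written as 128m. As the comment thread above notes, whatever value is chosen must still fit within the free space of a single datanode volume, or block placement will fail.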