[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host
[ https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution [~svaughan]

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node
> Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
> Reporter: Steve Vaughan
> Assignee: Steve Vaughan
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a
> bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude
> the local instance, but it doesn't recognize the meta-address as a local
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log
> attempts to sync with itself as part of the normal syncing rotation. For an
> HA configuration running 3 JournalNodes, the "other" list used by the
> JournalNodeSyncer will include 3 proxies.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
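For readers of HDFS-16684, the following is a minimal sketch of the kind of check the report describes. It is not the merged patch. The class name LocalAddressFilter, the method names, and the example addresses are hypothetical. The sketch assumes that "self" means a configured peer on the same port whose address belongs to this host, and it treats the 0.0.0.0 meta-address as "any local interface" rather than comparing it literally.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not the actual JournalNodeSyncer code. */
final class LocalAddressFilter {

  /** True if the address refers to this host, including the wildcard (0.0.0.0 or ::). */
  static boolean isLocal(InetSocketAddress addr) {
    InetAddress ip = addr.getAddress();
    if (ip == null) {
      return false; // unresolved host name: cannot show that it is local
    }
    if (ip.isAnyLocalAddress() || ip.isLoopbackAddress()) {
      return true;
    }
    try {
      // Non-null when some local network interface owns this IP.
      return NetworkInterface.getByInetAddress(ip) != null;
    } catch (SocketException e) {
      return false;
    }
  }

  /** Returns the configured peers minus the entry that points back at this process. */
  static List<InetSocketAddress> excludeSelf(List<InetSocketAddress> configured,
                                             InetSocketAddress bound) {
    InetAddress boundIp = bound.getAddress();
    boolean wildcard = boundIp != null && boundIp.isAnyLocalAddress();
    List<InetSocketAddress> others = new ArrayList<>();
    for (InetSocketAddress candidate : configured) {
      boolean samePort = candidate.getPort() == bound.getPort();
      // With a wildcard bind, a literal equals() against 0.0.0.0 never matches,
      // so fall back to asking whether the candidate is any local address.
      boolean self = samePort && (wildcard ? isLocal(candidate) : candidate.equals(bound));
      if (!self) {
        others.add(candidate);
      }
    }
    return others;
  }

  public static void main(String[] args) {
    // Example addresses are made up; 192.0.2.0/24 is reserved for documentation.
    List<InetSocketAddress> configured = Arrays.asList(
        new InetSocketAddress("127.0.0.1", 8485),
        new InetSocketAddress("192.0.2.10", 8485),
        new InetSocketAddress("192.0.2.11", 8485));
    InetSocketAddress bound = new InetSocketAddress("0.0.0.0", 8485);
    System.out.println("peers to sync with: " + excludeSelf(configured, bound).size());
  }
}
```

Comparing ports as well as addresses keeps several JournalNodes on one host (for example in a test cluster) distinguishable from each other.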
[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
[ https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
               3.2.4
               3.3.4
Hadoop Flags: Reviewed
Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review [~hexiaoqiao]

> Purge FsDatasetAsyncDiskService threadgroup; it causes
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal
> exception and exit'
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.3.0, 3.2.3
> Reporter: Michael Stack
> Assignee: Michael Stack
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO [Command processor]
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica
> FinalizedReplica, blk_1073741840_1016, FINALIZED
> getNumBytes() = 52
> getBytesOnDisk() = 52
> getVisibleLength()= 52
> getVolume() =
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
> getBlockURI() =
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
> for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774]
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774]
> top.TopAuditLogger(78): --- logged event for top service:
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
> dst=null perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder:
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006,
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645):
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006,
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder:
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006,
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327):
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006,
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor]
> datanode.BPServiceActor$CommandProcessingThread(1276): Command processor
> encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
> at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
> at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
> at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
> at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
> ~[hadoop-hdfs-3.2.3.jar:?]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623)
> ~[?:?]
> at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912)
> ~[?:?]
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
> ~[?:?]
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
> ~[hadoop-hdfs-3.2.3.jar:?]
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
> ~[hadoop-hdfs-3.2.3.jar:?]
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
> ~[hadoop-hdfs-3.2.3.jar:?]
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
> ~[hadoop-hdfs-3.2.3.jar:?]
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
Michael Stack created HDFS-16586:

Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
Key: HDFS-16586
URL: https://issues.apache.org/jira/browse/HDFS-16586
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack

The below failed block finalize is causing a downstreamer's test to fail when it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO [Command processor] impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes() = 52
  getBytesOnDisk() = 52
  getVisibleLength()= 52
  getVolume() = /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI() = file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 (auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] top.TopAuditLogger(78): --- logged event for top service: allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp dst=null perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274) ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client
[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.2
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When the
> pod restarts, we found that data locality is lost after we do a major
> compaction of HBase regions. After some debugging, we found that when the pod
> restarts, its IP changes. In DatanodeManager, maps like networktopology are
> updated with the new info, but host2DatanodeMap is not updated accordingly.
> When an HDFS client with the new IP tries to find a local DataNode, it fails.
>
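To illustrate the failure mode described in HDFS-16540, here is a small sketch of an IP-keyed lookup that has to be refreshed when a node re-registers under a new address. It is a simplified model, not the DatanodeManager code or the merged fix. The class DatanodeRegistry, its methods, and the identifiers "dn-1", "10.0.0.5" and "10.0.0.9" are all hypothetical; the ipToUuid map only stands in for the role host2DatanodeMap plays in the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified registry for illustration; not the real DatanodeManager. */
final class DatanodeRegistry {
  // Stable datanode identifier -> IP it last registered with.
  private final Map<String, String> uuidToIp = new HashMap<>();
  // IP -> datanode identifier; stands in for host2DatanodeMap in the report.
  private final Map<String, String> ipToUuid = new HashMap<>();

  /** Registers or re-registers a datanode and replaces any stale IP mapping. */
  void register(String datanodeUuid, String ip) {
    String oldIp = uuidToIp.put(datanodeUuid, ip);
    if (oldIp != null && !oldIp.equals(ip)) {
      // Remove the old entry only if it still points at this datanode.
      ipToUuid.remove(oldIp, datanodeUuid);
    }
    ipToUuid.put(ip, datanodeUuid);
  }

  /** Returns the datanode co-located with a client at this IP, or null if none. */
  String localDatanodeFor(String clientIp) {
    return ipToUuid.get(clientIp);
  }

  public static void main(String[] args) {
    DatanodeRegistry registry = new DatanodeRegistry();
    registry.register("dn-1", "10.0.0.5"); // first start of the pod
    registry.register("dn-1", "10.0.0.9"); // pod restarted with a new IP
    System.out.println("local datanode for new IP: " + registry.localDatanodeFor("10.0.0.9"));
    System.out.println("local datanode for old IP: " + registry.localDatanodeFor("10.0.0.5"));
  }
}
```

If register() skipped the ipToUuid update on re-registration, the lookup for the new IP would return nothing, which corresponds to the lost locality in the report.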