[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-08-28 Thread Michael Stack (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution, 
[~svaughan].

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a 
> bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude 
> the local instance, but it does not recognize the meta-address (the 0.0.0.0 
> wildcard) as a local address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer logs 
> attempts to sync with itself as part of the normal syncing rotation.  For an 
> HA configuration running 3 JournalNodes, the "other" list used by the 
> JournalNodeSyncer therefore holds 3 proxies rather than the expected 2.
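To make the failure concrete, here is a minimal, hypothetical sketch (illustrative names, not the actual HDFS patch) of the kind of local-address check the syncer needs: a plain equality test against the bind address can never match a concrete peer address when the bind host is the 0.0.0.0 wildcard, so the candidate address must also be checked against the machine's real interfaces.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class JournalNodeSelfCheckSketch {
  /**
   * Returns true if the candidate JournalNode address refers to this host.
   * Comparing against the configured bind address only works for a concrete
   * bind host; with a 0.0.0.0 wildcard, no peer address ever equals it, so
   * we additionally ask the OS whether the address is one of ours.
   */
  static boolean isLocalJournalNode(InetSocketAddress candidate,
      InetSocketAddress bindAddress) throws SocketException {
    InetAddress addr = candidate.getAddress();
    if (addr == null) {
      return false; // unresolved hostname: treat as remote
    }
    if (addr.equals(bindAddress.getAddress())) {
      return true; // matches only when a concrete bind host is configured
    }
    // Wildcard-safe check: is the address bound to a local interface?
    return addr.isAnyLocalAddress()
        || addr.isLoopbackAddress()
        || NetworkInterface.getByInetAddress(addr) != null;
  }
}
{code}

With a wildcard-aware check like this, the "other" list for a 3-node quorum holds the expected 2 proxies regardless of the bind host.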






[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-25 Thread Michael Stack (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
               3.2.4
               3.3.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to trunk, branch-3.3, and branch-3.2. Thank you for the review, 
[~hexiaoqiao].

> Purge FsDatasetAsyncDiskService threadgroup; it causes 
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
> exception and exit' 
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0, 3.2.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when 
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] 
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
> FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = 
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = 
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
>  for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> top.TopAuditLogger(78): --- logged event for top service: 
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
>   dst=null  perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS 
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>   at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
>   at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
>   at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?]
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?]
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?]
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?]
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?]
>   at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) 
> {code}
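The trace shows FsDatasetAsyncDiskService's ThreadFactory constructing threads inside a shared ThreadGroup. Below is a minimal standalone repro of that failure mode, under the assumption of a daemon thread group on a pre-19 JDK, where daemon groups are auto-destroyed once their last thread exits (illustrative, not the HDFS code itself):

{code:java}
public class ThreadGroupRepro {
  public static void main(String[] args) throws InterruptedException {
    ThreadGroup group = new ThreadGroup("asyncDiskServiceGroup");
    // On JDKs before 19, a daemon group is destroyed when its last thread dies.
    group.setDaemon(true);

    Thread first = new Thread(group, () -> { });
    first.start();
    first.join(); // last live thread exits; the group is now destroyed

    // What a pooled ThreadFactory does on the next task submission:
    // constructing a Thread in a destroyed group throws from
    // ThreadGroup.addUnstarted, matching the trace above.
    new Thread(group, () -> { }); // java.lang.IllegalThreadStateException
  }
}
{code}

In the DataNode the executor's idle worker threads can time out and die, leaving the group empty, so the next delete command submitted by the command processor trips exactly this constructor path.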

[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-20 Thread Michael Stack (Jira)
Michael Stack created HDFS-16586:


 Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes 
BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
exception and exit' 
 Key: HDFS-16586
 URL: https://issues.apache.org/jira/browse/HDFS-16586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The below failed block finalize is causing a downstreamer's test to fail when 
it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] 
impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = 
/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = 
file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
(auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
top.TopAuditLogger(78): --- logged event for top service: 
allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
  dst=null  perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1645): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, 
replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1327): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: 
seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
  at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291) ~[hadoop-hdfs-3.2.3.jar:?]
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274) ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client 
{code}
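The remedy named in the title is to purge the ThreadGroup. A hedged sketch of that direction (illustrative names, not the actual patch): a ThreadFactory that names and daemonizes its threads without ever placing them in a shared group, so no group lifecycle can poison later submissions.

{code:java}
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class GrouplessThreadFactory implements ThreadFactory {
  private final AtomicInteger count = new AtomicInteger();
  private final String prefix;

  public GrouplessThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // No ThreadGroup argument: the thread simply joins the caller's group,
    // so there is no shared group that can be destroyed between submissions.
    Thread t = new Thread(r, prefix + "-" + count.getAndIncrement());
    t.setDaemon(true);
    return t;
  }
}
{code}

Handing a factory like this to the ThreadPoolExecutor keeps the descriptive thread names and daemon status while removing the destroyed-group failure mode entirely.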

[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, data locality is lost after we do a major compaction of 
> HBase regions. Debugging showed that when the pod restarts, its IP changes: 
> in DatanodeManager, maps like the NetworkTopology are updated with the new 
> address, but host2DatanodeMap is not updated accordingly, so when an HDFS 
> client at the new IP tries to find a local DataNode, the lookup fails.
> 
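A simplified sketch of the described inconsistency (hypothetical names, not the DatanodeManager code): if re-registration after an IP change refreshes the topology but not the host-keyed map, locality lookups by the new IP miss.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupSketch {
  // Stands in for host2DatanodeMap: ip -> datanode id.
  static final Map<String, String> hostToDatanode = new HashMap<>();

  static void register(String ip, String datanodeId) {
    hostToDatanode.put(ip, datanodeId);
  }

  // Buggy re-registration after a pod restart: the topology is refreshed
  // elsewhere, but this map keeps the stale entry and never learns newIp.
  static void reRegisterBuggy(String oldIp, String newIp, String dnId) {
    // missing: hostToDatanode.remove(oldIp); hostToDatanode.put(newIp, dnId);
  }

  public static void main(String[] args) {
    register("10.0.0.5", "dn-1");
    reRegisterBuggy("10.0.0.5", "10.0.0.9", "dn-1");
    // A client on the restarted pod looks itself up by its new ip:
    System.out.println(hostToDatanode.get("10.0.0.9")); // null -> no local read
  }
}
{code}

The fix direction is the commented-out pair: keep the host-keyed map in lockstep with every re-registration.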


