[jira] [Created] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException

2023-03-16 Thread Max Xie (Jira)
Max  Xie created HDFS-16954:
---

 Summary: RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
 Key: HDFS-16954
 URL: https://issues.apache.org/jira/browse/HDFS-16954
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie


Renaming a directory that spans multiple subclusters to a directory mounted on a single subcluster may leave the file system in an inconsistent state. To be safe, this operation should throw an IOException.

Examples are as follows:
1. Add a HASH_ALL mount point: `hdfs dfsrouteradmin -add /tmp/foo subcluster1,subcluster2 /tmp/foo -order HASH_ALL`
2. Add a single-subcluster mount point: `hdfs dfsrouteradmin -add /user/foo subcluster1 /user/foo`
3. Create a directory under the HASH_ALL mount point: `hdfs dfs -mkdir /tmp/foo/123`

4. Check the directory; all subclusters now contain `/tmp/foo/123`:
`hdfs dfs -ls /tmp/foo/` shows `/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` shows `hdfs://subcluster1/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` shows `hdfs://subcluster2/tmp/foo/123`.

5. Rename `/tmp/foo/123` to `/user/foo/123` with `hdfs dfs -mv /tmp/foo/123 /user/foo/123`. The operation succeeds.

6. Check the directory again; the RBF cluster still shows `/tmp/foo/123`, but the subclusters are now inconsistent:
`hdfs dfs -ls /tmp/foo/` shows `/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` shows no directories;
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` shows `hdfs://subcluster2/tmp/foo/123`.

Step 5 should throw an exception instead of silently succeeding; a sketch of such a check is below.
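As an illustration only (the real patch may put the check elsewhere, for example in RouterClientProtocol's rename path, and use different helpers), a minimal guard could compare the subclusters covered by the source and destination locations and refuse the rename when they do not match:

```
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.hadoop.hdfs.server.federation.resolver.RemoteLocation;

/** Illustrative guard, not the actual Hadoop change. */
final class RenameAcrossSubclustersCheck {

  /**
   * Reject a rename whose source spans subclusters that the destination
   * mount point does not cover; moving only the matching location would
   * leave orphaned copies behind (as in step 6 above).
   */
  static void checkRename(List<RemoteLocation> srcLocations,
      List<RemoteLocation> dstLocations) throws IOException {
    Set<String> srcNs = srcLocations.stream()
        .map(RemoteLocation::getNameserviceId)
        .collect(Collectors.toSet());
    Set<String> dstNs = dstLocations.stream()
        .map(RemoteLocation::getNameserviceId)
        .collect(Collectors.toSet());
    if (!dstNs.containsAll(srcNs)) {
      throw new IOException("Rename is not allowed: source locations span "
          + srcNs + " but destination only covers " + dstNs);
    }
  }
}
```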
 

 

 

 






[jira] [Created] (HDFS-16946) RBF: top real owners metrics can't be parsed to a JSON string

2023-03-09 Thread Max Xie (Jira)
Max  Xie created HDFS-16946:
---

 Summary: RBF: top real owners metrics can't be parsed to a JSON string
 Key: HDFS-16946
 URL: https://issues.apache.org/jira/browse/HDFS-16946
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie
 Attachments: image-2023-03-09-22-24-39-833.png

HDFS-15447 added top real owners metrics for delegation tokens, but these metrics cannot be parsed as a JSON string.

The RBFMetrics$getTopTokenRealOwners method just returns the default Object toString of each entry, e.g. `org.apache.hadoop.metrics2.util.Metrics2Util$NameValuePair@1`.

!image-2023-03-09-22-24-39-833.png!
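A minimal sketch of the kind of fix this suggests, serializing each name/value pair explicitly instead of relying on toString; the NameValuePair accessors used here are assumed and the real patch may serialize differently:

```
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.metrics2.util.Metrics2Util.NameValuePair;

/** Illustrative only; not the actual RBFMetrics change. */
final class TopRealOwnersJson {

  /** Render each pair as {"name": ..., "value": ...}. */
  static String toJson(List<NameValuePair> topOwners) throws Exception {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (NameValuePair pair : topOwners) {
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("name", pair.getName());   // assumed accessor
      row.put("value", pair.getValue()); // assumed accessor
      rows.add(row);
    }
    return new ObjectMapper().writeValueAsString(rows);
  }
}
```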






[jira] [Created] (HDFS-16945) RBF: add RouterSecurityAuditLogger for router security manager

2023-03-09 Thread Max Xie (Jira)
Max  Xie created HDFS-16945:
---

 Summary: RBF: add RouterSecurityAuditLogger for router security 
manager
 Key: HDFS-16945
 URL: https://issues.apache.org/jira/browse/HDFS-16945
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie


We should add an audit log for the Router security manager's token APIs. For example:

```
2023-03-02 20:53:02,712 INFO org.apache.hadoop.hdfs.server.federation.router.security.RouterSecurityManager.audit: allowed=true ugi=hadoop ip=localhost/127.0.0.1 cmd=getDelegationToken tokenId=HDFS_DELEGATION_TOKEN token 18359 for hadoop with renewer hadoop proto=webhdfs
```
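A minimal sketch of such a logger, assuming the logger name and field layout shown in the example above; the real implementation may format and plumb this differently:

```
import java.net.InetAddress;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch of a router security audit logger; names follow the example above. */
final class RouterSecurityAuditLogger {

  private static final Logger AUDIT = LoggerFactory.getLogger(
      "org.apache.hadoop.hdfs.server.federation.router.security."
          + "RouterSecurityManager.audit");

  static void logAuditEvent(boolean allowed, String ugi, InetAddress ip,
      String cmd, String tokenId, String proto) {
    AUDIT.info("allowed={} ugi={} ip={} cmd={} tokenId={} proto={}",
        allowed, ugi, ip, cmd, tokenId, proto);
  }
}
```

For the example line above this would be called roughly as `logAuditEvent(true, "hadoop", addr, "getDelegationToken", "HDFS_DELEGATION_TOKEN token 18359 for hadoop with renewer hadoop", "webhdfs")`.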






[jira] [Created] (HDFS-16861) RBF: Truncate API always fails when dirs use AllResolver order on Router

2022-12-05 Thread Max Xie (Jira)
Max  Xie created HDFS-16861:
---

 Summary: RBF: Truncate API always fails when dirs use AllResolver order on Router
 Key: HDFS-16861
 URL: https://issues.apache.org/jira/browse/HDFS-16861
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie
 Attachments: image-2022-12-05-17-35-19-841.png

1. Prepare a directory under a HASH_ALL/SPACE/RANDOM mount point.
2. Put a 1024-byte test file into this directory.
3. Truncate the file to a new length of 100 bytes; the operation fails with an exception saying the file does not exist.

 

After digging into it, we should not check for an expected result from the truncate API in RouterClientProtocol: truncate can legitimately return either true or false, so a single expected value should not be required. After the fix, the code looks like this (see also the sketch below the screenshot):

!image-2022-12-05-17-35-19-841.png!
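Since the screenshot may not render in the archive, here is a rough sketch of the idea, assuming the invokeSequential overload that takes an expected result value; the exact surrounding code is an assumption:

```
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hdfs.server.federation.resolver.RemoteLocation;
import org.apache.hadoop.hdfs.server.federation.router.RemoteMethod;
import org.apache.hadoop.hdfs.server.federation.router.RemoteParam;
import org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient;

/** Sketch of a truncate call that does not require an expected result. */
final class TruncateSketch {

  static boolean truncate(RouterRpcClient rpcClient,
      List<RemoteLocation> locations, long newLength, String clientName)
      throws IOException {
    RemoteMethod method = new RemoteMethod("truncate",
        new Class<?>[] {String.class, long.class, String.class},
        new RemoteParam(), newLength, clientName);
    // Pass null as the expected value: accept whichever boolean the
    // subcluster that actually owns the file returns, instead of falling
    // through to subclusters that do not have the file at all.
    Object result = rpcClient.invokeSequential(
        locations, method, Boolean.class, null);
    return (Boolean) result;
  }
}
```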

 






[jira] [Resolved] (HDFS-16677) Add OP_SWAP_BLOCK_LIST as an operation code in FSEditLogOpCodes.

2022-07-26 Thread Max Xie (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max  Xie resolved HDFS-16677.
-
Resolution: Duplicate

> Add OP_SWAP_BLOCK_LIST as an operation code in FSEditLogOpCodes.
> 
>
> Key: HDFS-16677
> URL: https://issues.apache.org/jira/browse/HDFS-16677
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding, hdfs
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Copied from HDFS-15006:
> In HDFS-14989, we added a new Namenode operation "swapBlockList" to replace the set of blocks in an INodeFile with a new set of blocks. This JIRA will track the effort to add an FSEditLog op to persist this operation.
> We also need to increase the NamenodeLayoutVersion for this change.






[jira] [Created] (HDFS-16677) Add OP_SWAP_BLOCK_LIST as an operation code in FSEditLogOpCodes.

2022-07-21 Thread Max Xie (Jira)
Max  Xie created HDFS-16677:
---

 Summary: Add OP_SWAP_BLOCK_LIST as an operation code in 
FSEditLogOpCodes.
 Key: HDFS-16677
 URL: https://issues.apache.org/jira/browse/HDFS-16677
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: erasure-coding, hdfs
Reporter: Max  Xie
Assignee: Max  Xie


Copied from HDFS-15006:

In HDFS-14989, we added a new Namenode operation "swapBlockList" to replace the set of blocks in an INodeFile with a new set of blocks. This JIRA will track the effort to add an FSEditLog op to persist this operation.

We also need to increase the NamenodeLayoutVersion for this change; a sketch of the new opcode is below.
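Purely as an illustration of where the opcode would live (the byte value and the surrounding mini-enum are placeholders; the real FSEditLogOpCodes enum would use its next free code):

```
/** Illustrative mini-enum; not the actual FSEditLogOpCodes source. */
enum EditLogOpCodeSketch {
  OP_ADD((byte) 0),
  // ... existing opcodes elided ...
  OP_SWAP_BLOCK_LIST((byte) 52); // byte value is a placeholder assumption

  private final byte opCode;

  EditLogOpCodeSketch(byte opCode) {
    this.opCode = opCode;
  }

  byte getOpCode() {
    return opCode;
  }
}
```

A matching edit-log op class to read/write the swapped block lists, plus the NamenodeLayoutVersion bump mentioned above, would accompany the new code.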






[jira] [Created] (HDFS-16655) OIV: print out erasure coding policy name in oiv Delimited output

2022-07-09 Thread Max Xie (Jira)
Max  Xie created HDFS-16655:
---

 Summary: OIV: print out erasure coding policy name in oiv 
Delimited output
 Key: HDFS-16655
 URL: https://issues.apache.org/jira/browse/HDFS-16655
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 3.4.0
Reporter: Max  Xie


Adding the erasure coding policy name to the oiv Delimited output will help with post-analysis: it gives an overview of all folders/files that use a given EC policy and lets internal policies be applied based on this information. In particular, it will be convenient for the platform to calculate the real storage size of EC files.
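As an illustration only (the existing Delimited columns are elided here, and the new column name is an assumption of this proposal), the output would simply gain one more column:

```
$ hdfs oiv -p Delimited -i fsimage_0000000000000000042 -o fsimage.txt
Path ... UserName GroupName ErasureCodingPolicy
/warehouse/part-00000 ... hadoop hadoop RS-6-3-1024k
/tmp/replicated.txt ... hadoop hadoop -
```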






[jira] [Created] (HDFS-16504) add `dfs.namenode.get-blocks.check.operation` to enable or disable checkOperation when NNs process getBlocks

2022-03-13 Thread Max Xie (Jira)
Max  Xie created HDFS-16504:
---

 Summary: add `dfs.namenode.get-blocks.check.operation` to enable 
or disable checkOperation when NNs process getBlocks
 Key: HDFS-16504
 URL: https://issues.apache.org/jira/browse/HDFS-16504
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover, namenode
Affects Versions: 3.4.0
Reporter: Max  Xie


HDFS-13183 added a nice feature that lets the Standby NameNode process getBlocks requests to reduce the Active NameNode's load. The NameNode must set `dfs.ha.allow.stale.reads = true` to enable this feature. However, if we set `dfs.ha.allow.stale.reads = true`, the Standby NameNode will be able to serve all read requests, which may cause YARN jobs to fail because the Standby NameNode's data is stale.

Maybe we should add a config `dfs.namenode.get-blocks.check.operation=false` so the NameNode can skip the operation check only when it processes getBlocks requests; a sketch follows.
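A minimal sketch of the proposed switch, assuming the key name from this proposal; the call site shown in the comment is a simplification of the real getBlocks path:

```
import org.apache.hadoop.conf.Configuration;

/** Sketch of the proposed flag; defaults to today's behaviour. */
final class GetBlocksCheckOperationSketch {

  static final String KEY = "dfs.namenode.get-blocks.check.operation";
  static final boolean DEFAULT = true;

  private final boolean checkOperation;

  GetBlocksCheckOperationSketch(Configuration conf) {
    this.checkOperation = conf.getBoolean(KEY, DEFAULT);
  }

  boolean shouldCheckOperation() {
    return checkOperation;
  }

  // In the getBlocks path (simplified):
  //   if (sketch.shouldCheckOperation()) {
  //     checkOperation(OperationCategory.READ); // rejects reads on a stale Standby
  //   }
  //   ... serve the Balancer's getBlocks request ...
}
```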






[jira] [Created] (HDFS-16459) RBF: register RBFMetrics in MetricsSystem for PrometheusSink

2022-02-18 Thread Max Xie (Jira)
Max  Xie created HDFS-16459:
---

 Summary: RBF: register RBFMetrics in MetricsSystem for PrometheusSink
 Key: HDFS-16459
 URL: https://issues.apache.org/jira/browse/HDFS-16459
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie
Assignee: Max  Xie


The Router's RBFMetrics is not registered in the MetricsSystem, so these metrics cannot be found through PrometheusSink. Maybe we should fix it; a sketch of the registration follows the sample output.

After the fix, some RBFMetrics will be exported like this:

```
# HELP rbf_metrics_current_tokens_count Number of router's current tokens
# TYPE rbf_metrics_current_tokens_count gauge
rbf_metrics_current_tokens_count{processname="Router",context="dfs",hostname="xxx"} 123
```
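A minimal sketch of the registration itself, assuming RBFMetrics can be handed to the default metrics system as a source; the record name and description strings are placeholders:

```
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

/** Sketch: register RBFMetrics so sinks such as PrometheusSink can see it. */
final class RbfMetricsRegistration {

  static <T> T registerRbfMetrics(T rbfMetrics) {
    // Wherever RBFMetrics is constructed in the Router, also register it
    // with the MetricsSystem instead of exposing it over JMX only.
    return DefaultMetricsSystem.instance()
        .register("RBFMetrics", "Router federation metrics", rbfMetrics);
  }
}
```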






[jira] [Created] (HDFS-16455) RBF: Router should explicitly specify the value of `jute.maxbuffer` in hadoop configuration files like core-site.xml

2022-02-11 Thread Max Xie (Jira)
Max  Xie created HDFS-16455:
---

 Summary: RBF: Router should explicitly specify the value of 
`jute.maxbuffer` in hadoop configuration files like core-site.xml
 Key: HDFS-16455
 URL: https://issues.apache.org/jira/browse/HDFS-16455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.3.0, 3.4.0
Reporter: Max  Xie


Based on the current design for delegation tokens in a secure Router, tokens are stored and updated in ZooKeeper through ZKDelegationTokenSecretManager.

But the default value of the system property `jute.maxbuffer` is just 4MB. If the Router stores too many tokens in ZooKeeper, it throws an IOException `Packet lenxx is out of range` and all Routers crash.

 

In our cluster, Routers crashed because of it. The crash logs are below:
{code}
2022-02-09 02:15:51,607 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal for identifier: (token for xxx: HDFS_DELEGATION_TOKEN owner=xxx/scheduler, renewer=hadoop, realUser=, issueDate=1644344146305, maxDate=1644948946305, sequenceNumber=27136070, masterKeyId=1107); total currentTokens 279548
2022-02-09 02:16:07,632 WARN org.apache.zookeeper.ClientCnxn: Session 0x1000172775a0012 for server zkurl:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4194553 is out of range!
        at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
2022-02-09 02:16:07,733 WARN org.apache.hadoop.ipc.Server: IPC Server handler 1254 on default port 9001, call Call#144 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getDelegationToken from ip:46534
java.lang.RuntimeException: Could not increment shared counter !!
        at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:582)
{code}

When we restart a Router, it crashes again:
{code}
2022-02-09 03:14:17,308 INFO org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager: Starting to load key cache.
2022-02-09 03:14:17,310 INFO org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager: Loaded key cache.
2022-02-09 03:14:32,930 WARN org.apache.zookeeper.ClientCnxn: Session 0x205584be35b0001 for server zkurl:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4194478 is out of range!
        at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

2022-02-09 03:14:33,030 ERROR org.apache.hadoop.hdfs.server.federation.router.security.token.ZKDelegationTokenSecretManagerImpl: Error starting threads for zkDelegationTokens
java.io.IOException: Could not start PathChildrenCache for tokens
{code}
Finally, we configured `-Djute.maxbuffer=1000` in hadoop-env.sh to fix this issue.

After digging into it, we found that the znode `/ZKDTSMRoot/ZKDTSMTokensRoot` had more than 25 child znodes, whose total data size was over 4MB.

 

Maybe we should explicitly specify the value of `jute.maxbuffer` in Hadoop configuration files like core-site.xml or hdfs-rbf-site.xml, so that a larger value can be configured. A sketch of how the Router might apply such a setting is below.
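A minimal sketch under the assumption that a (hypothetical) Router-side key is read from core-site.xml / hdfs-rbf-site.xml and translated into the ZooKeeper client's `jute.maxbuffer` system property before the ZooKeeper-backed token manager connects:

```
import org.apache.hadoop.conf.Configuration;

/** Sketch only; the configuration key name below is hypothetical. */
final class JuteMaxBufferSetup {

  static final String ROUTER_ZK_JUTE_MAXBUFFER_KEY =
      "dfs.federation.router.zk.jute.maxbuffer"; // hypothetical key
  static final int ROUTER_ZK_JUTE_MAXBUFFER_DEFAULT = 4 * 1024 * 1024;

  static void applyJuteMaxBuffer(Configuration conf) {
    int maxBuffer = conf.getInt(ROUTER_ZK_JUTE_MAXBUFFER_KEY,
        ROUTER_ZK_JUTE_MAXBUFFER_DEFAULT);
    // jute.maxbuffer is read by the ZooKeeper client as a system property,
    // so it must be set before the ZooKeeper connection is created.
    System.setProperty("jute.maxbuffer", Integer.toString(maxBuffer));
  }
}
```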

 

 






[jira] [Created] (HDFS-16451) RBF: add search box for Router's tab-mounttable web page.

2022-02-09 Thread Max Xie (Jira)
Max  Xie created HDFS-16451:
---

 Summary: RBF: add search box for Router's tab-mounttable web page.
 Key: HDFS-16451
 URL: https://issues.apache.org/jira/browse/HDFS-16451
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie
Assignee: Max  Xie
 Attachments: image-2022-02-09-18-17-53-498.png, 
image-2022-02-09-18-18-29-262.png

In our cluster, we have mounted many paths in the HDFS Router, so it can take some time to load the Router's mount-table page when we open it in the browser.

To make the mount-table page more convenient to use, maybe we should add a search box, just like the screenshots below.

!image-2022-02-09-18-17-53-498.png!

!image-2022-02-09-18-18-29-262.png!






[jira] [Created] (HDFS-16447) RBF: Register HDFS Router's rpcserver & rpcclient metrics for PrometheusSink.

2022-02-08 Thread Max Xie (Jira)
Max  Xie created HDFS-16447:
---

 Summary: RBF: Register HDFS Router's rpcserver & rpcclient metrics for PrometheusSink.
 Key: HDFS-16447
 URL: https://issues.apache.org/jira/browse/HDFS-16447
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie


When we enable PrometheusSink for the HDFS Router, the Router's Prometheus sink misses some metrics, for example `RpcClientNumActiveConnections` and others.

We need to register some of the Router's rpcserver & rpcclient metrics for PrometheusSink; a sketch is below.
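A rough sketch of one way to do this, assuming the values the Router already tracks can be wrapped in an annotated metrics source; the class name, metric name, and supplier are hypothetical:

```
import java.util.function.IntSupplier;

import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

/** Hypothetical metrics source exposing rpcclient counters to sinks. */
@Metrics(name = "RouterRpcClientMetrics", about = "Router RPC client metrics",
    context = "dfs")
final class RouterRpcClientMetricsSketch {

  private final IntSupplier activeConnections;

  private RouterRpcClientMetricsSketch(IntSupplier activeConnections) {
    this.activeConnections = activeConnections;
  }

  @Metric("Number of active RPC client connections")
  public int getRpcClientNumActiveConnections() {
    return activeConnections.getAsInt();
  }

  static RouterRpcClientMetricsSketch register(IntSupplier activeConnections) {
    // Registering with the MetricsSystem is what makes the value visible
    // to PrometheusSink and the other configured sinks.
    return DefaultMetricsSystem.instance()
        .register(new RouterRpcClientMetricsSketch(activeConnections));
  }
}
```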






[jira] [Created] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget, which can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread Max Xie (Jira)
Max  Xie created HDFS-16182:
---

 Summary: numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget, which can cause DataStreamer to fail with Heterogeneous Storage
 Key: HDFS-16182
 URL: https://issues.apache.org/jira/browse/HDFS-16182
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.4.0
Reporter: Max  Xie


In our HDFS cluster, we use heterogeneous storage to store data on SSD for better performance. Sometimes when the HDFS client transfers data in a pipeline, it throws an IOException and exits. The exception log is below:

```
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK], DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK], DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD], DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD], DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]], original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK], DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
```

After looking into it, I found that when the existing pipeline needs a replacement datanode to keep transferring data, the client asks the namenode for one additional datanode and then checks that the number of datanodes is the original number + 1:

```
// DataStreamer$findNewDatanode
if (nodes.length != original.length + 1) {
  throw new IOException(
      "Failed to replace a bad datanode on the existing pipeline "
      + "due to no more good datanodes being available to try. "
      + "(Nodes: current=" + Arrays.asList(nodes)
      + ", original=" + Arrays.asList(original) + "). "
      + "The current failed datanode replacement policy is "
      + dfsClient.dtpReplaceDatanodeOnFailure
      + ", and a client may configure this via '"
      + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
      + "' in its configuration.");
}
```

The root cause is that Namenode$getAdditionalDatanode returns multiple datanodes, not one, in DataStreamer.addDatanode2ExistingPipeline.

Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget: I think numOfReplicas should not be assigned from requiredStorageTypes. A sketch of the idea is below.
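The following is only an illustration of the suspected behaviour, not the actual Hadoop source: with a heterogeneous storage policy the number of still-unsatisfied storage types can exceed the single replacement datanode the pipeline asked for, and using that count as numOfReplicas makes getAdditionalDatanode hand back several nodes, which then trips the `original.length + 1` check shown above.

```
import java.util.List;

/** Illustration of the suspected bug and a possible guard; not real code. */
final class ChooseTargetSketch {

  static int effectiveNumOfReplicas(int requestedAdditionalReplicas,
      List<String> requiredStorageTypes) {
    // Suspected behaviour: the requested count (1 for a pipeline repair)
    // gets overwritten by the number of still-required storage types.
    int fromStorageTypes = requiredStorageTypes.size();

    // Possible direction for a fix: never choose more targets than the
    // caller actually asked for.
    return Math.min(requestedAdditionalReplicas, fromStorageTypes);
  }
}
```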

 

   

 

 

 

 






[jira] [Created] (HDFS-16180) FsVolumeImpl.nextBlock should consider that the block meta file has been deleted.

2021-08-19 Thread Max Xie (Jira)
Max  Xie created HDFS-16180:
---

 Summary: FsVolumeImpl.nextBlock should consider that the block 
meta file has been deleted.
 Key: HDFS-16180
 URL: https://issues.apache.org/jira/browse/HDFS-16180
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.0, 3.4.0
Reporter: Max  Xie


In my cluster, we found that when the VolumeScanner runs, the datanode sometimes logs errors like the following:

```
2021-08-19 08:00:11,549 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1020175758-nnip-1597745872895 blk_1142977964_69237147 URI file:/disk1/dfs/data/current/BP-1020175758-nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
2021-08-19 08:00:48,368 ERROR org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl: nextBlock(DS-060c8e4c-1ef6-49f5-91ef-91957356891a, BP-1020175758-nnip-1597745872895): I/O error
java.io.IOException: Meta file not found, blockFile=/disk1/dfs/data/current/BP-1020175758-nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetUtil.findMetaFile(FsDatasetUtil.java:101)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.nextBlock(FsVolumeImpl.java:809)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:528)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:628)
2021-08-19 08:00:48,368 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/disk1/dfs/data, DS-060c8e4c-1ef6-49f5-91ef-91957356891a): nextBlock error on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@7febc6b4
```

When the VolumeScanner scans block blk_1142977964, the block has already been deleted by the datanode, so the scanner cannot find the meta file of blk_1142977964 and logs these errors.

Maybe we should handle FileNotFoundException during nextBlock to reduce the error logging and the number of nextBlock retries; a sketch is below.
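A minimal sketch of the idea, assuming the real change would live in FsVolumeImpl$BlockIteratorImpl#nextBlock; the resolver interface here merely stands in for FsDatasetUtil.findMetaFile:

```
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Iterator;

/** Illustrative only: skip blocks whose meta file disappeared mid-scan. */
final class NextBlockSketch {

  interface MetaFileResolver {
    File findMetaFile(File blockFile) throws IOException;
  }

  static File nextScannableMetaFile(Iterator<File> blockFiles,
      MetaFileResolver resolver) throws IOException {
    while (blockFiles.hasNext()) {
      File blockFile = blockFiles.next();
      try {
        return resolver.findMetaFile(blockFile);
      } catch (FileNotFoundException e) {
        // The block was deleted after it was listed; move on quietly
        // instead of logging an ERROR and retrying.
      }
    }
    return null; // nothing left to scan on this volume
  }
}
```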

 






[jira] [Created] (HDFS-15886) Add a way to get protected dirs from a special configuration file

2021-03-10 Thread Max Xie (Jira)
Max  Xie created HDFS-15886:
---

 Summary: Add a way to get protected dirs from a special 
configuration file
 Key: HDFS-15886
 URL: https://issues.apache.org/jira/browse/HDFS-15886
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.4.0
Reporter: Max  Xie
Assignee: Max  Xie


We use protected directories to ensure that important data directories cannot be deleted by mistake, but protected directories can currently only be configured in hdfs-site.xml.

For ease of management, we add a way to read the list of protected directories from a separate configuration file.

How to use:

1. Set the config in hdfs-site.xml:

```
<property>
  <name>dfs.protected.directories.config.file.enable</name>
  <value>true</value>
</property>

<property>
  <name>fs.protected.directories</name>
  <value>file:///path/to/protected.dirs.config</value>
</property>
```

2. Add some protected dirs to the config file (file:///path/to/protected.dirs.config):

```
# protect directories
/1
/2/3
```

3. Done. A sketch of how the NameNode could load the file is below.
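A minimal sketch of the loading side, assuming the file is read through the Hadoop FileSystem API and that blank lines and lines starting with '#' are treated as comments (as in the example file); the class and method names are hypothetical:

```
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical loader; the real patch would plug into FSDirectory. */
final class ProtectedDirsFileLoader {

  static List<String> loadProtectedDirs(Configuration conf, String fileUri)
      throws IOException {
    List<String> dirs = new ArrayList<>();
    Path path = new Path(URI.create(fileUri));
    FileSystem fs = path.getFileSystem(conf);
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        // Skip blank lines and comments such as "# protect directories".
        if (!line.isEmpty() && !line.startsWith("#")) {
          dirs.add(line);
        }
      }
    }
    return dirs;
  }
}
```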






[jira] [Created] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-08 Thread Max Xie (Jira)
Max  Xie created HDFS-15720:
---

 Summary: namenode audit async logger should add some log4j config
 Key: HDFS-15720
 URL: https://issues.apache.org/jira/browse/HDFS-15720
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.3.0
 Environment: hadoop 3.3.0
Reporter: Max  Xie


The Hadoop project uses log4j 1.2.x, and we can't configure some logger properties in the log4j.properties file, for example the AsyncAppender's BufferSize and Blocking (see https://logging.apache.org/log4j/1.2/apidocs/index.html).

The Namenode should add some log4j configuration for the async audit logger, to make it easier to tune log4j usage and audit log output performance.

The new configuration is as follows:

dfs.namenode.audit.log.async.blocking = false
dfs.namenode.audit.log.async.buffer.size = 128

A sketch of applying these keys to the AsyncAppender is below.
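A minimal sketch of how the NameNode could apply these keys when it builds the async audit appender, using the log4j 1.2 AsyncAppender setters; the defaults used here mirror log4j's own defaults and are an assumption of this sketch:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.log4j.AsyncAppender;

/** Sketch: map the proposed keys onto the log4j 1.2 AsyncAppender. */
final class AsyncAuditLogSetup {

  static AsyncAppender createAsyncAuditAppender(Configuration conf) {
    AsyncAppender asyncAppender = new AsyncAppender();
    asyncAppender.setBlocking(
        conf.getBoolean("dfs.namenode.audit.log.async.blocking", true));
    asyncAppender.setBufferSize(
        conf.getInt("dfs.namenode.audit.log.async.buffer.size", 128));
    return asyncAppender;
  }
}
```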

 


