sr2020 created HDFS-15588:
-----------------------------

             Summary: Arbitrarily low values for 
`dfs.block.access.token.lifetime` aren't safe and can cause a healthy datanode 
to be excluded
                 Key: HDFS-15588
                 URL: https://issues.apache.org/jira/browse/HDFS-15588
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, hdfs-client, security
            Reporter: sr2020


*Description*:
Setting `dfs.block.access.token.lifetime` to an arbitrarily low value (like 1) 
makes the lifetime of a block token very short. As a result, some healthy 
datanodes can be wrongly excluded by the client because of an 
`InvalidBlockTokenException`.

More specifically, in `nextBlockOutputStream`, the client gets the 
`accessToken` from the namenode and uses it to talk to the datanode. The 
lifetime of the `accessToken` can be set to a very small value (like 1 min) via 
`dfs.block.access.token.lifetime`. Under some extreme conditions (like a VM 
migration, a temporary network issue, or a stop-the-world GC), the `accessToken` 
can expire before the client uses it to talk to the datanode. 
If the token has expired, `createBlockOutputStream` returns false (and masks 
the `InvalidBlockTokenException`), so the client concludes that the datanode is 
unhealthy, marks it as "excluded", and never reads from or writes to it.
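
The failure sequence above can be illustrated with a small, self-contained 
model. This is only a sketch under stated assumptions: `TokenExpirySketch`, 
`Token`, `datanodeAccepts`, and `attemptWrite` are hypothetical names invented 
for illustration, and the `createBlockOutputStream` here is a stand-in that 
only mimics the boolean-returning behavior described, not the real client code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the scenario; not actual HDFS client code. */
class TokenExpirySketch {

    /** Stand-in for a block access token: only the expiry time matters here. */
    static final class Token {
        final long expiryMillis;

        Token(long issuedAtMillis, long lifetimeMinutes) {
            this.expiryMillis = issuedAtMillis + lifetimeMinutes * 60_000L;
        }
    }

    /** Stand-in for the datanode-side check that would reject an expired token. */
    static boolean datanodeAccepts(Token token, long nowMillis) {
        return nowMillis < token.expiryMillis;
    }

    /** Mimics the described behavior: the failure is reduced to `false`, so the cause is lost. */
    static boolean createBlockOutputStream(Token token, long nowMillis) {
        return datanodeAccepts(token, nowMillis);
    }

    /**
     * One write attempt: the token is issued at time 0 and used after pauseMillis
     * (e.g. a GC pause or VM migration). Returns the resulting set of excluded nodes.
     */
    static Set<String> attemptWrite(String datanode, long lifetimeMinutes, long pauseMillis) {
        Set<String> excluded = new HashSet<>();
        Token token = new Token(0L, lifetimeMinutes);
        if (!createBlockOutputStream(token, pauseMillis)) {
            // The client cannot tell "expired token" from "bad datanode" here.
            excluded.add(datanode);
        }
        return excluded;
    }

    public static void main(String[] args) {
        System.out.println(attemptWrite("dn1", 1, 90_000L));   // prints [dn1]
        System.out.println(attemptWrite("dn1", 600, 90_000L)); // prints []
    }
}
```

With a 1-minute lifetime, a 90-second pause is enough to get a healthy node 
excluded; with a much longer lifetime the same pause is harmless.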


*Proposed solution*:
A simple retry on the same datanode after catching `InvalidBlockTokenException` 
can solve this problem (assuming the extreme conditions don't happen often). 
Since `dfs.block.access.token.lifetime` currently accepts even values 
like 0, we can also choose to prevent users from setting 
`dfs.block.access.token.lifetime` to a small value (e.g., we can enforce a 
minimum value of 5 minutes for this parameter).
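
As a rough sketch of the second option, a lower bound could be checked when 
the setting is read. The class `TokenLifetimeCheck`, the method 
`validateLifetimeMinutes`, and the constant `MIN_LIFETIME_MINUTES` are 
hypothetical; the floor of 5 is just the example value mentioned above, not an 
existing HDFS constant.

```java
/** Hypothetical validation helper; not part of HDFS. */
class TokenLifetimeCheck {

    /** Assumed floor, taken from the 5-minute example in this report. */
    static final long MIN_LIFETIME_MINUTES = 5L;

    /** Returns the configured value unchanged, or rejects it if it is below the floor. */
    static long validateLifetimeMinutes(long configuredMinutes) {
        if (configuredMinutes < MIN_LIFETIME_MINUTES) {
            throw new IllegalArgumentException(
                "dfs.block.access.token.lifetime is " + configuredMinutes
                    + " min; must be at least " + MIN_LIFETIME_MINUTES + " min");
        }
        return configuredMinutes;
    }

    public static void main(String[] args) {
        System.out.println(validateLifetimeMinutes(600));
        try {
            validateLifetimeMinutes(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the value outright is one possible design; clamping it to the floor 
and logging a warning would be an alternative.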

We submit a patch for retrying after catching `InvalidBlockTokenException` in 
`nextBlockOutputStream`. We can also provide a patch for enforcing a larger 
minimum value for `dfs.block.access.token.lifetime` if that is a better way to 
handle this.
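
The retry idea can be sketched roughly as follows. This is not the submitted 
patch: `RetryOnExpiredTokenSketch`, `Connector`, `connectWithRetry`, and 
`MAX_TOKEN_RETRIES` are hypothetical names, and the nested 
`InvalidBlockTokenException` is a local stand-in for the Hadoop exception so 
that the example compiles on its own.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of "retry the same datanode on an expired token". */
class RetryOnExpiredTokenSketch {

    /** Local stand-in for Hadoop's InvalidBlockTokenException. */
    static class InvalidBlockTokenException extends IOException {
        InvalidBlockTokenException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical connection attempt that surfaces the cause instead of masking it. */
    interface Connector {
        void connect(String datanode) throws IOException;
    }

    /** Assumed retry budget for token failures. */
    static final int MAX_TOKEN_RETRIES = 1;

    /** Excludes the datanode only on a non-token failure or once token retries run out. */
    static boolean connectWithRetry(String datanode, Connector connector, Set<String> excluded) {
        for (int attempt = 0; attempt <= MAX_TOKEN_RETRIES; attempt++) {
            try {
                connector.connect(datanode);
                return true;
            } catch (InvalidBlockTokenException e) {
                // Token problem, not a datanode problem: a real client would obtain
                // a fresh token here before trying the same datanode again.
                continue;
            } catch (IOException e) {
                break; // any other failure: treat the datanode as unhealthy
            }
        }
        excluded.add(datanode);
        return false;
    }

    public static void main(String[] args) {
        Set<String> excluded = new HashSet<>();
        int[] calls = {0};
        // First attempt fails with an expired token, second one succeeds.
        boolean ok = connectWithRetry("dn1", dn -> {
            if (calls[0]++ == 0) {
                throw new InvalidBlockTokenException("token expired");
            }
        }, excluded);
        System.out.println(ok + " " + excluded); // prints true []
    }
}
```

The key design point is that a token failure is kept distinct from other I/O 
failures, so only the latter lead to exclusion on the first attempt.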




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
