[ 
https://issues.apache.org/jira/browse/HDFS-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17346:
------------------------------
    Description: 
The DirectoryScanner can mark normal blocks as corrupt and report them to the 
NameNode, so blocks that are actually healthy end up reported as corrupt.

This can happen when an append and the DirectoryScanner run concurrently, and 
the probability of hitting it is quite high.

*Root cause:*
* Create a block such as blk_xxx with generation stamp 1001: the diskFile is 
file:/XXX/current/finalized/blk_xxx and the diskMetaFile is 
file:/XXX/current/finalized/blk_xxx_1001.meta
* The DirectoryScanner runs and first creates a BlockPoolReport.ScanInfo, 
recording the blockFile as file:/XXX/current/finalized/blk_xxx and the 
metaFile as file:/XXX/current/finalized/blk_xxx_1001.meta
* Simultaneously, another thread completes an append to blk_xxx; afterwards 
the diskFile is file:/XXX/current/finalized/blk_xxx, the diskMetaFile is 
file:/XXX/current/finalized/blk_xxx_1002.meta, the memDataFile is 
file:/XXX/current/finalized/blk_xxx, and the memMetaFile is 
file:/XXX/current/finalized/blk_xxx_1002.meta
* The DirectoryScanner continues to run; because the generation stamp of the 
metadata file in memory differs from that of the metadata file recorded in 
the ScanInfo, the ScanInfo object is added to the list of differences
* FsDatasetImpl#checkAndUpdate then traverses the list of differences; since 
the recorded diskMetaFile /XXX/current/finalized/blk_xxx_1001.meta no longer 
exists, isRegular evaluates to false:
{code:java}
final boolean isRegular = FileUtil.isRegularFile(diskMetaFile, false) &&
    FileUtil.isRegularFile(diskFile, false);
{code}
* This marks the healthy block as corrupt and reports it to the NameNode (a 
standalone sketch of this race follows below):
{code:java}
} else if (!isRegular) {
  corruptBlock = new Block(memBlockInfo);
  LOG.warn("Block:{} is not a regular file.", corruptBlock.getBlockId());
}
{code}
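
To make the race easier to see outside a live DataNode, here is a minimal, 
self-contained Java sketch of the failure mode. The names (ScanInfoDemo, 
ScanInfoStub) and the temp-dir layout are hypothetical stand-ins, not the 
real Hadoop classes, and plain File#isFile stands in for 
FileUtil.isRegularFile:
{code:java}
import java.io.File;

/** Hypothetical demo of the stale-ScanInfo race; not Hadoop code. */
public class ScanInfoDemo {

  /** Minimal stand-in for the paths the scanner snapshots at scan time. */
  static final class ScanInfoStub {
    final File blockFile;
    final File metaFile; // snapshot taken when the scan report was built
    ScanInfoStub(File blockFile, File metaFile) {
      this.blockFile = blockFile;
      this.metaFile = metaFile;
    }
  }

  public static void main(String[] args) throws Exception {
    File dir = new File(System.getProperty("java.io.tmpdir"), "finalized");
    dir.mkdirs();

    File blockFile = new File(dir, "blk_1001");
    File oldMeta = new File(dir, "blk_1001_1001.meta");
    blockFile.createNewFile();
    oldMeta.createNewFile();

    // 1. The scanner records the block and meta paths as they exist now.
    ScanInfoStub scanInfo = new ScanInfoStub(blockFile, oldMeta);

    // 2. A concurrent append bumps the generation stamp: the meta file is
    //    renamed, so the path captured in scanInfo is now stale.
    File newMeta = new File(dir, "blk_1001_1002.meta");
    newMeta.delete(); // keep the demo re-runnable
    oldMeta.renameTo(newMeta);

    // 3. Replay the checkAndUpdate-style check against the stale snapshot:
    //    the recorded meta path no longer exists, so isRegular is false
    //    even though both files are healthy on disk.
    boolean isRegular = scanInfo.metaFile.isFile()
        && scanInfo.blockFile.isFile();
    System.out.println("isRegular = " + isRegular); // prints false
  }
}
{code}
The essential point: the ScanInfo captures the meta path once, while a 
concurrent append legally renames the meta file to the new generation stamp, 
so a later existence check against the captured path can fail even though 
both files on disk are healthy.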


> Fix DirectoryScanner incorrectly marking normal blocks as corrupt.
> -------------------------------------------------------------------
>
>                 Key: HDFS-17346
>                 URL: https://issues.apache.org/jira/browse/HDFS-17346
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>


