[
https://issues.apache.org/jira/browse/HDFS-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ke Han updated HDFS-17219:
--------------------------
Description:
When upgrading an HDFS cluster from 2.10.2 to 3.3.6, the results returned by the
*dfs -count* command are inconsistent before and after the upgrade.
Restarting a 3.3.6 cluster can also trigger the problem, as long as an FSImage is
created in the same way as during the upgrade process.
h1. Reproduce
Start up a 2.10.2 HDFS cluster (1 NN, 2 DN, 1 SNN) and execute the following commands:
{code:java}
bin/hdfs dfs -mkdir /GscWZRxS
bin/hdfs dfsadmin -setSpaceQuota 2 -storageType DISK /GscWZRxS
{code}
Before the upgrade, check the quota results:
{code:java}
bin/hdfs dfs -count -q -h -u /GscWZRxS
none inf none inf /GscWZRxS
{code}
Then prepare the upgrade: enter safe mode, {*}create the image{*}, shut down the
cluster, and start up the new 3.3.6 cluster. Run the same command again:
{code:java}
bin/hdfs dfs -count -q -h -u /GscWZRxS
8.0 E 8.0 E none inf /GscWZRxS
{code}
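The first two columns (the namespace quota and the remaining namespace quota)
changed from none/inf to 8.0 E, although no namespace quota was ever set. Only a
DISK storage-type quota was set before the upgrade.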
h1. Root Cause
The problem occurs in the deserialization process.
When the quota information is deserialized in loadINodeDirectory in
+FSImageFormatPBINode.java+, the nsQuota value is -1, which is deserialized
correctly.
However, this value is not used later when the DirectoryWithQuotaFeature object
is constructed. Because neither nsQuota nor dsQuota is >= 0, the first branch is
skipped. The storage-type quota branch then builds the feature from typeQuotas
alone, so nsQuota falls back to the builder default of Long.MAX_VALUE, and this
causes the inconsistency.
{code:java}
// hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
public static INodeDirectory loadINodeDirectory(INodeSection.INode n,
    LoaderContext state) {
  // ...
  final long nsQuota = d.getNsQuota(), dsQuota = d.getDsQuota(); // -1 here
  if (nsQuota >= 0 || dsQuota >= 0) {
    dir.addDirectoryWithQuotaFeature(new DirectoryWithQuotaFeature.Builder()
        .nameSpaceQuota(nsQuota).storageSpaceQuota(dsQuota).build());
  }
  EnumCounters<StorageType> typeQuotas = null;
  if (d.hasTypeQuotas()) {
    // ...
    if (typeQuotas.anyGreaterOrEqual(0)) {
      DirectoryWithQuotaFeature q = dir.getDirectoryWithQuotaFeature();
      if (q == null) {
        // nsQuota is not passed to the builder on this path
        dir.addDirectoryWithQuotaFeature(new DirectoryWithQuotaFeature.Builder()
            .typeQuotas(typeQuotas).build());
      } else {
        q.setQuota(typeQuotas);
      }
    }
  }
  // ...
  return dir;
}
{code}
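The control flow above can be modeled in isolation. The following is a minimal sketch, not the Hadoop code: QuotaLoadSketch, QuotaFeature, its Builder, and the single diskTypeQuota field are hypothetical stand-ins, and the dsQuota default of -1 is an assumption. Only the Long.MAX_VALUE default for nsQuota is taken from the description above.

```java
// Simplified stand-ins for illustration only; these are NOT the real Hadoop classes.
public class QuotaLoadSketch {

    /** Hypothetical quota holder whose builder defaults nsQuota to Long.MAX_VALUE. */
    public static final class QuotaFeature {
        public final long nsQuota;
        public final long dsQuota;
        public final long diskTypeQuota;

        private QuotaFeature(Builder b) {
            this.nsQuota = b.nsQuota;
            this.dsQuota = b.dsQuota;
            this.diskTypeQuota = b.diskTypeQuota;
        }

        public static final class Builder {
            private long nsQuota = Long.MAX_VALUE; // default named in the report
            private long dsQuota = -1;             // assumed default
            private long diskTypeQuota = -1;       // assumed default

            public Builder nameSpaceQuota(long v) { nsQuota = v; return this; }
            public Builder storageSpaceQuota(long v) { dsQuota = v; return this; }
            public Builder diskTypeQuota(long v) { diskTypeQuota = v; return this; }
            public QuotaFeature build() { return new QuotaFeature(this); }
        }
    }

    /** Mirrors the branch structure of the excerpt for the case where no feature exists yet. */
    public static QuotaFeature load(long nsQuota, long dsQuota, long diskTypeQuota) {
        QuotaFeature q = null;
        if (nsQuota >= 0 || dsQuota >= 0) {
            q = new QuotaFeature.Builder()
                .nameSpaceQuota(nsQuota).storageSpaceQuota(dsQuota).build();
        }
        if (diskTypeQuota >= 0 && q == null) {
            // The deserialized nsQuota is not passed, so the builder default is kept.
            q = new QuotaFeature.Builder().diskTypeQuota(diskTypeQuota).build();
        }
        return q;
    }

    public static void main(String[] args) {
        // Same situation as the reproduction: only a storage-type quota of 2 is set.
        QuotaFeature q = load(-1, -1, 2);
        System.out.println(q.nsQuota == Long.MAX_VALUE); // prints "true"
    }
}
```

In this model the serialized value (-1) and the in-memory value after loading (Long.MAX_VALUE) differ, which matches the change from none to 8.0 E seen in the reproduction.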
h1. Fix to the inconsistency
One solution is to use the previously deserialized nsQuota value when the
DirectoryWithQuotaFeature is constructed in the deserialization method. In this
case, the DirectoryWithQuotaFeature will have nsQuota as -1, which is consistent
with the value before the upgrade/restart. I attached patches for 2.10.2 and
3.3.6.
Another solution is to adjust the serialization method so that the nsQuota
value is always 8E (Long.MAX_VALUE).
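As a rough illustration of the first option, the sketch below passes the deserialized nsQuota and dsQuota to the builder on the storage-type quota path. It is not the attached patch, which may differ: QuotaLoadFixSketch, QuotaFeature, its Builder, and diskTypeQuota are hypothetical stand-ins, and the dsQuota default of -1 is an assumption.

```java
// Hypothetical stand-ins for illustration only; not the real Hadoop classes or the attached patch.
public class QuotaLoadFixSketch {

    public static final class QuotaFeature {
        public final long nsQuota;
        public final long dsQuota;
        public final long diskTypeQuota;

        private QuotaFeature(Builder b) {
            this.nsQuota = b.nsQuota;
            this.dsQuota = b.dsQuota;
            this.diskTypeQuota = b.diskTypeQuota;
        }

        public static final class Builder {
            private long nsQuota = Long.MAX_VALUE; // default named in the report
            private long dsQuota = -1;             // assumed default
            private long diskTypeQuota = -1;       // assumed default

            public Builder nameSpaceQuota(long v) { nsQuota = v; return this; }
            public Builder storageSpaceQuota(long v) { dsQuota = v; return this; }
            public Builder diskTypeQuota(long v) { diskTypeQuota = v; return this; }
            public QuotaFeature build() { return new QuotaFeature(this); }
        }
    }

    /** Same branches as the original flow, but the type-quota path reuses the deserialized values. */
    public static QuotaFeature loadFixed(long nsQuota, long dsQuota, long diskTypeQuota) {
        QuotaFeature q = null;
        if (nsQuota >= 0 || dsQuota >= 0) {
            q = new QuotaFeature.Builder()
                .nameSpaceQuota(nsQuota).storageSpaceQuota(dsQuota).build();
        }
        if (diskTypeQuota >= 0 && q == null) {
            // Pass the deserialized values explicitly so the builder default is overridden.
            q = new QuotaFeature.Builder()
                .nameSpaceQuota(nsQuota)
                .storageSpaceQuota(dsQuota)
                .diskTypeQuota(diskTypeQuota)
                .build();
        }
        return q;
    }

    public static void main(String[] args) {
        QuotaFeature q = loadFixed(-1, -1, 2);
        System.out.println(q.nsQuota); // prints "-1"
    }
}
```

With this change the in-memory nsQuota stays at -1 after loading, matching what was serialized. The second option would instead change what is written to the image.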
> Inconsistent count results when upgrading hdfs cluster from 2.10.2 to 3.3.6
> ---------------------------------------------------------------------------
>
> Key: HDFS-17219
> URL: https://issues.apache.org/jira/browse/HDFS-17219
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.2, 3.3.6
> Reporter: Ke Han
> Priority: Major
> Labels: consistency
> Attachments: 2.10.2-fsinode-fix.patch, 3.3.6-fsinode-fix.patch