Copilot commented on code in PR #10603: URL: https://github.com/apache/ozone/pull/10603#discussion_r3475562760
########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffValueParser.java: ########## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.ozone.om.snapshot; + +import static org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.CommitKeyRequest.HSYNC_FIELD_NUMBER; + +import com.google.protobuf.ByteString; +import com.google.protobuf.CodedInputStream; +import com.google.protobuf.WireFormat; +import java.io.IOException; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; +import org.apache.hadoop.hdds.protocol.proto.HddsProtos.KeyValue; +import org.apache.hadoop.ozone.OzoneConsts; +import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.DirectoryInfo; +import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.KeyInfo; + +/** + * Parses snapshot diff values without full deserialization. + */ +public final class SnapshotDiffValueParser { + private static final int INT_BYTES = 4; + private static final int LONG_BYTES = 8; + private static final String DIGEST_ALGORITHM = "SHA-256"; + + private SnapshotDiffValueParser() { + } + + public static ParsedRequiredInfo parseKeyInfoRequiredFields(byte[] value, boolean includeUpdateId) + throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + long updateId = 0L; + long objectId = 0L; + long parentId = 0L; + boolean hasUpdateId = false; + String keyName = null; + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case KeyInfo.KEYNAME_FIELD_NUMBER: + keyName = input.readString(); + break; + case KeyInfo.OBJECTID_FIELD_NUMBER: + objectId = input.readUInt64(); + break; + case KeyInfo.UPDATEID_FIELD_NUMBER: + if (includeUpdateId) { + updateId = input.readUInt64(); + hasUpdateId = true; + } else { + input.skipField(tag); + } + break; + case KeyInfo.PARENTID_FIELD_NUMBER: + parentId = input.readUInt64(); + break; + default: + input.skipField(tag); + break; + } + } + + return new ParsedRequiredInfo(updateId, hasUpdateId, objectId, parentId, keyName); + } + + public static byte[] computeKeyInfoCompareSignature(byte[] value) throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + MessageDigest digest = newDigest(); + boolean hasHsyncMetadata = false; + int keyLocationListCount = 0; + ByteString latestKeyLocationList = null; + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case KeyInfo.DATASIZE_FIELD_NUMBER: + updateDigestWithLong(digest, fieldNumber, input.readUInt64()); + break; + case KeyInfo.KEYLOCATIONLIST_FIELD_NUMBER: + latestKeyLocationList = input.readBytes(); + keyLocationListCount++; + break; + case KeyInfo.METADATA_FIELD_NUMBER: + MetadataEntry metadataEntry = parseMetadataEntry(input.readBytes().toByteArray()); + if (metadataEntry != null) { + updateDigestWithRawBytes(digest, fieldNumber, metadataEntry.getDigest()); + if (metadataEntry.hasHsync()) { Review Comment: The compare signature currently depends on the on-wire ordering of `metadata` entries (and also `tags`, which are `repeated KeyValue` in the proto) because it updates the digest as entries are encountered. Since `KeyValueUtil.getFromProtobuf()` materializes metadata into a `HashMap` (`KeyValueUtil.java:36-40`) and `KeyValueUtil.toProtobuf()` iterates `entrySet()` (`KeyValueUtil.java:45-51`), the persisted entry order is not guaranteed to be stable even when the logical map content is unchanged. This can cause false-positive snapshot diffs. To make signatures stable, canonicalize map-like repeated fields: collect per-entry digests (eg `SHA-256(key,value)` as you already do in `parseMetadataEntry`) and fold them into the final digest in a deterministic order (eg sort lexicographically by key bytes or entry digest). Apply the same treatment to `tags` as well. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffValueParser.java: ########## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.ozone.om.snapshot; + +import static org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.CommitKeyRequest.HSYNC_FIELD_NUMBER; + +import com.google.protobuf.ByteString; +import com.google.protobuf.CodedInputStream; +import com.google.protobuf.WireFormat; +import java.io.IOException; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; +import org.apache.hadoop.hdds.protocol.proto.HddsProtos.KeyValue; +import org.apache.hadoop.ozone.OzoneConsts; +import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.DirectoryInfo; +import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.KeyInfo; + +/** + * Parses snapshot diff values without full deserialization. + */ +public final class SnapshotDiffValueParser { + private static final int INT_BYTES = 4; + private static final int LONG_BYTES = 8; + private static final String DIGEST_ALGORITHM = "SHA-256"; + + private SnapshotDiffValueParser() { + } + + public static ParsedRequiredInfo parseKeyInfoRequiredFields(byte[] value, boolean includeUpdateId) + throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + long updateId = 0L; + long objectId = 0L; + long parentId = 0L; + boolean hasUpdateId = false; + String keyName = null; + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case KeyInfo.KEYNAME_FIELD_NUMBER: + keyName = input.readString(); + break; + case KeyInfo.OBJECTID_FIELD_NUMBER: + objectId = input.readUInt64(); + break; + case KeyInfo.UPDATEID_FIELD_NUMBER: + if (includeUpdateId) { + updateId = input.readUInt64(); + hasUpdateId = true; + } else { + input.skipField(tag); + } + break; + case KeyInfo.PARENTID_FIELD_NUMBER: + parentId = input.readUInt64(); + break; + default: + input.skipField(tag); + break; + } + } + + return new ParsedRequiredInfo(updateId, hasUpdateId, objectId, parentId, keyName); + } + + public static byte[] computeKeyInfoCompareSignature(byte[] value) throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + MessageDigest digest = newDigest(); + boolean hasHsyncMetadata = false; + int keyLocationListCount = 0; + ByteString latestKeyLocationList = null; + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case KeyInfo.DATASIZE_FIELD_NUMBER: + updateDigestWithLong(digest, fieldNumber, input.readUInt64()); + break; + case KeyInfo.KEYLOCATIONLIST_FIELD_NUMBER: + latestKeyLocationList = input.readBytes(); + keyLocationListCount++; + break; + case KeyInfo.METADATA_FIELD_NUMBER: + MetadataEntry metadataEntry = parseMetadataEntry(input.readBytes().toByteArray()); + if (metadataEntry != null) { + updateDigestWithRawBytes(digest, fieldNumber, metadataEntry.getDigest()); + if (metadataEntry.hasHsync()) { + hasHsyncMetadata = true; + } + } + break; + case KeyInfo.FILECHECKSUM_FIELD_NUMBER: + case KeyInfo.ACLS_FIELD_NUMBER: + case KeyInfo.TAGS_FIELD_NUMBER: + updateDigestWithBytes(digest, fieldNumber, input.readBytes()); + break; + default: + input.skipField(tag); + break; + } + } + + if (latestKeyLocationList != null) { + updateDigestWithBytes(digest, KeyInfo.KEYLOCATIONLIST_FIELD_NUMBER, latestKeyLocationList); + } + updateDigestWithBoolean(digest, HSYNC_FIELD_NUMBER, hasHsyncMetadata); + updateDigestWithInt(digest, keyLocationListCount); + + return digest.digest(); + } + + public static ParsedRequiredInfo parseDirectoryInfoRequiredFields(byte[] value, boolean includeUpdateId) + throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + long updateId = 0L; + long objectId = 0L; + long parentId = 0L; + boolean hasUpdateId = false; + String name = null; + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case DirectoryInfo.NAME_FIELD_NUMBER: + name = input.readString(); + break; + case DirectoryInfo.OBJECTID_FIELD_NUMBER: + objectId = input.readUInt64(); + break; + case DirectoryInfo.UPDATEID_FIELD_NUMBER: + if (includeUpdateId) { + updateId = input.readUInt64(); + hasUpdateId = true; + } else { + input.skipField(tag); + } + break; + case DirectoryInfo.PARENTID_FIELD_NUMBER: + parentId = input.readUInt64(); + break; + default: + input.skipField(tag); + break; + } + } + + return new ParsedRequiredInfo(updateId, hasUpdateId, objectId, parentId, name); + } + + public static byte[] computeDirectoryInfoCompareSignature(byte[] value) throws IOException { + CodedInputStream input = CodedInputStream.newInstance(value); + MessageDigest digest = newDigest(); + + int tag; + while ((tag = input.readTag()) != 0) { + int fieldNumber = WireFormat.getTagFieldNumber(tag); + switch (fieldNumber) { + case DirectoryInfo.METADATA_FIELD_NUMBER: + MetadataEntry metadataEntry = parseMetadataEntry(input.readBytes().toByteArray()); + if (metadataEntry != null) { + updateDigestWithRawBytes(digest, fieldNumber, metadataEntry.getDigest()); + } Review Comment: `computeDirectoryInfoCompareSignature` updates the digest for `DirectoryInfo.metadata` entries in the order encountered on the wire. `metadata` is a logical map (stored/converted via `KeyValueUtil`), so its serialized ordering is not guaranteed to be stable (`KeyValueUtil.java:36-51`). This can make directory signatures change due to reordering alone, producing false-positive diffs. Canonicalize metadata before digesting (eg compute a per-entry digest of key/value and then sort/fold deterministically) so that signatures depend only on logical content. ########## hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffValueParser.java: ########## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.ozone.om.snapshot; + +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; + +import java.util.Arrays; +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import org.apache.hadoop.fs.FileChecksum; +import org.apache.hadoop.fs.MD5MD5CRC32GzipFileChecksum; +import org.apache.hadoop.hdds.client.BlockID; +import org.apache.hadoop.hdds.client.RatisReplicationConfig; +import org.apache.hadoop.hdds.protocol.proto.HddsProtos.ReplicationFactor; +import org.apache.hadoop.io.MD5Hash; +import org.apache.hadoop.ozone.OzoneAcl; +import org.apache.hadoop.ozone.OzoneConsts; +import org.apache.hadoop.ozone.om.helpers.OmDirectoryInfo; +import org.apache.hadoop.ozone.om.helpers.OmKeyInfo; +import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo; +import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup; +import org.junit.jupiter.api.Test; + +class TestSnapshotDiffValueParser { + private static final String VOLUME = "volume"; + private static final String BUCKET = "bucket"; + private static final String KEY_NAME = "dir/file"; + private static final String DIR_NAME = "dir"; + private static final long OBJECT_ID = 10L; + private static final long PARENT_ID = 20L; + private static final long UPDATE_ID = 30L; + + @Test + void testKeyInfoParserSignatureAndUpdateId() throws Exception { + OmKeyInfo keyInfo = createKeyInfo(100L, 200L, 1024L, createChecksum((byte) 1), + createMetadata("meta", "one"), createTags("tag", "one"), createAcls(), + Collections.singletonList(createKeyLocationGroup(1L))); + byte[] rawData = OmKeyInfo.getCodec().toPersistedFormat(keyInfo); Review Comment: Current unit tests validate updateId parsing and that metadata/time changes affect (or don’t affect) the compare signatures, but they don’t assert that other intended signature inputs are included (eg tags, ACLs, file checksum, keyLocationList/latest block location, dataSize). This leaves the signature contract partially untested and makes future refactors risky. Consider adding focused assertions that toggling each of those fields changes the signature, and that changing only non-signature fields does not. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffValueParser.java: ########## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.ozone.om.snapshot; + +import static org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.CommitKeyRequest.HSYNC_FIELD_NUMBER; Review Comment: `computeKeyInfoCompareSignature` uses `CommitKeyRequest.HSYNC_FIELD_NUMBER` as the tag for the derived `hasHsyncMetadata` flag. That constant is unrelated to `KeyInfo` and also happens to be `3`, which collides with `KeyInfo.KEYNAME_FIELD_NUMBER` (see `OmClientProtocol.proto:1189-1218` and `OmClientProtocol.proto:1584-1589`). This makes the signature scheme confusing and fragile if more tagged fields are added later. Use a parser-local constant for derived fields (eg `private static final int HSYNC_METADATA_PRESENT_TAG = 1001;`) and reference that instead of importing from `CommitKeyRequest`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
