[GitHub] [iceberg] amogh-jahagirdar commented on a change in pull request #4036: Core: SnapshotRef parser and serialization logic

GitBox Fri, 04 Feb 2022 16:10:52 -0800


amogh-jahagirdar commented on a change in pull request #4036:
URL: https://github.com/apache/iceberg/pull/4036#discussion_r799720734




##########
File path: core/src/main/java/org/apache/iceberg/util/JsonUtil.java
##########
@@ -65,6 +67,16 @@ public static Integer getIntOrNull(String property, JsonNode 
node) {
     return pNode.asInt();
   }
 
+  public static Long getLongOrNull(String property, JsonNode node) {
+    if (!node.has(property)) {
+      return null;
+    }
+    JsonNode pNode = node.get(property);
+    Preconditions.checkArgument(pNode != null && !pNode.isNull() && 
pNode.isIntegralNumber() &&
+        pNode.canConvertToLong(), "Cannot parse %s from non-string value: %s", 
property, pNode);

Review comment:
       Yes that should be fixed. There are a few other places where this 
applies in JsonUtil, so I will update that as well. 

##########
File path: core/src/main/java/org/apache/iceberg/util/JsonUtil.java
##########
@@ -134,6 +146,20 @@ public static String getStringOrNull(String property, 
JsonNode node) {
         .build();
   }
 
+  public static void writeIntegerIf(boolean condition, String key, Integer 
value, JsonGenerator generator)
+      throws IOException {
+    if (condition) {
+      generator.writeNumberField(key, value);
+    }
+  }

Review comment:
       Sure, I agree, I think it is more clear to specify field in the name! 

##########
File path: core/src/test/java/org/apache/iceberg/TestSnapshotRefParser.java
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestSnapshotRefParser {
+
+  @Test
+  public void testTagToJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\"}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L).build();
+    Assert.assertEquals("Should be able to serialize default tag",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testTagToJsonAllFields() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\",\"max-ref-age-ms\":1}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L)
+        .maxRefAgeMs(1L)
+        .build();
+    Assert.assertEquals("Should be able to serialize tag with all fields",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testBranchToJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"branch\"}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L).build();
+    Assert.assertEquals("Should be able to serialize default branch",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testBranchToJsonAllFields() {
+    String json = 
"{\"snapshot-id\":1,\"type\":\"branch\",\"min-snapshots-to-keep\":2," +
+        "\"max-snapshot-age-ms\":3,\"max-ref-age-ms\":4}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L)
+        .minSnapshotsToKeep(2)
+        .maxSnapshotAgeMs(3L)
+        .maxRefAgeMs(4L)
+        .build();
+    Assert.assertEquals("Should be able to serialize branch with all fields",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testTagFromJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\"}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L).build();
+    Assert.assertTrue("Should be able to deserialize default tag",
+        parsedRefEqualsExpected(ref, SnapshotRefParser.fromJson(json)));
+  }
+
+  @Test
+  public void testTagFromJsonAllFields() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\",\"max-ref-age-ms\":1}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L)
+        .maxRefAgeMs(1L)
+        .build();
+    Assert.assertTrue("Should be able to deserialize tag with all fields",
+        parsedRefEqualsExpected(ref, SnapshotRefParser.fromJson(json)));
+  }
+
+  @Test
+  public void testBranchFromJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"branch\"}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L).build();
+    Assert.assertTrue("Should be able to deserialize default branch",
+        parsedRefEqualsExpected(ref, SnapshotRefParser.fromJson(json)));
+  }
+
+  @Test
+  public void testBranchFromJsonAllFields() {
+    String json = 
"{\"snapshot-id\":1,\"type\":\"branch\",\"min-snapshots-to-keep\":2," +
+        "\"max-snapshot-age-ms\":3,\"max-ref-age-ms\":4}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L)
+        .minSnapshotsToKeep(2)
+        .maxSnapshotAgeMs(3L)
+        .maxRefAgeMs(4L)
+        .build();
+    Assert.assertTrue("Should be able to deserialize branch with all fields",
+        parsedRefEqualsExpected(ref, SnapshotRefParser.fromJson(json)));
+  }
+
+  private boolean parsedRefEqualsExpected(SnapshotRef parsedRef, SnapshotRef 
expectedRef) {
+    return parsedRef.type().equals(expectedRef.type()) &&
+        parsedRef.snapshotId() == expectedRef.snapshotId() &&
+        parsedRef.maxRefAgeMs().equals(expectedRef.maxRefAgeMs()) &&
+        
parsedRef.minSnapshotsToKeep().equals(expectedRef.minSnapshotsToKeep()) &&
+        parsedRef.maxSnapshotAgeMs().equals(expectedRef.maxSnapshotAgeMs());
+  }

Review comment:
       This is done since I am thinking we want to defer implementation of 
equals and hashcode until we define what we want for ref retention. I think 
then we would get better clarity as to what 2 equal snapshot refs mean. Let me 
know your thoughts @jackye1995 @rdblue 

##########
File path: core/src/test/java/org/apache/iceberg/TestSnapshotRefParser.java
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestSnapshotRefParser {
+
+  @Test
+  public void testTagToJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\"}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L).build();
+    Assert.assertEquals("Should be able to serialize default tag",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testTagToJsonAllFields() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\",\"max-ref-age-ms\":1}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L)
+        .maxRefAgeMs(1L)
+        .build();
+    Assert.assertEquals("Should be able to serialize tag with all fields",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testBranchToJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"branch\"}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L).build();
+    Assert.assertEquals("Should be able to serialize default branch",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testBranchToJsonAllFields() {
+    String json = 
"{\"snapshot-id\":1,\"type\":\"branch\",\"min-snapshots-to-keep\":2," +
+        "\"max-snapshot-age-ms\":3,\"max-ref-age-ms\":4}";
+    SnapshotRef ref = SnapshotRef.branchBuilder(1L)
+        .minSnapshotsToKeep(2)
+        .maxSnapshotAgeMs(3L)
+        .maxRefAgeMs(4L)
+        .build();
+    Assert.assertEquals("Should be able to serialize branch with all fields",
+        json, SnapshotRefParser.toJson(ref));
+  }
+
+  @Test
+  public void testTagFromJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\"}";
+    SnapshotRef ref = SnapshotRef.tagBuilder(1L).build();
+    Assert.assertTrue("Should be able to deserialize default tag",

Review comment:
       Agreed that it's currently not useful, in case the test fails we cannot 
easily tell the fields which caused the failure.
   
   Originally, I didn't want to commit to an equals/hashcode implementation yet 
as I didn't think we had complete clarity on what are 2 equal references (is it 
really all the fields being equal or is it just based on snapshot ID + type).I 
was thinking that equality may be deferred until we get to updating expiration 
logic in case the semantic of equality between references changes there and 
affects that logic.
   
   After more thinking I do not think that would be the case. So I am in favor 
of implementing equals/hashCode here based on all the fields and then testing 
based off that.

##########
File path: core/src/test/java/org/apache/iceberg/TestSnapshotRefParser.java
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestSnapshotRefParser {

Review comment:
       Yeah it's something I was looking at before, I was mainly following the 
existing parser test pattern where we don't do the fail case validation (let me 
know if I am missing something). My guess is previously the assumption is that 
this metadata would only be properly updated through the Iceberg library so 
such cases where the json string is null/empty, invalid fields, etc. indicates 
a corruption via an external mechanism.
   
    That being said, I only see benefits to adding these tests(we should define 
and validate how we behave in case the metadata file gets corrupted with 
invalid values rather than assume) so I will add them. 

##########
File path: core/src/test/java/org/apache/iceberg/TestSnapshotRefParser.java
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestSnapshotRefParser {

Review comment:
       Updated to include these cases. For the unknown field names, currently 
in the parsing logic we would simply ignore them as we are selecting the 
desired fields from the JSON node. I believe this should be the desired 
implementation but let me know if you feel otherwise.

##########
File path: core/src/test/java/org/apache/iceberg/TestSnapshotRefParser.java
##########
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestSnapshotRefParser {
+
+  @Test
+  public void testTagToJsonDefault() {
+    String json = "{\"snapshot-id\":1,\"type\":\"tag\"}";

Review comment:
       Like @jackye1995  mentioned, I think with the separation, we can see 
more clearly what the serialized representation looks like, and validate it 
looks like what we want; and this is independent of us validating the logic for 
when we deserialize and then validate the object looks like what we want. If we 
combine, and there's a test failure it would then require an investigation as 
to which part is the unexpected behavior whereas with the separation it forces 
us to have well defined outputs for each ser/de cases, and we can more quickly 
isolate which part isn't behaving as we expect.
   
   Let me know your thoughts!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] amogh-jahagirdar commented on a change in pull request #4036: Core: SnapshotRef parser and serialization logic

Reply via email to