TheNeuralBit commented on a change in pull request #11955:
URL: https://github.com/apache/beam/pull/11955#discussion_r437784412
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
super(Row.class);
this.schema = schema;
}
+
+public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {
Review comment:
Similar comment here, `withIgnoreNullsOnWrite` and add a docstring. (I
think checkstyle will complain without the docstring anyway).
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
this.schema = schema;
}
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
+ this.allowMissingFields = allowMissing;
+ return this;
+ }
+
@Override
public Row deserialize(JsonParser jsonParser, DeserializationContext
deserializationContext)
throws IOException {
// Parse and convert the root object to Row as if it's a nested field
with name 'root'
return (Row)
extractJsonNodeValue(
- FieldValue.of("root", FieldType.row(schema),
jsonParser.readValueAsTree()));
+ FieldValue.of("root", FieldType.row(schema),
jsonParser.readValueAsTree(), allowMissingFields));
}
+
+
private static Object extractJsonNodeValue(FieldValue fieldValue) {
- if (!fieldValue.isJsonValuePresent()) {
Review comment:
I think you could just check `this.allowMissingFields` here rather than
passing it into all the `FieldValue` instances, no?
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
this.schema = schema;
}
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
Review comment:
nit: could you change this to `withAllowMissingFields`?
Also please add a docstring
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
super(Row.class);
this.schema = schema;
}
+
+public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {
Review comment:
I might call it "DropNullsOnWrite" instead of ignore, but I don't feel
strongly about it
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -375,6 +400,9 @@ private void writeRow(Row row, Schema schema, JsonGenerator
gen) throws IOExcept
for (int i = 0; i < schema.getFieldCount(); ++i) {
Field field = schema.getField(i);
Object value = row.getValue(i);
+if (ignoreNullsOnWrite && value == null){
+ continue;
+}
Review comment:
This should also check `field.getType().getNullable()` like the other
conditional. If we get a null for a non-nullable field we should fail loudly
rather than silently dropping it.
##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
this.schema = schema;
}
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
Review comment:
We might want to make this an enum so in the future there could be a
third mode where nulls _must_ be encoded with a missing field, and having a
null field value would be considered an error. The mode you've added here is a
permissive middle ground where we allow either one.
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org