[GitHub] [beam] TheNeuralBit commented on a change in pull request #11955: [BEAM-10220] add support for implicit nulls for converting between beam rows and json

2020-06-09 Thread GitBox


TheNeuralBit commented on a change in pull request #11955:
URL: https://github.com/apache/beam/pull/11955#discussion_r437784412



##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
   super(Row.class);
   this.schema = schema;
 }
+  
+public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {

Review comment:
   Similar comment here, `withIgnoreNullsOnWrite` and add a docstring. (I 
think checkstyle will complain without the docstring anyway).

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
   this.schema = schema;
 }
 
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
+  this.allowMissingFields = allowMissing;
+  return this;
+  }
+
 @Override
 public Row deserialize(JsonParser jsonParser, DeserializationContext 
deserializationContext)
 throws IOException {
 
   // Parse and convert the root object to Row as if it's a nested field 
with name 'root'
   return (Row)
   extractJsonNodeValue(
-  FieldValue.of("root", FieldType.row(schema), 
jsonParser.readValueAsTree()));
+  FieldValue.of("root", FieldType.row(schema), 
jsonParser.readValueAsTree(), allowMissingFields));
 }
 
+  
+
 private static Object extractJsonNodeValue(FieldValue fieldValue) {
-  if (!fieldValue.isJsonValuePresent()) {

Review comment:
   I think you could just check `this.allowMissingFields` here rather than 
passing it into all the `FieldValue` instances, no?

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
   this.schema = schema;
 }
 
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){

Review comment:
   nit: could you change this to `withAllowMissingFields`? 
   
   Also please add a docstring

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -362,6 +382,11 @@ private RowJsonSerializer(Schema schema) {
   super(Row.class);
   this.schema = schema;
 }
+  
+public RowJsonSerializer ignoreNullsOnWrite(Boolean ignoreNullsOnWrite) {

Review comment:
   I might call it "DropNullsOnWrite" instead of ignore, but I don't feel 
strongly about it 

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -375,6 +400,9 @@ private void writeRow(Row row, Schema schema, JsonGenerator 
gen) throws IOExcept
   for (int i = 0; i < schema.getFieldCount(); ++i) {
 Field field = schema.getField(i);
 Object value = row.getValue(i);
+if (ignoreNullsOnWrite && value == null){
+  continue;
+}

Review comment:
   This should also check `field.getType().getNullable()` like the other 
conditional. If we get a null for a non-nullable field we should fail loudly 
rather than silently dropping it.

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
   this.schema = schema;
 }
 
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){

Review comment:
   We might want to make this an enum so in the future there could be a 
third mode where nulls _must_ be encoded with a missing field, and having a 
null field value would be considered an error. The mode you've added here is a 
permissive middle ground where we allow either one.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #11955: [BEAM-10220] add support for implicit nulls for converting between beam rows and json

2020-06-15 Thread GitBox


TheNeuralBit commented on a change in pull request #11955:
URL: https://github.com/apache/beam/pull/11955#discussion_r440372485



##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
##
@@ -185,18 +186,25 @@ private RowJsonDeserializer(Schema schema) {
   this.schema = schema;
 }
 
+public RowJsonDeserializer allowMissingFields(Boolean allowMissing){
+  this.allowMissingFields = allowMissing;
+  return this;
+  }
+
 @Override
 public Row deserialize(JsonParser jsonParser, DeserializationContext 
deserializationContext)
 throws IOException {
 
   // Parse and convert the root object to Row as if it's a nested field 
with name 'root'
   return (Row)
   extractJsonNodeValue(
-  FieldValue.of("root", FieldType.row(schema), 
jsonParser.readValueAsTree()));
+  FieldValue.of("root", FieldType.row(schema), 
jsonParser.readValueAsTree(), allowMissingFields));
 }
 
+  
+
 private static Object extractJsonNodeValue(FieldValue fieldValue) {
-  if (!fieldValue.isJsonValuePresent()) {

Review comment:
   Ah yeah I didn't notice it was static. I think making it non-static is 
the right call, thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org