[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balázs Németh updated BEAM-6954: -------------------------------- Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. ------- Okay, I have investigated further, and it seems the default value is indeed calculated before the json serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] ), because the isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) ... and then during deserialization it is found in the jsonOptions at ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] ), so it executes the getValueFromJson which uses an ObjectMapper to create a ValueProvider from a NullNode ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] ) . The problem here is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes. -> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] For this part of the issue see BEAM-6963 (that still doesn't solve this issue btw) was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. ------- Okay, I have investigated further, and it seems the default value is indeed calculated before the json serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] ), because the isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) ... and then during deserialization it is found in the jsonOptions at ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] ), so it executes the getValueFromJson which uses an ObjectMapper to create a ValueProvider from a NullNode ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] ) . The problem here is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes. -> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] That requires overriding the getNullValue(DeserializationContext ctxt) method. > @Default not called if the options json has null value for a property > --------------------------------------------------------------------- > > Key: BEAM-6954 > URL: https://issues.apache.org/jira/browse/BEAM-6954 > Project: Beam > Issue Type: Bug > Components: sdk-java-core > Affects Versions: 2.11.0 > Reporter: Balázs Németh > Priority: Major > > When a pipeline options get deserialized from a json with > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] > it creates a map, where properties present in the json - even if with a null > value - will be added to the map. > So we can have String->NullNode mappings. > When you create a ProxyInvocationHandler with this Map ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] > ) this map will be the backing jsonOptions map. > Later on when a getter is called on the options it will reach this code: > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] > > Then the containsKey will return true, even for NullNodes. So we won't > execute the getDefault() method hence not calculating the default value. > > I'm not sure about the expected behaviour, but either: > - the containsKey check should be expanded with an !isNull check > OR > - when we serialize the json, it shouldn't serialize null values at > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] > > Instinctively I would have expected the @Default.* annotations producing > values every single time, when the value is null - so a property with a > @Default.* annotation can't be null - but I was unable to find anything > explicit regarding this in the documentation. So I'm not sure which of the > suggested change has to be made. > ------- > Okay, I have investigated further, and it seems the default value is indeed > calculated before the json serialization by calling the mentioned method. The > problem is that it returns a RuntimeValueProvider, which gets serialized as > null ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] > ), because the isAccessible returns false ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] > + > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] > ) > ... and then during deserialization it is found in the jsonOptions at ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] > ), so it executes the getValueFromJson which uses an ObjectMapper to create > a ValueProvider from a NullNode ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] > ) . > The problem here is that according to the JsonDeserializer documentation, the > deserialize method isn't executed for null nodes. -> > [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] > For this part of the issue see BEAM-6963 (that still doesn't solve this issue > btw) -- This message was sent by Atlassian JIRA (v7.6.3#76005)