On 02/17/2014 01:17 AM, Stuart Marks wrote:
On 2/14/14 9:43 AM, David M. Lloyd wrote:
On 02/14/2014 09:56 AM, David M. Lloyd wrote:
In the JDK, java.util.Date does not read/write fields.  Perhaps others
as well.  Given that the behavior is presently undefined, that means the
serialized representation of java.util.Date (and any other such
non-conforming classes) are also undefined.

An interesting detail here - since Date doesn't have any non-transient
fields,
this happens to work out OK for a second reason (that
defaultReadFields() would
read nothing anyway) - however it still would break if a non-transient
field
were to be added.

Hi David,

(coming late to this party)

Thanks for replying, comments below.

Thanks for pointing out these clauses in the serialization
specification. I always knew that these methods "should" behave this way
but I was unaware of the undefined qualification in the spec, and I was
also unaware that even JDK classes like java.util.Date have
readObject/writeObject methods that don't fulfil this requirement.

That's the case that blew up for us first, so that's how I knew about it. For a little background, we (JBoss) have a custom serialization wire protocol which nevertheless follows the serialization spec as exactly as possible from an API point of view. When I was studying the specification, I noted this bit and decided our action would be the #2 option (throw an exception when this happens), which effectively let me decide what to do later. This was when we discovered that java.util.Date has this issue.

However I noted at the time that since this class has no non-transient fields, allowing this case is actually a relatively low risk. So we began allowing classes with no fields to skip the put/getFields step, while still throwing an exception if there are fields defined (at either end). This got us past the java.util.Date case (which is a particularly problematic case given how pervasive it appears to be in user code, particularly Java EE code).

Recently I "fixed" the remaining cases in our implementation by doing the read/discard stuff I describe in the #4 option. I had realized that doing so is backwards compatible with what we had been doing (basically replacing the exception with "safe" behavior). This works out very well for us, as you can imagine. :)

The problem with doing this in the JDK is, as you know, that there are/will be serialized blobs out there that would not include field information that would then be deserialized with implementations that expect it, and vice-versa. I wonder if perhaps it would be a good idea to make this behavior configurable though, for cases where both ends are under the same control. Or, perhaps it could be a protocol version thing (though IIRC there isn't a great facility for forwards-compatibility in the event that an "old" deserializer reads a "new" stream - how was this done for enums I wonder?).

I also think you're right that these problems are widespread. A recent
blog post on serialization [1] has some sample code whose
readObject/writeObject methods don't fulfil this requirement either.

Yeah I wouldn't be surprised. We recently encountered at least one class in one of the Bouncy Castle security provider packages as well with this issue - except that class has fields.

On the other hand, this requirement doesn't seem to appear in the
javadoc anyplace that I can find.

Yeah this is quite unfortunate.

In your initial post, you said that problems with the serialization
specification that have caused user problems. Can you be more specific
about what these problems were?

As I outlined above. We want to provide a performant, and more importantly, a safe experience for users; our custom protocol enabled the former and our conservative behavior allowed the latter. We have what turns out to be a quite good solution for our own case, but this doesn't help the general community much.

In another message earlier in this thread, you had made a few suggestions:

1) do nothing :(
2) start throwing (or writing) an exception in write/readObject when
stream ops are performed without reading fields (maybe can be disabled
with a sys prop or something)
3) leave fields cleared and risk protocol issues
4) silently start reading/writing empty field information (risks
protocol issues)

[trim response for #2 and #4 which I agree with]

#3 leads me to mention another area of the serialization specification
that *is* well-defined, which is what occurs if fields are added or
removed from one object version to the next. This is covered in sections
5.6.1 and 5.6.2 of the spec. [2] Briefly, if the current object has
fields for which values are not present in the serialization stream,
those fields are initialized to their default values (zero, null,
false). Does this have any bearing on the issues you're concerned about?
(It doesn't say so very explicitly, but field data that appears in the
serialized form is ignored if there is no corresponding field in the
current class.)

Yeah this is consistent with my understanding of the specification. I think it is also implied in 3.4, "If the class being restored is not present in the stream being read, then its readObjectNoData method, if defined, is invoked (instead of readObject); otherwise, its fields are initialized to the appropriate default values. For further detail, see section 3.5."

Wire protocol aside, this requirement is what led to my #4 proposal: from an API perspective, the behavior is no different. The fields are initialized to default zero values. But it enables users to add get/put field operations later on without protocol corruption.

Of course this change would be essentially protocol-incompatible with the current Oracle/OpenJDK implementation, as was pointed out already in various ways, unless as I said there is some clever way that I can't immediately see to let this work.

Finally, another suggestion that might help with these issues is not to
change the JDK, but to use static analysis tools such as FindBugs to
help programmers identify problem code.

Yeah, this is basically the status quo I think. It would however be extra nice if the compiler could emit a warning at the least, instead of relying on external static analysis tools, given how central serialization is. The necessary static analysis should be relatively simple, from what little I recall of the compiler internals.

One possible bad side-effect would be that people might "fix" their broken classes without realizing they've just broken their wire compatibility though. Such a tool should probably emit a message to the effect of "classes without OOS.dWO/wF should only have transient fields" instead of "this class should have a OOS.dWO/wF call". While adding transient to fields of such classes would technically change the serialized form of the class, that form is never used in any event, so it's an "okay" change, and decreases the serialized class descriptor size besides... and in the long term may even allow adding a dWO/dRO pair in the future, depending on the compatibility needs of the user for that class.

--
- DML

Reply via email to