Issue with Coder documentation regarding context

Aviem Zur Thu, 09 Feb 2017 10:13:01 -0800

Hi,

I think improvements can be made to the documentation of `encode` and
`decode` methods in `Coder`.


A coder may be used to encode/decode several objects using a single stream,
you cannot assume that the stream the coder encodes to/decodes from only
contains bytes representing a single object. For example, when the coder is
used in an `IterableCoder`, for example in `GroupByKey`.

When implementing a coder this needs to be taken into account.

The `context` argument in `encode` and `decode` methods provides the
necessary information.

The existing documentation for these methods does not seem to cover this.
If users are not aware of this when implementing these methods it can cause
errors or skewed results.

See:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L126
and:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L137

This is partially addressed in the documentation of the static `Context`
values:
`OUTER`:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L72
and `NESTED`:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L79

However, I think that the documentation of `encode` and `decode` should
explain this concept clearly, to avoid confusing users implementing coders.

Issue with Coder documentation regarding context

Reply via email to