GitHub user jkff opened a pull request: https://github.com/apache/beam/pull/3634
[BEAM-1820] Source.getDefaultOutputCoder() throws CannotProvideCoderException This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but it turned out that a lot more needs to be done to fix it properly, and the necessary changes require knowledge of sufficiently dark corners of the SDK and runners that asking an external contributor to do this is unfair, so I did the changes myself. TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to succeed. Source.getDefaultOutputCoder is only a hint for inferring the Coder of its elements, and it should not be required to succeed. E.g. when a runner is replacing a Read.from(source) transform, and the override needs to know a coder for elements of the source, if the source doesn't provide a coder, the user may have set a coder on the returned PCollection explicitly. In this case, the runner should use that coder. In other cases, when an API uses a Source and needs its coder, it must let the caller provide a Coder explicitly (e.g. SourceTestUtils). I am treating this as a backwards-compatible change because Source.getDefaultOutputCoder() is a runner-facing rather than user-facing API. From the point of view of a user implementing a Source, adding a throws to the base class method signature is compatible. R: @tgroh CC: @lgajowy You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkff/incubator-beam source-coder-fallback Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3634.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3634 ---- commit 830c5657736ffe49e06e16039f947e7227fa45a7 Author: Eugene Kirpichov <kirpic...@google.com> Date: 2017-07-24T23:38:27Z [BEAM-1820] Source.getDefaultOutputCoder() throws CannotProvideCoderException This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but it turned out that a lot more needs to be done to fix it properly, and the necessary changes require knowledge of sufficiently dark corners of the SDK and runners that asking an external contributor to do this is unfair, so I did the changes myself. TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to succeed. Source.getDefaultOutputCoder is only a hint for inferring the Coder of its elements, and it should not be required to succeed. E.g. when a runner is replacing a Read.from(source) transform, and the override needs to know a coder for elements of the source, if the source doesn't provide a coder, the user may have set a coder on the returned PCollection explicitly. In this case, the runner should use that coder. In other cases, when an API uses a Source and needs its coder, it must let the caller provide a Coder explicitly (e.g. SourceTestUtils). ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---