[ 
https://issues.apache.org/jira/browse/BEAM-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099290#comment-16099290
 ] 

ASF GitHub Bot commented on BEAM-1820:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3634

    [BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException

    This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but 
it turned out that a lot more needs to be done to fix it properly, and the 
necessary changes require knowledge of sufficiently dark corners of the SDK and 
runners that asking an external contributor to do this is unfair, so I did the 
changes myself.
    
    TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.
    
    Source.getDefaultOutputCoder is only a hint for inferring the Coder of its 
elements, and it should not be required to succeed.
    
    E.g. when a runner is replacing a Read.from(source) transform, and the 
override needs to know a coder for elements of the source, if the source 
doesn't provide a coder, the user may have set a coder on the returned 
PCollection explicitly. In this case, the runner should use that coder.
    
    In other cases, when an API uses a Source and needs its coder, it must let 
the caller provide a Coder explicitly (e.g. SourceTestUtils).
    
    I am treating this as a backwards-compatible change because 
Source.getDefaultOutputCoder() is a runner-facing rather than user-facing API. 
From the point of view of a user implementing a Source, adding a throws to the 
base class method signature is compatible.
    
    R: @tgroh 
    CC: @lgajowy


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam source-coder-fallback

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3634
    
----
commit 830c5657736ffe49e06e16039f947e7227fa45a7
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-07-24T23:38:27Z

    [BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException
    
    This is based on https://github.com/apache/beam/pull/3549 by @lgajowy,
    but it turned out that a lot more needs to be done to fix it properly,
    and the necessary changes require knowledge of sufficiently dark
    corners of the SDK and runners that asking an external contributor
    to do this is unfair, so I did the changes myself.
    
    TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.
    
    Source.getDefaultOutputCoder is only a hint for inferring the Coder
    of its elements, and it should not be required to succeed.
    
    E.g. when a runner is replacing a Read.from(source) transform,
    and the override needs to know a coder for elements of the source,
    if the source doesn't provide a coder, the user may have set
    a coder on the returned PCollection explicitly. In this case,
    the runner should use that coder.
    
    In other cases, when an API uses a Source and needs its coder,
    it must let the caller provide a Coder explicitly (e.g.
    SourceTestUtils).

----


> Source.getDefaultOutputCoder() should be @Nullable
> --------------------------------------------------
>
>                 Key: BEAM-1820
>                 URL: https://issues.apache.org/jira/browse/BEAM-1820
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Ɓukasz Gajowy
>              Labels: easyfix, starter
>
> Source.getDefaultOutputCoder() returns a coder for elements produced by the 
> source.
> However, the Source objects are nearly always hidden from the user and 
> instead encapsulated in a transform. Often, an enclosing transform has a 
> better idea of what coder should be used to encode these elements (e.g. a 
> user supplied a Coder to that transform's configuration). In that case, it'd 
> be good if Source.getDefaultOutputCoder() could just return null, and coder 
> would have to be handled by the enclosing transform or perhaps specified on 
> the output of that transform explicitly.
> Right now there's a bunch of code in the SDK and runners that assumes 
> Source.getDefaultOutputCoder() returns non-null. That code would need to be 
> fixed to instead use the coder set on the collection produced by 
> Read.from(source).
> It all appears pretty easy to fix, so this is a good starter item.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to