How should we move forward on this? The idea looks good, and even
comes with a PR to review. Any objections to the names?
On Wed, Dec 5, 2018 at 12:48 PM Jeff Klukas <[email protected]> wrote:
>
> Reminder that I'm looking for review on
> https://github.com/apache/beam/pull/7160
>
> On Thu, Nov 29, 2018, 11:48 AM Jeff Klukas <[email protected] wrote:
>>
>> I created a JIRA and a PR for this:
>>
>> https://issues.apache.org/jira/browse/BEAM-6150
>> https://github.com/apache/beam/pull/7160
>>
>> On naming, I'm proposing that SerializableFunction extend ProcessFunction
>> (since this new superinterface is particularly appropriate for user code
>> executed inside a ProcessElement method) and that SimpleFunction extend
>> InferableFunction (since type information and coder inference are what
>> distinguish this from ProcessFunction).
>>
>> We originally discussed deprecating SerializableFunction and SimpleFunction
>> in favor of the new types, but there appear to be two fairly separate use
>> cases for SerializableFunction. It's either defining user code that will be
>> executed in a DoFn, in which case I think we always want to prefer the new
>> interface that allows declared exceptions. But it's also used where the code
>> is to be executed as part of pipeline construction, in which case it may be
>> reasonable to want to restrict checked exceptions. In any case, deprecating
>> SerializableFunction and SimpleFunction can be discussed further in the
>> future.
>>
>>
>> On Wed, Nov 28, 2018 at 9:53 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>> Nice! A clean solution and an opportunity to bikeshed on names. This has
>>> everything I love.
>>>
>>> Kenn
>>>
>>> On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas <[email protected]> wrote:
>>>>
>>>> It looks like we can add make the new interface a superinterface for the
>>>> existing SerializableFunction while maintaining binary compatibility [0].
>>>>
>>>> We'd have:
>>>>
>>>> public interface NewSerializableFunction<InputT, OutputT> extends
>>>> Serializable {
>>>> OutputT apply(InputT input) throws Exception;
>>>> }
>>>>
>>>> and then modify SerializableFunction to inherit from it:
>>>>
>>>> public interface SerializableFunction<InputT, OutputT> extends
>>>> NewSerializableFunction<InputT, OutputT>, Serializable {
>>>> @Override
>>>> OutputT apply(InputT input);
>>>> }
>>>>
>>>>
>>>> IIUC, we can then more or less replace all references to
>>>> SerializableFunction with NewSerializableFunction across the beam codebase
>>>> without having to introduce any new overrides. I'm working on a proof of
>>>> concept now.
>>>>
>>>> It's not clear what the actual names for NewSerializableFunction and
>>>> NewSimpleFunction should be.
>>>>
>>>> [0]
>>>> https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html#jls-13.4.4
>>>>
>>>>
>>>> On Mon, Nov 26, 2018 at 9:54 PM Thomas Weise <[email protected]> wrote:
>>>>>
>>>>> +1 for introducing the new interface now and deprecating the old one. The
>>>>> major version change then provides the opportunity to remove deprecated
>>>>> code.
>>>>>
>>>>>
>>>>> On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik <[email protected]> wrote:
>>>>>>
>>>>>> Before 3.0 we will still want to introduce this giving time for people
>>>>>> to migrate, would it make sense to do that now and deprecate the
>>>>>> alternatives that it replaces?
>>>>>>
>>>>>> On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas <[email protected]> wrote:
>>>>>>>
>>>>>>> Picking up this thread again. Based on the feedback from Kenn, Reuven,
>>>>>>> and Romain, it sounds like there's no objection to the idea of
>>>>>>> SimpleFunction and SerializableFunction declaring that they throw
>>>>>>> Exception. So the discussion at this point is about whether there's an
>>>>>>> acceptable way to introduce that change.
>>>>>>>
>>>>>>> IIUC correctly, Kenn was suggesting that we need to ensure backwards
>>>>>>> compatibility for existing user code both at runtime and recompile,
>>>>>>> which means we can't simply add the declaration to the existing
>>>>>>> interfaces, since that would cause errors at compile time for user code
>>>>>>> directly invoking SerializableFunction instances.
>>>>>>>
>>>>>>> I don't see an obvious way that introducing a new functional interface
>>>>>>> would help without littering the API with more variants (it's already a
>>>>>>> bit confusing that i.e. MapElements has multiple via() methods to
>>>>>>> support three different function interfaces).
>>>>>>>
>>>>>>> Perhaps this kind of cleanup is best left for Beam 3.0?
>>>>>>>
>>>>>>> On Mon, Oct 15, 2018 at 6:51 PM Reuven Lax <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Compilation compatibility is a big part of what Beam aims to provide
>>>>>>>> with its guarantees. Romain makes a good point that most users are not
>>>>>>>> invoking SeralizableFunctions themselves (they are usually invoked
>>>>>>>> inside of Beam classes such as MapElements), however I suspect some
>>>>>>>> users do these things.
>>>>>>>>
>>>>>>>> On Sun, Oct 14, 2018 at 2:38 PM Kenneth Knowles <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Romain has brought up two good aspects of backwards compatibility:
>>>>>>>>>
>>>>>>>>> - runtime replacement vs recompile
>>>>>>>>> - consumer (covariant) vs producer (contravariant, such as
>>>>>>>>> interfaces a user implements)
>>>>>>>>>
>>>>>>>>> In this case, I think the best shoice is covariant and contravariant
>>>>>>>>> (invariant) backwards compat including recompile compat. But we
>>>>>>>>> shouldn't assume there is one obvious definition of "backwards
>>>>>>>>> compatibility".
>>>>>>>>>
>>>>>>>>> Does it help to introduce a new functional interface?
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Sun, Oct 14, 2018 at 6:25 AM Romain Manni-Bucau
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Beam does not catch Exception for function usage so it will have to
>>>>>>>>>> do it in some places.
>>>>>>>>>>
>>>>>>>>>> A user does not have to execute the function so worse case it
>>>>>>>>>> impacts tests and in any case the most important: it does not impact
>>>>>>>>>> the user until it recompiles the code (= runtime is not impacted).
>>>>>>>>>>
>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Le dim. 14 oct. 2018 à 15:19, Reuven Lax <[email protected]> a écrit :
>>>>>>>>>>>
>>>>>>>>>>> What in Beam codebase is not ready, and how do we know that user
>>>>>>>>>>> code doesn't have the same issue?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 14, 2018 at 6:04 AM Romain Manni-Bucau
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, tested also and it works until something in the codeflow does
>>>>>>>>>>>> not respect that constraint - see
>>>>>>>>>>>> com.sun.tools.javac.comp.Flow.FlowAnalyzer#errorUncaught. In other
>>>>>>>>>>>> words beam codebase is not ready for that and will make it fail
>>>>>>>>>>>> but it is ok cause we can fix it but user code does not rely on
>>>>>>>>>>>> that so it is fine to update it normally.
>>>>>>>>>>>>
>>>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Le dim. 14 oct. 2018 à 14:39, Reuven Lax <[email protected]> a
>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just tried it, doesn't appear to work :(
>>>>>>>>>>>>>
>>>>>>>>>>>>> error: unreported exception Exception; must be caught or declared
>>>>>>>>>>>>> to be thrown
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Oct 14, 2018 at 1:55 AM Romain Manni-Bucau
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> not with java>=8 AFAIK
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le dim. 14 oct. 2018 à 10:49, Reuven Lax <[email protected]> a
>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But it means that other functions that call
>>>>>>>>>>>>>>> SerializableFunctions must now declare exceptions, right? If
>>>>>>>>>>>>>>> yes, this is incompatible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Oct 14, 2018 at 1:37 AM Romain Manni-Bucau
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, only parameter types and return type is used to lookup
>>>>>>>>>>>>>>>> methods.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Le dim. 14 oct. 2018 à 09:45, Reuven Lax <[email protected]> a
>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've run into this problem before as well. Doesn't changing
>>>>>>>>>>>>>>>>> the signature involve a backwards-incompatible change though?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 3, 2018 at 5:11 PM Jeff Klukas
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm working on
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-5638 to add
>>>>>>>>>>>>>>>>>> exception handling options to single message transforms in
>>>>>>>>>>>>>>>>>> the Java SDK.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> MapElements' via() method is overloaded to accept either a
>>>>>>>>>>>>>>>>>> SimpleFunction, a SerializableFunction, or a Contextful, all
>>>>>>>>>>>>>>>>>> of which are ultimately stored as a Contextful where the
>>>>>>>>>>>>>>>>>> mapping functionis expected to have signature:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OutputT apply(InputT element, Context c) throws Exception;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So Contextful.Fn allows throwing checked exceptions, but
>>>>>>>>>>>>>>>>>> neither SerializableFunction nor SimpleFunction do. The
>>>>>>>>>>>>>>>>>> user-provided function has to satisfy the more restrictive
>>>>>>>>>>>>>>>>>> signature:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OutputT apply(InputT input);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there background about why we allow arbitrary checked
>>>>>>>>>>>>>>>>>> exceptions to be thrown in one case but not the other two?
>>>>>>>>>>>>>>>>>> Could we consider expanding SerializableFunction and
>>>>>>>>>>>>>>>>>> SimpleFunction to the following?:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OutputT apply(InputT input) throws Exception;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This would, for example, simplify the implementation of
>>>>>>>>>>>>>>>>>> ParseJsons and AsJsons, where we have to catch an
>>>>>>>>>>>>>>>>>> IOException in MapElements#via only to rethrow as
>>>>>>>>>>>>>>>>>> RuntimeException.