I like it. Not sure if my vote counts ;)

On 05/12/2015 07:18 AM, Aljoscha Krettek wrote:
> My proposal for the runtime classes (per my Pull Request) is this:
>
> StreamTask: base of streaming tasks, the task is the AbstractInvokable
> that runs in the TaskManager and invokes stream operators.
> OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
> subclasses responsible for actual types of operations.
>
> StreamOperator: interface for StreamOperators such as Map, Reduce and so on.
> OneInputOperator and TwoInputStreamOperator are the interfaces for
> operators with one input and two inputs respectively.
>
> There is also AbstractStreamOperator, which provides basic
> implementations for methods such as setup()/open()/close(), and
> AbstractUdfStreamOperator, which is derived from
> AbstractStreamOperator. This is for operators that have user code; it
> deals with calling the correct functions of RichUserFunctionS
> (open()/close()/setRuntimeContext()).
>
> I realised that we should probably not rename all the actual operators
> and remove the Stream prefix and suffix; that would be too big a change
> and orthogonal to my current PR. Other people can do it if they want.
>
> These are just my suggestions. Please suggest other consistent naming
> schemes if you think mine are bad.
>
> On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <se...@apache.org> wrote:
>> How about separating the discussion about runtime class renaming (there
>> seems to be consensus) from the
>> API class renaming (no consensus yet)?
>>
>> To go ahead with the runtime classes, can you make a concrete suggestion
>> for more memorable/descriptive names?
>>
>> For the API classes, kick off a thread, if you want, but please clearly
>> mark in your discussion that this is about an API breaking change
>> to a user-facing API (that is still declared beta).
>>
>>
>> On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Come to think of it, why do we even need SingleOutputStreamOperator?
>>> It is just a subclass of DataStream that has almost no functionality
>>> that couldn't be implemented in DataStream. I think it makes people
>>> wonder why the result of a transformation is not a DataStream but this
>>> mouthful of a class.
>>>
>>> And, in light of other possibilities such as MapDriver and PactDriver, I
>>> am quite happy with calling the things StreamOperator and StreamMap.
>>> :D
>>>
>>> On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.mar...@gmail.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> I am in favor of removing the Stream (or Streaming) suffixes and prefixes.
>>>> I think that Gyula was also referring to those.
>>>>
>>>> I think the naming of the Tasks and the user-facing operators
>>>> (SingleOutputStreamOperator and alike) is fine.
>>>>
>>>> As for the other bunch of Operators, we could name them Drivers to be mostly
>>>> in line with the batch naming. By the way, most of the classes do not have
>>>> "Operator" in their name currently - e.g. the one encapsulating the map
>>>> functionality is called StreamMap, however the base classes (StreamOperator
>>>> and ChainableStreamOperator) have it in their name explicitly. I could go
>>>> with MapDriver instead of StreamMap, ChainableStreamOperator will be
>>>> eliminated anyway - StreamOperator needs a new name then: worst case
>>>> scenario PactDriver. :)
>>>>
>>>> As for n-ary operators I agree with Gyula.
>>>>
>>>> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljos...@apache.org>
>>>> wrote:
>>>>
>>>>> Which name changes are you referring to? The proposed names in my
>>>>> recent PR? Or the dropping of Stream from all the classes? For the
>>>>> rest I was just rambling about how I don't like the names in the batch
>>>>> API.
>>>>> :D
>>>>>
>>>>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>> Generally I am in favor of making these name changes. My only concern is
>>>>>> regarding the one-input and multiple-input operators.
>>>>>>
>>>>>> There is a general problem with the n-ary operators regarding type safety,
>>>>>> that's why we now have SingleInput and Co (two-input) operators. I think we
>>>>>> should keep these.
>>>>>>
>>>>>> On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljos...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> since I'm currently reworking the Stream operators I thought it's a
>>>>>>> good time to talk about the naming of some classes. We have some
>>>>>>> legacy problems with lots of Operators, OperatorBases, TwoInput,
>>>>>>> OneInput, Unary, Binary, etc. And maybe we can break things in
>>>>>>> streaming to have more consistent and future-proof naming.
>>>>>>>
>>>>>>> In streaming, there are:
>>>>>>> - Tasks, these are an AbstractInvokable and contain the main loop of a
>>>>>>> streaming vertex. They read from the inputs and forward data to the
>>>>>>> operator implementation.
>>>>>>>
>>>>>>> - Operators, these are invoked by a Task and are responsible for the
>>>>>>> actual logic of the operator. Think Map, Join, Reduce and so on. These
>>>>>>> are responsible for calling the user-defined function.
>>>>>>>
>>>>>>> - Operators (again, I know), these are user-facing classes (some
>>>>>>> derived from DataStream, some not). There is for example
>>>>>>> SingleOutputStreamOperator, for the result of a DataStream
>>>>>>> transformation that has a single output. There are also
>>>>>>> TemporalOperator and its derived classes StreamCrossOperator and
>>>>>>> StreamJoinOperator. The actual operator inside a task (the ones I
>>>>>>> mentioned before that are responsible for the user logic) that
>>>>>>> executes a temporal join is called CoStreamWindow (with a
>>>>>>> JoinWindowFunction).
>>>>>>>
>>>>>>> As I currently have it in my PR, there are two Task classes, one for
>>>>>>> single-input, and one for two-input operators. There are also the
>>>>>>> corresponding operator interfaces for unary and binary operators (see
>>>>>>> what I did there ... :D).
>>>>>>>
>>>>>>> What should we call all these classes (concepts)? Also I'm heavily in
>>>>>>> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>>>>>>> from the class names. I know I'm in streaming because the package is
>>>>>>> named streaming. And we should not restrain ourselves because the
>>>>>>> batch API also has things called operator.
>>>>>>>
>>>>>>> Also, the concept of one-input, two-input tasks and operators is not
>>>>>>> very scalable. Maybe we should have a single interface for operators
>>>>>>> that has a receiveElement(int, element) method that tells the operator
>>>>>>> from which input an element came. Then we can scale this to n-ary
>>>>>>> operators. This would of course have the overhead of always sending
>>>>>>> along the number of the input instead of encoding the input number in
>>>>>>> the method name, such as receiveElement1() and receiveElement2().
>>>>>>>
>>>>>>> Any thoughts? :D (I know I'm writing the long annoying emails today
>>>>>>> but I think it is important we discuss these things before being stuck
>>>>>>> with them.)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Aljoscha
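
For readers following the n-ary discussion in this thread, here is a minimal Java sketch of the trade-off between a single receiveElement(int, element) method and the typed receiveElement1()/receiveElement2() pair, including the type-safety concern raised for n-ary operators. All interface and class names below (TwoInputOperatorSketch, NaryOperatorSketch, TypedLogger, NaryLogger) are hypothetical and made up for illustration; they are not the classes from the PR or from Flink.

```java
// Hypothetical sketch only: none of these types are Flink API.

/** Two-input variant: the input number is encoded in the method name. */
interface TwoInputOperatorSketch<IN1, IN2> {
    void receiveElement1(IN1 element);
    void receiveElement2(IN2 element);
}

/** N-ary variant: one method, the input index travels with every element. */
interface NaryOperatorSketch {
    void receiveElement(int inputIndex, Object element);
}

/** Typed operator: the compiler checks the element type of each input. */
final class TypedLogger implements TwoInputOperatorSketch<String, Integer> {
    private final StringBuilder log = new StringBuilder();

    @Override
    public void receiveElement1(String name) {
        log.append("name=").append(name).append(';');
    }

    @Override
    public void receiveElement2(Integer score) {
        log.append("score=").append(score).append(';');
    }

    String log() {
        return log.toString();
    }
}

/** Same logic on the n-ary interface: the type checks become run-time casts. */
final class NaryLogger implements NaryOperatorSketch {
    private final StringBuilder log = new StringBuilder();

    @Override
    public void receiveElement(int inputIndex, Object element) {
        switch (inputIndex) {
            case 0: {
                String name = (String) element; // may throw ClassCastException
                log.append("name=").append(name).append(';');
                break;
            }
            case 1: {
                Integer score = (Integer) element; // may throw ClassCastException
                log.append("score=").append(score).append(';');
                break;
            }
            default:
                throw new IllegalArgumentException("unknown input " + inputIndex);
        }
    }

    String log() {
        return log.toString();
    }
}
```

With the typed variant, passing an Integer to receiveElement1 does not compile. With the n-ary variant, receiveElement(0, 42) compiles and only fails at run time with a ClassCastException, which is one way to read the type-safety objection above. The n-ary form does scale to any number of inputs without adding new interfaces.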