Re: 2015: A Year in Review for Apache Flink

2015-12-30 Thread Chiwan Park
Happy New Year 2016 :)

> On Dec 31, 2015, at 11:22 AM, Henry Saputra  wrote:
> 
> Dear All,
> 
> It is almost end of 2015 and it has been busy and great year for Apache Flink 
> =)
> 
> Robert Metzger had posted great blog summarizing Apache Flink grow for
> this year:
> 
>  https://flink.apache.org/news/2015/12/18/a-year-in-review.html
> 
> Happy New Year everyone and thanks for being part of this great community!
> 
> 
> Thanks,
> 
> - Henry

Regards,
Chiwan Park





Re: Fold vs Reduce in DataStream API

2015-12-30 Thread Brian Chhun
Hi All,

Are certain considerations when using these functions on windowed streams?

>From reading the code, it looks using reduce (or another aggregation
function) on a windowed stream will pre-aggregate the result value as
elements are added to the window, keeping the size of window constant. On
the other hand, the fold function will accumulate elements into the window
and wait until the window is fired before computing the aggregation. Does
this sound correct?

On Thu, Nov 19, 2015 at 1:27 PM, Stephan Ewen  wrote:

> Hi Ron!
>
> Yes, we had to change a few things in the API between 0.9 and 0.10. The
> API in 0.9 had quite a few problems. This one now looks good, we are
> confident that it will stay.
>
> Greetings,
> Stephan
>
>
> On Thu, Nov 19, 2015 at 8:15 PM, Ron Crocker 
> wrote:
>
>> Thanks Stephan, that helps quite a bit. Looks like another one of those
>> API changes that I'll be struggling with for a little bit.
>>
>> On Thu, Nov 19, 2015 at 10:40 AM, Stephan Ewen  wrote:
>>
>>> Hi Ron!
>>>
>>> You are right, there is a copy/paste error in the docs, it should be a
>>> FoldFunction that is passed to fold(), not a ReduceFunction.
>>>
>>> In Flink-0.10, the FoldFunction is only available on
>>>
>>>   - KeyedStream (
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/KeyedStream.html#fold(R,%20org.apache.flink.api.common.functions.FoldFunction)
>>> )
>>>
>>>   - WindowedStream (
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/WindowedStream.html#fold(R,%20org.apache.flink.api.common.functions.FoldFunction,%20org.apache.flink.api.common.typeinfo.TypeInformation)
>>> )
>>>
>>> In most cases, you probably want the variant on the WindowedStream, if
>>> you aggregate values over time.
>>>
>>> 
>>>
>>> To the difference between fold() and reduce(): It is very subtle. The
>>> fold function can also convert to another type whenever it integrates a new
>>> element.
>>>
>>> Here is an example (with lists, not streams, but same principle).
>>>
>>> 
>>>
>>> ReduceFunction {
>>>
>>>   public Integer reduce(Integer a, Integer b) { return a + b; }
>>> }
>>>
>>> [1, 2, 3, 4, 5] -> reduce()  means: 1 + 2) + 3) + 4) + 5) = 15
>>>
>>> 
>>>
>>> FoldFunction {
>>>
>>>   public String fold(String current, Integer i) { return current +
>>> String.valueOf(i); }
>>> }
>>>
>>> [1, 2, 3, 4, 5] -> fold("start-")  means: ("start-" + 1) + 2) + 3) +
>>> 4) + 5) = "start-12345" (as a String)
>>>
>>>
>>> I hope that example illustrates the difference.
>>>
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Thu, Nov 19, 2015 at 7:00 PM, Ron Crocker 
>>> wrote:
>>>
 Hi Fabian -

 Thanks Fabian, that is a helpful description.

 That document WAS my source of information and it seems to also be the
 source of my confusion. Further, it appears to be wrong - there is a
 FoldFunction (
 https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/functions/FoldFunction.html)
 that should be passed into fold()?

 Separate note: fold() doesn't appear in the javadocs for 0.10.0
 DataStream (see
 https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/DataStream.html).
 So this made me look in the freshly-downloaded flink-streaming-java:0.10.0
 and fold() does not appear in org
 .apache.flink.streaming.api.datastream.DataStream either. Am I looking
 in the wrong place for it? In 0.9.1, it's located in that same class with
 this signature: fold(R initialValue, FoldFunction folder).

 Ron

 On Wed, Nov 18, 2015 at 9:39 AM, Fabian Hueske 
 wrote:

> Hi Ron,
>
> Have you checked:
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
> ?
>
> Fold is like reduce, except that you define a start element (of a
> different type than the input type) and the result type is the type of the
> initial value. In reduce, the result type must be identical to the input
> type.
>
> Best, Fabian
>
> 2015-11-18 18:32 GMT+01:00 Ron Crocker :
>
>> Is there a succinct description of the distinction between these
>> transforms?
>>
>
 --
 Ron Crocker
 Principal Software Engineer
 ( ( •)) New Relic
 rcroc...@newrelic.com
 M: +1 630 363 8835

>>>
>>>
>>
>>
>> --
>> Ron Crocker
>> Principal Software Engineer
>> ( ( •)) New Relic
>> rcroc...@newrelic.com
>> M: +1 630 363 8835
>>
>
>