Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Stephan Ewen
We will definitely also try to get the chaining overhead down a bit.

By the way, to reach this kind of throughput, you need sources that can
produce data very fast.


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan,

Cheers


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-04 Thread Welly Tambunan
Hi Stephan,

Thanks for your clarification.

Basically, we will have lots of sensors that push this kind of data to a
queuing system (currently we are using RabbitMQ, but we will soon move to
Kafka).
We will also use the same pipeline to process the historical data.

I also want to minimize the chaining, since the filter is doing very little
work. By shortening the pipeline we can reduce hits on the DB/external sources
and cache local data efficiently.

Cheers


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-03 Thread Stephan Ewen
In a set of benchmarks a while back, we found that the chaining mechanism
has some overhead right now, because of its abstraction. The abstraction
creates iterators for each element and makes it hard for the JIT to
specialize on the operators in the chain.

For purely local chains at full speed, this overhead is observable (it can
decrease throughput from 25 million elements per core to 15-20 million
elements per core). If your job does not reach that throughput, or is I/O
bound, source bound, etc., it does not matter.

If you care about super high performance, collapsing the code into one
function helps.
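
As a rough illustration of what "collapsing the code into one function" changes structurally, here is a minimal plain-Java sketch. It does not use Flink and does not reproduce Flink's chaining internals; the Output interface, the Counter class, and the method names are all hypothetical stand-ins. It only counts how often an element is handed across an operator boundary, and it is not a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedVsFused {

    // Hypothetical stand-in for the hand-off between two chained operators.
    interface Output<T> {
        void collect(T value);
    }

    // Shared counter for the number of hand-offs across operator boundaries.
    static final class Counter {
        long handOffs;
    }

    // Wraps an output so that every call through it is counted.
    static <T> Output<T> counted(Counter c, Output<T> next) {
        return value -> {
            c.handOffs++;
            next.collect(value);
        };
    }

    // Chained shape: source -> filter -> map -> sink, each arrow is one hand-off.
    static long runChained(int[] input, List<Integer> sink) {
        Counter c = new Counter();
        Output<Integer> toSink = counted(c, (Integer v) -> sink.add(v));
        Output<Integer> map = counted(c, (Integer v) -> toSink.collect(v * 2));
        Output<Integer> filter = counted(c, (Integer v) -> {
            if (v % 2 == 0) {
                map.collect(v);
            }
        });
        for (int v : input) {
            filter.collect(v);
        }
        return c.handOffs;
    }

    // Fused shape: one function filters and transforms, so one boundary disappears.
    static long runFused(int[] input, List<Integer> sink) {
        Counter c = new Counter();
        Output<Integer> toSink = counted(c, (Integer v) -> sink.add(v));
        Output<Integer> fused = counted(c, (Integer v) -> {
            if (v % 2 == 0) {
                toSink.collect(v * 2);
            }
        });
        for (int v : input) {
            fused.collect(v);
        }
        return c.handOffs;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4};
        List<Integer> chainedSink = new ArrayList<>();
        List<Integer> fusedSink = new ArrayList<>();
        System.out.println("chained hand-offs: " + runChained(input, chainedSink));
        System.out.println("fused hand-offs:   " + runFused(input, fusedSink));
        System.out.println("same results: " + chainedSink.equals(fusedSink));
    }
}
```

Both shapes produce the same results; the fused one simply passes each element through fewer interface calls, which is the kind of per-element cost that only shows up when nothing else limits throughput.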

On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan  wrote:

> Hi Gyula,
>
> Thanks for your response. It seems I will use filter and map for now, as that
> makes the intention really clear and is not a big performance hit.
>
> Thanks again.
>
> Cheers


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-03 Thread Welly Tambunan
Hi Stephan,

That's good information to know. We will hit that throughput easily. Our
computation graph has a lot of chaining like this right now.
I think it's safe to minimize the chain right now.

Thanks a lot for this, Stephan.

Cheers


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-02 Thread Welly Tambunan
Hi All,

I would like to filter some items from the event stream. I think there are
two ways of doing this.

One is the regular pipeline filter(...).map(...). We can also use flatMap
to do both in the same operator.

Is there any performance improvement if we use flatMap, given that it will
be done in one operator instance?
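
For readers comparing the two shapes in the question, here is a minimal plain-Java sketch. It uses java.util.stream and a Consumer as stand-ins for the streaming API rather than Flink itself, and the Event class, the "value >= 0" condition, and the method names are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class FilterThenTransform {

    // Hypothetical event type, invented for this sketch.
    static final class Event {
        final String sensor;
        final int value;

        Event(String sensor, int value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    // Shape 1: two separate steps, filter(...) followed by map(...).
    static List<String> filterThenMap(List<Event> events) {
        return events.stream()
                .filter(e -> e.value >= 0)
                .map(e -> e.sensor + "=" + e.value)
                .collect(Collectors.toList());
    }

    // Shape 2: one flatMap-style function that emits zero or one result per input.
    static void filterAndTransform(Event e, Consumer<String> out) {
        if (e.value >= 0) {
            out.accept(e.sensor + "=" + e.value);
        }
    }

    // Drives the flatMap-style function over all events and gathers its output.
    static List<String> flatMapStyle(List<Event> events) {
        List<String> result = new ArrayList<>();
        for (Event e : events) {
            filterAndTransform(e, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 15), new Event("b", -2), new Event("c", 0));
        System.out.println(filterThenMap(events)); // [a=15, c=0]
        System.out.println(flatMapStyle(events));  // [a=15, c=0]
    }
}
```

The two shapes are equivalent in what they emit; the difference under discussion is only how many operators the runtime has to pass each element through.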


Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-02 Thread Gyula Fóra
Hey Welly,

If you call filter and map one after the other like you mentioned, these
operators will be chained and executed as if they were running in the same
operator.
The only small performance overhead comes from the fact that the output of
the filter will be copied before passing it as input to the map to keep
immutability guarantees (but no serialization/deserialization will happen).
Copying might be practically free depending on your data type, though.

If you are using operators that don't rely on the immutability of
inputs/outputs (i.e., you don't hold references to those values), then you can
disable copying altogether by calling env.getConfig().enableObjectReuse(),
in which case they will have exactly the same performance.
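
To make the copying trade-off concrete, here is a minimal plain-Java sketch. It does not call Flink; the Reading class, the RememberingFilter, and the copyBetween flag are hypothetical and only mimic the situation described above, where an upstream function holds a reference to a value that a downstream function later mutates.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyBetweenOperators {

    // Hypothetical mutable record type, invented for this sketch.
    static final class Reading {
        int value;

        Reading(int value) {
            this.value = value;
        }

        Reading copy() {
            return new Reading(value);
        }
    }

    // Upstream function that keeps a reference to every record it has seen.
    static final class RememberingFilter {
        final List<Reading> seen = new ArrayList<>();

        boolean accept(Reading r) {
            seen.add(r);
            return r.value > 0;
        }
    }

    // Downstream function that mutates its input in place.
    static Reading doubleInPlace(Reading r) {
        r.value *= 2;
        return r;
    }

    // Runs filter -> map over one record; copyBetween toggles the defensive copy.
    // Returns the value the filter still sees through its held reference.
    static int valueHeldByFilter(boolean copyBetween) {
        RememberingFilter filter = new RememberingFilter();
        Reading input = new Reading(21);
        if (filter.accept(input)) {
            Reading handedOver = copyBetween ? input.copy() : input;
            doubleInPlace(handedOver);
        }
        return filter.seen.get(0).value;
    }

    public static void main(String[] args) {
        System.out.println(valueHeldByFilter(true));  // 21: the held reference is untouched
        System.out.println(valueHeldByFilter(false)); // 42: the held reference was mutated
    }
}
```

If none of your functions hold on to their inputs or outputs like RememberingFilter does, skipping the copy is safe, which is the case where enabling object reuse makes sense.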

Cheers,
Gyula
