+1 Good idea. I think we can save quite some CPU cycles by not copying
records.

That is basically the behavior of the batch API, and there has so far never
> been an issue with that (people running into the trap of overwritten
> mutable elements).


As far as I know, this is only the case for chained operators?

On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mj...@apache.org> wrote:

> +1 for disable copy by default
>
>
> On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > Hi all!
> >
> > Now that we are coming to the next release, I wanted to make sure we
> > finalize the decision on that point, because it would be nice to not
> break
> > the behavior of system afterwards.
> >
> > Right now, when tasks are chained together, the system copies the
> elements
> > always between different tasks in the same chain.
> >
> > I think this policy was established under the assumption that copies do
> not
> > cost anything, given our own test examples, which mainly use immutable
> > types like Strings, boxed primitives, ..
> >
> > In practice, a lot of data types are actually quite expensive to copy.
> >
> > For example, a rather common data type in the event analysis of
> web-sources
> > is JSON Object.
> > Flink treats this as a generic type. Depending on its concrete
> > implementation, Kryo may have perform a serialization copy, which means
> > encoding into bytes (JSON encoding, charset encoding) and decoding again.
> >
> > This has a massive impact on the out-of-the-box performance of the
> system.
> > Given that, I was wondering whether we should set to default policy to
> "not
> > copying".
> >
> > That is basically the behavior of the batch API, and there has so far
> never
> > been an issue with that (people running into the trap of overwritten
> > mutable elements).
> >
> > What do you think?
> >
> > Stephan
> >
>
>

Reply via email to