Hey guys,

Have we disabled the default input copying after all? I don't remember
seeing a Jira or PR for this (maybe I just missed it).

And if not, do we want this in the 0.10 release?

Cheers,
Gyula

On Fri, Oct 2, 2015 at 7:57 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Do we know what kind of impact the non-reuse policy has? Maybe the
> serialization overhead is subsumed by other effects.
>
> But in general I'm ok with changing the default to non copying. We just
> have to document this feature properly.
> On Oct 2, 2015 6:31 PM, "Maximilian Michels" <m...@apache.org> wrote:
>
> > +1 Good idea. I think we can save quite some CPU cycles by not copying
> > records.
> >
> > That is basically the behavior of the batch API, and there has so far
> never
> > > been an issue with that (people running into the trap of overwritten
> > > mutable elements).
> >
> >
> > As far as I know, this is only the case for chained operators?
> >
> > On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mj...@apache.org>
> wrote:
> >
> > > +1 for disable copy by default
> > >
> > >
> > > On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > > > Hi all!
> > > >
> > > > Now that we are coming to the next release, I wanted to make sure we
> > > > finalize the decision on that point, because it would be nice to not
> > > break
> > > > the behavior of system afterwards.
> > > >
> > > > Right now, when tasks are chained together, the system copies the
> > > elements
> > > > always between different tasks in the same chain.
> > > >
> > > > I think this policy was established under the assumption that copies
> do
> > > not
> > > > cost anything, given our own test examples, which mainly use
> immutable
> > > > types like Strings, boxed primitives, ..
> > > >
> > > > In practice, a lot of data types are actually quite expensive to
> copy.
> > > >
> > > > For example, a rather common data type in the event analysis of
> > > web-sources
> > > > is JSON Object.
> > > > Flink treats this as a generic type. Depending on its concrete
> > > > implementation, Kryo may have perform a serialization copy, which
> means
> > > > encoding into bytes (JSON encoding, charset encoding) and decoding
> > again.
> > > >
> > > > This has a massive impact on the out-of-the-box performance of the
> > > system.
> > > > Given that, I was wondering whether we should set to default policy
> to
> > > "not
> > > > copying".
> > > >
> > > > That is basically the behavior of the batch API, and there has so far
> > > never
> > > > been an issue with that (people running into the trap of overwritten
> > > > mutable elements).
> > > >
> > > > What do you think?
> > > >
> > > > Stephan
> > > >
> > >
> > >
> >
>

Reply via email to