+1 Good idea. I think we can save quite some CPU cycles by not copying records.
> That is basically the behavior of the batch API, and there has so far never
> been an issue with that (people running into the trap of overwritten
> mutable elements).

As far as I know, this is only the case for chained operators?

On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mj...@apache.org> wrote:

> +1 for disabling copy by default
>
>
> On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > Hi all!
> >
> > Now that we are coming to the next release, I wanted to make sure we
> > finalize the decision on that point, because it would be nice to not
> > break the behavior of the system afterwards.
> >
> > Right now, when tasks are chained together, the system always copies the
> > elements between different tasks in the same chain.
> >
> > I think this policy was established under the assumption that copies do
> > not cost anything, given our own test examples, which mainly use immutable
> > types like Strings, boxed primitives, ...
> >
> > In practice, a lot of data types are actually quite expensive to copy.
> >
> > For example, a rather common data type in the event analysis of
> > web-sources is the JSON Object.
> > Flink treats this as a generic type. Depending on its concrete
> > implementation, Kryo may have to perform a serialization copy, which means
> > encoding into bytes (JSON encoding, charset encoding) and decoding again.
> >
> > This has a massive impact on the out-of-the-box performance of the
> > system.
> > Given that, I was wondering whether we should set the default policy to
> > "not copying".
> >
> > That is basically the behavior of the batch API, and there has so far
> > never been an issue with that (people running into the trap of overwritten
> > mutable elements).
> >
> > What do you think?
> >
> > Stephan
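
For reference, here is a minimal plain-Java sketch of the "overwritten mutable elements" trap discussed in this thread. It does not use any Flink API: the `Event` class, the `run` method, and the idea of a downstream step that buffers its input are all hypothetical stand-ins, assumed only to illustrate what can go wrong when a chained step keeps a reference to an element that the upstream step later mutates.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedCopyDemo {

    /** Hypothetical mutable record, standing in for e.g. a JSON object. */
    static final class Event {
        int value;

        Event(int value) {
            this.value = value;
        }

        Event copy() {
            return new Event(value);
        }
    }

    /**
     * Simulates an upstream step that reuses one mutable instance and a
     * chained downstream step that buffers every element it receives.
     * Returns the values the downstream step sees in its buffer at the end.
     */
    static List<Integer> run(boolean copyBetweenSteps) {
        Event reused = new Event(0);
        List<Event> buffered = new ArrayList<>();

        for (int i = 1; i <= 3; i++) {
            reused.value = i; // upstream overwrites the same object
            Event handedOver = copyBetweenSteps ? reused.copy() : reused;
            buffered.add(handedOver); // downstream keeps a reference
        }

        List<Integer> result = new ArrayList<>();
        for (Event e : buffered) {
            result.add(e.value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("with copy:    " + run(true));  // [1, 2, 3]
        System.out.println("without copy: " + run(false)); // [3, 3, 3]
    }
}
```

Without the copy, the problem only shows up because the downstream step holds on to elements while the upstream step mutates them; a step that processes each element and then forgets it would see no difference, which is presumably why this has rarely been reported as an issue.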