Another approach that would solve the problem for our use case (object
re-usage for incremental window ReduceFunctions) would be to copy the first
object that is put into the state.
This would be a change on the ReduceState, not on the overall state
backend, which should be feasible, no?



2016-11-21 15:43 GMT+01:00 Stephan Ewen <se...@apache.org>:

> -1 for copying objects.
>
> Storing a serialized data where possible is good, but copying all objects
> by default is not a good idea, in my opinion.
> A lot of scenarios use data types that are hellishly expensive to copy.
> Even the current copy on chain handover is a problem.
>
> Let's not introduce even more copies.
>
> On Mon, Nov 21, 2016 at 3:16 PM, Maciek Próchniak <m...@touk.pl> wrote:
>
> > Hi,
> >
> > it will come with performance overhead when updating the state, but I
> > think it'll be possible to perform asynchronous snapshots using
> > HeapStateBackend (probably some changes to underlying data structures
> would
> > be needed) - which would bring more predictable performance.
> >
> > thanks,
> > maciek
> >
> >
> > On 21/11/2016 13:48, Aljoscha Krettek wrote:
> >
> >> Hi,
> >> I would be in favour of this since it brings things in line with the
> >> RocksDB backend. This will, however, come with quite the performance
> >> overhead, depending on how fast the TypeSerializer can copy.
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >> On Mon, 21 Nov 2016 at 11:30 Fabian Hueske <fhue...@gmail.com> wrote:
> >>
> >> Hi everybody,
> >>>
> >>> when implementing a ReduceFunction for incremental aggregation of SQL /
> >>> Table API window aggregates we noticed that the HeapStateBackend does
> not
> >>> store copies but holds references to the original objects. In case of a
> >>> SlidingWindow, the same object is referenced from different window
> panes.
> >>> Therefore, it is not possible to modify these objects (in order to
> avoid
> >>> object instantiations, see discussion [1]).
> >>>
> >>> Other state backends serialize their data such that the behavior is not
> >>> consistent across backends.
> >>> If we want to have light-weight tests, we have to create new objects in
> >>> the
> >>> ReduceFunction causing unnecessary overhead.
> >>>
> >>> I would propose to copy objects when storing them in a
> HeapStateBackend.
> >>> This would ensure that objects returned from state to the user behave
> >>> identical for different state backends.
> >>>
> >>> We created a related JIRA [2] that asks to copy records that go into an
> >>> incremental ReduceFunction. The scope is more narrow and would solve
> our
> >>> problem, but would leave the inconsistent behavior of state backends in
> >>> place.
> >>>
> >>> What do others think?
> >>>
> >>> Cheers, Fabian
> >>>
> >>> [1] https://github.com/apache/flink/pull/2792#discussion_r88653721
> >>> [2] https://issues.apache.org/jira/browse/FLINK-5105
> >>>
> >>>
> >
>

Reply via email to