Re: Change state backend.

2017-08-17 Thread Ted Yu
bq. we add the key-group to the heap format (1-2 bytes extra per key).

This seems to be the better choice of the two.

bq. change the heap backends to write in the same way as RocksDB

+1 on above.

These two combined would give users flexibility in state backend migration.



Re: Change state backend.

2017-08-17 Thread Stefan Richter
This is not possible out of the box. Historically, the checkpoint/savepoint 
formats have been different between the heap-based and RocksDB-based backends. We 
have already eliminated most of the differences in 1.3.

However, there are two problems remaining. The first problem is just how the 
number of written key-value pairs is tracked: in the heap case, we have all the 
counts and can serialize as count + iterate all elements. For RocksDB this is not 
possible because there is no way to get the exact key count, so the 
serialization iterates all elements and then writes a terminal symbol to the 
stream. So what we could consider in the future is to change the heap backends 
to write in the same way as RocksDB, which is a slightly less natural approach, 
but much easier than trying to emulate the current heap approach with RocksDB. 
The latter would require us to do an iteration over the whole state just to 
obtain the counts.
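
To make the difference concrete, here is a minimal sketch of the two write 
strategies (illustration only, not Flink's actual serialization code; the marker 
constants and method names are made up for the example):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.Map;

    public class SerializationStrategies {

        private static final byte ENTRY_MARKER = 0x01; // hypothetical "another entry follows" flag
        private static final byte END_MARKER = 0x00;   // hypothetical terminal symbol

        // Heap-style: the backend knows the count up front and writes it first.
        static void writeWithCount(Map<byte[], byte[]> state, DataOutputStream out) throws IOException {
            out.writeInt(state.size());
            for (Map.Entry<byte[], byte[]> e : state.entrySet()) {
                writeEntry(e.getKey(), e.getValue(), out);
            }
        }

        // RocksDB-style: the exact count is unknown, so each entry is flagged
        // and a terminal symbol closes the stream.
        static void writeWithEndMarker(Iterable<Map.Entry<byte[], byte[]>> entries, DataOutputStream out) throws IOException {
            for (Map.Entry<byte[], byte[]> e : entries) {
                out.writeByte(ENTRY_MARKER);
                writeEntry(e.getKey(), e.getValue(), out);
            }
            out.writeByte(END_MARKER);
        }

        private static void writeEntry(byte[] key, byte[] value, DataOutputStream out) throws IOException {
            out.writeInt(key.length);
            out.write(key);
            out.writeInt(value.length);
            out.write(value);
        }
    }

The end-marker variant costs a little extra per entry but never needs to know the 
count up front, which is why it maps naturally onto iterating a RocksDB instance.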

The second problem is about key-groups: we keep all k/v pairs in RocksDB in 
key-group order by prefixing the key bytes with the key-group id (this is 
important for rescaling). When we write the elements from RocksDB to disk, they 
include the prefix. For the heap backend, this prefix is not required because 
(as we are in memory) we can very efficiently do the key-group partitioning in 
the async part of a checkpoint/savepoint. So here we have two options, both with 
some pros and cons. Either we don't write the key-group bytes with RocksDB and 
recompute them on restore (this means we have to go through serde and compute a 
hash) or we add the key-group to the heap format (1-2 bytes extra per key).
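
As a rough illustration of the key-group prefix (again only a sketch; the 
hashing is simplified compared to Flink's actual key-group assignment, which 
additionally applies a murmur hash to key.hashCode()):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class KeyGroupPrefix {

        // Simplified assignment: key-group id in [0, maxParallelism).
        static int assignToKeyGroup(Object key, int maxParallelism) {
            return (key.hashCode() & 0x7FFFFFFF) % maxParallelism;
        }

        // Prepend a 2-byte key-group id to the serialized key bytes, which keeps
        // all entries in RocksDB sorted by key-group.
        static byte[] prefixedKeyBytes(Object key, byte[] serializedKey, int maxParallelism) throws IOException {
            int keyGroup = assignToKeyGroup(key, maxParallelism);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeShort(keyGroup);   // the "1-2 bytes extra per key" mentioned above
            out.write(serializedKey);   // followed by the actual serialized key
            return bos.toByteArray();
        }
    }

On restore, the prefix lets each entry be routed straight to the right key-group 
without deserializing the key and rehashing it.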

So, it is definitely possible to unify both formats completely, but those two 
points need to be resolved.

Hope this adds some more detail to the discussion.

Best,
Stefan




Re: Change state backend.

2017-08-17 Thread Biplob Biswas
I am not really sure you can do that out of the box; if not, it should indeed be
possible in the near future.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility

There are already plans for state migration (with upgraded serializers), as I
read here, so this could be an additional task while migrating state,
although I am not sure how easy or difficult it would be.

Also, as you can read here,
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html

Stefan explained very nicely what happens (or would happen) during state
migration on the different backends.

Based on that, I imagine that moving from FsStateBackend to
RocksDbStateBackend, or from MemoryStateBackend to RocksDbStateBackend, would
be easier, but not the other way round.

Thanks 





Re: Change state backend.

2017-08-16 Thread Ted Yu
I guess shashank meant switching the state backend w.r.t. savepoints.



Re: Change state backend.

2017-08-16 Thread Biplob Biswas
Could you clarify a bit more? Do you want an existing state on a running job
to be migrated from FsStateBackend to RocksDbStateBackend? 

Or 

Do you have the option of restarting your job after changing existing code? 









Change state backend.

2017-08-16 Thread shashank agarwal
Hi,

Can I change the state backend from FsStateBackend to RocksDBStateBackend
directly, or do I have to do some migration?
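
For context, switching the configured backend in the job code itself is just one
line (a minimal sketch assuming Flink 1.3 and the flink-statebackend-rocksdb
dependency; the checkpoint URI is a placeholder). The open question is whether
state/savepoints written by the old backend can be restored with the new one:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SwitchBackend {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Previously e.g.: env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
            // Switching to RocksDB; the checkpoint URI below is a placeholder.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
            env.enableCheckpointing(60_000);

            // ... define sources/operators/sinks and call env.execute() as before ...
        }
    }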


-- 
Thanks Regards

SHASHANK AGARWAL
 ---  Trying to mobilize the things