RE: Data Snapshots in Ignite

Andrey Kornev Thu, 22 Oct 2015 04:51:04 -0700

Hello,

Just a few questions.


1) It's not clear from the proposed API how to capture/retrieve a consistent 
snapshot of multiple caches. If my query involves a join I'd like to ensure 
consistency across all join participants.
2) Implementation-wise, is the snapshot just a physical copy of all cache 
entries and their indexes? Or is some other mechanism being considered?
3) Isolation: is the snapshot isolated with respect to concurrent modifications?
4) Serialization: what are my options to ensure that I can still read the data 
from the old snapshots as my key/value class definitions change over time?
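
To make questions 1) to 3) concrete, the behavior being asked about can be
sketched in plain Java. This is a toy model only: it is not Ignite code and
not the proposed API. ToyCacheGroup, put and snapshot are hypothetical names,
and a real distributed implementation would probably need something cheaper
than a full copy taken under a lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy stand-in for a group of caches. Hypothetical; this is NOT Ignite API.
final class ToyCacheGroup {
    private final Map<String, Map<String, String>> caches = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the write lock, so no update can interleave with a snapshot.
    void put(String cacheName, String key, String value) {
        lock.writeLock().lock();
        try {
            caches.computeIfAbsent(cacheName, n -> new HashMap<>()).put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copies all requested caches under a single read lock, so every copy
    // reflects the same point in time (question 1). The copies are physical
    // (question 2) and detached from later updates (question 3).
    Map<String, Map<String, String>> snapshot(String... cacheNames) {
        lock.readLock().lock();
        try {
            Map<String, Map<String, String>> copy = new HashMap<>();
            for (String name : cacheNames) {
                Map<String, String> live =
                    caches.getOrDefault(name, Collections.emptyMap());
                copy.put(name, Collections.unmodifiableMap(new HashMap<>(live)));
            }
            return Collections.unmodifiableMap(copy);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this model, calling snapshot("customers", "orders") and then updating
either cache leaves the returned maps unchanged, which is the guarantee a
join across several caches would need.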

I feel I do not quite understand the specific use case this feature is 
expected to address. Why is keeping a snapshot for 2 weeks unimaginable, 
while keeping one for 1 or 2 hours is OK? 

Also, I think forcing people to set a TTL on a snapshot is pointless: they 
will simply set it to an unreasonably large value, just in case.

Thanks
Andrey

> From: [email protected]
> Date: Wed, 21 Oct 2015 10:06:25 +0100
> Subject: Data Snapshots in Ignite
> To: [email protected]
> 
> Hey guys,
> 
> LevelDB has a feature called Snapshots which provides a consistent
> read-only view of the DB at a given point in time, against which queries
> can be executed.
> 
> To my knowledge, this functionality doesn't exist in the world of open
> source In-Memory Computing. Ignite could be an innovator here.
> 
> Ignite Snapshots would allow queries, distributed closures, map-reduce
> jobs, etc. It could be useful for Spark RDDs to avoid data shift while the
> computation is taking place (not sure if there's already some form of
> snapshotting, though). Same for IGFS.
> 
> Example usage:
> 
>     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create();
> 
>     // all three queries are executed against a view of the cache at the
>     // point in time where it was snapshotted
>     snapshot.query("select ...");
>     snapshot.query("select ...");
>     snapshot.query("select ...");
> 
> In fact, it would be awesome to be able to logically save this snapshot
> with a name so that later jobs, queries, etc. can run on top of it, e.g.:
> 
>     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create("abc");
> 
>     // ...
>     // in another module of a distributed system, or in another thread in
>     // parallel, use the saved snapshot
>     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().get("abc");
>     ....
> 
> Named snapshotting can be dangerous due to data retention, e.g. imagine
> keeping a snapshot for 2 weeks! So we should force the user to specify a
> TTL:
> 
>     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> 
> Such functionality would allow for "reporting checkpoints" and "time
> travel", for example, where you want users to be able to query the data as
> it stood 1 hour ago, 2 hours ago, etc.
> 
> What do you think?
> 
> P.S.: We do have some form of snapshotting in the Compute checkpointing
> functionality – but my proposal is to generalise the notion.
> 
> Regards,
> 
> *Raúl Kripalani*
> PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> Messaging Engineer
> http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> http://blog.raulkr.net | twitter: @raulvk
