Hey Andrey, I think I answered some of your questions in my response to Dmitriy [1]. Could you please have a look and tell me whether it covers them?
N.B.: My idea is based around the typical use case for LevelDb Snapshots, but we might create something entirely different in Ignite if the community wants to.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-Snapshots-in-Ignite-tp4183p4220.html

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
http://blog.raulkr.net | twitter: @raulvk

On Thu, Oct 22, 2015 at 12:49 PM, Andrey Kornev <[email protected]> wrote:

> Hello,
>
> Just a few questions.
>
> 1) It's not clear from the proposed API how to capture/retrieve a
> consistent snapshot of multiple caches. If my query involves a join I'd
> like to ensure consistency across all join participants.
> 2) Implementation wise, is the snapshot just a physical copy of all cache
> entries and their indexes? Or some other mechanism is being considered?
> 3) Isolation: is the snapshot isolated with respect to concurrent
> modifications?
> 4) Serialization: what are my options to ensure that I can still read the
> data from the old snapshots as my key/value class definitions change over
> time?
>
> I feel I do not quite understand the specific use case this feature is
> expected to be applicable to. Why keeping a snapshot for 2 weeks is
> unimaginable, but 1 or 2 hours is ok?
>
> Also, I think forcing people to set a TTL on a snapshot is pointless and
> will be abused by setting it to an unreasonably large value, just in case.
>
> Thanks
> Andrey
>
> > From: [email protected]
> > Date: Wed, 21 Oct 2015 10:06:25 +0100
> > Subject: Data Snapshots in Ignite
> > To: [email protected]
> >
> > Hey guys,
> >
> > LevelDb has a functionality called Snapshots which provides a consistent
> > read-only view of the DB at a given point in time, against which queries
> > can be executed.
> >
> > To my knowledge, this functionality doesn't exist in the world of open
> > source In-Memory Computing. Ignite could be an innovator here.
> >
> > Ignite Snapshots would allow queries, distributed closures, map-reduce
> > jobs, etc. It could be useful for Spark RDDs to avoid data shift while the
> > computation is taking place (not sure if there's already some form of
> > snapshotting, though). Same for IGFS.
> >
> > Example usage:
> >
> > IgniteCacheSnapshot snapshot =
> >     ignite.cache("mycache").snapshots().create();
> >
> > // all three queries are executed against a view of the cache at the
> > // point in time where it was snapshotted
> > snapshot.query("select ...");
> > snapshot.query("select ...");
> > snapshot.query("select ...");
> >
> > In fact, it would be awesome to be able to logically save this snapshot
> > with a name so that later jobs, queries, etc. can run on top of it, e.g.:
> >
> > IgniteCacheSnapshot snapshot =
> >     ignite.cache("mycache").snapshots().create("abc");
> >
> > // ...
> > // in another module of a distributed system, or in another thread in
> > // parallel, use the saved snapshot
> > IgniteCacheSnapshot snapshot =
> >     ignite.cache("mycache").snapshots().get("abc");
> > ....
> >
> > Named snapshotting can be dangerous due to data retention, e.g. imagine
> > keeping a snapshot for 2 weeks! So we should force the user to specify a
> > TTL:
> >
> > IgniteCacheSnapshot snapshot =
> >     ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> >
> > Such functionality would allow for "reporting checkpoints" and "time
> > travel", for example, where you want users to be able to query the data as
> > it stood 1 hour ago, 2 hours ago, etc.
> >
> > What do you think?
> >
> > P.S.: We do have some form of snapshotting in the Compute checkpointing
> > functionality – but my proposal is to generalise the notion.
> >
> > Regards,
> >
> > *Raúl Kripalani*
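
For illustration only, the semantics discussed in the quoted proposal (a read-only view frozen at creation time, stored under a name, with a mandatory TTL) can be sketched in plain Java on a single JVM. Everything below is hypothetical: `SnapshotCache`, `Snapshot`, `createSnapshot` and `getSnapshot` are made-up names, not Ignite API, and the snapshot is a naive full copy, which is only one of the mechanisms Andrey asks about in question 2.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-JVM sketch; none of these types exist in Ignite. */
public class SnapshotSketch {

    /** Read-only copy of the cache contents, taken at creation time, with an expiry. */
    static final class Snapshot<K, V> {
        private final Map<K, V> view;
        private final long expiresAtNanos;

        Snapshot(Map<K, V> data, long ttlNanos, long nowNanos) {
            // Naive physical copy. Under concurrent writers this copy is only
            // weakly consistent; a real design would need a consistent cut.
            this.view = Collections.unmodifiableMap(new HashMap<>(data));
            this.expiresAtNanos = nowNanos + ttlNanos;
        }

        V get(K key) {
            return view.get(key);
        }

        boolean isExpired(long nowNanos) {
            return nowNanos - expiresAtNanos >= 0;
        }
    }

    /** Live cache plus a registry of named snapshots. */
    static final class SnapshotCache<K, V> {
        private final Map<K, V> live = new ConcurrentHashMap<>();
        private final Map<String, Snapshot<K, V>> named = new ConcurrentHashMap<>();

        void put(K key, V value) {
            live.put(key, value);
        }

        V get(K key) {
            return live.get(key);
        }

        /** Creates a named snapshot; the TTL is mandatory, as in the proposal. */
        Snapshot<K, V> createSnapshot(String name, long ttl, TimeUnit unit) {
            Snapshot<K, V> s = new Snapshot<>(live, unit.toNanos(ttl), System.nanoTime());
            named.put(name, s);
            return s;
        }

        /** Returns the named snapshot, or null if it is unknown or has expired. */
        Snapshot<K, V> getSnapshot(String name) {
            Snapshot<K, V> s = named.get(name);
            if (s != null && s.isExpired(System.nanoTime())) {
                named.remove(name, s); // lazily drop expired snapshots on lookup
                return null;
            }
            return s;
        }
    }

    public static void main(String[] args) {
        SnapshotCache<String, Integer> cache = new SnapshotCache<>();
        cache.put("a", 1);

        Snapshot<String, Integer> snapshot = cache.createSnapshot("abc", 2, TimeUnit.HOURS);
        cache.put("a", 2); // later write to the live cache

        System.out.println(snapshot.get("a"));                 // prints 1
        System.out.println(cache.get("a"));                    // prints 2
        System.out.println(cache.getSnapshot("abc").get("a")); // prints 1
    }
}
```

The sketch sidesteps the harder questions raised in the thread: it covers one cache only, so it says nothing about a consistent snapshot across several caches, and it stores live objects rather than serialized ones, so class evolution is not addressed.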
