On Wed, Oct 21, 2015 at 4:48 PM, Konstantin Boudnik <c...@apache.org> wrote:

> I like it quite a bit as well! Filing a ticket would make the most sense,
> so there will be a single place to collect the design docs (if needed), etc.
>
> On Wed, Oct 21, 2015 at 04:45PM, Dmitriy Setrakyan wrote:
> > I also really like the idea. One potential use case is fraud analysis in
> > financial institutions. It rarely makes sense to perform such analysis on
> > a live system; rather, a snapshot of some data needs to be taken and
> > analyzed offline.
> >
> > I think snapshots should be saved to disk, so users could load them for
> > analysis on a totally different cluster.
>
> I think disk persistence should be optional, not mandatory.
>

I would actually prefer to support disk-only snapshots. I think it will be
difficult (double the work) to support both in-memory and disk formats.
Also, storing snapshots in memory would require extra memory (a lot of it)
for something that gets saved mainly for historical purposes or offline
analysis.
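
A minimal sketch of the disk-only idea, in plain Java and not tied to any
existing Ignite API: the class name DiskSnapshotSketch and its save/load
methods are hypothetical, and Java serialization merely stands in for
whatever on-disk format would actually be chosen. It copies the data at a
point in time, writes the copy to a file, and reads it back, as one might on
a different cluster.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Ignite.
final class DiskSnapshotSketch {
    private DiskSnapshotSketch() {}

    // Writes a point-in-time copy of the given map to a file.
    static void save(Map<String, String> cache, Path file) throws IOException {
        // Copy first so that later cache updates do not leak into the snapshot.
        HashMap<String, String> copy = new HashMap<>(cache);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(copy);
        }
    }

    // Reads a snapshot previously written by save(), e.g. on another cluster.
    @SuppressWarnings("unchecked")
    static Map<String, String> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, String>) in.readObject();
        }
    }
}
```

Nothing is kept in memory once save() returns, which is the trade-off argued
for above.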


>
> Cos
>
> > Raul, if you don’t mind, can you file a ticket and see if anyone in the
> > community wants to pick it up?
> >
> > D.
> >
> > On Wed, Oct 21, 2015 at 5:51 AM, Sergi Vladykin <sergi.vlady...@gmail.com>
> > wrote:
> >
> > > Raul,
> > >
> > > Actually, SQL indexes are already snapshotable. I'm not sure it makes
> > > sense to make the whole cache (with full cache API support)
> > > snapshotable, but I like your idea about running multiple SQL
> > > statements against the same snapshot.
> > >
> > > Also, I don't think it is a good idea to keep snapshots for a long
> > > time, so I'd prefer a typical AutoCloseable API like:
> > >
> > > try (Snapshot s = ...) {
> > >     s.query(...);
> > >     s.query(...);
> > >     s.query(...);
> > > }
> > >
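
A minimal sketch of the try-with-resources idea quoted above. SnapshotSketch
is a hypothetical type, not an Ignite class, and an in-memory copy of a map
stands in for a real snapshot. The point is only that close() releases the
snapshot, so it cannot be kept around for a long time by accident.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical snapshot type for illustration only; Ignite does not define it.
final class SnapshotSketch implements AutoCloseable {
    private Map<String, Integer> view;

    SnapshotSketch(Map<String, Integer> source) {
        // Defensive copy: the snapshot does not see later updates to the source.
        this.view = Collections.unmodifiableMap(new HashMap<>(source));
    }

    // Reads a value as it stood when the snapshot was taken.
    Integer get(String key) {
        if (view == null) {
            throw new IllegalStateException("snapshot is closed");
        }
        return view.get(key);
    }

    boolean isClosed() {
        return view == null;
    }

    @Override
    public void close() {
        view = null; // drop the copy so it can be garbage collected
    }
}
```

Used as try (SnapshotSketch s = new SnapshotSketch(cache)) { ... }, every
read inside the block sees the same point-in-time view, and the copy is
released when the block exits.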
> > > Though I'm not sure when we will be able to get down to this.
> > >
> > > Sergi
> > >
> > > 2015-10-21 12:06 GMT+03:00 Raul Kripalani <ra...@apache.org>:
> > >
> > > > Hey guys,
> > > >
> > > > LevelDB has a feature called Snapshots, which provides a consistent
> > > > read-only view of the DB at a given point in time, against which
> > > > queries can be executed.
> > > >
> > > > To my knowledge, this functionality doesn't exist in the world of
> > > > open source In-Memory Computing. Ignite could be an innovator here.
> > > >
> > > > Ignite Snapshots would allow queries, distributed closures,
> > > > map-reduce jobs, etc. It could be useful for Spark RDDs to avoid data
> > > > shift while the computation is taking place (not sure if there's
> > > > already some form of snapshotting, though). Same for IGFS.
> > > >
> > > > Example usage:
> > > >
> > > >     IgniteCacheSnapshot snapshot =
> > > >         ignite.cache("mycache").snapshots().create();
> > > >
> > > >     // all three queries are executed against a view of the cache at
> > > >     // the point in time where it was snapshotted
> > > >     snapshot.query("select ...");
> > > >     snapshot.query("select ...");
> > > >     snapshot.query("select ...");
> > > >
> > > > In fact, it would be awesome to be able to logically save this
> > > > snapshot with a name so that later jobs, queries, etc. can run on top
> > > > of it, e.g.:
> > > >
> > > >     IgniteCacheSnapshot snapshot =
> > > >         ignite.cache("mycache").snapshots().create("abc");
> > > >
> > > >     // ...
> > > >     // in another module of a distributed system, or in another
> > > >     // thread in parallel, use the saved snapshot
> > > >     IgniteCacheSnapshot snapshot =
> > > >         ignite.cache("mycache").snapshots().get("abc");
> > > >     ....
> > > >
> > > > Named snapshotting can be dangerous due to data retention, e.g.
> > > > imagine keeping a snapshot for 2 weeks! So we should force the user to
> > > > specify a TTL:
> > > >
> > > >     IgniteCacheSnapshot snapshot =
> > > >         ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> > > >
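
A minimal sketch of named snapshots with a mandatory TTL, as proposed in the
quoted text. NamedSnapshotRegistry is a hypothetical class, not an Ignite
API, and the lazy eviction on lookup is an assumption about how expiry could
work; a real implementation might use a background cleaner instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical registry for illustration only; not part of Ignite.
final class NamedSnapshotRegistry<V> {
    private static final class Entry<V> {
        final V snapshot;
        final long expiresAtNanos;

        Entry(V snapshot, long expiresAtNanos) {
            this.snapshot = snapshot;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();

    // Registers a snapshot under a name; a positive TTL is required.
    void create(String name, V snapshot, long ttl, TimeUnit unit) {
        if (ttl <= 0) {
            throw new IllegalArgumentException("ttl must be positive");
        }
        long expiresAt = System.nanoTime() + unit.toNanos(ttl);
        entries.put(name, new Entry<>(snapshot, expiresAt));
    }

    // Returns the named snapshot, or empty if it is unknown or has expired.
    Optional<V> get(String name) {
        Entry<V> e = entries.get(name);
        if (e == null) {
            return Optional.empty();
        }
        // Compare by subtraction, the overflow-safe idiom for nanoTime values.
        if (System.nanoTime() - e.expiresAtNanos >= 0) {
            entries.remove(name, e); // lazily evict the expired entry
            return Optional.empty();
        }
        return Optional.of(e.snapshot);
    }
}
```

Because create() has no overload without a TTL, a caller cannot retain a
named snapshot indefinitely by omission.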
> > > > Such functionality would allow for "reporting checkpoints" and "time
> > > > travel", for example, where you want users to be able to query the
> > > > data as it stood 1 hour ago, 2 hours ago, etc.
> > > >
> > > > What do you think?
> > > >
> > > > P.S.: We do have some form of snapshotting in the Compute
> > > > checkpointing functionality – but my proposal is to generalise the
> > > > notion.
> > > >
> > > > Regards,
> > > >
> > > > *Raúl Kripalani*
> > > > PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data
> > > > and Messaging Engineer
> > > > http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> > > > http://blog.raulkr.net | twitter: @raulvk
> > > >
> > >
>
