Hi Alex, I believe most of your comments have to do with disk-based functionality, especially in regard to backups, snapshots, etc. However, Ignite is currently an in-memory system, at least for the near future. Let me know if I misunderstood something.
D.

On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <alexander.boud...@gmail.com> wrote:

> Dmitriy, thank you for your time and questions, which helped me
> realize what I forgot to mention! See my answers inline; later I'll
> combine everything together to help the next readers :)
>
> I put together some implementation ideas in the Apache Ignite JIRA, as
> promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
> this facility as another CacheStore implementation, so it wouldn't
> interfere with the base principles of the Ignite platform.
>
> On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > My answers are inline…
> >
> > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >> Thanks Sasha!
> >>
> >> Resending to the dev list.
> >>
> >> D.
> >>
> >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <alexan...@boudnik.org> wrote:
> >>> Apache Ignite is a great platform, but it lacks certain capabilities
> >>> that are common in the RDBMS world, such as:
> >>>
> >>> - Consistent online backup of the data on the entire cluster (or of
> >>>   a specified set of caches)
> >
> > I think you mean data center replication here. It is not an easy
> > feature to implement, and so far it has been handled by commercial
> > vendors of Ignite, e.g. GridGain.
>
> Actually, no. Here I meant exactly what I said: a full or incremental
> backup of all (or selected) caches in a consistent state, so that it
> can be used to restore them in case of data loss or corruption. One
> important use case is OLAP systems (say, in banking) built on the
> Apache Ignite platform.
>
> And you are right, data center replication can easily be implemented
> on top of log/snapshot shipment.
>
> >>> - Hierarchical snapshots for a specified set of caches
> >
> > What do you mean by hierarchical?
>
> In this particular case the notion of hierarchical snapshots is very
> similar to the one used in SAN appliances, VirtualBox, or VMware.
> Using the concept of snapshots we can do all of these things:
> - full and incremental backup
> - restore
> - rollback to a checkpoint
> - roll forward
> much more easily, with minimal memory and I/O overhead.
>
> >>> - Transaction log
> >
> > Why does Ignite need it for in-memory transactions?
>
> At a minimum it is required for roll-forward functionality, where you
> restore the state of the cache from a checkpoint (the cache state
> before the snapshot was made) and then reapply transactions one by
> one.
>
> >>> - Restore cluster state as of a certain point in time
> >
> > Given that such restorability may introduce lots of memory overhead,
> > does it really make sense for an in-memory cache?
>
> Actually, it will not consume any memory. It will use external
> storage, such as HDD/SSD space, instead. And yes, I think this
> functionality makes complete sense for our users in real life, who
> will love it.
>
> >>> - Rolling forward from a snapshot with the ability to filter/modify
> >>>   transactions
> >
> > Same as above.
>
> Same as above: my customers in the trenches are begging for that
> feature.
>
> >>> - Asynchronous replication, based either on log shipment or
> >>>   snapshot shipment
> >>>   -- Between clusters
> >
> > This is the same as data center replication, no?
> Including, but not limited to: log shipment or snapshot shipment could
> also be used to implement the so-called "better-than-lambda
> architecture" for BI and OLAP, where data is replicated to a queryable
> data source, say Oracle, as soon as it is produced by the OLTP system.
> We can use an RDBMS API such as Oracle Streams (going to be
> discontinued, sadly) or GoldenGate to filter changes from
> logs/snapshots and then apply them. That approach allows us to save
> tons of legacy reports and BI dashboards.
>
> >>>   -- Continuous data export to, say, an RDBMS
> >
> > Don't we already support this with our write-through feature to a
> > database?
>
> When write-through is used for non-local caches, it may cause data
> corruption in the RDBMS; I opened this issue a few weeks ago:
> https://issues.apache.org/jira/browse/IGNITE-3321
>
> >>> It is also a necessity for reducing cold start time for huge
> >>> clusters with strict SLAs.
> >
> > What part are you trying to speed up here? Are you talking about
> > loading data from databases?
>
> I'm talking about the initial load from a persistent store when the
> cluster has been cold-started (like from GridGain's Local Recoverable
> Store).
>
> >>> I'll put some implementation ideas in JIRA later on. I believe that
> >>> this list is far from complete, but I want the community to discuss
> >>> the above use cases.
> >>>
> >>> --Sasha
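
For concreteness, below is a minimal sketch of the kind of CacheStore hook
the log-shipment idea above could build on. Only the CacheStoreAdapter API
is Ignite's; the class name, log path, and record format are hypothetical
placeholders, and a real implementation would also need log segmentation,
checksumming, fsync, and a replay path for roll-forward.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.cache.Cache;
    import org.apache.ignite.cache.store.CacheStoreAdapter;

    /** Hypothetical store that appends every write/delete to a local change log. */
    public class LogShippingCacheStore extends CacheStoreAdapter<String, String> {
        private final PrintWriter log;

        public LogShippingCacheStore() {
            try {
                // Placeholder path and plain-text format; a real store would
                // use segmented, checksummed, fsync'ed log files.
                log = new PrintWriter(new FileWriter("ignite-changes.log", true), true);
            }
            catch (IOException e) {
                throw new IllegalStateException("Could not open change log", e);
            }
        }

        /** Reads stay in memory; this store is only a logging sink. */
        @Override public String load(String key) {
            return null;
        }

        /** Called on every put when write-through is enabled; record it for later shipment/replay. */
        @Override public void write(Cache.Entry<? extends String, ? extends String> entry) {
            log.println("PUT\t" + entry.getKey() + "\t" + entry.getValue());
        }

        /** Record removals as well so that a roll-forward can replay them. */
        @Override public void delete(Object key) {
            log.println("DEL\t" + key);
        }
    }

Such a store would be plugged in through the usual configuration path,
e.g. CacheConfiguration.setCacheStoreFactory(FactoryBuilder.factoryOf(
LogShippingCacheStore.class)) together with setWriteThrough(true), so the
base principles of the platform stay untouched, as the thread suggests.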