Re: IgniteSet implementation: changes required

Pavel Pereslegin Tue, 25 Sep 2018 10:13:29 -0700

Hello Igniters.

As discussed, the IgniteSet implementation was based on on-heap data
duplication (setDataMap). As a result, the data was not recovered after a
cluster restart, and for large data sets this led to significant heap
growth and GC pressure.


We changed the implementation so that this structure works without
duplicating the data [1]. To reduce the performance drop and speed up large
data sets, the non-collocated version of IgniteSet now uses a separate
cache [2].

[1] https://issues.apache.org/jira/browse/IGNITE-5553
[2] https://issues.apache.org/jira/browse/IGNITE-7823
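
To make the difference concrete, here is a minimal toy sketch in plain Java.
It is not Ignite code: the class names HeapIndexedSet and ScanBackedSet are
hypothetical, and a ConcurrentHashMap stands in for a cache whose contents
survive a restart. It only illustrates why an on-heap duplicate is lost on
restart while a scan-based view is not.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Old approach (toy model): every item is duplicated in an on-heap index. */
final class HeapIndexedSet {
    private final String name;
    private final Map<Map.Entry<String, String>, Boolean> cache;
    // Stand-in for setDataMap: lives only on the heap, so a restart loses it.
    private final Set<String> setDataMap = new TreeSet<>();

    HeapIndexedSet(String name, Map<Map.Entry<String, String>, Boolean> cache) {
        this.name = name;
        this.cache = cache;
    }

    boolean add(String item) {
        cache.put(Map.entry(name, item), Boolean.TRUE);
        return setDataMap.add(item); // second copy of the same item
    }

    int size() {
        return setDataMap.size(); // fast, but empty for a fresh instance
    }
}

/** New approach (toy model): no duplicate; size and iteration scan the cache. */
final class ScanBackedSet {
    private final String name;
    private final Map<Map.Entry<String, String>, Boolean> cache;

    ScanBackedSet(String name, Map<Map.Entry<String, String>, Boolean> cache) {
        this.name = name;
        this.cache = cache;
    }

    boolean add(String item) {
        return cache.putIfAbsent(Map.entry(name, item), Boolean.TRUE) == null;
    }

    boolean remove(String item) {
        return cache.remove(Map.entry(name, item)) != null;
    }

    boolean contains(String item) {
        return cache.containsKey(Map.entry(name, item));
    }

    /** Scans the whole cache and keeps the keys that belong to this set. */
    Set<String> items() {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> key : cache.keySet()) {
            if (key.getKey().equals(name)) {
                result.add(key.getValue());
            }
        }
        return result;
    }

    int size() {
        return items().size();
    }
}
```

In this model a "restart" is simply a new wrapper object over the same map:
a fresh HeapIndexedSet reports size 0 even though the map still holds the
items, while a fresh ScanBackedSet sees them all. The scan cost grows with
the total number of entries in the map, so in this toy model giving a large
set its own map keeps the scan limited to that set's entries.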



Wed, Jun 27, 2018 at 23:26, Amir Akhmedov <[email protected]>:

> Yes, you are right.
>
> Thanks,
> Amir
>
>
> On Wed, Jun 27, 2018 at 1:15 PM Denis Magda <[email protected]> wrote:
>
> > Got you. If it's about redundant data duplication in the on-heap region,
> > then I have no concerns.
> >
> > Anyway, since the data structure will interact with the page memory
> > directly, its entries can be stored in Ignite persistence automatically
> > (if the latter is on). Does it mean that the data structure will be fully
> > recovered after a restart and its entries can be pulled from disk on
> > demand?
> >
> > --
> > Denis
> >
> >
> > On Tue, Jun 26, 2018 at 1:49 PM Amir Akhmedov <[email protected]>
> > wrote:
> >
> > > I also think it would be better to remove setDataMap support because:
> > > 1. It puts extra pressure on GC by keeping entries on heap.
> > > 2. Its logic is difficult to support and has lots of nuances.
> > > 3. To maintain setDataMap, GridCacheMapEntry currently calls
> > > cctx.dataStructures().onEntryUpdated() on each entry mutation. I think
> > > this is unnecessary coupling.
> > > 4. For the case of a single Ignite cache for all collocated data
> > > structures, iterator creation will not be much slower than in the
> > > current implementation, since we can run an affinity call on the node
> > > where all entries reside. Also, we can create a better affinity mapper
> > > to distribute data structures fairly across the cluster rather than
> > > mapping by data structure name.
> > >
> > > Thanks,
> > > Amir
> > >
> > >
> > > On Tue, Jun 26, 2018 at 8:10 AM Anton Vinogradov <[email protected]>
> wrote:
> > >
> > > > Denis,
> > > >
> > > > I think the better option is to remove the on-heap
> > > > optimisation/duplication. This causes no slowdown for frequently used
> > > > operations (put/remove) and will even make them slightly faster.
> > > >
> > > > The only question we have here is "is it possible to restore the
> > > > on-heap map in an easy way?".
> > > > It seems the answer is no, so I vote for setDataMap removal.
> > > >
> > > > Tue, Jun 26, 2018 at 15:00, Denis Magda <[email protected]>:
> > > >
> > > > > Anton,
> > > > >
> > > > > Will it be possible to reuse such functionality for the rest of the
> > > > > data structures? I would invest our time in this if all data
> > > > > structures would be able to work with Ignite persistence this way.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Tue, Jun 26, 2018 at 1:53 AM Anton Vinogradov <[email protected]>
> > > wrote:
> > > > >
> > > > > > >> Why don't we read data straight from the persistence layer,
> > > > > > >> warming RAM up in the background?
> > > > > > Because it's not a trivial task to finish such loading on an
> > > > > > unstable topology.
> > > > > > That's possible, of course, but the solution and its complexity
> > > > > > will be almost equal to WAL enable/disable.
> > > > > >
> > > > > > Mon, Jun 25, 2018 at 22:13, Denis Magda <[email protected]>:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > Why don't we read data straight from the persistence layer,
> > > > > > > warming RAM up in the background (like we do for SQL and other
> > > > > > > APIs)? If it's a question of time, then I would suggest that we
> > > > > > > not hurry and do it the right way.
> > > > > > >
> > > > > > > --
> > > > > > > Denis
> > > > > > >
> > > > > > > On Mon, Jun 25, 2018 at 6:20 AM Anton Vinogradov <
> [email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 to removal in case there is no easy, fast and consistent
> > > > > > > > way to restore setDataMap on node restart.
> > > > > > > > I see that we'll get some performance drop on size() or
> > > > > > > > keys(), but these methods are rarely used.
> > > > > > > >
> > > > > > > > Mon, Jun 25, 2018 at 16:07, Pavel Pereslegin <
> > [email protected]
> > > >:
> > > > > > > >
> > > > > > > > > Hello, Igniters.
> > > > > > > > >
> > > > > > > > > I tried to implement IgniteSet data recovery when
> > > > > > > > > persistence is enabled [1] using trivial cache scanning;
> > > > > > > > > however, I cannot find an optimal way to do that, for the
> > > > > > > > > following reasons:
> > > > > > > > > - Performing operations on IgniteSet requires completion of
> > > > > > > > > data loading (restoring of setDataMap) on all nodes. Doing
> > > > > > > > > this during partition map exchange takes too long.
> > > > > > > > > - Prohibiting operations on IgniteSet before the completion
> > > > > > > > > of asynchronous cache scanning on all nodes looks rather
> > > > > > > > > complicated, because it is necessary to support all
> > > > > > > > > situations of unstable topology.
> > > > > > > > >
> > > > > > > > > So I see one option to fix data loss on node restart: remove
> > > > > > > > > the entire optimization (setDataMap) and rework the iterator
> > > > > > > > > implementation to perform cache scanning.
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5553
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2018-03-17 8:20 GMT+03:00 Andrey Kuznetsov <
> > [email protected]
> > > >:
> > > > > > > > > > Thanks, Dmitry. I agree ultimately, DS API uniformity is
> a
> > > > > weighty
> > > > > > > > > reason.
> > > > > > > > > >
> > > > > > > > > > 2018-03-17 3:54 GMT+03:00 Dmitriy Setrakyan <
> > > > > [email protected]
> > > > > > >:
> > > > > > > > > >
> > > > > > > > > >> On Fri, Mar 16, 2018 at 7:39 AM, Andrey Kuznetsov <
> > > > > > > [email protected]>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Dmitry, your way allows reusing the existing
> > > > > > > > > >> > {{Ignite.set()}} API to create both set flavors. We can
> > > > > > > > > >> > adopt it unless somebody in the community objects.
> > > > > > > > > >> > Personally, I like the {{IgniteCache.asSet()}} approach
> > > > > > > > > >> > proposed by Vladimir O. more, since it emphasizes the
> > > > > > > > > >> > difference between the sets being created, but this will
> > > > > > > > > >> > require an API extension.
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> Andrey, I am suggesting that Ignite.set(...) in
> > > > > > > > > >> non-collocated mode behaves exactly the same as the
> > > > > > > > > >> proposed IgniteCache.asSet() method. I do not like the
> > > > > > > > > >> IgniteCache.asSet() API because it is inconsistent with
> > > > > > > > > >> Ignite data structure design. All data structures are
> > > > > > > > > >> provided on the Ignite API directly and we should not
> > > > > > > > > >> change that.
> > > > > > > > > >>
> > > > > > > > > >> D.
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > >   Andrey Kuznetsov.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
