Anton, I'm OK with your proposal, but IMO it should be provided as an IEP?
On Mon, Apr 29, 2019 at 4:05 PM Anton Vinogradov <a...@apache.org> wrote:

> Sergey,
>
> I'd like to continue the discussion, since it is closely linked to the
> problem I'm currently working on.
>
> 1) writeSynchronizationMode should not be a part of the cache
> configuration, agreed.
> It should be up to the user to decide how strong the "update guarantee"
> should be.
> So, I propose to have a special cache proxy, .withBlaBla() (at 3.x).
>
> 2) A primary failure on !FULL_SYNC is not the only problem that leads to
> an inconsistent state.
> Bugs and incorrect recovery also cause the same problem.
>
> Currently, we have a solution [1] to check that the cluster is
> consistent, but it has poor resolution (it will only tell you which
> partitions are broken).
> So, to find the broken entries you need a special API that will check
> all copies and let you know what went wrong.
>
> 3) Since we mostly agree that a write should affect some backups
> synchronously, how about having similar logic for reads?
>
> So, I propose a special proxy, .withQuorumRead(backupsCnt), which will
> check an explicit number of backups on each read and return the latest
> values.
> This proxy is already implemented [2] for all copies, but I'm going to
> extend it with an explicit backups number.
>
> Thoughts?
>
> 3.1) Backups can be checked in two ways:
> - request data from all backups, but wait only for an explicit number
>   (solves the slow-backup issue, but produces extra traffic)
> - request data from an explicit number of backups (less traffic, but can
>   be as slow as checking all copies)
> Which strategy is better? Should it be configurable?
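The trade-off between the two strategies in 3.1 can be sketched with plain-Java futures. This is a toy model, not Ignite code: backup reads are simulated as Callable tasks, and all names here (waitForQuorum, askExactly) are illustrative assumptions, not proposed API.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

// Toy model of the two backup-check strategies from point 3.1.
public class QuorumReadSketch {

    // Strategy A: fan out to ALL backups, return once `quorum` replies
    // arrive. Tolerates a slow backup at the cost of extra traffic.
    static List<String> waitForQuorum(List<Callable<String>> backups,
                                      int quorum,
                                      ExecutorService pool) throws Exception {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        backups.forEach(cs::submit);
        List<String> replies = new ArrayList<>();
        for (int i = 0; i < quorum; i++)
            replies.add(cs.take().get()); // first `quorum` responses win
        return replies;
    }

    // Strategy B: ask EXACTLY `quorum` backups. Less traffic, but a
    // single slow node stalls the whole read.
    static List<String> askExactly(List<Callable<String>> backups,
                                   int quorum,
                                   ExecutorService pool) throws Exception {
        List<Future<String>> fs = backups.subList(0, quorum).stream()
            .map(pool::submit).collect(Collectors.toList());
        List<String> replies = new ArrayList<>();
        for (Future<String> f : fs)
            replies.add(f.get()); // must wait for every chosen backup
        return replies;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<String>> backups = List.of(
            () -> "v1",
            () -> { Thread.sleep(500); return "v1"; }, // slow backup
            () -> "v1");
        // Strategy A returns without waiting for the slow backup.
        System.out.println(waitForQuorum(backups, 2, pool)); // [v1, v1]
        pool.shutdown();
    }
}
```

With strategy A the slow backup never delays the read; with strategy B the same read would block for the full 500 ms if the slow node were among the chosen quorum, which is exactly the trade-off the question raises.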
>
> [1]
> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> [2] https://issues.apache.org/jira/browse/IGNITE-10663
>
> On Thu, Apr 25, 2019 at 7:04 PM Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > There's another point to improve:
> > If *syncPartitions=N* becomes configurable at run time, it will allow
> > managing the consistency/performance balance at runtime, e.g. switch
> > to full async for preloading and then go back to full sync for regular
> > operations.
> >
> > On Thu, Apr 25, 2019 at 6:48 PM Sergey Kozlov <skoz...@gridgain.com>
> > wrote:
> >
> > > Vyacheslav,
> > >
> > > You're right with the reference to the MongoDB doc. In general the
> > > idea is very similar. Many vendors use such an approach [1].
> > >
> > > [1]
> > > https://dev.mysql.com/doc/refman/8.0/en/replication-options-master.html#sysvar_rpl_semi_sync_master_wait_for_slave_count
> > >
> > > On Thu, Apr 25, 2019 at 6:40 PM Vyacheslav Daradur <daradu...@gmail.com>
> > > wrote:
> > >
> > >> Hi, Sergey,
> > >>
> > >> Makes sense to me in case of performance issues, but it may lead to
> > >> losing data.
> > >>
> > >> >> *by the new option *syncPartitions=N* (not the best name, just
> > >> >> for referring)
> > >>
> > >> Seems similar to "Write Concern" [1] in MongoDB. It is used in the
> > >> same way as you described.
> > >>
> > >> On the other hand, if you have such issues, they should be
> > >> investigated first: why do they cause performance drops (network
> > >> issues, etc.)?
> > >>
> > >> [1] https://docs.mongodb.com/manual/reference/write-concern/
> > >>
> > >> On Thu, Apr 25, 2019 at 6:24 PM Sergey Kozlov <skoz...@gridgain.com>
> > >> wrote:
> > >> >
> > >> > Ilya,
> > >> >
> > >> > See comments inline.
> > >> >
> > >> > On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev <
> > >> ilya.kasnach...@gmail.com> wrote:
> > >> >
> > >> > > Hello!
> > >> > >
> > >> > > When you have 2 backups and N = 1, how will conflicts be
> > >> > > resolved?
> > >> > >
> > >> > > Imagine that you had N = 1, and the primary node failed
> > >> > > immediately after the operation. Now you have one backup that
> > >> > > was updated synchronously and one that was not. Will they stay
> > >> > > unsynced, or is there some mechanism of re-syncing?
> > >> >
> > >> > The same way as Ignite processes failures for PRIMARY_SYNC.
> > >> >
> > >> > > Why would one want to "update 1 primary and 1 backup
> > >> > > synchronously, update the rest of the backup partitions
> > >> > > asynchronously"? What's the use case?
> > >> >
> > >> > The case is to have more backups but not pay the performance
> > >> > penalty for them :)
> > >> > For a distributed system, one backup looks risky, but more backups
> > >> > directly impact performance.
> > >> > Another point is to separate strictly consistent apps, like
> > >> > banking apps, from other apps, like fraud detection, analytics,
> > >> > and reports.
> > >> > In that case you can configure the partition distribution with a
> > >> > custom affinity and have the following:
> > >> > - a first set of nodes for critical (from the consistency
> > >> >   standpoint) operations
> > >> > - a second set of nodes that has async backup partitions only, for
> > >> >   the other operations (reports, analytics)
> > >> >
> > >> > > Regards,
> > >> > > --
> > >> > > Ilya Kasnacheev
> > >> > >
> > >> > > Thu, Apr 25, 2019 at 16:55, Sergey Kozlov <skoz...@gridgain.com>:
> > >> > >
> > >> > > > Igniters,
> > >> > > >
> > >> > > > I'm working with a wide range of cache configurations and
> > >> > > > found (from my standpoint) an interesting point for
> > >> > > > discussion:
> > >> > > >
> > >> > > > Now we have the following *writeSynchronizationMode* options:
> > >> > > >
> > >> > > > 1. *FULL_ASYNC*
> > >> > > >    - primary partition updated asynchronously
> > >> > > >    - backup partitions updated asynchronously
> > >> > > > 2. *PRIMARY_SYNC*
> > >> > > >    - primary partition updated synchronously
> > >> > > >    - backup partitions updated asynchronously
> > >> > > > 3. *FULL_SYNC*
> > >> > > >    - primary partition updated synchronously
> > >> > > >    - backup partitions updated synchronously
> > >> > > >
> > >> > > > The approach above covers everything if you have 0 or 1
> > >> > > > backups.
> > >> > > > But for 2 or more backups we can't reach the following case
> > >> > > > (something between *PRIMARY_SYNC* and *FULL_SYNC*):
> > >> > > > - update 1 primary and 1 backup synchronously
> > >> > > > - update the rest of the backup partitions asynchronously
> > >> > > >
> > >> > > > The idea is to merge all current modes into a single one and
> > >> > > > replace *writeSynchronizationMode* with a new option
> > >> > > > *syncPartitions=N* (not the best name, just for referring)
> > >> > > > that covers the approach:
> > >> > > >
> > >> > > > - N = 0 means *FULL_ASYNC*
> > >> > > > - N = (backups+1) means *FULL_SYNC*
> > >> > > > - 0 < N < (backups+1) means either *PRIMARY_SYNC* (N=1) or
> > >> > > >   the new mode described above
> > >> > > >
> > >> > > > IMO it will allow making more flexible and consistent
> > >> > > > configurations.
> > >> > > >
> > >> > > > --
> > >> > > > Sergey Kozlov
> > >> > > > GridGain Systems
> > >> > > > www.gridgain.com
> > >>
> > >> --
> > >> Best Regards, Vyacheslav D.
>

-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com
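The mapping from the proposed *syncPartitions=N* to today's modes can be written down as a small sketch. This is a toy encoding of the thread's proposal, not an Ignite API: syncPartitions is the thread's placeholder name, and PARTIAL_SYNC is an assumed label for the new in-between mode.

```java
// Toy encoding of the proposed syncPartitions=N option: given N and the
// backup count, derive which writeSynchronizationMode the write behaves
// like. PARTIAL_SYNC stands for the proposed "some backups sync" mode.
public class SyncPartitionsSketch {

    enum Mode { FULL_ASYNC, PRIMARY_SYNC, PARTIAL_SYNC, FULL_SYNC }

    static Mode effectiveMode(int syncPartitions, int backups) {
        if (syncPartitions < 0 || syncPartitions > backups + 1)
            throw new IllegalArgumentException("need 0 <= N <= backups + 1");
        if (syncPartitions == 0) return Mode.FULL_ASYNC;           // N = 0
        if (syncPartitions == backups + 1) return Mode.FULL_SYNC;  // all copies
        if (syncPartitions == 1) return Mode.PRIMARY_SYNC;         // primary only
        return Mode.PARTIAL_SYNC; // 1 < N < backups + 1: the new mode
    }

    public static void main(String[] args) {
        // With 2 backups, N = 2 is the "1 primary + 1 backup sync" case
        // that none of the current three modes can express.
        System.out.println(effectiveMode(2, 2)); // PARTIAL_SYNC
        System.out.println(effectiveMode(3, 2)); // FULL_SYNC
    }
}
```

Note that with 0 backups N = 1 maps to FULL_SYNC here, which matches the observation above that the current modes already cover the 0- and 1-backup cases.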