Re: Rework locking architecture for MVCC and transactional SQL

Vladimir Ozerov Fri, 15 Dec 2017 05:43:13 -0800

Alex,

That might be a very good idea. In fact, what you describe closely resembles
TempDB in MS SQL Server [1]. It is also offloaded to disk, minimally logged
and purged on restart. Ideally we could try designing this component in a
generic way, so that it could store a lot of different temporary data:
1) Locks
2) UNDO data
3) Sort/join data (for SELECT and CREATE INDEX, statistics, whatsoever)
4) If needed - visibility info (e.g. for covering indexes and purge/vacuum)
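For illustration only, here is a minimal Python sketch of such a generic component. It assumes a plain key/value interface with one namespace per kind of data (locks, undo, sort runs, visibility info). `TempStore`, its methods, the default directory and the in-memory budget are hypothetical names, not Ignite or SQL Server APIs. Python's `shelve` stands in for the disk-backed region. The sketch shows the three properties mentioned above: it spills to disk, logs nothing, and is purged on restart.

```python
import os
import shelve
import shutil
import tempfile


class TempStore:
    """Hypothetical volatile, namespaced key/value store in the spirit of TempDB.

    Entries stay in memory up to `max_in_memory`; overflow goes to one file per
    namespace. Nothing is logged, so there is no crash recovery: re-creating
    the store (i.e. a node restart) simply wipes the previous contents.
    """

    def __init__(self, directory=None, max_in_memory=1000):
        # Default location is an assumption; any node-local scratch dir works.
        self._dir = directory or os.path.join(tempfile.gettempdir(), "tempstore")
        # "Purged on restart": drop whatever a previous run left behind.
        shutil.rmtree(self._dir, ignore_errors=True)
        os.makedirs(self._dir)
        self._max = max_in_memory
        self._mem = {}    # (namespace, key) -> value, the in-memory tier
        self._disk = {}   # namespace -> open shelve used as the spill file

    def _spill(self, namespace):
        if namespace not in self._disk:
            path = os.path.join(self._dir, namespace)
            self._disk[namespace] = shelve.open(path)
        return self._disk[namespace]

    def put(self, namespace, key, value):
        mem_key = (namespace, key)
        if mem_key in self._mem or len(self._mem) < self._max:
            self._mem[mem_key] = value
            # Drop an older spilled copy so reads never see stale data.
            if namespace in self._disk:
                self._disk[namespace].pop(str(key), None)
        else:
            self._spill(namespace)[str(key)] = value

    def get(self, namespace, key, default=None):
        if (namespace, key) in self._mem:
            return self._mem[(namespace, key)]
        if namespace in self._disk:
            return self._disk[namespace].get(str(key), default)
        return default

    def remove(self, namespace, key):
        self._mem.pop((namespace, key), None)
        if namespace in self._disk:
            self._disk[namespace].pop(str(key), None)

    def spilled(self, namespace):
        """Number of entries of `namespace` currently offloaded to disk."""
        return len(self._disk[namespace]) if namespace in self._disk else 0

    def close(self):
        for shelf in self._disk.values():
            shelf.close()
        self._disk.clear()
```

Locks would live in a `"locks"` namespace next to `"undo"` or `"sort"`. Because nothing is logged, re-creating the store after a restart is the entire recovery procedure.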


WDYT?

Vladimir.

[1]
https://docs.microsoft.com/en-us/sql/relational-databases/databases/tempdb-database

On Fri, Dec 15, 2017 at 4:26 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Vladimir,
>
> What about moving the entire locking mechanism to a separate off-heap
> memory region which will be volatile wrt restarts, but will still support
> off-load to disk? In the current architecture, it means that we will need
> to allocate a separate DataRegion with no WAL and no crash recovery - locks
> are meaningless after a restart, and we will automatically drop them. It
> would be interesting to prototype this because I think we may be on par
> with on-heap lock placement, as we already proved for in-memory caches.
>
> 2017-12-14 21:53 GMT+03:00 Denis Magda <dma...@apache.org>:
>
> > Vladimir,
> >
> > No it’s crystal clear, thanks.
> >
> > If this approach works only for Ignite persistence-based deployments, how
> > will we handle locking for pure in-memory and 3rd-party database caching
> > scenarios? As I understand it, the tuples will still be stored in page
> > memory, while there won't be any opportunity to fall back to disk if
> > memory usage exceeds some threshold.
> >
> > —
> > Denis
> >
> > > On Dec 13, 2017, at 11:21 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > Denis,
> > >
> > > Sorry, maybe I was not clear enough - "tuple approach" and "persistent
> > > approach" are the same. By "tuple" I mean a row stored inside a data
> > > block. Currently we store lock information in the Java heap, and the
> > > proposal is to move it to data blocks. The main driver is memory - if
> > > there are a lot of rows to be locked, we will either run out of memory
> > > or produce serious memory pressure. For example, an update of 1M
> > > entries currently consumes ~500 MB of heap. With the proposed approach
> > > it will consume almost nothing. The drawback is an increased number of
> > > dirty data pages, but it should not be a problem because in the final
> > > implementation we will update data rows before the prepare phase
> > > anyway, so I do not expect any write amplification in the usual case.
> > >
> > > This approach is only applicable for Ignite persistence.
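Here is a back-of-the-envelope reading of the heap figure above. It assumes the ~500 MB is entirely per-entry lock state, that 1 MB is taken as 10^6 bytes, and that the cost grows linearly with the number of locked rows. The 100M-row case is an extrapolation, not a measurement from the thread:

```latex
\frac{500\ \text{MB}}{10^{6}\ \text{entries}} \approx 500\ \text{bytes per locked entry},
\qquad
10^{8}\ \text{rows} \times 500\ \text{bytes} = 5 \times 10^{10}\ \text{bytes} \approx 50\ \text{GB}.
```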
> > >
> > > On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda <dma...@apache.org> wrote:
> > >
> > >> Vladimir,
> > >>
> > >> Thanks for a thorough overview and proposal.
> > >>
> > >>> Also, we could try employing a tiered approach:
> > >>> 1) Try to keep everything in memory to minimize writes to blocks.
> > >>> 2) Fall back to persistent lock data if a certain threshold is reached.
> > >>
> > >> What are the benefits of the backed-by-persistence approach compared
> > >> to the one based on tuples? Specifically:
> > >> - will the persistence approach work for both 3rd party and Ignite
> > >> persistence?
> > >> - any performance impacts depending on the chosen method?
> > >> - what’s faster to implement?
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>
> > >>> Igniters,
> > >>>
> > >>> As you probably know, we are actively working on the MVCC [1] and
> > >>> transactional SQL [2] features, which could be treated as a single
> > >>> huge improvement. We face a number of challenges, and one of them is
> > >>> locking.
> > >>>
> > >>> At the moment, information about all locks is kept in memory on a
> > >>> per-entry basis (see GridCacheMvccManager). For every locked key we
> > >>> maintain the current lock owner (XID) and the list of would-be-owner
> > >>> transactions. When a transaction is about to lock an entry, two
> > >>> scenarios are possible:
> > >>> 1) If the entry is not locked, we obtain the lock immediately.
> > >>> 2) If the entry is locked, we add the current transaction to the wait
> > >>> list and jump to the next entry to be locked. Once the first entry is
> > >>> released by the conflicting transaction, the current transaction
> > >>> becomes the owner of the first entry and tries to promote itself for
> > >>> subsequent entries.
> > >>>
> > >>> Once all required locks are obtained, a response is sent to the caller.
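The following simplified Python model of the on-heap scheme described above makes the memory argument concrete. Every locked key costs an owner record, and every conflict costs a wait-list entry. `InMemoryLockTable` and its methods are hypothetical. This is not the actual GridCacheMvccManager code, and deadlock detection, timeouts and remote nodes are left out.

```python
from collections import deque


class InMemoryLockTable:
    """Hypothetical per-key lock records kept on the heap."""

    def __init__(self):
        self.owner = {}     # key -> XID of the current lock owner
        self.waiters = {}   # key -> deque of would-be-owner XIDs
        self.pending = {}   # XID -> keys it is still waiting for

    def lock_all(self, xid, keys):
        """Lock every key; True if all were acquired without waiting."""
        self.pending[xid] = set()
        for key in keys:
            if key not in self.owner:
                self.owner[key] = xid               # scenario 1: free, take it
            elif self.owner[key] != xid:
                # Scenario 2: join the wait list and move on to the next key.
                self.waiters.setdefault(key, deque()).append(xid)
                self.pending[xid].add(key)
        return not self.pending[xid]

    def release_all(self, xid):
        """Release xid's locks; return XIDs that now hold all their locks."""
        ready = []
        for key in [k for k, o in self.owner.items() if o == xid]:
            queue = self.waiters.get(key)
            if queue:
                nxt = queue.popleft()               # promote the next waiter
                self.owner[key] = nxt
                self.pending[nxt].discard(key)
                if not self.pending[nxt]:
                    ready.append(nxt)               # its response can be sent
                if not queue:
                    del self.waiters[key]
            else:
                del self.owner[key]
        self.pending.pop(xid, None)
        return ready
```

Each locked key leaves at least one entry in `owner` and possibly one in `waiters`, so heap usage grows with the number of locked rows.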
> > >>>
> > >>> This approach doesn't work well for transactional SQL - if we update
> > >>> millions of rows in a single transaction, we will simply run out of
> > >>> memory. To mitigate the problem, other database vendors keep
> > >>> information about locks inside the tuples. I propose to apply a
> > >>> similar design as follows:
> > >>>
> > >>> 1) No per-entry lock information is stored in memory anymore.
> > >>> 2) The list of active transactions is still maintained in memory.
> > >>> 3) When a TX locks an entry, it sets a special marker on the tuple [3].
> > >>> 4) When a TX meets an already locked entry, it enlists itself in the
> > >>> wait queue of the conflicting transaction and suspends.
> > >>> 5) When the first transaction releases the conflicting lock, it
> > >>> notifies and wakes up the suspended transactions, so they resume
> > >>> locking.
> > >>> 6) Entry lock data is cleared on transaction commit.
> > >>> 7) Entry lock data is not cleared on rollback or node restart;
> > >>> instead, we could use the active transactions list to identify
> > >>> invalid locks and overwrite them as needed.
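The seven steps above can be sketched as a hypothetical single-node model in Python:

- `Row.lock_xid` stands in for whatever marker the final MVCC design puts in the tuple [3].
- "Suspend" is modelled by `try_lock` returning `False`.
- "Wake up" is modelled by returning the waiters that should retry.

XIDs are assumed never to be reused. All class and method names are illustrative. The point of the sketch is that memory now scales with active transactions and waiters, not with locked rows.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Row:
    """A tuple inside a data block; lock_xid plays the role of the lock marker."""
    value: Any
    lock_xid: Optional[str] = None


class TupleLockManager:
    """Hypothetical model of the proposal: no per-entry lock state on the heap."""

    def __init__(self):
        self.active = set()       # step 2: active transactions kept in memory
        self.wait_queues = {}     # XID -> deque of XIDs suspended behind it

    def begin(self, xid):
        self.active.add(xid)
        self.wait_queues[xid] = deque()

    def try_lock(self, xid, row):
        """Steps 3-4: True if xid now holds the row, False if it must wait."""
        holder = row.lock_xid
        # Step 7: a marker left by a finished (rolled back or pre-restart)
        # transaction is not in the active set, so it is simply overwritten.
        if holder is None or holder == xid or holder not in self.active:
            row.lock_xid = xid
            return True
        self.wait_queues[holder].append(xid)    # step 4: enlist and suspend
        return False

    def commit(self, xid, locked_rows):
        """Step 6: clear this TX's markers, then wake its waiters."""
        for row in locked_rows:
            if row.lock_xid == xid:
                row.lock_xid = None
        return self._finish(xid)

    def rollback(self, xid):
        """Step 7: markers stay in the rows and become stale right here."""
        return self._finish(xid)

    def _finish(self, xid):
        self.active.discard(xid)
        # Step 5: suspended transactions are returned so they can retry.
        return list(self.wait_queues.pop(xid, ()))
```

A node restart is the degenerate case: the active set starts empty, so every marker found in the rows is treated as stale without any cleanup pass.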
> > >>>
> > >>> Also, we could try employing a tiered approach:
> > >>> 1) Try to keep everything in memory to minimize writes to blocks.
> > >>> 2) Fall back to persistent lock data if a certain threshold is reached.
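Here is one possible reading of the tiered idea, sketched in Python. It assumes the threshold is counted per transaction: a transaction keeps its locks on the heap until it holds `threshold` of them, then moves them into the rows and keeps locking there. `TieredLocks`, the per-transaction threshold and the dict standing in for in-tuple markers are all hypothetical. Release and waiting are omitted.

```python
class TieredLocks:
    """Hypothetical two-tier lock placement with a per-transaction threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.mem = {}           # tier 1: row_id -> XID, kept on the heap
        self.held = {}          # XID -> row_ids it holds in tier 1
        self.spilled = set()    # XIDs that have switched to tier 2
        self.row_marker = {}    # tier 2: stands in for the in-tuple marker

    def owner(self, row_id):
        if row_id in self.mem:
            return self.mem[row_id]
        return self.row_marker.get(row_id)

    def lock(self, xid, row_id):
        """Return False on conflict (waiting is out of scope here)."""
        if self.owner(row_id) not in (None, xid):
            return False
        if xid in self.spilled:
            self.row_marker[row_id] = xid       # tier 2: dirties the data page
            return True
        held = self.held.setdefault(xid, set())
        held.add(row_id)
        self.mem[row_id] = xid
        if len(held) >= self.threshold:
            # Threshold reached: move all of this TX's locks into the rows.
            for rid in held:
                self.row_marker[rid] = xid
                del self.mem[rid]
            del self.held[xid]
            self.spilled.add(xid)
        return True
```

With a per-transaction threshold, short OLTP transactions stay entirely in memory and produce no extra dirty pages. A bulk UPDATE pays the page-write cost only after it has proven to be large.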
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
> > >>> [2] https://issues.apache.org/jira/browse/IGNITE-4191
> > >>> [3] Depends on the final MVCC design - it could be per-tuple XID, undo
> > >>> vectors, per-block transaction lists, etc.
> > >>>
> > >>> Vladimir.
> > >>
> > >>
> >
> >
>
