David,

I think you're quite close to the mechanism used by Eve, the system
detailed in a paper I sent earlier
<http://web.cse.ohio-state.edu/~yangwang/osdi12-final190.pdf>.

A few questions:

1) Can follower nodes service reads, or are all reads serviced by the
leader?

2) Is the APPROVE_COMMIT message sent after every transaction, or every
batch of transactions?

3) How does hashing work in the case of a delete? Do you send hashes of the
deleted pages as well?

4) In the case of a ROLLBACK, what do you roll back to? In my previous
example, even if you roll back the update (2) on the follower node after
committing the insert, this protocol is not guaranteed to make forward
progress, since you would have to roll back the insert as well (which you
have already committed).

Adam



On Wed, Apr 20, 2016 at 1:58 AM, David Barrett <[email protected]>
wrote:

> Adam -- The scenario you describe seems to depend on:
>
> - Transaction A modifying a page to match a WHERE clause
> - Transaction B updating a page that matches a WHERE clause
>
> However, I wonder if that can be solved by passing along a hash of the
> pages read or written with the APPROVE_COMMIT message sent to slaves.  (We
> use a two-phase commit algorithm for all transactions.)  So it'd work like
> this:
>
> 1) After executing the query on the master, generate two hashes: one of
> the pages read, and the other of the pages written.
> 2) Include those two hashes in the APPROVE_COMMIT message to the slave.
> 3) Recalculate the hashes after executing the transaction on the slave.
> 4) Compare the master and slave hashes.
> 5) If they match, the slave approves the commit.
> 6) If all slaves approve, the master sends out the COMMIT message, and
> everybody commits.
> 7) If even one slave rejects it, the master sends out a ROLLBACK message.
> 8) After ROLLBACK, the master just re-tries and hopes the next time it
> doesn't conflict.  (Or, it runs all conflicting transactions with their own
> "nonConflictingBatchID", just to be safe.)
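>
> The hash comparison could be sketched like this (a minimal sketch, not
> the actual implementation: the {page_id: bytes} page representation and
> the choice of SHA-256 are assumptions for illustration):

```python
import hashlib

def pages_digest(pages):
    """Deterministically hash a set of pages, given as {page_id: bytes}.
    Sorting by page id makes the digest independent of access order."""
    h = hashlib.sha256()
    for page_id, data in sorted(pages.items()):
        h.update(str(page_id).encode())
        h.update(data)
    return h.hexdigest()

def slave_approves(master_read_hash, master_write_hash,
                   slave_reads, slave_writes):
    """Recompute both hashes on the slave and compare them against the
    master's hashes from the APPROVE_COMMIT message."""
    return (pages_digest(slave_reads) == master_read_hash and
            pages_digest(slave_writes) == master_write_hash)
```

> A slave whose transaction touched even one page the master didn't (or
> vice versa) produces a different digest and rejects the commit.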
>
> Basically, any "hidden conflict" between two queries that is overlooked by
> the master would either:
>
> a. Not matter if the slaves coincidentally commit the transactions in the
> same order
> b. Be caught by at least one slave noticing that it read or wrote a page
> that the master didn't.
>
> Would this address the hole you identified?  Can you see other holes in
> the algorithm presented?
>
> -david
>
>
> On Tue, Apr 19, 2016 at 10:53 AM, David Barrett <[email protected]>
> wrote:
>
>> Thank you for the comments!
>>
>> Re: Michael - Great point.  One solution to the "two atomic increments of
>> the same value" problem is to just ROLLBACK the 2nd transaction and re-run
>> it after the first batch finishes committing.  This way non-conflicting
>> transactions can safely commit at the same time, while conflicting
>> transactions remain serialized.  Do you agree this would solve that problem?
>>
>> Re: Adam - Huh, really great example.  I don't see a solution off the
>> cuff; I'll need to think on this (and determine if this theoretical case
>> would actually affect our real-world schema).
>>
>> Thanks both, and anybody else who sees problems in the design, please
>> share your thoughts!
>>
>> -david
>>
>>
>>
>> On Tue, Apr 19, 2016 at 6:11 AM, Adam Midvidy <[email protected]> wrote:
>>
>>> David,
>>>
>>> I don't think this conflict detection mechanism works well with updates.
>>>
>>> Consider the following schedule of transactions, running against an
>>> empty table 'foo' with two integer valued columns 'a' and 'b':
>>>
>>> 1) UPDATE foo SET b=2 WHERE a=1
>>>
>>> 2) INSERT INTO foo VALUES (1, 1), (1, 1)
>>>
>>> On the primary the first update is a no-op since the table is empty.
>>> After the subsequent insert the contents of foo are ((1, 1), (1, 1)).
>>>
>>> These transactions are batched together since they touch no common
>>> pages.
>>>
>>> When the replica applies the batch, 2 completes before 1, and the
>>> contents of 'foo' on the secondary are ((1, 2), (1, 2)). As you can see,
>>> the replicas have diverged!
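>>>
>>> The divergence is easy to reproduce with a toy model, where 'foo' is a
>>> list of (a, b) tuples and the two functions are hypothetical stand-ins
>>> for the SQL statements above:

```python
def txn1(table):
    # UPDATE foo SET b=2 WHERE a=1
    return [(a, 2) if a == 1 else (a, b) for (a, b) in table]

def txn2(table):
    # INSERT INTO foo VALUES (1, 1), (1, 1)
    return table + [(1, 1), (1, 1)]

primary = txn2(txn1([]))   # primary runs 1 then 2 -> [(1, 1), (1, 1)]
replica = txn1(txn2([]))   # replica runs 2 then 1 -> [(1, 2), (1, 2)]
assert primary != replica  # the replicas have diverged
```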
>>>
>>> In general it will be hard to do this type of conflict analysis with
>>> physical replication; you would need to do some kind of analysis on the
>>> DML itself. Unfortunately, that boils down to detecting when arbitrary
>>> compositions of boolean predicates intersect, which is NP-complete.
>>>
>>> Adam
>>>
>>> On Tue, Apr 19, 2016 at 3:30 AM, David Barrett <[email protected]>
>>> wrote:
>>>
>>>> Thank you so much for the detailed response!  I'm really surprised that
>>>> MySQL and Oracle are so... basic for replication.  Here this whole time
>>>> I've assumed they had some magic capability, huh.  Anyway, regarding this:
>>>>
>>>> On Mon, Apr 18, 2016 at 10:04 PM, Adam Midvidy <[email protected]>
>>>> wrote:
>>>>
>>>>> I can't think of any CP systems used in industry that use your idea of
>>>>> batching non-conflicting replication updates and allowing replicas to 
>>>>> apply
>>>>> them in parallel. ... The reason why this approach isn't popular is that
>>>>> efficiently detecting non-conflicting transactions is actually not a
>>>>> trivial task by any means
>>>>>
>>>>
>>>> Maybe I'm just naive, but it seems straightforward to detect
>>>> conflicting transactions.  Basically:
>>>>
>>>> 1. Start out with:
>>>> - nonConflictingBatchID = 0
>>>>
>>>> 2. Set a callback that is called every time a query touches a page.  In
>>>> this callback, maintain a series of sets:
>>>> - pagesReadByThisTransaction
>>>> - pagesWrittenByThisTransaction
>>>>
>>>> 3. Before you commit, compare those pages to:
>>>> - pagesReadByCommittedTransactions
>>>> - pagesWrittenByCommittedTransactions
>>>>
>>>> 4. It's totally fine for the new transaction to read from a page read
>>>> by other transactions already committed.  But if you read from a page
>>>> written by another transaction, or write to a page read by another
>>>> transaction, then you know that the new transaction conflicts with one or
>>>> more of the previous, already-committed transactions.
>>>>
>>>> 5. If there are no conflicts, then commit this new transaction with
>>>> confidence that it didn't conflict with any of the other transactions in
>>>> this batch.  Send out this transaction to slaves with the current
>>>> "nonConflictingBatchID", signaling to slaves that they are free to commit
>>>> this transaction simultaneously with other transactions in the same batch.
>>>> Also:
>>>> - pagesReadByCommittedTransactions += pagesReadByThisTransaction +
>>>> pagesWrittenByThisTransaction; // A write is also a read
>>>> - pagesWrittenByCommittedTransactions += pagesWrittenByThisTransaction;
>>>>
>>>> 6. If there is a conflict, however, then you can still go ahead and
>>>> commit this transaction on the master.  However, increment the
>>>> "nonConflictingBatchID" before replicating out the transaction, to signal
>>>> "wait for all previous batches to finish being committed before
>>>> committing", and then set:
>>>> - pagesReadByCommittedTransactions = pagesReadByThisTransaction;
>>>> - pagesWrittenByCommittedTransactions = pagesWrittenByThisTransaction;
>>>>
>>>> This doesn't actually seem that hard, and maintaining a series of
>>>> unordered sets of page numbers seems like it'll be pretty fast -- much
>>>> faster than the queries themselves, at least.
>>>>
>>>> As for:
>>>>
>>>> and that when the workload is composed of non-conflicting transactions,
>>>>> partitioning (sharding) provides the same performance benefits with less
>>>>> complexity.
>>>>>
>>>>
>>>> True sharding isn't an option, as mentioned before.  But this
>>>> effectively creates "ephemeral shards" by detecting non-conflicting queries
>>>> in realtime.  In practice I don't think it'll take much to get
>>>> pretty massive improvements: if I can get even *two* non-conflicting
>>>> commits on average, that's the same performance advantage as having 2
>>>> perfectly disjoint shards.  But in practice, I expect we'll get vastly more
>>>> concurrency than that, since we have millions of users across a large
>>>> database -- the odds that two back-to-back transactions are even for the
>>>> same company (let alone linked in such a way as to conflict) seem really
>>>> low.
>>>>
>>>> Anyway, that's my plan above.  I'd be curious for your thoughts if you
>>>> think it'll work, or if I'm missing something obvious.  Thanks!
>>>>
>>>> -david
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> p2p-hackers mailing list
>>>> [email protected]
>>>> http://lists.zooko.com/mailman/listinfo/p2p-hackers
>>>>
>>>>
>>>
>>>
>>
>
>
>