Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

Thomas Munro Thu, 12 Nov 2015 11:12:13 -0800

On Fri, Nov 13, 2015 at 1:16 AM, Simon Riggs <[email protected]> wrote:

> On 11 November 2015 at 09:22, Thomas Munro <[email protected]>
> wrote:
>
>
>> 1.  Reader waits with exposed LSNs, as Heikki suggests.  This is what
>> BerkeleyDB does in "read-your-writes" mode.  It means that application
>> developers have the responsibility for correctly identifying transactions
>> with causal dependencies and dealing with LSNs (or whatever equivalent
>> tokens), potentially even passing them to other processes where the
>> transactions are causally dependent but run by multiple communicating
>> clients (for example, communicating microservices).  This makes it
>> difficult to retrofit load balancing to pre-existing applications and (like
>> anything involving concurrency) difficult to reason about as applications
>> grow in size and complexity.  It is efficient if done correctly, but it is
>> a tax on application complexity.
>>
>
> Agreed. This works if you have a single transaction connected thru a pool
> that does statement-level load balancing, so it works in both session and
> transaction mode.
>
> I was in favour of a scheme like this myself, earlier, but have more
> thoughts now.
>
> We must also consider the need for serialization across sessions or
> transactions.
>
> In transaction pooling mode, an application could get assigned a different
> session, so a token would be much harder to pass around.
>

Sorry for the double reply, I just wanted to add a couple more thoughts.

As discussed elsewhere in the thread, while I think it makes absolute sense
to offer some kind of support for causality tokens, I don't see that on its
own as enough for most users.  (At the least, it would be good to have
pg_wait_for_xlog_replay_location(lsn, timeout), but perhaps explicit BEGIN
syntax as suggested by Heikki, or a new field in the libpq protocol which
can be attached to any statement, and likewise for the commit LSN of
results).

It's true that a pooling system/middleware could spy on your sessions and
insert causality token handling that imposes a global ordering of
visibility for you, so that naive users don't have to deal with tokens
themselves.  Whenever it
sees a COMMIT result (assuming they are taught to return LSNs), it could
update a highest-LSN-seen variable, and transparently insert a wait for
that LSN into every transaction that it sees beginning.  But then you would
have to push all your queries through a single point that can see
everything across all Postgres servers, and maintain this global high LSN.
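
To make that bookkeeping concrete, here is a minimal Python sketch of what
such a middleware might track.  It is only an illustration under stated
assumptions: COMMIT results do not return LSNs today, the function
pg_wait_for_xlog_replay_location is the hypothetical one mentioned above,
and the names CausalityTracker, observe_commit and wait_statement are
invented for this example.

```python
import threading


def parse_lsn(text):
    """Convert an LSN in PostgreSQL's 'hi/lo' hex notation to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Convert an integer back to 'hi/lo' hex notation."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


class CausalityTracker:
    """Hypothetical single-point observer keeping the highest LSN seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self._highest = 0

    def observe_commit(self, lsn_text):
        # Assumption: the COMMIT result carried this LSN back to the pool.
        lsn = parse_lsn(lsn_text)
        with self._lock:
            if lsn > self._highest:
                self._highest = lsn

    def wait_statement(self, timeout_ms=1000):
        # Statement the pool would inject at the start of each transaction.
        # pg_wait_for_xlog_replay_location is a proposed function, not an
        # existing one.
        with self._lock:
            highest = self._highest
        if highest == 0:
            return None  # nothing committed yet, so nothing to wait for
        return "SELECT pg_wait_for_xlog_replay_location('%s', %d)" % (
            format_lsn(highest), timeout_ms)
```

Every session would have to share one such tracker, which is exactly the
single point of observation described above.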

In contrast, my writer-waits proposal makes different trade-offs to provide
causal reads as a built-in feature, without an external single-point
observer of all transactions.

-- 
Thomas Munro
http://www.enterprisedb.com
