Markus Wanner wrote:
I don't think synchronous replication guarantees that it will be
immediately visible. Even if it did push the change to the other
machine, and the other machine had committed it, that doesn't guarantee
that any reader sees it any more than if I commit to the same machine
(no replication), I am guaranteed to see the change from another
session.

AFAIK every snapshot taken after a transaction has acknowledged its
commit is guaranteed to see changes from that transaction. Isn't that a
pretty frequent and obvious user expectation?

Yes - but that's only really true while the session continues. From another session? I've never assumed that I could reconnect and be guaranteed to get the latest snapshot that includes absolutely everything that has been committed.

Any system that guaranteed this even when involving multiple machines would be guaranteed to be inefficient and difficult to scale in my opinion. How could any system promise to have reasonable commit times while also guaranteeing that once a commit completes, any session to any other server will be able to see the commit? I think this forces some sort of serialization between multiple machines and defeats the purpose of having multiple machines. Where before it was indeterminate to know when the commit would take effect at each replica, it's not indeterminate when my commit will succeed. That is, my commit cannot succeed until every single server acknowledge that it is has fully received and committed my transaction. What happens if there are network problems, or what happens if I am replicating over a slower link? What if I am committing to 100 servers? Is it reasonable to expect 100 server negotiations to complete in full before my own commit will return?

Synchronous replication only means that I can be assured that
my change has been saved permanently by the time my commit completes. It
doesn't mean anybody else can see my change or is guaranteed to see my
change if the query from another session.
So you wouldn't be surprised if a transaction from two hours ago isn't
visible on another node, just because that node happens to be rather
busy with lots of other readers and maintenance tasks?

Any system that is two hours behind should fall out of the pool used to satisfy reads from. So, if there was a surprise, it would be this. I don't believe ACID requires that a commit on one server is immediately visible on another server. Any work I do on the "behind" server would still be safe from a transaction and referential integrity perspective. However, if I executed 'commit' on this "behind" server, I would expect the commit to wait until it catches up, or in the case of a 2 hour behind, I would expect the commit to fail. Look at the alternative - all commits to any server in the pool would be locked up waiting for this one machine to catch up on 2 hours of transaction. This emphasizes that the problem is that a server two hours of date is still in the pool, rather than the problem being keeping things up-to-date.


If my application assumes that it can commit to one server, and then
read back the commit from another server, and my application breaks as a
result, it's because I didn't understand the problem.
Well, yeah, depends on user expectations. I'm surprised to hear that you
have that understanding of synchronous replication.

I've seen people face it in the past. Most recently we had a presentation from the developer of digg.com, and he described how he had this problem with MySQL and that he had to work around it.

On a smaller scale and slightly unrelated, I had this problem frequently between memcache and PostgreSQL. That is, memcache would always be latest, but PostgreSQL might not be latest, because the commit had not occurred.

It seems like a standard enough problem to me. I don't expect Postgres-R to do the impossible. As with my previous paragraph, I don't expect Postgres-R to wait 2-hours to commit just because one server is falling behind.

Even if PostgreSQL
didn't use the word "synchronous replication", I could still be
confused. I need to understand the problem no matter what words are used.

As said, it depends on what the common understanding of "synchronous
replication" is. I've so far been under the impression, that these
potential lags are unexpected and confusing. Several people pointed me
at that problem and I've thus "relabeled" Postgres-R as not being
synchronous. I'm at least surprised to suddenly get pushed into the
other direction. :-)

However, I absolutely agree that it's not that important how we name it.
What is important, is that users and developers understand the difference

I agree they are unexpected and confusing. I don't agree that they are unexpected or confusing to those knowledgeable in the domain. So, the question becomes - whose expectation is wrong? Should the user learn more? Or should we push for a change in terminology? Does it make sense for Postgres-R (which looks excellent to me BTW, at least in principle) be marketed differently, because a few users tie "synchronous replication" to "serialized access"?

Because that's really what we're talking about - we're talking about transactions in all sessions being serialized between machines to provide less surprise to users who don't understand the complexity of having multiple replicas.

Forget replication - even for the exact same server - I don't expect that if I commit from one session, I will be able to see the change immediately from my other session or a new session that I just opened. Perhaps this is often stable to rely on this, and it is useful for the database server to minimize the window during which the commit becomes visible to others, but I think it's a false expectation from the start that it absolutely will be immediately visible to another session. I'm thinking of situations where some part of the table is in cache. The only way the commit can communicate that the new transaction is available is by during communication between the processes or threads, or between the multiple CPUs on the machine. Do I want every commit to force each session to become fully in alignment before my commit completes? Does PostgreSQL make this guarantee today? I bet it doesn't if you look far enough into the guts. It might be very fast - I don't think it is infinitely fast.

Cheers,
mark

--
Mark Mielke <m...@mielke.cc>

Reply via email to