Re: [HACKERS] Hot standby, overflowed snapshots, testing
On 11/15/09 2:25 AM PST, "Simon Riggs" wrote:

> On Sat, 2009-11-14 at 08:43 -0800, Robert Hodges wrote:
>
>> I can help set up automated basic tests for hot standby using 1+1 setups on Amazon. I'm already working on tests for warm standby for our commercial Tungsten implementation and need to solve the problem of creating tests that adapt flexibly across different replication mechanisms.
>
> I didn't leap immediately to say yes for a couple of reasons.

I'm easy on this. We are going to find some hot standby problems no matter what from our own testing. At least I hope so.

It does sound to me as if there is a class of errors that would be easiest to find by putting up a long-running test that throws a lot of different queries at the server over time. We have such tests already written in our Bristlecone tools.

Cheers, Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby, overflowed snapshots, testing
Hi Simon and Heikki,

I can help set up automated basic tests for hot standby using 1+1 setups on Amazon. I'm already working on tests for warm standby for our commercial Tungsten implementation and need to solve the problem of creating tests that adapt flexibly across different replication mechanisms.

It would be nice to add a list of test cases to the write-up on the Hot Standby wiki (http://wiki.postgresql.org/wiki/Hot_Standby). I would be happy to help with that effort.

Cheers, Robert

On 11/13/09 1:43 PM PST, "Simon Riggs" wrote:

> On Fri, 2009-11-13 at 22:19 +0200, Heikki Linnakangas wrote:
>
>> I got the impression earlier that you had some test environment set up to test hot standby. Can you share any details of what test cases you've run?
>
> Fair question. The Sep 15 submission happened too quickly for us to mobilise testers, so the final submission was submitted with only manual testing by me. Many last-minute major bug fixes meant that the code was much less tested than I would have hoped - you found some of those while I lay exhausted from the effort to hit a superimposed and unrealistic deadline. I expected us to kick in to fix those but it never happened, and that was why I was keen to withdraw the patch about a week later.
>
> You've been kicking hell out of it for a while now, rightly so, so I've left it a while before commencing another set of changes and more testing to follow.
>
> It takes time, and money, to mobilise qualified testers, so that should begin again shortly.
>
> I agreed with you at PGday that we shouldn't expect a quick commit. There are good reasons for that, but still no panic in my mind about skipping this release.
> --
> Simon Riggs
> www.2ndQuadrant.com
Re: [HACKERS] write ahead logging in standby (streaming replication)
Hi Greg and Fujii,

Just a point on terminology: there's a difference between the usage of "semi-synchronous" in DRBD and in MySQL semi-synchronous replication, which was originally developed by Google. In the Google case, semi-synchronous replication is a quorum algorithm where clients receive a commit notification only after at least one of N slaves has received the replication event. In the DRBD case, semi-synchronous means that events have reached the slave but are not necessarily durable; there is no quorum.

Of these two usages, the Google semi-sync approach is the more interesting, because it avoids the availability problems associated with fully synchronous operation but gets most of the durability benefits.

Cheers, Robert

On 11/12/09 9:29 PM PST, "Fujii Masao" wrote:

> On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith wrote:
>
>> Right, those are the possibilities; all four of them have valid use cases in the field and are worth implementing. I don't like the label "semi-synchronous replication" myself, but it's a valuable feature to implement, and that is unfortunately the term other parts of the industry use for that approach.
>
> BTW, MySQL and DRBD use the term "semi-synchronous":
> http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
> http://www.drbd.org/users-guide/s-replication-protocols.html
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
Re: [HACKERS] Conflict resolution in Multimaster replication(Postgres-R)
Hi Srinivas,

Multi-master replication in Postgres-R is handled using a process called certification that ensures there are no serializability violations. Look at the paper by Kemme and Alonso entitled "Don't Be Lazy, Be Consistent..." (http://www.cs.mcgill.ca/~kemme/papers/vldb00.html).

In the first case you describe, one transaction must abort if applying both would break serializability. In the second case you describe, you must transmit read sets as well as write sets; the same sort of algorithm is applied as for writes.

Please email me directly if you want more information.

Thanks, Robert

On 9/3/08 4:02 PM, "M2Y" <[EMAIL PROTECTED]> wrote:

> Hello,
>
> My basic question is: in multimaster replication, if each site goes ahead and does the modifications issued by the transaction and then sends the writeset to the others in the group, how can the ACID properties be maintained?
>
> Details: Suppose there are two sites in the group, let's say A and B, managing a database D. Two transactions TA and TB, started on sites A and B respectively at nearly the same time, want to update the same row of a table in the database. As no locking structures or other concurrency-handling structures are replicated, each will go ahead, do the modifications in its own database, and send the writeset. Since both writesets contain an update to the same row, will the two transactions be rolled back, or does something else happen?
>
> A more general question is: for transaction isolation level 4 (serializable), must information such as row locks be transmitted across sites? If not, what is the mechanism that provides concurrency with serializability?
>
> Thanks, Srinivas

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
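[Editor's note] The certification idea referenced above can be shown with a toy model. This is not Postgres-R code - all names are illustrative - but it captures the core rule from the Kemme/Alonso paper: writesets are delivered to every site in the same total order, and a transaction aborts if its writeset intersects that of any transaction committed after it began. Since every site sees the same order, every site reaches the same commit/abort decision without distributed locking.

```python
# Toy model of writeset certification (after Kemme & Alonso's Postgres-R
# paper); illustrative names only, not actual Postgres-R internals.

def certify(txn_writeset, txn_start_seq, committed):
    """committed: [(commit_seq, writeset)] in global delivery order.
    Abort iff our writeset overlaps one committed after we started."""
    for commit_seq, writeset in committed:
        if commit_seq > txn_start_seq and writeset & txn_writeset:
            return False  # concurrent conflicting txn already won: abort
    return True           # no conflict: safe to commit at every site
```

In Srinivas's example, TA and TB both update the same row: whichever writeset is delivered first certifies successfully, and the other fails certification and rolls back, at every site, deterministically.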
Re: [HACKERS] Transaction-controlled robustness for replication
Hi Tom,

Part of this is semantics: I like Simon's logical vs. physical terminology because it distinguishes neatly between replication that copies the implementation down to OIDs etc. and replication that copies data content, including schema changes, but not the implementation. It seems a noble goal to get both to work well, as they are quite complementary.

There are various ways to get the information needed to recapitulate SQL, but piggy-backing off WAL record generation has a lot of advantages. You at least have the data structures and don't have to reverse-engineer log information on disk. Of the multiple ways to build capable logical replication solutions, this seems to involve the least effort.

My company is currently heads-down building a solution for Oracle based on reading REDO log files. It requires a master of Oracle dark arts to decode them, and it is also purely asynchronous. PostgreSQL will eventually be far better as these discussions boil down into designs.

Thanks, Robert

On 8/12/08 8:51 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

Markus Wanner <[EMAIL PROTECTED]> writes:
>> Robert Hodges wrote:
>>> Could you expand on why logical application of WAL records is impractical in these cases? This is what Oracle does. Moreover, once you are into SQL a lot of other use cases immediately become practical, such as large-scale master/slave set-ups for read scaling.
>
> I cannot speak for Tom, but what strikes me as a strange approach here is using the WAL for "logical application" of changes. That's because the WAL is quite far away from SQL, and thus from a "logical representation" of the data. It's rather pretty physical, meaning it's bound to a certain Postgres release and CPU architecture.

Right.
To take just one example: the value of MAXALIGN affects not only how many tuples you can put on a heap page (thus changing the TIDs of tuples, which fundamentally breaks most of the current types of WAL records) but how many tuples you can put on an index page (and thus index page split decisions, and thereby pretty much every single fact about the contents of upper btree levels). We need not go into architecture dependencies that are stronger than that one, though there are many.

As for version independence, who thinks they can WAL-replicate changes of the system catalogs into a different version with significantly different system catalogs? You couldn't even begin to make this work with anything approaching the current level of semantic detail of WAL entries.

What I think Simon was actually driving at was query-shipping, which is not my idea of "WAL" at all. It has some usefulness, but also a bunch of downsides of its very own, mostly centered around reproducibility. With the current WAL design I have some faith that the slaves reproduce the contents of the master. With any "logical replication" design that becomes a faith of the religious kind, because it sure isn't provable.

regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Transaction-controlled robustness for replication
Hi Tom,

Could you expand on why logical application of WAL records is impractical in these cases? This is what Oracle does. Moreover, once you are into SQL a lot of other use cases immediately become practical, such as large-scale master/slave set-ups for read scaling.

Thanks, Robert

On 8/12/08 12:40 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Simon Riggs <[EMAIL PROTECTED]> writes:
>> On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
>>> What is the attraction of logical application of the WAL logs? Transmitting to a server with different architecture?
>
>> Yes,
>> * different release
>> * different encoding
>> * different CPU architecture
>> * (with the correct transform) a different DBMS
>
> The notion that the WAL logs will ever be portable across such differences is so ... so ... well, it's barely worth laughing at.
>
> regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Follow-up on replication hooks for PostgreSQL
Hi Marko,

No fear, we definitely will discuss on pgsql-hackers. I just wanted to make sure that people understood we are still committed to solving this problem and will one way or another commit resources to help.

Just to be clear, by logical replication I mean replication based on sending SQL or near-SQL (e.g., generic DML events) between servers. Physical replication, on the other hand, uses internal formats to replicate changes without an intervening conversion to SQL, for example by shipping WAL records. There are advantages to each for different applications. BTW, I heard this nomenclature from Simon Riggs; it seems quite helpful.

The DDL trigger proposal is interesting and would be a very useful feature addition to PostgreSQL. To execute correctly it may also be necessary to know which database you were using at the time the SQL was issued.

For our part, we are looking for ways to replicate most or all data on a server as efficiently as possible. Generic call-outs at commit time or reading the log directly are attractive approaches. Depending on the implementation, you can avoid double writes of replicated data on the master host. Also, it avoids the management headache of ensuring that triggers are correctly installed. It seems as if one of these generic approaches could hook into WAL record transport.

Cheers, Robert

On 7/10/08 4:56 AM, "Marko Kreen" <[EMAIL PROTECTED]> wrote:

On 7/10/08, Robert Hodges <[EMAIL PROTECTED]> wrote:
> This is a quick update on a promise I made early in June to suggest requirements as well as ways to add replication hooks that would support logical replication, as opposed to the physical replication work currently underway based on NTT's code.
>
> Well, June was a pretty busy month, so it has taken a while to get back to this. However, we are now beginning to examine options for PostgreSQL logical replication. To make a long story short, we are willing to commit resources to this problem or fund other people to do it for us.
> If you are interested please contact me directly. Meanwhile, we are quite serious about this problem and intend to work on helpful additions to PostgreSQL in this area. I will post more as we make progress.

Well, I'm not exactly sure what you are planning. It's OK to do a draft design privately, but before actually starting to code, the design should be discussed on -hackers.

And I'm not exactly sure what you mean by "logical replication". A way to log DDL statements? Do you want to log DML also?

FWIW, here's a very rough draft design for functionality that could be used to make current Slony-I/Londiste-like solutions replicate DDL also:

1. A CREATE DDL TRIGGER statement that allows calling a function for all DDL statements. The only filtering that makes sense here is filtering by area: tables/functions/views/etc. It must be possible to create an AFTER trigger; whether a BEFORE trigger for DDL makes sense, I'm not sure.

2. When the function is called, the following information is given:
   - the object type the event was for (table/view/function),
   - an array of object names,
   - the SQL statement as text.

The trigger function can filter further based on object names, deciding whether it wants to log the event or not. Trying to make the trigger run on only a subset of events is complex, and parsing the SQL into pieces so the trigger can understand it better is also complex; neither is needed. Unless there is some common situation where such a simple design fails to work, I would not make the scheme more complex.

Also, the design should be based on the assumption that the target side is exactly in sync. E.g., DROP CASCADE should be replicated as DROP CASCADE. We should not make the scheme more complex to survive cases where the target is not in sync; that way madness lies. The effect should be as if the same SQL statements were applied to the target by hand - no more, no less.

-- marko

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
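[Editor's note] The two-part trigger contract Marko sketches can be mocked up as follows. This is a hypothetical Python stand-in (PostgreSQL itself would implement this in C and SQL); the schema-prefix filter and every name here are examples, not part of the proposal.

```python
# Mock-up of the proposed DDL-trigger function contract: the function
# receives the object type, object names, and raw SQL, and decides what
# to enqueue for replication. All names hypothetical.

def ddl_trigger(object_type, object_names, sql,
                replicated_prefixes=("public.",)):
    """Return a replication event dict, or None to ignore the statement."""
    if object_type not in ("table", "view", "function"):
        return None  # filter by "area", per point 1 of the proposal
    if not any(name.startswith(replicated_prefixes) for name in object_names):
        return None  # object names outside the replicated set: skip
    return {"type": object_type, "names": object_names, "sql": sql}
```

Note how the filtering happens entirely inside the trigger function, matching Marko's point that neither event subsetting nor SQL parsing needs to live in the server-side scheme.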
[HACKERS] Follow-up on replication hooks for PostgreSQL
Hi everyone,

This is a quick update on a promise I made early in June to suggest requirements as well as ways to add replication hooks that would support logical replication, as opposed to the physical replication work currently underway based on NTT's code.

Well, June was a pretty busy month, so it has taken a while to get back to this. However, we are now beginning to examine options for PostgreSQL logical replication. To make a long story short, we are willing to commit resources to this problem or fund other people to do it for us. If you are interested please contact me directly. Meanwhile, we are quite serious about this problem and intend to work on helpful additions to PostgreSQL in this area. I will post more as we make progress.

Thanks, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]

P.s., Happy 12th birthday everyone!
Re: [HACKERS] Table rewrites vs. pending AFTER triggers
Hi Gokul,

If you are saying that DDL should be auto-commit, yes, this really does limit some use cases. Transactional DDL is quite helpful for SQL generators, which need to avoid leaving the schema half-changed if the application crashes or there's a problem with the database that causes a command to fail. SLONY is an example of such a generator where transactional DDL would be helpful, though I don't know for a fact that SLONY uses it. We have used it in the past for building queues in SQL, which required multiple schema changes for a single queue.

In sum, it's much easier to implement such tools if you can do a set of schema changes atomically. There are no doubt other use cases as well.

Cheers, Robert

On 1/2/08 11:04 PM, "Gokulakannan Somasundaram" <[EMAIL PROTECTED]> wrote:

> Is there a reason why we allow DDLs inside a transaction and allow them to be rolled back? If we commit the previous transaction as soon as we encounter a DDL, and commit the DDL too (without waiting for commit), will it affect some use cases?
>
> I actually mean to say that DDLs could be declared as self-committing. That would get rid of these exceptions.
>
> Am I missing something?
>
> Thanks, Gokul.
>
> On Jan 3, 2008 12:02 AM, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
>>
>> Simon Riggs wrote:
>>> On Tue, 2008-01-01 at 16:09 -0500, Tom Lane wrote:
>>>> Paranoia would suggest forbidding *any* form of ALTER TABLE when there are pending trigger events, but maybe that's unnecessarily strong.
>>>
>>> That works for me. Such a combination makes no sense, so banning it is the right thing to do.
>>
>> +1. Doesn't make much sense to me either.
>>
>> cheers
>>
>> andrew

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Hannu,

On 6/1/08 2:14 PM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:

>> As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases.
>
> Isn't WAL-based replication "some sort of row-based replication"?

Yes, in theory. However, there's a big difference between replicating physical WAL records and doing logical replication with SQL statements. Logical replication requires extra information to reconstruct primary keys. (Somebody tell me if this is already in the WAL; I'm learning the code as fast as possible but assuming for now it's not.)

>> Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication.
>
> Triggers. See pgQ's logtrigga()/logutrigga() and the slides for Marko Kreen's presentation at pgCon08.

Thanks very much for the pointer. The slides look interesting.

Robert
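[Editor's note] The primary-key point above can be made concrete with a hypothetical sketch (not anything in the WAL code): turning a captured row change into SQL requires knowing which columns identify the target row - exactly the information a physical WAL record can do without, since it addresses the row by its physical position (TID) instead.

```python
# Hypothetical sketch: rendering a captured row change as a logical UPDATE
# needs the primary-key columns to build the WHERE clause; a physical WAL
# record needs none of this because it points at the row by TID.

def row_update_to_sql(table, pkey_cols, old_row, new_row):
    """Render a row change as an UPDATE keyed on the primary key."""
    sets = ", ".join(f"{col} = {new_row[col]!r}" for col in new_row
                     if new_row[col] != old_row[col])
    where = " AND ".join(f"{col} = {old_row[col]!r}" for col in pkey_cols)
    return f"UPDATE {table} SET {sets} WHERE {where};"
```

Without `pkey_cols` (or some other unique identifier), there is no safe WHERE clause to emit - which is why a logical change stream has to carry key information that the physical stream omits.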
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Merlin,

My point here is that with reasonably small extensions to the core you can build products that are a lot better than SLONY. Triggers do not cover DDL, among other issues, and it's debatable whether they are the best way to implement quorum policies like Google's semi-synchronous replication. As I mentioned separately, this topic deserves another thread, which I promise to start.

It is of course possible to meet some of these needs with an appropriate client interface to WAL shipping. There's no a priori reason why built-in PostgreSQL slaves need to be the only client. I would put in a vote for covering this possibility in the initial replication design. We are using a very similar approach in our own master/slave replication product.

Thanks, Robert

P.S., No offense intended to Jan Wieck et al. There are some pretty cool things in SLONY.

On 5/29/08 8:16 PM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:

On Thu, May 29, 2008 at 3:05 PM, Robert Hodges <[EMAIL PROTECTED]> wrote:
> Third, you can't stop with just this feature. (This is the BUT part of the post.) The use cases not covered by this feature are actually pretty large. Here are a few that concern me:
>
> 1.) Partial replication.
> 2.) WAN replication.
> 3.) Bi-directional replication. (Yes, this is evil, but there are problems where it is indispensable.)
> 4.) Upgrade support. Aside from database upgrade (how would this ever really work between versions?), it would not support zero-downtime app upgrades, which depend on bi-directional replication tricks.
> 5.) Heterogeneous replication.
> 6.) Finally, performance scaling using scale-out over large numbers of replicas. I think it's possible to get tunnel vision on this - it's not a big requirement in the PG community because people don't use PG in the first place when they want to do this. They use MySQL, which has very good replication for performance scaling, though it's rather weak for availability.
These types of things are what Slony is for. Slony is trigger based; this makes it more complex than log-shipping-style replication, but it provides lots of functionality.

WAL-shipping-based replication is maybe the fastest possible solution... you are already paying the overhead, so it comes virtually for free from the point of view of the master.

MySQL replication is IMO nearly worthless from a backup standpoint.

merlin

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] replication hooks
Hi Marko,

Replication requirements vary widely, of course, but DDL support is shared by such a wide range of use cases that it is very difficult to see how any real solution would fail to include it. This extends to change extraction APIs, however defined.

The question of what DDL to replicate is also quite clear: all of it, with as few exceptions as possible. For instance, it is almost impossible to set up and manage replicated systems easily if you cannot propagate schema changes in serialized order along with other updates from applications. The inconvenience of using alternative mechanisms like the SLONY 'execute script' is considerable and breaks most commonly used database management tools. That said, SLONY at least serializes the changes.

Non-serialized approaches lead to serious outages and can get you into distributed consensus problems, such as deciding when it is 'safe' to change the schema across different instances. These are very hard to solve practically and tend to run into known impossibility results like Brewer's Conjecture, which holds that it is impossible to keep distributed databases consistent while also remaining open for updates and handling network partitions.

I'll post back later on the question of the API. The key is to do something simple that avoids the problems discussed by Andrew and ties it accurately to use cases. However, this requires a more prepared response than my hastily written post from last night.

Cheers, Robert

On 5/29/08 9:05 PM, "Marko Kreen" <[EMAIL PROTECTED]> wrote:

On 5/29/08, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
>> people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication.
>
> You know, I gave a talk in Ottawa just last week about how the last effort to develop a comprehensive API for replication failed.
> I had some ideas about why, the main one of which is something like this: "Big features with a roadmap have not historically worked, so unless we're willing to change the way we work, we won't get that."
>
> I don't think an API is what's needed. It's clear proposals for particular features that can be delivered in small pieces. That's what the current proposal offers. I think any kind of row-based approach such as what you're proposing would need that kind of proposal too.
>
> That isn't to say that I think an API is impossible or undesirable. It is to say that the last few times we tried, it went nowhere; and I don't think the circumstances have changed.

I think the issue is simpler - an API for synchronous replication is undesirable; it would be too complex and would hinder future development (as I explained above). And the API for asynchronous replication is already there - triggers, plus txid functions for queueing.

There is this tiny matter of replicating schema changes asynchronously, but I suspect nobody actually cares. A few random points about that:

- The task cannot even be clearly defined (on a technical level - how the events should be represented).
- Any schema changes need to be carefully prepared anyway. Whether they are applied to one server or more does not make much difference.
- A major plus of an async replica is the ability to have a different schema on the slaves.
- People _do_ care about having the exact schema in one place - failover servers.
- But for a failover server we also want synchronous replication. So if we have synchronous WAL-based replication for failover servers, the interest in hooks to log schema changes will decrease even more.

-- marko

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728 Skype: hodgesrm
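[Editor's note] Robert's serialization argument can be illustrated with a trivial model (purely hypothetical, not Slony or pgQ code): if DDL events travel through the same totally ordered log as DML, the slave naturally applies each schema change between exactly the data changes that surrounded it on the master, whereas a side channel like 'execute script' loses that ordering.

```python
# Hypothetical sketch: DDL and DML share one totally ordered replication
# log, so the slave replays a schema change at exactly the right point
# between the data changes that preceded and followed it on the master.

def replay(log):
    """log: list of (seq, kind, sql) events; apply them in commit order."""
    applied = []
    for seq, kind, sql in sorted(log):   # one total order covers DDL and DML
        applied.append(sql)              # stand-in for executing on the slave
    return applied
```

An out-of-band DDL path would amount to replaying the `"ddl"` events from a second, separately ordered list - and then there is no single point in the DML stream where the schema change is guaranteed to be correct on every slave.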
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Tom,

Thanks for the reasoned reply. As you saw from point #2 in my comments, I think you should do this feature. I hope this answers Josh Berkus' concern about my comments.

You make a very interesting comment which seems to go to the heart of this design approach:

> About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation.

It's possible I'm misunderstanding some of the implementation issues, but it is striking that the detailed responses to your proposal list a number of low-level dependencies between master and slave states when replicating WAL records. It appears that you are designing a replication mechanism that works effectively between a master and a relatively small number of "nearby" slaves. This is clearly an important use case, but it also seems clear that the WAL approach is not a general-purpose approach to replication. In other words, you'll incrementally get to that limited end point I describe. This will still leave a lot to be desired on read scaling, not to mention many other cases. Hence my original comments.

However, rather than harp on that further, I will open up a separate thread to describe a relatively small set of extensions to PostgreSQL that would be enabling for a wide range of replication applications. Contrary to popular opinion, these extensions are actually well understood at the theory level and have been implemented as prototypes, as well as in commercial patches, multiple times in different databases. Those of us who are deeply involved in replication deserve just condemnation for not stepping up and getting our thoughts out on the table.

Meanwhile, I would be interested in your reaction to these thoughts on the scope of the real-time WAL approach. There's obviously tremendous interest in this feature.
A general description that goes beyond the NTT slides would be most helpful for further discussions.

Cheers, Robert

P.s., The NTT slides were really great. Takahiro and Masao deserve congratulations on an absolutely first-rate presentation.

On 5/29/08 9:09 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Andrew Sullivan <[EMAIL PROTECTED]> writes:
>> On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
>>> people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication.
>
>> You know, I gave a talk in Ottawa just last week about how the last effort to develop a comprehensive API for replication failed.
>
> Indeed, core's change of heart on this issue was largely driven by Andrew's talk and subsequent discussion. We had more or less been waiting for the various external replication projects to tell us what they wanted in this line, and it was only the realization that no such thing was likely to happen that forced us to think seriously about what could be done within the core project.
>
> As I said originally, we have no expectation that the proposed features will displace the existing replication projects for "high end" replication problems ... and I'd characterize all of Robert's concerns as "high end" problems. We are happy to let those be solved outside the core project.
>
> About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation. Which could happen; in fact I think that's the main risk of the proposed development plan. But I also think that the near-term steps of the plan are worth doing anyway, for various other reasons, and so we won't be out too much effort if the plan fails.
> regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi everyone,

First of all, I'm absolutely delighted that the PG community is thinking seriously about replication.

Second, having a solid, easy-to-use database availability solution that works more or less out of the box would be an enormous benefit to customers. Availability is the single biggest problem for customers in my experience, and as other people have commented, the alternatives are not nice. It's an excellent idea to build off an existing feature: PITR is already pretty useful, and the proposed features are solid next steps. The fact that it does not solve all problems is not a drawback but means it's likely to get done in a reasonable timeframe.

Third, you can't stop with just this feature. (This is the BUT part of the post.) The use cases not covered by this feature are actually pretty large. Here are a few that concern me:

1.) Partial replication.
2.) WAN replication.
3.) Bi-directional replication. (Yes, this is evil, but there are problems where it is indispensable.)
4.) Upgrade support. Aside from database upgrade (how would this ever really work between versions?), it would not support zero-downtime app upgrades, which depend on bi-directional replication tricks.
5.) Heterogeneous replication.
6.) Finally, performance scaling using scale-out over large numbers of replicas. I think it's possible to get tunnel vision on this - it's not a big requirement in the PG community because people don't use PG in the first place when they want to do this. They use MySQL, which has very good replication for performance scaling, though it's rather weak for availability.

As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases. Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication.
(Another approach would be to make it possible for third-party apps to read the logs and regenerate SQL.) There are existing models for how to do change-set extraction; we have done it several times at my company already. There are also research projects like GORDA that have looked fairly comprehensively at this problem. My company would be quite happy to participate in or even sponsor such an API.

Between the proposed WAL-based approach and change-set-based replication, it's not hard to see PG becoming the open source database of choice for a very large number of users.

Cheers, Robert

On 5/29/08 6:37 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

David Fetter <[EMAIL PROTECTED]> writes:
> On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
>> The only question I have is... what does this give us that PITR
>> doesn't give us?
> It looks like a wrapper for PITR to me, so the gain would be ease of
> use.

A couple of points about that:

* Yeah, ease of use is a huge concern here. We're getting beat up because people have to go find a separate package (and figure out which one they want), install it, learn how to use it, etc. It doesn't help that the most mature package is Slony which is, um, not very novice-friendly or low-admin-complexity. I personally got religion on this about two months ago when Red Hat switched their bugzilla from Postgres to MySQL because the admins didn't want to deal with Slony any more. People want simple.

* The proposed approach is trying to get to "real" replication incrementally. Getting rid of the loss window involved in file-by-file log shipping is step one, and I suspect that step two is going to be fixing performance issues in WAL replay to ensure that slaves can keep up. After that we'd start thinking about how to let slaves run read-only queries. But even without read-only queries, this will be a useful improvement for HA/backup scenarios.
regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728  Skype: hodgesrm
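As a rough illustration of the change-set extraction idea discussed in the message above, here is one way a trigger-based capture scheme is commonly sketched in PostgreSQL. All names here are invented for the example, and it assumes the captured table (foo, borrowed from elsewhere in this thread) has an integer primary key named id; a real extraction API would of course be far more complete:

```sql
-- Hypothetical change-set log; every name below is illustrative.
CREATE TABLE changeset (
    seq       serial PRIMARY KEY,
    tbl       text NOT NULL,
    op        text NOT NULL,               -- 'I', 'U', or 'D'
    pk_value  text,
    logged_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO changeset (tbl, op, pk_value)
            VALUES (TG_TABLE_NAME, 'D', OLD.id::text);
        RETURN OLD;
    ELSE
        -- substr(TG_OP, 1, 1) yields 'I' for INSERT, 'U' for UPDATE
        INSERT INTO changeset (tbl, op, pk_value)
            VALUES (TG_TABLE_NAME, substr(TG_OP, 1, 1), NEW.id::text);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_capture AFTER INSERT OR UPDATE OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE capture_change();
```

A downstream agent would then poll changeset in seq order and regenerate SQL against the slave, which is roughly the shape of the trigger-based systems (Slony et al.) mentioned in this thread.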
Re: [HACKERS] Deterministic locking in PostgreSQL
Hi Tom,

First of all, thanks for the quick response. No, the arrival order will not be deterministic. Here is how we ensure determinism.

1.) SQL requests are delivered to the replication agent in a specific total order. This could occur either because they were already serialized by a database (master/slave case) or delivery through group communications (master/master case).

2.) Within replication we use advisory table locks at the middleware level to guide scheduling of request execution. This allows non-conflicting SQL statements to proceed in parallel but blocks those that might conflict.

The reason I asked about determinism in locking is that this algorithm has a problem with distributed deadlock. If you look back at the example in the original post, you get the following:

1: T1, T2, T3: begin
2: T1: update foo set value='x' where id=25;   <-- Grabs row lock, grabs and releases middleware table lock
3: T2: update foo set value='y' where id=25;   <-- Grabs middleware table lock, blocks on row lock
4: T3: update foo set value='z' where id=25;   <-- DEADLOCKED
5: T1: update foo set value='x1' where id=25;
6: T1: commit
7: T2: commit
8: T3: commit

At step 3 we deadlock, since the request blocks in the database while holding the middleware table lock. Our plan to alleviate this problem is to look for requests that block (i.e., show up in pg_locks) and release their middleware table lock. As long as locks are granted deterministically, this allows the next request to proceed--the ordering is now enforced by the database itself.

There are some other possible race conditions, such as results of sub-selects on UPDATE statements, but this optimization will help us avoid a number of unnecessary failures in master/master replication. If anything else about this raises hackles on your neck (or anyone else's for that matter), please let me know. It's better to know now.
:)

Cheers, Robert

On 5/9/08 4:53 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Robert Hodges <[EMAIL PROTECTED]> writes:
>> This question may have an obvious answer I have somehow missed, but to what
>> extent is locking order deterministic in PostgreSQL? For example, if
>> requests from multiple transactions arrive in some deterministic order and
>> acquire locks, can one assume that locks will be granted in the same order
>> if the requests are repeated at different times or on different servers?
>
> Yeah, it should be deterministic given consistent arrival order.
>
>> Lock determinism is an important issue for replication algorithms that
>> depend on database instances to behave as state machines.
>
> However, the idea of depending on a replication algorithm that has race
> conditions gives me the willies ... and that sure sounds like what you
> are describing. Do not trust your data to the assumption that arrival
> order will be deterministic.
>
> regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728  Skype: hodgesrm
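For reference, the "show up in pg_locks" check mentioned in the message above is commonly written along the following lines. This is a hedged sketch, not the Tungsten implementation: it pairs ungranted lock requests with backends holding a granted lock on the same object, is simplified to ignore page/tuple granularity and lock-mode compatibility, and uses only pg_locks columns present in the PostgreSQL 8.x releases under discussion:

```sql
-- Sketch: find waiting backends and one candidate holder per wait.
-- The middleware would release the advisory table lock owned by any
-- session whose backend appears as waiting_pid.
SELECT waiting.pid      AS waiting_pid,
       holder.pid       AS holding_pid,
       waiting.locktype,
       waiting.mode     AS waiting_mode
FROM pg_locks waiting
JOIN pg_locks holder
  ON  holder.granted
  AND NOT waiting.granted
  AND holder.pid <> waiting.pid
  AND holder.locktype = waiting.locktype
  AND holder.database      IS NOT DISTINCT FROM waiting.database
  AND holder.relation      IS NOT DISTINCT FROM waiting.relation
  AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid;
```

In the T1/T2/T3 example, once T2 blocks on the row lock at step 3 it would appear here as a waiting_pid, at which point its middleware table lock could be released so T3 can proceed.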
[HACKERS] Deterministic locking in PostgreSQL
Hi everyone,

This question may have an obvious answer I have somehow missed, but to what extent is locking order deterministic in PostgreSQL? For example, if requests from multiple transactions arrive in some deterministic order and acquire locks, can one assume that locks will be granted in the same order if the requests are repeated at different times or on different servers? Lock determinism is an important issue for replication algorithms that depend on database instances to behave as state machines.

Here's a simple example of the behavior I'm seeking. Suppose you have transactions T1, T2, and T3 that execute as shown below. Each line represents an "increment" of time.

T1, T2, T3: begin
T1: update foo set value='x' where id=25;   <-- Grabs row lock
T2: update foo set value='y' where id=25;   <-- Blocked
T3: update foo set value='z' where id=25;   <-- Blocked
T1: update foo set value='x1' where id=25;
T1: commit
T2: commit
T3: commit

T2 and T3 are both blocked until T1 commits. At that point, is the row lock granted to T2 and T3 in some deterministic order? Or can it vary based on load, lock manager state, etc., so that sometimes you get 'y' and sometimes 'z' as the final result?

If this case turns out to be deterministic, are there other cases that come to mind that would turn out to be non-deterministic?

Thanks, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728  Skype: hodgesrm
Re: [HACKERS] Table rewrites vs. pending AFTER triggers
Hi,

I'm with David on this one. Transactional DDL also turns out to be incredibly helpful for tools that generate and load DDL to extend the database, for example triggers and control tables to implement reliable messaging. You can put the setup in a single transaction, which vastly simplifies tool implementation. We have an application at Continuent that depends on exactly this behavior. I was investigating PostgreSQL semantics just last week and was delighted to find they appear to be exactly right. Oracle, on the other hand, is going to be a pain...

Cheers, Robert Hodges

On 1/3/08 12:11 AM, "David Fetter" <[EMAIL PROTECTED]> wrote:

> On Thu, Jan 03, 2008 at 01:08:47PM +0530, Gokulakannan Somasundaram wrote:
>> On Jan 3, 2008 12:44 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> "Gokulakannan Somasundaram" <[EMAIL PROTECTED]> writes:
>>>> I actually mean to say that DDLs can be declared as
>>>> self-committing.
>>>
>>> Egad, an Oracle lover in our midst.
>>
>> :). True, its an impact of working more with Oracle. I made the
>> suggestion here, because it might reduce some if conditions.
>>
>>> Most of us think that roll-back-able DDL is one of the best
>>> features of Postgres, and certainly one of our best selling points
>>> vis-a-vis Oracle. Don't expect us to give it up.
>>
>> Can you please explain, any specific use-case where DDLs are
>> necessary within a transaction?
>
> Let's imagine that you want to do a DDL change to a production
> database. You've tested the change script on a test database, but you
> want to be sure you get *exactly* from the place you were to the place
> you want to be. With transactional DDL, you know absolutely for sure
> that you've done either the whole change or none of it, i.e. not
> half-way in between :)
>
> Cheers,
> David (a giant fan of transactional DDL)
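For illustration, the kind of single-transaction setup described in the message above might look like the following sketch. The table, function, and trigger names are invented for the example; the point is only that every object is created atomically or not at all:

```sql
BEGIN;

-- Illustrative reliable-messaging objects: a queue table plus a
-- control table and trigger, all created as one atomic unit.
CREATE TABLE msg_queue (
    id      serial PRIMARY KEY,
    payload text NOT NULL
);

CREATE TABLE msg_control (
    queue   text PRIMARY KEY,
    last_id integer NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION msg_track() RETURNS trigger AS $$
BEGIN
    UPDATE msg_control SET last_id = NEW.id WHERE queue = 'msg_queue';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER msg_queue_track AFTER INSERT ON msg_queue
    FOR EACH ROW EXECUTE PROCEDURE msg_track();

COMMIT;
-- If any statement above fails, the whole transaction rolls back and
-- no half-created objects are left behind.
```

This is exactly what is painful without transactional DDL: a failure midway through an equivalent Oracle script leaves the schema in an intermediate state the tool then has to detect and repair.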
Re: [HACKERS] Test lab
Hi everyone,

Here are a couple of additions to the performance test lab discussion. I hope you will find these useful.

1.) Test tools. The Bristlecone testing package I presented at the PG Fall 2007 Conference is now available at http://bristlecone.continuent.org. There are two main tools: Evaluator and Benchmark. Evaluator generates a CPU-intensive mixed load. Benchmark generates very specific loads with systematically varying parameters. I have been using Bristlecone to do a lot of testing of MySQL and PostgreSQL, since we have middleware that runs on both. I plan to follow Josh's request and run some of the current benchmarks to compare 8.2.5 vs. 8.3 performance. So far most of my tests have compared MySQL and PostgreSQL vs. our middleware, but I recently started to compare the databases directly. One initial result: MySQL appears to be much faster at streaming very large result sets.

2.) Test hardware. We have a number of hosts in Grenoble, France that are available to help set up a European lab. We gave away 4 to the postgresql.fr folks, but if there's anyone else within driving (or trucking) distance we still have at least a dozen 1U rack-mountable Compaq units. They are in a garage and winter will soon be upon the Alps, so we need to try to unload them. Unluckily we overbought hardware in this location, but with luck this can be someone else's good fortune. It probably won't help in the US of A due to shipping costs.

Please look at Bristlecone. It's very early on, but I have found these tools to be exceedingly useful. Among other things, it should be possible to add features that allow us to do regression testing on performance, something that is a pain for "normal" test frameworks.

Cheers, Robert

Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED]
Mobile: +1-510-501-3728  Skype: hodgesrm

On Nov 6, 2007, at 9:49 AM, Mark Wong wrote:

On Tue, 06 Nov 2007 13:15:02 + Simon Riggs <[EMAIL PROTECTED]> wrote:

On Mon, 2007-11-05 at 14:33 -0800, Mark Wong wrote:

On 11/4/07, Simon Riggs <[EMAIL PROTECTED]> wrote:

Why don't you post a TODO list for TPC-E somewhere, so people can bite small pieces off of the list. I'm sure there's lots of people can help if we do it that way.

This should be a good start: http://osdldbt.sourceforge.net/dbt5/todo.html

Ah, thanks. Not sure what some of the TODOs mean, but I'll see if I have time to look at some of the code to see if I can help.

No worries, just ask when you get to it. ;) I'm making slow progress anyway. I'll get to them all eventually...

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
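One note on the large-result-set streaming comparison mentioned earlier in this thread: a standard way to stream a big result set from PostgreSQL without the client materializing the whole thing is to read it through a cursor. This is a generic sketch (the table name is illustrative), not how Bristlecone does it:

```sql
BEGIN;

-- Stream rows in batches instead of buffering the full result set.
DECLARE big_rows CURSOR FOR SELECT * FROM large_table;

FETCH 1000 FROM big_rows;
-- ...repeat the FETCH until it returns no rows...

CLOSE big_rows;
COMMIT;
```

Whether the client library uses a cursor (or an equivalent row-at-a-time fetch mode) makes a large difference in this kind of streaming benchmark, so it is worth controlling for when comparing databases.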