Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote:
>> We have hooks in the executor calling our own collecting functions, so we
>> don't need the trigger machinery to launch replication.
>
> But where do you store the collected info: in your own replication_log
> table, or do you reuse the data in the WAL, extracting it on the master
> before replication to the slave (or on the slave after moving the WAL)?

We don't use either a log table in the database or the WAL. The data to
replicate is stored in disk files, one per transaction. As Joshua said, the
WAL is used to ensure that only those transactions that are recorded as
committed in the WAL are sent to the slaves.

>>> Do you make use of snapshot data to make sure which parts of the WAL are
>>> worth migrating to the slaves, or do you just apply everything in the WAL
>>> in separate transactions and abort if you find out that the original
>>> transaction aborted?
>>
>> We check whether a data transaction is recorded in the WAL before sending
>> it to a slave. For an aborted transaction we just discard all data
>> collected from that transaction.
>
> Do you duplicate PostgreSQL's MVCC code for that, or will this happen
> automatically via using MVCC itself for the collected data?

Every transaction command that changes data in a replicated relation is
stored on disk. PostgreSQL's MVCC code is used on a slave in the natural way
when the transaction commands are replayed there.

> How do you handle really large inserts/updates/deletes, which change, say,
> 10M rows in one transaction?

We produce really large disk files ;). When a transaction commits, a special
queue lock is acquired and the transaction is enqueued into a sending queue.
Since the locking mode for that lock is exclusive, a commit of a very large
transaction delays commits of other transactions for as long as the lock is
held. We are working on minimizing the time this lock is held in the new
version of Replicator.

>>> Do you extract / generate full SQL DML queries from the data in the WAL,
>>> or do you apply the changes at some lower level?
>>
>> We replicate the binary data along with a command type. Only the data
>> necessary to replay the command on a slave is replicated.
>
> Do you replay it as SQL insert/update/delete commands, or directly on the
> heap/indexes?

We replay the commands directly using heap/index functions on a slave.

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
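The flow described above can be pictured with a short sketch. This is not
Replicator's actual code; it is a minimal Python illustration of the
described mechanism (per-transaction spool files written by the collecting
hooks, an exclusive queue lock taken at commit, and a WAL-commit check
before anything is shipped to a slave). All names, file layouts, and helper
functions here are invented for illustration:

```python
import os
import threading
from collections import deque

SPOOL_DIR = "/tmp/replicator"     # hypothetical location of per-transaction files
queue_lock = threading.Lock()     # stands in for the exclusive queue lock
send_queue = deque()              # transactions queued for sending, in commit order
committed_in_wal = set()          # stand-in for "recorded as committed in WAL"


def spool_path(xid):
    return os.path.join(SPOOL_DIR, "txn_%d.dat" % xid)


def collect(xid, command_type, data):
    """Called from the collecting hooks: append one command (command type plus
    the binary data needed to replay it) to the transaction's disk file."""
    os.makedirs(SPOOL_DIR, exist_ok=True)
    with open(spool_path(xid), "ab") as f:
        f.write(command_type.encode("ascii") + b"\x00" + data + b"\n")


def on_commit(xid):
    """At commit, take the exclusive queue lock and enqueue the transaction.
    A very large transaction holds the lock longer and delays other commits."""
    with queue_lock:
        send_queue.append(xid)


def send_pending(ship):
    """Send queued transactions to a slave in commit order, but only those
    recorded as committed in WAL; spool files of aborted transactions are
    simply discarded."""
    while True:
        with queue_lock:
            if not send_queue:
                return
            xid = send_queue.popleft()
        path = spool_path(xid)
        if xid in committed_in_wal:
            with open(path, "rb") as f:
                ship(xid, f.read())
        os.remove(path)


if __name__ == "__main__":
    collect(1001, "INSERT", b"replicated-tuple-data")
    committed_in_wal.add(1001)
    on_commit(1001)
    send_pending(lambda xid, payload: print("shipping xid", xid, len(payload), "bytes"))
```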
Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote:
>> We don't use either a log table in the database or the WAL. The data to
>> replicate is stored in disk files, one per transaction.
>
> Clever :) How well does it scale? That is, at what transaction rate can
> your replication keep up with the database?

This depends on the number of concurrent transactions (the data is collected
by every backend process), the maximum transaction size, etc. I don't have
numbers here, sorry.

>> As Joshua said, the WAL is used to ensure that only those transactions
>> that are recorded as committed in the WAL are sent to the slaves.
>
> How do you force the correct commit order when applying the transactions?

The first transaction committed in PostgreSQL is the first transaction placed
into the queue, and the first to be restored by the slave.

>>> Do you make use of snapshot data to make sure which parts of the WAL are
>>> worth migrating to the slaves, or do you just apply everything in the WAL
>>> in separate transactions and abort if you find out that the original
>>> transaction aborted?
>>
>> We check whether a data transaction is recorded in the WAL before sending
>> it to a slave. For an aborted transaction we just discard all data
>> collected from that transaction.
>>
>>> Do you duplicate PostgreSQL's MVCC code for that, or will this happen
>>> automatically via using MVCC itself for the collected data?
>>
>> Every transaction command that changes data in a replicated relation is
>> stored on disk. PostgreSQL's MVCC code is used on a slave in the natural
>> way when the transaction commands are replayed there.
>
> Do you replay several transaction files in the same transaction on the
> slave? Can you replay several transaction files in parallel?

No, we have plans for concurrent restore of replicated data, but currently we
have a single slave process responsible for restoring the data received from
the MCP.

>>> How do you handle really large inserts/updates/deletes, which change,
>>> say, 10M rows in one transaction?
>>
>> We produce really large disk files ;). When a transaction commits, a
>> special queue lock is acquired and the transaction is enqueued into a
>> sending queue. Since the locking mode for that lock is exclusive, a commit
>> of a very large transaction delays commits of other transactions for as
>> long as the lock is held. We are working on minimizing the time this lock
>> is held in the new version of Replicator.
>
> Why does it take longer to queue a large file? Do you copy data from one
> file to another?

Yes, currently the data is copied from the transaction files into the queue
(this doesn't apply to dump transactions). However, we have recently changed
this: the new code will acquire the queue lock only to record the transaction
as committed in the replication log, without moving the data.

>>> Do you replay it as SQL insert/update/delete commands, or directly on
>>> the heap/indexes?
>>
>> We replay the commands directly using heap/index functions on a slave.
>
> Does that mean that the table structures will be exactly the same on both
> master and slave?

Yes, the table structure on the slaves should match the table structure on
the master.

> That is, do you replicate a physical table image (maybe not including
> transaction ids on the master)?

Yes, we call this a 'full dump', and it is fired automatically for every
replicated table. We replicate only data, however, not the DDL commands to
create/alter a table or sequence.

> Or do you just use lower-level versions of INSERT/UPDATE/DELETE?
>
> - Hannu

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
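The commit-ordering and locking change discussed above can be sketched as
follows. This is not the real Replicator code, only an illustration of the
two behaviours under invented names: the old approach copies the whole
transaction file into the queue while the exclusive lock is held, while the
new approach only appends a small "committed" record under the lock, which
both preserves commit order and keeps the critical section short:

```python
import threading

queue_lock = threading.Lock()
replication_log = []              # ordered list of committed transactions


def enqueue_by_copy(xid, txn_file_bytes, queue_storage):
    """Old behaviour: the whole transaction file is copied into the queue
    while the exclusive lock is held, so big transactions block other
    commits for the duration of the copy."""
    with queue_lock:
        queue_storage[xid] = bytes(txn_file_bytes)


def enqueue_by_reference(xid):
    """New behaviour: only mark the transaction as committed in the
    replication log under the lock; the sender reads the transaction file
    later. The position in replication_log preserves commit order (first
    committed, first queued, first restored on the slave)."""
    with queue_lock:
        replication_log.append(xid)


if __name__ == "__main__":
    enqueue_by_reference(42)
    enqueue_by_reference(43)
    print(replication_log)        # [42, 43]: commit order preserved
```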
Re: [HACKERS] Some questions about mammoth replication
Hello,

Hannu Krosing wrote:
> Here come my questions:
>
> From looking at http://www.commandprompt.com/images/MR_components.jpg it
> seems that you don't do replication just from the WAL, but also collect
> some extra info inside the PostgreSQL server. Is this so? If it is, then
> in what way does it differ from simple trigger-based change logging?

We have hooks in the executor calling our own collecting functions, so we
don't need the trigger machinery to launch replication.

> Do you make use of snapshot data to make sure which parts of the WAL are
> worth migrating to the slaves, or do you just apply everything in the WAL
> in separate transactions and abort if you find out that the original
> transaction aborted?

We check whether a data transaction is recorded in the WAL before sending it
to a slave. For an aborted transaction we just discard all data collected
from that transaction.

> Are your slaves a) standby, b) read-only, or c) read-write?

Replicated relations are read-only on slaves.

> Do you extract / generate full SQL DML queries from the data in the WAL,
> or do you apply the changes at some lower level?

We replicate the binary data along with a command type. Only the data
necessary to replay the command on a slave is replicated.

> For what use cases do you think your WAL-based approach is better than the
> Slony/Skytools trigger-based one?

A pure trigger-based approach can only replicate data for the commands which
fire triggers. AFAIK Slony is unable to replicate the TRUNCATE command (I
don't know whether Skytools can). Replicator doesn't have this limitation.

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
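A toy comparison (not Replicator code) of why hooking command execution
captures more than row-level triggers do: TRUNCATE fires no row triggers, so
a purely trigger-based logger never sees it, while a hook on command
execution records it like any other change. The function and table names
below are made up:

```python
changes = []                      # collected replication data


def trigger_based_capture(command, table, rows):
    # row triggers only fire for INSERT/UPDATE/DELETE on individual rows
    if command in ("INSERT", "UPDATE", "DELETE"):
        for row in rows:
            changes.append((command, table, row))


def hook_based_capture(command, table, rows):
    # a command/executor hook sees every command type, including TRUNCATE
    if command == "TRUNCATE":
        changes.append(("TRUNCATE", table, None))
    else:
        for row in rows:
            changes.append((command, table, row))


if __name__ == "__main__":
    trigger_based_capture("TRUNCATE", "accounts", [])   # records nothing
    hook_based_capture("TRUNCATE", "accounts", [])      # records the truncation
    print(changes)
```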
Re: [HACKERS] Some questions about mammoth replication
Marko Kreen wrote:
> On 10/11/07, Alexey Klyukin [EMAIL PROTECTED] wrote:
>> Hannu Krosing wrote:
>>> For what use cases do you think your WAL-based approach is better than
>>> the Slony/Skytools trigger-based one?
>>
>> A pure trigger-based approach can only replicate data for the commands
>> which fire triggers. AFAIK Slony is unable to replicate the TRUNCATE
>> command (I don't know whether Skytools can). Replicator doesn't have this
>> limitation.
>
> No, Skytools is the same as Slony.
>
> Can you also replicate changes to system tables?

No, we need a table to have a primary key to be able to replicate it. On the
other hand, replicating a system relation can be dangerous: what if you
replicate the contents of pg_class without the corresponding relations on the
slave? That's why we explicitly forbid enabling replication for relations
from the pg_catalog namespace.

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
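The two restrictions mentioned above (a table must have a primary key, and
relations in pg_catalog are refused) could be checked along these lines.
This is only an illustrative sketch with invented names, not Replicator's
actual validation code:

```python
def can_enable_replication(schema, table, primary_key_columns):
    """Toy check mirroring the two restrictions described above: no system
    catalogs, and a primary key is required so rows can be identified on
    the slave."""
    if schema == "pg_catalog":
        raise ValueError("replication of system relations is forbidden")
    if not primary_key_columns:
        raise ValueError(f"{schema}.{table} has no primary key; cannot replicate")
    return True


if __name__ == "__main__":
    print(can_enable_replication("public", "accounts", ["id"]))     # True
    try:
        can_enable_replication("pg_catalog", "pg_class", ["oid"])
    except ValueError as e:
        print("rejected:", e)
```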
Re: [HACKERS] GIT patch
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Hmm, do say, doesn't it seem like the lack of feedback and the failed
>> bitmap patch played against final development of this patch?
>
> Yes :(.

That's one reason why I tried to help with the review of that patch.

>> At this point I feel like the patch still needs some work and reshuffling
>> before it is in an acceptable state. The fact that there are some API
>> changes for which the patch needs to be adjusted makes me feel like we
>> should put this patch on hold for 8.4. So we would first get the API
>> changes discussed and done and then adapt this patch to them.
>
> I hate to say it but I agree. Getting the API changes discussed and
> committed was my plan in February/March. Unfortunately it didn't happen
> back then.
>
> There are a few capabilities we need from the new API:
>
> 1. Support for candidate matches. Because a clustered index doesn't
> contain the key for every heap tuple, when you search for a value we don't
> know exactly which ones match. Instead, you get a bunch of candidate
> matches, which need to be rechecked after fetching the heap tuple. Oleg
> and Teodor pointed out that GiST could use the capability as well. I also
> proposed a while ago to change the hash index implementation so that it
> doesn't store the index key in the index, but just the hash of it. That
> again would need the support for candidate matches. And there are
> range-encoded bitmap indexes, if we implement them in a more distant
> future.

Well, then should we return to the review of your 'bitmapscan changes'
patch? I've posted a version which applies (or applied to the CVS head at
the time of posting) cleanly here:
http://archives.postgresql.org/pgsql-patches/2007-06/msg00204.php

> 2. Support to sort the heap tuples represented by one index tuple, in
> normal index scans, if we go with alternative 1. Or support to do binary
> searches over them, if we go with alternative 2 or 3. As the patch stands,
> the sorting is done within b-tree, but that's quite ugly.

--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
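The "candidate match" idea in point 1 can be illustrated with a small,
self-contained sketch. This is not the PostgreSQL indexam API, and the
structures are invented: when the index stores only a hash of the key, a
probe returns heap tuples that merely might match, and each one has to be
rechecked against the real scan key after the heap tuple is fetched:

```python
heap = {
    1: ("alice", 30),    # ctid -> heap tuple
    2: ("bob", 25),
    3: ("carol", 41),
}

# hash-only index: hash(key) -> list of ctids (collisions give false candidates)
index = {}
for ctid, (name, _) in heap.items():
    index.setdefault(hash(name) % 8, []).append(ctid)


def index_scan(search_key):
    """Probe the index, then recheck every candidate against the heap tuple."""
    candidates = index.get(hash(search_key) % 8, [])
    for ctid in candidates:
        tuple_ = heap[ctid]            # fetch the heap tuple
        if tuple_[0] == search_key:    # recheck: drop false candidates
            yield ctid, tuple_


if __name__ == "__main__":
    print(list(index_scan("bob")))
```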
[HACKERS] GIT patch review
Hello,

I'd like to help with reviewing patches, in particular the grouped index
tuples (GIT) patch by Heikki Linnakangas
(http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php). I've
spoken with Alvaro about it; he gave me several links to the threads on
-hackers related to the GIT patch, and I have some questions:

What about the proposal for indexam changes? I can't find any patches for it
there (http://momjian.us/mhonarc/patches/msg00125.html).

Is the patch for changing amgetmulti to amgetbitmap relevant to the GIT
patch? The original patch is here:
http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php
It doesn't apply cleanly to the CVS head; I can provide the necessary changes
(actually I've already sent them and then figured out that there is a
nowhere-mentioned size limit on -hackers, which is why I'm resending this
message without that patch).

Any other suggestions on reviewing the GIT patch?

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Subversion repo up
Hello,

The svn repository is currently accessible only via https; the address is:
https://projects.commandprompt.com/public/pgsql/repo/

AFAIK Joshua planned to upgrade svn to version 1.4; however, I don't know
when that will happen.

Hannes Eder wrote:
> Joshua D. Drake wrote:
>> You can now checkout a pgsql converted to svn repo here:
>> http://projects.commandprompt.com/public/pgsql/repo/
>
> I'd like to use svnsync (see [1]) to create a read-only mirror of the
> pgsql svn repository (see [2]). svnsync requires svn version >= 1.4 on the
> source repository server (see [1]). The pgsql svn repository [2] is
> currently running Subversion 1.3.1.
>
> Is there a way to upgrade this svn repository to svn >= 1.4, or to push
> this svn repository to an svn >= 1.4 server? Any other ideas? Or should I
> create my own svn repository from the main cvs repository?
>
> kind regards,
> Hannes Eder
>
> Footnotes:
> [1] http://subversion.tigris.org/svn_1.4_releasenotes.html#svnsync
> [2] http://projects.commandprompt.com/public/pgsql/repo/

Regards,
--
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] What tools do people use to hack on PostgreSQL?
Hi,

Florian G. Pflug wrote:
> Hi
>
> I'm very excited that my project for implementing read-only queries on
> PITR slaves was accepted for GSoC, and I'm now trying to work out what
> tools I'll use for that job.
>
> I'd like to be able to create some sort of branches and tags for my own
> work (only inside my local repository of course). I've considered using
> git, but I couldn't make the cvs-git gateway work - neither using the
> postgresql CVS repository directly, nor with a private copy obtained with
> CVSup.
>
> There is also svk, but I think I'd need a svn repo that mirrors the
> postgresql CVS for that to work. I think Joshua Drake created one once,
> but I don't know if it is kept up-to-date.

Yes, it is (the latest visible commit was made 6 hours ago). You can browse
the sources at:
http://projects.commandprompt.com/public/pgsql/browser
or do an anonymous checkout with:
svn co http://projects.commandprompt.com/public/pgsql/repo/

Regards,
--
Alexey Klyukin [EMAIL PROTECTED]
Re: [HACKERS] What tools do people use to hack on PostgreSQL?
Alvaro Herrera wrote:
> But if you have a checked out tree, does it work to do an update after the
> tree has been regenerated? As far as I know, the repo is generated
> completely every few hours, so it wouldn't surprise me that the checked
> out copy is not compatible with the new repo. I admit I haven't tried.

I have tried, and svn up worked without issues.

Regards,
--
Alexey Klyukin [EMAIL PROTECTED]
Re: [HACKERS] plperl/plperlu interaction
Jeremy Drake wrote:
> On Thu, 26 Oct 2006, Alvaro Herrera wrote:
>> Jeff Trout wrote:
>>> On Oct 26, 2006, at 3:23 PM, Martijn van Oosterhout wrote:
>>>> On Thu, Oct 26, 2006 at 03:15:00PM -0400, Andrew Dunstan wrote:
>>>>> Perhaps people who use other platforms could look for these flags in
>>>>> the output of
>>>>> perl -e 'use Config qw(myconfig config_sh config_vars config_re); print config_sh();'
>>>
>>> OSX 10.4.8:
>>> usemultiplicity='define'
>>> usethreads='define'
>>> useithreads='define'
>>
>> Same here on Debian unstable (stock Perl packages).
>
> On my current Gentoo box:
> useithreads='undef'
> usemultiplicity='undef'
> usethreads='undef'
>
> My USE flags have ithreads disabled, since the description of the feature
> is "Enable Perl threads, has some compatibility problems".
>
> On my Ubuntu 'Dapper' system:
> useithreads='define'
> usemultiplicity='define'
> usethreads='define'

And I'm getting 'undef' for each of these flags on both Gentoo 2006.1 and
Gentoo 1.4 systems, using the default perl installation.
Re: [HACKERS] Replication documentation addition
Hi,

A typo: "(a write to any server has to be _propogated_)":
s/propogated/propagated

Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
>
> ftp://momjian.us/pub/postgresql/mypatches/replication
>
> Comments welcomed.

--
Regards,
Alexey Klyukin alexk(at)vollmond.org.ua
Simferopol, Crimea, Ukraine.