Re: [HACKERS] Some questions about mammoth replication

2007-10-12 Thread Alexey Klyukin
Hannu Krosing wrote:

  We have hooks in executor calling our own collecting functions, so we
  don't need the trigger machinery to launch replication.
 
 But where do you store the collected info - in your own replication_log
 table, or do you reuse data in the WAL, extracting it on the master before
 replication to the slave (or on the slave after moving the WAL)?

We use neither a log table in the database nor the WAL. The data to
replicate is stored in disk files, one per transaction. As Joshua said,
the WAL is used to ensure that only those transactions that are recorded
as committed in the WAL are sent to the slaves.
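
To make that flow concrete, here is a minimal standalone C sketch (not
Replicator source; the txn_<xid>.dat naming and the helpers collect_change
and forward_or_discard are invented for illustration): changes are appended
to a per-transaction file, and at send time the file is forwarded only if
the WAL shows the transaction as committed, otherwise it is discarded.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

static void txn_file_path(char *buf, size_t len, uint32_t xid)
{
    snprintf(buf, len, "txn_%u.dat", (unsigned) xid);
}

/* Called from the executor-level collecting functions for every change. */
static void collect_change(uint32_t xid, const void *rec, size_t reclen)
{
    char path[64];
    FILE *f;

    txn_file_path(path, sizeof(path), xid);
    f = fopen(path, "ab");
    if (f == NULL)
        return;                     /* real code would report the error */
    fwrite(rec, 1, reclen, f);
    fclose(f);
}

/* Called once the transaction's fate is known from the WAL. */
static void forward_or_discard(uint32_t xid, bool committed_in_wal)
{
    char path[64];

    txn_file_path(path, sizeof(path), xid);
    if (committed_in_wal)
        printf("sending %s to slaves\n", path); /* stand-in for the real send */
    else
        remove(path);               /* aborted: drop everything collected */
}

int main(void)
{
    const char rec[] = "INSERT tuple payload";

    collect_change(100, rec, sizeof(rec));
    forward_or_discard(100, true);   /* pretend the WAL shows xid 100 committed */
    forward_or_discard(101, false);  /* an aborted transaction is just dropped */
    return 0;
}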

 
   Do you make use of snapshot data to decide which parts of the WAL
   are worth migrating to the slaves, or do you just apply everything in
   the WAL in separate transactions and abort if you find out that the
   original transaction aborted?
  
  We check if a data transaction is recorded in WAL before sending
  it to a slave. For an aborted transaction we just discard all data
  collected from that transaction.
 
 Do you duplicate PostgreSQL's MVCC code for that, or will this happen
 automatically by using MVCC itself for the collected data?

Every transaction command that changes data in a replicated relation is
stored on disk. PostgreSQL's MVCC code is used on a slave in the natural
way when the transaction commands are replayed there.

 
 How do you handle really large inserts/updates/deletes, which change,
 say, 10M rows in one transaction?

We produce really large disk files ;). When a transaction commits, a
special queue lock is acquired and the transaction is enqueued to the
sending queue. Since the locking mode for that lock is exclusive, the
commit of a very large transaction delays the commits of other
transactions for as long as the lock is held. We are working on
minimizing the time this lock is held in the new version of Replicator.
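
As a rough illustration of why that hurts (conceptual C only, all names
invented, not Replicator source): the commit path copies the whole
transaction file into the queue while holding the single exclusive lock, so
every other committer waits for the duration of the copy.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Commit path: serialize the entire transaction file into the shared queue
 * with the exclusive lock held. A 10M-row transaction keeps the lock (and
 * therefore every other committing backend) busy for the whole copy. */
static void enqueue_on_commit(const char *txn_file, FILE *queue)
{
    char   buf[8192];
    size_t n;
    FILE  *in;

    pthread_mutex_lock(&queue_lock);
    in = fopen(txn_file, "rb");
    if (in != NULL)
    {
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, n, queue);
        fclose(in);
    }
    pthread_mutex_unlock(&queue_lock);
}

int main(void)
{
    FILE *queue = fopen("queue.dat", "ab");

    if (queue == NULL)
        return 1;
    enqueue_on_commit("txn_100.dat", queue);   /* cheap for small transactions */
    enqueue_on_commit("txn_101.dat", queue);   /* a huge file stalls everyone */
    fclose(queue);
    return 0;
}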

 
   Do you extract / generate full SQL DML queries from the data in the WAL
   logs, or do you apply the changes at some lower level?
  
  We replicate the binary data along with a command type. Only the data
  necessary to replay the command on a slave is replicated.
 
 Do you replay it as SQL insert/update/delete commands, or directly on
 heap/indexes ?

We replay the commands directly using heap/index functions on a slave.

Regards,

-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.




Re: [HACKERS] Some questions about mammoth replication

2007-10-12 Thread Alexey Klyukin
Hannu Krosing wrote:

  We use neither a log table in the database nor the WAL. The data to
  replicate is stored in disk files, one per transaction.
 
 Clever :)
 
 How well does it scale? That is, at what transaction rate can your
 replication keep up with the database?

This depends on the number of concurrent transactions (the data is
collected by every backend process), the maximum transaction size, etc. I
don't have numbers at hand, sorry.

 
   As Joshua said,
  the WAL is used to ensure that only those transactions that are recorded
  as committed in WAL are sent to slaves.
 
 How do you enforce the correct commit order when applying the transactions?

The first transaction that is committed in PostgreSQL is the first
transaction placed into the queue, and the first that is restored by the
slave.
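
A toy C sketch of that property (names invented, not Replicator source):
because each backend appends its transaction to the queue while holding the
commit-time lock, queue order equals commit order, and the slave always
restores from the head of the queue.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_MAX 1024

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t queue[QUEUE_MAX];
static int      queue_head, queue_tail;

/* Called at commit time: appending under the exclusive lock makes the
 * order of queue entries identical to the order of commits. */
static void enqueue_commit(uint32_t xid)
{
    pthread_mutex_lock(&queue_lock);
    if (queue_tail < QUEUE_MAX)
        queue[queue_tail++] = xid;
    pthread_mutex_unlock(&queue_lock);
}

/* Slave-side restore: always take the oldest entry first. */
static int dequeue_commit(uint32_t *xid)
{
    int found = 0;

    pthread_mutex_lock(&queue_lock);
    if (queue_head < queue_tail)
    {
        *xid = queue[queue_head++];
        found = 1;
    }
    pthread_mutex_unlock(&queue_lock);
    return found;
}

int main(void)
{
    uint32_t xid;

    enqueue_commit(100);
    enqueue_commit(101);
    while (dequeue_commit(&xid))
        printf("restoring transaction %u\n", (unsigned) xid);
    return 0;
}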

 
   
 Do you make use of snapshot data to decide which parts of the WAL
 are worth migrating to the slaves, or do you just apply everything in
 the WAL in separate transactions and abort if you find out that the
 original transaction aborted?

We check if a data transaction is recorded in WAL before sending
it to a slave. For an aborted transaction we just discard all data
collected from that transaction.
   
   Do you duplicate PostgreSQL's MVCC code for that, or will this happen
   automatically by using MVCC itself for the collected data?
  
  Every transaction command that changes data in a replicated relation is
  stored on disk. PostgreSQL's MVCC code is used on a slave in the natural
  way when the transaction commands are replayed there.
 
 Do you replay several transaction files in the same transaction on
 slave ?

 Can you replay several transaction files in parallel ?

No, we have plans for concurrent restore of replicated data, but
currently a single slave process is responsible for restoring the data
received from the MCP.

 
   How do you handle really large inserts/updates/deletes, which change,
   say, 10M rows in one transaction?
  
  We produce really large disk files ;). When a transaction commits, a
  special queue lock is acquired and the transaction is enqueued to the
  sending queue. Since the locking mode for that lock is exclusive, the
  commit of a very large transaction delays the commits of other
  transactions for as long as the lock is held. We are working on
  minimizing the time this lock is held in the new version of Replicator.
 
 Why does it take longer to queue a large file? Do you copy data from
 one file to another?

Yes, currently the data is copied from the transaction files into the
queue (this doesn't apply to dump transactions).

However, we have recently changed this: the new code acquires the queue
lock only to record the transaction as committed in the replication log,
without moving the data.
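
Sketching the new scheme in the same conceptual C style (names invented):
only a fixed-size commit record is appended to the replication log under
the lock, so the time spent holding it no longer depends on how many rows
the transaction changed.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* O(1) work while the lock is held; the bulk data stays in its transaction
 * file and is shipped by the sender process afterwards, outside the lock. */
static void mark_committed(uint32_t xid, FILE *replication_log)
{
    pthread_mutex_lock(&queue_lock);
    fprintf(replication_log, "COMMIT %u\n", (unsigned) xid);
    fflush(replication_log);
    pthread_mutex_unlock(&queue_lock);
}

int main(void)
{
    FILE *log = fopen("replication.log", "ab");

    if (log == NULL)
        return 1;
    mark_committed(100, log);
    fclose(log);
    return 0;
}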

   Do you replay it as SQL insert/update/delete commands, or directly on
   heap/indexes ?
  
  We replay the commands directly using heap/index functions on a slave.
 
 Does that mean that the table structures will be exactly the same on
 both master and slave?

Yes, the table structure on the slaves should match the table structure
on the master.

 That is, do you replicate a physical table image
 (maybe not including transaction ids on master) ?

Yes, we call this a 'full dump', and it is performed automatically for
every replicated table. However, we replicate only the data, not the DDL
commands that create or alter tables and sequences.

 
 Or do you just use lower-level versions of INSERT/UPDATE/DELETE?
 
 -
 Hannu
 
 
 

Regards,
-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.




Re: [HACKERS] Some questions about mammoth replication

2007-10-12 Thread Hannu Krosing
On Friday, 2007-10-12 at 12:39, Alexey Klyukin wrote:
 Hannu Krosing wrote:
 
   We have hooks in executor calling our own collecting functions, so we
   don't need the trigger machinery to launch replication.
  
  But where do you store the collected info - in your own replication_log
  table, or do you reuse data in the WAL, extracting it on the master
  before replication to the slave (or on the slave after moving the WAL)?
 
 We use neither a log table in the database nor the WAL. The data to
 replicate is stored in disk files, one per transaction.

Clever :)

How well does it scale? That is, at what transaction rate can your
replication keep up with the database?

  As Joshua said,
 the WAL is used to ensure that only those transactions that are recorded
 as committed in WAL are sent to slaves.

How do you enforce the correct commit order when applying the transactions?

  
Do you make use of snapshot data to decide which parts of the WAL
are worth migrating to the slaves, or do you just apply everything in
the WAL in separate transactions and abort if you find out that the
original transaction aborted?
   
   We check if a data transaction is recorded in WAL before sending
   it to a slave. For an aborted transaction we just discard all data 
   collected 
   from that transaction.
  
  Do you duplicate PostgreSQL's MVCC code for that, or will this happen
  automatically by using MVCC itself for the collected data?
 
 Every transaction command that changes data in a replicated relation is
 stored on disk. PostgreSQL MVCC code is used on a slave in a natural way
 when transaction commands are replayed there.

Do you replay several transaction files in the same transaction on
slave ?

Can you replay several transaction files in parallel ?

  How do you handle really large inserts/updates/deletes, which change,
  say, 10M rows in one transaction?
 
 We produce really large disk files ;). When a transaction commits, a
 special queue lock is acquired and the transaction is enqueued to the
 sending queue. Since the locking mode for that lock is exclusive, the
 commit of a very large transaction delays the commits of other
 transactions for as long as the lock is held. We are working on
 minimizing the time this lock is held in the new version of Replicator.

Why does it take longer to queue a large file? Do you copy data from
one file to another?

Do you extract / generate full sql DML queries from data in WAL logs, or
do you apply the changes at some lower level ?
   
   We replicate the binary data along with a command type. Only the data
   necessary to replay the command on a slave are replicated.
  
  Do you replay it as SQL insert/update/delete commands, or directly on
  heap/indexes ?
 
 We replay the commands directly using heap/index functions on a slave.

Does that mean that the table structures will be exactly the same on
both master and slave? That is, do you replicate a physical table image
(maybe not including transaction ids on the master)?

Or do you just use lower-level versions of INSERT/UPDATE/DELETE?

-
Hannu






[HACKERS] Some questions about mammoth replication

2007-10-11 Thread Hannu Krosing
 
 btw, can you publicly discuss how Command Prompt's WAL-based
 replication works?

It's my company, of course I am ;)... but not in this thread. If you
are interested feel free to email me directly or start a new thread.

Good :)

Here come my questions :

From looking at http://www.commandprompt.com/images/MR_components.jpg it
seems that you don't do replication just from WAL logs, but also collect
some extra info inside the PostgreSQL server. Is this so?

If it is, then in what way does it differ from simple trigger-based
change logging ?

Do you make use of snapshot data to decide which parts of the WAL
are worth migrating to the slaves, or do you just apply everything in
the WAL in separate transactions and abort if you find out that the
original transaction aborted?

Are your slaves a) standby b) read-only or c) read-write ?

Do you extract / generate full SQL DML queries from the data in the WAL
logs, or do you apply the changes at some lower level?

For what use cases do you think your WAL-based approach is better than
the Slony/Skytools trigger-based one?

--
Hannu






Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Alexey Klyukin
Hello,

Hannu Krosing wrote:
 
 Here come my questions :
 
 From looking at http://www.commandprompt.com/images/MR_components.jpg it
 seems that you don't do replication just from WAL logs, but also collect
 some extra info inside postgreSQL server. Is this so ?

 If it is, then in what way does it differ from simple trigger-based
 change logging ?

We have hooks in the executor calling our own collecting functions, so we
don't need the trigger machinery to launch replication.

 Do you make use of snapshot data to decide which parts of the WAL
 are worth migrating to the slaves, or do you just apply everything in
 the WAL in separate transactions and abort if you find out that the
 original transaction aborted?

We check if a data transaction is recorded in WAL before sending
it to a slave. For an aborted transaction we just discard all data collected 
from that transaction.

 
 Are your slaves a) standby b) read-only or c) read-write ?

Replicated relations are read-only on slaves.

 
 Do you extract / generate full sql DML queries from data in WAL logs, or
 do you apply the changes at some lower level ?

We replicate the binary data along with a command type. Only the data
necessary to replay the command on a slave is replicated.
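
As a purely hypothetical illustration of what "binary data plus a command
type" could look like as a record layout (the actual Replicator format is
not described in this thread; every field below is invented):

#include <stdint.h>
#include <stdio.h>

typedef enum ReplCommandType
{
    RCMD_INSERT,
    RCMD_UPDATE,
    RCMD_DELETE
} ReplCommandType;

/* Hypothetical per-command record: a command tag plus just enough binary
 * tuple data to replay the change on the slave via heap/index functions. */
typedef struct ReplCommandHeader
{
    uint8_t  cmd;          /* one of ReplCommandType */
    uint32_t relation_id;  /* which replicated table the command targets */
    uint32_t data_len;     /* length of the binary tuple payload that follows */
    /* data_len bytes of tuple data follow the header */
} ReplCommandHeader;

int main(void)
{
    ReplCommandHeader h = { RCMD_INSERT, 16384, 0 };

    printf("cmd=%u rel=%u len=%u\n",
           (unsigned) h.cmd, (unsigned) h.relation_id, (unsigned) h.data_len);
    return 0;
}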

 
 For what use cases do you think your WAL-based approach is better than
 Slony/Skytools trigger-based one ?

A purely trigger-based approach can only replicate data for the commands
which fire triggers. AFAIK Slony is unable to replicate the TRUNCATE
command (I don't know if Skytools can). Replicator doesn't have this
limitation.

Regards,
-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.




Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Andreas Pflug
Alexey Klyukin wrote:

   
 For what use cases do you think your WAL-based approach is better than
 Slony/Skytools trigger-based one ?
 

 A pure trigger based approach can only replicate data for the commands
 which fire triggers. AFAIK Slony is unable to replicate TRUNCATE
 command
It could be wrapped with ddl_script, which obviously isn't transparent to
the application, but I guess a table that's truncated regularly won't be
a typical candidate for (async) replication either.

Regards,
Andreas




Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Marko Kreen
On 10/11/07, Alexey Klyukin [EMAIL PROTECTED] wrote:
 Hannu Krosing wrote:
  For what use cases do you think your WAL-based approach is better than
  Slony/Skytools trigger-based one ?

 A pure trigger based approach can only replicate data for the commands
 which fire triggers. AFAIK Slony is unable to replicate TRUNCATE
 command (I don't know if Skytools can). Replicator doesn't have this
 limitation.

No, Skytools is the same as Slony.

Can you also replicate changes to system tables?

-- 
marko



Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Alexey Klyukin
Marko Kreen wrote:
 On 10/11/07, Alexey Klyukin [EMAIL PROTECTED] wrote:
  Hannu Krosing wrote:
   For what use cases do you think your WAL-based approach is better than
   Slony/Skytools trigger-based one ?
 
  A pure trigger based approach can only replicate data for the commands
  which fire triggers. AFAIK Slony is unable to replicate TRUNCATE
  command (I don't know if Skytools can). Replicator doesn't have this
  limitation.
 
 No, Skytools is same as Slony.
 
 Can you also replicate changes to system tables?

No, we need a table to have a primary key to be able to replicate it. On
the other hand, replicating a system relation can be dangerous, e.g. what
happens if you replicate the contents of pg_class without the
corresponding relations on the slave? That's why we explicitly forbid
enabling replication for relations from the pg_catalog namespace.

Regards,
-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.




Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Joshua D. Drake
On Thu, 11 Oct 2007 19:10:18 +0300
Alexey Klyukin [EMAIL PROTECTED] wrote:

 Marko Kreen wrote:
  On 10/11/07, Alexey Klyukin [EMAIL PROTECTED] wrote:
   Hannu Krosing wrote:
For what use cases do you think your WAL-based approach is
better than Slony/Skytools trigger-based one ?
  
   A pure trigger based approach can only replicate data for the
   commands which fire triggers. AFAIK Slony is unable to replicate
   TRUNCATE command (I don't know if Skytools can). Replicator
   doesn't have this limitation.
  
  No, Skytools is same as Slony.
  
  Can you also replicate changes to system tables?

 
 No, we need a table to have a primary key to be able to replicate it.
 On the other hand, replicating a system relation can be dangerous,
 e.g. what happens if you replicate the contents of pg_class without
 the corresponding relations on the slave? That's why we explicitly
 forbid enabling replication for relations from the pg_catalog namespace.

We can, however, replicate large objects and specific system commands
like GRANT/REVOKE and CREATE/DROP USER. The latter is done via
triggers.

Joshua D. Drake

 
 Regards,


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/





Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Hannu Krosing
On Thursday, 2007-10-11 at 18:25, Alexey Klyukin wrote:
 Hello,
 
 Hannu Krosing wrote:
  
  Here come my questions :
  
  From looking at http://www.commandprompt.com/images/MR_components.jpg it
  seems that you don't do replication just from WAL logs, but also collect
  some extra info inside postgreSQL server. Is this so ?
 
  If it is, then in what way does it differ from simple trigger-based
  change logging ?
 
 We have hooks in executor calling our own collecting functions, so we
 don't need the trigger machinery to launch replication.

But where do you store the collected info - in your own replication_log
table, or do you reuse data in the WAL, extracting it on the master before
replication to the slave (or on the slave after moving the WAL)?

  Do you make use of snapshot data to decide which parts of the WAL
  are worth migrating to the slaves, or do you just apply everything in
  the WAL in separate transactions and abort if you find out that the
  original transaction aborted?
 
 We check if a data transaction is recorded in WAL before sending
 it to a slave. For an aborted transaction we just discard all data collected 
 from that transaction.

Do you duplicate PostgreSQL's MVCC code for that, or will this happen
automatically by using MVCC itself for the collected data?

How do you handle really large inserts/updates/deletes, which change,
say, 10M rows in one transaction?

  Are your slaves a) standby b) read-only or c) read-write ?
 
 Replicated relations are read-only on slaves.
 
  
  Do you extract / generate full sql DML queries from data in WAL logs, or
  do you apply the changes at some lower level ?
 
 We replicate the binary data along with a command type. Only the data
 necessary to replay the command on a slave are replicated.

Do you replay it as SQL insert/update/delete commands, or directly on
heap/indexes ?


Hannu









Re: [HACKERS] Some questions about mammoth replication

2007-10-11 Thread Joshua D. Drake
On Thu, 11 Oct 2007 21:58:45 +0300
Hannu Krosing [EMAIL PROTECTED] wrote:


  We have hooks in executor calling our own collecting functions, so
  we don't need the trigger machinery to launch replication.
 
 But where do you store the collected info - in your own
 replication_log table,

No, we have our own transaction log outside the database.

 or do you reuse data in the WAL, extracting it on the
 master before replication to the slave (or on the slave after moving the WAL)?


We currently make zero use of WAL except to verify that the replication
data can be written.

 
   Do you make use of snapshot data to decide which parts of the WAL
   are worth migrating to the slaves, or do you just apply everything
   in the WAL in separate transactions and abort if you find out that
   the original transaction aborted?
  
  We check if a data transaction is recorded in WAL before sending
  it to a slave. For an aborted transaction we just discard all data
  collected from that transaction.
 
 Do you duplicate PostgreSQL's MVCC code for that, or will this happen
 automatically by using MVCC itself for the collected data?
 

Gonna have to wait for Alexey for the rest.

Joshua D. Drake





-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


