Re: [GENERAL] BDR question on dboid conflicts

2017-10-27 Thread Zhu, Joshua
Thanks, sounds like that's something unique in my environment/setup.

Here are the results of bdr.bdr_get_local_nodeid() for four nodes in a group,
Node 1: (6480169638493465053,1,16386)
Node 2: (6480169638493465053,1,20225)
Node 3: (6480169638493465053,1,29164)
Node 4: (6480169638493465053,1,20227)

And here is what the pg_replication_slots table looks like on Node 4:
bdr_20227_6480169638493465053_1_29164__ | bdr | logical | 20227 | mydb | t |   9603 |  | 7750 | 0/2D4E780 | 0/2D4E7B8
bdr_20227_6480169638493465053_1_16386__ | bdr | logical | 20227 | mydb | t |   9602 |  | 7750 | 0/2D4E780 | 0/2D4E7B8
bdr_20227_6480169638493465053_1_20225__ | bdr | logical | 20227 | mydb | t |   9601 |  | 7750 | 0/2D4E780 | 0/2D4E7B8
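For scripting, the slot names in the listing can be split on underscores. A minimal sketch; the field layout (bdr_<local dboid>_<remote sysid>_<remote timeline>_<remote dboid>__) is inferred from the values above, which match the bdr.bdr_get_local_nodeid() output for the other nodes, and is not a documented interface:

```shell
# Decode one of the slot names shown above (layout inferred, not an API).
slot="bdr_20227_6480169638493465053_1_29164__"
local_dboid=$(echo "$slot" | cut -d_ -f2)      # dboid of the local database
remote_sysid=$(echo "$slot" | cut -d_ -f3)     # remote node's sysid
remote_timeline=$(echo "$slot" | cut -d_ -f4)  # remote node's timeline
remote_dboid=$(echo "$slot" | cut -d_ -f5)     # remote node's dboid
echo "local dboid=$local_dboid remote=($remote_sysid,$remote_timeline,$remote_dboid)"
```

Note how the full remote (sysid, timeline, dboid) triple appears in each name, which is why a bare dboid collision alone does not make the names ambiguous.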

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Craig Ringer
Sent: Thursday, October 26, 2017 7:24 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR question on dboid conflicts

On 27 October 2017 at 01:15, Zhu, Joshua <j...@vormetric.com> wrote:
> Database oid is used in both bdr.bdr_nodes, as node_dboid, and 
> bdr.bdr_connections, as conn_dboid, also used in construction of 
> replication slot names.

Correct. However, it's used in conjunction with the sysid and node timeline ID.

> I noticed that when trying to join a bdr group, if the database oid on 
> the new node happens to be the same as that of a node already in the
> bdr group, the join would fail, and the only way to resolve the 
> conflict that I was able to come up with has been to retry with 
> dropping/recreating the database until the dboid does not conflict with any 
> node already in the group.

That is extremely surprising. In our regression tests the database oids should 
be the same quite often, as we do various tests where we create multiple 
instances. More importantly, every time you bdr_init_copy, you get a clone with 
the same database oid, and that works fine.

There's no detail here to work from, so I cannot guess what's actually 
happening, but I can confidently say it's not a database oid conflict.
Nowhere in BDR should the database oid be considered without the rest of the 
(sysid,timeline,dboid) tuple.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make 
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




Re: [GENERAL] BDR question on dboid conflicts

2017-10-26 Thread Craig Ringer
On 27 October 2017 at 01:15, Zhu, Joshua  wrote:
> Database oid is used in both bdr.bdr_nodes, as node_dboid, and
> bdr.bdr_connections, as conn_dboid, also used in construction of replication
> slot names.

Correct. However, it's used in conjunction with the sysid and node timeline ID.

> I noticed that when trying to join a bdr group, if the database oid on the
> new node happens to be the same as that of a node already in the bdr group,
> the join would fail, and the only way to resolve the conflict that I was
> able to come up with has been to retry with dropping/recreating the database
> until the dboid does not conflict with any node already in the group.

That is extremely surprising. In our regression tests the database
oids should be the same quite often, as we do various tests where we
create multiple instances. More importantly, every time you
bdr_init_copy, you get a clone with the same database oid, and that
works fine.

There's no detail here to work from, so I cannot guess what's actually
happening, but I can confidently say it's not a database oid conflict.
Nowhere in BDR should the database oid be considered without the rest
of the (sysid,timeline,dboid) tuple.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[GENERAL] BDR question on dboid conflicts

2017-10-26 Thread Zhu, Joshua
Database oid is used in both bdr.bdr_nodes, as node_dboid, and 
bdr.bdr_connections, as conn_dboid, also used in construction of replication 
slot names.

I noticed that when trying to join a bdr group, if the database oid on the new 
node happens to be the same as that of a node already in the bdr group, the
join would fail, and the only way to resolve the conflict that I was able to 
come up with has been to retry with dropping/recreating the database until the 
dboid does not conflict with any node already in the group.

Is there a better way to handle this kind of conflict, especially when doing
so from a script?

Thanks


Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-11 Thread Craig Ringer
On 12 October 2017 at 11:03, Andres Freund  wrote:
> On 2017-10-12 10:25:43 +0800, Craig Ringer wrote:
>> On 4 October 2017 at 00:21, milist ujang  wrote:
>> > On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer  wrote:
>> >>
>> >>
>> >> Can you get stacks please?
>> >>
>> >> Use -g
>> >
>> >
>> > # Events: 2K cpu-clock
>> > #
>> > # Overhead   Command  Shared ObjectSymbol
>> > #     .  
>> > #
>> > 86.96%  postgres  [kernel.kallsyms]  [k] __mutex_lock_common.isra.5
>> > |
>> > --- __mutex_lock_common.isra.5
>> > read
>>
>>
>> Unfortunately it looks like you're using a postgres built with
>> -fomit-frame-pointers (the default) on x64, with an older perf not
>> built with libunwind. This produces useless stacks.
>
> Just read this mail, but for libunwind to work you'd have to specify
> "--call-graph dwarf", no?

I think you're right. But only on a version of perf where it's
available and used.

I haven't recently checked if perf has finally grown the ability to
load external debug symbols either. It never used to.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-11 Thread Andres Freund
On 2017-10-12 10:25:43 +0800, Craig Ringer wrote:
> On 4 October 2017 at 00:21, milist ujang  wrote:
> > On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer  wrote:
> >>
> >>
> >> Can you get stacks please?
> >>
> >> Use -g
> >
> >
> > # Events: 2K cpu-clock
> > #
> > # Overhead   Command  Shared ObjectSymbol
> > #     .  
> > #
> > 86.96%  postgres  [kernel.kallsyms]  [k] __mutex_lock_common.isra.5
> > |
> > --- __mutex_lock_common.isra.5
> > read
> 
> 
> Unfortunately it looks like you're using a postgres built with
> -fomit-frame-pointers (the default) on x64, with an older perf not
> built with libunwind. This produces useless stacks.

Just read this mail, but for libunwind to work you'd have to specify
"--call-graph dwarf", no?

- Andres




Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-11 Thread Craig Ringer
On 4 October 2017 at 00:21, milist ujang  wrote:
> On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer  wrote:
>>
>>
>> Can you get stacks please?
>>
>> Use -g
>
>
> # Events: 2K cpu-clock
> #
> # Overhead   Command  Shared ObjectSymbol
> #     .  
> #
> 86.96%  postgres  [kernel.kallsyms]  [k] __mutex_lock_common.isra.5
> |
> --- __mutex_lock_common.isra.5
> read


Unfortunately it looks like you're using a postgres built with
-fomit-frame-pointers (the default) on x64, with an older perf not
built with libunwind. This produces useless stacks.

You may need to recompile with -fno-omit-frame-pointer
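If recompiling isn't an option, a sufficiently new perf can unwind with DWARF debug info instead of frame pointers. A hedged sketch; it assumes a perf built with libunwind/DWARF support and debug symbols installed, and the PID 55382 is the walsender sampled earlier in the thread:

```shell
# Sample the walsender using DWARF-based call graphs instead of
# frame-pointer walking, then view the resulting stacks.
perf record --call-graph dwarf -p 55382 -- sleep 30
perf report --stdio
```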




Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-03 Thread milist ujang
Hi Craig,

Anyway, this OS is a guest OS in VMware (vSphere).
Thanks for your response and help.

On Tue, Oct 3, 2017 at 8:49 PM, Craig Ringer  wrote:
>
>
> Can you get stacks please?
>
> Use -g
>

# Events: 2K cpu-clock
#
# Overhead   Command  Shared ObjectSymbol
#     .  
#
86.96%  postgres  [kernel.kallsyms]  [k] __mutex_lock_common.isra.5
|
--- __mutex_lock_common.isra.5
read

 2.85%  postgres  [kernel.kallsyms]  [k] do_raw_spin_lock
|
--- do_raw_spin_lock
   |
   |--90.48%-- read
   |
   |--8.33%-- recv
   |
--1.19%-- write

 2.44%  postgres  [kernel.kallsyms]  [k] mutex_unlock
|
--- mutex_unlock
read

 2.03%  postgres  [kernel.kallsyms]  [k] arch_local_irq_restore
|
--- arch_local_irq_restore
read

 1.32%  postgres  postgres   [.] ValidXLogRecord
|
--- ValidXLogRecord

 1.25%  postgres  [kernel.kallsyms]  [k] mutex_lock
|
--- mutex_lock
read


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-03 Thread Craig Ringer
On 3 October 2017 at 19:45, milist ujang  wrote:

> Hi all,
>
> I've an environment 9.4 + bdr:
> PostgreSQL 9.4.4
>

You're on a pretty old postgres-bdr. Update; you're missing a lot of fixes
from mainline.


> This is consolidation databases, in this machine there are around 250+ wal
> sender processes.
>

Not a great use case for BDR.

Consider pglogical.


>
> finally get which processes (wal senders) that are using mutexes:
>
> perf top -e task-clock -p 55382
>
>
Can you get stacks please?

Use -g

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] BDR, wal sender, high system cpu, mutex_lock_common

2017-10-03 Thread milist ujang
Hi all,

I've an environment 9.4 + bdr:
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.7.2-5) 4.7.2, 64-bit

kernel version:
3.2.0-4-amd64 #1 SMP Debian 3.2.65-1 x86_64 GNU/Linux

This is consolidation databases, in this machine there are around 250+ wal
sender processes.

top output revealed high system cpu:
%Cpu(s):  1.4 us, 49.7 sy,  0.0 ni, 48.8 id,  0.0 wa,  0.0 hi,  0.0 si,
0.0 st

profiling cpu with perf:

perf top -e cpu-clock

Events: 142K cpu-clock
 82.37%  [kernel][k] __mutex_lock_common.isra.5
  4.49%  [kernel][k] do_raw_spin_lock
  2.23%  [kernel][k] mutex_lock
  2.16%  [kernel][k] mutex_unlock
  2.12%  [kernel][k] arch_local_irq_restore
  1.73%  postgres[.] ValidXLogRecord
  0.87%  [kernel][k] __mutex_unlock_slowpath
  0.78%  [kernel][k] arch_local_irq_enable
  0.63%  [kernel][k] sys_recvfrom


finally get which processes (wal senders) that are using mutexes:

perf top -e task-clock -p 55382

Events: 697  task-clock
 88.08%  [kernel]  [k] __mutex_lock_common.isra.5
  3.27%  [kernel]  [k] do_raw_spin_lock
  2.34%  [kernel]  [k] arch_local_irq_restore
  2.10%  postgres  [.] ValidXLogRecord
  1.87%  [kernel]  [k] mutex_unlock
  1.87%  [kernel]  [k] mutex_lock
  0.47%  [kernel]  [k] sys_recvfrom

strace output of wal sender process:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.30    1.030072           5    213063    201463 read
  1.69    0.017686           0    201464    201464 recvfrom
  0.01    0.000110           0       806           lseek
  0.00    0.000043           0       474       468 rt_sigreturn
  0.00    0.000000           0         6           open
  0.00    0.000000           0         6           close
------ ----------- ----------- --------- --------- ----------------
100.00    1.047911                415819    403395 total


strace detail: the majority are reads from a pipe and recvfrom calls on a
socket (but most of them return EAGAIN):

read(15, 
"~\320\5\0\1\0\0\0\0@\235\1\360\16\0\0\334\26\0\0\0\0\0\0\365\27\0\0\0\0\0\0"...,
8192) = 8192 <0.25>
read(6, 0x7fffdd837b3f, 1) = -1 EAGAIN (Resource temporarily unavailable)
<0.000116>
recvfrom(10, 0x7fffdd837b17, 1, 0, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable) <0.49>

ls -l /proc/62388/fd/15
lr-x-- 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/15 ->
/data/pg_xlog/00010EF00061

ls -l /proc/62388/fd/6
lr-x-- 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/6 ->
pipe:[1090892506]

ls -l /proc/62388/fd/10
lrwx-- 1 postgres postgres 64 Oct  1 08:39 /proc/62388/fd/10 ->
socket:[1096584060]

I wonder, is there a kernel version that handles these mutexes better?
Or is this the expected behavior?

Sorry for cross-posting, I have posted the same on pgsql-performance too


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory

2017-09-18 Thread milist ujang
Hi Craig,

So, is it safe to drop the identifiers listed by this query output?

select riname from pg_replication_identifier where riname not in
(select external_id from pg_replication_identifier_progress);

I cannot read the pg_get_replication_identifier_progress function; is it a C
function?




On Fri, Sep 15, 2017 at 11:14 AM, Craig Ringer 
wrote:

> On 15 September 2017 at 11:46, milist ujang 
> wrote:
>
>> Hi Craig,
>>
>> Thanks again for pointing to the inactive replication slot.
>> After the inactive replication slot was dropped, relfrozenxid is now moving.
>>
>> I wonder if the replication identifier will cause any issue if left
>> orphaned, since on the other side there is an inactive replication identifier.
>>
>
> No, that's harmless.
>
> However, if there's still an "other side" at all, you presumably have
> broken replication.
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory

2017-09-14 Thread Craig Ringer
On 15 September 2017 at 11:46, milist ujang  wrote:

> Hi Craig,
>
> Thanks again for pointing to the inactive replication slot.
> After the inactive replication slot was dropped, relfrozenxid is now moving.
>
> I wonder if the replication identifier will cause any issue if left
> orphaned, since on the other side there is an inactive replication identifier.
>

No, that's harmless.

However, if there's still an "other side" at all, you presumably have
broken replication.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory

2017-09-14 Thread milist ujang
Hi Craig,

Thanks again for pointing to the inactive replication slot.
After the inactive replication slot was dropped, relfrozenxid is now moving.

I wonder if the replication identifier will cause any issue if left
orphaned, since on the other side there is an inactive replication identifier.



On Fri, Sep 15, 2017 at 9:36 AM, Craig Ringer  wrote:

>
>
> Do you have any idle/old replication slots, perhaps from failed node joins
> or abandoned nodes not properly parted?
>
> SELECT *
> FROM pg_replication_slots;
>
> Also check
>
> SELECT oid,*
> FROM pg_database;
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory

2017-09-14 Thread Craig Ringer
On 14 September 2017 at 13:35, milist ujang  wrote:

> HI list,
>
> I have a database in a BDR environment which keeps logging these messages
> in the log file:
>
> HINT:  Close open transactions soon to avoid wraparound problems.
> WARNING:  oldest xmin is far in the past
>

Do you have any idle/old replication slots, perhaps from failed node joins
or abandoned nodes not properly parted?

SELECT *
FROM pg_replication_slots;

Also check

SELECT oid,*
FROM pg_database;
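When one of those slots is the culprit, its catalog_xmin will be far behind. A hedged follow-up query (standard 9.4 pg_replication_slots columns; nothing BDR-specific):

```sql
-- Inactive slots with a very old catalog_xmin are what pin the global
-- xmin and keep relfrozenxid from advancing.
SELECT slot_name, active, catalog_xmin, age(catalog_xmin) AS xmin_age
FROM pg_replication_slots
ORDER BY age(catalog_xmin) DESC NULLS LAST;
```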


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] BDR, near xid wraparound, a lot of files in pg_subtrans directory

2017-09-13 Thread milist ujang
HI list,

I have a database in a BDR environment which keeps logging these messages
in the log file:

HINT:  Close open transactions soon to avoid wraparound problems.
WARNING:  oldest xmin is far in the past

Querying pg_stat_activity where state='active';

datname  | template1
query| autovacuum: VACUUM pg_catalog.pg_depend (to prevent
wraparound)

datname  | template1
query| autovacuum: VACUUM pg_toast.pg_toast_1255 (to prevent
wraparound)

datname  | template1
query| autovacuum: VACUUM pg_catalog.pg_ts_parser (to prevent
wraparound)

SELECT
 pg_namespace.nspname
 ,c.relname AS relname
 ,greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
 ,c.relfrozenxid
 ,t.relfrozenxid
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
LEFT JOIN
 pg_namespace
   ON pg_namespace.oid = c.relnamespace
WHERE c.relkind = 'r' and c.relname='pg_depend';

-[ RECORD 1 ]+---
nspname  | pg_catalog
relname  | pg_depend
age  | 1165907267
relfrozenxid | 102246720
relfrozenxid |


Trying to vacuum manually, but no luck:

postgres=# vacuum pg_catalog.pg_depend;
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
VACUUM
postgres=# vacuum freeze pg_catalog.pg_depend;
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
VACUUM
postgres=# vacuum full pg_catalog.pg_depend;
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
VACUUM


Checking relfrozenxid again after vacuum; it is not moving:

SELECT
pg_namespace.nspname
,c.relname AS relname
,greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
,c.relfrozenxid
,t.relfrozenxid
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
LEFT JOIN
 pg_namespace
   ON pg_namespace.oid = c.relnamespace
WHERE c.relkind = 'r' and c.relname='pg_depend';
-[ RECORD 1 ]+---
nspname  | pg_catalog
relname  | pg_depend
age  | 1165908742
relfrozenxid | 102246720
relfrozenxid |


Searching for a similar situation, I found this URL:
https://postgrespro.com/list/thread-id/1556972

but that thread concerned streaming replication, not BDR; the symptom looks
the same: there are a lot of files inside the pg_subtrans directory, many
older than the last cluster restart.

Querying pg_prepared_xacts, pg_stat_activity, and pg_locks gives no
indication of long-running queries or even open transactions.
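The same check can be made at the database level with standard catalog columns (nothing BDR-specific assumed):

```sql
-- Databases closest to transaction ID wraparound first; template1 showing
-- a large age matches the autovacuum activity quoted above.
SELECT datname, age(datfrozenxid) AS datfrozenxid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```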


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, limit number of groups in a cluster

2017-09-10 Thread Craig Ringer
On 11 September 2017 at 09:32, milist ujang  wrote:

> Hi all,
>
> Based on the docs and a look at the processes, it seems there is one WAL
> sender on each node per group.
>
> If there is a scenario of consolidating many databases (say hundreds) into
> one database (so the central cluster runs hundreds of WAL senders), what is
> the limit on the number of groups?
>

It's not a use case I've paid much attention to. I expect it'll be limited
by performance and memory, rather than having any firm limit.

Maybe you should look into pglogical. This seems like a weird use for BDR.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] BDR, limit number of groups in a cluster

2017-09-10 Thread milist ujang
Hi all,

Based on the docs and a look at the processes, it seems there is one WAL
sender on each node per group.

If there is a scenario of consolidating many databases (say hundreds) into
one database (so the central cluster runs hundreds of WAL senders), what is
the limit on the number of groups?

I wonder if anyone has run such a scenario?


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-07 Thread Craig Ringer
On 7 September 2017 at 21:16, milist ujang  wrote:

> Hi Craig,
>
> On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer 
> wrote:
>
>>
>> You could drop and re-create the replication slot, I guess. But your
>> nodes would be hopelessly out of sync and need manual resync (with data
>> replication disabled) of one node vs another.
>>
>
> Thanks for pointing to the replication slot.
> I simulated a similar situation in a dev environment by removing the WAL
> segment on node1 while node2 kept inserting into a table; it can now move
> forward to the latest WAL segment, but the node_status situation differs.
>
> In production, node_status is 'i' on node1
>

there's a known bug in bdr1 where sometimes the node status doesn't update
from 'i' after joining.


> can I safely update node_status directly in bdr.bdr_nodes?
>

Usually not. In this one specific case where a node is known to be fully
joined and online, but its status is stuck at 'i', yes.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-07 Thread milist ujang
Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer  wrote:

>
> You could drop and re-create the replication slot, I guess. But your nodes
> would be hopelessly out of sync and need manual resync (with data
> replication disabled) of one node vs another.
>

Thanks for pointing to the replication slot.
I simulated a similar situation in a dev environment by removing the WAL
segment on node1 while node2 kept inserting into a table; it can now move
forward to the latest WAL segment, but the node_status situation differs.

In production, node_status is 'i' on node1 but 'r' on node2, whereas in my
dev environment both nodes stay 'r', even though I waited quite a long time
for it to change.

Can I safely update node_status directly in bdr.bdr_nodes?


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-06 Thread Craig Ringer
On 6 September 2017 at 08:47, milist ujang  wrote:

> Hi Craig
>
> On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer 
> wrote:
>>
>>
>> BDR can, see bdr.skip_changes_upto .
>>
> Unluckily my bdr is 0.9.3.
>
>
>> But PostgreSQL's logical decoding requires a contiguous WAL stream to
>> maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
>> advance. So your replication from that node is broken, and you have to part
>> the node then rejoin. You'll need to manually recover any diverged data.
>>
> Yup, I am willing to reconcile the data manually via dblink or otherwise;
> is it still possible to move past the missing WAL segments in 0.9.3?
>

Well, you can skip changes on the logical decoding stream manually in 0.9.3
using the underlying postgres functions. But it won't help you because what
you broke when you deleted the WAL segments wasn't BDR, it was postgres
logical decoding.

It *requires* a contiguous stream of WAL. It cannot recover if you do not
have that. And you cannot really reset it.

You could drop and re-create the replication slot, I guess. But your nodes
would be hopelessly out of sync and need manual resync (with data
replication disabled) of one node vs another.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-05 Thread milist ujang
Hi Craig

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer  wrote:
>
>
> BDR can, see bdr.skip_changes_upto .
>
Unluckily my bdr is 0.9.3.


> But PostgreSQL's logical decoding requires a contiguous WAL stream to
> maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
> advance. So your replication from that node is broken, and you have to part
> the node then rejoin. You'll need to manually recover any diverged data.
>
Yup, I am willing to reconcile the data manually via dblink or otherwise;
is it still possible to move past the missing WAL segments in 0.9.3?

I've tried these in a dev environment to simulate it, but no luck:

# select * from pg_replication_identifier_progress;
 local_id |  external_id   | remote_lsn | local_lsn
--+++---
1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)

# SELECT
pg_replication_identifier_setup_replaying_from('bdr_6461744703437035137_1_34424_30406_');
 pg_replication_identifier_setup_replaying_from


(1 row)

# SELECT
pg_replication_identifier_setup_tx_origin('0/4288960','2017-09-06');
 pg_replication_identifier_setup_tx_origin
---

(1 row)

# select * from pg_replication_identifier_progress;
 local_id |  external_id   | remote_lsn | local_lsn
--+++---
1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-05 Thread Craig Ringer
On 6 September 2017 at 01:52, milist ujang  wrote:

> Hi all,
>
> due to a space issue and high transaction volume, some WAL segments were
> removed from pg_xlog in a BDR environment.
>

What, you deleted them?


> I have worked with Streams and GoldenGate (Oracle products), where at the
> capture side we can move forward to a certain sequence (archived log/redo
> log; the equivalent of a WAL segment in Postgres).
>
> So, is it possible to move forward and read a recent WAL segment in a BDR
> environment? (Assume data reconciliation will be done manually.)
>

BDR can, see bdr.skip_changes_upto .

But PostgreSQL's logical decoding requires a contiguous WAL stream to
maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
advance. So your replication from that node is broken, and you have to part
the node then rejoin. You'll need to manually recover any diverged data.

Don't go in and randomly delete things in the postgres data directory, or
things will break.

The BDR manual warns of the importance of disk space monitoring...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] BDR, wal segment has been removed, is it possible move forward?

2017-09-05 Thread milist ujang
Hi all,

Due to a space issue and high transaction volume, some WAL segments were
removed from pg_xlog in a BDR environment.

The warning log at node1 says "requested WAL segment . has already been
removed", followed by "Connection reset by peer".

log at node2 :
Sending replication command: START_REPLICATION SLOT
...
...
XX000: data stream ended


I have worked with Streams and GoldenGate (Oracle products), where at the
capture side we can move forward to a certain sequence (archived log/redo
log; the equivalent of a WAL segment in Postgres).

So, is it possible to move forward and read a recent WAL segment in a BDR
environment? (Assume data reconciliation will be done manually.)


-- 
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab


Re: [GENERAL] BDR replication port

2017-08-25 Thread Zhu, Joshua
Thanks for the idea, and that is it... it was indeed the other direction of
replication that was affected by blocking the port.

-Original Message-
From: Alvaro Aguayo Garcia-Rada [mailto:aagu...@opensysperu.com] 
Sent: Friday, August 25, 2017 5:00 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: PostgreSql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] BDR replication port

That's weird. Another idea: do changes on that server get replicated to the
other servers? I'm not sure if incoming connections are used to receive WAL
or to send it.

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com>
Cc: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 18:35:21
Subject: RE: [GENERAL] BDR replication port

Thought about that possibility, so postgres on the node with port blocked was 
restarted after blocking the port.

-Original Message-
From: Alvaro Aguayo Garcia-Rada [mailto:aagu...@opensysperu.com] 
Sent: Friday, August 25, 2017 3:23 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: PostgreSql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] BDR replication port

Just a guess: how did you block the port? Depending on that, you could be
blocking only new connections, while connections already established would
continue to transmit data; remember BDR only reconnects when the connection
is lost.

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 16:49:44
Subject: [GENERAL] BDR replication port

Hi, I am experimenting with how network configuration impacts BDR
replication. I ran into something that I can't explain, and wonder if
someone can shed light. Here it goes:

With a four-node BDR group configured and running (all using the default
port 5432), I purposely blocked port 5432 on one of the nodes in the group,
expecting to see changes on the other nodes stop being replicated to this
node, but that's not what happened.

Shell commands show that the port was indeed blocked (in the following
example session, port 5432 is blocked on 10.3.122.31 but open on 10.3.122.21):

% nc -v --send-only 10.3.122.21 5432
Ncat: Connected to 10.3.122.21:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.00 seconds.

% nc -v --send-only 10.3.122.31 5432
Ncat: Connection timed out.

% psql -h 10.3.122.21 mydb
psql (9.4.10)
Type "help" for help.
mydb=#

% psql -h 10.3.122.31 mydb
psql: could not connect to server: Connection timed out
Is the server running on host "10.3.122.31" and accepting
TCP/IP connections on port 5432?

At this state, I tried an insert and an update on node 10.3.122.21, all of
which were replicated to node 10.3.122.31. However, an attempt to create a
new table on node 10.3.122.21 was stuck (as expected) until port 5432 on
10.3.122.31 was opened again.

So my question is, is there another port other than port 5432 that BDR uses for 
replication? If not, how could changes be replicated to 10.3.122.31 when its 
port 5432 was blocked?

Thanks,




Re: [GENERAL] BDR replication port

2017-08-25 Thread Alvaro Aguayo Garcia-Rada
That's weird. Another idea: do changes on that server get replicated to the
other servers? I'm not sure if incoming connections are used to receive WAL
or to send it.

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com>
Cc: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 18:35:21
Subject: RE: [GENERAL] BDR replication port

Thought about that possibility, so postgres on the node with port blocked was 
restarted after blocking the port.

-Original Message-
From: Alvaro Aguayo Garcia-Rada [mailto:aagu...@opensysperu.com] 
Sent: Friday, August 25, 2017 3:23 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: PostgreSql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] BDR replication port

Just a guess: how did you block the port? Depending on that, you could be
blocking only new connections, while already-established connections continue to
transmit data; remember, BDR only reconnects when the connection is lost.
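A concrete illustration of this pitfall, assuming an iptables-based firewall (a config sketch; the rule details are illustrative, not from the thread): matching only NEW connections leaves established BDR streams untouched, while an unconditional DROP starves existing connections too.

```shell
# Blocks only NEW inbound connections to 5432; BDR connections that were
# already established keep streaming, which looks like "replication still
# works despite the block":
iptables -A INPUT -p tcp --dport 5432 -m conntrack --ctstate NEW -j REJECT

# Blocks the port for established traffic too (what the test intended);
# existing connections stall, and BDR's next reconnect attempt fails:
iptables -A INPUT -p tcp --dport 5432 -j DROP
```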

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 16:49:44
Subject: [GENERAL] BDR replication port

Hi, I am experimenting with how network configuration impacts BDR replication; I
ran into something I can't explain and wonder if someone can shed some light.
Here it goes:

With a four-node BDR group configured and running (all using the default port
5432), I purposely blocked port 5432 on one of the nodes in the group, expecting
changes on the other nodes to stop being replicated to this node, but that's not
what happened.

Shell commands show that the port was indeed blocked (in the following example
session, port 5432 is blocked on 10.3.122.31 but open on 10.3.122.21):

% nc -v --send-only 10.3.122.21 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connected to 10.3.122.21:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.00 seconds.

% nc -v --send-only 10.3.122.31 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connection timed out.

% psql -h 10.3.122.21 mydb
psql (9.4.10)
Type "help" for help.
mydb=#

% psql -h 10.3.122.31 mydb
psql: could not connect to server: Connection timed out
Is the server running on host "10.3.122.31" and accepting
TCP/IP connections on port 5432?

At this state, I tried inserts and updates on node 10.3.122.21, and all of
them were replicated to node 10.3.122.31. However, an attempt to create a new
table on node 10.3.122.21 was stuck (as expected) until port 5432 on
10.3.122.31 was opened again.

So my question is: does BDR use another port besides 5432 for replication? If
not, how could changes be replicated to 10.3.122.31 while its port 5432 was
blocked?

Thanks,




Re: [GENERAL] BDR replication port

2017-08-25 Thread Zhu, Joshua
I thought about that possibility, so Postgres on the node with the blocked port
was restarted after the port was blocked.

-Original Message-
From: Alvaro Aguayo Garcia-Rada [mailto:aagu...@opensysperu.com] 
Sent: Friday, August 25, 2017 3:23 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: PostgreSql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] BDR replication port

Just a guess: how did you block the port? Depending on that, you could be
blocking only new connections, while already-established connections continue to
transmit data; remember, BDR only reconnects when the connection is lost.

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 16:49:44
Subject: [GENERAL] BDR replication port

Hi, I am experimenting with how network configuration impacts BDR replication; I
ran into something I can't explain and wonder if someone can shed some light.
Here it goes:

With a four-node BDR group configured and running (all using the default port
5432), I purposely blocked port 5432 on one of the nodes in the group, expecting
changes on the other nodes to stop being replicated to this node, but that's not
what happened.

Shell commands show that the port was indeed blocked (in the following example
session, port 5432 is blocked on 10.3.122.31 but open on 10.3.122.21):

% nc -v --send-only 10.3.122.21 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connected to 10.3.122.21:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.00 seconds.

% nc -v --send-only 10.3.122.31 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connection timed out.

% psql -h 10.3.122.21 mydb
psql (9.4.10)
Type "help" for help.
mydb=#

% psql -h 10.3.122.31 mydb
psql: could not connect to server: Connection timed out
Is the server running on host "10.3.122.31" and accepting
TCP/IP connections on port 5432?

At this state, I tried inserts and updates on node 10.3.122.21, and all of
them were replicated to node 10.3.122.31. However, an attempt to create a new
table on node 10.3.122.21 was stuck (as expected) until port 5432 on
10.3.122.31 was opened again.

So my question is: does BDR use another port besides 5432 for replication? If
not, how could changes be replicated to 10.3.122.31 while its port 5432 was
blocked?

Thanks,




Re: [GENERAL] BDR replication port

2017-08-25 Thread Alvaro Aguayo Garcia-Rada
Just a guess: how did you block the port? Depending on that, you could be
blocking only new connections, while already-established connections continue to
transmit data; remember, BDR only reconnects when the connection is lost.

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Zhu, Joshua" <j...@vormetric.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Friday, 25 August, 2017 16:49:44
Subject: [GENERAL] BDR replication port

Hi, I am experimenting with how network configuration impacts BDR replication; I
ran into something I can't explain and wonder if someone can shed some light.
Here it goes:

With a four-node BDR group configured and running (all using the default port
5432), I purposely blocked port 5432 on one of the nodes in the group, expecting
changes on the other nodes to stop being replicated to this node, but that's not
what happened.

Shell commands show that the port was indeed blocked (in the following example
session, port 5432 is blocked on 10.3.122.31 but open on 10.3.122.21):

% nc -v --send-only 10.3.122.21 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connected to 10.3.122.21:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.00 seconds.

% nc -v --send-only 10.3.122.31 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connection timed out.

% psql -h 10.3.122.21 mydb
psql (9.4.10)
Type "help" for help.
mydb=#

% psql -h 10.3.122.31 mydb
psql: could not connect to server: Connection timed out
Is the server running on host "10.3.122.31" and accepting
TCP/IP connections on port 5432?

At this state, I tried inserts and updates on node 10.3.122.21, and all of
them were replicated to node 10.3.122.31. However, an attempt to create a new
table on node 10.3.122.21 was stuck (as expected) until port 5432 on
10.3.122.31 was opened again.

So my question is: does BDR use another port besides 5432 for replication? If
not, how could changes be replicated to 10.3.122.31 while its port 5432 was
blocked?

Thanks,




[GENERAL] BDR replication port

2017-08-25 Thread Zhu, Joshua

Hi, I am experimenting with how network configuration impacts BDR replication; I
ran into something I can't explain and wonder if someone can shed some light.
Here it goes:

With a four-node BDR group configured and running (all using the default port
5432), I purposely blocked port 5432 on one of the nodes in the group, expecting
changes on the other nodes to stop being replicated to this node, but that's not
what happened.

Shell commands show that the port was indeed blocked (in the following example
session, port 5432 is blocked on 10.3.122.31 but open on 10.3.122.21):

% nc -v --send-only 10.3.122.21 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connected to 10.3.122.21:5432.
Ncat: 0 bytes sent, 0 bytes received in 0.00 seconds.

% nc -v --send-only 10.3.122.31 5432
Ncat: Version ... ( http://nmap.org/ncat )
Ncat: Connection timed out.

% psql -h 10.3.122.21 mydb
psql (9.4.10)
Type "help" for help.
mydb=#

% psql -h 10.3.122.31 mydb
psql: could not connect to server: Connection timed out
Is the server running on host "10.3.122.31" and accepting
TCP/IP connections on port 5432?

At this state, I tried inserts and updates on node 10.3.122.21, and all of
them were replicated to node 10.3.122.31. However, an attempt to create a new
table on node 10.3.122.21 was stuck (as expected) until port 5432 on
10.3.122.31 was opened again.

So my question is: does BDR use another port besides 5432 for replication? If
not, how could changes be replicated to 10.3.122.31 while its port 5432 was
blocked?

Thanks,



Re: [GENERAL] BDR node removal and rejoin

2017-07-13 Thread Craig Ringer
On 14 July 2017 at 00:09, Zhu, Joshua  wrote:

>
>
> Found these log entries from one of the other node:
>
>
>
> t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG:  0: found valid replication
> identifier 15
>
> t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION:
> bdr_establish_connection_and_slot, bdr.c:604
>
> t=2017-07-13 08:35:34 PDT p=27292 a=ERROR:  53400: no free replication
> state could be found for 15, increase max_replication_slots
>
>
>
> Increased max_replication_slots, things are looking good now, thanks.
>
>
>
> This does bring up a couple of questions:
>
>
>
>1. Given the fact there is no real increase in the number of nodes in
>this repeated removal/rejoining exercise, yet it caused replication slots
>being used up, wouldn’t removal of a node also automatically free up the
>replication slot allocated for the node?
>
>
Yes, it should. Open issue. A patch would be welcomed.



>
>1. Or is there a way to manually free up no-longer-needed slots? (They
>don't seem to show up in the pg_replication_slots view; I made sure to use
>pg_drop_replication_slot when they do show up there.)
>
>
It'll be complaining about replication identifiers ("origins" in 9.6); see
pg_replication_identifier


>1. If there is such a thing, what is the rule of thumb for best value
>of max_replication_slots (are they somehow related to the value
>max_wal_senders as well), with respect to, say, the max number of nodes
>intended to support?
>
>
I think that's covered in the docs, but it's safe to err fairly high. The
cost of extra slots is minimal.
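The log error above counts against replication identifiers rather than slots, so freeing one is a catalog operation. A hedged sketch (on 9.6 the catalog view is pg_replication_origin with pg_replication_origin_drop(); on the BDR-patched 9.4 the names are pg_replication_identifier and pg_replication_identifier_drop(), so check your catalog; the origin name below is an example, not a real one):

```sql
-- 9.6+: list replication origins and drop one left behind by a parted node.
SELECT * FROM pg_replication_origin;
SELECT pg_replication_origin_drop('bdr_6334686800251932108_1_43865__');  -- example name

-- postgresql.conf: size these generously; unused capacity is cheap.
-- max_replication_slots = 16
-- max_wal_senders = 16
```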


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR node removal and rejoin

2017-07-13 Thread Zhu, Joshua

Found these log entries from one of the other node:

t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG:  0: found valid replication 
identifier 15
t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION:  
bdr_establish_connection_and_slot, bdr.c:604
t=2017-07-13 08:35:34 PDT p=27292 a=ERROR:  53400: no free replication state 
could be found for 15, increase max_replication_slots

Increased max_replication_slots, things are looking good now, thanks.

This does bring up a couple of questions:


  1.  Given that there is no real increase in the number of nodes in this
repeated removal/rejoining exercise, yet it used up replication slots, wouldn't
removal of a node also automatically free up the replication slot allocated for
the node? Or is there a way to manually free up no-longer-needed slots? (They
don't seem to show up in the pg_replication_slots view; I made sure to use
pg_drop_replication_slot when they do show up there.)
  2.  If there is such a thing, what is the rule of thumb for the best value of
max_replication_slots (and is it somehow related to max_wal_senders), with
respect to, say, the maximum number of nodes intended to be supported?

Thanks

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Wednesday, July 12, 2017 11:59 PM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR node removal and rejoin

On 13 July 2017 at 01:56, Zhu, Joshua 
<j...@vormetric.com<mailto:j...@vormetric.com>> wrote:
Thanks for the clarification.

Looks like I am running into a different issue: while trying to pin down
precisely the steps (and the order in which to perform them) needed to
remove/rejoin a node, the removal/rejoining exercise was repeated a number of
times, and it got stuck again:


  1.  The status of the re-joining node (node4) on other nodes is “I”
  2.  The status of the re-joining node on the node4 itself started at “I”, 
changed to “o”, then stuck there
  3.  From the log file for node4, the following entries are constantly being 
generated:

2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  08003: unexpected EOF 
on client connection
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  SocketBackend, 
postgres.c:355
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  08003: unexpected EOF 
on client connection

Check the logs on the other end.



--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR node removal and rejoin

2017-07-13 Thread Craig Ringer
On 13 July 2017 at 01:56, Zhu, Joshua  wrote:

> Thanks for the clarification.
>
>
>
> Looks like I am running into a different issue: while trying to pin down
> precisely the steps (and the order in which to perform them) needed to
> remove/rejoin a node, the removal/rejoining exercise was repeated a number
> of times, and it got stuck again:
>
>
>
>1. The status of the re-joining node (node4) on other nodes is “I”
>2. The status of the re-joining node on the node4 itself started at
>“I”, changed to “o”, then stuck there
>3. From the log file for node4, the following entries are constantly
>being generated:
>
>
>
> 2017-07-12 10:37:46 PDT [24943:bdr 
> (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:
> 0: received replication command: IDENTIFY_SYSTEM
>
> 2017-07-12 10:37:46 PDT [24943:bdr 
> (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:
> exec_replication_command, walsender.c:1309
>
> 2017-07-12 10:37:46 PDT [24943:bdr 
> (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:
> 08003: unexpected EOF on client connection
>
> 2017-07-12 10:37:46 PDT [24943:bdr 
> (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:
> SocketBackend, postgres.c:355
>
> 2017-07-12 10:37:46 PDT [24944:bdr 
> (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:
> 0: received replication command: IDENTIFY_SYSTEM
>
> 2017-07-12 10:37:46 PDT [24944:bdr 
> (6408408103171110238,1,24713,):receive:::1(33884)]LOCATION:
> exec_replication_command, walsender.c:1309
>
> 2017-07-12 10:37:46 PDT [24944:bdr 
> (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:
> 08003: unexpected EOF on client connection
>

Check the logs on the other end.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR node removal and rejoin

2017-07-12 Thread Zhu, Joshua
Thanks for the clarification.

Looks like I am running into a different issue: while trying to pin down
precisely the steps (and the order in which to perform them) needed to
remove/rejoin a node, the removal/rejoining exercise was repeated a number of
times, and it got stuck again:


  1.  The status of the re-joining node (node4) on other nodes is “I”
  2.  The status of the re-joining node on the node4 itself started at “I”, 
changed to “o”, then stuck there
  3.  From the log file for node4, the following entries are constantly being 
generated:

2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]DEBUG:  08003: unexpected EOF 
on client connection
2017-07-12 10:37:46 PDT [24943:bdr 
(6334686800251932108,1,43865,):receive:::1(33883)]LOCATION:  SocketBackend, 
postgres.c:355
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]DEBUG:  08003: unexpected EOF 
on client connection
2017-07-12 10:37:46 PDT [24944:bdr 
(6408408103171110238,1,24713,):receive:::1(33884)]LOCATION:  SocketBackend, 
postgres.c:355
2017-07-12 10:37:46 PDT [24946:bdr 
(6334686760735153516,1,43845,):receive:::1(33885)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24946:bdr 
(6334686760735153516,1,43845,):receive:::1(33885)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24946:bdr 
(6334686760735153516,1,43845,):receive:::1(33885)]DEBUG:  08003: unexpected EOF 
on client connection
2017-07-12 10:37:46 PDT [24946:bdr 
(6334686760735153516,1,43845,):receive:::1(33885)]LOCATION:  SocketBackend, 
postgres.c:355
2017-07-12 10:37:49 PDT [24949:bdr 
(6394432535408825526,1,37325,):receive:::1(33892)]DEBUG:  0: received 
replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:49 PDT [24949:bdr 
(6394432535408825526,1,37325,):receive:::1(33892)]LOCATION:  
exec_replication_command, walsender.c:1309
2017-07-12 10:37:49 PDT [24949:bdr 
(6394432535408825526,1,37325,):receive:::1(33892)]DEBUG:  08003: unexpected EOF 
on client connection
2017-07-12 10:37:49 PDT [24949:bdr 
(6394432535408825526,1,37325,):receive:::1(33892)]LOCATION:  SocketBackend, 
postgres.c:355

What do these entries mean, and what can be done to correct the situation?
(There has been no change to either the Postgres or the network configuration
in the remove/rejoin exercise.)

Thanks

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Wednesday, July 12, 2017 1:59 AM
To: Zhu, Joshua <j...@thalesesec.net>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR node removal and rejoin

On 11 July 2017 at 05:49, Zhu, Joshua 
<j...@vormetric.com<mailto:j...@vormetric.com>> wrote:
An update: after manually removing the record for ‘node4’ from bdr.bdr_nodes,
the corresponding record in bdr.bdr_connections, and the associated replication
slot (with pg_drop_replication_slot), rejoining was successful.

I was under the impression that there is no need to perform manual cleanup 
before a removed node (with database dropped and recreated) rejoining a BDR 
group.

BDR1 requires that you manually remove the bdr.bdr_nodes entry if you intend to 
re-use the same node name.


--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR node removal and rejoin

2017-07-12 Thread Craig Ringer
On 11 July 2017 at 05:49, Zhu, Joshua  wrote:

> An update… after manually removing the record for ‘node4’ from
> bdr.bdr_nodes, corresponding record in bdr.bdr_connections, and associated
> replication slot (with pg_drop_replication_slot), rejoining was successful.
>
>
>
> I was under the impression that there is no need to perform manual cleanup
> before a removed node (with database dropped and recreated) rejoining a BDR
> group.
>

BDR1 requires that you manually remove the bdr.bdr_nodes entry if you
intend to re-use the same node name.
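The manual cleanup this thread converged on can be sketched as follows ('node4' is the node name from the thread; run on a surviving node after bdr.bdr_part_by_node_names, and treat it as a sketch rather than an official procedure):

```sql
-- Remove the metadata row so the same node name can be reused:
DELETE FROM bdr.bdr_nodes WHERE node_name = 'node4';
-- Also remove the matching row in bdr.bdr_connections for the parted node.

-- Drop the parted node's leftover slot if it still shows up
-- (narrow the WHERE clause to that node's slot name in practice):
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'bdr_%' AND NOT active;
```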


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR node removal and rejoin

2017-07-10 Thread Zhu, Joshua
An update: after manually removing the record for 'node4' from bdr.bdr_nodes,
the corresponding record in bdr.bdr_connections, and the associated replication
slot (with pg_drop_replication_slot), rejoining was successful.

I was under the impression that there is no need to perform manual cleanup 
before a removed node (with database dropped and recreated) rejoining a BDR 
group.

From: Zhu, Joshua
Sent: Friday, July 07, 2017 2:59 PM
To: 'pgsql-general@postgresql.org' 
Subject: BDR node removal and rejoin

Hi, I am having difficulty removing a node from a BDR group (with nodes node1
through node5) and then rejoining the group.

Prior to removing a node, BDR runs fine; a query on the bdr.bdr_nodes table
shows all nodes with status 'r'.

Here is what I have done for removing node5 and rejoining:


  *   On node1, do bdr.bdr_part_by_node_names

At this point the status of node5 in bdr.bdr_nodes becomes 'k'


  *   On node5, do bdr.remove_bdr_from_local_node
  *   On node5, drop and recreate the database, then rejoin using 
bdr.bdr_group_join (using the same node name and external dsn)

At this point the status of node5 on node1 through node4 still remains 'k', the
status of node5 on node5 (there is only one record) is 'i', and they are stuck
at these status codes.
[note: I tried using a different node name on rejoining, with the same result]

What have I done wrong, and what is the correct way to do removal and rejoining?

Thanks




[GENERAL] BDR node removal and rejoin

2017-07-07 Thread Zhu, Joshua
Hi, I am having difficulty removing a node from a BDR group (with nodes node1
through node5) and then rejoining the group.

Prior to removing a node, BDR runs fine; a query on the bdr.bdr_nodes table
shows all nodes with status 'r'.

Here is what I have done for removing node5 and rejoining:


  *   On node1, do bdr.bdr_part_by_node_names

At this point the status of node5 in bdr.bdr_nodes becomes 'k'


  *   On node5, do bdr.remove_bdr_from_local_node
  *   On node5, drop and recreate the database, then rejoin using 
bdr.bdr_group_join (using the same node name and external dsn)

At this point the status of node5 on node1 through node4 still remains 'k', the
status of node5 on node5 (there is only one record) is 'i', and they are stuck
at these status codes.
[note: I tried using a different node name on rejoining, with the same result]

What have I done wrong, and what is the correct way to do removal and rejoining?

Thanks




Re: [GENERAL] BDR replication and table triggers

2017-05-02 Thread Craig Ringer
> However if I perform any INSERT, UPDATE or DELETE operations on
> DB2 and these changes propagate over to DB1 via BDR I do not see DB1 firing
> any triggers. Is this intended behavior?

Yes.

>  My current understanding is that
> BDR is unable to invoke Postgres triggers as it operates on the rows
> directly, a layer below Postgres. Is this Correct?

Yes.

>  Is there any mechanism
> that exists that could provide notifications to a listening application when
> BDR makes changes to the underlying database?

You could listen to an underlying logical decoding stream, but it
might be a bit fiddly and complex for your needs.

Ideally we'd be able to fire triggers in BDR, but that's not
implemented or on the current roadmap and there's no funded work on it
at this point. There's some work to support it in pglogical though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BDR replication and table triggers

2017-05-02 Thread Sylvain Marechal
Why not use the logical decoding feature:
https://www.postgresql.org/docs/9.4/static/logicaldecoding-example.html

On both sides, you would have a process that regularly decodes the stream
and emits notifications for events in the tables you are interested in.
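A minimal sketch of that approach with the built-in test_decoding plugin (the slot name is illustrative; a real notifier would likely use a dedicated output plugin or pglogical):

```sql
-- Create a logical slot (requires wal_level = logical and a free
-- max_replication_slots entry):
SELECT * FROM pg_create_logical_replication_slot('notify_slot', 'test_decoding');

-- Periodically consume changes; this call advances the slot, so each
-- change is returned only once:
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('notify_slot', NULL, NULL);

-- Drop the slot when done, or it will hold back WAL removal:
SELECT pg_drop_replication_slot('notify_slot');
```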

Sylvain

2017-05-02 18:18 GMT+02:00 Alvaro Aguayo Garcia-Rada <
aagu...@opensysperu.com>:

> Hi.
>
> It's not like BDR is unable to replicate triggers across the cluster: BDR
> is not intended to do so.
>
> BDR replicates everything that happens inside a transaction; that includes
> both SQL run directly from the application, as well as changes made by
> triggers and extensions. As the changes are applied directly from the WAL,
> no trigger is re-run on the other nodes. If the trigger is re-run, that
> would lead to problems, such as duplicated rows.
>
> The only "problem", if it really is one, is that BDR does not copy
> notifications across the databases. While this may be seen as a problem, I
> would also consider it a chance to make the application more aware
> of the distributed environment it is running in. So I would
> try one of two alternatives:
>
> 1. Make the application listen to notifications on both databases, so it
> will get notified of changes no matter where they happen
>
> 2. Instead of using notify, create a notification table, which your app
> should scan periodically and act accordingly.
>
> Regards,
>
> Alvaro Aguayo
> Jefe de Operaciones
> Open Comb Systems E.I.R.L.
>
> Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51)
> 954183248
> Website: www.ocs.pe
>
> - Original Message -
> From: "jamesadams89" <jamesadam...@hotmail.com>
> To: "PostgreSql-general" <pgsql-general@postgresql.org>
> Sent: Wednesday, 26 April, 2017 07:48:03
> Subject: [GENERAL] BDR replication and table triggers
>
> Hi,
>
> I have some questions regarding how BDR interacts with triggers.
>
> I have two databases that are both joined to the same BDR group and
> correctly replicating between one another sharing a table created as:
>
> create table testtable(
> key varchar(16) NOT NULL PRIMARY KEY,
> data jsonb
> );
>
> With the following trigger defined:
>
> CREATE OR REPLACE FUNCTION test_table_notify()
> RETURNS TRIGGER AS
> $$
> BEGIN
> IF TG_OP='INSERT' OR TG_OP='UPDATE' THEN
> PERFORM pg_notify( 'TestTable', TG_OP || ' ' || NEW.key );
> ELSE
> PERFORM pg_notify( 'TestTable', TG_OP || ' ' || OLD.key );
> END IF;
> RETURN NULL;
> END;
> $$ LANGUAGE plpgsql;
>
>
> CREATE TRIGGER TestTableTrigger
> AFTER INSERT OR UPDATE OR DELETE
> on testtable
> FOR EACH ROW
> EXECUTE PROCEDURE test_table_notify();
>
> I then have a client application listening on the 'TestTable' Notify on one
> of the Databases:
>
> Client
>  _____
> |     |
> |  A  |
> |_____|
>    /\
>    |
>  __|__      _____
> |     |    |     |
> | DB1 |----| DB2 |
> |_____|    |_____|
>
> If I perform any INSERT, UPDATE or DELETE operations directly on DB1, I see
> the trigger on the table being fired as expected and client application 'A'
> receives the notify. I also see the changes propagate to DB2 via BDR as
> expected. However, if I perform any INSERT, UPDATE or DELETE operations on
> DB2 and these changes propagate over to DB1 via BDR, I do not see DB1 firing
> any triggers. Is this intended behavior? My current understanding is that
> BDR is unable to invoke Postgres triggers as it operates on the rows
> directly, a layer below Postgres. Is this correct? Is there any mechanism
> that exists that could provide notifications to a listening application when
> BDR makes changes to the underlying database?
>
> Apologies if this is all a bit elementary, this is my first foray into BDR
> and I was unable to find anything in the documentation that mentioned
> triggers.
>
> Thanks for any input
>
>
>
>
>
>


Re: [GENERAL] BDR replication and table triggers

2017-05-02 Thread Alvaro Aguayo Garcia-Rada
Hi.

It's not like BDR is unable to replicate triggers across the cluster: BDR is 
not intended to do so.

BDR replicates everything that happens inside a transaction; that includes both 
SQL run directly from the application, as well as changes made by triggers and 
extensions. As the changes are applied directly from the WAL, no trigger is 
re-run on the other nodes. If the trigger is re-run, that would lead to 
problems, such as duplicated rows.

The only "problem", if it really is one, is that BDR does not copy notifications
across the databases. While this may be seen as a problem, I would also consider
it a chance to make the application more aware of the distributed environment it
is running in. So I would try one of two alternatives:

1. Make the application listen to notifications on both databases, so it will 
get notified of changes no matter where they happen

2. Instead of using notify, create a notification table, which your app should 
scan periodically and act accordingly.
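Alternative 2 could be sketched like this (all names are hypothetical; note that in a multi-master BDR group the id column needs a conflict-free sequence strategy, such as BDR global sequences, rather than a plain local bigserial):

```sql
CREATE TABLE testtable_events (
    id      bigserial   PRIMARY KEY,    -- needs a global sequence under BDR
    op      text        NOT NULL,       -- 'INSERT' / 'UPDATE' / 'DELETE'
    row_key text        NOT NULL,
    at      timestamptz NOT NULL DEFAULT now()
);

-- The trigger inserts an event row instead of calling pg_notify(), so the
-- event replicates to every node along with the change itself:
--   INSERT INTO testtable_events (op, row_key) VALUES (TG_OP, NEW.key);

-- The application polls, remembering the last id it has processed:
SELECT id, op, row_key FROM testtable_events WHERE id > $1 ORDER BY id;
```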

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "jamesadams89" <jamesadam...@hotmail.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Wednesday, 26 April, 2017 07:48:03
Subject: [GENERAL] BDR replication and table triggers

Hi,

I have some questions regarding how BDR interacts with triggers. 

I have two databases that are both joined to the same BDR group and
correctly replicating between one another sharing a table created as:

create table testtable(
    key  varchar(16) NOT NULL PRIMARY KEY,
    data jsonb
);

With the following trigger defined:

CREATE OR REPLACE FUNCTION test_table_notify()
RETURNS TRIGGER AS
$$
BEGIN
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        PERFORM pg_notify('TestTable', TG_OP || ' ' || NEW.key);
    ELSE
        PERFORM pg_notify('TestTable', TG_OP || ' ' || OLD.key);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER TestTableTrigger
AFTER INSERT OR UPDATE OR DELETE
ON testtable
FOR EACH ROW
EXECUTE PROCEDURE test_table_notify();

I then have a client application listening on the 'TestTable' Notify on one
of the Databases:

Client
 _____
|     |
|  A  |
|_____|
   /\
   |
 __|__      _____
|     |    |     |
| DB1 |----| DB2 |
|_____|    |_____|

If I perform any INSERT, UPDATE or DELETE operations directly on DB1, I see
the trigger on the table being fired as expected and client application 'A'
receives the notify. I also see the changes propagate to DB2 via BDR as
expected. However, if I perform any INSERT, UPDATE or DELETE operations on
DB2 and these changes propagate over to DB1 via BDR, I do not see DB1 firing
any triggers. Is this intended behavior? My current understanding is that
BDR is unable to invoke Postgres triggers as it operates on the rows
directly, a layer below Postgres. Is this correct? Is there any mechanism
that exists that could provide notifications to a listening application when
BDR makes changes to the underlying database?

Apologies if this is all a bit elementary; this is my first foray into BDR,
and I was unable to find anything in the documentation that mentions
triggers.

Thanks for any input



--
View this message in context: 
http://www.postgresql-archive.org/BDR-replication-and-table-triggers-tp5958463.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[GENERAL] BDR replication and table triggers

2017-05-02 Thread jamesadams89
Hi,

I have some questions regarding how BDR interacts with triggers. 

I have two databases that are both joined to the same BDR group and
correctly replicating between one another sharing a table created as:

create table testtable(
key varchar(16) NOT NULL PRIMARY KEY,
data jsonb
);

With the following trigger defined:

CREATE OR REPLACE FUNCTION test_table_notify()
RETURNS TRIGGER AS
$$
BEGIN
IF TG_OP='INSERT' OR TG_OP='UPDATE' THEN
PERFORM pg_notify( 'TestTable', TG_OP || ' ' || NEW.key );
ELSE
PERFORM pg_notify( 'TestTable', TG_OP || ' ' || OLD.key );
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;


CREATE TRIGGER TestTableTrigger
AFTER INSERT OR UPDATE OR DELETE
on testtable
FOR EACH ROW
EXECUTE PROCEDURE test_table_notify();

I then have a client application listening on the 'TestTable' Notify on one
of the Databases:

Client
  ___
 |   |
 | A |
 |___|
   ^
   |
  _|_       ___
 |   |     |   |
 |DB1|-----|DB2|
 |___|     |___|

If I perform any INSERT, UPDATE or DELETE operations directly on DB1 I see
the trigger on the table being fired as expected and Client Application 'A'
receives the notify.  I also see the changes propagate to DB2 via BDR as
expected.  However, if I perform any INSERT, UPDATE or DELETE operations on
DB2 and these changes propagate over to DB1 via BDR, I do not see DB1 firing
any triggers. Is this intended behavior?  My current understanding is that
BDR is unable to invoke Postgres triggers as it operates on the rows
directly, a layer below Postgres. Is this correct? Is there any mechanism
that could provide notifications to a listening application when
BDR makes changes to the underlying database?

Apologies if this is all a bit elementary; this is my first foray into BDR,
and I was unable to find anything in the documentation that mentions
triggers.

Thanks for any input







Re: [GENERAL] BDR problem rejoining node

2017-02-11 Thread Sylvain Marechal
2017-02-11 1:34 GMT+01:00 Tanner Kerr :

> I have two databases being replicated across three nodes with bdr. The
> third node filled up and crashed. I removed this node from the group
> successfully, but now I'm having trouble rejoining it. I'm able to re-join
> the one database no problem. However, trying to do a bdr join on the other
> causes the two active nodes to slow down considerably, and refuses to join.
> This database copies the data after a few minutes, however, the join fails
> and bdr.bdr_nodes on the third node shows only itself in the group with
> status 'i'. The active nodes don't show anything different in the
> bdr.bdr_nodes. All they can see is a new pg_replication_slot for that node.
> The logs say "previous init failed, manual cleanup is required", even though
> there are no bdr_connections or broken bdr_nodes entries. Every time I've
> tried, I've removed the pg_replication slots, the bdr_connections, and the
> bdr_nodes (bdr.remove_bdr_from_local_node()). I've also dropped the
> cluster and recreated it again each time with:
>
> pg_dropcluster 9.4 main --stop
> /usr/bin/pg_createcluster --start 9.4 -d /var/lib/postgresql/9.4/main main
>
> The database failing to copy is small only around 60M. The one being
> copied successfully, is around 1600M.
>
> The join command I've been using:
>
> select bdr.bdr_group_join(local_node_name := 'NODENAMEOFREBUILTNODE',
> node_external_dsn := 'host=IPOFREBUILTNODE port=5432 dbname=MYDB
> connect_timeout=50 keepalives_idle=5 keepalives_interval=1', join_using_dsn
> := 'host=ACTIVENODEIP  port=5432 dbname=MYDB connect_timeout=50
> keepalives_idle=5 keepalives_interval=1');
>
> I'm not sure why it works for one database and not the other.
>
> The log from the failed join attempt is below:
>
> 2017-02-10 14:49:24 PST [6981-10] LOG:  Creating replica with:
> /usr/lib/postgresql/9.4/bin/bdr_initial_load --snapshot 00C58720-1
> --source "host=ACTIVEIP  port=5432 dbname=MYDB connect_timeout=5
> 0 keepalives_idle=5 keepalives_interval=1" --target "host=REBUILDHOST
> port=5432 dbname=MYDB connect_timeout=50 keepalives_idle=5
> keepalives_interval=1" --tmp-directory "/tmp/postgres-bdr-00C58720-1
> .6981", --pg-dump-path "/usr/lib/postgresql/9.4/bin/bdr_dump",
> --pg-restore-path "/usr/lib/postgresql/9.4/bin/pg_restore"
> Dumping remote database "connect_timeout=30 keepalives=1
> keepalives_idle=20 keepalives_interval=20 keepalives_count=5
> host=ACTIVEIP  port=5432 dbname=MYDB connect_timeout=50 keepalives_idle=5
> keepalives_interval=1 fallback_application_name='bdr
> (6385577151748866557,1,16391,): init_replica dump'" with 1 concurrent
> workers to "/tmp/postgres-bdr-00C58720-1.6981"
> Restoring dump to local DB "host=REBUILDHOST port=5432 dbname=MYDB
> connect_timeout=50 keepalives_idle=5 keepalives_interval=1
> fallback_application_name='bdr (6385577151748866557,1,16391,):
> init_replica restore' options='-c bdr.do_not_replicate=on  -c
> bdr.permit_unsafe_ddl_commands=on -c bdr.skip_ddl_replication=on -c
> bdr.skip_ddl_locking=on -c session_replication_role=replica'" with 1
> concurrent workers from "/tmp/postgres-bdr-00C58720-1.6981"
> 2017-02-10 14:57:16 PST [7271-1] postgres@MYDB NOTICE:  extension
> "btree_gist" already exists, skipping
> 2017-02-10 14:57:16 PST [7271-2] postgres@MYDB NOTICE:  extension "bdr"
> already exists, skipping
> 2017-02-10 14:57:16 PST [7271-3] postgres@MYDB NOTICE:  extension
> "plpgsql" already exists, skipping
> 2017-02-10 14:57:16 PST [7271-4] postgres@MYDB NOTICE:  extension
> "pgcrypto" already exists, skipping
> 2017-02-10 14:57:16 PST [7271-5] postgres@MYDB NOTICE:  extension
> "uuid-ossp" already exists, skipping
>
> *alters tables and builds indexes*
>
> 2017-02-10 14:57:23 PST [6981-10] DEBUG:  syncing bdr_nodes and
> bdr_connections
> 2017-02-10 14:57:23 PST [6981-11] ERROR:  BEGIN or table locking on remote
> failed:
> 2017-02-10 14:57:23 PST [6884-11] LOG:  worker process: bdr db: MYDB (PID
> 6981) exited with exit code 1
> 2017-02-10 14:57:28 PST [6884-12] LOG:  starting background worker process
> "bdr db: MYDB"
> 2017-02-10 14:57:28 PST [7274-1] NOTICE:  version "1.0" of extension
> "btree_gist" is already installed
> 2017-02-10 14:57:28 PST [7274-2] NOTICE:  version "1.0.1.0" of extension
> "bdr" is already installed
> 2017-02-10 14:57:28 PST [7274-3] DEBUG:  per-db worker for node bdr
> (6385577151748866557,1,16391,) starting
> 2017-02-10 14:57:28 PST [7274-4] DEBUG:  init_replica init from remote
> host=ACTIVEIP  port=5432 dbname=MYDB connect_timeout=50 keepalives_idle=5
> keepalives_interval=1
> 2017-02-10 14:57:28 PST [7274-5] ERROR:  previous init failed, manual
> cleanup is required
> 2017-02-10 14:57:28 PST [7274-6] DETAIL:  Found bdr.bdr_nodes entry for
> bdr (6385577151748866557,1,16391,) with state=i in remote bdr.bdr_nodes
> 2017-02-10 14:57:28 PST [7274-7] HINT:  Remove all replication identifiers
> and slots corresponding to this node from the init target node then drop
> and 

[GENERAL] BDR problem rejoining node

2017-02-10 Thread Tanner Kerr
I have two databases being replicated across three nodes with bdr. The
third node filled up and crashed. I removed this node from the group
successfully, but now I'm having trouble rejoining it. I'm able to re-join
the one database no problem. However, trying to do a bdr join on the other
causes the two active nodes to slow down considerably, and refuses to join.
This database copies the data after a few minutes, however, the join fails
and bdr.bdr_nodes on the third node shows only itself in the group with
status 'i'. The active nodes don't show anything different in the
bdr.bdr_nodes. All they can see is a new pg_replication_slot for that node.
The logs say "previous init failed, manual cleanup is required", even though
there are no bdr_connections or broken bdr_nodes entries. Every time I've
tried, I've removed the pg_replication slots, the bdr_connections, and the
bdr_nodes (bdr.remove_bdr_from_local_node()). I've also dropped the cluster
and recreated it again each time with:

pg_dropcluster 9.4 main --stop
/usr/bin/pg_createcluster --start 9.4 -d /var/lib/postgresql/9.4/main main

The database failing to copy is small only around 60M. The one being copied
successfully, is around 1600M.

The join command I've been using:

select bdr.bdr_group_join(local_node_name := 'NODENAMEOFREBUILTNODE',
node_external_dsn := 'host=IPOFREBUILTNODE port=5432 dbname=MYDB
connect_timeout=50 keepalives_idle=5 keepalives_interval=1', join_using_dsn
:= 'host=ACTIVENODEIP  port=5432 dbname=MYDB connect_timeout=50
keepalives_idle=5 keepalives_interval=1');

I'm not sure why it works for one database and not the other.

The log from the failed join attempt is below:

2017-02-10 14:49:24 PST [6981-10] LOG:  Creating replica with:
/usr/lib/postgresql/9.4/bin/bdr_initial_load --snapshot 00C58720-1 --source
"host=ACTIVEIP  port=5432 dbname=MYDB connect_timeout=5
0 keepalives_idle=5 keepalives_interval=1" --target "host=REBUILDHOST
port=5432 dbname=MYDB connect_timeout=50 keepalives_idle=5
keepalives_interval=1" --tmp-directory "/tmp/postgres-bdr-00C58720-1
.6981", --pg-dump-path "/usr/lib/postgresql/9.4/bin/bdr_dump",
--pg-restore-path "/usr/lib/postgresql/9.4/bin/pg_restore"
Dumping remote database "connect_timeout=30 keepalives=1 keepalives_idle=20
keepalives_interval=20 keepalives_count=5   host=ACTIVEIP  port=5432
dbname=MYDB connect_timeout=50 keepalives_idle=5
keepalives_interval=1 fallback_application_name='bdr
(6385577151748866557,1,16391,): init_replica dump'" with 1 concurrent
workers to "/tmp/postgres-bdr-00C58720-1.6981"
Restoring dump to local DB "host=REBUILDHOST port=5432 dbname=MYDB
connect_timeout=50 keepalives_idle=5 keepalives_interval=1
fallback_application_name='bdr (6385577151748866557,1,16391,): init_replica
restore' options='-c bdr.do_not_replicate=on  -c
bdr.permit_unsafe_ddl_commands=on -c bdr.skip_ddl_replication=on -c
bdr.skip_ddl_locking=on -c session_replication_role=replica'" with 1
concurrent workers from "/tmp/postgres-bdr-00C58720-1.6981"
2017-02-10 14:57:16 PST [7271-1] postgres@MYDB NOTICE:  extension
"btree_gist" already exists, skipping
2017-02-10 14:57:16 PST [7271-2] postgres@MYDB NOTICE:  extension "bdr"
already exists, skipping
2017-02-10 14:57:16 PST [7271-3] postgres@MYDB NOTICE:  extension "plpgsql"
already exists, skipping
2017-02-10 14:57:16 PST [7271-4] postgres@MYDB NOTICE:  extension
"pgcrypto" already exists, skipping
2017-02-10 14:57:16 PST [7271-5] postgres@MYDB NOTICE:  extension
"uuid-ossp" already exists, skipping

*alters tables and builds indexes*

2017-02-10 14:57:23 PST [6981-10] DEBUG:  syncing bdr_nodes and
bdr_connections
2017-02-10 14:57:23 PST [6981-11] ERROR:  BEGIN or table locking on remote
failed:
2017-02-10 14:57:23 PST [6884-11] LOG:  worker process: bdr db: MYDB (PID
6981) exited with exit code 1
2017-02-10 14:57:28 PST [6884-12] LOG:  starting background worker process
"bdr db: MYDB"
2017-02-10 14:57:28 PST [7274-1] NOTICE:  version "1.0" of extension
"btree_gist" is already installed
2017-02-10 14:57:28 PST [7274-2] NOTICE:  version "1.0.1.0" of extension
"bdr" is already installed
2017-02-10 14:57:28 PST [7274-3] DEBUG:  per-db worker for node bdr
(6385577151748866557,1,16391,) starting
2017-02-10 14:57:28 PST [7274-4] DEBUG:  init_replica init from remote
host=ACTIVEIP  port=5432 dbname=MYDB connect_timeout=50 keepalives_idle=5
keepalives_interval=1
2017-02-10 14:57:28 PST [7274-5] ERROR:  previous init failed, manual
cleanup is required
2017-02-10 14:57:28 PST [7274-6] DETAIL:  Found bdr.bdr_nodes entry for bdr
(6385577151748866557,1,16391,) with state=i in remote bdr.bdr_nodes
2017-02-10 14:57:28 PST [7274-7] HINT:  Remove all replication identifiers
and slots corresponding to this node from the init target node then drop
and recreate this database and try again
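The HINT above asks for stale replication slots and identifiers to be removed
on the init target node. A hedged sketch of the slot half on stock 9.4 (the
LIKE pattern built from the rebuilt node's sysid is an assumption about BDR's
slot naming; inspect pg_replication_slots before dropping anything):

```sql
-- On the node the failed join targeted: list slots left over for the
-- rebuilt node (sysid 6385577151748866557 in the log above):
SELECT slot_name, active
FROM pg_replication_slots
WHERE slot_name LIKE '%6385577151748866557%';

-- Drop the inactive ones (pg_drop_replication_slot errors on active slots):
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE '%6385577151748866557%'
  AND NOT active;
```

The replication identifiers mentioned in the HINT live in a separate catalog
whose name differs between BDR's patched 9.4 tree and stock 9.5+ (replication
origins), so check the docs for the matching drop function on your version.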


Tanner


Re: [GENERAL] BDR: changing dsn on a running node

2016-10-11 Thread Craig Ringer
On 12 October 2016 at 00:55, Sylvain MARECHAL
 wrote:
> On 07/10/2016 at 23:54, Natan Abolafya wrote:
>
> Is it possible to change the dsn connection string of a node without leaving
> the group? I couldn’t find the related documentation unfortunately.
>
> We’re using BDR in a dynamic environment where the hostname/ip of a node may
> be changed any time. Leaving and rejoining the BDR group seems to be a
> solution but is a bit tedious and a slow process.

You can update the bdr.bdr_connections entry and then kill the apply workers.

Right now bdr.bdr_connections_changed() doesn't know to check for a
changed DSN. I'd welcome a patch to address that, since I probably
won't have time to get to it soon.

We should have a bdr.bdr_connection_set_dsn(...) function, really.
Again, a patch would be welcomed.
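A hedged sketch of that manual route (the DSN value and node sysid are
placeholders, and the application_name pattern used to spot apply workers is
an assumption; check pg_stat_activity to see how your BDR version labels
them):

```sql
-- Update the stored DSN for the node whose address changed:
UPDATE bdr.bdr_connections
   SET conn_dsn = 'host=new.host.example port=5432 dbname=mydb'
 WHERE conn_sysid = '6123456789012345678';  -- placeholder sysid

-- Then kill that node's apply workers so they reconnect with the new DSN:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name LIKE 'bdr%apply%';
```

Per Sylvain's report elsewhere in this thread, the change may not replicate
automatically, so it may need to be applied on every node.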

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR: changing dsn on a running node

2016-10-11 Thread Sylvain MARECHAL

On 07/10/2016 at 23:54, Natan Abolafya wrote:


Hi

Is it possible to change the dsn connection string of a node without 
leaving the group? I couldn’t find the related documentation 
unfortunately.


We’re using BDR in a dynamic environment where the hostname/ip of a 
node may be changed any time. Leaving and rejoining the BDR group 
seems to be a solution but is a bit tedious and a slow process.


Natan


Hi,

Just to say I did it with bdr 0.9.3 to change the password in the
connection string.
I needed to manually modify the bdr.bdr_nodes and bdr.bdr_connections
tables on all nodes, but in the end it worked.

What I found strange is that the changes were not replicated automatically.
(Note: there is a bdr.bdr_connections_changed() stored procedure; I am
not sure whether it is needed, as I did not use it.)


My 2 cents,
Sylvain



[GENERAL] BDR: changing dsn on a running node

2016-10-11 Thread Natan Abolafya
Hi

 

Is it possible to change the dsn connection string of a node without leaving
the group? I couldn't find the related documentation unfortunately.

We're using BDR in a dynamic environment where the hostname/ip of a node may
be changed any time. Leaving and rejoining the BDR group seems to be a
solution but is a bit tedious and a slow process.

 

Natan

 

 

 



Re: [GENERAL] BDR: Transactions with global lock

2016-09-04 Thread Craig Ringer
On 31 August 2016 at 22:38, Salvatore Tomaselli
 wrote:
> Hello,
>
> I have been looking around in the documentation and I didn't find anything, 
> so I wonder if there is support in bdr for having transactions that happen 
> while the global lock is acquired and get replicated everywhere before the 
> transaction ends.
>
> Is there a way to achieve this?

The purpose of the global DDL lock to to prevent that.

If you ALTER a table on one node, say by ALTERing the type of a
column, while you continue to run write transactions against that table
on other nodes, then when the other nodes' data replicates to the node
that performed the ALTER it won't make sense any more. What value should
go in the new column?

BDR 1.0 adds a weaker DDL lock mode that is used for types of DDL
where that is not a concern. It only blocks other concurrent DDL. But
for DDL that alters tables it's necessary to block concurrent DML.
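A minimal illustration of the hazard the global DDL lock prevents
(hypothetical table t, not from this thread):

```sql
-- Node A acquires the global DDL lock and runs:
ALTER TABLE t ADD COLUMN y text NOT NULL;

-- Without the lock, Node B could concurrently commit:
--   INSERT INTO t (id) VALUES (42);
-- When that row later replays on Node A there is no value for the new
-- NOT NULL column y, so the change cannot be applied sensibly.
```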

BTW, now that it's clear in-core logical replication is going in
another direction there's now a bdr-l...@2ndquadrant.com mailing list;
see https://groups.google.com/a/2ndquadrant.com/forum/#!forum/bdr-list
.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[GENERAL] BDR: Transactions with global lock

2016-09-02 Thread Salvatore Tomaselli
Hello,

I have been looking around in the documentation and I didn't find anything, so 
I wonder if there is support in bdr for having transactions that happen while 
the global lock is acquired and get replicated everywhere before the 
transaction ends.

Is there a way to achieve this?

Best



[GENERAL] BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres

2016-08-25 Thread Sylvain Marechal
Hello all,

After uninstalling a BDR node, it becomes impossible to join it again.
The following log appears in a loop:
<<<
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14620-1] LOG:  starting
background worker process "bdr (6287997142852742670,1,19526,)->bdr
(6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14621-1] LOG:  starting
background worker process "bdr (6287997142852742670,1,18365,)->bdr
(6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14622-1] LOG:  starting
background worker process "bdr db: mydb" #local4,support
2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:
 mismatch in worker state, got 0, expected 1 #error,local4,support
2016-08-25 10:17:08 [ll101] postgres error [6486]: [14622-1] FATAL:
 mismatch in worker state, got 0, expected 1 #error,local4,support

>>>
I cannot tell how this happens: before removing the node, one of the nodes
was in the 'catchup' state and the data lag between the 2 nodes was
growing; that is why I removed it (the idea was to clean the lagged node
and to reattach it again).


Questions:
* Is it possible to recover from this error without restarting postgres?
* In case a restart is necessary, how can I be sure the postgres restart
will work? My fear is that the restart fails, meaning the service will be
completely down.

Thanks and regards,
Sylvain


Re: [GENERAL] BDR Cluster vs DB Config

2016-08-13 Thread Martín Marqués
On 20/07/16 at 20:06, Jonathan Eastgate wrote:
> 
> I assume that once BDR is enabled on a database, any additional
> schemas added post-config are automatically included in BDR replication?

All DDLs (CREATE SCHEMA ...) will be replicated to the other nodes, but
if you are asking if the data from newly created tables will
automatically start replicating, that depends on how you initially
configured your replication sets.

The short answer is, the default is to replicate all tables, new ones
included.

But if you plan on doing selective replication by creating a replication
set and adding tables there, then it's on you to add or not the new
tables to that set.
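As a sketch of the selective-replication path described above, using the
replication-set functions shipped with BDR 0.9+ (the table and set names are
hypothetical):

```sql
-- Move a newly created table into a custom replication set
-- instead of the catch-all default set:
SELECT bdr.table_set_replication_sets('public.my_new_table',
                                      ARRAY['my_custom_set']);

-- Check which sets a table currently belongs to:
SELECT bdr.table_get_replication_sets('public.my_new_table');
```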

> And do you see any issues having potentially 200 schemas within the DB -
> performance or replication wise?

I don't see any possible performance degradation just for having the
tables spread in different schemata.

There will be one walsender for each downstream node, and one apply
worker for each upstream node, so you literally cut the number of
replication slots and replication connections on each node down to 1/200.
That's a huge performance win, particularly on 9.4.

Regards,

-- 
Martín Marqués   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR Cluster vs DB Config

2016-07-20 Thread Jonathan Eastgate
Thanks guys.

Very helpful - I was thinking we may need to look at moving to schemas
instead of individual db's.

I assume that once BDR is enabled on a database, any additional schemas
added post-config are automatically included in BDR replication?

And do you see any issues having potentially 200 schemas within the DB -
performance or replication wise?

Thanks in advance.




*Jonathan J. Eastgate*
Chief Technology Officer | simPRO Software Group
Ph: 1300 139 467 / +61 7 3147 8777



Keep up to date with simPRO at: http://simprogroup.com/blog
The contents of this email are subject to our email disclaimer
.


On Wed, Jul 20, 2016 at 5:18 PM, Craig Ringer  wrote:

> On 20 July 2016 at 13:22, Jonathan Eastgate 
> wrote:
>
>> Hi everyone.
>>
>> We've been testing BDR on and off for the last 2 years and are keen to
>> start looking at implementing it in production as it seems 0.93 has
>> resolved most of the issues we faced with it in the early days.
>>
>> However there is still one item that makes it a difficult proposition...
>>
>> DSN config per database.
>>
>> Is there any way to configure BDR on a cluster wide basis so that all
>> DB's on a cluster are replicated via BDR instead of having to configure a
>> connection for each DB we want to replicate?
>>
>
> No.
>
> Not only that, but if you're replicating lots of databases between
> PostgreSQL instances you're likely to start facing some performance
> problems around the sheer number of background workers required, the way
> WAL needs to be processed multiple times, etc.
>
> If you're using this for multi-tenancy or similar, see if you can isolate
> by schema not by database.
>
>
>> The problem we have is over 20 clusters with about 200 DB's per cluster
>> and growing constantly so this would make deploying BDR a painful process -
>> if we had to add a connection for each existing DB and then every new DB.
>>
>
> Yeah. That's going to cause you pain even aside from the management of it.
>
>
>
>> Is there a way around this or are there plans to make this type of config
>> available?
>>
>
> There are no plans to automate this configuration. BDR works at a database
> level, with the exception of bdr_init_copy bringing up all BDR-enabled
> databases on the join target node as a one-time operation at setup.
>
> Maybe once we eventually have some kind of answer for how to replicate
> instance-global DDL that affects shared catalogs, like database
> creation/drop, user creation/drop, etc, then it might make sense to extend
> BDR or its successor to do this. But not at the moment.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

-- 
--


Re: [GENERAL] BDR Cluster vs DB Config

2016-07-20 Thread Craig Ringer
On 20 July 2016 at 13:22, Jonathan Eastgate 
wrote:

> Hi everyone.
>
> We've been testing BDR on and off for the last 2 years and are keen to
> start looking at implementing it in production as it seems 0.93 has
> resolved most of the issues we faced with it in the early days.
>
> However there is still one item that makes it a difficult proposition...
>
> DSN config per database.
>
> Is there any way to configure BDR on a cluster wide basis so that all DB's
> on a cluster are replicated via BDR instead of having to configure a
> connection for each DB we want to replicate?
>

No.

Not only that, but if you're replicating lots of databases between
PostgreSQL instances you're likely to start facing some performance
problems around the sheer number of background workers required, the way
WAL needs to be processed multiple times, etc.

If you're using this for multi-tenancy or similar, see if you can isolate
by schema not by database.


> The problem we have is over 20 clusters with about 200 DB's per cluster
> and growing constantly so this would make deploying BDR a painful process -
> if we had to add a connection for each existing DB and then every new DB.
>

Yeah. That's going to cause you pain even aside from the management of it.



> Is there a way around this or are there plans to make this type of config
> available?
>

There are no plans to automate this configuration. BDR works at a database
level, with the exception of bdr_init_copy bringing up all BDR-enabled
databases on the join target node as a one-time operation at setup.

Maybe once we eventually have some kind of answer for how to replicate
instance-global DDL that affects shared catalogs, like database
creation/drop, user creation/drop, etc, then it might make sense to extend
BDR or its successor to do this. But not at the moment.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR Cluster vs DB Config

2016-07-19 Thread Alvaro Aguayo Garcia-Rada
Hello. BDR works on a per-database basis, so there's nothing like what you are 
looking for. However, if you initialize a BDR cluster with bdr_init_copy, you 
will get all existing databases added to replication. Then, as part of the 
creation of new databases, you can use the bdr_group_join function, which will 
allow you to add the new database (even empty) to the cluster.
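For a newly created (empty) database, the per-database join Alvaro describes
might look like this on the joining node (node names and DSNs are
placeholders; the parameter names follow the bdr.bdr_group_join call shown
elsewhere in this archive):

```sql
-- Run in the new database on the node being joined:
SELECT bdr.bdr_group_join(
    local_node_name   := 'node2',
    node_external_dsn := 'host=node2 port=5432 dbname=newdb',
    join_using_dsn    := 'host=node1 port=5432 dbname=newdb'
);
```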

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

Sent from my Sony Xperia™ smartphone

 Jonathan Eastgate wrote 

Hi everyone.

We've been testing BDR on and off for the last 2 years and are keen to
start looking at implementing it in production as it seems 0.93 has
resolved most of the issues we faced with it in the early days.

However there is still one item that makes it a difficult proposition...

DSN config per database.

Is there any way to configure BDR on a cluster wide basis so that all DB's
on a cluster are replicated via BDR instead of having to configure a
connection for each DB we want to replicate?

The problem we have is over 20 clusters with about 200 DB's per cluster and
growing constantly so this would make deploying BDR a painful process - if
we had to add a connection for each existing DB and then every new DB.

Is there a way around this or are there plans to make this type of config
available?

Thanks in advance.

*Jonathan J. Eastgate*
Chief Technology Officer | simPRO Software Group
Ph: 1300 139 467 / +61 7 3147 8777



Keep up to date with simPRO at: http://simprogroup.com/blog
The contents of this email are subject to our email disclaimer
.

-- 
--


[GENERAL] BDR Cluster vs DB Config

2016-07-19 Thread Jonathan Eastgate
Hi everyone.

We've been testing BDR on and off for the last 2 years and are keen to
start looking at implementing it in production as it seems 0.93 has
resolved most of the issues we faced with it in the early days.

However there is still one item that makes it a difficult proposition...

DSN config per database.

Is there any way to configure BDR on a cluster wide basis so that all DB's
on a cluster are replicated via BDR instead of having to configure a
connection for each DB we want to replicate?

The problem we have is over 20 clusters with about 200 DB's per cluster and
growing constantly so this would make deploying BDR a painful process - if
we had to add a connection for each existing DB and then every new DB.

Is there a way around this or are there plans to make this type of config
available?

Thanks in advance.

*Jonathan J. Eastgate*
Chief Technology Officer | simPRO Software Group
Ph: 1300 139 467 / +61 7 3147 8777



Keep up to date with simPRO at: http://simprogroup.com/blog
The contents of this email are subject to our email disclaimer
.

-- 
--


[GENERAL] BDR - Ignore already exists error during DDL replay

2016-07-12 Thread Nikhil
Hello,

In my 2-node BDR setup, if I make changes to the db schema I see the error
below, or after a few reboots I get into the inconsistent state below during
DDL replay. Is there any way to ignore an "already exists" error during DDL
replay?

The global lock for DDL replication is switched off in the configuration.


Best Regards,
Nikhil

GMT%ERROR:  trigger "truncate_trigger" for relation
"provisionsettings_server_boundary" already exists
<283032016-07-12 15:51:58 GMT%CONTEXT:  during DDL replay of ddl statement:
CREATE  TRIGGER truncate_trigger AFTER TRUNCATE
 ON public.provisionsettings_server_boundary   FOR EACH STATEMENT  EXECUTE
PROCEDURE bdr.queue_truncate()
<3662016-07-12 15:51:58 GMT%LOG:  worker process: bdr
(6306146678097036401,2,16386,)->bdr (6306138636064436461,1, (PID 28303)
exited with exit code 1


Re: [GENERAL] BDR

2016-06-13 Thread Alvaro Aguayo Garcia-Rada
Block-based replication is the replication mechanism postgres incorporates 
natively. It is, in brief, sending all the file-level changes to all the 
slaves, so the data folder is always the same. It's like having a replicated 
folder, not including logs and some other things.

The disadvantage of block-level replication, according to BDR, is that when 
you have an open transaction, all DML is written to disk, even when the 
transaction is not committed. In a large transaction, all the data would be 
sent to the slaves, even when, at the end, the transaction is rolled back. 
With BDR, the transaction is sent to the other masters only once it's 
committed.

Of course, this can be a problem in both cases, depending on your environment. 
With block-level replication, you can get unnecessary traffic for transactions 
that are finally rolled back (in contrast with BDR, which sends the whole 
transaction once it's committed); with BDR, you will get traffic peaks for 
some large transactions, as the whole transaction is sent once it's committed 
(in contrast to block-level replication, which sends changes as they are 
being executed). The latter can also cause some delay, depending on the 
connection between the servers.

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Rakesh Kumar" <rakeshkumar46...@gmail.com>
Cc: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Monday, 13 June, 2016 07:13:09
Subject: Re: [GENERAL] BDR

http://bdr-project.org/docs/next/logical-vs-physical.html

"It (BDR) has significant advantages - and some disadvantages - when
compared to PostgreSQL's older physical (block-based) streaming or
archive-based replication with warm or hot standby"

What exactly is block based? Changes are recorded in the redo log,
right? Does that mean that in streaming replication the standby applies
changes from the redo log at the block level of the data file? That would
also mean that, at any time, both primary and standby are exactly the
same, block by block.






Re: [GENERAL] BDR

2016-06-13 Thread Rakesh Kumar
http://bdr-project.org/docs/next/logical-vs-physical.html

"It (BDR) has significant advantages - and some disadvantages - when
compared to PostgreSQL's older physical (block-based) streaming or
archive-based replication with warm or hot standby"

What exactly is block based? Changes are recorded in the redo log,
right? Does that mean that in streaming replication the standby applies
changes from the redo log at the block level of the data file? That would
also mean that, at any time, both primary and standby are exactly the
same, block by block.




Re: [GENERAL] BDR

2016-06-12 Thread Craig Ringer
On 11 June 2016 at 02:12, Rakesh Kumar  wrote:

> Sorry if this question was asked before.  As I understand currently
> BDR does not support the replicating nodes to run different major
> versions, like
> 9.4 <-> 9.5.
>
> Is this in the works?
>

Not with BDR between 9.4 and 9.5, no, as there will not be a 9.5 version of
BDR. It'll be skipping straight to 9.6.

pglogical, a simplified and streamlined version of the logical replication
facilities used in BDR, can replicate from 9.4 to 9.5 (or to/from any other
combo of versions 9.4+).  It doesn't support multimaster or DDL replication
like BDR does, though.

You can also look into Londiste and Slony-I.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR

2016-06-12 Thread Craig Ringer
On 11 June 2016 at 02:26, David G. Johnston 
wrote:

> On Fri, Jun 10, 2016 at 2:12 PM, Rakesh Kumar 
> wrote:
>
>> Sorry if this question was asked before.  As I understand currently
>> BDR does not support the replicating nodes to run different major
>> versions, like
>> 9.4 <-> 9.5.
>>
>> Is this in the works?
>>
>
> This seems relevant...
>
> http://bdr-project.org/docs/stable/logical-vs-physical.html
>
> But your question seems vague, since BDR is a concept for which many
> implementations exist.
>

I think they're specifically referring to 2ndQ's BDR project here, rather
than bi-directional logical replication in general.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR

2016-06-10 Thread Rakesh Kumar
> This seems relevant...
>
> http://bdr-project.org/docs/stable/logical-vs-physical.html

thanks. very useful.




Re: [GENERAL] BDR

2016-06-10 Thread David G. Johnston
On Fri, Jun 10, 2016 at 2:12 PM, Rakesh Kumar 
wrote:

> Sorry if this question was asked before.  As I understand currently
> BDR does not support the replicating nodes to run different major
> versions, like
> 9.4 <-> 9.5.
>
> Is this in the works?
>

This seems relevant...

http://bdr-project.org/docs/stable/logical-vs-physical.html

But your question seems vague, since BDR is a concept for which many
implementations exist.

David J.


[GENERAL] BDR

2016-06-10 Thread Rakesh Kumar
Sorry if this question was asked before.  As I understand currently
BDR does not support the replicating nodes to run different major
versions, like
9.4 <-> 9.5.

Is this in the works?

thanks




[GENERAL] BDR Custom Conflict Handler

2016-06-07 Thread Elijah Bitting
There appears to be absolutely zero documentation regarding the expected 
format of a custom conflict handler function.

I'm attempting to define a custom conflict handler for a test BDR
cluster. When I tried to register a generic function with no parameters,
I got a somewhat useful error message which states that the function is
expected to have a signature like this:

select * from bdr.bdr_create_conflict_handler('test', 
'customconflictmgr', 
'test_conflict_trigger()', 'update_update');

ERROR: conflict handler function signature must be (IN tablerow, IN 
tablerow, IN text, IN regclass, IN bdr.bdr_conflict_type, OUT tablerow, 
OUT bdr.bdr_conflict_handler_action)
SQL state: XX000
Hint: OUT argument are not of the expected types.

So I then attempted to define a function with the specified signature:

select * from bdr.bdr_replicate_ddl_command('
CREATE OR REPLACE FUNCTION public.test_conflict_trigger(IN r1 tablerow, 
IN r2 tablerow, IN t1 text, IN rc1 regclass, IN c1 
bdr.bdr_conflict_type, OUT rout tablerow, OUT a1 
bdr.bdr_conflict_handler_action)
  RETURNS record AS
$BODY$
BEGIN
raise notice ''conflict detected in test table'';
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION public.test_conflict_trigger(IN r1 oid, IN r2 oid, IN t1 
text, IN rc1 regclass, IN c1 bdr.bdr_conflict_type, OUT rout text, OUT 
a1 bdr.bdr_conflict_handler_action)
  OWNER TO postgres;
');


and get this error:
ERROR:  type tablerow does not exist

What is the tablerow type? Does there exist anywhere an example of a
custom conflict handler function?






Re: [GENERAL] BDR to ignore table exists error

2016-05-31 Thread Nikhil
Thanks a lot Martin for your replies.

On Sun, May 29, 2016 at 11:50 PM, Martín Marqués 
wrote:

> Hi,
>
> El 29/05/16 a las 06:01, Nikhil escribió:
> >
> > Nik>> skip_ddl_locking is set to True in my configuration. As this was
> > preventing a single node from doing DDL operations (if one node is down,
> > the majority is not there for doing DDL on the available node).
>
> Well, you have to be prepared to deal with burn wounds if you play with
> fire. ;)
>
> If you decide to have skip_ddl_locking on you have to be sure all DDLs
> happen on one node, else you end up with conflicts like this.
>
> I suggest you find out why the table was already created on the
> downstream node (as a forensics task so you can avoid bumping into the
> same issue).
>
> > Nik>> The DDL used is:
> >
> > ERROR:  relation "af_npx_l3_16_146_10" already exists
> > <596802016-05-29 08:53:07 GMT%CONTEXT:  during DDL replay of ddl
> > statement: CREATE  TABLE  public.af_npx_license_l3_16_146_
> > 10 (CONSTRAINT af_npx_license_l3_16_146_10_rpt_sample_time_check CHECK
> > (((rpt_sample_time OPERATOR(pg_catalog.>=) 146417040
> > 0) AND (rpt_sample_time OPERATOR(pg_catalog.<=) 1464173999))) ) INHERITS
> > (public.af_npx_l3) WITH (oids=OFF)
> > <554132016-05-29 08:53:07 GMT%LOG:  worker process: bdr
> > (6288512113617339435,2,16384,)->bdr (6288505144157102317,1, (PID 59
> > 680) exited with exit code 1
>
> On the node where the CREATE TABLE is trying to get applied run this:
>
> BEGIN;
> SET LOCAL bdr.skip_ddl_replication TO 'on';
> SET LOCAL bdr.skip_ddl_locking TO 'on';
> DROP TABLE af_npx_l3_16_146_10;
> END;
>
> After that, the DDL that's stuck will get applied and the stream of
> changes will continue.
>
> By the looks of what you're dealing with, I wouldn't be surprised if the
> replication gets stuck again on another DDL conflict.
>
> I suggest rethinking the locking strategy, because this shows that
> there's something fishy there.
>
> Regards,
>
> --
> Martín Marqués    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] BDR to ignore table exists error

2016-05-29 Thread Martín Marqués
Hi,

El 29/05/16 a las 06:01, Nikhil escribió:
> 
> Nik>> skip_ddl_locking is set to True in my configuration. As this was
> preventing a single node from doing DDL operations (if one node is down,
> the majority is not there for doing DDL on the available node).

Well, you have to be prepared to deal with burn wounds if you play with
fire. ;)

If you decide to have skip_ddl_locking on you have to be sure all DDLs
happen on one node, else you end up with conflicts like this.

I suggest you find out why the table was already created on the
downstream node (as a forensics task so you can avoid bumping into the
same issue).

> Nik>> The DDL used is:
>
> ERROR:  relation "af_npx_l3_16_146_10" already exists
> <596802016-05-29 08:53:07 GMT%CONTEXT:  during DDL replay of ddl
> statement: CREATE  TABLE  public.af_npx_license_l3_16_146_
> 10 (CONSTRAINT af_npx_license_l3_16_146_10_rpt_sample_time_check CHECK
> (((rpt_sample_time OPERATOR(pg_catalog.>=) 146417040
> 0) AND (rpt_sample_time OPERATOR(pg_catalog.<=) 1464173999))) ) INHERITS
> (public.af_npx_l3) WITH (oids=OFF)
> <554132016-05-29 08:53:07 GMT%LOG:  worker process: bdr
> (6288512113617339435,2,16384,)->bdr (6288505144157102317,1, (PID 59
> 680) exited with exit code 1

On the node where the CREATE TABLE is trying to get applied run this:

BEGIN;
SET LOCAL bdr.skip_ddl_replication TO 'on';
SET LOCAL bdr.skip_ddl_locking TO 'on';
DROP TABLE af_npx_l3_16_146_10;
END;

After that, the DDL that's stuck will get applied and the stream of
changes will continue.

By the looks of what you're dealing with, I wouldn't be surprised if the
replication gets stuck again on another DDL conflict.

I suggest rethinking the locking strategy, because this shows that
there's something fishy there.

Regards,

-- 
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR to ignore table exists error

2016-05-29 Thread Nikhil
Please see my replies inline.

On Sat, May 28, 2016 at 8:08 PM, Martín Marqués 
wrote:

> El 28/05/16 a las 08:57, Nikhil escribió:
> > Once the node which was down is brought back the replication slot is not
> > turned active. The reason being replication slot is trying to create a
> > partition table which already exists. Because of this error replication
> > slot is stuck in inactive mode. Is there any way to ignore this error?
>
> BTW, how did you end up in such a state? Did you create the partition
> table skipping ddl locking?
>
Nik>> skip_ddl_locking is set to True in my configuration. As this was
preventing a single node from doing DDL operations (if one node is down,
the majority is not there for doing DDL on the available node).


>
> At this point the easiest way out is to drop the table on the node where
> it's trying to get applied with bdr_replication off or
> skip_ddl_replication on, so the table is dropped locally but not
> replicated, and the create table from the slot can be consumed.
>
> The other option is to consume the create table statement from the slot
> directly.
>

Nik>> The DDL used is:
ERROR:  relation "af_npx_l3_16_146_10" already exists
<596802016-05-29 08:53:07 GMT%CONTEXT:  during DDL replay of ddl statement:
CREATE  TABLE  public.af_npx_license_l3_16_146_
10 (CONSTRAINT af_npx_license_l3_16_146_10_rpt_sample_time_check CHECK
(((rpt_sample_time OPERATOR(pg_catalog.>=) 146417040
0) AND (rpt_sample_time OPERATOR(pg_catalog.<=) 1464173999))) ) INHERITS
(public.af_npx_l3) WITH (oids=OFF)
<554132016-05-29 08:53:07 GMT%LOG:  worker process: bdr
(6288512113617339435,2,16384,)->bdr (6288505144157102317,1, (PID 59
680) exited with exit code 1


>
> Be aware of the dangers of changing the default values for such
> parameters (bdr_replication, skip_ddl_replication, skip_ddl_locking);
> when needed they should be used with special care.

Nik>> The DDL replay is started once the node joins back to the BDR group. I
think it starts from an old checkpoint, causing the "partition already
exists" error. Is there any way to ignore replay errors, or to ignore DDL
errors during replay?
>
> Regards,
>
> --
> Martín Marqués    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] BDR to ignore table exists error

2016-05-28 Thread Martín Marqués
El 28/05/16 a las 08:57, Nikhil escribió:
> Once the node which was down is brought back the replication slot is not
> turned active. The reason being replication slot is trying to create a
> partition table which already exists. Because of this error replication
> slot is stuck in inactive mode. Is there any way to ignore this error?

BTW, how did you end up in such a state? Did you create the partition
table skipping ddl locking?

At this point the easiest way out is to drop the table on the node where
it's trying to get applied with bdr_replication off or
skip_ddl_replication on, so the table is dropped locally but not
replicated, and the create table from the slot can be consumed.

The other option is to consume the create table statement from the slot
directly.

Be aware of the dangers of changing the default values for such
parameters (bdr_replication, skip_ddl_replication, skip_ddl_locking);
when needed they should be used with special care.

Regards,

-- 
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR to ignore table exists error

2016-05-28 Thread Martín Marqués
El 28/05/16 a las 08:57, Nikhil escribió:
> Once the node which was down is brought back the replication slot is not
> turned active. The reason being replication slot is trying to create a
> partition table which already exists. Because of this error replication
> slot is stuck in inactive mode. Is there any way to ignore this error?

Could you provide the DDL that's run for the partitioning?

Could you provide logs from the other node? (the one where the partition
was created)

Regards,

-- 
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] BDR to ignore table exists error

2016-05-28 Thread Nikhil
Once the node which was down is brought back the replication slot is not
turned active. The reason being replication slot is trying to create a
partition table which already exists. Because of this error replication
slot is stuck in inactive mode. Is there any way to ignore this error?
On 28-May-2016 4:56 PM, "Martín Marqués"  wrote:

> El 27/05/16 a las 06:33, Nikhil escribió:
> > Hello,
> >
> >
> > I have a BDR setup with two nodes. If I bring one node down i am seeing
> that
> > the replication slot is becoming inactive with below error.
>
> If you take down one of the nodes of a BDR mesh, the replication slots
> from each of the upstream nodes it connects to will switch to inactive.
> That's how replication slots work.
>
> > <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
> >  streaming transactions committing after 0/111A91
> > 48, reading WAL from 0/110F03F8
> > <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
> >  logical decoding found consistent point at 0/110F03
> > F8
> > <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
> >  Logical decoding will begin using saved snapshot
> > .
> > <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
> >  unexpected EOF on standby connection
>
> Downstream node got disconnected, which is sensible given that you took
> that node down.
>
> > <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> > 0.437 ms
> > <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> > 0.462 ms
> > <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> > 0.096 ms
> > <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> > 0.101 ms
> > <3462016-05-25 23:58:20 GMT%LOG:  starting background worker process "bdr
> > (6288505144157102317,1,16384,)->bdr (628851211361
> > 7339435,2,"
>
> It seems you brought up postgres on the downstream node again and it
> connected to the replication slot.
>
> > <798462016-05-25 23:58:20 GMT%ERROR:  relation
> "af_npx_device_l3_16_149_10"
> > already exists
>
> I'm not sure what happened here. Does that relation exist?
>
> Run \d+ af_npx_device_l3_16_149_10 with psql on both nodes.
>
> Also, did replication resume? Check with the lag query from the BDR
> documentation.
>
> Regards,
>
> --
> Martín Marqués    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] BDR to ignore table exists error

2016-05-28 Thread Martín Marqués
El 27/05/16 a las 06:33, Nikhil escribió:
> Hello,
> 
> 
> I have a BDR setup with two nodes. If I bring one node down i am seeing that
> the replication slot is becoming inactive with below error.

If you take down one of the nodes of a BDR mesh, the replication slots
from each of the upstream nodes it connects to will switch to inactive.
That's how replication slots work.

> <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
>  streaming transactions committing after 0/111A91
> 48, reading WAL from 0/110F03F8
> <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
>  logical decoding found consistent point at 0/110F03
> F8
> <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
>  Logical decoding will begin using saved snapshot
> .
> <10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
>  unexpected EOF on standby connection

Downstream node got disconnected, which is sensible given that you took
that node down.

> <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> 0.437 ms
> <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> 0.462 ms
> <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> 0.096 ms
> <127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
> 0.101 ms
> <3462016-05-25 23:58:20 GMT%LOG:  starting background worker process "bdr
> (6288505144157102317,1,16384,)->bdr (628851211361
> 7339435,2,"

It seems you brought up postgres on the downstream node again and it
connected to the replication slot.

> <798462016-05-25 23:58:20 GMT%ERROR:  relation "af_npx_device_l3_16_149_10"
> already exists

I'm not sure what happened here. Does that relation exist?

Run \d+ af_npx_device_l3_16_149_10 with psql on both nodes.

Also, did replication resume? Check with the lag query from the BDR
documentation.

Regards,

-- 
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[GENERAL] BDR to ignore table exists error

2016-05-27 Thread Nikhil
Hello,


I have a BDR setup with two nodes. If I bring one node down i am seeing that
the replication slot is becoming inactive with below error.

<10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
 streaming transactions committing after 0/111A91
48, reading WAL from 0/110F03F8
<10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
 logical decoding found consistent point at 0/110F03
F8
<10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%DETAIL:
 Logical decoding will begin using saved snapshot
.
<10.106.43.152(43253)nsxpostgres798452016-05-25 23:58:19 GMTnsxdb%LOG:
 unexpected EOF on standby connection
<127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
0.437 ms
<127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
0.462 ms
<127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
0.096 ms
<127.0.0.1(31185)nsxroot792492016-05-25 23:58:19 GMTnsxdb%LOG:  duration:
0.101 ms
<3462016-05-25 23:58:20 GMT%LOG:  starting background worker process "bdr
(6288505144157102317,1,16384,)->bdr (628851211361
7339435,2,"
<798462016-05-25 23:58:20 GMT%ERROR:  relation "af_npx_device_l3_16_149_10"
already exists


Thanks & Regards,
Nikhil


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Craig Ringer
On 28 April 2016 at 02:47, Will McCormick  wrote:

> So if I wanted to extend a column from 100 characters to 255 characters is
> this permitted? The fact that I'm not making a change and BDR kicked me
> out makes me skeptical.
>

Off the top of my head I'm not sure and would need to test. There's no
specific logic in there for detecting such changes and permitting them, so
I suspect not.

If you're changing types in BDR you're expected to do it the long way. Add
a new col, update to copy the data, drop the old col and rename the new
col. Yes, that's ugly. We'd like to change it at some point. If you find
this particular problem annoying enough it'd be helpful if you could send a
patch for bdr_commandfilter.c to detect and permit changes that only affect
a column's typmod.
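
The "long way" described above might look like this. It is only a sketch:
the table and column names are taken from the earlier ALTER TABLE example
in this thread, and on a large, busy table you would want to batch the
UPDATE rather than rewrite every row in one transaction:

```sql
-- Hypothetical sketch of the add/copy/drop/rename workaround for a
-- column type change under BDR. Each step replicates as ordinary DDL/DML.
ALTER TABLE trap ADD COLUMN trap_timestamp_new timestamptz;
UPDATE trap SET trap_timestamp_new = trap_timestamp;
ALTER TABLE trap DROP COLUMN trap_timestamp;
ALTER TABLE trap RENAME COLUMN trap_timestamp_new TO trap_timestamp;
```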

In the specific case of varchar columns, personally I recommend just using
'text' and adding a CHECK constraint on length. That's what I do most
places, not just BDR.
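
That recommendation might be sketched as follows (table and constraint
names are hypothetical). Widening the limit later is then a constraint
swap, not a column type change, so it avoids the restricted
ALTER COLUMN ... TYPE path entirely:

```sql
-- text + CHECK instead of varchar(n):
CREATE TABLE example (
    description text,
    CONSTRAINT description_len CHECK (char_length(description) <= 100)
);

-- Later, to allow 255 characters: no ALTER COLUMN ... TYPE needed.
ALTER TABLE example DROP CONSTRAINT description_len;
ALTER TABLE example ADD CONSTRAINT description_len
    CHECK (char_length(description) <= 255);
```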

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Alvaro Aguayo Garcia-Rada
If you change the length of a character varying, it should work. I'm almost 
sure I have done that before on my BDR cluster.

It may work as long as it does not require a full table rewrite. I think a
length change for a character varying won't need a full table rewrite, as the
length is only a limit; the actual stored content is length-independent.

Also, even if it works for character varying, it may not work for other types,
especially numeric types.

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

Sent from my Sony Xperia™ smartphone

 Will McCormick wrote 

So if I wanted to extend a column from 100 characters to 255 characters is
this permitted? The fact that I'm not making a change and BDR kicked me
out makes me skeptical.

On Wed, Apr 27, 2016 at 11:56 AM, Craig Ringer 
wrote:

> On 27 April 2016 at 23:43, Alvaro Aguayo Garcia-Rada <
> aagu...@opensysperu.com> wrote:
>
>> Based on my experience, I can say BDR does not perform pre-DDL checks.
>> For example, if you try to CREATE TABLE with the name of an existing table,
>> BDR will acquire lock anyway, and then will fail when executing the DDL
>> statement on the first node, because the table already exists.
>>
>
> Correct, and it has to because otherwise it'd face a race condition where
> the table might be created between when it checked and when it tries to
> create it.
>
>> In your case, it's the same: BDR does not check (nor does it need to)
>> whether the DDL statement is actually required, as that's a DBA duty. Then
>> BDR executes the statement (and acquires locks), and fails because it would
>> require a full table rewrite, which, at the time, is not supported by BDR.
>>
>
> Yeah. This is more of a "we never thought anyone would want to do that and
> didn't much care" problem. In this case we could lock the table and then
> inspect it. In fact we really should be locking it to prevent races, but we
> rely on the global DDL lock mechanism for that right now. (That's not what
> it's for, just a side-effect that's kind of handy).
>
> Applications *must* be adapted to run on BDR. You can't just point it at a
> BDR-enabled DB and go "we'll be right". DDL is one thing, but multimaster
> async replication conflicts are rather more significant concerns. Also
> handling of the currently somewhat quirky global sequence support's habit
> of ERRORing if you go too fast, trying to keep your transaction sizes down,
> and not trusting row locking for mutual exclusion between nodes. You can't
> use LISTEN/NOTIFY between nodes either, or advisory locking, or
> pg_largeobject ... yeah. Apps require audit and usually require changes.
> Changing an expected error code will be the least of your worries.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Will McCormick
So if I wanted to extend a column from 100 characters to 255 characters is
this permitted? The fact that I'm not making a change and BDR kicked me
out makes me skeptical.

On Wed, Apr 27, 2016 at 11:56 AM, Craig Ringer 
wrote:

> On 27 April 2016 at 23:43, Alvaro Aguayo Garcia-Rada <
> aagu...@opensysperu.com> wrote:
>
>> Based on my experience, I can say BDR does not perform pre-DDL checks.
>> For example, if you try to CREATE TABLE with the name of an existing table,
>> BDR will acquire lock anyway, and then will fail when executing the DDL
>> statement on the first node, because the table already exists.
>>
>
> Correct, and it has to because otherwise it'd face a race condition where
> the table might be created between when it checked and when it tries to
> create it.
>
>> In your case, it's the same: BDR does not check (nor does it need to)
>> whether the DDL statement is actually required, as that's a DBA duty. Then
>> BDR executes the statement (and acquires locks), and fails because it would
>> require a full table rewrite, which, at the time, is not supported by BDR.
>>
>
> Yeah. This is more of a "we never thought anyone would want to do that and
> didn't much care" problem. In this case we could lock the table and then
> inspect it. In fact we really should be locking it to prevent races, but we
> rely on the global DDL lock mechanism for that right now. (That's not what
> it's for, just a side-effect that's kind of handy).
>
> Applications *must* be adapted to run on BDR. You can't just point it at a
> BDR-enabled DB and go "we'll be right". DDL is one thing, but multimaster
> async replication conflicts are rather more significant concerns. Also
> handling of the currently somewhat quirky global sequence support's habit
> of ERRORing if you go too fast, trying to keep your transaction sizes down,
> and not trusting row locking for mutual exclusion between nodes. You can't
> use LISTEN/NOTIFY between nodes either, or advisory locking, or
> pg_largeobject ... yeah. Apps require audit and usually require changes.
> Changing an expected error code will be the least of your worries.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Craig Ringer
On 27 April 2016 at 23:43, Alvaro Aguayo Garcia-Rada <
aagu...@opensysperu.com> wrote:

> Based on my experience, I can say BDR does not perform pre-DDL checks.
> For example, if you try to CREATE TABLE with the name of an existing table,
> BDR will acquire lock anyway, and then will fail when executing the DDL
> statement on the first node, because the table already exists.
>

Correct, and it has to because otherwise it'd face a race condition where
the table might be created between when it checked and when it tries to
create it.

> In your case, it's the same: BDR does not check (nor does it need to)
> whether the DDL statement is actually required, as that's a DBA duty. Then
> BDR executes the statement (and acquires locks), and fails because it would
> require a full table rewrite, which, at the time, is not supported by BDR.
>

Yeah. This is more of a "we never thought anyone would want to do that and
didn't much care" problem. In this case we could lock the table and then
inspect it. In fact we really should be locking it to prevent races, but we
rely on the global DDL lock mechanism for that right now. (That's not what
it's for, just a side-effect that's kind of handy).

Applications *must* be adapted to run on BDR. You can't just point it at a
BDR-enabled DB and go "we'll be right". DDL is one thing, but multimaster
async replication conflicts are rather more significant concerns. Also
handling of the currently somewhat quirky global sequence support's habit
of ERRORing if you go too fast, trying to keep your transaction sizes down,
and not trusting row locking for mutual exclusion between nodes. You can't
use LISTEN/NOTIFY between nodes either, or advisory locking, or
pg_largeobject ... yeah. Apps require audit and usually require changes.
Changing an expected error code will be the least of your worries.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Will McCormick
But this is the exact column definition that exists on the table when I
execute the statement 

It's like it does not check the pre-existing state of the column. Our code
is expecting a "column already exists" error, but this error pre-empts that.

On Wed, Apr 27, 2016 at 10:21 AM, Adrian Klaver 
wrote:

> On 04/27/2016 07:13 AM, Will McCormick wrote:
>
>> Why does this not work? From what I read only default values should
>> cause issue. I'm on release 9.4.4:
>>
>>
>> bms=# ALTER TABLE trap ALTER COLUMN trap_timestamp TYPE TIMESTAMP WITH
>> TIME ZONE;
>> ERROR:  ALTER TABLE ... ALTER COLUMN TYPE may only affect UNLOGGED or
>> TEMPORARY
>> tables when BDR is active; trap is a regular table
>>
>
> http://bdr-project.org/docs/stable/ddl-replication-statements.html
>
> 8.2.3. DDL statements with restrictions
>
> ALTER TABLE
>
> Generally ALTER TABLE commands are allowed. There are, however,
> several sub-commands that are not supported, mainly those that perform a
> full-table re-write.
>
> ...
>
> ALTER COLUMN ... TYPE - changing a column's type is not supported. Changing
> a column in a way that doesn't require table rewrites may be supported at
> some point.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] BDR Alter table failing

2016-04-27 Thread Adrian Klaver

On 04/27/2016 07:13 AM, Will McCormick wrote:

Why does this not work? From what I read only default values should
cause issue. I'm on release 9.4.4:


bms=# ALTER TABLE trap ALTER COLUMN trap_timestamp TYPE TIMESTAMP WITH
TIME ZONE;
ERROR:  ALTER TABLE ... ALTER COLUMN TYPE may only affect UNLOGGED or
TEMPORARY
tables when BDR is active; trap is a regular table


http://bdr-project.org/docs/stable/ddl-replication-statements.html

8.2.3. DDL statements with restrictions

ALTER TABLE

Generally ALTER TABLE commands are allowed. There are, however,
several sub-commands that are not supported, mainly those that perform a
full-table re-write.


...

ALTER COLUMN ... TYPE - changing a column's type is not supported.
Changing a column in a way that doesn't require table rewrites may be
supported at some point.


--
Adrian Klaver
adrian.kla...@aklaver.com




[GENERAL] BDR Alter Table

2016-04-27 Thread Will McCormick
Why does this not work:


Re: [GENERAL] BDR Alter Table

2016-04-27 Thread Will McCormick
Hi All,

And sorry about that damn thumb pad! Premature send!

On Wed, Apr 27, 2016 at 10:15 AM, Adrian Klaver 
wrote:

> On 04/27/2016 07:11 AM, Will McCormick wrote:
>
>> Why does this not work:
>>
>>
> Because it is NULL :)?
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] BDR Alter Table

2016-04-27 Thread Adrian Klaver

On 04/27/2016 07:11 AM, Will McCormick wrote:

Why does this not work:



Because it is NULL :)?

--
Adrian Klaver
adrian.kla...@aklaver.com




[GENERAL] BDR Alter table failing

2016-04-27 Thread Will McCormick
Why does this not work? From what I read only default values should cause
issue. I'm on release 9.4.4:


bms=# ALTER TABLE trap ALTER COLUMN trap_timestamp TYPE TIMESTAMP WITH TIME
ZONE;
ERROR:  ALTER TABLE ... ALTER COLUMN TYPE may only affect UNLOGGED or
TEMPORARY
tables when BDR is active; trap is a regular table


Re: [GENERAL] BDR replication slots

2016-04-19 Thread Nikhil
On interface down:
--
<10.102.31.213(27599)postgres13082016-04-19 06:31:36
GMTprocess_journal%LOG:  terminating walsender process due to replication
timeout


Once interface is brought back

 425906 <12692016-04-19 08:32:58 GMT%LOG:  starting background worker
process "bdr (6275149074578269365,2,16386,)->bdr (6275135922714263763,1,"
 425907 <597732016-04-19 08:32:58 GMT%ERROR:  relation "mygroup" already
exists
 425908 <597732016-04-19 08:32:58 GMT%CONTEXT:  during DDL replay of ddl
statement: CREATE  TABLE  public.mygroup (id pg_catalog."varchar"(14)  NOT
NULL COLLATE pg_catalog."default", name pg_catalog."varchar"(100)   COLLATE
pg_catalog."default", device_type pg_catalog."varchar"(30)   COLLATE
pg_catalog."default", platform_type pg_catalog."varchar"(30)   COLLATE
pg_catalog."default",
CONSTRAINT mygroup_pkey PRIMARY KEY (id) )  WITH (oids=OFF)
 425909 <12692016-04-19 08:32:58 GMT%LOG:  worker process: bdr
(6275149074578269365,2,16386,)->bdr (6275135922714263763,1,(PID 59773)
exited with exit code 1
 425910 <10.102.31.213(13467)postgres597742016-04-19 08:32:59
GMTprocess_journal%LOG:  starting logical decoding for slot
"bdr_16386_6275135922714263763_1_16386__"
 425911 <10.102.31.213(13467)postgres597742016-04-19 08:32:59
GMTprocess_journal%DETAIL:  streaming transactions committing after
0/1014CEE8, reading WAL from 0/1014A920
 425912 <10.102.31.213(13467)postgres597742016-04-19 08:32:59
GMTprocess_journal%LOG:  logical decoding found consistent point at
0/1014A920
 425913 <10.102.31.213(13467)postgres597742016-04-19 08:32:59
GMTprocess_journal%DETAIL:  There are no running transactions.
 425914 <10.102.31.213(13467)postgres597742016-04-19 08:32:59
GMTprocess_journal%LOG:  unexpected EOF on standby connection


On Tue, Apr 19, 2016 at 10:29 AM, Alvaro Aguayo Garcia-Rada <
aagu...@opensysperu.com> wrote:

> Hello,
>
> What do you see on each node's log after enabling interfaces?
>
> Regards,
>
> Alvaro Aguayo
> Jefe de Operaciones
> Open Comb Systems E.I.R.L.
>
> Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC:
> (+51) 954183248
> Website: www.ocs.pe
>
> Sent from my Sony Xperia™ smartphone
>
>
>  Nikhil wrote 
>
>
> Hello,
>
> I have a 2 node BDR group and replication is happening properly. if i
> bring down one of the node's interface, after sometime the replication
> slots are becoming inactive (pg_replication_slots view). Then if i bring
> back interface slots are not turning active automatically and replication
> stops. Any idea why automatically its not re-established ?
>
> Best Regards,
> Nikhil
>


Re: [GENERAL] BDR replication slots

2016-04-19 Thread Sylvain Marechal
2016-04-19 6:51 GMT+02:00 Nikhil :

> Hello,
>
> I have a 2 node BDR group and replication is happening properly. if i
> bring down one of the node's interface, after sometime the replication
> slots are becoming inactive (pg_replication_slots view). Then if i bring
> back interface slots are not turning active automatically and replication
> stops. Any idea why automatically its not re-established ?
>

Maybe postgres does not detect that the link is down.
You could try modifying the connection strings so that postgres detects the
problem earlier, as described in this post:
https://github.com/2ndQuadrant/bdr/issues/173 :

<<<
[...]
Example of dsn_connection field:
host=192.168.1.140 port=5432 dbname=safewalk-server connect_timeout=10
keepalives_idle=5 keepalives_interval=1
>>>
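Applying that suggestion means putting the keepalive options into the DSNs that BDR uses. A sketch, assuming BDR 0.9-era catalogs (bdr.bdr_connections with a conn_dsn column); host, port, and dbname are placeholders, and the walsender connections may need to be restarted before the new DSN takes effect:

```sql
-- Add aggressive TCP keepalives so a dead link is noticed quickly.
UPDATE bdr.bdr_connections
SET conn_dsn = 'host=192.168.1.140 port=5432 dbname=mydb connect_timeout=10 keepalives_idle=5 keepalives_interval=1'
WHERE conn_dsn LIKE '%192.168.1.140%';
```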


Hope this helps,

Sylvain


Re: [GENERAL] BDR replication slots

2016-04-18 Thread Alvaro Aguayo Garcia-Rada
Hello,

What do you see on each node's log after enabling interfaces?

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

Sent from my Sony Xperia™ smartphone

 Nikhil wrote 

Hello,

I have a 2 node BDR group and replication is happening properly. if i bring
down one of the node's interface, after sometime the replication slots are
becoming inactive (pg_replication_slots view). Then if i bring back
interface slots are not turning active automatically and replication stops.
Any idea why automatically its not re-established ?

Best Regards,
Nikhil


[GENERAL] BDR replication slots

2016-04-18 Thread Nikhil
Hello,

I have a 2-node BDR group and replication is happening properly. If I bring
down one of the nodes' interfaces, after some time the replication slots become
inactive (pg_replication_slots view). Then if I bring the interface back, the
slots do not turn active automatically and replication stops. Any idea why it
is not re-established automatically?

Best Regards,
Nikhil
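As a diagnostic starting point, the slot state can be inspected on each node. A sketch (column set as in PostgreSQL 9.4, on which BDR 0.9.x is based; adjust for your version):

```sql
-- List BDR logical slots and whether a walsender currently holds them.
-- An inactive slot with a stalled restart_lsn means the peer is not
-- consuming WAL, and WAL will accumulate until it reconnects.
SELECT slot_name, database, active, restart_lsn
FROM pg_replication_slots
WHERE plugin = 'bdr';
```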


Re: [GENERAL] bdr replication

2016-03-31 Thread Alvaro Aguayo Garcia-Rada
I'd rather use something like this:

bdr_dump -N bdr -h localhost -U postgres -W mydb > /backup.sql

That's for each database. To restore it, you must first create the empty 
database:

CREATE DATABASE mydb WITH ENCODING 'UTF-8' TEMPLATE template0;

(change encoding according to your needs. Run "psql --list" on your current 
system to see current encoding for each db)
Then, restore its contents:

cat /backup.sql | psql mydb

During restore, you may see some errors due to the backup containing the bdr 
truncate trigger on each table; this is normal at this point, so just ignore them.

You may need to backup your postgres users:

pg_dumpall -g > /globals.sql

And restore it BEFORE any database with

cat /globals.sql | psql

Backing up and restoring globals is important when restoring databases, 
mostly to preserve permissions on tables/sequences during each database restore.
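Put together, the sequence described above might look like this. A sketch: hosts, paths, and database names are placeholders, and bdr_dump comes from the BDR packages:

```shell
# Back up globals (roles, grants) and the database itself.
pg_dumpall -g > /backup/globals.sql
bdr_dump -N bdr -h localhost -U postgres mydb > /backup/mydb.sql

# On the target server:
psql -f /backup/globals.sql postgres                 # globals BEFORE databases
psql -c "CREATE DATABASE mydb ENCODING 'UTF-8' TEMPLATE template0" postgres
psql -f /backup/mydb.sql mydb                        # bdr trigger errors are expected
```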

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

Sent from my Sony Xperia™ smartphone

 Slava Bendersky wrote 

Hello Alvaro, 
That sounds like a good plan. I will try tomorrow, because I need a maintenance 
window. 

To do back up I can use something like 

bdr_dump -Fp -h localhost -U postgres mydb -f /tmp/mydb.data --data-only 
--exclude-table='bdr_*' 

volga629 


From: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com> 
Sent: Thursday, 31 March, 2016 02:41:17 
Subject: Re: [GENERAL] bdr replication 

We are overlaping mails :P 

What I don't understand is the need of a shared storage in this case. It would 
be a lot better to have the data folder inside each server virtual disk to 
avoid troubles with the shared storage; I really see no reason for such 
configuration here. 

Now, getting into the solution rather than in the problem. I suggest you to do 
the following: 

1. First of all, backup your data folder for both nodes. Just in case. Make 
backup with postgres stopped to avoid problems. 
2. Choose one node which will be considered up-to-date(Let's say "Node A") 
3. Dump your database(s) on that node, excluding the bdr schema on each db. 
Dump also your globals 
4. Wipe or rename your data folder on each node, and then initialize each node. 
Do not configure BDR yet. 
5. Restore your data(backed up at step 3) on Node A 
6. Configure BDR on Node A 
7. Add Node B to the replication group, using "bdr_init_copy" to make it 
replicate from Node A. 

That should do the trick. There is another possibility: Drop the replication 
configuration no Node A, and then start from scratch(Steps 1, 6 & 7). However, 
this can be troublesome, as it involves editing bdr & postgres schemas, and 
that can lead you to problems on the future, so I'd recommend you the "long" 
way. 

Feel free to ask any question regarding this issue. Looks serious 

Alvaro Aguayo 
Jefe de Operaciones 
Open Comb Systems E.I.R.L. 

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51) 
954183248 
Website: www.ocs.pe 

- Original Message - 
From: "Slava Bendersky" <volga...@skillsearch.ca> 
To: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com> 
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com> 
Sent: Thursday, 31 March, 2016 12:28:09 AM 
Subject: Re: [GENERAL] bdr replication 



Hello Alvaro, 
We running BDR where each PostgreSQL vm is a master and shared storage only on 
hypervisor level. All vm leave with own virtual disk. Right now we have 2 
server for redundancy which have shared network between them. Issue that BDR is 
working right now see log below. And my question how to restore BDR replication 
correctly. 


volga629 

- Original Message - 

From: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com> 
Sent: Thursday, 31 March, 2016 02:19:42 
Subject: Re: [GENERAL] bdr replication 


What's the purpose of such configuration? Doesn't makes sense for me. The only 
reasonable case where you would want to put the data folder on a shared storage 
is for usage with warm standby, where you can have a secondary server which 
serves as a read-only replica, and can be rpomoted to master on master failure. 

If you intend high availability, you'd rather try it at VM level, like vmware 
HA or Proxmox HA. That will make your VM run on any hypervisor in the group 
disregarding the failure of some node. 

Regards, 

Alvaro Aguayo 
Jefe de Ope

Re: [GENERAL] bdr replication

2016-03-30 Thread John R Pierce

On 3/30/2016 10:41 PM, Alvaro Aguayo Garcia-Rada wrote:

What I don't understand is the need of a shared storage in this case. It would 
be a lot better to have the data folder inside each server virtual disk to 
avoid troubles with the shared storage; I really see no reason for such 
configuration here.


I believe what the original poster is trying to say, his physical 
servers have a shared storage pool, but each VM has its own private 
virtual disks allocated out of this shared pool, and thats what his BDR 
database servers are using.


this should be fine, so its a BDR problem. how to diagnose and 
repair BDR problems.   I have no experience running BDR, so will step 
out of this thread (please delete me from CC's).




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] bdr replication

2016-03-30 Thread Alvaro Aguayo Garcia-Rada
We are overlapping mails :P

What I don't understand is the need of a shared storage in this case. It would 
be a lot better to have the data folder inside each server virtual disk to 
avoid troubles with the shared storage; I really see no reason for such 
configuration here.

Now, getting into the solution rather than the problem, I suggest you do the 
following:

1. First of all, backup your data folder for both nodes. Just in case. Make 
backup with postgres stopped to avoid problems.
2. Choose one node which will be considered up-to-date (let's say "Node A")
3. Dump your database(s) on that node, excluding the bdr schema on each db. 
Dump also your globals
4. Wipe or rename your data folder on each node, and then initialize each node. 
Do not configure BDR yet.
5. Restore your data(backed up at step 3) on Node A
6. Configure BDR on Node A
7. Add Node B to the replication group, using "bdr_init_copy" to make it 
replicate from Node A.

That should do the trick. There is another possibility: drop the replication 
configuration on Node A, and then start from scratch (steps 1, 6 & 7). However, 
this can be troublesome, as it involves editing the bdr & postgres schemas, and 
that can lead to problems in the future, so I'd recommend the "long" 
way.

Feel free to ask any question regarding this issue. Looks serious
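The steps above could be sketched roughly as follows; all paths and names are placeholders:

```shell
# 1. With postgres stopped, back up the data folder on BOTH nodes.
pg_ctl -D /var/lib/pgsql/data stop
cp -a /var/lib/pgsql/data /var/lib/pgsql/data.bak
# 3. On Node A: dump each database without the bdr schema, plus globals.
bdr_dump -N bdr -U postgres mydb > /backup/mydb.sql
pg_dumpall -g > /backup/globals.sql
# 4. Re-initialize each node from scratch.
mv /var/lib/pgsql/data /var/lib/pgsql/data.old
initdb -D /var/lib/pgsql/data
# 5-6. Restore globals and data on Node A, then run bdr.bdr_group_create()
#      there.
# 7. Clone Node B with bdr_init_copy (see its --help for the exact flags
#    in your BDR version).
```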

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Slava Bendersky" <volga...@skillsearch.ca>
To: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com>
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com>
Sent: Thursday, 31 March, 2016 12:28:09 AM
Subject: Re: [GENERAL] bdr replication



Hello Alvaro, 
We running BDR where each PostgreSQL vm is a master and shared storage only on 
hypervisor level. All vm leave with own virtual disk. Right now we have 2 
server for redundancy which have shared network between them. Issue that BDR is 
working right now see log below. And my question how to restore BDR replication 
correctly. 


volga629 

- Original Message -

From: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com> 
Sent: Thursday, 31 March, 2016 02:19:42 
Subject: Re: [GENERAL] bdr replication 


What's the purpose of such configuration? Doesn't makes sense for me. The only 
reasonable case where you would want to put the data folder on a shared storage 
is for usage with warm standby, where you can have a secondary server which 
serves as a read-only replica, and can be rpomoted to master on master failure. 

If you intend high availability, you'd rather try it at VM level, like vmware 
HA or Proxmox HA. That will make your VM run on any hypervisor in the group 
disregarding the failure of some node. 

Regards, 

Alvaro Aguayo 
Jefe de Operaciones 
Open Comb Systems E.I.R.L. 

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51) 
954183248 
Website: www.ocs.pe 

- Original Message - 
From: "Slava Bendersky" <volga...@skillsearch.ca> 
To: "John R Pierce" <pie...@hogranch.com> 
Cc: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Wednesday, 30 March, 2016 10:57:13 PM 
Subject: Re: [GENERAL] bdr replication 



In my case only virtual hosts are use share storage (feed from glusterfs), but 
actual virtual machines have own separate disks and all PostgreSQL run on 
separate data directories. 


volga629 

- Original Message - 

From: "John R Pierce" <pie...@hogranch.com> 
To: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:34:55 
Subject: Re: [GENERAL] bdr replication 


On 3/30/2016 8:09 PM, Slava Bendersky wrote: 
> Is any share storage technology recommended for PostgreSQL in virtual 
> environment ? 
> Ok what I will do is going take backups, shutdown both virtual servers 
> and place all vm use local disk on server only. 


'share storage technology'... um. thats such a vague term, it can 
mean lots of things. 

each postgres instance needs its own data store, two instances can NOT 
share the same files under any condition. these data stores can be 
on SAN or NAS, as long the storage is reliable about committed random 
writes, and as long as two different servers aren't using the SAME 
directory for their data stores. 

-- 
john r pierce, recycling bits in santa cruz 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) 
To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-general 




Re: [GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
Hello Alvaro, 
We are running BDR where each PostgreSQL VM is a master and shared storage exists 
only at the hypervisor level. Each VM lives on its own virtual disk. Right now we 
have 2 servers for redundancy which have a shared network between them. The issue 
is that BDR is not working right now; see the log below. My question is how to 
restore BDR replication correctly. 

volga629 


From: "Alvaro Aguayo Garcia-Rada" <aagu...@opensysperu.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org>, "John R Pierce" 
<pie...@hogranch.com> 
Sent: Thursday, 31 March, 2016 02:19:42 
Subject: Re: [GENERAL] bdr replication 

What's the purpose of such configuration? Doesn't makes sense for me. The only 
reasonable case where you would want to put the data folder on a shared storage 
is for usage with warm standby, where you can have a secondary server which 
serves as a read-only replica, and can be rpomoted to master on master failure. 

If you intend high availability, you'd rather try it at VM level, like vmware 
HA or Proxmox HA. That will make your VM run on any hypervisor in the group 
disregarding the failure of some node. 

Regards, 

Alvaro Aguayo 
Jefe de Operaciones 
Open Comb Systems E.I.R.L. 

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51) 
954183248 
Website: www.ocs.pe 

- Original Message - 
From: "Slava Bendersky" <volga...@skillsearch.ca> 
To: "John R Pierce" <pie...@hogranch.com> 
Cc: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Wednesday, 30 March, 2016 10:57:13 PM 
Subject: Re: [GENERAL] bdr replication 



In my case only virtual hosts are use share storage (feed from glusterfs), but 
actual virtual machines have own separate disks and all PostgreSQL run on 
separate data directories. 


volga629 

- Original Message - 

From: "John R Pierce" <pie...@hogranch.com> 
To: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:34:55 
Subject: Re: [GENERAL] bdr replication 


On 3/30/2016 8:09 PM, Slava Bendersky wrote: 
> Is any share storage technology recommended for PostgreSQL in virtual 
> environment ? 
> Ok what I will do is going take backups, shutdown both virtual servers 
> and place all vm use local disk on server only. 


'share storage technology'... um. thats such a vague term, it can 
mean lots of things. 

each postgres instance needs its own data store, two instances can NOT 
share the same files under any condition. these data stores can be 
on SAN or NAS, as long the storage is reliable about committed random 
writes, and as long as two different servers aren't using the SAME 
directory for their data stores. 

-- 
john r pierce, recycling bits in santa cruz 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) 
To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-general 


Re: [GENERAL] bdr replication

2016-03-30 Thread Alvaro Aguayo Garcia-Rada
What's the purpose of such a configuration? It doesn't make sense to me. The only 
reasonable case where you would want to put the data folder on shared storage 
is for use with a warm standby, where you can have a secondary server which 
serves as a read-only replica and can be promoted to master on master failure.

If you intend high availability, you'd do better to try it at the VM level, with 
something like VMware HA or Proxmox HA. That will let your VM run on any 
hypervisor in the group regardless of the failure of some node.

Regards,

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 
954183248
Website: www.ocs.pe

- Original Message -
From: "Slava Bendersky" <volga...@skillsearch.ca>
To: "John R Pierce" <pie...@hogranch.com>
Cc: "pgsql-general" <pgsql-general@postgresql.org>
Sent: Wednesday, 30 March, 2016 10:57:13 PM
Subject: Re: [GENERAL] bdr replication



In my case only virtual hosts are use share storage (feed from glusterfs), but 
actual virtual machines have own separate disks and all PostgreSQL run on 
separate data directories. 


volga629 

- Original Message -

From: "John R Pierce" <pie...@hogranch.com> 
To: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:34:55 
Subject: Re: [GENERAL] bdr replication 


On 3/30/2016 8:09 PM, Slava Bendersky wrote: 
> Is any share storage technology recommended for PostgreSQL in virtual 
> environment ? 
> Ok what I will do is going take backups, shutdown both virtual servers 
> and place all vm use local disk on server only. 


'share storage technology'... um. thats such a vague term, it can 
mean lots of things. 

each postgres instance needs its own data store, two instances can NOT 
share the same files under any condition. these data stores can be 
on SAN or NAS, as long the storage is reliable about committed random 
writes, and as long as two different servers aren't using the SAME 
directory for their data stores. 

-- 
john r pierce, recycling bits in santa cruz 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) 
To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-general 




Re: [GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
Hello Everyone, 
Is recovery from my situation possible at all? I was looking for a tool that 
might help and found only bdr_init_copy. Is it possible to initialize the second 
node again? Also, is it a good idea to enable WAL archiving with BDR? 

volga629 


From: "volga629" <volga...@skillsearch.ca> 
To: "John R Pierce" <pie...@hogranch.com> 
Cc: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:57:13 
Subject: Re: [GENERAL] bdr replication 

In my case only virtual hosts are use share storage (feed from glusterfs), but 
actual virtual machines have own separate disks and all PostgreSQL run on 
separate data directories. 

volga629 


From: "John R Pierce" <pie...@hogranch.com> 
To: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:34:55 
Subject: Re: [GENERAL] bdr replication 

On 3/30/2016 8:09 PM, Slava Bendersky wrote: 
> Is any share storage technology recommended for PostgreSQL in virtual 
> environment ? 
> Ok what I will do is going take backups, shutdown both virtual servers 
> and place all vm use local disk on server only. 


'share storage technology'... um. thats such a vague term, it can 
mean lots of things. 

each postgres instance needs its own data store, two instances can NOT 
share the same files under any condition. these data stores can be 
on SAN or NAS, as long the storage is reliable about committed random 
writes, and as long as two different servers aren't using the SAME 
directory for their data stores. 

-- 
john r pierce, recycling bits in santa cruz 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) 
To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-general 



Re: [GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
In my case only the virtual hosts use shared storage (fed from glusterfs), but 
the actual virtual machines have their own separate disks and all PostgreSQL 
instances run on separate data directories. 

volga629 


From: "John R Pierce" <pie...@hogranch.com> 
To: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Thursday, 31 March, 2016 00:34:55 
Subject: Re: [GENERAL] bdr replication 

On 3/30/2016 8:09 PM, Slava Bendersky wrote: 
> Is any share storage technology recommended for PostgreSQL in virtual 
> environment ? 
> Ok what I will do is going take backups, shutdown both virtual servers 
> and place all vm use local disk on server only. 


'share storage technology'... um. thats such a vague term, it can 
mean lots of things. 

each postgres instance needs its own data store, two instances can NOT 
share the same files under any condition. these data stores can be 
on SAN or NAS, as long the storage is reliable about committed random 
writes, and as long as two different servers aren't using the SAME 
directory for their data stores. 

-- 
john r pierce, recycling bits in santa cruz 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) 
To make changes to your subscription: 
http://www.postgresql.org/mailpref/pgsql-general 


Re: [GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
Hello Craig, 
Is any shared storage technology recommended for PostgreSQL in a virtual 
environment? 
OK, what I will do is take backups, shut down both virtual servers, and make 
every VM use a local disk on the server only. 
volga629 


From: "Craig Ringer" <cr...@2ndquadrant.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Wednesday, 30 March, 2016 23:57:28 
Subject: Re: [GENERAL] bdr replication 

On 31 March 2016 at 10:43, Slava Bendersky < volga...@skillsearch.ca > wrote: 



Hello Craig, 
The current setup is two server which run libvirt and for storage which run 
glusterfs (storage server feed two virtual servers). Right now is no fencing in 
place. Each of the nodes have one PostgreSQL vm with bdr. 



That's a disaster waiting to happen. You can't just share storage like that in 
PostgreSQL, BDR or otherwise. I'm amazed it didn't fail earlier. 

Try to dump whatever data you can recover, initdb a new normal PostgreSQL 
instance, restore, and set up normal replication. Look into repmgr, pgbarman, 
etc. Read the manual on replication, backup and failover. Do _not_ use shared 
storage. 

-- 
Craig Ringer http://www.2ndQuadrant.com/ 
PostgreSQL Development, 24x7 Support, Training & Services 



Re: [GENERAL] bdr replication

2016-03-30 Thread Craig Ringer
On 31 March 2016 at 10:43, Slava Bendersky  wrote:

> Hello Craig,
> The current setup is two server which run libvirt and for storage which
> run glusterfs (storage server feed two virtual servers).  Right now is no
> fencing in place. Each of the nodes have one  PostgreSQL vm with bdr.
>

That's a disaster waiting to happen. You can't just share storage like that
in PostgreSQL, BDR or otherwise. I'm amazed it didn't fail earlier.

Try to dump whatever data you can recover, initdb a new normal PostgreSQL
instance, restore, and set up normal replication. Look into repmgr,
pgbarman, etc. Read the manual on replication, backup and failover. Do
_not_ use shared storage.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
Hello Craig, 
The current setup is two servers which run libvirt, plus a storage server which 
runs glusterfs (the storage server feeds the two virtual servers). Right now 
there is no fencing in place. Each of the nodes has one PostgreSQL VM with BDR. 

volga629. 

From: "Craig Ringer" <cr...@2ndquadrant.com> 
To: "volga629" <volga...@skillsearch.ca> 
Cc: "pgsql-general" <pgsql-general@postgresql.org> 
Sent: Wednesday, 30 March, 2016 23:20:49 
Subject: Re: [GENERAL] bdr replication 

On 31 March 2016 at 09:38, Slava Bendersky < volga...@skillsearch.ca > wrote: 



Hello Everyone, 
I am looking for suggestion how to recover bdr replication. 
The short story we have 2 virtual nodes with share storage. 



Can you describe the "shared storage" setup in more detail? 

In general, with PostgreSQL "shared storage" is a shortcut to "massive database 
corruption" unless you have extremely careful fencing and STONITH. 


-- 
Craig Ringer http://www.2ndQuadrant.com/ 
PostgreSQL Development, 24x7 Support, Training & Services 



Re: [GENERAL] bdr replication

2016-03-30 Thread Craig Ringer
On 31 March 2016 at 09:38, Slava Bendersky  wrote:

> Hello Everyone,
> I am looking for suggestion how to recover bdr replication.
> The short story we have 2 virtual nodes with share storage.
>

Can you describe the "shared storage" setup in more detail?

In general, with PostgreSQL "shared storage" is a shortcut to "massive
database corruption" unless you have extremely careful fencing and STONITH.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] bdr replication

2016-03-30 Thread Slava Bendersky
Hello Everyone, 
I am looking for suggestions on how to recover BDR replication. 
The short story: we have 2 virtual nodes with shared storage. The shared storage 
lost power, and after I brought everything back online BDR doesn't work properly. 

Here are some log 

2016-03-30 20:47:26 EDT postgres@10.12.160.2(43714):freeswitch_prod01 [15897] 
LOG: starting logical decoding for slot 
"bdr_16386_6242730220644137235_1_17937__" 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43714):freeswitch_prod01 [15897] 
DETAIL: streaming transactions committing after 0/39A248B0, reading WAL from 
0/39A248B0 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43712):fusionpbx_prod01 [15896] 
LOG: starting logical decoding for slot 
"bdr_20650_6242730220644137235_1_0__" 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43712):fusionpbx_prod01 [15896] 
DETAIL: streaming transactions committing after 0/39A248B0, reading WAL from 
0/39A248B0 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43712):fusionpbx_prod01 [15896] 
ERROR: requested WAL segment 00010039 has already been removed 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43714):freeswitch_prod01 [15897] 
ERROR: requested WAL segment 00010039 has already been removed 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43712):fusionpbx_prod01 [15896] 
LOG: could not receive data from client: Connection reset by peer 
2016-03-30 20:47:26 EDT postgres@10.12.160.2(43714):freeswitch_prod01 [15897] 
LOG: could not receive data from client: Connection reset by peer 
2016-03-30 20:47:28 EDT @: [977] LOG: starting background worker process "bdr 
(6242981939774453042,1,20650,)->bdr (6242730220644137235,1," 
2016-03-30 20:47:28 EDT @: [977] LOG: starting background worker process "bdr 
(6242981939774453042,1,16386,)->bdr (6242730220644137235,1," 
2016-03-30 20:47:28 EDT @: [15899] NOTICE: version "1.0" of extension 
"btree_gist" is already installed 
2016-03-30 20:47:28 EDT @: [15899] NOTICE: version "0.9.3.0" of extension "bdr" 
is already installed 
2016-03-30 20:47:28 EDT @: [15898] NOTICE: version "1.0" of extension 
"btree_gist" is already installed 
2016-03-30 20:47:28 EDT @: [15898] NOTICE: version "0.9.3.0" of extension "bdr" 
is already installed 
2016-03-30 20:47:28 EDT @: [15899] INFO: starting up replication from 1 at 0/0 
2016-03-30 20:47:28 EDT @: [15898] INFO: starting up replication from 8 at 0/0 
2016-03-30 20:47:28 EDT @: [15899] ERROR: data stream ended 
2016-03-30 20:47:28 EDT @: [15898] ERROR: data stream ended 
2016-03-30 20:47:28 EDT @: [977] LOG: worker process: bdr 
(6242981939774453042,1,20650,)->bdr (6242730220644137235,1, (PID 15898) exited 
with exit code 1 
2016-03-30 20:47:28 EDT @: [977] LOG: worker process: bdr 
(6242981939774453042,1,16386,)->bdr (6242730220644137235,1, (PID 15899) exited 
with exit code 1 


Re: [GENERAL] BDR

2016-03-19 Thread John R Pierce

On 3/14/2016 2:43 PM, Roland van Laar wrote:
However my instances are not on the same server and I attempted to 
simply add a host=(the ip) but that failed.

There are a couple of other factors:
- is postgres running on an external available ip?
- is there a replication user with a password? 


3: are the servers configured to allow network connections from each 
other?  appropriate authentication settings in pg_hba.conf on both sides?




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] BDR

2016-03-19 Thread Craig Ringer
On 15 March 2016 at 05:17, Dustin Kempter 
wrote:

> However my instances are not on the same server and I attempted to simply
> add a host=(the ip) but that failed. Please help
>

Review the logs on both hosts to see any errors during setup.

Note that you will need to drop and re-create the database if you need to
attempt setup again.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] BDR

2016-03-14 Thread James Keener
Also, what did you run exactly (sanitized of course).

On March 14, 2016 5:38:19 PM EDT, John R Pierce  wrote:
>On 3/14/2016 2:17 PM, Dustin Kempter wrote:
>> However my instances are not on the same server and I attempted to 
>> simply add a host=(the ip) but that failed. Please help 
>
>did you get an error?   if so what error, exactly?
>
>
>
>-- 
>john r pierce, recycling bits in santa cruz
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [GENERAL] BDR

2016-03-14 Thread Roland van Laar

On 14-03-16 22:17, Dustin Kempter wrote:
Hello all, I am attempting to set up BDR between 2 separate nodes. I 
have been following the guide and got to here 
http://bdr-project.org/docs/0.9.0/quickstart-enabling.html

I am now stuck on this section

"Then you run a function that identifies a BDR group that delineates a 
connection string for other nodes to communicate with (for the first 
node, we will use port 5598) from the same SQL session as above on 
port 5598:


SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'port=5598 dbname=bdrdemo'
);"

However my instances are not on the same server and I attempted to 
simply add a host=(the ip) but that failed.

There are a couple of other factors:
- is postgres running on an externally available IP?
- is there a replication user with a password?

Roland


Please help

Thanks in advance!









Re: [GENERAL] BDR

2016-03-14 Thread John R Pierce

On 3/14/2016 2:17 PM, Dustin Kempter wrote:
However my instances are not on the same server and I attempted to 
simply add a host=(the ip) but that failed. Please help 


did you get an error?   if so what error, exactly?



--
john r pierce, recycling bits in santa cruz





[GENERAL] BDR

2016-03-14 Thread Dustin Kempter
Hello all, I am attempting to set up BDR between 2 separate nodes. I 
have been following the guide and got to here 
http://bdr-project.org/docs/0.9.0/quickstart-enabling.html

I am now stuck on this section

"Then you run a function that identifies a BDR group that delineates a 
connection string for other nodes to communicate with (for the first 
node, we will use port 5598) from the same SQL session as above on port 
5598:


SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'port=5598 dbname=bdrdemo'
);"

However my instances are not on the same server and I attempted to 
simply add a host=(the ip) but that failed. Please help


Thanks in advance!




