Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Masahiko Sawada
On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for testing.
>
> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro 
>  wrote in 
> 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Andres Freund
On 2016-04-04 10:35:34 +0100, Simon Riggs wrote:
> On 4 April 2016 at 09:28, Fujii Masao  wrote:
> > Barring any objections, I'll commit this patch.

No objection here either, just one question: Has anybody thought about
the ability to extend this to do per-database syncrep? Logical decoding
works on a database level, and that can cause some problems with global
configuration.

> That sounds good.
> 
> May I have one more day to review this? Actually more like 3-4 hours.

> I have no comments on an initial read, so I'm hopeful of having nothing at
> all to say on it.

Simon, perhaps you could hold the above question in your mind while
looking through this?

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Simon Riggs
On 4 April 2016 at 09:28, Fujii Masao  wrote:


> Barring any objections, I'll commit this patch.
>

That sounds good.

May I have one more day to review this? Actually more like 3-4 hours.

I have no comments on an initial read, so I'm hopeful of having nothing at
all to say on it.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Kyotaro HORIGUCHI
Hello, thank you for testing.

At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Abhijit Menon-Sen
At 2016-04-04 17:28:07 +0900, masao.fu...@gmail.com wrote:
>
> Barring any objections, I'll commit this patch.

No objections, just a minor wording tweak:

doc/src/sgml/config.sgml:

"The synchronous standbys will be the standbys that their names appear
early in this list" should be "The synchronous standbys will be those
whose names appear earlier in this list".

doc/src/sgml/high-availability.sgml:

"The standbys that their names appear early in this list are given
higher priority and will be considered as synchronous" should be "The
standbys whose names appear earlier in the list are given higher
priority and will be considered as synchronous".

"The standbys that their names appear early in the list will be used as
the synchronous standby" should be "The standbys whose names appear
earlier in the list will be used as synchronous standbys".

You may prefer to reword this in some other way, but the current "that
their names appear" wording should be changed.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Fujii Masao
On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro
 wrote:
> On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
>  wrote:
>> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada  
>> wrote:
>>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  
>>> wrote:
 On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
  wrote:
> I personally don't think it needs such a survive measure. It is
> very small syntax and the parser reads very short text. If the
> parser failes in such mode, something more serious should have
> occurred.
>
> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
> wrote in 
> 
>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello,
>> >
>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>> > #define'ing fprintf). So it is doable if you mind exit().
>>
>> I'm afraid that your idea doesn't work in postmaster. Because 
>> ereport(ERROR) is
>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an 
>> internal
>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>> parser.
>
> If The ERROR may be LOG or DEBUG2 either, if we think the parser
> fatal erros are recoverable. guc-file.l is doing so.
>
>> ISTM that, when an internal flex fatal error occurs, it's
>> better to elog(FATAL) and terminate the problematic
>> process. This might lead to the server crash (e.g., if
>> postmaster emits a FATAL error, it and its all child processes
>> will exit soon). But probably we can live with this because the
>> fatal error basically rarely happens.
>
> I agree to this
>
>> OTOH, if we make the process keep running even after it gets an internal
>> fatal error (like Sawada's patch or your idea do), this might cause more
>> serious problem. Please imagine the case where one walsender gets the 
>> fatal
>> error (e.g., because of OOM), abandon new setting value of
>> synchronous_standby_names, and keep running with the previous setting 
>> value.
>> OTOH, the other walsender processes successfully parse the setting and
>> keep running with new setting. In this case, the inconsistency of the 
>> setting
>> which each walsender is based on happens. This completely will mess up 
>> the
>> synchronous replication.
>
> On the other hand, guc-file.l seems ignoring parser errors under
> normal operation, even though it may cause similar inconsistency,
> if any..
>
> | LOG:  received SIGHUP, reloading configuration files
> | LOG:  input in flex scanner failed at file 
> "/home/horiguti/data/data_work/postgresql.conf" line 1
> | LOG:  configuration file 
> "/home/horiguti/data/data_work/postgresql.conf" contains errors; no 
> changes were applied
>
>> Therefore, I think that it's better to make the problematic process exit
>> with FATAL error rather than ignore the error and keep it running.
>
> +1. Restarting walsender would be far less harmful than keeping
> it running in doubtful state.
>
> Sould I wait for the next version or have a look on the latest?
>

 Attached latest patch incorporate some review comments so far, and is
 rebased against current HEAD.

>>>
>>> Sorry I attached wrong patch.
>>> Attached patch is correct patch.

Thanks for updating the patch!

I applied the following changes to the patch.
Attached is the revised version of the patch.

- Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on
  the recent discussion with Horiguchi-san.
- Improved the documentation.
- Fixed some bugs.
- Removed the changes for recovery testing framework. I'd like to commit
   those changes later separately from the main patch of multiple sync rep.

Barring any objections, I'll commit this patch.

>> One thing I noticed is that there are LOG messages telling me when a
>> standby becomes a synchronous standby, but nothing to tell me if a
>> standby stops being a standby (ie because a higher priority one has
>> taken its place in the quorum).  Would that be interesting?

+1

>> Also, I spotted some tiny mistakes:
>>
>> +  
>> +   Dedicated language for multiple synchornous 
>> replication
>> +  
>>
>> 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-01 Thread Thomas Munro
On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
 wrote:
> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada  
> wrote:
>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  
>> wrote:
>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>>  wrote:
 I personally don't think it needs such a survive measure. It is
 very small syntax and the parser reads very short text. If the
 parser failes in such mode, something more serious should have
 occurred.

 At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
 wrote in 
 
> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello,
> >
> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
> >  wrote in 
> > 
> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> >>  wrote:
> > As mentioned in my comment, SQL parser converts yy_fatal_error
> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
> > #define'ing fprintf). So it is doable if you mind exit().
>
> I'm afraid that your idea doesn't work in postmaster. Because 
> ereport(ERROR) is
> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
> flex fatal error occurs, postmaster just exits instead of jumping out of 
> parser.

 If The ERROR may be LOG or DEBUG2 either, if we think the parser
 fatal erros are recoverable. guc-file.l is doing so.

> ISTM that, when an internal flex fatal error occurs, it's
> better to elog(FATAL) and terminate the problematic
> process. This might lead to the server crash (e.g., if
> postmaster emits a FATAL error, it and its all child processes
> will exit soon). But probably we can live with this because the
> fatal error basically rarely happens.

 I agree to this

> OTOH, if we make the process keep running even after it gets an internal
> fatal error (like Sawada's patch or your idea do), this might cause more
> serious problem. Please imagine the case where one walsender gets the 
> fatal
> error (e.g., because of OOM), abandon new setting value of
> synchronous_standby_names, and keep running with the previous setting 
> value.
> OTOH, the other walsender processes successfully parse the setting and
> keep running with new setting. In this case, the inconsistency of the 
> setting
> which each walsender is based on happens. This completely will mess up the
> synchronous replication.

 On the other hand, guc-file.l seems ignoring parser errors under
 normal operation, even though it may cause similar inconsistency,
 if any..

 | LOG:  received SIGHUP, reloading configuration files
 | LOG:  input in flex scanner failed at file 
 "/home/horiguti/data/data_work/postgresql.conf" line 1
 | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
 contains errors; no changes were applied

> Therefore, I think that it's better to make the problematic process exit
> with FATAL error rather than ignore the error and keep it running.

 +1. Restarting walsender would be far less harmful than keeping
 it running in doubtful state.

 Sould I wait for the next version or have a look on the latest?

>>>
>>> Attached latest patch incorporate some review comments so far, and is
>>> rebased against current HEAD.
>>>
>>
>> Sorry I attached wrong patch.
>> Attached patch is correct patch.
>>
>> [mulit_sync_replication_v21.patch]
>
> Here are some TPS numbers from some quick tests I ran on a set of
> Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
> as primary + 3 standbys, to try out different combinations of
> synchronous_commit levels and synchronous_standby_names numbers.  They
> were run for a short time only and these are of course systems with
> limited and perhaps uneven IO and CPU, but they still give some idea
> of the trends.  And reassuringly, the trends are travelling in the
> expected directions.
>
> All default settings except shared_buffers = 1GB, and the GUCs
> required for replication.
>
> pgbench postgres -j2 -c2 -N bench2 -T 600
>
>1(*) 2(*) 3(*)
>  
> off  = 4056 4096 4092
> local= 1323 1299 1312
> remote_write = 1130 1046  958
> on   =  860  744  701
> remote_apply =  785  725  604
>
> pgbench postgres -j16 -c16 -N bench2 -T 600
>
>1(*) 2(*) 3(*)
>  
> off  = 3952 3943 3933
> local= 2964 2984 3026
> remote_write = 2790 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Thomas Munro
On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada  wrote:
> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  
> wrote:
>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> I personally don't think it needs such a survive measure. It is
>>> very small syntax and the parser reads very short text. If the
>>> parser failes in such mode, something more serious should have
>>> occurred.
>>>
>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
>>> wrote in 
>>> 
 On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
  wrote:
 > Hello,
 >
 > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
 >  wrote in 
 > 
 > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
 >>  wrote:
 > As mentioned in my comment, SQL parser converts yy_fatal_error
 > into ereport(ERROR), which can be caught by the upper PG_TRY (by
 > #define'ing fprintf). So it is doable if you mind exit().

 I'm afraid that your idea doesn't work in postmaster. Because 
 ereport(ERROR) is
 implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
 flex fatal error occurs, postmaster just exits instead of jumping out of 
 parser.
>>>
>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>> fatal erros are recoverable. guc-file.l is doing so.
>>>
 ISTM that, when an internal flex fatal error occurs, it's
 better to elog(FATAL) and terminate the problematic
 process. This might lead to the server crash (e.g., if
 postmaster emits a FATAL error, it and its all child processes
 will exit soon). But probably we can live with this because the
 fatal error basically rarely happens.
>>>
>>> I agree to this
>>>
 OTOH, if we make the process keep running even after it gets an internal
 fatal error (like Sawada's patch or your idea do), this might cause more
 serious problem. Please imagine the case where one walsender gets the fatal
 error (e.g., because of OOM), abandon new setting value of
 synchronous_standby_names, and keep running with the previous setting 
 value.
 OTOH, the other walsender processes successfully parse the setting and
 keep running with new setting. In this case, the inconsistency of the 
 setting
 which each walsender is based on happens. This completely will mess up the
 synchronous replication.
>>>
>>> On the other hand, guc-file.l seems ignoring parser errors under
>>> normal operation, even though it may cause similar inconsistency,
>>> if any..
>>>
>>> | LOG:  received SIGHUP, reloading configuration files
>>> | LOG:  input in flex scanner failed at file 
>>> "/home/horiguti/data/data_work/postgresql.conf" line 1
>>> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
>>> contains errors; no changes were applied
>>>
 Therefore, I think that it's better to make the problematic process exit
 with FATAL error rather than ignore the error and keep it running.
>>>
>>> +1. Restarting walsender would be far less harmful than keeping
>>> it running in doubtful state.
>>>
>>> Sould I wait for the next version or have a look on the latest?
>>>
>>
>> Attached latest patch incorporate some review comments so far, and is
>> rebased against current HEAD.
>>
>
> Sorry I attached wrong patch.
> Attached patch is correct patch.
>
> [mulit_sync_replication_v21.patch]

Here are some TPS numbers from some quick tests I ran on a set of
Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
as primary + 3 standbys, to try out different combinations of
synchronous_commit levels and synchronous_standby_names numbers.  They
were run for a short time only and these are of course systems with
limited and perhaps uneven IO and CPU, but they still give some idea
of the trends.  And reassuringly, the trends are travelling in the
expected directions.

All default settings except shared_buffers = 1GB, and the GUCs
required for replication.

pgbench postgres -j2 -c2 -N bench2 -T 600

   1(*) 2(*) 3(*)
     
off  = 4056 4096 4092
local= 1323 1299 1312
remote_write = 1130 1046  958
on   =  860  744  701
remote_apply =  785  725  604

pgbench postgres -j16 -c16 -N bench2 -T 600

   1(*) 2(*) 3(*)
     
off  = 3952 3943 3933
local= 2964 2984 3026
remote_write = 2790 2724 2675
on   = 2731 2627 2523
remote_apply = 2627 2501 2432

One thing I noticed is that there are LOG messages telling me when a
standby becomes a synchronous standby, but nothing to tell me if a
standby stops being a standby 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Masahiko Sawada
On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  wrote:
> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>  wrote:
>> I personally don't think it needs such a survive measure. It is
>> very small syntax and the parser reads very short text. If the
>> parser failes in such mode, something more serious should have
>> occurred.
>>
>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
>> wrote in 
>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>> > Hello,
>>> >
>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>>> >  wrote in 
>>> > 
>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>> >>  wrote:
>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>
>>> I'm afraid that your idea doesn't work in postmaster. Because 
>>> ereport(ERROR) is
>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>>> parser.
>>
>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>> fatal erros are recoverable. guc-file.l is doing so.
>>
>>> ISTM that, when an internal flex fatal error occurs, it's
>>> better to elog(FATAL) and terminate the problematic
>>> process. This might lead to the server crash (e.g., if
>>> postmaster emits a FATAL error, it and its all child processes
>>> will exit soon). But probably we can live with this because the
>>> fatal error basically rarely happens.
>>
>> I agree to this
>>
>>> OTOH, if we make the process keep running even after it gets an internal
>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>> serious problem. Please imagine the case where one walsender gets the fatal
>>> error (e.g., because of OOM), abandon new setting value of
>>> synchronous_standby_names, and keep running with the previous setting value.
>>> OTOH, the other walsender processes successfully parse the setting and
>>> keep running with new setting. In this case, the inconsistency of the 
>>> setting
>>> which each walsender is based on happens. This completely will mess up the
>>> synchronous replication.
>>
>> On the other hand, guc-file.l seems ignoring parser errors under
>> normal operation, even though it may cause similar inconsistency,
>> if any..
>>
>> | LOG:  received SIGHUP, reloading configuration files
>> | LOG:  input in flex scanner failed at file 
>> "/home/horiguti/data/data_work/postgresql.conf" line 1
>> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
>> contains errors; no changes were applied
>>
>>> Therefore, I think that it's better to make the problematic process exit
>>> with FATAL error rather than ignore the error and keep it running.
>>
>> +1. Restarting walsender would be far less harmful than keeping
>> it running in doubtful state.
>>
>> Sould I wait for the next version or have a look on the latest?
>>
>
> Attached latest patch incorporate some review comments so far, and is
> rebased against current HEAD.
>

Sorry I attached wrong patch.
Attached patch is correct patch.

Regards,

--
Masahiko Sawada


multi_sync_replication_v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Masahiko Sawada
On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
> I personally don't think it needs such a survive measure. It is
> very small syntax and the parser reads very short text. If the
> parser failes in such mode, something more serious should have
> occurred.
>
> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  wrote 
> in 
>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello,
>> >
>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>> > #define'ing fprintf). So it is doable if you mind exit().
>>
>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) 
>> is
>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>> parser.
>
> If The ERROR may be LOG or DEBUG2 either, if we think the parser
> fatal erros are recoverable. guc-file.l is doing so.
>
>> ISTM that, when an internal flex fatal error occurs, it's
>> better to elog(FATAL) and terminate the problematic
>> process. This might lead to the server crash (e.g., if
>> postmaster emits a FATAL error, it and its all child processes
>> will exit soon). But probably we can live with this because the
>> fatal error basically rarely happens.
>
> I agree to this
>
>> OTOH, if we make the process keep running even after it gets an internal
>> fatal error (like Sawada's patch or your idea do), this might cause more
>> serious problem. Please imagine the case where one walsender gets the fatal
>> error (e.g., because of OOM), abandon new setting value of
>> synchronous_standby_names, and keep running with the previous setting value.
>> OTOH, the other walsender processes successfully parse the setting and
>> keep running with new setting. In this case, the inconsistency of the setting
>> which each walsender is based on happens. This completely will mess up the
>> synchronous replication.
>
> On the other hand, guc-file.l seems ignoring parser errors under
> normal operation, even though it may cause similar inconsistency,
> if any..
>
> | LOG:  received SIGHUP, reloading configuration files
> | LOG:  input in flex scanner failed at file 
> "/home/horiguti/data/data_work/postgresql.conf" line 1
> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
> contains errors; no changes were applied
>
>> Therefore, I think that it's better to make the problematic process exit
>> with FATAL error rather than ignore the error and keep it running.
>
> +1. Restarting walsender would be far less harmful than keeping
> it running in doubtful state.
>
> Sould I wait for the next version or have a look on the latest?
>

Attached latest patch incorporate some review comments so far, and is
rebased against current HEAD.

Regards,

--
Masahiko Sawada


multi_sync_replication_v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-29 Thread Kyotaro HORIGUCHI
I personally don't think it needs such a survive measure. It is
very small syntax and the parser reads very short text. If the
parser failes in such mode, something more serious should have
occurred.

At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  wrote 
in 
> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello,
> >
> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada  
> > wrote in 
> > 
> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> >>  wrote:
> > As mentioned in my comment, SQL parser converts yy_fatal_error
> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
> > #define'ing fprintf). So it is doable if you mind exit().
> 
> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) 
> is
> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
> flex fatal error occurs, postmaster just exits instead of jumping out of 
> parser.

If The ERROR may be LOG or DEBUG2 either, if we think the parser
fatal erros are recoverable. guc-file.l is doing so.

> ISTM that, when an internal flex fatal error occurs, it's
> better to elog(FATAL) and terminate the problematic
> process. This might lead to the server crash (e.g., if
> postmaster emits a FATAL error, it and its all child processes
> will exit soon). But probably we can live with this because the
> fatal error basically rarely happens.

I agree to this

> OTOH, if we make the process keep running even after it gets an internal
> fatal error (like Sawada's patch or your idea do), this might cause more
> serious problem. Please imagine the case where one walsender gets the fatal
> error (e.g., because of OOM), abandon new setting value of
> synchronous_standby_names, and keep running with the previous setting value.
> OTOH, the other walsender processes successfully parse the setting and
> keep running with new setting. In this case, the inconsistency of the setting
> which each walsender is based on happens. This completely will mess up the
> synchronous replication.

On the other hand, guc-file.l seems ignoring parser errors under
normal operation, even though it may cause similar inconsistency,
if any..

| LOG:  received SIGHUP, reloading configuration files
| LOG:  input in flex scanner failed at file 
"/home/horiguti/data/data_work/postgresql.conf" line 1
| LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
contains errors; no changes were applied

> Therefore, I think that it's better to make the problematic process exit
> with FATAL error rather than ignore the error and keep it running.

+1. Restarting walsender would be far less harmful than keeping
it running in doubtful state.

Sould I wait for the next version or have a look on the latest?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-29 Thread Fujii Masao
On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada  
> wrote in 
> sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Thank you for the new patch. Sorry to have overlooked some
>> > versions. I'm looking the  v19 patch now.
>> >
>> > make complains for an unused variable.
>
> Thank you. I'll have a closer look on it a bit later.
>
>> >> Attached latest patch incorporating all review comments so far.
>> >>
>> >> Aside from the review comments, I did following changes;
>> >> - Add logic to avoid fatal exit in yy_fatal_error().
>> >
>> > Maybe good catch, but..
>> >
>> >> syncrep_scanstr(const char *str)
>> > ..
>> >>   * Regain control after a fatal, internal flex error.  It may have
>> >>   * corrupted parser state.  Consequently, abandon the file, but trust
>> >  
>> >>   * that the state remains sane enough for syncrep_yy_delete_buffer().
>> >  
>> >
>> > guc-file.l actually abandones the config file but syncrep_scanner
>> > reads only a value of an item in it. And, the latter is
>> > eventually true but a bit hard to understand.
>> >
>> > The patch will emit a mysterious error message like this.
>> >
>> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
>> >> configuration file ".../postgresql.conf" contains errors
>> >
>> > This is utterly wrong. A bit related to that, it seems to me that
>> > syncrep_scan.l doesn't need the same mechanism with
>> > guc-file.l. The nature of the modification would be making
>> > call_*_check_hook to be tri-state instead of boolean. So just
>> > cathing errors in call_*_check_hook and ereport()'ing as SQL
>> > parser does seems enough, but either will do for me.
>>
>> Well, I think that call_*_check_hook can not catch such a fatal error.
>
> As mentioned in my comment, SQL parser converts yy_fatal_error
> into ereport(ERROR), which can be caught by the upper PG_TRY (by
> #define'ing fprintf). So it is doable if you mind exit().

I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
flex fatal error occurs, postmaster just exits instead of jumping out of parser.


ISTM that, when an internal flex fatal error occurs, it's better to elog(FATAL)
and terminate the problematic process. This might lead to the server crash
(e.g., if postmaster emits a FATAL error, it and its all child processes will
exit soon). But probably we can live with this because the fatal error basically
rarely happens.

OTOH, if we make the process keep running even after it gets an internal
fatal error (like Sawada's patch or your idea do), this might cause more
serious problem. Please imagine the case where one walsender gets the fatal
error (e.g., because of OOM), abandon new setting value of
synchronous_standby_names, and keep running with the previous setting value.
OTOH, the other walsender processes successfully parse the setting and
keep running with new setting. In this case, the inconsistency of the setting
which each walsender is based on happens. This completely will mess up the
synchronous replication.

Therefore, I think that it's better to make the problematic process exit
with FATAL error rather than ignore the error and keep it running.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-29 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada  
wrote in 
sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>  wrote:
> > Thank you for the new patch. Sorry to have overlooked some
> > versions. I'm looking the  v19 patch now.
> >
> > make complains for an unused variable.

Thank you. I'll have a closer look on it a bit later.

> >> Attached latest patch incorporating all review comments so far.
> >>
> >> Aside from the review comments, I did following changes;
> >> - Add logic to avoid fatal exit in yy_fatal_error().
> >
> > Maybe good catch, but..
> >
> >> syncrep_scanstr(const char *str)
> > ..
> >>   * Regain control after a fatal, internal flex error.  It may have
> >>   * corrupted parser state.  Consequently, abandon the file, but trust
> >  
> >>   * that the state remains sane enough for syncrep_yy_delete_buffer().
> >  
> >
> > guc-file.l actually abandones the config file but syncrep_scanner
> > reads only a value of an item in it. And, the latter is
> > eventually true but a bit hard to understand.
> >
> > The patch will emit a mysterious error message like this.
> >
> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
> >> configuration file ".../postgresql.conf" contains errors
> >
> > This is utterly wrong. A bit related to that, it seems to me that
> > syncrep_scan.l doesn't need the same mechanism with
> > guc-file.l. The nature of the modification would be making
> > call_*_check_hook to be tri-state instead of boolean. So just
> > cathing errors in call_*_check_hook and ereport()'ing as SQL
> > parser does seems enough, but either will do for me.
> 
> Well, I think that call_*_check_hook can not catch such a fatal error.

As mentioned in my comment, SQL parser converts yy_fatal_error
into ereport(ERROR), which can be caught by the upper PG_TRY (by
#define'ing fprintf). So it is doable if you mind exit().

> Because if yy_fatal_error() is called without preventing logic when
> reloading configuration file, postmaster process will abnormal exit
> immediately as well as wal sender process.


> >> - Improve regression test cases.
> >
> > I forgot to mention that, but additionalORDER BY makes the test
> > robust.
> >
> > I doubt the validity of the behavior in the following test.
> >
> >> # Change the synchronous_standby_names = '2[standby1,*,standby2]' and 
> >> check sync_state
> >
> > Is this regarded as a correct as a value for it?
> 
> Since previous s_s_names (9.5 or before) can accept this value, I
> didn't change behaviour.
> And I added this test case for checking backward compatibility more finely.

I understand that and it's fine. But we need a explanation for
the reason above in the test case or somewhere else.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Masahiko Sawada
On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you for the new patch. Sorry to have overlooked some
> versions. I'm looking the  v19 patch now.
>
> make complains for an unused variable.
>
> | syncrep.c: In function ‘SyncRepGetSyncStandbys’:
> | syncrep.c:601:13: warning: variable ‘next’ set but not used 
> [-Wunused-but-set-variable]
> |ListCell *next;
>
>
> At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Amit Langote
On 2016/03/28 17:50, Kyotaro HORIGUCHI wrote:
> 
> # LISPers don't hesitate to dive into Sea of Parens.

Sorry in advance to be off-topic: https://xkcd.com/297 :)

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Kyotaro HORIGUCHI
Thank you for the new patch. Sorry to have overlooked some
versions. I'm looking the  v19 patch now.

make complains for an unused variable.

| syncrep.c: In function ‘SyncRepGetSyncStandbys’:
| syncrep.c:601:13: warning: variable ‘next’ set but not used 
[-Wunused-but-set-variable]
|ListCell *next;


At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-28 Thread Masahiko Sawada
On Fri, Mar 25, 2016 at 9:20 PM, Robert Haas  wrote:
> On Thu, Mar 24, 2016 at 9:29 AM, Masahiko Sawada  
> wrote:
>> Also I felt a sense of discomfort regarding using [ and ] as a special
>> character for priority method.
>> Because (, ) and [, ] are a little similar each other, so it would
>> easily make many syntax errors when nested style is supported.
>> And the synopsis of that in documentation is odd;
>> synchronous_standby_names = 'N [ node_name [, ...] ]'
>>
>> This topic has been already discussed before but, we might want to
>> change it to other characters such as < and >?
>
> I personally would recommend against <>.  Those should mean less-than
> and greater-than, not grouping.  I think you could use parentheses,
> ().  There's nothing saying that has to mean any particular thing, so
> you may as well use it for the first thing implemented, perhaps.  Or
> you could use [] or {}.  It *is* important that you don't create
> confusing syntax summaries, but I don't think that's a reason to pick
> a nonstandard syntax for grouping.
>

I agree with you.
I've changed it to use parentheses.

Regards,

--
Masahiko Sawada
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d48a13f..1650b6d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2902,20 +2902,18 @@ include_dir 'conf.d'
   
   

-Specifies a comma-separated list of standby names that can support
-synchronous replication, as described in
-.
-At any one time there will be at most one active synchronous standby;
-transactions waiting for commit will be allowed to proceed after
-this standby server confirms receipt of their data.
-The synchronous standby will be the first standby named in this list
-that is both currently connected and streaming data in real-time
-(as shown by a state of streaming in the
+Specifies the standby names that can support synchronous replication
+using either of two syntaxes: a comma-separated list, or a more flexible syntax
+described in .
+Transactions waiting for commit will be allowed to proceed after a
+configurable subset of standby servers confirms receipt of their data.
+For the simple comma-separated list syntax, it is one server.
+The synchronous standbys will be those named in this parameter that are both
+currently connected and streaming data in real-time (as shown by a state
+of streaming in the
 
 pg_stat_replication view).
-Other standby servers appearing later in this list represent potential
-synchronous standbys.
-If the current synchronous standby disconnects for whatever reason,
+If the any of the current synchronous standbys disconnects for whatever reason,
 it will be replaced immediately with the next-highest-priority standby.
 Specifying more than one standby name can allow very high availability.

@@ -2923,9 +2921,10 @@ include_dir 'conf.d'
 The name of a standby server for this purpose is the
 application_name setting of the standby, as set in the
 primary_conninfo of the standby's WAL receiver.  There is
-no mechanism to enforce uniqueness. In case of duplicates one of the
-matching standbys will be chosen to be the synchronous standby, though
-exactly which one is indeterminate.
+no mechanism to enforce uniqueness. For each specified standby name,
+only the specified count of standbys will be chosen to be synchronous
+standbys, though exactly which ones is indeterminate.  The rest will
+represent potential synchronous standbys.
 The special entry * matches any
 application_name, including the default application name
 of walreceiver.
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 19d613e..5dd9fab 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1027,24 +1027,27 @@ primary_slot_name = 'node_a_slot'
 

 Synchronous replication offers the ability to confirm that all changes
-made by a transaction have been transferred to one synchronous standby
-server. This extends the standard level of durability
+made by a transaction have been transferred to one or more synchronous
+standby servers. This extends that standard level of durability
 offered by a transaction commit. This level of protection is referred
-to as 2-safe replication in computer science theory.
+to as 2-safe replication in computer science theory, and group-1-safe
+(group-safe and 1-safe) when synchronous_commit is set to
+more than remote_write.

 

 When requesting synchronous replication, each commit of a
 write transaction will wait until confirmation is
 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-25 Thread Robert Haas
On Thu, Mar 24, 2016 at 9:29 AM, Masahiko Sawada  wrote:
> Also I felt a sense of discomfort regarding using [ and ] as a special
> character for priority method.
> Because (, ) and [, ] are a little similar each other, so it would
> easily make many syntax errors when nested style is supported.
> And the synopsis of that in documentation is odd;
> synchronous_standby_names = 'N [ node_name [, ...] ]'
>
> This topic has been already discussed before but, we might want to
> change it to other characters such as < and >?

I personally would recommend against <>.  Those should mean less-than
and greater-than, not grouping.  I think you could use parentheses,
().  There's nothing saying that has to mean any particular thing, so
you may as well use it for the first thing implemented, perhaps.  Or
you could use [] or {}.  It *is* important that you don't create
confusing syntax summaries, but I don't think that's a reason to pick
a nonstandard syntax for grouping.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-24 Thread Masahiko Sawada
On Thu, Mar 24, 2016 at 2:26 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-24 Thread Fujii Masao
On Thu, Mar 24, 2016 at 2:26 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Masahiko Sawada
On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao  wrote:
> On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
>  wrote:
>> Hello,
>>
>> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao  
>> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Fujii Masao
On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao  wrote 
> in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-23 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Michael Paquier
On Wed, Mar 23, 2016 at 1:21 PM, Fujii Masao  wrote:
> On Wed, Mar 23, 2016 at 2:28 AM, Masahiko Sawada  
> wrote:
>> Attached patch incorporates above comments.
>> Please find it.
>
> Attached is the latest version of the patch based on your patch.

Not really having a look at the core patch yet...

+ my $result = $node_master->psql('postgres', "SELECT
application_name, sync_priority, sync_state FROM
pg_stat_replication;");
+ print "$result \n";
Having ORDER BY application_name would be good for those queries, and
the result outputs could be made more consistent as a result.

+ # Change the s_s_names = '2[standby1,standby2,standby3]' and check sync state
+ $node_master->psql('postgres', "ALTER SYSTEM SET
synchronous_standby_names = '2[standby1,standby2,standby3]';");
+ $node_master->psql('postgres', "SELECT pg_reload_conf();");
Let's add a reload routine in PostgresNode.pm, this patch is not the
only one who would use it.

--- b/src/test/recovery/t/006_multisync_rep.pl
***
*** 0 
--- 1,106 
+ use strict;
+ use warnings;
You may want to add a small description for this test as header.

  $postgres->AddFiles('src/backend/replication', 'repl_scanner.l',
  'repl_gram.y');
+ $postgres->AddFiles('src/backend/replication', 'syncrep_scanner.l',
+ 'syncrep_gram.y');
There is no need for a new routine call here, you can just append the
new files on the existing call.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Fujii Masao
On Wed, Mar 23, 2016 at 2:28 AM, Masahiko Sawada  wrote:
> On Tue, Mar 22, 2016 at 11:08 PM, Fujii Masao  wrote:
>> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> Thank you for the revised patch.
>>
>> Thanks for reviewing the patch!
>>
>>> This version looks to focus on n-priority method. Stuffs for the
>>> other methods like n-quorum has been removed. It is okay for me.
>>
>> I don't think it's so difficult to extend this version so that
>> it supports also quorum commit.
>
> Yeah, 1-nest level implementation would not so difficult.
>
>>> StringInfo for double-quoted names seems to me to be overkill,
>>> since it allocates 1024 byte block for every such name. A static
>>> buffer seems enough for the usage as I said.
>>
>> So, what about changing the scanner code as follows?
>>
>> {xdstop} {
>> yylval.str = pstrdup(xdbuf.data);
>> pfree(xdbuf.data);
>> BEGIN(INITIAL);
>> return NAME;

I applied this change to the latest version of the patch.
Please check that.

Also I changed syncrep.c so that it uses list_free_deep() to free the list
of the parsed s_s_names. Because the data in the list is palloc'd by
syncrep_scanner.l.

>>> The parser is called for not only for SIGHUP, but also for
>>> starting of every walsender. The latter is not necessary but it
>>> is the matter of trade-off between simplisity and
>>> effectiveness.
>>
>> Could you elaborate why you think that's not necessary?
>>
>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>> startup. A child process takes over the internal data struct for the parsed
>> s_s_names when it's forked by the postmaster. This is what the previous
>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>> In that environment, the data struct should be passed to a child process via
>> the special file (like write_nondefault_variables() does), or it should
>> be constructed during walsender startup (like latest version of the patch
>> does). IMO the latter is simpler.
>
> Thank you for updating patch.
>
> Followings are random review comments.
>
> ==
> +   for (cell = list_head(pending); cell; cell = next)
>
> Can we use foreach() instead?

Yes.

> ==
> +   pending = list_delete_cell(pending, cell, 
> prev);
> +
> +   if (list_length(pending) == 0)
> +   {
> +   list_free(pending);
> +   return result;  /*
> Exit if pending list is empty */
> +   }
>
> If pending list become empty after deleting element, we can return.
> It's a small optimisation.

I don' think this is necessary because currently we can get ouf of the loop
immediately after that deletion.

But I found the bug about the calculation of the next highest priority.
This could cause extra unnecessary loop. I fixed that in the latest version
of the patch.

> ==
> If num_sync is greater than the number of members of sync standby
> list, we'd rather return error message immediately.
> Thoughts?

No. For example, please imagine the case where s_s_names is set to '*'
and more than one sync standbys are connecting to the master.
That's valid setting.

> ==
> I got assertion error when master server is set up with empty s_s_names.
> Because current patch always tries to parse s_s_names and use it
> regardless value of parameter.

Yeah, you're right.

>
> Attached patch incorporates above comments.
> Please find it.

Attached is the latest version of the patch based on your patch.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2902,2921  include_dir 'conf.d'


 
! Specifies a comma-separated list of standby names that can support
! synchronous replication, as described in
! .
! At any one time there will be at most one active synchronous standby;
! transactions waiting for commit will be allowed to proceed after
! this standby server confirms receipt of their data.
! The synchronous standby will be the first standby named in this list
! that is both currently connected and streaming data in real-time
! (as shown by a state of streaming in the
  
  pg_stat_replication view).
! Other standby servers appearing later in this list represent potential
! synchronous standbys.
! If the current synchronous standby disconnects for whatever reason,
  it will be replaced immediately with the next-highest-priority standby.
  Specifying more than one standby name can allow very high availability.
 
--- 2902,2919 


 
! Specifies the standby names that can 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Masahiko Sawada
On Tue, Mar 22, 2016 at 11:08 PM, Fujii Masao  wrote:
> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
>  wrote:
>> Thank you for the revised patch.
>
> Thanks for reviewing the patch!
>
>> This version looks to focus on n-priority method. Stuffs for the
>> other methods like n-quorum has been removed. It is okay for me.
>
> I don't think it's so difficult to extend this version so that
> it supports also quorum commit.

Yeah, 1-nest level implementation would not so difficult.

>> StringInfo for double-quoted names seems to me to be overkill,
>> since it allocates 1024 byte block for every such name. A static
>> buffer seems enough for the usage as I said.
>
> So, what about changing the scanner code as follows?
>
> {xdstop} {
> yylval.str = pstrdup(xdbuf.data);
> pfree(xdbuf.data);
> BEGIN(INITIAL);
> return NAME;
>> The parser is called for not only for SIGHUP, but also for
>> starting of every walsender. The latter is not necessary but it
>> is the matter of trade-off between simplisity and
>> effectiveness.
>
> Could you elaborate why you think that's not necessary?
>
> BTW, in previous patch, s_s_names is parsed by postmaster during the server
> startup. A child process takes over the internal data struct for the parsed
> s_s_names when it's forked by the postmaster. This is what the previous
> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
> In that environment, the data struct should be passed to a child process via
> the special file (like write_nondefault_variables() does), or it should
> be constructed during walsender startup (like latest version of the patch
> does). IMO the latter is simpler.

Thank you for updating patch.

Followings are random review comments.

==
+   for (cell = list_head(pending); cell; cell = next)

Can we use foreach() instead?
==
+   pending = list_delete_cell(pending, cell, prev);
+
+   if (list_length(pending) == 0)
+   {
+   list_free(pending);
+   return result;  /*
Exit if pending list is empty */
+   }

If pending list become empty after deleting element, we can return.
It's a small optimisation.
==
If num_sync is greater than the number of members of sync standby
list, we'd rather return error message immediately.
Thoughts?
==
I got assertion error when master server is set up with empty s_s_names.
Because current patch always tries to parse s_s_names and use it
regardless value of parameter.

Attached patch incorporates above comments.
Please find it.

Regards,

--
Masahiko Sawada


multi_sync_replication_v17.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Fujii Masao
On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you for the revised patch.

Thanks for reviewing the patch!

> This version looks to focus on n-priority method. Stuffs for the
> other methods like n-quorum has been removed. It is okay for me.

I don't think it's so difficult to extend this version so that
it supports also quorum commit.

> StringInfo for double-quoted names seems to me to be overkill,
> since it allocates 1024 byte block for every such name. A static
> buffer seems enough for the usage as I said.

So, what about changing the scanner code as follows?

{xdstop} {
yylval.str = pstrdup(xdbuf.data);
pfree(xdbuf.data);
BEGIN(INITIAL);
return NAME;

> The parser is called for not only for SIGHUP, but also for
> starting of every walsender. The latter is not necessary but it
> is the matter of trade-off between simplisity and
> effectiveness.

Could you elaborate why you think that's not necessary?

BTW, in previous patch, s_s_names is parsed by postmaster during the server
startup. A child process takes over the internal data struct for the parsed
s_s_names when it's forked by the postmaster. This is what the previous
patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
In that environment, the data struct should be passed to a child process via
the special file (like write_nondefault_variables() does), or it should
be constructed during walsender startup (like latest version of the patch
does). IMO the latter is simpler.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Kyotaro HORIGUCHI
Thank you for the revised patch.

At Tue, 22 Mar 2016 16:02:39 +0900, Fujii Masao  wrote 
in 
> On Thu, Mar 10, 2016 at 7:21 PM, Fujii Masao  wrote:
> Sorry for the delay... Here is the revised version of the patch.
> Please review and test this version!
> BTW, I've not revised the documentation and regression test yet.
> I will do that during the review and test of the patch.

This version looks to focus on n-priority method. Stuffs for the
other methods like n-quorum has been removed. It is okay for me.

So using WalSnd->sync_standby_priority is reasonable. 

SyncRePGetSyncStandbys seems to work as expected, that is,
collecting n standbys in the order of priority, even if multiple
standbys are at the same prioirity, but in (pseudo) random order
among the standbys with the same priority, not LSN order. This is
the difference from the true quoraum method.

About announcement of take over,

>   if (announce_next_takeover && am_sync)
>   {
> announce_next_takeover = false;
> ereport(LOG,
> (errmsg("standby \"%s\" is now the synchronous standby with priority 
> %u",
> application_name, MyWalSnd->sync_standby_priority)));

This can announces for the seemingly same standby successively if
standbys with the same application_name are comming-in and
going-out. But this is the same as the current behavior.

Otherwise, as far as I can see, SyncRepReleaseWaiters seems to
work correctly.


SyncRepinitConfig parses s_s_names then prioritize all walsenders
based on the result. This is run at the start of a walsender and
at reloading of config. Ended walsenders are excluded on
collectiong sync-standbys. All of these seems to work
properly. (as before).

The parser became far simpler by getting rid of the stuffs for
the future expansion. It accepts only '[name, ...]' and the
old s_s_names format.

StringInfo for double-quoted names seems to me to be overkill,
since it allocates 1024 byte block for every such name. A static
buffer seems enough for the usage as I said.

The parser is called for not only for SIGHUP, but also for
starting of every walsender. The latter is not necessary but it
is the matter of trade-off between simplisity and
effectiveness. The same can be said for
check_synchronous_standby_names().

regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-22 Thread Fujii Masao
On Thu, Mar 10, 2016 at 7:21 PM, Fujii Masao  wrote:
> On Fri, Mar 4, 2016 at 3:40 AM, Masahiko Sawada  wrote:
>> On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada  
>> wrote:
>>> Hi,
>>>
>>> Thank you so much for reviewing this patch!
>>>
>>> All review comments regarding document and comment are fixed.
>>> Attached latest v14 patch.
>>>
 This accepts 'abc^Id' as a name, which is wrong behavior (but
 such appliction names are not allowed anyway. If you assume so,
 I'd like to see a comment for that.).
>>>
>>> 'abc^Id' is accepted as application_name, no?
>>> postgres(1)=# set application_name to 'abc^Id';
>>> SET
>>> postgres(1)=# show application_name ;
>>>  application_name
>>> --
>>>  abc^Id
>>> (1 row)
>>>
 addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
 char ychar) requires differnt character types. Is there any reason
 for that?
>>>
>>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>>>
 I personally don't like addlit*string() things for such simple
 syntax but itself is acceptble enough for me. However it uses
 StringInfo to hold double-quoted names, which pallocs 1024 bytes
 of memory chunk for every double-quoted name. The chunks are
 finally stacked up left uncollected until the current
 memorycontext is deleted or reset (It is deleted just after
 finishing config file processing). Addition to that, setting
 s_s_names runs the parser twice. It seems to me too greedy and
 seems that static char [NAMEDATALEN] is enough using the v12 way
 without palloc/repalloc.
>>>
>>> I though that length of group name could be more than NAMEDATALEN, so
>>> I use StringInfo.
>>> Is it not necessary?
>>>
 I found that the name SyncGroupName.wait_num is not
 instinctive. How about sync_num, sync_member_num or
 sync_standby_num? If the last is preferable, .members also should
 be .standbys .
>>>
>>> Thanks, sync_num is preferable to me.
>>>
>>> ===
 I am quite uncomfortable with the existence of
 WanSnd.sync_standby_priority. It represented the pirority in the
 old linear s_s_names format but nested groups or even
 single-level quarum list obviously doesn't fit it. Can we get rid
 of sync_standby_priority, even though we realize atmost
 n-priority for now?
>>>
>>> We could get rid of sync_standby_priority.
>>> But if so, we will not be able to see the next sync standby in
>>> pg_stat_replication system view.
>>> Regarding each node priority, I was thinking that standbys in quorum
>>> list have same priority, and in nested group each standbys are given
>>> the priority starting from 1.
>>>
>>> ===
 The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
 have specific code for every prioritizing method (which are
 priority, quorum, nested and so). Is there any reson to use it as
 a callback of SyncGroupNode?
>>>
>>> The reason why the current code is so is that current code is for only
>>> priority method supporting.
>>> At first version of this feature, I'd like to implement it more simple.
>>>
>>> Aside from this, of course I'm planning to have specific code for nested 
>>> design.
>>> - The group can have some name nodes or group nodes.
>>> - The group can use either 2 types of method: priority or quorum.
>>> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
>>>   - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
>>> at that moment using group's method.
>>>   - SyncRepGetStandbysFn() function returns standbys of its group,
>>> which are considered as sync using group's method.
>>>
>>> For example, s_s_name  = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
>>> memory structure will be,
>>>
>>> "main(quorum)" --- "a"
>>> |
>>> -- "b"
>>> |
>>> -- "group1(priority)" --- "c"
>>>  |
>>>  -- "d"
>>>
>>> When determine synced LSNs, we need to consider group1's LSN using by
>>> priority method at first, and then we can determine main's LSN using
>>> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
>>> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>>>
>>> bool
>>> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
>>> {
>>> sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);
>>>
>>> if (sync_num < group->sync_num)
>>> return false;
>>>
>>> for (each member of sync_list)
>>> {
>>> if (member->type == group node)
>>> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
>>> f into lsn_list.
>>> else
>>> Store name node LSNs into lsn_list.
>>> }
>>>
>>> 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-16 Thread Kyotaro HORIGUCHI
It seems to me a matter of definition of "available replicas".

At Wed, 16 Mar 2016 14:13:48 +1300, Thomas Munro 
 wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-15 Thread Thomas Munro

 Synchronous replication offers the ability to confirm that all changes
-made by a transaction have been transferred to one synchronous standby
-server. This extends the standard level of durability
+made by a transaction have been transferred to one or more
synchronous standby
+server. This extends that standard level of durability
 offered by a transaction commit. This level of protection is referred
-to as 2-safe replication in computer science theory.
+to as group-safe replication in computer science theory.


A message on the -general list today pointed me to some earlier
discussion[1] which quoted and referenced definitions of these
academic terms[2].  I think the above documentation should say:

"This level of protection is referred to as 2-safe replication in
computer science literature when synchronous_commit is
set to on, and group-1-safe (group-safe and 1-safe) when
synchronous_commit is set to remote_write."

By my reading, the situation doesn't actually change with this patch.
It doesn't matter whether you need 1 or 42 synchronous standbys to
make a quorum: 2-safe means durable (fsync) on all of them,
group-1-safe means durable on one server and received (implied by
remote_write) by all of them.

I think we should be using those definitions because Gray's earlier
definition of 2-safe from Transaction Processing 12.6.3 doesn't really
fit:  It can optionally mean remote receipt or remote durable storage,
but it doesn't wait if the 'backup' is down, so it's not the same type
of guarantee.  (He also has 'very safe' which might describe our
syncrep, I'm not sure.)

[1] 
http://www.postgresql.org/message-id/603c8f070812132142n5408e7ddk899e83cddd4cb...@mail.gmail.com
[2] http://infoscience.epfl.ch/record/33053/files/EPFL_TH2577.pdf page 76

On Thu, Mar 10, 2016 at 11:21 PM, Fujii Masao  wrote:
> On Fri, Mar 4, 2016 at 3:40 AM, Masahiko Sawada  wrote:
>> On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada  
>> wrote:
>>> Hi,
>>>
>>> Thank you so much for reviewing this patch!
>>>
>>> All review comments regarding document and comment are fixed.
>>> Attached latest v14 patch.
>>>
 This accepts 'abc^Id' as a name, which is wrong behavior (but
 such appliction names are not allowed anyway. If you assume so,
 I'd like to see a comment for that.).
>>>
>>> 'abc^Id' is accepted as application_name, no?
>>> postgres(1)=# set application_name to 'abc^Id';
>>> SET
>>> postgres(1)=# show application_name ;
>>>  application_name
>>> --
>>>  abc^Id
>>> (1 row)
>>>
 addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
 char ychar) requires differnt character types. Is there any reason
 for that?
>>>
>>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>>>
 I personally don't like addlit*string() things for such simple
 syntax but itself is acceptble enough for me. However it uses
 StringInfo to hold double-quoted names, which pallocs 1024 bytes
 of memory chunk for every double-quoted name. The chunks are
 finally stacked up left uncollected until the current
 memorycontext is deleted or reset (It is deleted just after
 finishing config file processing). Addition to that, setting
 s_s_names runs the parser twice. It seems to me too greedy and
 seems that static char [NAMEDATALEN] is enough using the v12 way
 without palloc/repalloc.
>>>
>>> I though that length of group name could be more than NAMEDATALEN, so
>>> I use StringInfo.
>>> Is it not necessary?
>>>
 I found that the name SyncGroupName.wait_num is not
 instinctive. How about sync_num, sync_member_num or
 sync_standby_num? If the last is preferable, .members also should
 be .standbys .
>>>
>>> Thanks, sync_num is preferable to me.
>>>
>>> ===
 I am quite uncomfortable with the existence of
 WanSnd.sync_standby_priority. It represented the pirority in the
 old linear s_s_names format but nested groups or even
 single-level quarum list obviously doesn't fit it. Can we get rid
 of sync_standby_priority, even though we realize atmost
 n-priority for now?
>>>
>>> We could get rid of sync_standby_priority.
>>> But if so, we will not be able to see the next sync standby in
>>> pg_stat_replication system view.
>>> Regarding each node priority, I was thinking that standbys in quorum
>>> list have same priority, and in nested group each standbys are given
>>> the priority starting from 1.
>>>
>>> ===
 The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
 have specific code for every prioritizing method (which are
 priority, quorum, nested and so). Is there any reson to use it as
 a callback of SyncGroupNode?
>>>
>>> The reason why the current code is so is that current code is for only
>>> priority method 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-10 Thread Fujii Masao
On Fri, Mar 4, 2016 at 3:40 AM, Masahiko Sawada  wrote:
> On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada  
> wrote:
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>>> such appliction names are not allowed anyway. If you assume so,
>>> I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>>  application_name
>> --
>>  abc^Id
>> (1 row)
>>
>>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>>> char ychar) requires differnt character types. Is there any reason
>>> for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>>
>>> I personally don't like addlit*string() things for such simple
>>> syntax but itself is acceptble enough for me. However it uses
>>> StringInfo to hold double-quoted names, which pallocs 1024 bytes
>>> of memory chunk for every double-quoted name. The chunks are
>>> finally stacked up left uncollected until the current
>>> memorycontext is deleted or reset (It is deleted just after
>>> finishing config file processing). Addition to that, setting
>>> s_s_names runs the parser twice. It seems to me too greedy and
>>> seems that static char [NAMEDATALEN] is enough using the v12 way
>>> without palloc/repalloc.
>>
>> I though that length of group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>>
>>> I found that the name SyncGroupName.wait_num is not
>>> instinctive. How about sync_num, sync_member_num or
>>> sync_standby_num? If the last is preferable, .members also should
>>> be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>>> I am quite uncomfortable with the existence of
>>> WanSnd.sync_standby_priority. It represented the pirority in the
>>> old linear s_s_names format but nested groups or even
>>> single-level quarum list obviously doesn't fit it. Can we get rid
>>> of sync_standby_priority, even though we realize atmost
>>> n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node priority, I was thinking that standbys in quorum
>> list have same priority, and in nested group each standbys are given
>> the priority starting from 1.
>>
>> ===
>>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>>> have specific code for every prioritizing method (which are
>>> priority, quorum, nested and so). Is there any reson to use it as
>>> a callback of SyncGroupNode?
>>
>> The reason why the current code is so is that current code is for only
>> priority method supporting.
>> At first version of this feature, I'd like to implement it more simple.
>>
>> Aside from this, of course I'm planning to have specific code for nested 
>> design.
>> - The group can have some name nodes or group nodes.
>> - The group can use either 2 types of method: priority or quorum.
>> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
>>   - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
>> at that moment using group's method.
>>   - SyncRepGetStandbysFn() function returns standbys of its group,
>> which are considered as sync using group's method.
>>
>> For example, s_s_name  = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
>> memory structure will be,
>>
>> "main(quorum)" --- "a"
>> |
>> -- "b"
>> |
>> -- "group1(priority)" --- "c"
>>  |
>>  -- "d"
>>
>> When determine synced LSNs, we need to consider group1's LSN using by
>> priority method at first, and then we can determine main's LSN using
>> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
>> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>>
>> bool
>> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
>> {
>> sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);
>>
>> if (sync_num < group->sync_num)
>> return false;
>>
>> for (each member of sync_list)
>> {
>> if (member->type == group node)
>> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
>> f into lsn_list.
>> else
>> Store name node LSNs into lsn_list.
>> }
>>
>> Determine synced LSNs of this group using lsn_list and priority method.
>> Store synced LSNs into write_lsn and flush_lsn.
>> return true;
>> }
>>
>>> SyncRepClearStandbyGroupList is defined in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-06 Thread Masahiko Sawada
Reply to multiple hackers.
Thank you for reviewing this patch.

> +used.  Priority is given to servers in the order that the appear
> in the list.
>
> s/the appear/they appear/
>
> -The minimum wait time is the roundtrip time between primary to standby.
> +The minimum wait time is the roundtrip time between the primary and the
> +almost synchronous standby.
>
> s/almost/slowest/

Will fix this typo. Thanks!

On Fri, Mar 4, 2016 at 5:22 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> Sorry for long, hard-to-read writings in advance..
>
> At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada  
> wrote in 
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>> > This accepts 'abc^Id' as a name, which is wrong behavior (but
>> > such appliction names are not allowed anyway. If you assume so,
>> > I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>>  application_name
>> --
>>  abc^Id
>> (1 row)
>
> Sorry, I implicitly used "^" in the meaning of "ctrl key". So
> "^I" is so-called Ctrl-I, that is horizontal tab or 0x09. So the
> following in psql shows that.
>
> =# set application_name to E'abc\td';
> =# show application_name ;
>  application_name
> --
>  ab?d
> (1 row)
>
> The  is replaced with '?' (literally) at the time of
> guc assinment.

Oh, I see.
I will comment for that.

>> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> > char ychar) requires differnt character types. Is there any reason
>> > for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
> Umm. My qustion might have been a bit out of the point.
>
> The addlitchar_xd_string(str,unsigned char c) does
> appendStringInfoChar(, c). On the other hand, the signature of
> the function of stringinfo is the following.
>
> AppendStringInfoChar(StringInfo str, char ch);
>
> Of course "char" is equivalent of "signed char" as
> default. addlitchar_xd_string assigns the given character in
> "unsigned char" to the parameter of AppendStringInfoChar of
> "signed char".
>
> These two are incompatible types. Imagine the
> following codelet,
>
> #include 
>
> void hoge(signed char c){
>   int ch = c;
>   fprintf(stderr, "char = %d\n", ch);
> }
>
> int main(void)
> {
>   unsigned char u;
>
>   u = 200;
>   hoge(u);
>   return 0;
> }
>
> The result is -56. So we generally should get rid of such type of
> mixture of signedness for no particular reason.
>
> In this case, the domain of the variable is 0x20-0x7e so no
> problem won't be actualized but also there's no reason for the
> signedness mixture.

Thank you for explanation.
I will fix this.

>> > I personally don't like addlit*string() things for such simple
>> > syntax but itself is acceptble enough for me. However it uses
>> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> > of memory chunk for every double-quoted name. The chunks are
>> > finally stacked up left uncollected until the current
>> > memorycontext is deleted or reset (It is deleted just after
>> > finishing config file processing). Addition to that, setting
>> > s_s_names runs the parser twice. It seems to me too greedy and
>> > seems that static char [NAMEDATALEN] is enough using the v12 way
>> > without palloc/repalloc.
>>
>> I though that length of group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>
> Such long names doesn't seem to necessary. Too long identifiers
> no longer act as identifier for human eyeballs. We are limiting
> the length of identifiers of the whole database system to
> NAMEDATALEN-1, which seems to have been enough so I don't see any
> reason to have a group name longer than that.
>

I see. I will fix this.

>> > I found that the name SyncGroupName.wait_num is not
>> > instinctive. How about sync_num, sync_member_num or
>> > sync_standby_num? If the last is preferable, .members also should
>> > be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>> > I am quite uncomfortable with the existence of
>> > WanSnd.sync_standby_priority. It represented the pirority in the
>> > old linear s_s_names format but nested groups or even
>> > single-level quarum list obviously doesn't fit it. Can we get rid
>> > of sync_standby_priority, even though we realize atmost
>> > n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node priority, I was thinking that standbys in quorum
>> list have same 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-04 Thread Kyotaro HORIGUCHI
Hello,

Sorry for long, hard-to-read writings in advance..

At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada  
wrote in 
> Hi,
> 
> Thank you so much for reviewing this patch!
> 
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
> 
> > This accepts 'abc^Id' as a name, which is wrong behavior (but
> > such appliction names are not allowed anyway. If you assume so,
> > I'd like to see a comment for that.).
> 
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
>  application_name
> --
>  abc^Id
> (1 row)

Sorry, I implicitly used "^" in the meaning of "ctrl key". So
"^I" is so-called Ctrl-I, that is horizontal tab or 0x09. So the
following in psql shows that.

=# set application_name to E'abc\td';
=# show application_name ;
 application_name 
--
 ab?d
(1 row)

The  is replaced with '?' (literally) at the time of
guc assinment.

> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> > char ychar) requires differnt character types. Is there any reason
> > for that?
> 
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.

Umm. My qustion might have been a bit out of the point.

The addlitchar_xd_string(str,unsigned char c) does
appendStringInfoChar(, c). On the other hand, the signature of
the function of stringinfo is the following.

AppendStringInfoChar(StringInfo str, char ch);

Of course "char" is equivalent of "signed char" as
default. addlitchar_xd_string assigns the given character in
"unsigned char" to the parameter of AppendStringInfoChar of
"signed char".

These two are incompatible types. Imagine the
following codelet, 

#include 

void hoge(signed char c){
  int ch = c;
  fprintf(stderr, "char = %d\n", ch);
}

int main(void)
{
  unsigned char u;

  u = 200;
  hoge(u);
  return 0;
}

The result is -56. So we generally should get rid of such type of
mixture of signedness for no particular reason.

In this case, the domain of the variable is 0x20-0x7e so no
problem won't be actualized but also there's no reason for the
signedness mixture.

> > I personally don't like addlit*string() things for such simple
> > syntax but itself is acceptble enough for me. However it uses
> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
> > of memory chunk for every double-quoted name. The chunks are
> > finally stacked up left uncollected until the current
> > memorycontext is deleted or reset (It is deleted just after
> > finishing config file processing). Addition to that, setting
> > s_s_names runs the parser twice. It seems to me too greedy and
> > seems that static char [NAMEDATALEN] is enough using the v12 way
> > without palloc/repalloc.
> 
> I though that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?

Such long names doesn't seem to necessary. Too long identifiers
no longer act as identifier for human eyeballs. We are limiting
the length of identifiers of the whole database system to
NAMEDATALEN-1, which seems to have been enough so I don't see any
reason to have a group name longer than that.

> > I found that the name SyncGroupName.wait_num is not
> > instinctive. How about sync_num, sync_member_num or
> > sync_standby_num? If the last is preferable, .members also should
> > be .standbys .
> 
> Thanks, sync_num is preferable to me.
> 
> ===
> > I am quite uncomfortable with the existence of
> > WanSnd.sync_standby_priority. It represented the pirority in the
> > old linear s_s_names format but nested groups or even
> > single-level quarum list obviously doesn't fit it. Can we get rid
> > of sync_standby_priority, even though we realize atmost
> > n-priority for now?
> 
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.

As far as I can see the varialbe is referred to as a boolean to
indicate whether a walsernder is connected to a candidate
synchronous standby. So the value is totally useless, at least
for now. However, SyncRepRelaseWaiters uses the value to check if
the synced LSNs can be advaned by a walsender so the variable is
useful as a boolean.

In the previous versions, the reason why WanSnd had the priority
value is that a pair of synchronized LSNs is determined only by
one wansender, which has the highest priority among active
wansenders. So even if a walsender receives a response from
walreceiver, it doesn't need to do nothing if it is not at the
highest priority. It's a simple world.

In the quorum commit 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-03 Thread Thomas Munro
On Fri, Mar 4, 2016 at 7:40 AM, Masahiko Sawada  wrote:
> Previous patch has bug around GUC parameter handling.
> Attached updated version.

I spotted a couple of typos:

+used.  Priority is given to servers in the order that the appear
in the list.

s/the appear/they appear/

-The minimum wait time is the roundtrip time between primary to standby.
+The minimum wait time is the roundtrip time between the primary and the
+almost synchronous standby.

s/almost/slowest/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-03 Thread Masahiko Sawada
On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada  wrote:
> Hi,
>
> Thank you so much for reviewing this patch!
>
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
>
>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>> such appliction names are not allowed anyway. If you assume so,
>> I'd like to see a comment for that.).
>
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
>  application_name
> --
>  abc^Id
> (1 row)
>
>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> char ychar) requires differnt character types. Is there any reason
>> for that?
>
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
>> I personally don't like addlit*string() things for such simple
>> syntax but itself is acceptble enough for me. However it uses
>> StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> of memory chunk for every double-quoted name. The chunks are
>> finally stacked up left uncollected until the current
>> memorycontext is deleted or reset (It is deleted just after
>> finishing config file processing). Addition to that, setting
>> s_s_names runs the parser twice. It seems to me too greedy and
>> seems that static char [NAMEDATALEN] is enough using the v12 way
>> without palloc/repalloc.
>
> I though that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?
>
>> I found that the name SyncGroupName.wait_num is not
>> instinctive. How about sync_num, sync_member_num or
>> sync_standby_num? If the last is preferable, .members also should
>> be .standbys .
>
> Thanks, sync_num is preferable to me.
>
> ===
>> I am quite uncomfortable with the existence of
>> WanSnd.sync_standby_priority. It represented the pirority in the
>> old linear s_s_names format but nested groups or even
>> single-level quarum list obviously doesn't fit it. Can we get rid
>> of sync_standby_priority, even though we realize atmost
>> n-priority for now?
>
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.
>
> ===
>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>> have specific code for every prioritizing method (which are
>> priority, quorum, nested and so). Is there any reson to use it as
>> a callback of SyncGroupNode?
>
> The reason why the current code is so is that current code is for only
> priority method supporting.
> At first version of this feature, I'd like to implement it more simple.
>
> Aside from this, of course I'm planning to have specific code for nested 
> design.
> - The group can have some name nodes or group nodes.
> - The group can use either 2 types of method: priority or quorum.
> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
>   - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
> at that moment using group's method.
>   - SyncRepGetStandbysFn() function returns standbys of its group,
> which are considered as sync using group's method.
>
> For example, s_s_name  = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
> memory structure will be,
>
> "main(quorum)" --- "a"
> |
> -- "b"
> |
> -- "group1(priority)" --- "c"
>  |
>  -- "d"
>
> When determine synced LSNs, we need to consider group1's LSN using by
> priority method at first, and then we can determine main's LSN using
> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>
> bool
> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
> {
> sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);
>
> if (sync_num < group->sync_num)
> return false;
>
> for (each member of sync_list)
> {
> if (member->type == group node)
> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
> f into lsn_list.
> else
> Store name node LSNs into lsn_list.
> }
>
> Determine synced LSNs of this group using lsn_list and priority method.
> Store synced LSNs into write_lsn and flush_lsn.
> return true;
> }
>
>> SyncRepClearStandbyGroupList is defined in syncrep.c but the
>> other related functions are defined in syncgroup_gram.y. It would
>> be better to place them together.
>
> SyncRepClearStandbyGroupList() is used by
> check_synchronous_standby_names(), 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-03 Thread Masahiko Sawada
Hi,

Thank you so much for reviewing this patch!

All review comments regarding document and comment are fixed.
Attached latest v14 patch.

> This accepts 'abc^Id' as a name, which is wrong behavior (but
> such appliction names are not allowed anyway. If you assume so,
> I'd like to see a comment for that.).

'abc^Id' is accepted as application_name, no?
postgres(1)=# set application_name to 'abc^Id';
SET
postgres(1)=# show application_name ;
 application_name
--
 abc^Id
(1 row)

> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> char ychar) requires differnt character types. Is there any reason
> for that?

Because addlit_xd_string() is for adding string(char *) to xd_string,
OTOH addlit_xd_char() is for adding just one character to xd_string.

> I personally don't like addlit*string() things for such simple
> syntax but itself is acceptble enough for me. However it uses
> StringInfo to hold double-quoted names, which pallocs 1024 bytes
> of memory chunk for every double-quoted name. The chunks are
> finally stacked up left uncollected until the current
> memorycontext is deleted or reset (It is deleted just after
> finishing config file processing). Addition to that, setting
> s_s_names runs the parser twice. It seems to me too greedy and
> seems that static char [NAMEDATALEN] is enough using the v12 way
> without palloc/repalloc.

I though that length of group name could be more than NAMEDATALEN, so
I use StringInfo.
Is it not necessary?

> I found that the name SyncGroupName.wait_num is not
> instinctive. How about sync_num, sync_member_num or
> sync_standby_num? If the last is preferable, .members also should
> be .standbys .

Thanks, sync_num is preferable to me.

===
> I am quite uncomfortable with the existence of
> WanSnd.sync_standby_priority. It represented the pirority in the
> old linear s_s_names format but nested groups or even
> single-level quarum list obviously doesn't fit it. Can we get rid
> of sync_standby_priority, even though we realize atmost
> n-priority for now?

We could get rid of sync_standby_priority.
But if so, we will not be able to see the next sync standby in
pg_stat_replication system view.
Regarding each node priority, I was thinking that standbys in quorum
list have same priority, and in nested group each standbys are given
the priority starting from 1.

===
> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
> have specific code for every prioritizing method (which are
> priority, quorum, nested and so). Is there any reson to use it as
> a callback of SyncGroupNode?

The reason why the current code is so is that current code is for only
priority method supporting.
At first version of this feature, I'd like to implement it more simple.

Aside from this, of course I'm planning to have specific code for nested design.
- The group can have some name nodes or group nodes.
- The group can use either 2 types of method: priority or quorum.
- The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
  - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
at that moment using group's method.
  - SyncRepGetStandbysFn() function returns standbys of its group,
which are considered as sync using group's method.

For example, s_s_name  = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
memory structure will be,

"main(quorum)" --- "a"
|
-- "b"
|
-- "group1(priority)" --- "c"
 |
 -- "d"

When determine synced LSNs, we need to consider group1's LSN using by
priority method at first, and then we can determine main's LSN using
by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
So SyncRepGetSyncedLsnsUsingPriority() function would be,

bool
SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
{
sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);

if (sync_num < group->sync_num)
return false;

for (each member of sync_list)
{
if (member->type == group node)
call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
f into lsn_list.
else
Store name node LSNs into lsn_list.
}

Determine synced LSNs of this group using lsn_list and priority method.
Store synced LSNs into write_lsn and flush_lsn.
return true;
}

> SyncRepClearStandbyGroupList is defined in syncrep.c but the
> other related functions are defined in syncgroup_gram.y. It would
> be better to place them together.

SyncRepClearStandbyGroupList() is used by
check_synchronous_standby_names(), so I put this function syncrep.c.

> SyncRepStandbys are to be in multilevel and the struct is
> naturally allowed to be so but SyncRepClearStandbyGroupList
> assumes it in single level.

Because I think that we don't need to implement to fully support
nested style at first 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-02 Thread Thomas Munro
On Sun, Feb 28, 2016 at 8:04 AM, Masahiko Sawada  wrote:
> Attached latest version patch.
>
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.
>
> Please review it.
>
> [000_multi_sync_replication_v13.patch]

Hi Masahiko,

Hi,

I have a couple of small suggestions for the documentation and comments:

+Specifies a standby names that can support
synchronous replication using
+either two types of syntax; comma-separated list or dedicated
language, as
+described in .
+Transcations waiting for commit will be allowed to proceed after the
+specified number of standby servers confirms receipt of their data.

Suggestion: Specifies the standby names that can support
synchronous replication using either of two syntaxes: a
comma-separated list, or a more flexible syntax described in .  Transactions waiting for commit
will be allowed to proceed after a configurable subset of standby
servers confirms receipt of their data.  For the simple
comma-separated list syntax, it is one server.

+If the current any of synchronous standbys disconnects for
whatever reason,

s/the current any of/any of the current/

+no mechanism to enforce uniqueness. For each specified standby name,
+only the specified count of standbys will be chosen to be synchronous
+standbys, though exactly which one is indeterminate, the rest will
+represent potential synchronous standbys.

s/one/ones/
s/indeterminate, the/indeterminate.  The/

+made by a transcation have been transferred to one or more
synchronous standby
+server. This extends that standard levelof durability

s/transcation/transaction/
s/that standard levelof/the standard level of/

 offered by a transaction commit. This level of protection is referred
 to as 2-safe replication in computer science theory.

Is this still called "2-safe" or does this patch make it "N-safe",
"group-safe", or something else?

-The minimum wait time is the roundtrip time between primary to standby.
+The minimum wait time is the roundtrip time between primary to standbys.

Suggestion: The minimum wait time is the roundtrip time between the
primary and the slowest synchronous standby.

+Multiple synchronous replication is set up by setting 
+using dedicated language. The syntax of dedicated language is following.

Suggestion:  Multiple synchronous replication is set up by setting
 using the following
syntax.

+Using dedicated language, we can define a synchronous group with
a number N.
+synchronous group can have some members which are consdiered as
synchronous standby using comma-separated list.
+Any standby name is accepted at any position of its list, but '*'
is accepted at only tailing of the standby list.
+The leading N is a number which specifies that how many standbys
the master server waits to commit for. This number
+must be less than actual number of members of its group.
+The listed standby are given highest priority from left defined
starting with 1.

Suggestion: This syntax allows us to define a synchronous group that
will wait for at least N standbys, and a comma-separated list of group
members.  The special value * is accepted at the tail of
the member list, and matches any standby.  The number N must not be
greater than the number of members listed in the group, unless
* is used.  Priority is given to servers in the order that
they appear in the list.  The first named server has the highest
priority.

+All ASCII characters except for special characters(',', '',
'[', ']', ' ') are allowed as standby name.
+When these special characters are used as standby name, whole
standby name string need to be written in
+double-quoted representation.

Suggestion:  ... are allowed in unquoted standby names.  To use these
special characters, the standby name should be enclosed in double
quotes.

+ * In 9.5 we support the possibility to have multiple synchronous standbys,

s/9.5/9.6/

+ * as defined in synchronous_standby_names. Before on standby can become a

s/ on / a /

+ * Waiters will be released from the queue once the number of standbys
+ * specified in synchronous_standby_names have caught.

s/caught/processed the commit record/

+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority.

s/having any priority/having a non-zero priority (meaning it is
configured as potential sync standby)./

- announce_next_takeover = true;

By removing this, haven't we lost the ability to announce takeover
more than once per walsender?  I'm not sure exactly where this should
go now but the walsender needs to detect its own transition from
potential to sync state.  Also, that message, where it appears below
should probably be tweaked slightly s/the/a/, so "standby \"%s\" is
now a synchronous standby with priority %u", not 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-29 Thread Kyotaro HORIGUCHI
Sorry, I misread the previous patch. It actually worked.


At Sun, 28 Feb 2016 04:04:37 +0900, Masahiko Sawada  
wrote in 
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.

Thank you for the new patch. The parser almost looks to work as
expected, but the following warnings were seen on build.

> In file included from syncgroup_gram.y:138:0:
> syncgroup_scanner.l:23:12: warning: ‘xd_size’ defined but not used 
> [-Wunused-variable]
>  static int xd_size; /* actual size of xd_string */
> ^
> syncgroup_scanner.l:24:12: warning: ‘xd_len’ defined but not used 
> [-Wunused-variable]
>  static int xd_len; /* string length of xd_string  */


Some random comments follow.


Commnents for the lexer part.

===
> +node_name[^\ \,\[\]]

This accepts 'abc^Id' as a name, which is wrong behavior (but
such appliction names are not allowed anyway. If you assume so,
I'd like to see a comment for that.). And the excessive escaping
make it hard to read a bit.  The pattern can be written as the
following more precisely. (but I don't know whether it is
generally easy to read..)

| node_name [\x20-\x7f]{-}[ \[\],]

===
The pattern name {node_name} gives me a bit
uneasiness. node_name_cont or name_chars would be preferable.

===
> [1-9][0-9]* {

I see no necessity to inhibit 0-prefixed integers as NUM. Would
you mind allowing [0-9]+ there?

===
addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
char ychar) requires differnt character types. Is there any reason
for that?

===
I personally don't like addlit*string() things for such simple
syntax but itself is acceptble enough for me. However it uses
StringInfo to hold double-quoted names, which pallocs 1024 bytes
of memory chunk for every double-quoted name. The chunks are
finally stacked up left uncollected until the current
memorycontext is deleted or reset (It is deleted just after
finishing config file processing). Addition to that, setting
s_s_names runs the parser twice. It seems to me too greedy and
seems that static char [NAMEDATALEN] is enough using the v12 way
without palloc/repalloc.


Comments for parser part.

===
The rule "result" in syncgruop_gram.y sets malloced chunk to
SyncRepStandbys ignoring exiting content so repetitive setting to
the gud s_s_names causes a memory leak. Using
SyncRepClearStandbyGroupList would be enough.

===
The meaning of SyncGroupNode.type seems oscure. The member seems
to be referred to decide how to treat the node, but the following
code will break the assumption.

> group_node->type = SYNC_REP_GROUP_GROUP | SYNC_REP_GROUP_MAIN;

It seems me that *_MAIN is an equivalent of *_GROUP &&
sync_method = *_PRIORITY. If so, *_MAIN is useless. The reader of
SyncGroupNode needs not to see wheter it was in traditional
s_s_names or in new format.

===
Bare names in s_s_names are down-cased and double-quoted ones are
not. The parser of this patch doesn't for both.

===
xd_stringdup() doesn't make a copy of the string against its
name. It's error-prone.

===
I found that the name SyncGroupName.wait_num is not
instinctive. How about sync_num, sync_member_num or
sync_standby_num? If the last is preferable, .members also should
be .standbys .


Comment for the quorum commit body part.
===
I am quite uncomfortable with the existence of
WanSnd.sync_standby_priority. It represented the pirority in the
old linear s_s_names format but nested groups or even
single-level quarum list obviously doesn't fit it. Can we get rid
of sync_standby_priority, even though we realize atmost
n-priority for now?

===
The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
have specific code for every prioritizing method (which are
priority, quorum, nested and so). Is there any reson to use it as
a callback of SyncGroupNode?



Others - random commnets
===
SyncRepClearStandbyGroupList is defined in syncrep.c but the
other related functions are defined in syncgroup_gram.y. It would
be better to place them together.

===
SyncRepStandbys are to be in multilevel and the struct is
naturally allowed to be so but SyncRepClearStandbyGroupList
assumes it in single level. Make the function to free multilevel
or explicitly inhibit multilevel using asserttion.

===
-   errdetail("The transaction has already committed locally, but might 
not have been replicated to the standby.")));
+   errdetail("The transaction has already committed locally, but might 
not have been replicated to the standby(s).")));

The message doesn't contain specific number of standbys so just
using plural seems to be enough for me. And besides, the message
should describe the situation more precisely. Word correction is
left to anyone else:)

+   errdetail("The transaction has already committed locally, but might 
not have been replicated to some of the required standbys.")));

===
+ * 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-27 Thread Masahiko Sawada
On Fri, Feb 26, 2016 at 10:53 AM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 26 Feb 2016 10:38:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20160226.103822.12680005.horiguchi.kyot...@lab.ntt.co.jp>
>> Hello, Thanks for the new patch.
>>
>>
>> At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada  
>> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-25 Thread Kyotaro HORIGUCHI
At Fri, 26 Feb 2016 10:38:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160226.103822.12680005.horiguchi.kyot...@lab.ntt.co.jp>
> Hello, Thanks for the new patch.
> 
> 
> At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-25 Thread Kyotaro HORIGUCHI
Hello, Thanks for the new patch.


At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-25 Thread Masahiko Sawada
On Fri, Feb 26, 2016 at 1:23 AM, Masahiko Sawada  wrote:
> Attached latest patch includes document patch.
>
>> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
>> the server crashed unexpectedly with the following error message.
>> This is obviously a bug.
>
> Fixed.
>
>>   - allows any byte except a double quote in double-quoted
>>representation. A double-quote just after a delimiter can open
>>quoted representation.
>
> No. double quote is also allowed in double-quoted representation using
> by two double-quotes.
> if s_s_names = '"node""hoge"' then standby name will be 'node"hoge'.
>
>>
>> I have no problem with it. The attached new sample parser does
>> so.
>>
>> By the way, your parser also complains for an example I've seen
>> somewhere upthread "1[2,3,4]". This is because '2', '3' and '4'
>> are regarded as INT, not NAME. Whether a sequence of digits is a
>> prefix number of a list or a host name cannot be identified until
>> reading some following characters. So my previous test.l defined
>> NAME_OR_INTEGER and it is distinguished in the grammar side to
>> resolve this problem.
>>
>> If you want them identified in the lexer side, it should do
>> looking-forward as {prefix} in the attached
>> test.l does. This makes the lexer a bit complex but in contrast
>> test.y simpler. The test.l, test.y attached got refactored but .l
>> gets a bit tricky..
>
> I think that lexer can pass both INT and NAME as char* to parser, and
> then parser regards them as integer or char*.
> It would be more simple.
> Thoughts?
>
> Thank you for giving lexer and parser example but I'm not sure that it
> makes thing more easier.
> It seems to make thing more complex.
>
> Attached patch handles parameter using similar way as postgres parses SQL.
> Please having a look it and give me feedbacks.
>

Previous patch could not parse one character standby name correctly.
Attached latest patch.

Regards,

--
Masahiko Sawada


000_multi_sync_replication_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-25 Thread Masahiko Sawada
Attached latest patch includes document patch.

> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
> the server crashed unexpectedly with the following error message.
> This is obviously a bug.

Fixed.

>   - allows any byte except a double quote in double-quoted
>representation. A double-quote just after a delimiter can open
>quoted representation.

No. double quote is also allowed in double-quoted representation using
by two double-quotes.
if s_s_names = '"node""hoge"' then standby name will be 'node"hoge'.

>
> I have no problem with it. The attached new sample parser does
> so.
>
> By the way, your parser also complains for an example I've seen
> somewhere upthread "1[2,3,4]". This is because '2', '3' and '4'
> are regarded as INT, not NAME. Whether a sequence of digits is a
> prefix number of a list or a host name cannot be identified until
> reading some following characters. So my previous test.l defined
> NAME_OR_INTEGER and it is distinguished in the grammar side to
> resolve this problem.
>
> If you want them identified in the lexer side, it should do
> looking-forward as {prefix} in the attached
> test.l does. This makes the lexer a bit complex but in contrast
> test.y simpler. The test.l, test.y attached got refactored but .l
> gets a bit tricky..

I think that lexer can pass both INT and NAME as char* to parser, and
then parser regards them as integer or char*.
It would be more simple.
Thoughts?

Thank you for giving lexer and parser example but I'm not sure that it
makes thing more easier.
It seems to make thing more complex.

Attached patch handles parameter using similar way as postgres parses SQL.
Please having a look it and give me feedbacks.

Regards,

--
Masahiko Sawada


000_multi_sync_replication_v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-25 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 24 Feb 2016 18:01:59 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-24 Thread Masahiko Sawada
On Wed, Feb 24, 2016 at 5:37 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> Ok, I think we should concentrate the parser part for now.
>
> At Tue, 23 Feb 2016 17:44:44 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20160223.17.178687579.horiguchi.kyot...@lab.ntt.co.jp>
>> Hello,
>>
>> At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  
>> wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-24 Thread Kyotaro HORIGUCHI
Hello,

Ok, I think we should concentrate the parser part for now.

At Tue, 23 Feb 2016 17:44:44 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160223.17.178687579.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  wrote 
> in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-23 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-22 Thread Fujii Masao
On Tue, Feb 16, 2016 at 4:19 PM, Masahiko Sawada  wrote:
> On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
>  wrote:
>> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>>> Surprizingly yes. The list is handled as an identifier list and
>>> parsed by SplitIdentifierString thus it can accept double-quoted
>>> names.
>>
>
> Attached latest version patch which has only feature logic so far.
> I'm writing document patch about this feature now, so this version
> patch doesn't have document and regression test patch.

Thanks for updating the patch!

When I changed s_s_names to 'hoge*' and reloaded the configuration file,
the server crashed unexpectedly with the following error message.
This is obviously a bug.

FATAL:  syntax error

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-18 Thread Kharage, Suraj
Hello,

>Remaining tasks are;
>- Document patch.
>- Regression test patch.
>- Syntax error message for s_s_names improvement.

Please find patch attached for regression test for multisync replication.
I have created this patch over Michael's recovery-test-suite patch.
Please review it.

Regards
Suraj kharage

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

regression_test_for_multisync_rep_v2.patch
Description: regression_test_for_multisync_rep_v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-15 Thread Masahiko Sawada
On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
 wrote:
> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>> Surprizingly yes. The list is handled as an identifier list and
>> parsed by SplitIdentifierString thus it can accept double-quoted
>> names.
>

Attached latest version patch which has only feature logic so far.
I'm writing document patch about this feature now, so this version
patch doesn't have document and regression test patch.

> | $ postgres
> | FATAL:  syntax error: unexpected character "*"
> Mmm.. It should be tough to find what has happened..

I'm trying to implement better error message, but that change is not
included in this version patch yet.

> malloc/free are used in create_name_node and other functions to
> be used in scanner, but syncgroup_gram.y is said to use
> palloc/pfree. Maybe they should use the same memory
> allocation/freeing functions.

Setting like this, I think that we use malloc/free funcion when we
allocate/free memory for SyncRepStandbys variables.
OTOH, we use palloc/pfree function during parsing SyncRepStandbyString.
Am I missing something?

> I suppose SyncRepGetSyncedLsnsFn, or SyncRepGetSyncedLsnsPriority
> can return InvalidXLogRecPtr as cur_*_pos even when it returns
> true. And, I suppose comparison of LSN values with
> InvalidXLogRecPtr is not well-defined. Anyway the condition goes
> wrong when cur_write_pos = InvalidXLogRecPtr (but ret = true).

In this version patch, it's not possible to return InvalidXLogRecPtr
with got_lsns = false (was ret = false).
So we can ensure that we got valid LSNs when got_lsns = true.

> At a glance, SyncRepGetSyncedLsnsPriority and
> SyncRepGetSyncStandbysPriority does almost the same thing and both
> runs loops over group members. Couldn't they run at once?

Yeah, I've optimized that logic.

> We may want to be careful with the use of '[' in application_name.
> I am not much thrilled with forbidding the use of []() in application_name, 
> so we may
> want to recommend user to use a backslash when using s_s_names when a
> group is defined.
> s_s_names='abc, def, " abc,""def"'
>
> Result list is ["abc", "def", " abc,\"def"]
>
> Simplly supporting the same notation addresses the problem and
> accepts strings like the following.
>
> s_s_names='2["comma,name", "foo[bar,baz]"]'

I've changed s_s_names parser so that it can handle special 4
characters (\,\ \[\]) and can handle double-quoted string accurately
same as what SplitIdentifierString does.
We can not use special 4 characters (\,\ \[ \]) without using
double-quoted string. Also if we use "(double-quote) character in
double-quoted string, we should use ""(double double-quotes).
For example,
if application_name = 'hoge " bar', s_s_name = '"hoge "" bar"' would be matched.

Other given comments are fixed.

Remaining tasks are;
- Document patch.
- Regression test patch.
- Syntax error message for s_s_names improvement.

Regards,

--
Masahiko Sawada


000_multi_sync_replication_v10.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> Surprizingly yes. The list is handled as an identifier list and
> parsed by SplitIdentifierString thus it can accept double-quoted
> names.

Good point. I was not aware of this trick.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-14 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 10 Feb 2016 18:36:43 +0900, Michael Paquier  
wrote in 
> On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
>  wrote:
> > > > +sync_node_group:
> > > > +   sync_list   { $$ = create_group_node(1, 
> > > > $1);
> > > > }
> > > > +   |   sync_element_ast{ $$ = create_group_node(1,
> > > > $1);}
> > > > +   |   INT '[' sync_list ']'   { $$ = create_group_node($1,
> > > > $3);}
> > > > +   |   INT '[' sync_element_ast ']'{ $$ = create_group_node($1,
> > > > $3); }
> > > > We may want to be careful with the use of '[' in application_name. I am 
> > > > not
> > > > much thrilled with forbidding the use of []() in application_name, so 
> > > > we may
> > > > want to recommend user to use a backslash when using s_s_names when a 
> > > > group
> > > > is defined.
> >
> > Mmmm. I found that application_name can contain
> > commas. Furthermore, there seems to be no limitation for
> > character in the name.
> >
> > postgres=# set application_name='ho,ge';
> > postgres=# select application_name from pg_stat_activity;
> >  application_name
> > --
> >  ho,ge
> >
> > check_application_name() allows all characters in the range
> > between 32 to 126 in ascii. All other characters are replaced
> > with '?'.
> 
> Actually I was thinking about that a couple of hours ago. If the
> application_name of a node has a comma, it cannot become a sync
> replica, no? Wouldn't we need a special handling in s_s_names like
> '\,' make a comma part of an application name? Or just ban commas from
> the list of supported characters in the application name?

Surprizingly yes. The list is handled as an identifier list and
parsed by SplitIdentifierString thus it can accept deouble-quoted
names.

s_s_names='abc, def, " abc,""def"'

Result list is ["abc", "def", " abc,\"def"]

Simplly supporting the same notation addresses the problem and
accepts strings like the following.

s_s_names='2["comma,name", "foo[bar,baz]"]'


It is currently an undocumented behavior but I doubt the
necessity to have an explict mention.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-12 Thread Robert Haas
On Thu, Feb 11, 2016 at 5:40 PM, Michael Paquier
 wrote:
> On Fri, Feb 12, 2016 at 2:56 AM, Robert Haas  wrote:
>> On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
>>  wrote:
>>> So, here are some thoughts to make that more user-friendly. I think
>>> that the critical issue here is to properly flatten the meta data in
>>> the custom language and represent it properly in a new catalog,
>>> without messing up too much with the existing pg_stat_replication that
>>> people are now used to for 5 releases since 9.0.
>>
>> Putting the metadata in a catalog doesn't seem great because that only
>> can ever work on the master.  Maybe there's no need to configure this
>> on the slaves and therefore it's OK, but I feel nervous about putting
>> cluster configuration in catalogs.  Another reason for that is that if
>> synchronous replication is broken, then you need a way to change the
>> catalog, which involves committing a write transaction; there's a
>> danger that your efforts to do this will be tripped up by the broken
>> synchronous replication configuration.
>
> I was referring to a catalog view that parses the information related
> to groups of s_s_names in a flattened way to show each group sync
> status. Perhaps my words should have been clearer.

Ah.  Well, that's different, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-11 Thread Michael Paquier
On Fri, Feb 12, 2016 at 2:56 AM, Robert Haas  wrote:
> On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
>  wrote:
>> So, here are some thoughts to make that more user-friendly. I think
>> that the critical issue here is to properly flatten the meta data in
>> the custom language and represent it properly in a new catalog,
>> without messing up too much with the existing pg_stat_replication that
>> people are now used to for 5 releases since 9.0.
>
> Putting the metadata in a catalog doesn't seem great because that only
> can ever work on the master.  Maybe there's no need to configure this
> on the slaves and therefore it's OK, but I feel nervous about putting
> cluster configuration in catalogs.  Another reason for that is that if
> synchronous replication is broken, then you need a way to change the
> catalog, which involves committing a write transaction; there's a
> danger that your efforts to do this will be tripped up by the broken
> synchronous replication configuration.

I was referring to a catalog view that parses the information related
to groups of s_s_names in a flattened way to show each group sync
status. Perhaps my words should have been clearer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-11 Thread Robert Haas
On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
 wrote:
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0.

Putting the metadata in a catalog doesn't seem great because that only
can ever work on the master.  Maybe there's no need to configure this
on the slaves and therefore it's OK, but I feel nervous about putting
cluster configuration in catalogs.  Another reason for that is that if
synchronous replication is broken, then you need a way to change the
catalog, which involves committing a write transaction; there's a
danger that your efforts to do this will be tripped up by the broken
synchronous replication configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-10 Thread Michael Paquier
On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
 wrote:
>
> Hello,
>
> At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-10 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Michael Paquier
On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
> Also, to be frank, I think we ought to be putting more effort into
> another patch in this same area, specifically Thomas Munro's causal
> reads patch.  I think a lot of people today are trying to use
> synchronous replication to build load-balancing clusters and avoid the
> problem where you write some data and then read back stale data from a
> standby server.  Of course, our current synchronous replication
> facilities make no such guarantees - his patch does, and I think
> that's pretty important.  I'm not saying that we shouldn't do this
> too, of course.

Yeah, sure. Each one of those patches is trying to solve a different
problem where Postgres is deficient, here we'd like to be sure a
commit WAL record is correctly flushed on multiple standbys, while the
patch of Thomas is trying to ensure that there is no need to scan for
the replay position of a standby using some GUC parameters and a
validation/sanity layer in syncrep.c to do that. Surely the patch of
this thread has got more attention than Thomas', and both of them have
merits and try to address real problems. FWIW, the patch of Thomas is
a topic that I find rather interesting, and I am planning to look at
it as well, perhaps for next CF or even before that. We'll see how
other things move on.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Masahiko Sawada
On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
 wrote:
> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
>> Also, to be frank, I think we ought to be putting more effort into
>> another patch in this same area, specifically Thomas Munro's causal
>> reads patch.  I think a lot of people today are trying to use
>> synchronous replication to build load-balancing clusters and avoid the
>> problem where you write some data and then read back stale data from a
>> standby server.  Of course, our current synchronous replication
>> facilities make no such guarantees - his patch does, and I think
>> that's pretty important.  I'm not saying that we shouldn't do this
>> too, of course.
>
> Yeah, sure. Each one of those patches is trying to solve a different
> problem where Postgres is deficient, here we'd like to be sure a
> commit WAL record is correctly flushed on multiple standbys, while the
> patch of Thomas is trying to ensure that there is no need to scan for
> the replay position of a standby using some GUC parameters and a
> validation/sanity layer in syncrep.c to do that. Surely the patch of
> this thread has got more attention than Thomas', and both of them have
> merits and try to address real problems. FWIW, the patch of Thomas is
> a topic that I find rather interesting, and I am planning to look at
> it as well, perhaps for next CF or even before that. We'll see how
> other things move on.

Attached first version dedicated language patch (document patch is not yet.)

This patch supports only 1-nest priority method, but this feature will
be expanded with adding quorum method or > 1 level nesting.
So this patch are implemented while being considered about its extensibility.
And I've implemented the new system view we discussed on this thread
but that feature is not included in this patch (because it's not
necessary yet now)

== Syntax ==
s_s_names can have two type syntaxes like follows,

1. s_s_names = 'node1, node2, node3'
2. s_s_names = '2[node1, node2, node3]'

#1 syntax is for backward compatibility, which implies the master
server wait for only 1 server.
#2 syntax is new syntax using dedicated language.

In above #2 setting, node1 standby has lowest priority and node3
standby has highest priority.
And master server will wait for COMMIT until at least 2 lowest
priority standbys send ACK to master.

== Memory Structure ==
Previously, master server has value of s_s_names as string, and used
it when master server determine standby priority.
This patch changed it so that master server has new memory structure
(called SyncGroupNode) in order to be able to handle multiple (and
nested in the future) standby nodes flexibly.
All information of SyncGroupNode are set during parsing s_s_names.

The memory structure is,

structSyncGroupNode
{
   /* Common information */
   inttype;
   char*name;
   SyncGroupNode*next; /* same group next name node */

   /* For group ndoe */
   int sync_method; /* priority */
   intwait_num;
   SyncGroupNode*member; /* member of its group */
   bool (*SyncRepGetSyncedLsnsFn) (SyncGroupNode *group, XLogRecPtr *write_pos,
   XLogRecPtr *flush_pos);
   int (*SyncRepGetSyncStandbysFn) (SyncGroupNode *group, int *list);
};

SyncGroupNode can be different two types; name node, group node, and
have pointer to another name/group node in same group and list of
group members.
name node represents a synchronous standby.
group node represents a group of some name nodes, which can have list
of group member, and synchronous method, number of waiting node.
The list of members are linked with one-way list, and are located in
s_s_names definition order.
e.g. in case of above #2 setting, member list could be,

"main".member -> "node1".next -> "node2".next -> "node3".next -> NULL

The most top level node is always "main" group node. i.g., in this
version patch, only 1 group ("main" group) is created which has some
name nodes (not group node).
And group node has two functions pointer;

* SyncRepGetSyncedLsnsFn
This function decides group write/flush LSNs at that moment.
For example in case of priority method, the lowest LSNs of standbys
that are considered as synchronous should be selected.
If there are not synchronous standbys enough to decide LSNs then this
function return false.

* SyncRepGetSyncStandbysFn :
This function obtains array of walsnd positions of its standby members
that are considered as synchronous.

This implementation might not good in some reason, so please give me feedbacks.
And I will create new commitfest entry for this patch to CF5.

Regards,

--
Masahiko Sawada
diff --git a/src/backend/Makefile b/src/backend/Makefile
index b3d5e2e..3e36686 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -203,7 +203,7 @@ distprep:
 	$(MAKE) -C parser	gram.c gram.h scan.c
 	$(MAKE) -C bootstrap	bootparse.c bootscanner.c
 	$(MAKE) -C catalog	schemapg.h postgres.bki postgres.description 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Michael Paquier
On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao  wrote:
> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada  
> wrote:
>> Attached first version dedicated language patch (document patch is not yet.)
>
> Thanks for the patch! Will review it.
>
> I think that it's time to write the documentation patch.
>
> Though I've not read the patch yet, I found that your patch
> changed s_s_names so that it rejects non-alphabet character
> like *, according to my simple test. It should accept any
> application_name which we can use.

Cool. Planning to look at it as well. Could you as well submit a
regression test based on the recovery infrastructure and submit it as
a separate patch? There is a version upthread of such a test but it
would be good to extract it properly.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 9 Feb 2016 13:31:46 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 10 Feb 2016 11:25:49 +0900, Masahiko Sawada  
wrote in 
> On Wed, Feb 10, 2016 at 9:18 AM, Michael Paquier
>  wrote:
> > On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao  wrote:
> >> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada  
> >> wrote:
> >>> Attached first version dedicated language patch (document patch is not 
> >>> yet.)
> >>
> >> Thanks for the patch! Will review it.
> >>
> >> I think that it's time to write the documentation patch.
> >>
> >> Though I've not read the patch yet, I found that your patch
> >> changed s_s_names so that it rejects non-alphabet character
> >> like *, according to my simple test. It should accept any
> >> application_name which we can use.
> >
> > Cool. Planning to look at it as well. Could you as well submit a
> > regression test based on the recovery infrastructure and submit it as
> > a separate patch? There is a version upthread of such a test but it
> > would be good to extract it properly.
> 
> Yes, I will implement regression test patch and documentation patch as well.
> 
> Attached latest version patch supporting s_s_names = '*'.
> Unlike currently behaviour a bit, s_s_names can have only one '*' character.
> e.g, The following setting will get syntax error.
> 
> s_s_names = '*, node1,node2'
> s_s_names = `2[node1, *, node2]`

We could use the setting s_s_names = 'node1, node2, *' as a
extended representation of old s_s_names. It tests node1, node2
as first and try any name if they failed. Similary, '2[node1,
node2, *]' is also meaningful.

> when we use '*' character as s_s_names element, we must set s_s_names
> like follows.
> 
> s_s_names = '*'
> s_s_names = '2[*]'
> 
> BTW, we've discussed about mini language syntax.
> IIRC, the syntax uses [] and () like,
> 'N[node1, node2, ...]', to define priority standbys.
> 'N(node1, node2, ...)', to define quorum standbys.
> And current patch behaves so.
> 
> Which type of parentheses should be used for this syntax to be more clarity?
> Or other character should be used such as <>, // ?

I believed that [] and {} are used respectively for no distinct
reason. I think symmetrical pair of characters is preferable for
readability. Candidate pairs in ascii characters are.

(), {}, [] <> 

{} might be a bit difficult to distinguish from [] on unclear
consoles :p

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Masahiko Sawada
On Wed, Feb 10, 2016 at 9:18 AM, Michael Paquier
 wrote:
> On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao  wrote:
>> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada  
>> wrote:
>>> Attached first version dedicated language patch (document patch is not yet.)
>>
>> Thanks for the patch! Will review it.
>>
>> I think that it's time to write the documentation patch.
>>
>> Though I've not read the patch yet, I found that your patch
>> changed s_s_names so that it rejects non-alphabet character
>> like *, according to my simple test. It should accept any
>> application_name which we can use.
>
> Cool. Planning to look at it as well. Could you as well submit a
> regression test based on the recovery infrastructure and submit it as
> a separate patch? There is a version upthread of such a test but it
> would be good to extract it properly.

Yes, I will implement regression test patch and documentation patch as well.

Attached latest version patch supporting s_s_names = '*'.
Unlike currently behaviour a bit, s_s_names can have only one '*' character.
e.g, The following setting will get syntax error.

s_s_names = '*, node1,node2'
s_s_names = `2[node1, *, node2]`

when we use '*' character as s_s_names element, we must set s_s_names
like follows.

s_s_names = '*'
s_s_names = '2[*]'

BTW, we've discussed about mini language syntax.
IIRC, the syntax uses [] and () like,
'N[node1, node2, ...]', to define priority standbys.
'N(node1, node2, ...)', to define quorum standbys.
And current patch behaves so.

Which type of parentheses should be used for this syntax to be more clarity?
Or other character should be used such as <>, // ?

Regards,

--
Masahiko Sawada


000_multi_sync_replication_v9.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Kyotaro HORIGUCHI
Hello, 

At Wed, 10 Feb 2016 02:57:54 +0900, Fujii Masao  wrote 
in 
> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada  
> wrote:
> > On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
> >  wrote:
> >> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
> >>> Also, to be frank, I think we ought to be putting more effort into
> >>> another patch in this same area, specifically Thomas Munro's causal
> >>> reads patch.  I think a lot of people today are trying to use
> >>> synchronous replication to build load-balancing clusters and avoid the
> >>> problem where you write some data and then read back stale data from a
> >>> standby server.  Of course, our current synchronous replication
> >>> facilities make no such guarantees - his patch does, and I think
> >>> that's pretty important.  I'm not saying that we shouldn't do this
> >>> too, of course.
> >>
> >> Yeah, sure. Each one of those patches is trying to solve a different
> >> problem where Postgres is deficient, here we'd like to be sure a
> >> commit WAL record is correctly flushed on multiple standbys, while the
> >> patch of Thomas is trying to ensure that there is no need to scan for
> >> the replay position of a standby using some GUC parameters and a
> >> validation/sanity layer in syncrep.c to do that. Surely the patch of
> >> this thread has got more attention than Thomas', and both of them have
> >> merits and try to address real problems. FWIW, the patch of Thomas is
> >> a topic that I find rather interesting, and I am planning to look at
> >> it as well, perhaps for next CF or even before that. We'll see how
> >> other things move on.
> >
> > Attached first version dedicated language patch (document patch is not yet.)
> 
> Thanks for the patch! Will review it.
> 
> I think that it's time to write the documentation patch.
> 
> Though I've not read the patch yet, I found that your patch
> changed s_s_names so that it rejects non-alphabet character
> like *, according to my simple test. It should accept any
> application_name which we can use.

Thanks for the quick response. At a glance, I'd like to show you
some random suggestions, mainly on writing conventions.


===
Running postgresql with s_s_names = '*', makes error as Fujii-san
said. And it yeilds the following message.

| $ postgres 
| FATAL:  syntax error: unexpected character "*"

Mmm.. It should be tough to find what has happened..


===

check_synchronous_standby_names frees parsed SyncRepStandbyNames
immediately but no reason is explained there. The following
comment looks to be saying something related to this but it
doesn't explain the reason to free.

+ /*
+  * Any additional validation of standby names should go here.
+  *
+  * Don't attempt to set WALSender priority because this is executed by
+  * postmaster at startup, not WALSender, so the application_name is not
+  * yet correctly set.
+  */


Addtion to that, I'd like to see a description like
'syncgroup_yyparse sets the global SyncRepStandbyNames as side
effect' around it.

===
malloc/free are used in create_name_node and other functions to
be used in scanner, but syncgroup_gram.y is said to use
palloc/pfree. Maybe they should use the same memory
allocation/freeing functions.

===
The variable name SyncRepStandbyNames holds the list of
SyncGroupNode*. This is somewhat confusing. How about
SyncRepStandbys?


===
+static void
+SyncRepClearStandbyGroupList(SyncGroupNode *group)
+{
+   SyncGroupNode *n = group->member;

The name 'n' is a bit confusing, I believe that the one-letter
variables should be used following implicit (and ancient?)
convention otherwise pretty short-term and obvious cases.  name,
or group_name instead might be better. There's similar usage of
'n' in other places.


===
+ * Find active walsender position of WalSnd by name. Returns index of walsnds
+ * array if found, otherwise return -1.

I didn't get what is 'walsender position' within this
comment. And as the discussion upthread, there can be multiple
walsenders with the same name. So this might be like this.

> * Finds the first active synchronous walsender with given name
> * in WalSndCtl->wansnds and returns the index of that. Returns
> * -1 if not found.

===
+ * Get both synced LSNS: write and flush, using its group function and check
+ * whether each LSN has advanced to, or not.

This is question for all. Which to use synced, synched or
synchronized? Maybe we should use non-abbreviated spellings
unless the description become too long to make it hard to read.

> * Return true if we have enough synchronized standbys and the 'safe'
> * written and flushed LSNs, which are LSNs assured in all standbys
> * considered should be synchronized.

# Please rewrite me.

===
+SyncRepSyncedLsnAdvancedTo(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
+{
+   XLogRecPtr  cur_write_pos;
+   XLogRecPtr  

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Michael Paquier
On Wed, Feb 10, 2016 at 3:13 PM, Michael Paquier
 wrote:
> On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada 
> wrote:
> I am personally fine with () and [] as you mention, we could even consider
> {}, each one of them has a different meaning mathematically..
>
> I am not entered into a detailed review yet (waiting for the docs), but the
> patch looks brittle. I have been able to crash the server just by querying
> pg_stat_replication:
> * thread #1: tid = 0x, 0x000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
> walsender.c:2783, stop reason = signal SIGSTOP
>   * frame #0: 0x000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
> walsender.c:2783
> frame #1: 0x000105d4277d
> postgres`ExecMakeTableFunctionResult(funcexpr=0x7fea128f3838,
> econtext=0x7fea128f1b58, argContext=0x7fea128c8ea8,
> expectedDesc=0x7fea128f4710, randomAccess='\0') + 1005 at
> execQual.c:2211
> frame #2: 0x000105d70c24
> postgres`FunctionNext(node=0x7fea128f2f78) + 180 at
> nodeFunctionscan.c:95
> * thread #1: tid = 0x, 0x000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
> walsender.c:2783, stop reason = signal SIGSTOP
> frame #0: 0x000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
> walsender.c:2783
>2780/*
>2781 * Get the currently active synchronous standby.
>2782 */
> -> 2783sync_standbys = (int *) palloc(sizeof(int) *
> SyncRepStandbyNames->wait_num);
>2784LWLockAcquire(SyncRepLock, LW_SHARED);
>2785num_sync =
> SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
>2786LWLockRelease(SyncRepLock);
> (lldb) p SyncRepStandbyNames
> (SyncGroupNode *) $0 = 0x
>
> +sync_node_group:
> +   sync_list   { $$ = create_group_node(1, $1);
> }
> +   |   sync_element_ast{ $$ = create_group_node(1,
> $1);}
> +   |   INT '[' sync_list ']'   { $$ = create_group_node($1,
> $3);}
> +   |   INT '[' sync_element_ast ']'{ $$ = create_group_node($1,
> $3); }
> We may want to be careful with the use of '[' in application_name. I am not
> much thrilled with forbidding the use of []() in application_name, so we may
> want to recommend user to use a backslash when using s_s_names when a group
> is defined.
>
> +void
> +yyerror(const char *message)
> +{
> +ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +   errmsg_internal("%s", message)));
> +}
> whitespace errors here.

+#define MAX_WALSENDER_NAME 8192
+
 typedef enum WalSndState
 {
 WALSNDSTATE_STARTUP = 0,
@@ -62,6 +64,11 @@ typedef struct WalSnd
  * SyncRepLock.
  */
 intsync_standby_priority;
+
+/*
+ * Corresponding standby's application_name.
+ */
+const char   name[MAX_WALSENDER_NAME];
 } WalSnd;
NAMEDATALEN instead?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Michael Paquier
On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada 
wrote:
> Yes, I will implement regression test patch and documentation patch as
well.

Cool, now that we have a clear picture of where we want to move, that would
be an excellent thing to have. Having the docs in the place is clearly
mandatory.

> Attached latest version patch supporting s_s_names = '*'.
> Unlike currently behaviour a bit, s_s_names can have only one '*'
character.
> e.g, The following setting will get syntax error.
>
> s_s_names = '*, node1,node2'
> s_s_names = `2[node1, *, node2]`
>
> when we use '*' character as s_s_names element, we must set s_s_names
> like follows.
>
> s_s_names = '*'
> s_s_names = '2[*]'
>
> BTW, we've discussed about mini language syntax.
> IIRC, the syntax uses [] and () like,
> 'N[node1, node2, ...]', to define priority standbys.
> 'N(node1, node2, ...)', to define quorum standbys.
> And current patch behaves so.
>
> Which type of parentheses should be used for this syntax to be more
clarity?
> Or other character should be used such as <>, // ?

I am personally fine with () and [] as you mention, we could even consider
{}, each one of them has a different meaning mathematically..

I am not entered into a detailed review yet (waiting for the docs), but the
patch looks brittle. I have been able to crash the server just by querying
pg_stat_replication:
* thread #1: tid = 0x, 0x000105eb36c2
postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
walsender.c:2783, stop reason = signal SIGSTOP
  * frame #0: 0x000105eb36c2
postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
walsender.c:2783
frame #1: 0x000105d4277d
postgres`ExecMakeTableFunctionResult(funcexpr=0x7fea128f3838,
econtext=0x7fea128f1b58, argContext=0x7fea128c8ea8,
expectedDesc=0x7fea128f4710, randomAccess='\0') + 1005 at
execQual.c:2211
frame #2: 0x000105d70c24
postgres`FunctionNext(node=0x7fea128f2f78) + 180 at
nodeFunctionscan.c:95
* thread #1: tid = 0x, 0x000105eb36c2
postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
walsender.c:2783, stop reason = signal SIGSTOP
frame #0: 0x000105eb36c2
postgres`pg_stat_get_wal_senders(fcinfo=0x7fff5a156290) + 498 at
walsender.c:2783
   2780/*
   2781 * Get the currently active synchronous standby.
   2782 */
-> 2783sync_standbys = (int *) palloc(sizeof(int) *
SyncRepStandbyNames->wait_num);
   2784LWLockAcquire(SyncRepLock, LW_SHARED);
   2785num_sync =
SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
   2786LWLockRelease(SyncRepLock);
(lldb) p SyncRepStandbyNames
(SyncGroupNode *) $0 = 0x

+sync_node_group:
+   sync_list   { $$ = create_group_node(1,
$1); }
+   |   sync_element_ast{ $$ = create_group_node(1,
$1);}
+   |   INT '[' sync_list ']'   { $$ = create_group_node($1,
$3);}
+   |   INT '[' sync_element_ast ']'{ $$ = create_group_node($1,
$3); }
We may want to be careful with the use of '[' in application_name. I am not
much thrilled with forbidding the use of []() in application_name, so we
may want to recommend user to use a backslash when using s_s_names when a
group is defined.

+void
+yyerror(const char *message)
+{
+ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+   errmsg_internal("%s", message)));
+}
whitespace errors here.
-- 
Michael


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-09 Thread Fujii Masao
On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada  wrote:
> On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
>  wrote:
>> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
>>> Also, to be frank, I think we ought to be putting more effort into
>>> another patch in this same area, specifically Thomas Munro's causal
>>> reads patch.  I think a lot of people today are trying to use
>>> synchronous replication to build load-balancing clusters and avoid the
>>> problem where you write some data and then read back stale data from a
>>> standby server.  Of course, our current synchronous replication
>>> facilities make no such guarantees - his patch does, and I think
>>> that's pretty important.  I'm not saying that we shouldn't do this
>>> too, of course.
>>
>> Yeah, sure. Each one of those patches is trying to solve a different
>> problem where Postgres is deficient, here we'd like to be sure a
>> commit WAL record is correctly flushed on multiple standbys, while the
>> patch of Thomas is trying to ensure that there is no need to scan for
>> the replay position of a standby using some GUC parameters and a
>> validation/sanity layer in syncrep.c to do that. Surely the patch of
>> this thread has got more attention than Thomas', and both of them have
>> merits and try to address real problems. FWIW, the patch of Thomas is
>> a topic that I find rather interesting, and I am planning to look at
>> it as well, perhaps for next CF or even before that. We'll see how
>> other things move on.
>
> Attached first version dedicated language patch (document patch is not yet.)

Thanks for the patch! Will review it.

I think that it's time to write the documentation patch.

Though I've not read the patch yet, I found that your patch
changed s_s_names so that it rejects non-alphabet character
like *, according to my simple test. It should accept any
application_name which we can use.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Fujii Masao
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>>  wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
 On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
  wrote:
> Yes, please let's use the custom language, and let's not care of not
> more than 1 level of nesting so as it is possible to represent
> pg_stat_replication in a simple way for the user.

 "not" is used twice in this sentence in a way that renders me not able
 to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single 
> node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>
> Thoughts?

I agree that we would need something like such new view in the future,
however it seems too late to work on that for 9.6 unfortunately.
There is only one CommitFest left. Let's focus on very simple case, i.e.,
1-level priority list, now, then we can extend it to cover other cases.

If we can commit the simple version too early and there is enough
time before the date of feature freeze, of course I'm happy to review
the extended version like you proposed, for 9.6.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 1:16 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote 
> in 
>> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>>  wrote:
>> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>> >  wrote:
>> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> >>  wrote:
>> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
>> >>> wrote:
>>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>   wrote:
>> > Yes, please let's use the custom language, and let's not care of not
>> > more than 1 level of nesting so as it is possible to represent
>> > pg_stat_replication in a simple way for the user.
>> 
>>  "not" is used twice in this sentence in a way that renders me not able
>>  to be sure that I'm not understanding it not properly.
>> >>>
>> >>> 4 times here. Score beaten.
>> >>>
>> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> >>> to only support configurations up to one level of nested objects, like
>> >>> that:
>> >>> 2[node1, node2, node3]
>> >>> node1, 2[node2, node3], node3
>> >>> In short, we could restrict things so as we cannot define a group of
>> >>> nodes within an existing group.
>> >>
>> >> No, actually, that's stupid. Having up to two nested levels makes more
>> >> sense, a quite common case for this feature being something like that:
>> >> 2{node1,[node2,node3]}
>> >> In short, sync confirmation is waited from node1 and (node2 or node3).
>> >>
>> >> Flattening groups of nodes with a new catalog will be necessary to
>> >> ease the view of this data to users:
>> >> - group name?
>> >> - array of members with nodes/groups
>> >> - group type: quorum or priority
>> >> - number of items to wait for in this group
>> >
>> > So, here are some thoughts to make that more user-friendly. I think
>> > that the critical issue here is to properly flatten the meta data in
>> > the custom language and represent it properly in a new catalog,
>> > without messing up too much with the existing pg_stat_replication that
>> > people are now used to for 5 releases since 9.0. So, I would think
>> > that we will need to have a new catalog, say
>> > pg_stat_replication_groups with the following things:
>> > - One line of this catalog represents the status of a group or of a single 
>> > node.
>> > - The status of a node/group is either sync or potential, if a
>> > node/group is specified more than once, it may be possible that it
>> > would be sync and potential depending on where it is defined, in which
>> > case setting its status to 'sync' has the most sense. If it is in sync
>> > state I guess.
>> > - Move sync_priority and sync_state, actually an equivalent from
>> > pg_stat_replication into this new catalog, because those represent the
>> > status of a node or group of nodes.
>> > - group name, and by that I think that we had perhaps better make
>> > mandatory the need to append a name with a quorum or priority group.
>> > The group at the highest level is forcibly named as 'top', 'main', or
>> > whatever if not directly specified by the user. If the entry is
>> > directly a node, use the application_name.
>> > - Type of group, quorum or priority
>> > - Elements in this group, an element can be a group name or a node
>> > name, aka application_name. If group is of type priority, the elements
>> > are listed in increasing order. So the elements with lower priority
>> > get first, etc. We could have one column listing explicitly a list of
>> > integers that map with the elements of a group but it does not seem
>> > worth it, what users would like to know is what are the nodes that are
>> > prioritized. This covers the former 'priority' field of
>> > pg_stat_replication.
>> >
>> > We may have a good idea of how to define a custom language, still we
>> > are going to need to design a clean interface at catalog level more or
>> > less close to what is written here. If we can get a clean interface,
>> > the custom language implemented, and TAP tests that take advantage of
>> > this user interface to check the node/group statuses, I guess that we
>> > would be in good shape for this patch.
>> >
>> > Anyway that's not a small project, and perhaps I am over-complicating
>> > the whole thing.
>> >
>> > Thoughts?
>>
>> I agree that we would need something like such new view in the future,
>> however it seems too late to work on that for 9.6 unfortunately.
>> There is only one CommitFest left. Let's focus on very simple case, i.e.,
>> 1-level priority list, now, then we can extend it to cover other cases.
>>
>> If we can commit the simple version too early and there is enough
>> time before the date of feature freeze, of course I'm happy to review
>> the 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 12:16 PM, kharagesuraj 
wrote:

> Hello,
>
>
>
>
>
> >> I agree with first version, and attached the updated *patch* which are
> >> modified so that it supports simple multiple sync replication you
> >>suggested.
> >> (but test cases are not included yet.)
>
>
>
> I have tried for some basic in-built test cases for multisync rep.
>
> I have created one patch over Michael's http://www.postgresql.org/message-id/CAB7nPqTEqou=[hidden email]
> ">patch patch.
>
> Still it is in progress.
>
> Please have look and correct me if i am wrong and suggest remaining test
> cases.
>

So the interesting part of this patch is 006_sync_rep.pl. I think that you
had better build something on top of my patch as a separate patch. This
would make things clearer.

+my $result = $node_master->psql('postgres', "select application_name,
sync_state from pg_stat_replication;");
+print "$result \n";
+is($result, "standby_1|sync\nstandby_2|sync\nstandby_3|potential",
'checked for sync standbys state initially');
Now regarding the test, you visibly got the idea, though I think that we'd
want to update a bit the parameters of postgresql.conf and re-run those
queries a couple of times, that's cheaper than having to re-create new
cluster nodes all the time, so just create a base, then switch s_s_names a
bit, and query pg_stat_replication, and you are already doing the latter.

Also, please attach patches directly to your emails. When loading something
on nabble this is located only there and not within postgresql.org which
would be annoying if nabble disappears at some point. You would also want
to use directly an email client and interact with the community mailing
lists this way instead of going through the nabble's forum-like interface
(never used it, not really willing to use it, but I guess that it is
similar to that).

I am attaching what you posted on this email for the archive's sake.
-- 
Michael


recovery_test_suite_with_multisync.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Amit Langote

Hi Suraj,

On 2016/02/09 12:16, kharagesuraj wrote:
> Hello,
> 
> 
>>> I agree with first version, and attached the updated patch which are
>>> modified so that it supports simple multiple sync replication you
>>> suggested.
>>> (but test cases are not included yet.)
> 
> I have tried for some basic in-built test cases for multisync rep.
> I have created one patch over Michael's  href="http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com;>patch
>  patch.
> Still it is in progress.
> Please have look and correct me if i am wrong and suggest remaining test 
> cases.
> 
> recovery_test_suite_with_multisync.patch (36K) 
> 

Thanks for creating the patch. Sorry to nitpick but as has been brought up
before, it's better to send patches as email attachments (that is, not as
a links to external sites).

Also, it would be helpful if your patch is submitted as a diff over
applying Michael's patch. That is, only the stuff specific to testing the
multiple sync feature and let the rest be taken care of by Michael's base
patch.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote in 

> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>  wrote:
> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> >  wrote:
> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> >>  wrote:
> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
> >>> wrote:
>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>   wrote:
> > Yes, please let's use the custom language, and let's not care of not
> > more than 1 level of nesting so as it is possible to represent
> > pg_stat_replication in a simple way for the user.
> 
>  "not" is used twice in this sentence in a way that renders me not able
>  to be sure that I'm not understanding it not properly.
> >>>
> >>> 4 times here. Score beaten.
> >>>
> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> >>> to only support configurations up to one level of nested objects, like
> >>> that:
> >>> 2[node1, node2, node3]
> >>> node1, 2[node2, node3], node3
> >>> In short, we could restrict things so as we cannot define a group of
> >>> nodes within an existing group.
> >>
> >> No, actually, that's stupid. Having up to two nested levels makes more
> >> sense, a quite common case for this feature being something like that:
> >> 2{node1,[node2,node3]}
> >> In short, sync confirmation is waited from node1 and (node2 or node3).
> >>
> >> Flattening groups of nodes with a new catalog will be necessary to
> >> ease the view of this data to users:
> >> - group name?
> >> - array of members with nodes/groups
> >> - group type: quorum or priority
> >> - number of items to wait for in this group
> >
> > So, here are some thoughts to make that more user-friendly. I think
> > that the critical issue here is to properly flatten the meta data in
> > the custom language and represent it properly in a new catalog,
> > without messing up too much with the existing pg_stat_replication that
> > people are now used to for 5 releases since 9.0. So, I would think
> > that we will need to have a new catalog, say
> > pg_stat_replication_groups with the following things:
> > - One line of this catalog represents the status of a group or of a single 
> > node.
> > - The status of a node/group is either sync or potential, if a
> > node/group is specified more than once, it may be possible that it
> > would be sync and potential depending on where it is defined, in which
> > case setting its status to 'sync' has the most sense. If it is in sync
> > state I guess.
> > - Move sync_priority and sync_state, actually an equivalent from
> > pg_stat_replication into this new catalog, because those represent the
> > status of a node or group of nodes.
> > - group name, and by that I think that we had perhaps better make
> > mandatory the need to append a name with a quorum or priority group.
> > The group at the highest level is forcibly named as 'top', 'main', or
> > whatever if not directly specified by the user. If the entry is
> > directly a node, use the application_name.
> > - Type of group, quorum or priority
> > - Elements in this group, an element can be a group name or a node
> > name, aka application_name. If group is of type priority, the elements
> > are listed in increasing order. So the elements with lower priority
> > get first, etc. We could have one column listing explicitly a list of
> > integers that map with the elements of a group but it does not seem
> > worth it, what users would like to know is what are the nodes that are
> > prioritized. This covers the former 'priority' field of
> > pg_stat_replication.
> >
> > We may have a good idea of how to define a custom language, still we
> > are going to need to design a clean interface at catalog level more or
> > less close to what is written here. If we can get a clean interface,
> > the custom language implemented, and TAP tests that take advantage of
> > this user interface to check the node/group statuses, I guess that we
> > would be in good shape for this patch.
> >
> > Anyway that's not a small project, and perhaps I am over-complicating
> > the whole thing.
> >
> > Thoughts?
> 
> I agree that we would need something like such new view in the future,
> however it seems too late to work on that for 9.6 unfortunately.
> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> 1-level priority list, now, then we can extend it to cover other cases.
> 
> If we can commit the simple version too early and there is enough
> time before the date of feature freeze, of course I'm happy to review
> the extended version like you proposed, for 9.6.

I agree to Fujii-san. There would be many of convenient gadgets
around this and they are completely welcome, but having
fundamental 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread kharagesuraj
Hello,


>> I agree with first version, and attached the updated patch which are
>> modified so that it supports simple multiple sync replication you
>>suggested.
>> (but test cases are not included yet.)

I have tried for some basic in-built test cases for multisync rep.
I have created one patch over Michael's http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com;>patch
 patch.
Still it is in progress.
Please have look and correct me if i am wrong and suggest remaining test cases.

Regards
Suraj Kharage


If you reply to this email, your message will be added to the discussion below:
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5886259.html
This email was sent by 
kharagesuraj
 (via Nabble)
To receive all replies by email, subscribe to this 
discussion

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

recovery_test_suite_with_multisync.patch (36K) 





--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5886503.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-07 Thread kharagesuraj
hello,

I have tested v7 patch.
but i think you forgot to remove some debug points in patch from
src/backend/replication/syncrep.c file.

for (i = 0; i < num_sync; i++)
+   {
+   elog(WARNING, "sync_standbys[%d] = %d", i, sync_standbys[i]);
+   }
+   elog(WARNING, "num_sync = %d, s_s_num = %d", num_sync,
synchronous_standby_num);

Please correct my understanding if i am wrong.

Regards
Suraj Kharage 





--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5886259.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-05 Thread Michael Paquier
On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>  wrote:
 Yes, please let's use the custom language, and let's not care of not
 more than 1 level of nesting so as it is possible to represent
 pg_stat_replication in a simple way for the user.
>>>
>>> "not" is used twice in this sentence in a way that renders me not able
>>> to be sure that I'm not understanding it not properly.
>>
>> 4 times here. Score beaten.
>>
>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> to only support configurations up to one level of nested objects, like
>> that:
>> 2[node1, node2, node3]
>> node1, 2[node2, node3], node3
>> In short, we could restrict things so as we cannot define a group of
>> nodes within an existing group.
>
> No, actually, that's stupid. Having up to two nested levels makes more
> sense, a quite common case for this feature being something like that:
> 2{node1,[node2,node3]}
> In short, sync confirmation is waited from node1 and (node2 or node3).
>
> Flattening groups of nodes with a new catalog will be necessary to
> ease the view of this data to users:
> - group name?
> - array of members with nodes/groups
> - group type: quorum or priority
> - number of items to wait for in this group

So, here are some thoughts to make that more user-friendly. I think
that the critical issue here is to properly flatten the meta data in
the custom language and represent it properly in a new catalog,
without messing up too much with the existing pg_stat_replication that
people are now used to for 5 releases since 9.0. So, I would think
that we will need to have a new catalog, say
pg_stat_replication_groups with the following things:
- One line of this catalog represents the status of a group or of a single node.
- The status of a node/group is either sync or potential, if a
node/group is specified more than once, it may be possible that it
would be sync and potential depending on where it is defined, in which
case setting its status to 'sync' has the most sense. If it is in sync
state I guess.
- Move sync_priority and sync_state, actually an equivalent from
pg_stat_replication into this new catalog, because those represent the
status of a node or group of nodes.
- group name, and by that I think that we had perhaps better make
mandatory the need to append a name with a quorum or priority group.
The group at the highest level is forcibly named as 'top', 'main', or
whatever if not directly specified by the user. If the entry is
directly a node, use the application_name.
- Type of group, quorum or priority
- Elements in this group, an element can be a group name or a node
name, aka application_name. If group is of type priority, the elements
are listed in increasing order. So the elements with lower priority
get first, etc. We could have one column listing explicitly a list of
integers that map with the elements of a group but it does not seem
worth it, what users would like to know is what are the nodes that are
prioritized. This covers the former 'priority' field of
pg_stat_replication.

We may have a good idea of how to define a custom language, still we
are going to need to design a clean interface at catalog level more or
less close to what is written here. If we can get a clean interface,
the custom language implemented, and TAP tests that take advantage of
this user interface to check the node/group statuses, I guess that we
would be in good shape for this patch.

Anyway that's not a small project, and perhaps I am over-complicating
the whole thing.

Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-05 Thread Joshua Berkus

> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
> 
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.

Yes.  The more I look at this, the worse the idea of custom syntax looks.  Yes, 
I realize there are drawbacks to using JSON, but this is worse.

Further, there's a lot of horse-cart inversion here.  This proposal involves 
letting the syntax for sync_list configuration determine the feature set for 
N-sync.  That's backwards; we should decide the total list of features we want 
to support, and then adopt a syntax which will make it possible to have them.

-- 
Josh Berkus
Red Hat OSAS
(opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-05 Thread Michael Paquier
On Fri, Feb 5, 2016 at 12:19 PM, Masahiko Sawada  wrote:
> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>  wrote:
> I agree with adding new system catalog to easily checking replication
> status for user. And group name will needed for this.
> What about adding group name with ":" to immediately after set of
> standbys like follows?

This way is fine for me.

> 2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]
>
> Also, regarding sync replication according to configuration, the view
> I'm thinking is following definition.
>
> =# \d pg_synchronous_replication
>  Column  |  Type   | Modifiers
> -+---+---
>  name| text  |
>  sync_type | text  |
>  wait_num  | integer  |
>  sync_priority | inteter   |
>  sync_state| text  |
>  member| text[] |
>  level | integer  |
>  write_location| pg_lsn  |
>  flush_location| pg_lsn  |
>  apply_location   | pg_lsn   |
>
> - "name" : node name or group name, or "main" meaning top level node.

Check.

> - "sync_type" : 'priority' or 'quorum' for group node, otherwise NULL.

That would be one or the other.

> - "wait_num" : number of nodes/groups to wait for in this group.

Check. This is taken directly from the meta data.

> - "sync_priority" : priority of node/group in this group. "main" node has "0".
>   - the standby is in quorum group always has
> priority 1.
>   - the standby is in priority group has
> priority according to definition order.

This is a bit confusing if the same node or group in in multiple
groups. My previous suggestion was to list the elements of the group
in increasing order of priority. That's an important point.

> - "sync_state" : 'sync' or 'potential' or 'quorum'.
>  - the standby is in quorum group is always 'quorum'.
>  - the standby is in priority group is 'sync'
> / 'potential'.

potential and quorum are the same thing, no? The only difference is
based on the group type here.

> - "member" : array of members for group node, otherwise NULL.

This can be NULL only when the entry is a node.

> - "level" : nested level. "main" node is level 0.

Not sure this one is necessary.

> - "write/flush/apply_location" : group/node calculated LSN according
> to configuration.

This does not need to be part of this catalog, that's a representation
of the data that is part of the WAL sender.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-05 Thread Masahiko Sawada
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>>  wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
 On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
  wrote:
> Yes, please let's use the custom language, and let's not care of not
> more than 1 level of nesting so as it is possible to represent
> pg_stat_replication in a simple way for the user.

 "not" is used twice in this sentence in a way that renders me not able
 to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single 
> node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>

I agree with adding new system catalog to easily checking replication
status for user. And group name will needed for this.
What about adding group name with ":" to immediately after set of
standbys like follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]

Also, regarding sync replication according to configuration, the view
I'm thinking is following definition.

=# \d pg_synchronous_replication
 Column  |  Type   | Modifiers
-+---+---
 name| text  |
 sync_type | text  |
 wait_num  | integer  |
 sync_priority | inteter   |
 sync_state| text  |
 member| text[] |
 level | integer  |
 write_location| pg_lsn  |
 flush_location| pg_lsn  |
 apply_location   | pg_lsn   |

- "name" : node name or group name, or "main" meaning top level node.
- "sync_type" : 'priority' or 'quorum' for group 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 4 Feb 2016 23:06:45 +0300, Michael Paquier  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Thom Brown
On 4 February 2016 at 14:34, Fujii Masao  wrote:
> On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas  wrote:
>> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao  wrote:
>>> So you disagree with only third version that I proposed, i.e.,
>>> adding some hooks for sync replication? If yes and you're OK
>>> with the first and second versions, ISTM that we almost reached
>>> consensus on the direction of multiple sync replication feature.
>>> The first version can cover "one local and one remote sync standbys" case,
>>> and the second can cover "one local and at least one from several remote
>>> standbys" case. I'm thinking to focus on the first version now,
>>> and then we can work on the second to support the quorum commit
>>
>> Well, I think the only hard part of the third problem is deciding on
>> what syntax to use.  It seems like a waste of time to me to go to a
>> bunch of trouble to implement #1 and #2 using one syntax and then have
>> to invent a whole new syntax for #3.  Seriously, this isn't that hard:
>> it's not a technical problem.  It's just that we've got a bunch of
>> people who can't agree on what syntax to use.  IMO, you should just
>> pick something.  You're presumably the committer for this patch, and I
>> think you should just decide which of the 47,123 things proposed so
>> far is best and insist on that.  I trust that you will make a good
>> decision even if it's different than the decision that I would have
>> made.
>
> If we use one syntax for every cases, possible approaches that we can choose
> are mini-language, json, etc. Since my previous proposal covers only very
> simple cases, extra syntax needs to be supported for more complicated cases.
> My plan was to add the hooks so that the developers can choose their own
> syntax. But which might confuse users.
>
> Now I'm thinking that mini-language is better choice. A json has some good
> points, but its big problem is that the setting value is likely to be very 
> long.
> For example, when the master needs to wait for one local standby and
> at least one from three remote standbys in London data center, the setting
> value (synchronous_standby_names) would be
>
>   s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
> "nodes":["london1", "london2", "london3"]}]}'
>
> OTOH, the value with mini-language is simple and not so long as follows.
>
>   s_s_names = '2[local1, 1(london1, london2, london3)]'
>
> This is why I'm now thinking that mini-language is better. But it's not easy
> to completely implement mini-language. There seems to be many problems
> that we need to resolve. For example, please imagine the case where
> the master needs to wait for at least one from two standbys "tokyo1", "tokyo2"
> in Tokyo data center. If Tokyo data center fails, the master needs to
> wait for at least one from two standbys "london1", "london2" in London
> data center, instead. This case can be configured as follows in mini-language.
>
>   s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]'
>
> One problem here is; what pg_stat_replication.sync_state value should be
> shown for each standbys? Which standby should be marked as sync? potential?
> any other value like quorum? The current design of pg_stat_replication
> doesn't fit complicated sync replication cases, so maybe we need to separate
> it into several views. It's almost impossible to complete those problems.
>
> My current plan for 9.6 is to support the minimal subset of mini-language;
> simple syntax of "[name, ...]". "" specifies the number of
> sync standbys that the master needs to wait for. "[name, ...]" specifies
> the priorities of the listed standbys. This first version supports neither
> quorum commit nor nested sync replication configuration like
> "[name, [name, ...]]". It just supports very simple
> "1-level" configuration.

Whatever the solution, I'm really don't like the idea of changing the
definition of s_s_names based on the value of another GUC, mainly
because it seems hacky, but also because the name of the GUC stops
making sense.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Fujii Masao
On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas  wrote:
> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao  wrote:
>> So you disagree with only third version that I proposed, i.e.,
>> adding some hooks for sync replication? If yes and you're OK
>> with the first and second versions, ISTM that we almost reached
>> consensus on the direction of multiple sync replication feature.
>> The first version can cover "one local and one remote sync standbys" case,
>> and the second can cover "one local and at least one from several remote
>> standbys" case. I'm thinking to focus on the first version now,
>> and then we can work on the second to support the quorum commit
>
> Well, I think the only hard part of the third problem is deciding on
> what syntax to use.  It seems like a waste of time to me to go to a
> bunch of trouble to implement #1 and #2 using one syntax and then have
> to invent a whole new syntax for #3.  Seriously, this isn't that hard:
> it's not a technical problem.  It's just that we've got a bunch of
> people who can't agree on what syntax to use.  IMO, you should just
> pick something.  You're presumably the committer for this patch, and I
> think you should just decide which of the 47,123 things proposed so
> far is best and insist on that.  I trust that you will make a good
> decision even if it's different than the decision that I would have
> made.

If we use one syntax for every cases, possible approaches that we can choose
are mini-language, json, etc. Since my previous proposal covers only very
simple cases, extra syntax needs to be supported for more complicated cases.
My plan was to add the hooks so that the developers can choose their own
syntax. But which might confuse users.

Now I'm thinking that mini-language is better choice. A json has some good
points, but its big problem is that the setting value is likely to be very long.
For example, when the master needs to wait for one local standby and
at least one from three remote standbys in London data center, the setting
value (synchronous_standby_names) would be

  s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
"nodes":["london1", "london2", "london3"]}]}'

OTOH, the value with mini-language is simple and not so long as follows.

  s_s_names = '2[local1, 1(london1, london2, london3)]'

This is why I'm now thinking that mini-language is better. But it's not easy
to completely implement mini-language. There seems to be many problems
that we need to resolve. For example, please imagine the case where
the master needs to wait for at least one from two standbys "tokyo1", "tokyo2"
in Tokyo data center. If Tokyo data center fails, the master needs to
wait for at least one from two standbys "london1", "london2" in London
data center, instead. This case can be configured as follows in mini-language.

  s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]'

One problem here is; what pg_stat_replication.sync_state value should be
shown for each standbys? Which standby should be marked as sync? potential?
any other value like quorum? The current design of pg_stat_replication
doesn't fit complicated sync replication cases, so maybe we need to separate
it into several views. It's almost impossible to complete those problems.

My current plan for 9.6 is to support the minimal subset of mini-language;
simple syntax of "[name, ...]". "" specifies the number of
sync standbys that the master needs to wait for. "[name, ...]" specifies
the priorities of the listed standbys. This first version supports neither
quorum commit nor nested sync replication configuration like
"[name, [name, ...]]". It just supports very simple
"1-level" configuration.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Robert Haas
On Thu, Feb 4, 2016 at 9:34 AM, Fujii Masao  wrote:
> Now I'm thinking that mini-language is better choice. A json has some good
> points, but its big problem is that the setting value is likely to be very 
> long.
> For example, when the master needs to wait for one local standby and
> at least one from three remote standbys in London data center, the setting
> value (synchronous_standby_names) would be
>
>   s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
> "nodes":["london1", "london2", "london3"]}]}'
>
> OTOH, the value with mini-language is simple and not so long as follows.
>
>   s_s_names = '2[local1, 1(london1, london2, london3)]'

Yeah, that was my thought also.  Another idea which was suggested is
to create a completely new configuration file for this.  Most people
would only have simple stuff in there, of course, but then you could
have the information spread across multiple lines.

I don't in the end care very much about how we solve this problem.
But I'm glad you agree that whatever we do to solve the simple problem
should be a logical subset of what the full solution will eventually
look like, not a completely different design.  I think that's
important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Michael Paquier
On Thu, Feb 4, 2016 at 7:27 PM, Robert Haas  wrote:
> I don't in the end care very much about how we solve this problem.
> But I'm glad you agree that whatever we do to solve the simple problem
> should be a logical subset of what the full solution will eventually
> look like, not a completely different design.  I think that's
> important.

Yes, please let's use the custom language, and let's not care of not
more than 1 level of nesting so as it is possible to represent
pg_stat_replication in a simple way for the user.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Robert Haas
On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
 wrote:
> Yes, please let's use the custom language, and let's not care of not
> more than 1 level of nesting so as it is possible to represent
> pg_stat_replication in a simple way for the user.

"not" is used twice in this sentence in a way that renders me not able
to be sure that I'm not understanding it not properly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Robert Haas
On Thu, Feb 4, 2016 at 2:49 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>  wrote:
>>> Yes, please let's use the custom language, and let's not care of not
>>> more than 1 level of nesting so as it is possible to represent
>>> pg_stat_replication in a simple way for the user.
>>
>> "not" is used twice in this sentence in a way that renders me not able
>> to be sure that I'm not understanding it not properly.
>
> 4 times here. Score beaten.
>
> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> to only support configurations up to one level of nested objects, like
> that:
> 2[node1, node2, node3]
> node1, 2[node2, node3], node3
> In short, we could restrict things so as we cannot define a group of
> nodes within an existing group.

I see.  Such a restriction doesn't seem likely to me to prevent people
from doing anything actually useful.  But I don't know that it buys
very much either.  It's often not very much simpler to handle 2 levels
than n levels.  However, I ain't writing the code so...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Michael Paquier
On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>  wrote:
>>> Yes, please let's use the custom language, and let's not care of not
>>> more than 1 level of nesting so as it is possible to represent
>>> pg_stat_replication in a simple way for the user.
>>
>> "not" is used twice in this sentence in a way that renders me not able
>> to be sure that I'm not understanding it not properly.
>
> 4 times here. Score beaten.
>
> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> to only support configurations up to one level of nested objects, like
> that:
> 2[node1, node2, node3]
> node1, 2[node2, node3], node3
> In short, we could restrict things so as we cannot define a group of
> nodes within an existing group.

No, actually, that's stupid. Having up to two nested levels makes more
sense, a quite common case for this feature being something like that:
2{node1,[node2,node3]}
In short, sync confirmation is waited from node1 and (node2 or node3).

Flattening groups of nodes with a new catalog will be necessary to
ease the view of this data to users:
- group name?
- array of members with nodes/groups
- group type: quorum or priority
- number of items to wait for in this group
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-04 Thread Michael Paquier
On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>  wrote:
>> Yes, please let's use the custom language, and let's not care of not
>> more than 1 level of nesting so as it is possible to represent
>> pg_stat_replication in a simple way for the user.
>
> "not" is used twice in this sentence in a way that renders me not able
> to be sure that I'm not understanding it not properly.

4 times here. Score beaten.

Sorry. Perhaps I am tired... I was just wondering if it would be fine
to only support configurations up to one level of nested objects, like
that:
2[node1, node2, node3]
node1, 2[node2, node3], node3
In short, we could restrict things so as we cannot define a group of
nodes within an existing group.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-02 Thread Robert Haas
On Mon, Feb 1, 2016 at 9:28 AM, Fujii Masao  wrote:
> So what about the following plan?
>
> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'prioriry' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_num.
> That is, this approach separates 'priority' and 'quorum' to each parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
>   s_s_num = 1, s_s_names = 'local'
>   q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.

-1.  We're wrapping ourselves around the axle here and ending up with
a design that will not let someone say "the local standby and at least
one remote standby" without writing C code.  I understand nobody likes
the mini-language I proposed and nobody likes a JSON configuration
file either.  I also understand that either of those things would
allow ridiculously complicated configurations that nobody will ever
need in the real world.  But I think "one local and one remote" is a
fairly common case and that you shouldn't need a PhD in
PostgreSQLology to configure it.

Also, to be frank, I think we ought to be putting more effort into
another patch in this same area, specifically Thomas Munro's causal
reads patch.  I think a lot of people today are trying to use
synchronous replication to build load-balancing clusters and avoid the
problem where you write some data and then read back stale data from a
standby server.  Of course, our current synchronous replication
facilities make no such guarantees - his patch does, and I think
that's pretty important.  I'm not saying that we shouldn't do this
too, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-02 Thread Fujii Masao
On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas  wrote:
> On Mon, Feb 1, 2016 at 9:28 AM, Fujii Masao  wrote:
>> So what about the following plan?
>>
>> [first version]
>> Add only synchronous_standby_num which specifies the number of standbys
>> that the master must wait for before marking sync replication as completed.
>> This version supports simple use cases like "I want to have two synchronous
>> standbys".
>>
>> [second version]
>> Add synchronous_replication_method: 'prioriry' and 'quorum'. This version
>> additionally supports simple quorum commit case like "I want to ensure
>> that WAL is replicated synchronously to at least two standbys from five
>> ones listed in s_s_names".
>>
>> Or
>>
>> Add something like quorum_replication_num and quorum_standby_names, i.e.,
>> the master must wait for at least q_r_num standbys from ones listed in
>> q_s_names before marking sync replication as completed. Also the master
>> must wait for sync replication according to s_s_num and s_s_num.
>> That is, this approach separates 'priority' and 'quorum' to each parameters.
>> This increases the number of GUC parameters, but ISTM less confusing, and
>> it supports a bit complicated case like "there is one local standby and three
>> remote standbys, then I want to ensure that WAL is replicated synchronously
>> to the local standby and at least two remote one", e.g.,
>>
>>   s_s_num = 1, s_s_names = 'local'
>>   q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>>
>> [third version]
>> Add the hooks for more complicated sync replication cases.
>
> -1.  We're wrapping ourselves around the axle here and ending up with
> a design that will not let someone say "the local standby and at least
> one remote standby" without writing C code.  I understand nobody likes
> the mini-language I proposed and nobody likes a JSON configuration
> file either.  I also understand that either of those things would
> allow ridiculously complicated configurations that nobody will ever
> need in the real world.  But I think "one local and one remote" is a
> fairly common case and that you shouldn't need a PhD in
> PostgreSQLology to configure it.

So you disagree with only third version that I proposed, i.e.,
adding some hooks for sync replication? If yes and you're OK
with the first and second versions, ISTM that we almost reached
consensus on the direction of multiple sync replication feature.
The first version can cover "one local and one remote sync standbys" case,
and the second can cover "one local and at least one from several remote
standbys" case. I'm thinking to focus on the first version now,
and then we can work on the second to support the quorum commit

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-02 Thread Masahiko Sawada
On Mon, Feb 1, 2016 at 11:28 PM, Fujii Masao  wrote:
> On Mon, Feb 1, 2016 at 5:36 PM, Masahiko Sawada  wrote:
>> On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
>>  wrote:
>>> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada  
>>> wrote:
 On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
  wrote:
> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
> wrote:
>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>  wrote:
>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
 By the discussions so far, I'm planning to have several replication
 methods such as 'quorum', 'complex' in the feature, and the each
 replication method specifies the syntax of s_s_names.
 It means that s_s_names could have the number of sync standbys like
 what current patch does.
>>>
>>> What if the application_name of a standby node has the format of an 
>>> integer?
>>
>> Even if the standby has an integer as application_name, we can set
>> s_s_names like '2,1,2,3'.
>> The leading '2' is always handled as the number of sync standbys when
>> s_r_method = 'priority'.
>
> Hm. I agree with Fujii-san here, having the number of sync standbys
> defined in a parameter that should have a list of names is a bit
> confusing. I'd rather have a separate GUC, which brings us back to one
> of the first patches that I came up with, and a couple of people,
> including Josh were not happy with that because this did not support
> real quorum. Perhaps the final answer would be really to get a set of
> hooks, and a contrib module making use of that.

 Yeah, I agree with having set of hooks, and postgres core has simple
 multi sync replication mechanism like you suggested at first version.
>>>
>>> If there are hooks, I don't think that we should really bother about
>>> having in core anything more complicated than what we have now. The
>>> trick will be to come up with a hook design modular enough to support
>>> the kind of configurations mentioned on this thread. Roughly perhaps a
>>> refactoring of the syncrep code so as it is possible to wait for
>>> multiple targets some of them being optional,, one modular way in
>>> pg_stat_get_wal_senders to represent the status of a node to user, and
>>> another hook to return to decide which are the nodes to wait for. Some
>>> of the nodes being waited for may be based on conditions for quorum
>>> support. That's a hard problem to do that in a flexible enough way.
>>
>> Hm, I think not-nested quorum and priority are not complicated, and we
>> should support at least both or either simple method in core of
>> postgres.
>> More complicated method like using json-style, or dedicated language
>> would be supported by external module.
>
> So what about the following plan?
>
> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'prioriry' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_num.
> That is, this approach separates 'priority' and 'quorum' to each parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
>   s_s_num = 1, s_s_names = 'local'
>   q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.
>
> I'm thinking that the realistic target for 9.6 might be the first one.
>

Thank you for suggestion.

I agree with first version, and attached the updated patch which are
modified so that it supports simple multiple sync replication you
suggested.
(but test cases are not included yet.)

Regards,

--
Masahiko Sawada
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 7f85b88..9a2f7e7 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -29,10 +29,10 @@
  * single ordered queue of waiting backends, 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-02 Thread Robert Haas
On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao  wrote:
> So you disagree with only third version that I proposed, i.e.,
> adding some hooks for sync replication? If yes and you're OK
> with the first and second versions, ISTM that we almost reached
> consensus on the direction of multiple sync replication feature.
> The first version can cover "one local and one remote sync standbys" case,
> and the second can cover "one local and at least one from several remote
> standbys" case. I'm thinking to focus on the first version now,
> and then we can work on the second to support the quorum commit

Well, I think the only hard part of the third problem is deciding on
what syntax to use.  It seems like a waste of time to me to go to a
bunch of trouble to implement #1 and #2 using one syntax and then have
to invent a whole new syntax for #3.  Seriously, this isn't that hard:
it's not a technical problem.  It's just that we've got a bunch of
people who can't agree on what syntax to use.  IMO, you should just
pick something.  You're presumably the committer for this patch, and I
think you should just decide which of the 47,123 things proposed so
far is best and insist on that.  I trust that you will make a good
decision even if it's different than the decision that I would have
made.

Now, if it's easier to implement a subset of that syntax first and
then extend it later, fine.   But it makes no sense to me to implement
the easy cases without having some idea of how you're go to extend
that to the hard cases.  Then you'll just end up with a mishmash.
Pick something that can be extended to handle all of the plausible
cases, whether it's a mini-language or a JSON blob or a
pg_hba.conf-type file or some other crazy thing that you invent, and
just do it and be done with it.  We've wasted far too much time trying
to reach consensus on this: it's time for you to exercise your vast
dictatorial power.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-01 Thread Masahiko Sawada
On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
 wrote:
> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada  
> wrote:
>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>>  wrote:
>>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
>>> wrote:
 On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
  wrote:
> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>> By the discussions so far, I'm planning to have several replication
>> methods such as 'quorum', 'complex' in the feature, and the each
>> replication method specifies the syntax of s_s_names.
>> It means that s_s_names could have the number of sync standbys like
>> what current patch does.
>
> What if the application_name of a standby node has the format of an 
> integer?

 Even if the standby has an integer as application_name, we can set
 s_s_names like '2,1,2,3'.
 The leading '2' is always handled as the number of sync standbys when
 s_r_method = 'priority'.
>>>
>>> Hm. I agree with Fujii-san here, having the number of sync standbys
>>> defined in a parameter that should have a list of names is a bit
>>> confusing. I'd rather have a separate GUC, which brings us back to one
>>> of the first patches that I came up with, and a couple of people,
>>> including Josh were not happy with that because this did not support
>>> real quorum. Perhaps the final answer would be really to get a set of
>>> hooks, and a contrib module making use of that.
>>
>> Yeah, I agree with having set of hooks, and postgres core has simple
>> multi sync replication mechanism like you suggested at first version.
>
> If there are hooks, I don't think that we should really bother about
> having in core anything more complicated than what we have now. The
> trick will be to come up with a hook design modular enough to support
> the kind of configurations mentioned on this thread. Roughly perhaps a
> refactoring of the syncrep code so as it is possible to wait for
> multiple targets some of them being optional,, one modular way in
> pg_stat_get_wal_senders to represent the status of a node to user, and
> another hook to return to decide which are the nodes to wait for. Some
> of the nodes being waited for may be based on conditions for quorum
> support. That's a hard problem to do that in a flexible enough way.

Hm, I think not-nested quorum and priority are not complicated, and we
should support at least both or either simple method in core of
postgres.
More complicated method like using json-style, or dedicated language
would be supported by external module.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 11:28 PM, Fujii Masao  wrote:

> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'prioriry' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_num.
> That is, this approach separates 'priority' and 'quorum' to each
> parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and
> three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
>   s_s_num = 1, s_s_names = 'local'
>   q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.
>
> I'm thinking that the realistic target for 9.6 might be the first one.
>

If we want to get something out for this release, clearly yes, and being
able to specify 2 sync targets is already a win when the two sync standbys
are not exactly at the same location. FWIW, I don't doing coding and/or
review work, that's basically my first patch that needs a bit more love and
polishing, *and* test cases but I am used enough to perl and PostgresNode
these days to produce something based on sanity checks of
pg_stat_replication and my other set of patches that have more basic
routines.

Now I would not mind if we actually jump into the 3rd case if we are fine
with doing nothing for this release, but this requires a lot of design and
background work, so that's not plausible for 9.6. Of course if there are
voices against the scenario proposed by Fujii-san others feel free to speak
up.
-- 
Michael


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-01 Thread Fujii Masao
On Mon, Feb 1, 2016 at 5:36 PM, Masahiko Sawada  wrote:
> On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
>  wrote:
>> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada  
>> wrote:
>>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>>>  wrote:
 On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
 wrote:
> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>  wrote:
>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>> By the discussions so far, I'm planning to have several replication
>>> methods such as 'quorum', 'complex' in the feature, and the each
>>> replication method specifies the syntax of s_s_names.
>>> It means that s_s_names could have the number of sync standbys like
>>> what current patch does.
>>
>> What if the application_name of a standby node has the format of an 
>> integer?
>
> Even if the standby has an integer as application_name, we can set
> s_s_names like '2,1,2,3'.
> The leading '2' is always handled as the number of sync standbys when
> s_r_method = 'priority'.

 Hm. I agree with Fujii-san here, having the number of sync standbys
 defined in a parameter that should have a list of names is a bit
 confusing. I'd rather have a separate GUC, which brings us back to one
 of the first patches that I came up with, and a couple of people,
 including Josh were not happy with that because this did not support
 real quorum. Perhaps the final answer would be really to get a set of
 hooks, and a contrib module making use of that.
>>>
>>> Yeah, I agree with having set of hooks, and postgres core has simple
>>> multi sync replication mechanism like you suggested at first version.
>>
>> If there are hooks, I don't think that we should really bother about
>> having in core anything more complicated than what we have now. The
>> trick will be to come up with a hook design modular enough to support
>> the kind of configurations mentioned on this thread. Roughly perhaps a
>> refactoring of the syncrep code so as it is possible to wait for
>> multiple targets some of them being optional,, one modular way in
>> pg_stat_get_wal_senders to represent the status of a node to user, and
>> another hook to return to decide which are the nodes to wait for. Some
>> of the nodes being waited for may be based on conditions for quorum
>> support. That's a hard problem to do that in a flexible enough way.
>
> Hm, I think not-nested quorum and priority are not complicated, and we
> should support at least both or either simple method in core of
> postgres.
> More complicated method like using json-style, or dedicated language
> would be supported by external module.

So what about the following plan?

[first version]
Add only synchronous_standby_num which specifies the number of standbys
that the master must wait for before marking sync replication as completed.
This version supports simple use cases like "I want to have two synchronous
standbys".

[second version]
Add synchronous_replication_method: 'prioriry' and 'quorum'. This version
additionally supports simple quorum commit case like "I want to ensure
that WAL is replicated synchronously to at least two standbys from five
ones listed in s_s_names".

Or

Add something like quorum_replication_num and quorum_standby_names, i.e.,
the master must wait for at least q_r_num standbys from ones listed in
q_s_names before marking sync replication as completed. Also the master
must wait for sync replication according to s_s_num and s_s_num.
That is, this approach separates 'priority' and 'quorum' to each parameters.
This increases the number of GUC parameters, but ISTM less confusing, and
it supports a bit complicated case like "there is one local standby and three
remote standbys, then I want to ensure that WAL is replicated synchronously
to the local standby and at least two remote one", e.g.,

  s_s_num = 1, s_s_names = 'local'
  q_s_num = 2, q_s_names = 'remote1, remote2, remote3'

[third version]
Add the hooks for more complicated sync replication cases.

I'm thinking that the realistic target for 9.6 might be the first one.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-31 Thread Masahiko Sawada
On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
 wrote:
> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>> By the discussions so far, I'm planning to have several replication
>> methods such as 'quorum', 'complex' in the feature, and the each
>> replication method specifies the syntax of s_s_names.
>> It means that s_s_names could have the number of sync standbys like
>> what current patch does.
>
> What if the application_name of a standby node has the format of an integer?

Even if the standby has an integer as application_name, we can set
s_s_names like '2,1,2,3'.
The leading '2' is always handled as the number of sync standbys when
s_r_method = 'priority'.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-31 Thread Michael Paquier
On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  wrote:
> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>  wrote:
>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>> By the discussions so far, I'm planning to have several replication
>>> methods such as 'quorum', 'complex' in the feature, and the each
>>> replication method specifies the syntax of s_s_names.
>>> It means that s_s_names could have the number of sync standbys like
>>> what current patch does.
>>
>> What if the application_name of a standby node has the format of an integer?
>
> Even if the standby has an integer as application_name, we can set
> s_s_names like '2,1,2,3'.
> The leading '2' is always handled as the number of sync standbys when
> s_r_method = 'priority'.

Hm. I agree with Fujii-san here, having the number of sync standbys
defined in a parameter that should have a list of names is a bit
confusing. I'd rather have a separate GUC, which brings us back to one
of the first patches that I came up with, and a couple of people,
including Josh were not happy with that because this did not support
real quorum. Perhaps the final answer would be really to get a set of
hooks, and a contrib module making use of that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-31 Thread Masahiko Sawada
On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
 wrote:
> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
> wrote:
>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>  wrote:
>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
 By the discussions so far, I'm planning to have several replication
 methods such as 'quorum', 'complex' in the feature, and the each
 replication method specifies the syntax of s_s_names.
 It means that s_s_names could have the number of sync standbys like
 what current patch does.
>>>
>>> What if the application_name of a standby node has the format of an integer?
>>
>> Even if the standby has an integer as application_name, we can set
>> s_s_names like '2,1,2,3'.
>> The leading '2' is always handled as the number of sync standbys when
>> s_r_method = 'priority'.
>
> Hm. I agree with Fujii-san here, having the number of sync standbys
> defined in a parameter that should have a list of names is a bit
> confusing. I'd rather have a separate GUC, which brings us back to one
> of the first patches that I came up with, and a couple of people,
> including Josh were not happy with that because this did not support
> real quorum. Perhaps the final answer would be really to get a set of
> hooks, and a contrib module making use of that.

Yeah, I agree with having set of hooks, and postgres core has simple
multi sync replication mechanism like you suggested at first version.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-31 Thread Michael Paquier
On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada  wrote:
> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>  wrote:
>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
>> wrote:
>>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>>  wrote:
 On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
> By the discussions so far, I'm planning to have several replication
> methods such as 'quorum', 'complex' in the feature, and the each
> replication method specifies the syntax of s_s_names.
> It means that s_s_names could have the number of sync standbys like
> what current patch does.

 What if the application_name of a standby node has the format of an 
 integer?
>>>
>>> Even if the standby has an integer as application_name, we can set
>>> s_s_names like '2,1,2,3'.
>>> The leading '2' is always handled as the number of sync standbys when
>>> s_r_method = 'priority'.
>>
>> Hm. I agree with Fujii-san here, having the number of sync standbys
>> defined in a parameter that should have a list of names is a bit
>> confusing. I'd rather have a separate GUC, which brings us back to one
>> of the first patches that I came up with, and a couple of people,
>> including Josh were not happy with that because this did not support
>> real quorum. Perhaps the final answer would be really to get a set of
>> hooks, and a contrib module making use of that.
>
> Yeah, I agree with having set of hooks, and postgres core has simple
> multi sync replication mechanism like you suggested at first version.

If there are hooks, I don't think that we should really bother about
having in core anything more complicated than what we have now. The
trick will be to come up with a hook design modular enough to support
the kind of configurations mentioned on this thread. Roughly perhaps a
refactoring of the syncrep code so as it is possible to wait for
multiple targets some of them being optional,, one modular way in
pg_stat_get_wal_senders to represent the status of a node to user, and
another hook to return to decide which are the nodes to wait for. Some
of the nodes being waited for may be based on conditions for quorum
support. That's a hard problem to do that in a flexible enough way.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


<    1   2   3   4   >