On Fri, Mar 2, 2012 at 2:26 PM, Magnus Hagander <mag...@hagander.net> wrote:
> On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <mag...@hagander.net> wrote:
>>> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>>>> xlog start point: 2/AC4E2600
>>>>> pg_basebackup: starting background WAL receiver
>>>>> 692447/692447 kB (100%), 1/1 tablespace
>>>>> xlog end point: 2/AC4E2600
>>>>> pg_basebackup: waiting for background process to finish streaming...
>>>>> pg_basebackup: base backup completed
>>>>>
>>>>> real    3m56.237s
>>>>> user    0m0.224s
>>>>> sys     0m0.936s
>>>>>
>>>>> (time is long because this is only test database with no traffic,
>>>>> so I had to make some inserts for it to finish)
>>>>
>>>> The above article points out the problem of pg_basebackup from the standby:
>>>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>>>> there is no traffic in the database.
>>>>
>>>> When "-x stream" is specified, pg_basebackup forks the background process
>>>> for receiving WAL records during backup, takes an online backup and waits
>>>> for the background process to end. The forked background process keeps
>>>> receiving WAL records, and whenever it reaches end of WAL file, it checks
>>>> whether it has already received all WAL files required for the backup, and
>>>> exits if yes. Which means that at least one WAL segment switch is required
>>>> for pg_basebackup with "-x stream" option to end.
>>>>
>>>> In the backup from the master, WAL file switch always occurs at both start
>>>> and end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()),
>>>> so the above logic works fine even if there is no traffic. OTOH, in the
>>>> backup from the standby, while there is no traffic, WAL file switch is not
>>>> performed at all. So in that case, there is no chance that the background
>>>> process reaches end of WAL file, checks whether all required WAL has
>>>> arrived, and exits. In the end, pg_basebackup gets stuck.
>>>>
>>>> To fix the problem, I'd propose to change the background process so that it
>>>> checks whether all required WAL has arrived, every time data is received,
>>>> even if end of WAL file is not reached. Patch attached. Comments?
>>>
>>> This seems like a good thing in general.
>>>
>>> Why does it need to modify pg_receivexlog, though? I thought only
>>> pg_basebackup had this issue?
>>>
>>> I guess it is because of the change of the API to
>>> stream_continue_callback only?
>>
>> Yes, that's the reason why I changed continue_streaming() in
>> pg_receivexlog.c.
>>
>> But the reason why I changed segment_callback() in pg_receivexlog.c is not
>> the same. I did that because previously segment_finish_callback was called
>> only at the end of a WAL segment, but in the patch it can be called in the
>> middle of a segment. OTOH, segment_callback() must emit a verbose message
>> only when the current WAL segment is complete. So I had to add the check of
>> whether the current WAL segment is partial or complete into
>> segment_callback().
>
> Yeah, I caught that.
>
>
>>> Looking at it after your patch,
>>> stream_continue_callback and segment_finish_callback are the same.
>>> Should we perhaps just fold them into a single
>>> stream_continue_callback? Since you had to move the "detect segment
>>> end" to the caller anyway?
>>
>> No. I think we cannot do that because in pg_receivexlog they are not the
>> same.
>
> But couldn't they be made the same by making the same check as you put
> in for the verbose message above?
>

While reviewing and cleaning this patch up a bit I noticed it actually
broke pg_receivexlog in the renaming.

Here is a new version of the patch, reworked based on the above so
we're down to a single callback. I turned the "rename last segment file
even if it's not complete" behavior into a parameter of
ReceiveXlogStream() instead of trying to overload a third function onto
the callback (which is what broke pg_receivexlog).
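
As a rough illustration of the idea (a toy model only, not the attached
patch): the receive loop below asks a single callback after every chunk
of data whether to stop, and takes the "rename a partial last segment"
decision as a separate flag. Every name here (receive_stream, stop_cb,
stream_result, SEG_SIZE and the two callbacks) is made up for the
example, and the segment size is shrunk to 16 bytes so the behaviour is
easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy segment size; real WAL segments are 16 MB by default. */
#define SEG_SIZE 16u

/* Hypothetical single callback: asked after every received chunk. */
typedef bool (*stop_cb)(uint64_t pos, bool segment_finished, void *arg);

typedef struct
{
    uint64_t end_pos;      /* position reached when the loop ended */
    bool     stopped;      /* did the callback ask us to stop? */
    bool     last_renamed; /* would the last segment get its final name? */
    int      cb_calls;     /* how often the callback was consulted */
} stream_result;

/*
 * Simulated receive loop. "available" models how much WAL the server has
 * to send; running out without a stop request models the stuck case.
 */
static stream_result
receive_stream(uint64_t start, uint64_t available, uint64_t chunk,
               stop_cb stop, void *arg, bool rename_partial)
{
    stream_result r = { start, false, false, 0 };

    assert(chunk > 0);
    while (r.end_pos < available)
    {
        uint64_t n = available - r.end_pos;
        uint64_t room = SEG_SIZE - (r.end_pos % SEG_SIZE);
        bool     seg_done;

        if (n > chunk)
            n = chunk;
        if (n > room)
            n = room;           /* never cross a segment boundary in one step */
        r.end_pos += n;
        seg_done = (r.end_pos % SEG_SIZE) == 0;

        r.cb_calls++;
        if (stop(r.end_pos, seg_done, arg))
        {
            r.stopped = true;
            /* A partial segment is only finalized if the caller asked for it. */
            r.last_renamed = seg_done || rename_partial;
            return r;
        }
    }
    return r;
}

/* Stop as soon as the required end position has arrived. */
static bool
stop_at_end(uint64_t pos, bool segment_finished, void *arg)
{
    (void) segment_finished;
    return pos >= *(uint64_t *) arg;
}

/* Older behaviour for comparison: only decide at a segment boundary. */
static bool
stop_at_segment_end(uint64_t pos, bool segment_finished, void *arg)
{
    return segment_finished && pos >= *(uint64_t *) arg;
}
```

With 10 bytes available, a required end position of 8 and chunks of 4,
stop_at_end ends the loop at position 8 after two callback calls, while
stop_at_segment_end never fires because the segment boundary at 16 is
never reached. That second case corresponds to the idle standby described
earlier in the thread.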

How does this look? Have I overlooked any cases?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment: xlog_stream2.patch
Description: Binary data
