On Wed, Sep 27, 2017 at 4:08 PM, Igor Polishchuk <ora4...@gmail.com> wrote:

> Scott,
> Thank you for your insight. I do have some extra disk and network
> throughput to spare. However, my question is ‘Can I run rsync while
> streaming replication is running?’
>

Ahh, I see. Sorry.

You need to stop the slave, put the master into backup mode with
pg_start_backup(), and run the parallel rsync over the existing slave's data
directory (so the copy is differential).

Then run pg_stop_backup() on the master and start the slave.
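
A sketch of the whole sequence, assuming 9.6's exclusive backup mode and
placeholder host/path names (adjust for your environment):

  -- on the master
  SELECT pg_start_backup('catchup_rsync', true);

  # on the replica, after stopping it (pg_ctl stop)
  rsync -a --exclude=pg_xlog --exclude=postmaster.pid \
      master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/

  -- on the master, once the copy finishes
  SELECT pg_stop_backup();

  # then start the replica again (pg_ctl start)

Because most blocks are already in place on the slave, rsync only transfers
the files that changed.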

--Scott



> A streaming replica is a physical copy of the master, so why not? My
> concern is the possible silent introduction of block corruption that would
> not be fixed by the full-page images in the WAL files. I think such
> corruption should not happen, and I saw a few instances where running rsync
> seemed to work.
> I'm curious whether somebody is aware of a situation where corruption is
> likely to happen.
>
> Igor
>
> On Sep 27, 2017, at 12:48, Scott Mead <sco...@openscg.com> wrote:
>
>
>
> On Wed, Sep 27, 2017 at 1:59 PM, Igor Polishchuk <ora4...@gmail.com>
> wrote:
>
>> Sorry, here are the missing details, if it helps:
>> Postgres 9.6.5 on CentOS 7.2.1511
>>
>> > On Sep 27, 2017, at 10:56, Igor Polishchuk <ora4...@gmail.com> wrote:
>> >
>> > Hello,
>> > I have a multi-terabyte streaming replica on a busy database. When I
>> > set it up, repetitive rsyncs take at least 6 hours each.
>> > So, when I start the replica, it begins streaming, but it is many hours
>> > behind right from the start. It works for hours and cannot reach a
>> > consistent state, so the database is never opened for queries. I have
>> > plenty of WAL files available in the master’s pg_xlog, so the replica
>> > never uses archived logs.
>> > A question:
>> > Should I be able to run one more rsync from the master to my replica
>> > while it is streaming?
>> > The idea is to overcome the throughput limit imposed by a single
>> > recovery process on the replica and allow it to catch up more quickly.
>> > I remember doing it many years ago on Pg 8.4, and I have also heard
>> > from other people doing it. In all cases, it seemed to work.
>> > I'm just not sure whether there is a high risk of introducing some
>> > hidden data corruption, which I might not notice for a while on such a
>> > huge database.
>> > Any educated opinions on the subject here?
>>
>
> It really comes down to the amount of I/O (network and disk) your system
> can handle while under load. I've used two methods to do this in the past:
>
> - http://moo.nac.uci.edu/~hjm/parsync/
>
>   parsync (parallel rsync) is nice; it does all the hard work of
> parallelizing rsync for you. It's just a pain to get all the prereqs
> installed.
>
>
> - rsync --itemize-changes
>   Essentially, use this to get a list of the files that differ, split the
> list manually, and fire up a number of rsyncs over the chunks. parsync does
> this for you, but if you can't get it going for any reason, this works (a
> rough sketch follows below).
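>
> A rough sketch of that manual approach (the host name 'master' and the data
> directory paths are placeholders, not from the original setup):
>
>   # list the files that differ, without copying anything yet;
>   # skip '*deleting' lines -- note this breaks on paths with spaces
>   rsync -a --itemize-changes --dry-run \
>       master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/ \
>       | awk '$1 !~ /deleting/ {print $2}' > /tmp/filelist
>
>   # split the list into 8 chunks and run one rsync per chunk in parallel
>   split -n l/8 /tmp/filelist /tmp/chunk.
>   for f in /tmp/chunk.*; do
>       rsync -a --files-from="$f" \
>           master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/ &
>   done
>   wait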
>
>
> The real trick: after you do your parallel rsync, make sure that you run
> one final, single-threaded rsync to pick up any items the parallel passes
> missed.
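>
> For example (same placeholder paths; run this while the master is still in
> backup mode, and note that --delete will remove files that exist only on
> the replica, so keep recovery.conf excluded):
>
>   rsync -a --delete --exclude=pg_xlog --exclude=postmaster.pid \
>       --exclude=recovery.conf \
>       master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/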
>
> Remember, it's all about I/O.  The more parallel threads you use, the
> harder you'll beat up the disks / network on the master, which could impact
> production.
>
> Good luck
>
> --Scott
>
>
>> >
>> > Thank you
>> > Igor Polishchuk
>>
>>
>>
>>
>
>
>
> --
> --
> Scott Mead
> Sr. Architect
> *OpenSCG <http://openscg.com/>*
> http://openscg.com
>
>
>


-- 
--
Scott Mead
Sr. Architect
*OpenSCG <http://openscg.com>*
http://openscg.com
