[GENERAL] Sync production DB with development?

2014-10-08 Thread Israel Brewster
I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch production host database name | psql database name, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps?My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data.
---Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7293---BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD




Re: [GENERAL] Sync production DB with development?

2014-10-08 Thread Emanuel Calvo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512



El 08/10/14 a las 14:01, Israel Brewster escibió:
 I am currently doing periodic syncs of one of my production
 databases to my development database using the command pg_dump -ch
 production host database name | psql database name, run on my
 development server. This works well enough, but as the size of the
 production database grows, this command is, for obvious reasons,
 getting progressively slower (a full sync like this currently takes
 about 35 seconds). Is there a better way? Something that will only
  transfer records that are different on the production server, like
 rsync does for files perhaps?

You can setup a streaming server, however I wont' recommend to sync
from a production server.

Usually there is no need to have *all* the data from prod to
development. Both environments should be isolated for security reasons.

Other thing is to implement a QA server, streaming from the master or
taking a nightly snapshot with pg_basebackup. I think it could be more
than enough.

Actually, doing pg_dump | psql could take more time than pg_basebackup.

 
 My main concern here is the time it takes to sync, given that the
 database size will only continue growing as time passes (unless I
 start implementing an archive at some point). The current database
 has two years worth of records. I would assume that the time the
 sync takes would grow roughly linearly with the number of records,
 so I could easily be over a minute of sync time in another two
 years. I would really rather not have to wait several minutes every
 time I want to update my development data.

Which is the entire size of your production cluster?


- -- 
- --
Emanuel Calvo  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
Bs. As., Argentina (GMT-3)
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB
7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd
3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF
mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd
HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh
N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/
GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6
xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt
UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5
oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq
2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw
xDGMjU/lhV7A9MagRZa6
=g73R
-END PGP SIGNATURE-


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sync production DB with development?

2014-10-08 Thread Israel Brewster
On Oct 8, 2014, at 9:30 AM, Emanuel Calvo emanuel.ca...@2ndquadrant.com wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
 
 
 
 El 08/10/14 a las 14:01, Israel Brewster escibió:
 I am currently doing periodic syncs of one of my production
 databases to my development database using the command pg_dump -ch
 production host database name | psql database name, run on my
 development server. This works well enough, but as the size of the
 production database grows, this command is, for obvious reasons,
 getting progressively slower (a full sync like this currently takes
 about 35 seconds). Is there a better way? Something that will only
 transfer records that are different on the production server, like
 rsync does for files perhaps?
 
 You can setup a streaming server, however I wont' recommend to sync
 from a production server.

No, that wouldn't work well, because I need full access to my development 
server, and I need to be able to NOT have it in sync while I am working on it.

 
 Usually there is no need to have *all* the data from prod to
 development. Both environments should be isolated for security reasons.

Agreed. and no, I don't need all the data. But pg_dump doesn't give me an 
option to, say, only grab the last week of data.

 
 Other thing is to implement a QA server, streaming from the master or
 taking a nightly snapshot with pg_basebackup. I think it could be more
 than enough.

A QA server is great, and nightly snapshots are probably fine for that, however 
it doesn't help with my development server, I don't think.

 
 Actually, doing pg_dump | psql could take more time than pg_basebackup.
 
 
 My main concern here is the time it takes to sync, given that the
 database size will only continue growing as time passes (unless I
 start implementing an archive at some point). The current database
 has two years worth of records. I would assume that the time the
 sync takes would grow roughly linearly with the number of records,
 so I could easily be over a minute of sync time in another two
 years. I would really rather not have to wait several minutes every
 time I want to update my development data.
 
 Which is the entire size of your production cluster?

At the moment, only about 538MB. Which I realize isn't all that large in the 
grand scheme of databases. 

 
 
 - -- 
 - --
 Emanuel Calvo  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
 Bs. As., Argentina (GMT-3)
 -BEGIN PGP SIGNATURE-
 Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB
 7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd
 3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF
 mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd
 HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh
 N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/
 GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6
 xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt
 UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5
 oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq
 2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw
 xDGMjU/lhV7A9MagRZa6
 =g73R
 -END PGP SIGNATURE-
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sync production DB with development?

2014-10-08 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 so I could easily be over a minute of sync time in another two years. 
 I would really rather not have to wait several minutes every time I 
 want to update my development data.

A minute is really not that long of a wait, especially given the tradeoff 
in complexity. Still, if the majority of the time is spent moving old 
data from one or more tables, you could exclude those from the pg_dump 
with -T, then copy over some small subset of the table with a pair of 
COPY commands from prod to dev.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201410081635
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAlQ1oBIACgkQvJuQZxSWSsjSxgCgjhcAvjgoBgpYA2FEKiKovSos
l/QAn1tdZk69ku8Z1LArrFzESopr1/OB
=l59M
-END PGP SIGNATURE-



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sync production DB with development?

2014-10-08 Thread Emanuel Calvo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512



El 08/10/14 a las 17:24, Israel Brewster escibió:
 On Oct 8, 2014, at 9:30 AM, Emanuel Calvo
 emanuel.ca...@2ndquadrant.com wrote:
 
 
 
 El 08/10/14 a las 14:01, Israel Brewster escibió:
 I am currently doing periodic syncs of one of my production 
 databases to my development database using the command
 pg_dump -ch production host database name | psql
 database name, run on my development server. This works
 well enough, but as the size of the production database
 grows, this command is, for obvious reasons, getting
 progressively slower (a full sync like this currently takes 
 about 35 seconds). Is there a better way? Something that will
 only transfer records that are different on the production
 server, like rsync does for files perhaps?
 
 You can setup a streaming server, however I wont' recommend to
 sync from a production server.
 
 No, that wouldn't work well, because I need full access to my
 development server, and I need to be able to NOT have it in sync
 while I am working on it.
 
 

That's the issue: streaming will left your development server as
read-only and I guess you don't want to do that.

 Usually there is no need to have *all* the data from prod to 
 development. Both environments should be isolated for security
 reasons.
 
 Agreed. and no, I don't need all the data. But pg_dump doesn't
 give me an option to, say, only grab the last week of data.
 

Yeah, at this point you will need to build your own sample data using
production values.

 
 Other thing is to implement a QA server, streaming from the master
 or taking a nightly snapshot with pg_basebackup. I think it could
 be more than enough.
 
 A QA server is great, and nightly snapshots are probably fine for
 that, however it doesn't help with my development server, I don't
 think.
 

Well, the nightly snapshots will allow you to sync every night your
development server and write over it during the day (you can sync
many times as you want).

 
 Actually, doing pg_dump | psql could take more time than
 pg_basebackup.
 
 
 My main concern here is the time it takes to sync, given that
 the database size will only continue growing as time passes
 (unless I start implementing an archive at some point). The
 current database has two years worth of records. I would
 assume that the time the sync takes would grow roughly
 linearly with the number of records, so I could easily be
 over a minute of sync time in another two years. I would
 really rather not have to wait several minutes every time I
 want to update my development data.
 
 Which is the entire size of your production cluster?
 
 At the moment, only about 538MB. Which I realize isn't all that
 large in the grand scheme of databases.
 

I think that at this point you shouldn't be worried at all.

Usually, you can build sample data DB and using it as a template.
So, any time you want to drp and recreate your database, you just do
a *create database* with the template option.


- -- 
- --
Emanuel Calvo  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
Bs. As., Argentina (GMT-3)
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCgAGBQJUNaRnAAoJEIBeI/HMagHm5sEQALsfKrf3UHoVAr3N9EQfvlza
Ds7ZXue48Mt63FHLlsscC/lsJlfublJXwnV/7H9aFgviVDgWRqpjfrtFWk/5WzXn
g0c6zbjB13uc5K50OQqeFjo4Sb1UMZqSInGLDa3wi2BCT8XQepUQk60Hy9YJo449
2VkibVrJPulEJN2pviL7nlHNQoW3A2KHne9uEc3sdVDtAPgXrbB45BW184Qgpc34
r2ReAqM8S533lTe/ZfXpn6ZUru8uJHXnwkRirt8HOelXeIYNxvZyjmzaFpXmt5ZG
grJfx0WZB+qTmT4lLw2OdwZ/3U/M1y4cMLnePmBWpdxph43gSH7AEDO2XMyhpsnQ
5To1zgfZexvZXZ1AIMAAShgGxPMLgCVy7lTdzfZVS92CiU2ou4gsh8s3wZpwjc68
VuplS28HIY2GNZQm4OackfAXwm9yR80YdW7rE6Il7eUx1pAv8OZJyQPDmoav4J7V
Ir7X9kIMK8JUtb+G79lpsQcBwJ6f+BGW+OMMqfYHfuSfPErLprYuoN9A7cZVJNtv
D9fWyHtcvyFMfyxfZmGdY1pK5M/9DWrI7e9ILp29oywZies0Zk9b9AuV3Hki8LZm
kQuJOswXKM5/g1U4QS9a5Pf0DF5qx4vAZq9OgtLnN8kyUykgZaHuJJzVSdE+wsOW
Q14aZJWWdJpBSu45NtrG
=Zb7i
-END PGP SIGNATURE-


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sync production DB with development?

2014-10-08 Thread Jeff Ross

On 10/8/14, 2:24 PM, Israel Brewster wrote:

Agreed. and no, I don't need all the data. But pg_dump doesn't give me an 
option to, say, only grab the last week of data.


You might want to check out pg_sample.  It lets you get a referentially 
consistent sample of a database.  You specify the number of rows you 
want and it will get do the rest, retrieving enough rows from other 
tables to satisfy any foreign key constraints.


https://github.com/mla/pg_sample

Jeff


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general