[GENERAL] Sync production DB with development?
I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch production host database name | psql database name, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps?My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data. ---Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7293---BEGIN:VCARD VERSION:3.0 N:Brewster;Israel;;; FN:Israel Brewster ORG:Frontier Flying Service;MIS TITLE:PC Support Tech II EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com TEL;type=WORK;type=pref:907-450-7293 item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701; item1.X-ABADR:us CATEGORIES:General X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson END:VCARD
Re: [GENERAL] Sync production DB with development?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 El 08/10/14 a las 14:01, Israel Brewster escibió: I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch production host database name | psql database name, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps? You can setup a streaming server, however I wont' recommend to sync from a production server. Usually there is no need to have *all* the data from prod to development. Both environments should be isolated for security reasons. Other thing is to implement a QA server, streaming from the master or taking a nightly snapshot with pg_basebackup. I think it could be more than enough. Actually, doing pg_dump | psql could take more time than pg_basebackup. My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data. Which is the entire size of your production cluster? - -- - -- Emanuel Calvo http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services Bs. As., Argentina (GMT-3) -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB 7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd 3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/ GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6 xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5 oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq 2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw xDGMjU/lhV7A9MagRZa6 =g73R -END PGP SIGNATURE- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sync production DB with development?
On Oct 8, 2014, at 9:30 AM, Emanuel Calvo emanuel.ca...@2ndquadrant.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 El 08/10/14 a las 14:01, Israel Brewster escibió: I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch production host database name | psql database name, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps? You can setup a streaming server, however I wont' recommend to sync from a production server. No, that wouldn't work well, because I need full access to my development server, and I need to be able to NOT have it in sync while I am working on it. Usually there is no need to have *all* the data from prod to development. Both environments should be isolated for security reasons. Agreed. and no, I don't need all the data. But pg_dump doesn't give me an option to, say, only grab the last week of data. Other thing is to implement a QA server, streaming from the master or taking a nightly snapshot with pg_basebackup. I think it could be more than enough. A QA server is great, and nightly snapshots are probably fine for that, however it doesn't help with my development server, I don't think. Actually, doing pg_dump | psql could take more time than pg_basebackup. My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data. Which is the entire size of your production cluster? At the moment, only about 538MB. Which I realize isn't all that large in the grand scheme of databases. - -- - -- Emanuel Calvo http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services Bs. As., Argentina (GMT-3) -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB 7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd 3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/ GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6 xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5 oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq 2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw xDGMjU/lhV7A9MagRZa6 =g73R -END PGP SIGNATURE- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sync production DB with development?
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data. A minute is really not that long of a wait, especially given the tradeoff in complexity. Still, if the majority of the time is spent moving old data from one or more tables, you could exclude those from the pg_dump with -T, then copy over some small subset of the table with a pair of COPY commands from prod to dev. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201410081635 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAlQ1oBIACgkQvJuQZxSWSsjSxgCgjhcAvjgoBgpYA2FEKiKovSos l/QAn1tdZk69ku8Z1LArrFzESopr1/OB =l59M -END PGP SIGNATURE- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sync production DB with development?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 El 08/10/14 a las 17:24, Israel Brewster escibió: On Oct 8, 2014, at 9:30 AM, Emanuel Calvo emanuel.ca...@2ndquadrant.com wrote: El 08/10/14 a las 14:01, Israel Brewster escibió: I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch production host database name | psql database name, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps? You can setup a streaming server, however I wont' recommend to sync from a production server. No, that wouldn't work well, because I need full access to my development server, and I need to be able to NOT have it in sync while I am working on it. That's the issue: streaming will left your development server as read-only and I guess you don't want to do that. Usually there is no need to have *all* the data from prod to development. Both environments should be isolated for security reasons. Agreed. and no, I don't need all the data. But pg_dump doesn't give me an option to, say, only grab the last week of data. Yeah, at this point you will need to build your own sample data using production values. Other thing is to implement a QA server, streaming from the master or taking a nightly snapshot with pg_basebackup. I think it could be more than enough. A QA server is great, and nightly snapshots are probably fine for that, however it doesn't help with my development server, I don't think. Well, the nightly snapshots will allow you to sync every night your development server and write over it during the day (you can sync many times as you want). Actually, doing pg_dump | psql could take more time than pg_basebackup. My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data. Which is the entire size of your production cluster? At the moment, only about 538MB. Which I realize isn't all that large in the grand scheme of databases. I think that at this point you shouldn't be worried at all. Usually, you can build sample data DB and using it as a template. So, any time you want to drp and recreate your database, you just do a *create database* with the template option. - -- - -- Emanuel Calvo http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services Bs. As., Argentina (GMT-3) -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org iQIcBAEBCgAGBQJUNaRnAAoJEIBeI/HMagHm5sEQALsfKrf3UHoVAr3N9EQfvlza Ds7ZXue48Mt63FHLlsscC/lsJlfublJXwnV/7H9aFgviVDgWRqpjfrtFWk/5WzXn g0c6zbjB13uc5K50OQqeFjo4Sb1UMZqSInGLDa3wi2BCT8XQepUQk60Hy9YJo449 2VkibVrJPulEJN2pviL7nlHNQoW3A2KHne9uEc3sdVDtAPgXrbB45BW184Qgpc34 r2ReAqM8S533lTe/ZfXpn6ZUru8uJHXnwkRirt8HOelXeIYNxvZyjmzaFpXmt5ZG grJfx0WZB+qTmT4lLw2OdwZ/3U/M1y4cMLnePmBWpdxph43gSH7AEDO2XMyhpsnQ 5To1zgfZexvZXZ1AIMAAShgGxPMLgCVy7lTdzfZVS92CiU2ou4gsh8s3wZpwjc68 VuplS28HIY2GNZQm4OackfAXwm9yR80YdW7rE6Il7eUx1pAv8OZJyQPDmoav4J7V Ir7X9kIMK8JUtb+G79lpsQcBwJ6f+BGW+OMMqfYHfuSfPErLprYuoN9A7cZVJNtv D9fWyHtcvyFMfyxfZmGdY1pK5M/9DWrI7e9ILp29oywZies0Zk9b9AuV3Hki8LZm kQuJOswXKM5/g1U4QS9a5Pf0DF5qx4vAZq9OgtLnN8kyUykgZaHuJJzVSdE+wsOW Q14aZJWWdJpBSu45NtrG =Zb7i -END PGP SIGNATURE- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sync production DB with development?
On 10/8/14, 2:24 PM, Israel Brewster wrote: Agreed. and no, I don't need all the data. But pg_dump doesn't give me an option to, say, only grab the last week of data. You might want to check out pg_sample. It lets you get a referentially consistent sample of a database. You specify the number of rows you want and it will get do the rest, retrieving enough rows from other tables to satisfy any foreign key constraints. https://github.com/mla/pg_sample Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general