Re: [GENERAL] Help estimating database and WAL size
On 2012-10-08, Daniel Serodio (lists) <daniel.li...@mandic.com.br> wrote:
> We are preparing a PostgreSQL database for production usage and we need
> to estimate the storage size for this database. We're a team of
> developers with little expertise in database administration, so we are
> doing research, reading manuals and using our general IT knowledge to
> achieve this.
>
> We have actual data to migrate to this database and some rough
> estimates of growth. For the sake of the example, let's say we have an
> estimated growth of 50% per year.
>
> The point is: what's the proper general technique for doing a good size
> estimation?
>
> We are estimating the storage usage by the following rules. Topics where
> we need advice are marked with ** asterisks **. Feedback on the whole
> process is more than welcome.
>
> 1) Estimate the size of each table
>
>    1.1) Discover the actual size of each row.
>         - For fields with a fixed size (like bigint, char, etc.) we used
>           the sizes described in the documentation
>         - For fields with a dynamic size (like text) we estimated the
>           string length and used the function
>           select pg_column_size('expected text here'::text)

Long text is subject to compression, and pg_column_size doesn't seem to
account for compression; the compression used is an LZ variant.

>         - We added 4 more bytes for the OID that PostgreSQL uses
>           internally

The OID is optional; IIRC the transaction-id (xid) columns are not.

>    1.2) Multiply the size of each row by the number of estimated rows
>
>    ** Do I need to consider any overhead here, like row or table
>       metadata? **

Page size is 8K. Column overhead is 1 byte per not-NULL column; NULLs
are free.

> 2) Estimate the size of each table index
>
>    ** Don't know how to estimate this, need advice here **

IIRC: (size of the data being indexed + 8 bytes) / fill factor.

> 3) Estimate the size of the transaction log
>
>    ** We've got no idea how to estimate this, need advice **

How big are your transactions?
> 4) Estimate the size of the backups (full and incremental)
>
>    ** Don't know how to estimate this, need advice here **

That depends on the format you use; backups tend to compress well.

> 5) Sum all the estimates for the actual minimum size

No, that gives you an estimated size, not an actual one.

> 6) Apply a factor of 1.5x (the 50% growth) to the sum of estimates 1, 2
>    and 4 for the minimum size after 1 year
>
> 7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates
>    5 and 6 for a good safety margin
>
> I know the rules got pretty extensive; please let me know if you need
> more data or examples for a better understanding. We've also posted this
> question to http://dba.stackexchange.com/q/25617/10166
>
> Thanks in advance,
> Daniel Serodio

-- 
⚂⚃ 100% natural

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
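A practical complement to the hand-computed estimate above: load a few
thousand rows of representative sample data into a test instance, then let
PostgreSQL report the real on-disk sizes, overhead included. A sketch
(`my_table` and `my_table_pkey` are placeholder names):

```sql
-- Per-value size of a representative dynamic-width value:
SELECT pg_column_size('expected text here'::text);

-- Actual heap size of a table, excluding indexes and TOAST:
SELECT pg_size_pretty(pg_relation_size('my_table'));

-- Table plus all of its indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('my_table'));

-- Size of a single index (answers question 2 empirically):
SELECT pg_size_pretty(pg_relation_size('my_table_pkey'));
```

Dividing the measured sizes by the sample row count gives a per-row figure
that already includes page, tuple-header and fill-factor overhead, which is
much more reliable than summing documented column widths.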
Re: [GENERAL] Help estimating database and WAL size
On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:
> 3) Estimate the size of the transaction log
>
>    ** We've got no idea how to estimate this, need advice **

Postgres doesn't have a 'transaction log'; it has the WAL (Write-Ahead
Log). WAL segments are typically 16MB each. On databases with a really
heavy write load, I might bump checkpoint_segments as high as 60, which
seems to result in about 120 of them being created, 2GB total. These
files get reused, unless you are archiving them to implement a
continuous realtime backup system (which enables PITR, Point-in-Time
Recovery).

-- 
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
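The knobs mentioned above live in postgresql.conf. As a rough sketch (the
values are illustrative, not recommendations): with 16MB segments,
steady-state on-disk WAL usage normally stays around 2-3x
checkpoint_segments worth of files, unless archiving falls behind.

```ini
# postgresql.conf fragment (pre-9.5 style, matching the discussion above)
checkpoint_segments = 60        # ~120 recycled 16MB segments, ~2GB ceiling

# Only needed for continuous archiving / PITR; the archive, not pg_xlog,
# then grows with write volume:
#archive_mode = on
#archive_command = 'cp %p /path/to/archive/%f'
```

So for sizing purposes the WAL directory itself is roughly bounded, but any
WAL archive grows in proportion to total write traffic.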
Re: [GENERAL] moving from MySQL to pgsql
On 2012-10-10, Vineet Deodhar <vineet.deod...@gmail.com> wrote:
> Hi!
> At present, I am using MySQL as the backend for my work. Because of the
> licensing implications, I am considering shifting from MySQL to pgsql.
> Typically, my apps are multi-user, web based or LAN based.
>
> 1) I read over the internet that Postgres is not threaded, but every
> connection gets its own process. The OS will distribute the processes
> across the processors. Basically a single connection will not be any
> faster with SMP, but multiple connections will be. MySQL is a
> multi-threaded server, so it can use many processors; a separate thread
> is created for each connection.
> source:
> http://dcdbappl1.cern.ch:8080/dcdb/archive/ttraczyk/db_compare/db_compare.html#Comparison+of+Oracle%2C+MySQL+and+Postgres+DBMS
> In what way might this affect my app's performance?

It's going to hurt a bit if creating processes is expensive and you need
many of them.

> 2) I run MySQL from a USB stick. There is no installation required (on
> WinXP; not tried on Ubuntu). Is it the same for pgsql?

It's probably possible, but USB flash gets you all the disadvantages of
an SSD with no benefits (except portability). You'd probably have to
reformat the USB stick to NTFS - Windows FAT doesn't support Postgres -
and you'd also need to do a custom install.

> 3) Can I simulate MySQL's TINYINT data type (using maybe a custom data
> type or something else)?

Probably. What properties of tinyint do you need?

-- 
⚂⚃ 100% natural
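One hedged answer to question 3: PostgreSQL's smallest integer type is the
2-byte smallint, but a domain over it can reproduce TINYINT's range check.
A sketch (the domain name and the unsigned 0-255 range are assumptions;
MySQL's signed TINYINT would be -128 to 127):

```sql
-- Emulate MySQL's TINYINT UNSIGNED as a constrained smallint:
CREATE DOMAIN tinyint AS smallint
    CHECK (VALUE >= 0 AND VALUE <= 255);

-- The domain is then usable anywhere a type is, including keys:
CREATE TABLE year_master (
    pkid  tinyint PRIMARY KEY,   -- stored as a 2-byte smallint
    label text
);
```

The range check is enforced, but storage is still 2 bytes per value; there
is no 1-byte integer type in stock PostgreSQL.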
Re: [GENERAL] moving from MySQL to pgsql
On 2012-10-11, Vineet Deodhar <vineet.deod...@gmail.com> wrote:
> To give an example, I have tables for storing master records (year
> master, security master, etc.) for which a pkid TINYINT is just
> sufficient. These pkids are used as fk constraints in tables for
> storing business transactions. The no. of rows in the business
> transaction tables is in the millions. Here, I NEED to worry about the
> storage space occupied by the pkid fields.

With disk at about 50c/gigabyte, why is it you need to worry?

-- 
⚂⚃ 100% natural
Re: [GENERAL] non-integer constant in ORDER BY: why exactly, and documentation?
On 2012-10-11, David Johnston <pol...@yahoo.com> wrote:
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Ken Tanzer
> Sent: Thursday, October 11, 2012 4:49 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] non-integer constant in ORDER BY: why exactly, and
> documentation?
>
> Hi. I recently ran a query that generated the same error as this:
>
>   SELECT * FROM generate_series(1,10) ORDER BY 'foo';
>   ERROR: non-integer constant in ORDER BY
>   LINE 1: SELECT * FROM generate_series(1,10) ORDER BY 'foo';

Yeah, it seems there's a difference between a constant and a
constant-valued expression. Compare:

  SELECT * FROM generate_series(1,10) ORDER BY 'foo'::text;
  SELECT * FROM generate_series(1,10) ORDER BY 1::int desc;
  SELECT * FROM generate_series(1,10) ORDER BY 1 desc;

-- 
⚂⚃ 100% natural
Re: [GENERAL] stored procedure multiple call call question
On 2012-10-02, Chris McDonald <chrisjonmcdon...@gmail.com> wrote:
> Hi,
> If I had a single table targ to insert into, I would do an
> INSERT INTO targ SELECT thiscol, thatcol, theothercol FROM FOO.
> The problem is that I have tables targ1, targ2, ... targn to insert
> things into, and a nice stored procedure myproc which does the
> insertion into all 3 tables - the problem is that I don't see how I can
> effectively do
> INSERT INTO myproc SELECT thiscol, thatcol, theothercol FROM FOO.

See the chapter on partitioning for info on how to write a rule or
trigger that supports syntax like that. However, with what you already
have, you can do this:

  SELECT myproc(thiscol, thatcol, theothercol) FROM FOO;

-- 
⚂⚃ 100% natural
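For completeness, here is one hedged sketch of the trigger approach
(PostgreSQL 9.1+; all view, table and column names are hypothetical, and
the column types are assumed to be int): an INSTEAD OF trigger on a view
lets the familiar INSERT ... SELECT syntax fan each row out to several
tables.

```sql
-- A dummy view whose only job is to receive INSERTs:
CREATE VIEW targ_all (thiscol, thatcol, theothercol) AS
    SELECT NULL::int, NULL::int, NULL::int WHERE false;

-- Fan one incoming row out to the three target tables:
CREATE FUNCTION targ_all_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO targ1 (thiscol)     VALUES (NEW.thiscol);
    INSERT INTO targ2 (thatcol)     VALUES (NEW.thatcol);
    INSERT INTO targ3 (theothercol) VALUES (NEW.theothercol);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER targ_all_ins INSTEAD OF INSERT ON targ_all
    FOR EACH ROW EXECUTE PROCEDURE targ_all_insert();

-- Now the INSERT ... SELECT form works against the view:
INSERT INTO targ_all SELECT thiscol, thatcol, theothercol FROM foo;
```

The SELECT myproc(...) FROM foo form is simpler if the procedure already
exists; the view/trigger form only pays off if many callers want plain
INSERT syntax.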
[GENERAL] Improve MMO Game Performance
We are all aware of the popular trend of MMO games, where players face
each other live. My area of concern is storage of player moves and game
results.

We are using C# and PostgreSQL. The game client is browser-based ASP.NET
and calls C# functions for all database-related processing.

To understand my query, please consider the following scenario. We store
game progress in a postgres table. A tournament starts with four players
and the following activity:

- Each player starts with 100 hitpoints
- Player 1 makes a strike (we refer to a chart to convert blows to
  hitpoints, with a random-range %)
- Player 2 has 92 HP, and returns a light blow, so player 1 has 98 HP

The above two rounds will now be in the Game Progress table, roughly as:

  Row | Player1HP | Player2HP | StrikeFrom | StrikeTo | ReturnStrikeHP | Round
    1 |       100 |       100 |            |          |              0 |     0
    2 |        98 |        92 | P1         | P2       |              2 |     1

(There are also TimeStamp and StrikeMethod columns.)

There is a tremendous flow of SQL queries. There are on average/minimum
100 tournaments online per 12 minutes, or 500 players per hour. In the
Game Progress table we store each player move: a 12-round tournament of
4 players can produce 48 records, plus around the same number for spells
or special items - a total of 96 per tournament, or 48000 record inserts
per hour (500 players/hour).

Are there any particular settings or methods available to improve just
the insert operations?

thanks
arvind
Re: [GENERAL] moving from MySQL to pgsql
On Sat, Oct 13, 2012 at 3:22 AM, Jasen Betts <ja...@xnet.co.nz> wrote:
> On 2012-10-11, Vineet Deodhar <vineet.deod...@gmail.com> wrote:
>> To give an example, I have tables for storing master records (year
>> master, security master, etc.) for which a pkid TINYINT is just
>> sufficient. These pkids are used as fk constraints in tables for
>> storing business transactions. The no. of rows in the business
>> transaction tables is in the millions. Here, I NEED to worry about
>> the storage space occupied by the pkid fields.
>
> With disk at about 50c/gigabyte, why is it you need to worry?

See upthread: the OP is running off a USB stick. If he's running
MyISAM, the postgres database is going to be larger and there is going
to be a lot more writing than he's used to.

merlin
Re: [GENERAL] Improve MMO Game Performance
On Sat, Oct 13, 2012 at 1:52 PM, Arvind Singh <arvin...@hotmail.com> wrote:
> [...] a total of 96 per tournament, or 48000 record inserts per hour
> (500 players/hour)
>
> Are there any particular settings or methods available to improve just
> the insert operations?

That's below 15 inserts/s - not something to worry about on recent
hardware. Some general advice:

- avoid too many unused indexes
- keep your model normalized
- keep pg_xlog on a separate device
- follow the tuning advice from the wiki:
  http://wiki.postgresql.org/wiki/Performance_Optimization
Re: [GENERAL] Improve MMO Game Performance
On Sat, Oct 13, 2012 at 5:52 AM, Arvind Singh <arvin...@hotmail.com> wrote:
> [...] In the Game Progress table, we are storing each player move: a
> 12-round tournament of 4 players can produce 48 records, plus around
> the same number for spells or special items - a total of 96 per
> tournament, or 48000 record inserts per hour (500 players/hour)

That's only about 13 inserts per second, and if you're batching them up
in transactions then you could easily be doing only one insert per
second or so. My laptop could handle that load easily.
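To illustrate the batching point, a sketch (the table and column names are
hypothetical): either wrap many INSERTs in one transaction so they share a
single commit, or collapse them into a multi-row INSERT.

```sql
-- Many statements, one commit (one WAL flush instead of one per row):
BEGIN;
INSERT INTO game_progress (round, p1_hp, p2_hp) VALUES (0, 100, 100);
INSERT INTO game_progress (round, p1_hp, p2_hp) VALUES (1, 98, 92);
-- ... remaining rounds ...
COMMIT;

-- Or a single multi-row INSERT, which is also one statement to parse:
INSERT INTO game_progress (round, p1_hp, p2_hp)
VALUES (0, 100, 100),
       (1, 98, 92);
```

Per-transaction fsync cost usually dominates small-row insert workloads, so
batching a tournament's rounds into one transaction buys far more than any
server-side tuning.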
[GENERAL] citext question
Hi

I have played around a bit with the citext extension. It looks like it
is a lot like the text data type - almost like a memo field. Is there
any way to restrict the length of citext fields, like char and varchar
fields?

Thanks
H.F.
[GENERAL] database corruption questions
Hi

Are there any best practices for avoiding database corruption? I suppose
the most obvious one is to have a UPS if it's a desktop machine. How do
you detect corruption in a PostgreSQL database, and are there any ways
to fix it besides restoring a backup?

Thanks
H.F.
Re: [GENERAL] database corruption questions
On Saturday, October 13, 2012 at 23:53, Heine Ferreira wrote:
> Hi
>
> Are there any best practices for avoiding database corruption?

In my experience, database corruption always comes down to flaky disk
drives. Keep your disks new and shiny, e.g. less than 3 years old, and
go for some kind of redundancy in a RAID configuration.

regards, Leif
Re: [GENERAL] citext question
On Oct 13, 2012, at 17:48, Heine Ferreira <heine.ferre...@gmail.com> wrote:
> Hi
>
> I have played around a bit with the citext extension. It looks like it
> is a lot like the text data type - almost like a memo field. Is there
> any way to restrict the length of citext fields, like char and varchar
> fields?
>
> Thanks
> H.F.

Try citext(25)... if it works then yes, otherwise no...

David J.
Re: [GENERAL] database corruption questions
On 10/13/12 3:04 PM, Leif Biberg Kristensen wrote:
>> Are there any best practices for avoiding database corruption?
>
> In my experience, database corruption always comes down to flaky disk
> drives. Keep your disks new and shiny, e.g. less than 3 years old, and
> go for some kind of redundancy in a RAID configuration.

Also, use ECC RAM so creeping bit rot doesn't slip in from memory
without detection. If you use a RAID controller with a write-back
cache, be sure it has a BBU or flash backup.

-- 
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
Re: [GENERAL] citext question
On Oct 13, 2012, at 6:34 PM, David Johnston <pol...@yahoo.com> wrote:
>> Is there any way to restrict the length of citext fields, like char
>> and varchar fields?
>
> Try citext(25)... if it works then yes, otherwise no...

No, citext(length) is not supported. However, you can define a check
constraint, if that fulfills your requirement, as given below:

  create table test2(col citext check(length(col) <= 3));

Or you can create a domain which you can use in a CREATE TABLE command,
as given below:

  CREATE DOMAIN citext_char AS citext CHECK(length(value) <= 3);

Thanks & Regards,
Vibhor Kumar
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Blog: http://vibhork.blogspot.com
Re: [GENERAL] Improve MMO Game Performance
On 10/13/2012 07:52 PM, Arvind Singh wrote:
> We are all aware of the popular trend of MMO games, where players face
> each other live. My area of concern is storage of player moves and
> game results. [...] We store game progress in a postgres table.

I suspect that this design will scale quite poorly. As others have
noted, it should work OK right now if tuned correctly. If you expect
this to get bigger, though, consider splitting it up a bit. What I'd
want to do is:

- Store data that must remain persistent in the main PostgreSQL DB:
  things like the outcomes of games that have ended, overall scores,
  etc.
- Use memcached or a similar system to cache any data that doesn't have
  to be perfectly up-to-date and/or doesn't change much, like rankings
  or player names;
- Use LISTEN / NOTIFY to do cache invalidation of memcached data if
  necessary; and
- Store transient data in `UNLOGGED` tables with async commit enabled,
  a long `commit_delay`, etc. Possibly on a different DB server.

You'll certainly want to use different transactions to separate your
important data, where durability matters, from your transient data. I'd
run two different Pg clusters with separate table storage and WAL, so
the transient-data one could run with the quickest-and-dirtiest
settings possible. I might not even store the transient data in Pg at
all; I might well use a system that offers much weaker consistency,
atomicity and integrity guarantees.

-- 
Craig Ringer
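A minimal sketch of the transient-data side of that split (all table and
column names are assumptions): UNLOGGED tables skip WAL entirely, at the
cost of being truncated after a crash, and synchronous_commit can be
relaxed per session for data you can afford to lose a moment of.

```sql
-- In-flight tournament state; lost on crash, which is acceptable here:
CREATE UNLOGGED TABLE game_progress_live (
    tournament_id int,
    round         int,
    p1_hp         int,
    p2_hp         int,
    at            timestamptz DEFAULT now()
);

-- This session only: COMMIT returns before the WAL flush hits disk,
-- risking at most a sub-second window of committed-but-unflushed data.
SET synchronous_commit = off;
```

Finished-game outcomes would still go to ordinary logged tables in a
separate, fully-durable transaction.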
Re: [GENERAL] database corruption questions
On 10/14/2012 05:53 AM, Heine Ferreira wrote:
> Hi
>
> Are there any best practices for avoiding database corruption?

* Maintain rolling backups with proper ageing. For example, keep one a
  day for the last 7 days, then one a week for the last 4 weeks, then
  one a month for the rest of the year, then one a year.

* Use a warm standby with log shipping and/or replication to maintain a
  live copy of the DB.

* If you want point-in-time recovery, keep a few days' or weeks' worth
  of WAL archives and a basebackup around. That'll help you recover
  from those "oops, I meant DROP TABLE unimportant; not DROP TABLE
  vital_financial_records;" issues.

* Keep up to date with the latest PostgreSQL patch releases. Don't be
  one of those people still running 9.0.0 when 9.0.10 is out.

* Plug-pull test your system when you're testing it, before going live.
  Put it under load with something like pgbench, then literally pull
  the plug out. If your database doesn't come back up fine, you have
  hardware, OS or configuration problems.

* Don't `kill -9` the postmaster. It should be fine, but it's still not
  smart.

* ABSOLUTELY NEVER DELETE postmaster.pid

* Use good quality hardware with proper cooling and a good quality
  power supply. If possible, ECC RAM is a nice extra.

* Never, ever, ever use cheap SSDs. Use good quality hard drives or
  (after proper testing) high-end SSDs. Read the SSD reviews
  periodically posted on this mailing list if considering using SSDs.
  Make sure the SSD has a supercapacitor or other reliable option for
  flushing its write cache on power loss. Always do repeated plug-pull
  testing when using SSDs.

* Use a solid, reliable file system. zfs-on-linux, btrfs, etc. are not
  the right choices for a database you care about. Never, ever, ever
  use FAT32.

* If on Windows, do not run an anti-virus program on the database
  server. Nobody should be using it for other things or running
  programs on it anyway.

* Avoid RAID 5, mostly because the performance is terrible, but also
  because I've seen corruption issues with rebuilds from parity on
  failing disks.

* Use a good quality hardware RAID controller with a battery-backed
  cache unit if you're using spinning disks in RAID. This is as much
  for performance as reliability; a BBU will make an immense difference
  to database performance.

* If you're going to have a UPS (you shouldn't need one, as your system
  should be crash-safe), don't waste your money on a cheap one. Get a
  good online double-conversion unit that does proper power filtering.
  Cheap UPSs are just a battery with a fast switch; they provide no
  power filtering, and what little surge protection they offer is done
  with a component that wears out after absorbing a few surges,
  becoming totally ineffective. Since your system should be crash-safe,
  a cheap UPS will do nothing for corruption protection; it'll only
  help with uptime.

-- 
Craig Ringer
Re: [GENERAL] database corruption questions
On 10/13/12 7:13 PM, Craig Ringer wrote:
> * Use a good quality hardware RAID controller with a battery-backed
>   cache unit if you're using spinning disks in RAID. This is as much
>   for performance as reliability; a BBU will make an immense
>   difference to database performance.

A comment on this one: I have some test servers with lots of SAS and/or
SATA drives on controllers like the LSI Logic 9261-8i, with 512MB or
1GB of battery-backed cache. I can configure the controller for JBOD
and use Linux mdraid raid10 and get the same performance as the
controller's native raid10, as long as the write-back cache is enabled.
Disable the write-back cache, and you might as well be using SATA JBOD.

-- 
john r pierce                          N 37, W 122
santa cruz ca                       mid-left coast
Re: [GENERAL] database corruption questions
On 10/14/2012 11:00 AM, John R Pierce wrote:
> I can configure the controller for JBOD and use Linux mdraid raid10
> and get the same performance as the controller's native raid10, as
> long as the write-back cache is enabled. Disable the write-back cache,
> and you might as well be using SATA JBOD.

Yeah, without the write-back cache you don't gain much. I run a couple
of DBs on plain old `md` RAID and I'm actually quite happy with it.

I've expanded this into a blog post and improved that section there:

http://blog.ringerc.id.au/2012/10/avoiding-postgresql-database-corruption.html

Comments appreciated.

-- 
Craig Ringer
Re: [GENERAL] database corruption questions
On 10/14/2012 05:53 AM, Heine Ferreira wrote:
> Hi
>
> Are there any best practices for avoiding database corruption?

I forgot to mention, you should also read:

http://www.postgresql.org/docs/current/static/wal-reliability.html

-- 
Craig Ringer
Re: [GENERAL] citext question
On 10/14/2012 05:48 AM, Heine Ferreira wrote:
> Hi
>
> I have played around a bit with the citext extension. It looks like it
> is a lot like the text data type - almost like a memo field. Is there
> any way to restrict the length of citext fields, like char and varchar
> fields?

First, don't use char(n) or plain char. Neither does what you (as a
sane and sensible person) probably expect them to do.

In PostgreSQL, varchar(n) is effectively the same as text with a
length(col_name) <= n CHECK constraint. There is no difference in how
they are stored, and there's no advantage to using varchar over text.

It's similar with citext. While citext doesn't accept a typmod to
constrain its length, you can and should use CHECK constraints as
appropriate in your data definitions.

-- 
Craig Ringer
Re: [GENERAL] database corruption questions
On Sun, Oct 14, 2012 at 1:13 PM, Craig Ringer <ring...@ringerc.id.au> wrote:
> * Never, ever, ever use cheap SSDs. Use good quality hard drives or
>   (after proper testing) high-end SSDs. Read the SSD reviews
>   periodically posted on this mailing list if considering using SSDs.
>   Make sure the SSD has a supercapacitor or other reliable option for
>   flushing its write cache on power loss. Always do repeated plug-pull
>   testing when using SSDs.

Interesting. My boss just bought a set of SSDs for some test systems,
with the intention of using them for our next deployment. They're
giving really great performance under pgbench, but we haven't yet done
a plug-pull test on any of them. I'll make sure I do that next week.

Is there an article somewhere about how best to do a plug-pull test? Or
is it as simple as: fire up pgbench, kill the power, bring things back
up, and see if anything isn't working?

ChrisA
Re: [GENERAL] database corruption questions
On 10/14/2012 12:02 PM, Chris Angelico wrote:
> Is there an article somewhere about how best to do a plug-pull test?
> Or is it as simple as: fire up pgbench, kill the power, bring things
> back up, and see if anything isn't working?

That's what I'd do, and what I've always done in the past, but others
here are much more experienced with testing gear into production.

You can also use pg_test_fsync and diskchecker.pl. See:

http://www.postgresql.org/docs/current/static/wal-reliability.html

I do repeated plug-pull tests and make sure fsync is being honoured.

-- 
Craig Ringer