Re: [PERFORM] How we made Postgres upserts 2-3* quicker than MongoDB
On 08/01/16 19:07, Nicolas Paris wrote:
> Hello Mark,
> As far as I know, MongoDB is able to get better write performance thanks to scaling (easy-to-manage sharding). Postgresql cannot (it is not designed for that - complicated). Why compare postgresql & mongoDB performance on a standalone instance, since mongoDB is not really designed for that?

Yes, you can get better performance with mongo via the sharding route, but there are a number of quite bad downsides to mongo sharding - limited ability to perform aggregation, loss of unique key constraints other than the shard key, and a requirement for at minimum 4-6* the hardware (2 replicas for each shard = 4, plus 2 mongos gateway servers)... Actually pretty similar to the issues you see when trying to scale an RDBMS via sharding...

We tried doing some mongo sharding and the result was a massive drop in write performance, so we gave up pretty quickly...

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] How we made Postgres upserts 2-3* quicker than MongoDB
Hi all,

I just wrote an article about the postgres performance optimizations I've been working on recently, especially compared to our old MongoDB platform:

https://mark.zealey.org/2016/01/08/how-we-tweaked-postgres-upsert-performance-to-be-2-3-faster-than-mongodb
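For readers who haven't used it yet, the native upsert syntax new in 9.5 (INSERT ... ON CONFLICT) is the feature at the heart of this sort of comparison; a minimal sketch, with a made-up table and columns for illustration:

```sql
-- Hypothetical table; the article's actual schema may differ.
CREATE TABLE readings (
    sensor_id  integer PRIMARY KEY,
    value      numeric NOT NULL,
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Native upsert, new in 9.5: insert, or update the existing row on conflict.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO readings (sensor_id, value)
VALUES (42, 17.3)
ON CONFLICT (sensor_id)
DO UPDATE SET value = EXCLUDED.value, updated_at = now();
```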
Re: [PERFORM] Proposal for unlogged tables
On 04/01/16 18:12, Andres Freund wrote:
> Pages containing data of unlogged tables aren't ever flushed to disk unless
> a) a shutdown checkpoint is performed
> b) a buffer containing data from an unlogged table is used for something else
> c) the database being copied is the source of a CREATE DATABASE .. TEMPLATE
>
> Hence, if there's an unclean shutdown, there's absolutely no guarantee about the on-disk state of unlogged tables. Even if they haven't been modified in ages - there could have been many many dirty pages in shared buffers when crashing.
>
> Always flushing dirty pages of unlogged tables at checkpoint would greatly increase the overhead for memory-resident, write-heavy workloads that use unlogged tables.

If there were a command to flush a specific unlogged table to disk, it would work around all these issues, no? Perhaps if you marked the table as read only at the same time, it would flush it to disk and ensure no more data could be written to it (e.g. ALTER TABLE ... SET READ ONLY on an unlogged table would flush + not truncate after crash).

In our case this would be great, as we want to use these as permanent tables for speed; but after an initial data dump we don't change the data again, so we could just do this at the end of the import process.

Mark
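To be concrete, the end-of-import step being imagined here would look something like this. Note that ALTER TABLE ... SET READ ONLY is the hypothetical command proposed above - it does not exist in postgres - and the table name is illustrative:

```sql
-- Bulk-load into an unlogged table, generating no WAL for the data:
CREATE UNLOGGED TABLE import_data (id bigint, payload text);
COPY import_data FROM '/tmp/import.csv' WITH (FORMAT csv);

-- ...then, hypothetically, flush the table to disk and freeze it so a
-- later crash would not truncate it (proposed syntax, not implemented):
ALTER TABLE import_data SET READ ONLY;
```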
Re: [PERFORM] Proposal for unlogged tables
On 04/01/16 16:27, Yves Dorfsman wrote:
> I haven't tried, but won't converting an unlogged table into a logged table write all the inserts at once, instead of once per insert? Or are you wanting to do more bulk inserts into that table later?
> Are you trying to avoid running a CHECKPOINT? Are you afraid the activity on the other tables will create too much I/O?

Setting a table to logged still pushes all the inserts into the WAL, which we don't need, and causes a lot of extra IO. It also takes quite a long time, as it is basically rewriting the table and all indexes (e.g. 60 seconds for 2m rows on one of my test tables). We can do this, but a) it generates lots of additional IO which isn't really required for us, and b) it acquires an exclusive lock on the table, which is also not nice for us.

If the last data modification statement was run more than e.g. 30 seconds or 1 minute before an unclean shutdown (or the data was otherwise flushed to disk and there was no IO since then), can we not assume that the data is not corrupted, and hence not truncate the unlogged tables?

I have to admit that I have been surprised by this; it feels like unlogged tables are never written properly unless you do an explicit CHECKPOINT. I don't know how the internals work, but unlogged tables are definitely flushed to disk and persist through normal server restarts. It is just that, according to the docs, if the server ever has an unclean shutdown the tables are truncated even if they have not been updated in a year. I can't understand why it has to be like this, and it seems that it would be much nicer not to automatically truncate if it doesn't have to. This would be great in the situation where you can tolerate a low chance of data loss but want very quick upserts.

Mark
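For anyone following along, the 9.5 conversion being discussed is just the following (table name illustrative), with the caveats above about WAL volume and locking:

```sql
-- Writes the whole table and all its indexes into the WAL, and holds an
-- ACCESS EXCLUSIVE lock for the duration - hence the ~60s for 2m rows
-- mentioned above. After this, the table survives unclean shutdowns.
ALTER TABLE import_data SET LOGGED;
```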
[PERFORM] Proposal for unlogged tables
I've recently been doing some performance testing with unlogged tables vs logged tables on 9.5-rc1. Basically we're trying to do big loads of data into the database periodically. If on the very rare occasion the server crashes halfway through the import it's no big deal, so I've been looking specifically at unlogged tables, with transactions having synchronous_commit set to off.

When we do the inserts on a logged table with default WAL configuration settings we get a *lot* of disk IO generated (500MB/sec of pretty random IO - we have a nice SSD RAID array, but even so this maxes it out). Tweaking WAL settings (commit_delay, max_wal_size, min_wal_size) improves the situation quite a bit (50-100MB/sec of writes), but still we have no need to log the inserts into the WAL at the moment. Doing the imports to unlogged tables, we get virtually no IO until the insert process has finished, when the table gets flushed to disk, which is great for us.

However, I read in the manual that if the server ever has an unclean shutdown, all unlogged tables will be truncated. Obviously with 9.5 we can now alter tables to be logged/unlogged after insert, but this will still write all the inserts into the WAL. I can understand the requirement to truncate tables with active IO at the point of unclean shutdown, where you may get corrupted data; but I'm interested to find out how easy it would be to not perform the truncate for historical unlogged tables.

If the last data modification statement was run more than e.g. 30 seconds or 1 minute before an unclean shutdown (or the data was otherwise flushed to disk and there was no IO since then), can we not assume that the data is not corrupted, and hence not truncate the unlogged tables?

Thanks

Mark
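A sketch of the setup described above; the names and values are illustrative, not the exact test configuration:

```sql
-- In postgresql.conf, loosen WAL flushing for bulk loads (illustrative):
--   commit_delay = 100000      -- microseconds to wait before WAL flush
--   max_wal_size = '10GB'      -- space out checkpoints
--   min_wal_size = '1GB'       -- keep recycled WAL segments around

-- Per-session: commit without waiting for the WAL flush to hit disk.
SET synchronous_commit = off;

-- Unlogged table: no WAL written for its data at all, much less IO during
-- the load - but it is truncated after any unclean shutdown.
CREATE UNLOGGED TABLE bulk_load (id bigint, payload text);
```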