Re: [PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?
Mark Mielke wrote:
> The conclusion I read was that Linux O_SYNC behaves like O_DSYNC on other systems. For WAL, this seems satisfactory?

It would be if it didn't have any bugs or limitations, but it does. The one pointed out in the message I linked to suggests that a mix of buffered and O_SYNC direct I/O can cause a write error, with the exact behavior you get depending on the kernel version. That's a path better left unexplored as I see it. The kernels that have made some effort to implement this correctly are the newer Linux ones that actually expose O_DSYNC. My current opinion is that if you only have Linux O_SYNC, don't use it. The kernels that do have O_DSYNC haven't been around long enough to be proven or disproven as effective yet.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
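For anyone following along, a quick way to see which method a server is currently using (a minimal sketch; the values actually available depend on what the platform exposed when PostgreSQL was built):

    SHOW wal_sync_method;

    -- or look at the related settings together:
    SELECT name, setting, short_desc
      FROM pg_settings
     WHERE name IN ('wal_sync_method', 'fsync');

Per the advice above, on Linux kernels that only offer O_SYNC, open_sync is the value to stay away from.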
Re: [PERFORM] PostgreSQL as a local in-memory cache
Dimitri Fontaine wrote:
> Well I guess I'd prefer a per-transaction setting

Not possible, as many others have said. As soon as you make one transaction unsafe, all the other transactions have nothing to rely on.

On Thu, 17 Jun 2010, Pierre C wrote:
> A per-table (or per-index) setting makes more sense IMHO. For instance "on recovery, truncate this table" (this was mentioned before).

That would be much more valuable. I'd like to point out the costs involved in having a whole separate version of Postgres that has all this safety switched off. Package managers will not thank anyone for having to distribute another version of the system, and woe betide the user who installs the wrong version because it runs faster. No, this is much better as a configurable option.

Going back to "on recovery, truncate this table": we already have a mechanism for skipping the WAL writes on an entire table - we do that for tables that have been created in the current transaction. It would surely be a small step to allow this to be made permanent, by configuration, on a particular table.

Moreover, we already have a mechanism for taking a table that has had non-logged changes and turning it into a fully logged table - we do that to the above-mentioned tables when the transaction commits. I would strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may involve some more acrobatics if the table is currently in use by multiple transactions, but would be valuable.

This would allow users to create temporary tables that can be shared by several connections. It would also allow bulk loading of a single large table in parallel. With these suggestions, we would still need to WAL-log all the metadata changes, but I think in most circumstances that is not going to be a large burden on performance.

Matthew

-- Picard: I was just paid a visit from Q. Riker: Q! Any idea what he's up to? Picard: No. He said he wanted to be nice to me. Riker: I'll alert the crew.
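A minimal SQL sketch of the existing WAL-skip mechanism Matthew describes (table name and file path are made up; the optimization only applies because the table is created in the same transaction, and only when WAL archiving / streaming replication is not in use):

    BEGIN;
    CREATE TABLE staging_events (id bigint, payload text);
    -- Because staging_events was created in this transaction, the bulk load
    -- can skip per-row WAL and just sync the new file at commit time.
    COPY staging_events FROM '/tmp/events.csv' WITH CSV;
    COMMIT;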
Re: [PERFORM] B-Heaps
On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:
> Absolutely, and I said as much in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php, but applied to the Postgres B-tree indexes instead of heaps.

This is an interesting idea. I would guess that you could simulate this to some degree by compiling PG with a larger block size. Have you tried this to see whether/how much/for what kind of workloads it helps?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
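For reference, the block size is a compile-time choice (configure's --with-blocksize option, given in kilobytes), so trying this means building a separate server. A running server will report what it was built with:

    SHOW block_size;   -- in bytes; 8192 unless built otherwise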
Re: [PERFORM] PostgreSQL as a local in-memory cache
> I'd like to point out the costs involved in having a whole separate version

It must be a setting, not a version. For instance, suppose you have a sessions table for your website and a users table.

- Having ACID on the users table is of course a must.
- For the sessions table you can drop the "D": a server crash would force all users to log in to your website again, but if your server crashes often enough for your users to complain about that, you have another problem anyway.

Having the sessions table not WAL-logged (i.e. faster) would not prevent you from having sessions.user_id REFERENCES users( user_id )... so mixing safe and unsafe tables would be much more powerful than just having unsafe tables.

And I really like the idea of non-WAL-logged indexes, too. Since they can be rebuilt as needed, the DBA could choose between faster index updates with a rebuild on crash, or normal updates and fast recovery. The same goes for materialized views and the like: you can rebuild them after a crash, and the added update speed would be welcome.

> Moreover, we already have a mechanism for taking a table that has had non-logged changes, and turning it into a fully logged table - we do that to the above mentioned tables when the transaction commits. I would strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may involve some more acrobatics if the table is currently in use by multiple transactions, but would be valuable.

I believe the old discussions called this ALTER TABLE SET PERSISTENCE.

> This would allow users to create temporary tables that can be shared by several connections. It would also allow bulk loading in parallel of a single large table.

This would need to WAL-log the entire table to send it to the slaves if replication is enabled, but that is still a lot faster than replicating each record.
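A hypothetical schema for the sessions/users example above (all names invented; the point is that only the durability of sessions would be given up - the foreign key to the fully durable users table still behaves normally):

    CREATE TABLE users (
        user_id  bigserial PRIMARY KEY,
        email    text NOT NULL UNIQUE
    );

    -- The kind of table one would like to exempt from WAL logging:
    -- losing its contents in a crash only forces users to log in again.
    CREATE TABLE sessions (
        session_id  text PRIMARY KEY,
        user_id     bigint NOT NULL REFERENCES users(user_id),
        expires_at  timestamptz NOT NULL
    );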
Re: [PERFORM] requested shared memory size overflows size_t
On Fri, Jun 18, 2010 at 12:46:11AM +0100, Tom Wilcox wrote:
> On 17/06/2010 22:41, Greg Smith wrote:
>> Tom Wilcox wrote:
>>> Any suggestions for good monitoring software for linux?
>> By monitoring, do you mean for alerting purposes or for graphing purposes? Nagios is the only reasonable choice for the former, while doing at best a mediocre job at the latter. For the latter, I've found that Munin does a good job of monitoring Linux and PostgreSQL in its out-of-the-box configuration, in terms of providing useful activity graphs. And you can get it to play nicely with Nagios.
> Thanks Greg. I'll check Munin and Nagios out. It is very much for graphing purposes. I would like to be able to perform objective, platform-independent performance comparisons. Cheers, Tom

Zabbix 1.8+ is also worth taking a look at, and it can run off our favorite database. It allows for some very flexible monitoring and trending data collection.

Regards, Ken
Re: [PERFORM] B-Heaps
On Fri, 18 Jun 2010, Robert Haas wrote:
> This is an interesting idea. I would guess that you could simulate this to some degree by compiling PG with a larger block size. Have you tried this to see whether/how much/for what kind of workloads it helps?

To a degree, that is the case. However, if you follow the thread a bit further back, you will find evidence that when the index is in memory, increasing the page size actually decreases the performance, because it uses more CPU.

To make it clear - 8kB is not an optimal page size for either fully cached data or sparsely cached data. For disc access, large pages are appropriate, on the order of 256kB: below that, the time taken to fetch a page doesn't actually decrease much as the page shrinks, so we should be trying to get the maximum amount of work done per fetch without slowing fetches down significantly.

Given such a large page size, it would then be appropriate to have a better data structure inside the page. Currently, our indexes (at least the GiST ones - I haven't looked at the B-tree ones) use a simple linear array in the index page. Using a proper tree inside the index page would improve the CPU usage of the index lookups.

One detail that would need to be sorted out is the cache eviction policy. I don't know if it is best to evict whole 256kB pages, or to evict 8kB sub-pages. Probably the former, which would involve quite a bit of change to the shared memory cache. I can see the cache efficiency decreasing as a result of this, which is the only disadvantage I can see.

This sort of thing has been fairly well researched at an academic level, but has not been implemented in that many real-world situations. I would encourage its use in Postgres.

Matthew

-- Failure is not an option. It comes bundled with your Microsoft product. -- Ferenc Mantfeld
Re: [PERFORM] B-Heaps
Matthew Wakeling wrote:
> This sort of thing has been fairly well researched at an academic level, but has not been implemented in that many real world situations. I would encourage its use in Postgres.

I guess, but don't forget that work on PostgreSQL is driven by the problems people are actually running into. There's a long list of performance improvements sitting in the TODO list waiting for people to find time to work on them, ones that we're quite certain are useful. It's unlikely that anyone is going to chase after one of these speculative ideas from academic research instead of one of those. Your characterization of the potential speed-up here is "Using a proper tree inside the index page would improve the CPU usage of the index lookups", which seems quite reasonable. Regardless, when I ask myself "is that something I have any reason to suspect is a bottleneck on common workloads?", I can't think of one, and so I return to working on one of the things I already know is.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] HashAggregate slower than sort?
Jatinder Sangha j...@coalition.com wrote:
> I have a simple query that when planned either uses hash-aggregates or a sort depending on the amount of working memory available. The problem is that when it uses the hash-aggregates, the query runs 25% slower than when using the sort method. The table in question contains about 60 columns, many of which are boolean or 32-bit integers, and some are 64-bit integers. Many fields are text - and some of these can be quite long (e.g. 32kB).
>
> Obviously, I can re-write the query to use a distinct on (...) clause

Yeah, that seems prudent, to say the least.

> Why is the hash-aggregate slower than the sort? Is it something to do with the number of columns? i.e. when sorting, the first few columns defined on the table (id, version) make the row unique - but when using the hash-aggregate feature, presumably every column needs to be hashed, which takes longer, especially for long text fields?

Sounds like a reasonable guess to me. But since you're apparently retrieving about 9,000 wide rows in (worst case) 56 ms, it would seem that your active data set may be fully cached. If so, you could try reducing both random_page_cost and seq_page_cost to something in the 0.1 to 0.005 range and see if it improves the accuracy of the cost estimates. Not that you should go back to using DISTINCT on all 60 columns, including the big text columns; but these cost factors might help other queries pick faster plans.

-Kevin
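A rough sketch of both suggestions (the table name and most columns are guesses from the description; only id and version are mentioned in the thread):

    -- Deduplicate on the columns that already make a row unique, instead of
    -- a DISTINCT over all 60 columns including the large text fields:
    SELECT DISTINCT ON (id, version) *
      FROM wide_table
     ORDER BY id, version;

    -- If the active data set really is fully cached, tell the planner that
    -- page fetches are cheap (per session here; could also go in postgresql.conf):
    SET random_page_cost = 0.1;
    SET seq_page_cost = 0.1;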
Re: [PERFORM] requested shared memory size overflows size_t
Kenneth Marshall wrote:
> Zabbix-1.8+ is also worth taking a look at and it can run off our favorite database. It allows for some very flexible monitoring and trending data collection.

Note that while Zabbix is a perfectly reasonable general solution, the number of things it monitors out of the box for PostgreSQL: http://www.zabbix.com/wiki/howto/monitor/db/postgresql is only a fraction of what Munin shows you. The main reason I've been suggesting Munin lately is that it seems to get all the basics right for new users without them having to do anything but activate the PostgreSQL plug-in.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] B-Heaps
Greg Smith wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic level, but has not been implemented in that many real world situations. I would encourage its use in Postgres.
> I guess, but don't forget that work on PostgreSQL is driven by what problems people are actually running into. There's a long list of performance improvements sitting in the TODO list waiting for people to find time to work on them, ones that we're quite certain are useful. That anyone is going to chase after any of these speculative ideas from academic research instead of one of those is unlikely. Your characterization of the potential speed up here is "Using a proper tree inside the index page would improve the CPU usage of the index lookups", which seems quite reasonable. Regardless, when I consider "is that something I have any reason to suspect is a bottleneck on common workloads?", I don't think of any, and return to working on one of things I already know is instead.

There are two different things here concerning GiST indexes:

1) Larger block sizes mean more entries per GiST page, which results in more generic keys on those pages. That in turn results in a greater number of hits when the index is queried, so a larger part of the index is scanned. NB: this has nothing to do with caching / cache sizes; it holds for every IO model. Tests I performed showed performance improvements of over 200%. Since then, implementing a speedup has been on my 'want to do' list.

2) There are several approaches to getting the number of entries per page down. Two have been suggested in the thread referred to by Matthew (virtual pages - but how to order these? - and a tree within a page). It would be interesting to see whether ideas from Prokop's cache-oblivious algorithms match this problem, to find a suitable virtual page format.

regards, Yeb Havinga
Re: [PERFORM] B-Heaps
Yeb Havinga yebhavi...@gmail.com wrote:
> concerning gist indexes: 1) with larger block sizes and hence, larger # entries per gist page, results in more generic keys of those pages. This in turn results in a greater number of hits, when the index is queried, so a larger part of the index is scanned. NB this has nothing to do with caching / cache sizes; it holds for every IO model. Tests performed by me showed performance improvements of over 200%. Since then implementing a speedup has been on my 'want to do list'.

As I recall, the better performance in your tests was with *smaller* GiST pages, right? (The above didn't seem entirely clear on that.)

-Kevin
Re: [PERFORM] B-Heaps
Greg Smith g...@2ndquadrant.com writes:
> Your characterization of the potential speed up here is "Using a proper tree inside the index page would improve the CPU usage of the index lookups", which seems quite reasonable. Regardless, when I consider "is that something I have any reason to suspect is a bottleneck on common workloads?", I don't think of any, and return to working on one of the things I already know is instead.

Note also that this doesn't do a thing for b-tree indexes, which already have an intelligent within-page structure. So that instantly makes it not a mainstream issue. Perhaps somebody will be motivated to work on it, but most of us are chasing other things.

regards, tom lane
Re: [PERFORM] B-Heaps
Kevin Grittner wrote:
> Yeb Havinga yebhavi...@gmail.com wrote:
>> concerning gist indexes: 1) with larger block sizes and hence, larger # entries per gist page, results in more generic keys of those pages. This in turn results in a greater number of hits, when the index is queried, so a larger part of the index is scanned. NB this has nothing to do with caching / cache sizes; it holds for every IO model. Tests performed by me showed performance improvements of over 200%. Since then implementing a speedup has been on my 'want to do list'.
> As I recall, the better performance in your tests was with *smaller* GiST pages, right? (The above didn't seem entirely clear on that.)

Yes, making pages smaller made index scanning faster.

-- Yeb
Re: [PERFORM] PostgreSQL as a local in-memory cache
> It must be a setting, not a version. For instance suppose you have a sessions table for your website and a users table. - Having ACID on the users table is of course a must; - for the sessions table you can drop the "D".

You're trying to solve a different use-case than the one I am. Your use-case will be solved by global temporary tables. I suggest that you give Robert Haas some feedback on that. My use case is people using PostgreSQL as a cache, or relying entirely on replication for durability.

-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [PERFORM] PostgreSQL as a local in-memory cache
On 6/18/10 2:15 AM, Matthew Wakeling wrote:
> I'd like to point out the costs involved in having a whole separate version of Postgres that has all this safety switched off. Package managers will not thank anyone for having to distribute another version of the system, and woe betide the user who installs the wrong version because it runs faster. No, this is much better as a configurable option.

Agreed, although initial alphas of this concept are likely to in fact be a separate source code tree. Eventually, when we have it working well, it could become an initdb-time option.

-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [PERFORM] requested shared memory size overflows size_t
On Jun 16, 2010, at 1:53 PM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of lun jun 14 23:57:11 -0400 2010:
>> Scott Carey sc...@richrelevance.com writes:
>>> Great points. There is one other option that is decent for the WAL: if splitting out a volume is not acceptable for the OS and WAL - absolutely split those two out into their own partitions. It is most important to make sure that WAL and data are not on the same filesystem, especially if ext3 is involved.
>> Uh, no, WAL really needs to be on its own *spindle*. The whole point here is to have one disk head sitting on the WAL and not doing anything else except writing to that file.
> However, there's another point here - probably what Scott is on about: on Linux (at least ext3), an fsync of any file does not limit itself to flushing that file's blocks - it flushes *ALL* dirty blocks of *ALL* files in the filesystem. This is particularly problematic if you have pgsql_tmp in the same filesystem and do lots of disk-based sorts. So if you have it on the same spindle but on a different filesystem, at least you'll avoid that extra fsync work, even if you have to live with the extra seeking.

Yes - especially with a battery-backed caching RAID controller, the whole "own spindle" thing doesn't really matter: the WAL is written fairly slowly and linearly, and any controller worth a damn will batch those writes up efficiently.

By FAR the most important thing is to have the WAL on its own file system. If using ext3 in a way that is safe for your data (data=ordered or better), even with just one SATA disk, performance will improve a LOT if data and xlog are separated into different file systems.

Yes, an extra spindle is better. However, with a decent RAID card or caching storage, 8 spindles all in one RAID 10 - with a partition for xlog and one for data - often performs better than a mirrored pair for OS/xlog plus six for data, so long as the file systems are separated. With a dedicated xlog partition and reliable caching storage, you can even mount it direct to avoid polluting the OS page cache.