Re: [PERFORM] Lock pileup causes server to stall
> Current FK checking makes you wait if the referenced tuple is modified
> on any indexed column, not just those that are actually used in foreign
> keys. Maybe this case would be sped up if we optimized that.

Even if it is a GIN index that is being modified? That seems like a harsh limitation to me.

> Well, as I recall it's only unique indexes, so it's not *that* harsh.

Sounds good. Indexes are there for all kinds of reasons; unique ones are more related to referential integrity, so even if not 100% accurate, that is at least 90% of the way in my world. We have a star schema in the db with some amount of information in the center that needs updates, and apart from that a massive update activity on the surrounding tables, so locks on the center entity have quite a high impact on those updates. (9.2, moving to 9.3 really soon and looking forward to this enhancement.)

Jesper

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Lock pileup causes server to stall
On 10/11/2014, at 22.40, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

> Josh Berkus wrote:
>> All, pg version: 9.3.5, RHEL 6.5, 128GB / 32 cores, configured with
>> shared_buffers=16GB, Java/Tomcat/JDBC application. The server has an
>> issue: whenever we get lock waits (transaction lock waits, usually on
>> an FK dependency) lasting over a minute, or more than 10 at once,
>> *all* queries on the server slow to a crawl, taking 100X to 400X
>> normal execution times.
> Current FK checking makes you wait if the referenced tuple is modified
> on any indexed column, not just those that are actually used in
> foreign keys. Maybe this case would be sped up if we optimized that.

Even if it is a GIN index that is being modified? That seems like a harsh limitation to me.

Jesper
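To make the FK-blocking behaviour being discussed concrete, here is a two-session sketch. The table and column names are made up; the point is that the referencing INSERT takes a row lock on the parent, and pre-9.3 that lock conflicts with any concurrent update of the parent row:

```sql
-- Hypothetical schema: parent is the referenced table; note is indexed
-- but is not part of any key.
CREATE TABLE parent (id int PRIMARY KEY, note text);
CREATE INDEX parent_note_idx ON parent (note);
CREATE TABLE child (id serial PRIMARY KEY,
                    parent_id int REFERENCES parent (id));

-- Session 1: a long transaction updating a non-key column of a parent row
BEGIN;
UPDATE parent SET note = 'touched' WHERE id = 1;

-- Session 2: the FK check behind this INSERT locks the parent row.
-- On 9.2 it blocks until session 1 commits; on 9.3 the weaker
-- FOR KEY SHARE lock lets it proceed, since no key column changed.
INSERT INTO child (parent_id) VALUES (1);
```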
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
> Kevin Grittner kgri...@ymail.com writes:
>> Are we talking about the probe for the end (or beginning) of an
>> index? If so, should we even care about visibility of the row related
>> to the most extreme index entry? Should we even go to the heap during
>> the plan phase?
> Consider the case where some transaction inserted a wildly
> out-of-range value, then rolled back. If we don't check validity of
> the heap row, we'd be using that silly endpoint value for planning
> purposes --- indefinitely. That's not an improvement over the
> situation that the probe is meant to fix.

Apparently it is waiting for locks; can't the check be made in a non-blocking way, so that if it ends up waiting for a lock it just assumes the row is non-visible and moves on to the next one? This stuff is a 9.2 feature, right? What was the original problem to be addressed?

-- Jesper
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 21/09/2013, at 00.01, Jeff Janes jeff.ja...@gmail.com wrote:

> See "In progress INSERT wrecks plans on table" and "Performance bug in
> prepared statement binding in 9.2", also on this list.

This feels like the same issue as http://postgresql.1045698.n5.nabble.com/Slow-query-plan-generation-fast-query-PG-9-2-td5769363.html

> The issues are: 1) The planner actually queries the relation to find
> the end points of the variable ranges, rather than using potentially
> out-of-date statistics.

In my app I would prefer potentially out-of-date statistics instead.

Jesper
[PERFORM] Slow query-plan generation (fast query) PG 9.2
Hi. I have a strange situation where generating the query plan takes 6s+ and executing it takes very little time.

2013-09-03 09:19:38.726 db=# explain select table.id from db.table left join db.tablepro on db.id = tablepro.table_id where table.fts @@ to_tsquery('english','q12345');

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=43.71..12711.39 rows=2930 width=4)
   ->  Bitmap Heap Scan on sequence  (cost=43.71..4449.10 rows=2930 width=4)
         Recheck Cond: (fts @@ '''q12345'''::tsquery)
         ->  Bitmap Index Scan on table_gin_idx  (cost=0.00..42.98 rows=2930 width=0)
               Index Cond: (fts @@ '''q12345'''::tsquery)
   ->  Index Only Scan using tablepro_seqid_idx on tablepro  (cost=0.00..2.81 rows=1 width=4)
         Index Cond: (tablepro_id = table.id)
(7 rows)

Time: 10458.404 ms

The query gives 4 rows out of 50.000.000, so the query plan is actually correct and as expected. Any suggestions?

Jesper
Re: [PERFORM] Slow query-plan generation (fast query) PG 9.2
On 03/09/13 09:47, Craig Ringer wrote:
> On 09/03/2013 03:46 PM, jes...@krogh.cc wrote:
>> Hi. I have a strange situation where generating the query plan takes
>> 6s+ and executing it takes very little time.
> How do you determine that it's planning time at fault here?

Not that I'm sure, but the timings I sent were only for EXPLAIN, not EXPLAIN ANALYZE. The database is constantly updating and at the moment I cannot reproduce it any more, but at the time I picked the numbers it was very reproducible (tried 10+ times over 15 minutes).

> Please take separate timings for:
> PREPARE testq AS select table.id from db.table left join db.tablepro
> on db.id = tablepro.table_id where table.fts @@
> to_tsquery('english','q12345');
> and then:
> EXPLAIN ANALYZE EXECUTE testq;

I'll try to do that if I see the problem re-occur. I'm just very interested in what EXPLAIN then does, if it is not only the time for the query plan. When I did try the PREPARE / EXECUTE dance as you described, I didn't see the PREPARE step take time, which seems to be consistent with the planning time being in the EXECUTE step, according to the documentation.

-- Jesper
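For the record, the dance Craig suggested looks like this (table names are the obfuscated ones from the original post). PREPARE only parses and analyzes the statement; for an unparameterized prepared statement the planning happens at EXECUTE, so the second command carries both plan and run time:

```sql
\timing on

-- Parse/analyze only; should be fast even when planning is the problem
PREPARE testq AS
  SELECT table.id
  FROM db.table
  LEFT JOIN db.tablepro ON table.id = tablepro.table_id
  WHERE table.fts @@ to_tsquery('english', 'q12345');

-- Planning + execution happen here; compare against a plain EXPLAIN
EXPLAIN ANALYZE EXECUTE testq;
```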
Re: [PERFORM] Best practice when reindexing in production
On 29/05/13 14:24, Niels Kristian Schjødt wrote:
> Hi, I have a database with quite some data (millions of rows), that is
> heavily updated all the time. Once a day I would like to reindex my
> database (and maybe re-cluster it - don't know if that's worth it
> yet?). I need the database to be usable while doing this (both read
> and write). I see that there is no way to REINDEX CONCURRENTLY - so
> what approach would you suggest that I take on this?

Hi. Since you still don't know whether it is worth it or not, I would strongly suggest that you test this first. Simply creating an index next to the old one with the same options (but a different name) and comparing sizes would be simple.

Second, if the new index is significantly smaller than the old one, I suggest that you try to crank up the autovacuum daemon instead of blindly dropping and creating indexes; this will help to mitigate the bloat you're seeing accumulating in the above test. Cranking up autovacuum will have significantly less impact on concurrent queries and can help to keep the database in a shape where regular reindexing shouldn't be necessary. Autovacuum has built-in logic to sleep in between operations in order to reduce the I/O load on your system for the benefit of concurrent users. The duplicate-index approach will pull all the resources it can get, and concurrent users may suffer while you do it.

Jesper
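A minimal sketch of that size test (index and table names are placeholders); CREATE/DROP INDEX CONCURRENTLY avoid blocking writers while you measure:

```sql
-- Build a shadow copy of the suspect index without locking out writes
CREATE INDEX CONCURRENTLY myindex_shadow ON mytable (mycol);

-- Compare the fresh copy against the (possibly bloated) original;
-- a much smaller shadow indicates bloat worth addressing
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('myindex', 'myindex_shadow');

-- Clean up (DROP INDEX CONCURRENTLY requires 9.2+; use plain DROP before that)
DROP INDEX CONCURRENTLY myindex_shadow;
```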
Re: [PERFORM] hardware upgrade, performance degrade?
On 01/03/2013, at 10.52, Steven Crandell steven.crand...@gmail.com wrote:
> Recently I moved my ~600G / ~15K TPS database from a 48 core @ 2.0GHz
> server with 512GB RAM on 15K RPM disks to a newer server with 64 cores
> @ 2.2GHz and 1T of RAM on 15K RPM disks. The move was from v9.1.4 to
> v9.1.8 (eventually also tested with v9.1.4 on the new hardware) and
> was done via base backup followed by slave promotion. All postgres
> configurations were matched exactly, as were system and kernel
> parameters.

My guess is that you have gone down in clock frequency on the memory when you doubled the amount of it. In a mainly memory-cached database, performance is extremely sensitive to memory speed.

Jesper
Re: [PERFORM] FTS performance issue probably due to wrong planner estimate of detoasting
On 08/02/13 01:52, Stefan Keller wrote:
> Hi, I have problems with the performance of FTS in a query like this:
> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
> plainto_tsquery('english', 'good');
> It's slow (30 sec.) for some GB (27886 html files, originally 73 MB
> zipped). The planner obviously always chooses table scan:
> http://explain.depesz.com/s/EEE
> I have to check again if I'm doing something wrong, but I'm pretty
> sure it has to do with de-toasting and (wrong?) cost estimations.

If you haven't done it: bump up the statistics target on the column and re-analyze, and see what that gives. I have also been playing with the cost numbers in order to get it to favour an index scan more often, that is, lowering random_page_cost to be close to seq_page_cost. Depending on your system, the amount of memory, etc., this can have negative side effects on non-GIN queries.

-- Jesper
Re: [PERFORM] Planner selects different execution plans depending on limit
On 13/09/12 16:42, Bill Martin wrote:
> Yes, I've run the ANALYZE command. Regards, Bill Martin

The main problem in your case is actually that you don't store the tsvector in the table. If you store to_tsvector('simple', content.content) in a column in the database and search against that instead, then you'll allow PG to gather statistics on the column and make the query planner act accordingly.

Jesper
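A sketch of what that could look like for the core_content table from earlier in the thread (column and index names are guesses); the tsvector_update_trigger helper ships with PostgreSQL:

```sql
ALTER TABLE core_content ADD COLUMN content_tsv tsvector;
UPDATE core_content SET content_tsv = to_tsvector('simple', content);
CREATE INDEX core_content_tsv_idx ON core_content USING gin (content_tsv);

-- Keep the column in sync on future writes
CREATE TRIGGER core_content_tsv_trig
  BEFORE INSERT OR UPDATE ON core_content
  FOR EACH ROW
  EXECUTE PROCEDURE
    tsvector_update_trigger(content_tsv, 'pg_catalog.simple', content);

-- Now ANALYZE can gather per-lexeme statistics the planner can use
ANALYZE core_content;
```

Queries then search against the stored column (content_tsv @@ plainto_tsquery(...)) instead of recomputing the expression.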
Re: [PERFORM] Planner selects different execution plans depending on limit
On 10/09/12 16:24, bill_mar...@freenet.de wrote:
> Hi All
> I've created the index
>   ft_simple_core_content_content_idx ON core_content
>   USING gin (to_tsvector('simple'::regconfig, content));
> If I'm searching for a word which is NOT in the column content, the
> query plan and the execution time differ with the given limit. If I
> choose 3927 or any higher number, the query execution takes only a few
> milliseconds.
> ... core_content content where to_tsvector('simple', content.content)
> @@ tsquery(plainto_tsquery('simple', 'asdasdadas')::varchar || ':*') = true
> Limit 3927
>  Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1)
>    ->  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
>          Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
>  Total runtime: 52147.173 ms
> Is there any possibility to improve the performance even if the limit
> is only 10? Is it possible to make the query optimizer take only the
> fast bitmap heap scan instead of the slow seq scan?

The big hammer is: set enable_seqscan = off, but if you tell us which PG version you're on, there may be something else to do. I suggest you start by bumping the statistics target for the column to 1 and running ANALYZE to see what that changes.

-- Jesper
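Concretely, the two knobs look like this (the statistics target value is just an example; enable_seqscan is a diagnostic, not a production setting):

```sql
-- Diagnostic only: forbid seq scans to compare timings of the bitmap path
SET enable_seqscan = off;

-- Give the planner more data to estimate tsquery selectivity from
ALTER TABLE core_content ALTER COLUMN content SET STATISTICS 1000;
ANALYZE core_content;
```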
Re: [PERFORM] Performance of a large array access by position (tested version 9.1.3)
On 22/06/12 09:02, Maxim Boguk wrote:
> Hi all, maybe I am completely wrong, but I always assumed that the
> access speed to an array element in PostgreSQL should be close to
> constant time. But in tests I found that access speed degrades as O(N)
> of the array size.
> Test case (performed on a large, not busy server with 1GB work_mem to
> ensure I am working with memory only):
> WITH t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,1) as g(i);
> Results for N between 1 and 10.000.000 (used locally connected psql
> with \timing):
>     N      Time
>     1      5.8 ms
>     10     5.8 ms
>     100    5.8 ms
>     1000   6.7 ms   -- until there, all reasonable
>     5k     21 ms
>     10k    34 ms
>     50k    177 ms
>     100k   321 ms
>     500k   4100 ms
>     1M     8100 ms
>     2M     22000 ms
>     5M     61000 ms
>     10M    22 ms to single array element access
> Is that behaviour correct?
> PS: what I am actually looking for is a constant-time, fast
> access-by-position tuplestore for use with recursive queries and/or
> pl/pgsql, but without C programming.

Default column storage is to compress it, and store it in TOAST for large values. This is what is causing the shift. Try changing the column storage of the column to EXTERNAL instead and rerun the test:

  ALTER TABLE tablename ALTER COLUMN columnname SET STORAGE EXTERNAL;

The default is EXTENDED, which runs compression on it, which in turn makes it hard to position into the value without reading and decompressing everything. http://www.postgresql.org/docs/9.1/static/sql-altertable.html

Let us know what you get.

Jesper
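A sketch of that experiment, materializing the array in a table so the storage setting actually applies (names and sizes are examples):

```sql
CREATE TABLE arrtest AS
  SELECT ARRAY(SELECT * FROM generate_series(1, 1000000)) AS _array;

-- EXTERNAL = out-of-line TOAST storage, but no compression
ALTER TABLE arrtest ALTER COLUMN _array SET STORAGE EXTERNAL;

-- Rewrite the existing value so the new storage mode takes effect
UPDATE arrtest SET _array = _array;

\timing on
-- With uncompressed out-of-line storage, PostgreSQL can fetch just the
-- TOAST chunks it needs, so this should no longer degrade with array size
SELECT _array[500000] FROM arrtest;
```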
Re: [PERFORM] Gin index insert performance issue
On 13/03/12 06:43, Rural Hunter wrote:
> I tried to increase work_mem, but the inserts hang for more time each
> time, with less frequency, so it makes almost no difference to the
> total hang time. Frequent vacuum is not a choice since the hang
> happens every 3-5 mins. Is there any improvement I can make with pg
> for such a data volume (still increasing), or is it time to turn to
> another full text search solution such as Lucene etc?

We're using GIN for fts-search; our current index size is up to 149GB, and yes, the update process is quite tough on the disk I/O subsystem. What you're experiencing is the filling of the fastupdate pending list, which then gets flushed. Setting work_mem higher causes the system to stall for a longer period, less frequently, and has a side cost on queries that need to go through the pending list (which is bigger) in addition to the index search. To me it seems like all other writing/updating processes are stalled when the pending list is flushed, but I am not sure about the specifics here.

Our solution is to turn fastupdate off for our GIN indices: http://www.postgresql.org/docs/9.0/static/sql-createindex.html It can also be set on an existing index with ALTER INDEX.

I would have preferred a backend-local batch-update process, so it could batch up everything from its own transaction instead of interfering with other transactions. I would say that we came from Xapian, and the PG index is a way better fit for our application. The benefit of having the fts index next to all the other data saves a significant amount of development time in the application, both in terms of development and maintenance (simpler, easier and more manageable).

-- Jesper
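For reference, turning it off looks like this (index and table names are made up); a VACUUM afterwards flushes whatever is already queued:

```sql
-- Existing index: stop queueing new entries in the pending list
ALTER INDEX my_fts_gin_idx SET (fastupdate = off);

-- Flush the entries that are already in the pending list
VACUUM my_table;

-- Or decide at creation time
CREATE INDEX my_fts_gin_idx2 ON my_table
  USING gin (fts_col) WITH (fastupdate = off);
```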
Re: [PERFORM] Very long deletion time on a 200 GB database
On 23/02/12 09:39, Reuven M. Lerner wrote:
> Hi, everyone. I'm maintaining an application that exists as a black
> box in manufacturing plants. The system is based on Windows, .NET, and
> PostgreSQL 8.3. I'm a Unix kind of guy myself, but the application
> layer and system administration are being handled by other people; I'm
> just the PostgreSQL guy.

Just thinking out loud. It looks like (just guessing) the application needs to store data going 1 month back, and it was put into production under the assumption that it would never fill up, or that deletion could easily be done in maintenance windows. And that now turns out not to be the case.

I would stuff a trigger function onto the table that automatically does the cleanup. It could be a BEFORE INSERT OR UPDATE trigger that just tries to prune 2-3 rows from the table if they have exceeded the keep-back time. Just installing that in the maintenance window would allow the system to self-heal over time. If the maintenance window allows for more cleanup, then manually do some deletions. Now the black box is self-healing.

-- Jesper
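A rough sketch of such a trigger, assuming a hypothetical logdata table with a created_at timestamp (an index on created_at keeps the subselect cheap); every insert then pays for a tiny bit of cleanup:

```sql
CREATE OR REPLACE FUNCTION prune_old_rows() RETURNS trigger AS $$
BEGIN
  -- Prune at most 3 expired rows per incoming write; ctid lets us
  -- delete specific rows cheaply even without a unique key
  DELETE FROM logdata
  WHERE ctid IN (SELECT ctid FROM logdata
                 WHERE created_at < now() - interval '1 month'
                 LIMIT 3);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logdata_prune
  BEFORE INSERT OR UPDATE ON logdata
  FOR EACH ROW EXECUTE PROCEDURE prune_old_rows();
```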
Re: [PERFORM] Query planner doesn't use index scan on tsvector GIN index if LIMIT is specified
On 2012-01-10 18:04, Tom Lane wrote:
> darklow dark...@gmail.com writes:
>> But the performance problems start when I do the same query
>> specifying LIMIT: *SELECT id FROM cache WHERE tsv @@
>> to_tsquery('test:*') limit 20;* For some reason the index is not used.
> It apparently thinks there are enough matches that it might as well
> just seqscan the table and expect to find some matches at random, in
> less time than using the index would take. The estimate seems to be
> off quite a bit, so maybe raising the stats target for this column
> would help.

The cost of matching ts_match_vq against a toasted column is not calculated correctly. This is completely parallel with http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

Try raising the cost for ts_match_vq(tsvector, tsquery); that helps a bit, but it is hard to get the cost high enough. Raising the statistics target helps too.

-- Jesper
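The two adjustments, spelled out (the cost number is just something to experiment with; the default function COST is 1, in units of cpu_operator_cost):

```sql
-- Tell the planner a tsquery match is far more expensive than an
-- ordinary operator, so index use looks comparatively cheaper
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 500;

-- And improve the row estimate itself
ALTER TABLE cache ALTER COLUMN tsv SET STATISTICS 10000;
ANALYZE cache;
```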
Re: [PERFORM] Problems with FTS
On 2011-11-30 21:58, Robert Haas wrote:
> The row-count estimates look reasonably accurate, so there's some
> other problem here. What do you have random_page_cost, seq_page_cost,
> and effective_cache_size set to? You might try SET random_page_cost=2
> or even SET random_page_cost=0.5; SET seq_page_cost=0.3 and see if
> those settings help.

I may be seeing ghosts here, since I've encountered the same problem. But the query planner does not take TOAST into account, so a sequential scan + filter only costs what it takes to scan the main table; fts fields, however, are typically large enough to be toasted, so the cost should be main + toast (amount of pages) + filtering cost. I posted about it yesterday: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

If the above problem is on 9.1: a patch for proper accounting of GIN estimates has been added to 9.1, which may also benefit the planning: http://www.postgresql.org/docs/9.1/static/release-9-1.html "Improve GIN index scan cost estimation (Teodor Sigaev)"

Jesper
Re: [PERFORM] Summaries on SSD usage?
On 2011-09-03 00:04, Stefan Keller wrote:
> 2011/9/2 Scott Marlowe scott.marl...@gmail.com:
>> On Tue, Aug 30, 2011 at 11:23 AM, Stefan Keller sfkel...@gmail.com wrote:
>> How big is your DB? What kind of reads are most common, random access
>> or sequential? How big of a dataset do you pull out at once with a
>> query? SSDs are usually not a big winner for read-only databases. If
>> the dataset is small (dozen or so gigs) get more RAM to fit it in. If
>> it's big and sequentially accessed, then build a giant RAID-10 or
>> RAID-6. If it's big and randomly accessed then buy a bunch of SSDs
>> and RAID them.
> My dataset is a mirror of OpenStreetMap updated daily. For Switzerland
> it's about 10 GB total disk space used (half for tables, half for
> indexes) based on 2 GB raw XML input. Europe would be about 70 times
> larger (130 GB) and world has 250 GB raw input. It's both randomly
> (= index scan?) and sequentially (= seq scan?) accessed with queries
> like: SELECT * FROM osm_point WHERE tags @> hstore('tourism','zoo')
> AND name ILIKE 'Zoo%'. You can try it yourself online, e.g.
> http://labs.geometa.info/postgisterminal/?xapi=node[tourism=zoo]
> So I'm still unsure what's better: SSD, NVRAM (PCI card) or plain RAM?
> And I'm eager to understand if unlogged tables could help anyway.

It's not that hard to figure out. Take some of your typical queries, say the one above, and change the search term to something you'd expect a user to enter in a minute but which hasn't been run yet (could be 'museum' instead of 'zoo'). Then run it with \timing, twice. If the two runs are close to each other in timing, then you only hit memory anyway, and neither SSD, NVRAM nor more RAM will buy you anything; faster memory and faster CPU cores will. If you see a significant speedup on the second run, then more RAM, NVRAM or SSD is a good fix.

Typically I have slow-query logging turned on, permanently set to around 250ms. If I find queries in the log that I didn't expect to take above 250ms, then I start to investigate whether the query plans are correct, and so on.

The above numbers are raw data sizes, not how PG stores them, right? And you haven't told us anything about the size of your current system.

Jesper
Re: [PERFORM] Summaries on SSD usage?
On 2011-09-01 23:28, Jim Nasby wrote:
> On Aug 30, 2011, at 12:23 PM, Stefan Keller wrote:
>> I'm looking for summaries (or best practices) on SSD usage with
>> PostgreSQL. My use case is mainly a read-only database. Are there any
>> around?
> I'm not sure, but for read-only, why not just put more memory in the
> server? It'll be a lot cheaper than SSDs.

It gets really expensive to go over 512GB of memory, and the performance regression for just hitting disk in a system where you assume everything is in memory is really huge. SSD makes that edge a bit smoother than rotating drives do.

Jesper
Re: [PERFORM] Hardware advice for scalable warehouse db
> 1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
> 16 GB (4x4GB) Low Volt DDR3 1066Mhz
> PERC H700 SAS RAID controller
> 4 x 300 GB 10k SAS 6Gbps 2.5in, RAID 10

Apart from Greg's excellent recommendations, I would strongly suggest more memory. 16GB in 2011 is really on the low side. PG uses memory (either shared_buffers or the OS cache) for keeping frequently accessed data in. Good recommendations are hard without knowledge of the data and access patterns, but 64, 128 and 256GB systems are quite frequent when you have data that can't all be in memory at once.

SANs are nice, but I think you can buy a good DAS box each year for just the support cost of a NetApp - though you might have gotten a really good deal there too. You do get a huge amount of advanced configuration features and potential ways of sharing; just see the specs. If you need those, the SAN is a good way to go, but they do come with a huge price tag.

Jesper
Re: [PERFORM] sequential scan unduly favored over text search gin index
On 2011-06-20 17:38, Sushant Sinha wrote:
> I have a tsvector column docvector and a gin index on it,
> docmeta1_docvector_idx. I have a simple query: select * from docmeta1
> where docvector @@ plainto_tsquery('english', 'free');
> I find that the planner chooses a sequential scan of the table even
> when the index performs orders of magnitude better. I set
> random_page_cost = 1.0 for the database to favor index use. However, I
> still see that the cost estimate for a sequential scan of the entire
> table (23000) is cheaper than the cost of using the index (33000). The
> time taken for sequential access is 5200 ms and for index usage only
> 85 ms.

The cost-estimation code for GIN indices is not good in 9.0; this has been hugely improved in 9.1: http://git.postgresql.org/gitweb?p=postgresql.git;a=search;h=HEAD;st=commit;s=gincost

I think the individual patches apply quite cleanly to 9.0, as far as I remember.

-- Jesper
Re: [PERFORM] Performance advice for a new low(er)-power server
On 2011-06-16 17:09, Haestan wrote:
> I am evaluating hardware for a new PostgreSQL server. For reasons
> concerning power consumption and available space it should not have
> more than 4 disks (in a 1U case), if possible. Now, I am not sure what
> disks to use and how to lay them out to get the best performance.

What is your data:memory size ratio? Can you afford to have everything in memory and only need the disks to be able to sustain writes?

> The cheaper option would be to buy 15k Seagate SAS disks with a 3ware
> 9750SA (battery backed) controller. Does it matter whether to use a
> 4-disk RAID10 or a 2x 2-disk RAID1 (system+pg_xlog, pg_data) setup? Am
> I right that both would be faster than just using a single 2-disk
> RAID1 for everything?
> A higher-end option would be to use 2x 64G Intel X-25E SSDs with an
> LSI MegaRAID 9261 controller for pg_data and/or pg_xlog and 2x SAS
> disks for the rest. Unfortunately, these SSDs are the only ones
> offered by our supplier and they don't use a supercapacitor, AFAIK.
> Therefore I would have to disable the write cache on the SSDs somehow
> and just use the cache on the controller only. Does anyone know if
> this will work, or even uses such a setup?

Any SSD is orders of magnitude better than any rotating drive in terms of random reads. Whether you will benefit depends on your data:memory ratio.

> Furthermore, the LSI MegaRAID 9261 offers CacheCade, which uses SSD
> disks as a secondary tier of cache for the SAS disks. Would this
> feature make sense for a PostgreSQL server, performance-wise?

I have one CacheCade setup; not a huge benefit, but it seems measurable (though really hard to test). Compared to a full SSD setup I wouldn't consider it at all.

-- Jesper
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On 2011-05-16 06:41, Jesper Krogh wrote:
> On 2011-05-16 03:18, Greg Smith wrote:
>> You can't do it in real-time. You don't necessarily want that even if
>> it were possible; too many possibilities for nasty feedback loops
>> where you always favor using some marginal index that happens to be
>> in memory, and therefore never page in things that would be faster
>> once they're read. The only reasonable implementation that avoids
>> completely unstable plans is to scan this data periodically and save
>> some statistics on it--the way ANALYZE does--and then have that turn
>> into a planner input.
> Would that be feasible? Having a process collect the data every now
> and then, probably picking some conservative average function, and
> feeding it into pg_stats for each index/relation?

To me it seems like a robust and fairly trivial way to get better numbers. The fear is that the OS cache is too much in flux to get any stable numbers out of it. OK, it may not work as well for indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there; for tables it should be more trivial.

-- Jesper
Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On 2011-05-03 17:52, Willy-Bas Loos wrote:
> Our database has gotten rather large and we are running out of disk
> space. Our disks are 15K rpm SAS disks in RAID 10. We are going to
> rent some space on a FibreChannel SAN. That gives us the opportunity
> to separate the data and the indexes. Now I thought it would be best
> to move the indexes to the SAN and leave the data on the disks, since
> the disks are better at sequential I/O and the SAN will have lots of
> random I/O since there are lots of users on it. Is that a wise thing
> to do?

If you're satisfied with the current performance, then it should be safe to keep the indices and move the data; the risk of the SAN performing worse on sequential I/O is not that high. But without testing, and without knowledge about the SAN, it is hard to say whether what you currently have is better or worse than the SAN. The vendor may have a way better SAN, but it may also be shared among 200 other hosts connected over iSCSI or FC, so your share may be even worse than what you currently have. Without insight and testing it is hard to guess. I've pretty much come to the conclusion of going the DAS way every time, but it all depends on what your end looks like.

-- Jesper
Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On 2011-05-04 07:25, Willy-Bas Loos wrote:
> Are you saying that, generally speaking, moving the data would be
> better unless the SAN performs worse than the disks?

It was more that, given all the uncertainties, that seems like the least risky option. The SAN might actually perform worse than what you currently have; you don't know yet, I guess?

> Besides your point that it depends on what our end looks like, I mean.
> (And what do you mean by "the DAS way"? Sorry, not a native speaker.)

The DAS way = a disk array where the database has sole access to the hardware (not shared among other systems). Examples are the Dell MD1200/1220 or similar.

-- Jesper
Re: [PERFORM] REINDEX takes half a day (and still not complete!)
> How do DB folks do this with small maintenance windows? This is for a
> very high traffic website so it's beginning to get embarrassing.

Normally there is no need to issue REINDEX. What is your reason for the need?

Jesper
Re: [PERFORM] Linux: more cores = less concurrency.
On 2011-04-11 21:42, Glyn Astill wrote:
> I'll have to try the synthetic benchmarks next then, but something's
> definitely going off here. I'm seeing no disk activity at all, as
> they're selects and all pages are in RAM.

Well, if you don't have enough computation to be bottlenecked on the CPU, then a 4-socket system is slower than a comparable 2-socket system, and a 1-socket system is even better. On a 1-socket system, all of your data can be fetched from RAM that is local as seen from your CPU; on a 2-socket system, 50% of your accesses will be way slower, and on a 4-socket system it is even worse. More sockets only begin to pay off when you can actually use the CPUs, or when they let you add in even more memory to keep your database from going to disk due to size.

-- Jesper
Re: [PERFORM] Linux: more cores = less concurrency.
On 2011-04-11 22:39, James Cloos wrote:
>> "GA" == Glyn Astill glynast...@yahoo.co.uk writes:
> GA> I was hoping someone had seen this sort of behaviour before,
> GA> and could offer some sort of explanation or advice.
> Jesper's reply is probably most on point as to the reason. I know that
> recent Opterons use some of their cache to better manage
> cache-coherency. I presume recent Xeons do so, too, but perhaps yours
> are not recent enough for that?

Better cache coherency also helps, but it does nothing about the fact that remote DRAM fetches are way more expensive than local ones (exact numbers are hard to get nowadays).

-- Jesper
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-28 22:21, Greg Smith wrote: Some may still find these two cheap for enterprise use, given the use of MLC limits how much activity these drives can handle. But it's great to have a new option for lower budget systems that can tolerate some risk there. Drifting off the topic slightly: has anyone opinions/experience with http://www.ocztechnology.com/ocz-z-drive-r2-p88-pci-express-ssd.html ? They seem to be like the FusionIO drives, just quite a lot cheaper; I wonder what happens to those 512MB of cache in case of a power loss. -- Jesper
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-29 16:16, Jeff wrote: halt for 0.5-2 seconds, then resume. The fix we're going to do is replace each drive in order, with the rebuild occurring between each. Then we do a security erase to reset the drive back to completely empty (including the spare blocks kept around for writes). Are you replacing the drives with new ones, or just secure-erasing them and putting them back in? What kind of numbers are you drawing out of smartmontools in usage figures? (Also seeing some write stalls here, on 24 RAID50 volumes of X25-Ms, and have been planning to cycle drives for quite some time, without actually getting to it.) Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops. That is the current trade off. That's also my experience. -- Jesper
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-29 18:50, Jeff wrote: we have some new drives that we are going to use initially, but eventually it'll be a secure-erase'd one we replace it with (which should perform identically to a new one) What enclosure controller are you using on the 24 disk beast? An LSI ELP and an HP D2700 enclosure. Works flawlessly; the only bad thing (which actually is pretty grave) is that the controller mis-numbers the slots in the enclosure, so you'll have to have the mapping drawn on paper next to the enclosure to replace the correct disk. -- Jesper
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-29 06:13, Merlin Moncure wrote: My own experience with MLC drives is that write cycle expectations are more or less as advertised. They do go down (hard), and have to be monitored. If you are writing a lot of data this can get pretty expensive although the cost dynamics are getting better and better for flash. I have no idea what would be precisely prudent, but maybe some good monitoring tools and phased obsolescence at around 80% duty cycle might not be a bad starting point. With hard drives, you can kinda wait for em to pop and swap em in -- this is NOT a good idea for flash raid volumes. What do you mean by hard? I have some in our setup, but haven't seen anything hard just yet. Based on reports on the net they seem to slow writes down to next to nothing when they get used up, but that seems more graceful than old rotating drives.. can you elaborate a bit more? -- Jesper
Re: [PERFORM] Request for feedback on hardware for a new database server
On 2011-03-18 01:51, Oliver Charles wrote: Hello, At MusicBrainz we're looking to get a new database server, and are hoping to buy this in the next couple of days. I'm mostly a software guy, but I'm posting this on behalf of Rob, who's actually going to be buying the hardware. Here's a quote of what we're looking to get: I think most of it has been said already: * Battery-backed write cache. * See if you can get enough memory to make all of your active dataset fit in memory (typically not that hard in 2011). * Dependent on your workload, of course, you're typically not bottlenecked by the number of CPU cores, so strive for fewer, faster cores. * As few sockets as you can squeeze your memory and CPU requirements onto. * If you can live with (or design around) the tradeoffs of SSDs, they will buy you way more performance than any significant number of rotating drives (a good backup plan with full WAL log to a second system as an example). -- Jesper
Re: [PERFORM] Indexes with condition using immutable functions applied to column not used
On 2011-02-08 01:14, Sylvain Rabot wrote: CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 0; CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 1; CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 2; CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 3; mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4; Should be written as: select * from directory where id_user = 4 and __mod_cons_hash(id_user, 4) = 4 % 4; so the partial-index predicate appears in the query. Then it should just work. -- Jesper
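A minimal sketch of the suggested rewrite, following the index definitions above (hypothetical; the planner can only pick one of the partial indexes when the query repeats that index's predicate with a matching constant):

```sql
-- Hypothetical sketch: directory_id_user_mod_cons_hash_0_btree_idx only
-- matches when the query contains its predicate, and 4 % 4 folds to 0:
EXPLAIN ANALYZE
SELECT *
FROM mike.directory
WHERE id_user = 4
  AND __mod_cons_hash(id_user, 4) = 4 % 4;
```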
Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle
On 2010-10-28 15:13, Merlin Moncure wrote: On Wed, Oct 27, 2010 at 3:47 PM, Jesper Kroghjes...@krogh.cc wrote: On 2010-10-27 20:51, Merlin Moncure wrote: Yes, I am quite aware of how the o/s page cache works. All else being equal, a more compact database obviously would be preferred. However 'all else' is not necessarily equal. I can mount my database on a bzip volume, that must make it faster, right? wrong. I understand the postgres storage architecture pretty well, and the low hanging fruit having been grabbed, further layout compression is only going to come as a result of tradeoffs. Or configurability.. Not directly related to overall space consumption, but I have been working on a patch that would make TOAST* kick in earlier in the process, giving a slimmer main table with visibility information and simple columns, and moving larger columns more aggressively to TOAST. Do you have any benchmarks supporting if/when such a change would be beneficial? On IO-bound queries it pretty much translates to the ratio between the toast-table size and the main-table size. Trying to aggressively speed up select count(*) from table gives this: http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg146153.html with shutdown of pg and drop caches in between... the default select count(*) on 50K tuples gives 4.613ms (2 tuples per page) vs. 318ms (8 tuples per page). PG default is in between... -- Jesper
Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle
On 2010-10-27 20:28, Merlin Moncure wrote: Postgres indexes are pretty compact, and oracle (internals I am not familiar with) also has to do MVCC type management, so I am suspecting your measurement is off (aka, operator error) or oracle is cheating somehow by optimizing away storage requirements somehow via some sort of tradeoff. However you still fail to explain why storage size is a problem. Are you planning to port oracle to postgres on a volume that is 50% full? :-) Pretty ignorant comment.. sorry.. But when your database approaches something that does not mainly fit in memory, space directly translates into speed, and a more compact table utilizes the OS page cache better. This is true for both index and table page caching. And the more compact your table, the later you hit the stage where you can't fit into memory anymore. .. but if the above aren't issues, then your statements are true. -- Jesper
Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle
On 2010-10-27 20:51, Merlin Moncure wrote: Yes, I am quite aware of how the o/s page cache works. All else being equal, a more compact database obviously would be preferred. However 'all else' is not necessarily equal. I can mount my database on a bzip volume, that must make it faster, right? wrong. I understand the postgres storage architecture pretty well, and the low hanging fruit having been grabbed, further layout compression is only going to come as a result of tradeoffs. Or configurability.. Not directly related to overall space consumption, but I have been working on a patch that would make TOAST* kick in earlier in the process, giving a slimmer main table with visibility information and simple columns, and moving larger columns more aggressively to TOAST. The overall disadvantage of TOAST is the need for an extra disk seek if you actually need the data. If the application rarely needs the large columns but often does count/filtering on simple values, this will eventually lead to a better utilization of the OS page cache, with a very small overhead to PG (in terms of code) and zero overhead in the applications that benefit. Keeping in mind that as SSD drives get more common, the cost of the extra disk seek drops dramatically, but the drive is by itself probably still 100-1000x slower than main memory, so keeping the right data in the OS cache is also a parameter. If you deal with data where the individual tuple size goes up, currently TOAST first kicks in at 2KB (compressed size), which leads to a very sparse main table in terms of visibility information, and counts and selects on simple values will drag a huge amount of data into the cache layers that is not needed there. Another suggestion could be to make the compression of text columns kick in earlier .. if that's possible. (I don't claim that it's achievable.) Unless the tuple header is hugely bloated, I have problems creating a situation in my head where hammering on that one can change anything significantly.
* http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg159726.html -- Jesper
Re: [PERFORM] BBU Cache vs. spindles
On 2010-10-22 17:37, Greg Smith wrote: I think that most people who have thought they were safe to turn off full_page_writes in the past did so because they believed they were in category (1) here. I've never advised anyone to do that, because it's so difficult to validate the truth of. Just given that, I'd be tempted to join in on suggesting this parameter just go away in the name of safety, except that I think category (2) here is growing now. ZFS is the most obvious example where the atomic write implementation seems to always make disabling full_page_writes safe. Can you point to some ZFS docs that confirm this is the case? I'd be surprised if it doesn't copy away the old block and replace it with the new one in place. The other behaviour would quite quickly lead to a hugely fragmented filesystem that performs next to uselessly, and ZFS doesn't seem to be in that category.. ... All given my total lack of insight into ZFS. -- Jesper
Re: [PERFORM] Slow count(*) again...
On 2010-10-21 06:47, Scott Carey wrote: On a wimpy disk, I/O bound for sure. But my disks go 1000MB/sec. No query can go fast enough for them. The best I've gotten is 800MB/sec, on a wide row (average 800 bytes). Most tables go 300MB/sec or so. And with 72GB of RAM, many scans are in-memory anyway. Is it cpu or io bound while doing it? Can you scan it faster using time cat relation-oid.* > /dev/null ? A single SSD with supercapacitor will go about 500MB/sec by itself next spring. I will easily be able to build a system with 2GB/sec I/O for under $10k. What filesystem are you using? Readahead? Can you try to check the filesystem fragmentation of the table using filefrag? -- Jesper
Re: [PERFORM] Slow count(*) again...
On 2010-10-14 06:22, mark wrote: Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. I might convince my boss to chip in... but how do we get the task up there? Should we find someone to give an estimate first? -- Jesper
Re: [PERFORM] Slow count(*) again...
On 2010-10-14 21:56, Robert Haas wrote: On Thu, Oct 14, 2010 at 12:22 AM, markdvlh...@gmail.com wrote: Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. Just throwing around ideas here. This is a bit off-topic, but as of now, they're only accepting proposals for projects to be performed by CommandPrompt itself. So that doesn't help me much (note the sig). But in theory it's a good idea. Of course, when and if they open it up, then what? If more than one developer or company is interested in a project, who determines who gets to do the work and get paid for it? If that determination is made by CommandPrompt itself, or if it's just a free-for-all to see who can get their name on the patch that ends up being committed, it's going to be hard to get other people/companies to take it very seriously. Couldn't you open up a dialog about it? Another problem is that even when they do open it up, they apparently intend to charge 7.5 - 15% of the contract value as a finder's fee. That's a lot of money. For a $100 project it's totally reasonable, but for a $10,000 project it's far more expensive than the value of the service they're providing can justify. (Let's not even talk about a $100,000 project.) Hi Robert. I can definitely see your arguments, but you failed to describe a better way? Many of us rely heavily on PostgreSQL and would like to get this feature, but sponsoring it all alone does not seem like a viable option (just a guess), considering we don't even have an estimate of how big it is; but I saw the estimate of 15K USD for the ALTER column position description.. and the visibility map is most likely in the same ballpark (from my perspective).
So in order to get something like a visibility map (insert your favorite big feature here), you have these options:
* Sponsor it all by yourself. (Where it is most likely going to be too big; or, if it is the center of your applications, then you definitely turn to an RDBMS that has supported it for longer, if you can.)
* Wait for someone else to sponsor it all by themselves. (That happens occasionally, but for particular features it is hard to see when and what, and the actual sponsor would still face the dilemma in the first point.)
* Hack it yourselves. (Many of us have neither the time nor the skills to do it, and my employer actually wants me to focus on the stuff that brings the most direct value for my time, which is a category hacking PG does not fall into when the business is about something totally else.)
* A kind of micro-sponsoring like the above?
* Your proposal here?
To me.. the 4th bullet point looks like the most viable so far.. To be honest, whether it is EDB, Redpill, Command Prompt, 2ndQuadrant or whoever ends up doing the job is, seen from this perspective, not important; just that it ends in the hands of someone capable of doing it. ... although Heikki has done some work on this task already. Preferably I would like to get it coordinated by the PG project itself, but I can see that it is really hard to do that kind of stuff. And you would still face the challenge of who should end up doing the thing. Jesper .. added Joshua Drake on CC, he might have given all of this some seconds of thought already. -- Jesper
Re: [PERFORM] Slow count(*) again...
On 2010-10-12 18:02, Scott Carey wrote: However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O. Supposedly, an online defragmenter is in the works for ext4 but it may be years before its available. If some clever postgres hacker could teach postgres to allocate blocks using posix_fallocate in quite large batches, say something like fallocate(min(current_relation_size * 0.1, 1073741824)), then if you have a relation already filling 10GB, the next file for the relation is just fully allocated from the first byte by the filesystem. That would ensure that a large table sits efficiently on the filesystem level with a minimum of fragmentation on ext4 (and other FSs supporting posix_fallocate), and for small systems it would only fill 10% more disk space... .. last night I spent an hour looking for where it's done but couldn't find the source file where extension of an existing relation takes place.. can someone give directions? -- Jesper
Re: [PERFORM] Slow count(*) again...
On 2010-10-12 19:07, Tom Lane wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. I don't think any of the previous discussion in this thread is on-point at all, except for the parts where people suggested avoiding it. I would have to say that although it is nice to get count(*) faster, I think your testing is way too simple. It pretty much proves that in terms of the code involved in the count(*) process there is not much to be achieved. But your table has way too little payload. As PG currently is, it will start by pushing data off to TOAST when the tuple size reaches 1KB, and the speed of count(*) is very much dominated by the amount of dead weight it has to draw in, together with the heap access for the row, on accessing the table. Creating a case where the table is this slim is (in my viewpoint) very much to the extreme on the small side. Just having 32 bytes of payload would more or less double your time to count, if I read your test results correctly. .. and in the situation where disk access would be needed.. way more. Dividing pg_relation_size by the amount of tuples in our production system, I end up with no average tuple size of less than 100 bytes. .. without having complete insight.. a visibility map that could be used in conjunction with indices would solve that. What the cost would be of maintaining it is also a factor. Jesper -- Jesper
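The average-tuple-size figure mentioned above can be estimated directly (a hypothetical sketch; the table name is made up):

```sql
-- Hypothetical sketch: average bytes per tuple in the main heap (TOAST
-- excluded), to judge how much dead weight a count(*) has to drag in.
SELECT pg_relation_size('some_table') / NULLIF(count(*), 0) AS avg_tuple_bytes
FROM some_table;
```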
Re: [PERFORM] gist indexes for distance calculations
On 2010-09-30 20:33, Marcelo Zabani wrote: If you can also pinpoint me to where I can find this sort of information (index utilization and planning, performance tuning), I'd be very grateful. Thank you already, Isn't this what the knngist patches are for? https://commitfest.postgresql.org/action/patch_view?id=350 http://www.sai.msu.su/~megera/wiki/knngist -- Jesper
Re: [PERFORM] Query much faster with enable_seqscan=0
On 2010-09-21 20:21, Ogden wrote: I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value? It should be lowered to a bit above seq_page_cost.. and more importantly, you should make sure that you have updated your statistics.. run ANALYZE; -- Jesper
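Concretely, that could look like the following (a sketch; 1.5 is just an illustrative value for a mostly cached database, not a recommendation for every workload):

```sql
-- Hypothetical values: with the working set mostly in cache, random reads
-- are barely more expensive than sequential ones, but still not free.
SET random_page_cost = 1.5;   -- default 4.0; keep it above seq_page_cost (1.0)
ANALYZE;                      -- refresh planner statistics
```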
Re: [PERFORM] PostgreSQL PITR - more doubts
On 2010-07-12 13:23, Jayadevan M wrote: Hello all, One doubt about how PostgreSQL PITR works. Let us say I have all the archived WALs for the past week with archive_command = 'cp -i %p /home/postgres/archive/%f </dev/null' I took a base backup last night. If I try to recover the server today after copying the base backup from yesterday and providing restore_command = 'cp /home/postgres/archive/%f %p' does PostgreSQL go through all the past week's archived WALs, or can it figure out that the base backup is from yesterday, and so skip a large number of archived WALs and start only from file xxx? Yes, it starts from where it needs to, assuming you did a pg_start_backup() before you took your base backup. -- Jesper
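The base-backup procedure referred to above could be sketched like this (hypothetical label; the backup_label file written by pg_start_backup() is what tells recovery which WAL segment to begin with):

```sql
-- Hypothetical sketch of a base backup; the backup_label created here
-- lets recovery skip all WAL archived before the backup started.
SELECT pg_start_backup('nightly_base_backup');
-- ... copy $PGDATA to backup storage with tar/rsync at the OS level ...
SELECT pg_stop_backup();
```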
Re: [PERFORM] Need help in performance tuning.
On 2010-07-10 00:59, Greg Smith wrote: Matthew Wakeling wrote: If you have an external pool solution, you can put it somewhere else - maybe on multiple somewhere elses. This is the key point to observe: if you're at the point where you have so many connections that you need a pool, the last place you want to put that is on the overloaded database server itself. Therefore, it must be an external piece of software to be effective, rather than being part of the server itself. Database servers are relatively expensive computing hardware due to size/quantity/quality of disks required. You can throw a pooler (or poolers) on any cheap 1U server. This is why a built-in pooler, while interesting, is not particularly functional for how people normally scale up real-world deployments. That may be totally correct for the 10%* of the userbase that are in a squeezed situation, but for the 90%* that aren't (or aren't aware of being there), a built-in pooler would be a direct benefit. For the 20% living near the edge it may be the difference between just working and extra hassle. I think it is a fair assumption that the majority of PG's users solve their problems without a connection pooler, and the question is whether it is beneficial to let them scale better without doing anything. I have also provided a case where Kevin's proposal might be a benefit but a connection pooler cannot solve it: http://archives.postgresql.org/pgsql-hackers/2010-06/msg01438.php (at least as I see it, but I'm fully aware that there is stuff I don't know of). I don't think a built-in connection pooler (or similar) would in any way limit the actions and abilities of an external one? * Both numbers wildly guessed.. -- Jesper
Re: [PERFORM] Write performance
On 2010-06-24 15:45, Janning Vygen wrote: On Thursday 24 June 2010 15:16:05 Janning wrote: On Thursday 24 June 2010 14:53:57 Matthew Wakeling wrote: On Thu, 24 Jun 2010, Janning wrote: We have a 12 GB RAM machine with intel i7-975 and using 3 disks Seagate Barracuda 7200.11, ST31500341AS (1.5 TB) For each drive, you will be able to read/write approximately 8kB / 0.0085s, giving 941kB per second. If you have multiple processes all doing random access, then you may be able to utilise both discs and get double that. So with your calculation I have a maximum of 2MB/s random access. So I really need to upgrade my disk configuration! I was looking at tomshardware.com and the fastest disk is the Maxtor Atlas 15K II * 8K147S0, SAS, 147 GB, 16 MB cache, 15000 rpm, with 5.5 ms random access time. So even if I switch to those disks I can only reach a performance gain of 1.5, right? To achieve a better disk performance by a factor of ten, I need a raid-10 setup with 12 disks (so I have 6 raid-1 bundles). Or are there other factors with high-end disks? Well: on the write side, you can add in a RAID controller with a battery-backed write cache, so writes don't hit disk directly. This improves the amount of writing you can do. On the read side you can add more memory to your server so a significant part of your most active dataset is cached in memory. It depends on the actual sizes and workload what gives the most benefit for you. -- Jesper
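The quoted per-drive estimate is simple arithmetic: one 8kB page per ~8.5ms random access. As a sanity check (numbers taken from the thread above):

```sql
-- One random 8kB read per ~8.5ms of seek + rotational latency:
SELECT round(8 / 0.0085) AS kb_per_second;   -- roughly 941 kB/s per drive
```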
[PERFORM] Aggressive autovacuuming ?
Hi. I have been wondering if anyone has been experimenting with really aggressive autovacuuming. The database I'm administrating rarely has long-running transactions (over several minutes), and it has a fair amount of buffer cache plus an OS cache of (at best) 64GB. A lot of the OS cache is being used for read caching. My thought was that if I tuned autovacuum to be really aggressive, then I could get autovacuum to actually vacuum the tuples before they get evicted from the OS cache, thus effectively saving the IO overhead of vacuuming. The largest consequence I can see at the moment is that when I get a full vacuum (for preventing transaction-ID wraparound) it would be run with the same aggressive settings, thus giving a real performance hit in that situation. Has anyone tried to do similar? What is your experience? Is the idea totally bogus? Jesper -- Jesper Krogh
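For reference, the knobs involved could be set per table like this (a hypothetical sketch using 8.4-style storage parameters; the table name and values are illustrative, not recommendations):

```sql
-- Hypothetical aggressive settings for one hot table: vacuum after ~1% churn,
-- and let the vacuum run at full speed instead of being cost-throttled.
ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
);
```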
Re: [PERFORM] Need to increase performance of a query
On 2010-06-10 19:50, Anne Rosset wrote: Any advice on how to make it run faster? What timing do you get if you run it with \timing and without explain analyze? I would be surprised if you can get it much faster than it is.. It may be that a significant portion is planning cost, so if you run it a lot you might benefit from a prepared statement. -- Jesper
Re: [PERFORM] planner costs in warm cache tests
On 2010-05-30 20:34, Tom Lane wrote: Jesper Kroghjes...@krogh.cc writes: testdb=# set seq_page_cost = 0.1; SET testdb=# set random_page_cost = 0.1; SET Well, hmm, I really doubt that that represents reality either. A page access is by no means free even when the page is already in cache. I don't recall anyone suggesting that you set these numbers to less than perhaps 0.01. Thank you for the prompt response. Is it a false assumption that the cost, as a metric for comparing different plans, should be a measurement of actual run time in a dead-disk run? It should most likely match a typical workload situation, but that is really hard to tell anything about, so my feeling would be that the dead-disk case is the one closest? -- Jesper
[PERFORM] planner costs in warm cache tests
)) Total runtime: 271.533 ms (9 rows) So in the situation where I have tried to nullify the actual disc cost, hopefully leaving only the cpu and other costs behind, and running the query in fully cached mode (two runs), the bitmap heap scan is still hugely favorable in actual runtime (which isn't that much of a surprise), but it seems strange that the index scan is still favored in the cost calculations? I have tried to alter the cost of ts_match_vq, but even setting it to 1000 does not change the overall picture. Is the approach simply too naive? -- Jesper
Re: [PERFORM] which hardware setup
Option 2: App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and 2x 146GB 15k RPM SAS (RAID1) disks You didn't mention your dataset size, but the second option would be preferable in most situations, since it gives more of the OS memory for disc caching: 12 GB vs 4 GB for the host running PG.
Re: [PERFORM] Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?
Overall comment: try reading through these old threads; most of your problem is the same issue:
http://article.gmane.org/gmane.comp.db.postgresql.performance/22395/match=gin
http://thread.gmane.org/gmane.comp.db.postgresql.performance/22331/focus=22434

Table public.post
  Column  |           Type           | Modifiers
----------+--------------------------+--------------------------------------------------------
 subject  | text                     |
 message  | text                     |
 inserted | timestamp with time zone |
 modified | timestamp with time zone |
 replied  | timestamp with time zone |
 ordinal  | integer                  | not null default nextval('post_ordinal_seq'::regclass)
Indexes:
 post_pkey PRIMARY KEY, btree (ordinal)
 idx_message gin (to_tsvector('english'::text, message))
 idx_subject gin (to_tsvector('english'::text, subject))

There's a bunch of other stuff in the table and many more indices, plus foreign references, but stripping the table down to JUST THIS shows the problem.

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc;
 Sort (cost=31795.16..31819.68 rows=9808 width=436) (actual time=14.222..17.213 rows=3421 loops=1)
   Sort Key: modified
   Sort Method: quicksort Memory: 3358kB
   -> Bitmap Heap Scan on post (cost=1418.95..31144.90 rows=9808 width=436) (actual time=1.878..7.514 rows=3421 loops=1)
        Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
        -> Bitmap Index Scan on idx_message (cost=0.00..1416.49 rows=9808 width=0) (actual time=1.334..1.334 rows=3434 loops=1)
             Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
 Total runtime: 20.547 ms
(8 rows)

Ok, very nice. 20ms. I like that.
Now let's limit the return to 100 items:

ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100;
 Limit (cost=0.00..5348.69 rows=100 width=436) (actual time=198.047..2607.077 rows=100 loops=1)
   -> Index Scan Backward using post_modified on post (cost=0.00..524599.31 rows=9808 width=436) (actual time=198.043..2606.864 rows=100 loops=1)
        Filter: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
 Total runtime: 2607.231 ms
(4 rows)

Bad. Notice that the optimizer decided it was going to do an index scan with an internal filter on it! That's BACKWARD; what I want is for the planner to first execute the index scan on the GIN index, then order the return and limit the returned data set. But it gets much worse - let's use something that's NOT in the message base (the table in question has some ~2m rows by the way and consumes several gigabytes on disk - anything that actually READS the table is instant bad news!) The one problem is that the query planner doesn't have any specific knowledge about the cost of the GIN index search. That's mentioned in one of the above threads. The other problem is that the costs of to_tsvector and ts_match_vq are set way too conservatively in the default installation. Bumping those up will increase your share of correct plans, but it doesn't solve all of it, since the above problem is also interfering. But try upping the cost of those two functions significantly: alter function ts_match_vq(tsvector,tsquery) cost 500 (upping the cost 500 times for that one). If I've got it right, that is more in the correct ballpark; it more or less translates to how much more expensive the function is compared to really simple operators. Another thing you can do, which favours the running time of the queries using to_tsvector(), is to explicitly store the tsvector in the table and create an index on that.
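The stored-tsvector approach could look roughly like this (a hedged sketch; the column, index, and trigger names are made up):

```sql
-- Hypothetical sketch: materialize the tsvector so queries never call
-- to_tsvector() at run time, only the cheaper @@ match on the column.
ALTER TABLE post ADD COLUMN message_tsv tsvector;
UPDATE post SET message_tsv = to_tsvector('english', message);
CREATE INDEX idx_message_tsv ON post USING gin (message_tsv);

-- Keep the column current on INSERT/UPDATE with the built-in trigger:
CREATE TRIGGER post_tsv_update BEFORE INSERT OR UPDATE ON post
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(message_tsv, 'pg_catalog.english', message);
```

Queries then filter on `message_tsv @@ to_tsquery('violence')` directly.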
That will at run time translate into fewer calls (0, to be precise) of to_tsvector, leaving only the cost of ts_match_vq at run time.

> Why is the planner taking into consideration the LIMIT (I know the docs say it does) and choosing to sequentially scan a table of nearly 2 million rows?! I don't see how that makes sense irrespective of the query being LIMITed. If it matters setting enable_seqscan OFF does not impact the results.

No, because you end up doing index scans on non-GIN indexes in that situation, so turning seqscan off has no effect.

-- Jesper

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
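The stored-tsvector approach described above can be sketched roughly like this (the column, trigger, and index names are my own invention for illustration, not from the original setup):

```sql
-- Store the tsvector once at write time instead of recomputing it
-- per row at query time.  All names here are hypothetical.
ALTER TABLE post ADD COLUMN message_fts tsvector;
UPDATE post SET message_fts = to_tsvector('english', message);

-- Keep the column current on INSERT/UPDATE with the built-in helper.
CREATE TRIGGER post_fts_update
    BEFORE INSERT OR UPDATE ON post
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(message_fts, 'pg_catalog.english', message);

CREATE INDEX idx_message_fts ON post USING gin (message_fts);

-- Queries then match the stored column directly, so to_tsvector() is
-- never called at run time, only ts_match_vq() via the @@ operator.
SELECT * FROM post
WHERE message_fts @@ to_tsquery('violence')
ORDER BY modified DESC;
```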
Re: [PERFORM] 3ware vs. MegaRAID
On 2010-04-08 05:44, Dave Crooke wrote:
> For a card level RAID controller, I am a big fan of the LSI, which is available in a PCIe riser form factor for blade / 1U servers, and comes with 0.5GB of battery backed cache. Full Linux support including mainline kernel drivers and command line config tools. Was using these with SAS expanders and 48x 1TB SATA-300 spindles per card, and it was pretty (adjective) quick for a card-based system ... comparable with a small FC-AL EMC Clariion CX3 series in fact, just without the redundancy.

Can someone shed simple light on an extremely simple question: how do you physically get 48 drives attached to an LSI that claims to only have 2 internal and 2 external ports? (The controller claims to support up to 240 drives.)

I'm currently looking at getting a server with space for 8 x 512GB SSDs running raid5 (or 6) and am looking for a well-performing controller with BBWC for the setup. So I was looking for something like the LSI888ELP.

-- Jesper
Re: [PERFORM] 3ware vs. MegaRAID
On 2010-04-09 17:27, Greg Smith wrote:
> Jesper Krogh wrote:
>> Can someone shed simple light on an extremely simple question: how do you physically get 48 drives attached to an LSI that claims to only have 2 internal and 2 external ports? (The controller claims to support up to 240 drives.)
>
> There are these magic boxes that add SAS expansion, which basically split a single port so you can connect more drives to it. An example from a vendor some of the regulars on this list like is http://www.aberdeeninc.com/abcatg/kitjbod-1003.htm You normally can't buy these except as part of an integrated drive chassis subsystem. If you get one that has an additional pass-through port, that's how you can stack these into multiple layers and hit really large numbers of disks.

I've spent quite some hours googling today. Am I totally wrong in thinking that the HP MSA-20/30/70 and Sun/Oracle J4200 (https://shop.sun.com/store/product/53a01251-2fce-11dc-9482-080020a9ed93) are the same type of thing, just from major vendors?

That would enable me to reuse the existing server and move to something like Intel's X25-M 160GB disks, just in a higher quantity (25) in an MSA-70 ... that's beginning to look like a decent plan.

-- Jesper
Re: [PERFORM] 3ware vs. MegaRAID
On 2010-04-09 20:22, Greg Smith wrote:
> Jesper Krogh wrote:
>> I've spent quite some hours googling today. Am I totally wrong in thinking that the HP MSA-20/30/70 and Sun/Oracle J4200 are the same type of thing, just from major vendors?
>
> Yes, those are the same type of implementation. Every vendor has their own preferred way to handle port expansion, and most are somewhat scared about discussing the whole thing now because EMC has a ridiculous patent on the whole idea[1]. They all work the same from the user perspective, albeit sometimes with their own particular daisy chaining rules.
>
>> That would enable me to reuse the existing server and move to something like Intel's X25-M 160GB disks, just in a higher quantity (25) in an MSA-70.
>
> I guess, but note that several of us here consider Intel's SSDs unsuitable for critical database use. There are some rare but not impossible to encounter problems with its write caching implementation that leave you exposed to database corruption if there's a nasty power interruption. Can't get rid of the problem without destroying both performance and longevity of the drive[2][3]. If you're going to deploy something using those drives, please make sure you're using an aggressive real-time backup scheme such as log shipping in order to minimize your chance of catastrophic data loss.
>
> [1] http://www.freepatentsonline.com/7624206.html
> [2] http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
> [3] http://petereisentraut.blogspot.com/2009/07/solid-state-drive-benchmarks-and-write.html

There are some things in my scenario that cannot be said to be general to all database situations. Having to go a week back (to backup) is not really a problem, so as long as I have a reliable backup and the problems don't occur except on unexpected power-offs, then I think I can handle it.
Another thing is that the overall usage is heavily dominated by random reads, which is performance I don't ruin by disabling write caching. And by adding a 512/1024MB BBWC on the controller, I bet I can regain enough write performance to easily make the system function. Currently the average writeout is way less than 10MB/s, but the reading processes all spend most of their time in iowait.

Since my application is dominated by random reads, I think I should still have a huge gain over regular SAS drives on that side of the equation, but most likely not on the write side. But all of this is so far only speculation, since the vendors don't seem eager to lend out stuff these days, so everything is only on paper so far.

There seems to be consensus that on the write side, SAS disks can fairly easily outperform SSDs. I have not seen anything showing that SSDs don't still have huge benefits on the read side.

It would be nice if there was an easy way to test and confirm that a drive actually was robust to power outages ... just having a disk array with a built-in battery for the SSDs would solve the problem.

-- Jesper
Re: [PERFORM] perf problem with huge table
> Hi all, i am trying to move my app from M$sql to PGsql, but i need a bit of help :)

Apart from all the other good advice about partitioning the dataset and such, there is another aspect to keep in mind. When you have a large dataset and your queries become IO-bound, the tuple density is going to hit you in two ways. Your dataset seems to have a natural clustering around the time, which is also what you would use for the partitioning. That also means that if you more or less have that clustering of data on disk, you will have the tuples you need to satisfy a query in the same page or in pages close by.

The cost of checking visibility for a tuple is to some degree a function of the tuple size, so if you can do anything to increase the tuple density, that will most likely benefit speed in two ways:

* You increase the likelihood that the next tuple is in the same page, so it doesn't result in a random I/O seek.
* You increase the total number of tuples you have sitting in your system cache in the same amount of pages (memory), so they don't result in a random I/O seek.

So if you are carrying around columns you don't really need, throw them away (that could be columns that can trivially be computed based on other columns), but you need to do your own testing here. To stress the first point, here's a sample run on a fairly old desktop with one SATA drive. testtable has an id integer and a data column which is 486 bytes of text; testtable2 has an id integer and a data integer. Both are filled with 10M tuples, and PG was restarted and caches dropped beforehand to simulate a totally disk-bound system.

testdb=# select count(id) from testtable where id > 800 and id < 850;
 count
-------
    49
(1 row)

Time: 7909.464 ms

testdb=# select count(id) from testtable2 where id > 800 and id < 850;
 count
-------
    49
(1 row)

Time: 2149.509 ms

In this sample: 4 times faster, and the query does not even touch the data column. (On better hardware you'll most likely see better results.)
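A hedged sketch of how the two test tables above could be built (the actual generator script isn't shown in the mail, so the filler data here is made up):

```sql
-- Two tables differing only in tuple width: ~486 bytes of text vs. a
-- single integer.  10M rows each, no indexes, so both count() queries
-- scan heap pages; the narrow table packs far more tuples per page.
CREATE TABLE testtable  (id integer, data text);
CREATE TABLE testtable2 (id integer, data integer);

INSERT INTO testtable
SELECT g, repeat('x', 486) FROM generate_series(1, 10000000) AS g;

INSERT INTO testtable2
SELECT g, g FROM generate_series(1, 10000000) AS g;
```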
If the columns are needed, you can push the less frequently used ones out to a 1:1 relation. That gives you some administrative overhead, but then you can decide at query time whether you want to pay the extra random seeks to access that data. You have the same picture the other way around if your queries are accessing data sitting in TOAST: you'll be paying double the random IO-cost for getting the tuple. So it is definitely a tradeoff that should be made with care.

I've monkeypatched my own PG using this patch, to toy around with the criteria for sending the less frequently used data to a TOAST table: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/135158/match=

Google "vertical partitioning" for more; this is basically what it is. (I believe this could benefit my own application, so I'm also trying to push some interest into the area.)

-- Jesper
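A minimal sketch of the 1:1 split described above (all table and column names are invented for illustration):

```sql
-- "Vertical partitioning" by hand: keep the hot, narrow columns in the
-- main table and push the wide, rarely used ones to a 1:1 side table.
CREATE TABLE event_hot (
    id    integer PRIMARY KEY,
    ts    timestamptz NOT NULL,
    value integer NOT NULL
);

CREATE TABLE event_cold (
    id      integer PRIMARY KEY REFERENCES event_hot (id),
    payload text  -- wide column that would otherwise hurt tuple density
);

-- Most queries stay narrow, touching only the dense table...
SELECT count(*) FROM event_hot WHERE ts >= now() - interval '1 day';

-- ...and only pay the extra join/seeks when the wide data is needed.
SELECT h.id, h.ts, c.payload
FROM event_hot h
JOIN event_cold c USING (id)
WHERE h.ts >= now() - interval '1 hour';
```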
Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!
> Frankly, I was quite surprised by this, since some of the benchmarks people have published on the effects of using a separate RAID for the WAL files have only shown a one or two percent difference when using a hardware RAID controller with BBU cache configured for write-back.

Hi Kevin.

Nice report, but just a few questions. Sorry if it is obvious, but what filesystem/OS are you using, and do you have BBU write-back on the main data directory as well?

Jesper
[PERFORM] Digesting explain analyze
Hi. I have a table that consists of somewhere in the magnitude of 100.000.000 rows, and all rows are tuples of the form (id1, id2, evalue). I'd like to speed up a query like this:

explain analyze select id from table where id1 = 2067 or id2 = 2067 order by evalue asc limit 100;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1423.28..1423.28 rows=100 width=12) (actual time=2.565..2.567 rows=100 loops=1)
   ->  Sort  (cost=1423.28..1424.54 rows=505 width=12) (actual time=2.560..2.560 rows=100 loops=1)
         Sort Key: evalue
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on table  (cost=16.58..1420.75 rows=505 width=12) (actual time=0.709..1.752 rows=450 loops=1)
               Recheck Cond: ((id1 = 2067) OR (id2 = 2067))
               ->  BitmapOr  (cost=16.58..16.58 rows=506 width=0) (actual time=0.676..0.676 rows=0 loops=1)
                     ->  Bitmap Index Scan on id1_evalue_idx  (cost=0.00..11.44 rows=423 width=0) (actual time=0.599..0.599 rows=450 loops=1)
                           Index Cond: (id1_id = 2067)
                     ->  Bitmap Index Scan on id2_evalue_idx  (cost=0.00..4.89 rows=83 width=0) (actual time=0.070..0.070 rows=1 loops=1)
                           Index Cond: (id2_id = 2067)
 Total runtime: 2.642 ms
(12 rows)

What I had expected was to see the Bitmap Index Scan on id1_evalue_idx chop it off at the limit. The inner sets are on average 3.000 for both id1 and id2, and a typical limit would be 100, so if I could convince postgresql not to fetch all of them, I would reduce the set retrieved by around 60. The dataset is quite large, so a random query is not very likely to hit the same part of the dataset again, and there is going to be a fair amount of going to disk. It would also mean that using it in a for loop in a stored procedure in plpgsql would not get any benefit from the CURSOR effect?

I actually tried to stuff id1,id2 into an array and do a GiST index on (array, evalue), hoping that it would directly satisfy this query ... it used the GiST index to fetch the rows, then did the post-sorting and limit on the set.
What it boils down to is more or less:

Does a bitmap index scan support ordering and limit?
Does a multicolumn GiST index support ordering and limit?

Have I missed anything that can hugely speed up fetching these (typically 100) rows from the database?

-- Jesper
Re: [PERFORM] Digesting explain analyze
Ron Mayer wrote:
>> ...The inner sets are on average 3.000 for both id1 and id2, and a typical limit would be 100, so if I could convince postgresql not to fetch all of them, I would reduce the set retrieved by around 60. The dataset is quite large, so a random query is not very likely to hit the same part of the dataset again, and there is going to be a fair amount of going to disk.
>
> If disk seeks are killing you, a kinda crazy idea would be to duplicate the table - clustering one by (id1) and the other one by an index on (id2) - and unioning the results of each.

That's doubling the disk space needs for the table. What are the odds that this would help when the initial table significantly exceeds available memory by itself?

> Since each of these duplicates of the table will be clustered by the column you're querying it on, it should just take one seek in each table. Then your query could be something like
>
>   select * from (
>     select * from t1 where id1=2067 order by evalue limit 100
>     union
>     select * from t2 where id2=2067 order by evalue limit 100
>   ) as foo order by evalue limit 100;

This is actually what I ended up with as the best performing query, just still on a single table, because without duplication I can add indexes and optimize it by (id1,evalue) and (id2,evalue). It is still getting killed quite a lot by disk IO. So I guess I'm up to:

1) Buy better disks (I need to get an estimate of how large it actually is going to get).
2) Stick with one table, but make sure to have enough activity to get a large part of the index into the OS cache anyway (and add more memory if necessary).

The data is seeing a fair amount of growth (doubles in a couple of years), so it is fairly hard to maintain clustering on it, I would suspect.

Is it possible to get PG to tell me how many rows fit in a disk page? All columns are sitting in plain storage according to \d+ on the table.

Hmm.. and I wonder if putting evalue into the criteria to cluster the tables too (i.e.
cluster on id1,evalue) if you could make it so the limit finds the right 100 evalues first for each table

I didn't cluster it, since clustering locks everything.

-- Jesper
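For reference, the shape of the single-table solution described above might look like this (index names are invented; note that with plain UNION a row matching both arms is deduplicated, while UNION ALL would skip that extra sort):

```sql
-- Each arm can be answered by one ordered scan of its composite btree
-- index, so the inner LIMITs can stop after ~100 tuples instead of
-- reading the whole ~3.000-row inner set.
CREATE INDEX id1_evalue_idx2 ON t (id1, evalue);
CREATE INDEX id2_evalue_idx2 ON t (id2, evalue);

SELECT * FROM (
    (SELECT * FROM t WHERE id1 = 2067 ORDER BY evalue LIMIT 100)
    UNION
    (SELECT * FROM t WHERE id2 = 2067 ORDER BY evalue LIMIT 100)
) AS foo
ORDER BY evalue
LIMIT 100;

-- And a rough answer to the rows-per-page question, from the
-- planner's own statistics:
SELECT relpages, reltuples,
       reltuples / relpages AS avg_rows_per_page
FROM pg_class
WHERE relname = 't';
```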
[PERFORM] Message queue table - strange performance drop with changing limit size.
Hi. I have a message queue table that contains on the order of 1-10M messages. It is implemented using TheSchwartz: http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz.pm

So when a worker picks the next job, it goes into the job table and selects the top X highest-priority messages with a funcid that it can work on. The table looks like this:

db=# \d workqueue.job
                              Table "workqueue.job"
    Column     |   Type   |                           Modifiers
---------------+----------+---------------------------------------------------------------
 jobid         | integer  | not null default nextval('workqueue.job_jobid_seq'::regclass)
 funcid        | integer  | not null
 arg           | bytea    |
 uniqkey       | text     |
 insert_time   | integer  |
 run_after     | integer  | not null
 grabbed_until | integer  | not null
 priority      | smallint |
 coalesce      | text     |
Indexes:
    "job_funcid_key" UNIQUE, btree (funcid, uniqkey)
    "funcid_coalesce_priority" btree (funcid, "coalesce", priority)
    "funcid_prority_idx2" btree (funcid, priority)
    "job_jobid_idx" btree (jobid)

efam=# explain ANALYZE select jobid from workqueue.job where job.funcid in (3) order by priority asc limit 1000;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2008.53 rows=1000 width=6) (actual time=0.077..765.169 rows=1000 loops=1)
   ->  Index Scan using funcid_prority_idx2 on job  (cost=0.00..7959150.95 rows=3962674 width=6) (actual time=0.074..763.664 rows=1000 loops=1)
         Index Cond: (funcid = 3)
 Total runtime: 766.104 ms
(4 rows)

efam=# explain ANALYZE select jobid from workqueue.job where job.funcid in (3) order by priority asc limit 50;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..100.43 rows=50 width=6) (actual time=0.037..505.765 rows=50 loops=1)
   ->  Index Scan using funcid_prority_idx2 on job  (cost=0.00..7959150.95 rows=3962674 width=6) (actual time=0.035..505.690 rows=50 loops=1)
         Index Cond: (funcid = 3)
 Total runtime: 505.959 ms
(4 rows)

efam=# explain ANALYZE select jobid from workqueue.job where job.funcid in (3) order by priority asc limit 10;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..20.09 rows=10 width=6) (actual time=0.056..0.653 rows=10 loops=1)
   ->  Index Scan using funcid_prority_idx2 on job  (cost=0.00..7959152.95
rows=3962674 width=6) (actual time=0.054..0.640 rows=10 loops=1)
         Index Cond: (funcid = 3)
 Total runtime: 0.687 ms
(4 rows)

So what I see is that top 10 takes under 1 ms, top 50 takes over 500 times more, and top 1000 only 1.5 times more than top 50. What can the reason for the huge drop between limit 10 and limit 50 be?

-- Jesper
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Tom Lane wrote:
> It may well be that Jesper's identified a place where the GIN code could be improved --- it seems like having the top-level search logic be more aware of the AND/OR structure of queries would be useful. But the particular example shown here doesn't make a very good case for that, because it's hard to tell how much of a penalty would be taken in more realistic examples.

With a term sitting in:

 80% of the docs the penalty is: x23
 60% of the docs the penalty is: x17
 40% of the docs the penalty is: x13

for doing

  vectorcol @@ to_tsquery('term & commonterm')

compared to

  vectorcol @@ to_tsquery('term') and vectorcol @@ to_tsquery('commonterm');

where term is non-existing (or rare). (These are query execution times on a fully memory-resident dataset; doing the test with drop_caches and a PG restart to simulate a dead disk, the numbers are a bit higher.)

http://article.gmane.org/gmane.comp.db.postgresql.performance/22496/match=

Would you ever qualify a term sitting in 60-80% of the documents as a stop-word candidate? I don't know if x13 in execution performance is worth hunting, or if there are lower-hanging fruits sitting in the fts-search system. This is essentially the penalty the user will get for adding a term to their search that rarely restricts the results.

In terms of the usual set theory that databases work in, a search for a stop-word translates into the full set. This is just not the case here, where it throws a warning and returns the empty set. That warning can be caught by application code to produce the correct result for the users, but just slightly more complex queries don't do this:

ftstest=# select id from ftstest where body_fts @@ to_tsquery('random | the') limit 10;
 id
----
(0 rows)

Here I would have expected the same warning. I basically have to hook the complete stop-word dictionary into an FTS pre-parser to give the user the expected results, or have I missed a feature somewhere?

My reason for not pushing commonterms into the stop-word list is that they actually perform excellently in PG.
Same body as usual, but commonterm99 is sitting in 99% of the documents.

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99');

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=1051476.74..1107666.07 rows=197887 width=4) (actual time=51.036..121.348 rows=197951 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm99'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..1051427.26 rows=197887 width=0) (actual time=49.602..49.602 rows=197951 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm99'::text))
 Total runtime: 147.350 ms
(5 rows)

ftstest=# set enable_seqscan=on;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99');

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..56744.00 rows=197887 width=4) (actual time=0.086..7134.384 rows=197951 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm99'::text))
 Total runtime: 7194.182 ms
(3 rows)

So in order to get the result with a speedup of more than x50, I simply cannot add these terms to the stop-words, because then the first query would resolve to a warning and getting results would be up to the second query. My bet is that doing a seq scan will never be beneficial for this type of query.

As far as I can see, the only consequence of simply not removing stop-words at all is a (fairly small) increase in index size. It seems to me that stop-words were invented back when it was hard to get more than 2GB of memory into a computer, to reduce the index to a size that could better fit into memory. But nowadays the downsides are hard to see?

-- Jesper
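A hedged sketch of catching the stop-word case in SQL rather than in an application-side pre-parser (the numnode() guard is my suggestion, not something from the original mail; table and column names follow the test setup):

```sql
-- to_tsquery() reduces stop-words away, so numnode() can detect the
-- "nothing left to search for" case before the search runs.
SELECT numnode(to_tsquery('english', 'the'));       -- 0 nodes remain
SELECT numnode(to_tsquery('english', 'violence'));  -- 1 node

-- Only run the FTS match when the parsed query is non-empty:
SELECT id
FROM ftstest
WHERE numnode(to_tsquery('english', 'random | the')) > 0
  AND body_fts @@ to_tsquery('english', 'random | the');
```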
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Hi. I've now got a test-set that can reproduce the problem, where the two fully equivalent queries

  body_fts @@ to_tsquery('commonterm & nonexistingterm')

and

  body_fts @@ to_tsquery('commonterm') AND body_fts @@ to_tsquery('nonexistingterm')

give a difference of x300 in execution time (growing with the size of the document base). This can now be reproduced using:

* http://krogh.cc/~jesper/fts-queryplan.pl and http://krogh.cc/~jesper/words.txt

It builds up a table with 200.000 documents where commonterm exists in all of them; nonexistingterm is in none. To get a sane plan out of the query planner I need to do a:

ftstest# set enable_seqscan=off

Then:

ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('nonexistingterm & commonterm');

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=5563.09..7230.93 rows=1000 width=4) (actual time=30.861..30.861 rows=0 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('nonexistingterm & commonterm'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..5562.84 rows=1000 width=0) (actual time=30.856..30.856 rows=0 loops=1)
         Index Cond: (body_fts @@ to_tsquery('nonexistingterm & commonterm'::text))
 Total runtime: 30.907 ms
(5 rows)

ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm');

                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=5565.59..7238.43 rows=1000 width=4) (actual time=0.059..0.059 rows=0 loops=1)
   Recheck Cond: ((body_fts @@ to_tsquery('nonexistingterm'::text)) AND (body_fts @@ to_tsquery('commonterm'::text)))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..5565.34 rows=1000 width=0) (actual time=0.057..0.057 rows=0 loops=1)
         Index Cond: ((body_fts @@ to_tsquery('nonexistingterm'::text)) AND (body_fts @@ to_tsquery('commonterm'::text)))
 Total runtime: 0.111 ms
(5 rows)

Run repeatedly to get a fully memory-resident dataset. In this situation the former query ends up being 300x slower than the latter, although they are fully equivalent.
-- Jesper
Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Craig Ringer wrote:
> On 8.4 on a different system Pg uses the seq scan by preference, with a runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when searching for `commonterm' even when enable_seqscan is set to `off'. A search for `commonterm80' also uses a seq scan (1067ms), but if enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

Ok, that's exactly like my numbers.

> On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ... odd. If I force it not to use a bitmap heap scan it'll use an index scan. Preventing that too results in a seq scan with a runtime of 1500ms vs the 161ms of the bitmap heap scan.

I agree that it seems like a pretty strange result on face value. PG 8.3 doesn't have statistics data available for gin indexes, which may be why the query planner can do otherwise on 8.3. It also means that it is a regression, since in these cases 8.4 will perform worse than 8.3 did (although the statistics make a lot of other cases way better).

> So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT longer than the bitmap heap scan, though similar numbers of tuples are being read by both. I see the same results when actually reading the results rather than just doing an `explain analyze'. With psql set to send output to /dev/null and with \timing enabled:
>
> test=> \o /dev/null
> test=> set enable_seqscan = on;
> Time: 0.282 ms
> test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
> Time: 988.880 ms
> test=> set enable_seqscan = off;
> Time: 0.286 ms
> test=> select id from ftstest where body_fts @@ to_tsquery('commonterm80');
> Time: 159.167 ms
>
> so - nearly 1s vs 0.15s is a big difference between what I previously confirmed to be bitmap heap scan and seq scan respectively for the same query. The same number of records are being returned in both cases.
> If I select * rather than just reading the `id' field, the runtimes are much more similar - 4130ms seq scan, and 3285ms bitmap heap scan (both reading data to /dev/null), a difference of ~800ms. `EXPLAIN ANALYZE' results are still quite different, though, at 1020ms seq scan vs 233ms bitmap heap, suggesting that the similarity is created only by the time taken to actually transfer the data to the client. The time difference between the two is much the same.
>
> So - for some reason the seq scan takes 800ms or so longer than the bitmap heap scan. I can see why you're puzzled. I can reproduce it on two different machines with two different Pg versions, and using two slightly different methods for loading the data as well. So, I can confirm your test results now that I'm actually testing properly.

Thanks a lot.

test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753 width=54) (actual time=27.452..175.481 rows=39852 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72 rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 233.473 ms
(5 rows)

test=> set enable_seqscan = on;
SET
test=> explain analyze select * from ftstest where body_fts @@ to_tsquery('commonterm80');

                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual time=0.141..956.496 rows=39852 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1020.936 ms
(3 rows)

My system seems to prefer bitmap scans a bit more, but given the actual numbers that seems preferable. That's about query planning, though; my main reason for posting was the actual run time.

By the way, for the 8.4 test I modified the loader script so it wouldn't take quite so painfully long to run second time 'round.
I turned autocommit off, wrapped all the inserts up in a single transaction, and moved the fts index creation to after all the data has been inserted. It's a *LOT* faster, and the test results match yours.

I'll make that change if I have to work a bit more with it. Thanks for spending time confirming my findings (now I know it's not just me going blind staring at the problem).

-- Jesper
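A hedged sketch of the loading pattern described above: one transaction for all the inserts, and the expensive GIN index built only after the data is loaded (table and index names follow the test script; the insert itself is illustrative):

```sql
-- Bulk load inside a single transaction to avoid per-row commit cost.
BEGIN;
INSERT INTO ftstest (id, body_fts)
VALUES (1, to_tsvector('english', 'commonterm commonterm80 ...'));
-- ... the remaining ~200.000 inserts ...
COMMIT;

-- Build the GIN index once, after the data is in, instead of
-- maintaining it incrementally during the load.
CREATE INDEX ftstest_gin_idx ON ftstest USING gin (body_fts);
```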
Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
> On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh jes...@krogh.cc wrote:
>> Given that the seq-scan has to visit 50K rows to create the result and the bitmap heap scan only has to visit 40K (but search the index), we would expect the seq-scan to be at most 25% more expensive than the bitmap heap scan, e.g. less than 300ms.
>
> I've seen behavior similar to this in the past with a plain old B-tree index. As in your case, a bitmap index scan was significantly faster than a sequential scan even though essentially all the heap pages had to be scanned, but the planner expected the opposite to be true. The planner's expectation is that the dominant cost will be fetching the pages, and it furthermore thinks that fetching things in sequential order is much better than skipping around randomly. However, if all the pages are memory-resident - possibly even in L2 or L3 CPU cache - fetching the pages is nearly free, so the dominant cost becomes the CPU time to process the tuples.

Well, no. This topic is not at all about the query planner. It is about the actual run time of the two almost identical queries. It may be that we're seeing the results because one fits better into L2 or L3 cache, but the complete dataset is memory-resident and was run multiple times in a row to eliminate disk access.

> My best guess is that in cases like this the index cond is cheaper to evaluate than the recheck cond/filter, so the index scan wins not by reading fewer pages but by avoiding the need to examine some of the tuples on each page. I might be all wet, though.

In my example the seq scan evaluates 50K tuples and the heap scan 40K. The question is why the per-tuple evaluation becomes that much more expensive (x7.5)[1] on the seq scan than on the index scan, when the complete dataset indeed is in memory?
> If your whole database fits in RAM, you could try changing your seq_page_cost and random_page_cost variables from the default values of 1 and 4 to something around 0.05, or maybe even 0.01, and see whether that helps.

Again, that is about planning the query. We're talking actual runtimes here.

[1] 50K tuples in 1800 ms vs. 40K tuples in 200 ms

-- Jesper
Re: [PERFORM] Full text search - query plan? PG 8.4.1
Tom Lane wrote:
> Jesper Krogh jes...@krogh.cc writes:
>> Is it possible to manually set the cost for the @@ operator?
>
> You want to set the cost for the underlying function.

alter function ts_match_vq(tsvector,tsquery) cost 500

seems to change my test queries in a very positive way (e.g. they resolve to a bitmap index scan in most cases, but fall back to index scans on alternative columns when the query terms are common enough).

According to the documentation, the default cost is 1 for built-in functions and 100 for others; is this true for the ts-stuff also? Can I query the database for the cost of the functions?

It somehow seems natural that comparing a 1-3 term tsquery to a 200+ term tsvector is orders of magnitude more expensive than simple operations. I somehow suspect that this is a bad thing to do if I have other gin indexes in the same database where the tsvector is much smaller, but then I can just use ts_match_qv for those queries, or add my own function that just raises the cost.

Thanks.

-- Jesper
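To the "can I query the database for the cost" question: the per-function planner cost is stored as procost in the pg_proc catalog (in units of cpu_operator_cost), so a sketch along these lines should answer it:

```sql
-- Look up the planner cost currently assigned to the tsearch match
-- functions:
SELECT proname, procost
FROM pg_proc
WHERE proname IN ('ts_match_vq', 'ts_match_qv');

-- ...and raise it, as discussed above:
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 500;
```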
[PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Hi.

I'm currently trying to figure out why the tsearch performance seems to vary a lot between different query plans. I have created a sample dataset that sort of resembles the data I have to work on. The script that builds the dataset is at http://krogh.cc/~jesper/build-test.pl, and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system: average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of randomized words, but all records contain commonterm, around 80% contain commonterm80, and so on:

my $rand = rand();
push @doc,commonterm if $commonpos == $j;
push @doc,commonterm80 if $commonpos == $j && $rand < 0.8;

Results are run multiple times after each other, so they should be reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual time=0.102..1792.215 rows=40082 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80');

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188 width=4) (actual time=17.445..197.356 rows=40082 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09 rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 204.201 ms
(5 rows)

Given that the seq-scan has to visit 50K rows to create the result and the bitmap heap scan only has to visit 40K (but search the index), we would expect the seq-scan to be at most 25% more expensive than the bitmap heap scan, e.g. less than 300ms.
-- Jesper
Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Craig Ringer wrote: On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote: Test system: average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1. The dataset consists of randomized words, but .. all records contain commonterm, around 80% contain commonterm80 and so on.. my $rand = rand(); push @doc, "commonterm" if $commonpos == $j; push @doc, "commonterm80" if $commonpos == $j && $rand < 0.8; You should probably re-generate your random value for each call rather than store it. Currently, every document with commonterm20 is guaranteed to also have commonterm40, commonterm60, etc, which probably isn't very realistic, and also makes doc size correlate with word rarity. I had that in the first version, but I wanted the guarantee that the commonterm60 set was indeed a subset of the commonterm80 set, so that's why it's structured like that. I know it's not realistic, but it gives measurable results since I know my queries will hit the same tuples. I fail to see how this should have any direct effect on query time? Given that the seq scan has to visit 50K rows to create the result and the bitmap heap scan only has to visit 40K (but search the index), we would expect the seq scan to be at most 25% more expensive than the bitmap heap scan, i.e. less than 300 ms. I suspect table bloat. Try VACUUMing your table and trying again. No bloat here: ftstest=# VACUUM FULL VERBOSE ftstest; INFO: vacuuming public.ftstest INFO: ftstest: found 0 removable, 5 nonremovable row versions in 1 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 1352 to 1652 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 6859832 bytes. 0 pages are or will become empty, including 0 at the end of the table. 536 pages containing 456072 free bytes are potential move destinations. CPU 0.03s/0.03u sec elapsed 0.06 sec.
INFO: index ftstest_id_key now contains 5 row versions in 139 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.13 sec. INFO: index ftstest_gin_idx now contains 5 row versions in 35792 pages DETAIL: 0 index pages have been deleted, 25022 are currently reusable. CPU 0.46s/0.11u sec elapsed 11.16 sec. INFO: ftstest: moved 0 row versions, truncated 1 to 1 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec. INFO: vacuuming pg_toast.pg_toast_908525 INFO: pg_toast_908525: found 0 removable, 10 nonremovable row versions in 16710 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 270 to 2032 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 3695712 bytes. 0 pages are or will become empty, including 0 at the end of the table. 5063 pages containing 1918692 free bytes are potential move destinations. CPU 0.38s/0.17u sec elapsed 2.64 sec. INFO: index pg_toast_908525_index now contains 10 row versions in 276 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.28 sec. INFO: pg_toast_908525: moved 0 row versions, truncated 16710 to 16710 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM ftstest=# In this sort of test it's often a good idea to TRUNCATE the table before populating it with a newly generated data set. That helps avoid any residual effects from table bloat etc from lingering between test runs. As you could see in the scripts, the table is dropped just before it's recreated and filled with data. Did you try to re-run the test? -- Jesper
Re: [PERFORM] bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Craig Ringer wrote: On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote: You should probably re-generate your random value for each call rather than store it. Currently, every document with commonterm20 is guaranteed to also have commonterm40, commonterm60, etc, which probably isn't very realistic, and also makes doc size correlate with word rarity. I had that in the first version, but I wanted the guarantee that the commonterm60 set was indeed a subset of the commonterm80 set, so that's why it's structured like that. I know it's not realistic, but it gives measurable results since I know my queries will hit the same tuples. I fail to see how this should have any direct effect on query time? Probably not, in truth, but with the statistics-based planner I'm occasionally surprised by what can happen. In this sort of test it's often a good idea to TRUNCATE the table before populating it with a newly generated data set. That helps avoid any residual effects from table bloat etc from lingering between test runs. As you could see in the scripts, the table is dropped just before it's recreated and filled with data. Did you try to re-run the test? No, I didn't. I thought it worth checking first whether bloat might be the cause, though I should've read the scripts to confirm you weren't already handling that possibility. Anyway, I've done a run to generate your data set and run a test. After executing the test statement twice (once with and once without enable_seqscan) to make sure all data is in cache and not being read from disk, when I run the tests here are my results: test= set enable_seqscan=on; SET test= explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm80'); Here you should search for commonterm, not commonterm80; commonterm will go into a seq-scan. You're not testing the same thing as I did.
Any chance your disk cache was cold on the first test run, so Pg was having to read the table from disk during the seqscan, and could just use shared_buffers when you repeated the test for the index scan? They were run repeatedly.
Re: [PERFORM] Queryplan within FTS/GIN index -search.
On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote: In effect, what you want are words that aren't searched (or stored) in the index, but are included in the tsvector (so the RECHECK still works). That sounds like it would solve your problem and it would reduce index size, improve update performance, etc. I don't know how difficult it would be to implement, but it sounds reasonable to me. That sounds like it could require an index rebuild if the distribution changes? My thought was that the common words could be declared to be common the same way stop words are. As long as words are only added to this list, it should be OK. That would be another plan to pursue, but the MCV data is already there. The problem with MCVs is that the index search can never eliminate documents because they don't contain a match, because a document might contain a match that was previously an MCV, but is no longer. No, it definitely has to go visit the index/table to confirm findings, but that's why I wrote Queryplan in the subject line: this is only about the strategy to pursue to obtain the results. A strategy of limiting the amount of results as early as possible (as PG usually does) would be what I'd expect, and the MCVs can help it guess on that. Similar finding, rewriting the query (now I took the extreme case and made rareterm a spelling error, so the result is 0):
ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts @@ to_tsquery('commonterm spellerror') limit 100; QUERY PLAN - Limit (cost=132.63..188.89 rows=28 width=739) (actual time=862.714..862.714 rows=0 loops=1) - Bitmap Heap Scan on ftsbody (cost=132.63..188.89 rows=28 width=739) (actual time=862.711..862.711 rows=0 loops=1) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('commonterm spellerror'::text)) - Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.62 rows=28 width=0) (actual time=862.702..862.702 rows=0 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('commonterm spellerror'::text)) Total runtime: 862.771 ms (6 rows) ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts @@ to_tsquery('commonterm') and ftsbody_body_fts @@ to_tsquery('spellerror') limit 100; QUERY PLAN Limit (cost=132.70..189.11 rows=28 width=739) (actual time=8.669..8.669 rows=0 loops=1) - Bitmap Heap Scan on ftsbody (cost=132.70..189.11 rows=28 width=739) (actual time=8.665..8.665 rows=0 loops=1) Recheck Cond: ((ftsbody_body_fts @@ to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@ to_tsquery('spellerror'::text))) - Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.70 rows=28 width=0) (actual time=8.658..8.658 rows=0 loops=1) Index Cond: ((ftsbody_body_fts @@ to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@ to_tsquery('spellerror'::text))) Total runtime: 8.724 ms (6 rows) So ANDing the terms as separate clauses gives 100x better performance. All queries are run on a hot disk, repeated 3-5 times, and the numbers are from the last run, so disk-read effects should be filtered away. Shouldn't it just do what it's already capable of doing? -- Jesper
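One plausible reading of the 100x difference (an illustrative sketch, not the actual GIN internals): if each term's postings can be probed separately and intersected starting from the rarest term, the work is bounded by the rare term's posting list, whereas evaluating the combined tsquery term-by-term in fixed order can mean walking the common term's postings. Posting-list sizes below are hypothetical:

```python
# Simulated posting lists: a term present in ~all docs vs. a typo in none.
common = set(range(1_000_000))   # 'commonterm' matches ~everything
rare = set()                     # 'spellerror' matches nothing

# Strategy A: walk the common term's postings, recheck each for the rare term.
work_a = sum(1 for doc in common)                             # 1M postings touched

# Strategy B: start from the rare term's postings and intersect.
work_b = len(rare) + sum(1 for doc in rare if doc in common)  # 0 postings touched

assert work_a == 1_000_000 and work_b == 0
```

Writing the query as two @@ clauses effectively hands the executor strategy B, which matches the 862 ms vs 8.7 ms plans above.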
Re: [PERFORM] Queryplan within FTS/GIN index -search.
jes...@krogh.cc wrote: So ANDing the terms as separate clauses gives 100x better performance. All queries are run on a hot disk, repeated 3-5 times, and the numbers are from the last run, so disk-read effects should be filtered away. Shouldn't it just do what it's already capable of doing? I'm guessing to_tsquery(...) will produce a tree of search terms (since it allows for quite complex expressions). Presumably there's a standard order it gets processed in too, so it should be possible to generate a more or less efficient ordering. That structure isn't exposed to the planner though, so it doesn't benefit from any re-ordering the planner would normally do for normal (exposed) AND/OR clauses. Now, to_tsquery() can't re-order the search terms because it doesn't know what column it's being compared against. In fact, it might not be a simple column at all. I can't follow this logic based on the explain output, but I may have misunderstood something. The only difference between these two query plans is that we have an additional OR'd term in the to_tsquery(). What we see is that the query planner indeed has knowledge about changes in the row estimates based on changes in the query given to to_tsquery(). My guess is that this is because to_tsquery actually parses the query and gives the estimates; but how can to_tsquery give estimates without having access to the statistics for the column?
ftstest=# explain select id from ftsbody where ftsbody_body_fts @@ to_tsquery('reallyrare'); QUERY PLAN - Bitmap Heap Scan on ftsbody (cost=132.64..190.91 rows=29 width=4) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text)) - Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..132.63 rows=29 width=0) Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text)) (4 rows) ftstest=# explain select id from ftsbody where ftsbody_body_fts @@ to_tsquery('reallyrare | morerare'); QUERY PLAN - Bitmap Heap Scan on ftsbody (cost=164.86..279.26 rows=57 width=4) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare | morerare'::text)) - Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..164.84 rows=57 width=0) Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare | morerare'::text)) (4 rows) ftstest=# explain select id from ftsbody where ftsbody_body_fts @@ to_tsquery('reallyrare | reallycommon'); QUERY PLAN -- Seq Scan on ftsbody (cost=0.00..1023249.39 rows=5509293 width=4) Filter: (ftsbody_body_fts @@ to_tsquery('reallyrare | reallycommon'::text)) (2 rows) 2. A variant to_tsquery_with_sorting() which would take the column-name or something and look up the stats to work against. Does the above not seem like it's there already? (Sorry.. digging into the C code would, from my point of view, set me a couple of weeks back, so I have trouble getting closer to the answer than interpreting the output and guessing the rest.) -- Jesper
[PERFORM] Calculating selectivity for the query-planner on ts_vector colums.
Hi. It seems to me that the row estimate on a tsvector search is a bit on the low side for terms that are not in the MCV-list in pg_stats: ftstest=# explain select id from ftstest where ftstest_body_fts @@ to_tsquery('nonexistingterm') order by id limit 10; QUERY PLAN - Limit (cost=221.93..221.95 rows=10 width=4) - Sort (cost=221.93..222.01 rows=33 width=4) Sort Key: id - Bitmap Heap Scan on ftstest (cost=154.91..221.22 rows=33 width=4) Recheck Cond: (ftstest_body_fts @@ to_tsquery('nonexistingterm'::text)) - Bitmap Index Scan on ftstest_tfs_idx (cost=0.00..154.90 rows=33 width=0) Index Cond: (ftstest_body_fts @@ to_tsquery('nonexistingterm'::text)) (7 rows) Then I have been reading http://www.postgresql.org/docs/8.4/static/row-estimation-examples.html and trying to reproduce the selectivity number for this query: selectivity = (1 - sum(mvf))/(num_distinct - num_mcv) num_distinct is around 10m. ftstest=# SELECT attname,array_dims(most_common_vals),array_dims(most_common_freqs) FROM pg_stats WHERE tablename='ftstest' AND attname='ftstest_body_fts'; attname | array_dims | array_dims --++ ftstest_body_fts | [1:2088] | [1:2090] (1 row) ftstest=# select tablename,attname,freq from (select tablename,attname, sum(freq) as freq from (SELECT tablename,attname,unnest(most_common_freqs) as freq FROM pg_stats) as foo group by tablename,attname) as foo2 where freq > 1; tablename | attname | freq ---+--+- ftstest | ftstest_body_fts | 120.967 (1 row) then the selectivity is (1-120.967)/(1000 - 2088) = -.1199920543409463 which seems.. well, wrong. The algorithm assumes that if a record matches one of the MCVs then it is not matching a rare term. The above algorithm doesn't give me the 33 rows above, so can anyone briefly describe how this algorithm changes when tsvectors are involved? Thanks.
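The scalar-equality formula cannot apply here, and the figures quoted above show why: for a tsvector column the MCV frequencies are per-lexeme containment probabilities, and since every document contains many lexemes they can sum far past 1.0, driving the numerator negative. A quick check (Python, using the numbers quoted above; num_distinct is the ~10m figure mentioned):

```python
# Scalar non-MCV selectivity: (1 - sum(mvf)) / (num_distinct - num_mcv)
sum_mcv_freqs = 120.967    # sum of most_common_freqs from pg_stats above
num_distinct = 10_000_000  # ~10m distinct lexemes (figure from the mail)
num_mcv = 2088             # entries in the MCV list above

selectivity = (1 - sum_mcv_freqs) / (num_distinct - num_mcv)
# Negative "selectivity" -- nonsense, because lexeme frequencies don't
# partition the rows the way scalar MCV frequencies do.
assert selectivity < 0
```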
-- Jesper
Re: [PERFORM] Calculating selectivity for the query-planner on ts_vector colums.
Tom Lane wrote: Jesper Krogh jes...@krogh.cc writes: It seems to me that the row estimate on a tsvector search is a bit on the low side for terms that are not in the MCV-list in pg_stats: tsvector has its own selectivity estimator that's not like plain scalar equality. Look into src/backend/tsearch/ts_selfuncs.c if you want to see the rules. Thanks. So it is least_common_frequency / 2, which also gives 33 in my situation. -- Jesper
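So the estimate for a term absent from the MCV list is driven by the rarest entry that is in it. A sketch of that rule with assumed figures (the exact minimum frequency lives in pg_stats; ~6.6e-6 is a hypothetical value that reproduces the 33-row estimate on a 10m-row table):

```python
reltuples = 10_000_000   # table size from the thread
min_mcv_freq = 6.6e-6    # hypothetical smallest most_common_freqs entry

# ts_selfuncs rule: a lexeme not in the MCV list is assumed half as
# frequent as the least common lexeme that is in the list.
selectivity = min_mcv_freq / 2
rows_estimate = round(selectivity * reltuples)
assert rows_estimate == 33
```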
Re: [PERFORM] Full text search - query plan? PG 8.4.1
Tom Lane wrote: Jesper Krogh jes...@krogh.cc writes: Tom Lane wrote: ... There's something strange about your tsvector index. Maybe it's really huge because the documents are huge? huge is a relative term, but length(to_tsvector(body)) is about 200 for each document. Is that huge? It's bigger than the toy example I was trying, but not *that* much bigger. I think maybe your index is bloated. Try dropping and recreating it and see if the estimates change any. I'm a bit reluctant to drop and re-create it. It'll take a couple of days to regenerate, so this should hopefully not be a common situation for the system. I have set the statistics target to 1000 for the tsvector; the documentation didn't mention any serious drawbacks to doing that, and since then I haven't seen row estimates that are orders of magnitude off. It is built from scratch using inserts all the way to around 10m rows now; should that result in index bloat? Can I inspect the size of the bloat without rebuilding (or a similarly locking operation)?
The query still has a wrong tipping point between the two query plans: ftstest=# explain analyze select body from ftstest where ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100; QUERY PLAN -- Limit (cost=0.00..7357.77 rows=100 width=738) (actual time=3978.974..8595.086 rows=100 loops=1) - Index Scan using ftstest_id_pri_idx on ftstest (cost=0.00..1436458.05 rows=19523 width=738) (actual time=3978.971..8594.932 rows=100 loops=1) Filter: (ftstest_body_fts @@ to_tsquery('testterm'::text)) Total runtime: 8595.222 ms (4 rows) ftstest=# set enable_indexscan=off; SET ftstest=# explain analyze select body from ftstest where ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100; QUERY PLAN Limit (cost=59959.61..59959.86 rows=100 width=738) (actual time=338.832..339.055 rows=100 loops=1) - Sort (cost=59959.61..60008.41 rows=19523 width=738) (actual time=338.828..338.908 rows=100 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 32kB - Bitmap Heap Scan on ftstest (cost=22891.18..59213.45 rows=19523 width=738) (actual time=5.097..316.780 rows=19444 loops=1) Recheck Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text)) - Bitmap Index Scan on ftstest_tfs_idx (cost=0.00..22886.30 rows=19523 width=0) (actual time=4.259..4.259 rows=20004 loops=1) Index Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text)) Total runtime: 339.201 ms (9 rows) So for getting 100 rows where the term exists in 19.444 of 10.000.000 documents, it chooses the index scan, where it (given random distribution of the documents) should have to scan 100*(10.000.000/19.444) = 51.429 documents. So it somehow believes that the cost for the bitmap index scan is higher than it actually is, or that the cost for the index scan is lower than it actually is. Is it possible to manually set the cost for the @@ operator?
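The expected number of heap fetches for the ordered index scan follows directly from the match density (a quick check of the figure above):

```python
total_docs = 10_000_000
matching = 19_444
limit = 100

# With matches spread uniformly over the id order, the index scan has to
# fetch roughly one matching doc per total/matching rows visited:
expected_fetches = int(limit * total_docs / matching)
assert expected_fetches == 51_429
# versus ~19,444 heap visits plus index work for the bitmap plan --
# yet the planner priced the index scan as the cheaper path.
```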
It seems natural that the cost of matching a tsvector (kept in EXTENDED storage) against a tsquery should be much higher than that of = on a plain-storage integer, since it is a much heavier operation. I tried searching the docs for operator costs, but only found the overall base values in the configuration file. -- Jesper
[PERFORM] Queryplan within FTS/GIN index -search.
@@ to_tsquery('TERM1 TERM2 TERM3 TERM4 TERM5'::text)) - Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..100.42 rows=1 width=0) (actual time=1508.998..1508.998 rows=1 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 TERM2 TERM3 TERM4 TERM5'::text)) Total runtime: 1509.109 ms (9 rows) A perhaps more readable version can be found at http://krogh.cc/~jesper/test.out Can this be optimized? (I cannot really prevent users from typing in terms that are common.) -- Jesper
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Jeff Davis wrote: On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote: I would somehow expect the index search to take advantage of the MCV information in the statistics and sort of translate it into a search plus post-filtering (as PG's query planner usually does at the SQL level). MCVs are full values that are found in columns or indexes -- you aren't likely to have two entire documents that are exactly equal, so MCVs are useless in your example. According to my testing, this is not the case, and if it were the case, the query planner most likely wouldn't be able to plan these queries correctly: select id from ftstable where tsvectorcol @@ to_tsquery('commonterm') order by id limit 10; (which becomes an index scan on id) and select id from ftstable where tsvectorcol @@ to_tsquery('rareterm'); (which becomes a bitmap index scan on the tsvectorcol and a subsequent sort). It is indeed per-term information from the statistics that enables this. I believe that stop words are a more common way of accomplishing what you want to do, but they are slightly more limited: they won't be checked at any level, and so are best used for truly common words like "and". From your example, I assume that you still want the word checked, but it's not selective enough to be usefully checked by the index. The terms are really common non-stop-words. In effect, what you want are words that aren't searched (or stored) in the index, but are included in the tsvector (so the RECHECK still works). That sounds like it would solve your problem and it would reduce index size, improve update performance, etc. I don't know how difficult it would be to implement, but it sounds reasonable to me. The only disadvantage is that it's more metadata to manage -- all of the existing data like dictionaries and stop words, plus these new common words. Also, it would mean that almost every match requires RECHECK. It would be interesting to know how common a word needs to be before it's better to leave it out of the index.
That sounds like it could require an index rebuild if the distribution changes? That would be another plan to pursue, but the MCV data is already there: ftstest=# select * from ftsbody; id | body | ftsbody_body_fts +--+- 1 | the cat is not a rat uniqueterm1 uniqueterm2 | 'cat':2 'rat':6 'uniqueterm1':7 'uniqueterm2':8 2 | elephant uniqueterm1 uniqueterm2 | 'eleph':1 'uniqueterm1':2 'uniqueterm2':3 3 | cannon uniqueterm1 uniqueterm2 | 'cannon':1 'uniqueterm1':2 'uniqueterm2':3 (3 rows) ftstest=# select most_common_vals, most_common_freqs from pg_stats where tablename = 'ftsbody' and attname = 'ftsbody_body_fts'; most_common_vals | most_common_freqs ---+--- {uniqueterm1,uniqueterm2} | {1,1,1,1} (1 row) And the query planner uses this information. -- Jesper
[PERFORM] Random penalties on GIN index updates?
Hi (running PG 8.4.1). As far as I have gotten in my test of PG Full Text Search, I have over 6m documents indexed so far and the index has grown to 37GB. The system didn't do any autovacuums in the process, but I manually vacuumed a few times and that stopped growth for a short period of time. table_name | index_name| times_used | table_size | index_size | num_writes | definition +-+++++-- ftstest| body_tfs_idx |171 | 5071 MB| 37 GB | 6122086 | CREATE INDEX ftstest_tfs_idx ON ftstest USING gin (ftstest_body_fts) (1 row) This is sort of what I'd expect; it is no scarier than the Xapian index it is being compared with. Search speed seems excellent. But I feel I'm getting a significant drop-off in indexing speed as time goes by, though I don't have numbers to confirm this. If I understand the technicalities correctly, INSERTs/UPDATEs to the index will be accumulated in maintenance_work_mem, and the user unlucky enough to fill it up will pay the penalty of merging all the changes into the index? I currently have maintenance_work_mem set to 128MB, and according to pg_stat_activity I currently have an insert that has been sitting for over 1 hour. If I strace the postgres process id, it is reading and writing a lot on the filesystem and imposing an IO-wait load of 1 cpu. Can I do something to prevent this from happening? Is it by design? -- Jesper
Re: [PERFORM] Random penalties on GIN index updates?
Tom Lane wrote: jes...@krogh.cc writes: If I understand the technicalities correctly, INSERTs/UPDATEs to the index will be accumulated in maintenance_work_mem, and the user unlucky enough to fill it up will pay the penalty of merging all the changes into the index? You can turn off the fastupdate index parameter to disable that, but I think there may be a penalty in index bloat as well as insertion speed. It would be better to use a more conservative work_mem (work_mem, not maintenance_work_mem, is what limits the amount of stuff accumulated during normal inserts). Ok, I read the manual about that. Seems worth testing. What I'm seeing is stuff like this: 2009-10-21T16:32:21 2009-10-21T16:32:25 2009-10-21T16:32:30 2009-10-21T16:32:35 2009-10-21T17:10:50 2009-10-21T17:10:59 2009-10-21T17:11:09 ... then it went on steadily for another 180.000 documents. Each row is a printout from the application doing INSERTs; it prints the time for each 1000 rows it gets through. It is the 38 minutes in the middle I'm a bit worried about. work_mem is set to 512MB; that may translate into 180.000 documents on my system? What I seem to be missing is a way to make sure some background application is the one that takes the penalty, so a random user doing a single insert won't get stuck. Is that doable? It also seems to lock out other inserts while in this state. -- Jesper
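The bursty timings above are exactly what a buffered-update design produces: appends to the pending list are cheap until the buffer fills, and the one caller that fills it pays for the whole merge. A toy model of that behavior (all costs and the buffer size are arbitrary illustration numbers, not GIN's actual accounting):

```python
# Toy model of a GIN fastupdate-style pending list (arbitrary cost units).
BUFFER_LIMIT = 1000       # entries the pending list can hold
CHEAP, FLUSH = 1, 50_000  # append cost vs. cost of merging into the index

pending = 0
costs = []
for _ in range(2500):     # 2500 single-row inserts
    pending += 1
    if pending >= BUFFER_LIMIT:
        costs.append(CHEAP + FLUSH)  # the unlucky inserter merges everything
        pending = 0
    else:
        costs.append(CHEAP)

# Most inserts are cheap; a few random callers absorb the entire penalty.
assert costs.count(CHEAP + FLUSH) == 2
assert max(costs) / min(costs) > 10_000
```

In real PostgreSQL the accumulation is bounded by work_mem during normal inserts (hence Tom's advice to lower it), and turning fastupdate off trades the rare stall for slower individual inserts.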
Re: [PERFORM] Random penalties on GIN index updates?
Robert Haas wrote: On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Jesper Krogh jes...@krogh.cc writes: What I seem to be missing is a way to make sure some background application is the one that takes the penalty, so a random user doing a single insert won't get stuck. Is that doable? You could force a vacuum every so often, but I don't think that will help the locking situation. You really need to back off work_mem --- 512MB is probably not a sane global value for that anyway. Yeah, it's hard to imagine a system where that doesn't threaten all kinds of other bad results. I bet setting this to 4MB will make this problem largely go away. Arguably we shouldn't be using work_mem to control this particular behavior, but... I came from Xapian, where you can only have one writer process, but batching up several GB's worth of changes improved indexing performance dramatically. Lowering work_mem to 16MB gives batches of 11.000 documents and stalls of between 45 and 90 s, i.e. ~33 docs/s. -- Jesper
[PERFORM] Full text search - query plan? PG 8.4.1
Hi. I'm currently testing out PostgreSQL's Full Text Search capabilities. We're currently using Xapian; it has some nice features and some drawbacks (sorting), so it is especially this area I'm investigating. I've loaded the database with 50K documents, and the table definition is: ftstest=# \d uniprot Table public.uniprot Column | Type | Modifiers --+--+-- id | integer | not null default nextval('textbody_id_seq'::regclass) body | text | not null default ''::text textbody_body_fts | tsvector | accession_number | text | not null default ''::text Indexes: accno_unique_idx UNIQUE, btree (accession_number) textbody_tfs_idx gin (textbody_body_fts) Triggers: tsvectorupdate BEFORE INSERT OR UPDATE ON textbody FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('textbody_body_fts', 'pg_catalog.english', 'body') commonterm matches 37K of the 50K documents (the majority), but the query plan is odd in my eyes. * Why does it mis-guess the cost of a Seq Scan on textbody so much? * Why doesn't it use the index on id to fetch the 10 records?
ftstest=# ANALYZE textbody; ANALYZE ftstest=# explain analyze select body from textbody where textbody_body_fts @@ to_tsquery('commonterm') order by id limit 10 offset 0 QUERY PLAN -- Limit (cost=2841.08..2841.11 rows=10 width=5) (actual time=48031.563..48031.568 rows=10 loops=1) - Sort (cost=2841.08..2933.01 rows=36771 width=5) (actual time=48031.561..48031.564 rows=10 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 31kB - Seq Scan on textbody (cost=0.00..2046.47 rows=36771 width=5) (actual time=100.107..47966.590 rows=37133 loops=1) Filter: (textbody_body_fts @@ to_tsquery('commonterm'::text)) Total runtime: 48031.612 ms (7 rows) This query-plan doesn't answer the questions above, but it does indeed speed it up significantly (by heading into a Bitmap Index Scan instead of a Seq Scan) ftstest=# set enable_seqscan=off; SET ftstest=# explain analyze select body from textbody where textbody_body_fts @@ to_tsquery('commonterm') order by id limit 10 offset 0 QUERY PLAN --- Limit (cost=269942.41..269942.43 rows=10 width=5) (actual time=47.567..47.572 rows=10 loops=1) - Sort (cost=269942.41..270034.34 rows=36771 width=5) (actual time=47.565..47.567 rows=10 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 31kB - Bitmap Heap Scan on textbody (cost=267377.23..269147.80 rows=36771 width=5) (actual time=15.763..30.576 rows=37133 loops=1) Recheck Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) - Bitmap Index Scan on textbody_tfs_idx (cost=0.00..267368.04 rows=36771 width=0) (actual time=15.419..15.419 rows=37134 loops=1) Index Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) Total runtime: 47.634 ms (9 rows) To me it seems like the query planner could do a better job? On rare terms everything seems to work excellent. 
N.B.: looks a lot like this: http://archives.postgresql.org/pgsql-performance/2009-07/msg00190.php -- Jesper
Re: [PERFORM] Full text search - query plan? PG 8.4.1
Tom Lane wrote: Jesper Krogh jes...@krogh.cc writes: commonterm matches 37K of the 50K documents (the majority), but the query plan is odd in my eyes. * Why does it mis-guess the cost of a Seq Scan on textbody so much? The cost looks about right to me. The cost units are not milliseconds. * Why doesn't it use the index on id to fetch the 10 records? You haven't *got* an index on id, according to the \d output. Thanks (/me bangs my head against the table). I somehow assumed that id SERIAL automatically created it for me -- so much so that I didn't even look to confirm. The only part of your results that looks odd to me is the very high cost estimate for the bitmapscan: - Bitmap Heap Scan on textbody (cost=267377.23..269147.80 rows=36771 width=5) (actual time=15.763..30.576 rows=37133 loops=1) Recheck Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) - Bitmap Index Scan on textbody_tfs_idx (cost=0.00..267368.04 rows=36771 width=0) (actual time=15.419..15.419 rows=37134 loops=1) Index Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) When I try this with a 64K-row table having 'commonterm' in half of the rows, what I get is estimates of 1530 cost units for the seqscan and 1405 for the bitmapscan (so it prefers the latter). It will switch over to using an index on id if I add one, but that's not the point at the moment. There's something strange about your tsvector index. Maybe it's really huge because the documents are huge? huge is a relative term, but length(to_tsvector(body)) is about 200 for each document. Is that huge? I can postprocess them a bit to get it down and will eventually do that before going to production. Thanks a lot. -- Jesper
[PERFORM] Speed while running large transactions.
Hi. I have a transaction that has been running on the database for around 20 hours and still isn't done. But during the last hours it has come to the point where it really hurts the performance of other queries. Given pg_stat_activity output there seem to be no locks interfering, but the overall cpu-usage of all queries continues to rise. iowait numbers are also very low. What can I do to make the system handle other queries better? PG: 8.2 -- Jesper
Re: [PERFORM] Speed while running large transactions.
On Thu, Sep 24, 2009 at 2:27 AM, jes...@krogh.cc wrote: Hi. I have a transaction running at the database for around 20 hours .. still isn't done. But during the last hours it has come to the point where it really hurts performance of other queries. What is your transaction doing during this time? It is a massive DB-update affecting probably 1.000.000 records, with a lot of roundtrips to the update-application during that. Given pg_stat_activity output there seems to be no locks interfering but the overall cpu-usage of all queries continues to rise. iowait numbers are also very low. What does select count(*) from pg_stat_activity where waiting; say? There is no particular query. No indication of locks; it just seems that having the transaction open (with a lot of changes held in it) has an impact on the general performance, even without touching the same records. What can I do to make the system handle other queries better? Really kinda depends on what your transaction is doing: inserts, updates, deletes.. -- Jesper
Re: [PERFORM] Speed while running large transactions.
On Thu, Sep 24, 2009 at 9:27 AM, jes...@krogh.cc wrote: Hi. I have a transaction running at the database for around 20 hours .. still isn't done. But during the last hours it has come to the point where it really hurts performance of other queries. Given pg_stat_activity output there seems to be no locks interfering but the overall cpu-usage of all queries continues to rise. iowait numbers are also very low. What can I do to make the system handle other queries better? Show us explain from the query(s). Use select * from pg_stat_activity to find out the state the query is in, and perhaps which one of the queries it really is. I'm actively monitoring pg_stat_activity for potential problems, but the thread is spending most of the time on the application side. The transaction is holding a large set of inserts, updates and deletes for the DB. -- Jesper
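One generic way out of a 20-hour transaction, assuming the application can tolerate the work becoming visible piecewise, is to commit in batches so each transaction stays short. A hypothetical sketch (table, column and batch size invented for illustration):

```sql
-- Process the backlog in slices, committing between slices, instead of
-- holding one transaction open for the whole run:
BEGIN;
UPDATE big_table
   SET processed = true
 WHERE id IN (SELECT id FROM big_table
               WHERE NOT processed
               LIMIT 10000);
COMMIT;
-- the application repeats this until UPDATE reports 0 rows
```

Short transactions also let routine maintenance reclaim dead row versions while the job runs, rather than only after the final commit.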
Re: [PERFORM] Using IOZone to simulate DB access patterns
henk de wit wrote: I've been using Bonnie++ for ages to do filesystem testing of new DB servers. But Josh Drake recently turned me on to IOZone. Perhaps a little off-topic here, but I'm assuming you are using Linux to test your DB server (since you mention Bonnie++). But it seems to me that IOZone only has a win32 client. How did you actually run IOZone on Linux?

$ apt-cache search iozone
iozone3 - Filesystem and Disk Benchmarking Tool

-- Jesper
Re: [PERFORM] Backup strategies
Ivan Voras wrote: Warning: DO NOT do on-the-fly binary backups without snapshots. Archiving the database directory with tar on a regular file system, while the server is running, will result in an archive that most likely won't work when restored. Even if you do a pg_start_backup/pg_stop_backup as specified here: http://www.postgresql.org/docs/8.3/interactive/continuous-archiving.html => Making a Base Backup? It worked when I tested it, but I may just have been darn lucky. -- Jesper
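For reference, the documented procedure Jesper points at looks roughly like this (the label and data-directory path are made up; the tar step runs from a shell while the session that called pg_start_backup stays available, and WAL archiving must already be configured):

```sql
SELECT pg_start_backup('nightly');  -- forces a checkpoint and marks the backup start

-- from a shell: tar -czf /backup/base.tar.gz /var/lib/pgsql/data

SELECT pg_stop_backup();            -- ends the backup; the WAL segments spanning
                                    -- the interval are what make the copy consistent
```

This is the distinction behind the warning: a bare tar of a running data directory is inconsistent, but the same tar bracketed by these calls plus the archived WAL can be recovered correctly.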
[PERFORM] Message queue table..
Hi. I have this message queue table.. currently with 8m+ records. Picking the top priority messages seems to take quite long.. it is just a matter of searching the index (just as explain analyze tells me it does). Can anyone digest further optimizations out of this output? (All records have funcid=4)

# explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce FROM workqueue.job WHERE (job.funcid = 4) AND (job.run_after <= 1208442668) AND (job.grabbed_until <= 1208442668) AND (job.coalesce = 'Efam') ORDER BY funcid, priority ASC LIMIT 1;

QUERY PLAN
 Limit (cost=0.00..0.09 rows=1 width=106) (actual time=245.273..245.274 rows=1 loops=1)
   -> Index Scan using workqueue_job_funcid_priority_idx on job (cost=0.00..695291.80 rows=8049405 width=106) (actual time=245.268..245.268 rows=1 loops=1)
         Index Cond: (funcid = 4)
         Filter: ((run_after <= 1208442668) AND (grabbed_until <= 1208442668) AND (coalesce = 'Efam'::text))
 Total runtime: 245.330 ms
(5 rows)

-- Jesper
Re: [PERFORM] Message queue table..
Craig Ringer wrote: Jesper Krogh wrote: Hi. I have this message queue table.. currently with 8m+ records. Picking the top priority messages seems to take quite long.. it is just a matter of searching the index (just as explain analyze tells me it does). Can anyone digest further optimizations out of this output? (All records have funcid=4) You mean all records of interest, right, not all records in the table? Actually all the records.. since all the other virtual queues currently are empty. What indexes do you have in place? What's the schema? Can you post a \d tablename from psql? # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce FROM workqueue.job WHERE (job.funcid = 4) AND (job.run_after <= 1208442668) AND (job.grabbed_until <= 1208442668) AND (job.coalesce = 'Efam') ORDER BY funcid, priority ASC LIMIT 1 I found that removing the funcid from the order by made it use a better index: (priority, run_after, grabbed_until). That probably makes sense, since the funcid doesn't add any value in the index at all. Thanks for leading me back on track. Jesper -- Jesper
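The conclusion can be made concrete. Because funcid is constrained to a single constant, it adds nothing to the sort, and dropping it lets the planner use an index led by priority. A sketch based on the thread (the index name is invented; column names are taken from the query above):

```sql
-- Index matching the ORDER BY, so the first qualifying row is also the answer:
CREATE INDEX job_priority_idx
    ON workqueue.job (priority, run_after, grabbed_until);

SELECT jobid, funcid, arg, uniqkey, insert_time,
       run_after, grabbed_until, priority, "coalesce"
  FROM workqueue.job
 WHERE funcid = 4
   AND run_after <= 1208442668
   AND grabbed_until <= 1208442668
   AND "coalesce" = 'Efam'
 ORDER BY priority ASC   -- funcid removed: it is constant for every row anyway
 LIMIT 1;
```

The original index forced a walk through millions of funcid=4 entries in (funcid, priority) order while the filter rejected almost all of them; the reordered index lets LIMIT 1 stop almost immediately.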
[PERFORM] Slow query or just Bad hardware?
[]))
   -> Bitmap Index Scan on reference_seq_idx (cost=0.00..385.58 rows=11606 width=0) (actual time=422.691..422.691 rows=450 loops=1)
        Index Cond: (sequence_id = ANY ('{34284,41503,42274,42285,76847,78204,104721,126279,274770,274790,274809,305346,307383,307411,309691,311362,344930,352530,371033,371058,507790,517521,517537,517546,526883,558976,4894317,4976383,1676203,4700800,688803,5028679,5028694,5028696,5028684,5028698,5028701,5028676,5028682,5028686,5028692,5028689,3048683,5305427,5305426,4970187,4970216,4970181,4970208,4970196,4970226,4970232,4970201,4970191,4970222,4350307,4873618,1806537,1817367,1817432,4684270,4981822,3172776,4894299,4894304,4700798,1120990,4981817,4831109,4831036,4831068,4831057,4831105,4831038,4831044,4831081,4831063,4831051,4831086,4831049,4831071,4831075,4831114,4831093,2635142,4660208,4660199,4912338,4660150,4662011,5307782,4894286,4894292,4894296,4894309,4894313,1428388,1932290,5306082,2010148,3979647,4382006,4220374,1880794,1526588,774838,1377100,969316,1796618,1121046,4662009,963535,5302610,1121105,688700,688743,688836,688763,688788,1056859,2386006,2386015,2386023,4265832,4231262,4265743,5302612,1121056,1121090,1121074,688659,688650}'::integer[]))
 -> Index Scan using ecn_ref_idx on number numbers (cost=0.00..2.74 rows=1 width=108) (actual time=1.794..1.795 rows=0 loops=389)
      Index Cond: (numbers.reference_id = me.id)
 Total runtime: 2287.701 ms
(10 rows)

.. subsequent run: 32.367 ms. On a X4600 server with 32GB of ram and an EqualLogic iSCSI SAN attached. Jesper -- Jesper Krogh
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Scott Marlowe wrote: On Fri, Mar 14, 2008 at 12:17 AM, Jesper Krogh [EMAIL PROTECTED] wrote: Scott Marlowe wrote: On Thu, Mar 13, 2008 at 3:09 PM, justin [EMAIL PROTECTED] wrote: I chose to use ext3 on these partition You should really consider another file system. ext3 has two flaws that mean I can't really use it properly. A 2TB file system size limit (at least on the servers I've tested) and it locks the whole file system while deleting large files, which can take several seconds and stop ANYTHING from happening during that time. This means that dropping or truncating large tables in the middle of the day could halt your database for seconds at a time. This one misfeature means that ext2/3 are unsuitable for running under a database. I cannot acknowledge or deny the last one, but the first one is not true. I have several volumes in the 4TB+ range on ext3 performing nicely. I can test the large file stuff, but how large? .. several GB is not a problem here. Is this on a 64 bit or 32 bit machine? We had the problem with a 32 bit linux box (not sure what flavor) just a few months ago. I would not create a filesystem on a partition of 2+TB. It is on a 64 bit machine.. but ext3 doesn't have anything specific in it as far as I know. I have mounted filesystems created on 32 bit on 64 bit and the other way around. The filesystems are around years old. http://en.wikipedia.org/wiki/Ext3 => The limit seems to be 16TB currently (it might get down to something lower if you choose a small blocksize). -- Jesper
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Scott Marlowe wrote: On Thu, Mar 13, 2008 at 3:09 PM, justin [EMAIL PROTECTED] wrote: I chose to use ext3 on these partition You should really consider another file system. ext3 has two flaws that mean I can't really use it properly. A 2TB file system size limit (at least on the servers I've tested) and it locks the whole file system while deleting large files, which can take several seconds and stop ANYTHING from happening during that time. This means that dropping or truncating large tables in the middle of the day could halt your database for seconds at a time. This one misfeature means that ext2/3 are unsuitable for running under a database. I cannot acknowledge or deny the last one, but the first one is not true. I have several volumes in the 4TB+ range on ext3 performing nicely. I can test the large file stuff, but how large? .. several GB is not a problem here. Jesper -- Jesper
Re: [PERFORM] Restore performance?
Well, your pg_dump command lost your BLOBs since the plain text format doesn't support them. Well, no.. they are stored as BYTEA not Large Objects.. They are encoded in ASCII in the pg_dump output. But once you use the -Fc format on your dump and enable blob backups, you can speed up reloads by increasing your checkpoint segments to a big number like 256 and the checkpoint timeout to something like 10 minutes. All other normal tuning parameters should be what you plan to use for your normal operations, too. Thanks. Jesper -- Jesper Krogh ---(end of broadcast)--- TIP 6: explain analyze is your friend
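The advice above, written out as a postgresql.conf fragment (8.1-era parameter names; the exact values are the ones suggested in the reply and should be tuned for the hardware):

```
checkpoint_segments = 256   # allow much more WAL between checkpoints during the restore
checkpoint_timeout = 600    # 10 minutes (seconds in 8.1) between forced checkpoints
```

Fewer, larger checkpoints avoid repeatedly flushing and re-dirtying the same pages while the bulk load runs; the settings can be reverted once the restore finishes.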
[PERFORM] Restore performance?
Hi I'm currently upgrading a PostgreSQL 7.3.2 database to a 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? The database contains quite a lot of BLOBs, thus the size. Jesper -- ./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED]
[PERFORM] Dump restore performance 7.3 -> 8.1
Hi I'm currently upgrading a PostgreSQL 7.3.2 database to a 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? Neither disk-io (viewed using vmstat 1) nor cpu (viewed using top) seems to be the bottleneck. The database contains quite a lot of BLOBs, thus the size. Jesper -- Jesper Krogh
Re: [PERFORM] Restore performance?
If they both took the same amount of time, then you are almost certainly bottlenecked on gzip. Try a faster CPU or use gzip --fast. gzip does not seem to be the bottleneck; on restore, psql is the no. 1 consumer of cpu-time. Jesper Sorry for the double post. -- Jesper Krogh
[PERFORM] Storing binary data.
Hi. Please be a bit patient.. I'm quite new to PostgreSQL. I'd like some advice on storing binary data in the database. Currently I have about 300.000 bytea records of around 320.000 bytes each in the database. It works quite well, but I have a feeling that it actually is slowing the database down on queries only related to the surrounding attributes. The common solution, I guess, would be to store them in the filesystem instead, but I like to have them just in the database: it makes for a nice clean database and application design, and if I can get PostgreSQL to not cache them then it should be exactly as fast, I assume. The binary data is not a part of most queries in the database; only a few, explicitly written to fetch them, and they are not accessed very often. What do people normally do? Thanks, Jesper -- ./Jesper Krogh, [EMAIL PROTECTED] Jabber ID: [EMAIL PROTECTED]
Re: [PERFORM] Storing binary data.
In gmane.comp.db.postgresql.performance, Shridhar Daithankar wrote: On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote: The common solution, I guess, would be to store them in the filesystem instead, but I like to have them just in the database: it is a nice clean database and application design, and if I can get PostgreSQL to not cache them then it should be exactly as fast, I assume. You can normalize them so that a table contains an id and a bytea column only. The main table will contain all the other attributes and a mapping id. That way you will have only the main table cached. You don't have to go to the filesystem for this, I hope. Further benchmarking: I tried to create a table with the exact same attributes but without the binary attribute. It didn't change anything, so my idea that it was the binary stuff that slowed it down was wrong. I have a timestamp column in the table that I sort on. Data is spread over the last 4 years and I select based on a timerange; I cannot make the query-planner use the index on the timestamp by itself, but if I set enable_seqscan = false the query time drops to about a third (from 1.200 msec to about 400 msec). I cannot figure out why the query-planner chooses wrongly. NB: It's postgresql 7.4.3 -- ./Jesper Krogh, [EMAIL PROTECTED] Jabber ID: [EMAIL PROTECTED]
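The normalization Shridhar describes would look something like this (hypothetical table and column names; the original schema is not shown in the thread):

```sql
-- Metadata table: small rows that stay cached, including the timestamp
-- column the queries sort and range-filter on.
CREATE TABLE item (
    id      serial PRIMARY KEY,
    created timestamp NOT NULL,
    name    text
);

-- Payload table: one bytea per item, touched only by the few queries
-- that actually fetch the binary data.
CREATE TABLE item_blob (
    item_id integer PRIMARY KEY REFERENCES item(id),
    data    bytea
);
```

Scans and sorts over the attributes then never read the pages holding the binary payload, which is the caching benefit being claimed; whether it helps here depends on whether the bytea really was the problem, which Jesper's follow-up benchmark calls into question.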
[PERFORM] pg_dump performance?
I have a database that I should migrate from 7.3 -> 7.4.3, but pg_dump | psql seems to take forever (several hours). Is there anything I can do to speed it up? The database is primarily a table with 300.000 records of about 200Kbytes each, ~ 60 GB. This is becoming an issue with the daily backup too.. (running pg_dump overnight). Jesper -- ./Jesper Krogh, [EMAIL PROTECTED] Jabber ID: [EMAIL PROTECTED]
Re: [PERFORM] pg_dump performance?
In gmane.comp.db.postgresql.performance, Christopher Kings-Lynne wrote: Is it the dump or the restore that's really slow? Primarily the dump; it seems to be CPU-bound on the postmaster process. No signs of an IO-bottleneck when I try to monitor with iostat or vmstat. -- ./Jesper Krogh, [EMAIL PROTECTED] Jabber ID: [EMAIL PROTECTED]