Re: [PERFORM] Any better plan for this query?..
Hi Simon,

can you explain why REINDEX might help here?.. - the database was just
created, the data loaded, and then the indexes were created + analyzed..
What could change here after a REINDEX?..

Disabling hashjoin was a good try!
Running this query "as is" we move from 1.50ms to 0.84ms now,
and the plan is here:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=4562.83..4568.66 rows=2329 width=176) (actual time=0.225..0.229 rows=20 loops=1)
   Sort Key: h.horder
   Sort Method:  quicksort  Memory: 30kB
   ->  Merge Join  (cost=4345.89..4432.58 rows=2329 width=176) (actual time=0.056..0.205 rows=20 loops=1)
         Merge Cond: (s.ref = h.ref_stat)
         ->  Index Scan using stat_ref_idx on stat s  (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079 rows=193 loops=1)
         ->  Sort  (cost=4345.89..4351.72 rows=2329 width=135) (actual time=0.041..0.043 rows=20 loops=1)
               Sort Key: h.ref_stat
               Sort Method:  quicksort  Memory: 30kB
               ->  Index Scan using history_ref_idx on history h  (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024 rows=20 loops=1)
                     Index Cond: (ref_object = '01'::bpchar)
 Total runtime: 0.261 ms
(12 rows)

Curiously, the planner expects to run it in 0.26ms.

Any idea why the planner is not choosing this plan from the beginning?..
Any way to keep this plan without having hashjoin disabled globally or
per session?..

Rgds,
-Dimitri

On 5/6/09, Simon Riggs wrote:
>
> On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote:
>
>> I've already tried a target 1000 and the only thing it changes
>> comparing to the current 100 (default) is instead of 2404 rows it says
>> 240 rows, but the plan remaining the same..
>
> Try both of these things
> * REINDEX on the index being used in the query, then re-EXPLAIN
> * enable_hashjoin = off, then re-EXPLAIN
>
> You should first attempt to get the same plan, then confirm it really is
> faster before we worry about why the optimizer hadn't picked that plan.
>
> We already know that MySQL favors nested loop joins, so turning up a
> plan that on this occasion is actually better that way is in no way
> representative of general performance. Does MySQL support hash joins?
>
> --
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
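For reference, the experiment above can be reproduced per session. The SELECT below is an assumption reconstructed from the aliases and columns visible in the plan (the original query text is not shown in the message); only the settings dance is the point:

    SET enable_hashjoin = off;              -- affects only this session

    EXPLAIN ANALYZE
    SELECT h.*, s.*                         -- column list assumed
    FROM history h
    JOIN stat s ON s.ref = h.ref_stat
    WHERE h.ref_object = '01'               -- value truncated in the original message
    ORDER BY h.horder;

    RESET enable_hashjoin;                  -- back to the session default

SET (or SET LOCAL inside a transaction) is scoped to one session, so it does not require the global change Dimitri wants to avoid, but 8.3/8.4 have no way to pin a plan to a single query beyond tricks like the PREPARE approach suggested later in this thread.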
Re: [PERFORM] Any better plan for this query?..
The problem with "gprof" is that it profiles everything from start to
finish, so a lot of function executions unrelated to this query get
profiled as well... Also, when you look at profiling technology, all such
solutions are based on the system clock frequency and have limited time
resolution. On my system this limit is 0.5ms, which is too coarse compared
to the query execution time :-)

So, what I've done: I changed the reference key criterion slightly, from
= '01' to < '51', so instead of 20 rows I now get 1000 rows of output.
It's still slower than InnoDB (12ms vs 9ms), but at least it can be
profiled (we are also probably moving away from the original problem,
since the time may now be spent mostly on output traffic, but I tried
anyway). I made a loop of 100 iterations of this query which reads but
does not print the data. The total execution time of this loop is 1200ms,
and curiously it did not really change under profiling. The profiler was
able to account for 733ms of the total execution time (if I understand
correctly, all functions running faster than 0.5ms remain unprofiled).
The top of the profiler output is here:

Excl.      Incl.      Name
User CPU   User CPU
 sec.       sec.
0.733      0.733
0.103      0.103      memcpy
0.045      0.045      slot_deform_tuple
0.037      0.040      AllocSetAlloc
0.021      0.021      AllocSetFree
0.018      0.037      pfree
0.018      0.059      appendBinaryStringInfo
0.017      0.031      heap_fill_tuple
0.017      0.017      _ndoprnt
0.016      0.016      nocachegetattr
0.015      0.065      heap_form_minimal_tuple
0.015      0.382      ExecProcNode
0.015      0.015      strlen
0.014      0.037      ExecScanHashBucket
0.014      0.299      printtup
0.013      0.272      ExecHashJoin
0.011      0.011      enlargeStringInfo
0.011      0.086      index_getnext
0.010      0.010      hash_any
0.009      0.076      FunctionCall1
0.009      0.037      MemoryContextAlloc
0.008      0.008      LWLockAcquire
0.007      0.069      pq_sendcountedtext
0.007      0.035      ExecProject
0.007      0.127      ExecScan
...

Curiously, "memcpy" is at the top. I don't know whether it has an impact
in many cases, but it probably makes sense to see if it can be optimized,
etc..

Rgds,
-Dimitri

On 5/7/09, Euler Taveira de Oliveira wrote:
> Dimitri escreveu:
>> BTW, is there already an integrated profiler within the code? or do I
>> need external tools?..
>>
> Postgres provides support for profiling. Add the --enable-profiling flag.
> Use gprof to get the profile.
>
>
> --
>  Euler Taveira de Oliveira
>  http://www.timbira.com/
>

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
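A sketch of the kind of server-side loop described above (reading rows without sending them to the client), which keeps the output traffic out of the profile. The query body is assumed, as before, and PL/pgSQL must be installed:

    CREATE OR REPLACE FUNCTION bench_query(iterations int) RETURNS void AS $$
    DECLARE
        i int;
        r record;
    BEGIN
        FOR i IN 1 .. iterations LOOP
            FOR r IN
                SELECT h.*, s.*
                FROM history h
                JOIN stat s ON s.ref = h.ref_stat
                WHERE h.ref_object < '51'     -- the widened criterion mentioned above
                ORDER BY h.horder
            LOOP
                NULL;                         -- discard each row; we only want execution cost
            END LOOP;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    SELECT bench_query(100);

One caveat: inside a PL/pgSQL function the statement runs as a prepared statement, so its plan can differ slightly from the interactive one.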
Re: [PERFORM] Transparent table partitioning in future version of PG?
On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote: > Tom Lane wrote: > > Alvaro Herrera writes: > >> I think there should be a way to refer to individual partitions as > >> objects. > > > > Yeah, the individual partitions should be nameable tables, otherwise we > > will be reinventing a *whole* lot of management stuff to little gain. > > I don't actually think there is anything wrong with using table > > inheritance as the basic infrastructure --- I just want more smarts > > about one particular use pattern of inheritance. > > Maybe it's worth examining and documenting existing partition setups, > the reasoning behind them, and how they're implemented, in order to > guide any future plans for native partitioning support? > > Maybe that's already been/being done. On the off chance that it's not: > > Ones I can think of: > > - Partitioning an equally active dataset by ranges over a key to improve > scan performance, INSERT/UPDATE costs on indexes, locking issues, etc. > > - The "classic" active/archive partition scheme where there's only one > partition growing at any one time, and the others are historical data > that's nowhere near as "hot". > > - A variant on the basic active/archive structure, where query activity > decreases slowly over time and there are many partitions of recent data. > Partitions are merged into larger ones as they age, somewhat like a RRD > database. > > I also expect that in the future there will be demand for striping data > across multiple partitions in different tablespaces to exploit > in-parallel scanning (when/if supported) for better I/O utilization in > multiple-disk-array situations. For example, partitioning on > "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent > scans, one per partition, to satisfy a query. That's a good summary. It has already been documented and discussed, but saying it again and again is the best way to get this across. You've highlighted that partitioning is a feature with many underlying requirements: infrequent access to data (frequently historical), striping for parallelism and getting around RDBMS flaws (if any). We must be careful to implement each requirement in full, yet separately, so we don't end up with 60% functionality in each case by delivering an average or least common denominator solution. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
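A minimal 8.3-style sketch of the MOD(id,10) striping idea from the quoted text, using inheritance and CHECK constraints. All names are invented, and INSERT routing (a trigger or rule on the parent) is omitted:

    CREATE TABLE events (
        id      bigint NOT NULL,
        payload text
    );

    -- one child table per stripe, each in its own tablespace
    CREATE TABLE events_p0 (CHECK ((id % 10) = 0)) INHERITS (events) TABLESPACE vol0;
    CREATE TABLE events_p1 (CHECK ((id % 10) = 1)) INHERITS (events) TABLESPACE vol1;
    -- ... events_p2 through events_p9 on the remaining volumes

    -- let the planner skip stripes whose CHECK constraint contradicts the WHERE clause
    SET constraint_exclusion = on;

    -- the query has to repeat the partitioning expression for exclusion to apply
    SELECT * FROM events WHERE (id % 10) = 3 AND id = 1234567;

Today the surviving partitions are still scanned one after another; the concurrent-scan part is exactly the missing piece being discussed.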
Re: [PERFORM] GiST index performance
On Wed, 6 May 2009, Tom Lane wrote:
> Matthew Wakeling writes:
>> Here is my patch ported over to the seg contrib package, attached. Apply
>> it to seg.c and all should be well. A similar thing needs to be done to
>> cube, but I haven't looked at that.
>
> Teodor, Oleg, do you intend to review/apply this patch?

Tom, I just returned from a trek around Annapurna and just learned about
Matthew's experiments. Teodor is on holiday and will be available after
May 11, and then there should be PGCon, so if it can wait, we could look
at this after PGCon.

Matthew, did you try various data? From our experience we learned there
can be various corner cases.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Thu, May 7, 2009 at 4:20 AM, Dimitri wrote: > Hi Simon, > > may you explain why REINDEX may help here?.. - database was just > created, data loaded, and then indexes were created + analyzed.. What > may change here after REINDEX?.. > > With hashjoin disabled was a good try! > Running this query "as it" from 1.50ms we move to 0.84ms now, > and the plan is here: > > QUERY > PLAN > -- > Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual > time=0.225..0.229 rows=20 loops=1) > Sort Key: h.horder > Sort Method: quicksort Memory: 30kB > -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) (actual > time=0.056..0.205 rows=20 loops=1) > Merge Cond: (s.ref = h.ref_stat) > -> Index Scan using stat_ref_idx on stat s > (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079 > rows=193 loops=1) > -> Sort (cost=4345.89..4351.72 rows=2329 width=135) (actual > time=0.041..0.043 rows=20 loops=1) > Sort Key: h.ref_stat > Sort Method: quicksort Memory: 30kB > -> Index Scan using history_ref_idx on history h > (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024 > rows=20 loops=1) > Index Cond: (ref_object = '01'::bpchar) > Total runtime: 0.261 ms > (12 rows) > > Curiously planner expect to run it in 0.26ms > > Any idea why planner is not choosing this plan from the beginning?.. > Any way to keep this plan without having a global or per sessions > hashjoin disabled?.. can you work prepared statements into your app? turn off hash join, prepare the query, then turn it back on. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
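A sketch of the trick Merlin describes, relying on the fact that an SQL-level PREPARE in 8.3/8.4 plans the statement immediately, so the plan is captured while the setting is off (the statement body and parameter type are assumptions):

    SET enable_hashjoin = off;

    PREPARE hist_by_obj (bpchar) AS
        SELECT h.*, s.*
        FROM history h
        JOIN stat s ON s.ref = h.ref_stat
        WHERE h.ref_object = $1
        ORDER BY h.horder;

    SET enable_hashjoin = on;        -- restore the default for everything else

    EXECUTE hist_by_obj('01');       -- keeps the plan captured at PREPARE time

If the cached plan is ever invalidated (by DDL, for example), it will be rebuilt under whatever settings are then in effect.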
Re: [PERFORM] Any better plan for this query?..
Simon Riggs writes:
> We already know that MySQL favors nested loop joins

From what I read I thought that was the *only* type of join MySQL supports.

The big picture view here is that whether we run a short query in half a
millisecond versus two milliseconds is usually not really important. It
could matter if you're concerned with how many transactions/s you can run
in a busy server -- but that's not exactly the same thing and you should
really measure that in that case.

It would be nice if we were in the same ballpark as MySQL but we would
only be interested in such optimizations if they don't come at the expense
of scalability under more complex workloads.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Thu, 2009-05-07 at 12:58 +0100, Gregory Stark wrote: > It would be nice if we were in the same ballpark as MySQL but we would only be > interesting in such optimizations if they don't come at the expense of > scalability under more complex workloads. It doesn't appear there is a scalability issue here at all. Postgres can clearly do the same query in about the same time. We just have a case where MySQL happens to optimise it well and Postgres doesn't. Since we can trivially design cases that show the opposite I'm not worried too much. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GiST index performance
On Thu, 7 May 2009, Oleg Bartunov wrote:
> Did you try Guttman quadratic split algorithm ? We also found linear
> split algorithm for Rtree.

The existing (bugfixed) seg split algorithm is the Guttman quadratic split
algorithm. Guttman did all his work on two-dimensional and above data,
dismissing one-dimensional data as being handled adequately by B-trees,
which is not true for segment overlaps.

It turns out that the algorithm has a weakness with certain types of data,
and one-dimensional data is almost certain to exercise that weakness. The
greater the number of dimensions, the less the weakness is exercised. The
problem is that the algorithm does not calculate a split pivot. Instead it
finds two suitable entries, and adds the remaining entries to those two in
turn. This can lead to the majority of the entries being added to just one
side. In fact, I saw lots of cases where 367 entries were being split into
two pages of 366 entries and one entry. Guttman's linear split algorithm
has the same weakness.

>> One thing I am seeing is a really big difference in performance between
>> Postgres/GiST and a Java implementation I have written, using the same
>> algorithms. Postgres takes three minutes to perform a set of index
>> lookups while Java takes six seconds. The old version of bioseg took an
>> hour. I can't see anything in the GiST support code that could account
>> for this.
>
> is the number of index lookups different, or just index lookup time is
> very big ?

Same number of index lookups. Same algorithms. I have a set of 681879
segments, and I load them all into the index. I then query the index for
overlaps for each one in turn.

For some reason, GiST lookups seem to be slow, even if they are using a
good algorithm. I have seen that problem with btree_gist on integers too.
I can't see any reason for this in the GiST code - it all seems pretty
tight to me. We probably need to do some profiling.

Matthew

--
 I suppose some of you have done a Continuous Maths course. Yes? Continuous
 Maths? Whoah, it was like that, was it!
                                        -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slow select performance despite seemingly reasonable query plan
Hi,

Some context: we have a _lot_ of data, > 1TB, mostly in one 'table' - the
'datatable' in the example below - although in order to improve performance
this table is partitioned (by date range) into a number of partition
tables. Each partition contains up to 20GB of data (tens of millions of
rows), with an additional ~3GB of indexes. All this is served off a fairly
high performance server (8 core, 32GB, with FC attached SAN storage).
PostgreSQL version is 8.3.5 (running on 64bit RHEL 5.2).

This has been working reasonably well, however in the last few days I've
been seeing extremely slow performance on what are essentially fairly
simple 'index hitting' selects on this data. From the host side I see that
the postgres query process is mostly in IO wait, however there is very
little data actually being transferred (maybe 2-4 MB/s) - whereas a
different query (say a select count(*) from datatable) will yield a
sustained 150+ MB/s. There have been no configuration changes during this
time, although of course the database has grown as data is added on a
daily basis.

I'm not sure of the best way to diagnose this issue - the possible causes
I can think of are:

1. Problem with random versus sequential reads on the storage system.
2. 'Something' with PostgreSQL itself.
3. Problem with the host environment - one suspicion I have here is that
   we are >90% full on the storage drives (ext3); I'm not sure if that is
   impacting performance.

Any thoughts as to how to proceed from here would be very welcome.

Here is an example query plan - it looks reasonable to me, and seems to be
making use of the indexes and the constraint exclusion on the partition
tables:

 Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
         ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
               Filter: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
         ->  Index Scan using datatable_20090328_customeriddatapointdate_idx on datatable_20090328 sum  (cost=0.00..542433.51 rows=180 width=49)
               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using datatable_20090404_customeriddatapointdate_idx on datatable_20090404 sum  (cost=0.00..1322098.74 rows=405 width=48)
               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using datatable_20090411_customeriddatapointdate_idx on datatable_20090411 sum  (cost=0.00..1612744.29 rows=450 width=48)
               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using datatable_20090418_customeriddatapointdate_idx on datatable_20090418 sum  (cost=0.00..1641913.58 rows=469 width=49)
               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using datatable_20090425_customeriddatapointdate_idx on datatable_20090425 sum  (cost=0.00..1334164.80 rows=389 width=49)
               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, May 7, 2009 at 10:14 AM, David Brain wrote:
> Hi,
>
> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
> the 'datatable' in the example below although in order to improve
> performance this table is partitioned (by date range) into a number of
> partition tables. Each partition contains up to 20GB of data (tens of
> millions of rows), with an additional ~3GB of indexes, all this is
> served off a fairly high performance server (8 core 32Gb, with FC
> attached SAN storage). PostgreSQL version is 8.3.5 (running on 64bit
> RHEL 5.2)
>
> This has been working reasonably well, however in the last few days
> I've been seeing extremely slow performance on what are essentially
> fairly simple 'index hitting' selects on this data.

Have you re-indexed any of your partitioned tables? If your index is
fragmented, you'll be incurring extra I/Os per index access. Take a look
at the pgstattuple contrib module for some functions to determine index
fragmentation. You can also take a look at the pg_stat_all_indexes view.
If your number of tuples fetched is 100x more than your idx_scans, you
*may* consider reindexing.

--Scott
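A sketch of the two checks Scott mentions, assuming the pgstattuple contrib module is installed; the index name is taken from the plan earlier in the thread:

    -- usage counters per index; a very high tuples-fetched to scans ratio is
    -- the rough symptom described above
    SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
    FROM pg_stat_all_indexes
    WHERE relname LIKE 'datatable_2009%'
    ORDER BY idx_tup_fetch DESC;

    -- physical shape of one index (from contrib/pgstattuple)
    SELECT avg_leaf_density, leaf_fragmentation
    FROM pgstatindex('datatable_20090404_customeriddatapointdate_idx');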
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, 7 May 2009, David Brain wrote:
> This has been working reasonably well, however in the last few days
> I've been seeing extremely slow performance on what are essentially
> fairly simple 'index hitting' selects on this data. From the host side
> I see that the postgres query process is mostly in IO wait, however
> there is very little data actually being transferred (maybe 2-4 MB/s) -
> when a different query (say a select count(*) from datatable) will
> yield a sustained 150+ MB/s.

Has there been a performance *change*, or are you just concerned about a
query which doesn't seem to use "enough" disc bandwidth?

> 1. Problem with random versus sequential reads on storage system.

Certainly random access like this index scan can be extremely slow. 2-4
MB/s is quite reasonable if you're fetching one 8kB block per disc seek -
no more than 200 per second.

> 3. Problem with the host environment - one suspicion I have here is that
> we are >90% full on the storage drives (ext3), I'm not sure if that is
> impacting performance.

One concern I might have with a big setup like that is how big the
database directory has got, and whether directory lookups are taking
time. Check to see if you have the directory_index option enabled on your
ext3 filesystem.

Matthew

--
 The third years are wandering about all worried at the moment because they
 have to hand in their final projects. Please be sympathetic to them, say
 things like "ha-ha-ha", but in a sympathetic tone of voice
                                        -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
Hi, Interesting, for one index on one partition: idx_scan: 329 idx_tup_fetch: 8905730 So maybe a reindex would help? David. On Thu, May 7, 2009 at 10:26 AM, Scott Mead wrote: > On Thu, May 7, 2009 at 10:14 AM, David Brain wrote: >> >> Hi, >> >> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' - >> the 'datatable' in the example below although in order to improve >> performance this table is partitioned (by date range) into a number of >> partition tables. Each partition contains up to 20GB of data (tens of >> millons of rows), with an additional ~3GB of indexes, all this is >> served off a fairly high performance server (8 core 32Gb, with FC >> attached SAN storage). PostgreSQL version is 8.3.5 (running on 64bit >> RHEL 5.2) >> >> This has been working reasonably well, however in the last few days >> I've been seeing extremely slow performance on what are essentially >> fairly simple 'index hitting' selects on this data. > > Have you re-indexed any of your partitioned tables? If you're index is > fragmented, you'll be incurring extra I/O's per index access. Take a look > at the pgstattuple contrib for some functions to determine index > fragmentation. You can also take a look at the pg_stat_all_indexes tables. > If your number of tup's fetched is 100 x more than your idx_scans, you *may* > consider reindexing. > > --Scott > > -- David Brain dbr...@bandwidth.com 919.297.1078 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
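If reindexing does turn out to be warranted, a sketch of the two usual options; the column list is assumed from the index name, and the locking trade-off is the main design choice:

    -- simplest, but blocks writes to the table while it runs
    REINDEX INDEX datatable_20090404_customeriddatapointdate_idx;

    -- lower-impact alternative on 8.3: build a replacement without blocking
    -- writes, then swap it in
    CREATE INDEX CONCURRENTLY datatable_20090404_custdpd_new
        ON datatable_20090404 (customerid, datapointdate);
    DROP INDEX datatable_20090404_customeriddatapointdate_idx;
    ALTER INDEX datatable_20090404_custdpd_new
        RENAME TO datatable_20090404_customeriddatapointdate_idx;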
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
Hi, Some answers in-line: > > Has there been a performance *change*, or are you just concerned about a > query which doesn't seem to use "enough" disc bandwidth? Performance has degraded noticeably over the past few days. > Certainly random access like this index scan can be extremely slow. 2-4 MB/s > is quite reasonable if you're fetching one 8kB block per disc seek - no more > than 200 per second. We have read ahead set pretty aggressively high as the SAN seems to 'like' this, given some testing we did: /sbin/blockdev --getra /dev/sdb 16384 > One concern I might have with a big setup like that is how big the database > directory has got, and whether directory lookups are taking time. Check to > see if you have the directory_index option enabled on your ext3 filesystem. > That's a thought, I doubt the option is set (I didn't set it and I don't _think_ rhel does by default), however the 'base' directory only contains ~5500 items total, so it's not getting too out of hand. David -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
>
> Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
>   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
>         ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
>               Filter: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
>         ->  Index Scan using datatable_20090328_customeriddatapointdate_idx on datatable_20090328 sum  (cost=0.00..542433.51 rows=180 width=49)
>               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>         ->  Index Scan using datatable_20090404_customeriddatapointdate_idx on datatable_20090404 sum  (cost=0.00..1322098.74 rows=405 width=48)
>               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>         ->  Index Scan using datatable_20090411_customeriddatapointdate_idx on datatable_20090411 sum  (cost=0.00..1612744.29 rows=450 width=48)
>               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>         ->  Index Scan using datatable_20090418_customeriddatapointdate_idx on datatable_20090418 sum  (cost=0.00..1641913.58 rows=469 width=49)
>               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>         ->  Index Scan using datatable_20090425_customeriddatapointdate_idx on datatable_20090425 sum  (cost=0.00..1334164.80 rows=389 width=49)
>               Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>   ->  Index Scan using pk_cdrextension on cdrextension ext  (cost=0.00..4.77 rows=1 width=8)
>         Index Cond: (sum.id = ext.datatableid)
>

Something doesn't look right. Why is it doing an index scan on
datatable_20090404 when the constraint for that table puts it as entirely
in the date range? Shouldn't it just seq scan the partition or use the
partition's customerid index?
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, 7 May 2009, David Brain wrote:
>> Certainly random access like this index scan can be extremely slow. 2-4
>> MB/s is quite reasonable if you're fetching one 8kB block per disc seek
>> - no more than 200 per second.
>
> We have read ahead set pretty aggressively high as the SAN seems to
> 'like' this, given some testing we did:
>
> /sbin/blockdev --getra /dev/sdb
> 16384

Read-ahead won't really help with completely random access.

I think a much more interesting line of enquiry will be trying to work
out what has changed, and why it was fast before.

How much of the data you're accessing are you expecting to be in the OS
cache?

Is the table you're index scanning on ordered at all? Could that have
changed recently?

> That's a thought, I doubt the option is set (I didn't set it and I don't
> _think_ rhel does by default), however the 'base' directory only
> contains ~5500 items total, so it's not getting too out of hand.

I think quite a few systems do set it by default now.

Matthew

--
 Me... a skeptic?  I trust you have proof?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
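Two ways to follow up on the ordering question, sketched with names from this thread. The CLUSTER step rewrites the whole partition and locks it exclusively, so it is only practical on partitions that are no longer being written:

    -- how well the physical row order matches the column order (-1..1; values
    -- near 0 mean an index scan will jump all over the table)
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE tablename = 'datatable_20090404'
      AND attname IN ('customerid', 'datapointdate');

    -- physically re-order one partition by the index used in the slow scans
    CLUSTER datatable_20090404 USING datatable_20090404_customeriddatapointdate_idx;
    ANALYZE datatable_20090404;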
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote:
> As well when you look on profiling technology - all such kind of
> solutions are based on the system clock frequency and have their
> limits on time resolution. On my system this limit is 0.5ms, and it's
> too big comparing to the query execution time :-)
>
> So, what I've done - I changed little bit a reference key criteria from
> = '01' to < '51', so instead of 20 rows I have 1000
> rows on output now,

Another thing you can try is to run the query many more times, so that
the total runtime is well above the profiler's resolution.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, May 7, 2009 at 11:19 AM, Matthew Wakeling wrote:
> On Thu, 7 May 2009, David Brain wrote:
>
>>> Certainly random access like this index scan can be extremely slow.
>>> 2-4 MB/s is quite reasonable if you're fetching one 8kB block per disc
>>> seek - no more than 200 per second.
>>
>> We have read ahead set pretty aggressively high as the SAN seems to
>> 'like' this, given some testing we did:
>>
>> /sbin/blockdev --getra /dev/sdb
>> 16384
>
> Read-ahead won't really help with completely random access.

That's a shame, because it would be really nice to get the entire index
into shared memory or OS cache. Most of the time queries are on data in
the past few months. All of the indexes on the past few months should fit
in cache. Did something happen to get those indexes flushed from the
cache? Were they in the cache before?

> I think a much more interesting line of enquiry will be trying to work
> out what has changed, and why it was fast before.
>
> How much of the data you're accessing are you expecting to be in the OS
> cache?
>
> Is the table you're index scanning on ordered at all? Could that have
> changed recently?

I wrote the application that puts data in that table. It's sort of
ordered by that timestamp. Every five minutes it adds the rows, in no
particular order, that need to be added. The rows that need to be added
every five minutes are ordered by another timestamp that is correlated
to, but not the same as, the indexed timestamp.

>> That's a thought, I doubt the option is set (I didn't set it and I
>> don't _think_ rhel does by default), however the 'base' directory only
>> contains ~5500 items total, so it's not getting too out of hand.
>
> I think quite a few systems do set it by default now.
>
> Matthew
>
> --
> Me... a skeptic?  I trust you have proof?
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
[PERFORM] Bad Plan for Questionnaire-Type Query
I have a query [1] that Postgres is insisting on using a Nested Loop for some reason when a Hash Join is much faster. It seems like the estimates are way off. I've set default_statistics_target to 250, 500, 1000 and analyzed, but they never seem to improve. If I disable nestloops, the query completes in around 3-5s. With them enabled, it takes anywhere from 45 to 60 seconds. Here is the DDL for the tables and the month_last_day function [4]. Any help would be appreciated! David Blewett 1. http://dpaste.com/hold/41842/ 2. http://explain.depesz.com/s/Wg 3. http://explain.depesz.com/s/1s 4. http://dpaste.com/hold/41846/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
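For reference, the statistics target can also be raised per column rather than globally, which is usually how a single bad join estimate is chased down. The table and column names below are placeholders, since the real schema is only in the linked paste:

    -- session-wide default picked up by the next ANALYZE
    SET default_statistics_target = 500;

    -- or per column, which persists and is more targeted
    ALTER TABLE response ALTER COLUMN question_id SET STATISTICS 1000;
    ANALYZE response;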
Re: [PERFORM] Bad Plan for Questionnaire-Type Query
On Thu, May 7, 2009 at 12:53 PM, David Blewett wrote: > 1. http://dpaste.com/hold/41842/ > 2. http://explain.depesz.com/s/Wg > 3. http://explain.depesz.com/s/1s > 4. http://dpaste.com/hold/41846/ Forgot to mention that I'm using Postgres 8.3.6 on linux 2.6.24. Shared buffers are set to 1GB, effective_cache_size is set to 3GB. Server has 6GB RAM, running on a SCSI 4-disk RAID10. David Blewett -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
On 5/7/09 1:54 AM, "Simon Riggs" wrote:
>
> On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
>> Tom Lane wrote:
>>
>> I also expect that in the future there will be demand for striping data
>> across multiple partitions in different tablespaces to exploit
>> in-parallel scanning (when/if supported) for better I/O utilization in
>> multiple-disk-array situations. For example, partitioning on
>> "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent
>> scans, one per partition, to satisfy a query.
>
> That's a good summary. It has already been documented and discussed, but
> saying it again and again is the best way to get this across.
>
> You've highlighted that partitioning is a feature with many underlying
> requirements: infrequent access to data (frequently historical),

Actually, infrequent access is not a requirement. It is a common
requirement, however.

Take, for instance, a very large set of data that contains an integer
column 'type_id' with about 200 distinct values. The data is accessed
with a strict 'type_id = X' requirement 99.9% of the time. If this were
one large table, then scans of all sorts become much more expensive than
if it is partitioned on 'type_id'. Furthermore, partitioning on type_id
removes the requirement to even index on this value. Statistics on each
partition may vary significantly, and the planner can thus adapt to
changes in the data per value of type_id naturally.

The raw need is not "infrequent access" but highly partitioned access. It
doesn't matter if your date-partitioned data is accessed evenly across
all dates or skewed to the most frequent -- it matters that you are
almost always accessing by small date ranges.

> striping for parallelism and getting around RDBMS flaws (if any). We
> must be careful to implement each requirement in full, yet separately,
> so we don't end up with 60% functionality in each case by delivering an
> average or least common denominator solution.
>
> --
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
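A sketch of the type_id case described above, again with 8.3-style inheritance (names invented, INSERT routing omitted):

    CREATE TABLE big_table (
        type_id integer NOT NULL,
        payload text
    );

    -- one partition per type_id value
    CREATE TABLE big_table_t1 (CHECK (type_id = 1)) INHERITS (big_table);
    CREATE TABLE big_table_t2 (CHECK (type_id = 2)) INHERITS (big_table);
    -- ... and so on for the other values

    SET constraint_exclusion = on;

    -- only big_table_t2 (plus the empty parent) is scanned, no index on
    -- type_id is needed, and each partition keeps its own statistics
    SELECT * FROM big_table WHERE type_id = 2;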
Re: [PERFORM] Any better plan for this query?..
I've simply restarted a full test with hashjoin OFF. Up to 32 concurrent
users things are going well. Then from 32 users the response time jumps
to 20ms, with 64 users it's higher again, and with 256 users it reaches
700ms, so TPS drops from 5,000 to ~200..

With hashjoin ON it's not happening, and I'm reaching at least 11,000 TPS
on fully busy 32 cores.

I should not use prepare/execute as the test conditions should remain
"generic".

About the scalability issue - there is one on 8.3.7, because on 32 cores
with such kind of load it's using only 50% CPU and not getting past 6,000
TPS, while 8.4 uses 90% CPU and reaches 11,000 TPS.. At the same time,
comparing 8.3 and 8.4, the response time is 2 times lower in 8.4, and it
seems to me the main gain for 8.4 is here.

I'll publish all details, I just need some time :-)

Rgds,
-Dimitri

On 5/7/09, Merlin Moncure wrote:
> On Thu, May 7, 2009 at 4:20 AM, Dimitri wrote:
>> Hi Simon,
>>
>> may you explain why REINDEX may help here?.. - database was just
>> created, data loaded, and then indexes were created + analyzed.. What
>> may change here after REINDEX?..
>>
>> With hashjoin disabled was a good try!
>> Running this query "as it" from 1.50ms we move to 0.84ms now,
>> and the plan is here:
>>
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Sort  (cost=4562.83..4568.66 rows=2329 width=176) (actual time=0.225..0.229 rows=20 loops=1)
>>    Sort Key: h.horder
>>    Sort Method:  quicksort  Memory: 30kB
>>    ->  Merge Join  (cost=4345.89..4432.58 rows=2329 width=176) (actual time=0.056..0.205 rows=20 loops=1)
>>          Merge Cond: (s.ref = h.ref_stat)
>>          ->  Index Scan using stat_ref_idx on stat s  (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079 rows=193 loops=1)
>>          ->  Sort  (cost=4345.89..4351.72 rows=2329 width=135) (actual time=0.041..0.043 rows=20 loops=1)
>>                Sort Key: h.ref_stat
>>                Sort Method:  quicksort  Memory: 30kB
>>                ->  Index Scan using history_ref_idx on history h  (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024 rows=20 loops=1)
>>                      Index Cond: (ref_object = '01'::bpchar)
>>  Total runtime: 0.261 ms
>> (12 rows)
>>
>> Curiously planner expect to run it in 0.26ms
>>
>> Any idea why planner is not choosing this plan from the beginning?..
>> Any way to keep this plan without having a global or per sessions
>> hashjoin disabled?..
>
> can you work prepared statements into your app? turn off hash join,
> prepare the query, then turn it back on.
>
> merlin
>

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Thu, 2009-05-07 at 20:36 +0200, Dimitri wrote: > I've simply restarted a full test with hashjoin OFF. Until 32 > concurrent users things are going well. Then since 32 users response > time is jumping to 20ms, with 64 users it's higher again, and with 256 > users reaching 700ms, so TPS is dropping from 5.000 to ~200.. > > With hashjoin ON it's not happening, and I'm reaching at least 11.000 > TPS on fully busy 32 cores. Much better to stick to the defaults. Sounds like a problem worth investigating further, but not pro bono. > About scalability issue - there is one on 8.3.7, because on 32 cores > with such kind of load it's using only 50% CPU and not outpassing > 6.000 TPS, while 8.4 uses 90% CPU and reaching 11.000 TPS.. Yeh, small changes make a big difference. Thanks for the info. How does MySQL perform? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad Plan for Questionnaire-Type Query
David Blewett writes: > On Thu, May 7, 2009 at 12:53 PM, David Blewett wrote: >> 1. http://dpaste.com/hold/41842/ >> 2. http://explain.depesz.com/s/Wg >> 3. http://explain.depesz.com/s/1s >> 4. http://dpaste.com/hold/41846/ > Forgot to mention that I'm using Postgres 8.3.6 on linux 2.6.24. Well, the reason it likes the nestloop plan is the estimate of just one row out of the lower joins --- that case is pretty much always going to favor a nestloop over other kinds of joins. If it were estimating even as few as ten rows out, it'd likely switch to a different plan. So the question to ask is why the rowcount estimates are so abysmally bad. You mentioned having tried to increase the stats targets, but without seeing the actual stats data it's hard to speculate about this. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad Plan for Questionnaire-Type Query
On Thu, May 7, 2009 at 4:31 PM, Tom Lane wrote: > as few as ten rows out, it'd likely switch to a different plan. So the > So the question to ask is why the rowcount estimates are so abysmally bad. > You mentioned having tried to increase the stats targets, but without > seeing the actual stats data it's hard to speculate about this. How do I get that data for you? David -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad Plan for Questionnaire-Type Query
David Blewett writes: > On Thu, May 7, 2009 at 4:31 PM, Tom Lane wrote: >> as few as ten rows out, it'd likely switch to a different plan. So the >> So the question to ask is why the rowcount estimates are so abysmally bad. >> You mentioned having tried to increase the stats targets, but without >> seeing the actual stats data it's hard to speculate about this. > How do I get that data for you? Look into pg_stats for the rows concerning the columns used in the query's WHERE and JOIN/ON clauses. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
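A sketch of the lookup Tom is asking for; substitute the real table and column names from the query's WHERE and JOIN clauses (the ones below are placeholders):

    SELECT tablename, attname, null_frac, n_distinct,
           most_common_vals, most_common_freqs, histogram_bounds
    FROM pg_stats
    WHERE schemaname = 'public'
      AND tablename IN ('answer', 'submission')
      AND attname   IN ('question_id', 'submission_id');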
[PERFORM] Indexes not used in DELETE
Hi everybody,

I'm wondering why a DELETE statement of mine does not make use of defined
indexes on the tables.

I have the following tables, which are linked as such: component -> rank
-> node -> corpus.

Now I want to delete all entries in component by giving a list of corpus
ids. The query is as such:

DELETE FROM component
USING corpus toplevel, corpus child, node, rank
WHERE toplevel.id IN (25)
  AND toplevel.top_level = 'y'
  AND toplevel.pre <= child.pre
  AND toplevel.post >= child.pre
  AND node.corpus_ref = child.id
  AND rank.node_ref = node.id
  AND rank.component_ref = component.id;

The table corpus is defined as such:

                Table "public.corpus"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 id        | numeric(38,0)          | not null
 name      | character varying(100) | not null
 type      | character varying(100) | not null
 version   | character varying(100) |
 pre       | numeric(38,0)          | not null
 post      | numeric(38,0)          | not null
 top_level | boolean                | not null
Indexes:
    "corpus_pkey" PRIMARY KEY, btree (id)
    "corpus_post_key" UNIQUE, btree (post)
    "corpus_pre_key" UNIQUE, btree (pre)
    "idx_corpus__id_pre_post" btree (id, pre, post)
    "idx_corpus__pre_post" btree (pre, post)
    "idx_corpus__toplevel" btree (id) WHERE top_level = true

The query plan of the above statement looks like this:

                                    QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=708.81..4141.14 rows=9614 width=6)
   Hash Cond: (rank.component_ref = component.id)
   ->  Nested Loop  (cost=3.20..3268.07 rows=8373 width=8)
         ->  Hash Join  (cost=3.20..1306.99 rows=4680 width=8)
               Hash Cond: (node.corpus_ref = child.id)
               ->  Seq Scan on node  (cost=0.00..1075.63 rows=48363 width=14)
               ->  Hash  (cost=3.16..3.16 rows=3 width=27)
                     ->  Nested Loop  (cost=0.00..3.16 rows=3 width=27)
                           Join Filter: ((toplevel.pre <= child.pre) AND (toplevel.post >= child.pre))
                           ->  Seq Scan on corpus toplevel  (cost=0.00..1.39 rows=1 width=54)
                                 Filter: (top_level AND (id = 25::numeric))
                           ->  Seq Scan on corpus child  (cost=0.00..1.31 rows=31 width=54)
         ->  Index Scan using fk_rank_2_struct on rank  (cost=0.00..0.39 rows=2 width=16)
               Index Cond: (rank.node_ref = node.id)
   ->  Hash  (cost=390.27..390.27 rows=25227 width=14)
         ->  Seq Scan on component  (cost=0.00..390.27 rows=25227 width=14)
(16 rows)

Specifically, I'm wondering why the innermost scan on corpus (toplevel)
does not use the index idx_corpus__toplevel, and why the join between
corpus (toplevel) and corpus (child) is not a merge join using the index
corpus_pre_key to access the child table.

FYI, corpus.pre and corpus.post encode a corpus tree (or rather a forest)
using a combined pre and post order. This scheme guarantees that
parent.post > child.post > child.pre for all edges parent -> child in the
corpus tree. I'm using the same scheme elsewhere in SELECT statements and
they work fine there.

Thanks,
Viktor

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad Plan for Questionnaire-Type Query
On Thu, May 7, 2009 at 6:44 PM, Tom Lane wrote: > Look into pg_stats for the rows concerning the columns used in the > query's WHERE and JOIN/ON clauses. Okay, here you go: http://rafb.net/p/20y8Oh72.html David -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Indexes not used in DELETE
Viktor Rosenfeld writes: > -> Seq Scan on corpus toplevel (cost=0.00..1.39 > rows=1 width=54) > Filter: (top_level AND (id = 25::numeric)) > Specifically, I'm wondering why the innermost scan on corpus > (toplevel) does not use the index idx_corpus__toplevel The cost estimate indicates that there are so few rows in corpus that an indexscan would be a waste of time. > and why the > join between corpus (toplevel) and corpus (child) is not a merge join > using the index corpus_pre_key to access the child table. Same answer. Populate the table and the plan will change. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
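A quick way to watch the plan change once realistic data volumes are loaded, or to preview the index-based plan on the small test data (the enable_seqscan trick is for testing only):

    -- after loading realistic data, refresh the statistics first
    ANALYZE corpus; ANALYZE node; ANALYZE rank; ANALYZE component;

    -- testing only: heavily penalize sequential scans to see the indexed plan
    SET enable_seqscan = off;
    EXPLAIN
    DELETE FROM component
    USING corpus toplevel, corpus child, node, rank
    WHERE toplevel.id IN (25)
      AND toplevel.top_level = 'y'
      AND toplevel.pre <= child.pre
      AND toplevel.post >= child.pre
      AND node.corpus_ref = child.id
      AND rank.node_ref = node.id
      AND rank.component_ref = component.id;
    RESET enable_seqscan;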
Re: [PERFORM] Transparent table partitioning in future version of PG?
On Wed, May 6, 2009 at 6:08 PM, Simon Riggs wrote:
> Agreed. Perhaps I should say then that the syntax needs to express the
> requirements of the planner/executor behaviour, rather than being the
> main aspect of the feature, as some have suggested.

Agreed.

> Hopefully, notions of partitioning won't be directly tied to chunking of
> data for parallel query access. Most queries access recent data and
> hence only a single partition (or stripe), so partitioning and
> parallelism are frequently exactly orthogonal.

Yes, I think those things are unrelated.

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
On Thu, 7 May 2009, Robert Haas wrote:
> On Wed, May 6, 2009 at 6:08 PM, Simon Riggs wrote:
>> Agreed. Perhaps I should say then that the syntax needs to express the
>> requirements of the planner/executor behaviour, rather than being the
>> main aspect of the feature, as some have suggested.
>
> Agreed.
>
>> Hopefully, notions of partitioning won't be directly tied to chunking of
>> data for parallel query access. Most queries access recent data and
>> hence only a single partition (or stripe), so partitioning and
>> parallelism are frequently exactly orthogonal.
>
> Yes, I think those things are unrelated.

I'm not so sure (warning, I am relatively inexperienced in this area).

It sounds like you can take two basic approaches to partitioning a
database:

1. The Isolation Plan

   You want your queries to match your partitioning, with the goal of
   only having to query a small number of partitions, minimizing the
   total amount of data touched (including minimizing the number of
   indexes searched). This matches the use case mentioned above, with the
   partitioning based on date and queries only looking at the most recent
   date range.

2. The Load Balancing Plan

   You want your partitioning and your queries _not_ to match as much as
   possible, with the goal of having the query hit as many partitions as
   possible, so that the different parts of the search can happen in
   parallel.

However, with either partitioning plan, you will have queries that
degenerate to look like the other plan.

In the case of the isolation plan, you may need to search for all
instances of a rare thing over the entire history (after all, if you
never need to access that history, why do you pay for disks to store it?
;-)), and even when you are searching a narrow time window, it may still
span multiple partitions. I have a log analysis setup using the
proprietary Splunk database; it partitions by time, creating a new
partition as the current one hits a configurable size (by default 10G on
64 bit systems). For my volume of logs I end up with each partition
covering only a few hours. It's very common to want to search over a few
days, which can be a few dozen partitions (this is out of many hundreds
of partitions, so it's still a _huge_ win to narrow the timeframe).

In the case of the load balancing plan, you may run into a query that
happens to fall into only one partition (the query matches your
partitioning logic).

I think the only real difference is how common it is to need to search
multiple partitions.

If the expectation is that you will frequently need to search most/all of
the partitions (the load balancing plan), then it's a waste of time to
analyse the query to try and figure out which partitions you need to look
at.

If the expectation is that you will frequently only need to search a
small number of the partitions (the isolation plan), then it's extremely
valuable to spend as much time as needed analysing the query to try and
figure out which partitions you need to look at.

I believe that the isolation plan is probably more common than the load
balancing plan, but I don't see them as being that different from the
database engine's point of view. To tune a system that can handle the
isolation plan for load balancing, the key thing to do would be to have a
knob to disable the partition planning, and just blindly send the search
out to every partition.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
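For what it's worth, the "knob" described at the end of the message above already exists in a crude form: constraint exclusion can simply be switched off per session, after which every query fans out to every partition (the partitions are still scanned serially in 8.3; the in-parallel part is the missing piece discussed upthread):

    -- isolation-plan behaviour: prune partitions whose CHECK constraints
    -- contradict the WHERE clause
    SET constraint_exclusion = on;

    -- load-balancing-style behaviour: skip the pruning and hit every partition
    SET constraint_exclusion = off;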