Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Dimitri
Hi Simon,

can you explain why REINDEX might help here?.. - the database was just
created, data loaded, and then indexes were created + analyzed.. What
could change here after REINDEX?..

Disabling hashjoin was a good try!
Running this query "as is", we move from 1.50ms to 0.84ms now,
and the plan is here:

  QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=4562.83..4568.66 rows=2329 width=176) (actual
time=0.225..0.229 rows=20 loops=1)
   Sort Key: h.horder
   Sort Method:  quicksort  Memory: 30kB
   ->  Merge Join  (cost=4345.89..4432.58 rows=2329 width=176) (actual
time=0.056..0.205 rows=20 loops=1)
 Merge Cond: (s.ref = h.ref_stat)
 ->  Index Scan using stat_ref_idx on stat s
(cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079
rows=193 loops=1)
 ->  Sort  (cost=4345.89..4351.72 rows=2329 width=135) (actual
time=0.041..0.043 rows=20 loops=1)
   Sort Key: h.ref_stat
   Sort Method:  quicksort  Memory: 30kB
   ->  Index Scan using history_ref_idx on history h
(cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024
rows=20 loops=1)
 Index Cond: (ref_object = '01'::bpchar)
 Total runtime: 0.261 ms
(12 rows)

Curiously, the planner expects to run it in 0.26ms

Any idea why the planner is not choosing this plan from the beginning?..
Any way to keep this plan without having hashjoin disabled globally or
per session?..

Rgds,
-Dimitri


On 5/6/09, Simon Riggs  wrote:
>
> On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote:
>
>> I've already tried a target 1000 and the only thing it changes
>> comparing to the current 100 (default) is instead of 2404 rows it says
>> 240 rows, but the plan remaining the same..
>
> Try both of these things
> * REINDEX on the index being used in the query, then re-EXPLAIN
> * enable_hashjoin = off, then re-EXPLAIN
>
> You should first attempt to get the same plan, then confirm it really is
> faster before we worry why the optimizer hadn't picked that plan.
>
> We already know that MySQL favors nested loop joins, so turning up a
> plan that on this occasion is actually better that way is in no way
> representative of general performance. Does MySQL support hash joins?
>
> --
>  Simon Riggs   www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>
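
For concreteness, Simon's two suggested experiments might look like this in
psql (the index name is taken from the plan above; the query itself is
elided):

```sql
-- Experiment 1: rebuild the index used by the query, then re-check the plan
REINDEX INDEX history_ref_idx;
EXPLAIN ANALYZE <query>;

-- Experiment 2: disable hash joins for this session only, then re-check
SET enable_hashjoin = off;
EXPLAIN ANALYZE <query>;
SET enable_hashjoin = on;
```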

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Dimitri
The problem with "gprof" is that it profiles everything from the
beginning to the end, so a lot of function executions unrelated to this
query will be profiled...

Also, when you look at profiling technology in general - all such
solutions are based on the system clock frequency and have limits on
time resolution. On my system this limit is 0.5ms, and it's too coarse
compared to the query execution time :-)

So, what I've done is change the reference key criteria slightly from
= '01' to < '51', so instead of 20 rows I now have 1000
rows of output. It's still slower than InnoDB (12ms vs 9ms), but at
least it can be profiled (well, we are also probably moving away from
the original problem, as time may now be spent mostly on output traffic,
but I tried anyway) - I made a loop of 100 iterations of this query
which reads but does not print data. The total execution time of this
loop is 1200ms, and curiously it did not really change under profiling.
The profiler was able to catch 733ms of the total execution time (if I
understand correctly, all functions running faster than 0.5ms remain
un-profiled). The top profiler output is here:

Excl. Incl.  Name
User CPU  User CPU
 sec.  sec.
0.733 0.733  
0.103 0.103  memcpy
0.045 0.045  slot_deform_tuple
0.037 0.040  AllocSetAlloc
0.021 0.021  AllocSetFree
0.018 0.037  pfree
0.018 0.059  appendBinaryStringInfo
0.017 0.031  heap_fill_tuple
0.017 0.017  _ndoprnt
0.016 0.016  nocachegetattr
0.015 0.065  heap_form_minimal_tuple
0.015 0.382  ExecProcNode
0.015 0.015  strlen
0.014 0.037  ExecScanHashBucket
0.014 0.299  printtup
0.013 0.272  ExecHashJoin
0.011 0.011  enlargeStringInfo
0.011 0.086  index_getnext
0.010 0.010  hash_any
0.009 0.076  FunctionCall1
0.009 0.037  MemoryContextAlloc
0.008 0.008  LWLockAcquire
0.007 0.069  pq_sendcountedtext
0.007 0.035  ExecProject
0.007 0.127  ExecScan
...

Curiously, "memcpy" is at the top. I don't know how much impact it has
in general, but it probably makes sense to see if it can be optimized, etc..

Rgds,
-Dimitri



On 5/7/09, Euler Taveira de Oliveira  wrote:
> Dimitri wrote:
> >> BTW, is there already an integrated profiler within the code? or do I
>> need external tools?..
>>
> Postgres provides support for profiling. Add --enable-profiling flag. Use
> gprof to get the profile.
>
>
> --
>   Euler Taveira de Oliveira
>   http://www.timbira.com/
>
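
For reference, the gprof workflow Euler describes might look roughly like
this (paths are placeholders; adjust to your own build and data directory):

```shell
# Build a profiling-enabled server
./configure --enable-profiling
make && make install

# Run the workload, then shut the backend down cleanly so gmon.out is
# written, and feed it to gprof together with the postgres binary:
gprof /usr/local/pgsql/bin/postgres $PGDATA/gmon.out > profile.txt
```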



Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
> Tom Lane wrote:
> > Alvaro Herrera  writes:
> >> I think there should be a way to refer to individual partitions as
> >> objects.
> > 
> > Yeah, the individual partitions should be nameable tables, otherwise we
> > will be reinventing a *whole* lot of management stuff to little gain.
> > I don't actually think there is anything wrong with using table
> > inheritance as the basic infrastructure --- I just want more smarts
> > about one particular use pattern of inheritance.
> 
> Maybe it's worth examining and documenting existing partition setups,
> the reasoning behind them, and how they're implemented, in order to
> guide any future plans for native partitioning support?
> 
> Maybe that's already been/being done. On the off chance that it's not:
> 
> Ones I can think of:
> 
> - Partitioning an equally active dataset by ranges over a key to improve
>  scan performance, INSERT/UPDATE costs on indexes, locking issues, etc.
> 
> - The "classic" active/archive partition scheme where there's only one
> partition growing at any one time, and the others are historical data
> that's nowhere near as "hot".
> 
> - A variant on the basic active/archive structure, where query activity
> decreases slowly over time and there are many partitions of recent data.
> Partitions are merged into larger ones as they age, somewhat like a RRD
> database.
> 
> I also expect that in the future there will be demand for striping data
> across multiple partitions in different tablespaces to exploit
> in-parallel scanning (when/if supported) for better I/O utilization in
> multiple-disk-array situations. For example, partitioning on
> "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent
> scans, one per partition, to satisfy a query.

That's a good summary. It has already been documented and discussed, but
saying it again and again is the best way to get this across.

You've highlighted that partitioning is a feature with many underlying
requirements: infrequent access to data (frequently historical),
striping for parallelism and getting around RDBMS flaws (if any). We
must be careful to implement each requirement in full, yet separately,
so we don't end up with 60% functionality in each case by delivering an
average or least common denominator solution.
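
As a concrete reminder of what the inheritance-based infrastructure Tom
refers to looks like in the 8.x era (table and column names here are made
up):

```sql
-- Parent table plus one range partition, enforced by a CHECK constraint
CREATE TABLE measurement (logdate date NOT NULL, reading int);

CREATE TABLE measurement_y2009m05 (
    CHECK (logdate >= DATE '2009-05-01' AND logdate < DATE '2009-06-01')
) INHERITS (measurement);

-- With constraint_exclusion enabled, the planner skips children whose
-- CHECK constraints contradict the query's WHERE clause
SET constraint_exclusion = on;
SELECT * FROM measurement WHERE logdate = DATE '2009-05-07';
```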

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] GiST index performance

2009-05-07 Thread Oleg Bartunov

On Wed, 6 May 2009, Tom Lane wrote:


Matthew Wakeling  writes:

Here is my patch ported over to the seg contrib package, attached. Apply
it to seg.c and all should be well. A similar thing needs to be done to
cube, but I haven't looked at that.


Teodor, Oleg, do you intend to review/apply this patch?


Tom,

I just returned from a trek around Annapurna and have only just learned
about Matthew's experiments. Teodor is on holidays and will be available
after May 11, and then there should be PGCon, so if it can wait, we
could look at this after PGCon.

Matthew, did you try various data? From our experience, there can be
various corner cases.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Merlin Moncure
On Thu, May 7, 2009 at 4:20 AM, Dimitri  wrote:
> Hi Simon,
>
> may you explain why REINDEX may help here?.. - database was just
> created, data loaded, and then indexes were created + analyzed.. What
> may change here after REINDEX?..
>
> With hashjoin disabled was a good try!
> Running this query "as it" from 1.50ms we move to 0.84ms now,
> and the plan is here:
>
>                                                                      QUERY 
> PLAN
> --
>  Sort  (cost=4562.83..4568.66 rows=2329 width=176) (actual
> time=0.225..0.229 rows=20 loops=1)
>   Sort Key: h.horder
>   Sort Method:  quicksort  Memory: 30kB
>   ->  Merge Join  (cost=4345.89..4432.58 rows=2329 width=176) (actual
> time=0.056..0.205 rows=20 loops=1)
>         Merge Cond: (s.ref = h.ref_stat)
>         ->  Index Scan using stat_ref_idx on stat s
> (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079
> rows=193 loops=1)
>         ->  Sort  (cost=4345.89..4351.72 rows=2329 width=135) (actual
> time=0.041..0.043 rows=20 loops=1)
>               Sort Key: h.ref_stat
>               Sort Method:  quicksort  Memory: 30kB
>               ->  Index Scan using history_ref_idx on history h
> (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024
> rows=20 loops=1)
>                     Index Cond: (ref_object = '01'::bpchar)
>  Total runtime: 0.261 ms
> (12 rows)
>
> Curiously planner expect to run it in 0.26ms
>
> Any idea why planner is not choosing this plan from the beginning?..
> Any way to keep this plan without having a global or per sessions
> hashjoin disabled?..

can you work prepared statements into your app?  turn off hash join,
prepare the query, then turn it back on.
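
Merlin's suggestion might be sketched as follows (the query itself is
elided; in 8.3 the plan is fixed at PREPARE time, so later executions keep
the merge-join plan even after hash joins are re-enabled):

```sql
SET enable_hashjoin = off;
PREPARE myplan AS <query>;   -- planned now, with hash joins disabled
SET enable_hashjoin = on;
EXECUTE myplan;              -- reuses the merge-join plan
```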

merlin



Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Gregory Stark
Simon Riggs  writes:

> We already know that MySQL favors nested loop joins

From what I read I thought that was the *only* type of join MySQL supports.

The big picture view here is that whether we run a short query in half a
millisecond versus two milliseconds is usually not really important. It could
matter if you're concerned with how many transactions/s you can run on a busy
server -- but that's not exactly the same thing, and in that case you should
really measure that directly.

It would be nice if we were in the same ballpark as MySQL, but we would only be
interested in such optimizations if they don't come at the expense of
scalability under more complex workloads.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 12:58 +0100, Gregory Stark wrote:

> It would be nice if we were in the same ballpark as MySQL, but we would only be
> interested in such optimizations if they don't come at the expense of
> scalability under more complex workloads.

It doesn't appear there is a scalability issue here at all.

Postgres can clearly do the same query in about the same time.

We just have a case where MySQL happens to optimise it well and Postgres
doesn't. Since we can trivially design cases that show the opposite I'm
not worried too much. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] GiST index performance

2009-05-07 Thread Matthew Wakeling

On Thu, 7 May 2009, Oleg Bartunov wrote:

Did you try the Guttman quadratic split algorithm? We also found a linear
split algorithm for the R-tree.


The existing (bugfixed) seg split algorithm is the Guttman quadratic split 
algorithm. Guttman did all his work on two-dimensional and above data, 
dismissing one-dimensional data as being handled adequately by B-trees, 
which is not true for segment overlaps. It turns out that the algorithm 
has a weakness with certain types of data, and one-dimensional data is 
almost certain to exercise that weakness. The greater the number of 
dimensions, the less the weakness is exercised.


The problem is that the algorithm does not calculate a split pivot. 
Instead it finds two suitable entries, and adds the remaining entries to 
those two in turn. This can lead to the majority of the entries being 
added to just one side. In fact, I saw lots of cases where 367 entries 
were being split into two pages of 366 and one entry.


Guttman's linear split algorithm has the same weakness.
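
To illustrate the weakness Matthew describes, here is a toy Python sketch
(not the actual seg.c code, and simplified: entries are assigned in input
order rather than by maximum preference difference) of a Guttman-style
quadratic split on one-dimensional segments. With clustered data, the
greedy assignment piles almost everything onto one group:

```python
# Toy sketch of a Guttman-style quadratic split on 1-D segments (lo, hi).
# Illustrates how greedy assignment after seed-picking can produce a
# badly skewed split; NOT the seg.c implementation.

def mbr(a, b):
    """Bounding interval of two segments."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def waste(a, b):
    """Dead space if a and b were grouped together."""
    u = mbr(a, b)
    return (u[1] - u[0]) - (a[1] - a[0]) - (b[1] - b[0])

def quadratic_split(entries):
    # Seeds: the pair that would waste the most space together.
    i, j = max(((i, j) for i in range(len(entries))
                for j in range(i + 1, len(entries))),
               key=lambda p: waste(entries[p[0]], entries[p[1]]))
    g1, b1 = [entries[i]], entries[i]
    g2, b2 = [entries[j]], entries[j]
    for k, e in enumerate(entries):
        if k in (i, j):
            continue
        # Assign each remaining entry to the group whose bound grows least.
        grow1 = (mbr(b1, e)[1] - mbr(b1, e)[0]) - (b1[1] - b1[0])
        grow2 = (mbr(b2, e)[1] - mbr(b2, e)[0]) - (b2[1] - b2[0])
        if grow1 <= grow2:
            g1.append(e); b1 = mbr(b1, e)
        else:
            g2.append(e); b2 = mbr(b2, e)
    return g1, g2

# 50 clustered overlapping segments plus one far-away outlier:
entries = [(i, i + 2) for i in range(50)] + [(1000, 1001)]
g1, g2 = quadratic_split(entries)
print(len(g1), len(g2))  # -> 50 1: the outlier gets a page to itself
```

The real algorithm picks the next entry by maximum preference difference
rather than input order, but the failure mode Matthew observed (366-vs-1
splits) is the same.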

One thing I am seeing is a really big difference in performance between 
Postgres/GiST and a Java implementation I have written, using the same 
algorithms. Postgres takes three minutes to perform a set of index lookups 
while java takes six seconds. The old version of bioseg took an hour. I 
can't see anything in the GiST support code that could account for this.


Is the number of index lookups different, or is just the index lookup time
very big?


Same number of index lookups. Same algorithms. I have a set of 681879 
segments, and I load them all into the index. I then query the index for 
overlaps for each one in turn. For some reason, GiST lookups seem to be 
slow, even if they are using a good algorithm. I have seen that problem 
with btree_gist on integers too. I can't see any reason for this in the 
GiST code - it all seems pretty tight to me. We probably need to do some 
profiling.


Matthew

--
I suppose some of you have done a Continuous Maths course. Yes? Continuous
Maths?  Whoah, it was like that, was it!
   -- Computer Science Lecturer



[PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
the 'datatable' in the example below although in order to improve
performance this table is partitioned (by date range) into a number of
partition tables.  Each partition contains up to 20GB of data (tens of
millions of rows), with an additional ~3GB of indexes, all this is
served off a fairly high performance server (8 core 32Gb, with FC
attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
RHEL 5.2)

This has been working reasonably well, however in the last few days
I've been seeing extremely slow performance on what are essentially
fairly simple 'index hitting' selects on this data.  From the host
side I see that the postgres query process is mostly in IO wait,
however there is very little data actually being transferred (maybe
2-4 MB/s) - when a different query (say a select count(*) from
datatable) will yield a sustained 150+ MB/s.  There have been no
configuration changes during this time, although of course the
database has grown as data is added on a daily basis.

I'm not sure of the best way to diagnose this issue - the possible
causes I can think of are:

1. Problem with random versus sequential reads on storage system.
2. 'Something' with PostgreSQL itself.
3. Problem with the host environment - one suspicion I have here is
that we are >90% full on the storage drives (ext3), I'm not sure if
that is impacting performance.

Any thoughts as to how to proceed from here would be very welcome.

Here is an example query plan - looks reasonable to me, seems is
making use of the indexes and the constraint exclusion on the
partition tables:

Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
 ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
   Filter: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete)
THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN
(obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true
WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
 ->  Index Scan using
datatable_20090328_customeriddatapointdate_idx on datatable_20090328
sum  (cost=0.00..542433.51 rows=180 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090404_customeriddatapointdate_idx on datatable_20090404
sum  (cost=0.00..1322098.74 rows=405 width=48)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090411_customeriddatapointdate_idx on datatable_20090411
sum  (cost=0.00..1612744.29 rows=450 width=48)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090418_customeriddatapointdate_idx on datatable_20090418
sum  (cost=0.00..1641913.58 rows=469 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090425_customeriddatapointdate_idx on datatable_20090425
sum  (cost=0.00..1334164.80 rows=389 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2

Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Scott Mead
On Thu, May 7, 2009 at 10:14 AM, David Brain  wrote:

> Hi,
>
> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
> the 'datatable' in the example below although in order to improve
> performance this table is partitioned (by date range) into a number of
> partition tables.  Each partition contains up to 20GB of data (tens of
> millons of rows), with an additional ~3GB of indexes, all this is
> served off a fairly high performance server (8 core 32Gb, with FC
> attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
> RHEL 5.2)
>
> This has been working reasonably well, however in the last few days
> I've been seeing extremely slow performance on what are essentially
> fairly simple 'index hitting' selects on this data.


   Have you re-indexed any of your partitioned tables?  If your index is
fragmented, you'll be incurring extra I/Os per index access.  Take a look
at the pgstattuple contrib module for some functions to determine index
fragmentation.  You can also take a look at the pg_stat_all_indexes tables.
If your number of tuples fetched is 100x more than your idx_scan, you *may*
consider reindexing.
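
The ratio Scott mentions can be eyeballed with something like this (a
sketch; the 100x threshold is arbitrary):

```sql
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_fetch
FROM pg_stat_all_indexes
WHERE idx_scan > 0
  AND idx_tup_fetch > 100 * idx_scan
ORDER BY idx_tup_fetch DESC;
```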

--Scott


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Matthew Wakeling

On Thu, 7 May 2009, David Brain wrote:

This has been working reasonably well, however in the last few days
I've been seeing extremely slow performance on what are essentially
fairly simple 'index hitting' selects on this data.  From the host
side I see that the postgres query process is mostly in IO wait,
however there is very little data actually being transferred (maybe
2-4 MB/s) - when a different query (say a select count(*) form
datatable) will yield a sustained 150+ MB/s.


Has there been a performance *change*, or are you just concerned about a 
query which doesn't seem to use "enough" disc bandwidth?



1. Problem with random versus sequential reads on storage system.


Certainly random access like this index scan can be extremely slow. 2-4 
MB/s is quite reasonable if you're fetching one 8kB block per disc seek - 
no more than 200 per second.
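
The arithmetic behind that estimate, as a quick sanity check:

```python
# One random 8 kB page read per disc seek, ~200 seeks/s for a single disc
seeks_per_sec = 200
block_bytes = 8 * 1024            # PostgreSQL default page size
mb_per_sec = seeks_per_sec * block_bytes / (1024 * 1024)
print(round(mb_per_sec, 2))       # -> 1.56, the same order as the observed 2-4 MB/s
```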



3. Problem with the host environment - one suspicion I have here is
that we are >90% full on the storage drives (ext3), I'm not sure if
that is impacting performance.


One concern I might have with a big setup like that is how big the 
database directory has got, and whether directory lookups are taking time. 
Check to see if you have the directory_index option enabled on your ext3 
filesystem.
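
Whether dir_index is enabled can be checked with tune2fs (the device name
here is a placeholder):

```shell
# Look for "dir_index" in the feature list of the ext3 filesystem
tune2fs -l /dev/sdb1 | grep dir_index
# It can be enabled later with: tune2fs -O dir_index /dev/sdb1
# (existing directories then need an e2fsck -D pass to be re-indexed)
```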


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like "ha-ha-ha", but in a sympathetic tone of voice 
   -- Computer Science Lecturer




Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Interesting, for one index on one partition:

idx_scan: 329
idx_tup_fetch: 8905730

So maybe a reindex would help?

David.

On Thu, May 7, 2009 at 10:26 AM, Scott Mead
 wrote:
> On Thu, May 7, 2009 at 10:14 AM, David Brain  wrote:
>>
>> Hi,
>>
>> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
>> the 'datatable' in the example below although in order to improve
>> performance this table is partitioned (by date range) into a number of
>> partition tables.  Each partition contains up to 20GB of data (tens of
>> millons of rows), with an additional ~3GB of indexes, all this is
>> served off a fairly high performance server (8 core 32Gb, with FC
>> attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
>> RHEL 5.2)
>>
>> This has been working reasonably well, however in the last few days
>> I've been seeing extremely slow performance on what are essentially
>> fairly simple 'index hitting' selects on this data.
>
>    Have you re-indexed any of your partitioned tables?  If you're index is
> fragmented, you'll be incurring extra I/O's per index access.  Take a look
> at the pgstattuple contrib for some functions to determine index
> fragmentation.  You can also take a look at the pg_stat_all_indexes tables.
> If your number of tup's fetched is 100 x more than your idx_scans, you *may*
> consider reindexing.
>
> --Scott
>
>



-- 
David Brain
dbr...@bandwidth.com
919.297.1078



Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Some answers in-line:

>
> Has there been a performance *change*, or are you just concerned about a
> query which doesn't seem to use "enough" disc bandwidth?

Performance has degraded noticeably over the past few days.

> Certainly random access like this index scan can be extremely slow. 2-4 MB/s
> is quite reasonable if you're fetching one 8kB block per disc seek - no more
> than 200 per second.

We have read ahead set pretty aggressively high as the SAN seems to
'like' this, given some testing we did:

/sbin/blockdev --getra /dev/sdb
16384


> One concern I might have with a big setup like that is how big the database
> directory has got, and whether directory lookups are taking time. Check to
> see if you have the directory_index option enabled on your ext3 filesystem.
>

That's a thought, I doubt the option is set (I didn't set it and I
don't _think_ rhel does by default), however the 'base' directory only
contains ~5500 items total, so it's not getting too out of hand.

David



Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Nikolas Everett
>
> Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
>   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
> ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
>   Filter: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete)
> THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN
> (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true
> WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
> ->  Index Scan using
> datatable_20090328_customeriddatapointdate_idx on datatable_20090328
> sum  (cost=0.00..542433.51 rows=180 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090404_customeriddatapointdate_idx on datatable_20090404
> sum  (cost=0.00..1322098.74 rows=405 width=48)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090411_customeriddatapointdate_idx on datatable_20090411
> sum  (cost=0.00..1612744.29 rows=450 width=48)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090418_customeriddatapointdate_idx on datatable_20090418
> sum  (cost=0.00..1641913.58 rows=469 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090425_customeriddatapointdate_idx on datatable_20090425
> sum  (cost=0.00..1334164.80 rows=389 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
>   ->  Index Scan using pk_cdrextension on cdrextension ext
> (cost=0.00..4.77 rows=1 width=8)
> Index Cond: (sum.id = ext.datatableid)
>
>
Something doesn't look right.  Why is it doing an index scan on
datatable_20090404 when the constraint for that table places it entirely
within the date range?  Shouldn't it just seq scan the partition or use
the partition's customerid index?


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Matthew Wakeling

On Thu, 7 May 2009, David Brain wrote:

Certainly random access like this index scan can be extremely slow. 2-4 MB/s
is quite reasonable if you're fetching one 8kB block per disc seek - no more
than 200 per second.


We have read ahead set pretty aggressively high as the SAN seems to
'like' this, given some testing we did:

/sbin/blockdev --getra /dev/sdb
16384


Read-ahead won't really help with completely random access.

I think a much more interesting line of enquiry will be trying to work out 
what has changed, and why it was fast before.


How much of the data you're accessing are you expecting to be in the OS 
cache?


Is the table you're index scanning on ordered at all? Could that have 
changed recently?



That's a thought, I doubt the option is set (I didn't set it and I
don't _think_ rhel does by default), however the 'base' directory only
contains ~5500 items total, so it's not getting too out of hand.


I think quite a few systems do set it by default now.

Matthew

--
Me... a skeptic?  I trust you have proof?



Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Alvaro Herrera
Dimitri wrote:

> As well, when you look at profiling technology - all such solutions are
> based on the system clock frequency and have limits on their time
> resolution. On my system this limit is 0.5ms, and that's too big compared
> to the query execution time :-)
> 
> So, what I've done - I changed the reference key criterion a little, from
> = '01' to < '51', so instead of 20 rows I now have 1000
> rows of output,

Another thing you can try is to run the query several times (1 or so).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Nikolas Everett
On Thu, May 7, 2009 at 11:19 AM, Matthew Wakeling wrote:

> On Thu, 7 May 2009, David Brain wrote:
>
>> Certainly random access like this index scan can be extremely slow. 2-4
>>> MB/s
>>> is quite reasonable if you're fetching one 8kB block per disc seek - no
>>> more
>>> than 200 per second.
>>>
>>
>> We have read ahead set pretty aggressively high as the SAN seems to
>> 'like' this, given some testing we did:
>>
>> /sbin/blockdev --getra /dev/sdb
>> 16384
>>
>
> Read-ahead won't really help with completely random access.


That's a shame, because it would be really nice to get the entire index into
shared memory or OS cache.  Most of the time queries are on data in the past
few months.  All of the indexes in the past few months should fit in cache.

Did something happen to get those indexes flushed from the cache?  Were they
in the cache before?


> I think a much more interesting line of enquiry will be trying to work out
> what has changed, and why it was fast before.
>
> How much of the data you're accessing are you expecting to be in the OS
> cache?
>
> Is the table you're index scanning on ordered at all? Could that have
> changed recently?


I wrote the application that puts data in that table.  It's sort of ordered
by that timestamp.  Every five minutes it adds rows in no particular order
that need to be added.  The rows that need to be added every five minutes
are ordered by another timestamp that is correlated to but not the same as
the indexed timestamp.


>
>
>  That's a thought, I doubt the option is set (I didn't set it and I
>> don't _think_ rhel does by default), however the 'base' directory only
>> contains ~5500 items total, so it's not getting too out of hand.
>>
>
> I think quite a few systems do set it by default now.
>
> Matthew
>
> --
> Me... a skeptic?  I trust you have proof?


[PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread David Blewett
I have a query [1] that Postgres is insisting on using a Nested Loop
for some reason when a Hash Join is much faster. It seems like the
estimates are way off. I've set default_statistics_target to 250, 500,
1000 and analyzed, but they never seem to improve. If I disable
nestloops, the query completes in around 3-5s. With them enabled, it
takes anywhere from 45 to 60 seconds. Here is the DDL for the tables
and the month_last_day function [4].
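For reference, the knobs discussed here look like this in 8.3 (the table and
column names below are placeholders, not the poster's schema):

```sql
-- Per-column statistics target, which overrides default_statistics_target:
ALTER TABLE answers ALTER COLUMN question_id SET STATISTICS 1000;
ANALYZE answers;

-- Compare plans with nested loops ruled out, for this session only:
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT ...;  -- the query from [1]
RESET enable_nestloop;
```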

Any help would be appreciated!

David Blewett

1. http://dpaste.com/hold/41842/
2. http://explain.depesz.com/s/Wg
3. http://explain.depesz.com/s/1s
4. http://dpaste.com/hold/41846/



Re: [PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread David Blewett
On Thu, May 7, 2009 at 12:53 PM, David Blewett  wrote:
> 1. http://dpaste.com/hold/41842/
> 2. http://explain.depesz.com/s/Wg
> 3. http://explain.depesz.com/s/1s
> 4. http://dpaste.com/hold/41846/

Forgot to mention that I'm using Postgres 8.3.6 on linux 2.6.24.
Shared buffers are set to 1GB, effective_cache_size is set to 3GB.
Server has 6GB RAM, running on a SCSI 4-disk RAID10.

David Blewett



Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Scott Carey



On 5/7/09 1:54 AM, "Simon Riggs"  wrote:

> 
> 
> On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
>> Tom Lane wrote:
>> 
>> I also expect that in the future there will be demand for striping data
>> across multiple partitions in different tablespaces to exploit
>> in-parallel scanning (when/if supported) for better I/O utilization in
>> multiple-disk-array situations. For example, partitioning on
>> "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent
>> scans, one per partition, to satisfy a query.
> 
> That's a good summary. It has already been documented and discussed, but
> saying it again and again is the best way to get this across.
> 
> You've highlighted that partitioning is a feature with many underlying
> requirements: infrequent access to data (frequently historical),

Actually, infrequent access is not a requirement.  It is a common
requirement however.

Take for instance, a very large set of data that contains an integer column
'type_id' that has about 200 distinct values.  The data is accessed with a
strict 'type_id = X' requirement 99.9% of the time.  If this was one large
table, then scans of all sorts become much more expensive than if it is
partitioned on 'type_id'.  Furthermore, partitioning on type_id removes the
requirement to even index on this value.  Statistics on each partition may
vary significantly, and the planner can thus adapt to changes in the data
per value of type_id naturally.

The raw need is not "infrequent access" but highly partitioned access.  It
doesn't matter if your date-partitioned data is accessed evenly across all
dates or skewed to the most frequent -- it matters that you are almost
always accessing by small date ranges.
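In 8.3-era PostgreSQL that layout would be spelled with inheritance plus
CHECK constraints, roughly as follows (table and column names invented for
illustration):

```sql
CREATE TABLE events (type_id integer NOT NULL, payload text);

-- One child per type_id value (or per small group of values):
CREATE TABLE events_type_1 (CHECK (type_id = 1)) INHERITS (events);
CREATE TABLE events_type_2 (CHECK (type_id = 2)) INHERITS (events);
-- ... and so on for the other values ...

SET constraint_exclusion = on;

-- The 'type_id = X' case touches exactly one child, no index on type_id
-- needed, and each child keeps its own statistics:
EXPLAIN SELECT * FROM events WHERE type_id = 2;
```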

> striping for parallelism and getting around RDBMS flaws (if any). We
> must be careful to implement each requirement in full, yet separately,
> so we don't end up with 60% functionality in each case by delivering an
> average or least common denominator solution.
> 
> --
>  Simon Riggs   www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
> 




Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Dimitri
I've simply restarted a full test with hashjoin OFF. Up to 32
concurrent users things go well, but from 32 users on the response
time jumps to 20ms, with 64 users it's higher again, and with 256
users it reaches 700ms, so TPS drops from 5,000 to ~200..

With hashjoin ON this doesn't happen, and I'm reaching at least 11,000
TPS on 32 fully busy cores.

I should not use prepare/execute as the test conditions should remain "generic".

About the scalability issue - there is one in 8.3.7: on 32 cores
with this kind of load it uses only 50% CPU and doesn't surpass
6,000 TPS, while 8.4 uses 90% CPU and reaches 11,000 TPS..

At the same time, while comparing 8.3 and 8.4, the response time
is 2 times lower in 8.4, and it seems to me the main gain for 8.4 is
here.

I'll publish all the details, I just need some time :-)

Rgds,
-Dimitri

On 5/7/09, Merlin Moncure  wrote:
> On Thu, May 7, 2009 at 4:20 AM, Dimitri  wrote:
>> Hi Simon,
>>
>> may you explain why REINDEX may help here?.. - database was just
>> created, data loaded, and then indexes were created + analyzed.. What
>> may change here after REINDEX?..
>>
>> With hashjoin disabled was a good try!
>> Running this query "as is", from 1.50ms we move to 0.84ms now,
>> and the plan is here:
>>
>>  QUERY
>> PLAN
>> --
>>  Sort  (cost=4562.83..4568.66 rows=2329 width=176) (actual
>> time=0.225..0.229 rows=20 loops=1)
>>   Sort Key: h.horder
>>   Sort Method:  quicksort  Memory: 30kB
>>   ->  Merge Join  (cost=4345.89..4432.58 rows=2329 width=176) (actual
>> time=0.056..0.205 rows=20 loops=1)
>> Merge Cond: (s.ref = h.ref_stat)
>> ->  Index Scan using stat_ref_idx on stat s
>> (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079
>> rows=193 loops=1)
>> ->  Sort  (cost=4345.89..4351.72 rows=2329 width=135) (actual
>> time=0.041..0.043 rows=20 loops=1)
>>   Sort Key: h.ref_stat
>>   Sort Method:  quicksort  Memory: 30kB
>>   ->  Index Scan using history_ref_idx on history h
>> (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024
>> rows=20 loops=1)
>> Index Cond: (ref_object = '01'::bpchar)
>>  Total runtime: 0.261 ms
>> (12 rows)
>>
>> Curiously, the planner expects to run it in 0.26ms
>>
>> Any idea why planner is not choosing this plan from the beginning?..
>> Any way to keep this plan without having a global or per sessions
>> hashjoin disabled?..
>
> can you work prepared statements into your app?  turn off hash join,
> prepare the query, then turn it back on.
>
> merlin
>



Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 20:36 +0200, Dimitri wrote:

> I've simply restarted a full test with hashjoin OFF. Up to 32
> concurrent users things go well, but from 32 users on the response
> time jumps to 20ms, with 64 users it's higher again, and with 256
> users it reaches 700ms, so TPS drops from 5,000 to ~200..
> 
> With hashjoin ON this doesn't happen, and I'm reaching at least 11,000
> TPS on 32 fully busy cores.

Much better to stick to the defaults. 

Sounds like a problem worth investigating further, but not pro bono.

> About the scalability issue - there is one in 8.3.7: on 32 cores
> with this kind of load it uses only 50% CPU and doesn't surpass
> 6,000 TPS, while 8.4 uses 90% CPU and reaches 11,000 TPS..

Yeh, small changes make a big difference. Thanks for the info.

How does MySQL perform?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread Tom Lane
David Blewett  writes:
> On Thu, May 7, 2009 at 12:53 PM, David Blewett  wrote:
>> 1. http://dpaste.com/hold/41842/
>> 2. http://explain.depesz.com/s/Wg
>> 3. http://explain.depesz.com/s/1s
>> 4. http://dpaste.com/hold/41846/

> Forgot to mention that I'm using Postgres 8.3.6 on linux 2.6.24.

Well, the reason it likes the nestloop plan is the estimate of just one
row out of the lower joins --- that case is pretty much always going to
favor a nestloop over other kinds of joins.  If it were estimating even
as few as ten rows out, it'd likely switch to a different plan.  So the
question to ask is why the rowcount estimates are so abysmally bad.
You mentioned having tried to increase the stats targets, but without
seeing the actual stats data it's hard to speculate about this.

regards, tom lane



Re: [PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread David Blewett
On Thu, May 7, 2009 at 4:31 PM, Tom Lane  wrote:
> as few as ten rows out, it'd likely switch to a different plan.  So the
> question to ask is why the rowcount estimates are so abysmally bad.
> You mentioned having tried to increase the stats targets, but without
> seeing the actual stats data it's hard to speculate about this.

How do I get that data for you?

David



Re: [PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread Tom Lane
David Blewett  writes:
> On Thu, May 7, 2009 at 4:31 PM, Tom Lane  wrote:
>> as few as ten rows out, it'd likely switch to a different plan.  So the
>> question to ask is why the rowcount estimates are so abysmally bad.
>> You mentioned having tried to increase the stats targets, but without
>> seeing the actual stats data it's hard to speculate about this.

> How do I get that data for you?

Look into pg_stats for the rows concerning the columns used in the
query's WHERE and JOIN/ON clauses.
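A query along these lines pulls out the relevant rows (substitute the actual
table and column names from the WHERE/JOIN clauses; the names below are
placeholders):

```sql
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs, histogram_bounds
  FROM pg_stats
 WHERE schemaname = 'public'
   AND tablename  IN ('answers', 'questions')       -- placeholder names
   AND attname    IN ('question_id', 'survey_id');  -- placeholder names
```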

regards, tom lane



[PERFORM] Indexes not used in DELETE

2009-05-07 Thread Viktor Rosenfeld

Hi everybody,

I'm wondering why a DELETE statement of mine does not make use of the
indexes defined on the tables.


I have the following tables which are linked as such: component ->  
rank -> node -> corpus;


Now I want to delete all entries in component by giving a list of  
corpus ids.


The query is as such:

DELETE FROM component
USING corpus toplevel, corpus child, node, rank
WHERE toplevel.id IN (25) AND toplevel.top_level = 'y'
AND toplevel.pre <= child.pre AND toplevel.post >= child.pre
AND node.corpus_ref = child.id AND rank.node_ref = node.id AND  
rank.component_ref = component.id;


The table corpus is defined as such:

                 Table "public.corpus"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 id        | numeric(38,0)          | not null
 name      | character varying(100) | not null
 type      | character varying(100) | not null
 version   | character varying(100) |
 pre       | numeric(38,0)          | not null
 post      | numeric(38,0)          | not null
 top_level | boolean                | not null
Indexes:
    "corpus_pkey" PRIMARY KEY, btree (id)
    "corpus_post_key" UNIQUE, btree (post)
    "corpus_pre_key" UNIQUE, btree (pre)
    "idx_corpus__id_pre_post" btree (id, pre, post)
    "idx_corpus__pre_post" btree (pre, post)
    "idx_corpus__toplevel" btree (id) WHERE top_level = true


The query plan of the above statement looks like this:

                                   QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=708.81..4141.14 rows=9614 width=6)
   Hash Cond: (rank.component_ref = component.id)
   ->  Nested Loop  (cost=3.20..3268.07 rows=8373 width=8)
         ->  Hash Join  (cost=3.20..1306.99 rows=4680 width=8)
               Hash Cond: (node.corpus_ref = child.id)
               ->  Seq Scan on node  (cost=0.00..1075.63 rows=48363 width=14)
               ->  Hash  (cost=3.16..3.16 rows=3 width=27)
                     ->  Nested Loop  (cost=0.00..3.16 rows=3 width=27)
                           Join Filter: ((toplevel.pre <= child.pre) AND (toplevel.post >= child.pre))
                           ->  Seq Scan on corpus toplevel  (cost=0.00..1.39 rows=1 width=54)
                                 Filter: (top_level AND (id = 25::numeric))
                           ->  Seq Scan on corpus child  (cost=0.00..1.31 rows=31 width=54)
         ->  Index Scan using fk_rank_2_struct on rank  (cost=0.00..0.39 rows=2 width=16)
               Index Cond: (rank.node_ref = node.id)
   ->  Hash  (cost=390.27..390.27 rows=25227 width=14)
         ->  Seq Scan on component  (cost=0.00..390.27 rows=25227 width=14)
(16 rows)

Specifically, I'm wondering why the innermost scan on corpus  
(toplevel) does not use the index idx_corpus__toplevel and why the  
join between corpus (toplevel) and corpus (child) is not a merge join  
using the index corpus_pre_key to access the child table.


FYI, corpus.pre and corpus.post encode a corpus tree (or rather a  
forest) using a combined pre and post order.  This scheme guarantees  
that parent.post > child.post > child.pre for all edges parent ->  
child in the corpus tree. I'm using the same scheme elsewhere in  
SELECT statements and they work fine there.
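As a concrete illustration of the invariant, a descendant lookup in this
scheme is a pure range test, the same comparison the DELETE's join uses:

```sql
-- All nodes in the subtree rooted at corpus node 25:
SELECT child.*
  FROM corpus parent
  JOIN corpus child
    ON child.pre BETWEEN parent.pre AND parent.post
 WHERE parent.id = 25;
```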


Thanks,
Viktor



Re: [PERFORM] Bad Plan for Questionnaire-Type Query

2009-05-07 Thread David Blewett
On Thu, May 7, 2009 at 6:44 PM, Tom Lane  wrote:
> Look into pg_stats for the rows concerning the columns used in the
> query's WHERE and JOIN/ON clauses.

Okay, here you go:
http://rafb.net/p/20y8Oh72.html

David



Re: [PERFORM] Indexes not used in DELETE

2009-05-07 Thread Tom Lane
Viktor Rosenfeld  writes:
> ->  Seq Scan on corpus toplevel (cost=0.00..1.39 
> rows=1 width=54)
>   Filter: (top_level AND (id = 25::numeric))

> Specifically, I'm wondering why the innermost scan on corpus  
> (toplevel) does not use the index idx_corpus__toplevel

The cost estimate indicates that there are so few rows in corpus
that an indexscan would be a waste of time.

> and why the  
> join between corpus (toplevel) and corpus (child) is not a merge join  
> using the index corpus_pre_key to access the child table.

Same answer.  Populate the table and the plan will change.
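The size effect is easy to reproduce on a throwaway table (not the poster's
schema):

```sql
CREATE TABLE t (id numeric(38,0) PRIMARY KEY);
INSERT INTO t VALUES (25);
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE id = 25;  -- seq scan: the whole table is one page

INSERT INTO t SELECT g FROM generate_series(1, 100000) g WHERE g <> 25;
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE id = 25;  -- now an index scan on t_pkey
```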

regards, tom lane



Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Robert Haas
On Wed, May 6, 2009 at 6:08 PM, Simon Riggs  wrote:
> Agreed. Perhaps I should say then that the syntax needs to express the
> requirements of the planner/executor behaviour, rather than being the
> main aspect of the feature, as some have suggested.

Agreed.

> Hopefully, notions of partitioning won't be directly tied to chunking of
> data for parallel query access. Most queries access recent data and
> hence only a single partition (or stripe), so partitioning and
> parallelism are frequently exactly orthogonal.

Yes, I think those things are unrelated.

...Robert



Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread david

On Thu, 7 May 2009, Robert Haas wrote:


On Wed, May 6, 2009 at 6:08 PM, Simon Riggs  wrote:

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.


Agreed.


Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism are frequently exactly orthogonal.


Yes, I think those things are unrelated.


I'm not so sure (warning, I am relatively inexperienced in this area)

it sounds like you can take two basic approaches to partition a database

1. The Isolation Plan

   you want to have it so that your queries match your partitioning.

  this is with the goal of only having to query a small number of 
partitions, minimizing the total amount of data touched (including 
minimizing the number of indexes searched)


 this matches the use case mentioned above, with the partition based on 
date and only looking at the most recent date range.


2. The Load Balancing Plan

  you want to have your partitioning and your queries _not_ match as much 
as possible


  this is with the goal of having the query hit as many partitions as 
possible, so that the different parts of the search can happen in parallel



However, with either partitioning plan, you will have queries that 
degenerate to look like the other plan.


In the case of the isolation plan, you may need to search for all 
instances of a rare thing over the entire history (after all, if you never 
need to access that history, why do you pay for disks to store it? ;-)


and even when you are searching a narrow time window, it may still span 
multiple partitions. I have a log analysis setup using the Splunk 
proprietary database; it partitions by time, creating a new partition as the 
current one hits a configurable size (by default 10G on 64 bit systems). 
for my volume of logs I end up with each partition only covering a few 
hours. it's very common to want to search over a few days, which can be a 
few dozen partitions (this is out of many hundreds of partitions, so it's 
still a _huge_ win to narrow the timeframe)



In the case of the load balancing plan, you may run into a query that 
happens to only fall into one partition (the query matches your 
partitioning logic)





I think the only real difference is how common it is to need to search 
multiple partitions.


If the expectation is that you will frequently need to search most/all of 
the partitions (the load balancing plan), then it's a waste of time to 
analyse the query to try and figure out which partitions you need to look 
at.


If the expectation is that you will frequently only need to search a small 
number of the partitions (the isolation plan), then it's extremely valuable 
to spend as much time as needed working to analyse the query to try and 
figure out which partitions you need to look at.



I believe that the isolation plan is probably more common than the load 
balancing plan, but I don't see them as being that different from the 
database engine's point of view. To tune a system that can handle the 
isolation plan for load balancing, the key thing to do would be to have a 
knob to disable the partition planning, and just blindly send the search 
out to every partition.
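PostgreSQL already exposes roughly that knob: constraint_exclusion controls 
whether the planner spends time proving partitions irrelevant.

```sql
-- Isolation-plan workloads: prune partitions via their CHECK constraints.
SET constraint_exclusion = on;

-- Load-balancing-plan workloads: skip the proving work and plan against
-- all partitions blindly.
SET constraint_exclusion = off;
```

The proving work grows with the partition count, which is why being able to 
turn it off matters for the load-balancing case.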


David Lang
