Qingquing,

> First, I want to second Jonah's enthusiasm.  This is very exciting!

Me, three!  I didn't think this was ever going to come to Postgres absent 
major corporate funding.

> This is really only a gut feeling for me (it can't be otherwise, since
> we can't yet test), but I think parallelizing a single seqscan is
> pretty much guaranteed to do nothing, because seqscans, especially on
> large tables, are IO bound.

Actually, not true.  Our current seqscan performance suffers from 
producer-consumer fluctuation.  GreenPlum and Sun did a whole bunch of 
testing on this.

Basically reading a large table off disk does this:

read some table while not processing
process in cpu while not reading
read some more table while not processing
process some more in cpu while not reading
etc.

resulting in an I/O throughput graph that looks like:

    *       *       *
   * *    * *    * *
  *    * *    * *    *
 *       *       *      *

The really annoying part about this, for me personally, is that the peaks 
are significantly faster than comparable commercial DBMSes ... but our 
average is far lower.  So even on a single seq scan, parallel query 
execution would make a significant difference in performance, possibly as 
much as +75% on seq scans of large tables.
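
To make the read/process alternation concrete, here is a rough back-of-the-envelope model, not a measurement.  The function names and the per-chunk costs (4 time units to read, 3 to process) are made up purely for illustration; they are not taken from the GreenPlum/Sun testing.  It compares a scan that alternates reading and processing with one where reading chunk N+1 overlaps with processing chunk N.

```python
def alternating_time(n_chunks, read_cost, cpu_cost):
    """Total time when the scan reads a chunk, then processes it,
    and never does both at once (the sawtooth pattern above)."""
    return n_chunks * (read_cost + cpu_cost)


def overlapped_time(n_chunks, read_cost, cpu_cost):
    """Total time when reading the next chunk overlaps with processing
    the current one (a two-stage pipeline with a one-chunk buffer).
    The first read and the last processing step cannot be overlapped."""
    if n_chunks == 0:
        return 0
    return read_cost + (n_chunks - 1) * max(read_cost, cpu_cost) + cpu_cost


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units, chosen only for illustration.
    chunks, read_cost, cpu_cost = 100, 4, 3
    slow = alternating_time(chunks, read_cost, cpu_cost)
    fast = overlapped_time(chunks, read_cost, cpu_cost)
    print(f"alternating: {slow} units, overlapped: {fast} units")
    print(f"throughput gain: {100.0 * (slow / fast - 1):.0f}%")
```

In this toy model the gain depends entirely on how close the read cost and the CPU cost are to each other: when one of them dominates, overlapping buys very little, and when they are roughly equal the gain approaches 2x.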

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
