On 1/13/15 9:42 PM, Amit Kapila wrote:
As an example, one strategy could be: if the table size is X MB and
there are 8 workers, then divide the work as X/8 MB for each worker
(which I have currently used in the patch). Another could be that each
worker scans one block at a time and then checks some global structure
to see which block it needs to scan next; as I see it, this could lead
to random scans.  I have read that some other databases also divide the
work based on partitions or segments (the size of a segment is not very
clear).

Long-term I think we'll want a mix of the two approaches. Simply doing 
something like blkno % num_workers is going to cause imbalances, but negotiating 
work assignment on a per-block basis seems like too much overhead.
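
To make that concrete, here's a hand-wavy sketch of the middle ground: workers 
claim a run of blocks at a time from a shared counter, so the synchronization 
cost is amortized over CHUNK_SIZE blocks but assignment still adapts to how 
fast each worker is going. The struct and function names are made up, not 
anything in Robert's patches:

#include "postgres.h"

#include "storage/block.h"
#include "storage/spin.h"

#define CHUNK_SIZE 64               /* blocks claimed per request */

/* made-up shared state; one instance would live in shared memory */
typedef struct ParallelScanState
{
    slock_t     mutex;              /* protects next_block */
    BlockNumber next_block;         /* next unassigned block */
    BlockNumber nblocks;            /* total blocks in the relation */
} ParallelScanState;

/*
 * Claim the next run of up to CHUNK_SIZE blocks.  Returns false once
 * the relation is exhausted; otherwise *start (inclusive) and *end
 * (exclusive) bound the claimed range.  Workers that finish a chunk
 * early just come back for another, so slow and fast workers balance
 * out without per-block locking.
 */
static bool
claim_block_chunk(ParallelScanState *pss,
                  BlockNumber *start, BlockNumber *end)
{
    bool        found = false;

    SpinLockAcquire(&pss->mutex);
    if (pss->next_block < pss->nblocks)
    {
        *start = pss->next_block;
        *end = Min(*start + CHUNK_SIZE, pss->nblocks);
        pss->next_block = *end;
        found = true;
    }
    SpinLockRelease(&pss->mutex);

    return found;
}

CHUNK_SIZE would want tuning, of course: too small and you're back to 
per-block overhead, too large and you reinvent the imbalance problem.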

Also long-term, I think we need to look at a more specialized version of 
parallelism at the IO layer. For example, during an index scan you'd really 
like to get IO requests for heap blocks started in the background while the 
backend is focused on the mechanics of the index scan itself. While this could 
be done with the stuff Robert has written, I have to wonder if it'd be a lot 
more efficient to use fadvise or AIO. Or perhaps it would just be better to 
deal with an entire index page (remembering TIDs) and then hit the heap.
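
Rough sketch of that index-page idea, using the existing PrefetchBuffer() 
hint (which boils down to posix_fadvise(POSIX_FADV_WILLNEED) where the 
platform supports it); the helper name is invented:

#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/itemptr.h"
#include "utils/rel.h"

/*
 * Invented helper: given the TIDs gathered from one index page, issue
 * prefetch hints for the heap blocks they point at before we actually
 * visit the heap.
 */
static void
prefetch_heap_blocks_for_tids(Relation heapRel,
                              ItemPointer tids, int ntids)
{
    BlockNumber last = InvalidBlockNumber;
    int         i;

    for (i = 0; i < ntids; i++)
    {
        BlockNumber blkno = ItemPointerGetBlockNumber(&tids[i]);

        /* skip consecutive duplicates; TIDs often cluster by block */
        if (blkno != last)
        {
            PrefetchBuffer(heapRel, MAIN_FORKNUM, blkno);
            last = blkno;
        }
    }
}

The actual heap reads stay synchronous with this; real AIO would go further, 
but the fadvise hint is cheap to experiment with.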

But I agree with Robert; there's a lot yet to be done just to get *any* kind of 
parallel execution working before we start thinking about how to optimize it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

