Hi,

In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns.

The patch does this:

1. for small relations (smaller than 60% of the buffer pool), use the current logic
2. for large relations (sketched below):
        - use a ring buffer in the heap scan
        - pin the first 12 pages when the scan starts
        - after every 4 pages are consumed, read and pin the next 4 pages
        - invalidate pages already used by the scan so they do not force other useful pages out of the buffer pool

4 files changed:
bufmgr.c, bufmgr.h, heapam.c, relscan.h
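
Roughly, the large-relation path works like the standalone sketch below. This is only an illustration of the policy described above, not the actual patch code; the type and function names, the constants' placement, and the logging stubs are all made up.

/*
 * Standalone sketch of the large-relation scan policy -- not the patch
 * code.  The 12-page initial window and 4-page refill follow the
 * description above; everything else is illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

#define INITIAL_WINDOW 12       /* pages pinned when the scan starts */
#define REFILL_BATCH    4       /* pages read and pinned per refill */

typedef struct ScanRing
{
    int next_to_read;           /* next block to read and pin */
    int next_to_consume;        /* next block handed to the scan */
    int nblocks;                /* relation size in blocks */
} ScanRing;

/* stand-ins for the buffer manager calls; here they only log */
static void read_and_pin(int blkno) { printf("pin  block %d\n", blkno); }
static void invalidate(int blkno)   { printf("drop block %d\n", blkno); }

static void
ring_start(ScanRing *ring, int nblocks)
{
    ring->nblocks = nblocks;
    ring->next_to_consume = 0;
    ring->next_to_read = 0;

    /* pin the first 12 pages up front */
    while (ring->next_to_read < nblocks && ring->next_to_read < INITIAL_WINDOW)
        read_and_pin(ring->next_to_read++);
}

static bool
ring_next(ScanRing *ring, int *blkno)
{
    /*
     * The previously returned page has been fully scanned; invalidate it
     * so it does not force other useful pages out of the buffer pool.
     */
    if (ring->next_to_consume > 0)
        invalidate(ring->next_to_consume - 1);

    if (ring->next_to_consume >= ring->nblocks)
        return false;

    *blkno = ring->next_to_consume++;

    /* after every 4 pages are consumed, read and pin the next 4 */
    if (ring->next_to_consume % REFILL_BATCH == 0)
    {
        int i;

        for (i = 0; i < REFILL_BATCH && ring->next_to_read < ring->nblocks; i++)
            read_and_pin(ring->next_to_read++);
    }

    return true;
}

int
main(void)
{
    ScanRing ring;
    int blkno;

    ring_start(&ring, 20);      /* pretend relation of 20 blocks */
    while (ring_next(&ring, &blkno))
        ;                       /* the heap scan would read tuples on blkno here */
    return 0;
}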

If there is interest, I can submit another scan patch that returns N tuples at a time instead of the current one-tuple-at-a-time interface. This improves code locality and further improves performance by another 10-20%.
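
As a rough illustration of what such a batched interface could look like (the function name, BATCH_SIZE, and the use of plain ints as stand-in "tuples" are all hypothetical, not a proposed API):

/*
 * Hypothetical sketch of an N-tuples-at-a-time scan interface, contrasted
 * with the current one-tuple-per-call loop.
 */
#include <stdio.h>

#define BATCH_SIZE 64           /* tuples returned per call */

typedef struct Scan { int next; int ntuples; } Scan;

/* fill a caller-supplied array with up to maxtuples tuples; return count */
static int
scan_getnext_batch(Scan *scan, int *tuples, int maxtuples)
{
    int n = 0;

    while (n < maxtuples && scan->next < scan->ntuples)
        tuples[n++] = scan->next++;
    return n;
}

int
main(void)
{
    Scan scan = {0, 200};       /* pretend relation of 200 tuples */
    int batch[BATCH_SIZE];
    long count = 0;
    int n, i;

    /*
     * The tight inner loop over the batch stays in cache, which is where
     * the claimed code-locality win over one-tuple-at-a-time comes from.
     */
    while ((n = scan_getnext_batch(&scan, batch, BATCH_SIZE)) > 0)
        for (i = 0; i < n; i++)
            count++;            /* process batch[i] here */

    printf("%ld tuples\n", count);
    return 0;
}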

For the TPC-H 1 GB tables, we are seeing a more than 20% improvement in scan times on the same hardware:

-------------------------------------------------------------------------
----- PATCHED VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
6001215
(1 row)

Time: 2117.025 ms

-------------------------------------------------------------------------
----- ORIGINAL CVS HEAD VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
  count
---------
6001215
(1 row)

Time: 2722.441 ms


Suggestions for improvement are welcome.

Regards,
-cktan
Greenplum, Inc.

On May 8, 2007, at 5:57 AM, Heikki Linnakangas wrote:

Luke Lonergan wrote:
>> What do you mean with using readahead inside the heapscan? Starting an async read request?
> Nope - just reading N buffers ahead for seqscans. Subsequent calls use
> previously read pages.  The objective is to issue contiguous reads to
> the OS in sizes greater than the PG page size (which is much smaller
> than what is needed for fast sequential I/O).

Are you filling multiple buffers in the buffer cache with a single read-call? The OS should be doing readahead for us anyway, so I don't see how just issuing multiple ReadBuffers one after each other helps.

> Yes, I think the ring buffer strategy should be used when the table size is > 1 x bufcache and the ring buffer should be of a fixed size smaller
> than L2 cache (32KB - 128KB seems to work well).

I think we want to let the ring grow larger than that for updating transactions and vacuums, though, to avoid the WAL flush problem.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
