Re: [HACKERS] New Access Method

2007-03-04 Thread Alan Gatt

Also, any pointers to where I can find the operator classes and functions
for the GiST index?

Thanks

On 04/03/07, Alan Gatt [EMAIL PROTECTED] wrote:


Ok, so I am using GCC 3.2 as a compiler, and the following is the error
message:

make[4]: Entering directory
`/c/dev/postgres/pgsql_tip/src/backend/access/mdim'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g
-I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
-I../../../../src/include/port/win32 -DBUILDING_DLL  -c -o mdim.o mdim.c

make[4]: *** [mdim.o] Error 3
make[4]: Target `SUBSYS.o' not remade because of errors.
make[4]: Leaving directory
`/c/dev/postgres/pgsql_tip/src/backend/access/mdim'
make[3]: *** [mdim-recursive] Error 2

If I choose to develop it with GiST, do you know of any documentation which
can help me?

Thanks,
Alan


On 04/03/07, Alvaro Herrera  [EMAIL PROTECTED] wrote:

 Alan Gatt wrote:
  Hello,
 
  I am trying to create a new index which is multidimensional (based on
  R-Tree) and which can be used for data warehousing. I have read the
  Developers' manual about adding new indexes, and now I want to start
  coding the new index.

 Have you considered coding it using the GiST infrastructure?  Should be
 much easier, and automatically give crash safety, concurrency, and a
 shorter development time.

 --
 Alvaro Herrera
 http://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.





Re: [HACKERS] New Access Method

2007-03-04 Thread Martijn van Oosterhout
On Sun, Mar 04, 2007 at 09:32:58AM +0100, Alan Gatt wrote:
 Also, any pointers to where I can find the operator classes and functions
 for the GiST index?

The documentation is a good start:

http://www.postgresql.org/docs/8.1/static/gist.html

There are also plenty of examples in the contrib portion of the source
code.
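
For instance, contrib/seg ties its GiST support functions together roughly
like this (abbreviated here from memory; see contrib/seg for the
authoritative version):

CREATE OPERATOR CLASS gist_seg_ops
DEFAULT FOR TYPE seg USING gist AS
	OPERATOR	3	&& ,	-- overlaps
	FUNCTION	1	gseg_consistent (internal, seg, int4),
	FUNCTION	2	gseg_union (internal, internal),
	FUNCTION	3	gseg_compress (internal),
	FUNCTION	4	gseg_decompress (internal),
	FUNCTION	5	gseg_penalty (internal, internal, internal),
	FUNCTION	6	gseg_picksplit (internal, internal),
	FUNCTION	7	gseg_same (seg, seg, internal);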

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] ERROR: operator does not exist: integer !=- integer

2007-03-04 Thread Andrew - Supernews
On 2007-03-04, William ZHANG [EMAIL PROTECTED] wrote:
 Here are the steps to reproduce it in CVS HEAD:
 backend> select -1 !=-1;

This arguably isn't a bug, because != is not a standard SQL operator, and
therefore !=- can legitimately be defined as a single operator by the user.
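
For example, nothing stops a user from creating exactly such an operator
(a contrived sketch; the support function name here is made up):

CREATE FUNCTION noteqneg(integer, integer) RETURNS boolean
    AS 'SELECT $1 <> -$2' LANGUAGE SQL IMMUTABLE;

CREATE OPERATOR !=- (
    LEFTARG = integer,
    RIGHTARG = integer,
    PROCEDURE = noteqneg
);

SELECT -1 !=- 1;    -- now resolves to the single user-defined operator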

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services



Re: [HACKERS] Synchronized Scan update

2007-03-04 Thread Simon Riggs
On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote:
 Is there any consensus about whether to include these two parameters as
 GUCs or constants if my patch is to be accepted?
 
 (1) sync_scan_threshold: Use synchronized scanning for tables greater
 than this many pages; smaller tables will not be affected.

That sounds OK.

 (2) sync_scan_offset: Start a new scan this many pages before a
 currently running scan to take advantage of the pages
  that are likely already in cache.

I'm somewhat dubious about this parameter, I have to say, even though I
am eager for this feature. It seems like a magic parameter that works
only when we have the right knowledge to set it correctly.

How will we know what to default it to and how will we know whether to
set it higher or lower for better performance? Does that value vary
according to the workload on the system? How?

I'm worried that we get a feature that works well on simple tests and
not at all in real world circumstances. I don't want to cast doubt on
what could be a great patch or be negative: I just see that the feature
relies on the dynamic behaviour of the system. I'd like to see some
further studies on how this works to make sure that we realistically
know how to set this knob, that it's the correct knob, and that it's the
only one we need.

Further thoughts: It sounds like sync_scan_offset is related to
effective_cache_size. Can you comment on whether that might be
something we can use as well/instead? (i.e. set the scan offset to,
say, K * effective_cache_size, 0.1 <= K <= 0.5)?

Might we do roughly the same thing with sync_scan_threshold as well, and
just have enable_sync_scan instead? i.e. sync_scan_threshold =
effective_cache_size? When would those two parameters not be connected
directly to each other?
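
To make the discussion concrete, I'm imagining postgresql.conf entries
along these lines (names from the patch; the values are invented and
expressed in 8 kB pages):

sync_scan_threshold = 131072   # sync-scan only tables larger than ~1 GB
sync_scan_offset = 16384       # start ~128 MB behind a scan in progress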

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] UPSERT

2007-03-04 Thread Hannu Krosing
On Fri, 2007-03-02 at 10:13, Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  My instinct would be to follow your first strategy, i.e. detect which 
  path is needed rather than try one and then if it fails do the other.
 
 The very first thing you need to think about is how to solve the race
 condition problem, ie, two backends concurrently trying to insert
 identical data.  

Then one of them will update the data inserted by whoever got the insert
first.

 Until you have a plausible mechanism for that, the
 whole thing is pie-in-the-sky.

Isn't the standard way of doing it thus:

UPDATE
IF NOT FOUND THEN 
  INSERT
  IF DUPLICATE KEY THEN
UPDATE
  END IF
END IF

At least this is how UPSERT is usually done in plpgsql.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] UPSERT

2007-03-04 Thread Bruno Wolff III
On Sun, Mar 04, 2007 at 14:55:47 +0200,
  Hannu Krosing [EMAIL PROTECTED] wrote:
 
 UPDATE
 IF NOT FOUND THEN 
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END IF

I believe it is possible for the above to fail. For example another
transaction could create a matching record between the update and insert
and then another transaction could delete it between the insert and the
second update.



Re: [HACKERS] UPSERT

2007-03-04 Thread Hannu Krosing
On Sun, 2007-03-04 at 07:46, Bruno Wolff III wrote:
 On Sun, Mar 04, 2007 at 14:55:47 +0200,
   Hannu Krosing [EMAIL PROTECTED] wrote:
  
  UPDATE
  IF NOT FOUND THEN 
INSERT
IF DUPLICATE KEY THEN
  UPDATE
END IF
  END IF
 
 I believe it is possible for the above to fail. For example another
 transaction could create a matching record between the update and insert
 and then another transaction could delete it between the insert and the
 second update.

Then we may do the second part as a loop and hope that eventually we hit
the right point with either INSERT or UPDATE:

 UPDATE
 WHILE NOT FOUND
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END WHILE

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] UPSERT

2007-03-04 Thread Petr Jelinek

Bruno Wolff III wrote:

On Sun, Mar 04, 2007 at 14:55:47 +0200,
  Hannu Krosing [EMAIL PROTECTED] wrote:

UPDATE
IF NOT FOUND THEN
  INSERT
  IF DUPLICATE KEY THEN
UPDATE
  END IF
END IF


I believe it is possible for the above to fail. For example another
transaction could create a matching record between the update and insert
and then another transaction could delete it between the insert and the
second update.


You know we have an example of this in the manual, right?
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

:)
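
For reference, the example there boils down to exactly the retry loop
sketched upthread (paraphrased from the docs):

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key;
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;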

--
Regards
Petr Jelinek (PJMODOS)





Re: [HACKERS] UPSERT

2007-03-04 Thread Martijn van Oosterhout
On Sun, Mar 04, 2007 at 02:55:47PM +0200, Hannu Krosing wrote:
 Isn't the standard way of doing it thus:
 
 UPDATE
 IF NOT FOUND THEN 
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END IF
 
 At least this is how UPSERT is usually done in plpgsql.

Well, you need to loop, because that last UPDATE can get a not-found
again, so you have to keep trying both until they work.

I think MERGE would still be cool, because then it's only one command
that has to be repeated, rather than two.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] ERROR: operator does not exist: integer !=- integer

2007-03-04 Thread Shane Ambler

Andrew - Supernews wrote:

On 2007-03-04, William ZHANG [EMAIL PROTECTED] wrote:

Here are the steps to reproduce it in CVS HEAD:
backend> select -1 !=-1;


This arguably isn't a bug, because != is not a standard SQL operator, and
therefore !=- can legitimately be defined as a single operator by the user.



I missed the first post and can't seem to search for it - so correct me 
if I am missing something.


Isn't the problem here a missing space? != is a valid operator and -1 is
the value you are comparing to. !=-1 is not valid, but != -1 is correct
and what I assume you are looking to achieve.


The negation operator goes with the int being negated and is not part of
the comparison operator !=; the space is needed there to separate the two.




--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



[HACKERS] ERROR: operator does not exist: integer !=- integer

2007-03-04 Thread William ZHANG
Here are the steps to reproduce it in CVS HEAD:

$ uname -a
Linux os-server 2.6.9-11.19AX #1 Fri Aug 5 05:12:07 EDT 2005 i686 i686 i386 
GNU/Linux

$ ./postgres --single -D $HOME/pgsql/data postgres

PostgreSQL stand-alone backend 8.3devel
backend> show server_version;
 1: server_version  (typeid = 25, len = -1, typmod = -1, byval = f)

 1: server_version = 8.3devel (typeid = 25, len = -1, typmod = -1, byval = f)

backend> select -1 != -1;
 1: ?column?    (typeid = 16, len = 1, typmod = -1, byval = t)

 1: ?column? = f  (typeid = 16, len = 1, typmod = -1, byval = t)

backend> select -1 !=-1;
ERROR:  operator does not exist: integer !=- integer at character 11
HINT:  No operator matches the given name and argument type(s). You might 
need to add explicit type casts.
STATEMENT:  select -1 !=-1;

A quick hack in scan.l:

*** src/backend/parser/scan.l.old   2007-03-04 11:39:56.831289992 +0800
--- src/backend/parser/scan.l   2007-03-04 11:40:04.142178568 +0800
***************
*** 605,610 ****
--- 605,617 ----
  					{
  						int		ic;
  
+ 						/* filter out operators ending with '=' */
+ 						if (yytext[nchars - 2] == '=')
+ 						{
+ 							nchars--;
+ 							continue;
+ 						}
+ 
  						for (ic = nchars - 2; ic >= 0; ic--)
  						{
  							if (strchr("~!@#^&|`?%", yytext[ic]))

Now the result is correct:

backend> select -1 !=-1;
 1: ?column?    (typeid = 16, len = 1, typmod = -1, byval = t)

 1: ?column? = f  (typeid = 16, len = 1, typmod = -1, byval = t)

--
Regards,
William ZHANG





[HACKERS] Trivial HugeTLB Benchmark

2007-03-04 Thread Ryan Cumming
Hey,

Out of curiosity I benchmarked PostgreSQL 8.2.3 using huge pages for shared 
memory. Oracle claims fairly significant speedups with huge pages but I 
couldn't find any information on PostgreSQL.

I used the attached patch to enable huge pages on Linux. The test hardware is a 
dual Nocona Xeon 3.2GHz with 4GB of RAM and two 15K 73GB Ultra320 disks in a 
software RAID-1. The box is running CentOS 4.4 for x86-64 and the vendor's 
stock 2.6.9 kernel. The relevant postgresql.conf settings are:

shared_buffers=160MB
work_mem=8MB
fsync=off
full_page_writes=off
effective_cache_size=3GB

I ran each pgbench after a fresh reboot. I used 85 huge pages reserved at boot 
for the huge page test, and none for the normal shared memory test.
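
(Reserving huge pages uses the standard Linux mechanism; something along
the lines of

$ echo 85 > /proc/sys/vm/nr_hugepages

or hugepages=85 on the kernel command line at boot.)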

Normal shared memory:
-bash-3.00$ pgbench -c 5 -t 1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 5
number of transactions per client: 1
number of transactions actually processed: 5/5
tps = 1669.009344 (including connections establishing)
tps = 1669.941756 (excluding connections establishing)

Huge pages:
-bash-3.00$  pgbench -c 5 -t 1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 5
number of transactions per client: 1
number of transactions actually processed: 5/5
tps = 1678.392138 (including connections establishing)
tps = 1679.268344 (excluding connections establishing)

Assuming that this is a representative benchmark, it looks like huge pages are 
a very slight (0.5%) performance win. I'm guessing that PostgreSQL doesn't 
benefit as much as Oracle due to its much less ridiculous shared memory size. 
That performance boost is almost certainly not worth the platform-specific code 
or administration overhead of hugetlb on Linux.

-Ryan
 
diff -u postgresql-8.2.3/src/backend/port/sysv_shmem.c /usr/src/redhat/BUILD/postgresql-8.2.3/src/backend/port/sysv_shmem.c
--- postgresql-8.2.3/src/backend/port/sysv_shmem.c	2006-07-13 22:28:28.0 -0700
+++ /usr/src/redhat/BUILD/postgresql-8.2.3/src/backend/port/sysv_shmem.c	2007-03-03 17:05:37.0 -0800
@@ -46,6 +46,7 @@
 #define PG_SHMAT_FLAGS			0
 #endif
 
+#define PG_LARGE_PAGE_SIZE 2048 * 1024
 
 unsigned long UsedShmemSegID = 0;
 void	   *UsedShmemSegAddr = NULL;
@@ -75,7 +76,8 @@
 	IpcMemoryId shmid;
 	void	   *memAddress;
 
-	shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection);
+	size = ((size - 1) & ~(PG_LARGE_PAGE_SIZE - 1)) + PG_LARGE_PAGE_SIZE;
+	shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection | SHM_HUGETLB);
 
 	if (shmid < 0)
 	{



Re: [HACKERS] ERROR: operator does not exist: integer !=- integer

2007-03-04 Thread Tom Lane
William ZHANG [EMAIL PROTECTED] writes:
 backend> select -1 !=-1;
 ERROR:  operator does not exist: integer !=- integer at character 11

This is not a bug.

regards, tom lane



Re: [HACKERS] Trivial HugeTLB Benchmark

2007-03-04 Thread Tom Lane
Ryan Cumming [EMAIL PROTECTED] writes:
 I ran each pgbench after a fresh reboot. I used 85 huge pages reserved at 
 boot for the huge page test, and none for the normal shared memory test.

 Normal shared memory:
 -bash-3.00$ pgbench -c 5 -t 1
 starting vacuum...end.
 transaction type: TPC-B (sort of)
 scaling factor: 1
 number of clients: 5
 number of transactions per client: 1
 number of transactions actually processed: 5/5
 tps = 1669.009344 (including connections establishing)
 tps = 1669.941756 (excluding connections establishing)

If you did this only once, the results are not really trustworthy;
you need to average several similar runs before you can have much
confidence.  pgbench's inter-run variation is usually upwards of 10%,
so trying to draw conclusions about half-percentage-point differences
without averaging is a waste of time.

Also, if scaling factor < number of clients then what you're mostly
measuring is update-contention behavior.  Try it with -s 10 and -c 5;
and don't forget to reinitialize the database for each run of tests
to be sure it's fair.
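
For instance, something like this (database name and transaction count
are just placeholders):

$ pgbench -i -s 10 bench        # reinitialize at scaling factor 10
$ pgbench -c 5 -t 10000 bench   # 5 clients; repeat several runs and average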

regards, tom lane



Re: [HACKERS] Synchronized Scan update

2007-03-04 Thread Joshua D. Drake



(2) sync_scan_offset: Start a new scan this many pages before a
currently running scan to take advantage of the pages
 that are likely already in cache.



I'm somewhat dubious about this parameter, I have to say, even though I
am eager for this feature. It seems like a magic parameter that works
only when we have the right knowledge to set it correctly.

  

Hello,

Don't get me wrong, I want things to be easily understandable as well,
but the reason you cite above pretty much makes us need to remove most
of the postgresql.conf, including all the bgwriter, vacuum cost delay,
and autovac settings.

Not to mention commit delay and others ;).

Sincerely,

Joshua D. Drake









Re: [HACKERS] New Access Method

2007-03-04 Thread Tom Lane
Alan Gatt [EMAIL PROTECTED] writes:
 make[4]: Entering directory
 `/c/dev/postgres/pgsql_tip/src/backend/access/mdim'
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
 -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g
 -I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND
 -I../../../../src/include/port/win32 -DBUILDING_DLL  -c -o mdim.o mdim.c

 make[4]: *** [mdim.o] Error 3

Umm ... what happened to the actual compiler error message?  I'd have
expected to see something in between those lines.

regards, tom lane



Re: [HACKERS] ERROR: operator does not exist: integer !=- integer

2007-03-04 Thread Joshua D. Drake


I missed the first post and can't seem to search for it - so correct 
me if I am missing something.


Isn't the problem here a missing space? != is a valid operator and -1 
is the value you are comparing to. !=-1 is not valid but != -1 is 
correct and what I assume you are looking to achieve.




Well, yes, it will work if you add a space, but technically the problem is
that the query should be written like this:


1 <> -1 or 1 != -1

Joshua D. Drake



The negation operator goes with the int being negated and is not part
of the comparison operator !=; the space is needed there to separate
the two.









[HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
I'm putting this out there before we publish a fix so that we can discuss
how best to fix it.

Doug and Sherry recently found the source of an important performance issue
with the Postgres shared buffer cache.

The issue is summarized like this: the buffer cache in PGSQL is not scan
resistant as advertised.  A sequential scan of a table larger than cache
will pollute the buffer cache in almost all circumstances.

Here is performance of GPDB 2.301 (Postgres 8.1.6) on a single X4500
(thumper-3) with 4 cores where bigtable is a table 2x the size of RAM and
memtable is a table that fits into I/O cache:

With our default setting of shared_buffers (16MB):

Operation        memtable    bigtable
--------------------------------------
SELECT COUNT(*)  1221 MB/s   973 MB/s
VACUUM           1709 MB/s   1206 MB/s

We had observed that VACUUM would perform better when done right after a
SELECT.  In the above example, the faster rate from disk was 1608 MB/s,
compared to the normal rate of 1206 MB/s.

We verified this behavior on Postgres 8.2 as well.  The buffer selection
algorithm is choosing buffer pages scattered throughout the buffer cache in
almost all circumstances.

Sherry traced the behavior to the processor repeatedly flushing the L2
cache.  Doug found that we weren't using the Postgres buffer cache the way
we expected, instead we were loading the scanned data from disk into the
cache even though there was no possibility of reusing it.  In addition to
pushing other, possibly useful pages from the cache, it has the additional
behavior of invalidating the L2 cache for the remainder of the executor path
that uses the data.

To prove that the buffer cache was the source of the problem, we dropped the
shared buffer size to fit into L2 cache (1MB per Opteron core), and this is
what we saw (shared_buffers set to 680KB):

Operation        memtable    bigtable
--------------------------------------
SELECT COUNT(*)  1320 MB/s   1059 MB/s
VACUUM           3033 MB/s   1597 MB/s

These results do not vary with the order of operations.

Thoughts on the best way to fix the buffer selection algorithm?  Ideally,
one page would be used in the buffer cache in circumstances where the table
to be scanned is (significantly?) larger than the size of the buffer cache.
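
For illustration only, the sort of thing I have in mind is a small private
ring of buffers that a large scan keeps recycling, leaving the rest of the
cache alone (types and names below are stand-ins, not actual bufmgr code):

#define SCAN_RING_SIZE 32			/* ~256 kB of 8 kB pages, L2-sized */

typedef int Buffer;				/* stand-in for the real Buffer type */

typedef struct ScanRing
{
	Buffer	bufs[SCAN_RING_SIZE];		/* buffers owned by this scan */
	int	next;				/* next slot to recycle */
} ScanRing;

static Buffer
scan_ring_next(ScanRing *ring)
{
	/* reuse the same few buffers instead of evicting unrelated pages */
	Buffer	buf = ring->bufs[ring->next];

	ring->next = (ring->next + 1) % SCAN_RING_SIZE;
	return buf;
}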
 
- Luke





Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 The issue is summarized like this: the buffer cache in PGSQL is not scan
 resistant as advertised.

Sure it is.  As near as I can tell, your real complaint is that the
bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache;
which is hardly surprising considering it doesn't know the size of L2
cache.  That's not a consideration that we've ever taken into account.

I'm also less than convinced that it'd be helpful for a big seqscan:
won't reading a new disk page into memory via DMA cause that memory to
get flushed from the processor cache anyway?  I wonder whether your
numbers are explained by some other consideration than you think.

regards, tom lane



Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
When we instrument the page selections made within the buffer cache, they are 
sequential and span the entire address space of the cache.

With respect to whether it's L2, it's a conclusion based on the experimental 
results.  It's not the TLB, as we also tested for the 512 entries for each L2.

One thing I left out of the previous post: the difference between fast and
slow behavior was that in the fast case, the buffer selection alternated
between two buffer pages. This was the case only when the preceding statement
was a SELECT and the current statement was VACUUM.

- Luke

Msg is shrt cuz m on ma treo




Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
One more thing: the L2 is invalidated when re-written from the kernel IO
cache, but the pages addressed in L2 retain their values when 'written
through', which allows the new data to be re-used up the executor chain.

- Luke

Msg is shrt cuz m on ma treo




[HACKERS] Automatic adjustment of bgwriter_lru_maxpages (was: Dead Space Map version 2)

2007-03-04 Thread ITAGAKI Takahiro
Jim C. Nasby [EMAIL PROTECTED] wrote:

   Perhaps it would be better to have the bgwriter take a look at how many
   dead tuples (or how much space the dead tuples account for) when it
   writes a page out and adjust the DSM at that time.
  
  Yeah, I feel it is worth optimizing, too. One question is, how do we treat
  dirty pages written by backends rather than by the bgwriter? If we want to
  add some work to the bgwriter, do we also need to make the bgwriter write
  almost all of the dirty pages?
 
 IMO yes, we want the bgwriter to be the only process that's normally
 writing pages out. How close we are to that, I don't know...

I'm working on making the bgwriter write almost all of the dirty pages. This
is the proposal for it, using automatic adjustment of bgwriter_lru_maxpages.

The bgwriter_lru_maxpages value will be adjusted to equal the number of calls
of StrategyGetBuffer() per cycle, with some safety margin (x2 at present).
The counter is incremented per call and reset to zero at StrategySyncStart().
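
In rough pseudo-C, the adjustment amounts to this (names simplified; a
sketch of the mechanism, not the literal patch):

/* bumped once per StrategyGetBuffer() call */
static int	num_get_buffer_calls = 0;

/* at StrategySyncStart(), once per bgwriter cycle: */
bgwriter_lru_maxpages = num_get_buffer_calls * 2;	/* x2 safety margin */
num_get_buffer_calls = 0;				/* reset for the next cycle */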


This patch alone is not so useful, except for hiding hard-to-tune parameters
from users. However, it would be a first step toward allowing the bgwriter to
do some work before writing dirty buffers:

- [DSM] Pick out pages worth vacuuming and register them into the DSM.
- [HOT] Do a per-page vacuum for HOT-updated tuples. (Is it worth doing?)
- [TODO Item] Shrink expired COLD-updated tuples to just their headers.
- Set commit hint bits to reduce subsequent writes of blocks.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg01363.php


I tested the attached patch on pgbench -s5 (80MB) with shared_buffers=32MB.
I got the expected result, as below: over 75% of buffers are written by the
bgwriter. In addition, the automatically adjusted bgwriter_lru_maxpages
values were much higher than the default value (5), which shows that the
most suitable value depends greatly on the workload.

 benchmark  | throughput | cpu-usage | by-bgwriter | bgwriter_lru_maxpages
------------+------------+-----------+-------------+-----------------------
 default    |   300tps   |   100%    |    77.5%    |   120 pages/cycle
 with sleep |   150tps   |    50%    |    98.6%    |    70 pages/cycle


I hope that this patch will be a first step toward an intelligent bgwriter.
Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Mark Kirkwood

Tom Lane wrote:

Luke Lonergan [EMAIL PROTECTED] writes:

The issue is summarized like this: the buffer cache in PGSQL is not scan
resistant as advertised.


Sure it is.  As near as I can tell, your real complaint is that the
bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache;
which is hardly surprising considering it doesn't know the size of L2
cache.  That's not a consideration that we've ever taken into account.



To add a little to this - forgetting the scan resistant point for the
moment... cranking down shared_buffers to be smaller than the L2 cache
seems to help *any* sequential scan immensely, even on quite modest HW:

e.g: PIII 1.26Ghz 512Kb L2 cache, 2G ram,

SELECT count(*) FROM lineitem (which is about 11GB) performance:

Shared_buffers  Elapsed
--------------  -------
400MB           101 s
128KB           74 s

When I've profiled this activity, I've seen a lot of time spent
searching for/allocating a new buffer for each page being fetched.
Obviously having less of them to search through will help, but having
less than the L2 cache-size worth of 'em seems to help a whole lot!

Cheers

Mark







Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Gavin Sherry
On Mon, 5 Mar 2007, Mark Kirkwood wrote:

 To add a little to this - forgetting the scan resistant point for the
 moment... cranking down shared_buffers to be smaller than the L2 cache
 seems to help *any* sequential scan immensely, even on quite modest HW:

 e.g: PIII 1.26Ghz 512Kb L2 cache, 2G ram,

 SELECT count(*) FROM lineitem (which is about 11GB) performance:

 Shared_buffers  Elapsed
 --------------  -------
 400MB           101 s
 128KB           74 s

 When I've profiled this activity, I've seen a lot of time spent
 searching for/allocating a new buffer for each page being fetched.
 Obviously having less of them to search through will help, but having
 less than the L2 cache-size worth of 'em seems to help a whole lot!

Could you demonstrate that point by showing us timings for shared_buffers
sizes from 512K up to, say, 2 MB? The two numbers you give there might
just have to do with managing a large buffer.

Thanks,

Gavin



Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
Gavin, Mark,

 Could you demonstrate that point by showing us timings for 
 shared_buffers sizes from 512K up to, say, 2 MB? The two 
 numbers you give there might just have to do with managing a 
 large buffer.

I suggest two experiments that we've already done:
1) increase shared buffers to double the L2 cache size, you should see
that the behavior reverts to the slow performance and is constant at
larger sizes

2) instrument the calls to BufferGetPage() (a macro) and note that the
buffer block numbers returned increase sequentially during scans of
tables larger than the buffer size

- Luke




Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Tom Lane
Gavin Sherry [EMAIL PROTECTED] writes:
 Could you demonstrate that point by showing us timings for shared_buffers
 sizes from 512K up to, say, 2 MB? The two numbers you give there might
 just have to do with managing a large buffer.

Using PG CVS HEAD on 64-bit Intel Xeon (1MB L2 cache), Fedora Core 5,
I don't measure any noticeable difference in seqscan speed for
shared_buffers set to 32MB or 256kB.  I note that the code would
not let me choose the latter setting without a large decrease in
max_connections, which might be expected to cause some performance
changes in itself.

Now this may only prove that the disk subsystem on this machine is
too cheap to let the system show any CPU-related issues.  I'm seeing
a scan rate of about 43MB/sec for both count(*) and plain ol' wc,
which is a factor of 4 or so less than Mark's numbers suggest...
but top shows CPU usage of less than 5%, so even with a 4x faster
disk I'd not really expect that CPU speed would become interesting.

(This is indeed a milestone, btw, because it wasn't so long ago that
count(*) was nowhere near disk speed.)

regards, tom lane
