Re: [HACKERS] sync_file_range()
Christopher Kings-Lynne <[EMAIL PROTECTED]> wrote:

> http://lwn.net/Articles/178199/
> Check out the article on sync_file_range():
> Is that at all useful for PostgreSQL's purposes?

I'm interested in it; we could use it to improve responsiveness during
checkpoints. Though it is a Linux-specific system call, we could use the
combination of mmap() and msync() instead of it; I mean we can use mmap
only to flush dirty pages, not to read or write pages.

---
ITAGAKI Takahiro
NTT Open Source Software Center

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
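The mmap()/msync() alternative suggested above can be sketched roughly as follows. This is a minimal sketch assuming a POSIX system; flush_file_range() and its alignment handling are illustrative, not PostgreSQL code:

```c
/* Map the file range (using the mapping only to flush, never to read or
 * write pages) and ask the kernel to write the dirty pages back. */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static int
flush_file_range(int fd, off_t offset, size_t nbytes)
{
    long        pagesz = sysconf(_SC_PAGESIZE);
    off_t       start = offset - (offset % pagesz); /* mmap needs page alignment */
    size_t      len = nbytes + (size_t) (offset - start);
    void       *p;
    int         rc;

    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, start);
    if (p == MAP_FAILED)
        return -1;

    /* MS_ASYNC schedules writeout of the dirty pages without blocking;
     * MS_SYNC would instead wait for the I/O to complete. */
    rc = msync(p, len, MS_ASYNC);
    munmap(p, len);
    return rc;
}
```

The mapping is discarded immediately after msync(); because the pages are shared with the kernel's cache, scheduling writeback through the mapping flushes the same dirty pages the file's writers dirtied.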
Re: [HACKERS] kill_prior_tuple for bitmap scan
"Tom Lane" <[EMAIL PROTECTED]> wrote:

> Really? An indexscan will release pin before returning no-more-tuples,
> and had better do so else we leak pins during queries involving many
> indexscans.

I guess I see your point. For the scan stages not returning
no-more-tuples, we can do the kill, but the problem is that most bitmap
index scans can finish in just one round :-(.

> Not sure I believe that either. Even granting the assumption that the
> pages are still in cache, this implies a big increase in bufmgr traffic.

If you mean the bufmgr traffic is on the BufMappingLock, then I don't
worry too much. Notice that since we have a list of buffer_ids that we
are interested in, we can pin/recheck-buftag these targets without
asking the bufmgr where they are. If we miss, then unpinning and
forgetting them is OK.

Regards,
Qingqing
[HACKERS] sync_file_range()
http://lwn.net/Articles/178199/

Check out the article on sync_file_range():

    long sync_file_range(int fd, loff_t offset, loff_t nbytes, int flags);

    This call will synchronize a file's data to disk, starting at the
    given offset and proceeding for nbytes bytes (or to the end of the
    file if nbytes is zero). How the synchronization is done is
    controlled by flags:

    * SYNC_FILE_RANGE_WAIT_BEFORE blocks the calling process until any
      already in-progress writeout of pages (in the given range)
      completes.

    * SYNC_FILE_RANGE_WRITE starts writeout of any dirty pages in the
      given range which are not already under I/O.

    * SYNC_FILE_RANGE_WAIT_AFTER blocks the calling process until the
      newly-initiated writes complete.

    An application which wants to initiate writeback of all dirty pages
    should provide the first two flags. Providing all three flags
    guarantees that those pages are actually on disk when the call
    returns.

Is that at all useful for PostgreSQL's purposes?

Chris
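For reference, a hedged sketch of how the three flags might be combined. flush_range() is a made-up helper, not PostgreSQL code; on non-Linux platforms it falls back to fdatasync(), which flushes the whole file rather than a range:

```c
#define _GNU_SOURCE             /* exposes sync_file_range() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static int
flush_range(int fd, off_t offset, off_t nbytes)
{
#ifdef SYNC_FILE_RANGE_WRITE    /* Linux only */
    /* All three flags: wait for any prior writeout of the range, start
     * writeout of its dirty pages, then wait for that writeout — so the
     * pages are actually on disk when the call returns. */
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
#else
    (void) offset;
    (void) nbytes;
    return fdatasync(fd);       /* portable, but whole-file */
#endif
}
```

For checkpoint smoothing one would more likely issue only SYNC_FILE_RANGE_WRITE per range as the checkpoint progresses, and wait at the end.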
Re: [HACKERS] kill_prior_tuple for bitmap scan
"Qingqing Zhou" <[EMAIL PROTECTED]> writes:
> As I read, the kill_prior_tuple optimization doesn't work for bitmap scan
> code. To fix this problem, we have two choices.
> One is still use the kill_prior_tuple trick in a modified way. Heap TIDs
> recycling should not be a problem. This is because generally we always hold
> pin of the last index page we scanned,

Really? An indexscan will release pin before returning no-more-tuples,
and had better do so else we leak pins during queries involving many
indexscans. (IIRC, I broke that around 7.3 :-()

> Another smaller problem is that we have
> to scan the heap pages again to identify the index pointers, but that's
> affordable since these pages should be in the buffer pool with big chance.

Not sure I believe that either. Even granting the assumption that the
pages are still in cache, this implies a big increase in bufmgr traffic.

This certainly is an interesting problem, but there is not a simple
solution.

			regards, tom lane
[HACKERS] kill_prior_tuple for bitmap scan
As I read it, the kill_prior_tuple optimization doesn't work for the
bitmap scan code. To fix this problem, we have two choices.

One is to still use the kill_prior_tuple trick in a modified way. Heap
TID recycling should not be a problem, because generally we always hold
a pin on the last index page we scanned, so vacuum has to wait and the
underlying heap can't get vacuumed. Another, smaller problem is that we
have to scan the heap pages again to identify the index pointers, but
that's affordable since these pages should still be in the buffer pool
with high probability.

An alternative would be to leave the job to autovacuum -- once a bitmap
scan sees a lot of dead tuples, notify it to do the job. But this
doesn't look interesting -- autovacuum works at a coarse, blind level
and would cause a lot of CPU/IO.

Anyway, doing nothing costs performance on a frequently updated table.
I observed an OLTP server optimized for 7.4 with index scans run into a
performance problem because the plan changed to a bitmap index scan.

Is there any show-stopper reason that we don't do either of them?

Regards,
Qingqing
Re: [HACKERS] Rethinking stats communication mechanisms
"Bort, Paul" <[EMAIL PROTECTED]> writes:
>> Anyone know a variant of this that really works?

> Here's a theory: If the counter is bumped to an odd number before
> modification, and an even number after it's done, then the reader will
> know it needs to re-read if the counter is an odd number.

Great minds think alike ;-) ... I just committed exactly that protocol.
I believe it is correct, because AFAICS there are only four possible
risk cases:

* reader's read starts before and ends after writer's update: reader
  will certainly note a change in update counter.

* reader's read starts before and ends within writer's update: reader
  will note a change in update counter.

* reader's read starts within and ends after writer's update: reader
  will note a change in update counter.

* reader's read starts within and ends within writer's update: reader
  will see update counter as odd.

Am I missing anything?

Note in particular that this protocol does not assume atomic update of
the counter, so we don't need to worry about whether int is
sig_atomic_t. If any of the bytes change, we have what we need. We
could use a counter narrower than int, but then there's some tiny risk
that the counter could wrap all the way around while the reader is
blocked.

			regards, tom lane
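The protocol Tom describes is essentially what the Linux kernel calls a seqlock. A minimal single-writer sketch, with hypothetical struct and function names (not the committed PostgreSQL code):

```c
/* Writer makes the counter odd before touching its stats area and even
 * again afterwards; a reader retries until it sees the same even value
 * on both sides of its copy. */
#include <assert.h>
#include <string.h>

typedef struct
{
    volatile unsigned changecount;  /* odd while an update is in progress */
    int         stats[4];           /* some fixed-size stats payload */
} StatsEntry;

static void
writer_update(StatsEntry *e, const int newstats[4])
{
    e->changecount++;               /* now odd: update in progress */
    memcpy(e->stats, newstats, sizeof(e->stats));
    e->changecount++;               /* now even: update complete */
}

static void
reader_copy(StatsEntry *e, int out[4])
{
    for (;;)
    {
        unsigned    before = e->changecount;

        memcpy(out, e->stats, sizeof(e->stats));
        if (e->changecount == before && (before & 1) == 0)
            return;                 /* consistent snapshot */
        /* an update was in progress or completed meanwhile: retry */
    }
}
```

On a real multiprocessor one would also need memory barriers around the counter accesses to keep the compiler and CPU from reordering them past the payload copy; the sketch only shows the retry logic.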
Re: [HACKERS] Rethinking stats communication mechanisms
>> BTW, I think the writer would actually need to bump the counter
>> twice, once before and once after it modifies its stats area. Else
>> there's no way to detect that you've copied a partially-updated stats
>> entry.
>
> Actually, neither of these ideas works: it's possible that the reader
> copies the entry between the two increments of the counter. Then, it
> won't see any reason to re-read, but nonetheless it has copied an
> inconsistent partially-modified entry.
>
> Anyone know a variant of this that really works?

Here's a theory: if the counter is bumped to an odd number before
modification, and an even number after it's done, then the reader will
know it needs to re-read if the counter is an odd number.

This might be assuming too much about what the writer knows about the
current contents of the counter, but since it's per-backend, I think it
would work.

Regards,
Paul Bort
Re: [HACKERS] regresssion script hole
On Sun, Jun 18, 2006 at 07:18:07PM -0600, Michael Fuhr wrote:
> Maybe I'm misreading the packet, but I think the query is for
> ''kaltenbrunner.cc (two single quotes followed by kaltenbrunner.cc)

Correction: ''.kaltenbrunner.cc

--
Michael Fuhr
Re: [HACKERS] regresssion script hole
On Sun, Jun 18, 2006 at 07:50:04PM -0400, Tom Lane wrote:
>  24583 postgres CALL  recvfrom(0x3,0x477e4000,0x1,0,0xfffe4da0,0xfffe4d5c)
>  24583 postgres GIO   fd 3 read 37 bytes
>        "\M-Sr\M^A\M^B\0\^A\0\0\0\0\0\0\^B''\rkaltenbrunner\^Bcc\0\0\^A\0\^A"
>  24583 postgres RET   recvfrom 37/0x25
>  24583 postgres CALL  close(0x3)
>  24583 postgres RET   close 0
>
> I'm not too up on what the DNS protocol looks like on-the-wire, but I'll
> bet this is it. I think it's trying to look up "kaltenbrunner.cc" and
> failing.

Maybe I'm misreading the packet, but I think the query is for
''kaltenbrunner.cc (two single quotes followed by kaltenbrunner.cc)
and the DNS server is responding with SERVFAIL.

--
Michael Fuhr
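Michael's reading of the packet can be checked mechanically: in DNS wire format a name is a sequence of labels, each a length byte followed by that many data bytes, ended by a zero byte, so the bytes \^B''\rkaltenbrunner\^Bcc decode as 2:"''", 13:"kaltenbrunner", 2:"cc". A small sketch (decode_qname() is purely illustrative, not resolver code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a wire-format QNAME into dotted text; returns the number of
 * characters written, excluding the terminating NUL. */
static size_t
decode_qname(const unsigned char *wire, char *out, size_t outlen)
{
    size_t      n = 0;

    while (*wire != 0 && n + *wire + 2 <= outlen)
    {
        unsigned char len = *wire++;    /* label length byte */

        if (n > 0)
            out[n++] = '.';             /* dot between labels */
        while (len-- > 0)
            out[n++] = (char) *wire++;
    }
    out[n] = '\0';
    return n;
}
```

Fed the bytes from the trace, this yields ''.kaltenbrunner.cc, matching the correction above.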
Re: [HACKERS] regresssion script hole
Tom Lane wrote:
> Anyway, the tail end of the trace shows it repeatedly sending off a UDP
> packet and getting practically the same data back:
>
> I'm not too up on what the DNS protocol looks like on-the-wire, but I'll
> bet this is it. I think it's trying to look up "kaltenbrunner.cc" and
> failing.

Why are we actually looking up anything? Just so we can bind to a
listening socket?

Anyway, maybe the box needs a lookup line in its /etc/resolv.conf to
direct it to use files first, something like

    lookup file bind

Stefan, can you look into that? It would be a bit ugly if it's calling
DNS (and failing) to resolve localhost.

cheers

andrew
Re: [HACKERS] regresssion script hole
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> One idea that comes to mind is a DNS lookup timeout. Can you strace the
>> postmaster to see what it's doing?

> There is ktrace output I managed to capture at
> http://developer.postgresql.org/~adunstan/ktrace.txt
> Not sure what it tells us. I do see it reading in the whole timezone db,
> which isn't pretty.

Yeah... I think we fixed that in 8.1. Anyway, the tail end of the trace
shows it repeatedly sending off a UDP packet and getting practically the
same data back:

     24583 postgres CALL  socket(0x2,0x2,0)
     24583 postgres RET   socket 3
     24583 postgres CALL  sendto(0x3,0x43e1e000,0x25,0,0x493a6338,0x10)
     24583 postgres GIO   fd 3 wrote 37 bytes
           "\M-Sr\^A\0\0\^A\0\0\0\0\0\0\^B''\rkaltenbrunner\^Bcc\0\0\^A\0\^A"
     24583 postgres RET   sendto 37/0x25
     24583 postgres CALL  getpid()
     24583 postgres RET   getpid 24583/0x6007
     24583 postgres CALL  select(0x4,0x40739600,0,0,0xfffe4d90)
     24583 postgres RET   select 1
     24583 postgres CALL  recvfrom(0x3,0x477e4000,0x1,0,0xfffe4da0,0xfffe4d5c)
     24583 postgres GIO   fd 3 read 37 bytes
           "\M-Sr\M^A\M^B\0\^A\0\0\0\0\0\0\^B''\rkaltenbrunner\^Bcc\0\0\^A\0\^A"
     24583 postgres RET   recvfrom 37/0x25
     24583 postgres CALL  close(0x3)
     24583 postgres RET   close 0

I'm not too up on what the DNS protocol looks like on-the-wire, but I'll
bet this is it. I think it's trying to look up "kaltenbrunner.cc" and
failing.

			regards, tom lane
Re: [HACKERS] regresssion script hole
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> We need both, I think. I am still trying to find out why it's taking so
>> long. This is on the 8.0 branch, though. Later branches seem to be
>> working.
>
> One idea that comes to mind is a DNS lookup timeout. Can you strace the
> postmaster to see what it's doing?

There is ktrace output I managed to capture at
http://developer.postgresql.org/~adunstan/ktrace.txt

Not sure what it tells us. I do see it reading in the whole timezone db,
which isn't pretty.

cheers

andrew
Re: [HACKERS] regresssion script hole
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> We need both, I think. I am still trying to find out why it's taking so
> long. This is on the 8.0 branch, though. Later branches seem to be working.

One idea that comes to mind is a DNS lookup timeout. Can you strace the
postmaster to see what it's doing?

			regards, tom lane
Re: [HACKERS] regresssion script hole
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> The problem is that if the postmaster takes more than 60 seconds to
>> start listening (as is apparently happening on spoonbill - don't yet
>> know why) this code falls through.
>
> If the postmaster takes that long to start listening, I'd say we need
> to fix the postmaster not pg_regress.

We need both, I think. I am still trying to find out why it's taking so
long. This is on the 8.0 branch, though. Later branches seem to be
working.

>> I'm inclined to run the psql test one more time to make sure we can
>> actually connect, and if not then fail at this point.
>
> How does that differ from just iterating the loop one more time?

There is no guarantee that at the end of the loop we have connected
successfully to postgres. I will post a patch that shows what I suggest.

cheers

andrew
Re: [HACKERS] regresssion script hole
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> The problem is that if the postmaster takes more than 60 seconds to
> start listening (as is apparently happening on spoonbill - don't yet
> know why) this code falls through.

If the postmaster takes that long to start listening, I'd say we need to
fix the postmaster not pg_regress.

> I'm inclined to run the psql test one more time to make sure we can
> actually connect, and if not then fail at this point.

How does that differ from just iterating the loop one more time?

			regards, tom lane
Re: [HACKERS] MultiXacts & WAL
> There is no "regular shared locks" in postgres in that sense. Shared
> locks are only used for maintaining FK integrity. Or by manually
> issuing a SELECT FOR SHARE, but that's also for maintaining integrity.
> MVCC rules take care of the "plain reads". If you're not familiar with
> MVCC, it's explained in chapter 12 of the manual.
>
> The source code in heapam.c also mentions Point In Time Recovery to
> require logging the locks, though I'm not sure why.

Thanks for your explanations, now I can see what was confusing me.

> The problem with that is that we don't know beforehand if a
> transaction is a distributed one or not.
>
> Feel free to write a benchmark to see how much difference the logging
> makes! If it's significant, I'm sure we can figure out ways to improve
> it.

Now that I finally see that multixacts are due only to explicit shared
lock requests or to FKs, I tend to agree with Tom's original doubts
about the actual impact of the multixact-related logging activities. Of
course in practice such an impact would vary from application to
application, so it may still make sense for some classes of workloads to
avoid multixact logging, assuming they contain no distributed
transactions and finding a hack to know beforehand whether a transaction
is distributed or not...

BTW, if I manage to find some free time to do some performance tests,
I'll surely let you know!

Thanks again,

Paolo
[HACKERS] regresssion script hole
While investigating some problems with buildfarm member spoonbill I came
across this piece of code in pg_regress.sh, which seems less than
robust:

    # Wait till postmaster is able to accept connections (normally only
    # a second or so, but Cygwin is reportedly *much* slower).  Don't
    # wait forever, however.
    i=0
    max=60
    until "$bindir/psql" -X $psql_options postgres </dev/null 2>/dev/null
    do
        i=`expr $i + 1`
        if [ $i -ge $max ]
        then
            break
        fi
        if kill -0 $postmaster_pid >/dev/null 2>&1
        then
            : still starting up
        else
            break
        fi
        sleep 1
    done

    if kill -0 $postmaster_pid >/dev/null 2>&1
    then
        echo "running on port $PGPORT with pid $postmaster_pid"
    else
        echo
        echo "$me: postmaster did not start"
        echo "Examine $LOGDIR/postmaster.log for the reason."
        echo
        (exit 2); exit
    fi

The problem is that if the postmaster takes more than 60 seconds to
start listening (as is apparently happening on spoonbill - don't yet
know why) this code falls through.

I'm inclined to run the psql test one more time at this point to make
sure we can actually connect, and if not then fail. I wouldn't bother,
but it did confuse the heck out of both Stefan and me when createlang
failed.

Thoughts?

cheers

andrew
Re: [HACKERS] Rethinking stats communication mechanisms
I wrote:
> PFC <[EMAIL PROTECTED]> writes:
>> So, the proposal:
>> On executing a command, Backend stores the command string, then
>> overwrites the counter with (counter + 1) and with the timestamp of
>> command start.
>> Periodically, like every N seconds, a separate process reads the
>> counter, then reads the data, then reads the counter again.

> BTW, I think the writer would actually need to bump the counter twice,
> once before and once after it modifies its stats area. Else there's
> no way to detect that you've copied a partially-updated stats entry.

Actually, neither of these ideas works: it's possible that the reader
copies the entry between the two increments of the counter. Then, it
won't see any reason to re-read, but nonetheless it has copied an
inconsistent partially-modified entry.

Anyone know a variant of this that really works?

			regards, tom lane
Re: [HACKERS] Rethinking stats communication mechanisms
On Sunday, 2006-06-18 at 15:09, Tom Lane wrote:
> "Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> Might it not be a win to also store "per backend global values" in the
>> shared memory segment? Things like "time of last command", "number of
>> transactions executed in this backend", "backend start time" and other
>> values that are fixed-size?

One thing that is doable in constant-size memory, and would be
enormously useful for us, is counting top-level function calls and
storing their total (and possibly also max) duration. The reason is
that our production databases are accessed by clients using functions
only (with some uninteresting exceptions, of course), that is, calls of
the form

    SELECT x, y, z FROM myfunc(i, j, k)

So reserving N*1.5 slots (N being the number of functions defined at
database startup) would be OK. If more than N*0.5 additional functions
are defined in the database's lifetime, then the rest are simply ignored
(not counted).

Or maybe a better approach would be to have a conf variable
"number-of-functions-to-track" and a special boolean flag track_me in
pg_functions. That way you don't accidentally run out of shared memory
by defining lots of new functions and then restarting the cluster.

-- 
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] Rethinking stats communication mechanisms
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
> Might it not be a win to also store "per backend global values" in the
> shared memory segment? Things like "time of last command", "number of
> transactions executed in this backend", "backend start time" and other
> values that are fixed-size?

I'm including backend start time, command start time, etc. under the
heading of "current status", which'll be in the shared memory. However,
I don't believe in trying to count events (like transaction commits)
that way. If we do then we risk losing events whenever a backend quits
and is replaced by another.

I haven't yet looked through the stats in detail, but this approach
basically presumes that we are only going to count events per-table and
per-database --- I am thinking that the background stats collector
process won't even keep track of individual backends anymore. (So,
we'll fix the old problem of loss of backend-exit messages resulting in
bogus displays.)

			regards, tom lane
Re: [HACKERS] Rethinking stats communication mechanisms
> The existing stats collection mechanism seems OK for event
> counts, although I'd propose two changes: one, get rid of the
> separate buffer process, and two, find a way to emit event
> reports in a time-driven way rather than once per transaction
> commit. I'm a bit vague about how to do the latter at the moment.

Might it not be a win to also store "per backend global values" in the
shared memory segment? Things like "time of last command", "number of
transactions executed in this backend", "backend start time" and other
values that are fixed-size?

You can obviously not do it for things like per-table values, since the
size can't be predicted, but all per-backend counters that are fixed
size should be able to do this, I think. And if it's just a counter, it
should be reasonably safe to just do the increment operation without
locking, since there's only one writer for each process. That should
have a much lower overhead than UDP or whatever to the stats process,
no?

It might be worthwhile to add a section for things like the bgwriter
(and possibly the archiver?) to deliver statistics that we can add
statistics views for. (They can obviously not use a standard backend
"struct" for this since they'd have completely different values to
report.)

//Magnus
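Magnus's single-writer idea can be sketched as follows; the struct and function names are hypothetical, not PostgreSQL's actual data structures:

```c
/* Each backend owns one fixed-size slot in shared memory that only it
 * writes, so counters can be bumped with plain stores, no lock taken.
 * Readers may see values a moment stale, which is fine for stats. */
#include <assert.h>
#include <time.h>

typedef struct BackendSlot
{
    time_t      backend_start_time;
    time_t      last_command_time;
    unsigned long n_commands;       /* commands executed by this backend */
    unsigned long n_xact_commit;    /* transactions committed */
} BackendSlot;

/* Called only by the owning backend: single writer, so no locking. */
static void
count_command(BackendSlot *slot)
{
    slot->n_commands++;
    slot->last_command_time = time(NULL);
}

static void
count_commit(BackendSlot *slot)
{
    slot->n_xact_commit++;
}
```

A reader scanning all slots gets an approximate but cheap view; if torn reads of multi-field snapshots matter, the odd/even counter protocol discussed elsewhere in this thread can be layered on top.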
Re: [HACKERS] MultiXacts & WAL
On Sun, 18 Jun 2006, paolo romano wrote:
> Anyway, again in theory, if one wanted to minimize logging overhead for
> shared locks, one might adopt a different treatment for (i) regular
> shared locks (i.e. locks due to plain reads not requiring durability in
> case of 2PC) and (ii) shared locks held because some SQL command is
> referencing a tuple via a FK, which have to be persisted until the 2-nd
> 2PC phase (There is no any other scenario in which you *must* persist
> shared locks, is there?)

There is no "regular shared locks" in postgres in that sense. Shared
locks are only used for maintaining FK integrity. Or by manually issuing
a SELECT FOR SHARE, but that's also for maintaining integrity. MVCC
rules take care of the "plain reads". If you're not familiar with MVCC,
it's explained in chapter 12 of the manual.

The source code in heapam.c also mentions Point In Time Recovery to
require logging the locks, though I'm not sure why.

> By the way, postgresql is detailedly logging *every* single shared
> lock, even though this is actually needed only if (i) the transaction
> turns out to be a distributed one (i.e. prepare is issued on that
> transactions), AND (ii) the shared lock is due to ensure validity of a
> FK. AFAICS, in most practical workloads (i) local transactions dominate
> distributed ones and (ii) shared locks due to plain reads dominate
> locks due to FK, so the current implementaion does not seem to be
> optimizing the most frequent scenario.

The problem with that is that we don't know beforehand if a transaction
is a distributed one or not.

Feel free to write a benchmark to see how much difference the logging
makes! If it's significant, I'm sure we can figure out ways to improve
it.

- Heikki
Re: [HACKERS] Segfault Exiting psql
Jason Essington <[EMAIL PROTECTED]> writes:
> Has there been any movement on this? as of 8.1.2 psql still whines on
> OS X tiger when you exit.
> I realize it is not significant, but I'd still rather not see it.

I've committed that fix into CVS HEAD.

			regards, tom lane
Re: [HACKERS] Rethinking stats communication mechanisms
Douglas McNaught <[EMAIL PROTECTED]> writes:
> (a) and (b): of course you would only do it on a temporary basis for
> problem diagnosis.

Temporary or not, it isn't really an option when you're dealing with
high volumes. You could imagine a setup where, say, 1% of page requests
randomly turn on debugging to get a random sample of database traffic.
There are always solutions.

But my point is that there's no reason to think that only long queries
are useful for snapshots. Short queries are equally capable of consuming
resources in aggregate. Hiding some subset of queries on the assumption
that they're uninteresting is only going to produce deceptive results.

> Ideally, you'd find the query storm problem in load testing before you
> ever got to production. I hope to someday visit that planet--it must
> be nice.

Indeed, when you get there send me a postcard :)

-- 
greg
Re: [HACKERS] union all bug?
"Gurjeet Singh" <[EMAIL PROTECTED]> writes:
> Probably this explains the ERROR for the last query... The ORDER BY
> and LIMIT clauses are expected to end a query (except for subqueries,
> of course), and hence the keyword UNION is not expected after the
> LIMIT clause...

Yeah. In theory that's unambiguous, but IIRC we couldn't persuade bison
of the fact, so you have to add parens.

			regards, tom lane
Re: [HACKERS] union all bug?
Probably this explains the ERROR for the last query... The ORDER BY and
LIMIT clauses are expected to end a query (except for subqueries, of
course), and hence the keyword UNION is not expected after the LIMIT
clause...

On 6/18/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> I was trying to work around limitations with "partitioning" of tables
>> using constraint exclusion, when I ran across this little oddity:
>
> I think you're under a misimpression about the syntax behavior of
> ORDER BY and UNION. Per spec, ORDER BY binds less tightly than UNION,
> thus
>
>     select foo union select bar order by x
>
> means
>
>     (select foo union select bar) order by x
>
> If you want to apply ORDER BY to either arm of a union individually,
> you need parentheses, eg
>
>     (select foo order by x) union (select bar order by x)
>
> (Note that this construct fails to guarantee that the output of the
> union will be sorted by x!)
>
> LIMIT is not in the spec but we treat it like ORDER BY for this
> purpose.

To guarantee the ordering, one can use

    select * from (select foo from A union select bar from B) as ss order by x
Re: [HACKERS] union all bug?
Joe Conway <[EMAIL PROTECTED]> writes:
> I was trying to work around limitations with "partitioning" of tables
> using constraint exclusion, when I ran across this little oddity:

I think you're under a misimpression about the syntax behavior of ORDER
BY and UNION. Per spec, ORDER BY binds less tightly than UNION, thus

    select foo union select bar order by x

means

    (select foo union select bar) order by x

If you want to apply ORDER BY to either arm of a union individually, you
need parentheses, eg

    (select foo order by x) union (select bar order by x)

(Note that this construct fails to guarantee that the output of the
union will be sorted by x!)

LIMIT is not in the spec but we treat it like ORDER BY for this purpose.

			regards, tom lane
[HACKERS] union all bug?
I was trying to work around limitations with "partitioning" of tables
using constraint exclusion, when I ran across this little oddity:

    -- works
    test=# select * from (select time from url_access_2006_06_07
           order by 1 limit 2) as ss1;
            time
    ---------------------
     2006-06-07 15:07:41
     2006-06-07 15:07:41
    (2 rows)

    -- works
    test=# select time from url_access_2006_06_08 order by 1 limit 2;
            time
    ---------------------
     2006-06-08 15:07:41
     2006-06-08 15:07:41
    (2 rows)

    -- huh ?!?
    test=# select * from (select time from url_access_2006_06_07
           order by 1 limit 2) as ss1
           union all
           select time from url_access_2006_06_08 order by 1 limit 2;
            time
    ---------------------
     2006-06-07 15:07:41
     2006-06-07 15:07:41
    (2 rows)

    -- works
    test=# select * from (select time from url_access_2006_06_07
           order by 1 limit 2) as ss1
           union all
           select * from (select time from url_access_2006_06_08
           order by 1 limit 2) as ss2;
            time
    ---------------------
     2006-06-07 15:07:41
     2006-06-07 15:07:41
     2006-06-08 15:07:41
     2006-06-08 15:07:41
    (4 rows)

I get an error if I try to eliminate the first FROM clause subselect:

    test=# select time from url_access_2006_06_07 order by 1 limit 2
           union all
           select * from (select time from url_access_2006_06_08
           order by 1 limit 2) as ss2;
    ERROR:  syntax error at or near "all" at character 65
    LINE 1: ...om url_access_2006_06_07 order by 1 limit 2 union all select...

So I'm wondering whether the second FROM clause subselect is really
required, but not getting enforced as it should be?

Joe
[HACKERS] Slightly bogus regression test for contrib/dblink
Lines 509-512 of contrib/dblink/expected/dblink.out read:

    -- this should fail because there is no open transaction
    SELECT dblink_exec('myconn','DECLARE xact_test CURSOR FOR SELECT * FROM foo');
    ERROR:  sql error
    DETAIL:  ERROR:  cursor "xact_test" already exists

The error message is not consistent with what the comment claims.
Looking at the test case in total, I think the code is responding
correctly given the rules stated in your last commit message:

    2005-10-17 22:55  joe

        * contrib/dblink/: dblink.c, expected/dblink.out,
        sql/dblink.sql: When a cursor is opened using dblink_open, only
        start a transaction if there isn't one already open.  Upon
        dblink_close, only commit the open transaction if it was
        started by dblink_open, and only then when all cursors opened
        by dblink_open are closed.  The transaction accounting is done
        individually for all named connections, plus the persistent
        unnamed connection.

However, this comment is wrong and so is the preceding one, and I think
maybe you want to alter the test case so it does actually exercise
closing the transaction completely.

BTW, I was led to notice this while examining the current buildfarm
failure report from osprey,
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=osprey&dt=2006-06-17%2004:00:16
It looks to me like the diffs are consistent with the idea that the test
is using a copy of dblink that predates this patch ... do you agree?
If so, anyone have an idea how that could happen? I thought we'd fixed
all the rpath problems, and anyway osprey wasn't failing like this
before today.

			regards, tom lane
Re: [HACKERS] MultiXacts & WAL
paolo romano <[EMAIL PROTECTED]> writes:
> Anyway, again in theory, if one wanted to minimize logging overhead for
> shared locks, one might adopt a different treatment for (i) regular shared
> locks (i.e. locks due to plain reads not requiring durability in case of 2PC)
> and (ii) shared locks held because some SQL command is referencing a tuple
> via a FK, which have to be persisted until the 2-nd 2PC phase (There is no
> any other scenario in which you *must* persist shared locks, is there?)

I can't see any basis at all for asserting that you don't need to
persist particular types of locks. In the current system, a multixact
lock might arise from either FK locking or a user-issued SELECT FOR
SHARE. In either case it's possible that the lock was taken to
guarantee the integrity of a data change made somewhere else. So we
can't release it before commit.

			regards, tom lane
Re: [HACKERS] Unable to initdb using HEAD on Windows XP
Never mind. I scrubbed my folders and obtained a fresh copy from CVS.
Now it works.

Regards,
Thomas Hallgren
Re: [HACKERS] MultiXacts & WAL
Heikki Linnakangas wrote:
> No, it's not safe to release them until 2nd phase commit.
>
> Imagine table foo and table bar.  Table bar has a foreign key reference
> to foo.
>
> 1. Transaction A inserts a row to bar, referencing row R in foo.  This
>    acquires a shared lock on R.
> 2. Transaction A precommits, releasing the lock.
> 3. Transaction B deletes R.  The new row inserted by A is not visible
>    to B, so the delete succeeds.
> 4. Transactions A and B commit.  Oops, the new row in bar references R,
>    which doesn't exist anymore.
>
> Holding the lock until the true end of transaction, the 2nd phase of
> commit, blocks B from deleting R.
>
> - Heikki

Heikki, thanks for the clarifications.  I was not considering the additional issues arising in case of referential integrity constraints... in fact I was citing a known result from theory books on 2PC, which did not include FKs in their speculations... But as usual, in theory things always look much simpler than in practice!

Anyway, again in theory, if one wanted to minimize logging overhead for shared locks, one might adopt a different treatment for (i) regular shared locks (i.e. locks due to plain reads, not requiring durability in case of 2PC) and (ii) shared locks held because some SQL command is referencing a tuple via an FK, which have to be persisted until the second 2PC phase.  (There is no other scenario in which you *must* persist shared locks, is there?)  Of course, in practice distinguishing the two above situations may not be so simple, and it still has to be shown whether such an optimization is really worth it...

By the way, postgresql is logging *every* single shared lock in detail, even though this is actually needed only if (i) the transaction turns out to be a distributed one (i.e. prepare is issued on that transaction), AND (ii) the shared lock is there to ensure the validity of an FK.  AFAICS, in most practical workloads (i) local transactions dominate distributed ones and (ii) shared locks due to plain reads dominate locks due to FKs, so the current implementation does not seem to be optimizing the most frequent scenario.

regards, paolo
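The interleaving Heikki describes can be written out as a two-session SQL sketch (table and row names follow his example; `PREPARE TRANSACTION` plays the role of the "precommit" step, and the schema `foo(id PRIMARY KEY)` / `bar(ref REFERENCES foo)` is assumed):

```sql
-- Session A
BEGIN;
INSERT INTO bar VALUES ('R');     -- FK check takes a shared lock on foo's row R
PREPARE TRANSACTION 'xact_a';     -- step 2: first phase of 2PC ("precommit")

-- Session B
BEGIN;
DELETE FROM foo WHERE id = 'R';   -- step 3: must block on A's shared lock.
                                  -- If the lock were released at PREPARE,
COMMIT;                           -- the delete would succeed and commit...

-- Session A, second phase
COMMIT PREPARED 'xact_a';         -- step 4: ...leaving bar's new row dangling.
                                  -- Holding the lock until here blocks B.
```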
Re: [HACKERS] Unable to initdb using HEAD on Windows XP
Some more info.  If I manually create the data directory first, the output is different:

C:\Tada\Workspace>mkdir data

C:\Tada\Workspace>initdb -D data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale English_United States.1252.

fixing permissions on existing directory data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 4000/20
creating configuration files ... ok
creating template1 database in data/base/1 ... ok
initializing pg_authid ... child process was terminated by signal 5
initdb: removing contents of data directory "data"

Thomas Hallgren wrote:
> I just compiled a fresh copy from CVS head.  I then tried to do an
> initdb as user 'postgres' (non admin user on my system).  I get the
> following error:
>
> C:\Tada\Workspace>initdb -D data
> The files belonging to this database system will be owned by user "postgres".
> This user must also own the server process.
>
> The database cluster will be initialized with locale English_United States.1252.
>
> creating directory data ... ok
> creating subdirectories ... initdb: could not create directory "data/global": Permission denied
> initdb: removing data directory "data"
> could not open directory "data": No such file or directory
> initdb: failed to remove data directory
>
> AFAICS, no data directory is ever created so the 'creating directory
> data ... ok' message is probably incorrect.  I even tried to change the
> permissions on the parent directory so that user 'postgres' has full
> control.  It doesn't help.  I didn't think it would since I am able to
> create a database in this directory if I'm using version 8.1.4.
>
> Regards,
> Thomas Hallgren

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
[HACKERS] Unable to initdb using HEAD on Windows XP
I just compiled a fresh copy from CVS head.  I then tried to do an initdb as user 'postgres' (non admin user on my system).  I get the following error:

C:\Tada\Workspace>initdb -D data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale English_United States.1252.

creating directory data ... ok
creating subdirectories ... initdb: could not create directory "data/global": Permission denied
initdb: removing data directory "data"
could not open directory "data": No such file or directory
initdb: failed to remove data directory

AFAICS, no data directory is ever created, so the 'creating directory data ... ok' message is probably incorrect.  I even tried to change the permissions on the parent directory so that user 'postgres' has full control.  It doesn't help.  I didn't think it would, since I am able to create a database in this directory if I'm using version 8.1.4.

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Rethinking stats communication mechanisms
Greg Stark <[EMAIL PROTECTED]> writes:
> Douglas McNaught <[EMAIL PROTECTED]> writes:
>> Yeah, but if you turn on query logging in that case you'll see the
>> bajillions of short queries, so you don't need the accurate snapshot
>> to diagnose that.
>
> Query logging on a production OLTP machine?  a) that would be a huge
> performance drain on the production system b) it would produce so much
> logging that it would take a significant amount of disk and i/o
> resources just to handle and c) you would need to implement special
> purpose tools just to make sense of these huge logs.

(a) and (b): of course you would only do it on a temporary basis for problem diagnosis.  We do that with our production apps where I work (when absolutely necessary).

(c): Perl.  :)

Ideally, you'd find the query storm problem in load testing before you ever got to production.  I hope to someday visit that planet--it must be nice.

-Doug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
On 17-6-2006 1:24, Josh Berkus wrote:
> Arjen,
>
>> I can already confirm very good scalability (with our workload) on
>> postgresql on that machine.  We've been testing a 32thread/16G-version
>> and it shows near-linear scaling when enabling 1, 2, 4, 6 and 8 cores
>> (with all four threads enabled).
>
> Keen.  We're trying to keep the linear scaling going up to 32 cores of
> course (which doesn't happen, presently).  Would you be interested in
> helping us troubleshoot some of the performance issues?

You can ask your questions; if I happen to know the answer, you're a step further in the right direction.  But actually, I didn't do much to get this scalability, so I won't be of much help to you; it's not that I spent hours on getting this performance.

I just started out with the "normal" attempts to get a good config.  Currently shared buffers is set to 30k.  Larger settings didn't seem to make much difference on our previous 4-core version, so I didn't even check it out on this one.  I noticed I forgot to set the effective cache size to more than 6G for this one too, but since our database is smaller than that, that shouldn't make any difference.  The work memory was increased a bit to 2K.  So there are no magic tricks here.

I do have to add it's a recent checkout of 8.2devel compiled using Sun Studio 11.  It was compiled using this as CPPFLAGS:

-xtarget=ultraT1 -fast -xnolibmopt

The -xnolibmopt was added because we couldn't figure out why it yielded several linking errors at the end of the compilation when the -xlibmopt from -fast was enabled, so we disabled that particular setting from the -fast macro.

The workload generated is an abstraction and simplification of our website's workload, used for benchmarking.  It's basically a news and price comparison site and it runs on LAMP (with the M of MySQL), i.e. a lot of light queries, many primary-key or indexed "foreign-key" lookups for small numbers of records.  Some aggregations for summaries, etc.  There are few writes, and hardly any on the most-read tables.  The database easily fits in memory; the total size of the actively read tables is about 3G.

This PostgreSQL version is not a direct copy of the queries and tables, but I made an effort to make it as PostgreSQL-minded as possible.  I.e. I combined a few queries, I changed "boolean" enums in MySQL to real booleans in Postgres, I added specific indexes (including partials), etc.

We use apache+php as clients and just open X apache processes using 'ab' at the same time to generate various amounts of concurrent workload.  Solaris scales really well to higher concurrencies, and PostgreSQL doesn't seem to have problems with it either in our workload.  So it's not really a real-life scenario, but it's not a synthetic benchmark either.

Here is a graph of our performance measured on PostgreSQL:

http://achelois.tweakers.net/~acm/pgsql-t2000/T2000-schaling-postgresql.png

What you see are three lines.  Each represents the total number of "page views" processed in 600 seconds for a specific number of Niagara cores (i.e. 1, 2, 4, 6 and 8).  Each core had all its threads enabled, so it's actually 4, 8, 16, 24 and 32 virtual cpu's you're looking at.

The "Max" line displays the maximum generated "page views" for a specific core count at any concurrency, respectively: 5, 13, 35, 45 and 60.  The "Bij 50" line is the number of "page views" generated with 50 apache processes working at the same time (on two dual-xeon machines, so 25 each).  I took 50 a bit arbitrarily, but all core configs seemed to do pretty well under that workload.  The "perfect" line is based on the "Max" value for 1 core, multiplied by the number of cores to give a linear reference.  The "Bij 50" and "perfect" lines don't differ too much in color, but the top one is the "perfect" line.

In the near future we'll be presenting an article on this on our website; although that will be in Dutch, the graphs should still be easy to read for you guys.  Because of that, I can't promise too much detailed information until then.

I hope I clarified things a bit; if not, ask me about it.

Best regards,

Arjen

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
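Collected as a postgresql.conf fragment, the settings Arjen mentions would look roughly like this (a sketch only: 8.2devel-era parameter names, and the exact units behind his "30k" and "2K" figures are assumptions, not his actual config file):

```
# Sketch of the settings described above -- values/units are assumptions.
shared_buffers = 30000           # "shared buffers is set to 30k" (8 kB pages)
work_mem = 2048                  # "work memory was increased a bit to 2K" (kB)
#effective_cache_size = 786432   # ~6 GB in 8 kB pages; he notes he forgot to
                                 # raise it above this, which shouldn't matter
                                 # for a ~3 GB working set
```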
Re: [HACKERS] Rethinking stats communication mechanisms
Douglas McNaught <[EMAIL PROTECTED]> writes:
> Yeah, but if you turn on query logging in that case you'll see the
> bajillions of short queries, so you don't need the accurate snapshot
> to diagnose that.

Query logging on a production OLTP machine?  a) That would be a huge performance drain on the production system; b) it would produce so much logging that it would take a significant amount of disk and i/o resources just to handle; and c) you would need to implement special-purpose tools just to make sense of these huge logs.

-- greg

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match