Re: [HACKERS] .NET or Mono functions in PG
Tom Lane wrote: It is also pretty well established that if pltcl or plperl cause the backend to become multithreaded, things break horribly. I strongly

Isn't that something that needs to be fixed? It's one thing not to allow multiple threads into your code, but not having a threaded library linked in is going to become more and more of a problem.

James

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

builddoc.bat still seems to require buildenv.bat.

/D
[HACKERS] buildenv.pl/buildenv.bat
Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

//Magnus
[HACKERS] Regression testing
Hi All, I am trying to run the regression tests against the installed version. On the installed version of PostgreSQL 8.2.4, with the service started, I run gmake installcheck from the regress folder, and I get the following message:

GNUmakefile:15: ../../../src/Makefile.global: No such file or directory
GNUmakefile:78: /src/Makefile.shlib: No such file or directory
gmake: *** No rule to make target `/src/Makefile.shlib'. Stop.

Can anyone please tell me why I am getting this message, and the solution for the same? Thanks in advance.

Regards Cinu Kuriakose
Re: [HACKERS] Sorting Improvements for 8.4
On Fri, 2007-11-30 at 12:07 -0800, Jeff Davis wrote: On Tue, 2007-11-27 at 18:03 +, Simon Riggs wrote: 5. DYNAMIC RUN HANDLING (in Final Merge) Another way of addressing a) is to simply make better use of memory itself. Let's look at that in more detail: The number of runs that can be merged at once is currently fixed, based upon available memory. This has the underlying assumption that all runs will be concurrently active during final merging, which may not always be true. If we have random data then almost all runs will overlap with all other runs, i.e. the min and max values are sufficiently wide that the runs do all overlap. In many cases, data arrives in somewhat sorted order, e.g. financial data is fairly regular with some late payers but not many, and those trail off with a fairly tight decay. In the somewhat-sorted case we find that the actual overlap is less than total, so there are many later runs that don't overlap the earlier ones. In the best case we would find that runs 1 and 2 overlap, runs 2 and 3 overlap, then 3 and 4 overlap.

I have spoken with Len Shapiro, a professor at Portland State University, regarding sorting before. He suggests that PostgreSQL should implement forecasting, which is similar to what you're describing. Forecasting does not require that entire runs are disjoint; it works by tracking the maximum values from the last block read from every run. This allows you to know which run you will need more blocks from the soonest. I'm still looking into the problem to understand it better, but the algorithm is in Knuth Vol. 3. I can look at it in more detail, but have you already looked into this idea? Is there a reason we don't do this currently?

Interesting, I hadn't read that part. Knuth's Algorithm F covers how to do a P-way merge using 2P + 2 buffers. My ideas cover how to do a P-way merge when you don't have enough memory for that many buffers. The current sort code makes two assumptions, amongst others:

1. minimizing the number of runs is always worth it
2. there is a single fixed maximum size of P, depending upon memory

I'm challenging both of those. Only runs that overlap need to be merged simultaneously, so if the runs aren't overlapping then it's OK to allow more runs to be formed. If it's OK to allow more runs, then reducing heap size to allow better CPU efficiency is possible. So Algorithm F is somewhat orthogonal to what I've proposed, though maybe still interesting. What we do now is fairly close, but you should look at the code in tuplesort.c and logtape.c to see how well it matches. That might lead to an increase in the limit on the number of concurrent runs mergeable at any one time.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
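The overlap observation above can be sketched in a few lines. This is an illustrative sketch only (not PostgreSQL code): given the (min, max) key range of each sorted run, group together runs whose ranges overlap. Runs in different groups never need to be open simultaneously, so non-overlapping runs can simply be emitted in range order.

```python
# Hypothetical sketch: group sorted runs by overlapping key ranges.
# Only runs inside one group must be merged simultaneously.

def plan_merge_groups(runs):
    """runs: list of (lo, hi) key ranges, one per sorted run."""
    groups = []
    for lo, hi in sorted(runs):
        if groups and lo <= groups[-1][1]:      # overlaps the current group
            glo, ghi, members = groups[-1]
            groups[-1] = (glo, max(ghi, hi), members + [(lo, hi)])
        else:                                   # disjoint: start a new group
            groups.append((lo, hi, [(lo, hi)]))
    return [members for _, _, members in groups]

# Somewhat-sorted input: runs 1-2 overlap and runs 3-4 overlap,
# but the two pairs are disjoint, so a 2-way merge capability suffices.
print(plan_merge_groups([(0, 50), (40, 90), (100, 150), (140, 200)]))
# -> [[(0, 50), (40, 90)], [(100, 150), (140, 200)]]
```

With random data every range overlaps every other, all runs land in one group, and this degenerates to the current fixed P-way merge.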
Re: [HACKERS] Stored procedure issue
Hello

What I have noticed is that when I don't use the procedure at all, there are only 2-5 locks in pg_locks; after I start the application which uses the stored procedure, the number in pg_locks increases rapidly to a steady 75, even to 130 at certain moments. Any clue why procedure usage might increase locks so heavily?

Sincerely

--- Dragan Zubac [EMAIL PROTECTED] wrote: Hello I have a stored procedure which does the billing stuff in our system. It works OK, but if I put it in production, where there are some 5-10 billing events per second, the whole database slows down. It won't even drop some test table, reindex, or vacuum - things which were done before in the blink of an eye. If I stop the application which calls the procedure, all is back to normal. We didn't implement any special locking mechanism in the procedure; all is default. The procedure is updating the user's balance in table 'users'. On the other hand, a couple of 'heavy load' tables have foreign keys pointing to table 'users'. Is it a matter of concurrency and some locking issue, or maybe the existence of all those foreign keys pointing to table 'users', or maybe something else which we're not aware of at the moment?

Sincerely Pera
Re: [HACKERS] Regression testing
cinu wrote: GNUmakefile:15: ../../../src/Makefile.global: No such file or directory GNUmakefile:78: /src/Makefile.shlib: No such file or directory gmake: *** No rule to make target `/src/Makefile.shlib'. Stop.

You need to run configure first (found in the top-level directory), so that certain files are generated.

-- Alvaro Herrera http://www.flickr.com/photos/alvherre/ You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick. (Andrew Sullivan)
Re: [GENERAL] [HACKERS] Stored procedure issue
Hello

On 03/12/2007, Dragan Zubac [EMAIL PROTECTED] wrote: Hello What I have noticed is that when I don't use procedure at all, there's only 2-5 locks in pg_locks; after I start application which uses stored procedure the number in pg_locks increases rapidly to steady 75, even to 130 at certain moments. Any clue why procedure usage might increase locks so heavily? Sincerely

Upgrade to 8.2? There is a shared lock and there are fewer problems with locks. But I am not sure if this solves your problem. The general protection against locks is doing all things with maximal speed. Are your queries well optimized?

Regards Pavel Stehule

--- Dragan Zubac [EMAIL PROTECTED] wrote: Hello I have a stored procedure which does the billing stuff in our system. It works OK, but if I put it in production, where there are some 5-10 billing events per second, the whole database slows down. It won't even drop some test table, reindex, or vacuum - things which were done before in the blink of an eye. If I stop the application which calls the procedure, all is back to normal. We didn't implement any special locking mechanism in the procedure; all is default. The procedure is updating the user's balance in table 'users'. On the other hand, a couple of 'heavy load' tables have foreign keys pointing to table 'users'. Is it a matter of concurrency and some locking issue, or maybe the existence of all those foreign keys pointing to table 'users', or maybe something else which we're not aware of at the moment? Sincerely Pera
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander [EMAIL PROTECTED] writes: When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go.

On a related note, I was wondering yesterday if there are any obsolete statements here: http://developer.postgresql.org/pgdocs/postgres/install-win32-libpq.html

regards, tom lane
[HACKERS] Integer datatimes vs msvc
Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) //Magnus
Re: [HACKERS] buildenv.pl/buildenv.bat
On Mon, Dec 03, 2007 at 10:52:15AM -0500, Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. On a related note, I was wondering yesterday if there are any obsolete statements here: http://developer.postgresql.org/pgdocs/postgres/install-win32-libpq.html

No, I refreshed that page back when. It's used for the frontend-only build that can use earlier versions of Visual Studio. It also supports 64-bit libpq, which the main build doesn't (yet).

//Magnus
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander [EMAIL PROTECTED] writes: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) I believe we have consensus that 8.4 is the time to do that --- see thread here: http://archives.postgresql.org/pgsql-patches/2007-05/msg00046.php So my recommendation is that the MSVC build should follow the source change, ie, not change until 8.4. regards, tom lane
[HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Hi, I loaded pg 8.3 beta 4 on a win2k3 64-bit box (8GB RAM). I was able to restore a database into it (about 30GB, 5 schemas and about 2000 tables), but whenever I try to open the tables in pgAdmin III, the database crashes. I've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before, holding the same db):

2007-12-03 16:38:01 GMT LOG: database system is ready to accept connections
2007-12-03 16:38:01 GMT LOG: autovacuum launcher started
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
2007-12-03 16:38:22 GMT LOG: server process (PID 3716) exited with exit code 3
2007-12-03 16:38:22 GMT LOG: terminating any other active server processes
2007-12-03 16:38:22 GMT WARNING: terminating connection because of crash of another server process
2007-12-03 16:38:22 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2007-12-03 16:38:22 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2007-12-03 16:38:22 GMT LOG: all server processes terminated; reinitializing
2007-12-03 16:38:23 GMT FATAL: pre-existing shared memory block is still in use
2007-12-03 16:38:23 GMT HINT: Check if there are any old server processes still running, and terminate them.

~ filip
Re: [HACKERS] Integer datatimes vs msvc
On Dec 3, 2007, at 11:27 , Tom Lane wrote: I believe we have consensus that 8.4 is the time to do that --- see thread here: http://archives.postgresql.org/pgsql-patches/2007-05/msg00046.php

Is there anything in the release notes (or elsewhere) for 8.3 that mentions this as the intention, similar to the formerly deprecated money type? Would we want to?

Michael Glaesemann grzm seespotcode net
Re: [HACKERS] Integer datatimes vs msvc
Andrew Dunstan wrote: Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. Right. I know there are certainly linux distributions that change it to integer timestamps (debian for example), so it's just a matter of what we want to ship in the installer. But if that's the consensus - Dave, unless you wrapped the new beta already, please change that in your config file before you do! //Magnus
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
From the postgresql ftp server. I initially wanted to download beta3, but spotted beta4 so I gave that a go. ftp.postgresql.org /pub/binary/v8.3beta4/win32/ ~ Filip -Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: 03 December 2007 17:09 To: Filip Wuytack Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
Re: [HACKERS] Integer datatimes vs msvc
Magnus Hagander wrote: Andrew Dunstan wrote: Magnus Hagander wrote: Did we ever come to a conclusion on what to do with integer datetimes on msvc for 8.3? For the uninformed - 8.2 and earlier had the binary build with floating point timestamps, 8.3 has integer timestamps. (We're just talking the binary build here) If you are preparing a binary package you can use whatever options you like. But I think in principle the MSVC defaults in our source code should mirror the Unix defaults. Right. I know there are certainly linux distributions that change it to integer timestamps (debian for example), so it's just a matter of what we want to ship in the installer. But if that's the consensus - Dave, unless you wrapped the new beta already, please change that in your config file before you do! Hmm. Too late to pull that one, but we'll make the change for next beta/rc then. //Magnus
Re: [HACKERS] buildenv.pl/buildenv.bat
Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentation. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy)

My recollection is that I changed the minimum amount necessary, because I was expecting us to go into beta at any moment (silly me). That might be why we still have both. There was an expectation that some cleanup might be required during 8.4 development. I know I left other .bat files as wrappers for the perl scripts, but that's obviously not appropriate here. I'll see if I can adjust builddoc.bat so we can get rid of buildenv.bat. As for documentation, you're probably right, I could easily have missed it. I'll look into it.

cheers andrew
Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit
Andrew, As to how it was built: I was under the impression that the 8.3 release is built using MSVC? I was keen to give it a try to see if this version would work more optimally on Windows, as the pg instance (8.2.5) I run normally suffers from seriously high pages/sec (by looking at perfmon on Windows), slowing the whole thing down (it's connected to a SAN on a 4Gb card and holds 80 disks...so file I/O should not be the bottleneck). Now a quick restore back to 8.2.5... ~ Filip -Original Message- From: Filip Wuytack Sent: 03 December 2007 17:12 To: 'Andrew Dunstan' Cc: 'pgsql-hackers@postgresql.org' Subject: RE: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit From the postgresql ftp server. I initially wanted to download beta3, but spotted beta4 so I gave that a go ftp.postgresql.org /pub/binary/v8.3beta4/win32/ ~ Filip -Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: 03 December 2007 17:09 To: Filip Wuytack Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] 8.3 beta 4 crash on windows 2003 64-bit Filip Wuytack wrote: Hi, I loaded pg8.3 beta 4 on a win2k3 64bit box (8GB ram). I was able to restore a database into it (about 30GB,5schemas and about 2000 tables) but whenever I try to open the tables in pgadmin3, the database crashes (i've tried with both the default postgresql.conf file and with some more tuned ones (based on 8.2.5 running on the same box before holding the same db): Where did you get the binary from? How was it built? cheers andrew
[HACKERS] Kludge in pg_standby.c
There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge added to work around a Windows problem, which makes it work but at great expense:

#ifdef WIN32
    /*
     * Windows reports that the file has the right number of bytes
     * even though the file is still being copied and cannot be
     * opened by pg_standby yet. So we wait for sleeptime secs
     * before attempting to restore. If that is not enough, we
     * will rely on the retry/holdoff mechanism.
     */
    pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it slows down the processing of any file by 1s. On a server which has fallen behind, this means it can't process files as quickly as it can copy them; it's limited to at most 1/s. I think it wouldn't be hard to do this properly. We can try to open the file, handle the expected Windows error by sleeping for 1s, and repeat until we can successfully open it. Something like (untested):

    bool    success = false;
    int     fd, tries = 10;

    while (--tries)
    {
        fd = open(WALFilePath, O_RDONLY);
        if (fd >= 0)
        {
            close(fd);
            success = true;
            break;
        }
        else if (errno == EWINDOWSBLOWS)
        {
            pg_usleep(1000000L);
        }
        else
        {
            perror("pg_standby open:");
            exit(2);
        }
    }

    if (!success)
    {
        fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\"\n",
                WALFilePath, strerror(EWINDOWSBLOWS));
        exit(2);
    }

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 11:51 +, Simon Riggs wrote: So Algorithm F is somewhat orthogonal to what I've proposed, though maybe still interesting. What we do now is fairly close, but you should look at the code in tuplesort.c and logtape.c to see how well it matches. That might lead to an increase in the limit of the number of concurrent runs mergeable at any one time. tuplesort.c: * When merging runs, we use a heap containing just the frontmost tuple from * each source run; we repeatedly output the smallest tuple and insert the * next tuple from its source tape (if any). When the heap empties, the merge * is complete. The basic merge algorithm thus needs very little memory --- * only M tuples for an M-way merge, and M is constrained to a small number. * However, we can still make good use of our full workMem allocation by * pre-reading additional tuples from each source tape. Without prereading, * our access pattern to the temporary file would be very erratic; on average * we'd read one block from each of M source tapes during the same time that * we're writing M blocks to the output tape, so there is no sequentiality of * access at all, defeating the read-ahead methods used by most Unix kernels. * Worse, the output tape gets written into a very random sequence of blocks * of the temp file, ensuring that things will be even worse when it comes * time to read that tape. A straightforward merge pass thus ends up doing a * lot of waiting for disk seeks. We can improve matters by prereading from * each source tape sequentially, loading about workMem/M bytes from each tape * in turn. Then we run the merge algorithm, writing but not reading until * one of the preloaded tuple series runs out. Then we switch back to preread * mode, fill memory again, and repeat. This approach helps to localize both * read and write accesses. 
The idea of prefetching, as I understand it, is that we don't blindly preread workMem/M bytes from each of M tapes; instead we predict which tapes we will need tuples from next through forecasting. If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum. It seems as if this would allow a variable number of runs to be merged at once, but if the data really *is* random, we'd want it to degrade gracefully into something resembling the current implementation. I'm being somewhat vague here because I haven't taken the time to really understand it. If you think this idea has potential I will look into it in more detail. Regards, Jeff Davis
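The read-selection rule described above is small enough to sketch. This is a hedged illustration of the forecasting idea from Knuth Vol. 3, not tuplesort.c code: remember the largest key in the block most recently read from each run; the run whose last-read block has the smallest maximum is the run whose buffered tuples will be exhausted soonest, so it should be preread from next.

```python
# Hypothetical sketch of forecasting-based preread selection.

def next_run_to_read(last_block_max):
    """last_block_max: dict mapping run id -> max key in its last-read block."""
    return min(last_block_max, key=last_block_max.get)

# Run 2's buffered tuples top out at 17, below runs 1 and 3, so run 2
# will run dry first and is the one to preread from next.
print(next_run_to_read({1: 42, 2: 17, 3: 99}))  # -> 2
```

With random data the per-run maxima stay tightly interleaved and reads rotate across all runs, which is the graceful degradation mentioned above.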
Re: [HACKERS] buildenv.pl/buildenv.bat
On Mon, 2007-12-03 at 12:15 -0500, Andrew Dunstan wrote: Magnus Hagander wrote: Hi! When you redid the msvc build stuff you seem to have missed the documentatino.. Specifically, I notice that buildenv.pl isn't documented - docs still say buildenv.bat is the way to go. Also, do we now have both buildenv.bat and buildenv.pl? (I have them both in my environment, but that could be a result of being lazy) My recollection is that I changed the minimum amount necessary, because I was expecting us to go into beta at anmy moment (silly me). That might be why we still have both. There was an expectation that some cleanup might be required during 8.4 development. I know I left other .bat files as wrappers for the perl scripts, but that's obviously not appropriate here. I'll see if I can adjust builddoc.bat so we can get rid of buildenv.bat. As for documentation, you're probably right, I could easily have missed it. I'll look into it. Great. Let me know if you need me to poke at anything. //Magnus
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote: If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum.

Yep, sounds like Algorithm F.

It seems as if this would allow a variable number of runs to be merged at once, but if the data really *is* random, we'd want it to degrade gracefully into something resembling the current implementation.

If we also keep track of the endpoints of runs that we haven't yet read from, then yes, that would link my ideas with Algorithm F, so we just have a single implementation. (F++?) Probably easiest to store the endpoint tuples directly, with some sane limits for when we have very large tuples. You'll still need to do run-level forecasting as I had proposed, to tell whether you need to do any intermediate merging prior to the final merge. So the two sets of ideas can't be brought together completely.

I'm being somewhat vague here because I haven't taken the time to really understand it. If you think this idea has potential I will look into it in more detail.

Yes, F++ sounds like it will use memory more effectively than we do currently. That's likely to improve performance when the number of runs approaches the limit for the size of work_mem. So this will improve external sorts with too small memory allocations, but it won't do anything about sorts with too large a memory allocation. That's probably the order of importance for tackling sort performance, so that's good. Probably best to test with:

- 1M - 4M work_mem, so we see the full benefit of any improvements in memory utilisation in a typical context
- number of runs nearly at the limit for memory
- total sort very large, so we see real I/O issues starkly

You'll need to instrument things carefully so you can tell how many runs are being merged at any one time and how that affects elapsed time/row.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
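The run-level forecasting mentioned above can be sketched as a sweep over run endpoints. This is a hypothetical illustration (names and data are invented, not from tuplesort.c): with the (min, max) endpoints of every run recorded at run-formation time, count how many runs' ranges contain each boundary value; the peak count is the largest number of runs that must be open simultaneously, and if it exceeds the affordable number of buffers, an intermediate merge pass is needed before the final merge.

```python
# Hypothetical sketch: how many runs must be open at once, given endpoints?

def max_concurrent_runs(runs):
    """runs: list of (lo, hi) endpoint pairs, one per sorted run."""
    # +1 event where a run's range opens, -1 where it closes;
    # at equal coordinates, opens sort before closes (they do overlap there).
    events = [(lo, 1) for lo, hi in runs] + [(hi, -1) for lo, hi in runs]
    peak = active = 0
    for _, delta in sorted(events, key=lambda e: (e[0], -e[1])):
        active += delta
        peak = max(peak, active)
    return peak

# Chained overlap (runs 1-2, 2-3, 3-4 overlap pairwise): at most two runs
# are ever open at once, so four runs need only a 2-way merge capability.
print(max_concurrent_runs([(0, 10), (8, 20), (18, 30), (28, 40)]))  # -> 2
```

If the peak equals the total run count (the random-data case), every run overlaps every other and the plan collapses to a single all-runs final merge, as now.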
[HACKERS] Is postgres.gif missing in cvs?
Hi, I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error:

cp ./../graphics/postgres.gif .
cp: cannot stat `./../graphics/postgres.gif': No such file or directory
make: *** [postgres.gif] Error 1

CVS/tarball does not include this file, and I think we should add it.

Regards, -- Devrim GÜNDÜZ , RHCE PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Sorting Improvements for 8.4
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2007-12-03 at 10:32 -0800, Jeff Davis wrote: If I understand correctly, we just keep track of the maximum value of the last block read from each run, and then always read from the run in which the last block read has the lowest maximum.

So it sounds like the use case where this is the biggest win would be a situation where you have presorted input which has been sliced up. So for example sorting by zip code in a table which was clustered by city. The alphabetic order of the cities isn't correlated to the results, but all the zip codes for a city are in a contiguous block somewhere in the output. In such a case, after doing a single pass we would have a bunch of tapes, each of which corresponded to a single city and was able to completely reorder the zip codes in that city to be ordered. So the desired results would be, for example, all the tuples from tape 17 (NYC) followed by all the tuples from tape 3 (Buffalo) followed by all the tuples from tape 1 (Albuquerque), etc. We currently preread an equal amount from each tape and then would empty all the preread tuples from tape 17, refill them, preread them again, and repeat until tape 17 is empty, then move on to tape 3. All the tuples except those from the currently active tape are completely idle. I think the way to do what you're proposing is to preread one tuple from each tape, then when one preread bunch is emptied refill it with twice as many, and repeat. In this case you would end up with nearly all of work_mem full of tuples from NYC until you're done with NYC. That would increase the prereading block size by a factor of 20 in this case. So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O then eliminating the 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10% it might be worthwhile.
I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyways and then we don't really have a choice about which tape we want to preread from. And it's a good thing too since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap which would basically double the cpu cost of sorting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
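The doubling-refill scheme proposed above is easy to simulate. This is a hedged sketch (pure Python, not tuplesort.c): preread a single tuple from each tape to start, and each time a tape's preread buffer empties, refill it with twice as many tuples as before. A tape that keeps supplying the smallest tuples (all of NYC, say) quickly grows its buffer to fill most of the available memory.

```python
from heapq import heappush, heappop

# Hypothetical sketch: merge sorted 'tapes' (plain lists), doubling a
# tape's preread size each time its buffer runs dry.

def adaptive_merge(tapes):
    pos = [0] * len(tapes)          # next unread position on each tape
    size = [1] * len(tapes)         # how many tuples to preread next time
    buf = [[] for _ in tapes]       # per-tape preread buffers
    heap, out = [], []

    def preread(t):
        n = min(size[t], len(tapes[t]) - pos[t])
        buf[t] = list(tapes[t][pos[t]:pos[t] + n])
        pos[t] += n
        size[t] *= 2                # next refill is twice as large
        if buf[t]:
            heappush(heap, (buf[t][0], t))

    for t in range(len(tapes)):
        preread(t)                  # start with one tuple per tape
    while heap:
        _, t = heappop(heap)
        out.append(buf[t].pop(0))   # emit the smallest buffered tuple
        if buf[t]:
            heappush(heap, (buf[t][0], t))
        else:
            preread(t)              # buffer emptied: refill, doubled
    return out

print(adaptive_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

On the city-clustered input above, the dominating tape's refills grow 1, 2, 4, 8, ... while the idle tapes stay at one buffered tuple each, which is exactly the memory skew the proposal is after.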
Re: [HACKERS] Is postgres.gif missing in cvs?
On Mon, 3 Dec 2007, Devrim GÜNDÜZ wrote: I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error: You need to say make postgres-US.pdf or -A4. The .gif file is coming from make's default rules. Perhaps we should provide a real target matching postgres.pdf and error out with a helpful message, as this isn't the first time people have been bitten by that change. Kris Jurka
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: I think the way to do what you're proposing is... I don't understand that. Algorithm F covers that already, doesn't it? So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O, then eliminating that 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10%, it might be worthwhile. The buffer size at max tapes is an optimum - a trade-off between avoiding intermediate merging and merging efficiently. Freeing more memory is definitely going to help in the case of low work_mem and lots of runs. You're right that there is a limit to the benefit you can get. I wrote a patch in 2005/6 to optimise the memory usage when there were few runs and lots of memory. I still think there's value in that. I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyway, and then we don't really have a choice about which tape we want to preread from. You have to decide whether to perform intermediate merges or whether you can do everything at the final merge. Otherwise you can't merge more runs than you have buffers for, since at some point you'd freeze up and not be able to read more input. And it's a good thing too, since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap, which would basically double the CPU cost of sorting. I think you're not understanding me. You only need to record the lowest or highest value when a run completes/starts. When all runs have been written, we then have a table of the highest and lowest values for each run. We then scan that to see whether we can perform the merging in one pass or, if not, what kind of intermediate merging is required. We keep the merge plan in memory and then follow it.
So probably a very small % of total sort cost, though it might save you doing intermediate merges with huge costs. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
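Simon's merge-plan table can be made concrete: once every run's lowest and highest key is recorded, a sweep over those bounds tells you the greatest number of runs that are ever "live" at once, and hence whether one final merge pass will do. A rough sketch under my own naming, not anything from an actual patch or tuplesort.c:

```python
# Sketch of deciding, from a table of per-run (lowest, highest) keys,
# whether merging can be done in one pass. The peak number of runs
# whose key ranges overlap is the most tapes that ever need to be open
# simultaneously; if that fits in the buffers available, no
# intermediate merge is required. (My own illustration.)

def max_overlapping_runs(bounds):
    """bounds: list of (lowest_key, highest_key), one pair per run."""
    events = []
    for lo, hi in bounds:
        events.append((lo, 0))   # run becomes live (0 sorts before 1 on ties)
        events.append((hi, 1))   # run is exhausted
    live = peak = 0
    for _, kind in sorted(events):
        if kind == 0:
            live += 1
            peak = max(peak, live)
        else:
            live -= 1
    return peak

def one_pass_merge_possible(bounds, n_buffers):
    return max_overlapping_runs(bounds) <= n_buffers
```

For the clustered-by-city example, most runs' ranges are disjoint, so the peak stays far below the run count and a single pass suffices even with few buffers.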
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ wrote: Hi, I tried to run make postgres.pdf (PostgreSQL 8.3beta4), and got this error: cp ./../graphics/postgres.gif . cp: cannot stat `./../graphics/postgres.gif': No such file or directory make: *** [postgres.gif] Error 1 No, the reason for that error message is that it is trying to generate postgres.pdf from postgres.gif. Try make postgres-A4.pdf or postgres-US.pdf instead. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "There are those who acquire the bad habit of being unhappy" (M. A. Evans)
Re: [HACKERS] Sorting Improvements for 8.4
Simon Riggs [EMAIL PROTECTED] writes: The buffer size at max tapes is an optimum - a trade-off between avoiding intermediate merging and merging efficiently. Freeing more memory is definitely going to help in the case of low work_mem and lots of runs. I can't follow these abstract arguments. That's why I tried to spell out a concrete example. I think you're not understanding me. You only need to record the lowest or highest value when a run completes/starts. When all runs have been written, we then have a table of the highest and lowest values for each run. We then scan that to see whether we can perform the merging in one pass or, if not, what kind of intermediate merging is required. We keep the merge plan in memory and then follow it. So probably a very small % of total sort cost, though it might save you doing intermediate merges with huge costs. OK, that's a very different concept from what I was thinking. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: So the question is just how many seeks we are doing during sorting. If we're doing 0.1% seeks and 99.9% sequential I/O, then eliminating that 0.1% entirely (which we can't do) isn't going to speed things up all that much. If we're doing 20% seeks and can get that down to 10%, it might be worthwhile. It's not just about eliminating seeks, it's about being able to merge more runs at one time. If you are merging 10 runs at once, and only two of those runs overlap while the rest contain much greater values, you might be spending 99% of the time in sequential I/O. But the point is, we're wasting the memory holding those other 8 runs (80% of the memory in use), so we really could be merging a lot more than 10 runs at once. This might eliminate stages from the merge process. My point is just that "how many seeks are we doing" is not the only question. We could be doing 99% sequential I/O and still make huge wins. In reality, of course, the runs aren't going to be completely disjoint, but they may be partially disjoint. That's where forecasting comes in: you preread from the tapes you will actually need tuples from soonest. Regards, Jeff Davis
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. regards, tom lane
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 18:33 -0300, Alvaro Herrera wrote: No, the reason for that error message is that it is trying to generate postgres.pdf from postgres.gif. Try make postgres-A4.pdf or postgres-US.pdf instead. Oh, thanks :) I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw I could fix them, but both the errors and the line numbers seemed quite irrelevant to me. Can someone please take a look at it? Regards, -- Devrim GÜNDÜZ, RHCE PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Sorting Improvements for 8.4
On Mon, 2007-12-03 at 20:40 +, Gregory Stark wrote: I'm not sure where the idea of keeping the current bounds of the input tapes comes into it. We only preread when we run out of tuples anyway, and then we don't really have a choice about which tape we want to preread from. And it's a good thing too, since maintaining such a list of bounds and finding the lowest or highest would mean maintaining a second heap, which would basically double the CPU cost of sorting. You're only keeping track of the maximum value for each run, which should be cheap to track. The only time it changes is when you're reading more data from that run, in which case it increases. The tradeoff that's happening right now is: we want to merge many runs at once because it reduces the number of merge phases, but the problem is that it increases the seeking, because we read one block from one run, then one block from another run, etc., especially if the input is random. If we reduce the number of runs, then we can preread more efficiently. See tuplesort.c: * as sorted runs, we can eliminate any repeated I/O at all. In the current * code we determine the number of tapes M on the basis of workMem: we want * workMem/M to be large enough that we read a fair amount of data each time * we preread from a tape, so as to maintain the locality of access described * above. Nonetheless, with large workMem we can have many tapes. So, for workMem/M to be large enough, M has to be small enough. But a small M means we have to do more merge phases, which is expensive. Forecasting improves this tradeoff. Forecasting no longer _blindly_ prereads from each tape; it uses information that it already has (the max value of the last block read from each run) to determine the runs from which we will need tuples the soonest. Then, it prereads the _correct_ data. Regards, Jeff Davis
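The workMem/M trade quoted above can be put in numbers. A back-of-envelope sketch (my own arithmetic, not figures from tuplesort.c): merging M runs at a time turns N runs into ceil(N/M) runs per pass, while each tape's preread chunk is roughly workMem/M:

```python
import math

def merge_passes(n_runs, m):
    """Passes needed to merge n_runs sorted runs, m at a time."""
    passes = 0
    while n_runs > 1:
        n_runs = math.ceil(n_runs / m)  # each pass merges groups of m runs
        passes += 1
    return passes

def preread_chunk_kb(work_mem_kb, m):
    """Rough per-tape preread size when work_mem is split across m tapes."""
    return work_mem_kb // m
```

For example, 1000 runs with M=10 need 3 passes but give each tape only a tenth of work_mem, while M=32 cuts that to 2 passes at the cost of smaller, seekier reads - exactly the tension the comment describes.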
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 17:49 -0500, Tom Lane wrote: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? The ones supplied with Fedora 8. sgml-common: 0.6.3, openjade: 1.3.2 Regards, -- Devrim GÜNDÜZ, RHCE
Re: [HACKERS] Is postgres.gif missing in cvs?
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. I'm not totally au fait with the rules of SGML. Does it allow literal '<' in text nodes? In most places I looked in our docs we seem to use '&lt;' as I would have expected. It appears to me that the tools will silently take '<' (and also '>') as literal characters, *if* what follows them happens to not look too much like a tag or entity :-(. Pretty ugly. The particular cases that were biting Devrim seemed to all be occurrences of a tag that perhaps is allowed in his release. I found out that -wxml will cause openjade to warn about these cases. It turns on a boatload of other warnings that we probably don't care about, so I'm not going to recommend using it by default, but it enabled me to find a lot of problem spots just now. Oh, another interesting behavior that was turned up by this --- apparently you can get away with leaving off the ';' in '&lt;', because we had done so in a few places. -wxml catches that too. regards, tom lane
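The two failure modes described here (a bare '<' accepted literally when what follows doesn't look like markup, and an entity reference missing its ';') can be hunted with a crude scan. This is a rough heuristic of my own, far weaker than openjade's -wxml:

```python
import re

def suspicious_sgml(text):
    """Flag likely-unescaped markup characters in SGML source."""
    problems = []
    # A '<' not followed by a name character, '/', or '!' is probably
    # meant to be a literal less-than and should be written &lt;
    for m in re.finditer(r'<(?![A-Za-z/!])', text):
        problems.append(('bare <', m.start()))
    # An entity reference that never reaches its terminating ';'
    for m in re.finditer(r'&[A-Za-z]+(?![A-Za-z;])', text):
        problems.append(('unterminated entity', m.start()))
    return problems
```

suspicious_sgml('a < b') flags the bare '<', and 'x &lt y' flags the unterminated entity, while a clean fragment like '<para>a &lt; b</para>' passes.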
Re: [HACKERS] Is postgres.gif missing in cvs?
Tom Lane wrote: Devrim GÜNDÜZ [EMAIL PROTECTED] writes: I got some SGML errors: https://devrim.privatepaste.com/501oMnwCYw Hmph. What version of the SGML tools are you using? It seems more prone to get confused by non-entity-ized '<' and '>' than what the rest of us are using. I'm not totally au fait with the rules of SGML. Does it allow literal '<' in text nodes? In most places I looked in our docs we seem to use '&lt;' as I would have expected. cheers andrew
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: On Mon, 2007-12-03 at 17:49 -0500, Tom Lane wrote: Hmph. What version of the SGML tools are you using? The ones supplied with Fedora 8. sgml-common: 0.6.3, openjade: 1.3.2 Those are the same version numbers I see in Fedora 6, which doesn't behave like that ... Anyway, I've committed some cleanup in HEAD. regards, tom lane
Re: [HACKERS] Is postgres.gif missing in cvs?
Hi, On Mon, 2007-12-03 at 18:51 -0500, Tom Lane wrote: Those are the same version numbers I see in Fedora 6, which doesn't behave like that ... Anyway, I've committed some cleanup in HEAD. Thanks. Now we have some more warnings/errors: http://www.gunduz.org/postgresql/postgres-A4.log.gz I can reproduce this on Fedora 8 and CentOS 5.1 (which is more or less like Fedora 6). Regards, -- Devrim GÜNDÜZ, RHCE
Re: [HACKERS] Is postgres.gif missing in cvs?
Devrim GÜNDÜZ [EMAIL PROTECTED] writes: Thanks. Now we have some more warnings/errors: http://www.gunduz.org/postgresql/postgres-A4.log.gz You need bigger TeX settings. Please read "Building the Documentation". regards, tom lane