Re: [HACKERS] Sorting Improvements for 8.4
Jeff Davis wrote: My first thought would be that we would need a new executor node (e.g. ParallelSort) that would only be chosen when the cost of the sort is large enough to outweigh other factors (such as process creation time, dividing available work_mem, and any necessary IPC). It seems to me the simplest way to do it would be to allow each sub process to allocate work_mem/P where P is the degree of parallelization. However, that somewhat works against our schemes for dynamic run handling and forecasting, and may lead to more random reading from disk. Any other scheme I can think of would involve more IPC, which might make the idea just too complex. I am curious - what algorithms exist to efficiently do a parallel sort? Do you mean if sorting 1 million items, it is possible to separate this into 2 sets of 500 thousand each, execute them in separate threads (with task administration and synchronization overhead) , combine the results, and complete the task in significantly less time than doing it in one thread? I am skeptical that this is possible, and suspect that the overall efficiency of the system would go down even if the throughput of a single execution increases. Or do you mean being able to perform parts of the query plan fully in parallel? If this, then one would need a lot more than ParallelSort... Cheers, mark -- Mark Mielke [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Sorting Improvements for 8.4
2007/12/19, Mark Mielke [EMAIL PROTECTED]: Jeff Davis wrote: My first thought would be that we would need a new executor node (e.g. ParallelSort) that would only be chosen when the cost of the sort is large enough to outweigh other factors (such as process creation time, dividing available work_mem, and any necessary IPC). It seems to me the simplest way to do it would be to allow each sub process to allocate work_mem/P where P is the degree of parallelization. However, that somewhat works against our schemes for dynamic run handling and forecasting, and may lead to more random reading from disk. Any other scheme I can think of would involve more IPC, which might make the idea just too complex. I am curious - what algorithms exist to efficiently do a parallel sort? Do you mean if sorting 1 million items, it is possible to separate this into 2 sets of 500 thousand each, execute them in separate threads (with task administration and synchronization overhead), combine the results, and complete the task in significantly less time than doing it in one thread? I am skeptical that this is possible, and suspect that the overall efficiency of the system would go down even if the throughput of a single execution increases. Ok - we want to sort a table with quicksort and we want to do it on N threads. Every thread gets begin and end indices into the table. The first step starts at 0 and ends at count - 1. A single step: find a pivot value and move lower values before it and bigger ones after. In the normal case we use a recursive call, so the same procedure is called for those two parts. With threads, we can instead put the indices on a side list and use a queue of threads to pick up data from the list. We can use a common table; access to the side list of indices has to be serialized. Or do you mean being able to perform parts of the query plan fully in parallel? If this, then one would need a lot more than ParallelSort... Nice to have, but rather for data warehouses. In other cases... 
I.e. - a backend for the Internet - there are many requests and every processor/core is kept nicely busy. -- Regards, Michał Zaborowski (TeXXaS) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Testing mail list
I'm receiving bogus bounce messages like this (which are malformed even, the Subject isn't properly encoded). I'm not sure what list is generating them or what address but if we can figure out who could we drop whoever it is from the list please? ---BeginMessage--- A mensagem de email enviada para [EMAIL PROTECTED] requer confirmação para ser entregue. Por favor, responda este e-mail informando os caracteres que você vê na imagem abaixo. The email message sent to [EMAIL PROTECTED] requires a confirmation to be delivered. Please, answer this email informing the characters that you see in the image below. Não remova a próxima linha / Don't remove next line captchakey:asbEJKOW1nbXcwMTY4Mjk ---End Message--- -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Mon, 2007-12-17 at 09:20, Simon Riggs wrote: On Sat, 2007-12-15 at 13:32 +0100, Albert Cervera i Areny wrote: Read-Only Tables Postgres supports the concept of freezing tuples, so they can live forever within the database without needing further writes. Currently there is no command that will guarantee that a table has been completely frozen. This makes it difficult to reliably write data files to WORM media for longer term archiving. (WORM means Write-Once, Read-Many). It's also a pain having to VACUUM a large table again just because a small number of rows need to be frozen. I'm not an expert at all, but I'd like to understand this: do you plan that READ-ONLY tables wouldn't even store transaction information? That should save quite a lot of space. Maybe when the table is moved to the compressed tablespace, MVCC information could be dropped too? Of course that would rule out future insert/update possibilities though. It could, but it's a lot of work for little gain. The tuple headers look like they will compress fairly well, so why bother to remove them at all? One place for removing them would be if we do column stores, where there would be one header per column instead of one per tuple. -- Hannu ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Sorting Improvements for 8.4
On Tuesday 18 December 2007 10:03:25 Dimitri Fontaine wrote: Hi, On Tuesday 18 December 2007, Ron Mayer wrote: Has anyone looked into sorting algorithms that could use more than one CPU or core at a time? [...] PS: Yeah, I know multi-threading is a hot-button on these lists; but sorting seems a relatively isolated part of the code, and I'd wonder if it'd be isolate-able enough that multiple CPUs could be used there. And before objections about multi-threading implementation and portability concerns arise, what about using a BSD-licensed, portable coroutine implementation such as Protothreads to have backend code use several CPUs at a time? http://www.sics.se/~adam/pt/ With such a tool, would it be possible to think about producer/consumer parallel execution for sorting, aggregate nodes or other parts of the executor? Hope this helps, regards, And remember: users don't care about portability issues, they care about performance. If multi-threading is a way to speed up sorting considerably, it should, IMHO, be considered seriously. -- Andreas Joseph Krogh ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Testing mail list
Gregory Stark wrote: I'm receiving bogus bounce messages like this (which are malformed even, the Subject isn't properly encoded). I'm not sure what list is generating them or what address but if we can figure out who could we drop whoever it is from the list please? Subject: Confirmação de envio / Sending confirmation (captchaid:1324333124c3) From: [EMAIL PROTECTED] The email message sent to [EMAIL PROTECTED] requires a confirmation to be delivered. Please, answer this email informing the characters that you see in the image below Messages like this are guaranteed an immediate entry in my junk filter. Use of this braindead software is bad enough, but being so clueless as not to whitelist a technical mailing list you subscribe to is truly horrible. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] buildenv.pl/buildenv.bat
On Fri, Dec 14, 2007 at 10:47:02AM -0500, Andrew Dunstan wrote: Magnus Hagander wrote: On Fri, Dec 14, 2007 at 03:39:14PM +0000, Dave Page wrote: Andrew Dunstan wrote: Writing and calling a temp .bat file might be yucky - having to keep two environment files is a lot more yucky, IMNSHO, and we shouldn't make people do it. +1 Ok, I guess I'm outvoted ;-) I don't care *that* much about it, so let's do it your way. You need help testing that patch further, or do you consider it ready for application? If so, go. Yes, please do test it - I don't use buildenv.* at all normally, since the buildfarm client takes care of its own environment, and that's what I usually use to build on Windows. It works in my two standalone environments (win32 and win64 machines), so I've applied the patch now to get full buildfarm testing. //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pgwin32_open returning EINVAL
On Sun, Dec 16, 2007 at 10:31:59AM +0000, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Andrew Dunstan [EMAIL PROTECTED] writes: Interesting. Maybe forever is going a bit too far, but retrying for n seconds or so. I think looping forever is the right thing. Having a fixed timeout just means Postgres will break sometimes instead of all the time. And it introduces non-deterministic behaviour too. Looping forever would be considered broken by a very large fraction of the community. Really? I understood we're talking about having Postgres fail with an error if any of its files are opened by another program such as backup software. So with a 30s limit it means Postgres might or might not fail depending on how long this other software has the file open. That doesn't seem like an improvement. If your software is locking a file for that long, that software is more than just broken, it's horribly broken. Having a workaround against something that might happen once or twice because of a bug in the other software is one thing, but if it's actually *designed* to do that you really need to get that software removed from your machine. Having the server completely stop for 30 seconds waiting for something to happen is bad enough, I think. Better to error out to let the user know that there's a major configuration problem on the machine. IIRC we have a 30-second timeout in rename() for Windows, and that seems to be working well enough, so I'd be inclined to copy the behavior for this case. I thought it was unlink, and the worst-case there is that we leak a file until some later time. I wasn't exactly following that case though. We do it on both rename and unlink. //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pgwin32_open returning EINVAL
On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Andrew Dunstan [EMAIL PROTECTED] writes: Interesting. Maybe forever is going a bit too far, but retrying for n seconds or so. I think looping forever is the right thing. Having a fixed timeout just means Postgres will break sometimes instead of all the time. And it introduces non-deterministic behaviour too. Looping forever would be considered broken by a very large fraction of the community. IIRC we have a 30-second timeout in rename() for Windows, and that seems to be working well enough, so I'd be inclined to copy the behavior for this case. Here's a patch that I think implements this ;) Alvaro - do you have a build env so you can test it? I can't reproduce the problem in my environment... Also, it currently just silently loops. Would it be interesting to ereport(WARNING) that it's looping on the open, to let the user know there's a problem? (Naturally, only warning the first time it tries it on each file, so we don't spam the log too hard) //Magnus

Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.22
diff -c -r1.22 open.c
*** src/port/open.c	30 Nov 2007 11:16:43 -0000	1.22
--- src/port/open.c	19 Dec 2007 12:42:05 -0000
***************
*** 58,65 ****
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int		fd;
! 	HANDLE	h;
  	SECURITY_ATTRIBUTES sa;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
--- 58,66 ----
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int		fd;
! 	HANDLE	h = INVALID_HANDLE_VALUE;
  	SECURITY_ATTRIBUTES sa;
+ 	int		loops = 0;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
***************
*** 71,77 ****
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	if ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					   (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					   ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
--- 72,78 ----
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	while ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					   (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					   ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
***************
*** 88,93 ****
--- 89,108 ----
  		((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
+ 		/*
+ 		 * Sharing violation or locking error can indicate antivirus, backup
+ 		 * or similar software that's locking the file. Try again for 30
+ 		 * seconds before giving up.
+ 		 */
+ 		if (GetLastError() == ERROR_SHARING_VIOLATION ||
+ 			GetLastError() == ERROR_LOCK_VIOLATION)
+ 		{
+ 			pg_usleep(100000);
+ 			loops++;
+ 			if (loops < 30)
+ 				continue;
+ 		}
+ 
  		_dosmaperr(GetLastError());
  		return -1;
  	}

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgwin32_open returning EINVAL
Magnus Hagander [EMAIL PROTECTED] writes: If your software is locking a file for that long, that software is more than just broken, it's horribly broken. Having a workaround against something that might happen once or twice because of a bug in the other software is one thing, but if it's actually *designed* to do that you really need to get that software removed from your machine. I was under the impression though that this was just how open worked on windows. Only one process can have a file open at the same time. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Testing mail list
Andrew Dunstan [EMAIL PROTECTED] writes: Receipt of messages like this is guaranteed an immediate entry in my junk filter. Use of this braindead software is bad enough, but being so clueless as not to whitelist a technical mailing list you subscribe to is truly horrible. It's worse than that in this case. This is an *impressively* broken configuration. What appears to be happening is that the mail server at this university is looking at the To and From headers and treating it as a personal email between those two addresses. It sends this captcha to the From header claiming that the person in the To header is insisting on the captcha being filled out. The first such bounce I looked at actually claimed it was on Tom's behalf! If I were the list maintainer here I would ban infotecnica.com.br addresses from subscribing to any of our lists. Ideally with a message saying "as a result of misconfigured mail software, addresses from infotecnica.com.br are banned from pgsql mailing lists. Please contact your postmaster to request they fix the problems." -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pgwin32_open returning EINVAL
On Wed, Dec 19, 2007 at 01:05:00PM +0000, Gregory Stark wrote: Magnus Hagander [EMAIL PROTECTED] writes: If your software is locking a file for that long, that software is more than just broken, it's horribly broken. Having a workaround against something that might happen once or twice because of a bug in the other software is one thing, but if it's actually *designed* to do that you really need to get that software removed from your machine. I was under the impression though that this was just how open worked on windows. Only one process can have a file open at the same time. Then you're wrong. You can open files in shared mode no problem - it's how we have multiple backends opening the same file. You do have to remember to specify the flag when you open the file - something backup software, for example, has always done, unless it's really crappily designed. Most proper backup software will use VSS these days, which makes the whole point moot. And antivirus is supposed to be entirely transparent, so it's definitely a bug when it happens. But we've seen a lot of buggy antivirus. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pgwin32_open returning EINVAL
On Wed, Dec 19, 2007 at 11:23:35AM -0300, Alvaro Herrera wrote: Magnus Hagander wrote: On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Andrew Dunstan [EMAIL PROTECTED] writes: Interesting. Maybe forever is going a bit too far, but retrying for n seconds or so. I think looping forever is the right thing. Having a fixed timeout just means Postgres will break sometimes instead of all the time. And it introduces non-deterministic behaviour too. Looping forever would be considered broken by a very large fraction of the community. IIRC we have a 30-second timeout in rename() for Windows, and that seems to be working well enough, so I'd be inclined to copy the behavior for this case. Here's a patch that I think implements this ;) Alvaro - do you have a build env so you can test it? I can't reproduce the problem in my environment... Thanks -- forwarded to the appropriate parties. :-) Thanks. Let us know the results :-) Also, it currently just silently loops. Would it be interesting to ereport(WARNING) that it's looping on the open, to let the user know there's a problem? (Naturally, only warning the first time it tries it on each file, so we don't spam the log too hard) Yeah, I think it would be useful to log one message if after (say) 5 seconds we still haven't been able to open the file. Either that, or on the first run. Is the sleep time correct? If I'm reading it right, it sleeps 100 ms each time, 30 times, that totals 3 seconds ... ? Uh, I copied that from pgunlink() and pgrename(), but forgot a zero on the loop. It's supposed to loop 300 times. (Are we OK with the idea of sleeping 1 second each time?) I think not. 0.1 seconds is better. We don't want to delay a full second if it's just a transient thing. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pgwin32_open returning EINVAL
Magnus Hagander wrote: On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Andrew Dunstan [EMAIL PROTECTED] writes: Interesting. Maybe forever is going a bit too far, but retrying for n seconds or so. I think looping forever is the right thing. Having a fixed timeout just means Postgres will break sometimes instead of all the time. And it introduces non-deterministic behaviour too. Looping forever would be considered broken by a very large fraction of the community. IIRC we have a 30-second timeout in rename() for Windows, and that seems to be working well enough, so I'd be inclined to copy the behavior for this case. Here's a patch that I think implements this ;) Alvaro - do you have a build env so you can test it? I can't reproduce the problem in my environment... Thanks -- forwarded to the appropriate parties. :-) Also, it currently just silently loops. Would it be interesting to ereport(WARNING) that it's looping on the open, to let the user know there's a problem? (Naturally, only warning the first time it tries it on each file, so we don't spam the log too hard) Yeah, I think it would be useful to log one message if after (say) 5 seconds we still haven't been able to open the file. Is the sleep time correct? If I'm reading it right, it sleeps 100 ms each time, 30 times, that totals 3 seconds ... ? (Are we OK with the idea of sleeping 1 second each time?) -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgwin32_open returning EINVAL
Yeah, I think it would be useful to log one message if after (say) 5 seconds we still haven't been able to open the file. Either that, or on the first run. Imho 1-5s is better, so that would be after the 10-50th try. loop. It's supposed to loop 300 times. Yes. (Are we OK with the idea of sleeping 1 second each time?) I think not. 0.1 seconds is better. We don't want to delay a full second if it's just a transient thing. Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds (http://support.microsoft.com/kb/316609) Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pgwin32_open returning EINVAL
Zeugswetter Andreas ADI SD wrote: Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds (http://support.microsoft.com/kb/316609) We have seen cases in the past where these locks last quite a long time. That 30s total timeout in rename and unlink was not chosen arbitrarily - it's based on experience. I think a retry interval of 0.1s here should be OK. cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgwin32_open returning EINVAL
On Wed, Dec 19, 2007 at 04:20:23PM +0100, Zeugswetter Andreas ADI SD wrote: Yeah, I think it would be useful to log one message if after (say) 5 seconds we still haven't been able to open the file. Either that, or on the first run. Imho 1-5s is better, so that would be after the 10-50th try. Ok, so I'll put in a warning after 50 tries. loop. It's supposed to loop 300 times. Yes. (Are we OK with the idea of sleeping 1 second each time?) I think not. 0.1 seconds is better. We don't want to delay a full second if it's just a transient thing. Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds (http://support.microsoft.com/kb/316609) Their sample code sleeps for 0.25 seconds though. (Sleep() takes milliseconds). So we're definitely fine with 0.1sec I think. //Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Benchmark for GiST index?
What kind of benchmark? I have a benchmark for text search http://www.sigaev.ru/cvsweb/cvsweb.cgi/ftsbench/ and several others based on it. Koichi Suzuki wrote: Hi, Does anybody have any information about GiST index benchmarks? I'm planning to run a GiST benchmark and analyze the WAL structure to see PG's dynamic behavior. Regards; -- Koichi Suzuki ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] pgwin32_open returning EINVAL
On Wed, Dec 19, 2007 at 04:44:12PM +0100, Magnus Hagander wrote: On Wed, Dec 19, 2007 at 04:20:23PM +0100, Zeugswetter Andreas ADI SD wrote: Yeah, I think it would be useful to log one message if after (say) 5 seconds we still haven't been able to open the file. Either that, or on the first run. Imho 1-5s is better, so that would be after the 10-50th try. Ok, so I'll put in a warning after 50 tries. Updated version attached. Comments on the wording of the messages are also welcome ;-) //Magnus

Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.22
diff -c -r1.22 open.c
*** src/port/open.c	30 Nov 2007 11:16:43 -0000	1.22
--- src/port/open.c	19 Dec 2007 15:50:06 -0000
***************
*** 13,19 ****
  #ifdef WIN32

! #include "c.h"

  #include <windows.h>
  #include <fcntl.h>
--- 13,23 ----
  #ifdef WIN32

! #ifndef FRONTEND
! #include "postgres.h"
! #else
! #include "postgres_fe.h"
! #endif

  #include <windows.h>
  #include <fcntl.h>
***************
*** 58,65 ****
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int		fd;
! 	HANDLE	h;
  	SECURITY_ATTRIBUTES sa;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
--- 62,70 ----
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int		fd;
! 	HANDLE	h = INVALID_HANDLE_VALUE;
  	SECURITY_ATTRIBUTES sa;
+ 	int		loops = 0;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
***************
*** 71,77 ****
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	if ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					   (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					   ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
--- 76,82 ----
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	while ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					   (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					   ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
***************
*** 88,93 ****
--- 93,121 ----
  		((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
+ 		/*
+ 		 * Sharing violation or locking error can indicate antivirus, backup
+ 		 * or similar software that's locking the file. Try again for 30
+ 		 * seconds before giving up.
+ 		 */
+ 		if (GetLastError() == ERROR_SHARING_VIOLATION ||
+ 			GetLastError() == ERROR_LOCK_VIOLATION)
+ 		{
+ 			pg_usleep(100000);
+ 			loops++;
+ 
+ #ifndef FRONTEND
+ 			if (loops == 50)
+ 				ereport(WARNING,
+ 						(errmsg("could not open file \"%s\" due to sharing violation. Will retry for 30 seconds.",
+ 								fileName),
+ 						 errhint("You may have antivirus, backup or similar software interfering with the database.")));
+ #endif
+ 
+ 			if (loops < 300)
+ 				continue;
+ 		}
+ 
  		_dosmaperr(GetLastError());
  		return -1;
  	}

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Sorting Improvements for 8.4
Andreas Joseph Krogh [EMAIL PROTECTED] writes: And remember; Users don't care about portability-issues, they care about performance. Nonsense. If Postgres stops working on their machine, they'll care. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgwin32_open returning EINVAL
Zeugswetter Andreas ADI SD wrote: I think not. 0.1 seconds is better. We don't want to delay a full second if it's just a transient thing. Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds (http://support.microsoft.com/kb/316609) Hm, the article only mentions ERROR_SHARING_VIOLATION but we're also considering ERROR_LOCK_VIOLATION. Should we retry only on the SHARING violation? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Testing mail list
Gregory Stark [EMAIL PROTECTED] writes: It's worse than that in this case. This is an *impressively* broken configuration. Understatement of the week. The mail includes absolutely no evidence about what message is allegedly being filtered. Are you sure that this is really a filtering engine at all, and not just random spam hoping to draw responses from careless people? I've heard of web comment-spammers who try to get other people to decode captchas for them this way. Adding to my suspicion is that I don't recall having seen one of these personally, and if it were really tied to posting on any of the PG lists, I shoulda seen a lot ;-) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Testing mail list
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: It's worse than that in this case. This is an *impressively* broken configuration. Understatement of the week. The mail includes absolutely no evidence about what message is allegedly being filtered. Are you sure that this is really a filtering engine at all, and not just random spam hoping to draw responses from careless people? I've heard of web comment-spammers who try to get other people to decode captchas for them this way. Adding to my suspicion is that I don't recall having seen one of these personally, and if it were really tied to posting on any of the PG lists, I shoulda seen a lot ;-) Yeah, I think it comes from pgsql-performance. I just got one mentioning an address to which I had responded some minutes before. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Testing mail list
On Wed, Dec 19, 2007 at 11:15:37AM -0500, Tom Lane wrote: hoping to draw responses from careless people? I've heard of web comment-spammers who try to get other people to decode captchas for them this way. Yes. This is the latest spammer trick. They get people all over the globe to decode the captchas. It's way easier than programming to decode the captchas (which itself isn't that hard -- there are plenty of toolkits out there that will decode such things for you). A ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Testing mail list
I wrote: Adding to my suspicion is that I don't recall having seen one of these personally, I take that back --- some digging in my mail logs shows that I have gotten a few of these, but they went straight to /dev/null because my spam filters thought they were a virus. Have you checked whether that gif is really an image, rather than a bit of malware? The mail-log trace of the last such attempt is pretty interesting too:

Dec 16 13:05:16 sss2 sm-mta[27362]: lBGI5G1g027362: infotecnica.com.br [201.35.247.5] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Dec 16 13:05:16 sss2 sm-mta[27363]: lBGI5GFn027363: infotecnica.com.br [201.35.247.5] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Dec 16 13:05:17 sss2 sm-mta[27365]: lBGI5HIe027365: infotecnica.com.br [201.35.247.5] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Dec 16 13:05:52 sss2 sm-mta[27368]: lBGI5n2G027368: from=[EMAIL PROTECTED], size=27892, class=0, nrcpts=1, msgid=[EMAIL PROTECTED], bodytype=8BITMIME, proto=ESMTP, daemon=MTA, relay=infotecnica.com.br [201.35.247.5]
Dec 16 13:05:52 sss2 sm-mta[27369]: lBGI5n2G027368: to=|/usr/local/bin/procmail -tYf- || exit 75 #tgl, ctladdr=[EMAIL PROTECTED] (301/20), delay=00:00:02, xdelay=00:00:00, mailer=prog, pri=58095, dsn=2.0.0, stat=Sent

Since 11 December there are consistently three no-op connections before anything actually happens, which adds a whole new layer of incompetence that could be charged against whoever is running this, if it actually is a mail server --- which I grow increasingly dubious of. I also see a whole lot of connection attempts in the preceding months in which nothing was *ever* sent, just did not issue MAIL reports in bursts of three. Looks like spamhaus.org was blocking them for portions of last month, too, so other people have been unhappy about this as well. Whoever these people are, I've seen enough; I'm off to add this IP address to my local permanent blacklist. 
regards, tom lane
Re: [HACKERS] Sorting Improvements for 8.4
Michał Zaborowski wrote: Ok - we want to sort a table with quicksort, and we want to do it on N threads. Every thread gets the begin and end indices of its part of the table. The first step starts at 0 and ends at count - 1. Single step: find the median value and move lower elements before it and bigger ones after it. In the normal case we use a recursive call, so the same procedure is called for those two parts. With threads we can put the index ranges on a side list, and use a queue of threads to pick up work from that list. We can use a common table; access to the side list of indices has to be serialized. Stupid question #2: Is it well recognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O? It would seem to me that any sort worth parallelizing (given the administrative and synchronization overhead) must have data larger than the L2 cache. If larger than the L2 cache, it runs at real memory speed. If real memory speed, wouldn't one CPU, without hardware synchronization, be able to fill the memory read/write pipe? If we 'divide and conquer' to parallelize, wouldn't the values written from one thread often (1 / N of the time) need to be read from another thread, requiring hardware data synchronization? I see the wikipedia.org page describes how easy it is to parallelize quicksort, and scale performance linearly with the number of processors, but I don't see references to back this claim. At least some of these steps seem difficult or impractical to parallelize. For example, the initial partition reorder that moves items lower than the pivot to the left, and items higher than the pivot to the right, would not be easy to parallelize using an in-place re-order. It needs to move one partition down before it can 'divide and conquer'. They say no synchronization is required, but I think they are missing the hardware synchronization required (especially in the innermost loops, where the thread task becomes shorter and starts to fit in L1/L2).
They say linear, but then talk about a 'new thread being created'. New thread creation has a cost, and if reduced to using a thread pool, then synchronization *is* required. It sounds like a 'nice in theory' idea. :-) Which doesn't mean it is wrong... I am curious enough to write a test... Or do you mean being able to perform parts of the query plan fully in parallel? If this, then one would need a lot more than ParallelSort... Nice to have, but rather for data warehouses. In other cases... i.e. a backend for the Internet - there are many requests and every processor / core works nicely. I'm a fan of the 'each plan item is a task, that is assigned to the pool, with each CPU grabbing tasks from the pool'. Another 'nice in theory' idea (used by DB2?). As it is, though, I think PostgreSQL planning is heavily designed to maximize performance on a single CPU, and single queries would not easily scale to multiple CPUs. (Perhaps hashing could be done on another CPU, or as you describe above, sorting) Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
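Michał's scheme above (a shared table, a side list of index ranges, serialized access to that list, a pool of threads pulling work from it) might be sketched like this. This is an illustration with names of my own choosing, not code from the thread; also note that CPython threads serialize on the GIL, so this demonstrates the bookkeeping rather than any real speedup:

```python
import threading
from queue import Queue, Empty

def parallel_quicksort(data, nthreads=4):
    """The table is shared; index ranges go on a side list (here a Queue);
    each worker pops a range, partitions it, and pushes the two sub-ranges
    back for any idle worker to pick up."""
    work = Queue()
    work.put((0, len(data) - 1))
    pending = [1]                  # count of ranges not yet fully processed
    lock = threading.Lock()

    def worker():
        while True:
            with lock:
                if pending[0] == 0:
                    return         # no outstanding ranges anywhere: done
            try:
                lo, hi = work.get(timeout=0.01)
            except Empty:
                continue           # another worker may still add ranges
            if lo < hi:
                # Lomuto partition: move lower elements before the pivot
                pivot, i = data[hi], lo - 1
                for j in range(lo, hi):
                    if data[j] <= pivot:
                        i += 1
                        data[i], data[j] = data[j], data[i]
                data[i + 1], data[hi] = data[hi], data[i + 1]
                with lock:
                    pending[0] += 2        # two new sub-ranges...
                work.put((lo, i))          # ...queued for any thread
                work.put((i + 2, hi))
            with lock:
                pending[0] -= 1            # this range is done

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data
```

Note that the serialized counter and queue are exactly the synchronization Mark is pointing at: the side list must be protected, and the cost of that protection grows as the ranges shrink.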
Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)
I have submitted the first working patch for the trailing null optimization. It currently does the following: a) Doesn't store the null bitmap, if the heap tuple / index tuple contains only trailing nulls b) In a heap tuple, the trailing nulls won't occupy space in the null bitmap. The general design is like this: a) After checking for trailing nulls, I reduce the number-of-attributes field, which gets stored in each heap tuple. b) For the index, I have changed index_form_tuple to store the unaligned total size in the size mask. While navigating through the index tuple, if the offset exceeds the unaligned total size stored, then a null is returned. Please review it and provide suggestions. I doubt you have fixed it; I doubt it's *possible* to fix it without significant rejiggering of the IndexTuple representation. The problem is that IndexTuple lacks a number-of-fields field, so there is no place to indicate how many null bitmap bits you have actually stored. Actually I have made one change to the structure of IndexTupleData. Instead of storing the aligned size in the size mask, I have stored the unaligned size, i.e. the size before the final MAXALIGN. The interface remains unchanged: IndexTupleSize does a MAXALIGN before returning the size value. The advantage of storing the unaligned size is that we can get both the aligned and the unaligned size (as you may know). I have created two more macros to return the unaligned size. I would suggest forgetting that part and submitting the part that has some chance of getting accepted. Actually I want to submit the patch which is best according to me. I suspect there's also an awkward case that *does* need to be handled when you insert a tuple which has a null column which you're leaving out of the tuple but which appears in an index. You would have to make sure that the index tuple has that datum listed as NULL even though it's entirely missing from the heap tuple.
Actually this is taken care of because of your suggestion. When you add a new column, it doesn't appear in the heap tuple, but if you create an index on that column afterwards, the case is handled. There is a field in HeapTuple which gives the number of attributes in the tuple; if we request an attribute number greater than this, it is returned as null. So that problem is taken care of. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com) trailing-nulls.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] Sorting Improvements for 8.4
On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote: Stupid question #2: Is it well recognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O? I think it depends a lot on several factors. It's probably a different bottleneck for integers versus localized text, and depends on the available memory and I/O characteristics. It would seem to me that any sort worth parallelizing (administrative and synchronization overhead), must have data larger than the L2 cache. If larger than the L2 cache, it becomes real memory speed. If real memory speed, wouldn't one CPU without hardware synchronization, be able to fill the memory read/write pipe? We do an external merge sort, which involves merging M runs together. You seem to be implying that we can generate the output run at disk speed, and therefore the CPU speed is unimportant. I suspect that the comparison costs are enough that the above statement isn't true in all cases, particularly in the case of localized text. Also, there is probably a lot of memory copying going on, and that probably destroys a lot of the effectiveness of L2 caching. When L2 caching is ineffective, the CPU spends a lot of time just waiting on memory. In that case, it's better to have P threads of execution all waiting on memory operations in parallel. This would explain why 1p2t would outperform a 1p1t in Ron's reference above. These are just my first thoughts, however. There is a lot of existing research out there that we can look into, and also a lot of tests that we can run before jumping into this. I think parallel sorting can be looked into separately from the other sorting improvements. Regards, Jeff Davis
Re: [HACKERS] Sorting Improvements for 8.4
Mark Mielke wrote: I am curious - what algorithms exist to efficiently do a parallel sort? Do you mean if sorting 1 million items, it is possible to separate this into 2 sets of 500 thousand each, execute them in separate threads (with task administration and synchronization overhead), combine the results, and complete the task in significantly less time than doing it in one thread? I am skeptical that this is possible... The link in the beginning of the thread points to articles that seem to describe one such algorithm, along with benchmarks. (http://tinyurl.com/3bvu4u, http://tinyurl.com/32wg2m) The improvements were pretty consistent for set sizes ranging from very small sets (hundreds) to quite large ones (hundreds of K). Interestingly, even multi-threading helped a lot. Our tests correlate well with previous research that showed Intel’s implementation of SMT (Hyper-Threading) to be adept at hiding this latency [6, 20, 12]. Table 4 shows that by having two threads access memory at the same time, performance improved over 80% when compared to the single-threaded version. It uses both quicksort phases and merge phases; for the merge phase, using 2 CPUs (no hyperthreading) apparently gave more than a 2X speed improvement, apparently because it could overlap memory access with CPU work. Or do you mean being able to perform parts of the query plan fully in parallel? If this, then one would need a lot more than ParallelSort... I wouldn't recommend that - it seems like a Hard Problem. My guess is that the best way to use multiple threads in one backend would be to find specific algorithms like sorting that would be easier to isolate.
Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)
Gokulakannan Somasundaram wrote: I would suggest forgetting that part and submitting the part that has some chance of getting accepted. Actually I want to submit the patch which is best according to me. That's not an attitude that is likely to succeed - you need to take suggestions from Tom very seriously. Also, please submit patches as context diffs, as set out in the Developer FAQ, which you should probably read carefully: http://www.postgresql.org/docs/faqs.FAQ_DEV.html cheers andrew
Re: [HACKERS] Proposal for Null Bitmap Optimization(for TrailingNULLs)
On Wed, 19 Dec 2007 13:46:15 -0500 Andrew Dunstan [EMAIL PROTECTED] wrote: I would suggest forgetting that part and submitting the part that has some chance of getting accepted. Actually I want to submit the patch which is best according to me. You do need to be able to feel that your work is up to a standard that you find redeemable. However... That's not an attitude that is likely to succeed - you need to take suggestions from Tom very seriously. Andrew is absolutely correct here. If you do not agree with Tom, you had best prove why. Otherwise your patch will likely be ignored on submission. Sincerely, Joshua D. Drake -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'
Re: [HACKERS] Sorting Improvements for 8.4
Ron Mayer wrote: The link in the beginning of the thread points to articles that seem to describe one such algorithm, along with benchmarks. (http://tinyurl.com/3bvu4u, http://tinyurl.com/32wg2m) The improvements were pretty consistent for set sizes ranging from very small sets (hundreds) to quite large ones (hundreds of K). Interestingly, even multi-threading helped a lot. Our tests correlate well with previous research that showed Intel’s implementation of SMT (Hyper-Threading) to be adept at hiding this latency [6, 20, 12]. Table 4 shows that by having two threads access memory at the same time, performance improved over 80% when compared to the single-threaded version. It uses both quicksort phases and merge phases; for the merge phase, using 2 CPUs (no hyperthreading) apparently gave more than a 2X speed improvement, apparently because it could overlap memory access with CPU work. Good points. I had forgotten about DDR and DDR2 having high throughput at the cost of high latency. Somewhere in there, having the most memory requests in the queue would allow the hardware to eliminate this high-latency effect. Or do you mean being able to perform parts of the query plan fully in parallel? If this, then one would need a lot more than ParallelSort... I wouldn't recommend that - it seems like a Hard Problem. My guess is that the best way to use multiple threads in one backend would be to find specific algorithms like sorting that would be easier to isolate. Also a good point. :-) Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] Sorting Improvements for 8.4
Jeff Davis wrote: On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote: Stupid question #2: Is it well recognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O? I think it depends a lot on several factors. It's probably a different bottleneck for integers versus localized text, and depends on the available memory and I/O characteristics. Makes sense. It would seem to me that any sort worth parallelizing (administrative and synchronization overhead), must have data larger than the L2 cache. If larger than the L2 cache, it becomes real memory speed. If real memory speed, wouldn't one CPU without hardware synchronization, be able to fill the memory read/write pipe? We do an external merge sort, which involves merging M runs together. You seem to be implying that we can generate the output run at disk speed, and therefore the CPU speed is unimportant. Correct. Or, alternatively, you could achieve the same effect using asynchronous I/O or read-ahead. I suspect that the comparison costs are enough that the above statement isn't true in all cases, particularly in the case of localized text. That sounds possible, but I still feel myself suspecting that disk reads will be much slower than localized text comparison. Perhaps I am overestimating the performance of the comparison function? Also, there is probably a lot of memory copying going on, and that probably destroys a lot of the effectiveness of L2 caching. When L2 caching is ineffective, the CPU spends a lot of time just waiting on memory. In that case, it's better to have P threads of execution all waiting on memory operations in parallel. I didn't consider the high throughput / high latency effect. This could be true if the CPU prefetch isn't effective enough. This would explain why 1p2t would outperform a 1p1t in Ron's reference above. These are just my first thoughts, however.
There is a lot of existing research out there that we can look into, and also a lot of tests that we can run before jumping into this. I think parallel sorting can be looked into separately from the other sorting improvements. Yep - I started to read up on it. It still sounds like it's a hard-ish problem (to achieve near N times speedup for N CPU cores without degrading performance for existing loads), but that doesn't mean impossible. :-) Cheers, mark -- Mark Mielke [EMAIL PROTECTED]
Re: [HACKERS] Sorting Improvements for 8.4
As long as sorting improvements are being considered, may I suggest an experiment that uses a very simple model? Assuming that you have K subfiles created by the initial sorting pass, insert the top record of each file into a priority queue. Then, emit records from the queue until the priority queue is empty. Now, there will be the objection that we will be jumping willy-nilly all over the disk because of reading one record at a time, but (depending on how it is implemented) generally several records are buffered during a read. So (as a gentle suggestion) I suggest testing the model. It works great for a single CPU or multiple CPU system for the work that *I* do. I have no idea if it will be a benefit for PostgreSQL or not, but it should be a very simple matter to try it. As long as someone is doing the work right now, it would be a good time to give it a go. I am not very familiar with PostgreSQL internals, but I would be willing to give a hand with it (not really sure how much time I can guarantee, though, since I would be doing it on my free time).
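Dann's model is the textbook K-way merge; a minimal in-memory sketch (my own illustration, with Python lists standing in for the K sorted subfiles and the stdlib heapq as the priority queue):

```python
import heapq

def kway_merge(runs):
    """Merge K sorted runs into one sorted stream.

    Seed the priority queue with the top record of each run, then
    repeatedly emit the smallest record and replace it with the next
    record from the same run, until every run is exhausted."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, run_idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[run_idx]):
            nxt = runs[run_idx][pos + 1]
            heapq.heappush(heap, (nxt, run_idx, pos + 1))
    return out
```

For example, `kway_merge([[1, 4, 7], [2, 5], [0, 3, 6]])` yields `[0, 1, 2, 3, 4, 5, 6, 7]`. In a real external sort the "next record from the same run" step is where the buffered subfile reads Dann mentions would happen.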
Re: [HACKERS] Sorting Improvements for 8.4
On Wed, 2007-12-19 at 15:51 -0500, Mark Mielke wrote: That sounds possible, but I still feel myself suspecting that disk reads will be much slower than localized text comparison. Perhaps I am overestimating the performance of the comparison function? I think this simple test will change your perceptions: Do an initdb with --locale=en_US.UTF-8 and start postgres. test= create table sorter(t text, b bytea, f float); CREATE TABLE test= insert into sorter select r AS rt, r::text::bytea AS rb, r AS rf from (select random() as r from generate_series(1,100)) a; INSERT 0 100 test= select pg_size_pretty(pg_total_relation_size('sorter')); pg_size_pretty 70 MB (1 row) test= explain analyze select * from sorter order by t; test= explain analyze select * from sorter order by b; test= explain analyze select * from sorter order by f; On my machine this table fits easily in memory (so there aren't any disk reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary data, and 20 seconds for localized text. That's much longer than it would take to read that data from disk, since it's only 70MB (which takes a fraction of a second on my machine). I think this disproves your hypothesis that sorting happens at disk speed. Yep - I started to read up on it. It still sounds like it's a hard-ish problem (to achieve near N times speedup for N CPU cores without degrading performance for existing loads), but that doesn't mean impossible. :-) You don't even need multiple cores to achieve a speedup, according to Ron's reference. Regards, Jeff Davis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Sorting Improvements for 8.4
On Wed, 2007-12-19 at 14:41 -0800, Dann Corbit wrote: As long as sorting improvements are being considered, may I suggest an experiment that uses a very simple model? Assuming that you have K subfiles created by the initial sorting pass, insert the top record of each file into a priority queue. Then, emit records from the queue until the priority queue is empty. What is the principal difference between that idea and our existing sort algorithm? There's a good explanation in the comment at the top of tuplesort.c. Regards, Jeff Davis
Re: [HACKERS] Sorting Improvements for 8.4
-Original Message- From: Jeff Davis [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 19, 2007 3:10 PM To: Dann Corbit Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Sorting Improvements for 8.4 On Wed, 2007-12-19 at 14:41 -0800, Dann Corbit wrote: As long as sorting improvements are being considered, may I suggest an experiment that uses a very simple model? Assuming that you have K subfiles created by the initial sorting pass, insert the top record of each file into a priority queue. Then, emit records from the queue until the priority queue is empty. What is the principal difference between that idea and our existing sort algorithm? There's a good explanation in the comment at the top of tuplesort.c. According to the comments, PostgreSQL uses replacement selection. Replacement selection is a wonderful thing because it creates runs that are twice as long as normal due to the snowplow effect. See (for instance): http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/69/27216/01209012.pdf Then, the merge routine will have half as many runs to merge the files together. So (for instance) without replacement selection, if you create 1024 subfiles, then replacement selection will create 512. That saves one merge pass. The algorithm that I am suggesting will take exactly one pass to merge all of the files. It works like this... Imagine an array of pointers to the subfiles: [*subfile][*subfile]...[*subfile] Step 0: We sort the array by a comparison operator that examines the top element of each subfile. So now the array is ordered such that the record with the smallest key is in array slot 0. Step 1: We remove the first record from the subfile in array slot 0. Now, the priority of the first element *may* have changed. So if it is no longer smaller than the subfile immediately to the right, we do a binary insertion to put this subfile in its new location, moving the contents of array slot 1 to array slot 0 if needed.
Step 2: Is the entire list of subfiles empty? If yes, then terminate; if no, then go to Step 1. Like I said, it is ultra-simple, and it sorts the entire contents of all subfiles to the output with a single pass. Consider the way that current replacement selection works. The actual O(f(N)) behavior of replacement selection is just terrible: O(n^2). But because we save one full merge pass, it is usually worth it anyway, since memory access is much faster than disk. And if we only have a few subfiles, the savings will be large. In the case of a priority queue merge, we only have one single merge pass no matter how many subfiles there are.
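Steps 0-2 above can be sketched directly: keep an array of run cursors ordered by each run's head record, always take from slot 0, and binary-insert the cursor back at its new position. This is my own illustration (in-memory lists stand in for the subfiles, and bisect.insort does the binary insertion):

```python
from bisect import insort

def merge_sorted_array(runs):
    """Single-pass merge per Steps 0-2: an array of (head, run, pos)
    cursors kept ordered by head value, so slot 0 always holds the
    subfile whose next record is smallest."""
    # Step 0: sort the cursor array by each run's first record.
    cursors = sorted((run[0], i, 0) for i, run in enumerate(runs) if run)
    out = []
    while cursors:
        # Step 1: emit the head record of the front subfile...
        value, i, pos = cursors.pop(0)
        out.append(value)
        # ...then binary-insert that subfile back with its new head.
        if pos + 1 < len(runs[i]):
            insort(cursors, (runs[i][pos + 1], i, pos + 1))
        # Step 2: loop until every subfile is empty.
    return out
```

With K subfiles the binary insertion is O(log K) comparisons per record (plus the array shift), so the whole merge is one pass over the data, exactly as described.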
Re: [HACKERS] Sorting Improvements for 8.4
P.S. A beautiful paper on replacement selection is found here: http://students.fim.uni-passau.de/~fickensc/Proseminar/Proseminar.pdf
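The snowplow effect the paper describes is easy to demonstrate. Below is a sketch of textbook replacement selection, my own illustration (names invented, and it assumes the input has at least heap_size records): with a heap of M records and random input, the runs it forms average about 2M records.

```python
import heapq

def replacement_selection(records, heap_size):
    """Form initial sorted runs with a selection heap of `heap_size` records.
    Pop the smallest heap record into the current run; if the next input
    record is >= the record just written it can join the current run,
    otherwise it is held back to seed the next run."""
    it = iter(records)
    heap = [next(it) for _ in range(heap_size)]
    heapq.heapify(heap)
    runs, current, deferred = [], [], []
    for rec in it:
        smallest = heapq.heappop(heap)
        current.append(smallest)
        if rec >= smallest:
            heapq.heappush(heap, rec)   # still fits in the current run
        else:
            deferred.append(rec)        # held back for the next run
            if not heap:                # heap drained: close this run
                runs.append(current)
                current = []
                heap, deferred = deferred, []
                heapq.heapify(heap)
    # Input exhausted: the heap's remaining records finish the current run.
    current.extend(heapq.nsmallest(len(heap), heap))
    runs.append(current)
    if deferred:                        # records held back at the very end
        runs.append(sorted(deferred))
    return runs
```

Every run it emits is sorted, and on random input the runs come out noticeably longer than heap_size, which is exactly the saved merge pass Dann describes.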
Re: [HACKERS] pgwin32_open returning EINVAL
Magnus Hagander [EMAIL PROTECTED] writes: Updated version attached. Comments on the wording of the messages are also welcome ;-) Well, since you asked ;-) ... I don't particularly like this. 1. Doesn't distinguish SHARING_VIOLATION from LOCK_VIOLATION. We might want to know that. 2. Do we really want this to be WARNING? LOG seems a better idea, since it's not warning about anything the client app did wrong. 3. Doesn't follow the message style guide: shoehorning two sentences into errmsg() is surely not compliant. Perhaps errmsg: could not open file \"%s\": sharing violation OR errmsg: could not open file \"%s\": lock violation errdetail: Continuing to retry for up to 30 seconds. errhint: OK as you have it. regards, tom lane
Re: [HACKERS] Sorting Improvements for 8.4
On Wed, 2007-12-19 at 15:19 -0800, Dann Corbit wrote: The algorithm that I am suggesting will take exactly one pass to merge all of the files. From tuplesort.c: In the current code we determine the number of tapes M on the basis of workMem: we want workMem/M to be large enough that we read a fair amount of data each time we preread from a tape, so as to maintain the locality of access described above. Nonetheless, with large workMem we can have many tapes. It seems like you are just choosing M to be equal to the number of initial runs, whereas the current code takes into account the cost of having workMem/M too small. We do want to increase the number of runs that can be merged at once; that's what dynamic run handling and forecasting are all about. But we want to avoid unnecessary seeking, also. Regards, Jeff Davis
Re: [HACKERS] Sorting Improvements for 8.4
Mark Mielke [EMAIL PROTECTED] writes: Jeff Davis wrote: Also, there is probably a lot of memory copying going on, and that probably destroys a lot of the effectiveness of L2 caching. When L2 caching is ineffective, the CPU spends a lot of time just waiting on memory. In that case, it's better to have P threads of execution all waiting on memory operations in parallel. I didn't consider the high throughput / high latency effect. This could be true if the CPU prefetch isn't effective enough. Note that if this is the argument, then there's a ceiling on the speedup you can expect to get: it's just the extent of mismatch between the CPU and memory speeds. I can believe that suitable test cases would show 2X improvement for 2 threads, but it doesn't follow that you will get 10X improvement with 10 threads, or even 4X with 4. regards, tom lane
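Tom's ceiling argument can be put into a toy formula (my own illustration, not from the thread): if a fraction s of single-threaded runtime is memory stall that additional threads can overlap, and the remaining 1 - s is compute they cannot, the best-case speedup with t threads is Amdahl-like, with a hard ceiling of 1 / (1 - s) no matter how many threads you add.

```python
def latency_hiding_speedup(stall_fraction, threads):
    """Toy model: `stall_fraction` of single-thread runtime is memory
    stalls that extra threads can overlap; the rest is compute that they
    cannot. Returns the best-case speedup for `threads` threads."""
    return 1.0 / ((1.0 - stall_fraction) + stall_fraction / threads)
```

For instance, with s = 0.6, two threads give at most about 1.43x, ten threads about 2.17x, and the ceiling is 1 / 0.4 = 2.5x; which is Tom's point that 2X at 2 threads does not imply 10X at 10.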
Re: [HACKERS] Sorting Improvements for 8.4
Jeff Davis [EMAIL PROTECTED] writes: test=# explain analyze select * from sorter order by t; test=# explain analyze select * from sorter order by b; test=# explain analyze select * from sorter order by f; On my machine this table fits easily in memory (so there aren't any disk reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary data, and 20 seconds for localized text. That's much longer than it would take to read that data from disk, since it's only 70MB (which takes a fraction of a second on my machine). I think this disproves your hypothesis that sorting happens at disk speed. I suspect most of that is spent just copying the data around. Which would not be helped by having multiple threads doing the copying -- and in fact might be exacerbated if it required an extra copy to consolidate all the data in the end. How long does an explain analyze of a simple select * from sorter take? And assuming you're doing disk sorts (in disk cache), you're doing quite a lot of copying to temporary files (in disk cache) and then back to memory. Note that speeding up a query from 20s to 5s isn't terribly useful. If it's OLTP you can't be using all your cores for each user anyways. And if it's DSS, 20s isn't a problem. Where parallel processing like this becomes attractive is when you're running a 2 hour query on a machine sequentially running scheduled batch jobs, which can be sped up to 30 minutes. But in that case you're almost certainly being limited by your disk bandwidth, not your cpu speed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] New style of hash join proposal
On Dec 13, 2007, at 7:13 AM, Gregory Stark wrote: We currently execute a lot of joins as Nested Loops which would be more efficient if we could batch together all the outer keys and execute a single inner bitmap index scan for all of them together. Essentially what I'm saying is that we're missing a trick with Hash Joins, which currently require that we can execute the inner side once without any parameters from the outer side. Instead what we could do is build up the hash table, then scan the hash table building up an array of keys and pass them as a parameter to the inner side. The inner side could do a bitmap index scan to fetch them all at once and start returning them just as normal to the hash join. There are a couple of details: 1) Batched hash joins. Actually I think this would be fairly straightforward. You want to rescan the inner side once for each batch. That would actually be easier than what we currently do with saving tuples to files and all that. 2) How to pass the keys. This could be a bit tricky, especially for multi-column keys. My first thought was to build up an actual Array node, but that only really works for single-column keys, I think. Besides, it would be more efficient to somehow arrange to pass over a reference to the whole hash. I fear the real complexity would be (as always) in the planner rather than the executor. I haven't really looked into what it would take to arrange this or how to decide when to do it. TODO? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
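One reading of Gregory's trick can be sketched outside the executor: build the hash table, collect its distinct keys, make one batched probe of the inner side (a plain dict stands in here for the bitmap index scan), then join as usual. All names are invented for illustration; this is not the proposed PostgreSQL code:

```python
from collections import defaultdict

def batched_hash_join(outer_rows, inner_index, key_of):
    """outer_rows: iterable of rows to hash; inner_index: dict mapping
    key -> list of inner rows (stands in for an index we can probe);
    key_of: function extracting the join key from an outer row."""
    # 1. Build the hash table from the outer side, as in a normal hash join.
    table = defaultdict(list)
    for row in outer_rows:
        table[key_of(row)].append(row)
    # 2. Scan the hash table for its keys and do ONE batched inner fetch,
    #    in key order - roughly what a bitmap index scan would give us.
    keys = sorted(table)
    inner_matches = [(k, irow) for k in keys for irow in inner_index.get(k, [])]
    # 3. Join the fetched inner rows back against the hash table.
    return [(orow, irow) for k, irow in inner_matches for orow in table[k]]
```

The point of the batching is visible in step 2: instead of one inner probe per outer row (the nested-loop pattern), the inner side is visited once with the whole key set.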
Re: [HACKERS] Sorting Improvements for 8.4
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Gregory Stark Sent: Wednesday, December 19, 2007 5:26 PM To: Jeff Davis Cc: Mark Mielke; Michał Zaborowski; Simon Riggs; Ron Mayer; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] Sorting Improvements for 8.4 Jeff Davis [EMAIL PROTECTED] writes: test= explain analyze select * from sorter order by t; test= explain analyze select * from sorter order by b; test= explain analyze select * from sorter order by f; On my machine this table fits easily in memory (so there aren't any disk reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary data, and 20 seconds for localized text. That's much longer than it would take to read that data from disk, since it's only 70MB (which takes a fraction of a second on my machine). I think this disproves your hypothesis that sorting happens at disk speed. I suspect most of that is spent just copying the data around. Which would not be helped by having multiple threads doing the copying -- and in fact might be exacerbated if it required an extra copy to consolidate all the data in the end. Benchmarking a single system will really only explain that system. Someone may have a disk farm with 2GB/Sec throughput: http://www.sql-server-performance.com/articles/per/system_storage_configuration_p1.aspx But such a configuration is very unlikely. Someone may have 10GB/S NIC cards, but those too, are rare. So for any benchmark, we will really just end up with a number for that system. Typically, disk is the bottleneck. 
I found this on the net somewhere, but it's quite a useful table for capacity planning (to find the weak link in the chain using back-of-the-envelope calculations):

Interface                                  Width            Frequency     Bytes/Sec   Bits/Sec
4-way interleaved PC1600 (DDR200) SDRAM    4 x 64 bits      100 MHz DDR   6.4 GB/s    51 Gbps
Opteron HyperTransport memory bus          128 bits         200 MHz DDR   6.4 GB/s    51 Gbps
Pentium 4 800 MHz FSB                      64 bits          200 MHz QDR   6.4 GB/s    51 Gbps
PC2 6400 (DDR-II 800) SDRAM                64 bits          400 MHz DDR   6.4 GB/s    51 Gbps
PC2 5300 (DDR-II 667) SDRAM                64 bits          333 MHz DDR   5.3 GB/s    43 Gbps
Pentium 4 533 MHz FSB                      64 bits          133 MHz QDR   4.3 GB/s    34 Gbps
PC2 4300 (DDR-II 533) SDRAM                64 bits          266 MHz DDR   4.3 GB/s    34 Gbps
2-channel PC1066 RDRAM                     2 x 16 bits      533 MHz DDR   4.3 GB/s    34 Gbps
PCI-X 533                                  64 bits          533 MHz       4.3 GB/s    34 Gbps
PCI-Express x16                            serial/16 lanes  2.5 GHz       4 GB/s      32 Gbps
Pentium 4 400 MHz FSB                      64 bits          100 MHz QDR   3.2 GB/s    25.6 Gbps
2-channel PC800 RDRAM                      2 x 16 bits      400 MHz DDR   3.2 GB/s    25.6 Gbps
2-way interleaved PC1600 (DDR200) SDRAM    2 x 64 bits      100 MHz DDR   3.2 GB/s    25.6 Gbps
PC2 3200 (DDR-II 400) SDRAM                64 bits          200 MHz DDR   3.2 GB/s    25.6 Gbps
PC3200 (DDR400) SDRAM                      64 bits          200 MHz DDR   3.2 GB/s    25.6 Gbps
PC2700 (DDR333) SDRAM                      64 bits          167 MHz DDR   2.7 GB/s    21 Gbps
PC2100 (DDR266) SDRAM                      64 bits          133 MHz DDR   2.1 GB/s    17 Gbps
AGP 8x                                     32 bits          533 MHz       2.1 GB/s    17 Gbps
PCI-X 266                                  64 bits          266 MHz       2.1 GB/s    17 Gbps
PCI-Express x8                             serial/8 lanes   2.5 GHz       2 GB/s      16 Gbps
EV6 bus (Athlon/Duron FSB)                 64 bits          100 MHz DDR   1.6 GB/s    13 Gbps
PC1600 (DDR200) SDRAM                      64 bits          100 MHz DDR   1.6 GB/s    13 Gbps
PC800 RDRAM                                16 bits          400 MHz DDR   1.6 GB/s    13 Gbps
PC150 SDRAM                                64 bits          150 MHz       1.3 GB/s    10.2 Gbps
10 gigabit ethernet                        serial           10 GHz        1.25 GB/s   10 Gbps
OC-192                                     serial           9.953 GHz     1.24 GB/s   9.953 Gbps
133 MHz FSB                                64 bits          133 MHz       1.06 GB/s   8.5 Gbps
PC133 SDRAM                                64 bits          133 MHz       1.06 GB/s   8.5 Gbps
AGP 4x                                     32 bits          266 MHz       1.06 GB/s   8.5 Gbps
PCI-X                                      64 bits          133 MHz       1.06 GB/s   8.5 Gbps
PCI-Express x4                             serial/4 lanes   2.5 GHz       1 GB/s      8 Gbps
100 MHz FSB                                64 bits          100 MHz       800 MB/s    6.4 Gbps
PC100 SDRAM                                64 bits          100 MHz       800 MB/s    6.4 Gbps
PC66 SDRAM                                 64 bits          66 MHz        533 MB/s    4.3 Gbps
fast/wide PCI                              64 bits          66 MHz        533 MB/s    4.3 Gbps
AGP 2x                                     32 bits          133 MHz       533 MB/s    4.3 Gbps
single-link DVI                            12 bits          165 MHz DDR   495 MB/s    3.96 Gbps
Ultra-320 SCSI                             16 bits          160 MHz       320 MB/s    2.6 Gbps
OC-48 network                              serial           2.488 GHz     311 MB/s    2.488 Gbps
AGP                                        32 bits          66 MHz        266 MB/s    2.1 Gbps
PCI-Express x1                             serial           2.5 GHz       250 MB/s    2 Gbps
Serial ATA/1500 disk                       serial           1.5 GHz       187 MB/s    1.5 Gbps
Ultra-160 SCSI                             16 bits          80 MHz        160 MB/s    1.3 Gbps
OC-24 network                              serial           1.244 GHz     155 MB/s    1.244 Gbps
PCI                                        32 bits          33 MHz        133 MB/s    1.06 Gbps
ATA/133 disk                               8 bits           66 MHz DDR    133 MB/s    1.06 Gbps
gigabit ethernet                           serial           1 GHz         125 MB/s    1 Gbps
ATA/100 disk                               8 bits           50 MHz DDR    100 MB/s
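The "weak link in the chain" idea above can be sketched in a few lines: the sustained rate of any data path is bounded by its slowest component. This is only an illustrative sketch; the particular path chosen (disk -> PCI bus -> memory) is a made-up example, with bandwidth figures taken from the table.

```python
# Back-of-the-envelope "weakest link" check: a data path can move bytes
# no faster than its slowest component.  Figures in GB/s, from the table
# above; the path itself is a hypothetical example.
path = {
    "ATA/100 disk":       0.100,
    "PCI 32bit/33MHz":    0.133,
    "PC2100 (DDR266)":    2.1,
}

bottleneck = min(path, key=path.get)
print(f"bottleneck: {bottleneck} at {path[bottleneck]} GB/s")
# -> bottleneck: ATA/100 disk at 0.1 GB/s
```

With a single IDE disk the storage side dominates by an order of magnitude, which is why the later discussion about 2 GB/s disk farms changes the picture so much.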
Re: [HACKERS] Sorting Improvements for 8.4
On Wed, 19 Dec 2007, Dann Corbit wrote: Benchmarking a single system will really only explain that system. Someone may have a disk farm with 2GB/Sec throughput But such a configuration is very unlikely. If you believe comments like those at http://www.c0t0d0s0.org/archives/1792-Do-it-yourself-X4500.html it's possible to hit 2GB/s total to the 48 disks in one of the Sun X4500 servers, which start at $24K. May be unlikely to you, but I was reading there after I set one up last night, and that's a boring standard configuration for some Sun and Greenplum customers. Also, that's today--by the time 8.4 is mainstream high-end machines will be even faster. Wanna make a bet on how much disk throughput will be available as SSD disks go mainstream in the next two years? -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Sorting Improvements for 8.4
Dann Corbit [EMAIL PROTECTED] writes:

>> Note that speeding up a query from 20s to 5s isn't terribly useful. If it's OLTP you can't be using all your cores for each user anyways. And if it's DSS 20s isn't a problem.
>
> Unless (of course) there are 20,000 users doing the queries that would take 20 seconds but now they take 5 (when run single-user). They will still have a bit of a wait, of course.

I'm not exactly following. If you have 20,000 users then you're probably using all the processors already. If you process them one by one on 4 cores in 5s then you'll get the same throughput as if you ran them four at a time on 1 core each in 20s. Where parallel processing like this becomes attractive is when you're running a 2 hour query on a machine sequentially running scheduled batch jobs, which can be sped up to 30 minutes. But in that case you're almost certainly being limited by your disk bandwidth, not your cpu speed.

> A linear speedup of 2 or more is always worthwhile[*]. Since sorting (e.g. for 'group by' and 'order by') and sort joins are a major database task, I guess that a linear speedup by a factor of 2 might make the database operations on the whole be 10% faster or so {OK, it's a SWAG}. I guess it would look good on the benchmarks, if nothing else.

Except note that you're not getting this linear speedup for free. To get a linear speedup of 2x you'll be using more than 2x the cpu resources. If there is nothing else contending for that resource (such as the scenario I described where you're running a single large batch query on a system and want to use all available resources to run it as fast as possible), then you'll get a 2x speedup. But if there is more than one query running on the system then you're not actually gaining anything. Each query will run faster, but you won't be able to run as many simultaneously without having them slow back down, and the overhead of parallelizing the query will be a net loss.
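The throughput-versus-latency point above can be put in numbers. This is an idealized sketch assuming perfect linear scaling and zero parallelization overhead (both generous assumptions, as the post itself notes):

```python
# Idealized model of the argument above: with a fixed number of cores,
# parallelizing each query changes latency, not aggregate throughput.
cores = 4
core_seconds_per_query = 20

# Run 4 queries at once, one per core: each finishes in 20 s.
latency_one_per_core = core_seconds_per_query              # 20 s
throughput_one_per_core = cores / core_seconds_per_query   # 0.2 queries/s

# Split every query across all 4 cores and run them back to back.
latency_parallel = core_seconds_per_query / cores          # 5 s
throughput_parallel = 1 / latency_parallel                 # 0.2 queries/s

# Latency drops 4x, but the system completes queries at the same rate.
print(throughput_one_per_core, throughput_parallel)
```

Real parallel sorts add IPC and coordination overhead on top of this, so under concurrent load the parallel plan can actually complete fewer queries per second.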
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] Sorting Improvements for 8.4
Greg Smith [EMAIL PROTECTED] writes: On Wed, 19 Dec 2007, Dann Corbit wrote: Benchmarking a single system will really only explain that system. Someone may have a disk farm with 2GB/Sec throughput But such a configuration is very unlikely. If you believe comments like those at http://www.c0t0d0s0.org/archives/1792-Do-it-yourself-X4500.html it's possible to hit 2GB/s total to the 48 disks in one of the Sun X4500 servers, which start at $24K. May be unlikely to you, but I was reading there after I set one up last night, and that's a boring standard configuration for some Sun and Greenplum customers. Surely such machines have kickass memory backplanes too though? How could it ever be reasonable to have an i/o controller with more bandwidth than your memory? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Re: [HACKERS] Sorting Improvements for 8.4
Tom Lane wrote: ...I can believe that suitable test cases would show 2X improvement for 2 threads, One other thing I found interesting is that their test case showed a near 2X improvement for hyperthreading, where I haven't heard of many other ways to get hyperthreading to show improvements for PostgreSQL. but it doesn't follow that you will get 10X improvement with 10 threads, or even 4X with 4. Yeah - unless those 10 cores have additional I/O to the memories compared to a 1 core system (which I'd hope would be the case, or else I'd expect many apps would run into memory bottlenecks on such systems, no?).
[HACKERS] PGCon 2008 - call for papers
Hello folks, PGCon 2008 will be held 22-23 May 2008, in Ottawa at the University of Ottawa. It will be preceded by two days of tutorials on 20-21 May 2008.

We are now requesting proposals for presentations. If you are doing something interesting with PostgreSQL, please submit a proposal. You might be one of the backend hackers or work on a PostgreSQL-related project and want to share your know-how with others. You might be developing an interesting system using PostgreSQL as the foundation. Perhaps you migrated from another database to PostgreSQL and would like to share details. These, and other stories, are welcome. Both users and developers are encouraged to share their experiences.

Here are a few ideas to jump-start your proposal process:
- novel, unique or complex ways in which PostgreSQL is used
- migration of production systems to PostgreSQL
- data warehousing with PostgreSQL
- tuning PostgreSQL for different workloads
- replicating data on top of PostgreSQL

The schedule is:
19 Dec 2007  Proposal acceptance begins
19 Jan 2008  Proposal acceptance ends
19 Feb 2008  Confirmation of accepted proposals
19 Apr 2008  Final papers/slides must arrive no later than this date

See also http://www.pgcon.org/2008/papers.php Instructions for submitting a proposal to PGCon 2008 are available from: http://www.pgcon.org/2008/submissions.php

-- Dan Langille - http://www.langille.org/ BSDCan - The Technical BSD Conference: http://www.bsdcan.org/ PGCon - The PostgreSQL Conference: http://www.pgcon.org/
Re: [HACKERS] Sorting Improvements for 8.4
On Thu, 20 Dec 2007, Gregory Stark wrote: Surely such machines have kickass memory backplanes too though? How could it ever be reasonable to have an i/o controller with more bandwidth than your memory? Dann had the right general numbers here--max of 6.4GB/s between processors, and you might coax an aggregate of double that out of the DDR RAM with 2 4-way interleaved banks of memory. Let's call it 12GB/s theoretical max. If the theoretical max of the disks is 2GB/s, that's only a 6:1 headroom, and with a decent cache hit rate it's not outrageous to imagine you could bottleneck on memory with some things before you run out of disk throughput. Right now I think a lot of the disk bottlenecks are seek-limited more than anything, but SSD will knock that one out for apps that are more about throughput than maximum storage. I could already switch to SSD usefully today for some of what I do that's in that category; it's just a bit too expensive to do yet. Soon, though. Just trying to usefully estimate where the edge of that back-of-the-envelope calculation should go. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
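The headroom estimate above reduces to one division. A minimal sketch, using only the figures assumed in the post itself (12 GB/s theoretical memory max, 2 GB/s aggregate disk throughput):

```python
# Back-of-the-envelope headroom from the post above: ~6.4 GB/s per memory
# path, roughly doubled by interleaved banks -- call it 12 GB/s theoretical
# max -- against a 2 GB/s X4500-class disk farm.
mem_bw_gbs = 12.0    # assumed theoretical memory bandwidth ceiling
disk_bw_gbs = 2.0    # assumed aggregate disk throughput

headroom = mem_bw_gbs / disk_bw_gbs
print(f"memory:disk headroom = {headroom:.0f}:1")
# -> memory:disk headroom = 6:1
```

A 6:1 ratio means data touched only a handful of times in memory per disk pass already eats most of the margin, which is the point about bottlenecking on memory before disk.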
Re: [HACKERS] Benchmark for GiST index?
I'm trying to run some GiST activity and look at the XLOG record structure. The README says it requires PG 8.2 or later. Does it run with PG 8.3beta4? It seems to depend on tsearch2 in contrib. 2007/12/20, Teodor Sigaev [EMAIL PROTECTED]: What kind of benchmark? I have a benchmark for text search, http://www.sigaev.ru/cvsweb/cvsweb.cgi/ftsbench/, and several others based on it. Koichi Suzuki wrote: Hi, Does anybody have information about GiST index benchmarks? I'm planning to run a GiST benchmark and analyze the WAL structure to see PG's dynamic behavior. Regards; -- Koichi Suzuki -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Koichi Suzuki