Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
On 15 Nov 2004 02:00:37 -0500, Greg Stark <[EMAIL PROTECTED]> wrote:
> I think people should get away from thinking about "order by + limit". That
> isn't going to work for anything with a GROUP BY. And it isn't going to work
> for anything more complex than a single min() or max().
>
> min() only needs the first record from whatever set of records it's operating
> on as long as they're provided in a specified order. This is just as true for
> a min() applied to only a single GROUP as it is for a min() applied to an
> entire table.

But as far as I can tell there is no way of forcing such an order; at least, ORDER BY queries are doomed to fail:

select max(val) from test_max order by val desc;
ERROR:  column "test_max.val" must appear in the GROUP BY clause or be used in an aggregate function

Anyway, I think that any optimization (a supposedly "implicit" ORDER BY when min() or max() is the only requested column) would at least stop people from using awkward syntax for performance reasons...

Regards,
Dawid
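For context, the awkward-but-fast rewrite Dawid alludes to drops the aggregate entirely. A minimal sketch, assuming a btree index on test_max(val); the IS NOT NULL guard matters because NULLs sort first in DESC order, so without it the rewrite is not equivalent to max():

-- Hand-written equivalent of SELECT max(val) FROM test_max
-- that can use an index on test_max(val) today.
SELECT val
  FROM test_max
 WHERE val IS NOT NULL
 ORDER BY val DESC
 LIMIT 1;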
[HACKERS] Odd plpgsql behaviour
On 7.4: This is what we wanted to do:

IF TG_OP = 'INSERT' OR (TG_OP = 'UPDATE' AND NEW.name != OLD.name) THEN
    EXECUTE x;
END IF;

However, we had to write it like this:

IF TG_OP = 'INSERT' THEN
    EXECUTE x;
ELSIF TG_OP = 'UPDATE' AND NEW.name != OLD.name THEN
    EXECUTE x;
END IF;

...because in the first case it would complain that OLD.name wasn't defined, if the trigger was NOT an update. OK, but the second case works??!?! Is this a weird peculiarity of the pl/pgsql lazy evaluation rules? Why doesn't the first one work if the second one does?

Chris
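The difference is that plpgsql evaluates each IF/ELSIF condition as its own SQL expression, and only when control actually reaches it, so the ELSIF condition referencing OLD.name is never evaluated on an INSERT; in the combined form, OLD.name is a parameter of the single expression and must be resolved up front. A minimal sketch of one common way to keep the logic in one place (function name, column name, and the RAISE placeholder are hypothetical):

-- Sketch only: evaluate the OLD reference in a branch where OLD
-- is known to exist, then act on the computed flag once.
CREATE OR REPLACE FUNCTION name_change_trig() RETURNS trigger AS '
DECLARE
    changed boolean := false;
BEGIN
    IF TG_OP = ''INSERT'' THEN
        changed := true;
    ELSIF TG_OP = ''UPDATE'' THEN
        -- IS DISTINCT FROM also handles NULLs, unlike !=
        changed := NEW.name IS DISTINCT FROM OLD.name;
    END IF;
    IF changed THEN
        RAISE NOTICE ''name was set or changed'';  -- real work goes here
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;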
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Jan Wieck <[EMAIL PROTECTED]> writes:
> Both cases can be expressed with order by + limit queries, that would indeed
> utilize those indexes. But what's been discussed so far does not cover any of
> them.

I think people should get away from thinking about "order by + limit". That isn't going to work for anything with a GROUP BY. And it isn't going to work for anything more complex than a single min() or max().

min() only needs the first record from whatever set of records it's operating on, as long as they're provided in a specified order. This is just as true for a min() applied to only a single GROUP as it is for a min() applied to an entire table.

I don't think you want to use the existing Limit executor node. That will only ever let you handle these simple aggregates that return the first value they see. What you want is a normal Aggregate node, but the node feeding it should be an altered index scan that knows it only needs to pull out the first and/or last record for each GROUP. That will let you handle min() and max() in the same query, for example. It might also leave open the door for other more complex data subsets. Say a geometric data type where it needs all the bounding points of an area.

-- greg
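To make the GROUP BY case concrete, this is the sort of query the proposed Aggregate-over-altered-index-scan plan would serve; the table, column, and index names here are hypothetical:

-- With a compound index, the scan only needs the first and last
-- sal entry within each dept group, not the whole group.
CREATE INDEX emp_dept_sal_idx ON emp (dept, sal);

SELECT dept, min(sal), max(sal)
  FROM emp
 GROUP BY dept;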
Re: [HACKERS] psql \e broken again
hi,

As for Mac OS X: this is on Darwin Mac OS (it's Mac something). I ran a find command looking for names with a space in them:

mechatronics:~ joe$ find . -name '* *'
./backups/dscpadmin/scriptsMay19/dscp/validate/List of Cancers.doc
./backups/pain/PAINjune/validate/List of Cancers.doc
./backups/untarhere/test/List of Cancers.doc
./Library/Address Book Plug-Ins
./Library/Application Support
./Library/Caches/Desktop/Desktop Pictures.dpImageCache
./Library/Caches/Desktop/Solid Colors.dpImageCache
./Library/Caches/iPhoto Cache
./Library/Caches/Software Update
./Library/FontCollections/Fixed Width.collection
./Library/iMovie/Sound Effects
./Library/Internet Plug-Ins
./Library/Keyboard Layouts
./Library/Preferences/QuickTime Preferences
./Library/Safari/Form Values
./Public/Drop Box

On Sun, Nov 14, 2004 at 10:14:06PM -0500, Bruce Momjian wrote:
> John Hansen wrote:
> > > > 1. Quote only on Windows.
> > > >
> > > > 2. Expect user to put quotes in the EDITOR value if it contains a
> > > > space-containing path.
> >
> > As far as I'm aware, the options on windows are very much like those on
> > unix:
> >
> > "path containing spaces" or
> > path\ containing\ spaces
>
> My guess is that we should only address spaces on Win32, but should
> document why we didn't do it on Unix.  Also, what about OS X?  Doesn't
> that have frequent spaces in file paths?

--
if you're christian, god bless; otherwise, good luck; and, if you dont believe in luck ...
Re: [HACKERS] GiST: PickSplit and multi-attr indexes
On Sun, 2004-11-14 at 18:54 -0500, Tom Lane wrote:
> It's probably just a hangover from the days when GiST didn't support
> multi-column indexes at all.  I agree it should be changed.

I'm not sure of the right way to fix it (at least without significant changes to the GiST API). At present, the PickSplit() method is passed a vector of GISTENTRYs and fills in a GIST_SPLITVEC. The GISTENTRYs correspond to the first attributes of all the tuples in the node to be split. There is no provision for the GiST extension to be informed about any additional attributes in the index tuples.

Even if we changed the API to allow that, the GiST extension would have a hard time making a reasonable decision in the multi-attribute case: the additional index attributes could well be implemented using a different GiST extension.

> But note you will then have to cope with NULL values.

Yes -- I'm not sure offhand why GiST does not allow leading NULL values in index attributes, but that ought to be fixed anyway.

-Neil
Re: [HACKERS] psql \e broken again
John Hansen wrote:
> > > 1. Quote only on Windows.
> > >
> > > 2. Expect user to put quotes in the EDITOR value if it contains a
> > > space-containing path.
>
> As far as I'm aware, the options on windows are very much like those on
> unix:
>
> "path containing spaces" or
> path\ containing\ spaces

My guess is that we should only address spaces on Win32, but should document why we didn't do it on Unix. Also, what about OS X? Doesn't that have frequent spaces in file paths?

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.  |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6
Kris Jurka <[EMAIL PROTECTED]> writes:
> On Sun, 14 Nov 2004, Tom Lane wrote:
>> Is your current value one less than a multiple of 16, by any chance?

> Currently 32. It is unclear whether you think 31 is the failure case
> you're thinking of or whether 31 might help.

No, 32 is actually the best case (most slop) if I'm reading the code correctly. I'd suggest an update to 7.3 or later ...

regards, tom lane
Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6
On Sun, 14 Nov 2004, Tom Lane wrote:
> The comment in ProcGetNewSemIdAndNum suggests that you might be able to
> suppress the problem in 7.2 by using a different max_connections value.
> Is your current value one less than a multiple of 16, by any chance?

Currently 32. It is unclear whether you think 31 is the failure case you're thinking of or whether 31 might help.

Kris Jurka
Re: [HACKERS] IpcSemaphoreLock/Unlock and proc_exit on 7.2.6
Kris Jurka <[EMAIL PROTECTED]> writes:
> I have an underpowered server running 7.2.6 that backs a website which
> occasionally gets hit by a bunch of traffic and starts firing off "FATAL
> 1: Sorry, too many clients already" messages. This is all as expected,
> but sometimes it just crashes. I had no clue what was going on until I
> checked the stderr log (because I had set it up to use syslog). In there
> I find a whole bunch of these:
> IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

[ eyeballs code... ] It looks like this could happen in 7.2 during exit from a backend that failed to acquire a semaphore --- ProcKill does things like LockReleaseAll, which needs to acquire the lockmanager LWLock, which could try to block using the process semaphore if there's contention for the LWLock.

The problem should be gone in 7.3 and later due to reorganization of the semaphore management code. I'm not sure it's worth trying to fix in 7.2.* --- the odds of introducing new problems seem too high, and we're not really maintaining 7.2 anymore anyway.

The comment in ProcGetNewSemIdAndNum suggests that you might be able to suppress the problem in 7.2 by using a different max_connections value. Is your current value one less than a multiple of 16, by any chance?

regards, tom lane
Re: [HACKERS] GiST: PickSplit and multi-attr indexes
Neil Conway <[EMAIL PROTECTED]> writes:
> If I understand the code correctly, GiST will only pass the first
> attribute of each index tuple to the user-defined PickSplit method when
> it wants to split a node. (see circa line 1269 of gist.c)
> Is this a wise design decision?

It's probably just a hangover from the days when GiST didn't support multi-column indexes at all. I agree it should be changed. But note you will then have to cope with NULL values.

regards, tom lane
Re: [HACKERS] German-style quotes in the source file
"Serguei Mokhov" <[EMAIL PROTECTED]> writes: > I was about to update initdb translation, but noticed > that newly introduced error messages in the code have > German-style quotes. These propagated to the .po files now... Good catch; fix committed. A quick grep confirms these are the only two cases. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] code question: storing INTO relation
On Sun, 2004-11-14 at 22:59, Neil Conway wrote:
> On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote:
> > HASH - works OK, but a pain to administer, no huge benefit in using
>
> At least in theory, I think this could offer better performance for
> equality searches than b+-tree. Given how common those kinds of queries
> are, I still think hash indexes are worth putting some time into. My
> guess is that their relatively poor performance at present (relative to
> b+-trees) is just a reflection of how much more tuning and design work
> has gone into the b+-tree code than the hash code.

It can be faster for equality searches on a fairly static table; on a growing table, it could be the same or worse. IMHO the theoretical difference in speed doesn't seem worth the effort of spending additional time in that part of the code, given the inherent pain of REINDEX.

> > GiST - index of choice for PostGIS, TSearch2, in need of optimization
>
> I'm working on adding page-level locking and WAL safety, although this
> is a pretty difficult project.

Difficult, yes. I'm glad you're stepping up to the plate for the WAL safety. Two index types are sufficient, and ISTM that should therefore be the maximum. When you've finished tuning GiST, I wager that you will agree :)

--
Best Regards, Simon Riggs
[HACKERS] German-style quotes in the source file
Hello Peter,

I was about to update initdb translation, but noticed that newly introduced error messages in the code have German-style quotes. These propagated to the .po files now...

It happened in this commit:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c.diff?r1=1.65;r2=1.66

--
Serguei A. Mokhov
Re: [HACKERS] postmaster segfaults with HUGE table
Neil Conway <[EMAIL PROTECTED]> writes:
> This specific assertion is triggered because we represent attribute
> numbers throughout the code base as a (signed) int16 -- the assertion
> failure has occurred because an int16 has wrapped around due to
> overflow. A fix would be to add a check to DefineRelation() (or even
> earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber"
> columns.

Good analysis. We can't check earlier than DefineRelation AFAICS, because earlier stages don't know about inherited columns.

On reflection I suspect there are similar issues with SELECTs that have more than 64K output columns. This probably has to be guarded against in parser/analyze.c.

regards, tom lane
Re: [HACKERS] code question: storing INTO relation
On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote:
> HASH - works OK, but a pain to administer, no huge benefit in using

At least in theory, I think this could offer better performance for equality searches than b+-tree. Given how common those kinds of queries are, I still think hash indexes are worth putting some time into. My guess is that their relatively poor performance at present (relative to b+-trees) is just a reflection of how much more tuning and design work has gone into the b+-tree code than the hash code.

> R-TREE - slightly broken in places, limited in usability

I agree. I hope that when we have a good GiST infrastructure, implementing rtree via GiST will offer performance that is as good as or better than the builtin rtree.

> GiST - index of choice for PostGIS, TSearch2, in need of optimization

I'm working on adding page-level locking and WAL safety, although this is a pretty difficult project. Gavin and I are also looking at algorithms for bulk loading GiST indexes, although I'm not yet sure how possible that will be.

-Neil
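As a concrete illustration of the equality-search case Neil describes (table and column names are hypothetical): in theory the hash index answers an equality probe in roughly constant time, where the btree pays a logarithmic descent.

-- Same column, two access methods.
CREATE INDEX users_email_btree ON users (email);             -- default btree
CREATE INDEX users_email_hash  ON users USING hash (email);  -- equality-only

-- Only an equality predicate can use the hash index; range
-- predicates still require the btree.
SELECT * FROM users WHERE email = 'alice@example.com';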
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
On 11/10/2004 11:57 PM, Mark Kirkwood wrote:
> Your example and ones like:
>
> SELECT max(foo), count(foo) FROM bar
> SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
>
> have made me realize that the scope of "what should be optimized" is
> somewhat subtle. I am inclined to keep it simple (i.e. rather limited)
> for a first cut, and if that works well, then look at extending to more
> complex rewrites. What do you think?

The problem is that

select min(foo) from bar where foo > 100;

is still solvable with an index scan, assuming there is an index on foo. But

select min(foo) from bar where baz = 'IT';

is only doable with an index scan if you have a compound index on (baz, foo).

Both cases can be expressed with order by + limit queries that would indeed utilize those indexes. But what's been discussed so far does not cover any of them.

Jan

Jim C. Nasby wrote:
> On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
> > I am looking at implementing this TODO item, e.g. (max case): rewrite
> >
> > SELECT max(foo) FROM bar
> >
> > as
> >
> > SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
> >
> > if there is an index on bar(foo)
>
> Out of curiosity, will you be doing this in such a way that
>
> SELECT min(foo), max(foo) FROM bar
>
> will end up as
>
> SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1),
>        (SELECT ... DESC LIMIT 1)
>
> ?

--
# It's easier to get forgiveness for being wrong than for being right.
# Let's break this rule - forgive me.
# [EMAIL PROTECTED]
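A minimal sketch of the two rewrites Jan describes, assuming a hypothetical table bar(foo int, baz text); the equality case wants the equality column first in the compound index so that an order-by-plus-limit scan can start at the right place:

-- Range case: a plain index on foo suffices.
CREATE INDEX bar_foo_idx ON bar (foo);
SELECT foo FROM bar WHERE foo > 100
 ORDER BY foo LIMIT 1;                  -- min(foo) over foo > 100

-- Equality case: compound index, equality column leading.
CREATE INDEX bar_baz_foo_idx ON bar (baz, foo);
SELECT foo FROM bar WHERE baz = 'IT'
 ORDER BY baz, foo LIMIT 1;             -- min(foo) over baz = 'IT'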
Re: [HACKERS] psql \e broken again
> > 1. Quote only on Windows.
> >
> > 2. Expect user to put quotes in the EDITOR value if it contains a
> > space-containing path.

As far as I'm aware, the options on windows are very much like those on unix:

"path containing spaces" or
path\ containing\ spaces

Kind Regards,

John Hansen
Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT
Bruce Momjian <[EMAIL PROTECTED]> writes:
> OK, Tom please go ahead with the patch.

Done.

regards, tom lane
Re: [HACKERS] psql \e broken again
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> The EDITOR variable seems to have a fairly standard meaning on Unix
> systems. I've been using that EDITOR value for years without problems;
> it's only when I use psql's \e once in a while that it breaks. I don't
> think we should deviate from what seems to be a standard practice.

Agreed, no quotes on Unix. I'm just wondering what to do on Windows.

regards, tom lane
Re: [HACKERS] psql \e broken again
Tom Lane wrote:
> I think the rationale was to allow paths containing spaces, which is
> a pretty serious problem on Windows. Seems like we have two basic
> options:
>
> 1. Quote only on Windows.
>
> 2. Expect user to put quotes in the EDITOR value if it contains a
> space-containing path.

The EDITOR variable seems to have a fairly standard meaning on Unix systems. I've been using that EDITOR value for years without problems; it's only when I use psql's \e once in a while that it breaks. I don't think we should deviate from what seems to be a standard practice.

I wonder whether a similar convention exists on Windows. I could certainly live with quoting only on Windows if that is what the convention is there.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] psql \e broken again
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> I remember specifically that I somewhat recently fixed psql to accept
> editors with arguments, say EDITOR="pico -t". This was apparently
> broken again during some Windows-related reshuffling. It now takes the
> editor as one quoted string rather than possibly several shell tokens.
> Could this please be fixed?

I think the rationale was to allow paths containing spaces, which is a pretty serious problem on Windows. Seems like we have two basic options:

1. Quote only on Windows.

2. Expect user to put quotes in the EDITOR value if it contains a space-containing path.

I don't much care for either of these :-(.

regards, tom lane
[HACKERS] psql \e broken again
I remember specifically that I somewhat recently fixed psql to accept editors with arguments, say EDITOR="pico -t". This was apparently broken again during some Windows-related reshuffling. It now takes the editor as one quoted string rather than possibly several shell tokens. Could this please be fixed?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] pgxs regression
Joe Conway wrote:
> I'm not sure exactly what has changed, nor at the moment how to fix it,
> but I'm finding that pgxs no longer works for PL/R or dblink. Error as
> follows:
>
> make: *** No rule to make target
> `/usr/local/pgsql-dev/lib/pgxs/src/makefiles/../../src/port/pg_config_paths.h',
> needed by `all-static-lib'.  Stop.
>
> The problem is related specifically to Makefiles using MODULE_big. I
> tested a few contribs that use MODULES and they seem to work fine under
> pgxs.
>
> Joe

Seems you have the same issue that I have with PL/Java. I had to explicitly change my target from:

all: all_lib

to:

all: $(shlib)

The thread "Problems with pgxs" started in hackers on 10/31 may be of some interest.

Regards,
Thomas Hallgren
Re: [HACKERS] code question: storing INTO relation
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
>> Greg is correct --- at least for btree build, which is the only index
>> type we have WAL-ified at all :-(

> Is there a place (or a single best place) to document this behaviour?

If you're talking about the lack of WAL backup for non-btree indexes, it is documented (at the end of the PITR section, IIRC). If you're talking about the optimization of not logging index builds, I don't see a need to document that per se. Ordinary users shouldn't need to care, mainly because they can't affect it one way or the other. Anyone who does care can look at the code and see how it's done. (Open source has a big advantage over closed source in that regard, and I think it's reasonable to have different documentation practices than closed-source products would use.)

regards, tom lane
Re: [HACKERS] Update TIP 9 please
Who maintains the tips?

On Sun, 14/11/2004 at 11:31 +0000, Simon Riggs wrote:
> When 8.0 is released, TIP 9 should change from
>
> TIP 9: the planner will ignore your desire to choose an index scan if
> your joining column's datatypes do not match
>
> to
>
> TIP 9: the planner will ignore your desire to choose an index scan if
> your joining column's datatypes do not match (upgrade to 8.0!)
>
> We could change this now...
>
> ...and add another TIP to encourage people to upgrade...?

--
Markus Bertheau <[EMAIL PROTECTED]>
[HACKERS] Update TIP 9 please
When 8.0 is released, TIP 9 should change from

TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match

to

TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match (upgrade to 8.0!)

We could change this now...

...and add another TIP to encourage people to upgrade...?

--
Best Regards, Simon Riggs
Re: [HACKERS] code question: storing INTO relation
On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
>
> > I don't think so.  Can someone confirm?
>
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(

Is there a place (or a single best place) to document this behaviour?
- with each command?
- in the backup section?
- in runtime?

Seems a shame to optimize and not tell anyone.

--
Best Regards, Simon Riggs
Re: [HACKERS] postmaster segfaults with HUGE table
On Sun, 2004-11-14 at 10:05, Neil Conway wrote:
> Joachim Wieland wrote:
> > this query makes postmaster (beta4) die with signal 11:
> >
> > (echo "CREATE TABLE footest(";
> > for i in `seq 0 66000`; do
> >     echo "col$i int NOT NULL,";
> > done;
> > echo "PRIMARY KEY(col0));") | psql test
> >
> > ERROR:  tables can have at most 1600 columns
> > LOG:  server process (PID 2140) was terminated by signal 11
> > LOG:  terminating any other active server processes
> > LOG:  all server processes terminated; reinitializing
>
> At best you're going to get the error message above: "tables can have at
> most 1600 columns". But this is definitely a bug: we end up triggering
> this assertion:
>
> TRAP: BadArgument("!(attributeNumber >= 1)", File: "tupdesc.c", Line: 405)
>
> This specific assertion is triggered because we represent attribute
> numbers throughout the code base as a (signed) int16 -- the assertion
> failure has occurred because an int16 has wrapped around due to
> overflow. A fix would be to add a check to DefineRelation() (or even
> earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber"
> columns. We eventually do perform this check in
> heap_create_with_catalog(), but by that point it is too late: various
> functions have been invoked that assume we're dealing with a sane number
> of columns.
>
> BTW, I noticed that there's an O(n^2) algorithm to check for duplicate
> column names in DefineRelation() -- with 60,000 column names that takes
> minutes to execute, but an inconsequential amount of time with 1500
> column names. So I don't think there's any point rewriting the algorithm
> to be faster -- provided we move the check for MaxHeapAttributeNumber
> previous to this loop, we should be fine.

This seems too obvious a problem to have caused a bug... presumably this has been there for a while? Does this mean that we do not have regression tests for each maximum setting... i.e., are we missing a whole class of tests in the regression tests?

--
Best Regards, Simon Riggs
Re: [HACKERS] code question: storing INTO relation
On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
>
> > I don't think so.  Can someone confirm?
>
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(

[well... at least they're optimized then... :)]

With regard to the other index types, my opinion was:

HASH - works OK, but a pain to administer, no huge benefit in using
R-TREE - slightly broken in places, limited in usability
GiST - index of choice for PostGIS, TSearch2, in need of optimization

Following recent optimization work on GiST, it now seems worth the trouble to add WAL logging to it. ISTM that the other two aren't widely used enough to make it worthwhile to spend time on, evidence for which is also that no one ever has, up 'til now. Time-management seems to be the key to making progress in the most important areas...

--
Best Regards, Simon Riggs
Re: [HACKERS] Increasing the length of
On Wed, 2004-11-10 at 22:51, Andrew Sullivan wrote:
> On Wed, Nov 10, 2004 at 09:52:17PM +0000, Simon Riggs wrote:
> > On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:
> > >
> > > Isn't that:
> > > log_min_duration_statement (integer)
> >
> > That gets written when a statement completes, not during execution.
>
> I've been following this thread, and I was thinking the same thing.
> I wonder how much work it'd be to have another log setting -- say
> log_statement_after_min_duration (integer) -- which did what Simon
> wants. That'd more than satisfy my need, for sure. Might the cost
> of that be too high, though?

I think this is a great idea. ...Rather than invent a new GUC, I think this is the solution: log_min_duration_statement writes to the log at end of execution, if execution time is greater than that threshold. Let's move that piece of code so it is executed as the query progresses. That way, you get notified that a problem query is occurring NOW, rather than "it has occurred".

The code already has such a timer check, for statement_timeout, in backend/storage/lmgr/proc.c. We could reuse this timer to go off at log_min_duration_statement and then log the query if it is still executing. (If log_min_duration_statement >= statement_timeout, we would skip that step.) We would then reset the timer so that it then goes off where it does now, at statement_timeout. So, same piece of code, used twice...

That way you can set up 2 limits, with three bands of actions:

Between 0 and log_min_duration_statement - logs nothing
Between log_min_duration_statement and statement_timeout - statement written to log, though execution continues...
At statement_timeout - statement cancelled

We'd just need a small piece of code to set the timer correctly first, then another piece to record the state change and reset the timer again. Lift and drop the existing code from end-of-execution.

This then:
- solves the *problem query* diagnosis issue, as originally raised by Sean and seconded by myself and Greg
- makes the answer exactly what Tom proposed - look in the logs
- doesn't make any changes to the technical innards of UDP and pgstats.c
- no administrative interface changes, just slightly changed behaviour
- existing users mostly wouldn't even notice we'd done it...

Thoughts? Easy enough change to be included as a hot fix for 8.0: no new system code, no new interface code, just the same behaviour at a different time.

--
Best Regards, Simon Riggs
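A sketch of how the two settings would interact under this proposal, with illustrative values only; note this shows the proposed behaviour, not what the server does today:

-- Illustrative values (milliseconds).
SET log_min_duration_statement = 10000;
SET statement_timeout = 60000;
-- 0s to 10s:  nothing logged
-- 10s to 60s: statement written to log while still executing (proposed)
-- at 60s:     statement cancelled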
Re: [HACKERS] postmaster segfaults with HUGE table
Joachim Wieland wrote:
> this query makes postmaster (beta4) die with signal 11:
>
> (echo "CREATE TABLE footest(";
> for i in `seq 0 66000`; do
>     echo "col$i int NOT NULL,";
> done;
> echo "PRIMARY KEY(col0));") | psql test
>
> ERROR:  tables can have at most 1600 columns
> LOG:  server process (PID 2140) was terminated by signal 11
> LOG:  terminating any other active server processes
> LOG:  all server processes terminated; reinitializing

At best you're going to get the error message above: "tables can have at most 1600 columns". But this is definitely a bug: we end up triggering this assertion:

TRAP: BadArgument("!(attributeNumber >= 1)", File: "tupdesc.c", Line: 405)

This specific assertion is triggered because we represent attribute numbers throughout the code base as a (signed) int16 -- the assertion failure has occurred because an int16 has wrapped around due to overflow. A fix would be to add a check to DefineRelation() (or even earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber" columns. We eventually do perform this check in heap_create_with_catalog(), but by that point it is too late: various functions have been invoked that assume we're dealing with a sane number of columns.

BTW, I noticed that there's an O(n^2) algorithm to check for duplicate column names in DefineRelation() -- with 60,000 column names that takes minutes to execute, but an inconsequential amount of time with 1500 column names. So I don't think there's any point rewriting the algorithm to be faster -- provided we move the check for MaxHeapAttributeNumber previous to this loop, we should be fine.

Thanks for the report.

-Neil
[HACKERS] postmaster segfaults with HUGE table
Hi,

this query makes postmaster (beta4) die with signal 11:

(echo "CREATE TABLE footest(";
for i in `seq 0 66000`; do
    echo "col$i int NOT NULL,";
done;
echo "PRIMARY KEY(col0));") | psql test

ERROR:  tables can have at most 1600 columns
LOG:  server process (PID 2140) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing

Program received signal SIGSEGV, Segmentation fault.
0x4015d43c in mallopt () from /lib/tls/libc.so.6
(gdb) bt
#0  0x4015d43c in mallopt () from /lib/tls/libc.so.6
#1  0x00021680 in ?? ()
[...]
#16 0x0001 in ?? ()
#17 0x0821bc9b in AllocSetDelete ()
Previous frame inner to this frame (corrupt stack?)

Furthermore, the backend doesn't react to query cancel requests from psql during execution of the query.

Joachim
[HACKERS] GiST: PickSplit and multi-attr indexes
Oleg & Teodor,

If I understand the code correctly, GiST will only pass the first attribute of each index tuple to the user-defined PickSplit method when it wants to split a node (see circa line 1269 of gist.c). Is this a wise design decision?

Granted, in many situations the first attribute in the index is sufficient to make a reasonable decision about how to divide the node into two halves, but I don't think that is universally true. For example, consider a two-column index whose first attribute has a small number of distinct values. It could well be that all the first-attribute values in a node-to-be-split would be the same. Only passing the first attribute to PickSplit would result in an essentially random distribution of tuples among the split nodes, rather than allowing the GiST extension to make use of the second attribute to partition the nodes.

That's an extreme example, but it is easy to construct more realistic scenarios (basically, any situation in which the cardinality of the first index attribute is low -- a reasonably common occurrence with a multi-column index, I believe). I'm not sure whether this would be a problem in practice. Speculation: repeated invocations of PickSplit are one of the main factors in deriving the ultimate shape of the GiST tree. Poor distribution of keys resulting from PickSplit would eventually result in unnecessarily loose bounding predicates in internal nodes, which would hurt performance.

Comments welcome,
Neil
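To make the scenario concrete, here is a sketch of the kind of index Neil describes; the table and columns are hypothetical, and it assumes GiST opclasses exist for both types (e.g. contrib/btree_gist for the scalar column and PostGIS for the geometry):

-- 'region' takes only a handful of distinct values, so whole nodes can
-- fill with tuples whose first-attribute key is identical; PickSplit then
-- sees nothing to distinguish them, even though 'geom' would partition
-- the node well.
CREATE INDEX items_region_geom_idx ON items USING gist (region, geom);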
[HACKERS] Big Database
I would like to ask the more experienced users of the Postgres database a couple of questions I have on a db I manage with a lot of data. A lot of data means something like 15.000.000 rows in a table. I will try to describe the tables and what I will have to do with them :)

There is a table that has product data in the form of:

Table product:
    product_id varchar(8),
    product_name text

and a product_actions table:

Table product_actions:
    product_id varchar(8),
    flow char(1),
    who int,
    where int,
    value float

I will have to make SQL queries of the form

select value from product_actions where who='someone' and where='somewhere'

and maybe also make some calculations on these results. I have already made some indexes on these tables, and a view that joins the two of them, but I would like to ask you people if someone is using such a big db and how I can speed things up as much as possible. These product_actions tables exist for each year from 1988 till 2003, so this means a lot of data...

Thanks in Advance
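A minimal sketch of an index matching the stated query shape, using the column names from the post; note that where is a reserved word and must be double-quoted when used as a column name, and the literals here use integers to match the declared column types:

-- Compound index on the two filter columns.
CREATE INDEX product_actions_who_where_idx
    ON product_actions (who, "where");

SELECT value
  FROM product_actions
 WHERE who = 42 AND "where" = 7;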