[HACKERS] Bug with UTF-8 character
good morning,

I got a bug report for the following Unicode character in PostgreSQL 8.1.4:

0xedaeb8
ERROR: invalid byte sequence for encoding UTF8: 0xedaeb8

This one seemed to work properly in PostgreSQL 8.0.3. I think the following code in PostgreSQL 8.1.4 has a bug in it:

File: postgresql-8.1.4/src/backend/utils/mb/wchar.c

The entry values to the function are:

source = ed ae b8 20 20 20 20 20 20 20 20 20 20 20 20
length = 3   (length is the length of the current UTF-8 character)

But the code does a check where the second byte must not be greater than 0x9F when the first byte is 0xED. This is not according to the UTF-8 standard in RFC 3629, so I believe that is not a valid test. This test fails on our string, when it shouldn't. I believe this is a bug; could you please confirm, or let me know what I am doing wrong?

Many thanks,
Hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Updatable views/with check option parsing
On Wednesday, 24 May 2006 20:42, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > I have spent some time figuring out how to resolve the parsing
> > conflicts in Bernd Helmle's updatable views patch. The problem has now
> > been reduced to specifically this situation:
>
> Could we see the proposed patches for gram.y?

Here it is.

$ make -W gram.y gram.c
bison -y -d gram.y
conflicts: 4 shift/reduce

These are basically four instances of the same problem.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

--- /home/pei/devel/pg82/pgsql/src/backend/parser/gram.y	2006-05-22 09:05:13.0 +0200
+++ gram.y	2006-05-26 09:41:21.0 +0200
@@ -339,7 +339,8 @@
 %type <list>	constraints_set_list
 %type <boolean>	constraints_set_mode
 %type <str>	OptTableSpace OptConsTableSpace OptTableSpaceOwner
-
+%type <list>	opt_check_option
+%type <node>	opt_check_mode
 
 /*
  * If you make any token changes, update the keyword table in
@@ -356,7 +357,7 @@
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BY
-	CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
+	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE
 	COLLATE COLUMN COMMENT COMMIT COMMITTED CONNECTION CONSTRAINT CONSTRAINTS
 	CONVERSION_P CONVERT COPY CREATE CREATEDB
@@ -4618,12 +4619,12 @@
 /*
  *
  * QUERY:
- *	CREATE [ OR REPLACE ] [ TEMP ] VIEW <viewname> '(' target-list ')' AS <query>
+ *	CREATE [ OR REPLACE ] [ TEMP ] VIEW <viewname> '(' target-list ')' AS <query> [ WITH [ CASCADED | LOCAL ] CHECK OPTION ]
  *
  */
 ViewStmt:	CREATE OptTemp VIEW qualified_name opt_column_list
-				AS SelectStmt
+				AS SelectStmt opt_check_option
 				{
 					ViewStmt *n = makeNode(ViewStmt);
 					n->replace = false;
@@ -4631,10 +4632,11 @@
 					n->view->istemp = $2;
 					n->aliases = $5;
 					n->query = (Query *) $7;
+					n->options = (List *) $8;
 					$$ = (Node *) n;
 				}
 			| CREATE OR REPLACE OptTemp VIEW qualified_name opt_column_list
-				AS SelectStmt
+				AS SelectStmt opt_check_option
 				{
 					ViewStmt *n = makeNode(ViewStmt);
 					n->replace = true;
@@ -4642,10 +4644,25 @@
 					n->view->istemp = $4;
 					n->aliases = $7;
 					n->query = (Query *) $9;
+					n->options = (List *) $10;
 					$$ = (Node *) n;
 				}
 		;
 
+opt_check_option:
+			WITH opt_check_mode CHECK OPTION
+				{
+					$$ = list_make1($2);
+				}
+			| /* EMPTY */				{ $$ = NIL; }
+		;
+
+opt_check_mode:
+			CASCADED		{ $$ = (Node *) makeString("cascaded"); }
+			| LOCAL			{ $$ = (Node *) makeString("local"); }
+			| /* EMPTY */	{ $$ = (Node *) makeString("cascaded"); }
+		;
+
 /*
  *
  * QUERY:
@@ -8500,7 +8517,6 @@
 			| VARYING
 			| VIEW
 			| VOLATILE
-			| WITH
 			| WITHOUT
 			| WORK
 			| WRITE
@@ -8551,8 +8567,6 @@
 			| SETOF
 			| SMALLINT
 			| SUBSTRING
-			| TIME
-			| TIMESTAMP
 			| TREAT
 			| TRIM
 			| VARCHAR
@@ -8608,6 +8622,7 @@
 			| ASC
 			| ASYMMETRIC
 			| BOTH
+			| CASCADED
 			| CASE
 			| CAST
 			| CHECK
@@ -8662,6 +8677,8 @@
 			| SYMMETRIC
 			| TABLE
 			| THEN
+			| TIME
+			| TIMESTAMP
 			| TO
 			| TRAILING
 			| TRUE_P
@@ -8671,6 +8688,7 @@
 			| USING
 			| WHEN
 			| WHERE
+			| WITH
 		;
Re: [HACKERS] GIN stuck in loop during PITR
Thanks a lot, applied. Can you describe your test suite? It may be useful for more testing... GIN is young code and it needs to be tested independently.

Andreas Seltenreich wrote:
> I'm just experimenting a bit with GIN, and it is occasionally getting
> stuck looping in findParents() during WAL replay.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
[HACKERS] max(*)
Shouldn't

  SELECT max(*) FROM foo;

give an error? Instead it's executed like

  SELECT max(1) FROM foo;

just like count(*) is executed as count(1). Something for the TODO, or is it a feature?

P.S. I know it's not an important case, since no one sane would try to calculate max(*), but still.

/Dennis
Re: [HACKERS] XLogArchivingActive
On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
> Tom Lane wrote:
> > Andreas Pflug <[EMAIL PROTECTED]> writes:
> > > I propose to introduce a GUC permanent_archiving or so, to select
> > > whether wal archiving happens permanently or only when a backup is
> > > in progress (i.e. between pg_start_backup and pg_stop_backup).
> >
> > This is silly. Why not just turn archiving on and off?
>
> Not quite. I want online backup, but no archiving.

I can see what you want and why you want it. It's good to have the option of a physical online backup as opposed to the logical online backup that pg_dump offers.

> Currently, I have to edit postgresql.conf and SIGHUP to turn on
> archiving configuring a (hopefully) writable directory, do the backup,
> edit postgresql.conf and SIGHUP again. Not too convenient...

You're doing this for pgAdmin, right? My understanding was that we now have the tools to edit postgresql.conf programmatically? Seems like it's not too convenient to change the way the server operates to do this, as long as we solve the SIGHUP/postgresql.conf problem.

I'm also not that happy about curtailing people's options on backup either: if people decided they wanted to have a mixture of isolated online backup (as you suggest) plus active archiving at other times, they would still have the problems you suggest.

Not sure what the edit commands are offhand, but we would need the following program:
- edit postgresql.conf
- pg_reload_conf()
- wait 30
- pg_start_backup('blah')
- backup
- pg_stop_backup()
- unedit postgresql.conf
- pg_reload_conf()

Which could then be wrapped even more simply as
- pg_start_backup_online('blah')
- backup
- pg_stop_backup_online()

Which overall seems lots easier than changing the server and adding another parameter.

--
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] max(*)
On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
> Shouldn't SELECT max(*) FROM foo; give an error?

SQL:2003 would not allow this; SQL:2003 permits only COUNT(*) and no other aggregate function. All other aggregates require a value expression.

> Instead it's executed like SELECT max(1) FROM foo; Just like count(*)
> is executed as count(1). Something for the TODO or is it a feature?

Doesn't seem an important or even useful extension of the standard, but it would probably require special-case processing for every aggregate function in order to implement that. It's not dangerous... so I'm not sure we should take any action at all.

--
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] XLogArchivingActive
Tom Lane wrote:
> Andreas Pflug <[EMAIL PROTECTED]> writes:
> > That's right, but my proposal would implicitly switch on archiving
> > while backup is in progress, thus explicitly enabling/disabling
> > archiving wouldn't be necessary.
>
> I'm not sure you can expect that to work. The system is not built to
> guarantee instantaneous response to mode changes like that.

Um, as long as xlog writing stops immediate recycling when pg_start_backup is executed, everything should be fine, since archived logs are not expected to be present until pg_stop_backup is done.

> The conventional wisdom is that pg_dump files are substantially smaller
> than the on-disk footprint ... and that's even without compressing
> them. I think you are taking a corner case, ie bytea data, and
> presenting it as something that ought to be the design center.

I certainly have an extreme corner case, since the data is highly compressible. I won't suggest replacing pg_dump with physical backup methods, but disaster recovery may take considerably longer from a dump than from a filesystem-level backup.

> Something that might be worth considering is an option to allow pg_dump
> to use binary COPY. I don't think this'd work nicely for text dumps,
> but seems like custom- or tar-format dumps could be made to use it.
> This would probably be a win for many datatypes not only bytea, and
> it'd still be far more portable than a filesystem dump.

I'd really love a COPY format that works well for binary and text data alike, optimally compressed. Initial replication to a new Slony cluster node uses COPY, and network bandwidth may become the restricting factor. Line-protocol compression would be desirable for that too, but that's another story.

Regards,
Andreas
Re: [HACKERS] XLogArchivingActive
Simon Riggs wrote:
> On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
> > Currently, I have to edit postgresql.conf and SIGHUP to turn on
> > archiving configuring a (hopefully) writable directory, do the
> > backup, edit postgresql.conf and SIGHUP again. Not too convenient...
>
> You're doing this for pgAdmin right?

Not yet, just trying to manage a server.

> My understanding was that we had the tools now to edit the
> postgresql.conf programmatically? Seems like its not too convenient to
> change the way the server operates to do this, as long as we solve the
> SIGHUP/postgresql.conf problem. I'm also not that happy about
> curtailing people's options on backup either: if people decided they
> wanted to have a mixture of isolated on-line backup (as you suggest),
> plus active archiving at other times they would still have the
> problems you suggest.

Why? My suggestion is to redefine XLogArchivingActive. Currently, it tests for a non-null archive_command. I propose

bool XlogArchivingActive()
{
    if (XLogArchiveCommand[0] == 0)
        return false;
    return (XLogPermanentArchive        // from GUC
            || OnlineBackupRunning());  // from pg_start_backup
}

The people you mention simply have XLogPermanentArchive = true in postgresql.conf, delivering the current behaviour.

> Not sure what the edit commands are offhand, but we would need the
> following program:
> - edit postgresql.conf
> - pg_reload_conf()
> - wait 30
> - pg_start_backup('blah')
> - backup
> - pg_stop_backup()
> - unedit postgresql.conf
> - pg_reload_conf()
>
> Which could then be wrapped even more simply as
> - pg_start_backup_online('blah')
> - backup
> - pg_stop_backup_online()

Editing postgresql.conf for this is ugly. In addition, pg_start_backup_online would need an additional parameter, the (highly machine-specific) archive_command string. I'd like to see that parameter untouched in postgresql.conf.

Regards,
Andreas
Re: [HACKERS] v814 + OSX10.4.6 builds OK, but fails launch ...
On Wed, May 24, 2006 at 05:45:57PM -0700, Richard wrote:
> /usr/local/pgsql/bin ls -al postmaster
> lrwxrwx--- 1 root wheel 8 2006-05-24 07:48 postmaster -> postgres

How did that happen? I was always under the impression that permission bits on symbolic links were ignored, and chmod on my machine (Linux) won't let me change them anyway; it changes the bits on the underlying file. Is OSX different here?

Have a nice day,
--
Martijn van Oosterhout  kleptog@svana.org  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] timezones to own config file
On Wed, May 24, 2006 at 09:13:42PM -0400, Tom Lane wrote:
> > The zic database doesn't seem to have a problem with using the same
> > abbreviations to mean many different things. We could look to it for
> > information, or maybe even use its classification of timezone groups,
> > but I don't think it can solve the problem for us.
>
> I think you may be thinking of yet a separate TODO item, which is to be
> able to use the zic timezone names in timestamptz input, viz
> 	'2006-05-24 21:11 Americas/New_York'::timestamptz
> But names like 'IST' or 'CDT' are not zic timezone names, they just
> represent specific offsets from UTC.

Well, the zic database does contain information about the abbreviations, so we would be able to build a list of them. I think the right solution is probably to fix the above first (allow full zic timezones in timestamps) and then set up CST/CEST/etc. as a list of aliases users can customise...

Have a nice day,
--
Martijn van Oosterhout  kleptog@svana.org  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
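The customisable-alias idea above can be sketched concretely: a user-editable table mapping abbreviation to UTC offset, consulted at timestamp-input time. The following Python sketch is purely illustrative (the names, offsets, and function are assumptions, not anything PostgreSQL implements); note that the ambiguity of abbreviations like IST, which can mean Israel, India, or Irish time, is exactly why such a list would need to be user-configurable.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical user-customisable abbreviation table. Each site picks the
# meaning it wants; the offsets below are one possible choice.
TZ_ALIASES = {
    "CST": timedelta(hours=-6),             # US Central; could equally be China, +8
    "CEST": timedelta(hours=2),             # Central European Summer Time
    "IST": timedelta(hours=5, minutes=30),  # this site chose India over Israel/Ireland
}

def parse_timestamptz(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM ABBR' using the alias table above."""
    date_part, abbr = value.rsplit(" ", 1)
    offset = TZ_ALIASES[abbr]  # unknown abbreviations raise KeyError
    naive = datetime.strptime(date_part, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone(offset))

ts = parse_timestamptz("2006-05-24 21:11 CEST")
print(ts.isoformat())  # 2006-05-24T21:11:00+02:00
```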
Re: [HACKERS] XLogArchivingActive
Andreas Pflug <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
> > I'm not sure you can expect that to work. The system is not built to
> > guarantee instantaneous response to mode changes like that.
>
> Um, as long as xlog writing stops immediate recycling when
> pg_start_backup is executed everything should be fine, since archived
> logs are not expected to be present until pg_stop_backup is done.

Wrong. You forgot about all the *other* behaviors that change depending on XLogArchivingActive, like whether CREATE INDEX gets archived or just fsync'd. I don't think it makes sense for CREATE INDEX to change that behavior in midstream, even assuming that it noticed the flag change instantly.

			regards, tom lane
Re: [HACKERS] max(*)
On 5/26/06, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
> > Shouldn't SELECT max(*) FROM foo; give an error?

IMO, yes.

> SQL:2003 would not allow this; SQL:2003 permits only COUNT(*) and no
> other aggregate function. All other aggregates require a value
> expression.

This precisely being the reason.

> > Instead it's executed like SELECT max(1) FROM foo; Just like count(*)
> > is executed as count(1).

That's right; see the interaction pasted below.

> > Something for the TODO or is it a feature?

We definitely cannot tout it as a feature, because it is not even a 'useful extension of the standard'.

> Doesn't seem an important or even useful extension of the standard, but
> would probably require special case processing for every aggregate
> function in order to implement that. Its not dangerous... so I'm not
> sure we should take any action at all.

A TODO wouldn't do any harm. If somebody comes up with some smart solution, you can always incorporate it. Something not supported should be stated as such through an ERROR. Except for count(), none of the following make any sense.

The transcript:

test=# \d t1
     Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)

test=# select * from t1;
 a
---
 1
 2
 3
 4
 5
(5 rows)

test=# select count(*) from t1;
 count
-------
     5
(1 row)

test=# select count(1) from t1;
 count
-------
     5
(1 row)

test=# select max(*) from t1;
 max
-----
   1
(1 row)

test=# select max(1) from t1;
 max
-----
   1
(1 row)

test=# select min(*) from t1;
 min
-----
   1
(1 row)

test=# select avg(*) from t1;
 avg
-----
   1.
(1 row)

test=# select sum(*) from t1;
 sum
-----
   5
(1 row)

test=# select sum(1) from t1;
 sum
-----
   5    --- this is correct
(1 row)

test=#
Re: [HACKERS] Bug with UTF-8 character
On 5/26/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Fri, May 26, 2006 at 08:21:56AM +0200, Hans-Jürgen Schönig wrote:
> > I got a bug request for the following unicode character in PostgreSQL
> > 8.1.4: 0xedaeb8
> > ERROR: invalid byte sequence for encoding UTF8: 0xedaeb8
>
> Your character converts to char DBB8. According to the standard,
> characters in the range D800-DFFF are not characters but surrogates.
> They don't mean anything by themselves and are thus rejected by
> postgres.
>
> http://www.unicode.org/faq/utf_bom.html#30
>
> This character should be preceded by a low surrogate (D800-DBFF). You
> should combine the two into a single 4-byte UTF-8 character.

You are talking about UTF16, not UTF8.

--
marko
Re: [HACKERS] XLogArchivingActive
Tom Lane wrote:
> Andreas Pflug <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> > > I'm not sure you can expect that to work. The system is not built
> > > to guarantee instantaneous response to mode changes like that.
> >
> > Um, as long as xlog writing stops immediate recycling when
> > pg_start_backup is executed everything should be fine, since archived
> > logs are not expected to be present until pg_stop_backup is done.
>
> Wrong. You forgot about all the *other* behaviors that change
> depending on XLogArchivingActive, like whether CREATE INDEX gets
> archived or just fsync'd. I don't think it makes sense for CREATE
> INDEX to change that behavior in midstream, even assuming that it
> noticed the flag change instantly.

Ok, but how can I recognize whether all running commands have safely switched to archiving mode after enabling it, so I can continue backing up?

I thought a little about your proposal to use a non-copying archive_command, since I only want a backup of the state the cluster had when the backup started, but this won't work because all write actions that are not appending (truncate, drop) would remove files needed for the pre-backup state while possibly not backed up yet; thus the WAL archive is needed.

Following your proposal, I could redirect archiving to /dev/null while not backing up, but how can I make sure that WAL files of transactions that were open when starting the backup procedure are written to the wal directory, not lost previously? When pg_start_backup() is executed, I'd need the archiver to write all hot xlog files again.

Regards,
Andreas
Re: [HACKERS] Bug with UTF-8 character
Hans-Jürgen Schönig <[EMAIL PROTECTED]> writes:
> But the code does a check where the second character should not be
> greater than 0x9F, when first character is 0xED. This is not according
> to UTF-8 standard in RFC 3629.

Better read the RFC again; it says

   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )

The reason for the prohibition is explained as

   The definition of UTF-8 prohibits encoding character numbers between
   U+D800 and U+DFFF, which are reserved for use with the UTF-16
   encoding form (as surrogate pairs) and do not directly represent
   characters.

I don't know anything about surrogate pairs, but I am not about to decide that we know more about this than the RFC authors do. If they say it's invalid, it's invalid.

			regards, tom lane
Re: [HACKERS] Bug with UTF-8 character
On Fri, May 26, 2006 at 05:16:59PM +0300, Marko Kreen wrote:
> On 5/26/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> > On Fri, May 26, 2006 at 08:21:56AM +0200, Hans-Jürgen Schönig wrote:
> > > I got a bug request for the following unicode character in
> > > PostgreSQL 8.1.4: 0xedaeb8
> > > ERROR: invalid byte sequence for encoding UTF8: 0xedaeb8
> >
> > Your character converts to char DBB8. According to the standard,
> > characters in the range D800-DFFF are not characters but surrogates.
> > They don't mean anything by themselves and are thus rejected by
> > postgres.
> >
> > http://www.unicode.org/faq/utf_bom.html#30
> >
> > This character should be preceded by a low surrogate (D800-DBFF). You
> > should combine the two into a single 4-byte UTF-8 character.
>
> You are talking about UTF16, not UTF8.

UTF-8 and UTF-16 use the same character set as base; just the encoding is different. As that page says, to convert a surrogate pair in UTF-16 (D800 DC00) to UTF-8, you have to combine the two into a single 4-byte UTF-8 character. The direct encoding of D800 into UTF-8 is invalid because no such character exists.

The OP apparently has some broken UTF-16 to UTF-8 conversion software and thus produced invalid UTF-8, which postgres is rejecting. Given he didn't post the other half of the surrogate, we don't actually know what character he's trying to represent, so we can't help him with the encoding. However, supplementary characters (which require surrogates in UTF-16) are all in the range 0x10000 to 0x10FFFF. If you don't believe me, check the Unicode database yourself (warning, large: 944KB).

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

DBB8 is a private-use surrogate; maybe he should be using something in the range E000-F8FF, which are normal private-use characters.

Have a nice day,
--
Martijn van Oosterhout  kleptog@svana.org  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
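For reference, the 0xED special case discussed in this thread is easy to reproduce outside the server. The following Python sketch of the RFC 3629 UTF8-3 production is illustrative only (it is not the wchar.c code itself, and the function name is made up); it shows why 0xED 0xAE 0xB8, which would decode to the surrogate U+DBB8, is rejected.

```python
def utf8_3byte_is_legal(b: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 production of RFC 3629."""
    if len(b) != 3:
        return False
    b0, b1, b2 = b
    if not 0x80 <= b2 <= 0xBF:           # last byte must be UTF8-tail
        return False
    if b0 == 0xE0:
        return 0xA0 <= b1 <= 0xBF        # excludes overlong encodings
    if 0xE1 <= b0 <= 0xEC:
        return 0x80 <= b1 <= 0xBF
    if b0 == 0xED:
        return 0x80 <= b1 <= 0x9F        # excludes surrogates U+D800-U+DFFF
    if 0xEE <= b0 <= 0xEF:
        return 0x80 <= b1 <= 0xBF
    return False                         # not a 3-byte lead byte

# The reported sequence decodes to U+DBB8, a surrogate, so it is rejected:
print(utf8_3byte_is_legal(b"\xed\xae\xb8"))             # False
# An ordinary 3-byte character (U+20AC, the euro sign) passes:
print(utf8_3byte_is_legal("\u20ac".encode("utf-8")))    # True
```

Python's own decoder agrees: `b"\xed\xae\xb8".decode("utf-8")` raises UnicodeDecodeError for the same reason PostgreSQL reports an invalid byte sequence.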
Re: [HACKERS] Updatable views/with check option parsing
Peter Eisentraut wrote:
> On Wednesday, 24 May 2006 20:42, Tom Lane wrote:
> > Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > > I have spent some time figuring out how to resolve the parsing
> > > conflicts in Bernd Helmle's updatable views patch. The problem has
> > > now been reduced to specifically this situation:
> >
> > Could we see the proposed patches for gram.y?
>
> Here it is.
>
> $ make -W gram.y gram.c
> bison -y -d gram.y
> conflicts: 4 shift/reduce
>
> These are basically four instances of the same problem.

I had a quick look - I don't think there is an easy answer with the current proposed grammar. If we want to prevent shift/reduce conflicts I suspect we'd need to use a different keyword than WITH, although I can't think of one that couldn't be a trailing clause on a select statement, which is the cause of the trouble.

Another possibility would be to move the optional WITH clause so that it would come before the AS clause. Then there should be no conflict, I believe. Something like:

ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
            opt_check_option AS SelectStmt

cheers

andrew
Re: [HACKERS] Updatable views/with check option parsing
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I had a quick look - I don't think there is an easy answer with the
> current proposed grammar. If we want to prevent shift/reduce conflicts
> I suspect we'd need to use a different keyword than WITH, although I
> can't think of one that couldn't be a trailing clause on a select
> statement, which is the cause of the trouble. Another possibility
> would be to move the optional WITH clause so that it would come before
> the AS clause.

Unfortunately the SQL99 spec is perfectly clear about what it wants:

         <view definition> ::=
              CREATE [ RECURSIVE ] VIEW <table name> <view specification>
                AS <query expression>
                [ WITH [ <levels clause> ] CHECK OPTION ]

         <levels clause> ::=
              CASCADED | LOCAL

I haven't had time to play with this yet, but I suspect the answer will have to be that we reinstate the token-merging UNION JOIN kluge that I just took out :-(. Or we could look into recognizing the whole thing as one token in scan.l, but I suspect that doesn't work unless we give up the no-backtrack property of the lexer, which would be more of a speed hit than the intermediate function was. Anyway it should certainly be soluble with token merging, if we can't find a pure grammar solution.

			regards, tom lane
Re: [HACKERS] max(*)
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
> > Shouldn't SELECT max(*) FROM foo; give an error?
>
> Doesn't seem an important or even useful extension of the standard, but
> would probably require special case processing for every aggregate
> function in order to implement that. Its not dangerous... so I'm not
> sure we should take any action at all.

We shouldn't. The spec's prohibition is based on the assumption that the only aggregate functions in existence are those listed in the spec. Since we allow user-defined aggregates, who are we to say that there are no others for which * is sensible?

You could imagine adding a catalog attribute to aggregate functions to say whether they allow *, but quite honestly that strikes me as a waste of implementation effort. The amount of work would be nontrivial and the benefit negligible.

(Another possibility, if we get around to implementing N-argument aggregates, is to equate agg(*) to an invocation of a zero-argument aggregate, as I suggested awhile ago. Then count() would be the only zero-argument aggregate mentioned in the standard catalogs. That would at least fall out of some work that's actually worth doing ...)

			regards, tom lane
Re: [HACKERS] Updatable views/with check option parsing
On Wed, May 24, 2006 at 01:13:06PM +0200, Peter Eisentraut wrote:
> CREATE VIEW foo AS SELECT expr :: TIME . WITH
>
> (where expr is a_expr or b_expr and TIME could also be TIMESTAMP or
> TIME(x) or TIMESTAMP(x)). The continuation here could be WITH TIME
> ZONE (calling for a shift) or WITH CHECK OPTION (calling for a
> reduce). All the usual ideas about unfolding the rules or making
> keywords more reserved don't work (why should they). A one-token
> lookahead simply can't parse this.
>
> I have had some ideas about trying to play around with the precedence
> rules -- giving WITH TIME ZONE a higher precedence than WITH CHECK
> OPTION -- but I have no experience with that and I am apparently not
> doing it right, if that is supposed to work at all.

All precedence rules do is force the parser to either shift or reduce without complaining about a conflict, i.e. they resolve the conflict by forcing a particular option. So all you would achieve with precedence rules is that you codify the solution. For example: always shift, and if the user specifies WITH CHECK they get a parse error.

It would be nice to be able to detect this and tell the user to parenthesise the (expr::TIME), thus solving the problem. Given that :: is not SQL-standard anyway, perhaps this is not a bad solution. Incidentally, IIRC the default behaviour on conflict is a shift anyway, so that's what the patch already does. So we get:

CREATE VIEW foo AS SELECT expr :: TIME WITH TIME ZONE      -- OK
CREATE VIEW foo AS SELECT expr :: TIME WITH CHECK OPTION   -- parse error
CREATE VIEW foo AS SELECT (expr :: TIME) WITH CHECK OPTION -- OK

Of course, any code that decompiles into SQL will have to be careful not to produce unparseable SQL.

Have a nice day,
--
Martijn van Oosterhout  kleptog@svana.org  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] Inefficient bytea escaping?
Tom Lane wrote:
> Andreas Pflug <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> > > Looking at CopySendData, I wonder whether any traction could be
> > > gained by trying not to call fwrite() once per character. I'm not
> > > sure how much per-call overhead there is in that function. We've
> > > done a lot of work trying to optimize the COPY IN path since 8.0,
> > > but nothing much on COPY OUT ...
> >
> > Hm, I'll see whether I can manage to check CVS head too, and see
> > what's happening, not a production alternative though.
>
> OK, make sure you get the copy.c version I just committed ...

Here are the results, with the copy patch:

psql \copy 1.4 GB from table, binary:
  8.0    8.1    8.2dev
  36s    34s    36s

psql \copy 1.4 GB to table, binary:
  8.0    8.1    8.2dev
  106s   95s    98s

psql \copy 6.6 GB from table, std:
  8.0    8.1    8.2dev
  375s   362s   290s (second: 283s)

psql \copy 6.6 GB to table, std:
  8.0    8.1    8.2dev
  511s   230s   238s

INSERT INTO foo SELECT * FROM bar:
  8.0    8.1    8.2dev
  75s    75s    75s

So obviously text COPY is enhanced by 20% now, but it's still far from the expected throughput. The dump disk should be capable of 60MB/s, limiting text COPY to about 110 seconds, but the load process is CPU-restricted at the moment. For comparison purposes, I included the in-server copy benchmarks as well (bytea STORAGE EXTENDED; EXTERNAL won't make a noticeable difference). This still seems slower than expected to me, since the table's on-disk footprint is relatively small (138MB).

Regards,
Andreas
Re: [HACKERS] Compression and on-disk sorting
I've done some more testing with Tom's recently committed changes to tuplesort.c, which remove the tuple headers from the sort data. It does about 10% better than compression alone does. What's interesting is that the gains are about 10% regardless of compression, which means compression isn't helping very much on all the redundant header data, which I find very odd. And the header data is very redundant:

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid limit 1;
  xmin  | xmax | cmin | cmax | aid
--------+------+------+------+-----
 280779 |    0 |    0 |    0 |   1
(1 row)

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid desc limit 1;
  xmin  | xmax | cmin | cmax | aid
--------+------+------+------+-----
 310778 |    0 |    0 |    0 |   3
(1 row)

Makes sense, since pgbench loads the database via a string of COPY commands, each of which loads 1 rows.

Something else worth mentioning is that sort performance is worse with larger work_mem for all cases except the old HEAD, prior to the tuplesort.c changes. It looks like whatever was done to fix that will need to be adjusted/rethought pending the outcome of using compression.

In any case, compression certainly seems to be a clear win, at least in this case. If there's interest, I can test this on some larger hardware, or if someone wants to produce a patch for pgbench that will load some kind of real data into accounts.filler, I can test that as well.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
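As a sanity check on the intuition above, that near-constant header columns ought to compress extremely well, here is a synthetic sketch using zlib. The 24-byte record layout is an assumption for illustration, not an actual PostgreSQL tuple image: a 20-byte header whose xmin varies slowly and whose xmax/cmin/cmax are zero, followed by a 4-byte key.

```python
import struct
import zlib

# Build synthetic "tuples": a mostly-constant 20-byte header followed by
# an incrementing 4-byte key, mimicking the redundancy visible in the
# xmin/xmax/cmin/cmax columns of the transcript above.
tuples = b"".join(
    struct.pack("<IIIII", 280779 + i // 10000, 0, 0, 0, 0)  # 20-byte header
    + struct.pack("<I", i)                                  # 4-byte key
    for i in range(100_000)
)

with_headers = zlib.compress(tuples, 6)
keys_only = zlib.compress(b"".join(struct.pack("<I", i) for i in range(100_000)), 6)

print(f"raw size:                 {len(tuples)}")
print(f"compressed, with headers: {len(with_headers)}")
print(f"compressed, keys only:    {len(keys_only)}")
# The repeated headers compress away almost entirely, which is why one
# would expect compression to absorb most of the header overhead in a
# sort tape -- making the observed flat 10% gain surprising.
```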
Re: [HACKERS] Compression and on-disk sorting
Jim C. Nasby <[EMAIL PROTECTED]> writes:
> Something else worth mentioning is that sort performance is worse with
> larger work_mem for all cases except the old HEAD, prior to the
> tuplesort.c changes. It looks like whatever was done to fix that will
> need to be adjusted/rethought pending the outcome of using compression.

Please clarify. What are you comparing here exactly, and what cases did you test?

			regards, tom lane
Re: [HACKERS] LIKE, leading percent, bind parameters and indexes
On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote:
> select * from table where field like 'THE NAME%';  -- index scan
> select * from table where field like '%THE NAME%'; -- seq scan
> select * from table where field like :bind_param;  -- seq scan (always)
>
> How difficult would it be to make LIKE check the value of the bound
> parameter for a starting % and use that information to decide on a
> query plan? IMHO this is worth making into a special case in the
> planner, because it's very easy to detect and makes a tremendous
> difference in the query plan/performance.

Planning doesn't work that way. LIKE is just a function invocation; the planner doesn't ask or tell it where it is in the plan. And it's not the function that determines how the query is planned.

> Also, might a bitmap scan be a win for the %string case? Presumably
> it's much faster to find matching rows via an index and then go back
> into the heap for them; unless you're matching a heck of a lot of rows.

This is an interesting thought. Currently, AFAICS, the bitmap-scan code only considers operators that are indexable, just like for normal index scans. However, in this case the query could scan the entire index, apply the LIKE to each entry, and produce a bitmap of possible matches, then do a bitmap scan over the table to check the results. Not just LIKE could use this, but any function marked STABLE.

You'd have to weigh up the cost of scanning the *entire* index (because we don't have any actual restriction clauses) against avoiding a full table scan. Actually, if you're going to scan the whole index, maybe you can use the recent changes that allow VACUUM to scan the index sequentially, rather than in index order. Surely a sequential disk scan over the index to create the bitmap would be a big win.

Have a nice day,
--
Martijn van Oosterhout  kleptog@svana.org  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
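For background on why the literal 'THE NAME%' case can become an index scan while the bound-parameter case cannot: the planner derives a fixed literal prefix from the pattern at plan time, and a bind parameter has no value to inspect then. The following Python sketch of prefix extraction is illustrative only; PostgreSQL's real logic is C code in the planner's selectivity support, and details like escape handling differ.

```python
def like_fixed_prefix(pattern: str) -> str:
    """Return the leading literal part of a LIKE pattern, i.e. everything
    before the first unescaped wildcard. A prefix-based index range scan
    is only possible when this prefix is non-empty."""
    prefix = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in ("%", "_"):              # wildcard ends the fixed prefix
            break
        if c == "\\" and i + 1 < len(pattern):
            i += 1                       # escaped character: take literally
            c = pattern[i]
        prefix.append(c)
        i += 1
    return "".join(prefix)

# 'THE NAME%' yields a usable prefix; '%THE NAME%' yields none -- which is
# why the former can drive an index scan and the latter cannot:
print(like_fixed_prefix("THE NAME%"))   # THE NAME
print(like_fixed_prefix("%THE NAME%"))  # (empty string)
```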
[HACKERS] Creating a case insensitive data type
could CREATE TYPE be used to make a case insensitive version of varchar? So that doing something like LOWER(username) = 'joe' could just be done like username = 'joe' ? I want to try to avoid using CREATE TYPE with C extensions and using a new operator all over the place is not an option... How could I do this? Can I accomplish this via another method? -Dave ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Inefficient bytea escaping?
Andreas Pflug [EMAIL PROTECTED] writes:
> Here are the results, with the copy patch:
> psql \copy 1.4 GB from table, binary:
>   8.0    8.1    8.2dev
>   36s    34s    36s
> psql \copy 6.6 GB from table, std:
>   8.0    8.1    8.2dev
>   375s   362s   290s (second: 283s)

Hmph. There's something strange going on on your platform (what is it anyway?) Using CVS HEAD on Fedora Core 4 x86_64, I get

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 273325.666 ms
bytea=# copy binary t to '/home/tgl/t.outb';
COPY 1024
Time: 62113.355 ms

Seems \timing doesn't work on \copy (annoying), so

$ time psql -c \\copy t to '/home/tgl/t.out2' bytea

real    3m47.507s
user    0m3.700s
sys     0m36.406s
$ ls -l t.*
-rw-r--r--  1 tgl tgl 5120001024 May 26 12:58 t.out
-rw-rw-r--  1 tgl tgl 5120001024 May 26 13:14 t.out2
-rw-r--r--  1 tgl tgl 1024006165 May 26 13:00 t.outb
$

This test case is 1024 rows each containing a 100-byte bytea, stored EXTERNAL (no on-disk compression), all bytes chosen to need expansion to \nnn form. So the ratio in runtimes is in keeping with the amount of data sent.

It's interesting (and surprising) that the runtime is actually less for psql \copy than for server COPY. This is a dual Xeon machine, maybe the frontend copy provides more scope to use both CPUs? It would be interesting to see what's happening on your machine with oprofile or equivalent.

I can't test psql binary \copy just yet, but will look at applying your recent patch so that case can be checked.

regards, tom lane
Re: [HACKERS] Updatable views/with check option parsing
Martijn van Oosterhout kleptog@svana.org writes:
> So we get:
> CREATE VIEW foo AS SELECT expr :: TIME WITH TIME ZONE      -- OK
> CREATE VIEW foo AS SELECT expr :: TIME WITH CHECK OPTION   -- parse error
> CREATE VIEW foo AS SELECT (expr :: TIME) WITH CHECK OPTION -- OK

I haven't really been following this conversation, but just on the off chance this is a useful idea: would it work to make WITH just a noise word? Then you would only need one token of look-ahead to recognize TIME ZONE or CHECK OPTION instead of two. I don't know what the clauses look like at that level, so I'm not sure whether you would be able to recognize them without seeing the WITH. I'm not even sure this works even if you can, for that matter.

-- 
greg
Re: [HACKERS] Inefficient bytea escaping?
Tom Lane wrote:
> Andreas Pflug [EMAIL PROTECTED] writes:
>> Here are the results, with the copy patch:
>> psql \copy 1.4 GB from table, binary:
>>   8.0    8.1    8.2dev
>>   36s    34s    36s
>> psql \copy 6.6 GB from table, std:
>>   8.0    8.1    8.2dev
>>   375s   362s   290s (second: 283s)
> Hmph. There's something strange going on on your platform (what is it anyway?)

Debian 2.6.26.

> It's interesting (and surprising) that the runtime is actually less for psql \copy than for server COPY. This is a dual Xeon machine, maybe the frontend copy provides more scope to use both CPUs?

The dual-CPU explanation sounds reasonable, but I found the same tendency on a single 3GHz (HT disabled). Strange observation using top: user 90%, sys 10%, idle+wait 0%, but only the postmaster consumes CPU, showing 35%; the rest is negligible.

> It would be interesting to see what's happening on your machine with oprofile or equivalent.

I'll investigate further, trying to find the missing CPU.

Regards,
Andreas
Re: [HACKERS] Creating a case insensitive data type
On Thu, May 25, 2006 at 08:23:53PM -0400, Dave wrote:
> could CREATE TYPE be used to make a case insensitive version of varchar? So that doing something like LOWER(username) = 'joe' could just be done like username = 'joe' ? I want to try to avoid using CREATE TYPE with C extensions and using a new operator all over the place is not an option... How could I do this? Can I accomplish this via another method?

Try this: http://gborg.postgresql.org/project/citext/projdisplay.php

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
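For what it's worth, short of installing a custom type like citext, the common workaround is an expression index on lower(). This is a sketch only; the table and index names (users, username) are hypothetical:

```sql
-- index on the lowercased expression, so the comparison below can use it
CREATE INDEX users_lower_username_idx ON users (lower(username));

-- the query must spell out the same expression for the index to apply
SELECT * FROM users WHERE lower(username) = 'joe';
```

This still requires writing lower(username) in every query, which is exactly what the poster wanted to avoid, so it is only a partial answer; a type with case-insensitive comparison operators (as citext provides) is the complete one.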
Re: [HACKERS] max(*)
On Fri, May 26, 2006 at 11:03:17AM -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
>>> Shouldn't SELECT max(*) FROM foo; give an error?
>> Doesn't seem an important or even useful extension of the standard, but would probably require special case processing for every aggregate function in order to implement that. It's not dangerous... so I'm not sure we should take any action at all.
> We shouldn't. The spec's prohibition is based on the assumption that the only aggregate functions in existence are those listed in the spec. Since we allow user-defined aggregates, who are we to say that there are no others for which * is sensible?

But if aggregate(*) just gets turned into aggregate(1) by the backend, why not just tell people to use aggregate(1) for their custom aggregates? Or am I misunderstanding how aggregate(*) is actually handled?

My concern is that it's not inconceivable to typo max(field) into max(*), which could make for a rather frustrating error. Not to mention this being something that could trip newbies up. If nothing else I'd say it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas page. :)

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] max(*)
On Fri, May 26, 2006 at 14:06:29 -0500, Jim C. Nasby [EMAIL PROTECTED] wrote:
> But if aggregate(*) just gets turned into aggregate(1) by the backend, why not just tell people to use aggregate(1) for their custom aggregates? Or am I misunderstanding how aggregate(*) is actually handled?
> My concern is that it's not inconceivable to typo max(field) into max(*), which could make for a rather frustrating error. Not to mention this being something that could trip newbies up. If nothing else I'd say it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas page. :)

Tom's suggestion that (*) map to (), which would refer to a zero-argument aggregate, would cover this case, since there wouldn't be a zero-argument version of max.
Re: [HACKERS] max(*)
Jim C. Nasby [EMAIL PROTECTED] writes:
> My concern is that it's not inconceivable to typo max(field) into max(*), which could make for a rather frustrating error. Not to mention this being something that could trip newbies up. If nothing else I'd say it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas page. :)

count(*) has been implemented that way since about 1999, and no one's complained yet, so I think you are overstating the importance of the problem.

regards, tom lane
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 26, 2006 at 12:35:36PM -0400, Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
>> Something else worth mentioning is that sort performance is worse with larger work_mem for all cases except the old HEAD, prior to the tuplesort.c changes. It looks like whatever was done to fix that will need to be adjusted/rethought pending the outcome of using compression.
> Please clarify. What are you comparing here exactly, and what cases did you test?

Sorry, forgot to put the url in: http://jim.nasby.net/misc/pgsqlcompression/compress_sort.txt

But the meat is:

                                -- work_mem --
                         Scale    2000       2
not compressed             150   805.7   797.7
not compressed            3000   17820   17436
compressed                 150   371.4   400.1
compressed                3000    8152    8537
compressed, no headers    3000    7325    7876

Performance degrades with more work_mem any time compression is used. I thought I had data on just your tuplesort.c change without compression, but I guess I don't. :( I can run that tonight if desired.

As for the code, the 3 things I've tested are HEAD as of 5/17/06 with no patches (labeled 'not compressed'); that code with the compression patch ('compressed'); and that code with both the compression patch and your change to tuplesort.c that removes tuple headers from the sorted data ('compressed, no headers').

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] XLogArchivingActive
Originally I wanted the command to be a string, and archiving to be a boolean, but Tom wanted a single parameter, and others agreed.

---------------------------------------------------------------------------

Andreas Pflug wrote:
> Simon Riggs wrote:
>> On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
>>> Currently, I have to edit postgresql.conf and SIGHUP to turn on archiving configuring a (hopefully) writable directory, do the backup, edit postgresql.conf and SIGHUP again. Not too convenient...
>> You're doing this for pgAdmin right?
> Not yet, just trying to manage a server.
>> My understanding was that we had the tools now to edit the postgresql.conf programmatically? Seems like its not too convenient to change the way the server operates to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm also not that happy about curtailing people's options on backup either: if people decided they wanted to have a mixture of isolated on-line backup (as you suggest), plus active archiving at other times they would still have the problems you suggest.
> Why? My suggestion is to redefine XLogArchivingActive. Currently, it tests for a non-null archive_command. I propose
>
> bool XlogArchivingActive()
> {
>     if (XLogArchiveCommand[0] == 0)
>         return false;
>     return (XLogPermanentArchive         // from GUC
>             || OnlineBackupRunning());   // from pg_start_backup
> }
>
> The people you mention simply have XLogPermanentActive=true in postgresql.conf, delivering the current behaviour.
>> Not sure what the edit commands are offhand, but we would need the following program:
>> - edit postgresql.conf
>> - pg_reload_conf()
>> - wait 30
>> - pg_start_backup('blah')
>> - backup
>> - pg_stop_backup()
>> - unedit postgresql.conf
>> - pg_reload_conf()
>> Which could then be wrapped even more simply as
>> - pg_start_backup_online('blah')
>> - backup
>> - pg_stop_backup_online()
> Editing postgresql.conf for this is ugly. In addition, pg_start_backup_online would need an additional parameter, the (highly machine specific) archive_command string. I'd like to see that parameter untouched in postgresql.conf.
> Regards,
> Andreas

-- 
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] XLogArchivingActive
On Fri, May 26, 2006 at 12:15:34AM +0200, Andreas Pflug wrote:
> Jim Nasby wrote:
>> Another consideration is that you can use rsync to update a filesystem-level backup, but there's no pg_dump equivalent. On a large database that can make a sizable difference in the amount of time required for a backup.
> That's fine to cut the backup execution time, but to guarantee consistency while the cluster is running pg_start_backup/pg_stop_backup and WAL archiving will still be necessary.

Of course, but the point is that it would only be necessary while you're running rsync. If you don't care about being able to actually roll forward from that backup, you don't need any WAL files from after rsync completes. One possible way to accomplish this would be to allow specifying an archiver command to pg_start_backup, which would then fire up an archiver for the duration of your backup. Then you can:

SELECT pg_start_backup('label', 'cp -i %p /mnt/server/archivedir/%f </dev/null');
rsync
SELECT pg_stop_backup();

No messing with postgresql.conf, no need to HUP the postmaster. Perhaps the OP would be interested in coding this up, or sponsoring someone to do so, since I think it would provide what they were looking for.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] Inefficient bytea escaping?
I wrote:
> I can't test psql binary \copy just yet, but will look at applying your recent patch so that case can be checked.

With patch applied:

$ time psql -c \\copy t to '/home/tgl/t.out2' bytea

real    3m46.057s
user    0m2.724s
sys     0m36.118s
$ time psql -c \\copy t to '/home/tgl/t.outb2' binary bytea

real    1m5.222s
user    0m0.640s
sys     0m6.908s
$ ls -l t.*
-rw-rw-r--  1 tgl tgl 5120001024 May 26 16:02 t.out2
-rw-rw-r--  1 tgl tgl 1024006165 May 26 16:03 t.outb2

The binary time is just slightly more than what I got before for a server COPY:

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 273325.666 ms
bytea=# copy binary t to '/home/tgl/t.outb';
COPY 1024
Time: 62113.355 ms

So those numbers seem to hang together, and it's just the text case that is not making too much sense. I'm off for a little visit with oprofile...

regards, tom lane
Re: [HACKERS] XLogArchivingActive
On Fri, May 26, 2006 at 10:59:37AM +0100, Simon Riggs wrote:
> Not sure what the edit commands are offhand, but we would need the following program:
> - edit postgresql.conf
> - pg_reload_conf()
> - wait 30
> - pg_start_backup('blah')

Rather than 'wait 30', ISTM it would be better to just leave archiving enabled, but not actually archiving WAL files. Or, set up some mechanism so that you can tell if any commands whose behavior would change based on archiving are running, and if any of those that are running think archiving is disabled, pg_start_backup_online blocks on them.

Also, regarding needing to place an archiver command in pg_start_backup_online, another option would be to depend on the filesystem backup to copy the WAL files, and just let them pile up in pg_xlog until pg_stop_backup_online. Of course, that would require a two-step filesystem copy, since you'd need to first copy everything in $PGDATA, and then copy $PGDATA/pg_xlog after you have that.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] GIN stuck in loop during PITR
Andreas Seltenreich schrob:
> Teodor Sigaev schrob:
>> Thanks a lot, applied. Can you describe test suite? It may be useful for test more...
> Here's a shell script that triggers the bug when I revert the patch.

Just tried the script on HEAD, and it was triggering an assertion. I guess it is because we are still returning InvalidOffsetNumber in the trivial case (looks like a typo to me). I've attached a patch.

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.2
diff -c -r1.2 ginbtree.c
*** ginbtree.c	26 May 2006 08:01:17 -0000	1.2
--- ginbtree.c	26 May 2006 20:09:45 -0000
***************
*** 189,195 ****
  	Assert( !GinPageIsLeaf(page) );
  
  	/* check trivial case */
! 	if ( (root->off != btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidBuffer ) {
  		stack->parent = root;
  		return;
  	}
--- 189,195 ----
  	Assert( !GinPageIsLeaf(page) );
  
  	/* check trivial case */
! 	if ( (root->off = btree->findChildPtr(btree, page, stack->blkno, InvalidOffsetNumber)) != InvalidOffsetNumber ) {
  		stack->parent = root;
  		return;
  	}
Re: [HACKERS] Compression and on-disk sorting
On Fri, 2006-05-26 at 14:47 -0500, Jim C. Nasby wrote:
> But the meat is:
>
>                                 -- work_mem --
>                          Scale    2000       2
> not compressed             150   805.7   797.7
> not compressed            3000   17820   17436
> compressed                 150   371.4   400.1
> compressed                3000    8152    8537
> compressed, no headers    3000    7325    7876

Since Tom has committed the header-removing patch, we need to test "not compressed, no headers" vs "compressed, no headers".

There is a noticeable rise in sort time with increasing work_mem, but that needs to be offset against the benefit that in general comes from using a large heap for the sort. With the data you're using that always looks like a loss, but that isn't true with all input data orderings.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Compression and on-disk sorting
Simon Riggs [EMAIL PROTECTED] writes:
> There is a noticeable rise in sort time with increasing work_mem, but that needs to be offset against the benefit that in general comes from using a large heap for the sort. With the data you're using that always looks like a loss, but that isn't true with all input data orderings.

Yeah, these are all the exact same test data, right? We need a bit more variety in the test cases before drawing any sweeping conclusions.

regards, tom lane
Re: [HACKERS] Compression and on-disk sorting
On Fri, May 26, 2006 at 04:41:51PM -0400, Tom Lane wrote:
> Yeah, these are all the exact same test data, right? We need a bit more variety in the test cases before drawing any sweeping conclusions.

All testing is

select count(*) from (select * from accounts order by bid) a;

hitting a pgbench database, since that's something anyone can (presumably) reproduce. Suggestions for other datasets welcome.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
[HACKERS] Question about name datatype
In c.h there is the following comment:

/*
 * We want NameData to have length NAMEDATALEN and int alignment,
 * because that's how the data type 'name' is defined in pg_type.
 * Use a union to make sure the compiler agrees. Note that NAMEDATALEN
 * must be a multiple of sizeof(int), else sizeof(NameData) will probably
 * not come out equal to NAMEDATALEN.
 */

And indeed in pg_type the typalign column says 'i' for this data type. My question is just, why? What would be the problem with an alignment of 1 for name?

-- 
greg
Re: [HACKERS] Updatable views/with check option parsing
Andrew Dunstan wrote:
> The GLR parsers require a compiler for ISO C89 or later. In addition, they use the inline keyword, which is not C89, but is C99 and is a common extension in pre-C99 compilers. It is up to the user of these parsers to handle portability issues.

We already use inline, or handle its nonexistence, respectively.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Inefficient bytea escaping?
I wrote:
> I'm off for a little visit with oprofile...

It seems the answer is that fwrite() does have pretty significant per-call overhead, at least on Fedora Core 4. The patch I did yesterday still ended up making an fwrite() call every few characters when dealing with bytea text output, because it'd effectively do two fwrite()s per occurrence of '\' in the data being output. I've committed a further hack that buffers a whole data row before calling fwrite(). Even though this presumably is adding one extra level of data copying, it seems to make things noticeably faster:

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 209842.139 ms

as opposed to 268 seconds before. We were already applying the line-at-a-time buffering strategy for frontend copies, so that path didn't change much (it's about 226 seconds for the same case). At this point, a copy-to-file is just marginally faster than a frontend copy happening on the local machine; which speaks well for the level of optimization of the Linux send/recv calls. More importantly, I see consistent results for the text and binary cases.

Let me know what this does on your Debian machine ...

regards, tom lane
Re: [HACKERS] LIKE, leading percent, bind parameters and indexes
> On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote:
>>> I think more exactly, the planner can't possibly know how to plan an indexscan with a leading '%', because it has nowhere to start.
>> The fact is that an index scan is performed on a LIKE expression on a string not preceded by '%', except when a bound parameter is used.
>> select * from table where field like 'THE NAME%'; -- index scan
>> select * from table where field like '%THE NAME%'; -- seq scan
>> select * from table where field like :bind_param; -- seq scan (always)
> Since I'm somewhat doubtful of coming up with a generic means for dealing with plan changes based on different bound parameter values any time soon... How difficult would it be to make LIKE check the value of the bound parameter for a starting % and use that information to decide on a query plan? IMHO this is worth making into a special case in the planner, because it's very easy to detect and makes a tremendous difference in the query plan/performance.

My solution is a function in one of my libraries called strrev(), which returns the reverse of a string. I make a functional index on strrev(field), then just search where strrev(field) like strrev('%the name');
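The reversal trick works because reversing both the column and the pattern turns a trailing anchor into a leading one, which an index on the reversed column can satisfy. A quick Python sanity check — strrev and the toy like() here are stand-ins for the poster's SQL functions, supporting only the '%' wildcard:

```python
import re

def strrev(s):
    return s[::-1]

def like(value, pattern):
    # toy SQL LIKE: '%' matches any run of characters
    regex = '.*'.join(re.escape(part) for part in pattern.split('%'))
    return re.fullmatch(regex, value) is not None

# '%the name' has a leading %, so an index on the column is useless...
assert like('say the name', '%the name')

# ...but reversed, the pattern becomes 'eman eht%': a prefix match,
# which an index on strrev(field) can use.
assert strrev('%the name') == 'eman eht%'
assert like(strrev('say the name'), strrev('%the name'))
assert not like(strrev('another string'), strrev('%the name'))
```

The same reasoning is why the trick only helps patterns anchored at exactly one end; '%THE NAME%' stays unindexable either way, since reversal just moves the wildcard from one end to the other.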
Re: [HACKERS] Question about name datatype
Greg Stark [EMAIL PROTECTED] writes:
> And indeed in pg_type the typalign column says 'i' for this data type. My question is just, why? What would be the problem with an alignment of 1 for name?

Probably none, but that's how it's been defined since day one, and there doesn't seem any good reason to change. (Looking at the system catalog definitions, it doesn't appear that we'd save anything on alignment.)

It's possible that back when that decision was taken, there was something in the bootstrap code that wouldn't work with non-int-aligned columns? Just a guess.

regards, tom lane
Re: [HACKERS] Updatable views/with check option parsing
Peter Eisentraut [EMAIL PROTECTED] writes:
> Andrew Dunstan wrote:
>> The GLR parsers require a compiler for ISO C89 or later. In addition, they use the inline keyword, which is not C89, but is C99 and is a common extension in pre-C99 compilers. It is up to the user of these parsers to handle portability issues.
> We already use inline, or handle its nonexistence, respectively.

Yeah, I don't see anything in that statement that we don't assume already. The interesting question to me is how much different GLR is from garden-variety bison; in particular, what's the parser performance like?

regards, tom lane
Re: [HACKERS] max(*)
On Friday 26 May 2006 09:45, Gurjeet Singh wrote:
> Something not supported should be stated as such through an ERROR. Except for count(), none of the following make any sense:
>
> The transcript:
>
> test=# \d t1
>      Table "public.t1"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  a      | integer | not null
> Indexes:
>     "t1_pkey" PRIMARY KEY, btree (a)
>
> test=# select * from t1;
>  a
> ---
>  1
>  2
>  3
>  4
>  5
> (5 rows)

given:

pagila=# select 1 from t1;
 ?column?
----------
        1
        1
        1
        1
        1
(5 rows)

> test=# select count(*) from t1;
>  count
> -------
>      5
> (1 row)

this makes sense

> test=# select count(1) from t1;
>  count
> -------
>      5
> (1 row)

and so does this

> test=# select max(*) from t1;
>  max
> -----
>    1
> (1 row)

not so much

> test=# select max(1) from t1;
>  max
> -----
>    1
> (1 row)

but this does

> test=# select min(*) from t1;
>  min
> -----
>    1
> (1 row)

not here though

> test=# select avg(*) from t1;
>           avg
> ------------------------
>  1.00000000000000000000
> (1 row)

nor here

> test=# select sum(*) from t1;
>  sum
> -----
>    5
> (1 row)

or here

> test=# select sum(1) from t1;
>  sum
> -----
>    5    --- this is correct
> (1 row)
>
> test=#

yep... but really most aggregates are ok with a 1

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Updatable views/with check option parsing
Tom Lane said:
> Peter Eisentraut [EMAIL PROTECTED] writes:
>> Andrew Dunstan wrote:
>>> The GLR parsers require a compiler for ISO C89 or later. In addition, they use the inline keyword, which is not C89, but is C99 and is a common extension in pre-C99 compilers. It is up to the user of these parsers to handle portability issues.
>> We already use inline, or handle its nonexistence, respectively.
> Yeah, I don't see anything in that statement that we don't assume already. The interesting question to me is how much different GLR is from garden-variety bison; in particular, what's the parser performance like?

As I understand it, it runs one parser pretty much like the standard LALR(1) case, until it finds an ambiguity (shift/reduce or reduce/reduce), at which stage it clones the parser to take parallel paths, working in lockstep, and storing up semantic actions. When one of the clones encounters an error it goes away, and the surviving clone takes its stored semantic actions.

If that's true, then probably the only significant performance hit is in cases of ambiguity, and we would only have a handful of those, each lasting for one token, so the performance hit should be very small. We'd have to test it, of course ;-)

cheers

andrew
Re: [HACKERS] Updatable views/with check option parsing
Andrew Dunstan [EMAIL PROTECTED] writes:
> As I understand it, it runs one parser pretty much like the standard LALR(1) case, until it finds an ambiguity (shift/reduce or reduce/reduce), at which stage it clones the parser to take parallel paths, working in lockstep, and storing up semantic actions. When one of the clones encounters an error it goes away, and the surviving clone takes its stored semantic actions.
> If that's true, then probably the only significant performance hit is in cases of ambiguity, and we would only have a handful of those, each lasting for one token, so the performance hit should be very small. We'd have to test it, of course ;-)

Yeah, I just read the same in the bison manual. The thing that's bothering me is that a GLR parser would hide that ambiguity from you, and thus changes in the grammar might cause us to incur performance hits without realizing it. The manual indicates that the performance is pretty awful whenever an ambiguity does occur.

regards, tom lane
Re: [HACKERS] Updatable views/with check option parsing
Tom Lane wrote:
> Yeah, I just read the same in the bison manual. The thing that's bothering me is that a GLR parser would hide that ambiguity from you,

It doesn't really hide it. You still get the "N shift/reduce conflicts" warnings from bison. You just know that they are being handled.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Updatable views/with check option parsing
Peter Eisentraut [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Yeah, I just read the same in the bison manual. The thing that's bothering me is that a GLR parser would hide that ambiguity from you,
> It doesn't really hide it. You still get the "N shift/reduce conflicts" warnings from bison. You just know that they are being handled.

Well, that has the same problem that we've raised every other time someone has said "I don't want to fix the grammar to not have any conflicts". If bison only tells you there were N conflicts, how do you know these are the same N conflicts you had yesterday? In a grammar that we hack around as much as we do with Postgres, I really don't think that's acceptable.

I think that by far the most reliable solution is to put back the filter yylex function that I removed a couple months ago:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/parser.c.diff?r1=1.64&r2=1.65

We can use the same technique that we used for UNION JOIN, but instead join, say, WITH and TIME into one token and make the time datatype productions look for TIME WITHTIME ZONE and so on. (I propose this rather than putting the ugliness into WITH CHECK OPTION, because this way we should only need one merged token and thus only one case to check in the filter function; AFAICS we'd need three cases if we merge tokens on that end of it.)

I'm not sure we can just revert the above-mentioned patch, because it had some interactions with a later patch to use %option prefix. Shouldn't be too hard to fix though. I'll put together a proposed patch.

regards, tom lane
[HACKERS] pg_proc probin misuse
Hi,

In PL/Py, I had the bright idea of storing bytecode in the probin field of the function's pg_proc row. However, this idea has lately become rather dim, as I have recently rediscovered (thanks Adrian) that this breaks dumps; pg_dump outputs a PL/Py function as CREATE FUNCTION x() RETURNS y LANGUAGE python AS 'bytecode', 'source'. Of course, when loading this, it fails: 'ERROR: only one AS item needed for language python'.

So is this "fix your broken PL", or "pg_dump should only be doing that for C language functions"? I imagine the former, so if that is the case, perhaps the 'probin' column description at [1] should be reworded to ensure others don't get the same bright idea (the language-specific part in particular). Ugh, even if it were the latter, I would still be breaking existing versions, so I'm inclined to fix it regardless...

Have a good evening (afternoon, morning, etc :).

[1] http://www.postgresql.org/docs/8.1/static/catalog-pg-proc.html
[Yeah, I do see the clarification at the bottom of the page. :( ]
Re: [HACKERS] Updatable views/with check option parsing
I wrote:
> We can use the same technique that we used for UNION JOIN, but instead
> join, say, WITH and TIME into one token and make the time datatype
> productions look for TIME WITHTIME ZONE and so on.  (I propose this
> rather than putting the ugliness into WITH CHECK OPTION, because this
> way we should only need one merged token and thus only one case to
> check in the filter function; AFAICS we'd need three cases if we merge
> tokens on that end of it.)

On investigation that turns out to have been a bad idea: if we do it that way, it becomes necessary to promote WITH to a fully reserved word.  The counterexample is

	CREATE VIEW v AS SELECT * FROM foo WITH ...

Is WITH an alias for foo (with no AS), or is it the start of a WITH CHECK OPTION?  No way to tell without lookahead.  While I don't think that making WITH a fully reserved word would cause any great damage, I'm unwilling to do it just to save a couple of lines of code.

Accordingly, I propose the attached patch.  This reinstates the filter yylex function formerly used for UNION JOIN (in a slightly cleaner fashion than it was previously done) and parses WITH CHECK OPTION without any bison complaints, and with no new reserved words.  If no objections, I'll go ahead and apply this, and Peter can get on with making the stub productions do something useful.

			regards, tom lane

Index: src/backend/parser/Makefile
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/Makefile,v
retrieving revision 1.43
diff -c -r1.43 Makefile
*** src/backend/parser/Makefile	7 Mar 2006 01:00:17 -	1.43
--- src/backend/parser/Makefile	27 May 2006 02:39:51 -
***************
*** 57,63 ****
  
  # Force these dependencies to be known even without dependency info built:
  
! gram.o keywords.o: $(srcdir)/parse.h
  
  # gram.c, parse.h, and scan.c are in the distribution tarball, so they
--- 57,63 ----
  
  # Force these dependencies to be known even without dependency info built:
  
! gram.o keywords.o parser.o: $(srcdir)/parse.h
  
  # gram.c, parse.h, and scan.c are in the distribution tarball, so they

Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.544
diff -c -r2.544 gram.y
*** src/backend/parser/gram.y	30 Apr 2006 18:30:39 -	2.544
--- src/backend/parser/gram.y	27 May 2006 02:39:52 -
***************
*** 70,75 ****
--- 70,81 ----
  			(Current) = (Rhs)[0]; \
  		} while (0)
  
+ /*
+  * The %name-prefix option below will make bison call base_yylex, but we
+  * really want it to call filtered_base_yylex (see parser.c).
+  */
+ #define base_yylex filtered_base_yylex
+ 
  extern List *parsetree;		/* final parse result is delivered here */
  
  static bool QueryIsRule = FALSE;
***************
*** 339,344 ****
--- 345,351 ----
  %type <list>	constraints_set_list
  %type <boolean>	constraints_set_mode
  %type <str>		OptTableSpace OptConsTableSpace OptTableSpaceOwner
+ %type <list>	opt_check_option
  
  /*
***************
*** 356,362 ****
  	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT BOOLEAN_P BOTH BY
! 	CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
  	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER
  	COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONNECTION
  	CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
--- 363,369 ----
  	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT BOOLEAN_P BOTH BY
! 	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
  	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER
  	COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONNECTION
  	CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
***************
*** 431,436 ****
--- 438,449 ----
  	ZONE
  
+ /* The grammar thinks these are keywords, but they are not in the keywords.c
+  * list and so can never be entered directly.  The filter in parser.c
+  * creates these tokens when required.
+  */
+ %token			WITH_CASCADED WITH_LOCAL WITH_CHECK
+ 
  /* Special token types, not actually keywords - see the lex file */
  %token <str>	IDENT FCONST SCONST BCONST XCONST Op
  %token <ival>	ICONST PARAM
***************
*** 4618,4629 ****
  
  /*
   *
   *	QUERY:
!  *		CREATE [ OR REPLACE ] [ TEMP ] VIEW <viewname> '('target-list ')' AS <query>
   *
   */
  ViewStmt:	CREATE OptTemp VIEW qualified_name opt_column_list
! 				AS SelectStmt
  				{
  					ViewStmt *n = makeNode(ViewStmt);
Re: [HACKERS] pg_proc probin misuse
James William Pye <[EMAIL PROTECTED]> writes:
> In PL/Py, I had the bright idea of storing bytecode in the probin field
> of the function's pg_proc row.  However, this idea has lately become
> rather dim, as I have recently rediscovered (thanks Adrian) that this
> breaks dumps; pg_dump outputs a PL/Py function as CREATE FUNCTION x()
> RETURNS y LANGUAGE python AS 'bytecode', 'source'.  Of course, when
> loading this, it fails: 'ERROR: only one AS item needed for language
> python'.  So is this a "fix your broken PL" situation, or should
> pg_dump only be doing that for C language functions?

Offhand it seems to me that pg_dump is behaving reasonably: it's storing probin if it sees something there to be stored.  The asymmetry is in the backend, specifically functioncmds.c's interpret_AS_clause(): it has a hardwired assumption that probin is only relevant to C functions.

Feel free to propose a saner definition.  AFAICS the current coding makes probin useless for all except C functions, so I think it could be improved.

			regards, tom lane