Re: [HACKERS] invalidly encoded strings
On Tue, Sep 11, 2007 at 11:27:50AM +0900, Tatsuo Ishii wrote: SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp); Without using convert(), he will get random order of data. This is because Kanji characters are in random order in UTF-8, while Kanji characters are reasonably ordered in EUC_JP. The usual way to approach this is to make convert return bytea instead of text. Then your problems vanish. Bytea can still be sorted, but it won't be treated as a text string and thus does not need to conform to the requirements of a text string. Languages like perl distinguish between encode which is text-bytea and decode which is bytea-text. We've got convert for oth and that causes problems. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] invalidly encoded strings
On Tue, 2007-09-11 at 14:50 +0900, Tatsuo Ishii wrote: On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR: invalid byte sequence for encoding UTF8: 0xbd I know this could be a problem (like chr() with invalid byte pattern). What I really want to know is, read query something like this: SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp); I guess I don't quite understand the question. I agree that ORDER BY convert() must be safe in the C locale, because it just passes the strings to strcmp(). Are you saying that we should not remove convert() until we can support multiple locales in one database? If we make convert() operate on bytea and return bytea, as Tom suggested, would that solve your use case? Regards, Jeff Davis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] invalidly encoded strings
Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR: invalid byte sequence for encoding UTF8: 0xbd I know this could be a problem (like chr() with invalid byte pattern). And that's enough of a problem already. We don't need more problems. What I really want to know is, read query something like this: SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp); could be a problem (I assume we use C locale). If convert() produce a sequence of bytes that can't be interpreted as a string in the server encoding then it's broken. Imho convert() should return a bytea value. If we hade good encoding/charset support we could do better, but we can't today. The above example would work fine if convert() returned a bytea. In the C locale the string would be compared byte for byte and that's what you get with bytea values as well. Strings are not sequences of bytes that can be interpreted in different ways. That's what bytea values are. Strings are in a specific encoding always, and in pg that encoding is fixed to a single one for a whole cluster at initdb time. We should not confuse text with bytea. /Dennis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] invalidly encoded strings
On Tue, 2007-09-11 at 14:50 +0900, Tatsuo Ishii wrote: On Tue, 2007-09-11 at 12:29 +0900, Tatsuo Ishii wrote: Please show me concrete examples how I could introduce a vulnerability using this kind of convert() usage. Try the sequence below. Then, try to dump and then reload the database. When you try to reload it, you will get an error: ERROR: invalid byte sequence for encoding UTF8: 0xbd I know this could be a problem (like chr() with invalid byte pattern). What I really want to know is, read query something like this: SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp); I guess I don't quite understand the question. I agree that ORDER BY convert() must be safe in the C locale, because it just passes the strings to strcmp(). Are you saying that we should not remove convert() until we can support multiple locales in one database? If we make convert() operate on bytea and return bytea, as Tom suggested, would that solve your use case? The problem is, the above use case is just one of what I can think of. Another use case is, something like this: SELECT sum(octet_length(convert(text_column using utf8_to_euc_jp))) FROM mytable; to know the total byte length of text column if it's encoded in EUC_JP. So I'm not sure we could change convert() returning bytea without complaing from users... -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Ts_rank internals
On Tue, 11 Sep 2007, Teodor Sigaev wrote: I tried to understand how ts_rank works, but I failed. What does Cover function do? How does it work? What is the DocRepresentation data structure like? I can see the definition of the struct, and the get_docrep function to convert to that format, but by reading those I can't figure out what the resulting DocRepresentation looks like. I wonder if we could get rid of the istrue flag in QueryOperand, and use a local BitmapSet variable instead? It seems wrong to have a temporary flag that's only used in one function, in a struct that's used everywhere. It's a play around CDR algorithms (Cover Density Ranking). Based on paper Clarke et al., Relevance Ranking for One to Three Term Queries. (http://citeseer.ist.psu.edu/clarke00relevance.html. Sorry, I lost the article itself, but may be Oleg has it. Simple and short description is placed at http://www2002.org/CDROM/refereed/643/node7.html. We change original algorithm to support weight of lexeme, details are on Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking Actually, we used two papers http://citeseer.ist.psu.edu/clarke00relevance.html and http://portal.acm.org/ft_gateway.cfm?id=333137type=pdfdl=GUIDEdl=ACM I can send you the latter if you have no access to the ACM. Array of DocRepresentation is a representation of document, it contains only lexemes from both tsvector and tsquery, and lexemes are ordered by position - as in original doc. Each DocRepresentation has links to corresponding QueryOperand to optimize query execution while extent search. When we enlarge current extent for one word then we set istrue flag for corresponding QueryOperand and execution tsquery from cover becomes very simple task. It's possible to eliminate istrue flag, but it's needed to implement algorithm to execute tsquery over continuos part of document, not over whole document. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] invalidly encoded strings
Andrew Dunstan wrote: Instead of the code point, I'd prefer the actual encoding of the character as argument to chr() and return value of ascii(). And frankly, I don't know how to do it sanely anyway. A character encoding has a fixed byte pattern, but a given byte pattern doesn't have a single universal number value. I really don't think we want to have the value of chr(n) depend on the endianness of the machine, do we? The reason we are prepared to make an exception for Unicode is precisely because the code point maps to an encoding pattern independently of architecture, ISTM. Point taken. I only wanted to make sure that there are good reasons to differ from Oracle. Oracle's chr() is big-endian on all platforms, BTW. Yours, Laurenz Albe ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Per-function search_path = per-function GUC settings
Tom Lane wrote: I thought about ways to include GUC settings directly into CREATE FUNCTION, but it seemed pretty ugly and inconsistent with the existing syntax. So I'm thinking of supporting only the above syntaxes, meaning it'll take at least two commands to create a secure SECURITY DEFINER function. Comments? I have a question about what does happen if search path is not defined for SECURITY DEFINER function. My expectation is that SECURITY DEFINER function should defined empty search patch in this case. This behavior is similar to how dynamic linker processes setuid binaries - (ignoring LD_LIBRARY_PATH and so on). Zdenek ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
I've completed a review of all of the LWlocking in the backends. This is documented in the enclosed file. I would propose that we use this as comments in lwlock.h or in the README, if people agree. A number of points emerge from that analysis: 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. After analysis, I am still concerned because subxact abort could now be starved out by large number of shared holders, then when it is acquired we may experience starvation of shared requestors, as described in point (4) here: http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php I no longer want to solve it in the way described there, but have a solution described in a separate post on -hackers. The original solution still seems valid, but if we can solve it another way we should. 2. CountActiveBackends() searches the whole of the proc array, even though it could stop when it gets to commit_siblings. Stopping once the heuristic has been determined seems like the best thing to do. A small patch to implement this is attached. 3. ReceiveSharedInvalidMessages() takes a Shared lock on SInvalLock, then takes an Exclusive lock later in the same routine to perform SIDelExpiredDataEntries(). The latter routine examines data that it hasn't touched to see if it can delete anything. If it finds anything other than its own consumed message it will only be because it beat another backend in the race to delete a message it just consumed. So most callers of SIDelExpiredDataEntries() will do nothing at all, after having queued for an X lock. I can't see the sense in that, but maybe there is some deeper purpose? ISTM that we should only attempt to clean the queue when it fills, during SIInsertDataEntry(), which it already does. We want to avoid continually re-triggering postmaster signals, but we should do that anyway with a yes-I-already-did-that flag, rather than by eager cleaning of the queue, which just defers a postmaster signal storm, but does not prevent it. 4. WALWriteLock is acquired in Shared mode by bgwriter when it runs GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the Shared request will queue like everybody else. WALWriteLock queue length can be long, so the bgwriter can get stuck for much longer than bgwriter_delay when it makes this call; this happens only when archive_timeout 0 so probably has never shown up in any performance testing. XLogWrite takes info_lck also, so we can move the lastSegSwitchTime behind that lock instead. That way bgwriter need never wait on I/O, just spin for access to info_lck. Minor change. 5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(), but I can't find a caller of that anywhere in core or contrib. Can those now be removed? 6. David Strong talked about doing some testing to see if NUM_BUFFER_PARTITIONS should be increased above 16. We don't have any further information on that. Should we increase the value to 32 or 64? A minor increase seems safe and should provide the most gain without decreasing performance for lower numbers of CPUs. 7. VACUUM has many contention points within it, so HOT should avoid the annoyance of having to run VACUUM repeatedly on very small heavily-updated tables. I haven't further analysed the SLRU locks, since nothing much has changed there recently and they were already pretty efficient, IIRC. I'm working on patches for 1-4. We've moved far in recent weeks, so it seems like we should finish the job. Comments? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com BufFreelistLock, /* X - all - each time we allocate a new buffer for data block I/O * Never held across I/O */ ShmemIndexLock, /* X - all - create/attach to shared memory * Never held across I/O */ OidGenLock, /* X - all - each GetNewOid() and each GetNewRelFileNode() * S - bgwriter - acquired during checkpoint * Writes WAL record every 8192 OIDs, so vanishing chance * of being held across I/O */ XidGenLock, /* X - all - for each GetNewTransactionId() * check whether we need to call ExtendClog or ExtendSubtrans * could be held across I/O if clog or subtrans buffers * have a dirty LRU page * S - all - for each ReadNewTransactionId() *5 * called by GetNextXidAndEpoch(), * once per VACUUM of each relation
Re: [HACKERS] CSStorm occurred again by postgreSQL8.2
On Wed, 2006-09-13 at 21:45 -0400, Tom Lane wrote: Anyway, given that there's this one nonobvious gotcha, there might be others. My recommendation is that we take this off the open-items list for 8.2 and revisit it in the 8.3 cycle when there's more time. Well, its still 8.3 just... As discussed in the other thread Final Thoughts for 8.3 on LWLocking and Scalability, XidCacheRemoveRunningXids() is now the only holder of an X lock during normal processing, so I would like to remove it. Here's how: Currently, we take the lock, remove the subxact and then shuffle down all the other subxactIds so that the subxact cache is contiguous. I propose that we simply zero out the subxact entry without re-arranging the cache; this will be atomic, so we need not acquire an X lock. We then increment ndeletedxids. When we enter a new subxact into the cache, if ndeletedxids 0 we scan the cache to find an InvalidTransactionId that we can use, then decrement ndeletedxids. So ndeletedxids is just a hint, not an absolute requirement. nxids then becomes the number of cache entries and never goes down until EOXact. The subxact cache is no longer in order, but then it doesn't need to be either. When we take a snapshot we will end up taking a copy of zeroed cache entries, so the snapshots will be slightly larger than previously. Though still no larger than the max. The size reduction was not so large as to make a significant difference across the whole array, so scalability is the main issue to resolve. The snapshots will be valid with no change, since InvalidTransactionId will never match against any recorded Xid. I would also like to make the size of the subxact cache configurable with a parameter such as subtransaction_cache_size = 64 (default), valid range 4-256. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
On 9/11/07, Simon Riggs [EMAIL PROTECTED] wrote: 5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(), but I can't find a caller of that anywhere in core or contrib. Can those now be removed? GetNextXidAndEpoch() is needed for external modules to use 8-byte transaction ids user-level. Used by txid module in Skytools/pgq. Please keep it. -- marko ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
On Tue, 2007-09-11 at 13:31 +0300, Marko Kreen wrote: On 9/11/07, Simon Riggs [EMAIL PROTECTED] wrote: 5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(), but I can't find a caller of that anywhere in core or contrib. Can those now be removed? GetNextXidAndEpoch() is needed for external modules to use 8-byte transaction ids user-level. Used by txid module in Skytools/pgq. Please keep it. Guessed it might be you folks that needed it. I'll comment that, so we don't forget it again. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] pgcrypto related backend crash on solaris 10/x86_64
Marko Kreen wrote: On 9/9/07, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote: I brought back clownfish(still a bit dubious about the unexplained failures which seem vmware emulation bugs but this one seems to be easily reproduceable) onto the buildfarm and enabled --with-openssl after the the recent openssl/pgcrypto related fixes but I'm still getting a backend crash during the pgcrypto regression tests: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfishdt=2007-09-09%2012:14:50 backtrace looks like: program terminated by signal SEGV (no mapping at the fault address) 0xfd7fff241b61: AES_encrypt+0x0241: xorq (%r15,%rdx,8),%rbx (dbx) where =[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49, 0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfd7fff241b61 [2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0 This is crashing because of the crippled OpenSSL on some version of Solaris. Zdenek Kotala posted a workaround for that, I am cleaning it but have not found the time to finalize it. I'll try to post v03 of Zdenek's patch ASAP. However, I guess there still will be a problem with regression tests, because pg_crypto will reports error in case when user tries to use stronger cipher, but it generates diff between expected and real output. I don't know if is possible select different output based on test if strong crypto is installed or not. Maybe some magic in Makefile/Configure. Test should be: # ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra # libcrypto_extra.so.0.9.8 = (file not found) if output contains (file not found) library is not installed or not in path (/usr/sfw/lib). Zdenek ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] What is happening on buildfarm member dugong
On Tue, 11 Sep 2007, Tom Lane wrote: dugong has been failing contribcheck repeatably for the last day or so, with a very interesting symptom: CREATE DATABASE is failing with The reason for that is that I've been trying to switch from 9.1 to 10.0 version of the ICC compiler. A month ago, I've tried for the first time, discovered a segfault due the bug in ICC, submitted it to Intel. Lately Intel fixed it in icc 10.0.026. And in last several days I tried to make the new version work with postgres. From the first impression it worked, so I upgraded the compiler for the buildfarm, but the buildfarm failed. Few notes: 1) without the --enable-cassert everything works 2) with --enable-cassert it, the only thing that fails in the tests is contrib-installcheck... 3) And recently I tried to compile PG also with -O0 flag, it actually worked. 4) Also, just now I tried to compile PG 8.2.4 and the same problem occurs. So for me the most probable explanation is the ICC bug, but unfortunately since it is not a pure segfault, it is a bit hard for me to tackle... So, I can either completely switch back to 9.1 and forget it, or we can try to find or at least localize this bug(if it is ICC fault). But to do that, I need some advices/help, how to do it better... regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007, Tom Lane wrote: dugong has been failing contribcheck repeatably for the last day or so, with a very interesting symptom: CREATE DATABASE is failing with The reason for that is that I've been trying to switch from 9.1 to 10.0 version of the ICC compiler. Hah, interesting. Few notes: 1) without the --enable-cassert everything works 2) with --enable-cassert it, the only thing that fails in the tests is contrib-installcheck... 3) And recently I tried to compile PG also with -O0 flag, it actually worked. 4) Also, just now I tried to compile PG 8.2.4 and the same problem occurs. So, I can either completely switch back to 9.1 and forget it, or we can try to find or at least localize this bug(if it is ICC fault). But to do that, I need some advices/help, how to do it better... Well, the first thing I'd suggest is trying to localize which Assert makes it fail. From the bug's behavior I think it is highly probable that the problem is in fsync signalling, which puts it either in bgwriter.c or md.c. Try recompiling those modules separately without cassert (leaving all else enabled) and see if the problem comes and goes; if so, comment out one Assert at a time till you find which one. Actually ... another possibility is that it's not directly an Assert, but CLOBBER_FREED_MEMORY that exposes the bug. (This would suggest that the compiler is trying to re-order memory accesses around a pfree.) So before you get into the one-assert-at-a-time test, try with --enable-cassert but modify pg_config_manual.h to not define CLOBBER_FREED_MEMORY. This could be a compiler bug, or it could be our fault --- might need a volatile on some pointer or other, for example, to prevent the compiler from making an otherwise legitimate assumption. So it seems worth chasing it down. BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing? I see that configure tries to feed that switch to it, but it might want some other spelling. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CSStorm occurred again by postgreSQL8.2
Simon Riggs [EMAIL PROTECTED] writes: As discussed in the other thread Final Thoughts for 8.3 on LWLocking and Scalability, XidCacheRemoveRunningXids() is now the only holder of an X lock during normal processing, Nonsense. Main transaction exit also takes an exclusive lock, and is far more likely to be exercised in typical workloads than a subtransaction abort. In any case: there has still not been any evidence presented by anyone that optimizing XidCacheRemoveRunningXids will help one bit. Given the difficulty of measuring any benefit from the last couple of optimizations in this general area, I'm thinking that such evidence will be hard to come by. And we have got way more than enough on our plates already. Can we let go of this for 8.3, please? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Per-function search_path = per-function GUC settings
Zdenek Kotala [EMAIL PROTECTED] writes: I have a question about what does happen if search path is not defined for SECURITY DEFINER function. My expectation is that SECURITY DEFINER function should defined empty search patch in this case. Your expectation is incorrect. We are not in the business of breaking every application in sight, which is what that would do. Nor do I think it's a good plan to try to be smarter than the programmer. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] What is happening on buildfarm member dugong
This could be a compiler bug, or it could be our fault --- might need a volatile on some pointer or other, for example, to prevent the compiler from making an otherwise legitimate assumption. So it seems worth chasing it down. Tom, Thank you for the directions, I'll try to do what you recommended. BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing? I see that configure tries to feed that switch to it, but it might want some other spelling. Apparently in none of the ICC manuals -fno-strict-aliasing is described, but ICC accepts such flag, and produce the same code as with '-fno-alias' flag (described in ICC manuals). regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
Simon Riggs [EMAIL PROTECTED] writes: 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. It would be brilliant if it were true, but it isn't. Better look again. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pgcrypto related backend crash on solaris 10/x86_64
On 9/11/07, Zdenek Kotala [EMAIL PROTECTED] wrote: Marko Kreen wrote: This is crashing because of the crippled OpenSSL on some version of Solaris. Zdenek Kotala posted a workaround for that, I am cleaning it but have not found the time to finalize it. I'll try to post v03 of Zdenek's patch ASAP. However, I guess there still will be a problem with regression tests, because pg_crypto will reports error in case when user tries to use stronger cipher, but it generates diff between expected and real output. I don't know if is possible select different output based on test if strong crypto is installed or not. Maybe some magic in Makefile/Configure. Test should be: # ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra # libcrypto_extra.so.0.9.8 = (file not found) if output contains (file not found) library is not installed or not in path (/usr/sfw/lib). Failing regression tests are fine - it is good if user can easily see that the os is broken. -- marko ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] pg_dump and money type
Since the money type has a locale dependent input and output format, there has to be some context saved when a database dump is created. For example, if your environment uses a locale that uses the opposite point-vs-comma conventions from English (e.g., de_DE), then the following will fail to replicate the regression test database: pg_dump regression | psql foo The database regression has lc_monetary = C set, so this will produce C output piped into, say, de_DE input. The first problem appears to be that pg_dump --create ought to save the database-specific configuration settings. pg_dumpall gets this right. But secondly, lc_monetary ought to be saved at the top of the dump file, much like client_encoding. Unfortunately, that would probably break portability of dump files between different operating systems. Perhaps we can get away with fixing --create and documenting this. But something ought to be done about this; otherwise using the money type introduces a risk of breaking backup or upgrade procedures. Comments? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
On Tue, 2007-09-11 at 10:21 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. It would be brilliant if it were true, but it isn't. Better look again. On the more detailed explanation, I say in normal operation. My analytical notes attached to the original post show ProcArrayLock is acquired exclusively during backend start, exit and while making a prepared (twophase) commit. So yes, it is locked Exclusively in other places, but they happen rarely and they actually add/remove procs from the array, so its unlikely anything can change there anyhow. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing? I see that configure tries to feed that switch to it, but it might want some other spelling. Apparently in none of the ICC manuals -fno-strict-aliasing is described, but ICC accepts such flag, and produce the same code as with '-fno-alias' flag (described in ICC manuals). Well, since configure.in has a separate code path for ICC anyway, it seems like we might as well provide it the official spelling. Any objections to a patch like this? if test $GCC = yes -a $ICC = no; then CFLAGS=$CFLAGS -Wall -Wmissing-prototypes -Wpointer-arith -Winline # These work in some but not all gcc versions PGAC_PROG_CC_CFLAGS_OPT([-Wdeclaration-after-statement]) PGAC_PROG_CC_CFLAGS_OPT([-Wendif-labels]) # Disable strict-aliasing rules; needed for gcc 3.3+ PGAC_PROG_CC_CFLAGS_OPT([-fno-strict-aliasing]) elif test $ICC = yes; then # Intel's compiler has a bug/misoptimization in checking for # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS. PGAC_PROG_CC_CFLAGS_OPT([-mp1]) - # Not clear if this is needed, but seems like a good idea - PGAC_PROG_CC_CFLAGS_OPT([-fno-strict-aliasing]) + # ICC prefers to spell the no-strict-aliasing switch like this + PGAC_PROG_CC_CFLAGS_OPT([-fno-alias]) elif test x${CC} = xxlc; then # AIX xlc has to have strict aliasing turned off too PGAC_PROG_CC_CFLAGS_OPT([-qnoansialias]) fi regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Per-function search_path = per-function GUC settings
Tom Lane wrote: Zdenek Kotala [EMAIL PROTECTED] writes: I have a question about what does happen if search path is not defined for SECURITY DEFINER function. My expectation is that SECURITY DEFINER function should defined empty search patch in this case. Your expectation is incorrect. We are not in the business of breaking every application in sight, which is what that would do. Oh. I see. In this point of view I suggest to add some warning about potential security issue if SECURITY DEFINER function will create without preset search_path. I'm aware that a lot of developer forget to modify their application. Zdenek ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] pgcrypto related backend crash on solaris 10/x86_64
Marko Kreen wrote: On 9/11/07, Zdenek Kotala [EMAIL PROTECTED] wrote: Marko Kreen wrote: This is crashing because of the crippled OpenSSL on some version of Solaris. Zdenek Kotala posted a workaround for that, I am cleaning it but have not found the time to finalize it. I'll try to post v03 of Zdenek's patch ASAP. However, I guess there still will be a problem with regression tests, because pg_crypto will reports error in case when user tries to use stronger cipher, but it generates diff between expected and real output. I don't know if is possible select different output based on test if strong crypto is installed or not. Maybe some magic in Makefile/Configure. Test should be: # ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra # libcrypto_extra.so.0.9.8 = (file not found) if output contains (file not found) library is not installed or not in path (/usr/sfw/lib). Failing regression tests are fine - it is good if user can easily see that the os is broken. But if build machine still complain about problem we can easily overlook another problems. There are two possible solution 1) modify reg test or 2) recommend to install crypto package on all affected build machine. Anyway I plan to add some mention into solaris FAQ when we will have final patch. I also think It should be good to mention in pg_crypto README or add comment into regression test expected output file which will be visible in regression.diff. Zdenek ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] CSStorm occurred again by postgreSQL8.2
On Tue, 2007-09-11 at 09:58 -0400, Tom Lane wrote: Can we let go of this for 8.3, please? OK, we've moved forward, so its a good place to break. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] What is happening on buildfarm member dugong
Well, the first thing I'd suggest is trying to localize which Assert makes it fail. From the bug's behavior I think it is highly probable that the problem is in fsync signalling, which puts it either in bgwriter.c or md.c. Try recompiling those modules separately without cassert (leaving all else enabled) and see if the problem comes and goes; if so, comment out one Assert at a time till you find which one. Actually ... another possibility is that it's not directly an Assert, but CLOBBER_FREED_MEMORY that exposes the bug. (This would suggest that the compiler is trying to re-order memory accesses around a pfree.) So before you get into the one-assert-at-a-time test, try with --enable-cassert but modify pg_config_manual.h to not define CLOBBER_FREED_MEMORY. It seems that neither undefining CLOBBER_FREED_MEMORY, nor disabling casserting for md.c and bgwriter.c helps The contrib-installcheck still fails. I disabled casserting for md.c and bgwriter.c by inserting #undef USE_ASSERT_CHECKING in the top of the md.c and bgwriter.c exactly after the inclusion of postgres.h, but before other includes. (I think it is the right way to do it) My configure flags: ./configure --enable-cassert --enable-depend --enable-debug --enable-nls --enable-integer-datetimes --with-libxml LDFLAGS='-lirc -limf' --enable-depend --prefix=/home/math/cvs/install/ CC=ic Regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] What is happening on buildfarm member dugong
Actually, in the log file I also see some messages about has_seq_search: КОМАНДА: CREATE DATABASE contrib_regression TEMPLATE=template0 NOTICE: database contrib_regression does not exist, skipping ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: could not fsync segment 0 of relation 1663/16384/2617: No such file or directory ERROR: checkpoint request failed I also tried to turn off asserting for dynahash.c, but it didn't help... regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: Actually, in the log file I also see some messages about has_seq_search: =EB=EF=ED=E1=EE=E4=E1: CREATE DATABASE contrib_regression TEMPLATE=3Dtem= plate0 NOTICE: database contrib_regression does not exist, skipping ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: could not fsync segment 0 of relation 1663/16384/2617: No such file= or directory ERROR: checkpoint request failed That could be a consequent effect I think --- bgwriter is lacking an AtEOXact_HashTables call in error recovery (something I will go fix) and so after enough fsync errors we'd start getting these. Anyway it seems we need to cast the net a bit wider for where the troublesome Assert is. I'd suggest rebuilding the whole system with --enable-cassert, then comment out the USE_ASSERT_CHECKING #define in pg_config.h, and make clean/make in one backend subdirectory at a time till you see where it stops failing. Then repeat at the file level. Divide and conquer... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] invalidly encoded strings
On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: The reason we have a problem here is that we've been choosing convenience over safety in encoding-related issues. I wonder if we must stoop to having a strict_encoding_checks GUC variable to satisfy everyone. That would be satisfactory to me. However, I'm sure some will cringe at a MySQL-like configurable integrity option. Might it be better as an initdb-time option? I can't think why anyone would want to change it later. It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII. I prefer this option. It's consistent with the docs, which say: The SQL_ASCII setting behaves considerably differently from the other settings, and also One way to use multiple encodings safely is to set the locale to C or POSIX during initdb, thus disabling any real locale awareness. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: Actually, in the log file I also see some messages about has_seq_search: ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans BTW, I just made a commit to include the hash table name in this message. Could you update src/backend/utils/hash/dynahash.c and retry the test? I suspect it'll say the bgwriter's pending-ops table, but we should verify that. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Per-function search_path = per-function GUC settings
Am Dienstag, 11. September 2007 15:53 schrieb Tom Lane: Zdenek Kotala [EMAIL PROTECTED] writes: I have a question about what does happen if search path is not defined for SECURITY DEFINER function. My expectation is that SECURITY DEFINER function should defined empty search patch in this case. Your expectation is incorrect. We are not in the business of breaking every application in sight, which is what that would do. Well, a SECURITY DEFINER function either sets its own search path, in which case a default search path would have no effect, or it doesn't set its own search path, in which case it's already broken (albeit in a different way). So setting a default search path can only be a net gain. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] What is happening on buildfarm member dugong
It seems to me last run (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugongdt=2007-09-11%2016:05:01) points to problem with hash implementation. *** ./expected/tsdicts.out Tue Sep 11 20:05:23 2007 --- ./results/tsdicts.out Tue Sep 11 20:18:38 2007 *** *** 301,306 --- 301,307 lword, lpart_hword, lhword WITH synonym, thesaurus, english_stem; SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one'); + NOTICE: thesaurus word-sample the is recognized as stop-word, assign any stop-word (rule 8) At this place of tsdicts test dictionary thesaurus should be already loaded and initialized, but this NOTICE points that thesaurus was initialized here. ERROR: too many active hash_seq_search scans -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] What is happening on buildfarm member dugong
Teodor Sigaev [EMAIL PROTECTED] writes: It seems to me last run (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugongdt=2007-09-11%2016:05:01) points to problem with hash implementation. dynahash.c is used all over the system, though. If it were broken by a compiler issue, it's hard to credit that we'd be getting through all but one or two regression tests ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] What is happening on buildfarm member dugong
On Tue, 11 Sep 2007, Tom Lane wrote: NOTICE: database contrib_regression does not exist, skipping ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans With hash_seq_search ERROR, it was partially a false alarm. I've had some old postgres daemon hanging around and writing that to the log. Although I remember seeing that hash_seq_search message recently when dealing with this bug, it does not show up in the course of standard regression tests. regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Per-function search_path = per-function GUC settings
Peter Eisentraut [EMAIL PROTECTED] writes: Well, a SECURITY DEFINER function either sets its own search path, in which case a default search path would have no effect, or it doesn't set its own search path, in which case it's already broken (albeit in a different way). So setting a default search path can only be a net gain. It would break functions that actually want to use a caller-specified search path, and protect themselves by explicitly schema-qualifying every other reference than one to some caller-specified object. Which admittedly is notationally a pain in the neck, but it's possible to do. I do not think that we should foreclose potentially useful behavior *and* make a major break in backward compatibility in order to make a very small improvement in security. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007, Tom Lane wrote: NOTICE: database contrib_regression does not exist, skipping ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans With hash_seq_search ERROR, it was partially a false alarm. I've had some old postgres daemon hanging around and writing that to the log. Although I remember seeing that hash_seq_search message recently when dealing with this bug, it does not show up in the course of standard regression tests. Yeah, it's not there on your buildfarm reports, but that's not totally surprising. I would expect it to start showing up after 100 failed checkpoint attempts, which is how long it'd take the bgwriter's hash_seq_search table to overflow ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] What is happening on buildfarm member dugong
On Tue, 11 Sep 2007, Tom Lane wrote: Sergey E. Koposov [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007, Tom Lane wrote: NOTICE: database contrib_regression does not exist, skipping ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans ERROR: too many active hash_seq_search scans With hash_seq_search ERROR, it was partially a false alarm. I've had some old postgres daemon hanging around and writing that to the log. Although I remember seeing that hash_seq_search message recently when dealing with this bug, it does not show up in the course of standard regression tests. Yeah, it's not there on your buildfarm reports, but that's not totally surprising. I would expect it to start showing up after 100 failed checkpoint attempts, which is how long it'd take the bgwriter's hash_seq_search table to overflow ... Yes, indeed. After several make installcheck's I get ERROR: too many active hash_seq_search scans, cannot start one on smgr relation table ERROR: too many active hash_seq_search scans, cannot start one on smgr relation table Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov [EMAIL PROTECTED] writes: Yes, indeed. After several make installcheck's I get ERROR: too many active hash_seq_search scans, cannot start one on smgr relation table ERROR: too many active hash_seq_search scans, cannot start one on smgr relation table Hm, so that must be coming from smgrcloseall(), which is the only user of hash_seq_search on SMgrRelationHash. I bet that's popping up once a second and the bgwriter is getting nothing done, because it's failing again at the bottom of error recovery :-(. It's a good thing you happened to notice those messages, because this is a pretty bad bug. Anyway, I've committed a fix for that, so we can get back to the main question, which is why you're getting the fsync error in the first place. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
Simon Riggs wrote: On Tue, 2007-09-11 at 10:21 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. It would be brilliant if it were true, but it isn't. Better look again. On the more detailed explanation, I say in normal operation. My analytical notes attached to the original post show ProcArrayLock is acquired exclusively during backend start, exit and while making a prepared (twophase) commit. So yes, it is locked Exclusively in other places, but they happen rarely and they actually add/remove procs from the array, so its unlikely anything can change there anyhow. Well, and during normal during COMMIT and ABORT, which might happen rather frequently ;-) I do agree, however, that XidCacheRemoveRunningXids() is the only site left where getting rid of it might be possible, and might bring measurable benefit for some workloads. With more effort, we might not even need it during ABORT, but I doubt that the effort would be worth it. While some (plpgsql intensive) workloads might abort subxacts rather frequently, I doubt that same holds true for toplevel aborts. I'm actually working on a patch to remove that lock from XidCacheRemoveRunningXids(), but I'm not yet completely sure that my approach is safe. Tom had some objections that I take rather seriously. We'll see ;-) greetings, Florian Pflug ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] invalidly encoded strings
Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII. I prefer this option. I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether there are valid usages that that would foreclose. Tatsuo mentioned length() but bytea has that. What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two encodings convert(bytea, name) returns text -- convert from arbitrary encoding to DB encoding The second and third would need to do a verify step before converting, of course. Comments? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] invalidly encoded strings
On Tue, 2007-09-11 at 14:48 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII. I prefer this option. I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether there are valid usages that that would foreclose. Tatsuo mentioned length() but bytea has that. Once it's in bytea, you can make operators to achieve the old functionality. If I understood correctly, he was making a backwards compatibility argument, not a functionality argument. I can't think of a problem without a workaround, but maybe there are some. What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two encodings convert(bytea, name) returns text -- convert from arbitrary encoding to DB encoding The second and third would need to do a verify step before converting, of course. I like it. Regards, Jeff Davis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability
On Tue, 2007-09-11 at 19:32 +0200, Florian G. Pflug wrote: Simon Riggs wrote: On Tue, 2007-09-11 at 10:21 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. It would be brilliant if it were true, but it isn't. Better look again. On the more detailed explanation, I say in normal operation. My analytical notes attached to the original post show ProcArrayLock is acquired exclusively during backend start, exit and while making a prepared (twophase) commit. So yes, it is locked Exclusively in other places, but they happen rarely and they actually add/remove procs from the array, so its unlikely anything can change there anyhow. Well, and during normal during COMMIT and ABORT, which might happen rather frequently ;-) Agreed, that part of my assessment was not accurate... -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] invalidly encoded strings
Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote: It might work the way you are expecting if the database uses SQL_ASCII encoding and C locale --- and I'd be fine with allowing convert() only when the database encoding is SQL_ASCII. I prefer this option. I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether there are valid usages that that would foreclose. Tatsuo mentioned length() but bytea has that. But length(bytea) cannot count characters, only bytes. What I think we'd need to have a complete solution is convert(text, name) returns bytea -- convert from DB encoding to arbitrary encoding convert(bytea, name, name) returns bytea -- convert between any two encodings convert(bytea, name) returns text -- convert from arbitrary encoding to DB encoding That seems good. This is the encode/decode that other systems have. However ISTM we would also need something like length(bytea, name) returns int -- counts the number of characters assuming that the bytea is in -- the given encoding Hmm, I wonder if counting chars is consistent regardless of the encoding the string is in. To me it sounds like it should, in which case it works to convert to the DB encoding and count chars there. -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1, W 73º 13' 56.4 inflex really, I see PHP as like a strange amalgamation of C, Perl, Shell crab inflex: you know that amalgam means mixture with mercury, more or less, right? crab i.e., deadly poison ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] invalidly encoded strings
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: I think really the technically cleanest solution would be to make convert() return bytea instead of text; then we'd not have to put restrictions on what encoding or locale it's working inside of. However, it's not clear to me whether there are valid usages that that would foreclose. Tatsuo mentioned length() but bytea has that. But length(bytea) cannot count characters, only bytes. So what? If you want characters, just count the original text string. Encoding conversion won't change that. Hmm, I wonder if counting chars is consistent regardless of the encoding the string is in. To me it sounds like it should, in which case it works to convert to the DB encoding and count chars there. A conversion that isn't one-for-one is not merely an encoding conversion IMHO. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_dump and money type
Peter Eisentraut [EMAIL PROTECTED] writes: Since the money type has a locale dependent input and output format, there has to be some context saved when a database dump is created. For example, if your environment uses a locale that uses the opposite point-vs-comma conventions from English (e.g., de_DE), then the following will fail to replicate the regression test database: pg_dump regression | psql foo The database regression has lc_monetary = C set, so this will produce C output piped into, say, de_DE input. The first problem appears to be that pg_dump --create ought to save the database-specific configuration settings. pg_dumpall gets this right. But secondly, lc_monetary ought to be saved at the top of the dump file, much like client_encoding. Unfortunately, that would probably break portability of dump files between different operating systems. Perhaps we can get away with fixing --create and documenting this. But something ought to be done about this; otherwise using the money type introduces a risk of breaking backup or upgrade procedures. This risk seems rather overstated, as it's unlikely that someone using money would choose to reload their data into a DB with a fundamentally incompatible locale setting. They might, however, move to a different platform that spells the name of that locale differently --- so I concur that adding an lc_monetary setting to pg_dump output is likely to be a cure worse than the disease. My inclination is to do nothing except perhaps document the issue someplace. But since we've never heard any actual user complaints about it, how real is the issue? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] invalidly encoded strings
Tatsuo Ishii [EMAIL PROTECTED] writes: If we make convert() operate on bytea and return bytea, as Tom suggested, would that solve your use case? The problem is, the above use case is just one of what I can think of. Another use case is, something like this: SELECT sum(octet_length(convert(text_column using utf8_to_euc_jp))) FROM mytable; to know the total byte length of text column if it's encoded in EUC_JP. Since octet_length exists and produces identical results for both text and bytea, this example is hardly very convincing... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] What is happening on buildfarm member dugong
Sergey E. Koposov wrote: With hash_seq_search ERROR, it was partially a false alarm. I've had some old postgres daemon hanging around and writing that to the log. Although I remember seeing that hash_seq_search message recently when dealing with this bug, it does not show up in the course of standard regression tests. Your buildfarm member must be seriously misconfigured if you can get the logs from different postmasters comingled. Every run gets its own logfile in its own inst directory. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_dump and money type
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: Since the money type has a locale dependent input and output format, there has to be some context saved when a database dump is created. For example, if your environment uses a locale that uses the opposite point-vs-comma conventions from English (e.g., de_DE), then the following will fail to replicate the regression test database: pg_dump regression | psql foo The database regression has lc_monetary = C set, so this will produce C output piped into, say, de_DE input. The first problem appears to be that pg_dump --create ought to save the database-specific configuration settings. pg_dumpall gets this right. But secondly, lc_monetary ought to be saved at the top of the dump file, much like client_encoding. Unfortunately, that would probably break portability of dump files between different operating systems. Perhaps we can get away with fixing --create and documenting this. But something ought to be done about this; otherwise using the money type introduces a risk of breaking backup or upgrade procedures. This risk seems rather overstated, as it's unlikely that someone using money would choose to reload their data into a DB with a fundamentally incompatible locale setting. It doesn't sound unlikely at all to me. For example, people often use C-locale for performance reasons, or because of ignorance of locale issues. One scenario that seems particularly likely is to initialize and load a database with en_US or C locale, and run like that for a few weeks. After that, you notice that something's wrong, strings are sorted in a funny way, etc. You realize that you're using the wrong locale, so you take a backup with pg_dump, re-initdb with correct locale, and restore. I haven't been following this thread closely; is there a work-around? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] What is happening on buildfarm member dugong
On Tue, 11 Sep 2007, Andrew Dunstan wrote: Your buildfarm member must be seriously misconfigured if you can get the logs from different postmasters comingled. Every run gets its own logfile in its own inst directory. No, everything I'm doing now about that bug, I'm doing in the very separate from the buildfarm place. That logs were mixed there. regards, Sergey *** Sergey E. Koposov Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Preparation for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20
Preparations are being made for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20. The CVS branches are nearly ready. The releases will happen sometime early next week. The packagers have been contacted. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Preparation for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20
Bruce Momjian wrote: Preparations are being made for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20. The CVS branches are nearly ready. The releases will happen sometime early next week. The packagers have been contacted. Does this mean that if I commit something in these days to those branches, it will not show up in the releases? -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Preparation for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20
Alvaro Herrera wrote: Bruce Momjian wrote: Preparations are being made for PostgreSQL releases 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20. The CVS branches are nearly ready. The releases will happen sometime early next week. The packagers have been contacted. Does this mean that if I commit something in these days to those branches, it will not show up in the releases? It certainly will show up if you do it before the packagers pull their CVS copies. Right now configure/configure.in are not stamped so it is impossible for a packager to use it. I would check those files before making changes to be sure. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] HOT breaks CLUSTER, a bit
While editing Bruce's README.HOT documentation I ran into a problem. I had idly added the following text after thinking about the different sorts of snapshots we use: --- Another unpleasant consequence is that it is no longer very meaningful to use SnapshotAny in an index scan: if the index was created more recently than the last vacuum, it's possible that some of the visited tuples do not match the index entry they are linked to. This does not seem to be a fatal objection in practice, since there are few users of SnapshotAny and they all use seqscans. --- However, a quick grep to confirm that turned up a problem: CLUSTER uses SnapshotAny with an indexscan. This essentially means that CLUSTER might fetch recently-dead tuples out of order, because it finds them attached to an index HOT chain that's for a different index key value. I think that this is not a fatal objection; the out-of-order-ness is limited and won't be seen at all by transactions with snapshots postdating the CLUSTER, and CLUSTER can't guarantee the ordering will stay pristine for long anyway. But it's a bit worrisome. Does anyone see a bigger problem here than I do? BTW, the proposed HOT code in indexam.c that special-cases SnapshotAny is a crock ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_dump and money type
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: This risk seems rather overstated, as it's unlikely that someone using money would choose to reload their data into a DB with a fundamentally incompatible locale setting. It doesn't sound unlikely at all to me. For example, people often use C-locale for performance reasons, or because of ignorance of locale issues. One scenario that seems particularly likely is to initialize and load a database with en_US or C locale, and run like that for a few weeks. After that, you notice that something's wrong, strings are sorted in a funny way, etc. You realize that you're using the wrong locale, so you take a backup with pg_dump, re-initdb with correct locale, and restore. If you're using type money, you will certainly have noticed whether it spells the currency sign the way you like. I can believe that someone might go for a while with C where they should have used en_US, or vice versa, but not that they'd have failed to notice the difference between $ and DM, say. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_dump and money type
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: This risk seems rather overstated, as it's unlikely that someone using money would choose to reload their data into a DB with a fundamentally incompatible locale setting. It doesn't sound unlikely at all to me. For example, people often use C-locale for performance reasons, or because of ignorance of locale issues. One scenario that seems particularly likely is to initialize and load a database with en_US or C locale, and run like that for a few weeks. After that, you notice that something's wrong, strings are sorted in a funny way, etc. You realize that you're using the wrong locale, so you take a backup with pg_dump, re-initdb with correct locale, and restore. If you're using type money, you will certainly have noticed whether it spells the currency sign the way you like. I can believe that someone might go for a while with C where they should have used en_US, or vice versa, but not that they'd have failed to notice the difference between $ and DM, say. Well, that assumes that countries use different signs. We for example use $, as does Argentina (but they add two decimal places where we have none), Mexico and US. And while both Arg. and Mexico use $ and 2 decimal places, the former uses , for decimals while the latter follows the US lead and uses .. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] CVS HEAD is broken by flex
I found CVS HEAD is broken and the following error is reported. My environment is RHEL4 Update 2 x86_64 with gcc 3.4.4 and flex 2.5.33. In file included from ../../../src/include/port.h:17, from ../../../src/include/c.h:841, from ../../../src/include/postgres_fe.h:21, from psqlscan.l:40: /usr/include/netdb.h:560: error: syntax error before '[' token make[3]: *** [psqlscan.o] Error 1 make[3]: *** Waiting for unfinished jobs make[2]: *** [all] Error 2 make[1]: *** [all] Error 2 make: *** [all] Error 2 The suspect is flex (or yy template file). It defines __unused to be empty, but netdb.h uses __unused as a variable. I found a similar case from google. http://mailman.icsi.berkeley.edu/pipermail/xorp-cvs/2007-March/009928.html [src/bin/psql/psqlscan.c] 9 #if defined(__FreeBSD__) 10 #include sys/cdefs.h 11 #else - 12 #define __unused 13 #endif [/usr/include/netdb.h] 552 struct gaicb 553 { 554 const char *ar_name; /* Name to look up. */ 555 const char *ar_service; /* Service name. */ 556 const struct addrinfo *ar_request; /* Additional request specification. */ 557 struct addrinfo *ar_result; /* Pointer to result. */ 558 /* The following are internal elements. */ 559 int __return; - 560 int __unused[5]; 561 }; Here is a brute fix for the problem, but it is probably improper. What is better way to fix? Index: src/include/port.h === --- src/include/port.h (HEAD) +++ src/include/port.h (working copy) @@ -13,6 +13,7 @@ #ifndef PG_PORT_H #define PG_PORT_H +#undef __unused #include ctype.h #include netdb.h #include pwd.h Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] buildfarm failure after ICC configure change
I just saw that my buildfarm member (running ICC 9.0 on linux) failed after the latest change to configure http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoosedt=2007-09-11%2020:45:01 I was the one who sent in the first patch to configure to add the check for ICC, and as I recall at the time, the docs said -fno-strict-aliasing was the default (which is why I said not sure if this is needed in the comment). I don't know what -fno-alias does, but I think it means that the program does not do aliasing at all. The docs say You must specify -fno-alias if you do not want aliasing to be assumed in the program The true option for -fno-strict-aliasing is -ansi-alias- disable use of ANSI aliasing rules in optimizations. But this is the default... -- Help me, I'm a prisoner in a Fortune cookie file! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Add function for quote_qualified_identifier?
Hi hackers, I note that we currently expose the usefulness of the quote_identifier function to the user with quote_ident(text). Is there any reason we shouldn't do the same with quote_qualified_identifier? We could just add a quote_qualified_ident(text, text) ... it would make forming dynamic queries more convenient in databases that use multiple schemas. Clearly a DBA could just create this function himself in SQL (and it wouldn't be difficult), but is that a good reason not to have it in our standard set of functions? Would be happy to cook up a patch for this. Cheers, BJ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Per-function search_path = per-function GUC settings
On 9/12/07, Tom Lane [EMAIL PROTECTED] wrote: It would break functions that actually want to use a caller-specified search path, and protect themselves by explicitly schema-qualifying every other reference than one to some caller-specified object. Which admittedly is notationally a pain in the neck, but it's possible to do. I do not think that we should foreclose potentially useful behavior *and* make a major break in backward compatibility in order to make a very small improvement in security. In that case, is there anything wrong with Zdenek's suggestion to add a warning on SECURITY DEFINER functions that do not set a search_path? Something to the tune of WARNING: Your function is defined with SECURITY DEFINER but does not specify a local search path. This is potentially a serious security vulnerability. HINT: Use the SET clause in CREATE FUNCTION to set a safe search path which is specific to your function. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] invalidly encoded strings
However ISTM we would also need something like length(bytea, name) returns int -- counts the number of characters assuming that the bytea is in -- the given encoding Hmm, I wonder if counting chars is consistent regardless of the encoding the string is in. To me it sounds like it should, in which case it works to convert to the DB encoding and count chars there. Not necessarily. It's possible that after encoding conversion, number of chars are different before and after. An example is, UTF-8 and EUC_JIS_2004. 0xa4f7(EUC_JIS_2004) -- U+304B *and* U+309A (Unicode) This is defined in the Japanese goverment's standard. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS HEAD is broken by flex
ITAGAKI Takahiro [EMAIL PROTECTED] writes: The suspect is flex (or yy template file). It defines __unused to be empty, but netdb.h uses __unused as a variable. Egad. Shouldn't there be villagers breaking down the flex hackers' front door with torches and pitchforks in hand? Why is it our problem to fix their stupidity? [src/bin/psql/psqlscan.c] 9 #if defined(__FreeBSD__) 10 #include sys/cdefs.h 11 #else - 12 #define __unused 13 #endif Surely lines 11 and 12 should simply go away. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] buildfarm failure after ICC configure change
Jeremy Drake [EMAIL PROTECTED] writes: I just saw that my buildfarm member (running ICC 9.0 on linux) failed after the latest change to configure Argh! Can someone quote chapter and verse from the ICC manual about this? I was just following what Sergey said was the approved spelling of the switch ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] buildfarm failure after ICC configure change
On Wed, 12 Sep 2007, Tom Lane wrote: Jeremy Drake [EMAIL PROTECTED] writes: I just saw that my buildfarm member (running ICC 9.0 on linux) failed after the latest change to configure Argh! Can someone quote chapter and verse from the ICC manual about this? I was just following what Sergey said was the approved spelling of the switch ... The docs are not particularly clear about these options, at least not in 9.0 (which is the version I have). I figured -ansi-alias- was what -fno-strict-aliasing meant, that the gcc people decided to follow the ISO C standard strictly unless you gave -fno-strict-aliasing, which could result in additional optimization. This seems to match what the docs for -ansi-alias, but with a different default. The -fno-alias seems to go entirely the other way, saying you are not going to be doing any aliasing, standard permitted or otherwise, so it can optimize better accordingly. Here are the docs for the two options. == ansi-alias, Qansi-alias Enable use of ANSI aliasing rules in optimizations. Syntax Linux: -ansi-alias -ansi-alias- Windows: /Qansi-alias /Qansi-alias- Default -ansi-alias-Disable use of ANSI aliasing rules in optimizations. This option tells the compiler to assume that the program adheres to ISO C Standard aliasability rules. If your program adheres to these rules, then this option allows the compiler to optimize more aggressively. If it doesn't adhere to these rules, then it can cause the compiler to generate incorrect code. == falias Specifies that aliasing should be assumed in the program. Syntax Linux: -falias -fno-alias Windows: None Default -falias Aliasing is assumed in the program. This option specifies that aliasing should be assumed in the program. You must specify -fno-alias if you do not want aliasing to be assumed in the program. Alternate Options Linux: None Windows: /Oa -- Remember: Silly is a state of Mind, Stupid is a way of Life. -- Dave Butler ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings