Re: [HACKERS] visibility map
Can we just log the change of the VM in log_heap_clean() for redo?

Thanks
--
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com

On Tue, Nov 23, 2010 at 3:24 AM, Robert Haas robertmh...@gmail.com wrote:

On Mon, Jun 14, 2010 at 1:19 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I *think* that the answer to this parenthesized question is no. When we vacuum a page, we set the LSN on both the heap page and the visibility map page. Therefore, neither of them can get written to disk until the WAL record is flushed, but they could get flushed in either order. So the visibility map page could get flushed before the heap page, as the non-parenthesized portion of the comment indicates.

Right. However, at least in theory, it seems like we could fix this up during redo.

Setting a bit in the visibility map is currently not WAL-logged, but yes, once we add WAL-logging, that's straightforward to fix.

Eh, so. Suppose - for the sake of argument - we do the following:

1. Allocate an additional infomask(2) bit that means "xmin is frozen, no need to call XidInMVCCSnapshot()". When we freeze a tuple, we set this bit in lieu of overwriting xmin. Note that freezing pages is already WAL-logged, so redo is possible.

2. Modify VACUUM so that, when the page is observed to be all-visible, it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the visibility map bit, writing a single XLOG record for the whole operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same vacuum already removed tuples; otherwise, and/or when no tuples were removed, writing XLOG_HEAP2_FREEZE or some new record type). This loses no forensic information because of (1). (If the page is NOT observed to be all-visible, we freeze individual tuples only when they hit the current age thresholds.)

Setting the visibility map bit is now crash-safe.

Please poke holes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
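A minimal sketch of point (1) above, for concreteness; the flag name, the bit value, and the function are assumptions for illustration, not part of the proposal's text (any free infomask/infomask2 bit would do):

    #include "postgres.h"
    #include "access/htup.h"

    /* Hypothetical flag; the value 0x2000 is illustrative only. */
    #define HEAP_XMIN_FROZEN 0x2000

    /*
     * If the bit is set, the inserting transaction is known committed and
     * visible to every snapshot, so a visibility check can succeed without
     * consulting clog or calling XidInMVCCSnapshot(), while the original
     * xmin remains readable for forensics.
     */
    static bool
    HeapXminIsFrozen(HeapTupleHeader tup)
    {
        return (tup->t_infomask2 & HEAP_XMIN_FROZEN) != 0;
    }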
Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name
Robert Haas robertmh...@gmail.com writes:

Especially because you also posted some revs of the ALTER EXTENSION .. SET SCHEMA patch on this thread

Yes, I tried to answer where questions have been raised, and that's not helping so much at review time. That's why I take the time to update the commitfest application each time I send a new patch version.

Do you still want me to prepare another patch for adding to the tests the SET SCHEMA variants that already existed but are not yet covered? Which are the ones you spotted, btw?

[rhaas pgsql]$ git grep 'SET SCHEMA' src/test/regress/
[rhaas pgsql]$

The existing 'set schema' tests are in lower case, so I just did it the same way; try with git grep -i, maybe :)

grep -c 'set schema' ../postgresql-extension-patches/set_schema.v8.patch
28

Please do. Tab completion support should really be included in the patch - adding it as a separate patch is better than not having it, of course.

Ok, will learn about this part of psql soon'ish, hopefully by today, given a reasonable amount of other distractions.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Extensions, this time with a patch
Alvaro Herrera alvhe...@commandprompt.com writes:

Hmm, the first thought that comes to mind is that the GucContext param to ParseConfigFile is unused and can be removed. This is probably an oversight from when include files were introduced in 2006.

Thanks for caring about that part.

I don't like the fact that this code handles custom_variable_classes internally. I think this would be exposed to the parsing of extension control files, which is obviously wrong.

Well, in fact, not that much. The extension code has a special error case when dealing with custom variables if the class hasn't already been parsed, and what ParseConfigFile() is doing is pushing the custom_variable_classes setting to the front of the list. guc-file.l says:

    /*
     * This variable must be processed first as it controls
     * the validity of other variables; so it goes at the head
     * of the result list. If we already found a value for it,
     * replace with this one.
     */

extension.c says:

    ereport(ERROR,
            (errmsg("Unsupported parameter '%s' in file: %s", tok1, filename),
             errhint("Be sure to have 'custom_variable_classes' set in a line before any custom variable.")));

So if we don't change the code in ParseConfigFile() that pushes custom_variable_classes to the front of the list, all I have to change in the extension.c file is the error message. I fail to see a future usage of custom_variable_classes where it wouldn't help to have it in the list before any user setting that depends on it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
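For illustration, the head-of-list treatment that the guc-file.l comment describes might look like the following; ConfigItem and add_item are assumed names, not the actual structures in guc-file.l:

    #include "postgres.h"
    #include <string.h>

    typedef struct ConfigItem
    {
        char               *name;
        char               *value;
        struct ConfigItem  *next;
    } ConfigItem;

    /*
     * Link a freshly parsed item into the result list.  Everything is
     * appended in file order except custom_variable_classes, which is
     * pushed to the head so it is applied before any custom GUC that
     * depends on it.  Returns the (possibly new) list head.
     */
    static ConfigItem *
    add_item(ConfigItem *head, ConfigItem *item)
    {
        if (strcmp(item->name, "custom_variable_classes") == 0)
        {
            item->next = head;      /* goes to the head of the list */
            return item;
        }

        item->next = NULL;
        if (head == NULL)
            return item;

        {
            ConfigItem *tail = head;

            while (tail->next)      /* O(n) append; fine for a sketch */
                tail = tail->next;
            tail->next = item;
        }
        return head;
    }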
Re: [HACKERS] visibility map
On 22.11.2010 21:24, Robert Haas wrote:

Eh, so. Suppose - for the sake of argument - we do the following: 1. Allocate an additional infomask(2) bit that means "xmin is frozen, no need to call XidInMVCCSnapshot()". When we freeze a tuple, we set this bit in lieu of overwriting xmin. Note that freezing pages is already WAL-logged, so redo is possible. 2. Modify VACUUM so that, when the page is observed to be all-visible, it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the visibility map bit, writing a single XLOG record for the whole operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same vacuum already removed tuples; otherwise and/or when no tuples were removed writing XLOG_HEAP2_FREEZE or some new record type). This loses no forensic information because of (1). (If the page is NOT observed to be all-visible, we freeze individual tuples only when they hit the current age thresholds.) Setting the visibility map bit is now crash-safe.

That's an interesting idea. You piggyback setting the VM bit on the freeze WAL record, on the assumption that you have to write the freeze record anyway. However, if that assumption doesn't hold, because the tuples are deleted before they reach vacuum_freeze_min_age, it's no better than the naive approach of WAL-logging the VM bit set separately. Whether that's acceptable or not, I don't know.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Extensions, this time with a patch
Alvaro Herrera alvhe...@commandprompt.com writes:

the handling of relative vs absolute paths is bogus here. I think it'd make more sense to have a bool "are we including"; and if that's false and the path is not absolute, then the file is relative to CWD; or maybe we make it absolute by prepending PGDATA; maybe something else? (need to think of something that makes sense for both recovery.conf and extension control files)

Current coding in extensions prepends any control or script file with sharepath, so that we're only dealing with absolute filenames here. The idea is that it's no business of any other part of the code to know where we decide to install control and script files. My feeling is that when !is_absolute_path(config_file) and calling_file is NULL, we should make the config_file absolute by prepending PGDATA. Please find that done in the attached v4 of the cfparser patch.

If that looks ok, do we want to add some documentation about the new lexer capabilities? beyond extra code comments? probably not.

Great.

Also, for what good reason would we want to prevent people from using the include facility? Not sure about this.

Ok, nothing special here.
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 5024,5200 ****
  str_time(pg_time_t tnow)
  }

  /*
- * Parse one line from recovery.conf. 'cmdline' is the raw line from the
- * file. If the line is parsed successfully, returns true, false indicates
- * syntax error. On success, *key_p and *value_p are set to the parameter
- * name and value on the line, respectively. If the line is an empty line,
- * consisting entirely of whitespace and comments, function returns true
- * and *keyp_p and *value_p are set to NULL.
- *
- * The pointers returned in *key_p and *value_p point to an internal buffer
- * that is valid only until the next call of parseRecoveryCommandFile().
- */
- static bool
- parseRecoveryCommandFileLine(char *cmdline, char **key_p, char **value_p)
- {
-     char       *ptr;
-     char       *bufp;
-     char       *key;
-     char       *value;
-     static char *buf = NULL;
-
-     *key_p = *value_p = NULL;
-
-     /*
-      * Allocate the buffer on first use. It's used to hold both the parameter
-      * name and value.
-      */
-     if (buf == NULL)
-         buf = malloc(MAXPGPATH + 1);
-     bufp = buf;
-
-     /* Skip any whitespace at the beginning of line */
-     for (ptr = cmdline; *ptr; ptr++)
-     {
-         if (!isspace((unsigned char) *ptr))
-             break;
-     }
-     /* Ignore empty lines */
-     if (*ptr == '\0' || *ptr == '#')
-         return true;
-
-     /* Read the parameter name */
-     key = bufp;
-     while (*ptr && !isspace((unsigned char) *ptr) &&
-            *ptr != '=' && *ptr != '\'')
-         *(bufp++) = *(ptr++);
-     *(bufp++) = '\0';
-
-     /* Skip to the beginning quote of the parameter value */
-     ptr = strchr(ptr, '\'');
-     if (!ptr)
-         return false;
-     ptr++;
-
-     /* Read the parameter value to *bufp. Collapse any '' escapes as we go. */
-     value = bufp;
-     for (;;)
-     {
-         if (*ptr == '\'')
-         {
-             ptr++;
-             if (*ptr == '\'')
-                 *(bufp++) = '\'';
-             else
-             {
-                 /* end of parameter */
-                 *bufp = '\0';
-                 break;
-             }
-         }
-         else if (*ptr == '\0')
-             return false;       /* unterminated quoted string */
-         else
-             *(bufp++) = *ptr;
-
-         ptr++;
-     }
-     *(bufp++) = '\0';
-
-     /* Check that there's no garbage after the value */
-     while (*ptr)
-     {
-         if (*ptr == '#')
-             break;
-         if (!isspace((unsigned char) *ptr))
-             return false;
-         ptr++;
-     }
-
-     /* Success! */
-     *key_p = key;
-     *value_p = value;
-     return true;
- }
-
- /*
  * See if there is a recovery command file (recovery.conf), and if so
  * read in parameters for archive recovery and XLOG streaming.
  *
! * XXX longer term intention is to expand this to
! * cater for additional parameters and controls
! * possibly use a flex lexer similar to the GUC one
  */
  static void
  readRecoveryCommandFile(void)
  {
      FILE       *fd;
-     char        cmdline[MAXPGPATH];
      TimeLineID  rtli = 0;
      bool        rtliGiven = false;
!     bool        syntaxError = false;

      fd = AllocateFile(RECOVERY_COMMAND_FILE, "r");
      if (fd == NULL)
      {
          if (errno == ENOENT)
!             return;             /* not there, so no archive recovery */
          ereport(FATAL,
                  (errcode_for_file_access(),
                   errmsg("could not open recovery command file \"%s\": %m",
                          RECOVERY_COMMAND_FILE)));
      }

!     /*
!      * Parse the file...
!      */
!     while (fgets(cmdline, sizeof(cmdline), fd) != NULL)
!     {
!         char       *tok1;
!         char       *tok2;
!
!         if (!parseRecoveryCommandFileLine(cmdline, &tok1, &tok2))
!         {
!             syntaxError = true;
!             break;
!         }
!         if (tok1 == NULL)
!             continue;
!
!         if (strcmp(tok1, "restore_command") == 0)
          {
!             recoveryRestoreCommand = pstrdup(tok2);
              ereport(DEBUG2,
                      (errmsg("restore_command = '%s'",
                              recoveryRestoreCommand)));
          }
!         else if
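To make the path rule described in the prose above concrete (prepend PGDATA when the file name is relative and there is no calling file), here is a sketch; the function name is an assumption and error handling is omitted:

    #include "postgres.h"
    #include "miscadmin.h"      /* DataDir */

    /*
     * Illustrative only: a relative config file name is left alone when
     * an including file exists (the caller resolves it against that file),
     * and is otherwise anchored at the data directory.
     */
    static char *
    make_config_path(const char *config_file, const char *calling_file)
    {
        char        path[MAXPGPATH];

        if (is_absolute_path(config_file) || calling_file != NULL)
            return pstrdup(config_file);

        snprintf(path, MAXPGPATH, "%s/%s", DataDir, config_file);
        return pstrdup(path);
    }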
Re: [HACKERS] SQL/MED estimated time of arrival?
On 23.11.2010 12:09, Shigeru HANADA wrote:

On Sun, 21 Nov 2010 21:16:05 -0500 Robert Haas robertmh...@gmail.com wrote:

<snip> Ultimately, we probably want and need to get this patch down to chunks of less than 2000 lines each. But for starters, it looks quite simple to break this into three chunks: one for the PostgreSQL FDW, one for the CSV FDW, and one for the core functionality. I think that the CSV FDW, like the PG FDW, should be a loadable module.

I've separated the patch into three parts. Each has code, documentation and tests within, and file_fdw and pgsql_fdw can each be applied onto fdw_core, or together. I hope the separation helps the review of the patches.

The docs need some work. The CREATE FOREIGN TABLE reference page seems to be copy-pasted from CREATE TABLE, because it talks about constraints and WITH/WITHOUT OIDS, which surely don't apply to foreign tables.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Extensions, this time with a patch
On 22.11.2010 03:35, Robert Haas wrote:

On Sun, Nov 21, 2010 at 8:10 PM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:

On Wed, Oct 20, 2010 at 01:36, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Ah yes, thinking it's an easy patch is not helping. Please find attached a revised version of it.

I checked cfparser.v2.patch. It exports the static parseRecoveryCommandFileLine() in xlog.c as the global cfParseOneLine() in cfparser.c without modification. It generates one warning, but it can be easily fixed:

    cfparser.c:34: warning: no previous prototype for 'cfParseOneLine'

Some discussions about the patch:

* Is "cf" the best name for the prefix? Less abbreviated forms might be less confusable. Personally, I prefer "conf".

* Can we export ParseConfigFile() in guc-file.l rather than parseRecoveryCommandFileLine()? It can solve the issue that unquoted parameter values in recovery.conf are not recognized. Even if we won't merge them, just allowing unquoted values would be useful.

I'd really like to see postgresql.conf and recovery.conf parsing merged, and I suspect, as Itagaki-san says, that postgresql.conf parsing is the better model for any new code.

+1. There was unanimous agreement in the synchronous replication threads that recovery.conf should be parsed with the GUC parser. The current recovery.conf parser doesn't support escaping, for example.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
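For reference, that warning is the usual missing-prototype complaint; a one-line declaration in a header included by cfparser.c silences it. The header path here is an assumption; the signature is taken from the patch quoted earlier in this digest:

    /* e.g. in src/include/utils/cfparser.h (path assumed) */
    extern bool cfParseOneLine(char *cmdline, char **key_p, char **value_p);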
Re: [HACKERS] SQL/MED estimated time of arrival?
On Tue, 23 Nov 2010 12:30:52 +0200 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

The docs need some work. The CREATE FOREIGN TABLE reference page seems to be copy-pasted from CREATE TABLE, because it talks about constraints and WITH/WITHOUT OIDS which surely don't apply to foreign tables.

Thanks for the comments. The page you pointed out has been edited for foreign tables. In the current design, the OID system column and CHECK constraints are supported. OID is supported to get the oid from the source table (yes, it works only for postgresql_fdw, but it seems useful to support). The CHECK constraint is supported to enable constraint exclusion. In addition, TABLEOID is supported to show which table is the actual source.

I agree that some kind of document, such as "How to create a new FDW", should be written.

Regards,
--
Shigeru Hanada
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one.

There's no reason it should require a postmaster restart. New backends spawned after the handler is turned on would enable it, and existing backends could be signalled to enable it as well.

on-by-default is what we gain. I think that's fairly big...

More than that. If a crash handler is in core, then:

- It can be initialized much earlier, catching crashes that happen during loading of libraries and during later backend init; and

- It's basically free when the cost of shared library loading is removed, so it can be left on in production or even be on by default.

I need to do some testing on this, but given the apparently near-zero cost of initializing the handler, I won't be surprised if testing a GUC to see whether the handler should be on costs more than loading it does.

I still wouldn't support on-by-default, because you'd need an automatic process to weed out old crash dumps or limit the number stored. That's a bigger job. So I think the admin should have to turn it on, but it'd be good to make it easy to turn on in production without a restart; I don't see why that'd be hard.

I'll try to put together a patch that does just that, time permitting. Things are kind of hectic ATM.
--
Craig Ringer
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:30 AM, Tom Lane wrote:

I'm not really sure why we're even considering the notion of back-patching this item. Doing so would not fit with any past practice or agreed-on project management practices, not even under our lax standards for contrib (and I keep hearing people claim that contrib is or should be as trustworthy as core, anyway). Since when do we back-patch significant features that have not been through a beta test cycle?

I see no advantage to back-patching. It's easy to provide a drop-in binary DLL for older versions of Pg on Windows, and Windows users are the only people this will work for anyway. If the EDB folks saw value in it, they could bundle the DLL with updated point releases of the installer for Windows. No back-patching is necessary.
--
Craig Ringer
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:46 AM, Tom Lane wrote:

* However, when storing it in "crashdumps", I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature.

Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup.

I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. You need to minimize the number of assumptions of that sort.

It doesn't need to have any backend code working, really; all it needs is a usable stack and the ability to allocate off the heap. It won't save you in total OOM situations, stack exhaustion, or severe stack corruption, but should work pretty much any other time.

The crash dump facility in dbghelp.dll offers *optional* features where apps can stream in additional app-specific data like recent log excerpts, etc. I didn't try to implement anything like that in the initial module, partly because I want to minimize the amount of the backend that must be working for a crash dump to succeed.
--
Craig Ringer
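A minimal sketch of the directory-as-switch idea, assuming the handler itself would call dbghelp.dll's MiniDumpWriteDump(); all names here are illustrative, not taken from the patch:

    #include <windows.h>

    static LONG WINAPI
    crashDumpHandler(EXCEPTION_POINTERS *excInfo)
    {
        /* Real code would write a dump via MiniDumpWriteDump() here. */
        return EXCEPTION_EXECUTE_HANDLER;
    }

    /*
     * Install the unhandled-exception filter early in backend startup,
     * but only if a "crashdumps" directory exists under the current
     * (data) directory; creating or removing that directory toggles
     * the feature without any GUC or restart.
     */
    static void
    install_crash_dump_handler(void)
    {
        DWORD   attrs = GetFileAttributes("crashdumps");

        if (attrs != INVALID_FILE_ATTRIBUTES &&
            (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0)
            SetUnhandledExceptionFilter(crashDumpHandler);
    }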
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:56 AM, Heikki Linnakangas wrote:

On 22.11.2010 19:47, Robert Haas wrote: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not.

This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it? I haven't seen one, but I find it amazing if there isn't. We can't be alone with this.

Search for 'dbghelp.dll' on your average Windows system and you'll be surprised how many apps use it. Steam (the software distribution system) does, as does the Adobe Creative Suite on my machine.

If you're running in interactive mode with access to the user's display, you can use Windows Error Reporting. Services running in isolated user accounts don't seem to be able to use that. In any case, Windows Error Reporting only collects *tiny* dumps with barely anything beyond the stack contents; they're a nightmare to use, and really require psychic powers and deep knowledge of scary Windows APIs for effective debugging.
--
Craig Ringer
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
Thanks for the advice. I ran a row-by-row test, including debug output. I'll put a test case together as well, but I believe I have narrowed down the problem somewhat. The first split occurs when the 6th row is inserted: there are 6 calls to Compress(), however picksplit only receives 4 of those 6 tuples and the other two are dropped.

postgres=# \i xaa
psql:xaa:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
---------------------------------------
 Number of levels:          1          +
 Number of pages:           1          +
 Number of leaf pages:      1          +
 Number of tuples:          1          +
 Number of invalid tuples:  0          +
 Number of leaf tuples:     1          +
 Total size of tuples:      1416 bytes +
 Total size of leaf tuples: 1416 bytes +
 Total size of index:       8192 bytes +

postgres=# \i xab
psql:xab:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
---------------------------------------
 Number of levels:          1          +
 Number of pages:           1          +
 Number of leaf pages:      1          +
 Number of tuples:          2          +
 Number of invalid tuples:  0          +
 Number of leaf tuples:     2          +
 Total size of tuples:      2820 bytes +
 Total size of leaf tuples: 2820 bytes +
 Total size of index:       8192 bytes +

postgres=# \i xac
psql:xac:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
---------------------------------------
 Number of levels:          1          +
 Number of pages:           1          +
 Number of leaf pages:      1          +
 Number of tuples:          3          +
 Number of invalid tuples:  0          +
 Number of leaf tuples:     3          +
 Total size of tuples:      4224 bytes +
 Total size of leaf tuples: 4224 bytes +
 Total size of index:       8192 bytes +

postgres=# \i xad
psql:xad:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
---------------------------------------
 Number of levels:          1          +
 Number of pages:           1          +
 Number of leaf pages:      1          +
 Number of tuples:          4          +
 Number of invalid tuples:  0          +
 Number of leaf tuples:     4          +
 Total size of tuples:      5628 bytes +
 Total size of leaf tuples: 5628 bytes +
 Total size of index:       8192 bytes +

postgres=# \i xae
psql:xae:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
---------------------------------------
 Number of levels:          1          +
 Number of pages:           1          +
 Number of leaf pages:      1          +
 Number of tuples:          5          +
 Number of invalid tuples:  0          +
 Number of leaf tuples:     5          +
 Total size of tuples:      7032 bytes +
 Total size of leaf tuples: 7032 bytes +
 Total size of index:       8192 bytes +

postgres=# \i xaf
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:660] entered picksplit
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:812] split: 2 left, 2 right
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
               gist_stat
----------------------------------------
 Number of levels:          2           +
 Number of pages:           3           +
 Number of leaf pages:      2           +
 Number of tuples:          6           +
 Number of invalid tuples:  0           +
 Number of leaf tuples:     4           +
 Total size of tuples:      8460 bytes  +
 Total size of leaf tuples: 5640 bytes  +
 Total size of index:       24576 bytes +
postgres=#

There are checks inside the Picksplit() function for the number of entries:

    OffsetNumber    maxoff = entryvec->n - 1;
    int             n_entries, j;

    n_entries = Max(maxoff, 1) - 1;
    j = 0;
    for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
    {
        FPrint     *v = deserialize_fprint(entv[i].key);
        if
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300 2010:

There are checks inside the Picksplit() function for the number of entries:

    OffsetNumber    maxoff = entryvec->n - 1;
    int             n_entries, j;

    n_entries = Max(maxoff, 1) - 1;
    j = 0;
    for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
    {
        FPrint     *v = deserialize_fprint(entv[i].key);

Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong.
--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
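For comparison, the conventional iteration in GiST picksplit support functions, where valid entries occupy slots 1 through entryvec->n - 1 and slot 0 is unused; this is a generic sketch, not the fingerprint code under discussion:

    #include "postgres.h"
    #include "access/gist.h"
    #include "storage/off.h"

    /*
     * Walk every entry handed to picksplit.  Note <= rather than <:
     * maxoff itself is a valid 1-based offset, so a loop bounded by
     * i < maxoff silently skips the last entry.
     */
    static void
    walk_entries(GistEntryVector *entryvec)
    {
        OffsetNumber    maxoff = entryvec->n - 1;
        OffsetNumber    i;

        for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
        {
            GISTENTRY  *ent = &entryvec->vector[i];

            (void) ent;         /* examine ent->key here */
        }
    }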
Re: [HACKERS] visibility map
On Tue, Nov 23, 2010 at 3:42 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

That's an interesting idea. You piggyback setting the VM bit on the freeze WAL record, on the assumption that you have to write the freeze record anyway. However, if that assumption doesn't hold, because the tuples are deleted before they reach vacuum_freeze_min_age, it's no better than the naive approach of WAL-logging the VM bit set separately. Whether that's acceptable or not, I don't know.

I don't know, either. I was trying to think of the cases where this would generate a net increase in WAL before I sent the email, but couldn't fully wrap my brain around it at the time. Thanks for summarizing.

Here's another design to poke holes in:

1. Imagine that the visibility map is divided into granules. For the sake of argument, let's suppose there are 8K bits per granule; thus each granule covers 64M of the underlying heap and 1K of space in the visibility map itself.

2. In shared memory, create a new array called the visibility vacuum array (VVA), each element of which has room for a backend ID, a relfilenode, a granule number, and an LSN. Before setting bits in the visibility map, a backend is required to allocate a slot in this array, XLOG the slot allocation, and fill in its backend ID, relfilenode number, and the granule number whose bits it will be manipulating, plus the LSN of the slot-allocation XLOG record. It then sets as many bits within that granule as it likes. When done, it sets the backend ID of the VVA slot to InvalidBackendId but does not remove it from the array immediately; such a slot is said to have been released.

3. When visibility map bits are set, the LSN of the page is set to the new-VVA-slot XLOG record, so that the visibility map page can't hit the disk before the new-VVA-slot XLOG record. Also, the contents of the VVA, sans backend IDs, are XLOG'd at each checkpoint. Thus, on redo, we can compute a list of all VVA slots for which visibility-bit changes might already be on disk; we go through and clear both the visibility map bits and the PD_ALL_VISIBLE bits on the underlying pages.

4. To free a VVA slot that has been released, we must flush XLOG as far as the record that allocated the slot and sync the visibility map and heap segments containing that granule. Thus, all slots released before a checkpoint starts can be freed after it completes. Alternatively, an individual backend can free a previously-released slot by performing the XLOG flush and syncs itself. (This might require a few more bookkeeping details to be stored in the VVA, but it seems manageable.)

One problem with this design is that the visibility map bits never get set on standby servers. If we don't XLOG setting the bit then I suppose that doesn't happen now either, but it's more sucky (that's the technical term) if you're relying on it for index-only scans (which are also relevant on the standby, either during HS or if promoted) versus if you're only relying on it for vacuum (which doesn't happen on the standby anyway unless and until it's promoted).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
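A sketch of what one VVA element might look like, taken directly from the description in point (2); the struct and field names are assumptions:

    #include "postgres.h"
    #include "storage/backendid.h"
    #include "storage/relfilenode.h"
    #include "access/xlogdefs.h"

    /*
     * One slot of the proposed visibility vacuum array.  A slot is
     * allocated (and WAL-logged) before any map bits in its granule are
     * set, "released" by resetting backendId to InvalidBackendId, and
     * freed once the relevant XLOG and data pages are safely on disk.
     */
    typedef struct VisibilityVacuumSlot
    {
        BackendId       backendId;  /* owner, or InvalidBackendId if released */
        RelFileNode     rnode;      /* relation whose map is being updated */
        uint32          granule;    /* which granule of the visibility map */
        XLogRecPtr      lsn;        /* LSN of the slot-allocation record */
    } VisibilityVacuumSlot;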
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
On Tue, Nov 23, 2010 at 2:33 AM, Kris Jurka bo...@ejurka.com wrote:

On Mon, 22 Nov 2010, Itagaki Takahiro wrote:

On Fri, Oct 15, 2010 at 03:40, Radosław Smogura rsmog...@softperience.eu wrote:

Regarding JDBC in the CF process -- other interfaces are handled there. I haven't seen one patch this size for JDBC since I've been involved, let alone two competing patches to implement the same feature. Small patches which can be quickly handled don't make sense to put into the process, but it seemed reasonable for these.

In any way I'm sending this patch, and I will put this under Miscellaneous in CF. This cleared patch takes only 47k (in uncleared was some binary read classes) and about 50% of it is a big test case.

I changed the patch's topic to JDBC. https://commitfest.postgresql.org/action/patch_view?id=399

I don't think it makes sense to try to manage anything other than core code in the commitfest app. The other patch touched the backend, so it made sense to put it in the commitfest, but as far as I understand it, this one is pure Java code. There is a backlog of JDBC issues to deal with, but I think it needs its own commitfest instead of trying to tack on to the main project's.

We could have separate JDBC CommitFests inside the app if that's helpful - the CommitFests are by convention named YYYY-MM, but the app will support arbitrary names. The only problem I see is that it would mess up the calculation of the currently open CF and the currently in progress CF and the most recently closed CF. I'd be willing to put in the work to fix that, though, if you guys want to use the app too. For now I suggest we mark this Returned with Feedback.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] knngist - 0.8
On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes: On Mon, Nov 22, 2010 at 8:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: On balance I'm inclined to leave the unique key as per previous proposal (with a "purpose" column) and add the which-sort-order-is-that information as payload columns that aren't part of the key.

This is probably OK too, although I confess I'm a lot less happy about it now that you've pointed out the need for those payload columns.

The reason I said "columns" is that I can foresee eventually wanting to specify a pathkey in its entirety --- opfamily, asc/desc, nulls_first, and whatever we come up with for collation. We don't currently need to store more than the opfamily, since the others can never need to have non-default values given the current implementation of KNNGIST. But the idea that they might all be there eventually makes me feel that we don't want to try to incorporate this data in pg_amop's unique key. I'm satisfied to say that only one sort order can be associated with a particular operator in a particular opclass, which is what would be implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

Does that imply that KNNGIST would only be able to support one ordering per AMOP_ORDER-operator, or does it imply that each such ordering would require a separate strategy number? The second might be OK, but the first sounds bad.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] knngist - 0.8
Robert Haas robertmh...@gmail.com writes:

On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm satisfied to say that only one sort order can be associated with a particular operator in a particular opclass, which is what would be implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

Does that imply that KNNGIST would only be able to support one ordering per AMOP_ORDER-operator, or does it imply that each such ordering would require a separate strategy number? The second might be OK, but the first sounds bad.

It would be the first, because simply assigning another strategy number only satisfies one of the unique constraints on pg_amop. To allow arbitrary flexibility here, we would have to include all components of the ordering specification in the unique constraint that's presently just (amopopr, amopfamily) and is proposed to become (amopopr, amopfamily, amoppurpose). I think that's an undue amount of complexity to support something that's most likely physically impossible from the index's standpoint anyway.

regards, tom lane
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:

No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands.

Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits.

We could extend the existing logic to handle multi-byte characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane.
--
greg
Re: [HACKERS] knngist - 0.8
On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes: On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm satisfied to say that only one sort order can be associated with a particular operator in a particular opclass, which is what would be implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component. Does that imply that KNNGIST would only be able to support one ordering per AMOP_ORDER-operator, or does it imply that each such ordering would require a separate strategy number? The second might be OK, but the first sounds bad.

It would be the first, because simply assigning another strategy number only satisfies one of the unique constraints on pg_amop. To allow arbitrary flexibility here, we would have to include all components of the ordering specification in the unique constraint that's presently just (amopopr, amopfamily) and is proposed to become (amopopr, amopfamily, amoppurpose). I think that's an undue amount of complexity to support something that's most likely physically impossible from the index's standpoint anyway.

Or, you'd need to pass these details separately to the AM, which seems like a more flexible way of doing it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] knngist - 0.8
Robert Haas robertmh...@gmail.com writes:

On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: It would be the first, because simply assigning another strategy number only satisfies one of the unique constraints on pg_amop. To allow arbitrary flexibility here, we would have to include all components of the ordering specification in the unique constraint that's presently just (amopopr, amopfamily) and is proposed to become (amopopr, amopfamily, amoppurpose). I think that's an undue amount of complexity to support something that's most likely physically impossible from the index's standpoint anyway.

Or, you'd need to pass these details separately to the AM, which seems like a more flexible way of doing it.

A more flexible way of doing what? The first requirement here is that the catalog entries provide sufficient information to determine the semantics. You can't just say "this opclass supports ordering", you have to specify what that ordering is. Punting to the index AM helps not at all, unless your proposal is to hard-wire this in GIST rather than in the core planner. We will probably *also* want to pass these details explicitly to the index AM, but that doesn't solve the problem that some catalog somewhere has to say what it is that the opclass can do.

regards, tom lane
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
I should correct what I just wrote: the first and last entries in entryvec->vector are invalid.

On Nov 23, 2010, at 11:39 AM, Peter Tanski wrote:

Picksplit() seems to be an exceptional case here: the first and last numbers in entryvec are invalid, so entryvec->vector[entryvec->n - 1] is invalid. All the other GiST Picksplit() functions use the same convention. For example, see the btree_gist picksplit function at http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241

    OffsetNumber    i,
                    maxoff = entryvec->n - 1;

On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:

Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300 2010:

There are checks inside the Picksplit() function for the number of entries:

    OffsetNumber    maxoff = entryvec->n - 1;
    int             n_entries, j;

    n_entries = Max(maxoff, 1) - 1;
    j = 0;
    for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
    {
        FPrint     *v = deserialize_fprint(entv[i].key);

Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong.
--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] knngist - 0.8
I wrote:

We will probably *also* want to pass these details explicitly to the index AM, but that doesn't solve the problem that some catalog somewhere has to say what it is that the opclass can do.

... although having said that, the obvious question is why that catalog has to be pg_amop. Maybe we should leave pg_amop alone (so that it represents only search operators) and invent a new catalog pg_amorderop. I envision it having the same columns as pg_amop, plus an ordering opclass OID (and maybe we might as well stick in asc/desc and nulls_first). The reason to do this instead of just adding those columns to pg_amop is that then we can have a different set of unique indexes. I'm thinking about (amopfamily, amoplefttype, amoprighttype, amopstrategy), which would be the same as in pg_amop, plus (amopopr, amopfamily, amopstrategy). This would allow a single operator to be registered under multiple strategy numbers, which presumably would carry different payload sort-order-specification columns.

This still seems like overkill to me, because I don't actually believe that it's practical for an index to support multiple sort orders. But if anyone would like to make an argument why that's not pie in the sky, this might be the way to represent it.

regards, tom lane
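An illustrative C layout for the proposed catalog, mirroring the FormData_pg_amop convention; the pg_amorderop name and the first unique key are from the mail above, while the column names for the extra payload are assumptions:

    #include "postgres.h"

    /*
     * One row of the hypothetical pg_amorderop: the pg_amop columns,
     * followed by the sort-order specification payload.  Unique keys
     * would be (amopfamily, amoplefttype, amoprighttype, amopstrategy)
     * and (amopopr, amopfamily, amopstrategy).
     */
    typedef struct FormData_pg_amorderop
    {
        Oid     amopfamily;     /* operator family */
        Oid     amoplefttype;   /* left input type */
        Oid     amoprighttype;  /* right input type */
        int16   amopstrategy;   /* strategy number */
        Oid     amopopr;        /* the ordering operator itself */
        Oid     amopmethod;     /* index access method */
        Oid     amopsortfamily; /* opfamily defining the sort order */
        bool    amopsortdesc;   /* DESC order? */
        bool    amopnullsfirst; /* NULLS FIRST? */
    } FormData_pg_amorderop;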
Re: [HACKERS] knngist - 0.8
On Tue, Nov 23, 2010 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes: On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: It would be the first, because simply assigning another strategy number only satisfies one of the unique constraints on pg_amop. To allow arbitrary flexibility here, we would have to include all components of the ordering specification in the unique constraint that's presently just (amopopr, amopfamily) and is proposed to become (amopopr, amopfamily, amoppurpose). I think that's an undue amount of complexity to support something that's most likely physically impossible from the index's standpoint anyway.

Or, you'd need to pass these details separately to the AM, which seems like a more flexible way of doing it.

A more flexible way of doing what? The first requirement here is that the catalog entries provide sufficient information to determine the semantics. You can't just say "this opclass supports ordering", you have to specify what that ordering is. Punting to the index AM helps not at all, unless your proposal is to hard-wire this in GIST rather than in the core planner.

Eh, let's just do it the way you want to do it. It's probably going to have to be redone the next time somebody wants to make an enhancement in this area, but I guess it's going to be easier to do that then than to figure out where to go with it now.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
Picksplit() seems to be an exceptional case here: the first and last numbers in entryvec are invalid, so entryvec->vector[entryvec->n - 1] is invalid. All the other GiST Picksplit() functions use the same convention. For example, see the btree_gist picksplit function at http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241

    OffsetNumber    i,
                    maxoff = entryvec->n - 1;

On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:

Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300 2010:

There are checks inside the Picksplit() function for the number of entries:

    OffsetNumber    maxoff = entryvec->n - 1;
    int             n_entries, j;

    n_entries = Max(maxoff, 1) - 1;
    j = 0;
    for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
    {
        FPrint     *v = deserialize_fprint(entv[i].key);

Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong.
--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] NLS builds on windows and lc_messages
So I'm still trying to look into http://archives.postgresql.org/pgsql-hackers/2010-11/msg00686.php which is still broken in 8.4-STABLE even after Tom's patch. To get better debugging I was trying to get non-translated error messages in the log, which seems to be completely impossible on Windows (neither through postgresql.conf nor in a session - at least on my custom build).

* lc_messages on Windows just accepts anything (including ridiculously large values with escapes and all that - which might actually be a problem) but does not seem to do anything.

* The docs claim "If you want the system to behave as if it had no locale support, use the special locale C or POSIX." - setting it to 'C' on Windows does not work either.

* If that stuff is not working at all on Windows, we clearly need to add an appropriate sentence or two to the docs - but this seems like a really annoying and stupid limitation.

Stefan
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Greg Stark gsst...@mit.edu writes:

On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits.

We could extend the existing logic to handle multi-byte characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane.

Not easily, cheaply, or portably. The closest you could get in that line would be to use towlower(), which doesn't exist everywhere (though I grant probably most platforms have it by now). The much much bigger problem though is that we don't know what character representation towlower() deals in. We recently kluged the regex code to assume that the wchar_t representation for UTF8 locales is the standardized Unicode code point. I haven't heard of that breaking, but 9.0 hasn't been out that long. In other multibyte encodings we have no idea how to use that function, short of invoking mbstowcs/wcstombs or local equivalent, which is expensive and doesn't readily allow a short-circuit for ASCII.

And, after you've hacked your way through all that, you still end up with case-folding behavior that depends on the prevailing locale. Which is dangerous for the previously cited reasons, and arguably not spec-compliant.

regards, tom lane
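For reference, the expensive path described above would look roughly like this; a sketch only, with fixed buffers and minimal error handling, and with both conversions consulting the current locale, which is exactly the dependency being criticized:

    #include <stdlib.h>
    #include <wchar.h>
    #include <wctype.h>

    /*
     * Locale-dependent downcasing via the wide-character round trip:
     * decode with mbstowcs(), fold each wchar_t with towlower(), and
     * re-encode with wcstombs().  Note there is no cheap short-circuit
     * for all-ASCII input here.
     */
    static void
    downcase_via_wchar(const char *src, char *dst, size_t dstlen)
    {
        wchar_t     wbuf[256];
        size_t      n = mbstowcs(wbuf, src, 255);
        size_t      i;

        if (n == (size_t) -1)
            return;             /* invalid multibyte input */

        for (i = 0; i < n; i++)
            wbuf[i] = towlower((wint_t) wbuf[i]);
        wbuf[n] = L'\0';
        wcstombs(dst, wbuf, dstlen);
    }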
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On 11/23/2010 11:14 AM, Greg Stark wrote:

On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands. Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits.

We could extend the existing logic to handle multi-byte characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane.

What casing rules will you apply? How will you know what is an upper-case character and what its lower-case counterpart is? The sad, short answer is that there are no simple rules beyond ASCII. See the URL I posted upthread.

cheers

andrew
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Tue, Nov 23, 2010 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:

And, after you've hacked your way through all that, you still end up with case-folding behavior that depends on the prevailing locale. Which is dangerous for the previously cited reasons, and arguably not spec-compliant.

So I thought the problem with the Turkish locale definition was that it redefined how a capital ASCII character which was present in standard SQL identifiers was lowercased, resulting in standard SQL syntax not working. I'm not sure I understand the danger if a user creates an object in a database with a particular encoding and locale, using that locale for downcasing in the future. We don't currently support changing the locale of a database or using different locales in the future. Even with Peter's patch I think we can reasonably require the user to specify a single locale which controls how the SQL identifiers are interpreted, regardless of the collations used in the operations.

The points about the C API being limited and nonportable are a different issue. I guess I would need to do research to see if we're missing something which would help here. Otherwise I might be beginning to see the value in that /other/ library which I've argued against in the past.
--
greg
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Greg Stark gsst...@mit.edu writes:

I'm not sure I understand the danger if a user creates an object in a database with a particular encoding and locale using that locale for downcasing in the future.

The case I was worried about is dumping from one database and reloading into another one with a different locale. Although I suppose there are enough *other* reasons why that might fail that adding changes of downcasing behavior might not be a big deal.

regards, tom lane
Re: [HACKERS] SQL/MED estimated time of arrival?
On 23.11.2010 14:22, Shigeru HANADA wrote:

On Tue, 23 Nov 2010 12:30:52 +0200 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The docs need some work. The CREATE FOREIGN TABLE reference page seems to be copy-pasted from CREATE TABLE, because it talks about constraints and WITH/WITHOUT OIDS which surely don't apply to foreign tables.

Thanks for the comments. The page you pointed has been edited for foreign table. In current design, OID system column and CHECK constraints are supported.

Oh, ok.

OID is supported to get oid from the source table (yes, it works only for postgresql_fdw but it seems useful to support).

I don't think that's worthwhile. Oids on user tables is a legacy feature, not recommended for new applications. And if you have to access an existing table that uses oids, you can define a regular column for the oid:

    CREATE FOREIGN TABLE foreigntable (oid oid, data int4) SERVER myserver;

CHECK constraint is supported to enable constraint exclusion.

Hmm, my gut reaction is that that's a premature optimization. But what about DEFAULTs then, surely that doesn't make sense for a foreign table?

I agree that some kind of documents, such as How to create new FDW, should be written.

A well-documented file FDW implementation goes a long way for that. But a chapter that explains SQL/MED, how to create foreign tables, servers, user mappings etc, and how they behave. That we need.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Tue, Nov 23, 2010 at 5:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

The case I was worried about is dumping from one database and reloading into another one with a different locale. Although I suppose there are enough *other* reasons why that might fail that adding changes of downcasing behavior might not be a big deal.

If you dump the whole database then pg_dump would create the new database with the correct encoding and locale. If you change it, then that can already cause the load to fail if the data can't be converted to the new encoding. And as you point out, there are all kinds of ways you can cause that to fail by making the context incompatible with the definitions you're loading.

The lesson we learned in the past is that we have to ignore the locale for all the characters present in the standard identifiers. Beyond that, I think this is just an implementation problem, which may be a show-stopper in itself, but if we can do anything with multibyte characters it's probably an improvement over what we do now.
--
greg
Re: [HACKERS] SQL/MED estimated time of arrival?
On Tue, 2010-11-23 at 20:18 +0200, Heikki Linnakangas wrote:

On 23.11.2010 14:22, Shigeru HANADA wrote: OID is supported to get oid from the source table (yes, it works only for postgresql_fdw but it seems useful to support).

I don't think that's worthwhile. Oids on user tables is a legacy feature, not recommended for new applications.

Agreed. We should do everything we can to NOT encourage their use.

JD
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] SQL/MED estimated time of arrival?
On Tue, Nov 23, 2010 at 5:09 AM, Shigeru HANADA han...@metrosystems.co.jp wrote:

I've separated the patch into three parts. They have code, documents and tests within, and file_fdw and pgsql_fdw can each be applied onto fdw_core, or together. I hope the separation helps the review of the patches. Contents of each patch are:

fdw_core : DDL for FDW HANDLER and FOREIGN TABLE, ForeignScan
file_fdw : FDW for files, as a contrib (loadable) module
pgsql_fdw: FDW for PG, as a contrib module

I think it would be useful to repost each of these on their own thread and adjust the CF app to have an entry referring to each thread. That way we can keep discussion and review for each one separate from the others. Note that this thread is already totally off-topic anyway - it started as a question about when SQL/MED would land and then it got patches posted to it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] final patch - plpgsql: for-in-array
On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

ok, I can only recapitulate: this feature was proposed circa two months ago, and at minimum Tom and maybe you agreed - with a request regarding syntax - do you remember? I am a little bit tired that this agreement was changed after I spent my time on this.

I went back and reread the thread I believe you're speaking about. The first post is here:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

I cannot find one single email on that thread where Tom or I or anyone else endorsed the syntax you've proposed here; indeed, it and some other suggestions were roundly criticized. You responded to that by saying that the arguments against it were all wrong, but no one other than you ever appeared to believe that. There are a few emails on that thread where other people agreed that it would be nice, in theory, to have some syntax for this, but there is not one single email that I see saying that any syntax you proposed was a good one. If you read that thread and concluded that there was consensus to implement this syntax, you did not read it very carefully.

If we had ELEMENT as a reserved keyword (which apparently it is in some versions of the SQL standard), maybe FOR ELEMENT wunk IN wunkarray... would be sufficiently unambiguous. But it's not even an unreserved keyword right now, and I have a hard time thinking it would be worth reserving it just for this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] s/LABEL/VALUE/ for ENUMs
On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote:

... original patch. Sorry.

Let's not fiddle with the names.

To be clear, as things stand now, the new command is:

    ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } existing_enum_value ]

So while the term in the SQL statement is VALUE, it's called a "label" in the documentation. I think that's confusing. Does anyone else?

Best,

David
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On sön, 2010-11-21 at 18:48 -0500, Tom Lane wrote: Yeah. I'm actually not sure that the SQL committee has thought very hard about this, because the spec is worded as though they think that Unicode case normalization is all they have to say to uniquely define what to do. The Unicode guys recognize that case mapping is locale-specific, which puts us right back at square one. But leaving spec compliance aside, we know from bitter experience that we cannot use a definition that lets the Turkish locale fool with the mapping of i/I. I suspect that locale-dependent mappings of any other characters are just as bad, we simply haven't had enough users burnt by such cases to have an institutional memory of it. The number of locale-dependent case mappings in the entire universe of Unicode is actually limited to 7 cases for Lithuanian and 8 cases for Turkish. (ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt) So it would be fair to say that there is a default case mapping, and that is what the SQL standard presumably refers to. One thing that we could do is let the user declare that he thinks his current locale is consistent with the Unicode case normalization, and apply the full Unicode conversion if so. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
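To make the Turkish i/I hazard concrete, here is a small illustrative program (my own sketch, not from the thread); it assumes a POSIX system with the tr_TR.UTF-8 locale installed:

    #include <locale.h>
    #include <stdio.h>
    #include <wctype.h>

    int
    main(void)
    {
        /* Assumption: the tr_TR.UTF-8 locale is installed on this system. */
        if (setlocale(LC_CTYPE, "tr_TR.UTF-8") == NULL)
            return 1;
        /*
         * Under the Turkish locale, lowercasing ASCII 'I' yields dotless i
         * (U+0131) rather than 'i', so identifier case-folding becomes
         * locale-dependent -- exactly the behavior discussed above.
         */
        printf("towlower('I') = U+%04X\n", (unsigned int) towlower(L'I'));
        return 0;
    }

Under a C or en_US.UTF-8 locale the same call prints U+0069; a fixed, locale-independent default case mapping would make identifier folding immune to such quirks.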
Re: [HACKERS] patch: fix performance problems with repeated decompression of varlena values in plpgsql
Excerpts from Pavel Stehule's message of lun nov 22 10:01:23 -0300 2010: sorry - it is related to tests from FOR-IN-ARRAY thread test query: select avg(array_upper(filter02(x,'%AA%', 10),1)) from t1000; Yeah, I can measure a 25x improvement in that test with the patch applied. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s/LABEL/VALUE/ for ENUMs
On Tue, Nov 23, 2010 at 2:01 PM, David E. Wheeler da...@kineticode.com wrote: On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote: ... original patch. Sorry. Let's not fiddle with the names. To be clear, as things stand now, the new command is: ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } existing_enum_value ] So while the term in the SQL statement is VALUE, it's called a label in the documentation. I think that's confusing. Does anyone else? Yes. As between the two options, I favor changing the command. And let's also paint it pink. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s/LABEL/VALUE/ for ENUMs
On Nov 23, 2010, at 11:48 AM, Robert Haas wrote: So while the term in the SQL statement is VALUE, it's called a label in the documentation. I think that's confusing. Does anyone else? Yes. As between the two options, I favor changing the command. And let's also paint it pink. Would that go with Andrew's boots and buttons? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s/LABEL/VALUE/ for ENUMs
On Tue, 2010-11-23 at 14:48 -0500, Robert Haas wrote: On Tue, Nov 23, 2010 at 2:01 PM, David E. Wheeler da...@kineticode.com wrote: On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote: ... original patch. Sorry. Let's not fiddle with the names. To be clear, as things stand now, the new command is: ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } existing_enum_value ] So while the term in the SQL statement is VALUE, it's called a label in the documentation. I think that's confusing. Does anyone else? Yes. As between the two options, I favor changing the command. And let's also paint it pink. We discussed this at West, remember? When asked to paint a wall black, we ask, "which shade?" JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote: j = 0; for (i = FirstOffsetNumber; i maxoff; i = OffsetNumberNext(i)) { FPrint* v = deserialize_fprint(entv[i].key); Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong. The first for loop of all others compare with i = maxoff instead of i maxoff. You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.) On the other hand, there are two problems: 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size. 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself: postgres=# set enable_seqscan=false; SET postgres=# set enable_indexscan=true; SET postgres=# create table fps2 (id serial, soid character(24) not null, fingerprint fprint not null); NOTICE: CREATE TABLE will create implicit sequence fps2_id_seq for serial column fps2.id CREATE TABLE postgres=# create index fps2_fingerprint_ix on fps2 using gist (fingerprint fprint_gist_ops); CREATE INDEX postgres=# \i xaa psql:xaa:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# \i xab psql:xab:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# \i xac psql:xac:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# \i xad psql:xad:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# select gist_stat('fps2_fingerprint_ix'); gist_stat --- Number of levels: 1 + Number of pages: 1 + Number of leaf pages: 1 + Number of tuples: 4 + Number of invalid tuples: 0 + Number of leaf tuples: 4 + Total size of tuples: 5628 bytes+ Total size of leaf tuples: 5628 bytes+ Total size of index: 8192 bytes+ postgres=# \i xae psql:xae:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# select gist_stat('fps2_fingerprint_ix'); gist_stat --- Number of levels: 1 + Number of pages: 1 + Number of leaf pages: 1 + Number of tuples: 5 + Number of invalid tuples: 0 + Number of leaf tuples: 5 + Total size of tuples: 7032 bytes+ Total size of leaf tuples: 7032 bytes+ Total size of index: 8192 bytes+ postgres=# \i xaf psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:659] entered picksplit psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:838] split: 3 left, 2 right psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress INSERT 0 1 postgres=# select gist_stat('fps2_fingerprint_ix'); gist_stat Number of levels: 2 + Number of pages: 3 + Number of leaf pages: 2 + Number of tuples: 7 + Number of invalid tuples: 0 + Number of leaf tuples: 5 + Total size of tuples: 
9864 bytes + Total size of leaf tuples: 7044 bytes + Total size of index: 24576 bytes+ postgres=# select id, soid from fps2; id | soid +-- 1 | 4c65a39d4d9bca2c3382 2 | 4c65a39d4d9bca2c338a 3 | 4c65a39d4d9bca2c3390 4 | 4c65a39d4d9bca2c3399 5 | 4c65a39d4d9bca2c33a5 6 | 4c65a39d4d9bca2c33a8 postgres=# select f1.id, f2.id, fprint_cmp(f1.fingerprint, f2.fingerprint) from fps2 f1 join fps2 f2 on f1.fingerprint=f2.fingerprint; id | id |fprint_cmp ++-- 1 | 1 | 1.00031467691569 2 | 2 | 1.00031467691569 4 | 4 | 1.00031467691569 5 | 5 | 1.00031467691569 6 | 6 | 1.00031467691569 So GiST does not include a tuple for row 3; one of the old tuples. After inserting a few more rows to trigger another Picksplit(): postgres=# \i xag
[HACKERS] Horizontal Write Scaling
I know there has been a lot of talk about replication getting built into Postgres and I know of many projects that aim to fill the role. However, I have not seen much in the way of a serious attempt at multi-master write scaling. I understand the fundamental problem with write scaling across multiple nodes is disk I/O and inter-node communication latency, and that in the conventional synchronous, multi-master replication type setup you would be limited to the speed of the slowest node, plus the communication protocol overhead and latency. However, it occurs to me that if you had a shared disk system via either iSCSI, Fiber Channel, NFS, or whatever (which also had higher I/O capabilities than a single server could utilize), if you used a file system that supported locks on a particular section (extent) of a file, it should theoretically be possible for multiple Postgres instances on multiple systems sharing the database to read and write to the database without causing corruption. Obviously, it would take some carefully designed code in terms of file extent locking, but it seems like it would also eliminate the need entirely for any type of replication system. Lustre seems to support the type of file locking required. Of course, I am assuming the disk system would be RAID 1, RAID 10, RAID 5, or RAID 6 for reliability purposes and that it is sufficiently redundant that you don't have to worry about an outage of your storage system. Has anyone put any thought into what it would take to do this in Postgres? Is it simply a matter of making the database file interaction code aware of extent locking, or is it considerably more involved than that? It also occurs to me that you probably need some form of transaction ordering mechanism across the nodes based on synchronized timestamps, but it seems Postgres-R has the required code to do that portion already written. That may not even be needed since all nodes would actually be working on the same database files and the locks would ensure a strict ordering of queries. The only place I see that possibly causing a problem is someone load balancing across the servers and sending a delete and insert really close to each other to different nodes, such that if the insert executes first, it would be deleted by the delete even though the intent was to have the delete run first. Timestamp ordering of the queries or just some shared transaction ID across the nodes would eliminate that possibility. Other than that, is there anything else I am missing? Wouldn't this type of setup be far simpler to implement and provide better scalability than trying to do multi-master replication using log shipping or binary object shipping or any other techniques? Wouldn't it also be far more efficient since you don't need to have a copy of your data on each master node and therefore also don't have to ship your data to each node and have each node process it? I am mostly asking for educational purposes, and I would appreciate technical (and hopefully specific) explanations as to what in Postgres would need to change to support this.
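For readers unfamiliar with the byte-range ("extent") locking the above relies on, here is a minimal hedged sketch using POSIX record locks. It assumes a cluster filesystem (such as Lustre) that actually enforces fcntl() locks coherently across nodes, which many shared filesystems do not; the function name is illustrative:

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Take an exclusive lock on one "extent" of a shared data file.
     * Assumption: the underlying cluster filesystem honors POSIX
     * record locks across all nodes, not just within one host.
     */
    static int
    lock_extent(int fd, off_t offset, off_t len)
    {
        struct flock fl;

        fl.l_type = F_WRLCK;            /* exclusive write lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = offset;            /* first byte of the extent */
        fl.l_len = len;                 /* extent length in bytes */
        return fcntl(fd, F_SETLKW, &fl); /* block until granted */
    }

Releasing is the same call with F_UNLCK; whether such locks are cheap enough for page-level traffic is exactly the open question raised in the mail above.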
Re: [HACKERS] ALTER TYPE recursion to typed tables
On ons, 2010-11-17 at 21:05 +0100, Dimitri Fontaine wrote: Code wise, though, I wonder about the name of the recursing parameter of the renameatt_internal function in src/backend/commands/tablecmds.c, which seems to only get used to detect erroneous attempts at renaming the table column directly. Maybe it's only me not being used enough to PostgreSQL code yet, but here it distracts the code reader. Having another parameter called recurse is not helping either, but I don't see that this one needs to be changed. This parameter has only minimal use in the renameatt case, but the same terminology is used throughout the ALTER TABLE code, so I think it's wise to keep it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horizontal Write Scaling
On Tue, Nov 23, 2010 at 3:43 PM, Eliot Gable egable+pgsql-hack...@gmail.com wrote: snip Other than that, is there anything else I am missing? Wouldn't this type of setup be far simpler to implement and provide better scalability than trying to do multi-master replication using log shipping or binary object shipping or any other techniques? Wouldn't it also be far more efficient since you don't need to have a copy of your data on each master node and therefore also don't have to ship your data to each node and have each node process it? I am mostly asking for educational purposes, and I would appreciate technical (and hopefully specific) explanations as to what in Postgres would need to change to support this. Now that I think about this more, it seems you would still need to ship the transactions to your other nodes and have some form of processing system on each that knew which node was supposed to be executing each transaction and whether that node is currently online. It would also have to have designated backup nodes to execute the transaction on. Otherwise, you could end up waiting forever for a transaction to finish that was sent to one node right before that node lost power. However, if a transaction manager on each node is able to figure out the ordering of the transactions for itself based on some globally incrementing transaction ID and able to figure out which node will be executing the transaction and which node is the backup if the first one fails, etc., then if the backup sees the primary for that transaction go offline, it could execute the transaction instead. Then, I suppose you also need some system in Postgres which can allow concurrent processing of transactions such that they don't process stuff in a transaction which is dependent on a transaction that has not yet been committed, but can process other stuff. So, evaluation of deterministic functions could take place, but anything volatile could not until all previous transactions finished. I assume Postgres already has something like this in order to scale across multiple cores in a single box. This setup would basically make all the master nodes for the database look like just extra memory and CPU cores.
Re: [HACKERS] Horizontal Write Scaling
Eliot Gable egable+pgsql-hack...@gmail.com wrote: the locks would ensure a strict ordering of queries. PostgreSQL doesn't support S2PL. I'm not sure what locks you mean. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Tue, Nov 23, 2010 at 15:02, Craig Ringer cr...@postnewspapers.com.au wrote: However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. There's no reason it should require a postmaster restart. New backends spawned after the handler is turned on would enable it, and existing backends could be signalled to enable it as well. I think that came off my comment that we could store the on/off in the startup shared memory block. It'd then be the only way to get it into any existing backends. on-by-default is what we gain. I think that's fairly big... More than that. If a crash handler is in core, then: - It can be inited much earlier, catching crashes that happen during loading of libraries and during later backend init; and Yeah. - It's basically free when the cost of shared library loading is removed, so it can be left on in production or even be on by default. I need to do some testing on this, but given the apparently near-zero cost of initing the handler I won't be surprised if testing a GUC to see if the handler should be on or not costs more than loading it does. I'm fairly certain it does. The GUC would be there to be able to turn the whole thing off because you don't want the dumps, not for performance reasons. I still wouldn't support on-by-default because you'd need an automatic process to weed out old crash dumps or limit the number stored. That's a bigger job. So I think the admin should have to turn it on, but it'd be good to make it easy to turn on in production without a restart; I don't see why that'd be hard. I think the admin should deal with that - just like the admin has to clear out the old logs. I'll try to put together a patch that does just that, time permitting. Things are kind of hectic ATM. Let me know if you want me to look at adapting the patch for that - i can do that if you prefer. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au wrote: On 11/23/2010 01:46 AM, Tom Lane wrote: * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. The crash dump facility in dbghelp.dll offers *optional* features where apps can stream in additional app-specific data like recent log excerpts, etc. I didn't try to implement anything like that in the initial module partly because I want to minimize the amount of the backend that must be working for a crash dump to succeed. Yeah, we already have the logs in the logfile etc. Let's keep this uncomplicated so that it stays working :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
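A minimal hedged sketch of that directory-as-switch idea on Windows (purely illustrative - the directory name, dump file name, and dump type are my assumptions, not the patch's actual code; it links against dbghelp.lib):

    #include <windows.h>
    #include <dbghelp.h>

    /*
     * Install this filter unconditionally at startup; the existence of
     * a "crashdumps" directory then acts as the on/off switch, so dumps
     * can be enabled for running backends without a restart.
     */
    static LONG WINAPI
    crash_handler(EXCEPTION_POINTERS *ei)
    {
        if (GetFileAttributesA("crashdumps") != INVALID_FILE_ATTRIBUTES)
        {
            HANDLE  f = CreateFileA("crashdumps\\postgres.mdmp",
                                    GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                                    FILE_ATTRIBUTE_NORMAL, NULL);

            if (f != INVALID_HANDLE_VALUE)
            {
                MINIDUMP_EXCEPTION_INFORMATION mei;

                mei.ThreadId = GetCurrentThreadId();
                mei.ExceptionPointers = ei;
                mei.ClientPointers = FALSE;
                MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                                  f, MiniDumpNormal, &mei, NULL, NULL);
                CloseHandle(f);
            }
        }
        return EXCEPTION_EXECUTE_HANDLER;
    }

    /* at process startup, unconditionally: */
    /* SetUnhandledExceptionFilter(crash_handler); */

The only cost when the directory is absent is one failed attribute lookup at crash time, which is why the handler can be left installed everywhere.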
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au wrote: Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. Or you set the handler always, and have the handler only actually create the dump if the directory exists. +1 for that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Horizontal Write Scaling
On 24/11/10 09:43, Eliot Gable wrote: However, it occurs to me that if you had a shared disk system via either iSCSI, Fiber Channel, NFS, or whatever (which also had higher I/O capabilities than a single server could utilize) Yeah, current Postgres multi-master projects seem to be focusing on a shared-nothing architecture as opposed to a shared-disk one. I guess the advantage of the former is that specialized (i.e., expensive) hardware is not required to overcome the point of failure of shared-disk systems - the disk they share. Cheers Mark
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/24/2010 05:18 AM, Magnus Hagander wrote: Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. That's way smarter. No extra filesystem access during startup, even if it is cheap. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unlogged tables
On Sun, Nov 21, 2010 at 11:07 PM, Andy Colson a...@squeakycode.net wrote: After all this... there are too many numbers for me. I have no idea what this means. I think what it means is that, for you, unlogged tables were almost as fast as shutting off all of synchronous_commit, full_page_writes, and fsync, and further setting bgwriter_lru_maxpages=0. Now, that seems a little strange, because you'd think if anything it would be faster. I'm not sure what accounts for the difference, although I wonder if checkpoints are part of it. With the current code, which doesn't exclude unlogged table pages from checkpoints, a checkpoint will still be faster with fsync=off than with unlogged tables. It seems like we're agreed that this is a problem to be fixed in phase two, though, either by fsyncing every unlogged table we can find at shutdown time, or else by providing two durability options, one that works as the current code does (but survives clean shutdowns) and another that excludes dirty pages from checkpoints (and does not survive clean shutdowns). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/23 Robert Haas robertmh...@gmail.com: On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote: ok, I can only recapitulate: this feature was proposed circa two months ago, and at minimum Tom and maybe you agreed - with a request about syntax - do you remember? I am a little bit tired that this agreement was changed after I spent my time on this. I went back and reread the thread I believe you're speaking about. The first post is here: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php Here perhaps? (or before) http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php I cannot find one single email on that thread where Tom or I or anyone else endorsed the syntax you've proposed here; Nah, but you didn't disagree on the main idea, you just said, 'like Tom I agree that the syntax must be updated to something better', more or less. indeed, it and some other suggestions were roundly criticized. You responded to that by saying that the arguments against it were all wrong, but no one other than you ever appeared to believe that. There are a few emails on that thread where other people agreed that it would be nice, in theory, to have some syntax for this, but there is not one single email that I see saying that any syntax you proposed was a good one. If you read that thread and concluded that there was consensus to implement this syntax, you did not read it very carefully. I think you (Robert) dramatically misunderstood what Pavel is trying to do. Pavel did excellent optimization work for a specific point. This specific point looks crucial to me given the current behavior of PostgreSQL[1]. AFAICS Pavel didn't want to implement a genius syntax, but an optimization feature. I don't care about the syntax; I care about Tom's explanation of it, but no more. I do accept the idea that this patch is just a quick way to cut into the iceberg. It is. And? And we might do it better with deeper analysis and more refactoring, I agree... Still, this patch is interesting enough from a performance point of view not to trash it that quickly, IMO. If we had ELEMENT as a reserved keyword (which apparently it is in some versions of the SQL standard), maybe FOR ELEMENT wunk IN wunkarray... would be sufficiently unambiguous. But it's not even an unreserved keyword right now, and I have a hard time thinking it would be worth reserving it just for this. I am not aware of what the SQL spec says precisely about that. David, does your recent post about the UNNEST stuff look relevant to this thread? I mean, can we elaborate something from your suggestion to improve the situation of the current patch (and vice-versa)? [1] data compression in the array allows inserting billions of items with a small size footprint. (I know it is not pure design; it is just a pure end-user, very effective solution) -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
2010/11/21 Andres Freund and...@anarazel.de: On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote: For a similar problem we had (kernel buffering too much) we had success using the fadvise and madvise WONTNEED syscalls to force the data to exit the cache much sooner than it would otherwise. This was on Linux and it had the side-effect that the data was deleted from the kernel cache, which we wanted, but probably isn't appropriate here. Yep, works fine. Although it has the issue that the data will get read again if archiving/SR is enabled. mmhh... the current code does call DONTNEED or WILLNEED for WAL depending on whether archiving is off or on. This matters *only* once the data is written (fsync, fdatasync); before that it should not have an effect. There is also sync_file_range, but that's Linux-specific, although close to what you want I think. It would allow you to work with blocks smaller than 1GB. Unfortunately that puts the data under quite high write-out pressure inside the kernel - which is not what you actually want because it limits reordering and such significantly. It would be nicer if you could get a mix of both semantics (looking at it, depending on the approach that seems to be about a 10 line patch to the kernel). I.e. indicate that you want to write the pages soonish, but don't put it on the head of the writeout queue. Andres -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
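For readers following along, a minimal hedged sketch (Linux-specific, and purely illustrative - not from any posted patch) of the two kernel facilities under discussion:

    #define _GNU_SOURCE             /* for sync_file_range() */
    #include <fcntl.h>
    #include <unistd.h>

    static void
    flush_and_drop(int fd, off_t offset, off_t len)
    {
        /*
         * Start writeback for just this range. SYNC_FILE_RANGE_WRITE
         * initiates write-out without waiting, which is the "high
         * write-out pressure" behavior Andres mentions: the pages go
         * into the writeback queue immediately.
         */
        sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);

        /* Once the data must be durable anyway... */
        fsync(fd);

        /*
         * ...advise the kernel we will not read it again, so the cached
         * pages can be evicted (the WONTNEED idea; note this defeats
         * caching for archiving/SR, as noted above).
         */
        posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    }

The missing middle ground Andres describes - "write soonish, but not at the head of the queue" - is exactly what neither call provides today.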
[HACKERS] Suggested easy TODO: pg_dump --from-list
Hackers, I just thought of a good feature to put on the [E] list for TODO, if people agree: a switch to pg_dump called --from-list which would take a filename and dump only the objects listed in that file. Objections to putting this on the TODO? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tab completion for view triggers in psql
On Fri, Oct 29, 2010 at 10:33 AM, David Fetter da...@fetter.org wrote: That seems like a matter for a separate patch. Looking this over, I found I'd created a query that can never get used, so please find enclosed the next version of the patch :) I like deletables better than deleteables for Query_for_list_of_deleteables. Sources: dictionary.com and a git grep through the rest of the PG source. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/11/23 Robert Haas robertmh...@gmail.com: On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote: ok, I can only recapitulate so this feature was proposed cca two months ago, and minimally Tom and maybe you did agreement - with request on syntax - do you remember? I am little bit tired so this agreement was changed when I spent my time with this. I went back and reread the thread I believe you're speaking about. The first post is here: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php Here perhaps ? (or before) http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php Dang. You're right. I stand corrected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] security hooks on object creation
Thanks for your review, and sorry for the late reply. I've not been available for a few days. (2010/11/22 12:11), Robert Haas wrote: 2010/11/12 KaiGai Kohei kai...@ak.jp.nec.com: (2010/11/12 19:34), KaiGai Kohei wrote: I revised my patch according to the prior suggestions. I'm sorry. I revised my patch, but did not attach it. Please see this attached one. I'm satisfied with this approach, although I intend to change InvokeObjectAccessHook0 to simply InvokeObjectAccessHook before committing it; OK. We have no other object-access type which takes any arguments right now. It is a quite cosmetic thing, so we can add the number of arguments later, as SysCache does. and correct your use of AttributeRelationId to RelationRelationId for consistency with the rest of the code. Oops, that was my bug. I'll fix it. What I'm not quite sure about is where to put the definitions you've added to a new file utils/hooks.h; I don't feel that's a very appropriate location. It's tempting to put them in utils/acl.h just because this is vaguely access-control related and that header is already included in most of the right places, but maybe that's too much of a stretch; or perhaps catalog/catalog.h, although that doesn't feel quite right either. If we are going to add a new header file, I still don't like utils/hooks.h much - it's considerably more generic than can be justified by its contents. I don't think utils/acl.h is the right place in the long term, because we intend not to restrict the purpose of these hooks to access control, as you mentioned. I think somewhere under the catalog/ directory is a good idea, because these hooks fire on events that (eventually) modify the system catalogs. How about catalog/hooks.h, instead of utils/hooks.h? Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] security hooks on object creation
2010/11/23 KaiGai Kohei kai...@ak.jp.nec.com: What I'm not quite sure about is where to put the definitions you've added to a new file utils/hooks.h; I don't feel that's a very appropriate location. It's tempting to put them in utils/acl.h just because this is vaguely access-control related and that header is already included in most of the right places, but maybe that's too much of a stretch; or perhaps catalog/catalog.h, although that doesn't feel quite right either. If we are going to add a new header file, I still don't like utils/hooks.h much - it's considerably more generic than can be justified by its contents. I don't think utils/acl.h is long-standing right place, because we intended not to restrict the purpose of this hooks to access controls as you mentioned. I think somewhere under the catalog/ directory is a good idea because it hooks events that user wants (eventually) to modify system catalogs. How about catalog/hooks.h, instead of utils/hooks.h? Well, if we're going to create a new header file for this, I think it should be called something like catalog/objectaccess.h, rather than just hooks.h. But I'd rather reuse something that's already there, all things being equal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
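For context, a rough sketch of what such a header might declare, inferred from the discussion above (the enum member, macro shape, and hook name are my assumptions, not the committed code; Oid comes from postgres.h):

    /* catalog/objectaccess.h -- hypothetical sketch */
    typedef enum ObjectAccessType
    {
        OAT_POST_CREATE             /* fired just after an object is created */
    } ObjectAccessType;

    typedef void (*object_access_hook_type) (ObjectAccessType access,
                                             Oid classId,
                                             Oid objectId,
                                             int subId);

    extern object_access_hook_type object_access_hook;

    /* No-op unless a loadable module (e.g. SE-PostgreSQL) installs a hook. */
    #define InvokeObjectAccessHook(access, classId, objectId, subId) \
        do { \
            if (object_access_hook) \
                (*object_access_hook) ((access), (classId), (objectId), (subId)); \
        } while (0)

A module would save the previous hook value at _PG_init time and chain to it, the same convention the other backend hooks follow.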
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help. For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased. On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote: On 2010-11-23 20:54, Peter Tanski wrote: On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote: j = 0; for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) { FPrint* v = deserialize_fprint(entv[i].key); Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong. All the other for loops compare with i <= maxoff instead of i < maxoff. You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec->n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.) On the other hand, there are two problems: 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size. 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself: I noticed an n_entries initialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4. btw: interesting topic, audio fingerprinting! regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
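For reference, the conventional iteration idiom being discussed - a sketch only, inside a picksplit support function where entryvec is the GistEntryVector argument and, per the usual GiST convention, slot 0 is unused:

    /* valid entries live at indexes 1 .. entryvec->n - 1 */
    OffsetNumber i;
    OffsetNumber maxoff = entryvec->n - 1;

    for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
    {
        GISTENTRY  *e = &entryvec->vector[i];

        /* deserialize e->key and assign it to the left or right side */
    }

Both off-by-one bugs in the thread come from deviating from this shape: iterating with i < maxoff drops the last entry, and 0-based indexing reads the unused slot.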
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Mon, Nov 22, 2010 at 6:54 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 21.11.2010 15:18, Robert Haas wrote: On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: So what DO we need to guard against here? I think the general problem can be stated as "process A changes two or more values in shared memory in a fairly short span of time, and process B, which is concurrently examining the same variables, sees those changes occur in a different order than A thought it made them in". In practice we do not need to worry about changes made with a kernel call in between, as any sort of context swap will cause the kernel to force cache synchronization. Also, the intention is that the locking primitives will take care of this for any shared structures that are protected by a lock. (There were some comments upthread suggesting maybe our lock code is not bulletproof; but if so that's something to fix in the lock code, not a logic error in code using the locks.) So what this boils down to is being an issue for shared data structures that we access without using locks. As, for example, the latch structures. So is the problem case a race involving owning/disowning a latch vs. setting that same latch? No. (or maybe that as well, but that's not what we've been concerned about here). As far as I've understood correctly, the problem is that process A does something like this: /* set a shared variable */ ((volatile bool *) shmem)->variable = true; /* Wake up process B to notice that we changed the variable */ SetLatch(); And process B does this: for (;;) { ResetLatch(); if (((volatile bool *) shmem)->variable) DoStuff(); WaitLatch(); } This is the documented usage pattern of latches. The problem arises if process A runs just before ResetLatch, but the effect of setting the shared variable doesn't become visible until after the if-test in process B. Process B will clear the is_set flag in ResetLatch(), but it will not DoStuff(), so it in effect misses the wakeup from process A and goes back to sleep even though it would have work to do. This situation doesn't arise in the current use of latches, because the shared state comparable to shmem->variable in the above example is protected by a spinlock. But it might become an issue in some future use case. Eh, so, should we do anything about this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
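To make the potential fix concrete, here is a schematic sketch (mirroring the argument-less pseudocode above) of where barriers would have to go to close the window; pg_memory_barrier() is a hypothetical primitive, since the tree at this point has no portable barrier:

    /* Process A: publish the flag, then set the latch */
    ((volatile bool *) shmem)->variable = true;
    pg_memory_barrier();   /* hypothetical: order the store before SetLatch */
    SetLatch();

    /* Process B */
    for (;;)
    {
        ResetLatch();
        pg_memory_barrier();   /* hypothetical: order the reset before re-reading the flag */
        if (((volatile bool *) shmem)->variable)
            DoStuff();
        WaitLatch();
    }

In effect SetLatch() would act as a write barrier and ResetLatch() as a read barrier, so a wakeup could never be lost between the reset and the flag test.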
Re: [HACKERS] Tab completion for view triggers in psql
On Tue, Nov 23, 2010 at 09:37:57PM -0500, Josh Kupershmidt wrote: On Fri, Oct 29, 2010 at 10:33 AM, David Fetter da...@fetter.org wrote: That seems like a matter for a separate patch. Looking this over, I found I'd created a query that can never get used, so please find enclosed the next version of the patch :) I like deletables better than deleteables for Query_for_list_of_deleteables. Sources: dictionary.com and a git grep through the rest of the PG source. Thanks for the review. Please find attached a patch changing both this and updateable to updatable, also per the very handy git grep I just learned about :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *** *** 303,308 static const SchemaQuery Query_for_list_of_tables = { --- 303,359 NULL }; + /* The bit masks for the following three functions come from + * src/include/catalog/pg_trigger.h. + */ + static const SchemaQuery Query_for_list_of_insertables = { + /* catname */ + "pg_catalog.pg_class c", + /* selcondition */ + "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS " + "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND t.tgtype | (1 << 2) = t.tgtype))", + /* viscondition */ + "pg_catalog.pg_table_is_visible(c.oid)", + /* namespace */ + "c.relnamespace", + /* result */ + "pg_catalog.quote_ident(c.relname)", + /* qualresult */ + NULL + }; + + static const SchemaQuery Query_for_list_of_deletables = { + /* catname */ + "pg_catalog.pg_class c", + /* selcondition */ + "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS " + "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND t.tgtype | (1 << 3) = t.tgtype))", + /* viscondition */ + "pg_catalog.pg_table_is_visible(c.oid)", + /* namespace */ + "c.relnamespace", + /* result */ + "pg_catalog.quote_ident(c.relname)", + /* qualresult */ + NULL + }; + + static const SchemaQuery Query_for_list_of_updatables = { + /* catname */ + "pg_catalog.pg_class c", + /* selcondition */ + "c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS " + "(SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND t.tgtype | (1 << 4) = t.tgtype))", + /* viscondition */ + "pg_catalog.pg_table_is_visible(c.oid)", + /* namespace */ + "c.relnamespace", + /* result */ + "pg_catalog.quote_ident(c.relname)", + /* qualresult */ + NULL + }; + static const SchemaQuery Query_for_list_of_tisv = { /* catname */ "pg_catalog.pg_class c", *** *** 333,338 static const SchemaQuery Query_for_list_of_tsv = { --- 384,404 NULL }; + static const SchemaQuery Query_for_list_of_tv = { + /* catname */ + "pg_catalog.pg_class c", + /* selcondition */ + "c.relkind IN ('r', 'v')", + /* viscondition */ + "pg_catalog.pg_table_is_visible(c.oid)", + /* namespace */ + "c.relnamespace", + /* result */ + "pg_catalog.quote_ident(c.relname)", + /* qualresult */ + NULL + }; + static const SchemaQuery Query_for_list_of_views = { /* catname */ "pg_catalog.pg_class c", *** *** 630,636 psql_completion(char *text, int start, int end) *prev2_wd, *prev3_wd, *prev4_wd, ! *prev5_wd; static const char *const sql_commands[] = { "ABORT", "ALTER", "ANALYZE", "BEGIN", "CHECKPOINT", "CLOSE", "CLUSTER", --- 696,703 *prev2_wd, *prev3_wd, *prev4_wd, ! *prev5_wd, ! *prev6_wd; static const char *const sql_commands[] = { "ABORT", "ALTER", "ANALYZE", "BEGIN", "CHECKPOINT", "CLOSE", "CLUSTER", *** *** 669,675 psql_completion(char *text, int start, int end) completion_info_charp2 = NULL; /* ! * Scan the input line before our current position for the last five * words. According to those we'll make some smart decisions on what the * user is probably intending to type. TODO: Use strtokx() to do this. */ --- 736,742 completion_info_charp2 = NULL; /* ! * Scan the input line before our current position for the last six * words. According to those we'll make some smart decisions on what the *
Re: [HACKERS] Suggested easy TODO: pg_dump --from-list
On 11/23/2010 09:33 PM, Josh Berkus wrote: Hackers, I just thought of a good feature to put on the [E] list for TODO, if people agree: a switch to pg_dump called --from-list which would take a filename and dump the list of objects in that file. Objections to putting this on the TODO? Well, very little about pg_dump is very [E], IMNSHO. The question in my mind here is what format the list file will take. For example, how would we specify a function? Would we need to specify all the argument types (or at least the IN arguments)? It's not as easy as a list with pg_restore, which is just a list of TOC ids, and all the rest is just a comment in the list file. I certainly don't think we should put this on the list without at least having the idea fleshed out some more. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggested easy TODO: pg_dump --from-list
On Tue, Nov 23, 2010 at 10:24 PM, Andrew Dunstan and...@dunslane.net wrote: Well, very little about pg_dump is very [E], IMNSHO. The question in my mind here is what format the list file will take. For example, how would we specify a function? Would we need to specify all the argument types (or at least the IN arguments)? It's not as easy as a list with pg_restore, which is just a list of TOC ids, and all the rest is just a comment in the list file. I certainly don't think we should put this on the list without at least having the idea fleshed out some more. I think the list should be generated by pg_dump itself in a first run, by building a complete TOC and then dumping a pg_restore -l like list format (without dumpIds) where the user just deletes the objects that he doesn't want to get dumped. The list wouldn't contain dumpIds, but catalogIds and those should be sufficiently unique and easy to parse and compare. Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
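To make that concrete - purely hypothetical values and layout, since no format has been agreed on - a pg_restore -l entry such as

    183; 1259 16386 TABLE public accounts postgres

would, with the dumpId stripped, become a list-file line keyed on the catalogId (tableoid plus oid), e.g.:

    1259 16386 TABLE public accounts postgres
    1259 16402 TABLE public branches postgres
    1255 16410 FUNCTION public tpcb(integer) postgres

The user would delete the lines for objects they don't want dumped and feed the file back to pg_dump --from-list.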
[HACKERS] Assertion failure on hot standby
Hi, http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php When I performed unusual operations (e.g., suspending the bgwriter with gdb, running pgbench -i, and issuing txid_current many times) on the master in order to try to reproduce the above HS error, I encountered the following assertion error. Since I compiled the standby postgres with WAL_DEBUG and ran it with wal_debug = on, all the replayed WAL records were logged. sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09 sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09 sby LOG: consistent recovery state reached at 0/134C0510 sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09 sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0; len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409 sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0; len 20: Standby - running xacts: nextXid 23256 latestCompletedXid 23255 oldestRunningXid 23256 TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1209) sby LOG: database system is ready to accept read only connections sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted sby LOG: terminating any other active server processes Does anyone know what the cause of the problem is? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: too many KnownAssignedXids
On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 19.11.2010 23:46, Joachim Wieland wrote: FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978, pArray->maxKnownAssignedXids: 6890 Hmm, that's a lot of entries in KnownAssignedXids. Can you recompile with WAL_DEBUG, and run the recovery again with wal_debug=on ? That will print all the replayed WAL records, which is a lot of data, but it might give a hint what's going on. Sure, but this gives me only one more line: [...] LOG: redo starts at 1F8/FC00E978 LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid 385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid 3829898/23 FATAL: too many KnownAssignedXids CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23 LOG: startup process (PID 4587) exited with exit code 1 LOG: terminating any other active server processes Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggested easy TODO: pg_dump --from-list
Well, very little about pg_dump is very [E], IMNSHO. The question in my mind here is what format the list file will take. I was thinking the same format as pg_restore -l, only without the dumpIDs. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] profiling connection overhead
Per previous threats, I spent some time tonight running oprofile (using the directions Tom Lane was foolish enough to provide me back in May). I took testlibpq.c and hacked it up to just connect to the server and then disconnect in a tight loop without doing anything useful, hoping to measure the overhead of starting up a new connection. Ha, ha, funny about that: 120899 18.0616 postgres AtProcExit_Buffers 56891 8.4992 libc-2.11.2.so memset 30987 4.6293 libc-2.11.2.so memcpy 26944 4.0253 postgres hash_search_with_hash_value 26554 3.9670 postgres AllocSetAlloc 20407 3.0487 libc-2.11.2.so _int_malloc 17269 2.5799 libc-2.11.2.so fread 13005 1.9429 ld-2.11.2.so do_lookup_x 11850 1.7703 ld-2.11.2.so _dl_fixup 10194 1.5229 libc-2.11.2.so _IO_file_xsgetn In English: the #1 overhead here is actually something that happens when processes EXIT, not when they start. Essentially all the time is in two lines: 56920 6.6006 :for (i = 0; i < NBuffers; i++) :{ 98745 11.4507 :if (PrivateRefCount[i] != 0) Non-default configs: max_connections = 100 shared_buffers = 480MB work_mem = 4MB maintenance_work_mem = 128MB checkpoint_segments = 30 random_page_cost = 2.0 Anything we can do about this? That's a lot of overhead, and it'd be a lot worse on a big machine with 8GB of shared_buffers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
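One obvious direction - sketched here under stated assumptions, not as a worked-out patch - is to make the exit-time check O(1) in the common case by tracking how many PrivateRefCount entries are nonzero, so the full scan can be skipped when the backend holds no pins:

    /* Hypothetical counter, maintained wherever PrivateRefCount changes. */
    static int  PrivateRefCountNonZero = 0;

    static inline void
    AdjustPrivateRefCount(int buf_id, int delta)
    {
        int     old = PrivateRefCount[buf_id];

        PrivateRefCount[buf_id] = old + delta;
        if (old == 0 && delta > 0)
            PrivateRefCountNonZero++;       /* first pin on this buffer */
        else if (old > 0 && old + delta == 0)
            PrivateRefCountNonZero--;       /* last pin released */
    }

    /* In AtProcExit_Buffers: nothing pinned, nothing to scan. */
    if (PrivateRefCountNonZero == 0)
        return;

Since a cleanly exiting backend should hold no pins at all, the O(NBuffers) loop would then only run as a sanity check in the already-broken case.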
Re: [HACKERS] profiling connection overhead
On Wed, Nov 24, 2010 at 12:07 AM, Robert Haas robertmh...@gmail.com wrote: Per previous threats, I spent some time tonight running oprofile (using the directions Tom Lane was foolish enough to provide me back in May). I took testlibpq.c and hacked it up to just connect to the server and then disconnect in a tight loop without doing anything useful, hoping to measure the overhead of starting up a new connection. Oh, right: attachments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company CPU: AMD64 family10, speed 2200 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10 samples %image name symbol name 120899 18.0616 postgres AtProcExit_Buffers 56891 8.4992 libc-2.11.2.so memset 30987 4.6293 libc-2.11.2.so memcpy 26944 4.0253 postgres hash_search_with_hash_value 26554 3.9670 postgres AllocSetAlloc 20407 3.0487 libc-2.11.2.so _int_malloc 17269 2.5799 libc-2.11.2.so fread 13005 1.9429 ld-2.11.2.so do_lookup_x 11850 1.7703 ld-2.11.2.so _dl_fixup 10194 1.5229 libc-2.11.2.so _IO_file_xsgetn 10087 1.5069 postgres MemoryContextAllocZero 9143 1.3659 ld-2.11.2.so strcmp 8957 1.3381 postgres load_relcache_init_file 7132 1.0655 postgres fmgr_info_cxt_security 5630 0.8411 libc-2.11.2.so vfprintf 5029 0.7513 ld-2.11.2.so _dl_lookup_symbol_x 4997 0.7465 postgres _bt_getroot 3935 0.5879 libc-2.11.2.so memcmp 3874 0.5788 postgres hash_seq_search 3718 0.5554 postgres LWLockAcquire 3666 0.5477 postgres guc_name_compare 3457 0.5165 libc-2.11.2.so __strlen_sse2 3297 0.4926 postgres load_relmap_file 3175 0.4743 libc-2.11.2.so malloc 3170 0.4736 postgres LockAcquireExtended 3139 0.4689 postgres hash_any 3113 0.4651 postgres MemoryContextAlloc 2946 0.4401 postgres _bt_compare 2936 0.4386 postgres index_getnext 2885 0.4310 ld-2.11.2.so _dl_sort_fini 2873 0.4292 libc-2.11.2.so _int_free 2619 0.3913 postgres MemoryContextCreate 2579 0.3853 ld-2.11.2.so check_match.12146 2485 0.3712 postgres heap_page_prune_opt 2457 0.3671 postgres LWLockRelease 2438 0.3642 postgres CreateTemplateTupleDesc 2322 0.3469 ld-2.11.2.so _dl_fini 2301 0.3438 postgres set_config_option 2253 0.3366 postgres _bt_first 2225 0.3324 postgres PinBuffer 2140 0.3197 postgres BeginReportingGUCOptions 2091 0.3124 postgres _bt_preprocess_keys 2085 0.3115 libc-2.11.2.so _IO_vfscanf 2051 0.3064 postgres element_alloc 1962 0.2931 postgres ServerLoop 1936 0.2892 postgres CreateTupleDescCopyConstr 1884 0.2815 libc-2.11.2.so __strcpy_sse2 1846 0.2758 libkrb5.so.3.3 /lib64/libkrb5.so.3.3 1801 0.2691 postgres FunctionCall2 1797 0.2685 postgres hash_create 1782 0.2662 postgres PgstatCollectorMain 1761 0.2631 postgres _bt_checkpage 1728 0.2582 postgres AllocSetFree 1597 0.2386 libselinux.so.1 /lib64/libselinux.so.1 1579 0.2359 libc-2.11.2.so _IO_default_xsputn 1543 0.2305 libc-2.11.2.so free 1531 0.2287 postgres SearchCatCache 1528 0.2283 postgres BuildHardcodedDescriptor 1506 0.2250 libc-2.11.2.so strchrnul 1475 0.2204 postgres _bt_checkkeys 1457 0.2177 postgres ResourceOwnerForgetRelationRef 1451 0.2168 ld-2.11.2.so _dl_runtime_resolve 1443 0.2156 postgres InitCatCache 1443 0.2156 postgres hash_search 1382 0.2065 ld-2.11.2.so _dl_name_match_p 1360 0.2032 postgres PostgresMain 1347 0.2012 postgres pgstat_report_stat 1342 0.2005 libssl.so.1.0.0b /usr/lib64/libssl.so.1.0.0b 1340 0.2002 postgres systable_beginscan 1311 0.1959 libgssapi_krb5.so.2.2/lib64/libgssapi_krb5.so.2.2 1254 0.1873 postgres errstart 1247 0.1863 libc-2.11.2.so __strncmp_sse2 1245
Re: [HACKERS] Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: Possibly the cleanest fix is to implement pg_ping as a libpq function. You do have to distinguish connection failures (ie connection refused) from errors that came back from the postmaster, and the easiest place to be doing that is inside libpq. OK, so a new libpq function --- got it. Would we just pass the status from the backend or can it be done without backend modifications? It would definitely be better to do it without backend mods, so that the functionality would work against back-branch postmasters. To my mind, the entire purpose of such a function is to classify the possible errors so that the caller doesn't have to. So I wouldn't consider that it ought to pass back the status from the backend. I think what we basically want is a function that takes a conninfo string (or one of the variants of that) and returns an enum defined more or less like this: * failed to connect to postmaster * connected, but postmaster is not accepting sessions * postmaster is up and accepting sessions I'm not sure those are exactly the categories we want, but something close to that. In particular, I don't know if there's any value in subdividing the "not accepting sessions" status --- pg_ctl doesn't really care, but other use-cases might want to tell the difference between the various canAcceptConnections failure states. BTW, it is annoying that we can't definitively distinguish "postmaster is not running" from a connectivity problem, but I can't see a way around that. Agreed. I will research this. I have researched this and developed the attached patch. It implements PQping() and PQpingParams() in libpq, and has pg_ctl use it for pg_ctl -w server status detection. The new output for cases where .pgpass is not allowing for a connection is: $ pg_ctl -w -l /dev/null start waiting for server to start done server started However, could not connect, perhaps due to invalid authentication or misconfiguration. The code basically checks the connection status between PQconnectStart() and connectDBComplete() to see if the server is running but we failed to connect for some reason. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a911c50..32c58a5 100644 *** /tmp/b2EvXa_libpq.sgml Tue Nov 23 17:41:50 2010 --- doc/src/sgml/libpq.sgml Tue Nov 23 17:36:32 2010 *** int PQbackendPID(const PGconn *conn); *** 1511,1516 --- 1511,1584 </listitem> </varlistentry> + <varlistentry id="libpq-pqpingparams"> + <term><function>PQpingParams</function><indexterm><primary>PQpingParams</></></term> + <listitem> + <para> + <function>PQpingParams</function> indicates the status of the + server. The currently recognized parameter key words are the + same as <function>PQconnectParams</>. + + <synopsis> + PGPing PQpingParams(const char **keywords, const char **values, int expand_dbname); + </synopsis> + + It returns one of the following values: + + <variablelist> + <varlistentry id="libpq-pqpingparams-pqaccess"> + <term><literal>PQACCESS</literal></term> + <listitem> + <para> + The server is running and allows access. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqpingparams-pqreject"> + <term><literal>PQREJECT</literal></term> + <listitem> + <para> + The server is running but rejected a connection request. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqpingparams-pqnoresponse"> + <term><literal>PQNORESPONSE</literal></term> + <listitem> + <para> + The server did not respond. + </para> + </listitem> + </varlistentry> + </variablelist> + + </para> + + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqping"> + <term><function>PQping</function><indexterm><primary>PQping</></></term> + <listitem> + <para> + Returns the status of the server. + + <synopsis> + PGPing PQping(const char *conninfo); + </synopsis> + </para> + + <para> + This function uses the same <literal>conninfo</literal> parameter + key words as <function>PQconnectdb</>. It returns the same + values as <function>PQpingParams</> above. + </para> + + </listitem> + </varlistentry> + <varlistentry id="libpq-pqconnectionneedspassword"> <term><function>PQconnectionNeedsPassword</function><indexterm><primary>PQconnectionNeedsPassword</></></term> <listitem> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 14d36b5..7a5bb7a 100644 ***
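A quick usage sketch against the proposed API (hedged: the enum names PQACCESS, PQREJECT, and PQNORESPONSE are this patch version's and could change before commit; the conninfo string is illustrative):

    #include <stdio.h>
    #include "libpq-fe.h"

    int
    main(void)
    {
        /* classify server status without needing valid credentials */
        switch (PQping("host=localhost port=5432"))
        {
            case PQACCESS:
                puts("server is up and accepting connections");
                break;
            case PQREJECT:
                puts("server is up but rejected the connection");
                break;
            case PQNORESPONSE:
                puts("no response; server is probably not running");
                break;
        }
        return 0;
    }

This is essentially what pg_ctl -w would do internally, without having to parse error messages itself.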
Re: [HACKERS] Instrument checkpoint sync calls
Robert Haas wrote: Did this get eaten by the email goblin, or are you still working on it? Fell behind due to an unfortunately timed bit of pneumonia. Hurray for the health benefits of cross country flights. Will fix this up, rebase my other patch, and head toward some more review/'Fest cleanup now that I'm feeling better. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
Cédric Villemain cedric.villemain.deb...@gmail.com writes: I think you (Robert) misunderstood dramatically what Pavel is trying to do. Pavel did excellent optimization work on a specific point. This specific point looks crucial to me in the current behavior of PostgreSQL[1]. AFAIS Pavel didn't want to implement a genius syntax, but an optimization feature. As near as I can tell, Pavel is bullheadedly insisting on adding new syntax, not on the optimization aspect of it. I already pointed out how he could get 100% of the performance benefit using the existing syntax, but he doesn't appear to be willing to pursue that route. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST seems to drop left-branch leaf tuples
Peter, glad to know you succeeded. FYI, a year ago we developed a GiST extension for rdkit.org. Oleg On Tue, 23 Nov 2010, Peter Tanski wrote: I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help. For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased. On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote: On 2010-11-23 20:54, Peter Tanski wrote: On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote: j = 0; for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) { FPrint* v = deserialize_fprint(entv[i].key); Isn't this off by one? Offset numbers are 1-based, so the maxoff computation is wrong. All the other for loops compare with i <= maxoff instead of i < maxoff. You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec->n - 1] was always invalid, probably as a memory overwrite error, but I fixed that later and never changed it back.) On the other hand, there are two problems: 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size. 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself: I noticed an n_entries initialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4. btw: interesting topic, audio fingerprinting! regards, Yeb Havinga Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
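[Editor's note: a standalone sketch of the 1-based loop bounds under discussion; the types here are simplified stand-ins, not the real PostgreSQL definitions. With entryvec->n = 5, the valid entries sit at positions 1..4, so the loop must use <= maxoff, not < maxoff.]

    #include <stdio.h>

    typedef unsigned short OffsetNumber;
    #define FirstOffsetNumber ((OffsetNumber) 1)
    #define OffsetNumberNext(off) ((OffsetNumber) ((off) + 1))

    int
    main(void)
    {
        int n = 5;                    /* entryvec->n, as in Yeb's example */
        OffsetNumber maxoff = (OffsetNumber) (n - 1);   /* last valid entry: 4 */
        OffsetNumber i;

        /* position 0 (InvalidOffsetNumber) is skipped; visits 1,2,3,4 */
        for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
            printf("visiting entry %u\n", (unsigned) i);
        return 0;
    }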
Re: [HACKERS] Horizontal Write Scaling
Eliot Gable wrote: However, I have not seen much in the way of a serious attempt at multi-master write scaling. Scaling writes across nodes using PL/Proxy works. Of course, I am assuming the disk system would be RAID 1, RAID 10, RAID 5, or RAID 6 for reliability purposes and that it is sufficiently redundant that you don't have to worry about an outage of your storage system. The idea that you'll have a system that needs better write scalability that isn't limited by the storage system is an unusual one, not the expected case. And the trend everywhere in the industry is away from giant redundant systems, and toward having multiple cheaper redundant copies of all the data instead. It's impossible to protect against things like environmental failure at any single location. Once you've accepted that you have to be able to replicate this beast too if you want high availability, you're back at having a multi-node problem again. This is why the most active work is on distributed designs that start on that basis, rather than projects trying to build more scalable monoliths. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggested easy TODO: pg_dump --from-list
Josh Berkus j...@agliodbs.com writes: Well, very little about pg_dump is very [E], IMNSHO. The question in my mind here is what format the list file will take I was thinking same format as pg_restore -l, only without the dumpIDs. Nope ... those strings are just helpful comments, they aren't really guaranteed to be unique identifiers. In any case, it seems unlikely that a user could expect to get the more complicated cases exactly right other than by consulting pg_dump | pg_restore -l output. Which makes the use-case kind of dubious to me. I don't say that this wouldn't be a useful feature, but you need a better spec than this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] JDBC and Binary protocol error, for some statements
Hi, I work on implementing the binary protocol. During testing, select ?::int was executed a few times so that it would be compiled into a server prepared statement, but after this the backend returned field format = text while sending the field data in binary mode. The parameter is a long and is sent in text mode, with oid 0, as value 2. The result is oid=23, format=(0) T, value = 0x00,0x00,0x00,0x02. When I send the parameter explicitly in binary mode with the int4 oid, the problem doesn't occur. Here is a debug trace of the protocol:

---
07:52:02.995 (54) simple execute, handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@508aeb74, maxRows=0, fetchSize=0, flags=17
07:52:02.996 (54) FE=> Parse(stmt=null,query=SELECT $1::int,oids={0})
07:52:02.996 (54) FE=> Bind(stmt=null,portal=null,$1='2')
07:52:02.996 (54) FE=> Describe(portal=null)
07:52:02.996 (54) FE=> Execute(portal=null,limit=0)
07:52:02.996 (54) FE=> Sync
07:52:02.997 (54) <=BE ParseComplete [null]
07:52:02.997 (54) <=BE BindComplete [null]
07:52:02.998 (54) <=BE RowDescription(1)
07:52:02.998 (54) Field(,INT4,4,B)
07:52:02.998 (54) <=BE DataRow(len=4)
07:52:02.998 (54) <=BE CommandStatus(SELECT)
07:52:02.999 (54) <=BE ReadyForQuery(I)

--- Compiled statement execution:
07:52:06.059 (54) simple execute, handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@128ae45a, maxRows=0, fetchSize=0, flags=16
07:52:06.059 (54) FE=> Parse(stmt=S_1,query=SELECT $1::int,oids={0})
07:52:06.059 (54) FE=> Describe(statement=S_1)
07:52:06.060 (54) FE=> Bind(stmt=S_1,portal=null,$1='2')
07:52:06.060 (54) FE=> Execute(portal=null,limit=0)
07:52:06.060 (54) FE=> Sync
07:52:06.060 (54) <=BE ParseComplete [S_1]
07:52:06.061 (54) <=BE ParameterDescription
07:52:06.061 (54) <=BE RowDescription(1)
07:52:06.061 (54) Field(,INT4,4,T)
07:52:06.061 (54) <=BE BindComplete [null]
07:52:06.061 (54) <=BE DataRow(len=4)
07:52:06.062 (54) <=BE CommandStatus(SELECT)
07:52:06.062 (54) <=BE ReadyForQuery(I)

org.postgresql.jdbc4.exceptions.PSQLDataException: Zła wartość dla typu int: ("Bad value for type int:")

Kind regards, Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
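[Editor's note: for comparison, a minimal libpq (C) sketch of the working case Radek describes - sending the parameter explicitly as binary with the int4 oid - rather than the actual pgjdbc code. The connection string and error handling are illustrative only.]

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            PQfinish(conn);
            return 1;
        }

        uint32_t raw = htonl(2);              /* int4 wire format: network byte order */
        Oid types[1] = { 23 };                /* 23 = INT4OID */
        const char *values[1] = { (const char *) &raw };
        int lengths[1] = { sizeof(raw) };
        int formats[1] = { 1 };               /* 1 = binary parameter */

        PGresult *res = PQexecParams(conn, "SELECT $1::int",
                                     1, types, values, lengths, formats,
                                     1 /* request binary result format too */);
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("got %d-byte binary int4 result\n", PQgetlength(res, 0, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }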
Re: [HACKERS] profiling connection overhead
On 24.11.2010 07:07, Robert Haas wrote: Per previous threats, I spent some time tonight running oprofile (using the directions Tom Lane was foolish enough to provide me back in May). I took testlibpq.c and hacked it up to just connect to the server and then disconnect in a tight loop without doing anything useful, hoping to measure the overhead of starting up a new connection. Ha, ha, funny about that:

120899 18.0616 postgres AtProcExit_Buffers
56891 8.4992 libc-2.11.2.so memset
30987 4.6293 libc-2.11.2.so memcpy
26944 4.0253 postgres hash_search_with_hash_value
26554 3.9670 postgres AllocSetAlloc
20407 3.0487 libc-2.11.2.so _int_malloc
17269 2.5799 libc-2.11.2.so fread
13005 1.9429 ld-2.11.2.so do_lookup_x
11850 1.7703 ld-2.11.2.so _dl_fixup
10194 1.5229 libc-2.11.2.so _IO_file_xsgetn

In English: the #1 overhead here is actually something that happens when processes EXIT, not when they start. Essentially all the time is in two lines:

56920 6.6006 :    for (i = 0; i < NBuffers; i++)
              :    {
98745 11.4507 :        if (PrivateRefCount[i] != 0)

Oh, that's quite surprising. Anything we can do about this? That's a lot of overhead, and it'd be a lot worse on a big machine with 8GB of shared_buffers. Micro-optimizing that search for the non-zero value helps a little bit (attached); it reduces the percentage shown by oprofile from about 16% to 12% on my laptop. For bigger gains, I think you need to somehow make PrivateRefCount smaller. Perhaps only use one byte for each buffer instead of an int32, and use some sort of overflow list for the rare case that a buffer is pinned more than 255 times. Or make it a hash table instead of a simple lookup array. But whatever you do, you have to be very careful not to add overhead to PinBuffer/UnPinBuffer; those can already be quite high in oprofile reports of real applications. It might be worth experimenting a bit; at the moment PrivateRefCount takes up 5MB of memory per 1GB of shared_buffers. Machines with a high shared_buffers setting have no shortage of memory, but a large array like that might waste a lot of precious CPU cache. Now, the other question is whether this really matters. Even if we eliminate that loop in AtProcExit_Buffers altogether, is connect/disconnect still so slow that you have to use a connection pooler if you do that a lot? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..03593fd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1665,11 +1665,20 @@ static void
 AtProcExit_Buffers(int code, Datum arg)
 {
 	int			i;
+	int		   *ptr;
+	int		   *end;

 	AbortBufferIO();
 	UnlockBuffers();

-	for (i = 0; i < NBuffers; i++)
+	/* Fast search for the first non-zero entry in PrivateRefCount */
+	end = (int *) &PrivateRefCount[NBuffers - 1];
+	ptr = (int *) PrivateRefCount;
+	while (ptr < end && *ptr == 0)
+		ptr++;
+	i = ((int32 *) ptr) - PrivateRefCount;
+
+	for (; i < NBuffers; i++)
 	{
 		if (PrivateRefCount[i] != 0)
 		{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
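[Editor's note: to make the shape of that micro-optimization concrete outside the server, here is a self-contained C sketch - illustrative only, not the PostgreSQL code above - of scanning a refcount array for its first non-zero entry.]

    #include <stdio.h>
    #include <stdint.h>

    #define NBUFFERS 1024
    static int32_t PrivateRefCount[NBUFFERS];   /* all zero at startup */

    /* Return the index of the first pinned buffer, or -1 if none.
     * The common case at backend exit is an all-zero array, so the
     * tight skip loop does almost all the work. */
    static int
    first_pinned_buffer(void)
    {
        const int32_t *ptr = PrivateRefCount;
        const int32_t *end = PrivateRefCount + NBUFFERS;

        while (ptr < end && *ptr == 0)          /* skip the zero prefix */
            ptr++;
        return (ptr == end) ? -1 : (int) (ptr - PrivateRefCount);
    }

    int
    main(void)
    {
        PrivateRefCount[37] = 2;                /* pretend buffer 37 is still pinned */
        printf("first pinned buffer: %d\n", first_pinned_buffer());
        return 0;
    }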
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/24 Robert Haas robertmh...@gmail.com: On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/11/23 Robert Haas robertmh...@gmail.com: On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote: ok, I can only recapitulate: this feature was proposed circa two months ago, and at minimum Tom and maybe you agreed - with a request on the syntax - do you remember? I am a little bit tired that this agreement changed after I had spent my time on it. I went back and reread the thread I believe you're speaking about. The first post is here: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php Here perhaps? (or before) http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php Dang. You're right. I stand corrected. Sorry, I thought that you and Tom didn't have a problem with the FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). So the problematic part is just my original FOR-IN-expr proposal, but the proposed feature isn't rejected. My proposal isn't really ingenious - it's true that my first motivation was to replace the pattern array_lower(var,1)..array_upper(var,1). That is relatively simple in Ada, where the FOR statement is defined over a range type, and relatively impossible in PL/pgSQL, where range types don't exist. A special construct in PL/pgSQL can solve iteration over an array significantly better and more simply than any other solution - and it really doesn't have to be the FOR-IN-ARRAY syntax. With every further test and code check I am more sure why: * there is a clear indication that the developer wants to process all items in the array * there is no random access to the array * there is a possibility to reuse varlena values stored in the array without a temporary copy I am sorry that I didn't mention these facts earlier -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
sorry, there was a broken message 2010/11/24 Pavel Stehule pavel.steh...@gmail.com: 2010/11/24 Robert Haas robertmh...@gmail.com: On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/11/23 Robert Haas robertmh...@gmail.com: On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote: ok, I can only recapitulate: this feature was proposed circa two months ago, and at minimum Tom and maybe you agreed - with a request on the syntax - do you remember? I am a little bit tired that this agreement changed after I had spent my time on it. I went back and reread the thread I believe you're speaking about. The first post is here: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php Here perhaps? (or before) http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php Dang. You're right. I stand corrected. Sorry, I thought that you and Tom didn't have a problem with the FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). So the problematic part is just my original FOR-IN-expr proposal, but the proposed feature isn't rejected. Sorry, I thought that you and Tom didn't have a problem with the FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). I thought the problematic part was just my original FOR-IN-expr proposal, but the proposed feature isn't a problem. My proposal isn't really ingenious - it's true that my first motivation was to replace the unwanted pattern array_lower(var,1)..array_upper(var,1). That is relatively simple in Ada, where the FOR statement is defined over a range type, and relatively impossible in PL/pgSQL, where range types don't exist yet. A special construct in PL/pgSQL can solve iteration over an array significantly better and more simply than any other solution - it doesn't have to be the FOR-IN-ARRAY syntax - and with every further test and code check I am more sure why: * there is a clear indication that the developer wants to process all items in the array, or almost all * there is no random access to the array!! * there is a possibility to reuse varlena values stored in the array without a temporary copy - maybe with some trick!! * there is very low overhead I am sorry that I didn't mention these advantages earlier. I thought about another possible syntax - what do you think about FOR var OVER expr LOOP ... END LOOP? OVER is already a keyword. Regards Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers