Re: [PATCHES] patch implementing the multi-argument aggregates (SOC project)
Tom Lane [EMAIL PROTECTED] writes:

> This patch is nowhere near ready for submission :-(.  Most of the comments
> seem to be "I don't know what to do here" ...

Just to be clear, I think what Tom's saying is that it's not ready to be
*applied*.  Sending patches to this list early and often during development is
generally encouraged.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Trivial patch to double vacuum speed on tables with no indexes
Tom Lane [EMAIL PROTECTED] writes:

> stark [EMAIL PROTECTED] writes:
>> There isn't really any need for the second pass in lazy vacuum if the
>> table has no indexes.
>
> How often does that case come up in the real world, for tables that are
> large enough that you'd care about vacuum performance?

Admittedly it's not the most common scenario, but it does come up: ETL
applications, for example, that load data and then perform some manipulation
of it before passing it on.  If they have many updates to do they'll probably
have to run vacuums between some of them.

Arguably, if you don't have any indexes on a large table it's quite likely to
be *because* you're planning on doing some big updates, such that it'll be
faster to simply rebuild the indexes when you're done anyway.

I would have had the same objection if it had resulted in substantially more
complex code, but it was so simple that that doesn't seem like a concern.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Trivial patch to double vacuum speed on tables with no indexes
) {
+		maxtuples = (maintenance_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}

	vacrelstats->num_dead_tuples = 0;
	vacrelstats->max_dead_tuples = (int) maxtuples;

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [PATCHES] Concurrent connections in psql patch
Bruce Momjian [EMAIL PROTECTED] writes:

> Is this something people are interested in?  I am thinking no based on the
> lack of requests and the size of the patch.

Lack of requests?  I was actually surprised by how enthusiastically people
reacted to it.

However, I don't think the patch as-is is ready to be committed.  Aside from
missing documentation and regression tests, it was only intended to be a
proof-of-concept and to be useful for specific tests I was doing.  I did try
to do a decent job; I handled \timing and server-tracked variables like
encoding.  But I need to go back through the code and make sure there are no
other details like that.

It would be nice to get feedback from other developers looking at the patch
to confirm that there aren't more fundamental problems with the approach and
how it uses libpq before I go through the effort of cleaning up the details.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Trivial patch to double vacuum speed on tables with no indexes
Bruce Momjian [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> Bruce Momjian [EMAIL PROTECTED] writes:
>>> Patch applied.  Thanks.
>>
>> Wait a minute.  This patch changes the behavior so that
>> LockBufferForCleanup is applied to *every* heap page, not only the ones
>> where there are removable tuples.  It's not hard to imagine scenarios
>> where that results in severe system-wide performance degradation.
>> Has there been any real-world testing of this idea?
>
> I see the no-index case now:
>
> +	if (nindexes)
> +		LockBuffer(buf, BUFFER_LOCK_SHARE);
> +	else
> +		LockBufferForCleanup(buf);
>
> Let's see what Greg says, or revert.

Hm, that's a good point.  I could return it to the original method where it
released the share lock and did the LockBufferForCleanup only if necessary.
I thought it was awkward to acquire a lock and then release it to acquire a
different lock on the same buffer, but it's true that it doesn't always have
to acquire the second lock.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] Some editing of docs for create index concurrently
The references to data warehousing and large tables were bothering me since --
while true -- it's not really the main use case for CREATE INDEX CONCURRENTLY.
Really it's OLTP systems that may or may not have large tables but regardless
cannot stand the downtime caused by locks.

Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.58
diff -c -r1.58 indices.sgml
*** doc/src/sgml/indices.sgml	25 Aug 2006 04:06:44 -0000	1.58
--- doc/src/sgml/indices.sgml	11 Sep 2006 14:36:04 -0000
***************
*** 91,101 ****
   </para>
  
   <para>
!   Creating an index on a large table can take a long time.  By default,
!   <productname>PostgreSQL</productname> allows reads (selects) to occur
!   on the table in parallel with index creation, but writes (inserts,
!   updates, deletes) are blocked until the index build is finished.
!   It is possible to allow writes to occur in parallel with index
    creation, but there are several caveats to be aware of &mdash; for more
    information see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
    endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
--- 91,101 ----
   </para>
  
   <para>
!   By default when creating indexes <productname>PostgreSQL</productname>
!   allows reads (selects) to occur on the table being indexed in parallel with
!   index creation, but writes (inserts, updates, deletes) are blocked until
!   the index build is finished.  In production environments this is often
!   unacceptable.  It is possible to allow writes to occur in parallel with index
    creation, but there are several caveats to be aware of &mdash; for more
    information see <xref linkend="SQL-CREATEINDEX-CONCURRENTLY"
    endterm="SQL-CREATEINDEX-CONCURRENTLY-title">.
cvs diff: Diffing doc/src/sgml/ref
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.56
diff -c -r1.56 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml	25 Aug 2006 04:06:45 -0000	1.56
--- doc/src/sgml/ref/create_index.sgml	11 Sep 2006 14:36:06 -0000
***************
*** 264,281 ****
   </indexterm>
  
   <para>
!   Creating an index for a large table can be a long operation.  In large data
!   warehousing applications it can easily take hours or even days to build
!   indexes.  It's important to understand the impact creating indexes has on a
!   system.
!  </para>
! 
!  <para>
    Normally <productname>PostgreSQL</> locks the table to be indexed against
    writes and performs the entire index build with a single scan of the
    table.  Other transactions can still read the table, but if they try to
    insert, update, or delete rows in the table they will block until the
!   index build is finished.
   </para>
  
   <para>
--- 264,278 ----
   </indexterm>
  
   <para>
!   Creating an index can interfere with regular operation of a database.
    Normally <productname>PostgreSQL</> locks the table to be indexed against
    writes and performs the entire index build with a single scan of the
    table.  Other transactions can still read the table, but if they try to
    insert, update, or delete rows in the table they will block until the
!   index build is finished.  This could have a severe effect if the system is
!   a live production database.  Large tables can take many hours to be
!   indexed, and even smaller tables can lock out writers for unacceptably
!   long periods for a production system.
   </para>
  
   <para>

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] More doc patches
! </para>
! 
! <sect2 id="ddl-partitioning-managing-partitions">
! <title>Managing Partitions</title>
! 
! <para>
!  Normally the set of partitions established when initially defining the
!  table is not intended to remain static.  It's common to want to remove old
!  partitions of data from the partitioned tables and add new partitions for
!  new data periodically.  One of the most important advantages of
!  partitioning is precisely that it allows this otherwise painful task to be
!  executed nearly instantaneously by manipulating the partition structure
!  rather than physically moving large amounts of data around.
! </para>
! 
! <para>
!  The simplest option for removing old data is to simply DROP the partition
!  that is no longer necessary.
! 
! <programlisting>
! DROP TABLE measurement_y2003mm02;
! </programlisting>
! 
!  This can very quickly delete millions of records because it doesn't have
!  to individually delete every record.  (Note however that some filesystems
!  may still take some time to delete large files.)
! </para>
! 
! <para>
!  Another option that is often preferable is to remove the partition from
!  the partitioned table but retain access to it as a table in its own
!  right.
  
  <programlisting>
  ALTER TABLE measurement_y2003mm02 NO INHERIT measurement;
  </programlisting>
  
!  This allows further operations on the data before actually dropping it.
!  Often this is a useful time to back up the data using COPY, pg_dump, or
!  other tools, for example.  It can also be a useful time to aggregate data
!  into smaller formats, perform other data manipulations, or run reports.
! </para>
! 
! <para>
!  Similarly we can add a new partition to handle new data.  We can create an
!  empty partition in the partitioned table just as the original partitions
!  were created above.
! 
! <programlisting>
! CREATE TABLE measurement_y2006m02 (
!     CHECK ( logdate &gt;= DATE '2006-02-01' AND logdate &lt; DATE '2006-03-01' )
! ) INHERITS (measurement);
! </programlisting>
! 
!  Alternatively sometimes it's easier to create the table outside the
!  partition structure and move it in later.  This allows the data to be
!  loaded and checked prior to it appearing in the partitioned table.  It
!  also allows any data manipulation necessary to be performed without the
!  intermediate data appearing in queries against the partitioned table.
  
  <programlisting>
! CREATE TABLE measurement_y2006m02 (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
  \COPY measurement_y2006m02 FROM 'measurement_y2006m02'
  UPDATE ...
  ALTER TABLE measurement_y2006m02 ADD CONSTRAINT y2006m02
     CHECK ( logdate &gt;= DATE '2006-02-01' AND logdate &lt; DATE '2006-03-01' );
  ALTER TABLE measurement_y2006m02 INHERIT measurement;
  </programlisting>
  </para>
  
+ <sect2 id="ddl-partitioning-caveats">
  
  <para>
!  The following caveats currently apply to partitioned tables:
  <itemizedlist>
  <listitem>
  <para>

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [PATCHES] More doc patches
Neil Conway [EMAIL PROTECTED] writes:

> BTW, I personally prefer patches to be proper MIME attachments, created as
> a patch against the root of the Postgres source tree.  However, I'm not
> sure what the other committers prefer.

Good to know.  I usually generate diffs with cvs diff -c.  I don't know why
I did it differently here; weird.  Sorry about that.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Eliminating phase 3 requirement for varlen increases via ALTER COLUMN
Jonah H. Harris [EMAIL PROTECTED] writes:

> On 10/26/06, Tom Lane [EMAIL PROTECTED] wrote:
>> This makes some really quite unacceptable assumptions about the meaning
>> and encoding of typmod ...
>
> True, so VARCHAR seems like the only one?

That's the only one I've really encountered in the field on a fairly regular
basis.

I think what you want is to add a new method entry in pg_type to allow a type
to declare a method to tell you whether a change is work-free or not.  Then
any type, even user-defined types, can allow some changes to be work-free and
some not, without exposing any implementation details outside the type.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] WAL logging freezing
Tom Lane [EMAIL PROTECTED] writes:

> The added WAL volume should be pretty minimal, because only tuples that
> have gone untouched for a long time incur extra work.

That seems like a weak point in the logic.  It seems like it would make
VACUUM, which is already an i/o hog, even more so.

Perhaps something clever can be done with vacuum_cost_delay and
commit_siblings.  Something like inserting the delay between WAL logging and
syncing the log and writing to the heap, so that if another transaction
commits in the meantime we can skip the extra fsync and continue.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Enums patch v2
Andrew Dunstan [EMAIL PROTECTED] writes:

> I'm not a big fan of ordering columns to optimise record layout, except in
> the most extreme cases (massive DW type apps).  I think visible column
> order should be logical, not governed by physical considerations.

Well, as long as we're talking shoulds, the database should take care of
this for you anyway.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [PATCHES] WIP patch for operator families
Tom Lane [EMAIL PROTECTED] writes:

> Attached is the current state of my work on generalizing operator classes
> into operator families, as per discussion last week.  It's far from ready
> to apply, but I thought I'd circulate it for comments.  What's done is ...
>
> Comments?

Didn't we say discussion should happen on -hackers so development progress
would be more visible?

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] Assorted typos
A few assorted typos and grammar corrections I caught while skimming the
source.

Index: src/backend/access/gin/ginvacuum.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginvacuum.c,v
retrieving revision 1.7
diff -c -r1.7 ginvacuum.c
*** src/backend/access/gin/ginvacuum.c	4 Oct 2006 00:29:48 -0000	1.7
--- src/backend/access/gin/ginvacuum.c	4 Jan 2007 12:45:44 -0000
***************
*** 34,40 ****
  /*
   * Cleans array of ItemPointer (removes dead pointers)
   * Results are always stored in *cleaned, which will be allocated
!  * if its needed. In case of *cleaned!=NULL caller is resposible to
   * enough space. *cleaned and items may point to the same
   * memory addres.
   */
--- 34,40 ----
  /*
   * Cleans array of ItemPointer (removes dead pointers)
   * Results are always stored in *cleaned, which will be allocated
!  * if it's needed. In case of *cleaned!=NULL caller is resposible to
   * enough space. *cleaned and items may point to the same
   * memory addres.
   */

Index: src/backend/access/heap/tuptoaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/tuptoaster.c,v
retrieving revision 1.66
diff -c -r1.66 tuptoaster.c
*** src/backend/access/heap/tuptoaster.c	5 Oct 2006 23:33:33 -0000	1.66
--- src/backend/access/heap/tuptoaster.c	4 Jan 2007 12:45:45 -0000
***************
*** 1113,1119 ****
  		return;
  
  	/*
! 	 * Open the toast relation and it's index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 RowExclusiveLock);
--- 1113,1119 ----
  		return;
  
  	/*
! 	 * Open the toast relation and its index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 RowExclusiveLock);
***************
*** 1337,1343 ****
  	endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
  
  	/*
! 	 * Open the toast relation and it's index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 AccessShareLock);
--- 1337,1343 ----
  	endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
  
  	/*
! 	 * Open the toast relation and its index
  	 */
  	toastrel = heap_open(attr->va_content.va_external.va_toastrelid,
  						 AccessShareLock);

Index: src/include/catalog/pg_type.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_type.h,v
retrieving revision 1.172
diff -c -r1.172 pg_type.h
*** src/include/catalog/pg_type.h	4 Oct 2006 00:30:08 -0000	1.172
--- src/include/catalog/pg_type.h	4 Jan 2007 12:45:59 -0000
***************
*** 236,242 ****
   */
  
! /* keep the following ordered by OID so that later changes can be made easier*/
  
  /* Make sure the typlen, typbyval, and typalign values here match the initial
     values for attlen, attbyval, and attalign in both places in pg_attribute.h
--- 236,242 ----
   */
  
! /* keep the following ordered by OID so that later changes can be made more easily */
  
  /* Make sure the typlen, typbyval, and typalign values here match the initial
     values for attlen, attbyval, and attalign in both places in pg_attribute.h

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
>> What did you think about protecting against torn writes using id numbers
>> every 512 bytes?
>
> Pretty much not happening; or are you volunteering to fix every part of
> the system to tolerate injections of inserted data anywhere in a stored
> datum?

I was thinking to do it at a low level, as the xlog records are prepared to
be written to the filesystem and as the data is being read from disk.  I
haven't read that code yet to see where to inject it, but I understand
there's already a copy happening and it could be done there.  Even if we
optimize out all the copies we could do it in the actual i/o call using
readv/writev.

I wasn't thinking of doing it on actual disk buffers, since it doesn't help
us avoid full page writes, but it could be done there too using readv/writev
in the smgr.  That might be useful for a full_page_writes=off system: even if
it can't guarantee no corruption, at least it can guarantee no silent
corruption.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Tom Lane [EMAIL PROTECTED] writes:

> Gregory Stark [EMAIL PROTECTED] writes:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> Pretty much not happening; or are you volunteering to fix every part of
>>> the system to tolerate injections of inserted data anywhere in a stored
>>> datum?
>>
>> I was thinking to do it at a low level as the xlog records are prepared
>> to be written to the filesystem and as the data is being read from disk.
>> I haven't read that code yet to see where to inject it but I understand
>> there's already a copy happening and it could be done there.
>
> You understand wrong ... a tuple sitting on disk is normally read directly
> from the shared buffer, and I don't think we want to pay for copying it.

xlog records

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Tom Lane [EMAIL PROTECTED] writes:

> Oh, sorry, had the wrong context in mind.  I'm still not very impressed
> with the idea --- a CRC check will catch many kinds of problems, whereas
> this approach catches exactly one kind of problem.

Well, in fairness, I tossed in a throwaway comment at the end of that email
about heap pages, and I'll do the same here since I can't resist.  But the
main thread here is really about xlog.

It just seems to me that it's better to target each problem with a solution
that addresses it directly than to have one feature that we hope addresses
them all more or less.  Having a CRC in WAL but not in the heap seems kind
of pointless: if your hardware is unreliable the corruption could be
anywhere.  And depending on the CRC to solve multiple problems means we
can't offer the option to disable it, because that would affect the other
things as well.

What I would like to see is a CRC option that would put CRC checks in every
disk page, whether heap, index, WAL, control file, etc.  I think we would
default that to off to match our current setup most closely.  Separately we
would have a feature in WAL to detect torn pages so that we can reliably
detect the end of valid WAL.  That would have to always be on.  But having
it as a separate feature means the CRC could be optional.

Also, incidentally, like I mentioned in my previous email, we could do the
torn-page detection in heap pages too by handling it in the smgr using
readv/writev.  No copies, no corrupted datums.  Essentially the tags would
be inserted on the fly as the data was copied into kernel space.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [pgsql-patches] Recalculating OldestXmin in a long-running vacuum
Heikki Linnakangas [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> In any case I'd like to see some evidence of significant real-world
>> benefit before adding such a conceptual wart ...
>
> I've asked our testers to do a TPC-C run with and without the patch.  I'm
> not expecting it to make a huge difference there, but if you're using a
> big cost delay, it could make quite a difference for such a simple thing.

I think the key here is that it removes the size of the table from the list
of factors that govern the steady state.  Currently, as the table grows, the
maximum age of the snapshot vacuum uses gets older too, which means a greater
percentage of the table is dedicated to dead-tuple overhead (which in turn
means a larger table, which means an even greater percentage of dead
tuples...).

With the patch the size of the table is no longer a factor.  As long as the
work vacuum does on a given page can keep up with the rate of creating dead
tuples, it won't matter how large the table is.  The only factors that
determine the steady state are the rate of creation of dead tuples and the
rate at which vacuum can process them.

Unfortunately indexes, again, mean that's not entirely true.  As the table
grows the indexes will grow as well, and that will slow vacuum down.  Though
indexes are usually smaller than tables.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] WIP: Recursive Queries
(rtindex, pstate->p_rtable));
--- 693,720 ----
  	if (IsA(n, RangeVar))
  	{
  		/* Plain relation reference */
+ 		RangeVar   *rv = (RangeVar *) n;
  		RangeTblRef *rtr;
! 		RangeTblEntry *rte = NULL;
  		int			rtindex;
+ 		ListCell   *cte;
+ 
+ 		if (pstate->p_ctenamespace && !rv->schemaname)
+ 		{
+ 			/* We have to check if this is a reference to a common table
+ 			 * expression (ie subquery defined in the WITH clause) */
+ 			foreach(cte, pstate->p_ctenamespace)
+ 			{
+ 				RangeSubselect *r = (RangeSubselect *) lfirst(cte);
+ 
+ 				if (!strcmp(rv->relname, r->alias->aliasname))
+ 				{
+ 					rte = transformRangeCTE(pstate, rv, r);
+ 					break;
+ 				}
+ 			}
+ 		}
+ 		if (!rte)
+ 			rte = transformTableEntry(pstate, rv);
  
  		/* assume new rte is at end */
  		rtindex = list_length(pstate->p_rtable);
  		Assert(rte == rt_fetch(rtindex, pstate->p_rtable));

Index: src/include/nodes/parsenodes.h
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.338
diff -c -r1.338 parsenodes.h
*** src/include/nodes/parsenodes.h	9 Jan 2007 02:14:15 -0000	1.338
--- src/include/nodes/parsenodes.h	30 Jan 2007 18:09:27 -0000
***************
*** 789,794 ****
--- 789,795 ----
  	/*
  	 * These fields are used only in upper-level SelectStmts.
  	 */
+ 	List	   *with_cte_list;	/* List of Common Table Expressions (ie WITH clause) */
  	SetOperation op;			/* type of set op */
  	bool		all;			/* ALL specified? */
  	struct SelectStmt *larg;	/* left child */

Index: src/include/parser/parse_clause.h
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/parser/parse_clause.h,v
retrieving revision 1.48
diff -c -r1.48 parse_clause.h
*** src/include/parser/parse_clause.h	9 Jan 2007 02:14:15 -0000	1.48
--- src/include/parser/parse_clause.h	30 Jan 2007 16:07:39 -0000
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "parser/parse_node.h"
  
+ extern void transformWithClause(ParseState *pstate, List *with_cte_list);
  extern void transformFromClause(ParseState *pstate, List *frmList);
  extern int	setTargetTable(ParseState *pstate, RangeVar *relation,
  			   bool inh, bool alsoSource, AclMode requiredPerms);

Index: src/include/parser/parse_node.h
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/parser/parse_node.h,v
retrieving revision 1.51
diff -c -r1.51 parse_node.h
*** src/include/parser/parse_node.h	5 Jan 2007 22:19:57 -0000	1.51
--- src/include/parser/parse_node.h	30 Jan 2007 16:07:01 -0000
***************
*** 58,63 ****
--- 58,70 ----
   * of ParseStates, only the topmost ParseState contains paramtype info; but
   * we copy the p_variableparams flag down to the child nodes for speed in
   * coerce_type.
+  *
+  * [1] Note that p_ctenamespace is a namespace for relations but distinct
+  * from p_relnamespace.  p_ctenamespace is the list of relations that can be
+  * referred to in a FROM or JOIN clause.  p_relnamespace is the list of
+  * relations which already have and therefore can be referred to in
+  * qualified variable references.  Also, note that p_ctenamespace is a list
+  * of RangeSubselects, not a list of range table entries.
   */
  typedef struct ParseState
  {
***************
*** 68,73 ****
--- 75,81 ----
  								 * node's fromlist) */
  	List	   *p_relnamespace;	/* current namespace for relations */
  	List	   *p_varnamespace;	/* current namespace for columns */
+ 	List	   *p_ctenamespace;	/* current namespace for common table expressions [1] */
  	Oid		   *p_paramtypes;	/* OIDs of types for $n parameter symbols */
  	int			p_numparams;	/* allocated size of p_paramtypes[] */
  	int			p_next_resno;	/* next targetlist resno to assign */

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] LIMIT/SORT optimization
Earlier I mentioned I had implemented the limited-sort optimization.  This
kicks in quite frequently on web applications that implement paging.
Currently Postgres has to sort the entire data set, often in a disk sort.
If the page is early in the result set it's a lot more efficient to just
keep the top n records in memory and do a single pass through the result
set.

The patch is quite simple and is mostly localized to tuplesort.c, which
gains a new function to advise it of the maximum number of tuples necessary.
It uses the existing heapsort functionality, which makes it easy to keep the
top n records by peeking at the top element of the heap and removing it if
the new record would displace it.

In experimenting I found heap sort about half the speed of quicksort, so I
made it switch over to heap sort only if the input size reaches 2x the
specified maximum, or if it can avoid spilling to disk by switching.

The two open issues (which are arguably the same issue) are how to get the
information down to the sort node and how to cost the plan.  Currently it's
a bit of a hack: the Limit node peeks at its child and, if it's a sort,
calls a special function to provide the limit.

limit-sort.patch.4
Description: Binary data

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] [test]
[Testing to see if -patches allows me to post yet.  I sent a patch last
night but haven't seen it come through.]

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] Short varlena headers
I've tried repeatedly to send this patch but it doesn't seem to be getting
through.  It's not in the archives or my inbox.  Now that I look, I realize
that none of the WIP versions of the patch that I sent arrived either.
That's fairly disappointing, since I had made efforts to keep people
apprised of the development.

Here's a working patch that provides 1-byte headers and unaligned storage
for varlena data types storing < 128 bytes (actually <= 126 bytes):

http://community.enterprisedb.com/varlena/patch-varvarlena-9.patch.gz

Things it does:

1) Changes the varlena api to use SET_VARSIZE instead of VARATT_SIZEP.

2) Changes the heap_form_tuple api in a subtle way: attributes in a heap
   tuple may need to be detoasted even if the tuple is never written out to
   disk.

3) Changes the GETSTRUCT api in another subtle way: it's no longer safe to
   access the first varlena in a tuple directly through the GETSTRUCT
   interface.  At least not unless special care is taken.

4) Saves a *ton* of space, because it saves 3 of the 4 bytes of varlena
   overhead *and* the up to 4 bytes of alignment padding before every
   varlena.  That turns out to be a lot more space than I realized.  I'll
   post some sample schemas with space savings later.

5) Passes all postgres regression tests.

Things it doesn't do:

1) 2-byte headers for objects that exceed 128 bytes :(

2) 0-byte headers for single ascii characters :(

3) Avoid htonl/ntohl by using low order bits on little-endian machines.

4) Provide an escape hatch for types or columns that don't want this
   behaviour.  Currently int2vector and oidvector are specifically exempted
   since they're used in the system tables, sometimes through the GETSTRUCT
   api.  I doubt anything not used in the system tables has any business
   being exempted, which only leaves us with the occasional text attribute,
   which I plan to double-check aren't problems.
-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] 1-byte packed varlena headers
Updated patch at:

http://community.enterprisedb.com/varlena/patch-varvarlena-12.patch.gz

This fixes a rather critical oversight which caused it all to appear to work
but not actually save any space.  I've added an assertion check that the
predicted tuple size matches the tuple size generated by heap_form*tuple.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] Numeric patch to add special-case representations for 8 bytes
I've uploaded a quick hack to store numerics in 8 bytes when possible:

http://community.enterprisedb.com/numeric-hack-1.patch

This is a bit of a kludge, since it doesn't actually provide any interface
for external clients of the numeric module to parse the resulting data.  Ie,
the macros in numeric.h will return garbage.  But I'm not entirely convinced
that matters.  It's not like those macros were really useful to any other
modules anyways, since there was no way to extract the actual digits.

The patch is also slightly unsatisfactory because, as I discovered, numbers
like 1.1 are stored as two digits currently.  But it does work and it does
save a substantial amount of space for integers.

postgres=# select n, pg_column_size(n) from n;
       n | pg_column_size
---------+----------------
       1 |              2
      11 |              2
     101 |              2
    1001 |              3
   10001 |              9
      11 |              9
     1.1 |              9
    10.1 |              9
   100.1 |              9
  1000.1 |              9
     1.1 |             11
    10.1 |             11

I had hoped to get the second batch to be 3-4 bytes.  But even now, note how
much space is saved for small integers.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PATCHES] Updated Packed Varlena (relative to CVS post SET_VARSIZE changes)
I've updated the patch against Tom's recent commit of the SET_VARSIZE changes. It's about 40% the size it was before, most of the remaining patch is actually heaptuple.c and tuptoaster.c which is the key code. http://community.enterprisedb.com/varlena/patch-varvarlena-13.patch.gz I had fixed most of contrib last night but I pulled it out of this patch to keep the size down. I'll send in a separate patch for contrib. I'm not sure what to do about all the GIST data types. All those index support functions take C data structures as datums and I can't tell which ones are supposed to be varlenas and which are just random C data structures. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes
Patric Bechtel [EMAIL PROTECTED] writes: Maybe you want to have a look here: http://www2.hursley.ibm.com/decimal/DPDecimal.html Well we're not really looking for the optimal packing in general. All the problems here have to do with convenience in the implementation rather than the problems with the approach. It's easier in the code to have fixed size scale and weight and to have them at the beginning of the data type, which is fine for larger values but for small values they dominate and waste space. But as far as the approach goes, I admit I find it a bit hard to believe that we're still doing BCD in the 21st century at all. If we stored the digits in base 2 with a base 10 exponent would it really be too hard to output the digits? Even long multiplication and then reversing the results doesn't seem like it would be too bad and surely there must exist reasonable algorithms to do better. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes
Tom Lane [EMAIL PROTECTED] writes: Patric Bechtel [EMAIL PROTECTED] writes: Tom Lane schrieb am 02.03.2007 14:38: Exact decimal fractions are no longer exact when converted to base 2. I think multiplying with base 10 until it's a whole number, then saving that exponent with it, that's how I understood it. That hardly seems likely to win in terms of calculation efficiency --- for example, adding two numbers will now likely require a multiplication in order to align the values for addition. Having to store the exponent doesn't sound that great for the original complaint of too much overhead for short numbers, either... Adding two numbers with two different exponents would require multiplying to set the exponents equal.

a) normally you're adding together numbers of the same type so normally you'll have the same precision.

b) It's only going to hurt for very large numbers where handling base-10^n numbers is *so* bad that the argument kind of breaks down.

c) I was picturing storing the exponent relative to the decimal place instead of relative to the first digit as we do now. That means numbers like 1x10^6 might take more space but numbers like 123456 would take less since we could define a missing exponent to represent an exponent of 0.

Incidentally -- doing what I just castigated Jonah for doing -- I looked around and both libgmp and the java math.BigDecimal libraries use base-2 digits. The latter with base-10 exponents no less. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
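[Editorial note] The alignment cost Tom describes can be shown with a toy model of the representation under discussion: a binary coefficient with a base-10 exponent. The struct and function here are hypothetical illustrations, not code from any patch or from numeric.c.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: an int64 coefficient with a base-10 exponent.  Adding two
 * values with different exponents needs multiplications by ten to align
 * them first -- the efficiency cost Tom points out.  When the exponents
 * already match (the common same-precision case Greg notes in point a),
 * no rescaling happens at all. */
typedef struct { int64_t coef; int exp10; } dec;

static dec dec_add(dec a, dec b)
{
    /* rescale whichever operand has the larger exponent downward */
    while (a.exp10 > b.exp10) { a.coef *= 10; a.exp10--; }
    while (b.exp10 > a.exp10) { b.coef *= 10; b.exp10--; }
    return (dec){ a.coef + b.coef, a.exp10 };   /* exponents now equal */
}
```

So 1.1 + 2.35 becomes 110e-2 + 235e-2 after one rescaling step, while 1.1 + 1.2 costs nothing extra.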
Re: [PATCHES] A little COPY speedup
Tom Lane [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: I'm slightly worried though since that seems to have changed from 8.2, which I oprofiled over Christmas. If you were testing a case with wider rows than Heikki tested, you'd see less impact --- the cost of the old way was O(N^2) in the number of tuples that fit on a page, so the behavior gets rapidly worse as you get down to smaller tuple sizes. (Come to think of it, the cmin/cmax collapse would be a factor here too.) Or larger block sizes of course. A 32kb block would be 16x as bad which starts to be pretty serious. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
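[Editorial note] The 16x figure follows directly from the quadratic cost: tuples per page grow linearly with block size, so a 4x larger block means 4^2 = 16x the per-page work. A rough model (uniform tuple width is an assumed simplification for illustration):

```c
#include <assert.h>

/* The old per-page cost was O(n^2) in the number of tuples that fit on
 * a page, and tuples per page scale linearly with block size. */
static long relative_cost(long blocksize, long tuplewidth)
{
    long n = blocksize / tuplewidth;    /* tuples fitting on one page */
    return n * n;                       /* quadratic per-page work */
}
```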
Re: [PATCHES] A little COPY speedup
Tom Lane [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Feedback from someone else looking to the problem last year. IIRC there was a feeling that if we didn't have to search for delimiters the COPY FROM input parsing could be easier. Of what use is the above comment? You have to parse the input into fields somehow. Well those two requirements aren't inconsistent if you're using fixed-width input text files. Not that I'm enamoured of such file formats, just saying. The only file format I ever wanted when I was doing this kind of stuff is tab separated, all the others (comma separated and (egads) *pipe* separated?!) are just disasters. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Heap page diagnostic/test functions (WIP)
Simon Riggs [EMAIL PROTECTED] writes: I'll return the infomasks directly, for you to manipulate. Not happy with that, but open to suggestions. Well the alternative would be a long list of boolean columns which would make the output kind of long. Perhaps a function pg_decode_infomask(varbit) which returns a ROW of booleans with appropriate names would be a good compromise. If you want it you could use it in your query. Or perhaps you could include a ROW of booleans in your output already, something like:

postgres=# insert into tuple_info values (b'000', ROW(false,false,false));
INSERT 0 1
postgres=# select * from tuple_info;
 infomask_bits | infomask_flags
---------------+----------------
 000           | (f,f,f)
(1 row)

postgres=# select (infomask_flags).* from tuple_info;
 flag_a | flag_b | flag_c
--------+--------+--------
 f      | f      | f
(1 row)

That might be kind of tricky to cons up though. I had to create a table to do it here. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
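[Editorial note] In C terms the proposed pg_decode_infomask would amount to something like the sketch below: unpack a raw bitmask into a row of named booleans. The flag names and bit values here are placeholders, not the real HEAP_* flags from htup.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, standing in for the real HEAP_* infomask
 * flags; the point is only the mask-to-named-booleans decoding shape. */
#define FLAG_A 0x0001
#define FLAG_B 0x0002
#define FLAG_C 0x0004

typedef struct { bool flag_a, flag_b, flag_c; } infomask_flags;

static infomask_flags decode_infomask(uint16_t mask)
{
    infomask_flags f;
    f.flag_a = (mask & FLAG_A) != 0;
    f.flag_b = (mask & FLAG_B) != 0;
    f.flag_c = (mask & FLAG_C) != 0;
    return f;
}
```

The SQL-level function would do the same thing and return the struct as a composite-type ROW, so callers could write (pg_decode_infomask(infomask)).flag_a in their queries.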
[PATCHES] Packed Varlenas update
. Fixed all regressions in contrib (gist indexed) modules
. Fixed a bug that occurs when a compressed datum ends up under 128 bytes
. Fixed a bug on zero-column tuples
. Added regression tests

http://community.enterprisedb.com/varlena/patch-varvarlena-16.patch.gz -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PATCHES] Updated packed varlena patch
I added a little-endian special-case using the low order bits as described by Tom. Also, I noticed that int2vector and oidvector are already flagged in pg_type with attstorage == 'p'. Really any type with attstorage == 'p' should be exempted from packed varlena handling since it isn't expecting to be toasted. This lets me clean up heaptuple.c a bit so now it doesn't depend on pg_type.h and doesn't have any special case exemption for oidvector and int2vector. It also means any other type can opt out from packed varlena packing by setting attstorage. I've left inet and cidr with toast support and changed them to have attstorage 'm' instead. http://community.enterprisedb.com/varlena/patch-varvarlena-17.patch.gz -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PATCHES] non-recursive WITH clause support
Here's an updated patch that fixes the bug I had. This is now functional basic non-recursive WITH clause support. http://community.enterprisedb.com/recursive/with-pg82stable-v2.patch.gz It's a pretty short simple patch as is; it just directly inlines any WITH clauses as if they had been written as subqueries. We'll have to do something much more clever to get recursive queries to work but for non-recursive queries that's sufficient. Example:

postgres=# with a as (select 1 as x) select * from (select * from a) as x;
 x
---
 1
(1 row)

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] WIP patch for plan invalidation
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Comments? Why do CREATE/DROP/REINDEX DATABASE no longer call PreventTransactionChain? sigh, nevermind. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PATCHES] Packed varlena patch update
I implemented Tom's suggestion of packing the external toast pointers unaligned and copying them to a local struct to access the fields. This saves 3-6 bytes on every external toast pointer. Presumably this isn't interesting if you're accessing the toasted values since they would be quite large anyways, but it could be interesting if you're doing lots of queries that don't access the toasted values. It does uglify the code a bit but it's all contained in tuptoaster.c. http://community.enterprisedb.com/varlena/patch-varvarlena-19.patch.gz

I think this is the last of the todo suggestions that were mentioned on the list previously, at least that I recall. Some implications:

1) There's no longer any macro to determine if an external attribute is compressed. I could provide a function to do it if we need it but in all the cases where I see it being used we already needed to extract the fields of the toast pointer anyways, so it wouldn't make sense.

2) We check if a toasted value is replaced with a new one by memcmp'ing the entire toast pointer. We used to just compare valueid and relid, so this means we're now comparing extsize and rawsize as well. I can't see how they could vary for the same valueid though so I don't think that's actually changing anything.

Remaining things in the back of my mind though:

. I'm not 100% confident of the GIST changes. I think I may have changed a few too many lines of code there. It does pass all the contrib regressions though.

. I think there may be some opportunities for optimizing heaptuple.c. In particular the places where it uses VARSIZE_ANY where it knows some of the cases are impossible. It may not make a difference due to branch prediction though.

. I've left the htonl/ntohl calls in place in the BIG_ENDIAN #if-branch. They're easy enough to remove and it leaves us the option of removing the #if entirely and just using network-byte-order instead of the #ifdef.

. I'm a bit worried about modules building against postgres.h without including the config.h include file. Is that possible? Is it worth worrying about? I think I could make it fail to build rather than crash randomly.

. One of the regression tests I wrote makes a datum of 1/2Meg. Can all the build-farm machines handle that? I don't really need anything so large for the regression but it was the most convenient way to make something which after compressing was large enough to toast externally. It might be better to find some less compressible data. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
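[Editorial note] The unaligned toast-pointer access described above boils down to a memcpy into an aligned local struct, roughly like this sketch. The field names are illustrative, not the actual struct layout in tuptoaster.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the unaligned access pattern: the on-disk toast pointer is
 * stored with no alignment padding, so rather than casting the raw
 * bytes (which can trap on strict-alignment machines), memcpy them into
 * a properly aligned local struct and read the fields from that. */
typedef struct
{
    uint32_t rawsize;     /* original uncompressed data size */
    uint32_t extsize;     /* size as stored in the toast table */
    uint32_t valueid;     /* OID identifying the toast value */
    uint32_t toastrelid;  /* OID of the owning toast table */
} toast_pointer;

static toast_pointer fetch_toast_pointer(const unsigned char *packed)
{
    toast_pointer p;
    memcpy(&p, packed, sizeof p);   /* legal for any source alignment */
    return p;
}
```

Dropping the alignment padding before the pointer is where the 3-6 bytes per external datum come from; the memcpy is the price paid on access.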
Re: [PATCHES] guc patch: Make variables fall back to default values
Peter Eisentraut [EMAIL PROTECTED] writes: Joachim Wieland wrote: Attached is the long-awaited guc patch that makes values fall back to their default values when they got removed (or commented) from the configuration file. This has always been a source of confusion. I have applied your patch with some of the discussed corrections. The use of memory allocation in the GUC code might still need some review. This is a pretty major user-visible change. While I'm strongly in favour of it it seems like there ought to be some documentation file touched by this, no? Or have I missed it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] guc patch: Make variables fall back to default values
Tom Lane [EMAIL PROTECTED] writes: It's a release-note item ... assuming that it doesn't get reverted in the near future. Could we have some attention to the all-red buildfarm? It's not just a bug. There's code missing. The code seems to assume that all custom variables are strings. There are about half a dozen Assert(variable->vartype == PGC_STRING) throughout the patch. That's not true, plperl's use_strict is a boolean and we have DefineCustom*Variable functions for each type of variable. Perl bombs because plperl.use_strict is a boolean. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PATCHES] Updated Packed Varlena patch v20 (final?)
Updated patch: I went through the high traffic areas and levelled them up to using the new macros to avoid detoasting smaller arguments. The key areas are really btree operators since they have a noticeable effect on sorts, especially index builds, when the data being sorted fits in memory. There is a bit of a name-game here. The macros I made are called VARDATA_ANY(datum), VARSIZE_ANY(datum), and VARSIZE_ANY_EXHDR(datum). And the datatype macros are, for example, PG_GETARG_TEXT_PP() and DatumGetTextPP() -- short for packed pointer. Maybe not the prettiest names in the world but clear, and I've found them pretty easy to spot when I'm looking for inconsistent use of VARSIZE_ANY/VARDATA for example. I thought of PVARSIZE/PVARDATA (for packed) but I'm afraid it would camouflage such cases. Anyone have any better ideas? If not I'm satisfied with them... Except maybe VARSIZE_ANY_EXHDR() which seems too long. I got to almost everything in varlena.c and varchar.c so that includes text, bpchar, and bytea as well as varchar's few procedures. That includes probably more than I really needed to, but as long as the datatypes are working with bytes it's convenient enough. Also, I replaced my tweaks to hstore and pg_trgm with Teodor's. And of course updated CVS. http://community.enterprisedb.com/varlena/varvarlena-20.patch.gz I'm going to be putting this aside for a little while. I think it's really done. There's more that can be done of course, hit inet and numeric with the new macros, for instance. But I want to see what happens when it gets reviewed before I do that kind of bookkeeping. One thing that I've left in there again is the htonl/ntohl macros in the big-endian case. It really makes sense to either remove them or remove the #ifdef. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] LIMIT/SORT optimization
Bruce Momjian [EMAIL PROTECTED] writes: Is there a newer version of this patch? As requested, I've cut an updated version of this patch against CVS HEAD: http://community.enterprisedb.com/sort-limit-v5.patch.gz The open issues I see are:

. Adjusting the costing of the sort to take into account the optimization
. Whether the communication between the Limit node and the Sort node is kosher or whether something more abstract is needed.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] LIMIT/SORT optimization
Gregory Stark [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: Is there a newer version of this patch? As requested, I've cut an updated version of this patch against CVS HEAD: http://community.enterprisedb.com/sort-limit-v5.patch.gz Someone asked why I've been posting links rather than attachments. The only reason was because when I posted patches in the past they were dropped by the mailing list. I would say refused except I never received a bounce, the messages just never appeared on list or in the archive. I'll try attaching this patch again, which is relatively small compared to the recursive query patches and packed varlena patches which disappeared into the ether. Also, this one is gzipped whereas in the past I usually attached patches uncompressed so people could read them without saving and uncompressing them. Perhaps one of those differences is the source of the problem? Do people prefer receiving attachments or downloadable links? Does the answer change if the patches are quite large? sort-limit-v5.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] patch adding new regexp functions
Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic constants used throughout the source code. I'm surprised there isn't a macro for it though. Do you need the alignment? If so I want to check the code against the packed varlena patch. Just in case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] patch adding new regexp functions
Gregory Stark [EMAIL PROTECTED] writes: Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic constants used throughout the source code. I'm surprised there isn't a macro for it though. Do you need the alignment? If so I want to check the code against the packed varlena patch. Just in case. Ah, it's just to construct an array, that's not a concern at all. And you're detoasting the text data types before using or storing them so that's fine. The only thing I would say is that this should maybe be a TextGetDatum() just for code hygiene. It should be exactly equivalent though: + PointerGetDatum(matchctx-orig_str), -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] LIMIT/SORT optimization
Did Greg push a version which didn't carry the WIP label to it? As far as I remember he was still asking how to make the Sort and Limit nodes communicate. Good question. I asked for a new version of this patch and the WIP was only in the email subject line. Greg, is this ready for review? Well my question was whether it was acceptable to do things this way or whether there was a better way. If it's the right way then sure, if not then no. I guess that's what review is for. In short I'm not sure what constitutes ready for review. Are you asking if it's ready to apply? I don't know, that's why we have reviews. Or are you asking if it's ready for someone to look at? What's the point of posting WIP patches if you don't want someone to look at it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction. ... and do what with the knowledge? Not remove tuples, because any such tuples would be part of RECENTLY_DEAD update chains that that xact might be following now or in the future. Well that just means it might require extra work, not that it would be impossible. Firstly, some tuples would not be part of a chain and could be cleaned up anyways. Others would be part of a HOT chain which might make it easier to clean them up. But even for tuples that are part of a chain there may be solutions. We could truncate the tuple to just the MVCC information so subsequent transactions can find the head. Or we might be able to go back and edit the forward link to skip the dead intermediate tuples (and somehow deal with the race conditions...) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Improvement of procArray.xmin for VACUUM
I have a question about what would happen for a transaction running a command like COPY FROM. Is it possible it would manage to arrange to have no live snapshots at all? So it would have no impact on concurrent VACUUMs? What about something running a large pg_restore? Tom Lane [EMAIL PROTECTED] writes: On the whole though I think we should let this idea go till 8.4; I tend to agree but for a different reason. I think it's something that will open the doors for a lot of new ideas. If we put it in CVS HEAD early in 8.4 I think (or hope at any rate) we'll think of at least a few new things we can do with the new more precise information it exposes. Just as an example, if you find you have no live snapshots can you throw out the combocid hash? Any tuple you find with a combocid that's been discarded that way must predate your current scan and therefore is deleted for you. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Numeric patch to add special-case representations for 8 bytes
Bruce Momjian [EMAIL PROTECTED] writes: Greg, do you want to submit a patch for this or add a TODO item for this? Well I never got any approval or rejection in principle so I don't know if such a patch would be accepted even if it were implemented reasonably. If it has the consensus needed to be a TODO item I would go ahead and do it. The main design issue is that I was proposing to make it impossible to access the internals of the numeric storage using macros. Currently some of the data (the sign, dscale, and weight) is visible without having to call any special numeric functions. I was proposing to use representations where those might not be as easily accessible. However I don't think we have any consumers of that data outside of numeric.c anyways. Is there anything using that functionality currently? Do we mind losing it?

--- Gregory Stark wrote:

I've uploaded a quick hack to store numerics in 8 bytes when possible. http://community.enterprisedb.com/numeric-hack-1.patch This is a bit of a kludge since it doesn't actually provide any interface for external clients of the numeric module to parse the resulting data. Ie, the macros in numeric.h will return garbage. But I'm not entirely convinced that matters. It's not like those macros were really useful to any other modules anyways since there was no way to extract the actual digits. The patch is also slightly unsatisfactory because, as I discovered, numbers like 1.1 are stored as two digits currently. But it does work and it does save a substantial amount of space for integers.

postgres=# select n,pg_column_size(n) from n;
    n    | pg_column_size
---------+----------------
       1 |  2
      11 |  2
     101 |  2
    1001 |  3
   10001 |  9
      11 |  9
     1.1 |  9
    10.1 |  9
   100.1 |  9
  1000.1 |  9
     1.1 | 11
    10.1 | 11

I had hoped to get the second batch to be 3-4 bytes. But even now note how much space is saved for the small integers. 
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum
Heikki Linnakangas [EMAIL PROTECTED] writes: But what surprises me is that response times went up a with the patch. I don't know why. Maybe because of increased contention of ProcArrayLock? (I assume you are using that, althought I haven't seen the patch) I am, but I doubt that's it. The response times are dominated by I/O, so any increase in lock contention would hardly show up. And the patch is only adding one GetOldestXmin call every 1000 scanned pages, which is nothing compared to the thousands of GetSnapshot calls the normal transactions are issuing per minute. The patch must have changed the I/O pattern in some subtle way. So are you stopping work on the patch? I assume so. Yes, at least for now. I can't believe the patch actually hurts performance, but I'm not going to spend time investigating it. Isn't this exactly what you would expect? It will clean up more tuples so it'll dirty more pages. Especially given the pessimal way vacuum's dirty buffers are handled until Simon's patch to fix that goes in. The benefit of the patch that we would expect to see is that you won't need to run VACUUM as often. In the long term we would expect the stock table to grow less too but I doubt these tests were long enough to demonstrate that effect. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [PATCH] add CLUSTER table ORDER BY index
Holger Schurig [EMAIL PROTECTED] writes: * psql tab-completion, it favours now CLUSTER table ORDER BY index It occurs to me (sorry that I didn't think of this earlier) that if we're going to use ORDER BY it really ought to take a list of columns. It would be more consistent with what ORDER BY means in queries and one day we may want to add support for ordering by arbitrary lists of columns even if no index exists. It may be reasonable to allow both to coexist and just have an arbitrary rule that if an index of the specified name exists, it takes precedence over any columns. I've never seen anyone name their indexes in a way that would conflict with a column anyways. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] LIMIT/SORT optimization
Heikki Linnakangas [EMAIL PROTECTED] writes: Some comments on the patch below. Thanks! Gregory Stark wrote: The comment claims that we use heap sort when the user says he doesn't want to use glibc's qsort. I recall that we always use our own qsort implementation nowadays. And we never used the heap sort as a qsort replacement, did we? Thanks, I had a version that used heap sort instead of qsort but that was before I discovered what you said. So I stripped that useless bit out. In performsort, you convert the in-memory heap to a sorted list in one go. I wonder if it would be better to switch to a new TSS_ALLINHEAP state that means all tuples are now in the in-memory heap, and call tuplesort_heap_siftup in gettuple. The problem is that the heap is backwards. The head of the heap is the greatest, ie, the last element we want to return. Hm, Is there such a thing as a two-way heap? There's a few blocks of code surrounded with #if 0 - #endif. Are those just leftovers that should be removed, or are things that still need to finished and enabled? Uhm, I don't remember, will go look, thanks. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
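[Editorial note] For reference, the bounded sort under discussion can be sketched as follows, over plain ints rather than tuples; this is a standalone illustration, not the tuplesort.c code. The k smallest values seen so far live in a max-heap, so the root is the *largest* value still being kept; that is exactly why "the heap is backwards" and draining it yields the last result first.

```c
#include <assert.h>
#include <stddef.h>

/* Bounded top-k heap: keep the k smallest values in a max-heap.  A new
 * value enters only if it beats the root (the current worst kept
 * value), so memory stays O(k) no matter how large the input is. */
typedef struct { int *v; size_t n, cap; } boundheap;

static void siftdown(boundheap *h, size_t i)
{
    for (;;)
    {
        size_t l = 2 * i + 1, r = l + 1, big = i;
        if (l < h->n && h->v[l] > h->v[big]) big = l;
        if (r < h->n && h->v[r] > h->v[big]) big = r;
        if (big == i) return;
        int tmp = h->v[i]; h->v[i] = h->v[big]; h->v[big] = tmp;
        i = big;
    }
}

static void bound_insert(boundheap *h, int x)
{
    if (h->n < h->cap)                  /* heap not full yet: sift up */
    {
        size_t i = h->n++;
        h->v[i] = x;
        while (i > 0 && h->v[(i - 1) / 2] < h->v[i])
        {
            int tmp = h->v[i];
            h->v[i] = h->v[(i - 1) / 2];
            h->v[(i - 1) / 2] = tmp;
            i = (i - 1) / 2;
        }
    }
    else if (x < h->v[0])               /* beats the current worst */
    {
        h->v[0] = x;
        siftdown(h, 0);
    }
    /* else: x cannot be in the top k, discard it immediately */
}
```

Producing sorted output means repeatedly popping the root (largest first) and then reversing, or converting the heap to a sorted list in one pass at performsort time as the patch does.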
Re: [PATCHES] LIMIT/SORT optimization
Updated patch attached:

1) Removes #if 0 optimizations
2) Changes #if 0 to #if NOT_USED for code that's there for completeness and self-documenting purposes but isn't needed by anything currently
3) Fixed cost model to represent bounded sorts

sort-limit-v7.patch.gz Description: Binary data

Gregory Stark [EMAIL PROTECTED] writes: Heikki Linnakangas [EMAIL PROTECTED] writes: There's a few blocks of code surrounded with #if 0 - #endif. Are those just leftovers that should be removed, or are things that still need to finished and enabled? Uhm, I don't remember, will go look, thanks. Ok, they were left over code from an optimization that I decided wasn't very important to pursue. The code that was ifdef'd out detected when disk sorts could abort a disk sort merge because it had already generated enough tuples to satisfy the limit. But I never wrote the code to actually abort the run and it looks a bit tricky. In any case the disk sort use case is extremely narrow, you would need something like LIMIT 5 or more to do it and it would have to be an input table huge enough to cause multiple rounds of merges. I think I've figured out how to adjust the cost model. It turns out that it doesn't usually matter whether the cost model is correct since any case where the optimization kicks in is a case where you're reading a small portion of the input, so it's a case where an index would be *much* better if available. So the only times the optimization is used is when there's no index available. Nonetheless it's nice to get the estimates right so that higher levels in the plan get reasonable values. I think I figured out how to do the cost model. At least the results are reasonable. I'm not sure if I've done it the right way though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Blocked post
Magnus Hagander [EMAIL PROTECTED] writes: Plaintext patch, plus two plaintext .c files. Then I have no clue. Actually I've had more success with .tar.gz than plain text attachments. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Packed Varlena Update (v21)
Tom Lane [EMAIL PROTECTED] writes: I wrote: stark [EMAIL PROTECTED] writes: [ packed varlena patch ] Applied with revisions. Forgot to mention: one of the revisions was to not add the sizes.sql test, because the output was platform-dependent and is likely to get more so if any ability to change the toast thresholds gets put in. Can we improve that test to not expose any platform-dependent numbers? I had imagined that we would have two versions of output files. Other than 32-bit versus 64-bit what other platform-dependent variations enter into it? I'll look at it again on a 64-bit machine, but probably not for another week. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] non-recursive WITH clause support
David Fetter [EMAIL PROTECTED] writes: On Mon, Apr 09, 2007 at 06:43:08PM -0400, Tom Lane wrote: So this does not really add any new functionality, it's just variant syntax for something you can do about as easily without it, right? Not totally as easily. For example, you can do some kinds of aggregation with a few fewer keystrokes. I think fewer keystrokes is exactly what Tom meant by a variant syntax without new functionality. That's an accurate description. I suppose it depends in part on how important we think it is to add variant syntaxes just because they're blessed by the ANSI standard. If this were a syntax we were creating just for our user's convenience it would be a pretty weak justification for an incompatibility. But if there are users who expect this syntax to work because it's standard then it could be considered an omission in our standards compliance. I'm actually not too sure what the answer is. I hadn't heard of it before the discussion about recursive queries myself. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Packed Varlena Update (v21)
Bruce Momjian [EMAIL PROTECTED] writes: Added to TODO: o Allow single-byte header storage for arrays FWIW, this is single-byte header storage for varlena array *elements*. The arrays themselves already get the packed varlena treatment. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
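The space saving being described can be sketched with a toy model. This is illustrative only: the threshold and layout below are assumptions for the sketch, not the actual VARATT_* encoding in postgres.h.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of packed varlena headers (NOT the real VARATT macros):
 * a payload that fits the short form gets a 1-byte length header
 * instead of the traditional 4-byte one, so an array whose elements
 * are mostly short strings saves 3 bytes per element. */
static size_t header_size(size_t payload)
{
    return payload <= 126 ? 1 : 4;   /* threshold is illustrative */
}

static size_t packed_size(size_t payload)
{
    return header_size(payload) + payload;
}

static size_t array_savings(size_t nelems, size_t payload)
{
    /* bytes saved versus giving every element a 4-byte header */
    return nelems * (4 - header_size(payload));
}
```

For a million-element array of short text values that is roughly 3 MB saved before compression even enters the picture.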
Re: [PATCHES] Dead Space Map version 3 (simplified)
Hiroki Kataoka [EMAIL PROTECTED] writes: I think there is no problem. Bloating will make pages including the unnecessary area which will not be accessed. Soon, those pages will be registered into DSM. Except the whole point of the DSM is to let us vacuum those pages *before* that happens... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] LIMIT/SORT optimization
Tom Lane [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: I did some performance testing of the patch, and the results were good. I did this:

	test=> CREATE TABLE test (x INTEGER);
	test=> INSERT INTO test SELECT * FROM generate_series(1, 100);
	test=> SET log_min_duration_statement = 0;
	test=> SELECT * FROM test ORDER BY x LIMIT 3;

LIMIT 3 seems an awfully favorable case; if the patch can only manage a factor of 4 speedup there, what happens at limit 10, 20, 100?

I did a whole bunch of benchmarking and spent a pile of time gathering numbers in gnumeric to make some pretty graphs. Then gnumeric crashed :( It's days like this that make me appreciate our own code review process even if I'm usually wishing it were a bit looser. Until I gather all those numbers together again I'll just describe the executive summary.

I found that the limit of a factor of 4 improvement was mostly an artifact of the very narrow and cheap-to-sort keys. When more of the time is spent pushing around large tuples or in even moderately expensive comparisons the speedup is much greater.

When I tested with narrow tuples consisting of only a single integer I found that the maximum speedup was, as Bruce found, about 4x. I think what's going on is that the overhead in the linear factor of just reading in all those tuples is large enough to hide the improvements in sorting. Especially since with such narrow tuples the resulting data is just too small to overflow filesystem cache where the big benefits come.

When I tested with text strings ranging from 0-97 characters long in locale en_GB.UTF-8 I found that the speedup kept going up as the input data set grew. Sorting the top 1,000 strings from an input set of 1,000,000 strings went from 8m11s in stock Postgres to 12s using the bounded heapsort. (Which was an even better result than I had prior to fully randomizing the data. It probably just got packed better on disk in the source table.)
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
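The bounded ("top-N") heapsort being benchmarked keeps only the N best tuples seen so far in a heap, turning an O(M log M) sort into O(M log N). A minimal sketch on plain integers, keeping the N smallest values in a max-heap; this is a hypothetical helper for illustration, not tuplesort.c's implementation:

```c
#include <assert.h>
#include <stddef.h>

#define MAXN 16                 /* toy capacity; bound must be <= MAXN */

typedef struct { int vals[MAXN]; int n; int bound; } TopN;

/* restore the max-heap property downward from index i */
static void sift_down(TopN *h, int i)
{
    for (;;) {
        int largest = i, l = 2*i + 1, r = 2*i + 2;
        if (l < h->n && h->vals[l] > h->vals[largest]) largest = l;
        if (r < h->n && h->vals[r] > h->vals[largest]) largest = r;
        if (largest == i) return;
        int tmp = h->vals[i]; h->vals[i] = h->vals[largest]; h->vals[largest] = tmp;
        i = largest;
    }
}

static void topn_add(TopN *h, int v)
{
    if (h->n < h->bound) {
        /* heap not full yet: append and sift up */
        int i = h->n++;
        h->vals[i] = v;
        while (i > 0 && h->vals[(i-1)/2] < h->vals[i]) {
            int p = (i-1)/2;
            int tmp = h->vals[i]; h->vals[i] = h->vals[p]; h->vals[p] = tmp;
            i = p;
        }
    } else if (v < h->vals[0]) {
        /* new value beats the current worst kept value: replace the root */
        h->vals[0] = v;
        sift_down(h, 0);
    }
}

/* feed n values and return the heap root (largest of the kept set) */
static int root_after(const int *xs, int n, int bound)
{
    TopN h = { {0}, 0, bound };
    for (int i = 0; i < n; i++)
        topn_add(&h, xs[i]);
    return h.vals[0];
}
```

Values that can't beat the current worst are discarded in O(1) without ever being copied, which is why the win grows with input size and comparison cost, exactly as the text-string benchmark shows.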
[PATCHES] updated SORT/LIMIT patch
Updated patch against cvs update in case it makes applying easier. One minor change: Added #include <limits.h> in tuplesort.h to pull in UINT_MAX (thanks to dpage for noticing this is necessary on OSX). sort-limit-v8.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] updated SORT/LIMIT patch
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Updated patch against cvs update in case it makes applying easier. Applied with revisions --- notably, I avoided adding any overhead to HEAPCOMPARE() by the expedient of reversing the logical sort order before heapify'ing. We couldn't have done that before the NULLS_FIRST patch went in, but now it's trivial to make the sort order reverse fully. Hum. The major change I see is the bit related to rescans where you made it resort if the bound had changed. But surely the only way the bound can change is if it's a parameter, and if there is a parameter then surely the executor must be doing more than just a plain rescan? The sort key could have changed if it depends on the parameter. What does the executor do differently in the case of a subplan with a parameter that makes it re-execute the plan from scratch and not just do a simple rescan? Since you didn't include any documentation patch for the optimize_bounded_sort GUC variable, I assumed it was meant only for debugging and hid it behind #ifdef DEBUG_BOUNDED_SORT. Sure, I originally had it #ifdef'd on TRACE_SORT but took it out for reasons that I don't recall. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
[PATCHES] Doc and comment patch for packed varlena
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/include/postgres.h,v
retrieving revision 1.80
diff -c -r1.80 postgres.h
*** src/include/postgres.h	4 May 2007 02:01:02 -	1.80
--- src/include/postgres.h	9 May 2007 15:38:37 -
***************
*** 235,240 ****
--- 235,246 ----
   * use VARSIZE_ANY/VARSIZE_ANY_EXHDR/VARDATA_ANY. The other macros here
   * should usually be used only by tuple assembly/disassembly code and
   * code that specifically wants to work with still-toasted Datums.
+  *
+  * WARNING: It is only safe to use VARDATA_ANY() -- typically with
+  * PG_DETOAST_DATUM_UNPACKED() -- if you really don't care about the alignment.
+  * Either because you're working with something like text where the alignment
+  * doesn't matter or because you're not going to access its constituent parts
+  * and just use things like memcpy on it anyways.
   */
  #define VARDATA(PTR)		VARDATA_4B(PTR)
  #define VARSIZE(PTR)		VARSIZE_4B(PTR)
***************
*** 265,270 ****
--- 271,277 ----
  	 VARSIZE_4B(PTR)-4))

  /* caution: this will not work on an external or compressed-in-line Datum */
+ /* caution: this will return a possibly unaligned pointer */
  #define VARDATA_ANY(PTR) \
  	(VARATT_IS_1B(PTR) ? VARDATA_1B(PTR) : VARDATA_4B(PTR))

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
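The "possibly unaligned pointer" warning comes down to this: packed 1-byte-header data starts at an odd offset, so fixed-width fields inside it must be copied out with memcpy rather than dereferenced. A toy illustration (plain byte buffer, not real varlena layout):

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

/* Read an int32 from a pointer that may not be 4-byte aligned.
 * memcpy is safe regardless of alignment; a direct cast-and-deref
 * like *(int32_t *)p is undefined behavior on strict-alignment
 * platforms (and a trap on some). */
static int32_t read_unaligned_int32(const unsigned char *p)
{
    int32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* store v at an arbitrary (possibly odd) offset and read it back */
static int32_t roundtrip_at_offset(int32_t v, int offset)
{
    unsigned char buf[16] = {0};
    memcpy(buf + offset, &v, sizeof(v));
    return read_unaligned_int32(buf + offset);
}
```

That is exactly why the comment says VARDATA_ANY() is fine for text (bytes are bytes) or for code that only ever memcpy's the payload, but not for code that casts the pointer to a struct and reads fields in place.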
[PATCHES] Concurrent psql patch
Fixed the major omissions that made it incomplete.

- Added sgml documentation and \? usage
- Added basic mvcc regression tests using new functionality
- Fixed cursor-mode (FETCH_COUNT) functionality
- Removed \cwait in favour of psql variable ASYNC_DELAY

I'm still not sure it's quite polished enough to commit but if there's any feedback I'll be happy to fix up anything that's not acceptable. Also, if anyone has any better ideas for names than \cswitch and \cnowait, now's the time. I had intended them only as placeholders because I couldn't think of anything better but it doesn't sound like anyone else has any better ideas either. If not then we're going to be stuck with them. More or less, it's explicitly described as an experimental feature in the docs so I suppose we could always change them later. concurrent-psql-v7.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] updated WIP: arrays of composites
Tom Lane [EMAIL PROTECTED] writes: * I'm a bit concerned about dump order. If a user wants to create types named foo and _foo, he can, but it will only work if he makes _foo first --- else the derived type for foo is in the way. Since pg_dump has no clue about that constraint, it might easily dump foo first leading to an unrestorable dump. The most usable solution would be to auto-rename previously created array types, but I dunno how implementable that would be. BTW, why exactly do we need array types to have names at all? The only user-visible way to refer to these types is always by foo[] isn't it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Jim Nasby [EMAIL PROTECTED] writes: I don't see how we could make the names shorter without moving away from a backslash command (which I'm guessing would be painful). Assuming we're stuck with a backslash command \cs[witch] and \cn[owait] seem to be about as good as we could get. I don't have \cs or \cn set up as abbreviations. I was originally thinking \c1, \c2, ... for \cswitch and \c for \cnowait. I'm not sure if going for cryptic short commands is better or worse here. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
David Fetter [EMAIL PROTECTED] writes: I was originally thinking \c1, \c2, ... for \cswitch and \c for \cnowait. I'm not sure if going for cryptic short commands is better or worse here. +1 for \c1, \c2, etc. What's the reasoning behind \c? Does it send things into the background the way & does in the shell? Sort of. It sends the *subsequent* command to the background... And unlike the shell you can't usefully do anything more in the current session while the command is in the background; you have to manually switch sessions before issuing subsequent commands. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Jim C. Nasby [EMAIL PROTECTED] writes: On Sun, May 13, 2007 at 02:39:45PM +0100, Gregory Stark wrote: I was originally thinking \c1, \c2, ... for \cswitch and \c for \cnowait. I'm not sure if going for cryptic short commands is better or worse here. \c sounds fine (as do \c1...\c9). \g& would probably be helpful as well (send query buffer to server in nowait mode). Er, I just realized I typed the wrong thing there. It can't be \c since I do assign a meaning to that: make a new connection to the same place as this one. I meant \&. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: David Fetter [EMAIL PROTECTED] writes: What's the reasoning behind \c? Does it send things into the background the way & does in the shell? Sort of. It sends the *subsequent* command to the background... That sounds just bizarre. Existing backslash commands that do something to a SQL command are typed *after* the command they affect (\g for instance). I don't think you should randomly change that. So would you prefer \g& as Jim Nasby suggested? I hadn't even considered that previously since I'm not accustomed to using \g but it does seem kind of pretty. I normally use ; but I suppose there's nothing wrong with just declaring that asynchronous commands must be issued using \g& rather than using the semicolon to fire them off. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [DOCS] OS/X startup scripts
Tom Lane [EMAIL PROTECTED] writes: David Fetter [EMAIL PROTECTED] writes: I haven't included the customary diffs. This points me to some of the many deficiencies of CVS, namely that I would need write access in order to have it create a diff, Strange, it works fine for everyone else. If you have rsh/ssh access to a CVS repository then you do in fact need write access just to generate diffs. It is one of the annoyances of CVS but it doesn't really matter for Postgres where we use pserver anonymous access. Personally I find CVS so terribly slow for large trees like Postgres that it's essential to use rsync to maintain a local CVS repository. That makes 'cvs diff' remarkably fast. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [DOCS] OS/X startup scripts
Heikki Linnakangas [EMAIL PROTECTED] writes: You need write-access to add files, even on anonymous server. We often get patches with new files as separate attachments because of that. Oh quite right. I had forgotten but that was the original reason I switched to using rsync. The alternative is to manually edit the Entries files to list the new files. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: So would you prefer \g& as Jim Nasby suggested? I hadn't even considered that previously since I'm not accustomed to using \g but it does seem kind of pretty. I normally use ; but I suppose there's nothing wrong with just declaring that asynchronous commands must be issued using \g& rather than using the semicolon to fire them off. It makes sense to me... but what is the state of the session afterward? Should this be combined with switching to another connection? It's an interesting idea since you'll inevitably have to switch connections. If you issue a second query it'll force the session to wait for the results. (It doesn't seem like there's any point in keeping a queue of pending queries per session.) However we do still need a command to switch back anyways so there doesn't seem to be any advantage in combining the two. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Jim C. Nasby [EMAIL PROTECTED] writes: Would \c# limit us to 9 concurrent connections? Might want \cs[witch] [session] Hm, we kind of have a choice with \c#. Either we treat it as part of the command, in which case the way to connect to an integer-named database is to include a space. We could even have it magically connect to a database if the connection isn't already active. But these kinds of inconsistent behaviours can be traps for users. It means \c1 and \c 1 do different things even though \cpostgres and \c postgres do the same thing. And it means \c1 might connect to a database named 1 today but switch sessions tomorrow. Or we treat it as the first argument, in which case even \c 9 switches to session 9. I would prefer to do that but I fear there may be people with databases named 9. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Jim C. Nasby [EMAIL PROTECTED] writes: Since this command will be getting used very frequently by anyone using concurrent connections interactively, it'd be nice if it was lower-case. It looks like that limits us to j, k, m, n, v, and y. In unix this idea is about jobs, what about using \j? Well currently it's not really terribly interesting to use interactively since you could always just start a second shell and run a second instance of psql. I really only have regression tests in mind for it. That's why I don't find it a problem at all to only extend \g and not semicolon handling. That said, I think a next step for this for interactive use would be to handle C-z to background the currently running query. So perhaps it does make sense to keep use cases like that when deciding on command names now. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] updated SORT/LIMIT patch
Tom Lane [EMAIL PROTECTED] writes: [ greps a bit... ] It looks like the only way that you could expose the bug in the current state of the system would be if the sort/limit with the outer parameter were the inside of a nestloop join in the subplan. nodeNestloop would set EXEC_FLAG_REWIND, causing nodeSort to set randomAccess, allowing ExecReScanSort to suppose that it could rewind the sort.

I finally managed to trigger this case and found that the checks don't actually work:

postgres=# SELECT (SELECT n FROM (VALUES (1)) AS x, (SELECT n FROM generate_series(1,10) AS n ORDER BY n LIMIT 1 OFFSET s-1) AS y) AS z FROM generate_series(1,10) AS s;
ERROR: retrieved too many tuples in a bounded sort

What's going on is that nodeLimit.c only invokes recompute_limit when the first tuple is actually generated. It has a comment saying (We can't do this any earlier, because parameters from upper nodes may not be set until now.) So the checks are still comparing the previous bound against the boundDone.

Attached is a small patch which fixes this case. It also makes the check slightly more liberal -- we don't need to resort if the previous sort was unbounded or the bound was greater than or equal to the new bound.

There is one bit I'm not too sure of. We may or may not end up requesting tuples from our child node. If we do we have to ReScan it but by then we don't have the exprCtx passed to the ReScan call. I just made it call ReScan always even if we later decide we can just rewind the tuplesort, is that ok?

Also, I left a comment that it would be nice if we could peek at the tuplesort's boundUsed and state to avoid resorting unnecessarily. Currently it pretty much always resorts unless you construct a bizarre query like the above to force the randomAccess flag to be true. Most of the time tuplesort is going to sort in memory anyways even if random access isn't requested and resorting is pointless.
I think it would be worthwhile adding a method to tuplesort to ask whether random access is possible and how many tuples were actually kept. Then nodeSort could ask it those values instead of just remembering what values were requested. sortlimit-fix-v2.diff.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
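The "slightly more liberal" rescan check described above can be captured in a few lines. This is a hypothetical standalone helper restating the rule, not the patch itself: a rescan may reuse the previous result only if the previous sort was unbounded, or the old bound already covers the new one.

```c
#include <assert.h>
#include <stdbool.h>

/* Does a rescan of a (possibly bounded) sort need to re-sort its
 * input, given the bound used last time and the bound now requested?
 * Assumes the sort key and input are otherwise unchanged. */
static bool needs_resort(bool prev_bounded, long prev_bound,
                         bool new_bounded, long new_bound)
{
    if (!prev_bounded)
        return false;           /* previous sort kept every tuple */
    if (new_bounded && new_bound <= prev_bound)
        return false;           /* previous bound already covers the request */
    return true;                /* previous sort threw away tuples we now need */
}
```

The subtlety the bug report exposes is *when* this check runs: the bound is recomputed lazily (on first tuple), so comparing against a stale bound at rescan time gives the wrong answer.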
Re: [PATCHES] updated SORT/LIMIT patch
Alvaro Herrera [EMAIL PROTECTED] writes: Gregory Stark wrote: Attached is a small patch which fixes this case. It also makes the check slightly more liberal -- we don't need to resort if the previous sort was unbounded or the bound was greater than or equal to the new bound. Huh, can you clarify this comment:

+	 * XXX It would be nice to check tuplesortstate->boundUsed too but that
+	 * seems like an abstraction violation. And for that matter to check
+	 * the tuplesort to see if randomaccess is possible even if it wasn't
+	 * requested so we don't resort input when the parameters haven't
+	 * changed if it was sorted in memory.

I'm having serious trouble parsing it.

Well in the executor currently we check node->boundedDone and node->boundDone to see whether the previous execution of this node had a bound. If it did and we now don't or if it did but now our bound is larger then we have to re-execute.

However the tuplesort may have actually done an unbounded sort either because the bound (actually bound*2) may not have been reached or because it spilled to disk and we did a disk sort. It would be nice to check that and not have to reexecute unnecessarily. But that means peeking into tuplesort's state.

I started doing a patch yesterday to do this by exporting a new method on tuplesort:

bool
tuplesort_random_access(Tuplesortstate *state, bool bounded, unsigned tuples_needed)
{
	switch (state->status)
	{
		case TSS_FINALMERGE:
			return false;
		case TSS_SORTEDONTAPE:
			return state->randomAccess;
		case TSS_SORTEDINMEM:
			return (!state->boundUsed ||
					(bounded && state->bound >= tuples_needed));
		default:
			return false;
	}
}

That solves the abstraction barrier issue but I'm not sure if it's quite detailed enough. Really Sort only needs to rewind the tuplesort which it can do even if state->randomAccess is false. We may need two separate functions, one which tests if rewind is supported and another which tests if random access is supported. Also, I haven't looked at how the final merge case is handled.
It might be possible to support rescan (but not random access) in that state as well. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [DOCS] OS/X startup scripts
David Fetter [EMAIL PROTECTED] writes: On Wed, May 16, 2007 at 09:12:23AM +0100, Heikki Linnakangas wrote: Jim C. Nasby wrote: BTW, is there some trick to getting cvs diff to ignore files that aren't in the repo? Trick? That's what it does by default. I suspect he's talking about all the lines starting with '?' that diff produces. Lacking sophistication, I've been known to do: cvs diff [list of files here] | grep -v '^?' > the_file.diff Those lines go to stderr. If you do cvs diff > file it spits out all the cvs file statuses to the terminal but dumps the diff to the file. It doesn't matter anyway; diffs can contain arbitrary junk between the file diffs. patch only looks at the things it recognizes. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
So based on the feedback and suggestions here this is the interface I suggest:

\connect   - to open a new connection keeping the existing one
\g&        - to submit a command asynchronously (like & in the shell)
\S [Sess#] - to _S_witch to a different _S_ession
             - if no connection # specified list available _S_essions
\D         - _D_isconnect from current session (like ^D in the shell)

This leaves no way to submit an asynchronous command without using \g& but I'm really not too concerned with that. I don't want to start messing with psql's semicolon parsing behaviour and I'm mainly only concerned with this for regression tests.

Another thought I had for the future is a \C command to simulate C-c and send a query cancel. That would let us have regression tests that query cancellation worked. The tests would presumably have to be written using pg_sleep() to ensure they ran for long enough but even then there would be no way to control exactly when the interrupt arrived.

Attached is an updated patch. I also found and fixed some missing ResetCancelConn()s. I think I got them all and the behaviour seems correct in practice when cancelling various combinations of synchronous queries, asynchronous queries, and backslash commands. The one thing I wonder about is that I'm a bit concerned I may have introduced an assumption about how many resultsets arrive from a single query.

I'll be offline for a few days but I'll be back Monday. concurrent-psql-v8.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] updated SORT/LIMIT patch
Tom Lane [EMAIL PROTECTED] writes: This patch makes what was already a hack into a full-fledged crock (and it's not just the self-doubting comments that make me distrust it). I think we need to rip out this ad-hoc parameter change signaling code and make it work through the regular chgParam mechanism. Not sure about details though. There may be no clean solution short of folding Sort and Limit into a single node type. Well I can't disagree, I always was concerned about the inter-node communication part. If I have power on the train I might look at it then but otherwise I'm offline until Monday. I think it would be worthwhile adding a method to tuplesort to ask whether random access is possible and how many tuples were actually kept. Then nodeSort could ask it those values instead of just remembering what values were requested. I disagree with this line of development, as it amounts to exposing tuplesort implementation details as API. I'm not sure I agree. What's the difference between using a boolean value we pass to tuplesort requesting random access and using a boolean value we get back from asking tuplesort? The tuples_needed is a bit of a wart but then it's the same inevitable wart as set_bound. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] updated SORT/LIMIT patch
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This patch makes what was already a hack into a full-fledged crock (and it's not just the self-doubting comments that make me distrust it). I think we need to rip out this ad-hoc parameter change signaling code and make it work through the regular chgParam mechanism. Not sure about details though. There may be no clean solution short of folding Sort and Limit into a single node type. Well I can't disagree, I always was concerned about the inter-node communication part. If I have power on the train I might look at it then but otherwise I'm offline until Monday. I did in fact have power on the train and due to the wonders of a local rsync'd CVS repository I could even view cvs logs and diffs offline. It's interesting how I've read all this code and comments several times and each time I get more out of it. It doesn't look like the timing of the ExecRescan is an issue at all. There are plenty of nodes that Rescan their children much later than when they first start up. Even Nested Loop does so. I do still need to read more about parameters and how the parameter sets are initially constructed. It would be nice to set it up so the sort node gets signalled using the existing infrastructure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] CREATE TABLE LIKE INCLUDING INDEXES support
Tom Lane [EMAIL PROTECTED] writes: NikhilS [EMAIL PROTECTED] writes: I just now realized that even though we allow the above. We do not allow: pg=# create table t1 (a int, b int, unique(a+b)); Any specific reason for this behaviour? It'd be contrary to SQL spec. The UNIQUE constraint takes a list of column names, full stop. Does the SQL spec actually specify what happens if you provide a non-compliant table definition like this? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] Concurrent psql patch
Andrew Dunstan [EMAIL PROTECTED] writes: Gregory Stark wrote: Attached is an updated patch. This patch appears to add a nonexistent test to the regression schedules. I must have forgotten to cvs add it. Sorry. Also, I forgot to mention previously there is an unrelated trivial hunk in here. I noticed we free the password early, presumably for security reasons, but don't actually overwrite the memory at that point. I added a memset in there, otherwise the free seems kind of pointless. I normally don't bother with security features like that since they don't really add any security but as long as it's there it may as well do something vaguely useful. concurrent-psql-v9.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
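The scrub-before-free idea mentioned above looks roughly like this. These are hypothetical helpers, not psql's actual code; note also that a plain memset immediately before free() can legally be elided by an optimizing compiler, and writing through a volatile pointer is one common way to keep the stores (platform-specific functions like explicit_bzero exist for exactly this reason).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Overwrite a NUL-terminated secret in place so the cleartext
 * doesn't linger in memory after we're done with it. */
static void scrub(char *secret)
{
    volatile char *p = secret;
    size_t len = strlen(secret);
    for (size_t i = 0; i < len; i++)
        p[i] = '\0';
}

/* Scrub then release -- freeing without scrubbing leaves the bytes
 * sitting in the heap for any later allocation (or core dump) to see. */
static void scrub_and_free(char *secret)
{
    if (secret != NULL)
    {
        scrub(secret);
        free(secret);
    }
}

/* observable wrapper for testing: scrub a stack copy and peek at it */
static char first_byte_after_scrub(void)
{
    char buf[] = "hunter2";
    scrub(buf);
    return buf[0];
}
```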
Re: [PATCHES] cluster test
Tom Lane [EMAIL PROTECTED] writes: This is in the regression database after a completed regression run, so it's possible that it's a bit different state from what's seen at the instant the cluster test was running, but it sure looks like the expected results are what you get from a seqscan. Would you force a seqscan and see what EXPLAIN shows as the cost on your machine? Perhaps this comes down to 64 vs 32 bit datums and alignments, and therefore different size tables, which, because the planner does an lseek to measure the table size, show up as different estimates for sequential scan costs? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
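The mechanism being suggested is simple: the seqscan estimate is linear in the measured on-disk size, so wider datums and stricter alignment on a 64-bit build mean more pages and a higher cost. A rough model (the constants mirror the default seq_page_cost of 1.0 and cpu_tuple_cost of 0.01; this is an illustrative sketch, not costsize.c):

```c
#include <assert.h>

/* Toy seqscan cost model: one unit per page plus a small per-tuple
 * CPU charge.  The page count is what the planner derives from the
 * file size it measures with lseek(), so a fatter on-disk layout
 * directly inflates this number. */
static double seqscan_cost(double pages, double tuples)
{
    const double seq_page_cost = 1.0;
    const double cpu_tuple_cost = 0.01;
    return pages * seq_page_cost + tuples * cpu_tuple_cost;
}
```

With the same tuple count, a table that needs more pages on one platform gets a strictly higher estimate, which is enough to flip a close seqscan-vs-indexscan decision between 32-bit and 64-bit builds.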
Re: [PATCHES] Seq scans status update
Tom Lane [EMAIL PROTECTED] writes: A point I have not figured out how to deal with is that in the patch as given, UnpinBuffer needs to know the strategy; and getting it that info would make the changes far more invasive. But the patch's behavior here is pretty risky anyway, since the strategy global at the time of unpin might have nothing to do with how it was set when the buffer was acquired. What I'm tempted to do is remove the special case there and adjust buffer acquisition instead (maybe make it decrement the usage_count when re-using a buffer from the ring). Is there a reason UnpinBuffer has to be the one to increment the usage count anyways? Why can't ReadBuffer handle incrementing the count and just trust that it won't be decremented until the buffer is unpinned anyways? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
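The alternative floated in the question can be sketched in miniature: credit usage_count at pin time (the ReadBuffer side) so the unpin path needs no strategy knowledge at all. Toy buffer descriptor, not bufmgr.c; the cap of 5 matches the usual clock-sweep maximum but is an assumption here.

```c
#include <assert.h>

#define MAX_USAGE_COUNT 5       /* clock-sweep saturation point (assumed) */

typedef struct { int pins; int usage_count; } Buf;

static void pin_buffer(Buf *b)
{
    b->pins++;
    if (b->usage_count < MAX_USAGE_COUNT)
        b->usage_count++;       /* usage credited when the buffer is acquired */
}

static void unpin_buffer(Buf *b)
{
    b->pins--;                  /* no usage_count bookkeeping needed here */
}

/* pin and release a buffer npins times; return the resulting count */
static int usage_after(int npins)
{
    Buf b = {0, 0};
    for (int i = 0; i < npins; i++) { pin_buffer(&b); unpin_buffer(&b); }
    return b.usage_count;
}
```

The trade-off is exactly the one the question raises: the count is bumped before we know the access "completed", but since the buffer can't be evicted while pinned, counting at pin time is observationally very close to counting at unpin time.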
Re: [PATCHES] Concurrent psql patch
Tom Lane [EMAIL PROTECTED] writes: Andrew Dunstan [EMAIL PROTECTED] writes: if (pset.c->db->asyncStatus != PGASYNC_BUSY) { break; } There already is a defined API for this, namely PQisBusy(). In any case, I rather concur with the XXX comment: busy-waiting like this sucks. The correct way to do this is to get the socket numbers for the connections (via PQsocket), wait for any of them to be read-ready according to select() (or for the timeout to elapse, assuming that we think that behavior is good), then cycle through PQconsumeInput() and PQisBusy() on each connection. See http://www.postgresql.org/docs/8.2/static/libpq-async.html

Huh, so it turns out we already have code that does exactly this in pqSocketPoll and pqSocketCheck. Except that they have too little resolution because they work with time_t, which means we would have to wait at least 1-2 seconds. And pqSocketCheck keeps looping when it gets an EINTR, which doesn't seem like the right thing for psql to do.

It would be nice to use these functions though because: a) they get the SSL case right in that they check the SSL buffer before calling select/poll; b) they use poll if available and fall back to select; c) they would keep the select/poll system code out of psql, where there's none of it currently.

So would I be better off adding a PQSocketPollms() which works in milliseconds instead of seconds? Or should I just copy all this code into psql? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
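From the caller's side, a millisecond-resolution wait is straightforward because poll() already takes its timeout in milliseconds (unlike the time_t-based interface described above). A sketch of what such a helper could look like, exercised here on a pipe; this is a hypothetical illustration, not libpq code, and the descriptor would come from PQsocket() in the real case:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns >0 if readable, 0 on timeout, <0 on error (e.g. EINTR,
 * which the caller can then decide how to handle instead of looping). */
static int wait_readable_ms(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, timeout_ms);
}

/* demo: a pipe with a byte already written is immediately readable */
static int pipe_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    (void) write(fds[1], "x", 1);        /* make the read end ready */
    int ready = wait_readable_ms(fds[0], 100);
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

Note this plain-poll version does not cover the SSL-buffer check that makes the libpq-internal versions attractive; data already decrypted and buffered inside SSL would not wake a bare poll().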
Re: [PATCHES] Concurrent psql patch
Andrew Dunstan [EMAIL PROTECTED] writes: Unless you have a better solution I think we'd better do that, though. I notice that the header is also included in command.c even though it doesn't use PGASYNC_IDLE. Are you going to make these fixes? Yes, sorry, I started to already but got distracted by some tests I've been running. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
[PATCHES] Bug in date.c
I would be inclined to just go ahead and call textout which would effectively be pallocing a copy. Is there some reason these functions in particular shouldn't leak memory? I've attached both a patch that does that and a patch that just makes the minimal fix of calling textout when the error is thrown. Alternatively it might be handy to have a custom escape in errmsg format strings for text varlena data. date.c.patch.gz Description: Binary data date-minimal.c.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
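The underlying problem, in miniature: text datums carry a length count and no terminating NUL, so handing their payload straight to a %s-style error message reads past the end. Making a terminated copy first (which is effectively what calling textout does) fixes it. A hypothetical helper using malloc in place of palloc:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a NUL-terminated C string from a counted (non-terminated)
 * byte sequence, so it is safe to pass to printf-style APIs such as
 * an errmsg format string. */
static char *cstring_from_counted(const char *data, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);
    copy[len] = '\0';           /* the terminator the datum never had */
    return copy;
}
```

The "why not just leak it" question in the message follows naturally: in PostgreSQL the copy would be palloc'd in a short-lived memory context and an error is about to be thrown anyway, so the allocation is reclaimed with the context.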
Re: [PATCHES] Synchronized scans
Heikki Linnakangas [EMAIL PROTECTED] writes: Were you thinking of storing the PID of the backend that originally created the hint, or updating the PID every time the hint is updated? In any case, we still wouldn't know if there's other scanners still running. My reaction was that if you always put the pid in when updating the hint it might work out pretty well. When you finish, you remove the hint iff the pid is your own. It has exactly the same property you describe of leaving a small window with no hint when you finish a scan, but only if you were the last to update the scan position, which I would expect would be pretty rare except for the last scanner. If a backend died it would leave a scan position behind, but the next scanner on that table would overwrite the pid and then remove it when it's finished. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
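The bookkeeping being proposed is simple enough to sketch. The names and the struct here are hypothetical stand-ins for whatever shared-memory hint structure the patch actually uses:

```c
#include <assert.h>
#include <sys/types.h>

typedef unsigned int BlockNumber;

/*
 * Hypothetical per-table scan hint; the real thing would live in shared
 * memory, keyed by relation. "valid" stands in for the hint existing at all.
 */
typedef struct
{
	int			valid;
	BlockNumber block;
	pid_t		pid;
} ScanHint;

/* Every scanner stamps its own pid whenever it advances the hint. */
static void
ss_report_location(ScanHint *hint, BlockNumber block, pid_t pid)
{
	hint->valid = 1;
	hint->block = block;
	hint->pid = pid;
}

/* On finishing a scan, remove the hint iff we were the last to update it. */
static void
ss_scan_done(ScanHint *hint, pid_t pid)
{
	if (hint->valid && hint->pid == pid)
		hint->valid = 0;
}
```

With this scheme a crashed backend's stale hint survives only until the next scanner of that table stamps its own pid over it, exactly as described above.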
Re: [PATCHES] Synchronized scans
Tom Lane [EMAIL PROTECTED] writes: The vacuum-cost-limit issue may be sufficient reason to kill this idea; not sure. We already have a much higher cost for blocks that cause i/o than blocks which don't. I think if we had zero cost for blocks which don't cause i/o it would basically work unless the sleep time was so large that the other scans managed to cycle through the entire ring in that time. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Synchronized scans
Heikki Linnakangas [EMAIL PROTECTED] writes: I don't think sync-scanning vacuum is worth pursuing, though, because of the other issues: index scans, vacuum cost accounting, and the fact that the 2nd pass would be harder to synchronize. There's a lot of other interesting ideas for vacuum that are more generally applicable. I think we could probably arrange for vacuum to synchronize. If there's one sequential scan running we have to imagine there are others coming along soon too. So if we become desynchronized we'll just coerce the next one to start where we want and follow it for a while. However I have another worry. Even if we did manage to get vacuum synchronizing well, what would it do to sequential scan performance? Instead of zipping along reading clean blocks into its small ring buffer and discarding them when it's done, it'll suddenly find many of its blocks dirty when it goes to reuse them. Effectively we'll have just reinvented the problem we had with vacuum previously, albeit in a way which hits sequential scans in particular. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PATCHES] Two aesthetic bugs in the 1-byte packed varlena code
Korry spotted two violations of the rule about accessing the first varlena field directly which I had missed. In both cases I believe the accesses are actually safe because they follow an int4 column and the data are either explicitly detoasted or passed to textout which can handle 1-byte headers fine. One instance is in flatfiles.c when it accesses the password field which is a text field. Here I don't see any reasonable way to avoid it and think we should just document that we're bending the rules. The other instance is in inv_api.c where it would be quite possible to use fastgetattr() instead. But the column is always at the same fixed offset and again it follows an int4 so it'll always be 4-byte aligned and work fine. So for performance reasons perhaps we should keep this as well? I've attached three patches. A patch which adds a comment for flatfiles.c. And two patches for inv_api.c one which just adds a comment and one which converts it to use fastgetattr(). Again, credit to Korry for spotting these. To do so he commented out all the varlena fields (excluding int2vector and oidvector) from the struct definitions and saw what broke. Why do we even have those fields in the structs if they're unsafe to use? Perhaps we should #if 0 out all the unsafe fields from all the struct definitions? That would avoid one of the most common new programmer bugs and also let us document the few exceptions and why they're allowed. inv_api.c-minimal.gz Description: Binary data flatfiles.c-minimal.gz Description: Binary data inv_api.c-fastgetattr.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Two aesthetic bugs in the 1-byte packed varlena code
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Why do we even have those fields in the structs if they're unsafe to use? 1. genbki.sh But genbki.sh wouldn't care if we put #if 0 around the unsafe ones, would it? 2. As you note, they're not always unsafe to use. Well, it seems like it would be nice to mark which ones are and aren't unsafe rather than have them all be there waiting to trip people up. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Two aesthetic bugs in the 1-byte packed varlena code
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: The other instance is in inv_api.c where it would be quite possible to use fastgetattr() instead. But the column is always at the same fixed offset and again it follows an int4 so it'll always be 4-byte aligned and work fine. So for performance reasons perhaps we should keep this as well? After looking closer, I think there are worse problems here: the code is still using VARSIZE/VARDATA etc, which it should not be because the field could easily be in 1-byte-header form. Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so it'll detoast them first. We could avoid the detoasting, but given that it's expecting the chunks to be compressed anyways the memcpys of the smallest chunks probably don't matter much either way. I'm assuming it's like toast in that only the last chunk will be smaller than LOBLKSIZE anyways, right? I concur that we probably want to avoid adding fastgetattr for performance reasons, seeing that this table's layout seems unlikely to change. But it seems like we might want to take the trouble of a null test. Thoughts? There should never even be a null bitmap, right? Maybe we should just elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PATCHES] Silly bug in pgbench's random number generator
pgbench's random number generator was only generating the first and last values in the specified range half as often as the other values in the range. Not that it actually matters, but it may as well do what it claims. This line has a pretty long and sordid history with various people tweaking it one way and another.

cvs diff: Diffing contrib/pgbench
Index: contrib/pgbench/pgbench.c
===
RCS file: /home/stark/src/REPOSITORY/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.66
diff -u -r1.66 pgbench.c
--- contrib/pgbench/pgbench.c	24 May 2007 18:54:10 -	1.66
+++ contrib/pgbench/pgbench.c	14 Jun 2007 16:22:19 -
@@ -191,7 +191,7 @@
 static int
 getrand(int min, int max)
 {
-	return min + (int) (((max - min) * (double) random()) / MAX_RANDOM_VALUE + 0.5);
+	return min + (int) (((max - min + 1) * (double) random()) / MAX_RANDOM_VALUE);
 }

/* call PQexec() and exit() on failure */

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
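To see the skew concretely, here is a small standalone program (mine, not part of the patch) that runs evenly spaced raw values through both formulas for the range [1,3]. One caveat: I divide by MAX_RANDOM_VALUE + 1.0 rather than MAX_RANDOM_VALUE, since with the patched expression exactly as posted the single raw value random() == MAX_RANDOM_VALUE would appear to map to max + 1:

```c
#include <assert.h>

#define MAX_RANDOM_VALUE 0x7FFFFFFF		/* 2^31 - 1, as pgbench assumes */

/* Old formula: rounding to nearest gives min and max half-width buckets. */
static int
getrand_old(int min, int max, long long r)
{
	return min + (int) (((max - min) * (double) r) / MAX_RANDOM_VALUE + 0.5);
}

/*
 * Patched formula, except for the "+ 1.0" in the divisor (my tweak, noted
 * above) which keeps the very top raw value from landing on max + 1.
 */
static int
getrand_new(int min, int max, long long r)
{
	return min + (int) (((max - min + 1) * (double) r) / (MAX_RANDOM_VALUE + 1.0));
}

/* Sample the raw random() range at even intervals; count hits on target. */
static int
count_hits(int (*f) (int, int, long long), int target)
{
	int			hits = 0;

	for (long long r = 0; r <= MAX_RANDOM_VALUE; r += 100000)
		if (f(1, 3, r) == target)
			hits++;
	return hits;
}
```

Sampling this way, the old formula hits the middle value about twice as often as either endpoint, while the new one hits all three values about equally.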
Re: [PATCHES] Maintaining cluster order on insert
Heikki Linnakangas [EMAIL PROTECTED] writes: The reason for switching to the new API instead of the amsuggestblock API is CPU overhead. It avoids constructing the IndexTuple twice and descending the tree twice. I wonder if it's possible to finesse this. Have the suggestblock function remember the last block it suggested and somehow recognize when it's given the same tuple to index. Keeping the block pinned would still be the sticky point though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] 'Waiting on lock'
* Tom Lane ([EMAIL PROTECTED]) wrote: Yeah, I wouldn't want one per second. Do we already track how long we've been waiting? No, because we're *asleep*. You'd have to add an additional timeout-interrupt reason. Plus there's a ton of interesting questions about what's safe to do from an interrupt service routine. In fact, I am scandalized to see that someone has inserted a boatload of elog calls into CheckDeadLock since 8.2 --- that seems entirely unsafe. [ checks revision history... ] Attached is a patch which moves the messages to ProcSleep(). To do this I had to move the semaphore signal to unconditionally be signalled whenever CheckDeadLock() is called regardless of whether it finds a hard deadlock. I'm not 100% sure that's correct but afaik we only use semaphores to signal state changes and deal with spurious semaphore firings everywhere. Incidentally in looking at this I found that the early deadlock detection never seems to fire. Reading the source it seems it ought to be firing whenever we have a simple two-process deadlock. But instead I only get the timeout-based detection. checkpoint-log-messages-fix.patch.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Gregory Stark [EMAIL PROTECTED] writes: Incidentally in looking at this I found that the early deadlock detection never seems to fire. Reading the source it seems it ought to be firing whenever we have a simple two-process deadlock. But instead I only get the timeout-based detection. Ok, I understand now that early deadlock detection only kicks in when doing something like LOCK TABLE and even then only if you're deadlocking because you're upgrading a lock. So this works as intended though it's much less useful than I thought. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Tom Lane [EMAIL PROTECTED] writes: Applied with some further revisions to improve the usefulness of the log messages. There's now one issued when the deadlock timeout elapses, and another when the lock is finally obtained: LOG: process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms ... LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms Is it possible for unlocking the semaphore to wake another process other than our own? In which case checking log_lock_waits before signalling the semaphore arguably locks us into having log_lock_waits be PGC_POSTMASTER. Currently it's PGC_SIGHUP which is odd since it could have been USERSET in the old regime. Also, I did just think of a reason why maybe having the times in the messages could be annoying: it makes it hard to write regression tests. I suppose having the pids already interferes with regression tests though. Maybe we should do something like optionally postprocess .out files with some sed script like s/[0-9]+/###/ before running diff. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Is it possible for unlocking the semaphore to wake another process other than our own? In which case checking log_lock_waits before signalling the semaphore arguably locks us into having log_lock_waits be PGC_POSTMASTER. How you figure that? Well I'm not clear exactly what's going on with the semaphores here. If it's possible for us to be printing the messages only as a result of another backend unlocking the semaphore, then making the PGSemaphoreUnlock conditional on log_lock_waits means you can't enable log_lock_waits after startup and get deterministic behaviour, because whether you get messages will depend on which other backend happens to wake you up. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Tom Lane [EMAIL PROTECTED] writes: Which is always the same process: PGSemaphoreUnlock(MyProc-sem); Aaah! I just grokked what's going on with the semaphores here. It all makes a lot more sense now. Nevermind. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] Load Distributed Checkpoints, final patch
Heikki Linnakangas [EMAIL PROTECTED] writes: We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good. What exactly happens if a checkpoint takes so long that the next checkpoint starts? Aside from it not actually helping, is there much reason to avoid this situation? Have we ever actually tested it? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Load Distributed Checkpoints, take 3
Greg Smith [EMAIL PROTECTED] writes: If you write them twice, so what? You didn't even get to that point as an option until all the important stuff was taken care of and the system was near idle. Well, even if it's near idle, you were still occupying the i/o system for a few milliseconds. If someone else came in with a block request at that time you extended their response time by that much. The elimination of the all-scan background writer means that true hot and dirty spots in the buffer cache, like popular index blocks on a heavily updated table that never get a zero usage_count, are never going to be written out other than as part of the checkpoint process. If they're really popular blocks on a heavily updated table, then writing them out preemptively doesn't really buy us anything. The case where it helps us is when the blocks weren't really popular, we never quite got around to writing them out, and then when we do need to write them out the system's busy. In that case we wasted a chance to write them out when the system was more idle. But don't forget that we're still going through the OS's buffers. That will smooth out a lot of the i/o we generate anyways. Just because Postgres is idle doesn't mean the OS isn't busy flushing the buffers we wrote out when it was busy. That's OK for now, but I'd like it to be the case that one day the database's I/O scheduling would eventually get to those, in order to optimize performance in the kind of bursty scenarios I've been mentioning lately. I think the feeling is that the bursty scenarios are really corner cases. Heikki described a case where you're dirtying tons of blocks without triggering any WAL. That can happen, but it's pretty peculiar. I can imagine a scenario where you have a system that's very busy for 60s and then idle for 60s repeatedly. And for some reason you configure a checkpoint_timeout on the order of 20m or so (assuming you're travelling precisely 60mph). 
In that scenario bgwriter's lru scan has to fight to keep up for 60s while it's mostly writing out dirty pages that it could have flushed out during the idle time. Effectively you're only making use of half the i/o bandwidth since bgwriter doesn't do any work for half the duty cycle. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] dblink connection security
Joe Conway [EMAIL PROTECTED] writes: Robert Treat wrote: Patch based on recent -hackers discussions, it removes usage from public, and adds a note to the documentation about why this is necessary. I agree with the fix as the simplest and most sensible approach, and in general with the doc change, but I'm not inclined to reference the security paper. Maybe something like: As a security precaution, dblink revokes access from PUBLIC role usage for the dblink_connect functions. It is not safe to allow remote users to execute dblink from a database in a PostgreSQL installation that allows local account access using the trust authentication method. In that case, remote users could gain access to other accounts via dblink. If trust authentication is disabled, this is no longer an issue. I think putting the emphasis on Postgresql trust authentication is missing the broader point. I would suggest two paragraphs such as: dblink allows any connected user to attempt to connect to TCP/IP or Unix sockets from the database server as the user the database system is running as. This could allow users to circumvent access control policies based on the connecting user or the connecting host. In particular, Postgres's trust authentication is one such system. It authenticates connecting users based on the unix userid of the process forming the connection. In typical configurations any user who is granted execute access to dblink can form connections as the postgres user, which is the database super-user. If trust authentication is disabled this is no longer an issue. Therefore dblink requires you to explicitly grant execute privileges to users or roles you wish to allow to form connections. It does not grant these privileges to the PUBLIC role by default as other packages typically do. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] dblink connection security
Robert Treat [EMAIL PROTECTED] writes: In particular Postgres's trust authentication is one such system. It authenticates connecting users based on the unix userid of the process forming the connection. In typical configurations any user who is granted execute access to dblink can form connections as the postgres user which is the database super-user. If trust authentication is disabled this is no longer an issue. Did you mean s/trust/ident/g, otherwise I don't think I understand the above... granted the combination of trust for localhost does open a door for remote users if they have access to dblink, but I don't think that's what you were trying to say. Er quite right. Moreover it's not even true that ``if ident authentication is disabled this is no longer an issue''. It's still possible to have other restrictions in pg_hba which dblink would allow you to circumvent. That sentence is too generous of a promise. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster