Re: [HACKERS] Page format changes for 8.2?
Josh Berkus writes: > Do we have anything in the pipeline that would result in page format changes > for 8.2? [ looks at http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h ] I don't see any page format changes per se, but inet/cidr datatype representation changed here: 2006-01-25 21:35 tgl * doc/src/sgml/func.sgml, src/backend/optimizer/path/indxpath.c, src/backend/utils/adt/network.c, src/include/catalog/catversion.h, src/include/catalog/pg_amop.h, src/include/catalog/pg_cast.h, src/include/catalog/pg_operator.h, src/include/catalog/pg_proc.h, src/include/utils/builtins.h, src/include/utils/inet.h, src/test/regress/expected/opr_sanity.out, src/test/regress/sql/opr_sanity.sql: Clean up the INET-vs-CIDR situation. Get rid of the internal is_cidr flag and rely exclusively on the SQL type system to tell the difference between the types. Prevent creation of invalid CIDR values via casting from INET or set_masklen() --- both of these operations now silently zero any bits to the right of the netmask. Remove duplicate CIDR comparison operators, letting the type rely on the INET operators instead. We've debated before how to get pg_upgrade to handle these sorts of changes. If you want to make it happen, step right up ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Page format changes for 8.2?
Bruce, Do we have anything in the pipeline that would result in page format changes for 8.2? I'm wondering if it's worth reviving pg_upgrade; folks at work are interested ... -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Table clustering idea
Jim, > I know there were discussions in the past, though as per usual I can't > find them in the archives. Search on "B-Tree Organized Tables". From what I can find, this feature isn't infeasible; it's just a significant amount of effort for a result which is a tradeoff. That is, you'd *only* want to use it on tables which are *always* accessed by their primary key. What stopped the feature, AFAICT, is that the interested parties weren't up to doing the code. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)
* Stephen Frost ([EMAIL PROTECTED]) wrote: > That would be an *excellent* addition.. Honestly, I think it'd be nice > to get a 'NOTICE' in such cases too, but having it in pg_stat_activity > will help a lot. Additionally, Tom, and I hate to point this out here but I don't see much of an alternative: your mail system blocked my mail and apparently would block any mail from theplanet.com (a rather large hosting company based in Texas). Having been harassed by their support folks more than once for running a couple of rather large mailing lists (the MythTV ones at mythtv.org), I'm thinking this is a rather poor RBL. I'd encourage you to drop it in favor of something a bit more sane. Thanks, Stephen
Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)
* Tom Lane ([EMAIL PROTECTED]) wrote: > [EMAIL PROTECTED] (Bruce Momjian) writes: > > Add GUC update_process_title to control whether 'ps' display is updated > > for every command, default to on. > > It strikes me that the ps_status support provides one important bit of > information that is currently hard to get elsewhere; specifically, the > "waiting" flag that gets added while blocked on a lock. You can find > out if a process is blocked by looking in pg_locks, but that's a fairly > expensive probe in itself and then you have to join to pg_stat_activity > to make any sense of it. I wonder if we should add a "waiting" boolean > column to pg_stat_activity? Given the new implementation of > pg_stat_activity, updating such a flag would be pretty cheap. That would be an *excellent* addition. Honestly, I think it'd be nice to get a 'NOTICE' in such cases too, but having it in pg_stat_activity will help a lot. Thanks, Stephen
Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:
Tom Lane wrote: > [EMAIL PROTECTED] (Bruce Momjian) writes: >> Add GUC update_process_title to control whether 'ps' display is updated >> for every command, default to on. > It strikes me that the ps_status support provides one important bit of > information that is currently hard to get elsewhere; specifically, the > "waiting" flag that gets added while blocked on a lock. You can find > out if a process is blocked by looking in pg_locks, but that's a fairly > expensive probe in itself and then you have to join to pg_stat_activity > to make any sense of it. I wonder if we should add a "waiting" boolean > column to pg_stat_activity? Given the new implementation of > pg_stat_activity, updating such a flag would be pretty cheap. Funny - today I was just thinking how useful that would be! Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:
Tom Lane wrote: > [EMAIL PROTECTED] (Bruce Momjian) writes: > > Add GUC update_process_title to control whether 'ps' display is updated > > for every command, default to on. > > It strikes me that the ps_status support provides one important bit of > information that is currently hard to get elsewhere; specifically, the > "waiting" flag that gets added while blocked on a lock. You can find > out if a process is blocked by looking in pg_locks, but that's a fairly > expensive probe in itself and then you have to join to pg_stat_activity > to make any sense of it. I wonder if we should add a "waiting" boolean > column to pg_stat_activity? Given the new implementation of > pg_stat_activity, updating such a flag would be pretty cheap. Nice idea. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)
[EMAIL PROTECTED] (Bruce Momjian) writes: > Add GUC update_process_title to control whether 'ps' display is updated > for every command, default to on. It strikes me that the ps_status support provides one important bit of information that is currently hard to get elsewhere; specifically, the "waiting" flag that gets added while blocked on a lock. You can find out if a process is blocked by looking in pg_locks, but that's a fairly expensive probe in itself and then you have to join to pg_stat_activity to make any sense of it. I wonder if we should add a "waiting" boolean column to pg_stat_activity? Given the new implementation of pg_stat_activity, updating such a flag would be pretty cheap. regards, tom lane
Re: [HACKERS] GIN index creation extremely slow ?
Teodor Sigaev wrote: >> test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); >> CREATE INDEX >> Time: 416122.896 ms >> >> so about 7 minutes - sounds very reasonable >> >> test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); >> CREATE INDEX >> Time: 52681605.101 ms > > I'll look at this, but the GiST creation time is suspiciously small. > Can you test on a smaller table, for example with 10 records, and if > the results repeat, please send me the test suite... I won't have access to the original test case and server for a few days, but I just redid some testing on a slower personal box of mine with a smaller (but similar) test set, and on that box I could not reproduce the issue. So the problem is either caused by the size of the table or somehow by the data itself :-( Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] posix_fadvise versus old kernels
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> While we could possibly come up with a suitable configure test to >> determine whether posix_fadvise is actually safe to use on a given >> system, I think we should seriously consider just reverting the patch. >> As far as I saw, zero evidence was given that it actually does anything >> measurable. Without a benchmark to prove that it's worth spending more >> time on, I'm disinclined to trouble over it. > Agreed. How about if we just #ifdef NOT_USED the code and mention the > problem in a comment. Works for me; I'll write something and commit it. We can leave the is-posix_fadvise-declared configure test in place, at least for now ... regards, tom lane
Re: [HACKERS] SO_SNDBUF size is small on win32?
> We have definitely seen weird timing issues sometimes when both client > and server were on Windows, but have been unable to pin down exactly what. > From Yoshiyuki's other mail it looks like this could possibly be it, > since he did experience a speedup in the range we've been looking for in > those cases. > > >> What I would think might help is a patch on the libpq side (because it >> *does* use a nonblocking socket) to avoid sending more than >> 8K per WSASend call. The effect would just be to break a >> long send into a series of shorter sends, which wouldn't >> really do anything useful on a well-designed TCP stack, but >> then this is Windows we're talking about... > > It could definitely be a good idea to have a patch there *as well*, but I > think they'd both be affected. As I said earlier, I would boost the socket buffer to something larger than merely 2x the packet size. I'd try for 32K (32768); that way we have some space for additional buffers before we hit the problem. Presumably we should have enough data in the socket buffer to at least match the amount of data expected to be sent while waiting for the deferred ACK.
Re: [HACKERS] SO_SNDBUF size is small on win32?
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > We use non-blocking sockets in backend/port/win32/socket.c so we are > able to deliver our "faked signals" while waiting for I/O on the socket. > We specifically set it in pgwin32_socket(). Hm, that seems a bit grotty, but anyway I stand corrected. > Given that, it might be a good idea to actually put the code there > instead, to localise it. With a comment and a reference to that Q > article. No, I think the patch has it in the right place, because pgwin32_socket would have no defensible way of knowing what the max send size would be. (Indeed, with a slightly different implementation in pqcomm.c, there would not *be* any hard upper limit; the current code wastes cycles copying data around, when with a large message it probably should just send() directly from the message buffer...) I agree it needs a comment though. >> What I would think might help is a patch on the libpq side (because it >> *does* use a nonblocking socket) to avoid sending more than >> 8K per WSASend call. > It could definitely be a good idea to have a patch there *as well*, but I > think they'd both be affected. On the libpq side, sending large messages is probably rare except for COPY IN mode. Has anyone noticed performance issues specifically with COPY IN? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] posix_fadvise versus old kernels
Tom Lane wrote: > I've been digging into why buildfarm member thrush has been dumping core > consistently during the regression tests since the posix_fadvise patch > went in. I've confirmed that posix_fadvise() itself will SIGSEGV in a > standalone test program, and found that this happens only if > _FILE_OFFSET_BITS=64 ... which is our default configuration on Linux. > > Some googling turned up the following > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219 > which basically says that posix_fadvise64 + 2.4 kernel + older glibc > = crash. It sounds like the 2.4 kernel hasn't got this call but glibc > thought it did, up till about a year ago. > > While we could possibly come up with a suitable configure test to > determine whether posix_fadvise is actually safe to use on a given > system, I think we should seriously consider just reverting the patch. > As far as I saw, zero evidence was given that it actually does anything > measurable. Without a benchmark to prove that it's worth spending more > time on, I'm disinclined to trouble over it. Agreed. How about if we just #ifdef NOT_USED the code and mention the problem in a comment. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] vacuum, performance, and MVCC
Greg Stark wrote: > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > PFC wrote: > > > > > > > My idea is that if an UPDATE places the new tuple on the same page as > > > > the old tuple, it will not create new index entries for any indexes > > > > where the key doesn't change. > > > > > > Basically the idea behind preventing index bloat by updates is to have > > > one index tuple point to several actual tuples having the same value. > > > > > > > The idea is not to avoid index bloat, but to allow heap reuse, and having > > one index entry for multiple versions of an UPDATEd row is merely an > > implementation detail. > > It sort of sounds like you're describing a whole new index type that stores > only the page, not the precise record of any tuple it indexes. If your table Background: indexes point to page item pointers, not to actual offsets in the page. This is how vacuum can move tuples around without modifying the indexes. The index points to a page item pointer that is a chain of tuples with the same indexed columns. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: Disallow changing/dropping default
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Wasn't this patch rejected? > > > Anyway, what is your opinion on this? > > I thought we'd rejected it. I'm not sure that we'd completely agreed > what the best thing to do is, but what this patch actually does is to > silently remove the dependency link. That is, after > > create table t1 (f1 serial); > alter table t1 alter column f1 drop default; > > t1_f1_seq is still there, but now completely unconnected to t1. > That doesn't seem to me to satisfy the principle of least surprise. > It's certainly not what the TODO item says (reject the DROP DEFAULT). > I think we were considering the alternative of having the DROP DEFAULT > remove the sequence, which probably could be implemented painlessly > with a change in the way we set up the dependency links to start with. > > In any case I don't like this patch: int/bool confusion, use of elog > instead of ereport for a user-facing error message, failure to adhere to > style guidelines for that message, etc. (Although seeing that the error > message is unreachable code, maybe that doesn't matter ;-)) Aside from > the poor coding style, the whole idea of reaching into pg_depend to > remove a single dependency strikes me as a brute-force solution to > a problem that should have a more elegant answer. Agreed, patch reverted. Thanks for the analysis. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] SO_SNDBUF size is small on win32?
I would set the SO_SNDBUF to 32768.

> Hi,
>
> I see a performance issue on win32. This problem is caused by the
> following URL.
>
> http://support.microsoft.com/kb/823764/EN-US/
>
> On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
> 8192 too.
>
> pqcomm.c:117
> #define PQ_BUFFER_SIZE 8192
>
> send() may take as long as 200ms. So, I think we should increase
> SO_SNDBUF to more than 8192. I attach the patch.
>
> Regards,
> --
> Yoshiyuki Asaba
> [EMAIL PROTECTED]
>
> Index: pqcomm.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
> retrieving revision 1.184
> diff -c -r1.184 pqcomm.c
> *** pqcomm.c	5 Mar 2006 15:58:27 -0000	1.184
> --- pqcomm.c	27 Jun 2006 15:17:18 -0000
> ***************
> *** 593,598 ****
> --- 593,608 ----
>   		return STATUS_ERROR;
>   	}
>
> + #ifdef WIN32
> + 	on = PQ_BUFFER_SIZE * 2;
> + 	if (setsockopt(port->sock, SOL_SOCKET, SO_SNDBUF,
> + 				   (char *) &on, sizeof(on)) < 0)
> + 	{
> + 		elog(LOG, "setsockopt(SO_SNDBUF) failed: %m");
> + 		return STATUS_ERROR;
> + 	}
> + #endif
> +
>   	/*
>   	 * Also apply the current keepalive parameters. If we fail to set a
>   	 * parameter, don't error out, because these aren't universally
Re: [HACKERS] vacuum, performance, and MVCC
Bruce Momjian <[EMAIL PROTECTED]> writes: > PFC wrote: > > > > > My idea is that if an UPDATE places the new tuple on the same page as > > > the old tuple, it will not create new index entries for any indexes > > > where the key doesn't change. > > > > Basically the idea behind preventing index bloat by updates is to have > > one index tuple point to several actual tuples having the same value. > > > > The idea is not to avoid index bloat, but to allow heap reuse, and having > one index entry for multiple versions of an UPDATEd row is merely an > implementation detail. It sort of sounds like you're describing a whole new index type that stores only the page, not the precise record of any tuple it indexes. If your table has only such indexes then you never need to worry about updating indexes if your new tuple version goes on the same page as the old one. It's an interesting thought experiment. It might trade off a lot of work in index maintenance as well as saving space in the index for a lot of additional work performing index scans. There can easily be enough tuples on a page to make scanning the entire page pretty costly. -- greg
[HACKERS] posix_fadvise versus old kernels
I've been digging into why buildfarm member thrush has been dumping core consistently during the regression tests since the posix_fadvise patch went in. I've confirmed that posix_fadvise() itself will SIGSEGV in a standalone test program, and found that this happens only if _FILE_OFFSET_BITS=64 ... which is our default configuration on Linux. Some googling turned up the following http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219 which basically says that posix_fadvise64 + 2.4 kernel + older glibc = crash. It sounds like the 2.4 kernel hasn't got this call but glibc thought it did, up till about a year ago. While we could possibly come up with a suitable configure test to determine whether posix_fadvise is actually safe to use on a given system, I think we should seriously consider just reverting the patch. As far as I saw, zero evidence was given that it actually does anything measurable. Without a benchmark to prove that it's worth spending more time on, I'm disinclined to trouble over it. regards, tom lane
Re: [HACKERS] SO_SNDBUF size is small on win32?
> > From: Tom Lane <[EMAIL PROTECTED]> > >> It also says that the condition only occurs if the program uses > >> non-blocking sockets ... which the backend does not. So this page > >> offers no support for the proposed patch. > > > WSAEventSelect() sets a socket to nonblocking mode. > > Yeah, but that socket is only used for inter-backend > signaling with small (1 byte, I think) messages. The socket > used for communication with the frontend is not in > nonblocking mode, unless I'm totally confused. For once, I believe you are :-) We use non-blocking sockets in backend/port/win32/socket.c so we are able to deliver our "faked signals" while waiting for I/O on the socket. We specifically set it in pgwin32_socket(). Given that, it might be a good idea to actually put the code there instead, to localise it. With a comment and a reference to that Q article. > Have you actually measured any performance benefit from this > patch, and if so what was the test case? I'm not opposed to > the patch if it does something useful, but the info currently > available does not suggest that it will help. We have definitely seen weird timing issues sometimes when both client and server were on Windows, but have been unable to pin down exactly what. From Yoshiyuki's other mail it looks like this could possibly be it, since he did experience a speedup in the range we've been looking for in those cases. > What I would think might help is a patch on the libpq side (because it > *does* use a nonblocking socket) to avoid sending more than > 8K per WSASend call. The effect would just be to break a > long send into a series of shorter sends, which wouldn't > really do anything useful on a well-designed TCP stack, but > then this is Windows we're talking about... It could definitely be a good idea to have a patch there *as well*, but I think they'd both be affected.
//Magnus
Re: [HACKERS] Table clustering idea
On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote: > I think one of the issues might have been: how will you handle other > indexes on the table when you can no longer point them at an item (since > items will need to move to maintain an IOT). There are clean ways to handle this. The table is organized on the primary key, a typical requirement for IOTs. Any indexes you add to an IOT reference the primary key of the heap tuple. Since the heap and PK index are the same thing, external indexes use the PK as the tuple identifier. The only caveat is that this creates performance asymmetries. IOTs have significantly faster access through their primary keys but slower external index access, since two B-Trees have to be traversed. An IOT is typically only used for tables that are only accessed through their primary key. Not supporting external indexes on IOTs would be a functional implementation (and probably recommended in practice), though most real implementations allow external indexes, if not always in their first version. J. Andrew Rogers [EMAIL PROTECTED]
Re: [HACKERS] SO_SNDBUF size is small on win32?
Yoshiyuki Asaba <[EMAIL PROTECTED]> writes: > From: Tom Lane <[EMAIL PROTECTED]> >> It also says that the condition only occurs if the program uses >> non-blocking sockets ... which the backend does not. So this page >> offers no support for the proposed patch. > WSAEventSelect() sets a socket to nonblocking mode. Yeah, but that socket is only used for inter-backend signaling with small (1 byte, I think) messages. The socket used for communication with the frontend is not in nonblocking mode, unless I'm totally confused. Have you actually measured any performance benefit from this patch, and if so what was the test case? I'm not opposed to the patch if it does something useful, but the info currently available does not suggest that it will help. What I would think might help is a patch on the libpq side (because it *does* use a nonblocking socket) to avoid sending more than 8K per WSASend call. The effect would just be to break a long send into a series of shorter sends, which wouldn't really do anything useful on a well-designed TCP stack, but then this is Windows we're talking about... regards, tom lane
Re: [HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject
Sorry, wrong list... I reposted this on pgsql-jdbc instead.
Re: [HACKERS] Table clustering idea
> I think one of the issues might have been: how will you handle other > indexes on the table when you can no longer point them at an item (since > items will need to move to maintain an IOT). I guess you shouldn't allow any other indexes. That's a perfectly acceptable compromise I think... it would still be very useful for big and narrow tables which would benefit from being clustered. The other concern is how you would do sequential scans on the table if items are allowed to move. I think some other DBs have a facility for a "fast index scan" which is essentially a sequential scan of the index file; something like that would be needed here too. Cheers, Csaba.
Re: [HACKERS] SO_SNDBUF size is small on win32?
From: Tom Lane <[EMAIL PROTECTED]> Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? Date: Tue, 27 Jun 2006 12:28:35 -0400 > Andrew Dunstan <[EMAIL PROTECTED]> writes: > > Martijn van Oosterhout wrote: > >> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote: > >>> http://support.microsoft.com/kb/823764/EN-US/ > > > No, it says it occurs if this condition is met: "A single *send* call or > > *WSASend* call fills the whole underlying socket send buffer." > > It also says that the condition only occurs if the program uses > non-blocking sockets ... which the backend does not. So this page > offers no support for the proposed patch. WSAEventSelect() sets a socket to nonblocking mode. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcecomm5/html/wce50lrfWSAEventSelect.asp pgwin32_send() calls pgwin32_waitforsinglesocket() before WSASend(). And pgwin32_waitforsinglesocket() calls WSAEventSelect(). Regards, -- Yoshiyuki Asaba [EMAIL PROTECTED]
Re: [HACKERS] Table clustering idea
On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote: > Jim, > > On 6/26/06 8:15 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > > > On a somewhat related note, I think that it would be advantageous if the > > FSM had a means to prefer certain pages for a given tuple over other > > pages. This would allow for a better way to keep heap and possibly index > > data more compacted, and it would also be a means of keeping tables > > loosely clustered. It would also make it far easier to shrink heaps that > > have become bloated, because the FSM could be told to favor pages at the > > beginning of the relation. > > Interesting idea - page affinity implemented using the FSM. > > WRT feasibility of BTREE organized tables, I'm not sure I see the problem. > Teradata implemented a hashing filesystem for their heap storage and I've > always wondered about how they handle collision and chaining efficiently, > but it's a solved problem for sure - knowing that makes the challenge that > much easier :-) I know there were discussions in the past, though as per usual I can't find them in the archives. At one point I had suggested clustering not on a row level, but on a page level, since it doesn't really matter terribly if the tuples in a page are clustered (worst case you can scan the entire page). I think one of the issues might have been: how will you handle other indexes on the table when you can no longer point them at an item (since items will need to move to maintain an IOT). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] SO_SNDBUF size is small on win32?
From: Martijn van Oosterhout Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? Date: Tue, 27 Jun 2006 18:13:18 +0200 > On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote: > > No, it says it occurs if this condition is met: "A single *send* call or > > *WSASend* call fills the whole underlying socket send buffer." > > > > This will surely be true if the buffer sizes are the same. They > > recommend making the socket buffer at least 1 byte bigger. > > Ok, but even then, are there any benchmarks to show it makes a > difference. The article suggests there should be but it would be nice > to see how much difference it makes... I see the problem in this environment. * client - Windows XP - using ODBC driver * server - Windows XP - 8.1.4 * query time - original -> about 12sec. - patch version -> about 3sec. However, this problem did not occur when I changed the client machine... Regards, -- Yoshiyuki Asaba [EMAIL PROTECTED]
Re: [HACKERS] vacuum, performance, and MVCC
On Tue, Jun 27, 2006 at 10:42:54AM +0200, PFC wrote: > Also, I insist (again) that there is a lot to gain by using a bit of > compression on the data pages, even if it's very simple compression like > storing the new version of a row as a difference from the previous version > (ie. only store the columns that changed). > I think DB2 stores the latest version entirely, and stores the > previous versions as a delta. This is more efficient. This would only help on tables that: have many columns [1], are frequently updated, and whose updates normally touch few columns. [1] I'm assuming that un-changed toasted fields keep the same pointer. I'm doubtful that that case is common enough to warrant the amount of work that would be involved in doing this. I think it might be useful to consider ways to make vertical partitioning easier, since that's the common means to reduce the impact of these scenarios. > In the case of tables containing TEXT values, these could also get > TOASTed. When an update does not modify the TOASTed columns, it would be > nice to simply be able to keep the reference to the TOASTed data instead > of decompressing it and recompressing it. Or is it already the case ? Hopefully it is, but I'm not sure... something that would be good is a means to force fields to be toasted sooner than when the tuple is bigger than 2k, because that'd be a very easy way to get gains from vertical partitioning. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] SO_SNDBUF size is small on win32?
Andrew Dunstan <[EMAIL PROTECTED]> writes: > Martijn van Oosterhout wrote: >> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote: >>> http://support.microsoft.com/kb/823764/EN-US/ > No, it says it occurs if this condition is met: "A single *send* call or > *WSASend* call fills the whole underlying socket send buffer." It also says that the condition only occurs if the program uses non-blocking sockets ... which the backend does not. So this page offers no support for the proposed patch. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject & SetString]
There's an inconsistency between the handling of trailing whitespace in
query parameters in the client JDBC driver compared to the PL/Java SPI
based driver. According to Jean-Pierre, the former apparently trims the
trailing spaces before passing the query (see below). What is the correct
behavior?

Regards,
Thomas Hallgren

 Original Message 
Subject: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject & SetString
Date: Tue, 27 Jun 2006 12:07:19 -0400
From: JEAN-PIERRE PELLETIER <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

Hi Thomas,

There are very few char columns on my system; I can easily live with an
explicit trim in my application code, I only wanted to let you guys know.
I am not sure which of the two JDBC implementations is right. psql and
pgadmin would both handle char as expected. As for JDBC, you might want to
know how other DBMSs are handling this.

Thanks for your reply.

Jean-Pierre Pelletier

From: Thomas Hallgren <[EMAIL PROTECTED]>
To: JEAN-PIERRE PELLETIER <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject & SetString
Date: Tue, 27 Jun 2006 17:47:24 +0200

Hi Jean-Pierre,

I'm not sure this is incorrect behavior. There's nothing in the spec that
indicates that String values should be trimmed by setString and setObject.
On the contrary: some datatypes (CHAR in particular) are sensitive to
whitespace according to the SQL standard. Perhaps the client JDBC driver
is doing something wrong here?

Regards,
Thomas Hallgren

JEAN-PIERRE PELLETIER wrote:

Hi,

Trailing spaces are not handled properly by setObject & setString.
PreparedStatement pstmt = connection.prepareStatement(
    "select * from mytable where mycharcolumn = ?");
String myString = "abc ";
pstmt.setObject(1, myString);
// or setObject(1, myString, Types.CHAR) or setString(1, myString)

No rows are returned, but using trim works fine, as in:

pstmt.setObject(1, myString.trim());

My environment is PL/Java 1.3, Sun JDK 1.5.07, PostgreSQL 8.1.4,
Windows XP SP2.

With PostgreSQL's own (non-PL/Java) JDBC driver, setObject on char works
fine without the trim.

Thanks,
Jean-Pierre Pelletier

___
Pljava-dev mailing list
[EMAIL PROTECTED]
http://gborg.postgresql.org/mailman/listinfo/pljava-dev

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
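The disagreement above hinges on the SQL standard's "pad space" comparison rule for CHAR(n): trailing blanks are insignificant when two CHAR values are compared, but significant for TEXT/VARCHAR. A minimal sketch of that rule - Python for brevity, with made-up helper names, modeling neither driver's actual code:

```python
# Mimic SQL comparison semantics: CHAR(n) uses PAD SPACE (trailing blanks
# ignored), while TEXT/VARCHAR compares the strings verbatim.

def char_eq(a, b):
    """CHAR-style equality: strip trailing spaces before comparing."""
    return a.rstrip(" ") == b.rstrip(" ")

def text_eq(a, b):
    """TEXT/VARCHAR-style equality: trailing spaces are significant."""
    return a == b

assert char_eq("abc ", "abc")        # equal as CHAR
assert not text_eq("abc ", "abc")    # not equal as TEXT
```

So whether "abc " finds the row depends on which comparison the parameter's resolved type selects - which is why trimming on the client side masks the difference rather than settling which driver is right.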
Re: [HACKERS] vacuum, performance, and MVCC
Jim C. Nasby wrote: > > > Perhaps my point got lost... in the case where no index keys change > > > during an update, SITC seems superior in every way to my proposal. My > > > idea (let's call it Index Tuple Page Consolidation, ITPC) would be > > > beneficial to UPDATEs that modify one or more index keys but still put > > > the tuple on the same page. Where SITC would be most useful for tables > > > that have a very heavy update rate and very few indexes, ITPC would > > > benefit tables that have more indexes on them; where presumably it's > > > much more likely for UPDATEs to change at least one index key (which > > > means SITC goes out the window, if I understand it correctly). If I'm > > > missing something and SITC can in fact deal with some index keys > > > changing during an UPDATE, then I see no reason for ITPC. > > > > I understood what you had said. The question is whether we want to get > > that complex with this feature, and if there are enough use cases > > (UPDATE with index keys changing) to warrant it. > > Ideas on how to test a table to see how many tuples would fit this > criteria? > > Or we could just shelve ITPC as a possibility in the future, after we > see how much the limitations of SITC hurt. Probably. I am not sure even SITC is a win given the complexity it will add, but I think it is worth trying. Getting into more complex cases where chains change indexed values seems like something we could try later if we have to. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
> > > Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
> > > have to change all the users of that (which aren't many, I suppose).
> > > It's probably worth making a utility function to expand them.
> > >
> > > I'm still confused where bitmap index scans fit into all of this. Is
> > > preserving the sequential scan aspect of these a goal with this new
> > > setup?
> >
> > No. I was just pointing out that if you get to the tuple via an index,
> > you get handed the head of the SITC via the index tuple, but if you are
> > doing a sequential scan, you don't get it, so you have to find it, or
> > any other non-visible SITC header.
>
> Ok, but it remains true that you can only have one SITC per tuple. So
> if you have 5 indexes on a table, any SITC will only join tuples that
> didn't change any values in any of the indexed columns. That's probably
> not a big deal though; indexed columns aren't likely to be the ones
> changing much.

Right.

> So, for the bitmap scan you have to make sure that within a single
> transaction, scanning multiple indexes will have to provide the same
> SITC for each set of tuples, even in the face of concurrent updates.
> Otherwise the BitmapAnd will incorrectly throw them out.

The index points to the item id on the page, and that never changes, even
if the head tuple changes later.

--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
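The arrangement Bruce describes - the index pointing at a stable item id, with readers walking the version chain behind it for the version visible to their snapshot - can be sketched as a toy model. This is illustrative Python with invented names and grossly simplified visibility rules (a single transaction-id snapshot instead of real MVCC snapshots), not the actual SITC design:

```python
# Toy version chain: the index holds a stable reference to the chain head;
# a reader follows the chain until it finds the version visible to its
# snapshot.

class Version:
    def __init__(self, data, xmin, xmax=None):
        self.data = data
        self.xmin = xmin      # transaction that created this version
        self.xmax = xmax      # transaction that superseded it (None = live)
        self.next = None      # older version in the chain

def visible(v, snapshot_xid):
    """Visible if created at/before the snapshot and not yet superseded
    (or superseded only after the snapshot)."""
    return v.xmin <= snapshot_xid and (v.xmax is None or v.xmax > snapshot_xid)

def fetch(head, snapshot_xid):
    """Walk the chain from the head the index points at."""
    v = head
    while v is not None:
        if visible(v, snapshot_xid):
            return v.data
        v = v.next
    return None

old = Version("v1", xmin=100, xmax=200)
new = Version("v2", xmin=200)
new.next = old                 # head is the newest version

assert fetch(new, snapshot_xid=150) == "v1"   # old snapshot sees v1
assert fetch(new, snapshot_xid=250) == "v2"   # new snapshot sees v2
```

The point the thread keeps circling is that the head reference never moves, so two index scans in one transaction resolve to the same chain even under concurrent updates - which is what keeps BitmapAnd honest.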
Re: [HACKERS] vacuum, performance, and MVCC
On Mon, Jun 26, 2006 at 11:08:24PM -0400, Bruce Momjian wrote: > Jim C. Nasby wrote: > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote: > > > > > > It is certainly possible to do what you are suggesting, that is have two > > > index entries point to same chain head, and have the index access > > > routines figure out if the index qualifications still hold, but that > > > seems like a lot of overhead. > > > > > > Also, once there is only one visible row in the chain, removing old > > > index entries seems quite complex because you have to have vacuum keep > > > the qualifications of each row to figure out which index tuple is the > > > valid one (seems messy). > > > > Perhaps my point got lost... in the case where no index keys change > > during an update, SITC seems superior in every way to my proposal. My > > idea (let's call it Index Tuple Page Consolidation, ITPC) would be > > beneficial to UPDATEs that modify one or more index keys but still put > > the tuple on the same page. Where SITC would be most useful for tables > > that have a very heavy update rate and very few indexes, ITPC would > > benefit tables that have more indexes on them; where presumably it's > > much more likely for UPDATEs to change at least one index key (which > > means SITC goes out the window, if I understand it correctly). If I'm > > missing something and SITC can in fact deal with some index keys > > changing during an UPDATE, then I see no reason for ITPC. > > I understood what you had said. The question is whether we want to get > that complex with this feature, and if there are enough use cases > (UPDATE with index keys changing) to warrant it. Ideas on how to test a table to see how many tuples would fit this criteria? Or we could just shelve ITPC as a possibility in the future, after we see how much the limitations of SITC hurt. -- Jim C. Nasby, Sr. 
Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
PFC wrote: > > > My idea is that if an UPDATE places the new tuple on the same page as > > the old tuple, it will not create new index entries for any indexes > > where the key doesn't change. > > Basically the idea behind preventing index bloat by updates is to have > one index tuple point to several actual tuples having the same value. > The idea is not to avoid index bloat, but to allow heap reuse, and having one index entry for multiple versions of an UPDATEd row is merely an implementation detail. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
Hannu Krosing wrote:
> On Tue, 2006-06-27 at 10:38, Hannu Krosing wrote:
> > On Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
> > > Jim C. Nasby wrote:
> > > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > > >
> > > > > It is certainly possible to do what you are suggesting, that is have two
> > > > > index entries point to same chain head, and have the index access
> > > > > routines figure out if the index qualifications still hold, but that
> > > > > seems like a lot of overhead.
> >
> > I think Jim meant not 2 pointing to the same head, but 2 pointing into
> > the same chain. Say we have a table with (id serial, ts timestamp) where
> > ts changes at each update and id does not.
> >
> > So after 3 updates on one page we have one CITC/ITPC head with pointers
> > from both indexes and two follow-up tuples with pointers from only the
> > ts index.
> >
> > The problem with this setup is that we can't reuse any of those
> > follow-up tuples without index cleanup.
>
> But we still have to think about similar cases (index entries pointing
> inside CITC chains), unless we plan to disallow adding indexes to
> tables.

CREATE INDEX has to undo any chains where the new indexed columns change
in the chain, and add index entries to remove the chain.

--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] SO_SNDBUF size is small on win32?
On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote:
> No, it says it occurs if this condition is met: "A single *send* call or
> *WSASend* call fills the whole underlying socket send buffer."
>
> This will surely be true if the buffer sizes are the same. They
> recommend making the socket buffer at least 1 byte bigger.

Ok, but even then, are there any benchmarks to show it makes a
difference? The article suggests there should be, but it would be nice
to see how much difference it makes...

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc Description: Digital signature
Re: [HACKERS] SO_SNDBUF size is small on win32?
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Martijn van Oosterhout
>
> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> > Hi,
> >
> > I see a performance issue on win32. This problem is caused by the
> > issue described at the following URL.
> >
> > http://support.microsoft.com/kb/823764/EN-US/
> >
> > On win32, the default SO_SNDBUF value is 8192 bytes. And libpq's
> > buffer is 8192 too.
>
> Ok, so there's a deficiency in the Windows TCP code. Do you have any
> benchmarks to show this actually makes a difference? According to the
> URL you give, the problem occurs if the libpq buffer is *bigger* than
> the socket buffer, which it isn't...

The article also says there is a problem if they are the same size.

-rocco

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] SO_SNDBUF size is small on win32?
Martijn van Oosterhout wrote:
> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> > Hi,
> >
> > I see a performance issue on win32. This problem is caused by the
> > issue described at the following URL.
> >
> > http://support.microsoft.com/kb/823764/EN-US/
> >
> > On win32, the default SO_SNDBUF value is 8192 bytes. And libpq's
> > buffer is 8192 too.
>
> Ok, so there's a deficiency in the Windows TCP code. Do you have any
> benchmarks to show this actually makes a difference? According to the
> URL you give, the problem occurs if the libpq buffer is *bigger* than
> the socket buffer, which it isn't...

No, it says it occurs if this condition is met: "A single *send* call or
*WSASend* call fills the whole underlying socket send buffer."

This will surely be true if the buffer sizes are the same. They recommend
making the socket buffer at least 1 byte bigger.

cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] SO_SNDBUF size is small on win32?
From: Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32?
Date: Tue, 27 Jun 2006 11:30:56 -0400

> Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> > send() may take as long as 200ms. So, I think we should increase
> > SO_SNDBUF to more than 8192. I attach the patch.
>
> Why would that help? We won't be sending more than 8K at a time.

MSDN says:

    Method 2: Make the Socket Send Buffer Size Larger Than the Program
    Send Buffer Size

    Modify the send call or the WSASend call to specify a buffer size at
    least 1 byte smaller than the SO_SNDBUF value.

--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] SO_SNDBUF size is small on win32?
On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> Hi,
>
> I see a performance issue on win32. This problem is caused by the
> issue described at the following URL.
>
> http://support.microsoft.com/kb/823764/EN-US/
>
> On win32, the default SO_SNDBUF value is 8192 bytes. And libpq's buffer
> is 8192 too.

Ok, so there's a deficiency in the Windows TCP code. Do you have any
benchmarks to show this actually makes a difference? According to the
URL you give, the problem occurs if the libpq buffer is *bigger* than
the socket buffer, which it isn't...

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc Description: Digital signature
Re: [HACKERS] SO_SNDBUF size is small on win32?
Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> send() may take as long as 200ms. So, I think we should increase
> SO_SNDBUF to more than 8192. I attach the patch.

Why would that help? We won't be sending more than 8K at a time.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[HACKERS] SO_SNDBUF size is small on win32?
Hi,

I see a performance issue on win32. This problem is caused by the issue
described at the following URL.

http://support.microsoft.com/kb/823764/EN-US/

On win32, the default SO_SNDBUF value is 8192 bytes. And libpq's buffer
is 8192 too.

pqcomm.c:117
#define PQ_BUFFER_SIZE 8192

send() may take as long as 200ms. So, I think we should increase
SO_SNDBUF to more than 8192. I attach the patch.

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

Index: pqcomm.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
retrieving revision 1.184
diff -c -r1.184 pqcomm.c
*** pqcomm.c	5 Mar 2006 15:58:27 -0000	1.184
--- pqcomm.c	27 Jun 2006 15:17:18 -0000
***************
*** 593,598 ****
--- 593,608 ----
  		return STATUS_ERROR;
  	}

+ #ifdef WIN32
+ 	on = PQ_BUFFER_SIZE * 2;
+ 	if (setsockopt(port->sock, SOL_SOCKET, SO_SNDBUF,
+ 				   (char *) &on, sizeof(on)) < 0)
+ 	{
+ 		elog(LOG, "setsockopt(SO_SNDBUF) failed: %m");
+ 		return STATUS_ERROR;
+ 	}
+ #endif
+
  	/*
  	 * Also apply the current keepalive parameters.  If we fail to set a
  	 * parameter, don't error out, because these aren't universally

---(end of broadcast)---
TIP 6: explain analyze is your friend
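The patch's idea - ask the kernel for a send buffer larger than the application's 8K write size - can be tried from any language for a quick sanity check. A hedged Python sketch (the real change belongs in the backend's C code in pqcomm.c; the kernel may round the requested value up, so only a lower bound is checked):

```python
# Request a socket send buffer twice the application's write size, mirroring
# the proposed PQ_BUFFER_SIZE * 2 setsockopt in the patch, then read back
# what the kernel actually granted.

import socket

PQ_BUFFER_SIZE = 8192   # libpq-style application buffer size

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, PQ_BUFFER_SIZE * 2)

# Kernels may round the value (Linux, for one, doubles it), so check only
# that we got at least what we asked for.
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
assert sndbuf >= PQ_BUFFER_SIZE * 2
s.close()
```

Per the KB article quoted in the thread, what matters is that a single send never fills the whole socket buffer - hence "larger than the program buffer", not merely equal to it.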
Re: [HACKERS] [COMMITTERS] pgsql: Disallow changing/dropping default expression
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Wasn't this patch rejected? > Anyway, what is your opinion on this? I thought we'd rejected it. I'm not sure that we'd completely agreed what the best thing to do is, but what this patch actually does is to silently remove the dependency link. That is, after create table t1 (f1 serial); alter table t1 alter column f1 drop default; t1_f1_seq is still there, but now completely unconnected to t1. That doesn't seem to me to satisfy the principle of least surprise. It's certainly not what the TODO item says (reject the DROP DEFAULT). I think we were considering the alternative of having the DROP DEFAULT remove the sequence, which probably could be implemented painlessly with a change in the way we set up the dependency links to start with. In any case I don't like this patch: int/bool confusion, use of elog instead of ereport for a user-facing error message, failure to adhere to style guidelines for that message, etc. (Although seeing that the error message is unreachable code, maybe that doesn't matter ;-)) Aside from the poor coding style, the whole idea of reaching into pg_depend to remove a single dependency strikes me as a brute-force solution to a problem that should have a more elegant answer. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
> > Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
> > have to change all the users of that (which aren't many, I suppose).
> > It's probably worth making a utility function to expand them.
> >
> > I'm still confused where bitmap index scans fit into all of this. Is
> > preserving the sequential scan aspect of these a goal with this new
> > setup?
>
> No. I was just pointing out that if you get to the tuple via an index,
> you get handed the head of the SITC via the index tuple, but if you are
> doing a sequential scan, you don't get it, so you have to find it, or
> any other non-visible SITC header.

Ok, but it remains true that you can only have one SITC per tuple. So if
you have 5 indexes on a table, any SITC will only join tuples that didn't
change any values in any of the indexed columns. That's probably not a
big deal though; indexed columns aren't likely to be the ones changing
much.

So, for the bitmap scan you have to make sure that within a single
transaction, scanning multiple indexes will have to provide the same SITC
for each set of tuples, even in the face of concurrent updates. Otherwise
the BitmapAnd will incorrectly throw them out.

That should be doable, if you only change the head of the SITC on VACUUM.
I'm not sure if that's what's being suggested right now.

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc Description: Digital signature
Re: [HACKERS] refcount leak warnings
Thomas Hallgren <[EMAIL PROTECTED]> writes: > I have a PL/Java user that performs some lengthy operations. Eventually, > he get warnings like: > WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, > flags=0x27, refcount=1 2) Look for ReadBuffer calls not matched by ReleaseBuffer. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
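Tom's one-liner above is the whole debugging strategy: pins are taken and dropped in matched pairs, and the resource-owner code reports whatever is still pinned at commit. A toy Python model of that bookkeeping, with invented names (the real logic lives in the backend's resowner.c and bufmgr.c):

```python
# Toy pin tracker: each read_buffer (cf. ReadBuffer) takes a pin, each
# release_buffer (cf. ReleaseBuffer) drops one; anything still pinned at
# commit corresponds to a "buffer refcount leak" warning.

class BufferTracker:
    def __init__(self):
        self.pins = {}

    def read_buffer(self, buf):
        self.pins[buf] = self.pins.get(buf, 0) + 1

    def release_buffer(self, buf):
        self.pins[buf] -= 1
        if self.pins[buf] == 0:
            del self.pins[buf]

    def leaks_at_commit(self):
        """Buffers still pinned at commit -> would be warned about."""
        return dict(self.pins)

t = BufferTracker()
t.read_buffer(779)
t.read_buffer(779)
t.release_buffer(779)          # one read_buffer left unmatched

assert t.leaks_at_commit() == {779: 1}
```

In the warning Thomas quotes, the bracketed `[779]` is the buffer id and the trailing refcount is how many pins were left dangling - so the search is for the code path that pinned that buffer and returned without releasing it.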
Re: [HACKERS] refcount leak warnings
On Tue, Jun 27, 2006 at 03:55:06PM +0200, Thomas Hallgren wrote:
> I have a PL/Java user that performs some lengthy operations. Eventually,
> he gets warnings like:
>
> WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3,
> flags=0x27, refcount=1 2)

I think the comment about failing to clean up is correct. AIUI, if you've
done a heap_open without a heap_close, or an index_open without an
index_close, you'll get warnings like this. Maybe using SPI without
clearing everything.

Obviously something didn't get cleaned up somewhere, but what? Maybe
identifying the rel and looking at that block might help identify the
issue.

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc Description: Digital signature
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes: >> That was with a mind to performance. Checking every INSERT, >> UPDATE and DELETE statement to see if they are being done >> against a frozen table seems like a waste. > I'd think we would have relminxid in the relcache, so I don't buy the > performance argument :-) Me either. Further, auto-revoking permissions loses information. I think that idea is an ugly kluge. Anyway, the bottom line here seems to be that we should forget about pg_class_nt and just keep the info in pg_class; there's not sufficient justification to build the infrastructure needed for a nontransactional auxiliary catalog. This implies the following conclusions: * template0 has to be vacuumed against wraparound, same as any other database. * To support frozen tables, "VACUUM FREEZE" and "ALTER TABLE UNFREEZE" would need to be explicit commands taking ExclusiveLock, and can't be nested inside transaction blocks either. Automatic unfreeze upon an updating command isn't possible. Neither of these are bad enough to justify pg_class_nt --- in fact, I'd argue that explicit unfreeze is better than automatic anyway. So it was a cute idea, but its time hasn't come. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] refcount leak warnings
I have a PL/Java user that performs some lengthy operations. Eventually,
he gets warnings like:

WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3,
flags=0x27, refcount=1 2)

I traced this to the function PrintBufferLeakWarning. AFAICS, it's only
called from the function ResourceOwnerReleaseInternal under the following
comment:

 * During a commit, there shouldn't be any remaining pins --- that
 * would indicate failure to clean up the executor correctly --- so
 * issue warnings. In the abort case, just clean up quietly.

I have no idea where to go from here. What should I look for when trying
to find the cause of such warnings?

Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
> On Fri, Jun 23, 2006 at 06:37:01AM -0400, Mark Woodward wrote:
>> While we all know session data is, at best, ephemeral, people still want
>> some sort of persistence, thus, you need a database. For mcache I have a
>> couple plugins that have a wide range of options, from read/write at
>> startup and shutdown, to a full write-through cache to a database.
>>
>> In general, my clients don't want this, they want the database to store
>> their data. When you try to explain to them that a database may not be
>> the right place to store this data, they ask why; sadly they have little
>> hope of understanding the nuances and remain unconvinced.
>
> Have you done any benchmarking between a site using mcache and one not?
> I'll bet there's a huge difference, which translates into hardware $$.
> That's something managers can understand.

The last benchmark I did was on a pure data level, a couple years ago.
PostgreSQL could handle about 800 session transactions a second, but
degraded over time; MCache was up around 7500 session transactions a
second and held steady. I should dig up that code and make it available
on my site.

I have a couple users that tell me that their sites couldn't work without
it, not even with MySQL.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
> On Mon, 2006-06-26 at 09:10, Mark Woodward wrote:
>> > On Fri, 2006-06-23 at 17:27, Bruce Momjian wrote:
>> >> Jonah H. Harris wrote:
>> >> > On 6/23/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> >> > > What I see in this discussion is a huge amount of "the grass must be
>> >> > > greener on the other side" syndrome, and hardly any recognition that
>> >> > > every technique has its downsides and complications.
>> >> >
>> >> > I'm being totally objective. I don't think we should abandon
>> >> > PostgreSQL's overall design at all, because we do perform INSERTs and
>> >> > DELETEs much better than most systems. However, I've looked at many
>> >> > systems and how they implement UPDATE so that it is a scalable
>> >> > operation. Sure, there are costs and benefits to each implementation,
>> >> > but I think we have some pretty brilliant people in this community and
>> >> > can come up with an elegant design for scalable UPDATEs.
>> >>
>> >> I think the UPDATE case is similar to the bitmap index scan or perhaps
>> >> bitmap indexes on disk --- there are cases we know can not be handled
>> >> well by our existing code, so we have added (or might add) these
>> >> features to try to address those difficult cases.
>> >
>> > Not really. Bitmap index scan and bitmap index are both new additions
>> > working well with the existing framework.
>> >
>> > While the problem of slowdown on frequent updates is real, the
>> > suggested fix is just plain wrong, as it is based on someone's faulty
>> > assumption on how index lookup works, and a very much simplified view
>> > of how different parts of the system work to implement MVCC.
>>
>> Yes, the suggestion was based on MVCC concepts, not a particular
>> implementation.
>
> On the contrary - afaik, it was loosely based on how Oracle does it with
> its rollback segments, only assuming that rollback segments are kept in
> heap and that indexes point only to the oldest row version :p

Well, give me a little more credit than that. Yes, Oracle did play a
small part in my thinking, but only in as much as "they can't do it, why
can't we?" The problem was how to make getting the most recent tuple more
efficient and not have tuples that will never be used impact performance,
without excessive locking or moving data around. It was just a quick
idea. Bruce's solution, you have to admit, is somewhat similar.

>> > The original fix he "suggests" was to that imagined behaviour and thus
>> > ignored all the real problems of such a change.
>>
>> The original suggestion was nothing more than a hypothetical for the
>> purpose of discussion.
>>
>> The problem was the steady degradation of performance on frequent
>> updates. That was the point of discussion. I brought up "one possible
>> way" to start a "brain storm." The discussion then morphed into
>> criticizing the example and not addressing the problem.
>
> The problem is heatedly discussed every 3-4 months.

And yet, here we are again.

>> Anyway, I think some decent discussion about the problem did happen, and
>> that is good.
>
> Agreed.
>
> Maybe this _was_ the best way to bring up the discussion again.

I have a way, for better or worse, I guess, of stirring up the pot. :-)

Cry as we may about MySQL, but I have a sneaking suspicion that this is
one of the issues that puts PostgreSQL at a serious disadvantage. While
heavily updated rows are a single type of problem, these days I think
*most* database deployments are as back-ends for web sites. This problem
is *very* critical to that type of application, and is consequently
probably why PostgreSQL has difficulty in that space.

If PostgreSQL can be made *not* to suffer performance degradation on
heavily updated rows, then that is really the last issue in the way of it
being a completely credible medium-to-large enterprise back end. This,
combined with its amazing programmability, should make it unstoppable.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] GIN index creation extremely slow ?
test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

I'll look at this, but the GiST creation time is suspiciously small. Can
you test on a smaller table, for example with 10 records, and if the
results repeat, please send me the test suite...

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
[HACKERS] Turning off disk caching
Hi,

Is there anybody who knows about "turning off disk caching" on Solaris
machines? If so, please reply.

Thanks,
Dhanaraj

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
On Mon, 2006-06-26 at 11:31, Bruce Momjian wrote:
> Hannu Krosing wrote:
> > > > pass 3: clean heap based on ctid from pass 1
> > > >
> > > > If you do it this way, you don't need to invent new data structures to
> > > > pass extra info about CITC internals to passes 2 and 3
> > > >
> > > > One more thing - when should the free space map be notified about free
> > > > space in pages with CITC chains ?
> > >
> > > Uh, well, I am thinking we only free CITC space when we are going to use
> > > it for an UPDATE, rather than free things while doing an operation. It
> > > is good to keep the cleanup overhead out of the main path as much as
> > > possible.
> >
> > So vacuum should only remove dead CITC chains and leave the ones with
> > live tuples to CITC internal use ?
>
> Yes, it has to. What else would it do? Add index entries?

No, clean out the dead part. But this would probably add the page to the
FSM - do we want that?

Also, this cleaning should probably be done at pass 1, so we don't have to
carry the ctids of tuples which have no index entries around to passes 2
and 3. This has the downside of possibly writing the heap page twice, so
maybe we don't want it.

> > That would also suggest that pages having live CITC chains and less than
> > N% of free space should not be reported to FSM.
>
> Parts of the CITC that are not visible can be used for free space by
> vacuum, but the visible part is left alone.

--
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
> > > Suggest that we prevent write operations on Frozen tables by revoking
> > > all INSERT, UPDATE or DELETE rights held, then enforcing a check
> > > during GRANT to prevent them being re-enabled. Superusers would need
> > > to check every time. If we don't do this, then we will have two
> > > contradictory states marked in the catalog - privileges saying Yes
> > > and freezing saying No.
> >
> > No, I'd not mess with the permissions and return a different error
> > when trying to modify a frozen table. (It would also be complicated to
> > unfreeze after create database.) We should make it clear that freezing
> > is no replacement for revoke.
>
> That was with a mind to performance. Checking every INSERT, UPDATE and
> DELETE statement to see if they are being done against a frozen table
> seems like a waste.

I'd think we would have relminxid in the relcache, so I don't buy the
performance argument :-)

(You could still do the actual check in the same place where the
permission is checked.)

> There would still be a specific error message for frozen tables, just on
> the GRANT rather than the actual DML statements.

I'd still prefer to see the error on modify. Those that don't can revoke.

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] Table clustering idea
Jim C. Nasby wrote:
> On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
> > Other DBMS have index organized tables that can use either hash or
> > btree organizations, both of which have their uses. We are planning
> > to implement btree organized tables sometime - anyone else
> > interested in this idea?
>
> I'm curious how you'll do it, as I was once told that actually trying
> to store heap data in a btree structure would be a non-starter (don't
> remember why).

Ingres is now open source - they have clustering on btree/isam/hash
(it's called "modify table xx to btree on col1,col2").

Regards,
Kim
Re: [HACKERS] vacuum, performance, and MVCC
On Tue, 2006-06-27 at 10:38, Hannu Krosing wrote:
> On Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > >
> > > > It is certainly possible to do what you are suggesting, that is have two
> > > > index entries point to same chain head, and have the index access
> > > > routines figure out if the index qualifications still hold, but that
> > > > seems like a lot of overhead.
>
> I think Jim meant not 2 pointing to the same head, but 2 pointing into
> the same chain. Say we have a table with (id serial, ts timestamp) where
> ts changes at each update and id does not.
>
> So after 3 updates on one page we have one CITC/ITPC head with pointers
> from both indexes and two follow-up tuples with pointers from only the
> ts index.
>
> The problem with this setup is that we can't reuse any of those
> follow-up tuples without index cleanup.

But we still have to think about similar cases (index entries pointing
inside CITC chains), unless we plan to disallow adding indexes to
tables.

Perhaps that case has to simply disable heap tuple reuse until some
event. What would that event be?

Or maybe we should have some bitmap of dirty tuple ids inside each
page - that is, tuple ids that have index pointers to them - and then
avoid reusing these?

-- 
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
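Hannu's closing idea - a per-page bitmap of tuple ids that have direct index pointers - could look roughly like this toy model (all names invented; a sketch of the concept, not a proposed implementation):

```python
# Toy sketch of a per-page bitmap marking tuple slots that have direct
# index pointers to them, so in-page tuple reuse can skip them.  The
# class and field names are invented for illustration.

class Page:
    def __init__(self, n_slots):
        self.index_referenced = [False] * n_slots  # the "dirty" bitmap
        self.dead = [False] * n_slots

    def mark_index_pointer(self, slot):
        # Called when an index entry is created pointing at this slot.
        self.index_referenced[slot] = True

    def reusable_slots(self):
        # A dead follow-up tuple may be reused only if no index entry
        # points straight at it; otherwise reuse must wait for index
        # cleanup (e.g. the next vacuum).
        return [i for i in range(len(self.dead))
                if self.dead[i] and not self.index_referenced[i]]

page = Page(4)
page.dead = [False, True, True, False]
page.mark_index_pointer(1)   # slot 1 is pointed at by the ts index
print(page.reusable_slots()) # only slot 2 may be reused in-page
```

This mirrors the problem in the example above: the follow-up tuple the ts index points at stays off-limits until index cleanup, while chain members with no direct index pointer remain reusable.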
Re: [HACKERS] vacuum, performance, and MVCC
> My idea is that if an UPDATE places the new tuple on the same page as
> the old tuple, it will not create new index entries for any indexes
> where the key doesn't change.

Basically, the idea behind preventing index bloat by updates is to have
one index tuple point to several actual tuples having the same value.

So:

Index entry -> list of tuples having the same value -> actual tuples
(-> represents an indirection)

I proposed to put the list of tuples in the index; you propose to put
it in data pages. I think both solutions have pros and cons:

* List of tuples in the index:
  + reduces index size, makes cacheability in RAM more likely
  + speeds up index scans
  - complexity
  - slows down modifications to the index (a bit)

* List of tuples in the page:
  + simpler to implement
  + reduces index size, but less so than the previous solution
  - useless if UPDATE puts the new tuple on a different page

I guess the best solution would be a mix of both.

Also, I insist (again) that there is a lot to gain by using a bit of
compression on the data pages, even if it's very simple compression
like storing the new version of a row as a difference from the previous
version (i.e. only store the columns that changed). I think DB2 stores
the latest version entirely, and stores the previous versions as a
delta. This is more efficient.

In the case of tables containing TEXT values, these could also get
TOASTed. When an update does not modify the TOASTed columns, it would
be nice to simply be able to keep the reference to the TOASTed data
instead of decompressing and recompressing it. Or is it already the
case?
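The delta idea in the last paragraphs can be sketched in a few lines (a conceptual illustration only; no real system stores rows as Python dicts): keep the newest version whole, and keep an older version as just the columns that differed.

```python
# Conceptual sketch of storing an old row version as a delta against
# the newest version (store the latest in full, older ones as diffs).
# The row representation is invented for illustration.

def make_delta(old_row, new_row):
    """Columns of old_row that differ from new_row."""
    return {col: val for col, val in old_row.items()
            if new_row.get(col) != val}

def reconstruct(new_row, delta):
    """Rebuild the old version from the new version plus the delta."""
    old = dict(new_row)
    old.update(delta)
    return old

v1 = {"id": 7, "ts": "2006-06-26", "body": "a long TEXT value"}
v2 = {"id": 7, "ts": "2006-06-27", "body": "a long TEXT value"}

delta = make_delta(v1, v2)   # only the column that changed is kept
assert delta == {"ts": "2006-06-26"}
assert reconstruct(v2, delta) == v1
```

The win is exactly the case discussed: when only a timestamp changes, the delta is one small column instead of a full copy of the row, including any large TEXT values.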
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
On Tue, 2006-06-27 at 10:04 +0200, Zeugswetter Andreas DCP SD wrote:
> Simon wrote:
> > Suggest that we prevent write operations on Frozen tables by
> > revoking all INSERT, UPDATE or DELETE rights held, then enforcing a
> > check during GRANT to prevent them being re-enabled. Superusers
> > would need to check every time. If we don't do this, then we will
> > have two contradictory states marked in the catalog - privileges
> > saying Yes and freezing saying No.
>
> No, I'd not mess with the permissions and return a different error
> when trying to modify a frozen table. (It would also be complicated to
> unfreeze after create database.)
> We should make it clear that freezing is no replacement for revoke.

That was with a mind to performance. Checking every INSERT, UPDATE and
DELETE statement to see if they are being done against a frozen table
seems like a waste.

There would still be a specific error message for frozen tables, just
on the GRANT rather than the actual DML statements.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2
Very nice explanation, thanks Alvaro.

> 2. Mark frozen databases specially somehow.
>    To mark databases frozen, we need a way to mark tables as frozen.
>    How do we do that? As I explain below, this allows some nice
>    optimizations, but it's a very tiny can full of a huge amount of
>    worms.
>
> Marking a Table Frozen
> ======================
>
> Marking a table frozen is as simple as setting relminxid = FrozenXid
> for a table. As explained above, this cannot be done in a regular
> postmaster environment, because a concurrent transaction could be
> doing nasty stuff to a table. So we can do it only in a standalone
> backend.

Unless you lock the table exclusively during vacuum, that could be done
with vacuum freeze. I like that more than changing stuff that is
otherwise completely frozen/static. (I see you wrote that below.)

> On the other hand, a "frozen" table must be marked with
> relminxid = a-regular-Xid as soon as a transaction writes
> some tuples on it. Note that this "unfreezing" must take
> place even if the offending transaction is aborted, because
> the Xid is written in the table nevertheless and thus it
> would be incorrect to lose the unfreezing.

The other idea was to need a special unfreeze command ...

> This is how pg_class_nt came into existence -- it would be a
> place where information about a table would be stored and not
> subject to the rolling back of the transaction that wrote it.

Oh, that puts it in another league, since it must guarantee commit. I
am not sure we can do that. The previous discussion was about
concurrency and data that was not so important, like tuple count.

In short:
- I'd start with #1 (no relminxid = FrozenXid) like Tom suggested
- and then implement FREEZE/UNFREEZE with exclusive locks like Simon
  wrote (so it does not need pg_class_nt) and use that for the
  templates.
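Alvaro's freeze/unfreeze rule, as quoted above, boils down to a tiny state machine (the class below is invented for illustration; the constant stands in for PostgreSQL's FrozenTransactionId, which is xid 2):

```python
# Minimal model of the rule described above: a table is frozen when
# relminxid == FrozenXid, and any write unfreezes it by stamping a
# regular xid - even if the writing transaction later aborts, since
# its xid is on the table's tuples regardless.  The class is invented
# for illustration.

FROZEN_XID = 2  # stand-in for FrozenTransactionId

class Table:
    def __init__(self):
        self.relminxid = FROZEN_XID      # start frozen

    def is_frozen(self):
        return self.relminxid == FROZEN_XID

    def write(self, xid):
        # Unfreezing must be non-transactional: it sticks on abort too,
        # which is why a plain catalog update is not enough and
        # pg_class_nt entered the discussion.
        self.relminxid = xid

t = Table()
assert t.is_frozen()
t.write(xid=1000)         # transaction 1000 writes, then aborts
assert not t.is_frozen()  # the table stays unfrozen anyway
```

The awkward part Andreas reacts to is exactly the comment in `write`: the unfreeze must survive the writer's abort, i.e. it must be guaranteed to "commit" independently of the transaction that caused it.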
Simon wrote:
> Suggest that we prevent write operations on Frozen tables by revoking
> all INSERT, UPDATE or DELETE rights held, then enforcing a check
> during GRANT to prevent them being re-enabled. Superusers would need
> to check every time. If we don't do this, then we will have two
> contradictory states marked in the catalog - privileges saying Yes
> and freezing saying No.

No, I'd not mess with the permissions and return a different error when
trying to modify a frozen table. (It would also be complicated to
unfreeze after create database.)
We should make it clear that freezing is no replacement for revoke.

Andreas
Re: [HACKERS] vacuum, performance, and MVCC
On Mon, 2006-06-26 at 23:08, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > >
> > > It is certainly possible to do what you are suggesting, that is have two
> > > index entries point to same chain head, and have the index access
> > > routines figure out if the index qualifications still hold, but that
> > > seems like a lot of overhead.

I think Jim meant not 2 pointing to the same head, but 2 pointing into
the same chain. Say we have a table with (id serial, ts timestamp) where
ts changes at each update and id does not.

So after 3 updates on one page we have one CITC/ITPC head with pointers
from both indexes and two follow-up tuples with pointers from only the
ts index.

The problem with this setup is that we can't reuse any of those
follow-up tuples without index cleanup.

> > > Also, once there is only one visible row in the chain, removing old
> > > index entries seems quite complex because you have to have vacuum keep
> > > the qualifications of each row to figure out which index tuple is the
> > > valid one (seems messy).
> >
> > Perhaps my point got lost... in the case where no index keys change
> > during an update, SITC seems superior in every way to my proposal. My
> > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > beneficial to UPDATEs that modify one or more index keys but still put
> > the tuple on the same page. Where SITC would be most useful for tables
> > that have a very heavy update rate and very few indexes, ITPC would
> > benefit tables that have more indexes on them, where presumably it's
> > much more likely for UPDATEs to change at least one index key (which
> > means SITC goes out the window, if I understand it correctly). If I'm
> > missing something and SITC can in fact deal with some index keys
> > changing during an UPDATE, then I see no reason for ITPC.
>
> I understood what you had said.  The question is whether we want to
> get that complex with this feature, and if there are enough use cases
> (UPDATE with index keys changing) to warrant it.

I'd like to think that most heavily-updated tables avoid that, but
there may still be cases where this is needed.

-- 
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] [COMMITTERS] pgsql: Clamp last_anl_tuples to n_live_tuples, in case we vacuum a table
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: 27 June 2006 05:12
> To: Alvaro Herrera
> Cc: Hackers
> Subject: Re: [HACKERS] [COMMITTERS] pgsql: Clamp last_anl_tuples to
> n_live_tuples, in case we vacuum a table
>
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > On a loosely related matter, how about changing pg_class.relpages to
> > pg_class.reldensity?
>
> IIRC, I considered this earlier, and rejected it because it doesn't
> cope well with the corner case relpages == 0. Also, it'll break
> existing clients that expect to find relpages and reltuples, if there
> are any.

There are.

/D
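Tom's relpages == 0 objection is easy to see with a toy density function (illustrative only; not PostgreSQL code): a tuples-per-page figure is undefined for an empty relation, so a hypothetical reldensity column would need a special case exactly where relpages and reltuples do not.

```python
# Toy illustration of the relpages == 0 corner case: density (tuples
# per page) has no natural value for an empty relation, so any choice
# made here is arbitrary.

def density(reltuples, relpages):
    if relpages == 0:
        return 0.0   # arbitrary: no pages, so report zero density
    return reltuples / relpages

assert density(1000, 10) == 100.0
assert density(0, 0) == 0.0   # the awkward special case
```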