Re: [HACKERS] Page format changes for 8.2?

2006-06-27 Thread Tom Lane
Josh Berkus  writes:
> Do we have anything in the pipeline that would result in page format changes 
> for 8.2?

[ looks at
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h
]

I don't see any page format changes per se, but inet/cidr datatype
representation changed here:

2006-01-25 21:35  tgl

* doc/src/sgml/func.sgml, src/backend/optimizer/path/indxpath.c,
src/backend/utils/adt/network.c, src/include/catalog/catversion.h,
src/include/catalog/pg_amop.h, src/include/catalog/pg_cast.h,
src/include/catalog/pg_operator.h, src/include/catalog/pg_proc.h,
src/include/utils/builtins.h, src/include/utils/inet.h,
src/test/regress/expected/opr_sanity.out,
src/test/regress/sql/opr_sanity.sql: Clean up the INET-vs-CIDR
situation.  Get rid of the internal is_cidr flag and rely
exclusively on the SQL type system to tell the difference between
the types.  Prevent creation of invalid CIDR values via casting
from INET or set_masklen() --- both of these operations now
silently zero any bits to the right of the netmask.  Remove
duplicate CIDR comparison operators, letting the type rely on the
INET operators instead.

We've debated before how to get pg_upgrade to handle these sorts of
changes.  If you want to make it happen, step right up ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Page format changes for 8.2?

2006-06-27 Thread Josh Berkus
Bruce,

Do we have anything in the pipeline that would result in page format changes 
for 8.2?  I'm wondering if it's worth reviving pg_upgrade, folks at work are 
interested ...

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Josh Berkus
Jim,

> I know there were discussions in the past, though as per usual I can't
> find them in the archives.

Search on "B-Tree Organized Tables".

>From what I can find, this feature isn't prohibitively useless.  It's just a 
singnificant amount of effort for a result which is a tradeoff.   That is, 
you'd *only* want to use it on tables which are *always* accessed by their 
primary key.  

What stopped the features AFAICT is that the interested parties weren't up to 
doing the code.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS]

2006-06-27 Thread tju tju
-- liuchao


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Stephen Frost
* Stephen Frost ([EMAIL PROTECTED]) wrote:
> That would be an *excellent* addition..  Honestly, I think it'd be nice
> to get a 'NOTICE' in such cases too, but having it in pg_stat_activity
> will help alot.

Additionally, Tom, and I hate to point this out here but I don't see
much of an alternative; your mail system blocked my mail and apparently 
would block any mail from theplanet.com (a rather large hosting company 
based in Texas).  Having been harrassed by their support folks for
running a couple rather large mailing lists (the MythTV ones at
mythtv.org) more than once I'm thinking this is a rather poor RBL.  I'd
encourage you to drop it in favor of something a bit more sane.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
> [EMAIL PROTECTED] (Bruce Momjian) writes:
> > Add GUC update_process_title to control whether 'ps' display is updated
> > for every command, default to on.
> 
> It strikes me that the ps_status support provides one important bit of
> information that is currently hard to get elsewhere; specifically, the
> "waiting" flag that gets added while blocked on a lock.  You can find
> out if a process is blocked by looking in pg_locks, but that's a fairly
> expensive probe in itself and then you have to join to pg_stat_activity
> to make any sense of it.  I wonder if we should add a "waiting" boolean
> column to pg_stat_activity?  Given the new implementation of
> pg_stat_activity, updating such a flag would be pretty cheap.

That would be an *excellent* addition..  Honestly, I think it'd be nice
to get a 'NOTICE' in such cases too, but having it in pg_stat_activity
will help alot.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:

2006-06-27 Thread Mark Kirkwood

Tom Lane wrote:

[EMAIL PROTECTED] (Bruce Momjian) writes:

Add GUC update_process_title to control whether 'ps' display is updated
for every command, default to on.


It strikes me that the ps_status support provides one important bit of
information that is currently hard to get elsewhere; specifically, the
"waiting" flag that gets added while blocked on a lock.  You can find
out if a process is blocked by looking in pg_locks, but that's a fairly
expensive probe in itself and then you have to join to pg_stat_activity
to make any sense of it.  I wonder if we should add a "waiting" boolean
column to pg_stat_activity?  Given the new implementation of
pg_stat_activity, updating such a flag would be pretty cheap.




Funny - today I was just thinking how useful that would be!

Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
> [EMAIL PROTECTED] (Bruce Momjian) writes:
> > Add GUC update_process_title to control whether 'ps' display is updated
> > for every command, default to on.
> 
> It strikes me that the ps_status support provides one important bit of
> information that is currently hard to get elsewhere; specifically, the
> "waiting" flag that gets added while blocked on a lock.  You can find
> out if a process is blocked by looking in pg_locks, but that's a fairly
> expensive probe in itself and then you have to join to pg_stat_activity
> to make any sense of it.  I wonder if we should add a "waiting" boolean
> column to pg_stat_activity?  Given the new implementation of
> pg_stat_activity, updating such a flag would be pretty cheap.

Nice idea.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Tom Lane
[EMAIL PROTECTED] (Bruce Momjian) writes:
> Add GUC update_process_title to control whether 'ps' display is updated
> for every command, default to on.

It strikes me that the ps_status support provides one important bit of
information that is currently hard to get elsewhere; specifically, the
"waiting" flag that gets added while blocked on a lock.  You can find
out if a process is blocked by looking in pg_locks, but that's a fairly
expensive probe in itself and then you have to join to pg_stat_activity
to make any sense of it.  I wonder if we should add a "waiting" boolean
column to pg_stat_activity?  Given the new implementation of
pg_stat_activity, updating such a flag would be pretty cheap.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Stefan Kaltenbrunner
Teodor Sigaev wrote:
>> test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
>> CREATE INDEX
>> Time: 416122.896 ms
>>
>> so about 7 minutes - sounds very reasonable
>>
>> test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
>> CREATE INDEX
>> Time: 52681605.101 ms
> 
> I'll look at this,  but GiST time creation is suspiciously small.
> Can you test on smaller table, for example with 10 records and if
> results are repeat, pls, send to me test suite...

I won't have access to the original testcase and server for a few days
but I just redid some testing on a slower personal box of mine with a
smaller(but similiar) testset and on that box I could not reproduce that
issue.
So the problem is either caused by the size of the table or somehow by
the data itself :-(


Stefan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> While we could possibly come up with a suitable configure test to
>> determine whether posix_fadvise is actually safe to use on a given
>> system, I think we should seriously consider just reverting the patch.
>> As far as I saw, zero evidence was given that it actually does anything
>> measurable.  Without a benchmark to prove that it's worth spending more
>> time on, I'm disinclined to trouble over it.

> Agreed.  How about if we just #ifdef NOT_USED the code and mention the
> problem in a comment.

Works for me; I'll write something and commit it.  We can leave the
is-posix_fadvise-declared configure test in place, at least for now ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Mark Woodward
> We have definitly seen weird timing issues sometimes when both client
> and server were on Windows, but have been unable to pin it exactly on
> what. From Yoshiykis other mail it looks like this could possibly be it,
> since he did experience a speedup in the range we've been looking for in
> those cases.
>
>
>> What I would think might help is a patch on the libpq side (because it
>> *does* use a nonblocking socket) to avoid sending more than
>> 8K per WSASend call.  The effect would just be to break a
>> long send into a series of shorter sends, which wouldn't
>> really do anything useful on a well-designed TCP stack, but
>> then this is Windows we're talking about...
>
> It could definitly be a good idea to have a patch there *as well*, but I
> think they'd both be affected.

As I said earlier, I would boost the socket buffer to something larger
than merely 2x the packet size. I'd try for 32K (32768), that way we have
some space for additional buffers before we hit the problem. It is
presumed that we should have enough data in the socket buffer to at least
try to match the expected amount of data that would be sent while waiting
for the defered ACK.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
> We use non-blocking sockets in backend/port/win32/socket.c so we are
> able to deliver our "faked signals" while waiting for I/O on the socket.
> We specifically set it in pgwin32_socket().

Hm, that seems a bit grotty, but anyway I stand corrected.

> Given that, it might be a good idea to actually put the code there
> instead, to localise it. With a comment and a reference to that Q
> article.

No, I think the patch has it in the right place, because pgwin32_socket
would have no defensible way of knowing what the max send size would be.
(Indeed, with a slightly different implementation in pqcomm.c, there
would not *be* any hard upper limit; the current code wastes cycles
copying data around, when with a large message it probably should just
send() directly from the message buffer...)

I agree it needs a comment though.

>> What I would think might help is a patch on the libpq side (because it
>> *does* use a nonblocking socket) to avoid sending more than 
>> 8K per WSASend call.

> It could definitly be a good idea to have a patch there *as well*, but I
> think they'd both be affected.

On the libpq side, sending large messages is probably rare except for
COPY IN mode.  Has anyone noticed performance issues specifically with
COPY IN?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
> I've been digging into why buildfarm member thrush has been dumping core
> consistently during the regression tests since the posix_fadvise patch
> went in.  I've confirmed that posix_fadvise() itself will SIGSEGV in a
> standalone test program, and found that this happens only if
> _FILE_OFFSET_BITS=64 ... which is our default configuration on Linux.
> 
> Some googling turned up the following 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219
> which basically says that posix_fadvise64 + 2.4 kernel + older glibc
> = crash.  It sounds like the 2.4 kernel hasn't got this call but glibc
> thought it did, up till about a year ago.
> 
> While we could possibly come up with a suitable configure test to
> determine whether posix_fadvise is actually safe to use on a given
> system, I think we should seriously consider just reverting the patch.
> As far as I saw, zero evidence was given that it actually does anything
> measurable.  Without a benchmark to prove that it's worth spending more
> time on, I'm disinclined to trouble over it.

Agreed.  How about if we just #ifdef NOT_USED the code and mention the
problem in a comment.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Greg Stark wrote:
> 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > PFC wrote:
> > > 
> > > > My idea is that if an UPDATE places the new tuple on the same page as
> > > > the old tuple, it will not create new index entries for any indexes
> > > > where the key doesn't change.
> > > 
> > >   Basically the idea behind preventing index bloat by updates is to have  
> > > one index tuple point to several actual tuples having the same value.
> > >   
> > 
> > The idea is not to avoid index bloat, but to allow heap reuse, and having
> > one index entry for multiple versions of an UPDATEd row is merely an
> > implementation detail.
> 
> It sort of sounds like you're describing a whole new index type that stores
> only the page, not the precise record of any tuple it indexes. If your table

Background, indexes point to page item pointers, not to actual offsets
in the page.  This is how vacuum can move around tuples without modifying the
indexes.  The index points to a page item pointer that is a chain of
tuples with the same indexed columns.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [COMMITTERS] pgsql: Disallow changing/dropping default

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> Wasn't this patch rejected?
> 
> > Anyway, what is your opinion on this?
> 
> I thought we'd rejected it.  I'm not sure that we'd completely agreed
> what the best thing to do is, but what this patch actually does is to
> silently remove the dependency link.  That is, after
> 
>   create table t1 (f1 serial);
>   alter table t1 alter column f1 drop default;
> 
> t1_f1_seq is still there, but now completely unconnected to t1.
> That doesn't seem to me to satisfy the principle of least surprise.
> It's certainly not what the TODO item says (reject the DROP DEFAULT).
> I think we were considering the alternative of having the DROP DEFAULT
> remove the sequence, which probably could be implemented painlessly
> with a change in the way we set up the dependency links to start with.
> 
> In any case I don't like this patch: int/bool confusion, use of elog
> instead of ereport for a user-facing error message, failure to adhere to
> style guidelines for that message, etc.  (Although seeing that the error
> message is unreachable code, maybe that doesn't matter ;-))  Aside from
> the poor coding style, the whole idea of reaching into pg_depend to
> remove a single dependency strikes me as a brute-force solution to
> a problem that should have a more elegant answer.

Agreed, patch reverted.  Thanks for the analysis.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Mark Woodward

I would set the SO_SNDBUF to 32768.

> Hi,
>
> I see a performance issue on win32. This problem is causes by the
> following URL.
>
> http://support.microsoft.com/kb/823764/EN-US/
>
> On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
> 8192 too.
>
> pqcomm.c:117
>   #define PQ_BUFFER_SIZE 8192
>
> send() may take as long as 200ms. So, I think we should increase
> SO_SNDBUF to more than 8192. I attache the patch.
>
> Regards,
> --
> Yoshiyuki Asaba
> [EMAIL PROTECTED]
> Index: pqcomm.c
> ===
> RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
> retrieving revision 1.184
> diff -c -r1.184 pqcomm.c
> *** pqcomm.c  5 Mar 2006 15:58:27 -   1.184
> --- pqcomm.c  27 Jun 2006 15:17:18 -
> ***
> *** 593,598 
> --- 593,608 
>   return STATUS_ERROR;
>   }
>
> + #ifdef WIN32
> + on = PQ_BUFFER_SIZE * 2;
> + if (setsockopt(port->sock, SOL_SOCKET, SO_SNDBUF,
> +(char *) &on, sizeof(on)) < 0)
> + {
> + elog(LOG, "setsockopt(SO_SNDBUF) failed: %m");
> + return STATUS_ERROR;
> + }
> + #endif
> +
>   /*
>* Also apply the current keepalive parameters.  If we fail to 
> set a
>* parameter, don't error out, because these aren't universally
>
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
>


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Greg Stark

Bruce Momjian <[EMAIL PROTECTED]> writes:

> PFC wrote:
> > 
> > > My idea is that if an UPDATE places the new tuple on the same page as
> > > the old tuple, it will not create new index entries for any indexes
> > > where the key doesn't change.
> > 
> > Basically the idea behind preventing index bloat by updates is to have  
> > one index tuple point to several actual tuples having the same value.
> > 
> 
> The idea is not to avoid index bloat, but to allow heap reuse, and having
> one index entry for multiple versions of an UPDATEd row is merely an
> implementation detail.

It sort of sounds like you're describing a whole new index type that stores
only the page, not the precise record of any tuple it indexes. If your table
has only such indexes then you never need to worry about updating indexes if
your new tuple version goes on the same page as the old one.

It's an interesting thought experiment. It might trade off a lot of work in
index maintenance as well as saving space in the index for a lot of additional
work performing index scans. There can easily be enough tuples on a page to
make scanning the entire page pretty costly.


-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Tom Lane
I've been digging into why buildfarm member thrush has been dumping core
consistently during the regression tests since the posix_fadvise patch
went in.  I've confirmed that posix_fadvise() itself will SIGSEGV in a
standalone test program, and found that this happens only if
_FILE_OFFSET_BITS=64 ... which is our default configuration on Linux.

Some googling turned up the following 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219
which basically says that posix_fadvise64 + 2.4 kernel + older glibc
= crash.  It sounds like the 2.4 kernel hasn't got this call but glibc
thought it did, up till about a year ago.

While we could possibly come up with a suitable configure test to
determine whether posix_fadvise is actually safe to use on a given
system, I think we should seriously consider just reverting the patch.
As far as I saw, zero evidence was given that it actually does anything
measurable.  Without a benchmark to prove that it's worth spending more
time on, I'm disinclined to trouble over it.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Magnus Hagander
> > From: Tom Lane <[EMAIL PROTECTED]>
> >> It also says that the condition only occurs if the program uses 
> >> non-blocking sockets ... which the backend does not.  So this page 
> >> offers no support for the proposed patch.
> 
> > WSAEventSelect() sets a socket to nonblocking mode.
> 
> Yeah, but that socket is only used for inter-backend 
> signaling with small (1 byte, I think) messages.  The socket 
> used for communication with the frontend is not in 
> nonblocking mode, unless I'm totally confused.

For once, I beleive you are :-)
We use non-blocking sockets in backend/port/win32/socket.c so we are
able to deliver our "faked signals" while waiting for I/O on the socket.
We specifically set it in pgwin32_socket().

Given that, it might be a good idea to actually put the code there
instead, to localise it. With a comment and a reference to that Q
article.


> Have you actually measured any performance benefit from this 
> patch, and if so what was the test case?  I'm not opposed to 
> the patch if it does something useful, but the info currently 
> available does not suggest that it will help.

We have definitly seen weird timing issues sometimes when both client
and server were on Windows, but have been unable to pin it exactly on
what. From Yoshiykis other mail it looks like this could possibly be it,
since he did experience a speedup in the range we've been looking for in
those cases.


> What I would think might help is a patch on the libpq side (because it
> *does* use a nonblocking socket) to avoid sending more than 
> 8K per WSASend call.  The effect would just be to break a 
> long send into a series of shorter sends, which wouldn't 
> really do anything useful on a well-designed TCP stack, but 
> then this is Windows we're talking about...

It could definitly be a good idea to have a patch there *as well*, but I
think they'd both be affected.

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Table clustering idea

2006-06-27 Thread J. Andrew Rogers


On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote:

I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item  
(since

items will need to move to maintain an IOT).



There are clean ways to handle this.  The table is organized on the  
primary key, a typical requirement for IOTs.  Any indexes you add to  
IOT reference the primary key of the heap tuple.  Since the heap and  
PK index are the same thing, external indexes use the PK as the tuple  
identifier.


The only caveat is that this creates performance asymmetries.  IOTs  
have significantly faster access through their primary keys but  
slower external index access since two B-Trees have to be traversed.   
An IOT is typically only used for tables that are only accessed  
through their primary key.  Not supporting external indexes on IOTs  
is a functional implementation (and probably recommended in  
practice), though most real implementations allow external indexes if  
not always in their first version.



J. Andrew Rogers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> From: Tom Lane <[EMAIL PROTECTED]>
>> It also says that the condition only occurs if the program uses
>> non-blocking sockets ... which the backend does not.  So this page
>> offers no support for the proposed patch.

> WSAEventSelect() sets a socket to nonblocking mode.

Yeah, but that socket is only used for inter-backend signaling with
small (1 byte, I think) messages.  The socket used for communication
with the frontend is not in nonblocking mode, unless I'm totally
confused.

Have you actually measured any performance benefit from this patch,
and if so what was the test case?  I'm not opposed to the patch if it
does something useful, but the info currently available does not suggest
that it will help.

What I would think might help is a patch on the libpq side (because it
*does* use a nonblocking socket) to avoid sending more than 8K per
WSASend call.  The effect would just be to break a long send into a
series of shorter sends, which wouldn't really do anything useful on a
well-designed TCP stack, but then this is Windows we're talking about...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject

2006-06-27 Thread Thomas Hallgren

Sorry, wrong list... I reposted this on pgsql-jdbc instead.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Csaba Nagy
> I think one of the issues might have been: how will you handle other
> indexes on the table when you can no longer point them at an item (since
> items will need to move to maintain an IOT).

I guess you shouldn't allow any other indexes. That's a perfectly
acceptable compromise I think... it would be still very useful for big
and narrow tables which would benefit from being clustered.

The other concern is how would you do sequential scans on the table if
items are allowed to move ? I think some other DBs have a facility to
make a "fast index scan" which is essentially a sequential scan of the
index file, something like that would be needed here too.

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Tue, 27 Jun 2006 12:28:35 -0400

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > Martijn van Oosterhout wrote:
> >> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> >>> http://support.microsoft.com/kb/823764/EN-US/
> 
> > No, it says it occurs if this condition is met: "A single *send* call or 
> > *WSASend* call fills the whole underlying socket send buffer."
> 
> It also says that the condition only occurs if the program uses
> non-blocking sockets ... which the backend does not.  So this page
> offers no support for the proposed patch.

WSAEventSelect() sets a socket to nonblocking mode.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcecomm5/html/wce50lrfWSAEventSelect.asp

pgwin32_send() calls pgwin32_waitforsinglesocket() before
WSASend(). And pgwin32_waitforsinglesocket() calls WSAEventSelect().

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Jim C. Nasby
On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote:
> Jim,
> 
> On 6/26/06 8:15 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> 
> > On a somewhat related note, I think that it would be advantageous if the
> > FSM had a means to prefer certain pages for a given tuple over other
> > pages. This would allow for a better way to keep heap and possibly index
> > data more compacted, and it would also be a means of keeping tables
> > loosely clustered. It would also make it far easier to shrink heaps that
> > have become bloated, because the FSM could be told to favor pages at the
> > beginning of the relation.
> 
> Interesting idea - page affinity implemented using the FSM.
> 
> WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
> Teradata implemented a hashing filesystem for their heap storage and I've
> always wondered about how they handle collision and chaining efficiently,
> but it's a solved problem for sure - knowing that makes the challenge that
> much easier :-)

I know there were discussions in the past, though as per usual I can't
find them in the archives. At one point I had suggested clustering not
on a row level, but on a page level, since it doesn't really matter
terribly if the tuples in a page are clustered (worst case you can scan
the entire page).

I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item (since
items will need to move to maintain an IOT).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Martijn van Oosterhout 
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32?
Date: Tue, 27 Jun 2006 18:13:18 +0200

> On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote:
> > No, it says it occurs if this condition is met: "A single *send* call or 
> > *WSASend* call fills the whole underlying socket send buffer."
> > 
> > This will surely be true if the buffer sizes are the same. They 
> > recommend making the socket buffer at least 1 byte bigger.
> 
> Ok, but even then, are there any benchmarks to show it makes a
> difference. The articles suggests there should be but it would be nice
> to see how much difference it makes...

I see the problem in this environment.

* client
  - Windows XP
  - using ODBC driver

* server
  - Windows XP
  - 8.1.4

* query time
  - original -> about 12sec.
  - patch version -> about 3sec.

However, this problem did not occur when I changed a client
machine...

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Jim C. Nasby
On Tue, Jun 27, 2006 at 10:42:54AM +0200, PFC wrote:
>   Also, I insist (again) that there is a lot to gain by using a bit of 
> compression on the data pages, even if it's very simple compression like  
> storing the new version of a row as a difference from the previous version  
> (ie. only store the columns that changed).
>   I think DB2 stores the latest version entirely, and stores the 
>   previous  versions as a delta. This is more efficient.
 
This would only help on tables that:

have many columns[1]
are frequently updated
the updates normally touch few columns

[1] I'm assuming that un-changed toasted fields keep the same pointer

I'm doubtful that that case is common enough to warrant the amount of
work that would be involved in doing this. I think it might be useful to
consider ways to make vertical partitioning easier, since that's the
common means to reduce the impact of these scenarios.

>   In the case of tables containing TEXT values, these could also get  
> TOASTed. When an update does not modify the TOASTed columns, it would be  
> nice to simply be able to keep the reference to the TOASTed data instead  
> of decompressing it and recompressing it. Or is it already the case ?

Hopefully it is, but I'm not sure... something that would be good is a
means to force fields to be toasted sooner than when the tuple is bigger
than 2k, because that'd be a very easy way to get gains from vertical
partitioning.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Martijn van Oosterhout wrote:
>> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
>>> http://support.microsoft.com/kb/823764/EN-US/

> No, it says it occurs if this condition is met: "A single *send* call or 
> *WSASend* call fills the whole underlying socket send buffer."

It also says that the condition only occurs if the program uses
non-blocking sockets ... which the backend does not.  So this page
offers no support for the proposed patch.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject & SetString]

2006-06-27 Thread Thomas Hallgren
There's an inconsistency between the handling of trailing whitespace in 
query parameters in the client jdbc driver compared to the PL/Java SPI 
based driver. According to Jean-Pierre, the former apparently trims the 
trailing spaces before passing the query (see below). What is the 
correct behavior?


Regards,
Thomas Hallgren


 Original Message 
Subject: 	Re: [Pljava-dev] char with trailing space, 
PreparedStatement.setObject & SetString

Date:   Tue, 27 Jun 2006 12:07:19 -0400
From:   JEAN-PIERRE PELLETIER <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]



Hi Thomas,

There are very few char columns on my system, I can easily live with 
explicit trim

in my application code, I only wanted to let you guys know.

I am not sure which of the two JDBC implementations is right.
psql and pgadmin would both handle char as expected.
As for JDBC, you might want to know how other dbms are handling this.

Thanks for your reply.
Jean-Pierre Pelletier


From: Thomas Hallgren <[EMAIL PROTECTED]>
To: JEAN-PIERRE PELLETIER <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: [Pljava-dev] char with trailing 
space,	PreparedStatement.setObject & SetString

Date: Tue, 27 Jun 2006 17:47:24 +0200

Hi Jean-Pierre,
I'm not sure this is incorrect behavior. There's nothing in the spec that 
indicates that String values should be trimmed by setString and setObject. 
On the contrary. Some datatypes (the CHAR in particular) are sensitive to 
whitespace according to the SQL standard. Perhaps the client jdbc driver is 
doing something wrong here?


Regards,
Thomas Hallgren


JEAN-PIERRE PELLETIER wrote:

Hi,

Trailing space are not handled properly by setObject & setString.

PreparedStatement pstmt = connection.prepareStatement(
   "select * from mytable where mycharcolumn = ?");

String myString = "abc ";
pstmt.setObject(1, myString); // or setObject(1, myString, Types.CHAR) or 
setString(1, myString)


No rows are returned, but using trim works fine as in:
pstmt.setObject(1, myString.trim());

My environment is Pl/Java 1.3, Sun JDK 1.5.07,  PostgreSQL 8.1.4, Windows 
XP SP2


With PostgreSQL own (non pl/java) jdbc driver, setObject on char works 
fine without the trim.


Thanks,
Jean-Pierre Pelletier


___
Pljava-dev mailing list
[EMAIL PROTECTED]
http://gborg.postgresql.org/mailman/listinfo/pljava-dev





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Jim C. Nasby wrote:
> > > Perhaps my point got lost... in the case where no index keys change
> > > during an update, SITC seems superior in every way to my proposal. My
> > > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > > beneficial to UPDATEs that modify one or more index keys but still put
> > > the tuple on the same page. Where SITC would be most useful for tables
> > > that have a very heavy update rate and very few indexes, ITPC would
> > > benefit tables that have more indexes on them; where presumably it's
> > > much more likely for UPDATEs to change at least one index key (which
> > > means SITC goes out the window, if I understand it correctly). If I'm
> > > missing something and SITC can in fact deal with some index keys
> > > changing during an UPDATE, then I see no reason for ITPC.
> > 
> > I understood what you had said.  The question is whether we want to get
> > that complex with this feature, and if there are enough use cases
> > (UPDATE with index keys changing) to warrant it.
> 
> Ideas on how to test a table to see how many tuples would fit this
> criteria?
> 
> Or we could just shelve ITPC as a possibility in the future, after we
> see how much the limitations of SITC hurt.

Probably.  I am not sure even SITC is a win given the complexity it will
add, but I think it is worth trying.  Getting into more complex cases
where chains change indexed values seems like something we could try
later if we have to.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
> > > Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
> > > have to change all the users of that (which aren't many, I suppose).
> > > It's probably worth making a utility function to expand them.
> > > 
> > > I'm still confused where bitmap index scan fit into all of this. Is
> > > preserving the sequential scan aspect of these a goal with this new
> > > setup?
> > 
> > No.  I was just pointing out that if you get to the tuple via an index,
> > you get handed the head of the SITC via the index tuple, but if you are
> > doing a sequential scan, you don't get it, so you have to find it, or
> > any other non-visible SITC header.
> 
> Ok, but it remains true that you can only have one SITC per tuple. So
> if you have 5 indexes on a table, any SITC will only join tuples that
> didn't change any values in any of the indexed columns. That's probably
> not a big deal though; indexes columns arn't likely to be the ones
> changing much.

Right.

> So, for the bitmap scan you have to make sure that within a single
> transaction, scanning multiple indexes will have to provide the same
> SITC for each set of tuples, even in the face of concurrent updates.
> Otherwise the BitmapAnd will incorrectly throw them out.

The index points to the item id on the page, and that never changes,
even if the head tuple changes later.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Jim C. Nasby
On Mon, Jun 26, 2006 at 11:08:24PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > 
> > > It is certainly possible to do what you are suggesting, that is have two
> > > index entries point to same chain head, and have the index access
> > > routines figure out if the index qualifications still hold, but that
> > > seems like a lot of overhead.
> > > 
> > > Also, once there is only one visible row in the chain, removing old
> > > index entries seems quite complex because you have to have vacuum keep
> > > the qualifications of each row to figure out which index tuple is the
> > > valid one (seems messy).
> >  
> > Perhaps my point got lost... in the case where no index keys change
> > during an update, SITC seems superior in every way to my proposal. My
> > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > beneficial to UPDATEs that modify one or more index keys but still put
> > the tuple on the same page. Where SITC would be most useful for tables
> > that have a very heavy update rate and very few indexes, ITPC would
> > benefit tables that have more indexes on them; where presumably it's
> > much more likely for UPDATEs to change at least one index key (which
> > means SITC goes out the window, if I understand it correctly). If I'm
> > missing something and SITC can in fact deal with some index keys
> > changing during an UPDATE, then I see no reason for ITPC.
> 
> I understood what you had said.  The question is whether we want to get
> that complex with this feature, and if there are enough use cases
> (UPDATE with index keys changing) to warrant it.

Ideas on how to test a table to see how many tuples would fit this
criteria?

Or we could just shelve ITPC as a possibility in the future, after we
see how much the limitations of SITC hurt.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
PFC wrote:
> 
> > My idea is that if an UPDATE places the new tuple on the same page as
> > the old tuple, it will not create new index entries for any indexes
> > where the key doesn't change.
> 
>   Basically the idea behind preventing index bloat by updates is to have  
> one index tuple point to several actual tuples having the same value.
>   

The idea is not to avoid index bloat, but to allow heap reuse, and having
one index entry for multiple versions of an UPDATEd row is merely an
implementation detail.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Hannu Krosing wrote:
> ?hel kenal p?eval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
> > ?hel kenal p?eval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> > > Jim C. Nasby wrote:
> > > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > > > 
> > > > > It is certainly possible to do what you are suggesting, that is have 
> > > > > two
> > > > > index entries point to same chain head, and have the index access
> > > > > routines figure out if the index qualifications still hold, but that
> > > > > seems like a lot of overhead.
> > 
> > I think Jim meant not 2 pointing to the same head, but 2 pointing into
> > the same chain. Say we have table with (id serial, ts timestamp) where
> > ts changes at each update and id does not.
> > 
> > So after 3 updates on one page we have one CITC/ITPC head with pointers
> > from both indexes and two follow-up tuples with pointers from only ts
> > index.
> > 
> > The problem with this setup is, that we can't reuse any of those
> > follow-up tuples without index cleanup.
> 
> But we still have to think about similar cases (index entries pointing
> inside CITC chains), unless we plan to disallow adding indexes to
> tables.

CREATE INDEX has to undo any chains where the new indexed columns change
in the chain, and add index entries to remove the chain.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Martijn van Oosterhout
On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote:
> No, it says it occurs if this condition is met: "A single *send* call or 
> *WSASend* call fills the whole underlying socket send buffer."
> 
> This will surely be true if the buffer sizes are the same. They 
> recommend making the socket buffer at least 1 byte bigger.

Ok, but even then, are there any benchmarks to show it makes a
difference. The articles suggests there should be but it would be nice
to see how much difference it makes...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Rocco Altier
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Martijn van Oosterhout
> On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> > Hi,
> > 
> > I see a performance issue on win32. This problem is causes by the
> > following URL. 
> > 
> > http://support.microsoft.com/kb/823764/EN-US/
> > 
> > On win32, default SO_SNDBUF value is 8192 bytes. And 
> libpq's buffer is
> > 8192 too.
> 
> Ok, so there's a difficiency in Windows TCP code. Do you have any
> benchmarks to show this actually makes a difference. According to the
> URL you give, the problem occurs if the libpq buffer is *bigger* than
> the socket buffer, which it isn't...
> 
The article also says there is a problem if they are the same size.

-rocco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Andrew Dunstan

Martijn van Oosterhout wrote:


On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
 


Hi,

I see a performance issue on win32. This problem is causes by the
following URL. 


http://support.microsoft.com/kb/823764/EN-US/

On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
8192 too.
   



Ok, so there's a difficiency in Windows TCP code. Do you have any
benchmarks to show this actually makes a difference. According to the
URL you give, the problem occurs if the libpq buffer is *bigger* than
the socket buffer, which it isn't...
 



No, it says it occurs if this condition is met: "A single *send* call or 
*WSASend* call fills the whole underlying socket send buffer."


This will surely be true if the buffer sizes are the same. They 
recommend making the socket buffer at least 1 byte bigger.


cheers

andrew



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Tue, 27 Jun 2006 11:30:56 -0400

> Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> > send() may take as long as 200ms. So, I think we should increase
> > SO_SNDBUF to more than 8192. I attache the patch.
> 
> Why would that help?  We won't be sending more than 8K at a time.

MSDN is,

  Method2: Make the Socket Send Buffer Size Larger Than the Program
   Send Buffer Size

Modify the send call or the WSASend call to specify a buffer size
at least 1 byte smaller than the SO_SNDBUF value.

--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
> Hi,
> 
> I see a performance issue on win32. This problem is causes by the
> following URL. 
> 
> http://support.microsoft.com/kb/823764/EN-US/
> 
> On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
> 8192 too.

Ok, so there's a difficiency in Windows TCP code. Do you have any
benchmarks to show this actually makes a difference. According to the
URL you give, the problem occurs if the libpq buffer is *bigger* than
the socket buffer, which it isn't...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> send() may take as long as 200ms. So, I think we should increase
> SO_SNDBUF to more than 8192. I attache the patch.

Why would that help?  We won't be sending more than 8K at a time.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
Hi,

I see a performance issue on win32. This problem is causes by the
following URL. 

http://support.microsoft.com/kb/823764/EN-US/

On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
8192 too.

pqcomm.c:117
  #define PQ_BUFFER_SIZE 8192

send() may take as long as 200ms. So, I think we should increase
SO_SNDBUF to more than 8192. I attache the patch.

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]
Index: pqcomm.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
retrieving revision 1.184
diff -c -r1.184 pqcomm.c
*** pqcomm.c5 Mar 2006 15:58:27 -   1.184
--- pqcomm.c27 Jun 2006 15:17:18 -
***
*** 593,598 
--- 593,608 
return STATUS_ERROR;
}
  
+ #ifdef WIN32
+   on = PQ_BUFFER_SIZE * 2;
+   if (setsockopt(port->sock, SOL_SOCKET, SO_SNDBUF,
+  (char *) &on, sizeof(on)) < 0)
+   {
+   elog(LOG, "setsockopt(SO_SNDBUF) failed: %m");
+   return STATUS_ERROR;
+   }
+ #endif
+ 
/*
 * Also apply the current keepalive parameters.  If we fail to 
set a
 * parameter, don't error out, because these aren't universally

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Disallow changing/dropping default expression

2006-06-27 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Wasn't this patch rejected?

> Anyway, what is your opinion on this?

I thought we'd rejected it.  I'm not sure that we'd completely agreed
what the best thing to do is, but what this patch actually does is to
silently remove the dependency link.  That is, after

create table t1 (f1 serial);
alter table t1 alter column f1 drop default;

t1_f1_seq is still there, but now completely unconnected to t1.
That doesn't seem to me to satisfy the principle of least surprise.
It's certainly not what the TODO item says (reject the DROP DEFAULT).
I think we were considering the alternative of having the DROP DEFAULT
remove the sequence, which probably could be implemented painlessly
with a change in the way we set up the dependency links to start with.

In any case I don't like this patch: int/bool confusion, use of elog
instead of ereport for a user-facing error message, failure to adhere to
style guidelines for that message, etc.  (Although seeing that the error
message is unreachable code, maybe that doesn't matter ;-))  Aside from
the poor coding style, the whole idea of reaching into pg_depend to
remove a single dependency strikes me as a brute-force solution to
a problem that should have a more elegant answer.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Martijn van Oosterhout
On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
> > Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
> > have to change all the users of that (which aren't many, I suppose).
> > It's probably worth making a utility function to expand them.
> > 
> > I'm still confused where bitmap index scan fit into all of this. Is
> > preserving the sequential scan aspect of these a goal with this new
> > setup?
> 
> No.  I was just pointing out that if you get to the tuple via an index,
> you get handed the head of the SITC via the index tuple, but if you are
> doing a sequential scan, you don't get it, so you have to find it, or
> any other non-visible SITC header.

Ok, but it remains true that you can only have one SITC per tuple. So
if you have 5 indexes on a table, any SITC will only join tuples that
didn't change any values in any of the indexed columns. That's probably
not a big deal though; indexes columns arn't likely to be the ones
changing much.

So, for the bitmap scan you have to make sure that within a single
transaction, scanning multiple indexes will have to provide the same
SITC for each set of tuples, even in the face of concurrent updates.
Otherwise the BitmapAnd will incorrectly throw them out.

That should be doable, if you only change the head of the SITC on
VACUUM. I'm not sure if that's what's being suggested right now.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] refcount leak warnings

2006-06-27 Thread Tom Lane
Thomas Hallgren <[EMAIL PROTECTED]> writes:
> I have a PL/Java user that performs some lengthy operations. Eventually, 
> he get warnings like:

> WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
> flags=0x27, refcount=1 2)

Look for ReadBuffer calls not matched by ReleaseBuffer.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] refcount leak warnings

2006-06-27 Thread Martijn van Oosterhout
On Tue, Jun 27, 2006 at 03:55:06PM +0200, Thomas Hallgren wrote:
> I have a PL/Java user that performs some lengthy operations. Eventually, 
> he get warnings like:
> 
> WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
> flags=0x27, refcount=1 2)

I think the comment about failing to clean up is correct. AIUI, if
you've done a heap_open without a heap_close, or an index_open without
an index_close, you'll get warnings like this. Maybe using SPI without
clearing everything.

Obviously something didn't get cleaned up somewhere, but what? Maybe
identifying the rel and looking at that block might help identify the
issue.

Have a ncie day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Tom Lane
"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes:
>> That was with a mind to performance. Checking every INSERT, 
>> UPDATE and DELETE statement to see if they are being done 
>> against a frozen table seems like a waste.

> I'd think we would have relminxid in the relcache, so I don't buy the
> performance argument :-)

Me either.  Further, auto-revoking permissions loses information.
I think that idea is an ugly kluge.

Anyway, the bottom line here seems to be that we should forget about
pg_class_nt and just keep the info in pg_class; there's not sufficient
justification to build the infrastructure needed for a nontransactional
auxiliary catalog.  This implies the following conclusions:

* template0 has to be vacuumed against wraparound, same as any other
database.

* To support frozen tables, "VACUUM FREEZE" and "ALTER TABLE UNFREEZE"
would need to be explicit commands taking ExclusiveLock, and can't be
nested inside transaction blocks either.  Automatic unfreeze upon an
updating command isn't possible.

Neither of these are bad enough to justify pg_class_nt --- in fact,
I'd argue that explicit unfreeze is better than automatic anyway.
So it was a cute idea, but its time hasn't come.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] refcount leak warnings

2006-06-27 Thread Thomas Hallgren
I have a PL/Java user that performs some lengthy operations. Eventually, 
he get warnings like:


WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
flags=0x27, refcount=1 2)


I traced this to the function PrintBufferLeakWarning. AFAICS, it's only 
called from the function ResourceOwnerReleaseInternal under the 
following comment:


* During a commit, there shouldn't be any remaining pins --- that
* would indicate failure to clean up the executor correctly --- so
* issue warnings.In the abort case, just clean up quietly.

I have no idea where to go from here. What should I look for when trying 
to find the cause of such warnings?


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Mark Woodward
> On Fri, Jun 23, 2006 at 06:37:01AM -0400, Mark Woodward wrote:
>> While we all know session data is, at best, ephemeral, people still want
>> some sort of persistence, thus, you need a database. For mcache I have a
>> couple plugins that have a wide range of opitions, from read/write at
>> startup and shut down, to full write through cache to a database.
>>
>> In general, my clients don't want this, they want the database to store
>> their data. When you try to explain to them that a database may not be
>> the
>> right place to store this data, they ask why, sadly they have little
>> hope
>> of understanding the nuances and remain unconvinced.
>
> Have you done any benchmarking between a site using mcache and one not?
> I'll bet there's a huge difference, which translates into hardware $$.
> That's something managers can understand.
>

Last benchmark I did was on a pure data level, a couple years ago,
PostgreSQL could handle about 800 session transactions a second, but
degraded over time, MCache was up about 7500 session transactions a second
and held steady. I should dig up that code and make it available on my
site.

I have a couple users that tell me that their sites couldn't work without
it, not even with MySQL.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Mark Woodward
> Ühel kenal päeval, E, 2006-06-26 kell 09:10, kirjutas Mark Woodward:
>> > Ühel kenal päeval, R, 2006-06-23 kell 17:27, kirjutas Bruce
>> Momjian:
>> >> Jonah H. Harris wrote:
>> >> > On 6/23/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> >> > > What I see in this discussion is a huge amount of "the grass must
>> be
>> >> > > greener on the other side" syndrome, and hardly any recognition
>> that
>> >> > > every technique has its downsides and complications.
>> >> >
>> >> > I'm being totally objective.  I don't think we should abandon
>> >> > PostgreSQL's overall design at all, because we do perform INSERTs
>> and
>> >> > DELETEs much better than most systems.  However, I've looked at
>> many
>> >> > systems and how they implement UPDATE so that it is a scalable
>> >> > operation.  Sure, there are costs and benefits to each
>> implementation,
>> >> > but I think we have some pretty brilliant people in this community
>> and
>> >> > can come up with an elegant design for scalable UPDATEs.
>> >>
>> >> I think the UPDATE case is similar to the bitmap index scan or
>> perhaps
>> >> bitmap indexes on disk --- there are cases we know can not be handled
>> >> well by our existing code, so we have added (or might add) these
>> >> features to try to address those difficult cases.
>> >
>> > Not really. Bitmap index scan and bitmap index are both new additions
>> > working well with existing framework.
>> >
>> > While the problem of slowdown on frequent updates is real, the
>> suggested
>> > fix is just plain wrong, as it is based on someones faulty assumption
>> on
>> > how index lookup works, and very much simplified view of how different
>> > parts of the system work to implement MVCC.
>>
>> Yes, the suggestion was based on MVCC concepts, not a particular
>> implementation.
>
> On the contrary - afaik, it was loosely based on how Oracle does it with
> its rollback segments, only assuming that rollback segments are kept in
> heap and that indexes point only to the oldest row version :p

Well, give me a little more credit than that. Yes, Oracle did play small
part in my thinking, but only in as much as "they can't do it, why can't
we?" The problem was how to get the most recent tuple to be more efficient
and not have tuples that will never be used impact performance without
excessive locking or moving data around.

It was a just a quick idea. Bruce's solution, you have to admit, is
somewhat similar.

>
>> > The original fix he "suggests" was to that imagined behaviour and thus
>> > ignored all the real problems of such change.
>>
>> The original suggestion, was nothing more than a hypothetical for the
>> purpose of discussion.
>>
>> The problem was the steady degradation of performance on frequent
>> updates.
>> That was the point of discussion.  I brought up "one possible way" to
>> start a "brain storm." The discussion then morphed into critisizing the
>> example and not addressing the problem.
>
> The problem is heatedly discussed every 3-4 months.

And yet, here we are again.

>
>> Anyway, I think some decent discussion about the problem did happen, and
>> that is good.
>
> Agreed.
>
> Maybe this _was_ the best way to bring up the discussion again.

I have a way, for better or worse, I guess, of stirring up the pot. :-)

Cry as we may about MySQL, but I have a sneaking suspicion that this is
one of the issues that puts PostgreSQL at a serious disadvantage.

While heavily updated rows are a single type of problem, these days I
think *most* database deployments are as back-ends for web sites. This
problem is *very* critical to that type of application, consequently
probably why PostgreSQL has difficulty in that space.

If PostgreSQL can be made *not* to suffer performance degradation on
heavily updated rows, then that is realy the last issue in the way of it
being a completely creadible medium to large enterprise back end. This
combined with its amazing pragramability, should make it unstoppable.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Teodor Sigaev

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms


I'll look at this,  but GiST time creation is suspiciously small.
Can you test on smaller table, for example with 10 records and if results 
are repeat, pls, send to me test suite...


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] Turning off disk caching

2006-06-27 Thread Dhanaraj M


Hi

Is there anybody who knows about  "Turning off disk caching" in solaris 
machines.

If so, pl. reply back.

Thanks
Dhanaraj

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, E, 2006-06-26 kell 11:31, kirjutas Bruce Momjian:
> Hannu Krosing wrote:
> > > > pass 3: clean heap based on ctid from pass 1
> > > > 
> > > > If yo do it this way, you dont need to invent new data structures to
> > > > pass extra info about CITC internals to passes 2 and 3
> > > > 
> > > > On more thing - when should free space map be notified about free space
> > > > in pages with CITC chains ?
> > > 
> > > Uh, well, I am thinking we only free CITC space when we are going to use
> > > it for an UPDATE, rather than free things while doing an operation.  It
> > > is good to keep the cleanup overhead out of the main path as much as
> > > possible.
> > 
> > So vacuum should only remove dead CITC chains and leave the ones with
> > live tuples to CITC internal use ?
> 
> Yes, it has to.  What else would it do?  Add index entries?

No, clean out the dead part. 

But this would probably add the page to FSM - do we want that.

Also, this cleaning should probably be done at pass1, so we dont have to
carry the ctids of tuples which have no index entries around to passes 2
and 3 . This has the downside of possibly writing the heap page twice,
so maybe we dont want it.

> > That would also suggest that pages having live CITC chains and less than
> > N% of free space should mot be reported to FSM.
> 
> Parts of the CITC that are not visible can be used for free space by
> vacuum, but the visible part is left alone.
> 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Zeugswetter Andreas DCP SD

> > > Suggest that we prevent write operations on Frozen tables by 
> > > revoking
> > all INSERT, UPDATE or DELETE rights held, then enforcing a check 
> > during GRANT to prevent them being re-enabled. Superusers would need

> > to check every time. If we dont do this, then we will have two 
> > contradictory states marked in the catalog - privilges saying Yes
and 
> > freezing saying No.
> > 
> > No, I'd not mess with the permissions and return a different error 
> > when trying to modify a frozen table. (It would also be complicated
to 
> > unfreeze after create database) We should make it clear, that
freezing 
> > is no replacement for revoke.
> 
> That was with a mind to performance. Checking every INSERT, 
> UPDATE and DELETE statement to see if they are being done 
> against a frozen table seems like a waste.

I'd think we would have relminxid in the relcache, so I don't buy the
performance argument :-) (You could still do the actual check in the
same place where the permission is checked)

> There would still be a specific error message for frozen 
> tables, just on the GRANT rather than the actual DML statements.

I'd still prefer to see the error on modify. Those that don't can
revoke.

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Kim Bisgaard

Jim C. Nasby wrote:

On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
  

Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses.  We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?



I'm curious how you'll do it, as I was once told that actually trying to
store heap data in a btree structure would be a non-starter (don't
remember why).
  
Ingres is now open source - they have clustering on btree/isam/hash 
(it's called "modify table xx to btree on col1,col2")


Regards,
Kim


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
> Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> > Jim C. Nasby wrote:
> > > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > > 
> > > > It is certainly possible to do what you are suggesting, that is have two
> > > > index entries point to same chain head, and have the index access
> > > > routines figure out if the index qualifications still hold, but that
> > > > seems like a lot of overhead.
> 
> I think Jim meant not 2 pointing to the same head, but 2 pointing into
> the same chain. Say we have table with (id serial, ts timestamp) where
> ts changes at each update and id does not.
> 
> So after 3 updates on one page we have one CITC/ITPC head with pointers
> from both indexes and two follow-up tuples with pointers from only ts
> index.
> 
> The problem with this setup is, that we can't reuse any of those
> follow-up tuples without index cleanup.

But we still have to think about similar cases (index entries pointing
inside CITC chains), unless we plan to disallow adding indexes to
tables.

Perhaps that case has to simply disable heap tuple reuse until some
event. what would that event be?

Or maybe we should have some bitmap of dirty tuple ids inside each page,
that is tuple ids that have index pointers to them. and then avoid using
these ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread PFC



My idea is that if an UPDATE places the new tuple on the same page as
the old tuple, it will not create new index entries for any indexes
where the key doesn't change.


	Basically the idea behind preventing index bloat by updates is to have  
one index tuple point to several actual tuples having the same value.


So : Index entry -> list of tuples having the same value -> actual 
tuples
(-> represents an indirection)

	I proposed to put the list of tuples in the index ; you propose to put it  
in data pages.


I think both solutions have pros and cons :

* List of tuples in the index :
+ reduces index size, makes cacheability in RAM more likely
+ speeds up index scans
- complexity
- slows down modifications to the index (a bit)

* List of tuples in the page
+ simpler to implement
+ reduces index size, but less so than previous solution
- useless if UPDATE puts the new tuple on a different page

I guess the best solution would be a mix of both.

	Also, I insist (again) that there is a lot to gain by using a bit of  
compression on the data pages, even if it's very simple compression like  
storing the new version of a row as a difference from the previous version  
(ie. only store the columns that changed).
	I think DB2 stores the latest version entirely, and stores the previous  
versions as a delta. This is more efficient.


	In the case of tables containing TEXT values, these could also get  
TOASTed. When an update does not modify the TOASTed columns, it would be  
nice to simply be able to keep the reference to the TOASTed data instead  
of decompressing it and recompressing it. Or is it already the case ?



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Simon Riggs
On Tue, 2006-06-27 at 10:04 +0200, Zeugswetter Andreas DCP SD wrote:

> Simon wrote:
> > Suggest that we prevent write operations on Frozen tables by revoking
> all INSERT, UPDATE or DELETE rights held, then enforcing a check during
> GRANT to prevent them being re-enabled. Superusers would need to check
> every time. If we dont do this, then we will have two contradictory
> states marked in the catalog - privilges saying Yes and freezing saying
> No.
> 
> No, I'd not mess with the permissions and return a different error when
> trying to
> modify a frozen table. (It would also be complicated to unfreeze after
> create database)
> We should make it clear, that freezing is no replacement for revoke.

That was with a mind to performance. Checking every INSERT, UPDATE and
DELETE statement to see if they are being done against a frozen table
seems like a waste.

There would still be a specific error message for frozen tables, just on
the GRANT rather than the actual DML statements.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Zeugswetter Andreas DCP SD

Very nice explanation, thanks Alvaro.

> 2. Mark frozen databases specially somehow.
>To mark databases frozen, we need a way to mark tables as frozen.
>How do we do that?  As I explain below, this allows some nice
>optimizations, but it's a very tiny can full of a huge amount of
>worms.
> 
> Marking a Table Frozen
> ==
> 
> Marking a table frozen is simple as setting relminxid = 
> FrozenXid for a table.  As explained above, this cannot be 
> done in a regular postmaster environment, because a 
> concurrent transaction could be doing nasty stuff to a table. 
>  So we can do it only in a standalone backend.

Unless you lock the table exclusively during vacuum, that could be done
with 
vacuum freeze. I like that more, than changing stuff that is otherwise
completely 
frozen/static. (I see you wrote that below)

> On the other hand, a "frozen" table must be marked with 
> relminxid = a-regular-Xid as soon as a transaction writes 
> some tuples on it.  Note that this "unfreezing" must take 
> place even if the offending transaction is aborted, because 
> the Xid is written in the table nevertheless and thus it 
> would be incorrect to lose the unfreezing.

The other idea was to need a special unfreeze command ...

> 
> This is how pg_class_nt came into existence -- it would be a 
> place where information about a table would be stored and not 
> subject to the rolling back of the transaction that wrote it.

Oh, that puts it in another league, since it must guarantee commit.
I am not sure we can do that. The previous discussion was about
concurrency and data that was not so important like tuple count.

In short: 
- I'd start with #1 (no relminxid = FrozenXid) like Tom
suggested
- and then implement FREEZE/UNFREEZE with exclusive locks 
like Simon wrote (so it does not need pg_class_nt) and use that
for the templates.

Simon wrote:
> Suggest that we prevent write operations on Frozen tables by revoking
all INSERT, UPDATE or DELETE rights held, then enforcing a check during
GRANT to prevent them being re-enabled. Superusers would need to check
every time. If we dont do this, then we will have two contradictory
states marked in the catalog - privilges saying Yes and freezing saying
No.

No, I'd not mess with the permissions and return a different error when
trying to
modify a frozen table. (It would also be complicated to unfreeze after
create database)
We should make it clear, that freezing is no replacement for revoke.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
> Jim C. Nasby wrote:
> > On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
> > > 
> > > It is certainly possible to do what you are suggesting, that is have two
> > > index entries point to same chain head, and have the index access
> > > routines figure out if the index qualifications still hold, but that
> > > seems like a lot of overhead.

I think Jim meant not 2 pointing to the same head, but 2 pointing into
the same chain. Say we have table with (id serial, ts timestamp) where
ts changes at each update and id does not.

So after 3 updates on one page we have one CITC/ITPC head with pointers
from both indexes and two follow-up tuples with pointers from only ts
index.

The problem with this setup is, that we can't reuse any of those
follow-up tuples without index cleanup.

> > > Also, once there is only one visible row in the chain, removing old
> > > index entries seems quite complex because you have to have vacuum keep
> > > the qualifications of each row to figure out which index tuple is the
> > > valid one (seems messy).
> >  
> > Perhaps my point got lost... in the case where no index keys change
> > during an update, SITC seems superior in every way to my proposal. My
> > idea (let's call it Index Tuple Page Consolidation, ITPC) would be
> > beneficial to UPDATEs that modify one or more index keys but still put
> > the tuple on the same page. Where SITC would be most useful for tables
> > that have a very heavy update rate and very few indexes, ITPC would
> > benefit tables that have more indexes on them; where presumably it's
> > much more likely for UPDATEs to change at least one index key (which
> > means SITC goes out the window, if I understand it correctly). If I'm
> > missing something and SITC can in fact deal with some index keys
> > changing during an UPDATE, then I see no reason for ITPC.
> 
> I understood what you had said.  The question is whether we want to get
> that complex with this feature, and if there are enough use cases
> (UPDATE with index keys changing) to warrant it.

I'd like to think that most heavily-updated tables avoid that, but there
may be still cases where this is needed.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Clamp last_anl_tuples to n_live_tuples, in case we vacuum a table

2006-06-27 Thread Dave Page
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: 27 June 2006 05:12
> To: Alvaro Herrera
> Cc: Hackers
> Subject: Re: [HACKERS] [COMMITTERS] pgsql: Clamp 
> last_anl_tuples to n_live_tuples, in case we vacuum a table 
> 
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > On a loosely related matter, how about changing pg_class.relpages to
> > pg_class.reldensity?
> 
> IIRC, I considered this earlier, and rejected it because it 
> doesn't cope
> well with the corner case relpages == 0.  Also, it'll break existing
> clients that expect to find relpages and reltuples, if there are any

There are.

/D

---(end of broadcast)---
TIP 6: explain analyze is your friend