Re: [HACKERS] Table clustering idea

2006-06-27 Thread Luke Lonergan
Jim,

On 6/26/06 8:15 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:

 On a somewhat related note, I think that it would be advantageous if the
 FSM had a means to prefer certain pages for a given tuple over other
 pages. This would allow for a better way to keep heap and possibly index
 data more compacted, and it would also be a means of keeping tables
 loosely clustered. It would also make it far easier to shrink heaps that
 have become bloated, because the FSM could be told to favor pages at the
 beginning of the relation.

Interesting idea - page affinity implemented using the FSM.

WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
Teradata implemented a hashing filesystem for their heap storage and I've
always wondered about how they handle collision and chaining efficiently,
but it's a solved problem for sure - knowing that makes the challenge that
much easier :-)
 
- Luke 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [COMMITTERS] pgsql: Clamp last_anl_tuples to n_live_tuples, in case we vacuum a table

2006-06-27 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: 27 June 2006 05:12
 To: Alvaro Herrera
 Cc: Hackers
 Subject: Re: [HACKERS] [COMMITTERS] pgsql: Clamp 
 last_anl_tuples to n_live_tuples, in case we vacuum a table 
 
 Alvaro Herrera [EMAIL PROTECTED] writes:
  On a loosely related matter, how about changing pg_class.relpages to
  pg_class.reldensity?
 
 IIRC, I considered this earlier, and rejected it because it 
 doesn't cope
 well with the corner case relpages == 0.  Also, it'll break existing
 clients that expect to find relpages and reltuples, if there are any

There are.

/D

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
 Jim C. Nasby wrote:
  On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
   
   It is certainly possible to do what you are suggesting, that is have two
   index entries point to same chain head, and have the index access
   routines figure out if the index qualifications still hold, but that
   seems like a lot of overhead.

I think Jim meant not 2 pointing to the same head, but 2 pointing into
the same chain. Say we have table with (id serial, ts timestamp) where
ts changes at each update and id does not.

So after 3 updates on one page we have one CITC/ITPC head with pointers
from both indexes and two follow-up tuples with pointers from only ts
index.

The problem with this setup is, that we can't reuse any of those
follow-up tuples without index cleanup.

   Also, once there is only one visible row in the chain, removing old
   index entries seems quite complex because you have to have vacuum keep
   the qualifications of each row to figure out which index tuple is the
   valid one (seems messy).
   
  Perhaps my point got lost... in the case where no index keys change
  during an update, SITC seems superior in every way to my proposal. My
  idea (let's call it Index Tuple Page Consolidation, ITPC) would be
  beneficial to UPDATEs that modify one or more index keys but still put
  the tuple on the same page. Where SITC would be most useful for tables
  that have a very heavy update rate and very few indexes, ITPC would
  benefit tables that have more indexes on them; where presumably it's
  much more likely for UPDATEs to change at least one index key (which
  means SITC goes out the window, if I understand it correctly). If I'm
  missing something and SITC can in fact deal with some index keys
  changing during an UPDATE, then I see no reason for ITPC.
 
 I understood what you had said.  The question is whether we want to get
 that complex with this feature, and if there are enough use cases
 (UPDATE with index keys changing) to warrant it.

I'd like to think that most heavily-updated tables avoid that, but there
may be still cases where this is needed.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Zeugswetter Andreas DCP SD

Very nice explanation, thanks Alvaro.

 2. Mark frozen databases specially somehow.
To mark databases frozen, we need a way to mark tables as frozen.
How do we do that?  As I explain below, this allows some nice
optimizations, but it's a very tiny can full of a huge amount of
worms.
 
 Marking a Table Frozen
 ==
 
 Marking a table frozen is simple as setting relminxid = 
 FrozenXid for a table.  As explained above, this cannot be 
 done in a regular postmaster environment, because a 
 concurrent transaction could be doing nasty stuff to a table. 
  So we can do it only in a standalone backend.

Unless you lock the table exclusively during vacuum, that could be done
with 
vacuum freeze. I like that more, than changing stuff that is otherwise
completely 
frozen/static. (I see you wrote that below)

 On the other hand, a frozen table must be marked with 
 relminxid = a-regular-Xid as soon as a transaction writes 
 some tuples on it.  Note that this unfreezing must take 
 place even if the offending transaction is aborted, because 
 the Xid is written in the table nevertheless and thus it 
 would be incorrect to lose the unfreezing.

The other idea was to need a special unfreeze command ...

 
 This is how pg_class_nt came into existence -- it would be a 
 place where information about a table would be stored and not 
 subject to the rolling back of the transaction that wrote it.

Oh, that puts it in another league, since it must guarantee commit.
I am not sure we can do that. The previous discussion was about
concurrency and data that was not so important like tuple count.

In short: 
- I'd start with #1 (no relminxid = FrozenXid) like Tom
suggested
- and then implement FREEZE/UNFREEZE with exclusive locks 
like Simon wrote (so it does not need pg_class_nt) and use that
for the templates.

Simon wrote:
 Suggest that we prevent write operations on Frozen tables by revoking
all INSERT, UPDATE or DELETE rights held, then enforcing a check during
GRANT to prevent them being re-enabled. Superusers would need to check
every time. If we dont do this, then we will have two contradictory
states marked in the catalog - privilges saying Yes and freezing saying
No.

No, I'd not mess with the permissions and return a different error when
trying to
modify a frozen table. (It would also be complicated to unfreeze after
create database)
We should make it clear, that freezing is no replacement for revoke.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Simon Riggs
On Tue, 2006-06-27 at 10:04 +0200, Zeugswetter Andreas DCP SD wrote:

 Simon wrote:
  Suggest that we prevent write operations on Frozen tables by revoking
 all INSERT, UPDATE or DELETE rights held, then enforcing a check during
 GRANT to prevent them being re-enabled. Superusers would need to check
 every time. If we dont do this, then we will have two contradictory
 states marked in the catalog - privilges saying Yes and freezing saying
 No.
 
 No, I'd not mess with the permissions and return a different error when
 trying to
 modify a frozen table. (It would also be complicated to unfreeze after
 create database)
 We should make it clear, that freezing is no replacement for revoke.

That was with a mind to performance. Checking every INSERT, UPDATE and
DELETE statement to see if they are being done against a frozen table
seems like a waste.

There would still be a specific error message for frozen tables, just on
the GRANT rather than the actual DML statements.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread PFC



My idea is that if an UPDATE places the new tuple on the same page as
the old tuple, it will not create new index entries for any indexes
where the key doesn't change.


	Basically the idea behind preventing index bloat by updates is to have  
one index tuple point to several actual tuples having the same value.


So : Index entry - list of tuples having the same value - actual 
tuples
(- represents an indirection)

	I proposed to put the list of tuples in the index ; you propose to put it  
in data pages.


I think both solutions have pros and cons :

* List of tuples in the index :
+ reduces index size, makes cacheability in RAM more likely
+ speeds up index scans
- complexity
- slows down modifications to the index (a bit)

* List of tuples in the page
+ simpler to implement
+ reduces index size, but less so than previous solution
- useless if UPDATE puts the new tuple on a different page

I guess the best solution would be a mix of both.

	Also, I insist (again) that there is a lot to gain by using a bit of  
compression on the data pages, even if it's very simple compression like  
storing the new version of a row as a difference from the previous version  
(ie. only store the columns that changed).
	I think DB2 stores the latest version entirely, and stores the previous  
versions as a delta. This is more efficient.


	In the case of tables containing TEXT values, these could also get  
TOASTed. When an update does not modify the TOASTed columns, it would be  
nice to simply be able to keep the reference to the TOASTed data instead  
of decompressing it and recompressing it. Or is it already the case ?



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
 Ühel kenal päeval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
  Jim C. Nasby wrote:
   On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:

It is certainly possible to do what you are suggesting, that is have two
index entries point to same chain head, and have the index access
routines figure out if the index qualifications still hold, but that
seems like a lot of overhead.
 
 I think Jim meant not 2 pointing to the same head, but 2 pointing into
 the same chain. Say we have table with (id serial, ts timestamp) where
 ts changes at each update and id does not.
 
 So after 3 updates on one page we have one CITC/ITPC head with pointers
 from both indexes and two follow-up tuples with pointers from only ts
 index.
 
 The problem with this setup is, that we can't reuse any of those
 follow-up tuples without index cleanup.

But we still have to think about similar cases (index entries pointing
inside CITC chains), unless we plan to disallow adding indexes to
tables.

Perhaps that case has to simply disable heap tuple reuse until some
event. what would that event be?

Or maybe we should have some bitmap of dirty tuple ids inside each page,
that is tuple ids that have index pointers to them. and then avoid using
these ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Kim Bisgaard

Jim C. Nasby wrote:

On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
  

Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses.  We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?



I'm curious how you'll do it, as I was once told that actually trying to
store heap data in a btree structure would be a non-starter (don't
remember why).
  
Ingres is now open source - they have clustering on btree/isam/hash 
(it's called modify table xx to btree on col1,col2)


Regards,
Kim


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Zeugswetter Andreas DCP SD

   Suggest that we prevent write operations on Frozen tables by 
   revoking
  all INSERT, UPDATE or DELETE rights held, then enforcing a check 
  during GRANT to prevent them being re-enabled. Superusers would need

  to check every time. If we dont do this, then we will have two 
  contradictory states marked in the catalog - privilges saying Yes
and 
  freezing saying No.
  
  No, I'd not mess with the permissions and return a different error 
  when trying to modify a frozen table. (It would also be complicated
to 
  unfreeze after create database) We should make it clear, that
freezing 
  is no replacement for revoke.
 
 That was with a mind to performance. Checking every INSERT, 
 UPDATE and DELETE statement to see if they are being done 
 against a frozen table seems like a waste.

I'd think we would have relminxid in the relcache, so I don't buy the
performance argument :-) (You could still do the actual check in the
same place where the permission is checked)

 There would still be a specific error message for frozen 
 tables, just on the GRANT rather than the actual DML statements.

I'd still prefer to see the error on modify. Those that don't can
revoke.

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Hannu Krosing
Ühel kenal päeval, E, 2006-06-26 kell 11:31, kirjutas Bruce Momjian:
 Hannu Krosing wrote:
pass 3: clean heap based on ctid from pass 1

If yo do it this way, you dont need to invent new data structures to
pass extra info about CITC internals to passes 2 and 3

On more thing - when should free space map be notified about free space
in pages with CITC chains ?
   
   Uh, well, I am thinking we only free CITC space when we are going to use
   it for an UPDATE, rather than free things while doing an operation.  It
   is good to keep the cleanup overhead out of the main path as much as
   possible.
  
  So vacuum should only remove dead CITC chains and leave the ones with
  live tuples to CITC internal use ?
 
 Yes, it has to.  What else would it do?  Add index entries?

No, clean out the dead part. 

But this would probably add the page to FSM - do we want that.

Also, this cleaning should probably be done at pass1, so we dont have to
carry the ctids of tuples which have no index entries around to passes 2
and 3 . This has the downside of possibly writing the heap page twice,
so maybe we dont want it.

  That would also suggest that pages having live CITC chains and less than
  N% of free space should mot be reported to FSM.
 
 Parts of the CITC that are not visible can be used for free space by
 vacuum, but the visible part is left alone.
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Turning off disk caching

2006-06-27 Thread Dhanaraj M


Hi

Is there anybody who knows about  Turning off disk caching in solaris 
machines.

If so, pl. reply back.

Thanks
Dhanaraj

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Teodor Sigaev

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms


I'll look at this,  but GiST time creation is suspiciously small.
Can you test on smaller table, for example with 10 records and if results 
are repeat, pls, send to me test suite...


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Mark Woodward
 Ühel kenal päeval, E, 2006-06-26 kell 09:10, kirjutas Mark Woodward:
  Ühel kenal päeval, R, 2006-06-23 kell 17:27, kirjutas Bruce
 Momjian:
  Jonah H. Harris wrote:
   On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote:
What I see in this discussion is a huge amount of the grass must
 be
greener on the other side syndrome, and hardly any recognition
 that
every technique has its downsides and complications.
  
   I'm being totally objective.  I don't think we should abandon
   PostgreSQL's overall design at all, because we do perform INSERTs
 and
   DELETEs much better than most systems.  However, I've looked at
 many
   systems and how they implement UPDATE so that it is a scalable
   operation.  Sure, there are costs and benefits to each
 implementation,
   but I think we have some pretty brilliant people in this community
 and
   can come up with an elegant design for scalable UPDATEs.
 
  I think the UPDATE case is similar to the bitmap index scan or
 perhaps
  bitmap indexes on disk --- there are cases we know can not be handled
  well by our existing code, so we have added (or might add) these
  features to try to address those difficult cases.
 
  Not really. Bitmap index scan and bitmap index are both new additions
  working well with existing framework.
 
  While the problem of slowdown on frequent updates is real, the
 suggested
  fix is just plain wrong, as it is based on someones faulty assumption
 on
  how index lookup works, and very much simplified view of how different
  parts of the system work to implement MVCC.

 Yes, the suggestion was based on MVCC concepts, not a particular
 implementation.

 On the contrary - afaik, it was loosely based on how Oracle does it with
 its rollback segments, only assuming that rollback segments are kept in
 heap and that indexes point only to the oldest row version :p

Well, give me a little more credit than that. Yes, Oracle did play small
part in my thinking, but only in as much as they can't do it, why can't
we? The problem was how to get the most recent tuple to be more efficient
and not have tuples that will never be used impact performance without
excessive locking or moving data around.

It was a just a quick idea. Bruce's solution, you have to admit, is
somewhat similar.


  The original fix he suggests was to that imagined behaviour and thus
  ignored all the real problems of such change.

 The original suggestion, was nothing more than a hypothetical for the
 purpose of discussion.

 The problem was the steady degradation of performance on frequent
 updates.
 That was the point of discussion.  I brought up one possible way to
 start a brain storm. The discussion then morphed into critisizing the
 example and not addressing the problem.

 The problem is heatedly discussed every 3-4 months.

And yet, here we are again.


 Anyway, I think some decent discussion about the problem did happen, and
 that is good.

 Agreed.

 Maybe this _was_ the best way to bring up the discussion again.

I have a way, for better or worse, I guess, of stirring up the pot. :-)

Cry as we may about MySQL, but I have a sneaking suspicion that this is
one of the issues that puts PostgreSQL at a serious disadvantage.

While heavily updated rows are a single type of problem, these days I
think *most* database deployments are as back-ends for web sites. This
problem is *very* critical to that type of application, consequently
probably why PostgreSQL has difficulty in that space.

If PostgreSQL can be made *not* to suffer performance degradation on
heavily updated rows, then that is realy the last issue in the way of it
being a completely creadible medium to large enterprise back end. This
combined with its amazing pragramability, should make it unstoppable.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Mark Woodward
 On Fri, Jun 23, 2006 at 06:37:01AM -0400, Mark Woodward wrote:
 While we all know session data is, at best, ephemeral, people still want
 some sort of persistence, thus, you need a database. For mcache I have a
 couple plugins that have a wide range of opitions, from read/write at
 startup and shut down, to full write through cache to a database.

 In general, my clients don't want this, they want the database to store
 their data. When you try to explain to them that a database may not be
 the
 right place to store this data, they ask why, sadly they have little
 hope
 of understanding the nuances and remain unconvinced.

 Have you done any benchmarking between a site using mcache and one not?
 I'll bet there's a huge difference, which translates into hardware $$.
 That's something managers can understand.


Last benchmark I did was on a pure data level, a couple years ago,
PostgreSQL could handle about 800 session transactions a second, but
degraded over time, MCache was up about 7500 session transactions a second
and held steady. I should dig up that code and make it available on my
site.

I have a couple users that tell me that their sites couldn't work without
it, not even with MySQL.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] refcount leak warnings

2006-06-27 Thread Thomas Hallgren
I have a PL/Java user that performs some lengthy operations. Eventually, 
he get warnings like:


WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
flags=0x27, refcount=1 2)


I traced this to the function PrintBufferLeakWarning. AFAICS, it's only 
called from the function ResourceOwnerReleaseInternal under the 
following comment:


* During a commit, there shouldn't be any remaining pins --- that
* would indicate failure to clean up the executor correctly --- so
* issue warnings.In the abort case, just clean up quietly.

I have no idea where to go from here. What should I look for when trying 
to find the cause of such warnings?


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Non-transactional pg_class, try 2

2006-06-27 Thread Tom Lane
Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:
 That was with a mind to performance. Checking every INSERT, 
 UPDATE and DELETE statement to see if they are being done 
 against a frozen table seems like a waste.

 I'd think we would have relminxid in the relcache, so I don't buy the
 performance argument :-)

Me either.  Further, auto-revoking permissions loses information.
I think that idea is an ugly kluge.

Anyway, the bottom line here seems to be that we should forget about
pg_class_nt and just keep the info in pg_class; there's not sufficient
justification to build the infrastructure needed for a nontransactional
auxiliary catalog.  This implies the following conclusions:

* template0 has to be vacuumed against wraparound, same as any other
database.

* To support frozen tables, VACUUM FREEZE and ALTER TABLE UNFREEZE
would need to be explicit commands taking ExclusiveLock, and can't be
nested inside transaction blocks either.  Automatic unfreeze upon an
updating command isn't possible.

Neither of these are bad enough to justify pg_class_nt --- in fact,
I'd argue that explicit unfreeze is better than automatic anyway.
So it was a cute idea, but its time hasn't come.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] refcount leak warnings

2006-06-27 Thread Martijn van Oosterhout
On Tue, Jun 27, 2006 at 03:55:06PM +0200, Thomas Hallgren wrote:
 I have a PL/Java user that performs some lengthy operations. Eventually, 
 he get warnings like:
 
 WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
 flags=0x27, refcount=1 2)

I think the comment about failing to clean up is correct. AIUI, if
you've done a heap_open without a heap_close, or an index_open without
an index_close, you'll get warnings like this. Maybe using SPI without
clearing everything.

Obviously something didn't get cleaned up somewhere, but what? Maybe
identifying the rel and looking at that block might help identify the
issue.

Have a ncie day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] refcount leak warnings

2006-06-27 Thread Tom Lane
Thomas Hallgren [EMAIL PROTECTED] writes:
 I have a PL/Java user that performs some lengthy operations. Eventually, 
 he get warnings like:

 WARNING: buffer refcount leak: [779] (rel=1663/16440/52989, blockNum=3, 
 flags=0x27, refcount=1 2)

Look for ReadBuffer calls not matched by ReleaseBuffer.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Martijn van Oosterhout
On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
  Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
  have to change all the users of that (which aren't many, I suppose).
  It's probably worth making a utility function to expand them.
  
  I'm still confused where bitmap index scan fit into all of this. Is
  preserving the sequential scan aspect of these a goal with this new
  setup?
 
 No.  I was just pointing out that if you get to the tuple via an index,
 you get handed the head of the SITC via the index tuple, but if you are
 doing a sequential scan, you don't get it, so you have to find it, or
 any other non-visible SITC header.

Ok, but it remains true that you can only have one SITC per tuple. So
if you have 5 indexes on a table, any SITC will only join tuples that
didn't change any values in any of the indexed columns. That's probably
not a big deal though; indexes columns arn't likely to be the ones
changing much.

So, for the bitmap scan you have to make sure that within a single
transaction, scanning multiple indexes will have to provide the same
SITC for each set of tuples, even in the face of concurrent updates.
Otherwise the BitmapAnd will incorrectly throw them out.

That should be doable, if you only change the head of the SITC on
VACUUM. I'm not sure if that's what's being suggested right now.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


[HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
Hi,

I see a performance issue on win32. This problem is causes by the
following URL. 

http://support.microsoft.com/kb/823764/EN-US/

On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
8192 too.

pqcomm.c:117
  #define PQ_BUFFER_SIZE 8192

send() may take as long as 200ms. So, I think we should increase
SO_SNDBUF to more than 8192. I attache the patch.

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]
Index: pqcomm.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
retrieving revision 1.184
diff -c -r1.184 pqcomm.c
*** pqcomm.c5 Mar 2006 15:58:27 -   1.184
--- pqcomm.c27 Jun 2006 15:17:18 -
***
*** 593,598 
--- 593,608 
return STATUS_ERROR;
}
  
+ #ifdef WIN32
+   on = PQ_BUFFER_SIZE * 2;
+   if (setsockopt(port-sock, SOL_SOCKET, SO_SNDBUF,
+  (char *) on, sizeof(on))  0)
+   {
+   elog(LOG, setsockopt(SO_SNDBUF) failed: %m);
+   return STATUS_ERROR;
+   }
+ #endif
+ 
/*
 * Also apply the current keepalive parameters.  If we fail to 
set a
 * parameter, don't error out, because these aren't universally

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Yoshiyuki Asaba [EMAIL PROTECTED] writes:
 send() may take as long as 200ms. So, I think we should increase
 SO_SNDBUF to more than 8192. I attache the patch.

Why would that help?  We won't be sending more than 8K at a time.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
 Hi,
 
 I see a performance issue on win32. This problem is causes by the
 following URL. 
 
 http://support.microsoft.com/kb/823764/EN-US/
 
 On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
 8192 too.

Ok, so there's a difficiency in Windows TCP code. Do you have any
benchmarks to show this actually makes a difference. According to the
URL you give, the problem occurs if the libpq buffer is *bigger* than
the socket buffer, which it isn't...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Tom Lane [EMAIL PROTECTED]
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Tue, 27 Jun 2006 11:30:56 -0400

 Yoshiyuki Asaba [EMAIL PROTECTED] writes:
  send() may take as long as 200ms. So, I think we should increase
  SO_SNDBUF to more than 8192. I attache the patch.
 
 Why would that help?  We won't be sending more than 8K at a time.

MSDN is,

  Method2: Make the Socket Send Buffer Size Larger Than the Program
   Send Buffer Size

Modify the send call or the WSASend call to specify a buffer size
at least 1 byte smaller than the SO_SNDBUF value.

--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Andrew Dunstan

Martijn van Oosterhout wrote:


On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
 


Hi,

I see a performance issue on win32. This problem is causes by the
following URL. 


http://support.microsoft.com/kb/823764/EN-US/

On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
8192 too.
   



Ok, so there's a difficiency in Windows TCP code. Do you have any
benchmarks to show this actually makes a difference. According to the
URL you give, the problem occurs if the libpq buffer is *bigger* than
the socket buffer, which it isn't...
 



No, it says it occurs if this condition is met: A single *send* call or 
*WSASend* call fills the whole underlying socket send buffer.


This will surely be true if the buffer sizes are the same. They 
recommend making the socket buffer at least 1 byte bigger.


cheers

andrew



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Rocco Altier
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Martijn van Oosterhout
 On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
  Hi,
  
  I see a performance issue on win32. This problem is causes by the
  following URL. 
  
  http://support.microsoft.com/kb/823764/EN-US/
  
  On win32, default SO_SNDBUF value is 8192 bytes. And 
 libpq's buffer is
  8192 too.
 
 Ok, so there's a difficiency in Windows TCP code. Do you have any
 benchmarks to show this actually makes a difference. According to the
 URL you give, the problem occurs if the libpq buffer is *bigger* than
 the socket buffer, which it isn't...
 
The article also says there is a problem if they are the same size.

-rocco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Martijn van Oosterhout
On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote:
 No, it says it occurs if this condition is met: A single *send* call or 
 *WSASend* call fills the whole underlying socket send buffer.
 
 This will surely be true if the buffer sizes are the same. They 
 recommend making the socket buffer at least 1 byte bigger.

Ok, but even then, are there any benchmarks to show it makes a
difference. The articles suggests there should be but it would be nice
to see how much difference it makes...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Hannu Krosing wrote:
 ?hel kenal p?eval, T, 2006-06-27 kell 10:38, kirjutas Hannu Krosing:
  ?hel kenal p?eval, E, 2006-06-26 kell 23:08, kirjutas Bruce Momjian:
   Jim C. Nasby wrote:
On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
 
 It is certainly possible to do what you are suggesting, that is have 
 two
 index entries point to same chain head, and have the index access
 routines figure out if the index qualifications still hold, but that
 seems like a lot of overhead.
  
  I think Jim meant not 2 pointing to the same head, but 2 pointing into
  the same chain. Say we have table with (id serial, ts timestamp) where
  ts changes at each update and id does not.
  
  So after 3 updates on one page we have one CITC/ITPC head with pointers
  from both indexes and two follow-up tuples with pointers from only ts
  index.
  
  The problem with this setup is, that we can't reuse any of those
  follow-up tuples without index cleanup.
 
 But we still have to think about similar cases (index entries pointing
 inside CITC chains), unless we plan to disallow adding indexes to
 tables.

CREATE INDEX has to undo any chains where the new indexed columns change
in the chain, and add index entries to remove the chain.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
PFC wrote:
 
  My idea is that if an UPDATE places the new tuple on the same page as
  the old tuple, it will not create new index entries for any indexes
  where the key doesn't change.
 
   Basically the idea behind preventing index bloat by updates is to have  
 one index tuple point to several actual tuples having the same value.
   

The idea is not to avoid index bloat, but to allow heap reuse, and having
one index entry for multiple versions of an UPDATEd row is merely an
implementation detail.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Jim C. Nasby
On Mon, Jun 26, 2006 at 11:08:24PM -0400, Bruce Momjian wrote:
 Jim C. Nasby wrote:
  On Mon, Jun 26, 2006 at 02:32:59PM -0400, Bruce Momjian wrote:
   
   It is certainly possible to do what you are suggesting, that is have two
   index entries point to same chain head, and have the index access
   routines figure out if the index qualifications still hold, but that
   seems like a lot of overhead.
   
   Also, once there is only one visible row in the chain, removing old
   index entries seems quite complex because you have to have vacuum keep
   the qualifications of each row to figure out which index tuple is the
   valid one (seems messy).
   
  Perhaps my point got lost... in the case where no index keys change
  during an update, SITC seems superior in every way to my proposal. My
  idea (let's call it Index Tuple Page Consolidation, ITPC) would be
  beneficial to UPDATEs that modify one or more index keys but still put
  the tuple on the same page. Where SITC would be most useful for tables
  that have a very heavy update rate and very few indexes, ITPC would
  benefit tables that have more indexes on them; where presumably it's
  much more likely for UPDATEs to change at least one index key (which
  means SITC goes out the window, if I understand it correctly). If I'm
  missing something and SITC can in fact deal with some index keys
  changing during an UPDATE, then I see no reason for ITPC.
 
 I understood what you had said.  The question is whether we want to get
 that complex with this feature, and if there are enough use cases
 (UPDATE with index keys changing) to warrant it.

Ideas on how to test a table to see how many tuples would fit this
criteria?

Or we could just shelve ITPC as a possibility in the future, after we
see how much the limitations of SITC hurt.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
 On Mon, Jun 26, 2006 at 11:29:27AM -0400, Bruce Momjian wrote:
   Yes, and for index_getmulti (which doesn't visit the heap at all) we'll
   have to change all the users of that (which aren't many, I suppose).
   It's probably worth making a utility function to expand them.
   
   I'm still confused where bitmap index scan fit into all of this. Is
   preserving the sequential scan aspect of these a goal with this new
   setup?
  
  No.  I was just pointing out that if you get to the tuple via an index,
  you get handed the head of the SITC via the index tuple, but if you are
  doing a sequential scan, you don't get it, so you have to find it, or
  any other non-visible SITC header.
 
 Ok, but it remains true that you can only have one SITC per tuple. So
 if you have 5 indexes on a table, any SITC will only join tuples that
 didn't change any values in any of the indexed columns. That's probably
 not a big deal though; indexes columns arn't likely to be the ones
 changing much.

Right.

 So, for the bitmap scan you have to make sure that within a single
 transaction, scanning multiple indexes will have to provide the same
 SITC for each set of tuples, even in the face of concurrent updates.
 Otherwise the BitmapAnd will incorrectly throw them out.

The index points to the item id on the page, and that never changes,
even if the head tuple changes later.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Jim C. Nasby wrote:
   Perhaps my point got lost... in the case where no index keys change
   during an update, SITC seems superior in every way to my proposal. My
   idea (let's call it Index Tuple Page Consolidation, ITPC) would be
   beneficial to UPDATEs that modify one or more index keys but still put
   the tuple on the same page. Where SITC would be most useful for tables
   that have a very heavy update rate and very few indexes, ITPC would
   benefit tables that have more indexes on them; where presumably it's
   much more likely for UPDATEs to change at least one index key (which
   means SITC goes out the window, if I understand it correctly). If I'm
   missing something and SITC can in fact deal with some index keys
   changing during an UPDATE, then I see no reason for ITPC.
  
  I understood what you had said.  The question is whether we want to get
  that complex with this feature, and if there are enough use cases
  (UPDATE with index keys changing) to warrant it.
 
 Ideas on how to test a table to see how many tuples would fit this
 criteria?
 
 Or we could just shelve ITPC as a possibility in the future, after we
 see how much the limitations of SITC hurt.

Probably.  I am not sure even SITC is a win given the complexity it will
add, but I think it is worth trying.  Getting into more complex cases
where chains change indexed values seems like something we could try
later if we have to.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject SetString]

2006-06-27 Thread Thomas Hallgren
There's an inconsistency between the handling of trailing whitespace in 
query parameters in the client jdbc driver compared to the PL/Java SPI 
based driver. According to Jean-Pierre, the former apparently trims the 
trailing spaces before passing the query (see below). What is the 
correct behavior?


Regards,
Thomas Hallgren


 Original Message 
Subject: 	Re: [Pljava-dev] char with trailing space, 
PreparedStatement.setObject  SetString

Date:   Tue, 27 Jun 2006 12:07:19 -0400
From:   JEAN-PIERRE PELLETIER [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]



Hi Thomas,

There are very few char columns on my system, I can easily live with 
explicit trim

in my application code, I only wanted to let you guys know.

I am not sure which of the two JDBC implementations is right.
psql and pgadmin would both handle char as expected.
As for JDBC, you might want to know how other dbms are handling this.

Thanks for your reply.
Jean-Pierre Pelletier


From: Thomas Hallgren [EMAIL PROTECTED]
To: JEAN-PIERRE PELLETIER [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: [Pljava-dev] char with trailing 
space,	PreparedStatement.setObject  SetString

Date: Tue, 27 Jun 2006 17:47:24 +0200

Hi Jean-Pierre,
I'm not sure this is incorrect behavior. There's nothing in the spec that 
indicates that String values should be trimmed by setString and setObject. 
On the contrary. Some datatypes (the CHAR in particular) are sensitive to 
whitespace according to the SQL standard. Perhaps the client jdbc driver is 
doing something wrong here?


Regards,
Thomas Hallgren


JEAN-PIERRE PELLETIER wrote:

Hi,

Trailing space are not handled properly by setObject  setString.

PreparedStatement pstmt = connection.prepareStatement(
   select * from mytable where mycharcolumn = ?);

String myString = abc ;
pstmt.setObject(1, myString); // or setObject(1, myString, Types.CHAR) or 
setString(1, myString)


No rows are returned, but using trim works fine as in:
pstmt.setObject(1, myString.trim());

My environment is Pl/Java 1.3, Sun JDK 1.5.07,  PostgreSQL 8.1.4, Windows 
XP SP2


With PostgreSQL own (non pl/java) jdbc driver, setObject on char works 
fine without the trim.


Thanks,
Jean-Pierre Pelletier


___
Pljava-dev mailing list
[EMAIL PROTECTED]
http://gborg.postgresql.org/mailman/listinfo/pljava-dev





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Martijn van Oosterhout wrote:
 On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
 http://support.microsoft.com/kb/823764/EN-US/

 No, it says it occurs if this condition is met: A single *send* call or 
 *WSASend* call fills the whole underlying socket send buffer.

It also says that the condition only occurs if the program uses
non-blocking sockets ... which the backend does not.  So this page
offers no support for the proposed patch.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Jim C. Nasby
On Tue, Jun 27, 2006 at 10:42:54AM +0200, PFC wrote:
   Also, I insist (again) that there is a lot to gain by using a bit of 
 compression on the data pages, even if it's very simple compression like  
 storing the new version of a row as a difference from the previous version  
 (ie. only store the columns that changed).
   I think DB2 stores the latest version entirely, and stores the 
   previous  versions as a delta. This is more efficient.
 
This would only help on tables that:

have many columns[1]
are frequently updated
the updates normally touch few columns

[1] I'm assuming that un-changed toasted fields keep the same pointer

I'm doubtful that that case is common enough to warrant the amount of
work that would be involved in doing this. I think it might be useful to
consider ways to make vertical partitioning easier, since that's the
common means to reduce the impact of these scenarios.

   In the case of tables containing TEXT values, these could also get  
 TOASTed. When an update does not modify the TOASTed columns, it would be  
 nice to simply be able to keep the reference to the TOASTed data instead  
 of decompressing it and recompressing it. Or is it already the case ?

Hopefully it is, but I'm not sure... something that would be good is a
means to force fields to be toasted sooner than when the tuple is bigger
than 2k, because that'd be a very easy way to get gains from vertical
partitioning.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Martijn van Oosterhout kleptog@svana.org
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32?
Date: Tue, 27 Jun 2006 18:13:18 +0200

 On Tue, Jun 27, 2006 at 11:45:53AM -0400, Andrew Dunstan wrote:
  No, it says it occurs if this condition is met: A single *send* call or 
  *WSASend* call fills the whole underlying socket send buffer.
  
  This will surely be true if the buffer sizes are the same. They 
  recommend making the socket buffer at least 1 byte bigger.
 
 Ok, but even then, are there any benchmarks to show it makes a
 difference. The articles suggests there should be but it would be nice
 to see how much difference it makes...

I see the problem in this environment.

* client
  - Windows XP
  - using ODBC driver

* server
  - Windows XP
  - 8.1.4

* query time
  - original - about 12sec.
  - patch version - about 3sec.

However, this problem did not occur when I changed a client
machine...

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Jim C. Nasby
On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote:
 Jim,
 
 On 6/26/06 8:15 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:
 
  On a somewhat related note, I think that it would be advantageous if the
  FSM had a means to prefer certain pages for a given tuple over other
  pages. This would allow for a better way to keep heap and possibly index
  data more compacted, and it would also be a means of keeping tables
  loosely clustered. It would also make it far easier to shrink heaps that
  have become bloated, because the FSM could be told to favor pages at the
  beginning of the relation.
 
 Interesting idea - page affinity implemented using the FSM.
 
 WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
 Teradata implemented a hashing filesystem for their heap storage and I've
 always wondered about how they handle collision and chaining efficiently,
 but it's a solved problem for sure - knowing that makes the challenge that
 much easier :-)

I know there were discussions in the past, though as per usual I can't
find them in the archives. At one point I had suggested clustering not
on a row level, but on a page level, since it doesn't really matter
terribly if the tuples in a page are clustered (worst case you can scan
the entire page).

I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item (since
items will need to move to maintain an IOT).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Yoshiyuki Asaba
From: Tom Lane [EMAIL PROTECTED]
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Tue, 27 Jun 2006 12:28:35 -0400

 Andrew Dunstan [EMAIL PROTECTED] writes:
  Martijn van Oosterhout wrote:
  On Wed, Jun 28, 2006 at 12:23:13AM +0900, Yoshiyuki Asaba wrote:
  http://support.microsoft.com/kb/823764/EN-US/
 
  No, it says it occurs if this condition is met: A single *send* call or 
  *WSASend* call fills the whole underlying socket send buffer.
 
 It also says that the condition only occurs if the program uses
 non-blocking sockets ... which the backend does not.  So this page
 offers no support for the proposed patch.

WSAEventSelect() sets a socket to nonblocking mode.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcecomm5/html/wce50lrfWSAEventSelect.asp

pgwin32_send() calls pgwin32_waitforsinglesocket() before
WSASend(). And pgwin32_waitforsinglesocket() calls WSAEventSelect().

Regards,
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Csaba Nagy
 I think one of the issues might have been: how will you handle other
 indexes on the table when you can no longer point them at an item (since
 items will need to move to maintain an IOT).

I guess you shouldn't allow any other indexes. That's a perfectly
acceptable compromise I think... it would be still very useful for big
and narrow tables which would benefit from being clustered.

The other concern is how would you do sequential scans on the table if
items are allowed to move ? I think some other DBs have a facility to
make a fast index scan which is essentially a sequential scan of the
index file, something like that would be needed here too.

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [Fwd: Re: [Pljava-dev] char with trailing space, PreparedStatement.setObject

2006-06-27 Thread Thomas Hallgren

Sorry, wrong list... I reposted this on pgsql-jdbc instead.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Yoshiyuki Asaba [EMAIL PROTECTED] writes:
 From: Tom Lane [EMAIL PROTECTED]
 It also says that the condition only occurs if the program uses
 non-blocking sockets ... which the backend does not.  So this page
 offers no support for the proposed patch.

 WSAEventSelect() sets a socket to nonblocking mode.

Yeah, but that socket is only used for inter-backend signaling with
small (1 byte, I think) messages.  The socket used for communication
with the frontend is not in nonblocking mode, unless I'm totally
confused.

Have you actually measured any performance benefit from this patch,
and if so what was the test case?  I'm not opposed to the patch if it
does something useful, but the info currently available does not suggest
that it will help.

What I would think might help is a patch on the libpq side (because it
*does* use a nonblocking socket) to avoid sending more than 8K per
WSASend call.  The effect would just be to break a long send into a
series of shorter sends, which wouldn't really do anything useful on a
well-designed TCP stack, but then this is Windows we're talking about...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Table clustering idea

2006-06-27 Thread J. Andrew Rogers


On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote:

I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item  
(since

items will need to move to maintain an IOT).



There are clean ways to handle this.  The table is organized on the  
primary key, a typical requirement for IOTs.  Any indexes you add to  
IOT reference the primary key of the heap tuple.  Since the heap and  
PK index are the same thing, external indexes use the PK as the tuple  
identifier.


The only caveat is that this creates performance asymmetries.  IOTs  
have significantly faster access through their primary keys but  
slower external index access since two B-Trees have to be traversed.   
An IOT is typically only used for tables that are only accessed  
through their primary key.  Not supporting external indexes on IOTs  
is a functional implementation (and probably recommended in  
practice), though most real implementations allow external indexes if  
not always in their first version.



J. Andrew Rogers
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Magnus Hagander
  From: Tom Lane [EMAIL PROTECTED]
  It also says that the condition only occurs if the program uses 
  non-blocking sockets ... which the backend does not.  So this page 
  offers no support for the proposed patch.
 
  WSAEventSelect() sets a socket to nonblocking mode.
 
 Yeah, but that socket is only used for inter-backend 
 signaling with small (1 byte, I think) messages.  The socket 
 used for communication with the frontend is not in 
 nonblocking mode, unless I'm totally confused.

For once, I beleive you are :-)
We use non-blocking sockets in backend/port/win32/socket.c so we are
able to deliver our faked signals while waiting for I/O on the socket.
We specifically set it in pgwin32_socket().

Given that, it might be a good idea to actually put the code there
instead, to localise it. With a comment and a reference to that Q
article.


 Have you actually measured any performance benefit from this 
 patch, and if so what was the test case?  I'm not opposed to 
 the patch if it does something useful, but the info currently 
 available does not suggest that it will help.

We have definitly seen weird timing issues sometimes when both client
and server were on Windows, but have been unable to pin it exactly on
what. From Yoshiykis other mail it looks like this could possibly be it,
since he did experience a speedup in the range we've been looking for in
those cases.


 What I would think might help is a patch on the libpq side (because it
 *does* use a nonblocking socket) to avoid sending more than 
 8K per WSASend call.  The effect would just be to break a 
 long send into a series of shorter sends, which wouldn't 
 really do anything useful on a well-designed TCP stack, but 
 then this is Windows we're talking about...

It could definitly be a good idea to have a patch there *as well*, but I
think they'd both be affected.

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Tom Lane
I've been digging into why buildfarm member thrush has been dumping core
consistently during the regression tests since the posix_fadvise patch
went in.  I've confirmed that posix_fadvise() itself will SIGSEGV in a
standalone test program, and found that this happens only if
_FILE_OFFSET_BITS=64 ... which is our default configuration on Linux.

Some googling turned up the following 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219
which basically says that posix_fadvise64 + 2.4 kernel + older glibc
= crash.  It sounds like the 2.4 kernel hasn't got this call but glibc
thought it did, up till about a year ago.

While we could possibly come up with a suitable configure test to
determine whether posix_fadvise is actually safe to use on a given
system, I think we should seriously consider just reverting the patch.
As far as I saw, zero evidence was given that it actually does anything
measurable.  Without a benchmark to prove that it's worth spending more
time on, I'm disinclined to trouble over it.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Greg Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 PFC wrote:
  
   My idea is that if an UPDATE places the new tuple on the same page as
   the old tuple, it will not create new index entries for any indexes
   where the key doesn't change.
  
  Basically the idea behind preventing index bloat by updates is to have  
  one index tuple point to several actual tuples having the same value.
  
 
 The idea is not to avoid index bloat, but to allow heap reuse, and having
 one index entry for multiple versions of an UPDATEd row is merely an
 implementation detail.

It sort of sounds like you're describing a whole new index type that stores
only the page, not the precise record of any tuple it indexes. If your table
has only such indexes then you never need to worry about updating indexes if
your new tuple version goes on the same page as the old one.

It's an interesting thought experiment. It might trade off a lot of work in
index maintenance as well as saving space in the index for a lot of additional
work performing index scans. There can easily be enough tuples on a page to
make scanning the entire page pretty costly.


-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Mark Woodward

I would set the SO_SNDBUF to 32768.

 Hi,

 I see a performance issue on win32. This problem is causes by the
 following URL.

 http://support.microsoft.com/kb/823764/EN-US/

 On win32, default SO_SNDBUF value is 8192 bytes. And libpq's buffer is
 8192 too.

 pqcomm.c:117
   #define PQ_BUFFER_SIZE 8192

 send() may take as long as 200ms. So, I think we should increase
 SO_SNDBUF to more than 8192. I attache the patch.

 Regards,
 --
 Yoshiyuki Asaba
 [EMAIL PROTECTED]
 Index: pqcomm.c
 ===
 RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
 retrieving revision 1.184
 diff -c -r1.184 pqcomm.c
 *** pqcomm.c  5 Mar 2006 15:58:27 -   1.184
 --- pqcomm.c  27 Jun 2006 15:17:18 -
 ***
 *** 593,598 
 --- 593,608 
   return STATUS_ERROR;
   }

 + #ifdef WIN32
 + on = PQ_BUFFER_SIZE * 2;
 + if (setsockopt(port-sock, SOL_SOCKET, SO_SNDBUF,
 +(char *) on, sizeof(on))  0)
 + {
 + elog(LOG, setsockopt(SO_SNDBUF) failed: %m);
 + return STATUS_ERROR;
 + }
 + #endif
 +
   /*
* Also apply the current keepalive parameters.  If we fail to 
 set a
* parameter, don't error out, because these aren't universally

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [COMMITTERS] pgsql: Disallow changing/dropping default

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Wasn't this patch rejected?
 
  Anyway, what is your opinion on this?
 
 I thought we'd rejected it.  I'm not sure that we'd completely agreed
 what the best thing to do is, but what this patch actually does is to
 silently remove the dependency link.  That is, after
 
   create table t1 (f1 serial);
   alter table t1 alter column f1 drop default;
 
 t1_f1_seq is still there, but now completely unconnected to t1.
 That doesn't seem to me to satisfy the principle of least surprise.
 It's certainly not what the TODO item says (reject the DROP DEFAULT).
 I think we were considering the alternative of having the DROP DEFAULT
 remove the sequence, which probably could be implemented painlessly
 with a change in the way we set up the dependency links to start with.
 
 In any case I don't like this patch: int/bool confusion, use of elog
 instead of ereport for a user-facing error message, failure to adhere to
 style guidelines for that message, etc.  (Although seeing that the error
 message is unreachable code, maybe that doesn't matter ;-))  Aside from
 the poor coding style, the whole idea of reaching into pg_depend to
 remove a single dependency strikes me as a brute-force solution to
 a problem that should have a more elegant answer.

Agreed, patch reverted.  Thanks for the analysis.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-27 Thread Bruce Momjian
Greg Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  PFC wrote:
   
My idea is that if an UPDATE places the new tuple on the same page as
the old tuple, it will not create new index entries for any indexes
where the key doesn't change.
   
 Basically the idea behind preventing index bloat by updates is to have  
   one index tuple point to several actual tuples having the same value.
 
  
  The idea is not to avoid index bloat, but to allow heap reuse, and having
  one index entry for multiple versions of an UPDATEd row is merely an
  implementation detail.
 
 It sort of sounds like you're describing a whole new index type that stores
 only the page, not the precise record of any tuple it indexes. If your table

Background, indexes point to page item pointers, not to actual offsets
in the page.  This is how vacuum can move around tuples without modifying the
indexes.  The index points to a page item pointer that is a chain of
tuples with the same indexed columns.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
 I've been digging into why buildfarm member thrush has been dumping core
 consistently during the regression tests since the posix_fadvise patch
 went in.  I've confirmed that posix_fadvise() itself will SIGSEGV in a
 standalone test program, and found that this happens only if
 _FILE_OFFSET_BITS=64 ... which is our default configuration on Linux.
 
 Some googling turned up the following 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219
 which basically says that posix_fadvise64 + 2.4 kernel + older glibc
 = crash.  It sounds like the 2.4 kernel hasn't got this call but glibc
 thought it did, up till about a year ago.
 
 While we could possibly come up with a suitable configure test to
 determine whether posix_fadvise is actually safe to use on a given
 system, I think we should seriously consider just reverting the patch.
 As far as I saw, zero evidence was given that it actually does anything
 measurable.  Without a benchmark to prove that it's worth spending more
 time on, I'm disinclined to trouble over it.

Agreed.  How about if we just #ifdef NOT_USED the code and mention the
problem in a comment.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 We use non-blocking sockets in backend/port/win32/socket.c so we are
 able to deliver our faked signals while waiting for I/O on the socket.
 We specifically set it in pgwin32_socket().

Hm, that seems a bit grotty, but anyway I stand corrected.

 Given that, it might be a good idea to actually put the code there
 instead, to localise it. With a comment and a reference to that Q
 article.

No, I think the patch has it in the right place, because pgwin32_socket
would have no defensible way of knowing what the max send size would be.
(Indeed, with a slightly different implementation in pqcomm.c, there
would not *be* any hard upper limit; the current code wastes cycles
copying data around, when with a large message it probably should just
send() directly from the message buffer...)

I agree it needs a comment though.

 What I would think might help is a patch on the libpq side (because it
 *does* use a nonblocking socket) to avoid sending more than 
 8K per WSASend call.

 It could definitly be a good idea to have a patch there *as well*, but I
 think they'd both be affected.

On the libpq side, sending large messages is probably rare except for
COPY IN mode.  Has anyone noticed performance issues specifically with
COPY IN?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-27 Thread Mark Woodward
 We have definitly seen weird timing issues sometimes when both client
 and server were on Windows, but have been unable to pin it exactly on
 what. From Yoshiykis other mail it looks like this could possibly be it,
 since he did experience a speedup in the range we've been looking for in
 those cases.


 What I would think might help is a patch on the libpq side (because it
 *does* use a nonblocking socket) to avoid sending more than
 8K per WSASend call.  The effect would just be to break a
 long send into a series of shorter sends, which wouldn't
 really do anything useful on a well-designed TCP stack, but
 then this is Windows we're talking about...

 It could definitly be a good idea to have a patch there *as well*, but I
 think they'd both be affected.

As I said earlier, I would boost the socket buffer to something larger
than merely 2x the packet size. I'd try for 32K (32768), that way we have
some space for additional buffers before we hit the problem. It is
presumed that we should have enough data in the socket buffer to at least
try to match the expected amount of data that would be sent while waiting
for the defered ACK.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] posix_fadvise versus old kernels

2006-06-27 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 While we could possibly come up with a suitable configure test to
 determine whether posix_fadvise is actually safe to use on a given
 system, I think we should seriously consider just reverting the patch.
 As far as I saw, zero evidence was given that it actually does anything
 measurable.  Without a benchmark to prove that it's worth spending more
 time on, I'm disinclined to trouble over it.

 Agreed.  How about if we just #ifdef NOT_USED the code and mention the
 problem in a comment.

Works for me; I'll write something and commit it.  We can leave the
is-posix_fadvise-declared configure test in place, at least for now ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Stefan Kaltenbrunner
Teodor Sigaev wrote:
 test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
 CREATE INDEX
 Time: 416122.896 ms

 so about 7 minutes - sounds very reasonable

 test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
 CREATE INDEX
 Time: 52681605.101 ms
 
 I'll look at this,  but GiST time creation is suspiciously small.
 Can you test on smaller table, for example with 10 records and if
 results are repeat, pls, send to me test suite...

I won't have access to the original testcase and server for a few days
but I just redid some testing on a slower personal box of mine with a
smaller(but similiar) testset and on that box I could not reproduce that
issue.
So the problem is either caused by the size of the table or somehow by
the data itself :-(


Stefan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Tom Lane
[EMAIL PROTECTED] (Bruce Momjian) writes:
 Add GUC update_process_title to control whether 'ps' display is updated
 for every command, default to on.

It strikes me that the ps_status support provides one important bit of
information that is currently hard to get elsewhere; specifically, the
waiting flag that gets added while blocked on a lock.  You can find
out if a process is blocked by looking in pg_locks, but that's a fairly
expensive probe in itself and then you have to join to pg_stat_activity
to make any sense of it.  I wonder if we should add a waiting boolean
column to pg_stat_activity?  Given the new implementation of
pg_stat_activity, updating such a flag would be pretty cheap.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:

2006-06-27 Thread Bruce Momjian
Tom Lane wrote:
 [EMAIL PROTECTED] (Bruce Momjian) writes:
  Add GUC update_process_title to control whether 'ps' display is updated
  for every command, default to on.
 
 It strikes me that the ps_status support provides one important bit of
 information that is currently hard to get elsewhere; specifically, the
 waiting flag that gets added while blocked on a lock.  You can find
 out if a process is blocked by looking in pg_locks, but that's a fairly
 expensive probe in itself and then you have to join to pg_stat_activity
 to make any sense of it.  I wonder if we should add a waiting boolean
 column to pg_stat_activity?  Given the new implementation of
 pg_stat_activity, updating such a flag would be pretty cheap.

Nice idea.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql:

2006-06-27 Thread Mark Kirkwood

Tom Lane wrote:

[EMAIL PROTECTED] (Bruce Momjian) writes:

Add GUC update_process_title to control whether 'ps' display is updated
for every command, default to on.


It strikes me that the ps_status support provides one important bit of
information that is currently hard to get elsewhere; specifically, the
waiting flag that gets added while blocked on a lock.  You can find
out if a process is blocked by looking in pg_locks, but that's a fairly
expensive probe in itself and then you have to join to pg_stat_activity
to make any sense of it.  I wonder if we should add a waiting boolean
column to pg_stat_activity?  Given the new implementation of
pg_stat_activity, updating such a flag would be pretty cheap.




Funny - today I was just thinking how useful that would be!

Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 [EMAIL PROTECTED] (Bruce Momjian) writes:
  Add GUC update_process_title to control whether 'ps' display is updated
  for every command, default to on.
 
 It strikes me that the ps_status support provides one important bit of
 information that is currently hard to get elsewhere; specifically, the
 waiting flag that gets added while blocked on a lock.  You can find
 out if a process is blocked by looking in pg_locks, but that's a fairly
 expensive probe in itself and then you have to join to pg_stat_activity
 to make any sense of it.  I wonder if we should add a waiting boolean
 column to pg_stat_activity?  Given the new implementation of
 pg_stat_activity, updating such a flag would be pretty cheap.

That would be an *excellent* addition..  Honestly, I think it'd be nice
to get a 'NOTICE' in such cases too, but having it in pg_stat_activity
will help alot.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fully replacing ps_status (was Re: [COMMITTERS] pgsql: Add GUC update_process_title to control whether 'ps' display is)

2006-06-27 Thread Stephen Frost
* Stephen Frost ([EMAIL PROTECTED]) wrote:
 That would be an *excellent* addition..  Honestly, I think it'd be nice
 to get a 'NOTICE' in such cases too, but having it in pg_stat_activity
 will help alot.

Additionally, Tom, and I hate to point this out here but I don't see
much of an alternative; your mail system blocked my mail and apparently 
would block any mail from theplanet.com (a rather large hosting company 
based in Texas).  Having been harrassed by their support folks for
running a couple rather large mailing lists (the MythTV ones at
mythtv.org) more than once I'm thinking this is a rather poor RBL.  I'd
encourage you to drop it in favor of something a bit more sane.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS]

2006-06-27 Thread tju tju
-- liuchao


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Josh Berkus
Jim,

 I know there were discussions in the past, though as per usual I can't
 find them in the archives.

Search on B-Tree Organized Tables.

From what I can find, this feature isn't prohibitively useless.  It's just a 
singnificant amount of effort for a result which is a tradeoff.   That is, 
you'd *only* want to use it on tables which are *always* accessed by their 
primary key.  

What stopped the features AFAICT is that the interested parties weren't up to 
doing the code.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Page format changes for 8.2?

2006-06-27 Thread Josh Berkus
Bruce,

Do we have anything in the pipeline that would result in page format changes 
for 8.2?  I'm wondering if it's worth reviving pg_upgrade, folks at work are 
interested ...

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Page format changes for 8.2?

2006-06-27 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 Do we have anything in the pipeline that would result in page format changes 
 for 8.2?

[ looks at
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h
]

I don't see any page format changes per se, but inet/cidr datatype
representation changed here:

2006-01-25 21:35  tgl

* doc/src/sgml/func.sgml, src/backend/optimizer/path/indxpath.c,
src/backend/utils/adt/network.c, src/include/catalog/catversion.h,
src/include/catalog/pg_amop.h, src/include/catalog/pg_cast.h,
src/include/catalog/pg_operator.h, src/include/catalog/pg_proc.h,
src/include/utils/builtins.h, src/include/utils/inet.h,
src/test/regress/expected/opr_sanity.out,
src/test/regress/sql/opr_sanity.sql: Clean up the INET-vs-CIDR
situation.  Get rid of the internal is_cidr flag and rely
exclusively on the SQL type system to tell the difference between
the types.  Prevent creation of invalid CIDR values via casting
from INET or set_masklen() --- both of these operations now
silently zero any bits to the right of the netmask.  Remove
duplicate CIDR comparison operators, letting the type rely on the
INET operators instead.

We've debated before how to get pg_upgrade to handle these sorts of
changes.  If you want to make it happen, step right up ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org