[HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
Forgot to include the group...



-- Forwarded message --
From: Gokulakannan Somasundaram [EMAIL PROTECTED]
Date: Nov 5, 2007 3:04 PM
Subject: Re: Clarification about HOT
To: Gregory Stark [EMAIL PROTECTED]



On 11/2/07, Gregory Stark [EMAIL PROTECTED] wrote:


 Gokulakannan Somasundaram [EMAIL PROTECTED] writes:

  Another reason is that it avoids the whole problem of updating multiple
  pages atomically, without deadlocks.
 
 
  Thanks Heikki.  I am still not getting what you said. In the case of
 HOT,
  you need to update the top pointer to point to some other tuple in some
  other page. That's one update. what's the other one?

 There are several problems, two that come to mind are:

 1) How do you make the dead top pointer redirect to the first live
 pointer in the chain? Currently we store the item number of the first
 live tuple in the line pointer. You would have to keep the tuple around,
 though you could truncate it to just the tuple header.

 2) When vacuuming, how do you find the head of the chain when you're
 looking at a dead tuple? There's no space in the tuple header to store
 the head of the chain. Besides, you want to vacuum scanning
 sequentially, not randomly.


Thanks for all the inputs. My question is: what if we decide to update the
top of the HOT chain in the index itself? Right now we carry a list of
tuple-ids to be vacuumed when we vacuum the index. Say we carry another
list (or use some better mechanism) with the corresponding live HOT tuple
to be pointed to. In other words, we would make the index point to the top
of the HOT chain during vacuum.

Am I making some mistake?

-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)




[HACKERS] Visibility map thoughts

2007-11-05 Thread Heikki Linnakangas
Though 8.3 isn't out of the oven just yet, I've been thinking about the 
dead space map a lot, and decided I have to start writing down those 
thoughts.


Reducing VACUUM time is important, but the real big promise is the 
ability to do index-only scans. Because that's the main focus of this 
exercise, I'm calling it the Visibility Map from now on, because 
it's not about tracking dead space, but tuple visibility in general. 
Don't worry, reduced VACUUM times on read-mostly tables with hot spots 
will still fall out of it.




What to store in the Visibility Map?
------------------------------------


Each potential user of the visibility map needs slightly different 
information:


a) VACUUM needs to know if a page has enough dead tuples that it's 
interesting to visit.
b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that 
can be frozen.
c) Index-only scans need to know if all tuples on a page are visible to 
all transactions. HOT updates are OK, though.


From this point on, this document concentrates on a map suited for 
index-only-scans.


The basic structure is a bitmap with one bit per heap page. A set bit 
means that all tuples on the heap page are visible to all transactions. A 
cleared bit means that we don't know whether that's true.


This kind of map is useful for VACUUM as well, though VACUUM could get 
away with less strict semantics, and we might not want to visit a page in 
VACUUM if there are only a few dead tuples on it. What this implementation 
gives us is a conservative VACUUM that will always find the same dead 
tuples as a regular VACUUM (except for HOT-updated tuples, which can be 
pruned without scanning indexes). A VACUUM using the visibility map can't 
update stats, so we'd still want regular vacuums every now and then, or 
rely on ANALYZE to do that.


It's not useful for VACUUM FREEZE, unless we're willing to freeze much 
more aggressively, and change the meaning of a set bit to "all tuples on 
the heap page are frozen".


Other kinds of visibility maps could be useful in some narrow scenarios. 
For example, if we had a map of pages that have no live tuples, we could 
skip those pages in sequential scans. We could also use more than one 
bit per page to store more information, but let's keep it simple for now.



Where to store the visibility map?
--
a) In a fixed size shared memory struct. Not acceptable.

b) In the special area of every nth heap page. I played a bit with this 
back in February 2006, because it's simple and requires few changes.


c) As a new relation kind, like toast tables.

d) As an auxiliary smgr relation, attached to the heap.

I'm leaning towards D at the moment. It requires some changes to the 
buffer manager API and elsewhere, but not too many.


B is not good because we might want to have other kinds of auxiliary 
files in the future (FSM in particular). And it is a bit hacky anyway. 
Let's do something more generic.


C doesn't seem like the right approach, because a visibility map doesn't 
really have much in common with real relations.


In any case, the bitmap is divided into pages, and the pages live in 
shared_buffers, and are managed by the buffer manager like any other page.


Updating the visibility map
---

A single page in the visibility map covers a wide range of heap pages 
((8192 - page header) * 8 = ~65000 heap pages). It can easily become a 
locking bottleneck if every update on any page in that range needs to 
lock the visibility map page.


Setting a bit is just a hint. It's OK to lose it on crash. However, 
setting a bit mustn't hit the disk too early. What might otherwise happen 
is that the change that made all tuples on a page visible, like 
committing an inserting transaction, isn't replayed after a crash, but 
the set bit is already on disk. In other words, setting a bit must set 
the LSN of the visibility map page.


Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's 
ok if it hits the disk before the operation that cleared it. Therefore 
we don't need to set the LSN, but it must be redone at WAL replay of 
heap insert/update/delete.


Because a concurrent update on the same word might be lost if we didn't 
lock the page at all, we need to lock the visibility map page to set or 
clear a bit. Since these are quick operations, especially clearing a 
bit, perhaps we could use just the buffer header spinlock.


To further reduce contention, we can have a copy of the bit in the page 
header of the heap page itself. That way we'd only need to access the 
visibility map on the first update on a page that clears a bit.


When to update the visibility map?
--
- Clear the bit on every update/insert/delete.
- Set the bit in vacuum or prune, if all tuples on the page are now 
visible to all transactions.
- If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't 
set the bit yet, but as soon as the 

Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

Thanks for all the inputs. My question would be if we decide to update the
top of the HOT chain in the Index itself. Right now we are carrying a list
of tuple-ids to be vacuumed, when we vacuum the index. Say we carry another
list (or through some better mechanism), which would carry the corresponding
live HOT tuple to be pointed. In other words we would try to make the index
point to the top of the HOT chain during Vacuum.


Yeah, we could do that. It was discussed in the spring, along with many 
other alternatives. Search the archives for "pointer swinging".


Basically, we decided we can live without it for now. It would've 
required quite a bit of changes, for not that much gain. We might still 
want it in the future if there's demand for it. If you really need to 
recover those 4 bytes per HOT chain, you can use VACUUM FULL, though it 
does take an exclusive lock on the table.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  Thanks for all the inputs. My question would be if we decide to update
  the top of the HOT chain in the Index itself. Right now we are carrying
  a list of tuple-ids to be vacuumed, when we vacuum the index. Say we
  carry another list (or through some better mechanism), which would
  carry the corresponding live HOT tuple to be pointed. In other words we
  would try to make the index point to the top of the HOT chain during
  Vacuum.

 Yeah, we could do that. It was discussed in Spring, along with many
 other alternatives. Search the archives for pointer swinging.

 Basically, we decided we can live without it for now. It would've
 required quite a bit of changes, for not that much gain. We might still
 want it in the future if there's demand for it. If you really need to
 recover those 4 bytes per HOT chain, you can use VACUUM FULL, though it
 does take an exclusive lock on the table.


I think pointer swinging is still about maintaining the HOT chain within a
page. Actually I am thinking about continuing the HOT chain across pages.
The advantages are obvious:

a) Indexes need not get updated unless their values are updated, even if
the inserts go into new pages.
b) A much smaller index footprint.

Maybe I am missing something in the big picture. Please correct me on that.



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

I think pointer swinging is still about maintaining the HOT chain within a
page. Actually I am thinking about continuing the HOT chain across pages.


AFAICS, pointer-swinging would be exactly the same on cross-page HOT 
chains as on same-page chains.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  I think pointer swinging is still about maintaining the HOT chain
  within a page. Actually I am thinking about continuing the HOT chain
  across pages.

 AFAICS, pointer-swinging would be exactly the same on cross-page HOT
 chains as on same-page chains.


When I read about pointer-swinging, it talks a lot about in-page updates,
pointing to the latest tuple instead of the oldest tuple, circular
pointers, etc. If I am missing the post you are referring to, please
correct me on the steps I am describing.

This scheme, if implemented, would avoid the use of the HOT stub and avoid
the need to go to the index, even if some other indexes are getting
updated, or even if the updated new tuple goes into a new page:

a) Whenever we update, we will update only those indexes whose attributes
are updated. So the index will point to the top of the HOT chain.
b) Whenever we vacuum the index, we take a list of tids and check whether
there are any index tuples pointing to them. If the vacuumed tuple is the
start of the HOT chain, then we will carry the next in-line HOT tuple when
we go to vacuum the index. If the next in-line tuple also satisfies the
vacuum, it will carry with it the next in-line HOT tuple-id.

In this way, we make sure the index points to a live tuple after the
vacuum.

This will remove the in-page pruning exercises, but as I said already, the
cost of updates will go down a lot with normal indexes.

Can you please tell me the disadvantages of the above-mentioned approach?


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

AFAICS, pointer-swinging would be exactly the same on cross-page HOT
chains as same-page chains.


When I read about pointer-swinging, it talks a lot about in-page updates,
pointing to the latest tuple instead of the oldest tuple, circular
pointers, etc. If I am missing the post you are referring to, please
correct me on the steps I am describing.


It talks about in-page updates etc. because we are only doing HOT 
updates within a page. The pointer-swinging would still work across page 
boundaries, AFAICS.



This scheme, if implemented would avoid the use of HOT stub and avoids the
need to go to index, even if some other indexes are getting updated/ even if
the updated new tuple goes into a new page


You'd still have the HOT stubs until the next VACUUM.


a) Whenever we update, we will update only those indexes whose attributes
are updated. So the index will point to the top of the HOT chain.


I don't see how pointer-swinging would solve the problem with updating 
just some indexes. On vacuum, you'd have to know which indexes were 
updated, and remove the old pointers in the ones that were updated, and 
pointer-swing others. You could store that information in the heap 
tuple, but that means more bloat in the heap, and more complexity.


We did talk about that back in spring as well, but all the suggestions 
were quite complex. Besides, as soon as you insert at least one new 
index pointer, you no longer can remove the dead HOT updated tuple 
without scanning at least that one index.



b) Whenever we Vacuum the index, we take a list of tids and check whether
there are any index tuples pointing to it. If the Vacuumed tuple is a
start of the HOT chain, then we will carry the next in-line HOT tuple
when we go to Vacuum the index. If the next in-line also satisfies the
Vacuum, it will carry with it the next in-line HOT tuple-id.


Sorry, I didn't understand that. But the way you described it earlier, 
it's exactly the same thing as the pointer-swinging we talked about in 
spring. Is it the same or not?



This will remove the in-page pruning exercises, but as I said already, the
cost of updates will go down a lot with normal indexes.


We don't want to get rid of the in-page pruning. It allows us to reclaim 
dead space without having to VACUUM. That's a major point of HOT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram

  b) Whenever we Vacuum the index, we take a list of tids and check
  whether there are any index tuples pointing to it. If the Vacuumed
  tuple is a start of the HOT chain, then we will carry the next in-line
  HOT tuple when we go to Vacuum the index. If the next in-line also
  satisfies the Vacuum, it will carry with it the next in-line HOT
  tuple-id.

 Sorry, I didn't understand that. But the way you described it earlier,
 it's exactly the same thing as the pointer-swinging we talked about in
 spring. Is it the same or not?


The only extra overhead is that we will need more memory during vacuum. We
currently call the tid_reaped / lazy_tid_reaped function, which does a
binary search to check whether the tid pointed to by the index is present
in its array/list. If it is present, it means that the tuple is ready for
vacuum. For HOT tuples, this list would also carry a replacement tid (the
next in-line HOT tuple). So instead of removing the index tuples, we would
update the tid part of the index tuples. There is no HOT stub here; the
index would point to the live HOT tuple (which would also be the head of
the chain).

Say the index was previously pointing to (3,4). When (3,4) becomes ready
to vacuum, it will supply a replacement tid, the one in its t_data, say
(5,6).

So once the vacuum identifies the correct index tuple, it will update the
tid portion of the index tuple to (5,6). Please advise me on whether I am
missing something or am unclear in the explanation.

I think I am very poor at understanding things the first time and also
very poor at putting across my point the first time. Please bear with
that :)


 This will remove the in-page pruning exercises, but as I said already,
 the cost of updates will go down a lot with normal indexes.

 We don't want to get rid of the in-page pruning. It allows us to reclaim
 dead space without having to VACUUM. That's a major point of HOT.


But we are going to make the index sizes very small and we are also going
to reduce the cost of updates. Isn't that a sufficient reason for us?



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] should I worry?

2007-11-05 Thread ohp
Hi Tom
On Sun, 4 Nov 2007, Tom Lane wrote:

 Date: Sun, 04 Nov 2007 19:47:04 -0500
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] should I worry?

 I wrote:
  Hmm, this is messier than I thought.  What evidently has happened is
  that at one time or another, one of the two tables involved in an FK
  relationship has been dropped and re-created.  If you'd had proper
  FK constraints the constraints would have gone away cleanly, but with
  these old trigger definitions there was no mechanism to make that
  happen, and so the triggers on the other table remained in place.

 That seems to have happened several times, in fact.  After tweaking
 ConvertTriggerToFK() to be more verbose and to insist on finding all
 three triggers, I get this:

 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_position(ppos_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY objet_position(pobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY objet_position(pobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  converting trigger group into constraint unnamed FOREIGN KEY 
 objet_position(pobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referenced table's UPDATE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_objets(perobj_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  converting trigger group into constraint unnamed FOREIGN KEY 
 perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod)
 DETAIL:  Found referenced table's UPDATE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  converting trigger group into constraint unnamed FOREIGN KEY 
 messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referenced table's UPDATE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_dest(dmsg_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  converting trigger group into constraint unnamed FOREIGN KEY 
 messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod)
 DETAIL:  Found referenced table's UPDATE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY messages_exp(emsg_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referencing table's trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_position(ppos_pos_cod) REFERENCES positions(pos_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY perso_position(ppos_pos_cod) REFERENCES positions(pos_cod)
 DETAIL:  Found referenced table's UPDATE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY objet_position(pobj_pos_cod) REFERENCES positions(pos_cod)
 DETAIL:  Found referenced table's DELETE trigger.
 NOTICE:  ignoring incomplete trigger group for constraint unnamed FOREIGN 
 KEY 

Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

b) Whenever we Vacuum the index, we take a list of tids and check whether
there are any index tuples pointing to it. If the Vacuumed tuple is a
start of the HOT chain, then we will carry the next in-line HOT tuple
when we go to Vacuum the index. If the next in-line also satisfies the
Vacuum, it will carry with it the next in-line HOT tuple-id.

Sorry, I didn't understand that. But the way you described it earlier,
it's exactly the same thing as the pointer-swinging we talked about in
spring. Is it the same or not?


The only extra overhead is that we will need more memory during Vacuum. We
are currently calling the tid_reaped / lazy_tid_reaped function. It does a
binary search to check whether the tid pointed to by the index is present
in its array/list. If it is present, it means that it is ready for Vacuum.
For HOT tuples, this list will carry a replacement tid (the next in-line
HOT tuple). So instead of removing the index tuples, we will update the
tid part of the index tuples. So there is no HOT stub here. The index
will point to the live HOT tuple (it would also be the head of the chain).

Say the index was previously pointing to (3,4). After (3,4) gets ready to
Vacuum, it will send a replacement tid, the one in its t_data, say (5,6).

So once the Vacuum identifies the correct index tuple, it will update the
tid portion of the index tuple to (5,6). Please advise me on whether I am
missing something / not clear in the explanation.


To answer the question I asked you, based on the above, this really is 
exactly the same pointer-swinging we talked about in spring.



This will remove the in-page pruning exercises, but as I said already, the
cost of updates will go down a lot with normal indexes.

We don't want to get rid of the in-page pruning. It allows us to reclaim
dead space without having to VACUUM. That's a major point of HOT.

But we are going to make the index sizes very small and we are also going
to reduce the cost of updates. Isn't that a sufficient reason for us?


No.

You haven't actually explained why you'd have to remove the in-page 
pruning exercises. I suspect that's not true.


Not that any of this really matters, until you address the arguments 
against doing HOT updates across pages in the first place.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Fwd: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
Again Forgot to include the group...

-- Forwarded message --
From: Gokulakannan Somasundaram [EMAIL PROTECTED]
Date: Nov 5, 2007 5:09 PM
Subject: Re: [HACKERS] Fwd: Clarification about HOT
To: Heikki Linnakangas [EMAIL PROTECTED]



On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  b) Whenever we Vacuum the index, we take a list of tids and check
  whether there are any index tuples pointing to it. If the Vacuumed
  tuple is a start of the HOT chain, then we will carry the next in-line
  HOT tuple when we go to Vacuum the index. If the next in-line also
  satisfies the Vacuum, it will carry with it the next in-line HOT
  tuple-id.
  Sorry, I didn't understand that. But the way you described it earlier,
  it's exactly the same thing as the pointer-swinging we talked about in
  spring. Is it the same or not?
 
  The only extra overhead is that we will need more memory during Vacuum.
  We are currently calling the tid_reaped / lazy_tid_reaped function. It
  does a binary search to check whether the tid pointed to by the index
  is present in its array/list. If it is present, it means that it is
  ready for Vacuum. For HOT tuples, this list will carry a replacement
  tid (the next in-line HOT tuple). So instead of removing the index
  tuples, we will update the tid part of the index tuples. So there is no
  HOT stub here. The index will point to the live HOT tuple (it would
  also be the head of the chain).
 
  Say the index was previously pointing to (3,4). After (3,4) gets ready
  to Vacuum, it will send a replacement tid, the one in its t_data, say
  (5,6).
 
  So once the Vacuum identifies the correct index tuple, it will update
  the tid portion of the index tuple to (5,6). Please advise me on
  whether I am missing something / not clear in the explanation.

 To answer the question I asked you, based on the above, this really is
 exactly the same pointer-swinging we talked about in spring.

  This will remove the in-page pruning exercises, but as I said already,
  the cost of updates will go down a lot with normal indexes.
  We don't want to get rid of the in-page pruning. It allows us to
  reclaim dead space without having to VACUUM. That's a major point of
  HOT.
 
  But we are going to make the index sizes very small and we are also
  going to reduce the cost of updates. Isn't that a sufficient reason
  for us?

 No.

 You haven't actually explained why you'd have to remove the in-page
 pruning exercises. I suspect that's not true.


OK. If HOT updates are going to be across pages, we may not know whether
we need to do pruning, because we don't know whether it is an in-page HOT
or out-of-page HOT update. Maybe we can allocate some extra bits for
that. I am not in favour of it; that's an option to be tried out.

Not that any of this really matters, until you address the arguments
 against doing HOT updates across pages in the first place.


Yes, but those arguments were centered around the current implementation,
where we have a HOT stub, which points us to the latest live HOT tuple.
The two problems put forth in this thread are:

a) Updating two pages concurrently.
Solution: We just follow the current vacuum process. We update the index
tuple's tid instead of reclaiming its space with a super-exclusive lock.
We vacuum the heap tuple as usual.

b) We won't know the HOT chain stub tid when we encounter a HOT dead
tuple.
Solution: Whenever we find a HOT dead tuple during vacuum, we add it to
the list (which tid_reaped uses) with the next-in-line HOT tuple. So when
we vacuum the index, it is taken care of.

I am actually expecting some issues, as this is not a proposal, just a
discussion.



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)




[HACKERS] Slow regression tests on windows

2007-11-05 Thread Gevik Babakhani

I am trying to run regression tests on both Windows and RH.
It looks like the tests on Windows run slower than on Linux,
using two machines with the same hardware config.

Is this known?




Gevik Babakhani

PostgreSQL NL   http://www.postgresql.nl
TrueSoftware BV http://www.truesoftware.nl





Re: [HACKERS] should I worry?

2007-11-05 Thread Tom Lane
[EMAIL PROTECTED] writes:
 On Sun, 4 Nov 2007, Tom Lane wrote:
 So you have a *bunch* of partially broken FK constraints in that source
 database.

 I just talked to my customer and he/we'll make a big batch deleting and
 recreating all foreign keys on 8.2.5.
 The question is, how do we get rid of those useless unnamed triggers?

DROP TRIGGER should be fine.

regards, tom lane



Re: [HACKERS] Slow regression tests on windows

2007-11-05 Thread Andrew Dunstan



Gevik Babakhani wrote:

I am trying to run regression tests on both windows and RH.
It looks like that the tests on windows run slower than linux 
using two machines with same hardware config.


Is this known?

  


We need far more information than this before we can say much sensibly, I 
think. For example, what configure flags you are using, what compilers, 
etc.


From what I can see, MinGW regression is somewhat slower than MSVC (and 
even than Cygwin) on my buildfarm VM that runs all three, even though 
the latter two are rate-limited by MAX_CONNECTIONS.


The regression tests really aren't performance tests, though.

cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Tom Lane
Gokulakannan Somasundaram [EMAIL PROTECTED] writes:
 Maybe I am missing something in the big picture. Please correct me on that.

Locking.  Your proposal involves lots of multi-page operations, which
are best avoided.

Moreover, you have offered no data to suggest that there would be any
real gain from all this extra complexity.

regards, tom lane



Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Gokulakannan Somasundaram
This is my feedback regarding the Visibility map.

I disagree with the idea that the DSM/visibility map should not focus on
VACUUM. I think index-only scans should be thought of as a fringe benefit
of DSMs/visibility maps. There are some basic assumptions in the design:
a) Inserts won't increase the size of the table. If the table grows, a
full page of the visibility map has to be locked, which is not suitable
for short-lived tables such as partitions.
b) Even if inserts don't grow the table, the DSM may become useless if a
lot of inserts keep converting the all-visible pages to uncertain ones.
For that matter, deletes and updates will also turn a lot of pages into
uncertain ones.
c) The visibility map becomes useless when there is a long-running batch
query, or periodic background queries that run for a long time.
d) More updates mean more uncertain blocks, so the DSM's space usage and
every lookup made against it become pure overhead.
e) A lot of the time, people may not need index-only scans at all; again
the map is then just overhead.
f) After a scheduled reboot the DSM is gone, and queries slow down
periodically while it is reconstructed.

I am not opposing this merely because it is redundant with thick indexes.
After all, every one of us wants Postgres to be the fastest database in
the world.

But the DSM has an inherent assumption that a lot of tables will become
static, with all their tuples visible to everyone. If there are such
tables, then a thick index certainly becomes an overhead in terms of
space. But the DSM should not become an overhead at any cost, since it is
memory-resident at all times and sits in the lifecycle of every query.
The only way to achieve that is to make it serve a dual purpose: it
should help VACUUM, freezing, and visibility checks.

Thanks,
Gokul.

On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Though 8.3 isn't out of the oven just yet, I've been thinking about the
 dead space map a lot, and decided I have to start writing down those
 thoughts.

 Reducing VACUUM time is important, but the real big promise is the
 ability to do index-only-scans. Because that's the main focus of this
 exercise, I'm calling it the Visibility Map from now on, because
 it's not about tracking dead space, but tuple visibility in general.
 Don't worry, reduced VACUUM times on read-mostly tables with hot spots
 will still fall out of it.



 What to store in the Visibility Map?
 

 Each potential user of the visibility map needs slightly different
 information:

 a) VACUUM needs to know if a page has enough dead tuples that it's
 interesting to visit.
 b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
 can be frozen.
 c) Index-only-scans needs to know if all tuples on a page are visible to
 all transactions. HOT updates are OK, though.

 From this point on, this document concentrates on a map suited for
 index-only-scans.

 The basic structure is a bitmap with one bit per heap page. A set bit
 means "all tuples on this heap page are visible to all transactions". A
 cleared bit means that we don't know if that's true or not.

 This kind of map is useful for VACUUM as well, though VACUUM could get
 away with less strict semantics, and we might not want to visit a page
 in VACUUM if there's only very few dead tuples on a page. What this
 implementation gives us is a conservative VACUUM that will always find
 all the same dead tuples as a regular VACUUM (except for HOT updated
 tuples which can be pruned without scanning indexes). VACUUM using the
 visibility map can't update stats, so we'd still want regular vacuums
 every now and then, or rely on ANALYZE to do that.

 It's not useful for VACUUM FREEZE, unless we're willing to freeze much
 more aggressively, and change the meaning of a set bit to "all tuples on
 the heap page are frozen".

 Other kinds of visibility maps could be useful in some narrow scenarios.
 For example, if we had a map of pages that have no live tuples, we could
 skip those pages in sequential scans. We could also use more than one
 bit per page to store more information, but let's keep it simple for now.
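The one-bit-per-heap-page structure described above can be sketched as a
standalone illustration. This is a minimal mock-up under stated assumptions
(all names here are hypothetical, not the eventual backend code): a set bit
means "page is all-visible", and clearing is always the conservative, safe
direction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: one bit per heap page.  A set bit means "all
 * tuples on this heap page are visible to all transactions"; a cleared
 * bit means "we don't know", so clearing is always safe. */
#define VM_PAGES 8192                   /* heap pages covered by the map */

typedef struct
{
    uint8_t bits[VM_PAGES / 8];
} VisibilityMap;

static void
vm_init(VisibilityMap *vm)
{
    /* start conservative: no page is known to be all-visible */
    memset(vm->bits, 0, sizeof vm->bits);
}

static void
vm_set_all_visible(VisibilityMap *vm, uint32_t page)
{
    vm->bits[page / 8] |= (uint8_t) (1u << (page % 8));
}

static void
vm_clear(VisibilityMap *vm, uint32_t page)
{
    vm->bits[page / 8] &= (uint8_t) ~(1u << (page % 8));
}

static int
vm_all_visible(const VisibilityMap *vm, uint32_t page)
{
    return (vm->bits[page / 8] >> (page % 8)) & 1;
}
```

In this sketch an index-only scan would consult vm_all_visible() to decide
whether the heap page must be read at all, and every insert/update/delete
on a page would call vm_clear() for it.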


 Where to store the visibility map?
 --
 a) In a fixed size shared memory struct. Not acceptable.

 b) In the special area of every nth heap page. I played a bit with this
 back in February 2006, because it's simple and requires few changes.

 c) As a new relation kind, like toast tables.

 d) As an auxiliary smgr relation, attached to the heap.

 I'm leaning towards D at the moment. It requires a little bit of changes
 to the buffer manager API and elsewhere, but not too much.

 B is not good because we might want to have other kinds of auxiliary
 files in the future (FSM in particular). And it is a bit hacky anyway.
 Let's do something more generic.

 C doesn't seem like the right approach, because a visibility map 

Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
Tom,
   Let me try to understand your statement.

What extra multi-page operations are we doing?
  Currently, during VACUUM, we go to the index, mark the entry as dead
and reclaim the space. For doing this, we acquire a super-exclusive lock.
After this implementation, we would update the index tuple instead of
marking it for cleanup. What can be foreseen as a locking overhead here?

Actually, I don't know what the best practice is. Should I start a
discussion, take inputs from everyone, and then start an implementation?
Or should I finish the task, get the performance figures, and then come
to the forum? I realize that I am doing something wrong here.

HOT in its present implementation has some complexities, since it is
handled as a special case. I saw that you also made that comment in your
review. The place where HOT is innovative is in its underlying
assumption: since we are not storing snapshot info in the index, the
index need not be updated if the indexed information is not changing.

Currently this is implemented in a very limited sense: it works only for
in-page updates, and only when no index gets updated. It would reach its
full potential if it worked across pages, treated each index as a
separate entity, and decided whether to update each index on an
index-by-index basis.

By doing this, we would have a rich indexing infrastructure, where thin
indexes suit heavily updated tables and thick indexes suit heavily
selected tables.

I definitely need guidance from you before going into the implementation,
so please don't consider this a finished proposal. With your experience,
try to gauge the usefulness of this feature; some small tricks from you
to make it even more effective would also be very useful.

Thanks,
Gokul.


On 11/5/07, Tom Lane [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram [EMAIL PROTECTED] writes:
  May be i am missing something in the big picture. Please clarify me on
 that.

 Locking.  Your proposal involves lots of multi-page operations, which
 are best avoided.

 Moreover, you have offered no data to suggest that there would be any
 real gain from all this extra complexity.

 regards, tom lane




-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Pavan Deolasee
On Nov 5, 2007 7:37 PM, Gokulakannan Somasundaram [EMAIL PROTECTED]
wrote:

 Tom,
Let me try to understand your statement.

 What extra multi-page operations are we doing?
   Currently, during Vacuum,  we goto the Index and mark it as dead and
 reclaim the space. For doing this, we are acquiring a Super-Exclusive lock.
 After this implementation, we would update the index tuple instead of
 marking it for cleanup. What can be foreseen as a locking overhead here?



It's not just about vacuuming. You need to worry about locking during
HOT fetches as well as chain pruning. There could be tricky corner cases
between index/seq scans and pruning. And don't forget CREATE INDEX,
which would become even more challenging if you have HOT chains
spanning multiple pages.

This is not to discourage you from trying to improve HOT. But once upon
a time we had a multi-page HOT (it was called Heap Overflow Tuple), and
I can tell you: it was really complex.


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
Thanks for the feedback. Let me try to put down what I have in mind for
this. Please clarify whether my assumptions are valid.

On 11/5/07, Pavan Deolasee [EMAIL PROTECTED] wrote:



 On Nov 5, 2007 7:37 PM, Gokulakannan Somasundaram [EMAIL PROTECTED]
 wrote:

  Tom,
 Let me try to understand your statement.
 
  What extra multi-page operations are we doing?
Currently, during Vacuum,  we goto the Index and mark it as dead and
  reclaim the space. For doing this, we are acquiring a Super-Exclusive lock.
  After this implementation, we would update the index tuple instead of
  marking it for cleanup. What can be foreseen as a locking overhead here?
 
 
 
 Its not just about vacuuming. You need to worry about locking during the
 HOT-fetches as well as chain pruning.


During HOT fetches, the normal case is that we take the tuple-id from
the index, go to the top of the HOT chain, and descend from there until
we reach the right tuple. So this would involve BUFFER_SHARE locks,
which should not be of any concern. There would be no separate chain
pruning; instead, the tuples that are to be vacuumed get vacuumed during
the VACUUM process, after their index tuple peers are redirected.
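The fetch path described here -- the index gives the chain top, then we
walk forward to the version our snapshot can see -- can be mocked up
roughly as below. MockTuple and the toy xmin/xmax visibility rule are
hypothetical stand-ins, not PostgreSQL's actual tuple header or MVCC
snapshot logic:

```c
#include <stddef.h>

/* Hypothetical stand-ins: in the real heap, the forward link is the
 * t_ctid field of the tuple header, and visibility is decided by the
 * full MVCC rules rather than this toy xmin/xmax comparison. */
typedef struct MockTuple
{
    unsigned    xmin;           /* inserting transaction id */
    unsigned    xmax;           /* deleting transaction id, 0 = live */
    struct MockTuple *next;     /* next version in the update chain */
} MockTuple;

static int
tuple_visible(const MockTuple *t, unsigned snapshot_xid)
{
    /* visible if created at or before our snapshot and not yet deleted */
    return t->xmin <= snapshot_xid &&
           (t->xmax == 0 || t->xmax > snapshot_xid);
}

/* Start at the chain top the index points to and descend until we
 * reach the version visible to the given snapshot (NULL if none). */
static const MockTuple *
hot_fetch(const MockTuple *top, unsigned snapshot_xid)
{
    const MockTuple *t;

    for (t = top; t != NULL; t = t->next)
        if (tuple_visible(t, snapshot_xid))
            return t;
    return NULL;
}
```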

There could be tricky corner cases
 between index/seq scans and pruning.


In seq scans, I think we need not worry about the HOT implementation: we
take each tuple, check the visibility criteria, and take the appropriate
step. During clean-up, we will take the super-exclusive lock here. In
index scans, we change the index entry only after obtaining the
super-exclusive lock -- so there are no pins, and hence no index scans
are in progress at that time.

And don't forget CREATE INDEX
 which would become even more challenging if you have HOT chains
 spanning multiple pages.


CREATE INDEX has to do a seq scan, so it will consider only tuples that
are live at the time of index creation. It won't worry about the HOT
chains.

This is not to discourage you from trying to improve HOT. But
 once-upon-a-time
 we had this multi-page HOT (it was called Heap-Overflow-Tuple) and I can
 tell you: it was really complex.



Thanks a lot for the encouraging words. I will definitely look up the
Heap Overflow Tuple work and check whether any of its problems would
recur here.

In the meanwhile, if you can think of a specific case in which this
design would fail, please notify me.


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Tom Lane
Gokulakannan Somasundaram [EMAIL PROTECTED] writes:
   Currently, during Vacuum,  we goto the Index and mark it as dead and
 reclaim the space. For doing this, we are acquiring a Super-Exclusive lock.
 After this implementation, we would update the index tuple instead of
 marking it for cleanup. What can be foreseen as a locking overhead here?

There are three operations involved: marking the child tuple not-HOT,
updating the index entry(s), and removing the parent tuple.  I doubt you
can do them safely as independent atomic operations; I think you'll need
to have more than one page locked at a time.  The locking problem also
applies to trying to collapse out a dead HOT tuple that's in the middle
of the chain: if it's the end of a cross-page link then you need two
pages super-exclusive-locked in order to do that.
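Holding more than one page lock at a time is workable only with a
deadlock-avoidance discipline; the classic one is to always acquire the
lower-numbered page's lock first. A minimal sketch with pthreads follows
(hypothetical names -- the backend's buffer manager does its own
locking, this only illustrates the ordering idea):

```c
#include <pthread.h>

/* Hypothetical page with its own lock. */
typedef struct
{
    int         pageno;
    pthread_mutex_t lock;
} Page;

/* Always lock the lower page number first: two backends locking the
 * same pair can then never wait on each other in a cycle. */
static void
lock_page_pair(Page *a, Page *b)
{
    Page *first = (a->pageno < b->pageno) ? a : b;
    Page *second = (first == a) ? b : a;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

static void
unlock_page_pair(Page *a, Page *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```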

There's also the little problem that a redirect line pointer doesn't
have room for a cross-page link, and the big problem that having to
chase across multiple pages for an index lookup would almost certainly
negate any performance gains you might get.  (In particular it'd
completely destroy locality of access for bitmap indexscans...)

regards, tom lane



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Pavan Deolasee
On Nov 5, 2007 8:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED]
wrote:



 There may not be anything called chain-pruning. Instead the tuples, which
 are to be vacuumed, will get vacuumed, after redirecting their index tuple
 peers, during the Vacuum process.



This won't help us contain the heap bloat. Though containing index bloat
is important, most of the performance benefit of HOT comes from doing
page-level retail vacuuming. This not only reduces the heap bloat but
also results in less frequent vacuuming of the table.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

a) The inserts won't increase the size of the table. If it increases, it has
to lock one full page of Visibility map and this is not suitable for tables,
which are short-lived like partitioned tables


The overhead of locking a page is very small.

Actually, extending a heap only needs to touch the visibility map when 
we need a new visibility map page, if we initialize all bits to zero. 
Like we do already anyway.



b) Even if the inserts don't increase the size of the table, it might make
DSM useless, if lot of inserts keep converting the all-visible ones to
uncertain ones. For that matter, even the Deletes and Updates are also going
to make lot of pages into uncertain ones.


Sure. If you have a lot of (random) inserts/updates/deletes, it becomes 
much less useful.


A small mitigating factor is that an insert/update/delete will fetch the 
heap page to memory anyway. Therefore having to access it just after the 
update is cheap. This helps inserts in particular, because after the 
inserting transaction is < OldestXmin, we can set the bit again.



c) Visibility map gets useless, when there is a long running batch query /
periodic background queries which run for longer times


Yeah, long running transactions are a bitch in many ways.


d) More updates- more blocks of uncertainity - space usage by DSM and the
reference made to DSM is just an overhead
e) Lot of times, people may not need index-only scans. Again this gets to be
a overhead


The beauty of this approach is that the overhead is very small.


f) If there are scheduled reboots, the DSM crashes and periodic slow-downs
in the queries during the time, the DSM gets re-constructed.


That's rubbish.


I am not opposing this, as it is a redundant feature for Thick indexes.
After all every one of us, want Postgres to be the fastest one in the world.


And also the easiest to maintain, most space-efficient, most reliable 
and so forth...



But because DSM has a inherent assumption that lot of tables will become
static and all the tuples would be visible to everyone. If there are such
tables, then definitely Thick index becomes a overhead in terms of space.
But DSM should not become overhead at any cost, as it is a memory resident
one at all times and also always gets into the lifecycle of a query. Only
way to achieve it is to make it a dual purpose one. It should help Vacuum,
freezing and visibility checks.


I don't understand this paragraph.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] minimal update

2007-11-05 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  

Tom Lane wrote:


A BEFORE UPDATE trigger would be better, and probably hardly more
expensive than a wired-in facility (especially if you were willing to
write it in C).
  


  
Yes. I also  prefer the trigger idea to a rule because triggers are easy 
to enable  and disable. It's still a lot of work for what must be a 
common want, though. Could it be done generically?



Well, you could write the trigger in C and it'd work for any table.
I think it could be as simple as a memcmp of the tuples' data areas,
since we now require padding bytes to be 0 ...


  



Something like this fragment?

 newtuple = trigdata->tg_newtuple;
 oldtuple = trigdata->tg_trigtuple;
 rettuple = newtuple;

 if (newtuple->t_len == oldtuple->t_len &&
     newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
     memcmp(GETSTRUCT(newtuple), GETSTRUCT(oldtuple),
            newtuple->t_len - newtuple->t_data->t_hoff) == 0)
     rettuple = NULL;

 return PointerGetDatum(rettuple);


Also, when did we first require padding bytes to be 0?


cheers

andrew



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Tom Lane [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram [EMAIL PROTECTED] writes:
Currently, during Vacuum,  we goto the Index and mark it as dead and
  reclaim the space. For doing this, we are acquiring a Super-Exclusive
 lock.
  After this implementation, we would update the index tuple instead of
  marking it for cleanup. What can be foreseen as a locking overhead here?

 There are three operations involved: marking the child tuple not-HOT,
 updating the index entry(s), and removing the parent tuple.  I doubt you
 can do them safely as independent atomic operations; I think you'll need
 to have more than one page locked at a time.


I think, in my suggestion, there are only two steps: redirect the index
tuple's tid to point to the new tuple, and mark the parent heap tuple
vacuumed. This is no different from the existing VACUUM process, except
that we update the index tuple instead of marking it for vacuum. I think
you are thinking of the current implementation; we need not mark any
tuple as HOT in the proposed one. Instead, our code will work under the
assumption that normal indexes won't get updated unless the indexed info
changes. I don't foresee a requirement for HOT classification. Please
explain if it is required in any case.


 The locking problem also
 applies to trying to collapse out a dead HOT tuple that's in the middle
 of the chain:


Currently, we maintain a list/array for the tid_reaped or
lazy_tid_reaped function, which contains all the tids that can be
vacuumed. In our case, as already explained, every heap tuple that has
been updated carries an extra tid, taken from the t_ctid stored in its
t_data. So after we update the tid in the index tuple, we can call
tid_reaped again to check whether the replacement tid itself has to be
replaced or can be vacuumed.


if it's the end of a cross-page link then you need two
 pages super-exclusive-locked in order to do that.


If the chain ends, the heap tuple would have been marked deleted: it
won't point to any tuple in its t_data and would have been marked
HEAP_XMAX_VALID. So it is the same case as vacuuming a normal tuple.
Have I answered your question correctly?

There's also the little problem that a redirect line pointer doesn't
 have room for a cross-page link,


I don't know what a redirect line pointer is. The tid part of an index
tuple can be made to point to any place, so I don't understand what you
mean here.


and the big problem that having to
 chase across multiple pages for an index lookup would almost certainly
 negate any performance gains you might get.  (In particular it'd
 completely destroy locality of access for bitmap indexscans...)


Yes, I think bitmap index scans will be affected, but only for selects
on heavily updated tables. That's the cost we need to pay for extracting
performance out of updates. As I already said, the normal index would be
the index for heavily updated tables, and the thick index would be the
index for tables with heavy selects.

We can also look at optimizing the index tuple structure to reduce its
footprint.

Please get back to me if I have been unclear on any point.
-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

There's also the little problem that a redirect line pointer doesn't
have room for a cross-page link,


i don't know, what is a re-direct line pointer.


Then you clearly don't understand at all how HOT works. Please go read 
src/backend/access/heap/README.HOT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Pavan Deolasee [EMAIL PROTECTED] wrote:



 On Nov 5, 2007 8:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED]
 wrote:

 
 
  There may not be anything called chain-pruning. Instead the tuples,
  which are to be vacuumed, will get vacuumed, after redirecting their index
  tuple peers, during the Vacuum process.
 
 

 This won't help us check the heap bloat. Though containing index bloat is
 important,
 most of the performance benefits of HOT comes from doing page level retail
 vacuuming.
 This not only reduces the heap bloat but also results in less frequent
 vacuuming
 of the table.


Can you please explain this in more detail?
If the HOT chain doesn't break and fits completely in a single page, the
VACUUM daemon need not touch the HOT tuples for space reclamation. But
ultimately the space would get reclaimed even with a normal VACUUM
(without HOT pruning), wouldn't it? Then how can we say that without HOT
pruning we will have heap bloat?





-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Slow regression tests on windows

2007-11-05 Thread Magnus Hagander
On Mon, Nov 05, 2007 at 08:43:01AM -0500, Andrew Dunstan wrote:
 
 
 Gevik Babakhani wrote:
 I am trying to run regression tests on both windows and RH.
 It looks like that the tests on windows run slower than linux 
 using two machines with same hardware config.
 
 Is this known?
 
   
 
 We need far  more information than this before we can say much sensibly, 
 I think.. For example, what configure flags you are using, what 
 compilers, etc.

In general, the regression tests are slower on Windows, yes. It's
because a lot of processes are spawned on both the client and server
side.

(Heck, in general, *everything* in pg is slower on Windows than on Linux
with the same hardware config.)


 From what I can see MinGW regression is somewhat slower than MSVC (and 
 even than Cygwin) on my buildfarm VM that runs all three, even though 
 the latter two are rate limited by MAX_CONNECTIONS.

Uh, you're saying MSVC and Cygwin somehow differ from MinGW?

//Magnus



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
Thanks, Heikki. To clarify: there won't be any redirect line pointers in
this implementation, so that space is saved. We will have the index
tuple point to the latest live tuple in the update chain, so there is no
need for redirect line pointers.

Thanks,
Gokul.

On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  There's also the little problem that a redirect line pointer doesn't
  have room for a cross-page link,
 
  i don't know, what is a re-direct line pointer.

 Then you clearly don't understand at all how HOT works. Please go read
 src/backend/access/heap/README.HOT.

 --
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com




-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


[HACKERS] plpgsql keywords are hidden reserved words

2007-11-05 Thread Tom Lane
I was somewhat bemused just now to find that this function stopped
working:

regression=# create function estimate_rows(query text) returns float8 as $$
declare r text;
begin
  for r in execute 'explain ' || query loop
if substring(r from 'rows=[0-9]') is not null then
  return substring (r from 'rows=([0-9]+)');
end if;
  end loop;
  return null;
end$$ language plpgsql strict;
CREATE FUNCTION
regression=# select estimate_rows('select * from tenk1 where unique1 < 500');
ERROR:  column "query" does not exist
LINE 1: SELECT  'explain ' || query
  ^
QUERY:  SELECT  'explain ' || query
CONTEXT:  PL/pgSQL function estimate_rows line 3 at FOR over EXECUTE statement

This works fine in 8.2.  The reason it no longer works is that "query"
is now a special token in the plpgsql lexer, and that means that it will
never be substituted for by read_sql_construct().  So it's effectively
a reserved word.

While I can work around this by changing the parameter name or using

for r in execute 'explain ' || estimate_rows.query loop

it's still a tad annoying, and it means that we have to be *very*
circumspect about adding new keywords to plpgsql.

I don't see any fix for this that's reasonable to try to shoehorn
into 8.3, but I think we really need to revisit the whole area of
plpgsql variable substitution during 8.4.  We could make this problem
go away if variable substitution happened through a parser callback
instead of before parsing.

regards, tom lane



Re: [HACKERS] minimal update

2007-11-05 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, you could write the trigger in C and it'd work for any table.
 I think it could be as simple as a memcmp of the tuples' data areas,
 since we now require padding bytes to be 0 ...

 Something like this fragment?

   newtuple = trigdata->tg_newtuple;
   oldtuple = trigdata->tg_trigtuple;
   rettuple = newtuple;

   if (newtuple->t_len == oldtuple->t_len &&
       newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
       memcmp(GETSTRUCT(newtuple), GETSTRUCT(oldtuple),
              newtuple->t_len - newtuple->t_data->t_hoff) == 0)
     rettuple = NULL;

   return PointerGetDatum(rettuple);

Close, but I think you also need to take care to compare natts and
the null bitmaps (if any).  Might be worth comparing OIDs too, though
AFAIR there is no mechanism for substituting a different OID during
UPDATE.  Probably the easiest coding is to memcmp all the way from
offsetof(t_bits) to t_len, after comparing natts and the HASNULL and
HASOID flags.
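Outside the backend, the shape of that comparison can be sketched with a
simplified mock tuple. Names and layout here are hypothetical; the real
code would compare the HeapTupleHeader fields exactly as described above:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified tuple: the real HeapTupleHeader
 * carries natts and the HASNULL/HASOID bits in t_infomask, and the
 * data area runs from offsetof(t_bits) to t_len. */
#define MINI_DATA_MAX 64

typedef struct
{
    uint32_t    t_len;                  /* bytes of data[] in use */
    uint16_t    natts;                  /* number of attributes */
    uint16_t    flags;                  /* stand-in for HASNULL/HASOID */
    uint8_t     data[MINI_DATA_MAX];    /* null bitmap + attribute data */
} MiniTuple;

/* With padding bytes guaranteed zero, a plain memcmp over the data
 * area is a safe byte-for-byte equality test. */
static int
tuples_identical(const MiniTuple *a, const MiniTuple *b)
{
    return a->t_len == b->t_len &&
           a->natts == b->natts &&
           a->flags == b->flags &&
           memcmp(a->data, b->data, a->t_len) == 0;
}
```

A BEFORE UPDATE trigger built this way would return NULL (suppressing
the update) whenever the new tuple is indistinguishable from the old
one.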

 Also, when did we first require padding bytes to be 0?

The 8.3 varvarlena patch is what requires it, but in practice
heap_formtuple has always started with a palloc0, so I think it would
work a long ways back.

regards, tom lane



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:

Thanks Heikki. To clarify, there won't be any redirect-line pointers in this
implementation. That space is saved. We will have the index tuple point to
the latest live tuple in the update chain.  So no need for redirect-line
pointers.


The redirected line pointers are there for a reason. The reason is to be 
able to retail vacuum (= prune) a page without having to do a regular 
vacuum, scanning all indexes.


If you lose that ability, the idea is dead in the water.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] plpgsql keywords are hidden reserved words

2007-11-05 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I was somewhat bemused just now to find that this function stopped
 working:

 regression=# create function estimate_rows(query text) returns float8 as $$
...
 This works fine in 8.2.  The reason it no longer works is that query
 is now a special token in the plpgsql lexer, and that means that it will
 never be substituted for by read_sql_construct().  So it's effectively
 a reserved word.

Perhaps we should be throwing a more intelligible error if you have a
parameter (or variable?) named in a way that will conflict?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Slow regression tests on windows

2007-11-05 Thread Andrew Dunstan



Magnus Hagander wrote:
From what I can see MinGW regression is somewhat slower than MSVC (and 
even than Cygwin) on my buildfarm VM that runs all three, even though 
the latter two are rate limited by MAX_CONNECTIONS.



Uh, you're saying MSVC and Cygwin somehow differ from MingW? 



  


I'm saying my buildfarm members differ from each other.

In fact, on the last run of each, MinGW was by far the slowest and 
Cygwin the fastest in the make check step. But there could be any 
number of reasons for that, including extraneous activity on the VM host 
that could have slowed the whole VM down.


Anyway, comparing regression test speeds is probably not very productive.

cheers

andrew



Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  Thanks Heikki. To clarify, there won't be any redirect-line pointers in
 this
  implementation. That space is saved. We will have the index tuple point
 to
  the latest live tuple in the update chain.  So no need for redirect-line
  pointers.

 The redirected line pointers are there for a reason. The reason is to be
 able to retail vacuum (= prune) a page without having to do a regular
 vacuum, scanning all indexes.

 If you lose that ability, the idea is dead in the water.

 --


Since we are going to have the index point to the top of the chain, and
sliding the chain happens concurrently in the index and the heap, there
is no need for redirected line pointers. But I doubt whether we can
retail vacuum a page, since there is just one HOT chain in the page. We
can retail vacuum only the HOT chain; for the rest of the tuples, which
are not HOT updated, we need to consult the indexes. Any other tuple in
that page has to go through the normal VACUUM process. Is my
understanding correct here? If so, we don't retail vacuum a page; we
just shorten the HOT chain at the expense of some space overhead. If
this is going to give so much benefit, then maybe the design can be made
slightly flexible in order to accommodate it.
Say we have a table with 4 indexes, and updates occur at such intervals
that we cannot find space in the same page for the update. Currently we
incur the overhead of updating all the indexes in this scenario; even if
only one of the indexes is affected, we incur the same overhead. In the
proposed scenario, we would not have all those overheads. If we feel
that this overhead is required for achieving pruning of the HOT chain,
then I think we should drop this idea.

Thanks,
Gokul.


   Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com




-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] Slow regression tests on windows

2007-11-05 Thread Magnus Hagander
On Mon, Nov 05, 2007 at 11:01:26AM -0500, Andrew Dunstan wrote:
 
 
 Magnus Hagander wrote:
 From what I can see MinGW regression is somewhat slower than MSVC (and 
 even than Cygwin) on my buildfarm VM that runs all three, even though 
 the latter two are rate limited by MAX_CONNECTIONS.
 
 
 Uh, you're saying MSVC and Cygwin somehow differ from MingW? 
 
 
   
 
 I'm saying my buildfarm members differ from each other.

Ah. I thought you meant we had some built-in cap in the regression
tests, which is what confused me.


 In fact, on the last run of each, MinGW was by far the slowest and 
 Cygwin the fastest in the make check step. But there could be any 
 number of reasons for that, including extraneous activity on the VM host 
 that could have slowed the whole VM down.

Yeah. It does surprise me a lot that cygwin should be fastest, really.


 Anyway, comparing regression test speeds is probably not very productive.

Agreed.

//Magnus



Re: [HACKERS] Proposal: PL/pgSQL EXECUTE INTO USING (for 8.4)

2007-11-05 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Pavel Stehule wrote:
 Hello,
 
 this proposal change older unaccepted proposal
 http://archives.postgresql.org/pgsql-hackers/2006-03/msg01157.php .
 
 Changes:
 * based on prepared statements
 * syntax and behave is near to Oracle
 * usable as protection from SQL injection
 
 New syntax:
 
 a) EXECUTE stringexpr
   [INTO [STRICT] varlist
   [USING exprlist]
 
 b) FOR varlist IN EXECUTE stringexpr USING exprlist LOOP 
 
 Reason:
 * defence from SQL injection
 * more readable, shorter, more comfortable
 
 Sample (secure dynamic statement):
 EXECUTE
  'SELECT * FROM ' ||
   CASE tblname
  WHEN 'tab1' THEN 'tab1'
  WHEN 'tab2' THEN 'tab2'
  ELSE 'some is wrong' END ||
   ' WHERE c1 = $1 AND c2 = $2'
USING unsecure_parameter1, unsecure_parameter2;
 
 Difference between PL/SQL and proposal:
 * allow only IN variables
 * use PostgreSQL placeholders notation - $n instead :n
 
 Compliance with PL/SQL
 * You can use numeric, character, and string literals as bind arguments
 * You cannot use bind arguments to pass the names of schema objects to
 a dynamic SQL statement.
 
 Best regards
 
 Pavel Stehule
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [COMMITTERS] pgsql: Add a note about another issue that needs to be considered before

2007-11-05 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 FWIW I found another issue with this variable when I was experimenting with
 small block sizes. If you set the target <= the tuple header, the toaster
 breaks. This is because it's doing unsigned arithmetic (Size is unsigned). I
 think the right solution is just to change it to use plain int32 arithmetic
 everywhere.

Somehow I can't get excited about that.  If we ever do open things up to
allow user control of the target, we'd certainly constrain it to a sane
range.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Slow regression tests on windows

2007-11-05 Thread Andrew Dunstan



Magnus Hagander wrote:
In fact, on the last run of each, MinGW was by far the slowest and 
Cygwin the fastest in the make check step. But there could be any 
number of reasons for that, including extraneous activity on the VM host 
that could have slowed the whole VM down.



Yeah. It does surprise me a lot that cygwin should be fastest, really.


  



Don't read anything into it. The VMware clock is quite unreliable, in my 
experience.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Gokulakannan Somasundaram wrote:
  a) The inserts won't increase the size of the table. If it increases, it
 has
  to lock one full page of Visibility map and this is not suitable for
 tables,
  which are short-lived like partitioned tables

 The overhead of locking a page is very small.

 Actually, extending a heap only needs to touch the visibility map when
 we need a new visibility map page, if we initialize all bits to zero.
 Like we do already anyway.


As you have pointed out, one page in the visibility map points to some 65,000
pages in the heap. So even if we lock the visibility map page for only a short
time, it will affect all the scans that need to access those pages.

 b) Even if the inserts don't increase the size of the table, it might make
  DSM useless, if lot of inserts keep converting the all-visible ones to
  uncertain ones. For that matter, even the Deletes and Updates are also
 going
  to make lot of pages into uncertain ones.

 Sure. If you have a lot of (random) inserts/updates/deletes, it becomes
 much less useful.

 A small mitigating factor is that an insert/update/delete will fetch the
 heap page to memory anyway. Therefore having to access it just after the
 update is cheap. This helps inserts in particular, because after the
 inserting transaction is < OldestXmin, we can set the bit again.


But we can set this bit only after a Vacuum process; it will not be set
until the Vacuum process pitches in and marks it.

 c) Visibility map gets useless, when there is a long running batch query /
  periodic background queries which run for longer times

 Yeah, long running transactions are a bitch in many ways.


But these are determined by business conditions, and we need to provide
efficient solutions to deal with them.

 d) More updates - more blocks of uncertainty - space usage by DSM and the
  reference made to DSM is just an overhead
  e) A lot of the time, people may not need index-only scans. Again this
  gets to be an overhead

 The beauty of this approach is that the overhead is very small.

  f) If there are scheduled reboots, the DSM crashes and periodic
 slow-downs
  in the queries during the time, the DSM gets re-constructed.

 That's rubbish.


I think the DSM is not WAL-logged. So when it gets reconstructed every time
for a big table, isn't that an overhead?

 I am not opposing this, as it is a redundant feature for Thick indexes.
  After all every one of us, want Postgres to be the fastest one in the
 world.

 And also the easiest to maintain, most space-efficient, most reliable
 and so forth...


Provided it supports Vacuuming & Freezing...

 But because DSM has an inherent assumption that a lot of tables will become
  static and all the tuples would be visible to everyone. If there are such
  tables, then definitely the Thick index becomes an overhead in terms of
  space. But DSM should not become an overhead at any cost, as it is
  memory-resident at all times and always gets into the lifecycle of a query.
  The only way to achieve that is to make it a dual-purpose one: it should
  help Vacuum, freezing and visibility checks.

 I don't understand this paragraph.


Because updates, inserts and deletes reduce the utility of the visibility
map, it seems to be designed for more static tables, which don't experience
many of these operations.



-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] pgsql: Add a note about another issue that needs to be considered before

2007-11-05 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 FWIW I found another issue with this variable when I was experimenting with
 small block sizes. If you set the target <= the tuple header, the toaster
 breaks. This is because it's doing unsigned arithmetic (Size is unsigned). I
 think the right solution is just to change it to use plain int32 arithmetic
 everywhere.

 Somehow I can't get excited about that.  If we ever do open things up to
 allow user control of the target, we'd certainly constrain it to a sane
 range.

Yeah, i didn't even mention it at the time. I just thought I would mention it
now since you were adding comments on restrictions to the toast threshold. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Martijn van Oosterhout
On Mon, Nov 05, 2007 at 09:32:08PM +0530, Gokulakannan Somasundaram wrote:
 Say, if we have a table with 4 indexes and updates occur in such intervals,
 we may not be able to find space in the same page for the update. Currently
 we are incurring the overhead of updating all the indexes in this scenario.
 Even if one of the index is updated, we will be incurring the same overhead
 again in the current scenario.

Ok, I've been following this tangentially, but here is one thing I
really don't understand: Sure, you might save this cost during update,
but you do incur this cost while updating the head of the chain. There
is no link from the chain to the index tuple, so the work to find the
current index tuple is nearly the same as the cost to create a new one.

It seems to me that updating and pruning the head will happen about
equally often, so I'm not sure you're saving anything here. Or am I
missing something?

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy


signature.asc
Description: Digital signature


Re: [HACKERS] Fwd: Clarification about HOT

2007-11-05 Thread Gokulakannan Somasundaram
On 11/5/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote:

 On Mon, Nov 05, 2007 at 09:32:08PM +0530, Gokulakannan Somasundaram wrote:
  Say, if we have a table with 4 indexes and updates occur in such
 intervals,
  we may not be able to find space in the same page for the update.
 Currently
  we are incurring the overhead of updating all the indexes in this
 scenario.
  Even if one of the index is updated, we will be incurring the same
 overhead
  again in the current scenario.

 Ok, I've been following this tangentially, but here is one thing I
 really don't understand: Sure, you might save this cost during update,
 but you do incur this cost while updating the head of the chain. There
 is no link from the chain to the index tuple, so the work to find the
 current index tuple is nearly the same as the cost to create a new one.

 It seems to me that updating and pruning the head will happen about
 equally often, so I'm not sure you're saving anything here. Or am I
 missing something?


Whatever you just said will happen during the index vacuum. Instead of
marking a particular tuple as dead/ready to be vacuumed, we will update its
tid to point to the tuple which is next to the head of the chain. This won't
incur extra overhead, except that during the vacuum process we will carry
the tid of the tuple which is next to the head of the chain (a memory
overhead).

Hope I was clear. Please get back in case you haven't understood.


-- 
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)


Re: [HACKERS] pg_ctl configurable timeout

2007-11-05 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I think the mythical pg_ping utility should be written.  It seems the
  easiest way out of the problem.
 
 If pg_ctl were still a shell script there would be some point in that,
 but since it's a C program it can certainly do anything a separate
 utility would do.

Well, pg_ctl would not be the only user of such a utility.  Things like
(say) control panels for shared hosting could benefit from it as well.
As would system health monitors.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_ctl configurable timeout

2007-11-05 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 I think the mythical pg_ping utility should be written.  It seems the
 easiest way out of the problem.
 
 If pg_ctl were still a shell script there would be some point in that,
 but since it's a C program it can certainly do anything a separate
 utility would do.

 Well, pg_ctl would not be the only user of such a utility.  Things like
 (say) control panels for shared hosting could benefit from it as well.
 As would system health monitors.

I still see no point in creating a separate binary for the
functionality.  If you want to make it available to shell scripts,
invent a "pg_ctl ping" subcommand.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] dblink un-named connection doesn't get re-used

2007-11-05 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Decibel! wrote:
 Is it intentional that dblink's unnamed connections don't get re-used?
 
 stats=# select datname, usename from pg_stat_activity;
 datname | usename
 -+-
 stats   | decibel
 (1 row)
 
 stats=# select dblink_connect('dbname=stats');
 dblink_connect
 
 OK
 (1 row)
 
 stats=# select dblink_connect('dbname=postgres');
 dblink_connect
 
 OK
 (1 row)
 
 stats=# select datname, usename from pg_stat_activity;
 datname  | usename
 --+--
 stats| decibel
 stats| postgres
 postgres | postgres
 (3 rows)
 
 AFAIK there's no way I could possibly use or refer to the connection  
 to stats at this point; so why doesn't dblink close it when I issue  
 the second connect?
 -- 
 Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
 Give your computer some brain candy! www.distributed.net Team #1828
 
 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] A small rant about coding style for backend functions

2007-11-05 Thread Bruce Momjian

I understand your suggestions but it seems there would be too many
individual items to be readable.  Can you suggest a full list so we can
get an idea of how long it would be?

---

Brendan Jurd wrote:
 On 11/1/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
  I have not forgotten this suggestion.  Do you have any ideas what such a
  list would look like?  Examples?
 
 
 Thanks for the reply Bruce.
 
 Code examples, perhaps with good style and bad style versions to
 illustrate each point.
 
 In the case of Tom's OP about making all your GETARG calls at the top
 of the function, you could show (for a trivial function) how it looks
 with all the GETARGs stacked neatly at the top, and how it looks with
 the GETARGs scattered and inaccurate.
 
 I think that would make it immediately clear to any newbie why it's a good 
 idea.
 
  I think we have avoided more details in fear of scaring off coders.
  People usually follow our style as they gain experience.  Having a hard
  list seems like it would be a lot of do's and don't's.
 
 Just my perspective, but I think you'll scare off a lot more coders by
 giving them no firm guidance in the first place, and then jumping down
 their throats with "you did this wrong" when they post a patch.
 
 Might be worth opening up a wiki page for devs to contribute their
 insights about writing excellent (as opposed to merely passable)
 Postgres code.  The GETARG rant could make a good starting point.
 
 Cheers
 BJ
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Open items for 8.3

2007-11-05 Thread Bruce Momjian
In an attempt to move us toward 8.3 RC1 I have put all open item emails
into the patches queue:

http://momjian.postgresql.org/cgi-bin/pgpatches

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, 5 Nov 2007 12:47:10 -0500 (EST)
Bruce Momjian [EMAIL PROTECTED] wrote:

 In an attempt to move us toward 8.3 RC1 I have put all open item
 emails into the patches queue:
 
   http://momjian.postgresql.org/cgi-bin/pgpatches
 

It would be great if this would push to the wiki. That has been working
really well through the cycle.

Joshua D. Drake

- -- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHL1jNATb/zqfZUUQRAmI0AJ9HtxqXM52c9p999mkQ4CfZWQMeQQCfYMh2
arqwrUGX8YJkw5l4K/hkRM4=
=cmdZ
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] plpgsql keywords are hidden reserved words

2007-11-05 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
  This works fine in 8.2.  The reason it no longer works is that "query"
 is now a special token in the plpgsql lexer, and that means that it will
 never be substituted for by read_sql_construct().  So it's effectively
 a reserved word.

 Perhaps we should be throwing a more intelligible error if you have a
 parameter (or variable?) named in a way that will conflict?

Actually, it seems you already do get such a complaint if you try to
use a keyword as a variable name:

ERROR:  syntax error at or near "query"
LINE 2: declare query text;
^

Not the most tremendously helpful message, maybe, but at least it's
pointing at the right place.  So the problem is only for function
parameter names, which aren't lexed by plpgsql itself but by the main
parser.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] plpgsql keywords are hidden reserved words

2007-11-05 Thread Tom Lane
John DeSoi [EMAIL PROTECTED] writes:
 Is there any feasibility to the idea of allowing pl/pgsql variables  
 and parameters to be prefixed with a special character like '$'?

I find this (a) really ugly, (b) incompatible with Oracle, which
you'll recall is one of the main driving ideas for plpgsql, and
(c) almost certainly a lexical conflict with dollar quoting.

Most other special characters you might suggest would create
parsing ambiguities too.

 I'm constantly adding prefixes like 'v_' because of conflicts with table  
 or column names.

The reason you have to do that is that we got the lookup order backward:
per Oracle, column names within a query should bind more tightly than
plpgsql variable names, and if you need to disambiguate you qualify
the variables.  We should fix that instead of bogotifying the syntax.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?

2007-11-05 Thread Zdenek Kotala
I'm trying fix independence of pg_dump.c on postgres.h. And I found 
following construct in dumpSequence function:




snprintf(bufm, sizeof(bufm), INT64_FORMAT, SEQ_MINVALUE);
snprintf(bufx, sizeof(bufx), INT64_FORMAT, SEQ_MAXVALUE);

appendPQExpBuffer(query,
                  "SELECT sequence_name, last_value, increment_by, "
                  "CASE WHEN increment_by > 0 AND max_value = %s THEN NULL "
                  "     WHEN increment_by < 0 AND max_value = -1 THEN NULL "
                  "     ELSE max_value "
                  "END AS max_value, "
                  "CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL "
                  "     WHEN increment_by < 0 AND min_value = %s THEN NULL "
                  "     ELSE min_value "
                  "END AS min_value, "
                  "cache_value, is_cycled, is_called from %s",
                  bufx, bufm,
                  fmtId(tbinfo->dobj.name));


This construct is used to determine whether max_value/min_value is set to
the extreme, in which case pg_dump emits NO MAXVALUE in the output instead
of the value. If I compare it with the manual, NO MAXVALUE uses the default
value, and I do not see any reason to have this code there. I think we can
remove this code and drop the dependency on sequence.h.


Any comments?

Zdenek

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Bruce Momjian
Joshua D. Drake wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On Mon, 5 Nov 2007 12:47:10 -0500 (EST)
 Bruce Momjian [EMAIL PROTECTED] wrote:
 
  In an attempt to move us toward 8.3 RC1 I have put all open item
  emails into the patches queue:
  
  http://momjian.postgresql.org/cgi-bin/pgpatches
  
 
 It would be great if this would push to the wiki. That has been working
 really well through the cycle.

The wiki has a nice summary and links to the discussion.  I don't have
time to do that much work on this, and maintain it.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 On Mon, 5 Nov 2007 12:47:10 -0500 (EST)
 Bruce Momjian [EMAIL PROTECTED] wrote:

 In an attempt to move us toward 8.3 RC1 I have put all open item
 emails into the patches queue:
 
  http://momjian.postgresql.org/cgi-bin/pgpatches
 

 It would be great if this would push to the wiki. That has been working
 really well through the cycle.

How many developers have even jumped through the hoops to get wiki accounts?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] sending row data to frontend - reg

2007-11-05 Thread Rose Catherine K
Hi All,

we are trying to modify the source code of postgresql to support a new
command "pivot colA colB tablename"
we are doing the following steps:
1. After parsing the input, we are able to get the values entered by the
user for column A , column B and tablename.
2. Then, from PostgresMain, we call exec_simple_query with a new query,
"select * from tablename"
3. Then we intercept the result of this query in the SendRowDescriptionMessage
and printtup functions, and don't allow them to send the result of the select
query to the frontend.
4. Now, we make the required changes for the pivot command.
5. After this, from ExecutorRun, we call the SendRowDescriptionMessage
function (with modifications) to send the newly calculated row description -
this is working fine.
6. Now, we want to send the newly calculated row data to the frontend. We
are not able to do this...
Is there any way to send the row values without invoking ExecutePlan
again? - since ExecutePlan is overwriting the earlier-sent row description
(which we are not able to figure out why!)

Can anyone help us?
we would be grateful, if you could give any pointers in this regard.

Thanking you in advance,
Rose.


Re: [HACKERS] pg 8.3beta 2 restore db with autovacuum report

2007-11-05 Thread Alvaro Herrera
Guillaume Smet wrote:

 I did the test again with the reference database I used a month ago.
 
 My previous figures with 8.3devel of October 1st were:
 - autovacuum off: 14m39
 - autovacuum on, delay 20: 51m37
 
 With 8.3devel of today, I have:
 - autovacuum on, delay 20: 15m26

Yay!  Thanks!

(It does take a bit longer, but I'm not really concerned about it.)

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica (Irulan)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Postgresql 8.3 beta crash

2007-11-05 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Heikki Linnakangas wrote:
 It still feels unsafe to call ExecEvalExpr while holding on to xml
 structs. It means that it's not safe for external modules to use
 libxml2 and call xmlMemSetup or xmlSetGenericErrorFunc themselves.

 Well yeah, they shouldn't do that.  I don't think we want to support 
 that.

I'm with Heikki that it would be cleaner/safer if we could allow that.
The particular case that's bothering me is the idea that something like
Perl could well try to use libxml internally, and if so it'd very likely
call these functions.  Now if Perl thinks it's got sole control, and
tries to (say) re-use the results of xmlInitParser across calls, we're
screwed anyway.  But that's not an argument for designing our own code
in a way that guarantees it can't share libxml with other code.

So I'm thinking that we should continue to call xmlMemSetup,
xmlSetGenericErrorFunc, and xmlInitParser (if needed) at the start of
every XML function, and reorganize the code so that we don't call out
to random other code until we've shut down libxml again.

The main disadvantage I can see of reorganizing like that is that
it will increase xmlelement's transient memory consumption, since it
will need to accumulate all the OutputFunctionCall result strings
before it starts to pass them to libxml.  This probably isn't a huge
problem though.

Has anyone started actively working on this yet?  If not, I can tackle
it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?

2007-11-05 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 This construct is used to determine if max_value/min_value is used and 
 after that pg_dump add NO MAXVALUE to the output instead of the value. 
 If I compare it with manual documentation NO MAXVALUE uses default value 
 and I do not see any reason to have this code there. I think we can 
 remove this code and release dependency on sequence.h.

IIUC you are proposing that it's okay to print random-huge-values in
the dumped CREATE SEQUENCE commands.  I don't agree with that.
In particular it would make for a legacy/compatibility issue for
INT64_IS_BROKEN platforms.

A better solution might be to move the declarations of
SEQ_MINVALUE/SEQ_MAXVALUE someplace else.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] sending row data to frontend - reg

2007-11-05 Thread Tom Lane
Rose Catherine K [EMAIL PROTECTED] writes:
 we are trying to modify the source code of postgresql to support a new
 command pivot colA colB tablename
 we are doing the following steps:
 1. After parsing the input, we are able to get the values entered by the
 user for column A , column B and tablename.
 2. Then, from postgresmain, we call exec_simple_query with a new query
 select * from tablename
 3. Then we read the result of execution of this query from
 sendrowdescriptionmessage and printtup functions and also, don't allow these
 to send the result of the select query to the frontend
 4. now, we do the required changes needed for the pivot command
 5. after this, from executorrun,  we call sendrowdescriptionmessage function
 ( with modifications) to send the newly calculated row description - this is
 working fine
 6. Now, we want to send the newly calculated row data to the frontend. We
 are not able to do this...

It sounds to me like you've kluged the I/O support stuff to the point
that it doesn't work at all.  Leave that alone and use SPI to execute
your internal query.

As a general style suggestion, if you are touching postgres.c at all
to add a new command, you are putting it in the wrong place.  The guts
of it should be under backend/commands/.  I think you ought to look at
existing utility commands that return rows, such as EXPLAIN or SHOW,
for guidance.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Jeff Davis
On Mon, 2007-11-05 at 09:52 +, Heikki Linnakangas wrote:
 Reducing VACUUM time is important, but the real big promise is the 
 ability to do index-only-scans. Because that's the main focus of this 
 exercise, I'm calling it the the Visibility Map from now on, because 
 it's not about tracking dead space, but tuple visibility in general. 
 Don't worry, reduced VACUUM times on read-mostly tables with hot spots 
 will still fall out of it.

I like Visibility map because it's a positive name, and that prevents
confusion over double-negatives (for the same reason it's called
synchronous_commit even though the new feature is the ability to be
asynchronous). With Dead Space Map, I wouldn't immediately know whether
a 1 means "always visible" or "might be invisible".

However, DSM is a much less overloaded acronym than VM ;)

Regarding the focus, it will depend a lot on the user. Some people will
care more about VACUUM and some will care more about index-only scans.

 It's not useful for VACUUM FREEZE, unless we're willing to freeze much 
  more aggressively, and change the meaning of a set bit to "all tuples on 
  this heap page are frozen".
 

This means that a regular VACUUM will no longer be enough to ensure
safety from transaction id wraparound.

I don't think this will be hard to fix, but it's an extra detail that
would need to be decided. The most apparent options appear to be:

1) Do as you say above. What are some of the cost trade-offs here? It
seems that frequent VACUUM FREEZE runs would keep the visibility map
mostly full, but would also cause more writing. I suppose the worst case
is that every tuple write results in two data page writes, one
normal write and another to freeze it later, which sounds bad. Maybe
there's a way to try to freeze the tuples on a page before it's written
out?

The ideas of "frozen" and "always visible" seem very close in concept to
me.

2) Change to autovacuum to FREEZE on the forced autovacuum to prevent
wraparound.

3) Use multiple bits per visibility map

4) Have multiple types of visibility maps

The more I think about the visibility map, the more I think it will be a
huge win for PostgreSQL. It's especially nice that it works so well with
HOT. Thanks for working on it!

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?

2007-11-05 Thread Zdenek Kotala

Tom Lane wrote:



A better solution might be to move the declarations of
SEQ_MINVALUE/SEQ_MAXVALUE someplace else.


Hmm. It seems better, but it is also hard to find the correct place. :( I'm 
thinking of putting it into c.h.


Another question: why doesn't a sequence have a separate flag which 
determines whether its max value is the default/no max value or a 
predefined one?


Any comments, better ideas?

Zdenek

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?

2007-11-05 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A better solution might be to move the declarations of
 SEQ_MINVALUE/SEQ_MAXVALUE someplace else.

 Hmm. It seems better, but it is also hard to find correct place. :( I'm 
 thinking put it into c.h.

The idea that was in the back of my mind was pg_config_manual.h,
since these numbers could be seen as configuration constants if
you hold your head at the right angle ...

regards, tom lane



Re: [HACKERS] should I worry?

2007-11-05 Thread Rick Gigger
Doesn't DROP TRIGGER require the name of the trigger?  He says they  
are unnamed.  How then does he drop them?



On Nov 5, 2007, at 6:31 AM, Tom Lane wrote:


[EMAIL PROTECTED] writes:

On Sun, 4 Nov 2007, Tom Lane wrote:
So you have a *bunch* of partially broken FK constraints in that  
source

database.



I just talked to my customer and he/we'll make a big batch deleting and
recreating all foreign keys on 8.2.5.
The question is, how do we get rid of those useless unnamed triggers?


DROP TRIGGER should be fine.

regards, tom lane



Re: [HACKERS] should I worry?

2007-11-05 Thread Heikki Linnakangas

Rick Gigger wrote:
Doesn't DROP TRIGGER require the name of the trigger?  He says they are 
unnamed.  How then does he drop them?


They're not really unnamed. pg_dump just replaces the real name with 
"unnamed".


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Though 8.3 isn't out of the oven just yet, I've been thinking about the 
 dead space map a lot, and decided I have to start writing down those 
 thoughts.

I think we should do this at the same time as pushing the FSM out of
shared memory, and design a data structure that serves both needs.

 Where to store the visibility map?
 --

 a) In a fixed size shared memory struct. Not acceptable.

 b) In the special area of every nth heap page. I played a bit with this 
 back in February 2006, because it's simple and requires few changes.

 c) As a new relation kind, like toast tables.

 d) As an auxiliary smgr relation, attached to the heap.

 I'm leaning towards D at the moment. It requires a little bit of changes 
 to the buffer manager API and elsewhere, but not too much.

I like B.  The main objection I have to D is that it'll be far more
invasive to the buffer manager (and a bunch of other code) than you
admit; for starters RelFileNode won't work as-is.  Another problem with
D is that you can't readily make the number of blocks per visibility
page an exact power of 2, unless you are willing to waste near half of
each visibility page.  That will complicate and slow down addressing
... ok, maybe not much, but some.

What I'm envisioning is that we dedicate a quarter or half of every n'th
heap page to visibility + free space map.  Probably a byte per page is
needed (this would give us 1 visibility bit and 7 bits for free space,
or other tradeoffs if needed).  So there would be 2K or 4K heap pages
associated with each such special page.  Or we could dial it down to
even less, say 1K heap pages per special page, which would have the
advantage of reducing update contention for those pages.

 Setting a bit is just a hint. It's ok to lose it on crash. However, 
 setting a bit mustn't hit the disk too early. What might otherwise 
 happen is that the change that made all tuples on a page visible, like 
 committing an inserting transaction, isn't replayed after crash, but the 
 set bit is already on disk. In other words, setting a bit must set the 
 LSN of the visibility map page.

I don't think this really works.  You are effectively assuming that no
kind of crash-induced corruption can set a bit that you didn't intend
to set.  Stated in those terms it seems obviously bogus.  What we will
need is that every WAL-logged update operation also include the
appropriate setting or clearing of the page's visibility bit; where
necessary, add new information to the WAL trace to allow this.

 To further reduce contention, we can have a copy of the bit in the page 
 header of the heap page itself. That way we'd only need to access the 
 visibility map on the first update on a page that clears a bit.

Seems exceedingly prone to corruption.

 - We don't need to clear the bit on HOT updates, because by definition 
 none of the indexed columns changed.

Huh?  I don't think I believe that, and I definitely don't believe your
argument for it.

regards, tom lane



Re: [HACKERS] should I worry?

2007-11-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Rick Gigger wrote:
 Doesn't DROP TRIGGER require the name of the trigger?  He says they are 
 unnamed.  How then does he drop them?

 They're not really unnamed. pg_dump just replaces the real name with 
 unnamed.

And \d will show the real names of the triggers, so it's not really
that hard to drop them:

u=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 f1     | integer | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (f1)
Triggers:
    "RI_ConstraintTrigger_229629" AFTER DELETE ON t1 FROM t2 NOT DEFERRABLE 
INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE 
RI_FKey_noaction_del('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')
    "RI_ConstraintTrigger_229630" AFTER UPDATE ON t1 FROM t2 NOT DEFERRABLE 
INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE 
RI_FKey_noaction_upd('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')

u=# drop trigger "RI_ConstraintTrigger_229629" on t1;
DROP TRIGGER

I do recall newbies forgetting to double-quote the mixed-case trigger
names when this came up in times past, though.

regards, tom lane



Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Joshua D. Drake

On Mon, 05 Nov 2007 18:57:39 +
Gregory Stark [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
 
  On Mon, 5 Nov 2007 12:47:10 -0500 (EST)
  Bruce Momjian [EMAIL PROTECTED] wrote:
 
  In an attempt to move us toward 8.3 RC1 I have put all open item
  emails into the patches queue:
  
 http://momjian.postgresql.org/cgi-bin/pgpatches
  
 
  It would be great if this would push to the wiki. That has been
  working really well through the cycle.
 
 How many developers have even jumped through the hoops to get wiki
 accounts?

You don't need a wiki account to view.
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Segmentation fault using digest from pg_crypto

2007-11-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 Marko Kreen wrote:
  On 8/24/07, Manuel Sugawara [EMAIL PROTECTED] wrote:

   If something is not going to work (or is going to work in a different
   way) in some version after loading a dump from a previous one I think
   it should be documented or some kind of backwards compatibility
   mechanism should be provided.
  
  That's a valid complaint and I take the blame.
  
  The problem was that such evolution was not actually planned.
  So when I noticed the 8.2 commit, I did not think of the implications
  hard enough to realize the need for release note for it.
  
  Tom, how about putting a note about that into next 8.2 minor
  release notes?  (8.3 too?)  Something like "You need to refresh
  pgcrypto functions, because since rel 8.2 the code depends
  on functions being tagged STRICT."
 
 Seems 8.2.5 was released without this release notes mention, but we
 haven't gotten any complaints about it so perhaps we don't need to add
 anything.

Huh, I see exactly that complaint above.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"I agree with you that absolute truth does not exist...
The problem is that lies do exist, and you are lying" (G. Lama)



Re: [HACKERS] Segmentation fault using digest from pg_crypto

2007-11-05 Thread Bruce Momjian
Alvaro Herrera wrote:
If something is not going a work (or is going a work in a different
way) in some version after loading a shot form a previous one I think
it should be documented or some kind of backwards compatibility
mechanism should be provided.
   
   That's a valid complaint and I take the blame.
   
   The problem was that such evolution was not actually planned.
   So when I noticed the 8.2 commit, I did not think of the implications
   hard enough to realize the need for release note for it.
   
   Tom, how about putting a note about that into next 8.2 minor
   release notes?  (8.3 too?)  Something like "You need to refresh
   pgcrypto functions, because since rel 8.2 the code depends
   on functions being tagged STRICT."
  
  Seems 8.2.5 was released without this release notes mention, but we
  haven't gotten any complaints about it so perhaps we don't need to add
  anything.
 
 Huh, I see exactly that complaint above.

My point is that we can't mention it in the release notes until 8.2.6. 
Will there still be people who are having an issue with it who haven't
found the problem already?  And if we put it in 8.2.6, it really was
effective as of 8.2, so do we just mention it in 8.2.6 and say it applies
to all of 8.2.X?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Test lab

2007-11-05 Thread Mark Wong
On 11/4/07, Simon Riggs [EMAIL PROTECTED] wrote:
 Mark,

 Why don't you post a TODO list for TPC-E somewhere, so people can bite
 small pieces off of the list. I'm sure there's lots of people can help
 if we do it that way.

This should be a good start:

http://osdldbt.sourceforge.net/dbt5/todo.html

Regards,
Mark



Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Heikki Linnakangas

Jeff Davis wrote:

On Mon, 2007-11-05 at 09:52 +, Heikki Linnakangas wrote:
It's not useful for VACUUM FREEZE, unless we're willing to freeze much 
more aggressively, and change the meaning of a set bit to "all tuples on 
the heap page are frozen".


This means that a regular VACUUM will no longer be enough to ensure
safety from transaction id wraparound.


Good point. So we'd still need regular VACUUMs every now and then.

(Gosh, we really need a name for this sort of vacuum. I was about to say 
we'd still need "regular" regular VACUUMs :-))



I don't think this will be hard to fix, but it's an extra detail that
would need to be decided. The most apparent options appear to be:

1) Do as you say above. What are some of the cost trade-offs here? It
seems that frequent VACUUM FREEZE runs would keep the visibility map
mostly full, but will also cause more writing. I suppose the worst case
is that every tuple write results in two data page writes, one
normal write and another to freeze it later, which sounds bad. Maybe
there's a way to try to freeze the tuples on a page before it's written
out?


It would also create more WAL traffic, because freezing tuples needs to 
be WAL-logged.



2) Change autovacuum to FREEZE on the forced autovacuum to prevent
wraparound.


Doesn't necessarily need to be a VACUUM FREEZE. Just a regular VACUUM 
instead of using the visibility map.



3) Use multiple bits per visibility map


That would work.


4) Have multiple types of visibility maps


I'd rather not do that. It starts to get more complex and more expensive 
to update.


5) Have a more fine-grain equivalent of relfrozenxid. For example one 
frozenxid per visibility map page, so that whenever you update the 
visibility map, you also update the frozenxid. To advance the 
relfrozenxid in pg_class, you scan the visibility map and set 
relfrozenxid to the smallest frozenxid. Unlike relfrozenxid, it could be 
set to FrozenXid if the group of pages is totally frozen.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] should I worry?

2007-11-05 Thread Rick Gigger

Ah, yes it was the quotes.  I guess that makes me a newbie.  :)


On Nov 5, 2007, at 1:53 PM, Tom Lane wrote:


Heikki Linnakangas [EMAIL PROTECTED] writes:

Rick Gigger wrote:
Doesn't DROP TRIGGER require the name of the trigger?  He says  
they are

unnamed.  How then does he drop them?



They're not really unnamed. pg_dump just replaces the real name with
"unnamed".


And \d will show the real names of the triggers, so it's not really
that hard to drop them:

u=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 f1     | integer | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (f1)
Triggers:
    "RI_ConstraintTrigger_229629" AFTER DELETE ON t1 FROM t2 NOT  
DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE  
RI_FKey_noaction_del('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2',  
'f1')
    "RI_ConstraintTrigger_229630" AFTER UPDATE ON t1 FROM t2 NOT  
DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE  
RI_FKey_noaction_upd('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2',  
'f1')

u=# drop trigger "RI_ConstraintTrigger_229629" on t1;
DROP TRIGGER

I do recall newbies forgetting to double-quote the mixed-case trigger
names when this came up in times past, though.

regards, tom lane






Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
Though 8.3 isn't out of the oven just yet, I've been thinking about the 
dead space map a lot, and decided I have to start writing down those 
thoughts.


I think we should do this at the same time as pushing the FSM out of
shared memory, and design a data structure that serves both needs.


I'm afraid the data structure required for FSM is very different.

For the visibility map, we need something that can be quickly addressed 
by heap block number. For FSM, we need something that's addressable by 
tuple size.


Cache locality of having the FSM and the visibility map in the same 
structure is also not very good. Visibility map is accessed by reads, 
and updates/inserts/deletes on pages that previously had a set bit in 
the map, while the FSM is only needed for cold updates and deletes.



Where to store the visibility map?
--



a) In a fixed size shared memory struct. Not acceptable.


b) In the special area of every nth heap page. I played a bit with this 
back in February 2006, because it's simple and requires few changes.



c) As a new relation kind, like toast tables.



d) As an auxiliary smgr relation, attached to the heap.


I'm leaning towards D at the moment. It requires a little bit of changes 
to the buffer manager API and elsewhere, but not too much.


I like B.  The main objection I have to D is that it'll be far more
invasive to the buffer manager (and a bunch of other code) than you
admit; for starters RelFileNode won't work as-is.


One problem is that you have to atomically update the visibility map 
when you update the heap. That means that you have to lock the 
visibility map page and the heap page at the same time. If the 
visibility map is in the heap, you need to take care that you don't 
deadlock.


We can't directly use B for the FSM, because we also need a FSM for 
indexes, so we'll need something else for indexes anyway.


B is also problematic if we ever get to do upgrades without 
dump/restore, because it requires changes to the format of the heap itself.



What I'm envisioning is that we dedicate a quarter or half of every n'th
heap page to visibility + free space map.  Probably a byte per page is
needed (this would give us 1 visibility bit and 7 bits for free space,
or other tradeoffs if needed).  So there would be 2K or 4K heap pages
associated with each such special page.  Or we could dial it down to
even less, say 1K heap pages per special page, which would have the
advantage of reducing update contention for those pages.


How would you search that for a page with X bytes of free space?


I don't think this really works.  You are effectively assuming that no
kind of crash-induced corruption can set a bit that you didn't intend
to set.  


Don't we already assume that for hint-bit updates?


Stated in those terms it seems obviously bogus.  What we will
need is that every WAL-logged update operation also include the
appropriate setting or clearing of the page's visibility bit; where
necessary, add new information to the WAL trace to allow this.


I think we already emit a WAL record whenever we would set the bit in 
the visibility map, so we can certainly do that. It was more an 
interesting observation than anything else.


To further reduce contention, we can have a copy of the bit in the page 
header of the heap page itself. That way we'd only need to access the 
visibility map on the first update on a page that clears a bit.


Seems exceedingly prone to corruption.


It does?

It seems fairly simple to keep it up-to-date. Whenever we update the 
visibility map, we're holding the heap page locked as well.


If you were thinking of hardware failure, I don't see how that bit would 
be more susceptible to that, and the potential damage isn't any bigger 
than from any other random bit-flip either.


We can double-check and correct both the bit in the page header and in 
the visibility map whenever we perform a regular vacuum (or prune), if 
they did get corrupt for some reason.


- We don't need to clear the bit on HOT updates, because by definition 
none of the indexed columns changed.


Huh?  I don't think I believe that, and I definitely don't believe your
argument for it.


HOT-updating a row doesn't change what an index-only scan would see. An 
index-only scan only needs columns that are in the index, and since it's 
a HOT update, none of those columns have changed, so it doesn't matter 
which tuple we're returning them from.


Pages that have HOT updated tuples, but haven't otherwise been modified 
since last vacuum are also not interesting for VACUUM, because a prune 
will do the job of getting rid of dead HOT-updated tuples just as well.


Am I missing something?


Another thought: Should we support having a FSM and visibility map on 
temp tables? Doesn't seem very useful, but why not, I guess.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] plpgsql keywords are hidden reserved words

2007-11-05 Thread John DeSoi


On Nov 5, 2007, at 10:20 AM, Tom Lane wrote:


I don't see any fix for this that's reasonable to try to shoehorn
into 8.3, but I think we really need to revisit the whole area of
plpgsql variable substitution during 8.4.  We could make this problem
go away if variable substitution happened through a parser callback
instead of before parsing.



Is there any feasibility to the idea of allowing pl/pgsql variables  
and parameters to be prefixed with a special character like '$'? I'm  
constantly adding prefixes like 'v_' because of conflicts with table  
or column names. It would be nice to have something like "declare  
$myvar integer;" so it would be very easy to distinguish variable and  
parameter names from structure names or reserved words.




John DeSoi, Ph.D.






Re: [HACKERS] A small rant about coding style for backend functions

2007-11-05 Thread Brendan Jurd
On 11/6/07, Bruce Momjian [EMAIL PROTECTED] wrote:

 I understand your suggestions but it seems there would be too many
 individual items to be readable.  Can you suggest a full list so we can
 get an idea of how long it would be?

If the body of material on writing good Postgres code becomes so
comprehensive that it doesn't all fit on one page:

 a) I would see that as a positive!
 b) We can of course split it up into sub-articles.

I can't realistically suggest a full list since I am not an
experienced Postgres hacker.  But I would hope for, as a minimum:

 * the stuff about good GETARG style,
 * something about using ereport() effectively, and writing
localization-friendly messages,
 * some actual coding style guidelines.

Regards,
BJ



Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Jeremy Drake
On Mon, 5 Nov 2007, Gregory Stark wrote:

 How many developers have even jumped through the hoops to get wiki accounts?

According to
http://developer.postgresql.org/index.php?title=Special:Listusers&group=pgdev&limit=500

there are currently 51 members of the group pgdev on the wiki.


-- 
Spare no expense to save money on this one.
-- Samuel Goldwyn



Re: [HACKERS] Open items for 8.3

2007-11-05 Thread Magnus Hagander
  How many developers have even jumped through the hoops to get wiki accounts?
 
 According to
 http://developer.postgresql.org/index.php?title=Special:Listusers&group=pgdev&limit=500
 
 there are currently 51 members of the group pgdev on the wiki.
 

Well, a lot of those people aren't actually developers, which was the question. 
But a lot of them are. More interesting would be how many have ever edited 
anything past just signing up...

/Magnus



Re: [HACKERS] Hash index todo list item

2007-11-05 Thread Shreya Bhargava
Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good a choice as btree indexes but a
significantly better choice, at least for some specific circumstances.


We performed some probe tests of our hash index patch and the 
original btree index to compare the performance between the two. 
We used an 80 million tuple table. The table contained a single 
integer attribute, and the data range was 0 - 1 (generated by a 
random generator).
We did the following:

1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both the Linux cache and psql buffers.
   (exiting psql and restarting it cleared the psql buffers;
    to clear the Linux cache, we used the drop_caches command)
4. start psql
5. select on an integer in the range of values in the table.
   (all test numbers were big ones, like 98934599)
6. record the time.
7. exit psql.
8. drop caches.(as described above)
9. repeat 4-8 for different numbers.
10. Drop Hash index.
11. Create Btree index and repeat 3-9.
 
The results are as shown below. All trials are in ms. 
Count is the number of such tuples in the table.
 
 HASH INDEX:
 Number    Count  Trial1  Trial2  Trial3
 21367898  2      152.0   129.5   131.1
 95678699  2      140.6   145.6   137.5
 99899799  2      148.0   147.4   152.6
 97677899  2      142.0   153.5   112.0
 94311123  2      137.6   146.3   141.3
 67544455  2      141.6   144.6   152.7
 88877677  2      122.1   123.1   130.8
 88788988  2      150.2   150.0   171.7
 4444      1      119.9   116.3   117.6
 34267878  1      97.5    99.9    114.8
 55489781  2      115.7   117.2   145.3
 97676767  1      169.5   168.5   181.7
 99767564  1      142.7   133.6   132.7
 21367989  4      198.0   193.2   186.7
 
 BTREE INDEX:
 Number    Trial1  Trial2  Trial3
 21367898  145.5   145.6   150.6
 95678699  177.5   170.0   176.4
 99899799  175.4   181.2   185.4
 97677899  136.9   149.0   130.8
 94311123  179.0   175.3   166.3
 67544455  161.7   162.2   170.4
 88877677  138.7   135.2   149.9
 88788988  165.7   179.3   186.3
 4444      166.0   172.8   184.3
 34267878  159.1   168.8   147.8
 55489781  183.2   195.4   185.1
 97676767  147.2   143.6   132.6
 99767564  167.8   178.3   162.1
 21367989  232.3   238.1   216.9
From the results obtained, the average of all the hash probes is 141.8 ms, 
the average for btree is 168.5 ms, a difference of about 27. The standard 
deviations are about 23, so this is a statistically significant difference.

Our prediction was that the hash index would take on average one probe
at about 10 ms and the btree would take three probes at about 30 ms, a
difference of about 20 ms; that was borne out pretty well by the
difference we got of about 27. Hope these data points will help with some
questions about the performance differences between hash and btree indexes.

Regards,
Shreya




Gregory Stark [EMAIL PROTECTED] wrote: Kenneth Marshall  writes:

 On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:

 Using our implementation, build times and index sizes are
 comparable with btree index build times and index sizes.
...
 That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly in the graphs it seems the index size graph has a logarithmic x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential, which would mean linear
growth, but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird
since I would expect there would be a visible extra component since sort times
are n-log(n). Perhaps you need to test still larger tables to see that though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).


Re: [HACKERS] Visibility map thoughts

2007-11-05 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 One problem is that you have to atomically update the visibility map when
 you update the heap. That means that you have to lock the visibility map
 page and the heap page at the same time. If the visibility map is in the
 heap, you need to take care that you don't deadlock.

Well that's still a problem if it's in another filenode.

On the other hand if you allocate a whole byte to every page you could set it
atomically and not have to lock the page. Effectively having the lock on the
original page protect the info byte. Whereas setting a single bit requires
protecting against someone setting one of the other bits corresponding to
another page entirely.

 - We don't need to clear the bit on HOT updates, because by definition none
 of the indexed columns changed.

 Huh?  I don't think I believe that, and I definitely don't believe your
 argument for it.

 HOT-updating a row doesn't change what an index-only scan would see. An
 index-only scan only needs columns that are in the index, and since it's a HOT
 update, none of those columns have changed, so it doesn't matter which tuple
 we're returning them from.

 Pages that have HOT updated tuples, but haven't otherwise been modified since
 last vacuum are also not interesting for VACUUM, because a prune will do the
 job of getting rid of dead HOT-updated tuples just as well.

 Am I missing something?

I think the point is that for index-only scans you really just want to know
the visibility of the whole chain. The whole chain is either known-visible or
not. A HOT update doesn't change that, it just changes which version along the
chain is visible and the values of the non-indexed columns in that version.

Some thought has to be given to the transition stages when you create or drop
an index but I don't see a problem there. If you have a broken hot chain it
doesn't change the visibility rules for anyone who does use an old index (and
nobody who could see the broken end of the chain should be using the new index
anyways).

The problem with this is that it's different from what a DSM would need. We
could skip vacuuming such HOT updated dead tuples, assuming a page prune will
get them the next time we access the page, I suppose. Or we could use a separate
bit for more aggressive vacuum information.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
