[HACKERS] Fwd: Clarification about HOT
Forgot to include the group... -- Forwarded message -- From: Gokulakannan Somasundaram [EMAIL PROTECTED] Date: Nov 5, 2007 3:04 PM Subject: Re: Clarification about HOT To: Gregory Stark [EMAIL PROTECTED] On 11/2/07, Gregory Stark [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram [EMAIL PROTECTED] writes: Another reason is that it avoids the whole problem of updating multiple pages atomically, without deadlocks. Thanks Heikki. I am still not getting what you said. In the case of HOT, you need to update the top pointer to point to some other tuple in some other page. That's one update. What's the other one? There are several problems; two that come to mind are: 1) How do you make the dead top pointer redirect to the first live pointer in the chain? Currently we store the item number of the first live tuple in the line pointer. You would have to keep the tuple around, though you could truncate it to just the tuple header. 2) When vacuuming, how do you find the head of the chain when you're looking at a dead tuple? There's no space in the tuple header to store the head of the chain. Besides, you want to vacuum scanning sequentially, not randomly. Thanks for all the inputs. My question would be: what if we decide to update the top of the HOT chain in the index itself? Right now we carry a list of tuple-ids to be vacuumed when we vacuum the index. Say we carry another list (or some better mechanism), which would carry the corresponding live HOT tuple to be pointed to. In other words, we would try to make the index point to the top of the HOT chain during vacuum. Am I making some mistake? -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
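Greg's point 1) above can be sketched in miniature. The following is an illustrative Python simulation, not PostgreSQL code: `Page`, `REDIRECT`, and `fetch` are invented names standing in for the heap page, the redirect line-pointer state, and tuple lookup. The point it demonstrates is that the index entry keeps referencing the original line pointer, which is turned into a redirect when the chain root dies.

```python
# Conceptual sketch of HOT line-pointer redirection -- an illustration,
# not PostgreSQL code. All names here are invented for the example.

REDIRECT, NORMAL = "redirect", "normal"

class Page:
    def __init__(self):
        self.line_pointers = {}          # item number -> (state, payload)

    def fetch(self, item):
        state, payload = self.line_pointers[item]
        if state == REDIRECT:            # follow the redirect left by pruning
            return self.fetch(payload)
        return payload

page = Page()
page.line_pointers[1] = (NORMAL, "v1 (dead, HOT-updated)")  # chain root
page.line_pointers[2] = (NORMAL, "v2 (live)")               # newer version

# Pruning: the root tuple at item 1 is dead, so its line pointer becomes
# a redirect to item 2 and the tuple storage is reclaimed. The index
# entry still references item 1 and never needs to change.
page.line_pointers[1] = (REDIRECT, 2)

assert page.fetch(1) == "v2 (live)"
```

This also shows why a cross-page chain is harder: a redirect within the page is cheap, but a dead root on another page would leave nothing at the old item to redirect from without keeping at least the truncated tuple header around.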
[HACKERS] Visibility map thoughts
Though 8.3 isn't out of the oven just yet, I've been thinking about the dead space map a lot, and decided I have to start writing down those thoughts. Reducing VACUUM time is important, but the real big promise is the ability to do index-only scans. Because that's the main focus of this exercise, I'm calling it the Visibility Map from now on, because it's not about tracking dead space, but tuple visibility in general. Don't worry, reduced VACUUM times on read-mostly tables with hot spots will still fall out of it. What to store in the Visibility Map? Each potential user of the visibility map needs slightly different information: a) VACUUM needs to know if a page has enough dead tuples that it's interesting to visit. b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that can be frozen. c) Index-only scans need to know if all tuples on a page are visible to all transactions. HOT updates are OK, though. From this point on, this document concentrates on a map suited for index-only scans. The basic structure is a bitmap with one bit per heap page. A set bit means all tuples on the heap page are visible to all transactions. A cleared bit means that we don't know if that's true or not. This kind of map is useful for VACUUM as well, though VACUUM could get away with less strict semantics, and we might not want to visit a page in VACUUM if there are only a few dead tuples on it. What this implementation gives us is a conservative VACUUM that will always find the same dead tuples as a regular VACUUM (except for HOT updated tuples, which can be pruned without scanning indexes). A VACUUM using the visibility map can't update stats, so we'd still want regular vacuums every now and then, or rely on ANALYZE to do that. It's not useful for VACUUM FREEZE, unless we're willing to freeze much more aggressively, and change the meaning of a set bit to "all tuples on the heap page are frozen". Other kinds of visibility maps could be useful in some narrow scenarios. 
For example, if we had a map of pages that have no live tuples, we could skip those pages in sequential scans. We could also use more than one bit per page to store more information, but let's keep it simple for now. Where to store the visibility map? -- a) In a fixed size shared memory struct. Not acceptable. b) In the special area of every nth heap page. I played a bit with this back in February 2006, because it's simple and requires few changes. c) As a new relation kind, like toast tables. d) As an auxiliary smgr relation, attached to the heap. I'm leaning towards D at the moment. It requires a few changes to the buffer manager API and elsewhere, but not too many. B is not good because we might want to have other kinds of auxiliary files in the future (FSM in particular). And it is a bit hacky anyway; let's do something more generic. C doesn't seem like the right approach, because a visibility map doesn't really have much in common with real relations. In any case, the bitmap is divided into pages, the pages live in shared_buffers, and they are managed by the buffer manager like any other page. Updating the visibility map --- A single page in the visibility map covers a wide range of heap pages ((8192 - page header) * 8 = ~65000 heap pages). It can easily become a locking bottleneck if every update on any page in that range needs to lock the visibility map page. Setting a bit is just a hint. It's OK to lose it on crash. However, setting a bit mustn't hit the disk too early. What might otherwise happen is that the change that made all tuples on a page visible, like committing an inserting transaction, isn't replayed after a crash, but the set bit is already on disk. In other words, setting a bit must set the LSN of the visibility map page. Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's OK if it hits the disk before the operation that cleared it. 
Therefore we don't need to set the LSN, but it must be redone at WAL replay of heap insert/update/delete. Because a concurrent update on the same word might be lost if we didn't lock the page at all, we need to lock the visibility map page to set or clear a bit. Since these are quick operations, especially clearing a bit, perhaps we could use just the buffer header spinlock. To further reduce contention, we can have a copy of the bit in the page header of the heap page itself. That way we'd only need to access the visibility map on the first update on a page that clears a bit. When to update the visibility map? -- - Clear bit on every update/insert/delete. - Set bit in vacuum or prune, if all tuples on page are now visible to all transactions. - If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't set the bit yet, but as soon as the
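The semantics described above (one bit per heap page, conservative clearing on every modification, setting only by vacuum/prune) can be sketched as follows. This is a hedged Python illustration, not the actual implementation: `VisibilityMap` is an invented class, and the 24-byte page header size is an assumed figure used only to show the coverage arithmetic.

```python
# Rough sketch of the proposed one-bit-per-heap-page visibility map.
# A set bit claims "all tuples on this heap page are visible to all
# transactions"; any insert/update/delete must conservatively clear it.
# Names and the header size are assumptions for illustration.

MAP_PAGE_SIZE = 8192
PAGE_HEADER = 24                      # assumed header size, for illustration
HEAP_PAGES_PER_MAP_PAGE = (MAP_PAGE_SIZE - PAGE_HEADER) * 8   # ~65000

class VisibilityMap:
    def __init__(self):
        self.bits = set()             # heap page numbers whose bit is set

    def set_bit(self, heap_page):     # only vacuum/prune may do this, and
        self.bits.add(heap_page)      # it must also bump the map page's LSN

    def clear_bit(self, heap_page):   # done by every insert/update/delete;
        self.bits.discard(heap_page)  # must be redone at WAL replay

    def all_visible(self, heap_page):
        return heap_page in self.bits

vm = VisibilityMap()
vm.set_bit(7)                         # vacuum found page 7 all-visible
assert vm.all_visible(7)
vm.clear_bit(7)                       # a later update dirties the page
assert not vm.all_visible(7)
```

The coverage constant makes the locking concern concrete: with an 8 kB map page covering on the order of 65,000 heap pages, a naive scheme would funnel updates on half a gigabyte of heap through one map-page lock.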
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: Thanks for all the inputs. My question would be if we decide to update the top of the HOT chain in the Index itself. Right now we are carrying a list of tuple-ids to be vacuumed, when we vacuum the index. Say we carry another list (or through some better mechanism), which would carry the corresponding live HOT tuple to be pointed. In other words we would try to make the index point to the top of the HOT chain during Vacuum. Yeah, we could do that. It was discussed in Spring, along with many other alternatives. Search the archives for pointer swinging. Basically, we decided we can live without it for now. It would've required quite a bit of changes, for not that much gain. We might still want it in the future if there's demand for it. If you really need to recover those 4 bytes per HOT chain, you can use VACUUM FULL, though it does take an exclusive lock on the table. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: Thanks for all the inputs. My question would be if we decide to update the top of the HOT chain in the Index itself. Right now we are carrying a list of tuple-ids to be vacuumed, when we vacuum the index. Say we carry another list (or through some better mechanism), which would carry the corresponding live HOT tuple to be pointed. In other words we would try to make the index point to the top of the HOT chain during Vacuum. Yeah, we could do that. It was discussed in Spring, along with many other alternatives. Search the archives for pointer swinging. Basically, we decided we can live without it for now. It would've required quite a bit of changes, for not that much gain. We might still want it in the future if there's demand for it. If you really need to recover those 4 bytes per HOT chain, you can use VACUUM FULL, though it does take an exclusive lock on the table. I think pointer swinging is still about maintaining the HOT chain within a page. Actually I am thinking about continuing the HOT chain across pages. The advantages are obvious: a) indexes need not get updated unless their values are updated, even if the inserts go into new pages, and b) a much smaller index footprint. Maybe I am missing something in the big picture. Please clarify that for me. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: I think pointer swinging is still about maintaining the HOT chain within a page. Actually i am thinking about continuing the HOT chain across pages. AFAICS, pointer-swinging would be exactly the same on cross-page HOT chains as same-page chains. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: I think pointer swinging is still about maintaining the HOT chain within a page. Actually I am thinking about continuing the HOT chain across pages. AFAICS, pointer-swinging would be exactly the same on cross-page HOT chains as same-page chains. When I read pointer-swinging, it talks a lot about in-page updates, pointing to the latest tuple instead of the oldest tuple, circular pointers, etc. Maybe I am missing the post you are referring to; if so, please correct me on the steps I am describing. This scheme, if implemented, would avoid the use of the HOT stub and avoid the need to go to the index, even if some other indexes are getting updated / even if the updated new tuple goes into a new page. a) Whenever we update, we will update only those indexes whose attributes are updated. So the index will point to the top of the HOT chain. b) Whenever we vacuum the index, we take a list of tids and check whether there are any index tuples pointing to them. If the vacuumed tuple is the start of the HOT chain, then we will carry the next in-line HOT tuple when we go to vacuum the index. If the next in-line also satisfies the vacuum, it will carry with it the next in-line HOT tuple-id. In this way, we will make sure the index points to a live tuple after the vacuum. This will remove the in-page pruning exercises, but as I said already, the cost of updates will go down a lot with normal indexes. Can you please tell me the disadvantages of the above-mentioned approach? -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: AFAICS, pointer-swinging would be exactly the same on cross-page HOT chains as same-page chains. When I read pointer-swinging, it talks a lot about in-page updates, pointing to the latest tuple instead of the oldest tuple, circular pointers, etc. Maybe I am missing the post you are referring to; if so, please correct me on the steps I am describing. It talks about in-page updates etc. because we are only doing HOT updates within a page. The pointer-swinging would still work across page boundaries, AFAICS. This scheme, if implemented, would avoid the use of the HOT stub and avoid the need to go to the index, even if some other indexes are getting updated / even if the updated new tuple goes into a new page. You'd still have the HOT stubs until the next VACUUM. a) Whenever we update, we will update only those indexes whose attributes are updated. So the index will point to the top of the HOT chain. I don't see how pointer-swinging would solve the problem of updating just some indexes. On vacuum, you'd have to know which indexes were updated, remove the old pointers in the ones that were updated, and pointer-swing the others. You could store that information in the heap tuple, but that means more bloat in the heap, and more complexity. We did talk about that back in spring as well, but all the suggestions were quite complex. Besides, as soon as you insert at least one new index pointer, you can no longer remove the dead HOT updated tuple without scanning at least that one index. b) Whenever we vacuum the index, we take a list of tids and check whether there are any index tuples pointing to them. If the vacuumed tuple is the start of the HOT chain, then we will carry the next in-line HOT tuple when we go to vacuum the index. If the next in-line also satisfies the vacuum, it will carry with it the next in-line HOT tuple-id. Sorry, I didn't understand that. 
But the way you described it earlier, it's exactly the same thing as the pointer-swinging we talked about in spring. Is it the same or not? This will remove the in-page pruning exercises, but as I said already, the cost of updates will go down a lot with normal indexes. We don't want to get rid of the in-page pruning. It allows us to reclaim dead space without having to VACUUM. That's a major point of HOT. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fwd: Clarification about HOT
b) Whenever we vacuum the index, we take a list of tids and check whether there are any index tuples pointing to them. If the vacuumed tuple is the start of the HOT chain, then we will carry the next in-line HOT tuple when we go to vacuum the index. If the next in-line also satisfies the vacuum, it will carry with it the next in-line HOT tuple-id. Sorry, I didn't understand that. But the way you described it earlier, it's exactly the same thing as the pointer-swinging we talked about in spring. Is it the same or not? The only extra overhead is that we will need more memory during vacuum. We are currently calling the tid_reaped / lazy_tid_reaped function. It does a binary search to check whether the tid pointed to by the index is present in its array/list. If it is present, it means that the tuple is ready for vacuum. For HOT tuples, this list will carry a replacement tid (the next in-line HOT tuple). So instead of removing the index tuples, we will update the tid part of the index tuples. So there is no HOT stub here. The index will point to the live HOT tuple (which would also be the head of the chain). Say the index was previously pointing to (3,4). When (3,4) becomes ready to vacuum, it will supply a replacement tid, the one in its t_data, say (5,6). So once the vacuum identifies the correct index tuple, it will update the tid portion of the index tuple to (5,6). Please advise me on whether I am missing something / not being clear in the explanation. I think I am very poor at understanding things the first time, and also very poor at putting across my point the first time. Please bear with that :) This will remove the in-page pruning exercises, but as I said already, the cost of updates will go down a lot with normal indexes. We don't want to get rid of the in-page pruning. It allows us to reclaim dead space without having to VACUUM. That's a major point of HOT. But we are going to get the index sizes very small, and we are also going to reduce the cost of updates. 
Isn't that sufficient reason for us? -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
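The replacement-tid mechanism described above can be sketched as follows. This is an illustrative Python simulation of the idea under the thread's assumptions, not PostgreSQL's actual lazy_tid_reaped code; `reaped`, `dead_tids`, and `vacuum_index` are invented names. A tid is written as a (page, item) pair, matching the (3,4) and (5,6) notation in the message.

```python
# Illustrative sketch of the "replacement tid" idea (what the thread
# calls pointer swinging): instead of just reaping a dead HOT root's
# index entry, vacuum swings the entry to the next live chain member.
import bisect

# Sorted list of (dead_tid, replacement_tid or None) gathered from the heap.
reaped = [((3, 4), (5, 6)),   # dead HOT root: swing index entry to (5, 6)
          ((8, 1), None)]     # plain dead tuple: remove index entry
dead_tids = [t for t, _ in reaped]

def vacuum_index(index_entries):
    """Return the index entries after one vacuum pass."""
    result = []
    for key, tid in index_entries:
        i = bisect.bisect_left(dead_tids, tid)          # binary search,
        if i < len(dead_tids) and dead_tids[i] == tid:  # like lazy_tid_reaped
            replacement = reaped[i][1]
            if replacement is not None:
                result.append((key, replacement))       # swing the pointer
            # else: drop the entry entirely, as vacuum does today
        else:
            result.append((key, tid))                   # untouched entry
    return result

entries = [("a", (3, 4)), ("b", (8, 1)), ("c", (2, 2))]
assert vacuum_index(entries) == [("a", (5, 6)), ("c", (2, 2))]
```

The extra memory overhead Gokul mentions is visible here: the vacuum working set doubles, since each dead root tid must carry its replacement alongside it.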
Re: [HACKERS] should I worry?
Hi Tom On Sun, 4 Nov 2007, Tom Lane wrote: Date: Sun, 04 Nov 2007 19:47:04 -0500 From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] should I worry? I wrote: Hmm, this is messier than I thought. What evidently has happened is that at one time or another, one of the two tables involved in an FK relationship has been dropped and re-created. If you'd had proper FK constraints the constraints would have gone away cleanly, but with these old trigger definitions there was no mechanism to make that happen, and so the triggers on the other table remained in place. That seems to have happened several times, in fact. After tweaking ConvertTriggerToFK() to be more verbose and to insist on finding all three triggers, I get this: NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_position(ppos_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY objet_position(pobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY objet_position(pobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: converting trigger group into constraint unnamed FOREIGN KEY objet_position(pobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referenced table's UPDATE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_objets(perobj_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. 
NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: converting trigger group into constraint unnamed FOREIGN KEY perso_objets(perobj_obj_cod) REFERENCES objets(obj_cod) DETAIL: Found referenced table's UPDATE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: converting trigger group into constraint unnamed FOREIGN KEY messages_dest(dmsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referenced table's UPDATE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_dest(dmsg_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: converting trigger group into constraint unnamed FOREIGN KEY messages_exp(emsg_msg_cod) REFERENCES messages(msg_cod) DETAIL: Found referenced table's UPDATE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY messages_exp(emsg_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. 
NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referencing table's trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_competences(pcomp_perso_cod) REFERENCES perso(perso_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_position(ppos_pos_cod) REFERENCES positions(pos_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY perso_position(ppos_pos_cod) REFERENCES positions(pos_cod) DETAIL: Found referenced table's UPDATE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY objet_position(pobj_pos_cod) REFERENCES positions(pos_cod) DETAIL: Found referenced table's DELETE trigger. NOTICE: ignoring incomplete trigger group for constraint unnamed FOREIGN KEY
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: b) Whenever we vacuum the index, we take a list of tids and check whether there are any index tuples pointing to them. If the vacuumed tuple is the start of the HOT chain, then we will carry the next in-line HOT tuple when we go to vacuum the index. If the next in-line also satisfies the vacuum, it will carry with it the next in-line HOT tuple-id. Sorry, I didn't understand that. But the way you described it earlier, it's exactly the same thing as the pointer-swinging we talked about in spring. Is it the same or not? The only extra overhead is that we will need more memory during vacuum. We are currently calling the tid_reaped / lazy_tid_reaped function. It does a binary search to check whether the tid pointed to by the index is present in its array/list. If it is present, it means that the tuple is ready for vacuum. For HOT tuples, this list will carry a replacement tid (the next in-line HOT tuple). So instead of removing the index tuples, we will update the tid part of the index tuples. So there is no HOT stub here. The index will point to the live HOT tuple (which would also be the head of the chain). Say the index was previously pointing to (3,4). When (3,4) becomes ready to vacuum, it will supply a replacement tid, the one in its t_data, say (5,6). So once the vacuum identifies the correct index tuple, it will update the tid portion of the index tuple to (5,6). Please advise me on whether I am missing something / not being clear in the explanation. To answer the question I asked you: based on the above, this really is exactly the same pointer-swinging we talked about in spring. This will remove the in-page pruning exercises, but as I said already, the cost of updates will go down a lot with normal indexes. We don't want to get rid of the in-page pruning. It allows us to reclaim dead space without having to VACUUM. That's a major point of HOT. But we are going to get the index sizes very small, and we are also going to reduce the cost of updates. 
Isn't that sufficient reason for us? No. You haven't actually explained why you'd have to remove the in-page pruning exercises. I suspect that's not true. Not that any of this really matters, until you address the arguments against doing HOT updates across pages in the first place. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Fwd: [HACKERS] Fwd: Clarification about HOT
Again forgot to include the group... -- Forwarded message -- From: Gokulakannan Somasundaram [EMAIL PROTECTED] Date: Nov 5, 2007 5:09 PM Subject: Re: [HACKERS] Fwd: Clarification about HOT To: Heikki Linnakangas [EMAIL PROTECTED] On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: b) Whenever we vacuum the index, we take a list of tids and check whether there are any index tuples pointing to them. If the vacuumed tuple is the start of the HOT chain, then we will carry the next in-line HOT tuple when we go to vacuum the index. If the next in-line also satisfies the vacuum, it will carry with it the next in-line HOT tuple-id. Sorry, I didn't understand that. But the way you described it earlier, it's exactly the same thing as the pointer-swinging we talked about in spring. Is it the same or not? The only extra overhead is that we will need more memory during vacuum. We are currently calling the tid_reaped / lazy_tid_reaped function. It does a binary search to check whether the tid pointed to by the index is present in its array/list. If it is present, it means that the tuple is ready for vacuum. For HOT tuples, this list will carry a replacement tid (the next in-line HOT tuple). So instead of removing the index tuples, we will update the tid part of the index tuples. So there is no HOT stub here. The index will point to the live HOT tuple (which would also be the head of the chain). Say the index was previously pointing to (3,4). When (3,4) becomes ready to vacuum, it will supply a replacement tid, the one in its t_data, say (5,6). So once the vacuum identifies the correct index tuple, it will update the tid portion of the index tuple to (5,6). Please advise me on whether I am missing something / not being clear in the explanation. To answer the question I asked you: based on the above, this really is exactly the same pointer-swinging we talked about in spring. 
This will remove the in-page pruning exercises, but as I said already, the cost of updates will go down a lot with normal indexes. We don't want to get rid of the in-page pruning. It allows us to reclaim dead space without having to VACUUM. That's a major point of HOT. But we are going to get the index sizes very small, and we are also going to reduce the cost of updates. Isn't that sufficient reason for us? No. You haven't actually explained why you'd have to remove the in-page pruning exercises. I suspect that's not true. OK. If HOT updates are going to be across pages, we may not know whether we need to do pruning, because we don't know whether a given update is an in-page HOT update or an out-of-page one. Maybe we can allocate some extra bits for that. I am not favouring it; that's an option to be tried out. Not that any of this really matters, until you address the arguments against doing HOT updates across pages in the first place. Yes, but those arguments were centered around the current implementation, where we have a HOT stub which will point us to the latest live HOT tuple. The two problems which were put forth in this thread are: a) updating two pages concurrently. Soln: We are just following the current vacuum process. We are updating the index tuple tid, instead of reclaiming its space with a super-exclusive lock. We will vacuum the heap tuple as usual. b) we won't know the HOT chain stub tid when we encounter a dead HOT tuple. Soln: Whenever we find a dead HOT tuple during vacuum, we will add it to the list (the one tid_reaped uses) along with the next-in-line HOT tuple. So when we vacuum the index, it would be taken care of. I am actually expecting some issues, as this is not a proposal, just a discussion. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
[HACKERS] Slow regression tests on windows
I am trying to run regression tests on both Windows and RH. It looks like the tests on Windows run slower than on Linux, using two machines with the same hardware config. Is this known? Gevik Babakhani PostgreSQL NL http://www.postgresql.nl TrueSoftware BV http://www.truesoftware.nl
Re: [HACKERS] should I worry?
[EMAIL PROTECTED] writes: On Sun, 4 Nov 2007, Tom Lane wrote: So you have a *bunch* of partially broken FK constraints in that source database. I just talked to my customer, and we'll do a big batch of deleting and recreating all foreign keys on 8.2.5. The question is, how do we get rid of those useless unnamed triggers? DROP TRIGGER should be fine. regards, tom lane
Re: [HACKERS] Slow regression tests on windows
Gevik Babakhani wrote: I am trying to run regression tests on both Windows and RH. It looks like the tests on Windows run slower than on Linux, using two machines with the same hardware config. Is this known? We need far more information than this before we can say much sensibly, I think. For example, what configure flags you are using, what compilers, etc. From what I can see, MinGW regression is somewhat slower than MSVC (and even than Cygwin) on my buildfarm VM that runs all three, even though the latter two are rate-limited by MAX_CONNECTIONS. The regression tests really aren't performance tests, though. cheers andrew
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram [EMAIL PROTECTED] writes: Maybe I am missing something in the big picture. Please clarify that for me. Locking. Your proposal involves lots of multi-page operations, which are best avoided. Moreover, you have offered no data to suggest that there would be any real gain from all this extra complexity. regards, tom lane
Re: [HACKERS] Visibility map thoughts
This is my feedback regarding the Visibility Map. I just want to disagree with the idea that the DSM/Visibility Map should not focus on VACUUM. I think index-only scans should be thought of as a fringe benefit of DSMs/Visibility Maps. There are some basic assumptions in the design: a) The inserts won't increase the size of the table. If it increases, we have to lock one full page of the Visibility Map, and this is not suitable for tables which are short-lived, like partitioned tables. b) Even if the inserts don't increase the size of the table, they might make the DSM useless if a lot of inserts keep converting the all-visible pages to uncertain ones. For that matter, even deletes and updates are also going to turn a lot of pages into uncertain ones. c) The Visibility Map becomes useless when there is a long-running batch query, or periodic background queries which run for longer times. d) More updates mean more blocks of uncertainty; the space used by the DSM and the references made to it are then just overhead. e) A lot of the time, people may not need index-only scans. Again, this becomes an overhead. f) If there are scheduled reboots, the DSM crashes, and there are periodic slow-downs in queries while the DSM gets re-constructed. I am not opposing this, as it is a redundant feature for thick indexes. After all, every one of us wants Postgres to be the fastest in the world. But the DSM has an inherent assumption that a lot of tables will become static, with all their tuples visible to everyone. If there are such tables, then a thick index definitely becomes an overhead in terms of space. But the DSM should not become an overhead at any cost, as it is memory-resident at all times and always gets into the lifecycle of a query. The only way to achieve that is to make it dual-purpose: it should help vacuum, freezing, and visibility checks. Thanks, Gokul. 
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Though 8.3 isn't out of the oven just yet, I've been thinking about the dead space map a lot, and decided I have to start writing down those thoughts. Reducing VACUUM time is important, but the real big promise is the ability to do index-only-scans. Because that's the main focus of this exercise, I'm calling it the the Visibility Map from now on, because it's not about tracking dead space, but tuple visibility in general. Don't worry, reduced VACUUM times on read-mostly tables with hot spots will still fall out of it. What to store in the Visibility Map? Each potential user of the visibility map needs slightly different information: a) VACUUM needs to know if a page has enough dead tuples that it's interesting to visit. b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that can be frozen. c) Index-only-scans needs to know if all tuples on a page are visible to all transactions. HOT updates are OK, though. From this point on, this document concentrates on a map suited for index-only-scans. The basic structure is a bitmap with one bit per heap page. A set bit means all tuples on heap page are visible to all transactions. A cleared bit means that we don't know if that's true or not. This kind of map is useful for VACUUM as well, though VACUUM could get away with less strict semantics, and we might not want to visit a page in VACUUM if there's only very few dead tuples on a page. What this implementation gives us is a conservative VACUUM that will always find all the same dead tuples as a regular VACUUM (except for HOT updated tuples which can be pruned without scanning indexes). VACUUM using the visibility map can't update stats, so we'd still want regular vacuums every now and then, or rely on ANALYZE to do that. It's not useful for VACUUM FREEZE, unless we're willing to freeze much more aggressively, and change the meaning of a set bit to all tuples on heap page are frozen. 
Other kinds of visibility maps could be useful in some narrow scenarios. For example, if we had a map of pages that have no live tuples, we could skip those pages in sequential scans. We could also use more than one bit per page to store more information, but let's keep it simple for now. Where to store the visibility map? -- a) In a fixed size shared memory struct. Not acceptable. b) In the special area of every nth heap page. I played a bit with this back in February 2006, because it's simple and requires few changes. c) As a new relation kind, like toast tables. d) As an auxiliary smgr relation, attached to the heap. I'm leaning towards D at the moment. It requires a little bit of changes to the buffer manager API and elsewhere, but not too much. B is not good because we might want to have other kinds of auxiliary files in the future (FSM in particular). And it is a bit hacky anyway. Let's do something more generic. C doesn't seem like the right approach, because a visibility map
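Heikki's bit-per-heap-page scheme can be sketched in a few lines of C. This is only an illustration of the semantics described above (set bit = all tuples visible to everyone, clear bit = unknown), with hypothetical names; it is not PostgreSQL's actual implementation, and it deliberately ignores locking, WAL, and the smgr-attachment question:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory visibility map: one bit per heap page.
 * A set bit means "all tuples on this heap page are visible to all
 * transactions"; a clear bit means "we don't know". */
typedef struct
{
    uint32_t npages;        /* heap pages tracked */
    uint8_t  bits[1024];    /* room for 8192 heap pages in this sketch */
} VisMap;

static void vm_init(VisMap *vm, uint32_t npages)
{
    vm->npages = npages;
    memset(vm->bits, 0, sizeof(vm->bits));  /* conservative: all unknown */
}

/* VACUUM sets the bit once it has proven the page all-visible. */
static void vm_set(VisMap *vm, uint32_t page)
{
    vm->bits[page / 8] |= (uint8_t) (1 << (page % 8));
}

/* Any insert/update/delete on the page must clear the bit first. */
static void vm_clear(VisMap *vm, uint32_t page)
{
    vm->bits[page / 8] &= (uint8_t) ~(1 << (page % 8));
}

/* An index-only scan may skip the heap visit only when this returns true. */
static int vm_all_visible(const VisMap *vm, uint32_t page)
{
    return (vm->bits[page / 8] >> (page % 8)) & 1;
}
```

Note how the "cleared bit means unknown" convention makes initialization trivial: newly allocated map pages can simply be zeroed, which matches the point above that heap extension only touches the map when a new map page is needed.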
Re: [HACKERS] Fwd: Clarification about HOT
Tom, Let me try to understand your statement. What extra multi-page operations are we doing? Currently, during Vacuum, we go to the index, mark the tuple as dead and reclaim the space. For doing this, we acquire a Super-Exclusive lock. After this implementation, we would update the index tuple instead of marking it for cleanup. What can be foreseen as a locking overhead here? Actually I don't know what the best practice should be. Should I start a discussion, take inputs from everyone and start an implementation? Or should I finish the task, get the performance figures and then come to the forum? I realize that I am doing something wrong here. HOT in its present implementation has some complexities associated with it, as it is dealt with as a special case. I saw that you have also made that comment in your review. The only place where HOT is innovative is in its underlying assumption: since we are not storing the snapshot info in the index, it need not get updated if the indexed info is not changing. Currently we have implemented it in a very limited sense: it works only for in-page updates, and only when no index gets updated. It would be more complete if it worked across pages, treated each index as a separate entity, and decided whether to update each one on an index-by-index basis. By doing this, we would have a rich indexing infrastructure, where thin indexes are suitable for heavily updated tables and thick indexes are suitable for heavily selected tables. I definitely need guidance from you before going into its implementation, so please don't consider this a proposal. With your experience, try to gauge the usefulness of this feature. Some small tricks from you to make it even more effective would also be very useful. Thanks, Gokul. On 11/5/07, Tom Lane [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram [EMAIL PROTECTED] writes: May be i am missing something in the big picture. Please clarify me on that. Locking. 
Your proposal involves lots of multi-page operations, which are best avoided. Moreover, you have offered no data to suggest that there would be any real gain from all this extra complexity. regards, tom lane -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Fwd: Clarification about HOT
On Nov 5, 2007 7:37 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: Tom, Let me try to understand your statement. What extra multi-page operations are we doing? Currently, during Vacuum, we goto the Index and mark it as dead and reclaim the space. For doing this, we are acquiring a Super-Exclusive lock. After this implementation, we would update the index tuple instead of marking it for cleanup. What can be foreseen as a locking overhead here? It's not just about vacuuming. You need to worry about locking during the HOT-fetches as well as chain pruning. There could be tricky corner cases between index/seq scans and pruning. And don't forget CREATE INDEX, which would become even more challenging if you have HOT chains spanning multiple pages. This is not to discourage you from trying to improve HOT. But once upon a time we had a multi-page HOT (it was called Heap-Overflow-Tuple) and I can tell you: it was really complex. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fwd: Clarification about HOT
Thanks for the feedback. Let me try to lay out what is in my mind for this. Please clarify whether my assumptions are valid. On 11/5/07, Pavan Deolasee [EMAIL PROTECTED] wrote: On Nov 5, 2007 7:37 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: Tom, Let me try to understand your statement. What extra multi-page operations are we doing? Currently, during Vacuum, we goto the Index and mark it as dead and reclaim the space. For doing this, we are acquiring a Super-Exclusive lock. After this implementation, we would update the index tuple instead of marking it for cleanup. What can be foreseen as a locking overhead here? Its not just about vacuuming. You need to worry about locking during the HOT-fetches as well as chain pruning. During HOT fetches, the normal case is that we take the tuple-id from the index, go to the top of the HOT chain, and descend from there until we reach the right tuple. So this would only involve BUFFER_SHARE locks, which should not be of any concern. There may not be anything called chain-pruning. Instead the tuples which are to be vacuumed will get vacuumed, after redirecting their index tuple peers, during the Vacuum process. There could be tricky corner cases between index/seq scans and pruning. In seq-scans, I think we need not worry about the HOT implementation: we take each tuple, check the visibility criteria and take the appropriate step. During clean-up, we will be taking the super-exclusive lock here. In index scans, we will change the index entry only after obtaining the Super-Exclusive lock - so no pins - so no index scans are going on during this time. And don't forget CREATE INDEX which would become even more challenging if you have HOT chains spanning multiple pages. CREATE INDEX has to do a seq scan, so it will consider only tuples which are live at the time of creation of the index. It won't worry about the HOT chains. This is not to discourage you from trying to improve HOT. 
But once-upon-a-time we had this multi-page HOT (it was called Heap-Overflow-Tuple) and I can tell you: it was really complex. Thanks a lot for the encouraging words. I will definitely refer to the Heap Overflow Tuple work and check whether any of its problems are going to recur here. In the meanwhile, if you can think of a specific case in which this design would fail, please notify me. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram [EMAIL PROTECTED] writes: Currently, during Vacuum, we goto the Index and mark it as dead and reclaim the space. For doing this, we are acquiring a Super-Exclusive lock. After this implementation, we would update the index tuple instead of marking it for cleanup. What can be foreseen as a locking overhead here? There are three operations involved: marking the child tuple not-HOT, updating the index entry(s), and removing the parent tuple. I doubt you can do them safely as independent atomic operations; I think you'll need to have more than one page locked at a time. The locking problem also applies to trying to collapse out a dead HOT tuple that's in the middle of the chain: if it's the end of a cross-page link then you need two pages super-exclusive-locked in order to do that. There's also the little problem that a redirect line pointer doesn't have room for a cross-page link, and the big problem that having to chase across multiple pages for an index lookup would almost certainly negate any performance gains you might get. (In particular it'd completely destroy locality of access for bitmap indexscans...) regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fwd: Clarification about HOT
On Nov 5, 2007 8:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: There may not be anything called chain-pruning. Instead the tuples, which are to be vacuumed, will get vacuumed, after redirecting their index tuple peers, during the Vacuum process. This won't help us keep the heap bloat in check. Though containing index bloat is important, most of the performance benefits of HOT come from doing page-level retail vacuuming. This not only reduces the heap bloat but also results in less frequent vacuuming of the table. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Visibility map thoughts
Gokulakannan Somasundaram wrote: a) The inserts won't increase the size of the table. If it increases, it has to lock one full page of Visibility map and this is not suitable for tables, which are short-lived like partitioned tables The overhead of locking a page is very small. Actually, extending a heap only needs to touch the visibility map when we need a new visibility map page, if we initialize all bits to zero. Like we do already anyway. b) Even if the inserts don't increase the size of the table, it might make DSM useless, if lot of inserts keep converting the all-visible ones to uncertain ones. For that matter, even the Deletes and Updates are also going to make lot of pages into uncertain ones. Sure. If you have a lot of (random) inserts/updates/deletes, it becomes much less useful. A small mitigating factor is that an insert/update/delete will fetch the heap page to memory anyway. Therefore having to access it just after the update is cheap. This helps inserts in particular, because after the inserting transaction is older than OldestXmin, we can set the bit again. c) Visibility map gets useless, when there is a long running batch query / periodic background queries which run for longer times Yeah, long running transactions are a bitch in many ways. d) More updates- more blocks of uncertainity - space usage by DSM and the reference made to DSM is just an overhead e) Lot of times, people may not need index-only scans. Again this gets to be a overhead The beauty of this approach is that the overhead is very small. f) If there are scheduled reboots, the DSM crashes and periodic slow-downs in the queries during the time, the DSM gets re-constructed. That's rubbish. I am not opposing this, as it is a redundant feature for Thick indexes. After all every one of us, want Postgres to be the fastest one in the world. And also the easiest to maintain, most space-efficient, most reliable and so forth... 
But because DSM has a inherent assumption that lot of tables will become static and all the tuples would be visible to everyone. If there are such tables, then definitely Thick index becomes a overhead in terms of space. But DSM should not become overhead at any cost, as it is a memory resident one at all times and also always gets into the lifecycle of a query. Only way to achieve it is to make it a dual purpose one. It should help Vacuum, freezing and visibility checks. I don't understand this paragraph. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] minimal update
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: A BEFORE UPDATE trigger would be better, and probably hardly more expensive than a wired-in facility (especially if you were willing to write it in C). Yes. I also prefer the trigger idea to a rule because triggers are easy to enable and disable. It's still a lot of work for what must be a common want, though. Could it be done generically? Well, you could write the trigger in C and it'd work for any table. I think it could be as simple as a memcmp of the tuples' data areas, since we now require padding bytes to be 0 ... Something like this fragment?

    newtuple = trigdata->tg_newtuple;
    oldtuple = trigdata->tg_trigtuple;
    rettuple = newtuple;
    if (newtuple->t_len == oldtuple->t_len &&
        newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
        memcmp(GETSTRUCT(newtuple), GETSTRUCT(oldtuple),
               newtuple->t_len - newtuple->t_data->t_hoff) == 0)
        rettuple = NULL;
    return PointerGetDatum(rettuple);

Also, when did we first require padding bytes to be 0? cheers andrew
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Tom Lane [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram [EMAIL PROTECTED] writes: Currently, during Vacuum, we goto the Index and mark it as dead and reclaim the space. For doing this, we are acquiring a Super-Exclusive lock. After this implementation, we would update the index tuple instead of marking it for cleanup. What can be foreseen as a locking overhead here? There are three operations involved: marking the child tuple not-HOT, updating the index entry(s), and removing the parent tuple. I doubt you can do them safely as independent atomic operations; I think you'll need to have more than one page locked at a time. I think that in my suggestion there are only two steps: redirect the index tuple's tid to point to a new tuple, and mark the parent heap tuple vacuumed. This is no different from the existing Vacuum process, except that we are updating the index tuple instead of marking it for Vacuum. I think you are thinking about the current implementation. We need not mark any tuple as HOT in the proposed one. Instead our code will work under the assumption that normal indexes won't get updated unless the indexed info changes. I don't foresee a requirement for HOT classification; please explain to me if it is required in any case. The locking problem also applies to trying to collapse out a dead HOT tuple that's in the middle of the chain: Currently, we maintain a list/array in the tid_reaped or lazy_tid_reaped function, which contains all the tids that can be vacuumed. In our case, as already explained, all the heap tuples which are updated will contain an extra tid, taken from the t_ctid stored in t_data. So after we update the tid in the index tuple, we can call tid_reaped again to check whether the replaced tid has to get replaced again or can be vacuumed. if it's the end of a cross-page link then you need two pages super-exclusive-locked in order to do that. If the chain ends, the heap tuple would have been marked deleted. 
It won't point to any tuple in the t_data and would have been marked HEAP_XMAX_VALID. So it is the same case as vacuuming a normal tuple. Have I answered your question correctly? There's also the little problem that a redirect line pointer doesn't have room for a cross-page link, I don't know what a redirect line pointer is. The tid part of an index tuple can be made to point to any place, so I don't understand what you mean here. and the big problem that having to chase across multiple pages for an index lookup would almost certainly negate any performance gains you might get. (In particular it'd completely destroy locality of access for bitmap indexscans...) Yes, I think bitmap index scans will get affected, but only for selects on heavily updated tuples. That's the cost we need to pay for extracting performance out of updates. As I already said, the normal index would be the index for heavily updated tables and the thick index would be the index for tables with heavy selects. We can also look at optimizing the index tuple structure to reduce its footprint. Please get back to me if I am not clear on anything. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: There's also the little problem that a redirect line pointer doesn't have room for a cross-page link, i don't know, what is a re-direct line pointer. Then you clearly don't understand at all how HOT works. Please go read src/backend/access/heap/README.HOT. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Pavan Deolasee [EMAIL PROTECTED] wrote: On Nov 5, 2007 8:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: There may not be anything called chain-pruning. Instead the tuples, which are to be vacuumed, will get vacuumed, after redirecting their index tuple peers, during the Vacuum process. This won't help us check the heap bloat. Though containing index bloat is important, most of the performance benefits of HOT comes from doing page level retail vacuuming. This not only reduces the heap bloat but also results in less frequent vacuuming of the table. Can you please explain this in more detail? If the HOT chain doesn't break and fits completely into a single page, the Vacuum daemon need not intervene with the HOT tuples for space reclamation. But ultimately the space would get reclaimed even with a normal Vacuum (without HOT pruning), wouldn't it? Then how do we say that without HOT pruning we will have heap bloat? -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Slow regression tests on windows
On Mon, Nov 05, 2007 at 08:43:01AM -0500, Andrew Dunstan wrote: Gevik Babakhani wrote: I am trying to run regression tests on both windows and RH. It looks like the tests on windows run slower than linux, using two machines with the same hardware config. Is this known? We need far more information than this before we can say much sensibly, I think. For example, what configure flags you are using, what compilers, etc. In general, the regression tests are slower on Windows, yes. It's because there are a lot of processes spawned on both the client and server side. (Heck, in general, *everything* in pg is slower on Windows than on Linux with the same hardware config.) From what I can see MinGW regression is somewhat slower than MSVC (and even than Cygwin) on my buildfarm VM that runs all three, even though the latter two are rate limited by MAX_CONNECTIONS. Uh, you're saying MSVC and Cygwin somehow differ from MinGW? //Magnus
Re: [HACKERS] Fwd: Clarification about HOT
Thanks Heikki. To clarify, there won't be any redirect-line pointers in this implementation. That space is saved. We will have the index tuple point to the latest live tuple in the update chain. So no need for redirect-line pointers. Thanks, Gokul. On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: There's also the little problem that a redirect line pointer doesn't have room for a cross-page link, i don't know, what is a re-direct line pointer. Then you clearly don't understand at all how HOT works. Please go read src/backend/access/heap/README.HOT. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
[HACKERS] plpgsql keywords are hidden reserved words
I was somewhat bemused just now to find that this function stopped working: regression=# create function estimate_rows(query text) returns float8 as $$ declare r text; begin for r in execute 'explain ' || query loop if substring(r from 'rows=[0-9]') is not null then return substring (r from 'rows=([0-9]+)'); end if; end loop; return null; end$$ language plpgsql strict; CREATE FUNCTION regression=# select estimate_rows('select * from tenk1 where unique1 < 500'); ERROR: column query does not exist LINE 1: SELECT 'explain ' || query ^ QUERY: SELECT 'explain ' || query CONTEXT: PL/pgSQL function estimate_rows line 3 at FOR over EXECUTE statement This works fine in 8.2. The reason it no longer works is that query is now a special token in the plpgsql lexer, and that means that it will never be substituted for by read_sql_construct(). So it's effectively a reserved word. While I can work around this by changing the parameter name or using for r in execute 'explain ' || estimate_rows.query loop it's still a tad annoying, and it means that we have to be *very* circumspect about adding new keywords to plpgsql. I don't see any fix for this that's reasonable to try to shoehorn into 8.3, but I think we really need to revisit the whole area of plpgsql variable substitution during 8.4. We could make this problem go away if variable substitution happened through a parser callback instead of before parsing. regards, tom lane
Re: [HACKERS] minimal update
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Well, you could write the trigger in C and it'd work for any table. I think it could be as simple as a memcmp of the tuples' data areas, since we now require padding bytes to be 0 ... Something like this fragment?

    newtuple = trigdata->tg_newtuple;
    oldtuple = trigdata->tg_trigtuple;
    rettuple = newtuple;
    if (newtuple->t_len == oldtuple->t_len &&
        newtuple->t_data->t_hoff == oldtuple->t_data->t_hoff &&
        memcmp(GETSTRUCT(newtuple), GETSTRUCT(oldtuple),
               newtuple->t_len - newtuple->t_data->t_hoff) == 0)
        rettuple = NULL;
    return PointerGetDatum(rettuple);

Close, but I think you also need to take care to compare natts and the null bitmaps (if any). Might be worth comparing OIDs too, though AFAIR there is no mechanism for substituting a different OID during UPDATE. Probably the easiest coding is to memcmp all the way from offsetof(t_bits) to t_len, after comparing natts and the HASNULL and HASOID flags. Also, when did we first require padding bytes to be 0? The 8.3 varvarlena patch is what requires it, but in practice heap_formtuple has always started with a palloc0, so I think it would work a long ways back. regards, tom lane
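Tom's point about zeroed padding is the crux of why a raw memcmp is a sound equality test here. A minimal standalone sketch of the idea (a toy struct with hypothetical names, not PostgreSQL's real HeapTuple API): if records are always built from zeroed memory, their padding bytes are deterministic, so a bytewise comparison cannot report a spurious difference between logically equal records.

```c
#include <stdint.h>
#include <string.h>

/* Toy "tuple" mimicking the trigger's comparison. 'a' is followed by
 * 2 padding bytes, and 'c' by 3 tail padding bytes on typical ABIs. */
typedef struct
{
    int16_t a;
    int32_t b;
    char    c[5];
} ToyTuple;

/* Build a tuple the way heap_formtuple effectively does: start from
 * all-zero memory (palloc0) so padding bytes are deterministic. */
static ToyTuple make_tuple(int16_t a, int32_t b, const char *c)
{
    ToyTuple t;
    memset(&t, 0, sizeof(t));           /* zero the padding first */
    t.a = a;
    t.b = b;
    strncpy(t.c, c, sizeof(t.c) - 1);
    return t;
}

/* The trigger's core test: bytewise equality of the data area. Valid
 * only because all padding bytes are guaranteed to be zero. */
static int tuples_identical(const ToyTuple *x, const ToyTuple *y)
{
    return memcmp(x, y, sizeof(ToyTuple)) == 0;
}
```

Without the memset, two tuples with identical field values could still memcmp unequal because of garbage in the padding bytes, which is exactly why the "padding bytes must be 0" requirement matters for this trigger.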
Re: [HACKERS] Fwd: Clarification about HOT
Gokulakannan Somasundaram wrote: Thanks Heikki. To clarify, there won't be any redirect-line pointers in this implementation. That space is saved. We will have the index tuple point to the latest live tuple in the update chain. So no need for redirect-line pointers. The redirected line pointers are there for a reason. The reason is to be able to retail vacuum (= prune) a page without having to do a regular vacuum, scanning all indexes. If you lose that ability, the idea is dead in the water. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] plpgsql keywords are hidden reserved words
Tom Lane [EMAIL PROTECTED] writes: I was somewhat bemused just now to find that this function stopped working: regression=# create function estimate_rows(query text) returns float8 as $$ ... This works fine in 8.2. The reason it no longer works is that query is now a special token in the plpgsql lexer, and that means that it will never be substituted for by read_sql_construct(). So it's effectively a reserved word. Perhaps we should be throwing a more intelligible error if you have a parameter (or variable?) named in a way that will conflict? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Re: [HACKERS] Slow regression tests on windows
Magnus Hagander wrote: From what I can see MinGW regression is somewhat slower than MSVC (and even than Cygwin) on my buildfarm VM that runs all three, even though the latter two are rate limited by MAX_CONNECTIONS. Uh, you're saying MSVC and Cygwin somehow differ from MinGW? I'm saying my buildfarm members differ from each other. In fact, on the last run of each, MinGW was by far the slowest and Cygwin the fastest in the make check step. But there could be any number of reasons for that, including extraneous activity on the VM host that could have slowed the whole VM down. Anyway, comparing regression test speeds is probably not very productive. cheers andrew
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: Thanks Heikki. To clarify, there won't be any redirect-line pointers in this implementation. That space is saved. We will have the index tuple point to the latest live tuple in the update chain. So no need for redirect-line pointers. The redirected line pointers are there for a reason. The reason is to be able to retail vacuum (= prune) a page without having to do a regular vacuum, scanning all indexes. If you lose that ability, the idea is dead in the water. -- Since we are going to have the index point to the top of the chain, and sliding the chain happens concurrently in index and heap, there is no need for redirected line pointers. But I doubt whether we can retail-vacuum a page, since there is just one HOT chain in the page. We can retail-vacuum only the HOT chain; for the rest of the tuples, which are not HOT updated, we need to consult the indexes. Any other tuple in that page has to go through the normal Vacuum process. Is my understanding correct here? If so, we don't retail-vacuum a page; we just try to shorten the HOT chain at the expense of some space overhead. If this is going to give so much benefit, then maybe the design can be made slightly more flexible in order to accommodate it. Say we have a table with 4 indexes, and updates occur at such intervals that we may not be able to find space in the same page for the update. Currently we incur the overhead of updating all the indexes in this scenario, and we incur the same overhead again even if only one of the indexes needs updating. In the proposed scenario, we would not have all those overheads. If we feel that this overhead is required for achieving pruning of the HOT chain, then I think we should drop this idea. Thanks, Gokul. Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
Re: [HACKERS] Slow regression tests on windows
On Mon, Nov 05, 2007 at 11:01:26AM -0500, Andrew Dunstan wrote: Magnus Hagander wrote: From what I can see MinGW regression is somewhat slower than MSVC (and even than Cygwin) on my buildfarm VM that runs all three, even though the latter two are rate limited by MAX_CONNECTIONS. Uh, you're saying MSVC and Cygwin somehow differ from MinGW? I'm saying my buildfarm members differ from each other. Ah. I thought you meant we had some builtin cap in the regression tests, which is what confused me. In fact, on the last run of each, MinGW was by far the slowest and Cygwin the fastest in the make check step. But there could be any number of reasons for that, including extraneous activity on the VM host that could have slowed the whole VM down. Yeah. It does surprise me a lot that cygwin should be fastest, really. Anyway, comparing regression test speeds is probably not very productive. Agreed. //Magnus
Re: [HACKERS] Proposal: PL/pgSQL EXECUTE INTO USING (for 8.4)
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Pavel Stehule wrote: Hello, this proposal changes an older unaccepted proposal http://archives.postgresql.org/pgsql-hackers/2006-03/msg01157.php . Changes: * based on prepared statements * syntax and behavior are close to Oracle's * usable as protection from SQL injection New syntax: a) EXECUTE stringexpr [INTO [STRICT] varlist] [USING exprlist] b) FOR varlist IN EXECUTE stringexpr USING exprlist LOOP Reasons: * defence from SQL injection * more readable, shorter, more comfortable Sample (secure dynamic statement): EXECUTE 'SELECT * FROM ' || CASE tblname WHEN 'tab1' THEN 'tab1' WHEN 'tab2' THEN 'tab2' ELSE 'something is wrong' END || ' WHERE c1 = $1 AND c2 = $2' USING unsecure_parameter1, unsecure_parameter2; Differences between PL/SQL and this proposal: * allow only IN variables * use PostgreSQL placeholder notation - $n instead of :n Compliance with PL/SQL: * You can use numeric, character, and string literals as bind arguments * You cannot use bind arguments to pass the names of schema objects to a dynamic SQL statement. Best regards Pavel Stehule -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: Add a note about another issue that needs to be considered before
Gregory Stark [EMAIL PROTECTED] writes: FWIW I found another issue with this variable when I was experimenting with small block sizes. If you set the target <= the tuple header, the toaster breaks. This is because it's doing unsigned arithmetic (Size is unsigned). I think the right solution is just to change it to use plain int32 arithmetic everywhere. Somehow I can't get excited about that. If we ever do open things up to allow user control of the target, we'd certainly constrain it to a sane range. regards, tom lane
Re: [HACKERS] Slow regression tests on windows
Magnus Hagander wrote: In fact, on the last run of each, MinGW was by far the slowest and Cygwin the fastest in the make check step. But there could be any number of reasons for that, including extraneous activity on the VM host that could have slowed the whole VM down. Yeah. It does surprise me a lot that cygwin should be fastest, really. Don't read anything into it. The VMware clock is quite unreliable, in my experience. cheers andrew
Re: [HACKERS] Visibility map thoughts
On 11/5/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: a) The inserts won't increase the size of the table. If it increases, it has to lock one full page of Visibility map and this is not suitable for tables, which are short-lived like partitioned tables The overhead of locking a page is very small. Actually, extending a heap only needs to touch the visibility map when we need a new visibility map page, if we initialize all bits to zero. Like we do already anyway. As you have pointed out, one page in the visibility map covers some 65,536 pages in the heap. So even if we are locking the visibility map for a small time, it will affect all the scans which need to access those pages. b) Even if the inserts don't increase the size of the table, it might make DSM useless, if lot of inserts keep converting the all-visible ones to uncertain ones. For that matter, even the Deletes and Updates are also going to make lot of pages into uncertain ones. Sure. If you have a lot of (random) inserts/updates/deletes, it becomes much less useful. A small mitigating factor is that an insert/update/delete will fetch the heap page to memory anyway. Therefore having to access it just after the update is cheap. This helps inserts in particular, because after the inserting transaction is older than OldestXmin, we can set the bit again. But we can set this bit only after a Vacuum process; the bit might not be set till the Vacuum process pitches in and marks it. c) Visibility map gets useless, when there is a long running batch query / periodic background queries which run for longer times Yeah, long running transactions are a bitch in many ways. But these are determined by business conditions, and we need to provide efficient solutions to deal with them. d) More updates - more blocks of uncertainty - the space used by the DSM and the references made to the DSM are just overhead e) A lot of the time, people may not need index-only scans. 
Again, this becomes an overhead. The beauty of this approach is that the overhead is very small. f) With scheduled reboots the DSM goes away, and there are periodic slow-downs in queries while the DSM gets reconstructed. That's rubbish. I think the DSM is not WAL-logged. So when it gets reconstructed every time for a big table, isn't that an overhead? I am not opposing this, even though it is a redundant feature with thick indexes. After all, every one of us wants Postgres to be the fastest one in the world. And also the easiest to maintain, most space-efficient, most reliable and so forth... Provided it supports vacuuming and freezing. The DSM has an inherent assumption that a lot of tables will become static, with all their tuples visible to everyone. If there are such tables, then a thick index definitely becomes an overhead in terms of space. But the DSM should not become an overhead at any cost, since it is memory-resident at all times and always gets into the lifecycle of a query. The only way to achieve that is to make it serve a dual purpose: it should help vacuum, freezing, and visibility checks. I don't understand this paragraph. Because updates, inserts and deletes reduce the utility of the visibility map, it seems to be designed for more static tables, which don't experience many of these operations. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
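For scale, the one-bit-per-heap-page layout being discussed can be sketched as follows. This is a toy illustration, not PostgreSQL source: it assumes 8 KB pages, under which a single map page covers 65,536 heap pages, which is why locking one map page touches so many scans.

```python
# Toy sketch (not PostgreSQL code): locate the visibility-map bit for a
# given heap block, assuming 8 KB pages and one bit per heap page.
BLCKSZ = 8192                    # assumed page size
BITS_PER_MAP_PAGE = BLCKSZ * 8   # 65,536 heap pages covered per map page

def map_location(heap_block):
    """Return (map_page, byte_in_page, bit_in_byte) for a heap block."""
    map_page = heap_block // BITS_PER_MAP_PAGE
    bit_offset = heap_block % BITS_PER_MAP_PAGE
    return map_page, bit_offset // 8, bit_offset % 8
```

Under these assumptions, heap blocks 0 through 65,535 all share map page 0, so any exclusive lock on that map page can stall accesses to any of them.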
Re: [HACKERS] pgsql: Add a note about another issue that needs to be considered before
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: FWIW I found another issue with this variable when I was experimenting with small block sizes. If you set the target = the tuple header the toaster breaks. This is because it's doing unsigned arithmetic (Size is unsigned). I think the right solution is just to change it to use plain int32 arithmetic everywhere. Somehow I can't get excited about that. If we ever do open things up to allow user control of the target, we'd certainly constrain it to a sane range. Yeah, i didn't even mention it at the time. I just thought I would mention it now since you were adding comments on restrictions to the toast threshold. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Fwd: Clarification about HOT
On Mon, Nov 05, 2007 at 09:32:08PM +0530, Gokulakannan Somasundaram wrote: Say, if we have a table with 4 indexes and updates occur in such intervals, we may not be able to find space in the same page for the update. Currently we are incurring the overhead of updating all the indexes in this scenario. Even if one of the index is updated, we will be incurring the same overhead again in the current scenario. Ok, I've been following this tangentially, but here is one thing I really don't understand: Sure, you might save this cost during update, but you do incur this cost while updating the head of the chain. There is no link from the chain to the index tuple, so the work to find the current index tuple is nearly the same as the cost to create a new one. It seems to me that updating and pruning the head will happen about equally often, so I'm not sure you're saving anything here. Or am I missing something? Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Those who make peaceful revolution impossible will make violent revolution inevitable. -- John F Kennedy signature.asc Description: Digital signature
Re: [HACKERS] Fwd: Clarification about HOT
On 11/5/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote: On Mon, Nov 05, 2007 at 09:32:08PM +0530, Gokulakannan Somasundaram wrote: Say, if we have a table with 4 indexes and updates occur in such intervals, we may not be able to find space in the same page for the update. Currently we are incurring the overhead of updating all the indexes in this scenario. Even if one of the indexes is updated, we will be incurring the same overhead again in the current scenario. Ok, I've been following this tangentially, but here is one thing I really don't understand: Sure, you might save this cost during update, but you do incur this cost while updating the head of the chain. There is no link from the chain to the index tuple, so the work to find the current index tuple is nearly the same as the cost to create a new one. It seems to me that updating and pruning the head will happen about equally often, so I'm not sure you're saving anything here. Or am I missing something? Whatever you just said would happen during the index vacuum. Instead of marking a particular index tuple as dead / ready to be vacuumed, we would update its tid to point to the tuple next to the head of the chain. This won't incur extra overhead, except that during the vacuum process we will carry the tid of the tuple next to the head of the chain (a memory overhead). Hope I was clear; please get back to me in case anything is unclear. -- Thanks, Gokul. CertoSQL Project, Allied Solution Group. (www.alliedgroups.com)
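The vacuum-time retargeting proposed above can be modeled as a toy sketch. Everything here is a hypothetical stand-in (a dict as the "index", tuples as tids), not PostgreSQL data structures; it only shows the shape of the idea: carry (dead chain head, first live member) pairs into the index pass and repoint entries instead of deleting them.

```python
# Toy model of the proposal: during the index pass of vacuum, entries
# pointing at a dead HOT chain head are retargeted at the first live
# chain member, rather than being marked dead.
def vacuum_index(index, retarget):
    """index: key -> heap tid; retarget: dead head tid -> first live tid."""
    for key, tid in index.items():
        if tid in retarget:
            index[key] = retarget[tid]  # repoint at the live chain member
    return index
```

In this model the extra cost is exactly what the mail describes: vacuum must carry one extra tid per pruned chain (the `retarget` mapping), a memory overhead rather than extra index traversals.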
Re: [HACKERS] pg_ctl configurable timeout
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I think the mythical pg_ping utility should be written. It seems the easiest way out of the problem. If pg_ctl were still a shell script there would be some point in that, but since it's a C program it can certainly do anything a separate utility would do. Well, pg_ctl would not be the only user of such an utility. Things like (say) control panels for shared hosting could benefit from it as well. As would system health monitors. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] pg_ctl configurable timeout
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I think the mythical pg_ping utility should be written. It seems the easiest way out of the problem. If pg_ctl were still a shell script there would be some point in that, but since it's a C program it can certainly do anything a separate utility would do. Well, pg_ctl would not be the only user of such an utility. Things like (say) control panels for shared hosting could benefit from it as well. As would system health monitors. I still see no point in creating a separate binary for the functionality. If you want to make it available to shell scripts, invent a pg_ctl ping subcommand. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] dblink un-named connection doesn't get re-used
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Decibel! wrote: Is it intentional that dblink's unnamed connections don't get re-used?

    stats=# select datname, usename from pg_stat_activity;
     datname | usename
    ---------+---------
     stats   | decibel
    (1 row)

    stats=# select dblink_connect('dbname=stats');
     dblink_connect
    ----------------
     OK
    (1 row)

    stats=# select dblink_connect('dbname=postgres');
     dblink_connect
    ----------------
     OK
    (1 row)

    stats=# select datname, usename from pg_stat_activity;
     datname  | usename
    ----------+----------
     stats    | decibel
     stats    | postgres
     postgres | postgres
    (3 rows)

AFAIK there's no way I could possibly use or refer to the connection to stats at this point; so why doesn't dblink close it when I issue the second connect? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] A small rant about coding style for backend functions
I understand your suggestions but it seems there would be too many individual items to be readable. Can you suggest a full list so we can get an idea of how long it would be? --- Brendan Jurd wrote: On 11/1/07, Bruce Momjian [EMAIL PROTECTED] wrote: I have not forgotten this suggestion. Do have any ideas what such a list would look like? Examples? Thanks for the reply Bruce. Code examples, perhaps with good style and bad style versions to illustrate each point. In the case of Tom's OP about making all your GETARG calls at the top of the function, you could show (for a trivial function) how it looks with all the GETARGs stacked neatly at the top, and how it looks with the GETARGs scattered and inaccurate. I think that would make it immediately clear to any newbie why it's a good idea. I think we have avoided more details in fear of scaring off coders. People usually follow our style as they gain experience. Having a hard list seems like it would be a lot of do's and don't's. Just my perspective, but I think you'll scare off a lot more coders by giving them no firm guidance in the first place, and then jumping down their throats with you did this wrong when they post a patch. Might be worth opening up a wiki page for devs to contribute their insights about writing excellent (as opposed to merely passable) Postgres code. The GETARG rant could make a good starting point. Cheers BJ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Open items for 8.3
In an attempt to move us toward 8.3 RC1 I have put all open item emails into the patches queue: http://momjian.postgresql.org/cgi-bin/pgpatches -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Open items for 8.3
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Mon, 5 Nov 2007 12:47:10 -0500 (EST) Bruce Momjian [EMAIL PROTECTED] wrote: In an attempt to move us toward 8.3 RC1 I have put all open item emails into the patches queue: http://momjian.postgresql.org/cgi-bin/pgpatches It would be great if this would push to the wiki. That has been working really well through the cycle. Joshua D. Drake - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHL1jNATb/zqfZUUQRAmI0AJ9HtxqXM52c9p999mkQ4CfZWQMeQQCfYMh2 arqwrUGX8YJkw5l4K/hkRM4= =cmdZ -END PGP SIGNATURE- ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] plpgsql keywords are hidden reserved words
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This works fine in 8.2. The reason it no longer works is that query is now a special token in the plpgsql lexer, and that means that it will never be substituted for by read_sql_construct(). So it's effectively a reserved word. Perhaps we should be throwing a more intelligible error if you have a parameter (or variable?) named in a way that will conflict? Actually, it seems you already do get such a complaint if you try to use a keyword as a variable name: ERROR: syntax error at or near query LINE 2: declare query text; ^ Not the most tremendously helpful message, maybe, but at least it's pointing at the right place. So the problem is only for function parameter names, which aren't lexed by plpgsql itself but by the main parser. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] plpgsql keywords are hidden reserved words
John DeSoi [EMAIL PROTECTED] writes: Is there any feasibility to the idea of allowing pl/pgsql variables and parameters to be prefixed with a special character like '$'? I find this (a) really ugly, (b) incompatible with Oracle, which you'll recall is one of the main driving ideas for plpgsql, and (c) almost certainly a lexical conflict with dollar quoting. Most other special characters you might suggest would create parsing ambiguities too. I'm constantly adding prefixes like 'v_' because of conflicts with table or column names. The reason you have to do that is that we got the lookup order backward: per Oracle, column names within a query should bind more tightly than plpgsql variable names, and if you need to disambiguate you qualify the variables. We should fix that instead of bogotifying the syntax. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?
I'm trying to fix the independence of pg_dump.c from postgres.h, and I found the following construct in the dumpSequence function:

    snprintf(bufm, sizeof(bufm), INT64_FORMAT, SEQ_MINVALUE);
    snprintf(bufx, sizeof(bufx), INT64_FORMAT, SEQ_MAXVALUE);

    appendPQExpBuffer(query,
                      "SELECT sequence_name, last_value, increment_by, "
                      "CASE WHEN increment_by > 0 AND max_value = %s THEN NULL "
                      "     WHEN increment_by < 0 AND max_value = -1 THEN NULL "
                      "     ELSE max_value "
                      "END AS max_value, "
                      "CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL "
                      "     WHEN increment_by < 0 AND min_value = %s THEN NULL "
                      "     ELSE min_value "
                      "END AS min_value, "
                      "cache_value, is_cycled, is_called from %s",
                      bufx, bufm,
                      fmtId(tbinfo->dobj.name));

This construct is used to determine whether max_value/min_value is the default, and in that case pg_dump adds NO MAXVALUE to the output instead of the value. If I compare it with the manual, NO MAXVALUE uses the default value, and I do not see any reason to have this code there. I think we can remove this code and drop the dependency on sequence.h. Any comments? Zdenek
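For what it's worth, the decision those CASE expressions implement can be sketched like this. This is an illustrative model, not pg_dump code; the function name is made up, and it assumes the usual int64 sequence bounds: a bound equal to the default is reported as NULL by the query, and the dumper then emits NO MAXVALUE instead of a literal.

```python
# Sketch (hypothetical helper, not pg_dump source) of the dump-side rule:
# emit NO MAXVALUE when the sequence's max bound is the default for its
# direction, otherwise emit the explicit bound.
SEQ_MAXVALUE = 2**63 - 1   # assumed INT64 max, as in sequence.h
SEQ_MINVALUE = -SEQ_MAXVALUE

def max_clause(increment_by, max_value):
    is_default = (increment_by > 0 and max_value == SEQ_MAXVALUE) or \
                 (increment_by < 0 and max_value == -1)
    return "NO MAXVALUE" if is_default else f"MAXVALUE {max_value}"
```

This makes the trade-off in the thread concrete: dropping the check would print the huge default bound verbatim in every dumped CREATE SEQUENCE, rather than the portable NO MAXVALUE form.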
Re: [HACKERS] Open items for 8.3
Joshua D. Drake wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Mon, 5 Nov 2007 12:47:10 -0500 (EST) Bruce Momjian [EMAIL PROTECTED] wrote: In an attempt to move us toward 8.3 RC1 I have put all open item emails into the patches queue: http://momjian.postgresql.org/cgi-bin/pgpatches It would be great if this would push to the wiki. That has been working really well through the cycle. The wiki has a nice summary and links to the discussion. I don't have time to do that much work on this, and maintain it. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Open items for 8.3
Joshua D. Drake [EMAIL PROTECTED] writes: On Mon, 5 Nov 2007 12:47:10 -0500 (EST) Bruce Momjian [EMAIL PROTECTED] wrote: In an attempt to move us toward 8.3 RC1 I have put all open item emails into the patches queue: http://momjian.postgresql.org/cgi-bin/pgpatches It would be great if this would push to the wiki. That has been working really well through the cycle. How many developers have even jumped through the hoops to get wiki accounts? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] sending row data to frontend - reg
Hi All, we are trying to modify the source code of PostgreSQL to support a new command: pivot colA colB tablename. We are doing the following steps: 1. After parsing the input, we are able to get the values entered by the user for column A, column B and tablename. 2. Then, from postgresmain, we call exec_simple_query with a new query: select * from tablename. 3. Then we read the result of executing this query in the sendrowdescriptionmessage and printtup functions, and also prevent these from sending the result of the select query to the frontend. 4. Now we make the changes needed for the pivot command. 5. After this, from executorrun, we call the sendrowdescriptionmessage function (with modifications) to send the newly calculated row description; this is working fine. 6. Now we want to send the newly calculated row data to the frontend, but we are not able to do this. Is there any way to send the row values without invoking executeplan again? (Executeplan is overwriting the earlier-sent row description, and we are not able to figure out why.) Can anyone help us? We would be grateful if you could give any pointers in this regard. Thanking you in advance, Rose.
Re: [HACKERS] pg 8.3beta 2 restore db with autovacuum report
Guillaume Smet wrote: I did the test again with the reference database I used a month ago. My previous figures with 8.3devel of October 1st were: - autovacuum off: 14m39 - autovacuum on, delay 20: 51m37 With 8.3devel of today, I have: - autovacuum on, delay 20: 15m26 Yay! Thanks! (It does take a bit longer, but I'm not really concerned about it.) -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC In the depths of our unconscious there is an obsessive need for a logical, coherent universe. But the real universe is always one step beyond logic. (Irulan)
Re: [HACKERS] Postgresql 8.3 beta crash
Peter Eisentraut [EMAIL PROTECTED] writes: Heikki Linnakangas wrote: It still feels unsafe to call ExecEvalExpr while holding on to xml structs. It means that it's not safe for external modules to use libxml2 and call xmlMemSetup or xmlSetGenericErrorFunc themselves. Well yeah, they shouldn't do that. I don't think we want to support that. I'm with Heikki that it would be cleaner/safer if we could allow that. The particular case that's bothering me is the idea that something like Perl could well try to use libxml internally, and if so it'd very likely call these functions. Now if Perl thinks it's got sole control, and tries to (say) re-use the results of xmlInitParser across calls, we're screwed anyway. But that's not an argument for designing our own code in a way that guarantees it can't share libxml with other code. So I'm thinking that we should continue to call xmlMemSetup, xmlSetGenericErrorFunc, and xmlInitParser (if needed) at the start of every XML function, and reorganize the code so that we don't call out to random other code until we've shut down libxml again. The main disadvantage I can see of reorganizing like that is that it will increase xmlelement's transient memory consumption, since it will need to accumulate all the OutputFunctionCall result strings before it starts to pass them to libxml. This probably isn't a huge problem though. Has anyone started actively working on this yet? If not, I can tackle it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?
Zdenek Kotala [EMAIL PROTECTED] writes: This construct is used to determine if max_value/min_value is used and after that pg_dump add NO MAXVALUE to the output instead of the value. If I compare it with manual documentation NO MAXVALUE uses default value and I do not see any reason to have this code there. I think we can remove this code and release dependency on sequence.h. IIUC you are proposing that it's okay to print random-huge-values in the dumped CREATE SEQUENCE commands. I don't agree with that. In particular it would make for a legacy/compatibility issue for INT64_IS_BROKEN platforms. A better solution might be to move the declarations of SEQ_MINVALUE/SEQ_MAXVALUE someplace else. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] sending row data to frontend - reg
Rose Catherine K [EMAIL PROTECTED] writes: we are trying to modify the source code of postgresql to support a new command pivot colA colB tablename we are doing the following steps: 1. After parsing the input, we are able to get the values entered by the user for column A , column B and tablename. 2. Then, from postgresmain, we call exec_simple_query with a new query select * from tablename 3. Then we read the result of execution of this query from sendrowdescriptionmessage and printtup functions and also, don't allow these to send the result of the select query to the frontend 4. now, we do the required changes needed for the pivot command 5. after this, from executorrun, we call sendrowdescriptionmessage function ( with modifications) to send the newly calculated row description - this is working fine 6. Now, we want to send the newly calculated row data to the frontend. We are not able to do this... It sounds to me like you've kluged the I/O support stuff to the point that it doesn't work at all. Leave that alone and use SPI to execute your internal query. As a general style suggestion, if you are touching postgres.c at all to add a new command, you are putting it in the wrong place. The guts of it should be under backend/commands/. I think you ought to look at existing utility commands that return rows, such as EXPLAIN or SHOW, for guidance. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Visibility map thoughts
On Mon, 2007-11-05 at 09:52 +, Heikki Linnakangas wrote: Reducing VACUUM time is important, but the real big promise is the ability to do index-only-scans. Because that's the main focus of this exercise, I'm calling it the the Visibility Map from now on, because it's not about tracking dead space, but tuple visibility in general. Don't worry, reduced VACUUM times on read-mostly tables with hot spots will still fall out of it. I like Visibility map because it's a positive name, and that prevents confusion over double-negatives (for the same reason it's called synchronous_commit even though the new feature is the ability to be asynchronous). With Dead Space Map, I wouldn't immediately know whether a 1 means always visible or might be invisible. However, DSM is a much less overloaded acronym than VM ;) Regarding the focus, it will depend a lot on the user. Some people will care more about VACUUM and some will care more about index-only scans. It's not useful for VACUUM FREEZE, unless we're willing to freeze much more aggressively, and change the meaning of a set bit to all tuples on heap page are frozen. This means that a regular VACUUM will no longer be enough to ensure safety from transaction id wraparound. I don't think this will be hard to fix, but it's an extra detail that would need to be decided. The most apparent options appear to be: 1) Do as you say above. What are some of the cost trade-offs here? It seems that frequent VACUUM FREEZE runs would keep the visibility map mostly full, but will also cause more writing. I suppose the worst case is that every tuple write needs results in two data page writes, one normal write and another to freeze it later, which sounds bad. Maybe there's a way to try to freeze the tuples on a page before it's written out? The idea of frozen and always visible seem very close in concept to me. 2) Change to autovacuum to FREEZE on the forced autovacuum to prevent wraparound. 
3) Use multiple bits per visibility map 4) Have multiple types of visibility maps The more I think about the visibility map, the more I think it will be a huge win for PostgreSQL. It's especially nice that it works so well with HOT. Thanks for working on it! Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?
Tom Lane wrote: A better solution might be to move the declarations of SEQ_MINVALUE/SEQ_MAXVALUE someplace else. Hmm. That seems better, but it is also hard to find the correct place. :( I'm thinking of putting it into c.h. Another question: why doesn't a sequence have a separate flag which determines whether it uses the default/no max value or a predefined one? Any comments or better ideas? Zdenek
Re: [HACKERS] Is necessary to use SEQ_MAXVALUE in pg_dump?
Zdenek Kotala [EMAIL PROTECTED] writes: Tom Lane wrote: A better solution might be to move the declarations of SEQ_MINVALUE/SEQ_MAXVALUE someplace else. Hmm. It seems better, but it is also hard to find correct place. :( I'm thinking put it into c.h. The idea that was in the back of my mind was pg_config_manual.h, since these numbers could be seen as configuration constants if you hold your head at the right angle ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] should I worry?
Doesn't DROP TRIGGER require the name of the trigger? He says they are unnamed. How then does he drop them? On Nov 5, 2007, at 6:31 AM, Tom Lane wrote: [EMAIL PROTECTED] writes: On Sun, 4 Nov 2007, Tom Lane wrote: So you have a *bunch* of partially broken FK constraints in that source database. I just talked to my customer and he/we'll run a big batch deleting and recreating all foreign keys on 8.2.5. The question is, how do we get rid of those useless unnamed triggers? DROP TRIGGER should be fine. regards, tom lane
Re: [HACKERS] should I worry?
Rick Gigger wrote: Doesn't DROP TRIGGER require the name of the trigger? He says they are unnamed. How then does he drop them? They're not really unnamed. pg_dump just replaces the real name with unnamed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Visibility map thoughts
Heikki Linnakangas [EMAIL PROTECTED] writes: Though 8.3 isn't out of the oven just yet, I've been thinking about the dead space map a lot, and decided I have to start writing down those thoughts. I think we should do this at the same time as pushing the FSM out of shared memory, and design a data structure that serves both needs. Where to store the visibility map? -- a) In a fixed size shared memory struct. Not acceptable. b) In the special area of every nth heap page. I played a bit with this back in February 2006, because it's simple and requires few changes. c) As a new relation kind, like toast tables. d) As an auxiliary smgr relation, attached to the heap. I'm leaning towards D at the moment. It requires a little bit of changes to the buffer manager API and elsewhere, but not too much. I like B. The main objection I have to D is that it'll be far more invasive to the buffer manager (and a bunch of other code) than you admit; for starters RelFileNode won't work as-is. Another problem with D is that you can't readily make the number of blocks per visibility page an exact power of 2, unless you are willing to waste near half of each visibility page. That will complicate and slow down addressing ... ok, maybe not much, but some. What I'm envisioning is that we dedicate a quarter or half of every n'th heap page to visibility + free space map. Probably a byte per page is needed (this would give us 1 visibility bit and 7 bits for free space, or other tradeoffs if needed). So there would be 2K or 4K heap pages associated with each such special page. Or we could dial it down to even less, say 1K heap pages per special page, which would have the advantage of reducing update contention for those pages. Setting a bit is just a hint. It's ok to lose it on crash. However, setting a bit mustn't hit the disk too early. 
What might otherwise happen is that the change that made all tuples on a page visible, like committing an inserting transaction, isn't replayed after crash, but the set bit is already on disk. In other words, setting a bit must set the LSN of the visibility map page. I don't think this really works. You are effectively assuming that no kind of crash-induced corruption can set a bit that you didn't intend to set. Stated in those terms it seems obviously bogus. What we will need is that every WAL-logged update operation also include the appropriate setting or clearing of the page's visibility bit; where necessary, add new information to the WAL trace to allow this. To further reduce contention, we can have a copy of the bit in the page header of the heap page itself. That way we'd only need to access the visibility map on the first update on a page that clears a bit. Seems exceedingly prone to corruption. - We don't need to clear the bit on HOT updates, because by definition none of the indexed columns changed. Huh? I don't think I believe that, and I definitely don't believe your argument for it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
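The "byte per heap page" layout floated above (one visibility bit plus the remaining bits for free space) could be encoded roughly as follows. This is a hedged sketch with assumed granularity, not a proposal for an actual on-disk format; the bucket rounding is an illustrative choice.

```python
# Sketch (assumptions: 8 KB pages, 1 visibility bit + 7-bit free-space
# bucket per heap page, as floated in the discussion above).
def pack(all_visible, free_bytes, page_size=8192):
    """Pack an all-visible flag and free space into one map byte."""
    bucket = min(free_bytes * 127 // page_size, 127)  # 7-bit granularity
    return (int(all_visible) << 7) | bucket

def unpack(b, page_size=8192):
    """Recover (all_visible, approximate free bytes) from a map byte."""
    return bool(b >> 7), (b & 0x7F) * page_size // 127
```

With 7 bits, free space is tracked to roughly page_size/127 (about 64-byte) granularity, which suggests why a byte per page could plausibly serve both the visibility map and a coarse FSM at once.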
Re: [HACKERS] should I worry?
Heikki Linnakangas [EMAIL PROTECTED] writes: Rick Gigger wrote: Doesn't DROP TRIGGER require the name of the trigger? He says they are unnamed. How then does he drop them? They're not really unnamed. pg_dump just replaces the real name with unnamed. And \d will show the real names of the triggers, so it's not really that hard to drop them:

    u=# \d t1
         Table "public.t1"
     Column |  Type   | Modifiers
    --------+---------+-----------
     f1     | integer | not null
    Indexes:
        "t1_pkey" PRIMARY KEY, btree (f1)
    Triggers:
        "RI_ConstraintTrigger_229629" AFTER DELETE ON t1 FROM t2 NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_del"('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')
        "RI_ConstraintTrigger_229630" AFTER UPDATE ON t1 FROM t2 NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd"('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1')

    u=# drop trigger "RI_ConstraintTrigger_229629" on t1;
    DROP TRIGGER

I do recall newbies forgetting to double-quote the mixed-case trigger names when this came up in times past, though. regards, tom lane
Re: [HACKERS] Open items for 8.3
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Mon, 05 Nov 2007 18:57:39 + Gregory Stark [EMAIL PROTECTED] wrote: Joshua D. Drake [EMAIL PROTECTED] writes: On Mon, 5 Nov 2007 12:47:10 -0500 (EST) Bruce Momjian [EMAIL PROTECTED] wrote: In an attempt to move us toward 8.3 RC1 I have put all open item emails into the patches queue: http://momjian.postgresql.org/cgi-bin/pgpatches It would be great if this would push to the wiki. That has been working really well through the cycle. How many developers have even jumped through the hoops to get wiki accounts? You don't need a wiki account to view. - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHL4+NATb/zqfZUUQRAn4KAJ9AsHQODMrle5yRmxz8rGbG4upsXwCbBjLH 1FeUN3lIPZGw/RUF4MIY59Q= =nGLR -END PGP SIGNATURE- ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Segmentation fault using digest from pg_crypto
Bruce Momjian wrote: Marko Kreen wrote: On 8/24/07, Manuel Sugawara [EMAIL PROTECTED] wrote: If something is not going to work (or is going to work in a different way) in some version after loading a shot from a previous one, I think it should be documented or some kind of backwards-compatibility mechanism should be provided. That's a valid complaint and I take the blame. The problem was that such evolution was not actually planned. So when I noticed the 8.2 commit, I did not think hard enough about the implications to realize the need for a release note for it. Tom, how about putting a note about that into the next 8.2 minor release notes? (8.3 too?) Something like: You need to refresh the pgcrypto functions, because since 8.2 the code depends on the functions being tagged STRICT. Seems 8.2.5 was released without this release-notes mention, but we haven't gotten any complaints about it, so perhaps we don't need to add anything. Huh, I see exactly that complaint above. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "I agree with you that absolute truth does not exist... The problem is that lies do exist, and you are lying" (G. Lama)
Re: [HACKERS] Segmentation fault using digest from pg_crypto
Alvaro Herrera wrote: If something is not going to work (or is going to work in a different way) in some version after loading a shot from a previous one, I think it should be documented or some kind of backwards-compatibility mechanism should be provided. That's a valid complaint and I take the blame. The problem was that such evolution was not actually planned. So when I noticed the 8.2 commit, I did not think hard enough about the implications to realize the need for a release note for it. Tom, how about putting a note about that into the next 8.2 minor release notes? (8.3 too?) Something like: You need to refresh the pgcrypto functions, because since 8.2 the code depends on the functions being tagged STRICT. Seems 8.2.5 was released without this release-notes mention, but we haven't gotten any complaints about it, so perhaps we don't need to add anything. Huh, I see exactly that complaint above. My point is that we can't mention it in the release notes until 8.2.6. Will there still be people having an issue with it who haven't found the problem already? And if we do put it in 8.2.6, the change really took effect in 8.2, so do we just mention it in 8.2.6 and say it applies to 8.2.X too? -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Test lab
On 11/4/07, Simon Riggs [EMAIL PROTECTED] wrote: Mark, why don't you post a TODO list for TPC-E somewhere, so people can bite small pieces off of the list? I'm sure there are lots of people who can help if we do it that way. This should be a good start: http://osdldbt.sourceforge.net/dbt5/todo.html Regards, Mark
Re: [HACKERS] Visibility map thoughts
Jeff Davis wrote: On Mon, 2007-11-05 at 09:52 +, Heikki Linnakangas wrote: It's not useful for VACUUM FREEZE, unless we're willing to freeze much more aggressively, and change the meaning of a set bit to all tuples on heap page are frozen. This means that a regular VACUUM will no longer be enough to ensure safety from transaction id wraparound. Good point. So we'd still need regular VACUUMs every now and then. (Gosh, we really need a name for that sort of vacuum. I was about to say we'd still need regular regular VACUUMs :-)) I don't think this will be hard to fix, but it's an extra detail that would need to be decided. The most apparent options appear to be: 1) Do as you say above. What are some of the cost trade-offs here? It seems that frequent VACUUM FREEZE runs would keep the visibility map mostly full, but would also cause more writing. I suppose the worst case is that every tuple write results in two data page writes, one normal write and another to freeze it later, which sounds bad. Maybe there's a way to try to freeze the tuples on a page before it's written out? It would also create more WAL traffic, because freezing tuples needs to be WAL-logged. 2) Change autovacuum to FREEZE on the forced autovacuum run that prevents wraparound. Doesn't necessarily need to be a VACUUM FREEZE. Just a regular VACUUM instead of one using the visibility map. 3) Use multiple bits per page in the visibility map. That would work. 4) Have multiple types of visibility maps. I'd rather not do that. It starts to get more complex and more expensive to update. 5) Have a finer-grained equivalent of relfrozenxid. For example, one frozenxid per visibility map page, so that whenever you update the visibility map, you also update the frozenxid. To advance the relfrozenxid in pg_class, you scan the visibility map and set relfrozenxid to the smallest frozenxid. Unlike relfrozenxid, it could be set to FrozenXid if the group of pages is totally frozen. 
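Option 5 is easy to sketch. A minimal illustration in Python (not PostgreSQL code; the names FrozenXidMap, note_unfrozen, freeze_group, and the FROZEN sentinel are all invented for this sketch):

```python
FROZEN = None  # sentinel: every tuple in this map page's group is frozen


class FrozenXidMap:
    """One frozenxid per visibility-map page (sketch of option 5)."""

    def __init__(self, n_map_pages, initial_xid):
        self.frozenxid = [initial_xid] * n_map_pages

    def note_unfrozen(self, map_page, xmin):
        # A new, unfrozen tuple appeared under map_page: the group's
        # frozenxid can be no newer than the tuple's xmin.
        cur = self.frozenxid[map_page]
        if cur is FROZEN or xmin < cur:
            self.frozenxid[map_page] = xmin

    def freeze_group(self, map_page):
        # VACUUM FREEZE finished every heap page covered by map_page.
        self.frozenxid[map_page] = FROZEN

    def relfrozenxid(self):
        # pg_class.relfrozenxid can be advanced to the smallest per-group
        # frozenxid, or to FrozenXid if all groups are fully frozen.
        live = [x for x in self.frozenxid if x is not FROZEN]
        return min(live) if live else FROZEN
```

So advancing relfrozenxid becomes a scan over the (small) map rather than the whole heap, at the cost of one extra write per map update.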
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] should I worry?
Ah, yes it was the quotes. I guess that makes me a newbie. :) On Nov 5, 2007, at 1:53 PM, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Rick Gigger wrote: Doesn't DROP TRIGGER require the name of the trigger? He says they are unnamed. How then does he drop them? They're not really unnamed. pg_dump just replaces the real name with unnamed. And \d will show the real names of the triggers, so it's not really that hard to drop them: u=# \d t1 Table public.t1 Column | Type | Modifiers +-+--- f1 | integer | not null Indexes: t1_pkey PRIMARY KEY, btree (f1) Triggers: RI_ConstraintTrigger_229629 AFTER DELETE ON t1 FROM t2 NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE RI_FKey_noaction_del('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1') RI_ConstraintTrigger_229630 AFTER UPDATE ON t1 FROM t2 NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE RI_FKey_noaction_upd('unnamed', 't2', 't1', 'UNSPECIFIED', 'f2', 'f1') u=# drop trigger RI_ConstraintTrigger_229629 on t1; DROP TRIGGER I do recall newbies forgetting to double-quote the mixed-case trigger names when this came up in times past, though. regards, tom lane
Re: [HACKERS] Visibility map thoughts
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Though 8.3 isn't out of the oven just yet, I've been thinking about the dead space map a lot, and decided I have to start writing down those thoughts. I think we should do this at the same time as pushing the FSM out of shared memory, and design a data structure that serves both needs. I'm afraid the data structure required for FSM is very different. For the visibility map, we need something that can be quickly addressed by heap block number. For FSM, we need something that's addressable by tuple size. Cache locality of having the FSM and the visibility map in the same structure is also not very good. Visibility map is accessed by reads, and updates/inserts/deletes on pages that previously had a set bit in the map, while the FSM is only needed for cold updates and deletes. Where to store the visibility map? -- a) In a fixed size shared memory struct. Not acceptable. b) In the special area of every nth heap page. I played a bit with this back in February 2006, because it's simple and requires few changes. c) As a new relation kind, like toast tables. d) As an auxiliary smgr relation, attached to the heap. I'm leaning towards D at the moment. It requires a little bit of changes to the buffer manager API and elsewhere, but not too much. I like B. The main objection I have to D is that it'll be far more invasive to the buffer manager (and a bunch of other code) than you admit; for starters RelFileNode won't work as-is. One problem is that you have to atomically update the visibility map when you update the heap. That means that you have to lock the visibility map page and the heap page at the same time. If the visibility map is in the heap, you need to take care that you don't deadlock. We can't directly use B for the FSM, because we also need a FSM for indexes, so we'll need something else for indexes anyway. 
B is also problematic if we ever get to do upgrades without dump/restore, because it requires changes to the format of the heap itself. What I'm envisioning is that we dedicate a quarter or half of every n'th heap page to visibility + free space map. Probably a byte per page is needed (this would give us 1 visibility bit and 7 bits for free space, or other tradeoffs if needed). So there would be 2K or 4K heap pages associated with each such special page. Or we could dial it down to even less, say 1K heap pages per special page, which would have the advantage of reducing update contention for those pages. How would you search that for a page with X bytes of free space? I don't think this really works. You are effectively assuming that no kind of crash-induced corruption can set a bit that you didn't intend to set. Don't we already assume that for hint-bit updates? Stated in those terms it seems obviously bogus. What we will need is that every WAL-logged update operation also include the appropriate setting or clearing of the page's visibility bit; where necessary, add new information to the WAL trace to allow this. I think we already emit a WAL record whenever we would set the bit in the visibility map, so we can certainly do that. It was more an interesting observation than anything else. To further reduce contention, we can have a copy of the bit in the page header of the heap page itself. That way we'd only need to access the visibility map on the first update on a page that clears a bit. Seems exceedingly prone to corruption. It does? It seems fairly simple to keep it up-to-date. Whenever we update the visibility map, we're holding the heap page locked as well. If you were thinking of hardware failure, I don't see how that bit would be more susceptible to that, and the potential damage isn't any bigger than from any other random bit-flip either. 
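To illustrate the byte-per-heap-page layout proposed above (one visibility bit plus seven free-space bits), here is a sketch; the 64-byte free-space granularity is just one possible encoding of the 7 bits, not something settled:

```python
BLCKSZ = 8192
GRAIN = BLCKSZ // 128  # 64-byte buckets: 7 bits cover 0..8128 bytes free


def encode(all_visible, free_bytes):
    """Pack one visibility bit and a 7-bit free-space bucket into a byte."""
    bucket = min(free_bytes // GRAIN, 127)
    return (int(all_visible) << 7) | bucket


def decode(b):
    """Return (all_visible, conservative free-space estimate in bytes)."""
    return bool(b & 0x80), (b & 0x7F) * GRAIN
```

The rounding is deliberately downward, so the free-space estimate is conservative; searching for a page with X bytes free still means scanning these bytes, which is the objection raised above.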
We can double-check and correct both the bit in the page header and in the visibility map whenever we perform a regular vacuum (or prune), if they did get corrupt for some reason. - We don't need to clear the bit on HOT updates, because by definition none of the indexed columns changed. Huh? I don't think I believe that, and I definitely don't believe your argument for it. HOT-updating a row doesn't change what an index-only scan would see. An index-only scan only needs columns that are in the index, and since it's a HOT update, none of those columns have changed, so it doesn't matter which tuple we're returning them from. Pages that have HOT updated tuples, but haven't otherwise been modified since last vacuum are also not interesting for VACUUM, because a prune will do the job of getting rid of dead HOT-updated tuples just as well. Am I missing something? Another thought: Should we support having a FSM and visibility map on temp tables? Doesn't seem very useful, but why not, I guess. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] plpgsql keywords are hidden reserved words
On Nov 5, 2007, at 10:20 AM, Tom Lane wrote: I don't see any fix for this that's reasonable to try to shoehorn into 8.3, but I think we really need to revisit the whole area of plpgsql variable substitution during 8.4. We could make this problem go away if variable substitution happened through a parser callback instead of before parsing. Is there any feasibility to the idea of allowing pl/pgsql variables and parameters to be prefixed with a special character like '$'? I'm constantly adding prefixes like 'v_' because of conflicts with table or column names. It would be nice to have something like declare $myvar integer; so it would be very easy to distinguish variable and parameter names from structure names or reserved words. John DeSoi, Ph.D.
Re: [HACKERS] A small rant about coding style for backend functions
On 11/6/07, Bruce Momjian [EMAIL PROTECTED] wrote: I understand your suggestions but it seems there would be too many individual items to be readable. Can you suggest a full list so we can get an idea of how long it would be? If the body of material on writing good Postgres code becomes so comprehensive that it doesn't all fit on one page: a) I would see that as a positive! b) We can of course split it up into sub-articles. I can't realistically suggest a full list since I am not an experienced Postgres hacker. But I would hope for, as a minimum: * the stuff about good GETARG style, * something about using ereport() effectively, and writing localization-friendly messages, * some actual coding style guidelines. Regards, BJ
Re: [HACKERS] Open items for 8.3
On Mon, 5 Nov 2007, Gregory Stark wrote: How many developers have even jumped through the hoops to get wiki accounts? According to http://developer.postgresql.org/index.php?title=Special:Listusers&group=pgdev&limit=500 there are currently 51 members of the group pgdev on the wiki. -- Spare no expense to save money on this one. -- Samuel Goldwyn
Re: [HACKERS] Open items for 8.3
How many developers have even jumped through the hoops to get wiki accounts? According to http://developer.postgresql.org/index.php?title=Special:Listusers&group=pgdev&limit=500 there are currently 51 members of the group pgdev on the wiki. Well, a lot of those people aren't actually developers, which was the question. But a lot of them are. More interesting would be how many have ever edited anything past just signing up... /Magnus
Re: [HACKERS] Hash index todo list item
Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good a choice as btree indexes but a significantly better choice, at least for some specific circumstances. We performed some probe tests of our hash index patch against the original btree index to compare the performance of the two. We used an 80 million tuple table. The table contained a single integer attribute, and the data range was 0 - 1 (generated by a random generator). We did the following:
1. Populate the table with 80 million tuples.
2. Create a HASH index on the table.
3. Clear both the Linux cache and the psql buffers. (Exiting psql and restarting it cleared the psql buffers; to clear the Linux cache, we used the drop_cache command.)
4. Start psql.
5. Select on an integer in the range of values in the table. (All test numbers were big ones, like 98934599.)
6. Record the time.
7. Exit psql.
8. Drop caches (as described above).
9. Repeat 4-8 for different numbers.
10. Drop the hash index.
11. Create the btree index and repeat 3-9.
The results are shown below. All trials are in ms. Count is the number of such tuples in the table. 
HASH INDEX:
Number    Count  Trial1  Trial2  Trial3
21367898  2      152.0   129.5   131.1
95678699  2      140.6   145.6   137.5
99899799  2      148.0   147.4   152.6
97677899  2      142.0   153.5   112.0
94311123  2      137.6   146.3   141.3
67544455  2      141.6   144.6   152.7
88877677  2      122.1   123.1   130.8
88788988  2      150.2   150.0   171.7
4444      1      119.9   116.3   117.6
34267878  1      97.5    99.9    114.8
55489781  2      115.7   117.2   145.3
97676767  1      169.5   168.5   181.7
99767564  1      142.7   133.6   132.7
21367989  4      198.0   193.2   186.7

BTREE INDEX:
Number    Trial1  Trial2  Trial3
21367898  145.5   145.6   150.6
95678699  177.5   170.0   176.4
99899799  175.4   181.2   185.4
97677899  136.9   149.0   130.8
94311123  179.0   175.3   166.3
67544455  161.7   162.2   170.4
88877677  138.7   135.2   149.9
88788988  165.7   179.3   186.3
4444      166.0   172.8   184.3
34267878  159.1   168.8   147.8
55489781  183.2   195.4   185.1
97676767  147.2   143.6   132.6
99767564  167.8   178.3   162.1
21367989  232.3   238.1   216.9

From the results obtained, the average of all the hash probes is 141.8 ms and the average for btree is 168.5 ms, a difference of about 27 ms. The standard deviations are about 23, so this is a statistically significant difference. Our prediction, that the hash index would take on average one probe at about 10 ms while the btree would take three probes at about 30 ms, for a difference of about 20 ms, was borne out fairly well by the observed difference of about 27 ms. Hope these data points will help with some questions about the performance differences between hash and btree indexes. Regards, Shreya Gregory Stark [EMAIL PROTECTED] wrote: Kenneth Marshall writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ... That is super! (and timely) It is pretty super. I have a few comments to raise but don't take it to be too negative, it sounds like this is a big step forward towards making hash indexes valuable. Firstly, in the graphs it seems the index size graph has an exponential x-axis but a linear y-axis. 
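The quoted averages check out; here is the arithmetic, re-runnable, with the trial data transcribed from the tables above:

```python
from statistics import mean

# Hash index probe times (ms), three trials per test number
hash_ms = [
    152.0, 129.5, 131.1, 140.6, 145.6, 137.5, 148.0, 147.4, 152.6,
    142.0, 153.5, 112.0, 137.6, 146.3, 141.3, 141.6, 144.6, 152.7,
    122.1, 123.1, 130.8, 150.2, 150.0, 171.7, 119.9, 116.3, 117.6,
    97.5, 99.9, 114.8, 115.7, 117.2, 145.3, 169.5, 168.5, 181.7,
    142.7, 133.6, 132.7, 198.0, 193.2, 186.7,
]
# Btree index probe times (ms), same test numbers
btree_ms = [
    145.5, 145.6, 150.6, 177.5, 170.0, 176.4, 175.4, 181.2, 185.4,
    136.9, 149.0, 130.8, 179.0, 175.3, 166.3, 161.7, 162.2, 170.4,
    138.7, 135.2, 149.9, 165.7, 179.3, 186.3, 166.0, 172.8, 184.3,
    159.1, 168.8, 147.8, 183.2, 195.4, 185.1, 147.2, 143.6, 132.6,
    167.8, 178.3, 162.1, 232.3, 238.1, 216.9,
]

print(round(mean(hash_ms), 1))                  # 141.8
print(round(mean(btree_ms), 1))                 # 168.5
print(round(mean(btree_ms) - mean(hash_ms), 1)) # 26.7, i.e. "about 27"
```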
This makes it hard to see what I would expect to be pretty much linear growth. The curves look exponential, which would mean linear growth, but of course it's hard to tell. Also, the growth in the time chart looks pretty much linear. That seems weird, since I would expect there to be a visible extra component given that sort times are n log(n). Perhaps you need to test still larger tables to see that though. In any case it's clear from the results you have there that the change is a positive one and fixes a fundamental problem with the hash index build code. Something else you should perhaps test is indexing something which is substantially larger than the hash function output. A hash function is going to make the most sense when indexing something like strings, for which you want to avoid the long comparison costs. Especially consider testing this on a UTF8 locale with expensive comparisons (like a CJK locale for example).
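The "one probe vs. three probes" prediction quoted in the results above follows from simple arithmetic. A sketch, where the 500-keys-per-page fanout and the 10 ms random-read cost are assumptions for illustration, not measured values:

```python
import math

TUPLES = 80_000_000   # rows in the test table
FANOUT = 500          # assumed btree keys per index page
READ_MS = 10          # assumed cost of one uncached random page read

# Btree: root-to-leaf descent reads one page per level
btree_probes = math.ceil(math.log(TUPLES, FANOUT))
# Hash (ideal case): hash the key, read the one bucket page
hash_probes = 1

print(btree_probes)                            # 3
print((btree_probes - hash_probes) * READ_MS)  # 20 ms predicted difference
```

With everything cold-cached, each probe is a random read, which is why the predicted ~20 ms gap lands in the neighborhood of the measured ~27 ms.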
Re: [HACKERS] Visibility map thoughts
Heikki Linnakangas [EMAIL PROTECTED] writes: One problem is that you have to atomically update the visibility map when you update the heap. That means that you have to lock the visibility map page and the heap page at the same time. If the visibility map is in the heap, you need to take care that you don't deadlock. Well that's still a problem if it's in another filenode. On the other hand if you allocate a whole byte to every page you could set it atomically and not have to lock the page. Effectively having the lock on the original page protect the info byte. Whereas setting a single bit requires protecting against someone setting one of the other bits corresponding to another page entirely. - We don't need to clear the bit on HOT updates, because by definition none of the indexed columns changed. Huh? I don't think I believe that, and I definitely don't believe your argument for it. HOT-updating a row doesn't change what an index-only scan would see. An index-only scan only needs columns that are in the index, and since it's a HOT update, none of those columns have changed, so it doesn't matter which tuple we're returning them from. Pages that have HOT updated tuples, but haven't otherwise been modified since last vacuum are also not interesting for VACUUM, because a prune will do the job of getting rid of dead HOT-updated tuples just as well. Am I missing something? I think the point is that for index-only scans you really just want to know the visibility of the whole chain. The whole chain is either known-visible or not. A HOT update doesn't change that, it just changes which version along the chain is visible and the values of the non-indexed columns in that version. Some thought has to be given to the transition stages when you create or drop an index but I don't see a problem there. 
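The space cost of the byte-per-page variant suggested above is easy to put a number on (a sketch assuming 8192-byte heap pages):

```python
BLCKSZ = 8192


def map_bytes(heap_bytes, bits_per_heap_page):
    """Size of the visibility map for a heap of the given size."""
    heap_pages = heap_bytes // BLCKSZ
    return heap_pages * bits_per_heap_page // 8


TB = 1 << 40
print(map_bytes(TB, 1) // (1 << 20))  # 16: MB of map for a 1 TB heap, 1 bit/page
print(map_bytes(TB, 8) // (1 << 20))  # 128: MB of map with a whole byte per page
```

So the byte-per-page map costs 8x the footprint but stays small enough to cache, which is the trade being proposed for the lock-free info byte.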
If you have a broken hot chain it doesn't change the visibility rules for anyone who does use an old index (and nobody who could see the broken end of the chain should be using the new index anyways). The problem with this is that it's different from what a DSM would need. We could skip vacuuming such HOT updated dead tuples, assuming a page prune will get it the next time we access the page I suppose. Or we could use a separate bit for more aggressive vacuum information. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!