[HACKERS] HOLD ON BETA2
[ Subject changed.] Marc, please hold on announcing beta2 until we get this resolved. Thanks. --- Tom Lane wrote: I think I've identified a primary cause for the no such pg_clog file problem that we've seen reported several times. A look at htup.h shows that the WAL only stores the low 8 bits of a tuple's t_infomask (see xl_heap_header struct). There is some fooling around in heapam.c's WAL redo routines to try to reconstitute some of the high-order bits, for example this: htup-t_infomask = HEAP_XMAX_INVALID | xlhdr.mask; But this is implicitly assuming that we can reconstruct the XMIN_COMMITTED bit at will. That was true when the WAL code was written, but with 7.2's ability to recycle allegedly-no-longer-needed pg_clog data, we cannot simply drop commit status bits. The only scenario I've been able to identify in which this actually causes a failure is when VACUUM FULL moves an old tuple and then shortly afterwards (before the next checkpoint) there is a crash. Post-crash, the tuple move will be redone from WAL, and the moved tuple will be inserted with zeroed-out commit status bits. When we next examine the tuple, we have to try to retrieve its commit status from pg_clog ... but it's not there anymore. As far as I can see, the only realistic solution is to store the full 16 bits of t_infomask in the WAL. We could do this without increasing WAL size by dropping the t_hoff field from xl_heap_header --- t_hoff is computable given the number of attributes and the HASNULL/HASOID bits, both of which are available. (Actually, we could save some space now by getting rid of t_oid in xl_heap_header; it's not necessary given that OID isn't in the fixed tuple header anymore.) This will require a WAL format change of course. Fortunately we can do that without forcing a complete initdb (people will have to run pg_resetxlog if they want to update a 7.3beta2 database without initdb). I see no way to fix the problem in the context of 7.2. Perhaps we should put out a bulletin warning people to avoid VACUUM FULL in 7.2, or at least to do CHECKPOINT as soon as possible after one. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] HOLD ON BETA2
Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz --- Tom Lane wrote: I think I've identified a primary cause for the no such pg_clog file problem that we've seen reported several times. A look at htup.h shows that the WAL only stores the low 8 bits of a tuple's t_infomask (see xl_heap_header struct). There is some fooling around in heapam.c's WAL redo routines to try to reconstitute some of the high-order bits, for example this: htup-t_infomask = HEAP_XMAX_INVALID | xlhdr.mask; But this is implicitly assuming that we can reconstruct the XMIN_COMMITTED bit at will. That was true when the WAL code was written, but with 7.2's ability to recycle allegedly-no-longer-needed pg_clog data, we cannot simply drop commit status bits. The only scenario I've been able to identify in which this actually causes a failure is when VACUUM FULL moves an old tuple and then shortly afterwards (before the next checkpoint) there is a crash. Post-crash, the tuple move will be redone from WAL, and the moved tuple will be inserted with zeroed-out commit status bits. When we next examine the tuple, we have to try to retrieve its commit status from pg_clog ... but it's not there anymore. As far as I can see, the only realistic solution is to store the full 16 bits of t_infomask in the WAL. We could do this without increasing WAL size by dropping the t_hoff field from xl_heap_header --- t_hoff is computable given the number of attributes and the HASNULL/HASOID bits, both of which are available. (Actually, we could save some space now by getting rid of t_oid in xl_heap_header; it's not necessary given that OID isn't in the fixed tuple header anymore.) This will require a WAL format change of course. Fortunately we can do that without forcing a complete initdb (people will have to run pg_resetxlog if they want to update a 7.3beta2 database without initdb). I see no way to fix the problem in the context of 7.2. Perhaps we should put out a bulletin warning people to avoid VACUUM FULL in 7.2, or at least to do CHECKPOINT as soon as possible after one. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] HOLD ON BETA2
Bruce Momjian [EMAIL PROTECTED] writes: Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz No, but they've been out there for more than a day. I think it's too late to retract beta2. We could fix this problem and then do a quick beta3, though... regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] HOLD ON BETA2
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Oops, I see beta2's on the web site with yesterday's date. Have they been announced? -rw-r--r-- 1 70 70 1072573 Sep 25 10:15 postgresql-test-7.3b2.tar.gz No, but they've been out there for more than a day. I think it's too late to retract beta2. We could fix this problem and then do a quick beta3, though... Yes, that's the only solution. Let's not announce beta2 and make things worse. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org