Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On 03/12/2014 02:05 PM, Robert Haas wrote: On Wed, Mar 12, 2014 at 4:23 AM, Heikki Linnakangas wrote: The attached patch doesn't apply any more, but it looks like this issue still exists. Fixed. Did you forget to push? Yep. Pushed now. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Wed, Mar 12, 2014 at 4:23 AM, Heikki Linnakangas wrote: >> The attached patch doesn't apply any more, but it looks like this >> issue still exists. > > Fixed. Did you forget to push? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On 03/10/2014 09:44 PM, Robert Haas wrote: On Mon, Apr 30, 2012 at 1:34 PM, Noah Misch wrote: When GIN changes a metapage, we WAL-log its ex-header content and never use a backup block. This reduces WAL volume since the vast majority of the metapage is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged content if the metapage LSN predates the WAL record LSN. If a metapage write tore and updated the LSN but not the other content, we would fail to complete the update. Instead, unconditionally reinitialize the metapage similar to how _bt_restore_meta() handles the situation. I found this problem by code reading and did not attempt to build a test case illustrating its practical consequences. It's possible that there's no problem in practice on account of some reason I haven't contemplated. The attached patch doesn't apply any more, but it looks like this issue still exists. Fixed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Mon, Apr 30, 2012 at 1:34 PM, Noah Misch wrote: > When GIN changes a metapage, we WAL-log its ex-header content and never use a > backup block. This reduces WAL volume since the vast majority of the metapage > is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged > content if the metapage LSN predates the WAL record LSN. If a metapage write > tore and updated the LSN but not the other content, we would fail to complete > the update. Instead, unconditionally reinitialize the metapage similar to how > _bt_restore_meta() handles the situation. > > I found this problem by code reading and did not attempt to build a test case > illustrating its practical consequences. It's possible that there's no > problem in practice on account of some reason I haven't contemplated. The attached patch doesn't apply any more, but it looks like this issue still exists. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
Robert Haas writes: > Are you planning to commit Noah's patch? I wasn't intending to do so personally in the near future; I've got other things on my to-do list. I won't object if somebody else commits it though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Thu, May 3, 2012 at 12:16 AM, Tom Lane wrote: > Having said all that, I wasn't really arguing that this was a guaranteed > safe thing for us to rely on; just pointing out that it's quite likely > that the issue hasn't been seen in the field because of this type of > consideration. Well, we do rely, in numerous places, on writes << 512 bytes not getting torn. pd_prune_xid, index tuple kills, heap tuple hint bits, relmapper files, etc. We generally assume, for example, that a 4-byte write which is 4-byte aligned does not need to be WAL-logged, which would be necessary if we thought that the write might be torn. Are you planning to commit Noah's patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
Daniel Farina writes: > On Wed, May 2, 2012 at 6:06 PM, Noah Misch wrote: >> Can we indeed assume that all support-worthy filesystems align the start of >> every file to a physical sector? I know little about modern filesystem >> design, but these references leave me wary of that assumption: >> >> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html >> http://en.wikipedia.org/wiki/Block_suballocation >> >> If it is a safe assumption, we could exploit it elsewhere. > Not to say whether this is safe or not, but it *is* exploited > elsewhere, as I understand it: the pg_control information, whose > justification for its safety is its small size. That may point to a > very rare problem with pg_control rather the safety of the assumption > it makes. I think it's somewhat common now for filesystems to attempt to optimize very small files (on the order of a few dozen bytes) in that way. It's hard to see where's the upside for changing the conventional storage allocation when the file is sector-sized or larger; the file system does have to be prepared to rewrite the file on demand, and moving it from one place to another isn't cheap. That wikipedia reference argues for doing this type of optimization on the last partial block of a file, which is entirely irrelevant for our purposes since we always ask for page-multiples of space. (The fact that much of that might be useless padding is, I think, unknown to the filesystem.) Having said all that, I wasn't really arguing that this was a guaranteed safe thing for us to rely on; just pointing out that it's quite likely that the issue hasn't been seen in the field because of this type of consideration. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Wed, May 2, 2012 at 6:06 PM, Noah Misch wrote: > Can we indeed assume that all support-worthy filesystems align the start of > every file to a physical sector? I know little about modern filesystem > design, but these references leave me wary of that assumption: > > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html > http://en.wikipedia.org/wiki/Block_suballocation > > If it is a safe assumption, we could exploit it elsewhere. Not to say whether this is safe or not, but it *is* exploited elsewhere, as I understand it: the pg_control information, whose justification for its safety is its small size. That may point to a very rare problem with pg_control rather the safety of the assumption it makes. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Mon, Apr 30, 2012 at 02:35:20PM -0400, Tom Lane wrote: > Noah Misch writes: > > When GIN changes a metapage, we WAL-log its ex-header content and never use > > a > > backup block. This reduces WAL volume since the vast majority of the > > metapage > > is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged > > content if the metapage LSN predates the WAL record LSN. If a metapage > > write > > tore and updated the LSN but not the other content, we would fail to > > complete > > the update. Instead, unconditionally reinitialize the metapage similar to > > how > > _bt_restore_meta() handles the situation. > > > I found this problem by code reading and did not attempt to build a test > > case > > illustrating its practical consequences. It's possible that there's no > > problem in practice on account of some reason I haven't contemplated. > > I think there's no problem in practice; the reason is that the > GinMetaPageData struct isn't large enough to extend past the first > physical sector of the page. So it's in the same disk sector as the > LSN and tearing is impossible. Still, this might be a good > future-proofing move, in case GinMetaPageData gets larger. Can we indeed assume that all support-worthy filesystems align the start of every file to a physical sector? I know little about modern filesystem design, but these references leave me wary of that assumption: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14690.html http://en.wikipedia.org/wiki/Block_suballocation If it is a safe assumption, we could exploit it elsewhere. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
Noah Misch writes: > When GIN changes a metapage, we WAL-log its ex-header content and never use a > backup block. This reduces WAL volume since the vast majority of the metapage > is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged > content if the metapage LSN predates the WAL record LSN. If a metapage write > tore and updated the LSN but not the other content, we would fail to complete > the update. Instead, unconditionally reinitialize the metapage similar to how > _bt_restore_meta() handles the situation. > I found this problem by code reading and did not attempt to build a test case > illustrating its practical consequences. It's possible that there's no > problem in practice on account of some reason I haven't contemplated. I think there's no problem in practice; the reason is that the GinMetaPageData struct isn't large enough to extend past the first physical sector of the page. So it's in the same disk sector as the LSN and tearing is impossible. Still, this might be a good future-proofing move, in case GinMetaPageData gets larger. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Torn page hazard in ginRedoUpdateMetapage()
When GIN changes a metapage, we WAL-log its ex-header content and never use a backup block. This reduces WAL volume since the vast majority of the metapage is unused. However, ginRedoUpdateMetapage() only restores the WAL-logged content if the metapage LSN predates the WAL record LSN. If a metapage write tore and updated the LSN but not the other content, we would fail to complete the update. Instead, unconditionally reinitialize the metapage similar to how _bt_restore_meta() handles the situation. I found this problem by code reading and did not attempt to build a test case illustrating its practical consequences. It's possible that there's no problem in practice on account of some reason I haven't contemplated. Thanks, nm *** a/src/backend/access/gin/ginxlog.c --- b/src/backend/access/gin/ginxlog.c *** *** 492,504 ginRedoUpdateMetapage(XLogRecPtr lsn, XLogRecord *record) return; /* assume index was deleted, nothing to do */ metapage = BufferGetPage(metabuffer); ! if (!XLByteLE(lsn, PageGetLSN(metapage))) ! { ! memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData)); ! PageSetLSN(metapage, lsn); ! PageSetTLI(metapage, ThisTimeLineID); ! MarkBufferDirty(metabuffer); ! } if (data->ntuples > 0) { --- 492,503 return; /* assume index was deleted, nothing to do */ metapage = BufferGetPage(metabuffer); ! GinInitMetabuffer(metabuffer); ! memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData)); ! ! PageSetLSN(metapage, lsn); ! PageSetTLI(metapage, ThisTimeLineID); ! MarkBufferDirty(metabuffer); if (data->ntuples > 0) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers