Hi, At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote in <CAA4eK1JYyO5Hcxx4rP0jb=jmmc4qvy1yvg9uvkwnr5qrojs...@mail.gmail.com> > On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit.kapil...@gmail.com> > > wrote in > > <caa4ek1+-duto+myendle9p9u8g3fv6n+sojpsqmpsw6ashh...@mail.gmail.com> > >> On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmh...@gmail.com> wrote: > >> > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapil...@gmail.com> > >> > wrote: > >> >> I think we should do that as a separate patch (I can write the same as > >> >> well) because that is not new behavior introduced by this patch, ... > >> > > >> > True, although maybe hash indexes are the only way that happens today? > >> > > >> > >> No, I think it will happen for all other indexes as well. Basically, > >> we log new page for init forks of other indexes and then while > >> restoring that full page image, it will fall through the same path. > > > > (Sorry, I didn't noticed that hash metapage is not using log_newpgae) > > > > The bitmappage is also not using log_newpage. > > > For example, (bt|gin|gist|spgist|brin)buildempty are using > > log_newpage to log INIT_FORK metapages. I believe that the xlog > > flow from log_newpage to XLogReadBufferForRedoExtended is > > developed so that pages in init forks are surely flushed during > > recovery, so we should fix it instead adding other flushes if the > > path doesn't work. Am I looking wrong place? (I think it is > > working.) > > > > You are looking at right place. > > > If I'm understanding correctly here, hashbild and hashbuildempty > > should be refactored using the same structure with other *build > > and *buildempty functions. Specifically metapages initialization > > subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage > > or...) does only on-memory initialization and does not emit WAL, > > then *build and *buildempty emits WAL in their required way. > > > > I have considered this way of doing as well, read my first e-mail [1] > in this thread "Another approach to fix the issue ....". It is not > that we can't change the code of hashbuildempty to make it log new > pages for all kind of pages (metapage, bitmappage and datapages), but > I am not sure if there is a value in going in that direction. If we > have to go in that direction, we need to hard code some of the values > like hashm_nmaps and hashm_mapp in metapage rather than initializing > them after bitmappage creation. Before going in that direction, I > think we should also take opinion from other people especially some > committer as we might need to maintain two different ways of doing > almost the same thing.
Thanks for the explanation and the pointer (to the start of this thread.. sorry). > I am also not 100% comfortable with adding flush at two new places, > but that makes the code for fix simpler and fundamentally there is no > problem in doing so. I agree that the patch would be simpler. Ok, I am satisfied with an additional comment for _hash_init and hash_xlog_init_meta_page that describes the reason that _hash_init doesn't/can't use log_newpage and thus requires explicit flushes. Something like the description in [1] would be enough. > There is a small downside to always logging new page which is that it > will always log full page image in WAL which has the potential to > increase WAL volume if there are many unlogged tables and indexes. > Now considering the init fork generally has very less pages, it is not > a big concern, but still avoiding full page image has some value. Perhaps more effective mechanism will be developed at the time. > >> >>> By the way the comment of the function ReadBufferWithoutRelcache > >> >>> has the following sentense. > >> >>> > >> >>> | * NB: At present, this function may only be used on permanent > >> >>> relations, which > >> >>> | * is OK, because we only use it during XLOG replay. If in the > >> >>> future we > >> >>> | * want to use it on temporary or unlogged relations, we could pass > >> >>> additional > >> >>> | * parameters. > >> >>> > >> >>> and does > >> >>> > >> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, > >> >>> blockNum, > >> >>> mode, > >> >>> strategy, &hit); > >> >>> > >> >>> This surely works since BufferAlloc recognizes INIT_FORK. But > >> >>> some adjustment may be needed around here. > >> >> > >> >> Yeah, it should probably mention that the init fork of an unlogged > >> >> relation is also OK. > >> >> > >> > > >> > I think we should do that as a separate patch (I can write the same as > >> > well) because that is not new behavior introduced by this patch, but > >> > let me know if you think that we should add such a comment in this > >> > patch itself. > >> > > >> > >> Attached a separate patch to adjust the comment atop > >> ReadBufferWithoutRelcache. > > > > + * NB: At present, this function may only be used on permanent relations > > and > > + * init fork of an unlogged relation, which is OK, because we only use it > > + * during XLOG replay. If in the future we want to use it on temporary or > > + * unlogged relations, we could pass additional parameters. > > > > *I* think the target of the funcion is permanent relations and > > init forks, not unlogged relations. And I'd like to have an > > additional comment about RELPERSISTENCE_PERMANENT. Something like > > the attached. > > > > Okay, let's leave this for committer to decide. Agreed, thanks. > [1] - > https://www.postgresql.org/message-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w%40mail.gmail.com -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers