I have updated the patch based on the previous comments. Sorry for the late patch.
I removed `SetZeroDamagedPageInChecksum` and add `zeroDamagePage` flag in smgrread to determine whether we should zero damage page when an error happens. It depends on the caller. `GetRelationFilePath` is removed as well. We print segno on the fly. On Thu, Feb 20, 2020 at 2:33 PM Hubert Zhang <hzh...@pivotal.io> wrote: > Thanks, > > On Thu, Feb 20, 2020 at 11:36 AM Andres Freund <and...@anarazel.de> wrote: > >> Hi, >> >> On 2020-02-19 16:48:45 +0900, Michael Paquier wrote: >> > On Wed, Feb 19, 2020 at 03:00:54PM +0900, Kyotaro Horiguchi wrote: >> > > I have had support requests related to broken block several times, and >> > > (I think) most of *them* had hard time to locate the broken block or >> > > even broken file. I don't think it is useles at all, but I'm not sure >> > > it is worth the additional complexity. >> > >> > I handle stuff like that from time to time, and any reports usually >> > go down to people knowledgeable about PostgreSQL enough to know the >> > difference. My point is that in order to know where a broken block is >> > physically located on disk, you need to know four things: >> > - The block number. >> > - The physical location of the relation. >> > - The size of the block. >> > - The length of a file segment. >> > The first two items are printed in the error message, and you can >> > guess easily the actual location (file, offset) with the two others. >> >> > I am not necessarily against improving the error message here, but >> > FWIW I think that we need to consider seriously if the code >> > complications and the maintenance cost involved are really worth >> > saving from one simple calculation. >> >> I don't think it's that simple for most. >> >> And if we e.g. ever get the undo stuff merged, it'd get more >> complicated, because they segment entirely differently. Similar, if we >> ever manage to move SLRUs into the buffer pool and checksummed, it'd >> again work differently. >> >> Nor is it architecturally appealing to handle checksums in multiple >> places above the smgr layer: For one, that requires multiple places to >> compute verify them. But also, as the way checksums are computed depends >> on the page format etc, it'll likely change for things like undo/slru - >> which then again will require additional smarts if done above the smgr >> layer. >> > > So considering undo staff, it's better to move checksum logic into md.c > I will keep it in the new patch. > > On 2020-02-19 16:48:45 +0900, Michael Paquier wrote: > > > Particularly, quickly reading through the patch, I am rather unhappy >> > about the shape of the second patch which pushes down the segment >> > number knowledge into relpath.c, and creates more complication around >> > the handling of zero_damaged_pages and zero'ed pages. -- Michael >> >> I do not like the SetZeroDamagedPageInChecksum stuff at all however. >> >> > I'm +1 on the first concern, and I will delete the new added function > `GetRelationFilePath` > in relpath.c and append the segno directly in error message inside > function `VerifyPage` > > As for SetZeroDamagedPageInChecksum, the reason why I introduced it is > that when we are doing > smgrread() and one of the damaged page failed to pass the checksum check, > we could not directly error > out, since the caller of smgrread() may tolerate this error and just zero > all the damaged page plus a warning message. > Also, we could not just use GUC zero_damaged_pages to do this branch, > since we also have ReadBufferMode(i.e. RBM_ZERO_ON_ERROR) to control it. > > To get rid of SetZeroDamagedPageInChecksum, one idea is to pass > zero_damaged_page flag into smgrread(), something like below: > == > > extern void smgrread(SMgrRelation reln, ForkNumber forknum, > > BlockNumber blocknum, char *buffer, int flag); > > === > > > Any comments? > > > > -- > Thanks > > Hubert Zhang > -- Thanks Hubert Zhang
0003-Print-physical-file-path-when-verify-checksum-failed.patch
Description: Binary data