On Mon, Aug 18, 2014 at 10:05 AM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote:
> On 08/18/2014 08:05 AM, Alvaro Herrera wrote: > >> Marco Nenciarini wrote: >> >> To calculate the md5 checksum I've used the md5 code present in pgcrypto >>> contrib as the code in src/include/libpq/md5.h is not suitable for large >>> files. Since a core feature cannot depend on a piece of contrib, I've >>> moved the files >>> >>> contrib/pgcrypto/md5.c >>> contrib/pgcrypto/md5.h >>> >>> to >>> >>> src/backend/utils/hash/md5.c >>> src/include/utils/md5.h >>> >>> changing the pgcrypto extension to use them. >>> >> >> We already have the FNV checksum implementation in the backend -- can't >> we use that one for this and avoid messing with MD5? >> >> (I don't think we're looking for a cryptographic hash here. Am I wrong?) >> > > Hmm. Any user that can update a table can craft such an update that its > checksum matches an older backup. That may seem like an onerous task; to > correctly calculate the checksum of a file in a previous, you need to know > the LSNs and the exact data, including deleted data, on every block in the > table, and then construct a suitable INSERT or UPDATE that modifies the > table such that you get a collision. But for some tables it could be > trivial; you might know that a table was bulk-loaded with a particular LSN > and there are no dead tuples. Or you can simply create your own table and > insert exactly the data you want. Messing with your own table might seem > harmless, but it'll e.g. let you construct a case where an index points to > a tuple that doesn't exist anymore, or there's a row that doesn't pass a > CHECK-constraint that was added later. Even if there's no direct security > issue with that, you don't want that kind of uncertainty from a backup > solution. > > But more to the point, I thought the consensus was to use the highest LSN > of all the blocks in the file, no? That's essentially free to calculate (if > you have to read all the data anyway), and isn't vulnerable to collisions. > > - Heikki > > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > We also have both crc32 and crc64 implementations in pg_crc. If the goal is just verifying file integrity (we can't really protect against intentional modification) crc sounds more appropriate to me.