On Mon, Aug 18, 2014 at 10:05 AM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>
>> Marco Nenciarini wrote:
>>
>>  To calculate the md5 checksum I've used the md5 code present in pgcrypto
>>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>>> files. Since a core feature cannot depend on a piece of contrib, I've
>>> moved the files
>>>
>>> contrib/pgcrypto/md5.c
>>> contrib/pgcrypto/md5.h
>>>
>>> to
>>>
>>> src/backend/utils/hash/md5.c
>>> src/include/utils/md5.h
>>>
>>> changing the pgcrypto extension to use them.
>>>
>>
>> We already have the FNV checksum implementation in the backend -- can't
>> we use that one for this and avoid messing with MD5?
>>
>> (I don't think we're looking for a cryptographic hash here.  Am I wrong?)
>>
>
> Hmm. Any user that can update a table can craft such an update that its
> checksum matches an older backup. That may seem like an onerous task; to
> correctly calculate the checksum of a file in a previous, you need to know
> the LSNs and the exact data, including deleted data, on every block in the
> table, and then construct a suitable INSERT or UPDATE that modifies the
> table such that you get a collision. But for some tables it could be
> trivial; you might know that a table was bulk-loaded with a particular LSN
> and there are no dead tuples. Or you can simply create your own table and
> insert exactly the data you want. Messing with your own table might seem
> harmless, but it'll e.g. let you construct a case where an index points to
> a tuple that doesn't exist anymore, or there's a row that doesn't pass a
> CHECK-constraint that was added later. Even if there's no direct security
> issue with that, you don't want that kind of uncertainty from a backup
> solution.
>
> But more to the point, I thought the consensus was to use the highest LSN
> of all the blocks in the file, no? That's essentially free to calculate (if
> you have to read all the data anyway), and isn't vulnerable to collisions.
>
> - Heikki
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

We also have both crc32 and crc64 implementations in pg_crc. If the goal is
just verifying file integrity (we can't really protect against intentional
modification) crc sounds more appropriate to me.

Reply via email to