On Thursday 27 May 2010 15:39:54 Paul Millar wrote:
> Hi,
> 
> I've been looking at Btrfs and have a couple of naive questions that don't
> seem to be answered on the wiki or in the articles I've read on the
> filesystem.
> 
> 
> First: discovering a file's checksum value.
> 
> Here's the scenario: software is writing some data as a fresh file.  This
> software happens to know (a priori) the checksum of this data; for example,
> a storage server receives the file's data and checksum independently.
> 
> I've some confidence that, once the data is stored in btrfs, any corruption
> (from the storage fabric) will be spotted; however, the data may have
> become corrupt before being stored (e.g., from the network).  To catch
> this, the checksum of the stored data needs to be calculated and checked.
> 
> One approach is to calculate the checksum (in user-space) after the data is
> stored.  This adds extra IO- and CPU-load and there's also the possibility
> of false-negative results due to the filesystem cache (although btrfs may
> remove this risk).
> 
> Another approach would be to ask btrfs for the checksum.  It seems that
> it's possible to combine multiple CRC-32C values to figure out the
> checksum of the combined data [e.g., zlib's crc32_combine() function]. 
> So, obtaining a file's checksum might be a light-weight operation.
> 
> Yet another possibility would be to push the desired checksum value (via
> fcntl?) and have btrfs compare the desired checksum with the file's actual
> checksum on close(2), failing that call if the checksums don't match.
> 
> Would any of this be possible (without an awful lot of work)?

IMO, if an application receives data together with its checksum, it can 
calculate the checksum of the data on the fly as it writes it to disk. That 
adds no additional IO to the storage subsystem. It won't detect in-memory 
corruption, though; if you want to be resilient to that, you should be looking 
at ECC RAM, as subsequent checks can be affected by it too.
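
A rough sketch of what I mean by checksumming on the fly, in Python. The 
function names (write_with_checksum, store_verified) are made up for 
illustration, and zlib.crc32 is plain CRC-32, not the CRC-32C that btrfs 
uses; I'm assuming here that the sender and the application have agreed on 
that algorithm independently of the filesystem.

```python
import os
import zlib


def write_with_checksum(chunks, path):
    """Write chunks to path and return the CRC-32 of the bytes written.

    The checksum is updated as each chunk passes through, so the data
    is never read back from storage.
    """
    crc = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks so far
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to storage
    return crc & 0xFFFFFFFF


def store_verified(chunks, path, expected_crc):
    """Store the data, then reject it if it does not match expected_crc."""
    actual = write_with_checksum(chunks, path)
    if actual != expected_crc:
        os.remove(path)  # data was already corrupt when it arrived
        raise ValueError(
            "checksum mismatch: got %08x, expected %08x" % (actual, expected_crc)
        )
    return actual
```

This catches corruption that happened before the data reached the 
application (e.g. on the network); it does not catch corruption in RAM 
between the checksum update and the write.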

Second, you shouldn't tie an application or network protocol to the CRC scheme 
used by the filesystem on the server! Especially when other CRC algorithms may 
be used, not only CRC-32C.

If the checksum algorithm used by the FS were set in stone, then userspace 
could employ it somehow, but since different CRCs can be used, I see no reason 
to allow userspace to read them.


-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

System Zarządzania Jakością
zgodny z normą ISO 9001:2000