
Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-15 Thread David Sterba

On Mon, Dec 01, 2014 at 03:23:03PM -0800, Alex Elsayed wrote:
> > I have not seen any evidence that combining hashes like that actually
> > reduces the chances of collision, but if we assume it does, then
> > again, the non-crypto hashes would be faster. For example, 128-bit
> > Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> > and would be faster than MD5 + whatever.
> 
> It has no real benefit, but _why_ depends on what your model is.
> 
> There's a saying that engineers worry about stochastic failure; security 
> professionals have to worry about malicious failure.
> 
> If your only concern is stochastic failure (random bitflips, etc), then the 
> chances of collision with 128-bit CityHash or MurmurHash or SipHash or what-
> have-you are already so small that every single component in your laptop 
> dying simultaneously is more likely. Adding another hash is thus just a 
> waste of cycles.

So as long as speed is preferred over strength, it does not matter much
which algorithm we choose, and combining several does not bring a
significant benefit. CRC32C is weak but has served well over the years;
the improvement to anything 128-bit-based should be obvious.

> If your concern is malicious failure (in-band deduplication attack or 
> similar, ignoring for now that btrfs actually compares the extent data as 
> well IIRC), then it's well-known in the cryptographic community that the 
> concatenation of multiple hashes is as strong as the strongest hash, _but no 
> stronger_ [1].

In-band dedup uses SHA anyway; the block checksums are only used for
verification.
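A minimal sketch of that split, where a strong hash only nominates duplicate candidates and a byte-for-byte comparison confirms them (the same safeguard Alex mentions when he notes btrfs compares the extent data as well). The class and method names are hypothetical and this is not btrfs code:

```python
import hashlib


class DedupIndex:
    """Hypothetical in-memory dedup index: SHA-256 finds candidates,
    a full byte comparison confirms them before storage is shared."""

    def __init__(self):
        self._by_digest = {}  # SHA-256 digest -> stored block bytes

    def insert(self, block: bytes) -> bool:
        """Return True if `block` duplicates an already-stored block."""
        digest = hashlib.sha256(block).digest()
        existing = self._by_digest.get(digest)
        if existing is not None and existing == block:
            return True  # verified duplicate: safe to share the extent
        if existing is None:
            self._by_digest[digest] = block
        # A digest match with different bytes would be a genuine SHA-256
        # collision; keep the original entry and treat this block as unique.
        return False
```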

> Since the strongest hash in the above list is either a non-cryptographic 
> hash or MD5, which is known-weak to the point of there being numerous toy 
> programs finding collisions for arbitrary data, it would not be worth much.
> 
> The only place this might be of use is if you used N strong/unbroken hashes, 
> in order to hedge against up to N-1 of them being broken. However, the gain 
> of that is (again) infinitesimal, and the performance cost quite large 
> indeed.

Thanks for your valuable input!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Alex Elsayed wrote:

> Christoph Anton Mitterer wrote:
> 
>> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>>> including that MAC-then-encrypt is fragile
>>> against a number of attacks, mainly in the padding-oracle category (See:
>>> TLS BEAST attack).
>> Well but here we talk about disk encryption... how would the MtE oracle
>> problems apply to that? Either you're already in the system, i.e. beyond
>> disk encryption (and can measure any timing difference)... or you're
>> not, but then you cannot measure anything.
> 
> Arguable. On a system with sufficiently little noise in the signal (say...
> systemd, on SSD, etc) you could possibly get some real information from
> corrupting padding on a relatively long extent used early in the boot
> process, by measuring how it affects time-to-boot.

To make this more concrete:

Alice owns the computer, and has root. /etc/shadow has the correct 
permissions.

Eve has _an_ account, but does not have root - and she wants it.

For simplicity, let's presume this is a laptop, Alice and Eve are sisters, 
and Eve wants to peek at Alice's diary.

Eve can boot into a livecd, selectively corrupt blocks, and get Alice to 
unlock the drive for a normal boot.

With this, she can execute the padding oracle attack against /etc/shadow, 
and deduce its contents.
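The "oracle" here is any observable difference, whether an error code, a timing change or a slower boot, between a block whose padding checks out after decryption and one whose padding does not. A minimal PKCS#7 check of the kind that produces that one-bit signal, as a hypothetical sketch: in the classic CBC padding oracle attack, the attacker tweaks the preceding ciphertext block and uses this bit to recover plaintext one byte at a time.

```python
class PaddingError(ValueError):
    """Raised when PKCS#7 padding is malformed."""


def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding; the pass/fail outcome is the oracle's bit."""
    # Reject empty input or input that is not a whole number of blocks.
    if not data or len(data) % block_size:
        raise PaddingError("bad length")
    pad = data[-1]
    # Valid pad values are 1..block_size, and every pad byte must equal pad.
    if pad < 1 or pad > block_size or data[-pad:] != bytes([pad]) * pad:
        raise PaddingError("bad padding")
    return data[:-pad]
```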

The first rule of crypto is "Don't roll your own" largely because it is 
_brutally_ unforgiving of minor mistakes.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Christoph Anton Mitterer wrote:

> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>> including that MAC-then-encrypt is fragile
>> against a number of attacks, mainly in the padding-oracle category (See:
>> TLS BEAST attack).
> Well but here we talk about disk encryption... how would the MtE oracle
> problems apply to that? Either you're already in the system, i.e. beyond
> disk encryption (and can measure any timing difference)... or you're
> not, but then you cannot measure anything.

Arguable. On a system with sufficiently little noise in the signal (say... 
systemd, on SSD, etc) you could possibly get some real information from 
corrupting padding on a relatively long extent used early in the boot 
process, by measuring how it affects time-to-boot.

And padding oracles are just one issue. Overall, the problem is that MtE 
isn't generically secure. EtM or pure AEAD modes are, which means you can 
simply mark any attack that doesn't rely on one of the underlying primitives 
being weak as "Not applicable." It also means you can compose it out of 
arbitrary secure primitives, rather than needing to do your proof of 
security over again for every combination.

That's an _enormous_ win in terms of how easy it is to be sure a system is 
secure. Without it, you can't really be sure there isn't Yet Another Vector 
You Missed.
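A toy sketch of the Encrypt-then-MAC ordering: the tag is computed over the ciphertext, and the receiver checks it in constant time before decrypting anything, so forged or corrupted input never reaches the decryption or padding logic. The keystream is a stand-in built from HMAC-SHA256 only because Python's standard library has no block cipher; per the "don't roll your own" advice in this thread it is not meant for real use, and a vetted AEAD mode would be the practical choice. All names are hypothetical:

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream: HMAC-SHA256(key, nonce || counter) blocks.
    # Stand-in for a real cipher; do not use for actual encryption.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, stream))
    # Encrypt-then-MAC: the tag covers nonce and ciphertext, not plaintext.
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("truncated input")
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    # Verify first, in constant time; forged input never gets decrypted.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, stream))
```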


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Christoph Anton Mitterer

On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote: 
> including that MAC-then-encrypt is fragile 
> against a number of attacks, mainly in the padding-oracle category (See: TLS 
> BEAST attack).
Well but here we talk about disk encryption... how would the MtE oracle
problems apply to that? Either you're already in the system, i.e. beyond
disk encryption (and can measure any timing difference)... or you're
not, but then you cannot measure anything.


Cheers,
Chris.



Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Christoph Anton Mitterer wrote:

> On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote:
>> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed 
>> wrote:
>> > Why not just use the kernel crypto API? Then the user can just specify
>> > any hash the kernel supports.
>> 
>> One reason is that cryptographic hashes are an order of magnitude
>> slower than the fastest non-cryptographic hashes. And for filesystem
>> checksums, I do not see a need for cryptographic hashes.
> 
> I'm not that crypto expert, but wouldn't the combination of a
> cryptographic hash, in combination with e.g. dm-crypt below the
> filesystem give us what dm-crypt alone cannot really give us
> (authenticated integrity)?
> 
> Would that combination of hash+encrypt basically work like a MAC?

Sadly, no. Partially because in order for an encrypted hash to be a secure 
MAC, the encryption must be nonmalleable, which would require CMC or EME - 
encryption modes which Linux does not presently support as I understand it. 
There are other issues as well, including that MAC-then-encrypt is fragile 
against a number of attacks, mainly in the padding-oracle category (See: TLS 
BEAST attack).

AEAD modes are also nonmalleable, but as they are length-expanding they 
cannot be used for LUKS. However, as eCryptfs and possibly the recent ext4 
encryption work show, using them at a higher level (encrypting extents or 
files) does work. Of course, if you're using an AEAD mode in the filesystem 
anyway, just use it directly and have done with it.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed  wrote:
>> There's a thing called the transitive property. When CRC32 is faster than
>> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
>> faster than CRC32, there are comparisons that can be made.
> 
> And yet you applied the transitive property with poor assumptions and
> in a convoluted way to come up with an incorrect conclusion.
> 
> 
>> It's that the flat assertion that "CityHash/SpookyHash/etc is always
>> faster" is _unwarranted_, as hardware acceleration _has a huge effect_.
> 
> Actually, the assertion is true and backed up by evidence that I
> cited. I'm not sure why you think hardware acceleration only helps
> SHA-1 and does not help CityHash or SpookyHash.

...because the hardware acceleration is in the form of instructions like 
"Update SHA1 state" ?

https://software.intel.com/en-us/articles/intel-sha-extensions

https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf
(page 99, the SHA1{C,P,M,H,SU0,SU1} instructions)

On SPARC it's a full-on crypto coprocessor.
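Whether a given Linux box actually has these instructions can be checked from /proc/cpuinfo. A heuristic sketch, assuming the usual flag names (sha_ni in the x86 "flags" line, sha1/sha2 in the arm64 "Features" line); the function name is hypothetical, and other architectures, such as the SPARC crypto units mentioned here, are not covered:

```python
def has_sha_acceleration(cpuinfo_text: str) -> bool:
    """Heuristic: look for SHA-related feature flags in /proc/cpuinfo text.

    Assumes Linux conventions: x86 lists 'sha_ni' under 'flags', and
    arm64 lists 'sha1'/'sha2' under 'Features'.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        if key.strip() in ("flags", "Features"):
            features = set(value.split())
            if {"sha_ni", "sha1", "sha2"} & features:
                return True
    return False
```

Usage on a live system: `with open("/proc/cpuinfo") as f: print(has_sha_acceleration(f.read()))`.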


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed  wrote:
> There's a thing called the transitive property. When CRC32 is faster than
> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
> faster than CRC32, there are comparisons that can be made.

And yet you applied the transitive property with poor assumptions and
in a convoluted way to come up with an incorrect conclusion.


> It's that the flat assertion that "CityHash/SpookyHash/etc is always faster"
> is _unwarranted_, as hardware acceleration _has a huge effect_.

Actually, the assertion is true and backed up by evidence that I
cited. I'm not sure why you think hardware acceleration only helps
SHA-1 and does not help CityHash or SpookyHash.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Christoph Anton Mitterer

On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote: 
> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed  wrote:
> > Why not just use the kernel crypto API? Then the user can just specify any
> > hash the kernel supports.
> 
> One reason is that cryptographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for cryptographic hashes.

I'm not that crypto expert, but wouldn't the combination of a
cryptographic hash, in combination with e.g. dm-crypt below the
filesystem give us what dm-crypt alone cannot really give us
(authenticated integrity)?

Would that combination of hash+encrypt basically work like a MAC?

Cheers,
Chris.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed  wrote:
>> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
>> empirical evidence!"
> 
> No empirical evidence of the speed of SpookyHash or CityHash versus
> SHA-1 was cited. The only empirical data mentioned was on an
> UltraSPARC CPU, and did not include any SpookyHash or CityHash
> measurements, and yet you made a claim about the speeds on Intel and
> ARM CPUs.

There's a thing called the transitive property. When CRC32 is faster than 
SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is 
faster than CRC32, there are comparisons that can be made.

And what I've been trying to say this whole time is not some point about an 
individual architecture.

It's that the flat assertion that "CityHash/SpookyHash/etc is always faster" 
is _unwarranted_, as hardware acceleration _has a huge effect_.

On SPARC, it's empirically enough for SHA-1 to match CRC32.
On ARMv8, it brings SHA-1 from 4-8 cycles per byte down to _2_.
On Intel, when the Skylake SHA extensions land, it will likely have an 
enormous impact as well.

Broad, sweeping generalizations are great - so long as they are _properly 
qualified_.

For instance, I would agree *wholeheartedly* that a good software 
implementation of CityHash/SpookyHash/etc would beat the *pants* off a good 
software implementation of SHA-1. No question.
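Rather than argue from published figures, the software-vs-software comparison is easy to measure locally. A rough sketch follows (hypothetical helper; Python's standard library has no CityHash or SpookyHash, so it compares zlib's CRC-32 with hashlib's SHA-1 and SHA-256). hashlib is usually backed by OpenSSL, which may pick up SHA instructions on CPUs that have them, so results will vary with both CPU and build:

```python
import hashlib
import time
import zlib


def throughput_mb_s(fn, size=1 << 20, repeat=20):
    """Rough throughput of fn(bytes) in MB/s, best of `repeat` runs."""
    buf = bytes(size)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(buf)
        best = min(best, time.perf_counter() - start)
    return size / max(best, 1e-9) / 1e6  # guard against a zero-length timing


if __name__ == "__main__":
    candidates = [
        ("crc32", zlib.crc32),
        ("sha1", lambda b: hashlib.sha1(b).digest()),
        ("sha256", lambda b: hashlib.sha256(b).digest()),
    ]
    for name, fn in candidates:
        print(f"{name:7s} {throughput_mb_s(fn):9.1f} MB/s")
```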


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 4:06 PM, Alex Elsayed  wrote:
> https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl
>
> # hardware-assisted software(*)
> # Apple A7    2.31  4.13 (+14%)
> # Cortex-A53  2.19  8.73 (+108%)
> # Cortex-A57  2.35  7.88 (+74%)


Note that those are showing 2 cycles per byte.

> From the CityHash readme, on a Xeon X5550 (which is _considerably_ more
> powerful than any of the above):
>
> On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at
> about 5 to 5.5 bytes/cycle.

5 bytes per cycle is 0.2 cycles per byte. So your own citation shows
that CityHash is 10 times faster.
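The unit conversion at the heart of this exchange, as a small sketch (function names hypothetical, figures taken from the messages quoted here). Keep in mind the two numbers come from different CPUs, a Xeon X5550 and a Cortex-A53, so the ratio is only indicative:

```python
def cycles_per_byte(bytes_per_cycle: float) -> float:
    """Convert a bytes/cycle rate into cycles/byte (lower is faster)."""
    return 1.0 / bytes_per_cycle


def speedup(slow_cpb: float, fast_cpb: float) -> float:
    """How many times faster the lower cycles/byte figure is."""
    return slow_cpb / fast_cpb


city_cpb = cycles_per_byte(5.0)  # CityHashCrc256 peak on Xeon X5550 (readme)
sha1_cpb = 2.19                  # hardware-assisted SHA-1 on Cortex-A53 (OpenSSL)
print(f"CityHashCrc256: {city_cpb:.2f} cycles/byte, "
      f"about {speedup(sha1_cpb, city_cpb):.0f}x fewer than SHA-1")
```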

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Alex Elsayed wrote:

> So CityHash is - at best - half as fast as SHA1 with acceleration.
> 
> In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.

Argh, ignore this. The CityHash readme is in bytes/cycle, which I missed on 
first read-through (why on earth they are not using either MB/s for rate, or 
cycles/byte, eludes me completely.)
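For reference, converting the readme's bytes/cycle figure into MB/s is a one-liner; a sketch with a hypothetical helper name, assuming a single core at the X5550's nominal 2.67 GHz:

```python
def mb_per_s(bytes_per_cycle: float, clock_hz: float) -> float:
    """Single-core throughput in MB/s (10**6 bytes per second)."""
    return bytes_per_cycle * clock_hz / 1e6


# CityHashCrc256 at ~5 bytes/cycle on a 2.67 GHz core:
print(mb_per_s(5.0, 2.67e9))  # 13350.0
```

That is roughly 13 GB/s per core, or 0.2 cycles/byte in the other direction.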


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed  wrote:

> And that _is_ the case; they are faster... *when both are software
> implementations*

They are also faster when both are optimized to use special
instructions of the CPU.

According to this Intel whitepaper, SHA-1 does not achieve less than 1
cycle/byte in any of the situations they tested:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/haswell-cryptographic-performance-paper.pdf

SpookyHash and CityHash obtain better than 0.5 cycle/byte, and in the
case of CityHash256, better than 0.2 cycle/byte.

https://code.google.com/p/cityhash/source/browse/trunk/README

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed  wrote:
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
> 
> By the way, your "hard evidence" is imaginary.
> 
> Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:
> 
> https://blake2.net/
> 
> While SpookyHash (and CityHash) are about 3 bytes per cycle (on long
> keys) which is about 0.33 cycles per byte. More than 10 times faster
> than SHA-1.
> 
> http://burtleburtle.net/bob/hash/spooky.html

On further examination, I did indeed make a mistake - the hardware 
acceleration for SHA on Intel will be in Skylake; only the AES acceleration 
was added in Sandy Bridge. So you are correct to some degree with the rarity 
argument.

However, performance-wise, that means SHA-1 on Intel is still a software 
implementation. Let's look at ARMv8.

The ARMv8 architecture added a few cryptographic instructions, including 
for SHA-1. The results:

https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl

# hardware-assisted software(*)
# Apple A7    2.31  4.13 (+14%)
# Cortex-A53  2.19  8.73 (+108%)
# Cortex-A57  2.35  7.88 (+74%)

From the CityHash readme, on a Xeon X5550 (which is _considerably_ more 
powerful than any of the above):

On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at 
about 5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers 
around CityHashCrc256 and should have similar performance on long strings.
(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as 
thorough as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The 
fastest Murmur variant on that hardware, Murmur3F, peaks at about 2.4 
bytes/cycle. We expect the peak speed of CityHash128 to dominate CityHash64, 
which is aimed more toward short strings or use in hash tables.

So CityHash is - at best - half as fast as SHA1 with acceleration.

In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed  wrote:
> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
> empirical evidence!"

No empirical evidence of the speed of SpookyHash or CityHash versus
SHA-1 was cited. The only empirical data mentioned was on an
UltraSPARC CPU, and did not include any SpookyHash or CityHash
measurements, and yet you made a claim about the speeds on Intel and
ARM CPUs.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed  wrote:
>> Incidentally, you can be 'skeptical' all you like - per Austin's message
>> upthread, he was testing the Crypto API. Thus, skeptical as you may be,
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
> 
> You can make convoluted, incorrect claims all you like, but the fact
> is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
> CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
> Your claims are absurd.

And that _is_ the case; they are faster... *when both are software 
implementations*

And I'm not sure what is "convoluted" or "incorrect" about saying "Look, 
empirical evidence!"


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed  wrote:
> hard evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

By the way, your "hard evidence" is imaginary.

Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:

https://blake2.net/

While SpookyHash (and CityHash) are about 3 bytes per cycle (on long
keys) which is about 0.33 cycles per byte. More than 10 times faster
than SHA-1.

http://burtleburtle.net/bob/hash/spooky.html

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed  wrote:
>> Incidentally, you can be 'skeptical' all you like - per Austin's message
>> upthread, he was testing the Crypto API. Thus, skeptical as you may be,
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
> 
> You can make convoluted, incorrect claims all you like, but the fact
> is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
> CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
> Your claims are absurd.
And that is t



Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed  wrote:
> Incidentally, you can be 'skeptical' all you like - per Austin's message
> upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard
> evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

You can make convoluted, incorrect claims all you like, but the fact
is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
Your claims are absurd.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn
>  wrote:
>> My only reasoning is that with this set of hashes (crc32c, adler32, and
>> md5), the statistical likelihood of running into a hash collision with
>> more than one of them at a time is infinitesimally small compared to the
>> likelihood of any one of them having a collision (or even compared to
>> something ridiculous like the probability of being killed by a meteor
>> strike), and the combination is faster on most systems that I have tried
>> than many 256-bit crypto hashes.
> 
> I have not seen any evidence that combining hashes like that actually
> reduces the chances of collision, but if we assume it does, then
> again, the non-crypto hashes would be faster. For example, 128-bit
> Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> and would be faster than MD5 + whatever.

It has no real benefit, but _why_ depends on what your model is.

There's a saying that engineers worry about stochastic failure; security 
professionals have to worry about malicious failure.

If your only concern is stochastic failure (random bitflips, etc), then the 
chances of collision with 128-bit CityHash or MurmurHash or SipHash or what-
have-you are already so small that every single component in your laptop 
dying simultaneously is more likely. Adding another hash is thus just a 
waste of cycles.
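To put rough numbers on "so small" under the stochastic model, treating the hash as an ideal random function (an idealization; CRCs actually give stronger guarantees for some error patterns, such as detecting every burst error no longer than their width), two quantities matter: the chance that a corrupted block still matches its stored checksum, and, for dedup-style lookups, the birthday chance that any two distinct blocks share a digest. The helper names are hypothetical:

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound chance that any two of n_items random inputs share
    a hash_bits-bit digest: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)  # expm1 keeps precision for tiny exponents


def undetected_corruption_probability(hash_bits: int) -> float:
    """Chance that one random corruption still matches the stored checksum,
    treating the hash as an ideal random function."""
    return 2.0 ** -hash_bits


# 2**32 blocks of 4 KiB is 16 TiB of data.
print(collision_probability(2**32, 128))  # vanishingly small (~2**-65)
print(collision_probability(2**32, 32))   # effectively certain
```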

If your concern is malicious failure (in-band deduplication attack or 
similar, ignoring for now that btrfs actually compares the extent data as 
well IIRC), then it's well-known in the cryptographic community that the 
concatenation of multiple hashes is as strong as the strongest hash, _but no 
stronger_ [1].

Since the strongest hash in the above list is either a non-cryptographic 
hash or MD5, which is known-weak to the point of there being numerous toy 
programs finding collisions for arbitrary data, it would not be worth much.

The only place this might be of use is if you used N strong/unbroken hashes, 
in order to hedge against up to N-1 of them being broken. However, the gain 
of that is (again) infinitesimal, and the performance cost quite large 
indeed.

[1] http://eprint.iacr.org/2008/075


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed 
> wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff in the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
> 
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
> 
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
> 
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

Incidentally, you can be 'skeptical' all you like - per Austin's message 
upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard 
evidence shows that SHA-1 was equal to or faster than CRC32, which is 
unequivocally simpler and faster than CityHash (though CityHash comes 
close).

And the CPUs in question are *not* particularly rare - Intel since Sandy 
Bridge or so, the majority of SPARC systems, a goodly number of ARM systems 
via coprocessors...



Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed 
> wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff in the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
> 
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
> 
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
> 
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

And as I said upthread, one benefit of the Crypto API is that the filesystem 
developers _no longer have to choose_. By using the shash or ahash interface 
to the Crypto API, the _user_ can choose *any* hash the kernel supports. And 
the default is (and will almost certainly continue to be) crc32c, so the user
would need to specify a hash anyway - making whether some other non-
cryptographic hash is the "obvious choice" a completely moot point.
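On the kernel side that selection goes through the Crypto API by algorithm name (for example crypto_alloc_shash()). As a user-space analogy only, the same "pick the checksum by name" idea looks like this in Python; the function name is hypothetical, and note that zlib.crc32 is plain CRC-32, not the CRC-32C that btrfs uses:

```python
import hashlib
import zlib


def make_checksummer(name: str):
    """Return a callable mapping a data block to its checksum bytes.

    'crc32' uses zlib.crc32; any other name is resolved through hashlib,
    so the caller (the "user") can choose any algorithm hashlib supports.
    """
    if name == "crc32":
        return lambda block: zlib.crc32(block).to_bytes(4, "little")
    hashlib.new(name)  # fail early with ValueError on unknown names
    return lambda block: hashlib.new(name, block).digest()


csum = make_checksummer("sha256")
print(len(csum(b"\0" * 4096)))  # 32
```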


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn
 wrote:
> My only reasoning is that with this set of hashes (crc32c, adler32, and
> md5), the statistical likelihood of running into a hash collision with more
> than one of them at a time is infinitesimally small compared to the
> likelihood of any one of them having a collision (or even compared to
> something ridiculous like the probability of being killed by a meteor
> strike), and the combination is faster on most systems that I have tried
> than many 256-bit crypto hashes.

I have not seen any evidence that combining hashes like that actually
reduces the chances of collision, but if we assume it does, then
again, the non-crypto hashes would be faster. For example, 128-bit
Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
and would be faster than MD5 + whatever.
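For concreteness, Austin's combination concatenates CRC-32C, Adler-32 and MD5 into a 24-byte value. Python's standard library has Adler-32 and MD5 but no CRC-32C, so a slow bitwise CRC-32C is included; the function names are hypothetical. Per the result Alex cites in his reply (eprint.iacr.org/2008/075), 192 bits of output here do not add up to 192 bits of collision resistance against an attacker:

```python
import hashlib
import zlib


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    Slow reference version; real code uses tables or CPU instructions."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def combined_digest(data: bytes) -> bytes:
    """Concatenate crc32c || adler32 || md5 into one 24-byte value."""
    return (crc32c(data).to_bytes(4, "big")
            + zlib.adler32(data).to_bytes(4, "big")
            + hashlib.md5(data).digest())
```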

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed  wrote:
> Actually, I said "Sure" here, but this isn't strictly true. At some point,
> you're more memory-bound than CPU-bound, and with CPU intrinsic instructions
> (like SPARC and recent x86 have for SHA) you're often past that. Then,
> you're not going to see any real difference - and the accelerated
> cryptographic hashes may even win out, because the intrinsics may be faster
(less stuff in the I$, pipelined single instruction beating multiple simpler
> instructions, etc) than the software non-cryptographic hash.

In practice, I am skeptical whether any 128- or 256-bit crypto hashes
will be as fast as the non-crypto hashes I mentioned, even on CPUs
with specific instructions for the crypto hashes. The non-crypto
hashes can (and do) take advantage of special CPU instructions as
well.

But even if true that the crypto hashes approach the speed of
non-crypto hashes on certain CPUs, that does not provide a strong
argument for using the crypto hashes, since on the common x64 CPUs,
the non-crypto hashes I mentioned are significantly faster than the
equivalent crypto hashes.

So, you have some rare architectures where the crypto hashes may
almost be as fast as the non-crypto, and common CPUs where the
non-crypto are much faster. That makes the non-crypto hash functions I
mentioned the obvious choice in the vast majority of systems.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Austin S Hemmelgarn


On 2014-12-01 13:37, David Sterba wrote:
> On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
>> On 2014-11-26 08:38, Brendan Hide wrote:
>>> On 2014/11/25 18:47, David Sterba wrote:
>>>> We could provide an interface for external applications that would make
>>>> use of the strong checksums. Eg. external dedup, integrity db. The
>>>> benefit here is that the checksum is always up to date, so there's no
>>>> need to compute the checksums again. At the obvious cost.
>>>
>>> I can imagine some use-cases where you might even want more than one
>>> algorithm to be used and stored. Not sure if that makes me a madman,
>>> though. ;)
>>>
>> Not crazy at all, I would love to have the ability to store multiple
>> different weak but fast hash values.  For example, on my laptop, it is
>> actually faster to compute crc32c, adler32, and md5 hashes together than
>> it is to compute pretty much any 256-bit hash I've tried.
>
> Well, this is doable :) there's space for 256 bits in general, the order of
> checksum bytes in one "checksum word" would be given by the fixed order in
> which the algorithms are defined. The code complexity would increase, but
> not that much I think.
>
>> This then brings up the issue of what to do when we try to mount such a
>> fs on a system that doesn't support some or all of the hashes used.
>
> I see two modes: either fail if not all are present, or relax that with a
> mount option to accept at least one.
>
> But let's keep this open, I'm not yet convinced that combining more weak
> algos makes sense from the crypto POV. If this is meant to protect against
> random bitflips, would a single fast-but-weak hash be comparable to a
> combination? Or are there other expectations?

My only reasoning is that with this set of hashes (crc32c, adler32, and
md5), the statistical likelihood of running into a hash collision with
more than one of them at a time is infinitesimally small compared to the
likelihood of any one of them having a collision (or even compared to
something ridiculous like the probability of being killed by a meteor
strike), and the combination is faster on most systems that I have tried
than many 256-bit crypto hashes.

It's still a tradeoff though. I also think that the idea mentioned
elsewhere in this thread of having separate hashes stored for
subsections of the same block is also worth looking at.
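Austin's speed comparison is easy to repeat in userspace. The sketch below is not his test harness; it assumes Python's zlib and hashlib, substitutes zlib.crc32 (plain CRC-32) for crc32c because the standard library has no crc32c, and picks an arbitrary 4 KiB block and round count. Results depend heavily on the CPU, the OpenSSL build behind hashlib, and interpreter overhead, so treat them as indicative only.

```python
import hashlib
import time
import zlib

# Assumed test input: one 4 KiB block of deterministic data.
BLOCK = bytes(range(256)) * 16


def combo(data: bytes) -> bytes:
    """CRC-32 + Adler-32 + MD5 concatenated (stand-in for crc32c+adler32+md5)."""
    return (zlib.crc32(data).to_bytes(4, "little")
            + zlib.adler32(data).to_bytes(4, "little")
            + hashlib.md5(data).digest())


CANDIDATES = {
    "crc32": lambda d: zlib.crc32(d).to_bytes(4, "little"),
    "crc32+adler32+md5": combo,
    "sha256": lambda d: hashlib.sha256(d).digest(),
}


def mib_per_second(fn, data: bytes = BLOCK, rounds: int = 2000) -> float:
    """Time `rounds` calls of fn on data and return throughput in MiB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        fn(data)
    elapsed = time.perf_counter() - start
    return len(data) * rounds / elapsed / (1024 * 1024)


if __name__ == "__main__":
    for name, fn in CANDIDATES.items():
        print(f"{name:>18}: {mib_per_second(fn):10.1f} MiB/s")
```

At a 4 KiB block size the per-call interpreter overhead is a noticeable fraction of the cost, so a native harness would give more representative ratios.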

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Austin S Hemmelgarn


On 2014-12-01 14:34, Alex Elsayed wrote:

> Alex Elsayed wrote:
>
>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
>> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API),
>> and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
>> cryptographic hashes simply because those require multiple registers to
>> store the state if nothing else; which makes this a rather strong argument
>> that _hardware matters a heck of a lot_, quite possibly _more_ than the
>> algorithm.
>
> Ah, correction - it seems he was comparing his own implementations, rather
> than the Crypto API ones - but the points still hold, seeing as the Crypto
> API does provide both algorithms.

Actually, I did the tests using the userspace interface to the kernel's
Crypto API.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Alex Elsayed wrote:

> John Williams wrote:
>> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
>> am talking about can and do take advantage of CPU architecture. For
>> 128- and 256-bit hashes, one (or more) of those three will be
>> significantly faster than any crypto hash in the Crypto API,
>> regardless of the CPU it is run on.
> 
> Sure.

Actually, I said "Sure" here, but this isn't strictly true. At some point, 
you're more memory-bound than CPU-bound, and with CPU intrinsic instructions 
(like SPARC and recent x86 have for SHA) you're often past that. Then, 
you're not going to see any real difference - and the accelerated 
cryptographic hashes may even win out, because the intrinsics may be faster 
(less stuff in the I$, pipelined single instruction beating multiple simpler 
instructions, etc) than the software non-cryptographic hash.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed 
> wrote:
> 
>> I think there's a fundamental set of points being missed.
> 
> That may be true, but it is not me who is missing them.
>> * The Crypto API can be used to access non-cryptographic hashes. Full
>> stop.
> 
> Irrelevant to my point. I am talking about specific non-cryptographic
> hashes, and they are not currently in the Crypto API.

Yes, but they're not anywhere else in the kernel either.

>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the
>> Crypto API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto
>> API), and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
>> non-cryptographic hashes simply because those require multiple registers
>> to store the state if nothing else; which makes this a rather strong
>> argument that _hardware matters a heck of a lot_, quite possibly _more_
>> than the algorithm.
> 
> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
> am talking about can and do take advantage of CPU architecture. For
> 128- and 256-bit hashes, one (or more) of those three will be
> significantly faster than any crypto hash in the Crypto API,
> regardless of the CPU it is run on.

Sure.

> As for the possibility of adding more hash functions to Crypto API for
> btrfs to use, I do not believe I have argued against it, so I am not
> sure why you repeated the point. It seems to me that is a discussion
> that must be had with the maintainer(s) of Crypto API (will they
> accept additional non-crypto 128- and 256-bit hash functions, etc.)

In that case, I'm not sure what the reason for the thread continuing is? If 
they go in the Crypto API, there's no need to argue against cryptographic 
hashes either - it becomes the user's choice. That's pretty much the entire 
reason I kept responding; I figured that arguing against the cryptographic 
hashes _was_ an objection to the Crypto API, since they're basically a 
freebie for no effort if we use it.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed  wrote:

> I think there's a fundamental set of points being missed.

That may be true, but it is not me who is missing them.
> * The Crypto API can be used to access non-cryptographic hashes. Full stop.

Irrelevant to my point. I am talking about specific non-cryptographic
hashes, and they are not currently in the Crypto API.

> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and
> SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
> cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
am talking about can and do take advantage of CPU architecture. For
128- and 256-bit hashes, one (or more) of those three will be
significantly faster than any crypto hash in the Crypto API,
regardless of the CPU it is run on.

As for the possibility of adding more hash functions to Crypto API for
btrfs to use, I do not believe I have argued against it, so I am not
sure why you repeated the point. It seems to me that is a discussion
that must be had with the maintainer(s) of Crypto API (will they
accept additional non-crypto 128- and 256-bit hash functions, etc.)

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

Alex Elsayed wrote:

> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API),
> and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
> cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Ah, correction - it seems he was comparing his own implementations, rather 
than the Crypto API ones - but the points still hold, seeing as the Crypto 
API does provide both algorithms.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Alex Elsayed

John Williams wrote:

> On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn wrote:
>> Except most of the CPU optimized hashes aren't crypto hashes (other than the
>> various SHA implementations).  Furthermore, I've actually tested the
>> speed of a generic CRC32c implementation versus SHA-1 using the SHA
>> instructions on an UltraSPARC processor, and the difference amounts to a
>> few microseconds in _favor_ of the optimized crypto hash; and I've run
>> the math for every other ISA that has instructions for computing SHA
>> hashes (I don't have the hardware for any of the others), and expect
>> similar results for those as well.
> 
> I think the confusion here is that I am talking about 128-bit and
> 256-bit hashes, which is what you would choose for filesystem
> checksums if you want to have extremely strong collision resistance
> (eg., you could also use it for dedup).
> 
> You seem to be talking about 32-bit (and maybe 64-bit) hashes.
> 
> The speed difference between crypto 128- and 256-bit hashes and
> non-crypto equivalents that I have mentioned is an order of magnitude
> or more.

I think there's a fundamental set of points being missed.

* The Crypto API can be used to access non-cryptographic hashes. Full stop.

* He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto 
API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and 
SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-
cryptographic hashes simply because those require multiple registers to 
store the state if nothing else; which makes this a rather strong argument 
that _hardware matters a heck of a lot_, quite possibly _more_ than the 
algorithm.

Even if SHA-1 in software is vastly slower than CityHash or whatever in 
software, the Crypto API implementation *may not be purely software*.

* The main benefit of the Crypto API is not any specific hash, it's that 
it's a _common API_ for _using any supported hash_.

* Your preferred non-cryptographic hashes can, thus, be used _via_ the 
Crypto API.

* This has benefits of:
  * Code reuse (for anyone else who wants to use such a hash).

  * Optimization opportunities (if a CPU implements some primitive, it can
    be leveraged in an arch-specific implementation, which the Crypto API will
    use _automatically_).

  * Flexibility (by using the Crypto API, _any_ supported hash can be used
    generically, so the _user_ can decide which one they want, rather than
    picking from a small, hard-coded menu of options in btrfs).


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread David Sterba

On Thu, Nov 27, 2014 at 11:52:20AM +0800, Liu Bo wrote:
> > There are several checksum algorithms that trade off speed and strength
> > so we may want to support more than just sha256. Easy to add but I'd
> > rather see them added in all at once than one by one.
> > 
> > Another question is if we'd like to use different checksum for data and
> > metadata. This would not cost any format change if we use the 2 bytes in
> > super block csum_type.
> 
> Yes, but breaking it into meta_csum_type and data_csum_type will need an
> incompat flag.

Not necessarily a new bit. If we read the field as-is and see that it's zero,
we know it's the previous version; otherwise it's the new one, and then we
set only the in-memory fields for data and metadata.

The backward compatibility is fine, old kernels will refuse to mount
with csum_type != 0.
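A minimal sketch of reading the existing 2-byte field this way. The thread does not fix a layout, so the split used here (low byte for data, high byte for metadata) and all names are hypothetical.

```python
from dataclasses import dataclass

CSUM_TYPE_CRC32C = 0  # legacy value: the only csum_type old kernels accept


@dataclass(frozen=True)
class CsumTypes:
    """In-memory view of the checksum algorithms for data and metadata."""
    data: int
    metadata: int


def decode_csum_type(raw: int) -> CsumTypes:
    """Split the 2-byte on-disk csum_type (hypothetical layout).

    Low byte = data algorithm, high byte = metadata algorithm. A zero
    field decodes to crc32c for both, which is exactly the old meaning,
    so existing filesystems need no special case.
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("csum_type is a 16-bit field")
    return CsumTypes(data=raw & 0xFF, metadata=raw >> 8)


def encode_csum_type(types: CsumTypes) -> int:
    """Inverse of decode_csum_type(); each algorithm id must fit in a byte."""
    if not (0 <= types.data <= 0xFF and 0 <= types.metadata <= 0xFF):
        raise ValueError("each algorithm id must fit in one byte")
    return (types.metadata << 8) | types.data
```

Any filesystem that uses something other than crc32c for either half produces a non-zero field, so old kernels still refuse to mount it, which is the compatibility property described above.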

> > Optional/crazy/format change stuff:
> > 
> > * per-file checksum algorithm - unlike compression, the whole file would
> >   have to use the same csum algo
> >   reflink would work iff the algos match
> >   snapshotting is unaffected
> > 
> > * per-subvolume checksum algorithm - specify the csum type at creation
> >   time, or afterwards unless it's modified
> 
> I thought about this before, if we enable this, a few cases need to be dealt
> with (at least):
> 1. convert file data's csum from one algorithm to another

On-line or offline? I'd rather avoid doing that on a mounted filesystem.

> 2. to make different checksum length co-exist, we can either use different
>    key.type for different algorithms, or pack checksum into a new structure
>    that has algorithm types (and length).

Oh right, the mixed sizes of checksums could be a problem and would
require a format change (and thus the incompatibility bit).

The key.type approach looks better, we'd encode the algorithm type
effectively, the item bytes contain only fixed-size checksums.
(Here I'm thinking a new BTRFS_EXTENT_CSUM_KEY per checksum type.)

OTOH storing the algo type (size is not needed) would add overhead
per-checksum (probably only a single byte but still).
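The trade-off between the two layouts is mostly arithmetic. In the key.type approach the key alone fixes the algorithm and checksum size, so an item body is a plain array; a per-entry tag adds a byte to every checksum. A sketch under assumptions: the key-type names below are hypothetical (btrfs as discussed here has a single extent-csum key type), and the tag is taken to be 1 byte.

```python
from typing import List

# Hypothetical key types, one per algorithm; the key type fixes the size.
CSUM_SIZE_BY_KEY_TYPE = {
    "EXTENT_CSUM_CRC32C": 4,
    "EXTENT_CSUM_SHA256": 32,
}


def split_csum_item(key_type: str, item: bytes) -> List[bytes]:
    """Split a packed checksum item into per-block checksums."""
    size = CSUM_SIZE_BY_KEY_TYPE[key_type]
    if len(item) % size:
        raise ValueError("item length is not a multiple of the checksum size")
    return [item[i:i + size] for i in range(0, len(item), size)]


def item_size(n_blocks: int, csum_size: int, tagged: bool) -> int:
    """Bytes needed for n_blocks checksums, optionally with a 1-byte tag each."""
    return n_blocks * (csum_size + (1 if tagged else 0))
```

For 256 blocks of sha256, the untagged item needs 8192 bytes and the tagged one 8448, so the per-checksum tag costs 256 bytes.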

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread David Sterba

On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
> On 2014-11-26 08:38, Brendan Hide wrote:
> > On 2014/11/25 18:47, David Sterba wrote:
> >> We could provide an interface for external applications that would make
> >> use of the strong checksums. Eg. external dedup, integrity db. The
> >> benefit here is that the checksum is always up to date, so there's no
> >> need to compute the checksums again. At the obvious cost.
> >
> > I can imagine some use-cases where you might even want more than one
> > algorithm to be used and stored. Not sure if that makes me a madman,
> > though. ;)
> >
> Not crazy at all, I would love to have the ability to store multiple 
> different weak but fast hash values.  For example, on my laptop, it is 
> actually faster to compute crc32c, adler32, and md5 hashes together than 
> it is to compute pretty much any 256-bit hash I've tried.

Well, this is doable :) there's space for 256 bits in general, the order of
checksum bytes in one "checksum word" would be given by the fixed order in
which the algorithms are defined. The code complexity would increase, but
not that much I think.
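One way to picture such a "checksum word": each algorithm's output sits at a fixed offset given by the order in which the algorithms are defined, and the rest of the 256 bits stays zero. This is a sketch only; the little-endian byte order, the zero padding and the crc32c/adler32/md5 order (taken from Austin's example) are assumptions, not a proposed on-disk format. Python's standard library has no crc32c, so a slow bitwise version is included.

```python
import hashlib
import zlib

CSUM_WORD_SIZE = 32  # 256 bits of checksum space


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# The fixed definition order decides where each algorithm's bytes live.
ALGORITHMS = (
    ("crc32c", lambda d: crc32c(d).to_bytes(4, "little")),
    ("adler32", lambda d: zlib.adler32(d).to_bytes(4, "little")),
    ("md5", lambda d: hashlib.md5(d).digest()),
)


def checksum_word(data: bytes) -> bytes:
    """Concatenate all checksums in definition order, zero-padded to 32 bytes."""
    word = b"".join(fn(data) for _, fn in ALGORITHMS)
    if len(word) > CSUM_WORD_SIZE:
        raise ValueError("combined checksums exceed 256 bits")
    return word.ljust(CSUM_WORD_SIZE, b"\x00")


def field_offsets() -> dict:
    """Map each algorithm name to its (offset, length) in the checksum word."""
    offsets, pos = {}, 0
    for name, fn in ALGORITHMS:
        size = len(fn(b""))
        offsets[name] = (pos, size)
        pos += size
    return offsets
```

With this order the three checksums occupy 24 of the 32 bytes (crc32c at 0, adler32 at 4, md5 at 8), leaving 8 bytes of padding.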

> This then brings up the issue of what to do when we try to mount such a 
> fs on a system that doesn't support some or all of the hashes used.

I see two modes: either fail if not all are present, or relax that with a
mount option to accept at least one.
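The two modes can be sketched as a single mount-time check. This is an illustration under assumptions: the `relaxed` flag stands in for a hypothetical mount option that the thread does not name, and algorithms are identified by plain strings.

```python
from typing import Iterable, List


def usable_algorithms(stored: Iterable[str], available: Iterable[str],
                      relaxed: bool = False) -> List[str]:
    """Decide which stored checksum algorithms can be verified at mount time.

    Strict mode (default) refuses the mount unless every stored algorithm
    is supported; relaxed mode accepts the mount if at least one is.
    """
    stored = list(stored)
    available = set(available)
    usable = [name for name in stored if name in available]
    missing = [name for name in stored if name not in available]
    if missing and not relaxed:
        raise RuntimeError(f"refusing to mount: unsupported checksum(s) {missing}")
    if not usable:
        raise RuntimeError("refusing to mount: no supported checksum algorithm")
    return usable
```

In relaxed mode, verification would then use only the returned subset, so the filesystem stays readable with weaker checking.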

But let's keep this open, I'm not yet convinced that combining more weak
algos makes sense from the crypto POV. If this is meant to protect against
random bitflips, would a single fast-but-weak hash be comparable to a
combination? Or are there other expectations?

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn wrote:
> Except most of the CPU optimized hashes aren't crypto hashes (other than the
> various SHA implementations).  Furthermore, I've actually tested the speed
> of a generic CRC32c implementation versus SHA-1 using the SHA instructions
> on an UltraSPARC processor, and the difference amounts to a few
> microseconds in _favor_ of the optimized crypto hash; and I've run the math
> for every other ISA that has instructions for computing SHA hashes (I don't
> have the hardware for any of the others), and expect similar results for
> those as well.

I think the confusion here is that I am talking about 128-bit and
256-bit hashes, which is what you would choose for filesystem
checksums if you want to have extremely strong collision resistance
(eg., you could also use it for dedup).

You seem to be talking about 32-bit (and maybe 64-bit) hashes.

The speed difference between crypto 128- and 256-bit hashes and
non-crypto equivalents that I have mentioned is an order of magnitude
or more.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Austin S Hemmelgarn


On 2014-12-01 12:22, John Williams wrote:
> On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn wrote:
>> Just because it's a filesystem doesn't always mean that speed is the most
>> important thing.  Personally, I can think of multiple cases where using a
>> cryptographically strong hash would be preferable, for example:
>>   * On an fs used solely for backup purposes
>>   * On a fs used for /boot
>>   * On an fs spread across a very large near-line disk array and mounted
>>     by a system with a powerful CPU
>>   * Almost any other case where data integrity is more important than
>>     speed
>
> What does data integrity have to do with whether the hash is
> cryptographic or not? The primary difference between a cryptographic
> and non-cryptographic hash is that the non-cryptographic hash can be
> easily guessed / predicted (eg., an attack to deliberately create
> collisions) whereas the cryptographic hash cannot (given reasonable
> assumptions of CPU power).
>
> For filesystem checksums it is difficult to imagine a deliberate
> attack on the checksums. Consequently, the only really important
> quality for the hash besides speed is collision resistance. The
> non-crypto hashes that I have mentioned in this thread have excellent
> collision-resistance properties.

I'm not saying they don't have excellent collision resistance
properties.  I'm also not saying that we shouldn't support such
non-cryptographic hashes, just that we shouldn't explicitly NOT support
other hashes, and that if we are going to support more than one hash
algorithm, we should use the infrastructure already in place in the
kernel for such things because it greatly simplifies maintaining the code.

In fact, if I had the time, I'd just write CryptoAPI implementations of
those hashes myself.

>> The biggest reason to use the in-kernel Crypto API though, is that it gives
>> a huge amount of flexibility, and provides pretty much transparent
>> substitution of CPU optimized versions of the exported hash functions (for
>> example, you don't have to know whether or not your processor supports
>> Intel's CRC32 ISA extensions).
>
> Which is worse than useless if the CPU-optimized crypto hash is slower
> than the default non-crypto hash, and that will almost always be the
> case. Besides, there is nothing magic happening in the Crypto API
> library. If you implement your own hash, you can easily do a few
> checks and choose the best code for the CPU.

Except most of the CPU optimized hashes aren't crypto hashes (other than
the various SHA implementations).  Furthermore, I've actually tested the
speed of a generic CRC32c implementation versus SHA-1 using the SHA
instructions on an UltraSPARC processor, and the difference amounts to
a few microseconds in _favor_ of the optimized crypto hash; and I've run
the math for every other ISA that has instructions for computing SHA
hashes (I don't have the hardware for any of the others), and expect
similar results for those as well.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread John Williams

On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn wrote:

> Just because it's a filesystem doesn't always mean that speed is the most
> important thing.  Personally, I can think of multiple cases where using a
> cryptographically strong hash would be preferable, for example:
>  * On an fs used solely for backup purposes
>  * On a fs used for /boot
>  * On an fs spread across a very large near-line disk array and mounted
>by a system with a powerful CPU
>  * Almost any other case where data integrity is more important than
>speed

What does data integrity have to do with whether the hash is
cryptographic or not? The primary difference between a cryptographic
and non-cryptographic hash is that the non-cryptographic hash can be
easily guessed / predicted (eg., an attack to deliberately create
collisions) whereas the cryptographic hash cannot (given reasonable
assumptions of CPU power).

For filesystem checksums it is difficult to imagine a deliberate
attack on the checksums. Consequently, the only really important
quality for the hash besides speed is collision resistance. The
non-crypto hashes that I have mentioned in this thread have excellent
collision-resistance properties.
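The point that a non-cryptographic checksum can be predicted is easy to make concrete for CRC-32 (the same algebra applies to crc32c). For equal-length inputs a CRC is affine over GF(2), so the CRC of a block built by XOR-ing other blocks follows from their CRCs without hashing it; this structure is what makes deliberate CRC collisions cheap. The sketch below uses Python's zlib.crc32 and hashlib.sha256 with arbitrary sample blocks; it only illustrates CRCs and does not speak to Spooky2 or CityHash.

```python
import hashlib
import zlib


def xor_bytes(*parts: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    length = len(parts[0])
    if any(len(p) != length for p in parts):
        raise ValueError("inputs must have equal length")
    out = bytearray(length)
    for part in parts:
        for i, byte in enumerate(part):
            out[i] ^= byte
    return bytes(out)


# Arbitrary equal-length sample blocks.
a = b"block A contents"
b = b"block B contents"
c = b"block C contents"

# For equal-length inputs CRC-32 is affine over GF(2):
# crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c), with no search required.
crc_forged = zlib.crc32(xor_bytes(a, b, c))
crc_predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print("CRC-32 relation holds:", crc_forged == crc_predicted)

# No such shortcut is known for SHA-256.
sha_lhs = hashlib.sha256(xor_bytes(a, b, c)).digest()
sha_rhs = xor_bytes(*(hashlib.sha256(m).digest() for m in (a, b, c)))
print("SHA-256 relation holds:", sha_lhs == sha_rhs)
```

Whether that predictability matters for a filesystem depends on the threat model, which is exactly the disagreement in this subthread.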

> The biggest reason to use the in-kernel Crypto API though, is that it gives
> a huge amount of flexibility, and provides pretty much transparent
> substitution of CPU optimized versions of the exported hash functions (for
> example, you don't have to know whether or not your processor supports
> Intel's CRC32 ISA extensions).

Which is worse than useless if the CPU-optimized crypto hash is slower
than the default non-crypto hash, and that will almost always be the
case. Besides, there is nothing magic happening in the Crypto API
library. If you implement your own hash, you can easily do a few
checks and choose the best code for the CPU.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-12-01 Thread Austin S Hemmelgarn


On 2014-11-29 16:21, John Williams wrote:
> On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed wrote:
>> I'd suggest looking more closely at the crypto api section of menuconfig -
>> it already has crc32c, among others. Just because it's called the "crypto
>> api" doesn't mean it only has cryptographically-strong algorithms.
>
> I have looked. What 128- or 256-bit hash functions in "crypto api" are
> you referring to that are as fast as Spooky2 or CityHash?

Just because it's a filesystem doesn't always mean that speed is the
most important thing.  Personally, I can think of multiple cases where
using a cryptographically strong hash would be preferable, for example:

 * On an fs used solely for backup purposes
 * On a fs used for /boot
 * On an fs spread across a very large near-line disk array and mounted
   by a system with a powerful CPU
 * Almost any other case where data integrity is more important than
   speed

The biggest reason to use the in-kernel Crypto API though, is that it
gives a huge amount of flexibility, and provides pretty much transparent
substitution of CPU optimized versions of the exported hash functions
(for example, you don't have to know whether or not your processor
supports Intel's CRC32 ISA extensions).

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-30 Thread Christoph Anton Mitterer

On Sun, 2014-11-30 at 23:05 +, Dimitri John Ledkov wrote: 
> Nope, we should use standard names.
Well, I'm not aware of any name that is standardised in the sense of
being mandatory.
People use SHA2-xxx, SHA-xxx, SHAxxx and probably even more
combinations.

And just because something was started short-sightedly and in the wrong
way doesn't mean one cannot correct it, which is why we try to no longer
use e.g. KB but kB or KiB.

Cheers,
Chris.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-30 Thread Dimitri John Ledkov

On 30 November 2014 at 22:59, Christoph Anton Mitterer wrote:
>>Agree with others about -C 256...-C sha256 is only three
>>letters more ;)
>
> Ideally, sha2-256 would be used, since there will be (are) other
> versions of sha which have 256 bits size.
>

Nope, we should use standard names. SHA-2 256 was the first SHA algo
to use 256 bits, thus it's commonly referred to as sha256 across the
board in multiple pieces of software.
The SHA-3 family has hashes of the same lengths, and those will
be known as sha3-256 etc.

The shorthand variant names in this table,
http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions, appear
to me to be how SHA hashes are currently referred to.
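A tool can accept several spellings and map them to one canonical name, while rejecting a bare bit length like the RFC's `-C 256`, which stops being unambiguous once SHA-3 is an option. This is a sketch only: the alias table and resolve_csum_name() are hypothetical and not part of btrfs-progs; the canonical values are Python hashlib algorithm names.

```python
import hashlib

# Hypothetical alias table for a checksum option.
# Keys: spellings a user might type; values: hashlib algorithm names.
ALIASES = {
    "sha256": "sha256",
    "sha-256": "sha256",
    "sha2-256": "sha256",
    "sha3-256": "sha3_256",
}


def resolve_csum_name(user_value: str) -> str:
    """Map a user-supplied checksum name to one canonical algorithm name."""
    key = user_value.strip().lower()
    if key.isdigit():
        # A bare size such as "256" cannot tell SHA-2 from SHA-3.
        raise ValueError(f"{user_value!r} is ambiguous; name the algorithm")
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown checksum algorithm {user_value!r}") from None


if __name__ == "__main__":
    for name in ("SHA256", "sha2-256", "sha3-256"):
        canonical = resolve_csum_name(name)
        print(name, "->", canonical, hashlib.new(canonical).digest_size * 8, "bits")
```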

-- 
Regards,

Dimitri.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-30 Thread Christoph Anton Mitterer

>Agree with others about -C 256...-C sha256 is only three
>letters more ;)

Ideally, sha2-256 would be used, since there will be (are) other
versions of sha which have 256 bits size.


Cheers,
Chris.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-29 Thread Alex Elsayed

John Williams wrote:

> On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed 
> wrote:
>> I'd suggest looking more closely at the crypto api section of menuconfig
>> - it already has crc32c, among others. Just because it's called the
>> "crypto api" doesn't mean it only has cryptographically-strong
>> algorithms.
> 
> I have looked. What 128- or 256-bit hash functions in "crypto api" are
> you referring to that are as fast as Spooky2 or CityHash?

I'm saying that neither of those is in the kernel _anywhere_ now, so if 
someone's adding them the sensible thing seems to be to add them to the 
crypto api, access them through it, and then if we ever add more we get them 
for free on the btrfs side instead of needing to reinvent the wheel every 
time.

In short, there's a place for hashes - why not use it?


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-29 Thread John Williams

On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed  wrote:
> I'd suggest looking more closely at the crypto api section of menuconfig -
> it already has crc32c, among others. Just because it's called the "crypto
> api" doesn't mean it only has cryptographically-strong algorithms.

I have looked. What 128- or 256-bit hash functions in "crypto api" are
you referring to that are as fast as Spooky2 or CityHash?

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-29 Thread Alex Elsayed

John Williams wrote:

> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed 
> wrote:
>> Why not just use the kernel crypto API? Then the user can just specify
>> any hash the kernel supports.
> 
> One reason is that cryptographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for cryptographic hashes.

I'd suggest looking more closely at the crypto api section of menuconfig - 
it already has crc32c, among others. Just because it's called the "crypto 
api" doesn't mean it only has cryptographically-strong algorithms. As a side 
benefit, if someone implements (say) SipHash for it, then not only could 
btrfs benefit, but also all other users of the API, including (now) 
userspace.

The crypto api also has compression, for zlib/lzo/lz4/lz4hc, but I'm given 
to understand that btrfs' usage of compression doesn't match well to that 
API.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-29 Thread John Williams

On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed  wrote:
> Why not just use the kernel crypto API? Then the user can just specify any
> hash the kernel supports.

One reason is that cryptographic hashes are an order of magnitude
slower than the fastest non-cryptographic hashes. And for filesystem
checksums, I do not see a need for cryptographic hashes.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-29 Thread Alex Elsayed

David Sterba wrote:

> On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
>> This brings a strong-but-slow checksum algorithm, sha256.
>> 
>> Actually btrfs used sha256 at the early time, but then moved to crc32c
>> for performance purposes.
>> 
>> As crc32c is sort of weak due to its hash collision issue, we need a
>> stronger algorithm as an alternative.
>> 
>> Users can choose sha256 from mkfs.btrfs via
>> 
>> $ mkfs.btrfs -C 256 /device
> 
> There's already some good feedback so I'll try to cover what hasn't been
> mentioned yet.
> 
> I think it's better to separate the preparatory works from adding the
> algorithm itself. The former can be merged (and tested) independently.
> 
> There are several checksum algorithms that trade off speed and strength
> so we may want to support more than just sha256. Easy to add but I'd
> rather see them added in all at once than one by one.

Why not just use the kernel crypto API? Then the user can just specify any 
hash the kernel supports.

> Another question is if we'd like to use different checksum for data and
> metadata. This would not cost any format change if we use the 2 bytes in
> super block csum_type.

Mmm, that might be a good reason, although we could instead store the full
crypto API algorithm spec as an entry in some tree, and have one byte take a
special value meaning "crypto API" while the other byte gives the csum length
in bytes.
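A minimal userspace sketch of that one-byte/one-byte split, assuming the low byte of the 16-bit csum_type holds the algorithm kind and the high byte the checksum length. The CSUM_KIND_CRYPTO_API marker value and all struct/function names are hypothetical; only kind 0 (crc32c) corresponds to an existing btrfs value. Because existing filesystems store csum_type 0, a zero length byte has to be read as "default length for this kind".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the 16-bit superblock csum_type field. */
#define CSUM_KIND_CRC32C	0x00u	/* existing btrfs value for crc32c */
#define CSUM_KIND_CRYPTO_API	0xFFu	/* hypothetical: algorithm named elsewhere */
#define CSUM_MAX_LEN		32u	/* matches BTRFS_CSUM_SIZE (32 bytes) */

struct csum_desc {
	uint8_t kind;	/* algorithm family */
	uint8_t len;	/* checksum length in bytes; 0 = default for kind */
};

/* Store as two bytes, low byte first, like the on-disk __le16. */
static void csum_desc_encode(const struct csum_desc *d, uint8_t out[2])
{
	assert(d->len <= CSUM_MAX_LEN);
	out[0] = d->kind;
	out[1] = d->len;
}

static struct csum_desc csum_desc_decode(const uint8_t in[2])
{
	struct csum_desc d = { .kind = in[0], .len = in[1] };
	return d;
}

/* Existing filesystems store csum_type == 0, i.e. both bytes zero. */
static unsigned int csum_effective_len(struct csum_desc d)
{
	if (d.kind == CSUM_KIND_CRC32C && d.len == 0)
		return 4;
	return d.len;
}
```

Decoding the legacy bytes {0, 0} yields kind crc32c with an effective length of 4, so old filesystems keep their meaning without a format change.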

> Optional/crazy/format change stuff:
> 
> * per-file checksum algorithm - unlike compression, the whole file would
>   have to use the same csum algo
>   reflink would work iff the algos match
>   snapshotting is unaffected
> 
> * per-subvolume checksum algorithm - specify the csum type at creation
>   time, or afterwards unless it's modified

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread Liu Bo

On Tue, Nov 25, 2014 at 05:39:05PM +0100, David Sterba wrote:
> On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a 
> > stronger
> > algorithm as an alternative.
> > 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> There's already some good feedback so I'll try to cover what hasn't been
> mentioned yet.
> 
> I think it's better to separate the preparatory works from adding the
> algorithm itself. The former can be merged (and tested) independently.

That's a good point.

> 
> There are several checksum algorithms that trade off speed and strength
> so we may want to support more than just sha256. Easy to add but I'd
> rather see them added in all at once than one by one.
> 
> Another question is if we'd like to use different checksum for data and
> metadata. This would not cost any format change if we use the 2 bytes in
> super block csum_type.

Yes, but breaking it into meta_csum_type and data_csum_type will need an
incompat flag.

> 
> 
> Optional/crazy/format change stuff:
> 
> * per-file checksum algorithm - unlike compression, the whole file would
>   have to use the same csum algo
>   reflink would work iff the algos match
>   snapshotting is unaffected
> 
> * per-subvolume checksum algorithm - specify the csum type at creation
>   time, or afterwards unless it's modified

I thought about this before. If we enable this, at least a few cases need
to be dealt with:
1. converting file data's csum from one algorithm to another
2. letting different checksum lengths co-exist: we can either use a different
   key.type for each algorithm, or pack the checksum into a new structure that
   records the algorithm type (and length).

thanks,
-liubo

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread John Williams

On Wed, Nov 26, 2014 at 4:50 AM, Holger Hoffstätte
 wrote:
> On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:
>
>> 2) CityHash : for 256-bit hashes on all systems
>> https://code.google.com/p/cityhash/
>
> Btw this is now superseded by Farmhash:
> https://code.google.com/p/farmhash/
>

It seems FarmHash is not a complete replacement for CityHash.
Specifically, I don't see a Fingerprint256() function in FarmHash, so
no 256-bit fingerprints (unless I am missing something?). Also, it
seems that FarmHash's Fingerprint128() hash is just CityHash128().

Unless I am misreading it, I think FarmHash is mostly useful for
32-bit and 64-bit hashes.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread Austin S Hemmelgarn


On 2014-11-26 08:38, Brendan Hide wrote:
> On 2014/11/25 18:47, David Sterba wrote:
>> We could provide an interface for external applications that would make
>> use of the strong checksums. Eg. external dedup, integrity db. The
>> benefit here is that the checksum is always up to date, so there's no
>> need to compute the checksums again. At the obvious cost.
>
> I can imagine some use-cases where you might even want more than one
> algorithm to be used and stored. Not sure if that makes me a madman,
> though. ;)

Not crazy at all, I would love to have the ability to store multiple 
different weak but fast hash values.  For example, on my laptop, it is 
actually faster to compute crc32c, adler32, and md5 hashes together than 
it is to compute pretty much any 256-bit hash I've tried.


This then brings up the issue of what to do when we try to mount such a 
fs on a system that doesn't support some or all of the hashes used.
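As a rough illustration of computing several weak checksums together, the sketch below updates CRC-32C and Adler-32 in a single pass over the buffer (md5 is left out for brevity). It uses a plain bitwise CRC, so it says nothing about the speed figures above; a real implementation would use a table-driven or hardware-accelerated CRC-32C. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct multi_csum {
	uint32_t crc32c;
	uint32_t adler32;
};

/* Compute CRC-32C and Adler-32 in one pass over buf. */
static struct multi_csum multi_csum_compute(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;	/* CRC-32C initial value */
	uint32_t a = 1, b = 0;		/* Adler-32 running sums */

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		/* CRC-32C step: bitwise, reflected polynomial 0x82F63B78 */
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		/* Adler-32 step: both sums modulo 65521 */
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}

	struct multi_csum r = { crc ^ 0xFFFFFFFFu, (b << 16) | a };
	return r;
}
```

Each byte is loaded once and fed to both checksums, which is the property that makes a combined pass cheaper than running the algorithms one after another.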





Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread Brendan Hide


On 2014/11/25 18:47, David Sterba wrote:
> We could provide an interface for external applications that would make
> use of the strong checksums. Eg. external dedup, integrity db. The
> benefit here is that the checksum is always up to date, so there's no
> need to compute the checksums again. At the obvious cost.

I can imagine some use-cases where you might even want more than one 
algorithm to be used and stored. Not sure if that makes me a madman, 
though. ;)


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread Brendan Hide


On 2014/11/25 13:30, Liu Bo wrote:
> This is actually inspired by ZFS, which offers checksum functions ranging
> from the simple-and-fast fletcher2 to the slower-but-secure sha256.
>
> In btrfs, by contrast, crc32c is the only choice.
>
> As for the slowness of sha256, Intel has a set of instructions for
> it, "Intel SHA Extensions", that may help a lot.

I think the advantage will be in giving a choice with some strong 
suggestions:


An example of suggestions - if using sha256 on an old or "low-power" 
CPU, detect that the CPU doesn't support the appropriate acceleration 
functions and print a warning at mount or a warning-and-prompt at mkfs-time.
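A mkfs-time check along these lines could look like the userspace sketch below. It assumes an x86 CPU and reads the SHA Extensions flag (CPUID leaf 7, subleaf 0, EBX bit 29) through GCC/Clang's `<cpuid.h>`; on other architectures it simply reports "unknown". The function names and the warning text are hypothetical and not part of mkfs.btrfs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* 1 = SHA Extensions reported, 0 = not reported, -1 = cannot tell. */
static int cpu_has_sha_extensions(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 7, subleaf 0: EBX bit 29 is the SHA feature flag. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return -1;	/* leaf 7 not supported */
	return (int)((ebx >> 29) & 1u);
#else
	return -1;
#endif
}

/* Print a warning and return 1 if sha256 was chosen without acceleration. */
static int warn_if_slow_sha256(const char *csum_name)
{
	assert(csum_name != NULL);
	if (strcmp(csum_name, "sha256") != 0)
		return 0;
	if (cpu_has_sha_extensions() == 1)
		return 0;
	fprintf(stderr, "warning: sha256 checksums without CPU SHA "
		"acceleration may be slow\n");
	return 1;
}
```

Treating "unknown" the same as "absent" keeps the warning conservative on architectures the check does not understand.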


The default could even be changed based on the architecture - though I 
suspect crc32c is already a good default on most architectures.


Allowing a choice gives flexibility where admins know how it could best
be used, and David's suggestion (separate thread) would be able to take
advantage of that.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-26 Thread Holger Hoffstätte

On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:

> 2) CityHash : for 256-bit hashes on all systems
> https://code.google.com/p/cityhash/

Btw this is now superseded by Farmhash:
https://code.google.com/p/farmhash/

-h


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread John Williams

On Tue, Nov 25, 2014 at 2:30 AM, Liu Bo  wrote:
> On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
>> For example, Spooky V2 hash is 128 bits and is very fast. It is
>> noncryptographic, but it is more than adequate for data checksums.
>>
>> http://burtleburtle.net/bob/hash/spooky.html
>>
>> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
>> (Xeon E3-1270 V2 @ 3.50Ghz)
>
> Thanks for the suggestion, I'll take a look.
>
> Btw, it's not in kernel yet, is it?

No, as far as I know, it is not in the kernel.

By the way, as for the suggestion of blake2 hash, note that it is much
slower than Spooky V2 hash. That is to be expected, since blake2 is a
cryptographic hash (even if it is one that is fast relative to other
cryptographic hashes) and as a class, cryptographic hashes tend to be
an order of magnitude slower than the fastest noncryptographic hashes.

The hashes that I would recommend for use with btrfs checksums are:

1) SpookyHash V2 : for 128 bit hashes on 64-bit systems
http://burtleburtle.net/bob/hash/spooky.html

2) CityHash : for 256-bit hashes on all systems
https://code.google.com/p/cityhash/

3) Murmur3 : for 128-bit hashes on 32-bit systems (since Spooky and
City are not the fastest on most 32-bit systems)
https://code.google.com/p/smhasher/wiki/MurmurHash3

All of those are noncryptographic, but they all have good properties
that should make them more than adequate for data checksums and dedup
usage.

For more information, here are some comparisons of fast hash functions
(note that these comparisons were written 2 to 3 years ago):

http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html
http://research.neustar.biz/tag/spooky-hash/

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Bardur Arantsson

On 2014-11-25 17:47, David Sterba wrote:
> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
>>> This brings a strong-but-slow checksum algorithm, sha256.
>>>
>>> Actually btrfs used sha256 at the early time, but then moved to 
>>> crc32c for
>>> performance purposes.
>>>
>>> As crc32c is sort of weak due to its hash collision issue, we need a 
>>> stronger
>>> algorithm as an alternative.
>>>
>>> Users can choose sha256 from mkfs.btrfs via
>>>
>>> $ mkfs.btrfs -C 256 /device
>>
>> Agree with others about -C 256...-C sha256 is only three letters more ;)
>>
>> What's the target for this mode?  Are we trying to find evil people 
>> scribbling on the drive, or are we trying to find bad hardware?
> 
> We could provide an interface for external applications that would make
> use of the strong checksums. Eg. external dedup, integrity db. The
> benefit here is that the checksum is always up to date, so there's no
> need to compute the checksums again. At the obvious cost.

Yes, pleease!

Regards,



Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread David Sterba

On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to 
> > crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a 
> > stronger
> > algorithm as an alternative.
> > 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)
> 
> What's the target for this mode?  Are we trying to find evil people 
> scribbling on the drive, or are we trying to find bad hardware?

We could provide an interface for external applications that would make
use of the strong checksums. Eg. external dedup, integrity db. The
benefit here is that the checksum is always up to date, so there's no
need to compute the checksums again. At the obvious cost.

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread David Sterba

On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote:
> This brings a strong-but-slow checksum algorithm, sha256.
> 
> Actually btrfs used sha256 at the early time, but then moved to crc32c for
> performance purposes.
> 
> As crc32c is sort of weak due to its hash collision issue, we need a stronger
> algorithm as an alternative.
> 
> Users can choose sha256 from mkfs.btrfs via
> 
> $ mkfs.btrfs -C 256 /device

There's already some good feedback so I'll try to cover what hasn't been
mentioned yet.

I think it's better to separate the preparatory works from adding the
algorithm itself. The former can be merged (and tested) independently.

There are several checksum algorithms that trade off speed and strength
so we may want to support more than just sha256. Easy to add but I'd
rather see them added in all at once than one by one.

Another question is if we'd like to use different checksum for data and
metadata. This would not cost any format change if we use the 2 bytes in
super block csum_type.

Optional/crazy/format change stuff:

* per-file checksum algorithm - unlike compression, the whole file would
  have to use the same csum algo
  reflink would work iff the algos match
  snapshotting is unaffected

* per-subvolume checksum algorithm - specify the csum type at creation
  time, or afterwards unless it's modified

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Liu Bo

On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for
> >performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger
> >algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)

That's right, #stupidme

> 
> What's the target for this mode?  Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

This is actually inspired by ZFS, which offers checksum functions ranging
from the simple-and-fast fletcher2 to the slower-but-secure sha256.

In btrfs, by contrast, crc32c is the only choice.

As for the slowness of sha256, Intel has a set of instructions for
it, "Intel SHA Extensions", that may help a lot.

Not insisting on it, I'm always open to any suggestions.

Btw, having played with a Merkle tree for a while, I think making good use
of our existing scrub looks more promising for implementing the feature
that detects changes between mounts.

thanks,
-liubo

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Daniel Cegiełka

2014-11-25 11:30 GMT+01:00 Liu Bo :
> On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
>>  wrote:
>>
>> > Would there be room for a compromise with e.g. 128 bits?
>>
>> For example, Spooky V2 hash is 128 bits and is very fast. It is
>> noncryptographic, but it is more than adequate for data checksums.
>>
>> http://burtleburtle.net/bob/hash/spooky.html
>>
>> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
>> (Xeon E3-1270 V2 @ 3.50Ghz)
>
> Thanks for the suggestion, I'll take a look.
>
> Btw, it's not in kernel yet, is it?
>

The best option would be blake2b, but it isn't implemented in the
kernel. It is not a problem to use it locally (I can upload the code
stripped for usage in kernel).

from https://blake2.net/

Q: Why do you want BLAKE2 to be fast? Aren't fast hashes bad?

A: You want your hash function to be fast if you are using it to
compute the secure hash of a large amount of data, such as in
distributed filesystems (e.g. Tahoe-LAFS), cloud storage systems (e.g.
OpenStack Swift), intrusion detection systems (e.g. Samhain),
integrity-checking local filesystems (e.g. ZFS), peer-to-peer
file-sharing tools (e.g. BitTorrent), or version control systems (e.g.
git). You only want your hash function to be slow if you're using it
to "stretch" user-supplied passwords, in which case see the next
question.

https://blake2.net/
https://github.com/floodyberry/blake2b-opt

Best regards,
Daniel

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Liu Bo

On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
>  wrote:
> 
> > Would there be room for a compromise with e.g. 128 bits?
> 
> For example, Spooky V2 hash is 128 bits and is very fast. It is
> noncryptographic, but it is more than adequate for data checksums.
> 
> http://burtleburtle.net/bob/hash/spooky.html
> 
> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
> (Xeon E3-1270 V2 @ 3.50Ghz)

Thanks for the suggestion, I'll take a look.

Btw, it's not in kernel yet, is it?

thanks,
-liubo

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Liu Bo

On Mon, Nov 24, 2014 at 08:23:25AM +, Holger Hoffstätte wrote:
> On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote:
> 
> > This brings a strong-but-slow checksum algorithm, sha256.
> > 
> > Actually btrfs used sha256 at the early time, but then moved to crc32c for
> > performance purposes.
> > 
> > As crc32c is sort of weak due to its hash collision issue, we need a 
> > stronger
> > algorithm as an alternative.
> 
> I'm curious - did you see actual cases where this happened, i.e. a corrupt
> block that would pass crc32 validation? I know some high-integrity use
> cases require a stronger algorithm - just wondering.

I haven't seen that so far, but here is a link about a crc32c hash collision
in btrfs: http://lwn.net/Articles/529077/. It's not about data checksums,
though: btrfs's DIR_ITEM also uses a crc32c hash, and if such collisions
happened on data blocks, something interesting would happen.
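The weakness here is structural rather than statistical: a CRC is affine over GF(2), so for equal-length inputs crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c). That is why crc32c collisions can be constructed directly instead of searched for. A small self-contained check of that identity, using a bitwise CRC-32C (the Castagnoli polynomial, reflected form 0x82F63B78) with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: init 0xFFFFFFFF, reflected poly 0x82F63B78, final xor. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/*
 * Check crc32c(a ^ b ^ c) == crc32c(a) ^ crc32c(b) ^ crc32c(c) for three
 * equal-length buffers (the init/final constants cancel out for an odd
 * number of terms).
 */
static int crc32c_affine_holds(const uint8_t *a, const uint8_t *b,
			       const uint8_t *c, size_t len)
{
	uint8_t x[64];

	assert(len <= sizeof(x));
	for (size_t i = 0; i < len; i++)
		x[i] = a[i] ^ b[i] ^ c[i];
	return crc32c(x, len) ==
	       (crc32c(a, len) ^ crc32c(b, len) ^ crc32c(c, len));
}
```

Taking c as an all-zero buffer shows that flipping a chosen bit pattern e in any block changes its checksum by crc(e) ^ crc(zeros), independent of the block's contents, so any e with crc(e) == crc(zeros) produces an undetected change.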

> 
> Would there be room for a compromise with e.g. 128 bits?

Yeah, we're good if it's not larger than 256 bits.

> 
> > Users can choose sha256 from mkfs.btrfs via
> > 
> > $ mkfs.btrfs -C 256 /device
> 
> Not sure how others feel about this, but it's probably easier for
> sysadmins to specify the algorithm by name from the set of supported
> ones, similar to how ssh does it ("ssh -C arcfour256").

Urr, my bad: I made that change locally but didn't 'git add' it.

thanks,
-liubo

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Zygo Blaxell

On Mon, Nov 24, 2014 at 08:58:25PM +, Hugo Mills wrote:
> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> > On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
> > >This brings a strong-but-slow checksum algorithm, sha256.
> > >
> > >Actually btrfs used sha256 at the early time, but then moved to
> > >crc32c for
> > >performance purposes.
> > >
> > >As crc32c is sort of weak due to its hash collision issue, we need
> > >a stronger
> > >algorithm as an alternative.
> > >
> > >Users can choose sha256 from mkfs.btrfs via
> > >
> > >$ mkfs.btrfs -C 256 /device
> > 
> > Agree with others about -C 256...-C sha256 is only three letters more ;)
> > 
> > What's the target for this mode?  Are we trying to find evil people
> > scribbling on the drive, or are we trying to find bad hardware?
> 
>You're going to need a hell of a lot more infrastructure to deal
> with the first of those two cases. If someone can write arbitrary data
> to your storage without going through the filesystem, you've already
> lost the game.

If the filesystem can be arranged as a Merkle tree then you can store a
copy of the root SHA256 with a signature to detect arbitrary tampering.
Of course the magnitude of the "If" in that sentence is startlingly
large, especially if you are starting from where btrfs is now.  ;)
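A minimal sketch of the Merkle-tree idea: hash each block, then hash pairs of child hashes upward until a single root remains, so one stored (and, in the scheme above, signed) root value covers every block. FNV-1a 64 stands in for SHA256 only to keep the example self-contained; it is not tamper-resistant, and the signing step is omitted. The names, the MAX_LEAVES limit and the odd-node rule (carry the last node up unchanged) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit: a non-cryptographic stand-in for SHA256. */
static uint64_t fnv1a64(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

#define MAX_LEAVES 64

/* Root over nblocks contiguous blocks of blksize bytes each. */
static uint64_t merkle_root(const uint8_t *blocks, size_t nblocks,
			    size_t blksize)
{
	uint64_t level[MAX_LEAVES];

	assert(nblocks > 0 && nblocks <= MAX_LEAVES);
	for (size_t i = 0; i < nblocks; i++)
		level[i] = fnv1a64(blocks + i * blksize, blksize);

	/* Combine pairs in place until one node is left. */
	while (nblocks > 1) {
		size_t out = 0;

		for (size_t i = 0; i < nblocks; i += 2) {
			if (i + 1 < nblocks) {
				uint64_t pair[2] = { level[i], level[i + 1] };
				level[out++] = fnv1a64(pair, sizeof(pair));
			} else {
				level[out++] = level[i];	/* odd node */
			}
		}
		nblocks = out;
	}
	return level[0];
}
```

Changing any single block changes its leaf hash and every hash on the path to the root, which is what lets one root value detect arbitrary tampering once it is protected by a signature.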

>I don't know what the stats are like for random error detection
> (probably just what you'd expect in the naive case -- 1/2^n chance of
> failing to detect an error for an n-bit hash). More bits likely are
> better for that, but how much CPU time do you want to burn on it?

crc64 should be more than adequate for simple disk corruption errors.
crc32's error rate works out to one false positive per dozen megabytes
*of random errors*, and crc64 FP rate is a few billion times lower
(one FP per petabyte or so).  If you have the kind of storage subsystem
that corrupts a petabyte of data, it'd be amazing if you could get
anything out of your filesystem at all.
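Under the naive model from Hugo's message quoted above (each corrupted block independently slips past an n-bit checksum with probability 2^-n), the ratio between crc32 and crc64 is exactly 2^32, about 4.3 billion, which matches the "few billion times lower" estimate. A small sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Probability that one corrupted block passes an n-bit check (naive model). */
static double undetected_fraction(unsigned int bits)
{
	double p = 1.0;

	assert(bits <= 512);
	for (unsigned int i = 0; i < bits; i++)
		p *= 0.5;	/* exact: repeated halving of a power of two */
	return p;
}

/* Expected number of corrupted blocks that go undetected. */
static double expected_missed(double corrupted_blocks, unsigned int bits)
{
	return corrupted_blocks * undetected_fraction(bits);
}
```

For example, expected_missed(1e6, 32) is about 2.3e-4, and the same call with 64 bits is 2^32 times smaller.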

>I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.

Even then it's massive overkill.  A 16TB filesystem will average about
one hash collision from a good 64-bit hash.  Compared to a 256bit hash,
you'd be continuously maintaining a data structure on disk that is 96GB
larger than it has to be to save an average of *one* 4K read during a
full-filesystem dedup.
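The dedup numbers follow from the birthday bound. Assuming 4 KiB blocks and a uniformly distributed b-bit hash, N blocks give about N(N-1)/2 / 2^b expected colliding pairs. For 16 TiB, N = 2^32, which gives roughly 0.5 colliding pairs with a 64-bit hash (on the order of one), and storing 32-byte instead of 8-byte checksums costs 2^32 * 24 bytes = 96 GiB, ignoring item overhead. A sketch of the arithmetic (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t blocks_in(uint64_t fs_bytes, uint64_t block_size)
{
	assert(block_size > 0);
	return fs_bytes / block_size;
}

/* Birthday bound: expected colliding pairs among n uniform b-bit hashes. */
static double expected_colliding_pairs(uint64_t n, unsigned int bits)
{
	double nn = (double)n;
	double pairs = nn * (nn - 1.0) / 2.0;

	for (unsigned int i = 0; i < bits; i++)
		pairs *= 0.5;	/* divide by 2^bits */
	return pairs;
}

/* Extra bytes from storing big_bits instead of small_bits per block. */
static uint64_t extra_csum_bytes(uint64_t n, unsigned int big_bits,
				 unsigned int small_bits)
{
	assert(big_bits >= small_bits);
	return n * ((big_bits - small_bits) / 8u);
}
```

With n = blocks_in(16ULL << 40, 4096), the pair estimate for 64 bits lands just under 0.5 and extra_csum_bytes(n, 256, 64) is exactly 96 GiB.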

If your users are filling your disks with data blocks that all have
the same 64-bit hash (with any algorithm), SHA256 could be more
attractive...but you'd probably still be OK at half that size.

>Hugo.
> 
> -- 
> Hugo Mills | That's not rain, that's a lake with slots in it
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: 65E74AC0  |


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Qu Wenruo

 Original Message 
Subject: Re: [RFC PATCH] Btrfs: add sha256 checksum option
From: Hugo Mills 
To: Chris Mason 
Date: 25 November 2014 04:58

> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
>>> This brings a strong-but-slow checksum algorithm, sha256.
>>>
>>> Actually btrfs used sha256 at the early time, but then moved to
>>> crc32c for performance purposes.
>>>
>>> As crc32c is sort of weak due to its hash collision issue, we need
>>> a stronger algorithm as an alternative.
>>>
>>> Users can choose sha256 from mkfs.btrfs via
>>>
>>> $ mkfs.btrfs -C 256 /device
>>
>> Agree with others about -C 256...-C sha256 is only three letters more ;)
>>
>> What's the target for this mode?  Are we trying to find evil people
>> scribbling on the drive, or are we trying to find bad hardware?
>
> You're going to need a hell of a lot more infrastructure to deal
> with the first of those two cases. If someone can write arbitrary data
> to your storage without going through the filesystem, you've already
> lost the game.
>
> I don't know what the stats are like for random error detection
> (probably just what you'd expect in the naive case -- 1/2^n chance of
> failing to detect an error for an n-bit hash). More bits likely are
> better for that, but how much CPU time do you want to burn on it?

Agree with this; sha256's extra CPU usage doesn't seem worth it.

As for the csum algorithm, I personally prefer an algorithm with better
error detection: one that checks not only the integrity of the whole data,
but also the range where the error lies.

If btrfs can know, for example, which 4K or 2K block the error lies in,
it can drop only that range of data, not the whole tree block, which
would be a great help for later btrfsck work.

From this point of view, 4 crc32s for a 16K leaf/node (1 crc32 per 4K)
may be more productive than a single sha256.
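A sketch of the per-4K idea, assuming a 16K node split into four 4K sub-blocks, each with its own CRC-32C stored somewhere outside the range it covers (where that would live on disk is left open here). The returned bitmask names exactly which sub-blocks are damaged. NODESIZE, SUBBLOCK and the function names are hypothetical; the CRC is the plain bitwise Castagnoli variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODESIZE	16384u			/* hypothetical 16K leaf/node */
#define SUBBLOCK	4096u			/* one crc32c per 4K */
#define NSUB		(NODESIZE / SUBBLOCK)

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Compute one checksum per 4K sub-block. */
static void csum_subblocks(const uint8_t *node, uint32_t out[NSUB])
{
	assert(node != NULL);
	for (unsigned int i = 0; i < NSUB; i++)
		out[i] = crc32c(node + i * SUBBLOCK, SUBBLOCK);
}

/* Bitmask of sub-blocks whose current checksum no longer matches. */
static unsigned int bad_subblocks(const uint8_t *node,
				  const uint32_t stored[NSUB])
{
	unsigned int mask = 0;

	assert(node != NULL);
	for (unsigned int i = 0; i < NSUB; i++)
		if (crc32c(node + i * SUBBLOCK, SUBBLOCK) != stored[i])
			mask |= 1u << i;
	return mask;
}
```

A single flipped bit at byte offset 9000 falls in sub-block 2 (bytes 8192 to 12287), so the mask comes back as 1 << 2 and the other three sub-blocks can still be trusted.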

Thanks,
Qu

> I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.
>
> Hugo.


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Hugo Mills

On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for
> >performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger
> >algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
> 
> Agree with others about -C 256...-C sha256 is only three letters more ;)
> 
> What's the target for this mode?  Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

   You're going to need a hell of a lot more infrastructure to deal
with the first of those two cases. If someone can write arbitrary data
to your storage without going through the filesystem, you've already
lost the game.

   I don't know what the stats are like for random error detection
(probably just what you'd expect in the naive case -- 1/2^n chance of
failing to detect an error for an n-bit hash). More bits likely are
better for that, but how much CPU time do you want to burn on it?

   I could see this possibly being useful for having fewer false
positives when using the inbuilt checksums for purposes of dedup.

   Hugo.

-- 
Hugo Mills | That's not rain, that's a lake with slots in it
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Chris Mason


On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo  wrote:
> This brings a strong-but-slow checksum algorithm, sha256.
>
> Actually btrfs used sha256 at the early time, but then moved to
> crc32c for performance purposes.
>
> As crc32c is sort of weak due to its hash collision issue, we need a
> stronger algorithm as an alternative.
>
> Users can choose sha256 from mkfs.btrfs via
>
> $ mkfs.btrfs -C 256 /device

Agree with others about -C 256...-C sha256 is only three letters more ;)

What's the target for this mode?  Are we trying to find evil people 
scribbling on the drive, or are we trying to find bad hardware?


-chris


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread John Williams

On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
 wrote:

> Would there be room for a compromise with e.g. 128 bits?

For example, Spooky V2 hash is 128 bits and is very fast. It is
noncryptographic, but it is more than adequate for data checksums.

http://burtleburtle.net/bob/hash/spooky.html

SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
(Xeon E3-1270 V2 @ 3.50Ghz)

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Duncan

Holger Hoffstätte posted on Mon, 24 Nov 2014 08:23:25 + as excerpted:

>> Users can choose sha256 from mkfs.btrfs via
>> 
>> $ mkfs.btrfs -C 256 /device
> 
> Not sure how others feel about this, but it's probably easier for
> sysadmins to specify the algorithm by name from the set of supported
> ones, similar to how ssh does it ("ssh -C arcfour256").


Yes.  Simply 256 is waaay too generic for me to be comfortable with.  
256 /what/?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Holger Hoffstätte

On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote:

> This brings a strong-but-slow checksum algorithm, sha256.
> 
> Actually btrfs used sha256 at the early time, but then moved to crc32c for
> performance purposes.
> 
> As crc32c is sort of weak due to its hash collision issue, we need a stronger
> algorithm as an alternative.

I'm curious - did you see actual cases where this happened, i.e. a corrupt
block that would pass crc32 validation? I know some high-integrity use
cases require a stronger algorithm - just wondering.

Would there be room for a compromise with e.g. 128 bits?

> Users can choose sha256 from mkfs.btrfs via
> 
> $ mkfs.btrfs -C 256 /device

Not sure how others feel about this, but it's probably easier for
sysadmins to specify the algorithm by name from the set of supported
ones, similar to how ssh does it ("ssh -C arcfour256").
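A minimal sketch of name-based selection, assuming a small table of supported algorithms. Only crc32c with csum_type 0 exists in btrfs today; the sha256 type value below is a placeholder, not the value used by the patch. A bare "256" is rejected rather than guessed at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct csum_alg {
	const char *name;
	int type;	/* superblock csum_type; the sha256 value is hypothetical */
	int size;	/* checksum size in bytes */
};

static const struct csum_alg csum_algs[] = {
	{ "crc32c", 0, 4 },
	{ "sha256", 1, 32 },
};

/* Look up an algorithm by its full name; NULL if unknown. */
static const struct csum_alg *csum_alg_by_name(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(csum_algs) / sizeof(csum_algs[0]); i++)
		if (strcmp(csum_algs[i].name, name) == 0)
			return &csum_algs[i];
	return NULL;
}
```

Requiring the full name keeps the option unambiguous if more 256-bit algorithms are added later.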

cheers
Holger


[RFC PATCH] Btrfs: add sha256 checksum option

2014-11-23 Thread Liu Bo

This brings a strong-but-slow checksum algorithm, sha256.

Actually btrfs used sha256 at the early time, but then moved to crc32c for
performance purposes.

As crc32c is sort of weak due to its hash collision issue, we need a stronger
algorithm as an alternative.

Users can choose sha256 from mkfs.btrfs via

$ mkfs.btrfs -C 256 /device

Signed-off-by: Liu Bo 
---
 fs/btrfs/Kconfig|   1 +
 fs/btrfs/check-integrity.c  |  13 --
 fs/btrfs/compression.c  |  30 ++---
 fs/btrfs/ctree.h|   8 +++-
 fs/btrfs/disk-io.c  | 106 
 fs/btrfs/disk-io.h  |   2 -
 fs/btrfs/file-item.c|  25 +--
 fs/btrfs/free-space-cache.c |   8 ++--
 fs/btrfs/hash.c |  47 
 fs/btrfs/hash.h |   9 +++-
 fs/btrfs/inode.c|  21 +
 fs/btrfs/ordered-data.c |  10 +++--
 fs/btrfs/ordered-data.h |   9 ++--
 fs/btrfs/scrub.c|  67 +---
 14 files changed, 237 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index a66768e..0a4f9e7 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -2,6 +2,7 @@ config BTRFS_FS
 	tristate "Btrfs filesystem support"
 	select CRYPTO
 	select CRYPTO_CRC32C
+	select CRYPTO_SHA256
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
 	select LZO_COMPRESS
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index cb7f3fe..98e1037 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1808,8 +1808,11 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 {
 	struct btrfs_header *h;
 	u8 csum[BTRFS_CSUM_SIZE];
-	u32 crc = ~(u32)0;
 	unsigned int i;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(state->root->fs_info->csum_tfm)];
+	} desc;
 
 	if (num_pages * PAGE_CACHE_SIZE < state->metablock_size)
 		return 1; /* not metadata */
@@ -1819,14 +1822,18 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 	if (memcmp(h->fsid, state->root->fs_info->fsid, BTRFS_UUID_SIZE))
 		return 1;
 
+	desc.shash.tfm = state->root->fs_info->csum_tfm;
+	desc.shash.flags = 0;
+	crypto_shash_init(&desc.shash);
+
 	for (i = 0; i < num_pages; i++) {
 		u8 *data = i ? datav[i] : (datav[i] + BTRFS_CSUM_SIZE);
 		size_t sublen = i ? PAGE_CACHE_SIZE :
 				    (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
 
-		crc = btrfs_crc32c(crc, data, sublen);
+		crypto_shash_update(&desc.shash, data, sublen);
 	}
-	btrfs_csum_final(crc, csum);
+	crypto_shash_final(&desc.shash, csum);
 	if (memcmp(csum, h->csum, state->csum_size))
 		return 1;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d3220d3..d10883f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -78,7 +78,7 @@ struct compressed_bio {
 	 * the start of a variable length array of checksums only
 	 * used by reads
 	 */
-	u32 sums;
+	u8 sums[];
 };
 
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
@@ -111,31 +111,29 @@ static int check_compressed_csum(struct inode *inode,
 	struct page *page;
 	unsigned long i;
 	char *kaddr;
-	u32 csum;
-	u32 *cb_sum = &cb->sums;
+	u8 csum[BTRFS_CSUM_SIZE];
+	u8 *cb_sum = cb->sums;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		return 0;
 
 	for (i = 0; i < cb->nr_pages; i++) {
 		page = cb->compressed_pages[i];
-		csum = ~(u32)0;
 
 		kaddr = kmap_atomic(page);
-		csum = btrfs_csum_data(kaddr, csum, PAGE_CACHE_SIZE);
-		btrfs_csum_final(csum, (char *)&csum);
+		btrfs_csum(fs_info, kaddr, PAGE_CACHE_SIZE, csum);
 		kunmap_atomic(kaddr);
 
-		if (csum != *cb_sum) {
+		if (memcmp(csum, cb_sum, csum_size)) {
 			btrfs_info(BTRFS_I(inode)->root->fs_info,
-				   "csum failed ino %llu extent %llu csum %u wanted %u mirror %d",
-				   btrfs_ino(inode), disk_start, csum, *cb_sum,
-				   cb->mirror_num);
+				   "csum failed ino %llu extent %llu mirror %d",
+				   btrfs_ino(inode), disk_start, cb->mirror_num);
 			ret = -EIO;
 			goto fail;
 		}
-		cb_sum++;
-
+		cb_sum += csum_size;
 	}
 	ret = 0;
 fail:
@@ -578,7 +576,8 @@ int btrfs_submit_compressed_
