[openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-08-24 Thread Matt Caswell via RT
Resolved by overlapping buffer checks. Closing.

Matt

-- 
Ticket here: http://rt.openssl.org/Ticket/Display.html?id=4362
Please log in as guest with password guest if prompted

-- 
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-06-15 Thread Salz, Rich via RT
Not defined means we make no guarantees. OpenSSL can depend on what it knows
to be true. In the next release we can revisit this.




Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-06-15 Thread David Benjamin via RT
I don't think that will work. The SSL code uses in-place buffers
extensively, so in == out definitely needs to be defined. The question is
only whether out < in is also acceptable.

Either way, for BoringSSL, I've gone ahead and tightened our aliasing
constraints to forbid out < in and require equality, so that we don't have
to keep chasing down discrepancies in the assembly code in advance of a
decision being made here.

(I think there is something to be said for being able to in-place-ish
decrypt a structure with a record header and write the output without the
header, but perhaps this use case is not worth the cost---I see the numbers
went down slightly for chacha-x86.pl. Then again, most other files manage
it naturally. It's a decision you all will need to make.)

David


On Wed, Jun 15, 2016 at 11:01 AM Rich Salz via RT wrote:

> I think for now, we just note this in the documentation: behavior for
> overlapping buffers, and even in-place buffers, is not defined.
>
> It's like memcpy() vs memmove().


[openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-06-15 Thread Rich Salz via RT
I think for now, we just note this in the documentation: behavior for
overlapping buffers, and even in-place buffers, is not defined.

It's like memcpy() vs memmove().



Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-06-07 Thread Brian Smith via RT
Brian Smith  wrote:
> It seems that 32-bit ARM has the same limitation as x86 that the input and
> output pointers must match or the input and output buffers must not overlap
> at all. I'm not sure which ARM code path (NEON or non-NEON, or both) has
> this issue.

Just to follow up on this: I think this might actually be a QEMU ARM
(32-bit) emulator bug, or a configuration issue on my part. In one
version of the QEMU emulator, I have no trouble. But, in another,
newer, version of the QEMU emulator, I get results like this for
BoringSSL's chacha_test (modified to print all the results before
failing):

Mismatch at length 64 with in-place offset 1.
Mismatch at length 64 with in-place offset 2.
Mismatch at length 64 with in-place offset 5.
Mismatch at length 64 with in-place offset 6.
Mismatch at length 64 with in-place offset 9.

Notice, in particular, that it only happens when the input length is
64, and only for specific offsets. Like I said, I consistently get
these failures on the Android emulator but not in a newer version of
QEMU. It doesn't make any difference whether NEON is enabled or
disabled; I believe this is because the ARM code only uses NEON if
there are at least 3 blocks.

Anyway, I see in the ARM chacha code that there is a special case when
the length is 64, so it might be worth double-checking that code.
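
A harness in the spirit of the test described above could be sketched as
follows. This is a toy model in Python, not BoringSSL's chacha_test: the
names toy_keystream, good_xor, quirky_xor and find_mismatches are
hypothetical, the "cipher" is a plain XOR, and the backwards special case
for exactly 64 bytes is an invented bug. It only illustrates how a
length-specific code path can fail for nonzero in-place offsets while
every other length passes.

```python
def toy_keystream(n):
    """Deterministic stand-in for a keystream (not ChaCha)."""
    return bytes((31 * i + 7) % 256 for i in range(n))


def good_xor(mem, in_off, out_off, length):
    """XOR in ascending address order; safe whenever out <= in."""
    ks = toy_keystream(length)
    for i in range(length):
        mem[out_off + i] = mem[in_off + i] ^ ks[i]


def quirky_xor(mem, in_off, out_off, length):
    """Same as good_xor, except a made-up 64-byte special case walks backwards."""
    ks = toy_keystream(length)
    order = range(length - 1, -1, -1) if length == 64 else range(length)
    for i in order:
        mem[out_off + i] = mem[in_off + i] ^ ks[i]


def find_mismatches(cipher, lengths, max_offset):
    """Run cipher with in = out + offset and collect (length, offset) failures."""
    failures = []
    for length in lengths:
        plaintext = bytes((5 * i + 1) % 256 for i in range(length))
        expected = bytes(p ^ k for p, k in zip(plaintext, toy_keystream(length)))
        for offset in range(max_offset + 1):
            mem = bytearray(offset) + bytearray(plaintext)
            cipher(mem, offset, 0, length)  # output starts `offset` bytes before input
            if bytes(mem[:length]) != expected:
                failures.append((length, offset))
    return failures


for length, offset in find_mismatches(quirky_xor, [1, 63, 64, 65, 128], 4):
    print(f"Mismatch at length {length} with in-place offset {offset}.")
```

Reporting every failing (length, offset) pair instead of stopping at the
first one makes patterns like "only length 64" visible at a glance.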

Just FYI.




Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-04-16 Thread Brian Smith via RT
It seems that 32-bit ARM has the same limitation as x86 that the input and
output pointers must match or the input and output buffers must not overlap
at all. I'm not sure which ARM code path (NEON or non-NEON, or both) has
this issue.

Cheers,
Brian
-- 
https://briansmith.org/



Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-10 Thread David Benjamin
The current state is that, as far as I can tell, overlapping requirements
are undocumented (or is it somewhere and I missed it?) and, for ChaCha,
architecture-specific. I think something certainly needs to be done. Either
changing chacha-x86.pl and allowing any out <= in overlap, or declaring
that you want out == in (or something else) with, at minimum, a
documentation change.

I would actually suggest going further and updating EVP_CipherUpdate to
enforce the rule and raise an error if the caller doesn't honor it.
Otherwise we'll continue to be in the situation where callers may write
code that works on some architectures but not others. (BoringSSL's EVP_AEAD
API will fail with OUTPUT_ALIASES_INPUT if aliasing requirements aren't
honored.)
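
For illustration only, an enforcement check of that kind might look like
the sketch below. It assumes the rule "buffers must be identical or
disjoint" and models pointers as plain integers. The names
partially_overlaps, checked_update and OutputAliasesInput are hypothetical
and are not OpenSSL or BoringSSL APIs; only the error name is borrowed from
the BoringSSL error mentioned above.

```python
class OutputAliasesInput(Exception):
    """Hypothetical error type for rejected buffer aliasing."""


def partially_overlaps(out_addr, in_addr, length):
    """Return True if the two length-byte ranges overlap without coinciding."""
    if length <= 0 or out_addr == in_addr:
        return False
    return abs(out_addr - in_addr) < length


def checked_update(out_addr, in_addr, length):
    """Reject partially overlapping buffers before any data is touched."""
    if partially_overlaps(out_addr, in_addr, length):
        raise OutputAliasesInput("output partially overlaps input")
    return length  # a real implementation would process the data here


try:
    checked_update(0x1000, 0x1001, 64)
except OutputAliasesInput as err:
    print("rejected:", err)
```

Allowing out <= in instead would mean rejecting only the case where out
lies strictly inside the input range, i.e. in_addr < out_addr < in_addr +
length.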

Actually, I'm not sure how best to translate an out == in rule to streaming
EVP_CipherUpdate for block ciphers. Imagine feeding one byte at a time to
EVP_CipherUpdate: in will naturally get ahead of out and then synchronize
at block boundaries, so the rule can't be as straightforward as "out ==
in". (Whereas out <= in naturally covers this behavior.)
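
The byte-at-a-time scenario can be traced with a small model. The sketch
assumes a 16-byte block cipher whose update call buffers input and emits
output only when a whole block is available; BLOCK and stream_positions are
hypothetical names, and no real cipher is involved.

```python
BLOCK = 16  # assumed block size in bytes


def stream_positions(total):
    """Return (in_offset, out_offset) as seen at the start of each one-byte update."""
    positions = []
    written = 0   # bytes of output emitted so far
    buffered = 0  # bytes held back waiting for a full block
    for consumed in range(total):
        positions.append((consumed, written))
        buffered += 1
        if buffered == BLOCK:  # a full block is flushed to the output
            written += BLOCK
            buffered = 0
    return positions


positions = stream_positions(48)
in_sync = [i for i, o in positions if i == o]
never_ahead = all(o <= i for i, o in positions)
print(in_sync, never_ahead)
```

In this model the output offset never passes the input offset, and the two
coincide only at block boundaries, which is why a strict "out == in" rule
does not describe streaming use.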

Given the numbers in
https://mta.openssl.org/pipermail/openssl-dev/2016-March/005625.html
the cost seems fairly modest and this is only for 32-bit, not 64-bit. Based
on that, and that other implementations I've tested handle the case fine, I
think this is a reasonable requirement to impose.

Of course, I am also biased here because out == in will cause me some
nuisance. :-) One can certainly argue that out == in is perhaps easier to
handle than out <= in, and that allowing the latter is not worth the cost.

Either way, I'm not an OpenSSL team member and can't make a decision on
behalf of you all. This is something you all have to pick from.

David

On Fri, Mar 4, 2016 at 7:24 AM Andy Polyakov via RT wrote:

> >>> If the other EVP ciphers universally allow this then I think we must
> >>> treat this as a bug, because people may be relying on this behaviour.
> >>> There is also sporadic documentation in lower-level APIs (AES source
> >>> and des.pod) that the buffers may overlap.
> >>>
> >>> If it's inconsistent then, at the very least, we must document that it
> >>> is not allowed.
> >>
> >> I'd like to argue that EVP is not the place to provide any guarantees
> >> about partially overlapping buffers. Even though all current ciphers
> >> process data in ascending address order, we shouldn't assume that there
> >> won't be one that processes data in reverse order.
> >
> > I'm afraid that, since we haven't documented it, the world may already
> > have made that assumption.
>
> Fear is an irrational and destructive feeling. Having faith that the world is
> better than that is nothing but healthy :-) What I'm saying is that
> let's put a little bit more substance into the discourse. Would anybody
> consider it *sane* programming practice to rely on partially overlapping
> buffers in the *general* case? I.e. without actually *knowing* (as opposed
> to *assuming*) what's going on? [Control question: does the compiler
> guarantee the order of references to memory?] As said in my last message, I
> don't consider it sane, and I even consider that view natural [which means
> that I'd expect the majority not to consider it sane either].
>
> Once again, I'm not saying that nothing will be done; I simply want to
> figure out where the line goes. From my personal viewpoint I'd say that
> nothing *has to* be done, but that's just me. You seem to say that we're
> obliged to support partially overlapping buffers. My question then is:
> *any* overlap, at *any* cost? Shall we settle for simply writing down that
> application developers may not rely on partially overlapping buffers? If
> so, do we fix the modules in question, arguing that this quality might be
> desirable in a different context [where the modules in question can be used]?


Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-04 Thread Andy Polyakov
>> Fear is an irrational and destructive feeling. Having faith that the world is
>> better than that is nothing but healthy :-) What I'm saying is that
>> let's put a little bit more substance into the discourse. Would anybody
>> consider it *sane* programming practice to rely on partially overlapping
>> buffers in the *general* case? I.e. without actually *knowing* (as opposed
>> to *assuming*) what's going on? [Control question: does the compiler
>> guarantee the order of references to memory?] As said in my last message, I
>> don't consider it sane, and I even consider that view natural [which means
>> that I'd expect the majority not to consider it sane either].
> 
> One of the cool features of the OCB code that some folks I know are using
> and relying on is that it supports in-place encryption. You give
> it a buffer, and it is encrypted in place. This is specifically
> promised by the API and is noticeably fast.
> 
> No idea whether this is a useful datapoint...

The question is specifically about *partially* overlapping buffers. In
other words, it's not a question of whether *fully* overlapping
buffers, a.k.a. in-place processing, should be supported (they should)
or may be used (they may).



Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-04 Thread Viktor Dukhovni

> On Mar 4, 2016, at 7:24 AM, Andy Polyakov via RT wrote:
> 
> Fear is an irrational and destructive feeling. Having faith that the world is
> better than that is nothing but healthy :-) What I'm saying is that
> let's put a little bit more substance into the discourse. Would anybody
> consider it *sane* programming practice to rely on partially overlapping
> buffers in the *general* case? I.e. without actually *knowing* (as opposed
> to *assuming*) what's going on? [Control question: does the compiler
> guarantee the order of references to memory?] As said in my last message, I
> don't consider it sane, and I even consider that view natural [which means
> that I'd expect the majority not to consider it sane either].

One of the cool features of the OCB code that some folks I know are using
and relying on is that it supports in-place encryption. You give
it a buffer, and it is encrypted in place. This is specifically
promised by the API and is noticeably fast.

No idea whether this is a useful datapoint...

-- 
Viktor.



Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-04 Thread Andy Polyakov via RT
>>> If the other EVP ciphers universally allow this then I think we must
>>> treat this as a bug, because people may be relying on this behaviour.
>>> There is also sporadic documentation in lower-level APIs (AES source
>>> and des.pod) that the buffers may overlap.
>>>
>>> If it's inconsistent then, at the very least, we must document that it
>>> is not allowed.
>>
>> I'd like to argue that EVP is not the place to provide any guarantees
>> about partially overlapping buffers. Even though all current ciphers
>> process data in ascending address order, we shouldn't assume that there
>> won't be one that processes data in reverse order.
>
> I'm afraid that, since we haven't documented it, the world may already have
> made that assumption.

Fear is an irrational and destructive feeling. Having faith that the world is
better than that is nothing but healthy :-) What I'm saying is that
let's put a little bit more substance into the discourse. Would anybody
consider it *sane* programming practice to rely on partially overlapping
buffers in the *general* case? I.e. without actually *knowing* (as opposed
to *assuming*) what's going on? [Control question: does the compiler
guarantee the order of references to memory?] As said in my last message, I
don't consider it sane, and I even consider that view natural [which means
that I'd expect the majority not to consider it sane either].

Once again, I'm not saying that nothing will be done; I simply want to
figure out where the line goes. From my personal viewpoint I'd say that
nothing *has to* be done, but that's just me. You seem to say that we're
obliged to support partially overlapping buffers. My question then is:
*any* overlap, at *any* cost? Shall we settle for simply writing down that
application developers may not rely on partially overlapping buffers? If
so, do we fix the modules in question, arguing that this quality might be
desirable in a different context [where the modules in question can be used]?





Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-04 Thread emi...@openssl.org via RT
On Fri, Mar 4, 2016 at 12:48 PM Andy Polyakov via RT wrote:

> > If the other EVP ciphers universally allow this then I think we must
> > treat this as a bug, because people may be relying on this behaviour.
> > There is also sporadic documentation in lower-level APIs (AES source
> > and des.pod) that the buffers may overlap.
> >
> > If it's inconsistent then, at the very least, we must document that it
> > is not allowed.
>
> I'd like to argue that EVP is not the place to provide any guarantees about
> partially overlapping buffers. Even though all current ciphers process
> data in ascending address order, we shouldn't assume that there
> won't be one that processes data in reverse order.


I'm afraid that, since we haven't documented it, the world may already have
made that assumption.


> I'd even argue that
> not providing such a guarantee is natural, i.e. can be naturally
> *implied*. It's like gluing wheels to a tablet to make a skateboard and
> then expecting it to work, arguing that nowhere does it say that
> it's not a viable idea. It might work, and apparently did for somebody,
> but you may not *expect* it to, neither as a tablet nor as a skateboard. And
> the tablet manufacturer has no obligation to disclaim it in writing.
>
> I'm not saying that this particular problem can't/won't be addressed,
> though I consider it kind of bad style, because it sets a
> precedent of creating an undesired illusion. BTW, further measurements
> have shown that, unlike the others, Core2 suffers a 20% performance
> regression. Well, one can argue that nobody cares about Core2, but what
> if it were a contemporary processor?



Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-04 Thread Andy Polyakov via RT
> If the other EVP ciphers universally allow this then I think we must treat
> this as a bug, because people may be relying on this behaviour. There is also
> sporadic documentation in lower-level APIs (AES source and des.pod) that the
> buffers may overlap.
> 
> If it's inconsistent then, at the very least, we must document that it is not
> allowed.

I'd like to argue that EVP is not the place to provide any guarantees about
partially overlapping buffers. Even though all current ciphers process
data in ascending address order, we shouldn't assume that there
won't be one that processes data in reverse order. I'd even argue that
not providing such a guarantee is natural, i.e. can be naturally
*implied*. It's like gluing wheels to a tablet to make a skateboard and
then expecting it to work, arguing that nowhere does it say that
it's not a viable idea. It might work, and apparently did for somebody,
but you may not *expect* it to, neither as a tablet nor as a skateboard. And
the tablet manufacturer has no obligation to disclaim it in writing.
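
To make the ordering argument concrete, here is a toy model in Python. It
is a sketch under simplifying assumptions: the "cipher" is a byte-wise XOR
with a fixed key, memory is a bytearray, and xor_copy and trial are
hypothetical names. It shows that ascending processing tolerates out < in,
while descending processing tolerates only out > in.

```python
def xor_copy(mem, in_off, out_off, length, key, descending=False):
    """XOR length bytes from in_off to out_off, one byte at a time."""
    indices = range(length - 1, -1, -1) if descending else range(length)
    for i in indices:
        mem[out_off + i] = mem[in_off + i] ^ key


def trial(in_off, out_off, descending):
    """Return True if the overlapping operation still yields the right output."""
    data = bytes(range(1, 33))  # 32 bytes of test input
    mem = bytearray(64)
    mem[in_off:in_off + 32] = data
    xor_copy(mem, in_off, out_off, 32, 0x5A, descending)
    return bytes(mem[out_off:out_off + 32]) == bytes(b ^ 0x5A for b in data)


print("ascending,  out < in:", trial(8, 4, descending=False))
print("descending, out < in:", trial(8, 4, descending=True))
print("ascending,  out > in:", trial(4, 8, descending=False))
print("descending, out > in:", trial(4, 8, descending=True))
```

A caller who relies on out < in working is therefore relying on the
processing direction, which is an implementation detail unless it is
documented.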

I'm not saying that this particular problem can't/won't be addressed,
though I consider it kind of bad style, because it sets a
precedent of creating an undesired illusion. BTW, further measurements
have shown that, unlike the others, Core2 suffers a 20% performance
regression. Well, one can argue that nobody cares about Core2, but what
if it were a contemporary processor?




Re: [openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-01 Thread Andy Polyakov
> I'm unclear on what EVP_CIPHER's interface guarantees are, but our EVP_AEAD
> APIs are documented to allow in/out buffers to alias as long as out is <=
> in. This matches what callers might expect from a naive implementation.
> 
> Our AES-GCM EVP_AEADs, which share code with OpenSSL, have tended to match
> this pattern too. For ChaCha, of chacha-{x86,x86_64,armv4,armv8}.pl and the
> C implementation, all seem to satisfy this (though it's possible I don't have
> complete coverage) except for chacha-x86.pl. That one works if in == out,
> but not if out is slightly behind.
> 
> We were able to reproduce problems when in = out + 1. The SSE3 code
> triggers the problem if the input is at least 256 bytes and the non-SSE3
> code if the input is at least 64 bytes. The non-SSE3 failure occurs because
> the words in a block are processed in a slightly funny order (0, 4, 8, 9,
> 12, 14, 1, 2, 3, 5, 6, 7, 10, 11, 13, 15). I haven't looked at the SSE3
> case carefully, but I expect it's something similar.

It's in 16-byte chunks numbered 0,4,8,12, 1,5,9,13, 2,6,...

> Could the blocks perhaps be processed in a more straight-forward ordering,
> so that chacha-x86.pl behaves like the other implementations? (It's nice to
> avoid bugs that only trigger in one implementation.) Or is this order
> necessary for something?

It's the order in which the number of references to memory is minimal. But
double-check the attached patch.


diff --git a/crypto/chacha/asm/chacha-x86.pl b/crypto/chacha/asm/chacha-x86.pl
index 850c917..986e7f7 100755
--- a/crypto/chacha/asm/chacha-x86.pl
+++ b/crypto/chacha/asm/chacha-x86.pl
@@ -19,13 +19,13 @@
 # P4           18.6/+84%
 # Core2        9.56/+89%   4.83
 # Westmere     9.50/+45%   3.35
-# Sandy Bridge 10.5/+47%   3.20
-# Haswell      8.15/+50%   2.83
-# Silvermont   17.4/+36%   8.35
+# Sandy Bridge 10.7/+47%   3.24
+# Haswell      8.22/+50%   2.89
+# Silvermont   17.8/+36%   8.53
 # Sledgehammer 10.2/+54%
-# Bulldozer    13.4/+50%   4.38(*)
+# Bulldozer    13.5/+50%   4.39(*)
 #
-# (*)  Bulldozer actually executes 4xXOP code path that delivers 3.55;
+# (*)  Bulldozer actually executes 4xXOP code path that delivers 3.50;
 
 $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
 push(@INC,"${dir}","${dir}../../perlasm");
@@ -238,18 +238,20 @@ if ($xmm) {
 
    &xor($a, &DWP(4*0,$b)); # xor with input
    &xor($b_,&DWP(4*4,$b));
-   &mov(&DWP(4*0,"esp"),$a);
+   &mov(&DWP(4*0,"esp"),$a);   # off-load for later write
    &mov($a,&wparam(0));        # load output pointer
    &xor($c, &DWP(4*8,$b));
    &xor($c_,&DWP(4*9,$b));
    &xor($d, &DWP(4*12,$b));
    &xor($d_,&DWP(4*14,$b));
-   &mov(&DWP(4*4,$a),$b_); # write output
-   &mov(&DWP(4*8,$a),$c);
-   &mov(&DWP(4*9,$a),$c_);
-   &mov(&DWP(4*12,$a),$d);
-   &mov(&DWP(4*14,$a),$d_);
+   &mov(&DWP(4*4,"esp"),$b_);
+   &mov($b_,&DWP(4*0,"esp"));
+   &mov(&DWP(4*8,"esp"),$c);
+   &mov(&DWP(4*9,"esp"),$c_);
+   &mov(&DWP(4*12,"esp"),$d);
+   &mov(&DWP(4*14,"esp"),$d_);
 
+   &mov(&DWP(4*0,$a),$b_); # write output in order
    &mov($b_,&DWP(4*1,"esp"));
    &mov($c, &DWP(4*2,"esp"));
    &mov($c_,&DWP(4*3,"esp"));
@@ -266,35 +268,45 @@ if ($xmm) {
    &xor($d, &DWP(4*5,$b));
    &xor($d_,&DWP(4*6,$b));
    &mov(&DWP(4*1,$a),$b_);
+   &mov($b_,&DWP(4*4,"esp"));
    &mov(&DWP(4*2,$a),$c);
    &mov(&DWP(4*3,$a),$c_);
+   &mov(&DWP(4*4,$a),$b_);
    &mov(&DWP(4*5,$a),$d);
    &mov(&DWP(4*6,$a),$d_);
 
-   &mov($b_,&DWP(4*7,"esp"));
-   &mov($c, &DWP(4*10,"esp"));
+   &mov($c,&DWP(4*7,"esp"));
+   &mov($d,&DWP(4*8,"esp"));
+   &mov($d_,&DWP(4*9,"esp"));
+   &add($c,&DWP(64+4*7,"esp"));
+   &mov($b_, &DWP(4*10,"esp"));
+   &xor($c,&DWP(4*7,$b));
    &mov($c_,&DWP(4*11,"esp"));
+   &mov(&DWP(4*7,$a),$c);
+   &mov(&DWP(4*8,$a),$d);
+   &mov(&DWP(4*9,$a),$d_);
+
+   &add($b_, &DWP(64+4*10,"esp"));
+   &add($c_,&DWP(64+4*11,"esp"));
+   &xor($b_, &DWP(4*10,$b));
+   &xor($c_,&DWP(4*11,$b));
+   &mov(&DWP(4*10,$a),$b_);
+   &mov(&DWP(4*11,$a),$c_);
+
+   &mov($c,&DWP(4*12,"esp"));
+   &mov($c_,&DWP(4*14,"esp"));
    &mov($d, &DWP(4*13,"esp"));
    &mov($d_,&DWP(4*15,"esp"));
-   &add($b_,&DWP(64+4*7,"esp"));
-   &add($c, &DWP(64+4*10,"esp"));
-   &add($c_,&DWP(64+4*11,"esp"));
    &add($d, &DWP(64+4*13,"esp"));
    &add($d_,&DWP(64+4*15,"esp"));
-   &xor($b_,&DWP(4*7,$b));
-   &xor($c, &DWP(4*10,$b));
-   &xor($c_,&DWP(4*11,$b));
    &xor($d, &DWP(4*13,$b));
    &xor($d_,&DWP(4*15,$b));
    &le

[openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-01 Thread Emilia Käsper via RT
If the other EVP ciphers universally allow this then I think we must treat this
as a bug, because people may be relying on this behaviour. There is also
sporadic documentation in lower-level APIs (AES source and des.pod) that the
buffers may overlap.

If it's inconsistent then, at the very least, we must document that it is not
allowed.



[openssl-dev] [openssl.org #4362] chacha-x86.pl has stricter aliasing requirements than other files

2016-03-01 Thread David Benjamin via RT
I'm unclear on what EVP_CIPHER's interface guarantees are, but our EVP_AEAD
APIs are documented to allow in/out buffers to alias as long as out is <=
in. This matches what callers might expect from a naive implementation.

Our AES-GCM EVP_AEADs, which share code with OpenSSL, have tended to match
this pattern too. For ChaCha, of chacha-{x86,x86_64,armv4,armv8}.pl and the
C implementation, all seem to satisfy this (though it's possible I don't have
complete coverage) except for chacha-x86.pl. That one works if in == out,
but not if out is slightly behind.

We were able to reproduce problems when in = out + 1. The SSE3 code
triggers the problem if the input is at least 256 bytes and the non-SSE3
code if the input is at least 64 bytes. The non-SSE3 failure occurs because
the words in a block are processed in a slightly funny order (0, 4, 8, 9,
12, 14, 1, 2, 3, 5, 6, 7, 10, 11, 13, 15). I haven't looked at the SSE3
case carefully, but I expect it's something similar.

Could the blocks perhaps be processed in a more straight-forward ordering,
so that chacha-x86.pl behaves like the other implementations? (It's nice to
avoid bugs that only trigger in one implementation.) Or is this order
necessary for something?
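
The effect of the word order can be reproduced with a simplified model. The
Python sketch below is not the assembly: it assumes each 32-bit word is
read, XORed with the keystream and written back immediately, one word at a
time, and xor_block and run are hypothetical names. The keystream is a
made-up byte pattern rather than real ChaCha output. Only the word order is
taken from the report above.

```python
ASCENDING = list(range(16))
# Word order reported above for the non-SSE3 path.
REPORTED = [0, 4, 8, 9, 12, 14, 1, 2, 3, 5, 6, 7, 10, 11, 13, 15]


def xor_block(mem, in_off, out_off, keystream, order):
    """XOR one 64-byte block, word by word, in the given word order."""
    for w in order:
        src = in_off + 4 * w
        dst = out_off + 4 * w
        ks = keystream[4 * w:4 * w + 4]
        mem[dst:dst + 4] = bytes(a ^ b for a, b in zip(mem[src:src + 4], ks))


def run(order, plaintext, keystream):
    """Process one block with in = out + 1 and return the 64 output bytes."""
    mem = bytearray(1) + bytearray(plaintext)  # output at 0, input at 1
    xor_block(mem, 1, 0, keystream, order)
    return bytes(mem[:64])


plaintext = bytes(range(64))
keystream = bytes((7 * i + 3) % 256 for i in range(64))
expected = bytes(p ^ k for p, k in zip(plaintext, keystream))

ascending_ok = run(ASCENDING, plaintext, keystream) == expected
reported_ok = run(REPORTED, plaintext, keystream) == expected
print("ascending order correct:", ascending_ok)
print("reported order correct: ", reported_ok)
```

In this model, writing output word 4 overwrites the last byte of input word
3 before word 3 has been read, so the reported order corrupts the result
while the ascending order does not.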

David
