On 11/20/15 10:22 AM, Stephen Farrell wrote:

(This combines the distributions for Simon's mail to gen-art
and secdir. I think one thread responding to my question(s)
below will work better.)

+1

I was out of my depth reviewing this because I am not familiar with common practice in defining security algorithms. I reviewed this from the perspective of a software engineer and found it very difficult to follow.

So it is probably better if secdir looks at my comments and decides if they have any merit and need further attention.

        Thanks,
        Paul

Hi Simon, all,

Thanks for the reviews and updates.

Does anyone think we need more review for this or is
it now ready for IESG eval? Modulo one question below,
I think it is ready to move forward, but there are a
lot of detailed changes in this revision, so it might
be prudent to try get a few eyeballs on those.

One other thing below...

On 20/11/15 12:29, Simon Josefsson wrote:
Executive Summary: I have submitted
https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-04
that (hopefully) resolve the gen-art and secdir feedback.

Paul Kyzivat <pkyzi...@alum.mit.edu> writes:

This draft is on the right track but has open issues, described in the
review.

Hello Paul.  I am sorry for the delay in answering your thorough (much
appreciated!) review.  Your review was emailed to the wrong
@tools.ietf.org address, but I can't blame that for my delay since I was
notified of your review through the secdir review.

(Issue-1): Intended status

The intended status of this document is Informational. I question why
it is not a normative document. As best I can tell, this is the formal
specification of the algorithm. Those that use it would presumably
want to claim conformance to it. The introduction describes this as an
alternative to other KDF functions, including only one with an RFC
reference - RFC2898. That one is also informational, but it is a
restatement of an algorithm specified elsewhere, so that RFC can be
viewed as an informational supplement to the actual definition. The
same is not true of this document.

(Of course changing this to a normative document would require
significant changes, including adding 2119 language. And it probably
could then not be handled as an AD-sponsored document.)

As I understand it, the IETF tradition is that descriptions of crypto
algorithms that were invented elsewhere are not published as Standards
Tracks documents.  So this is in line with that tradition.

(Issue-2): Type mismatch on call to scryptROMix

The scryptROMix function calls scryptBlockMix with an octet vector of
length 128*r octets. But the definition of scryptBlockMix specifies
that the input argument should be a vector of 2*r 64-octet blocks.

Clearly these don't match. One way to make them match would be to
divide the single 128*r octet vector into two 64-octet vectors, and
then to treat r as 2 inside of scryptBlockMix. I don't know if that is
the intent.

I believe this is a presentation issue.  The intention is indeed that
conversion between these formats is transparent.  In performant
implementations, B will refers to the same memory area.  The document
was confusing here, and I believe the document was further problematic
since it described the inputs as an array of elements, rather than a
concatenation of octets.  It now reads:

    Input:
             B[0] || B[1] || ... || B[2 * r - 1]
                    Input octet string (of size 128 * r octets),
                    treated as 2 * r 64-octet blocks.

    Output:
             B'[0] || B'[1] || ... || B'[2 * r - 1]
                    Output octet string.

I hope this improves this aspect.

(Issue-3): Definition of Integerify

The Integerify function is not clearly defined. The following is given
in the document:

   j = Integerify (X) mod N
       where Integerify (B[0] ... B[2 * r - 1]) is defined
       as the result of interpreting B[2 * r - 1] as a
       little-endian integer.

I can make no sense of this definition of Integerify. The description
implies that B must be an array containing elements up to index
2*r-1. But the definition of B is "Input octet vector of length 128 *
r octets". Taking the definition literally, B[2*r-1] must be an octet,
and 2*r must be less than 128. That seems like nonsense to me.

I found the following in the [SCRYPT] paper:

"We expect that for reasons of performance and simplicity,
implementors will restrict N to being a power of 2, in which case
the function Integerify can be replaced by reading the first (or
last) machine-length word from a k-bit block."

Simply reading a machine-length word ignores the differences between
little-endian and big-endian machines, and machines with different
word sizes. Conveniently, [SALSA20SPEC] defines a littleendian
function that yields a 32-bit integer from four bytes. That should be
sufficient bits for computing "j". So Integerify(X) could be defined
as:

    littleendian(X[0],X[1],X[2],X[3])

or

    littleendian(X[128*r-4],X[128*r-3],X[128*r-2],X[128*r-1])

(I don't think it matters which, as long as everyone does it the same way.)

In any case, the language is ambiguous and needs to be clarified.

Right.  I have changed this into:

           j = Integerify (X) mod N
               where Integerify (X) is defined as the result of
               interpreting the last four octets of X as a little-
               endian integer, i.e.:
                   littleendian(X[128*r-4], X[128*r-3],
                                X[128*r-2], X[128*r-1])

-------------
Minor issues:

(Issue-4): Identifiers reused for different meanings

In scrypt, "B" is an array of "p" vectors, each of which is 128*r
octets. In scryptROMix, "B" is a single vector of 128*r octets. In
scryptBlockMix, "B" is a vector of 2*r 64-octet blocks.

I'm not sure that changing variable names here is a good idea.  For any
performant implementation, these variables will refer to the same memory
area and thus a consistent variable name helps to signal that.  It is
just the interpretation of that memory area that is different in these
functions.

In both scrypt and scryptROMix "r" is the same block size
parameter. But in scryptROMix it is only used in the (broken)
definition of Integerify.

It is used in the variable descriptions too, to indicate length of the
variables.

In scryptBlockMix "r" is (apparently, if I have figured things out)
always 2, and unrelated to the other "r".

No, r is the same throughout.

The document would be clearer if distinct identifiers were used for
each unique concept.

I believe the identifier refers to unique concepts.  For B, what differs
is how that memory area is interpreted in each algorithm description.

Can you think of some way to make this more clear, if the changes I've
made now aren't sufficient?  I believe some changes I have made already
make this somewhat clearer though.

For those identifiers whose value is intended to be constant and
common across all the functions (such as "N"), it would be better to
define them once, globally.

(Issue-5): Confusing/misleading names/definitions of identifiers

The "Block size" parameter ("r") does not denote the size of a
block. It is a factor in the size of blocks, varying from function to
function. Exactly what concept it denotes, and how one would choose
it, isn't clear to me.

The definition of the "N" parameter (CPU/Memory cost parameter) isn't
especially clear. It appears that increasing N increases the cost both
of CPU and memory. But the "p" (parallelization) parameter acts as a
multiplier on N, also increasing the cost. It is far from clear how
one would choose appropriate values for N and p. For a given value of
N*p, is it better for N to be large, or p to be large?

I suggest that more thought be given to what these things mean in the
context of this application, and then choose identifier names and
descriptions accordingly. It may be better to refactor these some
other way.

This came from the secdir review as well -- I have added a section
"Scrypt parameters" to discuss this.

2.  Scrypt Parameters

    The scrypt function takes several parameters.  The passphrase P is
    typically a human-chosen password.  The salt is normally uniquely and
    randomly generated [RFC4086].  The parameter r ("blockSize") specify
    the block size.  The CPU/Memory cost parameter N ("costParameter")
    must be larger than 1, a power of 2 and less than 2^(128 * r / 8).
    The parallelization parameter p ("parallelizationParameter"), a
    positive integer less than or equal to ((2^32-1) * 32) / (128 * r).
    The intended output length dkLen in octets of the derived key
    ("keyLength"); a positive integer less than or equal to (2^32 - 1) *
    32.

    Users of scrypt can tune the parameters N, r, and p according to the
    amount of memory and computing power available, the latency-bandwidth
    product of the memory subsystem, and the amount of parallelism
    desired.  At the current time, taking r=8 and p=1 appears to yield
    good results, but as memory latency and CPU parallelism increase it
    is likely that the optimum values for both r and p will increase.
    Note also that since the computations of SMix are independent, a
    large value of p can be used to increase the computational cost of
    scrypt without increasing the memory usage; so we can expect scrypt
    to remain useful even if the growth rates of CPU power and memory
    capacity diverge.

I think that new text will almost certainly generate debate in
the IESG. You introduce a whole bunch of parameters but only
give recommended values for two. You will for sure be asked
what's good for the others. So.... what's good for the others
and why not include that in the draft?

Cheers,
S.



The ASN.1 in section 6 assigns names to several of these
identifiers. It would be helpful to readers if the names used in
defining the algorithms were also the names used here.

I've added these names to the "Scrypt Parameters" section above.

(Issue-6): Dubious stability of references

I looked for prior discussion of this draft, and found some on the
saag mailing list regarding the references.

The definition of the Salsa20 hash function in
http://cr.yp.to/snuffle/spec.pdf seems clear enough, but is the
document reference stable? It might be safer to replicate the
definition in this document (in an appendix) with attribution. It
doesn't appear that there is any copyright in the referenced
document.

I have included the brief C snippet that defines the function in the
section, which hopefully is sufficient clear for implementers (together
with the already included test vector) to transpose it into something
working in any language.  This allowed me to move those two references
to informative ones.

I'll also note that call to this hash function in scryptBlockMix is to
"Salsa", not Salso20. It ought to be consistent with the definition.

This is explained in the section:

  Below, Salsa(T) corresponds to the Salsa20/8 Core function applied to
    the octet vector T.

------------------------
Nits/editorial comments:

(Issue-7): IdNits reported errors

IdNits reports:

-- Obsolete informational reference (is this intentional?): RFC 5208
      (Obsoleted by RFC 5958)

This is triggered by the following in section 6:

"To be usable in PKCS#8 [RFC5208] and Asymmetric Key Packages
[RFC5958] the following extension of the PBES2-KDFs type is needed."

Since RFC5958 is referenced, and it obsoletes RFC5208, what is the
reason for referencing both?

The reason was that a lot of protocols refers to 'PKCS#8' and RFC 5208
is the IETF-canonical reference for that term, and a way to illustrate
that the scrypt ASN.1 schema is compliant with "old" PKCS#8.  RFC 5958
obsoletes RFC 5208 but the identifier 'PKCS#8' was lost in the process.

I'm fine with dropping 5208 as a reference, but I do believe it provides
slightly more value to the reader.

(There is also an error about 2119, but it is bogus, triggered by the
use of OPTIONAL in the ASN.1.)

This completes my review.

Thank you!

/Simon



_______________________________________________
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art

Reply via email to