On 6/29/20 4:08 PM, Graham Leggett wrote:
> On 29 Jun 2020, at 11:41, Ruediger Pluem <rpl...@apache.org 
> <mailto:rpl...@apache.org>> wrote:
> 
>>> +        etag = apr_palloc(r->pool, weak_len + sizeof("\"\"") +
>>> +                4*(APR_SHA1_DIGESTSIZE/3) + vlv_len + 4);
>>
>> Using apr_base64_encode_len in the formula above would make it easier to 
>> understand and read IMHO.
> 
> It would also mean it could no longer be optimised to a constant. What side 
> do we want to fall on?

Fair enough. The above is faster. Nevertheless I would prefer to group that 
better. Probably a separate define
SHA1_DIGEST_BASE64_LEN that only defines the base64 length of the base64 encode 
SHA1 digest. This would mean
more calc work for the compiler and not during runtime, but I guess this is 
acceptable.

> 
>>> +
>>> +        apr_sha1_init(&context);
>>> +        nbytes = sizeof(buf);
>>> +        while (apr_file_read(fd, buf, &nbytes) == APR_SUCCESS) {
>>
>> I assume that the code has been taken from ap_md5digest, but
>> if we have MMAP available and if we have not disabled MMAP we use MMAP to 
>> read the contents of the file
>> if sendfile is not possible (e.g. SSL connections).
>> Wouldn't using MMAP more performant here especially in case of larger files?
> 
> It makes sense, but I would want to change something like this separately.

Makes sense.
Do you see a possibility to merge this code and the one of ap_md5digest to a 
more generic procedure that
allows to choose the digest algorithm while using 'MMAPED' reads?
BTW: Is sha1 mandatory for strong etags? If not wouldn't MD5 be enough and if 
MD5 is seen as too insecure
why isn't sha1?

>>> +            apr_sha1_update_binary(&context, buf, nbytes);
>>> +            nbytes = sizeof(buf);
>>> +        }
>>> +        apr_file_seek(fd, APR_SET, &offset);
>>
>> Why do we always reset the file pointer to 0? Why don't we get the actual 
>> file pointer of fd before we do the reading
>> and restore it afterwards?
>
> I am guessing that the why is lost in the mists of time. As a guess, it 
> avoids an additional call.
>
> Using the actual file pointer is cleaner, as there is a guarantee that the 
> function does not have any side effects.
>
> http://svn.apache.org/viewvc?rev=1879332&view=rev

Thanks. Another option that just came to mind would it be also fine in case 
that we get an fd passed to do
an apr_file_dup() on that descriptor such that we also do not touch the actual 
buffering (if enabled) of the used fd?
Furthermore don't we need to do a seek to 0 before we start reading? Who tells 
us that the filepointer is at 0 for a passed fd?


Regards

Rüdiger

Reply via email to