On Mon, Dec 16, 2024 at 01:19:06AM +0100, lou wrote: > Hi, > > Here's the promised follow-up to my earlier memory-mapped-files > patch.
Hi Lou, Thanks for the patch! > Enabling simple-md5 to digest big files by splitting files that > are bigger than what the C implementation can digest into chunks > and recuring over them. Yeah that's definitely a missing bit from the original egg. > Two notes: > > 1. This implementation is really only useful for posix > environments. The alternative `mapped-pointer` implementation > for Windows means, as far as I understand, that the Windows > version is limited by physical memory. I *really* don't remember where that code came from. Possibly from Felix's simple-sha1 egg. This egg is basically a copy of that but with sha1 replaced with md5. On that note... > 2. I hope I got the types right. I got a bit confused because > `unsigned-integer` seems to only accept values up to INT_MAX > not up to UINT_MAX, which wasn't clear to me from the > documentation here: > http://wiki.call-cc.org/man/5/Foreign%20type%20specifiers#integers Yeah, I had a look at that. The unsigned-integer type can definitely hold UINT_MAX, but the problem is that you're passing it to pointer+ which is defined to take "integer", which maps to the "int" type in C, resulting in a loss of precision. Perhaps this should be changed to offset_t, but we don't have a foreign type for that (yet). We could of course abuse ssize_t... Anyway, more to the point: if you look at these changes: https://bugs.call-cc.org/changeset/43992/project you'll see that I reverted part of your changes. The reason is that IMO it's better to keep using "unsigned-int", as that will always use fixnums, which are faster to decode than arbitrary integers, (as INT_MAX will typically be converted to a bignum). I've used most-positive-fixnum as the largest representable fixnum instead of pulling in limits.h. Not that it matters a great deal, because the loop should typically not be executed that often, and the main procedure should not be called that often (and it might actually be faster to use INT_MAX for large files), but I think simpler and less fussy code is always good. I got some compilation warnings on simple-md5 for CHICKEN 6 due to using the u8vector / bytevector types for the "context" structure. Instead, I reverted that to the original (nonnull-)scheme-pointer so that the warnings go away. This type works for bytevectors. @Felix: I think the manual can improve the wording here a bit. It states: "scheme-pointer is typically used to get a pointer to the raw byte content of bytevectors. But if you pass in a SRFI-4 vector, you will get a pointer to a bytevector object header (not the bytevectors's contents), which is almost certainly wrong. Instead, convert to a bytevector beforehand, or use a SRFI-4 specific type." This is a bit misleading/confusing - you can pass u8vector because that's equivalent to bytevector and directly encoded. Only the other srfi-4 types have wrapper objects. In the case of this egg, using scheme-pointer is better because it is equivalent to "void *" on the C level, which gets rid of the type warnings you'd get if using "char *" for the "struct MD5Context *" arguments to the MD5Update function. > I'll take a look at simple-sha1 in a couple of days, I suspect it > would benefit from a similar patch but I haven't checked that out > yet. Hope this is useful to some people. Yeah, it should be the same, as that's the "parent" egg. Cheers, Peter
