Hi,
Here's the promised follow-up to my earlier memory-mapped-files
patch.
It enables simple-md5 to digest big files: a file that is bigger
than what the C implementation can digest in one call is split
into chunks, and the update recurses over them.
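To illustrate the idea outside of Scheme, here is a rough sketch in plain C. It is not the code from the patch: `toy_ctx`, `toy_update` and `chunked_update` are names made up for this example, and `toy_update` is only a stand-in for `MD5Update` (it sums bytes instead of hashing them). The assumption is that the update function takes its length as an `unsigned int`, so one call can only cover a limited range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct MD5Context: a running byte sum
   plus a counter of how many update calls were made. */
struct toy_ctx {
    uint64_t sum;
    unsigned calls;
};

/* Stand-in for MD5Update (assumption: length is an unsigned int,
   so a single call cannot cover an arbitrarily large buffer). */
static void toy_update(struct toy_ctx *ctx, const unsigned char *buf,
                       unsigned int len)
{
    for (unsigned int i = 0; i < len; i++)
        ctx->sum += buf[i];
    ctx->calls++;
}

/* Feed `total` bytes to toy_update in pieces of at most `max_chunk`
   bytes, advancing the pointer after each piece. */
static void chunked_update(struct toy_ctx *ctx, const unsigned char *buf,
                           size_t total, unsigned int max_chunk)
{
    assert(max_chunk > 0);
    while (total > 0) {
        unsigned int n = total < max_chunk ? (unsigned int)total : max_chunk;
        toy_update(ctx, buf, n);
        buf += n;
        total -= n;
    }
}
```

The patch does the same thing recursively with `pointer+` and uses INT_MAX as the chunk limit. The sketch takes the limit as a parameter only so that it can be tried with small buffers.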
Two notes:
1. This implementation is really only useful for POSIX
environments. The alternative `mapped-pointer` implementation
for Windows means, as far as I understand, that the Windows
version is limited by physical memory. I don't have a Windows
machine, so someone who does should test whether the
divergence is needed. Nominally `memory-mapped-files`
implements an mmap equivalent for Windows, so it should be
possible to just use the POSIX code unconditionally.
Putting the `mapped-pointer` call inside the recursion,
mapping the individual chunks, would work around this as
well, but would result in an unnecessary number of mmap
calls on POSIX systems.
2. I hope I got the types right. I got a bit confused because
`unsigned-integer` seems to accept values only up to INT_MAX,
not up to UINT_MAX, which wasn't clear to me from the
documentation here:
http://wiki.call-cc.org/man/5/Foreign%20type%20specifiers#integers
I'll take a look at simple-sha1 in a couple of days. I suspect it
would benefit from a similar patch, but I haven't checked that
yet. Hope this is useful to some people.
~~Lou
Index: simple-md5.scm
===================================================================
--- simple-md5.scm (revision 43989)
+++ simple-md5.scm (working copy)
@@ -6,16 +6,18 @@
(import scheme (chicken base) (chicken blob) (chicken file)
(chicken foreign) (chicken file posix) (chicken fixnum)
- memory-mapped-files)
+ (chicken memory) memory-mapped-files)
(foreign-declare "#include \"md5-base.c\"")
+(foreign-declare "#include <limits.h>")
(define digest-length (foreign-value "MD5_DIGEST_SIZE" unsigned-int))
(define context-size (foreign-value "sizeof(struct MD5Context)" unsigned-int))
+(define chunk-size (foreign-value "INT_MAX" unsigned-integer))
(define init (foreign-lambda void MD5Init scheme-pointer))
-(define update (foreign-lambda void MD5Update scheme-pointer scheme-pointer unsigned-int))
-(define raw-update (foreign-lambda void MD5Update scheme-pointer c-pointer unsigned-int))
+(define update (foreign-lambda void MD5Update scheme-pointer scheme-pointer unsigned-integer))
+(define raw-update (foreign-lambda void MD5Update scheme-pointer c-pointer unsigned-integer))
(define final (foreign-lambda void MD5Final scheme-pointer scheme-pointer))
(define (char->hexdigits c)
@@ -52,6 +54,11 @@
(ptr (memory-mapped-file-pointer mmap)))
(k ptr (cut unmap-file-from-memory mmap))))))
+(define (chunk-update ctxt buffer fsize)
+ (unless (zero? fsize)
+ (let ((size (min fsize chunk-size)))
+ (raw-update ctxt buffer size)
+ (chunk-update ctxt (pointer+ buffer size) (- fsize size)))))
(define (file-md5sum fname)
(and (file-exists? fname)
@@ -65,7 +72,7 @@
(mapped-pointer
fname fd fsize
(lambda (buffer cleanup)
- (raw-update ctxt buffer fsize)
+ (chunk-update ctxt buffer fsize)
(cleanup))))
(final ctxt digest)
(file-close fd)