Hi,

Here's the promised follow-up to my earlier memory-mapped-files
patch.

This patch enables simple-md5 to digest big files: a file that is
bigger than what the C implementation can digest in a single call
is split into chunks, and the update recurses over them.
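
To make the splitting concrete, here is a minimal sketch in plain
Scheme with no FFI involved. `chunk-sizes` is a hypothetical helper
for illustration only and is not part of the patch; it assumes a
positive `limit`.

```scheme
;; Hypothetical helper, not part of the patch: list the sizes of the
;; chunks that `total` bytes are split into when a single call may
;; take at most `limit` bytes. Assumes limit > 0.
(define (chunk-sizes total limit)
  (if (zero? total)
      '()
      (let ((size (min total limit)))
        (cons size (chunk-sizes (- total size) limit)))))

(chunk-sizes 10 4) ; => (4 4 2)
```

In the patch the limit is INT_MAX and each chunk is passed to
`raw-update` instead of being collected in a list.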

Two notes:

 1. This implementation is really only useful for posix
    environments. The alternative `mapped-pointer` implementation
    for Windows means, as far as I understand, that the Windows
    version is limited by physical memory. I don't have a Windows
    machine, so someone who does should test whether the
    divergence is needed. Nominally `memory-mapped-files`
    implements a mmap equivalent for Windows, so it should be
    possible to just use the posix code unconditionally.

    Putting the `mapped-pointer` call inside the recursion,
    mapping the individual chunks, would work around this as
    well, but would result in an unnecessary number of mmap calls
    on posix systems.

 2. I hope I got the types right. I got a bit confused because
    `unsigned-integer` seems to only accept values up to INT_MAX,
    not up to UINT_MAX, which wasn't clear to me from the
    documentation here:
    http://wiki.call-cc.org/man/5/Foreign%20type%20specifiers#integers
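
The alternative from note 1 (mapping each chunk separately instead
of mapping the whole file once) could look roughly like the sketch
below. It is untested and rests on assumptions: it calls
`map-file-to-memory` from the `memory-mapped-files` egg directly
with an offset rather than going through `mapped-pointer`, the
names `chunk-map-update` and `mapped-chunk-size` are hypothetical,
and the update procedure is passed in as an argument (in
simple-md5.scm that would be `raw-update`). Because mmap offsets
have to be page-aligned, the chunk size here is a multiple of the
page size rather than INT_MAX.

```scheme
(import scheme (chicken base) memory-mapped-files)

;; Hypothetical chunk size: 256 MiB, a multiple of common page sizes,
;; so every offset passed to mmap stays page-aligned.
(define mapped-chunk-size (* 256 1024 1024))

;; Map, digest and unmap one chunk at a time.
;; `update` is called as (update ctxt pointer size).
(define (chunk-map-update update ctxt fd fsize)
  (let loop ((offset 0))
    (when (< offset fsize)
      (let* ((size (min mapped-chunk-size (- fsize offset)))
             (mmap (map-file-to-memory #f size prot/read map/shared
                                       fd offset)))
        (update ctxt (memory-mapped-file-pointer mmap) size)
        (unmap-file-from-memory mmap)
        (loop (+ offset size))))))
```

This keeps at most one chunk mapped at a time, at the cost of one
mmap/munmap pair per chunk, which is the overhead note 1 refers to.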

I'll take a look at simple-sha1 in a couple of days. I suspect it
would benefit from a similar patch, but I haven't checked yet.
I hope this is useful to some people.

~~Lou

Index: simple-md5.scm
===================================================================
--- simple-md5.scm	(revision 43989)
+++ simple-md5.scm	(working copy)
@@ -6,16 +6,18 @@
 
 (import scheme (chicken base) (chicken blob) (chicken file)
         (chicken foreign) (chicken file posix) (chicken fixnum)
-        memory-mapped-files)
+        (chicken memory) memory-mapped-files)
 
 (foreign-declare "#include \"md5-base.c\"")
+(foreign-declare "#include <limits.h>")
 
 (define digest-length (foreign-value "MD5_DIGEST_SIZE" unsigned-int))
 (define context-size (foreign-value "sizeof(struct MD5Context)" unsigned-int))
+(define chunk-size (foreign-value "INT_MAX" unsigned-integer))
 
 (define init (foreign-lambda void MD5Init scheme-pointer))
-(define update (foreign-lambda void MD5Update scheme-pointer scheme-pointer unsigned-int))
-(define raw-update (foreign-lambda void MD5Update scheme-pointer c-pointer unsigned-int))
+(define update (foreign-lambda void MD5Update scheme-pointer scheme-pointer unsigned-integer))
+(define raw-update (foreign-lambda void MD5Update scheme-pointer c-pointer unsigned-integer))
 (define final (foreign-lambda void MD5Final scheme-pointer scheme-pointer))
 
 (define (char->hexdigits c)
@@ -52,6 +54,11 @@
 	    (ptr (memory-mapped-file-pointer mmap)))
        (k ptr (cut unmap-file-from-memory mmap))))))
 
+(define (chunk-update ctxt buffer fsize)
+  (unless (zero? fsize)
+    (let ((size (min fsize chunk-size)))
+      (raw-update ctxt buffer size)
+      (chunk-update ctxt (pointer+ buffer size) (- fsize size)))))
 
 (define (file-md5sum fname)
   (and (file-exists? fname)
@@ -65,7 +72,7 @@
 	   (mapped-pointer
 	    fname fd fsize
 	    (lambda (buffer cleanup)
-	      (raw-update ctxt buffer fsize)
+              (chunk-update ctxt buffer fsize)
 	      (cleanup))))
 	 (final ctxt digest)
 	 (file-close fd)
