To see whether Clojure is a practical language for byte-level work (parsing files, network streams, etc.), I wrote a trivial function that iterates through a buffer of bytes and counts the newlines it sees. For comparison I've written a C version, a Java version, and a Clojure version. Each routine runs 10 times over a 16MB buffer read from /dev/urandom, and the buffer is refilled before each call to the newline-counting function.

Timings per 16MB buffer:

  gcc -O0:                  about 80ms
  gcc -O3:                  about 14ms
  javac (and java -server): about 20ms
  Clojure:                  about 105ms (after the JVM warms up)

I'm guessing that the big boost Java and gcc -O3 get comes from converting per-byte operations to per-int ops; at least, a ~4x boost looks like it would come from something like that. Is that an optimization that is unavailable to Clojure? The java_interop doc makes it sound like Java and Clojure end up with the same bytecode when areduce is used correctly, so maybe there's something I could be doing better.

Here are my small programs; if somebody could suggest improvements, I'd appreciate them.

iterate.clj:

(set! *warn-on-reflection* true)
(import java.io.FileInputStream)

(def *numbytes* (* 16 1024 1024))

(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
    (areduce buf idx count 0
             (if (= (aget buf idx) nl)
               (inc count)
               count))))

(let [ifs (FileInputStream. "/dev/urandom")
      buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
    (let [sz (.read ifs buf)]
      (println "Wanted" *numbytes* "got" sz "bytes")
      (let [count (time (countnl buf))]
        (println "Got" count "nls")))))


Iterate.java:

import java.io.FileInputStream;

class Iterate
{
    static final int NUMBYTES = 16*1024*1024;

    static int countnl(byte[] buf)
    {
        int count = 0;
        for(int i = 0; i < buf.length; i++) {
            if(buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    public static final void main(String[] args) throws Throwable
    {
        FileInputStream input = new FileInputStream("/dev/urandom");
        byte[] buf = new byte[NUMBYTES];
        int sz;
        long start, end;

        for(int i = 0; i < 10; i++) {
            sz = input.read(buf);
            System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
            start = System.currentTimeMillis();
            int count = countnl(buf);
            end = System.currentTimeMillis();
            System.out.println("counted " + count + " nls in " + (end-start) + " msec");
        }
        input.close();
    }
}


iterate.c:

#include<sys/types.h>
#include<sys/stat.h>
#include<sys/time.h>
#include<stdlib.h>
#include<unistd.h>
#include<stdio.h>
#include<fcntl.h>

int countnl(char *buf, int sz)
{
    int i;
    int count = 0;
    for(i = 0; i < sz; i++) {
        if(buf[i] == '\n') {
            count++;
        }
    }
    return count;
}

int main()
{
    int fd = open("/dev/urandom", O_RDONLY);
    const int NUMBYTES = 16*1024*1024;
    char *buf = (char*)malloc(NUMBYTES);
    int sz;
    struct timeval start, end;
    int i;

    for(i = 0; i < 10; i++) {
        sz = read(fd, buf, NUMBYTES);
        printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
        gettimeofday(&start, 0);
        int count = countnl(buf, sz);
        gettimeofday(&end, 0);
        printf("counted %d nls in %f msec\n", count,
               (float)(end.tv_sec-start.tv_sec)*1e3 + (end.tv_usec-start.tv_usec)/1e3);
    }
    free(buf);
    close(fd);
    return 0;
}
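
To make the "per-int ops" guess concrete, here is a rough sketch of what counting newlines several bytes at a time could look like in Java. I have not benchmarked it, and I don't know whether gcc -O3 or the JVM's JIT actually do anything like this. The class and method names (SwarCount, countNlWordwise, countNlBytewise) are made up for illustration. It reads 8 bytes as one long, XORs with a word of 0x0A bytes so that newlines become zero bytes, then counts the zero bytes with a carry-free bit trick.

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical illustration only; none of these names come from the programs above.
class SwarCount {
    static final long LOW7     = 0x7F7F7F7F7F7F7F7FL; // low 7 bits of every byte
    static final long NEWLINES = 0x0A0A0A0A0A0A0A0AL; // '\n' repeated in every byte

    // Reference version: one compare per byte, same loop as Iterate.countnl.
    static int countNlBytewise(byte[] buf) {
        int count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Word-at-a-time version: handles 8 bytes per loop iteration.
    static int countNlWordwise(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf);
        int count = 0;
        int i = 0;
        for (; i + 8 <= buf.length; i += 8) {
            long x = bb.getLong(i) ^ NEWLINES;   // newline bytes become 0x00
            long t = (x & LOW7) + LOW7;          // high bit set iff low 7 bits nonzero (no carry between bytes)
            long zeroMask = ~(t | x | LOW7);     // 0x80 in exactly the bytes that were zero
            count += Long.bitCount(zeroMask);    // one set bit per newline
        }
        // Leftover bytes (fewer than 8) are handled one at a time.
        for (; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[1000003];          // odd size, so the tail loop runs too
        new Random(42).nextBytes(buf);
        int a = countNlBytewise(buf);
        int b = countNlWordwise(buf);
        System.out.println("bytewise=" + a + " wordwise=" + b + " match=" + (a == b));
    }
}
```

Byte order does not matter here because only the number of matching bytes is counted, not their positions. Whether this is any faster than the plain loop on a given JVM is something that would have to be measured.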