To see whether Clojure is a practical language for byte-level work
(parsing files, network streams, etc.), I wrote a trivial function
that iterates over a buffer of bytes and counts the newlines it sees.
For testing, I've written a C version, a Java version, and a Clojure
version.  Each one runs the routine 10 times over a 16MB buffer read
from /dev/urandom (the buffer is refilled before each call to the
newline-counting function).  Timings per 16MB buffer:

  gcc -O0:                           ~80ms
  gcc -O3:                           ~14ms
  javac (run with java -server):     ~20ms
  Clojure (after the JVM warms up):  ~105ms

I'm guessing that the big boost Java and gcc -O3 get comes from
converting per-byte operations into per-int operations; the ~4x
difference at least looks like what that would produce.  Is that an
optimization that is unavailable to Clojure?  The java_interop doc
makes it sound like Java and Clojure end up with the same bytecode
when areduce is used correctly, so maybe there's something I could be
doing better.  Here are my small programs; if anybody can suggest
improvements, I'd appreciate it.

iterate.clj:

(set! *warn-on-reflection* true)
(import java.io.FileInputStream)

(def *numbytes* (* 16 1024 1024))

(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
    (areduce buf idx count 0
             (if (= (aget buf idx) nl)
               (inc count)
               count))))

(let [ifs (FileInputStream. "/dev/urandom")
      buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
    (let [sz (.read ifs buf)]
      (println "Wanted" *numbytes* "got" sz "bytes")
      (let [count (time (countnl buf))]
        (println "Got" count "nls")))))


Iterate.java:

import java.io.FileInputStream;

class Iterate
{
  static final int NUMBYTES = 16*1024*1024;

  static int countnl(byte[] buf)
  {
    int count = 0;
    for(int i = 0; i < buf.length; i++) {
      if(buf[i] == '\n') {
        count++;
      }
    }
    return count;
  }

  public static final void main(String[] args)
    throws Throwable
  {
    FileInputStream input = new FileInputStream("/dev/urandom");
    byte[] buf = new byte[NUMBYTES];
    int sz;
    long start, end;

    for(int i = 0; i < 10; i++) {
      sz = input.read(buf);
      System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
      start = System.currentTimeMillis();
      int count = countnl(buf);
      end = System.currentTimeMillis();
      System.out.println("counted " + count + " nls in " +
          (end-start) + " msec");
    }

    input.close();
  }
}

iterate.c:

#include<sys/types.h>
#include<sys/stat.h>
#include<sys/time.h>
#include<stdlib.h>
#include<unistd.h>
#include<stdio.h>
#include<fcntl.h>

int countnl(char *buf, int sz)
{
  int i;
  int count = 0;
  for(i = 0; i < sz; i++) {
    if(buf[i] == '\n') {
      count++;
    }
  }
  return count;
}

int main()
{
  int fd = open("/dev/urandom", O_RDONLY);
  const int NUMBYTES = 16*1024*1024;
  char *buf = (char*)malloc(NUMBYTES);

  int sz;
  struct timeval start, end;

  int i;
  for(i = 0; i < 10; i++) {
    sz = read(fd, buf, NUMBYTES);
    printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
    gettimeofday(&start, 0);
    int count = countnl(buf, sz);
    gettimeofday(&end, 0);
    printf("counted %d nls in %f msec\n", count,
        (float)(end.tv_sec-start.tv_sec)*1e3 + (end.tv_usec-start.tv_usec)/1e3);
  }

  free(buf);
  close(fd);
  return 0;
}
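
To make the per-int guess from the opening paragraph concrete, here is
a minimal Java sketch of counting newlines eight bytes at a time.  It
only illustrates the idea; nothing here shows that HotSpot or gcc -O3
actually performs this transformation.  The class name NewlineSwar and
its method names are made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original programs.
class NewlineSwar
{
  // 0x0A ('\n') repeated in every byte of a 64-bit word.
  static final long NL   = 0x0A0A0A0A0A0A0A0AL;
  // Low seven bits set in every byte.
  static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

  // Plain per-byte loop, same logic as countnl in Iterate.java.
  static int countnlBytes(byte[] buf)
  {
    int count = 0;
    for(int i = 0; i < buf.length; i++) {
      if(buf[i] == '\n') {
        count++;
      }
    }
    return count;
  }

  // Word-at-a-time variant: examines 8 bytes per loop iteration.
  static int countnlWords(byte[] buf)
  {
    ByteBuffer bb = ByteBuffer.wrap(buf);
    int count = 0;
    int i = 0;
    for(; i + 8 <= buf.length; i += 8) {
      // After the XOR, a byte is zero exactly where the input was '\n'.
      long x = bb.getLong(i) ^ NL;
      // Per byte: the high bit of t ends up set iff the low 7 bits of
      // that byte of x are nonzero (0x7F + 0x7F never carries out).
      long t = (x & LOW7) + LOW7;
      // OR in x's own high bits, then invert: 0x80 marks each zero byte.
      t = ~(t | x | LOW7);
      count += Long.bitCount(t);
    }
    // Handle the 0..7 leftover bytes one at a time.
    for(; i < buf.length; i++) {
      if(buf[i] == '\n') {
        count++;
      }
    }
    return count;
  }
}
```

Swapping NewlineSwar.countnlWords(buf) in for countnl(buf) in
Iterate.java should give the same counts.  Whether it is any faster
than the plain loop on a given JVM would need to be measured.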
