I don't know what the heck is going on here, but ignore the time the
program is reporting and just pay attention to how long it actually
takes, wall-clock style, and you'll see that your Clojure and Java
programs already take the same time.

Here are my findings:

I saved Iterate.java into my rlm package and ran:
time java -server rlm.Iterate

results:
time java -server rlm.Iterate
Wanted 16777216 got 16777216 bytes
counted 65341 nls in 27 msec
Wanted 16777216 got 16777216 bytes
counted 65310 nls in 27 msec
Wanted 16777216 got 16777216 bytes
counted 66026 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65473 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65679 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65739 nls in 19 msec
Wanted 16777216 got 16777216 bytes
counted 65310 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65810 nls in 18 msec
Wanted 16777216 got 16777216 bytes
counted 65531 nls in 21 msec
Wanted 16777216 got 16777216 bytes
counted 65418 nls in 21 msec

real    0m27.469s
user    0m0.472s
sys     0m26.638s


I wrapped the last group of forms in your Clojure script in a
(run) function:
(defn run []
  (let [ifs (FileInputStream. "/dev/urandom")
        buf (make-array Byte/TYPE *numbytes*)]
    (dotimes [_ 10]
      (let [sz (.read ifs buf)]
        (println "Wanted" *numbytes* "got" sz "bytes")
        (let [count (time (countnl buf))]
          (println "Got" count "nls"))))))

and ran
(time (run)) at the repl:

(time (run))
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.081975 msecs"
Got 65894 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.001814 msecs"
Got 65949 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.061934 msecs"
Got 65603 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.031131 msecs"
Got 65563 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.122567 msecs"
Got 65696 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 182.968066 msecs"
Got 65546 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.058508 msecs"
Got 65468 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 182.932395 msecs"
Got 65872 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 183.074646 msecs"
Got 65498 nls
Wanted 16777216 got 16777216 bytes
"Elapsed time: 187.733636 msecs"
Got 65434 nls
"Elapsed time: 28510.331507 msecs"
nil

Total running time for each program is around 28 seconds.
The Java program seems to be incorrectly reporting its time.
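
One way to check where the roughly 28 seconds go is to time the read
and the count separately. The "sys 0m26.638s" line above suggests most
of the time is spent in the kernel, which would be consistent with the
read from /dev/urandom being the slow part, but I have not confirmed
that. What follows is only a sketch: the class name TimedIterate and
the readFully helper are my own inventions, not part of your
Iterate.java, and countnl is copied from your version.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class TimedIterate {
    static final int NUMBYTES = 16 * 1024 * 1024;

    // Same loop as countnl in the original Iterate.java.
    static int countnl(byte[] buf) {
        int count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: read until buf is full or the stream ends,
    // since a single read() call may return fewer bytes than requested.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break; // end of stream
            }
            off += n;
        }
        return off;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[NUMBYTES];
        try (FileInputStream input = new FileInputStream("/dev/urandom")) {
            for (int i = 0; i < 10; i++) {
                long t0 = System.nanoTime();
                int sz = readFully(input, buf);   // timed: filling the buffer
                long t1 = System.nanoTime();
                int count = countnl(buf);         // timed: counting newlines
                long t2 = System.nanoTime();
                System.out.printf(
                    "read %d bytes in %.1f msec, counted %d nls in %.1f msec%n",
                    sz, (t1 - t0) / 1e6, count, (t2 - t1) / 1e6);
            }
        }
    }
}
```

If the read times add up to most of the wall-clock total, then the
per-buffer counting times and the overall "real" time are measuring
different things, and the two can both be right.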

--Robert McIntyre

On Mon, Aug 30, 2010 at 4:03 PM, tsuraan <tsur...@gmail.com> wrote:
> Just to try to see if clojure is a practical language for doing
> byte-level work (parsing files, network streams, etc), I wrote a
> trivial function to iterate through a buffer of bytes and count all
> the newlines that it sees.  For my testing, I've written a C version,
> a Java version, and a Clojure version.  I'm running each routine 10
> times over a 16MB buffer read from /dev/urandom (the buffer is
> refreshed between each call to the newline counting function).  With
> gcc -O0, I get about 80ms per 16MB buffer.  With gcc -O3, I get ~14ms
> per buffer.  With javac (and java -server) I get 20ms per 16MB buffer.
>  With clojure, I get 105ms per buffer (after the jvm warms up).  I'm
> guessing that the huge boost that java and gcc -O3 get is from
> converting per-byte operations to per-int ops; at least that ~4x boost
> looks like it would come from something like that.  Is that an
> optimization that is unavailable to clojure?  The java_interop doc
> makes it sound like java and clojure get the exact same bytecode when
> using areduce correctly, so maybe there's something I could be doing
> better.  Here are my small programs; if somebody could suggest
> improvements, I'd appreciate them.
>
> iterate.clj:
>
> (set! *warn-on-reflection* true)
> (import java.io.FileInputStream)
>
> (def *numbytes* (* 16 1024 1024))
>
> (defn countnl
>  [#^bytes buf]
>  (let [nl (byte 10)]
>    (areduce buf idx count 0
>             (if (= (aget buf idx) nl)
>               (inc count)
>               count))))
>
> (let [ifs (FileInputStream. "/dev/urandom")
>      buf (make-array Byte/TYPE *numbytes*)]
>  (dotimes [_ 10]
>    (let [sz (.read ifs buf)]
>      (println "Wanted" *numbytes* "got" sz "bytes")
>      (let [count (time (countnl buf))]
>        (println "Got" count "nls")))))
>
>
> Iterate.java:
>
> import java.io.FileInputStream;
>
> class Iterate
> {
>  static final int NUMBYTES = 16*1024*1024;
>
>  static int countnl(byte[] buf)
>  {
>    int count = 0;
>    for(int i = 0; i < buf.length; i++) {
>      if(buf[i] == '\n') {
>        count++;
>      }
>    }
>    return count;
>  }
>
>  public static final void main(String[] args)
>    throws Throwable
>  {
>    FileInputStream input = new FileInputStream("/dev/urandom");
>    byte[] buf = new byte[NUMBYTES];
>    int sz;
>    long start, end;
>
>    for(int i = 0; i < 10; i++) {
>      sz = input.read(buf);
>      System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
>      start = System.currentTimeMillis();
>      int count = countnl(buf);
>      end = System.currentTimeMillis();
>      System.out.println("counted " + count + " nls in " +
>          (end-start) + " msec");
>    }
>
>    input.close();
>  }
> }
>
> iterate.c:
>
> #include<sys/types.h>
> #include<sys/stat.h>
> #include<sys/time.h>
> #include<stdlib.h>
> #include<unistd.h>
> #include<stdio.h>
> #include<fcntl.h>
>
> int countnl(char *buf, int sz)
> {
>  int i;
>  int count = 0;
>  for(i = 0; i < sz; i++) {
>    if(buf[i] == '\n') {
>      count++;
>    }
>  }
>  return count;
> }
>
> int main()
> {
>  int fd = open("/dev/urandom", O_RDONLY);
>  const int NUMBYTES = 16*1024*1024;
>  char *buf = (char*)malloc(NUMBYTES);
>
>  int sz;
>  struct timeval start, end;
>
>  int i;
>  for(i = 0; i < 10; i++) {
>    sz = read(fd, buf, NUMBYTES);
>    printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
>    gettimeofday(&start, 0);
>    int count = countnl(buf, sz);
>    gettimeofday(&end, 0);
>    printf("counted %d nls in %f msec\n", count,
>        (float)(end.tv_sec-start.tv_sec)*1e3 +
>        (end.tv_usec-start.tv_usec)/1e3);
>  }
>
>  free(buf);
>  close(fd);
>  return 0;
> }
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
