Hi all,

I have recently been watching a set of videos from O'Reilly on
MapReduce. The author of the series is using Python for all of the
examples, but, in an effort to use Clojure more, I've been following
along and writing my code in Clojure. When I implemented the mapper
function that he described in both languages, I noticed that the
Python version was running quite a bit faster and I was wondering if
you all could help me understand why that is the case. I've pasted the
code for each solution below. I am using cake to run the Clojure
code; since it keeps a JVM up and running at all times, I would expect
that to remove JVM startup time from the equation. The input file is
The Hound of the Baskervilles from Project Gutenberg
(http://www.gutenberg.org/cache/epub/2852/pg2852.txt). I've also
noticed that with an even longer text as input (for example, the text
of input.txt copied 10 times into one file) the Clojure code slows
down significantly more. In some cases I had to stop it with Ctrl-C.
Any ideas on what could be causing this would be great. I'm not trying
to start any battles between Python and Clojure, as I love them both;
I'm strictly trying to learn how to be a better Clojure programmer.
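
For anyone who wants to reproduce the longer input described above, the
10x file can be built with something like the following. This is only a
minimal sketch of one way to do it: `repeat_file` and the output name
`input10.txt` are hypothetical names chosen for illustration.

```python
import shutil


def repeat_file(src_path, dst_path, times=10):
    """Write `times` back-to-back copies of src_path into dst_path.

    Hypothetical helper for illustration; not part of mapper.py.
    """
    # Binary mode copies the bytes untouched, so line endings and
    # encoding in the source text are preserved exactly.
    with open(dst_path, "wb") as dst:
        for _ in range(times):
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Calling `repeat_file("input.txt", "input10.txt")` would produce the
larger file, which can then be fed to either mapper on stdin.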

Thanks ahead of time for any help you all can give.

Christopher

;; mapper.clj

(use '[clojure.java.io :only (reader)])
(use '[clojure.string :only (split)])

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1")))))

(mapper (line-seq (reader *in*)))


I am running the code above with the following command and get the
output below:

% time cake run mapper.clj < input.txt
real    0m3.573s
user    0m2.031s
sys     0m1.528s


# mapper.py

#!/usr/bin/env python

import sys

def mapper(lines):
    for line in lines:
        words = line.split()
        for word in words:
            print "{0}\t1".format(word)

def main():
    mapper(sys.stdin)

if __name__ == '__main__':
    main()

% time mapper.py < input.txt
real    0m0.661s
user    0m0.105s
sys     0m0.083s
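
The `time` figures above cover the whole process, including interpreter
startup. One way to separate the mapper loop from that overhead is to
time the loop in-process. The following is a rough sketch under some
assumptions: `timed_mapper` and the `out` parameter are hypothetical
names introduced here, and the mapper writes through `out.write`
instead of `print` so the same code runs under both Python 2 and 3.

```python
import time


def mapper(lines, out):
    """Emit one "word<TAB>1" line per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            out.write("{0}\t1\n".format(word))


def timed_mapper(lines, out):
    """Run mapper and return wall-clock seconds spent in the loop alone.

    Hypothetical wrapper for illustration; not part of mapper.py.
    """
    start = time.time()
    mapper(lines, out)
    return time.time() - start
```

Calling `timed_mapper(sys.stdin, sys.stdout)` and writing the returned
value to `sys.stderr` keeps the timing out of the mapper's own output.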
