If you're interested only in counting the number of unique words, then
you don't even need a map. You can get by with a set, like this:
    (defn unique-words-in-file
      [file]
      (count (set (split-on-whitespace (slurp file)))))

slurp reads the whole file into a String in memory. The hypothetical
split-on-whitespace takes a String and returns a collection of words.
set takes a collection and produces a set of the distinct elements in
that collection. count returns the number of elements in the set.
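
Since split-on-whitespace is only hypothetical, here is one way it
might be written. This is a sketch, not a standard function: it assumes
a "word" is any run of non-whitespace characters, and the helper
unique-words-in-string is made up here so the example can run on a
string without needing a file.

```clojure
;; One possible definition of the hypothetical split-on-whitespace.
(defn split-on-whitespace
  [s]
  ;; re-seq returns a seq of every run of non-whitespace characters,
  ;; or nil if the string contains none.
  (re-seq #"\S+" s))

;; Hypothetical helper: same pipeline as above, minus the slurp.
(defn unique-words-in-string
  [s]
  (count (set (split-on-whitespace s))))

(unique-words-in-string "the cat and the hat")  ; => 4
```

Because (set nil) is the empty set, a string with no words gives 0.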
If, on the other hand, you wanted a map from each word in the file to
the number of times that it appears, you might do it like this:
    (defn word-counts
      [file]
      (reduce
        (fn [counts word] (assoc counts word (inc (get counts word 0))))
        {}
        (split-on-whitespace (slurp file))))

The reduce starts with the empty map {}, and then for each word in the
file, produces a new map by invoking the anonymous function supplied
as the first argument to reduce.
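
To see the intermediate maps that the reduce builds, you can swap
reduce for reductions (available in later versions of Clojure), which
returns every intermediate result. In this sketch the name add-word and
the literal vector of words are made up for illustration; the function
body is the same as the anonymous function above.

```clojure
;; The reducing function from above, given a name (hypothetical).
(defn add-word
  [counts word]
  ;; (get counts word 0) supplies 0 for a word not yet in the map.
  (assoc counts word (inc (get counts word 0))))

;; reductions returns the starting map and each map produced after it.
(reductions add-word {} ["a" "b" "a"])
;; => ({} {"a" 1} {"a" 1, "b" 1} {"a" 2, "b" 1})
```

Each step returns a new map and leaves the previous one untouched, so
the accumulating map is simply passed along from one call to the next.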
You could also get the same result with a list comprehension, using
for:
    (defn word-counts
      [file]
      (apply merge-with +
        (for [word (split-on-whitespace (slurp file))]
          {word 1})))

Here we emit a map for each word in the file, mapping that word to 1.
Then we merge all the maps together, using + when two maps contain the
same key. This function has a small bug: if the file contains no
words, merge-with is called with no maps at all, so instead of an
empty map you get an exception (or nil, depending on the version of
Clojure). To fix it, you would insert an additional argument to apply,
the empty map {}, just before the for expression.
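
A sketch of that fix follows. To keep it runnable without a file, it
takes a collection of words directly; the name word-counts-in is made
up for this example.

```clojure
;; Hypothetical variant that takes the words rather than a file name.
(defn word-counts-in
  [words]
  ;; The extra {} guarantees merge-with always gets at least one map,
  ;; so an empty input yields an empty map.
  (apply merge-with + {}
         (for [word words]
           {word 1})))

(word-counts-in [])             ; => {}
(word-counts-in ["a" "b" "a"])  ; => {"a" 2, "b" 1}
```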
On Oct 19, 9:16 pm, "Tom Emerson" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a somewhat embarrassing newbie question on the use of hash maps
> in a functional environment.
>
> Consider a little utility that counts the number of unique words in a
> file. A hash map mapping strings to integers is the obvious data
> structure for this, but the fact that (assoc) returns a new map each
> time it is called is tripping me up: since I can't define a 'global'
> hash map to accumulate the counts, do you pass one around in a
> function? or do you structure the code a different way? This is a
> difference from what I would do in Common Lisp, where I would just
> have a global that is used for the collection.
>
> Thanks in advance for your wisdom.
>
> -tree
>
> --
> Tom Emerson
> [EMAIL PROTECTED]://www.dreamersrealm.net/~tree