I know I promised a representative example yesterday and didn't
deliver, so let's try it now, even though I'm away from the comfort of
my own desktop.
Right, we're going to need a single external dependency, which I propose
we load dynamically via pomegranate, like so:
(use '[cemerick.pomegranate :only (add-dependencies)])
(require '[clojure.core.reducers :as r]) ;; for r/map and r/foldcat below

(add-dependencies
  :coordinates '[[org.apache.lucene/lucene-snowball "3.0.3"]]
  :repositories (merge cemerick.pomegranate.aether/maven-central
                       {"clojars" "http://clojars.org/repo"}))

(import '[org.tartarus.snowball.ext EnglishStemmer]) ;; English only
(defn porter-stemmer ;; NEED THIS TO COMPILE
  "Depending on lang, returns the appropriate porter-stemmer instance.
  Using Snowball underneath.
  Supported languages include the following:
  [danish, french, italian, spanish, dutch, german, english, romanian,
  turkish, finnish, hungarian, russian, swedish, norwegian, portuguese]
  If the language you specified does not match anything, an instance of
  the english stemmer will be returned."
  ^org.tartarus.snowball.SnowballProgram
  [^String lang]
  (case lang
    ;; there are actually more langs but I removed them for simplicity
    "english" (EnglishStemmer.)
    (EnglishStemmer.)))
(defn porter-stem
  "A function that stems words using Porter's algorithm."
  (^String [^String s lang-or-stemmer]
   (let [^org.tartarus.snowball.SnowballProgram stemmer
         ;; if a stemmer object was passed in, use it!
         (cond-> lang-or-stemmer
           (string? lang-or-stemmer) (porter-stemmer))]
     (.getCurrent
       (doto stemmer
         (.setCurrent s)
         .stem))))
  ([ss] ;; a collection?
   (mapv #(porter-stem % "english") ss)))
(def words (->> ["eating" "drinking" "dancing"]
                cycle
                (take 10000)
                vec))

(defn mapv-res []
  (time (porter-stem words)))

(defn foldcat-res []
  (time (into [] (r/foldcat (r/map #(porter-stem % "english") words)))))
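Since a single `time` call is fairly noisy, the warm-up could also be made explicit. This is only a minimal sketch: `bench-ms` is a hypothetical helper (not part of the code above), and the warm-up and run counts are arbitrary assumptions.

```clojure
;; Hypothetical helper, not from the original post: calls the zero-argument
;; function f `warmup` times untimed, then averages `runs` timed calls.
(defn bench-ms
  "Returns the mean wall-clock time of (f) in milliseconds."
  [f warmup runs]
  (dotimes [_ warmup] (f))            ; give the JIT a chance to compile hot paths
  (let [start (System/nanoTime)]
    (dotimes [_ runs] (f))
    (/ (- (System/nanoTime) start) 1e6 runs)))

;; Example with a cheap stand-in workload:
(bench-ms #(reduce + (range 1000)) 20 20)
```

With the definitions above loaded, something like (bench-ms #(porter-stem words) 20 20) would measure the mapv version without the printing done by `time`.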
After letting the JVM warm up a bit I am getting:
"Elapsed time: 68.498756 msecs" for mapv
"Elapsed time: 49.066978 msecs" for foldcat (!!!)
Hmm... I think I see what was happening 2 days ago. When measuring
'mapv' I'd like to reuse the same stemmer object, since everything
happens serially (no danger). When measuring foldcat, however, I must
create a new stemmer object for every single word and then throw it
away, because the work runs in parallel.
I tried foldcat with a single shared stemmer object and everything went
fine! I'm pretty sure the stemmer objects are not thread-safe, so how am
I able to use the same object from separate threads on different words?
I was expecting seriously broken behaviour, but everything went
smoothly...
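One middle ground between sharing a single mutable object and allocating a new one per word is to keep one instance per thread. Below is a minimal sketch using java.lang.ThreadLocal. To stay self-contained it uses a StringBuilder as a stand-in for the stemmer (both are mutable and not meant to be shared across threads); `local-buffer`, `reverse-with-local-buffer` and `sample-words` are hypothetical names, not part of the code above. Note also that a race on a shared object tends to show up only intermittently, so one clean run does not show that sharing is safe.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Stand-in for a mutable, non-thread-safe object such as a stemmer.
;; Each thread that calls .get lazily receives its own StringBuilder.
(def ^ThreadLocal local-buffer
  (proxy [ThreadLocal] []
    (initialValue [] (StringBuilder.))))

(defn reverse-with-local-buffer
  "Reverses s by mutating this thread's private buffer, mirroring the
  setCurrent / stem / getCurrent pattern."
  ^String [^String s]
  (let [^StringBuilder sb (.get local-buffer)]
    (.setLength sb 0)              ; clear state left over from the previous word
    (.append sb s)
    (.reverse sb)
    (.toString sb)))

(def sample-words (->> ["eating" "drinking" "dancing"] cycle (take 10000) vec))

;; Parallel fold: a buffer is reused within a thread, never across threads.
(def reversed
  (into [] (r/foldcat (r/map reverse-with-local-buffer sample-words))))
```

With the stemmer, initialValue would return (EnglishStemmer.) instead, and the per-thread instance would be passed to porter-stem as its second argument.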
Jim
On 19/06/13 08:14, Alan Busby wrote:
On Wed, Jun 19, 2013 at 4:03 PM, Tassilo Horn <t...@gnu.org> wrote:
I might be wrong, but I think reducers are only faster in situations
where you bash many filters/maps/mapcats/etc on top of each other.
Or use fold on a large vector with a multi-core machine.
You might try fold-into-vec to see if there is any difference;
(defn fold-into-vec
  "Provided a reducer, concatenate into a vector. Note: same as (into []
  coll), but parallel."
  [coll]
  (r/fold (r/monoid into vector) conj coll))
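To make the two suggestions above concrete, here is a small sketch on plain numbers, so it runs without the stemmer dependency. `nums`, `seq-pipeline` and `fold-pipeline` are hypothetical names for illustration only, and no timings are claimed for either version.

```clojure
(require '[clojure.core.reducers :as r])

(def nums (vec (range 100000)))

;; Lazy-seq version: each map/filter layer builds an intermediate lazy seq.
(defn seq-pipeline []
  (reduce + (map inc (filter even? (map #(* % %) nums)))))

;; Reducers version: the stacked layers are applied in a single pass, and
;; r/fold may split the vector into chunks that are reduced in parallel.
(defn fold-pipeline []
  (r/fold + (r/map inc (r/filter even? (r/map #(* % %) nums)))))

;; Both compute the same sum.
(= (seq-pipeline) (fold-pipeline))
```

Timing both on a multi-core machine (after warm-up) would show whether the stacking or the parallel fold makes a difference for a given workload.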
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en