I know I promised a representative example yesterday and didn't deliver... let's try it now, even though I'm outside the comfort of my own house/desktop...

Right, we're gonna need a single external dependency, which I propose we load dynamically via pomegranate, like so:

(require '[clojure.core.reducers :as r]) ;; needed for r/map and r/foldcat further down
(use '[cemerick.pomegranate :only (add-dependencies)])
(add-dependencies :coordinates '[[org.apache.lucene/lucene-snowball "3.0.3"]]
                  :repositories (merge cemerick.pomegranate.aether/maven-central
                                       {"clojars" "http://clojars.org/repo"}))
(import '[org.tartarus.snowball.ext EnglishStemmer]) ;; English only

(defn porter-stemmer  ;;NEED THIS TO COMPILE
  "Depending on lang, returns the appropriate porter-stemmer instance, using Snowball underneath.
  Supported languages include the following:
  [danish, french, italian, spanish, dutch, german, english, romanian, turkish, finnish, hungarian, russian, swedish, norwegian, portuguese]
  If the language you specified does not match anything, an instance of the English stemmer will be returned."
  ^org.tartarus.snowball.SnowballProgram
  [^String lang]
  (case lang
    "english" (EnglishStemmer.) ;; there are actually more langs, but I removed them for simplicity
    (EnglishStemmer.)))

(defn porter-stem
  "A function that stems words using Porter's algorithm."
  (^String [^String s lang-or-stemmer]
   (let [^org.tartarus.snowball.SnowballProgram stemmer
         (cond-> lang-or-stemmer
           (string? lang-or-stemmer) (porter-stemmer))] ;; if a stemmer object was passed in, use it!
     (.getCurrent
       (doto stemmer
         (.setCurrent s)
         .stem))))
  ([ss] ;; a collection?
   (mapv #(porter-stem % "english") ss)))


(def words (->> ["eating" "drinking" "dancing"]
             cycle
             (take 10000)
             vec))

(defn mapv-res []
  (time (porter-stem words)))

(defn foldcat-res []
  (time (into [] (r/foldcat (r/map #(porter-stem % "english") words)))))

After letting the JVM warm up a bit, I am getting:

"Elapsed time: 68.498756 msecs" for mapv
"Elapsed time: 49.066978 msecs" for foldcat (!!!)

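For what it's worth, a single (time ...) call is a fairly noisy measurement. Here is a rough sketch of a helper that makes the warm-up and the averaging explicit. The name avg-ms and its parameters are made up for illustration, the workload is a stand-in so the snippet runs on its own, and a proper benchmarking library would be more rigorous:

```clojure
;; Hypothetical helper, not part of any library.
(defn avg-ms
  "Calls f `warmup` times untimed, then `runs` times timed.
  Returns the mean elapsed time per call in milliseconds."
  [f warmup runs]
  (dotimes [_ warmup] (f))
  (let [start (System/nanoTime)]
    (dotimes [_ runs] (f))
    (/ (- (System/nanoTime) start) 1e6 runs)))

;; Stand-in workload so the sketch runs without the stemmer.
(println (avg-ms #(reduce + (range 100000)) 20 50))
```

With the definitions from this mail loaded, something like (avg-ms #(porter-stem words) 50 100) should give a steadier number than one-off timings.
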

Hmm... I think I see what was happening 2 days ago. When measuring 'mapv' I'd like to reuse the same stemmer object, since everything happens serially (no danger). When measuring foldcat, however, I must create a new stemmer object for every single word and then throw it away, because the work is done in parallel.

I tried foldcat with the same stemmer object and everything went fine! I'm pretty sure the stemmer objects are not thread-safe... so how am I able to use the same object from separate threads on different words? I was expecting seriously broken behaviour, but everything went smoothly...
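
Rather than eyeballing the output, one way to check this is to compare the parallel result against a serial one. A ThreadLocal would also give each worker thread its own instance, without allocating one per word. This is only a minimal sketch under assumptions: make-stemmer and stem-with are hypothetical stand-ins (a StringBuilder used as mutable scratch space, with a toy rule that drops a trailing "ing"), so it runs without the Lucene jar and says nothing about Snowball's actual thread-safety. With the real thing, make-stemmer would presumably return (EnglishStemmer.):

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for a stateful, non-thread-safe stemmer.
(defn make-stemmer [] (StringBuilder.))

(defn stem-with
  "Stems s using the mutable scratch buffer sb (toy rule: drop a trailing \"ing\")."
  [^StringBuilder sb ^String s]
  (.setLength sb 0)
  (.append sb s)
  (when (.endsWith s "ing")
    (.setLength sb (- (.length sb) 3)))
  (.toString sb))

;; One stemmer per thread, created lazily on first use by that thread.
(def ^ThreadLocal thread-stemmer
  (proxy [ThreadLocal] []
    (initialValue [] (make-stemmer))))

(defn stem [s]
  (stem-with (.get thread-stemmer) s))

(def sample-words (vec (take 10000 (cycle ["eating" "drinking" "dancing"]))))

(def serial   (mapv stem sample-words))
(def parallel (into [] (r/foldcat (r/map stem sample-words))))

;; Equal vectors mean no state leaked between threads in this run.
(println (= serial parallel))
```

A passing comparison is not proof of thread-safety, since races can be intermittent, but a failing one would settle the question quickly.
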


Jim





On 19/06/13 08:14, Alan Busby wrote:

On Wed, Jun 19, 2013 at 4:03 PM, Tassilo Horn <t...@gnu.org> wrote:

    I might be wrong, but I think reducers are only faster in situations
    where you bash many filters/maps/mapcats/etc on top of each other.


Or use fold on a large vector with a multi-core machine.

You might try fold-into-vec to see if there is any difference;

(defn fold-into-vec
  "Provided a reducer, concatenate into a vector. Note: same as (into [] coll), but parallel."
  [coll]
  (r/fold (r/monoid into vector) conj coll))
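
As a quick sanity check of the suggestion above, here is a small sketch that runs fold-into-vec over a plain vector and compares it with the serial result. The input (a range of numbers mapped with inc) is just an assumption for illustration, and the function is repeated so the snippet stands alone:

```clojure
(require '[clojure.core.reducers :as r])

(defn fold-into-vec
  "Provided a reducer, concatenate into a vector. Note: same as (into [] coll), but parallel."
  [coll]
  (r/fold (r/monoid into vector) conj coll))

;; Illustrative input: a vector, so r/fold can split it into chunks.
(def nums (vec (range 10000)))

(def folded (fold-into-vec (r/map inc nums)))

;; Order is preserved, so the result should match the serial mapv.
(println (= folded (mapv inc nums)))
```
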

