On 22/08/12 15:16, nicolas.o...@gmail.com wrote:
You should replace your function that computes the boards with a function
that returns the same board 30 times,
and the evaluation function with something that returns a constant value.
Then check: speed and speed-up for folding.

Then you will know for sure whether the slowness comes from the
exploration and search, or from
the evaluation and the computation of moves.

Alternatively, use visualvm (comes with the jdk) or any other profiler to
check where the cost is.


OK, so I followed your advice and replaced the scoring-fn with (rand-int 10) and the next-level-fn with (repeat 30 (Move->Board blah blah...)), and I have some interesting results...
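In case the shape of the dummies matters, they were more or less this (a simplified sketch - the real Move->Board record obviously carries a proper Move and the resulting board, here it just wraps the same board each time):

(defn dummy-scoring-fn
  "Ignores the board and the direction; just produces a random leaf score."
  [_board _dir]
  (rand-int 10))

(defn dummy-next-level
  "Ignores the real move generation; returns 30 identical children."
  [b _dir]
  (repeat 30 (Move->Board. nil b)))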

First of all, experimentation showed that the best partitioning size is 1, unless it's only going to level 2, in which case partitioning with 2 seems slightly better... anyway, I'm not interested in only going to level 2, so it doesn't matter.
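For reference, the partition size here is just the optional first argument to r/fold, so flipping between 1 and 2 is trivial. The fold is roughly this shape (a sketch only - names made up, and assuming the children carry their score under :value, which mine do):

(require '[clojure.core.reducers :as r])

(defn best-score
  "Folds over the children of a node with an explicit partition size n."
  [n children]
  (r/fold n
          (r/monoid max (constantly Long/MIN_VALUE)) ;combinef: merge partitions, identity = MIN_VALUE
          (fn [acc child] (max acc (:value child)))  ;reducef: fold each child's :value in
          children))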

As you reported 4-5 emails back, after some optimizations (mainly 'definline' and using reducers in my core ns as well) I managed to go to level 4 in roughly 8 sec, with the dummy fns of course! Now that I've got my baseline, the real experimentation starts...

Just looking at the 2 fns that are obviously the culprits, it is pretty clear that the one producing the next boards is the more expensive of the 2, so let's leave it for last. For the moment, let's look at the one that calculates the leaves (the scoring-fn).

--with the dummy scoring-fn (rand-int 10) : 8-9 sec

--with the scoring-fn that counts the pieces and subtracts : 63-64 sec
--with the scoring-fn that counts their relative value and subtracts : 83-84 sec

The good thing about the dummy scoring-fn is that at the end I can verify it brought back the move with :value 9, so that is good news. However, no matter how much I tried to tune this, it seems that just counting the pieces is 7 times more expensive than generating random ints! In addition, accessing the :value key of the pieces (they are records) and subtracting their sums is an extra 20% on top of that! These are the best times I can report - mind you, I started with 127 and 168 sec respectively...
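For completeness, the two real scoring fns are essentially this shape (names made up for the sketch; assuming directions are +1/-1 so (- dir) flips to the opponent, and reusing the gather-team shown further down):

(defn count-score
  "My pieces minus the opponent's pieces."
  [b dir]
  (- (count (gather-team b dir))
     (count (gather-team b (- dir)))))

(defn value-score
  "Sum of :value over my pieces minus the opponent's."
  [b dir]
  (letfn [(team-value [d] (r/fold + (r/map :value (gather-team b d))))]
    (- (team-value dir) (team-value (- dir)))))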

Now that we've established how much cheaper it is to generate random ints, let's move on to the serious bit. For the next experiment the scoring-fn is locked (rand-ints) but I'm using the real next-level fn, again after making some optimisations...


--with the dummy board generation (repeat blah blah), as we saw above, it takes 8-9 sec.
--with the real board generation it takes forever!!! I can't even measure how long, because I can't wait that long.

Trying with the 2 real fns does not make any sense at this point... it is pretty clear that 'next-level' is the culprit with regards to performance. So here it is:
------------------------------------------------------------------------------------------------------------------------------------------------------------------
(defn next-level [b dir]
  (r/map #(Move->Board. % (core/try-move %))
         (core/team-moves @curr-game b dir))) ;;curr-game is a promise

(definline team-moves
  [game b dir]
  `(let [team# (gather-team ~b ~dir)
         tmvs# (r/mapcat (fn [p#] (r/map #(dest->Move ~game p# %) (getMoves p#))) team#)]
     (into [] tmvs#)))

(definline gather-team
  "Returns all the pieces with same direction dir on this board b."
  [b dir]
  `(into [] (r/filter #(= ~dir (:direction %)) ~b))) ;all the team-mates (with same direction)

(definline dest->Move
  "Helper fn for creating moves."
  [dm p dest]
  `(Move. ~p (partial move ~dm) ~dest))

(defn move
  "The function responsible for moving Pieces. Each piece knows how to move itself.
  Returns the resulting board without making any state changes."
  [game-map p coords]
  ;;{:pre [(satisfies? Piece p)]}  ;safety comes first
  ;;(if (some #{coords} (:mappings game-map)) ;check that the position exists on the grid
  (let [newPiece (update-position p coords) ;the new piece as a result of moving
        old-pos  (getListPosition p)
        new-pos  (getListPosition newPiece)] ;;piece is a record
    (-> @(:board-atom game-map) ;deref the appropriate board atom
        (transient)
        (assoc! old-pos nil)
        (assoc! new-pos newPiece)
        (persistent!)
        #_(populate-board))) ;replace dead-pieces with nils
  #_(throw (IllegalStateException. (str coords " is NOT a valid position according to the mappings provided!"))))

(defn collides?
  "Returns true if the move from [sx sy] to [ex ey] collides with any friendly pieces.
  The move will be walked step by step by the walker fn."
  [[sx sy] [ex ey] walker b m dir]
  (loop [[imm-x imm-y] (if (nil? walker) [ex ey] (walker [sx sy]))] ;if walker is nil make one big step to the end (for the knight)
    (cond
      (= [ex ey] [imm-x imm-y]) ;reached the destination:
      (= dir (:direction (b (translate-position ex ey m)))) ;collision only if a friendly piece sits there
      (not (nil? (get b (translate-position imm-x imm-y m)))) true ;any piece blocking the path counts as a collision
      :else (recur (walker [imm-x imm-y])))))

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

The only thing not shown here (unless I messed up) is getMoves, which basically goes into a special namespace where the core.logic code lives. There is a separate fn for each piece that finds the available moves. However, the potential moves are calculated by the core.logic engine on an empty board: the rules contain only the logical restrictions of chess - they don't take into account the actual game being played at any given moment. For this reason each move has to be 'walked' manually to check whether it collides with any pieces on the board we're actually playing on. This is what 'collides?' does, and so I'm removing every move that does collide...
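That filtering step isn't shown above, but it boils down to something of this shape (a sketch only - the :position field and the per-piece walker are stand-ins for whatever my logic namespace actually passes around):

(defn legal-moves
  "Keeps only the destinations whose path does not collide, walking each candidate with collides?."
  [b m dir p walker]
  (into []
        (r/remove #(collides? (:position p) % walker b m dir)
                  (getMoves p))))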

Now, can you see any place in the fns that 'next-level' depends on that can be optimised any further? I don't think it's reasonable for it to take that long... calling (next-level (start-chess!) -1) once, where start-chess! returns the starting board, takes just over 150 ms, and this is only going to be called 4 times to reach level 4! It does not justify that much delay...
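(If anyone wants to reproduce the 150 ms figure, remember to force the reducer when timing it - r/map on its own only builds a recipe, no work happens until it is reduced:)

(time (count (into [] (next-level (start-chess!) -1))))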

I'm really confused, and visualvm confused me even more... my app reaches 29,000 objects at its peak! The memory profiler says my memory is dominated by core.logic.Substitutions objects (close to 34%)... The thread peak is 12, if I remember correctly (quad-core cpu).

Have I hit the limit? I don't want to think that core.logic is to blame... after all, I went through rough times encoding the rules in core.logic, and I thought I would get performance benefits as well (apart from clarity)...

Jim




