On 22/08/12 15:16, nicolas.o...@gmail.com wrote:
You should replace your function that computes the boards with a function
that returns the same board 30 times,
and the evaluation function with something that returns a constant value.
Then check: speed and speed-up for folding.

Then you will know for sure whether the slowness comes from the
exploration and search, or from
the evaluation and the computation of moves.

Alternatively, use visualvm (comes with the jdk) or any other profiler to
check where the cost is.


OK, so I followed your advice and replaced the scoring-fn with (rand-int 10) and the next-level-fn with (repeat 30 (Move->Board blah blah...)), and I have some interesting results...
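In case the shape of the dummies matters, they were more or less this (a simplified sketch - the real Move->Board record obviously carries a proper Move and the resulting board, here it just wraps the same board each time):

(defn dummy-scoring-fn
  "Ignores the board and the direction; just produces a random leaf score."
  [_board _dir]
  (rand-int 10))

(defn dummy-next-level
  "Ignores the real move generation; returns 30 identical children."
  [b _dir]
  (repeat 30 (Move->Board. nil b)))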

First of all, experimentation showed that the best partitioning size is 1, unless it's only going to level 2, in which case partitioning with 2 seems slightly better... anyway, I'm not interested in only going to level 2, so it doesn't matter.
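For reference, the partition size here is just the optional first argument to r/fold, so flipping between 1 and 2 is trivial. The fold is roughly this shape (a sketch only - names made up, and assuming the children carry their score under :value, which mine do):

(require '[clojure.core.reducers :as r])

(defn best-score
  "Folds over the children of a node with an explicit partition size n."
  [n children]
  (r/fold n
          (r/monoid max (constantly Long/MIN_VALUE)) ;combinef: merge partitions, identity = MIN_VALUE
          (fn [acc child] (max acc (:value child)))  ;reducef: fold each child's :value in
          children))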

As you reported 4-5 emails back, after some optimizations (mainly 'definline' and using reducers in my core ns as well) I managed to go to level 4 in roughly 8 sec, with the dummy fns of course! Now that I've got my baseline, the real experimentation starts...

Just looking at the 2 fns that are obviously the culprits, it is pretty clear that the one producing the next boards is the more expensive of the 2, so let's leave it for last. For the moment, let's look at the one that calculates the leaves (the scoring-fn).

--with the dummy scoring-fn (rand-int 10) : 8-9 sec

--with the scoring-fn that counts the pieces and subtracts : 63-64 sec
--with the scoring-fn that counts their relative value and subtracts : 83-84 sec

The good thing about the dummy scoring-fn is that at the end I can verify it brought back the move with :value 9, so that is good news. However, no matter how much I tried to tune this, it seems that just counting the pieces is 7 times more expensive than generating random ints! In addition, accessing the :value key of the pieces (they are records) and subtracting their sums is an extra 20% on top of that! These are the best times I can report - mind you, I started with 127 and 168 sec respectively...
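For completeness, the two real scoring fns are essentially this shape (names made up for the sketch; assuming directions are +1/-1 so (- dir) flips to the opponent, and reusing the gather-team shown further down):

(defn count-score
  "My pieces minus the opponent's pieces."
  [b dir]
  (- (count (gather-team b dir))
     (count (gather-team b (- dir)))))

(defn value-score
  "Sum of :value over my pieces minus the opponent's."
  [b dir]
  (letfn [(team-value [d] (r/fold + (r/map :value (gather-team b d))))]
    (- (team-value dir) (team-value (- dir)))))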

Now that we've established how much cheaper it is to generate random ints, let's move on to the serious bit. For the next experiment the scoring-fn is locked (rand-ints) but I'm using the real next-level fn, again after making some optimisations...


--with the dummy board generation (repeat blah blah), as we saw above, it takes 8-9 sec.
--with the real board generation it takes forever!!! I can't even measure how long, because I can't wait that long.

Trying with the 2 real fns does not make any sense at this point... it is pretty clear that 'next-level' is the culprit with regards to performance. So here it is:
------------------------------------------------------------------------------------------------------------------------------------------------------------------
(defn next-level [b dir]
  (r/map #(Move->Board. % (core/try-move %))
         (core/team-moves @curr-game b dir))) ;;curr-game is a promise

(definline team-moves
  [game b dir]
  `(let [team# (gather-team ~b ~dir)
         tmvs# (r/mapcat (fn [p#] (r/map #(dest->Move ~game p# %) (getMoves p#))) team#)]
     (into [] tmvs#)))

(definline gather-team
  "Returns all the pieces with same direction dir on this board b."
  [b dir]
  `(into [] (r/filter #(= ~dir (:direction %)) ~b))) ;all the team-mates (with same direction)

(definline dest->Move
  "Helper fn for creating moves."
  [dm p dest]
  `(Move. ~p (partial move ~dm) ~dest))

(defn move
  "The function responsible for moving Pieces. Each piece knows how to move itself.
  Returns the resulting board without making any state changes."
  [game-map p coords]
  ;;{:pre [(satisfies? Piece p)]}  ;safety comes first
  ;;(if (some #{coords} (:mappings game-map)) ;check that the position exists on the grid
  (let [newPiece (update-position p coords) ;the new piece as a result of moving
        old-pos  (getListPosition p)
        new-pos  (getListPosition newPiece)] ;;piece is a record
    (-> @(:board-atom game-map) ;deref the appropriate board atom
        (transient)
        (assoc! old-pos nil)
        (assoc! new-pos newPiece)
        (persistent!)
        #_(populate-board))) ;replace dead-pieces with nils
  #_(throw (IllegalStateException. (str coords " is NOT a valid position according to the mappings provided!"))))

(defn collides?
  "Returns true if the move from [sx sy] to [ex ey] collides with any friendly pieces.
  The move will be walked step by step by the walker fn."
  [[sx sy] [ex ey] walker b m dir]
  (loop [[imm-x imm-y] (if (nil? walker) [ex ey] (walker [sx sy]))] ;if walker is nil make one big step to the end (for the knight)
    (cond
      (= [ex ey] [imm-x imm-y]) ;reached the destination:
      (= dir (:direction (b (translate-position ex ey m)))) ;collision only if a friendly piece sits there
      (not (nil? (get b (translate-position imm-x imm-y m)))) true ;any piece blocking the path counts as a collision
      :else (recur (walker [imm-x imm-y])))))

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

The only thing not shown here (unless I messed up) is getMoves, which basically goes into a special namespace where the core.logic code lives. There is a separate fn for each piece that finds the available moves. However, the potential moves are calculated by the core.logic engine on an empty board: the rules contain only the logical restrictions of chess - they don't take into account the actual game being played at any given moment. For this reason each move has to be 'walked' manually to check whether it collides with any pieces on the board we're actually playing on. This is what 'collides?' does, and so I'm removing every move that does collide...
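That filtering step isn't shown above, but it boils down to something of this shape (a sketch only - the :position field and the per-piece walker are stand-ins for whatever my logic namespace actually passes around):

(defn legal-moves
  "Keeps only the destinations whose path does not collide, walking each candidate with collides?."
  [b m dir p walker]
  (into []
        (r/remove #(collides? (:position p) % walker b m dir)
                  (getMoves p))))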

Now, can you see any place in the fns that 'next-level' depends on that can be optimised any further? I don't think it's reasonable for it to take that long... calling (next-level (start-chess!) -1) once, where start-chess! returns the starting board, takes just over 150 ms, and this is only going to be called 4 times to reach level 4! It does not justify that much delay...
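(If anyone wants to reproduce the 150 ms figure, remember to force the reducer when timing it - r/map on its own only builds a recipe, no work happens until it is reduced:)

(time (count (into [] (next-level (start-chess!) -1))))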

I'm really confused, and visualvm confused me even more... my app reaches 29,000 objects at its peak! The memory profiler says my memory is dominated by core.logic.Substitutions objects (close to 34%)... The thread peak is 12, if I remember correctly (quad-core cpu).

Have I hit the limit? I don't want to think that core.logic is to blame... after all, I went through rough times encoding the rules in core.logic, and I thought I would get performance benefits as well (apart from clarity)...

Jim




