I've gotten really fast diffing in Clojure by using the following concepts:

1) If you can sometimes make parts of A and B be identical, then your
differ can skip checking those parts by doing a (identical? A B), this
improves performance
2) Take a side effecting approach, pass into the differ a function of (fn
[path a-val b-val] ...), and whenever you see a difference call that
function, this turns the differ into a event generator. From there it's
trivial to use transients, tuples, etc. to improve performance
3) If you can restrict yourself to maps, vectors and seqs, you can use
reduce-kv for maps and vectors, and walk the seq with a index count for the
latter. This will result in some rather efficient walking.

I used these methods in writing my data differ for Odin:
https://github.com/halgari/odin/blob/update-indicies/src/com/tbaldridge/odin/contexts/data.clj#L259-L384


Not perfect, but it's the fastest method I've come up with so far. Could
probably also replace the calls to condp with nested case statements.

Timothy

On Wed, Apr 19, 2017 at 5:32 PM, lvh <_...@lvh.io> wrote:

> Hi,
>
>
> I have two deeply nested data structures (consisting of maps, vecs and the
> occasional seq; althoguh I can make it maps and vecs consistently if need
> be). I want to compute (and store) diffs; ideally diffs that store [path
> oldval newval] so that I can apply them in either direction.
>
> Using clojure.data/diff on them takes a long time (well north of 10
> minutes on my new laptop).
>
> If I flatten these nested map out to entries they have about 2E5 entries.
> I'm expecting between 1E5 and 1E6 entries per map. These maps  represent
> the same data at two close points in time, so I'm expecting small
> differences. The tree is unbalanced: it has inconsistent depth and
> branching factors, but they're still going to be consistent between
> snapshots.
>
> Here are some ideas I'm trying (but I'm open to suggestions, experiences):
>
> - The machines I'm doing this on have plenty of beefy cores. Since the
> data structures are immutable, I should be able to parallelize this
> operation somewhat, even if it's only a constant speedup of ~4x or so. (I
> care about minor speedups since it takes 10 minutes, not 10 hours, to do
> the diff right now -- so it's entirely possible that enough small speedups
> add up.)
>
> - clojure.data/diff builds a giant data structure of things that are the
> same. I don't care about the parts that are the same; just parts that are
> different. That takes time.
>
> - clojure.data/diff doesn't use transients. While I'm not expecting a lot
> of diffs, this might be a speedup.
>
> I've found https://groups.google.com/forum/#!topic/clojure/VPpjlRC2INg ,
> but it appears that mostly doesn't go anywhere unless I want to maintain
> something that knows a lot about internal Clojure data structures :)
>
>
> lvh
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
“One of the main causes of the fall of the Roman Empire was that–lacking
zero–they had no way to indicate successful termination of their C
programs.”
(Robert Firth)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to