Hi,

I have two deeply nested data structures (consisting of maps, vecs and the
occasional seq; although I can make it maps and vecs consistently if need
be). I want to compute (and store) diffs; ideally diffs that store [path
oldval newval] so that I can apply them in either direction.
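
To make the format concrete, here is a minimal sketch of how [path oldval
newval] entries could be applied in either direction. The names put-in,
apply-diff and revert-diff, and the :diff/absent marker for a map key that
is missing on one side, are made up for this example; paths are assumed to
be vectors.

```clojure
(defn- put-in
  "Sets the value at path. When v is :diff/absent (a hypothetical marker),
  removes the map key at the end of path instead."
  [data path v]
  (cond
    (empty? path)       v
    (= v :diff/absent)  (if (next path)
                          (update-in data (pop path) dissoc (peek path))
                          (dissoc data (first path)))
    :else               (assoc-in data path v)))

(defn apply-diff
  "Moves data from the old snapshot to the new one."
  [data entries]
  (reduce (fn [acc [path _old new-val]] (put-in acc path new-val))
          data
          entries))

(defn revert-diff
  "Moves data from the new snapshot back to the old one, undoing the
  entries in reverse order."
  [data entries]
  (reduce (fn [acc [path old-val _new]] (put-in acc path old-val))
          data
          (reverse entries)))
```

Since each entry carries both values, the same stored diff serves both
directions; only the choice of oldval or newval changes.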

Using clojure.data/diff on them takes a long time (well north of 10 minutes
on my new laptop).

If I flatten these nested maps out to entries, they have about 2E5 entries.
I'm expecting between 1E5 and 1E6 entries per map. These maps represent
the same data at two close points in time, so I'm expecting small
differences. The tree is unbalanced: depth and branching factor vary from
one subtree to the next, but the overall shape is going to be consistent
between snapshots.
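
By "flatten out to entries" I mean something like the following sketch,
which turns a nested structure into [path leaf] pairs. flatten-entries is a
name made up for this example, and it assumes only maps and vectors nest;
anything else (including a seq) is treated as a leaf.

```clojure
(defn flatten-entries
  "Returns a lazy seq of [path leaf] pairs for nested maps and vectors.
  Empty maps and vectors contribute no entries."
  ([data] (flatten-entries [] data))
  ([path data]
   (cond
     (map? data)
     (mapcat (fn [[k v]] (flatten-entries (conj path k) v)) data)

     (vector? data)
     (mapcat (fn [i v] (flatten-entries (conj path i) v)) (range) data)

     :else
     [[path data]])))
```

Counting the result, e.g. (count (flatten-entries snapshot)), gives the
entry count referred to above; snapshot stands in for one of the two
structures.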

Here are some ideas I'm trying (but I'm open to suggestions, experiences):

- The machines I'm doing this on have plenty of beefy cores. Since the data
structures are immutable, I should be able to parallelize this operation
somewhat, even if it's only a constant speedup of ~4x or so. (I care about
minor speedups since it takes 10 minutes, not 10 hours, to do the diff
right now -- so it's entirely possible that enough small speedups add up.)

- clojure.data/diff builds a giant data structure of things that are the
same. I don't care about the parts that are the same; just the parts that
are different. Building the "same" structure takes time.

- clojure.data/diff doesn't use transients. While I'm not expecting a lot
of diffs, this might be a speedup.
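
The three ideas above could be combined roughly as follows. This is only a
sketch under assumptions, not a drop-in replacement for clojure.data/diff:
path-diff, parallel-path-diff and the :diff/absent marker are names made up
for this example, only maps and same-length vectors are descended into, and
anything else that is not equal is recorded as one whole-value change. It
skips identical subtrees with identical?, records only differences, collects
them in a transient vector, and uses pmap over the top-level keys.

```clojure
(declare diff-into!)

(defn- diff-maps!
  "Diffs two maps key by key, recording keys present on only one side."
  [acc path a b]
  (let [acc (reduce-kv
              (fn [acc k av]
                (if (contains? b k)
                  (diff-into! acc (conj path k) av (get b k))
                  (conj! acc [(conj path k) av :diff/absent])))
              acc a)]
    (reduce-kv
      (fn [acc k bv]
        (if (contains? a k)
          acc
          (conj! acc [(conj path k) :diff/absent bv])))
      acc b)))

(defn- diff-vecs!
  "Diffs two vectors of equal length index by index."
  [acc path a b]
  (reduce (fn [acc i] (diff-into! acc (conj path i) (nth a i) (nth b i)))
          acc
          (range (count a))))

(defn- diff-into!
  "Appends [path old new] entries for a vs b to the transient vector acc."
  [acc path a b]
  (cond
    (identical? a b)           acc   ; shared subtree, nothing to walk
    (and (map? a) (map? b))    (diff-maps! acc path a b)
    (and (vector? a) (vector? b) (= (count a) (count b)))
    (diff-vecs! acc path a b)
    (= a b)                    acc
    :else                      (conj! acc [path a b])))

(defn path-diff
  "Returns a vector of [path old new] entries, differences only."
  [a b]
  (persistent! (diff-into! (transient []) [] a b)))

(defn parallel-path-diff
  "Like path-diff, but diffs each top-level key in its own pmap task.
  Falls back to path-diff when the roots are not both maps."
  [a b]
  (if (and (map? a) (map? b))
    (into []
          cat
          (pmap (fn [k]
                  (cond
                    (not (contains? b k)) [[[k] (get a k) :diff/absent]]
                    (not (contains? a k)) [[[k] :diff/absent (get b k)]]
                    :else (persistent!
                            (diff-into! (transient []) [k]
                                        (get a k) (get b k)))))
                (distinct (concat (keys a) (keys b)))))
    (path-diff a b)))
```

The identical? check only pays off if the newer snapshot was derived from
the older one via assoc/update, so that unchanged subtrees are shared. With
an unbalanced tree, splitting on top-level keys may also leave most of the
work in one task. When run as a script, pmap's thread pool keeps the JVM
alive for a while unless (shutdown-agents) is called at the end.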

I've found https://groups.google.com/forum/#!topic/clojure/VPpjlRC2INg ,
but that thread mostly doesn't seem to go anywhere unless I want to maintain
something that knows a lot about internal Clojure data structures :)


lvh

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en