Sorry that it has taken me so long to reply to this - I wanted to make sure that I had a look at rrb-vector and properly digested the points that you have made.
1. I've dumped the supervector idea :-) I ran some timings and, as the number of levels of supervector increased, iteration times degraded faster than I had expected - exactly as you said.

2. I've had a look at rrb-vector - very fast for catvec, agreed - but there does appear to be a performance penalty in terms of conj[!]-ing - I can post my results if you are interested. I read the Bagwell paper on which your work was based and came away with the impression that he did not think he was introducing significant overhead with his improvements for constant time concatenation - so maybe this is due to rrb-vector being implemented in Clojure and the builtin vector in Java?

3. Whilst doing this, I have also been looking at Sumatra (http://openjdk.java.net/projects/sumatra/) and considering how I would leverage all my GPGPU's cores to do this sort of thing efficiently. I came up with a very specific solution for mapping a function across a vector and into another vector which is extremely lightweight in both sequential and parallel modes:

(def a (into [] (range 1000000)))

(time (do (into [] a) nil))       ;; 124 ms
(time (do (mapv identity a) nil)) ;; 220 ms
(time (do (vmap identity a) nil)) ;; 140 ms
(time (do (mapv inc a) nil))      ;; 280 ms
(time (do (vmap inc a) nil))      ;; 240 ms

(def n (/ (count a) (.availableProcessors (Runtime/getRuntime)))) ;; = 62500 ;; I have 16 :-)

(do (time (r/fold (r/monoid into vector) conj a)) nil)                     ;; 380 ms
(do (time (r/fold (r/monoid v/catvec v/vector) conj a)) nil)               ;; 380 ms
(do (time (r/fold (r/monoid into vector) conj (r/map inc a))) nil)         ;; 620 ms
(do (time (r/fold (r/monoid v/catvec v/vector) conj (r/map inc a))) nil)   ;; 680 ms
(do (time (r/fold n (r/monoid into vector) conj (r/map inc a))) nil)       ;; 590 ms
(do (time (r/fold n (r/monoid v/catvec v/vector) conj (r/map inc a))) nil) ;; 320 - 520 ms - erratic!
(time (count (vmap inc a)))   ;; 230 ms ;; sequential
(time (count (fjvmap inc a))) ;; 40 ms  ;; parallel

(= (time (r/fold n (r/monoid v/catvec v/vector) conj (r/map inc a)))
   (pvmap inc a)) ;; -> true

The timings are very rough, but give a reasonable idea of the performance differences.

My code is hidden behind vmap (sequential) and fjvmap (parallel, using fork/join). vmap is very simple - it understands a vector's internal structure: basically a tree in which each node holds an array of size 32 containing either branches or leaves (but not both). It recurses down to a node containing a source array, makes a corresponding target array, then applies either itself recursively (if it is operating at branch level) or the function to be applied (if it is operating at leaf level) to each source array item, putting the result into the same slot in the target array. fjvmap does the same, but hands off each of the 32 top-level jobs to a fork/join pool.

I came across this approach because it was the way that I expected the GPU to want its work laid out for it - an array of inputs and an array into which to place the corresponding outputs, which could be handed off to a workgroup of e.g. 32 GPU cores - but then I found that it worked very well without a GPU, since it bypasses all the infrastructure that makes seqs look the same and only deals with a particular case.

This got me thinking, though. core.reducers already makes a special case for vectors, so I don't think I am cheating. It seems to me that at an application level we should think of seqs as different impls of the same abstraction, but as library writers we should be prepared to leverage specific knowledge to provide a more performant solution. Seq impls do break down in a natural and therefore efficient way - many into trees of arrays of size 32. Taking advantage of this seems to yield better performance and less churn, so maybe we should pursue it. I'll be checking the source for vmap and fjvmap into seqspert soon if anyone is interested.
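To make the recursion concrete, here is a minimal sketch of the idea (not the actual seqspert code - `vmap-array` and `fjvmap-sketch` are hypothetical helpers working on plain object arrays rather than real PersistentVector internals, and futures stand in for fork/join tasks):

```clojure
(defn vmap-array
  "Sketch of the sequential recursion: maps f over one node of the
  tree. level 0 means src is a leaf array of values; otherwise src
  is an array of child nodes. A fresh target array of the same size
  is allocated at every node, and each result lands in the same slot
  as its source."
  [f ^objects src level]
  (let [n   (alength src)
        dst (object-array n)]
    (dotimes [i n]
      (aset dst i
            (if (zero? level)
              (f (aget src i))
              (vmap-array f (aget src i) (dec level)))))
    dst))

(defn fjvmap-sketch
  "Sketch of the parallel variant: submits the recursion for each
  top-level child as its own task (futures here, fork/join in the
  real thing) and waits for them all to finish."
  [f ^objects src level]
  (let [n   (alength src)
        dst (object-array n)
        fs  (mapv (fn [i]
                    (future
                      (aset dst i (vmap-array f (aget src i) (dec level)))))
                  (range n))]
    (run! deref fs)
    dst))
```

Because each node's array is at most 32 wide, the top level hands the pool at most 32 independent jobs, which matches the workgroup layout a GPU would want.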
I am going to spend some time looking at fitting it into the core.reducers framework. I've also discovered that the APU (CPU/GPU combo) that I am thinking of buying seems to arrange its GPU cores in workgroups of 64, not 32 :-), so I might have to investigate parameterising the branching factor of the builtin vector. I am also very interested in rrb-vector because it allows different-sized nodes. If I were mapping over a Clojure hashmap/set, which are tries of size-32 arrays, and wanted to build up a result vector in the same way as vmap does, I would need a vector type with variable-sized nodes - so I will try to find time to look into this as well....

Sorry the posting is so long - I wanted to just put it all out there and see if anyone was interested in discussing it further - it is always good to bounce stuff off other people.

That's all folks!

Jules

On Friday, 21 February 2014 01:09:26 UTC, Jean Niklas L'orange wrote:
>
> Hi Jules,
>
> On Thursday, February 20, 2014 11:59:03 PM UTC+1, Jules wrote:
>>
>> Subvec provides a view on a subseq of a vector that behaves like a full
>> vector. Supervec provides a similar view that makes two vectors behave like
>> a single one
>>
>
> This data structure ("supervec") is usually known as a rope, and your
> implementation degrades the *~O(1)* to worst case *O(k)* time for
> lookups, assoc, conj and pop, where *k* is the amount of concatenations
> performed. A good rope implementation can reduce that factor down to *O(log
> k)* through balancing, but it will still break the performance guarantees
> a persistent vector is supposed to have. If you try to use this in the
> reducers library, the amount of concatenations fork/join performs might
> actually show notable performance degradation for those operations.
> Further, passing it around to code which might perform more concatenations
> will lead to even worse performance, which may be hard to debug for people
> not knowing Clojure internals.
>
> However, it is a very nice solution if you know there will be limited
> concatenations, that the concatenations result in a balanced tree, and you
> don't pass this vector around to other people. :)
>
> For a small discussion on this vs. RRB-trees, see Section 1.1 in
> http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf.
>
> Hopefully I'm not destroying your fun playing around with these
> things—that is not the intent at all. I'm just saying that these things are
> (sadly?) harder than it first looks like.
>
> -- JN

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
