Hey Jules, 

Really nice stuff you're making!
One note about your SuperVecs idea, though: it seems that with that 
approach, repeated concatenation could leave us with a deeply nested tree 
representing a vector (each lookup would then have to walk down through every 
level of nesting), and I think that is exactly what vectors are trying to avoid.
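
For example (illustrative only - assuming the 'supervec' from your post), 
something like (reduce supervec (repeat 1000 [0 1 2])) would leave a view 
nested roughly 1000 levels deep, so every lookup has to walk that whole chain 
instead of the usual shallow 32-way tree.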


On Friday, February 21, 2014 1:39:56 AM UTC+2, Jules wrote:
>
> Hi, Andy.
>
> No I haven't - Thanks for the pointer - I'll take a look.
>
> I have a very specific use case - speeding up the re-combination of the 
> results of parallel reductions.
>
> I've found that the cost of this recombination often impacts badly on the 
> overall timing of such a reduction, but with a little thought we can speed 
> it up because we know that both seqs involved in the recombination are of 
> the same type...
>
> Given the proliferation of cores and FP's promise of simple and elegant 
> parallel code, I think it is critical that Clojure really delivers on this 
> promise, and this issue is the last blocker for me for a couple of use 
> cases that I have :-)
>
> I hope that explains where I am coming from - does it seem reasonable? Am 
> I missing anything?
>
> thanks again,
>
>
> Jules
>   
>
> On Thursday, 20 February 2014 23:25:29 UTC, Andy Fingerhut wrote:
>>
>> Have you looked at core.rrb-vector, which implements Relaxed Radix Balanced 
>> trees for vectors, supporting subvec and the concatenation of two arbitrary 
>> vectors in O(log n) time?
>>
>>     https://github.com/clojure/core.rrb-vector
>>
>> Unless I am missing something, the fast concatenation is what you are 
>> trying to achieve?
>>
>> Andy
>>
>>
>> On Thu, Feb 20, 2014 at 2:59 PM, Jules <jules....@gmail.com> wrote:
>>
>>>
>>> So, having broken the back of fast re-combination of hash sets and maps, 
>>> I wanted to take a look at doing a similar sort of thing for vectors - 
>>> another type of seq that I use very heavily in this sort of situation.
>>>
>>> Let's see the cake first:
>>>
>>> seqspert.vector=> (def a (into [] (range 10000000)))
>>> #'seqspert.vector/a
>>> seqspert.vector=> (time (def b (into a a)))
>>> "Elapsed time: 3695.682654 msecs"
>>> #'seqspert.vector/b
>>> seqspert.vector=> (time (def c (supervec a a)))
>>> "Elapsed time: 0.253922 msecs"
>>> #'seqspert.vector/c
>>> seqspert.vector=> (= b c)
>>> true
>>>
>>> As you can see, the usual method of combining two vectors into a third 
>>> takes 3-4 secs to combine two 10M entry vectors.
>>>
>>> Using 'supervec' it takes MUCH less time.....
>>>
>>> How ?
>>>
>>> I looked at vector's impl and quickly realised that I would not be able 
>>> to use the same trick that I had used for hash set/map. Then I looked at 
>>> subvec and realised that I could do a very similar trick to create supervec.
>>>
>>> Subvec provides a view on a subseq of a vector that behaves like a full 
>>> vector. Supervec provides a similar view that makes two vectors behave like 
>>> a single one:
>>>
>>> seqspert.vector=> (supervec [1 2 3] [4 5 6])
>>> [1 2 3 4 5 6]
>>> seqspert.vector=> (type (supervec [1 2 3] [4 5 6]))
>>> clojure.lang.APersistentVector$SuperVector
>>>
>>> We can use seqspert (see my other posting this evening) to look inside a 
>>> SuperVector:
>>>
>>> seqspert.vector=> (decloak (supervec [1 2 3] [4 5 6]))
>>> #seqspert.vector.SuperVector{:left #seqspert.vector.Vector{:cnt 3, 
>>> :shift 5, :root #seqspert.vector.VectorNode{:array [nil nil nil nil nil nil 
>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil 
>>> nil nil nil nil nil nil nil]}, :tail [1 2 3]}, :right 
>>> #seqspert.vector.Vector{:cnt 3, :shift 5, :root 
>>> #seqspert.vector.VectorNode{:array [nil nil nil nil nil nil nil nil nil nil 
>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil 
>>> nil nil nil]}, :tail [4 5 6]}, :middle 3, :count 6}
>>> seqspert.vector=> (p/pprint (decloak (supervec [1 2 3] [4 5 6])))
>>> {:left
>>>  {:cnt 3,
>>>   :shift 5,
>>>   :root
>>>   {:array
>>>    [nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil]},
>>>   :tail [1 2 3]},
>>>  :right
>>>  {:cnt 3,
>>>   :shift 5,
>>>   :root
>>>   {:array
>>>    [nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil
>>>     nil]},
>>>   :tail [4 5 6]},
>>>  :middle 3,
>>>  :count 6}
>>> nil
>>> seqspert.vector=> 
>>>
>>> It looks very complicated - but that is just the two Clojure vectors held 
>>> inside! A SuperVector has fields: left and right (the two vectors that we 
>>> are combining), middle (the length of the left vector), and count (the sum 
>>> of the counts of left and right).
>>>
>>> With just this info and very little code, I found myself with a working 
>>> impl (except for 'pop' - but who 'pop's a vector?).
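>>>
>>> To illustrate the idea (just a sketch in Clojure over a plain map with 
>>> those fields - the real implementation lives in Java, and 'supervec-nth' 
>>> is a hypothetical name), an indexed lookup only has to compare the index 
>>> against middle and delegate to the appropriate child:
>>>
>>> (defn supervec-nth
>>>   "Hypothetical sketch: indices below middle go to the left vector,
>>>    the rest go to the right vector, offset by middle."
>>>   [{:keys [left right middle]} i]
>>>   (if (< i middle)
>>>     (nth left i)
>>>     (nth right (- i middle))))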
>>>
>>> OK - there is a tiny extra runtime overhead associated with using a 
>>> supervec - but, when you look at what is involved in dereffing into a 
>>> vector, this quickly appears insignificant. It's about the same as using a 
>>> subvec.
>>>
>>> The SuperVector code is checked into my clojure fork on GitHub - see the 
>>> beginning of this thread. Seqspert is also on GitHub - see the announcement 
>>> I posted this evening.
>>>
>>> Finally 'supervec' is currently defined as:
>>>
>>> (defn supervec [l r] (APersistentVector$SuperVector. {} l r))
>>>
>>> I am thinking of renaming 'splice' from my earlier posting to 'combine' 
>>> and creating a Combinable interface, to be implemented by any seq type that 
>>> is able to do a fast re-combination with another seq of the same type. This 
>>> could be leveraged by e.g. 'into' to deliver significant performance 
>>> benefits and footprint reduction - see the sketch below.
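>>>
>>> As a rough sketch only (names, arities and placement are not settled; it is 
>>> shown here as a Clojure protocol rather than a Java interface purely for 
>>> illustration, and 'fast-into' is just an illustrative helper, not a 
>>> proposed API):
>>>
>>> (defprotocol Combinable
>>>   (combine [this other]
>>>     "Cheaply combine this seq with another seq of the same type."))
>>>
>>> (defn fast-into
>>>   "Use the fast combination when both args support it and share a type,
>>>    otherwise fall back to the ordinary element-by-element into."
>>>   [to from]
>>>   (if (and (satisfies? Combinable to) (= (type to) (type from)))
>>>     (combine to from)
>>>     (into to from)))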
>>>
>>> more as and when,
>>>
>>>
>>> Jules
>>>
>>>
>>>
>>>
>>>
>>> On Saturday, 15 February 2014 23:06:24 UTC, Jules wrote:
>>>
>>>> Guys,
>>>>
>>>> I've been playing with reducers on and off for a while but have been 
>>>> frustrated because they don't seem to fit a particular use case that I 
>>>> have in mind... specifically: getting as many associations into a hash-map 
>>>> as I can in as short a time as possible.
>>>>
>>>> My understanding of the reason for this is that reducers practice a 
>>>> divide-and-conquer strategy. The incoming sequence is divided up, each 
>>>> sub-sequence is reduced into a sub-result (possibly in parallel), and then 
>>>> the sub-results are combined into the final outgoing result.
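>>>>
>>>> In clojure.core.reducers terms, that looks roughly like the following 
>>>> (just an illustrative sketch - 'parallel-into-set' is not a real function):
>>>>
>>>> (require '[clojure.core.reducers :as r])
>>>>
>>>> (defn parallel-into-set [v]
>>>>   ;; reducef (conj) builds each partial set in parallel; combinef merges
>>>>   ;; two partial sets - and it is this (into s1 s2) step that is costly.
>>>>   (r/fold (fn ([] #{})
>>>>             ([s1 s2] (into s1 s2)))
>>>>           conj
>>>>           v))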
>>>>
>>>> Unfortunately, there does not seem to be a better way of combining two 
>>>> hash-maps other than to read each entry from one and create a new 
>>>> corresponding association in the other. This means that each recombination 
>>>> in the above process essentially repeats most of the work already 
>>>> performed 
>>>> in the previous reduction stage.
>>>>
>>>> Hash-sets are implemented via hash-maps, and are simpler to demonstrate 
>>>> the problem with:
>>>>
>>>> user=> (def a (doall (range 10000000)))
>>>> #'user/a
>>>> user=> (def b (doall (range 5000000 15000000)))
>>>> #'user/b
>>>> user=> (time (def c (into #{} a)))
>>>> "Elapsed time: 6319.392669 msecs"
>>>> #'user/c
>>>> user=> (time (def d (into #{} b)))
>>>> "Elapsed time: 5389.805233 msecs"
>>>> #'user/d
>>>> user=> (time (def e (into c d)))
>>>> "Elapsed time: 8486.032191 msecs"
>>>> #'user/e
>>>>
>>>>
>>>> In the example above, you can see that the reduction into hash-sets of 
>>>> two overlapping lists of 10,000,000 elements takes 6.3 and 5.4 seconds. 
>>>> This stage can be carried out in parallel, i.e. the elapsed time for this 
>>>> stage would be 6.3 seconds - but we now have two hash-sets and we want 
>>>> one, so we have to combine them.
>>>>
>>>>
>>>> user=> (time (def e (into c d)))
>>>> "Elapsed time: 8486.032191 msecs"
>>>> #'user/e
>>>>
>>>> As you can see, all the advantages of splitting the original sequence 
>>>> into two and processing the two halves in parallel are lost, since the 
>>>> recombination of their results takes 8.5 seconds - more than we saved by 
>>>> doing the reduction in parallel.
>>>>
>>>> So, what can we do about it ?
>>>>
>>>> I had a look at the code for PersistentHashMap (PersistentHashSet uses 
>>>> PersistentHashMap internally). I realised that it was possible to "splice" 
>>>> together the internal structure of two hash maps into a single one without 
>>>> repeating most of the work required to build one from scratch. So, I had a 
>>>> go at implementing it:
>>>>
>>>>
>>>> user=> (time (def f (clojure.lang.PersistentHashSet/splice c d)))
>>>> "Elapsed time: 3052.690911 msecs"
>>>> #'user/f
>>>>
>>>> and:
>>>>
>>>> user=> (= e f)
>>>> true
>>>>
>>>> Whilst this is still adding 3 seconds to our time, that 3 seconds is 
>>>> half the time that we would have added had we executed the second 
>>>> reduction 
>>>> in serial, rather than in parallel.
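>>>>
>>>> Plugged into a parallel fold, splice simply takes the place of 'into' in 
>>>> the combining function - roughly like this (a sketch only, calling the 
>>>> static method from my fork; 'parallel-into-set-spliced' is just an 
>>>> illustrative name):
>>>>
>>>> (require '[clojure.core.reducers :as r])
>>>>
>>>> (defn parallel-into-set-spliced [v]
>>>>   ;; combinef splices the two partial sets' internal structure together
>>>>   ;; instead of copying every entry across.
>>>>   (r/fold (fn ([] #{})
>>>>             ([s1 s2] (clojure.lang.PersistentHashSet/splice s1 s2)))
>>>>           conj
>>>>           v))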
>>>>
>>>> This means that we can now reduce large datasets into sets/maps more 
>>>> quickly in parallel than we can in serial :-) As an added benefit, because 
>>>> splice reuses as much of the internal structure of both inputs as 
>>>> possible, its impact in terms of heap consumption and churn is smaller - 
>>>> although I think that a full implementation might add some Java-side code 
>>>> complexity.
>>>>
>>>> If you would like to give 'splice' a try, you will need to clone my fork 
>>>> of Clojure on GitHub:
>>>>
>>>> https://github.com/JulesGosnell/clojure
>>>>
>>>> Please let me know if you try out the code. I would be interested to 
>>>> hear if people think it is worth pursuing.
>>>>
>>>> I was also thinking that it should be possible to use a similar trick 
>>>> to quickly and cheaply split a map/set into [roughly] equal-sized pieces 
>>>> (assuming a good hash distribution). This would enable the use of a 
>>>> map/set as an input sequence to the parallel reduction process outlined 
>>>> above. Currently, I believe that only a vector can be used in this way. 
>>>> It would be harder to arrange for 'count' to be implemented efficiently 
>>>> on these sub-maps/sets, but this is not important for the reduction 
>>>> process.
>>>>
>>>> BTW - benchmarks were run on a 3.2 GHz Phenom II / clojure/master / 
>>>> openjdk-1.7.0_51 / Fedora 20 with min and max heap set to 4 GB.
>>>>
>>>> regards,
>>>>
>>>>
>>>>
>>>> Jules
>>>>
>>>>
>>>
>>
>>
