Re: parallel sequence side-effect processor

Francis Avila Fri, 23 Sep 2016 16:16:12 -0700

There are a few intermediate collections here:


   1. The source coll may produce a seq object. How costly this is depends 
   on the type of coll and the quality of its iterator/ireduce/seq 
   implementations.
   2. You may need to collect multiple source colls into a tuple-like thing 
   to produce a single object for the side-effecting function.
   3. You may have an intermediate seq/coll of these tuple-like things.
   4. You may have a useless seq/coll of "output" from the side-effecting 
   function.

In the single-coll case:

(map f col1) pays 1,4.
(doseq [x col1] (f x)) pays 1.
(run! f col1) pays 1 if col1 has an inefficient IReduce, otherwise it pays 
nothing.
(fold f col1) is the same (using reducers r/fold protocol for vectors, 
which ultimately uses IReduce)
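
A minimal sketch of the single-coll variants above using only clojure.core. 
The function record! and the atom seen are hypothetical stand-ins for a real 
side effect; the cost numbers in the comments refer to the list at the top.

```clojure
;; Hypothetical side-effecting function: appends each item to an atom.
(def seen (atom []))
(defn record! [x] (swap! seen conj x))

(def col1 (vec (range 5)))

;; map is lazy, so it has to be forced with dorun. It walks a seq over
;; col1 (1) and produces a seq of return values nobody reads (4).
(dorun (map record! col1))

;; doseq walks a seq over col1 (1) but keeps no output.
(doseq [x col1] (record! x))

;; run! is built on reduce, so the collection drives the loop itself.
(run! record! col1)

(count @seen) ;; => 15 (three passes over five items)
```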

In the multi-coll case:

(map f col1 col2) pays all four.
(run! (fn [[a b]] (f a b)) (map vector col1 col2)) pays 1, 2, and 3.
(doseq [[a b] (map vector col1 col2)] (f a b)) pays 1, 2, 3.
(fold f col1 col2) pays 1 from what I can see? (It uses first+next to walk 
over the items stepwise? There's a lot of indirection so I'm not 100% sure 
what the impl is for vectors that actually gets used.)
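
For reference, the three clojure.core forms above written out in full. This 
is only a sketch: record-sum! and the atom out are hypothetical names used to 
make the side effect observable, and the cost numbers in the comments repeat 
the estimates above rather than anything measured.

```clojure
(def col1 [0 1 2])
(def col2 [10 20 30])

;; Hypothetical side effect: remember the sum of each pair.
(def out (atom []))
(defn record-sum! [a b] (swap! out conj (+ a b)))

;; Variadic map: lazy, so it must be forced; the seq of results is thrown away.
(dorun (map record-sum! col1 col2))

;; run! over tuples: one vector per step (2) plus the lazy seq holding them (3).
(run! (fn [[a b]] (record-sum! a b)) (map vector col1 col2))

;; doseq over tuples: same intermediate tuples and seq as the run! form.
(doseq [[a b] (map vector col1 col2)] (record-sum! a b))

@out ;; => [10 21 32 10 21 32 10 21 32]
```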

There is no way to avoid 1 in the multi-coll case (or 2 if you are fully 
variadic); all you can do is use the most efficient possible intermediate 
object to track the traversal. Iterators are typically cheaper than seqs, 
so the ideal case would be a loop-recur over multiple iterators.
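
A rough sketch of what a loop-recur over two iterators could look like. The 
name run-pairs! is hypothetical (it is not in clojure.core), and the sketch 
assumes both arguments implement java.lang.Iterable, which holds for vectors, 
lists and lazy seqs but not for strings, arrays or nil.

```clojure
(defn run-pairs!
  "Hypothetical helper: calls (f a b) for side effects on the items of c1
  and c2 in step, stopping when either is exhausted. Returns nil."
  [f c1 c2]
  (let [^java.util.Iterator i1 (.iterator ^Iterable c1)
        ^java.util.Iterator i2 (.iterator ^Iterable c2)]
    (loop []
      (when (and (.hasNext i1) (.hasNext i2))
        (f (.next i1) (.next i2))
        (recur)))))

;; Usage: no tuples and no output seq are built by the helper itself.
(def log (atom []))
(run-pairs! (fn [a b] (swap! log conj (str a "-" b))) [0 1 2] '(2 1 0))
@log ;; => ["0-2" "1-1" "2-0"]
```

The only intermediate objects are the two iterators, which is cost 1 in the 
list above. A fully variadic version would also need some per-step container 
for the arguments (cost 2).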

In the multi-coll case there is also no way IReduce can help. IReduce is a 
trade: you give up the power to see each step of iteration in order to 
allow the collection to perform the overall reduction operation more 
efficiently. However, with multi-coll you really do need to control the 
iteration so you can get all the items at an index together.
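
To make the limitation concrete, here is a sketch (the name run-pairs-reduce! 
is hypothetical) that lets reduce drive the first collection. The second 
collection still has to be stepped by hand as a seq carried in the 
accumulator, so only one of the two traversals benefits from IReduce.

```clojure
(defn run-pairs-reduce!
  "Hypothetical helper: reduce drives c1; c2 is walked manually as a seq
  held in the accumulator. Stops when either is exhausted. Returns nil."
  [f c1 c2]
  (reduce (fn [remaining a]
            (if-let [s (seq remaining)]
              (do (f a (first s))
                  (rest s))
              (reduced nil)))   ; c2 ran out first: stop early
          c2
          c1)
  nil)

(def log (atom []))
(run-pairs-reduce! (fn [a b] (swap! log conj (+ a b))) [0 1 2] [10 20 30])
@log ;; => [10 21 32]
```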

The ideal for multi-collection would probably be something that internally 
looks like clojure.core/sequence but doesn't accumulate the results. 
(Unfortunately some of the classes necessary to do this (MultiIterator) are 
private.)
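
For comparison, this is the multi-collection arity of clojure.core/sequence 
as it exists today. It steps through the inputs together, but it also builds 
the result seq, which is exactly the accumulation one would want to skip for 
pure side effects. The atom calls is only there to show the function ran.

```clojure
(def calls (atom 0))

;; The map transducer receives one item from each input per step.
(def result
  (sequence (map (fn [a b] (swap! calls inc) (+ a b)))
            [0 1 2]
            [10 20 30]))

(vec result) ;; => [10 21 32]
@calls       ;; => 3
```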

Fluokitten could probably do it with some tweaking to its 
algo/collection-foldmap to use iterators where possible instead of 
first/next.


On Friday, September 23, 2016 at 5:23:51 PM UTC-5, Dragan Djuric wrote:
>
> fluokitten's fold is MUCH better than (map f a b) because it does NOT 
> create intermediate collections. just use (fold f a b) and it would fold 
> everything into one thing (in this case nil). If f is a function with side 
> effects, it will invoke them. No intermediate collection is created AND the 
> folding would be optimized per the type of a.
>
> On Friday, September 23, 2016 at 10:56:00 PM UTC+2, tbc++ wrote:
>>
>> How is fluokitten's fold any better than using seqs like (map f a b) 
>> would? Both create intermediate collections.
>>
>> On Fri, Sep 23, 2016 at 11:40 AM, Dragan Djuric <drag...@gmail.com> 
>> wrote:
>>
>>> If you do not insist on vanilla clojure, but can use a library, fold 
>>> from fluokitten might enable you to do this. It is similar to reduce, but 
>>> accepts multiple arguments. Give it a vararg folding function that prints 
>>> what you need and ignores the first parameter, and you'd get what you asked 
>>> for.
>>>
>>>
>>> On Friday, September 23, 2016 at 7:15:42 PM UTC+2, Mars0i wrote:
>>>>
>>>> On Friday, September 23, 2016 at 11:11:07 AM UTC-5, Alan Thompson wrote:
>>>>>
>>>>> Huh.  I was also unaware of the run! function.
>>>>>
>>>>> I suppose you could always write it like this:
>>>>>
>>>>> (def x (vec (range 3)))
>>>>> (def y (vec (reverse x)))
>>>>>
>>>>> (run!
>>>>>   (fn [[x y]] (println x y))
>>>>>
>>>>>   (map vector x y))
>>>>>
>>>>>
>>>>>  > lein run
>>>>> 0 2
>>>>> 1 1
>>>>> 2 0
>>>>>
>>>>>
>>>> Yes.  But that's got the same problem.  Doesn't matter with a toy 
>>>> example, but the (map vector ...) could be undesirable with large 
>>>> collections in performance-critical code.
>>>>
>>>>> although the plain old for loop with dotimes looks simpler:
>>>>>
>>>>> (dotimes [i (count x)]
>>>>>   (println (x i) (y i)))
>>>>>
>>>>>
>>>>> maybe that is the best answer? It is hard to beat the flexibility of a 
>>>>> loop and an explicit index.
>>>>>
>>>>
>>>> I agree that this is clearer, but it kind of bothers me to index 
>>>> through a vector sequentially in Clojure.  We need indexing in Clojure 
>>>> because sometimes you need to access a vector more arbitrarily.  If you're 
>>>> just walking the vector in order, we have better methods--as long as we 
>>>> don't want to walk multiple vectors in the same order for side effects.
>>>>
>>>> However, the real drawback of the dotimes method is that it's not 
>>>> efficient for the general case; it could be slow on lists, lazy sequences, 
>>>> etc. (again, on non-toy examples).  Many of the most convenient Clojure 
>>>> functions return lazy sequences.  Even the non-lazy sequences returned by 
>>>> transducers aren't efficiently indexable, afaik.  Of course you can always 
>>>> throw any sequence into 'vec' and get out a vector, but that's an 
>>>> unnecessary transformation if you just want to iterate through the 
>>>> sequences element by element.
>>>>
>>>> If I'm writing a function that will plot points or that will write data 
>>>> to a file, it shouldn't be a requirement for the sake of efficiency that 
>>>> the data come in the form of vectors.  I should be able to pass in the 
>>>> data in whatever form is easiest.  Right now, if I wanted efficiency for 
>>>> walking through sequences in the same order, without creating unnecessary 
>>>> data structures, I'd have to write the function using loop/recur.  On the 
>>>> other hand, if I wanted the cross product of the sequences, I'd use doseq 
>>>> and be done a lot quicker with clearer code.
>>>>
>>>
>>
>>
>>
>> -- 
>> “One of the main causes of the fall of the Roman Empire was that–lacking 
>> zero–they had no way to indicate successful termination of their C 
>> programs.”
>> (Robert Firth) 
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
