Re: #{:eduction :performance} Trying to understand when to use eduction

Alex Miller Sun, 19 Jul 2015 09:35:23 -0700


On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:
>
> Hi Leon,
>
> I think this is an edge case related to how varargs functions are 
> implemented in Clojure.
>
> The varargs arity of `max` is implemented with `reduce1`: core.clj line 
> 1088 
> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088>
>
> `reduce1` is a simplified implementation of "reduce" defined early in 
> clojure.core before the optimized reduction protocols have been loaded: 
> core.clj line 894 
> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894>.
>  
> `reduce1` is implemented in terms of lazy sequences, with support for 
> chunking.
>
> So `apply max` defaults to using chunked lazy sequence operations. `map` 
> and `range` both return chunked sequences.
>
> `eduction` returns an Iterable, so when you `apply max` on it, it turns 
> the Iterable into a Seq, but it's not a chunked seq. Therefore, it's 
> slightly slower than `apply max` on a chunked seq.
>


Seqs on eductions *are* chunked. Calling `seq` on an eduction falls into 
this case in RT.java: 
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
 
which produces a chunked sequence over an Iterable.
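
One rough way to observe this at the REPL, as a sketch: count how many inputs 
get transformed when only the first element of the seq is requested. The 
helper name `realized-after-first` is made up for this illustration. If the 
seq is chunked, the count should be about a chunk's worth rather than 1.

```clojure
;; Hypothetical helper (not part of clojure.core): counts how many inputs
;; are pushed through the transducer when we ask only for the first element
;; of a seq taken over an eduction.
(defn realized-after-first []
  (let [calls (atom 0)
        ed    (eduction (map (fn [x] (swap! calls inc) x))
                        (range 100))]
    (first (seq ed))   ; request a single element
    @calls))

;; A result well above 1 indicates that elements were realized a chunk
;; at a time rather than one by one.
(realized-after-first)
```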
 

> In this case, to ensure you're using the fast-path internal reduce over 
> `eduction`, you can use `reduce` directly:
> (reduce max 0 (eduction (map inc) (range 100000)))
> You must provide an init value because `eduction` does not assume the 
> "init with first element" behavior of sequences.
>
> This version, in my informal benchmarking, is the fastest.
>
> Lots of functions in clojure.core use `reduce1` in their varargs 
> implementation. Perhaps they could be changed to use the optimized `reduce`, 
> but this might add a lot of repeated definitions as clojure.core is 
> bootstrapping itself. I'm not sure.
>

For various bootstrapping reasons, this is a hard change.
 

> In general, I would not assume that `eduction` is automatically faster 
> than lazy sequences. It will be faster only in the cases where it can use 
> the optimized reduction protocols such as InternalReduce. If the optimized 
> path isn't available, many operations will fall back to lazy sequences for 
> backwards-compatibility. 
>
> I would suggest using `eduction` only when you *know* you're going to 
> consume the result with `reduce` or `transduce`. As always, test first, and 
> profilers are your friend. :)
>

Use eduction for delayed eager *non-cached* execution. Seqs give you 
delayed *cached* execution. 
If you will consume the result of a transformation only once, or if the 
result would consume too many resources if cached, then use eduction.
If you need to do a transformation once and then use the result multiple 
times, it's better to use `sequence` with a transducer to get the caching 
effect and the benefits of reduced allocation during transformation.
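
A minimal sketch of the cached vs. non-cached difference, counting how often 
the mapping function runs when the same result is reduced twice. The helper 
`run-count` is hypothetical and exists only for this illustration.

```clojure
;; Hypothetical helper: builds a collection with `make-coll` (either
;; `eduction` or `sequence`), reduces it twice, and returns how many times
;; the mapping function was invoked in total.
(defn run-count [make-coll]
  (let [calls (atom 0)
        xf    (map (fn [x] (swap! calls inc) (inc x)))
        coll  (make-coll xf (range 10))]
    (reduce + 0 coll)   ; first consumption
    (reduce + 0 coll)   ; second consumption
    @calls))

(run-count eduction) ; => 20, the transformation re-runs on every reduce
(run-count sequence) ; => 10, the seq caches the transformed elements
```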

Chunked seqs are surprisingly fast, particularly when all of the operations 
in a nested transformation are chunked. However, every new layer adds 
another set of (chunked) sequence allocations. Eduction or anything 
transducer-based is going to do no seq allocation and execute as a single 
eager pass. Generally this means that transducers will win by more if the 
collection source is reducible, if the inputs are "large" (more input = 
more win), or if the number of transformations is >1 (more transformations 
= more wins).
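
For comparison, here is the same three-step transformation written as layered 
lazy seqs and as a single composed transducer. This is only a sketch: the 
names `xf`, `layered`, `single-pass`, and `via-eduction` are illustrative, and 
it shows the shape of the code rather than any timing result.

```clojure
;; One composed transducer; no intermediate seqs are created between steps.
(def xf (comp (map inc) (filter even?) (map #(* % %))))

;; Layered lazy seqs: each step allocates its own (chunked) sequence.
(def layered
  (->> (range 1000)
       (map inc)
       (filter even?)
       (map #(* % %))
       (reduce + 0)))

;; Single eager pass over a reducible source.
(def single-pass (transduce xf + 0 (range 1000)))

;; Same pass, but delayed until the eduction is reduced.
(def via-eduction (reduce + 0 (eduction xf (range 1000))))

(= layered single-pass via-eduction) ; => true
```

To compare them on your own data, wrap each form in `time` or use a 
benchmarking library, and vary the input size and the number of steps.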

 

>
> –S
>
>
>
> On Saturday, July 18, 2015 at 9:11:45 AM UTC-4, Leon Grapenthin wrote:
>>
>> My understanding was that if I pass an eduction to a process using 
>> reduce, I can save the computer time and space because the per step 
>> overhead of lazy sequences is gone and also the entire sequence does not 
>> have to reside in memory at once.
>>
>> When I time the difference between (apply max (map inc (range 100000))) 
>> and (apply max (eduction (map inc) (range 100000))), the lazy-seq variant 
>> wins.
>>
>> I'd like to understand why, and when eductions should be used instead.
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
