Re: #{:eduction :performance} Trying to understand when to use eduction

Stuart Sierra Sun, 19 Jul 2015 08:53:50 -0700

Hi Leon,

I think this is an edge case related to how varargs functions are 
implemented in Clojure.

The varargs arity of `max` is implemented with `reduce1`: core.clj line 1088 
<https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088>

`reduce1` is a simplified implementation of `reduce` defined early in 
clojure.core, before the optimized reduction protocols have been loaded: 
core.clj line 894 
<https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894>.

`reduce1` is implemented in terms of lazy sequences, with support for 
chunking.
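A minimal sketch of a seq-based reduce in the spirit of `reduce1`, to show what "lazy sequences with support for chunking" means in practice. The name `seq-reduce` is hypothetical and this is a simplified illustration, not the actual core.clj source. When the seq is chunked it reduces a whole chunk in one call; otherwise it takes one element per step.

```clojure
(defn seq-reduce
  "Hypothetical, simplified seq-based reduce modeled loosely on
  clojure.core/reduce1. Not the real implementation."
  ([f coll]
   ;; No init: use the first element as the starting value.
   (if-let [s (seq coll)]
     (seq-reduce f (first s) (next s))
     (f)))
  ([f val coll]
   (loop [val val
          coll coll]
     (if-let [s (seq coll)]
       (if (chunked-seq? s)
         ;; Fast path: reduce an entire chunk in one call, then move on.
         (recur (.reduce ^clojure.lang.IChunk (chunk-first s) f val)
                (chunk-next s))
         ;; Slow path: one element per step, as with any unchunked seq.
         (recur (f val (first s)) (next s)))
       val))))

;; This sketch does not handle early termination via `reduced`.
(seq-reduce max (map inc (range 100000)))  ;; chunked input
(seq-reduce + (list 1 2 3))                ;; unchunked input
```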

So `apply max` goes through lazy sequence operations, which take the faster 
chunked path when the sequence is chunked. `map` and `range` both return 
chunked sequences.

`eduction` returns an Iterable, so `apply max` first has to turn it into a 
seq, and that seq is not chunked. Therefore it's slightly slower than 
`apply max` on a chunked seq.
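The chunked/unchunked distinction can be checked at the REPL with `chunked-seq?`. A small sketch; the eduction result is left without an expected value because whether a seq over an Iterable is chunked may depend on the Clojure version.

```clojure
;; `chunked-seq?` reports whether a seq can be consumed a chunk at a time.
(chunked-seq? (seq (range 10)))            ;; => true
(chunked-seq? (seq (map inc (range 10))))  ;; => true
(chunked-seq? (seq (list 1 2 3)))          ;; => false, lists are unchunked

;; A seq built from an eduction's iterator (not asserted here; may vary by version):
(chunked-seq? (seq (eduction (map inc) (range 10))))
```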

In this case, to ensure you're using the fast-path internal reduce over 
`eduction`, you can use `reduce` directly:

    (reduce max 0 (eduction (map inc) (range 100000)))

Provide an init value: `eduction` does not assume the "init with first 
element" behavior of sequences, so the fast path needs an explicit init. 
Here 0 is safe because every element is at least 1.
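The init value matters for correctness as well as speed: `max` returns the init whenever it is larger than every element. A small sketch, using hypothetical names `xs` and `negs`, showing where 0 is safe and where an all-negative input needs a neutral init such as `Long/MIN_VALUE`.

```clojure
;; Reducing over an eduction with an explicit init value.
(def xs (eduction (map inc) (range 100000)))
(reduce max 0 xs)                  ;; => 100000, 0 is safe since every element is >= 1

;; With all-negative input, 0 is larger than every element and hides the answer.
(def negs (eduction (map -) (range 1 5)))   ; yields -1 -2 -3 -4
(reduce max 0 negs)                ;; => 0, not the true maximum
(reduce max Long/MIN_VALUE negs)   ;; => -1
```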

This version, in my informal benchmarking, is the fastest.
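The benchmark code itself isn't shown in the thread, so here is one assumed way to reproduce the informal comparison. The function names are hypothetical, the harness is deliberately crude (wall-clock time after a warm-up pass), and numbers will vary by machine and JVM; for serious measurements a dedicated benchmarking library such as Criterium is a better choice.

```clojure
;; Informal timing sketch; all names here are hypothetical.
(def n 100000)

(defn apply-lazy []      (apply max (map inc (range n))))
(defn apply-eduction []  (apply max (eduction (map inc) (range n))))
(defn reduce-eduction [] (reduce max 0 (eduction (map inc) (range n))))

(defn time-it
  "Runs f `reps` times as a warm-up, then `reps` more times, and prints
  the average wall-clock time per call of the second batch."
  [label f reps]
  (dotimes [_ reps] (f))                 ; warm-up so the JIT can compile hot paths
  (let [start (System/nanoTime)]
    (dotimes [_ reps] (f))
    (println label (/ (- (System/nanoTime) start) 1e6 reps) "ms per call")))

(doseq [[label f] [["apply max, lazy seq" apply-lazy]
                   ["apply max, eduction" apply-eduction]
                   ["reduce max, eduction" reduce-eduction]]]
  (time-it label f 50))
```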

Many functions in clojure.core use `reduce1` in their varargs 
implementation. Perhaps they could be changed to use the optimized `reduce`, 
but this might add a lot of repeated definitions while clojure.core is 
bootstrapping itself. I'm not sure.

In general, I would not assume that `eduction` is automatically faster than 
lazy sequences. It is faster only in cases where it can use the 
optimized reduction protocols such as InternalReduce. If the optimized path 
isn't available, many operations fall back to lazy sequences for 
backward compatibility.
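One way to see the two paths is to inspect what an eduction actually is. A sketch with a hypothetical var `e`: it is a reducible (`IReduceInit`, which `reduce` with an init checks for) and an `Iterable`, but not itself a seq, so `apply` has to build a seq from its iterator.

```clojure
(def e (eduction (map inc) (range 10)))

(instance? clojure.lang.IReduceInit e)  ;; => true, reduce with an init can use it directly
(instance? java.lang.Iterable e)        ;; => true, seq/apply go through its iterator
(seq? e)                                ;; => false, it is not itself a seq

(reduce + 0 e)                          ;; => 55, via IReduceInit
(apply + e)                             ;; => 55, via a seq built from the iterator
```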

I would suggest using `eduction` only when you *know* you're going to 
consume the result with `reduce` or `transduce`. As always, test first, and 
profilers are your friend. :)
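A short sketch of consumers that reduce, and therefore suit an eduction. Per the `eduction` docstring, the transformation is re-run every time the eduction is reduced or iterated, so when the result is needed only once, `transduce` with the same transducer gives the same answer without the eduction wrapper. The name `xf` is hypothetical.

```clojure
(def xf (map inc))

;; Consumers that reduce can use the eduction's reducible path:
(reduce max 0 (eduction xf (range 100000)))  ;; => 100000
(into [] (eduction xf (range 5)))            ;; => [1 2 3 4 5]

;; For a single pass, transduce applies the same transducer directly:
(transduce xf max 0 (range 100000))          ;; => 100000
```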

–S

On Saturday, July 18, 2015 at 9:11:45 AM UTC-4, Leon Grapenthin wrote:
>
> My understanding was that if I pass an eduction to a process using reduce, 
> I can save the computer time and space because the per-step overhead of 
> lazy sequences is gone and also the entire sequence does not have to reside 
> in memory at once.
>
> When I time the difference between (apply max (map inc (range 100000))) 
> and (apply max (eduction (map inc) (range 100000))), the lazy-seq variant 
> wins.
>
> I'd like to understand why, and when eductions should be used instead.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en