Re: Transducers: sequence versus eduction

Alex Miller Thu, 02 Apr 2015 05:53:14 -0700


> On Apr 2, 2015, at 4:09 AM, Tassilo Horn <t...@gnu.org> wrote:

> So we can agree on eduction not being a Sequable but still being
> sequable. :-)

Agreed. :)

> 
>>    I think my prime use-case is deeply nested mapcatting where the
>>    mapcatted function gets an object and returns a java collection, and
>>    filtering with usually a quite cheap predicate. E.g.
>> 
>>    (sequence-or-eduction (comp (mapcat ...)
>>    (filter ...)
>>    (mapcat ...)
>>    ...
>>    (filter ...))
>>    coll)
>> 
>>    That's part of a larger search algorithm which either stops after
>>    finding the first match (in which case the resulting sequence is likely
>>    not to be realized completely) or runs till all matches are found (in
>>    which case all these transducer-produced sequences would be fully
>>    realized).
>> 
>>    If that would make sense, I could use eduction when I know everything
>>    will be realized and sequence in the other case (or vice versa).
>> 
>> In this case, I don't know that I'd expect much difference. sequence
>> is going to create a TransformerIterator, and wrap it in an
>> chunkedIteratorSeq. Using (seq (eduction ...) should give you exactly
>> the same thing, except calling through an Eduction type.  So, I doubt
>> it matters.
> 
> Ok, thanks.
> 
> Could you also comment on my other question namely the full realization
> of intermediate operations in:
> 
> ,----[ Docs of sequence at http://clojure.org/transducers ]
> | The resulting sequence elements are incrementally computed. These
> | sequences will consume input incrementally as needed and fully realize
> | intermediate operations.  This behavior differs from the equivalent
> | operations on lazy sequences.
> `----
> 
> Does that mean I'm better off with a traditionally nested
> map/mapcat/filter cascade instead of transducers in the case where the
> intermediate mapcatting functions return possibly not so cheap/large
> lazy sequences and it is likely that I won't fully realize the result
> sequence or eduction?

If you're going to use expanding transformations and not realize all of the 
results then I think sequences are likely a better choice for you.

> 
> Hm, I now benchmarked a bit with criterium and measured
> 
>  (sequence (comp (mapcat #(range %))
>                  (mapcat #(range %)))
>            (range 1000))
> 
> versus
> 
>  (eduction (comp (mapcat #(range %))
>                  (mapcat #(range %)))
>            (range 1000))
> 
> versus the traditional
> 
>  (mapcat #(range %)
>          (mapcat #(range %)
>                  (range 1000)))
> 
> and either taking just the first element of the result using (first
> ...), taking the first 1000 elements of the result using (dorun (take
> 1000 ...)), or realizing everything using (dorun ...).
> 
> These are the timings:
> 
> | Version            | first | first 1000 | everything |
> |--------------------+-------+------------+------------|
> | transd. (sequence) | 10 μs | 214 μs     | 21 sec     |
> | transd. (eduction) | 10 μs | 216 μs     | 21 sec     |
> | traditional        |  3 μs | 151 μs     |  7 sec     |
> 
> So as you see, the traditional version is about three times faster in
> the first and everything scenario, and just a littlebit faster in the
> first 1000 scenario.
> 
> When I test
> 
>  (sequence (comp (mapcat #(range % 1000))
>                  (mapcat #(range % 1000)))
>            (range 1000))
> 
> versus
> 
>  (eduction (comp (mapcat #(range % 1000))
>                  (mapcat #(range % 1000)))
>            (range 1000))
> 
> versus
> 
>  (mapcat #(range % 1000)
>          (mapcat #(range % 1000)
>                  (range 1000)))
> 
> so that the larger intermediate sequences come first, I get these
> timings:
> 
> | Version            | first | first 1000 | everything |
> |--------------------+-------+------------+------------|
> | transd. (sequence) | 24 ms |  24 ms     | 21 sec     |
> | transd. (eduction) | 24 ms |  24 ms     | 21 sec     |
> | traditional        |  4 μs | 102 μs     |  7 sec     |
> 
> So here the results are even much worse than above.  If you don't fully
> realize the resulting sequence the traditional version can be many
> orders of magnitudes faster.  I guess that's because of the cited
> statement above, i.e., intermediate operations are always fully
> realized.
> 
> However, at least I had expected that in the case where all elements are
> realized the transducer version should have been faster than the
> traditional version which also needs to fully realize all intermediate
> lazy seqs.  Why is it still three times slower?

I think my main suggestion here is that you are using a non-reducible source 
(range) throughout these timings, so transducers have no leverage on the input 
side. CLJ-1515 will make range reducible and should help a lot on this 
particular example.

> 
> So my conclusion is that you cannot use transducers as a kind of drop-in
> replacement of traditional sequence manipulation functions.  They pay
> off only when you can make very strong assumptions about the sizes and
> compututation costs of intermediate collections, and I think you cannot
> do that in general.  Or well, maybe you can when you program an
> application but you almost certainly cannot when you program a library
> and thus have no clue about how that's gonna be used by users.

Transducers make different trade offs than sequences and there will always be 
cases where one or the other is a better choice. I really appreciate this 
thread as highlighting some of the nuances.

Transducers break transformations into three parts - source iteration, composed 
transforms, and output collection. In the case of reducible inputs, multiple 
transforms, and full realization, transducers can be much faster. If not all of 
those are in play, then the results are more subtle. One thing I've found in 
perf testing a lot of stuff is that chunked sequences continually surprise me 
at how fast they can be.

> 
> Bye,
> Tassilo

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Transducers: sequence versus eduction

Reply via email to