Alex Miller <a...@puredanger.com> writes:

>     Ok. But to me, if I can call `seq` on a thing and iterate it using
>     `first` and `rest`, that's a seqable thing to me. :-)
>
> Fair enough. I just meant it no longer implements Seqable. :) 

Yes, I got that.  But I think that's an implementation detail.  I go
with the seqable definition from "Clojure Programming", which says

  The set of types that are /seqable/ -- that is, those for which `seq`
  can produce a valid value -- include:

    - All Clojure collection types
    - All Java collections
    - ...
    - Anything that implements Clojure's clojure.lang.Seqable interface

So we can agree on eduction not being a Seqable but still being
seqable. :-)
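
To make that concrete, here's a quick REPL sketch of exactly that
distinction (nothing here beyond what we've established in the thread):

```clojure
;; An eduction is not a Seqable ...
(instance? clojure.lang.Seqable (eduction (map inc) [1 2 3]))
;; => false

;; ... but it is still seqable: Eduction implements Iterable, so
;; `seq` (and thus `first` and `rest`) work on it just fine.
(def e (eduction (map inc) [1 2 3]))
(seq e)    ;; => (2 3 4)
(first e)  ;; => 2
(rest e)   ;; => (3 4)
```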

>     I think my prime use-case is deeply nested mapcatting where the
>     mapcatted function gets an object and returns a java collection, and
>     filtering with usually a quite cheap predicate. E.g.
>
>     (sequence-or-eduction (comp (mapcat ...)
>     (filter ...)
>     (mapcat ...)
>     ...
>     (filter ...))
>     coll)
>
>     That's part of a larger search algorithm which either stops after
>     finding the first match (in which case the resulting sequence is likely
>     not to be realized completely) or runs till all matches are found (in
>     which case all these transducer-produced sequences would be fully
>     realized).
>
>     If that would make sense, I could use eduction when I know everything
>     will be realized and sequence in the other case (or vice versa).
>
> In this case, I don't know that I'd expect much difference. sequence
> is going to create a TransformerIterator and wrap it in a
> chunkedIteratorSeq. Using (seq (eduction ...)) should give you exactly
> the same thing, except calling through an Eduction type.  So, I doubt
> it matters.

Ok, thanks.
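
For the record, that equivalence is easy to check at the REPL (a
minimal sketch, using an arbitrary transducer stack):

```clojure
(def xf (comp (mapcat range) (filter odd?)))

;; `sequence` wraps a TransformerIterator in a chunked iterator seq;
;; `seq` on an eduction goes through the Eduction's iterator, but the
;; resulting elements are the same.
(= (sequence xf (range 10))
   (seq (eduction xf (range 10))))
;; => true
```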

Could you also comment on my other question, namely the full
realization of intermediate operations in:

,----[ Docs of sequence at http://clojure.org/transducers ]
| The resulting sequence elements are incrementally computed. These
| sequences will consume input incrementally as needed and fully realize
| intermediate operations.  This behavior differs from the equivalent
| operations on lazy sequences.
`----

Does that mean I'm better off with a traditional nested
map/mapcat/filter cascade instead of transducers in the case where the
intermediate mapcat functions return potentially large or expensive
lazy sequences and it is likely that I won't fully realize the
resulting sequence or eduction?

Hm, I now benchmarked a bit with criterium and measured

  (sequence (comp (mapcat #(range %))
                  (mapcat #(range %)))
            (range 1000))

versus

  (eduction (comp (mapcat #(range %))
                  (mapcat #(range %)))
            (range 1000))

versus the traditional

  (mapcat #(range %)
          (mapcat #(range %)
                  (range 1000)))

and either taking just the first element of the result using (first
...), taking the first 1000 elements of the result using (dorun (take
1000 ...)), or realizing everything using (dorun ...).
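
For reference, this is roughly how I measured (paraphrased; each of
the nine combinations was benchmarked like the two examples below,
using criterium's quick-bench):

```clojure
(require '[criterium.core :as c])

;; transducer `sequence` version, realizing everything
(c/quick-bench
  (dorun (sequence (comp (mapcat #(range %))
                         (mapcat #(range %)))
                   (range 1000))))

;; traditional version, taking only the first element
(c/quick-bench
  (first (mapcat #(range %)
                 (mapcat #(range %)
                         (range 1000)))))
```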

These are the timings:

| Version            | first | first 1000 | everything |
|--------------------+-------+------------+------------|
| transd. (sequence) | 10 μs | 214 μs     | 21 sec     |
| transd. (eduction) | 10 μs | 216 μs     | 21 sec     |
| traditional        |  3 μs | 151 μs     |  7 sec     |

So as you can see, the traditional version is about three times faster
in the "first" and "everything" scenarios, and just a little bit
faster in the "first 1000" scenario.

When I test

  (sequence (comp (mapcat #(range % 1000))
                  (mapcat #(range % 1000)))
            (range 1000))

versus

  (eduction (comp (mapcat #(range % 1000))
                  (mapcat #(range % 1000)))
            (range 1000))

versus

  (mapcat #(range % 1000)
          (mapcat #(range % 1000)
                  (range 1000)))

so that the larger intermediate sequences come first, I get these
timings:

| Version            | first | first 1000 | everything |
|--------------------+-------+------------+------------|
| transd. (sequence) | 24 ms |  24 ms     | 21 sec     |
| transd. (eduction) | 24 ms |  24 ms     | 21 sec     |
| traditional        |  4 μs | 102 μs     |  7 sec     |

So here the transducer results are even much worse than above.  If you
don't fully realize the resulting sequence, the traditional version can
be many orders of magnitude faster.  I guess that's because of the
statement cited above, i.e., intermediate operations are always fully
realized.
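
A small experiment that seems to confirm the "fully realize
intermediate operations" part.  I count with an atom how many elements
of the intermediate lazy seq actually get realized (the helper names
are mine):

```clojure
(def touched (atom 0))

(defn counted-range [n]
  ;; a lazy seq that counts each element as it is realized
  (map (fn [i] (swap! touched inc) i) (range n)))

;; transducer version: taking just the first element still realizes
;; the complete 1000-element intermediate collection
(reset! touched 0)
(first (sequence (mapcat counted-range) [1000]))
@touched  ;; => 1000

;; traditional lazy version: only one chunk gets realized
(reset! touched 0)
(first (mapcat counted-range [1000]))
@touched  ;; => 32 (one chunk)
```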

However, I had at least expected that in the case where all elements
are realized, the transducer version would be faster than the
traditional version, which also needs to fully realize all intermediate
lazy seqs.  Why is it still three times slower?

So my conclusion is that you cannot use transducers as a drop-in
replacement for the traditional sequence manipulation functions.  They
pay off only when you can make very strong assumptions about the sizes
and computation costs of the intermediate collections, and I think you
cannot do that in general.  Or well, maybe you can when you write an
application, but you almost certainly cannot when you write a library
and thus have no clue how it's going to be used.

Bye,
Tassilo

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en