Re: Transducers: sequence versus eduction
On Apr 1, 2015, at 11:16 AM, Alex Miller a...@puredanger.com wrote:

> eduction now takes multiple transformations, not just one, and
> composes them. This is designed for mechanical rewriting (hello tool
> developers!!) of ->> chains like this:
>
>   (->> s (interpose 5) (partition-all 2))
>
> to this:
>
>   (->> s (eduction (interpose 5) (partition-all 2)))

Maybe it's just me, but the eduction argument order just looks strange to me. For a variadic function, I would have expected the single collection to come first, then any number of xforms. There must be a better reason than mechanical rewriting. Wouldn't a macro make more sense?

  (defmacro educe->> [coll & xfs]
    `(->Eduction (comp ~@xfs) ~coll))

So the rewrite could be just educe->> for ->>, without having to wrap the xforms at all:

  (educe->> s (interpose 5) (partition-all 2))

Steve Miner stevemi...@gmail.com

--
You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com. Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Transducers: sequence versus eduction
I understand your point, and there are several competing comparisons here. Generally only collection functions take the coll first. eduction is similar to sequence (and into and reduce and transduce) in taking it last (but differs in taking multiple xforms). The multiple xforms are similar to ->>, though ->> works left to right and puts the source first. I think I'd rather emphasize its similarity to sequence than its similarity to ->>.

On Friday, April 3, 2015 at 10:08:38 AM UTC-5, miner wrote:
> Maybe it's just me, but the eduction argument order just looks strange
> to me. For a variadic function, I would have expected the single
> collection to come first, then any number of xforms. There must be a
> better reason than mechanical rewriting. Wouldn't a macro make more
> sense?
Re: Transducers: sequence versus eduction
Alex Miller a...@puredanger.com writes:

>> Ok. But to me, if I can call `seq` on a thing and iterate it using
>> `first` and `rest`, that's a seqable thing to me. :-)
>
> Fair enough. I just meant it no longer implements Seqable. :)

Yes, I got that. But I think that's an implementation detail. I go with the seqable definition of "Clojure Programming", which says:

  The set of types that are /seqable/ -- that is, those for which `seq`
  can produce a valid value -- include:

  - All Clojure collection types
  - All Java collections
  - ...
  - Anything that implements Clojure's clojure.lang.Seqable interface

So we can agree on eduction not being a Seqable but still being seqable. :-)

>> I think my prime use-case is deeply nested mapcatting where the
>> mapcatted function gets an object and returns a Java collection, and
>> filtering with usually a quite cheap predicate. E.g.
>>
>>   (sequence-or-eduction (comp (mapcat ...)
>>                               (filter ...)
>>                               (mapcat ...)
>>                               ...
>>                               (filter ...))
>>                         coll)
>>
>> That's part of a larger search algorithm which either stops after
>> finding the first match (in which case the resulting sequence is
>> likely not to be realized completely) or runs till all matches are
>> found (in which case all these transducer-produced sequences would be
>> fully realized). If that would make sense, I could use eduction when
>> I know everything will be realized and sequence in the other case
>> (or vice versa).
>
> In this case, I don't know that I'd expect much difference. sequence
> is going to create a TransformerIterator and wrap it in a chunked
> iterator seq. Using (seq (eduction ...)) should give you exactly the
> same thing, except calling through an Eduction type. So, I doubt it
> matters.

Ok, thanks. Could you also comment on my other question, namely the full realization of intermediate operations in:

,----[ Docs of sequence at http://clojure.org/transducers ]
| The resulting sequence elements are incrementally computed. These
| sequences will consume input incrementally as needed and fully realize
| intermediate operations.
| This behavior differs from the equivalent
| operations on lazy sequences.
`----

Does that mean I'm better off with a traditionally nested map/mapcat/filter cascade instead of transducers in the case where the intermediate mapcatting functions return possibly not so cheap/large lazy sequences and it is likely that I won't fully realize the result sequence or eduction?

Hm, I now benchmarked a bit with criterium and measured

  (sequence (comp (mapcat #(range %)) (mapcat #(range %))) (range 1000))

versus

  (eduction (comp (mapcat #(range %)) (mapcat #(range %))) (range 1000))

versus the traditional

  (mapcat #(range %) (mapcat #(range %) (range 1000)))

and either taking just the first element of the result using (first ...), taking the first 1000 elements of the result using (dorun (take 1000 ...)), or realizing everything using (dorun ...). These are the timings:

  | Version            | first | first 1000 | everything |
  |--------------------+-------+------------+------------|
  | transd. (sequence) | 10 μs | 214 μs     | 21 sec     |
  | transd. (eduction) | 10 μs | 216 μs     | 21 sec     |
  | traditional        |  3 μs | 151 μs     |  7 sec     |

So as you see, the traditional version is about three times faster in the "first" and "everything" scenarios, and just a little bit faster in the "first 1000" scenario. When I test

  (sequence (comp (mapcat #(range % 1000)) (mapcat #(range % 1000)))
            (range 1000))

versus

  (eduction (comp (mapcat #(range % 1000)) (mapcat #(range % 1000)))
            (range 1000))

versus

  (mapcat #(range % 1000) (mapcat #(range % 1000) (range 1000)))

so that the larger intermediate sequences come first, I get these timings:

  | Version            | first | first 1000 | everything |
  |--------------------+-------+------------+------------|
  | transd. (sequence) | 24 ms | 24 ms      | 21 sec     |
  | transd. (eduction) | 24 ms | 24 ms      | 21 sec     |
  | traditional        |  4 μs | 102 μs     |  7 sec     |

So here the results are even much worse than above. If you don't fully realize the resulting sequence, the traditional version can be many orders of magnitude faster.
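Spelled out as runnable forms, the "first 1000" variants of the first comparison look like this (a sketch; `time` stands in here for the criterium measurements quoted in the tables):

```clojure
;; The transducer pipeline used in the measurements above.
(def xf (comp (mapcat #(range %)) (mapcat #(range %))))

;; "first 1000" scenario, three variants:
(time (dorun (take 1000 (sequence xf (range 1000)))))
(time (dorun (take 1000 (eduction xf (range 1000)))))
(time (dorun (take 1000 (mapcat #(range %)
                                (mapcat #(range %) (range 1000))))))
```

All three produce the same elements; only how much intermediate work is done per element pulled differs.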
I guess that's because of the cited statement above, i.e., intermediate operations are always fully realized. However, at least I had expected that in the case where all elements are realized, the transducer version should have been faster than the traditional version, which also needs to fully realize all intermediate lazy seqs. Why is it still three times slower? So my conclusion is that you cannot use transducers as a kind of drop-in replacement for traditional sequence manipulation functions. They pay off only when you can make very strong assumptions about
Re: Transducers: sequence versus eduction
On Apr 2, 2015, at 4:09 AM, Tassilo Horn t...@gnu.org wrote:

> So we can agree on eduction not being a Seqable but still being
> seqable. :-)

Agreed. :)

> Does that mean I'm better off with a traditionally nested
> map/mapcat/filter cascade instead of transducers in the case where
> the intermediate mapcatting functions return possibly not so
> cheap/large lazy sequences and it is likely that I won't fully
> realize the result sequence or eduction?

If you're going to use expanding transformations and not realize all of the results, then I think sequences are likely a better choice for you.

> However, at least I had expected that in the case where all elements
> are realized, the transducer version should have been faster than the
> traditional version, which also needs to fully realize all
> intermediate lazy seqs. Why is it still three times slower?

I think my main suggestion here is that you are using a non-reducible source (range) throughout these timings, so transducers have no leverage on the input side. CLJ-1515 will make range reducible and should help a lot on this particular example.
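As a rough sketch of the case where transducers do get input-side leverage (a reducible vector source, full realization, no caching needed); note the speedup claim is Alex's, not something this snippet measures:

```clojure
;; A vector is reducible, so transduce can drive the whole pipeline as
;; a single reduce with no intermediate sequences:
(def v (vec (range 1000)))

(def result-xf
  (transduce (comp (map inc) (filter odd?)) + 0 v))

;; The lazy-seq equivalent allocates an intermediate seq per step:
(def result-seq
  (reduce + 0 (filter odd? (map inc v))))

;; Both compute the same sum of the odd numbers in 1..1000.
```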
Re: Transducers: sequence versus eduction
Alex Miller a...@puredanger.com writes:

Hi Alex,

> Just for fun, I ran the
>
>   (dorun (sequence (comp (mapcat #(range %)) (mapcat #(range %)))
>                    (range 1000)))
>
> and eduction version with the CLJ-1515 patch. I saw ~20 seconds
> before and 15 seconds after. But the new version also makes pure seqs
> better, and I saw the full (dorun ...) drop from 6 seconds before to
> 3 seconds after too. Just a hint of the benefits coming in CLJ-1515.

Awesome! Can't wait for alpha7 then. :-)

Bye,
Tassilo
Re: Transducers: sequence versus eduction
Just for fun, I ran the

  (dorun (sequence (comp (mapcat #(range %)) (mapcat #(range %)))
                   (range 1000)))

and eduction version with the CLJ-1515 patch. I saw ~20 seconds before and 15 seconds after. But the new version also makes pure seqs better, and I saw the full (dorun ...) drop from 6 seconds before to 3 seconds after too. Just a hint of the benefits coming in CLJ-1515.

On Thursday, April 2, 2015 at 9:21:08 AM UTC-5, Michał Marczyk wrote:
> It may be worth noting that while the return value of range is
> wrapped in lazy-seq and thus isn't itself a clojure.lang.IChunkedSeq,
> what you get when you realize it is indeed chunked:
>
>   (contains? (ancestors (class (seq (range 128))))
>              clojure.lang.IChunkedSeq)
>   ;=> true
>
> It doesn't implement c.l.IReduce, but
> clojure.core.protocols/InternalReduce has an implementation for
> c.l.IChunkedSeq. At least transduce should be able to take advantage
> of the InternalReduce implementation (via CollReduce). transduce
> could be used for short-circuiting search with (reduced ...), so it
> might be a legitimate contender here.
>
> Cheers,
> Michał
Re: Transducers: sequence versus eduction
Alex Miller a...@puredanger.com writes:

Hi Alex,

> If you're going to use expanding transformations and not realize all
> of the results, then I think sequences are likely a better choice for
> you.

Ok, I see.

>> However, at least I had expected that in the case where all elements
>> are realized, the transducer version should have been faster than
>> the traditional version, which also needs to fully realize all
>> intermediate lazy seqs. Why is it still three times slower?
>
> I think my main suggestion here is that you are using a non-reducible
> source (range) throughout these timings, so transducers have no
> leverage on the input side. CLJ-1515 will make range reducible and
> should help a lot on this particular example.

Well, even if I revamp the (admittedly contrived) example to have reducible vectors as source and also intermediates

  (let [v  (vec (range 0 1000))
        vs (zipmap (range 0 1000)
                   (for [i (range 0 1000)]
                     (vec (range i 1000))))]
    (time (dorun (sequence (comp (mapcat (fn [i] (vs i)))
                                 (mapcat (fn [i] (vs i))))
                           v))))

it still takes 18 seconds instead of 21 with lazy seqs produced by range, or just 7 seconds with normal lazy seq functions. In my real scenario, I think there are also no IReduce paths because the mapcat functions either return normal lazy seqs or Java Collections (which are not actually Clojure collections).

But usually, the transformations are not so freaking expanding as the example above. I benchmarked a bit, and there sometimes using transducers is faster and sometimes it is not. So I've made that configurable (with normal lazy seqs as default) so users can benchmark and then decide, and I don't need to choose for them. :-)

Oh, and actually *you* have made that possible by making me aware of

  (sequence (comp xform*) start-coll)

being almost identical to

  (->> start-coll xform*)

that is, when my macro computes xforms as if they were meant for transducing, I can also use them traditionally with ->>. Until now, I've never used ->>, but before I had implemented the expansion for transducers, I used a `for` with gensyms for intermediates like:

  (for [G__1 start-coll
        G__2 (xform1 G__1)
        G__3 (xform2 G__2)]
    G__3)

That's pretty much different to generate. But since the xforms for transducers and ->> are the same, switching between lazy seq fns and transducers is just changing how start-coll and xforms are composed. Awesome!

>> So my conclusion is that you cannot use transducers as a kind of
>> drop-in replacement for traditional sequence manipulation functions.
>> They pay off only when you can make very strong assumptions about
>> the sizes and computation costs of intermediate collections, and I
>> think you cannot do that in general.

Or well, maybe you can when you program an application, but you almost certainly cannot when you program a library and thus have no clue about how that's gonna be used by users.

> Transducers make different trade-offs than sequences and there will
> always be cases where one or the other is a better choice. I really
> appreciate this thread as highlighting some of the nuances.

Yes, thanks a lot for your patience. I appreciate that very much.

> Transducers break transformations into three parts - source
> iteration, composed transforms, and output collection. In the case of
> reducible inputs, multiple transforms, and full realization,
> transducers can be much faster. If not all of those are in play, then
> the results are more subtle. One thing I've found in perf testing a
> lot of stuff is that chunked sequences continually surprise me at how
> fast they can be.

Then maybe I should experiment with chunked seqs.

Bye,
Tassilo
Re: Transducers: sequence versus eduction
It may be worth noting that while the return value of range is wrapped in lazy-seq and thus isn't itself a clojure.lang.IChunkedSeq, what you get when you realize it is indeed chunked:

  (contains? (ancestors (class (seq (range 128))))
             clojure.lang.IChunkedSeq)
  ;=> true

It doesn't implement c.l.IReduce, but clojure.core.protocols/InternalReduce has an implementation for c.l.IChunkedSeq. At least transduce should be able to take advantage of the InternalReduce implementation (via CollReduce). transduce could be used for short-circuiting search with (reduced ...), so it might be a legitimate contender here.

Cheers,
Michał

On 2 April 2015 at 15:38, Tassilo Horn t...@gnu.org wrote:
> However, at least I had expected that in the case where all elements
> are realized, the transducer version should have been faster than the
> traditional version, which also needs to fully realize all
> intermediate lazy seqs. Why is it still three times slower?
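Michał's short-circuiting idea can be sketched like this (the `find-first` helper and its shape are made up for illustration, not anything from clojure.core):

```clojure
(defn find-first
  "Return the first element produced by xform over coll that satisfies
  pred, short-circuiting the reduction with `reduced`."
  [xform pred coll]
  (transduce xform
             (completing
              (fn [_ x] (if (pred x) (reduced x) nil)))
             nil
             coll))

;; Searches the mapcat expansion 0; 0 1; 0 1 2; ... and stops as soon
;; as an element greater than 5 appears:
(find-first (mapcat #(range %)) #(> % 5) (range 10))
```

Because mapcat's transducer preserves `reduced`, the source is not consumed past the first match.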
Re: Transducers: sequence versus eduction
vve...@gmail.com writes:

> Eduction retains the ability to be recomposed with other transducers
> higher in the function chain. The following two are nearly equivalent:
>
>   (transduce (take 1e2) + (eduction (filter odd?) (range)))
>   (transduce (comp (filter odd?) (take 1e2)) + (range))
>
> This will be slower:
>
>   (transduce (take 1e2) + (sequence (filter odd?) (range)))
>
>   Execution time mean : 19.054407 µs
>   Execution time mean : 19.530890 µs
>   Execution time mean : 39.955692 µs

Interesting. But in my code, experimentation has shown that sequence is almost always faster in my use-cases, which usually look like

  (sequence (comp (mapcat foo1)
                  (filter p1)
                  (map f1)
                  (mapcat foo2)
                  (filter p2)
                  (mapcat foo3)
                  (filter p3))
            coll)

Here, I've switched between making the foo* functions return eductions or lazy sequences, and the latter seems to always be faster, although that looks like a use-case for eduction based on my assumptions... So maybe eduction can unfold its full potential only during transduce?

Bye,
Tassilo
Re: Transducers: sequence versus eduction
Eduction retains the ability to be recomposed with other transducers higher in the function chain. The following two are nearly equivalent: (transduce (take 1e2) + (eduction (filter odd?) (range))) (transduce (comp (filter odd?) (take 1e2)) + (range)) This will be slower: (transduce (take 1e2) + (sequence (filter odd?) (range))) Execution time mean : 19.054407 µs Execution time mean : 19.530890 µs Execution time mean : 39.955692 µs I also wonder to what extent eduction can be used as a drop in replacement for sequence. On Wednesday, April 1, 2015 at 9:13:07 AM UTC+2, Tassilo Horn wrote: Hi all, I've switched many nested filter/map/mapcat applications in my code to using transducers. That brought a moderate speedup in certain cases and the deeper the nesting has been before, the clearer the transducers code is in comparison, so yay! :-) However, I'm still quite unsure about the difference between `sequence` and `eduction`. From the docs and experimentation, I came to the assumptions below and I'd be grateful if someone with more knowledge could verify/falsify/add: - Return types differ: Sequence returns a standard lazy seq, eductions an instance of Eduction. - Eductions are reducible/sequable/iterable, i.e., basically I can use them wherever a (lazy) seq would also do, so sequence and eduction are quite interchangeable except when poking at internals, e.g., (.contains (sequence ...) x) works whereas (.contains (eduction ...) x) doesn't. - Both compute their contents lazily. - Lazy seqs cache their already realized contents, eductions compute them over and over again on each iteration. Because of that, I came to the conclusion that whenever I ask myself if one of my functions should return a lazy seq or an eduction, I should use these rules: 1. 
If the function is likely to be used like (let [xs (seq-producing-fn args)] (or (do-stuff-with xs) (do-other-stuff-with xs) ...)) that is, the resulting seq is likely to be bound to a variable which is then used multiple times (and thus lazy seq caching is benefitical), then use sequence. 2. If it is a private function only used internally and never with the usage pattern of point 1, then definitively use eduction. 3. If its a public function which usually isn't used with a pattern as in point 1, then I'm unsure. eduction is probably more efficient but sequence fits better in the original almost everything returns a lazy seq design. Also, the latter has the benefit that users of the library don't need to know anything about transducers. Is that sensible? Or am I completely wrong with my assumptions about sequence and eduction? On a related note, could someone please clarify the statement from the transducers docs for `sequence`? ,[ Docs of sequence at http://clojure.org/transducers ] | The resulting sequence elements are incrementally computed. These | sequences will consume input incrementally as needed and fully realize | intermediate operations. This behavior differs from the equivalent | operations on lazy sequences. ` I'm curious about the fully realize intermediate operations part. Does it mean that in a traditional (mapcat #(range %) (range 1)) the inner range is also evaluated lazy but with (sequence (mapcat #(range %)) (range 1)) it is not? It seems so. At least dorun-ning these two expressions shows that the traditional version is more than twice as fast than the transducer version. Also, the same seems to hold for (eduction (mapcat #(range %)) (range 1)) which is exactly as fast (or rather slow) as the sequence version. But wouldn't that mean that transducers with mapcat where the mapcatted function isn't super-cheap is a bad idea in general at least from a performance POV? 
Bye, Tassilo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
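Tassilo's caching assumption (lazy seqs cache, eductions recompute) can be sketched at the REPL. This is a minimal illustration, not code from the thread; the prn side effect is only there to make recomputation visible:

```clojure
;; sequence caches realized elements; eduction recomputes on every use.
(def spy (map (fn [x] (prn :computing x) x)))

(def xs-seq (sequence spy (range 3)))
(def xs-edu (eduction spy (range 3)))

(into [] xs-seq)  ; prints :computing 0, 1, 2 on first realization
(into [] xs-seq)  ; prints nothing more: the lazy seq replays its cache
(into [] xs-edu)  ; prints :computing 0, 1, 2
(into [] xs-edu)  ; prints :computing 0, 1, 2 again: recomputed each time
```

This matches assumption 4 in the list above, and is the reason the "bind it and reuse it" pattern in rule 1 favors sequence.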
Re: Transducers: sequence versus eduction
The general idea is that eduction is best when the result will be completely consumed in a reducible context. Any case of reusing the result will likely be better served by sequence, which can cache and reuse the answer.

On Wednesday, April 1, 2015 at 3:51:53 AM UTC-5, Tassilo Horn wrote:

vve...@gmail.com writes:

> Eduction retains the ability to be recomposed with other transducers
> higher in the function chain. The following two are nearly equivalent:
>
> (transduce (take 1e2) + (eduction (filter odd?) (range)))
> (transduce (comp (filter odd?) (take 1e2)) + (range))
>
> This will be slower:
>
> (transduce (take 1e2) + (sequence (filter odd?) (range)))
>
> Execution time mean : 19.054407 µs
> Execution time mean : 19.530890 µs
> Execution time mean : 39.955692 µs

Interesting. But in my code, experimentation has shown that sequence is almost always faster in my use-cases, which usually look like

(sequence (comp (mapcat foo1) (filter p1) (map f1)
                (mapcat foo2) (filter p2)
                (mapcat foo3) (filter p3))
          coll)

Here, I've switched between making the foo* functions return eductions or lazy sequences, and the latter seems to always be faster, although that looks like a use-case for eduction based on my assumptions... So maybe eduction can unfold its full potential only during transduce?

Bye, Tassilo
Re: Transducers: sequence versus eduction
First, I love this discussion! Great questions, great thinking. Up top, I wanted to mention a couple things that have changed in alpha6:

- Eduction is no longer Seqable and thus the return from eduction is not seqable (but it is reducible and iterable). You can use iterator-seq to get a chunked seq over the top if you need one.
- Prior to alpha6, sequence with transformations returned a LazyTransformer. That's gone and now transformations are done using TransformerIterator, which is subsequently wrapped with a (now chunked) iterator-seq.
- There are lots of performance implications due to those changes and I would recommend re-testing any perf test related to sequence or eduction on alpha6 to get a fresh picture, as anything pre-alpha6 is not comparable.
- eduction now takes multiple transformations, not just one, and composes them. This is designed for mechanical rewriting (hello tool developers!!) of ->> chains like this: (->> s (interpose 5) (partition-all 2)) to this: (->> s (eduction (interpose 5) (partition-all 2)))

That particular example has timings in CLJ-1669 and shows a reduction from 501 microseconds to 108 microseconds on a source vector, with comparable savings on a source sequence. You do need to be careful to not include non-transducer functions in that chain, or to stop the rewrite when you fall into a materialization function at the end (for example, sort).

On Wednesday, April 1, 2015 at 2:13:07 AM UTC-5, Tassilo Horn wrote:

> Hi all, I've switched many nested filter/map/mapcat applications in my
> code to using transducers. That brought a moderate speedup in certain
> cases and the deeper the nesting has been before, the clearer the
> transducers code is in comparison, so yay! :-)

Depending on what you're transducing over, you may see some additional gains in alpha6.

> However, I'm still quite unsure about the difference between
> `sequence` and `eduction`.
> From the docs and experimentation, I came to the assumptions below and
> I'd be grateful if someone with more knowledge could
> verify/falsify/add:
>
> - Return types differ: sequence returns a standard lazy seq, eduction
>   an instance of Eduction.

Yes, although I think the actual return type of eduction is not important; rather, the implemented interfaces of Iterable and IReduceInit are the key thing.

> - Eductions are reducible/seqable/iterable, i.e., basically I can use
>   them wherever a (lazy) seq would also do,

No longer explicitly seqable, although seq will automatically wrap a chunked iterator-seq around it if passed to a sequence function.

> so sequence and eduction are quite interchangeable except when poking
> at internals, e.g., (.contains (sequence ...) x) works whereas
> (.contains (eduction ...) x) doesn't.

No guarantees made on those internals, only on what is promised in the docstrings (reducible/iterable for eduction and sequence for sequence). :)

> - Both compute their contents lazily.

eduction is not lazy by itself. If used in a non-seq context, it is eagerly recomputed on each use. If you use it in a seq context, then it will be computed and cached as needed.

> - Lazy seqs cache their already realized contents, eductions compute
>   them over and over again on each iteration.

Yes.

> Because of that, I came to the conclusion that whenever I ask myself
> if one of my functions should return a lazy seq or an eduction, I
> should use these rules:
>
> 1. If the function is likely to be used like
>
>    (let [xs (seq-producing-fn args)]
>      (or (do-stuff-with xs) (do-other-stuff-with xs) ...))
>
>    that is, the resulting seq is likely to be bound to a variable
>    which is then used multiple times (and thus lazy seq caching is
>    beneficial), then use sequence.

Yes, it makes sense to take advantage of seq caching this way.

> 2. If it is a private function only used internally and never with the
>    usage pattern of point 1, then definitely use eduction.
>
> 3. If it's a public function which usually isn't used with a pattern
>    as in point 1, then I'm unsure. eduction is probably more efficient
>    but sequence fits better in the original "almost everything returns
>    a lazy seq" design. Also, the latter has the benefit that users of
>    the library don't need to know anything about transducers.
>
> Is that sensible? Or am I completely wrong with my assumptions about
> sequence and eduction?

You're definitely on the right track. How the result is going to be used is pretty important - eduction is designed to be consumed entirely for a result and thus gives up the caching of sequences. If you are going to consume the entire thing once, particularly in the context of a reduction, then eduction is far more efficient. Eduction does not need to allocate or cache intermediate sequence objects, so it is much more memory and allocation efficient when used in this way.
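The "consumed entirely in a reducible context" case can be sketched like this (a minimal illustration under the assumptions above, not code from the thread):

```clojure
;; eduction feeds reduce directly via IReduceInit, allocating no
;; intermediate lazy-seq cells along the way:
(reduce + 0 (eduction (filter odd?) (map inc) (range 100)))
;; => 2550

;; the sequence version builds (and caches) a lazy seq first,
;; paying for cells that are used exactly once:
(reduce + 0 (sequence (comp (filter odd?) (map inc)) (range 100)))
;; => 2550
```

Both return the same answer; the difference is purely in the allocation behavior when the result is consumed once and thrown away.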
Re: Transducers: sequence versus eduction
Alex Miller a...@puredanger.com writes:

Hi Alex,

> - Eduction is no longer Seqable and thus the return from eduction is
>   not seqable (but it is reducible and iterable). You can use
>   iterator-seq to get a chunked seq over the top if you need one.

Really?

user=> *clojure-version*
{:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
user=> (seq (eduction (map inc) (range 10)))
(1 2 3 4 5 6 7 8 9 10)

> - There are lots of performance implications due to those changes and
>   I would recommend re-testing any perf test related to sequence or
>   eduction on alpha6 to get a fresh picture as anything pre-alpha6 is
>   not comparable.

My observations are all based on today's experience with alpha6. :-)

> - eduction now takes multiple transformations, not just one, and
>   composes them. This is designed for mechanical rewriting (hello tool
>   developers!!) of ->> chains like this:
>   (->> s (interpose 5) (partition-all 2)) to this:
>   (->> s (eduction (interpose 5) (partition-all 2)))

Ah, that's sensible.

> The general idea is that eduction is best when the result will be
> completely consumed in a reducible context. Any case of reusing the
> result will likely be better served by sequence which can cache and
> reuse the answer.

Yes, that's what I guessed. But at least when I tested replacing sequence with eduction at exactly these places () the result has been a slight slowdown instead of a speedup. But that might have been accidental, as it is hard to do measurements with real-world code where a 5% slowdown/speedup might be caused by something completely unrelated.

Bye, Tassilo
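The mechanical rewrite discussed above can be written out with a concrete source vector (hypothetical data; note that eduction takes the coll last, so the ->> threading still works):

```clojure
;; before: each step builds an intermediate lazy seq
(->> [1 2 3] (interpose 5) (partition-all 2))
;; => ((1 5) (2 5) (3))

;; after: one composed transducer pass over the source,
;; realized here with into for demonstration
(->> [1 2 3] (eduction (interpose 5) (partition-all 2)) (into []))
;; => [[1 5] [2 5] [3]]
```

Same elements either way; the eduction form avoids materializing the intermediate (interpose 5) result.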
Re: Transducers: sequence versus eduction
On Wed, Apr 1, 2015 at 2:17 PM, Tassilo Horn t...@gnu.org wrote:

> Alex Miller a...@puredanger.com writes:
>
> Hi Alex,
>
>> - Eduction is no longer Seqable and thus the return from eduction is
>>   not seqable (but it is reducible and iterable). You can use
>>   iterator-seq to get a chunked seq over the top if you need one.
>
> Really?
>
> user=> *clojure-version*
> {:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
> user=> (seq (eduction (map inc) (range 10)))
> (1 2 3 4 5 6 7 8 9 10)

Yep. :)

user=> (supers (class (eduction (map inc) (range 10))))
#{clojure.lang.Sequential java.lang.Object java.lang.Iterable clojure.lang.IReduceInit clojure.lang.IType}

However, seq checks for Iterable and will automatically use iterator-seq to produce a seq from an Iterable, which is what's happening here. See: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L517

>> - There are lots of performance implications due to those changes and
>>   I would recommend re-testing any perf test related to sequence or
>>   eduction on alpha6 to get a fresh picture as anything pre-alpha6 is
>>   not comparable.
>
> My observations are all based on today's experience with alpha6. :-)
>
>> - eduction now takes multiple transformations, not just one, and
>>   composes them. This is designed for mechanical rewriting (hello
>>   tool developers!!) of ->> chains like this:
>>   (->> s (interpose 5) (partition-all 2)) to this:
>>   (->> s (eduction (interpose 5) (partition-all 2)))
>
> Ah, that's sensible.
>
>> The general idea is that eduction is best when the result will be
>> completely consumed in a reducible context. Any case of reusing the
>> result will likely be better served by sequence which can cache and
>> reuse the answer.
>
> Yes, that's what I guessed. But at least when I tested replacing
> sequence with eduction at exactly these places () the result has been
> a slight slowdown instead of a speedup. But that might have been
> accidental, as it is hard to do measurements with real-world code
> where a 5% slowdown/speedup might be caused by something completely
> unrelated.

Are you including the cost of realizing the elements? Comparing this stuff well is hard, or at least I had a hard time thinking through all the nuances. From what I've looked at, I think the alpha6 changes have made things slightly worse for sequence with transformations and somewhat better for eductions used in a reducible or sequence context.

> Bye, Tassilo
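The point about including the cost of realizing the elements matters for any benchmark here: timing an unrealized lazy seq measures almost nothing. A hedged sketch of a fairer comparison (sizes are illustrative, not from the thread):

```clojure
;; Force full realization in both arms so the work measured is comparable.
(time (count (sequence (map inc) (range 1000000))))        ; realize + cache
(time (count (into [] (eduction (map inc) (range 1000000))))) ; one reduce pass

;; Timing (sequence (map inc) (range 1000000)) without forcing it would
;; mostly measure constructing the lazy wrapper, not the transformation.
```

count forces the lazy seq in the first arm, and into [] consumes the eduction in a reducible context in the second.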
Re: Transducers: sequence versus eduction
Alex Miller a...@puredanger.com writes:

>>> seqable (but it is reducible and iterable). You can use iterator-seq
>>> to get a chunked seq over the top if you need one.
>>
>> Really?
>>
>> user=> *clojure-version*
>> {:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
>> user=> (seq (eduction (map inc) (range 10)))
>> (1 2 3 4 5 6 7 8 9 10)
>
> Yep. :)
>
> user=> (supers (class (eduction (map inc) (range 10))))
> #{clojure.lang.Sequential java.lang.Object java.lang.Iterable clojure.lang.IReduceInit clojure.lang.IType}
>
> However, seq checks for Iterable and will automatically use
> iterator-seq to produce a seq from an Iterable, which is what's
> happening here.

Ok. But to me, if I can call `seq` on a thing and iterate it using `first` and `rest`, that's a seqable thing to me. :-)

>>> The general idea is that eduction is best when the result will be
>>> completely consumed in a reducible context. Any case of reusing the
>>> result will likely be better served by sequence which can cache and
>>> reuse the answer.
>>
>> Yes, that's what I guessed. But at least when I tested replacing
>> sequence with eduction at exactly these places () the result has been
>> a slight slowdown instead of a speedup. But that might have been
>> accidental, as it is hard to do measurements with real-world code
>> where a 5% slowdown/speedup might be caused by something completely
>> unrelated.
>
> Are you including the cost of realizing the elements?

Oh, yes, and not only that. The tests I did maybe spend 20% of their time in the transducer part, 80% in other stuff. So I definitely have to do some more elaborate benchmarking where the transducer part is somehow isolated. That's why I've asked if there are clear usage recipes for when to use what.

I think my prime use-case is deeply nested mapcatting, where the mapcatted function gets an object and returns a Java collection, and filtering with usually a quite cheap predicate. E.g.

(sequence-or-eduction (comp (mapcat ...) (filter ...)
                            (mapcat ...) ... (filter ...))
                      coll)

That's part of a larger search algorithm which either stops after finding the first match (in which case the resulting sequence is likely not to be realized completely) or runs till all matches are found (in which case all these transducer-produced sequences would be fully realized). If that makes sense, I could use eduction when I know everything will be realized and sequence in the other case (or vice versa).

Bye, Tassilo
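The search pattern described above can be sketched with hypothetical stand-ins for the elided functions (foo1 and p1 below are placeholders, not from the actual code):

```clojure
(def foo1 range)   ; hypothetical: expand an object into a collection
(def p1 odd?)      ; hypothetical: a cheap predicate

(defn matches [coll]
  (sequence (comp (mapcat foo1) (filter p1)) coll))

;; first-match case: seq laziness realizes only a (chunked) prefix
(first (matches (range 10)))   ; => 1

;; all-matches case: the whole result is realized, which is where an
;; eduction consumed by into/reduce would skip the cached seq cells
(count (matches (range 10)))   ; => 20
```

This is where the "how will the result be consumed" question from earlier in the thread decides between sequence and eduction.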
Re: Transducers: sequence versus eduction
On Wed, Apr 1, 2015 at 3:17 PM, Tassilo Horn t...@gnu.org wrote:

> Alex Miller a...@puredanger.com writes:
>
> Ok. But to me, if I can call `seq` on a thing and iterate it using
> `first` and `rest`, that's a seqable thing to me. :-)

Fair enough. I just meant it no longer implements Seqable. :)

>> The general idea is that eduction is best when the result will be
>> completely consumed in a reducible context. Any case of reusing the
>> result will likely be better served by sequence which can cache and
>> reuse the answer.
>>
>> Yes, that's what I guessed. But at least when I tested replacing
>> sequence with eduction at exactly these places () the result has been
>> a slight slowdown instead of a speedup. But that might have been
>> accidental, as it is hard to do measurements with real-world code
>> where a 5% slowdown/speedup might be caused by something completely
>> unrelated.
>>
>> Are you including the cost of realizing the elements?
>
> Oh, yes, and not only that. The tests I did maybe spend 20% of their
> time in the transducer part, 80% in other stuff. So I definitely have
> to do some more elaborate benchmarking where the transducer part is
> somehow isolated. That's why I've asked if there are clear usage
> recipes for when to use what.
>
> I think my prime use-case is deeply nested mapcatting, where the
> mapcatted function gets an object and returns a Java collection, and
> filtering with usually a quite cheap predicate. E.g.
>
> (sequence-or-eduction (comp (mapcat ...) (filter ...)
>                             (mapcat ...) ... (filter ...))
>                       coll)
>
> That's part of a larger search algorithm which either stops after
> finding the first match (in which case the resulting sequence is
> likely not to be realized completely) or runs till all matches are
> found (in which case all these transducer-produced sequences would be
> fully realized). If that makes sense, I could use eduction when I know
> everything will be realized and sequence in the other case (or vice
> versa).

In this case, I don't know that I'd expect much difference. sequence is going to create a TransformerIterator and wrap it in a chunked iterator-seq. Using (seq (eduction ...)) should give you exactly the same thing, except calling through an Eduction type. So, I doubt it matters.

> Bye, Tassilo
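The near-equivalence claimed above is easy to check at the REPL (a sketch; this compares elements, not object identity):

```clojure
;; Both paths end up as a TransformerIterator wrapped in a chunked
;; iterator-seq, so the realized elements are identical.
(def a (sequence (map inc) (range 5)))
(def b (seq (eduction (map inc) (range 5))))

(= a b)    ; => true
(class b)  ; a chunked iterator-seq wrapper class, per the discussion
```

The only structural difference is that the eduction path dispatches through the Eduction type's Iterable implementation first.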