Re: Transducers: sequence versus eduction

2015-04-03 Thread Steve Miner

 On Apr 1, 2015, at 11:16 AM, Alex Miller a...@puredanger.com wrote:
 
 - eduction now takes multiple transformations, not just one, and composes 
 them. This is designed for mechanical rewriting (hello tool developers!!) of 
 ->> chains like this:
 
 (->> s (interpose 5) (partition-all 2))
 
 to this:
 
 (->> s (eduction (interpose 5) (partition-all 2)))
 

Maybe it’s just me, but the eduction argument order just looks strange to me.  
For a variadic function, I would have expected the single collection to come 
first, then any number of xforms.  There must be a better reason than 
mechanical rewriting.  Wouldn't a macro make more sense?

(defmacro educe->> [coll & xfs] `(->Eduction (comp ~@xfs) ~coll))

So the rewrite could be just educe->> for ->>, without having to wrap the 
xforms at all.

(educe->> s (interpose 5) (partition-all 2))
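
A rough sketch of what that hypothetical macro would expand to (namespace
qualifiers elided):

  (macroexpand-1 '(educe->> s (interpose 5) (partition-all 2)))
  ;;=> (->Eduction (comp (interpose 5) (partition-all 2)) s)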



Steve Miner
stevemi...@gmail.com








Re: Transducers: sequence versus eduction

2015-04-03 Thread Alex Miller
I understand your point and there are several competing comparisons here. 
Generally only collection functions take the coll first. eduction is 
similar to sequence (and into and reduce and transduce) in taking it last 
(but differs in taking multiple xforms). The multiple xforms are similar to 
->> though, which works left to right and puts the source first. I think I'd 
rather emphasize its similarity to sequence than its similarity to ->>.
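
For example (a quick sketch), all of these take the collection last:

  (sequence (comp (interpose 5) (partition-all 2)) [1 2 3])  ;=> ((1 5) (2 5) (3))
  (eduction (interpose 5) (partition-all 2) [1 2 3])         ;; same elements, xforms composed for you
  (transduce (interpose 5) conj [] [1 2 3])                  ;=> [1 5 2 5 3]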


On Friday, April 3, 2015 at 10:08:38 AM UTC-5, miner wrote:


  On Apr 1, 2015, at 11:16 AM, Alex Miller al...@puredanger.com wrote: 
  
  - eduction now takes multiple transformations, not just one, and 
 composes them. This is designed for mechanical rewriting (hello tool 
 developers!!) of ->> chains like this: 
  
  (->> s (interpose 5) (partition-all 2)) 
  
  to this: 
  
  (->> s (eduction (interpose 5) (partition-all 2))) 
  

 Maybe it’s just me, but the eduction argument order just looks strange to 
 me.  For a variadic function, I would have expected the single collection 
 to come first, then any number of xforms.  There must be a better reason 
 than mechanical rewriting.  Wouldn't a macro make more sense? 

 (defmacro educe->> [coll & xfs] `(->Eduction (comp ~@xfs) ~coll)) 

 So the rewrite could be just educe->> for ->>, without having to wrap the 
 xforms at all. 

 (educe->> s (interpose 5) (partition-all 2)) 



 Steve Miner 
 steve...@gmail.com 










Re: Transducers: sequence versus eduction

2015-04-02 Thread Tassilo Horn
Alex Miller a...@puredanger.com writes:

 Ok. But to me, if I can call `seq` on a thing and iterate it using
 `first` and `rest`, that's a seqable thing to me. :-)

 Fair enough. I just meant it no longer implements Seqable. :) 

Yes, I got that.  But I think that's an implementation detail.  I go
with the seqable definition of Clojure Programming which says

  The set of types that are /seqable/ -- that is, those for which `seq`
  can produce a valid value -- include:

- All Clojure collection types
- All Java collections
- ...
- Anything that implements Clojure's clojure.lang.Seqable interface

So we can agree on eduction not being a Seqable but still being
seqable. :-)
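
A quick illustration (as of alpha6):

  (instance? clojure.lang.Seqable (eduction (map inc) (range 3)))  ;=> false
  (seq (eduction (map inc) (range 3)))                             ;=> (1 2 3)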

 I think my prime use-case is deeply nested mapcatting where the
 mapcatted function gets an object and returns a java collection, and
 filtering with usually a quite cheap predicate. E.g.

 (sequence-or-eduction (comp (mapcat ...)
 (filter ...)
 (mapcat ...)
 ...
 (filter ...))
 coll)

 That's part of a larger search algorithm which either stops after
 finding the first match (in which case the resulting sequence is likely
 not to be realized completely) or runs till all matches are found (in
 which case all these transducer-produced sequences would be fully
 realized).

 If that would make sense, I could use eduction when I know everything
 will be realized and sequence in the other case (or vice versa).

 In this case, I don't know that I'd expect much difference. sequence
 is going to create a TransformerIterator, and wrap it in a
 chunkedIteratorSeq. Using (seq (eduction ...)) should give you exactly
 the same thing, except calling through an Eduction type.  So, I doubt
 it matters.

Ok, thanks.

Could you also comment on my other question namely the full realization
of intermediate operations in:

,----[ Docs of sequence at http://clojure.org/transducers ]
| The resulting sequence elements are incrementally computed. These
| sequences will consume input incrementally as needed and fully realize
| intermediate operations.  This behavior differs from the equivalent
| operations on lazy sequences.
`----

Does that mean I'm better off with a traditionally nested
map/mapcat/filter cascade instead of transducers in the case where the
intermediate mapcatting functions return possibly not so cheap/large
lazy sequences and it is likely that I won't fully realize the result
sequence or eduction?

Hm, I now benchmarked a bit with criterium and measured

  (sequence (comp (mapcat #(range %))
  (mapcat #(range %)))
(range 1000))

versus

  (eduction (comp (mapcat #(range %))
  (mapcat #(range %)))
(range 1000))

versus the traditional

  (mapcat #(range %)
  (mapcat #(range %)
  (range 1000)))

and either taking just the first element of the result using (first
...), taking the first 1000 elements of the result using (dorun (take
1000 ...)), or realizing everything using (dorun ...).
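
(A sketch of how such a measurement can be run with criterium's
quick-bench, e.g. for the first-1000 case:)

  (require '[criterium.core :refer [quick-bench]])

  (quick-bench
    (dorun (take 1000 (sequence (comp (mapcat #(range %))
                                      (mapcat #(range %)))
                                (range 1000)))))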

These are the timings:

| Version            | first | first 1000 | everything |
|--------------------+-------+------------+------------|
| transd. (sequence) | 10 μs |     214 μs |     21 sec |
| transd. (eduction) | 10 μs |     216 μs |     21 sec |
| traditional        |  3 μs |     151 μs |      7 sec |

So as you see, the traditional version is about three times faster in
the "first" and "everything" scenarios, and just a little bit faster in
the "first 1000" scenario.

When I test

  (sequence (comp (mapcat #(range % 1000))
  (mapcat #(range % 1000)))
(range 1000))

versus

  (eduction (comp (mapcat #(range % 1000))
  (mapcat #(range % 1000)))
(range 1000))

versus

  (mapcat #(range % 1000)
  (mapcat #(range % 1000)
  (range 1000)))

so that the larger intermediate sequences come first, I get these
timings:

| Version            | first | first 1000 | everything |
|--------------------+-------+------------+------------|
| transd. (sequence) | 24 ms |      24 ms |     21 sec |
| transd. (eduction) | 24 ms |      24 ms |     21 sec |
| traditional        |  4 μs |     102 μs |      7 sec |

So here the results are even worse than above.  If you don't fully
realize the resulting sequence, the traditional version can be many
orders of magnitude faster.  I guess that's because of the cited
statement above, i.e., intermediate operations are always fully
realized.

However, I had at least expected that in the case where all elements are
realized, the transducer version would be faster than the traditional
version, which also needs to fully realize all intermediate lazy seqs.
Why is it still three times slower?

So my conclusion is that you cannot use transducers as a kind of drop-in
replacement of traditional sequence manipulation functions.  They pay
off only when you can make very strong assumptions about the sizes and
computation costs of intermediate collections, and I think you cannot do
that in general.  Or well, maybe you can when you program an application
but you almost certainly cannot when you program a library and thus have
no clue about how that's gonna be used by users.

Re: Transducers: sequence versus eduction

2015-04-02 Thread Alex Miller




 On Apr 2, 2015, at 4:09 AM, Tassilo Horn t...@gnu.org wrote:

 So we can agree on eduction not being a Seqable but still being
 seqable. :-)

Agreed. :)

 
I think my prime use-case is deeply nested mapcatting where the
mapcatted function gets an object and returns a java collection, and
filtering with usually a quite cheap predicate. E.g.
 
(sequence-or-eduction (comp (mapcat ...)
(filter ...)
(mapcat ...)
...
(filter ...))
coll)
 
That's part of a larger search algorithm which either stops after
finding the first match (in which case the resulting sequence is likely
not to be realized completely) or runs till all matches are found (in
which case all these transducer-produced sequences would be fully
realized).
 
If that would make sense, I could use eduction when I know everything
will be realized and sequence in the other case (or vice versa).
 
 In this case, I don't know that I'd expect much difference. sequence
 is going to create a TransformerIterator, and wrap it in a
 chunkedIteratorSeq. Using (seq (eduction ...)) should give you exactly
 the same thing, except calling through an Eduction type.  So, I doubt
 it matters.
 
 Ok, thanks.
 
 Could you also comment on my other question namely the full realization
 of intermediate operations in:
 
 ,----[ Docs of sequence at http://clojure.org/transducers ]
 | The resulting sequence elements are incrementally computed. These
 | sequences will consume input incrementally as needed and fully realize
 | intermediate operations.  This behavior differs from the equivalent
 | operations on lazy sequences.
 `----
 
 Does that mean I'm better off with a traditionally nested
 map/mapcat/filter cascade instead of transducers in the case where the
 intermediate mapcatting functions return possibly not so cheap/large
 lazy sequences and it is likely that I won't fully realize the result
 sequence or eduction?

If you're going to use expanding transformations and not realize all of the 
results then I think sequences are likely a better choice for you.

 
 Hm, I now benchmarked a bit with criterium and measured
 
  (sequence (comp (mapcat #(range %))
  (mapcat #(range %)))
(range 1000))
 
 versus
 
  (eduction (comp (mapcat #(range %))
  (mapcat #(range %)))
(range 1000))
 
 versus the traditional
 
  (mapcat #(range %)
  (mapcat #(range %)
  (range 1000)))
 
 and either taking just the first element of the result using (first
 ...), taking the first 1000 elements of the result using (dorun (take
 1000 ...)), or realizing everything using (dorun ...).
 
 These are the timings:
 
 | Version            | first | first 1000 | everything |
 |--------------------+-------+------------+------------|
 | transd. (sequence) | 10 μs |     214 μs |     21 sec |
 | transd. (eduction) | 10 μs |     216 μs |     21 sec |
 | traditional        |  3 μs |     151 μs |      7 sec |
 
 So as you see, the traditional version is about three times faster in
 the first and everything scenario, and just a littlebit faster in the
 first 1000 scenario.
 
 When I test
 
  (sequence (comp (mapcat #(range % 1000))
  (mapcat #(range % 1000)))
(range 1000))
 
 versus
 
  (eduction (comp (mapcat #(range % 1000))
  (mapcat #(range % 1000)))
(range 1000))
 
 versus
 
  (mapcat #(range % 1000)
  (mapcat #(range % 1000)
  (range 1000)))
 
 so that the larger intermediate sequences come first, I get these
 timings:
 
 | Version            | first | first 1000 | everything |
 |--------------------+-------+------------+------------|
 | transd. (sequence) | 24 ms |      24 ms |     21 sec |
 | transd. (eduction) | 24 ms |      24 ms |     21 sec |
 | traditional        |  4 μs |     102 μs |      7 sec |
 
 So here the results are even much worse than above.  If you don't fully
 realize the resulting sequence the traditional version can be many
 orders of magnitudes faster.  I guess that's because of the cited
 statement above, i.e., intermediate operations are always fully
 realized.
 
 However, at least I had expected that in the case where all elements are
 realized the transducer version should have been faster than the
 traditional version which also needs to fully realize all intermediate
 lazy seqs.  Why is it still three times slower?

I think my main suggestion here is that you are using a non-reducible source 
(range) throughout these timings, so transducers have no leverage on the input 
side. CLJ-1515 will make range reducible and should help a lot on this 
particular example.
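
As a sketch of what leverage on the input side means: with an already
reducible source such as a vector, transduce can then drive the whole
pipeline in one pass, e.g.

  (transduce (comp (mapcat #(range %)) (mapcat #(range %))) + 0 (vec (range 100)))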

 
 So my conclusion is that you cannot use transducers as a kind of drop-in
 replacement of traditional sequence manipulation functions.  They pay
 off only when you can make very strong assumptions about the sizes and
 computation costs of intermediate collections, and I think you cannot
 

Re: Transducers: sequence versus eduction

2015-04-02 Thread Tassilo Horn
Alex Miller a...@puredanger.com writes:

Hi Alex,

 Just for fun, I ran the (dorun (sequence (comp (mapcat #(range %))
 (mapcat #(range %))) (range 1000))) and eduction version with the
 CLJ-1515 patch. I saw ~20 seconds before and 15 seconds after.

 But the new version also makes pure seqs better and I saw the full
 (dorun ...)  drop from 6 seconds before to 3 seconds after too.

 Just a hint of the benefits coming in CLJ-1515.

Awesome!  Can't wait for alpha7 then. :-)

Bye,
Tassilo



Re: Transducers: sequence versus eduction

2015-04-02 Thread Alex Miller
Just for fun, I ran the (dorun (sequence (comp (mapcat #(range %)) (mapcat 
#(range %))) (range 1000))) and eduction version with the CLJ-1515 patch. I 
saw ~20 seconds before and 15 seconds after. 

But the new version also makes pure seqs better and I saw the full (dorun 
...) drop from 6 seconds before to 3 seconds after too.

Just a hint of the benefits coming in CLJ-1515.



On Thursday, April 2, 2015 at 9:21:08 AM UTC-5, Michał Marczyk wrote:

 It may be worth noting that while the return value of range is wrapped in 
 lazy-seq and thus isn't itself a clojure.lang.IChunkedSeq, what you get 
 when you realize it is indeed chunked:

  (contains? (ancestors (class (seq (range 128)))) clojure.lang.IChunkedSeq)
  ;;=> true

 It doesn't implement c.l.IReduce, but 
 clojure.core.protocols/InternalReduce has an implementation for 
 c.l.IChunkedSeq. At least transduce should be able to take advantage of the 
 InternalReduce implementation (via CollReduce). transduce could be used for 
 short-circuiting search with (reduced …), so it might be a legitimate 
 contender here.

 Cheers,
 Michał


 On 2 April 2015 at 15:38, Tassilo Horn ts...@gnu.org wrote:

 Alex Miller al...@puredanger.com writes:

 Hi Alex,

  If you're going to use expanding transformations and not realize all of 
 the
  results then I think sequences are likely a better choice for you.

 Ok, I see.

  However, at least I had expected that in the case where all elements
  are realized the transducer version should have been faster than the
  traditional version which also needs to fully realize all
  intermediate lazy seqs.  Why is it still three times slower?
 
  I think my main suggestion here is that you are using a non-reducible
  source (range) throughout these timings, so transducers have no
  leverage on the input side. CLJ-1515 will make range reducible and
  should help a lot on this particular example.

 Well, even if I revamp the (admittedly contrieved) example to have
 reducible vectors as source and also intermediates

   (let [v (vec (range 0 1000))
         vs (zipmap (range 0 1000)
                    (for [i (range 0 1000)]
                      (vec (range i 1000))))]
     (time (dorun (sequence (comp (mapcat (fn [i] (vs i)))
                                  (mapcat (fn [i] (vs i))))
                            v))))

 it still takes 18 seconds instead of 21 with lazy seqs produced by
 range, or just 7 seconds with normal lazy seq functions.

 In my real scenario, I think there's also no IReduces paths because the
 mapcat functions either return normal lazy seqs or Java Collections
 (which are not actually clojure collections).  But usually, the
 transformations are not so freaking expanding as the example above.  I
 benchmarked a bit, and there sometimes using transducers is faster and
 sometimes it is not.  So I've made than configurable (with normal lazy
 seqs as default) so users can benchmark and then decide, and I don't
 need to choose for them. :-)

 Oh, and actually *you* have made that possible by making me aware of

   (sequence (comp xform*) start-coll)

 is almost identical to

    (->> start-coll xform*) 

 that is, when my macro computes xforms as if they were meant for
  transducers, I can also use them traditionally with ->>. 

  Until now, I've never used ->> but before I had implemented the 
 expansion for transducers, I used a for with gensyms for intermediates
 like:

   (for [G__1 start-coll
 G__2 (xform1 G__1)
 G__3 (xform2 G__2)]
 G__3)

 That's pretty much different to generate.  But since the xforms for
  transducers and ->> are the same, switching between lazy seq fns and 
 transducers is just changing how start-coll and xforms are composed.
 Awesome!

  So my conclusion is that you cannot use transducers as a kind of
  drop-in replacement of traditional sequence manipulation functions.
  They pay off only when you can make very strong assumptions about the
   sizes and computation costs of intermediate collections, and I
  think you cannot do that in general.  Or well, maybe you can when you
  program an application but you almost certainly cannot when you
  program a library and thus have no clue about how that's gonna be
  used by users.
 
  Transducers make different trade offs than sequences and there will
  always be cases where one or the other is a better choice.  I really
  appreciate this thread as highlighting some of the nuances.

 Yes, thanks a lot for your patience.  I appreciate that very much.

  Transducers break transformations into three parts - source iteration,
  composed transforms, and output collection.  In the case of reducible
  inputs, multiple transforms, and full realization, transducers can be
  much faster.  If not all of those are in play, then the results are
  more subtle.  One thing I've found in perf testing a lot of stuff is
  that chunked sequences continually surprise me at how fast they can
  be.

 Then maybe I should experiment with chunked seqs.

Re: Transducers: sequence versus eduction

2015-04-02 Thread Tassilo Horn
Alex Miller a...@puredanger.com writes:

Hi Alex,

 If you're going to use expanding transformations and not realize all of the
 results then I think sequences are likely a better choice for you.

Ok, I see.

 However, at least I had expected that in the case where all elements
 are realized the transducer version should have been faster than the
 traditional version which also needs to fully realize all
 intermediate lazy seqs.  Why is it still three times slower?

 I think my main suggestion here is that you are using a non-reducible
 source (range) throughout these timings, so transducers have no
 leverage on the input side. CLJ-1515 will make range reducible and
 should help a lot on this particular example.

Well, even if I revamp the (admittedly contrived) example to have
reducible vectors as source and also intermediates

  (let [v (vec (range 0 1000))
        vs (zipmap (range 0 1000)
                   (for [i (range 0 1000)]
                     (vec (range i 1000))))]
    (time (dorun (sequence (comp (mapcat (fn [i] (vs i)))
                                 (mapcat (fn [i] (vs i))))
                           v))))

it still takes 18 seconds instead of 21 with lazy seqs produced by
range, or just 7 seconds with normal lazy seq functions.

In my real scenario, I think there are also no IReduce paths because the
mapcat functions either return normal lazy seqs or Java Collections
(which are not actually clojure collections).  But usually, the
transformations are not so freaking expanding as the example above.  I
benchmarked a bit, and there sometimes using transducers is faster and
sometimes it is not.  So I've made that configurable (with normal lazy
seqs as default) so users can benchmark and then decide, and I don't
need to choose for them. :-)

Oh, and actually *you* have made that possible by making me aware that

  (sequence (comp xform*) start-coll)

is almost identical to

  (->> start-coll xform*)

that is, when my macro computes xforms as if they were meant for
transducing, I can also use them traditionally with ->>.

Until now, I've never used ->> but before I had implemented the
expansion for transducers, I used a for with gensyms for intermediates
like:

  (for [G__1 start-coll
G__2 (xform1 G__1)
G__3 (xform2 G__2)]
G__3)

That's pretty much different to generate.  But since the xforms for
transducers and ->> are the same, switching between lazy seq fns and
transducers is just changing how start-coll and xforms are composed.
Awesome!
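
(I.e., the very same written forms can either be threaded or composed,
a sketch:)

  (->> (range 10) (mapcat range) (filter odd?))               ; lazy-seq pipeline
  (sequence (comp (mapcat range) (filter odd?)) (range 10))   ; transducer pipeline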

 So my conclusion is that you cannot use transducers as a kind of
 drop-in replacement of traditional sequence manipulation functions.
 They pay off only when you can make very strong assumptions about the
  sizes and computation costs of intermediate collections, and I
 think you cannot do that in general.  Or well, maybe you can when you
 program an application but you almost certainly cannot when you
 program a library and thus have no clue about how that's gonna be
 used by users.

 Transducers make different trade offs than sequences and there will
 always be cases where one or the other is a better choice.  I really
 appreciate this thread as highlighting some of the nuances.

Yes, thanks a lot for your patience.  I appreciate that very much.

 Transducers break transformations into three parts - source iteration,
 composed transforms, and output collection.  In the case of reducible
 inputs, multiple transforms, and full realization, transducers can be
 much faster.  If not all of those are in play, then the results are
 more subtle.  One thing I've found in perf testing a lot of stuff is
 that chunked sequences continually surprise me at how fast they can
 be.

Then maybe I should experiment with chunked seqs.

Bye,
Tassilo



Re: Transducers: sequence versus eduction

2015-04-02 Thread Michał Marczyk
It may be worth noting that while the return value of range is wrapped in
lazy-seq and thus isn't itself a clojure.lang.IChunkedSeq, what you get
when you realize it is indeed chunked:

(contains? (ancestors (class (seq (range 128)))) clojure.lang.IChunkedSeq)
;;=> true

It doesn't implement c.l.IReduce, but clojure.core.protocols/InternalReduce
has an implementation for c.l.IChunkedSeq. At least transduce should be
able to take advantage of the InternalReduce implementation (via
CollReduce). transduce could be used for short-circuiting search with
(reduced …), so it might be a legitimate contender here.
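
A sketch of what such a short-circuiting search could look like; the
reducing function bails out via reduced on the first hit:

  (transduce (filter odd?)
             (completing (fn [_ x] (reduced x)))
             nil
             (range 1000000))
  ;;=> 1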

Cheers,
Michał


On 2 April 2015 at 15:38, Tassilo Horn t...@gnu.org wrote:

 Alex Miller a...@puredanger.com writes:

 Hi Alex,

  If you're going to use expanding transformations and not realize all of
 the
  results then I think sequences are likely a better choice for you.

 Ok, I see.

  However, at least I had expected that in the case where all elements
  are realized the transducer version should have been faster than the
  traditional version which also needs to fully realize all
  intermediate lazy seqs.  Why is it still three times slower?
 
  I think my main suggestion here is that you are using a non-reducible
  source (range) throughout these timings, so transducers have no
  leverage on the input side. CLJ-1515 will make range reducible and
  should help a lot on this particular example.

 Well, even if I revamp the (admittedly contrieved) example to have
 reducible vectors as source and also intermediates

    (let [v (vec (range 0 1000))
          vs (zipmap (range 0 1000)
                     (for [i (range 0 1000)]
                       (vec (range i 1000))))]
      (time (dorun (sequence (comp (mapcat (fn [i] (vs i)))
                                   (mapcat (fn [i] (vs i))))
                             v))))

 it still takes 18 seconds instead of 21 with lazy seqs produced by
 range, or just 7 seconds with normal lazy seq functions.

 In my real scenario, I think there's also no IReduces paths because the
 mapcat functions either return normal lazy seqs or Java Collections
 (which are not actually clojure collections).  But usually, the
 transformations are not so freaking expanding as the example above.  I
 benchmarked a bit, and there sometimes using transducers is faster and
 sometimes it is not.  So I've made than configurable (with normal lazy
 seqs as default) so users can benchmark and then decide, and I don't
 need to choose for them. :-)

 Oh, and actually *you* have made that possible by making me aware of

   (sequence (comp xform*) start-coll)

 is almost identical to

    (->> start-coll xform*) 

 that is, when my macro computes xforms as if they were meant for
  transducers, I can also use them traditionally with ->>. 

  Until now, I've never used ->> but before I had implemented the 
 expansion for transducers, I used a for with gensyms for intermediates
 like:

   (for [G__1 start-coll
 G__2 (xform1 G__1)
 G__3 (xform2 G__2)]
 G__3)

 That's pretty much different to generate.  But since the xforms for
  transducers and ->> are the same, switching between lazy seq fns and 
 transducers is just changing how start-coll and xforms are composed.
 Awesome!

  So my conclusion is that you cannot use transducers as a kind of
  drop-in replacement of traditional sequence manipulation functions.
  They pay off only when you can make very strong assumptions about the
   sizes and computation costs of intermediate collections, and I
  think you cannot do that in general.  Or well, maybe you can when you
  program an application but you almost certainly cannot when you
  program a library and thus have no clue about how that's gonna be
  used by users.
 
  Transducers make different trade offs than sequences and there will
  always be cases where one or the other is a better choice.  I really
  appreciate this thread as highlighting some of the nuances.

 Yes, thanks a lot for your patience.  I appreciate that very much.

  Transducers break transformations into three parts - source iteration,
  composed transforms, and output collection.  In the case of reducible
  inputs, multiple transforms, and full realization, transducers can be
  much faster.  If not all of those are in play, then the results are
  more subtle.  One thing I've found in perf testing a lot of stuff is
  that chunked sequences continually surprise me at how fast they can
  be.

 Then maybe I should experiment with chunked seqs.

 Bye,
 Tassilo


Re: Transducers: sequence versus eduction

2015-04-01 Thread Tassilo Horn
vve...@gmail.com writes:

 Eduction retains the ability to be recomposed with other transducers
 higher in the function chain. The following two are nearly equivalent:

 (transduce (take 1e2) + (eduction (filter odd?) (range)))
 (transduce (comp (filter odd?) (take 1e2)) + (range))

 This will be slower:
 (transduce (take 1e2) + (sequence (filter odd?) (range)))

 Execution time mean : 19.054407 µs
 Execution time mean : 19.530890 µs
 Execution time mean : 39.955692 µs

Interesting.  But in my code, experimentation has shown that sequence is
almost always faster in my use-cases which usually look like

  (sequence (comp (mapcat foo1)
  (filter p1)
  (map f1)
  (mapcat foo2)
  (filter p2)
  (mapcat foo3)
  (filter p3))
 coll)

Here, I've switched between making the foo* functions return eductions
or lazy sequences, and the latter always seems to be faster although that
looks like a use-case for eduction based on my assumptions...
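
(Concretely, the two variants of such a foo* step look like this; foo-lazy
and foo-educ are made-up names:)

  (defn foo-lazy [x] (map inc (range x)))             ; step returns a lazy seq
  (defn foo-educ [x] (eduction (map inc) (range x)))  ; step returns an eduction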

So maybe eduction can unfold its full potential only during transduce?

Bye,
Tassilo



Re: Transducers: sequence versus eduction

2015-04-01 Thread vvedee
Eduction retains the ability to be recomposed with other transducers higher 
in the function chain. The following two are nearly equivalent:
(transduce (take 1e2) + (eduction (filter odd?) (range)))
(transduce (comp (filter odd?) (take 1e2)) + (range))

This will be slower:
(transduce (take 1e2) + (sequence (filter odd?) (range)))

Execution time mean : 19.054407 µs
Execution time mean : 19.530890 µs
Execution time mean : 39.955692 µs

I also wonder to what extent eduction can be used as a drop-in
replacement for sequence.


On Wednesday, April 1, 2015 at 9:13:07 AM UTC+2, Tassilo Horn wrote:

 Hi all, 

 I've switched many nested filter/map/mapcat applications in my code to 
 using transducers.  That brought a moderate speedup in certain cases 
 and the deeper the nesting has been before, the clearer the transducers 
 code is in comparison, so yay! :-) 

 However, I'm still quite unsure about the difference between `sequence` 
 and `eduction`.  From the docs and experimentation, I came to the 
 assumptions below and I'd be grateful if someone with more knowledge 
 could verify/falsify/add: 

   - Return types differ: Sequence returns a standard lazy seq, eductions 
 an instance of Eduction. 

   - Eductions are reducible/seqable/iterable, i.e., basically I can use 
 them wherever a (lazy) seq would also do, so sequence and eduction 
 are quite interchangeable except when poking at internals, e.g., 
 (.contains (sequence ...) x) works whereas (.contains (eduction ...) 
 x) doesn't. 

   - Both compute their contents lazily. 

   - Lazy seqs cache their already realized contents, eductions compute 
 them over and over again on each iteration. 

 Because of that, I came to the conclusion that whenever I ask myself if 
 one of my functions should return a lazy seq or an eduction, I should 
 use these rules: 

   1. If the function is likely to be used like 

  (let [xs (seq-producing-fn args)] 
(or (do-stuff-with xs) 
(do-other-stuff-with xs) 
...)) 

  that is, the resulting seq is likely to be bound to a variable 
  which is then used multiple times (and thus lazy seq caching is 
  beneficial), then use sequence. 

   2. If it is a private function only used internally and never with the 
  usage pattern of point 1, then definitively use eduction. 

   3. If it's a public function which usually isn't used with a pattern as 
  in point 1, then I'm unsure.  eduction is probably more efficient 
  but sequence fits better in the original "almost everything returns 
  a lazy seq" design.  Also, the latter has the benefit that users of 
  the library don't need to know anything about transducers. 

 Is that sensible?  Or am I completely wrong with my assumptions about 
 sequence and eduction? 

 On a related note, could someone please clarify the statement from the 
 transducers docs for `sequence`? 

 ,----[ Docs of sequence at http://clojure.org/transducers ] 
 | The resulting sequence elements are incrementally computed. These 
 | sequences will consume input incrementally as needed and fully realize 
 | intermediate operations.  This behavior differs from the equivalent 
 | operations on lazy sequences. 
 `---- 

 I'm curious about the "fully realize intermediate operations" part. 
 Does it mean that in a traditional 

 (mapcat #(range %) (range 1)) 

 the inner range is also evaluated lazy but with 

 (sequence (mapcat #(range %)) (range 1)) 

 it is not?  It seems so.  At least dorun-ning these two expressions 
 shows that the traditional version is more than twice as fast as the 
 transducer version.  Also, the same seems to hold for 

 (eduction (mapcat #(range %)) (range 1)) 

 which is exactly as fast (or rather slow) as the sequence version. 

 But wouldn't that mean that using transducers with mapcat where the 
 mapcatted function isn't super-cheap is a bad idea in general, at least 
 from a performance POV? 

 Bye, 
 Tassilo 




Re: Transducers: sequence versus eduction

2015-04-01 Thread Alex Miller
The general idea is that eduction is best when the result will be 
completely consumed in a reducible context. Any case of reusing the result 
will likely be better served by sequence which can cache and reuse the 
answer.
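
Roughly, a sketch of the two situations:

  ;; consumed once, in a reducible context - eduction fits:
  (reduce + 0 (eduction (filter odd?) (map inc) (range 1000)))

  ;; result reused - sequence caches the realized elements:
  (let [xs (sequence (comp (filter odd?) (map inc)) (range 1000))]
    [(count xs) (reduce + 0 xs)])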

On Wednesday, April 1, 2015 at 3:51:53 AM UTC-5, Tassilo Horn wrote:

 vve...@gmail.com writes: 

  Eduction retains the ability to be recomposed with other transducers 
  higher in the function chain. The following two are nearly equivalent: 
  
  (transduce (take 1e2) + (eduction (filter odd?) (range))) 
  (transduce (comp (filter odd?) (take 1e2)) + (range)) 
  
  This will be slower: 
  (transduce (take 1e2) + (sequence (filter odd?) (range))) 
  
  Execution time mean : 19.054407 µs 
  Execution time mean : 19.530890 µs 
  Execution time mean : 39.955692 µs 

 Interesting.  But in my code experimentation has shown that sequence is 
 almost always faster in my use-cases which usually look like 

   (sequence (comp (mapcat foo1) 
   (filter p1) 
   (map f1) 
   (mapcat foo2) 
   (filter p2) 
   (mapcat foo3) 
   (filter p3)) 
  coll) 

 Here, I've switched between making the foo* functions return eductions 
 or lazy sequences, and the latter seems to alway be faster although that 
 looks like a use-case of eduction based on my assumptions... 

 So maybe eduction can unfold its full potential only during transduce? 

 Bye, 
 Tassilo 




Re: Transducers: sequence versus eduction

2015-04-01 Thread Alex Miller
First, I love this discussion! Great questions, great thinking.

Up top, I wanted to mention a couple things that have changed in alpha6:
- Eduction is no longer Seqable and thus the return from eduction is not 
seqable  (but it is reducible and iterable). You can use iterator-seq to 
get a chunked seq over the top if you need one. 
- Prior to alpha6, sequence with transformations returned a 
LazyTransformer. That's gone and now transformations are done using 
TransformerIterator, which is subsequently wrapped with a (now chunked) 
iterator-seq.
- There are lots of performance implications due to those changes and I 
would recommend re-testing any perf test related to sequence or eduction on 
alpha6 to get a fresh picture as anything pre-alpha6 is not comparable.
- eduction now takes multiple transformations, not just one, and composes 
them. This is designed for mechanical rewriting (hello tool developers!!) 
of ->> chains like this:

(->> s (interpose 5) (partition-all 2))

to this:

(->> s (eduction (interpose 5) (partition-all 2)))

That particular example has timings in CLJ-1669 and shows a reduction from 
501 microseconds to 108 microseconds on a source vector, comparable savings 
on a source sequence. You do need to be careful to not include 
non-transducer functions in that chain, or stop the rewrite when you fall 
into a materialization function at the end (for example, sort). 
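
For example (a sketch), only the transducer-arity steps move inside the
eduction; a materializing step at the end stays outside:

(->> s (interpose 5) (partition-all 2) (into []))

becomes

(->> s (eduction (interpose 5) (partition-all 2)) (into []))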

On Wednesday, April 1, 2015 at 2:13:07 AM UTC-5, Tassilo Horn wrote:

 Hi all, 

 I've switched many nested filter/map/mapcat applications in my code to 
 using transducers.  That brought a moderate speedup in certain cases 
 and the deeper the nesting has been before, the clearer the transducers 
 code is in comparison, so yay! :-) 


Depending on what you're transducing over, you may see some additional gains 
in alpha6.
 

 However, I'm still quite unsure about the difference between `sequence` 
 and `eduction`.  From the docs and experimentation, I came to the 
 assumptions below and I'd be grateful if someone with more knowledge 
 could verify/falsify/add: 

   - Return types differ: Sequence returns a standard lazy seq, eductions 
 an instance of Eduction. 


Yes, although I think the actual return type of eduction is not important, 
rather the implemented interfaces of Iterable and IReduceInit are the key 
thing.
 

   - Eductions are reducible/seqable/iterable, i.e., basically I can use 


no longer explicitly seqable, although seq will automatically wrap a 
chunked iterator-seq around it if passed to a sequence function.
 

 them wherever a (lazy) seq would also do, so sequence and eduction 
 are quite interchangeable except when poking at internals, e.g., 
 (.contains (sequence ...) x) works whereas (.contains (eduction ...) 
 x) doesn't. 


No guarantees made on those internals, only on what is promised in the 
docstrings (reducible/iterable for eduction and sequence for sequence). :) 
 

   - Both compute their contents lazily. 


eduction is not lazy by itself. If used in a non-seq context, it is eagerly 
recomputed on each use. If you use it in a seq context, then it will be 
computed and cached as needed.
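
A small sketch of the difference:

(def calls (atom 0))
(def e (eduction (map (fn [x] (swap! calls inc) x)) (range 3)))
(reduce + 0 e)   ;; runs the transformation
(reduce + 0 e)   ;; ...and runs it again
@calls           ;;=> 6

(def s (sequence (map (fn [x] (swap! calls inc) x)) (range 3)))
(dorun s)
(dorun s)        ;; cached, the transformation is not re-run
@calls           ;;=> 9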
 

   - Lazy seqs cache their already realized contents, eductions compute 
 them over and over again on each iteration. 


Yes.
 


 Because of that, I came to the conclusion that whenever I ask myself if 
 one of my functions should return a lazy seq or an eduction, I should 
 use these rules: 

   1. If the function is likely to be used like 

  (let [xs (seq-producing-fn args)] 
(or (do-stuff-with xs) 
(do-other-stuff-with xs) 
...)) 

  that is, the resulting seq is likely to be bound to a variable 
  which is then used multiple times (and thus lazy seq caching is 
  benefitical), then use sequence. 


Yes, it makes sense to take advantage of seq caching this way.
 

   2. If it is a private function only used internally and never with the 
  usage pattern of point 1, then definitively use eduction. 



  3. If its a public function which usually isn't used with a pattern as 
  in point 1, then I'm unsure.  eduction is probably more efficient 
  but sequence fits better in the original almost everything returns 
  a lazy seq design.  Also, the latter has the benefit that users of 
  the library don't need to know anything about transducers. 

 Is that sensible?  Or am I completely wrong with my assumptions about 
 sequence and eduction? 


You're definitely on the right track. How the result is going to be used is 
pretty important - eduction is designed to be consumed entirely for a 
result and thus gives up the caching of sequences. 

If you are going to consume the entire thing once, particularly in the 
context of a reduction, then eduction is far more efficient. Eduction does 
not need to allocate or cache intermediate sequence objects, so is much 
more memory and allocation efficient when used in this way (where 

Re: Transducers: sequence versus eduction

2015-04-01 Thread Tassilo Horn
Alex Miller a...@puredanger.com writes:

Hi Alex,

 - Eduction is no longer Seqable and thus the return from eduction is not
 seqable (but it is reducible and iterable). You can use iterator-seq to get a
 chunked seq over the top if you need one. 

Really?

user=> *clojure-version*
{:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
user=> (seq (eduction (map inc) (range 10)))
(1 2 3 4 5 6 7 8 9 10)

 - There are lots of performance implications due to those changes and
 I would recommend re-testing any perf test related to sequence or
 eduction on alpha6 to get a fresh picture as anything pre-alpha6 is
 not comparable.

My observations are all based on today's experience with alpha6. :-)

 - eduction now takes multiple transformations, not just one, and
 composes them.  This is designed for mechanical rewriting (hello tool
 developers!!) of ->> chains like this:

 (->> s (interpose 5) (partition-all 2))

 to this:

 (->> s (eduction (interpose 5) (partition-all 2)))

Ah, that's sensible.

 The general idea is that eduction is best when the result will be
 completely consumed in a reducible context.  Any case of reusing the
 result will likely be better served by sequence which can cache and
 reuse the answer.

Yes, that's what I guessed.  But at least when I tested replacing
sequence with eduction at exactly these places, the result has been a
slight slowdown instead of a speedup.  But that might have been
accidental, as it is hard to do measurements with real-world code where
a 5% slowdown/speedup might be caused by something completely
unrelated.

Bye,
Tassilo



Re: Transducers: sequence versus eduction

2015-04-01 Thread Alex Miller
On Wed, Apr 1, 2015 at 2:17 PM, Tassilo Horn t...@gnu.org wrote:

 Alex Miller a...@puredanger.com writes:

 Hi Alex,

  - Eduction is no longer Seqable and thus the return from eduction is not
  seqable (but it is reducible and iterable). You can use iterator-seq to
 get a
  chunked seq over the top if you need one.

 Really?

 user=> *clojure-version*
 {:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
 user=> (seq (eduction (map inc) (range 10)))
 (1 2 3 4 5 6 7 8 9 10)


Yep. :)

user=> (supers (class (eduction (map inc) (range 10))))
#{clojure.lang.Sequential java.lang.Object java.lang.Iterable
clojure.lang.IReduceInit clojure.lang.IType}

However, seq checks for Iterable and will automatically use iterator-seq to
produce a seq from an Iterable, which is what's happening here.
See:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L517


 - There are lots of performance implications due to those changes and
  I would recommend re-testing any perf test related to sequence or
  eduction on alpha6 to get a fresh picture as anything pre-alpha6 is
  not comparable.

 My observations are all based on today's experience with alpha6. :-)

  - eduction now takes multiple transformations, not just one, and
  composes them.  This is designed for mechanical rewriting (hello tool
  developers!!) of ->> chains like this:
 
  (->> s (interpose 5) (partition-all 2))
 
  to this:
 
  (->> s (eduction (interpose 5) (partition-all 2)))

 Ah, that's sensible.

  The general idea is that eduction is best when the result will be
  completely consumed in a reducible context.  Any case of reusing the
  result will likely be better served by sequence which can cache and
  reuse the answer.

 Yes, that's what I guessed.  But at least when I tested replacing
 sequence with eduction at exactly these places () the result has been a
 slight slowdown instead of a speedup.  But that might have been
 accidental as it is hard to do measurements with real-world code where a
 5% slowdown/speedup might be the reason of something completely
 unrelated.


Are you including the cost of realizing the elements? Comparing this stuff
well is hard, or at least I had a hard time thinking through all the
nuances. From what I've looked at, I think the alpha6 changes have made
things slightly worse for sequence with transformations and somewhat better
for eductions used in a reducible or sequence context.



 Bye,
 Tassilo




Re: Transducers: sequence versus eduction

2015-04-01 Thread Tassilo Horn
Alex Miller a...@puredanger.com writes:

  seqable (but it is reducible and iterable). You can use
  iterator-seq to get a chunked seq over the top if you need one.

 Really?

  user=> *clojure-version*
  {:major 1, :minor 7, :incremental 0, :qualifier "alpha6"}
  user=> (seq (eduction (map inc) (range 10)))
  (1 2 3 4 5 6 7 8 9 10)

 Yep. :)

  user=> (supers (class (eduction (map inc) (range 10))))
 #{clojure.lang.Sequential java.lang.Object java.lang.Iterable
 clojure.lang.IReduceInit clojure.lang.IType}

 However, seq checks for Iterable and will automatically use
 iterator-seq to produce a seq from an Iterable, which is what's
 happening here.

Ok.  But to me, if I can call `seq` on a thing and iterate it using
`first` and `rest`, that's a seqable thing to me. :-)

  The general idea is that eduction is best when the result will
  be completely consumed in a reducible context. Any case of
  reusing the result will likely be better served by sequence
  which can cache and reuse the answer.

 Yes, that's what I guessed.  But at least when I tested replacing
 sequence with eduction at exactly these places () the result has
 been a slight slowdown instead of a speedup.  But that might have
 been accidental as it is hard to do measurements with real-world
 code where a 5% slowdown/speedup might be the reason of something
 completely unrelated.

 Are you including the cost of realizing the elements?

Oh, yes, and not only that.  The tests I did maybe spend 20% of their
time in the transducer part, 80% in other stuff.  So I definitely have
to do some more elaborate benchmarking where the transducer part is
somehow isolated.

That's why I've asked if there are clear usage-recipes when to use what.
I think my prime use-case is deeply nested mapcatting where the
mapcatted function gets an object and returns a java collection, and
filtering with usually a quite cheap predicate.  E.g.

  (sequence-or-eduction (comp (mapcat ...)
  (filter ...)
  (mapcat ...)
  ...
  (filter ...))
coll)

That's part of a larger search algorithm which either stops after
finding the first match (in which case the resulting sequence is likely
not to be realized completely) or runs till all matches are found (in
which case all these transducer-produced sequences would be fully
realized).

If that would make sense, I could use eduction when I know everything
will be realized and sequence in the other case (or vice versa).

Bye,
Tassilo



Re: Transducers: sequence versus eduction

2015-04-01 Thread Alex Miller
On Wed, Apr 1, 2015 at 3:17 PM, Tassilo Horn t...@gnu.org wrote:

 Alex Miller a...@puredanger.com writes:

 Ok.  But to me, if I can call `seq` on a thing and iterate it using
  `first` and `rest`, that's a seqable thing to me. :-)


Fair enough. I just meant it no longer implements Seqable. :)


   The general idea is that eduction is best when the result will
   be completely consumed in a reducible context. Any case of
   reusing the result will likely be better served by sequence
   which can cache and reuse the answer.
 
  Yes, that's what I guessed.  But at least when I tested replacing
  sequence with eduction at exactly these places () the result has
  been a slight slowdown instead of a speedup.  But that might have
  been accidental as it is hard to do measurements with real-world
  code where a 5% slowdown/speedup might be the reason of something
  completely unrelated.
 
  Are you including the cost of realizing the elements?

 Oh, yes, and not only that.  The tests I did maybe spend 20% of their
 time in the transducer part, 80% in other stuff.  So I definitively have
 to do some more elaborate benchmarking where the transducer part is
 somehow isolated.

 That's why I've asked if there are clear usage-recipes when to use what.
 I think my prime use-case is deeply nested mapcatting where the
 mapcatted function gets an object and returns a java collection, and
 filtering with usually a quite cheap predicate.  E.g.

   (sequence-or-eduction (comp (mapcat ...)
   (filter ...)
   (mapcat ...)
   ...
   (filter ...))
 coll)

 That's part of a larger search algorithm which either stops after
 finding the first match (in which case the resulting sequence is likely
 not to be realized completely) or runs till all matches are found (in
 which case all these transducer-produced sequences would be fully
 realized).

 If that would make sense, I could use eduction when I know everything
 will be realized and sequence in the other case (or vice versa).


In this case, I don't know that I'd expect much difference. sequence is
going to create a TransformerIterator, and wrap it in a
chunkedIteratorSeq. Using (seq (eduction ...)) should give you exactly the
same thing, except calling through an Eduction type. So, I doubt it matters.
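
E.g., a quick check that they yield the same thing:

(= (sequence (map inc) (range 10))
   (seq (eduction (map inc) (range 10))))
;;=> true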


 Bye,
 Tassilo

