Re: Chunking is making my life more difficult.

Chas Emerick Thu, 30 Dec 2010 22:49:22 -0800

On Fri, Dec 31, 2010 at 12:25 AM, ehanneken <ehanne...@pobox.com> wrote:
> I spent a long time debugging some Clojure code yesterday.  The
> essence of it looked similar to this:
> 
> (defn items []
>  (mapcat expensive-function (range 0 4000 100)))
> 
> . . . (take 5 (items)) . . .
> 
> expensive-function is a function that issues an HTTP GET to retrieve a
> vector of data.  Since range's docstring says it returns a lazy
> sequence, and since mapcat is defined in terms of map and concat,
> which are both supposed to return lazy sequences, I expected (take 5
> (items)) to cause only one HTTP GET.  In reality, it causes 32 GETs.
> That's kind of costly in time and space, considering I merely wanted
> the first 5 of the 100 items returned in the response to the first
> GET.
> 
> This behavior was baffling to me at first, but after some research I
> found section 12.3 of _The Joy of Clojure_, which mentions that ever
> since Clojure 1.1 some functions (such as range) which are advertised
> as lazy are actually moderately eager, realizing chunks of up to 32
> elements at a time.



Chunking depends upon the type of seq being traversed, which in turn 
depends upon the type of collection underlying the seq.  Ranges always 
produce chunked seqs, as do non-empty vectors, for example.  If chunking is a 
concern, you can always fall back to seqs "grounded" in lists, which are always 
unchunked (and therefore yield one-at-a-time behaviour with e.g. map):

=> (->> (range 50)
     (map println)
     first)
0
1
2
...
31
nil

vs…

=> (->> (range 50)
     (mapcat list)
     (map println)
     first)
0
nil

I'm not sure that this means that map, concat, etc. are eager in practical 
terms.  I view chunking in much the same light as transients vis-à-vis the 
immutability of persistent data structures -- the result is a relative perf 
improvement that doesn't impact semantics in the vast majority of cases.  In my 
experience, chunking is only detrimental when mapping side-effecting (usually 
IO-related) functions across collections, as you're doing.  Given that, using a 
(mapcat list) interstitial to get unchunked seqs is inconsequential w.r.t. 
perf, etc.  I'd be interested in hearing any different perspectives.
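
To make the cost concrete, here is a minimal sketch that counts calls instead of issuing HTTP GETs.  The names calls, fake-fetch and calls-to-get-first are hypothetical stand-ins, not anything from the original code, and plain map is used rather than mapcat to keep the count easy to reason about:

```clojure
;; Hypothetical stand-in for the HTTP-backed expensive-function:
;; it only records how many times it has been called.
(def calls (atom 0))

(defn fake-fetch [offset]
  (swap! calls inc)
  (vec (range offset (+ offset 100))))

(defn calls-to-get-first
  "Resets the counter, realizes only the first element of
  (map fake-fetch offsets), and returns how often fake-fetch ran."
  [offsets]
  (reset! calls 0)
  (first (map fake-fetch offsets))
  @calls)

;; Chunked source: map applies fake-fetch across the whole first chunk.
(def chunked-calls
  (calls-to-get-first (range 0 4000 100)))

;; Same offsets behind the (mapcat list) interstitial: the seq that map
;; sees is unchunked, so only the first element is computed.
(def unchunked-calls
  (calls-to-get-first (mapcat list (range 0 4000 100))))

(println "chunked:" chunked-calls "unchunked:" unchunked-calls)
```

The (mapcat list) step still walks the first chunk of the range, but it only wraps each number in a list, which is cheap; the side-effecting function is what runs one element at a time.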

FYI, chunked-seq? can be used to determine if a seq supports chunking or not.
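
A short sketch of how that check might look at the REPL.  The var names are made up for illustration; note that chunked-seq? inspects a seq, so collections are passed through seq first:

```clojure
;; Ranges and non-empty vectors yield chunked seqs.
(def range-chunked?  (chunked-seq? (seq (range 50))))
(def vector-chunked? (chunked-seq? (seq [1 2 3])))

;; Seqs grounded in lists do not, and neither does the result of
;; the (mapcat list) interstitial.
(def list-chunked?         (chunked-seq? (seq '(1 2 3))))
(def interstitial-chunked? (chunked-seq? (seq (mapcat list (range 50)))))

(println range-chunked? vector-chunked? list-chunked? interstitial-chunked?)
```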

- Chas
