Re: Memory Consumption of Large Sequences

Chouser Mon, 02 Feb 2009 15:12:10 -0800

On Mon, Feb 2, 2009 at 4:48 PM, Keith Bennett <[email protected]> wrote:
>
> Clojure definitely has its benefits, but in terms of memory footprint,
> Java appears to be *much* more economical


It's probably worth being careful to separate the different parts of
Java and Clojure.  Clojure code can use most Java data structures
quite comfortably, and Java code can without too much inelegance use
Clojure data structures.  Beyond that, each has benefits and
drawbacks unrelated to the data structures it can use.

> In a Java ArrayList, only a single ArrayList object is used to store
> all n elements added.  The objects are stored in an internal Object
> [].

If you really want a large mutable collection in your Clojure code,
you can use an ArrayList quite easily:

(def a1 (java.util.ArrayList.))
(dotimes [_ 100000] (.add a1 "x"))

user=> (take 10 a1)
("x" "x" "x" "x" "x" "x" "x" "x" "x" "x")

Here Clojure is adding essentially no memory overhead at all, beyond
what the ArrayList itself brings, and the resulting object plays quite
well with many common Clojure idioms.
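
As a rough sketch of what "plays quite well" means here, the following
uses a small ArrayList with a few ordinary sequence functions.  The
names a2, doubled, total and n are made up for illustration; the only
assumption is that a java.util.ArrayList can be passed wherever a
seqable collection is expected.

```clojure
;; A small mutable list, filled in place (a2 is an illustrative name).
(def a2 (java.util.ArrayList.))
(dotimes [i 5] (.add a2 i))

;; Ordinary Clojure sequence functions accept the ArrayList directly.
(def doubled (into [] (map #(* 2 %) a2)))  ; copy results into a vector
(def total   (reduce + a2))                ; sum of the elements
(def n       (count a2))                   ; number of elements
```

Copying the results into a vector with into means later mutation of a2
cannot affect doubled, which a lazy seq over the list would not
guarantee.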

But mutability has its own costs, and Clojure provides several
options to increase your chances of finding an immutable solution that
also fits within your memory and CPU performance requirements.

For example, chances are you'll want to access a large ordered
collection like this by index.  In this case the PersistentList you
used in your original example is the wrong choice anyway, since
lookups on it are O(n).  Instead, you might like one of the vector types:

(def v1 (vec (replicate 100000 "x")))

Take a look at that with your profiling tool and you'll see that it's
taking very little memory.  In fact, the data is stored in a single
Java array Object[].

user=> (class v1)
clojure.lang.LazilyPersistentVector

But unlike ArrayList, this is immutable, so you can get a new object
with one element changed:

(def v2 (assoc v1 5 :foo))

user=> (take 10 v2)
("x" "x" "x" "x" "x" :foo "x" "x" "x" "x")

When you do this, you'll see your memory jump a bit, as you now have
both the original LazilyPersistentVector and the new PersistentVector:

user=> (class v2)
clojure.lang.PersistentVector

PersistentVectors have structural sharing, so each subsequent "copy"
you make with elements changed will cost much less memory than new
full copies of the whole vector would.
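
A minimal sketch of the visible side of this: after assoc, both the
old and the new vector remain usable and differ only at the changed
index.  The names base and changed are illustrative, and the snippet
demonstrates only the immutability, not the memory figures, which you
would still need a profiler to confirm.

```clojure
;; base and changed are illustrative names, not from the thread.
(def base    (vec (range 100000)))
(def changed (assoc base 5 :foo))   ; new vector; base is untouched

(nth base 5)      ; the original element is still there
(nth changed 5)   ; the new vector sees the replacement
(count changed)   ; same length as base
```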

I feel like I'm rambling now, so let me tie this off.  Depending on
your actual use case, vectors may work well.  If you don't need the
whole collection at once, perhaps a lazy seq that simply promises the
values as they're demanded would work.  One of the Map types might be
good if your collection is likely to be sparse.  Or if none of these
immutable options will do, there are always ArrayList or even raw Java
arrays at hand.
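
To make those alternatives a little more concrete, here is a small
sketch of each.  All the names (squares, sparse, arr) and the sample
values are invented for illustration, and which option fits depends
entirely on your real access pattern.

```clojure
;; Lazy seq: values are computed only as they are demanded.
(def squares (map #(* % %) (range)))
(def first-five (take 5 squares))

;; Map: only the indexes that differ from a default are stored.
(def sparse {5 :foo, 99999 :bar})
(def hit  (get sparse 5))        ; a stored entry
(def miss (get sparse 6 "x"))    ; falls back to the default "x"

;; Raw Java array: mutable, fixed size, no per-element wrapper objects.
(def arr (object-array 100000))
(aset arr 5 :foo)
(def from-array (aget arr 5))
```

Note that holding squares in a def keeps every realized element
reachable, so for a truly huge sequence you would consume it without
keeping a reference to its head.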

You don't need to give up macros and the REPL just because a
100000-element cons-list feels bloated. :-)

--Chouser
