On 5 Jul, 18:49, Ken Wesson <kwess...@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 11:22 AM, Patrick Houk <path...@gmail.com> wrote:
> > Does the file you are evaluating have more than 65535 characters?  As
> > far as I can tell, that is the maximum length of a String literal in
> > Java (see the CONSTANT_Utf8_info struct in
> > http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc...).
> > I've encountered that limit when using Eclipse/CounterClockwise.  The
> > problem occurs when evaluating a file by doing something like:
>
> > (clojure.lang.Compiler/load (java.io.StringReader. "the-whole-file-as-
> > a-string"))
>
> > So the contents of the file ends up as a String literal, and Clojure
> > will generate a corrupt class if that String is too long.
> > CounterClockwise calls a function in nREPL (helpers/load-file-command)
> > that does this.  Perhaps Emacs/Slime is doing something similar.
>
> Smells like multiple bugs to me.
>
> 1. A too-large string literal should have a specific error message,
> rather than generate a misleading one suggesting a different type of
> problem.

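There is no such thing as a too-large string literal in a class file:
the format simply cannot represent one. See:
<http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#7963>.
String literals are stored in CONSTANT_Utf8_info entries made of a
1-byte tag, a 2-byte length, and (length * 1-byte) contents, so a
literal can take at most 65535 bytes. Those bytes are in the JVM's
modified UTF-8, where non-ASCII characters take 2 or 3 bytes each, so
the limit in characters can be lower. I suppose Clojure's compiler is
generating an incorrect class file because the length is either
overflowing or growing past two bytes.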
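
To make the limit concrete, here is a small Java sketch (Utf8Constant,
fits and repeat are made-up names for illustration; this is not what
Clojure or ASM actually do). It builds a CONSTANT_Utf8_info entry by
writing the 1-byte tag and then relying on DataOutputStream.writeUTF,
which uses the same modified UTF-8 encoding and 2-byte length prefix as
the class file, and refuses strings whose encoding exceeds 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class Utf8Constant {
    static final int CONSTANT_Utf8 = 1; // tag value from the JVMS

    // Builds a CONSTANT_Utf8_info entry: a u1 tag followed by exactly what
    // DataOutputStream.writeUTF emits, i.e. a big-endian u2 byte count and
    // the string in modified UTF-8 (the same encoding class files use).
    public static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(CONSTANT_Utf8);
            out.writeUTF(s); // throws if the encoding would exceed 65535 bytes
        } catch (UTFDataFormatException e) {
            throw new IllegalArgumentException("too long for CONSTANT_Utf8_info", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static boolean fits(String s) {
        try {
            encode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(encode("abc").length);          // 6: tag + 2-byte length + 3 bytes
        System.out.println(fits(repeat('a', 65535)));      // true
        System.out.println(fits(repeat('a', 65536)));      // false
        // U+00E9 takes 2 bytes in modified UTF-8, so 40000 of them need 80000 bytes
        System.out.println(fits(repeat('\u00e9', 40000))); // false
    }
}
```

Note that the limit is on encoded bytes: 40000 accented characters
already overflow it, while 65535 ASCII characters just fit.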

> 2. The limit should not be different from that on String objects in
> general, namely 2147483647 characters which nobody is likely to hit
> unless they mistakenly call read-string on that 1080p Avatar blu-ray
> rip .mkv they aren't legally supposed to possess.

That's a limitation imposed by the Java class file format.

> 3. Though both of the above bugs are in Oracle's Java implementation,

By the above, 1. is a Clojure bug and 2. is not a bug at all.

> it would seem to be a bug in Clojure's compiler if it is trying to
> make the entire source code of a namespace into a string *literal* in
> dynamically-generated bytecode somewhere rather than a string
> *object*.

Actually it seems it's the IDE, rather than Clojure, that is
evaluating a form containing such a big literal. Since Clojure has no
interpreter, it needs to compile that form.

> Sensible alternatives are a) get the string to whatever
> consumes it by some other means than embedding it as a single
> monolithic constant in bytecode,

This is what we currently do in ABCL (by storing literal objects in a
thread-local variable and retrieving them later when the compiled code
is loaded), but it only works for the runtime compiler, not the file
compiler (in Clojure terms, it won't work with AOT compilation).
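
Roughly, the idea looks like this in Java. This is a hypothetical
sketch, not ABCL's actual code: LiteralStash and GeneratedForm are
invented names, and GeneratedForm stands in for a class the compiler
would generate. The literal never goes into the constant pool; only its
index does. It also shows why this can't work with AOT compilation:
when the class is loaded in a different JVM, the list is empty.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralStash {
    // Hypothetical side channel: the runtime compiler parks literal objects
    // here instead of encoding them as class-file constants.
    private static final ThreadLocal<List<Object>> LITERALS =
            ThreadLocal.withInitial(ArrayList::new);

    // Compile time: register a literal; only the returned index goes into the bytecode.
    public static int stash(Object literal) {
        List<Object> literals = LITERALS.get();
        literals.add(literal);
        return literals.size() - 1;
    }

    // Load time: the generated class's static initializer fetches the literal back.
    public static Object fetch(int index) {
        List<Object> literals = LITERALS.get();
        if (index < 0 || index >= literals.size()) {
            // e.g. the class was compiled ahead of time and loaded in another JVM
            throw new IllegalStateException("literal " + index + " is not available");
        }
        return literals.get(index);
    }

    public static void clear() {
        LITERALS.remove();
    }

    // Stands in for a generated class: its static initializer embeds only
    // the small index 0, not the string itself.
    static class GeneratedForm {
        static final String SOURCE = (String) fetch(0);
    }

    public static void main(String[] args) {
        String big = new String(new char[100_000]).replace('\0', 'x'); // ~100KB literal
        int index = stash(big); // 0, since this thread's list was empty
        System.out.println(index);                         // 0
        System.out.println(GeneratedForm.SOURCE.length()); // 100000
        clear();
    }
}
```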

> b) convert long strings into shorter
> chunks and emit a static initializer into the bytecode to reassemble
> them with concatenation into a single runtime-computed string constant
> stored in another static field,

This is what I'd like to have :)
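
A rough Java sketch of the string-splitting half of b), assuming each
chunk becomes its own string constant and the generated static
initializer concatenates them. Emitting the actual bytecode (e.g. with
ASM) is not shown, and ChunkedLiteral is a made-up name. Chunks are
sized by their modified UTF-8 length rather than by character count,
since that is what the 2-byte length field measures.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLiteral {
    static final int MAX_UTF8_BYTES = 65535; // CONSTANT_Utf8_info length limit

    // Size of one UTF-16 code unit in modified UTF-8 (U+0000 takes 2 bytes).
    static int utf8Bytes(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;
        return 3;
    }

    static int utf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) n += utf8Bytes(s.charAt(i));
        return n;
    }

    // Splits s into chunks whose encodings each fit in maxBytes, so every
    // chunk could become its own string constant. Surrogate pairs are never split.
    public static List<String> split(String s, int maxBytes) {
        if (maxBytes < 6) throw new IllegalArgumentException("maxBytes must be >= 6");
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            boolean pair = Character.isHighSurrogate(s.charAt(i))
                    && i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1));
            int width = pair ? 2 : 1;
            int cost = utf8Length(s.substring(i, i + width));
            if (bytes + cost > maxBytes) { // current chunk is full: close it
                chunks.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += cost;
            i += width;
        }
        chunks.add(s.substring(start));
        return chunks;
    }

    // What the generated static initializer would do when the class is
    // loaded: concatenate the constant chunks into one runtime string.
    public static String reassemble(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) sb.append(chunk);
        return sb.toString();
    }

    public static void main(String[] args) {
        String big = new String(new char[150_000]).replace('\0', 'a');
        List<String> chunks = split(big, MAX_UTF8_BYTES);
        System.out.println(chunks.size());                  // 3
        System.out.println(reassemble(chunks).equals(big)); // true
    }
}
```

Keeping surrogate pairs together is not strictly necessary, since
concatenation restores them either way, but it keeps each chunk a
well-formed string on its own.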

> and c) restructure whatever consumes
> the string to consume a seq, java.util.List, or whatever of strings
> instead and feed it digestible chunks (e.g. a separate string for each
> defn or other top-level form, in order of appearance in the input file
> -- surely nobody has *individual defns* exceeding 64KB).

The problem is not in the consumer, but in the form containing the
string; to do what you're proposing, the reader, upon encountering a
big enough string, would have to produce a seq/List/whatever instead,
the compiler would need to be able to dump such an object to a class,
and all Clojure code handling strings would have to be prepared to
handle such an object, too. I think it's a little impractical.

Regarding the size of individual defns, that's an orthogonal problem;
anyway, the size of the _bytecode_ for methods is limited to 64KB
(code_length must be less than 65536; see
<http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88659>)
and, while that is pretty big, it is not impossible to reach,
especially when using complex macros to produce a lot of
generated code. We used to generate such big methods in ABCL because
at one point we tried to spell out in the bytecode all the class names
corresponding to functions in a compiled file, in order to avoid
reflection when loading the compiled functions. For files with many
functions (> 1000 iirc) the generated code became too big. It turned
out that this optimization had a negligible impact on performance, so
we reverted it.
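
Some back-of-the-envelope arithmetic for the method size limit, as a
Java sketch. The instruction sizes are from the JVMS; the per-function
instruction sequence and the 60-byte figure are hypothetical, chosen
only to show how a fixed per-entry cost turns the 65535-byte cap into a
cap on the number of entries a single method can hold.

```java
public class MethodSizeBudget {
    // JVMS: code_length must be less than 65536.
    static final int MAX_CODE_LENGTH = 65535;

    // Instruction sizes in bytes (opcode plus operands), from the JVMS.
    static final int DUP = 1, SIPUSH = 3, NEW = 3, INVOKESPECIAL = 3, AASTORE = 1;

    // Hypothetical per-function sequence storing one instance into an array
    // that is already on the stack:
    //   dup; sipush i; new C; dup; invokespecial C.<init>()V; aastore
    static final int BYTES_PER_ENTRY = DUP + SIPUSH + NEW + DUP + INVOKESPECIAL + AASTORE;

    // How many entries fit in one method after a fixed prologue/epilogue cost.
    static int maxEntries(int bytesPerEntry, int fixedOverhead) {
        return (MAX_CODE_LENGTH - fixedOverhead) / bytesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println(BYTES_PER_ENTRY);                // 12
        System.out.println(maxEntries(BYTES_PER_ENTRY, 0)); // 5461
        // A hypothetical 60-byte sequence per function caps out near a thousand.
        System.out.println(maxEntries(60, 0));              // 1092
    }
}
```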

Cheers,
Alessio

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
