Also blogged here:

http://headius.blogspot.com/2007/10/another-performance-discovery-rexml.html

I've discovered a really awful bottleneck in REXML processing.

Look at these results for parsing our build.xml:

read content from stream, no DOM
  2.592000   0.000000   2.592000 (  2.592000)
  1.326000   0.000000   1.326000 (  1.326000)
  0.853000   0.000000   0.853000 (  0.853000)
  0.620000   0.000000   0.620000 (  0.620000)
  0.471000   0.000000   0.471000 (  0.471000)
read content once, no DOM
  5.323000   0.000000   5.323000 (  5.323000)
  5.328000   0.000000   5.328000 (  5.328000)
  5.209000   0.000000   5.209000 (  5.209000)
  5.173000   0.000000   5.173000 (  5.173000)
  5.138000   0.000000   5.138000 (  5.138000)
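
The original harness isn't shown here, but a minimal reconstruction of the
two scenarios might look like this (the no-op listener, iteration count, and
file name are my assumptions):

  require 'benchmark'
  require 'rexml/parsers/streamparser'
  require 'rexml/streamlistener'

  # No-op callbacks; we only care about raw parse time ("no DOM").
  class NullListener
    include REXML::StreamListener
  end

  # "read content from stream": REXML pulls chunks from the IO as it goes.
  5.times do
    File.open("build.xml") do |f|
      puts Benchmark.measure { REXML::Parsers::StreamParser.new(f, NullListener.new).parse }
    end
  end

  # "read content once": REXML matches against the whole in-memory String.
  content = File.read("build.xml")
  5.times do
    puts Benchmark.measure { REXML::Parsers::StreamParser.new(content, NullListener.new).parse }
  end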

When reading from a stream, the content is read in chunks, with each chunk matched (and therefore encoded/decoded) in turn.

However, when a fully-read string is used in memory, matching proceeds as follows:

1. set buffer to entire string
2. match against the buffer
3. set buffer to post match

Now this is obviously a little inefficient, but copy-on-write String helps a lot. However, in our case this means that we encode/decode the entire remaining XML content for every element match. For any nontrivial file, this is *terrible* overhead.
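
To make the cost concrete, the loop has roughly this shape (illustrative
only, not REXML's actual source; consume is a hypothetical handler and the
regexp is a stand-in):

  buffer = File.read("build.xml")            # 1. buffer starts as the entire string
  while md = /\A[^<]*<[^>]*>/.match(buffer)  # 2. match against the whole buffer
    consume(md[0])
    buffer = md.post_match                   # 3. buffer becomes everything after the match
  end

Under JRuby, every trip through that loop encodes/decodes the whole
remaining buffer for the regexp engine, which is where the overhead
comes from.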

So what's the fix? Wrap the content in a StringIO object before handing it to the parser; a sketch of the change follows the numbers below. Here's the second benchmark again with that change:

read content once, no DOM
  0.640000   0.000000   0.640000 (  0.640000)
  0.693000   0.000000   0.693000 (  0.693000)
  0.542000   0.000000   0.542000 (  0.542000)
  0.349000   0.000000   0.349000 (  0.349000)
  0.336000   0.000000   0.336000 (  0.336000)
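
The change at the call site is tiny. A minimal sketch, assuming
REXML::Document as the entry point (the same idea applies to REXML's
stream parsers):

  require 'rexml/document'
  require 'stringio'

  content = File.read("build.xml")

  # Before: REXML matches against the full in-memory String, re-encoding
  # the whole remaining buffer for every match.
  # doc = REXML::Document.new(content)

  # After: wrap the String in a StringIO so REXML takes its chunked
  # stream-reading path instead.
  doc = REXML::Document.new(StringIO.new(content))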

This is a perfect indication of why JRuby's Rails performance is nowhere near where it could be. Of course, the original String-based code would perform fine once our Oniguruma port is complete, but this is a simple change to make in the meantime.

- Charlie
