Simple disk IO/Hash consumes wild amounts of memory
---------------------------------------------------
Key: JRUBY-2701
URL: http://jira.codehaus.org/browse/JRUBY-2701
Project: JRuby
Issue Type: Bug
Components: Core Classes/Modules, Performance
Affects Versions: JRuby 1.1.2
Environment: Ubuntu, java 1.5.0_13
Reporter: Jørgen P. Tjernø
I used this command to generate the test data (change the *10 to experiment
with different data sizes; this produces about 10 MB):
/tmp$ JAVA_OPTS='-server' jruby -e 'open("testdata", "w") do |f| while f.pos < 1024*1024*10; f.puts "#{f.pos} 1"; end; end'
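For readability, the generator one-liner above can be sketched as a small script. This is only an illustration: the method name `generate_testdata` and its parameters are hypothetical and do not appear in the original report.

```ruby
# Hypothetical helper mirroring the one-liner: write "<offset> 1" lines
# until the file position reaches target_bytes. Returns the line count.
def generate_testdata(path, target_bytes)
  lines = 0
  File.open(path, "w") do |f|
    while f.pos < target_bytes
      # Each line is the current byte offset, a space, and the constant 1.
      f.puts "#{f.pos} 1"
      lines += 1
    end
  end
  lines
end

# Same size as in the report (about 10 MB):
#   generate_testdata("testdata", 1024 * 1024 * 10)
```

Because every key is the byte offset at which its line starts, all keys are unique, so the hash in the second step ends up with one entry per line.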
Then I used this script to reproduce the memory usage:
/tmp$ JAVA_OPTS='-server -Dcom.sun.management.jmxremote' JAVA_MEM='-Xmx1024m' jruby -e 'occurrences = {}; File.open("testdata") { |f| f.each { |line| if line.strip =~ /^(.*) (\d+)$/; occurrences[$1] = $2.to_i; end } }'
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
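The reading step that triggers the error can be written out as a multi-line sketch. The logic is the same as in the one-liner; the method name `load_occurrences` is hypothetical and used only for illustration.

```ruby
# Hypothetical helper mirroring the one-liner: read the file line by line
# and store each "<key> <number>" pair in a hash.
def load_occurrences(path)
  occurrences = {}
  File.open(path) do |f|
    f.each do |line|
      # Group 1 is the key (everything before the last space),
      # group 2 is the trailing integer.
      if line.strip =~ /^(.*) (\d+)$/
        occurrences[$1] = $2.to_i
      end
    end
  end
  occurrences
end

# Usage, as in the report:
#   occurrences = load_occurrences("testdata")
```

Each iteration allocates the line, the stripped copy, the match data, and the captured key, but only the key and the integer remain reachable from the hash afterwards.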
/tmp$ wc -l testdata
1056867 testdata
/tmp$ wc -L testdata
10 testdata
So, yeah. It's using more than 1024 MB of memory to run this code on ~1 million
lines, where the longest line is 10 bytes. A quick calculation (1024 MB divided
by 1,056,867 lines) shows that each line consumes about 1 KB of memory or more.
Isn't that a bit high for a hash entry that maps a string of at most 8 bytes to
an int? :o
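The per-line estimate comes from dividing the heap limit by the line count. A minimal sketch of that arithmetic, assuming the entire 1024 MB heap is attributed to the hash (a simplification, since the JVM and the JRuby runtime use part of the heap as well):

```ruby
heap_bytes = 1024 * 1024 * 1024  # heap limit from -Xmx1024m
lines      = 1_056_867           # line count reported by wc -l

# Lower bound on memory per line, given that the heap was exhausted.
bytes_per_line = heap_bytes / lines.to_f
puts format("%.0f bytes per line", bytes_per_line)
```

This gives roughly 1,016 bytes per line, against at most 10 bytes of actual data per line.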