I have a database table with 3 million plus entries (it's all the
files on my system and their md5 signatures). If I do something
simple like:

    FilePaths.all.each { |f| puts f.path }

the Ruby process grows past 1 GB and takes forever and a day. In
fact, I've never let it finish.
With lazy loading, when 'each' is called the whole result set gets
stuffed into a Collection. That is neat most of the time, but in
this case it is really not what I want at all.
As I read the code, the "do |collection| ..." block passed to
Collection.new in read_many of DataObjectsAdapter becomes the
@load_with_block in the LazyArray. When 'each' is called, this block
gets called, and it loads the entire collection.
It seems like this problem would come up -- perhaps not often but more
than seldom.
Since 'each' and the other RETURN_SELF methods must also return the
collection, it looks to me like what we need is a "read" method:
something that implies the results of the query will be run
through but not collected anywhere.
Actually, what I like best is: FilePaths.all { |f| puts f.path }
So... I did some chain sawing and got that to work. I added a &block
parameter to Model#all. If it is called with a block, it calls
read_many and passes the block along; otherwise, it calls load_many.
I renamed the old read_many to load_many, since that is what it does
(maybe it should even be called load_all). Then I created a new
read_many that is roughly the same but does not create a collection.
Instead, it loops through the results, creates a record, and yields
that record on each pass through the loop.
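The shape of that change can be sketched roughly as follows. Only the names `all`, `read_many`, and `load_many` come from the description above; `ToyModel`, `Record`, and the `rows` method (a stand-in for the database cursor) are hypothetical, and the bodies are a minimal sketch rather than the actual DataObjectsAdapter code.

```ruby
# Hypothetical sketch of the proposed Model#all; NOT DataMapper's actual code.
Record = Struct.new(:path)

class ToyModel
  # Stand-in for a database cursor: yields raw rows one at a time.
  def self.rows
    return enum_for(:rows) unless block_given?
    3.times { |i| yield({ path: "/tmp/file#{i}" }) }
  end

  # With a block: stream records. Without one: build the full collection.
  def self.all(&block)
    block ? read_many(&block) : load_many
  end

  # Old behavior (renamed): every record is kept in an Array.
  def self.load_many
    rows.map { |row| Record.new(row[:path]) }
  end

  # New behavior: build one record per row, yield it, keep nothing.
  def self.read_many
    rows { |row| yield Record.new(row[:path]) }
    nil
  end
end

ToyModel.all { |f| puts f.path }   # streams one record at a time; returns nil
paths = ToyModel.all.map(&:path)   # collects every record into an Array first
```

Returning nil from the block form is a deliberate choice here: it makes clear that nothing was collected, so there is no collection to return. This only avoids holding records on the Ruby side; if the underlying driver buffers the full result set itself, memory use may stay high regardless.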
I trimmed my test down to 1M entries. The results are not as
spectacular as I would have expected. The test is:

    if ARGV[0] == "new"
      puts "New Style"
      FilePath.all do |f|
        g = f
      end
    else
      puts "Old Style"
      FilePath.all.each do |f|
        g = f
      end
    end
    system("ps uxwwp#{Process.pid}")

and the results are:

    New Style
    USER   PID %CPU %MEM     VSZ    RSS TT STAT STARTED    TIME COMMAND
    pedz  6527 99.2 38.4 1402588 805452 p1 R+    7:49PM 5:05.76 ruby ./sample.rb new

    real    5m18.943s
    user    5m1.466s
    sys     0m4.481s

versus:

    Old Style
    USER   PID %CPU %MEM     VSZ    RSS TT STAT STARTED    TIME COMMAND
    pedz  6539 98.1 39.9 1435280 836192 p1 R+    7:54PM 6:18.77 ruby ./sample.rb old

    real    6m32.176s
    user    6m13.733s
    sys     0m5.231s

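To put a number on "not as spectacular", the snippet below just does the arithmetic on the figures reported above (RSS in KB from ps, wall-clock time from the `real` line). No new measurements are involved.

```ruby
# Figures copied from the two runs above.
rss_kb = { new: 805_452, old: 836_192 }
real_s = { new: 5 * 60 + 18.943, old: 6 * 60 + 32.176 }

# Relative reduction of the new style versus the old style, in percent.
rss_saving  = (rss_kb[:old] - rss_kb[:new]).fdiv(rss_kb[:old]) * 100
time_saving = (real_s[:old] - real_s[:new]) / real_s[:old] * 100

puts format("RSS: %.1f%% lower", rss_saving)         # RSS: 3.7% lower
puts format("Real time: %.1f%% shorter", time_saving) # Real time: 18.7% shorter
```

So most of the gain is in run time; resident memory is nearly unchanged between the two styles.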
Is this something others are interested in?
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"DataMapper" group.
For more options, visit this group at
http://groups.google.com/group/datamapper?hl=en
-~----------~----~----~----~------~----~------~--~---