I have a database table with 3 million plus entries.  (It's all the
files on my system and their md5 signatures).  If I do something
simple like:

FilePath.all.each { |f| puts f.path }

the Ruby process blows up to 1G+ of memory and takes forever and a
day.  In fact, I've never let it finish.

With the lazy load, when 'each' is called, the whole result gets
stuffed into a Collection.  While all that is neat most of the time,
in this case, it is really not what I want to do at all.

As I read the code, the "do |collection| ..." block passed to the
Collection.new in read_many of DataObjectsAdapter becomes the
@load_with_block in the LazyArray.  When 'each' is called, this block
gets called, which loads the entire collection.
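
To make that concrete, here is a toy sketch of the pattern as I
understand it.  ToyLazyArray and the row list are made up for
illustration; this is not DataMapper's actual LazyArray code, just the
same shape: a stored load block that fills the whole array the first
time 'each' is called.

```ruby
# Toy stand-in for a lazily loaded collection; NOT DataMapper's LazyArray.
class ToyLazyArray
  include Enumerable

  def initialize(&load_with_block)
    @load_with_block = load_with_block
    @array = []
    @loaded = false
  end

  def loaded?
    @loaded
  end

  # The first call runs the load block, which fills @array with every row
  # before a single element is yielded to the caller.
  def each(&block)
    lazy_load
    @array.each(&block)
    self
  end

  def size
    lazy_load
    @array.size
  end

  private

  def lazy_load
    return if @loaded
    @loaded = true
    @load_with_block.call(@array)
  end
end

# Hypothetical row source standing in for a database cursor.
rows = (1..5).map { |i| "/tmp/file#{i}" }

collection = ToyLazyArray.new do |array|
  rows.each { |path| array << path }
end

puts collection.loaded?               # nothing has been fetched yet
collection.each { |path| puts path }  # triggers the full load first
puts collection.size                  # every row is now held in memory
```

With 3 million rows in place of 5, the array built by the load block
is where the memory goes.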

It seems like this problem would come up -- perhaps not often but more
than seldom.

Since 'each' and the other RETURN_SELF methods must also return the
collection, it looks to me that what we need is a "read" method.
Something that implies that the results from the query will be run
through but not collected anywhere.

Actually, what I like best is: FilePath.all { |f| puts f.path }

So... I did some chain sawing and got that to work.  I added a &block
parameter to Model#all.  If it is called with a block, it calls
read_many and passes the block.  Otherwise, it calls "load_many".  I
renamed the existing read_many to load_many since that is what it does
(maybe even call it load_all).  Then I created a new read_many that is
roughly the same but does not create a collection.  Instead, it loops
through the results, creates a record, and yields that record on each
pass through the loop.
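
The shape of the change is roughly the following.  This is a toy
sketch only: ToyModel and ROWS are invented for illustration, and the
read_many / load_many bodies here are stand-ins, not the real adapter
code.

```ruby
# Toy model; the names follow the description above but this is NOT
# DataMapper's real implementation.
class ToyModel
  attr_reader :path

  # Hypothetical data standing in for rows returned by a query.
  ROWS = %w[/etc/hosts /etc/passwd /bin/ls].freeze

  def initialize(path)
    @path = path
  end

  # With a block: stream records one at a time and keep none of them.
  # Without a block: build and return the whole collection, as before.
  def self.all(&block)
    block ? read_many(&block) : load_many
  end

  # Collects every record into an array before returning.
  def self.load_many
    ROWS.map { |row| new(row) }
  end

  # Builds one record per row, yields it, and retains nothing.
  def self.read_many
    ROWS.each { |row| yield new(row) }
    nil
  end
end

ToyModel.all { |f| puts f.path }       # streaming style
ToyModel.all.each { |f| puts f.path }  # collecting style
```

In the streaming style each record becomes garbage as soon as the
block returns, so nothing forces the whole result set to stay alive.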

I trimmed my test down to 1M entries.  The results are not as
spectacular as I would have assumed.  The test is:


if ARGV[0] == "new"
  puts "New Style"
  FilePath.all do |f|
    g = f
  end
else
  puts "Old Style"
  FilePath.all.each do |f|
    g = f
  end
end
system("ps uxwwp#{Process.pid}")

and the results are:

New Style
USER   PID %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
pedz  6527  99.2 38.4  1402588 805452   p1  R+    7:49PM   5:05.76 ruby ./sample.rb new

real    5m18.943s
user    5m1.466s
sys     0m4.481s

versus:

Old Style
USER   PID %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
pedz  6539  98.1 39.9  1435280 836192   p1  R+    7:54PM   6:18.77 ruby ./sample.rb old

real    6m32.176s
user    6m13.733s
sys     0m5.231s

Is this something others are interested in?

