Tim Bunce <[EMAIL PROTECTED]> wrote:                                   
> On Tue, Oct 28, 2003 at 02:37:29PM +0000, [EMAIL PROTECTED] wrote:        
>>                                                                          
>> I ran some more tests, some of which might be more significant:          
>>                                                                          
>>                    time(sec)   db size (kB)    peak RAM (MB)             
>> no coverage           15          ---             ~ 10                   
>> Data::Dumper+eval    246          245             ~ 23.4                 
>> Storable             190           60             ~ 19.7                 
>> no storage           184          ---             ~ 18                   
>                                                                            
> Excellent. From 23.4-18 to 19.7-18 is 5.4 to 1.7. So Storable is 
> taking only 30% of the time that Data::Dumper+eval took.                          

You're looking at the column for peak RAM usage. The time difference is 
62 seconds (246-184) vs 6 seconds (190-184), so Storable is taking about 
10% of the time that Dumper+eval took. File I/O is now pretty 
insignificant next to the overhead of doing coverage itself. Hopefully 
that overhead will come down eventually.
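
For anyone curious, the comparison boils down to a round trip like the 
one below. The data structure and calls here are just a toy stand-in for 
the real cover_db, not what Devel::Cover actually does:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark qw(cmpthese);
    use Data::Dumper;
    use Storable qw(freeze thaw);

    # Toy stand-in for coverage data: file => line => hit count.
    my %cover = map {
        ("file$_.pm" => { map { $_ => int rand 100 } 1 .. 500 })
    } 1 .. 20;

    cmpthese(-3, {
        'Dumper+eval' => sub {
            local $Data::Dumper::Terse  = 1;   # bare structure, no "$VAR1 ="
            local $Data::Dumper::Indent = 0;
            my $text = Dumper(\%cover);        # serialize to Perl source
            my $copy = eval $text;             # and eval it back
            die $@ if $@;
        },
        'Storable' => sub {
            my $frozen = freeze(\%cover);      # binary serialize
            my $copy   = thaw($frozen);        # and restore
        },
    });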

>> Eventually, I think that a transition to a real database (where 
>> you can read/write only the portions of interest) would be good.  
>                                                                            
> How would you define "portions of interest"?                               

The files you're actually adding/updating coverage for. Right now, if 
your cover_db holds data for a dozen files, but you test them one at a 
time, you have to read and write *all* the coverage data (as well as 
have the RAM to hold it). That's a lot of unnecessary work and wasted 
memory.
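
To make the idea concrete, here's a rough sketch of what per-file 
storage could look like. The layout, names and digest scheme are 
invented for illustration; this isn't how the current cover_db works:

    use strict;
    use warnings;

    use Storable qw(store retrieve);
    use Digest::MD5 qw(md5_hex);
    use File::Path qw(mkpath);

    # Hypothetical layout: one Storable blob per covered source file,
    # named by a digest of its path, instead of one monolithic database.
    my $db_dir = 'cover_db/per_file';

    sub per_file_path {
        my ($source_file) = @_;
        return "$db_dir/" . md5_hex($source_file) . '.st';
    }

    sub read_cover {
        my ($source_file) = @_;
        my $path = per_file_path($source_file);
        return -e $path ? retrieve($path) : {};
    }

    sub write_cover {
        my ($source_file, $data) = @_;
        mkpath($db_dir) unless -d $db_dir;
        store($data, per_file_path($source_file));
    }

    # Only lib/Foo.pm is touched; the other files' data never leaves disk.
    my $cover = read_cover('lib/Foo.pm');
    $cover->{statement}{42}++;
    write_cover('lib/Foo.pm', $cover);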

> Certainly some changes are needed in the higher level processing.          
> But there's possibly no need for a "real database" (if you mean            
> DBI/SQL etc which carry significant overheads). Multiple files, for        
> example, may suffice.

I did have DBI in mind, but I agree the overhead has to be considered. 
If multiple files or a tie() to disk work and are faster, I'm all for 
it.
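
For the tie() route, something like MLDBM over DB_File with a Storable 
serializer would give per-key reads and writes with very little code. 
The module choice and layout here are just an assumption to illustrate 
the idea:

    use strict;
    use warnings;

    use Fcntl;                          # O_CREAT, O_RDWR
    use MLDBM qw(DB_File Storable);     # DBM file holding Storable values

    # Each key (one source file) is fetched and stored individually,
    # so testing one file doesn't pull the whole database into RAM.
    tie my %cover_db, 'MLDBM', 'cover_db.dbm', O_CREAT | O_RDWR, 0640
        or die "Can't tie cover_db.dbm: $!";

    # MLDBM can't update nested structures in place: fetch, modify, store.
    my $file_data = $cover_db{'lib/Foo.pm'} || {};
    $file_data->{statement}{42}++;
    $cover_db{'lib/Foo.pm'} = $file_data;

    untie %cover_db;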

-mjc
