Since you're saying that each of these files may be as large as the
process's virtual memory will allow, it follows that you cannot have
them all open at the same time. Putting the processing of each file into
a separate thread therefore will not help; you're stuck iterating
through them sequentially.

If the files are very large, larger than the physical memory on the
machine, it will be far more beneficial to look instead at the code that
processes each file. If you access the data in a random order, you'll
cause massive page faulting, whereas if you access the data
sequentially, it's likely that the OS will help by prefetching pages
ahead of you, thereby avoiding the waits for page faults.
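To make the point concrete, here is a minimal sketch (standard library
only, not Metakit-specific; the 1 MiB demo file and 4 KiB page stride
are assumptions for illustration) that contrasts a sequential and a
random pass over the same memory-mapped file. On a file larger than
RAM, the sequential pass lets the OS read ahead and hide page-fault
latency, while the shuffled pass defeats that prefetching:

```python
# Sketch: sequential vs. random access over a memory-mapped file.
# Both passes touch the same pages and compute the same result;
# only the access pattern (and thus the paging behavior) differs.
import mmap
import os
import random
import tempfile

# Create a small demo file; a real workload would be many gigabytes.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x01" * (1 << 20))  # 1 MiB of 0x01 bytes
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Sequential scan: pages touched in order, prefetch-friendly.
    seq_total = 0
    for offset in range(0, len(mm), 4096):
        seq_total += mm[offset]

    # Random scan: same pages, in an order the OS cannot predict.
    offsets = list(range(0, len(mm), 4096))
    random.shuffle(offsets)
    rnd_total = sum(mm[offset] for offset in offsets)

    mm.close()

os.remove(path)
print(seq_total == rnd_total)  # same answer; only the order differed
```

On a file that fits in RAM the two passes take about the same time; the
gap only shows up once the working set exceeds physical memory and page
faults start hitting the disk.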

So my advice is to look carefully at the per-file processing code, and
not to try to optimize the way the files are visited (i.e. in what
order, or with what concurrency).

--JYL

> Currently, I have some code which accesses multiple Metakit storage
> files in succession. This is something I would like to speed up a bit.
> Suppose I have a loop which is something like this (simplified for
> clarity):
>
>   for mkFile in files:
>       db = metakit.storage(mkFile, 1)
>       # now do something
>
> Maybe this is just a stupid question, but is there a better, as in
> faster, way of accessing multiple mk storage files? The answer cannot
> be having all the data in a single mk storage file, however;
> potentially, each one of these may reach the maximum size limit for
> memory-mapped files. Just a thought, but would accessing each of the
> files in a separate thread help much? Would that create other issues
> that I may not anticipate at first glance?
>
> _____________________________________________
> Metakit mailing list  -  [EMAIL PROTECTED]
> http://www.equi4.com/mailman/listinfo/metakit
