On Wed, Feb 23, 2011 at 1:47 PM, Jason Grout
<jason-s...@creativetrax.com> wrote:
> On 2/23/11 3:06 PM, Robert Bradshaw wrote:
>>
>> On Wed, Feb 23, 2011 at 11:34 AM, William Stein <wst...@gmail.com> wrote:
>>>
>>> On Wed, Feb 23, 2011 at 10:57 AM, Jason Grout
>>> <jason-s...@creativetrax.com> wrote:
>>>>
>>>> On 2/23/11 12:28 PM, William Stein wrote:
>>>>>
>>>>> At lunch yesterday Robert Bradshaw made the interesting suggestion to
>>>>> read the docs for importlib
>>>>> (http://docs.python.org/dev/library/importlib.html) and write a
>>>>> customized import hook, so that every time during Sage startup that a
>>>>> module is imported, the import is done from a single big in-memory zip
>>>>> file instead of through the filesystem. If this can be made to
>>>>> work, it would be a huge win for slow filesystems. The basic problem
>>>>> is that some filesystems are fast but have huge *latency*.
>>>>
>>>> Is it a big win primarily because the zip file contents can be read in and
>>>> cached by us?  I'm just trying to understand it better.
>>>
>>> Which would you rather do on a high latency filesystem:
>>>
>>>  (1) Read/stat 20,000 little files, or
>>>  (2) Read exactly one 40MB file.
>>>
>>>>  Is this the same idea as Jar files in java?
>>>
>>> I don't know.
>>
>> Yep. In that case the "high latency file system" was a webserver.
>>
>>>> You mean like http://docs.python.org/library/zipimport.html ?
>>>
>>> Cool.
>>
>> Note that this should just involve putting the zip file first in the
>> python path.
>>
>>> I don't know for a fact that Robert Bradshaw's suggestion will be a
>>> big win, since nobody has tried this yet.  But I'm optimistic.  The
>>> idea would be to make a zip archive of
>>> $SAGE_ROOT/local/lib/python/site-packages (say), and do *all* imports
>>> using that massive zip archive.
>>
>> I'm optimistic too. This would, of course, make more sense for
>> system-wide installs than development versions, but the former are
>> more likely to be on a non-local filesystem anyways.
>
>
> Sounds like it is time for a trial!
>
> I created a directory of 2000 .py files and an __init__.py file to make it a
> package:
>
> for i in range(2000):
>    with open('importtest/test_%s.py'%i,'w') as f:
>        f.write("VALUE=%s\n"%i)
> with open('importtest/__init__.py','w') as f:
>    f.write(' ')
>
> Then I imported each of these so that .pyc files were created.
>
> for i in range(2000):
>    exec 'import importtest.test_%s'%i
>
>
> Okay, then I copied the directory and zipped it up (in the shell now):
>
> $ cp -r importtest zipimporttest
> $ zip -r tmp.zip zipimporttest
> $ rm -rf zipimporttest
>
> One nice side effect is that the zip file is less than one MB, while the
> directory of python files is around 16M.
>
> Now for the test.  Here are my two scripts.  One imports each module in the
> directory and adds up the VALUE in each module:
>
> % cat mytest.py
> s=0
> for i in range(2000):
>    exec 'import importtest.test_%s as tt'%i
>    s+=tt.VALUE
> print s
>
>
> The other first adds the zip to the front of sys.path and then does the same
> imports and summing, but using the zipped module:
>
> % cat mytestzip.py
> import sys
> sys.path.insert(0,'./tmp.zip')
> s=0
> for i in range(2000):
>    exec 'import zipimporttest.test_%s as tt'%i
>    s+=tt.VALUE
> print s
>
>
> And now for the timings:
>
> % time sage -python mytest.py
> Detected SAGE64 flag
> Building Sage on OS X in 64-bit mode
> 1999000
> sage -python mytest.py  0.26s user 1.47s system 75% cpu 2.282 total
>
>
> % time sage -python mytestzip.py
> Detected SAGE64 flag
> Building Sage on OS X in 64-bit mode
> 1999000
> sage -python mytestzip.py  0.21s user 0.11s system 99% cpu 0.327 total
>
>
> It looks like the zip is a clear winner in this case.  And this is with the
> directory presumably in the FS cache.

Cool. Given the CPU was pegged at 99%, have you tried using an
uncompressed zip file? It'd have more data to read, but less to do
with it once it's read.
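
A minimal sketch of how that comparison might be run, assuming Python 3 and only the standard library. The names `build_zip`, `time_imports`, and `N_MODULES` are hypothetical, the module count is smaller than the 2000 used above, and the archives here hold only .py source (no .pyc files), so compile time is included in both measurements:

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile

N_MODULES = 200  # hypothetical count, smaller than the 2000 used in the trial


def build_zip(path, package, compression):
    """Write a package of tiny modules into a zip archive at `path`."""
    with zipfile.ZipFile(path, "w", compression=compression) as zf:
        zf.writestr(package + "/__init__.py", "")
        for i in range(N_MODULES):
            zf.writestr("%s/test_%d.py" % (package, i), "VALUE=%d\n" % i)


def time_imports(zip_path, package):
    """Import every module from the archive; return (sum of VALUEs, seconds)."""
    sys.path.insert(0, zip_path)  # zip goes first on the path, as suggested
    importlib.invalidate_caches()
    try:
        start = time.perf_counter()
        total = 0
        for i in range(N_MODULES):
            mod = importlib.import_module("%s.test_%d" % (package, i))
            total += mod.VALUE
        return total, time.perf_counter() - start
    finally:
        sys.path.remove(zip_path)


tmpdir = tempfile.mkdtemp()
results = {}
for label, compression in [("stored", zipfile.ZIP_STORED),
                           ("deflated", zipfile.ZIP_DEFLATED)]:
    # Distinct package names keep the two runs from sharing sys.modules.
    package = "zipimporttest_" + label
    zip_path = os.path.join(tmpdir, label + ".zip")
    build_zip(zip_path, package, compression)
    results[label] = time_imports(zip_path, package)
    print(label, results[label])
```

On the command line, `zip -0 -r tmp.zip zipimporttest` should produce the stored (uncompressed) archive for the original scripts. Timings from something this small will be noisy, so it is probably worth repeating the runs a few times.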

- Robert

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
