On Feb 23, 11:03 pm, Jason Grout <jason-s...@creativetrax.com> wrote:
> On 2/23/11 3:56 PM, Robert Bradshaw wrote:
>
>
>
> > On Wed, Feb 23, 2011 at 1:47 PM, Jason Grout
> > <jason-s...@creativetrax.com>  wrote:
> >> On 2/23/11 3:06 PM, Robert Bradshaw wrote:
>
> >>> On Wed, Feb 23, 2011 at 11:34 AM, William Stein <wst...@gmail.com> wrote:
>
> >>>> On Wed, Feb 23, 2011 at 10:57 AM, Jason Grout
> >>>> <jason-s...@creativetrax.com>    wrote:
>
> >>>>> On 2/23/11 12:28 PM, William Stein wrote:
>
> >>>>>> At lunch yesterday Robert Bradshaw made the interesting suggestion to
> >>>>>> read the docs for importlib
> >>>>>> (http://docs.python.org/dev/library/importlib.html) and write a
> >>>>>> customized import hook, so that every time during Sage startup that a
> >>>>>> module is imported, the import is done from a single big in-memory zip
> >>>>>> file instead of done using the filesystem.    If this can be made to
> >>>>>> work, it would be a huge win for slow filesystems.   The basic problem
> >>>>>> is that some filesystems are fast but have huge *latency*.
>
> >>>>> Is it a big win primarily because the zip file contents can be read in
> >>>>> and cached by us?  I'm just trying to understand it better.
>
> >>>> Which would you rather do on a high latency filesystem:
>
> >>>>   (1) Read/stat 20,000 little files, or
> >>>>   (2) Read exactly one 40MB file.
>
> >>>>>   Is this the same idea as Jar files in java?
>
> >>>> I don't know.
>
> >>> Yep. In that case the "high latency file system" was a webserver.
>
> >>>>> You mean like http://docs.python.org/library/zipimport.html?
>
> >>>> Cool.
>
> >>> Note that this should just involve putting the zip file first in the
> >>> python path.
>
> >>>> I don't know for a fact that Robert Bradshaw's suggestion will be a
> >>>> big win, since nobody has tried this yet.  But I'm optimistic.  The
> >>>> idea would be to make a zip archive of
> >>>> $SAGE_ROOT/local/lib/python/site-packages (say), and do *all* imports
> >>>> using that massive zip archive.
>
> >>> I'm optimistic too. This would, of course, make more sense for
> >>> system-wide installs than development versions, but the former are
> >>> more likely to be on a non-local filesystem anyways.
>
> >> Sounds like it is time for a trial!
>
> >> I created a directory of 2000 .py files and an __init__.py file to make it
> >> a module
>
> >> for i in range(2000):
> >>     with open('importtest/test_%s.py'%i,'w') as f:
> >>         f.write("VALUE=%s\n"%i)
> >> with open('importtest/__init__.py','w') as f:
> >>     f.write(' ')
>
> >> Then I imported each of these so that .pyc files were created.
>
> >> for i in range(2000):
> >>     exec 'import importtest.test_%s'%i
>
> >> Okay, then I copied the directory and zipped it up (in the shell now):
>
> >> $ cp -r importtest zipimporttest
> >> $ zip -r tmp.zip zipimporttest
> >> $ rm -rf zipimporttest
>
> >> One nice side effect is that the zip file is less than one MB, while the
> >> directory of python files is around 16M.
>
> >> Now for the test.  Here are my two scripts.  One imports each module in the
> >> directory and adds up the VALUE in each module:
>
> >> % cat mytest.py
> >> s=0
> >> for i in range(2000):
> >>     exec 'import importtest.test_%s as tt'%i
> >>     s+=tt.VALUE
> >> print s
>
> >> The other first adds the zip to the front of sys.path and then does the
> >> same imports and summing, but using the zipped module:
>
> >> % cat mytestzip.py
> >> import sys
> >> sys.path.insert(0,'./tmp.zip')
> >> s=0
> >> for i in range(2000):
> >>     exec 'import zipimporttest.test_%s as tt'%i
> >>     s+=tt.VALUE
> >> print s
>
> >> And now for the timings:
>
> >> % time sage -python mytest.py
> >> Detected SAGE64 flag
> >> Building Sage on OS X in 64-bit mode
> >> 1999000
> >> sage -python mytest.py  0.26s user 1.47s system 75% cpu 2.282 total
>
> >> % time sage -python mytestzip.py
> >> Detected SAGE64 flag
> >> Building Sage on OS X in 64-bit mode
> >> 1999000
> >> sage -python mytestzip.py  0.21s user 0.11s system 99% cpu 0.327 total
>
> >> It looks like the zip is a clear winner in this case.  And this is with the
> >> directory presumably in the FS cache.
>
> > Cool. Given the CPU was pegged at 99%, have you tried using an
> > uncompressed zip file? It'd have more data to read, but less to do
> > with it once it's read.
>
> In my case, using zip -0 (no compression) gives:
>
> % time sage -python mytestzip.py
> Detected SAGE64 flag
> Building Sage on OS X in 64-bit mode
> 1999000
> sage -python mytestzip.py  0.20s user 0.10s system 99% cpu 0.309 total
>
> So just a slight savings.
>
> Jason


I had an orthogonal thought, though I'm not sure it's entirely
feasible. Instead of loading the real functions, classes, etc. at
startup, couldn't we fast-load (or generate) stub versions of all of
them, where each stub, when first called, loads the real version,
replaces itself with it, and then runs it? I'm not completely sure this
is possible in Python, but Python is pretty flexible, so perhaps there
is a way; in particular, I don't know how well Python supports
reflection for adding new functions to a namespace dynamically. The
docstrings and the search* functions would also have to be taken into
account somehow.
If it is possible, then as far as I can see the user would not notice
the difference (except for a minute overhead the first time a function
is called), and only the very small fraction of modules actually used
would be loaded in each session. Furthermore, because the stub
functions would be in the namespace, tab completion would still work.
The stub versions could either come from Python files auto-generated
when Sage is compiled and loaded by the usual module loader, or be
created at run time by some Python function that uses a
compile-time-generated listing of all functions, classes, etc. to build
these wrapper functions and add them to the namespace.
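
To make the idea concrete, here is a rough sketch of the run-time
variant, not a tested design for Sage. The names make_stub and LISTING
are made up for illustration, the listing entries use standard-library
functions as stand-ins for Sage functions, and the docstrings in the
listing are placeholders. It assumes the namespace is an ordinary dict
(such as a module's globals()).

```python
import importlib


def make_stub(namespace, name, module_name, doc=None):
    """Install a placeholder for `module_name.name` into `namespace`.

    `make_stub` is a hypothetical helper, not an existing Sage or
    Python API.
    """
    def stub(*args, **kwargs):
        # First call: import the defining module, swap the stub for the
        # real object in the namespace, then delegate to the real object.
        real = getattr(importlib.import_module(module_name), name)
        namespace[name] = real
        return real(*args, **kwargs)

    # Copy the name and docstring from the listing so that help and tab
    # completion have something to show before the real import happens.
    stub.__name__ = name
    stub.__doc__ = doc
    namespace[name] = stub
    return stub


# Hypothetical compile-time-generated listing:
# (function name, defining module, docstring).
LISTING = [
    ("dumps", "json", "Placeholder docstring for dumps."),
    ("fsum", "math", "Placeholder docstring for fsum."),
]

namespace = {}
for name, module_name, doc in LISTING:
    make_stub(namespace, name, module_name, doc)

first = namespace["fsum"]                     # still the stub
result = namespace["fsum"]([1.0, 2.0, 3.0])   # triggers import and swap
swapped = namespace["fsum"] is not first      # the name now points at the real function
```

One caveat with this approach: anything that grabbed a reference to
the stub before the first call keeps the stub rather than the real
function. The stub still works in that case, since it delegates on
every call and repeated imports are served from sys.modules, but each
call through it pays a small extra cost. Classes would need more care
than plain functions, because isinstance checks and subclassing would
see the stub.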


Regards,
Johan

