Building the AST in parallel is definitely an idea. You could load and build it in parallel, and then any weirdness around module stuff would get fixed at runtime... although I could imagine a case where one module alters the definition of a function in another module in a way that forces it to require a totally different module. Hmm.
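
To make the "quick scan for require" idea from Andrew's message (quoted below) a bit more concrete, here is a rough sketch in plain Ruby. The helper name `scan_requires` and the regex are my own assumptions, not anything in JRuby. It only sees requires whose argument is a string literal, which is exactly the weakness above: anything computed at runtime is invisible to it.

```ruby
# Hypothetical helper: find literal require targets without a full parse.
# Matches lines such as  require 'foo'  or  require("foo").
REQUIRE_LINE = /^[ \t]*require[ \t]*\(?[ \t]*(['"])([^'"]+)\1/

def scan_requires(source)
  # String#scan returns one [quote, name] pair per match; keep the name.
  source.scan(REQUIRE_LINE).map(&:last)
end

sample = <<~'RUBY'
  require 'ALib'
  a = 10 + 2
  require "BLib"
  require lib_name   # dynamic target: the scan cannot see this one
  b = a / 2
RUBY

p scan_requires(sample)  # prints ["ALib", "BLib"]
```

So a scan like this could seed a prefetch of files, but it can only ever be a hint; the real set of requires is known only once the code runs.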
2011/10/24 Andrew Cholakian <and...@andrewvc.com>

> I'm wondering how much of the issue is IO and how much is CPU time required
> to parse. Would it be easiest to just do a quick scan for module
> dependencies and cache all the files ASAP, then parse serially? I'm not sure
> if it'd be possible to do a quick parse for just 'require'.
>
>
> On Mon, Oct 24, 2011 at 9:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> I was thinking about the case below, and I think that this is an
>> interesting idea, but I'm wondering how you would resolve certain
>> difficulties. Imagine:
>>
>> require 'ALib'
>> a = 10+2
>> require 'BLib'
>> b=a/2
>>
>> where ALib is a lot of random stuff, then:
>> class Fixnum
>>   def +(other)
>>     self*other
>>   end
>> end
>>
>> and BLib is a lot of random stuff, then:
>> class Fixnum
>>   def /(other)
>>     self*other*other
>>   end
>> end
>>
>> How would you know how to resolve these various pieces? I guess you
>> mention eager interpreting and then a cache, but given that any module can
>> change any other module's functionality, you would have to keep track of
>> everything that you eagerly interpreted, and possibly go back depending on
>> what your module declares. How else would you know that a module that
>> doesn't depend on any other modules is going to actually execute in a
>> radically different way because of another module that you have included?
>> The only way I can think of would be if the thread executing any given piece
>> of code kept track of the calls that it made and where, and then went back
>> to the earliest piece it had to in the case that anything was
>> rewritten...but then you could imagine an even more convoluted case where
>> module A changes an earlier piece of module B such that it changes how a
>> later piece of itself works...and so on.
>>
>> Perhaps this is incoherent, but I think the question is how you deal with
>> the fact that separately running pieces of code can change the fundamental
>> underlying state of the world.
>>
>>
>> 2011/10/24 Charles Oliver Nutter <head...@headius.com>
>>
>>> Nahi planted an interesting seed on Twitter...what if we could
>>> parallelize parsing of Ruby files when loading a large application?
>>>
>>> At a naive level, parallelizing the parse of an individual file is
>>> tricky to impossible; the parser state is very much straight-line. But
>>> perhaps it's possible to parallelize loading of many files?
>>>
>>> I started playing with parallelizing calls to the parser, but that
>>> doesn't really help anything; every call to the parser blocks waiting
>>> for it to complete, and the contents are not interpreted until after
>>> that point. That means that "require" lines remain totally opaque,
>>> preventing us from proactively starting threaded parses of additional
>>> files. But there lies the opportunity: what if load/require requests
>>> were done as Futures, require/load lines were eagerly interpreted by
>>> submitting load/require requests to a thread pool, and child requires
>>> could be loading and parsing at the same time as the parent
>>> file...without conflicting.
>>>
>>> In order to do this, I think we would need to make the following
>>> modifications:
>>>
>>> * LoadService would need to expose Future-based versions of "load"
>>> and "require". The initial file loaded as the "main" script would be
>>> synchronous, but subsequent requires and loads could be shunted to a
>>> thread pool.
>>> * The parser would need to initiate eager load+parse of files
>>> encountered in require-like and load-like lines. This load+parse would
>>> encompass filesystem searching plus content parsing, so all the heavy
>>> lifting of booting a file would be pushed into the thread pool.
>>> * Somewhere (perhaps in LoadService) we would maintain an LRU cache
>>> mapping from file paths to ASTs. The cache would contain Futures;
>>> getting the actual parsed library would then simply be a matter of
>>> Future.get, allowing many of the load+parses to be done
>>> asynchronously.
>>>
>>> For a system like Rails, where there might be hundreds of files
>>> loaded, this could definitely improve startup performance.
>>>
>>> Thoughts?
>>>
>>> - Charlie
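
For reference, the path-to-Future cache in Charlie's last bullet might look roughly like this in plain Ruby. This is only a sketch under several assumptions: `ParseCache`, `prefetch` and `fetch` are made-up names, `Thread#value` stands in for `Future.get`, Ripper from the standard library stands in for JRuby's own parser, and there is no LRU eviction and no bounded thread pool (each file simply gets its own thread).

```ruby
require 'ripper'

# Hypothetical cache mapping file paths to in-flight or finished parses.
class ParseCache
  def initialize
    @lock = Mutex.new
    @futures = {}   # path => Thread whose value is the parsed tree
  end

  # Kick off a background read+parse unless one was already requested.
  # Returns immediately with the Thread acting as the "future".
  def prefetch(path)
    @lock.synchronize do
      @futures[path] ||= Thread.new { Ripper.sexp(File.read(path)) }
    end
  end

  # Block until the tree for +path+ is ready (the Future.get step).
  def fetch(path)
    prefetch(path).value
  end
end

# Usage, assuming these files exist:
#   cache = ParseCache.new
#   %w[a.rb b.rb].each { |f| cache.prefetch(f) }  # parses run concurrently
#   tree = cache.fetch('a.rb')                    # waits only if not done yet
```

Note that this only covers reading and parsing. Executing the resulting trees would still have to happen serially, in the original require order, because of the redefinition problem Jonathan describes.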