Re: [jruby-dev] Improving load time by parallelizing load/parse?

Jonathan Coveney Mon, 24 Oct 2011 22:05:47 -0700

Building the AST in parallel is definitely an idea. You could load and build
it in parallel, and then any weirdness around module interactions would get
sorted out at runtime... although I could imagine a case where one module
redefines a method in another module in a way that forces it to require a
completely different module. Hmm.


2011/10/24 Andrew Cholakian <and...@andrewvc.com>

> I'm wondering how much of the issue is I/O and how much is the CPU time
> required to parse. Would it be easiest to just do a quick scan for module
> dependencies and cache all the files ASAP, then parse serially? I'm not sure
> whether it would be possible to do a quick parse for just 'require'.
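A quick scan for just `require` can be approximated without a full parse. The sketch below is one hypothetical way to do it in Ruby (`scan_requires` and `REQUIRE_LINE` are illustrative names, not an existing API): a regular expression picks out `require`/`load` calls whose argument is a literal string at the start of a line. It over-approximates (a `require` inside an `if` or a method body is found even if it never runs, which only costs a wasted prefetch) and under-approximates (interpolated or computed paths are skipped and would still have to go through the normal synchronous path).

```ruby
# Hypothetical parser-free scan for literal require/load targets, e.g.
#   require 'foo'    require "foo/bar"    load('baz.rb')
# Interpolated paths such as "plugins/#{name}" are deliberately not matched.
REQUIRE_LINE = /^\s*(?:require|load)\s*\(?\s*(['"])([^'"#]+)\1/

def scan_requires(source)
  # String#scan yields one [quote, path] pair per match; keep only the path.
  source.scan(REQUIRE_LINE).map { |_quote, path| path }
end

src = <<~'RUBY'
  require 'ALib'
  a = 10+2
  require "BLib"
  load('helpers.rb')
  require "plugins/#{name}"
RUBY
p scan_requires(src)   # => ["ALib", "BLib", "helpers.rb"]
```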
>
>
> On Mon, Oct 24, 2011 at 9:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> I was thinking about the case below, and I think that this is an
>> interesting idea, but I'm wondering how you would resolve certain
>> difficulties. Imagine:
>>
>> require 'ALib'
>> a = 10 + 2
>> require 'BLib'
>> b = a / 2
>>
>> where ALib is a lot of random stuff, then:
>> class Fixnum
>>   def +(other)
>>     self*other
>>   end
>> end
>>
>> and BLib is a lot of random stuff, then:
>> class Fixnum
>>   def /(other)
>>     self*other*other
>>   end
>> end
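To make the ordering hazard concrete, here is a runnable Ruby version of the example, with stated assumptions: a fresh, hypothetical `Num` class stands in for `Fixnum` (globally reopening the real integer `+` would break everything else running in the process), and the two lambdas stand in for the bodies of ALib and BLib. Parsing the three files can happen in any order, but the program's answer depends on exactly when each library's body runs relative to the lines of the main file.

```ruby
# Builds a fresh stand-in for Fixnum each time, so patches from one run
# don't leak into the next.
def fresh_num_class
  Class.new do
    attr_reader :value
    def initialize(value); @value = value; end
    def +(other); self.class.new(value + other.value); end
    def /(other); self.class.new(value / other.value); end
  end
end

# The two "libraries" from the example, as patches applied to a given class.
A_LIB = ->(num) { num.class_eval { def +(other); self.class.new(value * other.value); end } }
B_LIB = ->(num) { num.class_eval { def /(other); self.class.new(value * other.value * other.value); end } }

# Runs the four-line program; the flags say whether each require had finished
# executing by the time the following line ran.
def run_program(a_done: true, b_done: true)
  num = fresh_num_class
  A_LIB.call(num) if a_done          # require 'ALib'
  a = num.new(10) + num.new(2)
  B_LIB.call(num) if b_done          # require 'BLib'
  b = a / num.new(2)
  [a.value, b.value]
end

p run_program                                # => [20, 80]  (normal sequential semantics)
p run_program(a_done: false)                 # => [12, 48]
p run_program(a_done: false, b_done: false)  # => [12, 6]
```

Any parallel scheme therefore has to keep the execution of each file in the same sequence as the original `require` calls, even if reading and parsing happen ahead of time.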
>>
>> How would you know how to resolve these various pieces? You mention eager
>> interpretation plus a cache, but given that any module can change any other
>> module's functionality, you would have to keep track of everything that you
>> eagerly interpreted, and possibly go back depending on what each module
>> declares. How else would you know that a module that doesn't depend on any
>> other module is actually going to execute in a radically different way
>> because of another module you have included? The only way I can think of
>> would be for the thread executing any given piece of code to keep track of
>> the calls it made and where, and then go back to the earliest affected
>> piece whenever anything was rewritten... but then you could imagine an even
>> more convoluted case where module A changes an earlier piece of module B
>> such that it changes how a later piece of itself works... and so on.
>>
>> Perhaps this is incoherent, but I think the real question is how you deal
>> with the fact that separately running pieces of code can change the
>> fundamental underlying state of the world.
>>
>>
>> 2011/10/24 Charles Oliver Nutter <head...@headius.com>
>>
>>> Nahi planted an interesting seed on Twitter...what if we could
>>> parallelize parsing of Ruby files when loading a large application?
>>>
>>> At a naive level, parallelizing the parse of an individual file is
>>> tricky to impossible; the parser state is very much straight-line. But
>>> perhaps it's possible to parallelize loading of many files?
>>>
>>> I started playing with parallelizing calls to the parser, but that
>>> doesn't really help anything; every call to the parser blocks waiting
>>> for it to complete, and the contents are not interpreted until after
>>> that point. That means that "require" lines remain totally opaque,
>>> preventing us from proactively starting threaded parses of additional
>>> files. But therein lies the opportunity: what if load/require requests
>>> were done as Futures, require/load lines were eagerly interpreted by
>>> submitting load/require requests to a thread pool, and child requires
>>> could be loading and parsing at the same time as the parent
>>> file...without conflicting?
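A minimal Ruby sketch of the Futures idea, under several assumptions: `Thread#value` stands in for `Future.get`, one thread per file stands in for a bounded thread pool, an in-memory hash (`SOURCES`) replaces load-path searching, `Ripper.sexp` (MRI's parser interface) stands in for JRuby's parser, and only literal single-quoted `require 'x'` lines are discovered. `FutureLoader`, `prefetch` and `require_file` are hypothetical names, not LoadService API. Reading and parsing run ahead in the background, but files still begin executing in ordinary depth-first require order.

```ruby
require 'ripper'

# Hypothetical in-memory "load path": file name => source text.
SOURCES = {
  'main' => "require 'a'\nrequire 'b'\nM = 0\n",
  'a'    => "require 'c'\nA = 1\n",
  'b'    => "B = 2\n",
  'c'    => "C = 3\n",
}.freeze

# Simplified require detection: literal, single-quoted requires only.
def literal_requires(src)
  src.scan(/^\s*require\s+'([^']+)'/).flatten
end

class FutureLoader
  def initialize(sources)
    @sources = sources
    @futures = {}          # path => Thread; Thread#value plays the role of Future.get
    @lock = Mutex.new
  end

  # Submit a read+parse for `path` at most once, and eagerly submit its children.
  def prefetch(path)
    @lock.synchronize do
      @futures[path] ||= Thread.new do
        src = @sources.fetch(path)                       # stands in for search + read
        literal_requires(src).each { |child| prefetch(child) }
        Ripper.sexp(src) or raise SyntaxError, path      # nil means a parse error
      end
    end
  end

  # Execution stays sequential and depth-first, like plain require. A real
  # runtime would interpret the AST and call back in at each require line;
  # this sketch only records the order in which files start running.
  def require_file(path, order = [])
    return order if order.include?(path)
    prefetch(path).value                  # blocks only if this parse isn't done yet
    order << path
    literal_requires(@sources[path]).each { |child| require_file(child, order) }
    order
  end
end

loader = FutureLoader.new(SOURCES)
p loader.require_file('main')   # => ["main", "a", "c", "b"]
```

On MRI the global VM lock means these threads overlap only the I/O, not the parsing itself; on JRuby, with native threads, the parses could genuinely run in parallel.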
>>>
>>> In order to do this, I think we would need to make the following
>>> modifications:
>>>
>>> * LoadService would need to expose Future-based versions of "load"
>>> and "require". The initial file loaded as the "main" script would be
>>> synchronous, but subsequent requires and loads could be shunted to a
>>> thread pool.
>>> * The parser would need to initiate eager load+parse of files
>>> encountered in require-like and load-like lines. This load+parse would
>>> encompass filesystem searching plus content parsing, so all the heavy
>>> lifting of booting a file would be pushed into the thread pool.
>>> * Somewhere (perhaps in LoadService) we would maintain an LRU cache
>>> mapping from file paths to ASTs. The cache would contain Futures;
>>> getting the actual parsed library would then simply be a matter of
>>> Future.get, allowing many of the load+parses to be done
>>> asynchronously.
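The LRU cache of Futures from the last bullet can be sketched in a few lines of Ruby, again with `Thread#value` standing in for `Future.get`. `FutureLRU`, `fetch` and `keys` are hypothetical names. A Ruby Hash keeps insertion order, so deleting and re-inserting a key on every hit keeps the least recently used entry at the front, where `Hash#shift` removes it. Evicting a future that is still running (its thread simply finishes unobserved) is a choice this sketch makes arbitrarily.

```ruby
# Hypothetical LRU cache mapping file paths to futures of parsed results.
class FutureLRU
  def initialize(capacity)
    @capacity = capacity
    @map = {}              # insertion order doubles as recency order (oldest first)
    @lock = Mutex.new
  end

  # Returns the cached future for `path`, or starts `block` as a new one.
  def fetch(path, &block)
    @lock.synchronize do
      future = @map.delete(path)             # remove so it can be re-inserted as newest
      future ||= Thread.new(&block)
      @map[path] = future
      @map.shift while @map.size > @capacity # evict the least recently used entry
      future
    end
  end

  def keys
    @lock.synchronize { @map.keys }
  end
end

cache = FutureLRU.new(2)
cache.fetch('a.rb') { :ast_a }
cache.fetch('b.rb') { :ast_b }
p cache.fetch('a.rb') { :never_run }.value   # => :ast_a  (cached; 'a.rb' is now newest)
cache.fetch('c.rb') { :ast_c }               # evicts 'b.rb'
p cache.keys                                 # => ["a.rb", "c.rb"]
```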
>>>
>>> For a system like Rails, where there might be hundreds of files
>>> loaded, this could definitely improve startup performance.
>>>
>>> Thoughts?
>>>
>>> - Charlie
>>>
>>
>
