On Fri, 22 Jan 2016, Sven Schreiber wrote:

> Am 21.01.2016 um 16:10 schrieb Allin Cottrell:
>
>> Your example shows that recursion is a _lot_ faster in julia; so now we
>> want a case where recursion is actually needed.
>>
>
> One more thought on this: What about the "omit --auto" command? I guess
> this could be viewed as something that is done recursively, like this in
> pseudo code:
>
> function <reduced-equation> omit_one_by_one(<estimated-equation>)
>  if min(<signif>) < threshold
>    eliminate(coeff_where_min(<signif>))
>    return omit_one_by_one(<equ_reduced_by_one>)
>  else
>    return <estimated-equation>
>  endif
> end function
>
> If somebody is already proficient enough in Julia (or some other
> JIT-compiled language), I think it would be interesting to compare the
> speed to gretl's performance there.

Many things are such that they _can_ be done by recursion (in the 
sense of a function calling itself), or they can be done by a 
non-recursive iteration (as in gretl's "omit --auto"), or possibly 
by a simple closed-form calculation (as with Fibonacci numbers via 
Binet's formula).
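To make the three-way contrast concrete, here's a minimal Python sketch (function names are mine, purely illustrative) of the same task done all three ways:

```python
import math

def fib_recursive(n):
    # naive self-calling recursion: exponential number of calls
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # non-recursive loop: linear time, constant memory
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    # Binet's formula: one closed-form calculation
    # (exact for moderate n within floating-point precision)
    phi = (1 + math.sqrt(5)) / 2
    return round(phi**n / math.sqrt(5))
```

All three agree on the answer; they differ only in how much work (and how many function calls) they generate, which is exactly where a JIT-compiled language's advantage on recursion shows up.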

I was suggesting that we might try to think of calculations relevant 
to econometrics that are _best_ solved via recursion, given julia's 
huge advantage in that area; I kinda doubt whether auto-omission 
falls in that category, though if anyone cares to try that would be 
nice.
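For what it's worth, Sven's pseudocode maps naturally onto a plain loop rather than recursion. Here's a hedged Python/numpy sketch of the iterative version (my own helper names; dropping on a |t| threshold rather than a p-value, to keep it dependency-free -- not gretl's actual implementation):

```python
import numpy as np

def ols_tstats(X, y):
    # OLS coefficients and t-statistics via least squares
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, b / se

def omit_auto(X, y, names, tcrit=1.96):
    # non-recursive backward elimination in the spirit of "omit --auto":
    # repeatedly drop the regressor with the smallest |t| until all
    # remaining regressors exceed the threshold
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        b, t = ols_tstats(X, y)
        j = int(np.argmin(np.abs(t[1:]))) + 1  # never drop the constant (col 0)
        if abs(t[j]) >= tcrit:
            break
        X = np.delete(X, j, axis=1)
        del names[j]
    return names, ols_tstats(X, y)[0]
```

The recursive formulation would simply replace the `while` loop with a self-call on the reduced design matrix; nothing about the problem requires that.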

Another thing to consider: julia is amazingly fast at "general 
computation" (almost as fast as C) but once you start using packages 
-- such as GLM for regression -- you pay a big cost in set-up time, 
and the package code may not be anything like as efficient as the 
built-in functions. Here's a trivial example, compounded of examples 
from the julia GLM documentation:

<julia>
using GLM, RDatasets
form = dataset("datasets","Formaldehyde")
lm1 = fit(LinearModel, OptDen ~ Carb, form)
cycle = dataset("datasets", "LifeCycleSavings")
lm2 = fit(LinearModel, SR ~ Pop15 + Pop75 + DPI + DDPI, cycle)
</julia>

Running this on my i7 machine takes around 5.8 seconds (the "real" 
value from the unix "time" program). Then here's the gretl 
equivalent (after having used R to write out the two datasets as 
.dta files):

<hansl>
open formaldehyde.dta -q
ols optden 0 carb
open lifecycle.dta -q
ols sr 0 pop15 pop75 dpi ddpi
</hansl>

Running time: 0.017 seconds, or 340 times faster.

We may suppose that there's a big fixed cost in the julia case, so 
my next step was to wrap each estimation function/command in a 
loop of 100000 replications (and eliminate the printing). That gave:

julia: 30.928s
gretl:  0.747s

OK, so now gretl is only 40 times as fast. What about a million 
replications?

julia: 4m21.023s
gretl:  0m6.138s

Still roughly 40x faster, so it's by no means all to do with a fixed 
set-up cost.
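A back-of-envelope way to see this: treat total run time as T(N) = a + b*N and solve for the fixed cost a and per-replication cost b from the two timings reported above (this assumes the 1-run and 100000-run jobs differ only in the replication count):

```python
# julia timings, seconds: 1 run vs 100000 replications
t1, tN, N = 5.8, 30.928, 100_000
b_julia = (tN - t1) / (N - 1)   # marginal cost per pair of regressions
a_julia = t1 - b_julia          # implied fixed set-up cost (~5.8 s)

# gretl timings, seconds, same two jobs
g1, gN = 0.017, 0.747
b_gretl = (gN - g1) / (N - 1)

ratio = b_julia / b_gretl       # per-replication speed ratio, roughly 35
```

So even after stripping out julia's large fixed set-up cost, the per-replication cost is still tens of times higher than gretl's.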

Once again, I don't doubt there _are_ computations we could 
outsource to julia with advantage, but it seems clear that running 
regressions via GLM is not one of them.

Allin
