After a patch to prebuild PGE/Glob.pir and use the precompiled bytecode, I'm seeing the following timings with "hello.tcl" in partcl's examples:

CokeZero:~/research/parrot/languages/tcl/examples wcoleda$ time ../..//../parrot ../tcl.pbc hello.tcl
Hello World

real    0m0.313s
user    0m0.187s
sys     0m0.057s
CokeZero:~/research/parrot/languages/tcl/examples wcoleda$ time ../..//../parrot ../tcl.pbc hello.tcl
Hello World

real    0m0.315s
user    0m0.189s
sys     0m0.061s

CokeZero:~/research/parrot/languages/tcl/examples wcoleda$ time tclsh hello.tcl
Hello World

real    0m0.325s
user    0m0.035s
sys     0m0.032s

Yes, it's actually *faster* than real tclsh. This cannot be right. (Actually, given the incredible amount of cheating partcl must be doing, only going this much faster is disappointing. =-) Let's try again.

CokeZero:~/research/parrot/languages/tcl/examples wcoleda$ time tclsh hello.tcl
Hello World

real    0m0.069s
user    0m0.035s
sys     0m0.023s

Ah, there we go. partcl is back down to about 4.5 times slower. Running a trace, I see the top two opcodes are:

 Code J Name                         Calls  Total/s       Avg/ms
  177 - compile_p_p_s                    2    0.088355   44.177380
  537 - load_bytecode_sc                 6    0.034809    5.801486

Combined, these account for only .123s, which is not enough to get us back down to even, but it's a start. There are no compile opcodes in the path for "hello.tcl", so this has to be coming in through something we're loading:

  load_bytecode "library/Data/Escape.pbc"
  load_bytecode "library/PGE.pbc"
  load_bytecode "library/PGE/Glob.pbc"

Can any of these stdlib items be optimized so they load faster? Is there anything they're doing at load time that could have been done at compile time instead, like the hash init in PGE::EXP && P6Rule, or the rule compilation in PGE::Rule?

As for the load_bytecode, I followed the code from the opcode back into src/embed.c, where it calls (eventually) Parrot_readbc - which appears to read the files via fread(). Can we change this to something that mmaps instead? (I wonder how much time is spent setting up the initial load of tcl.pbc - that's not done via an opcode, so that time isn't reported via -p, is it?)
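
To make the mmap question concrete, here is a rough sketch of what mapping a file read-only looks like with the plain POSIX calls. This is not Parrot's code: map_file_readonly and unmap_file are hypothetical names made up for illustration, and I haven't checked how the result would have to be plugged into Parrot_readbc or the packfile unpacking.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of Parrot): map a whole file read-only.
 * Returns a pointer to the contents and stores the size in *len_out.
 * Returns NULL on failure, including for an empty file, since a
 * zero-length mapping is not allowed. */
static const void *
map_file_readonly(const char *path, size_t *len_out)
{
    struct stat st;
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;

    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    if (addr == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return addr;
}

/* Hypothetical counterpart: release a mapping made by map_file_readonly. */
static void
unmap_file(const void *addr, size_t len)
{
    munmap((void *)addr, len);
}
```

A caller would do something like `p = map_file_readonly("tcl.pbc", &len)`, unpack from p, and call unmap_file(p, len) when done. Pages are only faulted in as they are touched, so whether this beats fread() for small .pbc files would need to be measured rather than assumed.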

Regards.

