There are 2 major issues:
1) PMC initialization (new_p_ic_p): The shared PMC additionally needs the allocation of the synchronize structure and the MUTEX_INIT.
2) PMC access (set_p_i): each access locks and unlocks the mutex.
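
To make the two costs concrete, here is a minimal C sketch. It is not Parrot's source: the type and function names (plain_ref, shared_ref, sync_data and so on) are hypothetical, and it assumes MUTEX_INIT ends up as a POSIX pthread_mutex_init on this platform.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plain Ref: only a pointer to the value. */
typedef struct {
    long *target;
} plain_ref;

/* Hypothetical stand-in for the synchronize structure of a shared PMC. */
typedef struct {
    pthread_mutex_t lock;
} sync_data;

/* Hypothetical stand-in for a SharedRef: the pointer plus its sync data. */
typedef struct {
    long      *target;
    sync_data *sync;
} shared_ref;

/* Plain access: a single store, no locking. */
void plain_ref_set(plain_ref *r, long value)
{
    *r->target = value;
}

/* Issue 1: initialization pays for one extra allocation and a mutex init. */
int shared_ref_init(shared_ref *r, long *target)
{
    r->sync = malloc(sizeof *r->sync);
    if (r->sync == NULL)
        return -1;
    if (pthread_mutex_init(&r->sync->lock, NULL) != 0) {
        free(r->sync);
        r->sync = NULL;
        return -1;
    }
    r->target = target;
    return 0;
}

/* Issue 2: every access pays for a lock/unlock pair around the store. */
void shared_ref_set(shared_ref *r, long value)
{
    pthread_mutex_lock(&r->sync->lock);
    *r->target = value;
    pthread_mutex_unlock(&r->sync->lock);
}

/* Release the mutex and the sync data again. */
void shared_ref_destroy(shared_ref *r)
{
    pthread_mutex_destroy(&r->sync->lock);
    free(r->sync);
    r->sync = NULL;
}
```

Even with no contention at all, the shared variant does strictly more work in both places, which is the shape the profile shows.
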
Here are snippets from the profile:
with SharedRef

CODE OP FULL NAME        CALLS TOTAL TIME  AVG T. ms
---- ----------------- ------- ---------- ----------
 753 new_p_ic_p         100000   0.157785     0.0016
 905 set_p_i            100000   0.049269     0.0005

with Ref

CODE OP FULL NAME        CALLS TOTAL TIME  AVG T. ms
---- ----------------- ------- ---------- ----------
 753 new_p_ic_p         100000   0.051330     0.0005
 905 set_p_i            100000   0.011356     0.0001
(Overall timings aren't really comparable: the SharedRef also does a LOCK for mark, which slows that down as well.)
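
For a rough sense of scale, the per-op numbers can be turned into a slowdown factor and an extra cost per call. The C sketch below only does that arithmetic; the function names are made up for illustration, and it assumes TOTAL TIME is in seconds, which is what the AVG T. ms column implies (0.157785 / 100000 calls is about 0.0016 ms).

```c
#include <stdio.h>

/* Slowdown factor: SharedRef total time divided by Ref total time. */
double slowdown(double shared_total_s, double plain_total_s)
{
    return shared_total_s / plain_total_s;
}

/* Extra cost per call in microseconds, given total times in seconds. */
double extra_us_per_call(double shared_total_s, double plain_total_s,
                         long calls)
{
    return (shared_total_s - plain_total_s) / (double)calls * 1e6;
}

/* Print one comparison line for an opcode. */
void report(const char *op, double shared_total_s, double plain_total_s,
            long calls)
{
    printf("%-12s %.2fx slower, +%.2f us per call\n", op,
           slowdown(shared_total_s, plain_total_s),
           extra_us_per_call(shared_total_s, plain_total_s, calls));
}
```

Fed with the figures from the profile, report("new_p_ic_p", 0.157785, 0.051330, 100000) gives roughly a 3.07x slowdown and about 1.06 us extra per call, and report("set_p_i", 0.049269, 0.011356, 100000) gives roughly 4.34x and about 0.38 us extra per call.
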
Linux 2.2.16, Athlon 800, unoptimized Parrot build.

leo
[1]
    set I0, 100000
    set I1, 0
lp:
    new P0, .PerlInt
    new P1, .Ref, P0    # or .SharedRef
    set P1, I1
    inc I1
    lt I1, I0, lp
    end