Harendra Kumar <harendra.ku...@gmail.com> writes:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# . The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much overhead
> a ccall adds? I guess it may have to save and restore registers. Would it
> be better to do the fast path case of allocating small objects from the
> nursery using cmm code like in stg_gc_noregs?
>

GHC's operational model is designed in such a way that foreign calls are
fairly cheap (e.g. we don't need to switch stacks, which can be quite
costly). Judging by the assembler produced for newByteArray# in one random
x86-64 tree that I have lying around, it's only a couple of data-movement
instructions, an %eax clear, and a stack pop:
    36:  48 89 ce          mov    %rcx,%rsi
    39:  48 89 c7          mov    %rax,%rdi
    3c:  31 c0             xor    %eax,%eax
    3e:  e8 00 00 00 00    call   43 <stg_newByteArrayzh+0x43>
    43:  48 83 c4 08       add    $0x8,%rsp

The data movement operations in particular are quite cheap on most
microarchitectures where GHC would run, thanks to register renaming. I doubt
that this overhead would be noticeable in anything but a synthetic benchmark.
However, it never hurts to measure.

Cheers,

- Ben
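As an illustration, the size arithmetic in the quoted stg_newByteArrayzh and the kind of bump-pointer fast path Harendra asks about can be sketched in C. This is a toy model, not RTS code. It assumes a 64-bit target and a two-word StgArrBytes header (info pointer plus byte count, i.e. a non-profiling build), and the names arr_bytes_words, Nursery, alloc_words and alloc_slow are hypothetical. The fast path is just a bounds check and a pointer bump; only requests that do not fit fall through to the out-of-line slow path, which is where a ccall such as allocateMightFail would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 64-bit target, so one machine word is 8 bytes. */
typedef uint64_t Word;
#define WORD_BYTES sizeof(Word)

/* Local stand-ins for the RTS macros ROUNDUP_BYTES_TO_WDS and BYTES_TO_WDS:
   round a byte count up (resp. down) to whole words. */
static size_t roundup_bytes_to_wds(size_t n) { return (n + WORD_BYTES - 1) / WORD_BYTES; }
static size_t bytes_to_wds(size_t n) { return n / WORD_BYTES; }

/* Assumption: a non-profiling StgArrBytes header is an info pointer plus a
   byte-count field, i.e. two words. */
#define ARR_BYTES_HEADER_BYTES (2 * WORD_BYTES)

/* Total words for a byte array with an n-byte payload, following the
   arithmetic in the quoted stg_newByteArrayzh. */
static size_t arr_bytes_words(size_t n) {
    size_t payload_words = roundup_bytes_to_wds(n);
    return bytes_to_wds(ARR_BYTES_HEADER_BYTES) + payload_words;
}

/* Hypothetical toy nursery: one contiguous block with a bump pointer. */
typedef struct {
    Word *hp;    /* next free word */
    Word *limit; /* one past the last usable word */
} Nursery;

/* Hypothetical slow path. In the RTS this is where the out-of-line call
   (e.g. allocateMightFail) would take over; this sketch just fails. */
static Word *alloc_slow(Nursery *nursery, size_t words) {
    (void)nursery;
    (void)words;
    return NULL;
}

/* Fast path: bump the pointer if the object fits, else take the slow path. */
static Word *alloc_words(Nursery *nursery, size_t words) {
    assert(words > 0);
    if ((size_t)(nursery->limit - nursery->hp) >= words) {
        Word *p = nursery->hp;
        nursery->hp += words;
        return p;
    }
    return alloc_slow(nursery, words);
}
```

For example, a 9-byte payload rounds up to 2 payload words, plus 2 header words under the assumption above, so arr_bytes_words(9) is 4.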
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs