I've been arguing for a long time that the interaction between a conservative GC without thread-local allocators and the builtin AA is horrible. Keep this in mind: you have to take the GC lock for **EVERY SINGLE INSERTION** into a builtin AA. This makes them absolutely worthless in multithreaded environments. Even in single-threaded mode, they create ridiculous amounts of false pointers and heap fragmentation.
I even went as far as creating my own AA implementation, called RandAA, specifically designed for conservative GC. It uses parallel key and value arrays (to save on alignment overhead and to let the GC scan only the keys or only the values) and randomized probing. In real-world programs where false pointers were eating me alive with the builtin AA, RandAA worked fine. Unfortunately, it has succumbed to bit rot a little, but if there's interest in it again, I'll fix the issues and post a link.
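
For anyone who hasn't seen the idea, here is a rough sketch of what "parallel key and value arrays plus randomized probing" can look like. To be clear, this is not the RandAA code: it is written in Python for brevity, the name `ParallelArrayMap` and every method in it are made up for illustration, and the probe sequence (a small key-seeded linear congruential generator) is just one assumed way of getting a pseudo-random probe order. Deletion is left out.

```python
class ParallelArrayMap:
    """Illustrative open-addressing map with separate key and value arrays."""

    _EMPTY = object()  # sentinel marking an unused key slot

    def __init__(self, capacity=8):
        # Capacity must be a power of two so the probe sequence below
        # is guaranteed to visit every slot exactly once.
        assert capacity >= 2 and capacity & (capacity - 1) == 0
        self._keys = [self._EMPTY] * capacity   # keys only
        self._values = [None] * capacity        # values only, same indices
        self._count = 0

    def _probe(self, key):
        """Yield slot indices in a pseudo-random, key-dependent order."""
        mask = len(self._keys) - 1
        h = hash(key)
        idx = h & mask
        step = ((h >> 5) | 1) & mask  # odd increment derived from the hash
        for _ in range(len(self._keys)):
            yield idx
            # LCG with multiplier 5 and odd increment: full period mod 2**k.
            idx = (5 * idx + step) & mask

    def _find_slot(self, key):
        # Return the slot holding `key`, or the first empty slot on its path.
        for idx in self._probe(key):
            k = self._keys[idx]
            if k is self._EMPTY or k == key:
                return idx
        raise RuntimeError("table is full")

    def __setitem__(self, key, value):
        # Keep the load factor at or below 1/2.
        if 2 * (self._count + 1) > len(self._keys):
            self._grow()
        idx = self._find_slot(key)
        if self._keys[idx] is self._EMPTY:
            self._keys[idx] = key
            self._count += 1
        self._values[idx] = value

    def __getitem__(self, key):
        idx = self._find_slot(key)
        if self._keys[idx] is self._EMPTY:
            raise KeyError(key)
        return self._values[idx]

    def __len__(self):
        return self._count

    def _grow(self):
        # Double both arrays and reinsert every live entry.
        old_keys, old_values = self._keys, self._values
        self._keys = [self._EMPTY] * (2 * len(old_keys))
        self._values = [None] * len(self._keys)
        self._count = 0
        for k, v in zip(old_keys, old_values):
            if k is not self._EMPTY:
                self[k] = v
```

Usage is the obvious `m = ParallelArrayMap(); m["a"] = 1; m["a"]`. The split layout buys nothing in Python itself; the point is that with a conservative collector, a keys-only or values-only array that holds no pointers can be left unscanned, which an array of interleaved key/value pairs cannot.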