This is strange. You said that you profiled the program and the extra time spent is not in user code? Where is it spent then?
This is a damn good question. I tried to debug it manually with `writefln` calls, which showed that the time was spent in glfwSwapBuffers (which, as I looked up, is just a wrapper around glXSwapBuffers). `perf` showed me nothing: the time was attributed to some unresolved calls.
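A minimal sketch of that kind of manual timing, not the exact code from the program: `timeCall` is a hypothetical helper, and `Thread.sleep` only stands in for the real `glfwSwapBuffers(window)` call so the snippet runs without a GL context.

```d
import core.thread : Thread;
import core.time : Duration, msecs;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Hypothetical helper: runs `call` once and returns the wall-clock time it took.
Duration timeCall(scope void delegate() call)
{
    auto sw = StopWatch(AutoStart.yes); // starts counting immediately
    call();
    sw.stop();
    return sw.peek();
}

void main()
{
    // Stand-in for the real call; in the render loop this would be
    // something like: timeCall({ glfwSwapBuffers(window); })
    auto elapsed = timeCall({ Thread.sleep(5.msecs); });
    writefln("swap took %s us", elapsed.total!"usecs");
}
```

Wrapping each suspicious call in the frame loop this way gives a rough per-call breakdown, which is enough to see which call the extra time ends up in.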
I will run new tests with `perf` tomorrow.