Re: [Simh] clang (was Re: XCode and LTO)
On Sat, 28 Apr 2012, Sergey Oboguev wrote:

> Like with all busy-wait loops intended to implement short delays, the loop
> is initially calibrated against a real time source, then the required
> number of iterations is used to execute the actual delay.
> You can find more details in the SIMH source code.

Hi,

I feel I may not have been clear enough. How does the simulated CPU know the
wall time of the host? Is it able to read the host wall time via some
emulated I/O space?

Relying on any operation in native C being implemented in any particular way
is prone to breakage, eventually. Since the time in the ROM is expected to be
short, why not just spin waiting for gettimeofday() to change?

Peter

___
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh
Re: [Simh] clang (was Re: XCode and LTO)
> From: Michael Bloom
> nanosleep() would give you the exact timing needed. I'm very much against
> polling the time as a solution because it could use up a lot of CPU cycles
> that other processes would love to have.
>
> "spinning" when there are ways that are more accurate (not to mention
> friendlier to other processes who'd like some CPU time) to simulate timing,
> is not the best choice, in my opinion.

Hi Michael,

First off, the whole issue concerns code that gets executed only while the
console ROM is active. Once the bootstrap is started, it is irrelevant.

Second, on most platforms you are lucky to get a sleep interval with
granularity around 1 ms. Never mind 100 us, let alone 1 us. Having a system
call that takes its argument in nanosecond or femtosecond units does not mean
the host OS actually processes timer events at that resolution.

But even if it magically did, the cost of a no-op system call (a
userland-kernel-userland round trip) on a 3+ GHz x64 CPU is about 1 us, which
is approximately the amount of delay introduced by the rom_read_delay loop.
The overhead of creating a timer entry -- merely creating the entry, without
any processing of it -- would exceed the delay required.
Re: [Simh] clang (was Re: XCode and LTO)
Oops. If the math looks wrong, you're right. The "per second" doesn't belong
there.

On 04/28/2012 02:42 PM, Michael Bloom wrote:

> P.S. One of the best known "offensive spinners" is Google Chrome, which
> after a short while can bring a system to its knees. Some of its many
> processes are constantly spinning. Run strace on one that's racked up a lot
> of time, and in ten seconds you'll see: about 167,000 identical getrlimit()
> calls per second ...
Re: [Simh] clang (was Re: XCode and LTO)
I think I would have to agree with Peter. Or at least, half agree.

nanosleep() would give you the exact timing needed. I'm very much against
polling the time as a solution because it could use up a lot of CPU cycles
that other processes would love to have. System calls such as gettimeofday()
typically do nothing besides copy the internally maintained time back to user
space and, as such, do not need to sleep, so they never voluntarily give up
the CPU. Some systems will recognize when processes seem to exhibit "infinite
loop" behavior and "auto-nice" the process, potentially contributing to the
"brittleness" that Peter refers to.

In any event, "spinning" when there are ways that are more accurate (not to
mention friendlier to other processes who'd like some CPU time) to simulate
timing is not the best choice, in my opinion.

-Michael

P.S. One of the best known "offensive spinners" is Google Chrome, which after
a short while can bring a system to its knees. Some of its many processes are
constantly spinning. Run strace on one that's racked up a lot of time, and in
ten seconds you'll see: about 167,000 identical getrlimit() calls per second
with always the same args and always the same returned values; 55,000
clock_gettime() calls; 22,000 gettimeofday() calls (can't they make up their
mind?); and 1,379 zero-timeout poll calls which always check descriptors 14
and 15, 49 of which also check descriptor 5 ... all but two of those poll
calls immediately time out. The two that don't time out are for (you guessed
it) descriptor 5. Right after the poll calls have timed out (because there is
no I/O available on descriptors 14 and 15), the code then issues 1,379 read
calls on each of those two descriptors. Of course, because they didn't want
those calls to block, fds 14 and 15 were set non-blocking, so the reads
returned immediately with errno set to EAGAIN. After the only two polls that
did not time out, chrome issued 2 reads on fd 5.

So how about the time spent?
Looking at /proc/<pid>/stat on Linux, I'm seeing up to a ten-to-one ratio of
user to kernel time. So, whatever the amount of CPU time it took to execute
250,000 system calls in ten seconds, it spent up to ten times that much CPU
time executing in user space.

On 04/27/2012 08:15 PM, Peter Svensson wrote:

> Hi,
>
> What is the rom code comparing against, and why do we not do the delay
> compared to that? If it is against the real time clock, would not
> nanosleep() or just polling the time be more portable? Playing games with
> the C memory model to achieve a certain performance seems to me to always
> be brittle.
>
> Peter
>
> On Fri, 27 Apr 2012, Sergey Oboguev wrote:
>
> > Hi Mark,
> >
> > The goal is to prevent a smart compiler from collapsing the loops in
> > rom_read_delay, especially the bottom loop, by optimizing them. Declaring
> > "loopval" as volatile does just that, by effectively disabling the
> > compiler's ability to optimize, and does it in a portable way.
> >
> > Disabling inlining of rom_swapb, in fact, does not provide such a
> > guarantee long term. It may shut off the compiler's optimizations today,
> > but once the compiler (or compilers) gets even smarter in the future, it
> > may some day figure out that the code "does not need" to call rom_swapb.
> > The compiler may leave the function un-inlined, but simply conclude that
> > it does not need to be called and optimize the whole loop construct away.
> > Therefore volatile is both a portable and -- long-term -- safer approach.
> >
> > The caveat is that compilers do have bugs and can sometimes disregard a
> > volatile declaration. See for example "Volatiles Are Miscompiled, and
> > What to Do about It", http://dl.acm.org/citation.cfm?id=1450093
> >
> > Note that older versions of LLVM used to be a particularly bad offender,
> > miscompiling (in llvm-gcc version 2.2) 19% of volatile references;
> > however, it has gotten better since then. So when using volatile it is
> > worth taking some extra steps to reduce the probability of triggering a
> > compiler bug, particularly avoiding declaring the variable in question
> > at local scope.
> >
> > Or, perhaps even better, do what Eide & Regehr suggest in the mentioned
> > article: instead of accessing the variable directly, perform accesses
> > via per-type accessor routines. Or both.
> >
> > Thanks,
> > Sergey
Re: [Simh] clang (was Re: XCode and LTO)
Like with all busy-wait loops intended to implement short delays, the loop is
initially calibrated against a real time source, then the required number of
iterations is used to execute the actual delay. You can find more details in
the SIMH source code.

- Original Message
From: Peter Svensson

> What is the rom code able to check instruction cycle against? Real time
> clock imported to the simulation from the host?
>
> I.e. what is it that makes the wall time of the host matter to the
> simulation?
Re: [Simh] clang (was Re: XCode and LTO)
On Sat, 28 Apr 2012, Sergey Oboguev wrote:

> > From: Peter Svensson
> > What is the rom code comparing against and why do we not do the delay
> > compared to that?
> >
> > If it is against the real time clock, would not nanosleep() or just
> > polling the time be more portable?
>
> The time flow quantum exposed to applications in a portable way is not in
> the nanosecond range, but in the range of several milliseconds. The delay
> has to be performed to a small fraction of this quantum.

Hi,

What is the rom code able to check instruction cycle against? A real time
clock imported to the simulation from the host?

I.e. what is it that makes the wall time of the host matter to the
simulation?

Peter
Re: [Simh] clang (was Re: XCode and LTO)
> From: Peter Svensson
> What is the rom code comparing against and why do we not do the delay
> compared to that?
>
> If it is against the real time clock, would not nanosleep() or just polling
> the time be more portable?

The time flow quantum exposed to applications in a portable way is not in the
nanosecond range, but in the range of several milliseconds. The delay has to
be performed to a small fraction of this quantum.