Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Peter Svensson
On Sat, 28 Apr 2012, Sergey Oboguev wrote:

> As with all busy-wait loops intended to implement short delays, the loop is 
> initially calibrated against a real time source, then the required number of 
> iterations is used to execute the actual delay.
> You can find more details in SIMH source code.

Hi,

I feel I may not have been clear enough. How does the simulated CPU know 
the wall time of the host? Is it able to read the host wall time via some 
emulated io space?

Relying on any operation in native C being implemented in any particular 
way is prone to breakage, eventually.

Since the time in the rom is expected to be short, why not just spin 
waiting for gettimeofday to change?

Peter
___
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh


Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Sergey Oboguev
> From: Michael Bloom 

> nanosleep() would give you the exact timing needed.  I'm very much against 
> polling the time as a solution because it could use up a lot of CPU cycles 
> that other processes would love to have.
>
> "spinning" when there are ways that are more accurate (not to mention 
> friendlier to other processes who'd like some CPU time) to simulate timing 
> is not the best choice, in my opinion.


Hi Michael,

First off, the whole issue was about code that gets executed only when the 
console ROM is active.
Once the bootstrap is started, it is irrelevant.

Second, on most platforms you are lucky to get a sleep interval with a 
granularity of around 1 ms.
Never mind 100 us, let alone 1 us.
Having a system call that takes its argument in nanosecond or femtosecond units 
does not mean that host OS timer events are actually processed at that resolution.

But even if they magically were, the cost of a no-op system call 
(a userland-kernel-userland round trip) on a 3+ GHz x64 CPU is about 1 us, 
which is approximately the delay introduced by the rom_read_delay 
loop. The overhead of creating a timer entry -- merely creating the entry, 
without any processing of it -- would exceed the delay required.
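
The granularity argument above can be checked on a given host with a small 
measurement. What follows is only a sketch under assumptions: it assumes a 
POSIX host with clock_gettime() and CLOCK_MONOTONIC, and the names 
measure_sleep_ns and timespec_diff_ns are made up for illustration. It is 
not SIMH code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Difference b - a in nanoseconds. Hypothetical helper, not from SIMH. */
static int64_t timespec_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (int64_t)(b.tv_nsec - a.tv_nsec);
}

/*
 * Ask nanosleep() for request_ns nanoseconds (must be below one second)
 * and return how long the call really took, as seen by CLOCK_MONOTONIC.
 * The result includes the system call round trip itself.
 */
int64_t measure_sleep_ns(long request_ns)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = request_ns };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&req, NULL);              /* may return early if interrupted */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return timespec_diff_ns(t0, t1);
}
```

Calling measure_sleep_ns(1000) a few times and comparing the results with 
the requested 1000 ns shows how far the host's real sleep granularity is 
from the nominal nanosecond argument.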


Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Michael Bloom
Oops. If the math looks wrong, you're right. The "per second" doesn't 
belong there.


On 04/28/2012 02:42 PM, Michael Bloom wrote:
P.S. One of the best known "offensive spinners" is Google Chrome, 
which after a short while can bring a system to its knees.  Some of 
its many processes are constantly spinning.  Run strace on one that's 
racked up a lot of time, and in ten seconds, you'll see: about 167,000 
identical getrlimit() calls per second ...



Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Michael Bloom
I think I would have to agree with Peter. Or at least, half agree.  
nanosleep() would give you the exact timing needed.  I'm very much 
against polling the time as a solution because it could use up a lot of 
CPU cycles that other processes would love to have.  System calls such 
as gettimeofday() typically do nothing besides copy the internally 
maintained time back to user space, and as such do not need to 
sleep and so never voluntarily give up the CPU.  Some systems will 
recognize when processes seem to exhibit "infinite loop" behavior and 
"auto-nice" the process, potentially affecting the "brittleness" that 
Peter refers to.


In any event, "spinning" when there are ways to simulate timing that are 
more accurate (not to mention friendlier to other processes who'd like 
some CPU time) is not the best choice, in my opinion.


-Michael

P.S. One of the best known "offensive spinners" is Google Chrome, which 
after a short while can bring a system to its knees.  Some of its 
many processes are constantly spinning.  Run strace on one that's racked 
up a lot of time, and in ten seconds, you'll see: about 167,000 
identical getrlimit() calls per second with always the same args, and 
always the same returned values; 55000 clock_gettime() calls, 22000 
gettimeofday() calls (can't they make up their mind?); 1379 zero-timeout 
poll calls which always check descriptors 14 and 15, and 49 of those 
calls also check descriptor 5 ... all but two of those poll calls 
immediately time out.  The two that don't time out are for (you guessed 
it) descriptor 5. Right after the poll calls have timed out (because 
there is no i/o available on descriptors 14 and 15), the code then 
issues 1379 read calls on each of those 2 descriptors.  Of course, 
because they didn't want those calls to block, fd's 14 and 15 were set 
non-blocking.  Thus they returned immediately with errno set to EAGAIN.  
After the only two polls that did not time out, chrome issued 2 reads on 
fd 5.  So how about the time spent? Looking at /proc//stat on 
Linux, I'm seeing up to a ten-to-one ratio of user to kernel time.  So, 
whatever the amount of cpu time it took to execute 250,000 system calls 
in ten seconds, it spent up to ten times that much cpu time executing 
in user space.



On 04/27/2012 08:15 PM, Peter Svensson wrote:

Hi,

What is the rom code comparing against and why do we not do the delay
compared to that?

If it is against the real time clock, would not nanosleep() or just
polling the time be more portable?

Playing games with the C memory model to achieve a certain performance
seems to me to always be brittle.

Peter

On Fri, 27 Apr 2012, Sergey Oboguev wrote:


Hi Mark,

The goal is to prevent a smart compiler from collapsing the loops in
rom_read_delay, especially the bottom loop, by optimizing them.
Declaring "loopval" as volatile does just that, by effectively disabling the
compiler's ability to optimize, and does it in a portable way.

Disabling inlining of rom_swapb, in fact, does not provide such a guarantee
in the long term.
It may shut off the compiler's optimizations today, but once the compiler (or
compilers) gets even smarter in the future, it can some day figure out that
the code "does not need" to call rom_swapb.
The compiler may leave the function un-inlined, but still figure out that it
does not need to be called and optimize the whole loop construct away.

Therefore volatile is both the portable and, long-term, the safer approach.
The caveat is that compilers do have bugs and can sometimes disregard a
volatile declaration.
See for example "Volatiles Are Miscompiled, and What to Do about It"
http://dl.acm.org/citation.cfm?id=1450093
Note that older versions of LLVM used to be a particularly bad offender,
miscompiling (in LLVM-GCC version 2.2) 19% of volatile references; however, it
has gotten better since then.

So when using volatile it is worth taking some extra steps that reduce the
probability of triggering a compiler bug, in particular avoiding declaring
the variable in question at local scope.
Or, perhaps even better, do what Eide & Regehr suggest in the mentioned
article: instead of accessing the variable directly, perform accesses via
per-type accessor routines. Or both.
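
The two precautions above (a non-local volatile variable, and accesses 
routed through per-type accessor routines) might look roughly like the 
following. This is a sketch only: rom_loopval, vol_read_u32, vol_write_u32 
and delay_loop are hypothetical names chosen for illustration, and this is 
not the actual rom_read_delay code from SIMH.

```c
#include <stdint.h>

/* File-scope volatile counter (hypothetical stand-in for "loopval"),
 * deliberately not declared as a local variable. */
static volatile uint32_t rom_loopval;

/* Per-type accessor routines: every volatile read or write of a
 * uint32_t goes through one of these two small functions. */
uint32_t vol_read_u32(const volatile uint32_t *p)
{
    return *p;
}

void vol_write_u32(volatile uint32_t *p, uint32_t v)
{
    *p = v;
}

/* Busy loop whose only work is volatile traffic, so a conforming
 * compiler has to perform every iteration. Returns the final count. */
uint32_t delay_loop(uint32_t iterations)
{
    vol_write_u32(&rom_loopval, 0);
    for (uint32_t i = 0; i < iterations; i++)
        vol_write_u32(&rom_loopval, vol_read_u32(&rom_loopval) + 1);
    return vol_read_u32(&rom_loopval);
}
```

Whether the accessors also need to be kept out of line (separate 
translation unit, or a compiler-specific noinline attribute) depends on 
the compiler in use; the sketch leaves that open.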

Thanks,
Sergey




Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Sergey Oboguev
As with all busy-wait loops intended to implement short delays, the loop is 
initially calibrated against a real time source, then the required number of 
iterations is used to execute the actual delay.
You can find more details in SIMH source code.
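
As a rough illustration of the calibrate-then-count scheme described 
above, here is a minimal sketch. It assumes a POSIX host with 
clock_gettime() and CLOCK_MONOTONIC; all names (spin_sink, 
spin_calibrate, spin_delay_usec and so on) are invented for this example, 
and the actual SIMH code differs in its details.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; this is not the SIMH implementation. */
static volatile uint32_t spin_sink;       /* volatile store target */
static uint64_t spin_iters_per_usec = 0;  /* set by spin_calibrate() */

static uint64_t spin_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* The busy loop itself: one volatile store per iteration. */
static void spin(uint64_t iterations)
{
    for (uint64_t i = 0; i < iterations; i++)
        spin_sink = (uint32_t)i;
}

/* Step 1: time one long run against the real time source, so that the
 * coarse resolution of the clock is small compared with the interval. */
void spin_calibrate(void)
{
    const uint64_t probe = 20000000ull;
    uint64_t t0 = spin_now_ns();
    spin(probe);
    uint64_t elapsed_ns = spin_now_ns() - t0;

    if (elapsed_ns == 0)
        elapsed_ns = 1;                   /* guard against division by 0 */
    spin_iters_per_usec = probe * 1000ull / elapsed_ns;
    if (spin_iters_per_usec == 0)
        spin_iters_per_usec = 1;
}

/* Step 2: short delays use the iteration count only, with no clock
 * reads and no system calls on the delay path. */
void spin_delay_usec(uint32_t usec)
{
    assert(spin_iters_per_usec > 0);      /* spin_calibrate() must run first */
    spin((uint64_t)usec * spin_iters_per_usec);
}
```

A caller would run spin_calibrate() once at startup and then call 
spin_delay_usec(1) wherever a delay of about one microsecond is wanted. 
The calibration is only as good as the host's load at the moment it runs.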


----- Original Message -----
From: Peter Svensson 
> What is the rom code able to check instruction cycle against? Real time clock 
> imported to the simulation from the host? 
>
> I.e., what is it that makes the wall time of the host matter to the simulation?


Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Peter Svensson
On Sat, 28 Apr 2012, Sergey Oboguev wrote:

> > From: Peter Svensson 
> > What is the rom code comparing against and why do we not do the delay 
> > compared to that? 
> >
> > If it is against the real time clock, would not nanosleep() or just polling 
> > the time be more portable?
> 
> The time flow quantum exposed to applications in a portable way is not in the 
> nanosecond range, but in the range of several milliseconds.
> The delay has to be performed to a small fraction of this quantum.

Hi,

What is the rom code able to check instruction cycle against? Real time 
clock imported to the simulation from the host? I.e., what is it that makes 
the wall time of the host matter to the simulation?

Peter





Re: [Simh] clang (was Re: XCode and LTO)

2012-04-28 Thread Sergey Oboguev
> From: Peter Svensson 
> What is the rom code comparing against and why do we not do the delay 
> compared to that? 
>
> If it is against the real time clock, would not nanosleep() or just polling 
> the time be more portable?

The time flow quantum exposed to applications in a portable way is not in the 
nanosecond range, but in the range of several milliseconds.
The delay has to be performed to a small fraction of this quantum.
