Bug#599161: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50 minutes" bug.

2012-11-08 Thread Tim Deegan
At 09:39 + on 08 Nov (1352367592), Jan Beulich wrote:
> The plt_wrap < plt_now thing of course is entirely unexplainable
> to me too: Considering that plt_scale doesn't change at all post-
> boot, apart from memory corruption I could only see a memory
> access ordering problem to be the reason (platform_timer_stamp
> and/or stime_platform_stamp changing despite platform_timer_lock
> being held). So maybe taking a snapshot of all three static values
> involved in the calculation in __read_platform_stime() between
> acquiring the lock and the first call to __read_platform_stime(),
> and printing them together with the "live" values in a second
> printk() after the one your original patch added could rule that
> out.
>  
> But the box doesn't even seem to be NUMA (of course it also
> doesn't help that the log level was kept restricted - hint, hint,
> Philippe), nor does there appear to be any S3 cycle or pCPU
> bring-up/-down in between...

S3 looks like it might be a culprit, since resume_platform_timer()
clobbers plt_stamp64 without taking the platform_timer_lock.  But both
the S3 resume code and the plt_overflow timer should only ever run on
CPU 0, so even that should be safe (unless continue_hypercall_on_cpu()
is broken...)

Definitely having loglvl=all would have helped here, to eliminate S3
from our enquiries.

> > I wonder whether the overflow handling should just be removed, or made
> > conditional on a command-line parameter, or on the 32-bit platform counter
> > being at least somewhat likely to overflow before a softirq occurs -- it
> > seems lots of systems are using 14MHz HPET, and that gives us a couple of
> > minutes for the plt_overflow softirq to do its work before overflow occurs.
> > I think we would notice that outage in other ways. :)
> 
> Iirc we added this for a good reason - to cover the, however
> unlikely, event of Xen running for very long without preemption.
> Presumably most of the cases got fixed meanwhile, and indeed
> a wraparound time on the order of minutes should make this
> superfluous, but as the case here shows that code did spot a
> severe anomaly (whatever that may turn out to be).

ISTR when this code went in we were dealing with a timer that had a
period of about 4 seconds (ACPI PMTIMER?).  It might well be OTT for the
HPET, but if there's something weird going on I'd like to track it down
while we have some sort of a handle on it.

Tim.
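
For reference, the wraparound periods mentioned in this thread can be checked with a little arithmetic. This is only a sketch: the 14.31818 MHz HPET rate, the 3.579545 MHz ACPI PM timer rate and the 24-bit PM timer width are assumed typical values, not figures taken from the affected box, and wrap_seconds is a made-up helper name.

```shell
#!/bin/sh
# wrap_seconds BITS HZ: seconds until a free-running BITS-wide counter
# ticking at HZ wraps around.  Hypothetical helper, for illustration only.
wrap_seconds() {
    awk -v bits="$1" -v hz="$2" 'BEGIN { printf "%.1f\n", (2 ^ bits) / hz }'
}

# Assumed typical rates; check the real counter on the machine in question.
wrap_seconds 32 14318180   # 32-bit HPET at ~14.3 MHz: roughly 300 s
wrap_seconds 24 3579545    # 24-bit ACPI PM timer:     roughly 4.7 s
```

Under those assumptions a 32-bit HPET takes about five minutes to wrap, while a 24-bit PM timer wraps in under five seconds, which fits the "about 4 seconds" recollection above.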


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#437127: [Xen-devel] [PATCH] hotplug: fix ip_of for systems using peer-to-peer link

2012-07-10 Thread Tim Deegan
At 17:14 -0600 on 09 Jul (1341854093), Ian Campbell wrote:
> diff -r 54384951de02 -r 1d33f934dd67 tools/hotplug/Linux/vif-common.sh
> --- a/tools/hotplug/Linux/vif-common.sh   Tue Jul 10 00:07:20 2012 +0100
> +++ b/tools/hotplug/Linux/vif-common.sh   Tue Jul 10 00:14:54 2012 +0100
> @@ -175,7 +175,7 @@ handle_iptable()
>  #
>  ip_of()
>  {
> -  ip addr show "$1" | awk "/^.*inet.*$1\$/{print \$2}" | sed -n '1 s,/.*,,p'
> +  ip -4 -o addr show primary dev eth0 | awk '$3 == "inet" {split($4,i,"/"); print i[1]; exit}'

s/eth0/"$1"/?

Tim.
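
A sketch of ip_of with that substitution applied, so the interface comes from the function argument rather than a hard-coded eth0. Splitting the awk step into first_inet_addr is not part of the patch; it is a hypothetical helper introduced here so the parsing can be exercised on canned `ip -o` output, and the sample address is made up.

```shell
#!/bin/sh
# first_inet_addr: read `ip -4 -o addr show` output on stdin and print the
# first IPv4 address with its prefix length removed.
# (Hypothetical helper, not in the original patch.)
first_inet_addr() {
    awk '$3 == "inet" { split($4, i, "/"); print i[1]; exit }'
}

# ip_of IFACE: primary IPv4 address of IFACE, using "$1" instead of eth0.
ip_of() {
    ip -4 -o addr show primary dev "$1" | first_inet_addr
}

# Canned sample line (made-up address) in the one-line format of `ip -o`:
printf '%s\n' '2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0' |
    first_inet_addr
```

Quoting "$1" keeps the function safe for interface names containing unusual characters, and the `exit` in awk stops after the first matching address.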


