Bug#1070159: i915: CPU usage spikes with monitor powered down or unplugged

2024-08-10 Thread Gedalya
forwarded 1070159 https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11919
thanks



Bug#1070159: i915: CPU usage spikes with monitor powered down or unplugged

2024-08-10 Thread Gedalya
reopen 1070159
found 1070159 6.10.3-1
thanks

Linux 6.10.3-1 shows the same behavior, no change



Bug#1071582: ip: Poor color choice for dark background

2024-05-22 Thread Gedalya
I'm trying ...

https://lore.kernel.org/netdev/2c0a1779713b5bdd443a8e8258c7d...@manjaro.org/

https://lore.kernel.org/netdev/e1s9rpa-0006jy7-1...@ws2.gedalya.net/



Bug#1071582: ip: Poor color choice for dark background

2024-05-21 Thread Gedalya
On 5/21/24 10:55 PM, Luca Boccassi wrote:
> This has always been enabled by default, even in stable.

What is the meaning of this line in the changelog for 6.9.0-1, and why does it 
correlate with an actual change in behavior?

Quote:
  * Enable output with colors on terminals



Bug#1071582: ip: Poor color choice for dark background

2024-05-21 Thread Gedalya
Package: iproute2
Version: 6.9.0-1
Severity: minor

Hello,

The newly enabled colored output is rather hard to read on dark backgrounds, 
especially the deep blue color used for IPv6 addresses.

Setting COLORFGBG=";0" as the manpage suggests helps a lot.
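
As a rough illustration of why ";0" helps: the sketch below mirrors how ip is believed to interpret COLORFGBG. The rule used here (take the field after the last ';' and treat a single digit 0-6 or 8 as a dark background) is an assumption based on a reading of the iproute2 source, and the function name is hypothetical.

```python
def is_dark_background(colorfgbg: str) -> bool:
    """Return True if a COLORFGBG value suggests a dark background.

    Assumed rule (not confirmed in this thread): only the field after
    the last ';' matters, and a single digit 0-6 or 8 means "dark".
    """
    _, sep, last = colorfgbg.rpartition(";")
    if not sep:
        return False  # no ';' present, so there is no background field
    return len(last) == 1 and last in "01234568"
```

Under that assumption, both ";0" and a terminal-provided value such as "15;0" select the palette meant for dark backgrounds, while "0;15" does not.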

Consider a scenario where this command is used under stress to manually bring up 
networking from a VT. It's a rather basic command that typically runs on a 
dark background, and it's important for its output to be readable by default 
without fiddling around too much.

Thanks!



Bug#1070159: i915: CPU usage spikes with monitor powered down or unplugged

2024-05-20 Thread Gedalya
On 5/21/24 2:41 AM, Salvatore Bonaccorso wrote:
> Can you please test if you have the same behaviour with recent
> upstream kernels? For instance test with 6.8.9-1 in unstable, or if
> you can build upstream stable version 6.8.10, 6.9.1.

6.8.9 - same behavior

6.9.1 from pristine upstream tar shows the same behavior too.

Among the new config options in 6.9.1, CONFIG_DRM_DISPLAY_DP_TUNNEL and 
CONFIG_DRM_I915_DP_TUNNEL got enabled. I suppose this isn't relevant.

Thank you for your attention!



Bug#992832: linux-image-5.10.0-8-amd64: please enable CONFIG_AMD_PMC

2021-08-24 Thread Gedalya
Package: linux-image-5.10.0-8-amd64
Version: 5.10.46-4

Hi,

amd-pmc is needed on recent AMD Ryzen laptops in order to properly enter s2idle.

Another module apparently relevant on recent Ryzen laptops is 
CONFIG_AMD_SFH_HID, although this PCI device is not present on my laptop.

Thanks,

Gedalya



Bug#984852: firmware-amd-graphics: Please add cezanne ("green sardine")

2021-03-09 Thread Gedalya
On 3/9/21 5:31 PM, maximilian attems wrote:
>
>> the display stopped updating as soon as amdgpu took over, and it never came 
>> back.
>> The firmware is needed of course, and as a side note, it could be nice if 
>> the kernel could fail more gracefully, somehow bringing the display back (it 
>> works fine with amdgpu blacklisted, just no acceleration) such that it takes 
>> me less than two days to figure it all out.
> this is a linux bug, you might want to submit it upstream to AMD guys.

Yea, I actually wasn't sure if this would be regarded as such, thanks for the 
encouragement. I'll look that up.

>> The relevant files are amdgpu/green_sardine_*
> right, they only got pushed upstream in linux-firmware git on 11/2/2021
> after the latest 20210208 release, hence unfortunately they miss out the
> next debian release
> https://release.debian.org/bullseye/freeze_policy.html
>
> They will land in the next upstream release upload of 202103XX to
> experimental, backports and then after bullseye release to unstable
> (plus next testing).
>  
I sure hope it would also end up in some point release? Or maybe a freeze 
exception could be made?

It would be unfortunate if a new computer won't be usable (without backports) 
on stable Debian which is to be finally released presumably several months 
after the computer hit the market, for want of a rather minor fix.




Bug#984852: firmware-amd-graphics: Please add cezanne ("green sardine")

2021-03-09 Thread Gedalya
Package: firmware-amd-graphics
Version: 20210208-3

Hi,

I've received my new laptop with a Ryzen R7 5800U and the display stopped 
updating as soon as amdgpu took over, and it never came back.
The firmware is needed of course, and as a side note, it could be nice if the 
kernel could fail more gracefully, somehow bringing the display back (it works 
fine with amdgpu blacklisted, just no acceleration) such that it takes me less 
than two days to figure it all out.

The relevant files are amdgpu/green_sardine_*

Thanks!



Bug#947413: linux-image-4.19.0-7-amd64: kernel lockup

2019-12-26 Thread Gedalya
Package: src:linux
Version: 4.19.87-1

Dear Maintainer,


   * What led up to the situation?

Not sure exactly. Normal operations.

I'll include some facts that I suspect might be relevant:

Most volumes are mounted with the "discard" option. The I/O operations in 
question seem to have involved /var, which is ext4, mounted with discard.

This looks to me like it *could* be related:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/block?h=v4.19.91&id=42d72c9d28964fbbaeaa15baf2e7ce418d1c1a0a

When this issue happened I was using mq-deadline; I tried switching to bfq in 
the process of investigating the possible causes.
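
For anyone comparing setups, a minimal sketch of how the two facts above (active I/O scheduler, "discard" mount option) could be read programmatically. It assumes the usual sysfs convention that /sys/block/<dev>/queue/scheduler lists all schedulers with the active one in square brackets; the sample strings and function names are hypothetical, not taken from the affected machine.

```python
import re
from typing import Optional


def active_scheduler(sysfs_line: str) -> Optional[str]:
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_line)
    return match.group(1) if match else None


def has_discard(mount_options: str) -> bool:
    """Check a comma-separated mount option string for the exact option 'discard'."""
    return "discard" in mount_options.split(",")
```

For example, active_scheduler("[mq-deadline] kyber bfq none") yields "mq-deadline", and has_discard would be fed the fourth field of a /proc/mounts line.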

I do not yet have a way to reproduce this behavior.

   * What was the outcome of this action?

Dec 25 07:15:57 be1va kernel: [698773.335359] BUG: unable to handle kernel NULL 
pointer dereference at 0028
Dec 25 07:15:57 be1va kernel: [698773.338614] PGD 0 P4D 0
Dec 25 07:15:57 be1va kernel: [698773.339537] Oops:  [#1] SMP NOPTI
Dec 25 07:15:57 be1va kernel: [698773.340914] CPU: 29 PID: 784 Comm: 
kworker/29:1H Not tainted 4.19.0-7-amd64 #1 Debian 4.19.87-1
Dec 25 07:15:57 be1va kernel: [698773.344187] Hardware name: QEMU Standard PC 
(i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Dec 25 07:15:57 be1va kernel: [698773.347903] Workqueue: kblockd 
blk_mq_run_work_fn
Dec 25 07:15:57 be1va kernel: [698773.349978] RIP: 0010:rb_next+0x11/0x50
Dec 25 07:15:57 be1va kernel: [698773.351545] Code: 89 ec 48 89 c5 e9 80 fe ff 
ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 8b 0f 48 39 cf 74 39 48 8b 
47 08 48 85 c0 74 22 <48> 8b 50 10 48 85 d2 74 0c 48 89 d0 48 8b 50 10 48 85 d2 
75 f4 c3
Dec 25 07:15:57 be1va kernel: [698773.357983] RSP: 0018:a075c73bfdb0 
EFLAGS: 00010206
Dec 25 07:15:57 be1va kernel: [698773.359782] RAX: 0018 RBX: 
901819663e00 RCX: db8d639239c0
Dec 25 07:15:57 be1va kernel: [698773.362331] RDX: 901819663e68 RSI: 
0001 RDI: 90181519ea60
Dec 25 07:15:57 be1va kernel: [698773.364686] RBP: 90181519ea00 R08: 
 R09: 00480020
Dec 25 07:15:57 be1va kernel: [698773.367346] R10: 0015d519e000 R11: 
 R12: 901819663e5c
Dec 25 07:15:57 be1va kernel: [698773.369754] R13: 0001 R14: 
901819663e10 R15: 90181519d6c0
Dec 25 07:15:57 be1va kernel: [698773.372166] FS:  () 
GS:901827b4() knlGS:
Dec 25 07:15:57 be1va kernel: [698773.375158] CS:  0010 DS:  ES:  CR0: 
80050033
Dec 25 07:15:57 be1va kernel: [698773.377238] CR2: 0028 CR3: 
0002df80a002 CR4: 00360ee0
Dec 25 07:15:57 be1va kernel: [698773.379495] DR0:  DR1: 
 DR2: 
Dec 25 07:15:57 be1va kernel: [698773.382132] DR3:  DR6: 
fffe0ff0 DR7: 0400
Dec 25 07:15:57 be1va kernel: [698773.384604] Call Trace:
Dec 25 07:15:57 be1va kernel: [698773.385590]  dd_dispatch_request+0x15f/0x210
Dec 25 07:15:57 be1va kernel: [698773.387083]  
blk_mq_do_dispatch_sched+0xc7/0x120
Dec 25 07:15:57 be1va kernel: [698773.388752]  
blk_mq_sched_dispatch_requests+0x11e/0x170
Dec 25 07:15:57 be1va kernel: [698773.390857]  __blk_mq_run_hw_queue+0x4e/0xe0
Dec 25 07:15:57 be1va kernel: [698773.392508]  process_one_work+0x1a7/0x3a0
Dec 25 07:15:57 be1va kernel: [698773.393888]  worker_thread+0x30/0x390
Dec 25 07:15:57 be1va kernel: [698773.395354]  ? create_worker+0x1a0/0x1a0
Dec 25 07:15:57 be1va kernel: [698773.396927]  kthread+0x112/0x130
Dec 25 07:15:57 be1va kernel: [698773.398221]  ? kthread_bind+0x30/0x30
Dec 25 07:15:57 be1va kernel: [698773.399943]  ret_from_fork+0x1f/0x40
Dec 25 07:15:57 be1va kernel: [698773.401562] Modules linked in: 
rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss nfs_acl nfs lockd grace 
fscache sunrpc sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
pcc_cpufreq virtio_console virtio_balloon sg joydev serio_raw evdev pcspkr 
button qemu_fw_cfg ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb sd_mod 
ata_generic virtio_net net_failover virtio_scsi failover crc32c_intel ata_piix 
uhci_hcd libata ehci_hcd aesni_intel aes_x86_64 virtio_pci crypto_simd scsi_mod 
usbcore virtio_ring cryptd psmouse glue_helper virtio i2c_piix4 usb_common 
floppy
Dec 25 07:15:57 be1va kernel: [698773.420978] CR2: 0028
Dec 25 07:15:57 be1va kernel: [698773.422326] ---[ end trace a1e4f02f504cc7b4 
]---
Dec 25 07:15:57 be1va kernel: [698773.424075] RIP: 0010:rb_next+0x11/0x50
Dec 25 07:15:57 be1va kernel: [698773.425534] Code: 89 ec 48 89 c5 e9 80 fe ff 
ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 8b 0f 48 39 cf 74 39 48 8b 
47 08 48 85 c0 74 22 <48> 8b 50 10 48 85 d2 74 0c 48 89 d0 48 8b 50 10 48 85 d2 
75 f4 c3
Dec 25 07:15:57 be1va kernel: [698773.433054] RSP: 0018:a075c73bfdb0 
EFLAGS: 00010206
Dec 25 07:15:57 be1va kernel: [698773.434964] RAX: 0018 RBX: 
901819663e00 RCX: db8d639239c0
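
The register dump appears internally consistent, which can be checked with a little arithmetic. The sketch below takes the bytes marked <48> in the Code: line, decodes them under the assumption that they form a 64-bit load of the shape "mov disp8(%rax),%rdx", and adds the displacement to the RAX value as printed (0x18). The helper names are hypothetical and the decoder handles only this one instruction form.

```python
from typing import List

# Excerpt of the "Code:" line from the oops; <48> marks the faulting instruction.
CODE = "47 08 48 85 c0 74 22 <48> 8b 50 10 48 85 d2"


def faulting_bytes(code: str, count: int = 4) -> List[int]:
    """Return `count` bytes starting at the byte marked <xx>."""
    tokens = code.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def load_address(insn: List[int], base: int) -> int:
    """Effective address for REX.W 8B /r with [reg + disp8] addressing only."""
    rex, opcode, modrm, disp8 = insn
    assert rex == 0x48 and opcode == 0x8B, "only MOV r64, r/m64 is handled"
    assert modrm >> 6 == 0b01 and modrm & 7 != 4, "only [reg + disp8], no SIB"
    signed = disp8 - 256 if disp8 >= 0x80 else disp8
    return base + signed
```

With RAX = 0x18 this gives 0x28, matching the CR2 value in the log, which would fit rb_next() following a pointer read from a node that was already invalid.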

Bug#930443: ixgbe: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0

2019-10-15 Thread Gedalya
See:

https://bugs.launchpad.net/ubuntu/+source/strongswan/+bug/1846283

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?h=v5.4-rc1&id=f39b683d35dfa93a58f1b400a8ec0ff81296b37c

(linked therein)

And:

https://bugzilla.kernel.org/show_bug.cgi?id=204551



Bug#930443: ixgbe: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0

2019-06-12 Thread Gedalya
Package: src:linux
Version: 4.19.28-2

Hi,

We have an IPsec setup with the office, using strongswan on the server side. 
The office is using 192.168.0.0/22. On the server side, we're using 
xx.xx.xx.64/28 and 10.xx.xx.0/24.

Traffic is working fine within the same physical box on the server side. This 
includes virtual machines on the bridge brpriv1. The ethernet interface 
enp5s0f1 is a member of the bridge brpriv1.

When (decrypted) traffic is meant to reach hosts on the local network, leaving 
the box on enp5s0f1, we get the following kernel message for each packet:
ixgbe :05:00.1 enp5s0f1: ixgbe_ipsec_tx: bad sa_idx=64512 handle=0
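
One possible reading of the numbers in that message, offered as an assumption rather than a diagnosis: if the driver derives the TX SA index by subtracting a base of 1024 from the offload handle and stores the result in a 16-bit unsigned variable, then handle=0 wraps around to exactly 64512. The constant names and values below are recalled from the ixgbe IPsec code and have not been checked against 4.19.

```python
# Assumed constants (recalled, not verified against the 4.19 source).
IXGBE_IPSEC_MAX_SA_COUNT = 1024
IXGBE_IPSEC_BASE_TX_INDEX = IXGBE_IPSEC_MAX_SA_COUNT


def tx_sa_idx(offload_handle: int) -> int:
    """Model a 16-bit unsigned subtraction of the TX base index."""
    return (offload_handle - IXGBE_IPSEC_BASE_TX_INDEX) & 0xFFFF
```

If that model is right, tx_sa_idx(0) reproduces the logged sa_idx, meaning the packet carried xfrm state with no hardware offload handle at all, which is consistent with offload not being enabled.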

A few things I've tried:

I've looked into ipsec offloading. It wasn't enabled. Tried enabling, but can't 
because we're using 256-bit keys. Kernel message:
IPsec hw offload only supports keys up to 128 bits with a 32 bit salt

Tried toggling esp-hw-offload, tx-esp-segmentation, esp-tx-csum-hw-offload with 
ethtool, for both public and private ethernet interfaces (enp5s0f0, enp5s0f1).

The kernel module esp4_offload wasn't loaded originally.

Wireguard is in use, in case that matters.

Unfortunately I don't fully understand what's going on here, but I'd be happy 
to investigate further given some guidance.

Also, if someone could provide some tips for a workaround that would be nice, 
since this is currently simply not working.

Thanks,

Gedalya

ipsec statusall:

Connections:
office-1:  %any...office  IKEv2, dpddelay=30s
office-1:   local:  [...] uses pre-shared key authentication
office-1:   remote: [office] uses pre-shared key authentication
office-1:   child:  xx.xx.xx.64/28 === 192.168.0.0/22 TUNNEL, dpdaction=restart
office-2:   child:  10.xx.xx.0/24 === 192.168.0.0/22 TUNNEL, dpdaction=restart
Security Associations (1 up, 0 connecting):
office-1[1]: ESTABLISHED 26 minutes ago, xx.xx.xx.85[...]...xx.xx.xx.186[office]
office-1[1]: IKEv2 SPIs: b3bbc19e78e7fa0c_i* 972193602d0d65fe_r, pre-shared key 
reauthentication in 23 minutes
office-1[1]: IKE proposal: 
AES_CBC_256/HMAC_SHA2_384_192/PRF_HMAC_SHA2_384/ECP_384
office-1{1}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c8d56650_i d4705673_o
office-1{1}:  AES_GCM_16_256, 301543 bytes_i (5155 pkts, 238s ago), 13153087 
bytes_o (8694 pkts, 238s ago), rekeying in 18 minutes
office-1{1}:   xx.xx.xx.64/28 === 192.168.0.0/22
office-2{2}:  INSTALLED, TUNNEL, reqid 2, ESP SPIs: c7380afa_i 9b883d35_o
office-2{2}:  AES_GCM_16_256/ECP_384, 6376 bytes_i (79 pkts, 9s ago), 8752 
bytes_o (106 pkts, 9s ago), rekeying in 16 minutes
office-2{2}:   10.xx.xx.0/24 === 192.168.0.0/22

ip xfrm policy:

src 10.xx.xx.0/24 dst 192.168.0.0/22
    dir out priority 376447 ptype main
    tmpl src xx.xx.xx.85 dst xx.xx.xx.186
        proto esp spi 0x9b883d35 reqid 2 mode tunnel
src 192.168.0.0/22 dst 10.xx.xx.0/24
    dir fwd priority 376447 ptype main
    tmpl src xx.xx.xx.186 dst xx.xx.xx.85
        proto esp reqid 2 mode tunnel
src 192.168.0.0/22 dst 10.xx.xx.0/24
    dir in priority 376447 ptype main
    tmpl src xx.xx.xx.186 dst xx.xx.xx.85
        proto esp reqid 2 mode tunnel
src xx.xx.xx.64/28 dst 192.168.0.0/22
    dir out priority 374399 ptype main
    tmpl src xx.xx.xx.85 dst xx.xx.xx.186
        proto esp spi 0xd4705673 reqid 1 mode tunnel
src 192.168.0.0/22 dst xx.xx.xx.64/28
    dir fwd priority 374399 ptype main
    tmpl src xx.xx.xx.186 dst xx.xx.xx.85
        proto esp reqid 1 mode tunnel
src 192.168.0.0/22 dst xx.xx.xx.64/28
    dir in priority 374399 ptype main
    tmpl src xx.xx.xx.186 dst xx.xx.xx.85
        proto esp reqid 1 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
    socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
    socket out priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
    socket in priority 0 ptype main
src 0.0.0.0/0 dst 0.0.0.0/0
    socket out priority 0 ptype main
src ::/0 dst ::/0
    socket in priority 0 ptype main
src ::/0 dst ::/0
    socket out priority 0 ptype main
src ::/0 dst ::/0
    socket in priority 0 ptype main
src ::/0 dst ::/0
    socket out priority 0 ptype main


ip xfrm state

src xx.xx.xx.85 dst xx.xx.xx.186
    proto esp spi 0x9b883d35 reqid 2 mode tunnel
    replay-window 0 flag af-unspec
    aead rfc4106(gcm(aes)) 0x... 128
    anti-replay context: seq 0x0, oseq 0x9b, bitmap 0x
src xx.xx.xx.186 dst xx.xx.xx.85
    proto esp spi 0xc7380afa reqid 2 mode tunnel
    replay-window 32 flag af-unspec
    aead rfc4106(gcm(aes)) 0x... 128
    anti-replay context: seq 0x72, oseq 0x0, bitmap 0x
src xx.xx.xx.85 dst xx.xx.xx.186
    proto esp spi 0xd4705673 reqid 1 mode tunnel
    replay-window 0 flag af-unspec
    aead rfc4106(gcm(aes)) 0x... 128
    anti-replay context: seq 0x0, oseq 0x1fdc, bitmap 0x
src xx.xx.xx.186 dst xx.xx.xx.85
    proto esp spi 0xc8d56650 reqid 1 mode tunnel
    replay-window 32 flag af-unspec
    aead rfc4106(gcm(aes)) 0x... 128
    anti-replay context: seq 0x1679, oseq 0x0, bitmap 0x


-

Bug#839627: linux-image-4.7.0-1-amd64: kvm-clock provides unadjusted time

2016-10-08 Thread Gedalya
I have two KVM servers where I'm doing GPU passthrough to a VM. Both are 
running linux 4.6.4.
We're using an NVIDIA GPU which means I have to hide KVM (kvm=off, or 
).
As a result, current_clocksource is tsc. ntpd shows a drift of ~0.
So it seems that the tsc values the VM is getting are adjusted.
On the same server, non-GPU VMs do get an unadjusted kvm-clock.



Bug#839627: linux-image-4.7.0-1-amd64: kvm-clock provides unadjusted time

2016-10-06 Thread Gedalya

On 10/06/2016 08:55 PM, Ben Hutchings wrote:



> Presumably you consider the behaviour on 4.6/4.7 to be undesirable.
>
> From what I can see, the host side may attempt to keep the pvclock
> synchronised with its real time but it is not recommended for guests to
> rely on this.


It looked like an interesting change in behavior. From what I've read on this 
topic, I know enough to always run ntpd in virtual machines, including KVM.


check that the host is sending clock updates to the guest.


# cat /sys/module/kvm/parameters/kvmclock_periodic_sync
Y

# cat trace_pipe
kworker/0:47-13195 [000]  387173.146166: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387173.146174: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387175.194257: kvmclock_update_fn <-process_one_work
   kworker/2:126-13231 [002]  387175.194260: kvmclock_update_fn <-process_one_work
kworker/1:45-13244 [001]  387175.194263: kvmclock_update_fn <-process_one_work
   kworker/3:151-13105 [003]  387175.194265: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387474.198723: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387474.198735: kvmclock_update_fn <-process_one_work
kworker/1:45-13244 [001]  387476.246874: kvmclock_update_fn <-process_one_work
   kworker/3:151-13105 [003]  387476.246879: kvmclock_update_fn <-process_one_work
   kworker/2:126-13231 [002]  387476.246883: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387476.246884: kvmclock_update_fn <-process_one_work


Yet, the guests still seem to be getting the unadjusted time.
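
To make the cadence in that trace easier to see, here is a small sketch that groups the kvmclock_update_fn timestamps into bursts and measures the gap between bursts. Only four of the quoted lines are embedded; the 10-second grouping window and the function names are arbitrary choices for illustration.

```python
import re
from typing import List, Optional

# Four of the trace_pipe lines quoted above, one event per line.
TRACE = """\
kworker/0:47-13195 [000]  387173.146166: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387175.194257: kvmclock_update_fn <-process_one_work
kworker/0:47-13195 [000]  387474.198723: kvmclock_update_fn <-process_one_work
kworker/1:45-13244 [001]  387476.246874: kvmclock_update_fn <-process_one_work
"""


def update_times(trace: str) -> List[float]:
    """Extract the timestamp of every kvmclock_update_fn event."""
    return [float(ts) for ts in re.findall(r"(\d+\.\d+):\s+kvmclock_update_fn", trace)]


def burst_starts(times: List[float], window: float = 10.0) -> List[float]:
    """Return the first timestamp of each run of events closer than `window` seconds."""
    starts: List[float] = []
    previous: Optional[float] = None
    for t in sorted(times):
        if previous is None or t - previous > window:
            starts.append(t)
        previous = t
    return starts


def burst_gaps(trace: str) -> List[float]:
    """Seconds between the starts of consecutive bursts."""
    starts = burst_starts(update_times(trace))
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

On these lines the bursts start about 301 seconds apart, which would fit a periodic sync of roughly five minutes; the exact period used by KVM has not been checked here.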


Thanks,

Gedalya



Bug#839627: linux-image-4.7.0-1-amd64: kvm-clock provides unadjusted time

2016-10-03 Thread Gedalya
Package: src:linux
Version: 4.7.5-1
Severity: normal

Dear Maintainer,

When booting the host with linux 3.16, it looks like kvm-clock provides guests 
with time as adjusted by ntpd.
This looks like this (note the 'frequency' variable [0]):

host :

# ntpq -crv
associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p8@1.3265-o Tue Jun  7 20:34:16 UTC 2016 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=00, stratum=1,
precision=-23, rootdelay=0.000, rootdisp=1.195, refid=GPS,
reftime=db9b8865.6537d5fd  Sun, Oct  2 2016  9:21:41.395,
clock=db9b8873.2c57b9e4  Sun, Oct  2 2016  9:21:55.173, peer=13173, tc=4,
mintc=3, offset=-0.003014, frequency=-8.983, sys_jitter=0.007532,
clk_jitter=0.010, clk_wander=0.002

guest :

# ntpq -crv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Jul 22 17:30:51 UTC 2016 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=00, stratum=2,
precision=-22, rootdelay=0.415, rootdisp=18.143, refid=192.168.9.10,
reftime=db9b8686.bccb12d6  Sun, Oct  2 2016  9:13:42.737,
clock=db9b88a1.6495514c  Sun, Oct  2 2016  9:22:41.392, peer=11095,
tc=10, mintc=3, offset=0.280, frequency=-0.094, sys_jitter=0.386,
clk_jitter=1.265, clk_wander=0.251

Note the drift measured on the host as ~-9, and on the guest ~0.

When booting with linux 4.6/4.7,

host :

# ntpq -crv
associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
version="ntpd 4.2.8p8@1.3265-o Tue Jun  7 20:34:16 UTC 2016 (1)",
processor="x86_64", system="Linux/4.7.0-1-amd64", leap=00, stratum=1,
precision=-23, rootdelay=0.000, rootdisp=1.210, refid=GPS,
reftime=db9c93cd.12585193  Mon, Oct  3 2016  4:22:37.071,
clock=db9c93db.66370833  Mon, Oct  3 2016  4:22:51.399, peer=34903, tc=4,
mintc=3, offset=0.003722, frequency=-9.350, sys_jitter=0.010023,
clk_jitter=0.013, clk_wander=0.003

guest :

# ntpq -crv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Jul 22 17:30:51 UTC 2016 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=00, stratum=2,
precision=-23, rootdelay=0.310, rootdisp=21.999, refid=192.168.9.30,
reftime=db9c92a7.85b541f5  Mon, Oct  3 2016  4:17:43.522,
clock=db9c93fd.0f4d900a  Mon, Oct  3 2016  4:23:25.059, peer=30713,
tc=10, mintc=3, offset=0.023, frequency=-9.399, sys_jitter=0.285,
clk_jitter=0.997, clk_wander=0.192

ntpd measures on the guest the same drift as on the host.
This gives me the impression that on the later kernels, kvm-clock provides a 
raw, unadjusted time.
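
The comparison above can be repeated on other hosts with a short script. This is a minimal sketch that parses the name=value list printed by "ntpq -crv" and pulls out the frequency (drift, in PPM); the sample strings are the relevant lines from the outputs quoted above, and the function names are hypothetical.

```python
import re
from typing import Dict

# Lines containing "frequency" from the four ntpq outputs quoted above.
HOST_316 = "mintc=3, offset=-0.003014, frequency=-8.983, sys_jitter=0.007532,"
GUEST_316 = "tc=10, mintc=3, offset=0.280, frequency=-0.094, sys_jitter=0.386,"
HOST_47 = "mintc=3, offset=0.003722, frequency=-9.350, sys_jitter=0.010023,"
GUEST_47 = "tc=10, mintc=3, offset=0.023, frequency=-9.399, sys_jitter=0.285,"


def ntpq_vars(output: str) -> Dict[str, str]:
    """Parse name=value pairs; values are either quoted or run up to a comma or space."""
    return dict(re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', output))


def frequency_ppm(output: str) -> float:
    """Return the 'frequency' system variable as a float."""
    return float(ntpq_vars(output)["frequency"])


def drift_difference(host_output: str, guest_output: str) -> float:
    """Absolute difference between host and guest drift, in PPM."""
    return abs(frequency_ppm(host_output) - frequency_ppm(guest_output))
```

With the 3.16 host the difference is close to 9 PPM (the guest sees an already-corrected clock), while with 4.7 it is well under 0.1 PPM (the guest measures the same drift as the host).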


On this particular setup I have the host running a stratum-1 NTP server, hooked 
up to a GPS device.
The guests sync against it and other servers.
However I observe the same behavior on several other servers running in various 
locations.
On those running linux 3.16, the guests' measured drift hovers around zero, and 
on those running 4.6 or 4.7, the drift is around the same as on the host.
On all those other hosts and guests, ntpd runs with the unmodified default 
config file, picking up internet ntp servers from the pool.


[0] http://doc.ntp.org/current-stable/ntpq.html#system


-- Package-specific info:
** Version:
Linux version 4.7.0-1-amd64 (debian-kernel@lists.debian.org) (gcc version 5.4.1 
20160904 (Debian 5.4.1-2) ) #1 SMP Debian 4.7.5-1 (2016-09-26)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.7.0-1-amd64 root=/dev/mapper/rvg0-vmhost0_rootfs ro 
quiet

** Not tainted


-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.7.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages linux-image-4.7.0-1-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.125
ii  kmod                                    22-1.1
ii  linux-base                              4.5

Versions of packages linux-image-4.7.0-1-amd64 recommends:
ii  firmware-linux-free  3.4
ii  irqbalance   1.1.0-2

Versions of packages linux-image-4.7.0-1-amd64 suggests:
pn  debian-kernel-handbook  
ii  grub-pc 2.02~beta2-36
pn  linux-doc-4.7   

Versions of packages linux-image-4.7.0-1-amd64 is related to:
ii  firmware-amd-graphics 20160824-1
pn  firmware-atheros  
pn  firmware-bnx2 
pn  firmware-bnx2x
pn  firmware-brcm80211
pn  firmware-cavium   
pn  firmware-intel-sound  
pn  firmware-intelwimax   
pn  firmware-ipw2x00  
pn  firmware-ivtv 
pn  firmware-iwlwifi  
pn  firmware-libertas 
ii  firmware-linux-nonfree20160824-1
ii  firmware-misc-nonfree 20160824-1
pn  firmware-myricom  
pn  firmware-netxen   
pn  firmware-qlogic   
ii  firmware-realtek  20160824-1
pn  firmware-samsung  
pn  firmware-siano 

Bug#771045: System randomly freezes using Kernel 3.16 and radeon

2015-02-15 Thread Gedalya

On 02/16/2015 01:08 AM, Gedalya wrote:

> Fixes for this are included in Linux 3.19, commits:
>
> 3a01fd367e09ebf05d75a000407364e7ebe2b678
> d474ea7e52cbaaae22711d857949ba6018562c29
> cbfc35b90f3b4853d1eb9fcb82e99531d6a1c629



Ah, already in 3.16.7-ckt6 too.


--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/54e137b5.3020...@gedalya.net



Bug#771045: System randomly freezes using Kernel 3.16 and radeon

2015-02-15 Thread Gedalya

Fixes for this are included in Linux 3.19, commits:

3a01fd367e09ebf05d75a000407364e7ebe2b678
d474ea7e52cbaaae22711d857949ba6018562c29
cbfc35b90f3b4853d1eb9fcb82e99531d6a1c629





Bug#776050: Bug#776237: xen-hypervisor-4.4-amd64: kernel panic on dom0 boot

2015-01-27 Thread Gedalya
On 01/27/2015 03:17 AM, Ian Campbell wrote:
>> Could #776237 be related to #776050?
>>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776050
> It's quite possible and the coincidence of the issue arising in ckt4
> makes me think it most likely is.
>
> Lets assume it is for now, I've merged the two bugs. If the issue
> persists after the fix for #776050 is uploaded we can always unmerge.
>
> Ian.
>

Yup, looks OK with 3.16.7-ckt4-2.





Bug#776448: xen dom0: "xen:balloon: reserve_additional_memory: add_memory() failed: -17"

2015-01-27 Thread Gedalya
Package: src:linux
Version: 3.16.7-ckt4-2
Severity: normal

Dear Maintainer,

After a clean boot everything is OK, but as soon as I reboot a domU for
the first time, I start getting these lines in dmesg:

xen:balloon: reserve_additional_memory: add_memory() failed: -17

And they keep repeating forever, quite frequently.
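
For reference, kernel functions conventionally report failures as negative errno values, so the -17 in that message can be looked up as below. This is only a lookup helper with a hypothetical name; the description text comes from the C library and may differ between systems.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map a (possibly negative) errno value to its symbolic name and description."""
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return f"{name} ({os.strerror(number)})"
```

On Linux, describe_errno(-17) reports EEXIST, i.e. add_memory() apparently refuses because the memory range it is asked to add already exists.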

This seems to have been fixed in ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304001
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3dcf63677d4eb7fdfc13290c8558c301d2588fe8


# xl dmesg
(XEN) Xen version 4.4.1 (Debian 4.4.1-6) (wa...@debian.org) (gcc (Debian
4.9.2-5) 4.9.2) debug=n Thu Dec 11 14:57:34 UTC 2014
(XEN) Bootloader: GRUB 2.02~beta2-20
(XEN) Command line: placeholder dom0_mem=768M,max:768M ucode=scan

And I have autoballoon="off" in /etc/xen/xl.conf



-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc
version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt4-2 (2015-01-27)

** Command line:
placeholder root=/dev/mapper/rvg0-dom0_80_rootfs ro quiet

** Not tainted

** Kernel log:
[   46.806552] device vif4.1 entered promiscuous mode
[   46.806719] device vif4.0 entered promiscuous mode
[   46.808018] IPv6: ADDRCONF(NETDEV_UP): vif4.1: link is not ready
[   46.808201] IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
[   48.080581] xen-blkback:ring-ref 8, event-channel 15, protocol 1
(x86_64-abi) persistent grants
[   48.096044] xen-blkback:ring-ref 9, event-channel 16, protocol 1
(x86_64-abi) persistent grants
[   48.104120] xen-blkback:ring-ref 10, event-channel 17, protocol 1
(x86_64-abi) persistent grants
[   48.111257] xen-blkback:ring-ref 11, event-channel 18, protocol 1
(x86_64-abi) persistent grants
[   48.131462] vif vif-4-0 vif4.0: Guest Rx ready
[   48.131495] IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
[   48.131532] breth1: port 4(vif4.0) entered forwarding state
[   48.131535] breth1: port 4(vif4.0) entered forwarding state
[   48.136706] vif vif-4-1 vif4.1: Guest Rx ready
[   48.136734] IPv6: ADDRCONF(NETDEV_CHANGE): vif4.1: link becomes ready
[   48.136753] breth0: port 5(vif4.1) entered forwarding state
[   48.136756] breth0: port 5(vif4.1) entered forwarding state
[   52.524439] device vif5.0 entered promiscuous mode
[   52.526017] IPv6: ADDRCONF(NETDEV_UP): vif5.0: link is not ready
[   52.810327] xen-blkback:ring-ref 8, event-channel 8, protocol 2
(x86_32-abi)
[   52.820391] xen-blkback:ring-ref 9, event-channel 9, protocol 2
(x86_32-abi)
[   52.832214] vif vif-5-0 vif5.0: Guest Rx ready
[   52.832257] IPv6: ADDRCONF(NETDEV_CHANGE): vif5.0: link becomes ready
[   52.832304] breth0: port 6(vif5.0) entered forwarding state
[   52.832308] breth0: port 6(vif5.0) entered forwarding state
[   58.840587] device vif6.0 entered promiscuous mode
[   58.842141] IPv6: ADDRCONF(NETDEV_UP): vif6.0: link is not ready
[   58.842798] device vif6.1 entered promiscuous mode
[   58.844263] IPv6: ADDRCONF(NETDEV_UP): vif6.1: link is not ready
[   60.145988] vif vif-6-0 vif6.0: Guest Rx ready
[   60.146026] IPv6: ADDRCONF(NETDEV_CHANGE): vif6.0: link becomes ready
[   60.146070] breth1: port 5(vif6.0) entered forwarding state
[   60.146074] breth1: port 5(vif6.0) entered forwarding state
[   60.147731] xen-blkback:ring-ref 8, event-channel 15, protocol 1
(x86_64-abi) persistent grants
[   60.158149] vif vif-6-1 vif6.1: Guest Rx ready
[   60.158178] IPv6: ADDRCONF(NETDEV_CHANGE): vif6.1: link becomes ready
[   60.158199] breth0: port 7(vif6.1) entered forwarding state
[   60.158202] breth0: port 7(vif6.1) entered forwarding state
[   60.160099] xen-blkback:ring-ref 2308, event-channel 24, protocol 1
(x86_64-abi) persistent grants
[   60.173611] xen-blkback:ring-ref 2309, event-channel 25, protocol 1
(x86_64-abi) persistent grants
[   60.181874] xen-blkback:ring-ref 2310, event-channel 26, protocol 1
(x86_64-abi) persistent grants
[   65.525950] device vif7.1 entered promiscuous mode
[   65.527393] IPv6: ADDRCONF(NETDEV_UP): vif7.1: link is not ready
[   65.528520] device vif7.0 entered promiscuous mode
[   65.529955] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready
[   66.770492] xen-blkback:ring-ref 8, event-channel 9, protocol 1
(x86_64-abi) persistent grants
[   66.783509] xen-blkback:ring-ref 9, event-channel 10, protocol 1
(x86_64-abi) persistent grants
[   66.869486] vif vif-7-0 vif7.0: Guest Rx ready
[   66.869528] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready
[   66.869567] breth0: port 8(vif7.0) entered forwarding state
[   66.869571] breth0: port 8(vif7.0) entered forwarding state
[   66.872315] vif vif-7-1 vif7.1: Guest Rx ready
[   66.872341] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.1: link becomes ready
[   66.872363] breth1: port 6(vif7.1) entered forwarding state
[   66.872366] breth1: port 6(vif7.1) entered forwarding state
[   71.443566] device vif8.0 entered promiscuous mode
[   71.444965] IPv6: ADDRCONF(NETDEV_UP): vif8.0: link is not re

Bug#776050: Bug#776237: xen-hypervisor-4.4-amd64: kernel panic on dom0 boot

2015-01-26 Thread Gedalya
On Mon, 26 Jan 2015 10:24:41 + Ian Campbell wrote:
>
> This is actually a kernel issue I think, so reassigning accordingly.
>
> 2c3fc8d26dd0 "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
> was backported to the stable kernel but this commit was reverted in
> mainline via dbdd74763f1f.
>
> I'll revert it in the Debian kernel and let the stable kernel folks
> know.
>
> Ian.
>
>
>

Could #776237 be related to #776050?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776050



Bug#771045: Acknowledgement (linux-image-3.16.0-4-amd64: System randomly freezes using Kernel 3.16 and radeon)

2015-01-23 Thread Gedalya

On Tue, 20 Jan 2015 12:36:59 +0100 Antoine Amarilli wrote:
> It seems like the bug was already reported in mesa:
>

The mesa package already has a fix for this staged in git for jessie, 
not released yet.


http://anonscm.debian.org/cgit/pkg-xorg/lib/mesa.git/commit/?h=debian-jessie&id=6d8a4971d16ddaa5429dc9e3a26b603e08738d8a

This bug should nevertheless remain open against the kernel and track 
the progress of the patches at:

http://lists.freedesktop.org/archives/dri-devel/2015-January/074967.html

Note the comment in the upstream fix in mesa, saying that it is a 
temporary fix disabling some functionality until the kernel is properly 
fixed.


I was trying to build a local kernel with these patches applied and ran 
into some difficulty. Will get back to it later.




Bug#771045: Kernel crashes with radeonsi

2015-01-12 Thread Gedalya
I've been running now for 4d17h with locally built mesa 10.3.2-1 + 
upstream commit ae4536b4 applied.

Kernel 3.16.7-ckt2-1.
So far running without any problems. Without that patch I wouldn't last 
24 hours.






Bug#767261: xen-netback changes in #767261 break Mini-OS netfront

2014-12-29 Thread Gedalya

Hi Ian!

Thank you so much for building this kernel!

I just had this problem with a Windows 2012 R2 domu with GPLPV drivers 
from http://www.ejbdigital.com.au/1-0-1105/.

dom0 has the message:
[  324.272925] vif vif-5-0: 22 feature-rx-notify is mandatory
domu obviously has nothing helpful to say ;-)
With your newly built kernel, everything seems to be fine.

Cheers!

Gedalya





Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down

2014-11-09 Thread Gedalya

On 11/09/2014 05:11 AM, Ian Campbell wrote:

On Sat, 2014-11-08 at 15:13 -0500, Gedalya wrote:

On 11/08/2014 08:44 AM, Gedalya wrote:

Tried to just frankenport xen-netback from 3.18 into 3.16, didn't work
very well ;-)

Did you backport just the above or the full set of changes from 3.18?

I tried to "simplify" (avoid having to edit code myself..) by just
copying the full xen-netback from 3.18 as it is.
I did have to revert "c835a6 net: set name_assign_type in
alloc_netdev()" to get it to compile, but then it gave me a kernel bug
as soon as a xen guest booted up.
(see attached if it matters)
I'll try to apply just those 3 patches and see how it goes.

Important: I have no idea what I'm doing!!

:-D


So I cherry-picked the following
xen-netback: reintroduce guest Rx stall detection
xen-netback: fix unlimited guest Rx internal queue and carrier flapping
xen-netback: make feature-rx-notify mandatory
xen-netback: Don't deschedule NAPI when carrier off
xen-netback: Fix vif->disable handling
xen-netback: Turn off the carrier if the guest is not able to receive
xen-netback: Using a new state bit instead of carrier

I'm attaching the two commits for which I had to manually resolve
conflicts, and finally a debian quilt patch including all 7 commits for
3.16.7-2

So far this is working, behavior is as I described for 3.18.

Perhaps this could be helpful but someone should certainly review it.

Thanks, I actually ended up backporting a few more patches, effectively
all of the netback changes since v3.16 since they all looked like useful
fixes, and it reduced the conflicts.

If you were able to test the kernel from
http://xenbits.xen.org/people/ianc/debian/767261/ that would be great
(I'm struggling a bit to regroove my usual test box with something
useful).

Ian.


OK, that works. Got it to stall etc., uptime 6 minutes. All good so 
far. I'll let you know if anything interesting happens.



--
Archive: https://lists.debian.org/545f42d8.4040...@gedalya.net



Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down

2014-11-08 Thread Gedalya

On 11/08/2014 08:44 AM, Gedalya wrote:



Tried to just frankenport xen-netback from 3.18 into 3.16, didn't work
very well ;-)

Did you backport just the above or the full set of changes from 3.18?
I tried to "simplify" (avoid having to edit code myself..) by just 
copying the full xen-netback from 3.18 as it is.
I did have to revert "c835a6 net: set name_assign_type in 
alloc_netdev()" to get it to compile, but then it gave me a kernel bug 
as soon as a xen guest booted up.

(see attached if it matters)
I'll try to apply just those 3 patches and see how it goes. 


Important: I have no idea what I'm doing!!

So I cherry-picked the following
xen-netback: reintroduce guest Rx stall detection
xen-netback: fix unlimited guest Rx internal queue and carrier flapping
xen-netback: make feature-rx-notify mandatory
xen-netback: Don't deschedule NAPI when carrier off
xen-netback: Fix vif->disable handling
xen-netback: Turn off the carrier if the guest is not able to receive
xen-netback: Using a new state bit instead of carrier

I'm attaching the two commits for which I had to manually resolve 
conflicts, and finally a debian quilt patch including all 7 commits for 
3.16.7-2


So far this is working, behavior is as I described for 3.18.

Perhaps this could be helpful but someone should certainly review it.
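
A rough sketch of how a backport series like this can be produced with git. It uses only the three abbreviated commit IDs quoted elsewhere in this thread, listed in what is assumed to be oldest-first order. The tag `v3.16.7`, the branch name and the output directory are assumptions for illustration. The function only prints the commands and does not run them.

```shell
#!/bin/sh
# backport_plan BASE COMMIT...: print (not run) the git commands for
# cherry-picking COMMITs onto BASE and exporting the result as patch files.
# Branch name and output directory are placeholders.
backport_plan() {
    base=$1
    shift
    echo "git checkout -b netback-backport $base"
    for commit in "$@"; do
        # -x records the upstream commit ID in the backported commit message
        echo "git cherry-pick -x $commit"
    done
    # One patch file per commit made on top of BASE
    echo "git format-patch -o ../netback-patches $base"
}

# Commits should be given oldest first; this order is an assumption.
backport_plan v3.16.7 bc96f64 f48da8b ecf08d2
```

Conflicts still have to be resolved by hand at the cherry-pick step, as happened with two of the commits here.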

>From 027c8388ba6d47e248def1ea04584d0fed2d3cd9 Mon Sep 17 00:00:00 2001
From: David Vrabel 
Date: Wed, 22 Oct 2014 14:08:53 +0100
Subject: [PATCH 1/2] xen-netback: make feature-rx-notify mandatory

Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).

This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.

Signed-off-by: David Vrabel 
Acked-by: Wei Liu 
Signed-off-by: David S. Miller 

Conflicts:
	drivers/net/xen-netback/interface.c
---
 drivers/net/xen-netback/common.h    |  5 -----
 drivers/net/xen-netback/interface.c | 12 +-----------
 drivers/net/xen-netback/xenbus.c    | 13 ++++---------
 3 files changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5d50a7a..1d5a694 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -226,9 +226,6 @@ struct xenvif {
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 
-	/* Internal feature information. */
-	u8 can_queue:1;	/* can queue packets for receiver? */
-
 	/* Is this interface disabled? True when backend discovers
 	 * frontend is rogue.
 	 */
@@ -266,8 +263,6 @@ void xenvif_xenbus_fini(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
-int xenvif_must_stop_queue(struct xenvif_queue *queue);
-
 int xenvif_queue_stopped(struct xenvif_queue *queue);
 void xenvif_wake_queue(struct xenvif_queue *queue);
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1dc9cdbf..280f86d8 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -43,16 +43,6 @@
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64
 
-static inline void xenvif_stop_queue(struct xenvif_queue *queue)
-{
-	struct net_device *dev = queue->vif->dev;
-
-	if (!queue->vif->can_queue)
-		return;
-
-	netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
-}
-
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) &&
@@ -192,7 +182,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (!xenvif_rx_ring_slots_available(queue, min_slots_needed)) {
 		queue->rx_stalled.function = xenvif_rx_stalled;
 		queue->rx_stalled.data = (unsigned long)queue;
-		xenvif_stop_queue(queue);
+		netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
 		mod_timer(&queue->rx_stalled,
 			  jiffies + rx_drain_timeout_jiffies);
 	}
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3d85acd..096b244 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -694,15 +694,10 @@ static int read_xenbus_vif_flags(struct backend_info *be)
 	if (!rx_copy)
 		return -EOPNOTSUPP;
 
-	if (vif->dev->tx_queue_len != 0) {
-		if (xenbus_scanf(XBT_NIL, dev->otherend,
-				 "feature-rx-notify", "%d", &val) < 0)
-			val = 0;
-		if (val)
-			vif->can_queue = 1;
-		else
-			/* Must be non-zero for pfifo_fast to work. */
-			vif->dev->tx_queue_len = 1;
+	if (xenbus_scanf(XBT_NIL, dev->otherend,
+			 "feature-rx-notify", "%d", &val) < 0 || val == 0) {
+		xenbus_dev_fatal(dev, -EINVAL, "feature-rx-notify is mandatory");
+		return -EINVAL;
 	}
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg",
-- 
2.1.3

>From 9d3a3cbb65c7972fb152aa887650eb41335a5f32 Mon Sep 

Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down

2014-11-08 Thread Gedalya

On 11/08/2014 05:39 AM, Ian Campbell wrote:

On Sat, 2014-11-08 at 00:40 -0500, Gedalya wrote:

On 11/07/2014 03:25 AM, Ian Campbell wrote:

On Thu, 2014-11-06 at 11:06 -0500, Gedalya wrote:

I suspect we will need to backport some xen-netback patch or other. I've
put some feelers out to see if any of the upstream devs have any
hints...

OK so if it's just a matter of changing a kernel on one box, I can
perhaps try to build a 3.18 this weekend

I think these commits, which are in v3.18-rc3, are probably the ones:

ecf08d2 xen-netback: reintroduce guest Rx stall detection
f48da8b xen-netback: fix unlimited guest Rx internal queue and carrier flapping
bc96f64 xen-netback: make feature-rx-notify mandatory

I'll investigate a backport/check if they are destined for stable@.

Ian.


Tried to just frankenport xen-netback from 3.18 into 3.16, didn't work
very well ;-)

Did you backport just the above or the full set of changes from 3.18?
I tried to "simplify" (avoid having to edit code myself..) by just 
copying the full xen-netback from 3.18 as it is.
I did have to revert "c835a6 net: set name_assign_type in 
alloc_netdev()" to get it to compile, but then it gave me a kernel bug 
as soon as a xen guest booted up.

(see attached if it matters)
I'll try to apply just those 3 patches and see how it goes.





I'm running 3.18rc3+ now. Bombarding the downed interface by
broadcast-pinging the network it's on causes the following
[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
[  281.396080] breth1: port 3(vif3.0) entered disabled state
and that's it. This is instead of the previously repeated 'draining TX
queue' messages.
Let's assume it won't crash, I'll let you know if this assumption turns
out to be wrong.

I'm kind of curious why this is preceded by
[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready
[   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready
And the host figures out it's down only when traffic comes and doesn't
get through.
I guess this might change if I run 3.18 in the guest too?

I *think* this is the intended behaviour of "xen-netback: reintroduce
guest Rx stall detection", since the interface is down on the guest side
it becomes considered stalled (i.e not processing any packets).

The "link becomes ready" message I think refers to the backend end of
the connection, it's like a network cable only plugged in at one end or
something. Perhaps things could be smarter, but that would be an
upstream thing I think.

OK, makes sense. Thanks!

Nov  7 23:30:59 xen kernel: [   31.990845] BUG: unable to handle kernel NULL pointer dereference at   (null)
Nov  7 23:30:59 xen kernel: [   31.990862] IP: [] strcmp+0xc/0x30
Nov  7 23:30:59 xen kernel: [   31.990871] PGD 0 
Nov  7 23:30:59 xen kernel: [   31.990876] Oops:  [#1] SMP 
Nov  7 23:30:59 xen kernel: [   31.990882] Modules linked in: xen_netback(+) xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc it87 hwmon_vid snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support coretemp pps_ldisc pps_core i2c_i801 snd_hda_codec_realtek lpc_ich snd_hda_codec_generic mfd_core nouveau ppdev evdev snd_hda_intel mxm_wmi pcspkr snd_hda_controller tpm_infineon tpm_tis i7core_edac edac_core snd_hda_codec serio_raw video tpm snd_hwdep ttm drm_kms_helper drm i2c_algo_bit i2c_core snd_pcm snd_timer snd parport_pc parport wmi soundcore shpchp button processor thermal_sys ext4 crc16 mbcache jbd2 dm_mod ata_generic sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common crc32c_intel firewire_ohci firewire_core crc_itu_t r8169 mii ahci pata_jmicron libahci ehci_pci uhci_hcd xhci_hcd libata ehci_hcd megaraid_sas usbcore usb_common scsi_mod
Nov  7 23:30:59 xen kernel: [   31.992670] CPU: 0 PID: 2470 Comm: udevd Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-2
Nov  7 23:30:59 xen kernel: [   31.993478] Hardware name: Gigabyte Technology Co., Ltd. P55A-UD4P/P55A-UD4P, BIOS F13 08/10/2010
Nov  7 23:30:59 xen kernel: [   31.994297] task: 8800028ff530 ti: 88001d6ec000 task.ti: 88001d6ec000
Nov  7 23:30:59 xen kernel: [   31.995114] RIP: e030:[]  [] strcmp+0xc/0x30
Nov  7 23:30:59 xen kernel: [   31.995933] RSP: e02b:88001d6efcf0  EFLAGS: 00010202
Nov  7 23:30:59 xen kernel: [   31.996744] RAX: 0076 RBX: 88001f622b80 RCX: 0002
Nov  7 23:30:59 xen kernel: [   31.997556] RDX: 0002 RSI: 0001 RDI: 88001fdb3541
Nov  7 23:30:59 xen kernel: [   31.998373] RBP: 88002097a5c0 R08: 0004 R09: 0008
Nov  7 23:30:59 xen kernel: [   31.999189] R10: 818e1880 R11: 3ecd R12: 
Nov  7 23:30:59 xen kernel: [   31.94] R13: a0676000 R14: a0673390 R15: 0001
Nov  7 23:30:59 xen kernel: [   32.000803] FS:  00

Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down

2014-11-07 Thread Gedalya

On 11/07/2014 03:25 AM, Ian Campbell wrote:
> On Thu, 2014-11-06 at 11:06 -0500, Gedalya wrote:
> > > I suspect we will need to backport some xen-netback patch or other. I've
> > > put some feelers out to see if any of the upstream devs have any
> > > hints...
> > OK so if it's just a matter of changing a kernel on one box, I can
> > perhaps try to build a 3.18 this weekend
>
> I think these commits, which are in v3.18-rc3, are probably the ones:
>
> ecf08d2 xen-netback: reintroduce guest Rx stall detection
> f48da8b xen-netback: fix unlimited guest Rx internal queue and carrier flapping
> bc96f64 xen-netback: make feature-rx-notify mandatory
>
> I'll investigate a backport/check if they are destined for stable@.
>
> Ian.

Tried to just frankenport xen-netback from 3.18 into 3.16, didn't work 
very well ;-)
I'm running 3.18rc3+ now. Bombarding the downed interface by 
broadcast-pinging the network it's on causes the following

[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled
[  281.396080] breth1: port 3(vif3.0) entered disabled state
and that's it. This is instead of the previously repeated 'draining TX 
queue' messages.
Let's assume it won't crash, I'll let you know if this assumption turns 
out to be wrong.


I'm kind of curious why this is preceded by
[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready
[   46.232514] IPv6: ADDRCONF(NETDEV_CHANGE): vif3.0: link becomes ready
And the host figures out it's down only when traffic comes and doesn't 
get through.

I guess this might change if I run 3.18 in the guest too?
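
A small sketch for following these backend state changes in the dom0 kernel log. It matches only the three message texts that appear in this thread ("Guest Rx ready", "Guest Rx stalled" and "draining TX queue"). The function name `vif_events` is made up for illustration.

```shell
#!/bin/sh
# vif_events: keep only the netback state messages discussed in this thread.
# Hypothetical helper; the pattern is based solely on the log lines quoted here.
vif_events() {
    grep -E 'vif[0-9]+\.[0-9]+: (Guest Rx (stalled|ready)|draining TX queue)'
}

# Typical use on dom0 (needs permission to read the kernel log):
#   dmesg | vif_events
printf '%s\n' \
    '[   46.232475] vif vif-3-0 vif3.0: Guest Rx ready' \
    '[  281.396014] vif vif-3-0 vif3.0: Guest Rx stalled' \
    '[  281.396080] breth1: port 3(vif3.0) entered disabled state' | vif_events
```

The bridge message is filtered out on purpose, so only the two netback lines remain.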


--
Archive: https://lists.debian.org/545daceb.7050...@gedalya.net



Bug#767261: [Pkg-xen-devel] Bug#767261: xen-hypervisor-4.4-amd64: host lockup when DomU network iface is down

2014-11-06 Thread Gedalya

On 11/06/2014 07:17 AM, Ian Campbell wrote:
> Control: reassign -1 src:linux
> Control: found -1 3.16.5-1
>
> On Wed, 2014-10-29 at 12:57 -0400, Gedalya wrote:
> > On dom0 I get messages like 'vif vif-10-0 vif10.0: draining TX queue',
> > starting as soon as the domU's boot up. I'm pretty sure this is a
> > regression from Xen 4.1 in wheezy.
> [...]
> > dom0 and domU kernel is linux 3.16-3-amd64 3.16.5-1
>
> This is most likely to be a dom0 kernel side issue, so reassigning.
>
> Are there any interesting messages preceding the "draining TX queue"
> ones?

No... nothing at all.

> I suspect we will need to backport some xen-netback patch or other. I've
> put some feelers out to see if any of the upstream devs have any
> hints...

OK so if it's just a matter of changing a kernel on one box, I can 
perhaps try to build a 3.18 this weekend

> Ian.





--
Archive: https://lists.debian.org/545b9c8e.5040...@gedalya.net



Bug#675302: reassign to linux

2012-11-26 Thread Gedalya

reassign 675302 src:linux 3.2.32-1
thanks


--
Archive: http://lists.debian.org/50b3654d.7070...@gedalya.net



Bug#681743: i915: display backlight brightness initially set to zero on boot

2012-10-14 Thread Gedalya

On 07/16/2012 07:45 AM, Jonathan Nieder wrote:
> If I don't get back to you soon, feel free to ping me.

ping!


--
Archive: http://lists.debian.org/507afa52.60...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-10-14 Thread Gedalya

What's up?
Linux 3.2.30-1 (currently in sid) still has the same problem.


--
Archive: http://lists.debian.org/507afad7.6010...@gedalya.net



Bug#681743: i915: display backlight brightness initially set to zero on boot

2012-07-19 Thread Gedalya
Tested 3.5.0-rc7+, boots up the same way, with the backlight off. Adding
i915.invert_brightness=1 makes it turn the brightness up to the maximum
as expected, however this is pretty ugly. Without this parameter, I can
just turn the brightness up with the fn keys and I get the nice
on-screen indicator from GNOME. However with invert_brightness I need to
turn it _up_, with the indicator showing it going up, in order to get it
down, and vice versa.
Anyway, I guess the desired result is that it should "just work" without
having to do special workarounds.
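
A minimal sketch for checking whether a given boot actually carries the workaround parameter, which can help when comparing behaviour with and without it. The helper name `has_kernel_param` is made up for illustration. It only inspects a command line string and does not touch the backlight.

```shell
#!/bin/sh
# has_kernel_param NAME CMDLINE: succeed if NAME (with or without "=value")
# appears as a separate word in the kernel command line string CMDLINE.
# Hypothetical helper for illustration.
has_kernel_param() {
    for word in $2; do
        case $word in
            "$1" | "$1"=*) return 0 ;;
        esac
    done
    return 1
}

# Report on the running kernel where /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if has_kernel_param i915.invert_brightness "$(cat /proc/cmdline)"; then
        echo "i915.invert_brightness is set on this boot"
    else
        echo "i915.invert_brightness is not set on this boot"
    fi
fi
```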


On Mon, 2012-07-16 at 06:45 -0500, Jonathan Nieder wrote:
> Those commits are both in 3.5-rc1,
> so if you get a chance to test 3.5-rc1 or newer, that would also be
> useful.
> 
> 


-- 
Archive: http://lists.debian.org/1342681054.3341.2.ca...@dml.gedalya.net



Bug#680737: [wheezy] Intel i915: black display after boot

2012-07-15 Thread Gedalya
On Mon, 2012-07-16 at 01:32 -0500, Jonathan Nieder wrote:
> Thanks for writing.  Your hardware is sufficiently different from
> Roland's that I really want this as a separate bug.  We can merge them
> later if they turn out to have the same cause.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681743

> [...]
> > ** Tainted: W (512)
> >  * Taint on warning.
> 
> To save a round trip: when filing that new bug, please include the
> portion of your kernel log describing this warning (or full "dmesg"
> output from a normal boot, which would be even better) as an
> attachment.
> 
Not sure what that warning was. That bug report was done after my
machine was up for several days, maybe that had something to do with a
USB wifi adapter which used a non-free firmware or something. The new
bug report with a newly-booted dmesg didn't say that.

> Hope that helps,
> Jonathan


-- 
Archive: http://lists.debian.org/1342421618.3391.5.ca...@dml.gedalya.net



Bug#680737: linux-image-3.2.0-2-amd64: Intel i915: black display after boot

2012-07-15 Thread Gedalya
Package: src
Version: 3.2.21-3
Followup-For: Bug #680737

I think this is the same problem here.. using an HP dm4t-1200 laptop.
The backlight is set to zero brightness as soon as the kernel begins to load.
In my case I can set the brightness higher by simply using the appropriate fn
key combination, but until then everything looks black.

It looks to me like the following is the fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=7bd90909bbf9ce7c40e1da3d72b97b93839c188a
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=4dca20efb1a9c2efefc28ad2867e5d6c3f5e1955

found some discussion here
https://bbs.archlinux.org/viewtopic.php?id=131747




-- Package-specific info:
** Version:
Linux version 3.2.0-3-amd64 (Debian 3.2.21-3) (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP Thu Jun 28 09:07:26 UTC 2012

** Command line:
BOOT_IMAGE=/vmlinuz-3.2.0-3-amd64 root=/dev/mapper/dml_vg0-rootfs ro quiet

** Tainted: W (512)
 * Taint on warning.

** Kernel log:
[126929.403902] Disabling non-boot CPUs ...
[126929.507481] CPU 1 is now offline
[126929.611451] CPU 2 is now offline
[126929.715414] CPU 3 is now offline
[126929.716044] Extended CMOS year: 2000
[126929.716255] ACPI: Low-level resume complete
[126929.716317] PM: Restoring platform NVS memory
[126929.716907] Extended CMOS year: 2000
[126929.716953] Enabling non-boot CPUs ...
[126929.717127] Booting Node 0 Processor 1 APIC 0x1
[126929.717128] smpboot cpu 1: start_ip = 98000
[126929.728155] Calibrating delay loop (skipped) already calibrated this CPU
[126929.748519] NMI watchdog enabled, takes one hw-pmu counter.
[126929.748862] CPU1 is up
[126929.749006] Booting Node 0 Processor 2 APIC 0x4
[126929.749007] smpboot cpu 2: start_ip = 98000
[126929.760034] Calibrating delay loop (skipped) already calibrated this CPU
[126929.780482] NMI watchdog enabled, takes one hw-pmu counter.
[126929.780840] CPU2 is up
[126929.781030] Booting Node 0 Processor 3 APIC 0x5
[126929.781031] smpboot cpu 3: start_ip = 98000
[126929.792056] Calibrating delay loop (skipped) already calibrated this CPU
[126929.812621] NMI watchdog enabled, takes one hw-pmu counter.
[126929.813138] CPU3 is up
[126929.815702] ACPI: Waking up from system sleep state S3
[126930.140757] i915 :00:02.0: restoring config space at offset 0x1 (was 0x97, writing 0x900407)
[126930.140819] ehci_hcd :00:1a.0: restoring config space at offset 0x1 (was 0x296, writing 0x292)
[126930.140853] ehci_hcd :00:1a.0: wake-up capability disabled by ACPI
[126930.140857] ehci_hcd :00:1a.0: PME# disabled
[126930.140881] snd_hda_intel :00:1b.0: restoring config space at offset 0x1 (was 0x16, writing 0x12)
[126930.140983] ehci_hcd :00:1d.0: restoring config space at offset 0x1 (was 0x296, writing 0x292)
[126930.141008] ehci_hcd :00:1d.0: wake-up capability disabled by ACPI
[126930.141013] ehci_hcd :00:1d.0: PME# disabled
[126930.141025] pci :00:1e.0: restoring config space at offset 0xa (was 0x, writing 0x0)
[126930.141099] ahci :00:1f.2: restoring config space at offset 0x1 (was 0x2b7, writing 0x2b00407)
[126930.141213] r8169 :01:00.0: restoring config space at offset 0x1 (was 0x17, writing 0x100407)
[126930.141301] brcmsmac :02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[126930.141320] brcmsmac :02:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xb244)
[126930.141326] brcmsmac :02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[126930.141332] brcmsmac :02:00.0: restoring config space at offset 0x1 (was 0x10, writing 0x16)
[126930.141454] PM: early resume of devices complete after 0.746 msecs
[126930.141561] i915 :00:02.0: setting latency timer to 64
[126930.141567] ehci_hcd :00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[126930.141574] ehci_hcd :00:1a.0: setting latency timer to 64
[126930.141606] snd_hda_intel :00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[126930.141611] snd_hda_intel :00:1b.0: setting latency timer to 64
[126930.141617] ehci_hcd :00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[126930.141624] ehci_hcd :00:1d.0: setting latency timer to 64
[126930.141644] pci :00:1e.0: setting latency timer to 64
[126930.141657] snd_hda_intel :00:1b.0: irq 43 for MSI/MSI-X
[126930.141659] ahci :00:1f.2: setting latency timer to 64
[126930.141710] pci :00:1f.3: PCI INT C -> GSI 19 (level, low) -> IRQ 19
[126930.141757] brcmsmac :02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[126930.141763] brcmsmac :02:00.0: setting latency timer to 64
[126930.141833] sd 0:0:0:0: [sda] Starting disk
[126930.180638] r8169 :01:00.0: wake-up capability disabled by ACPI
[126930.180644] r8169 :01:00.0: PME# disabled
[126930.286766] Extended CMOS year: 2000
[126930.391986] usb 2-1.6: reset high-speed USB device num

Bug#675302: [Bug 50571] nouveau crashes with GeForce GT 520

2012-06-16 Thread Gedalya

On 6/13/2012 9:43 PM, bugzilla-dae...@freedesktop.org wrote:
> If you can find the fix by bisecting, that would make me very happy.



I hope I got it all right, I tried to be very careful...


$ git bisect bad
4cbb0f8d2b06c72aae3552ff1a0a57814c6ce7d2 is the first bad commit
commit 4cbb0f8d2b06c72aae3552ff1a0a57814c6ce7d2
Author: Ben Skeggs 
Date:   Mon Mar 12 15:23:44 2012 +1000

drm/nvd0/disp: disconnect encoders before reprogramming them

Signed-off-by: Ben Skeggs 

:04 04 4648bcf9d08c294fdb7ef76343e6f89e3cc2fe24 2c31aa85ebf6c6f3b96fc6e91934a895bb42c0a3 M  drivers



$ git bisect log
# bad: [f8f5701bdaf9134b1f90e5044a82c66324d2073f] Linux 3.5-rc1
# good: [c16fa4f2ad19908a47c63d8fa436a1178438c7e7] Linux 3.3
git bisect start 'v3.5-rc1' 'v3.3' '--' 'drivers/gpu/drm/nouveau'
# bad: [4a206ffc0bfe8e8c3fc0468a052f5b0bb625a57b] drm/nouveau: oops, create m2mf for nvd9 too
git bisect bad 4a206ffc0bfe8e8c3fc0468a052f5b0bb625a57b
# good: [7d3a766b6aa4e293e72bfd6add477f05ac7fdf5a] drm/nouveau/pm: init only after display subsystem has been created
git bisect good 7d3a766b6aa4e293e72bfd6add477f05ac7fdf5a
# bad: [f1377998eede7a8caa124fcf6a589b02c9e2bac7] drm/nouveau: add userspace fallback hints.
git bisect bad f1377998eede7a8caa124fcf6a589b02c9e2bac7
# good: [c11dd0da5277596d0ccdccb745b273d69a94f2d7] drm/nouveau/pm: fix oops if chipset has no pm support at all
git bisect good c11dd0da5277596d0ccdccb745b273d69a94f2d7
# good: [6e83fda2c055f17780b2feef404f06803a49a261] drm/nvd0/disp: initial implementation of displayport
git bisect good 6e83fda2c055f17780b2feef404f06803a49a261
# good: [3488c57b983546e6bf4c9e0bfd0f7f2a1292267a] drm/nvd0/disp: move syncs/magic setup to or mode_set
git bisect good 3488c57b983546e6bf4c9e0bfd0f7f2a1292267a
# bad: [2f5394c3ed573de2ab18cdac503b8045cd16ac5e] drm/nouveau: map first page of mmio early and determine chipset earlier
git bisect bad 2f5394c3ed573de2ab18cdac503b8045cd16ac5e
# bad: [4cbb0f8d2b06c72aae3552ff1a0a57814c6ce7d2] drm/nvd0/disp: disconnect encoders before reprogramming them
git bisect bad 4cbb0f8d2b06c72aae3552ff1a0a57814c6ce7d2
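
A short sketch for condensing a bisect log like the one above into one line per tested commit, which makes the sequence easier to double-check. The helper name `bisect_verdicts` is made up for illustration. It only reads text and does not alter any bisect state.

```shell
#!/bin/sh
# bisect_verdicts: read "git bisect log" output on stdin and print one
# "<verdict> <abbreviated commit>" line per recorded test.
# Hypothetical helper for illustration.
bisect_verdicts() {
    awk '$1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
        print $3, substr($4, 1, 12)
    }'
}

# Typical use inside the repository being bisected:
#   git bisect log | bisect_verdicts
# A saved log can also be re-run with: git bisect replay <logfile>
printf '%s\n' \
    '# bad: [4a206ffc0bfe8e8c3fc0468a052f5b0bb625a57b] drm/nouveau: oops, create m2mf for nvd9 too' \
    'git bisect bad 4a206ffc0bfe8e8c3fc0468a052f5b0bb625a57b' \
    'git bisect good 7d3a766b6aa4e293e72bfd6add477f05ac7fdf5a' | bisect_verdicts
```

Comment lines and the `git bisect start` line are skipped, so only actual good/bad verdicts are listed.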




--
Archive: http://lists.debian.org/4fdd0eeb.9060...@gedalya.net



Bug#676866: linux-image-3.2.0-2-686-pae: won't boot under xen

2012-06-09 Thread Gedalya

Package: linux-2.6
Version: 3.2.19-1
Severity: important

Dear Maintainer,

   * What led up to the situation?
dist-upgrade, got linux 3.2.19-1

   * What exactly did you do (or not do) that was effective (or
 ineffective)?
reboot, since there's a new kernel.

   * What was the outcome of this action?
won't boot up.

   * What outcome did you expect instead?
boot up.

-- More information:
I was able to reproduce this so far on two DomU's running under my local 
xen host.


Dom0 info:
root@xen:~# uname -a
Linux xen 3.2.0-2-amd64 #1 SMP Mon May 21 17:45:41 UTC 2012 x86_64 GNU/Linux
root@xen:~# dpkg -l | egrep "xen|linux"
ii  console-setup-linux             1.76      Linux specific part of console-setup
ii  libselinux1:amd64               2.1.9-2   SELinux runtime shared libraries
ii  libxen-4.1                      4.1.2-6   Public libs for Xen
ii  libxenstore3.0                  4.1.2-6   Xenstore communications library for Xen
ii  linux-base                      3.5       Linux image base package
ii  linux-image-3.2.0-2-amd64       3.2.18-1  Linux 3.2 for 64-bit PCs
ii  linux-image-amd64               3.2+44    Linux for 64-bit PCs (meta-package)
ii  util-linux                      2.20.1-5  Miscellaneous system utilities
ii  xen-hypervisor-4.1-amd64        4.1.2-6   Xen Hypervisor on AMD64
ii  xen-linux-system-3.2.0-2-amd64  3.2.18-1  Xen system with Linux 3.2 on 64-bit PCs (meta-package)
ii  xen-linux-system-amd64          3.2+44    Xen system with Linux for 64-bit PCs (meta-package)
ii  xen-utils-4.1                   4.1.2-6   XEN administrative tools
ii  xen-utils-common                4.1.2-6   Xen administrative tools - common files
ii  xenstore-utils                  4.1.2-6   Xenstore utilities for Xen
root@xen:~#

DomU info:
Was at 3.2.18-1, stopped working when picked up 3.2.19-1.
Booted up again when I manually restored vmlinuz and initrd to 3.2.18-1. 
The info below was dumped after the successful boot to 3.2.18-1.


Here is what the failure looked like:

root@xen:~# xm create -c domu-cfgs/wheezytest.cfg
Using config file "./domu-cfgs/wheezytest.cfg".
Exception AttributeError: AttributeError("'_DummyThread' object has no 
attribute '_Thread__block'",) in '/usr/lib/python2.7/threading.pyc'> ignored
  Using 'grub.GrubConf.Grub2ConfigFile'> to parse /boot/grub/grub.cfg

WARNING:root:Unknown directive load_video
   WARNING:root:Unknown directive terminal_output
WARNING:root:Unknown directive source

pyGRUB  version 0.6
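 +-----------------------------------------------------------------+
 | Debian GNU/Linux, with Linux 3.2.0-2-686-pae                    |
 | Debian GNU/Linux, with Linux 3.2.0-2-686-pae (recovery mode)    |
 |                                                                 |
 |                                                                 |
 |                                                                 |
 |                                                                 |
 |                                                                 |
 |                                                                 |
 +-----------------------------------------------------------------+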
 Use the ^ and v keys to select which entry is highlighted.
 Press enter to boot the selected OS, 'e' to edit the
 commands before booting, 'a' to modify the kernel arguments
 before booting, or 'c' for a command line.




 Will boot selected entry in  1 seconds


Started domain wheezytest (id=6)
[0.405353] i8042: No controller found
[0.445719] /build/buildd-linux-2.6_3.2.19-1-i386-c_JlIT/linux-2.6-3.2.19/debian/build/source_i386_none/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[0.447219] BUG: unable to handle kernel paging request at dd9c4dd8
[0.447227] IP: [] atomic64_read_cx8+0x4/0xc
[0.447234] *pdpt = 1d9c5027 *pde = 037c7067 *pte = 80001d9c4061
[0.447243] Oops: 0003 [#1] SMP
[0.447247] Modules linked in:
[0.447251]
[0.447253] Pid: 1, comm: init Not tainted 3.2.0-2-686-pae #1
[0.447259] EIP: 0061:[] EFLAGS: 00010246 CPU: 0
[0.447264] EIP is at atomic64_read_cx8+0x4/0xc
[0.447267] EAX: dd9c0e40 EBX: dd9c0e40 ECX: dd9c4dd8 EDX: dd9c4dd8
[0.447272] ESI: dd9c6498 EDI: dd9c65f8 EBP: dd9c6440 ESP: df431c98
[0.447276]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: e021
[0.447281] Process init (pid: 1, ti=df43 task=df42f900 task.ti=df43)
[0.447285] Stack:
[0.447287]  c10ac623 df431d0c b776d000 dd9c6440 b776f000 b776f000 b776f000 dd9c4dd8
[0.447297]   0fcc  df431d3c dd9c1010 dd9c1010 b776d000 dd9c0e40
[0.447305]  dfc18060 0001 dfc18060 0001 c10c71f0 fffa dfbe1cdc c1416080
[0.447315] Call Trace:
[0.447320]  [] ? unmap_vmas+0x234/0x65b
[0.447326]  [] ? mem_cgroup_add_lru_list+0xe/0x84
[0.447332]  [] ? pagevec_lru_move_fn+0x8a/0x98
[0.447337]  [] ? add_page_to_lru_list+0x54/0x54
[0.447342]  [] ? unmap_region+0x6f/0xb2
[0.447347]  [] ? __split_vma+0x100/0x154
[0.447352]  [] ? do_munmap+0x1b3/0x1fb
[0.447358]  [] ? elf_map+0xa2/0xda
[0.447363]  [] ? load_elf_binary+0x7ff/0x101b
[0.447369]  [] ? vfs_read+0xa1/0xd1
[0.447373] 

Bug#675302: nouveau: hard lockup when gdm3 starts

2012-06-01 Thread Gedalya

On 6/1/2012 2:47 AM, Jonathan Nieder wrote:
> Ok, one more test and we should take this upstream: can you reproduce
> this with a 3.3.y or newer kernel from experimental?
>
> If so, please report this at, product
> Xorg, component Driver/nouveau (yes, that's where they track their
> kernel bugs, too), and let us know the bug number so we can track it.


Bug 50571 - https://bugs.freedesktop.org/show_bug.cgi?id=50571




--
Archive: http://lists.debian.org/4fc86bc2.7010...@gedalya.net



Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Gedalya

On 6/1/2012 1:57 AM, Jonathan Nieder wrote:
> Worrisome.  Can you send a full kernel log from booting and doing
> this, including the boot-up sequence?  Please send it as an attachment
> if possible so the log doesn't get corrupted in transit (e.g. by line
> wrapping).


Saved the log from partedmagic too just in case it can help. Looks like 
the ring buffer here is too small to keep the earliest boot messages.



root@PartedMagic:~# uname -a
Linux PartedMagic 3.3.6-pmagic #1 SMP Sat May 12 20:01:06 CDT 2012 i686 Intel(R) Core(TM)2 Quad CPUQ9550  @ 2.83GHz GenuineIntel GNU/Linux

[0.303349] pci_bus :06: resource 7 [mem 0x000c-0x000d]
[0.303669] pci_bus :06: resource 8 [mem 0xfed4-0xfed44fff]
[0.303988] pci_bus :06: resource 9 [mem 0xd7f0-0xfebf]
[0.304334] NET: Registered protocol family 2
[0.304677] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[0.305082] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[0.305904] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[0.306367] TCP: Hash tables configured (established 131072 bind 65536)
[0.306688] TCP reno registered
[0.306995] UDP hash table entries: 512 (order: 2, 16384 bytes)
[0.307335] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[0.307692] NET: Registered protocol family 1
[0.308104] RPC: Registered named UNIX socket transport module.
[0.308422] RPC: Registered udp transport module.
[0.308735] RPC: Registered tcp transport module.
[0.309061] RPC: Registered tcp NFSv4.1 backchannel transport module.
[0.331043] pci :01:00.0: Boot video device
[0.331379] PCI: CLS 32 bytes, default 64
[0.331725] Trying to unpack rootfs image as initramfs...
[0.440926] Freeing initrd memory: 38420k freed
[0.453397] audit: initializing netlink socket (disabled)
[0.453728] type=2000 audit(1338532572.452:1): initialized
[0.454640] highmem bounce pool size: 64 pages
[0.459860] VFS: Disk quotas dquot_6.5.2
[0.460287] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[0.460734] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[0.461297] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[0.461618] ROMFS MTD (C) 2007 Red Hat, Inc.
[0.462329] aufs 3.3-20120402
[0.462635] msgmni has been set to 1421
[0.466088] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 
253)
[0.46] io scheduler noop registered
[0.466973] io scheduler deadline registered
[0.467307] io scheduler cfq registered (default)
[0.467989] pcieport :00:01.0: irq 40 for MSI/MSI-X
[0.468433] pcieport :00:1c.0: irq 41 for MSI/MSI-X
[0.468869] pcieport :00:1c.3: irq 42 for MSI/MSI-X
[0.469337] pcieport :00:1c.4: irq 43 for MSI/MSI-X
[0.469768] pcieport :00:1c.5: irq 44 for MSI/MSI-X
[0.470411] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[0.471043] isapnp: Scanning for PnP cards...
[0.826975] isapnp: No Plug & Play device found
[0.833490] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[1.098144] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1.119600] 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[1.123779] brd: module loaded
[1.180634] loop: module loaded
[1.181738] Loading iSCSI transport class v2.0-870.
[1.183443] i8042: PNP: No PS/2 controller found. Probing ports directly.
[1.184214] serio: i8042 KBD port at 0x60,0x64 irq 1
[1.184545] serio: i8042 AUX port at 0x60,0x64 irq 12
[1.185132] mousedev: PS/2 mouse device common for all mice
[1.185756] EISA: Probing bus 0 at eisa.0
[1.186106] EISA: Cannot allocate resource for mainboard
[1.186420] Cannot allocate resource for EISA slot 1
[1.186733] Cannot allocate resource for EISA slot 2
[1.187058] Cannot allocate resource for EISA slot 3
[1.187370] Cannot allocate resource for EISA slot 4
[1.187681] Cannot allocate resource for EISA slot 5
[1.187992] Cannot allocate resource for EISA slot 6
[1.188325] Cannot allocate resource for EISA slot 7
[1.188637] Cannot allocate resource for EISA slot 8
[1.188948] EISA: Detected 0 cards.
[1.189275] cpuidle: using governor ladder
[1.189583] cpuidle: using governor menu
[1.190029] TCP cubic registered
[1.190334] Initializing XFRM netlink socket
[1.190646] NET: Registered protocol family 17
[1.191120] Registering the dns_resolver key type
[1.191450] Using IPI No-Shortcut mode
[1.299537] registered taskstats version 1
[1.300194] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[1.300612] Freeing unused kernel memory: 556k freed
[1.301106] Write protecting the kernel text: 3548k
[1.301451] Write protecting the kernel read-only data: 1340k
[1.454034] Refined TSC clocksource calibration: 2833.010 MHz.
[1.454039] Switching to clocksource tsc
[1.527281] -

Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-31 Thread Gedalya

On 5/31/2012 2:31 AM, Jonathan Nieder wrote:

> Hm.  Might be possible to get a log with netconsole[1].
>
> [1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt
> http://blog.mraw.org/2010/11/08/Debugging_using_netconsole/
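
For readers who have not set up netconsole before, here is a minimal sketch of how the module parameter is put together. It follows the format described in the kernel's netconsole documentation linked above, `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The helper name and all addresses are hypothetical, chosen only for illustration; they are not the values used in this report.

```python
def netconsole_param(dev, target_ip, target_mac=None,
                     src_ip="", src_port=6665, target_port=6666):
    """Build the value of the netconsole= module/boot parameter.

    Format: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    6665 and 6666 are the documented default source and target ports.
    """
    mac = target_mac or ""  # an empty MAC makes the kernel use broadcast
    return f"{src_port}@{src_ip}/{dev},{target_port}@{target_ip}/{mac}"


# Hypothetical addresses: the crashing machine is 192.168.1.20 and the
# machine collecting the log is 192.168.1.10.
param = netconsole_param("eth0", "192.168.1.10",
                         target_mac="00:11:22:33:44:55",
                         src_ip="192.168.1.20")
print(f"modprobe netconsole netconsole={param}")
```

On the receiving machine, any UDP listener on the target port can capture the messages.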


Now tried running startx /usr/bin/xterm with nouveau,

[   82.427553] [drm] nouveau :01:00.0: PDISP: DCB for 6/0xbad00103 not found
[   82.428536] [drm] nouveau :01:00.0: PDISP: DCB for 0/0xbad00103 not found
[   82.429483] [drm] nouveau :01:00.0: Table 0x0103 not found for 0/2, using first


I kept a previously opened SSH connection alive.
When X started, the screen went black, but the machine didn't lock up
completely until I killed the X process over SSH. After that there was no
further netconsole output; the machine went totally dead.


Rebooted, this time tried to start gdm3. This time we got some juice.

[   84.538008] [drm] nouveau :01:00.0: PDISP: DCB for 6/0xbad00103 not found
[   84.538990] [drm] nouveau :01:00.0: PDISP: DCB for 0/0xbad00103 not found
[   84.539937] [drm] nouveau :01:00.0: Table 0x0103 not found for 0/2, using first
[   85.216875] BUG: unable to handle kernel paging request at 8800f1d5f100
[   85.216940] IP: [] evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.216991] PGD 1606063 PUD 1fffc067 PMD 0
[   85.217020] Oops: 0002 [#1] SMP
[   85.217045] CPU 2
[   85.217057] Modules linked in: usb_storage uas netconsole configfs 
nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc ext2 loop 
firewire_sbp2 tpm_infineon nouveau mxm_wmi snd_hda_codec_hdmi wmi video 
snd_usb_audio snd_usbmidi_lib uvcvideo snd_rawmidi snd_hda_codec_realtek 
ttm snd_seq_device drm_kms_helper videodev iTCO_wdt iTCO_vendor_support 
parport_pc parport psmouse pcspkr serio_raw drm i2c_algo_bit i2c_i801 
snd_hda_intel v4l2_compat_ioctl32 snd_hda_codec snd_hwdep snd_pcm 
snd_page_alloc media button processor tpm_tis tpm i2c_core tpm_bios 
evdev snd_timer snd soundcore thermal_sys ext4 crc16 jbd2 mbcache dm_mod 
raid1 md_mod sd_mod crc_t10dif sr_mod cdrom ata_generic usbhid hid 
pata_jmicron firewire_ohci firewire_core crc_itu_t uhci_hcd r8169 mii 
ahci libahci libata scsi_mod ehci_hcd usbcore usb_common [last unloaded: 
scsi_wait_scan]

[   85.217816]
[   85.217826] Pid: 2605, comm: Xorg Not tainted 3.2.0-2-amd64 #1 
Gigabyte Technology Co., Ltd. EP45-UD3P/EP45-UD3P
[   85.217885] RIP: 0010:[]  [] 
evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.217942] RSP: 0018:88021be35cd8  EFLAGS: 00010212
[   85.217969] RAX: 88003705f000 RBX: 88021cdc7000 RCX: 

[   85.218003] RDX: 2eb40040 RSI: 0064 RDI: 
88021cdc7000
[   85.218037] RBP: 88021b66bac0 R08:  R09: 
8168d880
[   85.218071] R10: ffea R11: ffea R12: 
2eb40050
[   85.218105] R13: 88021b859001 R14: 88021db4cbc0 R15: 
88021dc03c00
[   85.218140] FS:  7fbf30962880() GS:880227d0() 
knlGS:

[   85.218178] CS:  0010 DS:  ES:  CR0: 80050033
[   85.218207] CR2: 8800f1d5f100 CR3: 00021dac7000 CR4: 
000406e0
[   85.218241] DR0:  DR1:  DR2: 

[   85.218275] DR3:  DR6: 0ff0 DR7: 
0400
[   85.218310] Process Xorg (pid: 2605, threadinfo 88021be34000, 
task 88021ddb35d0)

[   85.218348] Stack:
[   85.218359]  88021b859000 88021cdc7000 88021cdc7001 
a0474303
[   85.218406]  88021b859001 88021b859000  
88021cdc7020
[   85.218451]  88021b859001 a04746f4 c01c64a3 
88021be35df0

[   85.218497] Call Trace:
[   85.218519]  [] ? nvd0_crtc_cursor_show+0x21/0xf8 [nouveau]
[   85.218562]  [] ? nvd0_crtc_cursor_set+0xd5/0xf1 [nouveau]
[   85.218602]  [] ? drm_mode_cursor_ioctl+0xe5/0x13d [drm]

[   85.218703]  [] ? drm_ioctl+0x289/0x35e [drm]
[   85.218738]  [] ? drm_mode_setcrtc+0x376/0x376 [drm]
[   85.218772]  [] ? do_page_fault+0x2fc/0x337
[   85.218802]  [] ? do_vfs_ioctl+0x459/0x49a
[   85.218831]  [] ? sys_ioctl+0x4b/0x72
[   85.218859]  [] ? system_call_fastpath+0x16/0x1b
[   85.21] Code: 89 fb 48 8b a8 08 0f 00 00 e8 06 f4 ff ff 89 c2 c1 
ea 02 41 01 d4 41 81 fc ff 03 00 00 76 66 48 8b 45 10 be 00 00 64 00 48 
89 df  04 90 00 00 00 20 31 d2 e8 ed f3 ff ff 45 31 c0 83 c9 ff ba
[   85.219253] RIP  [] evo_wait.constprop.13+0x3f/0xaa [nouveau]

[   85.219299]  RSP 
[   85.219316] CR2: 8800f1d5f100
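
As a reading aid for the "Oops: 0002" line above: on x86 the number after "Oops:" is the page-fault error code in hex. The sketch below decodes it using the standard x86 bit meanings; the function and table names are made up for this example.

```python
# x86 page-fault error-code bits, as printed in hex after "Oops:".
# Each entry: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(text):
    """Translate the hex error code from an 'Oops: NNNN' line into words."""
    code = int(text, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


print(decode_oops_code("0002"))
```

For the code in this report, that gives a kernel-mode write to a page that is not mapped, which fits the "PMD 0" entry in the page-table dump.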


Tried pinging the machine at this point and netconsole printed this:

[  240.648014] INFO: task kworker/2:1:30 blocked for more than 120 seconds.
[  240.648064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  240.648122] kworker/2:1 D 880227d13540 030  2 
0x
[  240.648192]  88021ef4f020 0046  
88021ed39610
[  240.648272]  00013540 88021ef53fd8 88021ef53fd8 
88021ef4f020
[  240.648351]  00

Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-30 Thread Gedalya

On 5/31/2012 1:59 AM, Jonathan Nieder wrote:

> Hi,
>
> Gedalya wrote:
>
>> Tried removing the nvidia stuff and booting up with nouveau.
> [...]
>> Total system hang as soon as xorg starts. No response from keyboard,
>> mouse, no response on the network (no ping, no ARP response)
> [...]
>> Using an Nvidia GeForce GT 520.
>
> Do I understand correctly that the system works fine until X starts
> (e.g., if you use the "text" kernel command line option)?

Yes. As long as X doesn't start it seems rock solid.

> Does X with
> the fbdev driver work?  (You can test by putting the following in
> /etc/X11/xorg.conf.)
>
> Section "Device"
>     Identifier "geforce"
>     Driver "fbdev"
> EndSection

Yes. This does work. BTW driver "vesa" hangs the same.

> Does starting X with the nouveau driver work if you do not start a
> GNOME session?  (You can test by running "startx xterm" if the xinit
> package is installed.)

Interesting. Still working on this one. startx was complaining something 
about xorg.conf so I just renamed it, and then it started up. Pretty 
frozen at this point, no keyboard, only reset button helps, but I do get 
network response - there is ping, initial ssh response but nothing more, 
can't actually log in.

> Thanks for a clear report.
>
> Hope that helps,
> Jonathan








Bug#675302: nouveau: hard lockup when gdm3 starts

2012-05-30 Thread Gedalya

On 5/31/2012 2:31 AM, Jonathan Nieder wrote:

> Gedalya wrote:
>> On 5/31/2012 1:59 AM, Jonathan Nieder wrote:
>
>>> Does starting X with the nouveau driver work if you do not start a
>>> GNOME session?  (You can test by running "startx xterm" if the xinit
>>> package is installed.)
>>
>> Interesting. Still working on this one. startx was complaining
>> something about xorg.conf so I just renamed it, and then it started
>> up. Pretty frozen at this point, no keyboard, only reset button
>> helps, but I do get network response - there is ping, initial ssh
>> response but nothing more, can't actually log in.
>
> Hm.  Might be possible to get a log with netconsole[1].
>
> [1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt
> http://blog.mraw.org/2010/11/08/Debugging_using_netconsole/


Tried this again and this time I got a total hang again.

Then I got it to work with the config file. I first tried fbdev and it 
worked; when I switched to nouveau, I again got the situation where the 
machine is pretty much hung but still responds to ping.


SSH does respond with failed authentication when it's the wrong 
password, but no successful login is possible.


I'm gonna study netconsole now, let's see if I can figure it out.







Bug#675302: linux-image-3.2.0-2-amd64: system totally hangs when using nouveau

2012-05-30 Thread Gedalya

Package: linux-2.6
Version: 3.2.17-1
Severity: important

Dear Maintainer,

   * What led up to the situation?

Wanted to change back from the nonfree nvidia driver to nouveau.

   * What exactly did you do (or not do) that was effective (or
 ineffective)?

Tried removing the nvidia stuff and booting up with nouveau.

   * What was the outcome of this action?

Total system hang as soon as xorg starts. No response from keyboard, 
mouse, no response on the network (no ping, no ARP response)


   * What outcome did you expect instead?

Something better!


Using an Nvidia GeForce GT 520.

I tried a new, clean install of wheezy. After installing lxde (my 
favorite), the system freezes at the moment when gdm3 starts up.


I'm pretty certain this is not debian-specific, since I tried booting up 
parted magic (v. 2012_05_30) from a USB stick. It does the same exact 
thing. However parted magic works when I choose the xvesa boot option. 
The nouveau driver is still in use but I guess the fact that only 
non-accelerated VESA modes are being used helps.


A while back (around linux 3.0.0) nouveau and gnome 2.30 did work 
on this same machine; at the time I changed to the nvidia drivers because 
of more minor issues.

I'm filing this against the kernel since this is a total system hang.

The system information below is generated from my working system, using 
the nvidia driver.



-- Package-specific info:
** Version:
Linux version 3.2.0-2-amd64 (Debian 3.2.17-1) 
(debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-5) ) 
#1 SMP Sat May 12 23:08:28 UTC 2012


** Command line:
BOOT_IMAGE=/vmlinuz-3.2.0-2-amd64 root=/dev/mapper/nws_vg0-rootfs ro quiet

** Tainted: PO (4097)
 * Proprietary module has been loaded.
 * Out-of-tree module has been loaded.

** Kernel log:
[5.724696] PM: Image not found (code -22)
[5.724698] PM: Hibernation image not present or could not be loaded.
[5.765388] EXT4-fs (dm-0): mounted filesystem with ordered data 
mode. Opts: (null)

[7.093488] udevd[399]: starting version 175
[7.664239] input: Power Button as 
/devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input2

[7.664243] ACPI: Power Button [PWRB]
[7.664319] input: Power Button as 
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3

[7.664321] ACPI: Power Button [PWRF]
[7.725543] input: PC Speaker as /devices/platform/pcspkr/input/input4
[7.755010] i801_smbus :00:1f.3: PCI INT C -> GSI 18 (level, low) 
-> IRQ 18

[7.867662] Marking TSC unstable due to TSC halts in idle
[7.867668] ACPI: acpi_idle registered with cpuidle
[7.868903] Switching to clocksource hpet
[7.878614] iTCO_vendor_support: vendor-support=0
[7.950400] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07
[7.950492] iTCO_wdt: Found a ICH10R TCO device (Version=2, 
TCOBASE=0x0460)

[7.950566] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[7.972393] snd_hda_intel :00:1b.0: PCI INT A -> GSI 22 (level, 
low) -> IRQ 22

[7.972441] snd_hda_intel :00:1b.0: irq 48 for MSI/MSI-X
[7.972471] snd_hda_intel :00:1b.0: setting latency timer to 64
[8.009769] tpm_tis 00:09: 1.2 TPM (device-id 0xB, rev-id 16)
[8.194326] hda_codec: ALC889A: BIOS auto-probing.
[8.203338] input: HDA Intel Headphone as 
/devices/pci:00/:00:1b.0/sound/card0/input5
[8.203836] snd_hda_intel :01:00.1: PCI INT B -> GSI 17 (level, 
low) -> IRQ 17

[8.203839] hda_intel: Disabling MSI
[8.203876] snd_hda_intel :01:00.1: setting latency timer to 64
[8.467460] Linux media interface: v0.10
[8.496188] nvidia: module license 'NVIDIA' taints kernel.
[8.496191] Disabling lock debugging due to kernel taint
[8.69] Linux video capture interface: v2.00
[8.696074] tpm_tis 00:09: Adjusting TPM timeout parameters.
[8.700016] HDMI status: Codec=0 Pin=4 Presence_Detect=0 ELD_Valid=0
[8.700526] uvcvideo: Found UVC 1.00 device  (046d:0990)
[8.714893] input: UVC Camera (046d:0990) as 
/devices/pci:00/:00:1d.7/usb2/2-4/2-4:1.0/input/input6

[8.714981] usbcore: registered new interface driver uvcvideo
[8.714982] USB Video Class driver (1.1.1)
[8.732021] HDMI status: Codec=0 Pin=5 Presence_Detect=0 ELD_Valid=0
[8.748101] input: HDA NVidia HDMI/DP,pcm=7 as 
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input7
[8.748197] input: HDA NVidia HDMI/DP,pcm=3 as 
/devices/pci:00/:00:01.0/:01:00.1/sound/card1/input8
[8.748461] nvidia :01:00.0: PCI INT A -> GSI 16 (level, low) -> 
IRQ 16

[8.748471] nvidia :01:00.0: setting latency timer to 64
[8.748477] vgaarb: device changed decodes: 
PCI::01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[8.748651] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  295.49  
Mon Apr 30 23:46:33 PDT 2012

[9.057232] usbcore: registered new interface driver snd-usb-audio
[9.468054] parport_pc 00:08: reported by Plug and Play ACPI
[9.468107] parport0: P

Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen (also affects ext3 as of linux-image-3.1.0-1-amd64 et al)

2012-03-04 Thread Gedalya

notfound 637234 3.2.6-1
notfound 637234 2.6.32-41
thanks

Timo & all,

This has been a bug in the dom0 kernel. If I've been following this bug 
correctly, then the only thing about the 3.0+ kernels is that they have 
barriers enabled by default, which we then disable as a workaround. But 
this is supposed to work, and it was mishandled by the kernel in dom0. I 
therefore see no point in marking this bug as found in 3.0+ kernels used 
in domU's.


This bug was apparently fixed in 2.6.32-40 or 2.6.32-41. I waited for 
2.6.32-41 and am now running that kernel in my dom0. You have to 
upgrade your dom0 to the latest kernel. I now have various wheezy domU's 
with barriers enabled again, running with no issues.


Regards,

Gedalya


On 02/07/2012 05:17 AM, Timo Juhani Lindfors wrote:

> package linux-2.6
> notfound 637234 3.0.0-3
> found 637234 3.1.0-1~experimental.1
> found 637234 3.1.1-1
> found 637234 3.1.8-2
> found 637234 3.2.4-1
> thanks
>
> dom0
>
> amd64 squeeze with
> Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-39) (da...@debian.org) (gcc
> version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Thu Nov 3 05:42:31 UTC 2011
>
> config
> --
>
> name = 'lindi2'
> vcpus = '4'
> memory = '1024'
> disk = [ 'file:/local/xen/lindi2/disk.img,xvda,w' ]
> vif = [ 'mac=00:01:01:99:80:02' ]
> on_crash = 'restart'
>
> domU
>
> amd64 wheezy with
> Linux version 3.1.0-1-amd64 (Debian 3.1.0-1~experimental.1) (wa...@debian.org)
> (gcc version 4.6.1 (Debian 4.6.1-16) ) #1 SMP Thu Nov 3 19:35:59 UTC 2011
>
> (also occurs with 3.1.1-1, 3.1.8-2 and 3.2.4-1 but listing the first
> version where the bug occurs)
>
> Using ext3 for rootfs. dmesg has
>
> ...
> blkfront: barrier: empty write xvda op failed
> blkfront: xvda: barrier or flush: disabled
> end_request: I/O error, dev xvda, sector 16519664
> end_request: I/O error, dev xvda, sector 16519664
> ...
>
> Full dmesg attached.
>
> workaround
> ==
>
> Setting barrier=0 for / in fstab seems to help. Thanks Konrad for the
> tip, now I can continue working :-)





Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-26 Thread Gedalya



> One way to make sure that is not the case is to disable barriers in the
> guest. Meaning in /etc/fstab have something like this:
>
> /dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
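
To make the suggested fstab change concrete, here is a small sketch that adds or replaces a mount option in the fourth field of a single fstab line. The helper name is hypothetical, and the sketch only rewrites a string; it does not touch /etc/fstab itself.

```python
def set_mount_option(fstab_line, option):
    """Return fstab_line with `option` added to its mount-options field.

    An existing option with the same key (the text before '=') is replaced.
    Fields are: device, mount point, fs type, options, dump, pass.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least four fstab fields")
    key = option.split("=", 1)[0]
    options = [o for o in fields[3].split(",") if o.split("=", 1)[0] != key]
    options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


line = "/dev/xvdc /blah ext4 errors=remount-ro 0 1"
print(set_mount_option(line, "barrier=0"))
```

The new option only takes effect after the filesystem is remounted or the guest is rebooted.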


That seems to fix it. It was remounting as read only either during the 
boot process or immediately after, and now it boots up and seems to stay 
up. I'll test later with a DomU that actually has things running.


This also fixes the reboot problem I noted earlier, init 6 now reboots 
the DomU rather than destroying it.




> The other question is what version of Dom0 are you running? Is it 2.6.32?
> 2.6.39?

squeeze, running linux-image-2.6.32-5-xen-amd64  2.6.32-35








Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-25 Thread Gedalya



> Gedalya, 2.6.39-2-686-pae could be anything from v2.6.39..v2.6.39.2
> please could you confirm which package version you have installed in
> case it makes a difference.


root@mail:~# uname -a
Linux mail 2.6.39-2-686-pae #1 SMP Tue Jul 5 03:48:49 UTC 2011 i686 
GNU/Linux

root@mail:~# dpkg -l | grep linux-image
ii  linux-image-2.6-686-pae       3.0.0+39  Linux for modern PCs (dummy package)
ii  linux-image-2.6.39-2-686-pae  2.6.39-3  Linux 2.6.39 for modern PCs
ii  linux-image-3.0.0-1-686-pae   3.0.0-1   Linux 3.0.0 for modern PCs
ii  linux-image-686-pae           3.0.0+39  Linux for modern PCs (meta-package)

root@mail:~#







Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-14 Thread Gedalya

This has already been reported here:
http://www.mail-archive.com/ubuntu-server@lists.ubuntu.com/msg05621.html

Generally I use lvm2 as storage backend, but I've just reproduced the 
exact same behavior using raw image files.


I get the following exactly every other time the VM boots:

[6.837760] end_request: I/O error, dev xvda, sector 4456680
[6.837783] end_request: I/O error, dev xvda, sector 4456680
[6.837824] Aborting journal on device xvda-8.
[6.845859] EXT4-fs error (device xvda): ext4_journal_start_sb:296: 
Detected aborted journal

[6.845945] EXT4-fs (xvda): Remounting filesystem read-only


The following time it runs fsck:

Checking root file system...fsck from util-linux 2.19.1
/dev/xvda contains a file system with errors, check forced.
/dev/xvda: 26427/327680 files (0.2% non-contiguous), 196522/1310720 blocks
done.

and then seems to work fine, until the next reboot.
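
When comparing several boots, it can help to pull the failing device and sector out of the console output automatically. The sketch below does that for the "end_request: I/O error" lines shown above. The function name is made up, and the sample text is copied from this report.

```python
import re

# Matches lines such as:
#   [6.837760] end_request: I/O error, dev xvda, sector 4456680
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text):
    """Return {device: sorted list of distinct failing sectors}."""
    found = {}
    for dev, sector in IO_ERROR.findall(log_text):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


sample = """\
[6.837760] end_request: I/O error, dev xvda, sector 4456680
[6.837783] end_request: I/O error, dev xvda, sector 4456680
[6.837824] Aborting journal on device xvda-8.
"""
print(failing_sectors(sample))
```

Running this over the logs of each failing boot shows whether the same sector is hit every time or whether it moves.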







Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-13 Thread Gedalya

Confirmed this happens under xen, not under native hardware or vmware.
Reproduced on various xen hosts with both i686-pae and amd64 kernels in 
DomU.


On a side note: DomU doesn't come back up on a restart, it seems like 
it's destroyed and not recreated. Seems like this too was introduced 
with linux 3.0.0-1, perhaps a separate bug should be filed?








Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-09 Thread Gedalya
Package: linux-2.6
Version: 3.0.0-1
Severity: important


Hello,

I have a xen host running debian squeeze, amd64, some of the DomU's are
running wheezy. My mail server is a DomU called "mail", using ext4 for the
root (and other) FS. A dist-upgrade on "mail" has upgraded the kernel to
linux-image-3.0.0-1-686-pae, and at this point I started getting I/O errors
during the boot process, as follows:

---
Starting MySQL database server: mysqld[6.453894] end_request: I/O error, 
dev xvda, sector 4456704
[6.453919] end_request: I/O error, dev xvda, sector 4456704
[6.453964] Aborting journal on device xvda-8.
[6.462873] EXT4-fs error (device xvda): ext4_journal_start_sb:296: Detected 
aborted journal
[6.462903] EXT4-fs (xvda): Remounting filesystem read-only
[6.463276] journal commit I/O error
 . . . . . . . . . . . . . . failed!
Starting MTA: exim4.
Starting IMAP/POP3 mail server: dovecot.
startpar: service(s) returned failure: mysql ... failed!
---

So I went ahead and installed wheezy on a brand new DomU, and this
was repeated immediately when booting the machine after the installation
completed.

---
Starting NFS common utilities: statd[3.977392] end_request: I/O error, dev 
xvda, sector 4456808
[3.977415] end_request: I/O error, dev xvda, sector 4456808
[3.977470] Aborting journal on device xvda-8.
[3.990442] journal commit I/O error
[3.991041] EXT4-fs error (device xvda): ext4_journal_start_sb:296: Detected 
aborted journal
[3.991126] EXT4-fs (xvda): Remounting filesystem read-only
 failed!
Cleaning up temporary files
Setting up console font and keymap...done.
startpar: service(s) returned failure: nfs-common ... failed!
INIT: Entering runlevel: 2
Using makefile-style concurrent boot in runlevel 2.
Starting rpcbind daemon...Already running..
Starting NFS common utilities: statd failed!
touch: cannot touch `/var/log/dmesg.new': Read-only file system
chown: cannot access `/var/log/dmesg.new': No such file or directory
chmod: cannot access `/var/log/dmesg.new': No such file or directory
ln: creating hard link `/var/log//dmesg.0': Read-only file system
... etc. ...
---

Now, it happens this way exactly every _other_ time the machines boot.
When I reboot after these I/O errors, fsck is run and then the machine
seems to be actually fine until the next reboot when it all happens
again.

For me, this is happening on xen DomU's, only when running linux
3.0.0-1-686-pae, only when using ext4 for the root FS.
No problems when booting back to 2.6.39-2-686-pae.

Please let me know what more specific testing needs to be done, if
necessary I can test more platforms / flavors.

I have observed nothing to suggest this is related to xen, it's just my
platform here.

-- Package-specific info:
** Version:
Linux version 3.0.0-1-686-pae (Debian 3.0.0-1) (b...@decadent.org.uk) (gcc 
version 4.5.3 (Debian 4.5.3-3) ) #1 SMP Sun Jul 24 14:27:32 UTC 2011

** Command line:
root=UUID=8a1a7bca-b0e2-4714-baf1-b852eab25843 ro  quiet 

** Not tainted

** Kernel log:
[0.016117] PCI: System does not support PCI
[0.016120] PCI: System does not support PCI
[0.016231] Switching to clocksource xen
[0.017739] pnp: PnP ACPI: disabled
[0.017742] PnPBIOS: Disabled
[0.018820] Switched to NOHz mode on CPU #1
[0.018902] Switched to NOHz mode on CPU #0
[0.020460] PCI: max bus depth: 0 pci_try_num: 1
[0.020696] NET: Registered protocol family 2
[0.020967] IP route cache hash table entries: 8192 (order: 3, 32768 bytes)
[0.021437] TCP established hash table entries: 32768 (order: 6, 262144 
bytes)
[0.021752] TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
[0.022063] TCP: Hash tables configured (established 32768 bind 32768)
[0.022069] TCP reno registered
[0.022077] UDP hash table entries: 512 (order: 2, 16384 bytes)
[0.022100] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[0.022469] NET: Registered protocol family 1
[0.022486] PCI: CLS 0 bytes, default 64
[0.022574] Unpacking initramfs...
[0.042069] Freeing initrd memory: 22480k freed
[0.046257] platform rtc_cmos: registered platform RTC device (no PNP device 
found)
[0.046605] audit: initializing netlink socket (disabled)
[0.046616] type=2000 audit(1312911347.921:1): initialized
[0.056740] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[0.057039] VFS: Disk quotas dquot_6.5.2
[0.057099] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[0.057194] msgmni has been set to 999
[0.057354] alg: No test for stdrng (krng)
[0.057382] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 
253)
[0.057386] io scheduler noop registered
[0.057388] io scheduler deadline registered
[0.057402] io scheduler cfq registered (default)
[0.057598] isapnp: Scanning for PnP cards...
[0.409558] isapnp: No Plug & Play device found
[0.409873] Serial: 8250/16550 dr