Bug#511603: linux-image-2.6.26-1-amd64: iwl4965 panic

2009-01-12 Thread Noah Meyerhans
On Mon, Jan 12, 2009 at 12:01:11PM -0600, Michael Ekstrand wrote:
> For some time now, I have been experiencing kernel panics (system
> frozen, caps lock LED blinking) when using my Intel 4965AGN wireless
> card (in a Thinkpad R61).  I have only noticed the problems when
> connected to my school's WPA-PEAP network.  It usually works for some
> time and then panics; I am running in X11, so I do not have a dump
> available presently.  I have no problems when wireless is disabled.
> 
> Based on the timing and the varying natures of networks in which I have
> and have not observed the problem, I suspect that it is being triggered
> by roaming between access points on the same network.  I do not know if
> the problem persists or not when using a multi-AP unsecured network.  I
> have no problems on my home network, which is secured with WPA-PSK and
> has a single access point.  I am using wpa_supplicant in roaming mode to
> manage wireless connectivity.
> 
> I am using firmware from the firmware-iwlwifi package version 0.14.

We'll know better when we've got a panic log, but this sounds quite a
bit like #502326, which is fixed by a newer firmware image for the 4965
chipset.

FWIW, I needed NETCONSOLE and the laptop's wired interface to get the
panic log on my machine.
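
For anyone setting this up, a minimal sketch of assembling the netconsole= parameter follows.  It assumes the layout described in the kernel's netconsole documentation (source port, source IP and device, then target port, target IP and an optional target MAC).  The helper name and every address are placeholders, not values from this report.

```python
def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="",
                     src_port=6665, tgt_port=6666):
    """Build the value for the netconsole= module or boot parameter.

    Layout assumed here: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
    The default ports mirror the documented defaults; an empty MAC
    leaves the target MAC field blank.
    """
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


# Placeholder addresses: the laptop's wired interface sends to a second
# machine on the same network segment.
print("modprobe netconsole netconsole="
      + netconsole_param("192.168.1.10", "eth0", "192.168.1.20"))
```

The printed command is meant to be run as root on the crashing machine before reproducing the panic.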

After you're able to get a panic log, try installing a new firmware
image from http://www.intellinuxwireless.org/?n=downloads&f=ucode  and
see if it helps.  Make sure you reload the module to get the new
firmware image loaded (or just reboot).

noah



signature.asc
Description: Digital signature


Bug#511603: linux-image-2.6.26-1-amd64: iwl4965 panic

2009-01-13 Thread Noah Meyerhans
On Tue, Jan 13, 2009 at 04:24:03PM -0600, Michael Ekstrand wrote:
> > I have tried to get a netconsole running, but so far have had no
> > success.  If I can get a crash with debug=1 this afternoon, I'll post
> > any additional info it yields; otherwise, I'll try either the updated
> > firmware or the snapshot build and see if I can last tomorrow without
> > crashing.
> 
> I have had a crash with debug=1, but it did not have any additional info
> in /var/log/messages.

That's not surprising.  The kernel is likely dead before it gets to
write the messages out to the file.  You'll really need to set up
netconsole...
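
On the receiving side, netconsole messages arrive as plain UDP datagrams, so any UDP listener will do.  A minimal sketch in Python, assuming the target port is 6666 (the function names are made up for illustration):

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, count, timeout=5.0):
    """Collect `count` datagrams, decoded as text with the newline stripped."""
    sock.settimeout(timeout)
    lines = []
    for _ in range(count):
        data, _addr = sock.recvfrom(65535)
        lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return lines
```

Since the receiving machine stays up when the laptop panics, whatever it has collected up to that point survives and can be written to a file there.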

noah





Bug#502326: Current status?

2009-01-20 Thread Noah Meyerhans
On Tue, Jan 13, 2009 at 02:03:10PM +0100, Oliver Bock wrote:
> I'd like to know the current status of this bug as the root cause got 
> fixed upstream apparently. Is the fixed microcode already scheduled for 
> integration into "firmware-iwlwifi" already?

The bug still occurs in linux-image-2.6.26-1-686 (2.6.26-13) even with
the new firmware.  I believe we also need
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commitdiff;h=55d6a3cd0cc85ed90c39cf32e16f622bd003117b;hp=47cbb1107e4172f3632713d74dc8651a32ceb294
though I'm not completely sure why.  It seems to me that, if the
firmware fixes the problem, the above patch shouldn't be needed.

In any case, I'm going to build 2.6.26 with the above patch for testing.
If it helps, it'd be good to get it included in an upcoming lenny
revision.  The lenny kernel is pretty much unusable for me with this bug
present...

noah





Bug#505174: linux-image-2.6.26-1-686: After installed vmware-tools, the mouse stuck in debian (i'm running vmware workstation 6.5 in Win XP)

2009-02-25 Thread Noah Meyerhans
On Wed, Feb 25, 2009 at 11:27:56PM +, Orlando Agostinho wrote:
> Package: linux-image-2.6.26-1-686
> Version: 2.6.26-13
> Followup-For: Bug #505174

Why is this a followup for 505174?  It doesn't seem at all related.

> 
> I have vmware  workstation 6.5 running in WIN XP. After, installed
> debian lenny, the mouse stuck in debian lenny!
> 

I've seen this as well, but since vmware seems to have caused it, and
their software is most definitely non-free, I don't think this is a bug
in Linux or anything else Debian needs to care about.

FWIW, you can work around this by editing your xorg.conf and replacing
vmmouse with mouse.
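
A minimal sketch of that edit, assuming the pointer is configured through a Driver "vmmouse" line in an InputDevice section (the function name is made up, and the text would normally be read from /etc/X11/xorg.conf):

```python
import re

# Matches a Driver line naming vmmouse, keeping its indentation.
_VMMOUSE_DRIVER = re.compile(r'^([ \t]*Driver[ \t]+)"vmmouse"',
                             re.IGNORECASE | re.MULTILINE)


def use_generic_mouse(xorg_conf_text):
    """Return the config text with the vmmouse driver replaced by mouse."""
    return _VMMOUSE_DRIVER.sub(r'\1"mouse"', xorg_conf_text)
```

Keep a copy of the original file before writing the result back, and restart X for the change to take effect.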

noah





Bug#524373: linux-2.6: /dev/mem rootkit vulnerability

2009-04-16 Thread Noah Meyerhans
On Thu, Apr 16, 2009 at 11:55:05AM -0400, Michael S. Gilbert wrote:
> as seen in recent articles and discussions, the linux kernel is
> currently vulnerable to rootkit attacks via the /dev/mem device.  one
> article [1] mentions that there is an existing patch for the problem,
> but does not link to it.  perhaps this fix can be found in the kernel
> mailing lists.

There's no vulnerability there.  /dev/mem is only writable by root.

The research (if there's really any research involved) just shows how
you could hide files or processes by manipulating /dev/mem.  That's been
known for ages.  That's why you don't let your users write to /dev/mem.
If the attacker has root, who cares what means they use to hide their
presence?  You've already lost.
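
To check the claim on a given system, a minimal sketch that reads the mode bits of a path such as /dev/mem.  It only looks at classic Unix permissions, so ACLs and capabilities are outside its scope, and the function names are made up for illustration.

```python
import os
import stat


def write_access(path):
    """Summarise who may write to `path` according to its mode bits."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "group_writable": bool(st.st_mode & stat.S_IWGRP),
        "world_writable": bool(st.st_mode & stat.S_IWOTH),
    }


def root_only_writable(path):
    """True when root owns the path and neither group nor others can write."""
    info = write_access(path)
    return (info["owner_uid"] == 0
            and not info["group_writable"]
            and not info["world_writable"])
```

Calling root_only_writable("/dev/mem") on the machine in question answers whether anyone other than root can write to the device.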

noah





Bug#524373: linux-2.6: /dev/mem rootkit vulnerability

2009-04-16 Thread Noah Meyerhans
On Thu, Apr 16, 2009 at 04:21:10PM -0400, Michael S. Gilbert wrote:
> 
> i think that any flaw that allows an attacker to elevate his pwnage from
> root to hidden should always be considered a grave security issue.

Your argument sounds like the one used by RIAA, MPAA etc, based on the
DMCA's anti-circumvention clause, to keep things like open source dvd
players illegal.  Just because something can be used for malicious
purposes doesn't mean its existence is a bad thing.  There are reasons
for /dev/mem to exist, and why you might want to manipulate kernel state
through it.  Many of these do not involve rootkits.

The support for dynamically loadable kernel modules in Linux can be
abused similarly.  Does that make it a "grave security issue"?

But as Dann pointed out, we'll have CONFIG_STRICT_DEVMEM in the future
to help minimize exposure.

If you want to continue this discussion, I propose to do it outside the
BTS.

noah





Bug#528504: linux-image-k7: K7 Image not availble

2009-05-14 Thread Noah Meyerhans
On Wed, May 13, 2009 at 12:35:47PM +0200, Francis Debord wrote:
> I got a AMD Geode NX 1700, also know as AMD-AthlonXP-Mobile with 
> Multiprocessorsupport (an unlocked Athlon / thouroughbred II) on PCCHips 
> M848A Motherboard.
> 
> When an other sad i686-kernel is working fine and will say i am lying, 
> he had to come here and bring out the proofs.
> 
> Show me, what you know better than me!

Francis, please refer to the following:

spider:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) processor
stepping        : 2
cpu MHz         : 1200.122
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow up
bogomips        : 2403.12
clflush size    : 32
power management:

spider:~$ uname -r
2.6.26-2-686
spider:~$ uptime
 12:16:51 up 19 days,  6:58,  3 users,  load average: 0.32, 0.25, 0.17


Uptimes of well over 100 days are common on this system.  The 686 kernel
fully supports the AMD Athlon processor.
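
The same check can be scripted.  The sketch below parses /proc/cpuinfo text and applies a rough test for whether a CPU looks suitable for a 686-class kernel.  The criterion (CPU family 6 or higher, plus the fpu, cx8 and cmov flags) is an assumption made for illustration, not the packaging's authoritative requirement, and the function names are made up.

```python
# Assumed minimum feature set for a 686-class kernel (illustrative only).
REQUIRED_FLAGS = {"fpu", "cx8", "cmov"}


def parse_cpuinfo(text):
    """Return one dict per processor block of /proc/cpuinfo-style text."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def looks_686_capable(cpu):
    """Apply the assumed criterion to one parsed processor block."""
    flags = set(cpu.get("flags", "").split())
    return int(cpu.get("cpu family", "0")) >= 6 and REQUIRED_FLAGS <= flags
```

Fed the output above (with the flags on a single line), the Athlon passes: it reports family 6 and lists all three flags.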

noah





Re: Debian Kernel Group Meeting

2009-10-23 Thread Noah Meyerhans
On Fri, Oct 23, 2009 at 11:06:55AM +0200, Bastian Blank wrote:
> > > > xen dom 0
> > > > +
> > Are you basing your patch on the PV-ops tree or on one of the SuSE
> > forward ports? Based on your recent posts to xen-devel I had assumed the
> > former but Ben's initial post suggested perhaps the latter.
> 
> It is based on the pv-ops tree.
> 
> Bastian

If this is the case, and if we assume that pv-ops xen support is
eventually included upstream (which certainly appears to be the goal),
why the deprecation notice?

noah





Re: Debian Kernel Group Meeting

2009-10-23 Thread Noah Meyerhans
On Fri, Oct 23, 2009 at 04:43:31PM +0200, maximilian attems wrote:
> > If this is the case, and if we assume that pv-ops xen support is
> > eventually included upstream (which certainly appears to be the goal),
> > why the deprecation notice?
> 
> who stepped to properly maintain the patch in the current
> for the next stable release and more important during it's hole
> lifetime!?

AIUI, the pv-ops xen support is under active development and will be
included in the mainline kernel.  There will be no separate patch to
maintain.  DomU support is already mainline.  See
http://wiki.xensource.com/xenwiki/XenParavirtOps for details.

noah





Re: Debian Kernel Group Meeting

2009-10-23 Thread Noah Meyerhans
On Fri, Oct 23, 2009 at 05:08:15PM +0200, maximilian attems wrote:
> you still didn't read or get my question:
> Who will maintain the chosen pvops patch for the Squeeze lifetime?
> 
> yes, this is the part of the work:
> * backward port important fixes, security patches
> * read bug reports, decipher oopses and be able to fix them

When I hear that this feature will be marked as "deprecated" in squeeze,
I interpret that to mean that it is supported, but that it is a legacy
feature and will be removed from future versions.  In other words, I
assume that the kernel team has already decided how things will be
handled within squeeze.  If this isn't what you mean by "deprecated",
could you please clarify what your intentions are, regarding Xen dom0
support (pv-ops or otherwise), in squeeze, and in squeeze+1?

I don't mean to trivialize the work that must go in to supporting
pv-ops/dom0 support in squeeze.  If nobody is volunteering to take on
this task, I understand that it is a significant issue.  I'm just hoping
to get a clarification of what the actual situation is.

Thanks.
noah





Re: 2.6.29 Kernel, Lenny backports

2009-10-26 Thread Noah Meyerhans
On Mon, Oct 26, 2009 at 09:04:04PM +, Steve Gane wrote:
> I'm trying to use the linux-headers-2.6.29-bpo.2-686 package from  
> Lenny/backports, but aptitude says it depends on linux-kbuild-2.6.29,  
> which is not available in Lenny, or Lenny backports, or Squeeze.

(This question should have been posted to debian-user or (even better)
backports-users, not debian-kernel. (*) This list is used for
coordination of development efforts.)

I ran into this exact problem today, and took it to mean "You should be
using 2.6.30 from backports.org instead".  All is now well.

noah

* http://lists.backports.org/mailman/listinfo/backports-users





Re: Linux image packages going to depend on python

2009-11-30 Thread Noah Meyerhans
On Sun, Nov 29, 2009 at 02:15:41PM -0600, Manoj Srivastava wrote:
> Perhaps you should consider making the script just create a
>  ./fstab.new file, and not overwriting /etc/fstab?  makes it easier to
>  test the script out without altering current setup.

Keeping a copy of the original file, maybe in /var/backups, might be
helpful as well.
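
Taken together, the two suggestions might look like the sketch below.  It is not the script under discussion: the function name, the ".orig" backup suffix, and writing fstab.new next to the original file (rather than in the current directory) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def stage_fstab(new_contents, fstab="/etc/fstab", backup_dir="/var/backups"):
    """Write the proposed fstab beside the original and keep a backup copy.

    The live file is never modified; the admin (or a later step) decides
    when to move <fstab>.new into place.
    """
    fstab = Path(fstab)
    backup = Path(backup_dir) / (fstab.name + ".orig")
    shutil.copy2(fstab, backup)           # preserves contents and timestamps
    staged = fstab.with_name(fstab.name + ".new")
    staged.write_text(new_contents)
    return staged, backup
```

Because the live file is left alone, the script can be run repeatedly while testing without any risk to the current setup.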

noah





Bug#534591: linux-image-2.6.30-1-686: oops when removing i915 kernel module

2009-06-25 Thread Noah Meyerhans
Package: linux-image-2.6.30-1-686
Version: 2.6.30-1
Severity: normal

While trying to debug X issues similar to #524340, I ran "modprobe -r
i915", which resulted in the following kernel oops:

[ 2417.824670] [drm] Module unloaded
[ 2417.872150] BUG: unable to handle kernel NULL pointer dereference at 0010
[ 2417.872408] IP: [] klist_put+0xf/0x58
[ 2417.872590] *pde =  
[ 2417.872759] Oops:  [#1] SMP 
[ 2417.873006] last sysfs file: /sys/module/video/refcnt
[ 2417.873084] Modules linked in: usbhid hid ppdev parport_pc lp parport sco 
bridge stp bnep rfcomm l2cap bluetooth acpi_cpufreq cpufreq_conservative 
cpufreq_powersave cpufreq_userspace cpufreq_stats fuse dm_crypt 
snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep arc4 ecb evdev 
serio_raw snd_pcm_oss snd_mixer_oss pcspkr psmouse i2c_i801 snd_pcm uvcvideo 
i2c_core videodev v4l1_compat iwlagn iwlcore mac80211 snd_seq_midi snd_rawmidi 
thinkpad_acpi cfg80211 rfkill snd_seq_midi_event battery led_class snd_seq 
nvram ac snd_timer snd_seq_device snd soundcore snd_page_alloc processor button 
ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod 
crc_t10dif ide_pci_generic uhci_hcd ata_generic ahci libata video(-) output 
ehci_hcd intel_agp agpgart piix scsi_mod ide_core e1000e usbcore thermal fan 
thermal_sys [last unloaded: i2c_algo_bit]
[ 2417.876102] 
[ 2417.876102] Pid: 9065, comm: modprobe Not tainted (2.6.30-1-686 #1) 64781TU
[ 2417.876102] EIP: 0060:[] EFLAGS: 00010246 CPU: 0
[ 2417.876102] EIP is at klist_put+0xf/0x58
[ 2417.876102] EAX:  EBX: 0001 ECX: 0002 EDX: 0001
[ 2417.876102] ESI:  EDI: f6455d78 EBP: 0080 ESP: f43f3f10
[ 2417.876102]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 2417.876102] Process modprobe (pid: 9065, ti=f43f2000 task=f7398550 
task.ti=f43f2000)
[ 2417.876102] Stack:
[ 2417.876102]  0002  c03fccb8 0080 c030fd5f c040bd14 c040bd14 
f6455d78
[ 2417.876102]  f7398550  f822441c c026c291 f82244dc 093b9be0  
f8220029
[ 2417.876102]  c0145a54 65646976 c017006f f6bdd3c8 0246 f6bdd3c8  
b7f28000
[ 2417.876102] Call Trace:
[ 2417.876102]  [] ? klist_remove+0x51/0x7a
[ 2417.876102]  [] ? bus_remove_driver+0x59/0x88
[ 2417.876102]  [] ? cleanup_module+0xa/0x1a [video]
[ 2417.876102]  [] ? sys_delete_module+0x18e/0x1e7
[ 2417.876102]  [] ? wb_timer_fn+0xf/0x27
[ 2417.876102]  [] ? remove_vma+0x3e/0x43
[ 2417.876102]  [] ? do_munmap+0x20e/0x228
[ 2417.876102]  [] ? sysenter_do_call+0x12/0x28
[ 2417.876102] Code: 39 d3 75 e4 8b 06 fe 00 85 ed 74 08 85 ff 74 04 89 f8 ff 
d5 8b 46 04 5b 5e 5f 5d c3 55 57 89 c7 56 53 8b 30 89 d3 83 e6 fe 89 f0 <8b> 6e 
10 e8 c7 e4 00 00 84 db 74 17 f6 07 01 74 0f ba 45 00 00 
[ 2417.876102] EIP: [] klist_put+0xf/0x58 SS:ESP 0068:f43f3f10
[ 2417.876102] CR2: 0010
[ 2417.889612] ---[ end trace d76f27590b9dc7de ]---


-- Package-specific info:
** Version:
Linux version 2.6.30-1-686 (Debian 2.6.30-1) (wa...@debian.org) (gcc version 
4.3.3 (Debian 4.3.3-11) ) #1 SMP Sun Jun 14 16:11:32 UTC 2009

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.30-1-686 
root=UUID=3308441b-88a7-45e3-84dd-9181366a0f1b ro quiet

** Tainted: G D (128)

** Kernel log:
[   14.283501] iwlagn :03:00.0: firmware: requesting iwlwifi-4965-2.ucode
[   14.343159] iwlagn :03:00.0: loaded firmware version 228.57.2.23
[   14.545521] Registered led device: iwl-phy0::radio
[   14.545553] Registered led device: iwl-phy0::assoc
[   14.545577] Registered led device: iwl-phy0::RX
[   14.545600] Registered led device: iwl-phy0::TX
[   14.613964] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   16.403385] wlan0: authenticate with AP 00:22:6b:54:61:da
[   16.405475] wlan0: authenticated
[   16.405480] wlan0: associate with AP 00:22:6b:54:61:da
[   16.407676] wlan0: RX AssocResp from 00:22:6b:54:61:da (capab=0x401 status=0 
aid=1)
[   16.407680] wlan0: associated
[   16.424259] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   16.424626] wlan0: disassociating by local choice (reason=3)
[   27.056088] wlan0: no IPv6 routers present
[   74.397065] usb 5-1: new low speed USB device using uhci_hcd and address 2
[   74.573135] usb 5-1: New USB device found, idVendor=046d, idProduct=c00e
[   74.573140] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   74.573145] usb 5-1: Product: USB-PS/2 Optical Mouse
[   74.573148] usb 5-1: Manufacturer: Logitech
[   74.573273] usb 5-1: configuration #1 chosen from 1 choice
[   74.620014] usbcore: registered new interface driver hiddev
[   74.633498] input: Logitech USB-PS/2 Optical Mouse as 
/devices/pci:00/:00:1d.0/usb5/5-1/5-1:1.0/input/input10
[   74.633628] generic-usb 0003:046D:C00E.0001: input,hidraw0: USB HID v1.10 
Mouse [Logitech USB-PS/2 Optical Mouse] on usb-:00:1d.0-1/input0
[   74.633650] usbcore: registered new interface driver usbhid
[   74.633654] usbhid: v2.6:USB HID core driver
[   80.324475] nepomukservices[3725]: segfault at 4 ip

Bug#542470: closed by maximilian attems (Re: Bug#542470: linux-image-2.6.30-1-686: IPv6 can not be disabled)

2009-08-20 Thread Noah Meyerhans
On Thu, Aug 20, 2009 at 10:47:56PM +0200, advocatux wrote:
> Yep, I know I can add "ipv6.disable=1" in /boot/grub/menu.lst but this
> method doesn't work always, it depends on which 2.6.30 kernel version
> you're running.

So this bug was closed when 2.6.30 was uploaded to unstable, no?  We're
not going to support anything less than 2.6.30 with the squeeze release.

Are you claiming that something needs to be done in lenny and/or etch?
It doesn't seem like it.
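
When comparing behaviour across kernel versions, it helps to first confirm that the parameter actually reached the kernel.  A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline (the function name is made up):

```python
def ipv6_disable_requested(cmdline):
    """True when ipv6.disable=1 appears as a token on the command line."""
    return "ipv6.disable=1" in cmdline.split()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            print(ipv6_disable_requested(fh.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

This only shows what was passed at boot; whether a given kernel honours the parameter is a separate question.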

noah





Bug#502326: linux-image-2.6.26-1-686: crash in iwl3945

2008-10-15 Thread Noah Meyerhans
Package: linux-image-2.6.26-1-686
Version: 2.6.26-8
Severity: important

My thinkpad X300 has recently developed a rather unpleasant case of
instability related to the iwl3945 driver.  This bug may be identical to
#500914, though the symptoms, including stack trace, are slightly
different.  Unlike #500914, there's no reason to think that this bug is
related to proximity to the access point, as mine is pretty much right
outside my door.  Also, the crash doesn't happen on boot, but usually
happens within about 30 minutes or so.  lspci -v reports this wireless
interface as

03:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN Network Connection (rev 61)
        Subsystem: Intel Corporation Lenovo ThinkPad T51
        Flags: fast devsel, IRQ 17
        Memory at f9f0 (64-bit, non-prefetchable) [size=8K]
        Capabilities:
        Kernel driver in use: iwl4965
        Kernel modules: iwl4965

Kernel output follows:

[  878.071717] iwl4965: Error wrong command queue 63 command id 0x0
[  878.071717] [ cut here ]
[  878.071717] kernel BUG at drivers/net/wireless/iwlwifi/iwl4965-base.c:3465!
[  878.071717] invalid opcode:  [#1] SMP
[  878.071717] Modules linked in: i915 drm rfcomm l2cap bluetooth uinput ppdev 
parport_pc lp parport ipv6 acpi_cpufreq cpufreq_powersave cpufreq_stats 
cpufreq_userspace cpufreq_conservative cpufreq_ondemand freq_table netconsole 
configfs fuse loop joydev snd_hda_intel snd_seq_dummy snd_pcm_oss snd_mixer_oss 
snd_seq_oss arc4 snd_seq_midi snd_rawmidi snd_pcm ecb crypto_blkcipher psmouse 
serio_raw snd_seq_midi_event pcspkr i2c_i801 i2c_core iTCO_wdt snd_seq iwl4965 
firmware_class iwlcore rfkill mac80211 cfg80211 snd_timer snd_seq_device snd 
soundcore snd_page_alloc ac battery bay video output button intel_agp agpgart 
evdev thinkpad_acpi led_class nvram ext3 jbd mbcache ide_cd_mod cdrom sd_mod 
piix ide_pci_generic ide_core ahci ata_generic libata scsi_mod dock ehci_hcd 
uhci_hcd usbcore e1000e thermal processor fan thermal_sys
[  878.071717]
[  878.071717] Pid: 0, comm: swapper Not tainted (2.6.26-1-686 #1)
[  878.071717] EIP: 0060:[] EFLAGS: 00010092 CPU: 1
[  878.071717] EIP is at iwl4965_irq_tasklet+0x2db/0x4de [iwl4965]
[  878.071717] EAX: 0047 EBX: f16b4000 ECX:  EDX: 0092
[  878.071717] ESI: 0001 EDI: f7599020 EBP: f759a058 ESP: f7493ec8
[  878.071717]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[  878.071717] Process swapper (pid: 0, ti=f7492000 task=f748f520 
task.ti=f7492000)
[  878.071717] Stack:  4001 0282 c010fca5 316b4000 0008 
005b 005c
[  878.071717]f7599aa0  0006d38c 8000 f759f110  
000a 0001
[  878.071717]c0126867 0001 c036da68 c0126419 0046 c0372de0 
 00da
[  878.071717] Call Trace:
[  878.071717]  [] lapic_next_event+0xc/0x10
[  878.071717]  [] tasklet_action+0x68/0xd0
[  878.071717]  [] __do_softirq+0x66/0xd3
[  878.071717]  [] do_softirq+0x45/0x53
[  878.071717]  [] irq_exit+0x35/0x67
[  878.071717]  [] do_IRQ+0x52/0x63
[  878.071717]  [] common_interrupt+0x23/0x28
[  878.071717]  [] module_param_sysfs_remove+0xf/0x23
[  878.071717]  [] acpi_idle_enter_bm+0x2a9/0x317 [processor]
[  878.071717]  [] cpuidle_idle_call+0x5b/0x86
[  878.071717]  [] cpuidle_idle_call+0x0/0x86
[  878.071717]  [] cpu_idle+0xab/0xcb
[  878.071717]  ===
[  878.071717] Code: 8b 88 a8 00 00 00 66 8b 41 06 0f b6 d4 81 e2 bf 00 00 00 
83 fa 04 74 17 0f b6 41 04 50 52 68 df 2d ae f8 e8 f0 95 64 c7 83 c4 0c <0f> 0b 
eb fe 0f b6 d0 f6 c4 40 89 54 24 28 8b 87 b0 23 00 00 75
[  878.071717] EIP: [] iwl4965_irq_tasklet+0x2db/0x4de [iwl4965] 
SS:ESP 0068:f7493ec8
[  878.071717] Kernel panic - not syncing: Fatal exception in interrupt


-- Package-specific info:
** Version:
Linux version 2.6.26-1-686 (Debian 2.6.26-8) ([EMAIL PROTECTED]) (gcc version 
4.1.3 20080623 (prerelease) (Debian 4.1.2-23)) #1 SMP Thu Oct 9 15:18:09 UTC 
2008

** Command line:
root=/dev/sda1 ro quiet vga=791 quiet

** Not tainted

** Kernel log:
[7.526021] Registered led device: tpacpi::dock_batt
[7.526021] Registered led device: tpacpi::unknown_led
[7.526021] Registered led device: tpacpi::standby
[7.529644] thinkpad_acpi: Lenovo BIOS switched to ACPI backlight control 
mode
[7.529644] thinkpad_acpi: standard ACPI backlight interface available, not 
loading native one...
[7.529644] input: ThinkPad Extra Buttons as /class/input/input1
[7.862017] input: Power Button (FF) as /class/input/input2
[7.908595] Linux agpgart interface v0.103
[7.910076] ACPI: Power Button (FF) [PWRF]
[7.910163] input: Lid Switch as /class/input/input3
[7.937991] agpgart: Detected an Intel 965GM Chipset.
[7.937991] agpgart: Detected 7676K stolen memory.
[7.942542] ACPI: Lid Switch [LID]
[7.942626] input: Sleep Button (CM) as /class/input/input4
[7.953152] agpgart: AGP aperture is 256M @ 0xe000
[8.01

Bug#502326: Acknowledgement (linux-image-2.6.26-1-686: crash in iwl3945)

2008-10-15 Thread Noah Meyerhans
And of course, the bug report should have mentioned iwl4965, not 3945.
Retitled accordingly.

noah





Bug#502326: Acknowledgement (linux-image-2.6.26-1-686: crash in iwl4965)

2008-10-24 Thread Noah Meyerhans
Instability has gotten bad enough that I'm trying a homegrown 2.6.27.3
image.  Uptime is approaching 20 hours, all with wireless active, and no
sign of trouble, but it's way too soon to tell if it really made a
difference...

noah





Bug#502326: Acknowledgement (linux-image-2.6.26-1-686: crash in iwl4965)

2008-10-24 Thread Noah Meyerhans
I've now confirmed that this crash will happen with 2.6.27.3 as well. :(
I'll see about pursuing this with upstream.





Bug#502326: upstream bug assigned

2008-11-15 Thread Noah Meyerhans
tags 502326 + upstream
thanks

This is bug
http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703





Bug#502326: firmware update

2008-11-28 Thread Noah Meyerhans
On Sat, Nov 29, 2008 at 01:57:12AM +0100, Moritz Muehlenhoff wrote:
> > I can also confirm (2) that given the provided firmware by Zhu Yi the
> > issue is fixed on the same 2.6.27.7 kernel I built and talked about
> > before (1) for Lenny, using the iwl4965 driver.
> > 
> > 1: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502326#41
> > 2: http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703#c53
> 
> Did anyone test, whether Zhu Yi's firmware also fixes this issue
> with the standard Lenny kernel?

I cannot confirm that it actually fixes anything with the current lenny
kernels, because I've not been experiencing this bug recently.  It seems
to have something to do with the wireless network environment (number of
wireless clients, a misbehaving client, or something along those lines)
and the particular issues haven't come up recently.

I can, however, confirm that the new firmware doesn't actually seem to
*break* anything in lenny.

noah





Bug#1061445: linux-image-6.7-cloud-amd64: Built CONFIG_VIRTIO_BLK into kernel

2024-04-17 Thread Noah Meyerhans
On Wed, Apr 03, 2024 at 06:26:46PM +0200, Paul Menzel wrote:
> Sorry, I have the feeling we talk past each other. I do not want to create
> an initrd. I want to boot *without* an initrd, and the only missing piece is
> building VIRTIO_BLK into the Linux kernel.
> 
> Ubuntu also builds this into their “kvm” flavour [1].
> 
> If you think, that is unnecessary, could you please elaborate, how I would
> achieve the goal with virtiofs?

The cloud kernel generally targets VM guests on the Microsoft Azure and
Amazon EC2 cloud environments, neither of which benefits from the
VIRTIO_BLK driver being statically linked as you describe.  I think
that's the primary reason for reluctance to make your requested change.
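
To see how a given kernel was built, the config file shipped alongside the image (/boot/config-<release> on Debian) can be inspected.  A minimal sketch, with a made-up function name: "y" means built in, "m" means a loadable module (which is why an initrd is normally needed to mount a root disk behind it), and "n" stands for unset or absent.

```python
import re


def config_state(config_text, symbol):
    """Return the value of CONFIG_<symbol>, or 'n' if it is unset or absent."""
    pattern = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(symbol))
    for line in config_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    # Covers both "# CONFIG_FOO is not set" and a missing symbol.
    return "n"
```

For example, config_state(open("/boot/config-" + release).read(), "VIRTIO_BLK") reports the state for the kernel named by release.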

For background, the Azure and AWS clouds present well-defined device
models, making it straightforward for us to construct targeted kernel
configs for them.

noah



Re: ocfs2_dlmfs missing from the cloud kernel

2024-05-17 Thread Noah Meyerhans
On Fri, May 17, 2024 at 05:34:57PM +0200, Bastian Blank wrote:
> > > how do I change this?
> > You install the non-cloud kernel.
> 
> The cloud kernel is limited in scope.  And the decision was that not
> everything you can do on platforms is in scope.

To clarify the scope a bit, historically the cloud kernels have
specifically targeted Amazon EC2 and Microsoft Azure.  The rationale for
this is that these providers present a consistent and reasonably well
defined device model, meaning we can be sure what drivers and other
kernel features are needed and which we can leave out.  It is not
intended to be usable on every cloud service.

The module in question, ocfs2_dlmfs, is not, to my knowledge, generally
useful in the cloud environments targeted by the cloud kernel.
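
Whether a particular kernel flavour ships a module can be checked from the files depmod maintains under /lib/modules/<release>/.  A minimal sketch that classifies a module from the text of modules.dep and modules.builtin; the function names are made up, and compressed module suffixes such as .ko.xz are handled by cutting the file name at ".ko".

```python
import posixpath


def _module_name(path):
    """Reduce a path like kernel/fs/foo/foo-bar.ko.xz to foo_bar."""
    base = posixpath.basename(path)
    return base.split(".ko", 1)[0].replace("-", "_")


def module_status(name, modules_dep_text, modules_builtin_text=""):
    """Return 'builtin', 'module' or 'absent' for the named module."""
    wanted = name.replace("-", "_")
    for line in modules_builtin_text.splitlines():
        if line.strip() and _module_name(line.strip()) == wanted:
            return "builtin"
    for line in modules_dep_text.splitlines():
        # Each modules.dep line is "<module path>: <dependency paths...>".
        path = line.split(":", 1)[0].strip()
        if path and _module_name(path) == wanted:
            return "module"
    return "absent"
```

A result of "absent" for ocfs2_dlmfs on the cloud flavour and "module" on the standard one would match the behaviour described in this thread.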

If we had the resources, it'd be great to provide further optimized kernel
builds, e.g. one for EC2 specifically, Azure specifically, GCP, and
maybe some sort of OpenStack/QEMU VM definition.  Unfortunately, we do
not currently have those resources.

So, IMO, a wishlist bug against src:linux asking for another build
configuration would be a reasonable way to record an interest in such a
change.  The kernel team may be able to provide more context on which
specific resources would be needed in order to support this.  Maybe in
the future it'll get implemented.

noah



Re: Kernel features and Cloud (and GCE)

2024-05-27 Thread Noah Meyerhans
On Mon, May 27, 2024 at 03:37:08PM +0200, Emanuele Rocca wrote:
> > So we have the problem that the Debian cloud kernel supports some, but
> > not all, of the devices our shared users need, and we’re not sure of
> > the right way to solve that. We wondered if we should switch the
> > images to the generic kernel, or if there’s a way we could help the
> > cloud kernel support more clouds, or if there’s a better solution we
> > haven’t thought of.
> 
> I think the best approach is enabling the needed modules one by one in
> the cloud image following the procedure above.

Andrew's question is a bit higher level than that, and mostly boils down
to "Which cloud environments do we actually want to support with the
cloud kernel?"

We have declined requests to enable modules in the cloud kernel in the
past, referring people to the standard kernel instead (see e.g.
#969140).  See also the previous discussion at
https://lists.debian.org/debian-kernel/2020/04/msg6.html

We have not, as far as I can recall, ever explicitly stated a policy
around this, nor have we documented what it would take for us to support
more fine-grained kernel builds (i.e. what stops us from generating a
kernel image targeting *only* GCP).

noah



Re: Closing of buster-backports?

2022-09-07 Thread Noah Meyerhans
On Wed, Sep 07, 2022 at 07:37:45AM +0200, Alexander Wirt wrote:
> > > Now that buster is LTS and no longer officially supported, should the
> > > -backports pocket be closed? AFAIK, buster just receives the security
> > > uploads by the -security pocket and shouldn't have -backports open
> > > anymore. I hope I am not mistaken or missing anything?
> > > 
> > > FTR, packages are still entering the -backports pocket and this
> > > probably needs to stop(!?)
> > 
> > Why should it stop?  If people are willing to do the work to backport a
> > package, why should it be blocked?  The understanding is that the release as
> > a whole will not be supported, but voluntary updates will continue.
> 
> we (backports ftpmasters) asked that question some time ago. Consensus was
> that the backports maintainers don't want to support oldstable backports
> over its lts lifetime.
> 
> For that reason we will close oldstable-backports soon. 

The cloud team publishes images for various cloud environments
(OpenStack, Amazon EC2, etc).  The primary (and most popular, from the
data I have) images use the main kernel, but we publish alternative
images that boot the backports kernel by default.

Is there a plan to continue offering new kernels for buster LTS?

If we simply close the backports archive, we leave these users with no
path forward short of upgrading to bullseye, which is something they're
evidently not ready for.  We can cease publication of these images,
which will limit new adoption, but it'd be nice to continue providing at
least kernel backports.

noah



Bug#890343: linux: make fq_codel default for default_qdisc

2021-12-01 Thread Noah Meyerhans
On Fri, Feb 26, 2021 at 06:58:50PM +0100, Vincent Blut wrote:
> > > I think the distinction is that the other packages that tweak sysctl
> > > values don't claim to be doing so on behalf of the kernel team.  If
> > > the
> > > kernel team is responsible for the values being set, then the
> > > settings
> > > should come from a package that the kernel team owns, not some other
> > > package.
> > 
> > Right, maybe in linux-base?  Although that might annoy derivatives that
> > want different defaults.
> > 
> > procps is the wrong place, not just because it's out of our hands, but
> > because systemd applies sysctl configuration now and procps is
> > optional.
> 
> Is there a definitive answer from the kernel team about how this should be
> implemented? In the meantime, Noah sent [1].

I've rebased 
https://salsa.debian.org/kernel-team/linux/-/merge_requests/309 against
the current 'master' kernel branch on Salsa.  Basic test results are
below.  It'd be nice if the kernel team could have a look and give
feedback on the approach or recommend an alternative one if this isn't
the one they'd like to pursue.

# before reboot:
admin@ip-10-0-0-136:~$ /sbin/tc qdisc
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev ens5 root
qdisc pfifo_fast 0: dev ens5 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ens5 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
admin@ip-10-0-0-136:~$ sudo apt install ./linux-image-5.16.0-rc3-cloud-amd64-unsigned_5.16~rc3-1~exp2_amd64.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'linux-image-5.16.0-rc3-cloud-amd64-unsigned' instead of './linux-image-5.16.0-rc3-cloud-amd64-unsigned_5.16~rc3-1~exp2_amd64.deb'
The following additional packages will be installed:
  firmware-linux-free
Suggested packages:
  linux-doc-5.16 debian-kernel-handbook grub-pc | grub-efi-amd64 | extlinux
The following NEW packages will be installed:
  firmware-linux-free linux-image-5.16.0-rc3-cloud-amd64-unsigned
...
admin@ip-10-0-0-136:~$ sudo reboot
Connection to 18.236.97.48 closed by remote host.
...
admin@ip-10-0-0-136:~$ uname -a
Linux ip-10-0-0-136 5.16.0-rc3-cloud-amd64 #1 SMP PREEMPT Debian 5.16~rc3-1~exp2 (2021-12-01) x86_64 GNU/Linux
admin@ip-10-0-0-136:~$ /sbin/tc qdisc
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev ens5 root
qdisc fq_codel 0: dev ens5 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev ens5 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
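
For anyone repeating this test on other instance types, the before/after
comparison can be reduced to a small script.  The following is only a
sketch: it assumes the plain `tc qdisc` output format shown above, and
the function name is made up for illustration.

```python
from collections import defaultdict


def qdisc_kinds(tc_output: str) -> dict:
    """Map each device name to the set of qdisc kinds attached to it."""
    kinds = defaultdict(set)
    for line in tc_output.splitlines():
        fields = line.split()
        # A record looks like "qdisc <kind> <handle> dev <name> ...".
        # Any line that does not match (for example a wrapped
        # continuation of the priomap) is skipped.
        if len(fields) >= 5 and fields[0] == "qdisc" and fields[3] == "dev":
            kinds[fields[4]].add(fields[1])
    return dict(kinds)
```

Running it on the output captured before and after the reboot should show
pfifo_fast replaced by fq_codel under ens5, with the mq root unchanged.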



Re: Bug#1006346: cloud.debian.org: bullseye AMIs don't boot on Amazon EC2 Xen instances with Enhanced Networking

2022-02-25 Thread Noah Meyerhans
Control: reassign -1 src:linux
Control: tags -1 + upstream

> Amazon EC2 instance types with Enhanced Networking use the ixgbevf.ko
> driver.  The current AMIs successfully probe the ixgbevf driver and spawn
> dhclient as expected, but dhclient appears to never receive a lease.  Older
> AMIs do work on this class of instance.

Upstream commit 83dbf898a2d4 "PCI/MSI: Mask MSI-X vectors only on
success" seems to introduce a regression that breaks the "Enhanced
Networking" feature used on Amazon EC2 instances, which use PCI
passthrough access to Intel ethernet devices using the ixgbevf.ko
driver.  Systems using this hardware seem to probe their network
hardware as usual, and don't log any errors to the console, but are
never able to communicate over the NIC.

Device details:

00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
Physical Slot: 3
Flags: bus master, fast devsel, latency 64
Memory at f300 (64-bit, prefetchable) [size=16K]
Memory at f3004000 (64-bit, prefetchable) [size=16K]
Capabilities: 
Kernel driver in use: ixgbevf
Kernel modules: ixgbevf

The issue is present in Debian kernels in sid and experimental.

The patch has been backported to stable branches including those used in
our stable releases:

The 5.10.x backport (released in v5.10.88) is e5949933f313.  Since
bullseye is currently using v5.10.92, it is impacted.

The 4.19.x backport (released in v4.19.222) is 12ae8cd1c7e9.  Since
buster is still on v4.19.208, it is not yet impacted, but likely would
be with the next kernel update.
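
The version reasoning above can be written down as a small check.  This is
a rough sketch only: the table holds just the two branches named in this
message, the helper names are invented, and it does not understand -rc
suffixes or Debian package revisions.

```python
from typing import Optional

# First stable release on each branch that carries the backport,
# as listed in this message.
FIRST_AFFECTED = {(5, 10): "v5.10.88", (4, 19): "v4.19.222"}


def parse_version(version: str) -> tuple:
    """Turn 'v5.10.92' or '5.10.92' into (5, 10, 92)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_affected(version: str) -> Optional[bool]:
    """True/False for known branches, None for branches not in the table."""
    ver = parse_version(version)
    first = FIRST_AFFECTED.get(ver[:2])
    if first is None:
        return None
    return ver >= parse_version(first)
```

With these inputs, is_affected("v5.10.92") is True and
is_affected("v4.19.208") is False, matching the bullseye and buster
status described above.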

This issue has been reported elsewhere as well, for example Fedora
CoreOS at https://github.com/coreos/fedora-coreos-tracker/issues/1066

I have confirmed that reverting e5949933f313 from 5.10.x results in a
build that functions properly with this hardware on bullseye, but this
is probably not a reasonable thing to do generally.

noah



Bug#1007144: linux-image-cloud-amd64: Network doesn't come up on AWS Xen-based EC2 instances (ex c4.large)

2022-03-17 Thread Noah Meyerhans
Control: reassign -1 src:linux
Control: forcemerge 1006346 -1

On Sat, Mar 12, 2022 at 01:21:23AM +, Reilly Brogan wrote:
> I bisected this issue and it was introduced in kernel 5.10.88 as commit
> e5949933f313c9e2c30ba05b977a047148b5e38c "PCI/MSI: Mask MSI-X vectors
> only on success", thus present in linux-image-5.10.0-11-amd64 which uses
> the 5.10.92 kernel (and all newer versions of the package).

I've already reported this as
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006346

The change implicated has been backported to all the upstream stable
branches, so this basically impacts all the current Debian kernels (or
will, once they're updated).



Bug#1006346: cloud.debian.org: bullseye AMIs don't boot on Amazon EC2 Xen instances with Enhanced Networking

2022-03-17 Thread Noah Meyerhans
From the upstream discussion on the linux-pci mailing list [*]:

> Yes. My understanding is that the issue is because AWS is using older
> versions of Xen. They are in the process of updating their fleet to a
> newer version of Xen so the change introduced with Stefan's commit
> isn't an issue any longer.
> 
> I think the changes are scheduled to be completed in the next 10-12
> weeks. For now we are carrying a revert in the Fedora Kernel.
> 
> You can follow this Fedora CoreOS issue if you'd like to know more
> about when the change lands in their backend. We work closely with one
> of their partner engineers and he keeps us updated.
> https://github.com/coreos/fedora-coreos-tracker/issues/1066

Ideally we can revert the upstream commit from the stable kernels, since
otherwise Debian users on AWS Xen instance types may be stuck using
older, unsafe kernels.  Especially if we have time to include the change
in the upcoming bullseye and buster point releases.  If the kernel
updates for those stable updates have already been built, though, it
might be too late to matter.  By the time we publish our next kernel
builds, the AWS Xen update may be complete.

noah

* 
https://lore.kernel.org/linux-pci/c4a65b9a-d1e2-bf0d-2519-aac718593...@redhat.com/



Bug#1006346: cloud.debian.org: bullseye AMIs don't boot on Amazon EC2 Xen instances with Enhanced Networking

2022-03-19 Thread Noah Meyerhans
On Sat, Mar 19, 2022 at 10:41:39AM +0100, Salvatore Bonaccorso wrote:
> > From the upstream discussion on the linux-pci mailing list [*]:
> > 
> > > Yes. My understanding is that the issue is because AWS is using older
> > > versions of Xen. They are in the process of updating their fleet to a
> > > newer version of Xen so the change introduced with Stefan's commit
> > > isn't an issue any longer.
> > > 
> > > I think the changes are scheduled to be completed in the next 10-12
> > > weeks. For now we are carrying a revert in the Fedora Kernel.
> > > 
> > > You can follow this Fedora CoreOS issue if you'd like to know more
> > > about when the change lands in their backend. We work closely with one
> > > of their partner engineers and he keeps us updated.
> > > https://github.com/coreos/fedora-coreos-tracker/issues/1066
> > 
> > Ideally we can revert the upstream commit from the stable kernels, since
> > otherwise Debian users on AWS Xen instance types may be stuck using
> > older, unsafe kernels.  Especially if we have time to include the change
> > in the upcoming bullseye and buster point releases.  If the kernel
> > updates for those stable updates have already been built, though, it
> > might be too late to matter.  By the time we publish our next kernel
> > builds, the AWS Xen update may be complete.
> 
> Where can one track the update status for their Xen version directly,
> or is following the above the only reference?

It's just for reference; the deployment timeline isn't published.  As
far as I know, it's also subject to change in the event that unexpected
issues arise or it's preempted by some high severity issue.

> How frequent is this particular combination of hardware/software? We
> have the change already applied for a while in bullseye, buster would
> be impacted new since the last update done for security fixes

The impacted instance types aren't the most common, as they're not the
latest generation.  So I expect that the majority of the impact is felt
by people or organizations that haven't yet been able to make time to
switch to newer instance types.  The implication here, of course, is
that many of these deployments may be production environments where
stability is prioritized over migration to the new thing.

We get a little bit of data about what instance types are used with
Debian on AWS, but it's incomplete as it only reflects usage by AWS
customers who access Debian via the AWS Marketplace.  Consider it
something like popcon data; it's essentially opt-in.  If the data we get
from the Marketplace covering the past 3 days' worth of activity is
representative of the Debian usage in general, then it looks like
roughly 1% of Debian users on AWS are trying to use the impacted
instance types.

> Are there workarounds for the affected users of this combination? I
> see some options listed in 
> https://wiki.debian.org/Cloud/AmazonEC2Image/Bullseye 

People can use newer generation instance types, which are not impacted.
Depending on the use case, that could be a trivial change, but it could
also be disruptive.  Newer instance types aren't based on Xen at all and
expose a different hardware device model to the instance.  Debian
supports the newer instance types, but the end user workload may still
need additional nontrivial qualification.

> If we revert the commit it reverts a fix for a bug with Marvell NVME
> devices.
> 
> But we cannot just revert the commit for the cloud images.

Understood.

> If we know something about the release schedule from Amazon to update
> their Xen instances (which is the way to move forward, since upstream
> won't revert the commit) then we should leave the status as it is for
> bullseye (and now for buster). For bullseye there are
> CVE-2022-0847 fixes they would need to pick up.

Yes, the problem will go away when the Xen fleet is updated.  It sounds
like we're looking at roughly a 3 month timeline, after which point the
patch won't be a problem.  However, until then, people who need to use
Xen instances will be stuck either running an unsafe kernel or building
their own.

noah



Re: Generating a cloud / VM kernel package

2017-08-27 Thread Noah Meyerhans
On Sat, Aug 26, 2017 at 05:18:45PM +0100, Ben Hutchings wrote:
> > Thomas, can you elaborate why you think this a good idea? Is this about
> > boot time of the kernel image? The thing I really do not want to have is
> > additional kernel source uploads to the archive for just those cloud
> > kernel images, but you already considered that a bad idea (from what I
> > read between your lines).
> 
> When the Google Cloud people talked to me about slow booting, it turned
> out that reconfiguring initramfs-tools to MODULES=dep made a big
> improvement.  That is likely to be a sensible configuration for most
> cloud images.

I'm not sure that'll work for us. The image generation is not generally
expected to occur on cloud instances (though in practice it certainly
may).

OTOH, the list of required modules may be small enough for us to
enumerate the ones we need for booting in /etc/initramfs-tools/modules.
I will look into this, and we'll see what it does to boot times.

noah





Re: Generating a cloud / VM kernel package

2017-08-27 Thread Noah Meyerhans
On Mon, Aug 28, 2017 at 01:31:31AM +0100, Ben Hutchings wrote:
> > OTOH, the list of required modules may be small enough for us to
> > enumerate the ones we need for booting in /etc/initramfs-tools/modules.
> 
> ...and then you could use MODULES=list.  initramfs-tools will still
> follow module static dependencies in this case.
> 
> > I will look into this, and we'll see what it does to boot times.
> 
> Note that the saving will mainly be in time to load the initramfs -
> which on Google Compute Engine is done through BIOS disk services that
> have very low performance.  The mere presence of the unneeded modules
> in the initramfs won't cause them to be loaded into the kernel and
> shouldn't make much difference to the time taken to boot after this
> point.

On Amazon's HVM instance families, the initramfs is read from "local"
disk, which may be network-attached or actually local. I haven't
profiled load times in great depth, but my guess is that reading and
uncompressing the image would be the biggest contributors to the load
time. In my experimentation, uncompressing an 18 MB initramfs takes
roughly 500 ms of clock time when read from network storage. That's not
completely insignificant, but considering the fragility of MODULES=list
or MODULES=dep, I'm not sure it's the best place to look for
optimizations right now.
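
For anyone who wants to reproduce the timing, something like the sketch
below gives a rough userspace number.  It assumes the image is a plain
gzip stream, which may not hold: Debian initramfs images can use other
compressors and may carry an uncompressed archive at the front.  It also
measures Python's decompressor rather than the boot-time path, so treat
the result as an order-of-magnitude figure.

```python
import gzip
import time


def time_decompress(path: str) -> tuple:
    """Return (uncompressed size in bytes, elapsed seconds) for a gzip file."""
    start = time.perf_counter()
    size = 0
    with gzip.open(path, "rb") as stream:
        # Read in 1 MiB chunks so the whole image never sits in memory.
        while chunk := stream.read(1 << 20):
            size += len(chunk)
    return size, time.perf_counter() - start
```

The elapsed time includes reading the file, so running it once with a cold
page cache and once warm separates the storage cost from the
decompression cost.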

noah





Re: Generating a cloud / VM kernel package

2017-08-28 Thread Noah Meyerhans
On Sun, Aug 27, 2017 at 04:16:50PM +0200, Thomas Goirand wrote:
> Basically, the only thing that I want to see is a specific config for
> that kernel, nothing else. Otherwise, it's going to be too much
> maintenance work. Indeed, it should *not* be a different source upload,
> that's too much work as well. There also may be some optimization that
> we could do.
> 
> Also, I don't see this happening without a prior agreement from the
> kernel team (which means probably that Ben has to agree). On our side,
> we could prepare a list of kernel modules that we do *not* want.

You might consider looking at what Ubuntu did to their kernel.
https://insights.ubuntu.com/2017/04/05/ubuntu-on-aws-gets-serious-performance-boost-with-aws-tuned-kernel/
suggests that they did more than just disable some modules, but it's
light on details.

If we're able to come up with a specific list of proposed changes, we'll
probably be able to have a more fruitful conversation.

noah





Bug#910049: linux-image-4.18.0-1-cloud-amd64: Please enable Amazon ENA NIC support

2018-10-01 Thread Noah Meyerhans
Package: linux-image-4.18.0-1-cloud-amd64
Version: 4.18.8-1
Severity: wishlist
Tags: patch
Control: affects -1 cloud.debian.org

The cloud variant of the kernel packages does not currently enable
CONFIG_ENA_ETHERNET, meaning it is not able to drive the network hardware
on modern AWS instances.

A patch for enabling this is attached. I have tested it with the linux
sources from stretch-backports on an AWS m5d instance and confirmed that
the ENA driver is properly enabled and functional.
--- /mnt/config.amd64_none_cloud-amd64  2018-10-01 20:47:28.803906526 +
+++ .config 2018-10-01 20:58:42.155319362 +
@@ -4,7 +4,7 @@
 #
 
 #
-# Compiler: gcc-6 (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
+# Compiler: gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -57,6 +57,7 @@
 # CONFIG_COMPILE_TEST is not set
 CONFIG_LOCALVERSION=""
 # CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_BUILD_SALT=""
 CONFIG_HAVE_KERNEL_GZIP=y
 CONFIG_HAVE_KERNEL_BZIP2=y
 CONFIG_HAVE_KERNEL_LZMA=y
@@ -1935,7 +1936,8 @@
 # CONFIG_NET_VENDOR_ALACRITECH is not set
 # CONFIG_NET_VENDOR_ALTEON is not set
 # CONFIG_ALTERA_TSE is not set
-# CONFIG_NET_VENDOR_AMAZON is not set
+CONFIG_NET_VENDOR_AMAZON=y
+CONFIG_ENA_ETHERNET=m
 # CONFIG_NET_VENDOR_AMD is not set
 # CONFIG_NET_VENDOR_AQUANTIA is not set
 # CONFIG_NET_VENDOR_ARC is not set
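
To confirm the result on a built kernel, the symbol can be looked up in
the installed config file (for example /boot/config-$(uname -r)).  The
helper below is a minimal sketch with an invented name; it only handles
the two line forms that appear in the diff above.

```python
def config_value(config_text: str, symbol: str):
    """Return 'y', 'm' or a string value; None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return None
        if line.startswith(f"{symbol}="):
            # Drop the quotes around string-valued options.
            return line.split("=", 1)[1].strip('"')
    return None
```

With the patch applied, config_value(text, "CONFIG_ENA_ETHERNET") should
return "m"; without it, the lookup returns None because the whole
NET_VENDOR_AMAZON section is disabled.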


Bug#910049: Acknowledgement (linux-image-4.18.0-1-cloud-amd64: Please enable Amazon ENA NIC support)

2018-10-22 Thread Noah Meyerhans
Submitted the patch in more complete form at 
https://salsa.debian.org/kernel-team/linux/merge_requests/68



Bug#896165: linux: request packaging of bpftool

2018-11-14 Thread Noah Meyerhans
On Fri, Apr 20, 2018 at 02:07:40PM +0200, Simon Horman wrote:
> I would like to request packaging of bpftool which has been
> included in upstream Linux tree since v4.15-rc1. I expect this can
> be done in a similar manner to the way that perf, also present in
> the upstream Linux kernel tree, is packaged.

I've started some work on this. It's not ready to be merged with the
kernel packaging but does build. Please see
https://salsa.debian.org/noahm/linux/commits/bpftool and feel free to
send improvements.

noah





Bug#896165: linux: request packaging of bpftool

2018-11-19 Thread Noah Meyerhans
On Fri, Apr 20, 2018 at 02:07:40PM +0200, Simon Horman wrote:
> I would like to request packaging of bpftool which has been
> included in upstream Linux tree since v4.15-rc1. I expect this can
> be done in a similar manner to the way that perf, also present in
> the upstream Linux kernel tree, is packaged.

Please see https://salsa.debian.org/kernel-team/linux/merge_requests/72

It's not ready for merge, but hopefully it gets some good feedback and I
can get it ready before long.

I expect that applying the same patch to the 4.18 branch for sid will be
straightforward.

Is the plan for buster to include 4.18, or 4.19? Or something else?

noah





Bug#896165: linux: request packaging of bpftool

2018-11-20 Thread Noah Meyerhans
On Mon, Nov 19, 2018 at 11:34:26PM -0800, Noah Meyerhans wrote:
> Please see https://salsa.debian.org/kernel-team/linux/merge_requests/72

Ugh. We cannot currently package bpftool in Debian. There are several
GPLv2-only files in its source tree, and it links unconditionally
against the GPLv3 libbfd. :(

There is work underway to make libbfd optional, so the situation may
change before too long:
https://www.mail-archive.com/netdev@vger.kernel.org/msg254808.html

noah





Bug#896165: linux: request packaging of bpftool

2018-11-28 Thread Noah Meyerhans
On Tue, Nov 27, 2018 at 09:50:17AM -0800, Jakub Kicinski wrote:
> > > Please see https://salsa.debian.org/kernel-team/linux/merge_requests/72
> > 
> > Ugh. We cannot currently package bpftool in Debian. There are several
> > GPLv2-only files in its source tree, and it links unconditionally
> > against the GPLv3 libbfd. :(
> 
> If we relicense the GPLv2-only files to be GPLv2-only OR BSD-2-Clause
> - like the majority of bpftool sources - would that work?
> 
> I wanted to make sure GPLv2-only + BSD-2-Clause will satisfy the
> license requirement when linking against libbfd, before I start chasing
> people for acks on the relicense :)

Yes, the BSD 2-clause license is OK. GPLv2 or greater would be OK, too.
It's really just GPLv2-only in this case that's causing the problem.

noah





Bug#915229: src:linux: Updated driver needed for Amazon ENA ethernet

2018-12-01 Thread Noah Meyerhans
Package: src:linux
Severity: important

ENA is an ethernet adaptor used on Amazon EC2 cloud instances. Version 2.0.2 of
the ENA driver was added to the mainline kernel as of version 4.20. This
version includes fixes for various bugs, some of which result in kernel panics,
and is needed in order to enable access to features available on newer
hardware. It is also needed in order to support the recently announced
arm64-based instances.

I have opened https://salsa.debian.org/kernel-team/linux/merge_requests/77 to
backport ENA 2.0.2 to 4.19 for buster.

I have a WIP branch for stretch's 4.9 kernel at
https://salsa.debian.org/noahm/linux/tree/stretch+ena2 This requires a bit more
effort as 4.9 contains a much older version of the driver.

Thanks
noah



Bug#915231: src:linux: Enable PCI_HOTPLUG for arm64

2018-12-01 Thread Noah Meyerhans
Package: src:linux
Version: 4.9.130-2
Severity: wishlist
Tags: stretch

Amazon recently announced arm64-based EC2 instances. These instances rely on
PCI_HOTPLUG functionality to support attach/detach of resources such as
ethernet interfaces and block devices. PCI_HOTPLUG is enabled for arm64 in
buster and sid, but not stretch. Please enable PCI_HOTPLUG for arm64 in order
to support EC2 on arm64.

Thanks
noah



Bug#915231: Proposed fix submitted on salsa

2018-12-06 Thread Noah Meyerhans
Control: tags -1 + patch

Merge request: https://salsa.debian.org/kernel-team/linux/merge_requests/80



Bug#915229: src:linux: Updated driver needed for Amazon ENA ethernet

2018-12-06 Thread Noah Meyerhans
Control: tags -1 + patch

Merge request for Linux 4.9 (stretch): 
https://salsa.debian.org/kernel-team/linux/merge_requests/81



Bug#918188: linux: FTBFS on arm64

2019-01-04 Thread Noah Meyerhans
On Fri, Jan 04, 2019 at 06:57:21AM +0100, Salvatore Bonaccorso wrote:
> >   LD  vmlinux.o
> >   MODPOST vmlinux.o
> >   GEN .version
> >   CHK include/generated/compile.h
> >   UPD include/generated/compile.h
> >   CC  init/version.o
> >   LD  init/built-in.o
> > ./drivers/firmware/efi/libstub/lib.a(arm64-stub.stub.o): In function 
> > `handle_kernel_image':
> > ./debian/build/build_arm64_none_arm64/./drivers/firmware/efi/libstub/arm64-stub.c:63:
> >  undefined reference to `__efistub__GLOBAL_OFFSET_TABLE_'
> > ld: ./drivers/firmware/efi/libstub/lib.a(arm64-stub.stub.o): relocation 
> > R_AARCH64_ADR_PREL_PG_HI21 against external symbol 
> > `__efistub__GLOBAL_OFFSET_TABLE_' can not be used when making a shared 
> > object; recompile with -fPIC
> > /<>/Makefile:1010: recipe for target 'vmlinux' failed
> > make[5]: *** [vmlinux] Error 1
> > Makefile:152: recipe for target 'sub-make' failed
> > make[4]: *** [sub-make] Error 2
> > Makefile:24: recipe for target '__sub-make' failed
> > make[3]: *** [__sub-make] Error 2
> > make[3]: Leaving directory 
> > '/<>/debian/build/build_arm64_none_arm64'
> > debian/rules.real:190: recipe for target 
> > 'debian/stamps/build_arm64_none_arm64' failed
> > make[2]: *** [debian/stamps/build_arm64_none_arm64] Error 2
> > make[2]: Leaving directory '/<>'
> > debian/rules.gen:400: recipe for target 'build-arch_arm64_none_arm64_real' 
> > failed
> > make[1]: *** [build-arch_arm64_none_arm64_real] Error 2
> > make[1]: Leaving directory '/<>'
> > debian/rules:41: recipe for target 'build-arch' failed
> > make: *** [build-arch] Error 2
> > dpkg-buildpackage: error: debian/rules build-arch gave error exit status 2
> 
> https://buildd.debian.org/status/fetch.php?pkg=linux&arch=arm64&ver=4.9.144-1&stamp=1546572157&raw=0

The problem was introduced with upstream commit
27b5ebf61818749b3568354c64a8ec2d9cd5ecca. Reverting that commit fixes
the build, but there may be a better option.





Re: Possibilities for a special Azure or cloud Linux package

2017-12-17 Thread Noah Meyerhans
On Fri, Dec 15, 2017 at 08:03:51PM +0100, Bastian Blank wrote:
> > > We at credativ are responsible for maintaining the Azure cloud images.
> > > We got asked by Microsoft to explore the possibilities of introducing a
> > > specialised Linux image for this platform into Debian.  The main
> > > enhancements we look at would be:
> > > - faster boot of the instance,
> > > - smaller memory footprint of the running kernel, and
> > > - new features.
> > 
> > However, if it is possible to create a single flavour that provides
> > those sorts of enhancements for multiple cloud platforms, I think that
> > would be worthwhile.
> 
> I have some initial findings for a kernel using a derived config.  I
> reduced the boot time by 5 seconds (from 30 to 25).  The installed size
> was reduced from 190 to 50MB.
> 
> Microsoft published a patch set against 4.13 and 4.14
> https://github.com/Microsoft/azure-linux-kernel
> they would like to add.
> 
> Now the question is if other cloud providers would like to follow such a
> path by using and caring for a specialised linux image for these
> platforms?

I'd be interested in helping to support work for a cloud kernel and
verifying its functionality on EC2. I can't, however, make a lot of
promises about how much time I can commit to this effort.

In our previous thread on this topic[1] it was suggested[2] that a tuned
initramfs config might go a long way toward reducing boot times. It's
not an investigation to which I've been able to devote much time,
unfortunately, but I think we should pursue that path to completion
before we look at patching the kernel or providing custom builds. Has
this been done on any cloud platform yet?

noah

1. https://lists.debian.org/debian-cloud/2017/08/msg00025.html
2. https://lists.debian.org/debian-cloud/2017/08/msg00032.html




Bug#969140: linux-image-5.7.0-0.bpo.2-amd64: Please enable CONFIG_F2FS_FS in the cloud image kernel

2020-08-28 Thread Noah Meyerhans
On Sat, Aug 29, 2020 at 11:43:21AM +1000, Hamish Moffatt wrote:
> > > Could you please enable CONFIG_F2FS_FS in the cloud kernel?
> > [...]
> > 
> > What makes you think f2fs will be commonly used in cloud deployments?
> > 
> I don't know that it will be, but as it supports encryption and compression
> and benchmarks show it performs at least as well as ext4, I don't see why
> it couldn't be a good choice for virtual machines. It seems at least as
> useful in a cloud deployment as minix and hpfs which are included in this
> flavour.

This sounds like a good argument for turning off minix and hpfs to me.
;)

Is F2FS usable as a Debian root filesystem?  Does it support all the
features (file capabilities and POSIX ACLs, for example) that are
commonly used on Debian systems?

The cloud team, for what it's worth, does not have any plans to switch
from ext4 in the foreseeable future.  (We probably would not do so unless
Debian made the change distro-wide.)  That doesn't mean we shouldn't
consider enabling it, but I'd like to see a clearer use case.  My small
amount of research into it (mostly reading wikipedia and a couple of the
reference sources) suggests that it's most popular on phones and similar
devices, not cloud instances.  Do you see use cases involving
manipulating filesystems for those types of devices in cloud VMs?  Or
something else?

How are crypto keys handled for its encryption functionality?

The cloud kernel is not expected to be useful for 100% of people, even
in cloud environments.  In cases where specific functionality is needed
that isn't available in the cloud kernel, the generic kernel is
available and I'd probably recommend that.

noah



Bug#969443: src:linux: none

2020-09-02 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.132-1
Severity: important
Tags: buster

When used in virtual machine environments, Linux on amd64 is able to report
"steal time" to the guest.  This functionality has been supported by Linux
on amd64 for years, but was only added to arm64 with Linux 5.5.

As Debian and arm64 are increasingly used in virtual environments, including
cloud environments, the ability to report steal time is increasingly
important for system monitoring and performance analysis.  Thus, I'd like to
request that CPU steal time accounting support for arm64 be backported to
buster, if possible.
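
For context, monitoring tools typically read this value from the
aggregate "cpu" line of /proc/stat, where steal is the eighth counter.
A minimal sketch of that lookup follows; the function name is invented
here, and the value is a cumulative tick count, so a monitor would sample
it twice and take the difference.

```python
def steal_ticks(proc_stat_text: str) -> int:
    """Return the aggregate 'steal' counter from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Counters after "cpu":
            # user nice system idle iowait irq softirq steal ...
            return int(fields[8])
    raise ValueError("no aggregate cpu line found")
```

On a kernel without steal time accounting the field is still present but
stays at zero, which is what makes the gap hard to notice from inside
the guest.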

noah



Bug#969443: arm64: please backport stolen time support to buster kernel (Re: src:linux: none)

2020-09-02 Thread Noah Meyerhans
On Wed, Sep 02, 2020 at 06:09:46PM -0700, Jonathan Nieder wrote:
> > Subject: src:linux: none
> 
> It looks like you forgot to include a subject line?

*sigh* Computers are hard.

> The relevant series is a4b28f5c6798 (Merge remote-tracking branch
> 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/next, 2019-10-24).
> It may be possible to try applying it if you are interested.

That is the series I had in mind, yes.  I will open a merge request once
I have one prepared and tested.

> That said, for this kind of issue that is about new features instead
> of hardware support, my advice would be to use a more current kernel
> from backports instead.

That's not an option for users requiring security support.  While the
prevalence of arm64-based clouds is still relatively low, it is growing.
As it becomes more common, the lack of visibility into CPU steal time
will become a significant issue for people running in these
environments, hence the "important" severity.

noah



Bug#969443: Info received (Bug#969443: arm64: please backport stolen time support to buster kernel (Re: src:linux: none))

2020-09-03 Thread Noah Meyerhans
Control: tags -1 + patch

I've submitted a merge request containing a proposed fix at
https://salsa.debian.org/kernel-team/linux/-/merge_requests/268

Tested in an arm64 KVM environment with steal time reporting; validated
its behavior against that of the sid kernel.



Bug#972709: Wishlist/RFC: Change to CONFIG_PREEMPT_NONE in linux-image-cloud-*

2020-11-22 Thread Noah Meyerhans
On Sun, Nov 22, 2020 at 03:53:32PM -0800, Flavio Veloso Soares wrote:
>  Unfortunately, I couldn't find many comprehensive benchmarks of kernel
>  CONFIG_PREEMPT* options. The one at
>  
> [1]https://www.codeblueprint.co.uk/2019/12/23/linux-preemption-latency-throughput.html
>  seems to be very thorough,
> 
>  [...]
> 
>  Not particularly.  I'm used to latency benchmarks showing e.g. average,
>  90th percentile, 99th percentile, as well as worst.

I don't think Ben was talking about specific benchmarks.  The web page
you cite lacks basic measurements one would expect to see from *any*
meaningful performance benchmark.  Comparing maximum latency is fine,
but it's not really relevant by itself.  If a configuration change
improves the worst case (100th percentile) but negatively impacts the
50th percentile, is that a change worth making?  Maybe.  But without
having that data at all, the benchmark really isn't worth much at all.
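
To make the point concrete, here is a small sketch using the nearest-rank
percentile definition.  The two sample sets are invented purely for
illustration and are not measurements of any kernel configuration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Report the figures a latency benchmark would normally include."""
    points = (("p50", 50), ("p90", 90), ("p99", 99), ("max", 100))
    return {name: percentile(samples, p) for name, p in points}


# Hypothetical latencies in microseconds, 100 samples each.
config_a = [10] * 98 + [50, 900]
config_b = [25] * 98 + [60, 300]
```

Here config_b has the better worst case (300 against 900) but a median
two and a half times worse (25 against 10).  A benchmark that reports
only the maximum would rank config_b first and hide that tradeoff.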

It's totally reasonable for us to consider making this change, but we
should have comprehensive data about the impact of doing so.  What
impact does the change have on different classes of workloads?  e.g.
high tps, CPU-bound, IO-bound, etc.  It's entirely possible that the
proposed change improves performance under certain workloads, but
negatively impacts others.  Without knowing the impact in more
detail, which would allow us to evaluate the tradeoffs, I don't think
there's a compelling reason to make a change.

noah



Bug#977005: cloud: Additional modules to disable

2020-12-09 Thread Noah Meyerhans
Package: src:linux
Version: 5.9.11-1
Severity: wishlist

Per discussion in the recent cloud-team meeting, there are several
additional modules that can be disabled in the cloud kernel.  The
proposed candidates for removal are below; please feel free to critique
specific entries if you think there's a need to leave them enabled.

fs/coda/coda.ko (CODA_FS)
fs/reiserfs/reiserfs.ko (REISERFS_FS)
fs/hpfs/hpfs.ko (HPFS_FS)
fs/hfsplus/hfsplus.ko   (HFSPLUS_FS)
fs/hfs/hfs.ko   (HFS_FS)
fs/jfs/jfs.ko   (JFS_FS)
fs/nilfs2/nilfs2.ko (NILFS2_FS)
fs/minix/minix.ko   (MINIX_FS)
fs/ecryptfs/ecryptfs.ko (ECRYPT_FS)
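
In case it helps with review, the list above translates into a config
fragment as sketched below.  The symbol names are taken from the list;
where the fragment belongs in the packaging is not decided here, and the
helper name is made up.

```python
# Kconfig symbols from the list above, without the CONFIG_ prefix.
CANDIDATES = [
    "CODA_FS", "REISERFS_FS", "HPFS_FS", "HFSPLUS_FS", "HFS_FS",
    "JFS_FS", "NILFS2_FS", "MINIX_FS", "ECRYPT_FS",
]


def disable_fragment(symbols):
    """Render symbols in the kernel's '# CONFIG_X is not set' form."""
    return "\n".join(f"# CONFIG_{name} is not set" for name in symbols) + "\n"
```

Calling disable_fragment(CANDIDATES) yields nine lines, one per module,
so dropping an entry from the list after discussion is a one-line change.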

Thanks
noah



Bug#977615: arm64: memory corruption bug

2020-12-17 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.160-2
Severity: important
Tags: upstream fixed-upstream
Control: fixed -1 5.9.15-1
Control: fixed -1 5.10~rc7-1~exp1
Control: found -1 5.9.11-1

Opening a bug for visibility.  Arguably this could be Severity: grave given
that memory corruption can lead to data loss.  It has been fixed upstream in
4.19.161, 5.9.12, and 5.10.  I'm not sure about the status for 4.9/stretch
LTS.

There is a memory corruption bug impacting arm64.  The upstream fix was made
in 5.10 with commit ff1712f953e2 ("arm64: pgtable: Ensure dirty bit is
preserved across pte_wrprotect()").  The upstream commit [1] describes the
issue as:

With hardware dirty bit management, calling pte_wrprotect() on a
writable, dirty PTE will lose the dirty state and return a
read-only, clean entry.

Impact from the issue has been observed in the real world on systems running
redis, as described at https://github.com/redis/redis/issues/8124 (note in
particular comments [2] and [3], where the kernel connection is made).

1. 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff1712f953e27f0b0718762ec17d0adb15c9fd0b
2. https://github.com/redis/redis/issues/8124#issuecomment-745791340
3. https://github.com/redis/redis/issues/8124#issuecomment-745838911



Bug#977615: arm64: memory corruption bug

2020-12-17 Thread Noah Meyerhans
> Thanks. Pending currently with the ongoing rebase in the v4.19.y
> series in
> https://salsa.debian.org/kernel-team/linux/-/merge_requests/295 .
> 
> Just we need to check if this warrants a regression update issued
> earlier via stable-updates.

If possible, I'd vote for a release via stable-updates, and I'd be
happy to put together a merge request for such a release.

It seems that the bug is triggered during relatively uncommon
circumstances (CoW and MADV_FREE used together), but in places where
it is triggered, the impact is severe and there is no practical
workaround.

noah



Bug#890343: linux: make fq_codel default for default_qdisc

2021-01-07 Thread Noah Meyerhans
On Thu, Apr 23, 2020 at 03:34:06PM -0700, Matt Taggart wrote:
> #890343 was originally opened against systemd asking to install the upstream
> systemd sysctl.d/50-default.conf file that sets:
> 
> net.core.default_qdisc = fq_codel
> 
> As explained in #950701 (and the systemd debian changelog) the debian
> systemd maintainers felt that systemd in debian should not be changing
> kernel policies (and I agree).
> So #890343 was reassigned to linux to consider changing the default.
> 
> fq_codel is better in every way than pfifo_fast and I am unaware of any
> reason why it would not be a better default. (but don't trust me, ask the
> kernel networking experts)
> 
> Can we change it?

I strongly agree that we should make this change for the bullseye
release.

I'm looking into providing a patch to implement the switch to fq_codel by
default, but it appears to require something more than just a kernel
config change.  I have tried the following with the 5.10 kernel from the
current sid branch:

CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_DEFAULT_FQ_CODEL=y
CONFIG_DEFAULT_NET_SCH="fq_codel"

Then we don't see any change at all to the qdisc in use:

admin@ip-10-0-0-162:~$ grep -i fq_codel /boot/config-$(uname -r)
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_DEFAULT_FQ_CODEL=y
CONFIG_DEFAULT_NET_SCH="fq_codel"
admin@ip-10-0-0-162:~$ /sbin/sysctl net.core.default_qdisc
net.core.default_qdisc = pfifo_fast
admin@ip-10-0-0-162:~$ tc qdisc show dev ens5
qdisc mq 0: root
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
admin@ip-10-0-0-162:~$ ip link show dev ens5
2: ens5:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 02:47:e2:7c:be:ff brd ff:ff:ff:ff:ff:ff
altname enp0s5

If we statically link the fq_codel module into the kernel, then we see:

admin@ip-10-0-0-162:~$ grep -i fq_codel /boot/config-$(uname -r)
CONFIG_NET_SCH_FQ_CODEL=y
CONFIG_DEFAULT_FQ_CODEL=y
CONFIG_DEFAULT_NET_SCH="fq_codel"
admin@ip-10-0-0-162:~$ /sbin/sysctl net.core.default_qdisc
net.core.default_qdisc = fq_codel
admin@ip-10-0-0-162:~$ /sbin/tc qdisc show dev ens5
qdisc mq 0: root
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
admin@ip-10-0-0-162:~$ ip link show dev ens5
2: ens5:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 02:47:e2:7c:be:ff brd ff:ff:ff:ff:ff:ff
altname enp0s5

So in this case, we have fq_codel configured, but not as the root
qdisc for the interface.  If we manually set it:

admin@ip-10-0-0-162:~$ sudo /sbin/tc qdisc add root dev ens5 fq_codel

Then we get the following configuration:

admin@ip-10-0-0-162:~$ /sbin/tc qdisc show dev ens5
qdisc fq_codel 8001: root refcnt 3 limit 10240p flows 1024 quantum 9015 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
admin@ip-10-0-0-162:~$ ip link show dev ens5
2: ens5:  mtu 9001 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 02:47:e2:7c:be:ff brd ff:ff:ff:ff:ff:ff
altname enp0s5

I believe that this is what we want.  Is that accurate?
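
To compare the cases above mechanically, a small helper can report which
qdisc kind sits at the root of an interface.  This is only a sketch:
root_qdisc is a hypothetical name, and it assumes the "tc qdisc show"
output format seen in the transcripts above, read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the kind of the root qdisc, given the output
# of `tc qdisc show dev <iface>` on stdin.  Prints nothing if no line
# carries the "root" keyword.
root_qdisc() {
    awk '$1 == "qdisc" {
        for (i = 3; i <= NF; i++)
            if ($i == "root") { print $2; exit }
    }'
}

# Usage on a live system (the interface name is an assumption):
#   tc qdisc show dev ens5 | root_qdisc
```

With the statically linked configuration this would report mq, and after
the manual "tc qdisc add root" it would report fq_codel.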

The recent thread at 
https://www.mail-archive.com/netdev@vger.kernel.org/msg380410.html
also seems relevant.

noah



Bug#890343: linux: make fq_codel default for default_qdisc

2021-01-20 Thread Noah Meyerhans
On Sun, Jan 17, 2021 at 10:29:44PM -0300, Ivan Baldo wrote:
>     I think we want the mq qdisc to distribute the load between cores, to
> support very high speed network cards or too slow CPUs.

Yep, you're right. Though it's not about CPU cores, but about tx queues
on the NIC hardware.

>     Also, if net.core.default_qdisc = fq_codel is used, it also has the mq
> qdisc and fq_codel childs for each CPU core, so that's the default behavior
> in other distros.

Yep, so I think the behavior when we set the default qdisc to fq_codel
in the kernel config has the effect we're looking for, after all.

noah



Bug#890343: linux: make fq_codel default for default_qdisc

2021-01-20 Thread Noah Meyerhans
On Wed, Jan 20, 2021 at 10:22:16PM +0100, Vincent Blut wrote:
> My proposal would differ from yours though in that it would not touch the
> kernel configuration but would instead consist in patching procps to provide a
> configuration file (let's say default_qdisc.conf) to set the value of the
> net.core.default_qdisc variable to fq_codel via sysctl.
> This would allow to benefit from FQ_Codel without depending on a specific
> Linux version.

We could do that.  However, in the past (earlier in this bug, even) it's
been pointed out that other packages should not be responsible for
setting kernel policies, so changes like this should be the
responsibility of the kernel packages.  That seems like a sensible
position to take.

One possible way for the kernel team to take ownership of this would be
for it to introduce a new "debian-kernel-sysctl" package or something
like that to provide some sysctl.d drop-in files.  It could then set
net.core.default_qdisc, and potentially others in various scenarios.
Such a package can be installed independently of whether the user is
running a Debian-provided kernel package.
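
As a rough illustration of what such a drop-in could contain, here is a
sketch that writes one.  The function name, the file name
50-default-qdisc.conf, and the idea of passing the target directory as an
argument are all assumptions for illustration; no such package exists as
far as this thread is concerned.

```shell
#!/bin/sh
# Sketch only: write_qdisc_dropin and the file name are hypothetical.
# $1 is the directory to write into (e.g. /etc/sysctl.d on a real system).
write_qdisc_dropin() {
    dir=$1
    printf '%s\n' \
        '# Default queueing discipline for qdiscs attached after this is applied.' \
        'net.core.default_qdisc = fq_codel' \
        > "$dir/50-default-qdisc.conf"
}

# Usage (as root): write_qdisc_dropin /etc/sysctl.d && sysctl --system
```

Note that the sysctl only influences qdiscs attached after it is set, so
interfaces that are already up would keep whatever they had.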

The other alternative is the one I've proposed, which involves changing
the compile-time defaults in Debian's kernel packages.  This obviously
only affects users of those packages.  However, I think that's fine;
people who are building their own packages may very well be starting
from Debian's config, in which case they'll still get this change, or
may be constructing their own configuration from scratch, in which case
they're assuming ownership of all the parameters.

noah



Bug#890343: linux: make fq_codel default for default_qdisc

2021-01-20 Thread Noah Meyerhans
Control: tags -1 + patch

A proposed patch is at
https://salsa.debian.org/kernel-team/linux/-/merge_requests/309



Bug#890343: linux: make fq_codel default for default_qdisc

2021-01-20 Thread Noah Meyerhans
On Wed, Jan 20, 2021 at 11:39:16PM +0100, Vincent Blut wrote:
> > We could do that.  However, in the past (earlier in this bug, even) it's
> > been pointed out that other packages should not be responsible for
> > setting kernel policies, so changes like this should be the
> > responsibility of the kernel packages.  That seems like a sensible
> > position to take.
> 
> If this is the position of the kernel team, then fine. But some packages *do*
> tweak kernel parameters using the sysctl interface mechanism. So does the
> kernel team provide documentation about what is acceptable?

I think the distinction is that the other packages that tweak sysctl
values don't claim to be doing so on behalf of the kernel team.  If the
kernel team is responsible for the values being set, then the settings
should come from a package that the kernel team owns, not some other
package.

AFAIK, there are no guidelines or policy anywhere in Debian about
whether or not a package can provide its own sysctl settings.

noah



Bug#809293: linux-image-3.16.0-4-amd64: network regression in 3.16.7-ckt20-1+deb8u1 breaks ipv6 ike/ipsec negotiations

2015-12-28 Thread Noah Meyerhans
Package: src:linux
Version: 3.16.7-ckt20-1+deb8u1
Severity: normal

Following the recent kernel security update, racoon(8) from ipsec-tools
can no longer negotiate an IPSec security association with an ipv6 peer.
IPv4 does not appear affected.

Racoon logs the following:
Dec 28 13:20:42 amarth racoon: ERROR: recvmsg (Resource temporarily unavailable)
Dec 28 13:20:42 amarth racoon: ERROR: failed to receive isakmp packet at isakmp.c:238: Resource temporarily unavailable

This happens when trying to read an IKE (udp port 500) message from the
peer.

Downgrading to 3.16.7-ckt11-1+deb8u3 resolves the problem.

My first guess was that it was related to the recently added
net-add-validation-for-the-socket-syscall-protocol.patch fix, so I tried
backing that out, but it didn't help. Then I realized that there were a
lot more changes between 3.16.7-ckt11-1+deb8u3 and 3.16.7-ckt20-1+deb8u1
than what were described in the DSA. I'm attempting to identify the
specific commit (at least to the debian packaging repo, if not the
actual ckt kernel)

Thanks
Noah

-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt20-1+deb8u1 (2015-12-14)

** Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 root=/dev/mapper/amarth--vg-root ro include console=tty1 console=ttyS0,115200

** Not tainted

** Kernel log:
[7.248625] systemd[1]: Starting Syslog Socket.
[7.264080] systemd[1]: Listening on Syslog Socket.
[7.269054] systemd[1]: Starting Journal Service...
[7.292065] systemd[1]: Started Journal Service.
[7.537734] systemd-udevd[205]: starting version 215
[7.922679] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input4
[7.930230] ACPI: Power Button [PWRF]
[8.020609] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[8.041411] ipmi message handler version 39.2
[8.093982] intel_rng: FWH not detected
[8.098337] IPMI System Interface driver.
[8.102708] ipmi_si: probing via SMBIOS
[8.106607] ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 0
[8.112626] ipmi_si: Adding SMBIOS-specified kcs state machine
[8.118574] ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 0
[8.152503] input: PC Speaker as /devices/platform/pcspkr/input/input6
[8.177073] EDAC MC: Ver: 3.0.0
[8.192219] EDAC MC0: Giving out device to module i5000_edac.c controller I5000: DEV :00:10.0 (POLLED)
[8.201979] EDAC PCI0: Giving out device to module i5000_edac controller EDAC PCI controller: DEV :00:10.0 (POLLED)
[8.313606] [drm] Initialized drm 1.1.0 20060810
[8.328595] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
[8.364313] ipmi_si ipmi_si.0: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[8.373025] ipmi_si ipmi_si.0: IPMI kcs interface initialized
[8.414596] iTCO_vendor_support: vendor-support=0
[8.419814] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[8.425511] iTCO_wdt: Found a 631xESB/632xESB TCO device (Version=2, TCOBASE=0x0860)
[8.433453] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[8.613050] [drm] radeon kernel modesetting enabled.
[8.618713] [drm] initializing kernel modesetting (RV100 0x1002:0x515E 0x1028:0x01B2).
[8.626780] [drm] register mmio base: 0xFC2D
[8.631475] [drm] register mmio size: 65536
[8.635822] radeon :0e:0d.0: VRAM: 128M 0xD800 - 0xDFFF (16M used)
[8.644525] radeon :0e:0d.0: GTT: 512M 0xB800 - 0xD7FF
[8.652220] [drm] Detected VRAM RAM=128M, BAR=128M
[8.657090] [drm] RAM width 16bits DDR
[8.660836] [TTM] Zone  kernel: Available graphics memory: 1026588 kiB
[8.667472] [TTM] Initializing pool allocator
[8.673206] [TTM] Initializing DMA pool allocator
[8.679313] [drm] radeon: 16M of VRAM memory ready
[8.685456] [drm] radeon: 512M of GTT memory ready.
[8.693034] [drm] GART: num cpu pages 131072, num gpu pages 131072
[8.720561] [drm] PCI GART of 512M enabled (table at 0x7A60).
[8.727459] radeon :0e:0d.0: WB disabled
[8.731806] radeon :0e:0d.0: fence driver on ring 0 use gpu addr 0xb800 and cpu addr 0x88007921
[8.742676] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[8.749358] [drm] Driver supports precise vblank timestamp query.
[8.755595] [drm] radeon: irq initialized.
[8.759790] [drm] Loading R100 Microcode
[8.759855] ipmi device interface
[8.776300] radeon :0e:0d.0: firmware: direct-loading firmware radeon/R100_cp.bin
[8.784408] [drm] radeon: ring at 0xB8001000
[8.789518] [drm] ring test succeeded in 1 usecs
[8.794319] [drm] ib test succeeded in 0 usecs
[8.799096] [drm] Radeon Display Connectors
[8.803346] [drm] Connector 0:
[8.806463] [drm]   VGA-1
[8

Bug#809293: linux-image-3.16.0-4-amd64: network regression in 3.16.7-ckt20-1+deb8u1 breaks ipv6 ike/ipsec negotiations

2015-12-29 Thread Noah Meyerhans
On Mon, Dec 28, 2015 at 03:22:52PM -0800, Noah Meyerhans wrote:
> Following the recent kernel security update, racoon(8) from ipsec-tools
> can no longer negotiate an IPSec security association with an ipv6 peer.
> IPv4 does not appear affected.
> 
> Racoon logs the following:
> Dec 28 13:20:42 amarth racoon: ERROR: recvmsg (Resource temporarily 
> unavailable)
> Dec 28 13:20:42 amarth racoon: ERROR: failed to receive isakmp packet at 
> isakmp.c:238: Resource temporarily unavailable
> 
> This happens when trying to read an IKE (udp port 500) message from the
> peer.
> 
> Downgrading to 3.16.7-ckt11-1+deb8u3 resolves the problem.

git-bisect of the debian packaging repo suggests that the problem was
introduced in 3.16.7-ckt17.

Looking at the git logs for that release, the only commit that is
obviously related to ipv6 and udp is f3106f:
Author: Eric Dumazet 
Date:   Tue Jul 14 08:10:22 2015 +0200

ipv6: lock socket in ip6_datagram_connect()

commit 03645a11a570d52e70631838cb786eb4253eb463 upstream.

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

But I haven't tested anything yet...

noah



Bug#809293: linux-image-3.16.0-4-amd64: network regression in 3.16.7-ckt20-1+deb8u1 breaks ipv6 ike/ipsec negotiations

2015-12-31 Thread Noah Meyerhans
On Tue, Dec 29, 2015 at 11:39:46AM -0800, Noah Meyerhans wrote:
> > Dec 28 13:20:42 amarth racoon: ERROR: recvmsg (Resource temporarily 
> > unavailable)
> > Dec 28 13:20:42 amarth racoon: ERROR: failed to receive isakmp packet at 
> > isakmp.c:238: Resource temporarily unavailable
> > 
> > This happens when trying to read an IKE (udp port 500) message from the
> > peer.
> > 
> > Downgrading to 3.16.7-ckt11-1+deb8u3 resolves the problem.
> 
> git-bisect of the debian packaging repo suggests that the problem was
> introduced in 3.16.7-ckt17.

Bisecting the upstream kernel changes suggests the following commit as
being the origin of the problem:

commit bd0900e5eed6502b314402d36ec11f6d1a67de82
Author: Herbert Xu 
Date:   Mon Jul 13 20:01:42 2015 +0800

net: Fix skb csum races when peeking

commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a upstream.

When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques 

I will confirm this by trying a build with this change reverted, and
will also check to see if there have been followup changes upstream of
the ckt kernels that might have corrected the problem.

noah



Bug#952108: Cloud variant: please enable CONFIG_VHOST_SCSI

2020-03-02 Thread Noah Meyerhans
On Sun, Feb 23, 2020 at 12:28:53AM -0800, Josh Triplett wrote:
> The normal Debian kernel configuration has CONFIG_VHOST_SCSI enabled,
> but the cloud configuration does not seem to have it enabled. Please
> enable CONFIG_VHOST_SCSI=m on the cloud configuration as well.

Out of curiosity, where is this actually needed?



Re: virtio_mmio.device parameter unknown as kernel config option is disabled

2020-03-05 Thread Noah Meyerhans
On Thu, Mar 05, 2020 at 10:31:37PM +0100, João Mikos wrote:
> I am experimenting with running a Debian Buster microVM under the
> Firecracker hypervisor, using a stock Debian kernel (converted to
> vmlinux format) and an initrd.img file. When booting the microVM with
> this setup, I receive the following error message inside the microVM,
> which halts booting:

Hi João.  I have extensive experience with Firecracker, with both Debian
and other distros as the guest OS.  In general, the expectation with
Firecracker is that it will not work with distro kernels out-of-the-box,
but that it requires a custom kernel specifically configured for MicroVM
deployments.  Keep in mind that Firecracker originally didn't even
provide initramfs support, meaning that all device and filesystem
drivers needed to be statically linked into the kernel.  Although
Firecracker has since added initramfs support, its goal of minimal
feature sets and extremely fast boot times still encourages you to use a
customized kernel.

> I could trace this to the flag CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES
> being disabled in the default Debian kernel config.
...
> 1) Is there any specific reason for this flag to be disabled in the
> Debian kernel?

It's not generally useful, so it hasn't been deemed appropriate for
inclusion in the standard kernel builds.  It'll be up to the kernel team
to decide if supporting Firecracker with the default kernel is a
sufficiently interesting use-case to warrant changing it.  IMO, it is
not.

> 2) If the flag is disabled for no specific reason, is there any chance
> for this flag to be enabled in future Debian versions, so the
> virtio_mmio module recognizes the .device parameter by default?

You should open a wishlist severity bug against the src:linux package to
officially request this change.

noah



Re: virtio_mmio.device parameter unknown as kernel config option is disabled

2020-03-07 Thread Noah Meyerhans
On Fri, Mar 06, 2020 at 09:44:26AM +0100, João Mikos wrote:
> Reflecting about what you said the default kernel might indeed not be
> the best place to enable this, but maybe the cloud kernel could be a
> good candidate instead. Official support for Firecracker might be a
> stretch, especially due to the goals you've mentioned, but I am seeing
> projects (e.g. Ignite) using Firecracker as a basis. Any such derived
> project would be benefited by this change. But indeed, the kernel team
> has the best view on whether they want to support (with all that
> entails) any new flag or feature.

On the contrary, tools that integrate Firecracker directly, like Ignite,
Kata Containers, and firecracker-containerd, are the ones that will
benefit the most from using a custom kernel.  Indeed, all of those
projects provide their own kernel images for use with Firecracker.
These kernel images are configured to omit drivers for devices that are
not present in Firecracker's device model, some of which lead to long
delays in the boot process while they probe for nonexistent devices.
Support for Firecracker in the default Debian kernel could be
interesting for experimentation, but I think you'd want to switch to a
customized kernel as soon as possible before running in production.

Given that the kernel in a Firecracker MicroVM is loaded by the host,
rather than being embedded in the guest VM image, I think a kernel
supporting Firecracker might be better handled in a manner similar to
the user-mode-linux kernel package.
https://packages.debian.org/buster/user-mode-linux  That would be a
package that'd be installed on the system hosting MicroVMs, where it
could provide an uncompressed kernel image for use by Firecracker.

noah



Bug#955366: Enable CONFIG_KSM in the cloud kernel

2020-03-30 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.98-1
Severity: wishlist
Owner: no...@debian.org

Relaying a request on behalf of an Amazon EC2 customer.  The generic
kernel enables CONFIG_KSM, but the cloud kernel disables this.  The
customer makes extensive use of KSM features in EC2 and would like to
request that the feature be enabled for the cloud kernel.

Since the cloud kernel is intended to be optimized for boot time and
size for common cloud use cases, it's worth understanding how much code
is actually enabled by CONFIG_KSM, and how this affects the size and
boot time (if at all).
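
For comparing the generic and cloud configurations, something like the
following sketch could report how an option is set in a given kernel
config file.  config_state is a hypothetical helper name; the
/boot/config-$(uname -r) path in the usage line is where Debian kernel
packages install their config.

```shell
#!/bin/sh
# Hypothetical helper: print the value of CONFIG_<option> from a kernel
# config file ("y", "m", or a literal value), or "n" when the option is
# absent or listed as "# CONFIG_<option> is not set".
config_state() {
    file=$1
    opt=$2
    val=$(sed -n "s/^CONFIG_${opt}=\(.*\)\$/\1/p" "$file" | head -n 1)
    printf '%s\n' "${val:-n}"
}

# Usage on a running system:
#   config_state "/boot/config-$(uname -r)" KSM
```

Running this against both the generic and the cloud kernel's config would
show "y" for KSM on the former and "n" on the latter, per the report above.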

noah



Bug#955232: Please add 9p kernel module to the cloud image.

2020-03-30 Thread Noah Meyerhans
On Mon, Mar 30, 2020 at 10:33:24AM +0200, Vincent Bernat wrote:
> >> Please consider adding kernel module '9p' to the cloud image. It is
> >> impossible to passthrough filesystem from hypervisor with the current cloud
> >> image and there are no additional packages to install it. The only way that
> >> I know to passthrough filesystem currently is to remove
> >> linux-image-cloud-amd64 and install linux-image-amd64.
> >
> > The "cloud" images are specifically meant for running on public clouds,
> > and don't support everything that could possibly be exposed to a VM.
> >
> > Is there a public cloud that supports 9p passthrough?
> 
> Dunno about OP, but for me, it's also useful when using nested
> virtualisation. Also, the limitation to "public" cloud seems quite
> restrictive. Cloud kernel is used by cloud images, which can be used
> directly with libvirt, for example.

The "nocloud" VM images generated by the cloud team use the generic
kernel, and they're what I'd recommend for use with libvirt and similar
systems.  The others are targeted toward specific cloud vendors and may
not work entirely as expected outside those environments.  Some of the
supported cloud vendors do support nested virtualization and/or
bare-metal host provisioning, so there are cases where 9p could be used
to share data between hosts...

I'm inclined to agree with Ben.  The goal of the cloud kernel is not to
support all possible cloud configurations.  Host <-> guest filesystem
sharing isn't a super common thing in cloud environments, and the
generic kernel is always available if needed.  Its featureset is a
superset of that of the cloud kernel.

Enabling 9p won't result in a drastic increase in the installed size,
but it is something of a slippery slope.

Virtio-fs appears to be the future for this type of use-case, and we do
enable this for bullseye cloud kernels.  I'd much rather encourage its
use instead of the crufty old abuse of a network filesystem that is 9p.

noah



What belongs in the Debian cloud kernel?

2020-04-01 Thread Noah Meyerhans
For buster, we generate a cloud kernel for amd64.  For sid/bullseye,
we'll also support a cloud kernel for arm64.  At the moment, the cloud
kernel is the only one used in the images we generate for Microsoft Azure
and Amazon EC2.  It's used in the GCE images we generate as well, but
I'm not sure anybody actually uses those.  We generate two OpenStack
images, one that uses the cloud kernel and another that uses the generic
kernel.

There are open bugs against the cloud kernel requesting that
configuration options be turned on there. [1][2][3]  These, IMO,
highlight a need for some documentation around what is in scope for the
cloud kernel, and what is not.  This will help us answer requests such
as these more consistently, and it will also help our users better
understand whether they can expect the cloud kernel to meet their needs
or not.

At the moment, the primary optimization applied to the cloud kernel
focuses on disk space consumed.  We disable compilation of drivers that
we feel are unlikely to ever appear in a cloud environment.  By doing
so, we reduce the installed size of the kernel package by roughly 70%.
There are other optimizations we may apply (see [4] for examples), but we
don't yet.

Should we simply say "yes" to any request to add functionality to the
cloud kernel?  None of the drivers will add *that* much to the size of
the image, and if people are asking for them, then they've obviously got
a use case for them.  Or is this a slippery slope that diminishes the
value of the cloud kernel?  I can see both sides of the argument, so I'd
like to hear what others have to say.

If we're not going to say "yes" to all requests, what criteria should we
use to determine whether or not to enable a feature?  I'd rather not
leave it as a judgement call.

noah

1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232
4. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947759



Re: What belongs in the Debian cloud kernel?

2020-04-02 Thread Noah Meyerhans
On Thu, Apr 02, 2020 at 10:55:16AM -0700, Ross Vandegrift wrote:
> I don't think just saying "yes" automatically is the best approach.  But
> I'm not sure we can come up with a clear set of rules.  Evaluating the
> use cases will involve judgment calls about size vs functionality.  I
> guess I think that's okay.

You certainly may be right.  I wasn't able to convince myself either
way, which is why I posted for additional opinions.

> The first two bugs are about nested virtualization.  I like the idea of
> deciding to support that or not.  I don't know much about nested virt,
> so I don't have a strong opinion.  It seems pretty widely supported on
> our platforms.  I don't know if it raises performance or security
> concerns.  So these seem okay to me, as long as we decide to support
> nested virt, and there aren't major cons that I'm unaware of.

IMO nested virtualization is not something I'd want to see in a
"production" environment.  Hardware-assisted isolation between VMs is
critical for hosting mixed-trust workloads (e.g. VMs owned and
controlled by unrelated parties without a mutual trust relationship).
Current hardware virtualization extensions, e.g. Intel VT-x, only have a
concept of a single level of virtualization.  Nested virtualization is
implemented by trapping and emulating the CPU extensions, and by doing a
bunch of mapping of nested guest state to allow it to effectively run as
a peer VM of the parent guest in hardware.  Some details at [1].  So not
only is it painfully complex, but it's also quite slow.

This is not to say that there aren't any legitimate use cases for nested
virtualization.  Only that I'm not sure it's something we want to be
optimizing for.

> Can you share more about the KSM use case?  I'm worried about raising
> security concerns for this one.  KSM has had a history of enabling
> attacks that are sorta serious, but also sorta theoretical.  This might
> cause upset from infosec folks that freak out about any vulnerability -
> even when they don't really understand the magnitude of the risk.

I don't have any direct experience with KSM.  I can certainly see how it
could help with certain classes of workload, though, if it's known that
multiple processes with mostly identical state are running.

I'm not sure I'd focus too much on the security implications of KSM,
though, since it's widely enabled in Debian's generic kernel and kernels
distributed by other distros.  I don't want to cargo-cult it, but
neither do I want to ignore prior art.  I don't think there's any reason
to drop support for applications making use of KSM in our cloud kernels,
though.  I can't think of any reason why the feature would be less
useful in a cloud environment, and it could certainly save money by
allowing the use of smaller instances.

noah

1. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/nested-vmx.rst



Re: What belongs in the Debian cloud kernel?

2020-04-03 Thread Noah Meyerhans
On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
> There are open bugs against the cloud kernel requesting that
> configuration options be turned on there. [1][2][3]



> 1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
> 2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
> 3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232

The discussion thus far has focused on these specific requests more
than I had hoped.  For now, so that we can deal with the current requests,
here's what happens if we enable them:

These are the kernel .config changes:
+CONFIG_VHOST_SCSI=m
+CONFIG_KSM=y
+CONFIG_NET_9P=m
+CONFIG_NET_9P_VIRTIO=m
+# CONFIG_NET_9P_XEN is not set
+# CONFIG_NET_9P_DEBUG is not set
+CONFIG_TARGET_CORE=m
+CONFIG_TCM_IBLOCK=m
+CONFIG_TCM_FILEIO=m
+CONFIG_TCM_PSCSI=m
+CONFIG_TCM_USER2=m
+# CONFIG_LOOPBACK_TARGET is not set
+CONFIG_ISCSI_TARGET=m
+# CONFIG_XEN_SCSI_BACKEND is not set
+CONFIG_9P_FS=m
+CONFIG_9P_FSCACHE=y
+CONFIG_9P_FS_POSIX_ACL=y
+CONFIG_9P_FS_SECURITY=y
+CONFIG_XXHASH=y

Because CONFIG_KSM changes statically linked code, it results in a size
increase of roughly 12 kB of the compressed kernel.  The uncompressed
kernel increases by about 852 kB in size.  The boot time appears to be
unchanged.  I don't like the size increase, but this feature is enabled
everywhere else and apparently does break some users if it's disabled,
so we should enable it.

The kernel package installed size increases by roughly 2 MB due to the
additional modules we generate for 9P and VHOST_SCSI.

So, I think the answer for these specific requests can be affirmative.
The cost is small enough that if these features are useful to somebody,
then we might as well enable them.

noah



Re: What belongs in the Debian cloud kernel?

2020-04-04 Thread Noah Meyerhans
On Sat, Apr 04, 2020 at 10:17:20AM +0200, Thomas Goirand wrote:
> > The first two bugs are about nested virtualization.  I like the idea of
> > deciding to support that or not.  I don't know much about nested virt,
> > so I don't have a strong opinion.  It seems pretty widely supported on
> > our platforms.  I don't know if it raises performance or security
> > concerns.  So these seem okay to me, as long as we decide to support
> > nested virt, and there aren't major cons that I'm unaware of.
> 
> There's a big problem when activating nested virt. I have read that Live
> migration of VMs can become impossible (ie: for all VMs that are also
> host OS for virtualization). As much as I understand, this is because of
> the difficulty to support nested MMU. I'm not sure if the situation has
> changed or not, but last time I checked this was the case. Ben, do you
> know if this has evolved?

Remember, nested virtualization works today; nothing we have done would
have prevented that.  The question is about whether or not we care about
enabling features to support use cases that only arise when nested
virtualization is in use.

The reason nested virtualization breaks live migration is that it shares
state between the VM and the underlying hypervisor.  The VM is, in a
sense, no longer self contained.  The nested VM's state is tracked by the
parent VM in a VMCS structure, as shown in the nested-vmx.rst doc I
linked previously, and the values in that struct need to be mapped to a
corresponding list in the hypervisor.  Migration would entail some
coordination between the hypervisor and the outer VM, as the shared
state would need to be kept in sync throughout the process.

The sharing of state between the VM and the hypervisor hints at some of
the potential security concerns around nested virtualization in
mixed-trust environments.

> So, when I'm being asked about it, my answer from an OpenStack operator
> point of view, is always a big "NO !". I want to be able to service my
> compute nodes. This means being able to live-migrate the workload away,
> otherwise, customers may notice.

Whether or not you support nested virt on your infrastructure is a
deployment choice, not a choice Debian needs to make.

noah



Bug#955366: 955366 is Important

2020-04-06 Thread Noah Meyerhans
Control: severity -1 important

Raising this to Important.  The lack of KSM is a regression from the
generic kernel that is impacting the usefulness of this kernel build in
the environment in which it's intended to be used.  This should get
fixed in buster.



Bug#955366: 955366 is Important

2020-04-06 Thread Noah Meyerhans
Control: tags -1 + patch

Proposed fix for buster at 
https://salsa.debian.org/kernel-team/linux/-/merge_requests/229



Bug#956703: linux-image-5.5: 5.5 kernel seems to break pulseaudio HDMI detection

2020-04-14 Thread Noah Meyerhans
On Tue, Apr 14, 2020 at 02:06:30PM +0100, Simon John wrote:
> Booting into 5.5 on Sid gives me no audio out via HDMI.

I can confirm similar HDMI audio breakage with 5.5.13 on a Thinkpad X1
Carbon (gen2) with the following audio devices:

00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
Subsystem: Lenovo Haswell-ULT HD Audio Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
Subsystem: Lenovo 8 Series HD Audio Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel



Bug#956703: linux-image-5.5: 5.5 kernel seems to break pulseaudio HDMI detection

2020-04-15 Thread Noah Meyerhans
On Tue, Apr 14, 2020 at 02:06:30PM +0100, Simon John wrote:
> Package: src:linux
> Version: 5.5.13-2
> 
> Booting into 5.5 on Sid gives me no audio out via HDMI.
> 

HDMI audio seems fixed for me with 5.5.17, recently uploaded to
unstable.  Is it better for you?



Bug#890343: linux: make fq_codel default for default_qdisc

2020-04-23 Thread Noah Meyerhans
On Thu, Apr 23, 2020 at 03:34:06PM -0700, Matt Taggart wrote:
> fq_codel is better in every way than pfifo_fast and I am unaware of any
> reason why it would not be a better default. (but don't trust me, ask the
> kernel networking experts)

Isn't CAKE supposed to be even better than fq_codel, including better
handling of both large numbers of flows (e.g. busy routers) and small
systems with limited resources?

https://www.bufferbloat.net/projects/codel/wiki/Cake/

If we consider a change (which I think we should), is there a reason we
wouldn't go with CAKE?
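
For anyone comparing systems while this is discussed, the qdisc that new
interfaces get by default is exposed through the net.core.default_qdisc
sysctl.  A minimal sketch of reading it, assuming a Linux system with /proc
mounted; the function name and its path parameter are only illustrative:

```python
from pathlib import Path


def read_default_qdisc(path="/proc/sys/net/core/default_qdisc"):
    """Return the default queueing discipline name (e.g. 'pfifo_fast')."""
    # The sysctl file holds a single line with the qdisc name.
    return Path(path).read_text().strip()
```

Calling read_default_qdisc() with no arguments reads the live value; the
path parameter exists so the function can be pointed at a saved copy.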

noah



Bug#968623: src:linux: buster fails to boot successfully on AWS arm64 bare-metal instances

2020-08-18 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.132-1
Severity: important
Tags: buster

Buster fails to fully initialize the hardware on AWS arm64 bare-metal
instances (e.g. m6g.metal).  The issue is fixed with the patch series from
https://lore.kernel.org/linux-arm-kernel/20190615002359.29577-2-b...@kernel.crashing.org/
that adds support for preservation of the PCI bridge configuration upon
request by the firmware.

I have backported the above patch series to 4.19 and will open a merge
request.

noah



Bug#968623: src:linux: buster fails to boot successfully on AWS arm64 bare-metal instances

2020-08-18 Thread Noah Meyerhans
Control: tags -1 + patch

MR available at https://salsa.debian.org/kernel-team/linux/-/merge_requests/263



Bug#941284: Wishlist/RFC: Use CONFIG_HZ=100 in linux-image-cloud-*

2019-09-27 Thread Noah Meyerhans
On Fri, Sep 27, 2019 at 01:15:29PM -0700, Flavio Veloso wrote:
> Since linux-image-cloud-* packages are created for cloud environments --
> read: servers which do not need desktop-level responsiveness --, wouldn't it
> be beneficial to build the kernels with CONFIG_HZ set to 100?

For what it's worth, the Amazon Linux 2 4.14.x kernel also ships with
CONFIG_HZ=250. We obviously don't need to use the same settings, but
that kernel specifically targets cloud deployments, and its maintainers
do not see a need to set CONFIG_HZ=100.

I don't know what considerations were taken into account when choosing
the value for that variable at AWS.

noah



Bug#941291: Amazon ENA driver update for stable

2019-09-27 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.67-2

ENA is an ethernet adaptor used on Amazon EC2 cloud instances.  Upstream
has recently merged a number of bug fixes and enhancements, and it would
be nice to have these available in stable.

Specific upstream changesets that I'm interested in are:

https://lore.kernel.org/netdev/20190212.110632.499796234007955726.da...@davemloft.net/
https://lore.kernel.org/netdev/20190504.001749.585621906089996460.da...@davemloft.net/
https://lore.kernel.org/netdev/20190603.133056.1755579912817273080.da...@davemloft.net/
https://lore.kernel.org/netdev/20190612.112249.1252227316058010690.da...@davemloft.net/
https://lore.kernel.org/netdev/20190623.083930.762200013774329614.da...@davemloft.net/
https://lore.kernel.org/netdev/20190625.140934.145849418200881936.da...@davemloft.net/
https://lore.kernel.org/netdev/20190916.220620.564607753604799412.da...@davemloft.net/#t

In theory it would be possible to pare down the list of commits to 
something more tightly focused, but the level of effort involved (and 
potential for regressions) would be quite a bit higher. Further, I would 
rather avoid diverging too far from what's in mainline.

I will provide a merge request on salsa with proposed changes.



Bug#941291: Acknowledgement (Amazon ENA driver update for stable)

2019-10-01 Thread Noah Meyerhans
Proposed implementation in
https://salsa.debian.org/kernel-team/linux/merge_requests/172



Bug#931341: linux-image-4.19.0-5-cloud-amd64 does not have /dev/rtc, used by GCE images

2019-10-15 Thread Noah Meyerhans
Proposed fixes for unstable and stable (respectively) are at:

https://salsa.debian.org/kernel-team/linux/merge_requests/179
https://salsa.debian.org/kernel-team/linux/merge_requests/178



Re: Latest kernel and headers in buster-backports

2019-11-26 Thread Noah Meyerhans
On Tue, Nov 26, 2019 at 10:07:45AM +0100, Andreas Heinlein wrote:
> 
> I noticed that linux-image-amd64 has been missing from buster-backports 
> for a few days. I guess this is because the signed kernel images are not 
> available yet. Still, linux-headers-amd64 exists in buster backports and 
> pulls in linux-headers-5.3...
> 
> We use the latest kernel image and headers from buster-backports for 
> automated installations using FAI, and this situation breaks a few things, 
> including dkms modules not being compiled and so on.
> 
> I know that backports isn't supposed to be as well supported as the "main" 
> repo, but still I'd like to know if it is possible to keep headers and image 
> metapackages in sync in backports.
> 

The thread beginning here has the details:
https://lists.debian.org/debian-backports/2019/11/msg9.html

It is a known issue and will be resolved "soon".

noah



Bug#948519: insufficient boot-time entropy on arm64 virtual machines

2020-01-09 Thread Noah Meyerhans
Package: src:linux
Version: 4.19.67-2+deb10u2
Severity: important 

See the thread at
https://lists.debian.org/debian-cloud/2020/01/threads.html#00013 for
some context.

When launching arm64 VMs on Amazon EC2, a lack of entropy at boot
results in the full boot process taking several minutes, when the
expectation is that it take a small number of seconds (<10).

Analysis of the boot process shows the ssh key generation is the
culprit, taking nearly 3 minutes.

admin@ip-10-0-1-87:~$ cloud-init analyze blame
-- Boot Record 01 --
 165.77300s (init-network/config-ssh)

The 5.4 kernel currently in sid does not experience this lack of
entropy.  It has been suggested that upstream commit 50ee7529ec45
("random: try to actively add entropy rather than passively wait for
it") may be the difference here, but I have not confirmed this.

A suggested workaround has been to install haveged in the image, but
this tends to make crypto people frown.
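
One way to observe the condition directly, rather than inferring it from a
slow ssh key generation, is to ask the kernel whether its random number
generator has been initialized.  The sketch below assumes Linux and Python
3.6 or later, where os.getrandom() and os.GRND_NONBLOCK are available; the
helper name crng_ready is made up for this example:

```python
import os


def crng_ready():
    """Report whether a getrandom() call would return without blocking.

    Returns True or False on Linux, or None where os.getrandom() is
    not available (for example on non-Linux platforms).
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        # With GRND_NONBLOCK the call fails instead of waiting when the
        # kernel has not yet gathered enough entropy.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


if __name__ == "__main__":
    print(crng_ready())
```

Run early in boot on an affected instance, this would be expected to
report False until the pool is initialized, though that has not been
verified as part of this report.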



Bug#931644: Buster kernel entropy pool too low on VM boot

2020-01-09 Thread Noah Meyerhans
On Thu, Jul 11, 2019 at 09:42:17AM -0400, Michael J. Redd wrote:
> > The release notes for buster do mention this issue and provide a
> > link to:
> > 
> > https://wiki.debian.org/BoottimeEntropyStarvation
> > 
> > which has your Haveged solution as one of its suggestions.
> > 
> 
> D'oh! Serves me right for just skimming the release notes, then. After
> doing some in-depth reading, this is a problem for the Linux community
> at large. Wow. While I'm glad the kernel's getting choosier about where
> and how to harvest entropy and can personally live with the ~30 seconds
> added to VM boot times, it could be painful to, for example, bootstrap
> a Linux guest on AWS for the first time and wait for the initial SSH
> keys to be created.
> 
> Will be interesting to see how this evolves over time. In the meantime,
> as this is not actually a kernel defect, I suppose this bug can be
> closed.

I suspect that this bug might end up being mergeable with
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519.  In that bug,
I am investigating cherry-picking commit 50ee7529ec45 from the linux
mainline branch for buster.  At least on the arm64 ec2 instances where
I've tested, this change resolves the issue.

If I provide a package for you, would you be able to test it in your
environment to see if the proposed patch addresses the problem there?

Thanks
noah

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ee7529ec45



Bug#948519: insufficient boot-time entropy on arm64 virtual machines

2020-01-09 Thread Noah Meyerhans
On Thu, Jan 09, 2020 at 02:00:01PM -0500, Noah Meyerhans wrote:
> The 5.4 kernel currently in sid does not experience this lack of
> entropy.  It has been suggested that upstream commit 50ee7529ec45
> ("random: try to actively add entropy rather than passively wait for
> it") may be the difference here, but I have not confirmed this.

I've tested 4.19 with the changes from 50ee7529ec45, and it appears to
do the expected thing.  The kernel claims to have enough entropy at boot
that ssh host key generation happens quickly.

This change is likely to also address #931644.  I've asked the submitter
of that bug if they'd be able to test a proposed fix.

I will prepare a merge request and note it here when it's available.

noah



Bug#948519: Info received (Bug#948519: insufficient boot-time entropy on arm64 virtual machines)

2020-01-10 Thread Noah Meyerhans
Control: tags -1 + patch

Proposed solution submitted as 
https://salsa.debian.org/kernel-team/linux/merge_requests/202



Bug#947759: Configuration optimizations for the cloud variant

2020-01-20 Thread Noah Meyerhans
On Mon, Jan 20, 2020 at 04:38:55PM -0800, Josh Triplett wrote:
> Following up on this, here's a simplified list of optimizations for the
> cloud variant in one place, taking into account the previous reply.
> Would it help to get this in the form of a patch or MR on
> https://salsa.debian.org/kernel-team/linux/ ?

Hi Josh.  I started some work on integrating and testing your proposed
changes in a private branch at
https://salsa.debian.org/noahm/linux/tree/cloud-optimizations-bug947759,
but haven't gotten to the point of creating an MR yet.  I can likely
pick this up again this week, but if you create one first, that's fine
too.  One thing to note is that MR !193 changes the layout of the cloud
configs as it introduces an arm64 cloud flavour.  It hasn't been fully
reviewed yet, but there will be some merging to do if you work on your
MR independently of that.
https://salsa.debian.org/kernel-team/linux/merge_requests/193

Thank you for opening this bug!

noah



Bug#947759: Configuration optimizations for the cloud variant

2020-02-04 Thread Noah Meyerhans
Before optimizations:

$ systemd-analyze
Startup finished in 7.828s (kernel) + 22.332s (userspace) = 30.161s 
graphical.target reached after 20.312s in userspace

With optimizations:
$ systemd-analyze
Startup finished in 1.968s (kernel) + 6.536s (userspace) = 8.504s 
graphical.target reached after 6.197s in userspace

Only tested so far on an EC2 t3.medium, so results may vary elsewhere,
but that's nice. :)
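
To put a number on the difference, the two summary lines above can be
parsed and compared.  This is only a sketch: the regular expression is
written for the seconds-only format shown here and would need extending
for output that includes minutes.

```python
import re

# Matches the seconds-only summary line printed by systemd-analyze above.
SUMMARY = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ "
    r"([\d.]+)s \(userspace\) = ([\d.]+)s"
)


def parse_summary(line):
    """Return (kernel, userspace, total) in seconds from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("unrecognized systemd-analyze output: %r" % line)
    return tuple(float(value) for value in match.groups())


before = parse_summary(
    "Startup finished in 7.828s (kernel) + 22.332s (userspace) = 30.161s"
)
after = parse_summary(
    "Startup finished in 1.968s (kernel) + 6.536s (userspace) = 8.504s"
)

for name, old, new in zip(("kernel", "userspace", "total"), before, after):
    print(f"{name}: {old:.3f}s -> {new:.3f}s ({old / new:.1f}x)")
```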

I'll post a WIP MR on salsa next.  Some cleanup is needed still.

noah



Bug#947759: Configuration optimizations for the cloud variant

2020-02-05 Thread Noah Meyerhans
On Tue, Feb 04, 2020 at 03:48:32PM -0800, Josh Triplett wrote:
> I would suggest testing on a c5.large. t2 and t3 have shared CPUs, so
> they have less consistent boot time. c5.large is about the same cost as
> t3.large, but will have far more consistent performance.

Performance definitely seems to vary quite a bit.  Here are a few
samples from c5.large instances in us-west-2:

5.5 "generic" kernel:
$ systemd-analyze 
Startup finished in 4.323s (kernel) + 11.189s (userspace) = 15.513s 
graphical.target reached after 9.865s in userspace

5.5 "cloud" with optimizations:
$ systemd-analyze 
Startup finished in 7.273s (kernel) + 21.984s (userspace) = 29.258s 
graphical.target reached after 19.811s in userspace

$ systemd-analyze 
Startup finished in 4.319s (kernel) + 13.684s (userspace) = 18.003s 
graphical.target reached after 12.321s in userspace

$ systemd-analyze 
Startup finished in 3.327s (kernel) + 9.846s (userspace) = 13.174s 
graphical.target reached after 8.831s in userspace

The optimized timings are all taken from the initial boot of newly
launched instances.

It's certainly possible that things will look better with more data, but
it's not clear yet that this change is an improvement in all cases.
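
A quick summary of the totals quoted above shows why: with only three
"cloud" samples and a single "generic" sample, the spread is larger than
the difference being measured.  The variable names below are just labels
for the figures already listed.

```python
from statistics import mean

# Total startup times in seconds, copied from the samples above.
generic_total = 15.513
cloud_totals = [29.258, 18.003, 13.174]

cloud_mean = mean(cloud_totals)
cloud_spread = max(cloud_totals) - min(cloud_totals)
# How many optimized boots actually beat the single generic sample.
faster_than_generic = sum(1 for t in cloud_totals if t < generic_total)

print(f"cloud mean:   {cloud_mean:.3f}s")
print(f"cloud spread: {cloud_spread:.3f}s")
print(f"faster than generic: {faster_than_generic} of {len(cloud_totals)}")
```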

> > I'll post a WIP MR on salsa next.  Some cleanup is needed still.
> 
> Thank you for working on this!
> 

The MR is here: https://salsa.debian.org/kernel-team/linux/merge_requests/206

I'm happy to share binary and/or source .debs and/or AMIs if you'd like
to run some of your own tests.

> Have you confirmed that with the optimizations, you can boot without
> needing an initramfs?

I've confirmed that the kernel can boot on c5 and t3 instances without
an initramfs present at all.  However, the Xen backed EC2 instance types
would also need to statically link (at least) ATA_PIIX, ATA_GENERIC, and
XEN_BLKDEV_FRONTEND if we wanted to provide kernels that could be
generally useful in EC2 without an initramfs.
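
Whether a given kernel build satisfies that can be checked against its
config file.  The sketch below reports, for each symbol, whether it is
built in (y), a module (m), or not set (n).  The sample config text is
made up for the example and is not taken from any real Debian build.

```python
def builtin_status(config_text, symbols):
    """Map each symbol name (without CONFIG_) to 'y', 'm', or 'n'."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are skipped here and
        # therefore reported as 'n' below.
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return {name: values.get("CONFIG_" + name, "n") for name in symbols}


# Hypothetical config fragment, for illustration only.
sample_config = """\
CONFIG_ATA_PIIX=m
CONFIG_ATA_GENERIC=m
# CONFIG_EXAMPLE_OPTION is not set
CONFIG_XEN_BLKDEV_FRONTEND=y
"""

needed = ["ATA_PIIX", "ATA_GENERIC", "XEN_BLKDEV_FRONTEND"]
status = builtin_status(sample_config, needed)
not_builtin = [name for name, value in status.items() if value != "y"]
print(status)
print("would still need an initramfs for:", not_builtin)
```

On an installed system the same function can be fed the contents of the
config file shipped alongside the kernel image in /boot.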

noah



Bug#981186: linux: Enable CMN-600 interconnect on arm64

2021-02-02 Thread Noah Meyerhans
On Wed, Jan 27, 2021 at 12:57:07PM +, Wookey wrote:
> Current arm hardware such as graviton2 (AWS arm64 hardware) has
> 'Coherent Mesh Network' interconnect (between components in a
> soc). It's important that support for this is built in the kernel so
> it can be used.
> 
> This requires CONFIG_ARM_CMN=y

To be precise, this driver is needed for perf event monitoring of this
interconnect.  The interconnect itself is always in use.

On Amazon EC2, these PMU events are only exposed on the bare-metal
instances (e.g. m6g.metal), not the VMs.

We should still enable support for this driver, in any case.

noah



Bug#983923: linux-image-4.19.0-13-cloud-amd64: Please add CONFIG_MAXSMP to the linux-image-cloud-amd64 kernel

2021-03-03 Thread Noah Meyerhans
On Wed, Mar 03, 2021 at 05:35:42PM +0100, Louis Bouchard wrote:
> Thank you for the quick update. I just want to mention that this makes the
> Debian Buster cloud image unusable for any VM with more than 64 cpus.

Is it the number of physical cores that matters, rather than the SMT
threads?  Because on a system with 96 SMT threads (48 physical cores), it
works today:

admin@ip-10-0-0-75:~$ uname -a && ec2metadata --instance-type && lscpu
Linux ip-10-0-0-75 4.19.0-14-cloud-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
m5a.24xlarge
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       48 bits physical, 48 bits virtual
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        6
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7571
Stepping:            2
CPU MHz:             2550.330
BogoMIPS:            4399.61
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7,48-55
NUMA node1 CPU(s):   8-15,56-63
NUMA node2 CPU(s):   16-23,64-71
NUMA node3 CPU(s):   24-31,72-79
NUMA node4 CPU(s):   32-39,80-87
NUMA node5 CPU(s):   40-47,88-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm 
aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe 
popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm 
sse4a misalignsse 3dnowprefetch topoext perfctr_core vmmcall fsgsbase bmi1 avx2 
smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero 
xsaveerptr arat npt nrip_save

Do any of the cloud providers supported by the cloud kernel build
(primarily AWS and Azure) offer VMs with >64 physical cores?
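
The distinction between cores and threads can be read straight out of the
lscpu fields above.  A small sketch, with a hypothetical helper name, that
derives both counts from the socket, core and thread fields:

```python
def cpu_topology(lscpu_text):
    """Return physical core and logical CPU counts from lscpu output."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    sockets = int(fields["Socket(s)"])
    cores_per_socket = int(fields["Core(s) per socket"])
    threads_per_core = int(fields["Thread(s) per core"])
    physical = sockets * cores_per_socket
    return {
        "physical_cores": physical,
        "logical_cpus": physical * threads_per_core,
    }


# The three relevant fields from the m5a.24xlarge output above.
sample = """\
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
"""

print(cpu_topology(sample))
```

For the instance shown, this gives 48 physical cores and 96 logical CPUs,
which matches the CPU(s) line.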

noah



Bug#986741: Please enable CONFIG_IP_PNP_DHCP=y in cloud image

2021-04-13 Thread Noah Meyerhans
On Tue, Apr 13, 2021 at 11:36:11AM +0200, Bastian Blank wrote:
> > > Where was that discussed?
> > It was discussed in
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947759
> > , with responses from both Ben and Noah.
> 
> As this is incomplete at best, I intend to revert that change.
> 
> Otherwise it at least needs in addition:
> - hyperv pci support (for nvme),
> - hyperv block and
> - virtio-scsi.

This should probably be discussed in depth outside of this bug, but I do
not think this is the right approach.  The kernel config changes have
been in place for over a year, and were merged only after discussion
both on salsa and the BTS.  They accomplish the desired effect, which is
to allow the cloud kernel to boot without an initrd on common cloud
instance types (specifically AWS Nitro instance types).  While this is
not the default behavior for any of our cloud images, it is simple for
users to enable it on derivative images if they can benefit from the
reduced boot time.

If we want to support initrd-less booting on instances aside from Nitro
instance types, that's great, and we should consider doing that.  That
would be much more beneficial to our users than reverting the change.

> However this bug is about dhcp support.  If there is no network device
> support available, how does it do dhcp?

I'd also argue that NIC drivers (in particular ena.ko, used on Nitro
instances) should be left configured as modules, even if we do want to
support initrd-less booting.  AWS publishes an out-of-tree variant of
the ENA driver that may be ahead of what's in the stable kernel tree,
and some users may have a need to run that.

noah



Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4

2011-05-06 Thread Noah Meyerhans
Package: linux-2.6
Version: 2.6.38-3
Severity: normal

Hi. I've got a system that hosts several kvm virtual hosts.  The VMs
access the network via tap devices bridged with a physical interface.
After upgrading to linux-image-2.6.38-2-amd64_2.6.38-4, I noticed that
the virtualhosts were not autoconfiguring their IPv6 interfaces.
Debugging revealed that no multicast was passing over the bridge.

The bridge configuration is:
bridge name     bridge id               STP enabled     interfaces
br0             8000.0002e3080eb5       no              eth1
                                                        tap0
                                                        tap1
                                                        tap2

If I attach tcpdump to br0, I can see multicast (e.g. IPv6 Neighbor
Solicitation) packets.  However, if I attach tcpdump to eth1, I do not
see multicast packets sourced from one of the VMs.
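
For reference when reading such captures: a Neighbor Solicitation for a
given unicast address is sent to that address's solicited-node multicast
group (ff02::1:ff00:0/104 plus the low 24 bits of the address), which maps
to an Ethernet destination beginning with 33:33.  The sketch below computes
both; the function names are made up, and 2001:db8::1 is a documentation
address rather than one from this network.

```python
import ipaddress


def solicited_node_multicast(unicast):
    """Return the solicited-node multicast group for an IPv6 address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ipv6_multicast_mac(group):
    """Return the Ethernet address used for an IPv6 multicast group."""
    # 33:33 followed by the low 32 bits of the group address.
    low32 = ipaddress.IPv6Address(group).packed[-4:]
    return ":".join(f"{byte:02x}" for byte in b"\x33\x33" + low32)


if __name__ == "__main__":
    group = solicited_node_multicast("2001:db8::1")
    print(group, ipv6_multicast_mac(group))
```

Filtering a capture on the resulting Ethernet destination is one way to
tell whether solicitations from a VM reach the physical interface.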

Downgrading to 2.6.38-3 solves the problem.

noah

-- Package-specific info:
** Version:
Linux version 2.6.38-2-amd64 (Debian 2.6.38-3) (b...@decadent.org.uk) (gcc version 4.4.5 (Debian 4.4.5-15) ) #1 SMP Thu Apr 7 06:43:20 UTC 2011

** Command line:
BOOT_IMAGE=/vmlinuz root=UUID=c5ed1e31-1b76-44fa-a32d-12aa816c51eb ro quiet

** Not tainted

** Kernel log:
[ 1146.028039] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1146.028040] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1146.028042] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1146.028044] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1146.028046] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1146.028047] 
[ 1146.028050] radeon :01:05.0: DVI-D-1: EDID block 0 invalid.
[ 1146.028053] [drm:radeon_dvi_detect] *ERROR* DVI-D-1: probed a monitor but 
no|invalid EDID
[ 1156.118932] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, 
remainder is 1
[ 1156.118936] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[ 1156.118939] <3>01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118941] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118944] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118946] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118949] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118951] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118953] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118956] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.118957] 
[ 1156.168710] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, 
remainder is 1
[ 1156.168712] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[ 1156.168715] <3>01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168717] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168720] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168722] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168725] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168727] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168729] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168732] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.168734] 
[ 1156.218447] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, 
remainder is 1
[ 1156.218450] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[ 1156.218452] <3>01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218454] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218457] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218459] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218462] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218464] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218466] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218469] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.218471] 
[ 1156.268287] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, 
remainder is 1
[ 1156.268289] [drm:drm_edid_block_valid] *ERROR* Raw EDID:
[ 1156.268292] <3>01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.268294] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.268297] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.268299] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

[ 1156.268301] <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4

2011-05-09 Thread Noah Meyerhans
On Tue, May 10, 2011 at 03:38:44AM +0100, Ben Hutchings wrote:
> This is pretty weird.  Debian version 2.6.38-3 has a few bridging
> changes from stable 2.6.38.3 and 2.6.38.4, but they don't look like they
> would cause this.

I have apparently filed the bug against the wrong version of Debian's
kernel.  2.6.38-3 is not affected, and works as expected.  The change
was introduced in -4.  That may have been clear from the report itself,
but the report was filed against -3.  I've fixed that in the BTS.

I've also confirmed that -5 is affected, to no great surprise.

I'll investigate further.

noah



Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4

2011-05-10 Thread Noah Meyerhans
On Tue, May 10, 2011 at 01:42:49PM +0100, Ben Hutchings wrote:
> > > This is pretty weird.  Debian version 2.6.38-3 has a few bridging
> > > changes from stable 2.6.38.3 and 2.6.38.4, but they don't look like they
> > > would cause this.
> > 
> > I have apparently filed the bug against the wrong version of Debian's
> > kernel.  2.6.38-3 is not affected, and works as expected.  The change
> > was introduced in -4.  That may have been clear from the report itself,
> > but the report was filed against -3.  I've fixed that in the BTS.
> 
> I gathered that, and then made the same mistake in writing the above!
> The version with the regression, 2.6.38-4, includes the changes from
> stable 2.6.38.3 and 2.6.38.4

With a little help from git bisect, I've tracked this regression down to
the following commit to the stable-2.6.38.y tree:

commit 5f1c356a3fadc0c19922d660da723b79bcc9aad7
Author: Herbert Xu 
Date:   Fri Mar 18 05:27:28 2011 +

bridge: Reset IPCB when entering IP stack on NF_FORWARD

[ Upstream commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e ]

Whenever we enter the IP stack proper from bridge netfilter we
need to ensure that the skb is in a form the IP stack expects
it to be in.

The entry point on NF_FORWARD did not meet the requirements of
the IP stack, therefore leading to potential crashes/panics.

This patch fixes the problem.

Signed-off-by: Herbert Xu 
Acked-by: Stephen Hemminger 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 

The diff is
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 4b5b66d..49d50ea 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -741,6 +741,9 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
 		nf_bridge->mask |= BRNF_PKT_TYPE;
 	}
 
+	if (br_parse_ip_options(skb))
+		return NF_DROP;
+
 	/* The physdev module checks on this */
 	nf_bridge->mask |= BRNF_BRIDGED;
 	nf_bridge->physoutdev = skb->dev;

If I revert this change, network connectivity functions as expected for
the VMs on this host.

I don't know enough about this change or the problem it was supposed to
solve to be able to guess about what's going wrong.
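
For readers unfamiliar with the bisection used above: it is a binary
search over the commit history between a known-good and a known-bad
revision.  The sketch below shows the idea only and is not how git itself
is implemented; the commit names and the is_bad test are invented for the
example.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit.

    commits is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad.  is_bad(commit) builds and tests a commit.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad]


# Hypothetical history of 100 commits where commit 57 introduced the bug.
history = [f"c{n}" for n in range(100)]
tested = []


def is_bad(commit):
    tested.append(commit)
    return int(commit[1:]) >= 57


culprit = first_bad(history, is_bad)
print(culprit, "found after", len(tested), "tests")
```

The number of kernels that need to be built and tested grows with the
logarithm of the number of commits, which is what makes this practical
across a stable release.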

noah



Bug#625914: [Bridge] Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4

2011-05-10 Thread Noah Meyerhans
On Tue, May 10, 2011 at 03:11:00PM -0700, Stephen Hemminger wrote:
> There were two more follow on commits in stable related to this.
> I recommend merging 2.6.38.6 which includes these.

The problem still exists in the current 2.6.38.6.  Backing out 5f1c356a
still solves the problem there.

I have not yet tried anything outside the stable-2.6.38.y tree, but it
seems like these same changes are present there, and it's unlikely that
other releases will work any better.

noah



Bug#625914: [Bridge] Bug#625914: linux-image-2.6.38-2-amd64: bridging is not interacting well with multicast in 2.6.38-4

2011-05-12 Thread Noah Meyerhans
On Thu, May 12, 2011 at 04:43:22PM -0700, Stephen Hemminger wrote:
> > > There were two more follow on commits in stable related to this.
> > > I recommend merging 2.6.38.6 which includes these.
> > 
> > The problem still exists in the current 2.6.38.6.  Backing out 5f1c356a
> > still solves the problem there.
> > 
> > I have not yet tried anything outside the stable-2.6.38.y tree, but it
> > seems like these same changes are present there, and it's unlikely that
> > other releases will work any better.
> 
> Does this fix the problem?  The tap driver allocates an skb and throws
> it into the receive path, but the skb does not have the same padding
> as normal skb's received.
> 
> --- a/drivers/net/tun.c   2011-05-12 16:36:15.231347935 -0700
> +++ b/drivers/net/tun.c   2011-05-12 16:36:38.503464573 -0700
> @@ -614,7 +614,7 @@ static __inline__ ssize_t tun_get_user(s
>   }
>  
>   if ((tun->flags & TUN_TYPE_MASK) == TUN_TAP_DEV) {
> - align = NET_IP_ALIGN;
> + align = NET_IP_ALIGN + NET_SKB_PAD;
>   if (unlikely(len < ETH_HLEN ||
>(gso.hdr_len && gso.hdr_len < ETH_HLEN)))
>   return -EINVAL;
> 

Sorry, this does not fix the problem.

noah


