Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels
On 3/10/21 4:48 PM, Hans de Goede wrote: Hi, On 3/10/21 9:55 PM, Filipe Laíns wrote: On Wed, 2021-03-10 at 15:24 -0500, Mark Hounschell wrote: That is correct, I don't have any buttons bound to keyboard events. With the original patch the G4(forward) and G5(Backward) buttons work in a browser. I guess G7, G8, and G9 buttons are programmable to keyboard events? However this patch does not seem to fix the messages I get. Regards Mark Those events belong to the USB HID button usage page and are sent by the receiver in the HID device with the unnumbered report descriptor, so they are not affected. Looking at the report descriptor for the other HID device, I see a report ID of 128 (0x80) used for a vendor application, I am not really sure what it is used for and can't seem to trigger my device to send it. I am gonna guess this is the device reporting the pressed buttons via vendor reports or something like that. Speaking as the person who added support for this device in libratbag, this report is very likely not something that we don't need in our custom drivers and just likely something extra that Logitech built to achieve something custom in the Windows driver. FWIW, this device is a very weird one, it does not even follow Logitech's own spec :P Seeing this report the driver chugs. if (report > REPORT_TYPE_RFREPORT_LAST) { hid_err(hdev, "Unexpected input report number %d\n", report); return; } Causing your [ 36.471326] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 36.565317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.390321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 I feel like the correct fix for these cases is not to consume the report and not forward it to device node, but rather to forward it to the receiver node. (looping in Hans) Hans, you introduced this code, do you remember why? Where did REPORT_TYPE_RFREPORT_LAST get its value from and what is the purpose of this check? Shouldn't we just keep forwarding unknown reports to the receiver node? Is there any technical limitation to do that? I am not too familiar with this part of the code. The code used by the recvr_type_gaming_hidpp receivers is shared with all the other non-unifying receivers. Even though these receivers are not unifying the non gaming versions may still have multiple devices (typically a keyboard + a mouse) paired with them. The standard HID interfaces which these devices emulate are usually split in at least 2 HID interfaces: 1. A keyboard following the requirements of the "boot keyboard" subclass of the USB HID class, so that the keyboard works inside say the BIOS setup screen. This uses a single unnumbered HID report 2. A mouse + media-keys interface, which delivers numbered reports, including the special Logitech HID++ reports for things like battery monitoring, but also some special keys, which have their own sub-addressing embedded inside the reports. The driver asks the receiver for a list of paired devices and then builts a list of devices, which are then instantiated as child-HID devices which are handled by the drivers/hid/hid-logitech-hidpp.c driver. Any input reports received by drivers/hid/hid-logitech-dj.c are then forwarded to the instantiated child devices, where they are actually processed. The problem is that there is not a 1:1 relation between the interfaces and the instantiated child-devices, so the driver aggregates all input-reports from both interfaces together and then dispatches / forwards them to the child-devices using its own internal addressing. This forwarding uses 2 different addressing schemes: 1. If the report received is a special HID++ report, then it is forwarded to paired-dev child-dev matching the HID++ device-index which is embedded inside these special reports. 2. If a normal (unnumbered or numbered) report is received then that report is forwarded based on the report-number. What happens here is that each paired-dev which the hid-logitech-dj.c code instantiates has a bitmask associated with it which indicates which kind of reports it consumes. So e.g. a normal mouse will only consume mouse input-reports (STD_MOUSE, report-id 2) and a keyboard will consume all of: #define STD_KEYBOARDBIT(1) #define MULTIMEDIA BIT(3) #define POWER_KEYS BIT(4) #define MEDIA_CENTERBIT(8) #define KBD_LEDSBIT(14) When forwarding these normal (unnumbered or numbered) reports, the list of paired devices is searched and the report is forwarded to the first paired-dev which reports_supported bitmask includes the report-nr: spin_lock_irqsave(_dev->lock, flags); for (i = 0; i < (DJ_MAX_PAIRED_DEVICES + DJ_DEVICE_INDEX_MIN); i++) {
Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels
On 3/10/21 3:24 PM, Mark Hounschell wrote: On 3/10/21 2:56 PM, Filipe Laíns wrote: On Wed, 2021-03-10 at 13:55 -0500, Mark Hounschell wrote: I have been using a Logitech wireless G602 mouse since forever. As of kernel 5.10.11 I get the following kernel messages; $dmesg | grep -i logitech (snip) . . . Every mouse event seems to produce another "Unexpected input report number 128" kernel message. The commit that started this is: commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da Author: Filipe LaÃns Date: Mon Jan 4 20:47:17 2021 + HID: logitech-dj: add the G602 receiver [ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ] Tested. The device gets correctly exported to userspace and I can see mouse and keyboard events. Signed-off-by: Filipe LaÃns Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin The actual patch: diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 1ffcfc9a1e033..45e7e0bdd382b 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -1869,6 +1869,10 @@ static const struct hid_device_id logi_dj_receivers[] = { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xc531), .driver_data = recvr_type_gaming_hidpp}, + { /* Logitech G602 receiver (0xc537) */ + HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, + 0xc537), + .driver_data = recvr_type_gaming_hidpp}, { /* Logitech lightspeed receiver (0xc539) */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_NANO_RECEIVER_LIGHTSPEED_1), markh@harley:~> lsusb Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 003: ID 046d:c537 Logitech, Inc. Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub With the patch reverted: $dmesg | grep -i logitech (snip) $lsusb Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 003: ID 046d:c537 Logitech, Inc. Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub With or without the patch and error messages the mouse has always worked. Regards Mark Yes, sorry about that. The following patch should fix it, can you confirm? You probably didn't notice any breakage because you do not have any of your buttons bound to keyboard events. commit ef07c116d98772952807492bd32a61f5af172a94 (hid/for-5.11/upstream-fixes) Author: Filipe Laíns Date: Fri Feb 5 14:34:44 2021 + HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming In e400071a805d6229223a98899e9da8c6233704a1 I added support for the receiver that comes with the G602 device, but unfortunately I screwed up during testing and it seems the keyboard events were actually not being sent to userspace. This resulted in keyboard events being broken in userspace, please backport the fix. The receiver uses the normal 0x01 Logitech keyboard report descriptor, as expected, so it is just a matter of flagging it as supported. Reported in https://github.com/libratbag/libratbag/issues/1124 Fixes: e400071a805d6 ("HID: logitech-dj: add the G602 receiver") Cc: Signed-off-by: Filipe Laíns Signed-off-by: Jiri Kosina diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 45e7e0bdd382..fcdc922bc973 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -980,6 +980,7 @@ static void logi_hidpp_recv_queue_notif(struct hid_device *hdev, case 0x07: device_type = "eQUAD step 4 Gaming"; logi_hidpp_dev_conn_notif_equad(hdev, hidpp_report, ); + workitem.reports_supported |= STD_KEYBOARD; break; case 0x08: device_type = "eQUAD step 4 for gamepads"; That is correct, I don't have any buttons bound to keyboard events. With the original patch the G4(forward) and G5(Backward) buttons work in a browser. I guess G7, G8, and G9 buttons are programmable to keyboard events? However this patch does not seem to fix the messages I get. Actually is not this patch already in a 5.10.21 kernel? Mark
Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels
On 3/10/21 2:56 PM, Filipe Laíns wrote: On Wed, 2021-03-10 at 13:55 -0500, Mark Hounschell wrote: I have been using a Logitech wireless G602 mouse since forever. As of kernel 5.10.11 I get the following kernel messages; $dmesg | grep -i logitech (snip) . . . Every mouse event seems to produce another "Unexpected input report number 128" kernel message. The commit that started this is: commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da Author: Filipe LaÃns Date: Mon Jan 4 20:47:17 2021 + HID: logitech-dj: add the G602 receiver [ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ] Tested. The device gets correctly exported to userspace and I can see mouse and keyboard events. Signed-off-by: Filipe LaÃns Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin The actual patch: diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 1ffcfc9a1e033..45e7e0bdd382b 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -1869,6 +1869,10 @@ static const struct hid_device_id logi_dj_receivers[] = { HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xc531), .driver_data = recvr_type_gaming_hidpp}, + { /* Logitech G602 receiver (0xc537) */ + HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, + 0xc537), + .driver_data = recvr_type_gaming_hidpp}, { /* Logitech lightspeed receiver (0xc539) */ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_NANO_RECEIVER_LIGHTSPEED_1), markh@harley:~> lsusb Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 003: ID 046d:c537 Logitech, Inc. Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub With the patch reverted: $dmesg | grep -i logitech (snip) $lsusb Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 003: ID 046d:c537 Logitech, Inc. Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub With or without the patch and error messages the mouse has always worked. Regards Mark Yes, sorry about that. The following patch should fix it, can you confirm? You probably didn't notice any breakage because you do not have any of your buttons bound to keyboard events. commit ef07c116d98772952807492bd32a61f5af172a94 (hid/for-5.11/upstream-fixes) Author: Filipe Laíns Date: Fri Feb 5 14:34:44 2021 + HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming In e400071a805d6229223a98899e9da8c6233704a1 I added support for the receiver that comes with the G602 device, but unfortunately I screwed up during testing and it seems the keyboard events were actually not being sent to userspace. This resulted in keyboard events being broken in userspace, please backport the fix. The receiver uses the normal 0x01 Logitech keyboard report descriptor, as expected, so it is just a matter of flagging it as supported. Reported in https://github.com/libratbag/libratbag/issues/1124 Fixes: e400071a805d6 ("HID: logitech-dj: add the G602 receiver") Cc: Signed-off-by: Filipe Laíns Signed-off-by: Jiri Kosina diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 45e7e0bdd382..fcdc922bc973 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -980,6 +980,7 @@ static void logi_hidpp_recv_queue_notif(struct hid_device *hdev, case 0x07: device_type = "eQUAD step 4 Gaming"; logi_hidpp_dev_conn_notif_equad(hdev, hidpp_report, ); + workitem.reports_supported |= STD_KEYBOARD; break; case 0x08: device_type = "eQUAD step 4 for gamepads"; That is correct, I don't have any buttons bound to keyboard events. With the original patch the G4(forward) and G5(Backward) buttons work in a browser. I guess G7, G8, and G9 buttons are programmable to keyboard events? However this patch does not seem to fix the messages I get. Regards Mark
Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels
I have been using a Logitech wireless G602 mouse since forever. As of kernel 5.10.11 I get the following kernel messages; $dmesg | grep -i logitech [7.102140] usb 3-3.4: Manufacturer: Logitech [ 10.036763] input: Logitech USB Receiver as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.0/0003:046D:C537.0001/input/input10 [ 10.037904] hid-generic 0003:046D:C537.0001: input,hidraw0: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-:16:00.3-3.4/input0 [ 10.039542] input: Logitech USB Receiver Keyboard as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input11 [ 10.092374] input: Logitech USB Receiver Consumer Control as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input12 [ 10.093726] input: Logitech USB Receiver System Control as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input13 [ 10.094924] input: Logitech USB Receiver as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input16 [ 10.096155] hid-generic 0003:046D:C537.0002: input,hiddev96,hidraw1: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-:16:00.3-3.4/input1 [ 10.121557] logitech-djreceiver 0003:046D:C537.0001: hidraw0: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-:16:00.3-3.4/input0 [ 10.264445] logitech-djreceiver 0003:046D:C537.0002: hiddev96,hidraw1: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-:16:00.3-3.4/input1 [ 10.320315] logitech-djreceiver 0003:046D:C537.0002: device of type eQUAD step 4 Gaming (0x07) connected on slot 1 [ 10.321505] input: Logitech Wireless Mouse PID:402c Mouse as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/0003:046D:402C.0003/input/input17 [ 10.322637] hid-generic 0003:046D:402C.0003: input,hidraw2: USB HID v1.11 Mouse [Logitech Wireless Mouse PID:402c] on usb-:16:00.3-3.4/input1:1 [ 10.360344] input: Logitech G602 as /devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/0003:046D:402C.0003/input/input21 [ 10.361537] logitech-hidpp-device 0003:046D:402C.0003: input,hidraw2: USB HID v1.11 Mouse [Logitech G602] on usb-:16:00.3-3.4/input1:1 [ 23.271323] logitech-hidpp-device 0003:046D:402C.0003: HID++ 2.0 device connected. [ 36.471326] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 36.565317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.390321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.478325] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.771318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.859318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 42.955318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.049318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.105317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.200317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.280318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.375321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.455318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.558317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.638318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.741319] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.812319] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 43.916318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 44.003318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 [ 44.106318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report number 128 . . . Every mouse event seems to produce another "Unexpected input report number 128" kernel message. The commit that started this is: commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da Author: Filipe LaÃns Date: Mon Jan 4 20:47:17 2021 + HID: logitech-dj: add the G602 receiver [ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ] Tested. The device gets correctly exported to userspace and I can see mouse and keyboard events. Signed-off-by: Filipe LaÃns Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin The actual patch: diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 1ffcfc9a1e033..45e7e0bdd382b 100644 ---
4.18.1 "make install" issue
Something is amiss. A "make install" of an already compiled 4.18.1 kernel is taking 10 minutes or so. The same thing for a 4.17.14 kernel takes all of a minute. And during that 10 minutes or so we seem hung up on i915.ko 2626 pts/8S+ 0:00 make -f ./scripts/Makefile.build obj=arch/x86/boot install 2627 pts/8S+ 0:00 /bin/sh /sbin/installkernel 4.18.1 arch/x86/boot/bzImage System.map /boot 2642 pts/11 Ss 0:00 /bin/bash 2659 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4581 pts/1Ss 0:00 /bin/bash 4620 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4622 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4623 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4624 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4625 pts/8S+ 0:09 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4627 pts/8R+ 1:47 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4682 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4683 pts/8S+ 0:00 cat /lib/modules/4.18.1//kernel/drivers/gpu/drm/i915/i915.ko 4684 pts/8S+ 0:00 tr -cd [:print:] # date ; make install ; date Fri Aug 17 10:25:47 EDT 2018 sh ./arch/x86/boot/install.sh 4.18.1 arch/x86/boot/bzImage \ System.map "/boot" dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 dracut: *** Including module: bash *** dracut: *** Including module: systemd *** dracut: *** Including module: warpclock *** dracut: *** Including module: systemd-initrd *** dracut: *** Including module: i18n *** dracut: Could not find FONT_MAP none! dracut: *** Including module: drm *** Here it hangs up for 10 minutes with the above "ps ax" output? dracut: Possible missing firmware "nvidia/gv100/sec2/sig.bin" for kernel module "nouveau.ko" . . . dracut: *** Including module: plymouth *** dracut: *** Including module: kernel-modules *** dracut: *** Including module: resume *** dracut: *** Including module: rootfs-block *** dracut: *** Including module: terminfo *** dracut: *** Including module: udev-rules *** dracut: Skipping udev rule: 40-redhat.rules dracut: Skipping udev rule: 50-firmware.rules dracut: Skipping udev rule: 50-udev.rules dracut: Skipping udev rule: 91-permissions.rules dracut: Skipping udev rule: 80-drivers-modprobe.rules dracut: *** Including module: dracut-systemd *** dracut: *** Including module: haveged *** dracut: *** Including module: ostree *** dracut: *** Including module: usrmount *** dracut: *** Including module: base *** dracut: *** Including module: fs-lib *** dracut: *** Including module: shutdown *** dracut: *** Including module: suse *** dracut: *** Including modules done *** dracut: *** Installing kernel module dependencies and firmware *** dracut: *** Installing kernel module dependencies and firmware done *** dracut: *** Resolving executable dependencies *** dracut: *** Resolving executable dependencies done*** dracut: *** Hardlinking files *** dracut: *** Hardlinking files done *** dracut: *** Stripping files *** dracut: *** Stripping files done *** dracut: *** Generating early-microcode cpio image *** dracut: *** Constructing AuthenticAMD.bin dracut: *** Store current command line parameters *** dracut: Stored kernel commandline: dracut: resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b dracut: root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 rootflags=rw,relatime dracut: *** Creating image file '/boot/initrd-4.18.1' *** dracut: *** Creating initramfs image file '/boot/initrd-4.18.1' done *** Fri Aug 17 10:34:50 EDT 2018 Same machine installing a 4.17.14 kernel take 1 minute: # date ; make install ; date Fri Aug 17 13:27:41 EDT 2018 sh ./arch/x86/boot/install.sh 4.17.14-lcrs arch/x86/boot/bzImage \ System.map "/boot" renamed '/boot/config-4.17.14-lcrs' -> '/boot/config-4.17.14-lcrs.old' dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.17.14-lcrs 4.17.14-lcrs dracut: *** Including module: bash *** dracut: *** Including module: systemd *** dracut: *** Including module: warpclock *** dracut: *** Including module: systemd-initrd *** dracut: *** Including module: i18n *** dracut: Could not find FONT_MAP none! dracut: *** Including module: drm *** dracut: *** Including module: plymouth *** . . . dracut: *** Generating early-microcode cpio image *** dracut: *** Constructing AuthenticAMD.bin dracut: *** Store current command line parameters *** dracut: Stored kernel commandline: dracut: resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b dracut: root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 rootflags=rw,relatime
4.18.1 "make install" issue
Something is amiss. A "make install" of an already compiled 4.18.1 kernel is taking 10 minutes or so. The same thing for a 4.17.14 kernel takes all of a minute. And during that 10 minutes or so we seem hung up on i915.ko 2626 pts/8S+ 0:00 make -f ./scripts/Makefile.build obj=arch/x86/boot install 2627 pts/8S+ 0:00 /bin/sh /sbin/installkernel 4.18.1 arch/x86/boot/bzImage System.map /boot 2642 pts/11 Ss 0:00 /bin/bash 2659 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4581 pts/1Ss 0:00 /bin/bash 4620 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4622 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4623 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4624 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4625 pts/8S+ 0:09 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4627 pts/8R+ 1:47 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4682 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 4683 pts/8S+ 0:00 cat /lib/modules/4.18.1//kernel/drivers/gpu/drm/i915/i915.ko 4684 pts/8S+ 0:00 tr -cd [:print:] # date ; make install ; date Fri Aug 17 10:25:47 EDT 2018 sh ./arch/x86/boot/install.sh 4.18.1 arch/x86/boot/bzImage \ System.map "/boot" dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1 dracut: *** Including module: bash *** dracut: *** Including module: systemd *** dracut: *** Including module: warpclock *** dracut: *** Including module: systemd-initrd *** dracut: *** Including module: i18n *** dracut: Could not find FONT_MAP none! dracut: *** Including module: drm *** Here it hangs up for 10 minutes with the above "ps ax" output? dracut: Possible missing firmware "nvidia/gv100/sec2/sig.bin" for kernel module "nouveau.ko" . . . dracut: *** Including module: plymouth *** dracut: *** Including module: kernel-modules *** dracut: *** Including module: resume *** dracut: *** Including module: rootfs-block *** dracut: *** Including module: terminfo *** dracut: *** Including module: udev-rules *** dracut: Skipping udev rule: 40-redhat.rules dracut: Skipping udev rule: 50-firmware.rules dracut: Skipping udev rule: 50-udev.rules dracut: Skipping udev rule: 91-permissions.rules dracut: Skipping udev rule: 80-drivers-modprobe.rules dracut: *** Including module: dracut-systemd *** dracut: *** Including module: haveged *** dracut: *** Including module: ostree *** dracut: *** Including module: usrmount *** dracut: *** Including module: base *** dracut: *** Including module: fs-lib *** dracut: *** Including module: shutdown *** dracut: *** Including module: suse *** dracut: *** Including modules done *** dracut: *** Installing kernel module dependencies and firmware *** dracut: *** Installing kernel module dependencies and firmware done *** dracut: *** Resolving executable dependencies *** dracut: *** Resolving executable dependencies done*** dracut: *** Hardlinking files *** dracut: *** Hardlinking files done *** dracut: *** Stripping files *** dracut: *** Stripping files done *** dracut: *** Generating early-microcode cpio image *** dracut: *** Constructing AuthenticAMD.bin dracut: *** Store current command line parameters *** dracut: Stored kernel commandline: dracut: resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b dracut: root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 rootflags=rw,relatime dracut: *** Creating image file '/boot/initrd-4.18.1' *** dracut: *** Creating initramfs image file '/boot/initrd-4.18.1' done *** Fri Aug 17 10:34:50 EDT 2018 Same machine installing a 4.17.14 kernel take 1 minute: # date ; make install ; date Fri Aug 17 13:27:41 EDT 2018 sh ./arch/x86/boot/install.sh 4.17.14-lcrs arch/x86/boot/bzImage \ System.map "/boot" renamed '/boot/config-4.17.14-lcrs' -> '/boot/config-4.17.14-lcrs.old' dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.17.14-lcrs 4.17.14-lcrs dracut: *** Including module: bash *** dracut: *** Including module: systemd *** dracut: *** Including module: warpclock *** dracut: *** Including module: systemd-initrd *** dracut: *** Including module: i18n *** dracut: Could not find FONT_MAP none! dracut: *** Including module: drm *** dracut: *** Including module: plymouth *** . . . dracut: *** Generating early-microcode cpio image *** dracut: *** Constructing AuthenticAMD.bin dracut: *** Store current command line parameters *** dracut: Stored kernel commandline: dracut: resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b dracut: root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 rootflags=rw,relatime
AM4 B350 chipset Sata/IDE problem
I'm running a 4.13.13 kernel on an AM4 MSI B350 Tomahawk Arctic MB. I have a couple of these setups and both do the same. I get this output just building a kernel. My drives are older Seagate ST3160815AS, 3.AAD, max UDMA/133 configured at boot time for UDMA/133 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) I can take this very disk and boot it up on an AM3 based setup and I will not get these messages. I also have a pci-e Sata card in this AM4 box and when connected to it instead of the MB Sata ports, I also don't get these messages. I don't really know where the problem is but since this is a fairly new chip set I guess it could be a kernel issue so thought I would post it for you all. The kernel does in fact build and run. I get no user level errors. Nov 21 10:28:01 harley kernel: ata1.00: exception Emask 0x11 SAct 0x7ffe SErr 0x40 action 0x6 frozen Nov 21 10:28:03 harley kernel: ata1.00: irq_stat 0x4808, interface fatal error Nov 21 10:28:03 harley kernel: ata1: SError: { Handshk } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:00:00:a0:f6/08:00:0c:00:00/40 tag 0 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:08:00:a8:f6/08:00:0c:00:00/40 tag 1 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:10:00:b0:f6/08:00:0c:00:00/40 tag 2 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:18:00:b8:f6/08:00:0c:00:00/40 tag 3 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:20:00:c0:f6/08:00:0c:00:00/40 tag 4 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:28:00:c8:f6/08:00:0c:00:00/40 tag 5 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:30:00:d0:f6/08:00:0c:00:00/40 tag 6 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:38:00:d8:f6/08:00:0c:00:00/40 tag 7 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:40:00:e0:f6/08:00:0c:00:00/40 tag 8 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:48:00:e8:f6/08:00:0c:00:00/40 tag 9 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:50:00:f0:f6/08:00:0c:00:00/40 tag 10 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:58:00:f8:f6/08:00:0c:00:00/40 tag 11 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:60:00:00:f7/08:00:0c:00:00/40 tag 12 ncq dma 1048576 ou res
AM4 B350 chipset Sata/IDE problem
I'm running a 4.13.13 kernel on an AM4 MSI B350 Tomahawk Arctic MB. I have a couple of these setups and both do the same. I get this output just building a kernel. My drives are older Seagate ST3160815AS, 3.AAD, max UDMA/133 configured at boot time for UDMA/133 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) I can take this very disk and boot it up on an AM3 based setup and I will not get these messages. I also have a pci-e Sata card in this AM4 box and when connected to it instead of the MB Sata ports, I also don't get these messages. I don't really know where the problem is but since this is a fairly new chip set I guess it could be a kernel issue so thought I would post it for you all. The kernel does in fact build and run. I get no user level errors. Nov 21 10:28:01 harley kernel: ata1.00: exception Emask 0x11 SAct 0x7ffe SErr 0x40 action 0x6 frozen Nov 21 10:28:03 harley kernel: ata1.00: irq_stat 0x4808, interface fatal error Nov 21 10:28:03 harley kernel: ata1: SError: { Handshk } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:00:00:a0:f6/08:00:0c:00:00/40 tag 0 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:08:00:a8:f6/08:00:0c:00:00/40 tag 1 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:10:00:b0:f6/08:00:0c:00:00/40 tag 2 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:18:00:b8:f6/08:00:0c:00:00/40 tag 3 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:20:00:c0:f6/08:00:0c:00:00/40 tag 4 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:28:00:c8:f6/08:00:0c:00:00/40 tag 5 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:30:00:d0:f6/08:00:0c:00:00/40 tag 6 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:38:00:d8:f6/08:00:0c:00:00/40 tag 7 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:40:00:e0:f6/08:00:0c:00:00/40 tag 8 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:48:00:e8:f6/08:00:0c:00:00/40 tag 9 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:50:00:f0:f6/08:00:0c:00:00/40 tag 10 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:58:00:f8:f6/08:00:0c:00:00/40 tag 11 ncq dma 1048576 ou res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error) Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY } Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:60:00:00:f7/08:00:0c:00:00/40 tag 12 ncq dma 1048576 ou res
ata1.00: failed command: WRITE FPDMA QUEUED on new AMD AM4 MSI B350 Motherboard
With both 4.11 and 4.12 kernels I get the following when doing heavy disk I/O, like a kernel build with "make -j 15". Even copying the kernel source tree from one place to another. The hardware is an MSI B350 Tomahawk Arctic MB with 16GB of memory and a Ryzen 1700 processor. The disk being used is a 160Gb Seagate ST3160815AS that has error free media according to "badblocks -w". Jul 6 13:34:43 cpu0 kernel: ata1.00: exception Emask 0x11 SAct 0x7ffb SErr 0x40 action 0x6 frozen Jul 6 13:34:43 cpu0 kernel: ata1.00: irq_stat 0x4808, interface fatal error Jul 6 13:34:43 cpu0 kernel: ata1: SError: { Handshk } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:00:57:89:90/00:00:03:00:00/40 tag 0 ncq dma 4096 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) Jul 6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:08:87:89:90/00:00:03:00:00/40 tag 1 ncq dma 4096 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) Jul 6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/20:10:97:89:90/00:00:03:00:00/40 tag 2 ncq dma 16384 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) When I set the kernel cmdline option libata.force=noncq, the messages change into: [ 1724.372101] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 frozen [ 1724.375888] ata1.00: irq_stat 0x4801, interface fatal error [ 1724.379721] ata1: SError: { Handshk } [ 1724.383691] ata1.00: failed command: WRITE DMA EXT [ 1724.383695] ata1.00: cmd 35/00:50:67:0d:e4/00:09:02:00:00/e0 tag 10 dma 1220608 out res 51/84:50:67:0d:e4/00:09:02:00:00/e0 Emask 0x10 (ATA bus error) [ 1724.383699] ata1.00: status: { DRDY ERR } [ 1724.383700] ata1.00: error: { ICRC ABRT } [ 1724.383706] ata1: hard resetting link [ 1724.850060] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 1724.959883] ata1.00: configured for UDMA/133 [ 1724.959910] ata1: EH complete [ 1921.704356] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 frozen [ 1921.708292] ata1.00: irq_stat 0x4801, interface fatal error [ 1921.712210] ata1: SError: { Handshk } [ 1921.716294] ata1.00: failed command: WRITE DMA EXT [ 1921.716297] ata1.00: cmd 35/00:90:ef:93:86/00:03:02:00:00/e0 tag 18 dma 466944 out res 51/84:90:ef:93:86/00:03:02:00:00/e0 Emask 0x10 (ATA bus error) [ 1921.716298] ata1.00: status: { DRDY ERR } [ 1921.716298] ata1.00: error: { ICRC ABRT } [ 1921.716303] ata1: hard resetting link [ 1922.175312] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 1922.284165] ata1.00: configured for UDMA/133 [ 1922.288602] ata1: EH complete smartctl shows no issues with the drive. In fact I can take this very drive and install it an an AM3 machine and everything works just fine. I have also installed a PCI-e Sata card and connected the drive to that and that works just fine also. So I have either a linux kernel problem or a hardware problem on this brand new AM4 motherboard. I don't really know what it is other than it is something related with the AMD B350 chipset. It is a fairly new chip set so I am suspicious of the kernel. # smartctl -a /dev/sda smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.11.6-lcrs] (SUSE RPM) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.10 Device Model: ST3160815AS Serial Number:6RACD737 Firmware Version: 4.AAB User Capacity:160,041,885,696 bytes [160 GB] Sector Size: 512 bytes logical/physical Device is:In smartctl database [for details use: -P show] ATA Version is: ATA/ATAPI-7 (minor revision not indicated) Local Time is:Fri Jul 7 13:50:50 2017 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection:( 430) seconds. Offline data collection capabilities:(0x5b) SMART execute Offline immediate. Auto Offline data collection
ata1.00: failed command: WRITE FPDMA QUEUED on new AMD AM4 MSI B350 Motherboard
With both 4.11 and 4.12 kernels I get the following when doing heavy disk I/O, like a kernel build with "make -j 15". Even copying the kernel source tree from one place to another. The hardware is an MSI B350 Tomahawk Arctic MB with 16GB of memory and a Ryzen 1700 processor. The disk being used is a 160Gb Seagate ST3160815AS that has error free media according to "badblocks -w". Jul 6 13:34:43 cpu0 kernel: ata1.00: exception Emask 0x11 SAct 0x7ffb SErr 0x40 action 0x6 frozen Jul 6 13:34:43 cpu0 kernel: ata1.00: irq_stat 0x4808, interface fatal error Jul 6 13:34:43 cpu0 kernel: ata1: SError: { Handshk } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:00:57:89:90/00:00:03:00:00/40 tag 0 ncq dma 4096 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) Jul 6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:08:87:89:90/00:00:03:00:00/40 tag 1 ncq dma 4096 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) Jul 6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY } Jul 6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED Jul 6 13:34:43 cpu0 kernel: ata1.00: cmd 61/20:10:97:89:90/00:00:03:00:00/40 tag 2 ncq dma 16384 out res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error) When I set the kernel cmdline option libata.force=noncq, the messages change into: [ 1724.372101] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 frozen [ 1724.375888] ata1.00: irq_stat 0x4801, interface fatal error [ 1724.379721] ata1: SError: { Handshk } [ 1724.383691] ata1.00: failed command: WRITE DMA EXT [ 1724.383695] ata1.00: cmd 35/00:50:67:0d:e4/00:09:02:00:00/e0 tag 10 dma 1220608 out res 51/84:50:67:0d:e4/00:09:02:00:00/e0 Emask 0x10 (ATA bus error) [ 1724.383699] ata1.00: status: { DRDY ERR } [ 1724.383700] ata1.00: error: { ICRC ABRT } [ 1724.383706] ata1: hard resetting link [ 1724.850060] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 1724.959883] ata1.00: configured for UDMA/133 [ 1724.959910] ata1: EH complete [ 1921.704356] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 frozen [ 1921.708292] ata1.00: irq_stat 0x4801, interface fatal error [ 1921.712210] ata1: SError: { Handshk } [ 1921.716294] ata1.00: failed command: WRITE DMA EXT [ 1921.716297] ata1.00: cmd 35/00:90:ef:93:86/00:03:02:00:00/e0 tag 18 dma 466944 out res 51/84:90:ef:93:86/00:03:02:00:00/e0 Emask 0x10 (ATA bus error) [ 1921.716298] ata1.00: status: { DRDY ERR } [ 1921.716298] ata1.00: error: { ICRC ABRT } [ 1921.716303] ata1: hard resetting link [ 1922.175312] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 1922.284165] ata1.00: configured for UDMA/133 [ 1922.288602] ata1: EH complete smartctl shows no issues with the drive. In fact I can take this very drive and install it an an AM3 machine and everything works just fine. I have also installed a PCI-e Sata card and connected the drive to that and that works just fine also. So I have either a linux kernel problem or a hardware problem on this brand new AM4 motherboard. I don't really know what it is other than it is something related with the AMD B350 chipset. It is a fairly new chip set so I am suspicious of the kernel. # smartctl -a /dev/sda smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.11.6-lcrs] (SUSE RPM) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.10 Device Model: ST3160815AS Serial Number:6RACD737 Firmware Version: 4.AAB User Capacity:160,041,885,696 bytes [160 GB] Sector Size: 512 bytes logical/physical Device is:In smartctl database [for details use: -P show] ATA Version is: ATA/ATAPI-7 (minor revision not indicated) Local Time is:Fri Jul 7 13:50:50 2017 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection:( 430) seconds. Offline data collection capabilities:(0x5b) SMART execute Offline immediate. Auto Offline data collection
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/22/2016 03:45 AM, Joerg Roedel wrote: On Mon, Nov 21, 2016 at 04:47:59PM -0500, Mark Hounschell wrote: OK, I did get this message before the reported BUG message. gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0xffee8000] [size=8192 bytes] But I've verified that the dma_addr_t that I get for the alloc, and also use for the free is 0xffee8000 in this case? Is device "address=0xffee8000" in that message a bug in the message or do we have a sign extended address problem? It seems strange to me, I've never seen a dma_addr_t given, when using the iommu, that high. In the past I've seen them as usually 0x00xx? I have also verified that simply changing from pci_alloc/free_consistent to the newer DMA API fixes my issue and I get no such messages. Yes, this looks like a sign-extension bug somewhere. But its not in the amd-iommu driver, because dma-debug also sees it. And from what I can tell the dma-api interface seems to be fine. It consistently uses dma_addr_t to pass these values around. Where can I find the source of the failing code? I need exactly the code version that triggers the problem. I certainly don't have a problem sending you the code but I'm sure you have better things to do than scour over some out of kernel GPL driver code. I see many many users of pci_alloc/free in kernel so it can't be broken as badly as it appears to me here. I'm going to just go ahead and convert this section of code to use the newer DMA API and be done with it. It appears pci_alloc/free_consistent is going to be removed completely soon anyway. Thanks Mark
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/22/2016 03:45 AM, Joerg Roedel wrote: On Mon, Nov 21, 2016 at 04:47:59PM -0500, Mark Hounschell wrote: OK, I did get this message before the reported BUG message. gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0xffee8000] [size=8192 bytes] But I've verified that the dma_addr_t that I get for the alloc, and also use for the free is 0xffee8000 in this case? Is device "address=0xffee8000" in that message a bug in the message or do we have a sign extended address problem? It seems strange to me, I've never seen a dma_addr_t given, when using the iommu, that high. In the past I've seen them as usually 0x00xx? I have also verified that simply changing from pci_alloc/free_consistent to the newer DMA API fixes my issue and I get no such messages. Yes, this looks like a sign-extension bug somewhere. But its not in the amd-iommu driver, because dma-debug also sees it. And from what I can tell the dma-api interface seems to be fine. It consistently uses dma_addr_t to pass these values around. Where can I find the source of the failing code? I need exactly the code version that triggers the problem. I certainly don't have a problem sending you the code but I'm sure you have better things to do than scour over some out of kernel GPL driver code. I see many many users of pci_alloc/free in kernel so it can't be broken as badly as it appears to me here. I'm going to just go ahead and convert this section of code to use the newer DMA API and be done with it. It appears pci_alloc/free_consistent is going to be removed completely soon anyway. Thanks Mark
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/21/2016 10:32 AM, Joerg Roedel wrote: > On Fri, Nov 18, 2016 at 09:04:05AM -0500, Mark Hounschell wrote: >> OK. It's attached. > >> [ 45.033715] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1158 >> check_unmap+0x4ef/0x990 >> [ 45.037651] 3c59x :04:04.0: DMA-API: device driver failed to check >> map error[device address=0xffedf840] [size=1536 bytes] [mapped as >> single] > > Okay, there is a dma-debug warning from the 3c59x driver. Can you please > blacklist this driver (or compile it out) and re-try? The dma-debug code > will only show the first warning and stop then. So any possible warnings > from your out-of-tree driver will not be shown as long as the 3c59x > driver is loaded. > > Alternativly you can write 1 to dma-api/all_errors in debug-fs so that > it will show all errors. But I suggest to unload the 3com driver first > and then load your driver. > OK, I did get this message before the reported BUG message. gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0xffee8000] [size=8192 bytes] But I've verified that the dma_addr_t that I get for the alloc, and also use for the free is 0xffee8000 in this case? Is device "address=0xffee8000" in that message a bug in the message or do we have a sign extended address problem? It seems strange to me, I've never seen a dma_addr_t given, when using the iommu, that high. In the past I've seen them as usually 0x00xx? I have also verified that simply changing from pci_alloc/free_consistent to the newer DMA API fixes my issue and I get no such messages. Mark
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/21/2016 10:32 AM, Joerg Roedel wrote: > On Fri, Nov 18, 2016 at 09:04:05AM -0500, Mark Hounschell wrote: >> OK. It's attached. > >> [ 45.033715] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1158 >> check_unmap+0x4ef/0x990 >> [ 45.037651] 3c59x :04:04.0: DMA-API: device driver failed to check >> map error[device address=0xffedf840] [size=1536 bytes] [mapped as >> single] > > Okay, there is a dma-debug warning from the 3c59x driver. Can you please > blacklist this driver (or compile it out) and re-try? The dma-debug code > will only show the first warning and stop then. So any possible warnings > from your out-of-tree driver will not be shown as long as the 3c59x > driver is loaded. > > Alternativly you can write 1 to dma-api/all_errors in debug-fs so that > it will show all errors. But I suggest to unload the 3com driver first > and then load your driver. > OK, I did get this message before the reported BUG message. gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0xffee8000] [size=8192 bytes] But I've verified that the dma_addr_t that I get for the alloc, and also use for the free is 0xffee8000 in this case? Is device "address=0xffee8000" in that message a bug in the message or do we have a sign extended address problem? It seems strange to me, I've never seen a dma_addr_t given, when using the iommu, that high. In the past I've seen them as usually 0x00xx? I have also verified that simply changing from pci_alloc/free_consistent to the newer DMA API fixes my issue and I get no such messages. Mark
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/17/2016 05:00 PM, Joerg Roedel wrote: On Thu, Nov 17, 2016 at 04:53:24PM -0500, Mark Hounschell wrote: On 11/17/2016 04:41 PM, Joerg Roedel wrote: Hi Mark, On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote: Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. Can you please run the driver with dma-api debugging enabled? Does it trigger any warnings? It's been a while since I've done that for ya. Kernel config options? Kernel cmd line you want? Just enable CONFIG_DMA_API_DEBUG and boot the resulting kernel. Please send me full dmesg when the issue happens again. OK. It's attached. Mark [0.00] Linux version 4.8.7-lcrs (markh@harley) (gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) ) #1 SMP PREEMPT Fri Nov 18 08:19:32 EST 2016 [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.8.7-lcrs root=/dev/disk/by-id/ata-ST3160815AS_6RACD7A6-part5 splash=verbose showopts video=1600x1200 noresume apm=off vmalloc=256M audit=0 rcu_nocbs=1,2,3,4,5,6,7 3 [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches. [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009e7ff] usable [0.00] BIOS-e820: [mem 0x0009e800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xae474fff] usable [0.00] BIOS-e820: [mem 0xae475000-0xaea32fff] reserved [0.00] BIOS-e820: [mem 0xaea33000-0xaee2efff] ACPI NVS [0.00] BIOS-e820: [mem 0xaee2f000-0xaf154fff] reserved [0.00] BIOS-e820: [mem 0xaf155000-0xaf155fff] usable [0.00] BIOS-e820: [mem 0xaf156000-0xaf35bfff] ACPI NVS [0.00] BIOS-e820: [mem 0xaf35c000-0xaf7f] usable [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved [0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved [0.00] BIOS-e820: [mem 0xfed61000-0xfed70fff] reserved [0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved [0.00] BIOS-e820: [mem 0xfef0-0x] reserved [0.00] BIOS-e820: [mem 0x00011000-0x00044fff] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.7 present. [0.00] DMI: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x45 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B write-through [0.00] C-C write-protect [0.00] D-EBFFF uncachable [0.00] EC000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base mask 8000 write-back [0.00] 1 base 8000 mask C000 write-back [0.00] 2 base AF80 mask FF80 uncachable [0.00] 3 base B000 mask F000 uncachable [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] TOM2: 00045000 aka 17664M [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT [0.00] e820: update [mem 0xaf80-0x] usable ==> reserved [0.00] e820: last_pfn = 0xaf800 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000fd990-0x000fd99f] mapped at [880fd990] [0.00] Base memory trampoline at [88098000] 98000 size 24576 [0.00] Using GB pages for direct mapping [0.00] BRK [0x01113000, 0x01113fff] PGTABLE [0.00] BRK [0x01114000, 0x01114fff] PGTABLE [0.00] BRK [0x01115000, 0x01115fff] PGTABLE [0.00] BRK [0x01116000, 0x01116fff] PGTABLE [0.0
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/17/2016 05:00 PM, Joerg Roedel wrote: On Thu, Nov 17, 2016 at 04:53:24PM -0500, Mark Hounschell wrote: On 11/17/2016 04:41 PM, Joerg Roedel wrote: Hi Mark, On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote: Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. Can you please run the driver with dma-api debugging enabled? Does it trigger any warnings? It's been a while since I've done that for ya. Kernel config options? Kernel cmd line you want? Just enable CONFIG_DMA_API_DEBUG and boot the resulting kernel. Please send me full dmesg when the issue happens again. OK. It's attached. Mark [0.00] Linux version 4.8.7-lcrs (markh@harley) (gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) ) #1 SMP PREEMPT Fri Nov 18 08:19:32 EST 2016 [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.8.7-lcrs root=/dev/disk/by-id/ata-ST3160815AS_6RACD7A6-part5 splash=verbose showopts video=1600x1200 noresume apm=off vmalloc=256M audit=0 rcu_nocbs=1,2,3,4,5,6,7 3 [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches. [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009e7ff] usable [0.00] BIOS-e820: [mem 0x0009e800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xae474fff] usable [0.00] BIOS-e820: [mem 0xae475000-0xaea32fff] reserved [0.00] BIOS-e820: [mem 0xaea33000-0xaee2efff] ACPI NVS [0.00] BIOS-e820: [mem 0xaee2f000-0xaf154fff] reserved [0.00] BIOS-e820: [mem 0xaf155000-0xaf155fff] usable [0.00] BIOS-e820: [mem 0xaf156000-0xaf35bfff] ACPI NVS [0.00] BIOS-e820: [mem 0xaf35c000-0xaf7f] usable [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved [0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved [0.00] BIOS-e820: [mem 0xfed61000-0xfed70fff] reserved [0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved [0.00] BIOS-e820: [mem 0xfef0-0x] reserved [0.00] BIOS-e820: [mem 0x00011000-0x00044fff] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.7 present. [0.00] DMI: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x45 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B write-through [0.00] C-C write-protect [0.00] D-EBFFF uncachable [0.00] EC000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base mask 8000 write-back [0.00] 1 base 8000 mask C000 write-back [0.00] 2 base AF80 mask FF80 uncachable [0.00] 3 base B000 mask F000 uncachable [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] TOM2: 00045000 aka 17664M [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT [0.00] e820: update [mem 0xaf80-0x] usable ==> reserved [0.00] e820: last_pfn = 0xaf800 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000fd990-0x000fd99f] mapped at [880fd990] [0.00] Base memory trampoline at [88098000] 98000 size 24576 [0.00] Using GB pages for direct mapping [0.00] BRK [0x01113000, 0x01113fff] PGTABLE [0.00] BRK [0x01114000, 0x01114fff] PGTABLE [0.00] BRK [0x01115000, 0x01115fff] PGTABLE [0.00] BRK [0x01116000, 0x01116fff] PGTABLE [0.0
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/17/2016 04:41 PM, Joerg Roedel wrote: Hi Mark, On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote: Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. Can you please run the driver with dma-api debugging enabled? Does it trigger any warnings? It's been a while since I've done that for ya. Kernel config options? Kernel cmd line you want? Mark
Re: BUG at drivers/iommu/amd_iommu.c:1436!
On 11/17/2016 04:41 PM, Joerg Roedel wrote: Hi Mark, On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote: Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. Can you please run the driver with dma-api debugging enabled? Does it trigger any warnings? It's been a while since I've done that for ya. Kernel config options? Kernel cmd line you want? Mark
BUG at drivers/iommu/amd_iommu.c:1436!
Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. The commit is: 2d4c515bf06c9bce87b546279413621f847ef6a3 is the first bad commit commit 2d4c515bf06c9bce87b546279413621f847ef6a3 Author: Joerg RoedelDate: Tue Jul 5 16:21:32 2016 +0200 iommu/amd: Remove other remains of old address allocator There are other remains in the code from the old allocatore. Remove them all. Signed-off-by: Joerg Roedel :04 04 87b020717cdd7dcab45e3574dfb6649d05dcc044 817e613acede15fafb70b046d831f558e6bffd93 M drivers Nov 16 08:39:15 harley kernel: kernel BUG at drivers/iommu/amd_iommu.c:1436! Nov 16 08:39:15 harley kernel: invalid opcode: [#1] PREEMPT SMP Nov 16 08:39:15 harley kernel: Modules linked in: gpiohsd(O) bnep bluetooth rfkill nvidia(PO) drm af_packet iscsi_ibft iscsi_boot_sysfs snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec snd_hda_core xhci_pci snd_hwdep xhci_hcd snd_pcm synclink_gt osst 3c59x r8169 irqbypass crc32_pclmul dgap(C) snd_timer n_hdlc hdlc crc32c_intel snd tpm_infineon st joydev mii aesni_intel shpchp aes_x86_64 glue_helper tpm_tis tpm_tis_core input_leds lrw tpm k10temp fam15h_power soundcore gf128mul ablk_helper processor i2c_piix4 fjes serio_raw pcspkr cryptd dm_mod sr_mod cdrom ata_generic aic79xx pata_atiixp ohci_pci aic7xxx scsi_transport_spi mxm_wmi wmi button sg autofs4 Nov 16 08:39:15 harley kernel: CPU: 6 PID: 4750 Comm: trusim1a Tainted: P C O4.8.0 #1 Nov 16 08:39:15 harley kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 Nov 16 08:39:15 harley kernel: task: 88042d849c00 task.stack: 88043991 Nov 16 08:39:15 harley kernel: RIP: 0010:[] [] iommu_unmap_page+0xd3/0xe0 Nov 16 08:39:15 harley kernel: RSP: 0018:880439913b60 EFLAGS: 00010202 Nov 16 08:39:15 harley kernel: RAX: 122f RBX: 1000 RCX: 0027 Nov 16 08:39:15 harley kernel: RDX: fdba RSI: 88043b13a0a0 RDI: Nov 16 08:39:15 harley kernel: RBP: 880439913b90 R08: R09: Nov 16 08:39:15 harley kernel: R10: 0005 R11: 0002 R12: 88043b13a0a0 Nov 16 08:39:15 harley kernel: R13: 88043b13a000 R14: 1230 R15: ffec6260 Nov 16 08:39:15 harley kernel: FS: 7fb360e6e700() GS:88044fd8() knlGS: Nov 16 08:39:15 harley kernel: CS: 0010 DS: ES: CR0: 80050033 Nov 16 08:39:15 harley kernel: CR2: 7fe345af58f0 CR3: 00042d48a000 CR4: 000406e0 Nov 16 08:39:15 harley kernel: Stack: Nov 16 08:39:15 harley kernel: 0246 0001 ffec7000 0001 Nov 16 08:39:15 harley kernel: 88043b13a000 ffec6000 880439913bd0 80620298 Nov 16 08:39:15 harley kernel: 88043d2480a0 1000 88043d2480a0 ffec6000 Nov 16 08:39:15 harley kernel: Call Trace: Nov 16 08:39:15 harley kernel: [] __unmap_single.isra.23+0x58/0x1b0 Nov 16 08:39:15 harley kernel: [] free_coherent+0x76/0xc0 Nov 16 08:39:15 harley kernel: [] ehsd_set_signal+0x395/0x620 [gpiohsd] Nov 16 08:39:15 harley kernel: [] ? finish_task_switch+0x73/0x1e0 Nov 16 08:39:15 harley kernel: [] ehsd_ioctl+0xdb9/0x17cb [gpiohsd] Nov 16 08:39:15 harley kernel: [] ? pat_enabled+0x20/0x20 Nov 16 08:39:15 harley kernel: [] ? walk_system_ram_range+0x70/0xc0 Nov 16 08:39:15 harley kernel: [] ? unmap_page_range+0x698/0x8a0 Nov 16 08:39:15 harley kernel: [] ? find_next_bit+0x19/0x20 Nov 16 08:39:15 harley kernel: [] ? cpumask_any_but+0x2c/0x40 Nov 16 08:39:15 harley kernel: [] ? flush_tlb_mm_range+0x4c/0x1a0 Nov 16 08:39:15 harley kernel: [] ? tlb_finish_mmu+0x17/0x50 Nov 16 08:39:15 harley kernel: [] ? unmap_region+0xe0/0x110 Nov 16 08:39:15 harley kernel: [] ? __rb_erase_color+0x138/0x280 Nov 16 08:39:15 harley kernel: [] do_vfs_ioctl+0x8f/0x5a0 Nov 16 08:39:15 harley kernel: [] ? do_munmap+0x27d/0x370 Nov 16 08:39:15 harley kernel: [] SyS_ioctl+0x74/0x80 Nov 16 08:39:15 harley kernel: [] entry_SYSCALL_64_fastpath+0x17/0x93 Nov 16 08:39:15 harley kernel: Code: d7 49 01 c7 4c 39 f3 77 89 4d 85 f6 75 14 48 83 c4 08 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 49 8d 46 ff 4c 85 f0 74 e3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 8b 57 7c 48 89 f0 85 d2 Nov 16 08:39:15 harley kernel: RIP [] iommu_unmap_page+0xd3/0xe0 Nov 16 08:39:15 harley kernel: RSP Nov 16 08:39:15 harley kernel: ---[ end trace 45a510d4d695b2d3 ]--- Regards Mark
BUG at drivers/iommu/amd_iommu.c:1436!
Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free causes this. The commit is: 2d4c515bf06c9bce87b546279413621f847ef6a3 is the first bad commit commit 2d4c515bf06c9bce87b546279413621f847ef6a3 Author: Joerg Roedel Date: Tue Jul 5 16:21:32 2016 +0200 iommu/amd: Remove other remains of old address allocator There are other remains in the code from the old allocatore. Remove them all. Signed-off-by: Joerg Roedel :04 04 87b020717cdd7dcab45e3574dfb6649d05dcc044 817e613acede15fafb70b046d831f558e6bffd93 M drivers Nov 16 08:39:15 harley kernel: kernel BUG at drivers/iommu/amd_iommu.c:1436! Nov 16 08:39:15 harley kernel: invalid opcode: [#1] PREEMPT SMP Nov 16 08:39:15 harley kernel: Modules linked in: gpiohsd(O) bnep bluetooth rfkill nvidia(PO) drm af_packet iscsi_ibft iscsi_boot_sysfs snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec snd_hda_core xhci_pci snd_hwdep xhci_hcd snd_pcm synclink_gt osst 3c59x r8169 irqbypass crc32_pclmul dgap(C) snd_timer n_hdlc hdlc crc32c_intel snd tpm_infineon st joydev mii aesni_intel shpchp aes_x86_64 glue_helper tpm_tis tpm_tis_core input_leds lrw tpm k10temp fam15h_power soundcore gf128mul ablk_helper processor i2c_piix4 fjes serio_raw pcspkr cryptd dm_mod sr_mod cdrom ata_generic aic79xx pata_atiixp ohci_pci aic7xxx scsi_transport_spi mxm_wmi wmi button sg autofs4 Nov 16 08:39:15 harley kernel: CPU: 6 PID: 4750 Comm: trusim1a Tainted: P C O4.8.0 #1 Nov 16 08:39:15 harley kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 Nov 16 08:39:15 harley kernel: task: 88042d849c00 task.stack: 88043991 Nov 16 08:39:15 harley kernel: RIP: 0010:[] [] iommu_unmap_page+0xd3/0xe0 Nov 16 08:39:15 harley kernel: RSP: 0018:880439913b60 EFLAGS: 00010202 Nov 16 08:39:15 harley kernel: RAX: 122f RBX: 1000 RCX: 0027 Nov 16 08:39:15 harley kernel: RDX: fdba RSI: 88043b13a0a0 RDI: Nov 16 08:39:15 harley kernel: RBP: 880439913b90 R08: R09: Nov 16 08:39:15 harley kernel: R10: 0005 R11: 0002 R12: 88043b13a0a0 Nov 16 08:39:15 harley kernel: R13: 88043b13a000 R14: 1230 R15: ffec6260 Nov 16 08:39:15 harley kernel: FS: 7fb360e6e700() GS:88044fd8() knlGS: Nov 16 08:39:15 harley kernel: CS: 0010 DS: ES: CR0: 80050033 Nov 16 08:39:15 harley kernel: CR2: 7fe345af58f0 CR3: 00042d48a000 CR4: 000406e0 Nov 16 08:39:15 harley kernel: Stack: Nov 16 08:39:15 harley kernel: 0246 0001 ffec7000 0001 Nov 16 08:39:15 harley kernel: 88043b13a000 ffec6000 880439913bd0 80620298 Nov 16 08:39:15 harley kernel: 88043d2480a0 1000 88043d2480a0 ffec6000 Nov 16 08:39:15 harley kernel: Call Trace: Nov 16 08:39:15 harley kernel: [] __unmap_single.isra.23+0x58/0x1b0 Nov 16 08:39:15 harley kernel: [] free_coherent+0x76/0xc0 Nov 16 08:39:15 harley kernel: [] ehsd_set_signal+0x395/0x620 [gpiohsd] Nov 16 08:39:15 harley kernel: [] ? finish_task_switch+0x73/0x1e0 Nov 16 08:39:15 harley kernel: [] ehsd_ioctl+0xdb9/0x17cb [gpiohsd] Nov 16 08:39:15 harley kernel: [] ? pat_enabled+0x20/0x20 Nov 16 08:39:15 harley kernel: [] ? walk_system_ram_range+0x70/0xc0 Nov 16 08:39:15 harley kernel: [] ? unmap_page_range+0x698/0x8a0 Nov 16 08:39:15 harley kernel: [] ? find_next_bit+0x19/0x20 Nov 16 08:39:15 harley kernel: [] ? cpumask_any_but+0x2c/0x40 Nov 16 08:39:15 harley kernel: [] ? flush_tlb_mm_range+0x4c/0x1a0 Nov 16 08:39:15 harley kernel: [] ? tlb_finish_mmu+0x17/0x50 Nov 16 08:39:15 harley kernel: [] ? unmap_region+0xe0/0x110 Nov 16 08:39:15 harley kernel: [] ? __rb_erase_color+0x138/0x280 Nov 16 08:39:15 harley kernel: [] do_vfs_ioctl+0x8f/0x5a0 Nov 16 08:39:15 harley kernel: [] ? do_munmap+0x27d/0x370 Nov 16 08:39:15 harley kernel: [] SyS_ioctl+0x74/0x80 Nov 16 08:39:15 harley kernel: [] entry_SYSCALL_64_fastpath+0x17/0x93 Nov 16 08:39:15 harley kernel: Code: d7 49 01 c7 4c 39 f3 77 89 4d 85 f6 75 14 48 83 c4 08 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 49 8d 46 ff 4c 85 f0 74 e3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 8b 57 7c 48 89 f0 85 d2 Nov 16 08:39:15 harley kernel: RIP [] iommu_unmap_page+0xd3/0xe0 Nov 16 08:39:15 harley kernel: RSP Nov 16 08:39:15 harley kernel: ---[ end trace 45a510d4d695b2d3 ]--- Regards Mark
Re: [PATCH 4.7 114/143] Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
On 09/05/2016 12:44 PM, Greg Kroah-Hartman wrote: 4.7-stable review patch. If anyone has any objections, please let me know. -- From: Jens Axboecommit 468c298ad3ed3f0d94a65f8ca00f6bfc6c2b4e33 upstream. This reverts commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e. Signed-off-by: Greg Kroah-Hartman --- drivers/block/floppy.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,6 +3663,11 @@ static int floppy_open(struct block_devi opened_bdev[drive] = bdev; + if (!(mode & (FMODE_READ|FMODE_WRITE))) { + res = -EINVAL; + goto out; + } + res = -ENXIO; if (!floppy_track_buffer) { @@ -3706,15 +3711,13 @@ static int floppy_open(struct block_devi if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - if (mode & (FMODE_READ|FMODE_WRITE)) { - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; - } + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; res = -EROFS; Can I assume the one to revert 09954bad448791ef01202351d437abdd9497a804 will be a separate one? Thanks Mark
Re: [PATCH 4.7 114/143] Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
On 09/05/2016 12:44 PM, Greg Kroah-Hartman wrote: 4.7-stable review patch. If anyone has any objections, please let me know. -- From: Jens Axboe commit 468c298ad3ed3f0d94a65f8ca00f6bfc6c2b4e33 upstream. This reverts commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e. Signed-off-by: Greg Kroah-Hartman --- drivers/block/floppy.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,6 +3663,11 @@ static int floppy_open(struct block_devi opened_bdev[drive] = bdev; + if (!(mode & (FMODE_READ|FMODE_WRITE))) { + res = -EINVAL; + goto out; + } + res = -ENXIO; if (!floppy_track_buffer) { @@ -3706,15 +3711,13 @@ static int floppy_open(struct block_devi if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - if (mode & (FMODE_READ|FMODE_WRITE)) { - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; - } + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; res = -EROFS; Can I assume the one to revert 09954bad448791ef01202351d437abdd9497a804 will be a separate one? Thanks Mark
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/25/2016 10:41 AM, Jens Axboe wrote: On 08/25/2016 07:08 AM, Mark Hounschell wrote: On 08/24/2016 05:11 PM, Jiri Kosina wrote: On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote: I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Does the same problem also happen in Linus's tree? If not, any hints on the patch that might have fixed it there? Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user land multiple ways) should be reverted and a fix for what ever obscure BUG was supposed to be fixed should be "retried". That's still an unresolved issue, and it's on my list to things to look into. This particular patch though fixes a different issue, and should be applied to -stable. This patch fixes a bug that was introduced to fix "some other obscure BUG" that has nothing to do whatever with the "physical floppy" device. And it broke user land in at least 2 ways. All this patch does is revert part of the original patch so that it is only broke in 1 way to user land. It should revert the whole thing IMHO. Which patch is this? If that is truly the case, it should be reverted asap. commit 09954bad448791ef01202351d437abdd9497a804 seems to be the one. Mark
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/25/2016 10:41 AM, Jens Axboe wrote: On 08/25/2016 07:08 AM, Mark Hounschell wrote: On 08/24/2016 05:11 PM, Jiri Kosina wrote: On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote: I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Does the same problem also happen in Linus's tree? If not, any hints on the patch that might have fixed it there? Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user land multiple ways) should be reverted and a fix for what ever obscure BUG was supposed to be fixed should be "retried". That's still an unresolved issue, and it's on my list to things to look into. This particular patch though fixes a different issue, and should be applied to -stable. This patch fixes a bug that was introduced to fix "some other obscure BUG" that has nothing to do whatever with the "physical floppy" device. And it broke user land in at least 2 ways. All this patch does is revert part of the original patch so that it is only broke in 1 way to user land. It should revert the whole thing IMHO. Which patch is this? If that is truly the case, it should be reverted asap. commit 09954bad448791ef01202351d437abdd9497a804 seems to be the one. Mark
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/24/2016 05:11 PM, Jiri Kosina wrote: On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote: I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Does the same problem also happen in Linus's tree? If not, any hints on the patch that might have fixed it there? Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user land multiple ways) should be reverted and a fix for what ever obscure BUG was supposed to be fixed should be "retried". That's still an unresolved issue, and it's on my list to things to look into. This particular patch though fixes a different issue, and should be applied to -stable. This patch fixes a bug that was introduced to fix "some other obscure BUG" that has nothing to do whatever with the "physical floppy" device. And it broke user land in at least 2 ways. All this patch does is revert part of the original patch so that it is only broke in 1 way to user land. It should revert the whole thing IMHO. Regards Mark Thanks,
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/24/2016 05:11 PM, Jiri Kosina wrote: On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote: I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Does the same problem also happen in Linus's tree? If not, any hints on the patch that might have fixed it there? Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user land multiple ways) should be reverted and a fix for what ever obscure BUG was supposed to be fixed should be "retried". That's still an unresolved issue, and it's on my list to things to look into. This particular patch though fixes a different issue, and should be applied to -stable. This patch fixes a bug that was introduced to fix "some other obscure BUG" that has nothing to do whatever with the "physical floppy" device. And it broke user land in at least 2 ways. All this patch does is revert part of the original patch so that it is only broke in 1 way to user land. It should revert the whole thing IMHO. Regards Mark Thanks,
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/18/2016 09:59 AM, Greg Kroah-Hartman wrote: 4.7-stable review patch. If anyone has any objections, please let me know. -- From: Jiri Kosinacommit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e upstream. Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_devi opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_devi if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Thanks Mark
Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open
On 08/18/2016 09:59 AM, Greg Kroah-Hartman wrote: 4.7-stable review patch. If anyone has any objections, please let me know. -- From: Jiri Kosina commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e upstream. Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_devi opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_devi if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; I have a problem with this patch. It only fixes one of the regressions caused by the original change to the floppy driver. It does not address the user land breakage of removing the NODELAY flag checks. Thanks Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/12/2016 04:54 AM, Jiri Kosina wrote: On Mon, 11 Jul 2016, Mark Hounschell wrote: Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. Are we making any progress on fixing this regression? Anything I can do? Regards Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/12/2016 04:54 AM, Jiri Kosina wrote: On Mon, 11 Jul 2016, Mark Hounschell wrote: Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. Are we making any progress on fixing this regression? Anything I can do? Regards Mark
Re: Context switch latency in tickless isolated CPU
On 08/22/2016 11:37 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote: On 08/22/2016 10:48 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote: On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote: If latency is all you care about, one approach is to map the device registers into userspace and do the I/O without assistance from the kernel. In addition to the context switch latency, local interrupts are also closed during user_enter and user_exit of the context tracking. Therefore, the interrupt latency might be also increased on the isolated tickless CPU. That will degrade the real time performance. Are these two events determined? Hmmm... Why would you be taking interrupts on your isolated tickless CPUs? Doesn't that defeat the purpose of designating them as isolated and tickless? Don't mean to butt in here but think about a "special" PCI card that does nothing but take an external interrupt or external interrupts from an outside source where the latency between the time it occurs on the outside and the time an isolated processor can act on that event. The IRQ of that card also being pinned/isolated to that processor. This is a very common thing in the RT world. In this case, the host OS would see an event-driven real-time workload from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL on the host OS. Of course, if you are instead building an OS to run on the PCI card itself, then the choice of configuration would depend on how the PCI card was set up. If it polled hardware, then NO_HZ_FULL on the PCI card might work quite well. But then you wouldn't have interrupts (on the PCI card), so I am guessing that you mean the scenario covered in the first paragraph. Or am I missing your point? Thanx, Paul First paragraph scenario is the one I was referring. Thanks Mark Mark The key point being that effective use of NO_HZ_FULL requires careful configuration and complete understanding of your workload. And it is quite possible that you instead need to use something other than NO_HZ_FULL. If your question is instead "why must interrupts be disabled during context tracking", I must defer to people who understand the x86 entry/exit code paths better than I do. Thanx, Paul
Re: Context switch latency in tickless isolated CPU
On 08/22/2016 11:37 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote: On 08/22/2016 10:48 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote: On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney wrote: If latency is all you care about, one approach is to map the device registers into userspace and do the I/O without assistance from the kernel. In addition to the context switch latency, local interrupts are also closed during user_enter and user_exit of the context tracking. Therefore, the interrupt latency might be also increased on the isolated tickless CPU. That will degrade the real time performance. Are these two events determined? Hmmm... Why would you be taking interrupts on your isolated tickless CPUs? Doesn't that defeat the purpose of designating them as isolated and tickless? Don't mean to butt in here but think about a "special" PCI card that does nothing but take an external interrupt or external interrupts from an outside source where the latency between the time it occurs on the outside and the time an isolated processor can act on that event. The IRQ of that card also being pinned/isolated to that processor. This is a very common thing in the RT world. In this case, the host OS would see an event-driven real-time workload from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL on the host OS. Of course, if you are instead building an OS to run on the PCI card itself, then the choice of configuration would depend on how the PCI card was set up. If it polled hardware, then NO_HZ_FULL on the PCI card might work quite well. But then you wouldn't have interrupts (on the PCI card), so I am guessing that you mean the scenario covered in the first paragraph. Or am I missing your point? Thanx, Paul First paragraph scenario is the one I was referring. Thanks Mark Mark The key point being that effective use of NO_HZ_FULL requires careful configuration and complete understanding of your workload. And it is quite possible that you instead need to use something other than NO_HZ_FULL. If your question is instead "why must interrupts be disabled during context tracking", I must defer to people who understand the x86 entry/exit code paths better than I do. Thanx, Paul
Re: Context switch latency in tickless isolated CPU
On 08/22/2016 10:48 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote: On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenneywrote: If latency is all you care about, one approach is to map the device registers into userspace and do the I/O without assistance from the kernel. In addition to the context switch latency, local interrupts are also closed during user_enter and user_exit of the context tracking. Therefore, the interrupt latency might be also increased on the isolated tickless CPU. That will degrade the real time performance. Are these two events determined? Hmmm... Why would you be taking interrupts on your isolated tickless CPUs? Doesn't that defeat the purpose of designating them as isolated and tickless? Don't mean to butt in here but think about a "special" PCI card that does nothing but take an external interrupt or external interrupts from an outside source where the latency between the time it occurs on the outside and the time an isolated processor can act on that event. The IRQ of that card also being pinned/isolated to that processor. This is a very common thing in the RT world. Mark The key point being that effective use of NO_HZ_FULL requires careful configuration and complete understanding of your workload. And it is quite possible that you instead need to use something other than NO_HZ_FULL. If your question is instead "why must interrupts be disabled during context tracking", I must defer to people who understand the x86 entry/exit code paths better than I do. Thanx, Paul
Re: Context switch latency in tickless isolated CPU
On 08/22/2016 10:48 AM, Paul E. McKenney wrote: On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote: On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney wrote: If latency is all you care about, one approach is to map the device registers into userspace and do the I/O without assistance from the kernel. In addition to the context switch latency, local interrupts are also closed during user_enter and user_exit of the context tracking. Therefore, the interrupt latency might be also increased on the isolated tickless CPU. That will degrade the real time performance. Are these two events determined? Hmmm... Why would you be taking interrupts on your isolated tickless CPUs? Doesn't that defeat the purpose of designating them as isolated and tickless? Don't mean to butt in here but think about a "special" PCI card that does nothing but take an external interrupt or external interrupts from an outside source where the latency between the time it occurs on the outside and the time an isolated processor can act on that event. The IRQ of that card also being pinned/isolated to that processor. This is a very common thing in the RT world. Mark The key point being that effective use of NO_HZ_FULL requires careful configuration and complete understanding of your workload. And it is quite possible that you instead need to use something other than NO_HZ_FULL. If your question is instead "why must interrupts be disabled during context tracking", I must defer to people who understand the x86 entry/exit code paths better than I do. Thanx, Paul
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/12/2016 05:37 AM, Jiri Kosina wrote: On Thu, 11 Aug 2016, Mark Hounschell wrote: I just tested what is currently in Linus' tree and it does NOT work for me. Is there some minimalized reproducer you are seeing the regression with that you could share? Thanks, Your patch is NOT there yet. There is no reference to NODELAY in the floppy driver of Linus tree that I checked out yesterday. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/12/2016 05:37 AM, Jiri Kosina wrote: On Thu, 11 Aug 2016, Mark Hounschell wrote: I just tested what is currently in Linus' tree and it does NOT work for me. Is there some minimalized reproducer you are seeing the regression with that you could share? Thanks, Your patch is NOT there yet. There is no reference to NODELAY in the floppy driver of Linus tree that I checked out yesterday. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/11/2016 09:24 AM, Jiri Kosina wrote: On Wed, 3 Aug 2016, Mark Hounschell wrote: I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git. It's there. A patch for 4.5 would be easy for me though. Anyway the commit landed in Linus' tree already (ff06db1ef). Testing it in your environment would be appreciated. Thanks, I just tested what is currently in Linus' tree and it does NOT work for me. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/11/2016 09:24 AM, Jiri Kosina wrote: On Wed, 3 Aug 2016, Mark Hounschell wrote: I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git. It's there. A patch for 4.5 would be easy for me though. Anyway the commit landed in Linus' tree already (ff06db1ef). Testing it in your environment would be appreciated. Thanks, I just tested what is currently in Linus' tree and it does NOT work for me. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/02/2016 05:44 AM, Jiri Kosina wrote: On Wed, 13 Jul 2016, Mark Hounschell wrote: Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. I'm not supplementing anything. The driver _did_ this on its own. I mean, you're passing O_NDELAY to open(/dev/fd0) exactly to avoid kernel issuing check_disk_change(). That's the only semantics O_NDELAY has for fd. I just expect to be able to open the drive to get a handle without the kernel attempting to access the media. My apps manage a disk_change on their own. I don't think its check_disk_change that gives me my pain. There is some probe happening that fails when a floppy is installed that is not a "standard" format. That causes the open to fail which is the most pain. Still I should be able to get a handle without any media or even unrecognized media installed. Yeah, that's check_disk_change(). The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). Forgive me here as I'm ignorant about why any virtualized floppy would require the real physical kernel floppy driver to be involved at all. Because VMs (such as qemu) actually do emulate a FDC on a hardware level, but don't emulate the timings of the real hardware (which are not mandated by the spec, but "are just there"). This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. I would be happy to do some testing for you if needed. At least with regard to our apps. Could you please check whether my last patch that Jens queued in linux-block.git ("floppy: fix open(O_ACCMODE) for ioctl-only open" in for-linus branch) remedies at least some of the issues you are seeing? I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git. A patch for 4.5 would be easy for me though. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 08/02/2016 05:44 AM, Jiri Kosina wrote: On Wed, 13 Jul 2016, Mark Hounschell wrote: Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. I'm not supplementing anything. The driver _did_ this on its own. I mean, you're passing O_NDELAY to open(/dev/fd0) exactly to avoid kernel issuing check_disk_change(). That's the only semantics O_NDELAY has for fd. I just expect to be able to open the drive to get a handle without the kernel attempting to access the media. My apps manage a disk_change on their own. I don't think its check_disk_change that gives me my pain. There is some probe happening that fails when a floppy is installed that is not a "standard" format. That causes the open to fail which is the most pain. Still I should be able to get a handle without any media or even unrecognized media installed. Yeah, that's check_disk_change(). The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). Forgive me here as I'm ignorant about why any virtualized floppy would require the real physical kernel floppy driver to be involved at all. Because VMs (such as qemu) actually do emulate a FDC on a hardware level, but don't emulate the timings of the real hardware (which are not mandated by the spec, but "are just there"). This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. I would be happy to do some testing for you if needed. At least with regard to our apps. Could you please check whether my last patch that Jens queued in linux-block.git ("floppy: fix open(O_ACCMODE) for ioctl-only open" in for-linus branch) remedies at least some of the issues you are seeing? I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git. A patch for 4.5 would be easy for me though. Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/12/2016 04:54 AM, Jiri Kosina wrote: On Mon, 11 Jul 2016, Mark Hounschell wrote: Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. I'm not supplementing anything. The driver _did_ this on its own. I just expect to be able to open the drive to get a handle without the kernel attempting to access the media. My apps manage a disk_change on their own. I don't think its check_disk_change that gives me my pain. There is some probe happening that fails when a floppy is installed that is not a "standard" format. That causes the open to fail which is the most pain. Still I should be able to get a handle without any media or even unrecognized media installed. Funny, though even fdformat from the linux-utils package won't allow me to format a floppy that is NOT already formatted in a supported format. Once I format a floppy to an other than standard format, fdformat will not allow me to reformat it back to a standard format. Doesn't make much sense does it? "Unable to format a floppy that is not already formatted??" That is another issue though. The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). Forgive me here as I'm ignorant about why any virtualized floppy would require the real physical kernel floppy driver to be involved at all. We also do virtualized floppies in our user land apps but we certainly don't require any kernel floppy driver support to do it? This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. I would be happy to do some testing for you if needed. At least with regard to our apps. Regards Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/12/2016 04:54 AM, Jiri Kosina wrote: On Mon, 11 Jul 2016, Mark Hounschell wrote: Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? Alright, so you are basically supplementing O_NDELAY flag in order to avoid check_disk_change() being called. It's rather a coincidence that it has worked this way, but I agree with you that we can't ignore the fact that there is userspace relying on this behavior. I'm not supplementing anything. The driver _did_ this on its own. I just expect to be able to open the drive to get a handle without the kernel attempting to access the media. My apps manage a disk_change on their own. I don't think its check_disk_change that gives me my pain. There is some probe happening that fails when a floppy is installed that is not a "standard" format. That causes the open to fail which is the most pain. Still I should be able to get a handle without any media or even unrecognized media installed. Funny, though even fdformat from the linux-utils package won't allow me to format a floppy that is NOT already formatted in a supported format. Once I format a floppy to an other than standard format, fdformat will not allow me to reformat it back to a standard format. Doesn't make much sense does it? "Unable to format a floppy that is not already formatted??" That is another issue though. The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now That's not really true; the code is a racy mess, and this is being uncovered only when virtualized floppy devices started to exist (because they are much faster than a real hardware, and the different timing reveals bugs that were not visible before). Forgive me here as I'm ignorant about why any virtualized floppy would require the real physical kernel floppy driver to be involved at all. We also do virtualized floppies in our user land apps but we certainly don't require any kernel floppy driver support to do it? This particular fix was because syzkaller found a way how easily corrupt kernel memory using O_NDELAY to floppy driver; see https://lkml.org/lkml/2016/2/2/848 so I am really confused as to why these changes were induced. The floppy driver is in an orphan mode; no new "features" are added "just because". Everything that's happening there is to fix real bugs in the kernel. I'll look into ways how to fix this, but I am afraid this is going to be really tricky. Therefore we'd have to very likely proceed asap with revert of 09954bad448 and coming up with a workaround that'd still avoid the bug reported by syzkaller. I would be happy to do some testing for you if needed. At least with regard to our apps. Regards Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/11/2016 11:36 AM, Jiri Kosina wrote: On Tue, 5 Jul 2016, Mark Hounschell wrote: From: Jiri Kosina <jkos...@suse.cz> Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt <w...@djo.tudelft.nl> Tested-by: Wim Osterholt <w...@djo.tudelft.nl> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- [ ... snip ... ] But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Hi Mark, thanks for the regression report. For my better understanding of your issue -- what behavior/semantics exactly does your userspace think it'll be getting from opening /dev/fd0 with O_NDELAY? Thanks, Hi Jiri. Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now so I am really confused as to why these changes were induced. As for the "O_RDONLY | O_WRONLY" thing you decided to change back, which I'm happy to see, was wrong. Almost ALL removable media devices have W/R protection built into the media. For ever, I understood that it was MY responsibility to write protect my removable media. An open of a removable device should never even care about that stuff. It is the users responsibility. We use extensively, the FDRAWCMD ioctl API. It is totally borked now for us without maintaining our own kernel patch that reverts the changes from 4.4 to 4.5. Regards Mark
Re: Resend: Another 4.4 to 4.5 floppy issue
On 07/11/2016 11:36 AM, Jiri Kosina wrote: On Tue, 5 Jul 2016, Mark Hounschell wrote: From: Jiri Kosina Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina --- [ ... snip ... ] But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Hi Mark, thanks for the regression report. For my better understanding of your issue -- what behavior/semantics exactly does your userspace think it'll be getting from opening /dev/fd0 with O_NDELAY? Thanks, Hi Jiri. Well, all that was specified in my original post. I can no longer open the floppy drive with no floppy media inserted. Worse, I can also no longer open a floppy with media inserted that is not a "linux" recognized format. A floppy drive is a removable media device and should be treated as such. The original implementation of the O_NDELAY flag allowed it to be. Any removable media device should be capable of being opened with no, or even unrecognizable media installed. The kernel and its utilities should not "assume" to much when it comes to removable media. Consider a SCSI tape drive or even a removable media SCSI disk drive. How would you explain an open failure to someone trying to open a SCSI tape drive that had no tape or even a "non-tar" formatted tape media in it??? Or better yet, trying to open a removable media device the was write protected but didn't include O_RDONLY in the open? The original behavior of the floppy driver was correct. I have no idea what BUG these changes were supposed to fix but the "fix" obviously broke user land. Was this bug reported by some new ROBOT test or something? The kernel floppy driver has been stable for years now so I am really confused as to why these changes were induced. As for the "O_RDONLY | O_WRONLY" thing you decided to change back, which I'm happy to see, was wrong. Almost ALL removable media devices have W/R protection built into the media. For ever, I understood that it was MY responsibility to write protect my removable media. An open of a removable device should never even care about that stuff. It is the users responsibility. We use extensively, the FDRAWCMD ioctl API. It is totally borked now for us without maintaining our own kernel patch that reverts the changes from 4.4 to 4.5. Regards Mark
Resend: Another 4.4 to 4.5 floppy issue
Just rejoined the list due to floppy open problems created from 4.4 to 4.5. I found the following email that indicates a fix for one of the problems. From: Jiri KosinaCommit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina --- Jens, this should preferably go into 4.7-rcX and to -stable as well. drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 84708a5..a1dcf12 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; -- Jiri Kosina SUSE Labs But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Thanks and regards Mark
Resend: Another 4.4 to 4.5 floppy issue
Just rejoined the list due to floppy open problems created from 4.4 to 4.5. I found the following email that indicates a fix for one of the problems. From: Jiri Kosina Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina --- Jens, this should preferably go into 4.7-rcX and to -stable as well. drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 84708a5..a1dcf12 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; -- Jiri Kosina SUSE Labs But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Thanks and regards Mark
Another 4.4 to 4.5 floppy issue
Just rejoined the list due to floppy open problems created from 4.4 to 4.5. I found the following email that indicates a fix for one of the problems. From: Jiri KosinaCommit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina --- Jens, this should preferably go into 4.7-rcX and to -stable as well. drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 84708a5..a1dcf12 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; -- Jiri Kosina SUSE Labs But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Thanks and regards Mark
Another 4.4 to 4.5 floppy issue
Just rejoined the list due to floppy open problems created from 4.4 to 4.5. I found the following email that indicates a fix for one of the problems. From: Jiri Kosina Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Cc: sta...@vger.kernel.org # v4.5+ Reported-by: Wim Osterholt Tested-by: Wim Osterholt Signed-off-by: Jiri Kosina --- Jens, this should preferably go into 4.7-rcX and to -stable as well. drivers/block/floppy.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 84708a5..a1dcf12 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) opened_bdev[drive] = bdev; - if (!(mode & (FMODE_READ|FMODE_WRITE))) { - res = -EINVAL; - goto out; - } - res = -ENXIO; if (!floppy_track_buffer) { @@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device *bdev, fmode_t mode) if (UFDCS->rawcmd == 1) UFDCS->rawcmd = 2; - UDRS->last_checked = 0; - clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); - check_disk_change(bdev); - if (test_bit(FD_DISK_CHANGED_BIT, >flags)) - goto out; - if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) - goto out; + if (mode & (FMODE_READ|FMODE_WRITE)) { + UDRS->last_checked = 0; + clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags); + check_disk_change(bdev); + if (test_bit(FD_DISK_CHANGED_BIT, >flags)) + goto out; + if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags)) + goto out; + } res = -EROFS; -- Jiri Kosina SUSE Labs But this does not completely fix all the problems induced by the original changes from 4.4 to 4.5. The following is what we use to open the floppy. fd = open(device, O_RDWR | O_NDELAY); The FMODE_NDELAY check that was removed now prevents one from doing an open of the device with no media inserted. It also prevents one from doing an open of the device with media inserted that is not already formatted in a "standard" format. I do both of these things a lot. I deal with a few very non-standard formats and this change prevents me from doing what I've been doing for YEARS. Could we please get the original behavior back in the floppy driver. Thanks and regards Mark
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/26/2015 11:52 AM, Joerg Roedel wrote: Hi Mark, On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote: Sorry but CMA was still badly broken. I have a patch below that works. In which way is it broken? What happens when you try to allocate memory with dma_alloc_coherent? I got Oops on both the alloc and free and I had to hit the reset button. I've tested it with small (no CMA) and large (with CMA) dma's using dma_alloc_coherent. The patch below is just the "git diff" from your cloned tree piped to a file then copied into this email. If you require an official patch I can send one. Just let me know. The main differences I can spot are that you change the order (first CMA, then buddy) and you manually align the input size. I can see the reason for the later, but why does CMA need to be tried first? I believe that is how dma_alloc_coherent works. If CMA is present it uses it. Even for small allocations. If not present then it does the best it can. That patch was derived by looking at the Intel iommu driver works properly. Maybe you should take a look at that. Also, in my opinion, this CMA thing is clearly a BUG not a feature request. The AMD iommu clearly breaks CMA. I feel what ever fix you are happy with should be back ported to stable. It is not a BUG, the interface definition for dma_alloc_coherent does not specify that it can allocate infinite amounts of memory. So this patch does not qualify for stable. I don't care what the "the interface definition for dma_alloc_coherent" says. If it does not describe it's use with CMA, it is outdated. Maybe the "interface definition for dma_alloc_coherent" was done before CMA was implemented. Maybe you should be looking at CMA Documentation. Unfortunately there does not yet seem to be any in the Documentation dir. Probably because CMA is supposed to be transparent to the DMA API. You know, one should not need to details about it, etc... It is a BUG. To access CMA you use dma_alloc_coherent. You have completely highjacked that function. You have broken CMA. PERIOD. That is a BUG. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
Hi Joerg, On 03/26/2015 08:45 AM, Joerg Roedel wrote: > On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote: >> BTW, so far the first 2 patches are working well. I was going to >> wait until the end of the day to report but so far I have been >> unable to produce the problems I was seeing. And I am in the middle >> of some driver work so lots of unloading/loading going on. > > Great, thanks. Please let me know when you have test results for the > other patches too. > > > Joerg Sorry but CMA was still badly broken. I have a patch below that works. I've tested it with small (no CMA) and large (with CMA) dma's using dma_alloc_coherent. The patch below is just the "git diff" from your cloned tree piped to a file then copied into this email. If you require an official patch I can send one. Just let me know. Also, in my opinion, this CMA thing is clearly a BUG not a feature request. The AMD iommu clearly breaks CMA. I feel what ever fix you are happy with should be back ported to stable. Regards Mark diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index a0197d0..5ea4fed 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2917,28 +2917,36 @@ static void *alloc_coherent(struct device *dev, size_t size, u64 dma_mask = dev->coherent_dma_mask; struct protection_domain *domain; unsigned long flags; - struct page *page; + struct page *page = 0; + int order; + unsigned int count; + + size = PAGE_ALIGN(size); + order = get_order(size); + count = size >> PAGE_SHIFT; INC_STATS_COUNTER(cnt_alloc_coherent); domain = get_domain(dev); if (PTR_ERR(domain) == -EINVAL) { - page = alloc_pages(flag, get_order(size)); + page = alloc_pages(flag, order); *dma_addr = page_to_phys(page); return page_address(page); } else if (IS_ERR(domain)) return NULL; - dma_mask = dev->coherent_dma_mask; flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); + flag |= __GFP_ZERO; - page = alloc_pages(flag, get_order(size)); + if (flag & __GFP_WAIT) { + page = dma_alloc_from_contiguous(dev, count, order); + if (page && page_to_phys(page) + size > dma_mask) { + dma_release_from_contiguous(dev, page, count); + page = NULL; + } + } if (!page) { - if (!(flag & __GFP_WAIT)) - return NULL; - - page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT, -get_order(size)); + page = alloc_pages(flag, order); if (!page) return NULL; } @@ -2951,7 +2959,7 @@ static void *alloc_coherent(struct device *dev, size_t size, *dma_addr = __map_single(dev, domain->priv, page_to_phys(page), size, DMA_BIDIRECTIONAL, true, dma_mask); - if (*dma_addr == DMA_ERROR_CODE) { + if (!dma_addr || (*dma_addr == DMA_ERROR_CODE)) { spin_unlock_irqrestore(>lock, flags); goto out_free; } @@ -2965,7 +2973,7 @@ static void *alloc_coherent(struct device *dev, size_t size, out_free: if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT)) - __free_pages(page, get_order(size)); + __free_pages(page, order); return NULL; } @@ -2979,6 +2987,11 @@ static void free_coherent(struct device *dev, size_t size, { unsigned long flags; struct protection_domain *domain; + int order; + struct page *page = virt_to_page(virt_addr); + + size = PAGE_ALIGN(size); + order = get_order(size); INC_STATS_COUNTER(cnt_free_coherent); @@ -2995,7 +3008,9 @@ static void free_coherent(struct device *dev, size_t size, spin_unlock_irqrestore(>lock, flags); free_mem: - free_pages((unsigned long)virt_addr, get_order(size)); + if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT)) + __free_pages(page, order); + } /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/26/2015 11:52 AM, Joerg Roedel wrote: Hi Mark, On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote: Sorry but CMA was still badly broken. I have a patch below that works. In which way is it broken? What happens when you try to allocate memory with dma_alloc_coherent? I got Oops on both the alloc and free and I had to hit the reset button. I've tested it with small (no CMA) and large (with CMA) dma's using dma_alloc_coherent. The patch below is just the git diff from your cloned tree piped to a file then copied into this email. If you require an official patch I can send one. Just let me know. The main differences I can spot are that you change the order (first CMA, then buddy) and you manually align the input size. I can see the reason for the later, but why does CMA need to be tried first? I believe that is how dma_alloc_coherent works. If CMA is present it uses it. Even for small allocations. If not present then it does the best it can. That patch was derived by looking at the Intel iommu driver works properly. Maybe you should take a look at that. Also, in my opinion, this CMA thing is clearly a BUG not a feature request. The AMD iommu clearly breaks CMA. I feel what ever fix you are happy with should be back ported to stable. It is not a BUG, the interface definition for dma_alloc_coherent does not specify that it can allocate infinite amounts of memory. So this patch does not qualify for stable. I don't care what the the interface definition for dma_alloc_coherent says. If it does not describe it's use with CMA, it is outdated. Maybe the interface definition for dma_alloc_coherent was done before CMA was implemented. Maybe you should be looking at CMA Documentation. Unfortunately there does not yet seem to be any in the Documentation dir. Probably because CMA is supposed to be transparent to the DMA API. You know, one should not need to details about it, etc... It is a BUG. To access CMA you use dma_alloc_coherent. You have completely highjacked that function. You have broken CMA. PERIOD. That is a BUG. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
Hi Joerg, On 03/26/2015 08:45 AM, Joerg Roedel wrote: On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote: BTW, so far the first 2 patches are working well. I was going to wait until the end of the day to report but so far I have been unable to produce the problems I was seeing. And I am in the middle of some driver work so lots of unloading/loading going on. Great, thanks. Please let me know when you have test results for the other patches too. Joerg Sorry but CMA was still badly broken. I have a patch below that works. I've tested it with small (no CMA) and large (with CMA) dma's using dma_alloc_coherent. The patch below is just the git diff from your cloned tree piped to a file then copied into this email. If you require an official patch I can send one. Just let me know. Also, in my opinion, this CMA thing is clearly a BUG not a feature request. The AMD iommu clearly breaks CMA. I feel what ever fix you are happy with should be back ported to stable. Regards Mark diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index a0197d0..5ea4fed 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2917,28 +2917,36 @@ static void *alloc_coherent(struct device *dev, size_t size, u64 dma_mask = dev-coherent_dma_mask; struct protection_domain *domain; unsigned long flags; - struct page *page; + struct page *page = 0; + int order; + unsigned int count; + + size = PAGE_ALIGN(size); + order = get_order(size); + count = size PAGE_SHIFT; INC_STATS_COUNTER(cnt_alloc_coherent); domain = get_domain(dev); if (PTR_ERR(domain) == -EINVAL) { - page = alloc_pages(flag, get_order(size)); + page = alloc_pages(flag, order); *dma_addr = page_to_phys(page); return page_address(page); } else if (IS_ERR(domain)) return NULL; - dma_mask = dev-coherent_dma_mask; flag = ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); + flag |= __GFP_ZERO; - page = alloc_pages(flag, get_order(size)); + if (flag __GFP_WAIT) { + page = dma_alloc_from_contiguous(dev, count, order); + if (page page_to_phys(page) + size dma_mask) { + dma_release_from_contiguous(dev, page, count); + page = NULL; + } + } if (!page) { - if (!(flag __GFP_WAIT)) - return NULL; - - page = dma_alloc_from_contiguous(dev, size PAGE_SHIFT, -get_order(size)); + page = alloc_pages(flag, order); if (!page) return NULL; } @@ -2951,7 +2959,7 @@ static void *alloc_coherent(struct device *dev, size_t size, *dma_addr = __map_single(dev, domain-priv, page_to_phys(page), size, DMA_BIDIRECTIONAL, true, dma_mask); - if (*dma_addr == DMA_ERROR_CODE) { + if (!dma_addr || (*dma_addr == DMA_ERROR_CODE)) { spin_unlock_irqrestore(domain-lock, flags); goto out_free; } @@ -2965,7 +2973,7 @@ static void *alloc_coherent(struct device *dev, size_t size, out_free: if (!dma_release_from_contiguous(dev, page, size PAGE_SHIFT)) - __free_pages(page, get_order(size)); + __free_pages(page, order); return NULL; } @@ -2979,6 +2987,11 @@ static void free_coherent(struct device *dev, size_t size, { unsigned long flags; struct protection_domain *domain; + int order; + struct page *page = virt_to_page(virt_addr); + + size = PAGE_ALIGN(size); + order = get_order(size); INC_STATS_COUNTER(cnt_free_coherent); @@ -2995,7 +3008,9 @@ static void free_coherent(struct device *dev, size_t size, spin_unlock_irqrestore(domain-lock, flags); free_mem: - free_pages((unsigned long)virt_addr, get_order(size)); + if (!dma_release_from_contiguous(dev, page, size PAGE_SHIFT)) + __free_pages(page, order); + } /* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/25/2015 11:13 AM, Joerg Roedel wrote: Hi again, On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote: On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote: I'll be happy to test it. Okay, here you go: git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu I just added more patches to implement the use of the contiguous memory allocator for the dma_alloc_coherent path. Since you asked for it, can you test this as well? Sure. No problem. If I update my tree from the above, will I get it. BTW, so far the first 2 patches are working well. I was going to wait until the end of the day to report but so far I have been unable to produce the problems I was seeing. And I am in the middle of some driver work so lots of unloading/loading going on. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/25/2015 11:13 AM, Joerg Roedel wrote: Hi again, On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote: On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote: I'll be happy to test it. Okay, here you go: git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu I just added more patches to implement the use of the contiguous memory allocator for the dma_alloc_coherent path. Since you asked for it, can you test this as well? Sure. No problem. If I update my tree from the above, will I get it. BTW, so far the first 2 patches are working well. I was going to wait until the end of the day to report but so far I have been unable to produce the problems I was seeing. And I am in the middle of some driver work so lots of unloading/loading going on. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/23/2015 11:03 AM, Joerg Roedel wrote: Hi Mark, On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote: It looks like this problem is NOT a bug with the SCSI aic7xxx driver after all. I can duplicate this BUG very easily with other hardware. Simply removing a driver module (whether it its self, has actually used any of the DMA API or not) that is sitting on the same pci bus as a card that is actually using DMA will cause this. And that card that is in use and using DMA will no longer function. It "looks and feels" like unloading a module causes the IOMMU to improperly unmap valid mappings. You are right, I looked into the code and found the problem. I'll post a fix for testing this week. I'll be happy to test it. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
On 03/23/2015 11:03 AM, Joerg Roedel wrote: Hi Mark, On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote: It looks like this problem is NOT a bug with the SCSI aic7xxx driver after all. I can duplicate this BUG very easily with other hardware. Simply removing a driver module (whether it its self, has actually used any of the DMA API or not) that is sitting on the same pci bus as a card that is actually using DMA will cause this. And that card that is in use and using DMA will no longer function. It looks and feels like unloading a module causes the IOMMU to improperly unmap valid mappings. You are right, I looked into the code and found the problem. I'll post a fix for testing this week. I'll be happy to test it. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/15/2015 08:07 AM, Mark Hounschell wrote: On 03/14/2015 04:44 AM, Greg KH wrote: On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote: On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius --- v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ -dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); -brd = dgnc_Board[dgnc_NumBoards]; +brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole "static list of boards/cards" go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole "board state" mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Really? It's always been broken? Why don't we just delete the thing? Well, broken might not be the best word. I should have just said, doesn't work for me. It did work on a 3.4 kernel when I provided it to you. I have spent no time as to why but will take a look on Monday. The driver actually does work. It didn't work for me because the device entries were not as they used to be. So if you want to make a change you think needs testing, I can test it. I only have a single card 8 port though. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/15/2015 08:07 AM, Mark Hounschell wrote: On 03/14/2015 04:44 AM, Greg KH wrote: On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote: On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius giedrius.statkevic...@gmail.com --- v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ -dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); -brd = dgnc_Board[dgnc_NumBoards]; +brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole static list of boards/cards go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole board state mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Really? It's always been broken? Why don't we just delete the thing? Well, broken might not be the best word. I should have just said, doesn't work for me. It did work on a 3.4 kernel when I provided it to you. I have spent no time as to why but will take a look on Monday. The driver actually does work. It didn't work for me because the device entries were not as they used to be. So if you want to make a change you think needs testing, I can test it. I only have a single card 8 port though. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/14/2015 04:44 AM, Greg KH wrote: On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote: On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius --- v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ - dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); - brd = dgnc_Board[dgnc_NumBoards]; + brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole "static list of boards/cards" go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole "board state" mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Really? It's always been broken? Why don't we just delete the thing? Well, broken might not be the best word. I should have just said, doesn't work for me. It did work on a 3.4 kernel when I provided it to you. I have spent no time as to why but will take a look on Monday. Greg, while I'm here, how can we get Digi firmware into the kernel firmware tree? The dgap card is useless without it and I suspect the dgnc is as well. Submit a patch to the linux-firmware git tree with the firmware in it. I don't know if the binary blobs are GPL compatable? Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/14/2015 04:44 AM, Greg KH wrote: On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote: On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius giedrius.statkevic...@gmail.com --- v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ - dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); - brd = dgnc_Board[dgnc_NumBoards]; + brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole static list of boards/cards go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole board state mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Really? It's always been broken? Why don't we just delete the thing? Well, broken might not be the best word. I should have just said, doesn't work for me. It did work on a 3.4 kernel when I provided it to you. I have spent no time as to why but will take a look on Monday. Greg, while I'm here, how can we get Digi firmware into the kernel firmware tree? The dgap card is useless without it and I suspect the dgnc is as well. Submit a patch to the linux-firmware git tree with the firmware in it. I don't know if the binary blobs are GPL compatable? Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius --- v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ - dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); - brd = dgnc_Board[dgnc_NumBoards]; + brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole "static list of boards/cards" go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole "board state" mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Greg, while I'm here, how can we get Digi firmware into the kernel firmware tree? The dgap card is useless without it and I suspect the dgnc is as well. Regards mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize
On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote: On 2015.03.12 12:08, Greg KH wrote: On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote: Remove BOARD_FAILED and don't save dgnc_boards which failed to initialize. Assign the result of kzalloc() to brd in dgnc_found_board() and only put it in the dgnc_Board[] if it successfully initializes. Also, remove BOARD_FAILED enum and all ifs that check for it. Finally, remove one final place where state was set to BOARD_FAILED which was even redundant before this patch. Signed-off-by: Giedrius Statkevičius giedrius.statkevic...@gmail.com --- v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot to do in the first version drivers/staging/dgnc/dgnc_driver.c | 20 ++-- drivers/staging/dgnc/dgnc_driver.h | 3 +-- drivers/staging/dgnc/dgnc_mgmt.c | 5 + drivers/staging/dgnc/dgnc_tty.c| 8 4 files changed, 4 insertions(+), 32 deletions(-) diff --git a/drivers/staging/dgnc/dgnc_driver.c b/drivers/staging/dgnc/dgnc_driver.c index fa1ee79..075727d 100644 --- a/drivers/staging/dgnc/dgnc_driver.c +++ b/drivers/staging/dgnc/dgnc_driver.c @@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id) unsigned long flags; /* get the board structure and prep it */ - dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL); - brd = dgnc_Board[dgnc_NumBoards]; + brd = kzalloc(sizeof(*brd), GFP_KERNEL); You've done a great job here, but... Yeah, sorry... I really want to see this whole static list of boards/cards go away. There should not be any need for that in any in-kernel driver. Your patch here is a sign that things are really wrong with this whole static array mess. So could you do that instead? I don't want to take patches around this whole board state mess anymore, as it should all not be needed at all. If you need pointers on what needs to be done here, just let me know. thanks, greg k-h I can try :) But my main concern is the lack of dgnc driver maintainers activity and that I don't own the hardware this driver is written for as it's quite expensive (cheapest cards I've found start at 300$~) and I can't afford it ATM. But I guess if I keep the patches small and logical everything will be okay. I am on the maintainers list and actually have hardware. I have just been so swamped the last few months that I haven't been able to do ANYTHING here. Even work on the dgap driver that I was working on, I just haven't had the time. I can tell you that the dgnc driver does NOT work at all and hasn't worked since it was introduced into staging by Greg. Greg, while I'm here, how can we get Digi firmware into the kernel firmware tree? The dgap card is useless without it and I suspect the dgnc is as well. Regards mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
Hi Joerg, It looks like this problem is NOT a bug with the SCSI aic7xxx driver after all. I can duplicate this BUG very easily with other hardware. Simply removing a driver module (whether it its self, has actually used any of the DMA API or not) that is sitting on the same pci bus as a card that is actually using DMA will cause this. And that card that is in use and using DMA will no longer function. It "looks and feels" like unloading a module causes the IOMMU to improperly unmap valid mappings. Regards Mark > Kernel 3.18.7 (x86_64) with an enabled IOMMU. This happens immediately after unloading the aic7xxx driver. In fact this happens unloading any module associated with any pci card on the same bus as the Adaptec AHA-2930CU card. If I blacklist the aic7xxx driver the problem never shows. If I disable the IOMMU, the problem never shows. This was brought up over at io...@lists.linux-foundation.org in this thread. http://lists.linuxfoundation.org/pipermail/iommu/2015-February/012274.html and they think it is not an iommu problem, but a possibly a problem with the aic7xxx driver. Complete dmesg attached. [ 106.355752] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00073980 flags=0x0070] [ 106.358732] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x000739a0 flags=0x0070] [ 106.361870] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x000739b0 flags=0x0070] [ 112.352950] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d00 flags=0x0070] [ 112.356057] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d20 flags=0x0070] [ 112.359142] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d30 flags=0x0070] [ 126.588896] AMD-Vi: Using protection domain 1 for device :10:05.0 [ 126.591992] [ cut here ] [ 126.595107] WARNING: CPU: 4 PID: 0 at drivers/iommu/amd_iommu.c:2637 dma_ops_domain_unmap.part.13+0x65/0x70() [ 126.598408] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet kvm crc32_pclmul crc32c_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec aes_x86_64 drm snd_hwdep snd_pcm glue_helper lrw aic79xx gf128mul ablk_helper cryptd snd_timer 3c59x scsi_transport_spi snd r8169 xhci_pci xhci_hcd serio_raw pcspkr mii tpm_infineon tpm_tis fam15h_power k10temp i2c_piix4 processor thermal_sys shpchp tpm soundcore 8250_fintek dm_mod sr_mod cdrom ata_generic mxm_wmi pata_atiixp ohci_pci wmi button sg autofs4 [last unloaded: aic7xxx] [ 126.613992] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.18.7-lcrs #1 [ 126.618146] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 [ 126.622456] 0009 88044fd03cb8 807bde4d [ 126.626815] 88044fd03cf8 8024cf7c 88044fd03cf8 [ 126.631172] 88043cdbc380 0600 000702c0 [ 126.635504] Call Trace: [ 126.639703][] dump_stack+0x4e/0x71 [ 126.644003] [] warn_slowpath_common+0x7c/0xa0 [ 126.648335] [] warn_slowpath_null+0x15/0x20 [ 126.652632] [] dma_ops_domain_unmap.part.13+0x65/0x70 [ 126.656919] [] __unmap_single.isra.16+0x9b/0x100 [ 126.661242] [] unmap_page+0x48/0x70 [ 126.665572] [] boomerang_rx+0x333/0x600 [3c59x] [ 126.669968] [] boomerang_interrupt+0x16a/0x4f0 [3c59x] [ 126.674400] [] handle_irq_event_percpu+0x3e/0x1e0 [ 126.678860] [] handle_irq_event+0x3c/0x60 [ 126.683298] [] handle_fasteoi_irq+0x7e/0x130 [ 126.687732] [] handle_irq+0x1d/0x30 [ 126.692164] [] do_IRQ+0x4e/0xf0 [ 126.696577] [] common_interrupt+0x6a/0x6a [ 126.700873][] ? hrtimer_start+0x13/0x20 [ 126.705107] [] ? default_idle+0x17/0x100 [ 126.709386] [] arch_cpu_idle+0xa/0x10 [ 126.713685] [] cpu_startup_entry+0x34a/0x380 [ 126.718010] [] ? cpu_startup_entry+0x1/0x380 [ 126.722361] [] start_secondary+0x157/0x180 [ 126.726731] ---[ end trace ace20602a5302afb ]--- [ 128.557546] [ cut here ] [ 128.560501] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x000766c0 flags=0x0020] [ 128.560502] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076700 flags=0x0020] [ 128.560503] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076740 flags=0x0020] [ 128.560504] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076780 flags=0x0020] [ 128.560505] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076800 flags=0x0020] [ 128.560506] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x000767c0 flags=0x0020] [ 128.560506] AMD-Vi: Event
Re: BUG: SCSI aic7xxx driver and AMD IOMMU
Hi Joerg, It looks like this problem is NOT a bug with the SCSI aic7xxx driver after all. I can duplicate this BUG very easily with other hardware. Simply removing a driver module (whether it its self, has actually used any of the DMA API or not) that is sitting on the same pci bus as a card that is actually using DMA will cause this. And that card that is in use and using DMA will no longer function. It looks and feels like unloading a module causes the IOMMU to improperly unmap valid mappings. Regards Mark Kernel 3.18.7 (x86_64) with an enabled IOMMU. This happens immediately after unloading the aic7xxx driver. In fact this happens unloading any module associated with any pci card on the same bus as the Adaptec AHA-2930CU card. If I blacklist the aic7xxx driver the problem never shows. If I disable the IOMMU, the problem never shows. This was brought up over at io...@lists.linux-foundation.org in this thread. http://lists.linuxfoundation.org/pipermail/iommu/2015-February/012274.html and they think it is not an iommu problem, but a possibly a problem with the aic7xxx driver. Complete dmesg attached. [ 106.355752] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00073980 flags=0x0070] [ 106.358732] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x000739a0 flags=0x0070] [ 106.361870] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x000739b0 flags=0x0070] [ 112.352950] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d00 flags=0x0070] [ 112.356057] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d20 flags=0x0070] [ 112.359142] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x address=0x00074d30 flags=0x0070] [ 126.588896] AMD-Vi: Using protection domain 1 for device :10:05.0 [ 126.591992] [ cut here ] [ 126.595107] WARNING: CPU: 4 PID: 0 at drivers/iommu/amd_iommu.c:2637 dma_ops_domain_unmap.part.13+0x65/0x70() [ 126.598408] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet kvm crc32_pclmul crc32c_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec aes_x86_64 drm snd_hwdep snd_pcm glue_helper lrw aic79xx gf128mul ablk_helper cryptd snd_timer 3c59x scsi_transport_spi snd r8169 xhci_pci xhci_hcd serio_raw pcspkr mii tpm_infineon tpm_tis fam15h_power k10temp i2c_piix4 processor thermal_sys shpchp tpm soundcore 8250_fintek dm_mod sr_mod cdrom ata_generic mxm_wmi pata_atiixp ohci_pci wmi button sg autofs4 [last unloaded: aic7xxx] [ 126.613992] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.18.7-lcrs #1 [ 126.618146] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013 [ 126.622456] 0009 88044fd03cb8 807bde4d [ 126.626815] 88044fd03cf8 8024cf7c 88044fd03cf8 [ 126.631172] 88043cdbc380 0600 000702c0 [ 126.635504] Call Trace: [ 126.639703] IRQ [807bde4d] dump_stack+0x4e/0x71 [ 126.644003] [8024cf7c] warn_slowpath_common+0x7c/0xa0 [ 126.648335] [8024d045] warn_slowpath_null+0x15/0x20 [ 126.652632] [806a47e5] dma_ops_domain_unmap.part.13+0x65/0x70 [ 126.656919] [806a675b] __unmap_single.isra.16+0x9b/0x100 [ 126.661242] [806a7198] unmap_page+0x48/0x70 [ 126.665572] [a0157373] boomerang_rx+0x333/0x600 [3c59x] [ 126.669968] [a015784a] boomerang_interrupt+0x16a/0x4f0 [3c59x] [ 126.674400] [8029600e] handle_irq_event_percpu+0x3e/0x1e0 [ 126.678860] [802961ec] handle_irq_event+0x3c/0x60 [ 126.683298] [80298bfe] handle_fasteoi_irq+0x7e/0x130 [ 126.687732] [8020543d] handle_irq+0x1d/0x30 [ 126.692164] [80204cee] do_IRQ+0x4e/0xf0 [ 126.696577] [807c596a] common_interrupt+0x6a/0x6a [ 126.700873] EOI [802a6e23] ? hrtimer_start+0x13/0x20 [ 126.705107] [8020cad7] ? default_idle+0x17/0x100 [ 126.709386] [8020d4ca] arch_cpu_idle+0xa/0x10 [ 126.713685] [80281dda] cpu_startup_entry+0x34a/0x380 [ 126.718010] [80281a91] ? cpu_startup_entry+0x1/0x380 [ 126.722361] [80233857] start_secondary+0x157/0x180 [ 126.726731] ---[ end trace ace20602a5302afb ]--- [ 128.557546] [ cut here ] [ 128.560501] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x000766c0 flags=0x0020] [ 128.560502] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076700 flags=0x0020] [ 128.560503] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 address=0x00076740 flags=0x0020] [ 128.560504] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0
Re: IOMMU/DMA API inquiry
Hi Joerg, Thanks for the response. On 02/18/2015 01:19 PM, j...@8bytes.org >> Joerg Roedel wrote: Hi Mark, On Tue, Feb 17, 2015 at 02:48:03PM -0500, Mark Hounschell wrote: I understand that AMD IOMMU support is not available for 32-bit kernels. I believe the INTEL IOMMU is supported there. Not knowing why, I was curious if that is going to remain that way? Yes, I have no plan on making the AMD IOMMU driver available on 32bit. But I would not be resistant to patches enabling the driver there. OK. I might take a look. Should it not "just work" more or less? I've learned that the AMD IOMMU does not play well with the kernels "Contiguous Memory Allocator" (CMA). I also believe, but could be mistaken, that the INTEL IOMMU does. Again, not knowing why, I was curious if that is going to remain that way also? No, I will queue a patch for the next merge window to enable CMA use in the AMD IOMMU driver. So there will be support for this. Great! That's a couple of kernels away then. I would be happy to test if you have a need. Is the fact that the AMD IOMMU is not supported on 32 bit kernels the reason that dma_map_page always returns 0 on 32 bit kernels. There is no particular reason for not supporting it on 32 bit kernels, it just didn't seem to be important yet. At least not important enough to justify the work. Understood. I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I thought that maybe dma_map_sg could be used to get around the fact that AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). Is that correct or can dma_map_sg somehow enable you to do a single DMA using a single address for the entire sglist? The map_sg functions can't be used with the AMD IOMMU driver to work around missing CMA support. Depending on what you want it might work with the Intel IOMMU driver, as this one allocates a single IOVA region for the entire sg_list. Thanks Joerg Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IOMMU/DMA API inquiry
Hi Joerg, Thanks for the response. On 02/18/2015 01:19 PM, j...@8bytes.org Joerg Roedel wrote: Hi Mark, On Tue, Feb 17, 2015 at 02:48:03PM -0500, Mark Hounschell wrote: I understand that AMD IOMMU support is not available for 32-bit kernels. I believe the INTEL IOMMU is supported there. Not knowing why, I was curious if that is going to remain that way? Yes, I have no plan on making the AMD IOMMU driver available on 32bit. But I would not be resistant to patches enabling the driver there. OK. I might take a look. Should it not just work more or less? I've learned that the AMD IOMMU does not play well with the kernels Contiguous Memory Allocator (CMA). I also believe, but could be mistaken, that the INTEL IOMMU does. Again, not knowing why, I was curious if that is going to remain that way also? No, I will queue a patch for the next merge window to enable CMA use in the AMD IOMMU driver. So there will be support for this. Great! That's a couple of kernels away then. I would be happy to test if you have a need. Is the fact that the AMD IOMMU is not supported on 32 bit kernels the reason that dma_map_page always returns 0 on 32 bit kernels. There is no particular reason for not supporting it on 32 bit kernels, it just didn't seem to be important yet. At least not important enough to justify the work. Understood. I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I thought that maybe dma_map_sg could be used to get around the fact that AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). Is that correct or can dma_map_sg somehow enable you to do a single DMA using a single address for the entire sglist? The map_sg functions can't be used with the AMD IOMMU driver to work around missing CMA support. Depending on what you want it might work with the Intel IOMMU driver, as this one allocates a single IOVA region for the entire sg_list. Thanks Joerg Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
IOMMU/DMA API inquiry
I've searched the Doc tree and web to no avail. I was hoping I might get some answers to a couple of questions that have arisen as a result of actually trying to take advantage of the IOMMU in our out of tree GPL drivers. The AMD IOMMU in particular. I'm currently using the 3.18.x kernel versions. Both 32 and 64 bit. I understand that AMD IOMMU support is not available for 32-bit kernels. I believe the INTEL IOMMU is supported there. Not knowing why, I was curious if that is going to remain that way? I've learned that the AMD IOMMU does not play well with the kernels "Contiguous Memory Allocator" (CMA). I also believe, but could be mistaken, that the INTEL IOMMU does. Again, not knowing why, I was curious if that is going to remain that way also? Is the fact that the AMD IOMMU is not supported on 32 bit kernels the reason that dma_map_page always returns 0 on 32 bit kernels. I found this strange because the dma_alloc_coherent function returns a page_to_phys address in the dma_handle when no IOMMU is enabled. I was sort of expecting dma_map_page to do the same. Other wise I seem to have to KNOW if an enabled and supported IOMMU is present. How does a driver tell if an IOMMU is present? I suspect it should not have to? I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I thought that maybe dma_map_sg could be used to get around the fact that AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). Is that correct or can dma_map_sg somehow enable you to do a single DMA using a single address for the entire sglist? Sorry if these questions seem ignorant. It was not intentional. Thanks for any insight. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
IOMMU/DMA API inquiry
I've searched the Doc tree and web to no avail. I was hoping I might get some answers to a couple of questions that have arisen as a result of actually trying to take advantage of the IOMMU in our out of tree GPL drivers. The AMD IOMMU in particular. I'm currently using the 3.18.x kernel versions. Both 32 and 64 bit. I understand that AMD IOMMU support is not available for 32-bit kernels. I believe the INTEL IOMMU is supported there. Not knowing why, I was curious if that is going to remain that way? I've learned that the AMD IOMMU does not play well with the kernels Contiguous Memory Allocator (CMA). I also believe, but could be mistaken, that the INTEL IOMMU does. Again, not knowing why, I was curious if that is going to remain that way also? Is the fact that the AMD IOMMU is not supported on 32 bit kernels the reason that dma_map_page always returns 0 on 32 bit kernels. I found this strange because the dma_alloc_coherent function returns a page_to_phys address in the dma_handle when no IOMMU is enabled. I was sort of expecting dma_map_page to do the same. Other wise I seem to have to KNOW if an enabled and supported IOMMU is present. How does a driver tell if an IOMMU is present? I suspect it should not have to? I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I thought that maybe dma_map_sg could be used to get around the fact that AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). Is that correct or can dma_map_sg somehow enable you to do a single DMA using a single address for the entire sglist? Sorry if these questions seem ignorant. It was not intentional. Thanks for any insight. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
dma_alloc_coherent - cma - and IOMMU question
Sorry for the noise. I've read everything DMA in the kernel Doc dir and searched the web to no avail. So I thought I might get some useful info here. I'm currently using a 3.18.3 (x86_64) kernel on an AMD platform. I am currently doing 8MB DMAs to and from our device using the in kernel CMA "cma=64M@0-4G" with no problems. This device is not DAC or scatter/gather capable so the in kernel CMA has been great and replaced our old bigphysarea usage. We simply use dma_alloc_coherent and pass the dma_addr_t *dma_handle returned from the dma_alloc_coherent function to our device as the "bus/pci" address to use. We also use remap_pfn_range on that dma_addr_t *dma_handle returned from the dma_alloc_coherent function to mmap userland to the buffer. All is good until I enable the IOMMU. I then either get IO_PAGE_FAULTs, the DMA just quietly never completes or the system gets borked. [ 106.115725] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x001b address=0xaa50 flags=0x0010] [ 106.115729] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x001b address=0xaa500040 flags=0x0010] Here are the IOMMU settings in my kernel config: #grep IOMMU .config # CONFIG_GART_IOMMU is not set # CONFIG_CALGARY_IOMMU is not set CONFIG_IOMMU_HELPER=y CONFIG_VFIO_IOMMU_TYPE1=m CONFIG_IOMMU_API=y CONFIG_IOMMU_SUPPORT=y CONFIG_AMD_IOMMU=y # CONFIG_AMD_IOMMU_STATS is not set CONFIG_AMD_IOMMU_V2=m CONFIG_INTEL_IOMMU=y CONFIG_INTEL_IOMMU_DEFAULT_ON=y CONFIG_INTEL_IOMMU_FLOPPY_WA=y # CONFIG_IOMMU_STRESS is not set From reading the in kernel doc it would appear that we could in fact, using the IOMMU and the dma_map_sg function, get rid of the CMA requirement and our device could DMA anywhere, even above the 4GB address space limit of our device. But before going through this larger change to our GPL driver, I want to understand if and/or why the dma_alloc_coherent function does not appear to set up the IOMMU for me. Is the IOMMU only supported for "streaming" DMA type and not for "coherent"? I read no reference to this in the kernel doc? Any hints would be greatly appreciated. Again, sorry for the noise. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
dma_alloc_coherent - cma - and IOMMU question
Sorry for the noise. I've read everything DMA in the kernel Doc dir and searched the web to no avail. So I thought I might get some useful info here. I'm currently using a 3.18.3 (x86_64) kernel on an AMD platform. I am currently doing 8MB DMAs to and from our device using the in kernel CMA cma=64M@0-4G with no problems. This device is not DAC or scatter/gather capable so the in kernel CMA has been great and replaced our old bigphysarea usage. We simply use dma_alloc_coherent and pass the dma_addr_t *dma_handle returned from the dma_alloc_coherent function to our device as the bus/pci address to use. We also use remap_pfn_range on that dma_addr_t *dma_handle returned from the dma_alloc_coherent function to mmap userland to the buffer. All is good until I enable the IOMMU. I then either get IO_PAGE_FAULTs, the DMA just quietly never completes or the system gets borked. [ 106.115725] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x001b address=0xaa50 flags=0x0010] [ 106.115729] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x001b address=0xaa500040 flags=0x0010] Here are the IOMMU settings in my kernel config: #grep IOMMU .config # CONFIG_GART_IOMMU is not set # CONFIG_CALGARY_IOMMU is not set CONFIG_IOMMU_HELPER=y CONFIG_VFIO_IOMMU_TYPE1=m CONFIG_IOMMU_API=y CONFIG_IOMMU_SUPPORT=y CONFIG_AMD_IOMMU=y # CONFIG_AMD_IOMMU_STATS is not set CONFIG_AMD_IOMMU_V2=m CONFIG_INTEL_IOMMU=y CONFIG_INTEL_IOMMU_DEFAULT_ON=y CONFIG_INTEL_IOMMU_FLOPPY_WA=y # CONFIG_IOMMU_STRESS is not set From reading the in kernel doc it would appear that we could in fact, using the IOMMU and the dma_map_sg function, get rid of the CMA requirement and our device could DMA anywhere, even above the 4GB address space limit of our device. But before going through this larger change to our GPL driver, I want to understand if and/or why the dma_alloc_coherent function does not appear to set up the IOMMU for me. Is the IOMMU only supported for streaming DMA type and not for coherent? I read no reference to this in the kernel doc? Any hints would be greatly appreciated. Again, sorry for the noise. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
nouveau: Do you really want these new links in the kernel tree?
/usr/src/linux-3.18> find drivers/ -type l | xargs ls -al lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/class.h -> ../../../nvif/class.h lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/event.h -> ../../../nvif/event.h lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/ioctl.h -> ../../../nvif/ioctl.h lrwxrwxrwx 1 markh users 22 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/unpack.h -> ../../../nvif/unpack.h lrwxrwxrwx 1 markh users 12 Dec 7 17:21 ./drivers/gpu/drm/nouveau/nvif/os.h -> ../core/os.h Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
nouveau: Do you really want these new links in the kernel tree?
/usr/src/linux-3.18 find drivers/ -type l | xargs ls -al lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/class.h - ../../../nvif/class.h lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/event.h - ../../../nvif/event.h lrwxrwxrwx 1 markh users 21 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/ioctl.h - ../../../nvif/ioctl.h lrwxrwxrwx 1 markh users 22 Dec 7 17:21 ./drivers/gpu/drm/nouveau/core/include/nvif/unpack.h - ../../../nvif/unpack.h lrwxrwxrwx 1 markh users 12 Dec 7 17:21 ./drivers/gpu/drm/nouveau/nvif/os.h - ../core/os.h Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] Staging: dgnc: Fix long line coding style issues in dgnc_cls.h
On 12/03/2014 06:37 PM, Joe Perches wrote: On Wed, 2014-12-03 at 21:30 +, Sean Cleator wrote: A patch to fix the rest of the long line warnings in the dgnc_cls.h file found by the checkpatch.pl tool checkpatch is a brainless little tool. You should prefer to develop a readable style rather than pay too close attention to precisely what checkpatch says. fyi: There is this warning in the file: * NOTE: THIS IS A SHARED HEADER. DO NOT CHANGE CODING STYLE!!! That comment should be removed. It was put there when this driver was under Digi's CVS control. Some of their user land apps, delivered with the driver package, also used it. They probably had different people working on the kernel and user land side. Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] Staging: dgnc: Fix long line coding style issues in dgnc_cls.h
On 12/03/2014 06:37 PM, Joe Perches wrote: On Wed, 2014-12-03 at 21:30 +, Sean Cleator wrote: A patch to fix the rest of the long line warnings in the dgnc_cls.h file found by the checkpatch.pl tool checkpatch is a brainless little tool. You should prefer to develop a readable style rather than pay too close attention to precisely what checkpatch says. fyi: There is this warning in the file: * NOTE: THIS IS A SHARED HEADER. DO NOT CHANGE CODING STYLE!!! That comment should be removed. It was put there when this driver was under Digi's CVS control. Some of their user land apps, delivered with the driver package, also used it. They probably had different people working on the kernel and user land side. Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] staging: dgap: re-arrange functions for removing forward declarations
On 10/29/2014 05:22 AM, Greg KH wrote: On Sun, Oct 26, 2014 at 11:08:54AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Tested-by: Mark Hounschell Signed-off-by: Daeseok Youn --- RESEND: This patch is tested all by Mark. It is good to merge. Greg, check please. It doesn't apply to my tree :( Daeseok, I will be happy to test the new patch. Somehow another patch got in there without us seeing it "commit: ac2f46c365e655dd2c366f69c609febc4ac310fd" on the list. That's why it doesn't apply for Greg. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] staging: dgap: re-arrange functions for removing forward declarations
On 10/29/2014 05:22 AM, Greg KH wrote: On Sun, Oct 26, 2014 at 11:08:54AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Tested-by: Mark Hounschell ma...@compro.net Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- RESEND: This patch is tested all by Mark. It is good to merge. Greg, check please. It doesn't apply to my tree :( Daeseok, I will be happy to test the new patch. Somehow another patch got in there without us seeing it commit: ac2f46c365e655dd2c366f69c609febc4ac310fd on the list. That's why it doesn't apply for Greg. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.
On 10/14/2014 08:01 AM, Mark Hounschell wrote: On 10/13/2014 10:04 PM, Greg KH wrote: On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote: On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote: Hi, 2014-10-13 12:25 GMT+09:00 Greg KH : On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Signed-off-by: Daeseok Youn --- This patch has too many changes for re-arranging the functions. So I wonder that I should break this up into smaller patches. Are the .o files identical before and after this patch? If so, it's fine. Ok. I will check for that. The .o files shouldn't be identical after function reordering. Hm, they might be the same size, but I can see how on some architectures (like ppc) how that would not be the case, you are right. Isn't there an "objdiff" program or something like that which might help in validating that nothing "changed" in the source for type of patch that just moves functions around in a file. thanks, Greg, Would just testing the thing be of any help? Regards Mark I don't know what is going on with this patch, but for what it's worth, I have applied this patch and my testing still shows everything OK. Tested-by: Mark Hounschell Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.
On 10/14/2014 08:01 AM, Mark Hounschell wrote: On 10/13/2014 10:04 PM, Greg KH wrote: On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote: On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote: Hi, 2014-10-13 12:25 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- This patch has too many changes for re-arranging the functions. So I wonder that I should break this up into smaller patches. Are the .o files identical before and after this patch? If so, it's fine. Ok. I will check for that. The .o files shouldn't be identical after function reordering. Hm, they might be the same size, but I can see how on some architectures (like ppc) how that would not be the case, you are right. Isn't there an objdiff program or something like that which might help in validating that nothing changed in the source for type of patch that just moves functions around in a file. thanks, Greg, Would just testing the thing be of any help? Regards Mark I don't know what is going on with this patch, but for what it's worth, I have applied this patch and my testing still shows everything OK. Tested-by: Mark Hounschell ma...@compro.net Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.
On 10/13/2014 10:04 PM, Greg KH wrote: On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote: On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote: Hi, 2014-10-13 12:25 GMT+09:00 Greg KH : On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Signed-off-by: Daeseok Youn --- This patch has too many changes for re-arranging the functions. So I wonder that I should break this up into smaller patches. Are the .o files identical before and after this patch? If so, it's fine. Ok. I will check for that. The .o files shouldn't be identical after function reordering. Hm, they might be the same size, but I can see how on some architectures (like ppc) how that would not be the case, you are right. Isn't there an "objdiff" program or something like that which might help in validating that nothing "changed" in the source for type of patch that just moves functions around in a file. thanks, Greg, Would just testing the thing be of any help? Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.
On 10/13/2014 10:04 PM, Greg KH wrote: On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote: On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote: Hi, 2014-10-13 12:25 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote: Re-arrange the functions for removing forward declarations. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- This patch has too many changes for re-arranging the functions. So I wonder that I should break this up into smaller patches. Are the .o files identical before and after this patch? If so, it's fine. Ok. I will check for that. The .o files shouldn't be identical after function reordering. Hm, they might be the same size, but I can see how on some architectures (like ppc) how that would not be the case, you are right. Isn't there an objdiff program or something like that which might help in validating that nothing changed in the source for type of patch that just moves functions around in a file. thanks, Greg, Would just testing the thing be of any help? Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()
On 07/31/2014 07:14 PM, DaeSeok Youn wrote: Hi, Mark 2014-07-31 21:44 GMT+09:00 Mark Hounschell : On 07/31/2014 12:02 AM, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. Making a node will allocate node memory and strings for saving configrations with kstrdup(). So these are freed when dgap is unloaded or failed to initialize. Signed-off-by: Daeseok Youn --- V2: Do not need to free for NULLNODE. I have been too busy to solve this issue, sorry for late. Mark, Can you test this patch? I try to make simple module which is testing dgap_parsefile() and dgap_cleanup_nodes(). I'll be happy to, but I can't do it until Monday. I'm not where the hardware is until then. That's OK. :-) After applying this patch I am still able to load and unload the driver at will, and it still works for me. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()
On 07/31/2014 07:14 PM, DaeSeok Youn wrote: Hi, Mark 2014-07-31 21:44 GMT+09:00 Mark Hounschell ma...@compro.net: On 07/31/2014 12:02 AM, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. Making a node will allocate node memory and strings for saving configrations with kstrdup(). So these are freed when dgap is unloaded or failed to initialize. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- V2: Do not need to free for NULLNODE. I have been too busy to solve this issue, sorry for late. Mark, Can you test this patch? I try to make simple module which is testing dgap_parsefile() and dgap_cleanup_nodes(). I'll be happy to, but I can't do it until Monday. I'm not where the hardware is until then. That's OK. :-) After applying this patch I am still able to load and unload the driver at will, and it still works for me. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()
On 07/31/2014 12:02 AM, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. Making a node will allocate node memory and strings for saving configrations with kstrdup(). So these are freed when dgap is unloaded or failed to initialize. Signed-off-by: Daeseok Youn --- V2: Do not need to free for NULLNODE. I have been too busy to solve this issue, sorry for late. Mark, Can you test this patch? I try to make simple module which is testing dgap_parsefile() and dgap_cleanup_nodes(). I'll be happy to, but I can't do it until Monday. I'm not where the hardware is until then. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()
On 07/31/2014 12:02 AM, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. Making a node will allocate node memory and strings for saving configrations with kstrdup(). So these are freed when dgap is unloaded or failed to initialize. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- V2: Do not need to free for NULLNODE. I have been too busy to solve this issue, sorry for late. Mark, Can you test this patch? I try to make simple module which is testing dgap_parsefile() and dgap_cleanup_nodes(). I'll be happy to, but I can't do it until Monday. I'm not where the hardware is until then. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: introduce dgap_cleanup_nodes()
On 07/16/2014 09:35 PM, Daeseok Youn wrote: > When a configration file is parsed with dgap_parsefile(), > makes nodes for saving configrations for board. > > Making a node will allocate node memory and strings for saving > configrations with kstrdup(). > > So these are freed when dgap is unloaded or failed to initialize. > > Signed-off-by: Daeseok Youn > --- > Mark, please review this patch. > Thanks. > > drivers/staging/dgap/dgap.c | 47 > +++ > 1 files changed, 47 insertions(+), 0 deletions(-) > > diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c > index 06c55cb..e9df2ea 100644 > --- a/drivers/staging/dgap/dgap.c > +++ b/drivers/staging/dgap/dgap.c > @@ -201,6 +201,7 @@ static int dgap_test_fep(struct board_t *brd); > static int dgap_tty_register_ports(struct board_t *brd); > static int dgap_firmware_load(struct pci_dev *pdev, int card_type, > struct board_t *brd); > +static void dgap_cleanup_nodes(void); > > static void dgap_cleanup_module(void); > > @@ -619,6 +620,7 @@ unregister_tty: > free_flipbuf: > dgap_free_flipbuf(brd); > cleanup_brd: > + dgap_cleanup_nodes(); > dgap_release_remap(brd); > kfree(brd); > > @@ -659,6 +661,8 @@ static void dgap_cleanup_module(void) > dgap_cleanup_board(dgap_board[i]); > } > > + dgap_cleanup_nodes(); > + > if (dgap_numboards) > pci_unregister_driver(_driver); > } > @@ -6323,6 +6327,49 @@ static void dgap_remove_tty_sysfs(struct device *c) > sysfs_remove_group(>kobj, _tty_attribute_group); > } > > +static void dgap_cleanup_nodes(void) > +{ > + struct cnode *p; > + > + p = _head; > + > + while (p) { > + struct cnode *tmp = p->next; > + > + switch (p->type) { > + case BNODE: > + kfree(p->u.board.portstr); > + kfree(p->u.board.addrstr); > + kfree(p->u.board.pcibusstr); > + kfree(p->u.board.pcislotstr); > + kfree(p->u.board.method); > + break; > + case CNODE: > + kfree(p->u.conc.id); > + kfree(p->u.conc.connect); > + break; > + case MNODE: > + kfree(p->u.module.id); > + break; > + case TNODE: > + kfree(p->u.ttyname); > + break; > + case CUNODE: > + kfree(p->u.cuname); > + break; > + case LNODE: > + kfree(p->u.line.cable); > + break; > + case PNODE: > + kfree(p->u.printname); > + break; > + } > + > + kfree(p->u.board.status); > + kfree(p); > + p = tmp; > + } > +} > /* >* Parse a configuration file read into memory as a string. >*/ > I get a kernel oops when unloading the driver with this patch. 2014-07-17T09:22:12.666987-04:00 harley kernel: [60216.979134] task: 8801037846b0 ti: 880149256000 task.ti: 880149256000 2014-07-17T09:22:12.666988-04:00 harley kernel: [60216.979136] RIP: 0010:[] [] kfree+0x17f/0x190 2014-07-17T09:22:12.666989-04:00 harley kernel: [60216.979143] RSP: 0018:880149257e78 EFLAGS: 00010246 2014-07-17T09:22:12.666991-04:00 harley kernel: [60216.979144] RAX: 40080068 RBX: a0428d20 RCX: 038eb895 2014-07-17T09:22:12.666992-04:00 harley kernel: [60216.979146] RDX: 40080068 RSI: ea0004131e00 RDI: a0428d20 2014-07-17T09:22:12.666993-04:00 harley kernel: [60216.979147] RBP: 880149257e90 R08: 00015b60 R09: 88014fd55b60 2014-07-17T09:22:12.666994-04:00 harley kernel: [60216.979149] R10: ea810a00 R11: a0424ffd R12: 88005a3d2a80 2014-07-17T09:22:12.666995-04:00 harley kernel: [60216.979150] R13: a041c155 R14: 0002 R15: 0bc0 2014-07-17T09:22:12.666996-04:00 harley kernel: [60216.979152] FS: 7fb7d51d2700() GS:88014fd4() knlGS: 2014-07-17T09:22:12.666997-04:00 harley kernel: [60216.979154] CS: 0010 DS: ES: CR0: 8005003b 2014-07-17T09:22:12.666998-04:00 harley kernel: [60216.979155] CR2: 7fe66ee4e000 CR3: 0001497af000 CR4: 000407e0 2014-07-17T09:22:12.666999-04:00 harley kernel: [60216.979156] Stack: 2014-07-17T09:22:12.667000-04:00 harley kernel: [60216.979157] a0428d20 88005a3d2a80 8801259dd330 880149257eb0 2014-07-17T09:22:12.667001-04:00 harley kernel: [60216.979161] a041c155 0700 8801259dd000 880149257ee8 2014-07-17T09:22:12.667002-04:00 harley kernel: [60216.979164] a0424ee8 80c92700 a04289e0
Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()
On 07/16/2014 08:42 PM, DaeSeok Youn wrote: 2014-07-16 23:17 GMT+09:00 Mark Hounschell : On 07/16/2014 05:26 AM, DaeSeok Youn wrote: 2014-07-16 8:50 GMT+09:00 Greg KH : On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote: Hi, 2014-07-16 0:29 GMT+09:00 Greg KH : On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote: The dgap_err() is printing a message with pr_err(), so all those are replaced. Use definition "pr_fmt" and then all of "dgap:" in the beginning of print messages are removed. And also removed "out of memory" message because the kernel has own message for that. Signed-off-by: Daeseok Youn --- V2: use pr_fmt "dgap:" prefix on print message on dgap. remove "out of memory" message. Adds Mark to TO list and CC list for checking send this email properly to him. drivers/staging/dgap/dgap.c | 306 +++ 1 files changed, 133 insertions(+), 173 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index 06c55cb..9e750fb 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -41,6 +41,8 @@ */ #undef DIGI_CONCENTRATORS_SUPPORTED +#define pr_fmt(fmt) "dgap: " fmt + #include #include #include @@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch); static int dgap_gettok(char **in); static char *dgap_getword(char **in); static int dgap_checknode(struct cnode *p); -static void dgap_err(char *s); /* * Function prototypes from dgap_sysfs.h @@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev *pdev, int id, if (ret) goto free_brd; - pr_info("dgap: board %d: %s (rev %d), irq %ld\n", + pr_info("board %d: %s (rev %d), irq %ld\n", boardnum, brd->name, brd->rev, brd->irq); Almost all of the pr_*() calls in this driver should be converted over to use dev_*() calls instead. And some of them, like this one, should be removed entirely (no need for a driver to be "noisy" when a device for it is found, it should be quiet if at all possible, unless something went wrong.) So can you do that here instead? I've applied the earlier patches in this series, and stopped here. OK. I can. pr_*() calls are replaced with dev_*() calls. And also removes some of print message which are useless like "out of memory" Yes, please do that, that would be great. I have been working to change pr_*() to dev_*(), but dgap_parse() has no "struct device" for using dev_*(). If dgap_parse still need for this driver, it need to take a parameter for using dev_*(), it may be "pdev" but configuration file doesn't need to parse in kernel at all, dgap_parse() will be removed. So I will wait to verify by Mark about parsing configuration file. Thanks. regards, Daeseok Youn. Hi Daeseok, I would wait on that one for now. I know that code has to be removed eventually. I'm just not sure if we are quite ready. That is actually a LOT of lines of code also. I think a couple of things need done first. Here is a sample config file created by one of DIGI's user land applications (mpi). These entries are only for 2 different cards. There are others cards that may have other things to consider. I only have these 2 cards types now. I had a third type which is just a 2 port but that one is gone now. config_begin board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttya altpin 0 useintr 0 board Digi_AccelePort_4r_920_PCI io 0x000 mem 0x00 start 1 nports 4 ttyname ttyb altpin 0 useintr 0 board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttyc altpin 0 useintr 0 config_end oh.. I couldn't find a sample of config file for dgap module in web. Thanks. The altpin and useintr parameters need to be converted to module parameters. I found references in the code somewhere that the nports may not be reliably known using the device id for at least one card type. All the other stuff in this particular config file is pretty much useless. Well, sort of. The ttyname parameter is used by the driver to populate a "sys" file with a custom device name that is then used by a userland script and udev to allow a user to define his own device names. Custom links are created. Perhaps this also would be nice to have as a module param? I'm not sure about using module param and udev. I think config file used when firmware is loaded. The config file has nothing to do with the actual firmware loading or which firmware file is to be loaded for a given board. There are a couple of things done to the board either before or after firmware loads that probably
Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()
On 07/16/2014 08:42 PM, DaeSeok Youn wrote: 2014-07-16 23:17 GMT+09:00 Mark Hounschell ma...@compro.net: On 07/16/2014 05:26 AM, DaeSeok Youn wrote: 2014-07-16 8:50 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote: Hi, 2014-07-16 0:29 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote: The dgap_err() is printing a message with pr_err(), so all those are replaced. Use definition pr_fmt and then all of dgap: in the beginning of print messages are removed. And also removed out of memory message because the kernel has own message for that. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- V2: use pr_fmt dgap: prefix on print message on dgap. remove out of memory message. Adds Mark to TO list and CC list for checking send this email properly to him. drivers/staging/dgap/dgap.c | 306 +++ 1 files changed, 133 insertions(+), 173 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index 06c55cb..9e750fb 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -41,6 +41,8 @@ */ #undef DIGI_CONCENTRATORS_SUPPORTED +#define pr_fmt(fmt) dgap: fmt + #include linux/kernel.h #include linux/module.h #include linux/pci.h @@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch); static int dgap_gettok(char **in); static char *dgap_getword(char **in); static int dgap_checknode(struct cnode *p); -static void dgap_err(char *s); /* * Function prototypes from dgap_sysfs.h @@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev *pdev, int id, if (ret) goto free_brd; - pr_info(dgap: board %d: %s (rev %d), irq %ld\n, + pr_info(board %d: %s (rev %d), irq %ld\n, boardnum, brd-name, brd-rev, brd-irq); Almost all of the pr_*() calls in this driver should be converted over to use dev_*() calls instead. And some of them, like this one, should be removed entirely (no need for a driver to be noisy when a device for it is found, it should be quiet if at all possible, unless something went wrong.) So can you do that here instead? I've applied the earlier patches in this series, and stopped here. OK. I can. pr_*() calls are replaced with dev_*() calls. And also removes some of print message which are useless like out of memory Yes, please do that, that would be great. I have been working to change pr_*() to dev_*(), but dgap_parse() has no struct device for using dev_*(). If dgap_parse still need for this driver, it need to take a parameter for using dev_*(), it may be pdev but configuration file doesn't need to parse in kernel at all, dgap_parse() will be removed. So I will wait to verify by Mark about parsing configuration file. Thanks. regards, Daeseok Youn. Hi Daeseok, I would wait on that one for now. I know that code has to be removed eventually. I'm just not sure if we are quite ready. That is actually a LOT of lines of code also. I think a couple of things need done first. Here is a sample config file created by one of DIGI's user land applications (mpi). These entries are only for 2 different cards. There are others cards that may have other things to consider. I only have these 2 cards types now. I had a third type which is just a 2 port but that one is gone now. config_begin board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttya altpin 0 useintr 0 board Digi_AccelePort_4r_920_PCI io 0x000 mem 0x00 start 1 nports 4 ttyname ttyb altpin 0 useintr 0 board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttyc altpin 0 useintr 0 config_end oh.. I couldn't find a sample of config file for dgap module in web. Thanks. The altpin and useintr parameters need to be converted to module parameters. I found references in the code somewhere that the nports may not be reliably known using the device id for at least one card type. All the other stuff in this particular config file is pretty much useless. Well, sort of. The ttyname parameter is used by the driver to populate a sys file with a custom device name that is then used by a userland script and udev to allow a user to define his own device names. Custom links are created. Perhaps this also would be nice to have as a module param? I'm not sure about using module param and udev. I think config file used when firmware is loaded. The config file has nothing to do with the actual firmware loading or which firmware file is to be loaded for a given board. There are a couple of things done to the board either before or after firmware loads that probably still require that the config file had
Re: [PATCH] staging: dgap: introduce dgap_cleanup_nodes()
On 07/16/2014 09:35 PM, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. Making a node will allocate node memory and strings for saving configrations with kstrdup(). So these are freed when dgap is unloaded or failed to initialize. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- Mark, please review this patch. Thanks. drivers/staging/dgap/dgap.c | 47 +++ 1 files changed, 47 insertions(+), 0 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index 06c55cb..e9df2ea 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -201,6 +201,7 @@ static int dgap_test_fep(struct board_t *brd); static int dgap_tty_register_ports(struct board_t *brd); static int dgap_firmware_load(struct pci_dev *pdev, int card_type, struct board_t *brd); +static void dgap_cleanup_nodes(void); static void dgap_cleanup_module(void); @@ -619,6 +620,7 @@ unregister_tty: free_flipbuf: dgap_free_flipbuf(brd); cleanup_brd: + dgap_cleanup_nodes(); dgap_release_remap(brd); kfree(brd); @@ -659,6 +661,8 @@ static void dgap_cleanup_module(void) dgap_cleanup_board(dgap_board[i]); } + dgap_cleanup_nodes(); + if (dgap_numboards) pci_unregister_driver(dgap_driver); } @@ -6323,6 +6327,49 @@ static void dgap_remove_tty_sysfs(struct device *c) sysfs_remove_group(c-kobj, dgap_tty_attribute_group); } +static void dgap_cleanup_nodes(void) +{ + struct cnode *p; + + p = dgap_head; + + while (p) { + struct cnode *tmp = p-next; + + switch (p-type) { + case BNODE: + kfree(p-u.board.portstr); + kfree(p-u.board.addrstr); + kfree(p-u.board.pcibusstr); + kfree(p-u.board.pcislotstr); + kfree(p-u.board.method); + break; + case CNODE: + kfree(p-u.conc.id); + kfree(p-u.conc.connect); + break; + case MNODE: + kfree(p-u.module.id); + break; + case TNODE: + kfree(p-u.ttyname); + break; + case CUNODE: + kfree(p-u.cuname); + break; + case LNODE: + kfree(p-u.line.cable); + break; + case PNODE: + kfree(p-u.printname); + break; + } + + kfree(p-u.board.status); + kfree(p); + p = tmp; + } +} /* * Parse a configuration file read into memory as a string. */ I get a kernel oops when unloading the driver with this patch. 2014-07-17T09:22:12.666987-04:00 harley kernel: [60216.979134] task: 8801037846b0 ti: 880149256000 task.ti: 880149256000 2014-07-17T09:22:12.666988-04:00 harley kernel: [60216.979136] RIP: 0010:[8034d5ff] [8034d5ff] kfree+0x17f/0x190 2014-07-17T09:22:12.666989-04:00 harley kernel: [60216.979143] RSP: 0018:880149257e78 EFLAGS: 00010246 2014-07-17T09:22:12.666991-04:00 harley kernel: [60216.979144] RAX: 40080068 RBX: a0428d20 RCX: 038eb895 2014-07-17T09:22:12.666992-04:00 harley kernel: [60216.979146] RDX: 40080068 RSI: ea0004131e00 RDI: a0428d20 2014-07-17T09:22:12.666993-04:00 harley kernel: [60216.979147] RBP: 880149257e90 R08: 00015b60 R09: 88014fd55b60 2014-07-17T09:22:12.666994-04:00 harley kernel: [60216.979149] R10: ea810a00 R11: a0424ffd R12: 88005a3d2a80 2014-07-17T09:22:12.666995-04:00 harley kernel: [60216.979150] R13: a041c155 R14: 0002 R15: 0bc0 2014-07-17T09:22:12.666996-04:00 harley kernel: [60216.979152] FS: 7fb7d51d2700() GS:88014fd4() knlGS: 2014-07-17T09:22:12.666997-04:00 harley kernel: [60216.979154] CS: 0010 DS: ES: CR0: 8005003b 2014-07-17T09:22:12.666998-04:00 harley kernel: [60216.979155] CR2: 7fe66ee4e000 CR3: 0001497af000 CR4: 000407e0 2014-07-17T09:22:12.666999-04:00 harley kernel: [60216.979156] Stack: 2014-07-17T09:22:12.667000-04:00 harley kernel: [60216.979157] a0428d20 88005a3d2a80 8801259dd330 880149257eb0 2014-07-17T09:22:12.667001-04:00 harley kernel: [60216.979161] a041c155 0700 8801259dd000 880149257ee8 2014-07-17T09:22:12.667002-04:00 harley kernel: [60216.979164] a0424ee8 80c92700 a04289e0 2014-07-17T09:22:12.667002-04:00 harley kernel:
Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()
On 07/16/2014 05:26 AM, DaeSeok Youn wrote: 2014-07-16 8:50 GMT+09:00 Greg KH : On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote: Hi, 2014-07-16 0:29 GMT+09:00 Greg KH : On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote: The dgap_err() is printing a message with pr_err(), so all those are replaced. Use definition "pr_fmt" and then all of "dgap:" in the beginning of print messages are removed. And also removed "out of memory" message because the kernel has own message for that. Signed-off-by: Daeseok Youn --- V2: use pr_fmt "dgap:" prefix on print message on dgap. remove "out of memory" message. Adds Mark to TO list and CC list for checking send this email properly to him. drivers/staging/dgap/dgap.c | 306 +++ 1 files changed, 133 insertions(+), 173 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index 06c55cb..9e750fb 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -41,6 +41,8 @@ */ #undef DIGI_CONCENTRATORS_SUPPORTED +#define pr_fmt(fmt) "dgap: " fmt + #include #include #include @@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch); static int dgap_gettok(char **in); static char *dgap_getword(char **in); static int dgap_checknode(struct cnode *p); -static void dgap_err(char *s); /* * Function prototypes from dgap_sysfs.h @@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev *pdev, int id, if (ret) goto free_brd; - pr_info("dgap: board %d: %s (rev %d), irq %ld\n", + pr_info("board %d: %s (rev %d), irq %ld\n", boardnum, brd->name, brd->rev, brd->irq); Almost all of the pr_*() calls in this driver should be converted over to use dev_*() calls instead. And some of them, like this one, should be removed entirely (no need for a driver to be "noisy" when a device for it is found, it should be quiet if at all possible, unless something went wrong.) So can you do that here instead? I've applied the earlier patches in this series, and stopped here. OK. I can. pr_*() calls are replaced with dev_*() calls. And also removes some of print message which are useless like "out of memory" Yes, please do that, that would be great. I have been working to change pr_*() to dev_*(), but dgap_parse() has no "struct device" for using dev_*(). If dgap_parse still need for this driver, it need to take a parameter for using dev_*(), it may be "pdev" but configuration file doesn't need to parse in kernel at all, dgap_parse() will be removed. So I will wait to verify by Mark about parsing configuration file. Thanks. regards, Daeseok Youn. Hi Daeseok, I would wait on that one for now. I know that code has to be removed eventually. I'm just not sure if we are quite ready. That is actually a LOT of lines of code also. I think a couple of things need done first. Here is a sample config file created by one of DIGI's user land applications (mpi). These entries are only for 2 different cards. There are others cards that may have other things to consider. I only have these 2 cards types now. I had a third type which is just a 2 port but that one is gone now. config_begin board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttya altpin 0 useintr 0 board Digi_AccelePort_4r_920_PCI io 0x000 mem 0x00 start 1 nports 4 ttyname ttyb altpin 0 useintr 0 board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttyc altpin 0 useintr 0 config_end The altpin and useintr parameters need to be converted to module parameters. I found references in the code somewhere that the nports may not be reliably known using the device id for at least one card type. All the other stuff in this particular config file is pretty much useless. Well, sort of. The ttyname parameter is used by the driver to populate a "sys" file with a custom device name that is then used by a userland script and udev to allow a user to define his own device names. Custom links are created. Perhaps this also would be nice to have as a module param? Also, there is a userland utility (dpa) that is used to set some parameters of the card. Sort of like stty except it is for "special" things that the kernel does not directly support. Like special baud rates and such. I'm not sure what to do about that one. I personally have never used these "special" things but I'm sure they are used by someone somewhere?? I saw some code removed related to "dpa" recently. This came to mind. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read
Re: [PATCH 7/8 RESEND] staging: dgap: introduce dgap_cleanup_nodes()
On 07/15/2014 11:30 AM, Greg KH wrote: On Tue, Jul 15, 2014 at 06:14:25PM +0900, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. configuration files should not be parsed in the kernel at all. That logic should be removed as it should not be needed. Mark, can you verify that this is not needed with your hardware anymore? As far as I can see it is still needed. If no config file, no devices are created and configured. I had planned to remove that code such that all the different cards supported are still supported. Even though I only have 2 different types of cards. I Just haven't got to it yet. I realize "configuration files should not be parsed in the kernel" and I'm certainly not going to argue that. But the way it is, it allows user land to define board order and even device node names (additional links to the actual device). No matter where boards are installed and found by the kernel, you have persistent naming. I guess all that can be done with udev (well, sort of anyway). There are also options in the config file that should be passed in as module parameters and I planned to do that part before messing with the config file code. We also have this "DIGI_EXPANDERS_SUPPORTED" thingy. I have none of these cards. These cards interrogate the expander to find out how many ports there are. Then configure accordingly. Should we just remove all that code and NOT support expanders since we don't currently have the hardware to test? I will update from your current git tree and test this week. I've been swamped lately and unable to do much. I thought I'd be available to do some more work but things come up as you know. Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8 RESEND] staging: dgap: introduce dgap_cleanup_nodes()
On 07/15/2014 11:30 AM, Greg KH wrote: On Tue, Jul 15, 2014 at 06:14:25PM +0900, Daeseok Youn wrote: When a configration file is parsed with dgap_parsefile(), makes nodes for saving configrations for board. configuration files should not be parsed in the kernel at all. That logic should be removed as it should not be needed. Mark, can you verify that this is not needed with your hardware anymore? As far as I can see it is still needed. If no config file, no devices are created and configured. I had planned to remove that code such that all the different cards supported are still supported. Even though I only have 2 different types of cards. I Just haven't got to it yet. I realize configuration files should not be parsed in the kernel and I'm certainly not going to argue that. But the way it is, it allows user land to define board order and even device node names (additional links to the actual device). No matter where boards are installed and found by the kernel, you have persistent naming. I guess all that can be done with udev (well, sort of anyway). There are also options in the config file that should be passed in as module parameters and I planned to do that part before messing with the config file code. We also have this DIGI_EXPANDERS_SUPPORTED thingy. I have none of these cards. These cards interrogate the expander to find out how many ports there are. Then configure accordingly. Should we just remove all that code and NOT support expanders since we don't currently have the hardware to test? I will update from your current git tree and test this week. I've been swamped lately and unable to do much. I thought I'd be available to do some more work but things come up as you know. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()
On 07/16/2014 05:26 AM, DaeSeok Youn wrote: 2014-07-16 8:50 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote: Hi, 2014-07-16 0:29 GMT+09:00 Greg KH gre...@linuxfoundation.org: On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote: The dgap_err() is printing a message with pr_err(), so all those are replaced. Use definition pr_fmt and then all of dgap: in the beginning of print messages are removed. And also removed out of memory message because the kernel has own message for that. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- V2: use pr_fmt dgap: prefix on print message on dgap. remove out of memory message. Adds Mark to TO list and CC list for checking send this email properly to him. drivers/staging/dgap/dgap.c | 306 +++ 1 files changed, 133 insertions(+), 173 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index 06c55cb..9e750fb 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -41,6 +41,8 @@ */ #undef DIGI_CONCENTRATORS_SUPPORTED +#define pr_fmt(fmt) dgap: fmt + #include linux/kernel.h #include linux/module.h #include linux/pci.h @@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch); static int dgap_gettok(char **in); static char *dgap_getword(char **in); static int dgap_checknode(struct cnode *p); -static void dgap_err(char *s); /* * Function prototypes from dgap_sysfs.h @@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev *pdev, int id, if (ret) goto free_brd; - pr_info(dgap: board %d: %s (rev %d), irq %ld\n, + pr_info(board %d: %s (rev %d), irq %ld\n, boardnum, brd-name, brd-rev, brd-irq); Almost all of the pr_*() calls in this driver should be converted over to use dev_*() calls instead. And some of them, like this one, should be removed entirely (no need for a driver to be noisy when a device for it is found, it should be quiet if at all possible, unless something went wrong.) So can you do that here instead? I've applied the earlier patches in this series, and stopped here. OK. I can. pr_*() calls are replaced with dev_*() calls. And also removes some of print message which are useless like out of memory Yes, please do that, that would be great. I have been working to change pr_*() to dev_*(), but dgap_parse() has no struct device for using dev_*(). If dgap_parse still need for this driver, it need to take a parameter for using dev_*(), it may be pdev but configuration file doesn't need to parse in kernel at all, dgap_parse() will be removed. So I will wait to verify by Mark about parsing configuration file. Thanks. regards, Daeseok Youn. Hi Daeseok, I would wait on that one for now. I know that code has to be removed eventually. I'm just not sure if we are quite ready. That is actually a LOT of lines of code also. I think a couple of things need done first. Here is a sample config file created by one of DIGI's user land applications (mpi). These entries are only for 2 different cards. There are others cards that may have other things to consider. I only have these 2 cards types now. I had a third type which is just a 2 port but that one is gone now. config_begin board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttya altpin 0 useintr 0 board Digi_AccelePort_4r_920_PCI io 0x000 mem 0x00 start 1 nports 4 ttyname ttyb altpin 0 useintr 0 board Digi_AccelePort_8r_920_PCI io 0x000 mem 0x00 start 1 nports 8 ttyname ttyc altpin 0 useintr 0 config_end The altpin and useintr parameters need to be converted to module parameters. I found references in the code somewhere that the nports may not be reliably known using the device id for at least one card type. All the other stuff in this particular config file is pretty much useless. Well, sort of. The ttyname parameter is used by the driver to populate a sys file with a custom device name that is then used by a userland script and udev to allow a user to define his own device names. Custom links are created. Perhaps this also would be nice to have as a module param? Also, there is a userland utility (dpa) that is used to set some parameters of the card. Sort of like stty except it is for special things that the kernel does not directly support. Like special baud rates and such. I'm not sure what to do about that one. I personally have never used these special things but I'm sure they are used by someone somewhere?? I saw some code removed related to dpa recently. This came to mind. Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org
Re: [PATCH V2] staging: dgap: implement proper error handling in dgap_firmware_load()
On 05/28/2014 06:11 AM, Dan Carpenter wrote: > On Wed, May 28, 2014 at 06:29:38PM +0900, DaeSeok Youn wrote: >>> In your patch it has: >>> + dgap_tty_uninit(brd, false); >>> >>> But it should only be "false" if dgap_tty_init() failed. If >>> dgap_tty_register_ports() fails then it should be "true". Another >> Yes, you're right. There were no error handle for tty_port_register_device() >> and >> dgap_create_tty_sysfs() in dgap_tty_register_ports(). I didn't catch it. :-( >> It need to add error handlers for them, right? > > Eventually, yes. But I don't see a simple way to fix > dgap_firmware_load() until after the code is cleaned up. > >> >>> problem is that as you say, the earlier function are allocating >>> resources like dgap_tty_register() but only the last two function calls >>> have a "goto err_cleanup;" so the error handling is incomplete. >> So remove "goto" in dgap_firmware_load() and add error handler in >> dgap_tty_init() > > In the current code there isn't a goto in dgap_firmware_load(). Remove > the call to dgap_tty_uninit() and add error handling in dgap_tty_init(). > > That will clean up the code, and fix some NULL dereference bugs inside > dgap_tty_uninit(). > >> and dgap_tty_register_ports(), right? > > Inside dgap_tty_register_ports(), then we should add a > kfree(brd->serial_ports) if the "brd->printer_ports" allocation fails. > That is not a complete fix, but it is a part fix and it is clean. > >> >> I have a question of this. In case of this, how to complete the error >> handling? > > [patch 1/x] staging: dgap: remove useless dgap_probe1() function > [patch 2/x] staging: dgap: unwind on error in dgap_found_board() > [patch 3/x] staging: dgap: remove bogus null test in dgap_tty_init() > The ->channels[] were set to null in dgap_found_board(). > [patch 4/x] staging: dgap: unwind on error in dgap_tty_init() > This also removes the call to dgap_tty_uninit() in > dgap_firmware_load() > [patch 5/x] staging: dgap: unwind on error in dgap_tty_register_ports() > [patch 6/x] staging: dgap: make dgap_config_buf a local buffer > [patch 7/x] staging: dgap: pass "dgap_numboards" to dgap_found_board() > instead of using a global variable > [patch 8/x] staging: dgap: pass "brd" to dgap_after_config_loaded() > instead of passing "dgap_numboards" and looking up brd again. > [patch 9/x] staging: dgap: rename dgap_finalize_board_init() to > dgap_request_irq() > > In the end, I hate dgap_tty_uninit() because it doesn't match > dgap_tty_init() at all. It's poorly named. We should rename it and > make another dgap_tty_init() which just sets the ->channels[] to NULL. > > [patch x/x] staging: dgap: introduce dgap_tty_unregister() > This is currently done in dgap_tty_uninit(), which is the wrong > place. > > I have started using a new todo list tag in my emails. So I'm adding > this stuff to the todo list. > > TODO-list: 2014-05-28: dgap: cleanups and bug fixes. > I can think of some more things that could be added to that TODO-list. All the configuration file stuff needs to go away. As Greg previously said, we don't read configurations files in kernel modules. I'm pretty sure all cards supported have unique device IDs, even though notes and code in this driver imply otherwise. As long as they all do, with the exception of those cards that use port extenders, I think this could be done. Maybe we will never be able to support "port extenders". Unclear yet. When we do this, the useintr and altpin configuration file stuff will need converted to module options. There was also some ioctl code removed entirely before the driver was merged into staging. It was specific to the dgap_mgmt device (currently dead as a result) that allowed userland some real-time monitoring and configuration changes. These userland utilities were part of the original Digi package and I'm sure are used by some. I personally never used them, but... Then there are some things you recommended in a previous email that I haven't got to yet? Is that TODO-list going to be commited? Regards Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] staging: dgap: implement proper error handling in dgap_firmware_load()
On 05/28/2014 06:11 AM, Dan Carpenter wrote: On Wed, May 28, 2014 at 06:29:38PM +0900, DaeSeok Youn wrote: In your patch it has: + dgap_tty_uninit(brd, false); But it should only be false if dgap_tty_init() failed. If dgap_tty_register_ports() fails then it should be true. Another Yes, you're right. There were no error handle for tty_port_register_device() and dgap_create_tty_sysfs() in dgap_tty_register_ports(). I didn't catch it. :-( It need to add error handlers for them, right? Eventually, yes. But I don't see a simple way to fix dgap_firmware_load() until after the code is cleaned up. problem is that as you say, the earlier function are allocating resources like dgap_tty_register() but only the last two function calls have a goto err_cleanup; so the error handling is incomplete. So remove goto in dgap_firmware_load() and add error handler in dgap_tty_init() In the current code there isn't a goto in dgap_firmware_load(). Remove the call to dgap_tty_uninit() and add error handling in dgap_tty_init(). That will clean up the code, and fix some NULL dereference bugs inside dgap_tty_uninit(). and dgap_tty_register_ports(), right? Inside dgap_tty_register_ports(), then we should add a kfree(brd-serial_ports) if the brd-printer_ports allocation fails. That is not a complete fix, but it is a part fix and it is clean. I have a question of this. In case of this, how to complete the error handling? [patch 1/x] staging: dgap: remove useless dgap_probe1() function [patch 2/x] staging: dgap: unwind on error in dgap_found_board() [patch 3/x] staging: dgap: remove bogus null test in dgap_tty_init() The -channels[] were set to null in dgap_found_board(). [patch 4/x] staging: dgap: unwind on error in dgap_tty_init() This also removes the call to dgap_tty_uninit() in dgap_firmware_load() [patch 5/x] staging: dgap: unwind on error in dgap_tty_register_ports() [patch 6/x] staging: dgap: make dgap_config_buf a local buffer [patch 7/x] staging: dgap: pass dgap_numboards to dgap_found_board() instead of using a global variable [patch 8/x] staging: dgap: pass brd to dgap_after_config_loaded() instead of passing dgap_numboards and looking up brd again. [patch 9/x] staging: dgap: rename dgap_finalize_board_init() to dgap_request_irq() In the end, I hate dgap_tty_uninit() because it doesn't match dgap_tty_init() at all. It's poorly named. We should rename it and make another dgap_tty_init() which just sets the -channels[] to NULL. [patch x/x] staging: dgap: introduce dgap_tty_unregister() This is currently done in dgap_tty_uninit(), which is the wrong place. I have started using a new todo list tag in my emails. So I'm adding this stuff to the todo list. TODO-list: 2014-05-28: dgap: cleanups and bug fixes. I can think of some more things that could be added to that TODO-list. All the configuration file stuff needs to go away. As Greg previously said, we don't read configurations files in kernel modules. I'm pretty sure all cards supported have unique device IDs, even though notes and code in this driver imply otherwise. As long as they all do, with the exception of those cards that use port extenders, I think this could be done. Maybe we will never be able to support port extenders. Unclear yet. When we do this, the useintr and altpin configuration file stuff will need converted to module options. There was also some ioctl code removed entirely before the driver was merged into staging. It was specific to the dgap_mgmt device (currently dead as a result) that allowed userland some real-time monitoring and configuration changes. These userland utilities were part of the original Digi package and I'm sure are used by some. I personally never used them, but... Then there are some things you recommended in a previous email that I haven't got to yet? Is that TODO-list going to be commited? Regards Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()
On 04/25/2014 08:59 AM, Dan Carpenter wrote: > On Fri, Apr 25, 2014 at 08:29:41AM -0400, Mark Hounschell wrote: >> On 04/25/2014 07:02 AM, DaeSeok Youn wrote: >>> Hi, Dan. >>> >>> 2014-04-25 18:26 GMT+09:00 Dan Carpenter : >>>> Mark, maybe you should add yourself to the MAINTAINERS entry for this >>>> driver? >>>> >> >> I'll look into this. I am clueless on what that would actually mean. >> > > Just add your name with Lidza in the MAINTAINERS file so that people > will CC you on all the patches. > > DIGI EPCA PCI PRODUCTS > M: Lidza Louina > L: driverdev-de...@linuxdriverproject.org > S: Maintained > F: drivers/staging/dgap/ > > You don't have to do it if you don't want to, but you seem to be working > on this driver and I'm going to refer questions to you either way. :P > >>>> On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote: >>>>> @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd) >>>>> /* Register tty devices */ >>>>> rc = tty_register_driver(brd->SerialDriver); >>>>> if (rc < 0) >>>>> - return rc; >>>>> + goto free_print_ttys; >>>>> + >>>>> brd->dgap_Major_Serial_Registered = TRUE; >>>>> dgap_BoardsByMajor[brd->SerialDriver->major] = brd; >>>>> brd->dgap_Serial_Major = brd->SerialDriver->major; >>>>> @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd) >>>>> /* Register Transparent Print devices */ >>>>> rc = tty_register_driver(brd->PrintDriver); >>>>> if (rc < 0) >>>>> - return rc; >>>>> + goto unregister_serial_drv; >>>>> + >>>>> brd->dgap_Major_TransparentPrint_Registered = TRUE; >>>>> dgap_BoardsByMajor[brd->PrintDriver->major] = brd; >>>>> brd->dgap_TransparentPrint_Major = brd->PrintDriver->major; >>>>> } >>>>> >>>>> return rc; >>>>> + >>>>> +unregister_serial_drv: >>>>> + tty_unregister_driver(brd->SerialDriver); >>>> >>>> We only register the ->SerialDriver if someone else hasn't registered it >>>> first? So this should be: >>>> >>>> if (we_were_the_ones_who_registered_the_serial_driver) >>>> tty_unregister_driver(brd->SerialDriver); >>>> >>>> I haven't followed looked at this. Who else is registering the serial >>>> driver? You have looked at this, what do you think? Or Mark. >>> >> >> registering the brd->XDriver is only done when a board is detected >> and only during the firmware_load process. If we fail to >> tty_register_driver do we _need_ to tty_unregister_driver? Isn't that >> like freeing after an alloc failure? > > The allocation is conditional so the free should be conditional. If we > didn't allocate it, then we shouldn't free it. > > It wouldn't have even been a question except I'm not sure the allocation > is *really* conditional because brd->dgap_Major_Serial_Registered might > always be "false" like you guys seem to be saying. > >> >>> I think brd struct is from dgap_Board array as global static variable >>> when this function is >>> called. So brd->dgap_Major_Serial_Registered is always "false". >>> If dgap_NumBoards is less than MAXBOARDS, brd->SerialDriver should be >>> registered. >>> >>> I'm not sure.. >>> >> >> I don't see any check for (dgap_NumBoards < MAXBOARDS), which I think I >> probably should, but I do see we are calling dgap_tty_register, which >> can fail, without actually checking the return value. Also, yes, >> dgap_Major_Xxxx_Registered seems to be always "false" until registered, >> and it looks like dgap_Major_X_Registered flags could be removed >> because the only places we can unregister is at module_cleanup or >> "after" it is already registered. >> >> What is the driver _supposed_ to do if we fail something on the second >> or later board? Is the driver supposed to cleanup and exit or are we >> supposed to stay loaded for the board/boards that are usable? > > Stay loaded. > Then these tests on brd->dgap_Major_Serial_Registered need to stay in there. If I have 3 boards and the second fails in some way, if I rmmod the driver they will protect from unregistering a never registered one. At least in the unregister code path. There is probably no need for them in the register code path. I'll work up a patch for this. I need to look at the (dgap_NumBoards < MAXBOARDS) thing too. Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()
On 04/25/2014 07:02 AM, DaeSeok Youn wrote: > Hi, Dan. > > 2014-04-25 18:26 GMT+09:00 Dan Carpenter : >> Mark, maybe you should add yourself to the MAINTAINERS entry for this >> driver? >> I'll look into this. I am clueless on what that would actually mean. >> On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote: >>> @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd) >>> /* Register tty devices */ >>> rc = tty_register_driver(brd->SerialDriver); >>> if (rc < 0) >>> - return rc; >>> + goto free_print_ttys; >>> + >>> brd->dgap_Major_Serial_Registered = TRUE; >>> dgap_BoardsByMajor[brd->SerialDriver->major] = brd; >>> brd->dgap_Serial_Major = brd->SerialDriver->major; >>> @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd) >>> /* Register Transparent Print devices */ >>> rc = tty_register_driver(brd->PrintDriver); >>> if (rc < 0) >>> - return rc; >>> + goto unregister_serial_drv; >>> + >>> brd->dgap_Major_TransparentPrint_Registered = TRUE; >>> dgap_BoardsByMajor[brd->PrintDriver->major] = brd; >>> brd->dgap_TransparentPrint_Major = brd->PrintDriver->major; >>> } >>> >>> return rc; >>> + >>> +unregister_serial_drv: >>> + tty_unregister_driver(brd->SerialDriver); >> >> We only register the ->SerialDriver if someone else hasn't registered it >> first? So this should be: >> >> if (we_were_the_ones_who_registered_the_serial_driver) >> tty_unregister_driver(brd->SerialDriver); >> >> I haven't followed looked at this. Who else is registering the serial >> driver? You have looked at this, what do you think? Or Mark. > registering the brd->XDriver is only done when a board is detected and only during the firmware_load process. If we fail to tty_register_driver do we _need_ to tty_unregister_driver? Isn't that like freeing after an alloc failure? > I think brd struct is from dgap_Board array as global static variable > when this function is > called. So brd->dgap_Major_Serial_Registered is always "false". > If dgap_NumBoards is less than MAXBOARDS, brd->SerialDriver should be > registered. > > I'm not sure.. > I don't see any check for (dgap_NumBoards < MAXBOARDS), which I think I probably should, but I do see we are calling dgap_tty_register, which can fail, without actually checking the return value. Also, yes, dgap_Major_Xxxx_Registered seems to be always "false" until registered, and it looks like dgap_Major_X_Registered flags could be removed because the only places we can unregister is at module_cleanup or "after" it is already registered. What is the driver _supposed_ to do if we fail something on the second or later board? Is the driver supposed to cleanup and exit or are we supposed to stay loaded for the board/boards that are usable? Regards mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()
On 04/25/2014 07:02 AM, DaeSeok Youn wrote: Hi, Dan. 2014-04-25 18:26 GMT+09:00 Dan Carpenter dan.carpen...@oracle.com: Mark, maybe you should add yourself to the MAINTAINERS entry for this driver? I'll look into this. I am clueless on what that would actually mean. On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote: @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd) /* Register tty devices */ rc = tty_register_driver(brd-SerialDriver); if (rc 0) - return rc; + goto free_print_ttys; + brd-dgap_Major_Serial_Registered = TRUE; dgap_BoardsByMajor[brd-SerialDriver-major] = brd; brd-dgap_Serial_Major = brd-SerialDriver-major; @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd) /* Register Transparent Print devices */ rc = tty_register_driver(brd-PrintDriver); if (rc 0) - return rc; + goto unregister_serial_drv; + brd-dgap_Major_TransparentPrint_Registered = TRUE; dgap_BoardsByMajor[brd-PrintDriver-major] = brd; brd-dgap_TransparentPrint_Major = brd-PrintDriver-major; } return rc; + +unregister_serial_drv: + tty_unregister_driver(brd-SerialDriver); We only register the -SerialDriver if someone else hasn't registered it first? So this should be: if (we_were_the_ones_who_registered_the_serial_driver) tty_unregister_driver(brd-SerialDriver); I haven't followed looked at this. Who else is registering the serial driver? You have looked at this, what do you think? Or Mark. registering the brd-XDriver is only done when a board is detected and only during the firmware_load process. If we fail to tty_register_driver do we _need_ to tty_unregister_driver? Isn't that like freeing after an alloc failure? I think brd struct is from dgap_Board array as global static variable when this function is called. So brd-dgap_Major_Serial_Registered is always false. If dgap_NumBoards is less than MAXBOARDS, brd-SerialDriver should be registered. I'm not sure.. I don't see any check for (dgap_NumBoards MAXBOARDS), which I think I probably should, but I do see we are calling dgap_tty_register, which can fail, without actually checking the return value. Also, yes, dgap_Major_Xxxx_Registered seems to be always false until registered, and it looks like dgap_Major_X_Registered flags could be removed because the only places we can unregister is at module_cleanup or after it is already registered. What is the driver _supposed_ to do if we fail something on the second or later board? Is the driver supposed to cleanup and exit or are we supposed to stay loaded for the board/boards that are usable? Regards mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()
On 04/25/2014 08:59 AM, Dan Carpenter wrote: On Fri, Apr 25, 2014 at 08:29:41AM -0400, Mark Hounschell wrote: On 04/25/2014 07:02 AM, DaeSeok Youn wrote: Hi, Dan. 2014-04-25 18:26 GMT+09:00 Dan Carpenter dan.carpen...@oracle.com: Mark, maybe you should add yourself to the MAINTAINERS entry for this driver? I'll look into this. I am clueless on what that would actually mean. Just add your name with Lidza in the MAINTAINERS file so that people will CC you on all the patches. DIGI EPCA PCI PRODUCTS M: Lidza Louina lidza.lou...@gmail.com L: driverdev-de...@linuxdriverproject.org S: Maintained F: drivers/staging/dgap/ You don't have to do it if you don't want to, but you seem to be working on this driver and I'm going to refer questions to you either way. :P On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote: @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd) /* Register tty devices */ rc = tty_register_driver(brd-SerialDriver); if (rc 0) - return rc; + goto free_print_ttys; + brd-dgap_Major_Serial_Registered = TRUE; dgap_BoardsByMajor[brd-SerialDriver-major] = brd; brd-dgap_Serial_Major = brd-SerialDriver-major; @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd) /* Register Transparent Print devices */ rc = tty_register_driver(brd-PrintDriver); if (rc 0) - return rc; + goto unregister_serial_drv; + brd-dgap_Major_TransparentPrint_Registered = TRUE; dgap_BoardsByMajor[brd-PrintDriver-major] = brd; brd-dgap_TransparentPrint_Major = brd-PrintDriver-major; } return rc; + +unregister_serial_drv: + tty_unregister_driver(brd-SerialDriver); We only register the -SerialDriver if someone else hasn't registered it first? So this should be: if (we_were_the_ones_who_registered_the_serial_driver) tty_unregister_driver(brd-SerialDriver); I haven't followed looked at this. Who else is registering the serial driver? You have looked at this, what do you think? Or Mark. registering the brd-XDriver is only done when a board is detected and only during the firmware_load process. If we fail to tty_register_driver do we _need_ to tty_unregister_driver? Isn't that like freeing after an alloc failure? The allocation is conditional so the free should be conditional. If we didn't allocate it, then we shouldn't free it. It wouldn't have even been a question except I'm not sure the allocation is *really* conditional because brd-dgap_Major_Serial_Registered might always be false like you guys seem to be saying. I think brd struct is from dgap_Board array as global static variable when this function is called. So brd-dgap_Major_Serial_Registered is always false. If dgap_NumBoards is less than MAXBOARDS, brd-SerialDriver should be registered. I'm not sure.. I don't see any check for (dgap_NumBoards MAXBOARDS), which I think I probably should, but I do see we are calling dgap_tty_register, which can fail, without actually checking the return value. Also, yes, dgap_Major_Xxxx_Registered seems to be always false until registered, and it looks like dgap_Major_X_Registered flags could be removed because the only places we can unregister is at module_cleanup or after it is already registered. What is the driver _supposed_ to do if we fail something on the second or later board? Is the driver supposed to cleanup and exit or are we supposed to stay loaded for the board/boards that are usable? Stay loaded. Then these tests on brd-dgap_Major_Serial_Registered need to stay in there. If I have 3 boards and the second fails in some way, if I rmmod the driver they will protect from unregistering a never registered one. At least in the unregister code path. There is probably no need for them in the register code path. I'll work up a patch for this. I need to look at the (dgap_NumBoards MAXBOARDS) thing too. Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: remove useless cast on kzalloc()
On 03/06/2014 01:17 AM, Daeseok Youn wrote: coccinelle warning: drivers/staging/dgap/dgap.c:782:3-7: WARNING: casting value returned by k[cmz]alloc to (char *) is useless. drivers/staging/dgap/dgap.c:776:2-16: WARNING: casting value returned by k[cmz]alloc to (struct board_t *) is useless. Signed-off-by: Daeseok Youn --- drivers/staging/dgap/dgap.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index cbce457..1adcd13 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -773,13 +773,13 @@ static int dgap_found_board(struct pci_dev *pdev, int id) /* get the board structure and prep it */ brd = dgap_Board[dgap_NumBoards] = - (struct board_t *) kzalloc(sizeof(struct board_t), GFP_KERNEL); + kzalloc(sizeof(struct board_t), GFP_KERNEL); if (!brd) return -ENOMEM; /* make a temporary message buffer for the boot messages */ brd->msgbuf = brd->msgbuf_head = - (char *) kzalloc(sizeof(char) * 8192, GFP_KERNEL); + kzalloc(sizeof(char) * 8192, GFP_KERNEL); if (!brd->msgbuf) { kfree(brd); return -ENOMEM; I'm pretty sure this has already been fixed up in current staging-next. Thanks Mark -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] staging: dgap: remove useless cast on kzalloc()
On 03/06/2014 01:17 AM, Daeseok Youn wrote: coccinelle warning: drivers/staging/dgap/dgap.c:782:3-7: WARNING: casting value returned by k[cmz]alloc to (char *) is useless. drivers/staging/dgap/dgap.c:776:2-16: WARNING: casting value returned by k[cmz]alloc to (struct board_t *) is useless. Signed-off-by: Daeseok Youn daeseok.y...@gmail.com --- drivers/staging/dgap/dgap.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c index cbce457..1adcd13 100644 --- a/drivers/staging/dgap/dgap.c +++ b/drivers/staging/dgap/dgap.c @@ -773,13 +773,13 @@ static int dgap_found_board(struct pci_dev *pdev, int id) /* get the board structure and prep it */ brd = dgap_Board[dgap_NumBoards] = - (struct board_t *) kzalloc(sizeof(struct board_t), GFP_KERNEL); + kzalloc(sizeof(struct board_t), GFP_KERNEL); if (!brd) return -ENOMEM; /* make a temporary message buffer for the boot messages */ brd-msgbuf = brd-msgbuf_head = - (char *) kzalloc(sizeof(char) * 8192, GFP_KERNEL); + kzalloc(sizeof(char) * 8192, GFP_KERNEL); if (!brd-msgbuf) { kfree(brd); return -ENOMEM; I'm pretty sure this has already been fixed up in current staging-next. Thanks Mark -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/