from:"Mark Hounschell"

Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels

2021-03-11 Thread Mark Hounschell


On 3/10/21 4:48 PM, Hans de Goede wrote:

Hi,

On 3/10/21 9:55 PM, Filipe Laíns wrote:

On Wed, 2021-03-10 at 15:24 -0500, Mark Hounschell wrote:


That is correct, I don't have any buttons bound to keyboard events. With
the original patch the G4(forward) and G5(Backward) buttons work in a
browser. I guess G7, G8, and G9 buttons are programmable to keyboard events?

However this patch does not seem to fix the messages I get.

Regards
Mark


Those events belong to the USB HID button usage page and are sent by the
receiver in the HID device with the unnumbered report descriptor, so they are
not affected.

Looking at the report descriptor for the other HID device, I see a report ID of
128 (0x80) used for a vendor application, I am not really sure what it is used
for and can't seem to trigger my device to send it.

I am gonna guess this is the device reporting the pressed buttons via vendor
reports or something like that. Speaking as the person who added support for
this device in libratbag, this report is very likely not something that we don't
need in our custom drivers and just likely something extra that Logitech built
to achieve something custom in the Windows driver. FWIW, this device is a very
weird one, it does not even follow Logitech's own spec :P

Seeing this report the driver chugs.

if (report > REPORT_TYPE_RFREPORT_LAST) {
hid_err(hdev, "Unexpected input report number %d\n", report);
return;
}

Causing your

[   36.471326] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report 
number 128
[   36.565317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report 
number 128
[   42.390321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input report 
number 128

I feel like the correct fix for these cases is not to consume the report and not
forward it to device node, but rather to forward it to the receiver node.

(looping in Hans)
Hans, you introduced this code, do you remember why? Where did
REPORT_TYPE_RFREPORT_LAST get its value from and what is the purpose of this
check?
Shouldn't we just keep forwarding unknown reports to the receiver node? Is there
any technical limitation to do that? I am not too familiar with this part of the
code.


The code used by the recvr_type_gaming_hidpp receivers is shared with all the
other non-unifying receivers. Even though these receivers are not unifying the
non gaming versions may still have multiple devices (typically a keyboard + a 
mouse)
paired with them.

The standard HID interfaces which these devices emulate are usually split in
at least 2 HID interfaces:

1. A keyboard following the requirements of the "boot keyboard" subclass of the
USB HID class, so that the keyboard works inside say the BIOS setup screen.
This uses a single unnumbered HID report

2. A mouse + media-keys interface, which delivers numbered reports, including 
the
special Logitech HID++ reports for things like battery monitoring, but also some
special keys, which have their own sub-addressing embedded inside the reports.

The driver asks the receiver for a list of paired devices and then builts a list
of devices, which are then instantiated as child-HID devices which are
handled by the drivers/hid/hid-logitech-hidpp.c driver.

Any input reports received by drivers/hid/hid-logitech-dj.c are then forwarded
to the instantiated child devices, where they are actually processed.

The problem is that there is not a 1:1 relation between the interfaces and
the instantiated child-devices, so the driver aggregates all input-reports
from both interfaces together and then dispatches / forwards them to the
child-devices using its own internal addressing.

This forwarding uses 2 different addressing schemes:

1. If the report received is a special HID++ report, then it is forwarded to
paired-dev child-dev matching the HID++ device-index which is embedded
inside these special reports.

2. If a normal (unnumbered or numbered) report is received then that report is
forwarded based on the report-number.  What happens here is that each paired-dev
which the hid-logitech-dj.c code instantiates has a bitmask associated with it
which indicates which kind of reports it consumes. So e.g. a normal mouse will
only consume mouse input-reports (STD_MOUSE, report-id 2) and a keyboard
will consume all of:

#define STD_KEYBOARDBIT(1)
#define MULTIMEDIA  BIT(3)
#define POWER_KEYS  BIT(4)
#define MEDIA_CENTERBIT(8)
#define KBD_LEDSBIT(14)

When forwarding these normal (unnumbered or numbered) reports, the list of
paired devices is searched and the report is forwarded to the first paired-dev
which reports_supported bitmask includes the report-nr:

spin_lock_irqsave(_dev->lock, flags);
 for (i = 0; i < (DJ_MAX_PAIRED_DEVICES + DJ_DEVICE_INDEX_MIN); i++) {

Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels

2021-03-10 Thread Mark Hounschell


On 3/10/21 3:24 PM, Mark Hounschell wrote:

On 3/10/21 2:56 PM, Filipe Laíns wrote:

On Wed, 2021-03-10 at 13:55 -0500, Mark Hounschell wrote:

I have been using a Logitech wireless G602 mouse since forever. As of
kernel 5.10.11 I get the following kernel messages;


$dmesg | grep -i logitech

(snip)

.
.
.
Every mouse event seems to produce another "Unexpected input report
number 128" kernel message.

The commit that started this is:

commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da
Author: Filipe LaÃns 
Date:   Mon Jan 4 20:47:17 2021 +

  HID: logitech-dj: add the G602 receiver

  [ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ]

  Tested. The device gets correctly exported to userspace and I 
can see

  mouse and keyboard events.

  Signed-off-by: Filipe LaÃns 
  Signed-off-by: Jiri Kosina 
  Signed-off-by: Sasha Levin 

The actual patch:

diff --git a/drivers/hid/hid-logitech-dj.c 
b/drivers/hid/hid-logitech-dj.c

index 1ffcfc9a1e033..45e7e0bdd382b 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -1869,6 +1869,10 @@ static const struct hid_device_id
logi_dj_receivers[] = {
    HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
  0xc531),
   .driver_data = recvr_type_gaming_hidpp},
+   { /* Logitech G602 receiver (0xc537) */
+ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
+   0xc537),
+    .driver_data = recvr_type_gaming_hidpp},
  { /* Logitech lightspeed receiver (0xc539) */
    HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
  USB_DEVICE_ID_LOGITECH_NANO_RECEIVER_LIGHTSPEED_1),



markh@harley:~> lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 046d:c537 Logitech, Inc.
Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub



With the patch reverted:

$dmesg | grep -i logitech

(snip)


$lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 046d:c537 Logitech, Inc.
Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

With or without the patch and error messages the mouse has always 
worked.


Regards
Mark


Yes, sorry about that. The following patch should fix it, can you 
confirm?
You probably didn't notice any breakage because you do not have any of 
your

buttons bound to keyboard events.


commit ef07c116d98772952807492bd32a61f5af172a94 
(hid/for-5.11/upstream-fixes)

Author: Filipe Laíns 
Date:   Fri Feb 5 14:34:44 2021 +

 HID: logitech-dj: add support for keyboard events in eQUAD step 4 
Gaming


 In e400071a805d6229223a98899e9da8c6233704a1 I added support for the
 receiver that comes with the G602 device, but unfortunately I 
screwed up
 during testing and it seems the keyboard events were actually not 
being

 sent to userspace.
 This resulted in keyboard events being broken in userspace, please
 backport the fix.

 The receiver uses the normal 0x01 Logitech keyboard report 
descriptor,

 as expected, so it is just a matter of flagging it as supported.

 Reported in
 https://github.com/libratbag/libratbag/issues/1124

 Fixes: e400071a805d6 ("HID: logitech-dj: add the G602 receiver")
 Cc: 
 Signed-off-by: Filipe Laíns 
 Signed-off-by: Jiri Kosina 

diff --git a/drivers/hid/hid-logitech-dj.c 
b/drivers/hid/hid-logitech-dj.c

index 45e7e0bdd382..fcdc922bc973 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -980,6 +980,7 @@ static void logi_hidpp_recv_queue_notif(struct 
hid_device

*hdev,
 case 0x07:
 device_type = "eQUAD step 4 Gaming";
 logi_hidpp_dev_conn_notif_equad(hdev, hidpp_report, 
);

+   workitem.reports_supported |= STD_KEYBOARD;
 break;
 case 0x08:
 device_type = "eQUAD step 4 for gamepads";




That is correct, I don't have any buttons bound to keyboard events. With 
the original patch the G4(forward) and G5(Backward) buttons work in a 
browser. I guess G7, G8, and G9 buttons are programmable to keyboard 
events?


However this patch does not seem to fix the messages I get.



Actually is not this patch already in a 5.10.21 kernel?

Mark

Re: Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels

2021-03-10 Thread Mark Hounschell


On 3/10/21 2:56 PM, Filipe Laíns wrote:

On Wed, 2021-03-10 at 13:55 -0500, Mark Hounschell wrote:

I have been using a Logitech wireless G602 mouse since forever. As of
kernel 5.10.11 I get the following kernel messages;


$dmesg | grep -i logitech

(snip)

.
.
.
Every mouse event seems to produce another "Unexpected input report
number 128" kernel message.

The commit that started this is:

commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da
Author: Filipe LaÃns 
Date:   Mon Jan 4 20:47:17 2021 +

  HID: logitech-dj: add the G602 receiver

  [ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ]

  Tested. The device gets correctly exported to userspace and I can see
  mouse and keyboard events.

  Signed-off-by: Filipe LaÃns 
  Signed-off-by: Jiri Kosina 
  Signed-off-by: Sasha Levin 

The actual patch:

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index 1ffcfc9a1e033..45e7e0bdd382b 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -1869,6 +1869,10 @@ static const struct hid_device_id
logi_dj_receivers[] = {
    HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
  0xc531),
   .driver_data = recvr_type_gaming_hidpp},
+   { /* Logitech G602 receiver (0xc537) */
+ HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
+   0xc537),
+    .driver_data = recvr_type_gaming_hidpp},
  { /* Logitech lightspeed receiver (0xc539) */
    HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH,
  USB_DEVICE_ID_LOGITECH_NANO_RECEIVER_LIGHTSPEED_1),



markh@harley:~> lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 046d:c537 Logitech, Inc.
Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub



With the patch reverted:

$dmesg | grep -i logitech

(snip)


$lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 046d:c537 Logitech, Inc.
Bus 003 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

With or without the patch and error messages the mouse has always worked.

Regards
Mark


Yes, sorry about that. The following patch should fix it, can you confirm?
You probably didn't notice any breakage because you do not have any of your
buttons bound to keyboard events.


commit ef07c116d98772952807492bd32a61f5af172a94 (hid/for-5.11/upstream-fixes)
Author: Filipe Laíns 
Date:   Fri Feb 5 14:34:44 2021 +

 HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming

 In e400071a805d6229223a98899e9da8c6233704a1 I added support for the
 receiver that comes with the G602 device, but unfortunately I screwed up
 during testing and it seems the keyboard events were actually not being
 sent to userspace.
 This resulted in keyboard events being broken in userspace, please
 backport the fix.

 The receiver uses the normal 0x01 Logitech keyboard report descriptor,
 as expected, so it is just a matter of flagging it as supported.

 Reported in
 https://github.com/libratbag/libratbag/issues/1124

 Fixes: e400071a805d6 ("HID: logitech-dj: add the G602 receiver")
 Cc: 
 Signed-off-by: Filipe Laíns 
 Signed-off-by: Jiri Kosina 

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index 45e7e0bdd382..fcdc922bc973 100644
--- a/drivers/hid/hid-logitech-dj.c
+++ b/drivers/hid/hid-logitech-dj.c
@@ -980,6 +980,7 @@ static void logi_hidpp_recv_queue_notif(struct hid_device
*hdev,
 case 0x07:
 device_type = "eQUAD step 4 Gaming";
 logi_hidpp_dev_conn_notif_equad(hdev, hidpp_report, );
+   workitem.reports_supported |= STD_KEYBOARD;
 break;
 case 0x08:
 device_type = "eQUAD step 4 for gamepads";




That is correct, I don't have any buttons bound to keyboard events. With 
the original patch the G4(forward) and G5(Backward) buttons work in a 
browser. I guess G7, G8, and G9 buttons are programmable to keyboard events?


However this patch does not seem to fix the messages I get.

Regards
Mark

Logitech G602 wireless mouse kernel error messages in 5.10.11+ kernels

2021-03-10 Thread Mark Hounschell

I have been using a Logitech wireless G602 mouse since forever. As of 
kernel 5.10.11 I get the following kernel messages;



$dmesg | grep -i logitech
[7.102140] usb 3-3.4: Manufacturer: Logitech
[   10.036763] input: Logitech USB Receiver as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.0/0003:046D:C537.0001/input/input10
[   10.037904] hid-generic 0003:046D:C537.0001: input,hidraw0: USB HID 
v1.11 Mouse [Logitech USB Receiver] on usb-:16:00.3-3.4/input0
[   10.039542] input: Logitech USB Receiver Keyboard as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input11
[   10.092374] input: Logitech USB Receiver Consumer Control as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input12
[   10.093726] input: Logitech USB Receiver System Control as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input13
[   10.094924] input: Logitech USB Receiver as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/input/input16
[   10.096155] hid-generic 0003:046D:C537.0002: input,hiddev96,hidraw1: 
USB HID v1.11 Keyboard [Logitech USB Receiver] on 
usb-:16:00.3-3.4/input1
[   10.121557] logitech-djreceiver 0003:046D:C537.0001: hidraw0: USB HID 
v1.11 Mouse [Logitech USB Receiver] on usb-:16:00.3-3.4/input0
[   10.264445] logitech-djreceiver 0003:046D:C537.0002: 
hiddev96,hidraw1: USB HID v1.11 Keyboard [Logitech USB Receiver] on 
usb-:16:00.3-3.4/input1
[   10.320315] logitech-djreceiver 0003:046D:C537.0002: device of type 
eQUAD step 4 Gaming (0x07) connected on slot 1
[   10.321505] input: Logitech Wireless Mouse PID:402c Mouse as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/0003:046D:402C.0003/input/input17
[   10.322637] hid-generic 0003:046D:402C.0003: input,hidraw2: USB HID 
v1.11 Mouse [Logitech Wireless Mouse PID:402c] on 
usb-:16:00.3-3.4/input1:1
[   10.360344] input: Logitech G602 as 
/devices/pci:00/:00:08.1/:16:00.3/usb3/3-3/3-3.4/3-3.4:1.1/0003:046D:C537.0002/0003:046D:402C.0003/input/input21
[   10.361537] logitech-hidpp-device 0003:046D:402C.0003: input,hidraw2: 
USB HID v1.11 Mouse [Logitech G602] on usb-:16:00.3-3.4/input1:1
[   23.271323] logitech-hidpp-device 0003:046D:402C.0003: HID++ 2.0 
device connected.
[   36.471326] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   36.565317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   42.390321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   42.478325] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   42.771318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   42.859318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   42.955318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.049318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.105317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.200317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.280318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.375321] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.455318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.558317] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.638318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.741319] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.812319] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   43.916318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   44.003318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128
[   44.106318] logitech-djreceiver 0003:046D:C537.0002: Unexpected input 
report number 128

.
.
.
Every mouse event seems to produce another "Unexpected input report 
number 128" kernel message.


The commit that started this is:

commit 1e6fc9768ed2c3917e1fd7af26cb194dfe14f7da
Author: Filipe LaÃns 
Date:   Mon Jan 4 20:47:17 2021 +

HID: logitech-dj: add the G602 receiver

[ Upstream commit e400071a805d6229223a98899e9da8c6233704a1 ]

Tested. The device gets correctly exported to userspace and I can see
mouse and keyboard events.

Signed-off-by: Filipe LaÃns 
Signed-off-by: Jiri Kosina 
Signed-off-by: Sasha Levin 

The actual patch:

diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c
index 1ffcfc9a1e033..45e7e0bdd382b 100644
---

4.18.1 "make install" issue

2018-08-17 Thread Mark Hounschell



Something is amiss. A "make install" of an already compiled 4.18.1
kernel is taking 10 minutes or so. The same thing for a 4.17.14 kernel
takes all of a minute.

And during that 10 minutes 
or so we seem hung up on i915.ko

 2626 pts/8S+ 0:00 make -f ./scripts/Makefile.build obj=arch/x86/boot 
install
 2627 pts/8S+ 0:00 /bin/sh /sbin/installkernel 4.18.1 
arch/x86/boot/bzImage System.map /boot
 2642 pts/11   Ss 0:00 /bin/bash
 2659 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4581 pts/1Ss 0:00 /bin/bash
 4620 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4622 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4623 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4624 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4625 pts/8S+ 0:09 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4627 pts/8R+ 1:47 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4682 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4683 pts/8S+ 0:00 cat 
/lib/modules/4.18.1//kernel/drivers/gpu/drm/i915/i915.ko
 4684 pts/8S+ 0:00 tr -cd [:print:]



 # date ; make install ; date
Fri Aug 17 10:25:47 EDT 2018

sh ./arch/x86/boot/install.sh 4.18.1 arch/x86/boot/bzImage \
System.map "/boot"
dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: warpclock ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: i18n ***
dracut: Could not find FONT_MAP none!
dracut: *** Including module: drm ***


Here it hangs up for 10 minutes with the above "ps ax" output? 


dracut: Possible missing firmware "nvidia/gv100/sec2/sig.bin" for kernel module 
"nouveau.ko"
.
.
.
dracut: *** Including module: plymouth ***
dracut: *** Including module: kernel-modules ***
dracut: *** Including module: resume ***
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: terminfo ***
dracut: *** Including module: udev-rules ***
dracut: Skipping udev rule: 40-redhat.rules
dracut: Skipping udev rule: 50-firmware.rules
dracut: Skipping udev rule: 50-udev.rules
dracut: Skipping udev rule: 91-permissions.rules
dracut: Skipping udev rule: 80-drivers-modprobe.rules
dracut: *** Including module: dracut-systemd ***
dracut: *** Including module: haveged ***
dracut: *** Including module: ostree ***
dracut: *** Including module: usrmount ***
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including module: shutdown ***
dracut: *** Including module: suse ***
dracut: *** Including modules done ***
dracut: *** Installing kernel module dependencies and firmware ***
dracut: *** Installing kernel module dependencies and firmware done ***
dracut: *** Resolving executable dependencies ***
dracut: *** Resolving executable dependencies done***
dracut: *** Hardlinking files ***
dracut: *** Hardlinking files done ***
dracut: *** Stripping files ***
dracut: *** Stripping files done ***
dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing AuthenticAMD.bin 
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut:  resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b
dracut:  root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 
rootflags=rw,relatime
dracut: *** Creating image file '/boot/initrd-4.18.1' ***
dracut: *** Creating initramfs image file '/boot/initrd-4.18.1' done ***

Fri Aug 17 10:34:50 EDT 2018


Same machine installing a 4.17.14 kernel take 1 minute:
# date ; make install ; date
Fri Aug 17 13:27:41 EDT 2018

sh ./arch/x86/boot/install.sh 4.17.14-lcrs arch/x86/boot/bzImage \
System.map "/boot"
renamed '/boot/config-4.17.14-lcrs' -> '/boot/config-4.17.14-lcrs.old'
dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.17.14-lcrs 
4.17.14-lcrs
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: warpclock ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: i18n ***
dracut: Could not find FONT_MAP none!
dracut: *** Including module: drm ***
dracut: *** Including module: plymouth ***
.
.
.
dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing AuthenticAMD.bin 
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut:  resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b
dracut:  root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 
rootflags=rw,relatime

4.18.1 "make install" issue

2018-08-17 Thread Mark Hounschell



Something is amiss. A "make install" of an already compiled 4.18.1
kernel is taking 10 minutes or so. The same thing for a 4.17.14 kernel
takes all of a minute.

And during that 10 minutes 
or so we seem hung up on i915.ko

 2626 pts/8S+ 0:00 make -f ./scripts/Makefile.build obj=arch/x86/boot 
install
 2627 pts/8S+ 0:00 /bin/sh /sbin/installkernel 4.18.1 
arch/x86/boot/bzImage System.map /boot
 2642 pts/11   Ss 0:00 /bin/bash
 2659 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4581 pts/1Ss 0:00 /bin/bash
 4620 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4622 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4623 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4624 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4625 pts/8S+ 0:09 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4627 pts/8R+ 1:47 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4682 pts/8S+ 0:00 /bin/bash --norc /usr/bin/dracut --hostonly --force 
/boot/initrd-4.18.1 4.18.1
 4683 pts/8S+ 0:00 cat 
/lib/modules/4.18.1//kernel/drivers/gpu/drm/i915/i915.ko
 4684 pts/8S+ 0:00 tr -cd [:print:]



 # date ; make install ; date
Fri Aug 17 10:25:47 EDT 2018

sh ./arch/x86/boot/install.sh 4.18.1 arch/x86/boot/bzImage \
System.map "/boot"
dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.18.1 4.18.1
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: warpclock ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: i18n ***
dracut: Could not find FONT_MAP none!
dracut: *** Including module: drm ***


Here it hangs up for 10 minutes with the above "ps ax" output? 


dracut: Possible missing firmware "nvidia/gv100/sec2/sig.bin" for kernel module 
"nouveau.ko"
.
.
.
dracut: *** Including module: plymouth ***
dracut: *** Including module: kernel-modules ***
dracut: *** Including module: resume ***
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: terminfo ***
dracut: *** Including module: udev-rules ***
dracut: Skipping udev rule: 40-redhat.rules
dracut: Skipping udev rule: 50-firmware.rules
dracut: Skipping udev rule: 50-udev.rules
dracut: Skipping udev rule: 91-permissions.rules
dracut: Skipping udev rule: 80-drivers-modprobe.rules
dracut: *** Including module: dracut-systemd ***
dracut: *** Including module: haveged ***
dracut: *** Including module: ostree ***
dracut: *** Including module: usrmount ***
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including module: shutdown ***
dracut: *** Including module: suse ***
dracut: *** Including modules done ***
dracut: *** Installing kernel module dependencies and firmware ***
dracut: *** Installing kernel module dependencies and firmware done ***
dracut: *** Resolving executable dependencies ***
dracut: *** Resolving executable dependencies done***
dracut: *** Hardlinking files ***
dracut: *** Hardlinking files done ***
dracut: *** Stripping files ***
dracut: *** Stripping files done ***
dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing AuthenticAMD.bin 
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut:  resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b
dracut:  root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 
rootflags=rw,relatime
dracut: *** Creating image file '/boot/initrd-4.18.1' ***
dracut: *** Creating initramfs image file '/boot/initrd-4.18.1' done ***

Fri Aug 17 10:34:50 EDT 2018


Same machine installing a 4.17.14 kernel take 1 minute:
# date ; make install ; date
Fri Aug 17 13:27:41 EDT 2018

sh ./arch/x86/boot/install.sh 4.17.14-lcrs arch/x86/boot/bzImage \
System.map "/boot"
renamed '/boot/config-4.17.14-lcrs' -> '/boot/config-4.17.14-lcrs.old'
dracut: Executing: /usr/bin/dracut --hostonly --force /boot/initrd-4.17.14-lcrs 
4.17.14-lcrs
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: warpclock ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: i18n ***
dracut: Could not find FONT_MAP none!
dracut: *** Including module: drm ***
dracut: *** Including module: plymouth ***
.
.
.
dracut: *** Generating early-microcode cpio image ***
dracut: *** Constructing AuthenticAMD.bin 
dracut: *** Store current command line parameters ***
dracut: Stored kernel commandline:
dracut:  resume=UUID=86053ad3-d0fe-483f-bdd3-bdf8c26fa47b
dracut:  root=UUID=a271f078-c500-4099-b058-fa9812037fc2 rootfstype=ext4 
rootflags=rw,relatime

AM4 B350 chipset Sata/IDE problem

2017-11-21 Thread Mark Hounschell

I'm running a 4.13.13 kernel on an AM4 MSI B350 Tomahawk Arctic MB. 
I have a couple of these setups and both do the same. I get this output just 
building
a kernel. My drives are older Seagate ST3160815AS, 3.AAD, max UDMA/133 
configured
at boot time for UDMA/133 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
I can take this very disk and boot it up on an AM3 based setup and I will not
get these messages. I also have a pci-e Sata card in this AM4 box and when
connected to it instead of the MB Sata ports, I also don't get these messages.

I don't really know where the problem is but since this is a fairly new chip 
set I guess
it could be a kernel issue so thought I would post it for you all.

The kernel does in fact build and run. I get no user level errors.



Nov 21 10:28:01 harley kernel: ata1.00: exception Emask 0x11 SAct 0x7ffe 
SErr 0x40 action 0x6 frozen
Nov 21 10:28:03 harley kernel: ata1.00: irq_stat 0x4808, interface fatal 
error
Nov 21 10:28:03 harley kernel: ata1: SError: { Handshk }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:00:00:a0:f6/08:00:0c:00:00/40 
tag 0 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:08:00:a8:f6/08:00:0c:00:00/40 
tag 1 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:10:00:b0:f6/08:00:0c:00:00/40 
tag 2 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:18:00:b8:f6/08:00:0c:00:00/40 
tag 3 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:20:00:c0:f6/08:00:0c:00:00/40 
tag 4 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:28:00:c8:f6/08:00:0c:00:00/40 
tag 5 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:30:00:d0:f6/08:00:0c:00:00/40 
tag 6 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:38:00:d8:f6/08:00:0c:00:00/40 
tag 7 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:40:00:e0:f6/08:00:0c:00:00/40 
tag 8 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:48:00:e8:f6/08:00:0c:00:00/40 
tag 9 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:50:00:f0:f6/08:00:0c:00:00/40 
tag 10 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:58:00:f8:f6/08:00:0c:00:00/40 
tag 11 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:60:00:00:f7/08:00:0c:00:00/40 
tag 12 ncq dma 1048576 ou
 res

AM4 B350 chipset Sata/IDE problem

2017-11-21 Thread Mark Hounschell

I'm running a 4.13.13 kernel on an AM4 MSI B350 Tomahawk Arctic MB. 
I have a couple of these setups and both do the same. I get this output just 
building
a kernel. My drives are older Seagate ST3160815AS, 3.AAD, max UDMA/133 
configured
at boot time for UDMA/133 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
I can take this very disk and boot it up on an AM3 based setup and I will not
get these messages. I also have a pci-e Sata card in this AM4 box and when
connected to it instead of the MB Sata ports, I also don't get these messages.

I don't really know where the problem is but since this is a fairly new chip 
set I guess
it could be a kernel issue so thought I would post it for you all.

The kernel does in fact build and run. I get no user level errors.



Nov 21 10:28:01 harley kernel: ata1.00: exception Emask 0x11 SAct 0x7ffe 
SErr 0x40 action 0x6 frozen
Nov 21 10:28:03 harley kernel: ata1.00: irq_stat 0x4808, interface fatal 
error
Nov 21 10:28:03 harley kernel: ata1: SError: { Handshk }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:00:00:a0:f6/08:00:0c:00:00/40 
tag 0 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:08:00:a8:f6/08:00:0c:00:00/40 
tag 1 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:10:00:b0:f6/08:00:0c:00:00/40 
tag 2 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:18:00:b8:f6/08:00:0c:00:00/40 
tag 3 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:20:00:c0:f6/08:00:0c:00:00/40 
tag 4 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:28:00:c8:f6/08:00:0c:00:00/40 
tag 5 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:30:00:d0:f6/08:00:0c:00:00/40 
tag 6 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:38:00:d8:f6/08:00:0c:00:00/40 
tag 7 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:40:00:e0:f6/08:00:0c:00:00/40 
tag 8 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:48:00:e8:f6/08:00:0c:00:00/40 
tag 9 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:50:00:f0:f6/08:00:0c:00:00/40 
tag 10 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:58:00:f8:f6/08:00:0c:00:00/40 
tag 11 ncq dma 1048576 ou
 res 40/00:30:00:d0:f6/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Nov 21 10:28:03 harley kernel: ata1.00: status: { DRDY }
Nov 21 10:28:03 harley kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 21 10:28:03 harley kernel: ata1.00: cmd 61/00:60:00:00:f7/08:00:0c:00:00/40 
tag 12 ncq dma 1048576 ou
 res

ata1.00: failed command: WRITE FPDMA QUEUED on new AMD AM4 MSI B350 Motherboard

2017-07-07 Thread Mark Hounschell

With both 4.11 and 4.12 kernels I get the following when doing heavy disk I/O, 
like a kernel build with "make -j 15". Even copying the kernel source tree from 
one place to another. The hardware is an MSI B350 Tomahawk Arctic MB with 16GB 
of memory and a Ryzen 1700 processor. The disk being used is a 160Gb Seagate 
ST3160815AS that has error free media according to "badblocks -w".

Jul  6 13:34:43 cpu0 kernel: ata1.00: exception Emask 0x11 SAct 0x7ffb SErr 
0x40 action 0x6 frozen
Jul  6 13:34:43 cpu0 kernel: ata1.00: irq_stat 0x4808, interface fatal error
Jul  6 13:34:43 cpu0 kernel: ata1: SError: { Handshk }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:00:57:89:90/00:00:03:00:00/40 
tag 0 ncq dma 4096 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Jul  6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:08:87:89:90/00:00:03:00:00/40 
tag 1 ncq dma 4096 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Jul  6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/20:10:97:89:90/00:00:03:00:00/40 
tag 2 ncq dma 16384 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)

When I set the kernel cmdline option libata.force=noncq, the messages change 
into:

[ 1724.372101] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 
frozen
[ 1724.375888] ata1.00: irq_stat 0x4801, interface fatal error
[ 1724.379721] ata1: SError: { Handshk }
[ 1724.383691] ata1.00: failed command: WRITE DMA EXT
[ 1724.383695] ata1.00: cmd 35/00:50:67:0d:e4/00:09:02:00:00/e0 tag 10 dma 
1220608 out
res 51/84:50:67:0d:e4/00:09:02:00:00/e0 Emask 0x10 (ATA 
bus error)
[ 1724.383699] ata1.00: status: { DRDY ERR }
[ 1724.383700] ata1.00: error: { ICRC ABRT }
[ 1724.383706] ata1: hard resetting link
[ 1724.850060] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1724.959883] ata1.00: configured for UDMA/133
[ 1724.959910] ata1: EH complete
[ 1921.704356] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 
frozen
[ 1921.708292] ata1.00: irq_stat 0x4801, interface fatal error
[ 1921.712210] ata1: SError: { Handshk }
[ 1921.716294] ata1.00: failed command: WRITE DMA EXT
[ 1921.716297] ata1.00: cmd 35/00:90:ef:93:86/00:03:02:00:00/e0 tag 18 dma 
466944 out
res 51/84:90:ef:93:86/00:03:02:00:00/e0 Emask 0x10 (ATA 
bus error)
[ 1921.716298] ata1.00: status: { DRDY ERR }
[ 1921.716298] ata1.00: error: { ICRC ABRT }
[ 1921.716303] ata1: hard resetting link
[ 1922.175312] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1922.284165] ata1.00: configured for UDMA/133
[ 1922.288602] ata1: EH complete


smartctl shows no issues with the drive. In fact I can take this very drive 
and install it an an AM3 machine and everything works just fine. I have 
also installed a PCI-e Sata card and connected the drive to that and that
works just fine also. 

So I have either a linux kernel problem or a hardware problem on 
this brand new AM4 motherboard. I don't really know what it 
is other than it is something related with the AMD B350 chipset.

It is a fairly new chip set so I am suspicious of the kernel. 

# smartctl -a  /dev/sda
smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.11.6-lcrs] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10
Device Model: ST3160815AS
Serial Number:6RACD737
Firmware Version: 4.AAB
User Capacity:160,041,885,696 bytes [160 GB]
Sector Size:  512 bytes logical/physical
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:Fri Jul  7 13:50:50 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection:(  430) seconds.
Offline data collection
capabilities:(0x5b) SMART execute Offline immediate.
Auto Offline data collection

ata1.00: failed command: WRITE FPDMA QUEUED on new AMD AM4 MSI B350 Motherboard

2017-07-07 Thread Mark Hounschell

With both 4.11 and 4.12 kernels I get the following when doing heavy disk I/O, 
like a kernel build with "make -j 15". Even copying the kernel source tree from 
one place to another. The hardware is an MSI B350 Tomahawk Arctic MB with 16GB 
of memory and a Ryzen 1700 processor. The disk being used is a 160Gb Seagate 
ST3160815AS that has error free media according to "badblocks -w".

Jul  6 13:34:43 cpu0 kernel: ata1.00: exception Emask 0x11 SAct 0x7ffb SErr 
0x40 action 0x6 frozen
Jul  6 13:34:43 cpu0 kernel: ata1.00: irq_stat 0x4808, interface fatal error
Jul  6 13:34:43 cpu0 kernel: ata1: SError: { Handshk }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:00:57:89:90/00:00:03:00:00/40 
tag 0 ncq dma 4096 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Jul  6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/08:08:87:89:90/00:00:03:00:00/40 
tag 1 ncq dma 4096 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Jul  6 13:34:43 cpu0 kernel: ata1.00: status: { DRDY }
Jul  6 13:34:43 cpu0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jul  6 13:34:43 cpu0 kernel: ata1.00: cmd 61/20:10:97:89:90/00:00:03:00:00/40 
tag 2 ncq dma 16384 out
 res 40/00:b8:2f:ff:b3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)

When I set the kernel cmdline option libata.force=noncq, the messages change 
into:

[ 1724.372101] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 
frozen
[ 1724.375888] ata1.00: irq_stat 0x4801, interface fatal error
[ 1724.379721] ata1: SError: { Handshk }
[ 1724.383691] ata1.00: failed command: WRITE DMA EXT
[ 1724.383695] ata1.00: cmd 35/00:50:67:0d:e4/00:09:02:00:00/e0 tag 10 dma 
1220608 out
res 51/84:50:67:0d:e4/00:09:02:00:00/e0 Emask 0x10 (ATA 
bus error)
[ 1724.383699] ata1.00: status: { DRDY ERR }
[ 1724.383700] ata1.00: error: { ICRC ABRT }
[ 1724.383706] ata1: hard resetting link
[ 1724.850060] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1724.959883] ata1.00: configured for UDMA/133
[ 1724.959910] ata1: EH complete
[ 1921.704356] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x40 action 0x6 
frozen
[ 1921.708292] ata1.00: irq_stat 0x4801, interface fatal error
[ 1921.712210] ata1: SError: { Handshk }
[ 1921.716294] ata1.00: failed command: WRITE DMA EXT
[ 1921.716297] ata1.00: cmd 35/00:90:ef:93:86/00:03:02:00:00/e0 tag 18 dma 
466944 out
res 51/84:90:ef:93:86/00:03:02:00:00/e0 Emask 0x10 (ATA 
bus error)
[ 1921.716298] ata1.00: status: { DRDY ERR }
[ 1921.716298] ata1.00: error: { ICRC ABRT }
[ 1921.716303] ata1: hard resetting link
[ 1922.175312] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1922.284165] ata1.00: configured for UDMA/133
[ 1922.288602] ata1: EH complete


smartctl shows no issues with the drive. In fact I can take this very drive 
and install it an an AM3 machine and everything works just fine. I have 
also installed a PCI-e Sata card and connected the drive to that and that
works just fine also. 

So I have either a linux kernel problem or a hardware problem on 
this brand new AM4 motherboard. I don't really know what it 
is other than it is something related with the AMD B350 chipset.

It is a fairly new chip set so I am suspicious of the kernel. 

# smartctl -a  /dev/sda
smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.11.6-lcrs] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10
Device Model: ST3160815AS
Serial Number:6RACD737
Firmware Version: 4.AAB
User Capacity:160,041,885,696 bytes [160 GB]
Sector Size:  512 bytes logical/physical
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:Fri Jul  7 13:50:50 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection:(  430) seconds.
Offline data collection
capabilities:(0x5b) SMART execute Offline immediate.
Auto Offline data collection

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-22 Thread Mark Hounschell


On 11/22/2016 03:45 AM, Joerg Roedel wrote:

On Mon, Nov 21, 2016 at 04:47:59PM -0500, Mark Hounschell wrote:

OK, I did get this message before the reported BUG message.

gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not 
allocated [device address=0xffee8000] [size=8192 bytes]

But I've verified that the dma_addr_t that I get for the alloc, and
also use for the free is 0xffee8000 in this case? Is device
"address=0xffee8000" in that message a bug in the message or
do we have a sign extended address problem? It seems strange to me,
I've never seen a dma_addr_t given, when using the iommu, that
high. In the past I've seen them as usually 0x00xx?

I have also verified that simply changing from
pci_alloc/free_consistent to the newer DMA API fixes my issue and I
get no such messages.


Yes, this looks like a sign-extension bug somewhere. But its not in the
amd-iommu driver, because dma-debug also sees it. And from what I can
tell the dma-api interface seems to be fine. It consistently uses
dma_addr_t to pass these values around.

Where can I find the source of the failing code? I need exactly the code
version that triggers the problem.




I certainly don't have a problem sending you the code but I'm sure you 
have better things to do than scour over some out of kernel GPL driver 
code. I see many many users of pci_alloc/free in kernel so it can't be 
broken as badly as it appears to me here. I'm going to just go ahead and 
convert this section of code to use the newer DMA API and be done with it.


It appears pci_alloc/free_consistent is going to be removed completely 
soon anyway.


Thanks
Mark

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-22 Thread Mark Hounschell


On 11/22/2016 03:45 AM, Joerg Roedel wrote:

On Mon, Nov 21, 2016 at 04:47:59PM -0500, Mark Hounschell wrote:

OK, I did get this message before the reported BUG message.

gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not 
allocated [device address=0xffee8000] [size=8192 bytes]

But I've verified that the dma_addr_t that I get for the alloc, and
also use for the free is 0xffee8000 in this case? Is device
"address=0xffee8000" in that message a bug in the message or
do we have a sign extended address problem? It seems strange to me,
I've never seen a dma_addr_t given, when using the iommu, that
high. In the past I've seen them as usually 0x00xx?

I have also verified that simply changing from
pci_alloc/free_consistent to the newer DMA API fixes my issue and I
get no such messages.


Yes, this looks like a sign-extension bug somewhere. But its not in the
amd-iommu driver, because dma-debug also sees it. And from what I can
tell the dma-api interface seems to be fine. It consistently uses
dma_addr_t to pass these values around.

Where can I find the source of the failing code? I need exactly the code
version that triggers the problem.




I certainly don't have a problem sending you the code but I'm sure you 
have better things to do than scour over some out of kernel GPL driver 
code. I see many many users of pci_alloc/free in kernel so it can't be 
broken as badly as it appears to me here. I'm going to just go ahead and 
convert this section of code to use the newer DMA API and be done with it.


It appears pci_alloc/free_consistent is going to be removed completely 
soon anyway.


Thanks
Mark

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-21 Thread Mark Hounschell

On 11/21/2016 10:32 AM, Joerg Roedel wrote:
> On Fri, Nov 18, 2016 at 09:04:05AM -0500, Mark Hounschell wrote:
>> OK. It's attached.
> 
>> [   45.033715] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1158 
>> check_unmap+0x4ef/0x990
>> [   45.037651] 3c59x :04:04.0: DMA-API: device driver failed to check 
>> map error[device address=0xffedf840] [size=1536 bytes] [mapped as 
>> single]
> 
> Okay, there is a dma-debug warning from the 3c59x driver. Can you please
> blacklist this driver (or compile it out) and re-try? The dma-debug code
> will only show the first warning and stop then. So any possible warnings
> from your out-of-tree driver will not be shown as long as the 3c59x
> driver is loaded.
> 
> Alternativly you can write 1 to dma-api/all_errors in debug-fs so that
> it will show all errors. But I suggest to unload the 3com driver first
> and then load your driver.
> 

OK, I did get this message before the reported BUG message.

gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not 
allocated [device address=0xffee8000] [size=8192 bytes]

But I've verified that the dma_addr_t that I get for the alloc, and also use 
for the free is 0xffee8000 in this case? Is device 
"address=0xffee8000" in that message a bug in the message or do we have 
a sign extended address problem? It seems strange to me, I've never seen a 
dma_addr_t given, when using the iommu, that high. In the past I've seen them 
as usually 0x00xx?

I have also verified that simply changing from pci_alloc/free_consistent to the 
newer DMA API fixes my issue and I get no such messages.

Mark

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-21 Thread Mark Hounschell

On 11/21/2016 10:32 AM, Joerg Roedel wrote:
> On Fri, Nov 18, 2016 at 09:04:05AM -0500, Mark Hounschell wrote:
>> OK. It's attached.
> 
>> [   45.033715] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1158 
>> check_unmap+0x4ef/0x990
>> [   45.037651] 3c59x :04:04.0: DMA-API: device driver failed to check 
>> map error[device address=0xffedf840] [size=1536 bytes] [mapped as 
>> single]
> 
> Okay, there is a dma-debug warning from the 3c59x driver. Can you please
> blacklist this driver (or compile it out) and re-try? The dma-debug code
> will only show the first warning and stop then. So any possible warnings
> from your out-of-tree driver will not be shown as long as the 3c59x
> driver is loaded.
> 
> Alternativly you can write 1 to dma-api/all_errors in debug-fs so that
> it will show all errors. But I suggest to unload the 3com driver first
> and then load your driver.
> 

OK, I did get this message before the reported BUG message.

gpiohsd gpiohsd: DMA-API: device driver tries to free DMA memory it has not 
allocated [device address=0xffee8000] [size=8192 bytes]

But I've verified that the dma_addr_t that I get for the alloc, and also use 
for the free is 0xffee8000 in this case? Is device 
"address=0xffee8000" in that message a bug in the message or do we have 
a sign extended address problem? It seems strange to me, I've never seen a 
dma_addr_t given, when using the iommu, that high. In the past I've seen them 
as usually 0x00xx?

I have also verified that simply changing from pci_alloc/free_consistent to the 
newer DMA API fixes my issue and I get no such messages.

Mark

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-18 Thread Mark Hounschell


On 11/17/2016 05:00 PM, Joerg Roedel wrote:

On Thu, Nov 17, 2016 at 04:53:24PM -0500, Mark Hounschell wrote:

On 11/17/2016 04:41 PM, Joerg Roedel wrote:

Hi Mark,

On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote:

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This
is an out of tree GPL driver using
pci_alloc_consistent/pci_free_consistent. The free causes this.


Can you please run the driver with dma-api debugging enabled? Does it
trigger any warnings?



It's been a while since I've done that for ya. Kernel config
options? Kernel cmd line you want?


Just enable CONFIG_DMA_API_DEBUG and boot the resulting kernel. Please
send me full dmesg when the issue happens again.



OK. It's attached.

Mark

[0.00] Linux version 4.8.7-lcrs (markh@harley) (gcc version 4.8.3 
20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) ) #1 SMP PREEMPT Fri Nov 
18 08:19:32 EST 2016
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.8.7-lcrs 
root=/dev/disk/by-id/ata-ST3160815AS_6RACD7A6-part5 splash=verbose showopts 
video=1600x1200 noresume apm=off vmalloc=256M audit=0 rcu_nocbs=1,2,3,4,5,6,7 3
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009e7ff] usable
[0.00] BIOS-e820: [mem 0x0009e800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xae474fff] usable
[0.00] BIOS-e820: [mem 0xae475000-0xaea32fff] reserved
[0.00] BIOS-e820: [mem 0xaea33000-0xaee2efff] ACPI NVS
[0.00] BIOS-e820: [mem 0xaee2f000-0xaf154fff] reserved
[0.00] BIOS-e820: [mem 0xaf155000-0xaf155fff] usable
[0.00] BIOS-e820: [mem 0xaf156000-0xaf35bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xaf35c000-0xaf7f] usable
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed61000-0xfed70fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xfef0-0x] reserved
[0.00] BIOS-e820: [mem 0x00011000-0x00044fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./990FXA-UD5, BIOS FB 01/23/2013
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x45 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B write-through
[0.00]   C-C write-protect
[0.00]   D-EBFFF uncachable
[0.00]   EC000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base  mask 8000 write-back
[0.00]   1 base 8000 mask C000 write-back
[0.00]   2 base AF80 mask FF80 uncachable
[0.00]   3 base B000 mask F000 uncachable
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 00045000 aka 17664M
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: update [mem 0xaf80-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xaf800 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd990-0x000fd99f] mapped at 
[880fd990]
[0.00] Base memory trampoline at [88098000] 98000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x01113000, 0x01113fff] PGTABLE
[0.00] BRK [0x01114000, 0x01114fff] PGTABLE
[0.00] BRK [0x01115000, 0x01115fff] PGTABLE
[0.00] BRK [0x01116000, 0x01116fff] PGTABLE
[0.0

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-18 Thread Mark Hounschell


On 11/17/2016 05:00 PM, Joerg Roedel wrote:

On Thu, Nov 17, 2016 at 04:53:24PM -0500, Mark Hounschell wrote:

On 11/17/2016 04:41 PM, Joerg Roedel wrote:

Hi Mark,

On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote:

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This
is an out of tree GPL driver using
pci_alloc_consistent/pci_free_consistent. The free causes this.


Can you please run the driver with dma-api debugging enabled? Does it
trigger any warnings?



It's been a while since I've done that for ya. Kernel config
options? Kernel cmd line you want?


Just enable CONFIG_DMA_API_DEBUG and boot the resulting kernel. Please
send me full dmesg when the issue happens again.



OK. It's attached.

Mark

[0.00] Linux version 4.8.7-lcrs (markh@harley) (gcc version 4.8.3 
20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) ) #1 SMP PREEMPT Fri Nov 
18 08:19:32 EST 2016
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.8.7-lcrs 
root=/dev/disk/by-id/ata-ST3160815AS_6RACD7A6-part5 splash=verbose showopts 
video=1600x1200 noresume apm=off vmalloc=256M audit=0 rcu_nocbs=1,2,3,4,5,6,7 3
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009e7ff] usable
[0.00] BIOS-e820: [mem 0x0009e800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xae474fff] usable
[0.00] BIOS-e820: [mem 0xae475000-0xaea32fff] reserved
[0.00] BIOS-e820: [mem 0xaea33000-0xaee2efff] ACPI NVS
[0.00] BIOS-e820: [mem 0xaee2f000-0xaf154fff] reserved
[0.00] BIOS-e820: [mem 0xaf155000-0xaf155fff] usable
[0.00] BIOS-e820: [mem 0xaf156000-0xaf35bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xaf35c000-0xaf7f] usable
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed61000-0xfed70fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xfef0-0x] reserved
[0.00] BIOS-e820: [mem 0x00011000-0x00044fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./990FXA-UD5, BIOS FB 01/23/2013
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x45 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B write-through
[0.00]   C-C write-protect
[0.00]   D-EBFFF uncachable
[0.00]   EC000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base  mask 8000 write-back
[0.00]   1 base 8000 mask C000 write-back
[0.00]   2 base AF80 mask FF80 uncachable
[0.00]   3 base B000 mask F000 uncachable
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 00045000 aka 17664M
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: update [mem 0xaf80-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xaf800 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd990-0x000fd99f] mapped at 
[880fd990]
[0.00] Base memory trampoline at [88098000] 98000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x01113000, 0x01113fff] PGTABLE
[0.00] BRK [0x01114000, 0x01114fff] PGTABLE
[0.00] BRK [0x01115000, 0x01115fff] PGTABLE
[0.00] BRK [0x01116000, 0x01116fff] PGTABLE
[0.0

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-17 Thread Mark Hounschell


On 11/17/2016 04:41 PM, Joerg Roedel wrote:

Hi Mark,

On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote:

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This
is an out of tree GPL driver using
pci_alloc_consistent/pci_free_consistent. The free causes this.


Can you please run the driver with dma-api debugging enabled? Does it
trigger any warnings?



It's been a while since I've done that for ya. Kernel config options? 
Kernel cmd line you want?


Mark

Re: BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-17 Thread Mark Hounschell


On 11/17/2016 04:41 PM, Joerg Roedel wrote:

Hi Mark,

On Thu, Nov 17, 2016 at 02:13:59PM -0500, Mark Hounschell wrote:

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This
is an out of tree GPL driver using
pci_alloc_consistent/pci_free_consistent. The free causes this.


Can you please run the driver with dma-api debugging enabled? Does it
trigger any warnings?



It's been a while since I've done that for ya. Kernel config options? 
Kernel cmd line you want?


Mark

BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-17 Thread Mark Hounschell

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out 
of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free 
causes this.

The commit is:

2d4c515bf06c9bce87b546279413621f847ef6a3 is the first bad commit
commit 2d4c515bf06c9bce87b546279413621f847ef6a3
Author: Joerg Roedel 
Date:   Tue Jul 5 16:21:32 2016 +0200

iommu/amd: Remove other remains of old address allocator

There are other remains in the code from the old allocatore.
Remove them all.

Signed-off-by: Joerg Roedel 

:04 04 87b020717cdd7dcab45e3574dfb6649d05dcc044 
817e613acede15fafb70b046d831f558e6bffd93 M  drivers



Nov 16 08:39:15 harley kernel: kernel BUG at drivers/iommu/amd_iommu.c:1436!
Nov 16 08:39:15 harley kernel: invalid opcode:  [#1] PREEMPT SMP
Nov 16 08:39:15 harley kernel: Modules linked in: gpiohsd(O) bnep bluetooth 
rfkill nvidia(PO) drm af_packet iscsi_ibft iscsi_boot_sysfs 
snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec 
snd_hda_core xhci_pci snd_hwdep xhci_hcd snd_pcm synclink_gt osst 3c59x r8169 
irqbypass crc32_pclmul dgap(C) snd_timer n_hdlc hdlc crc32c_intel snd 
tpm_infineon st joydev mii aesni_intel shpchp aes_x86_64 glue_helper tpm_tis 
tpm_tis_core input_leds lrw tpm k10temp fam15h_power soundcore gf128mul 
ablk_helper processor i2c_piix4 fjes serio_raw pcspkr cryptd dm_mod sr_mod 
cdrom ata_generic aic79xx pata_atiixp ohci_pci aic7xxx scsi_transport_spi 
mxm_wmi wmi button sg autofs4
Nov 16 08:39:15 harley kernel: CPU: 6 PID: 4750 Comm: trusim1a Tainted: P   
  C O4.8.0 #1
Nov 16 08:39:15 harley kernel: Hardware name: Gigabyte Technology Co., Ltd. To 
be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013
Nov 16 08:39:15 harley kernel: task: 88042d849c00 task.stack: 
88043991
Nov 16 08:39:15 harley kernel: RIP: 0010:[]  
[] iommu_unmap_page+0xd3/0xe0
Nov 16 08:39:15 harley kernel: RSP: 0018:880439913b60  EFLAGS: 00010202
Nov 16 08:39:15 harley kernel: RAX: 122f RBX: 1000 RCX: 
0027
Nov 16 08:39:15 harley kernel: RDX: fdba RSI: 88043b13a0a0 RDI: 

Nov 16 08:39:15 harley kernel: RBP: 880439913b90 R08:  R09: 

Nov 16 08:39:15 harley kernel: R10: 0005 R11: 0002 R12: 
88043b13a0a0
Nov 16 08:39:15 harley kernel: R13: 88043b13a000 R14: 1230 R15: 
ffec6260
Nov 16 08:39:15 harley kernel: FS:  7fb360e6e700() 
GS:88044fd8() knlGS:
Nov 16 08:39:15 harley kernel: CS:  0010 DS:  ES:  CR0: 80050033
Nov 16 08:39:15 harley kernel: CR2: 7fe345af58f0 CR3: 00042d48a000 CR4: 
000406e0
Nov 16 08:39:15 harley kernel: Stack:
Nov 16 08:39:15 harley kernel:  0246 0001 
ffec7000 0001
Nov 16 08:39:15 harley kernel:  88043b13a000 ffec6000 
880439913bd0 80620298
Nov 16 08:39:15 harley kernel:  88043d2480a0 1000 
88043d2480a0 ffec6000
Nov 16 08:39:15 harley kernel: Call Trace:
Nov 16 08:39:15 harley kernel:  [] 
__unmap_single.isra.23+0x58/0x1b0
Nov 16 08:39:15 harley kernel:  [] free_coherent+0x76/0xc0
Nov 16 08:39:15 harley kernel:  [] 
ehsd_set_signal+0x395/0x620 [gpiohsd]
Nov 16 08:39:15 harley kernel:  [] ? 
finish_task_switch+0x73/0x1e0
Nov 16 08:39:15 harley kernel:  [] ehsd_ioctl+0xdb9/0x17cb 
[gpiohsd]
Nov 16 08:39:15 harley kernel:  [] ? pat_enabled+0x20/0x20
Nov 16 08:39:15 harley kernel:  [] ? 
walk_system_ram_range+0x70/0xc0
Nov 16 08:39:15 harley kernel:  [] ? 
unmap_page_range+0x698/0x8a0
Nov 16 08:39:15 harley kernel:  [] ? find_next_bit+0x19/0x20
Nov 16 08:39:15 harley kernel:  [] ? cpumask_any_but+0x2c/0x40
Nov 16 08:39:15 harley kernel:  [] ? 
flush_tlb_mm_range+0x4c/0x1a0
Nov 16 08:39:15 harley kernel:  [] ? tlb_finish_mmu+0x17/0x50
Nov 16 08:39:15 harley kernel:  [] ? unmap_region+0xe0/0x110
Nov 16 08:39:15 harley kernel:  [] ? 
__rb_erase_color+0x138/0x280
Nov 16 08:39:15 harley kernel:  [] do_vfs_ioctl+0x8f/0x5a0
Nov 16 08:39:15 harley kernel:  [] ? do_munmap+0x27d/0x370
Nov 16 08:39:15 harley kernel:  [] SyS_ioctl+0x74/0x80
Nov 16 08:39:15 harley kernel:  [] 
entry_SYSCALL_64_fastpath+0x17/0x93
Nov 16 08:39:15 harley kernel: Code: d7 49 01 c7 4c 39 f3 77 89 4d 85 f6 75 14 
48 83 c4 08 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 49 8d 46 ff 4c 85 
f0 74 e3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 8b 57 7c 48 89 f0 85 d2
Nov 16 08:39:15 harley kernel: RIP  [] 
iommu_unmap_page+0xd3/0xe0
Nov 16 08:39:15 harley kernel:  RSP 
Nov 16 08:39:15 harley kernel: ---[ end trace 45a510d4d695b2d3 ]---


Regards
Mark

BUG at drivers/iommu/amd_iommu.c:1436!

2016-11-17 Thread Mark Hounschell

Kernel version is 4.8.0. No failure when the IOMMU is disabled. This is an out 
of tree GPL driver using pci_alloc_consistent/pci_free_consistent. The free 
causes this.

The commit is:

2d4c515bf06c9bce87b546279413621f847ef6a3 is the first bad commit
commit 2d4c515bf06c9bce87b546279413621f847ef6a3
Author: Joerg Roedel 
Date:   Tue Jul 5 16:21:32 2016 +0200

iommu/amd: Remove other remains of old address allocator

There are other remains in the code from the old allocatore.
Remove them all.

Signed-off-by: Joerg Roedel 

:04 04 87b020717cdd7dcab45e3574dfb6649d05dcc044 
817e613acede15fafb70b046d831f558e6bffd93 M  drivers



Nov 16 08:39:15 harley kernel: kernel BUG at drivers/iommu/amd_iommu.c:1436!
Nov 16 08:39:15 harley kernel: invalid opcode:  [#1] PREEMPT SMP
Nov 16 08:39:15 harley kernel: Modules linked in: gpiohsd(O) bnep bluetooth 
rfkill nvidia(PO) drm af_packet iscsi_ibft iscsi_boot_sysfs 
snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec 
snd_hda_core xhci_pci snd_hwdep xhci_hcd snd_pcm synclink_gt osst 3c59x r8169 
irqbypass crc32_pclmul dgap(C) snd_timer n_hdlc hdlc crc32c_intel snd 
tpm_infineon st joydev mii aesni_intel shpchp aes_x86_64 glue_helper tpm_tis 
tpm_tis_core input_leds lrw tpm k10temp fam15h_power soundcore gf128mul 
ablk_helper processor i2c_piix4 fjes serio_raw pcspkr cryptd dm_mod sr_mod 
cdrom ata_generic aic79xx pata_atiixp ohci_pci aic7xxx scsi_transport_spi 
mxm_wmi wmi button sg autofs4
Nov 16 08:39:15 harley kernel: CPU: 6 PID: 4750 Comm: trusim1a Tainted: P   
  C O4.8.0 #1
Nov 16 08:39:15 harley kernel: Hardware name: Gigabyte Technology Co., Ltd. To 
be filled by O.E.M./990FXA-UD5, BIOS FB 01/23/2013
Nov 16 08:39:15 harley kernel: task: 88042d849c00 task.stack: 
88043991
Nov 16 08:39:15 harley kernel: RIP: 0010:[]  
[] iommu_unmap_page+0xd3/0xe0
Nov 16 08:39:15 harley kernel: RSP: 0018:880439913b60  EFLAGS: 00010202
Nov 16 08:39:15 harley kernel: RAX: 122f RBX: 1000 RCX: 
0027
Nov 16 08:39:15 harley kernel: RDX: fdba RSI: 88043b13a0a0 RDI: 

Nov 16 08:39:15 harley kernel: RBP: 880439913b90 R08:  R09: 

Nov 16 08:39:15 harley kernel: R10: 0005 R11: 0002 R12: 
88043b13a0a0
Nov 16 08:39:15 harley kernel: R13: 88043b13a000 R14: 1230 R15: 
ffec6260
Nov 16 08:39:15 harley kernel: FS:  7fb360e6e700() 
GS:88044fd8() knlGS:
Nov 16 08:39:15 harley kernel: CS:  0010 DS:  ES:  CR0: 80050033
Nov 16 08:39:15 harley kernel: CR2: 7fe345af58f0 CR3: 00042d48a000 CR4: 
000406e0
Nov 16 08:39:15 harley kernel: Stack:
Nov 16 08:39:15 harley kernel:  0246 0001 
ffec7000 0001
Nov 16 08:39:15 harley kernel:  88043b13a000 ffec6000 
880439913bd0 80620298
Nov 16 08:39:15 harley kernel:  88043d2480a0 1000 
88043d2480a0 ffec6000
Nov 16 08:39:15 harley kernel: Call Trace:
Nov 16 08:39:15 harley kernel:  [] 
__unmap_single.isra.23+0x58/0x1b0
Nov 16 08:39:15 harley kernel:  [] free_coherent+0x76/0xc0
Nov 16 08:39:15 harley kernel:  [] 
ehsd_set_signal+0x395/0x620 [gpiohsd]
Nov 16 08:39:15 harley kernel:  [] ? 
finish_task_switch+0x73/0x1e0
Nov 16 08:39:15 harley kernel:  [] ehsd_ioctl+0xdb9/0x17cb 
[gpiohsd]
Nov 16 08:39:15 harley kernel:  [] ? pat_enabled+0x20/0x20
Nov 16 08:39:15 harley kernel:  [] ? 
walk_system_ram_range+0x70/0xc0
Nov 16 08:39:15 harley kernel:  [] ? 
unmap_page_range+0x698/0x8a0
Nov 16 08:39:15 harley kernel:  [] ? find_next_bit+0x19/0x20
Nov 16 08:39:15 harley kernel:  [] ? cpumask_any_but+0x2c/0x40
Nov 16 08:39:15 harley kernel:  [] ? 
flush_tlb_mm_range+0x4c/0x1a0
Nov 16 08:39:15 harley kernel:  [] ? tlb_finish_mmu+0x17/0x50
Nov 16 08:39:15 harley kernel:  [] ? unmap_region+0xe0/0x110
Nov 16 08:39:15 harley kernel:  [] ? 
__rb_erase_color+0x138/0x280
Nov 16 08:39:15 harley kernel:  [] do_vfs_ioctl+0x8f/0x5a0
Nov 16 08:39:15 harley kernel:  [] ? do_munmap+0x27d/0x370
Nov 16 08:39:15 harley kernel:  [] SyS_ioctl+0x74/0x80
Nov 16 08:39:15 harley kernel:  [] 
entry_SYSCALL_64_fastpath+0x17/0x93
Nov 16 08:39:15 harley kernel: Code: d7 49 01 c7 4c 39 f3 77 89 4d 85 f6 75 14 
48 83 c4 08 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 49 8d 46 ff 4c 85 
f0 74 e3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 8b 57 7c 48 89 f0 85 d2
Nov 16 08:39:15 harley kernel: RIP  [] 
iommu_unmap_page+0xd3/0xe0
Nov 16 08:39:15 harley kernel:  RSP 
Nov 16 08:39:15 harley kernel: ---[ end trace 45a510d4d695b2d3 ]---


Regards
Mark

Re: [PATCH 4.7 114/143] Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"

2016-09-07 Thread Mark Hounschell


On 09/05/2016 12:44 PM, Greg Kroah-Hartman wrote:

4.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Jens Axboe 

commit 468c298ad3ed3f0d94a65f8ca00f6bfc6c2b4e33 upstream.

This reverts commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e.

Signed-off-by: Greg Kroah-Hartman 

---
 drivers/block/floppy.c |   21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,6 +3663,11 @@ static int floppy_open(struct block_devi

opened_bdev[drive] = bdev;

+   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
+   res = -EINVAL;
+   goto out;
+   }
+
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3706,15 +3711,13 @@ static int floppy_open(struct block_devi
if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   if (mode & (FMODE_READ|FMODE_WRITE)) {
-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
-   }
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;

res = -EROFS;






Can I assume the one to revert 09954bad448791ef01202351d437abdd9497a804 
will be a separate one?


Thanks
Mark

Re: [PATCH 4.7 114/143] Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"

2016-09-07 Thread Mark Hounschell


On 09/05/2016 12:44 PM, Greg Kroah-Hartman wrote:

4.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Jens Axboe 

commit 468c298ad3ed3f0d94a65f8ca00f6bfc6c2b4e33 upstream.

This reverts commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e.

Signed-off-by: Greg Kroah-Hartman 

---
 drivers/block/floppy.c |   21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,6 +3663,11 @@ static int floppy_open(struct block_devi

opened_bdev[drive] = bdev;

+   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
+   res = -EINVAL;
+   goto out;
+   }
+
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3706,15 +3711,13 @@ static int floppy_open(struct block_devi
if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   if (mode & (FMODE_READ|FMODE_WRITE)) {
-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
-   }
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;

res = -EROFS;






Can I assume the one to revert 09954bad448791ef01202351d437abdd9497a804 
will be a separate one?


Thanks
Mark

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-25 Thread Mark Hounschell


On 08/25/2016 10:41 AM, Jens Axboe wrote:

On 08/25/2016 07:08 AM, Mark Hounschell wrote:

On 08/24/2016 05:11 PM, Jiri Kosina wrote:

On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote:


I have a problem with this patch. It only fixes one of the regressions
caused by the original change to the floppy driver. It does not
address the
user land breakage of removing the NODELAY flag checks.


Does the same problem also happen in Linus's tree?  If not, any
hints on
the patch that might have fixed it there?




Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user
land multiple ways) should be reverted and a fix for what ever obscure
BUG was supposed to be fixed should be "retried".


That's still an unresolved issue, and it's on my list to things to look
into.

This particular patch though fixes a different issue, and should be
applied to -stable.



This patch fixes a bug that was introduced to fix "some other obscure
BUG" that has nothing to do whatever with the "physical floppy" device.
And it broke user land in at least 2 ways. All this patch does is revert
part of the original patch so that it is only broke in 1 way to user
land. It should revert the whole thing IMHO.


Which patch is this? If that is truly the case, it should be reverted
asap.


commit 09954bad448791ef01202351d437abdd9497a804 seems to be the one.

Mark

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-25 Thread Mark Hounschell


On 08/25/2016 10:41 AM, Jens Axboe wrote:

On 08/25/2016 07:08 AM, Mark Hounschell wrote:

On 08/24/2016 05:11 PM, Jiri Kosina wrote:

On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote:


I have a problem with this patch. It only fixes one of the regressions
caused by the original change to the floppy driver. It does not
address the
user land breakage of removing the NODELAY flag checks.


Does the same problem also happen in Linus's tree?  If not, any
hints on
the patch that might have fixed it there?




Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user
land multiple ways) should be reverted and a fix for what ever obscure
BUG was supposed to be fixed should be "retried".


That's still an unresolved issue, and it's on my list to things to look
into.

This particular patch though fixes a different issue, and should be
applied to -stable.



This patch fixes a bug that was introduced to fix "some other obscure
BUG" that has nothing to do whatever with the "physical floppy" device.
And it broke user land in at least 2 ways. All this patch does is revert
part of the original patch so that it is only broke in 1 way to user
land. It should revert the whole thing IMHO.


Which patch is this? If that is truly the case, it should be reverted
asap.


commit 09954bad448791ef01202351d437abdd9497a804 seems to be the one.

Mark

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-25 Thread Mark Hounschell


On 08/24/2016 05:11 PM, Jiri Kosina wrote:

On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote:


I have a problem with this patch. It only fixes one of the regressions
caused by the original change to the floppy driver. It does not address the
user land breakage of removing the NODELAY flag checks.


Does the same problem also happen in Linus's tree?  If not, any hints on
the patch that might have fixed it there?




Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user 
land multiple ways) should be reverted and a fix for what ever obscure 
BUG was supposed to be fixed should be "retried".



That's still an unresolved issue, and it's on my list to things to look
into.

This particular patch though fixes a different issue, and should be
applied to -stable.



This patch fixes a bug that was introduced to fix "some other obscure 
BUG" that has nothing to do whatever with the "physical floppy" device. 
And it broke user land in at least 2 ways. All this patch does is revert 
part of the original patch so that it is only broke in 1 way to user 
land. It should revert the whole thing IMHO.


Regards
Mark


Thanks,

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-25 Thread Mark Hounschell


On 08/24/2016 05:11 PM, Jiri Kosina wrote:

On Wed, 24 Aug 2016, Greg Kroah-Hartman wrote:


I have a problem with this patch. It only fixes one of the regressions
caused by the original change to the floppy driver. It does not address the
user land breakage of removing the NODELAY flag checks.


Does the same problem also happen in Linus's tree?  If not, any hints on
the patch that might have fixed it there?




Yes, it does. IMHO the entire patch (between 4.4 and 4.5 that broke user 
land multiple ways) should be reverted and a fix for what ever obscure 
BUG was supposed to be fixed should be "retried".



That's still an unresolved issue, and it's on my list to things to look
into.

This particular patch though fixes a different issue, and should be
applied to -stable.



This patch fixes a bug that was introduced to fix "some other obscure 
BUG" that has nothing to do whatever with the "physical floppy" device. 
And it broke user land in at least 2 ways. All this patch does is revert 
part of the original patch so that it is only broke in 1 way to user 
land. It should revert the whole thing IMHO.


Regards
Mark


Thanks,

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-24 Thread Mark Hounschell


On 08/18/2016 09:59 AM, Greg Kroah-Hartman wrote:

4.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Kosina 

commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e upstream.

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/block/floppy.c |   21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_devi

opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_devi
if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;






I have a problem with this patch. It only fixes one of the regressions 
caused by the original change to the floppy driver. It does not address 
the user land breakage of removing the NODELAY flag checks.


Thanks
Mark

Re: [PATCH 4.7 146/186] floppy: fix open(O_ACCMODE) for ioctl-only open

2016-08-24 Thread Mark Hounschell


On 08/18/2016 09:59 AM, Greg Kroah-Hartman wrote:

4.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Kosina 

commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e upstream.

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/block/floppy.c |   21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_devi

opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_devi
if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;






I have a problem with this patch. It only fixes one of the regressions 
caused by the original change to the floppy driver. It does not address 
the user land breakage of removing the NODELAY flag checks.


Thanks
Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-23 Thread Mark Hounschell


On 07/12/2016 04:54 AM, Jiri Kosina wrote:

On Mon, 11 Jul 2016, Mark Hounschell wrote:


Well, all that was specified in my original post. I can no longer open the
floppy drive with no floppy media inserted. Worse, I can also no longer open a
floppy with media inserted that is not a "linux" recognized format. A floppy
drive is a removable media device and should be treated as such. The original
implementation of the O_NDELAY flag allowed it to be.

Any removable media device should be capable of being opened with no, or even
unrecognizable media installed. The kernel and its utilities should not
"assume" to much when it comes to removable media. Consider a SCSI tape drive
or even a removable media SCSI disk drive. How would you explain an open
failure to someone trying to open a SCSI tape drive that had no tape or even a
"non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write protected
but didn't include O_RDONLY in the open?


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that it
has worked this way, but I agree with you that we can't ignore the fact
that there is userspace relying on this behavior.


The original behavior of the floppy driver was correct. I have no idea
what BUG these changes were supposed to fix but the "fix" obviously
broke user land. Was this bug reported by some new ROBOT test or
something? The kernel floppy driver has been stable for years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist (because
they are much faster than a real hardware, and the different timing
reveals bugs that were not visible before).

This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.



Are we making any progress on fixing this regression? Anything I can do?

Regards
Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-23 Thread Mark Hounschell


On 07/12/2016 04:54 AM, Jiri Kosina wrote:

On Mon, 11 Jul 2016, Mark Hounschell wrote:


Well, all that was specified in my original post. I can no longer open the
floppy drive with no floppy media inserted. Worse, I can also no longer open a
floppy with media inserted that is not a "linux" recognized format. A floppy
drive is a removable media device and should be treated as such. The original
implementation of the O_NDELAY flag allowed it to be.

Any removable media device should be capable of being opened with no, or even
unrecognizable media installed. The kernel and its utilities should not
"assume" to much when it comes to removable media. Consider a SCSI tape drive
or even a removable media SCSI disk drive. How would you explain an open
failure to someone trying to open a SCSI tape drive that had no tape or even a
"non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write protected
but didn't include O_RDONLY in the open?


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that it
has worked this way, but I agree with you that we can't ignore the fact
that there is userspace relying on this behavior.


The original behavior of the floppy driver was correct. I have no idea
what BUG these changes were supposed to fix but the "fix" obviously
broke user land. Was this bug reported by some new ROBOT test or
something? The kernel floppy driver has been stable for years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist (because
they are much faster than a real hardware, and the different timing
reveals bugs that were not visible before).

This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.



Are we making any progress on fixing this regression? Anything I can do?

Regards
Mark

Re: Context switch latency in tickless isolated CPU

2016-08-22 Thread Mark Hounschell


On 08/22/2016 11:37 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote:

On 08/22/2016 10:48 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:

On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
<paul...@linux.vnet.ibm.com> wrote:

If latency is all you care about, one approach is to map the device
registers into userspace and do the I/O without assistance from the
kernel.

In addition to the context switch latency, local interrupts are also
closed during
user_enter and user_exit of the context tracking. Therefore, the interrupt
latency might be also increased on the isolated tickless CPU. That
will degrade the
real time performance. Are these two events determined?


Hmmm...  Why would you be taking interrupts on your isolated tickless
CPUs?  Doesn't that defeat the purpose of designating them as isolated
and tickless?


Don't mean to butt in here but think about a "special" PCI card that
does nothing but take an external interrupt or external interrupts
from an outside source where the latency between the time it occurs
on the outside and the time an isolated processor can act on that
event. The IRQ of that card also being pinned/isolated to that
processor. This is a very common thing in the RT world.


In this case, the host OS would see an event-driven real-time workload
from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL
on the host OS.

Of course, if you are instead building an OS to run on the PCI card
itself, then the choice of configuration would depend on how the PCI
card was set up.  If it polled hardware, then NO_HZ_FULL on the PCI card
might work quite well.  But then you wouldn't have interrupts (on the
PCI card), so I am guessing that you mean the scenario covered in the
first paragraph.

Or am I missing your point?

Thanx, Paul



First paragraph scenario is the one I was referring.

Thanks
Mark


Mark


The key point being that effective use of NO_HZ_FULL requires
careful configuration and complete understanding of your workload.
And it is quite possible that you instead need to use something
other than NO_HZ_FULL.

If your question is instead "why must interrupts be disabled during
context tracking", I must defer to people who understand the x86
entry/exit code paths better than I do.

Thanx, Paul

Re: Context switch latency in tickless isolated CPU

2016-08-22 Thread Mark Hounschell


On 08/22/2016 11:37 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 11:12:45AM -0400, Mark Hounschell wrote:

On 08/22/2016 10:48 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:

On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
 wrote:

If latency is all you care about, one approach is to map the device
registers into userspace and do the I/O without assistance from the
kernel.

In addition to the context switch latency, local interrupts are also
closed during
user_enter and user_exit of the context tracking. Therefore, the interrupt
latency might be also increased on the isolated tickless CPU. That
will degrade the
real time performance. Are these two events determined?


Hmmm...  Why would you be taking interrupts on your isolated tickless
CPUs?  Doesn't that defeat the purpose of designating them as isolated
and tickless?


Don't mean to butt in here but think about a "special" PCI card that
does nothing but take an external interrupt or external interrupts
from an outside source where the latency between the time it occurs
on the outside and the time an isolated processor can act on that
event. The IRQ of that card also being pinned/isolated to that
processor. This is a very common thing in the RT world.


In this case, the host OS would see an event-driven real-time workload
from the PCI card, which would lead me to suggest -not- using NO_HZ_FULL
on the host OS.

Of course, if you are instead building an OS to run on the PCI card
itself, then the choice of configuration would depend on how the PCI
card was set up.  If it polled hardware, then NO_HZ_FULL on the PCI card
might work quite well.  But then you wouldn't have interrupts (on the
PCI card), so I am guessing that you mean the scenario covered in the
first paragraph.

Or am I missing your point?

Thanx, Paul



First paragraph scenario is the one I was referring.

Thanks
Mark


Mark


The key point being that effective use of NO_HZ_FULL requires
careful configuration and complete understanding of your workload.
And it is quite possible that you instead need to use something
other than NO_HZ_FULL.

If your question is instead "why must interrupts be disabled during
context tracking", I must defer to people who understand the x86
entry/exit code paths better than I do.

Thanx, Paul

Re: Context switch latency in tickless isolated CPU

2016-08-22 Thread Mark Hounschell


On 08/22/2016 10:48 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:

On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
 wrote:

If latency is all you care about, one approach is to map the device
registers into userspace and do the I/O without assistance from the
kernel.

In addition to the context switch latency, local interrupts are also
closed during
user_enter and user_exit of the context tracking. Therefore, the interrupt
latency might be also increased on the isolated tickless CPU. That
will degrade the
real time performance. Are these two events determined?


Hmmm...  Why would you be taking interrupts on your isolated tickless
CPUs?  Doesn't that defeat the purpose of designating them as isolated
and tickless?



Don't mean to butt in here but think about a "special" PCI card that 
does nothing but take an external interrupt or external interrupts from 
an outside source where the latency between the time it occurs on the 
outside and the time an isolated processor can act on that event. The 
IRQ of that card also being pinned/isolated to that processor. This is a 
very common thing in the RT world.


Mark


The key point being that effective use of NO_HZ_FULL requires
careful configuration and complete understanding of your workload.
And it is quite possible that you instead need to use something
other than NO_HZ_FULL.

If your question is instead "why must interrupts be disabled during
context tracking", I must defer to people who understand the x86
entry/exit code paths better than I do.

Thanx, Paul

Re: Context switch latency in tickless isolated CPU

2016-08-22 Thread Mark Hounschell


On 08/22/2016 10:48 AM, Paul E. McKenney wrote:

On Mon, Aug 22, 2016 at 05:40:03PM +0800, GeHao Kang wrote:

On Sun, Aug 21, 2016 at 10:53 PM, Paul E. McKenney
 wrote:

If latency is all you care about, one approach is to map the device
registers into userspace and do the I/O without assistance from the
kernel.

In addition to the context switch latency, local interrupts are also
closed during
user_enter and user_exit of the context tracking. Therefore, the interrupt
latency might be also increased on the isolated tickless CPU. That
will degrade the
real time performance. Are these two events determined?


Hmmm...  Why would you be taking interrupts on your isolated tickless
CPUs?  Doesn't that defeat the purpose of designating them as isolated
and tickless?



Don't mean to butt in here but think about a "special" PCI card that 
does nothing but take an external interrupt or external interrupts from 
an outside source where the latency between the time it occurs on the 
outside and the time an isolated processor can act on that event. The 
IRQ of that card also being pinned/isolated to that processor. This is a 
very common thing in the RT world.


Mark


The key point being that effective use of NO_HZ_FULL requires
careful configuration and complete understanding of your workload.
And it is quite possible that you instead need to use something
other than NO_HZ_FULL.

If your question is instead "why must interrupts be disabled during
context tracking", I must defer to people who understand the x86
entry/exit code paths better than I do.

Thanx, Paul

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-12 Thread Mark Hounschell


On 08/12/2016 05:37 AM, Jiri Kosina wrote:

On Thu, 11 Aug 2016, Mark Hounschell wrote:


I just tested what is currently in Linus' tree and it does NOT work for
me.


Is there some minimalized reproducer you are seeing the regression with
that you could share?

Thanks,



Your patch is NOT there yet. There is no reference to NODELAY in the 
floppy driver of Linus tree that I checked out yesterday.


Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-12 Thread Mark Hounschell


On 08/12/2016 05:37 AM, Jiri Kosina wrote:

On Thu, 11 Aug 2016, Mark Hounschell wrote:


I just tested what is currently in Linus' tree and it does NOT work for
me.


Is there some minimalized reproducer you are seeing the regression with
that you could share?

Thanks,



Your patch is NOT there yet. There is no reference to NODELAY in the 
floppy driver of Linus tree that I checked out yesterday.


Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-11 Thread Mark Hounschell


On 08/11/2016 09:24 AM, Jiri Kosina wrote:

On Wed, 3 Aug 2016, Mark Hounschell wrote:


I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git.


It's there.


A patch for 4.5 would be easy for me though.


Anyway the commit landed in Linus' tree already (ff06db1ef). Testing it in
your environment would be appreciated.

Thanks,



I just tested what is currently in Linus' tree and it does NOT work for me.

Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-11 Thread Mark Hounschell


On 08/11/2016 09:24 AM, Jiri Kosina wrote:

On Wed, 3 Aug 2016, Mark Hounschell wrote:


I'm not sure how to get "for-linus" branch. I don't see it in linux-block.git.


It's there.


A patch for 4.5 would be easy for me though.


Anyway the commit landed in Linus' tree already (ff06db1ef). Testing it in
your environment would be appreciated.

Thanks,



I just tested what is currently in Linus' tree and it does NOT work for me.

Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-03 Thread Mark Hounschell


On 08/02/2016 05:44 AM, Jiri Kosina wrote:

On Wed, 13 Jul 2016, Mark Hounschell wrote:


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that
it has worked this way, but I agree with you that we can't ignore the
fact that there is userspace relying on this behavior.


I'm not supplementing anything. The driver _did_ this on its own.


I mean, you're passing O_NDELAY to open(/dev/fd0) exactly to avoid kernel
issuing check_disk_change(). That's the only semantics O_NDELAY has for
fd.


I just expect to be able to open the drive to get a handle without the
kernel attempting to access the media. My apps manage a disk_change on
their own. I don't think its check_disk_change that gives me my pain.
There is some probe happening that fails when a floppy is installed that
is not a "standard" format. That causes the open to fail which is the
most pain. Still I should be able to get a handle without any media or
even unrecognized media installed.


Yeah, that's check_disk_change().


The original behavior of the floppy driver was correct. I have no
idea what BUG these changes were supposed to fix but the "fix"
obviously broke user land. Was this bug reported by some new ROBOT
test or something? The kernel floppy driver has been stable for
years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist
(because they are much faster than a real hardware, and the different
timing reveals bugs that were not visible before).


Forgive me here as I'm ignorant about why any virtualized floppy would
require the real physical kernel floppy driver to be involved at all.


Because VMs (such as qemu) actually do emulate a FDC on a hardware level,
but don't emulate the timings of the real hardware (which are not mandated
by the spec, but "are just there").


This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.


I would be happy to do some testing for you if needed. At least with regard to
our apps.


Could you please check whether my last patch that Jens queued in
linux-block.git ("floppy: fix open(O_ACCMODE) for ioctl-only open" in
for-linus branch) remedies at least some of the issues you are seeing?



I'm not sure how to get "for-linus" branch. I don't see it in 
linux-block.git. A patch for 4.5 would be easy for me though.


Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-08-03 Thread Mark Hounschell


On 08/02/2016 05:44 AM, Jiri Kosina wrote:

On Wed, 13 Jul 2016, Mark Hounschell wrote:


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that
it has worked this way, but I agree with you that we can't ignore the
fact that there is userspace relying on this behavior.


I'm not supplementing anything. The driver _did_ this on its own.


I mean, you're passing O_NDELAY to open(/dev/fd0) exactly to avoid kernel
issuing check_disk_change(). That's the only semantics O_NDELAY has for
fd.


I just expect to be able to open the drive to get a handle without the
kernel attempting to access the media. My apps manage a disk_change on
their own. I don't think its check_disk_change that gives me my pain.
There is some probe happening that fails when a floppy is installed that
is not a "standard" format. That causes the open to fail which is the
most pain. Still I should be able to get a handle without any media or
even unrecognized media installed.


Yeah, that's check_disk_change().


The original behavior of the floppy driver was correct. I have no
idea what BUG these changes were supposed to fix but the "fix"
obviously broke user land. Was this bug reported by some new ROBOT
test or something? The kernel floppy driver has been stable for
years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist
(because they are much faster than a real hardware, and the different
timing reveals bugs that were not visible before).


Forgive me here as I'm ignorant about why any virtualized floppy would
require the real physical kernel floppy driver to be involved at all.


Because VMs (such as qemu) actually do emulate a FDC on a hardware level,
but don't emulate the timings of the real hardware (which are not mandated
by the spec, but "are just there").


This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.


I would be happy to do some testing for you if needed. At least with regard to
our apps.


Could you please check whether my last patch that Jens queued in
linux-block.git ("floppy: fix open(O_ACCMODE) for ioctl-only open" in
for-linus branch) remedies at least some of the issues you are seeing?



I'm not sure how to get "for-linus" branch. I don't see it in 
linux-block.git. A patch for 4.5 would be easy for me though.


Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-07-13 Thread Mark Hounschell


On 07/12/2016 04:54 AM, Jiri Kosina wrote:

On Mon, 11 Jul 2016, Mark Hounschell wrote:


Well, all that was specified in my original post. I can no longer open the
floppy drive with no floppy media inserted. Worse, I can also no longer open a
floppy with media inserted that is not a "linux" recognized format. A floppy
drive is a removable media device and should be treated as such. The original
implementation of the O_NDELAY flag allowed it to be.

Any removable media device should be capable of being opened with no, or even
unrecognizable media installed. The kernel and its utilities should not
"assume" to much when it comes to removable media. Consider a SCSI tape drive
or even a removable media SCSI disk drive. How would you explain an open
failure to someone trying to open a SCSI tape drive that had no tape or even a
"non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write protected
but didn't include O_RDONLY in the open?


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that it
has worked this way, but I agree with you that we can't ignore the fact
that there is userspace relying on this behavior.



I'm not supplementing anything. The driver _did_ this on its own. I just 
expect to be able to open the drive to get a handle without the kernel 
attempting to access the media. My apps manage a disk_change on their 
own. I don't think its check_disk_change that gives me my pain. There is 
some probe happening that fails when a floppy is installed that is not a 
"standard" format. That causes the open to fail which is the most pain. 
Still I should be able to get a handle without any media or even 
unrecognized media installed.


Funny, though even fdformat from the linux-utils package won't allow me 
to format a floppy that is NOT already formatted in a supported format. 
Once I format a floppy to an other than standard format, fdformat will 
not allow me to reformat it back to a standard format. Doesn't make much 
sense does it? "Unable to format a floppy that is not already 
formatted??" That is another issue though.



The original behavior of the floppy driver was correct. I have no idea
what BUG these changes were supposed to fix but the "fix" obviously
broke user land. Was this bug reported by some new ROBOT test or
something? The kernel floppy driver has been stable for years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist (because
they are much faster than a real hardware, and the different timing
reveals bugs that were not visible before).



Forgive me here as I'm ignorant about why any virtualized floppy would 
require the real physical kernel floppy driver to be involved at all. We 
also do virtualized floppies in our user land apps but we certainly 
don't require any kernel floppy driver support to do it?



This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.



I would be happy to do some testing for you if needed. At least with 
regard to our apps.


Regards
Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-07-13 Thread Mark Hounschell


On 07/12/2016 04:54 AM, Jiri Kosina wrote:

On Mon, 11 Jul 2016, Mark Hounschell wrote:


Well, all that was specified in my original post. I can no longer open the
floppy drive with no floppy media inserted. Worse, I can also no longer open a
floppy with media inserted that is not a "linux" recognized format. A floppy
drive is a removable media device and should be treated as such. The original
implementation of the O_NDELAY flag allowed it to be.

Any removable media device should be capable of being opened with no, or even
unrecognizable media installed. The kernel and its utilities should not
"assume" to much when it comes to removable media. Consider a SCSI tape drive
or even a removable media SCSI disk drive. How would you explain an open
failure to someone trying to open a SCSI tape drive that had no tape or even a
"non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write protected
but didn't include O_RDONLY in the open?


Alright, so you are basically supplementing O_NDELAY flag in order to
avoid check_disk_change() being called. It's rather a coincidence that it
has worked this way, but I agree with you that we can't ignore the fact
that there is userspace relying on this behavior.



I'm not supplementing anything. The driver _did_ this on its own. I just 
expect to be able to open the drive to get a handle without the kernel 
attempting to access the media. My apps manage a disk_change on their 
own. I don't think its check_disk_change that gives me my pain. There is 
some probe happening that fails when a floppy is installed that is not a 
"standard" format. That causes the open to fail which is the most pain. 
Still I should be able to get a handle without any media or even 
unrecognized media installed.


Funny, though even fdformat from the linux-utils package won't allow me 
to format a floppy that is NOT already formatted in a supported format. 
Once I format a floppy to an other than standard format, fdformat will 
not allow me to reformat it back to a standard format. Doesn't make much 
sense does it? "Unable to format a floppy that is not already 
formatted??" That is another issue though.



The original behavior of the floppy driver was correct. I have no idea
what BUG these changes were supposed to fix but the "fix" obviously
broke user land. Was this bug reported by some new ROBOT test or
something? The kernel floppy driver has been stable for years now


That's not really true; the code is a racy mess, and this is being
uncovered only when virtualized floppy devices started to exist (because
they are much faster than a real hardware, and the different timing
reveals bugs that were not visible before).



Forgive me here as I'm ignorant about why any virtualized floppy would 
require the real physical kernel floppy driver to be involved at all. We 
also do virtualized floppies in our user land apps but we certainly 
don't require any kernel floppy driver support to do it?



This particular fix was because syzkaller found a way how easily corrupt
kernel memory using O_NDELAY to floppy driver; see

https://lkml.org/lkml/2016/2/2/848


so I am really confused as to why these changes were induced.


The floppy driver is in an orphan mode; no new "features" are added "just
because". Everything that's happening there is to fix real bugs in the
kernel.

I'll look into ways how to fix this, but I am afraid this is going to be
really tricky. Therefore we'd have to very likely proceed asap with revert
of 09954bad448 and coming up with a workaround that'd still avoid the bug
reported by syzkaller.



I would be happy to do some testing for you if needed. At least with 
regard to our apps.


Regards
Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-07-11 Thread Mark Hounschell


On 07/11/2016 11:36 AM, Jiri Kosina wrote:

On Tue, 5 Jul 2016, Mark Hounschell wrote:


From: Jiri Kosina <jkos...@suse.cz>

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt <w...@djo.tudelft.nl>
Tested-by: Wim Osterholt <w...@djo.tudelft.nl>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

[ ... snip ... ]


But this does not completely fix all the problems induced by the original
changes from 4.4 to 4.5. The following is what we use to open the floppy.

fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an open of
the device with no media inserted. It also prevents one from doing an open of
the device with media inserted that is not already formatted in a "standard"
format.  I do both of these things a lot. I deal with a few very non-standard
formats and this change prevents me from doing what I've been doing for YEARS.
Could we please get the original behavior back in the floppy driver.


Hi Mark,

thanks for the regression report.

For my better understanding of your issue -- what behavior/semantics
exactly does your userspace think it'll be getting from opening /dev/fd0
with O_NDELAY?

Thanks,



Hi Jiri.

Well, all that was specified in my original post. I can no longer open 
the floppy drive with no floppy media inserted. Worse, I can also no 
longer open a floppy with media inserted that is not a "linux" 
recognized format. A floppy drive is a removable media device and should 
be treated as such. The original implementation of the O_NDELAY flag 
allowed it to be.


Any removable media device should be capable of being opened with no, or 
even unrecognizable media installed. The kernel and its utilities should 
not "assume" to much when it comes to removable media. Consider a SCSI 
tape drive or even a removable media SCSI disk drive. How would you 
explain an open failure to someone trying to open a SCSI tape drive that 
had no tape or even a "non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write 
protected but didn't include O_RDONLY in the open?


The original behavior of the floppy driver was correct. I have no idea 
what BUG these changes were supposed to fix but the "fix" obviously 
broke user land. Was this bug reported by some new ROBOT test or 
something? The kernel floppy driver has been stable for years now so I 
am really confused as to why these changes were induced.


As for the "O_RDONLY | O_WRONLY" thing you decided to change back, which 
I'm happy to see, was wrong. Almost ALL removable media devices have W/R 
protection built into the media. For ever, I understood that it was MY 
responsibility to write protect my removable media. An open of a 
removable device should never even care about that stuff. It is the 
users responsibility.


We use extensively, the FDRAWCMD ioctl API. It is totally borked now for 
us without maintaining our own kernel patch that reverts the changes 
from 4.4 to 4.5.


Regards
Mark

Re: Resend: Another 4.4 to 4.5 floppy issue

2016-07-11 Thread Mark Hounschell


On 07/11/2016 11:36 AM, Jiri Kosina wrote:

On Tue, 5 Jul 2016, Mark Hounschell wrote:


From: Jiri Kosina 

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
---

[ ... snip ... ]


But this does not completely fix all the problems induced by the original
changes from 4.4 to 4.5. The following is what we use to open the floppy.

fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an open of
the device with no media inserted. It also prevents one from doing an open of
the device with media inserted that is not already formatted in a "standard"
format.  I do both of these things a lot. I deal with a few very non-standard
formats and this change prevents me from doing what I've been doing for YEARS.
Could we please get the original behavior back in the floppy driver.


Hi Mark,

thanks for the regression report.

For my better understanding of your issue -- what behavior/semantics
exactly does your userspace think it'll be getting from opening /dev/fd0
with O_NDELAY?

Thanks,



Hi Jiri.

Well, all that was specified in my original post. I can no longer open 
the floppy drive with no floppy media inserted. Worse, I can also no 
longer open a floppy with media inserted that is not a "linux" 
recognized format. A floppy drive is a removable media device and should 
be treated as such. The original implementation of the O_NDELAY flag 
allowed it to be.


Any removable media device should be capable of being opened with no, or 
even unrecognizable media installed. The kernel and its utilities should 
not "assume" to much when it comes to removable media. Consider a SCSI 
tape drive or even a removable media SCSI disk drive. How would you 
explain an open failure to someone trying to open a SCSI tape drive that 
had no tape or even a "non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device the was write 
protected but didn't include O_RDONLY in the open?


The original behavior of the floppy driver was correct. I have no idea 
what BUG these changes were supposed to fix but the "fix" obviously 
broke user land. Was this bug reported by some new ROBOT test or 
something? The kernel floppy driver has been stable for years now so I 
am really confused as to why these changes were induced.


As for the "O_RDONLY | O_WRONLY" thing you decided to change back, which 
I'm happy to see, was wrong. Almost ALL removable media devices have W/R 
protection built into the media. For ever, I understood that it was MY 
responsibility to write protect my removable media. An open of a 
removable device should never even care about that stuff. It is the 
users responsibility.


We use extensively, the FDRAWCMD ioctl API. It is totally borked now for 
us without maintaining our own kernel patch that reverts the changes 
from 4.4 to 4.5.


Regards
Mark

Resend: Another 4.4 to 4.5 floppy issue

2016-07-05 Thread Mark Hounschell

Just rejoined the list due to floppy open problems created from 4.4 to 
4.5. I found the following email that indicates a fix for one of the 
problems.




From: Jiri Kosina 

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
---

Jens, this should preferably go into 4.7-rcX and to -stable as well.

 drivers/block/floppy.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 84708a5..a1dcf12 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, 
fmode_t mode)


opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device 
*bdev, fmode_t mode)

if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;

--
Jiri Kosina
SUSE Labs





But this does not completely fix all the problems induced by the 
original changes from 4.4 to 4.5. The following is what we use to open 
the floppy.


fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an 
open of the device with no media inserted. It also prevents one from 
doing an open of the device with media inserted that is not already 
formatted in a "standard" format.  I do both of these things a lot. I 
deal with a few very non-standard formats and this change prevents me 
from doing what I've been doing for YEARS. Could we please get the 
original behavior back in the floppy driver.


Thanks and regards
Mark

Resend: Another 4.4 to 4.5 floppy issue

2016-07-05 Thread Mark Hounschell

Just rejoined the list due to floppy open problems created from 4.4 to 
4.5. I found the following email that indicates a fix for one of the 
problems.




From: Jiri Kosina 

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
---

Jens, this should preferably go into 4.7-rcX and to -stable as well.

 drivers/block/floppy.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 84708a5..a1dcf12 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, 
fmode_t mode)


opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device 
*bdev, fmode_t mode)

if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;

--
Jiri Kosina
SUSE Labs





But this does not completely fix all the problems induced by the 
original changes from 4.4 to 4.5. The following is what we use to open 
the floppy.


fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an 
open of the device with no media inserted. It also prevents one from 
doing an open of the device with media inserted that is not already 
formatted in a "standard" format.  I do both of these things a lot. I 
deal with a few very non-standard formats and this change prevents me 
from doing what I've been doing for YEARS. Could we please get the 
original behavior back in the floppy driver.


Thanks and regards
Mark

Another 4.4 to 4.5 floppy issue

2016-07-05 Thread Mark Hounschell

Just rejoined the list due to floppy open problems created from 4.4 to 
4.5. I found the following email that indicates a fix for one of the 
problems.




From: Jiri Kosina 

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
---

Jens, this should preferably go into 4.7-rcX and to -stable as well.

 drivers/block/floppy.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 84708a5..a1dcf12 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, 
fmode_t mode)


opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device 
*bdev, fmode_t mode)

if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;

--
Jiri Kosina
SUSE Labs





But this does not completely fix all the problems induced by the 
original changes from 4.4 to 4.5. The following is what we use to open 
the floppy.


fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an 
open of the device with no media inserted. It also prevents one from 
doing an open of the device with media inserted that is not already 
formatted in a "standard" format.  I do both of these things a lot. I 
deal with a few very non-standard formats and this change prevents me 
from doing what I've been doing for YEARS. Could we please get the 
original behavior back in the floppy driver.


Thanks and regards
Mark

Another 4.4 to 4.5 floppy issue

2016-07-05 Thread Mark Hounschell

Just rejoined the list due to floppy open problems created from 4.4 to 
4.5. I found the following email that indicates a fix for one of the 
problems.




From: Jiri Kosina 

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().

Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: sta...@vger.kernel.org # v4.5+
Reported-by: Wim Osterholt 
Tested-by: Wim Osterholt 
Signed-off-by: Jiri Kosina 
---

Jens, this should preferably go into 4.7-rcX and to -stable as well.

 drivers/block/floppy.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 84708a5..a1dcf12 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3663,11 +3663,6 @@ static int floppy_open(struct block_device *bdev, 
fmode_t mode)


opened_bdev[drive] = bdev;

-   if (!(mode & (FMODE_READ|FMODE_WRITE))) {
-   res = -EINVAL;
-   goto out;
-   }
-
res = -ENXIO;

if (!floppy_track_buffer) {
@@ -3711,13 +3706,15 @@ static int floppy_open(struct block_device 
*bdev, fmode_t mode)

if (UFDCS->rawcmd == 1)
UFDCS->rawcmd = 2;

-   UDRS->last_checked = 0;
-   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
-   check_disk_change(bdev);
-   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
-   goto out;
-   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
-   goto out;
+   if (mode & (FMODE_READ|FMODE_WRITE)) {
+   UDRS->last_checked = 0;
+   clear_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags);
+   check_disk_change(bdev);
+   if (test_bit(FD_DISK_CHANGED_BIT, >flags))
+   goto out;
+   if (test_bit(FD_OPEN_SHOULD_FAIL_BIT, >flags))
+   goto out;
+   }

res = -EROFS;

--
Jiri Kosina
SUSE Labs





But this does not completely fix all the problems induced by the 
original changes from 4.4 to 4.5. The following is what we use to open 
the floppy.


fd = open(device,  O_RDWR | O_NDELAY);

The FMODE_NDELAY check that was removed now prevents one from doing an 
open of the device with no media inserted. It also prevents one from 
doing an open of the device with media inserted that is not already 
formatted in a "standard" format.  I do both of these things a lot. I 
deal with a few very non-standard formats and this change prevents me 
from doing what I've been doing for YEARS. Could we please get the 
original behavior back in the floppy driver.


Thanks and regards
Mark

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell


On 03/26/2015 11:52 AM, Joerg Roedel wrote:

Hi Mark,

On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote:

Sorry but CMA was still badly broken. I have a patch below that works.


In which way is it broken? What happens when you try to allocate memory
with dma_alloc_coherent?



I got Oops on both the alloc and free and I had to hit the reset button.


I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the "git diff" from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.


The main differences I can spot are that you change the order (first
CMA, then buddy) and you manually align the input size. I can see the
reason for the later, but why does CMA need to be tried first?



I believe that is how dma_alloc_coherent works. If  CMA is present it 
uses it. Even for small allocations. If not present then it does the 
best it can. That patch was derived by looking at the Intel iommu driver 
works properly. Maybe you should take a look at that.



Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.


It is not a BUG, the interface definition for dma_alloc_coherent does
not specify that it can allocate infinite amounts of memory. So this
patch does not qualify for stable.




I don't care what the "the interface definition for dma_alloc_coherent" 
says. If it does not describe it's use with CMA, it is outdated. Maybe 
the "interface definition for dma_alloc_coherent" was done before CMA 
was implemented.


Maybe you should be looking at CMA Documentation.  Unfortunately there 
does not yet seem to be any in the Documentation dir. Probably because 
CMA is supposed to be transparent to the DMA API. You know, one should 
not need to details about it, etc...


It is a BUG. To access CMA you use dma_alloc_coherent. You have 
completely highjacked that function.  You have broken CMA. PERIOD. That 
is a BUG.


Regards
Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell

Hi Joerg,

On 03/26/2015 08:45 AM, Joerg Roedel wrote:
> On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote:
>> BTW, so far the first 2 patches are working well. I was going to
>> wait until the end of the day to report but so far I have been
>> unable to produce the problems I was seeing. And I am in the middle
>> of some driver work so lots of unloading/loading going on.
> 
> Great, thanks. Please let me know when you have test results for the
> other patches too.
> 
> 
>   Joerg
 
Sorry but CMA was still badly broken. I have a patch below that works.
I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the "git diff" from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.

Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.

Regards
Mark 

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a0197d0..5ea4fed 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2917,28 +2917,36 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
u64 dma_mask = dev->coherent_dma_mask;
struct protection_domain *domain;
unsigned long flags;
-   struct page *page;
+   struct page *page = 0;
+   int order;
+   unsigned int count;
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
+   count = size >> PAGE_SHIFT;
 
INC_STATS_COUNTER(cnt_alloc_coherent);
 
domain = get_domain(dev);
if (PTR_ERR(domain) == -EINVAL) {
-   page = alloc_pages(flag, get_order(size));
+   page = alloc_pages(flag, order);
*dma_addr = page_to_phys(page);
return page_address(page);
} else if (IS_ERR(domain))
return NULL;
 
-   dma_mask  = dev->coherent_dma_mask;
flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+   flag |= __GFP_ZERO;
 
-   page = alloc_pages(flag, get_order(size));
+   if (flag & __GFP_WAIT) {
+   page = dma_alloc_from_contiguous(dev, count, order);
+   if (page && page_to_phys(page) + size > dma_mask) {
+   dma_release_from_contiguous(dev, page, count);
+   page = NULL;
+   }
+   }
if (!page) {
-   if (!(flag & __GFP_WAIT))
-   return NULL;
-
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-get_order(size));
+   page = alloc_pages(flag, order);
if (!page)
return NULL;
}
@@ -2951,7 +2959,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
*dma_addr = __map_single(dev, domain->priv, page_to_phys(page),
 size, DMA_BIDIRECTIONAL, true, dma_mask);
 
-   if (*dma_addr == DMA_ERROR_CODE) {
+   if (!dma_addr || (*dma_addr == DMA_ERROR_CODE)) {
spin_unlock_irqrestore(>lock, flags);
goto out_free;
}
@@ -2965,7 +2973,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
 out_free:
 
if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
-   __free_pages(page, get_order(size));
+   __free_pages(page, order);
 
return NULL;
 }
@@ -2979,6 +2987,11 @@ static void free_coherent(struct device *dev, size_t 
size,
 {
unsigned long flags;
struct protection_domain *domain;
+   int order;
+   struct page *page = virt_to_page(virt_addr);
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
 
INC_STATS_COUNTER(cnt_free_coherent);
 
@@ -2995,7 +3008,9 @@ static void free_coherent(struct device *dev, size_t size,
spin_unlock_irqrestore(>lock, flags);
 
 free_mem:
-   free_pages((unsigned long)virt_addr, get_order(size));
+   if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
+   __free_pages(page, order);
+
 }
 
 /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell


On 03/26/2015 11:52 AM, Joerg Roedel wrote:

Hi Mark,

On Thu, Mar 26, 2015 at 10:58:15AM -0400, Mark Hounschell wrote:

Sorry but CMA was still badly broken. I have a patch below that works.


In which way is it broken? What happens when you try to allocate memory
with dma_alloc_coherent?



I got Oops on both the alloc and free and I had to hit the reset button.


I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the git diff from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.


The main differences I can spot are that you change the order (first
CMA, then buddy) and you manually align the input size. I can see the
reason for the later, but why does CMA need to be tried first?



I believe that is how dma_alloc_coherent works. If  CMA is present it 
uses it. Even for small allocations. If not present then it does the 
best it can. That patch was derived by looking at the Intel iommu driver 
works properly. Maybe you should take a look at that.



Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.


It is not a BUG, the interface definition for dma_alloc_coherent does
not specify that it can allocate infinite amounts of memory. So this
patch does not qualify for stable.




I don't care what the the interface definition for dma_alloc_coherent 
says. If it does not describe it's use with CMA, it is outdated. Maybe 
the interface definition for dma_alloc_coherent was done before CMA 
was implemented.


Maybe you should be looking at CMA Documentation.  Unfortunately there 
does not yet seem to be any in the Documentation dir. Probably because 
CMA is supposed to be transparent to the DMA API. You know, one should 
not need to details about it, etc...


It is a BUG. To access CMA you use dma_alloc_coherent. You have 
completely highjacked that function.  You have broken CMA. PERIOD. That 
is a BUG.


Regards
Mark
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-26 Thread Mark Hounschell

Hi Joerg,

On 03/26/2015 08:45 AM, Joerg Roedel wrote:
 On Wed, Mar 25, 2015 at 12:36:34PM -0400, Mark Hounschell wrote:
 BTW, so far the first 2 patches are working well. I was going to
 wait until the end of the day to report but so far I have been
 unable to produce the problems I was seeing. And I am in the middle
 of some driver work so lots of unloading/loading going on.
 
 Great, thanks. Please let me know when you have test results for the
 other patches too.
 
 
   Joerg
 
Sorry but CMA was still badly broken. I have a patch below that works.
I've tested it with small (no CMA) and large (with CMA) dma's using
dma_alloc_coherent. The patch below is just the git diff from your
cloned tree piped to a file then copied into this email. If you require
an official patch I can send one. Just let me know.

Also, in my opinion, this CMA thing is clearly a BUG not a feature
request. The AMD iommu clearly breaks CMA. I feel what ever fix
you are happy with should be back ported to stable.

Regards
Mark 

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a0197d0..5ea4fed 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2917,28 +2917,36 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
u64 dma_mask = dev-coherent_dma_mask;
struct protection_domain *domain;
unsigned long flags;
-   struct page *page;
+   struct page *page = 0;
+   int order;
+   unsigned int count;
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
+   count = size  PAGE_SHIFT;
 
INC_STATS_COUNTER(cnt_alloc_coherent);
 
domain = get_domain(dev);
if (PTR_ERR(domain) == -EINVAL) {
-   page = alloc_pages(flag, get_order(size));
+   page = alloc_pages(flag, order);
*dma_addr = page_to_phys(page);
return page_address(page);
} else if (IS_ERR(domain))
return NULL;
 
-   dma_mask  = dev-coherent_dma_mask;
flag = ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+   flag |= __GFP_ZERO;
 
-   page = alloc_pages(flag, get_order(size));
+   if (flag  __GFP_WAIT) {
+   page = dma_alloc_from_contiguous(dev, count, order);
+   if (page  page_to_phys(page) + size  dma_mask) {
+   dma_release_from_contiguous(dev, page, count);
+   page = NULL;
+   }
+   }
if (!page) {
-   if (!(flag  __GFP_WAIT))
-   return NULL;
-
-   page = dma_alloc_from_contiguous(dev, size  PAGE_SHIFT,
-get_order(size));
+   page = alloc_pages(flag, order);
if (!page)
return NULL;
}
@@ -2951,7 +2959,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
*dma_addr = __map_single(dev, domain-priv, page_to_phys(page),
 size, DMA_BIDIRECTIONAL, true, dma_mask);
 
-   if (*dma_addr == DMA_ERROR_CODE) {
+   if (!dma_addr || (*dma_addr == DMA_ERROR_CODE)) {
spin_unlock_irqrestore(domain-lock, flags);
goto out_free;
}
@@ -2965,7 +2973,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
 out_free:
 
if (!dma_release_from_contiguous(dev, page, size  PAGE_SHIFT))
-   __free_pages(page, get_order(size));
+   __free_pages(page, order);
 
return NULL;
 }
@@ -2979,6 +2987,11 @@ static void free_coherent(struct device *dev, size_t 
size,
 {
unsigned long flags;
struct protection_domain *domain;
+   int order;
+   struct page *page = virt_to_page(virt_addr);
+
+   size = PAGE_ALIGN(size);
+   order = get_order(size);
 
INC_STATS_COUNTER(cnt_free_coherent);
 
@@ -2995,7 +3008,9 @@ static void free_coherent(struct device *dev, size_t size,
spin_unlock_irqrestore(domain-lock, flags);
 
 free_mem:
-   free_pages((unsigned long)virt_addr, get_order(size));
+   if (!dma_release_from_contiguous(dev, page, size  PAGE_SHIFT))
+   __free_pages(page, order);
+
 }
 
 /*

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Mark Hounschell


On 03/25/2015 11:13 AM, Joerg Roedel wrote:

Hi again,

On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote:

On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:

I'll be happy to test it.


Okay, here you go:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu


I just added more patches to implement the use of the contiguous memory
allocator for the dma_alloc_coherent path. Since you asked for it, can
you test this as well?




Sure. No problem. If I update my tree from the above, will I get it.

BTW, so far the first 2 patches are working well. I was going to wait 
until the end of the day to report but so far I have been unable to 
produce the problems I was seeing. And I am in the middle of some driver 
work so lots of unloading/loading going on.


Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-25 Thread Mark Hounschell


On 03/25/2015 11:13 AM, Joerg Roedel wrote:

Hi again,

On Wed, Mar 25, 2015 at 02:59:37PM +0100, Joerg Roedel wrote:

On Tue, Mar 24, 2015 at 07:57:49AM -0400, Mark Hounschell wrote:

I'll be happy to test it.


Okay, here you go:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git amd-iommu


I just added more patches to implement the use of the contiguous memory
allocator for the dma_alloc_coherent path. Since you asked for it, can
you test this as well?




Sure. No problem. If I update my tree from the above, will I get it.

BTW, so far the first 2 patches are working well. I was going to wait 
until the end of the day to report but so far I have been unable to 
produce the problems I was seeing. And I am in the middle of some driver 
work so lots of unloading/loading going on.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-24 Thread Mark Hounschell


On 03/23/2015 11:03 AM, Joerg Roedel wrote:

Hi Mark,

On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote:

It looks like this problem is NOT a bug with the SCSI aic7xxx driver
after all. I can duplicate this BUG very easily with other hardware.
Simply removing a driver module  (whether it its self, has actually
used any of the DMA API or not) that is sitting on the same pci bus
as a card that is actually using DMA will cause this. And that card
that is in use and using DMA will no longer function. It "looks and
feels" like unloading a module causes the IOMMU to improperly unmap
valid mappings.


You are right, I looked into the code and found the problem. I'll post a
fix for testing this week.



I'll be happy to test it.

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-24 Thread Mark Hounschell


On 03/23/2015 11:03 AM, Joerg Roedel wrote:

Hi Mark,

On Tue, Mar 03, 2015 at 02:36:19PM -0500, Mark Hounschell wrote:

It looks like this problem is NOT a bug with the SCSI aic7xxx driver
after all. I can duplicate this BUG very easily with other hardware.
Simply removing a driver module  (whether it its self, has actually
used any of the DMA API or not) that is sitting on the same pci bus
as a card that is actually using DMA will cause this. And that card
that is in use and using DMA will no longer function. It looks and
feels like unloading a module causes the IOMMU to improperly unmap
valid mappings.


You are right, I looked into the code and found the problem. I'll post a
fix for testing this week.



I'll be happy to test it.

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-16 Thread Mark Hounschell


On 03/15/2015 08:07 AM, Mark Hounschell wrote:

On 03/14/2015 04:44 AM, Greg KH wrote:

On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote:

On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and
only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even
redundant
before this patch.

Signed-off-by: Giedrius Statkevičius

---
v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot
to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev
*pdev, int id)
  unsigned long flags;

  /* get the board structure and prep it */
-dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-brd = dgnc_Board[dgnc_NumBoards];
+brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole "static list of boards/cards" go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole
static
array mess.

So could you do that instead?  I don't want to take patches around
this
whole "board state" mess anymore, as it should all not be needed at
all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written
for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and
logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just
been so
swamped the last few months that I haven't been able to do ANYTHING
here.
Even work on the dgap driver that I was working on, I just haven't
had the
time. I can tell you that the dgnc driver does NOT work at all and
hasn't
worked since it was introduced into staging by Greg.


Really?  It's always been broken?  Why don't we just delete the thing?



Well, broken might not be the best word. I should have just said,
doesn't work for me.
It did work on a 3.4 kernel when I provided it to you. I have spent no
time as to why but will take a look on Monday.



The driver actually does work. It didn't work for me because the device 
entries were not as they used to be. So if you want to make a change you 
think needs testing, I can test it.

I only have a single card 8 port though.

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-16 Thread Mark Hounschell


On 03/15/2015 08:07 AM, Mark Hounschell wrote:

On 03/14/2015 04:44 AM, Greg KH wrote:

On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote:

On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and
only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even
redundant
before this patch.

Signed-off-by: Giedrius Statkevičius
giedrius.statkevic...@gmail.com
---
v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot
to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev
*pdev, int id)
  unsigned long flags;

  /* get the board structure and prep it */
-dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-brd = dgnc_Board[dgnc_NumBoards];
+brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole static list of boards/cards go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole
static
array mess.

So could you do that instead?  I don't want to take patches around
this
whole board state mess anymore, as it should all not be needed at
all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written
for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and
logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just
been so
swamped the last few months that I haven't been able to do ANYTHING
here.
Even work on the dgap driver that I was working on, I just haven't
had the
time. I can tell you that the dgnc driver does NOT work at all and
hasn't
worked since it was introduced into staging by Greg.


Really?  It's always been broken?  Why don't we just delete the thing?



Well, broken might not be the best word. I should have just said,
doesn't work for me.
It did work on a 3.4 kernel when I provided it to you. I have spent no
time as to why but will take a look on Monday.



The driver actually does work. It didn't work for me because the device 
entries were not as they used to be. So if you want to make a change you 
think needs testing, I can test it.

I only have a single card 8 port though.

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-15 Thread Mark Hounschell


On 03/14/2015 04:44 AM, Greg KH wrote:

On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote:

On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even redundant
before this patch.

Signed-off-by: Giedrius Statkevičius 
---
v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c 
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id)
unsigned long flags;

/* get the board structure and prep it */
-   dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-   brd = dgnc_Board[dgnc_NumBoards];
+   brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole "static list of boards/cards" go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole static
array mess.

So could you do that instead?  I don't want to take patches around this
whole "board state" mess anymore, as it should all not be needed at all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just been so
swamped the last few months that I haven't been able to do ANYTHING here.
Even work on the dgap driver that I was working on, I just haven't had the
time. I can tell you that the dgnc driver does NOT work at all and hasn't
worked since it was introduced into staging by Greg.


Really?  It's always been broken?  Why don't we just delete the thing?



Well, broken might not be the best word. I should have just said, 
doesn't work for me.
It did work on a 3.4 kernel when I provided it to you. I have spent no 
time as to why but will take a look on Monday.



Greg, while I'm here, how can we get Digi firmware into the kernel firmware
tree? The dgap card is useless without it and I suspect the dgnc is as well.


Submit a patch to the linux-firmware git tree with the firmware in it.



I don't know if the binary blobs are GPL compatable?

Regards
Mark


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-15 Thread Mark Hounschell


On 03/14/2015 04:44 AM, Greg KH wrote:

On Fri, Mar 13, 2015 at 04:55:55PM -0400, Mark Hounschell wrote:

On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even redundant
before this patch.

Signed-off-by: Giedrius Statkevičius giedrius.statkevic...@gmail.com
---
v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c 
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id)
unsigned long flags;

/* get the board structure and prep it */
-   dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-   brd = dgnc_Board[dgnc_NumBoards];
+   brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole static list of boards/cards go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole static
array mess.

So could you do that instead?  I don't want to take patches around this
whole board state mess anymore, as it should all not be needed at all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just been so
swamped the last few months that I haven't been able to do ANYTHING here.
Even work on the dgap driver that I was working on, I just haven't had the
time. I can tell you that the dgnc driver does NOT work at all and hasn't
worked since it was introduced into staging by Greg.


Really?  It's always been broken?  Why don't we just delete the thing?



Well, broken might not be the best word. I should have just said, 
doesn't work for me.
It did work on a 3.4 kernel when I provided it to you. I have spent no 
time as to why but will take a look on Monday.



Greg, while I'm here, how can we get Digi firmware into the kernel firmware
tree? The dgap card is useless without it and I suspect the dgnc is as well.


Submit a patch to the linux-firmware git tree with the firmware in it.



I don't know if the binary blobs are GPL compatable?

Regards
Mark


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-13 Thread Mark Hounschell


On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even redundant
before this patch.

Signed-off-by: Giedrius Statkevičius 
---
v2: Remove "brd = dgnc_Board[dgnc_NumBoards];" line which I forgot to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c 
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id)
unsigned long flags;

/* get the board structure and prep it */
-   dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-   brd = dgnc_Board[dgnc_NumBoards];
+   brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole "static list of boards/cards" go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole static
array mess.

So could you do that instead?  I don't want to take patches around this
whole "board state" mess anymore, as it should all not be needed at all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just 
been so swamped the last few months that I haven't been able to do 
ANYTHING here. Even work on the dgap driver that I was working on, I 
just haven't had the time. I can tell you that the dgnc driver does NOT 
work at all and hasn't worked since it was introduced into staging by Greg.


Greg, while I'm here, how can we get Digi firmware into the kernel 
firmware tree? The dgap card is useless without it and I suspect the 
dgnc is as well.


Regards
mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] dgnc: Don't save boards in memory that have failed to initialize

2015-03-13 Thread Mark Hounschell


On 03/12/2015 12:14 PM, Giedrius Statkevičius wrote:

On 2015.03.12 12:08, Greg KH wrote:

On Mon, Mar 09, 2015 at 06:29:38PM +0200, Giedrius Statkevičius wrote:

Remove BOARD_FAILED and don't save dgnc_boards which failed to
initialize.

Assign the result of kzalloc() to brd in dgnc_found_board() and only put
it in the dgnc_Board[] if it successfully initializes. Also, remove
BOARD_FAILED enum and all ifs that check for it. Finally, remove one
final place where state was set to BOARD_FAILED which was even redundant
before this patch.

Signed-off-by: Giedrius Statkevičius giedrius.statkevic...@gmail.com
---
v2: Remove brd = dgnc_Board[dgnc_NumBoards]; line which I forgot to do
in the first version

  drivers/staging/dgnc/dgnc_driver.c | 20 ++--
  drivers/staging/dgnc/dgnc_driver.h |  3 +--
  drivers/staging/dgnc/dgnc_mgmt.c   |  5 +
  drivers/staging/dgnc/dgnc_tty.c|  8 
  4 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/dgnc/dgnc_driver.c 
b/drivers/staging/dgnc/dgnc_driver.c
index fa1ee79..075727d 100644
--- a/drivers/staging/dgnc/dgnc_driver.c
+++ b/drivers/staging/dgnc/dgnc_driver.c
@@ -401,8 +401,7 @@ static int dgnc_found_board(struct pci_dev *pdev, int id)
unsigned long flags;

/* get the board structure and prep it */
-   dgnc_Board[dgnc_NumBoards] = kzalloc(sizeof(*brd), GFP_KERNEL);
-   brd = dgnc_Board[dgnc_NumBoards];
+   brd = kzalloc(sizeof(*brd), GFP_KERNEL);


You've done a great job here, but...

Yeah, sorry...

I really want to see this whole static list of boards/cards go away.
There should not be any need for that in any in-kernel driver.  Your
patch here is a sign that things are really wrong with this whole static
array mess.

So could you do that instead?  I don't want to take patches around this
whole board state mess anymore, as it should all not be needed at all.

If you need pointers on what needs to be done here, just let me know.

thanks,

greg k-h



I can try :) But my main concern is the lack of dgnc driver maintainers
activity and that I don't own the hardware this driver is written for as
it's quite expensive (cheapest cards I've found start at 300$~) and I
can't afford it ATM. But I guess if I keep the patches small and logical
everything will be okay.



I am on the maintainers list and actually have hardware. I have just 
been so swamped the last few months that I haven't been able to do 
ANYTHING here. Even work on the dgap driver that I was working on, I 
just haven't had the time. I can tell you that the dgnc driver does NOT 
work at all and hasn't worked since it was introduced into staging by Greg.


Greg, while I'm here, how can we get Digi firmware into the kernel 
firmware tree? The dgap card is useless without it and I suspect the 
dgnc is as well.


Regards
mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-03 Thread Mark Hounschell


Hi Joerg,

It looks like this problem is NOT a bug with the SCSI aic7xxx driver 
after all. I can duplicate this BUG very easily with other hardware. 
Simply removing a driver module  (whether it its self, has actually used 
any of the DMA API or not) that is sitting on the same pci bus as a card 
that is actually using DMA will cause this. And that card that is in use 
and using DMA will no longer function. It "looks and feels" like 
unloading a module causes the IOMMU to improperly unmap valid mappings.


Regards
Mark

> Kernel 3.18.7 (x86_64) with an enabled IOMMU. This happens 
immediately after

unloading the aic7xxx driver. In fact this happens unloading any module 
associated
with any pci card on the same bus as the Adaptec AHA-2930CU card. If I blacklist
the aic7xxx driver the problem never shows.  If I disable the IOMMU, the problem
never shows.

This was brought up over at io...@lists.linux-foundation.org in this thread.
http://lists.linuxfoundation.org/pipermail/iommu/2015-February/012274.html
and they think it is not an iommu problem, but a possibly a problem with the
aic7xxx driver.  Complete dmesg attached.

[  106.355752] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00073980 flags=0x0070]
[  106.358732] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739a0 flags=0x0070]
[  106.361870] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739b0 flags=0x0070]
[  112.352950] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d00 flags=0x0070]
[  112.356057] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d20 flags=0x0070]
[  112.359142] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d30 flags=0x0070]
[  126.588896] AMD-Vi: Using protection domain 1 for device :10:05.0
[  126.591992] [ cut here ]
[  126.595107] WARNING: CPU: 4 PID: 0 at drivers/iommu/amd_iommu.c:2637 
dma_ops_domain_unmap.part.13+0x65/0x70()
[  126.598408] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet kvm 
crc32_pclmul crc32c_intel aesni_intel snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec aes_x86_64 
drm snd_hwdep snd_pcm glue_helper lrw aic79xx gf128mul ablk_helper cryptd 
snd_timer 3c59x scsi_transport_spi snd r8169 xhci_pci xhci_hcd serio_raw pcspkr 
mii tpm_infineon tpm_tis fam15h_power k10temp i2c_piix4 processor thermal_sys 
shpchp tpm soundcore 8250_fintek dm_mod sr_mod cdrom ata_generic mxm_wmi 
pata_atiixp ohci_pci wmi button sg autofs4 [last unloaded: aic7xxx]
[  126.613992] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.18.7-lcrs #1
[  126.618146] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./990FXA-UD5, BIOS FB 01/23/2013
[  126.622456]  0009 88044fd03cb8 807bde4d 

[  126.626815]   88044fd03cf8 8024cf7c 
88044fd03cf8
[  126.631172]  88043cdbc380  0600 
000702c0
[  126.635504] Call Trace:
[  126.639703][] dump_stack+0x4e/0x71
[  126.644003]  [] warn_slowpath_common+0x7c/0xa0
[  126.648335]  [] warn_slowpath_null+0x15/0x20
[  126.652632]  [] dma_ops_domain_unmap.part.13+0x65/0x70
[  126.656919]  [] __unmap_single.isra.16+0x9b/0x100
[  126.661242]  [] unmap_page+0x48/0x70
[  126.665572]  [] boomerang_rx+0x333/0x600 [3c59x]
[  126.669968]  [] boomerang_interrupt+0x16a/0x4f0 [3c59x]
[  126.674400]  [] handle_irq_event_percpu+0x3e/0x1e0
[  126.678860]  [] handle_irq_event+0x3c/0x60
[  126.683298]  [] handle_fasteoi_irq+0x7e/0x130
[  126.687732]  [] handle_irq+0x1d/0x30
[  126.692164]  [] do_IRQ+0x4e/0xf0
[  126.696577]  [] common_interrupt+0x6a/0x6a
[  126.700873][] ? hrtimer_start+0x13/0x20
[  126.705107]  [] ? default_idle+0x17/0x100
[  126.709386]  [] arch_cpu_idle+0xa/0x10
[  126.713685]  [] cpu_startup_entry+0x34a/0x380
[  126.718010]  [] ? cpu_startup_entry+0x1/0x380
[  126.722361]  [] start_secondary+0x157/0x180
[  126.726731] ---[ end trace ace20602a5302afb ]---
[  128.557546] [ cut here ]
[  128.560501] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x000766c0 flags=0x0020]
[  128.560502] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076700 flags=0x0020]
[  128.560503] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076740 flags=0x0020]
[  128.560504] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076780 flags=0x0020]
[  128.560505] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076800 flags=0x0020]
[  128.560506] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x000767c0 flags=0x0020]
[  128.560506] AMD-Vi: Event

Re: BUG: SCSI aic7xxx driver and AMD IOMMU

2015-03-03 Thread Mark Hounschell


Hi Joerg,

It looks like this problem is NOT a bug with the SCSI aic7xxx driver 
after all. I can duplicate this BUG very easily with other hardware. 
Simply removing a driver module  (whether it its self, has actually used 
any of the DMA API or not) that is sitting on the same pci bus as a card 
that is actually using DMA will cause this. And that card that is in use 
and using DMA will no longer function. It looks and feels like 
unloading a module causes the IOMMU to improperly unmap valid mappings.


Regards
Mark

 Kernel 3.18.7 (x86_64) with an enabled IOMMU. This happens 
immediately after

unloading the aic7xxx driver. In fact this happens unloading any module 
associated
with any pci card on the same bus as the Adaptec AHA-2930CU card. If I blacklist
the aic7xxx driver the problem never shows.  If I disable the IOMMU, the problem
never shows.

This was brought up over at io...@lists.linux-foundation.org in this thread.
http://lists.linuxfoundation.org/pipermail/iommu/2015-February/012274.html
and they think it is not an iommu problem, but a possibly a problem with the
aic7xxx driver.  Complete dmesg attached.

[  106.355752] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00073980 flags=0x0070]
[  106.358732] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739a0 flags=0x0070]
[  106.361870] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x000739b0 flags=0x0070]
[  112.352950] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d00 flags=0x0070]
[  112.356057] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d20 flags=0x0070]
[  112.359142] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x 
address=0x00074d30 flags=0x0070]
[  126.588896] AMD-Vi: Using protection domain 1 for device :10:05.0
[  126.591992] [ cut here ]
[  126.595107] WARNING: CPU: 4 PID: 0 at drivers/iommu/amd_iommu.c:2637 
dma_ops_domain_unmap.part.13+0x65/0x70()
[  126.598408] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet kvm 
crc32_pclmul crc32c_intel aesni_intel snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec aes_x86_64 
drm snd_hwdep snd_pcm glue_helper lrw aic79xx gf128mul ablk_helper cryptd 
snd_timer 3c59x scsi_transport_spi snd r8169 xhci_pci xhci_hcd serio_raw pcspkr 
mii tpm_infineon tpm_tis fam15h_power k10temp i2c_piix4 processor thermal_sys 
shpchp tpm soundcore 8250_fintek dm_mod sr_mod cdrom ata_generic mxm_wmi 
pata_atiixp ohci_pci wmi button sg autofs4 [last unloaded: aic7xxx]
[  126.613992] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.18.7-lcrs #1
[  126.618146] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./990FXA-UD5, BIOS FB 01/23/2013
[  126.622456]  0009 88044fd03cb8 807bde4d 

[  126.626815]   88044fd03cf8 8024cf7c 
88044fd03cf8
[  126.631172]  88043cdbc380  0600 
000702c0
[  126.635504] Call Trace:
[  126.639703]  IRQ  [807bde4d] dump_stack+0x4e/0x71
[  126.644003]  [8024cf7c] warn_slowpath_common+0x7c/0xa0
[  126.648335]  [8024d045] warn_slowpath_null+0x15/0x20
[  126.652632]  [806a47e5] dma_ops_domain_unmap.part.13+0x65/0x70
[  126.656919]  [806a675b] __unmap_single.isra.16+0x9b/0x100
[  126.661242]  [806a7198] unmap_page+0x48/0x70
[  126.665572]  [a0157373] boomerang_rx+0x333/0x600 [3c59x]
[  126.669968]  [a015784a] boomerang_interrupt+0x16a/0x4f0 [3c59x]
[  126.674400]  [8029600e] handle_irq_event_percpu+0x3e/0x1e0
[  126.678860]  [802961ec] handle_irq_event+0x3c/0x60
[  126.683298]  [80298bfe] handle_fasteoi_irq+0x7e/0x130
[  126.687732]  [8020543d] handle_irq+0x1d/0x30
[  126.692164]  [80204cee] do_IRQ+0x4e/0xf0
[  126.696577]  [807c596a] common_interrupt+0x6a/0x6a
[  126.700873]  EOI  [802a6e23] ? hrtimer_start+0x13/0x20
[  126.705107]  [8020cad7] ? default_idle+0x17/0x100
[  126.709386]  [8020d4ca] arch_cpu_idle+0xa/0x10
[  126.713685]  [80281dda] cpu_startup_entry+0x34a/0x380
[  126.718010]  [80281a91] ? cpu_startup_entry+0x1/0x380
[  126.722361]  [80233857] start_secondary+0x157/0x180
[  126.726731] ---[ end trace ace20602a5302afb ]---
[  128.557546] [ cut here ]
[  128.560501] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x000766c0 flags=0x0020]
[  128.560502] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076700 flags=0x0020]
[  128.560503] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0001 
address=0x00076740 flags=0x0020]
[  128.560504] AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0

Re: IOMMU/DMA API inquiry

2015-02-19 Thread Mark Hounschell


Hi Joerg, Thanks for the response.

On 02/18/2015 01:19 PM, j...@8bytes.org >> Joerg Roedel wrote:

Hi Mark,

On Tue, Feb 17, 2015 at 02:48:03PM -0500, Mark Hounschell wrote:

I understand that AMD IOMMU support is not available for 32-bit
kernels.  I believe the INTEL IOMMU is supported there. Not knowing
why, I was curious if that is going to remain that way?


Yes, I have no plan on making the AMD IOMMU driver available on 32bit.
But I would not be resistant to patches enabling the driver there.



OK. I might take a look. Should it not "just work" more or less?




I've learned that the AMD IOMMU does not play well with the kernels
"Contiguous Memory Allocator" (CMA). I also believe, but could be
mistaken, that the INTEL IOMMU does. Again, not knowing why, I was
curious if that is going to remain that way also?


No, I will queue a patch for the next merge window to enable CMA use in
the AMD IOMMU driver. So there will be support for this.



Great! That's a couple of  kernels away then. I would be happy to test 
if  you have a need.





Is the fact that the AMD IOMMU is not supported on 32 bit kernels
the reason that dma_map_page always returns 0 on 32 bit kernels.


There is no particular reason for not supporting it on 32 bit kernels,
it just didn't seem to be important yet. At least not important enough
to justify the work.



Understood.


I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I
thought that maybe dma_map_sg could be used to get around the fact
that AMD IOMMU doesn't work with CMA. But it looks as though I was
mistaken and I would actually have to do a DMA for_each_sg(sglist,
sg, count, i). Is that correct or can dma_map_sg somehow enable you
to do a single DMA using a single address for the entire sglist?


The map_sg functions can't be used with the AMD IOMMU driver to work
around missing CMA support. Depending on what you want it might work
with the Intel IOMMU driver, as this one allocates a single IOVA region
for the entire sg_list.



Thanks  Joerg

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: IOMMU/DMA API inquiry

2015-02-19 Thread Mark Hounschell


Hi Joerg, Thanks for the response.

On 02/18/2015 01:19 PM, j...@8bytes.org  Joerg Roedel wrote:

Hi Mark,

On Tue, Feb 17, 2015 at 02:48:03PM -0500, Mark Hounschell wrote:

I understand that AMD IOMMU support is not available for 32-bit
kernels.  I believe the INTEL IOMMU is supported there. Not knowing
why, I was curious if that is going to remain that way?


Yes, I have no plan on making the AMD IOMMU driver available on 32bit.
But I would not be resistant to patches enabling the driver there.



OK. I might take a look. Should it not just work more or less?




I've learned that the AMD IOMMU does not play well with the kernels
Contiguous Memory Allocator (CMA). I also believe, but could be
mistaken, that the INTEL IOMMU does. Again, not knowing why, I was
curious if that is going to remain that way also?


No, I will queue a patch for the next merge window to enable CMA use in
the AMD IOMMU driver. So there will be support for this.



Great! That's a couple of  kernels away then. I would be happy to test 
if  you have a need.





Is the fact that the AMD IOMMU is not supported on 32 bit kernels
the reason that dma_map_page always returns 0 on 32 bit kernels.


There is no particular reason for not supporting it on 32 bit kernels,
it just didn't seem to be important yet. At least not important enough
to justify the work.



Understood.


I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I
thought that maybe dma_map_sg could be used to get around the fact
that AMD IOMMU doesn't work with CMA. But it looks as though I was
mistaken and I would actually have to do a DMA for_each_sg(sglist,
sg, count, i). Is that correct or can dma_map_sg somehow enable you
to do a single DMA using a single address for the entire sglist?


The map_sg functions can't be used with the AMD IOMMU driver to work
around missing CMA support. Depending on what you want it might work
with the Intel IOMMU driver, as this one allocates a single IOVA region
for the entire sg_list.



Thanks  Joerg

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

IOMMU/DMA API inquiry

2015-02-17 Thread Mark Hounschell

I've searched the Doc tree and web to no avail. I was hoping I might get 
some  answers to a couple of questions that have arisen as a result of 
actually trying to take advantage of the IOMMU in our out of tree GPL 
drivers. The AMD IOMMU in particular. I'm currently using the 3.18.x 
kernel versions. Both 32 and 64 bit.


I understand that AMD IOMMU support is not available for 32-bit kernels. 
 I believe the INTEL IOMMU is supported there. Not knowing why, I was 
curious if that is going to remain that way?


I've learned that the AMD IOMMU does not play well with the kernels 
"Contiguous Memory Allocator" (CMA). I also believe, but could be 
mistaken, that the INTEL IOMMU does. Again, not knowing why, I was 
curious if that is going to remain that way also?


Is the fact that the AMD IOMMU is not supported on 32 bit kernels the 
reason that dma_map_page always returns 0 on 32 bit kernels. I found 
this strange because the dma_alloc_coherent function returns a 
page_to_phys address in the dma_handle when no IOMMU is enabled. I was 
sort of expecting dma_map_page to do the same. Other wise I seem to have 
to KNOW if an enabled and supported IOMMU is present. How does a driver 
tell if an IOMMU is present? I suspect it should not have to?


I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I 
thought that maybe dma_map_sg could be used to get around the fact that 
AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken 
and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). 
Is that correct or can dma_map_sg somehow enable you to do a single DMA 
using a single address for the entire sglist?


Sorry if these questions seem ignorant. It was not intentional.

Thanks for any insight.

Regards
Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

IOMMU/DMA API inquiry

2015-02-17 Thread Mark Hounschell

I've searched the Doc tree and web to no avail. I was hoping I might get 
some  answers to a couple of questions that have arisen as a result of 
actually trying to take advantage of the IOMMU in our out of tree GPL 
drivers. The AMD IOMMU in particular. I'm currently using the 3.18.x 
kernel versions. Both 32 and 64 bit.


I understand that AMD IOMMU support is not available for 32-bit kernels. 
 I believe the INTEL IOMMU is supported there. Not knowing why, I was 
curious if that is going to remain that way?


I've learned that the AMD IOMMU does not play well with the kernels 
Contiguous Memory Allocator (CMA). I also believe, but could be 
mistaken, that the INTEL IOMMU does. Again, not knowing why, I was 
curious if that is going to remain that way also?


Is the fact that the AMD IOMMU is not supported on 32 bit kernels the 
reason that dma_map_page always returns 0 on 32 bit kernels. I found 
this strange because the dma_alloc_coherent function returns a 
page_to_phys address in the dma_handle when no IOMMU is enabled. I was 
sort of expecting dma_map_page to do the same. Other wise I seem to have 
to KNOW if an enabled and supported IOMMU is present. How does a driver 
tell if an IOMMU is present? I suspect it should not have to?


I've read the DMA-API-HOWTO.txt concerning dma_map_sg and at first I 
thought that maybe dma_map_sg could be used to get around the fact that 
AMD IOMMU doesn't work with CMA. But it looks as though I was mistaken 
and I would actually have to do a DMA for_each_sg(sglist, sg, count, i). 
Is that correct or can dma_map_sg somehow enable you to do a single DMA 
using a single address for the entire sglist?


Sorry if these questions seem ignorant. It was not intentional.

Thanks for any insight.

Regards
Mark
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

dma_alloc_coherent - cma - and IOMMU question

2015-01-29 Thread Mark Hounschell

Sorry for the noise. I've read everything DMA in the kernel Doc dir and 
searched the web to no avail. So I thought I might get some useful info 
here.


I'm currently using a 3.18.3 (x86_64) kernel on an AMD platform. I am 
currently doing 8MB DMAs to and from our device using the in kernel CMA 
"cma=64M@0-4G" with no problems. This device is not DAC or 
scatter/gather capable so the in kernel CMA has been great and replaced 
our old bigphysarea usage.


We simply use dma_alloc_coherent and pass the dma_addr_t *dma_handle 
returned from the dma_alloc_coherent function to our device as the 
"bus/pci" address to use.
We also use remap_pfn_range on that dma_addr_t *dma_handle returned from 
 the dma_alloc_coherent function to mmap userland to the buffer. All is 
good until I enable the IOMMU. I then either get IO_PAGE_FAULTs, the DMA 
just quietly never completes or the system gets borked.


[  106.115725] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 
domain=0x001b address=0xaa50 flags=0x0010]
[  106.115729] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 
domain=0x001b address=0xaa500040 flags=0x0010]


Here are the IOMMU settings in my kernel config:

#grep IOMMU .config
# CONFIG_GART_IOMMU is not set
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_AMD_IOMMU=y
# CONFIG_AMD_IOMMU_STATS is not set
CONFIG_AMD_IOMMU_V2=m
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set


From reading the in kernel doc it would appear that we could in fact, 
using the IOMMU and the dma_map_sg function, get rid of the CMA 
requirement and our device could DMA anywhere, even above the 4GB 
address space limit of our device. But before going through this larger 
change to our GPL driver, I want to understand if and/or why the 
dma_alloc_coherent function does not appear to set up the IOMMU for me. 
Is the IOMMU only supported for "streaming" DMA type and not for 
"coherent"? I read no reference to this in the kernel doc?


Any hints would be greatly appreciated. Again, sorry for the noise.

Regards
Mark




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

dma_alloc_coherent - cma - and IOMMU question

2015-01-29 Thread Mark Hounschell

Sorry for the noise. I've read everything DMA in the kernel Doc dir and 
searched the web to no avail. So I thought I might get some useful info 
here.


I'm currently using a 3.18.3 (x86_64) kernel on an AMD platform. I am 
currently doing 8MB DMAs to and from our device using the in kernel CMA 
cma=64M@0-4G with no problems. This device is not DAC or 
scatter/gather capable so the in kernel CMA has been great and replaced 
our old bigphysarea usage.


We simply use dma_alloc_coherent and pass the dma_addr_t *dma_handle 
returned from the dma_alloc_coherent function to our device as the 
bus/pci address to use.
We also use remap_pfn_range on that dma_addr_t *dma_handle returned from 
 the dma_alloc_coherent function to mmap userland to the buffer. All is 
good until I enable the IOMMU. I then either get IO_PAGE_FAULTs, the DMA 
just quietly never completes or the system gets borked.


[  106.115725] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 
domain=0x001b address=0xaa50 flags=0x0010]
[  106.115729] AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 
domain=0x001b address=0xaa500040 flags=0x0010]


Here are the IOMMU settings in my kernel config:

#grep IOMMU .config
# CONFIG_GART_IOMMU is not set
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_AMD_IOMMU=y
# CONFIG_AMD_IOMMU_STATS is not set
CONFIG_AMD_IOMMU_V2=m
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set


From reading the in kernel doc it would appear that we could in fact, 
using the IOMMU and the dma_map_sg function, get rid of the CMA 
requirement and our device could DMA anywhere, even above the 4GB 
address space limit of our device. But before going through this larger 
change to our GPL driver, I want to understand if and/or why the 
dma_alloc_coherent function does not appear to set up the IOMMU for me. 
Is the IOMMU only supported for streaming DMA type and not for 
coherent? I read no reference to this in the kernel doc?


Any hints would be greatly appreciated. Again, sorry for the noise.

Regards
Mark




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

nouveau: Do you really want these new links in the kernel tree?

2014-12-08 Thread Mark Hounschell

/usr/src/linux-3.18> find drivers/ -type l | xargs ls -al

lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/class.h -> ../../../nvif/class.h
lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/event.h -> ../../../nvif/event.h
lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/ioctl.h -> ../../../nvif/ioctl.h
lrwxrwxrwx 1 markh users 22 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/unpack.h -> ../../../nvif/unpack.h
lrwxrwxrwx 1 markh users 12 Dec  7 17:21 ./drivers/gpu/drm/nouveau/nvif/os.h -> 
../core/os.h

Regards
Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

nouveau: Do you really want these new links in the kernel tree?

2014-12-08 Thread Mark Hounschell

/usr/src/linux-3.18 find drivers/ -type l | xargs ls -al

lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/class.h - ../../../nvif/class.h
lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/event.h - ../../../nvif/event.h
lrwxrwxrwx 1 markh users 21 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/ioctl.h - ../../../nvif/ioctl.h
lrwxrwxrwx 1 markh users 22 Dec  7 17:21 
./drivers/gpu/drm/nouveau/core/include/nvif/unpack.h - ../../../nvif/unpack.h
lrwxrwxrwx 1 markh users 12 Dec  7 17:21 ./drivers/gpu/drm/nouveau/nvif/os.h - 
../core/os.h

Regards
Mark
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] Staging: dgnc: Fix long line coding style issues in dgnc_cls.h

2014-12-04 Thread Mark Hounschell


On 12/03/2014 06:37 PM, Joe Perches wrote:

On Wed, 2014-12-03 at 21:30 +, Sean Cleator wrote:

A patch to fix the rest of the long line warnings in the dgnc_cls.h file
found by the checkpatch.pl tool


checkpatch is a brainless little tool.

You should prefer to develop a readable style rather than
pay too close attention to precisely what checkpatch says.

fyi: There is this warning in the file:

  * NOTE: THIS IS A SHARED HEADER. DO NOT CHANGE CODING STYLE!!!



That comment should be removed. It was put there when this driver was 
under Digi's CVS control. Some of their user land apps, delivered with 
the driver package, also used it. They probably had different people 
working on the kernel and user land side.


Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] Staging: dgnc: Fix long line coding style issues in dgnc_cls.h

2014-12-04 Thread Mark Hounschell


On 12/03/2014 06:37 PM, Joe Perches wrote:

On Wed, 2014-12-03 at 21:30 +, Sean Cleator wrote:

A patch to fix the rest of the long line warnings in the dgnc_cls.h file
found by the checkpatch.pl tool


checkpatch is a brainless little tool.

You should prefer to develop a readable style rather than
pay too close attention to precisely what checkpatch says.

fyi: There is this warning in the file:

  * NOTE: THIS IS A SHARED HEADER. DO NOT CHANGE CODING STYLE!!!



That comment should be removed. It was put there when this driver was 
under Digi's CVS control. Some of their user land apps, delivered with 
the driver package, also used it. They probably had different people 
working on the kernel and user land side.


Mark
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND] staging: dgap: re-arrange functions for removing forward declarations

2014-10-29 Thread Mark Hounschell


On 10/29/2014 05:22 AM, Greg KH wrote:

On Sun, Oct 26, 2014 at 11:08:54AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Tested-by: Mark Hounschell 
Signed-off-by: Daeseok Youn 
---
RESEND: This patch is tested all by Mark.
It is good to merge. Greg, check please.


It doesn't apply to my tree :(




Daeseok,

I will be happy to test the new patch. Somehow another patch got in 
there without us seeing it "commit: 
ac2f46c365e655dd2c366f69c609febc4ac310fd" on the list. That's why it 
doesn't apply for Greg.


Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND] staging: dgap: re-arrange functions for removing forward declarations

2014-10-29 Thread Mark Hounschell


On 10/29/2014 05:22 AM, Greg KH wrote:

On Sun, Oct 26, 2014 at 11:08:54AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Tested-by: Mark Hounschell ma...@compro.net
Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
RESEND: This patch is tested all by Mark.
It is good to merge. Greg, check please.


It doesn't apply to my tree :(




Daeseok,

I will be happy to test the new patch. Somehow another patch got in 
there without us seeing it commit: 
ac2f46c365e655dd2c366f69c609febc4ac310fd on the list. That's why it 
doesn't apply for Greg.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.

2014-10-21 Thread Mark Hounschell


On 10/14/2014 08:01 AM, Mark Hounschell wrote:

On 10/13/2014 10:04 PM, Greg KH wrote:

On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote:

On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote:

Hi,

2014-10-13 12:25 GMT+09:00 Greg KH :

On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Signed-off-by: Daeseok Youn 
---
This patch has too many changes for re-arranging the functions.
So I wonder that I should break this up into smaller patches.


Are the .o files identical before and after this patch?  If so, it's
fine.

Ok. I will check for that.


The .o files shouldn't be identical after function reordering.


Hm, they might be the same size, but I can see how on some
architectures (like ppc) how that would not be the case, you are right.

Isn't there an "objdiff" program or something like that which might help
in validating that nothing "changed" in the source for type of patch
that just moves functions around in a file.

thanks,



Greg,

Would just testing the thing be of any help?

Regards
Mark


I don't know what is going on with this patch, but for what it's worth, 
I have applied this patch and my testing still shows everything OK.


Tested-by: Mark Hounschell 

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.

2014-10-21 Thread Mark Hounschell


On 10/14/2014 08:01 AM, Mark Hounschell wrote:

On 10/13/2014 10:04 PM, Greg KH wrote:

On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote:

On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote:

Hi,

2014-10-13 12:25 GMT+09:00 Greg KH gre...@linuxfoundation.org:

On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
This patch has too many changes for re-arranging the functions.
So I wonder that I should break this up into smaller patches.


Are the .o files identical before and after this patch?  If so, it's
fine.

Ok. I will check for that.


The .o files shouldn't be identical after function reordering.


Hm, they might be the same size, but I can see how on some
architectures (like ppc) how that would not be the case, you are right.

Isn't there an objdiff program or something like that which might help
in validating that nothing changed in the source for type of patch
that just moves functions around in a file.

thanks,



Greg,

Would just testing the thing be of any help?

Regards
Mark


I don't know what is going on with this patch, but for what it's worth, 
I have applied this patch and my testing still shows everything OK.


Tested-by: Mark Hounschell ma...@compro.net

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.

2014-10-14 Thread Mark Hounschell


On 10/13/2014 10:04 PM, Greg KH wrote:

On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote:

On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote:

Hi,

2014-10-13 12:25 GMT+09:00 Greg KH :

On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Signed-off-by: Daeseok Youn 
---
This patch has too many changes for re-arranging the functions.
So I wonder that I should break this up into smaller patches.


Are the .o files identical before and after this patch?  If so, it's
fine.

Ok. I will check for that.


The .o files shouldn't be identical after function reordering.


Hm, they might be the same size, but I can see how on some
architectures (like ppc) how that would not be the case, you are right.

Isn't there an "objdiff" program or something like that which might help
in validating that nothing "changed" in the source for type of patch
that just moves functions around in a file.

thanks,



Greg,

Would just testing the thing be of any help?

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: re-arrange functions for removing forward declarations.

2014-10-14 Thread Mark Hounschell


On 10/13/2014 10:04 PM, Greg KH wrote:

On Mon, Oct 13, 2014 at 07:56:38AM -0700, Joe Perches wrote:

On Mon, 2014-10-13 at 17:01 +0900, DaeSeok Youn wrote:

Hi,

2014-10-13 12:25 GMT+09:00 Greg KH gre...@linuxfoundation.org:

On Mon, Oct 13, 2014 at 11:34:25AM +0900, Daeseok Youn wrote:

Re-arrange the functions for removing forward declarations.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
This patch has too many changes for re-arranging the functions.
So I wonder that I should break this up into smaller patches.


Are the .o files identical before and after this patch?  If so, it's
fine.

Ok. I will check for that.


The .o files shouldn't be identical after function reordering.


Hm, they might be the same size, but I can see how on some
architectures (like ppc) how that would not be the case, you are right.

Isn't there an objdiff program or something like that which might help
in validating that nothing changed in the source for type of patch
that just moves functions around in a file.

thanks,



Greg,

Would just testing the thing be of any help?

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()

2014-08-04 Thread Mark Hounschell


On 07/31/2014 07:14 PM, DaeSeok Youn wrote:

Hi, Mark

2014-07-31 21:44 GMT+09:00 Mark Hounschell :

On 07/31/2014 12:02 AM, Daeseok Youn wrote:


When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.

Making a node will allocate node memory and strings for saving
configrations with kstrdup().

So these are freed when dgap is unloaded or failed to initialize.

Signed-off-by: Daeseok Youn 
---
V2: Do not need to free for NULLNODE.

I have been too busy to solve this issue, sorry for late.

Mark, Can you test this patch? I try to make simple module which is
testing dgap_parsefile() and dgap_cleanup_nodes().



I'll be happy to, but I can't do it until Monday. I'm not where the hardware
is until then.

That's OK. :-)



After applying this patch I am still able to load and unload the driver 
at will, and it still works for me.


Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()

2014-08-04 Thread Mark Hounschell


On 07/31/2014 07:14 PM, DaeSeok Youn wrote:

Hi, Mark

2014-07-31 21:44 GMT+09:00 Mark Hounschell ma...@compro.net:

On 07/31/2014 12:02 AM, Daeseok Youn wrote:


When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.

Making a node will allocate node memory and strings for saving
configrations with kstrdup().

So these are freed when dgap is unloaded or failed to initialize.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
V2: Do not need to free for NULLNODE.

I have been too busy to solve this issue, sorry for late.

Mark, Can you test this patch? I try to make simple module which is
testing dgap_parsefile() and dgap_cleanup_nodes().



I'll be happy to, but I can't do it until Monday. I'm not where the hardware
is until then.

That's OK. :-)



After applying this patch I am still able to load and unload the driver 
at will, and it still works for me.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-31 Thread Mark Hounschell


On 07/31/2014 12:02 AM, Daeseok Youn wrote:

When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.

Making a node will allocate node memory and strings for saving
configrations with kstrdup().

So these are freed when dgap is unloaded or failed to initialize.

Signed-off-by: Daeseok Youn 
---
V2: Do not need to free for NULLNODE.

I have been too busy to solve this issue, sorry for late.

Mark, Can you test this patch? I try to make simple module which is
testing dgap_parsefile() and dgap_cleanup_nodes().



I'll be happy to, but I can't do it until Monday. I'm not where the 
hardware is until then.


Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-31 Thread Mark Hounschell


On 07/31/2014 12:02 AM, Daeseok Youn wrote:

When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.

Making a node will allocate node memory and strings for saving
configrations with kstrdup().

So these are freed when dgap is unloaded or failed to initialize.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
V2: Do not need to free for NULLNODE.

I have been too busy to solve this issue, sorry for late.

Mark, Can you test this patch? I try to make simple module which is
testing dgap_parsefile() and dgap_cleanup_nodes().



I'll be happy to, but I can't do it until Monday. I'm not where the 
hardware is until then.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-17 Thread Mark Hounschell

On 07/16/2014 09:35 PM, Daeseok Youn wrote:
> When a configration file is parsed with dgap_parsefile(),
> makes nodes for saving configrations for board.
> 
> Making a node will allocate node memory and strings for saving
> configrations with kstrdup().
> 
> So these are freed when dgap is unloaded or failed to initialize.
> 
> Signed-off-by: Daeseok Youn 
> ---
> Mark, please review this patch.
> Thanks.
> 
>   drivers/staging/dgap/dgap.c |   47 
> +++
>   1 files changed, 47 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
> index 06c55cb..e9df2ea 100644
> --- a/drivers/staging/dgap/dgap.c
> +++ b/drivers/staging/dgap/dgap.c
> @@ -201,6 +201,7 @@ static int dgap_test_fep(struct board_t *brd);
>   static int dgap_tty_register_ports(struct board_t *brd);
>   static int dgap_firmware_load(struct pci_dev *pdev, int card_type,
> struct board_t *brd);
> +static void dgap_cleanup_nodes(void);
>   
>   static void dgap_cleanup_module(void);
>   
> @@ -619,6 +620,7 @@ unregister_tty:
>   free_flipbuf:
>   dgap_free_flipbuf(brd);
>   cleanup_brd:
> + dgap_cleanup_nodes();
>   dgap_release_remap(brd);
>   kfree(brd);
>   
> @@ -659,6 +661,8 @@ static void dgap_cleanup_module(void)
>   dgap_cleanup_board(dgap_board[i]);
>   }
>   
> + dgap_cleanup_nodes();
> +
>   if (dgap_numboards)
>   pci_unregister_driver(_driver);
>   }
> @@ -6323,6 +6327,49 @@ static void dgap_remove_tty_sysfs(struct device *c)
>   sysfs_remove_group(>kobj, _tty_attribute_group);
>   }
>   
> +static void dgap_cleanup_nodes(void)
> +{
> + struct cnode *p;
> +
> + p = _head;
> +
> + while (p) {
> + struct cnode *tmp = p->next;
> +
> + switch (p->type) {
> + case BNODE:
> + kfree(p->u.board.portstr);
> + kfree(p->u.board.addrstr);
> + kfree(p->u.board.pcibusstr);
> + kfree(p->u.board.pcislotstr);
> + kfree(p->u.board.method);
> + break;
> + case CNODE:
> + kfree(p->u.conc.id);
> + kfree(p->u.conc.connect);
> + break;
> + case MNODE:
> + kfree(p->u.module.id);
> + break;
> + case TNODE:
> + kfree(p->u.ttyname);
> + break;
> + case CUNODE:
> + kfree(p->u.cuname);
> + break;
> + case LNODE:
> + kfree(p->u.line.cable);
> + break;
> + case PNODE:
> + kfree(p->u.printname);
> + break;
> + }
> +
> + kfree(p->u.board.status);
> + kfree(p);
> + p = tmp;
> + }
> +}
>   /*
>* Parse a configuration file read into memory as a string.
>*/
> 

I get a kernel oops when unloading the driver with this patch.

2014-07-17T09:22:12.666987-04:00 harley kernel: [60216.979134] task: 
8801037846b0 ti: 880149256000 task.ti: 880149256000
2014-07-17T09:22:12.666988-04:00 harley kernel: [60216.979136] RIP: 
0010:[]  [] kfree+0x17f/0x190
2014-07-17T09:22:12.666989-04:00 harley kernel: [60216.979143] RSP: 
0018:880149257e78  EFLAGS: 00010246
2014-07-17T09:22:12.666991-04:00 harley kernel: [60216.979144] RAX: 
40080068 RBX: a0428d20 RCX: 038eb895
2014-07-17T09:22:12.666992-04:00 harley kernel: [60216.979146] RDX: 
40080068 RSI: ea0004131e00 RDI: a0428d20
2014-07-17T09:22:12.666993-04:00 harley kernel: [60216.979147] RBP: 
880149257e90 R08: 00015b60 R09: 88014fd55b60
2014-07-17T09:22:12.666994-04:00 harley kernel: [60216.979149] R10: 
ea810a00 R11: a0424ffd R12: 88005a3d2a80
2014-07-17T09:22:12.666995-04:00 harley kernel: [60216.979150] R13: 
a041c155 R14: 0002 R15: 0bc0
2014-07-17T09:22:12.666996-04:00 harley kernel: [60216.979152] FS:  
7fb7d51d2700() GS:88014fd4() knlGS:
2014-07-17T09:22:12.666997-04:00 harley kernel: [60216.979154] CS:  0010 DS: 
 ES:  CR0: 8005003b
2014-07-17T09:22:12.666998-04:00 harley kernel: [60216.979155] CR2: 
7fe66ee4e000 CR3: 0001497af000 CR4: 000407e0
2014-07-17T09:22:12.666999-04:00 harley kernel: [60216.979156] Stack:
2014-07-17T09:22:12.667000-04:00 harley kernel: [60216.979157]  
a0428d20 88005a3d2a80 8801259dd330 880149257eb0
2014-07-17T09:22:12.667001-04:00 harley kernel: [60216.979161]  
a041c155 0700 8801259dd000 880149257ee8
2014-07-17T09:22:12.667002-04:00 harley kernel: [60216.979164]  
a0424ee8  80c92700 a04289e0

Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()

2014-07-17 Thread Mark Hounschell


On 07/16/2014 08:42 PM, DaeSeok Youn wrote:

2014-07-16 23:17 GMT+09:00 Mark Hounschell :

On 07/16/2014 05:26 AM, DaeSeok Youn wrote:


2014-07-16 8:50 GMT+09:00 Greg KH :


On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote:


Hi,

2014-07-16 0:29 GMT+09:00 Greg KH :


On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote:


The dgap_err() is printing a message with pr_err(),
so all those are replaced.

Use definition "pr_fmt" and then all of "dgap:" in
the beginning of print messages are removed.

And also removed "out of memory" message because
the kernel has own message for that.

Signed-off-by: Daeseok Youn 
---
V2: use pr_fmt "dgap:" prefix on print message on dgap.
  remove "out of memory" message.

  Adds Mark to TO list and CC list for checking send
  this email properly to him.

   drivers/staging/dgap/dgap.c |  306
+++
   1 files changed, 133 insertions(+), 173 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index 06c55cb..9e750fb 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -41,6 +41,8 @@
*/
   #undef DIGI_CONCENTRATORS_SUPPORTED

+#define pr_fmt(fmt) "dgap: " fmt
+
   #include 
   #include 
   #include 
@@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct
channel_t *ch);
   static int dgap_gettok(char **in);
   static char *dgap_getword(char **in);
   static int dgap_checknode(struct cnode *p);
-static void dgap_err(char *s);

   /*
* Function prototypes from dgap_sysfs.h
@@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct
pci_dev *pdev, int id,
if (ret)
goto free_brd;

- pr_info("dgap: board %d: %s (rev %d), irq %ld\n",
+ pr_info("board %d: %s (rev %d), irq %ld\n",
boardnum, brd->name, brd->rev, brd->irq);



Almost all of the pr_*() calls in this driver should be converted over
to use dev_*() calls instead.  And some of them, like this one, should
be removed entirely (no need for a driver to be "noisy" when a device
for it is found, it should be quiet if at all possible, unless
something
went wrong.)

So can you do that here instead?  I've applied the earlier patches in
this series, and stopped here.


OK. I can. pr_*() calls are replaced with dev_*() calls.
And also removes some of print message which are useless like "out
of memory"



Yes, please do that, that would be great.


I have been working to change pr_*() to dev_*(), but dgap_parse() has no

"struct device" for using dev_*(). If dgap_parse still need for this
driver,
it need to take a parameter for using dev_*(), it may be "pdev" but
configuration
file doesn't need to parse in kernel at all, dgap_parse() will be removed.

So I will wait to verify by Mark about parsing configuration file.

Thanks.

regards,
Daeseok Youn.



Hi Daeseok,

I would wait on that one for now. I know that code has to be removed
eventually. I'm just not sure if we are quite ready. That is actually a LOT
of lines of code also. I think a couple of things need done first.

Here is a sample config file created by one of DIGI's user land applications
(mpi). These entries are only for 2 different cards. There are others cards
that may have other things to consider. I only have these 2 cards types now.
I had a third type which is just a 2 port but that one is gone now.

config_begin
board   Digi_AccelePort_8r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 8
 ttyname ttya
altpin 0
useintr 0
board   Digi_AccelePort_4r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 4
 ttyname ttyb
altpin 0
useintr 0
board   Digi_AccelePort_8r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 8
 ttyname ttyc
altpin 0
useintr 0
config_end


oh.. I couldn't find a sample of config file for dgap module in web. Thanks.



The altpin and useintr parameters need to be converted to module parameters.
I found references in the code somewhere that the nports may not be reliably
known using the device id for at least one card type. All the other stuff in
this particular config file is pretty much useless. Well, sort of. The
ttyname parameter is used by the driver to populate a "sys" file with a
custom device name that is then used by a userland script and udev to allow
a user  to define his own device names. Custom links are created. Perhaps
this also would be nice to have as a module param?


I'm not sure about using module param and udev. I think config file
used when firmware
is loaded.


The config file has nothing to do with the actual firmware loading or 
which firmware file is to be loaded for a given board. There are a 
couple of things done to the board either before or after firmware loads 
that probably

Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()

2014-07-17 Thread Mark Hounschell


On 07/16/2014 08:42 PM, DaeSeok Youn wrote:

2014-07-16 23:17 GMT+09:00 Mark Hounschell ma...@compro.net:

On 07/16/2014 05:26 AM, DaeSeok Youn wrote:


2014-07-16 8:50 GMT+09:00 Greg KH gre...@linuxfoundation.org:


On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote:


Hi,

2014-07-16 0:29 GMT+09:00 Greg KH gre...@linuxfoundation.org:


On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote:


The dgap_err() is printing a message with pr_err(),
so all those are replaced.

Use definition pr_fmt and then all of dgap: in
the beginning of print messages are removed.

And also removed out of memory message because
the kernel has own message for that.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
V2: use pr_fmt dgap: prefix on print message on dgap.
  remove out of memory message.

  Adds Mark to TO list and CC list for checking send
  this email properly to him.

   drivers/staging/dgap/dgap.c |  306
+++
   1 files changed, 133 insertions(+), 173 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index 06c55cb..9e750fb 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -41,6 +41,8 @@
*/
   #undef DIGI_CONCENTRATORS_SUPPORTED

+#define pr_fmt(fmt) dgap:  fmt
+
   #include linux/kernel.h
   #include linux/module.h
   #include linux/pci.h
@@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct
channel_t *ch);
   static int dgap_gettok(char **in);
   static char *dgap_getword(char **in);
   static int dgap_checknode(struct cnode *p);
-static void dgap_err(char *s);

   /*
* Function prototypes from dgap_sysfs.h
@@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct
pci_dev *pdev, int id,
if (ret)
goto free_brd;

- pr_info(dgap: board %d: %s (rev %d), irq %ld\n,
+ pr_info(board %d: %s (rev %d), irq %ld\n,
boardnum, brd-name, brd-rev, brd-irq);



Almost all of the pr_*() calls in this driver should be converted over
to use dev_*() calls instead.  And some of them, like this one, should
be removed entirely (no need for a driver to be noisy when a device
for it is found, it should be quiet if at all possible, unless
something
went wrong.)

So can you do that here instead?  I've applied the earlier patches in
this series, and stopped here.


OK. I can. pr_*() calls are replaced with dev_*() calls.
And also removes some of print message which are useless like out
of memory



Yes, please do that, that would be great.


I have been working to change pr_*() to dev_*(), but dgap_parse() has no

struct device for using dev_*(). If dgap_parse still need for this
driver,
it need to take a parameter for using dev_*(), it may be pdev but
configuration
file doesn't need to parse in kernel at all, dgap_parse() will be removed.

So I will wait to verify by Mark about parsing configuration file.

Thanks.

regards,
Daeseok Youn.



Hi Daeseok,

I would wait on that one for now. I know that code has to be removed
eventually. I'm just not sure if we are quite ready. That is actually a LOT
of lines of code also. I think a couple of things need done first.

Here is a sample config file created by one of DIGI's user land applications
(mpi). These entries are only for 2 different cards. There are others cards
that may have other things to consider. I only have these 2 cards types now.
I had a third type which is just a 2 port but that one is gone now.

config_begin
board   Digi_AccelePort_8r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 8
 ttyname ttya
altpin 0
useintr 0
board   Digi_AccelePort_4r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 4
 ttyname ttyb
altpin 0
useintr 0
board   Digi_AccelePort_8r_920_PCI
 io 0x000
 mem 0x00
 start 1
 nports 8
 ttyname ttyc
altpin 0
useintr 0
config_end


oh.. I couldn't find a sample of config file for dgap module in web. Thanks.



The altpin and useintr parameters need to be converted to module parameters.
I found references in the code somewhere that the nports may not be reliably
known using the device id for at least one card type. All the other stuff in
this particular config file is pretty much useless. Well, sort of. The
ttyname parameter is used by the driver to populate a sys file with a
custom device name that is then used by a userland script and udev to allow
a user  to define his own device names. Custom links are created. Perhaps
this also would be nice to have as a module param?


I'm not sure about using module param and udev. I think config file
used when firmware
is loaded.


The config file has nothing to do with the actual firmware loading or 
which firmware file is to be loaded for a given board. There are a 
couple of things done to the board either before or after firmware loads 
that probably still require that the config file had

Re: [PATCH] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-17 Thread Mark Hounschell

On 07/16/2014 09:35 PM, Daeseok Youn wrote:
 When a configration file is parsed with dgap_parsefile(),
 makes nodes for saving configrations for board.
 
 Making a node will allocate node memory and strings for saving
 configrations with kstrdup().
 
 So these are freed when dgap is unloaded or failed to initialize.
 
 Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
 ---
 Mark, please review this patch.
 Thanks.
 
   drivers/staging/dgap/dgap.c |   47 
 +++
   1 files changed, 47 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
 index 06c55cb..e9df2ea 100644
 --- a/drivers/staging/dgap/dgap.c
 +++ b/drivers/staging/dgap/dgap.c
 @@ -201,6 +201,7 @@ static int dgap_test_fep(struct board_t *brd);
   static int dgap_tty_register_ports(struct board_t *brd);
   static int dgap_firmware_load(struct pci_dev *pdev, int card_type,
 struct board_t *brd);
 +static void dgap_cleanup_nodes(void);
   
   static void dgap_cleanup_module(void);
   
 @@ -619,6 +620,7 @@ unregister_tty:
   free_flipbuf:
   dgap_free_flipbuf(brd);
   cleanup_brd:
 + dgap_cleanup_nodes();
   dgap_release_remap(brd);
   kfree(brd);
   
 @@ -659,6 +661,8 @@ static void dgap_cleanup_module(void)
   dgap_cleanup_board(dgap_board[i]);
   }
   
 + dgap_cleanup_nodes();
 +
   if (dgap_numboards)
   pci_unregister_driver(dgap_driver);
   }
 @@ -6323,6 +6327,49 @@ static void dgap_remove_tty_sysfs(struct device *c)
   sysfs_remove_group(c-kobj, dgap_tty_attribute_group);
   }
   
 +static void dgap_cleanup_nodes(void)
 +{
 + struct cnode *p;
 +
 + p = dgap_head;
 +
 + while (p) {
 + struct cnode *tmp = p-next;
 +
 + switch (p-type) {
 + case BNODE:
 + kfree(p-u.board.portstr);
 + kfree(p-u.board.addrstr);
 + kfree(p-u.board.pcibusstr);
 + kfree(p-u.board.pcislotstr);
 + kfree(p-u.board.method);
 + break;
 + case CNODE:
 + kfree(p-u.conc.id);
 + kfree(p-u.conc.connect);
 + break;
 + case MNODE:
 + kfree(p-u.module.id);
 + break;
 + case TNODE:
 + kfree(p-u.ttyname);
 + break;
 + case CUNODE:
 + kfree(p-u.cuname);
 + break;
 + case LNODE:
 + kfree(p-u.line.cable);
 + break;
 + case PNODE:
 + kfree(p-u.printname);
 + break;
 + }
 +
 + kfree(p-u.board.status);
 + kfree(p);
 + p = tmp;
 + }
 +}
   /*
* Parse a configuration file read into memory as a string.
*/
 

I get a kernel oops when unloading the driver with this patch.

2014-07-17T09:22:12.666987-04:00 harley kernel: [60216.979134] task: 
8801037846b0 ti: 880149256000 task.ti: 880149256000
2014-07-17T09:22:12.666988-04:00 harley kernel: [60216.979136] RIP: 
0010:[8034d5ff]  [8034d5ff] kfree+0x17f/0x190
2014-07-17T09:22:12.666989-04:00 harley kernel: [60216.979143] RSP: 
0018:880149257e78  EFLAGS: 00010246
2014-07-17T09:22:12.666991-04:00 harley kernel: [60216.979144] RAX: 
40080068 RBX: a0428d20 RCX: 038eb895
2014-07-17T09:22:12.666992-04:00 harley kernel: [60216.979146] RDX: 
40080068 RSI: ea0004131e00 RDI: a0428d20
2014-07-17T09:22:12.666993-04:00 harley kernel: [60216.979147] RBP: 
880149257e90 R08: 00015b60 R09: 88014fd55b60
2014-07-17T09:22:12.666994-04:00 harley kernel: [60216.979149] R10: 
ea810a00 R11: a0424ffd R12: 88005a3d2a80
2014-07-17T09:22:12.666995-04:00 harley kernel: [60216.979150] R13: 
a041c155 R14: 0002 R15: 0bc0
2014-07-17T09:22:12.666996-04:00 harley kernel: [60216.979152] FS:  
7fb7d51d2700() GS:88014fd4() knlGS:
2014-07-17T09:22:12.666997-04:00 harley kernel: [60216.979154] CS:  0010 DS: 
 ES:  CR0: 8005003b
2014-07-17T09:22:12.666998-04:00 harley kernel: [60216.979155] CR2: 
7fe66ee4e000 CR3: 0001497af000 CR4: 000407e0
2014-07-17T09:22:12.666999-04:00 harley kernel: [60216.979156] Stack:
2014-07-17T09:22:12.667000-04:00 harley kernel: [60216.979157]  
a0428d20 88005a3d2a80 8801259dd330 880149257eb0
2014-07-17T09:22:12.667001-04:00 harley kernel: [60216.979161]  
a041c155 0700 8801259dd000 880149257ee8
2014-07-17T09:22:12.667002-04:00 harley kernel: [60216.979164]  
a0424ee8  80c92700 a04289e0
2014-07-17T09:22:12.667002-04:00 harley kernel:

Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()

2014-07-16 Thread Mark Hounschell


On 07/16/2014 05:26 AM, DaeSeok Youn wrote:

2014-07-16 8:50 GMT+09:00 Greg KH :

On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote:

Hi,

2014-07-16 0:29 GMT+09:00 Greg KH :

On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote:

The dgap_err() is printing a message with pr_err(),
so all those are replaced.

Use definition "pr_fmt" and then all of "dgap:" in
the beginning of print messages are removed.

And also removed "out of memory" message because
the kernel has own message for that.

Signed-off-by: Daeseok Youn 
---
V2: use pr_fmt "dgap:" prefix on print message on dgap.
 remove "out of memory" message.

 Adds Mark to TO list and CC list for checking send
 this email properly to him.

  drivers/staging/dgap/dgap.c |  306 +++
  1 files changed, 133 insertions(+), 173 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index 06c55cb..9e750fb 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -41,6 +41,8 @@
   */
  #undef DIGI_CONCENTRATORS_SUPPORTED

+#define pr_fmt(fmt) "dgap: " fmt
+
  #include 
  #include 
  #include 
@@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch);
  static int dgap_gettok(char **in);
  static char *dgap_getword(char **in);
  static int dgap_checknode(struct cnode *p);
-static void dgap_err(char *s);

  /*
   * Function prototypes from dgap_sysfs.h
@@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev 
*pdev, int id,
   if (ret)
   goto free_brd;

- pr_info("dgap: board %d: %s (rev %d), irq %ld\n",
+ pr_info("board %d: %s (rev %d), irq %ld\n",
   boardnum, brd->name, brd->rev, brd->irq);


Almost all of the pr_*() calls in this driver should be converted over
to use dev_*() calls instead.  And some of them, like this one, should
be removed entirely (no need for a driver to be "noisy" when a device
for it is found, it should be quiet if at all possible, unless something
went wrong.)

So can you do that here instead?  I've applied the earlier patches in
this series, and stopped here.

OK. I can. pr_*() calls are replaced with dev_*() calls.
And also removes some of print message which are useless like "out
of memory"


Yes, please do that, that would be great.

I have been working to change pr_*() to dev_*(), but dgap_parse() has no
"struct device" for using dev_*(). If dgap_parse still need for this driver,
it need to take a parameter for using dev_*(), it may be "pdev" but
configuration
file doesn't need to parse in kernel at all, dgap_parse() will be removed.

So I will wait to verify by Mark about parsing configuration file.

Thanks.

regards,
Daeseok Youn.



Hi Daeseok,

I would wait on that one for now. I know that code has to be removed 
eventually. I'm just not sure if we are quite ready. That is actually a 
LOT of lines of code also. I think a couple of things need done first.


Here is a sample config file created by one of DIGI's user land 
applications (mpi). These entries are only for 2 different cards. There 
are others cards that may have other things to consider. I only have 
these 2 cards types now. I had a third type which is just a 2 port but 
that one is gone now.


config_begin
board   Digi_AccelePort_8r_920_PCI
io 0x000
mem 0x00
start 1
nports 8
ttyname ttya
altpin 0
useintr 0
board   Digi_AccelePort_4r_920_PCI
io 0x000
mem 0x00
start 1
nports 4
ttyname ttyb
altpin 0
useintr 0
board   Digi_AccelePort_8r_920_PCI
io 0x000
mem 0x00
start 1
nports 8
ttyname ttyc
altpin 0
useintr 0
config_end

The altpin and useintr parameters need to be converted to module 
parameters. I found references in the code somewhere that the nports may 
not be reliably known using the device id for at least one card type. 
All the other stuff in this particular config file is pretty much 
useless. Well, sort of. The ttyname parameter is used by the driver to 
populate a "sys" file with a custom device name that is then used by a 
userland script and udev to allow a user  to define his own device 
names. Custom links are created. Perhaps this also would be nice to have 
as a module param?


Also, there is a userland utility (dpa) that is used to set some 
parameters of the card. Sort of like stty except it is for "special" 
things that the kernel does not directly support. Like special baud 
rates and such. I'm not sure what to do about that one. I personally 
have never used these "special" things but I'm sure they are used by 
someone somewhere?? I saw some code removed related to "dpa" recently. 
This came to mind.


Regards
Mark






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read

Re: [PATCH 7/8 RESEND] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-16 Thread Mark Hounschell


On 07/15/2014 11:30 AM, Greg KH wrote:

On Tue, Jul 15, 2014 at 06:14:25PM +0900, Daeseok Youn wrote:

When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.


configuration files should not be parsed in the kernel at all.  That
logic should be removed as it should not be needed.

Mark, can you verify that this is not needed with your hardware anymore?



As far as I can see it is still needed. If no config file, no devices 
are created and configured. I had planned to remove that code such that 
all the different cards supported are still supported. Even though I 
only have 2 different types of cards. I Just haven't got to it yet. I 
realize "configuration files should not be parsed in the kernel" and I'm 
certainly not going to argue that. But the way it is, it allows user 
land to define board order and even device node names (additional links 
to the actual device). No matter where boards are installed and found by 
the kernel, you have persistent naming.  I guess all that can be done 
with udev (well, sort of anyway).


There are also options in the config file that should be passed in as 
module parameters

and I planned to do that part before messing with the config file code.

We also have this "DIGI_EXPANDERS_SUPPORTED" thingy. I have none of 
these cards. These cards interrogate the expander to find out how many 
ports there are. Then configure accordingly. Should we just remove all 
that code and NOT support expanders since we don't currently have the 
hardware to test?


I will update from your current git tree and test this week. I've been 
swamped lately and unable to do much. I thought I'd be available to do 
some more work but things come up as you know.


Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 7/8 RESEND] staging: dgap: introduce dgap_cleanup_nodes()

2014-07-16 Thread Mark Hounschell


On 07/15/2014 11:30 AM, Greg KH wrote:

On Tue, Jul 15, 2014 at 06:14:25PM +0900, Daeseok Youn wrote:

When a configration file is parsed with dgap_parsefile(),
makes nodes for saving configrations for board.


configuration files should not be parsed in the kernel at all.  That
logic should be removed as it should not be needed.

Mark, can you verify that this is not needed with your hardware anymore?



As far as I can see it is still needed. If no config file, no devices 
are created and configured. I had planned to remove that code such that 
all the different cards supported are still supported. Even though I 
only have 2 different types of cards. I Just haven't got to it yet. I 
realize configuration files should not be parsed in the kernel and I'm 
certainly not going to argue that. But the way it is, it allows user 
land to define board order and even device node names (additional links 
to the actual device). No matter where boards are installed and found by 
the kernel, you have persistent naming.  I guess all that can be done 
with udev (well, sort of anyway).


There are also options in the config file that should be passed in as 
module parameters

and I planned to do that part before messing with the config file code.

We also have this DIGI_EXPANDERS_SUPPORTED thingy. I have none of 
these cards. These cards interrogate the expander to find out how many 
ports there are. Then configure accordingly. Should we just remove all 
that code and NOT support expanders since we don't currently have the 
hardware to test?


I will update from your current git tree and test this week. I've been 
swamped lately and unable to do much. I thought I'd be available to do 
some more work but things come up as you know.


Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/8 V2] staging: dgap: remove unneeded dgap_err()

2014-07-16 Thread Mark Hounschell


On 07/16/2014 05:26 AM, DaeSeok Youn wrote:

2014-07-16 8:50 GMT+09:00 Greg KH gre...@linuxfoundation.org:

On Wed, Jul 16, 2014 at 08:21:30AM +0900, DaeSeok Youn wrote:

Hi,

2014-07-16 0:29 GMT+09:00 Greg KH gre...@linuxfoundation.org:

On Tue, Jul 15, 2014 at 06:11:44PM +0900, Daeseok Youn wrote:

The dgap_err() is printing a message with pr_err(),
so all those are replaced.

Use definition pr_fmt and then all of dgap: in
the beginning of print messages are removed.

And also removed out of memory message because
the kernel has own message for that.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
V2: use pr_fmt dgap: prefix on print message on dgap.
 remove out of memory message.

 Adds Mark to TO list and CC list for checking send
 this email properly to him.

  drivers/staging/dgap/dgap.c |  306 +++
  1 files changed, 133 insertions(+), 173 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index 06c55cb..9e750fb 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -41,6 +41,8 @@
   */
  #undef DIGI_CONCENTRATORS_SUPPORTED

+#define pr_fmt(fmt) dgap:  fmt
+
  #include linux/kernel.h
  #include linux/module.h
  #include linux/pci.h
@@ -153,7 +155,6 @@ static void dgap_firmware_reset_port(struct channel_t *ch);
  static int dgap_gettok(char **in);
  static char *dgap_getword(char **in);
  static int dgap_checknode(struct cnode *p);
-static void dgap_err(char *s);

  /*
   * Function prototypes from dgap_sysfs.h
@@ -815,7 +816,7 @@ static struct board_t *dgap_found_board(struct pci_dev 
*pdev, int id,
   if (ret)
   goto free_brd;

- pr_info(dgap: board %d: %s (rev %d), irq %ld\n,
+ pr_info(board %d: %s (rev %d), irq %ld\n,
   boardnum, brd-name, brd-rev, brd-irq);


Almost all of the pr_*() calls in this driver should be converted over
to use dev_*() calls instead.  And some of them, like this one, should
be removed entirely (no need for a driver to be noisy when a device
for it is found, it should be quiet if at all possible, unless something
went wrong.)

So can you do that here instead?  I've applied the earlier patches in
this series, and stopped here.

OK. I can. pr_*() calls are replaced with dev_*() calls.
And also removes some of print message which are useless like out
of memory


Yes, please do that, that would be great.

I have been working to change pr_*() to dev_*(), but dgap_parse() has no
struct device for using dev_*(). If dgap_parse still need for this driver,
it need to take a parameter for using dev_*(), it may be pdev but
configuration
file doesn't need to parse in kernel at all, dgap_parse() will be removed.

So I will wait to verify by Mark about parsing configuration file.

Thanks.

regards,
Daeseok Youn.



Hi Daeseok,

I would wait on that one for now. I know that code has to be removed 
eventually. I'm just not sure if we are quite ready. That is actually a 
LOT of lines of code also. I think a couple of things need done first.


Here is a sample config file created by one of DIGI's user land 
applications (mpi). These entries are only for 2 different cards. There 
are others cards that may have other things to consider. I only have 
these 2 cards types now. I had a third type which is just a 2 port but 
that one is gone now.


config_begin
board   Digi_AccelePort_8r_920_PCI
io 0x000
mem 0x00
start 1
nports 8
ttyname ttya
altpin 0
useintr 0
board   Digi_AccelePort_4r_920_PCI
io 0x000
mem 0x00
start 1
nports 4
ttyname ttyb
altpin 0
useintr 0
board   Digi_AccelePort_8r_920_PCI
io 0x000
mem 0x00
start 1
nports 8
ttyname ttyc
altpin 0
useintr 0
config_end

The altpin and useintr parameters need to be converted to module 
parameters. I found references in the code somewhere that the nports may 
not be reliably known using the device id for at least one card type. 
All the other stuff in this particular config file is pretty much 
useless. Well, sort of. The ttyname parameter is used by the driver to 
populate a sys file with a custom device name that is then used by a 
userland script and udev to allow a user  to define his own device 
names. Custom links are created. Perhaps this also would be nice to have 
as a module param?


Also, there is a userland utility (dpa) that is used to set some 
parameters of the card. Sort of like stty except it is for special 
things that the kernel does not directly support. Like special baud 
rates and such. I'm not sure what to do about that one. I personally 
have never used these special things but I'm sure they are used by 
someone somewhere?? I saw some code removed related to dpa recently. 
This came to mind.


Regards
Mark






--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org

Re: [PATCH V2] staging: dgap: implement proper error handling in dgap_firmware_load()

2014-05-28 Thread Mark Hounschell

On 05/28/2014 06:11 AM, Dan Carpenter wrote:
> On Wed, May 28, 2014 at 06:29:38PM +0900, DaeSeok Youn wrote:
>>> In your patch it has:
>>> +   dgap_tty_uninit(brd, false);
>>>
>>> But it should only be "false" if dgap_tty_init() failed.  If
>>> dgap_tty_register_ports() fails then it should be "true".  Another
>> Yes, you're right. There were no error handle for tty_port_register_device() 
>> and
>> dgap_create_tty_sysfs() in dgap_tty_register_ports(). I didn't catch it. :-(
>> It need to add error handlers for them, right?
> 
> Eventually, yes.  But I don't see a simple way to fix
> dgap_firmware_load() until after the code is cleaned up.
> 
>>
>>> problem is that as you say, the earlier function are allocating
>>> resources like dgap_tty_register() but only the last two function calls
>>> have a "goto err_cleanup;" so the error handling is incomplete.
>> So remove "goto" in dgap_firmware_load() and add error handler in
>> dgap_tty_init()
> 
> In the current code there isn't a goto in dgap_firmware_load().  Remove
> the call to dgap_tty_uninit() and add error handling in dgap_tty_init().
> 
> That will clean up the code, and fix some NULL dereference bugs inside
> dgap_tty_uninit().
> 
>> and dgap_tty_register_ports(), right?
> 
> Inside dgap_tty_register_ports(), then we should add a
> kfree(brd->serial_ports) if the "brd->printer_ports" allocation fails.
> That is not a complete fix, but it is a part fix and it is clean.
> 
>>
>> I have a question of this. In case of this, how to complete the error 
>> handling?
> 
> [patch 1/x] staging: dgap: remove useless dgap_probe1() function
> [patch 2/x] staging: dgap: unwind on error in dgap_found_board()
> [patch 3/x] staging: dgap: remove bogus null test in dgap_tty_init()
>   The ->channels[] were set to null in dgap_found_board().
> [patch 4/x] staging: dgap: unwind on error in dgap_tty_init()
>   This also removes the call to dgap_tty_uninit() in
>   dgap_firmware_load()
> [patch 5/x] staging: dgap: unwind on error in dgap_tty_register_ports()
> [patch 6/x] staging: dgap: make dgap_config_buf a local buffer
> [patch 7/x] staging: dgap: pass "dgap_numboards" to dgap_found_board()
>   instead of using a global variable
> [patch 8/x] staging: dgap: pass "brd" to dgap_after_config_loaded()
>   instead of passing "dgap_numboards" and looking up brd again.
> [patch 9/x] staging: dgap: rename dgap_finalize_board_init() to 
> dgap_request_irq()
> 
> In the end, I hate dgap_tty_uninit() because it doesn't match
> dgap_tty_init() at all.  It's poorly named.  We should rename it and
> make another dgap_tty_init() which just sets the ->channels[] to NULL.
> 
> [patch x/x] staging: dgap: introduce dgap_tty_unregister()
>   This is currently done in dgap_tty_uninit(), which is the wrong
>   place.
> 
> I have started using a new todo list tag in my emails.  So I'm adding
> this stuff to the todo list.
> 
> TODO-list: 2014-05-28: dgap: cleanups and bug fixes.
> 

I can think of some more things that could be added to that TODO-list.

All the configuration file stuff needs to go away. As Greg previously
said, we don't read configurations files in kernel modules. I'm pretty
sure all cards supported have unique device IDs, even though notes and
code in this driver imply otherwise. As long as they all do, with the
exception of those cards that use port extenders, I think this could be
done. Maybe we will never be able to support "port extenders". Unclear
yet. When we do this, the useintr and altpin configuration file stuff
will need converted to module options.

There was also some ioctl code removed entirely before the driver was
merged into staging. It was specific to the dgap_mgmt device (currently
dead as a result) that allowed userland some real-time monitoring and
configuration changes. These userland utilities were part of the
original Digi package and I'm sure are used by some. I personally never
used them, but...

Then there are some things you recommended in a previous email that I
haven't got to yet?

Is that TODO-list going to be commited?

Regards
Mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2] staging: dgap: implement proper error handling in dgap_firmware_load()

2014-05-28 Thread Mark Hounschell

On 05/28/2014 06:11 AM, Dan Carpenter wrote:
 On Wed, May 28, 2014 at 06:29:38PM +0900, DaeSeok Youn wrote:
 In your patch it has:
 +   dgap_tty_uninit(brd, false);

 But it should only be false if dgap_tty_init() failed.  If
 dgap_tty_register_ports() fails then it should be true.  Another
 Yes, you're right. There were no error handle for tty_port_register_device() 
 and
 dgap_create_tty_sysfs() in dgap_tty_register_ports(). I didn't catch it. :-(
 It need to add error handlers for them, right?
 
 Eventually, yes.  But I don't see a simple way to fix
 dgap_firmware_load() until after the code is cleaned up.
 

 problem is that as you say, the earlier function are allocating
 resources like dgap_tty_register() but only the last two function calls
 have a goto err_cleanup; so the error handling is incomplete.
 So remove goto in dgap_firmware_load() and add error handler in
 dgap_tty_init()
 
 In the current code there isn't a goto in dgap_firmware_load().  Remove
 the call to dgap_tty_uninit() and add error handling in dgap_tty_init().
 
 That will clean up the code, and fix some NULL dereference bugs inside
 dgap_tty_uninit().
 
 and dgap_tty_register_ports(), right?
 
 Inside dgap_tty_register_ports(), then we should add a
 kfree(brd-serial_ports) if the brd-printer_ports allocation fails.
 That is not a complete fix, but it is a part fix and it is clean.
 

 I have a question of this. In case of this, how to complete the error 
 handling?
 
 [patch 1/x] staging: dgap: remove useless dgap_probe1() function
 [patch 2/x] staging: dgap: unwind on error in dgap_found_board()
 [patch 3/x] staging: dgap: remove bogus null test in dgap_tty_init()
   The -channels[] were set to null in dgap_found_board().
 [patch 4/x] staging: dgap: unwind on error in dgap_tty_init()
   This also removes the call to dgap_tty_uninit() in
   dgap_firmware_load()
 [patch 5/x] staging: dgap: unwind on error in dgap_tty_register_ports()
 [patch 6/x] staging: dgap: make dgap_config_buf a local buffer
 [patch 7/x] staging: dgap: pass dgap_numboards to dgap_found_board()
   instead of using a global variable
 [patch 8/x] staging: dgap: pass brd to dgap_after_config_loaded()
   instead of passing dgap_numboards and looking up brd again.
 [patch 9/x] staging: dgap: rename dgap_finalize_board_init() to 
 dgap_request_irq()
 
 In the end, I hate dgap_tty_uninit() because it doesn't match
 dgap_tty_init() at all.  It's poorly named.  We should rename it and
 make another dgap_tty_init() which just sets the -channels[] to NULL.
 
 [patch x/x] staging: dgap: introduce dgap_tty_unregister()
   This is currently done in dgap_tty_uninit(), which is the wrong
   place.
 
 I have started using a new todo list tag in my emails.  So I'm adding
 this stuff to the todo list.
 
 TODO-list: 2014-05-28: dgap: cleanups and bug fixes.
 

I can think of some more things that could be added to that TODO-list.

All the configuration file stuff needs to go away. As Greg previously
said, we don't read configurations files in kernel modules. I'm pretty
sure all cards supported have unique device IDs, even though notes and
code in this driver imply otherwise. As long as they all do, with the
exception of those cards that use port extenders, I think this could be
done. Maybe we will never be able to support port extenders. Unclear
yet. When we do this, the useintr and altpin configuration file stuff
will need converted to module options.

There was also some ioctl code removed entirely before the driver was
merged into staging. It was specific to the dgap_mgmt device (currently
dead as a result) that allowed userland some real-time monitoring and
configuration changes. These userland utilities were part of the
original Digi package and I'm sure are used by some. I personally never
used them, but...

Then there are some things you recommended in a previous email that I
haven't got to yet?

Is that TODO-list going to be commited?

Regards
Mark

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()

2014-04-25 Thread Mark Hounschell

On 04/25/2014 08:59 AM, Dan Carpenter wrote:
> On Fri, Apr 25, 2014 at 08:29:41AM -0400, Mark Hounschell wrote:
>> On 04/25/2014 07:02 AM, DaeSeok Youn wrote:
>>> Hi, Dan.
>>>
>>> 2014-04-25 18:26 GMT+09:00 Dan Carpenter :
>>>> Mark, maybe you should add yourself to the MAINTAINERS entry for this
>>>> driver?
>>>>
>>
>> I'll look into this. I am clueless on what that would actually mean.
>>
> 
> Just add your name with Lidza in the MAINTAINERS file so that people
> will CC you on all the patches.
> 
> DIGI EPCA PCI PRODUCTS
> M:  Lidza Louina 
> L:  driverdev-de...@linuxdriverproject.org
> S:  Maintained
> F:  drivers/staging/dgap/
> 
> You don't have to do it if you don't want to, but you seem to be working
> on this driver and I'm going to refer questions to you either way.  :P
> 
>>>> On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote:
>>>>> @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd)
>>>>>   /* Register tty devices */
>>>>>   rc = tty_register_driver(brd->SerialDriver);
>>>>>   if (rc < 0)
>>>>> - return rc;
>>>>> + goto free_print_ttys;
>>>>> +
>>>>>   brd->dgap_Major_Serial_Registered = TRUE;
>>>>>   dgap_BoardsByMajor[brd->SerialDriver->major] = brd;
>>>>>   brd->dgap_Serial_Major = brd->SerialDriver->major;
>>>>> @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd)
>>>>>   /* Register Transparent Print devices */
>>>>>   rc = tty_register_driver(brd->PrintDriver);
>>>>>   if (rc < 0)
>>>>> - return rc;
>>>>> + goto unregister_serial_drv;
>>>>> +
>>>>>   brd->dgap_Major_TransparentPrint_Registered = TRUE;
>>>>>   dgap_BoardsByMajor[brd->PrintDriver->major] = brd;
>>>>>   brd->dgap_TransparentPrint_Major = brd->PrintDriver->major;
>>>>>   }
>>>>>
>>>>>   return rc;
>>>>> +
>>>>> +unregister_serial_drv:
>>>>> + tty_unregister_driver(brd->SerialDriver);
>>>>
>>>> We only register the ->SerialDriver if someone else hasn't registered it
>>>> first?  So this should be:
>>>>
>>>> if (we_were_the_ones_who_registered_the_serial_driver)
>>>> tty_unregister_driver(brd->SerialDriver);
>>>>
>>>> I haven't followed looked at this.  Who else is registering the serial
>>>> driver?  You have looked at this, what do you think?  Or Mark.
>>>
>>
>> registering the brd->XDriver is only done when a board is detected
>> and only during the firmware_load process. If we fail to
>> tty_register_driver do we _need_ to tty_unregister_driver? Isn't that
>> like freeing after an alloc failure?
> 
> The allocation is conditional so the free should be conditional.  If we
> didn't allocate it, then we shouldn't free it.
> 
> It wouldn't have even been a question except I'm not sure the allocation
> is *really* conditional because brd->dgap_Major_Serial_Registered might
> always be "false" like you guys seem to be saying.
> 
>>
>>> I think brd struct is from dgap_Board array as global static variable
>>> when this function is
>>> called. So brd->dgap_Major_Serial_Registered is always "false".
>>> If dgap_NumBoards is less than MAXBOARDS, brd->SerialDriver should be
>>> registered.
>>>
>>> I'm not sure..
>>>
>>
>> I don't see any check for (dgap_NumBoards <  MAXBOARDS), which I think I
>> probably should, but I do see we are calling dgap_tty_register, which
>> can fail, without actually checking the return value. Also, yes,
>> dgap_Major_Xxxx_Registered seems to be always "false" until registered,
>> and it looks like dgap_Major_X_Registered flags could be removed
>> because the only places we can unregister is at module_cleanup or
>> "after" it is already registered.
>>
>> What is the driver _supposed_ to do if we fail something on the second
>> or later board? Is the driver supposed to cleanup and exit or are we
>> supposed to stay loaded for the board/boards that are usable?
> 
> Stay loaded.
> 

Then these tests on brd->dgap_Major_Serial_Registered need to stay in
there. If I have 3 boards and the second fails in some way, if I rmmod
the driver they will protect from unregistering a never registered one.
At least in the unregister code path. There is probably no need for them
in the register code path. I'll work up a patch for this.

I need to look at the (dgap_NumBoards <  MAXBOARDS) thing too.

Mark


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()

2014-04-25 Thread Mark Hounschell

On 04/25/2014 07:02 AM, DaeSeok Youn wrote:
> Hi, Dan.
> 
> 2014-04-25 18:26 GMT+09:00 Dan Carpenter :
>> Mark, maybe you should add yourself to the MAINTAINERS entry for this
>> driver?
>>

I'll look into this. I am clueless on what that would actually mean.

>> On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote:
>>> @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd)
>>>   /* Register tty devices */
>>>   rc = tty_register_driver(brd->SerialDriver);
>>>   if (rc < 0)
>>> - return rc;
>>> + goto free_print_ttys;
>>> +
>>>   brd->dgap_Major_Serial_Registered = TRUE;
>>>   dgap_BoardsByMajor[brd->SerialDriver->major] = brd;
>>>   brd->dgap_Serial_Major = brd->SerialDriver->major;
>>> @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd)
>>>   /* Register Transparent Print devices */
>>>   rc = tty_register_driver(brd->PrintDriver);
>>>   if (rc < 0)
>>> - return rc;
>>> + goto unregister_serial_drv;
>>> +
>>>   brd->dgap_Major_TransparentPrint_Registered = TRUE;
>>>   dgap_BoardsByMajor[brd->PrintDriver->major] = brd;
>>>   brd->dgap_TransparentPrint_Major = brd->PrintDriver->major;
>>>   }
>>>
>>>   return rc;
>>> +
>>> +unregister_serial_drv:
>>> + tty_unregister_driver(brd->SerialDriver);
>>
>> We only register the ->SerialDriver if someone else hasn't registered it
>> first?  So this should be:
>>
>> if (we_were_the_ones_who_registered_the_serial_driver)
>> tty_unregister_driver(brd->SerialDriver);
>>
>> I haven't followed looked at this.  Who else is registering the serial
>> driver?  You have looked at this, what do you think?  Or Mark.
> 

registering the brd->XDriver is only done when a board is detected
and only during the firmware_load process. If we fail to
tty_register_driver do we _need_ to tty_unregister_driver? Isn't that
like freeing after an alloc failure?

> I think brd struct is from dgap_Board array as global static variable
> when this function is
> called. So brd->dgap_Major_Serial_Registered is always "false".
> If dgap_NumBoards is less than MAXBOARDS, brd->SerialDriver should be
> registered.
> 
> I'm not sure..
> 

I don't see any check for (dgap_NumBoards <  MAXBOARDS), which I think I
probably should, but I do see we are calling dgap_tty_register, which
can fail, without actually checking the return value. Also, yes,
dgap_Major_Xxxx_Registered seems to be always "false" until registered,
and it looks like dgap_Major_X_Registered flags could be removed
because the only places we can unregister is at module_cleanup or
"after" it is already registered.

What is the driver _supposed_ to do if we fail something on the second
or later board? Is the driver supposed to cleanup and exit or are we
supposed to stay loaded for the board/boards that are usable?

Regards
mark

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()

2014-04-25 Thread Mark Hounschell

On 04/25/2014 07:02 AM, DaeSeok Youn wrote:
 Hi, Dan.
 
 2014-04-25 18:26 GMT+09:00 Dan Carpenter dan.carpen...@oracle.com:
 Mark, maybe you should add yourself to the MAINTAINERS entry for this
 driver?


I'll look into this. I am clueless on what that would actually mean.

 On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote:
 @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd)
   /* Register tty devices */
   rc = tty_register_driver(brd-SerialDriver);
   if (rc  0)
 - return rc;
 + goto free_print_ttys;
 +
   brd-dgap_Major_Serial_Registered = TRUE;
   dgap_BoardsByMajor[brd-SerialDriver-major] = brd;
   brd-dgap_Serial_Major = brd-SerialDriver-major;
 @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd)
   /* Register Transparent Print devices */
   rc = tty_register_driver(brd-PrintDriver);
   if (rc  0)
 - return rc;
 + goto unregister_serial_drv;
 +
   brd-dgap_Major_TransparentPrint_Registered = TRUE;
   dgap_BoardsByMajor[brd-PrintDriver-major] = brd;
   brd-dgap_TransparentPrint_Major = brd-PrintDriver-major;
   }

   return rc;
 +
 +unregister_serial_drv:
 + tty_unregister_driver(brd-SerialDriver);

 We only register the -SerialDriver if someone else hasn't registered it
 first?  So this should be:

 if (we_were_the_ones_who_registered_the_serial_driver)
 tty_unregister_driver(brd-SerialDriver);

 I haven't followed looked at this.  Who else is registering the serial
 driver?  You have looked at this, what do you think?  Or Mark.
 

registering the brd-XDriver is only done when a board is detected
and only during the firmware_load process. If we fail to
tty_register_driver do we _need_ to tty_unregister_driver? Isn't that
like freeing after an alloc failure?

 I think brd struct is from dgap_Board array as global static variable
 when this function is
 called. So brd-dgap_Major_Serial_Registered is always false.
 If dgap_NumBoards is less than MAXBOARDS, brd-SerialDriver should be
 registered.
 
 I'm not sure..
 

I don't see any check for (dgap_NumBoards   MAXBOARDS), which I think I
probably should, but I do see we are calling dgap_tty_register, which
can fail, without actually checking the return value. Also, yes,
dgap_Major_Xxxx_Registered seems to be always false until registered,
and it looks like dgap_Major_X_Registered flags could be removed
because the only places we can unregister is at module_cleanup or
after it is already registered.

What is the driver _supposed_ to do if we fail something on the second
or later board? Is the driver supposed to cleanup and exit or are we
supposed to stay loaded for the board/boards that are usable?

Regards
mark


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: implement error handling in dgap_tty_register()

2014-04-25 Thread Mark Hounschell

On 04/25/2014 08:59 AM, Dan Carpenter wrote:
 On Fri, Apr 25, 2014 at 08:29:41AM -0400, Mark Hounschell wrote:
 On 04/25/2014 07:02 AM, DaeSeok Youn wrote:
 Hi, Dan.

 2014-04-25 18:26 GMT+09:00 Dan Carpenter dan.carpen...@oracle.com:
 Mark, maybe you should add yourself to the MAINTAINERS entry for this
 driver?


 I'll look into this. I am clueless on what that would actually mean.

 
 Just add your name with Lidza in the MAINTAINERS file so that people
 will CC you on all the patches.
 
 DIGI EPCA PCI PRODUCTS
 M:  Lidza Louina lidza.lou...@gmail.com
 L:  driverdev-de...@linuxdriverproject.org
 S:  Maintained
 F:  drivers/staging/dgap/
 
 You don't have to do it if you don't want to, but you seem to be working
 on this driver and I'm going to refer questions to you either way.  :P
 
 On Fri, Apr 25, 2014 at 04:04:59PM +0900, Daeseok Youn wrote:
 @@ -1263,7 +1277,8 @@ static int dgap_tty_register(struct board_t *brd)
   /* Register tty devices */
   rc = tty_register_driver(brd-SerialDriver);
   if (rc  0)
 - return rc;
 + goto free_print_ttys;
 +
   brd-dgap_Major_Serial_Registered = TRUE;
   dgap_BoardsByMajor[brd-SerialDriver-major] = brd;
   brd-dgap_Serial_Major = brd-SerialDriver-major;
 @@ -1273,13 +1288,29 @@ static int dgap_tty_register(struct board_t *brd)
   /* Register Transparent Print devices */
   rc = tty_register_driver(brd-PrintDriver);
   if (rc  0)
 - return rc;
 + goto unregister_serial_drv;
 +
   brd-dgap_Major_TransparentPrint_Registered = TRUE;
   dgap_BoardsByMajor[brd-PrintDriver-major] = brd;
   brd-dgap_TransparentPrint_Major = brd-PrintDriver-major;
   }

   return rc;
 +
 +unregister_serial_drv:
 + tty_unregister_driver(brd-SerialDriver);

 We only register the -SerialDriver if someone else hasn't registered it
 first?  So this should be:

 if (we_were_the_ones_who_registered_the_serial_driver)
 tty_unregister_driver(brd-SerialDriver);

 I haven't followed looked at this.  Who else is registering the serial
 driver?  You have looked at this, what do you think?  Or Mark.


 registering the brd-XDriver is only done when a board is detected
 and only during the firmware_load process. If we fail to
 tty_register_driver do we _need_ to tty_unregister_driver? Isn't that
 like freeing after an alloc failure?
 
 The allocation is conditional so the free should be conditional.  If we
 didn't allocate it, then we shouldn't free it.
 
 It wouldn't have even been a question except I'm not sure the allocation
 is *really* conditional because brd-dgap_Major_Serial_Registered might
 always be false like you guys seem to be saying.
 

 I think brd struct is from dgap_Board array as global static variable
 when this function is
 called. So brd-dgap_Major_Serial_Registered is always false.
 If dgap_NumBoards is less than MAXBOARDS, brd-SerialDriver should be
 registered.

 I'm not sure..


 I don't see any check for (dgap_NumBoards   MAXBOARDS), which I think I
 probably should, but I do see we are calling dgap_tty_register, which
 can fail, without actually checking the return value. Also, yes,
 dgap_Major_Xxxx_Registered seems to be always false until registered,
 and it looks like dgap_Major_X_Registered flags could be removed
 because the only places we can unregister is at module_cleanup or
 after it is already registered.

 What is the driver _supposed_ to do if we fail something on the second
 or later board? Is the driver supposed to cleanup and exit or are we
 supposed to stay loaded for the board/boards that are usable?
 
 Stay loaded.
 

Then these tests on brd-dgap_Major_Serial_Registered need to stay in
there. If I have 3 boards and the second fails in some way, if I rmmod
the driver they will protect from unregistering a never registered one.
At least in the unregister code path. There is probably no need for them
in the register code path. I'll work up a patch for this.

I need to look at the (dgap_NumBoards   MAXBOARDS) thing too.

Mark


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: remove useless cast on kzalloc()

2014-03-06 Thread Mark Hounschell


On 03/06/2014 01:17 AM, Daeseok Youn wrote:


coccinelle warning:
drivers/staging/dgap/dgap.c:782:3-7: WARNING:
  casting value returned by k[cmz]alloc to (char *) is useless.
drivers/staging/dgap/dgap.c:776:2-16: WARNING:
  casting value returned by k[cmz]alloc to (struct board_t *) is useless.

Signed-off-by: Daeseok Youn 
---
  drivers/staging/dgap/dgap.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index cbce457..1adcd13 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -773,13 +773,13 @@ static int dgap_found_board(struct pci_dev *pdev, int id)

/* get the board structure and prep it */
brd = dgap_Board[dgap_NumBoards] =
-   (struct board_t *) kzalloc(sizeof(struct board_t), GFP_KERNEL);
+   kzalloc(sizeof(struct board_t), GFP_KERNEL);
if (!brd)
return -ENOMEM;

/* make a temporary message buffer for the boot messages */
brd->msgbuf = brd->msgbuf_head =
-   (char *) kzalloc(sizeof(char) * 8192, GFP_KERNEL);
+   kzalloc(sizeof(char) * 8192, GFP_KERNEL);
if (!brd->msgbuf) {
kfree(brd);
return -ENOMEM;



I'm pretty sure this has already been fixed up in current staging-next.

Thanks
Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: dgap: remove useless cast on kzalloc()

2014-03-06 Thread Mark Hounschell


On 03/06/2014 01:17 AM, Daeseok Youn wrote:


coccinelle warning:
drivers/staging/dgap/dgap.c:782:3-7: WARNING:
  casting value returned by k[cmz]alloc to (char *) is useless.
drivers/staging/dgap/dgap.c:776:2-16: WARNING:
  casting value returned by k[cmz]alloc to (struct board_t *) is useless.

Signed-off-by: Daeseok Youn daeseok.y...@gmail.com
---
  drivers/staging/dgap/dgap.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index cbce457..1adcd13 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -773,13 +773,13 @@ static int dgap_found_board(struct pci_dev *pdev, int id)

/* get the board structure and prep it */
brd = dgap_Board[dgap_NumBoards] =
-   (struct board_t *) kzalloc(sizeof(struct board_t), GFP_KERNEL);
+   kzalloc(sizeof(struct board_t), GFP_KERNEL);
if (!brd)
return -ENOMEM;

/* make a temporary message buffer for the boot messages */
brd-msgbuf = brd-msgbuf_head =
-   (char *) kzalloc(sizeof(char) * 8192, GFP_KERNEL);
+   kzalloc(sizeof(char) * 8192, GFP_KERNEL);
if (!brd-msgbuf) {
kfree(brd);
return -ENOMEM;



I'm pretty sure this has already been fixed up in current staging-next.

Thanks
Mark
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 >

1 - 100 of 236 matches

Mail list logo