4.9.130: CPU soft lockups and other weird memory errors

2019-04-09 Thread Christoph Anton Mitterer
Hey.

Perhaps anyone can help with the following, which is a problem at a
mass storage system cluster at the physics faculty here:

The cluster consists of 40 nodes all running Debian stable with a
4.9.130 kernel serving some ~3 PiB storage via 10GbE networking.
Part of the nodes are some Dell PowerEdges/PowerVaults, the others are
HP ProLiant DL380 Gen9.
All of them have basically the same configuration (except of course
obvious things like IP addresses, etc.) and all should have plenty
memory (the HPs 64 GiB, the Dells 32 GiB).

The following two(?) problems occur only on the HP nodes (which is IMO
some indication that it's a hardware/kernel problem):



HP nodes regularly get stuck with either some strange memory or CPU
soft lockup errors being printed endlessly to the serial console (see
attached files for some examples):

When this starts to happen, the system may come back a few times for
some seconds but then it usually ends up in an endless loop of these
errors out of which only a hard reset helps (everything else like
serial console, ssh no longer reacts).

The problem seems to occur whenever system load goes up, especially
"higher" network load seems to cause the issue.
I say "higher" because it doesn't seem having to be that much. One
example of a node that crashed today, had a 1/5/15 min load of ~60 and
something between 40-60 MB/s of received bytes (and basically nothing
sent).


Any idea on how to fix that respectively further trace it down would be
highly appreciated.


Cheers,
Chris.


mem1.log.xz
Description: application/xz


mem2.log.xz
Description: application/xz


mem3.log.xz
Description: application/xz


mem-followed-by-softlockup.log.xz
Description: application/xz


soft-lockup1.log.xz
Description: application/xz


soft-lockup2.log.xz
Description: application/xz


soft-lockup3.log.xz
Description: application/xz


sd 6:0:0:0: [sdb] Unaligned partial completion (resid=12, sector_sz=512)

2018-06-09 Thread Christoph Anton Mitterer
Hey.

I'm seeing these errors:
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=16, sector_sz=512)
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 01 cd e6 de 40 00 00 00 e0 00 00
Jun 09 21:13:22 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 7749426752
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=12, sector_sz=512)
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 00 00 14 59 60 00 00 01 00 00 00
Jun 09 21:13:31 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 1333600
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=12, sector_sz=512)
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 00 c3 a2 56 80 00 00 01 00 00 00
Jun 09 21:13:53 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 3282196096

(many of them) on a Seagate 8TB Archive HDD.
The disk is only a year old and barely used (since it's just one of the
backups of my main data disks - and the primary disk another Seagate
8TB HDD broke just few days ago).

I have connected them via USB/SATA bridge... and when I now started
copying from the backup (which is sdb) to a fresh HDD I started seeing
vast numbers of these errors.

However, fsck, cp seem to do fine, and the data (as far as it has been
copied) seems good (I have sha512 sums of all of it).


Any ideas what "Unaligned partial completion" and these errors mean?


Thanks,
Chris.


sd 6:0:0:0: [sdb] Unaligned partial completion (resid=12, sector_sz=512)

2018-06-09 Thread Christoph Anton Mitterer
Hey.

I'm seeing these errors:
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=16, sector_sz=512)
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:22 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 01 cd e6 de 40 00 00 00 e0 00 00
Jun 09 21:13:22 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 7749426752
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=12, sector_sz=512)
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:31 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 00 00 14 59 60 00 00 01 00 00 00
Jun 09 21:13:31 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 1333600
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] Unaligned partial 
completion (resid=12, sector_sz=512)
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium 
Error [current] 
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 Add. Sense: 
Unrecovered read error
Jun 09 21:13:53 heisenberg kernel: sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 
00 00 00 00 c3 a2 56 80 00 00 01 00 00 00
Jun 09 21:13:53 heisenberg kernel: print_req_error: critical medium error, dev 
sdb, sector 3282196096

(many of them) on a Seagate 8TB Archive HDD.
The disk is only a year old and barely used (since it's just one of the
backups of my main data disks - and the primary disk another Seagate
8TB HDD broke just few days ago).

I have connected them via USB/SATA bridge... and when I now started
copying from the backup (which is sdb) to a fresh HDD I started seeing
vast numbers of these errors.

However, fsck, cp seem to do fine, and the data (as far as it has been
copied) seems good (I have sha512 sums of all of it).


Any ideas what "Unaligned partial completion" and these errors mean?


Thanks,
Chris.


"Core temperature above threshold" on Fujitsu U757 with 2 core Kaby Lake (i7-7600U)

2017-10-27 Thread Christoph Anton Mitterer
Hey.

Perhaps someone can help me with this.


I got a brand new notebook from the university, a Fujitsu U757[0][1],
with a 2 core Kaby Lake (i7-7600U) and 32GB RAM.
It runs Debian unstable, that is as of now kernel 4.13.4.

Even at pretty simple tasks (just some VM running) and a bit more, the
CPUs seem to overheat (>100°C).
I brought the thing back to the university's vendor and they claimed
that they couldn't reproduce this with the (Windows based) tests and it
might be a OS issue (they did replace the heat paste at my request).


The kernel logs quite regularly give:
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature above threshold, cpu 
clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature above threshold, cpu 
clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature/speed normal

I guess every time it goes beyond 100° C.

Once so far I had a complete lockup of the machine (it still seemed to
write data to the HDD, but I could only hard power cycle to get
it usable again.
Not sure if this is related to the temperature issue.
See the attached kernel log.

At around Oct 15 22:46:39 there seems to be first a crash of the Wifi
microcode a bit later, beginning at about Oct 16 01:27:16, there are
numerous stack traces with "BUG: soft lockup - CPU".


Could this be some kernel issue? Especially the overheating... I mean
obviously not in the sense that it's the kernels fault, but in the
sense that is should speed it down earlier or so...?


Interestingly, when I run e.g. stress or stress-ng on all 4 logical
CPUs... then sometimes I do get the overheating, sometimes not (in
which case temperature stays above 90°C.. but always below 100°C (I
assume).



Any help would be welcome, do not hesitate to ask if you need more data
(keep me CCed).

Thanks,
Chris.



[0] http://www.fujitsu.com/fts/products/computing/pc/notebooks/lifebook-u757/
[1] http://docs.ts.fujitsu.com/dl.aspx?id=addf5093-b73b-407b-ae78-90c5baf6456a

kern.log.xz
Description: application/xz


"Core temperature above threshold" on Fujitsu U757 with 2 core Kaby Lake (i7-7600U)

2017-10-27 Thread Christoph Anton Mitterer
Hey.

Perhaps someone can help me with this.


I got a brand new notebook from the university, a Fujitsu U757[0][1],
with a 2 core Kaby Lake (i7-7600U) and 32GB RAM.
It runs Debian unstable, that is as of now kernel 4.13.4.

Even at pretty simple tasks (just some VM running) and a bit more, the
CPUs seem to overheat (>100°C).
I brought the thing back to the university's vendor and they claimed
that they couldn't reproduce this with the (Windows based) tests and it
might be a OS issue (they did replace the heat paste at my request).


The kernel logs quite regularly give:
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature above threshold, cpu 
clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature above threshold, cpu 
clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature above threshold, 
cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature/speed normal

I guess every time it goes beyond 100° C.

Once so far I had a complete lockup of the machine (it still seemed to
write data to the HDD, but I could only hard power cycle to get
it usable again.
Not sure if this is related to the temperature issue.
See the attached kernel log.

At around Oct 15 22:46:39 there seems to be first a crash of the Wifi
microcode a bit later, beginning at about Oct 16 01:27:16, there are
numerous stack traces with "BUG: soft lockup - CPU".


Could this be some kernel issue? Especially the overheating... I mean
obviously not in the sense that it's the kernels fault, but in the
sense that is should speed it down earlier or so...?


Interestingly, when I run e.g. stress or stress-ng on all 4 logical
CPUs... then sometimes I do get the overheating, sometimes not (in
which case temperature stays above 90°C.. but always below 100°C (I
assume).



Any help would be welcome, do not hesitate to ask if you need more data
(keep me CCed).

Thanks,
Chris.



[0] http://www.fujitsu.com/fts/products/computing/pc/notebooks/lifebook-u757/
[1] http://docs.ts.fujitsu.com/dl.aspx?id=addf5093-b73b-407b-ae78-90c5baf6456a

kern.log.xz
Description: application/xz


USB ExpressCard makes kworker process utilise 72% CPU infinitely

2015-11-21 Thread Christoph Anton Mitterer
Hey.

I bough a USB3.0 ExpressCard from StarTech[0] which is apparently[1]
based on the NEC uPD720200.

Using a kernel 4.2.6 on amd64, the System is Debian sid, the following
happens when I plug the card:

kernel logs show:
Nov 21 17:15:22 heisenberg kernel: [  102.387452] pci :01:00.0: [1033:0194] 
type 00 class 0x0c0330
Nov 21 17:15:22 heisenberg kernel: [  102.387545] pci :01:00.0: reg 0x10: 
[mem 0x-0x1fff 64bit]
Nov 21 17:15:22 heisenberg kernel: [  102.387723] pci :01:00.0: PME# 
supported from D0 D3hot
Nov 21 17:15:22 heisenberg kernel: [  102.394689] pci :01:00.0: BAR 0: 
assigned [mem 0xf0d0-0xf0d01fff 64bit]
Nov 21 17:15:22 heisenberg kernel: [  102.394750] pci :01:00.0: enabling 
device ( -> 0002)
Nov 21 17:15:22 heisenberg kernel: [  102.395178] xhci_hcd :01:00.0: xHCI 
Host Controller
Nov 21 17:15:22 heisenberg kernel: [  102.395192] xhci_hcd :01:00.0: new 
USB bus registered, assigned bus number 5
Nov 21 17:15:22 heisenberg kernel: [  102.395418] xhci_hcd :01:00.0: hcc 
params 0x014042cb hci version 0x96 quirks 0x0004
Nov 21 17:15:22 heisenberg kernel: [  102.395892] usb usb5: New USB device 
found, idVendor=1d6b, idProduct=0002
Nov 21 17:15:22 heisenberg kernel: [  102.395896] usb usb5: New USB device 
strings: Mfr=3, Product=2, SerialNumber=1
Nov 21 17:15:22 heisenberg kernel: [  102.395899] usb usb5: Product: xHCI Host 
Controller
Nov 21 17:15:22 heisenberg kernel: [  102.395902] usb usb5: Manufacturer: Linux 
4.2.0-1-amd64 xhci-hcd
Nov 21 17:15:22 heisenberg kernel: [  102.395905] usb usb5: SerialNumber: 
:01:00.0
Nov 21 17:15:22 heisenberg kernel: [  102.396308] hub 5-0:1.0: USB hub found
Nov 21 17:15:22 heisenberg kernel: [  102.396331] hub 5-0:1.0: 2 ports detected
Nov 21 17:15:22 heisenberg kernel: [  102.396591] xhci_hcd :01:00.0: xHCI 
Host Controller
Nov 21 17:15:22 heisenberg kernel: [  102.396599] xhci_hcd :01:00.0: new 
USB bus registered, assigned bus number 6
Nov 21 17:15:22 heisenberg kernel: [  102.398835] usb usb6: We don't know the 
algorithms for LPM for this host, disabling LPM.
Nov 21 17:15:22 heisenberg kernel: [  102.398883] usb usb6: New USB device 
found, idVendor=1d6b, idProduct=0003
Nov 21 17:15:22 heisenberg kernel: [  102.398887] usb usb6: New USB device 
strings: Mfr=3, Product=2, SerialNumber=1
Nov 21 17:15:22 heisenberg kernel: [  102.398890] usb usb6: Product: xHCI Host 
Controller
Nov 21 17:15:22 heisenberg kernel: [  102.398892] usb usb6: Manufacturer: Linux 
4.2.0-1-amd64 xhci-hcd
Nov 21 17:15:22 heisenberg kernel: [  102.398895] usb usb6: SerialNumber: 
:01:00.0
Nov 21 17:15:22 heisenberg kernel: [  102.399272] hub 6-0:1.0: USB hub found
Nov 21 17:15:22 heisenberg kernel: [  102.399294] hub 6-0:1.0: 2 ports detected

and when removing:
Nov 21 17:15:40 heisenberg kernel: [  120.541515] xhci_hcd :01:00.0: 
remove, state 4
Nov 21 17:15:40 heisenberg kernel: [  120.544671] usb usb6: USB disconnect, 
device number 1
Nov 21 17:15:40 heisenberg kernel: [  120.546683] xhci_hcd :01:00.0: Host 
not halted after 16000 microseconds.
Nov 21 17:15:40 heisenberg kernel: [  120.547611] xhci_hcd :01:00.0: USB 
bus 6 deregistered
Nov 21 17:15:40 heisenberg kernel: [  120.547618] xhci_hcd :01:00.0: 
remove, state 4
Nov 21 17:15:40 heisenberg kernel: [  120.547622] usb usb5: USB disconnect, 
device number 1
Nov 21 17:15:40 heisenberg kernel: [  120.547735] xhci_hcd :01:00.0: USB 
bus 5 deregistered


Now the problem is that immediately when I attach the card, a kworker
process starts to utilise the CPU to always around 72%.
And it never stops again until I shutdown; removing the card doesn't
help.


Any ideas?

Thanks,
Chris.


[0] 
http://www.startech.com/Cards-Adapters/USB-3.0/Cards/2-Port-Flush-Mount-USB-3-ExpressCard-Adapter~ECUSB3S254F
[1] 
http://sgcdn.startech.com/005329/media/sets/ECUSB3S254F_Manual/ECUSB3S254F.pdf

smime.p7s
Description: S/MIME cryptographic signature


USB ExpressCard makes kworker process utilise 72% CPU infinitely

2015-11-21 Thread Christoph Anton Mitterer
Hey.

I bough a USB3.0 ExpressCard from StarTech[0] which is apparently[1]
based on the NEC uPD720200.

Using a kernel 4.2.6 on amd64, the System is Debian sid, the following
happens when I plug the card:

kernel logs show:
Nov 21 17:15:22 heisenberg kernel: [  102.387452] pci :01:00.0: [1033:0194] 
type 00 class 0x0c0330
Nov 21 17:15:22 heisenberg kernel: [  102.387545] pci :01:00.0: reg 0x10: 
[mem 0x-0x1fff 64bit]
Nov 21 17:15:22 heisenberg kernel: [  102.387723] pci :01:00.0: PME# 
supported from D0 D3hot
Nov 21 17:15:22 heisenberg kernel: [  102.394689] pci :01:00.0: BAR 0: 
assigned [mem 0xf0d0-0xf0d01fff 64bit]
Nov 21 17:15:22 heisenberg kernel: [  102.394750] pci :01:00.0: enabling 
device ( -> 0002)
Nov 21 17:15:22 heisenberg kernel: [  102.395178] xhci_hcd :01:00.0: xHCI 
Host Controller
Nov 21 17:15:22 heisenberg kernel: [  102.395192] xhci_hcd :01:00.0: new 
USB bus registered, assigned bus number 5
Nov 21 17:15:22 heisenberg kernel: [  102.395418] xhci_hcd :01:00.0: hcc 
params 0x014042cb hci version 0x96 quirks 0x0004
Nov 21 17:15:22 heisenberg kernel: [  102.395892] usb usb5: New USB device 
found, idVendor=1d6b, idProduct=0002
Nov 21 17:15:22 heisenberg kernel: [  102.395896] usb usb5: New USB device 
strings: Mfr=3, Product=2, SerialNumber=1
Nov 21 17:15:22 heisenberg kernel: [  102.395899] usb usb5: Product: xHCI Host 
Controller
Nov 21 17:15:22 heisenberg kernel: [  102.395902] usb usb5: Manufacturer: Linux 
4.2.0-1-amd64 xhci-hcd
Nov 21 17:15:22 heisenberg kernel: [  102.395905] usb usb5: SerialNumber: 
:01:00.0
Nov 21 17:15:22 heisenberg kernel: [  102.396308] hub 5-0:1.0: USB hub found
Nov 21 17:15:22 heisenberg kernel: [  102.396331] hub 5-0:1.0: 2 ports detected
Nov 21 17:15:22 heisenberg kernel: [  102.396591] xhci_hcd :01:00.0: xHCI 
Host Controller
Nov 21 17:15:22 heisenberg kernel: [  102.396599] xhci_hcd :01:00.0: new 
USB bus registered, assigned bus number 6
Nov 21 17:15:22 heisenberg kernel: [  102.398835] usb usb6: We don't know the 
algorithms for LPM for this host, disabling LPM.
Nov 21 17:15:22 heisenberg kernel: [  102.398883] usb usb6: New USB device 
found, idVendor=1d6b, idProduct=0003
Nov 21 17:15:22 heisenberg kernel: [  102.398887] usb usb6: New USB device 
strings: Mfr=3, Product=2, SerialNumber=1
Nov 21 17:15:22 heisenberg kernel: [  102.398890] usb usb6: Product: xHCI Host 
Controller
Nov 21 17:15:22 heisenberg kernel: [  102.398892] usb usb6: Manufacturer: Linux 
4.2.0-1-amd64 xhci-hcd
Nov 21 17:15:22 heisenberg kernel: [  102.398895] usb usb6: SerialNumber: 
:01:00.0
Nov 21 17:15:22 heisenberg kernel: [  102.399272] hub 6-0:1.0: USB hub found
Nov 21 17:15:22 heisenberg kernel: [  102.399294] hub 6-0:1.0: 2 ports detected

and when removing:
Nov 21 17:15:40 heisenberg kernel: [  120.541515] xhci_hcd :01:00.0: 
remove, state 4
Nov 21 17:15:40 heisenberg kernel: [  120.544671] usb usb6: USB disconnect, 
device number 1
Nov 21 17:15:40 heisenberg kernel: [  120.546683] xhci_hcd :01:00.0: Host 
not halted after 16000 microseconds.
Nov 21 17:15:40 heisenberg kernel: [  120.547611] xhci_hcd :01:00.0: USB 
bus 6 deregistered
Nov 21 17:15:40 heisenberg kernel: [  120.547618] xhci_hcd :01:00.0: 
remove, state 4
Nov 21 17:15:40 heisenberg kernel: [  120.547622] usb usb5: USB disconnect, 
device number 1
Nov 21 17:15:40 heisenberg kernel: [  120.547735] xhci_hcd :01:00.0: USB 
bus 5 deregistered


Now the problem is that immediately when I attach the card, a kworker
process starts to utilise the CPU to always around 72%.
And it never stops again until I shutdown; removing the card doesn't
help.


Any ideas?

Thanks,
Chris.


[0] 
http://www.startech.com/Cards-Adapters/USB-3.0/Cards/2-Port-Flush-Mount-USB-3-ExpressCard-Adapter~ECUSB3S254F
[1] 
http://sgcdn.startech.com/005329/media/sets/ECUSB3S254F_Manual/ECUSB3S254F.pdf

smime.p7s
Description: S/MIME cryptographic signature


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-10-13 Thread Christoph Anton Mitterer
Hi Greg, Guenter and Chris.

Coming back to the stuff discussed previously[0].

Chris Eastwood has made most of these (i.e. LEDs and buttons, the
buzzers may work on at least some of the devices via some other serial
device) working (AFAIU based on the previously mentioned code at
Github[1]), he told me in several iterations of private mail.

I'm not sure now, whether anything based on this code would be
appropriate for the mainline kernel, since Guenter mentioned he'd prefer
a mfd core driver for all that... OTOH, the later may probably never
happen, and Chris' work seems to already do the job.

I don't know however, whether he needs to patch any other places in the
kernel, but I'm sure he can show his work (and ask questions) better
than I, thereby inviting him to do so.
Greg you had mentioned before that you might be able to spend some time
on this, so if you could help Chris, that would be great.


Cheers,
Chris.

[0] http://thread.gmane.org/gmane.linux.kernel/1508763/focus=1512903
[1] https://github.com/tomtastic/qnap-gpio/

PS: Sorry for having lost the message threading.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-10-13 Thread Christoph Anton Mitterer
Hi Greg, Guenter and Chris.

Coming back to the stuff discussed previously[0].

Chris Eastwood has made most of these (i.e. LEDs and buttons, the
buzzers may work on at least some of the devices via some other serial
device) working (AFAIU based on the previously mentioned code at
Github[1]), he told me in several iterations of private mail.

I'm not sure now, whether anything based on this code would be
appropriate for the mainline kernel, since Guenter mentioned he'd prefer
a mfd core driver for all that... OTOH, the later may probably never
happen, and Chris' work seems to already do the job.

I don't know however, whether he needs to patch any other places in the
kernel, but I'm sure he can show his work (and ask questions) better
than I, thereby inviting him to do so.
Greg you had mentioned before that you might be able to spend some time
on this, so if you could help Chris, that would be great.


Cheers,
Chris.

[0] http://thread.gmane.org/gmane.linux.kernel/1508763/focus=1512903
[1] https://github.com/tomtastic/qnap-gpio/

PS: Sorry for having lost the message threading.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
Oh and one more thing:

The QNAP driver seems to be able to do much more than just
LEDs/HDD-LEDs/buzzers/buttons... at least their symbols imply such,
like:
#define QNAP_IOCTL_SATA_UP  0x0100
#define QNAP_IOCTL_SATA_DOWN0x0200
#define QNAP_IOCTL_ESATA_UP 0x0300
#define QNAP_IOCTL_ESATA_DOWN   0x0400
#define QNAP_IOCTL_SATA_ERR 0x0500
#define QNAP_IOCTL_ETH_UP   0x0600
#define QNAP_IOCTL_ETH_DOWN 0x0700
#define QNAP_IOCTL_BOND_UP  0x0800
#define QNAP_IOCTL_BOND_DOWN0x0900
#define QNAP_IOCTL_USB_DRV_RELOAD 0x0a00
#define QNAP_IOCTL_USB_SET_POLL_INTV 0x0b00

#define MD_RESYNCING0x20
#define MD_RESYNCING_DONE   0x21
#define MD_RESYNCING_SKIP   0x22
#define MD1_REBUILDING  0x23
#define MD1_REBUILDING_DONE 0x24
#define MD1_REBUILDING_SKIP 0x25
#define MD1_RESYNCING   0x26
...

and much more (see drivers/qnap/pic.h of their kernel bundle).


I'm not sure whether it would be really a good idea for the end user to
use all these... the "normal" kernel drivers seem to handle all that
just fine.
Same applies for the cooling fan... there are PIC commands to control
it... (with my TS 569 Pro,... that didn't even work from within the
native QNAP firmware)... but fancontrol from lm-sensor already works
fine (any much more granular)... so I guess people should use that...
and it's not needed to add suport for that as well.



When you look at the model that I have (the TS-569 Pro[0]):
http://www.qnap.com/upload/album/24/m_929_20120726112135_59189.png

- The LCD already works, is a A125 device and controllable via an UART.
- The power button already works (normal ACPI button device)

- The buzzer does not yet work.
- With respect to the LEDs... from the native QNAP firmware I was able
to control the left-most one[1] (below the QNAP logo) and the ones
labelled STATUS[2] and USB[3].
- I'm not sure whether one can control the HDD-LEDs at all (but I'd hope
so[4])
- The other 3 buttons (in the image labelled COPY, ENTER, SELECT) do all
not work.



qcontrol (on some ARM based QNAPs) does a bit more and also allows to
set EuP mode and WOL...

One thing that could be interesting was, if it were possible really
power down SATA ports... could perhaps be QNAP_HDERR_ON(nr) /
QNAP_HDERR_OFF(nr)... or QNAP_SATAn_UP / QNAP_SATAn_DOWN ... but that's
just blind guessing.

Best wishes,
Chris.



[0]
http://www.qnap.com/useng/index.php?lang=en-us=862=355=526=692=9904
[1] These are the QNAP_PIC_POWER_LED_* in their code.
[2] These are the QNAP_PIC_STATUS_* in their code.
[3] These are the QNAP_PIC_USB_LED_* in their code.
[4] There is IOCTL_HD_ERROR_LED_SEND_MESSAGE and int
set_hd_error_led_on(int, int) in their code, which could be that.
[5] QNAP_PIC_WOL_* and QNAP_PIC_EUP_*


smime.p7s
Description: S/MIME cryptographic signature


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
On Thu, 2013-06-20 at 14:42 -0700, Greg KH wrote:
> Also, do you have a pointer to the git tree for
> the hardware again, I can't seem to find it.
You mean a git repo for their driver? I don't think they have one...
just the big tarball with the patches integrated...

>   I can dig through the tree
> to see if I can make something "self-contained", if you are willing to
> test it out.
Sure... I'm happy to test anything out... right now the box isn't used
in production yet... so it's easy for me... and I guess you have enough
experience to not accidentally write code that overwrites the
firmware ;) (which would probably happen when I'd have tried to ^^)


Thanks,
Chris.

btw: Let's remove Wokes Wang again in any following messages... if he's
interested in contributing he can come back and read the messages in the
archive (http://thread.gmane.org/gmane.linux.kernel/1508763)... and I
don't want to spam him, if he's not interested.


smime.p7s
Description: S/MIME cryptographic signature


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
Hey Greg.

On Wed, 2013-06-19 at 19:59 -0700, Greg KH wrote:
> If you can dig the code out into a stand-alone form that I can make into
> a patch for the drivers/staging/ tree, I'll be glad to take it.
Well I don't think my kernel/hardware development skills are enough for
that... especially as Guenter already said that the code shouldn't be
part of hwmon/it87 ... but rather a separate gpio driver...


> Or, if
> you get a contact at QNAP, I'll be glad to help out there as well
Well I've asked them again,... no reply so far.
The only direct contact I know (from the sources of their drivers) is
Wokes Wang, which I've CCed. Perhaps he can help.


>  as
> that's what we do all the time for loads of companies.
sure...


Cheers,
Chris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
Hey Greg.

On Wed, 2013-06-19 at 19:59 -0700, Greg KH wrote:
 If you can dig the code out into a stand-alone form that I can make into
 a patch for the drivers/staging/ tree, I'll be glad to take it.
Well I don't think my kernel/hardware development skills are enough for
that... especially as Guenter already said that the code shouldn't be
part of hwmon/it87 ... but rather a separate gpio driver...


 Or, if
 you get a contact at QNAP, I'll be glad to help out there as well
Well I've asked them again,... no reply so far.
The only direct contact I know (from the sources of their drivers) is
Wokes Wang, which I've CCed. Perhaps he can help.


  as
 that's what we do all the time for loads of companies.
sure...


Cheers,
Chris.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
On Thu, 2013-06-20 at 14:42 -0700, Greg KH wrote:
 Also, do you have a pointer to the git tree for
 the hardware again, I can't seem to find it.
You mean a git repo for their driver? I don't think they have one...
just the big tarball with the patches integrated...

   I can dig through the tree
 to see if I can make something self-contained, if you are willing to
 test it out.
Sure... I'm happy to test anything out... right now the box isn't used
in production yet... so it's easy for me... and I guess you have enough
experience to not accidentally write code that overwrites the
firmware ;) (which would probably happen when I'd have tried to ^^)


Thanks,
Chris.

btw: Let's remove Wokes Wang again in any following messages... if he's
interested in contributing he can come back and read the messages in the
archive (http://thread.gmane.org/gmane.linux.kernel/1508763)... and I
don't want to spam him, if he's not interested.


smime.p7s
Description: S/MIME cryptographic signature


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-20 Thread Christoph Anton Mitterer
Oh and one more thing:

The QNAP driver seems to be able to do much more than just
LEDs/HDD-LEDs/buzzers/buttons... at least their symbols imply such,
like:
#define QNAP_IOCTL_SATA_UP  0x0100
#define QNAP_IOCTL_SATA_DOWN0x0200
#define QNAP_IOCTL_ESATA_UP 0x0300
#define QNAP_IOCTL_ESATA_DOWN   0x0400
#define QNAP_IOCTL_SATA_ERR 0x0500
#define QNAP_IOCTL_ETH_UP   0x0600
#define QNAP_IOCTL_ETH_DOWN 0x0700
#define QNAP_IOCTL_BOND_UP  0x0800
#define QNAP_IOCTL_BOND_DOWN0x0900
#define QNAP_IOCTL_USB_DRV_RELOAD 0x0a00
#define QNAP_IOCTL_USB_SET_POLL_INTV 0x0b00

#define MD_RESYNCING0x20
#define MD_RESYNCING_DONE   0x21
#define MD_RESYNCING_SKIP   0x22
#define MD1_REBUILDING  0x23
#define MD1_REBUILDING_DONE 0x24
#define MD1_REBUILDING_SKIP 0x25
#define MD1_RESYNCING   0x26
...

and much more (see drivers/qnap/pic.h of their kernel bundle).


I'm not sure whether it would be really a good idea for the end user to
use all these... the normal kernel drivers seem to handle all that
just fine.
Same applies for the cooling fan... there are PIC commands to control
it... (with my TS 569 Pro,... that didn't even work from within the
native QNAP firmware)... but fancontrol from lm-sensor already works
fine (any much more granular)... so I guess people should use that...
and it's not needed to add suport for that as well.



When you look at the model that I have (the TS-569 Pro[0]):
http://www.qnap.com/upload/album/24/m_929_20120726112135_59189.png

- The LCD already works, is a A125 device and controllable via an UART.
- The power button already works (normal ACPI button device)

- The buzzer does not yet work.
- With respect to the LEDs... from the native QNAP firmware I was able
to control the left-most one[1] (below the QNAP logo) and the ones
labelled STATUS[2] and USB[3].
- I'm not sure whether one can control the HDD-LEDs at all (but I'd hope
so[4])
- The other 3 buttons (in the image labelled COPY, ENTER, SELECT) do all
not work.



qcontrol (on some ARM based QNAPs) does a bit more and also allows to
set EuP mode and WOL...

One thing that could be interesting was, if it were possible really
power down SATA ports... could perhaps be QNAP_HDERR_ON(nr) /
QNAP_HDERR_OFF(nr)... or QNAP_SATAn_UP / QNAP_SATAn_DOWN ... but that's
just blind guessing.

Best wishes,
Chris.



[0]
http://www.qnap.com/useng/index.php?lang=en-ussn=862c=355sc=526t=692n=9904
[1] These are the QNAP_PIC_POWER_LED_* in their code.
[2] These are the QNAP_PIC_STATUS_* in their code.
[3] These are the QNAP_PIC_USB_LED_* in their code.
[4] There is IOCTL_HD_ERROR_LED_SEND_MESSAGE and int
set_hd_error_led_on(int, int) in their code, which could be that.
[5] QNAP_PIC_WOL_* and QNAP_PIC_EUP_*


smime.p7s
Description: S/MIME cryptographic signature


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-19 Thread Christoph Anton Mitterer
On Sat, 2013-06-15 at 03:31 +0200, Christoph Anton Mitterer wrote: 
> I wondered whether anyone knows, whether the kernel supports the
> LEDs/buttons/buzzer of Intel Atom based QNAP NAS like the TS-569 Pro?

I tried to find out some more information (and got some help there as
well)... seems I'm stuck now, though. So perhaps for the records (and if
there should ever be someone with more experience in hardware
programming) what I found:



According[0] do Ian Campbell, who maintains qcontrol[1], the ARM based
QNAP devices have UART interface to their PIC controller (which
apparently controls the LEDs, buzzers, etc.)... it seems though that the
Intel based ones (or at least the one I have), doesn't have this - well
there is a serial device, but I guess it's just a "normal" one as
nothing happens when I send the (supposed) commands to it.

Actually I personally would have preferred being able to control the
stuff without the need for a kernel driver... a pity that I couldn't get
it running.


QNAP itself seems to have a kernel driver for all this...
On their website, they provide a GPL bundle[2], which, amongst others,
contains the sources to their kernel with many modifications (no single
patches provided, unfortunately o.O ).
This includes a drivers/qnap which seems to export a device /dev/pic
which their user space tools use to set the LEDs/etc. and that driver in
turn seems to use their modifications (GPIO stuff and so on) to the
kernel's it87 driver (according to Guenter - see below - they use a
IT8721).


I asked Guenter Roeck, who kindly had a look[3], but according to him,
the QNAP code cannot be easily taken over.


Well perhaps someone else with enough knowledge has time to look into
this,... or perhaps someone has some good contacts over at QNAP and is
able to lobby them to submit their code to the mainline kernel; I tried
to contact their support but got no answer.

Cheers,
Chris.


[0] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712283
[1] https://gitorious.org/qcontrol/
[2] http://sourceforge.net/projects/qosgpl/files/latest/download
[3] https://github.com/groeck/it87/issues/1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-19 Thread Christoph Anton Mitterer
On Sat, 2013-06-15 at 03:31 +0200, Christoph Anton Mitterer wrote: 
 I wondered whether anyone knows, whether the kernel supports the
 LEDs/buttons/buzzer of Intel Atom based QNAP NAS like the TS-569 Pro?

I tried to find out some more information (and got some help there as
well)... seems I'm stuck now, though. So perhaps for the records (and if
there should ever be someone with more experience in hardware
programming) what I found:



According[0] do Ian Campbell, who maintains qcontrol[1], the ARM based
QNAP devices have UART interface to their PIC controller (which
apparently controls the LEDs, buzzers, etc.)... it seems though that the
Intel based ones (or at least the one I have), doesn't have this - well
there is a serial device, but I guess it's just a normal one as
nothing happens when I send the (supposed) commands to it.

Actually I personally would have preferred being able to control the
stuff without the need for a kernel driver... a pity that I couldn't get
it running.


QNAP itself seems to have a kernel driver for all this...
On their website, they provide a GPL bundle[2], which, amongst others,
contains the sources to their kernel with many modifications (no single
patches provided, unfortunately o.O ).
This includes a drivers/qnap which seems to export a device /dev/pic
which their user space tools use to set the LEDs/etc. and that driver in
turn seems to use their modifications (GPIO stuff and so on) to the
kernel's it87 driver (according to Guenter - see below - they use a
IT8721).


I asked Guenter Roeck, who kindly had a look[3], but according to him,
the QNAP code cannot be easily taken over.


Well perhaps someone else with enough knowledge has time to look into
this,... or perhaps someone has some good contacts over at QNAP and is
able to lobby them to submit their code to the mainline kernel; I tried
to contact their support but got no answer.

Cheers,
Chris.


[0] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712283
[1] https://gitorious.org/qcontrol/
[2] http://sourceforge.net/projects/qosgpl/files/latest/download
[3] https://github.com/groeck/it87/issues/1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-14 Thread Christoph Anton Mitterer
Hi.

I wondered whether anyone knows, whether the kernel supports the
LEDs/buttons/buzzer of Intel Atom based QNAP NAS like the TS-569 Pro?

I got the two line LCD, which is a A125, working,...it can easily be
controlled via the serial device... but not the others.
Seems these are GPIO controlled...

I further found this: https://github.com/tomtastic/qnap-gpio/
but it seems it's for the TS-239 Pro only.

Any people out there with some experience? :)

Cheers and thanks,
Chris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


support for Intel Atom based QNAP LEDs/buttons/buzzer in Linux?

2013-06-14 Thread Christoph Anton Mitterer
Hi.

I wondered whether anyone knows, whether the kernel supports the
LEDs/buttons/buzzer of Intel Atom based QNAP NAS like the TS-569 Pro?

I got the two line LCD, which is a A125, working,...it can easily be
controlled via the serial device... but not the others.
Seems these are GPIO controlled...

I further found this: https://github.com/tomtastic/qnap-gpio/
but it seems it's for the TS-239 Pro only.

Any people out there with some experience? :)

Cheers and thanks,
Chris.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-09 Thread Christoph Anton Mitterer
On Sun, 2012-10-07 at 21:24 -0400, Theodore Ts'o wrote:
> I've looked at his message, I didn't see any justification for his
> concern/assertion.  So I can't really comment on it since he didn't
> give any reason for his belief.
I asked him again[0] to be sure and he replied to have no reason to
believe it's possible to spoil it.



> We've made a lot of changes in how we gather entropy recently
>...
I see,.. I guess this was in 3.6 then? Cause I made some tests with 3.5
and there (even on my desktop) available entropy is always rather
low ... but with haveged it quickly falls and rises (that actually
puzzles me) between 4096  and ~1k



> We're not using SHA has a traditional cryptographic hash
>...
Of course :) Thanks for the good explanation of the operation though!


> So I'm not particularly worried at this point.  The other thing to
> note is that the possible alternatives to SHA-1 (i.e., SHA-2 and
> SHA-3) are actually slower, not faster.  So we would be giving up
> performance if we were to use them.
I rather meant some other fast algos, e.g. those from the SHA3
competition which seem to be faster than SHA1.
Haven't measured myself but just took:
http://arctic.org/~dean/crypto/sha-sse2-20041218.txt
http://skein-hash.info/sha3-engineering
Well it's perhaps rather minor...


Thanks anyway for all your information :)


Cheers,
Chris.



[0]
http://lists.gnupg.org/pipermail/gnupg-users/2012-October/045551.html


smime.p7s
Description: S/MIME cryptographic signature


Re: RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-09 Thread Christoph Anton Mitterer
On Sun, 2012-10-07 at 21:24 -0400, Theodore Ts'o wrote:
 I've looked at his message, I didn't see any justification for his
 concern/assertion.  So I can't really comment on it since he didn't
 give any reason for his belief.
I asked him again[0] to be sure and he replied to have no reason to
believe it's possible to spoil it.



 We've made a lot of changes in how we gather entropy recently
...
I see,.. I guess this was in 3.6 then? Cause I made some tests with 3.5
and there (even on my desktop) available entropy is always rather
low ... but with haveged it quickly falls and rises (that actually
puzzles me) between 4096  and ~1k



 We're not using SHA has a traditional cryptographic hash
...
Of course :) Thanks for the good explanation of the operation though!


 So I'm not particularly worried at this point.  The other thing to
 note is that the possible alternatives to SHA-1 (i.e., SHA-2 and
 SHA-3) are actually slower, not faster.  So we would be giving up
 performance if we were to use them.
I rather meant some other fast algos, e.g. those from the SHA3
competition which seem to be faster than SHA1.
Haven't measured myself but just took:
http://arctic.org/~dean/crypto/sha-sse2-20041218.txt
http://skein-hash.info/sha3-engineering
Well it's perhaps rather minor...


Thanks anyway for all your information :)


Cheers,
Chris.



[0]
http://lists.gnupg.org/pipermail/gnupg-users/2012-October/045551.html


smime.p7s
Description: S/MIME cryptographic signature


Re: RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-07 Thread Christoph Anton Mitterer
Hi Ted.


Thanks for your prompt reply.


On Thu, 2012-10-04 at 18:49 -0400, Theodore Ts'o wrote:
> It is impossible by design.  Or specifically, /dev/random was designed
> so that it can be world-writeable, and an attacker can feed in any
> kind of input he or she wants, and it will not allow the attacker to
> know anything more about the state of the entropy pool than he or she
> knew before they started mixing inputs in.

I just wondered because I remembered David Shaw (one of the main
developers from gpg) to imply[0] some time ago, that an "evil" entropy
source would actually be a problem:
> Not completely useless given the Linux random design, but
> certainly an evil source of entropy would be a serious problem.  "



> There are comments that go into more detail about the design in
> drivers/char/random.c.
I had a short glance at it,... but I guess it goes a bit above my
understanding of entropy theory... well at least without without putting
some effort into it.

Some notes though (guess you're the maintainer anyway):
1) With respect to the sources of entropy... would it make sense for the
kernel to follow ideas from haveged[1].
I mean we all now that especially disk-less server systems have problems
with the current sources.
Or is that intended to be kept in userspace?

2) At some places, the documentation mentiones that SHA is used... any
sense in "upgrading" to stronger/more secure (especially as it says the
hash is used to protect the internal state of the pool) and faster
algos?

3) Some places note that things are not so cryptographically strong...
which sounds a bit worrying...

4) Were "newer" developments in PRNGs already taken into account? E.g.
the Mersenne Twister (which is AFAIK however not cryptographically
secure; at least in it's native form)


Thanks again,
Chris.


[0]
http://lists.gnupg.org/pipermail/gnupg-users/2009-September/037301.html
[1] http://www.issihosts.com/haveged/
[2] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html


smime.p7s
Description: S/MIME cryptographic signature


Re: RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-07 Thread Christoph Anton Mitterer
Hi Ted.


Thanks for your prompt reply.


On Thu, 2012-10-04 at 18:49 -0400, Theodore Ts'o wrote:
 It is impossible by design.  Or specifically, /dev/random was designed
 so that it can be world-writeable, and an attacker can feed in any
 kind of input he or she wants, and it will not allow the attacker to
 know anything more about the state of the entropy pool than he or she
 knew before they started mixing inputs in.

I just wondered because I remembered David Shaw (one of the main
developers from gpg) to imply[0] some time ago, that an evil entropy
source would actually be a problem:
 Not completely useless given the Linux random design, but
 certainly an evil source of entropy would be a serious problem.  



 There are comments that go into more detail about the design in
 drivers/char/random.c.
I had a short glance at it,... but I guess it goes a bit above my
understanding of entropy theory... well at least without without putting
some effort into it.

Some notes though (guess you're the maintainer anyway):
1) With respect to the sources of entropy... would it make sense for the
kernel to follow ideas from haveged[1].
I mean we all now that especially disk-less server systems have problems
with the current sources.
Or is that intended to be kept in userspace?

2) At some places, the documentation mentiones that SHA is used... any
sense in upgrading to stronger/more secure (especially as it says the
hash is used to protect the internal state of the pool) and faster
algos?

3) Some places note that things are not so cryptographically strong...
which sounds a bit worrying...

4) Were newer developments in PRNGs already taken into account? E.g.
the Mersenne Twister (which is AFAIK however not cryptographically
secure; at least in it's native form)


Thanks again,
Chris.


[0]
http://lists.gnupg.org/pipermail/gnupg-users/2009-September/037301.html
[1] http://www.issihosts.com/haveged/
[2] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html


smime.p7s
Description: S/MIME cryptographic signature


RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-04 Thread Christoph Anton Mitterer
Hi.

This is a question towards the crypto/entropy experts.

When seeding the kernels entropy cache (which is then ultimately used
for /dev/random), e.g. by (semi-)TRNGs like haveged[0],
audio-entropyd[1], Simtec’s Entropy Key[2] or friends... can one spoil
the randomness by that or is this impossible by design?

Of course it's easy to check the distribution of these randomness
sources, but as we see on the plain Mersenne Twister, a "perfect"
distribution is not necessarily usable for cryptography.


Further, one could imagine that closed products like the Entropy Key are
hacked or have backdoors, which may make them produce subtle patterns
that could later be used in cryptoanalysis.
(This is in no way a claim, that Simtec would do this,... just an
example.)


Cheers,
Chris.



[0] http://www.issihosts.com/haveged/
[1] http://www.vanheusden.com/aed/
[2] http://www.entropykey.co.uk/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-04 Thread Christoph Anton Mitterer
Hi.

This is a question towards the crypto/entropy experts.

When seeding the kernels entropy cache (which is then ultimately used
for /dev/random), e.g. by (semi-)TRNGs like haveged[0],
audio-entropyd[1], Simtec’s Entropy Key[2] or friends... can one spoil
the randomness by that or is this impossible by design?

Of course it's easy to check the distribution of these randomness
sources, but as we see on the plain Mersenne Twister, a perfect
distribution is not necessarily usable for cryptography.


Further, one could imagine that closed products like the Entropy Key are
hacked or have backdoors, which may make them produce subtle patterns
that could later be used in cryptoanalysis.
(This is in no way a claim, that Simtec would do this,... just an
example.)


Cheers,
Chris.



[0] http://www.issihosts.com/haveged/
[1] http://www.vanheusden.com/aed/
[2] http://www.entropykey.co.uk/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: data corruption with dmcrypt/LUKS

2008-02-15 Thread Christoph Anton Mitterer
Hi Filippo.


On Wed, 2008-02-13 at 22:39 +0100, Filippo Zangheri wrote:
> have you conducted further tests? Have you discovered anything?
I actually conducted some tests last week (also with aes-cbc-essiv) but
wasn't able to reproduce the two errors (tested it on the same computer,
with the same USB-sticks, same commands, kernel 2.6.24 etc.).
I'm not sure if I should be glad about this,.. because I definitely had
those two problems, but of course it's still possible (though I consider
it unlikely) that there were hardware problems.


Today I copied a complete Debian installation (about 6 GB) from an
unencrypted partition to a luks/dm-crypt partition with aes-xts-plain
(according the Herbert Xu plain is the "most secure" and it's not
required to use the benbi mode for the IV generation).
I had no problems with this mode, too.

Best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: data corruption with dmcrypt/LUKS

2008-02-15 Thread Christoph Anton Mitterer
Hi Filippo.


On Wed, 2008-02-13 at 22:39 +0100, Filippo Zangheri wrote:
 have you conducted further tests? Have you discovered anything?
I actually conducted some tests last week (also with aes-cbc-essiv) but
wasn't able to reproduce the two errors (tested it on the same computer,
with the same USB-sticks, same commands, kernel 2.6.24 etc.).
I'm not sure if I should be glad about this,.. because I definitely had
those two problems, but of course it's still possible (though I consider
it unlikely) that there were hardware problems.


Today I copied a complete Debian installation (about 6 GB) from an
unencrypted partition to a luks/dm-crypt partition with aes-xts-plain
(according the Herbert Xu plain is the most secure and it's not
required to use the benbi mode for the IV generation).
I had no problems with this mode, too.

Best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: kernel support for newer UDF versions (> 2.01)

2008-02-06 Thread Christoph Anton Mitterer
Hi.

On Wed, 2008-02-06 at 17:36 +0100, Jan Kara wrote:
> I think there's a patch for 2.50 support which is quite recent (2.6.24).
> I plan to merge the support when Sebastian submits it (I'm already in
> contact with him regarding this).
Ah great, so 2.6.25 probably?!

Best wishes,
Chris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel support for newer UDF versions ( 2.01)

2008-02-06 Thread Christoph Anton Mitterer
Hi.

On Wed, 2008-02-06 at 17:36 +0100, Jan Kara wrote:
 I think there's a patch for 2.50 support which is quite recent (2.6.24).
 I plan to merge the support when Sebastian submits it (I'm already in
 contact with him regarding this).
Ah great, so 2.6.25 probably?!

Best wishes,
Chris.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: data corruption with dmcrypt/LUKS

2008-02-04 Thread Christoph Anton Mitterer
On Mon, 2008-02-04 at 10:17 +0100, Milan Broz wrote:
> Yes, so if you hit this with 2.6.24 too is very important to sent OOps
> log to identify problem (or link to screen snapshot, digital camera
> snapshot or so).
I did about 5 complete tests today and dozens of mkfs.ext3's but I
wasn't able to reproduce any of the two errors... very very strange.
(used the same sequence of commands, with and without using the
USB-stick)...
I'll do some other tests tomorrow because these problems were real and I
cannot believe, that they're simply gone...

And IMHO hardware problems are still very unlikely, or am I wrong?

Anyway,.. is there anybody who made deeper tests of dmcrypt? I mean real
massive tests perhaps with different filesystems and so on?
What are your experiences at Redhat?

Best wishes,
Chris


smime.p7s
Description: S/MIME cryptographic signature


Re: data corruption with dmcrypt/LUKS

2008-02-04 Thread Christoph Anton Mitterer
On Mon, 2008-02-04 at 10:17 +0100, Milan Broz wrote:
 Yes, so if you hit this with 2.6.24 too is very important to sent OOps
 log to identify problem (or link to screen snapshot, digital camera
 snapshot or so).
I did about 5 complete tests today and dozens of mkfs.ext3's but I
wasn't able to reproduce any of the two errors... very very strange.
(used the same sequence of commands, with and without using the
USB-stick)...
I'll do some other tests tomorrow because these problems were real and I
cannot believe, that they're simply gone...

And IMHO hardware problems are still very unlikely, or am I wrong?

Anyway,.. is there anybody who made deeper tests of dmcrypt? I mean real
massive tests perhaps with different filesystems and so on?
What are your experiences at Redhat?

Best wishes,
Chris


smime.p7s
Description: S/MIME cryptographic signature


Re: data corruption with dmcrypt/LUKS

2008-02-03 Thread Christoph Anton Mitterer
Hi Milan Broz

On Sun, 2008-02-03 at 23:06 +0100, Milan Broz wrote:
> Are you sure, that your USB-stick is not faulty ?
I actually tested the stick, too. But I consider problems in the stick
(you mean the key-holding stick, do you?) as highly unlikely. 
If the key would be wrong a good crypto system should give me completely
different data and not just these "minor" faults.


> Could you reproduce it with different piece of hw ?
> (Several strange reports for dm-crypt over USB were identified to be 
> USB hw faults.)
I'll test it tomorrow.



> > 2) The second bug happens only rarely and leads to a panic.
> > Unfortunately it's difficult to reproduce, but it always happened when I
> > mkfs.ext3 on the /dev/mapper/sda2.
> > There's a stack-trace printed which clearly involves some dmcrypt
> > lines...
> But no stack trace attached here... please attach it.
Unfortunately I don't have one,... nothing was written to the logs and I
forgot to write it up :-/



> It can be known bug which was fixed in stable version some time ago
> see http://lkml.org/lkml/2007/7/20/211
Uhm but that patch should be part of 2.6.24, shouldn't it?


> No known bugs causing data corruption, no such reports so far
> for stable kernel.
Uhm ok,.. well as told above I'll make some other tests (without the
USB-sticks) but it would be great some people here could try this, too.

Best wishes,
Chris.

btw: What's about the dmcrypt mailing list,.. I've tried to subscribe
but no answers, and I get not posts (not even my owns).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


data corruption with dmcrypt/LUKS

2008-02-03 Thread Christoph Anton Mitterer
Hi.

I think I've found a bug somewhere in dm-crypt...

First of all the system that I use:
Debian (sid) with kernel 2.26.24 on AMD64 (intel core2 duo), 2GB RAM

For several days now I try to fully encrypt that system (that is, all
partitions are encrypted an I boot from an USB stick)
There are two errors that appear always and always again but first of
all an explanation how I setup everything:
/dev/sda1 is my unencrypted debian installation
/dev/sda2 is the partition that will hold the encrypted root
/dev/sda3 is swap

I boot from an USB stick (with the same debian sid/2.6.24 kernel as
on /dev/sda1) which is /dev/sdb(1).
The key itself is on /dev/sdc (also an USB stick)

How I've made the key:
dd if=/dev/random of=/tmp/KEY count=32 bs=1

How I've formatted sda2:
cryptsetup --verbose --cipher aes-cbc-essiv:sha256 --key-size 256
--iter-time 1 luksFormat /dev/sda2 /tmp/KEY
cryptsetup --key-file /tmp/KEY luksOpen /dev/sda2 sda2
mkfs.ext3 /dev/mapper/sda2
cryptsetup luksClose sda2




cp -a /mnt/unencrypted /mnt/encrypted/





Here's the first problem:
1) When I now diff the two versions again (the unencrypted and the one
from the encrypted partition) I get differences...
I'm quite sure that this is not due to damaged RAM or harddisk (checked
several times with memtest and badblocks) and the corruption is always
the same, although not fully reproducibly.
The filesystem tree itself seems to be the same on both discs (but I'm
not sure if the permissions and owners are copied correctly), but there
are differences in some (though not all) files.
The difference is always the same, that for one or more bytes of the
affected files, the hexcode is reduced by 0x10
That is:
If the file contains a byte "T" (0x74) on the unencrypted partition it
will have a "D" (0x64) on the encrypted.

I've first recognized this bug some weeks ago, when I used a 2.6.18
kernel on my boot+copy USB-stick (/dev/sdb) but I thought this might be
a bug in that pretty old version...
But now it even happens with 2.6.24...

2) The second bug happens only rarely and leads to a panic.
Unfortunately it's difficult to reproduce, but it always happened when I
mkfs.ext3 on the /dev/mapper/sda2.
There's a stack-trace printed which clearly involves some dmcrypt
lines...


Unfortunately this bug makes dm-crypt completely unusable for me (and
everybody who needs correctness for his data ;-) )

I'd ask you to run your own (massive) copying tests and report here if
you can reproduce that error.

Best wishes,
Chris.

btw: are there any other currently known bugs in dmcrypt? Or is it
considered as "production stable"?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


data corruption with dmcrypt/LUKS

2008-02-03 Thread Christoph Anton Mitterer
Hi.

I think I've found a bug somewhere in dm-crypt...

First of all the system that I use:
Debian (sid) with kernel 2.26.24 on AMD64 (intel core2 duo), 2GB RAM

For several days now I try to fully encrypt that system (that is, all
partitions are encrypted an I boot from an USB stick)
There are two errors that appear always and always again but first of
all an explanation how I setup everything:
/dev/sda1 is my unencrypted debian installation
/dev/sda2 is the partition that will hold the encrypted root
/dev/sda3 is swap

I boot from an USB stick (with the same debian sid/2.6.24 kernel as
on /dev/sda1) which is /dev/sdb(1).
The key itself is on /dev/sdc (also an USB stick)

How I've made the key:
dd if=/dev/random of=/tmp/KEY count=32 bs=1

How I've formatted sda2:
cryptsetup --verbose --cipher aes-cbc-essiv:sha256 --key-size 256
--iter-time 1 luksFormat /dev/sda2 /tmp/KEY
cryptsetup --key-file /tmp/KEY luksOpen /dev/sda2 sda2
mkfs.ext3 /dev/mapper/sda2
cryptsetup luksClose sda2

reboot
create mappings and mount everything again

cp -a /mnt/unencrypted /mnt/encrypted/

when I diff -q -r /mnt/unencrypted /mnt/encrypted/ here, everything is
ok but this is just, because those files are still cached in RAM
unmount + close mapping + reboot
create mappings and mount everything again

Here's the first problem:
1) When I now diff the two versions again (the unencrypted and the one
from the encrypted partition) I get differences...
I'm quite sure that this is not due to damaged RAM or harddisk (checked
several times with memtest and badblocks) and the corruption is always
the same, although not fully reproducibly.
The filesystem tree itself seems to be the same on both discs (but I'm
not sure if the permissions and owners are copied correctly), but there
are differences in some (though not all) files.
The difference is always the same, that for one or more bytes of the
affected files, the hexcode is reduced by 0x10
That is:
If the file contains a byte T (0x74) on the unencrypted partition it
will have a D (0x64) on the encrypted.

I've first recognized this bug some weeks ago, when I used a 2.6.18
kernel on my boot+copy USB-stick (/dev/sdb) but I thought this might be
a bug in that pretty old version...
But now it even happens with 2.6.24...

2) The second bug happens only rarely and leads to a panic.
Unfortunately it's difficult to reproduce, but it always happened when I
mkfs.ext3 on the /dev/mapper/sda2.
There's a stack-trace printed which clearly involves some dmcrypt
lines...


Unfortunately this bug makes dm-crypt completely unusable for me (and
everybody who needs correctness for his data ;-) )

I'd ask you to run your own (massive) copying tests and report here if
you can reproduce that error.

Best wishes,
Chris.

btw: are there any other currently known bugs in dmcrypt? Or is it
considered as production stable?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: data corruption with dmcrypt/LUKS

2008-02-03 Thread Christoph Anton Mitterer
Hi Milan Broz

On Sun, 2008-02-03 at 23:06 +0100, Milan Broz wrote:
 Are you sure, that your USB-stick is not faulty ?
I actually tested the stick, too. But I consider problems in the stick
(you mean the key-holding stick, do you?) as highly unlikely. 
If the key would be wrong a good crypto system should give me completely
different data and not just these minor faults.


 Could you reproduce it with different piece of hw ?
 (Several strange reports for dm-crypt over USB were identified to be 
 USB hw faults.)
I'll test it tomorrow.



  2) The second bug happens only rarely and leads to a panic.
  Unfortunately it's difficult to reproduce, but it always happened when I
  mkfs.ext3 on the /dev/mapper/sda2.
  There's a stack-trace printed which clearly involves some dmcrypt
  lines...
 But no stack trace attached here... please attach it.
Unfortunately I don't have one,... nothing was written to the logs and I
forgot to write it up :-/



 It can be known bug which was fixed in stable version some time ago
 see http://lkml.org/lkml/2007/7/20/211
Uhm but that patch should be part of 2.6.24, shouldn't it?


 No known bugs causing data corruption, no such reports so far
 for stable kernel.
Uhm ok,.. well as told above I'll make some other tests (without the
USB-sticks) but it would be great some people here could try this, too.

Best wishes,
Chris.

btw: What's about the dmcrypt mailing list,.. I've tried to subscribe
but no answers, and I get not posts (not even my owns).

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel support for newer UDF versions (> 2.01)

2008-02-02 Thread Christoph Anton Mitterer
Hi everybody.

Does someone know if it's planned (or even in the works) to support
newer versions of the UDF filesystem?
Especially the versions 2.50 and 2.60 would be very interesting for
mounting BluRay discs (and HD DVD)...

I've seen somewhere a patch but that was pretty old and AFAIK not yet
finished or included in the vanilla tree.

Best wishes,
Chris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel support for newer UDF versions ( 2.01)

2008-02-02 Thread Christoph Anton Mitterer
Hi everybody.

Does someone know if it's planned (or even in the works) to support
newer versions of the UDF filesystem?
Especially the versions 2.50 and 2.60 would be very interesting for
mounting BluRay discs (and HD DVD)...

I've seen somewhere a patch but that was pretty old and AFAIK not yet
finished or included in the vanilla tree.

Best wishes,
Chris.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer
On Mon, 2008-01-28 at 17:47 -0600, Robert Hancock wrote:
> Nope, I/we are still trying to figure out how to fix this properly..
I see :-)
Uhm is there a bugreport opened, so that I can trace your efforts? Or
would you be so kind to inform me when you have a patch an Linus
accepted it? :-)

btw: Could you also fix that AACS issue?? (ok I know that this isn't
bluray related,... ;-P )

Thanks,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer

On Mon, 2008-01-28 at 17:38 -0600, Robert Hancock wrote:
> Christoph Anton Mitterer wrote:
> > btw: I'm cross posting this to lkml and debian-user,... hope nobody
> > feels offended :-)
> 
> How much RAM is in your machine? There's a known problem with sata_nv 
> ADMA with ATAPI devices and over 4GB of RAM.
Uhm *tadaaa* ... got 4GB ^^
Ok,.. so If this only applies to ATAPI devices it should be clear why my
SATA HDDs aren't affected ;)

> As a temporary workaround, 
> you can boot with sata_nv.adma=0 on the kernel command line, or limit 
> your memory with the mem= command line option so that memory over 4GB is 
> not used.
Great,.. will have a try tomorrow :-)

Is this already fixed in git?

best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer
Hi everybody.

I've just bought and installed a LG GGW-H20L Blu-Ray burner,...
This is an SATA device, I'm running a 2.6.23.10 kernel (not the Debian
version) on Debian sid (AMD64) and I use the proprietary nvidia drivers
(169.07).
The system is an Dual (!) DualCore AMD Opteron machine.
(Please ask if you need further information)

The kernel seems to correctly identify the device (part of dmesg):
Jan 28 23:22:28 fermat kernel: ata4: SATA link up 1.5 Gbps (SStatus 113
SControl 300)
Jan 28 23:22:28 fermat kernel: ata4.00: ATAPI: HL-DT-ST BD-RE  GGW-H20L,
YL02, max UDMA/133
Jan 28 23:22:28 fermat kernel: ata4.00: configured for UDMA/133
Jan 28 23:22:28 fermat kernel: scsi 2:0:0:0: Direct-Access ATA
ST3750640AS  3.AA PQ: 0 ANSI: 5
Jan 28 23:22:28 fermat kernel: ata3: bounce limit 0x,
segment boundary 0x, hw segs 61
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] 1465149168 512-byte
hardware sectors (750156 MB)
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] 1465149168 512-byte
hardware sectors (750156 MB)
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jan 28 23:22:28 fermat kernel:  sdc: sdc1
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Attached SCSI disk
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: Attached scsi generic sg2
type 0
Jan 28 23:22:28 fermat kernel: scsi 3:0:0:0: CD-ROMHL-DT-ST
BD-RE  GGW-H20L  YL02 PQ: 0 ANSI: 5
Jan 28 23:22:28 fermat kernel: ata4: bounce limit 0x, segment
boundary 0x, hw segs 127
Jan 28 23:22:28 fermat kernel: sr0: scsi3-mmc drive: 0x/0x caddy
Jan 28 23:22:28 fermat kernel: sr 3:0:0:0: Attached scsi CD-ROM sr0
Jan 28 23:22:28 fermat kernel: sr 3:0:0:0: Attached scsi generic sg3
type 5


Anyway,... as soon as I insert an BluRay disc or an DVD my system
crashes (it either freezes, that happened with the DVD-R, or it reboots,
that happened with the BluRay disc).

There are no panics, oopses, or any other messages in the usual logs (at
least I didn't recognize them ;) ).

Any ideas?

Thanks and best wishes,
Chris.

btw: I'm cross posting this to lkml and debian-user,... hope nobody
feels offended :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer
Hi everybody.

I've just bought and installed a LG GGW-H20L Blu-Ray burner,...
This is an SATA device, I'm running a 2.6.23.10 kernel (not the Debian
version) on Debian sid (AMD64) and I use the proprietary nvidia drivers
(169.07).
The system is an Dual (!) DualCore AMD Opteron machine.
(Please ask if you need further information)

The kernel seems to correctly identify the device (part of dmesg):
Jan 28 23:22:28 fermat kernel: ata4: SATA link up 1.5 Gbps (SStatus 113
SControl 300)
Jan 28 23:22:28 fermat kernel: ata4.00: ATAPI: HL-DT-ST BD-RE  GGW-H20L,
YL02, max UDMA/133
Jan 28 23:22:28 fermat kernel: ata4.00: configured for UDMA/133
Jan 28 23:22:28 fermat kernel: scsi 2:0:0:0: Direct-Access ATA
ST3750640AS  3.AA PQ: 0 ANSI: 5
Jan 28 23:22:28 fermat kernel: ata3: bounce limit 0x,
segment boundary 0x, hw segs 61
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] 1465149168 512-byte
hardware sectors (750156 MB)
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] 1465149168 512-byte
hardware sectors (750156 MB)
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jan 28 23:22:28 fermat kernel:  sdc: sdc1
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: [sdc] Attached SCSI disk
Jan 28 23:22:28 fermat kernel: sd 2:0:0:0: Attached scsi generic sg2
type 0
Jan 28 23:22:28 fermat kernel: scsi 3:0:0:0: CD-ROMHL-DT-ST
BD-RE  GGW-H20L  YL02 PQ: 0 ANSI: 5
Jan 28 23:22:28 fermat kernel: ata4: bounce limit 0x, segment
boundary 0x, hw segs 127
Jan 28 23:22:28 fermat kernel: sr0: scsi3-mmc drive: 0x/0x caddy
Jan 28 23:22:28 fermat kernel: sr 3:0:0:0: Attached scsi CD-ROM sr0
Jan 28 23:22:28 fermat kernel: sr 3:0:0:0: Attached scsi generic sg3
type 5


Anyway,... as soon as I insert an BluRay disc or an DVD my system
crashes (it either freezes, that happened with the DVD-R, or it reboots,
that happened with the BluRay disc).

There are no panics, oopses, or any other messages in the usual logs (at
least I didn't recognize them ;) ).

Any ideas?

Thanks and best wishes,
Chris.

btw: I'm cross posting this to lkml and debian-user,... hope nobody
feels offended :-)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer

On Mon, 2008-01-28 at 17:38 -0600, Robert Hancock wrote:
 Christoph Anton Mitterer wrote:
  btw: I'm cross posting this to lkml and debian-user,... hope nobody
  feels offended :-)
 
 How much RAM is in your machine? There's a known problem with sata_nv 
 ADMA with ATAPI devices and over 4GB of RAM.
Uhm *tadaaa* ... got 4GB ^^
Ok,.. so If this only applies to ATAPI devices it should be clear why my
SATA HDDs aren't affected ;)

 As a temporary workaround, 
 you can boot with sata_nv.adma=0 on the kernel command line, or limit 
 your memory with the mem= command line option so that memory over 4GB is 
 not used.
Great,.. will have a try tomorrow :-)

Is this already fixed in git?

best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Blu Ray LG GGW-H20L crashes Linux

2008-01-28 Thread Christoph Anton Mitterer
On Mon, 2008-01-28 at 17:47 -0600, Robert Hancock wrote:
 Nope, I/we are still trying to figure out how to fix this properly..
I see :-)
Uhm is there a bugreport opened, so that I can trace your efforts? Or
would you be so kind to inform me when you have a patch an Linus
accepted it? :-)

btw: Could you also fix that AACS issue?? (ok I know that this isn't
bluray related,... ;-P )

Thanks,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


kexec, initramdisk and dmcrypt questions

2008-01-17 Thread Christoph Anton Mitterer
Hi.

I'd like to setup a system where all partitions (including the root file
system) are encrypted using dmcrypt.
Of course I need some place where I can boot from, and I intended to use
an USB-stick for that purpose.

Now I think there are (at least) the following two ways of doing this:

1) Traditional way
Boot from the USB-Stick with and initramsdisk,.. that sets up dmcrypt
and mounts the root-filesystem.

-Has the advantages that it's pretty well supported by some distros
(e.g. Debian) and it's very easy to setup.
-Has the disadvantages, that I'll always have to update the contents of
the stick when I install a new kernel (btw: does anybody know of an
write-once USB-Stick? ;) )

After booting it should be possible to just plug out the stick (as the
kernel and the modules are already loaded), or not?



2) using kexec.
I could imagine that my USB-stick serves just as loader,... having a
kernel and initrd that sets up dmcrypt/mounts root and calls kexec for
the "real" working kernel and the corresponding initramdisk, that are
both stored encrypted on e.g. the root filesystem in /boot/ or so...
The initrd of the working kernel contains the dmcrypt keys and
automatically sets up the mappings and mounts the filesystems.

-Has the advantage that this is nearly transparent for the system,
especially for tools that automatically create the initramdisk (stuff
like update-initramfs in Debian)
-And I would (nearly) never have to change the contents of the
loader-USB-stick.

Now I've read through the kexec documentation and I wonder wheter using
kexec might have some negative impact?
As the firmware is already initialised (by the loader kernel??) and the
working kernel must be put on different addresses.

I'm also not sure how to use the "architecture options" from the kexec
userspace tools?

Any ideas, help, suggestions, or threads ;) ?

Thanks and best wishes,
Chris.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kexec, initramdisk and dmcrypt questions

2008-01-17 Thread Christoph Anton Mitterer
Hi.

I'd like to setup a system where all partitions (including the root file
system) are encrypted using dmcrypt.
Of course I need some place where I can boot from, and I intended to use
an USB-stick for that purpose.

Now I think there are (at least) the following two ways of doing this:

1) Traditional way
Boot from the USB-Stick with and initramsdisk,.. that sets up dmcrypt
and mounts the root-filesystem.

-Has the advantages that it's pretty well supported by some distros
(e.g. Debian) and it's very easy to setup.
-Has the disadvantages, that I'll always have to update the contents of
the stick when I install a new kernel (btw: does anybody know of an
write-once USB-Stick? ;) )

After booting it should be possible to just plug out the stick (as the
kernel and the modules are already loaded), or not?



2) using kexec.
I could imagine that my USB-stick serves just as loader,... having a
kernel and initrd that sets up dmcrypt/mounts root and calls kexec for
the real working kernel and the corresponding initramdisk, that are
both stored encrypted on e.g. the root filesystem in /boot/ or so...
The initrd of the working kernel contains the dmcrypt keys and
automatically sets up the mappings and mounts the filesystems.

-Has the advantage that this is nearly transparent for the system,
especially for tools that automatically create the initramdisk (stuff
like update-initramfs in Debian)
-And I would (nearly) never have to change the contents of the
loader-USB-stick.

Now I've read through the kexec documentation and I wonder wheter using
kexec might have some negative impact?
As the firmware is already initialised (by the loader kernel??) and the
working kernel must be put on different addresses.

I'm also not sure how to use the architecture options from the kexec
userspace tools?

Any ideas, help, suggestions, or threads ;) ?

Thanks and best wishes,
Chris.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Thanks for all your help :-)

Best wishes from Munich,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Hi Alan.

On Fri, 2007-12-14 at 22:24 +, Alan Cox wrote:
> Can you reproduce this without the Nvidia stuff ?
No,.. I'm running for about 2 years now with propreitary nvidia gpu
module,.. but I've never encountered that problem before.
Anyway,... I might have just missed it...

Ah and by the way,.. in the meantime to problem didn't occure again :)

Best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Hi Tejun.


On Sat, 2007-12-15 at 00:16 +0900, Tejun Heo wrote:
> Do you have log with timestamp?  It's difficult to tell what's going on
> without knowing what happened when.
Ah sorry,... I've completely missed that... perhaps those two problems
were not related (at least there's so much time between).

Dec 13 17:05:51 fermat kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel
Module  100.14.19  Wed Sep 12 14:08:38 PDT 2007
Dec 13 17:12:36 fermat kernel: usb 1-1: new high speed USB device using
ehci_hcd and address 5
Dec 13 17:12:36 fermat kernel: usb 1-1: configuration #1 chosen from 1
choice
Dec 13 17:12:36 fermat kernel: scsi6 : SCSI emulation for USB Mass
Storage devices
Dec 13 17:12:36 fermat kernel: usb-storage: device found at 5
Dec 13 17:12:36 fermat kernel: usb-storage: waiting for device to settle
before scanning
Dec 13 17:12:41 fermat kernel: scsi 6:0:0:0: CD-ROMPLEXTOR
DVD-ROM PX-130A  1.03 PQ: 0 ANSI: 0 CCS
Dec 13 17:12:41 fermat kernel: sr0: scsi3-mmc drive: 0x/50x cd/rw
xa/form2 cdda tray
Dec 13 17:12:41 fermat kernel: sr 6:0:0:0: Attached scsi CD-ROM sr0
Dec 13 17:12:41 fermat kernel: sr 6:0:0:0: Attached scsi generic sg3
type 5
Dec 13 17:12:41 fermat kernel: usb-storage: device scan complete
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308688
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77172
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77173
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308688
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77172
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77173
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308816
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77204
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77205
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308816
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77204
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77205
Dec 13 17:18:32 fermat kernel: UDF-fs: No VRS found
Dec 13 18:09:43 fermat kernel: tun0: Disabled Privacy Extensions
Dec 14 01:06:33 fermat kernel: usb 1-1: USB disconnect, address 5
Dec 14 01:06:33 fermat kernel: ata1: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
next
 cpb idx 0x0
Dec 14 01:06:33 fermat kernel: ata1: CPB 0: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 1: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 2: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 3: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 4: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 5: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 6: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 7: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 8: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 9: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 10: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: timeout waiting for ADMA IDLE,
stat=0x400
Dec 14 01:06:33 fermat kernel: ata1.00: exception Emask 0x0 SAct 0x7ff
SErr 0x0 action 0x2 frozen
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/10:00:2e:07:a8/01:00:0b:00:00/40 tag 0 cdb 0x0 data 139264 out
Dec 14 01:06:33 fermat kernel:  res
40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/18:08:07:3f:02/00:00:00:00:00/40 tag 1 cdb 0x0 data 12288 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:10:b6:a8:23/00:00:0d:00:00/40 tag 2 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:18:f6:37:68/00:00:0d:00:00/40 tag 3 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:20:9e:32:a1/00:00:0d:00:00/40 tag 4 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:28:4f:00:dc/00:00:02:00:00/40 tag 5 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:30:6f:00:dc/00:00:02:00:00/40 tag 6 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 

Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Hi Tejun.


On Sat, 2007-12-15 at 00:16 +0900, Tejun Heo wrote:
 Do you have log with timestamp?  It's difficult to tell what's going on
 without knowing what happened when.
Ah sorry,... I've completely missed that... perhaps those two problems
were not related (at least there's so much time between).

Dec 13 17:05:51 fermat kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel
Module  100.14.19  Wed Sep 12 14:08:38 PDT 2007
Dec 13 17:12:36 fermat kernel: usb 1-1: new high speed USB device using
ehci_hcd and address 5
Dec 13 17:12:36 fermat kernel: usb 1-1: configuration #1 chosen from 1
choice
Dec 13 17:12:36 fermat kernel: scsi6 : SCSI emulation for USB Mass
Storage devices
Dec 13 17:12:36 fermat kernel: usb-storage: device found at 5
Dec 13 17:12:36 fermat kernel: usb-storage: waiting for device to settle
before scanning
Dec 13 17:12:41 fermat kernel: scsi 6:0:0:0: CD-ROMPLEXTOR
DVD-ROM PX-130A  1.03 PQ: 0 ANSI: 0 CCS
Dec 13 17:12:41 fermat kernel: sr0: scsi3-mmc drive: 0x/50x cd/rw
xa/form2 cdda tray
Dec 13 17:12:41 fermat kernel: sr 6:0:0:0: Attached scsi CD-ROM sr0
Dec 13 17:12:41 fermat kernel: sr 6:0:0:0: Attached scsi generic sg3
type 5
Dec 13 17:12:41 fermat kernel: usb-storage: device scan complete
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308688
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77172
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77173
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308688
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77172
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77173
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308816
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77204
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77205
Dec 13 17:13:33 fermat kernel: end_request: I/O error, dev sr0, sector
308816
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77204
Dec 13 17:13:33 fermat kernel: Buffer I/O error on device sr0, logical
block 77205
Dec 13 17:18:32 fermat kernel: UDF-fs: No VRS found
Dec 13 18:09:43 fermat kernel: tun0: Disabled Privacy Extensions
Dec 14 01:06:33 fermat kernel: usb 1-1: USB disconnect, address 5
Dec 14 01:06:33 fermat kernel: ata1: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0
next
 cpb idx 0x0
Dec 14 01:06:33 fermat kernel: ata1: CPB 0: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 1: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 2: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 3: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 4: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 5: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 6: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 7: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 8: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 9: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: CPB 10: ctl_flags 0x1f, resp_flags
0x2
Dec 14 01:06:33 fermat kernel: ata1: timeout waiting for ADMA IDLE,
stat=0x400
Dec 14 01:06:33 fermat kernel: ata1.00: exception Emask 0x0 SAct 0x7ff
SErr 0x0 action 0x2 frozen
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/10:00:2e:07:a8/01:00:0b:00:00/40 tag 0 cdb 0x0 data 139264 out
Dec 14 01:06:33 fermat kernel:  res
40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/18:08:07:3f:02/00:00:00:00:00/40 tag 1 cdb 0x0 data 12288 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:10:b6:a8:23/00:00:0d:00:00/40 tag 2 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:18:f6:37:68/00:00:0d:00:00/40 tag 3 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:20:9e:32:a1/00:00:0d:00:00/40 tag 4 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:28:4f:00:dc/00:00:02:00:00/40 tag 5 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 01:06:33 fermat kernel: ata1.00: cmd
61/08:30:6f:00:dc/00:00:02:00:00/40 tag 6 cdb 0x0 data 4096 out
Dec 14 01:06:33 fermat kernel:  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 

Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Hi Alan.

On Fri, 2007-12-14 at 22:24 +, Alan Cox wrote:
 Can you reproduce this without the Nvidia stuff ?
No,.. I'm running for about 2 years now with propreitary nvidia gpu
module,.. but I've never encountered that problem before.
Anyway,... I might have just missed it...

Ah and by the way,.. in the meantime to problem didn't occure again :)

Best wishes,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Re: Strange ATA problems

2007-12-14 Thread Christoph Anton Mitterer
Thanks for all your help :-)

Best wishes from Munich,
Chris.


smime.p7s
Description: S/MIME cryptographic signature


Strange ATA problems

2007-12-13 Thread Christoph Anton Mitterer
Hi everybody.

Today I've experienced a very strange problem.

I have a CD/DVD drive connected via USB,... and while woking the system
suddenly freezed (at least those processes that tried to access the
hardsisk).

Looking at dmesg it showed me this:
ACPI: PCI Interrupt Link [LNK3] enabled at IRQ 18
ACPI: PCI Interrupt :02:00.0[A] -> Link [LNK3] -> GSI 18 (level,
high) -> IRQ 18
PCI: Setting latency timer of device :02:00.0 to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  100.14.19  Wed Sep 12
14:08:38 PDT 2007
usb 1-1: new high speed USB device using ehci_hcd and address 5
usb 1-1: configuration #1 chosen from 1 choice
scsi6 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 5
usb-storage: waiting for device to settle before scanning
scsi 6:0:0:0: CD-ROMPLEXTOR  DVD-ROM PX-130A  1.03 PQ: 0
ANSI: 0 CCS
sr0: scsi3-mmc drive: 0x/50x cd/rw xa/form2 cdda tray
sr 6:0:0:0: Attached scsi CD-ROM sr0
sr 6:0:0:0: Attached scsi generic sg3 type 5
usb-storage: device scan complete
end_request: I/O error, dev sr0, sector 308688
Buffer I/O error on device sr0, logical block 77172
Buffer I/O error on device sr0, logical block 77173
end_request: I/O error, dev sr0, sector 308688
Buffer I/O error on device sr0, logical block 77172
Buffer I/O error on device sr0, logical block 77173
end_request: I/O error, dev sr0, sector 308816
Buffer I/O error on device sr0, logical block 77204
Buffer I/O error on device sr0, logical block 77205
end_request: I/O error, dev sr0, sector 308816
Buffer I/O error on device sr0, logical block 77204
Buffer I/O error on device sr0, logical block 77205
UDF-fs: No VRS found
tun0: Disabled Privacy Extensions

So I've plugged out the USB connected drive and then those processes
continued.

Anyway, now I got the following stuff:
usb 1-1: USB disconnect, address 5
ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000
status 0x400 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 5: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 6: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 7: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 8: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 9: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 10: ctl_flags 0x1f, resp_flags 0x2
ata1: timeout waiting for ADMA IDLE, stat=0x400
ata1.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x2 frozen
ata1.00: cmd 61/10:00:2e:07:a8/01:00:0b:00:00/40 tag 0 cdb 0x0 data
139264 out
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/18:08:07:3f:02/00:00:00:00:00/40 tag 1 cdb 0x0 data
12288 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:10:b6:a8:23/00:00:0d:00:00/40 tag 2 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:18:f6:37:68/00:00:0d:00:00/40 tag 3 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:20:9e:32:a1/00:00:0d:00:00/40 tag 4 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:28:4f:00:dc/00:00:02:00:00/40 tag 5 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:30:6f:00:dc/00:00:02:00:00/40 tag 6 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:38:3f:03:2c/00:00:05:00:00/40 tag 7 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:40:36:ea:72/00:00:0d:00:00/40 tag 8 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:48:7f:00:c4/00:00:05:00:00/40 tag 9 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:50:47:02:d8/00:00:05:00:00/40 tag 10 cdb 0x0 data
4096 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA


ata1/sda is my harddisk (at least one of them)
What do does errors mean? Is this probably a hardware failure?
And how can the CD/DVD problem relate to this?

Any ideas?

Thanks and best wishes,
Christoph Anton Mitterer.


total demsg:
usb-dib0700.force_lna_activation=1 iommu=soft 
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009b000 (usable)
 BIOS-e820: 0009b000 - 000a (reserved)
 

Strange ATA problems

2007-12-13 Thread Christoph Anton Mitterer
Hi everybody.

Today I've experienced a very strange problem.

I have a CD/DVD drive connected via USB,... and while woking the system
suddenly freezed (at least those processes that tried to access the
hardsisk).

Looking at dmesg it showed me this:
ACPI: PCI Interrupt Link [LNK3] enabled at IRQ 18
ACPI: PCI Interrupt :02:00.0[A] - Link [LNK3] - GSI 18 (level,
high) - IRQ 18
PCI: Setting latency timer of device :02:00.0 to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  100.14.19  Wed Sep 12
14:08:38 PDT 2007
usb 1-1: new high speed USB device using ehci_hcd and address 5
usb 1-1: configuration #1 chosen from 1 choice
scsi6 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 5
usb-storage: waiting for device to settle before scanning
scsi 6:0:0:0: CD-ROMPLEXTOR  DVD-ROM PX-130A  1.03 PQ: 0
ANSI: 0 CCS
sr0: scsi3-mmc drive: 0x/50x cd/rw xa/form2 cdda tray
sr 6:0:0:0: Attached scsi CD-ROM sr0
sr 6:0:0:0: Attached scsi generic sg3 type 5
usb-storage: device scan complete
end_request: I/O error, dev sr0, sector 308688
Buffer I/O error on device sr0, logical block 77172
Buffer I/O error on device sr0, logical block 77173
end_request: I/O error, dev sr0, sector 308688
Buffer I/O error on device sr0, logical block 77172
Buffer I/O error on device sr0, logical block 77173
end_request: I/O error, dev sr0, sector 308816
Buffer I/O error on device sr0, logical block 77204
Buffer I/O error on device sr0, logical block 77205
end_request: I/O error, dev sr0, sector 308816
Buffer I/O error on device sr0, logical block 77204
Buffer I/O error on device sr0, logical block 77205
UDF-fs: No VRS found
tun0: Disabled Privacy Extensions

So I've plugged out the USB connected drive and then those processes
continued.

Anyway, now I got the following stuff:
usb 1-1: USB disconnect, address 5
ata1: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000
status 0x400 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 1: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 2: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 3: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 4: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 5: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 6: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 7: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 8: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 9: ctl_flags 0x1f, resp_flags 0x2
ata1: CPB 10: ctl_flags 0x1f, resp_flags 0x2
ata1: timeout waiting for ADMA IDLE, stat=0x400
ata1.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x2 frozen
ata1.00: cmd 61/10:00:2e:07:a8/01:00:0b:00:00/40 tag 0 cdb 0x0 data
139264 out
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/18:08:07:3f:02/00:00:00:00:00/40 tag 1 cdb 0x0 data
12288 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:10:b6:a8:23/00:00:0d:00:00/40 tag 2 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:18:f6:37:68/00:00:0d:00:00/40 tag 3 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:20:9e:32:a1/00:00:0d:00:00/40 tag 4 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:28:4f:00:dc/00:00:02:00:00/40 tag 5 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:30:6f:00:dc/00:00:02:00:00/40 tag 6 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:38:3f:03:2c/00:00:05:00:00/40 tag 7 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:40:36:ea:72/00:00:0d:00:00/40 tag 8 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:48:7f:00:c4/00:00:05:00:00/40 tag 9 cdb 0x0 data 4096
out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: cmd 61/08:50:47:02:d8/00:00:05:00:00/40 tag 10 cdb 0x0 data
4096 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA


ata1/sda is my harddisk (at least one of them)
What do does errors mean? Is this probably a hardware failure?
And how can the CD/DVD problem relate to this?

Any ideas?

Thanks and best wishes,
Christoph Anton Mitterer.


total demsg:
usb-dib0700.force_lna_activation=1 iommu=soft 
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009b000 (usable)
 BIOS-e820: 0009b000 - 000a (reserved)
 BIOS-e820

Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-03-22 Thread Christoph Anton Mitterer
Hi folks.

1) Are there any new developments in this issue? Does someone know if
AMD and Nvidia is still investigating?

2) Steve Langasek from Debian sent me a patch that disables the hw-iommu
per default on Nvidia boards.
I've attached it in the kernel bugzilla and asked for inclusion in the
kernel (until we find a real solution).
I'd be pleased if all of you (who experienced the data corruption) could
test this patch.
Note: This patch is NOT a real solution for the issue, it just applies
our workaround (iommu=soft) per default

Bugreport at kernel.org: http://bugzilla.kernel.org/show_bug.cgi?id=7768
Bugreport at Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404148

Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-03-22 Thread Christoph Anton Mitterer
Hi folks.

1) Are there any new developments in this issue? Does someone know if
AMD and Nvidia is still investigating?

2) Steve Langasek from Debian sent me a patch that disables the hw-iommu
per default on Nvidia boards.
I've attached it in the kernel bugzilla and asked for inclusion in the
kernel (until we find a real solution).
I'd be pleased if all of you (who experienced the data corruption) could
test this patch.
Note: This patch is NOT a real solution for the issue, it just applies
our workaround (iommu=soft) per default

Bugreport at kernel.org: http://bugzilla.kernel.org/show_bug.cgi?id=7768
Bugreport at Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404148

Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-24 Thread Christoph Anton Mitterer
Alistair John Strachan wrote:
>> I knew of course about libdvdcss but I've never noticed before that the
>> kernel issues these error messages to the syslog.
>> 
> If you've replaced the drive, the decss keys might have changed for inserted 
> media (this is especially true if your old drive had a different region 
> setting).
>   
Yes,... but as I've just told Alan,... it happens even when just
mounting (an not using libdvdcss at all).

> First set the DVD region on the drive to where you are (probably 2), 
> using "regionset".
That was already done (by factory I assume):
# regionset /dev/hdb
regionset version 0.1 -- reads/sets region code on DVD drives
Current Region Code settings:
RPC Phase: II
type: SET
vendor resets available: 4
user controlled changes resets available: 4
drive plays discs from region(s): 2, mask=0xFD

Would you like to change the region setting of your drive? [y/n]:n


> Then rm -rf ~/.dvdcss and restart your video playing 
> application. It should rescan the dvd, figure out the keys, no error 
> messages.
>   
Playing just works normal (even without recreation of the dvdcss cache)
it's just the syslog message when mounting that makes me nervous ;)


Thanks,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-24 Thread Christoph Anton Mitterer
Alan wrote:
>> I knew of course about libdvdcss but I've never noticed before that the
>> kernel issues these error messages to the syslog.
>> 
> Various bits of random desktop junk poll drives to see what has appeared
> and stick icons on desktops. Some of them do stuff that produces messages
> like this because they aren't careful how they probe and what they do.
>   
Yes of course, but I've disabled that end-user stuff, and for
convenience even tested the whole thing with runlevel 1 (single user).


>> So you say this is normal?! But the problem with ejecting isn't probably
>> normal.
>> 
> Eject is application controllable. Most movie players re-enable the
> button.
>   
Yes I knew,.. I just wondered because that worked even when no player
was started at all. But I found out that this must depend on GNOME,
because when mount + eject-button from single-user,.. the disc is not
ejected.
Anyway it is ejected (+unmounted) when using the eject command (but I
assume that this is normal).

Thanks,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-24 Thread Christoph Anton Mitterer
Alan wrote:
 I knew of course about libdvdcss but I've never noticed before that the
 kernel issues these error messages to the syslog.
 
 Various bits of random desktop junk poll drives to see what has appeared
 and stick icons on desktops. Some of them do stuff that produces messages
 like this because they aren't careful how they probe and what they do.
   
Yes of course, but I've disabled that end-user stuff, and for
convenience even tested the whole thing with runlevel 1 (single user).


 So you say this is normal?! But the problem with ejecting isn't probably
 normal.
 
 Eject is application controllable. Most movie players re-enable the
 button.
   
Yes I knew,.. I just wondered because that worked even when no player
was started at all. But I found out that this must depend on GNOME,
because when mount + eject-button from single-user,.. the disc is not
ejected.
Anyway it is ejected (+unmounted) when using the eject command (but I
assume that this is normal).

Thanks,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-24 Thread Christoph Anton Mitterer
Alistair John Strachan wrote:
 I knew of course about libdvdcss but I've never noticed before that the
 kernel issues these error messages to the syslog.
 
 If you've replaced the drive, the decss keys might have changed for inserted 
 media (this is especially true if your old drive had a different region 
 setting).
   
Yes,... but as I've just told Alan,... it happens even when just
mounting (an not using libdvdcss at all).

 First set the DVD region on the drive to where you are (probably 2), 
 using regionset.
That was already done (by factory I assume):
# regionset /dev/hdb
regionset version 0.1 -- reads/sets region code on DVD drives
Current Region Code settings:
RPC Phase: II
type: SET
vendor resets available: 4
user controlled changes resets available: 4
drive plays discs from region(s): 2, mask=0xFD

Would you like to change the region setting of your drive? [y/n]:n


 Then rm -rf ~/.dvdcss and restart your video playing 
 application. It should rescan the dvd, figure out the keys, no error 
 messages.
   
Playing just works normal (even without recreation of the dvdcss cache)
it's just the syslog message when mounting that makes me nervous ;)


Thanks,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-23 Thread Christoph Anton Mitterer
Alan wrote:
>> kernel:   Error: Illegal request -- (Sense key=0x05)
>> kernel:   Read of scrambled sector without authentication -- (asc=0x6f,
>> ascq=0x03)
>> 
>
> The disc is using digital rights management. If you are in a country that
> permits it you can use a dvd reader library with decss support, if not
> you'll have to either watch your disks on a system approved by the movie
> industry enforcers or commit a crime to read the disc.
>   
I knew of course about libdvdcss but I've never noticed before that the
kernel issues these error messages to the syslog.

So you say this is normal?! But the problem with ejecting isn't probably
normal.

Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Strange

2007-01-23 Thread Christoph Anton Mitterer
Hi.

I'm using a kernel 2.6.18.1 with a Plextor PX-769A CD/DVD drive (using
the old IDE drivers)... The drive itself is brand-new. The firmware
version is 1.06 (which is the most recent).

There are to issues I experience.

1) On some (but not all) DVD-Videos I get the following messages:

kernel: hdb: command error: status=0x51 { DriveReady SeekComplete Error }
kernel: hdb: command error: error=0x54 { AbortedCommand
LastFailedSense=0x05 }
kernel: ide: failed opcode was: unknown
kernel: ATAPI device hdb:
kernel:   Error: Illegal request -- (Sense key=0x05)
kernel:   Read of scrambled sector without authentication -- (asc=0x6f,
ascq=0x03)
kernel:   The failed "Read 10" packet command was:
kernel:   "28 00 00 00 01 38 00 00 01 00 00 00 00 00 00 00 "
kernel: end_request: I/O error, dev hdb, sector 1248

What do they mean? Does it indicate a damage in the drive? Or a firmware
error? Anyway I can read the contents of the media (and watch the movies
on it)

The error appears after mounting the discs (but as I've said, not with
every disc).

Any ideas?



2) As far as I can remember, when a disc was mounted, the eject button
was disabled. But now, when I press the eject button, the disc is
ejected (and it seems that it is unmounted, at least mount doesn't show
it any longer).

Is this the correct behaviour? I'd doubt that.




Thanks in advance,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Strange

2007-01-23 Thread Christoph Anton Mitterer
Hi.

I'm using a kernel 2.6.18.1 with a Plextor PX-769A CD/DVD drive (using
the old IDE drivers)... The drive itself is brand-new. The firmware
version is 1.06 (which is the most recent).

There are to issues I experience.

1) On some (but not all) DVD-Videos I get the following messages:

kernel: hdb: command error: status=0x51 { DriveReady SeekComplete Error }
kernel: hdb: command error: error=0x54 { AbortedCommand
LastFailedSense=0x05 }
kernel: ide: failed opcode was: unknown
kernel: ATAPI device hdb:
kernel:   Error: Illegal request -- (Sense key=0x05)
kernel:   Read of scrambled sector without authentication -- (asc=0x6f,
ascq=0x03)
kernel:   The failed Read 10 packet command was:
kernel:   28 00 00 00 01 38 00 00 01 00 00 00 00 00 00 00 
kernel: end_request: I/O error, dev hdb, sector 1248

What do they mean? Does it indicate a damage in the drive? Or a firmware
error? Anyway I can read the contents of the media (and watch the movies
on it)

The error appears after mounting the discs (but as I've said, not with
every disc).

Any ideas?



2) As far as I can remember, when a disc was mounted, the eject button
was disabled. But now, when I press the eject button, the disc is
ejected (and it seems that it is unmounted, at least mount doesn't show
it any longer).

Is this the correct behaviour? I'd doubt that.




Thanks in advance,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: Strange

2007-01-23 Thread Christoph Anton Mitterer
Alan wrote:
 kernel:   Error: Illegal request -- (Sense key=0x05)
 kernel:   Read of scrambled sector without authentication -- (asc=0x6f,
 ascq=0x03)
 

 The disc is using digital rights management. If you are in a country that
 permits it you can use a dvd reader library with decss support, if not
 you'll have to either watch your disks on a system approved by the movie
 industry enforcers or commit a crime to read the disc.
   
I knew of course about libdvdcss but I've never noticed before that the
kernel issues these error messages to the syslog.

So you say this is normal?! But the problem with ejecting isn't probably
normal.

Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
Erik Andersen wrote:
> I just tried again and while using iommu=soft does avoid the
> corruption problem, as with previous kernels with 2.6.20-rc5
> using iommu=soft still makes my pcHDTV HD5500 DVB cards not work.
> I still have to disable memhole and lose 1 GB.  :-(

Please add this to the bugreport
(http://bugzilla.kernel.org/show_bug.cgi?id=7768)

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
joachim wrote:
> Not only has it only been on Nvidia chipsets but we have only seen
> reports on the Nvidia CK804 SATA controller.  Please write in or add
> yourself to the bugzilla entry [1] and tell us which hardware you have
> if you get 4kB pagesize corruption and it goes away with "iommu=soft".
How do I find out if I get a 4kB pagesize corruption (or is this the
same as "our corruption"?

Chris.

btw: Should we only post the controller, or other hardware details, too?
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
joachim wrote:
 Not only has it only been on Nvidia chipsets but we have only seen
 reports on the Nvidia CK804 SATA controller.  Please write in or add
 yourself to the bugzilla entry [1] and tell us which hardware you have
 if you get 4kB pagesize corruption and it goes away with iommu=soft.
How do I find out if I get a 4kB pagesize corruption (or is this the
same as our corruption?

Chris.

btw: Should we only post the controller, or other hardware details, too?
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Christoph Anton Mitterer
Erik Andersen wrote:
 I just tried again and while using iommu=soft does avoid the
 corruption problem, as with previous kernels with 2.6.20-rc5
 using iommu=soft still makes my pcHDTV HD5500 DVB cards not work.
 I still have to disable memhole and lose 1 GB.  :-(

Please add this to the bugreport
(http://bugzilla.kernel.org/show_bug.cgi?id=7768)

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Andi Kleen wrote:
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
I've just read the posts about AMDs and NVIDIAs effort to find the
issue,... but in the meantime this would be the best solution.

And if "we"'ll ever find a rue solution,.. we could still deactivate the
iommu=soft setting.


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
> I'd like to here from Andi how he feels about this?  It seems like a
> somewhat drastic solution in some ways given a lot of hardware doesn't
> seem to be affected (or maybe in those cases it's just really hard to
> hit, I don't know).
>   
Yes this might be true,.. those who have reported working systems might
just have a configuration where the error happens even rarer or where
some other event(s) work around it.

>> Well we can hope that Nvidia will find out more (though I'm not too
>> optimistic).
>> 
> Ideally someone from AMD needs to look into this, if some mainboards
> really never see this problem, then why is that?  Is there errata that
> some BIOS/mainboard vendors are dealing with that others are not?
>   
Some time ago I've asked here in a post if some of you could try to
contact AMD and/or Nvidia,.. as no one did,... I wrote them again (to
all forums and email addresses I knew). (You can see the text here
http://www.nvnews.net/vbulletin/showthread.php?t=82909).
Now Nvidia replied and it seems (thanks to Mr. Friedman) that they're
actually try to investigate in the issue...

I received on reply from AMD (actually in German which is strange as I
wrote to their US support)... where they told me they'd have forwarded
my mail to their Linux engineers... but no reply since then.

Perhaps some of you have some "contacts" and can use them...
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Arkadiusz Miskiewicz wrote:
> FYI it seems that I was also hit by this bug with qlogic fc card + adaptec 
> taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on 
> it.
>
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st=arkadiusz+fibre=8#45701994c95fe2cf
>   
I'm aware of your old thread and at least I considered your postings
from it :-)

Anyway, thanks for your information. =)

Chris.

begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
> right now i'm thinking if we can't figure out which cpu/bios
> combinations are safe we might almost be better off doing iommu=soft
> for *all* k8 stuff except for those that are whitelisted; though this
> seems extremely drastic
>   
I agree,... it seems drastic, but this is the only really secure solution.
But it seems that none of the responsible developers read our thread or
the bugreport and gave his opinion about the issue.

> it's not clear if this only affect nvidia based chipsets, the nature
> of the corruption makes me think it's not an iommu software bug (we
> see a few bytes not entire pages corrupted, it's not even clear if
> it's entire cachelines trashed) --- perhaps other vendors have more
> recent bios errata or maybe it's just that nvidia has sold a lot of
> these so they are more visible? (i'm assuming at this point it might
> be some kind of cpu errata that some bioses deal with because some
> mainboards don't ever seem to see this whilst others do)
>   
Well we can hope that Nvidia will find out more (though I'm not too
optimistic).


> in some ways the problem is worse with recent kernels --- because the
> ethernet and sata can address over 4GB and don't use the iommu anymore
> the problem is going to be *much* harder to hit, but still here
> lurking to cause problems for people.
Yes I agree,.. this is a dangerous situation...
But we should not forget about the issue, just because SATA is not
longer affected.

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-16 Thread Christoph Anton Mitterer
Robert Hancock wrote:
>> What is that GART thing exactly? Is this the hardware IOMMU? I've always
>> thought GART was something graphics card related,.. but if so,.. how
>> could this solve our problem (that seems to occur mainly on harddisks)?
>> 
> The GART built into the Athlon 64/Opteron CPUs is normally used for 
> remapping graphics memory so that an AGP graphics card can see 
> physically non-contiguous memory as one contiguous region. However, 
> Linux can also use it as an IOMMU which allows devices which normally 
> can't access memory above 4GB to see a mapping of that memory that 
> resides below 4GB. In pre-2.6.20 kernels both the SATA and PATA 
> controllers on the nForce 4 chipsets can only access memory below 4GB so 
> transfers to memory above this mark have to go through the IOMMU. In 
> 2.6.20 this limitation is lifted on the nForce4 SATA controllers.
>   
Ah, I see. Thanks for that introduction :-)


>> Does this mean that PATA is no related? The corruption appears on PATA
>> disks to, so why should it only solve the issue at SATA disks? Sounds a
>> bit strange to me?
>> 
> The PATA controller will still be using 32-bit DMA and so may also use 
> the IOMMU, so this problem would not be avoided.
>   
>   
>> Can you explain this a little bit more please? Is this a drawback (like
>> a performance decrease)? Like under Windows where they never use the
>> hardware iommu but always do it via software?
>> 
>
> No, it shouldn't cause any performance loss. In previous kernels the 
> nForce4 SATA controller was controlled using an interface quite similar 
> to a PATA controller. In 2.6.20 kernels they use a more efficient 
> interface that NVidia calls ADMA, which in addition to supporting NCQ 
> also supports DMA without any 4GB limitations, so it can access all 
> memory directly without requiring IOMMU assistance.
>
> Note that if this corruption problem is, as has been suggested, related 
> to memory hole remapping and the IOMMU, then this change only prevents 
> the SATA controller transfers from experiencing this problem. Transfers 
> on the PATA controller as well as any other devices with 32-bit DMA 
> limitations might still have problems. As such this really just avoids 
> the problem, not fixes it.
>   
Ok,.. that sounds reasonable,.. so the whole thing might (!) actually be
a hardware design error,... but we just don't use that hardware any
longer when accessing devices via sata_nv.

So this doesn't solve our problem with PATA drives or other devices
(although we had until now no reports of errors with other devices) and
we have to stick with iommu=soft.

If one use iommu=soft the sata_nv will continue to use the new code for
the ADMA, right?


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-16 Thread Christoph Anton Mitterer
Robert Hancock wrote:
 What is that GART thing exactly? Is this the hardware IOMMU? I've always
 thought GART was something graphics card related,.. but if so,.. how
 could this solve our problem (that seems to occur mainly on harddisks)?
 
 The GART built into the Athlon 64/Opteron CPUs is normally used for 
 remapping graphics memory so that an AGP graphics card can see 
 physically non-contiguous memory as one contiguous region. However, 
 Linux can also use it as an IOMMU which allows devices which normally 
 can't access memory above 4GB to see a mapping of that memory that 
 resides below 4GB. In pre-2.6.20 kernels both the SATA and PATA 
 controllers on the nForce 4 chipsets can only access memory below 4GB so 
 transfers to memory above this mark have to go through the IOMMU. In 
 2.6.20 this limitation is lifted on the nForce4 SATA controllers.
   
Ah, I see. Thanks for that introduction :-)


 Does this mean that PATA is no related? The corruption appears on PATA
 disks to, so why should it only solve the issue at SATA disks? Sounds a
 bit strange to me?
 
 The PATA controller will still be using 32-bit DMA and so may also use 
 the IOMMU, so this problem would not be avoided.
   
   
 Can you explain this a little bit more please? Is this a drawback (like
 a performance decrease)? Like under Windows where they never use the
 hardware iommu but always do it via software?
 

 No, it shouldn't cause any performance loss. In previous kernels the 
 nForce4 SATA controller was controlled using an interface quite similar 
 to a PATA controller. In 2.6.20 kernels they use a more efficient 
 interface that NVidia calls ADMA, which in addition to supporting NCQ 
 also supports DMA without any 4GB limitations, so it can access all 
 memory directly without requiring IOMMU assistance.

 Note that if this corruption problem is, as has been suggested, related 
 to memory hole remapping and the IOMMU, then this change only prevents 
 the SATA controller transfers from experiencing this problem. Transfers 
 on the PATA controller as well as any other devices with 32-bit DMA 
 limitations might still have problems. As such this really just avoids 
 the problem, not fixes it.
   
Ok,.. that sounds reasonable,.. so the whole thing might (!) actually be
a hardware design error,... but we just don't use that hardware any
longer when accessing devices via sata_nv.

So this doesn't solve our problem with PATA drives or other devices
(although we had until now no reports of errors with other devices) and
we have to stick with iommu=soft.

If one use iommu=soft the sata_nv will continue to use the new code for
the ADMA, right?


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
 right now i'm thinking if we can't figure out which cpu/bios
 combinations are safe we might almost be better off doing iommu=soft
 for *all* k8 stuff except for those that are whitelisted; though this
 seems extremely drastic
   
I agree,... it seems drastic, but this is the only really secure solution.
But it seems that none of the responsible developers read our thread or
the bugreport and gave his opinion about the issue.

 it's not clear if this only affect nvidia based chipsets, the nature
 of the corruption makes me think it's not an iommu software bug (we
 see a few bytes not entire pages corrupted, it's not even clear if
 it's entire cachelines trashed) --- perhaps other vendors have more
 recent bios errata or maybe it's just that nvidia has sold a lot of
 these so they are more visible? (i'm assuming at this point it might
 be some kind of cpu errata that some bioses deal with because some
 mainboards don't ever seem to see this whilst others do)
   
Well we can hope that Nvidia will find out more (though I'm not too
optimistic).


 in some ways the problem is worse with recent kernels --- because the
 ethernet and sata can address over 4GB and don't use the iommu anymore
 the problem is going to be *much* harder to hit, but still here
 lurking to cause problems for people.
Yes I agree,.. this is a dangerous situation...
But we should not forget about the issue, just because SATA is not
longer affected.

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Arkadiusz Miskiewicz wrote:
 FYI it seems that I was also hit by this bug with qlogic fc card + adaptec 
 taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on 
 it.

 http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=stq=arkadiusz+fibrernum=8#45701994c95fe2cf
   
I'm aware of your old thread and at least I considered your postings
from it :-)

Anyway, thanks for your information. =)

Chris.

begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
 I'd like to here from Andi how he feels about this?  It seems like a
 somewhat drastic solution in some ways given a lot of hardware doesn't
 seem to be affected (or maybe in those cases it's just really hard to
 hit, I don't know).
   
Yes this might be true,.. those who have reported working systems might
just have a configuration where the error happens even rarer or where
some other event(s) work around it.

 Well we can hope that Nvidia will find out more (though I'm not too
 optimistic).
 
 Ideally someone from AMD needs to look into this, if some mainboards
 really never see this problem, then why is that?  Is there errata that
 some BIOS/mainboard vendors are dealing with that others are not?
   
Some time ago I've asked here in a post if some of you could try to
contact AMD and/or Nvidia,.. as no one did,... I wrote them again (to
all forums and email addresses I knew). (You can see the text here
http://www.nvnews.net/vbulletin/showthread.php?t=82909).
Now Nvidia replied and it seems (thanks to Mr. Friedman) that they're
actually try to investigate in the issue...

I received on reply from AMD (actually in German which is strange as I
wrote to their US support)... where they told me they'd have forwarded
my mail to their Linux engineers... but no reply since then.

Perhaps some of you have some contacts and can use them...
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-16 Thread Christoph Anton Mitterer
Andi Kleen wrote:
 AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
 although there were similar problems on VIA in the past too.
 Unless a good workaround comes around soon I'll probably default
 to iommu=soft on Nvidia.
I've just read the posts about AMDs and NVIDIAs effort to find the
issue,... but in the meantime this would be the best solution.

And if we'll ever find a rue solution,.. we could still deactivate the
iommu=soft setting.


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Sorry, as always I've forgot some things... *g*


Robert Hancock wrote:

> If this is related to some problem with using the GART IOMMU with memory 
> hole remapping enabled
What is that GART thing exactly? Is this the hardware IOMMU? I've always
thought GART was something graphics card related,.. but if so,.. how
could this solve our problem (that seems to occur mainly on harddisks)?

> then 2.6.20-rc kernels may avoid this problem on 
> nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA 
> controller are concerned
Does this mean that PATA is no related? The corruption appears on PATA
disks to, so why should it only solve the issue at SATA disks? Sounds a
bit strange to me?

> as the sata_nv driver now supports 64-bit DMA 
> on these chipsets and so no longer requires the IOMMU.
>   
Can you explain this a little bit more please? Is this a drawback (like
a performance decrease)? Like under Windows where they never use the
hardware iommu but always do it via software?


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Hi everybody.

Sorry again for my late reply...

Robert gave us the following interesting information some days ago:

Robert Hancock wrote:
> If this is related to some problem with using the GART IOMMU with memory 
> hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on 
> nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA 
> controller are concerned as the sata_nv driver now supports 64-bit DMA 
> on these chipsets and so no longer requires the IOMMU.
>   


I've just tested it with my "normal" BIOS settings, that is memhole
mapping = hardware, IOMMU = enabled and 64MB and _without_ (!)
iommu=soft as kernel parameters.
I only had the time for a small test (that is 3 passes with each 10
complete sha512sums cyles over about 30GB data)... but sofar, no
corruption occured.

It is surely far to eraly to tell that our issue was solved by
2.6.20-rc-something but I ask all of you that had systems that
suffered from the corruption to make _intensive_ tests with the most
recent rc of 2.6.20 (I've used 2.6.20-rc5) and report your results.
I'll do a extensive test tomorrow.

And of course (!!): Test without using iommu=soft and with enabled
memhole mapping (in the BIOS). (It won't make any sense to look if the
new kernel solves our problem while still applying one of our two
workarounds).


Please also note that there might be two completely data corruption
problems. The onle "solved" by iommu=soft and another reported by Kurtis
D. Rader.
I've asked him to clarify this in a post. :-)



Ok,... now if this (the new kernel) would really solve the issue... we
should try to find out what exactly was changed in the code, and if it
sounds logical that this solved the problem or not.
The new kernel could just make the corruption even more rare.


Best wishes,
Chris.


begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Hi.

Some days ago I received the following message from "Sunny Days". I
think he did not send it lkml so I forward it now:

Sunny Days wrote:
> hello,
>
> i have done some extensive testing on this.
>
> various opterons, always single socket
> various dimms 1 and 2gb modules
> and hitachi+seagate disks with various firmwares and sizes
> but i am getting a diferent pattern in the corruption.
> My test file was 10gb.
>
> I have mapped the earliest corruption as low as 10mb in the written data.
> i have also monitor the adress range used from the cp /md5sum proccess
> under /proc//$PID/maps to see if i could find a pattern but i was
> unable to.
>
> i also tested ext2 and lvm with similar results aka corruption.
> later on the week i should get a pci promise controller and test on that one.
>
> Things i have not tested is the patch that linus released 10 days ago
> and reiserfs3/4
>
> my nvidia chipset was ck804 (a3)
>
> Hope somehow we get to the bottom of this.
>
> Hope this helps
>
>
> btw amd erratas that could possible influence this are
>
> 115, 123, 156 with the latter been fascinating as it the workaround
> suggested is 0x0 page entry.
>
>   

Does anyone has any opinions about this? Could you please read the
mentioned erratas and tell me what you think?

Best wishes,
Chris.

@ Sunny Days: Thanks for you mail.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Hi.

Some days ago I received the following message from Sunny Days. I
think he did not send it lkml so I forward it now:

Sunny Days wrote:
 hello,

 i have done some extensive testing on this.

 various opterons, always single socket
 various dimms 1 and 2gb modules
 and hitachi+seagate disks with various firmwares and sizes
 but i am getting a diferent pattern in the corruption.
 My test file was 10gb.

 I have mapped the earliest corruption as low as 10mb in the written data.
 i have also monitor the adress range used from the cp /md5sum proccess
 under /proc//$PID/maps to see if i could find a pattern but i was
 unable to.

 i also tested ext2 and lvm with similar results aka corruption.
 later on the week i should get a pci promise controller and test on that one.

 Things i have not tested is the patch that linus released 10 days ago
 and reiserfs3/4

 my nvidia chipset was ck804 (a3)

 Hope somehow we get to the bottom of this.

 Hope this helps


 btw amd erratas that could possible influence this are

 115, 123, 156 with the latter been fascinating as it the workaround
 suggested is 0x0 page entry.

   

Does anyone has any opinions about this? Could you please read the
mentioned erratas and tell me what you think?

Best wishes,
Chris.

@ Sunny Days: Thanks for you mail.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Hi everybody.

Sorry again for my late reply...

Robert gave us the following interesting information some days ago:

Robert Hancock wrote:
 If this is related to some problem with using the GART IOMMU with memory 
 hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on 
 nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA 
 controller are concerned as the sata_nv driver now supports 64-bit DMA 
 on these chipsets and so no longer requires the IOMMU.
   


I've just tested it with my normal BIOS settings, that is memhole
mapping = hardware, IOMMU = enabled and 64MB and _without_ (!)
iommu=soft as kernel parameters.
I only had the time for a small test (that is 3 passes with each 10
complete sha512sums cyles over about 30GB data)... but sofar, no
corruption occured.

It is surely far to eraly to tell that our issue was solved by
2.6.20-rc-something but I ask all of you that had systems that
suffered from the corruption to make _intensive_ tests with the most
recent rc of 2.6.20 (I've used 2.6.20-rc5) and report your results.
I'll do a extensive test tomorrow.

And of course (!!): Test without using iommu=soft and with enabled
memhole mapping (in the BIOS). (It won't make any sense to look if the
new kernel solves our problem while still applying one of our two
workarounds).


Please also note that there might be two completely data corruption
problems. The onle solved by iommu=soft and another reported by Kurtis
D. Rader.
I've asked him to clarify this in a post. :-)



Ok,... now if this (the new kernel) would really solve the issue... we
should try to find out what exactly was changed in the code, and if it
sounds logical that this solved the problem or not.
The new kernel could just make the corruption even more rare.


Best wishes,
Chris.


begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-15 Thread Christoph Anton Mitterer
Sorry, as always I've forgot some things... *g*


Robert Hancock wrote:

 If this is related to some problem with using the GART IOMMU with memory 
 hole remapping enabled
What is that GART thing exactly? Is this the hardware IOMMU? I've always
thought GART was something graphics card related,.. but if so,.. how
could this solve our problem (that seems to occur mainly on harddisks)?

 then 2.6.20-rc kernels may avoid this problem on 
 nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA 
 controller are concerned
Does this mean that PATA is no related? The corruption appears on PATA
disks to, so why should it only solve the issue at SATA disks? Sounds a
bit strange to me?

 as the sata_nv driver now supports 64-bit DMA 
 on these chipsets and so no longer requires the IOMMU.
   
Can you explain this a little bit more please? Is this a drawback (like
a performance decrease)? Like under Windows where they never use the
hardware iommu but always do it via software?


Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-04 Thread Christoph Anton Mitterer
Hi.

Just for you information: I've put the issue into the kernel.org bugzilla.
http://bugzilla.kernel.org/show_bug.cgi?id=7768

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-04 Thread Christoph Anton Mitterer
Hi.

Just for you information: I've put the issue into the kernel.org bugzilla.
http://bugzilla.kernel.org/show_bug.cgi?id=7768

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-03 Thread Christoph Anton Mitterer
Hi everybody.

After my last mails to this issue (btw: anything new in the meantime? I
received no replys..) I wrote again to nvidia and AMD...
This time with some more success.

Below is the answer from Mr. Friedman to my mail. He says that he wasn't
able to reproduce the problem and asks for a testing system.
Unfortunately I cannot ship my system as this is my only home PC and I
need it for daily work. But perhaps someone else here might has a system
(with the error) that he can send to Nvidia...

I cc'ed Mr. Friedman so he'll read your replies.

To Mr. Friedman: What system did you exactly use for your testing?
(Hardware configuration, BIOS settings and so on). As we've seen before
it might be possible that some BIOSes correct the problem.

Best wishes,
Chris.




Lonni J Friedman wrote:
> Christoph,
> Thanks for your email.  I'm aware of the LKML threads, and have spent 
> considerable time attempting to reproduce this problem on one of our 
> reference motherboards without success.  If you could ship a system 
> which reliably reproduces the problem, I'd be happy to investigate further.
>
> Thanks,
> Lonni J Friedman
> NVIDIA Corporation
>
> Christoph Anton Mitterer wrote:
>   
>> Hi.
>>
>> First of all: This is only a copy from a thread to nvnews.net
>> (http://www.nvnews.net/vbulletin/showthread.php?t=82909). You probably
>> should read the description there.
>>
>> Please note that his is also a very important issue. It is most likely
>> not only Linux related but a general nforce chipset design flaw, so
>> perhaps you should forwad this mail to your engineers too. (Please CC me
>> in all mails).
>>
>> Also note: I'm not one of the normal "end users" with simple problems or
>> damaged hardware. I study computer science and work in one of Europes
>> largest supercomputing centres (Leibniz supercomputing centre).
>> Believe me: I know what I'm talking about and I'm investigating in
>> this issue (with many others) for some weeks now.
>>
>> Please answer either to the specific lkml thread, to the nvnews.net post
>> or directly to me (via email).
>> And I'd be grateful if you could give me email-addresses from your
>> developers or enginers, or even better, forward this email to them and
>> CC me. Of course I'll keep their emails-addresses absolutely confident
>> if you wish.
>>
>> Best wishes,
>> Christoph Anton Mitterer.
>> Munich University of Applied Sciences / Department of Mathematics and
>> Computer Science
>> Leibniz Supercomputing Centre / Department for High Performance
>> Computing and Compute Servers
>>
>>
>>
>>
>> Here is the copy:
>> Hi.
>>
>> I've already tried to "resolve" this via the nvidia knowledgebase but
>> either they don't want to know about that issue or there is noone who is
>> competent enought to give information/solutions about it.
>> They finally pointed me to this fourm and told me that Linux
>> <http://www.nvnews.net/vbulletin/showthread.php?t=82909#> support would
>> be handled here (they did not realise that this is probably a hardware
>> <http://www.nvnews.net/vbulletin/showthread.php?t=82909#> flaw and not
>> OS related).
>>
>> I must admit that I'm a little bit bored with Nvidia's policy in such
>> matters and thus I only describe the problem in brief.
>> If here is any competent chipset engineer who reads this, than he might
>> read the main discussion-thread (and some spin-off threads) of the issue
>> which takes place at the linux-kernel mailing list (again this is
>> probably not Linux related).
>> You can find the archive here:
>> http://marc.theaimsgroup.com/?t=11650212181=1=2
>> <http://marc.theaimsgroup.com/?t=11650212181=1=2>
>>
>>
>> Now a short description:
>> -I (and many others) found a data corruption issue that happens on AMD
>> Opteron / Nvidia chipset systems
>> <http://www.nvnews.net/vbulletin/showthread.php?t=82909#>.
>>
>> -What happens: If one reads/writes large amounts of data there are errors.
>> We test this the following way: Create some test data (huge amounts
>> of),.. make md5sums of it (or with other hash algorithms), then verify
>> them over and over.
>> The test shoes differences (refer the lkml thread for more information
>> about this). Always at differnt files (). It may happen at read AND
>> write access <http://www.nvnews.net/vbulletin/showthread.php?t=82909#>.
>> Note that even for affected users the error occurs rarely (but this is
>> of course still far to often): My personal tests shows about the following:
&

Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2007-01-03 Thread Christoph Anton Mitterer
Hi everybody.

After my last mails to this issue (btw: anything new in the meantime? I
received no replys..) I wrote again to nvidia and AMD...
This time with some more success.

Below is the answer from Mr. Friedman to my mail. He says that he wasn't
able to reproduce the problem and asks for a testing system.
Unfortunately I cannot ship my system as this is my only home PC and I
need it for daily work. But perhaps someone else here might has a system
(with the error) that he can send to Nvidia...

I cc'ed Mr. Friedman so he'll read your replies.

To Mr. Friedman: What system did you exactly use for your testing?
(Hardware configuration, BIOS settings and so on). As we've seen before
it might be possible that some BIOSes correct the problem.

Best wishes,
Chris.




Lonni J Friedman wrote:
 Christoph,
 Thanks for your email.  I'm aware of the LKML threads, and have spent 
 considerable time attempting to reproduce this problem on one of our 
 reference motherboards without success.  If you could ship a system 
 which reliably reproduces the problem, I'd be happy to investigate further.

 Thanks,
 Lonni J Friedman
 NVIDIA Corporation

 Christoph Anton Mitterer wrote:
   
 Hi.

 First of all: This is only a copy from a thread to nvnews.net
 (http://www.nvnews.net/vbulletin/showthread.php?t=82909). You probably
 should read the description there.

 Please note that his is also a very important issue. It is most likely
 not only Linux related but a general nforce chipset design flaw, so
 perhaps you should forwad this mail to your engineers too. (Please CC me
 in all mails).

 Also note: I'm not one of the normal end users with simple problems or
 damaged hardware. I study computer science and work in one of Europes
 largest supercomputing centres (Leibniz supercomputing centre).
 Believe me: I know what I'm talking about and I'm investigating in
 this issue (with many others) for some weeks now.

 Please answer either to the specific lkml thread, to the nvnews.net post
 or directly to me (via email).
 And I'd be grateful if you could give me email-addresses from your
 developers or enginers, or even better, forward this email to them and
 CC me. Of course I'll keep their emails-addresses absolutely confident
 if you wish.

 Best wishes,
 Christoph Anton Mitterer.
 Munich University of Applied Sciences / Department of Mathematics and
 Computer Science
 Leibniz Supercomputing Centre / Department for High Performance
 Computing and Compute Servers




 Here is the copy:
 Hi.

 I've already tried to resolve this via the nvidia knowledgebase but
 either they don't want to know about that issue or there is noone who is
 competent enought to give information/solutions about it.
 They finally pointed me to this fourm and told me that Linux
 http://www.nvnews.net/vbulletin/showthread.php?t=82909# support would
 be handled here (they did not realise that this is probably a hardware
 http://www.nvnews.net/vbulletin/showthread.php?t=82909# flaw and not
 OS related).

 I must admit that I'm a little bit bored with Nvidia's policy in such
 matters and thus I only describe the problem in brief.
 If here is any competent chipset engineer who reads this, than he might
 read the main discussion-thread (and some spin-off threads) of the issue
 which takes place at the linux-kernel mailing list (again this is
 probably not Linux related).
 You can find the archive here:
 http://marc.theaimsgroup.com/?t=11650212181r=1w=2
 http://marc.theaimsgroup.com/?t=11650212181r=1w=2


 Now a short description:
 -I (and many others) found a data corruption issue that happens on AMD
 Opteron / Nvidia chipset systems
 http://www.nvnews.net/vbulletin/showthread.php?t=82909#.

 -What happens: If one reads/writes large amounts of data there are errors.
 We test this the following way: Create some test data (huge amounts
 of),.. make md5sums of it (or with other hash algorithms), then verify
 them over and over.
 The test shoes differences (refer the lkml thread for more information
 about this). Always at differnt files (). It may happen at read AND
 write access http://www.nvnews.net/vbulletin/showthread.php?t=82909#.
 Note that even for affected users the error occurs rarely (but this is
 of course still far to often): My personal tests shows about the following:
 Test data: 30GB (of random data), I verify sha512sum 50 times (that is
 what I call one complete test). So I verify 30*50GB. In one complete
 test there are about 1-3 files with differences. With about 100
 corrupted bytes (at leas very low data sizes, far below an MB)

 -It probably happens with all the nforce chipsets (see the lkml thread
 where everybody tells his hardware)

 -The reasons are not single hardware defects (dozens of hight quality
 memory http://www.nvnews.net/vbulletin/showthread.php?t=82909#, CPU,
 PCI bus, HDD bad block scans, PCI parity, ECC, etc. tests showed this,
 and even with different hardware compontents the issue remained

Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-22 Thread Christoph Anton Mitterer
John A Chaves wrote:
> I didn't need to run a specific test for this.  The normal workload of the
> machine approximates a continuous selftest for almost the last year.
>
> Large files (4-12GB is typical) are being continuously packed and unpacked
> with gzip and bzip2.  Statistical analysis of the datasets is followed by
> verification of the data, sometimes using diff, or md5sum, or python
> scripts using numarray to mmap 2GB chunks at a time.  The machine
> often goes for days with a load level of 20+ and 32GB RAM + another 32GB
> swap in use.  It would be very unlikely for data corruption to go unnoticed.
>
> When I first got the machine I did have some problems with disks being
> dropped from the RAID and occasional log messages implicating the IOMMU.
> But that was with kernel 2.6.16.?, Kernels since 2.6.17 haven't had any
> problem.
>   
Ah thanks for that info,.. as far as I can tell,.. this "testing
environment" should have found any corruptions I there had been any.

So I think we could take this as our first working system where the
issue don't occur although we would expect it...

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-22 Thread Christoph Anton Mitterer
Hi my friends

It became a little bit silent about this issue... any new ideas or results?



Karsten Weiss wrote:
> BTW: Did someone already open an official bug at
> http://bugzilla.kernel.org ?
Karsten, did you already file a bug?



I told the whole issue to the Debian people which are about to release
etch and suggested them to use iommu=soft by default.
This brings me to:
Chris Wedgwood wrote:
> Does anyone have an amd64 with an nforce4 chipset and >4GB that does
> NOT have this problem? If so it might be worth chasing the BIOS
> vendors to see what errata they are dealing with.
John Chaves replied and claimed that he wouldn't suffer from that
problem (I've CC'ed him to this post).
You can read his message at the bottom of this post.
@ John: Could you please tell us in detail how you've tested your system?



Muli told us some information about the iommu options (when he
discuessed Karstens patch) has anybody made tests with the other iommu
options?



Ok and what does it all come down to? We still don't know the exact
reason...
Perhaps a kernel bug, a Opteron and/or Chipset bug,.. and perhaps there
are even some BIOSes that solve the issue...

For the kernel-bug reason,... who is the responsible developer for the
relevant code? Can we contact him to read our threads and perhaps review
the code?

Is anyone able (or wants to try) to inform AMD and/or Nvidia about the
issue (perhaps with pointing to that thread).

Someone might even try to contact some board vendors (some of us seem to
have Tyan boards). Although I'm in contact with the German support Team
of Tyan, I wasn't very successful with the US team... perhaps they have
other ideas.

Last but not least if we don't find a solution what should we do?
In my opinion at least the following:
1) Inform other OS communities (*BSD) and point the to our thread. Some
of you claimed that Windows wouldn't use the hwiommu at all so I think
we don't have to contact big evil.
2) Contact the major Linux Distributions (I've already did it for
Debian) and inform them about the potential issue and pointing them to
this thread (where one can find all the relevant information, I think)
3) Workaround for the kernel:
I have to less knowledge to know exactly what to do but I remember there
are other fixes for mainboard flaws and buggy chipsets in the kernel
(e.g. the RZ1000 or something like this in the "old" IDE driver)...
Perhaps someone (who knows what to do ;-) ) could write some code that
automatically uses iommu=soft,... but then we have the question: In
which case :-( . I imagine that the AMD users who don't suffer from this
issue would like to continue using their hwiommus..


What I'm currently plan to do:
1) If know one else is willing to try contacting AMD/Nvidia,.. I'd try
again.
2) I told you that I'm going to test the whole issue in the Leibniz
Supercomputing Centre where I work as student...
This is a little bit delayed (organisational problems :-) )
Anyway,... I'm not only going to test it on our Linux Cluster but also
some Sun Fire's (whe have mny of them ;-) ). According to my
boss they have nvidia chipsets... (He is probably contacting Sun for the
issue).



So much for now.

Best wishes,
Chris.


John Chaves message:
Here's another data point in case it helps.
The following system does *not* have the data corruption issue.

Motherboard: Iwill DK88 
Chipset: NVIDIA nForce4 Professional 2200
CPUs: Two Dual Core AMD Opteron(tm) Processor 280
Memory: 32GB
Disks: Four 500GB SATA in linux RAID1 over RAID0 setup
Kernel: 2.6.18

This system is a workhorse with extreme disk I/O of huge files,
and the nature of the work done would have revealed data
corruption pretty quickly.

FWIW,
John Chaves

His lspic:
:00:00.0 Memory controller: nVidia Corporation CK804 Memory
Controller (rev a3)
Flags: bus master, 66MHz, fast devsel, latency 0
Capabilities: [44] #08 [01e0]
Capabilities: [e0] #08 [a801]

:00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0

:00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
Subsystem: nVidia Corporation: Unknown device cb84
Flags: 66MHz, fast devsel, IRQ 9
I/O ports at d400 [size=32]
I/O ports at 4c00 [size=64]
I/O ports at 4c40 [size=64]
Capabilities: [44] Power Management version 2

:00:02.0 USB Controller: nVidia Corporation CK804 USB Controller
(rev a2) (prog-if 10 [OHCI])
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 209
Memory at feafc000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2

:00:02.1 USB Controller: nVidia Corporation CK804 USB Controller
(rev a3) (prog-if 20 [EHCI])
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 193
Memory at feafdc00 (32-bit, non-prefetchable) [size=256]

Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-22 Thread Christoph Anton Mitterer
Hi my friends

It became a little bit silent about this issue... any new ideas or results?



Karsten Weiss wrote:
 BTW: Did someone already open an official bug at
 http://bugzilla.kernel.org ?
Karsten, did you already file a bug?



I told the whole issue to the Debian people which are about to release
etch and suggested them to use iommu=soft by default.
This brings me to:
Chris Wedgwood wrote:
 Does anyone have an amd64 with an nforce4 chipset and 4GB that does
 NOT have this problem? If so it might be worth chasing the BIOS
 vendors to see what errata they are dealing with.
John Chaves replied and claimed that he wouldn't suffer from that
problem (I've CC'ed him to this post).
You can read his message at the bottom of this post.
@ John: Could you please tell us in detail how you've tested your system?



Muli told us some information about the iommu options (when he
discuessed Karstens patch) has anybody made tests with the other iommu
options?



Ok and what does it all come down to? We still don't know the exact
reason...
Perhaps a kernel bug, a Opteron and/or Chipset bug,.. and perhaps there
are even some BIOSes that solve the issue...

For the kernel-bug reason,... who is the responsible developer for the
relevant code? Can we contact him to read our threads and perhaps review
the code?

Is anyone able (or wants to try) to inform AMD and/or Nvidia about the
issue (perhaps with pointing to that thread).

Someone might even try to contact some board vendors (some of us seem to
have Tyan boards). Although I'm in contact with the German support Team
of Tyan, I wasn't very successful with the US team... perhaps they have
other ideas.

Last but not least if we don't find a solution what should we do?
In my opinion at least the following:
1) Inform other OS communities (*BSD) and point the to our thread. Some
of you claimed that Windows wouldn't use the hwiommu at all so I think
we don't have to contact big evil.
2) Contact the major Linux Distributions (I've already did it for
Debian) and inform them about the potential issue and pointing them to
this thread (where one can find all the relevant information, I think)
3) Workaround for the kernel:
I have to less knowledge to know exactly what to do but I remember there
are other fixes for mainboard flaws and buggy chipsets in the kernel
(e.g. the RZ1000 or something like this in the old IDE driver)...
Perhaps someone (who knows what to do ;-) ) could write some code that
automatically uses iommu=soft,... but then we have the question: In
which case :-( . I imagine that the AMD users who don't suffer from this
issue would like to continue using their hwiommus..


What I'm currently plan to do:
1) If know one else is willing to try contacting AMD/Nvidia,.. I'd try
again.
2) I told you that I'm going to test the whole issue in the Leibniz
Supercomputing Centre where I work as student...
This is a little bit delayed (organisational problems :-) )
Anyway,... I'm not only going to test it on our Linux Cluster but also
some Sun Fire's (whe have mny of them ;-) ). According to my
boss they have nvidia chipsets... (He is probably contacting Sun for the
issue).



So much for now.

Best wishes,
Chris.


John Chaves message:
Here's another data point in case it helps.
The following system does *not* have the data corruption issue.

Motherboard: Iwill DK88 http://www.iwill.net/product_2.asp?p_id=102
Chipset: NVIDIA nForce4 Professional 2200
CPUs: Two Dual Core AMD Opteron(tm) Processor 280
Memory: 32GB
Disks: Four 500GB SATA in linux RAID1 over RAID0 setup
Kernel: 2.6.18

This system is a workhorse with extreme disk I/O of huge files,
and the nature of the work done would have revealed data
corruption pretty quickly.

FWIW,
John Chaves

His lspic:
:00:00.0 Memory controller: nVidia Corporation CK804 Memory
Controller (rev a3)
Flags: bus master, 66MHz, fast devsel, latency 0
Capabilities: [44] #08 [01e0]
Capabilities: [e0] #08 [a801]

:00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0

:00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
Subsystem: nVidia Corporation: Unknown device cb84
Flags: 66MHz, fast devsel, IRQ 9
I/O ports at d400 [size=32]
I/O ports at 4c00 [size=64]
I/O ports at 4c40 [size=64]
Capabilities: [44] Power Management version 2

:00:02.0 USB Controller: nVidia Corporation CK804 USB Controller
(rev a2) (prog-if 10 [OHCI])
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 209
Memory at feafc000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 2

:00:02.1 USB Controller: nVidia Corporation CK804 USB Controller
(rev a3) (prog-if 20 [EHCI])
Subsystem: nVidia Corporation: Unknown device cb84
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 193
Memory at feafdc00 (32-bit, non-prefetchable) [size=256]

Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-22 Thread Christoph Anton Mitterer
John A Chaves wrote:
 I didn't need to run a specific test for this.  The normal workload of the
 machine approximates a continuous selftest for almost the last year.

 Large files (4-12GB is typical) are being continuously packed and unpacked
 with gzip and bzip2.  Statistical analysis of the datasets is followed by
 verification of the data, sometimes using diff, or md5sum, or python
 scripts using numarray to mmap 2GB chunks at a time.  The machine
 often goes for days with a load level of 20+ and 32GB RAM + another 32GB
 swap in use.  It would be very unlikely for data corruption to go unnoticed.

 When I first got the machine I did have some problems with disks being
 dropped from the RAID and occasional log messages implicating the IOMMU.
 But that was with kernel 2.6.16.?, Kernels since 2.6.17 haven't had any
 problem.
   
Ah thanks for that info,.. as far as I can tell,.. this testing
environment should have found any corruptions I there had been any.

So I think we could take this as our first working system where the
issue don't occur although we would expect it...

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-14 Thread Christoph Anton Mitterer
Muli Ben-Yehuda wrote:
>> 4)
>> And does someone know if the nforce/opteron iommu requires IBM Calgary
>> IOMMU support?
>> 
> It doesn't, Calgary isn't found in machine with Opteron CPUs or NForce
> chipsets (AFAIK). However, compiling Calgary in should make no
> difference, as we detect in run-time which IOMMU is found and the
> machine.
Yes,.. I've read the relevant section shortly after sending that email ;-)

btw & for everybody:
I'm working (as student) at the LRZ (Leibniz Computing Centre) in Munich
where we have very large Linux Cluster and lots of different other
machines,...
I'm going to test for that error on most of the different types of
systems we have,.. and will inform you about my results (if they're
interesting).

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-14 Thread Christoph Anton Mitterer
Muli Ben-Yehuda wrote:
 4)
 And does someone know if the nforce/opteron iommu requires IBM Calgary
 IOMMU support?
 
 It doesn't, Calgary isn't found in machine with Opteron CPUs or NForce
 chipsets (AFAIK). However, compiling Calgary in should make no
 difference, as we detect in run-time which IOMMU is found and the
 machine.
Yes,.. I've read the relevant section shortly after sending that email ;-)

btw  for everybody:
I'm working (as student) at the LRZ (Leibniz Computing Centre) in Munich
where we have very large Linux Cluster and lots of different other
machines,...
I'm going to test for that error on most of the different types of
systems we have,.. and will inform you about my results (if they're
interesting).

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Hi.

I've just looked for some kernel config options that might relate to our
issue:


1)
Old style AMD Opteron NUMA detection (CONFIG_K8_NUMA)
Enable K8 NUMA node topology detection.  You should say Y here if you
have a multi processor AMD K8 system. This uses an old method to read
the NUMA configuration directly from the builtin Northbridge of Opteron.
It is recommended to use X86_64_ACPI_NUMA instead, which also takes
priority if both are compiled in.

ACPI NUMA detection (CONFIG_X86_64_ACPI_NUMA)
Enable ACPI SRAT based node topology detection.

What should one select for the Opterons? And is it possible that this
has something to do with our datacorruption error?


2)
The same two questions for the memory model (Discontiguous or Sparse)



3)
The same two questions for CONFIG_MIGRATION ()


4)
And does someone know if the nforce/opteron iommu requires IBM Calgary
IOMMU support?



This is unrelated to our issue,.. but it would be nice if some of your
could send me their .config,.. I'd like to compare them with my own and
see if I could something tweak or so.
(Of course only people with 2x DualCore Systems ;) )



Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Lennart Sorensen wrote:
> I upgrade my plextor firmware using linux.  pxupdate for most devices,
> and pxfw for new drivers (like the PX760).  Works perfectly for me.  It
> is one of the reasons I buy plextors.
Yes I know about it,.. although never tested it,... anyway the main
reason for Windows is Exact Audio Copy (but Andre Wiehthoff is working
on a C port :-D )

Unfortunately my PX760 seems to be defect,.. posted about the issue to
lkml but no success :-(

Best wishes,
Chris.

begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Erik Andersen wrote:
> I just realized that booting with "iommu=soft" makes my pcHDTV
> HD5500 DVB cards not work.  Time to go back to disabling the
> memhole and losing 1 GB.  :-(
Crazy,...
I have a Hauppauge Nova-T 500 DualDVB-T card,... I'll check it later if
I have the same problem and will inform you (please remember me if I
forget ;) )


Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Chris Wedgwood wrote:
>> Did anyone made any test under Windows? I cannot set there
>> iommu=soft, can I?
>> 
> Windows never uses the hardware iommu, so it's always doing the
> equivalent on iommu=soft
>   
That would mean that I'm not able to reproduce the issue unter windows,
right?
Does that apply for all versions (up to and including Vista).

Don't understand me wrong,.. I don't use Windows (expect for upgrading
my Plextor firmware and EAC ;) )... but I ask because the more
information we get (even if it's not Linux specific) the more steps we
can take ;)

Chris.

begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Karsten Weiss wrote:
> "Memory hole mapping" was set to "hardware". With "disabled" we only
> see 3 of our 4 GB memory.
>   
That sounds reasonable,... I even only see 2,5 GB,.. as my memhole takes
1536 MB (don't ask me which PCI device needs that much address space ;) )
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Karsten Weiss wrote:

> Of course, the big question "Why does the hardware iommu *not*
> work on those machines?" still remains.
>   
I'm going to check AMDs errata docs these days,.. perhaps I find
something that relates. But I'd ask you to do the same as I  don't
consider myself as an expert in these issues ;-)

Chris Wedgwood said that iommu isn't used unter windows at all,.. so I
think the following three solutions would be possible:
- error in the Opteron (memory controller)
- error in the Nvidia chipsets
- error in the kernel


> I have also tried setting "memory hole mapping" to "disabled"
> instead of "hardware" on some of the machines and this *seems*
> to work stable, too. However, I did only test it on about a
> dozen machines because this bios setting costs us 1 GB memory
> (and iommu=soft does not).
>   
Yes... loosing so much memory is a big drawback,.. anyway it would be
great if you can make some more extensive tests that we'd be able to say
if memholemapping=disabled in the BIOS really solves that issue, too, or
not.

Does anyone know how memhole mapping in the BIOS relates to the iommu stuff?
Is it likely or explainable that both would sovle the issue?


> BTW: Maybe I should also mention that other machines types
> (e.g. the HP xw9300 dual opteron workstations) which also use a
> NVIDIA chipset and Opterons never had this problem as far as I
> know.
>   
Uhm,.. that's really strange,... I would have thought that this would
affect all systems that uses either the (mayby) buggy nforce chipset,..
or the (mayby) buggy Opteron.

Did those systems have exactly the same Nvidia-Type? Same question for
the CPU (perhaps the issue only occurs for a speciffic stepping)
Again I have:
nforce professional 2200
nforce professional 2050
Opteron model 275 (stepping E6)


btw: I think that is already clear but again:
Both "solutions" solve the problem for me:
Either
- memhole mapping=disabled in the BIOS (but you loose some memory)
- without any iommu= option for the kernel
or
- memhole mapping=hardware in the BIOS (I suppuse it will work with
software too)
- with iommu=soft for the kernel



Best wishes,
Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

2006-12-13 Thread Christoph Anton Mitterer
Ah and I forgot,...

Did anyone made any test under Windows? I cannot set there iommu=soft,
can I?

Chris.
begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:[EMAIL PROTECTED]
x-mozilla-html:TRUE
version:2.1
end:vcard



  1   2   >