Bug#1015871: Enabling PCI_P2PDMA for distro kernels?

2023-10-25 Thread Bjorn Helgaas
On Wed, Oct 25, 2023 at 07:11:26PM +0200, Lukas Wunner wrote:
> On Wed, Oct 25, 2023 at 10:30:07AM -0600, Logan Gunthorpe wrote:
> > In addition to the above, P2PDMA transfers are only allowed by the
> > kernel for traffic that flows through certain host bridges that are
> > known to work. For AMD, all modern CPUs are on this list, but for Intel,
> > the list is very patchy.
> 
> This has recently been brought up internally at Intel and nobody could
> understand why there's a whitelist in the first place.  A long-time PCI
> architect told me that Intel silicon validation has been testing P2PDMA
> at least since the Lindenhurst days, i.e. since 2005.
> 
> What's the reason for the whitelist?  Was there Intel hardware which
> didn't support it or turned out to be broken?
> 
> I imagine (but am not certain) that the feature might only be enabled
> for server SKUs, is that the reason?

No, the reason is that the PCIe spec doesn't require routing of
peer-to-peer transactions between Root Ports:
https://git.kernel.org/linus/0f97da831026

I think there was a little discussion about adding a firmware
interface to advertise this capability, but I guess nobody cared
enough to advance it.
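The host-bridge allowlist Logan mentions is essentially a table lookup on the bridge's vendor and device ID. Any bridge not in the table is treated as unable to route peer-to-peer traffic. Below is a minimal C sketch. The struct, function name and device IDs are hypothetical placeholders; only the AMD and Intel vendor IDs (0x1022, 0x8086) are real. It is not the kernel's actual list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifier for the host bridge a P2P transfer would cross. */
struct host_bridge_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entries: real vendor IDs, made-up device IDs. */
static const struct host_bridge_id p2p_allowlist[] = {
	{ 0x1022, 0x0001 },	/* stands in for "some AMD root complex" */
	{ 0x8086, 0x0002 },	/* stands in for "some Intel root complex" */
};

/* Default-deny: the PCIe spec does not require Root Ports to route
 * peer-to-peer transactions, so unknown bridges are refused. */
static bool p2p_bridge_allowed(struct host_bridge_id id)
{
	size_t n = sizeof(p2p_allowlist) / sizeof(p2p_allowlist[0]);

	for (size_t i = 0; i < n; i++)
		if (p2p_allowlist[i].vendor == id.vendor &&
		    p2p_allowlist[i].device == id.device)
			return true;
	return false;
}
```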

Bjorn



Bug#679545: Upstream PCI bugzilla for this issue

2012-10-05 Thread Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=48451


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#679545: ia64, SR870, EFI bug breaks ata_piix, uninitialized ICH4 IDE EXBAR mem resource

2012-10-01 Thread Bjorn Helgaas
On Sat, Sep 29, 2012 at 4:09 PM, Stephan Schreiber <i...@fs-driver.org> wrote:
 Hello Bjorn,
 thank you very much for the patch.
 I tested it; it works.

 (typing mistake: it must read PCI_COMMAND_MEMORY instead of PCI_COMMAND_MEM
 at one location;
 some hunks of the patch couldn't be applied automatically on Kernel 3.2.23
 because some comments in the contexts are different)

Thanks a lot for testing this!  I'll fix up this typo and work on
getting something like this merged.

 The dmesg output:

 [0.000000] Initializing cgroup subsys cpuset
 [0.000000] Initializing cgroup subsys cpu
 [0.000000] Linux version 3.2.0-3-mckinley (Debian 3.2.23-1)
 (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1
 SMP Fri Sep 28 21:57:11 CEST 2012
 ...
 [0.065510] pci 0000:00:1f.1: [8086:24cb] type 0 class 0x000101
 [0.065524] pci 0000:00:1f.1: reg 10: [io  0x0000-0x0007]
 [0.065535] pci 0000:00:1f.1: reg 14: [io  0x0000-0x0003]
 [0.065546] pci 0000:00:1f.1: reg 18: [io  0x0000-0x0007]
 [0.065556] pci 0000:00:1f.1: reg 1c: [io  0x0000-0x0003]
 [0.065567] pci 0000:00:1f.1: reg 20: [io  0x1000-0x100f]
 [0.065578] pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff unset]
 ...
 [1.391380] libata version 3.00 loaded.
 [1.391922] ata_piix 0000:00:1f.1: version 2.13
 [1.391938] ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
 [1.392493] scsi0 : ata_piix
 [1.392886] scsi1 : ata_piix
 [1.393018] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1000 irq
 34
 [1.393066] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1008 irq
 33
 [1.557756] ata1.00: ATAPI: HL-DT-ST DVDRAM GSA-T40N, JR03, max UDMA/33
 [1.573616] ata1.00: configured for UDMA/33
 [1.579147] scsi 0:0:0:0: CD-ROM HL-DT-ST DVDRAM GSA-T40N
 JR03 PQ: 0 ANSI: 5
 [1.590806] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2
 cdda tray
 [1.590872] cdrom: Uniform CD-ROM driver Revision: 3.20
 [1.591272] sr 0:0:0:0: Attached scsi CD-ROM sr0
 [1.593910] sr 0:0:0:0: Attached scsi generic sg0 type 5
 ...

 On x86, Windows normally doesn't reconfigure PCI devices unless it
 finds a problem with the configuration done by the BIOS.  I suspect
 it works similarly on ia64.  I would guess that Windows noticed that
 the MEM bit was not set, and therefore ignored the MEM BAR contents.


 Since I have the four Windows versions 'for Itanium Based Systems' on that
 box as well (XP, Server 2003, 2008, 2008 R2), I can tell you more:
 The Device Manager shows a memory range FFBFFC00-FFBFFFFF for the Intel
 82801DB Ultra ATA Storage Controller-24CB - on any of these Windows
 versions.

Oh, that's good data, thanks!  It looks like Windows noticed that the
BAR was invalid and assigned a valid resource to it.  That's in the
third aperture below:

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-01])
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000fffff]
pci_root PNP0A03:00: host bridge window [mem 0xfa000000-0xfbffffff]
pci_root PNP0A03:00: host bridge window [mem 0xff000000-0xffffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec0-0xfec0]

Linux *should* probably do the same (though at a different actual
address because we assign bottom-up instead of top-down as Windows
does).  I don't know off the top of my head whether we actually do in
this case or not.
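The bottom-up versus top-down difference can be shown with a naive allocator that places one aligned region inside a window. The sketch below makes two assumptions. First, it ignores anything already allocated in the window, so it cannot reproduce the FFBFFC00 address Windows actually chose. Second, the alignment must be a power of two. For a BAR, the alignment equals the BAR's size, here the 1 KB EXBAR. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowest align-aligned base for `size` bytes inside [start, end]. */
static bool alloc_bottom_up(uint64_t start, uint64_t end, uint64_t size,
			    uint64_t align, uint64_t *base)
{
	uint64_t b = (start + align - 1) & ~(align - 1);

	if (b < start || b > end || end - b + 1 < size)
		return false;
	*base = b;
	return true;
}

/* Highest align-aligned base for `size` bytes inside [start, end]. */
static bool alloc_top_down(uint64_t start, uint64_t end, uint64_t size,
			   uint64_t align, uint64_t *base)
{
	if (end - start + 1 < size)
		return false;
	uint64_t b = (end - size + 1) & ~(align - 1);

	if (b < start)
		return false;
	*base = b;
	return true;
}
```

For a 1 KB region in an empty window 0xff000000-0xffffffff, bottom-up yields 0xff000000 and top-down yields 0xfffffc00.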

What's the output of "dmesg | grep 0000:00:1f.1; lspci -vs00:1f.1"?





Bug#679545: [RFC/PATCH v2] ia64, SR870, EFI bug breaks ata_piix, uninitialized ICH4 IDE EXBAR mem resource

2012-09-24 Thread Bjorn Helgaas
On Mon, Sep 24, 2012 at 07:09:12PM +0200, Stephan Schreiber wrote:
 Mpfff, there aren't many replies; seems I didn't satisfy what you
 want to have...
 
 At first I want to mention that I just want to help the Debian
 project and started testing Debian Wheezy on my old ia64 box.

Thanks, I really appreciate that, and you've done a huge amount of
debugging and testing already.  It's very normal to iterate on the
resolution as we're doing now.

 The firmware left the memory BAR at 0x24 cleared (0x00000000), but it
 also left the MEM bit in the command register disabled.  So it seems
 like a Linux bug that we're trying to use that zero address from the
 BAR.  If the firmware left the MEM or IO decode enable bit cleared,
 why would we assume it put anything useful in the corresponding BARs?
 
 Your idea would be a fundamental change in the Kernel; I just want
 to fix the ata_piix problem in Debian Wheezy.

Right.  I think you've tripped over a rather fundamental issue, and
I'm hoping we can fix that.  If we can, that will help many users,
not just the handful who have this ia64 box.

 If you would evaluate the command registers, which the BIOS or EFI
 has initialized, you would work around some wrong BARs. You might
 run into trouble due to wrong command register values instead.
 Are you sure that any BIOS or EFI sets the command registers correctly?

We can't be 100% sure about things like that, of course.  But we do
know that if the MEM or IO bits are set in the command register, the
device will claim transactions that match whatever is in the BARs.
So setting the MEM or IO bit is a pretty strong statement that the
BAR contains a valid address.  If the BIOS leaves those bits clear,
we really can't conclude anything about the BAR contents.

 Currently the Linux Kernel sets and clears the IORESOURCE_MEM and
 IORESOURCE_IO bits in the command registers as needed.
 Windows reconfigures any PCI device. The settings of the BIOS or EFI
 do not matter at all; the user doesn't experience any BIOS bug at
 all.

On x86, Windows normally doesn't reconfigure PCI devices unless it
finds a problem with the configuration done by the BIOS.  I suspect
it works similarly on ia64.  I would guess that Windows noticed that
the MEM bit was not set, and therefore ignored the MEM BAR contents.

Can you try the following patch?  It's based on 3.6-rc5+, but I think
it will apply to your 3.2.23 kernel with minor conflicts that shouldn't
be too hard to resolve.

It's not quite right because we really shouldn't turn on the MEM or IO
decode bit unless *all* of the corresponding BARs have been set, but
in your case, I think there is only one MEM BAR that is an issue.
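The rule Bjorn describes is to enable a decode type only once every BAR of that type has a valid address. It can be sketched as computing which PCI_COMMAND bits are safe to set. The struct and function are hypothetical. PCI_COMMAND_IO (0x1) and PCI_COMMAND_MEMORY (0x2) are the standard command-register bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_IO     0x1	/* I/O space decode enable */
#define PCI_COMMAND_MEMORY 0x2	/* memory space decode enable */

/* Hypothetical per-BAR summary: I/O or memory, and whether it holds a
 * valid address (from firmware or from the OS). */
struct bar_state {
	bool is_io;
	bool assigned;
};

/* A decode type is safe to enable only if the device has at least one BAR
 * of that type and every such BAR is assigned. */
static uint16_t safe_decode_bits(const struct bar_state *bars, int n)
{
	bool have_io = false, have_mem = false;
	bool io_ok = true, mem_ok = true;

	for (int i = 0; i < n; i++) {
		if (bars[i].is_io) {
			have_io = true;
			io_ok = io_ok && bars[i].assigned;
		} else {
			have_mem = true;
			mem_ok = mem_ok && bars[i].assigned;
		}
	}
	return (uint16_t)((have_io && io_ok ? PCI_COMMAND_IO : 0) |
			  (have_mem && mem_ok ? PCI_COMMAND_MEMORY : 0));
}
```

For the ICH4, with five assigned I/O BARs and an unassigned EXBAR, this yields only PCI_COMMAND_IO. After EXBAR gets an address, it yields both bits.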

Bjorn




commit 9038dd3b3c4c9e4c7ca0118c8df398c4c646ab58
Author: Bjorn Helgaas <bhelg...@google.com>
Date:   Mon Sep 24 17:16:28 2012 -0600

vsprintf: Add support for IORESOURCE_UNSET in %pR

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 0e33754..b6c 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -600,7 +600,7 @@ char *resource_string(char *buf, char *end, struct resource *res,
 	 * 64-bit res (sizeof==8): 20 chars in dec, 18 in hex ("0x" + 16) */
 #define RSRC_BUF_SIZE		((2 * sizeof(resource_size_t)) + 4)
 #define FLAG_BUF_SIZE		(2 * sizeof(res->flags))
-#define DECODED_BUF_SIZE	sizeof("[mem - 64bit pref window disabled]")
+#define DECODED_BUF_SIZE	sizeof("[mem - 64bit pref window unset disabled]")
 #define RAW_BUF_SIZE		sizeof("[mem - flags 0x]")
 	char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
 		     2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
@@ -642,6 +642,8 @@ char *resource_string(char *buf, char *end, struct resource *res,
 			p = string(p, pend, " pref", str_spec);
 		if (res->flags & IORESOURCE_WINDOW)
 			p = string(p, pend, " window", str_spec);
+		if (res->flags & IORESOURCE_UNSET)
+			p = string(p, pend, " unset", str_spec);
 		if (res->flags & IORESOURCE_DISABLED)
 			p = string(p, pend, " disabled", str_spec);
 	} else {
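The DECODED_BUF_SIZE change works because sizeof on a string literal counts its characters plus the terminating NUL. The macro therefore sizes the worst-case flag text; the two addresses are covered separately by RSRC_BUF_SIZE. Below is a small standalone check. It assumes the flags are appended in the order the patch shows. worst_case_len() is a hypothetical helper, not part of vsprintf.c.

```c
#include <stddef.h>
#include <stdio.h>

#define OLD_DECODED_BUF_SIZE sizeof("[mem - 64bit pref window disabled]")
#define NEW_DECODED_BUF_SIZE sizeof("[mem - 64bit pref window unset disabled]")

/* Format the longest decoded text (every flag set). snprintf returns the
 * length it wanted to write, so a result >= len means truncation. */
static int worst_case_len(char *buf, size_t len)
{
	return snprintf(buf, len, "[%s - %s%s%s%s%s]", "mem", "64bit",
			" pref", " window", " unset", " disabled");
}
```

Without the bump, a resource with every flag set would exceed the old 35-byte budget by six characters, the length of " unset".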

commit f4795a79dc370b6f4106768b16a4a9edba4df933
Author: Bjorn Helgaas <bhelg...@google.com>
Date:   Mon Sep 24 17:15:30 2012 -0600

PCI: Ignore BAR contents when firmware left decoding disabled

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2396111..6926dcb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -175,9 +175,10 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 
 	mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
 
+	pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
+
 	/* No printks while decoding is disabled! */
 	if (!dev->mmio_always_on) {
-		pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
 		pci_write_config_word(dev, PCI_COMMAND,
 			orig_cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));
 	}
@@ -211,9 +212,13 @@ int

Bug#679545: [RFC/PATCH v2] ia64, SR870, EFI bug breaks ata_piix, uninitialized ICH4 IDE EXBAR mem resource

2012-09-20 Thread Bjorn Helgaas
On Thu, Sep 20, 2012 at 8:16 AM, Stephan Schreiber <i...@fs-driver.org> wrote:
 description of the symptoms which you have already read on the initial
 RFC/PATCH==


 Kernel 3.2.23 with Debian patches (Debian Wheezy, testing)
 Debian bug#679545 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679545)

 Machine: Dell PowerEdge 3250 (equivalent with Intel SR870BH2)
 Processor: 2x Itanium Madison 1.5GHz 6M
 Memory: 4GB

 Intel ICH4 (82801DB), IDE host adapter. The ata_piix module fails to
 initialize.

 A snippet from dmesg:
 [0.000000] Initializing cgroup subsys cpuset
 [0.000000] Initializing cgroup subsys cpu
 [0.000000] Linux version 3.2.0-3-mckinley (Debian 3.2.23-1)
 (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1
 SMP Mon Jul 23 09:01:02 UTC 2012
 ...
 [0.065516] pci 0000:00:1f.1: [8086:24cb] type 0 class 0x000101
 [0.065530] pci 0000:00:1f.1: reg 10: [io  0x0000-0x0007]
 [0.065541] pci 0000:00:1f.1: reg 14: [io  0x0000-0x0003]
 [0.065552] pci 0000:00:1f.1: reg 18: [io  0x0000-0x0007]
 [0.065563] pci 0000:00:1f.1: reg 1c: [io  0x0000-0x0003]
 [0.065574] pci 0000:00:1f.1: reg 20: [io  0x1000-0x100f]
 [0.065585] pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff]
 ...
 [1.640965] libata version 3.00 loaded.
 [1.641656] ata_piix 0000:00:1f.1: version 2.13
 [1.641671] ata_piix 0000:00:1f.1: device not available (can't reserve
 [mem 0x00000000-0x000003ff])
 [1.641747] ata_piix: probe of 0000:00:1f.1 failed with error -22
 ...

 lspci -vvxxx reports:

 00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev
 02) (prog-if 8a [Master SecP PriP])
 Subsystem: Intel Corporation Device 3404
 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
 Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
 <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 0
 Region 0: I/O ports at 01f0 [size=8]
 Region 1: I/O ports at 03f4 [size=1]
 Region 2: I/O ports at 0170 [size=8]
 Region 3: I/O ports at 0374 [size=1]
 Region 4: I/O ports at 1000 [size=16]
 00: 86 80 cb 24 05 00 80 02 02 8a 01 01 00 00 00 00
 10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
 20: 01 10 00 00 00 00 00 00 00 00 00 00 86 80 04 34
 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
 40: 03 a3 00 80 00 00 00 00 01 00 02 00 00 00 00 00
 50: 00 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00
 60: 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00
 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 f0: 00 00 00 00 00 00 00 00 60 0f 00 00 00 00 00 00


 You can read in the Intel 82801DB I/O Controller Hub 4 (ICH4) datasheet
 (http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82801db-io-controller-hub-4-datasheet.pdf)
 about the EXBAR register at offset 0x24 (4 bytes):
 EXBAR register
 This is a memory mapped BAR that requires 1 KB of DWord-aligned memory that
 is Intel reserved
 for future functionality. BIOS needs to program the base address for a 1-KB
 memory space.

 The dump shows that EXBAR is 0x00000000, equal to the default value after reset;
 EFI doesn't initialize it.

 ata_piix uses pcim_enable_device() which enables this along with the I/O
 BARs. In systems based on the Intel SR870 platform the firmware does not
 initialize the EXBAR and pcim_enable_device() fails because the memory
 region 0x0-0x3FF cannot be allocated.

 =description of the symptoms which you have already read on the initial
 RFC/PATCH
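The values discussed above can be read straight out of the `lspci -xxx` dump, because config space is little-endian. Below is a minimal sketch using the first 0x30 bytes of that dump. cfg_read16() and cfg_read32() are hypothetical helpers.

```c
#include <stdint.h>

/* Little-endian reads from a config-space image such as `lspci -xxx` prints. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, unsigned off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* First 0x30 bytes of the 82801DB IDE dump above. */
static const uint8_t ich4_cfg[0x30] = {
	0x86, 0x80, 0xcb, 0x24, 0x05, 0x00, 0x80, 0x02,
	0x02, 0x8a, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
	0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x01, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x86, 0x80, 0x04, 0x34,
};
```

Offset 0x04 gives a command register of 0x0005: I/O decode and bus mastering on, memory decode off, matching "I/O+ Mem- BusMaster+". Offset 0x20 gives 0x00001001, an I/O BAR at 0x1000. Offset 0x24 (EXBAR) reads 0x00000000.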




 My only disagreement here would be putting it in the ia64 paths. If
 someone does the same for x86-32 (and this is EFI so it'll presumably
 smell the same on all platforms) then we'll want the same.

 Better I think to generically catch the 0/0 case.

 Alan



 Here is a new patch. It extends some existing code in pci_setup_device()
 which maintains some hard-coded io regions on ide controllers in legacy
 mode.
 The idea is hiding an uninitialized EXBAR just as on the initial patch.
 The patch is defensive; it does nothing if
 - the controller isn't in legacy mode,
 - BAR5 (EXBAR) isn't a memory resource, or
 - BAR5 is already initialized.

 The patch is generic because it works on both x86-32 and ia64 and also for
 other ICH4 variants than my rare 82801DB_11 ICH4.
 Even though the added 'if' statement of this patch is also executed on IDE
 controllers from vendors other than Intel, or on other Intel ICHs, I believe
 that it won't break anything.
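The three conditions Stephan lists can be expressed as a single predicate. This is a hypothetical restatement, not his patch. It assumes "legacy mode" means both channels are in compatibility mode, i.e. programming-interface bits 0 and 2 are clear. That holds for the prog-if 0x8a shown in the lspci output.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_SPACE_IO 0x01u	/* BAR bit 0: 1 = I/O, 0 = memory */
#define PCI_BASE_ADDRESS_MEM_MASK (~0x0fu)	/* address bits of a memory BAR */

/* Hide BAR5 only if both channels are legacy, BAR5 is a memory BAR, and
 * its address bits are still zero. */
static bool should_hide_exbar(uint8_t prog_if, uint32_t bar5)
{
	bool legacy = (prog_if & 0x01) == 0 && (prog_if & 0x04) == 0;
	bool is_mem = (bar5 & PCI_BASE_ADDRESS_SPACE_IO) == 0;
	bool unset = (bar5 & PCI_BASE_ADDRESS_MEM_MASK) == 0;

	return legacy && is_mem && unset;
}
```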

This still isn't very generic.  It only looks at BAR 

Bug#679545: [RFC/PATCH] ia64, SR870, EFI bug breaks ata_piix, uninitialized ICH4 IDE EXBAR mem resource

2012-09-17 Thread Bjorn Helgaas
On Sun, Sep 16, 2012 at 10:39 AM, Stephan Schreiber <i...@fs-driver.org> wrote:

 [0.065516] pci 0000:00:1f.1: [8086:24cb] type 0 class 0x000101
 [0.065530] pci 0000:00:1f.1: reg 10: [io  0x0000-0x0007]
 [0.065541] pci 0000:00:1f.1: reg 14: [io  0x0000-0x0003]
 [0.065552] pci 0000:00:1f.1: reg 18: [io  0x0000-0x0007]
 [0.065563] pci 0000:00:1f.1: reg 1c: [io  0x0000-0x0003]
 [0.065574] pci 0000:00:1f.1: reg 20: [io  0x1000-0x100f]
 [0.065585] pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff]
 ...
 [1.640965] libata version 3.00 loaded.
 [1.641656] ata_piix 0000:00:1f.1: version 2.13
 [1.641671] ata_piix 0000:00:1f.1: device not available (can't reserve
 [mem 0x00000000-0x000003ff])
 [1.641747] ata_piix: probe of 0000:00:1f.1 failed with error -22
 ...

 lspci -vvxxx reports:

 00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev
 02) (prog-if 8a [Master SecP PriP])
 Subsystem: Intel Corporation Device 3404
 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
 Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
 <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 0
 Region 0: I/O ports at 01f0 [size=8]
 Region 1: I/O ports at 03f4 [size=1]
 Region 2: I/O ports at 0170 [size=8]
 Region 3: I/O ports at 0374 [size=1]
 Region 4: I/O ports at 1000 [size=16]
 00: 86 80 cb 24 05 00 80 02 02 8a 01 01 00 00 00 00
 10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
 20: 01 10 00 00 00 00 00 00 00 00 00 00 86 80 04 34
 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00

I agree that we should have a generic way to do this rather than an
ia64-specific way.  In this case you have EFI, but the same thing
could happen with BIOS.

The firmware left the memory BAR at 0x24 cleared (0x00000000), but it
also left the MEM bit in the command register disabled.  So it seems
like a Linux bug that we're trying to use that zero address from the
BAR.  If the firmware left the MEM or IO decode enable bit cleared,
why would we assume it put anything useful in the corresponding BARs?

What would break if we paid attention to the command register enables
in the PCI core and just cleared the resource flags for MEM BARs if
the MEM-decode bit was off, and those for IO BARs if the IO-decode bit
was off?
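The question above amounts to one pass over the BARs after reading the command register. Below is a sketch with hypothetical, simplified types; the real kernel uses struct resource and IORESOURCE_* flags. It is applied to the ICH4 values from the dump: command 0x0005, I/O decode on, memory decode off.

```c
#include <stdint.h>

#define PCI_COMMAND_IO     0x1	/* command register: I/O decode enable */
#define PCI_COMMAND_MEMORY 0x2	/* command register: memory decode enable */
#define RES_IO  0x1		/* hypothetical resource-type flags */
#define RES_MEM 0x2

/* Hypothetical, simplified stand-in for one BAR's resource. */
struct bar_res {
	uint32_t start;
	unsigned flags;
};

/* Clear the flags of any BAR whose decode type firmware left disabled, so
 * it is treated as unassigned rather than trusted. */
static void drop_undecoded_bars(uint16_t command, struct bar_res *bars, int n)
{
	for (int i = 0; i < n; i++) {
		if ((bars[i].flags & RES_MEM) && !(command & PCI_COMMAND_MEMORY))
			bars[i].flags = 0;
		else if ((bars[i].flags & RES_IO) && !(command & PCI_COMMAND_IO))
			bars[i].flags = 0;
	}
}
```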

I don't know much of the ancient history here, so maybe there's a good
reason why this works the way it currently does.

Bjorn





Bug#543308: [Bug 15362] MPT Fusion SCSI drives no longer appear - suspect PCI bus scan bug

2012-07-02 Thread Bjorn Helgaas
 in short: the bios is broken, it returns the wrong segment in the DSDT.

I *think* what Yinghai is saying is:

  - MMCONFIG is not used either in 2.6.26 or 2.6.32.
  - BIOS reports these host bridges via DSDT PNP0A08 devices:
[PCI0] leading to segment 0000 bus 00
[PCI1] leading to segment 0001 bus 40
[PCI2] leading to segment 0002 bus 80
  - Buses 40 and 80 are actually in segment 0, not segments 1 and 2.
  - When we enumerate bus 40 and bus 80, we pass seg=1 and seg=2,
respectively, to pci_conf1_read(), but 2.6.26 ignores seg.  For
example, when we think we're reading 0001:40:01.0 config space, 2.6.26
actually reads 0000:40:01.0 config space instead.
  - In 2.6.32, instead of ignoring seg, we return an error if it is
not zero.  Therefore, we fail to find anything on bus 40 and bus 80.
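The failure mode is easier to see from how a type 1 (port 0xCF8) configuration address is formed. It has fields for bus, device/function and register, but none for the segment. Below is a simplified sketch, not the kernel's pci_conf1_read(). The extended-register bits are omitted, and the `strict` flag is a hypothetical switch between the two behaviors described above.

```c
#include <stdint.h>

/* CONFIG_ADDRESS value: enable bit, bus, devfn, dword-aligned register. */
static uint32_t conf1_address(unsigned bus, unsigned devfn, unsigned reg)
{
	return 0x80000000u | (bus & 0xffu) << 16 | (devfn & 0xffu) << 8 |
	       (reg & 0xfcu);
}

/* strict == 0: 2.6.26-style, seg is silently dropped.
 * strict != 0: 2.6.32-style, a non-zero seg is an error. */
static int conf1_prepare(unsigned seg, unsigned bus, unsigned devfn,
			 unsigned reg, int strict, uint32_t *addr)
{
	if (strict && seg != 0)
		return -1;
	*addr = conf1_address(bus, devfn, reg);
	return 0;
}
```

In the lenient mode, reading 0001:40:01.0 (devfn 0x08) produces 0x80400800, the same address as 0000:40:01.0.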

Sean, what system and BIOS version is this?  (The 3.4.x dmesg log or
the dmidecode output will contain this information.)  I don't expect
HP to change the BIOS, and it wouldn't be reasonable to require users
to debug this issue and upgrade their BIOS in any case.

But I would like to read the release notes or help text that mentions
this issue.  If all the buses were in fact in segment 0, the DSDT
would typically not have any _SEG methods at all, because segment 0 is
the default.  Yinghai is assuming that HP went to the trouble to *add*
_SEG methods that returned incorrect values.  But the fact that HP was
aware of the issue and provided the BIOS "disable ACPI bus
segmentation" option makes it less likely that this is the case.

Also, the system was very likely tested with Windows, and the fact
that the BIOS option is to *disable* segmentation suggests that the
default is segmentation enabled.  So my guess is that segmentation
does work with Windows.  Sean, can you confirm or deny that?  The
AIDA64 tool (free trial version at http://www.aida64.com/) generates a
report with useful information.

I agree with Jonathan's assertion here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=543308#87 that the
BIOS switch is not adequate.  Neither is a patched DSDT.

I think it's likely that Windows works with segmentation, using
MMCONFIG, and that Linux is a bit too quick to disable MMCONFIG in
this case.






Bug#638863: Errors/warnings show during startup

2012-02-14 Thread Bjorn Helgaas
On Tue, Feb 14, 2012 at 1:53 AM, Jonathan Nieder <jrnie...@gmail.com> wrote:
 Hi Ralf,

 Ralf Jung wrote[1]:

 after upgrading to version 3.2.0-1 of the kernel, one of the two error
 messages during startup is gone - the corresponding patch by Bjorn Helgaas 
 has
 been accepted upstream.
 However, the timer error is still present:
 $ dmesg | fgrep TCO
 [   11.915459] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
 [   11.915563] SP5100 TCO timer: mmio address 0xfec000f0 already in use
 (I attached the full dmesg log)

 Bjorn also wrote patches for these, which I attached as well - however, the
 upstream discussion about them just stopped at some point, and/or the patches
 got lost while kernel.org was down. I do not know how this is usually 
 handled.

 Nice.  The usual approach is to resend to the relevant people as a
 reminder, like you have done now.  For reference, here's the last
 discussion of the two patches you attached:

  http://thread.gmane.org/gmane.linux.kernel/1184383

 If I understand correctly, Cyrill Gorcunov liked the patches.

 Bjorn, would you like to resend, or should I?

I don't think we had clear consensus that my patches were correct.  So
I don't want to blindly resend them without more consideration.

Bjorn






Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-26 Thread Bjorn Helgaas
On Fri, Aug 26, 2011 at 8:16 AM, Ralf Jung <ralfjun...@gmx.de> wrote:
 Hi Bjorn,

 Here's a test patch for the TCO timer issue.  That SP5100 watchdog
 driver is a mess -- it gropes around at hard-coded places in I/O port
 space -- so while I think this patch will fix the message, the
 watchdog itself still may not work.  If you can verify that the
 watchdog works, that would be great.
 I applied the patches you sent to the list, for both of the issues, and the
 messages are both gone. (Those address conflict messages I mentioned were
 already gone with the plain rc3).
 However, I don't know how to verify that the watchdog works. I installed the
 watchdog package, and I just did kill -9 watchdog PID, but the system
 keeps running. The same behaviour is shown with the 3.0 shipped by Debian. Now
 I don't know if that's the watchdog or my verification method failing ;-)

I don't know what's in the watchdog package.  I would try the test
program in the kernel sources:
Documentation/watchdog/src/watchdog-simple.c.

It looks like if you kill any other process that has /dev/watchdog
open (use lsof to check), then start watchdog-simple, then suspend
or kill *it*, you should see a system reset after a minute or two.
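The core of such a test is a loop that keeps writing to the watchdog device. Each write restarts the countdown. Once the writer is stopped or killed, a working watchdog should reset the machine after its timeout. Below is a minimal sketch in the spirit of watchdog-simple.c, not a copy of it. The function name and parameters are hypothetical, and a negative `count` means loop forever.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write one byte to `path` every `interval` seconds, `count` times (or
 * forever if count < 0). Returns 0 on success, -1 on open/write failure. */
static int watchdog_keepalive(const char *path, int count, unsigned interval)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	for (int i = 0; count < 0 || i < count; i++) {
		if (write(fd, "\0", 1) != 1) {
			close(fd);
			return -1;
		}
		sleep(interval);
	}
	close(fd);
	return 0;
}
```

On the affected machine, call it on /dev/watchdog with a negative count and an interval comfortably shorter than the watchdog timeout, then kill the process.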

Thanks for testing all this stuff!

Bjorn






Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-26 Thread Bjorn Helgaas
On Tue, Aug 23, 2011 at 6:13 PM, Bjorn Helgaas <bhelg...@google.com> wrote:
 Your error is "SP5100 TCO timer: mmio address 0xbafe00 already in
 use."  (Same error, but different address.)  That looks like it's in
 the middle of your RAM, i.e., it looks completely bogus.  Given the
 ugliness of the sp5100_tco driver, that doesn't surprise me.  Possibly
 the BIOS configured it differently and we tried to read the MMIO
 address from the wrong (hard-coded) I/O ports.  If we can dig up a
 spec for this device, maybe this could be fixed up.

I don't really have time to work on this, unfortunately, but here's a
little info in case somebody else can.

Specs:
  http://support.amd.com/us/Embedded_TechDocs/44413.pdf SP5100
Register Reference Guide
  http://support.amd.com/us/Embedded_TechDocs/44414.pdf SP5100
Register Programming Requirements
  http://support.amd.com/us/Embedded_TechDocs/44415.pdf SP5100 BIOS
Developer's Guide

I think the BDG has an example putting the watchdog at 0xfec000f0,
which is where Ralf's system has it.  The power-up default looks like
0, so if you have 0xbafe00, either the BIOS put it somewhere
nonsensical (in the middle of RAM), or we're doing something wrong in
reading the address.

It's possible we could learn something by booting Windows and seeing
whether it uses the watchdog, and at what address.  Something like the
Device Manager or http://www.aida64.com/ could be useful.

Here are some relevant registers from the RRG:

2.3 SMBus Module and ACPI Block
2.3.1 PCI Configuration Registers
  PCI_Reg 0x90  32 bits  Smbus Base Address
2.3.2 SMBus Registers
  SMBUS register space defined by PCI config 0x90
2.3.3 Legacy ISA and ACPI Controller
2.3.3.1 Legacy Block Registers
2.3.3.1.1 IO-Mapped Control Registers
  IO_Reg 0xCD6   8 bits  PM_Index (p. 163)
  IO_Reg 0xCD7   8 bits  PM_Data
2.3.3.2 Power Management (PM) Registers (p. 165)
  PM_REG  0x69   8 bits  WatchDogTimerControl (p. 190)
  PM_REG  0x6c   8 bits  WatchDogTimerBase0
  PM_REG  0x6d   8 bits  WatchDogTimerBase1
  PM_REG  0x6e   8 bits  WatchDogTimerBase2
  PM_REG  0x6f   8 bits  WatchDogTimerBase3
2.3.4 WatchDogTimer Registers (p. 225)
  WD_Mem_Reg  0x00  32 bits  WatchDogControl
  WD_Mem_Reg  0x04  32 bits  WatchDogCount

This is intertwined with piix4.  I did notice that piix4_setup() reads
the Smbus Base Address at PCI config offset 0x90, and it assumes I/O
space.  The SP5100 supports either MMIO or I/O, so if your system uses
MMIO, things will go wrong.  The attached patch checks for that.

I haven't worked out the chain from there to the WatchDogTimerBase registers.
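The register list implies an indexed access pattern. Each PM register is read by writing its number to PM_Index (0xCD6) and then reading PM_Data (0xCD7). The sketch below reads the four WatchDogTimerBase bytes that way. It assumes WatchDogTimerBase0 holds the least significant byte and ignores any flag bits in the low byte. The port accessors are passed in so the sketch runs against a simulated register file; a real driver would use outb()/inb(). The sample bytes in the usage are chosen to match the 0xfec000f0 Ralf saw; they were not read from hardware.

```c
#include <stdint.h>

#define SP5100_PM_INDEX 0xCD6	/* IO_Reg PM_Index, from the RRG excerpt */
#define SP5100_PM_DATA  0xCD7	/* IO_Reg PM_Data */
#define PM_WDT_BASE0    0x6C	/* WatchDogTimerBase0..3 are 0x6C..0x6F */

typedef void (*outb_fn)(uint8_t val, uint16_t port);
typedef uint8_t (*inb_fn)(uint16_t port);

/* Assemble the 32-bit base, most significant byte (Base3) first. */
static uint32_t read_wdt_base(outb_fn out, inb_fn in)
{
	uint32_t base = 0;

	for (int i = 3; i >= 0; i--) {
		out((uint8_t)(PM_WDT_BASE0 + i), SP5100_PM_INDEX);
		base = (base << 8) | in(SP5100_PM_DATA);
	}
	return base;
}

/* Simulated PM register file standing in for real hardware. */
static uint8_t fake_pm[256];
static uint8_t fake_index;

static void fake_outb(uint8_t val, uint16_t port)
{
	if (port == SP5100_PM_INDEX)
		fake_index = val;
}

static uint8_t fake_inb(uint16_t port)
{
	return port == SP5100_PM_DATA ? fake_pm[fake_index] : 0xff;
}
```

With bytes f0 00 c0 fe in PM registers 0x6c..0x6f, read_wdt_base(fake_outb, fake_inb) returns 0xfec000f0.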

Bjorn


piix4-check-io
Description: Binary data


Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-23 Thread Bjorn Helgaas
Hi Ralf, can you attach the complete dmesg log to the bug report,
please?  I see a snippet (starting with "Bluetooth: SCO socket layer
initialized"), but there's a lot of useful information before that.
The dmesg command only shows the most recent part of the log, so if
the kernel's buffer has wrapped around, it doesn't show the beginning.
 The /var/log/dmesg (or similar) file should contain the useful bits.






Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-23 Thread Bjorn Helgaas
Thanks!  These tests:

	if ((dev->vendor == PCI_VENDOR_ID_AMD) || (dev->device ==
		PCI_DEVICE_ID_AMD_GOLAM_7450))

are clearly wrong.  I suspect && was intended instead of ||, but
this code seems to have been that way since the beginning, so I don't
know how to verify that.

In any event, it would be perfectly legal for any other (non-AMD)
manufacturer to make a PCI bridge that uses the 0x7450 device ID, and
this code would do the wrong thing with it.

I think we should change that || to &&.  That will fix your
message and avoid any conflicts with non-AMD devices.  It's possible
that it will break shpchp on some non-7450 AMD bridges, but we'll just
have to deal with those as we discover them.
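A quick truth table makes the difference concrete. The two functions below are hypothetical wrappers around the condition as quoted and the proposed fix. 0x1022 is the standard AMD vendor ID, and 0x7450 is the device ID the constant's name refers to.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD            0x1022
#define PCI_DEVICE_ID_AMD_GOLAM_7450 0x7450

/* As written: matches ANY AMD device, and ANY vendor's device 0x7450. */
static bool is_golam_buggy(uint16_t vendor, uint16_t device)
{
	return vendor == PCI_VENDOR_ID_AMD ||
	       device == PCI_DEVICE_ID_AMD_GOLAM_7450;
}

/* Proposed: matches only the AMD 0x7450 bridge. */
static bool is_golam_fixed(uint16_t vendor, uint16_t device)
{
	return vendor == PCI_VENDOR_ID_AMD &&
	       device == PCI_DEVICE_ID_AMD_GOLAM_7450;
}
```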

I cc'd a couple AMD folks in case they have comments [see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863].






Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-23 Thread Bjorn Helgaas
Ralf, can you attach your /proc/iomem contents, too?  I looked at the
"SP5100 TCO timer: mmio address 0xfec000f0 already in use" message,
but I don't see why that address is in use.






Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-23 Thread Bjorn Helgaas
Here's a test patch for the TCO timer issue.  That SP5100 watchdog
driver is a mess -- it gropes around at hard-coded places in I/O port
space -- so while I think this patch will fix the message, the
watchdog itself still may not work.  If you can verify that the
watchdog works, that would be great.

 [    0.470960] pci_root PNP0A03:00: address space collision: host bridge
 window [mem 0x000cc000-0x000cffff] conflicts with Video ROM [mem
 0x000c0000-0x000ce9ff]
 [    0.471052] pci_root PNP0A03:00: address space collision: host bridge
 window [mem 0x000ec000-0x000effff] conflicts with reserved [mem
 0x000ef000-0x000fffff]

These are unrelated and I'm doing some other work that addresses them.

 [    1.480097] pci :00:04.0: ASPM: Could not configure common clock

I don't know about this one.  It looks like we tried a link retrain
but it failed.  I would poke Shaohua Li <shaohua...@intel.com> about
this since he submitted the original code for that.


ioapic
Description: Binary data


Bug#638863: shpchp: Cannot reserve MMIO region error during boot (linux 3.0)

2011-08-23 Thread Bjorn Helgaas
Your error is "SP5100 TCO timer: mmio address 0xbafe00 already in
use."  (Same error, but different address.)  That looks like it's in
the middle of your RAM, i.e., it looks completely bogus.  Given the
ugliness of the sp5100_tco driver, that doesn't surprise me.  Possibly
the BIOS configured it differently and we tried to read the MMIO
address from the wrong (hard-coded) I/O ports.  If we can dig up a
spec for this device, maybe this could be fixed up.






Bug#481331: usage message fix

2008-05-15 Thread Bjorn Helgaas
Thanks for this patch.  Here's a small fix to the usage message:

--- mail.orig	2008-05-15 15:50:14.000000000 -0600
+++ mail	2008-05-15 15:51:21.000000000 -0600
@@ -21,7 +21,7 @@
 
 usage()
 {
-	printf $"Usage: quilt mail {--mbox file|--send} [-m text] [--prefix prefix] [--sender ...] [--from ...] [--to ...] [--cc ...] [--bcc ...] [--subject ...] [--signature file]\n"
+	printf $"Usage: quilt mail {--mbox file|--send} [--select] [-m text] [--prefix prefix] [--sender ...] [--from ...] [--to ...] [--cc ...] [--bcc ...] [--subject ...] [--signature file]\n"
 	if [ x$1 = x-h ]
 	then
 		printf $






Bug#386694: ketchup bracket expression patch

2007-04-20 Thread Bjorn Helgaas
I sent Matt this patch, which solves the problem for me:


To complement the character class matched by a bracket expression,
the exclamation mark seems more widely accepted than circumflex.
Bash accepts either, but dash, ksh, and The Open Group shell command
language spec accept only exclamation mark.

Dash is installed as /bin/sh on recent Ubuntu systems, and the fact that it
doesn't accept circumflex to complement bracket expressions causes errors
like this:

Unpacking linux-2.6.20.tar.bz2
mv: cannot move `linux-2.6.20/..' to `../..': Device or resource busy
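The same pattern-matching notation is available from C through POSIX fnmatch(3). That makes it easy to check what `.[!.]*` selects: ordinary dotfiles, but never `.` or `..`. As a side effect, it also skips names that begin with two dots. Below is a minimal sketch; is_hidden_entry() is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* True for dotfiles like ".config"; false for ".", "..", and "..foo". */
static bool is_hidden_entry(const char *name)
{
	return fnmatch(".[!.]*", name, 0) == 0;
}
```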

Problem reports:
https://launchpad.net/bugs/69804
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=386694
http://www.archivum.info/linux.debian.bugs.dist/2006-09/msg02777.html

References:
bash: http://www.gnu.org/software/bash/manual/bashref.html#SEC34 (sec 3.5.8.1)
ksh:  http://www.cs.princeton.edu/~jlk/kornshell/doc/man93.html#File Name Generation
TOG:  http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13 (sec 2.13.1)

--- ketchup.orig	2006-05-01 14:09:00.000000000 -0600
+++ ketchup	2007-04-20 14:15:36.000000000 -0600
@@ -433,7 +433,7 @@
         error("Unpacking failed: ", err)
         sys.exit(-1)
 
-    err = os.system("mv linux*/* linux*/.[^.]* ..; rmdir linux*")
+    err = os.system("mv linux*/* linux*/.[!.]* ..; rmdir linux*")
     if err:
         error("Unpacking failed: ", err)
         sys.exit(-1)





Bug#284111: xserver-xfree86: Doesn't scan PCI domains above 0000 on startup

2005-01-13 Thread Bjorn Helgaas
On Thu, 2005-01-13 at 17:59 -0500, Branden Robinson wrote:
 On Fri, Dec 31, 2004 at 12:18:29AM -0800, David Mosberger wrote:
Branden I wonder how many domains we should look for before we give
Branden up.  I get the feeling doing an ftw() on /proc/pci/pci is
Branden not a good idea.  Even doing as much as a readdir() feels
Branden wrong, but maybe not.  :-P

I think readdir() should be kosher.  ls in /proc/bus/pci
has to get its data somehow.

But I'm confused about why we would iterate through all
the domains anyway, since we don't seem to iterate through
all PCI buses.  Maybe X's PCITAG doesn't include a domain,
or maybe there's no config file syntax for specifying it?

  I'm not terribly familiar with multi-domain machines.  From what I
  recall, the domain-changes to /proc/bus/pci were SPARC-specific and
  I'm not sure whether that approach is the final answer.

The domain changes to /proc/bus/pci are implemented for
sparc64, ppc64, ia64, alpha, and mips (see pci_name_bus()).
They aren't all identical (sparc64 uses %04x:%02x always,
while ia64 uses %04x:%02x only for non-zero domain), but
they look close enough that one could try %02x first, then
%04x:%02x.
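The probing order suggested above can be sketched as generating candidate /proc/bus/pci directory names for a bus. This is a hypothetical helper, not X server or kernel code. It assumes a non-zero domain always carries the "%04x:" prefix, because a bare "%02x" name can only refer to domain 0.

```c
#include <stdio.h>

/* Fill `names` with candidate directory names for (domain, bus), in the
 * order to try them, and return how many were written. */
static int proc_bus_candidates(unsigned domain, unsigned bus, char names[2][16])
{
	int n = 0;

	if (domain == 0)
		snprintf(names[n++], sizeof(names[0]), "%02x", bus);
	snprintf(names[n++], sizeof(names[0]), "%04x:%02x", domain, bus);
	return n;
}
```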

I added Matthew Wilcox in case he has additional input.


