from:"Gérard Roudier"

Re: SCSI Tape corruption - update

2001-07-20 Thread Gérard Roudier




On Fri, 20 Jul 2001, Geert Uytterhoeven wrote:

> On Sun, 8 Jul 2001, Geert Uytterhoeven wrote:
> > New findings:
> >   - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
> > kernels starting with 2.2.18-pre1.
> >   - The only related stuff that changed in 2.2.18-pre1 seems to be the
> > Sym53c8xx driver itself. I'll do some more tests soon to isolate the
> > problem.
> >   - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
> > individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
> > somewhere?

Not completely. The reason is that I used manual diffing/patching against
various kernel versions and it would be a PITA to resurrect all
intermediate driver versions using these patches. If we also count patches
that went directly into the mainstream kernel without changing the driver
version, it would be a double PITA. Btw, for the sym-2.1.x series, I now use
a CVS tree and each driver release is tagged independently. For those, it
will be much easier to isolate broken changes.

> The problem is indeed introduced by the changes to the Sym53c8xx in 2.2.18-pre1.
> I managed to find some intermediate versions in the 2.3.x series, and here are the
> results:
>   - sym53c8xx-1.3g (from BK linuxppc_2_2): OK
>   - sym53c8xx-1.5e: crash in SCSI interrupt during driver init
>   - sym53c8xx-1.5f: lock up during driver init
>   - sym53c8xx-1.5g: random 32-byte error bursts when writing to tape

That's an interesting result. But the diffs between 1.3g and 1.5g are
probably very large. The patches available from ftp.tux.org should allow
you to resurrect driver versions 1.4, 1.5, 1.5a, 1.5b, 1.5c and 1.5d.

ftp://ftp.tux.org/pub/roudier/drivers/linux/sym53c8xx/README

You may, for example, apply the incremental patches that target kernel 2.2.5
to a fresh kernel 2.2.5 tree and extract the driver files from it.

> Perhaps I can get 1.5e and 1.5g to work using some PPC-specific fixes from the
> 1.3.g driver in the linuxppc_2_2 tree (it differed a bit from the 1.3g in
> Alan's 2.2.17). But even then the changes in 1.5f and 1.5g are rather small,
> compared to the changes between 1.3g and 1.5f.

Some PPC specific changes are very probably not present in my driver
sources. I am unable to help on that point.

> So I'd be very happy if I could get my hand on more intermediate versions.
> Thanks for your help! I _really_ want to nail this one down!
>
> Gr{oetje,eeting}s,

Regards,
  Gérard.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: SCSI Tape corruption - update

2001-07-08 Thread Gérard Roudier




On Sun, 8 Jul 2001, Geert Uytterhoeven wrote:

> On Thu, 21 Jun 2001, Geert Uytterhoeven wrote:
> > On Tue, 8 May 2001, Geert Uytterhoeven wrote:
> > > In the mean time I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> > > Sym53c875, HP C5136A  DDS1) and I can confirm that the problem does not happen
> > > under 2.2.17 either.
> > >
> > > My experiences:
> > >   - reading works fine, writing doesn't
> > >   - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)
> > >   - hardware compression doesn't matter
> > >   - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> > > SCSI hardware driver bug
> > >   - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
> > >   - corruption is always a block of 32 bytes being replaced by 32 bytes from
> > > the previous tape block (depending on block size!) (approx. 6 errors per
> > > 256 MB)
> > >
> > > Lorenzo, can you please investigate the exact nature of the corruption on your
> > > system?
> > >   - How many successive bytes are corrupted?
> > >   - Where do the corrupted data come from?
> >
> > Yesterday I noticed the same corruption under 2.2.19 (yes, I run amverify after
> > backing up my system now, so it detects corruption through the gzip CRCs).
> >
> > I'll do some more tests (when I find time) to get a higher statistical
> > certainty that it really doesn't happen under earlier 2.2.x kernels.
>
> New findings:
>   - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
> kernels starting with 2.2.18-pre1.
>   - The only related stuff that changed in 2.2.18-pre1 seems to be the
> Sym53c8xx driver itself. I'll do some more tests soon to isolate the
> problem.
>   - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
> individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
> somewhere?

No. But you can move the sym/ncr driver bundle from 2.2.18-pre1 to 2.2.17
and vice-versa.
 sym53c8xx.h, sym53c8xx_defs.h, sym53c8xx.c,
 sym53c8xx_comm.h, ncr53c8xx.h, ncr53c8xx.c

You can also download either sym-1.7.3c-ncr-3.4.3b, or sym-2.1.11, or
both, and play with all that stuff under 2.2.17 and later 2.2 kernels.

 ftp://ftp.tux.org/pub/roudier/README-drivers-linux

Btw, I am interested in results using sym-1.7.3c and sym-2.1.11 under
kernel 2.2.17 and possibly 2.2.18.

> BTW, I wrote a small test program which tries to analyze error bursts. You can
> find it at http://home.tvd.be/cr26864/Download/genpseudorandom.c
>
> Sample test using 2 bytes of data:
>
> genpseudorandom -o -l 2  > /dev/tape
> genpseudorandom -i < /dev/tape

Unfortunately, I don't have any tape device.

> So far I always saw problems when writing even only 10 MB to tape: ca. 3-5
> bursts of 32 or 12 incorrect bytes, which are always a copy of the
> corresponding bytes in the previous block. Of course I used a much larger test
> stream to verify 2.2.17.
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
>   Geert

Thanks for your testing,
  Gérard.
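
The burst pattern described in the quoted report (runs of wrong bytes that are copies of the bytes at the same offset in the previous tape block) can be checked mechanically. What follows is only a minimal sketch in C, not the genpseudorandom.c program mentioned above, whose source is not shown here. The names find_bursts and struct burst are invented for illustration, and the sketch assumes that both the written stream and the read-back stream are available in memory.

```c
#include <assert.h>
#include <stddef.h>

/* One run of consecutive bytes that differ from the expected stream. */
struct burst {
    size_t offset;      /* position of the first wrong byte */
    size_t length;      /* number of consecutive wrong bytes */
    int from_prev_blk;  /* 1 if every wrong byte equals the expected byte
                           at the same offset in the previous block */
};

/*
 * Compare what was read back (got) with what was written (want) and
 * record up to max_out bursts. Returns the total number of bursts found.
 */
size_t find_bursts(const unsigned char *want, const unsigned char *got,
                   size_t len, size_t blksz,
                   struct burst *out, size_t max_out)
{
    size_t n = 0, i = 0;

    assert(blksz > 0);
    while (i < len) {
        if (got[i] == want[i]) {
            i++;
            continue;
        }
        size_t start = i;
        int prev = 1;
        /* Walk to the end of this run of mismatching bytes. */
        while (i < len && got[i] != want[i]) {
            if (i < blksz || got[i] != want[i - blksz])
                prev = 0;
            i++;
        }
        if (n < max_out) {
            out[n].offset = start;
            out[n].length = i - start;
            out[n].from_prev_blk = prev;
        }
        n++;
    }
    return n;
}
```

A run whose from_prev_blk flag is set would match the "copy of the corresponding bytes in the previous block" symptom; a run without it would point to some other kind of corruption.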


Re: RFC: Changes for PCI

2001-06-28 Thread Gérard Roudier

On Wed, 27 Jun 2001, Jeff Garzik wrote:

> Tom Gall wrote:
> > Well you have device drivers like the symbios scsi driver for instance that
> > tries to determine if it's seen a card before. It does this by looking at the
> > bus,dev etc numbers...  It's quite reasonable for two different scsi cards to be
> > on the same bus number, same dev number etc yet they are in different PCI
> > domains.
> >
> > Is this a device driver bug or feature?
>
> I hesitate to call it a device driver bug, because that was likely the
> best decision Gerard could make at the time.
>
> However, I think the driver (only going by your description) would be
> more correct to use a pointer to struct pci_dev.  We have a token in the
> kernel that is guaranteed 100% unique to any given PCI device:  the
> pointer to its struct pci_dev.

The driver checks against PCI bus+dev+func in 2 situations:

1) To apply the boot order that the user can set up in the controller NVRAMs.
2) To detect buggy double reporting of the same device by the kernel PCI
   code (this caused a lot of trouble at one time).

The great bug is to invent useless abstractions that don't match reality.
Such brain masturbation leads to confusion (hence subtle bugs)  and
useless software bloatage (thus _real_ resource wastage).

If we want to handle _real_ PCI bus domains, we just have to add a domain
number to identify a _real_ PCI device. Anything that wants to hide such
reality in some opaque data looks like brain masturbation to me.

  Gérard.
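
The "just add a domain number" idea can be sketched as a plain address tuple. This is an illustration only, not code from the sym53c8xx driver: struct pci_addr, pci_addr_equal and pci_addr_seen are hypothetical names, and the field widths are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address tuple: the classic bus/dev/func plus a domain. */
struct pci_addr {
    uint16_t domain;
    uint8_t  bus;
    uint8_t  dev;   /* 0..31 */
    uint8_t  func;  /* 0..7 */
};

static int pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
    return a->domain == b->domain && a->bus == b->bus &&
           a->dev == b->dev && a->func == b->func;
}

/* Returns 1 if addr is already in seen[0..count-1] (a double report). */
int pci_addr_seen(const struct pci_addr *seen, size_t count,
                  const struct pci_addr *addr)
{
    assert(addr->dev < 32 && addr->func < 8);
    for (size_t i = 0; i < count; i++)
        if (pci_addr_equal(&seen[i], addr))
            return 1;
    return 0;
}
```

With the domain included in the comparison, two cards that share bus, device and function numbers but sit in different domains are no longer mistaken for one device.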


Re: Client receives TCP packets but does not ACK

2001-06-15 Thread Gérard Roudier




On Fri, 15 Jun 2001, Mike Black wrote:

> This is a very common misconception -- I worked a contract many years ago
> where I actually had to quote the author of TCP to convince a banking
> company I was working with that TCP is not a guaranteed protocol.
> Guaranteed delivery at layer 5 - yes -- but NOT a guaranteed protocol.
> 
> Guaranteed means that there is absolutely NO way that data can be dropped by
> an application if either sender or receiver screws up.
> 
> The only way to do this is at layer 7 of the OSI model -- even then you end
> up making assumptions.

You are mixing oranges (protocols) and apples (implementations and APIs)
here.

The layer that is expected to provide reliable end to end communication is
layer 4 (transport layer). TCP, at least in theory, is as good as OSI
transport in providing reliable end to end communication. 

> Here's some examples for layer 5 (which TCP operates at) but talking at
> Layer 7:
> 
> #1 - You send() data -- meanwhile the receiver terminates the connection --
> what happened to the data?  It's gone!  Your app never receives feedback
> that it didn't send() correctly.  You'll see the reset on the next read but
> you don't know what happened to the data.
> #2 - You send() data and overrun your IP queue -- nobody will ever know the
> difference without a layer 7 protocol (or in the case quoted in this
> subject it might lock up).
> #3 - You send() data and either machine has bad RAM and flips a bit -- guess
> what? -- data corruption.
> 
> Even when you do layer 7 (with checksums and ack/nak) you make assumptions:
> 
> #1 - You checksum the packet you just received -- what's to say a bit can't
> flip?
> 
> TCP may be guaranteed at layer 5 but we don't typically program at layer
> 5 -- we program at layer 7 and then lots of people assume they're doing it
> at layer 5 -- ergo the problems.

Layers above layer 4 provide additional services for applications but
they assume that layer 4 is reliable. In other words, a broken transport
layer breaks all layers above it and thus the applications.

In fact, when you build your application above layer 4 and need services
normally provided by upper OSI layers, you have to implement equivalent
services in your application, using layered protocols or not.

> To look at it another way -- "Just 'cuz I told my C library to send a packet
> doesn't mean it's going to work".
> For example, if you're using non-blocking sockets you have to check to
> ensure there's room in your IP queue to transmit.

That's an API semantics issue, not a protocol issue.

  Gérard.
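
The API side of this argument (a send() on a non-blocking socket may accept only part of the data, or none of it) can be sketched as follows. This is a minimal illustration using the standard POSIX send() and poll() calls; send_all is a hypothetical helper name. Note that a successful return only means the data was handed to the local socket buffer, not that the peer application has consumed it.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send exactly len bytes on fd, which may be non-blocking.
 * Returns 0 on success, -1 on error (errno set by send() or poll()).
 */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n >= 0) {
            /* Partial sends are normal: advance and try again. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Socket buffer is full: wait until it is writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;
    }
    return 0;
}
```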


Re: VIA's Southbridge bug: Latest (pseudo-)patch

2001-06-06 Thread Gérard Roudier

On Sun, 3 Jun 2001, Adrian Cox wrote:

> Marc Lehmann wrote:
> 
> 
> > Aren't PCI delayed transaction supposed to be handled by the pci master
> > (e.g. my northbridge), not by the (software) driver for my pdc(?) I would
> > also be surprised if my pdc actually used that feature, not to speak of
> > the fact that the promise + harddisk worked fine in another computer (the
> > data corruption was easily detectable, one couldn't even write 500megs
> > without altered bytes).
> 
> 
> Wrong way round. You're right that the pci master is supposed to handle 
> delayed transactions, but during data transfer the pdc is the pci master 
> and the northbridge is the PCI target.

Wrong in my opinion, at least as it is worded. :)

PCI delayed transactions are not a PCI master issue at all, because it is
not possible for a PCI master to distinguish between a transaction that is
delayed by the target and a simple retry requested by the target.

Btw, a PCI master that is unable to properly retry a transaction (whether
or not the target handles it as a delayed transaction) would not allow a
system to work for long. I would expect system breakage within seconds in
such a situation.

Gérard.


Re: [PATCH] sym53c8xx timer and smp fixes

2001-06-04 Thread Gérard Roudier

On Thu, 31 May 2001, Tim Hockin wrote:

> All,
> 
> Attached is a patch for sym53c8xx.c to handle the error timer better, and
> be more proper for SMP.  The changes are very simple, and have been beaten
> on by us.  Please let me know if there are any problems accepting this
> patch for general inclusion.

I have no problems accepting your patch. Thanks for it.

I just want to have to deal with a humanly manageable, finite number of
actual driver versions :). I also want the same driver source to be
usable on recent 2.2 kernels.

About timers in modules, and more generally either timers in drivers or
module unloading, you must keep in mind that this stuff has been racy for
years in Linux. Allow me time to check whether it is really fixed in the
latest kernel, and so to make sure it is worthwhile to apply your patch.

  Gérard.


Re: SCSI-CD-Writer don't show up

2001-06-04 Thread Gérard Roudier

On Sat, 2 Jun 2001, Matthias Schniedermeyer wrote:

> #Include 
> 
> 
> 
> I have 3 SCSI-CD-Writers. "Strange" is that the boot-process only finds
> the first one (1 0 5 0), the other two i have to add with
> 
> echo "scsi add-single-device 2 0 4 0" > /proc/scsi/scsi
> echo "scsi add-single-device 2 0 6 0" > /proc/scsi/scsi
> 
> to make them useable.
> 
> Here is the complete list of my SCSI-Devices:
> 
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
>   Vendor: IBM  Model: DDYS-T18350N Rev: S93E
>   Type:   Direct-Access   ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 00 Lun: 00
>   Vendor: PLEXTOR  Model: CD-ROM PX-32TS   Rev: 1.03
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 01 Lun: 00
>   Vendor: PIONEER  Model: DVD-ROM DVD-303  Rev: 1.10
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 05 Lun: 00
>   Vendor: TEAC Model: CD-R58S  Rev: 1.0N
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi2 Channel: 00 Id: 02 Lun: 00
>   Vendor: PIONEER  Model: DVD-ROM DVD-304  Rev: 1.03
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi2 Channel: 00 Id: 03 Lun: 00
>   Vendor: PIONEER  Model: DVD-ROM DVD-304  Rev: 1.03
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi2 Channel: 00 Id: 04 Lun: 00
>   Vendor: TEAC Model: CD-R58S  Rev: 1.0K
>   Type:   CD-ROM   ANSI SCSI revision: 02
> Host: scsi2 Channel: 00 Id: 06 Lun: 00
>   Vendor: TEAC Model: CD-R58S  Rev: 1.0P
>   Type:   CD-ROM   ANSI SCSI revision: 02
> 
> I have a "Symbios 53c1010 (Dual Channel Ultra 160)" and a "NCR 810a" The
> two devices which are not found are connected through adapters onto the
> second channel of the Symbios 53c1010.
> 
> Kernel is 2.4.4 or 2.4.5ac6. 
> As host-adapter-driver i use the "SYM53C8XX"-driver
> 
> If other info is needed, no problem. :-)

You should check whether your devices are enabled for SCAN in the NVRAM.

Devices that aren't enabled for "SCAN AT BOOT" are forced by the driver to
fail the initial SCSI scan. As a plus, the driver also applies the boot
order for all Symbios HBAs that look Symbios-compatible regarding GPIO
pins and NVRAM layout.

For this subset of SCSI buses, this allows SCSI devices to be presented to
the kernel in the same order as the BIOS saw them. It may also speed up
the system boot process a lot. If you had numerous O/Ses installed on a
single system, you would appreciate how useful it is. For example, I
usually boot an O/S with only drive 80 seen by the BIOS and sda seen by the
system, and then mount the other disks in the order I want.

  Gérard.

PS: See README.ncr53c8xx for the way to disable this feature if it does
not fit your expectation. :)


Re: [PATCH] sym53c8xx timer and smp fixes

2001-06-01 Thread Gérard Roudier




On Fri, 1 Jun 2001, Jeff Garzik wrote:

> Tim Hockin wrote:
> >  spinlock_t sym53c8xx_lock = SPIN_LOCK_UNLOCKED;
> > +spinlock_t sym53c8xx_host_lock = SPIN_LOCK_UNLOCKED;
> >  #define NCR_LOCK_DRIVER(flags) spin_lock_irqsave(&sym53c8xx_lock, flags)
> >  #define NCR_UNLOCK_DRIVER(flags)   spin_unlock_irqrestore(&sym53c8xx_lock,flags)
> > +#define NCR_LOCK_HOSTS(flags) spin_lock_irqsave(&sym53c8xx_host_lock, flags)
> > +#define NCR_UNLOCK_HOSTS(flags)   spin_unlock_irqrestore(&sym53c8xx_host_lock,flags)
> > 
> >  #define NCR_INIT_LOCK_NCB(np)  spin_lock_init(&np->smp_lock);
> >  #define NCR_LOCK_NCB(np, flags)    spin_lock_irqsave(&np->smp_lock, flags)
> > @@ -650,6 +655,8 @@
> > 
> >  #define NCR_LOCK_DRIVER(flags) do { save_flags(flags); cli(); } while (0)
> >  #define NCR_UNLOCK_DRIVER(flags)   do { restore_flags(flags); } while (0)
> > +#define NCR_LOCK_HOSTS(flags) do { save_flags(flags); cli(); } while (0)
> > +#define NCR_UNLOCK_HOSTS(flags)   do { restore_flags(flags); } while (0)
> > 
> >  #define NCR_INIT_LOCK_NCB(np)  do { } while (0)
> >  #define NCR_LOCK_NCB(np, flags)    do { save_flags(flags); cli(); } while (0)
> > @@ -695,7 +702,7 @@
> 
> so, this driver is mixed spinlocks and save/restore_flags?  Any chance
> this can be converted to all spinlocks?

This was done years ago, for linux 2.1.93.
The save/restore flags locking methods are conditionally compiled for
earlier kernels. This very probably makes the corresponding code quite
useless nowadays and I should remove it from the source.

  Gérard.


Re: alpha iommu fixes

2001-05-20 Thread Gérard Roudier

On Sun, 20 May 2001, Ivan Kokshaysky wrote:

> On Sun, May 20, 2001 at 04:40:13AM +0200, Andrea Arcangeli wrote:
> > I was only talking about when you get the "pci_map_sg failed" because
> > you have not 3 but 300 scsi disks connected to your system and you are
> > writing to all of them at the same time allocating zillions of ptes, and one
> > of your drivers (possibly not even a storage driver) is actually not
> > checking the retval of the pci_map_* functions. You don't need a pte
> > memleak to trigger it, even regardless of the fact that I grew the dynamic
> > window to 1G which makes it 8 times harder to trigger than in mainline.
> 
> I think you're too pessimistic. Don't mix "disks" and "controllers" --
> SCSI adapter with 10 drives attached is a single DMA agent, not 10 agents.
> 
> If you're so concerned about Big Iron, go ahead and implement 64-bit PCI
> support, it would be right long-term solution. I'm pretty sure that
> high-end servers use mostly this kind of hardware.
> 
> Oh, well. This doesn't mean that I disagree with what you said. :-)
> Driver writers must realize that pci mappings are limited resources.

The IOMMU code allocation strategy is designed to fail due to
fragmentation, like anything that performs contiguous allocations of
variable size.

I may add a test of pci_map_* return code in the sym53c8xx driver, but
the driver will panic on failure. It is not acceptable to consider such
kind of failure as a normal situation (returning some ?_BUSY status to
the SCSI driver) for the following reasons:

- IOs may be reordered and break upper layers assumptions.
- Spurious errors and even BUS resets may happen.

For now, driver methods that are requested to queue IOs are not allowed to
wait for resources. Anyway, the pci_map_* interface is unable to wait.

There are obviously ways to deal gracefully with such a lack of resources,
but the current SCSI layer isn't equipped for that. For example, a
freeze/unfreeze mechanism as described in CAM can be implemented in order
not to reorder IOs, and some mechanism (callback, resource wait, etc...)
must be added to restart the operation when the resource is likely to be
available.

IMO, the only acceptable fix in the current kernel is to perform IOMMU PTE
allocations of a fixed quantity at a time, for example by limiting each SG
entry to fit in a single PAGE.

  Gérard.

PS: Maybe I should look at how the *BSDs handle IOMMUs.
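
The fragmentation argument can be illustrated with a toy model. The sketch below is not the Alpha IOMMU code; it is a hypothetical first-fit allocator over a 16-slot window (struct window, win_alloc, win_free and win_free_slots are invented names), meant only to show why variable-size contiguous requests can fail while plenty of slots are free, and why fixed single-slot requests cannot fail for that reason.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 16   /* toy mapping window: 16 page-sized slots */

struct window {
    unsigned char used[NSLOTS];
};

/* First-fit allocation of n contiguous slots; returns first slot or -1. */
int win_alloc(struct window *w, size_t n)
{
    assert(n > 0);
    for (size_t start = 0; start + n <= NSLOTS; start++) {
        size_t k = 0;
        while (k < n && !w->used[start + k])
            k++;
        if (k == n) {
            for (k = 0; k < n; k++)
                w->used[start + k] = 1;
            return (int)start;
        }
    }
    return -1;
}

/* Release n slots starting at start. */
void win_free(struct window *w, size_t start, size_t n)
{
    assert(start + n <= NSLOTS);
    for (size_t k = 0; k < n; k++)
        w->used[start + k] = 0;
}

/* Count the slots that are currently free. */
size_t win_free_slots(const struct window *w)
{
    size_t n = 0;
    for (size_t k = 0; k < NSLOTS; k++)
        n += !w->used[k];
    return n;
}
```

If all 16 slots are taken one at a time and every other slot is then freed, half of the window is free, yet a request for 2 contiguous slots fails while a request for 1 slot still succeeds.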


Re: Wow! Is memory ever cheap!

2001-05-09 Thread Gérard Roudier

On Tue, 8 May 2001, Dan Hollis wrote:

> On Tue, 8 May 2001, Larry McVoy wrote:
> > which is a text version of the paper I mentioned before.  The basic
> > message of the paper is that it really doesn't help much to have things
> > like ECC unless you can be sure that 100% of the rest of your system
> > has similar checks.
> 
> UDMA has crc, scsi has parity, pci has (i think) parity, tcpip has crc,
> your cpu l1 and l2 have ecc...

SCSI Ultra-160 has CRC.

PCI has parity (btw, you think right), but only a few drivers make sure
PCI parity checking is enabled. On the other hand, a PCI parity error
should be considered extremely serious and the system should be
stopped when one happens.

Btw, it seems (from what I read on the pci list) that the original PCI had
no parity. After all, PCI had been designed for PC machines... :)

> Looks like similar checks are already there.
> 
> -Dan
> 
  Gérard.
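
For reference, parity checking is controlled by a bit in the standard PCI command register and reported in the status register. The sketch below works on a plain in-memory copy of those two registers; struct pci_cfg_snapshot and the function and macro names are hypothetical, and a real driver would go through the kernel's configuration-space accessors instead. The bit positions are those of the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_PARITY_RESPONSE  0x0040u  /* command register, bit 6 */
#define STS_MASTER_DATA_PERR 0x0100u  /* status register, bit 8 */
#define STS_DETECTED_PERR    0x8000u  /* status register, bit 15 */

/* Hypothetical in-memory copy of the two 16-bit registers. */
struct pci_cfg_snapshot {
    uint16_t command;  /* config offset 0x04 */
    uint16_t status;   /* config offset 0x06 */
};

/* Set the Parity Error Response bit so the device reacts to parity errors. */
void cfg_enable_parity(struct pci_cfg_snapshot *cfg)
{
    assert(cfg != NULL);
    cfg->command = (uint16_t)(cfg->command | CMD_PARITY_RESPONSE);
}

/* Returns 1 if either parity-error status bit is set. */
int cfg_parity_error_seen(const struct pci_cfg_snapshot *cfg)
{
    assert(cfg != NULL);
    return (cfg->status & (STS_MASTER_DATA_PERR | STS_DETECTED_PERR)) != 0;
}
```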


Re: ServerWorks LE and MTRR

2001-04-29 Thread Gérard Roudier

On Sun, 29 Apr 2001, Steffen Persvold wrote:

> [EMAIL PROTECTED] wrote:
> > On Sun, 29 Apr 2001, Steffen Persvold wrote:
> > 
> > > I've learned it the hard way, I have two types : Compaq DL360 (rev 5) and a
> > > Tyan S2510 (rev 6). On the compaq machine I constantly get data corruption on
> > > the last double word (4 bytes) in a 64 byte PCI burst when I use write
> > > combining on the CPU. On the Tyan however the transfer is always ok.
> > >
> > 
> > Are you sure that is not due to board design differences?
> 
> No I can't be 100% certain that the layout of the board isn't the reason since
> I haven't asked ServerWorks about this and it doesn't say anything in their
> docs (yes my company has the NDA, so I shouldn't get too much into detail here),
> but if this was the case it would be totally wrong to disable write combining
> on any LE chipset.
> 
> The test case that I have been using to trigger this is sort of special because
> we are using SCI shared memory adapters to write (with PIO) into remote nodes
> memory, and the bandwidth tends to get quite high (approx 170 MByte/sec on LE
> with write combining). I've been able to run this case on 5 different
> motherboards using the LE and HE-SL ServerWorks chipsets, but only two of them
> are LE (the DL360 and the S2510). Everything works fine with write-combining on
> every motherboard except the DL360 (which has rev 5).
> 
> One basic test case that I haven't tried, could be to enable write-combining on
> your PCI graphics adapter memory and see if the X display gets screwed up.

I have been doing this for 8 months on my Supermicro 370 DLE board. /proc/pci
reports 2 PCI bridges rev. 5. The 64 bit PCI (bus 1) interfaces an LSI53C1010
33MHz 64 bit PCI-SCSI controller. The other devices (3dfx, SYM53C895, ...)
are on PCI bus #0. The machine does networking using an external modem only.
I never got a single glitch (linux-2.2.18), but the machine is not a server;
it is the workstation I use at home.

Here is /proc/pci layout:
PCI devices found:
  Bus  0, device   0, function  1:
    Host bridge: Unknown vendor CNB30LE PCI Bridge (rev 5).
      Medium devsel.  Master Capable.  Latency=16.
  Bus  0, device   0, function  0:
    Host bridge: Unknown vendor CNB30LE PCI Bridge (rev 5).
      Medium devsel.  Master Capable.  Latency=32.
  Bus  0, device   1, function  0:
    SCSI storage controller: NCR 53c895 (rev 1).
      Medium devsel.  IRQ 16.  Master Capable.  Latency=72.  Min Gnt=30.Max Lat=64.
      I/O at 0xde00 [0xde01].
      Non-prefetchable 32 bit memory at 0xfeaefe00 [0xfeaefe00].
      Non-prefetchable 32 bit memory at 0xfeaec000 [0xfeaec000].
  Bus  0, device   2, function  0:
    SCSI storage controller: NCR 53c810 (rev 18).
      Medium devsel.  IRQ 18.  Master Capable.  Latency=64.  Min Gnt=8.Max Lat=64.
      I/O at 0xd400 [0xd401].
      Non-prefetchable 32 bit memory at 0xfeaeff00 [0xfeaeff00].
  Bus  0, device   3, function  0:
    VGA compatible controller: 3Dfx Unknown device (rev 1).
      Vendor id=121a. Device id=5.
      Fast devsel.  Fast back-to-back capable.  IRQ 20.
      Non-prefetchable 32 bit memory at 0xfc00 [0xfc00].
      Prefetchable 32 bit memory at 0xf800 [0xf808].
      I/O at 0xd800 [0xd801].
  Bus  0, device   6, function  0:
    Ethernet controller: Intel 82557 (rev 8).
      Medium devsel.  Fast back-to-back capable.  IRQ 31.  Master Capable.  Latency=64.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xfeaed000 [0xfeaed000].
      I/O at 0xd000 [0xd001].
      Non-prefetchable 32 bit memory at 0xfe90 [0xfe90].
  Bus  0, device  15, function  0:
    ISA bridge: Unknown vendor Unknown device (rev 79).
      Vendor id=1166. Device id=200.
      Medium devsel.  Master Capable.  No bursts.
  Bus  0, device  15, function  1:
    IDE interface: Unknown vendor Unknown device (rev 0).
      Vendor id=1166. Device id=211.
      Medium devsel.  Master Capable.  Latency=64.
      I/O at 0xffa0 [0xffa1].
  Bus  0, device  15, function  2:
    USB Controller: Unknown vendor Unknown device (rev 4).
      Vendor id=1166. Device id=220.
      Medium devsel.  Fast back-to-back capable.  IRQ 10.  Master Capable.  Latency=64.  Max Lat=80.
      Non-prefetchable 32 bit memory at 0xfeaee000 [0xfeaee000].
  Bus  1, device   1, function  1:
    SCSI storage controller: NCR Unknown device (rev 1).
      Vendor id=1000. Device id=20.
      Medium devsel.  IRQ 25.  Master Capable.  Latency=72.  Min Gnt=17.Max Lat=18.
      I/O at 0xe800 [0xe801].
      Non-prefetchable 64 bit memory at 0xfebffc00 [0xfebffc04].
      Non-prefetchable 64 bit memory at 0xfebfc000 [0xfebfc004].
  Bus  1, device   1, function  0:
    SCSI storage controller: NCR Unknown device (rev 1).
      Vendor id=1000. Device id=20.
      Medium devsel.  IRQ 24.  Master Capable.  Latency=72.  Min Gnt=17.Max Lat=18.
      I/O at 0xe400 [0xe401].
      Non-prefetchable 64 bit memory at 0xfebff800 [0xfebff804].
      Non-prefetchable 64 bit me

Re: ServerWorks LE and MTRR

2001-04-29 Thread Gérard Roudier




On Sun, 29 Apr 2001, Steffen Persvold wrote:

> Hi all,
> 
> I just compiled 2.4.4 and am running it on a ServerWorks LE motherboard.
> Whenever I try to add a write-combining region, it gets rejected. I took a peek
> in the arch/i386/kernel/mtrr.c and found that this is just as expected with
> v1.40 of the code. It is great that the mtrr code checks and prevents the user
> from doing something that could eventually lead to data corruption. Using
> write-combining on PCI accesses can lead to this on certain LE revisions but
> _not_ all (only rev < 5). Therefore please consider my small patch to allow the
> good ones to be able to use write-combining. I have several rev 06 and they are
> working fine with this patch.

You wrote that 'only rev < 5' can lead to data corruption, but your patch
seems to disallow use of write combining for rev 5 too.

Could you clarify?

  Gérard.

PS:
Where did you get this information? It seems that ServerWorks requires
an NDA before disclosing technical information on their chipsets.

> Best regards,
> -- 
>  Steffen PersvoldSystems Engineer
>  Email  : mailto:[EMAIL PROTECTED]Scali AS (http://www.scali.com)
>  Norway : Tel  : (+47) 2262 8950 Olaf Helsets vei 6
>   Fax  : (+47) 2262 8951 N-0621 Oslo, Norway
> 
>  USA: Tel  : (+1) 713 706 0544   10500 Richmond Avenue, Suite 190
>  Houston, Texas 77042, USA
> 
> diff -Nur linux/arch/i386/kernel/mtrr.c.~1~ linux/arch/i386/kernel/mtrr.c
> --- linux/arch/i386/kernel/mtrr.c.~1~ Wed Apr 11 21:02:27 2001
> +++ linux/arch/i386/kernel/mtrr.c Sun Apr 29 10:18:06 2001
> @@ -480,6 +480,7 @@
>  {
>  unsigned long config, dummy;
>  struct pci_dev *dev = NULL;
> +u8 rev;
>  
> /* ServerWorks LE chipsets have problems with  write-combining 
>Don't allow it and  leave room for other chipsets to be tagged */
> @@ -489,7 +490,9 @@
>  case PCI_VENDOR_ID_SERVERWORKS:
>   switch (dev->device) {
>   case PCI_DEVICE_ID_SERVERWORKS_LE:
> - return 0;
> + pci_read_config_byte(dev, PCI_CLASS_REVISION, &rev);
> + if (rev <= 5)
> + return 0;
>   break;
>   default:
>   break;
> -
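
The discrepancy raised in the reply (the text says only "rev < 5" is affected, while the patch tests "rev <= 5") can be made concrete with a small comparison. The function names below are hypothetical and this is only a sketch of the two rules side by side; it says nothing about which rule is right for the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Rule as worded in the message: only revisions below 5 are affected. */
int wc_allowed_per_text(unsigned rev)
{
    return rev >= 5;
}

/* Rule as coded in the quoted patch: "if (rev <= 5) return 0;". */
int wc_allowed_per_patch(unsigned rev)
{
    return rev > 5;
}

/*
 * Count the 8-bit revision values on which the two rules disagree and
 * store the first such value in *first (left untouched if there is none).
 */
int wc_rule_mismatches(unsigned *first)
{
    int count = 0;

    assert(first != NULL);
    for (unsigned rev = 0; rev <= 255; rev++) {
        if (wc_allowed_per_text(rev) != wc_allowed_per_patch(rev)) {
            if (count == 0)
                *first = rev;
            count++;
        }
    }
    return count;
}
```

The two rules differ on exactly one revision value, 5, which is the revision the clarification request is about.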


Re: sym53c875 error

2001-04-24 Thread Gérard Roudier

On Tue, 24 Apr 2001, Hamilton, Eamonn wrote:

> Hi Folks.
> 
> Under all of the kernels I have access to try ( 2.2.19, 2.4.X & 2.4.X-ac* ),
> when I try and write an image in XA2 format to my SCSI writer ( Yamaha
> CDR-400t ), I get a DMA overrun. When I try with a kernel patched with the
> beta symbios driver ( 2.1.9 ), it works just fine.

Interesting.

Note that the status of sym-2.1.9 is probably far better than beta. I just
don't have enough information to know how reliable this driver version
actually is. FYI, I have used sym-2.1.x under Linux and FreeBSD for several
months. The NetBSD port is still a work in progress, but the driver works
just fine for me under this O/S too.

> This is on a Debian woody system, using cdrecord 1.10 ( also 1.9 and 1.8
> with the same symptoms ) attached to a Tekram DC390F.
> 
> Transcript as follows :
> 
> cdrecord dev=0,3,0 -dummy -xa2 firmware.iso
> 
> Cdrecord 1.10a18 (i686-pc-linux-gnu) Copyright (C) 1995-2001 Jörg Schilling
> scsidev: '0,3,0'
> scsibus: 0 target: 3 lun: 0
> Linux sg driver version: 3.1.17
> Using libscg version 'schily-0.5'
> Device type: Removable CD-ROM
> Version: 2
> Response Format: 2
> Capabilities   :
> Vendor_info: 'YAMAHA  '
> Identifikation : 'CDR400t '
> Revision   : '1.0q'
> Device seems to be: Yamaha CDR-400.
> Using generic SCSI-3/mmc CD-R driver (mmc_cdr).
> Driver flags   : SWABAUDIO
> Starting to write CD/DVD at speed 1 in dummy mode for single session.
> Last chance to quit, starting dummy write in 1 seconds.
> cdrecord: Input/output error. write_g1: scsi sendcmd: retryable error
> CDB:  2A 00 00 00 01 C2 00 00 1F 00
> status: 0x0 (GOOD STATUS)
> DMA overrun, resid: -248

It would be interesting to know how cdrecord calculates the residual. It
should probably use the return value from read()/write(). Does it?

> cmd finished after 0.579s timeout 40s
> write track data: error after 0 bytes
> Sense Bytes: 70 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00
> 
> 
> And while that lot happens, I get
> 
> sym53c875-0-<3,*>: target did not report SYNC.

This message is not normal, given that the device certainly supports
synchronous data transfers. I will look into this problem first.

> sym53c875-0-<3,*>: extraneous data discarded.
> sym53c875-0-<3,*>: COMMAND FAILED (89 0) @c12a3800.

This one could have been triggered by the previous errors?

> Standard burns work ok, it's just the xa2 stuff I have a problem with so
> far. I also tried using the old NCR driver with the same results.

If you mean that the ncr53c8xx driver gets the same error, then the cause
can be either a bug common to sym53c8xx and ncr53c8xx, or a difference
between sym53c8xx/ncr53c8xx and sym-2.1.9.

The main difference that comes to mind is that sym-2 uses the new error
handling interface whereas sym53c8xx/ncr53c8xx use the old one. If that is
the cause, then the sg driver might be involved in the failure.

> Anybody got any ideas?

No more than the above for now.
Will let you know if I get better ones.

Regards,
  Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: scanner problem

2001-04-12 Thread Gérard Roudier

On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:

> hi,
> 
> when trying to scan with xsane and "agfa snapscan 1236s", i get the
> following message:
> 
> Attached scsi generic sg2 at scsi0, channel 0, id 5, lun 0, type 6
> sym53c895-0-<5,*>: target did not report SYNC.

This message is just a warning. If your scanner does not support
synchronous data transfers, then it is ok. You may want to check the doc
of the device on this point.

> sym53c895-0-<5,0>: extraneous data discarded.
> sym53c895-0-<5,0>: COMMAND FAILED (89 0) @cff3d000.

This is the way the driver signals a data overrun. Btw, I have never used
xsane. If this tool has a trace mode that shows which SCSI commands are
sent to the device, such traces would help.

> what can i do about this?

At least, try to catch the SCSI commands that are sent to the device, and
report them.

  Gérard.

Re: SCSI Tape Corruption - update

2001-04-12 Thread Gérard Roudier




On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:

> Still experimenting with my SDT-9000... tried connecting it to another
> controller
> (2940AU in place of 2904, sorry but I've only Adaptec stuff :). Same
> problem.
> Tried with another tape (even with an old DDS-2 tape). Same. Even tried
> another
> cable/removing the CDWR drive from the bus.
> 
> It seems that the tape is written incorrectly. I wrote some large file
> (300MB)
> and read it back four time. The read copies are all the same. They differ
> from the original only in 32 consecutive bytes (the replaced values SEEM
> random). Of course, 32 bytes in 300MB tar.gz files are TOO MUCH to be 
> accepted :)

A similar problem was reported under Linux/PPC a couple of weeks ago,
using a sym53c875 controller. In that case, kernel 2.2 was fine.

> Now I'll build some old 2.2 kernel to try...

If 2.2 is OK with your tape, then a software error in 2.4 becomes very
likely, in my opinion.

  Gérard.

Re: aic7xxx and 2.4.3 failures - fix, it is interrupt routing

2001-04-10 Thread Gérard Roudier




On Mon, 9 Apr 2001, Jim Studt wrote:

> Gérard Roudier insightfully opined..
> > Looks like an IRQ problem to me.
> > I mean the kernel wants to change IRQ routing and just do the wrong job.
> 
> Give the man a prize!  
> 
> After failing to work with 2.4.0, 2.4.1, 2.4.3, and 2.4.3-ac3 I
> enabled X86_UP_IOAPIC to stir up the interrupt code and it works.
> 
> I'll keep one of these servers set aside for testing and see if I can't
> figure out a little more specifically what the problem is, but IOAPIC
> is fine.  

Probably because the code that messes with IRQs isn't involved when IOAPIC
is used. If I had to guess, I would point at the function
"pcibios_lookup_irq()" in "arch/i386/kernel/pci-irq.c".

  Gérard.

Re: aic7xxx and 2.4.3 failures

2001-04-09 Thread Gérard Roudier



Looks like an IRQ problem to me.
I mean the kernel wants to change the IRQ routing and just does the wrong
job.

Ingo reported a similar problem to me a couple of weeks ago that made the
sym53c8xx driver fail. It looks very similar to this one, with the kernel
PCI code wanting to assign IRQ 11 to almost everything.

Btw, I do not know the cause of that problem, since I am still waiting for
a reply to some suggestions of mine. :-)

  Gerard.

On Mon, 9 Apr 2001, Jim Studt wrote:

> > > A typical startup with 6.1.9 proceeds like this...  (6.1.10 hangs silently
> > > after emitting the scsi0 and scsi1 adapter summaries, maybe it is
> > > going through the same gyrations silently.) 
> > > 
> > 
> 
> Alan Cox directs...
> > Try saying N to the AIC7xxx driver and Y to AIC7XXX_OLD and see if that works.
> > This is important both because it might solve your problem for now but also
> > because if the old driver works we can be fairly sure the bug is in the 
> > new adaptec driver and not elsewhere and triggered on it
> 
> Using AIC7XXX_OLD does not work either.  Different output
> 
> SCSI subsystem driver Revision: 1.00
> PCI: Assigned IRQ 11 for device 00:0c.0
> PCI: The same IRQ used for device 00:0c.1
> PCI: Found IRQ 11 for device 00:0c.1
> PCI: The same IRQ used for device 00:0c.0
> (scsi0)  found at PCI 0/12/0
> (scsi0) Wide Channel A, SCSI ID=7, 32/255 SCBs
> (scsi0) Downloading sequencer code... 392 instructions downloaded
> (scsi1)  found at PCI 0/12/1
> (scsi1) Wide Channel B, SCSI ID=7, 32/255 SCBs
> (scsi1) Downloading sequencer code... 392 instructions downloaded
> scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0
>
> scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.1/5.2.0
>
> scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 
>Inquiry 00 00 00 ff 00 
> SCSI host 0 abort (pid 0) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> SCSI host 0 channel 0 reset (pid 0) timed out - trying harder
> SCSI bus is being reset for host 0 channel 0.
> SCSI host 0 abort (pid 0) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> SCSI host 0 channel 0 reset (pid 0) timed out - trying harder
> SCSI bus is being reset for host 0 channel 0.
> SCSI host 0 abort (pid 0) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> ..
> 
> 
> Since we are looking elsewhere now... I have tried PCI access mode
> BIOS and Direct with no improvement.  
> 
> There is an unrecognized PCI bridge resource in the boot messages...
> 
> CPU: L1 I cache: 16K, L1 D cache: 16K
> CPU: L2 cache: 256K
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> CPU serial number disabled.
> CPU: Intel Pentium III (Coppermine) stepping 06
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> POSIX conformance testing by UNIFIX
> mtrr: v1.37 (20001109) Richard Gooch ([EMAIL PROTECTED])
> mtrr: detected mtrr type: Intel
> PCI: Using configuration type 1
> PCI: Probing PCI hardware
> Unknown bridge resource 0: assuming transparent
> Unknown bridge resource 1: assuming transparent
> Unknown bridge resource 2: assuming transparent
> Unknown bridge resource 0: assuming transparent
> Unknown bridge resource 1: assuming transparent
> Unknown bridge resource 2: assuming transparent
> PCI: Discovered primary peer bus ff [IRQ]
> PCI: Using IRQ router PIIX [8086/7110] at 00:12.0
> 
> # lspci
> 00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
> 00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
> 00:0c.0 SCSI storage controller: Adaptec 7896
> 00:0c.1 SCSI storage controller: Adaptec 7896
> 00:0e.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> 00:12.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
> 00:12.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
> 00:12.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
> 00:12.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
> 00:14.0 VGA compatible controller: Cirrus Logic GD 5480 (rev 23)
> 01:0f.0 PCI bridge: Digital Equipment Corporation DECchip 21150 (rev 06)
> 
> I will go back and try 2.4.0 and 2.4.3-ac3 and see where that gets me.
> 
> -- 
>  Jim Studt, President
>  The Federated Software Group, Inc.

Re: PATCH for Broken PCI Multi-IO in 2.4.3 (serial+parport)

2001-04-07 Thread Gérard Roudier

On Sat, 7 Apr 2001, Tim Waugh wrote:

> On Sat, Apr 07, 2001 at 08:42:35PM +0200, Gunther Mayer wrote:
> 
> > Please apply this little patch instead of wasting time by
> > finger-pointing and arguing.
> 
> This patch would make me happy.
> 
> It would allow support for new multi-IO cards to generally be the
> addition of about two lines to two files (which is currently how it's
> done), rather than having separate mutant hybrid monstrosity drivers
> for each card (IMHO)..

It is possible to design a single-function PCI device that is able to do
everything. Your approach just encourages this kind of monstrosity.
Such a monstrosity will look like some single-IRQ-capable ISA remake, and
thus be worse than 20-year-old ISA.

If we want to encourage that, then we want to stay stupid for life, in my
nervous opinion.

  Gérard.

Re: Multi-function PCI devices

2001-04-07 Thread Gérard Roudier

On Sat, 7 Apr 2001, Michael Reinelt wrote:

> Tim Waugh wrote:
> > 
> > On Sat, Apr 07, 2001 at 01:33:25PM +0200, Michael Reinelt wrote:
> > 
> > > Adding PCI entries to both serial.c and parport_pc.c was that easy
> > 
> > And that's how it should be, IMHO.  There needs to be provision for
> > more than one driver to be able to talk to any given PCI device.
> 
> True, true, true.

Could you start up your brain now :) and think about the actual issue? All
the drivers must share the device resources, and there is no (simple) way
to do so generically.
What you want to do is write a single software driver, optionally broken
into several modules, that is aware of all the functionalities of the
board and that registers with all the sub-systems involved, as needed.

> But - how to deal with it? Who decides if we can deal this way or not?
> PCI maintainer? Linus?
> 
> bye, Michael
> 
> P.S. I really need this. I have to unload serial and parallel and reload
> them in different order when I want either print something or talk to my
> Palm :-(

What about the option of using different hardware? :-)

  Gérard.

Re: Multi-function PCI devices

2001-04-07 Thread Gérard Roudier




On Sat, 7 Apr 2001, Michael Reinelt wrote:

> Brian Gerst wrote:
> > 
> > Gérard Roudier wrote:
> > >
> > > On Sat, 7 Apr 2001, Michael Reinelt wrote:
> > >
> > > > The card shows up on the PCI bus as one device. For the card provides
> > > > both serial and parallel ports, it will be driven by two subsystems, the
> > > > serial and the parallel driver.
> > >
> > > Given your description, this board is certainly not a multi-function PCI
> > > device. Multi-function PCI devices provide separate resources for each
> > > function in a way that allows each function to be driven by separate
> > > software drivers. A single-function PCI device that provides several
> > > functionalities commonly handled by separate sub-systems is nothing but
> > > a bag of shit we should not want to support in any O/S, in my opinion.
> > > Let me claim that engineers who want O/Ses to support such hardware are
> > > either morons or bastards.
> > 
> > Unfortunately, Windoze supports this configuration, and that's enough
> > for most hardware designers.  This is also an issue with the joystick
> > ports on many PCI sound cards.  We're not in a position to get up on the
> > soap box and decree this hardware "a bag of shit" though, yet.
> 
> How about other Multi-I/O-Cards? I think these 2S/1P (or 2P/1S or 2P/2S)
> cards are very common. At least they have been as ISA (PnP) cards. I

Please do not compare ISA and PCI. ISA wasted trillions of user hours
because of its inability to allow automatic configuration.
PCI fixed this by assigning a configuration space to each device.
These 'a la shitty ISA' Multi-I/O boards just kill the advantage of PCI by
moving the ISA burden onto PCI again. In the year 2001, they stink a lot.

> don't know, but I'm shure there are a lot of these out there in the
> field. As mainboards without any ISA slots get more common every day,
> there will be even more PCI multi-I/O-cards (apart from everyone running
> to USB :-)

PCI multi-I/O boards _shall_ provide a separate function for each kind of
I/O. Those that do not are a kind of PCI messy-I/O board.

> I needed another serial and parallel port, and I've got one of these
> mainboards (Asus A7V). So I had to buy such a PCI card. Nowadays you
> can't even ask for a specific hardware manufacturer, everything the guy
> in the shop knows is "yes, it's PCI, and yes, it has two serial and one
> parallel port". 
> 
> As these cards are very cheap, you can't expect very much from them (I

Cheap for whom?
What happens is that other companies or people who want to support such
hardware must make additional efforts that could have been avoided if
the board had been correctly designed.

> don't even think there are any expensive ones out there). NetMos does
> not produce this cards, they produce _chips_ for such cards. So there
> are probably a lot of cards out there with these NetMos chips.
> 
> Again, how about other cards? Are there any PCI Multi-I/O-cards out
> there, which are supported by linux? I'd be interested in how the driver
> looks like

I do not know and will never know. I only use hardware that does not look
too shitty to me. Time is too important for me to waste even seconds on
dubious hardware. :)

  Gérard.

Re: Multi-function PCI devices

2001-04-07 Thread Gérard Roudier

On Sat, 7 Apr 2001, Michael Reinelt wrote:

> Hi there,
> 
> I've got a problem with my communication card: It's a PCI card with a
> NetMos chip, and it provides two serial and one parallel port. It's not
> officially supported by the linux kernel, so I wrote my own patch and
> sent it to the parallel, serial and pci maintainer. The patch itself is
> basically an extension of the pci id tables; and I hope it's in the
> queue for the official kernel. 
> 
> The patch worked great for me with kernel 2.4.1 and .2, but no longer
> with 2.4.3. The parallel port still works, but the serial port will not
> be detected. I had a quite long debugging session last night (adding
> printk's to the pci code takes some time, for you have to reboot to load
> the new kernel), and I think I found the reason:
> 
> The card shows up on the PCI bus as one device. For the card provides
> both serial and parallel ports, it will be driven by two subsystems, the
> serial and the parallel driver.

Given your description, this board is certainly not a multi-function PCI
device. Multi-function PCI devices provide separate resources for each
function in a way that allows each function to be driven by separate
software drivers. A single-function PCI device that provides several
functionalities commonly handled by separate sub-systems is nothing but
a bag of shit we should not want to support in any O/S, in my opinion.
Let me claim that engineers who want O/Ses to support such hardware are
either morons or bastards.

> I found that _either_ the parallel or the serial port works, depending
> on which module you load first. The reason for this seems to be in
> pci.c, especially in the pci_register_driver() function. It reads:
> 
> int pci_register_driver(struct pci_driver *drv)
> {
>   struct pci_dev *dev;
>   int count = 0;
> 
>   list_add_tail(&drv->node, &pci_drivers);
>   pci_for_each_dev(dev) {
>   if (!pci_dev_driver(dev))
>   count += pci_announce_device(drv, dev);
>   }
>   return count;
> }
> 
> 
> pci_announce_device() will be called only if there's no other driver
> claiming the device. This explains why either the parallel or the serial
> port will be detected: The first driver loaded will see the device, the
> next drivers won't.
> 
> I'm afraid this is not a bug, but a design issue, and will be hard to
> solve. Maybe we need a flag for such devices which allows it to be
> claimed by more than one driver?
> 
> In the meantime, what can I do to get both ports working?

Since the hardware does not allow the software to transparently share the
different functionalities provided by the silicon, you must handle such
sharing in software. I mean, you must, at least, write a module (or
sub-driver or sub-system) that will handle the sharing of the PCI
function. Band-aiding the kernel code in order to cope with such
brain-dead hardware would be a pity, in my opinion. The burden must stay
where it is deserved. If they want their 'save 0.01$ but push shit ahead'
hardware to be supported, they should write their drivers themselves, in
my opinion.
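
To make the two points above concrete, here is a small user-space
sketch. It is not kernel code: toy_driver, toy_device and the toy_*
functions are hypothetical stand-ins, and the toy assumes every driver's
ID table matches every device. The first function mimics the "only
unclaimed devices are announced" loop quoted earlier; the second shows
the shape of a single owner that exposes both functionalities.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's structures; not the real API. */
struct toy_driver { const char *name; };
struct toy_device { const char *name; const struct toy_driver *owner; };

/*
 * Mimics the quoted loop: a driver is only offered devices that no
 * other driver has claimed yet. Returns how many devices it claimed.
 */
int toy_register_driver(const struct toy_driver *drv,
                        struct toy_device *devs, size_t ndevs)
{
    int count = 0;
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].owner == NULL) {    /* like !pci_dev_driver(dev) */
            devs[i].owner = drv;
            count++;
        }
    }
    return count;
}

/* Sub-systems that one combined driver could register with. */
enum { TOY_SERIAL = 1 << 0, TOY_PARALLEL = 1 << 1 };

/*
 * One owner for the single PCI function: claim it once, then expose
 * every functionality of the board. Returns a mask of what was set up,
 * or 0 when another driver already owns the device.
 */
unsigned toy_combined_probe(const struct toy_driver *drv,
                            struct toy_device *dev)
{
    if (dev->owner != NULL && dev->owner != drv)
        return 0;
    dev->owner = drv;
    return TOY_SERIAL | TOY_PARALLEL;
}
```

With two separate toy drivers, whichever registers first gets a count of
1 and the other gets 0, which matches the "either serial or parallel"
symptom described above.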

  Gérard.

RE: st corruption with 2.4.3-pre4

2001-04-06 Thread Gérard Roudier




On Fri, 6 Apr 2001, Gérard Roudier wrote:

> Here is a patch that removes the offending PPC PCI hacky area from the
> driver (sym53c8xx_defs.h):
> 
> --- sym53c8xx_defs.h  Fri Apr  6 16:23:48 2001
> +++ sym53c8xx_defs.h.orig Sun Mar  4 13:54:11 2001
> @@ -175,6 +175,9 @@
>  #define  SCSI_NCR_IOMAPPED
>  #elif defined(__alpha__)
>  #define  SCSI_NCR_IOMAPPED
> +#elif defined(__powerpc__)
> +#define  SCSI_NCR_IOMAPPED
> +#define SCSI_NCR_PCI_MEM_NOT_SUPPORTED
>  #elif defined(__sparc__)
>  #undef SCSI_NCR_IOMAPPED
>  #endif
>  Cut Here --

The patch is obviously reversed. You just have to remove the 3 lines that
apply to powerpc using your preferred editor.
Btw, using the one you dislike the most will also do. :-)
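
For reference, the same hunk written in the forward direction would look
roughly as follows. This is only the quoted patch with its two file
names swapped and the '+' lines turned into '-' lines; the line counts
in the hunk header are recomputed by hand and have not been tested
against a real tree.

```diff
--- sym53c8xx_defs.h.orig  Sun Mar  4 13:54:11 2001
+++ sym53c8xx_defs.h  Fri Apr  6 16:23:48 2001
@@ -175,9 +175,6 @@
 #define  SCSI_NCR_IOMAPPED
 #elif defined(__alpha__)
 #define  SCSI_NCR_IOMAPPED
-#elif defined(__powerpc__)
-#define  SCSI_NCR_IOMAPPED
-#define SCSI_NCR_PCI_MEM_NOT_SUPPORTED
 #elif defined(__sparc__)
 #undef SCSI_NCR_IOMAPPED
 #endif
```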

  Gérard.

Re: st corruption with 2.4.3-pre4

2001-04-06 Thread Gérard Roudier

On Thu, 5 Apr 2001, Geert Uytterhoeven wrote:

> 
> BTW, my 2.4.3-pre8 kernel just said
> 
> | sym53c875-0:0: ERROR (81:0) (3-21-0) (10/9d) @ (script 8a8:0b00).

Illegal instruction detected.

> | sym53c875-0: script cmd = 1100
> | sym53c875-0: regdump: da 10 80 9d 47 10 00 0d 00 03 80 21 80 01 09 09 00 30 4e 00 08 ff ff ff.
> | sym53c875-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50.0 ns, offset 16)
> 
> during the boot process, and continued without problems. What does this mean?

Looks extremely serious to me.

The SCRIPTS processor should be fetching a CHMOV DSA-relative WHEN DATA_IN
instruction. This corresponds to opcode 0x1100.

However, it seems to have fetched instruction 0x0b00, which is a
MOVE ABSOLUTE WHEN STATUS PHASE.

In (3-21-0) we can see that the chip is expecting STATUS PHASE (3), but
the target is driving DATA IN phase (21 - the 1 indicates DATA IN phase).

In other words, the SCRIPTS processor seems to have fetched a bogus
instruction. The signaled 'illegal instruction detected' may be due to the
count of bytes to transfer being zero.
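
The decoding above can be sketched in a few lines of C. This is an
illustration under assumptions: it looks only at the first byte of the
two opcodes (0x11 and 0x0b), and the bit layout used here (0x10 for
table indirect, 0x08 for MOVE as opposed to CHMOV, low three bits for
the phase) is my reading of the driver's SCR_* definitions, so check it
against the chip manual before relying on it. All names are made up.

```c
#include <stdint.h>

/* SCSI bus phases as encoded by the MSG, C/D and I/O signals. */
enum scsi_phase {
    PH_DATA_OUT = 0, PH_DATA_IN = 1, PH_COMMAND = 2, PH_STATUS = 3,
    PH_MSG_OUT = 6, PH_MSG_IN = 7
};

struct block_move {
    int is_chmov;        /* 1 = CHMOV, 0 = MOVE (initiator mode) */
    int table_indirect;  /* 1 = DSA-relative (table indirect) addressing */
    int phase;           /* phase named in the WHEN clause */
};

/*
 * Decode the top byte of a SCRIPTS block-move instruction, using the
 * assumed layout described in the text. Returns 0 on success, or -1
 * when the top two bits do not say "block move".
 */
int decode_block_move(uint8_t top, struct block_move *out)
{
    if (top & 0xC0)
        return -1;
    out->table_indirect = (top & 0x10) != 0;
    out->is_chmov = (top & 0x08) == 0;
    out->phase = top & 0x07;
    return 0;
}
```

Under these assumptions 0x11 decodes as CHMOV, table indirect, DATA IN,
and 0x0b decodes as MOVE, absolute, STATUS, in line with the reading
given above.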

> Gr{oetje,eeting}s,

  Gérard.

RE: st corruption with 2.4.3-pre4

2001-04-06 Thread Gérard Roudier

On Fri, 6 Apr 2001, Stefano Coluccini wrote:

> > I'm still waiting for other reports of st/sym53c8xx on PPC under
> > 2.4.x. BTW,
> > does it work on other big-endian platforms, like sparc?
> 
> I don't know if it is the same problem, but ...
> I have a Motorola MVME5100 (PowerPC 750 based CPU) with a mezzanine PCI
> based on the sym53c875 chip. I'm using the 2_5 kernel from fmslabs and the
> first time I have downloaded the kernel all works fine, while in a
> successive update the sym53c8xx driver was changed and my board don't work
> anymore. The driver hangs on downloading the SCSI scripts.
> I'm not a SCSI driver expert, so I've solved the problem installing the old
> version of the driver.
> Tom Rini says to me that it happened when he have merged some updates from
> the 2_4 tree, so I think my problem is related to the latest updates to the
> driver.
> I hope this helps you.
> Bye.
> Stefano.

IMO, it might well be the Linux/PPC PCI interface that doesn't return the
expected values.

1) The [sym|ncr]53c8xx drivers need to know the BAR addresses as physical
   address values as seen from the BUS. These values are used by the
   SCSI SCRIPTS and _NOT_ by the CPU.

2) The pcidev structure returns cookies instead, which commonly are
   the BARs' physical addresses as seen from the CPU.

The recent change in the Symbios driver regarding point (1) is that the
driver now reads the BARs using the pci_read_config*() interface. If these
functions do not return the actual BAR values usable from the BUS, for
some obscure reason, this may explain your problem.

The cookies contained in the pcidev structure are completely useless for
the driver, and probably for any driver. They are only to be used to remap
memory BARs to CPU virtual addresses. Then the driver forgets about them.
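
As a reminder of what a raw BAR value read from configuration space
contains, here is a minimal user-space sketch. It only splits the flag
bits from the address bits as laid out by the PCI specification; the
struct and function names are invented for this note, and the example
values in the usage remark are not taken from any particular machine.

```c
#include <stdint.h>

struct bar_info {
    int is_io;          /* 1 = I/O space BAR, 0 = memory space BAR */
    int is_64bit;       /* memory BAR only: 1 = 64-bit decoder */
    int prefetchable;   /* memory BAR only */
    uint32_t bus_addr;  /* base address as seen from the bus (low 32 bits) */
};

/* Split a raw 32-bit BAR value, as read from PCI configuration space. */
struct bar_info decode_bar(uint32_t raw)
{
    struct bar_info b = {0, 0, 0, 0};
    if (raw & 0x1) {                        /* bit 0 set: I/O space */
        b.is_io = 1;
        b.bus_addr = raw & ~(uint32_t)0x3;  /* bits 1:0 are flags */
    } else {
        b.is_64bit = ((raw >> 1) & 0x3) == 0x2;
        b.prefetchable = (raw >> 3) & 0x1;
        b.bus_addr = raw & ~(uint32_t)0xF;  /* bits 3:0 are flags */
    }
    return b;
}
```

For instance, a raw value of 0x2001 would decode as an I/O BAR at bus
address 0x2000. Whether that bus address equals what the CPU must use is
exactly the platform question raised above.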

There are still some PPC PCI specific hacks in the sym53c8xx driver and it
has been reported to me that they can be removed. If the PPC PCI interface
is correct, then they should be removed without problems, IMO.

Here is a patch that removes the offending PPC PCI hacky area from the
driver (sym53c8xx_defs.h):

--- sym53c8xx_defs.h  Fri Apr  6 16:23:48 2001
+++ sym53c8xx_defs.h.orig  Sun Mar  4 13:54:11 2001
@@ -175,6 +175,9 @@
 #define  SCSI_NCR_IOMAPPED
 #elif defined(__alpha__)
 #define  SCSI_NCR_IOMAPPED
+#elif defined(__powerpc__)
+#define  SCSI_NCR_IOMAPPED
+#define SCSI_NCR_PCI_MEM_NOT_SUPPORTED
 #elif defined(__sparc__)
 #undef SCSI_NCR_IOMAPPED
 #endif
 Cut Here --

Regards,
  Gérard.

Re: Assumption in sym53c8xx.c failed

2001-04-03 Thread Gérard Roudier



Hello,

The patch I sent you yesterday has a typo that can lead to serious
breakage. This was pointed out by Michal Jaegermann. Thanks, Michal.
(k must be compared against -1 and not against 1)

Please apply this new tiny patch instead and let me know if it fixes the
problem:

--- sym53c8xx.c.0402  Mon Apr  2 15:58:32 2001
+++ sym53c8xx.c  Tue Apr  3 07:15:00 2001
@@ -10167,14 +10167,13 @@
             if (i >= MAX_START*2)
                 i = 0;
         }
-        assert(k != -1);
-        if (k != 1) {
+        if (k != -1) {
             np->squeue[k] = np->squeue[i]; /* Idle task */
             np->squeueput = k; /* Start queue pointer */
-            cp->host_status = HS_ABORTED;
-            cp->scsi_status = S_ILLEGAL;
-            ncr_complete(np, cp);
         }
+        cp->host_status = HS_ABORTED;
+        cp->scsi_status = S_ILLEGAL;
+        ncr_complete(np, cp);
     }
     break;
     /*
- Cut Here -

  Gérard.

On Mon, 2 Apr 2001, Gérard Roudier wrote:

> 
> 
> On Sat, 31 Mar 2001, Christian Kurz wrote:
> 
> > Hi,
> > 
> > I'm currently running 2.4.2-ac28 and today I got a failing assumption in
> > sym53c8xx.c. I'm not sure about the exact steps that I did to produce
> > this error, but it must have been something like: cdparanoia -blank=all,
> > then sending Ctrl+C to this process and after it's been killed
> > cdparanoia -blank=fast. I then got assertion: k!=-1 failed. But I found
> > no hint about this in the messages or syslog file. So I looked through
> > sym53c8xx.c to find this code and it seems like line 10123 is
> > responsible for creating this error and kernel panic. Should this be the
> > normal behaviour or is this a bug in the code?
> 
> This might well be both at the same time. I mean normal behaviour given a
> bug in the code. :-)
> 
> Could you try this tiny patch and let me know:
> 
> --- sym53c8xx.c.0402  Mon Apr  2 15:58:32 2001
> +++ sym53c8xx.c   Mon Apr  2 16:02:43 2001
> @@ -10167,14 +10167,13 @@
>   if (i >= MAX_START*2)
>   i = 0;
>   }
> - assert(k != -1);
>   if (k != 1) {
>   np->squeue[k] = np->squeue[i]; /* Idle task */
>   np->squeueput = k; /* Start queue pointer */
> - cp->host_status = HS_ABORTED;
> - cp->scsi_status = S_ILLEGAL;
> - ncr_complete(np, cp);
>   }
> + cp->host_status = HS_ABORTED;
> + cp->scsi_status = S_ILLEGAL;
> + ncr_complete(np, cp);
>   }
>   break;
>   /*
> 
> What happens is that this part of the driver code assumed that the CCB for
> an IO to abort is queued to the SCSI SCRIPTS. This is not always true,
> since the driver may temporarily not queue all IOs to SCRIPTS. This may
> happen on a QUEUE FULL condition, or for devices that do not accept tagged
> commands, for example.
> 
> Regards,
>   Gérard.
> 
> 

Re: Assumption in sym53c8xx.c failed

2001-04-02 Thread Gérard Roudier




On Sat, 31 Mar 2001, Christian Kurz wrote:

> Hi,
> 
> I'm currently running 2.4.2-ac28 and today I got a failing assumption in
> sym53c8xx.c. I'm not sure about the exact steps that I did to produce
> this error, but it must have been something like: cdparanoia -blank=all,
> then sending Ctrl+C to this process and after it's been killed
> cdparanoia -blank=fast. I then got assertion: k!=-1 failed. But I found
> no hint about this in the messages or syslog file. So I looked through
> sym53c8xx.c to find this code and it seems like line 10123 is
> responsible for creating this error and kernel panic. Should this be the
> normal behaviour or is this a bug in the code?

This might well be both at the same time. I mean normal behaviour given a
bug in the code. :-)

Could you try this tiny patch and let me know:

--- sym53c8xx.c.0402  Mon Apr  2 15:58:32 2001
+++ sym53c8xx.c  Mon Apr  2 16:02:43 2001
@@ -10167,14 +10167,13 @@
             if (i >= MAX_START*2)
                 i = 0;
         }
-        assert(k != -1);
         if (k != 1) {
             np->squeue[k] = np->squeue[i]; /* Idle task */
             np->squeueput = k; /* Start queue pointer */
-            cp->host_status = HS_ABORTED;
-            cp->scsi_status = S_ILLEGAL;
-            ncr_complete(np, cp);
         }
+        cp->host_status = HS_ABORTED;
+        cp->scsi_status = S_ILLEGAL;
+        ncr_complete(np, cp);
     }
     break;
     /*

What happens is that this part of the driver code assumed that the CCB for
an IO to abort is queued to the SCSI SCRIPTS. This is not always true,
since the driver may temporarily not queue all IOs to SCRIPTS. This may
happen on a QUEUE FULL condition, or for devices that do not accept tagged
commands, for example.
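
The intended behaviour can be sketched with a toy queue in plain C. None
of this is the driver's code: toy_queue, toy_ccb, QLEN and the status
values are invented for illustration, and the comparison is against -1,
as the correction posted the following day requires. The point is only
that a command which was never queued must still be completed as
aborted.

```c
#include <stddef.h>

#define QLEN 8   /* hypothetical queue size, not the driver's MAX_START */

enum { ST_PENDING = 0, ST_ABORTED = 1 };

struct toy_ccb { int status; };

struct toy_queue {
    struct toy_ccb *slot[QLEN];  /* NULL marks an empty slot */
};

/* Return the slot index holding cp, or -1 when cp was never queued. */
int toy_find(const struct toy_queue *q, const struct toy_ccb *cp)
{
    for (int i = 0; i < QLEN; i++)
        if (q->slot[i] == cp)
            return i;
    return -1;
}

/*
 * Abort cp. Dequeue it only when it was found (k != -1), but mark it
 * aborted in both cases. Returns the index it was removed from, or -1.
 */
int toy_abort(struct toy_queue *q, struct toy_ccb *cp)
{
    int k = toy_find(q, cp);
    if (k != -1)
        q->slot[k] = NULL;
    cp->status = ST_ABORTED;
    return k;
}
```

An assertion on k != -1 at the point where toy_abort() tests it would
fire for every command held back from the queue, which is the failure
reported above.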

Regards,
  Gérard.

Re: WG: 2.4 on COMPQ Proliant

2001-03-29 Thread Gérard Roudier




On Thu, 29 Mar 2001, Butter, Frank wrote:

> 2.2.16 claimes to find a ncr53c1510D-chipset, supported by
> the driver ncr53c8xx. Which kernel-param would be the correct one for this?

There is no specific kernel option apart from configuring the NCR53C8XX
and/or the SYM53C8XX driver. (And not the 53c7,8xx driver, as I have had to
repeat for 5 years.)

For the 53c1510D, you must ensure that the chip is configured to behave
as a 53c8xx chip (and not as an intelligent controller).
To find out the 1510 configuration, you may check the PCI device id
claimed by the chip. Value 0xa is the expected value if the chip is
configured for 53c8xx mode.
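
A check of this kind is easy to script. The sketch below is only an
illustration: it parses a "vendor:device" pair in the form printed by
`lspci -n`, assumes the NCR/Symbios vendor ID 0x1000, and uses the
device id 0xa named above. The function and macro names are made up.

```c
#include <stdio.h>

#define VENDOR_SYMBIOS   0x1000  /* PCI vendor ID of NCR/Symbios */
#define DEVICE_1510_8XX  0x000a  /* device id named in the text above */

/*
 * Parse a "vvvv:dddd" vendor:device pair and report whether it is a
 * 53c1510 presenting itself as a 53c8xx.
 * Returns 1 if so, 0 if not, -1 if the string cannot be parsed.
 */
int is_1510_in_8xx_mode(const char *ids)
{
    unsigned int vendor, device;
    if (sscanf(ids, "%4x:%4x", &vendor, &device) != 2)
        return -1;
    return vendor == VENDOR_SYMBIOS && device == DEVICE_1510_8XX;
}
```

Any other device id reported for the chip would suggest that it is not
in 53c8xx mode, in which case the vendor is the one to ask.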

Btw, I do not know how to change the 1510 from intelligent mode to 53c8xx
mode and vice versa. You may ask your vendor about that.

  Gérard.

> Frank
> 
> > -Original Message-
> > From: Butter, Frank 
> > Sent: Thursday, 29 March 2001 17:11
> > To: '[EMAIL PROTECTED]'
> > Subject: 2.4 on COMPQ Proliant
> > 
> > 
> > 
> > Has anyone experiences with 2.4.x on recent Compaq Proliant 
> > Servers (e.g. ML570)?
> > 
> > I've installed RedHat7 and it worked fine out of the box.
> > Except that the SMP-enabled kernel stated there was no 
> > SMP-board detected ;-/
> > For some reasons (Fibrechannel drivers and so on) I've compiled
> > 2.4.2 and installed it. Although I've compiled the support 
> > in, the NCR-SCSI-chip was not found and therefore no 
> > root-partition. It is a model supported by 53c8xx - detected 
> > by the original RedHat-kernel.  
> > 
> > For testing I compiled a kernel with all (!) scsi-low-level-drivers -
> > with the same result. The SMP-board also was NOT detected by 2.4.2.
> > 
> > Any hint?
> > 
> > Frank
> > 

Re: NCR53c8xx driver and multiple controllers...(not new prob)

2001-03-26 Thread Gérard Roudier

On Sun, 25 Mar 2001, LA Walsh wrote:

> Here is the 'alternate' output when the ncr53c8xx driver is
> compiled in:
> 
> SCSI subsystem driver Revision: 1.00
> scsi-ncr53c7,8xx : at PCI bus 0, device 8, function 0
> scsi-ncr53c7,8xx : warning : revision of 35 is greater than 2.
> scsi-ncr53c7,8xx : NCR53c810 at memory 0xfa101000, io 0x2000, irq 58
> scsi0 : burst length 16
> scsi0 : NCR code relocated to 0x37d6c610 (virt 0xf7d6c610)
> scsi0 : test 1 started
> scsi0 : NCR53c{7,8}xx (rel 17)
> request_module[block-major-8]: Root fs not mounted
> VFS: Cannot open root device "807" or 08:07
> Please append a correct "root=" boot option
> Kernel panic: VFS: Unable to mount root fs on 08:07
> -

The 53c7,8xx driver is a different driver that hasn't been updated for the
support of the 53C896. I figure out that you already have been replied
upon this point.

For your machine configuration you want to use either the NCR53C8XX driver
or the SYM53C8XX driver. The SYM53C8XX driver has better support for the
896 as it handles phase mismatch from SCRIPTS. Both drivers share the
same kernel config options for simplicity (in fact the SYM53C8XX driver
just steals the NCR53C8XX config options).

Go to the SCSI low-level configuration form, under make menuconfig for
example, and configure the SYM53C8XX driver as 'Y' and the NCR53C8XX driver
as 'N'. Btw, also configure the 53c7,8xx driver as 'N' to avoid conflicts.

You may also have a look at the help entries for these drivers and at the 
file linux/drivers/scsi/README.ncr53c8xx (also applies to SYM53C8XX).

A driver named SYM-2 that replaces both the NCR53C8XX and the SYM53C8XX
drivers also exists. This driver is multi-platform and for now supports
Linux, FreeBSD and NetBSD. It is not included in Linux for now since we
only have _stable_ kernels at the moment and the NCR+SYM driver pair is
the current _stable_ support for SYMBIOS 53C8XX controllers.
If you want to try SYM-2: ftp://ftp.tux.org/roudier/README-drivers-linux

  Gérard.


Re: NCR53c8xx driver and multiple controllers...(not new prob)

2001-03-25 Thread Gérard Roudier

Hello,

You sent me, by private e-mail, the /proc/pci information of your system.
I do not see anything wrong in these data. If the problem is PCI-related,
you would probably get better help by sending this information to the l-k
list, in my opinion.

To be honest (I may be wrong here - sorry if I am), I am under the
impression that you may well be barking up the wrong tree.
Here is an excerpt of your mail that makes me think so:

> > When I compile it in, it only see the 1st controller
> > and the boot partition I think is on the 3rd.
                           ^^^^^^^
Knowing, rather than thinking, is required here, IMO.

1) controllers attach SCSI devices, notably hard disks
2) hard disks contain partitions
3) partitions contain file systems.
4) the kernel needs to *know* the root file system to boot your system.

What about point 4 ?

i.e.: how did you tell the boot loader the root partition name to
pass to the kernel at boot?

If you are using lilo and have configured it to prompt the user
before running the kernel, you may try entering your root partition name
at the lilo prompt. For example, if the root fs is on /dev/sda5:

lilo: your_lilo_config_entry_name root=/dev/sda5 ro

Just replace your_lilo_config_entry_name and the root partition name by
the values that match your configuration.
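
As a side note on naming the root partition: the panic message quoted
elsewhere in this thread reports the root device as 08:07, i.e. major 8,
minor 7. Here is a minimal sketch of how such a number maps to a device
name, assuming the classic 16-bit device number (major in the high byte)
and the traditional SCSI disk layout of 16 minors per disk. The helper
names are hypothetical and are not kernel functions.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration; not kernel functions. */

/* Split a classic 16-bit device number: major in the high byte. */
static unsigned dev_major(unsigned dev) { return (dev >> 8) & 0xff; }
static unsigned dev_minor(unsigned dev) { return dev & 0xff; }

/*
 * Format the name of a SCSI disk or partition on major 8.
 * Each disk owns 16 minors: the first one is the whole disk,
 * the next 15 are its partitions.
 * Returns 0 on success, -1 if dev is not on SCSI disk major 8.
 */
static int sd_name(unsigned dev, char *buf, size_t len)
{
    unsigned minor = dev_minor(dev);
    unsigned disk = minor >> 4;      /* 0 -> sda, 1 -> sdb, ... */
    unsigned part = minor & 0x0f;    /* 0 -> whole disk */

    if (dev_major(dev) != 8)
        return -1;
    if (part == 0)
        snprintf(buf, len, "/dev/sd%c", 'a' + disk);
    else
        snprintf(buf, len, "/dev/sd%c%u", 'a' + disk, part);
    return 0;
}
```

Under these assumptions 0x0807 decodes to /dev/sda7, so the matching boot
option would be root=/dev/sda7.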

  Gérard.

On Sat, 24 Mar 2001, Gérard Roudier wrote:

> On Sat, 24 Mar 2001, LA Walsh wrote:
> 
> > I have a machine with 3 of these controllers (a 4 CPU server).  The
> > 3 controllers are:
> > ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
> > ncr53c810a-0: ID 7, Fast-10, Parity Checking
> > ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
> > ncr53c896-1: ID 7, Fast-40, Parity Checking
> > ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
> > ncr53c896-2: ID 7, Fast-40, Parity Checking
> > ncr53c896-2: on-chip RAM at 0xfe002000
> > 
> > I'd like to be able to make a kernel with the driver compiled in and
> > no loadable module support.  It don't see how to do this from the
> > documentation -- it seems to require a separate module loaded for
> > each controller.  When I compile it in, it only see the 1st controller
> > and the boot partition I think is on the 3rd.  Any ideas?
> 
> The driver tries to detect all controllers it supports. Since the
> ncr53c8xx supports both the 810a and the 896, all your controllers should
> have been detected. When loaded as a module, the driver must be loaded
> once (btw, a second load should fail).
> 
> > This problem is present in the 2.2.x series as well as 2.4.x (x up to 2).
> 
> What hardware are you using (CPU, Core Logic and tutti quanti) ?
> Is the 896 on PCI BUS #0 or on some sort of secondary PCI BUS ?
> Does the sym53c8xx driver show same behaviour ?
> 
>   Gérard.


Re: NCR53c8xx driver and multiple controllers...(not new prob)

2001-03-24 Thread Gérard Roudier




On Sat, 24 Mar 2001, LA Walsh wrote:

> I have a machine with 3 of these controllers (a 4 CPU server).  The
> 3 controllers are:
> ncr53c810a-0: rev=0x23, base=0xfa101000, io_port=0x2000, irq=58
> ncr53c810a-0: ID 7, Fast-10, Parity Checking
> ncr53c896-1: rev=0x01, base=0xfe004000, io_port=0x3000, irq=57
> ncr53c896-1: ID 7, Fast-40, Parity Checking
> ncr53c896-2: rev=0x01, base=0xfe004400, io_port=0x3400, irq=56
> ncr53c896-2: ID 7, Fast-40, Parity Checking
> ncr53c896-2: on-chip RAM at 0xfe002000
> 
> I'd like to be able to make a kernel with the driver compiled in and
> no loadable module support.  It don't see how to do this from the
> documentation -- it seems to require a separate module loaded for
> each controller.  When I compile it in, it only see the 1st controller
> and the boot partition I think is on the 3rd.  Any ideas?

The driver tries to detect all controllers it supports. Since the
ncr53c8xx supports both the 810a and the 896, all your controllers should
have been detected. When loaded as a module, the driver must be loaded
once (btw, a second load should fail).

> This problem is present in the 2.2.x series as well as 2.4.x (x up to 2).

What hardware are you using (CPU, Core Logic and tutti quanti) ?
Is the 896 on PCI BUS #0 or on some sort of secondary PCI BUS ?
Does the sym53c8xx driver show same behaviour ?

  Gérard.


Re: etymology was: OOM fest

2001-03-24 Thread Gérard Roudier

On Sat, 24 Mar 2001 [EMAIL PROTECTED] wrote:

> > Btw, 'decade' comes from Latin 'deca'=10 and dies=days
> 
> No. It is from the Greek dekas, dekados (group of ten).

All my French dictionaries agree with you. Thanks for the fix. :-)

> > Could it be due to the word 'decadent'
> 
> Unrelated. (MLatin: to fall down.)

This one was intended to be the joke, but given the previous
show-stopper... :-(

Regards,
  Gérard.


Re: [PATCH] Prevent OOM from killing init

2001-03-24 Thread Gérard Roudier

On Fri, 23 Mar 2001, Stephen E. Clark wrote:

> Alan Cox wrote:
> > 
> > > You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
> > > Machine)
> > 
> > The JVM doesnt actually. The JVM will itself spontaenously explode in real
> > life when out of memory. Maybe the JVM on a DOS extender 8)
> > 
> 
> Back in the early nineties I was working with 18 developers on a Data
> General Aviion running DGUX. The system had only 16mb of memory and
> 600mb of disk. We were all continuously going thru the edit, compile,
> debug steps developing as large Computer Aided Dispatch System. Never
> did this system with its limited resources crash, or randomly start
> killing user or system processes.

What about the following (it is an estimate):

early nineties  -->  early eighties
18 developers   -->  18 developers
16mb of memory  -->   1 mb of memory
600 mb of disk  -->  70 mb of disk

Most current applications are such huge BLOATAGE that they do not
deserve to be run even once. :-)
The kernel must try to cope with that and also with its own BLOATAGE.

Human nature is to eat what can be eaten, regardless of whether it is
useful or not.

> My $.02.

What about 'My M$.02' in some decades. :)

Btw, 'decade' comes from Latin 'deca'=10 and dies=days (not sure for
dies). As a result, it should have meant a period of 10 days instead of 10
years. It means a period of 10 days in French.

Maybe a knowledgeable person on this list has an explanation for this
misinterpretation. Could it be due to the word 'decadent', which has a
very different etymology?

10 days is too short for getting decadent, but 10 years should be enough,
no ? :-)

> Steve

  Gérard.


Re: st corruption with 2.4.3-pre4

2001-03-20 Thread Gérard Roudier

On Tue, 20 Mar 2001, Geert Uytterhoeven wrote:

> On Tue, 20 Mar 2001, Geert Uytterhoeven wrote:
> > On Mon, 19 Mar 2001, Jeff Garzik wrote:
> > > Is the corruption reproducible?  If so, does the corruption go away if
> > 
> > Yes, it is reproducible. In all my tests, I tarred 16 files of 16 MB each to
> > tape (I used a new one).
> >   - test 1: 4 files with failed md5sum (no further investigation on type of
> > corruption)
> >   - test 2: 7 files with failed md5sum, 7 blocks of 32 consecutive bytes were
> > corrupted, all starting at an offset of the form 32*x+1.
> >   - test 3: 7 files with failed md5sum, 7 blocks of 32 consecutive bytes were
> > corrupted, all starting at an offset of the form 32*x+1.
> > 
> > The files seem to be corrupted during writing only, as reading always gives the
> > exact same (corrupted) data back.
> > 
> > Copying files from the disk on the MESH to a disk on the Sym53c875 (which also
> > has the tape drive) shows no corruption.
> 
> I did some more tests:
>   - The problem also occurs when tarring up files from a disk on the Sym53c875.
>   - The corrupted data always occurs at offset 32*x (the `+1' above was caused
> by hexdump, starting counting at 1).
>   - The 32 bytes of corrupted data at offset 32*x are always a copy of the data
> at offset 32*x-10240.
>   - Since 10240 is the default blocksize of tar (bug in tar?), I made a tarball
> on disk instead of on tape, but no corruption.
>   - 32 is the size of a cacheline on PPC. Is there a missing cacheflush
> somewhere in the Sym53c875 driver? But then it should happen on disk as
> well?
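
The pattern reported above (32 corrupted bytes at offset 32*x that
duplicate the data at offset 32*x-10240) can be checked mechanically.
Here is a minimal sketch, assuming the original data and the data read
back from tape are both available in memory. The function name and its
parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BURST  32      /* size of each corrupted run (PPC cache line) */
#define STALE  10240   /* distance back to the duplicated data (tar block) */

/*
 * Compare the data read back against the original in BURST-sized,
 * BURST-aligned blocks. Returns the number of differing blocks and
 * stores in *stale how many of them are an exact copy of the original
 * data located STALE bytes earlier.
 */
static size_t count_stale_bursts(const unsigned char *orig,
                                 const unsigned char *readback,
                                 size_t len, size_t *stale)
{
    size_t bad = 0, off;

    *stale = 0;
    for (off = 0; off + BURST <= len; off += BURST) {
        if (memcmp(orig + off, readback + off, BURST) == 0)
            continue;               /* block is intact */
        bad++;
        if (off >= STALE &&
            memcmp(orig + off - STALE, readback + off, BURST) == 0)
            (*stale)++;             /* matches the reported signature */
    }
    return bad;
}
```

If every differing block is also counted as stale, the corruption matches
the signature described in the report. Any other result would point to a
different failure mode.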

The only PCI transaction that requires the cache line size to be correctly
configured is PCI MEMORY WRITE AND INVALIDATE. This transaction may be used
by the 875 only for data read from a SCSI device and DMAed to memory.

Note that the controller may use optimized PCI transactions only if the 
cache line size is configured in its PCI device configuration space.
Otherwise only normal PCI memory read and PCI memory write transactions 
will be used.

Could you check if the cache line size is configured for your 875?
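
One way to check is to look at the raw configuration space of the device.
Here is a minimal sketch, assuming the first 64 bytes of the 875's
configuration space have already been read into a buffer (for example
from the device's file under /proc/bus/pci). The helper names are
hypothetical, while the register offsets are the standard PCI ones.

```c
#include <stdint.h>

#define CFG_COMMAND          0x04  /* 16-bit command register, little-endian */
#define CFG_CACHE_LINE_SIZE  0x0c  /* 8-bit, in units of 32-bit words */
#define CMD_MWI_ENABLE       0x10  /* Memory Write and Invalidate enable */

/* Cache line size in bytes; 0 means it has not been configured. */
static unsigned cache_line_bytes(const uint8_t *cfg)
{
    return (unsigned)cfg[CFG_CACHE_LINE_SIZE] * 4;
}

/* Non-zero if the device is allowed to generate MWI. */
static int mwi_enabled(const uint8_t *cfg)
{
    unsigned cmd = cfg[CFG_COMMAND] | ((unsigned)cfg[CFG_COMMAND + 1] << 8);

    return (cmd & CMD_MWI_ENABLE) != 0;
}
```

A 32-byte cache line, as on the PPC discussed here, would show up as the
value 8 in that register.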

Let me imagine it is so. Btw, I may be wasting my time if it is not ...
Then the 875 may also use PCI read multiple transactions and/or PCI read
line transactions when reading data from memory. If the corruption is due
to the use of these transactions, then the PCI-HOST bridges may well be the
culprit, in my opinion.

Anyway, since the sym53c8xx driver does not try to change the configured
cache line size on PPC, I would suggest running the same tests again with
the cache line size set to zero for the 875. If needed, you may hack the
driver code or the PPC pci code, for example, so that the value zero is
written to the proper place in the PCI configuration space of the 875.

  Gérard.


Re: HP Vectra XU 5/90 interrupt problems

2001-03-11 Thread Gérard Roudier

On Sun, 11 Mar 2001, John William wrote:

> If shared, edge triggered interrupts are ok then I will talk to the driver 
> maintainers about the problem. If this isn't ok, then maybe the sanity check 
> in pci-irq.c would be to force level triggering only on shared PCI 
> interrupts?

DEFINITELY NO!

Given a PCI device + driver pair, level-triggered interrupts may be
required for them to work properly, even when the line is not shared.
Anyway, it is a requirement. OTOH, the PCI device must know how to
trigger the interrupt.

Edge-triggered interrupts cannot be shared. Level-triggered (level
sensitive is a better wording, in my opinion) can be shared.

Even when it is not shared (as is required), an edge-triggered
interrupt can be lost by the driver. Using a level-sensitive interrupt
keeps the interrupt condition active as long as the condition is present
at, at least, one device that wants to interrupt the CPU.

Apart from the sharing of interrupt lines, a level-sensitive interrupt
allows the device firmware to run concurrently with the CPU (software
driver) without losing the interrupt condition, provided that both driver
and firmware use appropriate barriers against buffering in the bridge.
In the same situation, using an edge-triggered interrupt (not shared) can
lead to the interrupt condition being lost by the software driver.
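
The difference can be shown with a toy model. The sketch below is not
kernel code. It assumes a line shared by two devices as a wired-OR, an
edge mode that latches only on a low-to-high transition of the line, and
a handler that polls device 1 before device 0. All names are hypothetical.

```c
#include <stdbool.h>

#define NDEV 2

struct shared_line {
    bool pending[NDEV];   /* per-device interrupt condition */
    bool edge_latched;    /* set by a 0 -> 1 transition of the line */
};

/* The line is a wired-OR of all device conditions. */
static bool line_asserted(const struct shared_line *l)
{
    int i;

    for (i = 0; i < NDEV; i++)
        if (l->pending[i])
            return true;
    return false;
}

/* A device raises its condition; an edge exists only if the line was low. */
static void dev_raise(struct shared_line *l, int dev)
{
    if (!line_asserted(l))
        l->edge_latched = true;
    l->pending[dev] = true;
}

/*
 * Run the race and return how many device conditions are left unserviced
 * once the interrupt controller sees nothing more to deliver.
 */
static int unserviced_after_race(bool level_sensitive)
{
    struct shared_line l = { { false, false }, false };
    int i, left = 0;

    dev_raise(&l, 0);             /* device 0 interrupts: line 0 -> 1 */

    /* The CPU takes the interrupt; the handler polls device 1, then 0. */
    l.edge_latched = false;
    /* device 1 polled here: nothing pending yet */
    dev_raise(&l, 1);             /* device 1 interrupts: line already high */
    l.pending[0] = false;         /* device 0 polled and serviced */

    /* The controller decides whether to deliver the interrupt again. */
    while (level_sensitive ? line_asserted(&l) : l.edge_latched) {
        l.edge_latched = false;
        for (i = 0; i < NDEV; i++)
            l.pending[i] = false; /* handler services every device */
    }

    for (i = 0; i < NDEV; i++)
        left += l.pending[i];
    return left;
}
```

In this model the edge mode leaves the condition of device 1 unserviced,
because the line never went low and so produced no new edge. The
level-sensitive mode delivers again for as long as the line stays
asserted.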

  Gérard.


Re: SLAB vs. pci_alloc_xxx in usb-uhci patch [RFC: API]

2001-03-09 Thread Gérard Roudier

On Fri, 9 Mar 2001, David Brownell wrote:

> Gérard --
> 
> > Just for information to people that want to complexify the
> > pci_alloc_consistent() interface that looks simple and elegant to me:
> 
> I certainly didn't propose that!  Just a layer on top of the
> pci_alloc_consistent code -- used as a page allocator, just
> like you used it.
> 
> 
> >   The object file of the allocator as seen in sym2 is as tiny as 3.4K
> >   unstripped and 2.5K stripped.
> 
> What I sent along just compiled to 2.3 KB ... stripped, and "-O".
> Maybe smaller with normal kernel flags.  The reverse mapping
> code hast to be less than 0.1KB.

If reverse mapping means bus_to_virt(), then I would suggest not
providing it since it is a confusing interface. OTOH, only a few drivers
need or want to retrieve the virtual address that led to some bus dma
address and they should check that this virtual address is still valid
prior to using it. As I wrote, some trivial hashed list can be used by
such drivers (as sym* do).

> I looked at your code, but it didn't seem straightforward to reuse.
> I think the allocation and deallocation costs can be pretty comparable
> in the two implementations.  Your implementation might even fit behind
> the API I sent.  They're both layers over pci_*_consistent (and both
> have address-to-address mappings, implemented much the same).

I wanted the code to be as short as possible since the driver code is
already very large. On the other hand there are bunches of #ifdef to deal
with all kernel versions that are still alive. As a result, the code may
well not be general or clean enough to be moved to the kernel. What it
actually does is fairly simple, though.

> > Now, if modern programmers are expecting Java-like interfaces for writing
> > kernel software, it is indeed another story. :-)
> 
> Only if when you wrote "Java-like" you really meant "reusable"!  :)

Hmmm... 'reusable' implies 'usable'...
Does 'usable' apply to Java applications ? :-)

  Gérard.


Re: SLAB vs. pci_alloc_xxx in usb-uhci patch [RFC: API]

2001-03-09 Thread Gérard Roudier

On Fri, 9 Mar 2001, David Brownell wrote:

> > > > > extern void *
> > > > > pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
> > > > 
> > > > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > > > yet.
> > > 
> > > Some hardware (like OHCI) talks to drivers using those dma handles.
> > 
> > I wonder if it may be feasible to allocate a bunch of contiguous
> > pages. Then, whenever the hardware returns a bus address, subtract
> > the remembered bus address of the zone start, add the offset to
> > the virtual and voila.
> 
> That's effectively what the implementation I posted is doing.
> 
> Simple math ... as soon as you get the right "logical page",
> and that page size could become a per-pool tunable.  Currently
> one logical page is PAGE_SIZE; there are some issues to
> deal with in terms of not crossing page boundaries.
> 
> There can be multiple such pages, known to the pool allocator
> and hidden from the device drivers.  I'd expect most USB host
> controllers wouldn't allocate more than one or two pages, so
> the cost of this function would typically be small.

Just for information to people that want to complexify the
pci_alloc_consistent() interface that looks simple and elegant to me:
(Hopefully, I am not off topic here)

1) The sym53c8xx driver and friends expect this simple interface to 
   return naturally aligned memory chunks. It mostly allocates 1 page 
   at a time.

2) The sym* drivers use a very simple allocator that keeps track of bus 
   addresses for each chunk (page sized).
   The object file of the allocator as seen in sym2 is as tiny as 3.4K
   unstripped and 2.5K stripped.

3) The drivers need to map DSA bus addresses back to virtual addresses
   and implement a simplistic hashed list that costs no more
   than 10 lines of trivial C code.
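
A hashed list of the kind described in point 3 could look roughly like
the sketch below. It is an illustration only and is not taken from the
sym* sources: the structure, the table size, the 4 KB page assumption and
the function names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SHIFT 12                 /* assumes 4 KB pages */
#define HASH_SIZE  16                 /* must be a power of two */
#define HASH(ba)   (((ba) >> HASH_SHIFT) & (HASH_SIZE - 1))

struct chunk {
    struct chunk *next;               /* next chunk in the same bucket */
    uint32_t baddr;                   /* bus address of the page */
    void *vaddr;                      /* matching virtual address */
};

static struct chunk *hash_tbl[HASH_SIZE];

/* Remember the bus address of a freshly allocated page. */
static void chunk_insert(struct chunk *c)
{
    c->next = hash_tbl[HASH(c->baddr)];
    hash_tbl[HASH(c->baddr)] = c;
}

/* Returns NULL when the bus address belongs to no known page. */
static void *lookup_vaddr(uint32_t ba)
{
    uint32_t page = ba & ~(((uint32_t)1 << HASH_SHIFT) - 1);
    struct chunk *c;

    for (c = hash_tbl[HASH(ba)]; c; c = c->next)
        if (c->baddr == page)
            return (char *)c->vaddr + (ba - page);
    return NULL;
}
```

Returning NULL for an unknown address gives the caller a cheap way to
reject a stale or bogus bus address before dereferencing anything.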

Btw, as a result, if pci_alloc_consistent() ever returns addresses that
are not naturally aligned, the sym* drivers will be broken for
Linux.
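
For reference, "naturally aligned" means that the address of a chunk is a
multiple of its size. Here is a minimal sketch of the check and of the
masking trick it makes possible, with hypothetical helper names and the
assumption that sizes are powers of two:

```c
#include <stdint.h>

/* Non-zero if addr is a multiple of size; size must be a power of two. */
static int naturally_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}

/*
 * With natural alignment, the start of the chunk containing addr can be
 * recovered by masking off the low bits.
 */
static uint64_t chunk_base(uint64_t addr, uint64_t size)
{
    return addr & ~(size - 1);
}
```

If the allocator returned a page-sized chunk that was not aligned on its
size, chunk_base() would no longer point at the start of that chunk.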

This leaves me very surprised by all this noise, given that I have seen
_no_ issue using the pci_alloc_consistent() interface. It looked to me a
_lot_ simpler to use than the equivalent interfaces I have had to
suffer with other O/Ses.

Now, if modern programmers are expecting Java-like interfaces for writing
kernel software, it is indeed another story. :-)

  Gérard.


Re: aic7xxx (and sym53c8xx) plans

2001-02-19 Thread Gérard Roudier

On Mon, 19 Feb 2001, Peter Samuelson wrote:

> [Justin Gibbs]
> > I've verified the driver's functionality on 25 different cards thus
> > far covering the full range of chips from aic7770->aic7899.
> 
> That's very good to hear.  I know the temptation of only testing on new
> hardware; that's why I was concerned.
> 
> > Lots of people here at Adaptec look at me funny when I pull a PC from
> > the scrap-heap, or pull an old, discontinued card from an unused
> > marketing display for use in my lab
> 
> Heh. (:
> 
> BTW, is there really enough common ground between the whole series of
> AIC chips to justify a single huge driver?  I know they ship three
> separate NT drivers to cover this range..

LSILOGIC also ship 3 drivers to cover the 53C810 - 53C1010 range on NT.
And, btw, these chips are all PCI.

Doing so, 12 different drivers would be needed to cover 4 different O/Ses,
for example. These drivers (I spoke about both LSILOGIC and ADAPTEC
drivers for NT) obviously work for i386, but what about architecture
dependencies at source level?

May-be this is the reason some UNIX vendors seem to love UDI. :)

If you also use SYMBIOS chips, you may give a try with SYM-2. For the
moment, it replaces only 6 drivers :) as also seems to do, for the moment,
Justin's AIC7XXX-6, by the way.

The plans seem clear to me. :-)
Btw, I _do_ like a lot better the 'one driver' plan over the '12 or more'
one.

  Gérard.


Re: [PATCH] starfire reads irq before pci_enable_device.

2001-02-14 Thread Gérard Roudier

On Tue, 13 Feb 2001, Ion Badulescu wrote:

> On Tue, 13 Feb 2001 12:29:16 -0800, Ion Badulescu <[EMAIL PROTECTED]> wrote:
> > On Tue, 13 Feb 2001 07:06:44 -0600 (CST), Jeff Garzik <[EMAIL PROTECTED]> wrote:
> > 
> >> On 12 Feb 2001, Jes Sorensen wrote:
> >>> In fact one has to look out for this and disable the feature in some
> >>> cases. On the acenic not disabling Memory Write and Invalidate costs
> >>> ~20% on performance on some systems.
> >> 
> >> And in another message, On Mon, 12 Feb 2001, David S. Miller wrote:
> >>> 3) The acenic/gbit performance anomalies have been cured
> >>>by reverting the PCI mem_inval tweaks.
> >> 
> >> Just to be clear, acenic should or should not use MWI?
> 
> With the zerocopy patch, acenic always disables MWI by default.
> 
> >> And can a general rule be applied here?  Newer Tulip hardware also
> >> has the ability to enable/disable MWI usage, IIRC.
> > 
> > And so do eepro100 and starfire. On the eepro100 we're enabling MWI 
> > unconditionally, and on the starfire we disable it unconditionally...
> > 
> > I should probably take a look at acenic's use of PCI_COMMAND_INVALIDATE
> > to see when it gets activated. Some benchmarking would probably help,
> > too -- maybe later today.
> 
> I did some testing with starfire, and the results are inconclusive --
> at least on my P-III it makes absolutely no difference. Does it make
> a difference on other architectures? sparc64, ia64 maybe? 
> 
> I should probably rephrase this: MWI makes no difference on i386, but
> it is claimed that using MWI *reduces* performance on some systems.
> Are there any systems on which MWI *increases* performance?

I have read several data sheets about Intel PCI-HOST bridges and they all
were said to alias PCI MWI to normal PCI MEMORY WRITE transactions.
This matches your observations just fine.
Even when MWI is handled as MW, the PCI master is required to transfer
entire cache lines and this cannot be bad for performance. But this
should probably not make a significant difference compared with normal MW.
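
To illustrate the "entire cache lines" constraint, here is a sketch that
splits a DMA write into the part that could go out as MWI and the parts
that would have to use normal MW. It only illustrates the rule and does
not describe how any particular chip schedules its bursts. The structure
and function names are hypothetical.

```c
#include <stdint.h>

struct wr_plan {
    uint32_t head;   /* bytes before the first cache line boundary (MW) */
    uint32_t body;   /* whole, aligned cache lines (candidates for MWI) */
    uint32_t tail;   /* bytes after the last whole cache line (MW) */
};

/*
 * Split a DMA write of len bytes at bus address addr.
 * cls is the cache line size in bytes (a power of two), or 0 when it is
 * not configured, in which case nothing qualifies for MWI.
 */
static struct wr_plan plan_write(uint32_t addr, uint32_t len, uint32_t cls)
{
    struct wr_plan p = { 0, 0, 0 };
    uint32_t misalign;

    if (cls == 0) {
        p.head = len;
        return p;
    }
    misalign = addr & (cls - 1);
    p.head = misalign ? cls - misalign : 0;
    if (p.head > len)
        p.head = len;
    p.body = (len - p.head) & ~(cls - 1);
    p.tail = len - p.head - p.body;
    return p;
}
```

For example, with a 32-byte cache line, a 100-byte write starting 16
bytes into a line splits into 16 bytes of MW, 64 bytes eligible for MWI
and 20 bytes of MW.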

Btw, your rephrasing looks improper to me. The processor is not involved
in the handling of MWI, especially when the MWI targets the memory. It is
the PCI-HOST bridge that must be considered here. What about ServerWorks
chipsets? Hmmm... I would be glad to have docs about these ones. :(

The MWI is intended to allow optimizations based on cache line
invalidation rather than snooping. The target (or bridge) can perfectly
well elect to handle the MWI as a normal MW and so, performance should not
be significantly lowered using MWI. But nothing is perfect, as we know.

The MWI is interesting for PCI throughput optimization but the MEMORY READ
LINE and MEMORY READ MULTIPLE transactions are a lot more interesting, in
my opinion. WRITEs can be posted (buffered), but in order to stream data
from memory (prefetchable) the bridge can do a better job when it knows
the intention of the PCI MASTER. All bridges should take into consideration
the hints associated with MRL and MRM. IIRC, Intel bridges do.

> I've added some code to the starfire driver that allows changing the
> use of MWI at module load time, just in case. By default, it activates
> it.

You should also play with MRL and MRM, in my opinion.

  Gérard.


SYM-2 / SYM-1 / NCR-3 drivers UPdates

2001-02-10 Thread Gérard Roudier



Updated drivers for SYMBIOS 53C[8XX|1010] chips are available from 
the ftp.tux.org site.

sym53c8xx-1.7.3-pre1 + ncr53c8xx-3.4.3-pre1
---
URL (entered by hand):
ftp.tux.org://roudier/drivers/linux/stable/sym-1.7.3-ncr-3.4.3-pre1.tar.gz

sym-2.1.6
-
URL (entered by foot :))
ftp.tux.org://roudier/drivers/portable/sym-2.1.6-20010207.tar.gz

The former is an update for the driver bundle currently in 2.2 and 2.4.
The latter is the portable sym driver that for now supports Linux and 
FreeBSD.

Stock sym/ncr drivers in both 2.2 and 2.4 are more than 6 months old and
need to be updated. My plan is to leave kernel maintainers the choice
between sym-1/ncr-3 and sym-2. Btw, sym-2 is anyway candidate for 2.5.

Both (all three?) drivers do call pci_enable_device() prior to looking
into the pcidev structure. Do not colour me happy about that, but given that
it is not me but kernel maintainers that will be bashed if this breaks
firmware RAID, I didn't see any problem with this change. :-)

If some additional testing could be performed this week-end by courageous
Linux users, this would avoid some noise once the drivers are sent to
kernel maintainers, in case I missed something important in the updates.

  Gérard.



Re: [PATCH] starfire reads irq before pci_enable_device.

2001-02-10 Thread Gérard Roudier

On Fri, 9 Feb 2001, Alan Cox wrote:

> > > For non routing paths its virtually free because the DMA forced the lines
> > > from cache anyway. 
> > 
> > Are you actually sure about this? I thought DMA from PCI devices reached 
> > the main memory without polluting the L2 cache. Otherwise any large DMA 
> > transfer would kill the cache (think frame grabbers...)
> 
> DMA to main memory normally invalidates those lines in the CPU cache rather
> than the cache snooping and updating its view of them.

In PCI, it is the Memory Write and Invalidate PCI transaction that is
intended to allow core-logics to optimize DMA this way. For normal Memory
Write PCI transactions or when the core-logic is aliasing MWI to MW, the
snooping may well happen. All that stuff, very probably, varies a lot
depending on the core-logic.

As we know, in normal PCI, the target is not told about the transaction
length prior to the bursting of the data. This makes it difficult for a
core logic to use cache invalidation rather than dma snooping when a normal
MW is used, thus the invention of MWI.

  Gérard.


Re: [PATCH] Hamachi not doing pci_enable before reading resources

2001-02-07 Thread Gérard Roudier



You missed the newer statements about every piece of hardware being
assumed to be hot-pluggable and all the hardware being under full control
by CPU.

You also missed the well known point that only device drivers are broken
under Linux and that all the generic O/S code is just perfect. :-)

  Gérard.

On Wed, 7 Feb 2001, Richard B. Johnson wrote:

> On Wed, 7 Feb 2001 [EMAIL PROTECTED] wrote:
> 
> > 
> > Hi Alan,
> > 
> >  Another driver not doing pci_enable_device() early enough.
> > 
> > Dave.
> > 
> 
> A PCI device does not and should not be enabled to probe for resources!
> It is only devices that have BIOS that require the device to be enabled
> for memory I/O prior to downloading the BIOS into RAM. The BARs are
> read/writable (and are required to be), even when the Mem/I/O bits
> in the cmd/status register are clear.
> 
> This is a required condition!  You certainly don't want to write all
> ones to a decode (to find the resource length) of a live, on-line chip!
> If the chip hickups (think network chips connected to networks, on a
> warm-boot), you will trash lots of stuff in memory.
> 
> It looks as though you are "fixing" drivers that are not broken and,
> in fact, are trying to do the right thing. Maybe the PCI code in the
> kernel is preventing access to resources unless the device has been
> enabled??? If so, it's broken and should be fixed, instead of all
> the drivers.
> 
> Cheers,
> Dick Johnson
> 
> Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
> 
> "Memory is like gasoline. You use it up when you are running. Of
> course you get it all back when you reboot..."; Actual explanation
> obtained from the Micro$oft help desk.
> 
> 
> 


Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups

2001-02-03 Thread Gérard Roudier




On Fri, 2 Feb 2001, Manfred Spraul wrote:

> Gérard Roudier wrote:
> > 
> > So, why not using a pure software flag in memory and only tampering the
> > things if the offending interrupt is actually delivered ? If the given
> > interrupt is delivered and the software mask is set we could simply do:
> > 
> > - MASK the given interrupt
> > - EOI it.
> > - return
> >
> Good idea.
> I implemented it, and it was a full success: now it always locks up :-(
> 
> If I apply the attached patch, then I get a lockup after ~ 100 packets
> flood ping.
> I've also attached the dmesg print.
> I know that startup is currently wrong (must set trigger to level), but
> that doesn't matter since I only ifup once.
> 
> But I think we can change the bug description:
> 
> If an io apic io redirection entry is unmasked while the irq pin is
> active, then the io apic sends out the interrupt as edge triggered, but
> nevertheless sets the IRR bit.

Interesting.

My little finger tells me that O/Ses that thread interrupts might well
want to rely on MASK + EOI in order to quiesce incoming level-sensitive
interrupts.

Note that tampering with the IO/APIC after initialization looks extremely
ugly to me. In my opinion, only the local APIC was intended by Intel
designers to be accessed by the CPU after initialization (I may be wrong
here).

> In a second test run I checked the TMR bit in the local apics: the bit
> on the cpu that received the last interrupt is really 0.
> 
> I'll now try a 2 step enable:
> * unmask.
> * io_apic_sync()
> * set trigger mode to level.

Thanks for having tried my suggestion.

  Gérard.


Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups

2001-02-02 Thread Gérard Roudier

On Fri, 2 Feb 2001, Maciej W. Rozycki wrote:

> On Thu, 1 Feb 2001, Andrew Morton wrote:

> +/*
> + * It appears there is an erratum which affects at least the 82093AA
> + * I/O APIC.  If a level-triggered interrupt input is being masked in
> + * the redirection entry while the interrupt is send pending (its
> + * delivery status bit is set), the interrupt is erroneously
> + * delivered as edge-triggered but the IRR bit gets set nevertheless.
> + * As a result the I/O unit expects an EOI message but it will never
> + * arrive and further interrupts are blocked for the source.
> + *
> + * A workaround is to set the trigger mode to edge when masking
> + * a level-triggered interrupt and to revert the mode when unmasking.
> + * The idea is from Manfred Spraul.  --macro
> + */

Is the idea below feasible or just stupid?

Once a level-sensitive interrupt has been accepted by a local APIC, the IO
APIC will wait for the EOI message prior to delivering this interrupt
again. Therefore masking a level-sensitive interrupt once it has been
delivered and prior to EOIing it should not race with the APIC hardware.

So, why not use a pure software flag in memory and only tamper with
things if the offending interrupt is actually delivered? If the given
interrupt is delivered and the software mask is set we could simply do:

- MASK the given interrupt
- EOI it.
- return
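
The three steps above can be sketched in plain C as a small user-space
model. This is only an illustration under stated assumptions: struct
irq_line and the irq_*() helpers are hypothetical names, hw_masked merely
stands in for the mask bit of the IO APIC redirection entry, and no real
APIC register is touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-sensitive interrupt line. */
struct irq_line {
    bool soft_masked;    /* pure software flag kept in memory */
    bool hw_masked;      /* stands in for the IO APIC mask bit */
    unsigned eoi_count;  /* number of EOIs issued so far */
    unsigned handled;    /* number of times the device handler ran */
};

/* Disabling only sets the software flag; the hardware is left alone. */
static void irq_disable_lazy(struct irq_line *irq)
{
    irq->soft_masked = true;
}

/* Enabling clears the flag and undoes a hardware mask if one was set. */
static void irq_enable_lazy(struct irq_line *irq)
{
    irq->soft_masked = false;
    irq->hw_masked = false;
}

/*
 * Called once the interrupt has been delivered to the CPU.
 * Returns true if the device handler ran, false if the line
 * was masked on the spot instead.
 */
static bool irq_delivered(struct irq_line *irq)
{
    /* In this model a hardware-masked line cannot be delivered. */
    assert(!irq->hw_masked);

    if (irq->soft_masked) {
        irq->hw_masked = true;  /* MASK the given interrupt */
        irq->eoi_count++;       /* EOI it */
        return false;           /* return */
    }
    irq->handled++;             /* normal path: run the handler */
    irq->eoi_count++;           /* then EOI */
    return true;
}
```

The point of the design is that the mask bit is only written between
delivery and EOI, which is the window where the text above argues that
the IO APIC cannot race with the CPU.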

  Gérard.


Re: [PATCH] make sym53c8xx.c and ncr53c8xx.c call pci_enable_device

2001-01-31 Thread Gérard Roudier

On Wed, 31 Jan 2001, Alan Cox wrote:

> > If the pci_enable_device() thing is to be added to the drivers, it must
> > preferably be placed after the check against RAID attachment.
> 
> You cant check the signature until the device has been enabled. Maybe you
> could poke the LSI guys and find out how windows PnP stuff handles this.

Personally, I do not need to know how Windows handles this in order to
figure out how it should be properly handled.

In theory, the signature should be checked prior to any change in the
device configuration space. But since the assignment of resource windows
by PCI BIOSes is a complete mess, the O/S has to probe BAR sizes. Probing
BAR sizes does not seem to do any harm. Once that is done, it is possible
to check the signature first and to leave the device alone if it is owned
by RAID, so there is no good reason to tamper with a device that is not
going to be attached.
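
For readers unfamiliar with BAR size probing, here is a minimal sketch of
the usual decoding step, written as plain user-space C. It assumes the
standard scheme (write all ones to a 32-bit BAR, read it back, restore the
old value) and only shows how the read-back value is turned into a size.
The function name bar_size_from_probe() is hypothetical, and the sketch
assumes the device implements all upper address bits, which is not true of
every I/O BAR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the size of a 32-bit BAR from the value read back after
 * 0xFFFFFFFF has been written to it. Returns 0 for an unimplemented BAR.
 */
static uint32_t bar_size_from_probe(uint32_t readback)
{
    uint32_t mask;
    uint32_t size;

    if (readback & 0x1u)
        mask = readback & ~0x3u;   /* I/O BAR: bits 1:0 are not address bits */
    else
        mask = readback & ~0xFu;   /* MEM BAR: bits 3:0 are type/prefetch bits */

    if (mask == 0)
        return 0;                  /* nothing decoded: BAR not implemented */

    size = ~mask + 1u;             /* lowest writable address bit gives the size */

    /* Assumption of this sketch: the decoded size is a power of two. */
    assert((size & (size - 1u)) == 0);
    return size;
}
```

For example, a memory BAR that reads back 0xFFFFF000 decodes as a 4096-byte
window, and an I/O BAR that reads back 0xFFFFFF01 as a 256-byte one.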

> I think the proposed change is ok because if the board is enabled then the 
> enabling code won't touch it.

Can you swear that the code will never change?

Anyway, the current code at least enables response to IO and to MEM based
on existing BARs. This probably does no harm, but it should be done on
behalf of the software driver and based on the resources actually _used_.
On the other hand, the complex enabling of the IRQ is really something we
want to avoid in situations where it is not actually useful.

End of story, since moving pci_enable_device() to its preferred place, as I
suggested, is an obvious minute change.

  Gérard.


Re: [PATCH] make sym53c8xx.c and ncr53c8xx.c call pci_enable_device (241p11)

2001-01-29 Thread Gérard Roudier




You missed that SYMBIOS devices can be attached by RAID firmware and that,
in such a situation, the kernel should avoid tampering with the SYMBIOS
device. The [ncr|sym]53c8xx drivers are aware of that, and do not attach
SYMBIOS devices owned by RAID.

If the pci_enable_device() thing is to be added to the drivers, it must
preferably be placed after the check against RAID attachment.

(look for 0x52414944 in the driver source which is the signature tested by
the drivers, or look into beta SYM-2).
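
The ordering argued for above can be sketched as follows. This is a
user-space illustration only: struct fake_pci_dev, fake_enable_device() and
probe_one() are hypothetical stand-ins, and where the real drivers read the
signature from is not stated in this mail. The only value taken from the
text is 0x52414944, which spells "RAID" in ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAID_SIGNATURE 0x52414944u   /* 'R' 'A' 'I' 'D' */

/* Hypothetical stand-in for a device found by the probe loop. */
struct fake_pci_dev {
    uint32_t signature;  /* value read back when testing for RAID ownership */
    bool enabled;        /* set once the device has been enabled */
};

/* Stand-in for the enable step; always succeeds in this model. */
static int fake_enable_device(struct fake_pci_dev *dev)
{
    dev->enabled = true;
    return 0;
}

/* Returns true if the driver should go on and attach the device. */
static bool probe_one(struct fake_pci_dev *dev)
{
    assert(dev != NULL);

    /* Check RAID ownership first, so an owned device is never touched. */
    if (dev->signature == RAID_SIGNATURE)
        return false;

    /* Only a device we intend to attach gets enabled. */
    if (fake_enable_device(dev) != 0)
        return false;

    return true;
}
```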

Gérard.

On Mon, 29 Jan 2001, Rasmus Andersen wrote:

> Hi.
> 
> The following patch makes drivers/scsi/sym53c8xx.c and (by way of
> sym53c8xx_comm.h::sym53c8xx__detect) ncr53c8xx.c do a
> pci_enable_device after finding a device.
> 
> It applies against ac12 and 241p11.
> 
> Comments?
> 
> 
> --- linux-ac11-clean/drivers/scsi/sym53c8xx.c Mon Jan  1 19:23:21 2001
> +++ linux-ac11/drivers/scsi/sym53c8xx.c   Thu Jan 25 23:12:06 2001
> @@ -13294,6 +13294,8 @@
>   ++j;
>   continue;
>   }
> + if (pci_enable_device(pcidev))
> + continue;
>   /* Some HW as the HP LH4 may report twice PCI devices */
>   for (i = 0; i < count ; i++) {
>   if (devtbl[i].slot.bus   == PciBusNumber(pcidev) && 
> --- linux-ac11-clean/drivers/scsi/sym53c8xx_comm.h Mon Oct 16 21:56:50 2000
> +++ linux-ac11/drivers/scsi/sym53c8xx_comm.h  Fri Jan 26 22:54:19 2001
> @@ -2754,6 +2754,8 @@
>   ++j;
>   continue;
>   }
> + if (pci_enable_device(pcidev))
> + continue;
>   /* Some HW as the HP LH4 may report twice PCI devices */
>   for (i = 0; i < count ; i++) {
>   if (devtbl[i].slot.bus   == PciBusNumber(pcidev) && 
> 
> -- 
> Regards,
> Rasmus([EMAIL PROTECTED])
> 
> When C++ is your hammer, everything looks like a thumb.  Steven M. Haflich


Re: [PATCH] make drivers/scsi/sym53c8xx.c check request_region's return code (241p9)

2001-01-28 Thread Gérard Roudier




On Thu, 25 Jan 2001, Rasmus Andersen wrote:

> Hi.
> 
> I apparently forgot to cc the lists on this one. Replies should be cc'ed
> to [EMAIL PROTECTED] also.
> 
> Thanks.

The change should do no harm, but request_region() is very unlikely to fail
here. The reason is that the drivers have already performed a
check_region() in order to synchronize with any other driver that may have
attached the same device (candidates are: sym53c8xx, ncr53c8xx, 53c7,8xx).

By the way, as PCI-only device drivers, the [sym|ncr]53c8xx drivers want
to use MMIO and not normal IO. The 'normal IO' path is there for archs that
do not want to accept MMIO. Note that there are a couple of strange things
here:

1) The FreeBSD sym driver (derived from sym53c8xx) works with MMIO on
   FreeBSD/Alpha, but sym53c8xx fails with MMIO under Linux/Alpha.
2) The PowerPC port wants drivers to use normal IO even on machines that
   only have MMIO.

(2) looks more like a weirdness than a strangeness. :-)

Gérard.

> - Forwarded message from Rasmus Andersen <[EMAIL PROTECTED]> -
> 
> Date: Tue, 23 Jan 2001 23:37:14 +0100
> From: Rasmus Andersen <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: [PATCH] make drivers/scsi/sym53c8xx.c check request_region's return code (241p9)
> User-Agent: Mutt/1.2.4i
> 
> Hi.
> 
> The following patch makes drivers/scsi/sym53c8xx.c check the return
> code of request_region. It applies cleanly against ac10 and 241p9.
> 
> Please comment.
> 
> 
> 
> --- linux-ac10-clean/drivers/scsi/sym53c8xx.c Mon Jan  1 19:23:21 2001
> +++ linux-ac10/drivers/scsi/sym53c8xx.c   Sun Jan 21 21:40:54 2001
> @@ -5817,7 +5817,11 @@
>   */
>  
>   if (device->slot.io_port) {
> - request_region(device->slot.io_port, np->base_ws, NAME53C8XX);
> + if (!request_region(device->slot.io_port, np->base_ws, 
> + NAME53C8XX)) {
> + printk(KERN_ERR "Cannot mmap IO range.\n");
> + goto attach_error;
> + }
>   np->base_io = device->slot.io_port;
>   }
> 
> --- linux-ac10-clean/drivers/scsi/sym53c8xx_comm.h Mon Oct 16 21:56:50 2000
> +++ linux-ac10/drivers/scsi/sym53c8xx_comm.h  Mon Jan 22 21:56:46 2001
> @@ -1799,7 +1799,8 @@
>   **Get access to chip IO registers
>   */
>  #ifdef NCR_IOMAPPED
> - request_region(devp->slot.io_port, 128, NAME53C8XX);
> + if (!request_region(devp->slot.io_port, 128, NAME53C8XX))
> + return;
>   devp->slot.base_io = devp->slot.io_port;
>  #else
>   devp->slot.reg = (struct ncr_reg *) remap_pci_mem(devp->slot.base, 128);
> 
> 
> -- 
> Regards,
> Rasmus([EMAIL PROTECTED])
> 
> It isn't pollution that's harming the environment. It's the impurities in
> our air and water that are doing it.  -Former U.S. Vice-President Dan
> Quayle
> 
> - End forwarded message -
> 
> -- 
> Regards,
> Rasmus([EMAIL PROTECTED])
> 
> Freedom of the press is limited to those who own one.
>  - A.J. Liebling 
> 


Re: Poor SCSI drive performance on SMP machine, 2.2.16

2001-01-28 Thread Gérard Roudier

On Sun, 28 Jan 2001, paradox3 wrote:

> I have an SMP machine (dual PII 400s) running 2.2.16 with one 10,000 RPM IBM
> 10 GB SCSI drive
> (AIC 7890 on motherboard, using aic7xxx.o), and four various IDE drives. The
> SCSI drive
> performs the worst. In tests of writing 100 MB and sync'ing, one of my IDE
> drives takes 31 seconds. The SCSI drive (while doing nothing else) took
> 2 minutes, 10 seconds. This is extremely noticable in file transfers that
> completely
> monopolize the SCSI drive, and are much slower than when involving the IDE
> drives.
> After a large data operation on the SCSI drive, the system will hang for
> several minutes.
> Anyone know what could be causing this? Thanks.
> 
> Attached are some data to help.

You didn't provide enough information, in my opinion, for anybody to have
any idea about the cause of the problem you report.

Any 10,000 RPM disk that is not too old is able to sustain more than 25 MB/s
of sequential data transfer, but cannot do better than 5 milliseconds of
latency with random IO patterns. So a 100 MB transfer can take less than
4 seconds in the best case, but more than (25000*5)/1000=125 seconds for a
random 4K IO pattern, for example.
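
The two bounds above are simple arithmetic, and can be reproduced with the
small C sketch below. The helper names are hypothetical, and the sketch
assumes decimal units (100 MB = 100000 KB), which is what gives the 25000
I/Os of 4K used in the estimate.

```c
#include <assert.h>

/* Best case: one purely sequential transfer at a sustained rate. */
static double sequential_seconds(double total_mb, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return total_mb / mb_per_s;
}

/* Worst case: every single I/O pays the full random-access latency. */
static double random_seconds(double total_kb, double io_kb, double latency_ms)
{
    assert(io_kb > 0.0);
    double io_count = total_kb / io_kb;      /* 100000 / 4 = 25000 I/Os */
    return io_count * latency_ms / 1000.0;   /* milliseconds to seconds */
}
```

With these, sequential_seconds(100.0, 25.0) gives 4 seconds and
random_seconds(100000.0, 4.0, 5.0) gives 125 seconds, so a 2 minute 10
second result is what a fully random 4K pattern would be expected to cost.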

What you want to do, in my opinion, could be:

- Check in the syslog the actual transfer speed that has been negotiated
  for your SCSI disk.
- Also check if error messages related to disk IOs have been reported by 
  the kernel.
- Run some benchmark to check, at least, sequential IO performance (iozone
  for example will fit)

Gérard.


Re: Scanning problems - machine lockups

2001-01-19 Thread Gérard Roudier




On Fri, 19 Jan 2001, Bob Frey wrote:

> On Thu, Jan 18, 2001 at 11:24:54PM +, Stephen Kitchener wrote:
> > The only thing that might be odd is that the scanner's scsi card and the 
> > display card are using the same IRQ, but I thought that IRQ sharing was ok in 
> > the new kernels. The display card is an AGP type and the scsi card is pci.
> >
> > As you might have guessed, I am at a loss as to what to do next. Any help 
> > appriciated, even suggestions as to how I can track down what I haven't done 
> > (yet!)
> Sharing interrupts could be the problem. Interrupt sharing is supported
> in the kernel as far as two different drivers being able to register a
> handler for the same interrupt, but not much beyond that. From studying
> the code I don't find any handling of unclaimed or spurious interrupts.
> 
> Some drivers (like video cards) do not register a handler for their card's
> interrupt. So when another driver (like the advansys driver) shares an
> interrupt with this card's "unregistered" interrupt there is no one left
> to handle the interrupt. The system will loop taking an interrupt from
> the card. I've observed this using the frame buffer driver. Note: this
> problem is unnoticed if the (video) card does not share an interrupt with
> another driver, because (at least on x86) Linux does not enable the
> PIC IRQ bit for IRQs that do not have registered interrupted handlers.
> 
> For Linux I think the right way to handle this is to have each (SA_SHIRQ)
> sharing capable interrupt handler return a TRUE or FALSE value indicating
> whether the interrupt belongs to the driver. In kernel/irq.c:handle_IRQ_event()
> check the return value. If after one pass through all of the interrupt
> (action) handlers no one has claimed the inerrupt then log a warning message
> (spurious interrupt) and clear the interrupt. The difficult/painstaking
> problem is that all SA_SHIRQ drivers need to be changed to return a return
> value to make this work.
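
The proposal quoted above can be sketched as a small user-space model. It
only illustrates the quoted idea, not any actual kernel code: struct
irq_action here is a simplified stand-in, and handle_shared_irq() and
flag_handler() are hypothetical names. Note that the reply below argues
that, on PCI, an interrupt nobody claims is not necessarily spurious.

```c
#include <assert.h>
#include <stddef.h>

/* A sharing-capable handler returns 1 if the interrupt was its own, else 0. */
typedef int (*shared_handler_t)(void *dev);

/* Simplified chain of handlers registered on one shared IRQ line. */
struct irq_action {
    shared_handler_t handler;
    void *dev;
    struct irq_action *next;
};

/*
 * Make one pass through all handlers on the line and return how many
 * of them claimed the interrupt. A result of 0 is the case where the
 * proposal would log a warning and clear the interrupt.
 */
static int handle_shared_irq(struct irq_action *chain)
{
    int claimed = 0;

    for (struct irq_action *a = chain; a != NULL; a = a->next) {
        assert(a->handler != NULL);
        if (a->handler(a->dev))
            claimed++;
    }
    return claimed;
}

/* Example handler: the "device" is just an int holding a pending flag. */
static int flag_handler(void *dev)
{
    int *pending = dev;

    if (!*pending)
        return 0;     /* not ours */
    *pending = 0;     /* acknowledge the condition */
    return 1;         /* ours */
}
```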

There is no ordering of interrupts with respect to transactions in PCI.
As a result, a driver can get an interrupt that does not match any pending
interrupt condition it can see, without the interrupt being spurious.

As a result, the 2 following assertions:
- All interrupts in PCI are spurious
- No interrupt in PCI is spurious
are less wrong than asserting that some interrupts in PCI are relevant and
some are spurious. :-)

And btw, some hardware, notably from Intel, seems to ensure coherency
prior to delivering interrupts. This is useless work when the IRQ is
actually shared; it only makes sense for ISA or ISA-like PCI devices, in
situations where the IRQ is not actually shared.

> Anyway the simplest solution for you is probably if you can is to put
> assign the video card its own interrupt. Putting the two advansys cards
> on the same interrupt is fine. I have used interrupt sharing between
> multiple advansys cards and and ethernet cards without a problem.

In theory, the O/S should warn _loudly_ if any PCI device has no software
driver attached, because there is no generic way to completely quiesce a
PCI device. As a result, loading drivers after boot, or just loading
drivers with interrupts enabled at boot, is unsafe with PCI devices. This
should be kept in mind, even if the risk of breakage is generally very low.

  Gérard.


Re: sym-2.1.0-20001230 vs. sg (cdrecord)

2001-01-11 Thread Gérard Roudier

On Thu, 11 Jan 2001, Boszormenyi Zoltan wrote:

> Hi!
> 
> I just wanted to let you know that I successfully ruined
> a CD with 2.4.0 + sym-2.1.0-20001230. The system is a RH 7.0
> with glibc-2.2-9, cdrecord-1.9.

Thanks for the report.
But with so little information, it is of almost no use to me.

> When will it be really usable?

A single ruined CD is probably too weak a symptom to diagnose any serious
sickness in the driver. FYI, I cannot even personally try to ruin a single
CDR, because I don't have a CD recorder.

If you can retrieve information related to the failure (syslog messages,
cdrecord output messages, etc.), you may send it to me. Thanks in
advance. You may also try a stable kernel and related stuff and 
let me know the result.

Regards,
  Gérard.


SYM-2 driver released (:=sym53c8xx+ncr53c8xx).

2000-12-30 Thread Gérard Roudier



I just released the sym-2.1.0 driver which, according to my personal QA 
plan :-), is the first beta release of this major driver version.

People interested in either using or just trying it can find the
reference tarball at the following URL:

ftp://ftp.tux.org/roudier/drivers/portable/sym-2.1.x/sym-2.1.0-20001230.tar.gz

This driver functionally replaces both sym53c8xx and ncr53c8xx.
It is in fact the FreeBSD sym driver, made portable, which for now 
also supports Linux.

The driver reference sources layout is the following:

Common:
sym_conf.h   sym_defs.h  sym_fw.c    sym_fw.h
sym_fw1.h    sym_fw2.h   sym_hipd.c  sym_hipd.h
sym_malloc.c sym_misc.h  sym_nvram.c

FreeBSD:
sym_glue.c   sym_glue.h

Linux:
sym53c8xx.h  sym_glue.c  sym_glue.h

All the files can also be clicked/clipped :) individually from the 
following directory:

   ftp://ftp.tux.org/roudier/drivers/portable/sym-2.1.x/current/

Given the genealogy of this driver, I have decided to maintain a high 
level of compatibility with the sym53c8xx driver under Linux.
But, due to the number of source files (14 under Linux), the driver 
sources will now have their own directory instead of being dropped in 
the huge drivers/scsi/ directory.

The installation procedure supplied in the tarball moves the files to:

   /usr/linux/drivers/scsi/sym53c8xx/

As a result, a tiny patch is needed for the related kernel files to 
be aware of the new driver files location. And, as I have limited 
time, only patches for 2.2.16, 2.2.17 and 2.2.18 are supplied for now.

People who succeed in installing the driver on other Linux kernel 
releases, especially recent ones, can send me the corresponding tiny 
kernel patch. Btw, this driver does not support Linux-2.0.X kernels.

The major improvements over the sym53c8xx driver can be summarized 
as follows:

- It no longer uses the scsi_obsolete interface.
  I could word it as 'uses the new error handling interface', but the best 
  advantage, in my opinion, is that driver entry points are no longer 
  called recursively, as they are by the old scsi code.

- Support for the entire NCR/SYMBIOS/LSILOGIC 53C[8XX|1010] family in a 
  single driver without significant bloat of the object code.
  The driver with all options enabled is about 73K not stripped and 59K 
  stripped under Linux-2.2.18.

- Refinement of a couple of workarounds, which lets me claim that the 
  driver supports all chips of all revisions as well as possible, even 
  very early revisions of recent chips.

I am highly interested in receiving reports, of either success or problems, 
about this driver version, especially when the driver is tried on platforms 
other than Intel IA32.

  Gérard.


Re: Adaptec AIC7XXX v 6.0.6 BETA Released

2000-12-14 Thread Gérard Roudier

On Wed, 13 Dec 2000, Justin T. Gibbs wrote:

> >   Date: Wed, 13 Dec 2000 20:56:08 -0700
> >   From: "Justin T. Gibbs" <[EMAIL PROTECTED]>
> >
> >   None-the-less, it seems to me that spamming the kernel namespace
> >   with "current" in at least the way that the 2.2 kernels do (does
> >   this occur in later kernels?) should be corrected.
> >
> >Justin, "current" is a pointer to the current thread executing on the
> >current processor under Linux.  It has existed since day one of the
> >Linux kernel and probably will exist till the end of it's life.
> >
> >I'm sure the BSD kernel has some similar bogosity :-)
> 
> BSD has curproc, but that is considerably less likely to be
> used in "inoccent code" than "current".  I mean, "current what?".
> It could be anything, current privledges, current process, current
> thread, the current time...

"buf, buffers, type, version" (of what?) in the FreeBSD kernel (if they
still exist), but they are global variables, not macros.

By the way, SYM-2, that is "FreeBSD sym back to Linux but still in
FreeBSD":), clashed with Linux "current" as well. The reason is that the
corresponding code was based on yours :)  (as indicated in the sym driver
source). I have replaced "current" with "curr". This is just as clear and
has the advantage of lining up better with "user" and "goal" (4 characters
each).

   tinfo.goal
   tinfo.user
   tinfo.curr

Just a suggestion.
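
As an illustration of the naming, here is a minimal C sketch. Only the
three member names goal, user and curr come from the text above; struct
xfer, its fields, the meaning given to each member in the comments, and
tinfo_needs_nego() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer settings; the real fields are not given here. */
struct xfer {
    unsigned period;
    unsigned offset;
};

/* Per-target information with three member names of equal length. */
struct tinfo {
    struct xfer goal;   /* assumed: settings we want to reach */
    struct xfer user;   /* assumed: limits chosen by the user */
    struct xfer curr;   /* assumed: settings in effect now; a member named
                           "current" would collide with the Linux kernel
                           macro of the same name */
};

/* Returns 1 when the settings in effect differ from the goal. */
static int tinfo_needs_nego(const struct tinfo *t)
{
    assert(t != NULL);
    return t->curr.period != t->goal.period ||
           t->curr.offset != t->goal.offset;
}
```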

  Gérard.


Re: Signal 11 - the continuing saga

2000-12-13 Thread Gérard Roudier

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> 
> 
> Ehh, I think I found it.
> 
> Hint: "ptep_mkdirty()".
> 
> Oops.
> 
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that

Poor European Gérard as slim as 1.84 meter - 78 Kg these days.
What about old days poor European Linus versus these days American Linus
on these points ? ;-)

> this explains it.

Really ? :o)

>   Linus

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-12 Thread Gérard Roudier




On Tue, 12 Dec 2000, David S. Miller wrote:

>Date: Tue, 12 Dec 2000 20:17:21 +0100 (CET)
>From: Gérard Roudier <[EMAIL PROTECTED]>
> 
>On Mon, 11 Dec 2000, David S. Miller wrote:
> 
>> Tell me one valid use of this information first :-)
> 
>SCRIPTS. Have a look into my kind :-) response to Martin.
> 
> Ok, this I understand.
> 
>> b) If you wish to interpret the BAR values and use them from a BUS
>>perspective somehow, you still need to go through some interface
>>because you cannot assume what even the hw BAR values mean.
>>This is precisely the kind of interface I am suggesting.
> 
>The BAR values make FULL sense on the BUS.
> 
> I am saying there may be systems where it does not make any sense,
> f.e. actually used bits of BAR depend upon whether CPU, or DEVICE on
> that bus, or DEVICE on some other bus make the access.
>
> Forget all the PCI specifications, it is irrelevant here.  All your
> PCI expertiece means nothing, nor mine.  People build dumb machines
> with "PCI implementations" and we need to handle them.

Even the dumbest PCI implementation has to keep BARs meaningful. The reason
is that PCI devices use the BAR values and the corresponding sizes to decide
whether or not to claim a transaction as target.
You can be as dumb as you want with PCI, but not that dumb. :-)
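
The claim decision described above amounts to an address comparison. Here
is a minimal C sketch of it, assuming a 32-bit memory BAR whose window size
is a power of two and whose base is aligned on that size; bar_claims() is a
hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the comparator a PCI target applies to each address it sees
 * on the BUS: the transaction is claimed when the address falls inside
 * the window described by the BAR base and size.
 */
static bool bar_claims(uint32_t bar_base, uint32_t size, uint32_t bus_addr)
{
    /* Assumptions of this sketch: power-of-two size, aligned base. */
    assert(size != 0 && (size & (size - 1u)) == 0);
    assert((bar_base & (size - 1u)) == 0);

    /* Only the address bits above the window size are compared. */
    return (bus_addr & ~(size - 1u)) == bar_base;
}
```

This is why the address used by a bus master has to be the one actually
programmed in the BAR: the target compares against that value and nothing
else.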

>I will wait for your .txt file that describes your idea. Your
>documentation about the new DMA mapping had been extremely useful.
>Let me thank you again for it.
> 
> It requires no .txt file :-), 

No problems, a ".text" file would also fit just fine. :-)

> it will just be formalization of
> existing bus_to_dvma_whatever hack :-) Specify PDEV (device) and
> RESNUM (which I/O or MEM resource for that device), returns either
> error or address as seen by BUS that PDEV is on.  You may offset
> this return value as desired, up to the size of that resource.
> 
> I could make a more elaborate interface (add new parameter,
> PDEV_MASTER which is device which wishes to access area described by
> PDEV+RESNUM), allowing full PCI peer-to-peer setup, as described by
> someone else in another email of this thread.  This version would have
> an error return, since there will be peer2peer situations on some
> systems which cannot be made.  But I feel this is inappropriate until
> 2.5.x, others can disagree.

I saw the proposal.

Btw, unlike the person who proposed it, who will only be able to test
peer-to-peer inability, my current machine will only allow testing
peer-to-peer ability between 2 different PCI BUSes. :-)

For now, my intention is to encapsulate the right interface, as seen from
my brain device, in macros and forget about it until a new interface is
provided. I will first implement it in SYM-2 and backport the changes to
sym53c8xx later. And since I need the new major driver version to be
tested on non-Intel platforms, this will give full synergy for the
testing. :-)

Bye,
  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-12 Thread Gérard Roudier




On Mon, 11 Dec 2000, David S. Miller wrote:

>Date: Mon, 11 Dec 2000 23:07:01 +0100 (CET)
>From: Gérard Roudier <[EMAIL PROTECTED]>
> 
>So, if you want to fix this insane PCI interface:
> 
>1) Provide the _actual_ BARs values in the pci dev structure, otherwise 
>   drivers that need them will have to deal with ugly hackery or access 
>   explicitely the PCI configuration space.
> 
> Tell me one valid use of this information first :-)

SCRIPTS. Have a look into my kind :-) response to Martin.

By the way, the genuine physical addresses, alias pcidev cookies, as seen
from the CPU have exactly NO USE at all, except as input for ioremap().
Drivers can throw them away after that. So, given a correct design,
drivers should not even have to deal with them.

> a) If you want to use it to arrive at addresses MEM I/O operations
>you need to go through something akin to ioremap() first anyways.

ioremap() is the historical successor of vremap(). Without vremap(), it
may well never have existed.

> b) If you wish to interpret the BAR values and use them from a BUS
>perspective somehow, you still need to go through some interface
>because you cannot assume what even the hw BAR values mean.
>This is precisely the kind of interface I am suggesting.

The BAR values make FULL sense on the BUS.

>Consider even just that top few bits of BAR values on some system
>have some special meaning, and must be masked out before used from
>PCI device side transactions.  Perhaps these bits are interpreted
>somehow at the host bridge when CPU accesses to device MEM or I/O
>space are made.  I argue not that this is compliant behavior, I
>argue only that it is something idiots designing hardware will in
>fact do.  We have seen worse things occur.  Now, subsequently, if
>we start using raw BARs in drivers these systems (however important
>or not important) will become difficult to impossible to support.
>Here the blacklists will end up in your driver, which is where I
>think both of us will agree they should not be :-)

Read my reply to Martin on that point. 

>2) Provide an interface that accepts the PCI dev and the BAR offset as
>   input and that return somes cookie for read*/write* interface.
> GiveMeSomeCookieForMmIo(pcidev, bar_offset).
> 
> I do not understand why ioremap() is such a bletcherous interface
> for you :-)  You take resource in PDEV, add desired offset, and pass
> it to ioremap().  What about this sequence requires you to take pain
> killers? :-)  It seems quite straightforward to me.

I can live perfectly with ioremap(). :-))

> We do not want to expose physical BARs because you as a driver have
> no way to portably interpret this information.  On the other hand
> if you tell us "Given PDEV resource X, plus offset Y, give me this
> address in BUS space" we can do that and that is the interface that
> makes sense and is implementable on all architectures.  This is what
> I am proposing for adding asm/pci.h
> 
> Having people read and intepret BARs is not implementable on all
> architecures (see discussion in (b) above).
> 
> I guess there is some fundamental reason you do not like the kernel
> trying to discourage access to physical BARs.  This makes things so
> much easier and cleaner, at least to me.
> 
> I bet we end up in standstill here and ifdef hacks remain in symbios
> drivers :-)))  We will see...

I will wait for your .txt file that describes your idea. Your
documentation about the new DMA mapping has been extremely useful.
Let me thank you again for it.

Bye,
  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-12 Thread Gérard Roudier

On Tue, 12 Dec 2000, Martin Mares wrote:

> Hello!
> 
> > It is the bar cookies in pci dev structure that are insane, in my opinion.
> > 
> > If a driver needs BARs values, it needs actual BARs values and not some
> > stinking cookies. What a driver can do with BAR cookies other than using
> > them as band-aid for dubiously designed kernel interface.
> 
> If a driver wants to know the BAR values, it can pick them up in the configuration
> space itself. The problem is that these values mean absolutely nothing outside

The return value makes FULL sense on the BUS, on which _real_ PCI
transactions will happen for old SYMBIOS devices, and it will hint recent
ones to use internal cycles instead (which are PCI 2.2 compliant) for
accessing the on-chip RAM.

As seen from the BUS and thus from the PCI device, all the opaque
inventions of O/Ses are just irrelevant sci-fi.

By the way, the hack that used bus_dvma_to_mem() from the BAR cookies is
not from me, but from David S. Miller. This will be fixed as you suggest.

> the bus the device resides on. There exist zillions of translating bridges
> and no driver except for the driver for the particular bridge should ever
> assume anything about them.

You seem to know PCI well but, in my opinion, you still have much to learn
about it and about what reality is.

You should repeat a hundred times:

"It is neither Gérard nor the sym driver that wants to know about
 BARs"

But,

"It is these damned PCI specifications that based everything on 
 actual BUS address comparators, and the NCR/SYMBIOS engineers 
 who based memory related SCRIPTS instructions on actual addresses
 as seen from the BUS, and btw, as suggested by the specifications."

> The values in pci_dev->resource[] are not some random cookies, they are
> genuine physical addresses as seen by the CPU and as accepted by ioremap().

These cookies are very confusing, and useless given a correct design of
the related kernel interfaces. There is plenty of room in the pcidev
structure for private information that would have avoided these stupid
cookies.

In fact, these cookies are still there for historical reasons, from the
time when MMIO-capable PCI device driver(s) had to use vremap() on actual
BAR addresses. This only worked on Intel.

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-11 Thread Gérard Roudier

On Mon, 11 Dec 2000, David S. Miller wrote:

>Date: Mon, 11 Dec 2000 22:30:59 +0100 (CET)
>From: Gérard Roudier <[EMAIL PROTECTED]>
> 
>On Mon, 11 Dec 2000, David S. Miller wrote:
> 
>> Really, in 2.4.x sparc64 requires PCI config space hackery no longer.
> 
>Really?
> 
>I was thinking about the pcivtophys() alias bus_dvma_to_mem() hackery used
>to retrieve the actual BAR address from the so-called pcidev bar cookies.
> 
> Really :-)  This conversation was about drivers making modifications
> to PCI config space areas which are being argued to be only modified
> by arch-specific PCI support layers.  That is the context in which
> I made my statements.

Was more general in my opinion. :-)

> Interpreting physical BAR values is another issue altogether.  Kernel
> wide interfaces for this may be easily added to include/asm/pci.h
> infrastructure, please just choose some sane name for it and I will
> compose a patch ok? :-)

Really? :-)

It is the bar cookies in pci dev structure that are insane, in my opinion.

If a driver needs BAR values, it needs the actual BAR values and not some
stinking cookies. What can a driver do with BAR cookies, other than use
them as a band-aid for a dubiously designed kernel interface?

BUT, a driver does not care about the handles passed to read*/write* and
friends, and should not have to care. Using a cookie, handle, tag or
whatever else means 'the user should not worry about it, but just pass it
when needed' is good here.

So, if you want to fix this insane PCI interface:

1) Provide the _actual_ BAR values in the pci dev structure, otherwise 
   drivers that need them will have to deal with ugly hackery or access 
   the PCI configuration space explicitly.

2) Provide an interface that accepts the PCI dev and the BAR offset as
   input and that returns some cookie for the read*/write* interface.
   GiveMeSomeCookieForMmIo(pcidev, bar_offset).

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-11 Thread Gérard Roudier

On Mon, 11 Dec 2000, David S. Miller wrote:

>Date:  Mon, 11 Dec 2000 21:49:52 +0100 (CET)
>From: Gérard Roudier <[EMAIL PROTECTED]>
> 
>If now, the PCI stuff is claimed to be cleaned up, then _all_ the
>hacks have to be removed definitely.  As a result, the driver will
>not work anymore on Sparc64, neither on PPC and I am not sure it
>will still work on Alpha, in my opinion.
> 
> Actually Gerard, in your current 2.4.x NCR53c8xx and SYM53c8XX drivers
> only real ifdefs for sparc64 are printf format strings for PCI interrupt
> numbers :-)
> 
> Really, in 2.4.x sparc64 requires PCI config space hackery no longer.

Really?

I was thinking about the pcivtophys() alias bus_dvma_to_mem() hackery used
to retrieve the actual BAR address from the so-called pcidev bar cookies.

As you know the driver needs to know the actual values of MEM BARs, since
SCRIPTS may access either the IO registers and/or the on-chip RAM using
non sci-fi but actual BUS addresses (those that are actually used by PCI
transactions and that devices compare against their BARs in order to
claim the accesses targeted at them).

Even for chips that do not actually master themselves (896 for example),
due to LOAD/STORE and using internal cycles to access the on-chip RAM,
we need the actual on-chip RAM BAR address.

Note that if reading the BARs using pci_read_config_*() interface is
allowed, then the pcivtophys() is and was a useless thing.

About the PPC, it is the memcpy_toio() for the on-chip RAM that does not
work using the iomapped BAR cookie. The driver has to use SCRIPTS that do
self-mastering, but self-mastering is no longer compliant with PCI-2.2 as
we know.

About the Alpha. The pcivtobus/bus_dvma_to_mem thing in the driver, is not
defined as just nilpotent, but in fact it is so (d & 0x) at least
for 32 bit scsi-fi cookies.

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-11 Thread Gérard Roudier

On Mon, 11 Dec 2000, Martin Mares wrote:

> Hello Gerard!
> 
> > Having to call some pdev_enable_device() to have the cache line size
> > configured looks like shit to me. After all, the BARs, INT, LATENCY TIMER,
> > etc.. are configured prior to entering driver probe.
> 
> Once upon a time, they used to be, but they no longer are. Unfortunately, there
> are lots of bogus devices which must be never assigned BARs nor routed
> interrupts.

Hmmm... Because of broken devices you move the burden to fine ones. Let me
disagree here.

> You need to call pci_enable_device() after you recognize the device
> as handled by your driver to get BARs and interrupts set up. Also, if your
> driver uses bus mastering, it should call pci_set_master().

I do not see how pci_enable_device() knows better than the tailored
software driver what is to be enabled for the device and what must not
or should not be. And, btw, when pci_enable_device() was born, it just
shoe-horned latency timer 64 if it was lower than 16, and I didn't want
to use an interface that was so obviously loose in the first place.

> > Why should the cache line size be deferred to some call to some obscure
> > mismaned thing ?
> 
> See above.  I'm also not joyfully jumping when I think of it, but consider
> it being a tax on being compatible with the rough world of buggy PCI devices.

Blacklists are there to preserve simplicity and genericity for compliant
devices and it has the advantage of showing in a single place what and how
these devices are broken.

On the other hand, I do not see `pci_enable_device()' in the latest 2.2
kernels. I seem to believe that the millions of Linux systems used around
the world run 2.2 rather than 2.4. All these buggy PCI devices breaking
millions of Linux 2.2 kernels should make some noise. What did I miss here?

> Anyway, it's still zillion times better than random drivers modifying such
> configuration registers in random manner, knowing nothing about the host
> bridge and other such stuff.

Making drivers work on Linux has been a nightmare for years, especially
PCI-SCSI ones, due to limitations and brokenness of related kernel services.

> (Side note: I'm not saying the method your driver uses was bad at the time
> it was designed, I'm only saying that it's wrong wrt. the rest of the kernel
> and it should be gone.)

The driver is not calling pci_enable_device() but it does not change
anything by default for Intel (the only one I support personally) if it is
provided with a correct configuration. I have used it on various Intel
platforms, even without SDMS BIOS and the only configuration item I have
seen missing sometimes has been the offending PCI cache line size.

When the driver was tinkered with (not by me) in order to run on non-Intel
platforms, I was provided with various ugly PCI-related hacks that I
have incorporated with appropriate #ifdef(s), since I couldn't even test
them.

If now, the PCI stuff is claimed to be cleaned up, then _all_ the hacks
have to be removed definitely.
As a result, the driver will not work anymore on Sparc64, nor on PPC,
and I am not sure it will still work on Alpha, in my opinion.

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-11 Thread Gérard Roudier

On Mon, 11 Dec 2000 [EMAIL PROTECTED] wrote:

> On Mon, 11 Dec 2000, Jamie Lokier wrote:
> 
> > Here are a few more:
> > 
> >  net/acenic.c: pci_write_config_byte(ap->pdev, PCI_CACHE_LINE_SIZE,
> 
> Acenic is at least setting it to the correct values, not hardcoding it.
> 
> >  net/gmac.c: PCI_CACHE_LINE_SIZE, 8);
> 
> Ick.
> 
> >  scsi/sym53c8xx.c: printk(NAME53C8XX ": PCI_CACHE_LINE_SIZE set to %d (fix-up).\n",
> 
> **vomit**

A BASTARD you are. Linux was born thanks to volunteers that spent
thousands of hours on their free time for helping development. If you
vomit on me, let me shit on you.

> On the plus side, they made it arch independant. Shame it's incomplete.
> If you look at the x86 path, its missing Pentium 4 support (x86==15).

Most of the code in Linux was there years before the Pentium 4 that,
by the way, looks like the buggiest thing that has ever existed.

> It also screws up on Athlon where it should be set to 16, but gets 8.

Same for this one.

> I wouldn't be surprised if the other arch's were missing some definitions
> too.  The fact that this driver is a port of FreeBSD driver may be the
> reason why SMP_CACHE_BYTES wasn't used instead, and the author opted
> for that monster. But still, the whole thing is completely unnecessary.

The driver is back to FreeBSD and is intended to go to other free O/Ses as
I find time for it.

[...]

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-11 Thread Gérard Roudier

On Mon, 11 Dec 2000, Jamie Lokier wrote:

> Here are a few more:
> 
>  net/acenic.c: pci_write_config_byte(ap->pdev, PCI_CACHE_LINE_SIZE,
>  net/gmac.c: PCI_CACHE_LINE_SIZE, 8);

>  scsi/sym53c8xx.c: printk(NAME53C8XX ": PCI_CACHE_LINE_SIZE set to %d (fix-up).\n",

For this one, this happens on Intel:

- ONLY if PCI cache line size was configured to ZERO (i.e. not
  configured).

 AND

- ONLY if user asked for this through the boot command line.

Anyway, the driver WARNs the user if it shoe-horns some value, as you can
see above.

Btw, there is a single case where using MWI is a workaround.

Given that all known systems have a known PCI CACHE LINE SIZE for L2/L3,
if POST software + O/S PCI driver are loose enough not to provide the
RIGHT value of the PCI CACHE LINE SIZE for devices that support it, what
can software drivers do?

Maybe they should just refuse to attach the device, at least when this
information _must_ be known in order to work around a device problem. This
would remove some ugly code for non-Intel platforms from the sym53c8xx
source, by the way.

Having to call some pdev_enable_device() to have the cache line size
configured looks like shit to me. After all, the BARs, INT, LATENCY TIMER,
etc. are configured prior to entering driver probe. Why should the cache
line size be deferred to some call to some obscure misnamed thing?

[...]

  Gérard.


Re: pdev_enable_device no longer used ?

2000-12-09 Thread Gérard Roudier

On Sat, 9 Dec 2000, Alan Cox wrote:

> > If/When x86 (or all?) architectures use this, will it make sense to
> > remove the PCI space cache line setting from drivers ?
> > Or is there borked hardware out there that require drivers to say
> > "This cacheline size must be xxx bytes, anything else will break" ?
> 
> If there is surely the driver can override it again before enabling the
> master bit or talking to the device ?

Configuring PCI cacheline size with a value that is a multiple of the
right value should not break. MWIs will still write whole cache lines and
MRL and MRM may prefetch more data but this should be harmless.

But configuring a device for a value lower than the right value of the
cache line size will break if the hardware actually invalidates cache lines
on MWI. Bridges that alias MWI to MW will obviously not be harmed by such
a misconfiguration.

As a result, in my opinion:

- A device that requires some non-zero cache line size value lower than
the right value for a given system and that actually uses MWIs must not be
supported on that system, unless we know that the bridge does alias MWI to
MW. (If such a device can be configured not to use MWI, any value for
the PCI cache line size will not break.)

- A driver that blindly shoe-horns some value for the cache line size must
be fixed. Basically, it should not change the value if it is not zero and,
at least, warn the user if it has changed the value because it was zero.
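
The two rules above can be sketched as a pair of small helpers. This is an
illustration only, not code from any driver: the function names are made up,
both values are assumed to be in the same unit as the PCI register, and
treating an unconfigured (zero) value as "not safe for MWI" is my own
assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Value to leave in the device's cache line size register.
 * 'configured' is what POST left there, 'system' is the right value
 * for the platform. A non-zero configured value is never overridden;
 * a zero one is fixed up with a warning. */
static uint8_t choose_cache_line_size(uint8_t configured, uint8_t system)
{
    if (configured != 0)
        return configured;
    fprintf(stderr, "warning: cache line size was 0, set to %u (fix-up)\n",
            (unsigned)system);
    return system;
}

/* MWI is only considered safe when the configured value is a non-zero
 * multiple of the system value; a lower value may invalidate less than
 * a whole cache line. */
static int mwi_is_safe(uint8_t configured, uint8_t system)
{
    return system != 0 && configured != 0 && configured % system == 0;
}
```

With a system value of 8, a device configured for 16 is left alone and may
use MWI, while a device configured for 4 is left alone but should not.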

What are the strong reasons that lead some POST software not to properly
fill in the cache line size of PCI devices?

  Gérard.


Re: [2*PATCH] alpha I/O access and mb()

2000-12-09 Thread Gérard Roudier

On Sat, 9 Dec 2000, Abramo Bagnara wrote:

> Gérard Roudier wrote:
> > 
> > 
> > Based on that, let me claim that most of blind barriers inserted this way
> > are useless (thus sub-optimal) and may band-aid useful barriers that are
> > missing. The result is subtle bugs, hidden most of the time, that we will
> > have to suffer for decades.
> > 
> > The only way to do things right regarding ordering is to have device
> > drivers _aware_ of such issues. Now, if we are happy with broken portable
> > or platform-independent drivers that rely on broken hidden ordering
> > alchemy rather than on correctness, then it is another story.
> 
> I see perfectly your point and this is the reason why we have
> __raw_write[bwlq] in 2.4, but write[bwlq] expected semantic is to ensure
> that write *happens* and are visible by other agents.

Ordering and flushing are different issues. Hardware may flush in order to
guarantee some order, but if nothing is to order it may not. A memory
barrier does not guarantee you that a write will go faster to the system
BUS. Basically, if no other agent does need the data and the data is
cacheable, the flushing is just useless. Confusing ordering and flushing
is a serious mistake in my humble opinion.

Speaking about MMIO which is not cacheable, indeed we want the data
targeting the MMIO area to be flushed quickly. But we also want the
device to have a consistent view of the data in memory for all IOs it is
provided with. Usually, we have to deal with the following:

1) Prepare some DATA in cacheable memory (DMA related)
2) Ensure ordering: i.e. may insert a MEMORY BARRIER
3) Tell the device about the IO to perform: write to MMIO

Drivers that are unaware of (2) _are_ broken.

Or:

1) Read device status register to know about IO that completed.
2) Ensure ordering of speculative reads against DMA from the device.
   i.e. may insert a MEMORY (READ) BARRIER.
3) Look into memory for IOs that have completed.

Drivers that are unaware of (2) _are_ broken.
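
Both sequences can be modelled in user space with C11 fences. This is only
an analogy under stated assumptions: toy_ring stands in for DMA-able memory,
toy_doorbell and toy_status stand in for MMIO registers, and all names are
invented. C11 fences order memory between CPU threads; a real driver would
have to use the kernel's own barrier primitives, such as the mb() discussed
in this thread, to order accesses as seen by a device.

```c
#include <stdatomic.h>
#include <stdint.h>

struct toy_descriptor {
    uint32_t addr;
    uint32_t len;
};

static struct toy_descriptor toy_ring[4];  /* stands in for DMA memory    */
static _Atomic uint32_t toy_doorbell;      /* stands in for an MMIO write */
static _Atomic uint32_t toy_status;        /* stands in for an MMIO read  */

/* Submission: 1) prepare the data, 2) barrier, 3) tell the device.
 * The barrier sits _before_ the doorbell write, not after it. */
static void toy_submit(uint32_t slot, uint32_t addr, uint32_t len)
{
    toy_ring[slot].addr = addr;
    toy_ring[slot].len = len;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&toy_doorbell, slot + 1, memory_order_relaxed);
}

/* Completion: 1) read the status, 2) barrier, 3) look into memory.
 * Returns 1 and copies the descriptor if 'slot' has completed. */
static int toy_poll(uint32_t slot, struct toy_descriptor *out)
{
    if (atomic_load_explicit(&toy_status, memory_order_relaxed) != slot + 1)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = toy_ring[slot];
    return 1;
}
```

In both functions the fence is placed between the two accesses whose order
matters, which is the placement argued for above.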

Since the hidden BUS stuff just puts its implicit barriers at the wrong
place regarding the above, any device driver that does DMA and that does
not use explicit barriers is likely to be broken even if it uses normal
IOs. The reason is that the PCI specifications also allow host bridges to
post IO transactions, and thus assuming 15-year-old ISA-like behaviour is
plain wrong.

> You can tell me that almost nobody uses __raw_write now and this is bad
> and I agree with you, but sometime this is not a perfect world ;-)

The various BUS abstractions I have to suffer from are indeed a great
demonstration of our world being not perfect. ;-)

Hehe... I read so often that most drivers are broken that shoe-horning
bunches of barriers, bus things and other band-aids is probably the
only way to have some of them mostly usable. ;-)
Or could it be that current O/S guys are still ISA-bussed. ;-)

By the way, given our real world, your patch is probably quite
reasonable. My point was not to disagree with it, in particular.

  Gérard.


Re: [2*PATCH] alpha I/O access and mb()

2000-12-09 Thread Gérard Roudier

On Sat, 9 Dec 2000, Abramo Bagnara wrote:

> 
> Here you are two patches:
> 
> alpha-mb-2.2.diff add the missing mb() to the cores that still lack it
> (against 2.2.18pre25)
> 
> alpha-mb-2.4.diff add missing defines from core_t2.h for non generic
> kernel (against 2.4.0test11)
> 
> Please apply on your trees.
> 
> I've also noted that 2.4 uses mb() after read[bwlq] while 2.2 don't.
> Who's right?

Let me howl for a minute. :-)

The actual issue regarding ordering is generally to ensure something to
happen (i.e. to be seen to happen by other agents) _before_ something
else. As a result, what we have in mind is to insert a barrier _before_
this `something else'.

However, everything I seem to see about this issue on our planet and that
applies to IO subsystems is blindly inserting barriers _after_ the
'something'.

Is software getting sci. fi. ? ;-)

Based on that, let me claim that most of blind barriers inserted this way
are useless (thus sub-optimal) and may band-aid useful barriers that are
missing. The result is subtle bugs, hidden most of the time, that we will
have to suffer for decades.

The only way to do things right regarding ordering is to have device
drivers _aware_ of such issues. Now, if we are happy with broken portable
or platform-independent drivers that rely on broken hidden ordering
alchemy rather than on correctness, then it is another story.

  Gérard.


Re: Defective Red Hat Distribution poorly represents Linux

2000-11-21 Thread Gérard Roudier




On Tue, 21 Nov 2000, David Riley wrote:

> Horst von Brand wrote:
> > 
> > So what? My former machine ran fine with Win95/WinNT. Linux wouldn't even
> > end booting the kernel. Reason: P/100 was running at 120Mhz. Fixed that, no
> > trouble for years. Not the only case of WinXX running (apparently?) fine
> > on broken/misconfigured hardware I've seen, mind you.
> 
> This is something I've noticed as well...
> 
> Windoze is not the only OS to handle bad hardware better than Linux.  On
> my Mac, I had a bad DIMM that worked fine on the MacOS side, but kept
> causing random bus-type errors in Linux.  Same as when I accidentally
> (long story) overclocked the bus on the CPU.  I think that more
> tolerance for faulty hardware (more than just poorly programmed BIOS or
> chipsets with known bugs) is something that might be worth looking into.
>  I'm sure it would solve problems like this (which I clearly identify as
> a hardware problem, because the same thing happened with the bad DIMM,
> the overclocked bus, and two different overclocked processors (AMD 5x86
> and AMD K6-2 500) and went away when I remedied the offending problem). 
> Additionally, overclockers (I myself am a reformed one) might appreciate
> more tolerance for such things.

Hmmm... The more an O/S waits stupidly for something when it could do
useful work, the less it is likely to trigger hardware problems.

Windoze is probably still far better than Linux at handling billions of
dollars. I never noticed it was good at anything else. :-)

> My two cents/pence/centavos/local tiny currency denomination,
>   David

  Gérard.


Re: PCI-PCI bridges mess in 2.4.x

2000-11-10 Thread Gérard Roudier




On Sat, 11 Nov 2000, Ivan Kokshaysky wrote:

> On Fri, Nov 10, 2000 at 07:35:41PM +0100, Gerard Roudier wrote:
> > I only have spec 1.0 on paper. I should have checked 1.1. Anyway, it may
> > still exist bridges that have been designed prior to spec. 1.1.
> 
> Yes, DEC 2105x bridges, for example.
> 
> The only update listed in revision history is "Update to include
> target initial latency requirements", so this (base > limit) stuff
> must be in rev. 1.0 as well. Please check chapters 3.2.5.[6,8,9].

The revision history must be rather pessimistic about the amount of
additions. Btw, rev. 1.0 April 5, 1994 is 63 pages, and rev 1.1 is about
147 pages, as you know.

> > > I/O is slightly different because it's optional for the bridge -
> > > but if it's implemented same rules apply.
> > 
> > Will also check the spec. on this point. :)
> 
> Also, according the spec, we need some paranoia checks ;-)
> 1. check if the bridge has an I/O window not implemented

Read-only, returning zero on read. Already present in spec. 1.0.

> 2. if the bridge has regular BARs, allocate them properly
>on the primary bus.

Limit < Base (new in 1.1, unless I missed the point. Btw, I actually
  do not want to read P2P spec. 1.0 again :-) )

  Gérard.


Re: PCI-PCI bridges mess in 2.4.x

2000-11-10 Thread Gérard Roudier




On Fri, 10 Nov 2000, Ivan Kokshaysky wrote:

> On Thu, Nov 09, 2000 at 09:37:41PM +0100, Gerard Roudier wrote:
> > Hmmm...
> > The PCI spec. says that Limit registers define the top addresses
> > _inclusive_.
> 
> Correct.
> 
> > The spec. does not seem to imagine that a Limit register lower than the
> > corresponding Base register will ever exist anywhere, in my opinion. :-)
> 
> Not correct.
> Here's a quote from `PCI-to-PCI Bridge Architecture Specification rev 1.1':
>The Memory Limit register _must_ be programmed to a smaller value
>than the Memory Base if there are no memory-mapped I/O addresses on the
>secondary side of the bridge.

I only have spec 1.0 on paper. I should have checked 1.1. Anyway, there
may still exist bridges that were designed prior to spec. 1.1.

> I/O is slightly different because it's optional for the bridge -
> but if it's implemented same rules apply.

Will also check the spec. on this point. :)

> > This let me think that trying to be clever here is probably a very bad
> > idea. What is so catastrophic of having 1 to 4 bytes of addresses and no
> > more being possibly in a forwardable range?
> > 
> Huh. 1 to 4 bytes? 4K for I/O and 1M for memory.
> And it's not trying to be clever (anymore :-) - just strictly following
> the Specs.

I just missed the units, but absolute values weren't so wrong. :-)
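
For reference, the base/limit rule being discussed can be sketched like
this. It is a user-space illustration with invented names (toy_window and
friends); the ROUND_UP definition is my own assumption about the macro used
in the patch, and the start address is assumed to be non-zero and already
aligned to the granule (4K for I/O, 1M for memory).

```c
#include <stdint.h>

/* Assumed definition; 'a' must be a power of two. */
#define ROUND_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

struct toy_window {
    uint32_t base;
    uint32_t limit;   /* inclusive top address */
};

/* [start, end) is the range needed behind the bridge. When
 * end == start there is nothing behind it, and the result has
 * base > limit. */
static struct toy_window toy_bridge_window(uint32_t start, uint32_t end,
                                           uint32_t granule)
{
    struct toy_window w;

    w.base = start;
    w.limit = ROUND_UP(end, granule) - 1;
    return w;
}

/* A window with base > limit forwards nothing downstream. */
static int toy_forwards_downstream(struct toy_window w, uint32_t addr)
{
    return w.base <= w.limit && addr >= w.base && addr <= w.limit;
}
```

With start = end = 0x1000 and a 4K granule this gives base 0x1000 and limit
0x0fff, i.e. the "limit smaller than base" encoding for an empty window.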

> I understand your point very well, btw. I asked similar questions to myself
> until I've had the docs.

Ok. Thanks for the reply.

  Gérard.


Re: PCI-PCI bridges mess in 2.4.x

2000-11-09 Thread Gérard Roudier




On Thu, 9 Nov 2000, Ivan Kokshaysky wrote:

Hmmm...
The PCI spec. says that Limit registers define the top addresses
_inclusive_.

The spec. does not seem to imagine that a Limit register lower than the
corresponding Base register will ever exist anywhere, in my opinion. :-)

This makes me think that trying to be clever here is probably a very bad
idea. What is so catastrophic about having 1 to 4 bytes of addresses and no
more being possibly in a forwardable range?

  Gérard.


> On Wed, Nov 08, 2000 at 03:48:11PM -0800, Richard Henderson wrote:
> > Whee!  We're back in Bootsville.
> 
> Cool!
> Meanwhile this base/limit stuff got confirmation :-)
> Here is a patch against bridges-2.4.0t11-rth.
> 
> Ivan.
> 
> --- 2.4.0t11p1/drivers/pci/setup-bus.c	Wed Nov  8 19:44:42 2000
> +++ linux/drivers/pci/setup-bus.c Thu Nov  9 15:11:01 2000
> @@ -88,14 +88,14 @@ pbus_assign_resources_sorted(struct pci_
>   ranges->io_end += io_reserved;
>   ranges->mem_end += mem_reserved;
>  
> - /* ??? How to turn off a bus from responding to, say, I/O at
> -all if there are no I/O ports behind the bus?  Turning off
> -PCI_COMMAND_IO doesn't seem to do the job.  So we must
> -allow for at least one unit.  */
> - if (ranges->io_end == ranges->io_start)
> - ranges->io_end += 1;
> - if (ranges->mem_end == ranges->mem_start)
> - ranges->mem_end += 1;
> + /* Interesting case is when, say, io_end == io_start, i.e.
> +there is no I/O behind the bridge at all. We initialize
> +the bridge with base=io_start and limit=io_end-1, so
> +in this case we'll have base > limit. According to
> +the `PCI-to-PCI Bridge Architecture Specification', this
> +means that the bridge will not forward any I/O transactions
> +from the primary bus to the secondary bus and will forward
> +all I/O transactions upstream. Exactly what we want.  */
>  
>   ranges->io_end = ROUND_UP(ranges->io_end, 4*1024);
>   ranges->mem_end = ROUND_UP(ranges->mem_end, 1024*1024);
> 


RE: accessing on-card ram/rom

2000-11-09 Thread Gérard Roudier




On Wed, 8 Nov 2000 [EMAIL PROTECTED] wrote:

> I looked at the IO-mapping.txt file. It says that
> on x86 architecture it should not make any difference.
> It also says that "on x86 it _is_ the same memory space. So
> on x86 it actually works to just dereference a pointer".

For bus_to_virt() to give a usable virtual address, such a virtual
mapping must exist and additionally be part of the linear kernel
mapping. A PCI MMIO address is generally _not_ even mapped by default
by the kernel.

On the other hand, bus_to_virt() does not have the proper semantics for
your problem, since it applies to main memory addresses as seen from the
PCI BUS, and you want an MMIO address usable by the CPU (=virtual).

Hmmm... ioremap() just creates a virtual mapping for the CPU to access
the MMIO window of your PCI chip just fine.

  Gérard.

> Any inputs on this ?
> 
> Thanks and regards,
> -hiren
> 
> > -Original Message-
> > From: Jeff Garzik [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, November 08, 2000 2:53 PM
> > To: MEHTA,HIREN (A-SanJose,ex1)
> > Cc: '[EMAIL PROTECTED]'
> > Subject: Re: accessing on-card ram/rom
> > 
> > 
> > "MEHTA,HIREN (A-SanJose,ex1)" wrote:
> > > I have a PCI card which has on-card ram/rom which gets mapped
> > > into pci address space and there is a separate base register
> > > for this memory. Now the question is : can I access this on-card
> > > memory by converting the pci base address into the virtual address
> > > using bus_to_virt and adding the required offset ? Or do I need
> > > to use ioremap function to map the physical address space starting
> > > from the pci base address into the kernel virtual address space ?
> > > Or is there any other interface to access the on-card memory ?
> > > Is it that bus_to_virt can be used only for the normal RAM ?
> > 
> > Use ioremap.
> > 
> > For more details, read linux/Documentation/IO-mapping.txt.
> > 
> > Jeff
> > 
> > 
> > -- 
> > Jeff Garzik | "When I do this, my computer freezes."
> > Building 1024   |  -user
> > MandrakeSoft| "Don't do that."
> > |  -level 1
> > 


Re: [PATCH] Fix SCSI proc oops

2000-10-14 Thread Gérard Roudier




On Sat, 14 Oct 2000, Torben Mathiasen wrote:

> On Sat, Oct 14 2000, David S. Miller wrote:
> >Date: Sat, 14 Oct 2000 11:43:09 +0200
> >From: Torben Mathiasen <[EMAIL PROTECTED]>
> > 
> >David, why is tpnt->proc_name NULL on sparc for devices not
> >existing?  Every driver has this as part of their tpnt struct, so
> >it doesn't matter if the underlying device really exists.
> > 
> > In the mentioned case it would be NULL on all architectures, not just
> > Sparc ;-) (it happens on ix86 too, ix86 is different only because it
> > does not trap kernel NULL pointer derefences during bootup for some
> > odd reason)
> > 
> > Here is what happens:
> > 
> > scsi_register_host()
> > tpnt->present = tpnt->detect(tpnt);
> > /* tpnt->present is zero since no such adapters were found */
> > 
> > If no hosts are detected the driver is under no obligation to
> > initialize the tpnt->proc_name field.  For example,
> > sym53c8xx.c:sym53c8xx_detect() does not if PCI is not present and this
> > is the specific case hit on my SBUS-only workstation :-)
> > 
> > Subsequently scsi_unregister_host() is called for this TPNT and
> > this is where the NULL pointer is hit.
> >
> 
> Ahh, the drivers I looked at all had proc_name as part of their:
> 
> define IN2000 {
>   bla:bla,
>   proc_name: IN2000,
> 
> }
> 
> structure. I see your point now.
> 
> Are there any reason why sym53c8xx and others initialize proc_name only
> if an adapter was actually found (or in the sym case, if a pcibus was
> found)?

The [sym|ncr]53c8xx drivers do initialize the proc fs stuff (proc_name for
recent kernels) in the Scsi_Host_Template if PCI is present and the user
wants the controllers to be registered in the proc FS. On the other hand,
the `proc_name' field appeared in some 2.3.X kernel version (sym53c8xx says
Linux-2.3.27). In previous kernel versions, SCSI drivers had to make the
`proc_dir' field point to a `proc_dir' structure, which allowed proc/FSing
the controller to be optional.

We have to decide if we want proc/FSing SCSI controllers (even when none
will be attached or the BUS does not even exist) to be optional or not.
- If it is optional, then proc_name=NULL should be interpreted as
  proc/FSing option disabled.
- Otherwise, the proc_name field should be required to be not NULL.
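
The two choices can be written down as a small check. This is a sketch with
made-up names (toy_host_template, toy_check_template), not the SCSI
mid-layer code; it only shows that either policy avoids dereferencing a
NULL proc_name later on.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a host template. */
struct toy_host_template {
    const char *proc_name;
};

enum toy_proc_policy {
    TOY_PROC_OPTIONAL,   /* NULL means "proc/FSing disabled"  */
    TOY_PROC_REQUIRED    /* NULL is a driver bug, reject it   */
};

/* Returns 0 if the template is acceptable, -1 if it is rejected.
 * On success *wants_proc says whether a proc entry exists and so
 * whether one has to be removed at unregister time. */
static int toy_check_template(const struct toy_host_template *t,
                              enum toy_proc_policy policy, int *wants_proc)
{
    if (t->proc_name == NULL) {
        if (policy == TOY_PROC_REQUIRED)
            return -1;
        *wants_proc = 0;
        return 0;
    }
    *wants_proc = 1;
    return 0;
}
```

Under the optional policy, unregistering simply skips the proc cleanup when
wants_proc is 0; under the required policy the problem is caught at
registration time instead.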

  Gérard.


Re: Updated 2.4 TODO List -- new addition WAS(test9 PCI resourcecollisions (fwd)

2000-10-11 Thread Gérard Roudier




On Wed, 11 Oct 2000, Mike A. Harris wrote:

> On 10 Oct 2000, Gnea wrote:
> 
> >>  Please add this to your list. Linux is unusable in these machines.
> >>  I have cc'ed Martin and Linus because they play in that PCI area.
> >
> >erm, looking at your list it says that you're using Redhat 7.0, which
> >is known to ship with a buggy gcc, which is KNOWN to do nasty things
> >with kernels.  
> >
> >Linux version 2.4.0-test9-JHS1 ([EMAIL PROTECTED]) (gcc
> >version 2.96 2
> >731 (Red Hat Linux 7.0)) #2 Thu Oct 5 11:59:31 EDT 2000
> >
> >yeah, that pretty much sums it up right there.. you may want to try
> >something else.
> 
> Once again misinformation and FUD.
> 
> On Red Hat 7.0, use "kgcc" for kernel compilation.  This is
> really an FAQ...  Instead of changing distributions, try reading
> manuals.

What manuals ?

The genuine Linux kernel distribution contains its own documentation on
how to build and configure it.

The kgcc story looks to me like a lie from RedHat. In my opinion, they
just don't want to recognize that they have been loose.

  Gérard.


Re: which is it ncr53c810 or ncr53c810a?

2000-10-01 Thread Gérard Roudier

810A

Your problem doesn't look like an SCSI transport error to me. Any SCSI
controller will probably not give better result.

Your device does not like some SCSI command and returns CHECK CONDITION
status, then returns SENSE DATA that are passed to the application
client.

> CDB:  52 01 00 00 00 FF 00 00 1C 00

52 -> READ TRACK INFORMATION.
TRACK BIT=1 TRACK NUMBER=0xff -> track number of the invisible track 
ALLOCATION LENGTH = 0x1C = 28 which looks correct for the size of the 
  response.

The SCSI command seems correct to me. But given the value of byte 15
(0xC0) of the SENSE DATA, the device wants to tell you about the field
stated invalid with SENSE DATA bytes 16 and 17, but unfortunately only 16
bytes are displayed in the trace instead of 18 as is usual. So, we will
not know.
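
The decoding above can be reproduced with a few lines of C. This is a sketch
with invented helper names; the byte positions are the ones used in the
explanation above (track bit in CDB byte 1, number in bytes 2 to 5,
allocation length in bytes 7 and 8, field pointer in sense bytes 16 and 17
when bit 7 of byte 15 is set).

```c
#include <stddef.h>
#include <stdint.h>

struct toy_track_info_cdb {
    uint8_t  opcode;
    int      track_bit;
    uint32_t number;      /* track number when track_bit is set */
    uint16_t alloc_len;
};

/* Decode a 10-byte READ TRACK INFORMATION CDB. */
static struct toy_track_info_cdb toy_decode_cdb(const uint8_t cdb[10])
{
    struct toy_track_info_cdb d;

    d.opcode = cdb[0];
    d.track_bit = cdb[1] & 0x01;
    d.number = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8) | cdb[5];
    d.alloc_len = (uint16_t)((cdb[7] << 8) | cdb[8]);
    return d;
}

/* Returns 1 and stores the field pointer when it is valid and present,
 * 0 when byte 15 does not flag it as valid, and -1 when the sense
 * data is too short to tell. */
static int toy_field_pointer(const uint8_t *sense, size_t len, uint16_t *field)
{
    if (len < 16)
        return -1;
    if (!(sense[15] & 0x80))
        return 0;
    if (len < 18)
        return -1;
    *field = (uint16_t)((sense[16] << 8) | sense[17]);
    return 1;
}
```

Fed with the 16 sense bytes from the trace, toy_field_pointer() reports that
the data is too short, which is exactly why the offending field cannot be
named here.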

I would suggest that you report the problem to the 'cdrecord' maintainers
or to HP support.

  Gerard.

On Sun, 1 Oct 2000, Joe wrote:

> Hi,
> I have been troubleshoting some scsi errors that I have been
> recieving, and noticed something weird.  In my proc/pci my controller
> card is identified as (I know obsolete).
> ---
> Bus  0, device  18, function  0:
> SCSI storage controller: NCR 53c810 (rev 18).
>   Medium devsel.  IRQ 9.  Master Capable.  Latency=64.  Min
> Gnt=8.Max Lat=64.
>   I/O at 0x6800 [0x6801].
>   Non-prefetchable 32 bit memory at 0xe400 [0xe400].
> ---
> I then looked in the /proc/scsi/ncr53c8xx/0 file and it said
> ---
> General information:
>   Chip NCR53C810a, device id 0x1, revision id 0x12
>   IO port address 0x6800, IRQ number 9
>   Using memory mapped IO at virtual address 0xc8902000
>   Synchronous period factor 25, max commands per lun 32
> -
> So which is it?  An 810 or 810A? Also sometimes I get the following
> error when writing cdr's
> 
> cdrecord: Input/output error. read track info: scsi sendcmd: retryable
> error
> CDB:  52 01 00 00 00 FF 00 00 1C 00
> status: 0x2 (CHECK CONDITION)
> Sense Bytes: 70 00 05 00 00 00 00 12 00 00 00 00 24 00 00 C0
> Sense Key: 0x5 Illegal Request, Segment 0
> Sense Code: 0x24 Qual 0x00 (invalid field in cdb) Fru 0x0
> Sense flags: Blk 0 (not valid) error refers to command part, bit ptr 0
> (not valid) field ptr 0
> 
> The drive is an HP 9200i, any ideas? The drive is terminated and is the
> only drive on the scsi bus.  The strangest thing is that I have set the
> scsi speed to Fast-5 Mb as fast 10 increases the frequency of this
> error, and when I do a modprobe -a ncr53c8xx it spits out it is fast-10
> and then when I do a modprobe -a sr_mod it detects it is a Fast-5.  Is
> anyone else using the HP 9200i scsi drive? I don't think that it is a
> scsi termination issue or cabling.
> 
> --
> Joe Acosta 
> home: [EMAIL PROTECTED]


Re: Socket Interface

2000-09-28 Thread Gérard Roudier




On Thu, 28 Sep 2000, Peter Samuelson wrote:

>   [Peter Samuelson]
> > > But it really bugs me when someone uses the term 'Linux 6.2' (:
> > > I could not resist pointing out the distinction.
> 
> [Gérard Roudier]
> > You seem to do the same confusion of assuming that Linux is RedHat. Note
> > that it is not your fault. We all are so screwed by marketing techniques.
> 
> Hey now.  I didn't assume "Linux is RedHat".  I assumed a user who says
> "Linux 6.2" is referring to "Red Hat Linux 6.2".  And I still wouldn't
> bet against Igmar.  What's the point?  I don't like beer anyway. (:

> Note that I deliberately ignored my assumption when replying -- I gave
> instructions for Debian derivations, even though I don't know of any
> Debian-derived distribution with a version number 6.2.

There are dozens of similarly different (or differently similar, as you
prefer) Linux-based OS distributions around the world, and some have had
6.x versioning when RedHat was also at 6.x (SuSE for example).

  Gerard.


Re: Socket Interface

2000-09-28 Thread Gérard Roudier



On Thu, 28 Sep 2000, Peter Samuelson wrote:

>   [Peter Samuelson]
> > > There is no Linux 6.2.  The newest version is a prerelease of 2.4.0.
> 
> [Igmar Palsenberg <[EMAIL PROTECTED]>]
> > I'll bet you a beer he's using RedHat :)

A German beer? ;-)

> Yes, yes.  You know he's using Red Hat and I know he's using Red Hat.

Wasn't so clear to me.

> But it really bugs me when someone uses the term 'Linux 6.2' (:
> I could not resist pointing out the distinction.

You seem to make the same mistake of assuming that Linux is RedHat. Note
that it is not your fault. We are all so screwed by marketing techniques.

Gerard.


Re: New topic (PowerPC Linux PCI HELL)

2000-09-22 Thread Gérard Roudier

On Fri, 22 Sep 2000, Jamie Lokier wrote:

> Michel Lanners wrote:
> > >> static inline int pci_enable_device(struct pci_dev *dev)
> > >> {
> > >>  return pci_enable_device_features(dev, USE_IO|USE_MM);
> > >> }
> > (snip)
> > > And what about other features ?
> > > I mean:
> > > - Bus Master
> > > - Memory Write and Invalidate
> > > - Parity Error response
> > 
> > This should probably be handled in arch-dependent code. So make a
> > pci_enable_device() per arch. The point being that only this code has a
> > chance to know some of the details of the PCI implementation on this
> > platform/arch. Bus master and MemWI don't hurt, but I guess enabling
> > parity can halt the bus. So you want to be careful...
> 
> Take a look at the drivers/net/acenic.c driver.  It enables/disables
> Memory Write and Invalidate one way or another, but the decision is not
> arch-specific.  It gets worse: it writes cache line size to PCI_COMMAND
> as well.

Why should it not do so, given that no hardware quirk is supplied?

The fact that you aren't told about something should not make you assume
the worst case by default. Such behaviour is called paranoia. And,
speaking about PCI, the result is that users pay for features the kernel
decides behind their back not to use, even when they are functioning just
fine.

About PCI parity error checking and reporting, this feature is required
for most PCI controller classes. If it confuses the machine, then the
hardware should be thrown away and the user should be reimbursed for the
hardware and the time wasted.

About the PCI cache line size, there is no valid reason for the POST
code not to set it. Obviously, MWI requires the right value of the cache
line size to be configured. A POST code that leaves the cache line size
unset as a way of saying that MWI must not be used looks like a
definite piece of crap to me. PCI devices may use the configured cache
line size to decide whether to use MRL (Memory Read Line) and MRM
(Memory Read Multiple), as you know, and bridges and devices that
implement prefetchable memory can take advantage of such transactions.

Note that for MRL and MRM, assuming a wrong value of the cache line size
will not break anything, and a not too wrong value (twice or half the
right value) can be adequate.
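
To make the cache line size point concrete, here is a minimal sketch of
the checks involved. It assumes the usual PCI convention that the Cache
Line Size register (configuration offset 0x0c) counts 32-bit dwords and
not bytes. The helper names are hypothetical and do not come from any
kernel version.

```c
#include <stdint.h>

/* Offset of the Cache Line Size register in PCI configuration space. */
#define PCI_CACHE_LINE_SIZE 0x0c

/* Convert a line size in bytes to the register encoding (dwords).
 * Returns 0, meaning "not configured", for sizes that are not a power
 * of two of at least 4 bytes or that do not fit the 8-bit register. */
uint8_t cls_register_value(unsigned line_bytes)
{
    if (line_bytes < 4 || (line_bytes & (line_bytes - 1)) != 0)
        return 0;
    if (line_bytes / 4 > 0xff)
        return 0;
    return (uint8_t)(line_bytes / 4);
}

/* MWI needs the exact line size to be configured. */
int mwi_usable(uint8_t cls_reg, unsigned cpu_line_bytes)
{
    return cls_reg != 0 && (unsigned)cls_reg * 4 == cpu_line_bytes;
}

/* For MRL/MRM, a value that is right, or off by a factor of two,
 * is treated as adequate, following the remark above. */
int read_hint_adequate(uint8_t cls_reg, unsigned cpu_line_bytes)
{
    unsigned configured = (unsigned)cls_reg * 4;

    return configured != 0 &&
           (configured == cpu_line_bytes ||
            configured == cpu_line_bytes * 2 ||
            configured * 2 == cpu_line_bytes);
}
```

For a 32-byte line the register value is 8. With that value on a
machine whose real line size is 64 bytes, mwi_usable() says no while
read_hint_adequate() still says yes, which is the asymmetry described
above.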

  Gérard.


Re: New topic (PowerPC Linux PCI HELL)

2000-09-18 Thread Gérard Roudier




On Mon, 18 Sep 2000, Alan Cox wrote:

> > All I wanted was a function that allows the driver to decide that which
> > needs to be enabled.
> > 
> > pci_enable_device(struct pci_dev *dev, byte enable_mask)
> > 
> > This would allow drivers to enable that which it needs and not weird out
> > the hardware that does not like all of this extra fluff.
> 
> Sounds not too daft
> 
> static inline int pci_enable_device(struct pci_dev *dev)
> {
>   return pci_enable_device_features(dev, USE_IO|USE_MM);
> }

Should be worded as "Respond to IO", "Respond to Memory" transactions,
given the explicit PCI context.

And what about other features ?
I mean:
- Bus Master
- Memory Write and Invalidate
- Parity Error response
Etc ...
Are they considered as misfeatures ? ;-)
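
As an illustration of what a finer-grained interface could look like,
here is a minimal sketch that works on the COMMAND register value only.
The bit values are the standard PCI ones and the macro names follow the
Linux PCI headers; DRIVER_FEATURE_MASK and command_set_features() are
hypothetical names invented for this example and are not an existing
kernel API.

```c
#include <stdint.h>

/* PCI COMMAND register bits. */
#define PCI_COMMAND_IO          0x0001  /* respond to I/O transactions */
#define PCI_COMMAND_MEMORY      0x0002  /* respond to memory transactions */
#define PCI_COMMAND_MASTER      0x0004  /* bus master */
#define PCI_COMMAND_INVALIDATE  0x0010  /* Memory Write and Invalidate */
#define PCI_COMMAND_PARITY      0x0040  /* parity error response */

/* Bits a driver would be allowed to choose in this sketch. */
#define DRIVER_FEATURE_MASK \
    (PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER | \
     PCI_COMMAND_INVALIDATE | PCI_COMMAND_PARITY)

/* Return the new COMMAND value: the selectable bits are replaced by the
 * requested ones, every other bit is left as it was found. */
uint16_t command_set_features(uint16_t cmd, uint16_t wanted)
{
    cmd &= (uint16_t)~DRIVER_FEATURE_MASK;
    cmd |= (uint16_t)(wanted & DRIVER_FEATURE_MASK);
    return cmd;
}
```

A driver that wants memory transactions and bus mastering, but no I/O
decoding, would pass PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER. Resource
assignment stays with the generic layer; only the choice of bits moves
to the driver.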
 
> and then just go and turn the existing enable_device into enable_device_Features ? 

  Gérard.


Re: An elevator algorithm (patch)

2000-09-17 Thread Gérard Roudier

On Sun, 17 Sep 2000, Rik van Riel wrote:

> On 17 Sep 2000, Peter Osterlund wrote:
> > Andrea Arcangeli <[EMAIL PROTECTED]> writes:
> > 
> > > While the queue is plugged or with things like SCSI your logic
> > > change won't work because in such case if your request is lower the
> > > lowest in the queue, you can put it at the head of the queue and you
> > > have no way to know where your "tmp1" was placed so you can't make
> > > any assumption (that's why the current code makes sense).
> > 
> > I still don't think the current code makes sense.
> 
> [snip examples]
> 
> Indeed, it's obvious that your code will give better
> results (and it's more readable too). I like it...
> 
> > The new patch is also not unfair to requests near the end of the
> > disk. The current kernel code can starve those requests a very
> > long time if the request queue never becomes empty.
> 
> This is a very very big problem, which definitely needs
> to be addressed ASAP. I've witnessed this starvation happen
> a couple of times and it's a really big problem...
> 
> > The only disadvantage I can see is that the new patch doesn't
> > handle consecutive insertions in O(1) time, but then again, the
> > pre-latency elevator code didn't do that either. Is this really
> > important? How long can the request queue be? Apparently we gain
> > more by avoiding disk seeks than we lose by wasting some CPU
> > cycles during request insertion.
> 
> Well DUH. ;)
> 
> A disk seek takes ~10 milliseconds on a modern drive,
> that's about an /eternity/ as far as the CPU is concerned.

That was probably true under MS-DOS 10 years ago.

Nowadays you can connect 30 disks of about 4 ms average seek time to a
single dual-channel Ultra160, 64-bit, 66 MHz PCI controller. With
appropriate benchmarks that take advantage of disk prefetching, you can
easily observe 30,000 short IOs per second and even more (15,000 per
channel). This happens using no more than 3 disks per BUS.

For real work, with disks actually seeking, such a hypothetical system
should probably be capable of performing more than 6000 IOs per second
(that's theory from me, since I never saw such a system :-) ).
This gives some margin for wasting CPU cycles, but far less than you seem
to assume, in my opinion.
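
The arithmetic behind these figures is easy to check. The sketch below
only converts the request rates quoted above into a time budget per
request. It assumes the load is spread evenly across the disks, and the
function names are made up for the example.

```c
/* Microseconds of wall-clock time available per request at a given
 * aggregate request rate. */
double us_per_io(double ios_per_second)
{
    return 1e6 / ios_per_second;
}

/* Request rate seen by each disk when the load is spread evenly. */
double ios_per_disk(double total_ios_per_second, unsigned disks)
{
    return total_ios_per_second / (double)disks;
}
```

At 6000 IOs per second the whole system has about 167 microseconds per
request, and at 30,000 IOs per second about 33 microseconds, against the
10,000 microseconds that a single 10 ms seek would suggest. That budget
has to cover the whole request path (queueing, driver, interrupt
handling), and the elevator is only one part of it.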

> Anyone who thinks saving on CPU time is worth it for
> something as critical to disk seeks as the elevator code
> should be locked up inside a microVAX ;)

Not wasting CPU cycles needlessly is still worth it, in my opinion.

Regards,
  Gerard.


Re: New topic (PowerPC Linux PCI HELL)

2000-09-16 Thread Gérard Roudier




On Fri, 15 Sep 2000, Richard B. Johnson wrote:

> On Fri, 15 Sep 2000, Gérard Roudier wrote:
> 
> > 
> > 
> > On Fri, 15 Sep 2000, Linus Torvalds wrote:
> > 
> > > On Fri, 15 Sep 2000, Gérard Roudier wrote:
> > > > 
> 

[ ... ]

> > > No ifs, why's or buts. A driver that just enables the IO windows is a
> > > buggy driver. 
> > 
> > In PCI, you do not enable windows, but you enable/disable devices to
> > generate and/or respond to transactions.
> 
> Well really? From the programmer's point of view, you have just enabled
> some windows into address space. The word "transaction" has gotten way
> too much visibility. The fact that some hardware mechanism has gotten
> involved reading from and writing to a device means nothing except
> that a write (if enabled) is posted. We don't bother thinking about
> "transactions" when we write to SDRAM do we? To the programmer, we
> write to it and it sticks. The fact that there was a hardware transaction
> involving a read/modify/write of (usually) much more than our byte
> isn't a concern.

Hmmm... If you know which drivers have been written with such a limited
"low skilled" CPU-centric approach in mind, let me know. I will simply
avoid ever using the resulting crap.

  Gerard.


Re: New topic (PowerPC Linux PCI HELL)

2000-09-15 Thread Gérard Roudier




On Fri, 15 Sep 2000, Linus Torvalds wrote:

> On Fri, 15 Sep 2000, Gérard Roudier wrote:
> > 
> > > Just as an example: imagine that the IO windows haven't been set up
> > > correctly. If the low-level driver just blindly enables IO cycles by
> > > writing to the PCI_COMMAND register, that device may come up in an invalid
> > > state, and mess up the whole system. The driver simply does not KNOW
> > > enough. It doesn't know where other devices are, and it _shouldn't_ know. 
> > 
> > How would this happen? Could you elaborate?
> 
> It's really easy.
> 
> Call "pci_enable_device()".
> 
> What's so hard about that?

This function delegates too much as a whole to the generic PCI layer, IMO.
Imagine that, for sanity, I want to allocate all the device resources but
only _enable_ part of the device features (for example, only memory
transactions). Or imagine that some special handling is necessary because
of a chip bug.

> You don't seem to realize, but it's entirely possible to have a setup
> where some device CANNOT be allocated its IO region. The BIOS may have
> left the device disabled on purpose, simply because there wasn't enough
> free space in the memory map to enable the device anywhere.

The PCI specs say the corresponding BAR must be set to ZERO in this case.
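
Under that convention, a driver can tell an unassigned resource from an
assigned one by looking at the BAR itself. This is a minimal sketch for
32-bit BARs only (64-bit memory BARs are ignored here). The macro names
are modeled on the Linux PCI headers, and the two helpers are
hypothetical.

```c
#include <stdint.h>

#define PCI_BASE_ADDRESS_SPACE_IO  0x01u     /* bit 0 set: I/O space BAR */
#define PCI_BASE_ADDRESS_IO_MASK   (~0x03u)  /* address bits of an I/O BAR */
#define PCI_BASE_ADDRESS_MEM_MASK  (~0x0fu)  /* address bits of a memory BAR */

/* Address part of a 32-bit BAR, with the type/flag bits stripped. */
uint32_t bar_address(uint32_t bar)
{
    if (bar & PCI_BASE_ADDRESS_SPACE_IO)
        return bar & PCI_BASE_ADDRESS_IO_MASK;
    return bar & PCI_BASE_ADDRESS_MEM_MASK;
}

/* A BAR whose address part is zero is treated as "not assigned". */
int bar_is_assigned(uint32_t bar)
{
    return bar_address(bar) != 0;
}
```

A driver that finds bar_is_assigned() false for a region it needs has
no business enabling decoding for that region, whoever ends up writing
the COMMAND register.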

> You can't just have the device driver enable such a device. It _has_ to
> ask the PCI layer to do it for it - because the PCI layer is the only one
> that can figure out that "Oh, damn, this machine has 3GB of memory, and 4
> video cards that want a 256MB aperture each, and we don't have any place
> to map this card any more".

I want to tell the generic layer "what to do" in a more fine-grained way
than with just a single verb, at least. I may accept delegating some of
the "how to do it" to it.

> No ifs, why's or buts. A driver that just enables the IO windows is a
> buggy driver. 

In PCI, you do not enable windows, but you enable/disable devices to
generate and/or respond to transactions. OTOH, if you look at all the bits
in the COMMAND register, you will see that some other features also need
to be addressed by the enabling/disabling kernel interface.

> > > In contrast, the general PCI layer _does_ know. It keeps track of
> > > resources, makes sure that different cards do not have overlapping address
> > > ranges, knows about PCI bridges (a card behind a PCI bridge can only be
> > > enabled after the _bridge_ has been enabled and can only be mapped in the
> > > region that the bridge maps).
> > 
> > Yes, it is expected to do its work (e.g. assigning resources to all
> > agents on the BUS hierarchy).
> 
> Right.
> 
> And sometimes it CANNOT.
> 
> End of story.
> 
>   Linus

  Gérard.


Re: New topic (PowerPC Linux PCI HELL)

2000-09-15 Thread Gérard Roudier

On Fri, 15 Sep 2000, Linus Torvalds wrote:

> On Fri, 15 Sep 2000, Richard B. Johnson wrote:
> > 
> > The PCI Specification states, in part, that either the BIOS or the
> > driver has to enable the device. So, many drivers find that the device
> > has not been enabled. This is normal and necessary because many/most
> > PCI hardware had better not be enabled until an ISR is in-place.
> 
> But that's why we have the "pci_enable_device()" function. And that's why
> we have the generic PCI setup functionality that finds and enables devices
> at the right addresses.
> 
> DO NOT USE ANY LOCAL HACKS. Use the proper function. Don't go mucking
> around with random configuration state information: enabling a device
> involves a lot more than just writing stuff to configuration ports. Things
> like making sure the interrupt routing on the motherboard has been
> enabled, etc. Things that the driver does not know about, and should not
> even _try_ to understand.
> 
> The PCI layer should be used to handle generic PCI issues. A low-level
> driver should _never_ try to handle resource allocation and enabling in
> hardware.

Disagreed about `enabling'.

> Just as an example: imagine that the IO windows haven't been set up
> correctly. If the low-level driver just blindly enables IO cycles by
> writing to the PCI_COMMAND register, that device may come up in an invalid
> state, and mess up the whole system. The driver simply does not KNOW
> enough. It doesn't know where other devices are, and it _shouldn't_ know. 

How would this happen? Could you elaborate?

> In contrast, the general PCI layer _does_ know. It keeps track of
> resources, makes sure that different cards do not have overlapping address
> ranges, knows about PCI bridges (a card behind a PCI bridge can only be
> enabled after the _bridge_ has been enabled and can only be mapped in the
> region that the bridge maps).

Yes, it is expected to do its work (e.g. assigning resources to all
agents on the BUS hierarchy).

> To make a long story short: a driver that touches the PCI_COMMAND or other
> generic PCI registers by hand is a _buggy_ driver. It's a sure recipe for
> disaster.

I disagree 100% with this statement. The genericity of PCI configuration
is only here to facilitate configuration of all the devices on a BUS
hierarchy. Indeed, a PCI device driver must not tamper with the resources
the device got from the generic PCI layer. But it is expected to know more
about the PCI devices it supports than any generic PCI layer. Not
everything is generic in PCI. What about device bugs, for example? About
bridges, part of their configuration can be handled generically, but not
everything. The generic PCI layer that deals with bridges has to know
bridge quirks and special features, as you know. In some way, the generic
PCI layer acts as the PCI device driver for the bridges. Configuring and
enabling are 2 different issues. Configuring has been designed to be
generic in PCI, but the `enabling' should be performed by the entity that
knows the real device best, thus the PCI device driver.

  Gérard.
