Re: [openib-general] Ordering between PCI config space writes and MMIO reads?

2006-11-01 Thread David Miller
From: John Partridge [EMAIL PROTECTED]
Date: Wed, 01 Nov 2006 10:27:19 -0600

 Sorry, but I find this change a bit puzzling. The problem is
 particular to the PPB on the HCA and not Altix. I can't see anywhere
 that a PCI Config Write is required to block until completion; it is
 the driver and the HCA, not the Altix hardware, that require the
 Config Write to have completed before we leave mthca_reset().
 Changing pci_write_config_xxx() will change the behavior for ALL
 drivers and introduce the possibility of breaking something else. The fix was
 very low risk in mthca_reset(), changing the PCI code to fix this is
 much more onerous.

The issue is that something as simple as:

/* pdev, REG and bit are placeholders */
u32 val, newval;

pci_read_config_dword(pdev, REG, &val);
val |= bit;
pci_write_config_dword(pdev, REG, val);
pci_read_config_dword(pdev, REG, &newval);
BUG_ON(!(newval & bit));

is not guaranteed by PCI (apparently).

I see no valid reason why every PCI device driver should
be troubled with this lunacy and the ordering should thus
be ensured by the PCI layer.

It just so happens to take care of the original driver
issue too :-)

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files

2006-10-26 Thread David Miller
From: Glenn Grundstrom [EMAIL PROTECTED]
Date: Thu, 26 Oct 2006 19:06:23 -0500

 +#include nes_tcpip/include/nes_sockets.h

I want to know what in the world this nes_tcpip thing is?




Re: [openib-general] Ordering between PCI config space writes and MMIO reads?

2006-10-24 Thread David Miller
From: Matthew Wilcox [EMAIL PROTECTED]
Date: Tue, 24 Oct 2006 16:36:32 -0600

 This is only really a problem for setup (when we program the BARs), so
 it seems silly to enforce an ordering at any other time.  Reluctantly, I
 must disagree with Jeff -- drivers need to fix this.

One thing is that we definitely don't want to fix this by,
for example, reading back the PCI_COMMAND register or something
like that.  That causes two problems:

1) Some PCI config writes shut the device down and make it
   not respond to some kinds of PCI config transactions.
   One example is putting the device into D3 or similar
   power state, another is performing a device reset.

2) Several drivers use PCI config space accesses to touch the
   main registers in order to workaround bugs in the PCI-X
   implementation of their chip or similar (tg3 has a few
   cases like this); doing a PCI config space readback would
   kill performance quite a bit in an already slow situation.

In fact, I do recall that one of the x86 PCI config space access
implementations did a readback like this, and we had to remove
it because it caused problems when doing a reset on tg3 chips
when using a PCI config space register write to do the reset.




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Thu, 12 Oct 2006 21:12:06 +0200

 Quoting r. David Miller [EMAIL PROTECTED]:
  Subject: Re: Dropping NETIF_F_SG since no checksum feature.
  
  Numbers?
 
 I created two subnets on top of the same pair of InfiniBand HCAs:

I was asking for SG vs. non-SG numbers so I could see proof
that it really does help like you say it will.




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 11:05:04 +0200

 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
 data will be copied over rather than sent directly.
 So why does dev.c have to force set NETIF_F_SG to off then?

Because it's more efficient to copy into a linear destination
buffer of an SKB than into page sub-chunks when doing checksum+copy.
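The feature rule discussed in this thread can be modeled in a few lines of C. This is a hypothetical sketch, not kernel code: FEAT_SG and FEAT_ALL_CSUM are made-up stand-ins for NETIF_F_SG and NETIF_F_ALL_CSUM (their values are not the kernel's), fixup_features() models the dev.c behavior of clearing SG when no checksum offload is advertised, and use_zero_copy() models the capability test in the tcp_sendpage() excerpt quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's feature bits; the real
 * values are defined by the kernel and differ from these. */
enum {
    FEAT_SG       = 1u << 0, /* device can transmit scatter-gather frags */
    FEAT_ALL_CSUM = 1u << 1  /* device computes the transport checksum   */
};

/* Models the dev.c rule: SG without checksum offload is dropped,
 * because the stack must copy (and checksum while copying) anyway,
 * and a linear SKB is the cheaper copy destination. */
static unsigned fixup_features(unsigned features)
{
    if (!(features & FEAT_ALL_CSUM))
        features &= ~(unsigned)FEAT_SG;
    /* Invariant after fixup: SG is only ever set together with CSUM. */
    assert(!(features & FEAT_SG) || (features & FEAT_ALL_CSUM));
    return features;
}

/* Models the quoted tcp_sendpage() test: pages go to the device
 * directly only when both features are present; otherwise the data
 * takes the copying path (sock_no_sendpage()). */
static bool use_zero_copy(unsigned route_caps)
{
    return (route_caps & FEAT_SG) && (route_caps & FEAT_ALL_CSUM);
}
```

With SG forced off whenever checksum offload is missing, the zero-copy branch is only ever taken on devices that can checksum the pages themselves.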




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 17:01:03 +0200

 Quoting Steven Whitehouse [EMAIL PROTECTED]:
   ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
                        size_t size, int flags)
   {
           ssize_t res;
           struct sock *sk = sock->sk;
   
           if (!(sk->sk_route_caps & NETIF_F_SG) ||
               !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
                   return sock_no_sendpage(sock, page, offset, size, flags);
   
   
   So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
   data will be copied over rather than sent directly.
   So why does dev.c have to force set NETIF_F_SG to off then?
  
  I agree with that analysis,
 
 So, would you Ack something like the following then?

I certainly don't.




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 23:23:39 +0200

 With my patch, there is a huge performance gain by increasing MTU to 64K.
 And it seems the only way to do this is by S/G.

Numbers?




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Michael S. Tsirkin [EMAIL PROTECTED]
Date: Wed, 11 Oct 2006 02:13:38 +0200

 Maybe I can patch linux to allow SG without checksum?
 Dave, maybe you could drop a hint or two on whether this is worthwhile
 and what are the issues that need addressing to make this work?
 
 I imagine it's not just the matter of changing net/core/dev.c :).

You can't; it's a quality of implementation issue.  We sendfile()
pages directly out of the filesystem page cache without any
blocking of modifications to the page contents, and the only way
that works is if the card computes the checksum for us.

If we sendfile() a page directly, we must compute a correct checksum
no matter what the contents.  We can't do this on the CPU before the
data hits the device, because another thread of execution can go in and
modify the page contents, which would invalidate the checksum and thus
the packet.  We cannot allow this.

Blocking modifications is too expensive, so that's not an option
either.
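The race can be demonstrated in a single-threaded simulation. Everything here is illustrative: a byte array stands in for a page-cache page, the "other thread of execution" is simulated by flipping a byte after the checksum is taken, and csum16() is an RFC 1071 style Internet checksum rather than the kernel's optimized routine. Checksumming the shared page in place lets the sent bytes and the checksum diverge; checksumming a private copy does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071): ones'-complement of the
 * ones'-complement sum of big-endian 16-bit words. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)                    /* odd trailing byte, zero padded */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)           /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Unsafe model: checksum the shared page, then let "another thread"
 * modify it before the device reads it.  Returns nonzero if the bytes
 * that would go out no longer match the checksum in the header. */
static int send_in_place_is_corrupt(uint8_t *page, size_t len)
{
    assert(len > 0);
    uint16_t hdr_csum = csum16(page, len);
    page[0] ^= 0xff;            /* simulated concurrent modification */
    return csum16(page, len) != hdr_csum;
}

/* Safe model: snapshot the page into a private buffer first; later
 * writes to the page cannot affect what is sent. */
static int send_copy_is_corrupt(uint8_t *page, uint8_t *priv, size_t len)
{
    assert(len > 0);
    memcpy(priv, page, len);
    uint16_t hdr_csum = csum16(priv, len);
    page[0] ^= 0xff;            /* same modification, page only */
    return csum16(priv, len) != hdr_csum;
}
```

Both functions apply the identical modification; only the buffer the checksum covers differs, which is the whole argument for copying when the card cannot checksum.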





Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:33:46 -0700

 Michael My guess was, an extra pass over data is likely to be
 Michael expensive - dirtying the cache if nothing else. But I do
 Michael plan to measure that, and see.
 
 I don't get it -- where's the extra pass?  If you can't compute the
 checksum on the NIC then you have to compute it sometime on the CPU
 before passing the data to the NIC.

Also, if you don't do checksumming on the card, we MUST copy
the data (be it from a user buffer, or from a filesystem page
cache page) into a private buffer, since if the data changes
the checksum would become invalid, as I mentioned in another
email earlier.

Therefore, since we have to copy anyway, it is always better
to checksum in parallel with the copy.

So the whole idea of SG without hw-checksum support is without
much merit at all.
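Folding the checksum into the copy can be sketched as follows. copy_and_csum() is a hypothetical name and a deliberately simple byte loop, not the kernel's optimized checksum-and-copy routines; it computes the RFC 1071 Internet checksum while copying, so each byte is read once instead of once for the copy and again for a separate checksum pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement checksum. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Copy len bytes from src to dst and return the Internet checksum of
 * the copied data.  Each source byte is loaded once and used for both
 * the store and the running sum.  The 32-bit accumulator is adequate
 * for the small buffers in this sketch. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    assert((dst != NULL && src != NULL) || len == 0);
    for (; i + 1 < len; i += 2) {
        uint8_t hi = src[i], lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        sum += (uint32_t)((hi << 8) | lo);
    }
    if (i < len) {              /* odd trailing byte, zero padded */
        uint8_t b = src[i];

        dst[i] = b;
        sum += (uint32_t)b << 8;
    }
    return csum_fold32(sum);
}
```

The copy destination can be a linear SKB buffer, which is why the copying path does not need SG at all.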




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:42:20 -0700

 On the other hand I'm not sure how useful such a netdevice would be --
 will non-sendfile() paths generate big packets even if the MTU is 64KB?

non-sendfile() paths will generate big packets just fine, as long
as the application is providing that much data.




Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-10 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 10 Oct 2006 20:49:09 -0700

 David non-sendfile() paths will generate big packets just fine,
 David as long as the application is providing that much data.
 
 OK, cool.  Will the big packets be non-linear skbs?

If you had SG enabled (and thus checksumming offload too) then yes
you'll get a non-linear SKB.  Otherwise, without SG, you'll get a
fully linear SKB.
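The two layouts can be contrasted with a simplified model. The types and functions are hypothetical and far simpler than struct sk_buff: a non-linear packet is a head plus a list of fragments that an SG-capable device can be handed directly, while a device without SG needs every byte in one contiguous buffer. Per the message above, without SG the stack produces a fully linear SKB in the first place; pkt_linearize() only models that layout requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified non-linear packet: a linear head
 * plus an array of fragments (page sub-chunks in the real thing). */
struct frag {
    const unsigned char *data;
    size_t len;
};

struct pkt {
    const unsigned char *head;
    size_t head_len;
    const struct frag *frags;
    size_t nr_frags;
};

/* Total payload length: head plus every fragment. */
static size_t pkt_len(const struct pkt *p)
{
    size_t total = p->head_len;

    for (size_t i = 0; i < p->nr_frags; i++)
        total += p->frags[i].len;
    return total;
}

/* The layout a device without SG needs: all bytes contiguous.
 * Returns the number of bytes written; out must hold pkt_len(p). */
static size_t pkt_linearize(const struct pkt *p, unsigned char *out,
                            size_t out_len)
{
    size_t off;

    assert(out_len >= pkt_len(p));
    memcpy(out, p->head, p->head_len);
    off = p->head_len;
    for (size_t i = 0; i < p->nr_frags; i++) {
        memcpy(out + off, p->frags[i].data, p->frags[i].len);
        off += p->frags[i].len;
    }
    return off;
}
```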




Re: [openib-general] [PATCH Round 5 0/3] Network Event Notifier Mechanism

2006-07-30 Thread David Miller
From: Steve Wise [EMAIL PROTECTED]
Date: Fri, 28 Jul 2006 13:28:49 -0500

 Dave/Roland, is this patchset about ready to go?

All 3 patches applied, thanks Steve.




Re: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification.

2006-07-26 Thread David Miller
From: Steve Wise [EMAIL PROTECTED]
Date: Wed, 26 Jul 2006 11:15:43 -0500

 Dave, what do you think about removing the user-space stuff for the
 first round of integration?  IE:  Just add netevents and kernel hooks to
 generate them.

Sure.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-14 Thread David Miller
From: Ralph Campbell [EMAIL PROTECTED]
Date: Fri, 14 Jul 2006 15:27:07 -0700

 Note that the current design only supports one IOMMU type in a system.
 This could support multiple IOMMU types at the same time.

This is not true; the framework allows multiple such types
and in fact Sparc64 takes advantage of this.  We have about
4 or 5 different PCI controllers, and the IOMMUs are slightly
different in each.

Even with the standard PCI DMA mapping calls, we can gather the
platform private information necessary to program the IOMMU
appropriately for a given chipset.

The dma_mapping_ops idea will never get accepted by folks like Linus,
for reasons I've outlined in previous emails in this thread.  So, it's
best to look elsewhere for solutions to your problem, such as the
ideas used by the USB and IEEE 1394 device layers.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread David Miller
From: Ralph Campbell [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 16:29:27 -0700

 Currently, the ib_ipath driver requires that the mapping be
 one-to-one since there is no practical way to reverse IOMMU
 mappings.

You can maintain a hash table that maps DMA addresses back to kernel
mappings.  Depending upon your situation, you can optimize this to use
very small keys if you have some kind of other identification method
for your buffers.

That would be for dynamic mappings.
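A reverse map of the kind suggested above could look like this sketch. The structure, function names, bucket count and hash are hypothetical; it keys a chained hash table on the full DMA address and assumes buffers are at least 64-byte aligned. As the message notes, a driver that already tags its buffers with some other identifier could use much smaller keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reverse map: DMA (bus) address -> kernel virtual address. */
#define RMAP_BUCKETS 256u

struct rmap_entry {
    uint64_t dma_addr;
    void *vaddr;
    struct rmap_entry *next;
};

struct rmap {
    struct rmap_entry *bucket[RMAP_BUCKETS];
};

static size_t rmap_hash(uint64_t dma_addr)
{
    /* Skip the low 6 bits, which are zero for 64-byte aligned buffers. */
    return (size_t)((dma_addr >> 6) % RMAP_BUCKETS);
}

/* Record a mapping; returns 0 on success, -1 on allocation failure. */
static int rmap_add(struct rmap *m, uint64_t dma_addr, void *vaddr)
{
    struct rmap_entry *e;
    size_t h;

    assert(m != NULL);
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    h = rmap_hash(dma_addr);
    e->dma_addr = dma_addr;
    e->vaddr = vaddr;
    e->next = m->bucket[h];
    m->bucket[h] = e;
    return 0;
}

/* Translate a DMA address reported by the device back to a pointer. */
static void *rmap_lookup(const struct rmap *m, uint64_t dma_addr)
{
    for (const struct rmap_entry *e = m->bucket[rmap_hash(dma_addr)];
         e != NULL; e = e->next)
        if (e->dma_addr == dma_addr)
            return e->vaddr;
    return NULL;
}

/* Free every entry and leave the table empty. */
static void rmap_destroy(struct rmap *m)
{
    for (size_t i = 0; i < RMAP_BUCKETS; i++) {
        struct rmap_entry *e = m->bucket[i];

        while (e) {
            struct rmap_entry *next = e->next;

            free(e);
            e = next;
        }
        m->bucket[i] = NULL;
    }
}
```

A driver would add an entry right after creating a streaming mapping, look the address up when the hardware reports it back, and, in a real implementation, delete the entry before unmapping; deletion is omitted here for brevity.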

If you were using consistent DMA memory, which I gather you're not,
you could use the PCI DMA pool mechanism.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 17:11:26 -0700

 A cleaner solution would be to make the dma_ API really use the device
 it's passed anyway, and allow drivers to override the standard PCI
 stuff nicely.  But that would be major surgery, I guess.

Clean but expensive.  You should not force the rest of the kernel
to eat the cost of something you want to do when it's totally
unnecessary for most other users.

For example, x86 never needs to do anything other than a direct
virt_to_phys translation to produce a DMA address, no matter what
bus the device is on.  It's a single simple integer adjustment
that can be done inline in about 2 or 3 instructions at most.

Once you start allowing overrides then even x86 starts to eat the
stupid costs of dereferencing some kind of device ops method.

That doesn't make any sense, and that's why the DMA API works the way
it does now.  It's a platform or bus operation, not a device one.
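The cost argument can be made concrete with a small model. The names and the MODEL_PAGE_OFFSET value are hypothetical, and real kernels use their own virt_to_phys() definitions; what matters is the shape of the code. The platform-level translation is one subtraction the compiler can inline, while the per-device ops version adds a pointer load and an indirect call to every mapping, even when every device on the machine would use the same direct translation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical start of the kernel direct map in this model. */
#define MODEL_PAGE_OFFSET 0xc0000000u

/* Platform-level translation on a direct-mapped machine: a single
 * subtraction, inlined at every call site. */
static inline uint32_t direct_dma_addr(uint32_t vaddr)
{
    assert(vaddr >= MODEL_PAGE_OFFSET);
    return vaddr - MODEL_PAGE_OFFSET;
}

/* Per-device override model: each mapping loads a function pointer
 * from the device and makes an indirect call. */
struct model_dma_ops {
    uint32_t (*map)(uint32_t vaddr);
};

struct model_device {
    const struct model_dma_ops *ops;
};

static uint32_t ops_direct_map(uint32_t vaddr)
{
    return direct_dma_addr(vaddr);
}

static uint32_t device_dma_addr(const struct model_device *dev, uint32_t vaddr)
{
    return dev->ops->map(vaddr);
}
```

Both paths return the same DMA address; only the per-call overhead differs.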

If you need device level DMA mapping semantics, create them for your
device type.  This is what USB does, btw.






Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss

2006-06-29 Thread David Miller
From: Bryan O'Sullivan [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 14:41:30 -0700

 +/*
 + * Copy data.  Try not to pollute the dcache with the source data,
 + * because we won't be reading it again.
 + */
 +#if defined(CONFIG_X86_64)
 +void *ipath_memcpy_nc(void *dest, const void *src, size_t n);
 +#else
 +#define ipath_memcpy_nc(dest, src, n) memcpy(dest, src, n)
 +#endif

A facility like this doesn't belong in some arbitrary driver layer.
It belongs as a generic facility the whole kernel could make use
of.

Please stop polluting the infiniband drivers with Opteron crap.
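A generic version might look like the following sketch. The ARCH_HAS_MEMCPY_NC macro and arch_memcpy_nc() hook are hypothetical, memcpy_nc is the name floated later in this thread, and the fallback matches the patch's own #define to memcpy() on non-x86-64 builds. Drivers would call memcpy_nc() unconditionally, and the decision whether to bypass the cache stays with architecture code, which also matters for CPUs where a cache-bypassing copy is the wrong choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic interface: an architecture with a useful
 * cache-avoiding copy defines ARCH_HAS_MEMCPY_NC and provides
 * arch_memcpy_nc(); everyone else transparently gets memcpy(). */
#ifdef ARCH_HAS_MEMCPY_NC
void *arch_memcpy_nc(void *dest, const void *src, size_t n);
#endif

static inline void *memcpy_nc(void *dest, const void *src, size_t n)
{
    assert((dest != NULL && src != NULL) || n == 0);
#ifdef ARCH_HAS_MEMCPY_NC
    return arch_memcpy_nc(dest, src, n);
#else
    /* Generic fallback: correct everywhere, just without the
     * cache-bypassing behavior. */
    return memcpy(dest, src, n);
#endif
}
```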




Re: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-06-29 Thread David Miller
From: Bryan O'Sullivan [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 14:41:29 -0700

  ipath_core-$(CONFIG_X86_64) += ipath_wc_x86_64.o
 +ipath_core-$(CONFIG_PPC64) += ipath_wc_ppc64.o

Again, don't put these kinds of CPU-specific functions
into the infiniband driver.  They are potentially globally
useful, not something only Infiniband might want to do.




Re: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-06-29 Thread David Miller
From: Bryan O'Sullivan [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 15:01:39 -0700

 The support for write combining in the kernel is not in a state where
 that makes any sense at the moment.

Please fix the generic code if it doesn't provide the facility
you need at the moment.  Don't shoehorn it into your driver
just to make up for that.




Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss

2006-06-29 Thread David Miller
From: Bryan O'Sullivan [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 14:59:37 -0700

 It could, indeed.  In fact, we had that discussion here before I sent
 this patch in.  It presumably wants to live in lib/, and acquire a more
 generic name.  What name will capture the uncached-read-but-cached-write
 semantics in a useful fashion?  memcpy_nc?

I'm not good with names :-)

Note that there also might be cases where using such a memcpy
variant might be the wrong thing to do.  For example, for a very
tightly coupled CMT CPU implementation which has the memory controller,
L2 cache, PCI controller, etc. all on the same die and the PCI controller
makes use of the L2 cache just like the cpu threads do, using this
kind of memcpy would always be the wrong thing to do.




Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss

2006-06-29 Thread David Miller
From: Bryan O'Sullivan [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 16:34:23 -0700

 I'm not quite following you, though I assume you're referring to Niagara
 or Rock :-)  Are you saying a memcpy_nc would do worse than plain
 memcpy, or worse than some other memcpy-like routine?

It would do worse than memcpy.

If you bypass the L2 cache, it's pointless because the next
agent (PCI controller, CPU thread, etc.) is going to need the
data in the L2 cache.

It's better in that kind of setup to eat the L2 cache miss overhead in
memcpy since memcpy can usually prefetch and store buffer in order to
absorb some of the L2 miss costs.




Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss

2006-06-29 Thread David Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 17:28:50 -0700

 I thought that most PCI controllers (that is to say the things bridging 
 PCI to the rest of the system) could do prefetching and/or that PCI-X 
 (if not PCI, no idea about PCI-e) cards could issue multiple 
 transactions anyway?

People doing deep CMT chips have found out that all of that
prefetching and store buffering is unnecessary when everything is so
tightly integrated.

All of the previous UltraSPARC boxes before Niagara had a
streaming cache sitting on the PCI controller.  It basically
prefetched for reads and collected writes from PCI devices
into cacheline sized chunks.

The PCI controller in the current Niagara systems has none of that
stuff.
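The write-collecting half of that streaming cache can be modeled in a few lines. This is a hypothetical simplification, not a description of the UltraSPARC hardware: the 64-byte line size and all names are assumptions, and read prefetching is not modeled. It shows why collecting helps: sixteen 8-byte DMA writes become two full-line memory transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64u   /* illustrative cacheline size */

/* Hypothetical write-collecting buffer: small sequential writes are
 * merged into one cacheline and pushed out as a single transaction. */
struct wc_buf {
    unsigned char line[LINE_SIZE];
    size_t fill;        /* bytes collected in the current line */
    size_t flushes;     /* line-sized transactions issued      */
};

static void wc_flush(struct wc_buf *b)
{
    if (b->fill) {
        /* Real hardware would write b->line to memory here. */
        b->flushes++;
        b->fill = 0;
    }
}

static void wc_write(struct wc_buf *b, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(b != NULL && (data != NULL || len == 0));
    while (len) {
        size_t room = LINE_SIZE - b->fill;
        size_t n = len < room ? len : room;

        memcpy(b->line + b->fill, p, n);
        b->fill += n;
        p += n;
        len -= n;
        if (b->fill == LINE_SIZE)
            wc_flush(b);
    }
}
```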




Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss

2006-06-29 Thread David Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 29 Jun 2006 17:44:05 -0700

 Then is prefetching in memcpy really that important to them?

Not really, the thread just blocks while waiting for memory.
On stores they do a cacheline fill optimization similar to
the PowerPC.

 Relying on PCI-X devices to issue multiple requests then?

Perhaps :)




Re: [openib-general] [PATCH 0/2][RFC] Network Event Notifier Mechanism

2006-06-21 Thread David Miller

Most of the folks capable of reviewing networking changes
listen in on the netdev@vger.kernel.org mailing list, not
here.

Thanks.
