Re: [openib-general] [PATCH 1/6] Add pci_find_ht_capability() for finding Hypertransport capabilities

2006-11-09 Thread Segher Boessenkool
 +int pci_find_next_ht_capability(struct pci_dev *dev, int pos, int ht_cap)
 +{
 + int rc;
 + u8 cap, mask;
 +
 + if (ht_cap == HT_CAPTYPE_SLAVE || ht_cap == HT_CAPTYPE_HOST)
 + mask = HT_3BIT_CAP_MASK;
 + else
 + mask = HT_5BIT_CAP_MASK;
 +

+   pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_HT);

This line is needed here, or the caller will loop forever if a second
HT capability of the same type is found.

 + while (pos) {
 + rc = pci_read_config_byte(dev, pos + 3, &cap);
 + if (rc != PCIBIOS_SUCCESSFUL)
 + return 0;
 +
 + if ((cap & mask) == ht_cap)
 + return pos;
 +
 + pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_HT);
 + }
 +
 + return 0;
 +}
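
The "advance before testing" point above can be tried outside the kernel. This is only a minimal user-space sketch, not the kernel code: `cfg` is a hypothetical fake config space, `next_cap()` is a stand-in for pci_find_next_capability(), and the mask and type values are assumptions about what the HT_* defines contain. It shows that stepping past `pos` first lets a caller enumerate two capabilities of the same type and still terminate.

```c
#include <stdint.h>

#define CAP_ID_HT        0x08  /* PCI capability ID for HyperTransport */
#define HT_3BIT_CAP_MASK 0xE0  /* assumed value */
#define HT_5BIT_CAP_MASK 0xF8  /* assumed value */
#define HT_CAPTYPE_SLAVE 0x00  /* assumed value */
#define HT_CAPTYPE_HOST  0x20  /* assumed value */
#define HT_CAPTYPE_MSI   0xA8  /* assumed value for MSI mapping */

static uint8_t cfg[256];       /* hypothetical fake config space */

/* Stand-in for pci_find_next_capability(): follow the "next" pointer
 * stored at pos + 1 until an entry with the requested ID turns up. */
static int next_cap(int pos, uint8_t id)
{
    int ttl = 48;

    pos = cfg[pos + 1];
    while (pos && ttl--) {
        if (cfg[pos] == id)
            return pos;
        pos = cfg[pos + 1];
    }
    return 0;
}

/* HT walker with the suggested fix: advance first, then test. */
static int next_ht_cap(int pos, uint8_t ht_cap)
{
    uint8_t mask = (ht_cap == HT_CAPTYPE_SLAVE || ht_cap == HT_CAPTYPE_HOST)
                   ? HT_3BIT_CAP_MASK : HT_5BIT_CAP_MASK;

    pos = next_cap(pos, CAP_ID_HT);           /* step past the current entry */
    while (pos) {
        if ((cfg[pos + 3] & mask) == ht_cap)  /* type field is in byte 3 */
            return pos;
        pos = next_cap(pos, CAP_ID_HT);
    }
    return 0;
}

/* Chain: 0x30 (non-HT head) -> 0x40 (MSI) -> 0x50 (host) -> 0x60 (MSI). */
static void build_chain(void)
{
    cfg[0x30] = 0x01;      cfg[0x31] = 0x40;
    cfg[0x40] = CAP_ID_HT; cfg[0x41] = 0x50; cfg[0x43] = HT_CAPTYPE_MSI;
    cfg[0x50] = CAP_ID_HT; cfg[0x51] = 0x60; cfg[0x53] = HT_CAPTYPE_HOST;
    cfg[0x60] = CAP_ID_HT; cfg[0x61] = 0x00; cfg[0x63] = HT_CAPTYPE_MSI | 0x01;
}

/* Count capabilities of one type the way a caller's loop would. */
static int count_ht_caps(int head, uint8_t ht_cap)
{
    int n = 0;
    int pos = next_ht_cap(head, ht_cap);

    while (pos) {
        n++;
        pos = next_ht_cap(pos, ht_cap);
    }
    return n;
}
```

Without the first `next_cap()` call in `next_ht_cap()`, the second lookup would return 0x40 again and `count_ht_caps()` would never finish.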


Segher


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH 5/6] Use pci_find_ht_capability() in drivers/pci/quirks.c

2006-11-09 Thread Brice Goglin
You don't have any TTL in the while loop below, nor in the while
loop in pci_find_next_ht_capability(). It's paranoid, but I'd rather
keep a TTL in both loops (a brain-damaged capability chain in the PCI
config space could lead to an infinite loop without any clue of what's
going on, not easy to find out...).
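
To make the concern concrete, here is a small user-space sketch of a TTL-bounded walk. It is an illustration under assumptions, not the kernel code: `cfg` is a hypothetical in-memory copy of the config space, and the bound of 48 is simply the value the old quirk loop used.

```c
#include <stdint.h>

#define CAP_ID_HT 0x08  /* PCI capability ID for HyperTransport */
#define WALK_TTL  48    /* same bound as the old quirk loop */

/* Walk a capability chain in `cfg` (at least 256 bytes), counting HT
 * entries.  Returns the count, or -1 if more than WALK_TTL entries were
 * visited, which suggests a looping or corrupt chain. */
static int count_ht_caps_bounded(const uint8_t *cfg, int first)
{
    int pos = first;
    int ttl = WALK_TTL;
    int n = 0;

    for (;;) {
        pos &= ~3;          /* capabilities are dword aligned */
        if (pos < 0x40)     /* end of list, or a pointer into the header */
            return n;
        if (ttl-- == 0)     /* budget used up: give up instead of spinning */
            return -1;
        if (cfg[pos] == CAP_ID_HT)
            n++;
        pos = cfg[pos + 1]; /* follow the "next" pointer */
    }
}
```

A chain whose last entry points back at an earlier one returns -1 here instead of hanging, which is the behaviour the TTL is meant to buy.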

Apart from that, I like the idea.

Brice



Michael Ellerman wrote:
 Use pci_find_ht_capability() in drivers/pci/quirks.c.

 I'm pretty sure the logic is unchanged here, but someone please eye-ball it
 for me. I've changed the message to be a little shorter, it's now:

 PCI: Found (enabled|disabled) HT MSI mapping on :xx:xx.x

 Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
 ---

  drivers/pci/quirks.c |   27 +++
  1 file changed, 15 insertions(+), 12 deletions(-)

 Index: msi/drivers/pci/quirks.c
 ===
 --- msi.orig/drivers/pci/quirks.c
 +++ msi/drivers/pci/quirks.c
 @@ -1724,19 +1724,22 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AM
   * return 1 if a HT MSI capability is found and enabled */
  static int __devinit msi_ht_cap_enabled(struct pci_dev *dev)
  {
 - u8 pos;
 - int ttl;
 - for (pos = pci_find_capability(dev, PCI_CAP_ID_HT), ttl = 48;
 -  pos && ttl;
 -  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_HT), ttl--) {
 - u32 cap_hdr;
 - /* MSI mapping section according to Hypertransport spec */
 - if (pci_read_config_dword(dev, pos, &cap_hdr) == 0
 -  && (cap_hdr & 0xf800) == 0xa800 /* MSI mapping */) {
 - printk(KERN_INFO "PCI: Found HT MSI mapping on %s with capability %s\n",
 -pci_name(dev), cap_hdr & 0x1 ? "enabled" : "disabled");
 - return (cap_hdr & 0x1) != 0; /* MSI mapping cap enabled */
 + int pos;
 +
 + pos = pci_find_ht_capability(dev, HT_CAPTYPE_MSI_MAPPING);
 + while (pos) {
 + u8 flags;
 +
 + if (pci_read_config_byte(dev,
 + pos + HT_MSI_FLAGS, &flags) == 0) {
 + printk(KERN_INFO "PCI: Found %s HT MSI Mapping on %s\n",
 + flags & HT_MSI_FLAGS_ENABLE ?
 + "enabled" : "disabled", pci_name(dev));
 + return (flags & HT_MSI_FLAGS_ENABLE) != 0;
   }
 +
 + pos = pci_find_next_ht_capability(dev, pos,
 + HT_CAPTYPE_MSI_MAPPING);
   }
   return 0;
  }
   





Re: [openib-general] [PATCH 5/6] Use pci_find_ht_capability() in drivers/pci/quirks.c

2006-11-09 Thread Segher Boessenkool
 You don't have any TTL in the while loop below, nor in the while
 loop in pci_find_next_ht_capability(). It's paranoid, but I'd rather
 keep a TTL in both loops (a brain-damaged capability chain in the PCI
 config space could lead to an infinite loop without any clue of what's
 going on, not easy to find out...).

There's so many other ways broken PCI headers can cause
problems, it's just not funny.  You can't catch all of
them however hard you try.

I always thought the super-over-the-top paranoia checks
in the generic PCI capability list walkers were workarounds
for problems actually observed in the field; can we do the
same for the HT-specific walker?  I.e., don't implement
the workaround before we know we need it.


Segher





[openib-general] Fwd: IPoIB new multicast API patches oops

2006-11-09 Thread Michael S. Tsirkin
Following Sean's suggestion, I let the nightly tests run with Roland's mad
patch in addition to Sean's new multicast interface patches (v2), and got the
following crash:

Nov  9 10:59:43 sw084 kernel: NET: Unregistered protocol family 27
Nov  9 10:59:46 sw084 net.agent[25786]: remove event not handled
Nov  9 10:59:46 sw084 net.agent[25787]: remove event not handled
Nov  9 10:59:46 sw084 net.agent[25818]: remove event not handled
Nov  9 10:59:46 sw084 net.agent[25814]: remove event not handled
Nov  9 10:59:46 sw084 kernel: BUG: spinlock bad magic on CPU#1, ib_mad2/1588
Nov  9 10:59:46 sw084 kernel: general protection fault:  [1] SMP
Nov  9 10:59:46 sw084 kernel: CPU 1
Nov  9 10:59:46 sw084 kernel: Modules linked in: nfsd exportfs ipv6 parport_pc 
lp parport autofs4 nfs lockd nfs_acl sunrpc vfat fat dm_mirror dm_mod button b
attery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core ib_mthca ib_umad ib_sa ib_mad 
ib_core tg3 ext3 jbd sata_nv libata mptsas scsi_transport_sas sd_mod
Nov  9 10:59:46 sw084 kernel: Pid: 1588, comm: ib_mad2 Not tainted 2.6.17.7 #3
Nov  9 10:59:46 sw084 kernel: RIP: 0010:[802ddc40] 
802ddc40{spin_bug+116}
Nov  9 10:59:46 sw084 kernel: RSP: 0018:81013c611ca8  EFLAGS: 00010002
Nov  9 10:59:46 sw084 kernel: RAX: 6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 
8044c057
Nov  9 10:59:46 sw084 kernel: RDX: 804a7f18 RSI: 0046 RDI: 
804a7f00
Nov  9 10:59:46 sw084 kernel: RBP: 81013c48a320 R08:  R09: 

Nov  9 10:59:46 sw084 kernel: R10: 0001 R11: 802160df R12: 
81013c48a318
Nov  9 10:59:46 sw084 kernel: R13: 0212 R14:  R15: 
8808f2d8
Nov  9 10:59:46 sw084 kernel: FS:  2af8d7bd8b00() 
GS:81013fc616d0() knlGS:f7f796c0
Nov  9 10:59:46 sw084 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Nov  9 10:59:46 sw084 kernel: CR2: 005b719c CR3: 00013ad5c000 CR4: 
06e0
Nov  9 10:59:46 sw084 kernel: Process ib_mad2 (pid: 1588, threadinfo 
81013c61, task 81013f5c29f0)
Nov  9 10:59:46 sw084 kernel: Stack: 0003 81013c48a320 
81013c48a320 802ddc8d
Nov  9 10:59:46 sw084 kernel:81013bb722f0 81013c48a320 
81013c48a318 80428b2b
Nov  9 10:59:46 sw084 kernel:0246 88097eff
Nov  9 10:59:46 sw084 kernel: Call Trace: 802ddc8d{_raw_spin_lock+28} 
80428b2b{_spin_lock_irqsave+11}
Nov  9 10:59:46 sw084 kernel:
88097eff{:ib_sa:release_group+27} 
88098903{:ib_sa:mcast_work_handler+1280}
Nov  9 10:59:46 sw084 kernel:80428bd7{_spin_unlock_irq+7} 
8808f2d8{:ib_mad:timeout_sends+0}
Nov  9 10:59:46 sw084 kernel:
880978c3{:ib_sa:ib_sa_mcmember_rec_callback+64}
Nov  9 10:59:46 sw084 kernel:80428bd7{_spin_unlock_irq+7} 
88097ac4{:ib_sa:send_handler+74}
Nov  9 10:59:46 sw084 kernel:
8808f465{:ib_mad:timeout_sends+397} 
80238450{run_workqueue+161}
Nov  9 10:59:46 sw084 kernel:8023849a{worker_thread+0} 
8023b444{keventd_create_kthread+0}
Nov  9 10:59:46 sw084 kernel:8023859f{worker_thread+261} 
80223ddd{default_wake_function+0}
Nov  9 10:59:46 sw084 kernel:
8023b444{keventd_create_kthread+0} 
80223ddd{default_wake_function+0}
Nov  9 10:59:46 sw084 kernel:
8023b444{keventd_create_kthread+0} 8023b41b{kthread+200}
Nov  9 10:59:46 sw084 kernel:8020a6a6{child_rip+8} 
8023b444{keventd_create_kthread+0}
Nov  9 10:59:46 sw084 kernel:8023b353{kthread+0} 
8020a69e{child_rip+0}
Nov  9 10:59:46 sw084 kernel:
Nov  9 10:59:46 sw084 kernel: Code: 44 8b 83 04 01 00 00 48 8d 8b a0 02 00 00 
8b 55 04 41 89 c1
Nov  9 10:59:46 sw084 kernel: RIP 802ddc40{spin_bug+116} RSP 
81013c611ca8
Nov  9 10:59:46 sw084 kernel:  3BUG: sleeping function called from invalid 
context at include/linux/rwsem.h:43
Nov  9 10:59:46 sw084 kernel: in_atomic():0, irqs_disabled():1
Nov  9 10:59:46 sw084 kernel:
Nov  9 10:59:46 sw084 kernel: Call Trace: 80221db2{__might_sleep+190} 
802355ef{blocking_notifier_call_chain+31}
Nov  9 10:59:46 sw084 kernel:8022b856{do_exit+34} 
80428b2b{_spin_lock_irqsave+11}
Nov  9 10:59:46 sw084 kernel:
802eb925{vgacon_set_cursor_size+51} 
8808f2d8{:ib_mad:timeout_sends+0}
Nov  9 10:59:46 sw084 kernel:8020afc3{do_divide_error+0} 
8042964c{do_general_protection+254}
Nov  9 10:59:46 sw084 kernel:8020a4ed{error_exit+0} 
8808f2d8{:ib_mad:timeout_sends+0}
Nov  9 10:59:46 sw084 kernel:802160df{flat_send_IPI_mask+0} 
802ddc40{spin_bug+116}
Nov  9 10:59:46 sw084 kernel:802ddc2d{spin_bug+97} 
802ddc8d{_raw_spin_lock+28}
Nov  9 

Re: [openib-general] [mvapich-discuss] This is the last time I'm asking...

2006-11-09 Thread Jeff Squyres
DK --

Are you going to answer my questions?


On Nov 6, 2006, at 11:27 AM, Jeff Squyres wrote:

 As I explained in my mail, no one had replied to any of the posts  
 containing my very directed and specific questions (not even you --  
 and you still haven't), so I figured that no one cared.  That's not  
 an unreasonable assumption given that I posted the same questions 3  
 times and got silence in return.

 I am unaware of any special right required to make a motion.  Are  
 there some protocols (perhaps a la Robert's Rules of Order) that  
 are typically used for making a motion?  I haven't seen any...?

 The agenda for the SC Developer's Summit is already over-full.   
 This conversation is fine to begin in e-mail; a good start would be  
 answering my original questions.  Thanks!


 On Nov 6, 2006, at 9:53 AM, Dhabaleswar Panda wrote:

 Jeff:

 May I know by what `right' you are making this motion to remove
 the code?

 To have the code there was decided by the OpenIB community and the
 organizers. It needs to be decided by the community, not by an
 individual person.

 Let me suggest that we discuss this at the Developers Summit at SC
 '06.  If the Open Fabrics community no longer wants the code to be
 there and will prefer to download it from the OSU SVN site, we can
 proceed accordingly.

 Thanks,

 DK


 Having received no replies for 2 weeks as to why it is useful to  
 have
 MVAPICH in the OpenFabrics SVN, I can only conclude that no one
 cares.  If someone does care, please respond to my original  
 questions
 included below ASAP (originally posted 23 Oct, 27 Oct, 1 Nov).

 I therefore make the motion to remove MVAPICH from the OpenFabrics
 SVN (all the source is still available via the OSU SVN and other
 distribution points).  Specifically, I motion to do the following
 around COB tomorrow (7 Nov 2006):

  svn rm https://openib.org/svn/gen2/trunk/src/userspace/mpi

 Any objections?



 On Nov 1, 2006, at 10:53 AM, Jeff Squyres wrote:

 Forwarding this to the mvapich-discuss list because it has gotten
 zero replies on the openib-general list.  If someone from OSU could
 reply, it would be most helpful.  Thanks.


 Begin forwarded message:

 From: Jeff Squyres [EMAIL PROTECTED]
 Date: October 27, 2006 11:05:17 AM EDT
 To: openib openib-general@openib.org
 Subject: Re: [mvapich] Announcing the release of MVAPICH2 0.9.6
 with on-demand connection management, multi-core optimized shared
 memory communication and memory hook support

 Any response from the OSU crew?

 Can someone provide a reason why MVAPICH is still in OpenIB's
 Subversion repository?  Please see my original mail, below, for
 more detailed questions.

 Thanks.


 On Oct 23, 2006, at 7:36 AM, Jeff Squyres wrote:

 On Oct 22, 2006, at 11:53 PM, Dhabaleswar Panda wrote:

 A stripped down version of this release is also available at the
 OpenIB SVN.

 I see this statement in every MVAPICH release notice and it
 continues to puzzle me.

 I understand that there was a use for an alternate distribution
 source before MVAPICH became open source.  But now that the
 MVAPICH code bases are freely available from OSU via multiple
 mechanisms (anonymous SVN, tarball download, etc.), why is a
 stripped down version maintained in the OpenIB SVN?

 1. What, exactly, is the difference between the MVAPICH available
 from OSU and the stripped down version in the OpenIB SVN?

 2. Why would someone choose to download the stripped down
 version from the OpenIB SVN?  Have any real users/customers done
 so?

 3. What is the point of maintaining yet more flavors of MVAPICH
 -- aren't there enough already (multiple versions from OSU,  
 more versions available from each IB vendor)?

 DK -- can you please explain?  Thanks.

 -- 
 Jeff Squyres
 Server Virtualization Business Unit
 Cisco Systems


 ___
 mvapich-discuss mailing list
 [EMAIL PROTECTED]
 http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems





Re: [openib-general] [PATCH 5/6] Use pci_find_ht_capability() in drivers/pci/quirks.c

2006-11-09 Thread Greg KH
On Thu, Nov 09, 2006 at 10:43:25AM +0100, Segher Boessenkool wrote:
 You don't have any TTL in the while loop below, nor in the while
 loop in pci_find_next_ht_capability(). It's paranoid, but I'd rather
 keep a TTL in both loops (a brain-damaged capability chain in the PCI
 config space could lead to an infinite loop without any clue of what's
 going on, not easy to find out...).
 
 There's so many other ways broken PCI headers can cause
 problems, it's just not funny.  You can't catch all of
 them however hard you try.
 
 I always thought the super-over-the-top paranoia checks
 in the generic PCI capability list walkers were workarounds
 for problems actually observed in the field; can we do the
 same for the HT-specific walker?  I.e., don't implement
 the workaround before we know we need it.

While yes, we should not in general add new workarounds before we need
them, for this quirk, you should keep the original functionality, unless
you wrote the quirk, or unless you have the hardware that needs it and
you can verify that the change works properly.

Are any of these last two options true for you?

If not, I suggest that you put the TTL logic back in just to be safe.

thanks,

greg k-h




Re: [openib-general] [PATCH 5/6] Use pci_find_ht_capability() in drivers/pci/quirks.c

2006-11-09 Thread Segher Boessenkool
 While yes, we should not in general add new workarounds before we need
 them, for this quirk, you should keep the original functionality,  
 unless
 you wrote the quirk, or unless you have the hardware that needs it and
 you can verify that the change works properly.

 Are any of these last two options true for you?

This new code only runs on HyperTransport devices and
none of those _existed_ when the quirk was first written.
I cannot claim I know for sure it is never needed there
of course, but it's quite improbable at least.

 If not, I suggest that you put the TTL logic back in just to be safe.

I'm fine with that -- but I'm not writing the code here,
Michael is, and I just hope he has more spine than I do ;-)


Segher





Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-09 Thread Or Gerlitz
Sean Hefty wrote:
 Or Gerlitz wrote:
 Can be very nice if you share with the community the IB stack issues 
 revealed under scale-out testing... basically what was the testbed?

 We have a 256 node (512 processors) cluster that we can test with on the 
 second Tuesday following the first Monday of any month with two full 
 moons.  We're only now getting some time on the cluster, and our test 
 capabilities are limited.

 The main issue that we saw was that the SA simply doesn't scale.

I see. Thanks for the detailed response, and sorry for not replying so
far; I was too busy...

Your email describes the problem under the all-to-all connection model.
My thinking is that this design is the first thing to revisit; I
understand that Open MPI opens connections on demand (at this point in
time it does not use the IB stack connection management services
either). Even in the all-to-all connection model, a question to ask is
whether the connecting is done in N phases, or whether for all ranks
you just call in a loop

for(j=i+1; j<n; j++)
dat_ep_connect(ep[j], ip-address of peer j)


and then

while(there are more non established connections)
dat_evd_wait(...)

 At 5000 queries per second, it will take the SA nearly 30 seconds to 
 respond to the first set of requests, most of which will have timed 
 out.  By the time it reached the end of the first 130,000 requests, it 
 had hundreds of thousands of queued retries, most of which had also 
 already timed out.  (E.g. even with a exponential backoff, you'd have 
 retries at 4 seconds, 12 seconds, and 28 seconds before the SA can 
 finish processing the first set of requests.)
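
 The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes 512 ranks, one SA query per pair of ranks, and a retry timeout that starts at 4 seconds and doubles each time; none of these assumptions is stated outright in the mail, and the function names are made up for the illustration.

```c
/* Number of connections in an all-to-all pattern among n ranks. */
static long all_to_all_pairs(long n)
{
    return n * (n - 1) / 2;
}

/* Seconds the SA needs to work through `queries` at `rate` queries/second. */
static double drain_seconds(long queries, double rate)
{
    return (double)queries / rate;
}

/* Time in seconds of the k-th retry (k = 1, 2, ...) when the timeout
 * starts at `first` seconds and doubles after every retry:
 * first * (2^k - 1). */
static long retry_time(long first, int k)
{
    return first * ((1L << k) - 1);
}
```

 With these assumptions, all_to_all_pairs(512) gives 130816 requests, drain_seconds(130816, 5000.0) is a little over 26 seconds, and retry_time(4, k) for k = 1, 2, 3 gives 4, 12 and 28 seconds, so every request has been retried several times before the SA finishes the first pass.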

 To further complicate the issue, retried requests are given new 
 transaction IDs by the ib_sa module, which makes it impossible for the 
 SA to detect retries from original requests.  It sees all requests as 
 new.  On our largest run, we were never able to complete route resolution.

OK, I recall a patch or RFC you posted which lets a response to the
original request match a pending retry; basically it means that all
the retries use the TID of the original request, correct? Am I dreaming,
or is this indeed somewhere in the pipeline to the kernel?

 We're still exploring possibilities in this area.

Or.





[openib-general] what should happen in a completion event channel is being destroyed when there are several CQs associated to it?

2006-11-09 Thread Dotan Barak
Hi.

What should happen if a completion event channel is destroyed while
several CQs are still associated with it?
Should this operation fail (return EBUSY)?
Should this operation succeed?
Is it legal for a user to perform this operation?

When I tried this and later waited for a completion on this
event channel, I got a seg fault...

thanks
Dotan




Re: [openib-general] Installation on openSUSE 10.2 Beta1 fails

2006-11-09 Thread Vladimir Sokolovsky
Hello Diego,
Check that you have libstdc++, libstdc++-devel and compat-libstdc++ RPMs 
installed.

Regards,
Vladimir

Diego Guella wrote:

 From: Tziporet Koren
 The failing utility is used for IPoIB high availability. If you 
 don't need it you can just change this line in ofed.conf:
 ipoibtools=n

 Tziporet

 Thanks Tziporet for your answer.


 I tried it just now with ipoibtools disabled. I get another, 
 stranger error:
 (attached OFED.3816.log)
 -
 /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
 cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
 cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
 Running: ./configure 
 --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache 
 --disable-libcheck --prefix /usr/local/ofed --libdir 
 /usr/local/ofed/lib CPPFLAGS=-I../libibverbs/include
 configure: creating cache 
 /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
 checking for a BSD-compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking for gawk... gawk
 checking whether make sets $(MAKE)... yes
 checking build system type... x86_64-unknown-linux-gnu
 checking host system type... x86_64-unknown-linux-gnu
 checking for style of include used by make... GNU
 checking for gcc... gcc
 checking for C compiler default output file name... configure: error: 
 C compiler cannot create executables
 See `config.log' for more details.
 Failed to execute: ./configure 
 --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache 
 --disable-libcheck --prefix /usr/local/ofed --libdir 
 /usr/local/ofed/lib CPPFLAGS=-I../libibverbs/include
 error: Bad exit status from /var/tmp/rpm-tmp.46102 (%install)
 -

 Am I right? It says my C compiler cannot create executables. Is it 
 kidding me?
 In the log file, line 6393, it says:
 -
 checking for C compiler default output file name... a.out
 -

 I don't understand!
 Is there something I can do to fix this?


 Thanks,
 Diego
 






Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-09 Thread Sean Hefty
Or Gerlitz wrote:
 for(j=i+1; j<n; j++)
 dat_ep_connect(ep[j], ip-address of peer j)
 
 
 and then
 
 while(there are more non established connections)
dat_evd_wait(...)

I'm not overly familiar with the MPI code, so I can't comment on the 
implementation.

 OK, i recall some patch or rfc you have posted which enables a response 
 on original request match a pending retry, basically it means that all 
 the retries use the TID of the original request, correct? am i dreaming 
 so this is indeed somewhere in the pipe to the kernel?

I have a patch that exposed the mad layer retry count up through the SA query 
code.  However, I'm not sure that it helps us all that much without additional 
changes.  Detecting duplicate requests is left as a responsibility to the 
receiver, and retries are issued using a linear timeout.

- Sean




[openib-general] Unusable QP's on CM established connections from gen2 client to gen1 server.

2006-11-09 Thread Bub Thomas
As written before, I have to connect a gen2 client with a gen1 server using CM.
The connection is established fine and both sides go into the connected state.
However, I can't send any data from either side.
As soon as I do so, I get a 0x81 VAPI_RETRY_EXC_ERR when trying to send from the gen1 side, or a vendor_err 0x81 when trying to send from the gen2 side.

What works so far with my gen1 and gen2 code:

Connecting gen1 client and gen1 server is no problem.
Connecting gen2 client and gen2 server is no problem.
Connecting gen1 client and gen2 server is no problem.

Unfortunately, the only usage scenario I have is to connect a gen2 client and a gen1 server.

Is there a way to diagnose the reason for trouble when a VAPI_SEND / IBV_WR_SEND returns that 0x81 VAPI_RETRY_EXC_ERR?
It seems as if the receiving side, which is definitely in receive mode, does not get any notification, as if the sender does not even start sending.

I already printed out the QP state of both the gen2 client and the gen1 server before the send fails. They look like this:


Gen2 client QP state.

            qp_state: 3
        cur_qp_state: 3
            path_mtu: 4
      path_mig_state: 0
                qkey: 16391
              rq_psn: 5571589
              sq_psn: 15025863
         dest_qp_num: 10814473
     qp_access_flags: 14
                 cap: 256
             ah_attr: 256
         alt_ah_attr: 29
          pkey_index: 3
      alt_pkey_index: 460
 en_sqd_async_notify: 33022
         sq_draining: 0
       max_rd_atomic: 82905088
  max_dest_rd_atomic: 219649539
       min_rnr_timer: 0
            port_num: 16384
             timeout: 50331753
           retry_cnt: 65795
           rnr_retry: 0
        alt_port_num: 0
         alt_timeout: 0



Gen1 server QP state.

            qp_state: 3
   en_sqd_asyn_notif: 0
         sq_draining: 0
              qp_num: 10814473
 remote_atomic_flags: 7
                qkey: 0
            path_mtu: 4
      path_mig_state: 0
              rq_psn: 15025863
              sq_psn: 5571589
      qp_ous_rd_atom: 4
     ous_dst_rd_atom: 4
       min_rnr_timer: 27
                 cap: 200
         dest_qp_num: 200
         sched_queue: 28
             pkey_ix: 28
                port: 460
                  av: 5571589
             timeout: 0
         retry_count: 0
           rnr_retry: 1
         alt_pkey_ix: 0
            alt_port: 0
              alt_av: 0
         alt_timeout: 0



In the good cases described above, the QP states look similar to the ones shown here.

Any help is welcome.

Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: 49 6150 104 147
Fax: 49 6150 104 656
Email: [EMAIL PROTECTED]
www.GrassValley.com






Re: [openib-general] Unusable QP's on CM established connections from gen2 client to gen1 server.

2006-11-09 Thread Sean Hefty
Bub Thomas wrote:
 As soon as I do so I get a 0x81 VAPI_RETRY_EXC_ERR when trying to send 
 from the gen1 or a vendor_err 0x81 when trying to send from the gen2 side.

It sounds like there's an issue with the QP configuration.  Maybe there's a 
difference between which byte-order the QP attributes are specified?

  rq_psn: 5571589
 
  sq_psn: 15025863
 
 dest_qp_num: 10814473

Can you verify the byte order for the values above?
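
One rough way to sanity-check byte order on values like these is sketched below. PSNs and QP numbers are 24-bit quantities, so a host-order value should have a zero top byte; if a printed value fails that test but its byte-swapped form passes, the byte order is suspect. The helper names are made up for this illustration, and the check is only a heuristic.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* A sane host-order PSN or QP number fits in 24 bits. */
static int fits_24bit(uint32_t v)
{
    return (v >> 24) == 0;
}

/* Returns 1 if `v` looks byte-swapped: it does not fit in 24 bits as
 * printed, but does once its bytes are reversed. */
static int looks_swapped(uint32_t v)
{
    return !fits_24bit(v) && fits_24bit(swap32(v));
}
```

For example, 5571589 is 0x550405, which fits in 24 bits as printed, while its swapped form 0x05045500 does not.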

I didn't see the local qp_num listed.

  pkey_index: 3

This isn't necessarily a problem, but I usually see a pkey index of 0.  Are you 
running with multiple partitions on your subnet?  What pkey value does this 
equate to? (cat /sys/class/infiniband/mthca0/ports/1/pkeys/3)

port_num: 16384

Is the displayed port_num valid?

 Gen1 server QP state.
 
  qp_num: 10814473
 
  rq_psn: 15025863
 
  sq_psn: 5571589
 
 dest_qp_num: 200

Can you verify the qp_num on the remote side?

 pkey_ix: 28

This index looks unlikely.

port: 460

Is this value valid?

 retry_count: 0

Try increasing this.

- Sean




Re: [openib-general] Fwd: IPoIB new multicast API patches oops

2006-11-09 Thread Sean Hefty
Michael S. Tsirkin wrote:
 Following Sean's suggestion, I have let the nightly tests run with Roland's 
 mad patch
 in addition to Sean's new multicast interface patches (v2), and got the
 following crash:

I have time now to try reproducing this.  Can you describe the test setup some? 
  Was opensm running?  Was this after loading/unloading ipoib?  Was this while 
trying to unload ib_sa?

- Sean




Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support

2006-11-09 Thread Venkatesh Babu
Hi Sean,

I have verified your changes and they work fine. I tried port 
failover on both the Active and Passive nodes, and it works.

Since you have not provided the ib_sa_serv_notice_hdlr() changes for the 
remote event notification, I am still using my patch. What are your plans 
for updating that? How did you test the failover on the Passive node?

VBabu

Sean Hefty wrote:

Memo to me: read comments about missing functionality...

Fixed an issue with the previous patch not having the right pkey
when forwarding LAP messages to the user.

With this patch, I'm able to fail over between two paths, reload
a new path, and fail again repeatedly using my test program.

Venkatesh, if you can verify that this code works for you, I will
request that it be queued for 2.6.20.

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1cf0d42..ed69573 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -147,12 +147,12 @@ struct cm_id_private {
   __be32 rq_psn;
   int timeout_ms;
   enum ib_mtu path_mtu;
+  __be16 pkey;
   u8 private_data_len;
   u8 max_cm_retries;
   u8 peer_to_peer;
   u8 responder_resources;
   u8 initiator_depth;
-  u8 local_ack_timeout;
   u8 retry_count;
   u8 rnr_retry_count;
   u8 service_timeout;
@@ -691,7 +691,7 @@ static void cm_enter_timewait(struct cm_
* timewait before notifying the user that we've exited timewait.
*/
   cm_id_priv->id.state = IB_CM_TIMEWAIT;
-  wait_time = cm_convert_to_ms(cm_id_priv->local_ack_timeout);
+  wait_time = cm_convert_to_ms(cm_id_priv->av.packet_life_time + 1);
   queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work,
  msecs_to_jiffies(wait_time));
   cm_id_priv->timewait_info = NULL;
@@ -1010,6 +1010,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_i
   cm_id_priv->responder_resources = param->responder_resources;
   cm_id_priv->retry_count = param->retry_count;
   cm_id_priv->path_mtu = param->primary_path->mtu;
+  cm_id_priv->pkey = param->primary_path->pkey;
   cm_id_priv->qp_type = param->qp_type;
 
   ret = cm_alloc_msg(cm_id_priv, &cm_id_priv->msg);
@@ -1024,8 +1025,6 @@ int ib_send_cm_req(struct ib_cm_id *cm_i
 
   cm_id_priv->local_qpn = cm_req_get_local_qpn(req_msg);
   cm_id_priv->rq_psn = cm_req_get_starting_psn(req_msg);
-  cm_id_priv->local_ack_timeout =
-  cm_req_get_primary_local_ack_timeout(req_msg);
 
   spin_lock_irqsave(&cm_id_priv->lock, flags);
   ret = ib_post_send_mad(cm_id_priv->msg, NULL);
@@ -1410,9 +1409,8 @@ static int cm_req_handler(struct cm_work
   cm_id_priv->initiator_depth = cm_req_get_resp_res(req_msg);
   cm_id_priv->responder_resources = cm_req_get_init_depth(req_msg);
   cm_id_priv->path_mtu = cm_req_get_path_mtu(req_msg);
+  cm_id_priv->pkey = req_msg->pkey;
   cm_id_priv->sq_psn = cm_req_get_starting_psn(req_msg);
-  cm_id_priv->local_ack_timeout =
-  cm_req_get_primary_local_ack_timeout(req_msg);
   cm_id_priv->retry_count = cm_req_get_retry_count(req_msg);
   cm_id_priv->rnr_retry_count = cm_req_get_rnr_retry_count(req_msg);
   cm_id_priv->qp_type = cm_req_get_qp_type(req_msg);
@@ -1716,7 +1714,7 @@ static int cm_establish_handler(struct c
   unsigned long flags;
   int ret;
 
-  /* See comment in ib_cm_establish about lookup. */
+  /* See comment in cm_establish about lookup. */
   cm_id_priv = cm_acquire_id(work->local_id, work->remote_id);
   if (!cm_id_priv)
   return -EINVAL;
@@ -2402,11 +2400,16 @@ int ib_send_cm_lap(struct ib_cm_id *cm_i
   cm_id_priv = container_of(cm_id, struct cm_id_private, id);
   spin_lock_irqsave(&cm_id_priv->lock, flags);
   if (cm_id->state != IB_CM_ESTABLISHED ||
-  cm_id->lap_state != IB_CM_LAP_IDLE) {
+  (cm_id->lap_state != IB_CM_LAP_UNINIT &&
+   cm_id->lap_state != IB_CM_LAP_IDLE)) {
   ret = -EINVAL;
   goto out;
   }
 
+  ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av);
+  if (ret)
+  goto out;
+
   ret = cm_alloc_msg(cm_id_priv, &msg);
   if (ret)
   goto out;
@@ -2431,7 +2434,8 @@ out: spin_unlock_irqrestore(cm_id_priv-
 }
 EXPORT_SYMBOL(ib_send_cm_lap);
 
-static void cm_format_path_from_lap(struct ib_sa_path_rec *path,
+static void cm_format_path_from_lap(struct cm_id_private *cm_id_priv,
+  struct ib_sa_path_rec *path,
   struct cm_lap_msg *lap_msg)
 {
   memset(path, 0, sizeof *path);
@@ -2443,10 +2447,10 @@ static void cm_format_path_from_lap(stru
   path-hop_limit = lap_msg-alt_hop_limit;
   path-traffic_class = cm_lap_get_traffic_class(lap_msg);
   path-reversible = 1;
-  /* pkey is same as in REQ */
+  path-pkey 

[openib-general] ANNOUNCE: libmthca 1.0.3

2006-11-09 Thread Roland Dreier
I've just made a new 1.0.3 release of libmthca and pushed it out to
the relevant channel, which means that it should appear on
http://openib.org/downloads/ shortly.  Binary packages will also
appear in Debian and Fedora Extras when the builds complete.

Changes since 1.0.2 include:

 - Add Valgrind annotations, enabled with configure --with-valgrind.
 - fork() support when built against the libibverbs development branch.
 - Various fixes and cleanups.

See the ChangeLog in the package for full details.

Thanks,
  Roland




[openib-general] [PATCH] amso1100: Fix typo

2006-11-09 Thread Jean Delvare
Fix the AMSO1100 firmware version computation, which was broken
due to "&&" being used where "&" should have.

Signed-off-by: Jean Delvare [EMAIL PROTECTED]
---
 drivers/infiniband/hw/amso1100/c2_rnic.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.19-rc5.orig/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-11-09 10:30:33.000000000 +0100
+++ linux-2.6.19-rc5/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-11-09 20:50:28.000000000 +0100
@@ -157,8 +157,8 @@
 
 	props->fw_ver =
 		((u64)be32_to_cpu(reply->fw_ver_major) << 32) |
-		((be32_to_cpu(reply->fw_ver_minor) && 0xFFFF) << 16) |
-		(be32_to_cpu(reply->fw_ver_patch) && 0xFFFF);
+		((be32_to_cpu(reply->fw_ver_minor) & 0xFFFF) << 16) |
+		(be32_to_cpu(reply->fw_ver_patch) & 0xFFFF);
 	memcpy(&props->sys_image_guid, c2dev->netdev->dev_addr, 6);
 	props->max_mr_size = 0xFFFFFFFF;
 	props->page_size_cap   = ~(C2_MIN_PAGESIZE-1);


-- 
Jean Delvare
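
For readers following the thread, here is a minimal userspace sketch of why this typo matters. It assumes the intended masks are 16 bits wide (0xFFFF). The function names fw_ver_buggy() and fw_ver_fixed() are hypothetical, and the stdint types stand in for the kernel's u64/u32. A logical "&&" collapses each field to 0 or 1, while a bitwise "&" keeps the low 16 bits.

```c
#include <stdint.h>

/* Buggy form: "&&" is a logical AND, so each field becomes 0 or 1. */
static uint64_t fw_ver_buggy(uint32_t major, uint32_t minor, uint32_t patch)
{
	uint32_t mask = 0xFFFF;	/* assumed 16-bit field mask */

	return ((uint64_t)major << 32) |
	       ((uint64_t)(minor && mask) << 16) |
	       (uint64_t)(patch && mask);
}

/* Fixed form: "&" is a bitwise AND, so the low 16 bits survive. */
static uint64_t fw_ver_fixed(uint32_t major, uint32_t minor, uint32_t patch)
{
	uint32_t mask = 0xFFFF;	/* assumed 16-bit field mask */

	return ((uint64_t)major << 32) |
	       ((uint64_t)(minor & mask) << 16) |
	       (uint64_t)(patch & mask);
}
```

With a hypothetical version 1.4.2, fw_ver_buggy(1, 4, 2) packs to 0x0000000100010001 (reported as 1.1.1), whereas fw_ver_fixed(1, 4, 2) packs to 0x0000000100040002.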




Re: [openib-general] what should happen in a completion event channel is being destroyed when there are several CQs associated to it?

2006-11-09 Thread Roland Dreier
 > What should happen in a completion event channel is being destroyed
 > when there are several CQs associated to it?
 > Should this operation fail (return EBUSY)?

I think that would be the most consistent thing, since we return EBUSY
for example if a CQ is destroyed with QPs still attached.

 > When i tried to do it and later on try to wait for a completion on
 > this event channel i got seg fault...

Does the destroy succeed?

Anyway I'll look at this code to see if it seems OK.

 - R.
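
A rough sketch of the semantics being proposed here, not the actual libibverbs or kernel code: destroy is refused with EBUSY while dependent objects are still attached. The struct channel type and the attach_cq(), detach_cq() and destroy_channel() helpers are hypothetical names used only for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for a completion event channel. */
struct channel {
	int cq_count;	/* number of CQs still associated with the channel */
	int destroyed;	/* set once the channel has been torn down */
};

static void attach_cq(struct channel *ch) { ch->cq_count++; }
static void detach_cq(struct channel *ch) { ch->cq_count--; }

/* Refuse to destroy the channel while any CQ still refers to it. */
static int destroy_channel(struct channel *ch)
{
	if (ch->cq_count > 0)
		return EBUSY;	/* caller must destroy the CQs first */
	ch->destroyed = 1;
	return 0;
}
```

Under this scheme a later wait on the channel can never touch freed state, because the channel outlives every CQ that could deliver events to it.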




Re: [openib-general] [PATCH 1/1] Unitialized pseudo_netdev accessed in c2_register_device

2006-11-09 Thread Roland Dreier
thanks, queued for 2.6.19




Re: [openib-general] [PATCH] amso1100: Fix typo

2006-11-09 Thread Roland Dreier
Looks pretty obvious.  Tom/Steve, I queued this for 2.6.19, so tell me
if it's wrong and I should drop it.

 - R.




Re: [openib-general] [PATCH] amso1100: Fix typo

2006-11-09 Thread Tom Tucker
Jean:

Thanks. I gave this a whirl because I honestly couldn't 
remember exactly how these numbers were reported by the 
FW and it seems to work correctly. 

Roland, can you pull in Jean's patch? 

Thanks,
Tom


On Thu, 2006-11-09 at 21:02 +0100, Jean Delvare wrote:
> Fix the AMSO1100 firmware version computation, which was broken
> due to "&&" being used where "&" should have.
> 
> Signed-off-by: Jean Delvare [EMAIL PROTECTED]
> ---
>  drivers/infiniband/hw/amso1100/c2_rnic.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- linux-2.6.19-rc5.orig/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-11-09 10:30:33.000000000 +0100
> +++ linux-2.6.19-rc5/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-11-09 20:50:28.000000000 +0100
> @@ -157,8 +157,8 @@
>  
>  	props->fw_ver =
>  		((u64)be32_to_cpu(reply->fw_ver_major) << 32) |
> -		((be32_to_cpu(reply->fw_ver_minor) && 0xFFFF) << 16) |
> -		(be32_to_cpu(reply->fw_ver_patch) && 0xFFFF);
> +		((be32_to_cpu(reply->fw_ver_minor) & 0xFFFF) << 16) |
> +		(be32_to_cpu(reply->fw_ver_patch) & 0xFFFF);
>  	memcpy(&props->sys_image_guid, c2dev->netdev->dev_addr, 6);
>  	props->max_mr_size = 0xFFFFFFFF;
>  	props->page_size_cap   = ~(C2_MIN_PAGESIZE-1);
> 
> 





Re: [openib-general] [PATCH] amso1100: Fix typo

2006-11-09 Thread Roland Dreier
 > Roland, can you pull in Jean's patch?

Already done, thanks for the ACK though.

 - R.




[openib-general] [PATCH] IB/documentation - add new file to Documentation/infiniband

2006-11-09 Thread Ralph Campbell
This patch adds a new file to the kernel infiniband documentation
directory to briefly describe how to use memory regions.

Note: I will be on vacation from Nov. 11 through Nov. 26.

Signed-off-by: Ralph Campbell [EMAIL PROTECTED]

diff -r b9d92097f918 Documentation/infiniband/memory_regions.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Documentation/infiniband/memory_regions.txt	Wed Nov 08 18:35:46 2006 -0800
@@ -0,0 +1,110 @@
+INFINIBAND MEMORY REGIONS
+
+  This is an overview of memory region usage for the user and kernel
+  verbs interface.  The verbs API to send and receive data does not
+  specify memory addresses directly.  Instead, a memory region
+  is constructed and an Lkey or Rkey is used to refer to the region.
+
+User Space Memory Regions
+
+  User space memory regions are created by calling ibv_reg_mr().
+  It returns a pointer to a struct ibv_mr which contains the
+  'lkey' field and 'rkey' field.  The lkey should be copied
+  into the 'lkey' field of struct ibv_sge when posting buffers
+  with ibv_post_send(), ibv_post_recv(), and ibv_post_srq_recv().
+  The 'addr' field of the ibv_sge should be a user address between
+  the address and address + length passed to ibv_reg_mr().
+
+  The 'rkey' can be sent to another process and used by the
+  remote process in RDMA write, read, and atomic operations
+  to access the local process' memory region.
+  The 'remote_addr' field in the ibv_send_wr should be the local
+  process' address within the memory region.  At some point in
+  the future, the interface may be extended to allow zero based
+  remote addresses which would mean the remote_addr would be
+  an offset within the local process' memory region.
+
+  A memory region is destroyed by calling ibv_dereg_mr().
+
+  Note that creating and destroying memory regions results
+  in kernel system calls which lock the user's virtual memory
+  to physical memory.  This means the system administrator must set
+  the RLIMIT memory lock limit high enough for processes to
+  be able to create memory regions of the desired size.
+  It is therefore best to limit the size of memory regions created.
+
+Kernel Space Memory Regions
+
+  ib_get_dma_mr()  This function returns a pointer to struct ib_mr
+  which contains the 'lkey' and 'rkey' fields similar to user
+  memory regions.  The memory region represents all of physical
+  memory so no base address or length is needed when creating it.
+  The addresses used for the 'addr' field of struct ib_sge need
+  to be hardware device addresses suitable for DMA.
+  Since this mapping may be device specific, there are a set
+  of kernel verbs functions corresponding to the DMA mapping
+  functions described in DMA-API.txt.  Another useful reference
+  is the Linux Device Drivers book, 3rd edition, by Rubini and Corbet.
+
+   ib_dma_mapping_error()
+   ib_dma_map_single()
+   ib_dma_unmap_single()
+   ib_dma_map_page()
+   ib_dma_unmap_page()
+   ib_dma_map_sg()
+   ib_dma_unmap_sg()
+   ib_sg_dma_address()
+   ib_sg_dma_len()
+   ib_dma_sync_single_for_cpu()
+   ib_dma_sync_single_for_device()
+
+  Remote processes should use the same address for 'remote_addr'
+  as the local kernel's address as returned by the mapping functions
+  listed above.  The only difference is the local kernel uses the
+  'lkey' and the remote kernel uses the 'rkey'.
+
+  Note that the mapped addresses need to be unmapped after they
+  are no longer needed.  This may require the local and remote
+  kernels to pass messages at the middle or upper layers to
+  synchronize.
+
+  ib_reg_phys_mr()  This function returns a pointer to struct ib_mr.
+  It takes an array of device DMA addresses and lengths which are used
+  to describe the memory region.  These addresses are created by
+  calling the mapping functions listed for ib_get_dma_mr().
+  The 'iova' argument is the starting address of the memory region
+  which should be used with the 'lkey' or 'rkey' returned in the
+  struct ib_mr.
+
+  ib_dereg_mr() is used to destroy memory regions created by
+  either ib_get_dma_mr() or ib_reg_phys_mr().
+
+  ib_alloc_fmr()  This returns a pointer to a struct ib_fmr.
+  The struct ib_fmr_attr argument specifies the size of each
+  FMR page as a power of two in 'page_shift'.  This size
+  is assumed by ib_map_phys_fmr() described below.
+  An FMR cannot be used until ib_map_phys_fmr() is called.
+  The 'lkey' and 'rkey' fields are defined in struct ib_fmr
+  and used the same way as the other memory regions.
+
+  ib_map_phys_fmr()  The function takes an array of u64 and a length
+  for the number of entries in the array.  Each u64 value should be
+  a DMA address created with the mapping functions listed for
+  ib_get_dma_mr().  The length of each u64 address region is the
+  FMR page size set when ib_alloc_fmr() was called.
+  Note that this now defines the memory region to start at address
+  'iova' and is the base address used for 'addr' 

Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support

2006-11-09 Thread Venkatesh Babu
Sean Hefty wrote:

> I only tested failover on one side of the connection, because I didn't see a
> need to test more, and only one side of my test had path records.  Once a
> connection has been established, there's no difference between the active and
> passive sides.
  

Yes, only Active side will have the path records. But port may fail on 
the Passive side too.
When a port is failed on the Passive node, active node also need to 
change the QP state to migrated.
Only Active node can reload the alternate path. So if the path comes 
back on the Passive node, it has to send event notification to the 
active to reload the alternate path. With my ib_sa_serv_notice_hdlr() 
and your CM changes I have tested all possible combinations. It was 
working fine.

VBabu

> - Sean
  





Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support

2006-11-09 Thread Sean Hefty
> Yes, only Active side will have the path records. But port may fail on
> the Passive side too.
> When a port is failed on the Passive node, active node also need to
> change the QP state to migrated.

The QP state will automatically change to migrated on both sides of the
connection after a failure occurs.  There's a delay before you'll see the
IB_EVENT_PATH_MIGRATED event on the QP though, so a manual transition of the QP
state may be faster, but isn't necessary.

For my testing, I waited for both sides to process the IB_EVENT_PATH_MIGRATED
event before having the original active side call ib_send_cm_lap().

- Sean




Re: [openib-general] [PATCH 2.6.19 2/4] ehca: hcp_phyp.c: correct page mapping in 64k page mode

2006-11-09 Thread Paul Mackerras
Christoph Raisch writes:

> ioremap maps 4k pages on 4k kernels and on 64k pages on 64k kernels. So far
> the theory.
> 
> This is true for memory.

And for I/O. :)  ioremap updates the (Linux) page tables that map the
vmalloc/ioremap area, and that is at page granularity.  So there is in
fact no difference in the end result in the page tables whether you
ask to map a small amount inside a page, or the whole page.

> On POWER the ebus memory is mapped by H_ENTER.
> The hypervisor checks for 4k page size on H_ENTER, reason see above.

The next part of the story is that the low-level MMU code on System-P
(pSeries) machines only does the H_ENTER when you access an I/O
mapping.  It does H_ENTER for 4k pages for non-cacheable mappings,
and it only does the H_ENTER for the 4k subpages of a 64k page that
the kernel actually accesses.

So Roland is correct in his comment about how ioremap is called.

Regards,
Paul.




Re: [openib-general] [PATCH 2.6.19 2/4] ehca: hcp_phyp.c: correct page mapping in 64k page mode

2006-11-09 Thread Roland Dreier
 > So Roland is correct in his comment about how ioremap is called.

Umm, so is this patch really needed?  Where did the patch come from --
is it needed to fix something actually seen, or was it written just
based on some theoretical understanding?

I'm confused...

 - R.




Re: [openib-general] [PATCH] RDMA/iwcm: Get rid of extra call to list_empty()

2006-11-09 Thread Steve WIse
Tom, can you review this one?  I remember this was sensitive code.  I
don't see the need for the 'empty' variable, but perhaps it's plugging a
race condition?


On Thu, 2006-11-09 at 09:30 +0530, Krishna Kumar wrote:
> Get rid of extra call to list_empty(), and unnecessary
> variable. Has the side effect of sometimes resulting in
> faster processing of new events (like handling new
> connections, eg when cm_work_handler was processing the
> last entry) added to this list instead of cm_work_handler
> function exiting and re-entering when a new queue_work()
> is done.
> 
> Doing the redundant queue_work() (if cm_work_handler is
> already running and processing the last entry) will not
> result in another call to cm_work_handler (run_workqueue)
> where no entry is found, since cm_work_handler will remove
> all entries from the list, even ones that are added late.
> 
> Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
> ---
> diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
> --- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
> +++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
> @@ -834,22 +834,17 @@ static void cm_work_handler(void *arg)
>  	struct iw_cm_event levent;
>  	struct iwcm_id_private *cm_id_priv = work->cm_id;
>  	unsigned long flags;
> -	int empty;
> -	int ret = 0;
>  
>  	spin_lock_irqsave(&cm_id_priv->lock, flags);
> -	empty = list_empty(&cm_id_priv->work_list);
> -	while (!empty) {
> +	while (!list_empty(&cm_id_priv->work_list)) {
>  		work = list_entry(cm_id_priv->work_list.next,
>  				  struct iwcm_work, list);
>  		list_del_init(&work->list);
> -		empty = list_empty(&cm_id_priv->work_list);
>  		levent = work->event;
>  		put_work(work);
>  		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>  
> -		ret = process_event(cm_id_priv, &levent);
> -		if (ret) {
> +		if (process_event(cm_id_priv, &levent)) {
>  			set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>  			destroy_cm_id(&cm_id_priv->id);
>  		}
 
 





[openib-general] [Fwd: [PATCH] RDMA/iwcm: Memory corruption bug in cm_work_handler]

2006-11-09 Thread Steve WIse
Roland, this fix looks good to me. I don't think it is high severity, so
perhaps it can just go into 2.6.20.

Krishna, for future patches, please include netdev@vger.kernel.org since
this code is now in linux proper.  The module in svn is no longer being
maintained...


Acked-by: Steve Wise [EMAIL PROTECTED]


-------- Forwarded Message --------
From: Krishna Kumar [EMAIL PROTECTED]
To: openib-general@openib.org
Subject: [openib-general] [PATCH] RDMA/iwcm: Memory corruption bug in
cm_work_handler
Date: Thu, 09 Nov 2006 09:30:34 +0530

Possible memory corruption scenario : after putting the work
entry back on the work_free_list, we call process_event()
which dereferences work->event, which could have been
modified to another value meanwhile.

Patches against 2.6.19-rc4 bits.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
@@ -830,7 +830,8 @@ static int process_event(struct iwcm_id_
  */
 static void cm_work_handler(void *arg)
 {
-	struct iwcm_work *work = arg, lwork;
+	struct iwcm_work *work = arg;
+	struct iw_cm_event levent;
 	struct iwcm_id_private *cm_id_priv = work->cm_id;
 	unsigned long flags;
 	int empty;
@@ -843,11 +844,11 @@ static void cm_work_handler(void *arg)
 				  struct iwcm_work, list);
 		list_del_init(&work->list);
 		empty = list_empty(&cm_id_priv->work_list);
-		lwork = *work;
+		levent = work->event;
 		put_work(work);
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 
-		ret = process_event(cm_id_priv, &work->event);
+		ret = process_event(cm_id_priv, &levent);
 		if (ret) {
 			set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 			destroy_cm_id(&cm_id_priv->id);
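
The hazard described in this patch can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct event and struct work are simplified stand-ins for struct iw_cm_event and struct iwcm_work, the free list is a plain singly linked list without locking, and the get_work(99) call simulates another context reusing the entry right after it was put back.

```c
#include <stddef.h>

struct event { int id; };	/* stand-in for struct iw_cm_event */
struct work {			/* stand-in for struct iwcm_work */
	struct event event;
	struct work *next;
};

static struct work *free_list;

/* Return an entry to the free list; after this, anyone may reuse it. */
static void put_work(struct work *w)
{
	w->next = free_list;
	free_list = w;
}

/* Take an entry off the free list and overwrite it with a new event. */
static struct work *get_work(int id)
{
	struct work *w = free_list;

	if (!w)
		return NULL;
	free_list = w->next;
	w->event.id = id;
	return w;
}

/* Safe: snapshot the event while the entry is still owned. */
static int handle_copy_first(struct work *w)
{
	struct event levent = w->event;

	put_work(w);
	get_work(99);		/* simulated reuse by another context */
	return levent.id;	/* still the original event */
}

/* Unsafe: reads the entry after it went back on the free list. */
static int handle_deref_after_put(struct work *w)
{
	put_work(w);
	get_work(99);		/* simulated reuse by another context */
	return w->event.id;	/* sees the overwritten event */
}
```

Starting from an entry whose event id is 7, handle_copy_first() returns 7 while handle_deref_after_put() returns 99, which is the kind of silent corruption the patch avoids by copying into levent.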




[openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Steve WIse
Roland,

This fix looks good.  IMO it's not high priority for 2.6.19, so 2.6.20
is fine.  If anyone thinks otherwise, holler...



Acked-by: Steve Wise [EMAIL PROTECTED]

-------- Forwarded Message --------
From: Krishna Kumar [EMAIL PROTECTED]
To: openib-general@openib.org
Subject: [openib-general] [PATCH] RDMA/iwcm: Fix memory leak
Date: Thu, 09 Nov 2006 09:30:41 +0530

If we get IW_CM_EVENT_CONNECT_REQUEST message and
encounter an error (not in the LISTEN state, cannot
create an id, cannot alloc work_entry, etc), then
the memory allocated by cm_event_handler() in the
event->private_data gets leaked. Since cm_work_handler
has already put the event on the work_free_list, this
allocated memory is leaked. High backlog value can
allow DoS attacks.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
@@ -620,7 +620,7 @@ static void cm_conn_req_handler(struct i
 	spin_lock_irqsave(&listen_id_priv->lock, flags);
 	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
 		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
-		return;
+		goto out;
 	}
 	spin_unlock_irqrestore(&listen_id_priv->lock, flags);
 
@@ -629,7 +629,7 @@ static void cm_conn_req_handler(struct i
 				listen_id_priv->id.context);
 	/* If the cm_id could not be created, ignore the request */
 	if (IS_ERR(cm_id))
-		return;
+		goto out;
 
 	cm_id->provider_data = iw_event->provider_data;
 	cm_id->local_addr = iw_event->local_addr;
@@ -642,7 +642,7 @@ static void cm_conn_req_handler(struct i
 	if (ret) {
 		iw_cm_reject(cm_id, NULL, 0);
 		iw_destroy_cm_id(cm_id);
-		return;
+		goto out;
 	}
 
 	/* Call the client CM handler */
@@ -654,6 +654,7 @@ static void cm_conn_req_handler(struct i
 		kfree(cm_id);
 	}
 
+out:
 	if (iw_event->private_data_len)
 		kfree(iw_event->private_data);
 }
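
The shape of this fix, a single exit label that every error path funnels through, can be sketched in userspace as below. The names handle_request(), release() and the fail_stage parameter are hypothetical, malloc()/free() stand in for the kernel allocator, and the frees counter exists only so the behaviour can be observed.

```c
#include <stdlib.h>

static int frees;	/* counts cleanup calls, for illustration only */

static void release(void *p)
{
	free(p);
	frees++;
}

/*
 * Hypothetical handler: fail_stage selects which step fails
 * (0 means every step succeeds).  An early "return" at either
 * failing step would leak private_data; "goto out" does not.
 */
static int handle_request(int fail_stage)
{
	int ret = -1;
	void *private_data = malloc(16);	/* stands in for the event's private data */

	if (!private_data)
		return -1;

	if (fail_stage == 1)	/* e.g. not in the LISTEN state */
		goto out;
	if (fail_stage == 2)	/* e.g. cm_id could not be created */
		goto out;

	ret = 0;		/* request handled */
out:
	release(private_data);	/* runs on success and on every error path */
	return ret;
}
```

Whichever stage fails, the buffer is released exactly once per call, so the cleanup count always matches the number of requests handled.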




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Sean Hefty
> 	if (iw_event->private_data_len)
> 		kfree(iw_event->private_data);

Kfree checks for a null value, so is the private_data_len check necessary?

- Sean




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Roland Dreier
 > > 	if (iw_event->private_data_len)
 > > 		kfree(iw_event->private_data);
 > 
 > Kfree checks for a null value, so is the private_data_len check necessary?

Could private_data be a junk pointer if private_data_len == 0 ?

 - R.




[openib-general] [Fwd: [PATCH] RDMA/iwcm: Rewrite comment for iwcm_deref_id() to match code.]

2006-11-09 Thread Steve WIse
Roland,

Comment cleanup for 2.6.20.

Acked-by: Steve Wise [EMAIL PROTECTED]

-------- Forwarded Message --------
From: Krishna Kumar [EMAIL PROTECTED]
To: openib-general@openib.org
Subject: [openib-general] [PATCH] RDMA/iwcm: Rewrite comment for
iwcm_deref_id() to match code.
Date: Thu, 09 Nov 2006 09:30:48 +0530

In iwcm_deref_id(), the comment says: "If the last
reference is being removed and iw_destroy_cm_id is
waiting, wake up the waiting thread." The second part
of the comment, "and iw_destroy_cm_id is waiting", is
wrong, since this function either wakes the waiter
already waiting in iwcm_deref_id, or enables it (so
that when wait_for_completion() is performed later,
it will immediately return).

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
@@ -148,8 +148,9 @@ static int copy_private_data(struct iw_c
 }
 
 /*
- * Release a reference on cm_id. If the last reference is being removed
- * and iw_destroy_cm_id is waiting, wake up the waiting thread.
+ * Release a reference on cm_id. If the last reference is being
+ * released, enable the waiting thread (in iw_destroy_cm_id) to
+ * get woken up, and return 1 if a thread is already waiting.
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {




[openib-general] [Fwd: [PATCH] RDMA/iwcm: Remove unnecessary function argument.]

2006-11-09 Thread Steve WIse
Roland,

Another small cleanup for 2.6.20.

Acked-by: Steve Wise [EMAIL PROTECTED]


-------- Forwarded Message --------
From: Krishna Kumar [EMAIL PROTECTED]
To: openib-general@openib.org
Subject: [openib-general] [PATCH] RDMA/iwcm: Remove unnecessary function
argument.
Date: Thu, 09 Nov 2006 09:30:45 +0530

Remove unnecessary function argument, and change text to
reflect the code. Fix couple of typos.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
@@ -80,7 +80,7 @@ struct iwcm_work {
  * 1) in the event upcall, cm_event_handler(), for a listening cm_id.  If
  *    the backlog is exceeded, then no more connection request events will
  *    be processed.  cm_event_handler() returns -ENOMEM in this case.  Its up
- *    to the provider to reject the connectino request.
+ *    to the provider to reject the connection request.
  * 2) in the connection request workqueue handler, cm_conn_req_handler().
  *    If work elements cannot be allocated for the new connect request cm_id,
  *    then IWCM will call the provider reject method.  This is ok since
@@ -131,12 +131,11 @@ static int alloc_work_entries(struct iwc
 }
 
 /*
- * Save private data from incoming connection requests in the
- * cm_id_priv so the low level driver doesn't have to.  Adjust
+ * Save private data from incoming connection requests to
+ * iw_cm_event, so the low level driver doesn't have to. Adjust
  * the event ptr to point to the local copy.
  */
-static int copy_private_data(struct iwcm_id_private *cm_id_priv,
-			     struct iw_cm_event *event)
+static int copy_private_data(struct iw_cm_event *event)
 {
 	void *p;
 
@@ -243,7 +242,7 @@ static int iwcm_modify_qp_sqd(struct ib_
 /*
  * CM_ID <-- CLOSING
  *
- * Block if a passive or active connection is currenlty being processed. Then
+ * Block if a passive or active connection is currently being processed. Then
  * process the event as follows:
  * - If we are ESTABLISHED, move to CLOSING and modify the QP state
  *   based on the abrupt flag
@@ -903,7 +902,7 @@ static int cm_event_handler(struct iw_cm
 	if ((work->event.event == IW_CM_EVENT_CONNECT_REQUEST ||
 	     work->event.event == IW_CM_EVENT_CONNECT_REPLY) &&
 	    work->event.private_data_len) {
-		ret = copy_private_data(cm_id_priv, &work->event);
+		ret = copy_private_data(&work->event);
 		if (ret) {
 			put_work(work);
 			goto out;




[openib-general] [Fwd: [PATCH] RDMA/iwcm: Remove un-required initializations.]

2006-11-09 Thread Steve WIse
Roland,

Code cleanup for 2.6.20.

Acked-by: Steve Wise [EMAIL PROTECTED]

-------- Forwarded Message --------
From: Krishna Kumar [EMAIL PROTECTED]
To: openib-general@openib.org
Subject: [openib-general] [PATCH] RDMA/iwcm: Remove un-required
initializations.
Date: Thu, 09 Nov 2006 09:30:43 +0530

Remove un-required initializations.

Signed-off-by: Krishna Kumar [EMAIL PROTECTED]
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2006-10-09 16:40:04.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2006-10-09 16:52:03.000000000 +0530
@@ -408,7 +408,7 @@ int iw_cm_listen(struct iw_cm_id *cm_id,
 {
 	struct iwcm_id_private *cm_id_priv;
 	unsigned long flags;
-	int ret = 0;
+	int ret;
 
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
 
@@ -535,7 +535,7 @@ EXPORT_SYMBOL(iw_cm_accept);
 int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
 {
 	struct iwcm_id_private *cm_id_priv;
-	int ret = 0;
+	int ret;
 	unsigned long flags;
 	struct ib_qp *qp;
 
@@ -675,7 +675,7 @@ static int cm_conn_est_handler(struct iw
 			       struct iw_cm_event *iw_event)
 {
 	unsigned long flags;
-	int ret = 0;
+	int ret;
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 
@@ -705,7 +705,7 @@ static int cm_conn_rep_handler(struct iw
 			       struct iw_cm_event *iw_event)
 {
 	unsigned long flags;
-	int ret = 0;
+	int ret;
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	/*




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Steve WIse
I think the semantics are that the pointer is only used if
private_data_len > 0.  Otherwise, it is undefined.  So I think we should
keep the check.  Plus I don't like calling kfree() with a NULL pointer.
It just seems wrong...

;-)


On Thu, 2006-11-09 at 14:59 -0800, Roland Dreier wrote:
>  > > 	if (iw_event->private_data_len)
>  > > 		kfree(iw_event->private_data);
>  > 
>  > Kfree checks for a null value, so is the private_data_len check necessary?
> 
> Could private_data be a junk pointer if private_data_len == 0 ?
> 
>  - R.





Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Roland Dreier
 > I think the semantics are that the pointer is only used if
 > private_data_len > 0.  Otherwise, it is undefined.  So I think we should
 > keep the check.  Plus I don't like calling kfree() with a NULL pointer.
 > It just seems wrong...

Well, the first half definitely justifies leaving the check.

However you're wrong about kfree(NULL) :)  Every time you write

	if (foo)
		kfree(foo);

a kitten is killed... Seriously, the check is pure bloat that wastes
instruction cache, etc.

 - R.
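
To separate the two checks discussed in this thread, here is a small userspace analogy. It uses free(), which the C standard defines as a no-op for a null pointer, in place of kfree(). The struct event layout and the function names release_checked(), release_plain() and release_event() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical event: private_data is only meaningful when len > 0. */
struct event {
	void *private_data;
	size_t private_data_len;
};

/* Redundant test: free(NULL) already does nothing. */
static void release_checked(void *p)
{
	if (p)
		free(p);
}

/* Equivalent behaviour without the extra branch. */
static void release_plain(void *p)
{
	free(p);
}

/*
 * The length test is a different matter: when the length is zero the
 * pointer may be junk rather than NULL, so it must not be freed.
 * Returns 1 if the buffer was released, 0 if there was nothing valid.
 */
static int release_event(struct event *ev)
{
	if (!ev->private_data_len)
		return 0;
	free(ev->private_data);
	ev->private_data = NULL;
	ev->private_data_len = 0;
	return 1;
}
```

So the "if (foo) kfree(foo)" pattern can be dropped, while a length check guarding a possibly uninitialised pointer still earns its place.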




Re: [openib-general] [PATCH 1/7] IB/core - Add DMA mapping functions to allow device drivers to interpose

2006-11-09 Thread Christoph Hellwig
On Sun, Nov 05, 2006 at 11:25:27PM +0200, Or Gerlitz wrote:
> On 11/5/06, Roland Dreier [EMAIL PROTECTED] wrote:
>>  > I have mentioned this to Ralph in the past, just want to get ack/nak
>>  > on that from you: also on 64bit arch a block driver (eg SCSI LLD eg
>>  > SRP/iSER/etc) might get from higher level an SG whose pages are
>>  > **not** mapped into the kernel virtual address space. For example this
>>  > can happen with Direct I/O.
>>
>> No, I don't see how that could happen.  Aren't all pages always mapped
>> by the kernel direct mapping on 64-bit architectures?
>
> I don't know exactly how this happens, but one of the comments i've
> got from Christoph
> on the iser code, is that one can't assume page_address(sg[i].page)
> will not be NULL for SG passed to a SCSI LLD, i think Direct I/O is
> one flow where this might happen.

That statement is indeed true.  Only for GFP_KERNEL allocations you
can assume page_address is valid, and the scatterlist passed to a SCSI
LLDD can contain any type of pages.  Currently on all 64bit
architectures page_address works on all pages, but that's an
implementation detail that could change any time and that you should
not rely on.




[openib-general] amso1100 bug found by checker

2006-11-09 Thread Bryan O'Sullivan
I've filed the bug at kernel.org.  It looks easy to fix.  Please take 
ownership, if you will: http://bugzilla.kernel.org/show_bug.cgi?id=7478

b




[openib-general] OFED 1.1 IPoIB did not recover after a mthca catas recovery.

2006-11-09 Thread Ira Weiny
We just had an internal parity error on a mellanox HCA.  The HCA recovered.  
However, IPoIB did not fare as well.  We are not sure of the details.  What I 
have on the console is:

2006-11-09 15:20:05 ib_mthca :07:00.0: Catastrophic error detected: 
internal parity error
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[00]: 0514
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[01]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[02]: 00196240
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[03]: 00126618
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[04]: 00206128
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[05]: 001d6ff8
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[06]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[07]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[08]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[09]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0a]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0b]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0c]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0d]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0e]: 
2006-11-09 15:20:05 ib_mthca :07:00.0:   buf[0f]: 
2006-11-09 15:20:05 divert: no divert_blk to free, ib0 not ethernet
2006-11-09 15:20:05 divert: no divert_blk to free, ib1 not ethernet


ifconfig showed ib0 as gone (as in not listed).  We tried to ifup ib0 and got:

# zeus64 /root  ifup ib0
ib_ipoib
ib_ipoib device ib0 does not seem to be present, delaying initialization.


I then tried to unload the ib_ipoib module and that has hung for the last 15 
min.

I have run ibv_rc_pingpong and ib_rdma_bw through the node fine.  ibstat and 
ibstatus and the switch show the link to be up.  So it appears as though the 
card recovered fine.

What can we do?

:-/

Thanks,
Ira




Re: [openib-general] OFED 1.1 IPoIB did not recover after a mthca catas recovery.

2006-11-09 Thread Boris Shpolyansky
Ira,

I think our general recommendation is to reboot the machine once the HCA
has reported catastrophic error, since the device is in the fatal state
and wouldn't respond to any command from the host. 
However the gen-2 driver, i.e. ib_mthca, resets the HCA when it starts,
so restarting the driver may serve you just fine (unless you have a
persistent HW failure).

From what you reported IPoIB doesn't seem to survive this, so it looks
like you still have to reboot your machine.

Regards,
Boris Shpolyansky
Application Engineer
Mellanox Technologies Inc.
2900 Stender Way
Santa Clara, CA 95054
Tel.: (408) 916 0014
Fax: (408) 970 3403
Cell: (408) 834 9365
www.mellanox.com





Re: [openib-general] OFED 1.1 IPoIB did not recover after a mthca catas recovery.

2006-11-09 Thread Roland Dreier
  What can we do?

Something wacky happened I guess.  If you still have the system in the
state where unloading ib_ipoib hung, could you do

echo t > /proc/sysrq-trigger

and then send the kernel log with that output?

Thanks,
  Roland




Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support

2006-11-09 Thread Venkatesh Babu
Sean Hefty wrote:

> The QP state will automatically change to migrated on both sides of the
> connection after a failure occurs.  There's a delay before you'll see the
> IB_EVENT_PATH_MIGRATED event on the QP though, so a manual transition of the QP
> state may be faster, but isn't necessary.

At least in OFED 1.0, the QP state was not automatically changing to
migrated. I have to call ib_modify_qp() manually to do this.

> For my testing, I waited for both sides to process the IB_EVENT_PATH_MIGRATED
> event before having the original active side call ib_send_cm_lap().

That path might not have come back up when you load the alternate path.
I presume it is possible to load the alternate path even though it is
down. If the failover happens before the alternate path comes up, the
failover fails. That is no different from the alternate path not being
loaded at all. So your case and my case behave the same.

VBabu

> - Sean





[openib-general] OFED-1.1: *** stack smashing detected ***: opensm terminated

2006-11-09 Thread chris_youb
Setups:

A) Suse 10.0 w/ OFED 1.1
B) Ubuntu 6.10 (native drivers), self compiled opensm from OFED 1.1

Suse - I've successfully setup opensm on the Suse system and it appears to be 
running fine (has been for days).

Ubuntu - Thanks to Roland D. I setup the ib drivers on a new Ubuntu 6.10 box.  
I've also compiled opensm from OFED-1.1.  It runs, allocates LIDs and otherwise 
appears OK.  But it terminates after a minute as follows:

*** stack smashing detected ***: opensm terminated
Aborted (core dumped)



Observations: On the Suse system I periodically get MAD received with 
unsupported base version 0 in the console window but it continues on.  On the 
Ubuntu box I never see them, except in /var/log/messages I get ib_mad: MAD 
received with unsupported base version 0 around the same time it crashes.  But 
that could be coincidence.

Questions: I looked into the stack smashing message and it appears to be a 
safety check from gcc, which could be a false positive?  Anyways, I am running 
gcc 4.1.2.  Is there a way to:

A) confirm if this is an error (what do I need to provide)
B) turn off this check via a compiler flag (in the case of a false positive).

Thanks.




Re: [openib-general] OFED-1.1: *** stack smashing detected ***: opensm terminated

2006-11-09 Thread Hal Rosenstock
MAD messages received with unsupported base version 0 means the MADs are 
somehow corrupted. Current (and only) base version is 1.
 
What HCA are you using ? What firmware ?
 
As far as the stack smashing goes, it would be nice to know the file and line 
number where this occurred. There have been several bugs recently fixed which 
might relate to this. Using the -fstack-protector argument (add to CFLAGS) to 
compile in automatic buffer overflow protection and seeing what errors or 
warnings are generated might be instructive.
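
For context, GCC's stack protector places a guard value next to a function's local buffers and aborts with "stack smashing detected" when that value has been overwritten on return, so the message usually points at a genuine overflow of a stack buffer. The sketch below is only an illustration of that class of bug and a bounded alternative; copy_desc and DESC_LEN are hypothetical names, not taken from the opensm source.

```c
#include <stdio.h>
#include <string.h>

#define DESC_LEN 16   /* hypothetical fixed-size description field */

/*
 * Bounded copy into a fixed-size buffer.
 * Returns 1 if src had to be truncated, 0 if it fit.
 *
 * The unbounded equivalent, strcpy(dst, src), would write past the end
 * of dst for long inputs. When dst lives on the stack, that is the kind
 * of write the stack protector detects.
 */
static int copy_desc(char dst[DESC_LEN], const char *src)
{
    int n = snprintf(dst, DESC_LEN, "%s", src);

    return n >= DESC_LEN;
}
```

The check can be switched off with GCC's -fno-stack-protector, but that only hides the overflow; a gdb backtrace from the core file is the more useful next step.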
 
-- Hal
 





Re: [openib-general] OFED-1.1: *** stack smashing detected ***: opensm terminated

2006-11-09 Thread Roland Dreier
  *** stack smashing detected ***: opensm terminated
  Aborted (core dumped)

Probably a bug in opensm.  Running gdb on the core file and sending
the backtrace (output of the gdb command bt) would be useful
for fixing this I guess.

 - R.




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Krishna Kumar2
Though the amso driver (c2_ae_event) is setting the private_data and
private_data_len together for connect request and connect result, so
the check may not be necessary. But if the semantics prefer checking
to make sure, we should follow that (esp if other future drivers may
also simply set private_data_len to zero without modifying
private_data).

I did it this way since cm_conn_rep_handler() had the same check :)

thanks,

- KK

 I think the semantics are that the pointer is only used if
 private_data_len > 0.  Otherwise, it is undefined.  So I think we should
 keep the check.  Plus I don't like calling kfree() with a NULL pointer.
 It just seems wrong...
 
 ;-)
 
 
 On Thu, 2006-11-09 at 14:59 -0800, Roland Dreier wrote:
    if (iw_event->private_data_len)
        kfree(iw_event->private_data);

Kfree checks for a null value, so is the private_data_len check 
necessary?
  
  Could private_data be a junk pointer if private_data_len == 0 ?
  
   - R.
 
 





Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Tom Tucker

If it's truly NULL or a valid pointer, we don't need to (and shouldn't) check, just
call kfree. If it's uninitialized, we can't tell anyway and it's a bug --
right? 

Am I missing something?




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Krishna Kumar2
That is valid only if the drivers also comply. E.g., suppose a driver has two
stack variables, private_data and private_data_len, and it sets
only private_data_len to zero. Then, when calling the upper layer,
it sets event->private_data to its local private_data (uninitialized)
and event->private_data_len to its local private_data_len (zero).
Here we have to check private_data_len before touching
private_data, or risk a bug/panic.
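
The scenario described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the iwcm code: struct cm_event and event_release are hypothetical names, and free() stands in for kfree(). The pointer is treated as meaningful only when the length is non-zero, so a stale pointer paired with a zero length is never freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event layout, loosely modeled on the fields in the thread. */
struct cm_event {
    void  *private_data;
    size_t private_data_len;
};

/*
 * Release the private data only when the length says it is valid.
 * If a producer sets private_data_len to 0 and leaves private_data
 * unset, the pointer may hold junk, so it must not be passed to free().
 */
static void event_release(struct cm_event *ev)
{
    if (ev->private_data_len)
        free(ev->private_data);
    ev->private_data = NULL;
    ev->private_data_len = 0;
}
```

If producers instead guaranteed that private_data is NULL whenever the length is zero, the length test could be dropped and the pointer freed unconditionally.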

thanks,

- KK




Re: [openib-general] [Fwd: [PATCH] RDMA/iwcm: Fix memory leak]

2006-11-09 Thread Tom Tucker
Krishna: 

Maybe I'm missing something, but whether

    if (len)
        kfree(ptr);

or

    if (ptr)
        kfree(ptr);

is correct depends on how you couple the two variables. But I don't
think this has anything to do with Roland's point.

I think Pandora's box was opened when Steve suggested that it's just good
policy to check for NULL before calling free... and in general it is good
defensive programming.

However, Roland's point is that in the kernel, it's incumbent upon us all
to know and leverage the error checking done by the services we use. If
kfree checks for NULL, we don't have to, and shouldn't, check it.

Kittens are cute... really ... who can argue with that? What 'len' allows us
to assume about 'ptr' is a little more ... well... fuzzy.





Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-09 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/19/2006 09:10:35 PM:
Roland,

 I looked over my code again, and I don't see anything obviously wrong,
 but it's quite possible I made a mistake that I just can't see right
 now (like reversing a truth value somewhere). Someone who knows how
 ehca works might be able to spot the error.
 
 - R.

Your code is OK. I just found the problem here.
+	if (empty) {
+		netif_rx_complete(dev);
+		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
+		if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))
+			goto repoll;
+
+		return 0;
+	}

netif_rx_complete() should be called right before the return. With this patch, non-scaling performance does improve, but scaling performance is reduced.

+	if (empty) {
+		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
+		if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))
+			goto repoll;
+		netif_rx_complete(dev);
+
+		return 0;
+	}
Is there any other reason for calling netif_rx_complete() while possibly still within NAPI?
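
To make the role of the missed-event hint concrete, here is a small userspace simulation of the "drain, re-arm, repoll if something slipped in" pattern. It is a sketch only: mock_cq, mock_poll, mock_req_notify and poll_until_idle are hypothetical stand-ins, not the ib_cq or NAPI APIs, and the late_arrival field simply models completions that land between the last poll and the re-arm.

```c
/* Hypothetical stand-in for a completion queue; not the ib_cq API. */
struct mock_cq {
    int pending;       /* completions waiting to be polled */
    int armed;         /* non-zero once notification has been requested */
    int late_arrival;  /* completions that arrive just before arming */
};

/* Consume one completion; returns 1 if one was handled, 0 if empty. */
static int mock_poll(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Arm the queue and report whether completions may have been missed. */
static void mock_req_notify(struct mock_cq *cq, int *missed_event)
{
    cq->pending += cq->late_arrival;   /* the race window closes here */
    cq->late_arrival = 0;
    cq->armed = 1;
    *missed_event = cq->pending > 0;
}

/* Drain the queue, re-arm, and poll again if the hint says to. */
static int poll_until_idle(struct mock_cq *cq)
{
    int handled = 0;
    int missed_event;

repoll:
    while (mock_poll(cq))
        handled++;
    mock_req_notify(cq, &missed_event);
    if (missed_event)
        goto repoll;
    return handled;
}
```

Without the hint, the completions that arrived in the race window would sit in the queue until some later, unrelated event triggered another poll.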

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [discuss] Re: 2.6.19-rc4: known unfixed regressions (v3)

2006-11-09 Thread Andi Kleen

 So we should have the infrastructure to only use mmconfig for some set
 of busses.  If that interface is well described we can probably
 bootstrap sanely, only enabling what we know exists and like wise
 only reserving what we know is used.

Unfortunately there is a chicken and egg problem on those few broken
systems (like some x86 Macs) where only mcfg works. Without mcfg you
won't be able to probe the bus. Ok you could trust ACPI when it says
it's there, but I'm not sure Linus would like that.

Still perhaps I guess only reserving when the bus is probed is probably
a good idea. In most cases we only probe a small number of busses 
because ACPI tells us the number.

This basically means pci_mcfg_init() should be split up.

-Andi





Re: [openib-general] [PATCH 2.6.19 2/4] ehca: hcp_phyp.c: correct page mapping in 64k page mode

2006-11-09 Thread Christoph Raisch

 Umm, so is this patch really needed?  Where did the patch come from --
 is it needed to fix something actually seen, or was it written just
 based on some theoretical understanding?

 I'm confused...

  - R.
The patch is needed. We've seen it on the real system. We did fix it on the
real system.
...and it conforms to theory... although theory is a bit confusing here.

let me try to summarize:
ioremap checks for 64k boundary (actually page boundary)
nopage does H_ENTER in 4k granularity if it's configured like that for a
certain type of POWER processor.

so you have to adjust the ioremap to page boundary, and THEN access at the
offset within the 64k.
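
The address arithmetic behind that adjustment can be sketched as below. This is a userspace illustration under stated assumptions, not the hcp_phyp.c code: split_64k and the macro names are hypothetical, and the 64k page size is hard-coded. In a driver, the base would be the address handed to ioremap and the offset would be added to the returned mapping.

```c
#include <stdint.h>

#define PAGE_SHIFT_64K 16
#define PAGE_SIZE_64K  ((uint64_t)1 << PAGE_SHIFT_64K)
#define PAGE_MASK_64K  (~(PAGE_SIZE_64K - 1))

/* Page-aligned base plus the offset of the register block within it. */
struct split_addr {
    uint64_t base;
    uint64_t offset;
};

/* Split a physical address into its 64k-aligned base and the remainder. */
static struct split_addr split_64k(uint64_t phys)
{
    struct split_addr s;

    s.base = phys & PAGE_MASK_64K;           /* round down to the 64k boundary */
    s.offset = phys & (PAGE_SIZE_64K - 1);   /* position within that 64k page */
    return s;
}
```

For example, an address of 0x12345000 splits into a base of 0x12340000 and an offset of 0x5000, so a 4k-granular resource can still be reached through a 64k-aligned mapping.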

Took quite a while until we understood that code path ;-)

Christoph R.

