Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Grant Grundler
On Fri, Jun 10, 2005 at 02:59:52PM -0700, Grant Grundler wrote:
> My r2600 says:
> /proc/cpuinfo reports  1500 Mhz
> Calculated speed is 1492 Mhz  (130150841056 cycles / 871849 usec)

Sorry, I wasn't paying attention...the 1492 number is wrong and
I finally figured out why.

> This is derived from 199. MHz FSB speed * (15:2 ratio) == 1492.5 Mhz.
> The "Calculated speed" is really 1492.81 Mhz.  That's < 0.5% error.

"" is probably 199.9 or so - very close to spec (200Mhz).
The error is well below 0.1%.

> I just checked my rx4640 (also ZX1 chipset)
> and it seems to report the proper speed:
> cpu MHz: 1299.762145
> itc MHz: 1299.762145

In this case, the FSB is 199.96 Mhz.

> On this machine, I'm going to trust the firmware is more accurate
> than my user space test.

Definitely.

> + printf("Calculated speed is   %5Ld Mhz",  tsc_total / tod_total / 100);

Sorry - I should have figured out why I needed the "/ 100".
It corresponds directly to tx_depth (default 100),
and it was a big, fat, hairy clue that something wasn't right.
I just didn't work out the math at the time, though it bothered
me a bit because it didn't make sense.

The posted/completed get_cycles() measurements overlap.
Ergo with big "-n 1" I was getting a number that was 100x too big. 
My second clue was that smaller -n would have "bigger" errors.
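
A toy model of the overlap described above, as a sketch only: it assumes each request takes one time unit to complete and that up to `depth` requests are outstanding at once. The function name and the timing model are illustrative assumptions, not code from rdma_bw. It returns the ratio between the summed per-request (completed - posted) deltas and the true elapsed time.

```c
#include <stdint.h>

/*
 * Toy model (an assumption, not rdma_bw's code): request i completes at
 * time i + 1. The first `depth` requests are posted at time 0; request i
 * (i >= depth) is posted when request i - depth completes.
 * Returns sum(completed - posted) / elapsed, or 0 when iters is 0.
 */
static double overlap_ratio(uint64_t iters, uint64_t depth)
{
    uint64_t sum = 0;

    if (iters == 0)
        return 0.0;

    for (uint64_t i = 0; i < iters; i++) {
        uint64_t posted = (i < depth) ? 0 : i - depth + 1;
        uint64_t completed = i + 1;

        sum += completed - posted;   /* overlapping intervals all get counted */
    }
    /* The last request completes at time `iters`, the true elapsed time. */
    return (double)sum / (double)iters;
}
```

In this model the ratio approaches `depth` as `iters` grows and falls short of it for small `iters`, which is consistent with both clues: a fixed "/ 100" roughly cancels the overcount for long runs and is further off for short ones.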

Patch below sums the get_cycle measurements correctly (I think)
and results look like:
/proc/cpuinfo reports 1500.00 Mhz
Calculated speed is   1500.02 Mhz  (642797211 cycles / 428525 usec)


or with different parameters:

/proc/cpuinfo reports 1500.00 Mhz
Calculated speed is   1499.99 Mhz  (129930481 cycles / 86621 usec)

(129930481/86621 is 1499.98823...)
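
The arithmetic behind the "Calculated speed" lines can be sketched as follows: cycles divided by elapsed microseconds is numerically the clock rate in MHz. The helper name is an assumption made for illustration; the numbers come from the output quoted above.

```c
#include <stdint.h>

/* Cycles per microsecond is numerically the clock rate in MHz. */
static double cycles_per_usec(uint64_t cycles, uint64_t usec)
{
    return (double)cycles / (double)usec;
}
```

Feeding it 129930481 cycles and 86621 usec gives roughly 1499.988, and 642797211 cycles over 428525 usec gives roughly 1500.02.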

Patch below adds "-g" (gettimeofday) and "-M=" parameters
in case someone's firmware is less accurate than HP's.

And to keep Bernhard happy, fixes usage() output to match 16KB default. :^)

apologies,
grant


Index: rdma_bw.c
===
--- rdma_bw.c   (revision 2572)
+++ rdma_bw.c   (working copy)
@@ -58,13 +58,10 @@
 #include "get_clock.h"
 
 #define PINGPONG_RDMA_WRID 3
+#define MILLION100
 
 static int page_size;
 
-struct report_options {
-   int cycles;   /* report delta's in cycles, not microsec's */
-};
-
 struct pingpong_context {
struct ibv_context *context;
struct ibv_pd  *pd;
@@ -418,23 +415,20 @@ static void usage(const char *argv0)
printf("  -p, --port=  listen on/connect to port  
(default 18515)\n");
printf("  -d, --ib-dev= use IB device  (default first 
device found)\n");
printf("  -i, --ib-port=   use port  of IB device (default 
1)\n");
-   printf("  -s, --size=  size of message to exchange (default 
1)\n");
+   printf("  -s, --size=  size of message to exchange (default 
16KB)\n");
printf("  -t, --tx-depth=   size of tx queue (default 100)\n");
printf("  -n, --iters=number of exchanges (at least 2, 
default 1000)\n");
printf("  -b, --bidirectionalmeasure bidirectional bandwidth 
(default unidirectional)\n");
-   printf("  -C, --report-cyclesreport times in cpu cycle units 
(default seconds)\n");
-   printf("  -H, --report-histogram print out all results (default print 
summary only)\n");
-   printf("  -U, --report-unsorted  (implies -H) print out unsorted 
results (default sorted)\n");
+   printf("  -g, --gettimeofday Use gettimeofday() to calibrate Mhz 
(default /proc/cpuinfo)\n");
+   printf("  -M, --mhz=Use  for Mhz\n");
 }
 
-static void print_report(struct report_options * options,
-unsigned int iters, double size, int duplex,
+static void print_report(double cycles_to_units, unsigned int iters,
+int size, int duplex,
 cycles_t *tposted, cycles_t *tcompleted)
 {
-   double cycles_to_units;
-   double tsize; /* Transferred size, in megabytes */
+   unsigned long tsize;/* Transferred size, in megabytes */
int i, j;
-   const char* units;
int opt_posted = 0, opt_completed = 0;
cycles_t opt_delta;
cycles_t t;
@@ -453,19 +447,20 @@ static void print_report(struct report_o
}
}
 
-   if (options->cycles) {
-   cycles_to_units = 1;
-   units = "cycles";
-   } else {
-   cycles_to_units = get_cpu_mhz() * 100;
-   units = "sec";
-   }
 
tsize = duplex ? 2 : 1;
-   tsize = tsize * size / 0x10;
+   tsize = tsize * size / 1024;/* convert to KB */
 
-   printf("Bandwidth peak (#%d to #%d): %g MByte/%s\n", opt_posted, 
opt_completed, tsize * cycles_to_units / opt_delta, units);
-   printf("Bandwidth average: %g MByte/%s\n", tsize * iters * 
cycles_to_units / (tcompleted[iters - 1] - tposted[0]), units);
+   printf("Bandwidth peak (#%d to #%d): %'.2f MB/sec\n

Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Bernhard Fischer
On Thu, Jun 09, 2005 at 04:40:10PM -0700, Roland Dreier wrote:
>Shirley> I used cyclets_to_units() get Bandwidth peak (#0 to
>Shirley> #972): 1852.72 MByte/sec
>
>Shirley> Which is far away from the 1X (2.5Gbit/s) throughput.
>
>Actually it's not that far away -- the 2.5 Gbit/sec is after 8b/10b
>coding, so the raw data rate is really 2 Gbit/sec.  Getting > 92% link
>utilization is pretty good considering packet header overhead and so
>on.  BTW I'm assuming "MByte/sec" is a typo for "Mbit/sec".
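
The 8b/10b arithmetic in the quoted reply can be worked through with a small sketch. The function names are assumptions for illustration, and the utilization figure follows the quoted reading of the measurement as Mbit/sec.

```c
/* 8b/10b coding puts 10 bits on the wire for every 8 data bits. */
static double data_rate_gbit(double signal_rate_gbit)
{
    return signal_rate_gbit * 8.0 / 10.0;
}

/* Fraction of the data rate achieved; both arguments in Mbit/sec. */
static double link_utilization(double measured_mbit, double data_rate_mbit)
{
    return measured_mbit / data_rate_mbit;
}
```

A 2.5 Gbit/sec 1X link therefore carries 2 Gbit/sec of data, and 1852.72 Mbit/sec out of 2000 Mbit/sec is a little over 92%.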

I may be wrong and OT, but from what i got, nowadays,
http://kerneltrap.org/node/340

ymmv,
-- 
Bernhard
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] [mthca] debug strings

2005-06-10 Thread Bernhard Fischer
On Fri, Jun 10, 2005 at 09:34:10PM +0200, Bernhard Fischer wrote:
>Hi,

>- remove duplicate ': ' and prefix messages from sdp with ib_sdp
>  like ib_mthca does.

hm. when loading mthca i see:
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technologies .
ib_mthca :04:00.0: FW version 000400060002, max commands 64
...

which is a bit inconsistent.

No functional obj-code changes.

- peruse DRV_NAME
- trim ':' from PFX to be consistent with dev_{info,warn,err} output.
- whitespace in struct mthca_profile: replace spaces with tabs to
  be consistent with other member of that struct wrt spacing style.
- mthca_dev_lim() remove tab (was that meant to be a reminder?)
- add apparently missing newlines to strings emitted by dev_err.
- add note whom to contact if old FW is found.
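
As a userspace sketch of the DRV_NAME and PFX items above: adjacent string literals are concatenated at compile time, so the version banner and the prefix can both be built from DRV_NAME. The kernel-only `__devinitdata` annotation is left out here so the fragment compiles on its own; that omission is an assumption of this sketch, not part of the patch.

```c
#define DRV_NAME    "ib_mthca"
#define PFX         DRV_NAME " "      /* no ':' so it matches dev_{info,warn,err} style */
#define DRV_VERSION "0.06-pre"
#define DRV_RELDATE "November 8, 2004"

/* Adjacent string literals are joined by the compiler into one string. */
static const char mthca_version[] =
    DRV_NAME " Mellanox InfiniBand HCA driver v"
    DRV_VERSION " (" DRV_RELDATE ")\n";
```

With these definitions the banner no longer hardcodes "ib_mthca:", so renaming the driver would only need a change to DRV_NAME.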


In mthca_map_eq_regs(), i take it that it is of no interest to know
which of "..dev->fw.arbel.eq_"arm_base|eq_set_ci_base") + "(0|4) did
trigger? Memfree vs. not would be obvious from previous msgs.
Same in mthca_tune_pci for cap messages (which don't fit into width=80
terminals)

find_mgm(): 'if (0)\n\t\tmthca_dbg(dev, "Hash for..' should not rely
on dead code elimination but HEAVY_DEBUG. Haven't looked for other
occurrences of that, yet. Will you?

thank you,
-- 
Bernhard
diff -X excl -rduNp 
gen2.2551.oorig/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_dev.h 
gen2.2551/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_dev.h
--- gen2.2551.oorig/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_dev.h  
2005-05-27 11:06:10.0 +0200
+++ gen2.2551/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_dev.h
2005-06-11 01:27:31.0 +0200
@@ -45,7 +45,7 @@
 #include "mthca_doorbell.h"
 
 #define DRV_NAME   "ib_mthca"
-#define PFX         DRV_NAME ": "
+#define PFX         DRV_NAME " "
 #define DRV_VERSION "0.06-pre"
 #define DRV_RELDATE "November 8, 2004"
 
diff -X excl -rduNp 
gen2.2551.oorig/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_main.c 
gen2.2551/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_main.c
--- gen2.2551.oorig/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_main.c 
2005-05-27 11:06:10.0 +0200
+++ gen2.2551/trunk/src/linux-kernel/infiniband/hw/mthca/mthca_main.c   
2005-06-10 23:14:43.0 +0200
@@ -69,7 +69,7 @@ MODULE_PARM_DESC(msi, "attempt to use MS
 #endif /* CONFIG_PCI_MSI */
 
 static const char mthca_version[] __devinitdata =
-   "ib_mthca: Mellanox InfiniBand HCA driver v"
+   DRV_NAME " Mellanox InfiniBand HCA driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mthca_profile default_profile = {
@@ -154,11 +154,11 @@ static int __devinit mthca_dev_lim(struc
return -ENODEV;
}
 
-   mdev->limits.num_ports  = dev_lim->num_ports;
+   mdev->limits.num_ports  = dev_lim->num_ports;
mdev->limits.vl_cap = dev_lim->max_vl;
mdev->limits.mtu_cap= dev_lim->max_mtu;
-   mdev->limits.gid_table_len  = dev_lim->max_gids;
-   mdev->limits.pkey_table_len = dev_lim->max_pkeys;
+   mdev->limits.gid_table_len  = dev_lim->max_gids;
+   mdev->limits.pkey_table_len = dev_lim->max_pkeys;
mdev->limits.local_ca_ack_delay = dev_lim->local_ca_ack_delay;
mdev->limits.max_sg = dev_lim->max_sg;
mdev->limits.reserved_qps   = dev_lim->reserved_qps;
@@ -188,7 +188,7 @@ static int __devinit mthca_dev_lim(struc
 
if (dev_lim->flags & DEV_LIM_FLAG_BAD_QKEY_CNTR)
mdev->device_cap_flags |= IB_DEVICE_BAD_QKEY_CNTR;
-   
+
if (dev_lim->flags & DEV_LIM_FLAG_RAW_MULTI)
mdev->device_cap_flags |= IB_DEVICE_RAW_MULTI;
 
@@ -927,13 +927,13 @@ static int __devinit mthca_init_one(stru
 */
if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM) ||
pci_resource_len(pdev, 0) != 1 << 20) {
-   dev_err(&pdev->dev, "Missing DCS, aborting.");
+   dev_err(&pdev->dev, "Missing DCS, aborting.\n");
err = -ENODEV;
goto err_disable_pdev;
}
if (!(pci_resource_flags(pdev, 2) & IORESOURCE_MEM) ||
pci_resource_len(pdev, 2) != 1 << 23) {
-   dev_err(&pdev->dev, "Missing UAR, aborting.");
+   dev_err(&pdev->dev, "Missing UAR, aborting.\n");
err = -ENODEV;
goto err_disable_pdev;
}
@@ -1032,6 +1032,7 @@ static int __devinit mthca_init_one(stru
   (int) (mthca_hca_table[id->driver_data].latest_fw >> 
16) & 0x,
   (int) (mthca_hca_table[id->driver_data].latest_fw & 
0x));
mthca_warn(mdev, "If you have problems, try updating your HCA 
FW.\n");
+   mthca_warn(mdev, "Contact your HW supplier for current FW.\n");
}
 
err = mthca_setup_hca(mdev);
@@ -1163,7

Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Grant Grundler
On Sat, Jun 11, 2005 at 12:59:33AM +0200, Bernhard Fischer wrote:
> np, as long as you use at least 4*page_size and reflect this in
> usage() :)
> 
> >-int  size = 1;
> >+int  size = 4 * 1024;

heh - I'll fix that. :^)

thanks,
grant


RE: [openib-general] [PATCH][RFC] kDAPL: remove dat wrapper function dat_ia_query()

2005-06-10 Thread James Lentini

Committed in revision 2588 except for the item below.

On Fri, 10 Jun 2005, Tom Duffy wrote:

tduffy> Index: linux-kernel/dat-provider/dapl_sp.h
tduffy> ===
tduffy> --- linux-kernel/dat-provider/dapl_sp.h (revision 2585)
tduffy> +++ linux-kernel/dat-provider/dapl_sp.h (working copy)
tduffy> @@ -29,12 +29,8 @@
tduffy>   * $Id$
tduffy>   */
tduffy>  
tduffy> -#ifndef DAPL_PSP_UTIL_H
tduffy> -#define DAPL_PSP_UTIL_H
tduffy> -
tduffy> -struct dapl_sp *dapl_sp_alloc(struct dapl_ia *ia_ptr, boolean_t 
is_psp);
tduffy> -
tduffy> -void dapl_sp_dealloc(struct dapl_sp *sp_ptr);

I didn't remove this declaration from the header because
dapl_sp_dealloc is being called by dapl_get_sp_ep.

tduffy> +#ifndef DAPL_SP_H
tduffy> +#define DAPL_SP_H
tduffy>  
tduffy>  void dapl_sp_link_cr(struct dapl_sp *sp_ptr, struct dapl_cr *cr_ptr);
tduffy>  
tduffy> @@ -45,4 +41,4 @@ void dapl_sp_remove_cr(struct dapl_sp *s
tduffy>  
tduffy>  void dapl_sp_remove_ep(struct dapl_ep *ep_ptr);
tduffy>  
tduffy> -#endif /* DAPL_PSP_UTIL_H */
tduffy> +#endif /* DAPL_SP_H */
tduffy> Index: linux-kernel/patches/alt_dat_provider_makefile


Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Bernhard Fischer
On Fri, Jun 10, 2005 at 02:59:52PM -0700, Grant Grundler wrote:
>On Thu, Jun 09, 2005 at 05:33:52PM -0700, Grant Grundler wrote:

>(Sorry - this is a complete diff including previous
>changes that haven't been committed yet.)

np, as long as you use at least 4*page_size and reflect this in
usage() :)

>-  int  size = 1;
>+  int  size = 4 * 1024;

cheers,
-- 
Bernhard


Re: [openib-general] [osm] segfault in libibumad

2005-06-10 Thread Bernhard Fischer
On Fri, Jun 10, 2005 at 01:59:49PM -0700, Tom Duffy wrote:
>On Fri, 2005-06-10 at 21:46 +0200, Bernhard Fischer wrote:
>> Yes, but should it work? I don't know if its possible and implemented
>> (in the long run) to get to and query the respective API versions in
>> order not to rely on sysfs.
>
>Do you have a system without sysfs?  Is there a reason why you would
>not?
For normal small systems, i do not see a requirement for sysfs.
Artificial use case (i don't own HCAs nor have euid==0 access to them):
/me builds a well-connected firewall/router/whatever. I don't want
to waste RAM/ROM for sysfs or sysfs support.

I do, however, agree that normal nodes will have sysfs support enabled
in the mid/short run.
>
>The behavior /could/ be that if it didn't find the sysfs entry, opensm
>would continue on its merry way and just assume the API was correct.

As Hal expressed, sysfs is a requirement for OpenIB; so be it.

-- 
Bernhard


Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Grant Grundler
On Thu, Jun 09, 2005 at 05:33:52PM -0700, Grant Grundler wrote:
> I think adding such a calibration to get_cpu_mhz()
> so we can warn if /proc/cpuinfo data doesn't agree with
> gettimeofday() and get_cycles().

The appended patch prints the Mhz reported in cpuinfo and
the Mhz calculated from get_cycles()/gettimeofday().
(Sorry - this is a complete diff including previous
changes that haven't been committed yet.)

Conclusion: Maybe add another parameter to rdma_bw() to switch
the source of the CPU Mhz.
rdma_bw should continue using cpuinfo to report BW by default.

The problem is that gettimeofday() will also measure any other
activity that occurs while the test is running, so it is
more likely to underreport bandwidth.
Theoretically rdma_bw can discard extreme get_cycles()
readings caused by noise on the test machine.
(We don't do that today.)
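
One possible way to discard extreme readings is a trimmed mean over the per-iteration cycle deltas. This is only a sketch of the idea mentioned above; the function name, the in-place sort, and the choice of trimming are assumptions, and rdma_bw does nothing like this today.

```c
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for qsort() that avoids overflow on subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the samples in place, drop `trim` values from each end, and
 * return the mean of what is left. Returns 0 if nothing would remain.
 */
static double trimmed_mean(uint64_t *samples, size_t n, size_t trim)
{
    uint64_t sum = 0;

    if (n <= 2 * trim)
        return 0.0;

    qsort(samples, n, sizeof(*samples), cmp_u64);
    for (size_t i = trim; i < n - trim; i++)
        sum += samples[i];

    return (double)sum / (double)(n - 2 * trim);
}
```

For example, the deltas {100, 101, 99, 5000, 100, 1} with one value trimmed from each end average to 100, where the plain mean would be pulled far off by the single 5000-cycle outlier.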


My r2600 says:
/proc/cpuinfo reports  1500 Mhz
Calculated speed is 1492 Mhz  (130150841056 cycles / 871849 usec)

This is derived from 199. MHz FSB speed * (15:2 ratio) == 1492.5 Mhz.
The "Calculated speed" is really 1492.81 Mhz.  That's < 0.5% error.

I'm told only HP 1 and 2-way boxes exhibit this "feature" - other boxes
calculate the actual ITC speed just like the patch does and report
that in /proc/cpuinfo. I just checked my rx4640 (also ZX1 chipset)
and it seems to report the proper speed:
cpu MHz: 1299.762145
itc MHz: 1299.762145

While closer, rdma_bw still doesn't seem to agree:
/proc/cpuinfo reports 1299.76 Mhz
Calculated speed is 1295 Mhz  (118700953436 cycles / 916265 usec)

On this machine, I'm going to trust the firmware is more accurate
than my user space test.

hth,
grant


Index: rdma_bw.c
===
--- rdma_bw.c   (revision 2572)
+++ rdma_bw.c   (working copy)
@@ -61,10 +61,6 @@
 
 static int page_size;
 
-struct report_options {
-   int cycles;   /* report delta's in cycles, not microsec's */
-};
-
 struct pingpong_context {
struct ibv_context *context;
struct ibv_pd  *pd;
@@ -422,19 +418,14 @@
printf("  -t, --tx-depth=   size of tx queue (default 100)\n");
printf("  -n, --iters=number of exchanges (at least 2, 
default 1000)\n");
printf("  -b, --bidirectionalmeasure bidirectional bandwidth 
(default unidirectional)\n");
-   printf("  -C, --report-cyclesreport times in cpu cycle units 
(default seconds)\n");
-   printf("  -H, --report-histogram print out all results (default print 
summary only)\n");
-   printf("  -U, --report-unsorted  (implies -H) print out unsorted 
results (default sorted)\n");
 }
 
-static void print_report(struct report_options * options,
-unsigned int iters, double size, int duplex,
+static void print_report(unsigned int iters, int size, int duplex,
 cycles_t *tposted, cycles_t *tcompleted)
 {
double cycles_to_units;
-   double tsize; /* Transferred size, in megabytes */
+   unsigned long tsize;/* Transferred size, in megabytes */
int i, j;
-   const char* units;
int opt_posted = 0, opt_completed = 0;
cycles_t opt_delta;
cycles_t t;
@@ -453,19 +444,21 @@
}
}
 
-   if (options->cycles) {
-   cycles_to_units = 1;
-   units = "cycles";
-   } else {
-   cycles_to_units = get_cpu_mhz() * 100;
-   units = "sec";
-   }
+   cycles_to_units = get_cpu_mhz() * 100;
 
tsize = duplex ? 2 : 1;
-   tsize = tsize * size / 0x10;
+   tsize = tsize * size / 1024;
 
-   printf("Bandwidth peak (#%d to #%d): %g MByte/%s\n", opt_posted, 
opt_completed, tsize * cycles_to_units / opt_delta, units);
-   printf("Bandwidth average: %g MByte/%s\n", tsize * iters * 
cycles_to_units / (tcompleted[iters - 1] - tposted[0]), units);
+   printf("Bandwidth peak (#%d to #%d): %g MB/sec\n",
+opt_posted, opt_completed,
+tsize * cycles_to_units / opt_delta / 1024);
+   printf("Bandwidth average: %g MB/sec\n",
+tsize * iters * cycles_to_units / (tcompleted[iters - 
1] - tposted[0]) / 1024);
+
+   printf("Service Demand peak (#%d to #%d): %ld cycles/KB\n",
+opt_posted, opt_completed, opt_delta/tsize);
+   printf("Service Demand Avg  : %ld cycles/KB\n",
+(tcompleted[iters - 1] - tposted[0])/(tsize * iters));
 }
 
 
@@ -478,16 +471,18 @@
struct pingpong_dest*rem_dest;
char*ib_devname = NULL;
char*servername = NULL;
+struct timeval   tod_start, tod_end;
+cycles_ttsc_start, tsc_end;
int  port = 18515;
int  ib_port = 1;
-   int  size = 1;
+   int  size 

Re: [openib-general] [osm] segfault in libibumad

2005-06-10 Thread Hal Rosenstock
On Fri, 2005-06-10 at 15:46, Bernhard Fischer wrote:
> On Fri, Jun 10, 2005 at 03:22:32PM -0400, Hal Rosenstock wrote:
> >On Fri, 2005-06-10 at 14:50, Bernhard Fischer wrote:
> 
> >> I'm not sure if userspace is supposed to work without sysfs. What do
> >> you think?
> >
> >It shouldn't segv...
> 
> Yes, but should it work? I don't know if its possible and implemented
> (in the long run) to get to and query the respective API versions in
> order not to rely on sysfs.

It is not designed to run without sysfs. sysfs is a requirement for
OpenIB.

> >> In osm_vendor_init(), i'd set int r = -1, n_cas = -1; and would say
> >> else\nif ((n_cas = umad_get_cas_names.
> >> Also, in umad_get_cas_names() i guess only freeing namelist if
> >> scandir did not return <0 may be better..
> >
> >Yes to both.
> 
> >Can you try this patch ? Thanks.
> 
> >-free(namelist);
> >+if (n >= 0)
> >+free(namelist);
> yes, should do.
> 
> >-}
> >+} else
> >+r = n_cas = -1;
> yes, but why not just initialize r and n_cas to -1?

OK. Did you try it ?

-- Hal



Re: [openib-general] [osm] segfault in libibumad

2005-06-10 Thread Tom Duffy
On Fri, 2005-06-10 at 21:46 +0200, Bernhard Fischer wrote:
> Yes, but should it work? I don't know if its possible and implemented
> (in the long run) to get to and query the respective API versions in
> order not to rely on sysfs.

Do you have a system without sysfs?  Is there a reason why you would
not?

The behavior /could/ be that if it didn't find the sysfs entry, opensm
would continue on its merry way and just assume the API was correct.

-tduffy



[openib-general] [PATCH] kDAPL: convert the ep list to linux native

2005-06-10 Thread Tom Duffy
This patch converts the ep list to the linux native linked list
structure.  Let me know what you think.

Signed-off-by: Tom Duffy <[EMAIL PROTECTED]>

Index: linux-kernel-ll2/dat-provider/dapl_ia.c
===
--- linux-kernel-ll2/dat-provider/dapl_ia.c (revision 2585)
+++ linux-kernel-ll2/dat-provider/dapl_ia.c (working copy)
@@ -63,7 +63,7 @@ struct dapl_ia *dapl_ia_alloc(struct dat
ia->async_error_evd = NULL;
ia->cleanup_async_error_evd = FALSE;
dapl_llist_init_entry(&ia->hca_ia_list_entry);
-   dapl_llist_init_head(&ia->ep_list_head);
+   INIT_LIST_HEAD(&ia->ep_list);
dapl_llist_init_head(&ia->lmr_list_head);
dapl_llist_init_head(&ia->rmr_list_head);
dapl_llist_init_head(&ia->pz_list_head);
@@ -104,7 +104,7 @@ bail:
 u32 dapl_ia_abrupt_close(struct dapl_ia *ia)
 {
u32 dat_status = DAT_SUCCESS;
-   struct dapl_ep *ep, *next_ep;
+   struct dapl_ep *ep;
struct dapl_lmr *lmr, *next_lmr;
struct dapl_rmr *rmr, *next_rmr;
struct dapl_pz *pz, *next_pz;
@@ -151,12 +151,7 @@ u32 dapl_ia_abrupt_close(struct dapl_ia 
sp = next_sp;
}
 
-   ep = (dapl_llist_is_empty(&ia->ep_list_head)
- ? NULL : dapl_llist_peek_head(&ia->ep_list_head));
-   while (ep != NULL) {
-   next_ep = dapl_llist_next_entry(&ia->ep_list_head,
-   &ep->common.
-   ia_list_entry);
+   list_for_each_entry(ep, &ia->ep_list, list) {
/*
 * Issue a disconnect if the EP needs it
 */
@@ -181,7 +176,6 @@ u32 dapl_ia_abrupt_close(struct dapl_ia 
dapl_dbg_log(DAPL_DBG_TYPE_WARN,
 "ia_close(ABRUPT): ep_free(%p) returns 
%x\n",
 ep, dat_status);
-   ep = next_ep;
}
 
lmr = (dapl_llist_is_empty(&ia->lmr_list_head)
@@ -332,7 +326,7 @@ u32 dapl_ia_graceful_close(struct dapl_i
 
if (!dapl_llist_is_empty(&ia->rmr_list_head) ||
!dapl_llist_is_empty(&ia->rsp_list_head) ||
-   !dapl_llist_is_empty(&ia->ep_list_head) ||
+   !list_empty(&ia->ep_list) ||
!dapl_llist_is_empty(&ia->lmr_list_head) ||
!dapl_llist_is_empty(&ia->psp_list_head) ||
!dapl_llist_is_empty(&ia->pz_list_head)) {
@@ -427,7 +421,7 @@ void dapl_ia_free(struct dapl_ia *ia)
dapl_os_assert(ia->async_error_evd == NULL);
dapl_os_assert(dapl_llist_is_empty(&ia->lmr_list_head));
dapl_os_assert(dapl_llist_is_empty(&ia->rmr_list_head));
-   dapl_os_assert(dapl_llist_is_empty(&ia->ep_list_head));
+   dapl_os_assert(list_empty(&ia->ep_list));
dapl_os_assert(dapl_llist_is_empty(&ia->evd_list_head));
dapl_os_assert(dapl_llist_is_empty(&ia->psp_list_head));
dapl_os_assert(dapl_llist_is_empty(&ia->rsp_list_head));
@@ -444,19 +438,17 @@ void dapl_ia_free(struct dapl_ia *ia)
 void dapl_ia_link_ep(struct dapl_ia *ia, struct dapl_ep *ep)
 {
spin_lock_irqsave(&ia->common.lock, ia->common.flags);
-   dapl_llist_add_head(&ia->ep_list_head,
-   &ep->common.ia_list_entry, ep);
+   list_add(&ep->list, &ia->ep_list);
spin_unlock_irqrestore(&ia->common.lock, ia->common.flags);
 }
 
 /*
- * Remove an ep from the ia info structure
+ * Remove an ep from the ia structure
  */
 void dapl_ia_unlink_ep(struct dapl_ia *ia, struct dapl_ep *ep)
 {
spin_lock_irqsave(&ia->common.lock, ia->common.flags);
-   dapl_llist_remove_entry(&ia->ep_list_head,
-   &ep->common.ia_list_entry);
+   list_del(&ep->list);
spin_unlock_irqrestore(&ia->common.lock, ia->common.flags);
 }
 
Index: linux-kernel-ll2/dat-provider/dapl.h
===
--- linux-kernel-ll2/dat-provider/dapl.h(revision 2585)
+++ linux-kernel-ll2/dat-provider/dapl.h(working copy)
@@ -34,6 +34,8 @@
 #ifndef DAPL_H
 #define DAPL_H
 
+#include 
+
 #include 
 
 #include "dapl_util.h"
@@ -152,7 +154,7 @@ struct dapl_ia {
boolean_t cleanup_async_error_evd;
 
struct dapl_llist_entry hca_ia_list_entry;  /* HCAs list of IAs */
-   struct dapl_llist_entry *ep_list_head;  /* EP queue */
+   struct list_head ep_list;   /* EP queue */
struct dapl_llist_entry *lmr_list_head; /* LMR queue */
struct dapl_llist_entry *rmr_list_head; /* RMR queue */
struct dapl_llist_entry *pz_list_head;  /* PZ queue */
@@ -195,6 +197,7 @@ struct dapl_evd {
 struct dapl_ep {
struct dat_ep ep;
struct dapl_common common;
+   struct list_head list;
/* What the DAT Consumer asked for */
struct dat_ep_param param;
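
For readers unfamiliar with the native list, here is a minimal userspace sketch of an intrusive, circular doubly linked list in the same spirit. The helper names are deliberately different from the kernel's and `struct ep` is a hypothetical stand-in for `struct dapl_ep`; none of this is the kernel implementation. The walk saves the next pointer before unlinking. The kernel offers `list_for_each_entry_safe()` for that situation, which may be worth considering if the loop body in `dapl_ia_abrupt_close()` ends up unlinking or freeing the ep (the diff does not show enough to tell).

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its list member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

/* Insert `entry` right after `head`, i.e. at the front of the list. */
static void list_add_head(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_unlink(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = entry;
}

/* Hypothetical stand-in for struct dapl_ep: the list node is embedded. */
struct ep { int id; struct list_head list; };

/* Unlink every ep and return the sum of their ids. */
static int unlink_all_sum_ids(struct list_head *head)
{
    int sum = 0;
    struct list_head *pos = head->next, *next;

    while (pos != head) {
        next = pos->next;            /* saved first: the body edits the list */
        sum += container_of(pos, struct ep, list)->id;
        list_unlink(pos);
        pos = next;
    }
    return sum;
}
```

Embedding the node in the object is what lets the patch drop the separate entry pointer and the peek/next helper calls.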
 


Re: [openib-general] [osm] segfault in libibumad

2005-06-10 Thread Bernhard Fischer
On Fri, Jun 10, 2005 at 03:22:32PM -0400, Hal Rosenstock wrote:
>On Fri, 2005-06-10 at 14:50, Bernhard Fischer wrote:

>> I'm not sure if userspace is supposed to work without sysfs. What do
>> you think?
>
>It shouldn't segv...

Yes, but should it work? I don't know if its possible and implemented
(in the long run) to get to and query the respective API versions in
order not to rely on sysfs.

>> In osm_vendor_init(), i'd set int r = -1, n_cas = -1; and would say
>> else\nif ((n_cas = umad_get_cas_names.
>> Also, in umad_get_cas_names() i guess only freeing namelist if
>> scandir did not return <0 may be better..
>
>Yes to both.

>Can you try this patch ? Thanks.

>-  free(namelist);
>+  if (n >= 0)
>+  free(namelist);
yes, should do.

>-  }
>+  } else
>+  r = n_cas = -1;
yes, but why not just initialize r and n_cas to -1?

thank you,
-- 
Bernhard
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] [sdp] debug strings

2005-06-10 Thread Bernhard Fischer
Hi,

Upon loading ib_sdp with debugging enabled, i see:

INIT: : SDP module load.
with attached patchlet, i see the (to me) more appealing string
ib_sdp INIT: SDP module load.



- remove duplicate ': ' and prefix messages from sdp with ib_sdp
  like ib_mthca does.
- fix odd "INIT: : INIT: SDP module unload." debug message.
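
The duplicate ': ' can be reproduced in userspace with a small formatter. This is a sketch under assumptions: `sdp_format` is a hypothetical helper, it writes into a buffer instead of calling printk, and it leaves out the "<%d>" level prefix and trailing newline that the real macro emits.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Build "ib_sdp <type>: <message>" in buf. Returns the length that was
 * (or would have been) written, or a negative value on error.
 */
static int sdp_format(char *buf, size_t size, const char *type,
                      const char *format, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(buf, size, "ib_sdp %s: ", type);
    if (n < 0 || (size_t)n >= size)
        return n;                    /* error, or prefix alone was truncated */

    va_start(ap, format);
    m = vsnprintf(buf + n, size - (size_t)n, format, ap);
    va_end(ap);

    return m < 0 ? m : n + m;
}
```

Passing "INIT" as the type yields "ib_sdp INIT: SDP module load.", while passing the old "INIT: " yields "ib_sdp INIT: : SDP module load.", since the format string already supplies the separator.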


thank you,
diff -X excl -rduNp 
gen2.2551.oorig/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_inet.c 
gen2.2551/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_inet.c
--- gen2.2551.oorig/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_inet.c
2005-05-17 07:36:11.0 +
+++ gen2.2551/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_inet.c  
2005-06-10 19:07:49.0 +
@@ -79,7 +79,7 @@ MODULE_PARM_DESC(recv_post_max,
 "Set the receive completion queue size.");
 
 module_param(recv_buff_max, int, 0);
-MODULE_PARM_DESC(recv_buff_max, 
+MODULE_PARM_DESC(recv_buff_max,
 "Set the maximum number of receives buffered.");
 
 module_param(send_post_max, int, 0);
@@ -91,7 +91,7 @@ MODULE_PARM_DESC(send_buff_max,
 "Set the maximum number of sends buffered.");
 
 module_param(send_usig_max, int, 0);
-MODULE_PARM_DESC(send_usig_max, 
+MODULE_PARM_DESC(send_usig_max,
 "Set the maximum consecutive unsignalled send events.");
 
 module_param(sdp_debug_level, int, 0);
@@ -1571,7 +1571,7 @@ error_proc:
  */
 static void __exit sdp_exit(void)
 {
-   sdp_dbg_init("INIT: SDP module unload.");
+   sdp_dbg_init("SDP module unload.");
/*
 * unregister
 */
diff -X excl -rduNp 
gen2.2551.oorig/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_proto.h 
gen2.2551/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_proto.h
--- gen2.2551.oorig/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_proto.h   
2005-05-17 07:36:11.0 +
+++ gen2.2551/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_proto.h 
2005-06-10 16:26:10.0 +
@@ -471,7 +471,7 @@ extern int sdp_debug_level;
 #define sdp_dbg_out(level, type, format, arg...) \
 do { \
 if (!(level > sdp_debug_level)) { \
-printk("<%d>%s: " format "\n", \
+printk("<%d>ib_sdp %s: " format "\n", \
level, type, ## arg);  \
 } \
 } while (0)
@@ -516,7 +516,7 @@ extern int sdp_debug_level;
 #define sdp_dbg_init(format, arg...) do { } while (0)
 #else
 #define sdp_dbg_init(format, arg...) \
-sdp_dbg_out(__SDP_DEBUG_INIT, "INIT: ", format, ## arg)
+sdp_dbg_out(__SDP_DEBUG_INIT, "INIT", format, ## arg)
 #endif
 
 #if __SDP_DEBUG_LEVEL < __SDP_DEBUG_WARN


Re: [openib-general] [osm] segfault in libibumad

2005-06-10 Thread Hal Rosenstock
On Fri, 2005-06-10 at 14:50, Bernhard Fischer wrote:
> Hi,
> 
> I have no ib drivers loaded, starting opensm fails with a segfault.
> 
> I'm not sure if userspace is supposed to work without sysfs. What do
> you think?

It shouldn't segv...

> In osm_vendor_init(), i'd set int r = -1, n_cas = -1; and would say
> else\nif ((n_cas = umad_get_cas_names.
> Also, in umad_get_cas_names() i guess only freeing namelist if
> scandir did not return <0 may be better..

Yes to both.

> (gdb) run
> Starting program: /opt/infiniband/ib/bin/opensm
> -
> OpenSM Rev:openib-1.0.0
> Command Line Arguments:
>  Log File: /var/log/osm.log
> -
> warn: [5900] umad_init: can't read ABI version from
> /sys/class/infiniband_mad/abi_version (No such file or directory): is
> ib_umad module loaded?
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x400b8735 in free () from /lib/tls/libc.so.6
> (gdb) bt
> #0  0x400b8735 in free () from /lib/tls/libc.so.6
> #1  0x4002235a in umad_get_cas_names (cas=0x80acc1c, max=32) at
> umad.c:513
> #2  0x4001afe1 in osm_vendor_init (p_vend=0x80acb88, p_log=0x80a979c,
> timeout=100) at osm_vendor_ibumad.c:418
> #3  0x4001b0ba in osm_vendor_new (p_log=0x80a979c, timeout=100)
> at osm_vendor_ibumad.c:452
> #4  0x0805d2e4 in osm_opensm_init (p_osm=0x80a8520, p_opt=0xbfb5dc40)
> at osm_opensm.c:234
> #5  0x0804ca19 in main (argc=1, argv=0xbfb5de14) at main.c:632

Can you try this patch ? Thanks.

-- Hal

Index: libibumad/src/umad.c
===
--- libibumad/src/umad.c(revision 2580)
+++ libibumad/src/umad.c(working copy)
@@ -511,7 +511,8 @@
DEBUG("return 1 ca");
j = 1;
}
-   free(namelist);
+   if (n >= 0)
+   free(namelist);
return j;
 }

Index: osm/libvendor/osm_vendor_ibumad.c
===
--- osm/libvendor/osm_vendor_ibumad.c   (revision 2580)
+++ osm/libvendor/osm_vendor_ibumad.c   (working copy)
@@ -405,7 +405,8 @@
"osm_vendor_init: umad_get_cas_names failed\n");
r = n_cas;
goto Exit;
-   }
+   } else
+   r = n_cas = -1;
 
p_vend->ca_count = n_cas;
p_vend->mtbl.max = OSM_UMAD_MAX_PENDING;
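
The guard in the umad.c hunk follows the usual scandir() pattern: when scandir() fails, the name list was never allocated, so there is nothing to free. Below is a standalone sketch of that pattern; `count_dir_entries` is a hypothetical helper for illustration, not the libibumad function.

```c
#define _DEFAULT_SOURCE   /* for scandir()/alphasort() on glibc */
#include <dirent.h>
#include <stdlib.h>

/* Count the entries in `path`; returns -1 if it cannot be scanned. */
static int count_dir_entries(const char *path)
{
    struct dirent **namelist = NULL;
    int n = scandir(path, &namelist, NULL, alphasort);

    if (n < 0)
        return -1;      /* nothing was allocated, so nothing to free */

    /* On success each entry and the array itself are malloc'd. */
    for (int i = 0; i < n; i++)
        free(namelist[i]);
    free(namelist);

    return n;
}
```

Initializing the pointer to NULL is an extra safety net: free(NULL) is a no-op, so even an unguarded free would not crash.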





[openib-general] [osm] segfault in libibumad

2005-06-10 Thread Bernhard Fischer
Hi,

I have no ib drivers loaded, starting opensm fails with a segfault.

I'm not sure if userspace is supposed to work without sysfs. What do
you think?



In osm_vendor_init(), i'd set int r = -1, n_cas = -1; and would say
else\nif ((n_cas = umad_get_cas_names.
Also, in umad_get_cas_names() i guess only freeing namelist if
scandir did not return <0 may be better..

(gdb) run
Starting program: /opt/infiniband/ib/bin/opensm
-
OpenSM Rev:openib-1.0.0
Command Line Arguments:
 Log File: /var/log/osm.log
-
warn: [5900] umad_init: can't read ABI version from
/sys/class/infiniband_mad/abi_version (No such file or directory): is
ib_umad module loaded?

Program received signal SIGSEGV, Segmentation fault.
0x400b8735 in free () from /lib/tls/libc.so.6
(gdb) bt
#0  0x400b8735 in free () from /lib/tls/libc.so.6
#1  0x4002235a in umad_get_cas_names (cas=0x80acc1c, max=32) at
umad.c:513
#2  0x4001afe1 in osm_vendor_init (p_vend=0x80acb88, p_log=0x80a979c,
timeout=100) at osm_vendor_ibumad.c:418
#3  0x4001b0ba in osm_vendor_new (p_log=0x80a979c, timeout=100)
at osm_vendor_ibumad.c:452
#4  0x0805d2e4 in osm_opensm_init (p_osm=0x80a8520, p_opt=0xbfb5dc40)
at osm_opensm.c:234
#5  0x0804ca19 in main (argc=1, argv=0xbfb5de14) at main.c:632

-- 
Bernhard


Re: [openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread William Jordan
On 6/10/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
>Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device
>Hal> that holds a reference should register * for netdevice
>Hal> notification, and cleanup and put back the * reference if
>Hal> they receive an UNREGISTER event.
> 
>Hal> Is it correct that IPoIB does not need to register for these
>Hal> events ? If it is, then this must be something else which is
>Hal> using the IPoIB driver causing the reference count to be
>Hal> incremented but not handling these events. Troy, any idea on
>Hal> how to recreate this ?
> 
> IPoIB doesn't need to handle these events, since it is the one doing
> the unregistering.

> 

I've seen this problem before with several network drivers. Is there
any possibility that any skb's have been leaked (lost by the driver,
never freed), or that the driver is still holding onto any skb's
(waiting in a send queue)? The refcount of the skb's dst is
decremented when an skb is freed.

I also sometimes see this problem with particular kernels with our own
IPoIB implementation when IPoIB is the only network interface. In this
instance, I'm suspicious that there may be someone else in the stack
that is still holding a reference on an skb that our driver has freed,
preventing the refcount from being decremented.
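
To make the reasoning concrete, here is a toy userspace model of it. The
toy_* names are invented for this sketch and are not the kernel's skb or
netdevice API; the only assumption is that every buffer holds one
reference on its device until the buffer is freed:

```c
#include <stdlib.h>

/* Toy stand-ins for a net device and a packet buffer (not kernel types). */
struct toy_dev { int refcnt; };
struct toy_buf { struct toy_dev *dev; };

/* Allocating a buffer takes one reference on the device. */
static struct toy_buf *toy_buf_alloc(struct toy_dev *dev)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->dev = dev;
    dev->refcnt++;
    return b;
}

/* Freeing the buffer is the only place the reference is dropped. */
static void toy_buf_free(struct toy_buf *b)
{
    b->dev->refcnt--;
    free(b);
}

/* Unregistering can only finish once every reference is gone. */
static int toy_dev_can_unregister(const struct toy_dev *dev)
{
    return dev->refcnt == 0;
}
```

In this model a single buffer that is lost, or still parked in a send
queue, leaves the count at 1, which is the "Usage count = 1" symptom.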
-- 
Bill Jordan
SilverStorm Technologies


RE: [openib-general] [PATCH][RFC] kDAPL: remove dat wrapper function dat_ia_query()

2005-06-10 Thread Tom Duffy
On Fri, 2005-06-10 at 08:33 +0300, Itamar Rabenstein wrote:
> I think that sp+psp+rsp should be one file dapl_sp.c

You are correct.  I should have done this the first time.  Here is a
patch to fix this.

Signed-off-by: Tom Duffy <[EMAIL PROTECTED]>

Index: linux-kernel/dat-provider/dapl_ia.c
===
--- linux-kernel/dat-provider/dapl_ia.c (revision 2585)
+++ linux-kernel/dat-provider/dapl_ia.c (working copy)
@@ -35,7 +35,7 @@
 #include "dapl_evd.h"
 #include "dapl_hca_util.h"
 #include "dapl_openib_util.h"
-#include "dapl_sp_util.h"
+#include "dapl_sp.h"
 #include "dapl_cr.h"
 
 struct dapl_ia *dapl_ia_alloc(struct dat_provider *provider,
Index: linux-kernel/dat-provider/Makefile
===
--- linux-kernel/dat-provider/Makefile  (revision 2585)
+++ linux-kernel/dat-provider/Makefile  (working copy)
@@ -28,13 +28,10 @@ PROVIDER_MODULES := \
 dapl_lmr   \
 dapl_mr_util   \
 dapl_provider  \
-dapl_sp_util   \
-dapl_psp   \
 dapl_pz \
 dapl_ring_buffer_util  \
 dapl_rmr   \
-dapl_rsp   \
-dapl_sp_util   \
+dapl_sp\
 dapl_srq   \
dapl_util
 
Index: linux-kernel/dat-provider/dapl_psp.c
===
--- linux-kernel/dat-provider/dapl_psp.c(revision 2585)
+++ linux-kernel/dat-provider/dapl_psp.c(working copy)
@@ -1,445 +0,0 @@
-/*
- * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
- *
- * This Software is licensed under one of the following licenses:
- *
- * 1) under the terms of the "Common Public License 1.0" a copy of which is
- *available from the Open Source Initiative, see
- *http://www.opensource.org/licenses/cpl.php.
- *
- * 2) under the terms of the "The BSD License" a copy of which is
- *available from the Open Source Initiative, see
- *http://www.opensource.org/licenses/bsd-license.php.
- *
- * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
- *copy of which is available from the Open Source Initiative, see
- *http://www.opensource.org/licenses/gpl-license.php.
- *
- * Licensee has the right to choose one of the above licenses.
- *
- * Redistributions of source code must retain the above copyright
- * notice and one of the license notices.
- *
- * Redistributions in binary form must reproduce both the above copyright
- * notice, one of the license notices in the documentation
- * and/or other materials provided with the distribution.
- */
-
-/*
- * $Id$
- */
-
-#include "dapl.h"
-#include "dapl_sp_util.h"
-#include "dapl_ia.h"
-#include "dapl_openib_util.h"
-
-/*
- * dapl_psp_create_any
- *
- * Create a persistent Public Service Point that can recieve multiple
- * requests for connections and generate multiple connection request
- * instances that wil be delivered to the specified Event Dispatcher
- * in a notification event. Differs from dapl_psp_create() in that
- * the conn_qual is selected by the implementation and returned to
- * the user.
- *
- * Input:
- * ia
- * evd
- * psp_flags
- *
- * Output:
- * conn_qual
- * psp
- *
- * Returns:
- * DAT_SUCCESS
- * DAT_INSUFFICIENT_RESOURCES
- * DAT_INVALID_HANDLE
- * DAT_INVALID_PARAMETER
- * DAT_CONN_QUAL_IN_USE
- * DAT_MODEL_NOT_SUPPORTED
- */
-u32 dapl_psp_create_any(struct dat_ia *ia, DAT_CONN_QUAL *conn_qual,
-   struct dat_evd *evd, enum dat_psp_flags psp_flags,
-   struct dat_sp **psp)
-{
-   static DAT_CONN_QUAL hint_conn_qual = 1000; /* seed value */
-   struct dapl_ia *ia_ptr;
-   struct dapl_sp *sp_ptr;
-   struct dapl_evd *evd_ptr;
-   u32 status = DAT_SUCCESS;
-   int i;
-
-   ia_ptr = (struct dapl_ia *)ia;
-
-   if (!ia_ptr) {
-   status = DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_IA);
-   goto bail;
-   }
-   if (!evd) {
-   status =
-   DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EVD_CR);
-   goto bail;
-   }
-
-   if (psp == NULL) {
-   status = DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG5);
-   goto bail;
-   }
-   if (conn_qual == NULL) {
-   status = DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG2);
-   goto bail;
-   }
-
-   evd_ptr = (struct dapl_evd *)evd;
-   if (!(evd_ptr->evd_flags & DAT_EVD_CR_FLAG)) {
-   status = DAT_ERROR(DAT_INVALID_HANDLE,
-  DAT_INVALID_HANDLE_EVD_CR);
-   goto bail;
-   }
-
-   if (psp_f

Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Grant Grundler
On Thu, Jun 09, 2005 at 10:13:13PM -0700, Shirley Ma wrote:
> >I think adding such a calibration to get_cpu_mhz()
> >so we can warn if /proc/cpuinfo data doesn't agree with
> >gettimeofday() and get_cycles().
> 
> using below to calculate the cycles_to_unit is more accurate.
> 
> clock_gettime();
> get_cycles();
> sleep(5);
> clock_gettime();
> get_cycles();

While I agree in general with the concept, I don't
agree with this algorithm for two reasons:
o I don't trust get_cycles() to measure anything more than 1s.
  Problem is on some 32-bit arches, they may only have (or only read)
  a 32-bit cycle counter.  This can wrap in as little as 2 or 3 seconds.
  This obviously isn't a problem with 64-bit counters.

o Trivial but annoying - adding 5 seconds per run means a scripted
  batch job could take minutes longer.

> cycles_to_unit = offset - cycles/ offset - time;

I think we can effectively do the same thing by summing the
recorded "get_cycles" values and compare that sum to before/after
gettimeofday() readings. I'll submit a patch later today that
effectively does that.
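
This is not the promised patch, only a sketch of the arithmetic behind
both points. The function names are made up for illustration, and it
assumes the per-iteration cycle deltas do not overlap:

```c
#include <stdint.h>

/* Seconds until a 32-bit cycle counter wraps at the given clock rate. */
static double wrap_seconds_32bit(double mhz)
{
    return 4294967296.0 / (mhz * 1e6);   /* 2^32 cycles / (cycles per second) */
}

/* Elapsed cycles on a 32-bit counter; correct across at most one wrap. */
static uint32_t cycles_delta32(uint32_t start, uint32_t end)
{
    return end - start;                  /* unsigned arithmetic is modulo 2^32 */
}

/*
 * Estimate the clock in MHz by summing the recorded cycle deltas and
 * dividing by the elapsed wall-clock time in microseconds (for example
 * the difference of two gettimeofday() readings). Returns 0 if usec is 0.
 */
static double estimate_mhz(const uint64_t *deltas, int n, uint64_t usec)
{
    uint64_t total = 0;

    for (int i = 0; i < n; i++)
        total += deltas[i];
    return usec ? (double)total / (double)usec : 0.0;
}
```

At 1500 MHz, wrap_seconds_32bit() gives roughly 2.86 seconds, which is
why a 5 second sleep is not safe with a 32-bit counter. Cycles per
microsecond is numerically the same as MHz, so no further scaling is
needed in estimate_mhz().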

grant


RE: [openib-general] A couple of kdapltest oopses

2005-06-10 Thread Hal Rosenstock
On Fri, 2005-06-10 at 11:29, Itamar Rabenstein wrote: 
> is this new ? did you saw it before ?

I'm pretty sure I did.

> do you see the problem with -i 1000 ?

Yes, but it takes more repeats of the command to make it occur.

> both oops are in the malloc stage.
> I have seen in the past problems were dapl tried to malloc a very big
> memory.

Or is it some malloc'd memory is not being returned and eventually runs
out ? Shouldn't the failure be more graceful too ?
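
As an illustration of what "more graceful" could look like: the trace
appears to show an allocation failing in DT_Mdep_Malloc and vsnprintf
then faulting on a near-NULL address, so the sketch below assumes the
formatting path used the result without checking it. format_into() is a
hypothetical userspace stand-in, not kdapltest code:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format into a caller-supplied buffer, refusing
 * to touch it if the allocation that produced it failed.
 * Returns the vsnprintf() result, or -1 if there is no usable buffer.
 */
static int format_into(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (buf == NULL || len == 0)
        return -1;          /* allocation failed upstream: report, don't oops */

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

The caller can then drop the message or fall back to a static buffer
instead of dereferencing the failed allocation.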

-- Hal

>  Itamar
> 
> 
> > --Original Message--
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > Sent: Friday, June 10, 2005 4:32 PM
> > To: James Lentini
> > Cc: openib-general@openib.org
> > Subject: [openib-general] A couple of kdapltest oopses
> > 
> > 
> > Hi,
> > 
> > First on the client when running transaction test, I see the 
> > following:
> > kdapltest -T T -s  -D mthca0a -d -i 1 -w 
> > 8 client SR server SR
> > 
> > Jun 10 08:58:17 localhost kernel: DT_Mdep_Thread_: page 
> > allocation failure. order:0, mode:0x20
> > Jun 10 08:58:17 localhost kernel:  [] 
> > __alloc_pages+0x2b2/0x440
> > Jun 10 08:58:17 localhost kernel:  [] 
> > kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [] 
> > kmem_getpages+0x31/0xb0
> > Jun 10 08:58:17 localhost kernel:  [] cache_grow+0x139/0x360
> > Jun 10 08:58:17 localhost kernel:  [] 
> > cache_alloc_refill+0x153/0x340
> > Jun 10 08:58:17 localhost kernel:  [] dbg_redzone1+0x15/0x30
> > Jun 10 08:58:17 localhost kernel:  [] 
> > cache_alloc_debugcheck_after+0x6e/0x1a0
> > Jun 10 08:58:17 localhost kernel:  [] __kmalloc+0xb1/0xe0
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Tdep_PT_Printf+0x16/0x1b0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [] 
> > cache_free_debugcheck+0x196/0x2d0
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > kernel_thread_helper+0x5/0x10
> > Jun 10 08:58:17 localhost kernel: Unable to handle kernel 
> > NULL pointer dereference at virtual address 0004
> > Jun 10 08:58:17 localhost kernel:  printing eip:
> > Jun 10 08:58:17 localhost kernel: c022048b
> > Jun 10 08:58:17 localhost kernel: *pde = 07cb6067
> > Jun 10 08:58:17 localhost kernel: *pte = 
> > Jun 10 08:58:17 localhost kernel: Oops: 0002 [#1]
> > Jun 10 08:58:17 localhost kernel: DEBUG_PAGEALLOC
> > Jun 10 08:58:17 localhost kernel: Modules linked in: 
> > kdapltest ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa 
> > ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport 
> > uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 
> > mii evdev usbcore
> > Jun 10 08:58:17 localhost kernel: CPU:0
> > Jun 10 08:58:17 localhost kernel: EIP:0060:[]   
> >  Not tainted VLI
> > Jun 10 08:58:17 localhost kernel: EFLAGS: 00010283   (2.6.11.6) 
> > Jun 10 08:58:17 localhost kernel: EIP is at vsnprintf+0x4b/0x4c0
> > Jun 10 08:58:17 localhost kernel: eax: 0054   ebx: 
> > c1d17f78   ecx:    edx: d0a35eef
> > Jun 10 08:58:17 localhost kernel: esi: 0004   edi: 
> > c1d17f78   ebp: 0103   esp: ce5bfd84
> > Jun 10 08:58:17 localhost kernel: ds: 007b   es: 007b   ss: 0068
> > Jun 10 08:58:17 localhost kernel: Process DT_Mdep_Thread_ 
> > (pid: 6696, threadinfo=ce5be000 task=c3730a90)
> > Jun 10 08:58:17 localhost kernel: Stack: c740 0020 
> >  d0a33995 0104  c1d17f78 d0a33995 
> > Jun 10 08:58:17 localhost kernel:0104 0020 
> > c1d17f78  c1d17f78 c5b58000 d0a3484b 0004 
> > Jun 10 08:58:17 localhost kernel:0100 d0a35eef 
> > ce5bfdf4 007b ff05 c1d17f78 cf34ef78 c60d2060 
> > Jun 10 08:58:17 localhost kernel: Call Trace:
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Tdep_PT_Printf+0x3b/0x1b0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> > Jun 10 08:58:17 localhost kernel:  [] 
> > kernel_map_pages+0x28/0x70
> > Jun 10 08:58:17 localhost kernel:  [] 
> > cache_free_debugcheck+0x196/0x2d0
> > Jun 10 08:58:17 localhost kernel:  [] 
> > DT_Mdep_Thread_Sta

Re: [openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread Roland Dreier
Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device
Hal> that holds a reference should register * for netdevice
Hal> notification, and cleanup and put back the * reference if
Hal> they receive an UNREGISTER event.

Hal> Is it correct that IPoIB does not need to register for these
Hal> events ? If it is, then this must be something else which is
Hal> using the IPoIB driver causing the reference count to be
Hal> incremented but not handling these events. Troy, any idea on
Hal> how to recreate this ?

IPoIB doesn't need to handle these events, since it is the one doing
the unregistering.

 - R.


Re: [openib-general] cycles_to_units is incorrect in rdma_lat, rdma_bw.

2005-06-10 Thread Grant Grundler
On Thu, Jun 09, 2005 at 10:17:33PM -0700, Shirley Ma wrote:
> > Please provide "cat /proc/cpuinfo" output so I can rule out
> > broken firmware and look for something else.
> 
> processor   : 0
> cpu : PPC970, altivec supported
> clock   : 1600.00MHz
> revision: 2.2

Ah no wonder...every other arch uses "cpu MHz".

> I modified the get_cpu_mhz() to for PPC. 
> rc = sscanf(buf, "clock : %lf", &m);

Is 1600Mhz correct? I'm assuming it is.
Does the attached patch work for you?

It will allow PPC to change to "cpu MHz" and it will still work.
Or accommodate other arches that might have cloned PPC code.

grant

Index: get_clock.c
===
--- get_clock.c (revision 2572)
+++ get_clock.c (working copy)
@@ -48,8 +48,11 @@
double m;
int rc;
rc = sscanf(buf, "cpu MHz : %lf", &m);
-   if (rc != 1)
-   continue;
+   if (rc != 1) {  /* blech...PPC does it different */
+   rc = sscanf(buf, "clock : %lf", &m);
+   if (rc != 1)
+   continue;
+   }
if (mhz == 0.0) {
mhz = m;
continue;
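
The fallback in the patch can be tried outside get_cpu_mhz() with a small
stand-alone sketch. parse_cpuinfo_mhz() is a hypothetical name used only
here; the two sscanf() formats are the ones from the patch:

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one /proc/cpuinfo line.
 * Returns 1 and stores the value in *mhz if the line carries the clock
 * rate in either the common "cpu MHz" form or the PPC "clock" form,
 * otherwise returns 0.
 */
static int parse_cpuinfo_mhz(const char *line, double *mhz)
{
    if (sscanf(line, "cpu MHz : %lf", mhz) == 1)
        return 1;
    if (sscanf(line, "clock : %lf", mhz) == 1)   /* PPC does it differently */
        return 1;
    return 0;
}
```

A space in a scanf format matches any run of whitespace, so the tabs that
/proc/cpuinfo puts before the colon are accepted, and %lf stops at the
trailing "MHz" on the PPC line.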


RE: [openib-general] A couple of kdapltest oopses

2005-06-10 Thread Itamar Rabenstein
Is this new? Did you see it before?
Do you see the problem with -i 1000?
Both oopses are in the malloc stage.
I have seen problems in the past where dapl tried to malloc a very big
amount of memory.

 Itamar


> -Original Message-
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 10, 2005 4:32 PM
> To: James Lentini
> Cc: openib-general@openib.org
> Subject: [openib-general] A couple of kdapltest oopses
> 
> 
> Hi,
> 
> First on the client when running transaction test, I see the 
> following:
> kdapltest -T T -s  -D mthca0a -d -i 1 -w 
> 8 client SR server SR
> 
> Jun 10 08:58:17 localhost kernel: DT_Mdep_Thread_: page 
> allocation failure. order:0, mode:0x20
> Jun 10 08:58:17 localhost kernel:  [] 
> __alloc_pages+0x2b2/0x440
> Jun 10 08:58:17 localhost kernel:  [] 
> kernel_map_pages+0x28/0x70
> Jun 10 08:58:17 localhost kernel:  [] 
> kmem_getpages+0x31/0xb0
> Jun 10 08:58:17 localhost kernel:  [] cache_grow+0x139/0x360
> Jun 10 08:58:17 localhost kernel:  [] 
> cache_alloc_refill+0x153/0x340
> Jun 10 08:58:17 localhost kernel:  [] dbg_redzone1+0x15/0x30
> Jun 10 08:58:17 localhost kernel:  [] 
> cache_alloc_debugcheck_after+0x6e/0x1a0
> Jun 10 08:58:17 localhost kernel:  [] __kmalloc+0xb1/0xe0
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Tdep_PT_Printf+0x16/0x1b0 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> kernel_map_pages+0x28/0x70
> Jun 10 08:58:17 localhost kernel:  [] 
> cache_free_debugcheck+0x196/0x2d0
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> kernel_thread_helper+0x5/0x10
> Jun 10 08:58:17 localhost kernel: Unable to handle kernel 
> NULL pointer dereference at virtual address 0004
> Jun 10 08:58:17 localhost kernel:  printing eip:
> Jun 10 08:58:17 localhost kernel: c022048b
> Jun 10 08:58:17 localhost kernel: *pde = 07cb6067
> Jun 10 08:58:17 localhost kernel: *pte = 
> Jun 10 08:58:17 localhost kernel: Oops: 0002 [#1]
> Jun 10 08:58:17 localhost kernel: DEBUG_PAGEALLOC
> Jun 10 08:58:17 localhost kernel: Modules linked in: 
> kdapltest ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa 
> ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport 
> uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 
> mii evdev usbcore
> Jun 10 08:58:17 localhost kernel: CPU:0
> Jun 10 08:58:17 localhost kernel: EIP:0060:[]   
>  Not tainted VLI
> Jun 10 08:58:17 localhost kernel: EFLAGS: 00010283   (2.6.11.6) 
> Jun 10 08:58:17 localhost kernel: EIP is at vsnprintf+0x4b/0x4c0
> Jun 10 08:58:17 localhost kernel: eax: 0054   ebx: 
> c1d17f78   ecx:    edx: d0a35eef
> Jun 10 08:58:17 localhost kernel: esi: 0004   edi: 
> c1d17f78   ebp: 0103   esp: ce5bfd84
> Jun 10 08:58:17 localhost kernel: ds: 007b   es: 007b   ss: 0068
> Jun 10 08:58:17 localhost kernel: Process DT_Mdep_Thread_ 
> (pid: 6696, threadinfo=ce5be000 task=c3730a90)
> Jun 10 08:58:17 localhost kernel: Stack: c740 0020 
>  d0a33995 0104  c1d17f78 d0a33995 
> Jun 10 08:58:17 localhost kernel:0104 0020 
> c1d17f78  c1d17f78 c5b58000 d0a3484b 0004 
> Jun 10 08:58:17 localhost kernel:0100 d0a35eef 
> ce5bfdf4 007b ff05 c1d17f78 cf34ef78 c60d2060 
> Jun 10 08:58:17 localhost kernel: Call Trace:
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Malloc+0x25/0x60 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Tdep_PT_Printf+0x3b/0x1b0 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Transaction_Run+0x48f/0xb60 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> kernel_map_pages+0x28/0x70
> Jun 10 08:58:17 localhost kernel:  [] 
> cache_free_debugcheck+0x196/0x2d0
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
> Jun 10 08:58:17 localhost kernel:  [] 
> kernel_thread_helper+0x5/0x10
> Jun 10 08:58:17 localhost kernel: Code: f0 48 39 c5 73 0d 89 
> f2 f7 da bd ff ff ff ff 89 54 24 40 8b 54 24 44 80 3a 00 74 
> 23 8d 74 26 00 0f b6 02 3c 25 74 3d 39 ee 77 06 <88> 06 8b 54 
> 24 44 46 89 d0 42 89 54 24 44 80 78 01 00 75 e1 39 
> 
> then on the server side, when I try 

RE: [openib-general] [PATCH][RFC] kDAPL: remove dat wrapper function dat_ia_query()

2005-06-10 Thread James Lentini



On Fri, 10 Jun 2005, Itamar Rabenstein wrote:




Does that sound reasonable?



Ok, I can wait.  I definitely understand the need to stabilize the
provider and not introduce any new bugs.

I will start playing around again with getting rid of dapl's own linked
lists.

-tduffy


Hi All,

James,
can you close the issue of merging files into one file ?
it is hard to create a big patch when you know that the file you are
working on is going to be deleted or merged with another file.
I think we only left ia and cr


Don't let that block you. If you create a patch and I merge the files, 
I'll fix your patch so it applies.



I think that sp+psp+rsp should be one file dapl_sp.c

Tom,
dapl has many lists and I think that most of them should be kernel lists.
I think that the following list :
in struct dapl_sp : struct dapl_llist_entry *cr_list_head;  /* CR pending queue */
should be replaced with kernel rb tree because
we search in this list in the hot path of the connect-disconnect flow.


Do we know that this is a performance bottleneck?


we have agreed in the past that evd ring buffer should be replaced also with
kernel lists.


If you can replace the ring buffer with a data structure provided by 
Linux, that is a win. Maintaining the locking guarantees will be the 
hard part.



I will prepare a patch that will remove dapl_hash (no need for this hash).

Itamar





[openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread Hal Rosenstock
Hi Roland and Troy,

Last week, Troy reported the following:
On Fri, 2005-06-03 at 16:33, Troy Benjegerdes wrote:
> > > Also, I have two machines in a state right now where they are 
> > > printing out:
> > > 
> > > kernel: unregister_netdevice: waiting for ib0 to become free. 
> > > Usage count = 1

dev/core.c netdev_wait_allrefs says:
 * Any protocol or device that holds a reference should register
 * for netdevice notification, and cleanup and put back the
 * reference if they receive an UNREGISTER event.

Is it correct that IPoIB does not need to register for these events ? If
it is, then this must be something else which is using the IPoIB driver
causing the reference count to be incremented but not handling these
events. Troy, any idea on how to recreate this ?

Thanks.

-- Hal



[openib-general] A couple of kdapltest oopses

2005-06-10 Thread Hal Rosenstock
Hi,

First on the client when running transaction test, I see the following:
kdapltest -T T -s  -D mthca0a -d -i 1 -w 8 client SR server 
SR

Jun 10 08:58:17 localhost kernel: DT_Mdep_Thread_: page allocation failure. 
order:0, mode:0x20
Jun 10 08:58:17 localhost kernel:  [] __alloc_pages+0x2b2/0x440
Jun 10 08:58:17 localhost kernel:  [] kernel_map_pages+0x28/0x70
Jun 10 08:58:17 localhost kernel:  [] kmem_getpages+0x31/0xb0
Jun 10 08:58:17 localhost kernel:  [] cache_grow+0x139/0x360
Jun 10 08:58:17 localhost kernel:  [] cache_alloc_refill+0x153/0x340
Jun 10 08:58:17 localhost kernel:  [] dbg_redzone1+0x15/0x30
Jun 10 08:58:17 localhost kernel:  [] 
cache_alloc_debugcheck_after+0x6e/0x1a0
Jun 10 08:58:17 localhost kernel:  [] __kmalloc+0xb1/0xe0
Jun 10 08:58:17 localhost kernel:  [] DT_Mdep_Malloc+0x25/0x60 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] DT_Tdep_PT_Printf+0x16/0x1b0 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] DT_Transaction_Run+0x48f/0xb60 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] kernel_map_pages+0x28/0x70
Jun 10 08:58:17 localhost kernel:  [] 
cache_free_debugcheck+0x196/0x2d0
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] kernel_thread_helper+0x5/0x10
Jun 10 08:58:17 localhost kernel: Unable to handle kernel NULL pointer 
dereference at virtual address 0004
Jun 10 08:58:17 localhost kernel:  printing eip:
Jun 10 08:58:17 localhost kernel: c022048b
Jun 10 08:58:17 localhost kernel: *pde = 07cb6067
Jun 10 08:58:17 localhost kernel: *pte = 
Jun 10 08:58:17 localhost kernel: Oops: 0002 [#1]
Jun 10 08:58:17 localhost kernel: DEBUG_PAGEALLOC
Jun 10 08:58:17 localhost kernel: Modules linked in: kdapltest ib_dat_provider 
ib_cm ib_at dat ib_ipoib ib_sa ib_umad ide_cd cdrom lp ipv6 autofs parport_pc 
parport uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 mii evdev 
usbcore
Jun 10 08:58:17 localhost kernel: CPU:0
Jun 10 08:58:17 localhost kernel: EIP:0060:[]Not tainted VLI
Jun 10 08:58:17 localhost kernel: EFLAGS: 00010283   (2.6.11.6) 
Jun 10 08:58:17 localhost kernel: EIP is at vsnprintf+0x4b/0x4c0
Jun 10 08:58:17 localhost kernel: eax: 0054   ebx: c1d17f78   ecx:  
  edx: d0a35eef
Jun 10 08:58:17 localhost kernel: esi: 0004   edi: c1d17f78   ebp: 0103 
  esp: ce5bfd84
Jun 10 08:58:17 localhost kernel: ds: 007b   es: 007b   ss: 0068
Jun 10 08:58:17 localhost kernel: Process DT_Mdep_Thread_ (pid: 6696, 
threadinfo=ce5be000 task=c3730a90)
Jun 10 08:58:17 localhost kernel: Stack: c740 0020  d0a33995 
0104  c1d17f78 d0a33995 
Jun 10 08:58:17 localhost kernel:0104 0020 c1d17f78  
c1d17f78 c5b58000 d0a3484b 0004 
Jun 10 08:58:17 localhost kernel:0100 d0a35eef ce5bfdf4 007b 
ff05 c1d17f78 cf34ef78 c60d2060 
Jun 10 08:58:17 localhost kernel: Call Trace:
Jun 10 08:58:17 localhost kernel:  [] DT_Mdep_Malloc+0x25/0x60 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] DT_Mdep_Malloc+0x25/0x60 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] DT_Tdep_PT_Printf+0x3b/0x1b0 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] DT_Transaction_Run+0x48f/0xb60 
[kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Transaction_Main+0x1383/0x21a0 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] kernel_map_pages+0x28/0x70
Jun 10 08:58:17 localhost kernel:  [] 
cache_free_debugcheck+0x196/0x2d0
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] 
DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest]
Jun 10 08:58:17 localhost kernel:  [] kernel_thread_helper+0x5/0x10
Jun 10 08:58:17 localhost kernel: Code: f0 48 39 c5 73 0d 89 f2 f7 da bd ff ff 
ff ff 89 54 24 40 8b 54 24 44 80 3a 00 74 23 8d 74 26 00 0f b6 02 3c 25 74 3d 
39 ee 77 06 <88> 06 8b 54 24 44 46 89 d0 42 89 54 24 44 80 78 01 00 75 e1 39 

then on the server side, when I try to rmmod kdapltest, I get:

Unable to handle kernel paging request at 88243a05 RIP: 
[]
PGD 103027 PUD 105027 PMD 3baeb067 PTE 0
Oops: 0010 [1] SMP 
CPU 1 
Modules linked in: ib_dat_provider ib_cm ib_at dat ib_ipoib ib_sa parport_pc lp 
parport autofs4 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter 
ip_tables video button battery ac md5 ipv6 ohci_hcd i2c_amd8111 i2c_core 
hw_random ib_mthca ib_mad ib_core e100 mii tg3 floppy dm_snapshot dm_zero 
dm_mirror ext3 jbd dm_mod sata_sil libata sd_mod scsi_mod
Pid: 9094, comm: kdapltest Not tainted 2.6.11.6
RIP: 0010:[] []
RSP: 0018:810037787e58  EFLAGS: 00010292
RAX: 

[openib-general] OpenSM: Support for LinkSpeedActive (LSA) added

2005-06-10 Thread Hal Rosenstock
I just added the support for LinkSpeedActive(LSA) for DDR/QDR links into
OpenSM. Diagnostics (smpquery portinfo) already supports this.

-- Hal




RE: [openib-general] [CM] possible problem with crossing DREQs.

2005-06-10 Thread Hal Rosenstock
On Thu, 2005-06-09 at 15:47, Sean Hefty wrote: 
> >  I'm seeing an unusual problem when both halves of a connection
> >actively disconnect at the same time. Each connection peer issues
> >a DREQ at the same time, next each receive the DREQ and responds
> >with a DREP, and finally each connection gets a callback for the
> >transition to the idle state. However, at this point it appears
> >that each CM keeps retransmitting DREQ requests, which then seems
> >to interfere with new connection establishment.
> 
> I think that I understand what's happening.  Receiving the DREQ
> changed the state of the cm_id, but did not cancel the previous send.
> 
> I'm actually out on vacation for a little over two weeks (and will
> be totally away from e-mail after Friday), but something
> like the patch below might fix the issue.  (Note that I didn't test /
> compile this.)  If it does work for you, feel free to commit it.

This works for me. My test case is a little different. It is repeated
loopback kdapl quit tests but it does resolve the same problem. I am
committing this change. Thanks.
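
Sean's patch is not reproduced in this message, so the following is only
a toy model of the failure he describes, not the ib_cm code. The state
and field names are invented for the sketch; the cancel_pending_send
flag stands for "the fix is applied":

```c
/* Toy connection states; illustrative names, not the ib_cm ones. */
enum toy_state { TOY_ESTABLISHED, TOY_DREQ_SENT, TOY_DREQ_RCVD };

struct toy_conn {
    enum toy_state state;
    int dreq_retry_armed;   /* nonzero while our own DREQ would be resent */
};

/* Local side starts a disconnect: the DREQ goes out and retries are armed. */
static void toy_send_dreq(struct toy_conn *c)
{
    c->state = TOY_DREQ_SENT;
    c->dreq_retry_armed = 1;
}

/*
 * The peer's DREQ arrives. Changing the state alone leaves the earlier
 * send armed (the reported behaviour); cancelling it stops the retries.
 */
static void toy_recv_dreq(struct toy_conn *c, int cancel_pending_send)
{
    if (c->state == TOY_DREQ_SENT && cancel_pending_send)
        c->dreq_retry_armed = 0;
    c->state = TOY_DREQ_RCVD;
}
```

With cancel_pending_send set to 0 the model keeps retransmitting after
the DREQs cross; with it set to 1 the retries stop.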

-- Hal




Re: [openib-general] uverbs performance; ibv_pingpong poll vs sleep

2005-06-10 Thread Steven Wooding

Thanks. I'll give pipelining a go in my app.
 
Cheers,
 
Steve.
 
Roland Dreier <[EMAIL PROTECTED]> wrote:
> Steven> I have compaired the data rates using ibv_pingpong with
> Steven> and without the -e option. Therefore, using polling and
> Steven> waiting the CQ events (sleeping).
>
> Steven> Is there any way to trade off the data rate with the CPU
> Steven> usage (I was thinking of some timeout from polling).
>
> I suppose you could have some sort of adaptive polling scheme that
> spins polling for a while and then sleeps waiting for an event.
>
> However, as Michael said, it's probably better to use pipelining and
> post multiple send work requests. This hides the latency of getting a
> completion event by keeping the HCA busy.
>
>  - R.
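
The adaptive scheme Roland mentions could be sketched as a small decision
helper. Everything here is hypothetical (the names and the spin_limit
knob are not from ibv_pingpong); in a real libibverbs program the caller
would poll with ibv_poll_cq() on "spin" and block in ibv_get_cq_event()
on "sleep":

```c
/* What the caller should do after the latest poll of the CQ. */
enum cq_action { CQ_SPIN, CQ_SLEEP };

struct adaptive_state {
    int idle_polls;   /* consecutive polls that found nothing */
    int spin_limit;   /* empty polls tolerated before sleeping */
};

/*
 * Feed in the number of completions the last poll returned.
 * Any completion resets the idle counter and keeps spinning;
 * spin_limit empty polls in a row switch to sleeping on an event.
 */
static enum cq_action adaptive_next(struct adaptive_state *s, int completions)
{
    if (completions > 0) {
        s->idle_polls = 0;
        return CQ_SPIN;
    }
    if (++s->idle_polls < s->spin_limit)
        return CQ_SPIN;
    s->idle_polls = 0;
    return CQ_SLEEP;
}
```

A larger spin_limit trades CPU time for lower latency; a value of 1 makes
the helper ask for a sleep after every empty poll.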
 

[openib-general] Portability of AIO

2005-06-10 Thread Steven Wooding
Hi,
 
Does anybody know of any plans to make the API of libaio (Native Linux AIO) portable (i.e., make it the standard AIO API)?
 
Thanks,
 
 
Steve.