RE: [openib-general] cmpost: failure sending REQ: -22
> Has anyone seen ib_send_cm_req() return -22?

I believe that this is a timeout error, possibly indicating that the server side of the connection wasn't running. You may also want to verify the slid and dlid are correct for your configuration.

- Sean
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] cmpost: failure sending REQ: -22
On Tue, 2005-05-31 at 03:51, Sean Hefty wrote:
> > Has anyone seen ib_send_cm_req() return -22?
>
> I believe that this is a timeout error, possibly indicating that the server
> side of the connection wasn't running. You may also want to verify the slid
> and dlid are correct for your configuration.

Don't you get a REJ now when no one is listening on the requested service ID?

-22 is EINVAL. ib_send_cm_req() returns it in a number of cases:
1. A peer-to-peer connection is requested
2. No primary path is supplied
3. The QP is not RC or UC
4. Private data is supplied and its length > 92
5. An alternate path is supplied and its PKey or MTU does not match the primary path
6. The connection state is not IDLE
7. The primary or alternate path SGID or PKey does not match those of the port

-- Hal
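The cases Hal lists can be read as a chain of up-front parameter checks that all collapse into the same error code. The following is only a rough userspace sketch of that idea, not the actual ib_cm code: every type and name prefixed with sketch_ or SKETCH_ is invented for illustration, the 92-byte limit is taken from the list above, and the SGID/port check (case 7) is left out because it needs device state.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the real ib_cm types. */
enum sketch_qp_type { SKETCH_QPT_RC, SKETCH_QPT_UC, SKETCH_QPT_UD };

struct sketch_path {
    unsigned short pkey;
    unsigned char mtu;
};

struct sketch_req_param {
    int peer_to_peer;
    const struct sketch_path *primary_path;
    const struct sketch_path *alternate_path;
    enum sketch_qp_type qp_type;
    const void *private_data;
    size_t private_data_len;
};

/* Limit quoted in the thread (case 4). */
#define SKETCH_REQ_PRIVATE_DATA_MAX 92

/* Returns 0 if the request looks sendable, -EINVAL otherwise. */
static int sketch_check_req(const struct sketch_req_param *p, int conn_is_idle)
{
    if (p->peer_to_peer)                          /* case 1 */
        return -EINVAL;
    if (!p->primary_path)                         /* case 2 */
        return -EINVAL;
    if (p->qp_type != SKETCH_QPT_RC &&
        p->qp_type != SKETCH_QPT_UC)              /* case 3 */
        return -EINVAL;
    if (p->private_data &&
        p->private_data_len > SKETCH_REQ_PRIVATE_DATA_MAX)  /* case 4 */
        return -EINVAL;
    if (p->alternate_path &&
        (p->alternate_path->pkey != p->primary_path->pkey ||
         p->alternate_path->mtu != p->primary_path->mtu))   /* case 5 */
        return -EINVAL;
    if (!conn_is_idle)                            /* case 6 */
        return -EINVAL;
    return 0;
}
```

Because every failed check returns the same value, a caller that sees -22 cannot tell which condition tripped; working down the list by hand (or adding a debug print per check) is the practical way to narrow it.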
RE: [openib-general] [PATCH] [ib_at]: Update async structure prior to returning requests to appropriate cache
Thanks Hal, this patch fixed the problem (oops in ib_at.c)

Itamar

-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
Sent: Monday, May 30, 2005 8:37 PM
To: James Lentini
Cc: openib-general@openib.org
Subject: [openib-general] [PATCH] [ib_at]: Update async structure prior to returning requests to appropriate cache

[ib_at]: Update async structure prior to returning requests to
appropriate cache. This change affects req_free, free_route_req, and
free_path_req.

Also, some other minor changes to eliminate unneeded parameter passed to
path_req_output and changes to some DEBUG messages.

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

Index: at.c
===================================================================
--- at.c	(revision 2507)
+++ at.c	(working copy)
@@ -155,7 +155,8 @@
 
 static void free_route_req(void *async);
 static void free_path_req(void *async);
-static void path_req_complete(int stat, struct ib_sa_path_rec *ret, void *ctx);
+static void path_req_complete(int status, struct ib_sa_path_rec *resp,
+			      void *context);
 static int resolve_path(struct path_req *req);
 
 static int resolve_ip(struct ib_at_src *src, u32 dst_ip, u32 src_ip,
@@ -274,7 +275,6 @@
 	}
 
 	memset(dgid, 0, sizeof *dgid);
-
 	return 0;
 }
 
@@ -319,11 +319,10 @@
 		break;
 	default:
 		WARN("bad async req type %d", pend->type);
+		pend->status = IB_AT_STATUS_INVALID;
+		pend->type = IBAT_REQ_NONE;
+		pend->sa_query = NULL;
 	}
-
-	pend->status = IB_AT_STATUS_INVALID;
-	pend->type = IBAT_REQ_NONE;
-	pend->sa_query = NULL;
 }
 
 static int req_start(struct async *q, struct async *pend,
@@ -464,6 +463,11 @@
 	struct route_req *req = container_of(async, struct route_req, pend);
 
 	DEBUG("free async %p req %p", async, req);
+
+	req->pend.status = IB_AT_STATUS_INVALID;
+	req->pend.type = IBAT_REQ_NONE;
+	req->pend.sa_query = NULL;
+
 	kmem_cache_free(route_req_cache, req);
 }
 
@@ -472,6 +476,11 @@
 	struct path_req *req = container_of(async, struct path_req, pend);
 
 	DEBUG("free async %p req %p", async, req);
+
+	req->pend.status = IB_AT_STATUS_INVALID;
+	req->pend.type = IBAT_REQ_NONE;
+	req->pend.sa_query = NULL;
+
 	kmem_cache_free(path_req_cache, req);
 }
 
@@ -537,15 +546,14 @@
 	return 1;	/* one entry is filled */
 }
 
-static int path_req_output(struct path_req *req, struct ib_sa_path_rec *resp,
-			   int npath, struct ib_sa_path_rec *out, int nelem)
+static int path_req_output(struct ib_sa_path_rec *resp, int npath,
+			   struct ib_sa_path_rec *out, int nelem)
 {
 	int n = min(npath, nelem);
 
-	DEBUG("parent %p output %d records", req, n);
+	DEBUG("fill ib_sa_path_rec %p output %d records", out, n);
 
 	memcpy(out, resp, n * sizeof (struct ib_sa_path_rec));
-
 	return n;
 }
 
@@ -579,7 +587,7 @@
 	unsigned long flags;
 	struct async *pend;
 
-	DEBUG("req %p", req);
+	DEBUG("req %p status %d", req, status);
 
 	if (req->pend.parent) {
 		WARN("for child req %p???", req);
@@ -598,12 +606,12 @@
 		return;
 	}
 
-	req->pend.nelem = path_req_output(req, resp, 1,
+	req->pend.nelem = path_req_output(resp, 1,
 					  req->pend.data, req->pend.nelem);
 
 	spin_lock_irqsave(&pending_reqs.lock, flags);
 	for (pend = req->pend.waiting; pend; pend = pend->waiting)
-		pend->nelem = path_req_output(req, resp, 1,
+		pend->nelem = path_req_output(resp, 1,
 					      pend->data, pend->nelem);
 
 	req_end(&req->pend, req->pend.nelem, NULL);
@@ -876,7 +884,7 @@
 	if (in_cache) {
 		DEBUG("!in_cache free req %p", preq);
 		kmem_cache_free(path_req_cache, preq);
-		return path_req_output(preq, cached_arr, n, path_arr, npath);
+		return path_req_output(cached_arr, n, path_arr, npath);
 	}
 	*/
 
@@ -969,7 +977,6 @@
 EXPORT_SYMBOL(ib_at_status);
 
 
-
 /*
  * Internal init/cleanup functions:
  */
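The core of the patch above is that the status, type, and sa_query fields are reset inside free_route_req() and free_path_req(), right before the object goes back to its cache. A minimal userspace sketch of that pattern follows, assuming a simple free list in place of the kernel's kmem_cache; all sk_ names and the numeric constants are made up for illustration and do not come from at.c.

```c
#include <stddef.h>

/* Illustrative values only; the real IB_AT_* and IBAT_* constants may differ. */
enum { SK_STATUS_INVALID = 0, SK_STATUS_PENDING = 1 };
enum { SK_REQ_NONE = 0, SK_REQ_PATH = 2 };

struct sk_async {
    int status;
    int type;
    void *sa_query;
};

struct sk_path_req {
    struct sk_async pend;
    struct sk_path_req *next_free;   /* free-list link, stands in for the slab cache */
};

static struct sk_path_req *sk_free_list;

/* Reset the embedded async state, then hand the object back to the cache. */
static void sk_free_path_req(struct sk_path_req *req)
{
    req->pend.status = SK_STATUS_INVALID;
    req->pend.type = SK_REQ_NONE;
    req->pend.sa_query = NULL;

    req->next_free = sk_free_list;
    sk_free_list = req;
}

/* Take an object from the cache; it comes back in the reset state. */
static struct sk_path_req *sk_alloc_path_req(void)
{
    struct sk_path_req *req = sk_free_list;

    if (req)
        sk_free_list = req->next_free;
    return req;
}
```

The point of resetting before the free, rather than after, is that nothing touches the object once the cache owns it again, and the next user never sees a stale query pointer or request type.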
[openib-general] Re: [PATCH][kdapl] replace spin_lock with spin_lock_irqsave in kdapltest
Itamar,

Why does this patch comment out uses of the g_PerfTestLock?

james

On Sun, 29 May 2005, Itamar wrote:

itamar> With this patch i can run kdapltest -T T ... -t 4 -w 8 ...
itamar> I still see problems but in general this patch helps the stability a lot.
itamar>
itamar> replace spin_lock with spin_lock_irqsave in kdapltest
itamar> Signed-off-by: Itamar Rabenstein [EMAIL PROTECTED]
itamar>
itamar> Index: test/dapl_transaction_stats.c
itamar> ===================================================================
itamar> --- test/dapl_transaction_stats.c	(revision 2509)
itamar> +++ test/dapl_transaction_stats.c	(working copy)
itamar> @@ -45,12 +45,13 @@
itamar>  DT_transaction_stats_set_ready (DT_Tdep_Print_Head *phead,
itamar>                                  Transaction_Stats_t * transaction_stats)
itamar>  {
itamar> -    DT_Mdep_Lock (&transaction_stats->lock);
itamar> +    unsigned long flags;
itamar> +    spin_lock_irqsave (&transaction_stats->lock,flags);
itamar>      transaction_stats->wait_count--;
itamar> 
itamar>      DT_Tdep_PT_Debug (1,(phead,"Received Sync Message from server (%d left)\n",
itamar>                        transaction_stats->wait_count));
itamar> -    DT_Mdep_Unlock (&transaction_stats->lock);
itamar> +    spin_unlock_irqrestore (&transaction_stats->lock,flags);
itamar>  }
itamar> 
itamar>  boolean_t
itamar> @@ -86,7 +87,8 @@
itamar>                          unsigned int bytes_rdma_read,
itamar>                          unsigned int bytes_rdma_write)
itamar>  {
itamar> -    DT_Mdep_Lock (&transaction_stats->lock);
itamar> +    unsigned long flags;
itamar> +    spin_lock_irqsave (&transaction_stats->lock,flags);
itamar> 
itamar>      /* look for the longest time... */
itamar>      if (time_ms > transaction_stats->time_ms)
itamar> @@ -99,5 +101,5 @@
itamar>      transaction_stats->bytes_recv += bytes_recv;
itamar>      transaction_stats->bytes_rdma_read += bytes_rdma_read;
itamar>      transaction_stats->bytes_rdma_write += bytes_rdma_write;
itamar> -    DT_Mdep_Unlock (&transaction_stats->lock);
itamar> +    spin_unlock_irqrestore (&transaction_stats->lock,flags);
itamar>  }
itamar> Index: test/dapl_server.c
itamar> ===================================================================
itamar> --- test/dapl_server.c	(revision 2509)
itamar> +++ test/dapl_server.c	(working copy)
itamar> @@ -49,7 +49,7 @@
itamar>      unsigned char *buffp = NULL;
itamar>      unsigned char *module = "DT_cs_Server";
itamar>      int status = 0;
itamar> -
itamar> +    unsigned long flags;
itamar>      DAT_DTO_COOKIE dto_cookie;
itamar>      struct dat_dto_completion_event_data dto_stat;
itamar>      u32 ret;
itamar> @@ -616,9 +616,9 @@
itamar> 
itamar> 
itamar>      /* Count this new client and get ready for the next */
itamar> -    DT_Mdep_Lock (&ps_ptr->num_clients_lock);
itamar> +    spin_lock_irqsave (&ps_ptr->num_clients_lock,flags);
itamar>      ps_ptr->num_clients++;
itamar> -    DT_Mdep_Unlock (&ps_ptr->num_clients_lock);
itamar> +    spin_unlock_irqrestore (&ps_ptr->num_clients_lock,flags);
itamar> 
itamar>      /* we passed the pt_ptr to the thread and must now 'forget' it */
itamar>      pt_ptr = NULL;
itamar> Index: test/dapl_thread.c
itamar> ===================================================================
itamar> --- test/dapl_thread.c	(revision 2509)
itamar> +++ test/dapl_thread.c	(working copy)
itamar> @@ -83,6 +83,7 @@
itamar>                    unsigned int stacksize)
itamar>  {
itamar>      Thread *thread_ptr;
itamar> +    unsigned long flags;
itamar>      thread_ptr = (Thread *) DT_MemListAlloc (pt_ptr, "thread.c", THREAD, sizeof (Thread));
itamar>      if (thread_ptr == NULL)
itamar>      {
itamar> @@ -93,9 +94,9 @@
itamar>      thread_ptr->thread_handle = 0;
itamar>      thread_ptr->stacksize = stacksize;
itamar> 
itamar> -    DT_Mdep_Lock (&pt_ptr->Thread_counter_lock);
itamar> +    spin_lock_irqsave (&pt_ptr->Thread_counter_lock,flags);
itamar>      pt_ptr->Thread_counter++;
itamar> -    DT_Mdep_Unlock (&pt_ptr->Thread_counter_lock);
itamar> +    spin_unlock_irqrestore (&pt_ptr->Thread_counter_lock,flags);
itamar> 
itamar>      DT_Mdep_Thread_Init_Attributes (thread_ptr);
itamar> 
itamar> @@ -108,11 +109,12 @@
itamar>  void
itamar>  DT_Thread_Destroy (Thread * thread_ptr, Per_Test_Data_t * pt_ptr)
itamar>  {
itamar> +    unsigned long flags;
itamar>      if (thread_ptr)
itamar>      {
itamar> -        DT_Mdep_Lock (&pt_ptr->Thread_counter_lock);
itamar> +        spin_lock_irqsave (&pt_ptr->Thread_counter_lock,flags);
itamar>          pt_ptr->Thread_counter--;
itamar> -        DT_Mdep_Unlock (&pt_ptr->Thread_counter_lock);
itamar> +        spin_unlock_irqrestore (&pt_ptr->Thread_counter_lock,flags);
itamar> 
itamar>          DT_Mdep_Thread_Destroy_Attributes (thread_ptr);
itamar>          DT_MemListFree (pt_ptr, thread_ptr);
itamar> Index: test/dapl_test_data.c
itamar> ===================================================================
itamar> --- test/dapl_test_data.c	(revision 2509)
itamar> +++
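Every hunk in this patch follows the same shape: declare a local flags variable, save the interrupt state while taking the lock, touch the shared counter, then release the lock and restore the saved state. The real primitives only exist in the kernel, so what follows is a toy userspace model of the save/restore bookkeeping, not a working lock. The sk_ names are invented, the "interrupt state" is a plain global boolean, and unlike the kernel macro (which takes flags by name) the model passes a pointer.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable state; purely a model, not real hardware state. */
static bool sk_irqs_enabled = true;

struct sk_spinlock {
    bool held;
};

/* Model of "lock and save": remember the state on entry, then disable. */
static void sk_lock_irqsave(struct sk_spinlock *l, unsigned long *flags)
{
    *flags = sk_irqs_enabled;
    sk_irqs_enabled = false;
    l->held = true;
}

/* Model of "unlock and restore": put back whatever state the caller saw. */
static void sk_unlock_irqrestore(struct sk_spinlock *l, unsigned long flags)
{
    l->held = false;
    sk_irqs_enabled = (flags != 0);
}

static int sk_counter;
static struct sk_spinlock sk_counter_lock;

/* Same shape as the patched counter updates: flags is local to the caller. */
static void sk_counter_inc(void)
{
    unsigned long flags;

    sk_lock_irqsave(&sk_counter_lock, &flags);
    sk_counter++;
    sk_unlock_irqrestore(&sk_counter_lock, flags);
}
```

The model shows why flags lives on the caller's stack: if the function is entered with interrupts already off, the restore step leaves them off instead of blindly re-enabling them.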
Re: [openib-general] Problem compiling userspace driver.
On Tue, 2005-05-31 at 10:59, Gleb Natapov wrote:
> Hello,
>
> I am trying to compile libmthca but I get following error:
> src/mthca.c:101: error: unknown field `query_gid' specified in initializer
> src/mthca.c:101: warning: initialization from incompatible pointer type
> src/mthca.c:102: error: unknown field `query_pkey' specified in initializer
> src/mthca.c:102: warning: initialization from incompatible pointer type

Also:
src/mthca.c:115: unknown field `attach_mcast' specified in initializer
src/mthca.c:115: warning: excess elements in struct initializer
src/mthca.c:115: warning: (near initialization for `mthca_ctx_ops')
src/mthca.c:116: unknown field `detach_mcast' specified in initializer
src/mthca.c:117: warning: excess elements in struct initializer

> Those fields indeed are missing in verbs.h. If I remove those two lines
> driver compiles but when I run ibv_devices I get:
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>         driver search path: /home/glebn/OpenIB/install/lib/infiniband
>
> $ ls /home/glebn/OpenIB/install/lib/infiniband
> mthca.a  mthca.la  mthca.so

Did you modprobe ib_uverbs ?

-- Hal

> Any help?
>
> --
> Gleb.
Re: [openib-general] Problem compiling userspace driver.
On Tue, May 31, 2005 at 11:09:58AM -0400, Hal Rosenstock wrote:
> On Tue, 2005-05-31 at 10:59, Gleb Natapov wrote:
> > Hello,
> >
> > I am trying to compile libmthca but I get following error:
> > src/mthca.c:101: error: unknown field `query_gid' specified in initializer
> > src/mthca.c:101: warning: initialization from incompatible pointer type
> > src/mthca.c:102: error: unknown field `query_pkey' specified in initializer
> > src/mthca.c:102: warning: initialization from incompatible pointer type
>
> Also:
> src/mthca.c:115: unknown field `attach_mcast' specified in initializer
> src/mthca.c:115: warning: excess elements in struct initializer
> src/mthca.c:115: warning: (near initialization for `mthca_ctx_ops')
> src/mthca.c:116: unknown field `detach_mcast' specified in initializer
> src/mthca.c:117: warning: excess elements in struct initializer

Right, but those are only warnings.

> > Those fields indeed are missing in verbs.h. If I remove those two lines
> > driver compiles but when I run ibv_devices I get:
> > libibverbs: Warning: no userspace device-specific driver found for uverbs0
> >         driver search path: /home/glebn/OpenIB/install/lib/infiniband
> >
> > $ ls /home/glebn/OpenIB/install/lib/infiniband
> > mthca.a  mthca.la  mthca.so
>
> Did you modprobe ib_uverbs ?

Yes. I had another error before I did this.

--
Gleb.
[openib-general] Re: [PATCH] [ib_at]: Update async structure prior to returning requests to appropriate cache
Committed in revision 2513.

On Mon, 30 May 2005, Hal Rosenstock wrote:

halr> [ib_at]: Update async structure prior to returning requests to
halr> appropriate cache. This change affects req_free, free_route_req, and
halr> free_path_req.
halr>
halr> Also, some other minor changes to eliminate unneeded parameter passed to
halr> path_req_output and changes to some DEBUG messages.
halr>
halr> Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]
[openib-general] Re: [PATCH][kdapl] fix fatal bug in triger the evd upcall
Committed in revision 2514.

On Sun, 29 May 2005, Itamar wrote:

itamar> Hi James,
itamar>
itamar> This patch fixes a fatal bug in the current latest kdapl bits (svn rev 2507).
itamar> As you can see, we need to trigger the upcall when dapl_evd_dequeue returns with good status
itamar> and quit the method when dapl_evd_dequeue returns with non-zero status, which means the queue is empty.
itamar> In the current bits no kdapltest can run, not even the simple quit test.
itamar>
itamar> In the future, please run a simple regression before you commit changes to svn.
itamar> Anyway, with this patch the code is working again.
itamar>
itamar> fix fatal bug in trigger the evd upcall
itamar> Signed-off-by: Itamar Rabenstein [EMAIL PROTECTED]
itamar>
itamar> Index: dapl_cno_util.c
itamar> ===================================================================
itamar> --- dapl_cno_util.c	(revision 2509)
itamar> +++ dapl_cno_util.c	(working copy)
itamar> @@ -115,12 +115,8 @@
itamar> 
itamar>  	for (;;) {
itamar>  		status = dapl_evd_dequeue((DAT_EVD_HANDLE)evd, &event);
itamar> -		if (DAT_SUCCESS == status) {
itamar> -			dapl_dbg_log(DAPL_DBG_TYPE_ERR,
itamar> -				     "dapl_evd_dequeue failed: %x\n", status);
itamar> +		if (DAT_SUCCESS != status)
itamar>  			return;
itamar> -		}
itamar> -
itamar>  		cno->cno_upcall.upcall_func(cno->cno_upcall.instance_data,
itamar>  					    &event, FALSE);
itamar>  	}
itamar> --
itamar> Itamar
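The fix boils down to inverting one test: the loop must stop when the dequeue reports anything other than success, and deliver the event to the upcall otherwise. The sketch below shows that drain loop in isolation as plain userspace C. The sk_ types, the integer "events", and the SK_QUEUE_EMPTY code are all hypothetical stand-ins for the kDAPL structures and DAT status values.

```c
#include <stddef.h>

#define SK_SUCCESS      0
#define SK_QUEUE_EMPTY  1   /* hypothetical non-zero status for "nothing left" */

/* Toy event queue: an array of ints consumed front to back. */
struct sk_evd {
    const int *events;
    size_t count;
    size_t next;
};

typedef void (*sk_upcall_fn)(void *instance_data, int event);

/* Pop one event; returns SK_SUCCESS or SK_QUEUE_EMPTY. */
static int sk_evd_dequeue(struct sk_evd *evd, int *event)
{
    if (evd->next >= evd->count)
        return SK_QUEUE_EMPTY;
    *event = evd->events[evd->next++];
    return SK_SUCCESS;
}

/* Drain the queue: stop on the first non-success status, else run the upcall. */
static void sk_trigger_upcall(struct sk_evd *evd, sk_upcall_fn fn, void *data)
{
    int event;

    for (;;) {
        if (sk_evd_dequeue(evd, &event) != SK_SUCCESS)
            return;
        fn(data, event);
    }
}

/* Example upcall: accumulate event values into an int. */
static void sk_sum_upcall(void *instance_data, int event)
{
    *(int *)instance_data += event;
}
```

With the comparison written the other way round, as in the pre-patch code, the loop would return on the first successful dequeue and the upcall would never run for a real event, which matches the symptom reported above.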
[openib-general] RE: [PATCH][kdapl] replace spin_lock with spin_lock_irqsave in kdapltest
It is only declared, not used, so we don't need it (;-)

Itamar

-----Original Message-----
From: James Lentini [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 31, 2005 5:46 PM
To: Itamar
Cc: openib-general
Subject: Re: [PATCH][kdapl] replace spin_lock with spin_lock_irqsave in kdapltest

Itamar,

Why does this patch comment out uses of the g_PerfTestLock?

james
[openib-general] Re: [PATCH] kDAPL: remove typedef DAT_CONTEXT
Mostly committed in revision 2515.

I didn't remove DAT_UPCALL_NULL and DAT_UPCALL_SAME.

DAT_UPCALL_NULL is provided as a convenience to the consumer. I think it is useful, but I'm willing to hear other opinions.

The provider's implementation of dat_evd_modify_upcall() should check for the DAT_UPCALL_SAME value. The fact that it doesn't is a bug.

james

On Fri, 27 May 2005, Tom Duffy wrote:

tduffy> Get rid of the typedef DAT_CONTEXT.
tduffy>
tduffy> Signed-off-by: Tom Duffy [EMAIL PROTECTED]
tduffy>
tduffy> Index: linux-kernel/test/dapltest/include/dapl_common.h
tduffy> ===================================================================
tduffy> --- linux-kernel/test/dapltest/include/dapl_common.h	(revision 2506)
tduffy> +++ linux-kernel/test/dapltest/include/dapl_common.h	(working copy)
tduffy> @@ -42,7 +42,7 @@ typedef enum
tduffy>  typedef struct
tduffy>  {
tduffy>      DAT_RMR_CONTEXT rmr_context;
tduffy> -    DAT_CONTEXT mem_address;
tduffy> +    union dat_context mem_address;
tduffy>  } RemoteMemoryInfo;
tduffy>  #pragma pack()
tduffy> 
tduffy> Index: linux-kernel/dat-provider/dapl_get_consumer_context.c
tduffy> ===================================================================
tduffy> --- linux-kernel/dat-provider/dapl_get_consumer_context.c	(revision 2506)
tduffy> +++ linux-kernel/dat-provider/dapl_get_consumer_context.c	(working copy)
tduffy> @@ -48,7 +48,7 @@
tduffy>   *	DAT_SUCCESS
tduffy>   *	DAT_INVALID_PARAMETER
tduffy>   */
tduffy> -u32 dapl_get_consumer_context(DAT_HANDLE dat_handle, DAT_CONTEXT *context)
tduffy> +u32 dapl_get_consumer_context(DAT_HANDLE dat_handle, union dat_context *context)
tduffy>  {
tduffy>  	u32 dat_status = DAT_SUCCESS;
tduffy>  	struct dapl_header *header;
tduffy> Index: linux-kernel/dat-provider/dapl_set_consumer_context.c
tduffy> ===================================================================
tduffy> --- linux-kernel/dat-provider/dapl_set_consumer_context.c	(revision 2506)
tduffy> +++ linux-kernel/dat-provider/dapl_set_consumer_context.c	(working copy)
tduffy> @@ -47,7 +47,7 @@
tduffy>   *	DAT_SUCCESS
tduffy>   *	DAT_INVALID_HANDLE
tduffy>   */
tduffy> -u32 dapl_set_consumer_context(DAT_HANDLE dat_handle, DAT_CONTEXT context)
tduffy> +u32 dapl_set_consumer_context(DAT_HANDLE dat_handle, union dat_context context)
tduffy>  {
tduffy>  	u32 dat_status = DAT_SUCCESS;
tduffy>  	struct dapl_header *header;
tduffy> Index: linux-kernel/dat-provider/dapl.h
tduffy> ===================================================================
tduffy> --- linux-kernel/dat-provider/dapl.h	(revision 2506)
tduffy> +++ linux-kernel/dat-provider/dapl.h	(working copy)
tduffy> @@ -177,7 +177,7 @@ struct dapl_header {
tduffy>  	enum dat_handle_type handle_type;
tduffy>  	struct dapl_ia *owner_ia;
tduffy>  	struct dapl_llist_entry ia_list_entry;
tduffy> -	DAT_CONTEXT user_context;	/* user context - opaque to DAPL */
tduffy> +	union dat_context user_context;	/* user context - opaque to DAPL */
tduffy>  	spinlock_t lock;
tduffy>  	unsigned long flags;	/* saved lock flag values */
tduffy>  };
tduffy> @@ -423,9 +423,11 @@ extern u32 dapl_ia_query(DAT_IA_HANDLE,
tduffy> 
tduffy>  /* helper functions */
tduffy> 
tduffy> -extern u32 dapl_set_consumer_context(DAT_HANDLE handle, DAT_CONTEXT context);
tduffy> +extern u32 dapl_set_consumer_context(DAT_HANDLE handle,
tduffy> +				     union dat_context context);
tduffy> 
tduffy> -extern u32 dapl_get_consumer_context(DAT_HANDLE handle, DAT_CONTEXT *context);
tduffy> +extern u32 dapl_get_consumer_context(DAT_HANDLE handle,
tduffy> +				     union dat_context *context);
tduffy> 
tduffy>  extern u32 dapl_get_handle_type(DAT_HANDLE handle,
tduffy>  				enum dat_handle_type *type);
tduffy> Index: linux-kernel/dat/dat.h
tduffy> ===================================================================
tduffy> --- linux-kernel/dat/dat.h	(revision 2506)
tduffy> +++ linux-kernel/dat/dat.h	(working copy)
tduffy> @@ -361,14 +361,14 @@ typedef enum {
tduffy>  	TRUE = 1
tduffy>  } boolean_t;
tduffy> 
tduffy> -typedef union dat_context {
tduffy> +union dat_context {
tduffy>  	void *as_ptr;
tduffy>  	u64 as_64;
tduffy>  	unsigned long long as_index;
tduffy> -} DAT_CONTEXT;
tduffy> +};
tduffy> 
tduffy> -typedef DAT_CONTEXT DAT_DTO_COOKIE;
tduffy> -typedef DAT_CONTEXT DAT_RMR_COOKIE;
tduffy> +typedef union dat_context DAT_DTO_COOKIE;
tduffy> +typedef union dat_context DAT_RMR_COOKIE;
tduffy> 
tduffy>  enum dat_completion_flags {
tduffy>  	/* Completes with notification */
tduffy> @@ -920,13 +920,6 @@ struct dat_upcall_object {
tduffy>  	DAT_UPCALL_FUNC upcall_func;
tduffy>  };
tduffy> 
tduffy> -/* Define NULL upcall */
tduffy> -
tduffy> -#define DAT_UPCALL_NULL \
tduffy> -	((struct dat_upcall_object) {
[openib-general] RE: [PATCH][kdapl] replace spin_lock with spin_lock_irqsave in kdapltest
Ok, then it should be removed completely, not commented out. I'll do that and commit.

On Tue, 31 May 2005, Itamar Rabenstein wrote:

itamar> it is only declared not in use so we dont need it (;-)
itamar>
itamar> Itamar
itamar>
itamar> > -----Original Message-----
itamar> > From: James Lentini [mailto:[EMAIL PROTECTED]
itamar> > Sent: Tuesday, May 31, 2005 5:46 PM
itamar> > To: Itamar
itamar> > Cc: openib-general
itamar> > Subject: Re: [PATCH][kdapl] replace spin_lock with spin_lock_irqsave in kdapltest
itamar> >
itamar> > Itamar,
itamar> >
itamar> > Why does this patch comment out uses of the g_PerfTestLock?
itamar> >
itamar> > james
Re: [openib-general] Problem compiling userspace driver.
Sorry, I had some uncommitted changes left in my tree. So of course I didn't see any problems. I just checked in the required libibverbs changes.

- R.
RE: [openib-general] Re: [PATCH] kDAPL: remove typedef DAT_CONTEXT
Hi all,

I tried to use DAT_UPCALL_NULL and got a compile error. I also don't think it is a good idea to compare structs with a comparison operator (==). If we want to check for DAT_UPCALL_NULL, we need to check that the callback function pointer is NULL. In other words, to use DAT_UPCALL_NULL you need a real dat_upcall struct with its callback function set to NULL, instead of casting NULL to a struct.

Currently dat_evd_modify_upcall() is not implemented according to the spec.

Itamar

-----Original Message-----
From: James Lentini [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 31, 2005 6:39 PM
To: Tom Duffy
Cc: openib-general@openib.org
Subject: [openib-general] Re: [PATCH] kDAPL: remove typedef DAT_CONTEXT

Mostly committed in revision 2515.

I didn't remove DAT_UPCALL_NULL and DAT_UPCALL_SAME.

DAT_UPCALL_NULL is provided as a convenience to the consumer. I think it is useful, but I'm willing to hear other opinions.

The provider's implementation of dat_evd_modify_upcall() should check for the DAT_UPCALL_SAME value. The fact that it doesn't is a bug.

james

On Fri, 27 May 2005, Tom Duffy wrote:

tduffy> Get rid of the typedef DAT_CONTEXT.
tduffy>
tduffy> Signed-off-by: Tom Duffy [EMAIL PROTECTED]
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Sat, May 28, 2005 at 04:26:43PM -0700, Caitlin Bestler wrote:
> ... if so what the best strategy for achieving it is (try to plan an IB/iWARP merge immediately or wait until there is an iWARP code base).

If there is no iWARP code base, I fail to see how one can merge.

Having a specification is one basis for communication. Linux developers normally use existing code as the basis. Committees submit CRs (Change Requests) to update specs. The CRs get voted on by the committee. Linux developers submit patches. The Linux subsystem maintainer(s) decide if patches are ok or not.

> Claiming that an InfiniBand-specific interface is somehow thinking long term is just plain ludicrous.

"It Works" is worth 10x more to *any* customer than a transport neutral API that only exists as a spec. The specs are guides to how something *should* work and linux tries to comply with them (e.g. 802.3 or T10) where HW implementations actually follow the spec. That doesn't mean linux has to implement every brain damaged spec that some committee comes up with. OTOH, rdmaconsortium.org does have a fair shot given I2O made it into the kernel. :^/

(I'm willing to have a conversation about why I think I2O is brain damaged if someone else is buying drinks. It's not total crap, but it certainly has its downside.)

> Now it may be that the short term interest of the InfiniBand vendors is such that they cannot commit resources to helping build a transport neutral API. That is always a legitimate tradeoff, but it is short term corporate thinking.

Please, that horse is already dead. They have offered to review patches to make the API transport neutral. Test that offer. Submit patches and move the conversation on to something that is more constructive.

> Last time I looked most of the commits being made to OpenIB (or sourceforge DAPL) were from people drawing paychecks from those evil corporations.

Yes, so? The issue isn't the funding - it's the goals. Compare the gen1 stack (I'm being careful to not pick on any IB vendors) to the gen2 stack. The difference is between corporate code and linux code - mostly funded by the same corporation with several of the same programmers. The gen1 stack came from something that attempted to build/run a shared user/kernel space on every distro. The Makefiles are just a mess - never mind the code.

grant
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Sat, May 28, 2005 at 05:18:39PM -0700, Caitlin Bestler wrote:

Versus..

struct rdma_xyz {
	/* common fields */
};

struct rdma_xyz_ib {
	struct rdma_xyz common;
	/* ib fields */
};

struct rdma_xyz_iwarp {
	struct rdma_xyz common;
	/* iwarp fields */
};

The latter style is extensible, but makes it difficult to properly allocate a buffer that works for all variants.

The latter assumes the transport specific code owns responsibility for allocating/deallocating those buffers. It also forces the generic code to be completely ignorant of the transport specific stuff. It doesn't allow the programmer to hack around in the public unions.

The union style is also already in use in both IT-API and RNIC-PI. I personally prefer sub-classing to unions, but I have found myself in the minority on *most* projects where the issue has been discussed. One reason is that sub-classing provides very little type-safety. struct sockaddr is an example of this. It takes manual inspection to ensure that the variants are properly differentiated and it is still common for developers to pass in a plain struct sockaddr without realizing that it is not large enough for a struct sockaddr_in6.

IMHO, unions are a sort of cast on whole structures. Neither method really offers an advantage in type checking. Both require that one knows which type is the right one.

grant
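The sub-classing layout discussed in this message can be sketched in plain C. This is only an illustration under stated assumptions: the transport tag, the dlid and dst_ip fields, and the three helper functions are hypothetical and do not come from OpenIB, IT-API or RNIC-PI. It shows the two points raised in the thread: the transport specific code owns allocation, and generic code only ever sees the common part, so a tag has to be checked by hand before converting back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; nothing here is an OpenIB or RNIC-PI API. */
enum rdma_transport { RDMA_TRANSPORT_IB, RDMA_TRANSPORT_IWARP };

struct rdma_xyz {
	enum rdma_transport transport;	/* tag that tells the variants apart */
};

struct rdma_xyz_ib {
	struct rdma_xyz common;		/* embedded common part */
	unsigned short dlid;		/* example IB-only field (assumed) */
};

struct rdma_xyz_iwarp {
	struct rdma_xyz common;
	unsigned int dst_ip;		/* example iWARP-only field (assumed) */
};

/* Userspace copy of the usual container_of idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The transport specific code allocates the full-size object. */
struct rdma_xyz *rdma_xyz_ib_alloc(unsigned short dlid)
{
	struct rdma_xyz_ib *ib = malloc(sizeof *ib);

	if (!ib)
		return NULL;
	ib->common.transport = RDMA_TRANSPORT_IB;
	ib->dlid = dlid;
	return &ib->common;		/* generic code sees only this */
}

/* Converting back is a cast in disguise: the tag check is manual. */
unsigned short rdma_xyz_ib_dlid(struct rdma_xyz *xyz)
{
	assert(xyz->transport == RDMA_TRANSPORT_IB);
	return container_of(xyz, struct rdma_xyz_ib, common)->dlid;
}

void rdma_xyz_ib_free(struct rdma_xyz *xyz)
{
	assert(xyz->transport == RDMA_TRANSPORT_IB);
	free(container_of(xyz, struct rdma_xyz_ib, common));
}
```

Nothing in the type system stops a caller from handing an iWARP object to rdma_xyz_ib_dlid(); only the run-time tag check catches it, which is the type-safety concern voiced above.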
Re: [openib-general] OpenSM crash
On Fri, 2005-05-27 at 17:30, Tom Duffy wrote:
> > Also, did you pick up the user_mad.c fix on Tuesday AM ? If it was, any other changes are either not related or trivial.
> >
> > After you picked up these changes, did you regenerate the various OpenSM makefiles (a define for RMPP changed in them) or just rebuild ? [This would not explain the crash, but is different from how my OpenSM is built.]
>
> I just reran make from the toplevel (management) after updating. I would think it would rebuild them if something changed, no?

There are certain changes where the makefiles need to be regenerated (and this is not done automatically). Since there was an additional compile flag added, they need to be regenerated or else it is being built the old way (without the real RMPP support enabled).

-- Hal
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Fri, 27 May 2005, Tom Duffy wrote: On Thu, 2005-05-26 at 22:25 -0700, Sean Hefty wrote: So, here is the strategy I am taking. Please let me know if it is wrong. When dapl_ep_connect() is called, I save off the timeout value into the dapl_ep struct. Then, when we get ready to call ib_send_cm_req(), I stuff the timeout value (after munging it into IB's strange format) into the conn params remote_cm_response_timeout. From a CM perspective, this sounds fine. Note that the CM timeout will not occur until the number of retries has been met. So I don't know if the timeout passed to dapl_ep_connect() should convert directly into the remote_cm_response_timeout, or needs to be divided by the number of retries. So, are you saying that if you have a timeout of 4 seconds (you pass in 20) and you have retries set to 2, that it will fail after 8 seconds? James, what is the timeout value passed into dapl_ep_connect mean, the total timeout time? Or how much for each retry? It is the total timeout value. Also, did you notice that dapl_ib_connect always sets the timeout to 20 (4 seconds) no matter what? Should this be the case? The timeout should not be constant as it is now. It was being unnecessarily emulated with the extra timeout thread. If the connection fails to complete within the timeout, dapl_cm_active_cb_handler() is called with IB_CM_REQ_ERROR which in turn calls dapl_evd_connection_callback() which does the same thing that dapl_ep_timeout() used to do -- tear down the connection. I haven't looked at your changes, but note that calling ib_destroy_cm_id from within the CM callback thread will hang. The callback holds a reference on the cm_id. The good news is that there should be code in kDAPL to catch this. I will take a look and see if this could happen. Tom, I don't believe that you've changed Hal and Sean's implementation of this. 
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
Tom,

We should attempt to connect for no less than dat_ep_connect's timeout value. We don't need to guarantee that the connection attempts will last for exactly a specific time.

Sean,

Is there any way of requesting an infinite number of retries?

On Fri, 27 May 2005, Sean Hefty wrote:
> > > From a CM perspective, this sounds fine. Note that the CM timeout will not occur until the number of retries has been met. So I don't know if the timeout passed to dapl_ep_connect() should convert directly into the remote_cm_response_timeout, or needs to be divided by the number of retries.
> >
> > So, are you saying that if you have a timeout of 4 seconds (you pass in 20) and you have retries set to 2, that it will fail after 8 seconds? James, what does the timeout value passed into dapl_ep_connect mean, the total timeout time? Or how much for each retry?
>
> If you pass in a timeout of 4 seconds with retries set to 2, the call will time out in 12 seconds. The request will be sent 3 times (2 retries). I should also note that the CM timeout includes the packet lifetime (round trip time) in its timeout calculation, but this should be small.
>
> - Sean
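The arithmetic in this exchange can be written down as a small sketch. It assumes the usual IB encoding in which a timeout exponent n stands for 4.096 usec * 2^n (so the value 20 is roughly 4.29 seconds, the "4 sec" in the thread) and a 5-bit field; the function names are hypothetical, not CM API calls, and the packet lifetime Sean mentions is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed IB encoding: exponent n means 4.096 usec * 2^n.
 * In nanoseconds that is 4096 << n.
 */
uint64_t cm_timeout_ns(unsigned int exp)
{
	assert(exp < 32);	/* assumed 5-bit field */
	return (uint64_t)4096 << exp;
}

/*
 * Per the thread: the REQ is sent (retries + 1) times and each send
 * waits one full timeout before the CM reports failure.
 */
uint64_t cm_total_timeout_ns(unsigned int exp, unsigned int retries)
{
	return cm_timeout_ns(exp) * (retries + 1);
}

/*
 * Smallest exponent whose total (all sends included) is no less than
 * the consumer's timeout in microseconds, i.e. "attempt to connect
 * for no less than the timeout value".
 */
unsigned int cm_exp_for_total_us(uint64_t wanted_us, unsigned int retries)
{
	unsigned int exp = 0;

	while (exp < 31 && cm_total_timeout_ns(exp, retries) < wanted_us * 1000)
		exp++;
	return exp;
}
```

With an exponent of 20 and 2 retries the total is three times 4.29 seconds, which matches the "12 seconds" above. Asking for a 4 second total with 2 retries yields an exponent of 19 (about 2.1 seconds per send), so the attempt lasts at least as long as requested rather than exactly as long.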
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
Exactly, the code matters from the Linux community standpoint, and the discussion around the convergence of a common PI is moot until we have that header file definition, which will come out soon. However, I am quite glad to see the OpenIB and OpenRDMA communities in agreement on common ULPs and DAPL/IT-API (even though there are some disagreements on these APIs).

Also, as you pointed out, I absolutely agree about the differences between Gen1 and Gen2, which is exactly what I wanted to avoid with OpenRDMA. I would rather start from a clean slate right from the beginning in open source fashion - basically, I don't want the code to be dumped by some corporate developers.

Thanks
Venkat

[EMAIL PROTECTED] wrote on 05/31/2005 09:38:51 AM:
> On Sat, May 28, 2005 at 04:26:43PM -0700, Caitlin Bestler wrote:
> > ... if so what the best strategy for achieving it is (try to plan an IB/iWARP merge immediately or wait until there is an iWARP code base).
>
> If there is no iWARP code base, I fail to see how one can merge.
>
> Having a specification is one basis for communication. Linux developers normally use existing code as the basis. Committees submit CRs (Change Requests) to update specs. The CRs get voted on by the committee. Linux developers submit patches. The Linux subsystem maintainer(s) decide if patches are ok or not.
>
> > Claiming that an InfiniBand-specific interface is somehow thinking long term is just plain ludicrous.
>
> "It Works" is worth 10x more to *any* customer than a transport neutral API that only exists as a spec. The specs are guides to how something *should* work and linux tries to comply with them (e.g. 802.3 or T10) where HW implementations actually follow the spec. That doesn't mean linux has to implement every brain damaged spec that some committee comes up with. OTOH, rdmaconsortium.org does have a fair shot given I2O made it into the kernel. :^/
>
> (I'm willing to have a conversation about why I think I2O is brain damaged if someone else is buying drinks. It's not total crap, but it certainly has its downside.)
>
> > Now it may be that the short term interest of the InfiniBand vendors is such that they cannot commit resources to helping build a transport neutral API. That is always a legitimate tradeoff, but it is short term corporate thinking.
>
> Please, that horse is already dead. They have offered to review patches to make the API transport neutral. Test that offer. Submit patches and move the conversation on to something that is more constructive.
>
> > Last time I looked most of the commits being made to OpenIB (or sourceforge DAPL) were from people drawing paychecks from those evil corporations.
>
> Yes, so? The issue isn't the funding - it's the goals. Compare the gen1 stack (I'm being careful to not pick on any IB vendors) to the gen2 stack. The difference is between corporate code and linux code - mostly funded by the same corporation with several of the same programmers. The gen1 stack came from something that attempted to build/run a shared user/kernel space on every distro. The Makefiles are just a mess - never mind the code.
>
> grant

___ Rdma-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/rdma-developers
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 2005-05-31 at 13:27, James Lentini wrote:
> > James, what does the timeout value passed into dapl_ep_connect mean, the total timeout time? Or how much for each retry?
>
> It is the total timeout value.

Total meaning everything inclusive ? If that is what it is supposed to be, that is not what is implemented now:

DAPL_IB_CM_RESPONSE_TIMEOUT 20 /* 4 sec */
DAPL_IB_MAX_CM_RETRIES 4

There are also the timeout/retries of IBAT:

DAPL_IB_MAX_AT_RETRY 3
IB_AT_REQ_RETRY_MS 100

-- Hal
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
Here's the specification's exact description:

timeout: Duration of time, in microseconds, that a consumer waits for connection establishment. The value of DAT_TIMEOUT_INFINITE represents no timeout, indefinite wait. Values must be positive.

My perspective is that we are not implementing this API for a real time operating system and therefore should take a fuzzy view of time.

My interpretation of the definition above is that a provider should attempt to establish a connection for at least [timeout] time.

If a connection is not established after attempting for at least [timeout] time, the provider should give up and post a connection failure event. If there is some reasonable additional time needed for address resolution, etc., I think that is acceptable.

james

On Tue, 31 May 2005, Hal Rosenstock wrote:
> On Tue, 2005-05-31 at 13:27, James Lentini wrote:
> > > James, what does the timeout value passed into dapl_ep_connect mean, the total timeout time? Or how much for each retry?
> >
> > It is the total timeout value.
>
> Total meaning everything inclusive ? If that is what it is supposed to be, that is not what is implemented now:
>
> DAPL_IB_CM_RESPONSE_TIMEOUT 20 /* 4 sec */
> DAPL_IB_MAX_CM_RETRIES 4
>
> There are also the timeout/retries of IBAT:
>
> DAPL_IB_MAX_AT_RETRY 3
> IB_AT_REQ_RETRY_MS 100
>
> -- Hal
RE: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Fri, 27 May 2005, Bob Woodruff wrote:
> Caitlin wrote,
> > Both uDAPL and kDAPL were designed for *application* use. Even kDAPL is more intended for use by a kernel daemon that is loaded separately from the kernel than for use within the kernel itself.
>
> kDAPL is intended as a kernel-level API for RDMA enabled fabrics. As it was initially written, it does not meet the Linux coding style and that is why it is being totally reworked as we speak to meet that goal.
>
> > An ideal API for use within the kernel would abstract as much as possible (without requiring emulation), and then have transport specific unions or enum values. It would hide no control options, merely provide common controls for common capabilities.
>
> So for every new RDMA device type that comes along, you need to add a new enum, and unions for device class specific stuff, etc. Seems rather static and not easily extended. Not to mention the testing nightmare when the thing has to support 20 different types of RDMA enabled devices. I think code like that could get pretty ugly pretty fast.
>
> I'd rather see a registration mechanism like what we already have with DAPL that does not require any code changes to add a new RDMA device/provider. We have already proven that this works in DAPL, as I know of at least 3 providers, IB, Myrinet, and RNIC (Ammasso), that were developed separately and were able to co-exist without any changes (enums and device class unions) in the DAT mid-layer. I assume that this can also be done with kDAPL in the kernel, but I defer to the DAPL experts to answer that one.

Correct. The DAT API (kernel and user) is designed to support heterogeneous providers. The modifications we are making in https://openib.org/svn/gen2/users/jlentini/ will not change that.

james
[openib-general] [PATCH] [kdapl CM] Add more debug on connection destruction
[kdapl CM] Add more debug on connection destruction Also, make naming of retry defines consistent Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] Index: dapl_openib_cm.c === --- dapl_openib_cm.c(revision 2507) +++ dapl_openib_cm.c(working copy) @@ -42,7 +42,7 @@ #define DAPL_IB_RNR_RETRY_COUNT 6 #define DAPL_IB_CM_RESPONSE_TIMEOUT 20 /* 4 sec */ #define DAPL_IB_MAX_CM_RETRIES 4 -#define DAPL_IB_MAX_AT_RETRY3 +#define DAPL_IB_MAX_AT_RETRIES 3 /* Should these be queried ? */ #define DAPL_IB_TARGET_MAX 4 /* responder resources (max_qp_ous_rd_atom) */ @@ -65,6 +65,9 @@ spin_unlock_irqrestore(conn-lock, flags); if (!in_callback) { + dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_destroy_cm_id: conn %p CM ID %p\n, +conn, conn-cm_id); ib_destroy_cm_id(conn-cm_id); if (conn-ep) conn-ep-cm_handle = NULL; @@ -297,7 +300,7 @@ if (rec_num = 0) { printk(KERN_ERR dapl_path_comp_handler: path resolution failed %d retry %d!!!\n, rec_num, conn-retries + 1); - if (++conn-retries DAPL_IB_MAX_AT_RETRY) { + if (++conn-retries DAPL_IB_MAX_AT_RETRIES) { printk(KERN_ERR dapl_path_comp_handler: ep_ptr 0x%p\n, conn-ep); event = DAT_CONNECTION_EVENT_UNREACHABLE; @@ -346,7 +349,7 @@ if (rec_num = 0) { printk(KERN_ERR dapl_rt_comp_handler: rec num %d retry %d\n, rec_num, conn-retries + 1); - if (++conn-retries DAPL_IB_MAX_AT_RETRY) { + if (++conn-retries DAPL_IB_MAX_AT_RETRIES) { event = DAT_CONNECTION_EVENT_UNREACHABLE; goto error; } @@ -580,6 +583,9 @@ struct dapl_ia *ia_ptr; int ib_status; + dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_ib_reinit_ep: EP %p\n, ep_ptr); + ia_ptr = ep_ptr-header.owner_ia; /* @@ -671,6 +677,10 @@ */ u32 dapl_ib_remove_conn_listener(struct dapl_ia *ia_ptr, struct dapl_sp *sp_ptr) { +dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_ib_remove_conn_listener: SP %p conn %p\n, +sp_ptr, sp_ptr-cm_srvc_handle); + /* * This will hang if called from CM thread context... * Move back to using WQ... 
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 31 May 2005, Hal Rosenstock wrote:
> On Tue, 2005-05-31 at 14:17, James Lentini wrote:
> > Here's the specification's exact description:
> >
> > timeout: Duration of time, in microseconds, that a consumer waits for connection establishment. The value of DAT_TIMEOUT_INFINITE represents no timeout, indefinite wait. Values must be positive.
> >
> > My perspective is that we are not implementing this API for a real time operating system and therefore should take a fuzzy view of time.
>
> Fuzzy in that we are certainly not concerned with the granularity of microseconds.
>
> > My interpretation of the definition above is that a provider should attempt to establish a connection for at least [timeout] time.
>
> So any number of retries is allowed up to the time period specified (depending on the timeout used) ?

Correct, any number of retries (including 0) is allowed. Once the time period expires, the provider should post a result as quickly as possible.

> > If a connection is not established after attempting for at least [timeout] time, the provider should give up and post a connection failure event. If there is some reasonable additional time needed for address resolution, etc., I think that is acceptable.
>
> This all can be bundled in. One just needs to know what the requirement is.

If we included address resolution, how would we divide up the time between address resolution and cm protocol? Wouldn't we have to track how long address resolution took to complete?

> -- Hal
>
> > james
> >
> > On Tue, 31 May 2005, Hal Rosenstock wrote:
> > > On Tue, 2005-05-31 at 13:27, James Lentini wrote:
> > > > > James, what does the timeout value passed into dapl_ep_connect mean, the total timeout time? Or how much for each retry?
> > > >
> > > > It is the total timeout value.
> > >
> > > Total meaning everything inclusive ? If that is what it is supposed to be, that is not what is implemented now:
> > >
> > > DAPL_IB_CM_RESPONSE_TIMEOUT 20 /* 4 sec */
> > > DAPL_IB_MAX_CM_RETRIES 4
> > >
> > > There are also the timeout/retries of IBAT:
> > >
> > > DAPL_IB_MAX_AT_RETRY 3
> > > IB_AT_REQ_RETRY_MS 100
> > >
> > > -- Hal
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 2005-05-31 at 15:57, James Lentini wrote:
> If we included address resolution, how would we divide up the time between address resolution and cm protocol? Wouldn't we have to track how long address resolution took to complete?

Yes, to follow the requirement closely, one would need to time the duration of the address translation, but that is pretty straightforward to do. IBAT already has to time out requests anyway. The worst case for address resolution is currently 4 * 100 msec.

Other alternatives are to subtract the maximal address translation time from the time supplied and use the rest for CM, or, as you said, ignore this time and use it all for CM purposes (and just go over by whatever amount this is).

Did other implementations factor this in or did they ignore this ?

-- Hal
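The alternatives listed in this message can be compared with a short sketch. The two constants mirror IB_AT_REQ_RETRY_MS and DAPL_IB_MAX_AT_RETRY as quoted in the thread; the function names are hypothetical and this is not how kDAPL actually budgets time.

```c
#include <assert.h>
#include <stdint.h>

#define AT_RETRY_MS	100	/* IB_AT_REQ_RETRY_MS as quoted in the thread */
#define AT_MAX_RETRY	3	/* DAPL_IB_MAX_AT_RETRY as quoted in the thread */

/* Worst case for address translation: first try plus retries, "4 * 100 msec". */
uint64_t at_worst_case_us(void)
{
	return (uint64_t)(AT_MAX_RETRY + 1) * AT_RETRY_MS * 1000;
}

/*
 * Alternative 1: time the address translation and give the CM whatever
 * is left of the consumer's timeout (microseconds, must be positive).
 */
uint64_t cm_budget_measured_us(uint64_t total_us, uint64_t at_elapsed_us)
{
	assert(total_us > 0);
	return at_elapsed_us >= total_us ? 0 : total_us - at_elapsed_us;
}

/*
 * Alternative 2: reserve the worst-case translation time up front and
 * give the CM the rest, without measuring anything.
 */
uint64_t cm_budget_reserved_us(uint64_t total_us)
{
	return cm_budget_measured_us(total_us, at_worst_case_us());
}
```

The third alternative, ignoring translation time, simply hands total_us to the CM unchanged and overshoots by at most at_worst_case_us(). For a 4 second timeout, measuring a 150 msec translation leaves 3.85 seconds for the CM, while reserving the worst case leaves 3.6 seconds.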
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 31 May 2005, Hal Rosenstock wrote:
> On Tue, 2005-05-31 at 15:57, James Lentini wrote:
> > If we included address resolution, how would we divide up the time between address resolution and cm protocol? Wouldn't we have to track how long address resolution took to complete?
>
> Yes, to follow the requirement closely, one would need to time the duration of the address translation, but that is pretty straightforward to do. IBAT already has to time out requests anyway. The worst case for address resolution is currently 4 * 100 msec.

If we can account for all of the time properly, then we should implement it that way.

> Other alternatives are to subtract the maximal address translation time from the time supplied and use the rest for CM, or, as you said, ignore this time and use it all for CM purposes (and just go over by whatever amount this is).
>
> Did other implementations factor this in or did they ignore this ?
Re: [openib-general] OpenSM crash
On Tue, 2005-05-31 at 13:09 -0400, Hal Rosenstock wrote:
> There are certain changes where the makefiles need to be regenerated (and this is not done automatically). Since there was an additional compile flag added, they need to be regenerated or else it is being built the old way (without the real RMPP support enabled).

$ make automake

at the toplevel should take care of this, no?

-tduffy
RE: [openib-general] Re: [PATCH] kDAPL: remove typedef DAT_CONTEXT
On Tue, 31 May 2005, Itamar Rabenstein wrote:
> hi all, I have tried to use DAT_UPCALL_NULL and I got compile error and I don't think that it is good to try to make comparator ( = ) between structs if we want to check for DAT_UPCALL_NULL we need to check that the CB function pointer is NULL.

Where is DAT_UPCALL_NULL used in a comparison?

> I mean that if you want to use DAT_UPCALL_NULL you need to have real dat_upcall struct and to set the CB function to NULL. instead of casting NULL to be a struct.

DAT_UPCALL_NULL is not NULL cast to a struct. It is defined as:

#define DAT_UPCALL_NULL \
	((struct dat_upcall_object) { (void *) NULL, (DAT_UPCALL_FUNC) NULL })

> Currently dat_evd_modify_upcall() is not implemented according the spec.
>
> Itamar
>
> -Original Message-
> From: James Lentini [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 31, 2005 6:39 PM
> To: Tom Duffy
> Cc: openib-general@openib.org
> Subject: [openib-general] Re: [PATCH] kDAPL: remove typedef DAT_CONTEXT
>
> Mostly committed in revision 2515. I didn't remove DAT_UPCALL_NULL and DAT_UPCALL_SAME. DAT_UPCALL_NULL is provided as a convenience to the consumer. I think it is useful, but I'm willing to hear other opinions. The provider's implementation of dat_evd_modify_upcall() should check for the DAT_UPCALL_SAME value. The fact that it doesn't is a bug.
>
> james
>
> On Fri, 27 May 2005, Tom Duffy wrote:

tduffy Get rid of the typedef DAT_CONTEXT.
tduffy tduffy Signed-off-by: Tom Duffy [EMAIL PROTECTED] tduffy tduffy Index: linux-kernel/test/dapltest/include/dapl_common.h tduffy === tduffy --- linux-kernel/test/dapltest/include/dapl_common.h (revision 2506) tduffy +++ linux-kernel/test/dapltest/include/dapl_common.h (working copy) tduffy @@ -42,7 +42,7 @@ typedef enum tduffy typedef struct tduffy { tduffy DAT_RMR_CONTEXT rmr_context; tduffy -DAT_CONTEXT mem_address; tduffy +union dat_context mem_address; tduffy } RemoteMemoryInfo; tduffy #pragma pack() tduffy tduffy Index: linux-kernel/dat-provider/dapl_get_consumer_context.c tduffy === tduffy --- linux-kernel/dat-provider/dapl_get_consumer_context.c (revision 2506) tduffy +++ linux-kernel/dat-provider/dapl_get_consumer_context.c (working copy) tduffy @@ -48,7 +48,7 @@ tduffy * DAT_SUCCESS tduffy * DAT_INVALID_PARAMETER tduffy */ tduffy -u32 dapl_get_consumer_context(DAT_HANDLE dat_handle, DAT_CONTEXT *context) tduffy +u32 dapl_get_consumer_context(DAT_HANDLE dat_handle, union dat_context *context) tduffy { tduffy u32 dat_status = DAT_SUCCESS; tduffy struct dapl_header *header; tduffy Index: linux-kernel/dat-provider/dapl_set_consumer_context.c tduffy === tduffy --- linux-kernel/dat-provider/dapl_set_consumer_context.c (revision 2506) tduffy +++ linux-kernel/dat-provider/dapl_set_consumer_context.c (working copy) tduffy @@ -47,7 +47,7 @@ tduffy * DAT_SUCCESS tduffy * DAT_INVALID_HANDLE tduffy */ tduffy -u32 dapl_set_consumer_context(DAT_HANDLE dat_handle, DAT_CONTEXT context) tduffy +u32 dapl_set_consumer_context(DAT_HANDLE dat_handle, union dat_context context) tduffy { tduffy u32 dat_status = DAT_SUCCESS; tduffy struct dapl_header *header; tduffy Index: linux-kernel/dat-provider/dapl.h tduffy === tduffy --- linux-kernel/dat-provider/dapl.h (revision 2506) tduffy +++ linux-kernel/dat-provider/dapl.h (working copy) tduffy @@ -177,7 +177,7 @@ struct dapl_header { tduffy enum dat_handle_type handle_type; tduffy struct dapl_ia *owner_ia; tduffy struct 
dapl_llist_entry ia_list_entry; tduffy -DAT_CONTEXT user_context; /* user context - opaque to DAPL */ tduffy +union dat_context user_context; /* user context - opaque to DAPL */ tduffy spinlock_t lock; tduffy unsigned long flags; /* saved lock flag values */ tduffy }; tduffy @@ -423,9 +423,11 @@ extern u32 dapl_ia_query(DAT_IA_HANDLE, tduffy tduffy /* helper functions */ tduffy tduffy -extern u32 dapl_set_consumer_context(DAT_HANDLE handle, DAT_CONTEXT context); tduffy +extern u32 dapl_set_consumer_context(DAT_HANDLE handle, tduffy + union dat_context context); tduffy tduffy -extern u32 dapl_get_consumer_context(DAT_HANDLE handle, DAT_CONTEXT *context); tduffy +extern u32 dapl_get_consumer_context(DAT_HANDLE handle, tduffy + union dat_context *context); tduffy tduffy extern u32 dapl_get_handle_type(DAT_HANDLE handle, tduffy enum dat_handle_type *type); tduffy Index: linux-kernel/dat/dat.h tduffy === tduffy --- linux-kernel/dat/dat.h (revision 2506) tduffy +++ linux-kernel/dat/dat.h (working
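The DAT_UPCALL_NULL question raised at the top of this message can be illustrated with a small standalone sketch. The macro is copied from the message; everything else is an assumption made so the example compiles on its own: the field names of struct dat_upcall_object, the simplified callback signature, and the helper dat_upcall_is_null() are hypothetical and not part of the DAT headers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: field names and callback signature are assumed. */
typedef void (*DAT_UPCALL_FUNC)(void *instance_data);

struct dat_upcall_object {
	void *instance_data;
	DAT_UPCALL_FUNC upcall_func;
};

/* As quoted in the thread: a compound literal, not NULL cast to a struct. */
#define DAT_UPCALL_NULL \
	((struct dat_upcall_object) { (void *) NULL, (DAT_UPCALL_FUNC) NULL })

/*
 * C has no == for structs, so "upcall == DAT_UPCALL_NULL" does not
 * compile. Testing the callback pointer is one way to ask the same
 * question (hypothetical helper).
 */
int dat_upcall_is_null(const struct dat_upcall_object *upcall)
{
	assert(upcall != NULL);
	return upcall->upcall_func == NULL;
}

/* Dummy callback used only to build a non-null upcall object. */
void sample_upcall(void *instance_data)
{
	(void)instance_data;
}
```

Used as an initializer or assignment source, as in "struct dat_upcall_object none = DAT_UPCALL_NULL;", the macro works as a consumer convenience; it is only the struct comparison that the language rejects.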
Re: [openib-general] OpenSM crash
On Tue, 2005-05-31 at 16:43, Tom Duffy wrote:
> On Tue, 2005-05-31 at 13:09 -0400, Hal Rosenstock wrote:
> > There are certain changes where the makefiles need to be regenerated (and this is not done automatically). Since there was an additional compile flag added, they need to be regenerated or else it is being built the old way (without the real RMPP support enabled).
>
> $ make automake
>
> at the toplevel should take care of this, no?

Yes.

-- Hal
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
At 06:47 AM 5/28/2005, Christoph Hellwig wrote:
> On Sat, May 28, 2005 at 05:17:54AM -0700, Sukanta ganguly wrote:
> > That's a pretty bold statement. Linux grew up to be popular via mass acceptance. Seems like that charter has changed and a few have control over Linux and its future. The "My way or the highway" philosophy has gotten embedded in the Linux way of life. Life is getting tough.
>
> You're totally missing the point. Linux is successful exactly because it's looking for the right solution, not something the business people need short-term.

Hence why some of us contend that the end-game, i.e. the right solution, is not necessarily the short-term implementation that is present today that just evolves creating that legacy inertia that I wrote about earlier. I think there is validity to having an implementation to critique - accept, reject, modify. I think there is validity to examining industry standards as the basis for new work / implementation. If people are unwilling to discuss these standards and only stay focused on their business people's short-term needs, then some might contend as above that Linux is evolving to be much like the dreaded Pacific NW company in the end.

Not intending to offend anyone but if there can be no debate without implementation on what is the right solution, then people might as well just go off and implement and propose their solution for incorporation into the Linux kernel. It may be that OpenIB wins in the end or it may be that it does not. Just having OpenIB subsume control of anything iWARP or impose only DAPL for all RDMA infrastructure because it just happens to be there today seems rather stifling. Just stating that some OpenIB steering group is somehow empowered to decide this for Linux is also rather strange. Open source is about being open and not under the control of any one entity in the end. Perhaps that is no longer the case.

Mike
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
> Sean, Is there any way of requesting an infinite number of retries?

There is not, but nothing prevents a user from simply re-issuing a request after it times out.

- Sean
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 2005-05-31 at 14:17 -0400, James Lentini wrote:
> Here's the specification's exact description:
>
>   timeout: Duration of time, in microseconds, that a consumer waits for connection establishment. The value of DAT_TIMEOUT_INFINITE represents no timeout, indefinite wait. Values must be positive.

Let me make sure I got this right: timeout is in µs (10^-6 seconds), not ms (10^-3 seconds). If so, I am off by 3 orders of magnitude in my calculation. Right?

> My perspective is that we are not implementing this API for a real time operating system and therefore should take a fuzzy view of time.

Trust me, it is going to be fuzzy, what with the mechanism IB uses to encode timeouts.

BTW, what do you think would be a good test case to make sure the new code is working as intended?

-tduffy
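The "fuzzy" encoding mentioned above can be illustrated with a minimal sketch. It assumes the usual IB convention that a timeout field holds an exponent n meaning 4.096 µs × 2^n, and that the field is 5 bits wide. The function name and constants are hypothetical and are not taken from kDAPL or the CM sources, and DAT_TIMEOUT_INFINITE is deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

#define IB_TIMEOUT_BASE_NS 4096u	/* 4.096 us expressed in nanoseconds */
#define IB_TIMEOUT_MAX_EXP 31u		/* assumes a 5-bit exponent field */

/* Hypothetical helper: smallest exponent whose encoded time covers timeout_us. */
static uint8_t timeout_us_to_ib_exp(uint64_t timeout_us)
{
	uint64_t target_ns;
	uint64_t encoded_ns = IB_TIMEOUT_BASE_NS;
	uint8_t exp = 0;

	assert(timeout_us > 0);	/* the quoted spec text says values must be positive */

	/* Guard the multiplication below against overflow. */
	if (timeout_us > UINT64_MAX / 1000)
		return IB_TIMEOUT_MAX_EXP;
	target_ns = timeout_us * 1000;

	/* Double the encoded time until it is at least the requested time. */
	while (encoded_ns < target_ns && exp < IB_TIMEOUT_MAX_EXP) {
		encoded_ns <<= 1;
		exp++;
	}
	return exp;
}
```

Under these assumptions a 1 second request (1000000 µs) maps to exponent 18, which encodes roughly 1.07 seconds. Because each step doubles the interval, the encoded value can overshoot the request by almost a factor of two.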
[openib-general] user CM uses devfs
Why does the userlevel CM use devfs to create device nodes? Userlevel verbs and mad layers appear to rely on udev.

--
Bill Jordan
SilverStorm Technologies
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
On Tue, 2005-05-31 at 14:34 -0700, Sean Hefty wrote:
> > Sean, Is there any way of requesting an infinite number of retries?
>
> There is not, but nothing prevents a user from simply re-issuing a request after it times out.

Infinite retries inside the kernel does not sound like a good idea. How would you break it? At least we should have some sort of exponential backoff to prevent flooding the network.

-tduffy
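The exponential backoff suggested above could look roughly like the following. This is only a sketch of the general technique for a caller that re-issues a request after each timeout. The function name, the millisecond units, and the cap are assumptions, not anything from the kDAPL patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: delay to wait before re-issuing a request.
 * attempt is 0 for the first retry; the delay doubles each time
 * and is clamped to max_ms so it never grows without bound.
 */
static uint64_t backoff_delay_ms(uint64_t base_ms, uint64_t max_ms,
				 unsigned int attempt)
{
	uint64_t delay = base_ms;
	unsigned int i;

	assert(base_ms > 0 && base_ms <= max_ms);

	for (i = 0; i < attempt && delay < max_ms; i++) {
		/* Doubling would exceed the cap, so clamp and stop. */
		if (delay > max_ms / 2) {
			delay = max_ms;
			break;
		}
		delay *= 2;
	}
	return delay;
}
```

With a 100 ms base and a 5000 ms cap, successive retries would wait 100, 200, 400, 800, 1600, 3200 and then 5000 ms. A caller would still need its own way to give up, which is the "how would you break it?" concern.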
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Tue, May 31, 2005 at 02:31:19PM -0700, Michael Krause wrote:
...
> Not intending to offend anyone but if there can be no debate without implementation on what is the right solution, then people might as well just go off and implement and propose their solution for incorporation into the Linux kernel.

That is certainly one option. I didn't see anyone in openib.org trying to take that choice away. Is it easier to submit a new subsystem than fixup an existing one? I honestly don't know the answer since both options could fail depending on how people approach them. But my gut feeling is if rdmaconsortium can't play nicely with openib.org, they won't be able to play nicely with kernel.org either.

I've been advocating rdmaconsortium folks submit patches against openib.org for several reasons:
1) start with a code base that works
2) start with a code base that is already upstream
3) get advice/guidance from people who know how to collaborate in an open source environment.

I thought (2) was the most important...but now I have to wonder if it's really (3). Several very good people are driving openib.org development.

> Just having OpenIB subsume control of anything iWARP or impose only DAPL for all RDMA infrastructure because it just happens to be there today seems rather stifling. Just stating that some OpenIB steering group is somehow empowered to decide this for Linux is also rather strange.

"steering group" is Committee talk. AFAICT the openib.org steering group doesn't control the content of the svn.openib.org source tree. It manages things like web content, overall charter, etc. People do NOT have to be members of the steering committee or openib.org to become either maintainers or to submit code.

> Open source is about being open and not under the control of any one entity in the end. Perhaps that is no longer the case.

No. SOME entity always controls what goes in (or not) any given source tree. That has nothing to do with open source. Open source is about collaboration and being able to fork if that collaboration ceases to be useful. One can substitute "trust" for the word "collaboration" and it would be accurate too. Figure out how to build trust (without contracts!) and then how to get things done in open source becomes clear.

hth,
grant
Re: [openib-general] user CM uses devfs
On Tue, May 31, 2005 at 05:54:38PM -0400, William Jordan wrote:
> Why does the userlevel CM use devfs to create device nodes? Userlevel verbs and mad layers appear to rely on udev.

No reason except that I went with what I thought was the simpler model at the time. Unlike the verbs and mad layers, which need a certain number of device nodes for every physical device installed including hot-plug support, the userlevel CM needs just one device node for communicating with the kernel CM, which should be present the entire time the kernel CM is loaded.

-Libor
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
I completely agree. I think this thread was not started to get one of the projects out of the way of the other. I would think it was started to coordinate the development cycles of two related projects, where one project is admittedly much more advanced, simply because IB technology has been available for years now.

I also think that it is not yet proven that good software must always get created by writing something down and then reshaping it to meet upcoming requirements. OpenRDMA has chosen to start by trying to identify the main requirements and then agreeing upon an appropriate architecture. We did not start with discussing the style of commentary lines, because it was assumed that the style of commentary lines is less important and even easier to fix.

Enabling iWARP under Linux is not an easy task and we are dependent on the open source community's help and support to make this happen. We are neither in a position nor willing to bypass this procedure - and it's always good to have fruitful discussion.

Bernard.

[EMAIL PROTECTED] wrote on 31.05.2005 23:31:19:
> At 06:47 AM 5/28/2005, Christoph Hellwig wrote:
> > On Sat, May 28, 2005 at 05:17:54AM -0700, Sukanta ganguly wrote:
> > > That's a pretty bold statement. Linux grew up to be popular via mass acceptance. Seems like that charter has changed and a few have control over Linux and its future. The "My way or the highway" philosophy has gotten embedded in the Linux way of life. Life is getting tough.
> >
> > You're totally missing the point. Linux is successful exactly because it's looking for the right solution, not something the business people need short-term.
>
> Hence why some of us contend that the end-game, i.e. the right solution, is not necessarily the short-term implementation that is present today that just evolves creating that legacy inertia that I wrote about earlier. I think there is validity to having an implementation to critique - accept, reject, modify. I think there is validity to examining industry standards as the basis for new work / implementation. If people are unwilling to discuss these standards and only stay focused on their business people's short-term needs, then some might contend as above that Linux is evolving to be much like the dreaded Pacific NW company in the end.
>
> Not intending to offend anyone but if there can be no debate without implementation on what is the right solution, then people might as well just go off and implement and propose their solution for incorporation into the Linux kernel. It may be that OpenIB wins in the end or it may be that it does not. Just having OpenIB subsume control of anything iWARP or impose only DAPL for all RDMA infrastructure because it just happens to be there today seems rather stifling. Just stating that some OpenIB steering group is somehow empowered to decide this for Linux is also rather strange. Open source is about being open and not under the control of any one entity in the end. Perhaps that is no longer the case.
>
> Mike
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Tue, May 31, 2005 at 02:03:06PM -0700, Tom Duffy wrote:
> On Sat, 2005-05-28 at 09:13 +0200, Christoph Hellwig wrote:
> > On Fri, May 27, 2005 at 03:56:58PM -0700, Bob Woodruff wrote:
> > > kDAPL is intended as a kernel-level API for RDMA enabled fabrics. As it was initially written, it does not meet the Linux coding style and that is why it is being totally reworked as we speak to meet that goal.
> >
> > The codingstyle alone isn't the problem. The whole design philosophy is rather odd.
>
> As one of the people trying to clean up kDAPL, I would like to know what you think, from a design philosophy, is wrong with it. We *can* correct any daim bramaged parts.

Well, from a kernel API design philosophy the evd is somewhat odd. The whole idea behind the event model seems a bit convoluted. First multiplex a wide variety of events from the provider into a single event queue, and then have an API so the consumer can tell what type of event they actually have and can still receive the event notification in the provider's context. This seems to be a lot of work to first hide useful information, but also not lose the information in case the consumer really does want it. It appears to be a case of a decent userspace idea that doesn't make much sense in the kernel.

Why is it there? I imagine it's to abstract a variety of OS kernels, which was one of the goals of the design.

Also, I realize it's just an implementation detail, but I've got a number of issues with ATS.

-Libor
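To make the pattern being criticized concrete, here is a minimal sketch of a single queue that multiplexes several event kinds behind a type tag, leaving the consumer to demultiplex them again. Every name in it is hypothetical. It is not the DAT or kDAPL event dispatcher API, only an illustration of the shape of the design.

```c
#include <assert.h>
#include <stddef.h>

/* All names are hypothetical; none come from the DAT/kDAPL headers. */
enum ev_type { EV_CONNECTION, EV_DTO_COMPLETION, EV_ASYNC_ERROR };

struct event {
	enum ev_type type;	/* tag the consumer must inspect after dequeuing */
	union {
		int conn_id;		/* EV_CONNECTION */
		unsigned long cookie;	/* EV_DTO_COMPLETION */
		int error_code;		/* EV_ASYNC_ERROR */
	} u;
};

#define EVQ_DEPTH 8

/* One ring buffer shared by every kind of provider event. */
struct event_queue {
	struct event slots[EVQ_DEPTH];
	size_t head;
	size_t count;
};

/* Provider side: post any event kind; returns -1 if the queue is full. */
static int evq_post(struct event_queue *q, struct event ev)
{
	assert(q != NULL);
	if (q->count == EVQ_DEPTH)
		return -1;
	q->slots[(q->head + q->count) % EVQ_DEPTH] = ev;
	q->count++;
	return 0;
}

/* Consumer side: dequeue the oldest event; returns -1 if the queue is empty. */
static int evq_poll(struct event_queue *q, struct event *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return -1;
	*out = q->slots[q->head];
	q->head = (q->head + 1) % EVQ_DEPTH;
	q->count--;
	return 0;
}
```

A consumer would typically follow evq_poll() with a switch on out.type, which is the step that recovers information the provider already had when it posted the event.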
Re: [openib-general] user CM uses devfs
    William> Why does the userlevel CM use devfs to create device
    William> nodes? Userlevel verbs and mad layers appear to rely on
    William> udev.

Good point. devfs is dying a richly deserved death in a month (cf Documentation/feature-removal.txt) -- we should just use the standard character device stuff and let udev handle things.

 - R.
[openib-general] RE: [ANNOUNCE][PATCH] New Linux 2.6.9 backport patches and corresponding userspace tar ball available
Michael wrote:
> > Patches are located in the SVN tree under gen2/trunk/src/linux-kernel/patches/backport-to-2.6.9/
> >   infiniband-backport-svn2425-to-2.6.9-kernel-fixups-01.diff
> >   infiniband-backport-svn2425-to-2.6.9-openib-drivers-02.diff
> >   infiniband-backport-svn2425-to-2.6.9-openib-fixups-03.diff
> >   infiniband-backport-svn2425-userspace.tar.gz
> >
> > woody
>
> Woody, could you please move these patches to gen2/branches?
>
> --
> MST - Michael S. Tsirkin

What do you guys think, would these be better kept under gen2/branches or where I have put them under linux-kernel/patches? I can see arguments both ways.

woody
RE: [openib-general] [PATCHv2][RFC] kDAPL: use cm timers instead of own
> On Tue, 2005-05-31 at 14:34 -0700, Sean Hefty wrote:
> > > Sean, Is there any way of requesting an infinite number of retries?
> >
> > There is not, but nothing prevents a user from simply re-issuing a request after it times out.
>
> Infinite retries inside the kernel does not sound like a good idea. How would you break it? At least we should have some sort of exponential backoff to prevent flooding the network.

To be a little more clear: the CM protocol uses 4 bits for its number of retries, with a linear timeout. What an app does above that is undefined.

- Sean
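As a rough worked example of what a 4-bit retry count implies, the sketch below bounds how long the CM itself could keep trying. It assumes that "linear timeout" means every attempt waits the same interval, which is an interpretation rather than something stated in the thread. The function name and millisecond units are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CM_MAX_RETRIES 15u	/* largest value a 4-bit field can hold */

/*
 * Hypothetical helper: upper bound on the time spent before the CM
 * gives up, counting the initial send plus every retry.
 */
static uint64_t cm_worst_case_wait_ms(uint64_t per_try_timeout_ms,
				      unsigned int retries)
{
	assert(per_try_timeout_ms <= UINT64_MAX / (CM_MAX_RETRIES + 1));

	/* A request for more retries than the field can hold is clamped. */
	if (retries > CM_MAX_RETRIES)
		retries = CM_MAX_RETRIES;
	return (uint64_t)(retries + 1) * per_try_timeout_ms;
}
```

Under that assumption a 1000 ms per-attempt timeout with the maximum of 15 retries gives at most 16000 ms. Anything longer has to come from the application re-issuing the request.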
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
> I've been advocating rdmaconsortium folks submit patches against openib.org for several reasons:

Probably, you meant the openrdma.org opensource project and not a standards setting body (i.e. RDMA consortium - http://www.rdmaconsortium.org/home) :)

> 1) start with a code base that works
> 2) start with a code base that is already upstream
> 3) get advice/guidance from people who know how to collaborate in an open source environment.
>
> I thought (2) was the most important...but now I have to wonder if it's really (3).

You are mistaken. I know people in the OpenRDMA community have worked with the opensource projects before and they know how to play and collaborate in an open source environment. The early part of the work in openrdma is in fact a true example of that effort (which you may disagree with, but having worked with several other opensource projects and with OpenIB, we have solved the issues which other projects including OpenIB have faced), and the next phase of work is of course the code development, a key aspect of the broader community effort.

I think we are diverging from the real issue - the fundamental differences in the views of each community on how we can solve this common problem of supporting multiple RDMA fabrics, which is what we need to focus on.

> > Just having OpenIB subsume control of anything iWARP or impose only DAPL for all RDMA infrastructure because it just happens to be there today seems rather stifling. Just stating that some OpenIB steering group is somehow empowered to decide this for Linux is also rather strange.
>
> AFAICT the openib.org steering group doesn't control the content of the svn.openib.org source tree. It manages things like web content, overall charter, etc

Don't agree. If you have read the email thread on this discussion, you would find that the steering committee needs to decide whether openIB should work on including the support for iWARP. Not that I am supporting this idea -:)

In the opensource world, developers should/will have the freedom to add what they want to do but of course, the acceptance of their contributions into mainline is completely a different matter.

Thanks
Venkat
RE: [openib-general] cmpost: failure sending REQ: -22
> > > Has anyone seen ib_send_cm_req() return -22?
> >
> > I believe that this is a timeout error, possibly indicating that the server side of the connection wasn't running. You may also want to verify the slid and dlid are correct for your configuration.
>
> Don't you get a REJ now when there is no one listening on a service ID requested ?

You do if the CM is loaded on the destination.

> -22 is EINVAL. In terms of ib_send_cm_req, it is returned for a number of cases:
> 1. peer to peer connection is requested
> 2. No primary path is supplied
> 3. QP is not RC or UC
> 4. private data is supplied and length > 92
> 5. alternate path supplied and PKEY or MTU does not match primary path
> 6. connection state is not IDLE
> 7. Primary or alternate path SGID or PKey does not match those of port

You're right. I was thinking about the request failing asynchronously, not synchronously when called. The most likely cause is a bad slid/dlid.

- Sean
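The list of EINVAL cases above can be read as a set of up-front parameter checks. The sketch below mirrors cases 1 through 6 using a deliberately simplified, hypothetical parameter struct. It is not the kernel's ib_send_cm_req() or its real parameter structure, and case 7 is left out because it needs the port's actual GID and PKey tables.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_PRIVATE_DATA_MAX 92	/* limit quoted in case 4 above */

enum qp_type { QP_RC, QP_UC, QP_UD };

/* Hypothetical, simplified stand-in for the real request parameters. */
struct req_check {
	bool peer_to_peer;
	bool has_primary_path;
	bool has_alt_path;
	uint16_t primary_pkey, alt_pkey;
	int primary_mtu, alt_mtu;
	enum qp_type qp_type;
	const void *private_data;
	size_t private_data_len;
	bool conn_idle;
};

/* Returns 0 if the request looks sane, -EINVAL (-22 on Linux) otherwise. */
static int check_req(const struct req_check *p)
{
	assert(p != NULL);

	if (p->peer_to_peer)					/* case 1 */
		return -EINVAL;
	if (!p->has_primary_path)				/* case 2 */
		return -EINVAL;
	if (p->qp_type != QP_RC && p->qp_type != QP_UC)		/* case 3 */
		return -EINVAL;
	if (p->private_data &&
	    p->private_data_len > REQ_PRIVATE_DATA_MAX)		/* case 4 */
		return -EINVAL;
	if (p->has_alt_path &&
	    (p->alt_pkey != p->primary_pkey ||
	     p->alt_mtu != p->primary_mtu))			/* case 5 */
		return -EINVAL;
	if (!p->conn_idle)					/* case 6 */
		return -EINVAL;
	return 0;
}
```

Since every one of these checks fails synchronously with the same -22, a caller seeing that value has to walk the list by hand, which is why verifying the slid/dlid and path records is the first suggestion.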
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Tue, 2005-05-31 at 16:43 -0700, Venkata Jagana wrote:
> > AFAICT the openib.org steering group doesn't control the content of the svn.openib.org source tree. It manages things like web content, overall charter, etc
>
> Don't agree. If you have read the email thread on this discussion, you would find that the steering committee needs to decide whether openIB should work on including the support for iWARP. Not that I am supporting this idea -:)

Please don't confuse the development effort going on on this list (openib-general) with the corporation that sponsors the development. OpenIB as a (non-profit) corporation is set up with a charter and has bylaws, etc. Its goal may be IB specific at the moment. But the developers and the development on this list and in the subversion repository don't answer to the OpenIB board of directors. Developers are free to write whatever code they choose to.

There is nothing stopping the maintainers from taking iWARP patches *today*. OpenIB, as a corporation, may not put its name behind the work, but that is another matter. If the board of directors of OpenIB decides to cease sponsorship because the developers don't jive with its corporate goals, then development can continue elsewhere.

-tduffy
Re: [Rdma-developers] Re: [openib-general] OpenIB and OpenRDMA: Convergence on common RDMA APIs and ULPs for Linux
On Tue, May 31, 2005 at 04:43:58PM -0700, Venkata Jagana wrote:
> > I've been advocating rdmaconsortium folks submit patches against openib.org for several reasons:
>
> Probably, you meant openrdma.org opensource project but not a standards setting body (i.e. RDMA consortium - http://www.rdmaconsortium.org/home) :)

Yes - sorry. My bad.

> You are mistaken. I know people in the OpenRDMA community have worked with the opensource projects before and they know how to play and collaborate in an open source environment.

I likely am. But comments about requiring commitment and business planning resources suggest otherwise.

> The early part of the work in openrdma is in fact, a true example of that effort (which you may disagree with but having worked with several other opensource projects and with OpenIB, we have solved the issues which other projects including OpenIB have faced) and the next phase of work which is of course the code development, a key aspect of broader community effort.

Well, a true example of that effort would have included code.

> I think we are diverging from the real issue - the fundamental differences in the views of each community in how we can solve this common problem of supporting multiple RDMA fabrics, which is what we need to focus on.

If there is a fundamental difference, it's something along the lines of:

openrdma: Hey! We have this transport neutral RNIC PI spec that needs IB support!
openib: Nice. Where is the code for iWarp?
openrdma: Uhm, well, we've only written the spec so far.
openib: Ok. What do you want from us?
openrdma: Well, we want you to review this RNIC-PI spec and then write the code to support IB.
openib: Are you crazy? We have a working implementation. And it's in kernel.org.
openrdma: We know. That's why we should collaborate. RNIC PI spec is transport neutral. Could you review it and then implement it in openib.org?
openib: No. You can submit patches and we'll review those.
openrdma: Ok. But I'm not gonna write any code unless someone commits to accept it. We can't plan our business unless someone commits resources to work on accepting our patches.
openib: No. You can submit patches and we'll review those.
...

I'm trying to NOT be sarcastic - just summarize what I've understood so far. Please correct or post your own version (sans rude talk by certain people).

...
> Don't agree. If you have read the email thread on this discussion, you would find that the steering committee needs to decide whether openIB should work on including the support for iWARP. Not that I am supporting this idea -:)

Tom answered this nicely already.

> In the opensource world, developers should/will have the freedom to add what they want to do

Open source developers have *some* allegiance to their funders. HP pays me to look out for their interests - but I don't do that unconditionally. If an HP person is pushing for the wrong things that I know won't fly, I have an obligation to push back.

> but of course, the acceptance of their contributions into mainline is completely a different matter.

We agree acceptance of contributions is conditional. But earlier emails stated someone needed a firm commitment that openrdma RNIC PI would get accepted into openib.org. There's a disconnect there.

hth,
grant
[openib-general] RE: [ANNOUNCE][PATCH] New Linux 2.6.9 backport patches and corresponding userspace tar ball available
On Tue, 2005-05-31 at 18:56, Woodruff, Robert J wrote:
> Michael wrote:
> > > Patches are located in the SVN tree under gen2/trunk/src/linux-kernel/patches/backport-to-2.6.9/
> > >   infiniband-backport-svn2425-to-2.6.9-kernel-fixups-01.diff
> > >   infiniband-backport-svn2425-to-2.6.9-openib-drivers-02.diff
> > >   infiniband-backport-svn2425-to-2.6.9-openib-fixups-03.diff
> > >   infiniband-backport-svn2425-userspace.tar.gz
> > >
> > > woody
> >
> > Woody, could you please move these patches to gen2/branches?
> >
> > --
> > MST - Michael S. Tsirkin
>
> What do you guys think, would these be better kept under gen2/branches or where I have put them under linux-kernel/patches ? I can see arguments both ways.

Me too (see arguments both ways). I'm not so much concerned with exactly where they are as that they are available. If they are to move out of trunk, another possibility is gen2/users.

-- Hal
RE: [openib-general] Send Side RMPP and OpenSM GetTableResp
> > > --SA GetTableResp RMPP flags 0x05 (Data, Last) SegmentNumber 4 PayloadLength 0x34 TID 8
> > > SA GetTable -- RMPP flags 0x02 (ACK) SegmentNumber 1 NewWindowLast 6 TID 8
> >
> > This segment number is off - not sure why.
>
> It is off in that the 3 segments just sent are not acknowledged but it is legal to acknowledge what you have already received. This does not violate anything.

The RMPP implementation sends an ACK under the following conditions:
* Upon completion of a received datagram.
* If a duplicate segment is received.
* After all segments of the current window are received (including the initial window)

So, this ACK isn't violating the protocol, but I don't see which of these cases the ACK matches up against in the implementation.

> > It could indicate that segment 2 was lost,
>
> That's one possibility but I doubt it is getting lost.
>
> > or that its processing came after that of a later segment.

After re-examining the RMPP code, the implementation doesn't automatically send an ACK just because a segment is processed out of order. It tries to be intelligent about it in case receive processing is occurring in multiple threads.

> The gap was 769.912 usec. Not sure whether this corresponds to any IBTA timeout. Is this the hardcoded timeout that is used ?

RMPP uses 40 seconds to complete a receive. The sender uses a 2 second timer to wait for an ACK before resending segments. (This was recently reduced from 5 seconds.) The receiver uses a 10 second timer to maintain state after completing a receive in order to re-generate lost final ACKs.

> > Regardless what went wrong on the SA side, the client needs to be able to deal with it.
>
> This applies in both directions but in this case I think you mean the other direction (whatever went wrong on the SA client side the SA needs to be able to deal with it). Obviously all bugs need to be fixed.

I was simply trying to state that the receiving side must be able to handle a buggy transmitter without adversely affecting the system.

> > > --SA GetTableResp RMPP flags 0x01 (Data) SegmentNumber 5 PayloadLength 0x34 TID 8
> >
> > This should not occur. The maximum segment number sent should have stayed at 4. I guess one area to check is to make sure that the PayloadLength in the original MAD is set correctly. I do not know what would happen if it were set incorrectly. There could also be an error in how RMPP calculates the number of segments that will be sent.
>
> It does look like it is trying to resend the last (at least based on the PayloadLength) ? I will find where to instrument this in the code.

The code on the send side calculates the total segment number using both the PayloadLength and sge.length field. If either is off, the sender side could probably be thrown off in its calculations. Even if this were the case, I still can't see what would cause segment number 5 to be transmitted...

> > This segment should have been dropped by the client as an invalid segment number.
>
> It's not invalid, is it ? Just a repeat. Should it reset one of the RMPP timers too ?

If segment 4 had the last bit set, segment 5 is invalid. The RMPP code should drop this.

> It also fills in 0 in RRespTime. Should it fill in something to correspond to the hard coded time it uses ? Or perhaps 31 (0x1F) ?

I don't think the value of RRespTime matters at this point.

> I will try to get back to gathering more info on this.

Having some more info would help, but I can also try modifying grmpp to see if I can reproduce this. My intention is to focus on finding a fix for the MAD problems at the moment, however, so I'll queue this up to look at it when I get back to RMPP.

- Sean
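The receive-side rules discussed in this message can be summarized in a small decision function. This is a minimal sketch assuming the three ACK conditions listed above plus the rule that anything numbered past the Last segment is dropped. The struct, enum, and function names are hypothetical and do not come from the OpenIB RMPP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive-side state; field names are illustrative only. */
struct rmpp_recv {
	uint32_t expected_seg;	/* next in-order segment we are waiting for */
	uint32_t window_last;	/* last segment number of the current window */
	uint32_t last_seg;	/* segment that carried the Last flag, 0 if none yet */
};

enum seg_action {
	SEG_DROP,		/* invalid segment number, ignore it */
	SEG_DUP_ACK,		/* duplicate: re-send an ACK */
	SEG_ACCEPT,		/* store it, no ACK yet */
	SEG_ACCEPT_AND_ACK	/* store it and ACK */
};

static enum seg_action classify_segment(const struct rmpp_recv *r,
					uint32_t seg, bool last_flag)
{
	assert(r != NULL);
	assert(seg >= 1);	/* RMPP segment numbers start at 1 */

	/* Anything numbered past the segment that carried Last is invalid. */
	if (r->last_seg != 0 && seg > r->last_seg)
		return SEG_DROP;
	/* A segment we already have is a duplicate and triggers an ACK. */
	if (seg < r->expected_seg)
		return SEG_DUP_ACK;
	/* An early (out of order) segment does not trigger an ACK by itself. */
	if (seg > r->expected_seg)
		return SEG_ACCEPT;
	/* In order: ACK when the datagram completes or the window is used up. */
	if (last_flag || seg == r->window_last)
		return SEG_ACCEPT_AND_ACK;
	return SEG_ACCEPT;
}
```

Applied to the trace, a receiver that has already seen segment 4 with the Last flag would classify segment 5 as SEG_DROP, while a repeat of segment 4 would be treated as a duplicate and re-ACKed.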
RE: [openib-general] Send Side RMPP and OpenSM GetTableResp
On Tue, 2005-05-31 at 22:05, Sean Hefty wrote:
> > > > --SA GetTableResp RMPP flags 0x05 (Data, Last) SegmentNumber 4 PayloadLength 0x34 TID 8
> > > > SA GetTable -- RMPP flags 0x02 (ACK) SegmentNumber 1 NewWindowLast 6 TID 8
> > >
> > > This segment number is off - not sure why.
> >
> > It is off in that the 3 segments just sent are not acknowledged but it is legal to acknowledge what you have already received. This does not violate anything.
>
> The RMPP implementation sends an ACK under the following conditions:
> * Upon completion of a received datagram.
> * If a duplicate segment is received.
> * After all segments of the current window are received (including the initial window)
>
> So, this ACK isn't violating the protocol, but I don't see which of these cases the ACK matches up against in the implementation.

That's (the ACK) not from OpenIB but from the Solaris 10 SA client.

> The code on the send side calculates the total segment number using both the PayloadLength and sge.length field. If either is off, the sender side could probably be thrown off in its calculations. Even if this were the case, I still can't see what would cause segment number 5 to be transmitted...

Perhaps there is something wrong with umad in terms of this but it's hard to see what, as it just posts the send MAD built with ib_create_send_mad.

> > > This segment should have been dropped by the client as an invalid segment number.
> >
> > It's not invalid, is it ? Just a repeat. Should it reset one of the RMPP timers too ?

I was referring to the reACK from the client, not the retransmitted data segment from the SA (which has the wrong segment number).

> If segment 4 had the last bit set, segment 5 is invalid. The RMPP code should drop this.

Right. Is just dropping sufficient ? It looks to me that the receiver should, if it is not the expected segment, also send an ACK for ES - 1 per Figure 178.

[There was more to the sequence which I omitted; I only showed up to the point where things looked like they went wrong on the SA side.]

> > I will try to get back to gathering more info on this.
>
> Having some more info would help, but I can also try modifying grmpp to see if I can reproduce this. My intention is to focus on finding a fix for the MAD problems at the moment, however, so I'll queue this up to look at it when I get back to RMPP.

OK. I'll try to get more info so this can be more focused.

-- Hal
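Since the thread suspects the sender's segment count, a sketch of that arithmetic may help when instrumenting it. It assumes the count is a ceiling division of the total payload by the data capacity of one segment. The function names are hypothetical, and the 220-byte capacity used in the note below is an assumption about the RMPP data area rather than something stated in the thread. How the real code combines PayloadLength and sge.length is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of segments needed to carry total_len
 * bytes when each segment holds seg_capacity bytes. An empty payload
 * is treated here as a single segment.
 */
static uint32_t rmpp_segment_count(uint32_t total_len, uint32_t seg_capacity)
{
	assert(seg_capacity > 0);

	if (total_len == 0)
		return 1;
	/* Ceiling division written so it cannot overflow. */
	return total_len / seg_capacity + (total_len % seg_capacity != 0);
}

/* Hypothetical helper: bytes carried by the final segment. */
static uint32_t rmpp_last_segment_len(uint32_t total_len, uint32_t seg_capacity)
{
	uint32_t count = rmpp_segment_count(total_len, seg_capacity);

	return total_len - (count - 1) * seg_capacity;
}
```

With a 220-byte capacity, a 712-byte payload needs 4 segments and leaves 52 (0x34) bytes in the last one, which is consistent with the PayloadLength 0x34 on segment 4 in the trace. An off-by-one in either input would be one way to end up with a fifth segment.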