Re: P
2010/6/21 Dotan Barak:
> On 20/06/2010 07:51, Ding Dinghua wrote:
>>
>> hello,
>>
>> 2010/6/19 Dotan Barak:
>>>
>>> I call rdma_create_id to create an ib id, then do the resolve-remote-addr
>>> and resolve-route work, then set up the QP and call rdma_connect to set
>>> up the connection; until an ack or error reply arrives, the thread waits
>>> on a wait queue. The listening ib id on the remote node catches the
>>> connect request, sets up a QP, allocates and maps pages to construct the
>>> RDMA-WRITE space, and calls rdma_accept to reply to the request.
>>>
>>> Some other information which may be useful:
>>> 1. All the "RETRY EXCEEDED" problems happened when there were two
>>>    connections using RDMA-WRITE to transfer data, and the latter
>>>    connection had a high probability of running into this problem.
>>> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE space
>>>    is 256MB each (that is, two connections consume 512MB of memory);
>>>    when the RDMA-WRITE space is 64MB, this problem never happened in
>>>    our tests. The remote node's total memory is 2GB.
>>>
>>> Thanks a lot.
>>>
>>> Some more questions:
>>> * Is the WR that "produces" the RETRY EXCEEDED the first one/last
>>>   one/in the middle?
>>
>> It's the first one.
>>
>>> * Which values are you using in the QP context for the retry exceeded
>>>   counter + retry timeout?
>>> * Did you try to increase those values?
>>
>> I haven't set these values (actually I don't know where to set these
>> values); I just set the max_send_wr and max_send_sge fields of struct
>> ib_qp_cap when creating the QP.
>
> Can you perform query QP after establishing a connection between the QPs
> and check those values?

All the QPs (local and remote, 2 connections): 3 qp_state, 19 retry_cnt,
7 timeout. That is, all QPs' qp_state is IB_QPS_RTS (should the remote
QP's state be this, or IB_QPS_RTR? But the QP state of the first
connection is the same and it can work...).

>>> * How many more QPs do you have between those nodes and which
>>>   operations do they use (only RDMA-WRITEs?)
>>
>> 4096 QPs for each connection, only doing RDMA-WRITEs.
>
> So, you send in parallel a total of 4K (QPs) * 64M (Bytes) = 256 GB
> (am I missing something, or is this the amount of data that will be sent
> between the two nodes?)

If the RDMA-WRITE space is 64M (bytes), this means upper-level
applications send at most 64M (bytes) to the remote node at one time.
These QPs may send to different pieces of the 64M space in parallel.

> Dotan

--
Ding Dinghua
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: P
On 20/06/2010 07:51, Ding Dinghua wrote:
> hello,
>
> 2010/6/19 Dotan Barak:
>> I call rdma_create_id to create an ib id, then do the resolve-remote-addr
>> and resolve-route work, then set up the QP and call rdma_connect to set
>> up the connection; until an ack or error reply arrives, the thread waits
>> on a wait queue. The listening ib id on the remote node catches the
>> connect request, sets up a QP, allocates and maps pages to construct the
>> RDMA-WRITE space, and calls rdma_accept to reply to the request.
>>
>> Some other information which may be useful:
>> 1. All the "RETRY EXCEEDED" problems happened when there were two
>>    connections using RDMA-WRITE to transfer data, and the latter
>>    connection had a high probability of running into this problem.
>> 2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE space
>>    is 256MB each (that is, two connections consume 512MB of memory);
>>    when the RDMA-WRITE space is 64MB, this problem never happened in
>>    our tests. The remote node's total memory is 2GB.
>>
>> Thanks a lot.
>>
>> Some more questions:
>> * Is the WR that "produces" the RETRY EXCEEDED the first one/last
>>   one/in the middle?
>
> It's the first one.
>
>> * Which values are you using in the QP context for the retry exceeded
>>   counter + retry timeout?
>> * Did you try to increase those values?
>
> I haven't set these values (actually I don't know where to set these
> values); I just set the max_send_wr and max_send_sge fields of struct
> ib_qp_cap when creating the QP.

Can you perform query QP after establishing a connection between the QPs
and check those values?

>> * How many more QPs do you have between those nodes and which
>>   operations do they use (only RDMA-WRITEs?)
>
> 4096 QPs for each connection, only doing RDMA-WRITEs.

So, you send in parallel a total of 4K (QPs) * 64M (Bytes) = 256 GB
(am I missing something, or is this the amount of data that will be sent
between the two nodes?)

Dotan
[PATCH v5] opensm/osmeventplugin: added new events to monitor SM
Adding new events that allow an event plug-in to see when the SM finishes
heavy sweep and routing configuration, when it updates dump files, when it
is no longer master, when the SM port is down, and when the SA DB is
actually dumped at the end of a light sweep:

   OSM_EVENT_ID_HEAVY_SWEEP_DONE
   OSM_EVENT_ID_UCAST_ROUTING_DONE
   OSM_EVENT_ID_STATE_CHANGE
   OSM_EVENT_ID_SA_DB_DUMPED

Signed-off-by: Yevgeny Kliteynik
---

Changes since v4:
 - OSM_EVENT_ID_SA_DB_DUMPED was still reported during heavy sweep - removed.

Changes since v3:
 - OSM_EVENT_ID_ENTERING_STANDBY and OSM_EVENT_ID_SM_PORT_DOWN replaced
   by OSM_EVENT_STATE_CHANGE
 - OSM_EVENT_ID_SA_DB_DUMPED is not reported during heavy sweep, but only
   if SA DB was actually dumped at the end of light sweep
 - fixed bug with OSM_EVENT_ID_MAX

 opensm/include/opensm/osm_event_plugin.h   |  4
 opensm/opensm/osm_state_mgr.c              | 17 +++--
 opensm/osmeventplugin/src/osmeventplugin.c | 12
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h
index 33d1920..0b3464e 100644
--- a/opensm/include/opensm/osm_event_plugin.h
+++ b/opensm/include/opensm/osm_event_plugin.h
@@ -72,6 +72,10 @@ typedef enum {
 	OSM_EVENT_ID_PORT_SELECT,
 	OSM_EVENT_ID_TRAP,
 	OSM_EVENT_ID_SUBNET_UP,
+	OSM_EVENT_ID_HEAVY_SWEEP_DONE,
+	OSM_EVENT_ID_UCAST_ROUTING_DONE,
+	OSM_EVENT_ID_STATE_CHANGE,
+	OSM_EVENT_ID_SA_DB_DUMPED,
 	OSM_EVENT_ID_MAX
 } osm_epi_event_id_t;

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 81c8f54..299514b 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1107,8 +1107,10 @@ static void do_sweep(osm_sm_t * sm)
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
 	if (!sm->p_subn->force_heavy_sweep) {
-		if (sm->p_subn->opt.sa_db_dump)
-			osm_sa_db_file_dump(sm->p_subn->p_osm);
+		if (sm->p_subn->opt.sa_db_dump &&
+		    !osm_sa_db_file_dump(sm->p_subn->p_osm))
+			osm_opensm_report_event(sm->p_subn->p_osm,
+						OSM_EVENT_ID_SA_DB_DUMPED, NULL);
 		OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 				"LIGHT SWEEP COMPLETE");
 		return;
@@ -1151,6 +1153,8 @@ static void do_sweep(osm_sm_t * sm)
 		if (!sm->p_subn->subnet_initialization_error) {
 			OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 					"REROUTE COMPLETE");
+			osm_opensm_report_event(sm->p_subn->p_osm,
+						OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
 			return;
 		}
 	}
@@ -1185,6 +1189,8 @@ repeat_discovery:
 		/* Move to DISCOVERING state */
 		osm_sm_state_mgr_process(sm, OSM_SM_SIGNAL_DISCOVER);
+		osm_opensm_report_event(sm->p_subn->p_osm,
+					OSM_EVENT_ID_STATE_CHANGE, NULL);
 		return;
 	}
@@ -1205,6 +1211,8 @@ repeat_discovery:
 				"ENTERING STANDBY STATE");
 		/* notify master SM about us */
 		osm_send_trap144(sm, 0);
+		osm_opensm_report_event(sm->p_subn->p_osm,
+					OSM_EVENT_ID_STATE_CHANGE, NULL);
 		return;
 	}
@@ -1212,6 +1220,9 @@ repeat_discovery:
 	if (sm->p_subn->force_heavy_sweep)
 		goto repeat_discovery;
+	osm_opensm_report_event(sm->p_subn->p_osm,
+				OSM_EVENT_ID_HEAVY_SWEEP_DONE, NULL);
+
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE, "HEAVY SWEEP COMPLETE");
 	/* If we are MASTER - get the highest remote_sm, and
@@ -1314,6 +1325,8 @@ repeat_discovery:
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 			"SWITCHES CONFIGURED FOR UNICAST");
+	osm_opensm_report_event(sm->p_subn->p_osm,
+				OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
 	if (!sm->p_subn->opt.disable_multicast) {
 		osm_mcast_mgr_process(sm);

diff --git a/opensm/osmeventplugin/src/osmeventplugin.c b/opensm/osmeventplugin/src/osmeventplugin.c
index b4d9ce9..f40f7fe 100644
--- a/opensm/osmeventplugin/src/osmeventplugin.c
+++ b/opensm/osmeventplugin/src/osmeventplugin.c
@@ -176,6 +176,18 @@ static void report(void *_log, osm_epi_event_id_t event_id, void *event_data)
 	case OSM_EVENT_ID_SUBNET_UP:
 		fprintf(log->log_file, "Subnet up reported\n");
 		break;
+	case OSM_EVENT_ID_HEAVY_SWEEP_DONE:
+		fprintf(log->log_file, "Heavy sweep completed\n");
+
Re: [PATCH v4] opensm/osmeventplugin: added new events to monitor SM
On 20-Jun-10 12:31 PM, Yevgeny Kliteynik wrote:
> Adding new events that allow event plug-in to see when SM finishes
> heavy sweep and routing configuration, when it updates dump files, when
> it is no longer master, when SM port is down, and when SA DB is
> actually dumped at the end of light sweep:
>
>    OSM_EVENT_ID_HEAVY_SWEEP_DONE
>    OSM_EVENT_ID_UCAST_ROUTING_DONE
>    OSM_EVENT_ID_STATE_CHANGE
>    OSM_EVENT_ID_SA_DB_DUMPED
>
> Signed-off-by: Yevgeny Kliteynik
> ---
> Changes since v3:
>  - OSM_EVENT_ID_ENTERING_STANDBY and OSM_EVENT_ID_SM_PORT_DOWN replaced
>    by OSM_EVENT_STATE_CHANGE
>  - OSM_EVENT_ID_SA_DB_DUMPED is not reported during heavy sweep, but
>    only if SA DB was actually dumped

Nope, still reported during heavy sweep. V5 is on its way...

-- Yevgeny

> at the end of light sweep
>  - fixed bug with OSM_EVENT_ID_MAX

[rest of quoted v4 patch snipped]
[PATCH v4] opensm/osmeventplugin: added new events to monitor SM
Adding new events that allow an event plug-in to see when the SM finishes
heavy sweep and routing configuration, when it updates dump files, when it
is no longer master, when the SM port is down, and when the SA DB is
actually dumped at the end of a light sweep:

   OSM_EVENT_ID_HEAVY_SWEEP_DONE
   OSM_EVENT_ID_UCAST_ROUTING_DONE
   OSM_EVENT_ID_STATE_CHANGE
   OSM_EVENT_ID_SA_DB_DUMPED

Signed-off-by: Yevgeny Kliteynik
---

Changes since v3:
 - OSM_EVENT_ID_ENTERING_STANDBY and OSM_EVENT_ID_SM_PORT_DOWN replaced
   by OSM_EVENT_STATE_CHANGE
 - OSM_EVENT_ID_SA_DB_DUMPED is not reported during heavy sweep, but only
   if SA DB was actually dumped at the end of light sweep
 - fixed bug with OSM_EVENT_ID_MAX

 opensm/include/opensm/osm_event_plugin.h   |  4
 opensm/opensm/osm_state_mgr.c              | 22 +++---
 opensm/osmeventplugin/src/osmeventplugin.c | 14 ++
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h
index 33d1920..0b3464e 100644
--- a/opensm/include/opensm/osm_event_plugin.h
+++ b/opensm/include/opensm/osm_event_plugin.h
@@ -72,6 +72,10 @@ typedef enum {
 	OSM_EVENT_ID_PORT_SELECT,
 	OSM_EVENT_ID_TRAP,
 	OSM_EVENT_ID_SUBNET_UP,
+	OSM_EVENT_ID_HEAVY_SWEEP_DONE,
+	OSM_EVENT_ID_UCAST_ROUTING_DONE,
+	OSM_EVENT_ID_STATE_CHANGE,
+	OSM_EVENT_ID_SA_DB_DUMPED,
 	OSM_EVENT_ID_MAX
 } osm_epi_event_id_t;

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 81c8f54..be6e67d 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1107,8 +1107,10 @@ static void do_sweep(osm_sm_t * sm)
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
 	if (!sm->p_subn->force_heavy_sweep) {
-		if (sm->p_subn->opt.sa_db_dump)
-			osm_sa_db_file_dump(sm->p_subn->p_osm);
+		if (sm->p_subn->opt.sa_db_dump &&
+		    !osm_sa_db_file_dump(sm->p_subn->p_osm))
+			osm_opensm_report_event(sm->p_subn->p_osm,
+						OSM_EVENT_ID_SA_DB_DUMPED, NULL);
 		OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 				"LIGHT SWEEP COMPLETE");
 		return;
@@ -1151,6 +1153,8 @@ static void do_sweep(osm_sm_t * sm)
 		if (!sm->p_subn->subnet_initialization_error) {
 			OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 					"REROUTE COMPLETE");
+			osm_opensm_report_event(sm->p_subn->p_osm,
+						OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
 			return;
 		}
 	}
@@ -1185,6 +1189,8 @@ repeat_discovery:
 		/* Move to DISCOVERING state */
 		osm_sm_state_mgr_process(sm, OSM_SM_SIGNAL_DISCOVER);
+		osm_opensm_report_event(sm->p_subn->p_osm,
+					OSM_EVENT_ID_STATE_CHANGE, NULL);
 		return;
 	}
@@ -1205,6 +1211,8 @@ repeat_discovery:
 				"ENTERING STANDBY STATE");
 		/* notify master SM about us */
 		osm_send_trap144(sm, 0);
+		osm_opensm_report_event(sm->p_subn->p_osm,
+					OSM_EVENT_ID_STATE_CHANGE, NULL);
 		return;
 	}
@@ -1212,6 +1220,9 @@ repeat_discovery:
 	if (sm->p_subn->force_heavy_sweep)
 		goto repeat_discovery;
+	osm_opensm_report_event(sm->p_subn->p_osm,
+				OSM_EVENT_ID_HEAVY_SWEEP_DONE, NULL);
+
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE, "HEAVY SWEEP COMPLETE");
 	/* If we are MASTER - get the highest remote_sm, and
@@ -1314,6 +1325,8 @@ repeat_discovery:
 	OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
 			"SWITCHES CONFIGURED FOR UNICAST");
+	osm_opensm_report_event(sm->p_subn->p_osm,
+				OSM_EVENT_ID_UCAST_ROUTING_DONE, NULL);
 	if (!sm->p_subn->opt.disable_multicast) {
 		osm_mcast_mgr_process(sm);
@@ -1375,7 +1388,10 @@ repeat_discovery:
 	if (osm_log_is_active(sm->p_log, OSM_LOG_VERBOSE) ||
 	    sm->p_subn->opt.sa_db_dump)
-		osm_sa_db_file_dump(sm->p_subn->p_osm);
+		if (!osm_sa_db_file_dump(sm->p_subn->p_osm))
+			osm_opensm_report_event(sm->p_subn->p_osm,
+						OSM_EVENT_ID_SA_DB_DUMPED, NULL);
+
 }

 /*

diff --git a/opensm/osmeventplugin/src/osmeventplugin.c b/opensm/osmeventplugin/src/osmeventplugin.c
index b4d9ce9..ea3b9a5 100644
--- a/opensm/osmeventplugin/src/osmeventplugin.c
+++ b/opensm/osmeventplug
[PATCH] opensm: Fix wrong messages in MC delete flow
Fix wrong messages in MC delete and update flows. The requester GID was
wrong.

Signed-off-by: Eli Dorfman
---
 opensm/opensm/osm_sa_mcmember_record.c | 11 ---
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c
index 07aeb6c..eda2fdd 100644
--- a/opensm/opensm/osm_sa_mcmember_record.c
+++ b/opensm/opensm/osm_sa_mcmember_record.c
@@ -405,8 +405,7 @@ static boolean_t validate_modify(IN osm_sa_t * sa, IN osm_mgrp_t * p_mgrp,
 				"0x%016" PRIx64 " request:0x%016" PRIx64 "\n",
 				cl_ntoh64((*pp_mcm_port)->port_gid.unicast.
					  interface_id),
-				cl_ntoh64(p_mad_addr->addr_type.gsi.grh_info.
-					  src_gid.unicast.interface_id));
+				cl_ntoh64(request_gid.unicast.interface_id));
 			return FALSE;
 		}
 	} else {
@@ -422,11 +421,9 @@ static boolean_t validate_modify(IN osm_sa_t * sa, IN osm_mgrp_t * p_mgrp,
 				     p_request_physp)) {
 			/* the request port is not part of the partition for this mgrp */
 			OSM_LOG(sa->p_log, OSM_LOG_DEBUG,
-				"ProxyJoin but port not in partition. stored:"
-				"0x%016" PRIx64 " request:0x%016" PRIx64 "\n",
-				cl_ntoh64((*pp_mcm_port)->port->guid),
-				cl_ntoh64(p_mad_addr->addr_type.gsi.grh_info.
-					  src_gid.unicast.interface_id));
+				"Requesting port 0x%016" PRIx64 " has no P_Key 0x%04x\n",
+				cl_ntoh64(p_request_physp->port_guid),
+				cl_ntoh16(p_mgrp->mcmember_rec.pkey));
 			return FALSE;
 		}
 	}
--
1.5.5