[PATCH] opensm/osm_torus.c: In dump_torus, make sure switch is present before dumping
Fix segmentation fault in osm_torus.c.

Signed-off-by: Hal Rosenstock h...@mellanox.com
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_torus.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c
index 1510233..e63cb40 100644
--- a/opensm/osm_torus.c
+++ b/opensm/osm_torus.c
@@ -7517,11 +7517,12 @@ void dump_torus(struct torus *t)
 	for (k = 0; k < t->z_sz; k++)
 		for (j = 0; j < t->y_sz; j++)
 			for (i = 0; i < t->x_sz; i++)
-				fprintf(file, "switch %u,%u,%u GUID 0x%04"
-					PRIx64 " (%s)\n",
-					i, j, k,
-					cl_ntoh64(t->sw[i][j][k]->n_id),
-					t->sw[i][j][k]->osm_switch->p_node->print_desc);
+				if (t->sw[i][j][k])
+					fprintf(file, "switch %u,%u,%u GUID 0x%04"
+						PRIx64 " (%s)\n",
+						i, j, k,
+						cl_ntoh64(t->sw[i][j][k]->n_id),
+						t->sw[i][j][k]->osm_switch->p_node->print_desc);
 
 	fclose(file);
 }
--
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
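The fix amounts to skipping empty slots in a sparse 3-D grid of switch pointers. A minimal sketch of that loop, using hypothetical simplified types (sketch_sw, sketch_torus) rather than the real struct torus, and a host-order GUID instead of cl_ntoh64(n_id):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins: the real torus code keeps far more
 * state per switch and stores the GUID in network byte order. */
struct sketch_sw {
	uint64_t guid;		/* host byte order here, unlike n_id */
	const char *desc;
};

#define SKETCH_DIM 2		/* fixed grid bound for the sketch */

struct sketch_torus {
	unsigned x_sz, y_sz, z_sz;	/* each must be <= SKETCH_DIM */
	struct sketch_sw *sw[SKETCH_DIM][SKETCH_DIM][SKETCH_DIM];
};

/* Dump every populated grid position and skip the NULL holes that a
 * partially built torus can contain. Returns how many lines were written. */
static unsigned sketch_dump_torus(const struct sketch_torus *t, FILE *file)
{
	unsigned i, j, k, printed = 0;

	for (k = 0; k < t->z_sz; k++)
		for (j = 0; j < t->y_sz; j++)
			for (i = 0; i < t->x_sz; i++) {
				const struct sketch_sw *sw = t->sw[i][j][k];

				if (!sw)	/* no switch at this position */
					continue;
				fprintf(file, "switch %u,%u,%u GUID 0x%04" PRIx64 " (%s)\n",
					i, j, k, sw->guid, sw->desc);
				printed++;
			}
	return printed;
}
```

On a 2x2x2 grid with a single populated slot, the function writes one line and returns 1 instead of dereferencing the seven empty slots.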
[PATCH] opensm/osm_torus.c: torus routing should fail with VLCap 1 on switch external ports
Signed-off-by: Hal Rosenstock h...@mellanox.com
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_torus.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c
index e63cb40..757a32a 100644
--- a/opensm/osm_torus.c
+++ b/opensm/osm_torus.c
@@ -7021,11 +7021,13 @@ bool verify_setup(struct torus *t, struct fabric *f)
 			"ERR 4E20: missing required torus size specification!\n");
 		goto out;
 	}
-	if (t->osm->subn.min_sw_data_vls < 2)
-		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
-			"Warning: Too few data VLs to support torus routing "
+	if (t->osm->subn.min_sw_data_vls < 2) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"ERR 4E48: Too few data VLs to support torus routing "
 			"without credit loops (have switchport %d need 2)\n",
 			(int)t->osm->subn.min_sw_data_vls);
+		goto out;
+	}
 	if (t->osm->subn.min_sw_data_vls < 4)
 		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
 			"Warning: Too few data VLs to support torus routing "
--
1.7.11.7
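The patched check separates a hard failure (fewer than 2 data VLs) from a warning (fewer than 4). A minimal sketch of that control flow, assuming a hypothetical helper sketch_check_data_vls() and shortened log messages; the wording of the second warning is invented because the patch context cuts it off:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical simplification of the data-VL checks in verify_setup():
 * fewer than 2 data VLs on switch ports is now fatal, while fewer than 4
 * still only produces a warning. Messages are shortened for the sketch. */
static bool sketch_check_data_vls(int min_sw_data_vls, FILE *log)
{
	if (min_sw_data_vls < 2) {
		fprintf(log, "ERR: too few data VLs for torus routing "
			"(have %d, need 2)\n", min_sw_data_vls);
		return false;	/* the patch does "goto out" here */
	}
	if (min_sw_data_vls < 4)
		fprintf(log, "Warning: only %d data VLs, fewer than 4\n",
			min_sw_data_vls);
	return true;
}
```

A fabric whose switch ports report VLCap 1 (one data VL) now makes setup fail, while 2 or 3 data VLs still proceed with a warning.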
[patch] IB/mlx4: fix bug unwinding on error in mlx4_ib_init_sriov()
We have to decrement "i" before calling mlx4_ib_free_demux_ctx() or we
free something that wasn't allocated. That's fine for free_pv_object(),
but it would lead to a NULL dereference calling mlx4_ib_free_demux_ctx().
The NULL dereference is because ->tun is NULL when we check:

	if (!ctx->tun[i])

Also, we didn't free ->sriov.demux[0], so it was a small leak.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
---
Static checker stuff. I have not tested this.

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 0a903c1..934792c 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -1999,16 +1999,17 @@ int mlx4_ib_init_sriov(struct mlx4_ib_dev *dev)
 			goto demux_err;
 		err = mlx4_ib_alloc_demux_ctx(dev, &dev->sriov.demux[i], i + 1);
 		if (err)
-			goto demux_err;
+			goto free_pv;
 	}
 	mlx4_ib_master_tunnels(dev, 1);
 	return 0;
 
+free_pv:
+	free_pv_object(dev, mlx4_master_func_num(dev->dev), i + 1);
 demux_err:
-	while (i > 0) {
+	while (--i >= 0) {
 		free_pv_object(dev, mlx4_master_func_num(dev->dev), i + 1);
 		mlx4_ib_free_demux_ctx(&dev->sriov.demux[i]);
-		--i;
 	}
 
 	mlx4_ib_device_unregister_sysfs(dev);
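The unwinding pattern the patch introduces can be sketched in isolation. This is a hypothetical model, not the driver code: sketch_port stands in for the per-port pv object and demux context, plain malloc()/free() replace the mlx4 helpers, and the fail_at/fail_demux parameters exist only to inject failures. A counter of live allocations makes leaks easy to check:

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_NPORTS 4

/* Hypothetical per-port resources standing in for the pv object and the
 * demux context that mlx4_ib_init_sriov() allocates for each port. */
struct sketch_port {
	int *pv;
	int *demux;
};

/* Allocate pv then demux for each port; fail_at/fail_demux inject a
 * failure for testing. *live counts allocations still outstanding. */
static int sketch_init_ports(struct sketch_port *p, int n, int fail_at,
			     bool fail_demux, int *live)
{
	int i;

	for (i = 0; i < n; i++) {
		p[i].pv = (i == fail_at && !fail_demux) ? NULL : malloc(sizeof(int));
		if (!p[i].pv)
			goto demux_err;		/* port i owns nothing yet */
		(*live)++;
		p[i].demux = (i == fail_at && fail_demux) ? NULL : malloc(sizeof(int));
		if (!p[i].demux)
			goto free_pv;		/* port i owns only its pv object */
		(*live)++;
	}
	return 0;

free_pv:
	free(p[i].pv);
	(*live)--;
demux_err:
	/* Pre-decrement: port i is already handled, so start at i - 1 and
	 * include port 0, which the old "while (i > 0)" loop skipped. */
	while (--i >= 0) {
		free(p[i].pv);
		free(p[i].demux);
		*live -= 2;
	}
	return -1;
}
```

With a demux failure on port 2, the free_pv label releases port 2's pv object and the loop then releases ports 1 and 0, so the live count returns to zero. The old loop would have started with port 2's never-allocated demux context and stopped before port 0.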
RE: Fwd: Error message when trying to use Infiniband virtual functions in virtual machine
Should I try to launch the VM with an older version of qemu-kvm?
dmesg when using qemu-kvm 1.2.2-6:

You need to use a release/drop of qemu which has these fixes,

http://marc.info/?l=qemu-devel&m=135835580814935&w=2
http://marc.info/?l=qemu-devel&m=135836502119460&w=2

Alex, what stable branch/release would you recommend to use?

Or.

From: Mathis GAVILLON [jbibo...@gmail.com]
Sent: 04 February 2013 11:42
To: Jack Morgenstein
Cc: Or Gerlitz
Subject: Re: Fwd: Error message when trying to use Infiniband virtual functions in virtual machine

The solution you proposed hasn't changed anything. The error message in the VM and "Disabling IRQ #16" are still present in the log files.

Should I try to launch the VM with an older version of qemu-kvm?

dmesg when using qemu-kvm 1.2.2-6:

[ 1914.152816] pci 0000:82:00.1: enabling device (0000 -> 0002)
[ 1914.156091] mlx4_core 0000:82:00.0: FLR event for slave: 1
[ 1914.156101] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr
[ 1914.156103] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 1914.878353] assign device 0:82:0.1
[ 1915.080617] mlx4_core 0000:82:00.0: FLR event for slave: 1
[ 1915.080651] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr
[ 1915.080652] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 1932.546300] mlx4_core 0000:82:00.0: Received reset from slave:1
[ 1932.559015] pci 0000:82:00.1: irq 178 for MSI/MSI-X
[ 1932.576928] pci 0000:82:00.1: irq 178 for MSI/MSI-X
[ 1932.576951] pci 0000:82:00.1: irq 179 for MSI/MSI-X
[ 1933.017996] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 1933.018000] Pid: 0, comm: swapper/3 Not tainted 3.8.0-rc2 #1
[ 1933.018001] Call Trace:
[ 1933.018002]  <IRQ>  [<ffffffff810dc41d>] __report_bad_irq+0x3d/0xe0
[ 1933.018015]  [<ffffffff810dc713>] note_interrupt+0x1a3/0x1f0
[ 1933.018020]  [<ffffffff814362d0>] ? userspace_cpufreq_notifier+0x90/0x90
[ 1933.018023]  [<ffffffff810d9e69>] handle_irq_event_percpu+0xb9/0x200
[ 1933.018026]  [<ffffffff814362d0>] ? userspace_cpufreq_notifier+0x90/0x90
[ 1933.018029]  [<ffffffff810d9ff2>] handle_irq_event+0x42/0x70
[ 1933.018031]  [<ffffffff810dd259>] handle_fasteoi_irq+0x59/0x100
[ 1933.018036]  [<ffffffff8101616f>] handle_irq+0xbf/0x150
[ 1933.018041]  [<ffffffff8155b792>] ? __atomic_notifier_call_chain+0x12/0x20
[ 1933.018044]  [<ffffffff8155b7b6>] ? atomic_notifier_call_chain+0x16/0x20
[ 1933.018047]  [<ffffffff8156188a>] do_IRQ+0x5a/0xe0
[ 1933.018050]  [<ffffffff81557aad>] common_interrupt+0x6d/0x6d
[ 1933.018051]  <EOI>  [<ffffffff81436d00>] ? cpuidle_wrap_enter+0x50/0xa0
[ 1933.018057]  [<ffffffff81436cf9>] ? cpuidle_wrap_enter+0x49/0xa0
[ 1933.018060]  [<ffffffff81436d60>] cpuidle_enter_tk+0x10/0x20
[ 1933.018063]  [<ffffffff8143698b>] cpuidle_idle_call+0xbb/0x280
[ 1933.018067]  [<ffffffff8101d2ef>] cpu_idle+0xaf/0x120
[ 1933.018071]  [<ffffffff81544bb9>] start_secondary+0x255/0x257
[ 1933.018072] handlers:
[ 1933.018075] [<ffffffff813cea90>] usb_hcd_irq
[ 1933.018077] Disabling IRQ #16
[ 1942.678990] mlx4_core 0000:82:00.0: Received reset from slave:1

dmesg when using qemu-kvm 1.3.0-5:

[ 181.886165] pci 0000:82:00.1: enabling device (0000 -> 0002)
[ 181.889480] mlx4_core 0000:82:00.0: FLR event for slave: 1
[ 181.889517] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr
[ 181.889519] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 182.622378] assign device 0:82:0.1
[ 182.798342] mlx4_core 0000:82:00.0: FLR event for slave: 1
[ 182.798352] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr
[ 182.798354] mlx4_core 0000:82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 202.515121] mlx4_core 0000:82:00.0: Received reset from slave:1
[ 202.527857] pci 0000:82:00.1: irq 178 for MSI/MSI-X
[ 202.549725] pci 0000:82:00.1: irq 178 for MSI/MSI-X
[ 202.549740] pci 0000:82:00.1: irq 179 for MSI/MSI-X
[ 202.991521] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 202.991526] Pid: 0, comm: swapper/3 Not tainted 3.8.0-rc2 #1
[ 202.991527] Call Trace:
[ 202.991528]  <IRQ>  [<ffffffff810dc41d>] __report_bad_irq+0x3d/0xe0
[ 202.991541]  [<ffffffff810dc713>] note_interrupt+0x1a3/0x1f0
[ 202.991546]  [<ffffffff814362d0>] ? userspace_cpufreq_notifier+0x90/0x90
[ 202.991550]  [<ffffffff810d9e69>] handle_irq_event_percpu+0xb9/0x200
[ 202.991553]  [<ffffffff814362d0>] ? userspace_cpufreq_notifier+0x90/0x90
[ 202.991555]  [<ffffffff810d9ff2>] handle_irq_event+0x42/0x70
[ 202.991558]  [<ffffffff810dd259>] handle_fasteoi_irq+0x59/0x100
[ 202.991562]  [<ffffffff8101616f>] handle_irq+0xbf/0x150
[ 202.991567]  [<ffffffff8155b792>] ? __atomic_notifier_call_chain+0x12/0x20
[ 202.991570]  [<ffffffff8155b7b6>] ? atomic_notifier_call_chain+0x16/0x20
[ 202.991574]  [<ffffffff8156188a>] do_IRQ+0x5a/0xe0
[ 202.991577]  [<ffffffff81557aad>] common_interrupt+0x6d/0x6d
[ 202.991577]  <EOI>  [<ffffffff81436d00>] ? cpuidle_wrap_enter+0x50/0xa0
[ 202.991584]  [<ffffffff81436cf9>] ? cpuidle_wrap_enter+0x49/0xa0
[ 202.991587]  [<ffffffff81436d60>] cpuidle_enter_tk+0x10/0x20
Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
Hi Bart,

thanks for approaching this! We're not the best mainline developers, so I guess we won't be there. But we have big SRP setups, and our sysadmins really don't like reconnecting SRP hosts manually and then going through the complicated process of adding their devices back to the related dm-multipath devices. Think about 200 SRP devices per server (already filtered by initiator groups).

We also consider srptools unmaintained, unreliable and slow; srptools commands may never return. Therefore, we send the SRP connection strings directly to the initiator within our mapping jobs.

It would also be great not to end up with reconnect behavior that amounts to a DDoS attack, as open-iscsi does. Rebooting the whole cluster to fix that isn't fun. There must be a way to configure different reconnect intervals.

Btw, with iSER we even had a case where the IPoIB part reconnected but the RDMA part didn't. It was so broken that we couldn't disconnect or reconnect anymore; the only option was a hard reboot.

So now you know our point of view, and we are already developing it that way for ourselves. I'm looking forward to the outcome of the discussion. At the current stage it's difficult to get our bosses to publish what we have so far.

On 01.02.2013 14:43, Bart Van Assche wrote:
It is known that it takes about two to three minutes before the upstream SRP initiator fails over from a failed path to a working path. This is not only considered longer than acceptable but is also longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with improving SRP initiator fail-over has been slow. This is because the discussion about candidate patches occurred at two different levels: not only the patches themselves were discussed but also the approach that should be followed. That last aspect is easier to discuss in a meeting than over a mailing list. Hence the proposal to discuss SRP initiator failover behavior during the LSF/MM summit.
The topics that need further discussion are:
* If a path fails, remove the entire SCSI host or preserve the SCSI host and only remove the SCSI devices associated with that host?

Preserve SCSI hosts and SCSI devices unless they are removed explicitly by a disconnect request. Rescanning SCSI devices with "- - -", as iscsiadm -R does for example, may reorder the device names (sda becomes sdb, etc.).

* Which software component should test the state of a path and should reconnect to an SRP target if a path is restored? Should that be done by the user space process srp_daemon or by the SRP initiator kernel module?

By the SRP kernel module. This is exactly the big advantage of SRP so far: it is simple, it is RDMA and kernel only.

* How should the SRP initiator behave after a path failure has been detected? Should the behavior be similar to the FC initiator with its fast_io_fail_tmo and dev_loss_tmo parameters?

Fine for us, as long as both these timeouts and the behavior itself are configurable. For dm-multipath we need I/O to fail fast and the SRP initiator to try to reconnect that path automatically.

Cheers,
Sebastian
[PATCH] opensm/osm_qos_policy.c: fix segmentation fault on osm_qos_policy_match_rule_destroy (osm_qos_policy.c)
From: Shlomi Nimrodi shlo...@mellanox.com

Signed-off-by: Shlomi Nimrodi shlo...@mellanox.com
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_qos_policy.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/opensm/osm_qos_policy.c b/opensm/osm_qos_policy.c
index c8526b9..ad5eaa0 100644
--- a/opensm/osm_qos_policy.c
+++ b/opensm/osm_qos_policy.c
@@ -373,20 +373,23 @@ void osm_qos_policy_match_rule_destroy(osm_qos_match_rule_t * p)
 	if (p->use)
 		free(p->use);
 
-	for (i = 0; i < p->service_id_range_len; i++)
-		free(p->service_id_range_arr[i]);
-	if (p->service_id_range_arr)
+	if (p->service_id_range_arr) {
+		for (i = 0; i < p->service_id_range_len; i++)
+			free(p->service_id_range_arr[i]);
 		free(p->service_id_range_arr);
+	}
 
-	for (i = 0; i < p->qos_class_range_len; i++)
-		free(p->qos_class_range_arr[i]);
-	if (p->qos_class_range_arr)
+	if (p->qos_class_range_arr) {
+		for (i = 0; i < p->qos_class_range_len; i++)
+			free(p->qos_class_range_arr[i]);
 		free(p->qos_class_range_arr);
+	}
 
-	for (i = 0; i < p->pkey_range_len; i++)
-		free(p->pkey_range_arr[i]);
-	if (p->pkey_range_arr)
+	if (p->pkey_range_arr) {
+		for (i = 0; i < p->pkey_range_len; i++)
+			free(p->pkey_range_arr[i]);
 		free(p->pkey_range_arr);
+	}
 
 	cl_list_apply_func(&p->source_list, __free_single_element, NULL);
 	cl_list_remove_all(&p->source_list);
--
1.7.11.7
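The reordering matters when a rule has a non-zero range length but no array, presumably the state that led to the crash. A minimal sketch of the guarded pattern for one array; the helper name is hypothetical, and uint64_t mirrors a service-ID range (the other range arrays differ only in element type):

```c
#include <stdint.h>
#include <stdlib.h>

/* Free an optional array of individually allocated range entries. The
 * length is only trusted once the array pointer is known to be non-NULL,
 * so a NULL array with a stale length is harmless. */
static void sketch_free_range_arr(uint64_t **arr, unsigned len)
{
	unsigned i;

	if (!arr)
		return;
	for (i = 0; i < len; i++)
		free(arr[i]);
	free(arr);
}
```

Calling it with a NULL array and a length of 3 is a no-op, which is the case the unpatched loop got wrong.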
[PATCH] opensm/osm_congestion_control.c: fix use-after-free found by coverity
From: Ilya Nelkenbaum il...@mellanox.com

Read from pointer p_madw after free.

Signed-off-by: Ilya Nelkenbaum il...@mellanox.com
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_congestion_control.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/opensm/osm_congestion_control.c b/opensm/osm_congestion_control.c
index e103ab1..17af407 100644
--- a/opensm/osm_congestion_control.c
+++ b/opensm/osm_congestion_control.c
@@ -521,6 +521,7 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
 {
 	osm_subn_opt_t *p_opt = &p_cc->subn->opt;
 	ib_api_status_t status;
+	osm_madw_context_t mad_context = p_madw->context;
 
 	status = osm_vendor_send(p_cc->bind_handle, p_madw, TRUE);
 	if (status == IB_SUCCESS) {
@@ -530,15 +531,11 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
 			cl_event_wait_on(&p_cc->sig_mads_on_wire_continue,
 					 EVENT_NO_TIMEOUT,
 					 TRUE);
-	}
-	else {
-		osm_madw_context_t *mad_context = &p_madw->context;
-
+	} else
 		OSM_LOG(p_cc->log, OSM_LOG_ERROR,
 			"ERR C104: send failed to node 0x%" PRIx64 " port %u\n",
-			mad_context->cc_context.node_guid,
-			mad_context->cc_context.port);
-	}
+			mad_context.cc_context.node_guid,
+			mad_context.cc_context.port);
 }
 
 static void cc_poller(void *p_ptr)
--
1.7.11.7
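The fix copies the MAD context by value before ownership of the wrapper passes to the send call. A minimal sketch with hypothetical stand-in types; it assumes, as the Coverity report implies, that the send path may release the wrapper on failure:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for osm_madw_context_t and the MAD wrapper. */
struct sketch_ctx {
	uint64_t node_guid;
	uint8_t port;
};

struct sketch_madw {
	struct sketch_ctx context;
};

/* Stand-in for the send call: it takes ownership of madw and releases it
 * on failure. A successful send also frees it here, only to keep the
 * sketch leak-free. */
static int sketch_send(struct sketch_madw *madw, int fail)
{
	free(madw);
	return fail ? -1 : 0;
}

/* Copy the context by value before sending, so the error path reads the
 * local copy instead of memory that may already have been released. */
static int sketch_poller_send(struct sketch_madw *madw, int fail,
			      uint64_t *failed_guid)
{
	struct sketch_ctx ctx = madw->context;

	if (sketch_send(madw, fail) != 0) {
		*failed_guid = ctx.node_guid;	/* stands in for the OSM_LOG() call */
		return -1;
	}
	return 0;
}
```

The by-value copy is cheap (a few bytes) and removes any dependency on the wrapper's lifetime after the send.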
[PATCH] opensm: Changed #if to #ifdef when using ENABLE_OSM_PERF_MGR_PROFILE
If some plugins include opensm/osm_madw.h, ENABLE_OSM_PERF_MGR_PROFILE might
be undefined, and using #if will cause a compilation error.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 include/opensm/osm_madw.h |  2 +-
 opensm/osm_perfmgr.c      | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/opensm/osm_madw.h b/include/opensm/osm_madw.h
index afc3680..c450a1d 100644
--- a/include/opensm/osm_madw.h
+++ b/include/opensm/osm_madw.h
@@ -334,7 +334,7 @@ typedef struct osm_perfmgr_context {
 	uint64_t node_guid;
 	uint16_t port;
 	uint8_t mad_method;	/* was this a get or a set */
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 	struct timeval query_start;
 #endif
 } osm_perfmgr_context_t;
diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index d8f933e..5207245 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -70,7 +70,7 @@
 
 #define PERFMGR_INITIAL_TID_VALUE 0xcafe
 
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 struct {
 	double fastest_us;
 	double slowest_us;
@@ -557,7 +557,7 @@ static void perfmgr_query_counters(cl_map_item_t * p_map_item, void *context)
 		mad_context.perfmgr_context.node_guid = node_guid;
 		mad_context.perfmgr_context.port = port;
 		mad_context.perfmgr_context.mad_method = IB_MAD_METHOD_GET;
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 		gettimeofday(&mad_context.perfmgr_context.query_start, NULL);
 #endif
 		OSM_LOG(pm->log, OSM_LOG_VERBOSE, "Getting stats for node 0x%"
@@ -800,7 +800,7 @@ _exit:
  **/
 void osm_perfmgr_process(osm_perfmgr_t * pm)
 {
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 	struct timeval before, after;
 #endif
 
@@ -835,7 +835,7 @@ void osm_perfmgr_process(osm_perfmgr_t * pm)
 		CL_PLOCK_RELEASE(pm->sm->p_lock);
 	}
 
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 	gettimeofday(&before, NULL);
 #endif
 	/* With the global lock held, collect the node guids */
@@ -853,7 +853,7 @@ void osm_perfmgr_process(osm_perfmgr_t * pm)
 	/* clean out any nodes found to be removed during the sweep */
 	remove_marked_nodes(pm);
 
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 	/* spin on outstanding queries */
 	while (pm->outstanding_queries > 0)
 		cl_event_wait_on(&pm->sig_sweep, 1000, TRUE);
@@ -1333,7 +1333,7 @@ static void pc_recv_process(void *context, void *data)
 
 	perfmgr_check_overflow(pm, p_mon_node, pkey_ix, port, wire_read);
 
-#if ENABLE_OSM_PERF_MGR_PROFILE
+#ifdef ENABLE_OSM_PERF_MGR_PROFILE
 	do {
 		struct timeval proc_time;
 		gettimeofday(&proc_time, NULL);
--
1.7.11.7
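Standard C treats an undefined identifier inside #if as 0, so an error like the one the commit message mentions would typically come from warning flags such as -Wundef combined with -Werror, or from the macro being defined with an empty value; #ifdef only asks whether the macro is defined and sidesteps both. A minimal sketch, using a hypothetical macro ENABLE_SKETCH_PROFILE in place of ENABLE_OSM_PERF_MGR_PROFILE:

```c
#include <sys/time.h>

/* ENABLE_SKETCH_PROFILE is hypothetical. A translation unit that never
 * saw the build's config header compiles cleanly and simply leaves the
 * profiling members and code out. */
struct sketch_perf_ctx {
	unsigned long long node_guid;
	unsigned short port;
#ifdef ENABLE_SKETCH_PROFILE
	struct timeval query_start;	/* only present in profiling builds */
#endif
};

static int sketch_profiling_enabled(void)
{
#ifdef ENABLE_SKETCH_PROFILE
	return 1;
#else
	return 0;
#endif
}
```

One consequence worth keeping in mind: with #ifdef, a plugin and OpenSM built with different definitions would disagree on the struct layout, so both sides should see the same configuration.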
[PATCH] opensm/osm_ucast_ftree.c: Fix unranked nodes bug in FTree
From: Shlomi Nimrodi shlo...@mellanox.com

When some nodes were left unranked (for example, nodes whose ports are all
unhealthy), the SM crashed. Therefore, unranked switches, and HCAs that are
connected only to unranked switches, need to be removed from the ftree
structure.

Signed-off-by: Shlomi Nimrodi shlo...@mellanox.com
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_ucast_ftree.c | 217
 1 file changed, 176 insertions(+), 41 deletions(-)

diff --git a/opensm/osm_ucast_ftree.c b/opensm/osm_ucast_ftree.c
index d58fff1..8363bd2 100644
--- a/opensm/osm_ucast_ftree.c
+++ b/opensm/osm_ucast_ftree.c
@@ -556,18 +556,21 @@ static ftree_sw_t *sw_create(IN ftree_fabric_t * p_ftree,
 		   sizeof(ftree_port_group_t *));
 	if (p_sw->down_port_groups == NULL)
 		goto FREE_P_SW;
+	memset(p_sw->down_port_groups, 0, ports_num * sizeof(ftree_port_group_t *));
 
 	p_sw->up_port_groups = (ftree_port_group_t **)
 	    malloc(ports_num *
 		   sizeof(ftree_port_group_t *));
 	if (p_sw->up_port_groups == NULL)
 		goto FREE_DOWN;
+	memset(p_sw->up_port_groups, 0, ports_num * sizeof(ftree_port_group_t *));
 
 	p_sw->sibling_port_groups = (ftree_port_group_t **)
 	    malloc(ports_num *
 		   sizeof(ftree_port_group_t *));
 	if (p_sw->sibling_port_groups == NULL)
 		goto FREE_UP;
+	memset(p_sw->sibling_port_groups, 0, ports_num * sizeof(ftree_port_group_t *));
 
 	/* initialize lft buffer */
 	memset(p_osm_sw->new_lft, OSM_NO_PATH, p_osm_sw->lft_size);
@@ -807,6 +810,8 @@ static ftree_hca_t *hca_create(IN osm_node_t * p_osm_node)
 		free(p_hca);
 		return NULL;
 	}
+	memset(p_hca->up_port_groups, 0, osm_node_get_num_physp(p_hca->p_osm_node) *
+	       sizeof(ftree_port_group_t *));
 	p_hca->up_port_groups_num = 0;
 	return p_hca;
 }
@@ -1757,11 +1762,11 @@ static boolean_t fabric_validate_topology(IN ftree_fabric_t * p_ftree)
 		p_sw = p_next_sw;
 		p_next_sw = (ftree_sw_t *) cl_qmap_next(&p_sw->map_item);
 
-		if (!reference_sw_arr[p_sw->rank]) {
+		if (!reference_sw_arr[p_sw->rank])
 			/* This is the first switch in the current level that
 			   we're checking - use it as a reference */
 			reference_sw_arr[p_sw->rank] = p_sw;
-		} else {
+		else {
 			/* compare this switch properties to the reference switch */
 
 			if (reference_sw_arr[p_sw->rank]->up_port_groups_num !=
@@ -3254,7 +3259,8 @@ static void sw_reverse_rank(IN cl_map_item_t * const p_map_item,
 {
 	ftree_fabric_t *p_ftree = (ftree_fabric_t *) context;
 	ftree_sw_t *p_sw = (ftree_sw_t * const)p_map_item;
-	p_sw->rank = p_ftree->max_switch_rank - p_sw->rank;
+	if (p_sw->rank != 0xFFFFFFFF)
+		p_sw->rank = p_ftree->max_switch_rank - p_sw->rank;
 }
 
 /***
@@ -3517,7 +3523,8 @@ struct rank_root_cxt {
 	ftree_fabric_t *fabric;
 	cl_list_t *list;
 };
-
+/***
+ ***/
 static int rank_root_sw_by_guid(void *cxt, uint64_t guid, char *p)
 {
 	struct rank_root_cxt *c = cxt;
@@ -3538,53 +3545,63 @@ static int rank_root_sw_by_guid(void *cxt, uint64_t guid, char *p)
 
 	return 0;
 }
-
-static int fabric_rank_from_roots(IN ftree_fabric_t * p_ftree)
+/***
+ ***/
+static boolean_t fabric_load_roots(IN ftree_fabric_t * p_ftree,
+				   IN cl_list_t* p_ranking_bfs_list)
 {
 	struct rank_root_cxt context;
+	unsigned num_roots;
+
+	if (p_ranking_bfs_list) {
+		cl_list_init(p_ranking_bfs_list, 10);
+
+		/* Rank all the roots and add them to list */
+		OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_DEBUG,
+			"Fetching root nodes from file %s\n",
+			p_ftree->p_osm->subn.opt.root_guid_file);
+
+		context.fabric = p_ftree;
+		context.list = p_ranking_bfs_list;
+		if (parse_node_map(p_ftree->p_osm->subn.opt.root_guid_file,
+				   rank_root_sw_by_guid, &context)) {
+			return FALSE;
+		}
+
+		num_roots = cl_list_count(p_ranking_bfs_list);
+		if (!num_roots) {
+			OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_ERROR, "ERR AB25: "
+				"No valid roots supplied\n");
+			return FALSE;
+		}
+
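The sw_reverse_rank() change leaves unranked switches untouched instead of turning their sentinel rank into a bogus level. A minimal sketch over a plain array of ranks; it assumes the unranked sentinel is the all-ones 32-bit value 0xFFFFFFFF, and the helper and macro names are hypothetical:

```c
#include <stdint.h>

/* Assumed sentinel for switches that could not be ranked. */
#define SKETCH_UNRANKED 0xFFFFFFFFu

/* Flip ranks so that roots and leaves swap levels, skipping switches that
 * never received a rank. Without the check, max_rank - 0xFFFFFFFF would
 * wrap around to a small, valid-looking rank. */
static void sketch_reverse_ranks(uint32_t *rank, unsigned n, uint32_t max_rank)
{
	unsigned i;

	for (i = 0; i < n; i++)
		if (rank[i] != SKETCH_UNRANKED)
			rank[i] = max_rank - rank[i];
}
```

For ranks {0, 1, 2, unranked} and a maximum rank of 2, the result is {2, 1, 0, unranked}; without the guard the last entry would wrap around to 3.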
[PATCH] opensm/osm_link_mgr.c: Set AM SMSupportExtendedSpeeds bit if port supports ExtPortInfo
When updating PortInfo, we should set the AM SMSupportExtendedSpeeds bit for
ports that support ExtendedSpeeds. Otherwise, we won't be able to update the
ExtendedSpeedEnabled field.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_link_mgr.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 5271d59..9d73e74 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -102,7 +102,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	uint8_t port_num, mtu, op_vls, smsl = OSM_DEFAULT_SL;
 	boolean_t esp0 = FALSE, send_set = FALSE, send_set2 = FALSE;
 	osm_physp_t *p_remote_physp, *physp0;
-	int qdr_change = 0, fdr10_change = 0;
+	int issue_ext = 1, fdr10_change = 0;
 	int ret = 0;
 	ib_net32_t attr_mod, cap_mask;
 	boolean_t update_mkey = FALSE;
@@ -334,19 +334,8 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 				sm->p_subn->opt.
 				force_link_speed);
 		if (memcmp(&p_pi->link_speed, &p_old_pi->link_speed,
-			   sizeof(p_pi->link_speed))) {
+			   sizeof(p_pi->link_speed)))
 			send_set = TRUE;
-			/* Determine whether QDR in LSE is being changed */
-			if ((ib_port_info_get_link_speed_enabled(p_pi) &
-			     IB_LINK_SPEED_ACTIVE_10 &&
-			     !(ib_port_info_get_link_speed_enabled(p_old_pi) &
-			       IB_LINK_SPEED_ACTIVE_10)) ||
-			    ((!(ib_port_info_get_link_speed_enabled(p_pi) &
-				IB_LINK_SPEED_ACTIVE_10) &&
-			      ib_port_info_get_link_speed_enabled(p_old_pi) &
-			      IB_LINK_SPEED_ACTIVE_10)))
-				qdr_change = 1;
-		}
 	}
 
 	if (sm->p_subn->opt.fdr10 &&
@@ -377,7 +366,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	} else
 		cap_mask = p_pi->capability_mask;
 	if (!(cap_mask & IB_PORT_CAP_HAS_EXT_SPEEDS))
-		qdr_change = 0;
+		issue_ext = 0;
 
 	/* Do peer ports support extended link speeds ? */
 	if (port_num != 0 && p_remote_physp) {
@@ -462,7 +451,7 @@ Send:
 		goto Exit;
 
 	attr_mod = cl_hton32(port_num);
-	if (qdr_change)
+	if (issue_ext)
 		attr_mod |= cl_hton32(1 << 31);	/* AM SMSupportExtendedSpeeds */
 	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
 			     payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO,
--
1.7.11.7
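The attribute modifier sent with PortInfo(Set) combines the port number with the SMSupportExtendedSpeeds flag in bit 31. A minimal sketch of that composition, with htonl() standing in for cl_hton32() and an unsigned shift to avoid shifting into the sign bit of an int; the helper name is hypothetical:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Build the PortInfo(Set) attribute modifier in network byte order: the
 * low bits carry the port number and bit 31 is the AM
 * SMSupportExtendedSpeeds flag. */
static uint32_t sketch_portinfo_attr_mod(uint8_t port_num, int issue_ext)
{
	uint32_t attr_mod = htonl(port_num);

	if (issue_ext)
		attr_mod |= htonl(UINT32_C(1) << 31);
	return attr_mod;
}
```

For port 5, the modifier is 0x00000005 without the flag and 0x80000005 with it (both in host order after ntohl()).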
[PATCH] opensm/osm_link_mgr.c: Fix sending PortInfo(Set) with AM SMSupportExtendedSpeeds bit set for switch base port 0
link_mgr mistakenly assumes that the port supports extended speeds when
setting AM SMSupportExtendedSpeeds.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_link_mgr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 9d73e74..26671bf 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -102,7 +102,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	uint8_t port_num, mtu, op_vls, smsl = OSM_DEFAULT_SL;
 	boolean_t esp0 = FALSE, send_set = FALSE, send_set2 = FALSE;
 	osm_physp_t *p_remote_physp, *physp0;
-	int issue_ext = 1, fdr10_change = 0;
+	int issue_ext = 0, fdr10_change = 0;
 	int ret = 0;
 	ib_net32_t attr_mod, cap_mask;
 	boolean_t update_mkey = FALSE;
@@ -365,8 +365,8 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 		cap_mask = physp0->port_info.capability_mask;
 	} else
 		cap_mask = p_pi->capability_mask;
-	if (!(cap_mask & IB_PORT_CAP_HAS_EXT_SPEEDS))
-		issue_ext = 0;
+	if (cap_mask & IB_PORT_CAP_HAS_EXT_SPEEDS)
+		issue_ext = 1;
 
 	/* Do peer ports support extended link speeds ? */
 	if (port_num != 0 && p_remote_physp) {
--
1.7.11.7
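After this fix the flag is off by default and is switched on only when the relevant capability mask advertises extended speeds. A minimal sketch; the helper is hypothetical, the bit value 0x00004000 for IB_PORT_CAP_HAS_EXT_SPEEDS is an assumption to verify against ib_types.h, and treating the physp0 branch as the switch case is also an assumption, since the condition is outside the hunk:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Assumed bit value for the extended-speeds capability, network order. */
#define SKETCH_CAP_HAS_EXT_SPEEDS htonl(UINT32_C(0x00004000))

/* Default to "do not set the AM bit" and enable it only on positive
 * evidence. For a switch the mask is read from base port 0 (assumed to be
 * what the physp0 branch in the patch does). */
static int sketch_issue_ext(int is_switch, uint32_t port0_cap_mask,
			    uint32_t port_cap_mask)
{
	uint32_t cap_mask = is_switch ? port0_cap_mask : port_cap_mask;
	int issue_ext = 0;

	if (cap_mask & SKETCH_CAP_HAS_EXT_SPEEDS)
		issue_ext = 1;
	return issue_ext;
}
```

Defaulting to 0 means a port with no capability information at all is never told that the SM supports extended speeds, which is the safer failure mode.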
[PATCH] opensm: Revert "opensm/osm_ucast_ftree: When roots are not connected, update hop count but not lft"
This reverts commit 81dade3aeb1d5c80472a4f9fef55e9916bb38d3a.

The patch causes crashes in fat-tree routing, and it's replaced by the
following patch.

Signed-off-by: Vincent Ficet jean-vincent.fi...@bull.net
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_ucast_ftree.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/opensm/osm_ucast_ftree.c b/opensm/osm_ucast_ftree.c
index 8363bd2..c81e7a3 100644
--- a/opensm/osm_ucast_ftree.c
+++ b/opensm/osm_ucast_ftree.c
@@ -3038,10 +3038,8 @@ static void fabric_route_roots(IN ftree_fabric_t * p_ftree)
 				"through port %u\n",
 				tuple_to_str(p_sw->tuple), lid, port_num);
 
-			if (p_ftree->p_osm->subn.opt.connect_roots) {
-				/* set local lft */
-				p_sw->p_osm_sw->new_lft[lid] = port_num;
-			}
+			/* set local lft */
+			p_sw->p_osm_sw->new_lft[lid] = port_num;
 
 			/*
 			 * Set local min hop table.
@@ -4221,10 +4219,12 @@ static int do_routing(IN void *context)
 		"Filling switch forwarding tables for switch-to-switch paths\n");
 	fabric_route_to_switches(p_ftree);
 
-	OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
-		"Connecting switches that are unreachable within "
-		"Up/Down rules\n");
-	fabric_route_roots(p_ftree);
+	if (p_ftree->p_osm->subn.opt.connect_roots) {
+		OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
+			"Connecting switches that are unreachable within "
+			"Up/Down rules\n");
+		fabric_route_roots(p_ftree);
+	}
 
 	/* for each switch, set its fwd table */
 	cl_qmap_apply_func(&p_ftree->sw_tbl, set_sw_fwd_table, (void *)p_ftree);
--
1.7.11.7
[PATCH] opensm/osm_ucast_ftree.c: fix opensm segfault in osm_ucast_ftree.c
From: Vincent Ficet jean-vincent.fi...@bull.net

Instead of unconditionally assigning valid port numbers to the LFT, it now
leaves 'holes' filled with the default value OSM_NO_PATH (=255). Accessing
these invalid/unassigned LFT entries yields invalid addresses starting with
0xff, as in the above example.

After reverting commit 81dade3aeb1d5c80472a4f9fef55e9916bb38d3a and applying
the above patch, we have not observed any multicast loop nor any
segmentation fault.

Reproducing this bug using ibsim is easy:
1/ Load a fat-tree topology
2/ Unlink a leaf switch
3/ Start opensm (configured with the ftree routing engine)
4/ Relink the leaf switch

Signed-off-by: Vincent Ficet jean-vincent.fi...@bull.net
Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_ucast_ftree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/opensm/osm_ucast_ftree.c b/opensm/osm_ucast_ftree.c
index c81e7a3..932ec6a 100644
--- a/opensm/osm_ucast_ftree.c
+++ b/opensm/osm_ucast_ftree.c
@@ -4171,6 +4171,9 @@ static int construct_fabric(IN void *context)
 	OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
 		"Max LID in switch LFTs: %u\n", p_ftree->lft_max_lid);
 
+	/* Build the full lid matrices needed for multicast routing */
+	osm_ucast_mgr_build_lid_matrices(&p_ftree->p_osm->sm.ucast_mgr);
+
 Exit:
 	if (status != 0) {
 		OSM_LOG(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
--
1.7.11.7
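An LFT with holes is safe only if every reader treats OSM_NO_PATH (255) as "no route" rather than as a port number to index with. A hypothetical lookup helper illustrating that check; the names are invented, and the real multicast code consults the LID matrices that this patch builds:

```c
#include <stdint.h>

#define SKETCH_NO_PATH 0xFF	/* same value as OSM_NO_PATH (255) */

/* Resolve the egress port for a LID, refusing to return an unassigned
 * entry or an out-of-range port that a caller might use as an index. */
static int sketch_lft_port(const uint8_t *lft, unsigned lft_size,
			   uint16_t lid, unsigned num_ports)
{
	uint8_t port;

	if (lid >= lft_size)
		return -1;
	port = lft[lid];
	if (port == SKETCH_NO_PATH || port >= num_ports)
		return -1;	/* hole in the table: no route */
	return port;
}
```

A caller that gets -1 must skip the LID rather than index a per-port array with 255, which is the kind of access that produces the 0xff... addresses described in the commit message.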
[PATCH] opensm/osm_sm_mad_ctrl.c: Upon receiving trap repress we should decrease qp0_mads_outstanding_on_wire
The current code causes the SM to get stuck after the SM priority has been
changed a few times.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/osm_sm_mad_ctrl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
index c384eca..11195e8 100644
--- a/opensm/osm_sm_mad_ctrl.c
+++ b/opensm/osm_sm_mad_ctrl.c
@@ -534,6 +534,7 @@ static void sm_mad_ctrl_process_trap_repress(IN osm_sm_mad_ctrl_t * p_ctrl,
 	 */
 	switch (p_smp->attr_id) {
 	case IB_MAD_ATTR_NOTICE:
+		sm_mad_ctrl_update_wire_stats(p_ctrl);
 		sm_mad_ctrl_retire_trans_mad(p_ctrl, p_madw);
 		break;
 	default:
--
1.7.11.7
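The bug is an unbalanced counter: if one retire path skips the decrement, the count only grows and the sender eventually waits forever, which is consistent with the hang described in the commit message. A minimal model with a hypothetical counter standing in for qp0_mads_outstanding_on_wire:

```c
/* Every MAD put on the wire takes a slot, and every path that retires a
 * MAD must give the slot back. */
struct sketch_wire {
	unsigned outstanding;
	unsigned max;
};

static int sketch_try_send(struct sketch_wire *w)
{
	if (w->outstanding >= w->max)
		return 0;	/* sender would have to wait for a free slot */
	w->outstanding++;
	return 1;
}

/* Called for ordinary responses and, after the fix, also when a
 * TrapRepress retires a Notice that the SM sent. */
static void sketch_retire(struct sketch_wire *w)
{
	if (w->outstanding > 0)
		w->outstanding--;
}
```

With a limit of two, two Notices fill the wire; without a retire on TrapRepress the third send would block indefinitely, while with it a slot frees up again.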
[PATCH] opensm: Add physp_p discovery count support
In the cases below, we won't have up-to-date PortInfo for one of the link's
ports, so we must drop the link:
1. When we receive timeouts for PortInfo(Get) MADs.
2. When a port becomes LinkUp during discovery and the link's peer was
   discovered first in the DOWN state.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 include/opensm/osm_node.h  |  6
 opensm/osm_drop_mgr.c      | 47
 opensm/osm_node.c          |  9
 opensm/osm_node_info_rcv.c | 13
 opensm/osm_port_info_rcv.c | 11
 opensm/osm_state_mgr.c     |  2
 6 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/include/opensm/osm_node.h b/include/opensm/osm_node.h
index 482ed89..dd1c5f9 100644
--- a/include/opensm/osm_node.h
+++ b/include/opensm/osm_node.h
@@ -102,6 +102,7 @@ typedef struct osm_node {
 	uint32_t discovery_count;
 	uint32_t physp_tbl_size;
 	char *print_desc;
+	uint8_t *physp_discovered;
 	osm_physp_t physp_table[1];
 } osm_node_t;
 /*
@@ -133,6 +134,11 @@ typedef struct osm_node {
 *	print_desc
 *		A printable version of the node description.
 *
+*	physp_discovered
+*		Array of physp_discovered objects for all ports of this node.
+*		Each object indiactes whether the port has been discovered
+*		during the sweep or not. 1 means that the port had been discovered.
+*
 *	phsyp_table
 *		Array of physical port objects belonging to this node.
 *		Index is contiguous by local port number.
diff --git a/opensm/osm_drop_mgr.c b/opensm/osm_drop_mgr.c
index 5e5f1b1..b309273 100644
--- a/opensm/osm_drop_mgr.c
+++ b/opensm/osm_drop_mgr.c
@@ -378,9 +378,11 @@ static boolean_t drop_mgr_process_node(osm_sm_t * sm, IN osm_node_t * p_node)
 static void drop_mgr_check_node(osm_sm_t * sm, IN osm_node_t * p_node)
 {
 	ib_net64_t node_guid;
-	osm_physp_t *p_physp;
+	osm_physp_t *p_physp, *p_remote_physp;
+	osm_node_t *p_remote_node;
 	osm_port_t *p_port;
 	ib_net64_t port_guid;
+	uint8_t port_num, remote_port_num;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -428,7 +430,7 @@ static void drop_mgr_check_node(osm_sm_t * sm, IN osm_node_t * p_node)
 		goto Exit;
 	}
 
-	if (p_port->discovery_count == 0) {
+	if (!p_node->physp_discovered[0]) {
 		OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
 			"Node 0x%016" PRIx64 " port has discovery count zero\n",
 			cl_ntoh64(node_guid));
@@ -437,6 +439,47 @@ static void drop_mgr_check_node(osm_sm_t * sm, IN osm_node_t * p_node)
 		goto Exit;
 	}
 
+	/*
+	 * Unlink all ports that havn't been discovered during the last sweep.
+	 * Optimization: Skip the check if discovered all the ports of the switch.
+	 */
+	if (p_port->discovery_count < p_node->physp_tbl_size) {
+		for (port_num = 1; port_num < p_node->physp_tbl_size; port_num++) {
+			if (!p_node->physp_discovered[port_num]) {
+				p_physp = osm_node_get_physp_ptr(p_node, port_num);
+				if (!p_physp)
+					continue;
+				p_remote_physp = osm_physp_get_remote(p_physp);
+				if (!p_remote_physp)
+					continue;
+
+				p_remote_node =
+				    osm_physp_get_node_ptr(p_remote_physp);
+				remote_port_num =
+				    osm_physp_get_port_num(p_remote_physp);
+
+				OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+					"Unlinking local node 0x%" PRIx64
+					", port %u"
+					"\n\t\t\t\tand remote node 0x%" PRIx64
+					", port %u\n due to missing PortInfo",
+					cl_ntoh64(osm_node_get_node_guid
+						  (p_node)), port_num,
+					cl_ntoh64(osm_node_get_node_guid
+						  (p_remote_node)),
+					remote_port_num);
+
+				if (sm->ucast_mgr.cache_valid)
+					osm_ucast_cache_add_link(&sm->ucast_mgr,
+								 p_physp,
+								 p_remote_physp);
+
+				osm_node_unlink(p_node, (uint8_t) port_num,
+						p_remote_node,
+						(uint8_t) remote_port_num);
+			}
+		}
+
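The per-port discovered flags turn the drop manager's check into a simple scan over ports 1..N-1. A minimal sketch with a hypothetical, simplified node type; clearing a linked flag stands in for osm_node_unlink(), and the ucast cache update is omitted:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PORTS 8

/* physp_discovered[p] is set to 1 when PortInfo for port p arrives during
 * the current sweep; index 0 is the base port. */
struct sketch_node {
	uint8_t num_ports;		/* like physp_tbl_size: ports 0..num_ports-1 */
	uint8_t physp_discovered[SKETCH_MAX_PORTS];
	int linked[SKETCH_MAX_PORTS];	/* 1 if the port has a remote peer */
};

static void sketch_start_sweep(struct sketch_node *n)
{
	memset(n->physp_discovered, 0, sizeof(n->physp_discovered));
}

/* Unlink every external port that still has a peer but got no PortInfo
 * this sweep; returns how many links were dropped. */
static unsigned sketch_drop_undiscovered(struct sketch_node *n)
{
	unsigned port, dropped = 0;

	for (port = 1; port < n->num_ports; port++) {
		if (n->physp_discovered[port] || !n->linked[port])
			continue;
		n->linked[port] = 0;	/* stands in for osm_node_unlink() */
		dropped++;
	}
	return dropped;
}
```

If only ports 0 and 1 of a four-port node answer during a sweep, the links on ports 2 and 3 are dropped and port 1 keeps its link.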
Re: [PATCH] librdmacm: Work-around kernel bug returning uid = 0
Thanks a lot Sean! Looks good to me, just a few minor nits below.

On 02/02/2013 08:16 AM, sean.he...@intel.com wrote:
From: Sean Hefty sean.he...@intel.com

Older kernels have a bug where it can report an event with the uid set to 0. The librdmacm crashes when casting the uid to an rdma_cm_id and dereferencing the NULL pointer.

There are a limited number of events where this can occur and in most cases it's safe to simply discard the event. (This is what the kernel does anyway.) However, it's possible for us to process an RDMA_CM_EVENT_ESTABLISHED event with the uid set to 0. (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.) Although it's rare for this to occur, it does in fact happen in practice.

To work-around the kernel bug, when the uid of an established event is set to 0, we first try to locate the correct user space id based on related data before discarding the event.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 src/cma.c | 49 -
 1 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/src/cma.c b/src/cma.c
index ff9b426..7a9f6dc 100755
--- a/src/cma.c
+++ b/src/cma.c
@@ -50,6 +50,7 @@
 #include <netdb.h>
 #include "cma.h"
+#include "indexer.h"
 #include <infiniband/driver.h>
 #include <infiniband/marshall.h>
 #include <rdma/rdma_cma.h>
@@ -123,6 +124,8 @@ static int cma_dev_cnt;
 static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
 static int abi_ver = RDMA_USER_CM_MAX_ABI_VERSION;
 int af_ib_support;
+static struct index_map ucma_idm;
+static fastlock_t idm_lock;
 
 static void ucma_cleanup(void)
 {
@@ -214,6 +217,7 @@ int ucma_init(void)
 		return 0;
 	}
 
+	fastlock_init(&idm_lock);
 	ret = check_abi_version();
 	if (ret)
 		goto err1;
@@ -376,8 +380,27 @@ static void ucma_put_device(struct cma_device *cma_dev)
 	pthread_mutex_unlock(&mut);
 }

Isn't there a fastlock_destroy(&idm_lock) missing in err1? And another one in ucma_cleanup()?
+static void ucma_insert_id(struct cma_id_private *id_priv)
+{
+	fastlock_acquire(&idm_lock);
+	idm_set(&ucma_idm, id_priv->handle, id_priv);
+	fastlock_release(&idm_lock);
+}
+
+static void ucma_remove_id(struct cma_id_private *id_priv)
+{
+	if (id_priv->handle <= IDX_MAX_INDEX)
+		idm_clear(&ucma_idm, id_priv->handle);
+}
+
+static struct cma_id_private *ucma_lookup_id(int handle)
+{
+	return idm_lookup(&ucma_idm, handle);
+}
+
 static void ucma_free_id(struct cma_id_private *id_priv)
 {
+	ucma_remove_id(id_priv);
 	if (id_priv->cma_dev)
 		ucma_put_device(id_priv->cma_dev);
 	pthread_cond_destroy(&id_priv->cond);
@@ -406,6 +429,7 @@ static struct cma_id_private *ucma_alloc_id(struct rdma_event_channel *channel,
 	id_priv->id.context = context;
 	id_priv->id.ps = ps;
 	id_priv->id.qp_type = qp_type;
+	id_priv->handle = 0x;
 
 	if (!channel) {
 		id_priv->id.channel = rdma_create_event_channel();
@@ -455,6 +479,7 @@ static int rdma_create_id2(struct rdma_event_channel *channel,
 	VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp);
 
 	id_priv->handle = resp.id;
+	ucma_insert_id(id_priv);
 	*id = &id_priv->id;
 	return 0;
 
@@ -1785,6 +1810,7 @@ static int ucma_process_conn_req(struct cma_event *evt,
 	evt->event.listen_id = &evt->id_priv->id;
 	evt->event.id = &id_priv->id;
 	id_priv->handle = handle;
+	ucma_insert_id(id_priv);
 	id_priv->initiator_depth = evt->event.param.conn.initiator_depth;
 	id_priv->responder_resources =
 		evt->event.param.conn.responder_resources;
@@ -1916,7 +1942,28 @@ retry:
 	VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp);
 
 	evt->event.event = resp.event;
-	evt->id_priv = (void *) (uintptr_t) resp.uid;
+	/*
+	 * We should have a non-zero uid, except for connection requests.
+	 * But a bug in older kernels can report a uid 0.  Work-around this
+	 * issue by looking up the cma_id based on the kernel's id when the
+	 * uid is 0 and we're processing a connection established event.
+	 * In all other cases, if the uid is 0, we discard the event, like
+	 * the kernel should have done.
+	 */
+	if (resp.uid) {
+		evt->id_priv = (void *) (uintptr_t) resp.uid;
+	} else {
+		evt->id_priv = ucma_lookup_id(resp.id);
+		if (!evt->id_priv) {
+			fprintf(stderr, PFX "Warning: discarding unmatched "
+				"event - rdma_destroy_id may hang.\n");

Daemons tend to disconnect from stderr, can we change this to syslog(LOG_WARNING, ...)?

+			goto retry;
+		}
+		if (resp.event != RDMA_CM_EVENT_ESTABLISHED) {
+			ucma_complete_event(evt->id_priv);
+			goto retry;
+		}
+	}
 	evt->event.id =
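The event-dispatch rule described in the patch comment can be modeled in a few lines. This is a simplified, hypothetical sketch and not librdmacm code.

- `toy_idm[]` stands in for the `ucma_idm` index map.
- `events_completed` stands in for the bookkeeping that `ucma_complete_event()` performs.
- The enum values are placeholders for the real `RDMA_CM_EVENT_*` codes.

A non-zero uid is trusted as before. A zero uid falls back to a lookup by the kernel's id. Only an ESTABLISHED event is then delivered; any other event is accounted for and discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder event codes; not the RDMA_CM_EVENT_* values. */
enum toy_event {
	TOY_EVENT_CONNECT_REQUEST,
	TOY_EVENT_ESTABLISHED,
	TOY_EVENT_DISCONNECTED,
};

struct toy_id {
	uint32_t handle;		/* kernel-assigned id, like id_priv->handle */
	int events_completed;		/* bumped instead of ucma_complete_event() */
};

struct toy_resp {
	uint64_t uid;			/* user pointer echoed back by the kernel */
	uint32_t id;			/* the kernel's own id for the cm_id */
	enum toy_event event;
};

#define TOY_MAX_IDS 16
static struct toy_id *toy_idm[TOY_MAX_IDS];	/* kernel id -> user id */

static void toy_insert_id(struct toy_id *id)
{
	if (id->handle < TOY_MAX_IDS)
		toy_idm[id->handle] = id;
}

static struct toy_id *toy_lookup_id(uint32_t handle)
{
	return handle < TOY_MAX_IDS ? toy_idm[handle] : NULL;
}

/*
 * Return the id an event should be delivered to, or NULL when the event
 * must be discarded (the caller would then read the next event).
 */
static struct toy_id *toy_resolve_event(const struct toy_resp *resp)
{
	struct toy_id *id;

	if (resp->uid)		/* normal case: trust the echoed pointer */
		return (struct toy_id *)(uintptr_t)resp->uid;

	id = toy_lookup_id(resp->id);	/* buggy kernel: use its id instead */
	if (!id)
		return NULL;		/* unmatched event: discard */
	if (resp->event != TOY_EVENT_ESTABLISHED) {
		id->events_completed++;	/* account for it, then discard */
		return NULL;
	}
	return id;
}
```

The lookup table must be populated wherever a kernel id becomes known. In the patch, that happens in both `rdma_create_id2()` and `ucma_process_conn_req()`.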
Re: [PATCH] librdmacm: Work-around kernel bug returning uid = 0
On 02/03/2013 08:24 AM, Or Gerlitz wrote:
On 02/02/2013 09:16, sean.he...@intel.com wrote:
Older kernels have a bug where it can report an event with the uid set to 0. The librdmacm crashes when casting the uid to an rdma_cm_id and dereferencing the NULL pointer. There are a limited number of events where this can occur and in most cases it's safe to simply discard the event. (This is what the kernel does anyway.) However, it's possible for us to process an RDMA_CM_EVENT_ESTABLISHED event with the uid set to 0. (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

Hi Sean,

It would also be worthwhile to nail the fix to the root cause a little further, e.g. push kernel commit 418edaaba96 to -stable.

Agreed. I opened RHEL bugzilla #719749 (somehow marked as private, no idea why) and hope Red Hat is going to backport it into their kernels soon.

Cheers,
Bernd

-- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: Error message when trying to use Infiniband virtual functions in virtual machine
On Mon, 2013-02-04 at 11:23 +, Or Gerlitz wrote:
Should I try to launch VM in an older version of qemu-kvm ? dmesg when using qemu-kvm 1.2.2-6 :

You need to use a release/drop of qemu which has these fixes,
http://marc.info/?l=qemu-devel&m=135835580814935&w=2
http://marc.info/?l=qemu-devel&m=135836502119460&w=2

Alex, what stable branch/release would you recommend to use?

These are available in the qemu 1.3.1 stable release
http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg05403.html
And they're in the current development tree which is stabilizing for the 1.4.0 release.

Thanks,
Alex

From: Mathis GAVILLON [jbibo...@gmail.com]
Sent: 04 February 2013 11:42
To: Jack Morgenstein
Cc: Or Gerlitz
Subject: Re: Fwd: Error message when trying to use Infiniband virtual functions in virtual machine

The solution you proposed hasn't changed anything. The error message in the VM and Disabling IRQ #16 is already present in log files. Should I try to launch VM in an older version of qemu-kvm ?

dmesg when using qemu-kvm 1.2.2-6 :
[ 1914.152816] pci :82:00.1: enabling device ( - 0002)
[ 1914.156091] mlx4_core :82:00.0: FLR event for slave: 1
[ 1914.156101] mlx4_core :82:00.0: mlx4_handle_slave_flr
[ 1914.156103] mlx4_core :82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 1914.878353] assign device 0:82:0.1
[ 1915.080617] mlx4_core :82:00.0: FLR event for slave: 1
[ 1915.080651] mlx4_core :82:00.0: mlx4_handle_slave_flr
[ 1915.080652] mlx4_core :82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 1932.546300] mlx4_core :82:00.0: Received reset from slave:1
[ 1932.559015] pci :82:00.1: irq 178 for MSI/MSI-X
[ 1932.576928] pci :82:00.1: irq 178 for MSI/MSI-X
[ 1932.576951] pci :82:00.1: irq 179 for MSI/MSI-X
[ 1933.017996] irq 16: nobody cared (try booting with the irqpoll option)
[ 1933.018000] Pid: 0, comm: swapper/3 Not tainted 3.8.0-rc2 #1
[ 1933.018001] Call Trace:
[ 1933.018002] IRQ [810dc41d] __report_bad_irq+0x3d/0xe0
[ 1933.018015] [810dc713] note_interrupt+0x1a3/0x1f0
[ 1933.018020] [814362d0] ? userspace_cpufreq_notifier+0x90/0x90
[ 1933.018023] [810d9e69] handle_irq_event_percpu+0xb9/0x200
[ 1933.018026] [814362d0] ? userspace_cpufreq_notifier+0x90/0x90
[ 1933.018029] [810d9ff2] handle_irq_event+0x42/0x70
[ 1933.018031] [810dd259] handle_fasteoi_irq+0x59/0x100
[ 1933.018036] [8101616f] handle_irq+0xbf/0x150
[ 1933.018041] [8155b792] ? __atomic_notifier_call_chain+0x12/0x20
[ 1933.018044] [8155b7b6] ? atomic_notifier_call_chain+0x16/0x20
[ 1933.018047] [8156188a] do_IRQ+0x5a/0xe0
[ 1933.018050] [81557aad] common_interrupt+0x6d/0x6d
[ 1933.018051] EOI [81436d00] ? cpuidle_wrap_enter+0x50/0xa0
[ 1933.018057] [81436cf9] ? cpuidle_wrap_enter+0x49/0xa0
[ 1933.018060] [81436d60] cpuidle_enter_tk+0x10/0x20
[ 1933.018063] [8143698b] cpuidle_idle_call+0xbb/0x280
[ 1933.018067] [8101d2ef] cpu_idle+0xaf/0x120
[ 1933.018071] [81544bb9] start_secondary+0x255/0x257
[ 1933.018072] handlers:
[ 1933.018075] [813cea90] usb_hcd_irq
[ 1933.018077] Disabling IRQ #16
[ 1942.678990] mlx4_core :82:00.0: Received reset from slave:1

dmesg when using qemu-kvm 1.3.0-5 :
[ 181.886165] pci :82:00.1: enabling device ( - 0002)
[ 181.889480] mlx4_core :82:00.0: FLR event for slave: 1
[ 181.889517] mlx4_core :82:00.0: mlx4_handle_slave_flr
[ 181.889519] mlx4_core :82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 182.622378] assign device 0:82:0.1
[ 182.798342] mlx4_core :82:00.0: FLR event for slave: 1
[ 182.798352] mlx4_core :82:00.0: mlx4_handle_slave_flr
[ 182.798354] mlx4_core :82:00.0: mlx4_handle_slave_flr: clean slave: 1
[ 202.515121] mlx4_core :82:00.0: Received reset from slave:1
[ 202.527857] pci :82:00.1: irq 178 for MSI/MSI-X
[ 202.549725] pci :82:00.1: irq 178 for MSI/MSI-X
[ 202.549740] pci :82:00.1: irq 179 for MSI/MSI-X
[ 202.991521] irq 16: nobody cared (try booting with the irqpoll option)
[ 202.991526] Pid: 0, comm: swapper/3 Not tainted 3.8.0-rc2 #1
[ 202.991527] Call Trace:
[ 202.991528] IRQ [810dc41d] __report_bad_irq+0x3d/0xe0
[ 202.991541] [810dc713] note_interrupt+0x1a3/0x1f0
[ 202.991546] [814362d0] ? userspace_cpufreq_notifier+0x90/0x90
[ 202.991550] [810d9e69] handle_irq_event_percpu+0xb9/0x200
[ 202.991553] [814362d0] ? userspace_cpufreq_notifier+0x90/0x90
[ 202.991555] [810d9ff2] handle_irq_event+0x42/0x70
[ 202.991558] [810dd259] handle_fasteoi_irq+0x59/0x100
[ 202.991562] [8101616f] handle_irq+0xbf/0x150
[ 202.991567] [8155b792] ? __atomic_notifier_call_chain+0x12/0x20
[ 202.991570]
[PATCH] ib/ipoib: fix CM crash after commit b13912bbb4a2
After commit b13912bbb4a2 ("IPoIB: Call skb_dst_drop() once skb is enqueued for sending"), running a multithreaded iperf in connected mode for a long time results in a crash, for example:

iperf -c <IP> -P 16 -t 3600

The solution is to always perform skb_orphan() and skb_dst_drop() before the transmission. If an error occurs, this is no different from the regular case, where dev_kfree_skb_any() in the completion path is assumed to run after these two routines.

Signed-off-by: Shlomo Pongratz shlo...@mellanox.com
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 03103d2..67b0c1d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -741,6 +741,9 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 
 	tx_req->mapping = addr;
 
+	skb_orphan(skb);
+	skb_dst_drop(skb);
+
 	rc = post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1),
 		       addr, skb->len);
 	if (unlikely(rc)) {
@@ -752,9 +755,6 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		dev->trans_start = jiffies;
 		++tx->tx_head;
 
-		skb_orphan(skb);
-		skb_dst_drop(skb);
-
 		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index a1bca70..2cfa76f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -600,6 +600,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		netif_stop_queue(dev);
 	}
 
+	skb_orphan(skb);
+	skb_dst_drop(skb);
+
 	rc = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
 		       address->ah, qpn, tx_req, phead, hlen);
 	if (unlikely(rc)) {
@@ -615,9 +618,6 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		address->last_send = priv->tx_head;
 		++priv->tx_head;
-
-		skb_orphan(skb);
-		skb_dst_drop(skb);
 	}
 
 	if (unlikely(priv->tx_outstanding > MAX_SEND_CQE))
-- 
1.7.1
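The changelog does not spell out the failure mode. The single-threaded toy model below assumes one plausible reading: once post_send() succeeds, the send completion may already have freed the skb. Any skb_orphan()/skb_dst_drop() after that point would then touch freed memory. The worst case is simulated by completing synchronously inside `toy_post_send()`. All names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff, reduced to bookkeeping flags. */
struct toy_skb {
	bool freed;			/* set by the completion path */
	bool orphaned;			/* stands in for skb_orphan() */
	bool dst_dropped;		/* stands in for skb_dst_drop() */
	bool touched_after_free;	/* records a use-after-free */
};

static void toy_touch(struct toy_skb *skb)
{
	if (skb->freed)
		skb->touched_after_free = true;
}

static void toy_orphan(struct toy_skb *skb)
{
	toy_touch(skb);
	skb->orphaned = true;
}

static void toy_dst_drop(struct toy_skb *skb)
{
	toy_touch(skb);
	skb->dst_dropped = true;
}

/* Completion path: frees the buffer (dev_kfree_skb_any() in the driver). */
static void toy_tx_complete(struct toy_skb *skb)
{
	skb->freed = true;
}

/* Worst case: the send completes before post_send() even returns. */
static int toy_post_send(struct toy_skb *skb)
{
	toy_tx_complete(skb);
	return 0;
}

/* Order after b13912bbb4a2: clean up only after a successful post. */
static void toy_send_old(struct toy_skb *skb)
{
	if (toy_post_send(skb) == 0) {
		toy_orphan(skb);
		toy_dst_drop(skb);
	}
}

/* Order in this patch: clean up first, then hand the skb to the HCA. */
static void toy_send_fixed(struct toy_skb *skb)
{
	toy_orphan(skb);
	toy_dst_drop(skb);
	(void)toy_post_send(skb);
}
```

With the old order, the model records a touch after free. With the fixed order, the skb is orphaned and its dst dropped before anything can free it.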
Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script
On 09:20 Thu 31 Jan , Doug Ledford wrote:
On 01/31/13 02:21, Alex Netes wrote:
On 14:24 Wed 30 Jan , Doug Ledford wrote:
On 1/30/2013 2:12 PM, Bart Van Assche wrote:
On 01/30/13 18:48, Doug Ledford wrote:
On 1/30/2013 11:00 AM, Bart Van Assche wrote:
Which convention is followed for other packages ? This is what I found in the Fedora 18 iscsi-initiator-utils package (http://be.mirror.eurid.eu/fedora/linux/releases/18/Fedora/source/SRPMS/i/iscsi-initiator-utils-6.2.0.872-19.fc18.src.rpm):
* iscsid.init: Default-Start: 3 4 5
* iscsi-initiator-utils.spec:

Okay, first off, any package that still uses the SysV initscripts as of Fedora 18 is not what I would call a package that is keeping up with the Fedora packaging guidelines or Fedora technologies. As such, I'm not really sure you want to use it as an example of a good package. However, that being said, you will note in this spec file that the iSCSI initiator package does exactly what you removed, or suggested be removed, from the opensmd spec file. It unilaterally adds the initscript to the system. The default start/stop settings are different, but the add action is the same. All initscripts should be added to the system, regardless of their default start/stop settings, and the default-start and default-stop should be used to control *how* they are added by default, and chkconfig --level .* scriptname [on|off] should be used to control whether or not they are on or off differently than their default settings.

Thanks for the detailed reply. Regarding the purpose of the patch at the start of this thread: do you know whether it is LSB-compliant to use Default-Start: null or should Default-Start be left out entirely in order not to create the start links ? See e.g. http://refspecs.linuxbase.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/initscrcomconv.html.

Bart.

I've always used Default-start: without any listed levels, but I haven't really tested it either and the spec is not specific about this particular possible configuration.

It's how the script was defined before. The behavior on RHEL and SLES is different when not specifying Default-start:. On RHEL, `chkconfig opensmd on` adds the service to the default runlevels: 2 3 4 5. While on SLES it doesn't.

That sounds like a bug in SLES to be honest. chkconfig opensmd on without --levels should follow the Default-Start: item (just like chkconfig add opensmd).

I think that RHEL and SLES implemented chkconfig differently. The chkconfig man page on RHEL says: "By default, the on and off options affect only runlevels 2, 3, 4, and 5." The SLES man page says nothing about the defaults.

It's not acceptable to load opensm by default on boot, so right now I don't see any other choice except reverting commit 01ab74450fd1227cf2dfb9219ffd697d3beb4a45 or doing something similar to what I suggested at the start of the thread.

--Alex
Re: [PATCH 25/62] infiniband/cxgb4: convert to idr_alloc()
Reviewed-by: Steve Wise sw...@opengridcomputing.com
Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script
On 02/04/13 16:36, Alex Netes wrote:
It's not acceptable to load opensm by default on boot, so I don't see other choice right now except of reverting commit 01ab74450fd1227cf2dfb9219ffd697d3beb4a45 or doing something similar of what I suggested at the start of the thread.

But why does commit 01ab744 have to be reverted? All that's needed to keep opensmd from being enabled at boot via chkconfig during RPM installation is something like the patch at the top of this thread. Unless I'm overlooking something?

Bart.
Re: [PATCH 27/62] infiniband/ipath: convert to idr_alloc()
Hello,

On Mon, Feb 04, 2013 at 04:15:52PM +, Marciniszyn, Mike wrote:
I tried the branch you indicated in the initial patch cover. When run with a qib driver, and ipoib ping of another system produces:
...
Looks like this is tripping during the arp/neighbour path resolution:

void idr_preload(gfp_t gfp_mask)
{
	/*
	 * Consuming preload buffer from non-process context breaks preload
	 * allocation guarantee.  Disallow usage from those contexts.
	 */
	WARN_ON_ONCE(in_interrupt());

Any ideas Roland?

Yeah, firewire had the same problem. It needs to conditionalize preload() if !__GFP_WAIT (no point anyway). Will send updated patches soon.

Thanks.

--
tejun
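Tejun's fix can be modeled in userspace: preload only when the caller's gfp mask allows sleeping. The flag values and functions below are hypothetical stand-ins for `__GFP_WAIT`, `idr_preload()` and `idr_preload_end()`. The counters only record what the kernel's `WARN_ON_ONCE(in_interrupt())` would catch. The shape of `toy_send_mad()` follows the v2 `send_mad()` change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real gfp bits differ. */
typedef unsigned int toy_gfp_t;
#define TOY_GFP_WAIT	0x10u			/* like __GFP_WAIT */
#define TOY_GFP_KERNEL	(TOY_GFP_WAIT | 0x01u)	/* may sleep */
#define TOY_GFP_ATOMIC	0x20u			/* must not sleep */

static int preload_calls;
static int warnings;

/* Stand-in for idr_preload(): warns when used from interrupt context. */
static void toy_idr_preload(toy_gfp_t gfp, bool in_interrupt)
{
	(void)gfp;
	if (in_interrupt)		/* mirrors WARN_ON_ONCE(in_interrupt()) */
		warnings++;
	preload_calls++;
}

static void toy_idr_preload_end(void)
{
}

/* Shape of the v2 send_mad(): preload iff the mask has the wait bit. */
static void toy_send_mad(toy_gfp_t gfp_mask, bool in_interrupt)
{
	bool preload = gfp_mask & TOY_GFP_WAIT;

	if (preload)
		toy_idr_preload(gfp_mask, in_interrupt);
	/* ... spin_lock, idr_alloc(..., GFP_NOWAIT), spin_unlock ... */
	if (preload)
		toy_idr_preload_end();
}
```

An atomic caller in interrupt context, such as the ARP/neighbour path in Mike's report, skips the preload entirely. It then relies on the `GFP_NOWAIT` allocation under the lock.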
[PATCH v2 22/62] infiniband/core: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(), which may be used from non-process context, was calling idr_preload() unconditionally. Preload iff @gfp_mask has __GFP_WAIT.

Signed-off-by: Tejun Heo t...@kernel.org
Reported-by: Marciniszyn, Mike mike.marcinis...@intel.com
Cc: Roland Dreier rol...@kernel.org
Cc: Sean Hefty sean.he...@intel.com
Cc: Hal Rosenstock hal.rosenst...@gmail.com
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/core/cm.c         | 22 +++---
 drivers/infiniband/core/cma.c        | 24 +++-
 drivers/infiniband/core/sa_query.c   | 18 ++
 drivers/infiniband/core/ucm.c        | 16
 drivers/infiniband/core/ucma.c       | 32
 drivers/infiniband/core/uverbs_cmd.c | 17 -
 6 files changed, 48 insertions(+), 81 deletions(-)

--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -382,20 +382,21 @@ static int cm_init_av_by_path(struct ib_
 static int cm_alloc_id(struct cm_id_private *cm_id_priv)
 {
 	unsigned long flags;
-	int ret, id;
+	int id;
 	static int next_id;
 
-	do {
-		spin_lock_irqsave(&cm.lock, flags);
-		ret = idr_get_new_above(&cm.local_id_table, cm_id_priv,
-					next_id, &id);
-		if (!ret)
-			next_id = ((unsigned) id + 1) & MAX_IDR_MASK;
-		spin_unlock_irqrestore(&cm.lock, flags);
-	} while( (ret == -EAGAIN) && idr_pre_get(&cm.local_id_table, GFP_KERNEL) );
+	idr_preload(GFP_KERNEL);
+	spin_lock_irqsave(&cm.lock, flags);
+
+	id = idr_alloc(&cm.local_id_table, cm_id_priv, next_id, 0, GFP_NOWAIT);
+	if (id >= 0)
+		next_id = ((unsigned) id + 1) & MAX_IDR_MASK;
+
+	spin_unlock_irqrestore(&cm.lock, flags);
+	idr_preload_end();
 
 	cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand;
-	return ret;
+	return id < 0 ? id : 0;
 }
 
 static void cm_free_id(__be32 local_id)
@@ -3844,7 +3845,6 @@ static int __init ib_cm_init(void)
 	cm.remote_sidr_table = RB_ROOT;
 	idr_init(&cm.local_id_table);
 	get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand);
-	idr_pre_get(&cm.local_id_table, GFP_KERNEL);
 	INIT_LIST_HEAD(&cm.timewait_list);
 
 	ret = class_register(&cm_class);
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2143,33 +2143,23 @@ static int cma_alloc_port(struct idr *ps
 			  unsigned short snum)
 {
 	struct rdma_bind_list *bind_list;
-	int port, ret;
+	int ret;
 
 	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
 	if (!bind_list)
 		return -ENOMEM;
 
-	do {
-		ret = idr_get_new_above(ps, bind_list, snum, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
-
-	if (port != snum) {
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
+	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	if (ret < 0)
+		goto err;
 
 	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
+	bind_list->port = (unsigned short)ret;
 	cma_bind_port(bind_list, id_priv);
 	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
+err:
 	kfree(bind_list);
-	return ret;
+	return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -611,19 +611,21 @@ static void init_mad(struct ib_sa_mad *m
 
 static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 {
+	bool preload = gfp_mask & __GFP_WAIT;
 	unsigned long flags;
 	int ret, id;
 
-retry:
-	if (!idr_pre_get(&query_idr, gfp_mask))
-		return -ENOMEM;
+	if (preload)
+		idr_preload(gfp_mask);
 	spin_lock_irqsave(&idr_lock, flags);
-	ret = idr_get_new(&query_idr, query, &id);
+
+	id = idr_alloc(&query_idr, query, 0, 0, GFP_NOWAIT);
+
 	spin_unlock_irqrestore(&idr_lock, flags);
-	if (ret == -EAGAIN)
-		goto retry;
-	if (ret)
-		return ret;
+	if (preload)
+		idr_preload_end();
+	if (id < 0)
+		return id;
 
 	query->mad_buf->timeout_ms  = timeout_ms;
 	query->mad_buf->context[0] = query;
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -176,7 +176,6 @@ static void ib_ucm_cleanup_events(struct
 static struct ib_ucm_context *ib_ucm_ctx_alloc(struct ib_ucm_file *file)
 {
 	struct ib_ucm_context *ctx;
-	int result;
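The converted cma_alloc_port() relies on idr_alloc()'s half-open range `[start, end)` and its -ENOSPC failure. The toy allocator below is a hypothetical userspace model, not the kernel's idr. It assumes the same contract: lowest free id in `[start, end)`, `end <= 0` meaning "no upper bound" (capped here at `TOY_IDR_SIZE`), and `-ENOSPC` when the range is full. `toy_alloc_port()` mirrors the patch's trick of asking for exactly `[snum, snum + 1)` and mapping -ENOSPC to -EADDRNOTAVAIL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IDR_SIZE 64	/* toy limit; the real idr goes much higher */

struct toy_idr {
	void *slot[TOY_IDR_SIZE];
};

/* Lowest free id in [start, end); end <= 0 means "up to TOY_IDR_SIZE". */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
	if (start < 0)
		return -EINVAL;
	if (end <= 0 || end > TOY_IDR_SIZE)
		end = TOY_IDR_SIZE;
	for (int id = start; id < end; id++) {
		if (!idr->slot[id]) {
			idr->slot[id] = ptr;
			return id;
		}
	}
	return -ENOSPC;
}

/*
 * Shape of the new cma_alloc_port(): request exactly port snum.
 * The real code stores the id in bind_list->port and returns 0.
 */
static int toy_alloc_port(struct toy_idr *ps, void *bind_list, int snum)
{
	int ret = toy_idr_alloc(ps, bind_list, snum, snum + 1);

	if (ret < 0)
		return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
	return ret;
}
```

Because the range can only yield `snum`, the old "got a different port, so remove it and fail" path (`err2` in the removed code) is no longer needed. A port that is already taken simply fails with -EADDRNOTAVAIL.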
Re: [PATCH] tile: revert pr_info to printk in asm/io.h
On 2/1/2013 5:33 PM, Joe Perches wrote:
On Fri, 2013-02-01 at 12:34 -0500, Chris Metcalf wrote:
Using pr_info in a header exposes us to potential trouble from subsystems that define pr_fmt. This change fixes:

In file included from include/linux/scatterlist.h:10,
                 from include/scsi/scsi.h:12,
                 from drivers/infiniband/ulp/srp/ib_srp.c:46:
arch/tile/include/asm/io.h: In function ‘ioport_map’:
arch/tile/include/asm/io.h:296: error: expected ‘)’ before ‘PFX’

Interesting.

diff --git a/arch/tile/include/asm/io.h b/arch/tile/include/asm/io.h
[]
@@ -292,7 +292,7 @@ static inline long ioport_panic(void)
 static inline void __iomem *ioport_map(unsigned long port, unsigned int len)
 {
-	pr_info("ioport_map: mapping IO resources is unsupported on tile.\n");
+	printk("ioport_map: mapping IO resources is unsupported on tile.\n");

It'd be nicer to add an appropriate KERN_LEVEL here.

My preference would be to change ib_srp.c (and ib_srpt) like:
---
 drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index d5088ce..59bf409 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -30,7 +30,7 @@
  * SOFTWARE.
  */
 
-#define pr_fmt(fmt) PFX fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/module.h>
 #include <linux/init.h>

Makes sense to me, and it does fix the compile failure. I've pulled my original change from my tree and instead, for the change above (assuming Joe is implicitly providing a Signed-off-by on this patch):

Acked-by: Chris Metcalf cmetc...@tilera.com

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
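Joe's definition works in any header because it expands only to string literals, which the compiler concatenates. The old `PFX fmt` form depends on a `PFX` macro defined later in ib_srp.c, which a header such as tile's asm/io.h cannot see. Hence the "expected ‘)’ before ‘PFX’" error above. The userspace sketch below mimics that expansion. `toy_pr_info()` and `log_buf` are hypothetical, `KBUILD_MODNAME` is defined by hand here (kbuild normally supplies it), and `##__VA_ARGS__` is the GNU extension the kernel itself uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Normally supplied by kbuild; defined by hand for this sketch. */
#define KBUILD_MODNAME "ib_srp"

/* Joe's form: nothing but string literals, so it expands anywhere. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

static char log_buf[128];

/* Mimics how pr_info() wraps its format with pr_fmt(), into a buffer. */
#define toy_pr_info(fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), pr_fmt(fmt), ##__VA_ARGS__)
```

For example, `toy_pr_info("port %d\n", 2)` leaves `"ib_srp: port 2\n"` in `log_buf`.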
Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8
On Fri, Feb 1, 2013 at 5:18 PM, Bart Van Assche bvanass...@acm.org wrote:
This patch series avoids that SCSI error handling triggers an endless loop and also restores reporting of QP errors in the kernel log.

Bart,

You wrote "resend" in the subject line. Is there anything new, in these patches OR in other patches merged through the SCSI tree for 3.8, vs. what you had posted earlier on Dec 20th, 2012 (http://marc.info/?t=13559269281&r=1&w=2)?! As I wrote there, it didn't work for me.

Dave, Roland, as I wrote here http://marc.info/?l=linux-rdma&m=135603401830703&w=2 I think what we need now is a second opinion, and I asked if Dave can give the patches a try, no more but no less... we need to know if it works.

Or.

Changes between v3 and v2:
- As proposed by Dave, added a patch that prevents sending of a task management function over a closed connection.

Changes between v2 and v1:
- Track connection state properly.
- Make srp_reset_host() reset requests even if reconnecting fails
Re: [PATCH] opensm/osm_torus.c: In dump_torus, make sure switch is present before dumping
On 02/04/2013 02:36 AM, Alex Netes wrote: Fix segmentation fault in osm_torus.c. Signed-off-by: Hal Rosenstock h...@mellanox.com Signed-off-by: Alex Netes ale...@mellanox.com Acked-by: Jim Schutt jasc...@sandia.gov
Re: [PATCH] opensm/osm_torus.c: torus routing should fail with VLCap 1 on switch external ports
On 02/04/2013 02:36 AM, Alex Netes wrote: Signed-off-by: Hal Rosenstock h...@mellanox.com Signed-off-by: Alex Netes ale...@mellanox.com Acked-by: Jim Schutt jasc...@sandia.gov
[PATCH] infiniband: hw/cxgb3/iwch_provider.c: fix uninitialized variable issue
The variable npages might be used uninitialized in line 594.

Signed-off-by: Cong Ding ding...@gmail.com
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 145d82a..90ce483 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -559,7 +559,7 @@ static int iwch_reregister_phys_mem(struct ib_mr *mr,
 	__be64 *page_list = NULL;
 	int shift = 0;
 	u64 total_size;
-	int npages;
+	int npages = 0;
 	int ret;
 
 	PDBG("%s ib_mr %p ib_pd %p\n", __func__, mr, pd);
-- 
1.7.9.5
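The patch only states that npages might be used uninitialized. The reduction below is a hypothetical sketch of a common pattern behind such warnings, not the actual iwch_reregister_phys_mem() code. In that pattern, a helper fills the variable through an out-parameter on only one path, and the variable is read unconditionally afterwards. `TOY_REREG_TRANS` and `toy_build_page_list()` are illustrative names. Initializing to 0 gives every path a defined value.

```c
#include <assert.h>

/* Hypothetical stand-in for a "translation changed" rereg mask bit. */
#define TOY_REREG_TRANS 0x1u

/* Pretend helper that only runs when the translation changes. */
static void toy_build_page_list(int total_pages, int *npages)
{
	*npages = total_pages;
}

static int toy_rereg(unsigned int mask, int total_pages)
{
	int npages = 0;		/* the fix: defined on every path */

	if (mask & TOY_REREG_TRANS)
		toy_build_page_list(total_pages, &npages);

	/* Read unconditionally: without the initializer this value would be
	 * indeterminate whenever the mask bit is clear. */
	return npages;
}
```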
Re: [PATCH] infiniband: hw/cxgb3/iwch_provider.c: fix uninitialized variable issue
Acked-by: Steve Wise sw...@opengridcomputing.com
Re: [PATCH] ib/ipoib: fix CM crash after commit b13912bbb4a2
Thanks, applied (although I wish you had included the excellent analysis from your other email in the changelog here).
RE: [PATCH v2 22/62] infiniband/core: convert to idr_alloc()
Reviewed-by: Sean Hefty sean.he...@intel.com
RE: [PATCH] librdmacm: Work-around kernel bug returning uid = 0
I updated the patch based on your feedback and pushed the change into my git tree. I plan on creating a new release of the librdmacm this quarter that will contain this fix. - Sean
Re: [PATCH] librdmacm: Work-around kernel bug returning uid = 0
On 04/02/2013 16:11, Bernd Schubert wrote:
On 02/03/2013 08:24 AM, Or Gerlitz wrote:
It would be also worthwhile to nail the fix to the root cause little further, e.g push kernel commit 418edaaba96 to -stable
Agreed, I opened RHEL bugzilla #719749 (somehow marked as private, no idea why) and hope RedHat is going to back port it into their kernels soon.

The mainline kernel -stable process is something else... Sean, I assume you understood what I was getting at.

Or.