Re: [PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-05-02 Thread Jeff Squyres (jsquyres)
On May 1, 2013, at 11:30 AM, Doug Ledford dledf...@redhat.com wrote:

 This is fine with me; however, I think you also need to bump the
 autotools version to the latest upstream.  The automated checkers in our
 build environment are spitting out errors about a number of upstream
 packages where the autotools used to configure the package do not
 include proper ARM support.  The latest autotools bring in all of the
 forthcoming ARM variants.  So I would like to see both of these things done.

Are you referring to the version of Autotools that Roland uses to create his 
tarballs?

Because I have no control over that.  :-)


 On Apr 25, 2013, at 11:38 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com 
 wrote:
 
 Bump.
 
 On Apr 22, 2013, at 1:41 PM, Jeff Squyres jsquy...@cisco.com wrote:
 
 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 
 
 
 -- 
 Jeff Squyres
 jsquy...@cisco.com
 For corporate legal information go to: 
 http://www.cisco.com/web/about/doing_business/legal/cri/
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 




Re: [PATCH 2/2] Add IB_MTU_1500|9000 enums.

2013-05-02 Thread Jeff Squyres (jsquyres)
On Apr 22, 2013, at 4:00 PM, Doug Ledford dledf...@redhat.com wrote:

 2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as 
 switch(mtu) case IBV_MTU_1024: ... will need to be updated to 
 switch(mtu) case 1024: 
 
 I was actually thinking that an ibverbs API version 2.0 might be an
 interesting way to go.  The proliferation of non-IB link layers
 providing the verbs API makes some of the original assumptions about the
 IB link layer in the original API obsolete.  But, if we were to do that,
 I'd take some time to really think the issue over and try to catch all of
 the needed updates in one go.


In addition to the MTU, another obvious issue is the active_speed attribute in 
ibv_port_attr.  On the kernel side, it's an enum (IB_SPEED_SDR through 
IB_SPEED_EDR), but there are no corresponding enum names in libibverbs.  

It would be good to make this value a plain (non-enum) int, too.
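
To make the MTU side of this concrete, here is a minimal sketch of why the
enum encoding cannot carry Ethernet-style sizes.  It assumes the existing
libibverbs encoding (IBV_MTU_256 = 1 through IBV_MTU_4096 = 5) and mirrors it
in a local enum so that it compiles without <infiniband/verbs.h>; all sketch_*
names are hypothetical and not part of any API:

```c
#include <assert.h>

/* Local mirror of the enum ibv_mtu encoding (assumption: values 1..5). */
enum sketch_mtu {
    SKETCH_MTU_256  = 1,
    SKETCH_MTU_512  = 2,
    SKETCH_MTU_1024 = 3,
    SKETCH_MTU_2048 = 4,
    SKETCH_MTU_4096 = 5
};

/* Hypothetical helper: translate the enum encoding into a byte count. */
static int sketch_mtu_to_bytes(enum sketch_mtu mtu)
{
    assert(mtu >= SKETCH_MTU_256 && mtu <= SKETCH_MTU_4096);
    return 128 << mtu;
}

/*
 * Hypothetical helper: map a plain-int MTU back to the enum encoding.
 * Returns 0 when the size has no enum value (e.g. 1500 or 9000).
 */
static int sketch_bytes_to_mtu(int bytes)
{
    int e;

    for (e = SKETCH_MTU_256; e <= SKETCH_MTU_4096; e++)
        if ((128 << e) == bytes)
            return e;
    return 0;
}
```

With a plain int field, 1500 and 9000 can be stored directly; with the enum,
sketch_bytes_to_mtu() has nothing to return for them.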



[PATCH 1/1] OpenSM: dfsssp - add support for multicast

2013-05-02 Thread Jens Domke
Recent tests on a large system revealed a problem with loops in the multicast
routing. Using DFSSSP together with the default mcast routing algorithm of
OpenSM can produce loops in the fabric.

This patch adds the mcast_build_stree function to the DFSSSP routing algorithm,
so that DFSSSP is able to calculate the correct mcast forwarding tables for the
subnet.

It performs almost the same steps as the default mcast routing, except that it
uses Dijkstra's algorithm to generate the spanning tree instead of using the
hop count information given by the unicast routing.

General overview of the algorithm in pseudo-code:
1) identify the ports that are part of the multicast group
2) find the 'best' switch (depending on the hop count) for the mcast group,
   which can be used as the root of the spanning tree
3) perform a Dijkstra step with the root switch as the starting point
   to generate a spanning tree to all other switches in the subnet
4) build the mcast forwarding tables for the relevant switches:
   4.1) select a switch which has mcast member ports connected to it
   4.2) set the downstream ports for the mcast member ports in the mcft
   4.3) traverse towards the root of the spanning tree and set up-/downstream
        ports on this path for all involved switches
   4.4) goto 4.1 until all switches have been processed
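
Steps 3 and 4 can be sketched in standalone C as follows.  This is only an
illustration under simplifying assumptions, not the code in the patch: the
fabric is a small weighted adjacency matrix, N_SW, build_spanning_tree() and
mark_mcast_links() are hypothetical names, and links are tracked per switch
pair rather than per port.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define N_SW 5     /* hypothetical number of switches */
#define NO_LINK 0  /* weight 0 means "no link" in this sketch */

/*
 * Step 3: Dijkstra from `root`.  parent[v] receives the next switch on the
 * path from v towards the root (parent[root] == root, unreachable == -1).
 */
static void build_spanning_tree(int w[N_SW][N_SW], int root, int parent[N_SW])
{
    int dist[N_SW], done[N_SW];
    int i, u, v;

    assert(root >= 0 && root < N_SW);
    for (i = 0; i < N_SW; i++) {
        dist[i] = INT_MAX;
        done[i] = 0;
        parent[i] = -1;
    }
    dist[root] = 0;
    parent[root] = root;

    for (i = 0; i < N_SW; i++) {
        /* pick the closest reachable switch that is not finalized yet */
        u = -1;
        for (v = 0; v < N_SW; v++)
            if (!done[v] && dist[v] != INT_MAX &&
                (u < 0 || dist[v] < dist[u]))
                u = v;
        if (u < 0)
            break;  /* remaining switches are unreachable */
        done[u] = 1;

        /* relax all links leaving u */
        for (v = 0; v < N_SW; v++)
            if (w[u][v] != NO_LINK && !done[v] &&
                dist[u] + w[u][v] < dist[v]) {
                dist[v] = dist[u] + w[u][v];
                parent[v] = u;
            }
    }
}

/*
 * Step 4: for every switch with attached group members, walk towards the
 * root and mark each traversed tree link in both directions.
 * Returns the number of distinct links marked.
 */
static int mark_mcast_links(const int parent[N_SW], const int has_member[N_SW],
                            int in_tree[N_SW][N_SW])
{
    int sw, cur, links = 0;

    memset(in_tree, 0, sizeof(int) * N_SW * N_SW);
    for (sw = 0; sw < N_SW; sw++) {
        if (!has_member[sw] || parent[sw] < 0)
            continue;
        for (cur = sw; parent[cur] != cur; cur = parent[cur]) {
            if (in_tree[cur][parent[cur]])
                break;  /* rest of the path is already set up */
            in_tree[cur][parent[cur]] = 1;
            in_tree[parent[cur]][cur] = 1;
            links++;
        }
    }
    return links;
}
```

Because every marked link lies on a parent path of a single tree rooted at one
switch, the resulting forwarding entries cannot form a loop.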

The same mcast algorithm will be used for SSSP, because SSSP has the potential
to produce loops in the mcast forwarding table as well.

Signed-off-by: Jens Domke domke.j...@m.titech.ac.jp
---
 include/opensm/osm_mcast_mgr.h |   72 +++
 opensm/Makefile.am |1 +
 opensm/osm_mcast_mgr.c |   35 
 opensm/osm_ucast_dfsssp.c  |  194 
 4 files changed, 283 insertions(+), 19 deletions(-)
 create mode 100644 include/opensm/osm_mcast_mgr.h

diff --git a/include/opensm/osm_mcast_mgr.h b/include/opensm/osm_mcast_mgr.h
new file mode 100644
index 000..291a478
--- /dev/null
+++ b/include/opensm/osm_mcast_mgr.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
+ * Copyright (c) 2009-2011 ZIH, TU Dresden, Federal Republic of Germany. All rights reserved.
+ * Copyright (C) 2012-2013 Tokyo Institute of Technology. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+/*
+ * Abstract:
+ * Declaration of osm_mcast_work_obj_t.
+ * Provide access to a mcast function which searches the root switch for
+ * a spanning tree.
+ */
+
+#ifndef _OSM_MCAST_MGR_H_
+#define _OSM_MCAST_MGR_H_
+
+#ifdef __cplusplus
+#  define BEGIN_C_DECLS extern "C" {
+#  define END_C_DECLS   }
+#else  /* !__cplusplus */
+#  define BEGIN_C_DECLS
+#  define END_C_DECLS
+#endif /* __cplusplus */
+
+BEGIN_C_DECLS
+
+typedef struct osm_mcast_work_obj {
+   cl_list_item_t list_item;
+   osm_port_t *p_port;
+   cl_map_item_t map_item;
+} osm_mcast_work_obj_t;
+
+int osm_mcast_make_port_list_and_map(cl_qlist_t * list, cl_qmap_t * map,
+                                     osm_mgrp_box_t * mbox);
+
+void osm_mcast_drop_port_list(cl_qlist_t * list);
+
+osm_switch_t * osm_mcast_mgr_find_root_switch(osm_sm_t * sm,
+                                              cl_qlist_t * list);
+
+END_C_DECLS
+#endif /* _OSM_MCAST_MGR_H_ */
diff --git a/opensm/Makefile.am b/opensm/Makefile.am
index 7fd6bc6..20318cc 100644
--- a/opensm/Makefile.am
+++ b/opensm/Makefile.am
@@ -116,6 +116,7 @@ opensminclude_HEADERS = \

RE: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on

2013-05-02 Thread Yan Burman


 -----Original Message-----
 From: Michael S. Tsirkin [mailto:m...@redhat.com]
 Sent: Thursday, May 02, 2013 04:56
 To: Or Gerlitz
 Cc: Roland Dreier; io...@lists.linux-foundation.org; Yan Burman; linux-
 r...@vger.kernel.org
 Subject: Re: decent performance drop for SCSI LLD / SAN initiator when
 iommu is turned on
 
 On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
  Hi Roland, IOMMU folks,
 
  So we've noted that when configuring the kernel and booting with intel
  iommu set to on on a physical node (non-VM, and without enabling SRIOV
  by the HW device driver), raw performance of the iSER (iSCSI RDMA) SAN
  initiator is reduced notably, e.g. in the testbed we looked at today we
  had ~260K 1KB random IOPS and 5.5GB/s BW for 128KB IOs with iommu
  turned off for a single LUN, and ~150K IOPS and 4GB/s BW with iommu
  turned on. No change on the target node between runs.
 
 That's why we have iommu=pt.
 See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.

I tried passing "intel_iommu=on iommu=pt" to the 3.8.11 kernel and I still get
performance degradation.
I get the same numbers with iommu=pt as without it.
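
When iommu=pt makes no difference, one thing worth ruling out is that the
parameter never reached the running kernel.  A minimal sketch of such a check,
assuming the contents of /proc/cmdline have already been read into a string;
cmdline_has_param() is a hypothetical helper, not an existing kernel or libc
function:

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `param` appears as a whole space-separated token in `cmdline`
 * (so "iommu=on" does not match inside "intel_iommu=on"), else 0.
 */
static int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    assert(len > 0);
    while ((p = strstr(p, param)) != NULL) {
        int starts = (p == cmdline) || p[-1] == ' ';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

        if (starts && ends)
            return 1;
        p++;  /* keep searching after this partial match */
    }
    return 0;
}
```

Both cmdline_has_param(line, "intel_iommu=on") and
cmdline_has_param(line, "iommu=pt") should return 1 for the runs described
here.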

I wanted to send perf output, but currently I seem to have some problem with
its output.
I will try to get perf differences next week.

Yan




Re: initial LIO iSER performance numbers [was: [GIT PULL] target updates for v3.10-rc1]

2013-05-02 Thread Nicholas A. Bellinger
On Thu, 2013-05-02 at 16:58 +0300, Or Gerlitz wrote:
 On 30/04/2013 05:59, Nicholas A. Bellinger wrote:
  Hello Linus!
 
  Here are the target pending changes for the v3.10-rc1 merge window.
 
  Please go ahead and pull from:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 
  for-next-merge
 

SNIP

 Hi Nic, everyone,
 
 So the LIO iser target code is now merged into Linus' tree, and will be in 
 kernel 3.10, exciting!
 
 Here's some data on raw performance numbers we were able to get with the 
 LIO iser code.
 
 For single initiator and single lun, block sizes varying over the range 
 1KB,2KB... 128KB
 doing random read:
 
 1KB 227,870K
 2KB 458,099K
 4KB 909,761K
 8KB 1,679,922K
 16KB 3,233,753K
 32KB 4,905,139K
 64KB 5,294,873K
 128KB 5,565,235K
 
 When increasing the number of LUNs, still with a single initiator, for 
 1KB random reads we get:
 
 1 LUN  = 230k IOPS
 2 LUNs = 420k IOPS
 4 LUNs = 740k IOPS
 
 When increasing the number of initiators, each having four LUNs, we get 
 for 1KB random reads:
 
 1 initiator  x 4 LUNs = 740k  IOPS
 2 initiators x 4 LUNs = 1480k IOPS
 3 initiators x 4 LUNs = 1570k IOPS
 
 So all in all, things scale pretty nicely, and we observe a bottleneck
 in the IOPS rate around 1.6 million IOPS, so that is where to improve...
 

Excellent.  Thanks for posting these initial performance results.
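
One way to quantify how close the quoted 1KB random read figures come to
linear scaling is a simple efficiency ratio.  The sketch below is only a
worked calculation on the numbers posted above; scaling_efficiency_pct() is a
hypothetical helper name:

```c
#include <assert.h>

/*
 * Percentage of ideal linear scaling reached when `units` initiators (or
 * LUNs) together deliver `measured_kiops` and one unit alone delivers
 * `single_kiops`.  Integer division truncates the result.
 */
static int scaling_efficiency_pct(int single_kiops, int units,
                                  int measured_kiops)
{
    assert(single_kiops > 0 && units > 0);
    return (measured_kiops * 100) / (single_kiops * units);
}
```

With 740k IOPS for one initiator, two initiators (1480k) give 100 percent and
three initiators (1570k) give 70 percent; for LUNs on a single initiator,
starting from 230k, two LUNs (420k) give 91 percent and four LUNs (740k) give
80 percent.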

 Here's the fio command line used by the initiators
 
 $ fio --cpumask=0xfc --rw=randread --bs=1k --numjobs=2 --iodepth=128 
 --runtime=62 --time_based --size=1073741824k --loops=1 --ioengine=libaio 
 --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 
 --norandommap --group_reporting --exitall --name 
 dev-sdb-randread-1k-2thr-libaio-128iodepth-62sec --filename=/dev/sdb
 
 And some details on the setup:
 
 The nodes are HP ProLiant DL380p Gen8 with the following CPU: Intel(R) 
 Xeon(R) CPU E5-2650 0 @ 2.00GHz
 two NUMA nodes with eight cores each, 32GB RAM, PCI express gen3 8x, the 
 HCA being Mellanox ConnectX3 with firmware 2.11.500
 
 The target node was running upstream kernel and the initiators RHEL 6.3 
 kernel, all X86_64
 
 We used RAMDISK_MCP backend which was patched to act as NULL device, so 
 we can test the raw iSER wire performance.
 

Btw, I'll be including a similar patch to allow for RAMDISK_NULL to be
configured as a NULL device mode.

Thanks Or & Co!

--nab
