Re: [PATCH] libibverbs: Force line-buffering in ibv_asyncwatch

2010-06-03 Thread Håkon Bugge

On Jun 2, 2010, at 19:05 , Roland Dreier wrote:

 setlinebuf() is pretty intuitive to understand, compared to setvbuf().
 
 I finally applied this; however in the end I decided to do
 
   setvbuf(stdout, NULL, _IOLBF, 0);
 
 instead of setlinebuf(), since in the past I've preferred more pedantic
 stuff (e.g. posix_memalign instead of memalign) to the older simpler
 traditional functions.  Kind of a trivial issue either way anyway.
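
For readers comparing the two calls, here is a minimal sketch of the setvbuf() form wrapped in a helper. The helper name force_line_buffering() is hypothetical and only for illustration; the setvbuf() arguments are the ones quoted above. Note that setvbuf() should be called before any other operation on the stream.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of libibverbs): switch a stdio stream to
 * line buffering. Passing NULL and 0 lets libc allocate the buffer itself.
 * Returns 0 on success and nonzero on failure, as setvbuf() does.
 */
static int force_line_buffering(FILE *stream)
{
    return setvbuf(stream, NULL, _IOLBF, 0);
}
```

Calling force_line_buffering(stdout) at the top of main() should then make each completed line show up immediately, even when stdout is a pipe or a file.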


Agreed. Thanks anyway.


Håkon

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ummunotify: Userspace support for MMU notifications

2010-04-13 Thread Håkon Bugge

On Apr 13, 2010, at 1:59 , Jason Gunthorpe wrote:

 On Mon, Apr 12, 2010 at 04:03:59PM -0700, Andrew Morton wrote:
 
 As discussed in http://article.gmane.org/gmane.linux.drivers.openib/61925
 and follow-up messages, libraries using RDMA would like to track
 precisely when application code changes memory mapping via free(),
 munmap(), etc.  Current pure-userspace solutions using malloc hooks
 and other tricks are not robust, and the feeling among experts is that
 the issue is unfixable without kernel help.

I am not sure I agree with the premise here. ptmalloc and malloc hooks are not 
related to the issue, in my opinion. User-space library calls do not change the 
virtual-to-physical mapping; system calls do. The following system calls might 
change the virtual-to-physical mapping: munmap(), mremap(), sbrk(), madvise(). 
What we need is for glibc to provide hooks for these four system calls, and for 
the general syscall() when its argument is one of the four. To me, that is what 
is needed, and the ummunotify direction seems way too complicated.
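
To make the idea concrete, here is a rough sketch of what such a hook could look like for munmap() alone. glibc does not provide this interface; the sketch stands in for it by defining munmap() in the program itself, so that calls resolve to this wrapper, which then issues the raw system call. The names set_unmap_hook() and record_unmap() are hypothetical, and the sketch assumes Linux with a dynamically linked libc.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for syscall() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical hook type: called before a range is unmapped. */
typedef void (*unmap_hook_t)(void *addr, size_t len);

static unmap_hook_t unmap_hook;

/* Hypothetical registration call an RDMA library would use. */
void set_unmap_hook(unmap_hook_t hook)
{
    unmap_hook = hook;
}

/*
 * Replacement for the libc wrapper. The hook runs first, while the
 * mapping is still valid, then the real system call is issued.
 */
int munmap(void *addr, size_t len)
{
    if (unmap_hook)
        unmap_hook(addr, len);
    return (int)syscall(SYS_munmap, addr, len);
}

/* Example hook for illustration: remember the last unmapped range. */
static void *last_addr;
static size_t last_len;

static void record_unmap(void *addr, size_t len)
{
    last_addr = addr;
    last_len = len;
}
```

A wrapper like this only sees calls that go through the munmap symbol; calls made inside libc, or a direct syscall(SYS_munmap, ...), bypass it, which is why the text above asks for hooks in glibc itself, including syscall().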

It is further claimed that "… other tricks are not robust". I wrote the code 
used in Scali/Platform MPI that handles the issue. I do not think it's fair to 
claim that this MPI is not robust in this matter, nor that its performance is bad.


Thanks, Håkon




Re: [PATCH] ummunotify: Userspace support for MMU notifications

2010-04-13 Thread Håkon Bugge

On Apr 13, 2010, at 20:02 , Peter Zijlstra wrote:

 
 Yeah, virtual-physical maps can change through swapping, page
 migration, memory compaction, huge-page aggregation (the latter two not
 yet being upstream).

Assuming this holds true, RDMA will not work. And with no RDMA, we do not need 
ummunotify. Is that your argument?

Seriously, RDMA requires the virtual to physical mapping to remain constant for 
a period of time. And that time period is from memory registration to 
de-registration. If the virtual to physical mapping changes in that period for 
the registered memory area, I can't see how an HCA can handle that.

For MPI applications, the MPI API defines that the buffers used in 
communication cannot be changed or freed while a non-blocking transfer is in 
progress. So far, we are good. But memory registration (and in particular 
de-registration) is a costly process. Since MPI is about performance, the MPI 
library would like to re-use earlier memory registrations for other transfers. 
The MPI library records earlier memory registrations (VA+bound) and checks 
whether the VA+bound of a new transfer is already contained in a memory region 
registered earlier.

The problem with this approach is that _normal_ activity _between_ the 
transfers can change the virtual to physical mapping and ruin previous memory 
registrations. The challenge is to catch these changes. I simply argue that 
they should be caught in the simplest possible way: a callback from the system 
calls affecting the virtual to physical mapping.
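
A minimal sketch of the registration cache described above, with the entry point a callback would use. This is not the Scali/Platform MPI code; the fixed-size table and all names (reg_cache, reg_cache_add, reg_cache_find, reg_cache_invalidate) are assumptions made for illustration, and the actual verbs registration and de-registration calls are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_CACHE_SLOTS 16

/* One cached registration covering the range [start, start + len). */
struct reg_entry {
    uintptr_t start;
    size_t len;
    int valid;
};

struct reg_cache {
    struct reg_entry slot[REG_CACHE_SLOTS];
};

/* Record a registration. Returns 0 on success, -1 if the cache is full. */
static int reg_cache_add(struct reg_cache *c, uintptr_t start, size_t len)
{
    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        if (!c->slot[i].valid) {
            c->slot[i] = (struct reg_entry){ start, len, 1 };
            return 0;
        }
    }
    return -1;
}

/* Return the entry that fully contains [va, va + len), or NULL. */
static struct reg_entry *reg_cache_find(struct reg_cache *c,
                                        uintptr_t va, size_t len)
{
    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];
        if (e->valid && va >= e->start && len <= e->len &&
            va - e->start <= e->len - len)
            return e;
    }
    return NULL;
}

/*
 * Callback body: drop every entry that overlaps the range whose mapping
 * is about to change. Returns the number of entries dropped.
 */
static int reg_cache_invalidate(struct reg_cache *c, uintptr_t va, size_t len)
{
    int dropped = 0;
    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];
        if (e->valid && va < e->start + e->len && e->start < va + len) {
            e->valid = 0;   /* a real library would also de-register here */
            dropped++;
        }
    }
    return dropped;
}
```

The transfer path calls reg_cache_find() first and registers only on a miss; the system-call callback calls reg_cache_invalidate() with the affected range, so a stale registration is never reused.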

 
 Even mlock() doesn't pin virtual-physical maps.

Can you elaborate on what you mean here?


Thanks, Håkon



Re: [PATCH V3 0/2] Add support for enhanced atomic operations

2010-03-12 Thread Håkon Bugge
On Mar 11, 2010, at 19:59 , Roland Dreier wrote:
 I think we can worry about that if/when an HCA comes along that supports
 global atomics for ordinary atomics but not enhanced atomics.

With the proposed patches in place, how do you know whether masked atomics are 
implemented or not? I guess applications already need this information on 
today's HCAs.

 Although perhaps it would be cleaner to change the atomic_cap enum to:
 
    /*
     * IB_ATOMIC_NONE:  no atomic capability
     * IB_ATOMIC_HCA:   all ops are atomic within HCA

But IB_ATOMIC_HCA does not tell you whether the masked ones are supported or not.

     * IB_ATOMIC_GLOB:  standard ops atomic with respect to all
     *                  memory ops; masked ops atomic within HCA

What if an HCA supports standard ops that are atomic with respect to all memory 
ops, but does not support masked atomics at all?

Hence, I think it would be cleaner if a new capability, masked_atomic_cap, were 
introduced, using the original definitions (NONE, HCA, GLOB).
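
A small sketch of what I mean, using stand-in names rather than the real ib_verbs definitions. The enum values mirror NONE, HCA and GLOB; the struct device_attr_sketch and the helper has_masked_atomics() are hypothetical.

```c
/* Capability levels, following the three original definitions. */
enum atomic_cap_sketch {
    ATOMIC_NONE,  /* no atomic capability */
    ATOMIC_HCA,   /* ops are atomic within the HCA */
    ATOMIC_GLOB   /* ops are atomic with respect to all memory ops */
};

/* Hypothetical device attributes with one field per class of atomics. */
struct device_attr_sketch {
    enum atomic_cap_sketch atomic_cap;         /* ordinary atomics */
    enum atomic_cap_sketch masked_atomic_cap;  /* masked atomics */
};

/* An application can now ask about masked atomics independently. */
static int has_masked_atomics(const struct device_attr_sketch *attr)
{
    return attr->masked_atomic_cap != ATOMIC_NONE;
}
```

With two fields, an HCA with global ordinary atomics and no masked atomics is simply { ATOMIC_GLOB, ATOMIC_NONE }, which a single three-valued enum cannot express.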


Thanks, Håkon




Re: [PATCH V3 0/2] Add support for enhanced atomic operations

2010-03-11 Thread Håkon Bugge
Hi Vlad,

Did you consider my input in 
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg02803.html with 
respect to these enhancements?

 

Thanks, Håkon

On Mar 10, 2010, at 16:57 , Vladimir Sokolovsky wrote:

 Hi Roland,
 
 This patchset adds support for the following enhanced atomic
 operations:
 - Masked atomic compare and swap
 - Masked atomic fetch and add
 
 These operations enable using a smaller amount of memory when using
 multiple locks by using portions of a 64 bit value in an atomic
 operation.
 For some applications the memory savings are very significant. One
 example is fine grain lock implementations for huge data sets. In
 other cases, the benefit is the ability to update multiple fields with
 a single io operation.
 
 Vladimir Sokolovsky(2):
 IB/core: Add support for enhanced atomic operations
 mlx4/IB: Add support for enhanced atomic operations
 
 changes from V2:
 - patch #1: 
  Updated description
  Renamed:
IB_WR_ATOMIC_MASKED_CMP_AND_SWP - IB_WR_MASKED_ATOMIC_CMP_AND_SWP
IB_WR_ATOMIC_MASKED_FETCH_AND_ADD - IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
  In the ib_send_wr struct the new fields added before the rkey field
 
 - patch #2:
  Set IB_DEVICE_MASKED_ATOMIC flag with other flags that get set for
  all devices
 
 Regards,
 Vladimir

Håkon Bugge
haakon.bu...@sun.com
+47 924 84 514





Re: [PATCH V2 1/2] IB/core: Add support for enhanced atomic operations

2010-03-01 Thread Håkon Bugge
 Still in-development, but it should be ready very soon. We will be the
 first in-kernel user of atomics, as well as masked atomics. This tree
 does not support masked atomics yet because it is based on ofed 1.5.1.
 When 1.5.2 is officially released, I'll rebase, add mask support, and
 plan on pushing to mainline and ofed 1.6 as soon as it opens.



Maybe I missed something, but how do you guys intend to reflect this new 
functionality in the capabilities? A new atomic_enhanced_cap?

The reason for asking is that the ordinary IB atomic repertoire fits nicely 
with that of PCIe Gen 3, so one could possibly assume that new HCAs supporting 
PCIe Gen 3 will have the ATOMIC_GLOB capability for atomic_cap. But I do not 
see that happening for the proposed Enhanced IB Atomics.


Thanks, Håkon



Re: [PATCH 0/2] Add support for enhanced atomic operations

2010-02-02 Thread Håkon Bugge

On Feb 2, 2010, at 16:54 , Hal Rosenstock wrote:

 On Tue, Feb 2, 2010 at 5:44 AM, Vladimir Sokolovsky
 [snip]
 Masked Fetch and Add (MFetchAdd)
 The MFetchAdd Atomic operation extends the functionality of the standard IB
 FetchAdd by allowing the user to split the target into multiple fields of
 selectable length. The atomic add is done independently on each one of these
 fields. A bit set in the field_boundary parameter specifies the field
 boundaries. The pseudo code below describes the operation:


As discussed by private email, my take is that it is more important to support 
adjacent fields than a single-bit fetch-and-add. Hence, encoding the mask 
slightly differently, so that a one-to-zero bit transition indicates 
break-of-carry but is treated as a one mask-wise, allows adjacent bit-fields to 
be added.
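
For reference, a bit-serial software model of the operation as I read the quoted description. It is not the pseudo code from the original posting. The assumption, stated loudly, is that a set bit in field_boundary marks the most significant bit of a field, so the carry out of that bit is discarded. The function name is hypothetical, and the alternative encoding suggested above is not modeled.

```c
#include <stdint.h>

/*
 * Software model (not atomic, illustration only) of a masked fetch-and-add.
 * Assumption: a set bit in field_boundary is the top bit of a field, and
 * the carry out of that bit is dropped instead of entering the next field.
 * Returns the old value of *target, like a fetch-and-add.
 */
static uint64_t masked_fetch_add_model(uint64_t *target, uint64_t add,
                                       uint64_t field_boundary)
{
    uint64_t old = *target;
    uint64_t result = 0;
    unsigned carry = 0;

    for (int i = 0; i < 64; i++) {
        uint64_t bit = 1ULL << i;
        unsigned a = (old & bit) != 0;
        unsigned b = (add & bit) != 0;
        unsigned sum = a + b + carry;

        if (sum & 1)
            result |= bit;
        /* Propagate the carry unless this bit closes a field. */
        carry = (sum >> 1) && !(field_boundary & bit);
    }
    *target = result;
    return old;
}
```

With target 0x00FF, add 0x0001 and field_boundary 0x0080, the low byte wraps to zero and nothing carries into bit 8; with field_boundary 0 the same add behaves like an ordinary 64-bit add.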

Just my two cents.


Håkon



Re: ib_write_bw hanging when using max max_inline value

2010-01-25 Thread Håkon Bugge

On Jan 24, 2010, at 6:26 , Or Gerlitz wrote:
 Attaching a debugger is typically helpful to see where a program talking 
 directly to the hardware hangs. If it happens on the slow path, strace can be 
 useful as well. Did you take a look at the actual values set for this QP? That 
 is, as suggested by ibv_create_qp(3), look at the init attributes after the 
 function returns.


The capabilities in qp_init_attr used as input to ibv_create_qp() are:

max_send_wr = 100, max_recv_wr = 1, max_send_sge = 1, max_recv_sge = 1, 
max_inline_data = 928

Upon return from ibv_create_qp(), the capabilities are modified to the following 
(note that max_inline_data is unchanged):

max_send_wr = 125, max_recv_wr = 1, max_send_sge = 32, max_recv_sge = 1, 
max_inline_data = 928

All WRs have IBV_SEND_SIGNALED set. The program does not get any completions, 
hence it is running in the while loop surrounding the call to ibv_poll_cq().

Note that when the size of the RDMA is decreased to 912 bytes, the program works.


-h



ibv_asyncwatch and buffering

2010-01-21 Thread Håkon Bugge
Hi,


It seems like ibv_asyncwatch defaults to standard libc behavior with respect to 
buffering. That is, if you pipe the output of ibv_asyncwatch, no output appears 
until the buffer fills, because stdout is redirected to a pipe and block 
buffering is used by default.

One could a) use sprintf() and write() or b) force libc buffering to line-mode 
by means of setlinebuf(stdout).

That would make ibv_asyncwatch more useful in scripted environments.
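
A minimal sketch of option a), assuming a helper of my own naming (emit_line() is hypothetical, as is the device name in the usage note). It uses snprintf() rather than sprintf() to bound the buffer, then hands the line straight to write(), so stdio buffering is not involved at all.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: format one status line and write it directly to
 * the file descriptor, bypassing stdio buffering. Returns the number of
 * bytes written, or -1 on error.
 */
static ssize_t emit_line(int fd, const char *dev_name, int async_fd)
{
    char buf[128];
    int n = snprintf(buf, sizeof(buf), "%s: async event FD %d\n",
                     dev_name, async_fd);

    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof(buf))
        n = (int)sizeof(buf) - 1;   /* output was truncated to fit */
    return write(fd, buf, (size_t)n);
}
```

For example, emit_line(STDOUT_FILENO, "mlx4_0", 3) makes the line visible to a reader on the other end of a pipe as soon as the call returns. Option b) is the smaller change, since the existing printf() calls stay as they are.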



Thanks, 
Håkon





[PATCH] libibverbs: Force line-buffering in ibv_asyncwatch

2010-01-21 Thread Håkon Bugge
ibv_asyncwatch defaults to block-buffering when stdout is redirected to a file 
or pipe. This fix makes it more usable in scripted environments.

Signed-off-by: Hakon Bugge haakon.bu...@sun.com
---
diff --git a/examples/asyncwatch.c b/examples/asyncwatch.c
index e56b4dc..f9fe6ff 100644
--- a/examples/asyncwatch.c
+++ b/examples/asyncwatch.c
@@ -98,6 +98,9 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
+	/* Force line-buffering if stdout is redirected */
+	setlinebuf(stdout);
+
 	printf("%s: async event FD %d\n",
 	       ibv_get_device_name(*dev_list), context->async_fd);



Re: [PATCH] libibverbs: Force line-buffering in ibv_asyncwatch

2010-01-21 Thread Håkon Bugge
I guess it depends. ibverbs already uses other non-POSIX libc functions, so I 
am not sure a POSIX-only policy is being enforced.

If I understand correctly, the charter of OFED is to produce a Linux 
distribution (and also a Windows distro).

setlinebuf() is pretty intuitive to understand, compared to setvbuf().

-h


On Jan 21, 2010, at 15:18 , Bart Van Assche wrote:

 On Thu, Jan 21, 2010 at 2:40 PM, Håkon Bugge haakon.bu...@sun.com wrote:
 ibv_asyncwatch defaults to block-buffering when stdout is redirected to a 
 file or pipe. This fix makes it more usable in scripted environments.
 
 Signed-off-by: Hakon Bugge haakon.bu...@sun.com
 ---
 diff --git a/examples/asyncwatch.c b/examples/asyncwatch.c
 index e56b4dc..f9fe6ff 100644
 --- a/examples/asyncwatch.c
 +++ b/examples/asyncwatch.c
 @@ -98,6 +98,9 @@ int main(int argc, char *argv[])
  		return 1;
  	}
  
 +	/* Force line-buffering if stdout is redirected */
 +	setlinebuf(stdout);
 +
  	printf("%s: async event FD %d\n",
  	       ibv_get_device_name(*dev_list), context->async_fd);
 
 It might be a good idea to replace setlinebuf() by setvbuf().
 setlinebuf() is a BSD function while setvbuf is POSIX (see also
 http://opengroup.org/onlinepubs/009695399/functions/setvbuf.html).
 
 Bart.

Håkon Bugge
haakon.bu...@sun.com
+47 924 84 514


