Re: [PATCH] libibverbs: Force line-buffering in ibv_asyncwatch
On Jun 2, 2010, at 19:05, Roland Dreier wrote:
> > setlinebuf() is pretty intuitive to understand, compared to setvbuf().
>
> I finally applied this; however, in the end I decided to do
> setvbuf(stdout, NULL, _IOLBF, 0); instead of setlinebuf(), since in the
> past I've preferred the more pedantic interfaces (e.g. posix_memalign()
> instead of memalign()) over the older, simpler traditional functions.
> Kind of a trivial issue either way anyway.

Agree. Thanks anyway.

Håkon
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ummunotify: Userspace support for MMU notifications
On Apr 13, 2010, at 1:59, Jason Gunthorpe wrote:
> On Mon, Apr 12, 2010 at 04:03:59PM -0700, Andrew Morton wrote:
> > As discussed in http://article.gmane.org/gmane.linux.drivers.openib/61925
> > and follow-up messages, libraries using RDMA would like to track
> > precisely when application code changes memory mapping via free(),
> > munmap(), etc. Current pure-userspace solutions using malloc hooks
> > and other tricks are not robust, and the feeling among experts is
> > that the issue is unfixable without kernel help.

I am not sure I agree with the premises here. ptmalloc and malloc hooks
are not related to the issue in my opinion. User-space library calls do
not change the virtual-to-physical mapping; system calls do. The
following system calls might change the virtual-to-physical mapping:
munmap(), mremap(), sbrk(), and madvise(). What we need is for glibc to
provide hooks for these four system calls, and for the generic
syscall() when its argument is one of the four. To me, that is what is
needed, and the ummunotify direction seems way too complicated.

It is further claimed that "... other tricks are not robust". I wrote
the code used in Scali/Platform MPI handling this issue. I do not think
it is fair to claim that this MPI is not robust in this matter, nor
that its performance is bad.

Thanks, Håkon
Re: [PATCH] ummunotify: Userspace support for MMU notifications
On Apr 13, 2010, at 20:02, Peter Zijlstra wrote:
> Yeah, virtual-physical maps can change through swapping, page
> migration, memory compaction, huge-page aggregation (the latter two
> not yet being upstream).

Assuming this holds true, RDMA will not work. And with no RDMA, we do
not need ummunotify. Is that your argument?

Seriously, RDMA requires the virtual-to-physical mapping to remain
constant for a period of time, namely from memory registration to
de-registration. If the virtual-to-physical mapping changes in that
period for the registered memory area, I can't see how an HCA could
handle it.

For MPI applications, the MPI API defines that buffers used in
communication cannot be changed or freed while a non-blocking transfer
is in progress. So far, we are good. But memory registration (and in
particular de-registration) is a costly process. Since MPI is about
performance, the MPI library would like to re-use earlier memory
registrations for other transfers. The MPI library records earlier
memory registrations (VA + bound) and checks whether the VA + bound of
a new transfer is already contained in a memory region registered
earlier.

The problem with this approach is that _normal_ activity _between_ the
transfers changes the virtual-to-physical mapping and invalidates
previous memory registrations. The problem is to catch these changes. I
simply argue that they should be caught in the simplest possible way: a
call-back from the system calls affecting the virtual-to-physical
mapping.

> Even mlock() doesn't pin virtual-physical maps.

Can you elaborate on what you mean here?

Thanks, Håkon
Re: [PATCH V3 0/2] Add support for enhanced atomic operations
On Mar 11, 2010, at 19:59, Roland Dreier wrote:
> I think we can worry about that if/when an HCA comes along that
> supports global atomics for ordinary atomics but not enhanced atomics.

With the proposed patches in place, how do you know whether masked
atomics are implemented or not? I guess apps need to know this
information already on today's HCAs.

> Although perhaps it would be cleaner to change the atomic_cap enum to:
>
> /*
>  * IB_ATOMIC_NONE: no atomic capability
>  * IB_ATOMIC_HCA: all ops are atomic within HCA

But IB_ATOMIC_HCA does not tell you whether the masked ops are
supported or not.

>  * IB_ATOMIC_GLOB: standard ops atomic with respect to all memory ops;
>  *                 masked ops atomic within HCA

What if an HCA supports standard ops atomic with respect to all memory
ops, but does not support masked atomics at all?

Hence, I think it would be cleaner if a new capability,
masked_atomic_cap, were introduced, using the original definitions
(NONE, HCA, GLOB).

Thanks, Håkon
Re: [PATCH V3 0/2] Add support for enhanced atomic operations
Hi Vlad,

Did you consider my input in
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg02803.html
wrt. these enhancements?

Thanks, Håkon

On Mar 10, 2010, at 16:57, Vladimir Sokolovsky wrote:
> Hi Roland,
>
> This patchset adds support for the following enhanced atomic
> operations:
> - Masked atomic compare and swap
> - Masked atomic fetch and add
>
> These operations make it possible to use less memory when implementing
> multiple locks, by using portions of a 64-bit value in a single atomic
> operation. For some applications the memory savings are very
> significant. One example is fine-grained lock implementations for huge
> data sets. In other cases, the benefit is the ability to update
> multiple fields with a single I/O operation.
>
> Vladimir Sokolovsky (2):
>   IB/core: Add support for enhanced atomic operations
>   mlx4/IB: Add support for enhanced atomic operations
>
> Changes from V2:
> - patch #1:
>   Updated description.
>   Renamed:
>     IB_WR_ATOMIC_MASKED_CMP_AND_SWP   -> IB_WR_MASKED_ATOMIC_CMP_AND_SWP
>     IB_WR_ATOMIC_MASKED_FETCH_AND_ADD -> IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
>   In the ib_send_wr struct, the new fields are added before the rkey
>   field.
> - patch #2:
>   Set the IB_DEVICE_MASKED_ATOMIC flag with the other flags that get
>   set for all devices.
>
> Regards, Vladimir

Håkon Bugge
haakon.bu...@sun.com
+47 924 84 514
Re: [PATCH V2 1/2] IB/core: Add support for enhanced atomic operations
> Still in development, but it should be ready very soon. We will be the
> first in-kernel user of atomics, as well as masked atomics. This tree
> does not support masked atomics yet because it is based on OFED 1.5.1.
> When 1.5.2 is officially released, I'll rebase, add mask support, and
> plan on pushing to mainline and OFED 1.6 as soon as it opens.

Maybe I missed something, but how do you guys intend to reflect this
new functionality in the capabilities? A new atomic_enhanced_cap? The
reason for asking is that the ordinary IB atomic repertoire fits nicely
with that of PCIe Gen3, so one could possibly assume that new HCAs
supporting PCIe Gen3 possess the ATOMIC_GLOB capability for atomic_cap.
But I do not see that happening for the proposed enhanced IB atomics.

Thanks, Håkon
Re: [PATCH 0/2] Add support for enhanced atomic operations
On Feb 2, 2010, at 16:54, Hal Rosenstock wrote:
> On Tue, Feb 2, 2010 at 5:44 AM, Vladimir Sokolovsky wrote:
[snip]
> > Masked Fetch and Add (MFetchAdd)
> >
> > The MFetchAdd atomic operation extends the functionality of the
> > standard IB FetchAdd by allowing the user to split the target into
> > multiple fields of selectable length. The atomic add is done
> > independently on each of these fields. A bit set in the
> > field_boundary parameter specifies the field boundaries. The pseudo
> > code below describes the operation:

As discussed by private email, my take is that it is more important to
support adjacent fields than a single-bit fetch-and-add. Hence,
encoding the mask slightly differently, where a one-to-zero bit
transition indicates break-of-carry but is mask-wise treated as a one,
allows adjacent bit-fields to be added.

Just my two cents.

Håkon
Re: ib_write_bw hanging when using max max_inline value
On Jan 24, 2010, at 6:26, Or Gerlitz wrote:
> Attaching a debugger is typically helpful to see where a program
> talking directly to the hardware hangs. If it happens on the slow
> path, strace can be useful as well. Did you take a look at the actual
> values set for this QP? That is, as suggested by ibv_create_qp(3),
> look at the init attributes after the function returns.

The capabilities in qp_init_attr used as input to ibv_create_qp() are:

  max_send_wr     = 100,
  max_recv_wr     = 1,
  max_send_sge    = 1,
  max_recv_sge    = 1,
  max_inline_data = 928

Upon return from ibv_create_qp(), the capabilities are modified to the
following (note that max_inline_data is not changed):

  max_send_wr     = 125,
  max_recv_wr     = 1,
  max_send_sge    = 32,
  max_recv_sge    = 1,
  max_inline_data = 928

All WRs have IBV_SEND_SIGNALED set. The program does not get any
completions; hence it is spinning in the while loop surrounding the
call to ibv_poll_cq(). Note that when the size of the RDMA is decreased
to 912 bytes, the program works.

-h
ibv_asyncwatch and buffering
Hi,

It seems that ibv_asyncwatch defaults to standard libc behavior with
respect to buffering. That is, if you pipe the output of
ibv_asyncwatch, no output appears, because stdout is redirected to a
pipe and block buffering is used by default. One could a) use sprintf()
and write(), or b) force libc buffering to line mode by means of
setlinebuf(stdout). That would make ibv_asyncwatch more useful in
scripted environments.

Thanks, Håkon
[PATCH] libibverbs: Force line-buffering in ibv_asyncwatch
ibv_asyncwatch defaults to block-buffering when stdout is redirected to
a file or pipe. This fix makes it more usable in scripted environments.

Signed-off-by: Hakon Bugge <haakon.bu...@sun.com>
---
diff --git a/examples/asyncwatch.c b/examples/asyncwatch.c
index e56b4dc..f9fe6ff 100644
--- a/examples/asyncwatch.c
+++ b/examples/asyncwatch.c
@@ -98,6 +98,9 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
+	/* Force line-buffering if stdout is redirected */
+	setlinebuf(stdout);
+
 	printf("%s: async event FD %d\n",
 	       ibv_get_device_name(*dev_list), context->async_fd);
Re: [PATCH] libibverbs: Force line-buffering in ibv_asyncwatch
I guess it depends. ibverbs has other non-POSIX-compliant libc
functions, so I am not sure there is a POSIX policy to enforce. If I
understand correctly, the charter of OFED is to produce a Linux
distribution (and also a Windows distro). setlinebuf() is pretty
intuitive to understand, compared to setvbuf().

-h

On Jan 21, 2010, at 15:18, Bart Van Assche wrote:
> On Thu, Jan 21, 2010 at 2:40 PM, Håkon Bugge <haakon.bu...@sun.com> wrote:
> > ibv_asyncwatch defaults to block-buffering when stdout is redirected
> > to a file or pipe. This fix makes it more usable in scripted
> > environments.
> >
> > Signed-off-by: Hakon Bugge <haakon.bu...@sun.com>
> > ---
[snip]
>
> It might be a good idea to replace setlinebuf() by setvbuf().
> setlinebuf() is a BSD function while setvbuf() is POSIX (see also
> http://opengroup.org/onlinepubs/009695399/functions/setvbuf.html).
>
> Bart.