With patch
https://github.com/torvalds/linux/commit/e15f431fe2d53cd4673510736da7d4fa1090e096,
the use of ENOSYS has been clarified.
/*
 * This error code is special: arch syscall entry code will return
 * -ENOSYS if users try to call a syscall that doesn't exist. To keep
 * failures of syscalls that really do exist distinguishable from
 * failures due to attempts to use a nonexistent syscall, syscall
 * implementations should refrain from returning -ENOSYS.
 */
#define ENOSYS 38 /* Invalid system call number */
Now, legacy HCA drivers return ENOSYS, for example the mlx4 driver.
Open MPI adheres to this, see for example the following code snippet:
#ifdef HAVE_IBV_RESIZE_CQ
    else if (cq_size > mca_btl_openib_component.ib_cq_size[cq]){
        int rc;
        rc = ibv_resize_cq(device->ib_cq[cq], cq_size);
        /* For ConnectX the resize CQ is not implemented and verbs returns -ENOSYS
         * but should return ENOSYS. So it is reason for abs */
        if(rc && ENOSYS != abs(rc)) {
            BTL_ERROR(("cannot resize completion queue, error: %d", rc));
            return OPAL_ERROR;
        }
    }
#endif
Modern HCA drivers cannot return ENOSYS in this case, because the code would
not pass checkpatch.pl. See the following patch:
https://github.com/torvalds/linux/commit/91c9afaf97ee554d2cd3042a5ad01ad21c99e8c4
Hence, my humble request is that Open MPI also checks for EOPNOTSUPP (the
choice of error code can of course be discussed) wherever it checks for ENOSYS.
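
To illustrate the request, here is a minimal sketch of such a check. The helper
name resize_not_supported() is hypothetical and not part of Open MPI; it only
assumes that a driver may report an unimplemented operation as ENOSYS or
EOPNOTSUPP, with either sign, which is why abs() is kept:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not existing Open MPI code: returns 1 if rc means
 * "operation not implemented/supported", regardless of sign convention. */
static int resize_not_supported(int rc)
{
    int err = abs(rc);

    /* Legacy drivers (e.g. mlx4) use ENOSYS; newer ones are assumed to use
     * EOPNOTSUPP instead. */
    return ENOSYS == err || EOPNOTSUPP == err;
}
```

With something like this, the condition in the snippet above could read
"if (rc && !resize_not_supported(rc))" instead of comparing against ENOSYS only.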
Thxs, Håkon