Granted, I'm not an LNet expert, but "errno: -1 descr: cannot parse net 
'<255:65535>' " doesn't immediately lead me to the same conclusion as 
"unknown interface 'ib0' " would if it were printed as the error message.  Also, 
"errno: -1" is "-EPERM = Operation not permitted", which doesn't give the same 
information as "-ENXIO = No such device or address" or even "-EINVAL = Invalid 
argument" would.

That said, I can't even offer a patch for this myself, since that exact error 
message is used in a few different places, though I suspect it is coming from 
lustre_lnet_config_ni().

Looking further into this, now that I've found where (I think) the error 
message is generated, it seems that "errno: -1" is not "-EPERM" but rather 
"LUSTRE_CFG_RC_BAD_PARAM".  IMHO it is a travesty to use separate error 
numbers (and then print them after "errno:") instead of existing POSIX error 
codes that could fill the same role (with some creative mapping):

    #define LUSTRE_CFG_RC_NO_ERR                     0  => fine
    #define LUSTRE_CFG_RC_BAD_PARAM                 -1  => -EINVAL
    #define LUSTRE_CFG_RC_MISSING_PARAM             -2  => -EFAULT
    #define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM        -3  => -ERANGE
    #define LUSTRE_CFG_RC_OUT_OF_MEM                -4  => -ENOMEM
    #define LUSTRE_CFG_RC_GENERIC_ERR               -5  => -ENODATA
    #define LUSTRE_CFG_RC_NO_MATCH                  -6  => -ENOMSG
    #define LUSTRE_CFG_RC_MATCH                     -7  => -EXFULL
    #define LUSTRE_CFG_RC_SKIP                      -8  => -EBADSLT
    #define LUSTRE_CFG_RC_LAST_ELEM                 -9  => -ECHRNG
    #define LUSTRE_CFG_RC_MARSHAL_FAIL              -10 => -ENOSTR

I don't think "overloading" the POSIX error codes to mean something similar is 
worse than using random numbers to report errors.  Also, in some cases (even in 
lustre_lnet_config_ni()) it is using "rc = -errno", so the LUSTRE_CFG_RC_* 
errors are *already* conflicting with POSIX error numbers, and it is impossible 
to distinguish between them...
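
As a rough sketch of the mapping proposed above (not code from the Lustre tree), a translation helper could look like the following. The function name lustre_cfg_rc_to_errno() is hypothetical, the numeric values are copied from the list above, and the errno names assume a Linux <errno.h> (EXFULL, EBADSLT and ECHRNG are not defined everywhere). Note that it cannot fix the ambiguity described above: a raw "-errno" between -1 and -10 would still be treated as a LUSTRE_CFG_RC_* value.

```c
#include <errno.h>

/* Existing return codes, copied from the list above. */
#define LUSTRE_CFG_RC_NO_ERR                 0
#define LUSTRE_CFG_RC_BAD_PARAM             -1
#define LUSTRE_CFG_RC_MISSING_PARAM         -2
#define LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM    -3
#define LUSTRE_CFG_RC_OUT_OF_MEM            -4
#define LUSTRE_CFG_RC_GENERIC_ERR           -5
#define LUSTRE_CFG_RC_NO_MATCH              -6
#define LUSTRE_CFG_RC_MATCH                 -7
#define LUSTRE_CFG_RC_SKIP                  -8
#define LUSTRE_CFG_RC_LAST_ELEM             -9
#define LUSTRE_CFG_RC_MARSHAL_FAIL          -10

/*
 * Hypothetical helper (not part of Lustre): translate a legacy
 * LUSTRE_CFG_RC_* value into the negative errno suggested above.
 * Values outside the legacy range are returned unchanged, on the
 * assumption that they are already negative errno values.
 */
static int lustre_cfg_rc_to_errno(int rc)
{
	switch (rc) {
	case LUSTRE_CFG_RC_NO_ERR:             return 0;
	case LUSTRE_CFG_RC_BAD_PARAM:          return -EINVAL;
	case LUSTRE_CFG_RC_MISSING_PARAM:      return -EFAULT;
	case LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM: return -ERANGE;
	case LUSTRE_CFG_RC_OUT_OF_MEM:         return -ENOMEM;
	case LUSTRE_CFG_RC_GENERIC_ERR:        return -ENODATA;
	case LUSTRE_CFG_RC_NO_MATCH:           return -ENOMSG;
	case LUSTRE_CFG_RC_MATCH:              return -EXFULL;
	case LUSTRE_CFG_RC_SKIP:               return -EBADSLT;
	case LUSTRE_CFG_RC_LAST_ELEM:          return -ECHRNG;
	case LUSTRE_CFG_RC_MARSHAL_FAIL:       return -ENOSTR;
	default:                               return rc;
	}
}
```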

The main question is whether changing these numbers would break a user->kernel 
interface, or whether these definitions are only used in userspace.  It looks 
like lnetctl.c only ever checks "!= LUSTRE_CFG_RC_NO_ERR", so maybe it is fine?  
None of the proposed values overlap with the current ones, so it would be 
possible to start accepting either value for the return in the user tools, and 
then at some point in the future start actually returning the new ones...  
Something for the LNet folks to figure out.
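
To illustrate the transition idea, here is a minimal sketch of a check that accepts either the current value or the proposed POSIX value for the same condition.  The struct, table and function names are made up for illustration and do not exist in lnetctl or liblnetconfig; the numbers come from the list above, and the errno names again assume Linux.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One row per return code: current value and proposed replacement. */
struct lustre_cfg_rc_pair {
	int legacy;	/* current LUSTRE_CFG_RC_* value */
	int posix;	/* proposed negative errno */
};

/* Hypothetical table built from the mapping listed above. */
static const struct lustre_cfg_rc_pair lustre_cfg_rc_pairs[] = {
	{  -1, -EINVAL  },	/* LUSTRE_CFG_RC_BAD_PARAM */
	{  -2, -EFAULT  },	/* LUSTRE_CFG_RC_MISSING_PARAM */
	{  -3, -ERANGE  },	/* LUSTRE_CFG_RC_OUT_OF_RANGE_PARAM */
	{  -4, -ENOMEM  },	/* LUSTRE_CFG_RC_OUT_OF_MEM */
	{  -5, -ENODATA },	/* LUSTRE_CFG_RC_GENERIC_ERR */
	{  -6, -ENOMSG  },	/* LUSTRE_CFG_RC_NO_MATCH */
	{  -7, -EXFULL  },	/* LUSTRE_CFG_RC_MATCH */
	{  -8, -EBADSLT },	/* LUSTRE_CFG_RC_SKIP */
	{  -9, -ECHRNG  },	/* LUSTRE_CFG_RC_LAST_ELEM */
	{ -10, -ENOSTR  },	/* LUSTRE_CFG_RC_MARSHAL_FAIL */
};

#define LUSTRE_CFG_RC_PAIR_COUNT \
	(sizeof(lustre_cfg_rc_pairs) / sizeof(lustre_cfg_rc_pairs[0]))

/* True if rc is either the legacy code or its proposed replacement. */
static bool lustre_cfg_rc_is(int rc, int legacy)
{
	for (size_t i = 0; i < LUSTRE_CFG_RC_PAIR_COUNT; i++) {
		if (lustre_cfg_rc_pairs[i].legacy == legacy)
			return rc == legacy || rc == lustre_cfg_rc_pairs[i].posix;
	}
	return rc == legacy;	/* not in the table: exact match only */
}

/* True if no proposed value collides with any current value. */
static bool lustre_cfg_rc_values_disjoint(void)
{
	for (size_t i = 0; i < LUSTRE_CFG_RC_PAIR_COUNT; i++) {
		for (size_t j = 0; j < LUSTRE_CFG_RC_PAIR_COUNT; j++) {
			if (lustre_cfg_rc_pairs[i].posix == lustre_cfg_rc_pairs[j].legacy)
				return false;
		}
	}
	return true;
}
```

A caller that today compares against LUSTRE_CFG_RC_BAD_PARAM could call lustre_cfg_rc_is(rc, -1) during the transition, and the disjointness check documents why accepting both values is unambiguous for the codes in the table.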

Cheers, Andreas

On Jan 10, 2024, at 13:29, Jeff Johnson <jeff.john...@aeoncomputing.com> wrote:

A LU ticket and patch for lnetctl or for me being an under-caffeinated
idiot? ;-)

On Wed, Jan 10, 2024 at 12:06 PM Andreas Dilger <adil...@whamcloud.com> wrote:

It would seem that the error message could be improved in this case?  Could you 
file an LU ticket for that with the reproducer below, and ideally along with a 
patch?

Cheers, Andreas
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud