Just use snprintf to copy the error message from strerror_r into a thread-local buffer of 64 bytes or so. Then preserve the existing terror() interface.
Can you open a jira for this? best, Colin On Thu, Dec 11, 2014 at 8:35 PM, malcolm <malcolm.kaval...@oracle.com> wrote: > So, turns out that if I had naively changed all calls to terror or > references to sys_errlist, to using strerror_r, then I would have broken > code for Windows and HPUX (and possibly other OSes). > > If we are to assume that current code runs fine on all platforms (maybe even > AIX an MacOS, for example), then any change/additions made to the code and > not ifdeffed appropriately can break on other OSes. On the other hand, too > many ifdefs can pollute the code source and render it less readable (though > possibly less important). > > In the general case what are code contributors responsibilities to adding > code regarding OSes besides Linux ? > What OSes does jenkins test on ? > I guess maintainers of code on non-tested platforms are responsible for > their own testing ? > > How do we avoid the ping-pong effect, i.e. I make a generic change to code > which breaks on Windows, then the Windows maintainer reverts changes to > break on Solaris for example ? Or does this not happen in actuality ? > > > On 12/11/2014 11:25 PM, Asokan, M wrote: >> >> Hi Malcom, >> The Windows versions of strerror() and strerror_s() functions are >> probably meant for ANSI C library functions that set errno. For core >> Windows API calls (like UNIX system calls), one gets the error number by >> calling GetLastError() function. In the code snippet I sent earlier, the >> "code" argument is the value returned by GetLastError(). Neither strerror() >> nor strerror_s() will give the correct error message for this error code. >> >> You could probably look at libwinutils.c in Hadoop source. It uses >> FormatMessageW (which returns messages in Unicode.) My requirement was to >> return messages in current system locale. >> >> -- Asokan >> ________________________________________ >> From: malcolm [malcolm.kaval...@oracle.com] >> Sent: Thursday, December 11, 2014 4:04 PM >> To: common-dev@hadoop.apache.org >> Subject: Re: Solaris Port >> >> Hi Asok, >> >> I googled and found that windows has strerror, and strerror_s (which is >> the strerror_r equivalent). >> Is there a reason why you didn't use this call ? >> >> On 12/11/2014 06:27 PM, Asokan, M wrote: >>> >>> Hi Malcom, >>> Recently, I had to work on a function to get system error message on >>> various systems. Here is the piece of code I came up with. Hope it helps. >>> >>> static void get_system_error_message(char * buf, int buf_len, int code) >>> { >>> #if defined(_WIN32) >>> LPVOID lpMsgBuf; >>> DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER | >>> FORMAT_MESSAGE_FROM_SYSTEM | >>> FORMAT_MESSAGE_IGNORE_INSERTS, >>> NULL, code, >>> MAKELANGID(LANG_NEUTRAL, >>> SUBLANG_DEFAULT), >>> /* Default >>> language */ >>> (LPTSTR) &lpMsgBuf, 0, NULL); >>> if (status > 0) >>> { >>> strncpy(buf, (char *)lpMsgBuf, buf_len-1); >>> buf[buf_len-1] = '\0'; >>> /* Free the buffer returned by system */ >>> LocalFree(lpMsgBuf); >>> } >>> else >>> { >>> _snprintf(buf, buf_len-1 , "%s %d", >>> "Can't get system error message for code", code); >>> buf[buf_len-1] = '\0'; >>> } >>> #else >>> #if defined(_HPUX_SOURCE) >>> { >>> char * msg; >>> errno = 0; >>> msg = strerror(code); >>> if (errno == 0) >>> { >>> strncpy(buf, msg, buf_len-1); >>> buf[buf_len-1] = '\0'; >>> } >>> else >>> { >>> snprintf(buf, buf_len, "%s %d", >>> "Can't get system error message for code", code); >>> } >>> } >>> #else >>> if (strerror_r(code, buf, buf_len) != 0) >>> { >>> snprintf(buf, buf_len, "%s %d", >>> "Can't get system error message for code", code); >>> } >>> #endif >>> #endif >>> } >>> >>> Note that HPUX does not have strerror_r() since strerror() itself is >>> thread-safe. Also Windows does not have snprintf(). The equivalent >>> function _snprintf() has a subtle difference in its interface. >>> >>> -- Asokan >>> ________________________________________ >>> From: malcolm [malcolm.kaval...@oracle.com] >>> Sent: Thursday, December 11, 2014 11:02 AM >>> To: common-dev@hadoop.apache.org >>> Subject: Re: Solaris Port >>> >>> Fine with me, I volunteer to do this, if accepted. >>> >>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote: >>>> >>>> sys_errlist was removed for a reason. Creating a fake sys_errlist on >>>> Solaris will mean the libhadoop.so will need to be tied a specific build >>>> (kernel/include pairing) and therefore limits upward >>>> mobility/compatibility. >>>> That doesn’t seem like a very good idea. >>>> >>>> IMO, switching to strerror_r is much preferred, since other than the >>>> brain-dead GNU libc version, is highly portable and should work regardless >>>> of the kernel or OS in place. >>>> >>>> On Dec 11, 2014, at 5:20 AM, malcolm <malcolm.kaval...@oracle.com> >>>> wrote: >>>> >>>>> FYI, there are a couple more files that reference sys_errlist directly >>>>> (not just terror within exception.c) , but also hdfs_http_client.c and >>>>> NativeiO.c >>>>> >>>>> On 12/11/2014 07:38 AM, malcolm wrote: >>>>>> >>>>>> Hi Colin, >>>>>> >>>>>> Exactly, as you noticed, the problem is the thread-local buffer needed >>>>>> to return from terror. >>>>>> Currently, terror just returns a static string from an array, this is >>>>>> fast, simple and error-proof. >>>>>> >>>>>> In order to use strerror_r inside terror, would require allocating a >>>>>> buffer inside terror and depend on the caller to free the buffer after >>>>>> using it, or to pass a buffer to terrror (which is basically the same as >>>>>> strerror_r, rendering terror redundant). >>>>>> Both cases require modification outside terror itself, as far as I can >>>>>> tell, no simple fix. Unless you have an alternative which I haven't >>>>>> thought >>>>>> of ? >>>>>> >>>>>> As far as I can tell, we have two choices: >>>>>> >>>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer >>>>>> from the callee. >>>>>> Advantage: a more modern portable interface. >>>>>> Disadvantage: All calls to terror need to be modified, though >>>>>> all seem to be in a few files as far as I can tell. >>>>>> >>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris) >>>>>> Advantage: no change to any calls to terror >>>>>> Disadvantage: 2 additional files added to source tree (.c and >>>>>> .h) and some minor ifdefs only used for Solaris. >>>>>> >>>>>> I think it is more a question of style than anything else, so I leave >>>>>> you to make the call. >>>>>> >>>>>> Thanks for your patience, >>>>>> Malcolm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote: >>>>>>> >>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm >>>>>>> <malcolm.kaval...@oracle.com> wrote: >>>>>>>> >>>>>>>> Hi Colin, >>>>>>>> >>>>>>>> Thanks for the hints around JIRAs. >>>>>>>> >>>>>>>> You are correct errno still exists, however sys_errlist does not. >>>>>>>> >>>>>>>> Hadoop uses a function terror (defined in exception.c) which indexes >>>>>>>> sys_errlist by errno to return the error message from the array. >>>>>>>> This >>>>>>>> function is called 26 times in various places (in 2.2) >>>>>>>> >>>>>>>> Originally, I thought to replace all calls to terror with strerror, >>>>>>>> but >>>>>>>> there can be issues with multi-threading (it returns a buffer which >>>>>>>> can be >>>>>>>> overwritten), so it seemed simpler just to recreate the sys_errlist >>>>>>>> message >>>>>>>> array. >>>>>>>> >>>>>>>> There is also a multi-threaded version strerror_r where you pass the >>>>>>>> buffer >>>>>>>> as a parameter, but this would necessitate changing every call to >>>>>>>> terror >>>>>>>> with mutiple lines of code. >>>>>>> >>>>>>> Why don't you just use strerror_r inside terror()? >>>>>>> >>>>>>> I wrote that code originally. The reason I didn't want to use >>>>>>> strerror_r there is because GNU libc provides a non-POSIX definition >>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you >>>>>>> can do it. You also will require a thread-local buffer to hold the >>>>>>> return from strerror_r, since it is not guaranteed to be static >>>>>>> (although in practice it is 99% of the time-- another annoyance with >>>>>>> the API). >>>>>>> >>>>>>> >>> ________________________________ >>> >>> >>> >>> ATTENTION: ----- >>> >>> The information contained in this message (including any files >>> transmitted with this message) may contain proprietary, trade secret or >>> other confidential and/or legally privileged information. Any pricing >>> information contained in this message or in any files transmitted with this >>> message is always confidential and cannot be shared with any third parties >>> without prior written approval from Syncsort. This message is intended to be >>> read only by the individual or entity to whom it is addressed or by their >>> designee. If the reader of this message is not the intended recipient, you >>> are on notice that any use, disclosure, copying or distribution of this >>> message, in any form, is strictly prohibited. If you have received this >>> message in error, please immediately notify the sender and/or Syncsort and >>> destroy all copies of this message in your possession, custody or control. > >