Greetings,

While fixing the Python test suite on Cygwin, I ran across one test that consistently caused segfaults later on, outside the test itself. The test involves wcsxfrm, so that's where I focused my attention.
The attached test demonstrates the bug. Given an output buffer of N wide characters, wcsxfrm causes bytes beyond the destination size to be byte-reversed.

I believe it might actually be a bug in the underlying LCMapStringW workhorse (this is on Windows 10; I have not tested other versions). According to its docs [1], the cchDest argument (size of the destination buffer) is treated as a *byte* count when using LCMAP_SORTKEY. However, for the purposes of applying the LCMAP_BYTEREV transformation, it seems to treat the output size (in bytes) as a character count. So in the example I give, where the output sort key is 7 bytes (including the null terminator), it swaps *14* bytes: the 7 bytes of the sort key plus the next 7 adjacent bytes. This is obviously a problem if the destination buffer is allocated out of some larger memory pool.

This definitely has to be a bug, right? Or at least very poorly documented on MS's part. A workaround would be either to not use LCMAP_BYTEREV and just swap the bytes manually, or to apply LCMAP_BYTEREV in a second call to LCMapStringW with the correct character count...

Thanks,
Erik

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/dd318700(v=vs.85).aspx
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
#include <string.h>
#include <windows.h>

#define SIZE 32

/* Fill the buffer with a known pattern (0, 1, 2, ...) so that any
   out-of-bounds modification is easy to spot. */
void fill_bytes(uint8_t *a, int n) {
    int idx;
    for (idx = 0; idx < n; idx++) {
        a[idx] = idx;
    }
}

/* Dump the buffer as hex, 8 bytes per row. */
void print_bytes(uint8_t *a, int n) {
    int idx;
    for (idx = 0; idx < n; idx++) {
        printf("0x%02x ", a[idx]);
        if ((idx + 1) % 8 == 0)
            printf("\n");
    }
}

int main(void) {
    wchar_t *a, *b;
    uint8_t *aa;
    size_t ret;
    LCID collate_lcid;

    collate_lcid = 1033;  /* en-US */
    b = L"b";
    a = (wchar_t*) malloc(SIZE);
    aa = (uint8_t*) a;

    setlocale(LC_ALL, "en_US.UTF-8");

    /* Destination size of 4 wide characters (8 bytes). */
    printf("using wcsxfrm:\n");
    fill_bytes(aa, SIZE);
    printf("before:\n");
    print_bytes(aa, SIZE);
    ret = wcsxfrm(a, b, 4);
    printf("after (%d):\n", (int) ret);
    print_bytes(aa, SIZE);

    /* Same operation via LCMapStringW, destination size of 8 bytes. */
    printf("\nusing LCMapStringW directly:\n");
    fill_bytes(aa, SIZE);
    printf("before:\n");
    print_bytes(aa, SIZE);
    ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY | LCMAP_BYTEREV, b, -1, a, 8);
    printf("after (%d):\n", (int) ret);
    print_bytes(aa, SIZE);

    /* Control: no byte reversal requested. */
    printf("\nwithout LCMAP_BYTEREV:\n");
    fill_bytes(aa, SIZE);
    printf("before:\n");
    print_bytes(aa, SIZE);
    ret = LCMapStringW(collate_lcid, LCMAP_SORTKEY, b, -1, a, 8);
    printf("after (%d):\n", (int) ret);
    print_bytes(aa, SIZE);

    free(a);
    return 0;
}
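For what it's worth, the first workaround mentioned above (dropping LCMAP_BYTEREV and swapping the bytes by hand) could look roughly like the sketch below. The helper name swap_byte_pairs is made up for illustration; it is not part of Cygwin or the Windows API. It assumes that byte reversal means swapping the two bytes of each 16-bit unit, and it leaves a trailing odd byte untouched, which may or may not match what LCMapStringW itself does with an odd-length sort key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Cygwin or Windows API function).
   Swaps the two bytes of every 16-bit unit within the first nbytes
   bytes of buf, and never touches anything at or beyond buf[nbytes].
   Assumption: a trailing odd byte is left as-is. */
void swap_byte_pairs(uint8_t *buf, size_t nbytes)
{
    size_t i;

    assert(buf != NULL || nbytes == 0);

    /* i + 1 < nbytes guarantees both bytes of the pair are in range. */
    for (i = 0; i + 1 < nbytes; i += 2) {
        uint8_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

The idea would be to call LCMapStringW with LCMAP_SORTKEY only, check that the return value is nonzero, and then pass the destination buffer and the returned byte count to swap_byte_pairs, so that the swap is bounded by the actual sort key length in bytes rather than by a misinterpreted character count.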