Re: /bin/ls formatting broken for non-C(?) locales
On Wed, Nov 25, 2015 at 06:31:16PM +0300, Andrey Chernov wrote: > On 25.11.2015 18:12, Andrey Chernov wrote: > > On 25.11.2015 17:35, Baptiste Daroussin wrote: > >>> BTW, array size looks suspicious: > >>> static wchar_t wab_months[12][MAX_ABMON_WIDTH * 2 * MB_LEN_MAX]; > >>> what MB_LEN_MAX doing here? This constant is for multiple-bytes encoded, > >>> not for wide chars. > >> Bad copy/paste sorry it should be "MAX_ABMON_WIDTH * 2" > > > > I don't check deep enough, it seems first array > > MAX_ABMON_WIDTH * MB_LEN_MAX + 1 > > and second one > > MAX_ABMON_WIDTH * 2 + 1 > > > > No. We can't assume anything here and should integrate limits from the > locale for months fields instead. F.e. in abstract general case in wide > array can be 100 zero-width characters + 5 of normal characters, so > width-oriented sizes not prevents overflowing. > > First array size should be from locale internals, There is no max size for an abbreviated month in the locale internals So maybe the best here is to consider your first proposal: MAX_ABMON_WIDTH * MB_LEN_MAX + 1 and MAX_ABMON_WIDTH * 2 + 1 Check that it does not overflow and fallack if it overflows. Having 100 zero-width characters in a locale have very little probability to happen and if it does one day, it problably means something went wrong in the code generation Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 18:12, Andrey Chernov wrote: > On 25.11.2015 17:35, Baptiste Daroussin wrote: >>> BTW, array size looks suspicious: >>> static wchar_t wab_months[12][MAX_ABMON_WIDTH * 2 * MB_LEN_MAX]; >>> what MB_LEN_MAX doing here? This constant is for multiple-bytes encoded, >>> not for wide chars. >> Bad copy/paste sorry it should be "MAX_ABMON_WIDTH * 2" > > I don't check deep enough, it seems first array > MAX_ABMON_WIDTH * MB_LEN_MAX + 1 > and second one > MAX_ABMON_WIDTH * 2 + 1 > No. We can't assume anything here and should integrate limits from the locale for months fields instead. F.e. in abstract general case in wide array can be 100 zero-width characters + 5 of normal characters, so width-oriented sizes not prevents overflowing. First array size should be from locale internals, second one == first * sizeof(wchar_t) it will be safe for not overflowing. -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 17:35, Baptiste Daroussin wrote: >> BTW, array size looks suspicious: >> static wchar_t wab_months[12][MAX_ABMON_WIDTH * 2 * MB_LEN_MAX]; >> what MB_LEN_MAX doing here? This constant is for multiple-bytes encoded, >> not for wide chars. > Bad copy/paste sorry it should be "MAX_ABMON_WIDTH * 2" I don't check deep enough, it seems first array MAX_ABMON_WIDTH * MB_LEN_MAX + 1 and second one MAX_ABMON_WIDTH * 2 + 1 -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On Wed, Nov 25, 2015 at 05:06:17PM +0300, Andrey Chernov wrote: > On 25.11.2015 16:51, Baptiste Daroussin wrote: > > wcslcat works like strlcat: > > to quote the manpage: > > "It will append at most dstsize - strlen(dst) - 1 characters." > > So here with your version it will be n - wcslen(wab_months[i]) -1 > > which won't fit what we want to do. > > > > btw that makes me figure out that what I want is wcsncat because we do care > > to > > make sure we have appended the right number of spaces at the end of the > > abbreviated month based on character width, not the on the len of the wide > > string > > > > so I changed it > > > > if ((n = max_month_width - wab_months_width[i]) > 0) > > wcsncat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); > > Sorry, I initially mean wcsncat() functionality here, wcslcat() is > sneaked in due to wrong memorizing. > > About width counting and fallback... > Without attempt to nicely truncate data damaged by unknown way the code > will be simpler. Instead all that loop adding width one by one wchar, just: > > width = wcswidth(wab_months[i], MAX_ABMON_WIDTH); > if (width == -1) { > max_month_width = -1; > return; > } > wab_months_width[i] = width; > > About > /* NULL terminate to simplify padding later */ > wab_months[i][j] = L'\0'; > You don't need it, mbstowcs() null-terminates result, if there is a room. right if should be only set on the month we do truncate to MAX_ABMON_WIDTH > BTW, array size looks suspicious: > static wchar_t wab_months[12][MAX_ABMON_WIDTH * 2 * MB_LEN_MAX]; > what MB_LEN_MAX doing here? This constant is for multiple-bytes encoded, > not for wide chars. Bad copy/paste sorry it should be "MAX_ABMON_WIDTH * 2" I will fix Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 16:51, Baptiste Daroussin wrote: > wcslcat works like strlcat: > to quote the manpage: > "It will append at most dstsize - strlen(dst) - 1 characters." > So here with your version it will be n - wcslen(wab_months[i]) -1 > which won't fit what we want to do. > > btw that makes me figure out that what I want is wcsncat because we do care to > make sure we have appended the right number of spaces at the end of the > abbreviated month based on character width, not the on the len of the wide > string > > so I changed it > > if ((n = max_month_width - wab_months_width[i]) > 0) > wcsncat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); Sorry, I initially mean wcsncat() functionality here, wcslcat() is sneaked in due to wrong memorizing. About width counting and fallback... Without attempt to nicely truncate data damaged by unknown way the code will be simpler. Instead all that loop adding width one by one wchar, just: width = wcswidth(wab_months[i], MAX_ABMON_WIDTH); if (width == -1) { max_month_width = -1; return; } wab_months_width[i] = width; About /* NULL terminate to simplify padding later */ wab_months[i][j] = L'\0'; You don't need it, mbstowcs() null-terminates result, if there is a room. BTW, array size looks suspicious: static wchar_t wab_months[12][MAX_ABMON_WIDTH * 2 * MB_LEN_MAX]; what MB_LEN_MAX doing here? This constant is for multiple-bytes encoded, not for wide chars. -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 15:53, Baptiste Daroussin wrote: > What I did for now is set max_month_width to -1 and in ls_strftime I fallback > on > the plain strftime meaning you keep localized information but the alignement > is > broken as of now. It will be enough. >> 3) wcwidth/wcswidth may return -1 too, it needs to be checked too. > done and truncate the name of the month to the latest valid character I think there is no point for sofisticated truncating, if locale is not valid. F.e. you may truncate to just 1 char. The thing above will be enough for all such cases. > Review updated (if you prefer a diff by mail just tell me, given you do not > have > a phabricator account.) I don't have phabricator. This one if ((n = max_month_width - wab_months_width[i]) > 0) { wcslcat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, max_month_width + 1); } should be if ((n = max_month_width - wab_months_width[i]) > 0) wcslcat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); I.e. you append n spaces and n is the difference between max and current. -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On Wed, Nov 25, 2015 at 04:34:17PM +0300, Andrey Chernov wrote: > On 25.11.2015 15:53, Baptiste Daroussin wrote: > > What I did for now is set max_month_width to -1 and in ls_strftime I > > fallback on > > the plain strftime meaning you keep localized information but the > > alignement is > > broken as of now. > > It will be enough. > > >> 3) wcwidth/wcswidth may return -1 too, it needs to be checked too. > > done and truncate the name of the month to the latest valid character > > I think there is no point for sofisticated truncating, if locale is not > valid. F.e. you may truncate to just 1 char. The thing above will be > enough for all such cases. > > > Review updated (if you prefer a diff by mail just tell me, given you do not > > have > > a phabricator account.) > > I don't have phabricator. > > This one > if ((n = max_month_width - wab_months_width[i]) > 0) { > wcslcat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, > max_month_width + 1); > } > should be > if ((n = max_month_width - wab_months_width[i]) > 0) > wcslcat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); > > I.e. you append n spaces and n is the difference between max and current. With wcsncat I would agree, but not with wcslcat. wcslcat works like strlcat: to quote the manpage: "It will append at most dstsize - strlen(dst) - 1 characters." So here with your version it will be n - wcslen(wab_months[i]) -1 which won't fit what we want to do. btw that makes me figure out that what I want is wcsncat because we do care to make sure we have appended the right number of spaces at the end of the abbreviated month based on character width, not the on the len of the wide string so I changed it if ((n = max_month_width - wab_months_width[i]) > 0) wcsncat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); If you do not have further comments, I will commit that later today. Then I will work on libarchive and pax (would you like to review the patches for those?) Best regards, Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On Wed, Nov 25, 2015 at 04:31:21AM +0300, Andrey Chernov wrote: > On 25.11.2015 3:15, Baptiste Daroussin wrote: > > On Sat, Nov 21, 2015 at 11:57:46PM +0300, Andrey Chernov wrote: > >> On 21.11.2015 15:18, Ed Schouten wrote: > >>> Hi Baptiste, > >>> > >>> I suppose you should use the wcswidth() function somewhere to compute > >>> the visible width of the month name. Some characters may be > >>> double-width, others may have no effective width at all. > >>> > >> > >> I agree. Checking error return of wide chars functions with some > >> fallback will be good too. > > > > I have updated the code https://reviews.freebsd.org/D4239 > > > > Tested by modifying some locales to add double width and zero width unicode > > in > > the locales > > > > Also added the error checking for the return of wide chars functions. For > > now I > > haven't added fallback, suggestions welcome if needed. > > 1) For just 1 char in wcswidth(&wab_months[i][j], 1); it is better to > use another function wcwidth(wab_months[i][j]); done > > 2) By fallback I mean something which not stops ls working with > incorrect for some reason locale, like setting max_width_month to > MAX_ABMON_WIDTH on error return (from > mbstowcs/wcwidth/wcswidth/wcswidth) and exit from > populate_abbreviated_month(). Not that easy to provide a fallback (or better to say I can't find a proper way) it if fails on mbstocws then ab_months is not populated so unusable. What I did for now is set max_month_width to -1 and in ls_strftime I fallback on the plain strftime meaning you keep localized information but the alignement is broken as of now. > > 3) wcwidth/wcswidth may return -1 too, it needs to be checked too. done and truncate the name of the month to the latest valid character > > 4) The whole processing looks overcomplicated and not effective. What > about this instead? > for (i = 0; i < 12; i++) { > count wcswidth() of each month and store it in wab_months_width[]. > count max_width_month. > } > for (i = 0; i < 12; i++) { > if ((n = max_width_month - wab_months_width[i]) > 0) > call wcscat(wab_months[i], L" ") n times. > } Done, along with your optimisation from your next mail > > 5) If there is no %b is strftime() format, there is no sense to spend > CPU cycles on from populate_abbreviated_month(), so it should be called > only once inside ls_strftime() on first %b instead of calling it in > printtime() for all cases. done Review updated (if you prefer a diff by mail just tell me, given you do not have a phabricator account.) Thanks for your patience! and reviews! Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 4:31, Andrey Chernov wrote: > 4) The whole processing looks overcomplicated and not effective. What > about this instead? > for (i = 0; i < 12; i++) { > count wcswidth() of each month and store it in wab_months_width[]. > count max_width_month. > } > for (i = 0; i < 12; i++) { > if ((n = max_width_month - wab_months_width[i]) > 0) > call wcscat(wab_months[i], L" ") n times. > } Last line can be optimized further: wcslcat(wab_months[i], L" "/* MAX_ABMON_WIDTH */, n); -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On 25.11.2015 3:15, Baptiste Daroussin wrote: > On Sat, Nov 21, 2015 at 11:57:46PM +0300, Andrey Chernov wrote: >> On 21.11.2015 15:18, Ed Schouten wrote: >>> Hi Baptiste, >>> >>> I suppose you should use the wcswidth() function somewhere to compute >>> the visible width of the month name. Some characters may be >>> double-width, others may have no effective width at all. >>> >> >> I agree. Checking error return of wide chars functions with some >> fallback will be good too. > > I have updated the code https://reviews.freebsd.org/D4239 > > Tested by modifying some locales to add double width and zero width unicode in > the locales > > Also added the error checking for the return of wide chars functions. For now > I > haven't added fallback, suggestions welcome if needed. 1) For just 1 char in wcswidth(&wab_months[i][j], 1); it is better to use another function wcwidth(wab_months[i][j]); 2) By fallback I mean something which not stops ls working with incorrect for some reason locale, like setting max_width_month to MAX_ABMON_WIDTH on error return (from mbstowcs/wcwidth/wcswidth/wcswidth) and exit from populate_abbreviated_month(). 3) wcwidth/wcswidth may return -1 too, it needs to be checked too. 4) The whole processing looks overcomplicated and not effective. What about this instead? for (i = 0; i < 12; i++) { count wcswidth() of each month and store it in wab_months_width[]. count max_width_month. } for (i = 0; i < 12; i++) { if ((n = max_width_month - wab_months_width[i]) > 0) call wcscat(wab_months[i], L" ") n times. } 5) If there is no %b is strftime() format, there is no sense to spend CPU cycles on from populate_abbreviated_month(), so it should be called only once inside ls_strftime() on first %b instead of calling it in printtime() for all cases. -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On Wed, Nov 25, 2015 at 01:15:13AM +0100, Baptiste Daroussin wrote: > On Sat, Nov 21, 2015 at 11:57:46PM +0300, Andrey Chernov wrote: > > On 21.11.2015 15:18, Ed Schouten wrote: > > > Hi Baptiste, > > > > > > I suppose you should use the wcswidth() function somewhere to compute > > > the visible width of the month name. Some characters may be > > > double-width, others may have no effective width at all. > > > > > > > I agree. Checking error return of wide chars functions with some > > fallback will be good too. > > I have updated the code https://reviews.freebsd.org/D4239 > > Tested by modifying some locales to add double width and zero width unicode in > the locales > > Also added the error checking for the return of wide chars functions. For now > I > haven't added fallback, suggestions welcome if needed. > > Best regards, > Bapt Actually I can make the fallback on the C locale in case of failure. Would that work for you? Best regards, Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On Sat, Nov 21, 2015 at 11:57:46PM +0300, Andrey Chernov wrote: > On 21.11.2015 15:18, Ed Schouten wrote: > > Hi Baptiste, > > > > I suppose you should use the wcswidth() function somewhere to compute > > the visible width of the month name. Some characters may be > > double-width, others may have no effective width at all. > > > > I agree. Checking error return of wide chars functions with some > fallback will be good too. I have updated the code https://reviews.freebsd.org/D4239 Tested by modifying some locales to add double width and zero width unicode in the locales Also added the error checking for the return of wide chars functions. For now I haven't added fallback, suggestions welcome if needed. Best regards, Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On 21.11.2015 15:18, Ed Schouten wrote: > Hi Baptiste, > > I suppose you should use the wcswidth() function somewhere to compute > the visible width of the month name. Some characters may be > double-width, others may have no effective width at all. > I agree. Checking error return of wide chars functions with some fallback will be good too. -- http://ache.vniz.net/ ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: /bin/ls formatting broken for non-C(?) locales
Hi Baptiste, I suppose you should use the wcswidth() function somewhere to compute the visible width of the month name. Some characters may be double-width, others may have no effective width at all. -- Ed Schouten Nuxi, 's-Hertogenbosch, the Netherlands KvK-nr.: 62051717 ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: /bin/ls formatting broken for non-C(?) locales
On Sat, Nov 21, 2015 at 03:45:29AM +0300, Andrey Chernov wrote: > On 21.11.2015 3:35, Baptiste Daroussin wrote: > > > Here is what I do propose (sorry for the ugly pad_to_col name, if one has > > better > > please share :D > > > > https://reviews.freebsd.org/D4239 > > The whole function is ugly, not only its name. Please no hardcoded > constants assuming some internal encoding knowledge, they are wrong for > non-UTF-8 encodings in any case, use wide chars instead. > > BTW, the same 3 chars restriction is in tar, cpio, pax, lots of ftp > clients, i.e. where 'ls' emulated. > Updated to use wide char functions. tested with ru_RU.UTF-8 and ru_RU.KOI8-R, fr_FR.UTF-8, ja_JP.eucJP and some chinese locales. Concerning tar and cpio, I can probably push the same thing into bsdtar (which will fix the same issue they also have on linux), ftp is safe, pax is not and could probably be fixed as well. FYI on linux only ls has been fixed. gnutar uses a different format for its output. Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On 21.11.2015 3:35, Baptiste Daroussin wrote: > Here is what I do propose (sorry for the ugly pad_to_col name, if one has > better > please share :D > > https://reviews.freebsd.org/D4239 The whole function is ugly, not only its name. Please no hardcoded constants assuming some internal encoding knowledge, they are wrong for non-UTF-8 encodings in any case, use wide chars instead. BTW, the same 3 chars restriction is in tar, cpio, pax, lots of ftp clients, i.e. where 'ls' emulated. -- http://ache.vniz.net/ signature.asc Description: OpenPGP digital signature
Re: /bin/ls formatting broken for non-C(?) locales
On Fri, Nov 20, 2015 at 01:23:52PM +0100, Jilles Tjoelker wrote: > On Fri, Nov 20, 2015 at 12:02:13PM +0100, Baptiste Daroussin wrote: > > On Fri, Nov 20, 2015 at 11:42:53AM +0100, Baptiste Daroussin wrote: > > > On Fri, Nov 20, 2015 at 11:05:56AM +0300, Sergey V. Dyatko wrote: > > > > subj. http://i.imgur.com/F9QO29l.png > > > > it is on head@r290573: > > > > WTR: > > > > env LC_ALL=uk_UA.UTF-8 ls -la /usr/ports/databases/ or env > > > > LC_ALL=ru_RU.UTF-8 > > > > ls -la /usr/ports/databases/ > > > > > env LC_ALL=C ls -la /usr/ports/databases/ works fine > > > > also on old stable/10 (r286868) as I can see 'month' field length 3 > > > > symbols > > > > Thanks for reporting, I can reproduce the issue with some other > > > locales. The thing is there seems to be no standard for abbreviated > > > length. Formerly we had a 3 character lenght for abbreviated month. > > > > We now use CLDR which seems to follow the abbreviated rules from IBM: > > > "Each string must be of equal length and contain 5 characters or less" > > > > There are 2 possible fixes: either always pad those in the locale > > > definition which seems wrong or modify ls so that it by itself pads > > > properly. > > > > Neither posix nor ISO-14652 defines the length of the abbreviated form > > > > padding in the locales themself would be wrong so I do propose to > > > pad in the ls command. And padding with 5 characters. > > > For the record glibc/linux had the same problem: > > https://sourceware.org/bugzilla/show_bug.cgi?id=9859 > > > "fixed" in coreutils (gnu ls) the way I propose to do for us > > http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=612b647dd16d5abc03b295abe42d8b4a0fe660f7 > > Coreutils fixed it slightly better: it calculates the maximum width of > the abbreviated month names and pads to that (with a maximum of 5). In > particular, this ensures that the output does not change for locales > that have 3-character abbreviations, such as the POSIX locale. I think > this is valuable. > > They also keep the list of month names from this calculation and they > say this speeds up ls noticeably compared to having strftime() expand %b > for every file. Here is what I do propose (sorry for the ugly pad_to_col name, if one has better please share :D https://reviews.freebsd.org/D4239 Best regards, Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On Fri, Nov 20, 2015 at 12:02:13PM +0100, Baptiste Daroussin wrote: > On Fri, Nov 20, 2015 at 11:42:53AM +0100, Baptiste Daroussin wrote: > > On Fri, Nov 20, 2015 at 11:05:56AM +0300, Sergey V. Dyatko wrote: > > > subj. http://i.imgur.com/F9QO29l.png > > > it is on head@r290573: > > > WTR: > > > env LC_ALL=uk_UA.UTF-8 ls -la /usr/ports/databases/ or env > > > LC_ALL=ru_RU.UTF-8 > > > ls -la /usr/ports/databases/ > > > env LC_ALL=C ls -la /usr/ports/databases/ works fine > > > also on old stable/10 (r286868) as I can see 'month' field length 3 > > > symbols > > Thanks for reporting, I can reproduce the issue with some other > > locales. The thing is there seems to be no standard for abbreviated > > length. Formerly we had a 3 character lenght for abbreviated month. > > We now use CLDR which seems to follow the abbreviated rules from IBM: > > "Each string must be of equal length and contain 5 characters or less" > > There are 2 possible fixes: either always pad those in the locale > > definition which seems wrong or modify ls so that it by itself pads > > properly. > > Neither posix nor ISO-14652 defines the length of the abbreviated form > > padding in the locales themself would be wrong so I do propose to > > pad in the ls command. And padding with 5 characters. > For the record glibc/linux had the same problem: > https://sourceware.org/bugzilla/show_bug.cgi?id=9859 > "fixed" in coreutils (gnu ls) the way I propose to do for us > http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=612b647dd16d5abc03b295abe42d8b4a0fe660f7 Coreutils fixed it slightly better: it calculates the maximum width of the abbreviated month names and pads to that (with a maximum of 5). In particular, this ensures that the output does not change for locales that have 3-character abbreviations, such as the POSIX locale. I think this is valuable. They also keep the list of month names from this calculation and they say this speeds up ls noticeably compared to having strftime() expand %b for every file. -- Jilles Tjoelker ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: /bin/ls formatting broken for non-C(?) locales
On Fri, Nov 20, 2015 at 11:42:53AM +0100, Baptiste Daroussin wrote: > On Fri, Nov 20, 2015 at 11:05:56AM +0300, Sergey V. Dyatko wrote: > > Hi, > > > > subj. http://i.imgur.com/F9QO29l.png > > it is on head@r290573: > > WTR: > > env LC_ALL=uk_UA.UTF-8 ls -la /usr/ports/databases/ or env > > LC_ALL=ru_RU.UTF-8 > > ls -la /usr/ports/databases/ > > > > env LC_ALL=C ls -la /usr/ports/databases/ works fine > > also on old stable/10 (r286868) as I can see 'month' field length 3 > > symbols > > > Thanks for reporting, I can reproduce the issue with some other locales. The > thing is there seems to be no standard for abbreviated length. Formerly we > had a > 3 character lenght for abbreviated month. > > We now use CLDR which seems to follow the abbreviated rules from IBM: > "Each string must be of equal length and contain 5 characters or less" > > There are 2 possible fixes: either always pad those in the locale definition > which seems wrong or modify ls so that it by itself pads properly. > > Neither posix nor ISO-14652 defines the length of the abbreviated form > > padding in the locales themself would be wrong so I do propose to pad in the > ls > command. And padding with 5 characters. > > Best regards, > Bapt For the record glibc/linux had the same problem: https://sourceware.org/bugzilla/show_bug.cgi?id=9859 "fixed" in coreutils (gnu ls) the way I propose to do for us http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=612b647dd16d5abc03b295abe42d8b4a0fe660f7 Best regards, Bapt signature.asc Description: PGP signature
Re: /bin/ls formatting broken for non-C(?) locales
On Fri, Nov 20, 2015 at 11:05:56AM +0300, Sergey V. Dyatko wrote: > Hi, > > subj. http://i.imgur.com/F9QO29l.png > it is on head@r290573: > WTR: > env LC_ALL=uk_UA.UTF-8 ls -la /usr/ports/databases/ or env LC_ALL=ru_RU.UTF-8 > ls -la /usr/ports/databases/ > > env LC_ALL=C ls -la /usr/ports/databases/ works fine > also on old stable/10 (r286868) as I can see 'month' field length 3 symbols > Thanks for reporting, I can reproduce the issue with some other locales. The thing is there seems to be no standard for abbreviated length. Formerly we had a 3 character lenght for abbreviated month. We now use CLDR which seems to follow the abbreviated rules from IBM: "Each string must be of equal length and contain 5 characters or less" There are 2 possible fixes: either always pad those in the locale definition which seems wrong or modify ls so that it by itself pads properly. Neither posix nor ISO-14652 defines the length of the abbreviated form padding in the locales themself would be wrong so I do propose to pad in the ls command. And padding with 5 characters. Best regards, Bapt signature.asc Description: PGP signature
/bin/ls formatting broken for non-C(?) locales
Hi, subj. http://i.imgur.com/F9QO29l.png it is on head@r290573: WTR: env LC_ALL=uk_UA.UTF-8 ls -la /usr/ports/databases/ or env LC_ALL=ru_RU.UTF-8 ls -la /usr/ports/databases/ env LC_ALL=C ls -la /usr/ports/databases/ works fine also on old stable/10 (r286868) as I can see 'month' field length 3 symbols -- wbr, tiger ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"