On Sun, Aug 03, 2014 at 11:50:29PM -0600, James Bowlin wrote:
> I run busybox in an initrd (initramfs) environment using both
> legacy Grub and isolinux as boot loaders.  I want to be able to
> get the correct length of unicode strings in characters, not
> bytes.  I always have these two options set:
> 
>     CONFIG_UNICODE_SUPPORT=y
>     CONFIG_UNICODE_WIDE_WCHARS=y
> 
> and I've played with the 3 combinations of:
> 
>     CONFIG_UNICODE_USING_LOCALE
>     CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
> 
> both off, and one or the other on.
> 
> At best I get very erratic results that depend on the value of the
> LANG variable when the first busybox shell (/init script) starts
> and it seems to be immune to later using "export LANG=xxx".
> 
> One of the things I've tried is:
> 
>     echo -n "$x" | LANG=en_US.UTF-8 sed 's/./x/g' | wc -c
> 
> This works *sometimes* but seems to depend on the value of LANG
> when the first busybox shell (/init script) is started but it has
> been flakey at best.  Also, I don't have absolute control over
> that initial value of LANG because it can be set by users with
> a "lang=xxx" boot parameter.
> 
> Unicode strings always print fine.  I'm just struggling with
> getting the length of a string that has unicode characters.
> Using ${#x} has always failed.  So has "wc -c" which is what led
> me to the sed trick above.
> 
> The code above to get the length always seems to work
> consistently in my development environment (using the right
> busybox .config).  Oddly enough it fails in my development
> environment when CONFIG_FEATURE_CHECK_UNICODE_IN_ENV=y which is
> exactly the opposite of what I would expect.

CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is utterly broken and should never
be used. It searches for the string ".utf" or ".UTF" in $LANG to
determine whether UTF-8 should be enabled. Nothing requires that string
to appear in the name of a locale for the locale to be UTF-8-based
(plenty of locales have no legacy encoding).
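
To make the failure mode concrete, here is a rough sketch of what a
substring test like that amounts to. The function name
lang_name_looks_utf8 is made up for illustration, and this is not
copied from the Busybox source; it only mirrors the behaviour
described above.

```c
#include <string.h>

/* Hypothetical stand-in for the substring heuristic described above.
 * It inspects only the *name* of the locale, never the locale itself. */
int lang_name_looks_utf8(const char *lang)
{
    return lang != NULL &&
           (strstr(lang, ".utf") != NULL || strstr(lang, ".UTF") != NULL);
}
```

With this sketch, "en_US.UTF-8" and "C.utf8" are accepted, but a plain
"en_US" is rejected even on a system where that locale is UTF-8-based,
and a mixed-case spelling such as "en_US.Utf-8" is rejected as well.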

This option should really be removed from Busybox, or at least get a
big warning slapped on it that it's broken and doesn't do what it
should.

The correct (and only correct) way to determine if UTF-8 should be
used is to call setlocale(LC_CTYPE, "") and then check
nl_langinfo(CODESET).
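
A minimal sketch of that check, in case it helps. setlocale() and
nl_langinfo() are the standard/POSIX calls named above; the helper
names codeset_is_utf8 and env_locale_is_utf8 are my own, and accepting
the spelling "UTF8" in addition to "UTF-8" is an assumption to cover
libcs that might report the codeset without the hyphen.

```c
#define _POSIX_C_SOURCE 200809L /* expose nl_langinfo() and strcasecmp() */
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name denote UTF-8?
 * Accepting "UTF8" (no hyphen) is an assumption, not a requirement. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Hypothetical helper: returns 1 if the locale selected by the
 * environment (LC_ALL, LC_CTYPE, LANG) uses UTF-8, 0 otherwise. */
int env_locale_is_utf8(void)
{
    /* "" means: pick the locale from the environment variables. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0; /* the environment names a locale libc cannot load */

    /* Ask libc which character encoding that locale actually uses. */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would call env_locale_is_utf8() once at startup. Note that a
later "export LANG=xxx" in a shell only affects processes started
after it, since each process makes this decision for itself.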

> it seems to mostly fail when I chroot into a plain busybox
> environment or inside my initrd (initramfs) during boot.

If you're having trouble even with CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
turned off and CONFIG_UNICODE_USING_LOCALE turned on, please let me
know which libc you're using -- this could matter too -- and if it's
uClibc, whether you have locale properly enabled for it.

Rich