On Sun, Aug 03, 2014 at 11:50:29PM -0600, James Bowlin wrote:
> I run busybox in an initrd (initramfs) environment using both
> legacy Grub and isolinux as boot loaders. I want to be able to
> get the correct length of unicode strings in characters, not
> bytes. I always have these two options set:
>
> CONFIG_UNICODE_SUPPORT=y
> CONFIG_UNICODE_WIDE_WCHARS=y
>
> and I've played with the 3 combinations of:
>
> CONFIG_UNICODE_USING_LOCALE
> CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
>
> both off, and one or the other on.
>
> At best I get very erratic results that depend on the value of the
> LANG variable when the first busybox shell (/init script) starts
> and it seems to be immune to later using "export LANG=xxx".
>
> One of the things I've tried is:
>
> echo -n "$x" | LANG=en_US.UTF-8 sed 's/./x/g' | wc -c
>
> This works *sometimes* but seems to depend on the value of LANG
> when the first busybox shell (/init script) is started but it has
> been flaky at best. Also, I don't have absolute control over
> that initial value of LANG because it can be set by users with
> a "lang=xxx" boot parameter.
>
> Unicode strings always print fine. I'm just struggling with
> getting the length of a string that has unicode characters.
> Using ${#x} has always failed. So has "wc -c" which is what led
> me to the sed trick above.
>
> The code above to get the length always seems to work
> consistently in my development environment (using the right
> busybox .config). Oddly enough it fails in my development
> environment when CONFIG_FEATURE_CHECK_UNICODE_IN_ENV=y which is
> exactly the opposite of what I would expect.

This option is utterly broken and should never be used. It searches
for the string ".utf" or ".UTF" in $LANG to determine if UTF-8 should
be enabled. There is no reason that this string needs to appear in the
name of a locale for the locale to be UTF-8-based (plenty of locales
have no legacy encoding). This option should really be removed from
Busybox, or at least get a big warning slapped on it that it's broken
and doesn't do what it should.

The correct (and only correct) way to determine if UTF-8 should be
used is to call setlocale(LC_CTYPE, "") and then check
nl_langinfo(CODESET).

> it seems to mostly fail when I chroot into a plain busybox
> environment or inside my initrd (initramfs) during boot.

If you're having trouble even with CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
turned off and CONFIG_UNICODE_USING_LOCALE turned on, please let me
know which libc you're using (this could matter too) and, if it's
uClibc, whether you have locale properly enabled for it.

Rich
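
P.S. For illustration, here is a minimal sketch of the
setlocale/nl_langinfo check described above, plus one way to count
characters rather than bytes once the locale is set. This is not
Busybox code; the helper names (init_locale_from_env, locale_is_utf8,
char_length) are hypothetical, and it assumes a libc with working
locale support that reports the codeset as "UTF-8".

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: select the LC_CTYPE locale from the environment
 * (LC_ALL, LC_CTYPE, LANG). Returns 1 on success, 0 if the requested
 * locale is not supported by the libc. */
int init_locale_from_env(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: returns 1 if the currently selected LC_CTYPE
 * locale uses UTF-8, judged by the codeset rather than by the locale
 * name. Assumes the libc spells the codeset "UTF-8". */
int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/* Hypothetical helper: number of characters (not bytes) in s under the
 * current LC_CTYPE locale. Returns (size_t)-1 if s contains a byte
 * sequence that is not valid in that locale. */
size_t char_length(const char *s)
{
    /* With a NULL destination, mbstowcs only counts the wide
     * characters the conversion would produce. */
    return mbstowcs(NULL, s, 0);
}
```

A caller would run init_locale_from_env() once at startup and then use
char_length() on each string. If LANG is changed afterwards, the
process keeps its old locale until setlocale() is called again.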