Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com wrote: its the same with bionic libc (arm) printf(test) is ok but not printf(buf) with char buf[] = test; printf(%s, buf) is ok Yeah, I guess it is about personal preference. I personally do not like the extended code just to make some smart option silent. IMHO, it makes the code needlessly longer by adding another small layer for the content of the variable. If there is no need for formatting, why use printf in the first place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On the other hand, it is too bad that there is nothing between fputs and puts where you do not need to use the file descriptor and you do not get a new line either. In principle stdout is more expensive than . stdout is a GOT reference in PIC, and is just PC-relative. For non-PIC it makes no difference though. Then my PC is probably lying since the practice does seem to disagree with you, apparently as well as this post: http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 (and the one below, ok, it is not critical, but it performs better for others, too) Raw strings are also uglier IMHO than well-defined identifiers, like stdout, but that is personal taste, so I do not mind that one. printf() is almost always going to call vfprintf(stdout, fmt, ap); I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only exception I've found, and it's ugly and does not have locking/retry/... newlib calls _vfprintf_r, which is almost the same concept. Meanwhile, printf has to do the formatting. In other words: Just because printf(%s, string) doesn't name stdout, don't assume that it doesn't use a reference to stdout. So an ACK for Laszlo Papp's recommendation of fputs(). Thanks, Isaac Dunham ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
On Wed, Aug 13, 2014 at 7:06 PM, Harald Becker ra...@gmx.de wrote: The world seems to be standardizing on utf-8. Thank God, supporting gazillion of encodings is no fun. You say this, but libbb/unicode.c contains a unicode_strlen calling this complex mb to wc conversion function to count the number of characters. Those multi byte functions tend to be highly complex and slow (don't know if they have gone better). For just UTF-8, things can be optimized. bbox does have unicode-only implementation of mbstowc. See unicode.c size_t utf8len( const char* s ) { size_t n = 0; while (*s) if ((*s++ ^ 0x40) 0xC0) n++; return n; } size_t mystrlen( const char* s ) { return utf8_enabled ? utf8len(s) : strlen(s); } This looks more, but avoids inclusion of mb function. Most compiler shall produce fast code for utf8len. There are situations where you need to do tons of unicode_strlen() and you can tolerate getting wrong results on broken Unicode. Then a function similar to yours can be very useful. -- vda ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
Hi Denys, my new system gets to a usable state, so I'm able to do more then just writing mails ... 4) applet printf, string formats %Ns Does this N mean character positions or bytes. The underlying C printf used to work with bytes for decades. The man page talks about character positions, but printf from bash uses bytes. Also, printf needs to support \u Do you like getting a patch for BB_process_escape_sequence to add Unicode sequences \u? -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
In bionic libc (android), wchar functions are stubs, mb functions also, the MAX_MB_LEN is set to 1. For wide chars, all chars are copied in the first byte of uint32_t wchar_t. So, all mbtowc, wctomb and mbstombs functions are useless. but the tools vim and lv are working perfectly with their own convertion functions, but these tools are big (1MB on arm, bb is only 550K in comparaison, 750K statically linked to the libc/libm/libselinux) references : https://github.com/tpruvot/android_external_vim https://github.com/tpruvot/android_external_lv https://github.com/tpruvot/android_external_busybox 2014-08-15 13:32 GMT+02:00 Harald Becker ra...@gmx.de: Hi Denys, my new system gets to a usable state, so I'm able to do more then just writing mails ... 4) applet printf, string formats %Ns Does this N mean character positions or bytes. The underlying C printf used to work with bytes for decades. The man page talks about character positions, but printf from bash uses bytes. Also, printf needs to support \u Do you like getting a patch for BB_process_escape_sequence to add Unicode sequences \u? -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Fix the addgroup help output
On Wed, Aug 13, 2014 at 2:25 PM, Laszlo Papp lp...@kde.org wrote: On Wed, Aug 13, 2014 at 12:54 PM, tito farmat...@tiscali.it wrote: On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote: On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp lp...@kde.org wrote: commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a Author: Laszlo Papp lp...@kde.org Date: Wed Aug 13 09:48:08 2014 +0100 Fix the addgroup help output Since the applet has two options, it is quite misleading to only mention one in the usage example. It should either use OPTIONS there or enumerate the possible options. The latter seems to be more common in applet, so I picked that one. diff --git a/loginutils/addgroup.c b/loginutils/addgroup.c index 22cd0e6..ce57459 100644 --- a/loginutils/addgroup.c +++ b/loginutils/addgroup.c @@ -11,7 +11,7 @@ */ //usage:#define addgroup_trivial_usage -//usage: [-g GID] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP +//usage: [-gS] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP //usage:#define addgroup_full_usage \n\n //usage: Add a group IF_FEATURE_ADDUSER_TO_GROUP(or add a user to a group) \n //usage: \n -g GID Group id Hmm, actually, I got some doubts whether this change is useful because while it fixes one issue, the applets seem randomly inconsistent, e.g. this is not holding any consistency with any other either: Usage: crond -fbS -l N -d N -L LOGFILE -c DIR At best, it is missing the brackets, but the applets are inconsistent at large overall, sadly, so I am not sure it is any improvement to fix this one on its own, and fixing all would be more intrusive than I care. Hi, maybe is more easy to understand as it shows which option needs a argument. +//usage: [-g GID][-S] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP Well, it is all about personal preferences, so whatever Denys wishes as the maintainer, but the more important objective issue is outlined above. I fixed help text by adding [-S]. Why applets are doing this on their own is beyond my league other than hysterical raisins... I didn't understand this part. Why applets are doing /what/ on their own? ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Fix the addgroup help output
On Fri, Aug 15, 2014 at 1:31 PM, Denys Vlasenko vda.li...@googlemail.com wrote: On Wed, Aug 13, 2014 at 2:25 PM, Laszlo Papp lp...@kde.org wrote: On Wed, Aug 13, 2014 at 12:54 PM, tito farmat...@tiscali.it wrote: On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote: On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp lp...@kde.org wrote: commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a Author: Laszlo Papp lp...@kde.org Date: Wed Aug 13 09:48:08 2014 +0100 Fix the addgroup help output Since the applet has two options, it is quite misleading to only mention one in the usage example. It should either use OPTIONS there or enumerate the possible options. The latter seems to be more common in applet, so I picked that one. diff --git a/loginutils/addgroup.c b/loginutils/addgroup.c index 22cd0e6..ce57459 100644 --- a/loginutils/addgroup.c +++ b/loginutils/addgroup.c @@ -11,7 +11,7 @@ */ //usage:#define addgroup_trivial_usage -//usage: [-g GID] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP +//usage: [-gS] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP //usage:#define addgroup_full_usage \n\n //usage: Add a group IF_FEATURE_ADDUSER_TO_GROUP(or add a user to a group) \n //usage: \n -g GID Group id Hmm, actually, I got some doubts whether this change is useful because while it fixes one issue, the applets seem randomly inconsistent, e.g. this is not holding any consistency with any other either: Usage: crond -fbS -l N -d N -L LOGFILE -c DIR At best, it is missing the brackets, but the applets are inconsistent at large overall, sadly, so I am not sure it is any improvement to fix this one on its own, and fixing all would be more intrusive than I care. Hi, maybe is more easy to understand as it shows which option needs a argument. +//usage: [-g GID][-S] IF_FEATURE_ADDUSER_TO_GROUP([USER] ) GROUP Well, it is all about personal preferences, so whatever Denys wishes as the maintainer, but the more important objective issue is outlined above. I fixed help text by adding [-S]. Why applets are doing this on their own is beyond my league other than hysterical raisins... I didn't understand this part. Why applets are doing /what/ on their own? As mentioned in the thread above, applets are defining the help display text on their own, and it is different for different applets. There is no centralized common solution to this problem. Every little applet is doing its own way as they wish. Also, it will be difficult to change at many places if you happen to need to.. centralization would rock here imho. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Fix the addgroup help output
On Fri, Aug 15, 2014 at 5:12 PM, Denys Vlasenko vda.li...@googlemail.com wrote: On Fri, Aug 15, 2014 at 3:11 PM, lp...@archlinux.us wrote: Applets are defining the help display text on their own, and it is different for different applets. Also, it will be difficult to change at many places if you happen to need to.. centralization would rock here imho... Centralization used to rock, when all help texts were in include/usage.src.h (then named include/usage.h). It was easy to see all help texts in one place, but it was a complete nightmare to keep applet and its help text in sync: you needed to jump between two different files in order to check what every option does in code, and what help text says. People were routinely forgetting to update help text, because it was in separate file. Adding new applet needed editing many files. You can see all help texts in one place after a build: it's in generated file, include/usage.h; or you can just run auxiliary executable: applets/usage to have it on stdout. That is not what I meant. What I meant is centralizing the formatting output. That is, to have some common functions to which you pass your options and the functions will output a standardized format. I did not mean to put all the custom code into one place as a dump. That would be indeed worse than the current, I agree. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote: On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com wrote: its the same with bionic libc (arm) printf(test) is ok but not printf(buf) with char buf[] = test; printf(%s, buf) is ok Yeah, I guess it is about personal preference. I personally do not like the extended code just to make some smart option silent. IMHO, it makes the code needlessly longer by adding another small layer for the content of the variable. If there is no need for formatting, why use printf in the first place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On the other hand, it is too bad that there is nothing between fputs and puts where you do not need to use the file descriptor and you do not get a new line either. In principle stdout is more expensive than . stdout is a GOT reference in PIC, and is just PC-relative. For non-PIC it makes no difference though. Then my PC is probably lying since the practice does seem to disagree with you, apparently as well as this post: http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 (and the one below, ok, it is not critical, but it performs better for others, too) Raw strings are also uglier IMHO than well-defined identifiers, like stdout, but that is personal taste, so I do not mind that one. printf() is almost always going to call vfprintf(stdout, fmt, ap); I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only exception I've found, and it's ugly and does not have locking/retry/... newlib calls _vfprintf_r, which is almost the same concept. Meanwhile, printf has to do the formatting. In other words: Just because printf(%s, string) doesn't name stdout, don't assume that it doesn't use a reference to stdout. So an ACK for Laszlo Papp's recommendation of fputs(). I was talking about code size overhead, not run time overhead. Of course there's a reference to stdout in printf. When you reference it again yourself, that's a second reference. It's not a big deal but apparently busybox folks care about this kind of micro-optimization... Rich ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
On Fri, Aug 15, 2014 at 12:31:15AM +0200, Harald Becker wrote: and how want you behave in case of invalid UTF-8 sequences? My functions just skip over stray codes of 0x80..0xBF and synchronize on next valid UTF-8 leading byte. How would you count invalid sequences? In general, I would count the whole operation as a failure, returning some value such as -1 reserved for failure, since the string is not actually UTF-8 and thus how many characters? has no meaning. For specific uses, there might be other preferred behaviors. If your goal is display, you may want to simply replace illegal sequences with U+FFFD in which case you'd count each such sequence as 1, but if you're using this character-counting to allocate a buffer for the converted string, you need to be sure your conversion function and character-counting function agree on how illegal sequences are counted, or you might overflow your buffer or end up having to truncate the output. Rich, will you ever use the result of counting the numbers of UTF-8 characters to allocate a buffer? I don't think so. That would be very ill behavior. To allocate buffer space you need the number of bytes occupied by a string, not the number of UTF-8 characters. If your intent is to convert the string to UTF-32/wchar_t/whatever, then yes, you use the result for allocating a buffer. In my mind that's the main point of counting characters (since otherwise you usually care about either bytes, for storage, or columns, for presentation), and while I personally consider it better to work character-at-a-time and keep the string as UTF-8, some APIs require a string in a different format, especially ones that work with a whole string and prepare it for visual presentation. The main other place counting characters makes sense is for implementing languages that do substring operations with character indexes, which I think is the one you care about. So the big question is: Is there anybody who still needs the BB internal Unicode handling and can't use the locale functions of a libc. Why and for what purpose is this needed? In which environment? I think the intent was to let uClibc users (and possibly eglibc users?) omit locale support from the libc, which reduces libc size quite a bit, and use the UTF-8 code in busybox instead. As far as I know, the beginning of those BB internal functions, where at times where only glibc had locale support and there where no alternatives for small environments. But things changed and there are now alternatives. So have we reached a point, where we are able to simplify things in BB (which means to focus on correct mb function usage everywhere and to strip unnecessary decisions, configs and helper code)? I wouldn't object to this change. Rich ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Fri, Aug 15, 2014 at 6:31 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote: On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com wrote: its the same with bionic libc (arm) printf(test) is ok but not printf(buf) with char buf[] = test; printf(%s, buf) is ok Yeah, I guess it is about personal preference. I personally do not like the extended code just to make some smart option silent. IMHO, it makes the code needlessly longer by adding another small layer for the content of the variable. If there is no need for formatting, why use printf in the first place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On the other hand, it is too bad that there is nothing between fputs and puts where you do not need to use the file descriptor and you do not get a new line either. In principle stdout is more expensive than . stdout is a GOT reference in PIC, and is just PC-relative. For non-PIC it makes no difference though. Then my PC is probably lying since the practice does seem to disagree with you, apparently as well as this post: http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 (and the one below, ok, it is not critical, but it performs better for others, too) Raw strings are also uglier IMHO than well-defined identifiers, like stdout, but that is personal taste, so I do not mind that one. printf() is almost always going to call vfprintf(stdout, fmt, ap); I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only exception I've found, and it's ugly and does not have locking/retry/... newlib calls _vfprintf_r, which is almost the same concept. Meanwhile, printf has to do the formatting. In other words: Just because printf(%s, string) doesn't name stdout, don't assume that it doesn't use a reference to stdout. So an ACK for Laszlo Papp's recommendation of fputs(). I was talking about code size overhead, not run time overhead. Of course there's a reference to stdout in printf. When you reference it again yourself, that's a second reference. It's not a big deal but apparently busybox folks care about this kind of micro-optimization... Why not care if it is better and it has no disadvantage? The size complaint does not matter since that size overhead is already in the lib pointed out by Isaac. It will not automagically know to print to stdout. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Fri, Aug 15, 2014 at 06:39:39PM +0100, Laszlo Papp wrote: On Fri, Aug 15, 2014 at 6:31 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote: On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote: On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote: On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: If there is no need for formatting, why use printf in the first place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On the other hand, it is too bad that there is nothing between fputs and puts where you do not need to use the file descriptor and you do not get a new line either. In principle stdout is more expensive than . stdout is a GOT reference in PIC, and is just PC-relative. For non-PIC it makes no difference though. Then my PC is probably lying since the practice does seem to disagree with you, apparently as well as this post: http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 (and the one below, ok, it is not critical, but it performs better for others, too) Raw strings are also uglier IMHO than well-defined identifiers, like stdout, but that is personal taste, so I do not mind that one. printf() is almost always going to call vfprintf(stdout, fmt, ap); I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only exception I've found, and it's ugly and does not have locking/retry/... newlib calls _vfprintf_r, which is almost the same concept. Meanwhile, printf has to do the formatting. In other words: Just because printf(%s, string) doesn't name stdout, don't assume that it doesn't use a reference to stdout. So an ACK for Laszlo Papp's recommendation of fputs(). I was talking about code size overhead, not run time overhead. Of course there's a reference to stdout in printf. When you reference it again yourself, that's a second reference. It's not a big deal but apparently busybox folks care about this kind of micro-optimization... Why not care if it is better and it has no disadvantage? The size complaint does not matter since that size overhead is already in the lib pointed out by Isaac. It will not automagically know to print to stdout. If I understand correctly, Rich is saying that every use of fputs will add more to the size of the binary. I did a simple test using the following code: fput.c: #include stdio.h int main() { char *test = TEST...; fputs(test, stdout); } print.c: #include stdio.h int main() { char *test = TEST...; printf(%s, test); } Then I built the objects: make fput.o print.o Per ls, fput.o is 24 bytes larger (1436 vs 1412 bytes). Per size, print.o is 1 byte larger (182 vs 183 bytes text, 0 data/bss). Build with CFLAGS=-Os -g0, and both grow to 1548 bytes on disk; size of fput.o is 198, while print.o is 200. For the dynamically linked binaries built with CFLAGS=-Os -g0, ls and size are 8733/1969 bytes for fput and 8712/1933 for print; both are 5236 bytes on disk after running strip. Thanks, Isaac Dunham ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox