Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote: > On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker wrote: > > > On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: > > > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot > > > wrote: > > > > > > > its the same with bionic libc (arm) > > > > > > > > printf("test") is ok but not printf(buf) with char buf[] = "test"; > > > > printf("%s", buf) is ok > > > > > > > > > > Yeah, I guess it is about personal preference. I personally do not like > > the > > > extended code just to make some "smart" option silent. IMHO, it makes the > > > code needlessly longer by adding another small layer for the content of > > the > > > variable. If there is no need for formatting, why use printf in the first > > > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On > > the > > > other hand, it is too bad that there is nothing between fputs and puts > > > where you do not need to use the file descriptor and you do not get a new > > > line either. > > > > In principle stdout is more expensive than "". stdout is a GOT > > reference in PIC, and "" is just PC-relative. For non-PIC it makes no > > difference though. > > > > Then my PC is probably lying since the practice does seem to disagree with > you, apparently as well as this post: > > http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 > > (and the one below, ok, it is not critical, but it performs better for > others, too) > > Raw strings are also uglier IMHO than well-defined identifiers, like > stdout, but that is personal taste, so I do not mind that one. printf() is almost always going to call vfprintf(stdout, fmt, ap); I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only exception I've found, and it's ugly and does not have locking/retry/... newlib calls _vfprintf_r, which is almost the same concept. Meanwhile, printf has to do the formatting. In other words: Just because printf("%s", string) doesn't name stdout, don't assume that it doesn't use a reference to stdout. So an ACK for Laszlo Papp's recommendation of fputs(). Thanks, Isaac Dunham ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
Hi Rich, hi all, looks like we agree at most topics and tend to reach a point of a more philosophical discussion. There are only a few statements from you, I want to hop on: > It's not an "ad". Sorry Rich, forgot to add a smiley to the "ad" topic ;) ... I like your musl approach, except some very detailed decisions. >> and how want you behave in case of invalid UTF-8 sequences? My functions just skip over stray codes of 0x80..0xBF and synchronize on next valid UTF-8 leading byte. How would you count invalid sequences? In general, I would count the whole operation as a failure, returning some value such as -1 reserved for failure, since the string is not actually UTF-8 and thus "how many characters?" has no meaning. For specific uses, there might be other preferred behaviors. If your goal is display, you may want to simply replace illegal sequences with U+FFFD in which case you'd count each such sequence as "1", but if you're using this character-counting to allocate a buffer for the converted string, you need to be sure your conversion function and character-counting function agree on how illegal sequences are counted, or you might overflow your buffer or end up having to truncate the output. Rich, will you ever use the result of counting the numbers of UTF-8 characters to allocate a buffer? I don't think so. That would be very ill behavior. To allocate buffer space you need the number of bytes occupied by a string, not the number of UTF-8 characters. Beside this I prefer having really fast (and robust) functions for UTF-8, which may give somewhat incorrect result if input comes from an error prone source, but this result shall not break the program, as long they are used carefully. Otherwise you need to check for errors after each function call, which slows down operations additionally. I like it more the way, give the best result we can, even if things are broken, but continue with normal operation ... and at required points, it may be necessary to call utf8test() a function to test for validity, which neglects invalid UTF-8 strings. "Interaction with other code" was not about mixing your own pure UTF-8 functions with the standard C multibyte functions in possibly-non-UTF-8 locales. It was about mixing them with other code that's processing UTF-8 but handling errors differently. One such example would be the standard C multibyte functions when nl_langinfo(CODESET) has already been determined to be "UTF-8" (so you know they're processing UTF-8), but "pure UTF-8" code outside of the standard functions might also be handling errors differently from what you're doing, and mixing it with your handling _could_ be dangerous, depending on what you do. My functions are designed to be fast and robust. With error free UTF-8 they won't produce any errors, Otherwise they just try to give best result, even with invalid or damaged sequences. They just keep any sequence it has bean given, in the original order. If you mix those functions with other functions, the only convention beside not mixing UTF-8 with other multi byte codes is, those other functions has to be robust too, that is they don't have to be trapped by invalid sequences. So if you really like, you may freely mix my simplified UTF-8 functions with multi byte based UTF-8 processing which check every single character to be valid. The only failure would be to use unchecked UTF-8 strings for operations, which badly fail for invalid character sequences (not so much I know about). So I can't see where it get dangerous to mix my functions with others? I'm fine with assuming all data is nominally UTF-8. What's not fine is assuming that data which is nominally UTF-8 is actually valid UTF-8. I never say my functions need valid UTF-8. They just assume all data is either ASCII or nominally UTF-8, and try their best to operate on invalid sequences (either skipping or not breaking those illegal sequences). ... but this all is a philosophical discussion. BB started to use full multi byte locale functions, which is the more accurate way and even adds support for other multi byte character sets (may be welcome for everybody who needs them), so BB shall stay on this ... at least until one day everybody ask what this hole none UTF-8 stuff was about (whenever this will be). My only résumé is, BB shall disable all that Unicode/UTF-8 config stuff and always use full locale support of the library, giving information on how to configure/install known libs on a doc file. Either BB runs in full glibc environment, which has full multi byte locale functions, or BB may linked with a different lib which works at least for UTF-8 correct without any additional stuff to be installed. This would simplify configuration and code and will just rely on usage of a correct configured libc environment. So the big question is: Is there anybody who still needs the BB internal Unicode handling and can't use the locale func
Package to use busybox runit as PID 1 on Arch Linux
I've been using this for over a year now and I feel like I ought to let people know about it since everyone seems to be interested in PID 1 a lot these days. https://aur.archlinux.org/packages/runit-init/ (Mediocre) essay about it: https://woozle.org/~neale/papers/arch-runit.html I created the Arch package because I liked runit so much on a Buildroot-based system I made for running security contests at Defcon. Busybox runit is now PID 1 on every machine I administer. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
On Thu, Aug 14, 2014 at 09:09:02PM +0200, Harald Becker wrote: > Hi Rich! > > > I think uClibc is pretty fast at this too. It's glibc that's horribly > >slow. Rough comparison: > > Pretty fast is still slower than UTF-8 optimized functions. The standard functions certainly can be UTF-8-optimized, and they are in at least several implementations. I think glibc still has a pretty slow path to get to the UTF-8 decoding but hopefully that will be fixed eventually. I'll remind myself to pursue that in the future. > >For processing a full string buffer, musl is roughly twice as fast as > >uClibc, and uClibc is roughly twice as fast as glibc. > > You don't need to make ads for musl here, I' would like to see > prebuild versions of BB statically linked with musl. It's not an "ad". It's just pointing out that uClibc is probably not significantly slower for what you care about. My interpretation was that you trusted me that musl is fast here, but thought other more commonly used implementations might be slow, so I stated the relative speeds as a basis for evaluating that. > >Presumably you would use a full string operation here (mbstowcs with > >null output pointer) for computing length in characters. > > Do you remember my question only UTF-8 or full multi byte locale? It > is exactly this decision. The former may be optimized the later more > accurate. Yes I remember the question. Assuming the standard function has a fast path for UTF-8, which it should, the only reason to expect the standard multibyte functions to be significantly slower than your custom ones is that they detect illegal sequences rather than blindly assuming the input is valid. > >>If you know you are using UTF-8 you do not need to check every > >>string over and over again, else it's pure paranoia. It is robust, > >>as it will not run away on anything which is valid C string. > > > >Well if the string comes from a source outside of your control, you > >need to check it at least once. But you might not want to check and > >reject it at the original point of input, e.g. if you want to be able > >to preserve arbitrary byte sequences that might not be UTF-8, e.g. an > >argument that's a filename in an invalid encoding which you're trying > >to delete or rename to fix. So IMO it makes a lot more sense to do > >your checking at the point of treating the string as a sequence of > >characters, even if it happens multiple times. The cost is not high if > >your implementation is efficient. > > and how want you behave in case of invalid UTF-8 sequences? My > functions just skip over stray codes of 0x80..0xBF and synchronize > on next valid UTF-8 leading byte. How would you count invalid > sequences? In general, I would count the whole operation as a failure, returning some value such as -1 reserved for failure, since the string is not actually UTF-8 and thus "how many characters?" has no meaning. For specific uses, there might be other preferred behaviors. If your goal is display, you may want to simply replace illegal sequences with U+FFFD in which case you'd count each such sequence as "1", but if you're using this character-counting to allocate a buffer for the converted string, you need to be sure your conversion function and character-counting function agree on how illegal sequences are counted, or you might overflow your buffer or end up having to truncate the output. > >>>Of course it also gets tripped up badly on invalid sequences. > >> > >>How can it get tripped? It silently skip over invalid sequences (of > >>0x80 to 0xBF until next leading of a sequence). It shall not get > >>stuck in any way. Or tell me exactly how ... > > > >By itself it's not a problem, but the interaction with other code may > >be a problem if the other code does not follow exactly the same > >conventions. > > Sure, you can't mix multi byte functions with pure UTF-8 functions, > you always need to look what type of function you call in your code. > So what's different here. "Interaction with other code" was not about mixing your own pure UTF-8 functions with the standard C multibyte functions in possibly-non-UTF-8 locales. It was about mixing them with other code that's processing UTF-8 but handling errors differently. One such example would be the standard C multibyte functions when nl_langinfo(CODESET) has already been determined to be "UTF-8" (so you know they're processing UTF-8), but "pure UTF-8" code outside of the standard functions might also be handling errors differently from what you're doing, and mixing it with your handling _could_ be dangerous, depending on what you do. > and the convention is just UTF-8 (even with invalid sequences) > not a mixture with other multi byte codes. Not so much requirement > of a convention? > > The functions have bean designed carefully to be not trapped on > invalid sequences. I know they look extreme simple, but this is part > of the optimization. > > remember: We are not talking about the ability
Re: Possible Unicode Problems in Busybox - Collect and Discussion
Hi Rich! > I think uClibc is pretty fast at this too. It's glibc that's horribly slow. Rough comparison: Pretty fast is still slower than UTF-8 optimized functions. For processing a full string buffer, musl is roughly twice as fast as uClibc, and uClibc is roughly twice as fast as glibc. You don't need to make ads for musl here, I' would like to see prebuild versions of BB statically linked with musl. Presumably you would use a full string operation here (mbstowcs with null output pointer) for computing length in characters. Do you remember my question only UTF-8 or full multi byte locale? It is exactly this decision. The former may be optimized the later more accurate. If you know you are using UTF-8 you do not need to check every string over and over again, else it's pure paranoia. It is robust, as it will not run away on anything which is valid C string. Well if the string comes from a source outside of your control, you need to check it at least once. But you might not want to check and reject it at the original point of input, e.g. if you want to be able to preserve arbitrary byte sequences that might not be UTF-8, e.g. an argument that's a filename in an invalid encoding which you're trying to delete or rename to fix. So IMO it makes a lot more sense to do your checking at the point of treating the string as a sequence of characters, even if it happens multiple times. The cost is not high if your implementation is efficient. ... and how want you behave in case of invalid UTF-8 sequences? My functions just skip over stray codes of 0x80..0xBF and synchronize on next valid UTF-8 leading byte. How would you count invalid sequences? Of course it also gets tripped up badly on invalid sequences. How can it get tripped? It silently skip over invalid sequences (of 0x80 to 0xBF until next leading of a sequence). It shall not get stuck in any way. Or tell me exactly how ... By itself it's not a problem, but the interaction with other code may be a problem if the other code does not follow exactly the same conventions. Sure, you can't mix multi byte functions with pure UTF-8 functions, you always need to look what type of function you call in your code. So what's different here. ... and the convention is just UTF-8 (even with invalid sequences) not a mixture with other multi byte codes. Not so much requirement of a convention? The functions have bean designed carefully to be not trapped on invalid sequences. I know they look extreme simple, but this is part of the optimization. ... remember: We are not talking about the ability to work with other multi byte locales. The assumption was pure ASCII or UTF-8. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
Hi ! On 14.08.2014 19:16, Tanguy Pruvot wrote: you need to test s != NULL, else *s will crash It is like other str functions of the libc, you need to call the function with a valid pointer. ... else if you like add: "if (!s) retun 0;" ahead of the while -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
On Thu, Aug 14, 2014 at 07:14:52PM +0200, Harald Becker wrote: > Hi Rich! > > >> You say this, but libbb/unicode.c contains a unicode_strlen calling > >>this complex mb to wc conversion function to count the number of > >>characters. Those multi byte functions tend to be highly complex and > >>slow (don't know if they have gone better). For just UTF-8, things > >>can be optimized. > > > >This depends on your libc. > > that is, why I added "don't know if gone better" ... really > good when musl is fast on this ... the problem is BB is more likely > linked with glibc or uClibc ... there the results are not so great > :( I think uClibc is pretty fast at this too. It's glibc that's horribly slow. Rough comparison: For processing a full string buffer, musl is roughly twice as fast as uClibc, and uClibc is roughly twice as fast as glibc. For byte-by-byte processing: musl is roughly 3x as fast as uClibc and roughly 4x as fast as glibc. Source: my comparison at http://www.etalabs.net/compare_libcs.html Presumably you would use a full string operation here (mbstowcs with null output pointer) for computing length in characters. > >>size_t utf8len( const char* s ) > >>{ > >> size_t n = 0; > >> while (*s) > >> if ((*s++ ^ 0x40) < 0xC0) > >> n++; > >> return n; > >>} > > > >This function is only valid if the string is known to be valid UTF-8. > > Yes, I told it's for UTF-8. Yes, but there's a difference between "nominally UTF-8" and "known-valid UTF-8". > >Otherwise it hides errors, which may or may not be problematic > >depending on what you're using it for. > > If you know you are using UTF-8 you do not need to check every > string over and over again, else it's pure paranoia. It is robust, > as it will not run away on anything which is valid C string. Well if the string comes from a source outside of your control, you need to check it at least once. But you might not want to check and reject it at the original point of input, e.g. if you want to be able to preserve arbitrary byte sequences that might not be UTF-8, e.g. an argument that's a filename in an invalid encoding which you're trying to delete or rename to fix. So IMO it makes a lot more sense to do your checking at the point of treating the string as a sequence of characters, even if it happens multiple times. The cost is not high if your implementation is efficient. > >Of course it also gets tripped up badly on invalid sequences. > > How can it get tripped? It silently skip over invalid sequences (of > 0x80 to 0xBF until next leading of a sequence). It shall not get > stuck in any way. Or tell me exactly how ... By itself it's not a problem, but the interaction with other code may be a problem if the other code does not follow exactly the same conventions. Rich ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
On Thu, Aug 14, 2014 at 07:16:36PM +0200, Tanguy Pruvot wrote: > size_t utf8len( const char* s ) > { >size_t n = 0; >while (*s) > if ((*s++ ^ 0x40) < 0xC0) >n++; >return n; > } > > you need to test s != NULL, else *s will crash Says who? NULL is not a valid pointer. Should you also check for things like s != (char *)-1 ? What value would you return then, anyway? Rich ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
size_t utf8len( const char* s ) { size_t n = 0; while (*s) if ((*s++ ^ 0x40) < 0xC0) n++; return n; } you need to test s != NULL, else *s will crash 2014-08-14 19:14 GMT+02:00 Harald Becker : > Hi Rich! > > > >> You say this, but libbb/unicode.c contains a unicode_strlen calling > >> this complex mb to wc conversion function to count the number of >>> characters. Those multi byte functions tend to be highly complex and >>> slow (don't know if they have gone better). For just UTF-8, things >>> can be optimized. >>> >> >> This depends on your libc. >> > > ... that is, why I added "don't know if gone better" ... really good when > musl is fast on this ... the problem is BB is more likely linked with glibc > or uClibc ... there the results are not so great :( > > > size_t utf8len( const char* s ) >>> { >>>size_t n = 0; >>>while (*s) >>> if ((*s++ ^ 0x40) < 0xC0) >>>n++; >>>return n; >>> } >>> >> >> This function is only valid if the string is known to be valid UTF-8. >> > > Yes, I told it's for UTF-8. > > > Otherwise it hides errors, which may or may not be problematic >> depending on what you're using it for. >> > > If you know you are using UTF-8 you do not need to check every string over > and over again, else it's pure paranoia. It is robust, as it will not run > away on anything which is valid C string. > > > Another fast function I use for UTF-8 ... skip to Nth UTF-8 >>> character in a string (returns a pointer to trailing \0 if N > >>> number of UTF-8 chars in string): >>> >>> char *utf8skip( char const* s, size_t n ) >>> { >>>for ( ; n && *s; --n ) >>> while ((*++s ^ 0x40) >= 0xC0); >>>return (char*)s; >>> } >>> >> >> This code is invalid; it's assuming char is unsigned. In practice, >> *++s ^ 0x40 is going to be negative on most archs. Better would be >> doing an unsigned range check like (unsigned char)*++s-0x80<0x40U. >> > > Yes, I missed the type cast ... sorry, for this, see previous mail > > > Of course it also gets tripped up badly on invalid sequences. >> > > How can it get tripped? It silently skip over invalid sequences (of 0x80 > to 0xBF until next leading of a sequence). It shall not get stuck in any > way. Or tell me exactly how ... > > -- > Harald > > > > ___ > busybox mailing list > busybox@busybox.net > http://lists.busybox.net/mailman/listinfo/busybox > ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Possible Unicode Problems in Busybox - Collect and Discussion
Hi Rich! >> You say this, but libbb/unicode.c contains a unicode_strlen calling this complex mb to wc conversion function to count the number of characters. Those multi byte functions tend to be highly complex and slow (don't know if they have gone better). For just UTF-8, things can be optimized. This depends on your libc. ... that is, why I added "don't know if gone better" ... really good when musl is fast on this ... the problem is BB is more likely linked with glibc or uClibc ... there the results are not so great :( size_t utf8len( const char* s ) { size_t n = 0; while (*s) if ((*s++ ^ 0x40) < 0xC0) n++; return n; } This function is only valid if the string is known to be valid UTF-8. Yes, I told it's for UTF-8. Otherwise it hides errors, which may or may not be problematic depending on what you're using it for. If you know you are using UTF-8 you do not need to check every string over and over again, else it's pure paranoia. It is robust, as it will not run away on anything which is valid C string. Another fast function I use for UTF-8 ... skip to Nth UTF-8 character in a string (returns a pointer to trailing \0 if N > number of UTF-8 chars in string): char *utf8skip( char const* s, size_t n ) { for ( ; n && *s; --n ) while ((*++s ^ 0x40) >= 0xC0); return (char*)s; } This code is invalid; it's assuming char is unsigned. In practice, *++s ^ 0x40 is going to be negative on most archs. Better would be doing an unsigned range check like (unsigned char)*++s-0x80<0x40U. Yes, I missed the type cast ... sorry, for this, see previous mail Of course it also gets tripped up badly on invalid sequences. How can it get tripped? It silently skip over invalid sequences (of 0x80 to 0xBF until next leading of a sequence). It shall not get stuck in any way. Or tell me exactly how ... -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: AW: Possible Unicode Problems in Busybox - Collect and Discussion
Hi Dietmar! On 14.08.2014 08:21, dietmar.schind...@manroland-web.com wrote: > These do not work if char is signed. You are right, I missed the type casts ... sorry size_t utf8len( const char* s ) { size_t n = 0; while (*s) if ((unsigned char)(*s++ ^ 0x40) < (unsigned char)0xC0) n++; return n; } char *utf8skip( char const* s, size_t n ) { for ( ; n && *s; --n ) while ((unsigned char)(*++s ^ 0x40) >= (unsigned char)0xC0); return (char*)s; } I know, most would prefer to use (unsigned char) ahead of *++s or *s++, but at least gcc gave better optimized x86 code for my type casts. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
Here is my benchmark just to show I am not speaking out of an imaginary world: #include int main() { const char *buff = "foo"; int i = 0; for (i; i < 100; ++i) // fputs(buff, stdout); printf("%s", buff); return 0; } Fputs: real0m1.761s user0m0.047s sys 0m0.007s Printf: real0m2.811s user0m0.097s sys 0m0.020s On Thu, Aug 14, 2014 at 4:15 PM, Laszlo Papp wrote: > On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker wrote: > >> On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: >> > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot > > >> > wrote: >> > >> > > its the same with bionic libc (arm) >> > > >> > > printf("test") is ok but not printf(buf) with char buf[] = "test"; >> > > printf("%s", buf) is ok >> > > >> > >> > Yeah, I guess it is about personal preference. I personally do not like >> the >> > extended code just to make some "smart" option silent. IMHO, it makes >> the >> > code needlessly longer by adding another small layer for the content of >> the >> > variable. If there is no need for formatting, why use printf in the >> first >> > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On >> the >> > other hand, it is too bad that there is nothing between fputs and puts >> > where you do not need to use the file descriptor and you do not get a >> new >> > line either. >> >> In principle stdout is more expensive than "". stdout is a GOT >> reference in PIC, and "" is just PC-relative. For non-PIC it makes no >> difference though. >> > > Then my PC is probably lying since the practice does seem to disagree with > you, apparently as well as this post: > > > http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 > > (and the one below, ok, it is not critical, but it performs better for > others, too) > > Raw strings are also uglier IMHO than well-defined identifiers, like > stdout, but that is personal taste, so I do not mind that one. > > >> As for the topic at hand, I don't think it's necessary to make this >> change if the messages are DOCUMENTED as being format strings that are >> required not to contain any format specifiers, and there's a big >> warning to this effect in the source wherever they appear. printf >> allows format strings chosen (or even constructed) at runtime and the >> -W option to make this an error is bogus; at least it cannot be >> applied to code that was not written for a specific coding style that >> forbids such usage. >> > > Well, the advantage of removing this security risk is that any string can > be passed without constraining the string anytime. > ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker wrote: > On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: > > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot > > wrote: > > > > > its the same with bionic libc (arm) > > > > > > printf("test") is ok but not printf(buf) with char buf[] = "test"; > > > printf("%s", buf) is ok > > > > > > > Yeah, I guess it is about personal preference. I personally do not like > the > > extended code just to make some "smart" option silent. IMHO, it makes the > > code needlessly longer by adding another small layer for the content of > the > > variable. If there is no need for formatting, why use printf in the first > > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On > the > > other hand, it is too bad that there is nothing between fputs and puts > > where you do not need to use the file descriptor and you do not get a new > > line either. > > In principle stdout is more expensive than "". stdout is a GOT > reference in PIC, and "" is just PC-relative. For non-PIC it makes no > difference though. > Then my PC is probably lying since the practice does seem to disagree with you, apparently as well as this post: http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840 (and the one below, ok, it is not critical, but it performs better for others, too) Raw strings are also uglier IMHO than well-defined identifiers, like stdout, but that is personal taste, so I do not mind that one. > As for the topic at hand, I don't think it's necessary to make this > change if the messages are DOCUMENTED as being format strings that are > required not to contain any format specifiers, and there's a big > warning to this effect in the source wherever they appear. printf > allows format strings chosen (or even constructed) at runtime and the > -W option to make this an error is bogus; at least it cannot be > applied to code that was not written for a specific coding style that > forbids such usage. > Well, the advantage of removing this security risk is that any string can be passed without constraining the string anytime. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote: > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot > wrote: > > > its the same with bionic libc (arm) > > > > printf("test") is ok but not printf(buf) with char buf[] = "test"; > > printf("%s", buf) is ok > > > > Yeah, I guess it is about personal preference. I personally do not like the > extended code just to make some "smart" option silent. IMHO, it makes the > code needlessly longer by adding another small layer for the content of the > variable. If there is no need for formatting, why use printf in the first > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On the > other hand, it is too bad that there is nothing between fputs and puts > where you do not need to use the file descriptor and you do not get a new > line either. In principle stdout is more expensive than "". stdout is a GOT reference in PIC, and "" is just PC-relative. For non-PIC it makes no difference though. As for the topic at hand, I don't think it's necessary to make this change if the messages are DOCUMENTED as being format strings that are required not to contain any format specifiers, and there's a big warning to this effect in the source wherever they appear. printf allows format strings chosen (or even constructed) at runtime and the -W option to make this an error is bogus; at least it cannot be applied to code that was not written for a specific coding style that forbids such usage. Rich ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 3:36 PM, wrote: > Hi Laszlo, > > > Yeah, I guess it is about personal preference. I personally do not like > the > > extended code just to make some "smart" option silent. ... > > Based on your arguing it seems that you have NEVER heard about > format string exploits which are around for 15 years? > > http://en.wikipedia.org/wiki/Uncontrolled_format_string That is your interpretation, but what I am saying, _this code_ does not do any exploit. Either way, please do not use printf. I have had a quick measure, and fputs performs slightly better. The "%s" is an overhead in the way. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot wrote: > its the same with bionic libc (arm) > > printf("test") is ok but not printf(buf) with char buf[] = "test"; > printf("%s", buf) is ok > Yeah, I guess it is about personal preference. I personally do not like the extended code just to make some "smart" option silent. IMHO, it makes the code needlessly longer by adding another small layer for the content of the variable. If there is no need for formatting, why use printf in the first place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On the other hand, it is too bad that there is nothing between fputs and puts where you do not need to use the file descriptor and you do not get a new line either. ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
its the same with bionic libc (arm) printf("test") is ok but not printf(buf) with char buf[] = "test"; printf("%s", buf) is ok 2014-08-14 15:29 GMT+02:00 Laszlo Papp : > Sorry, the list was left out at some point due to my mistake ... See below. > > > On Thu, Aug 14, 2014 at 2:28 PM, Laszlo Papp wrote: > >> On Thu, Aug 14, 2014 at 2:04 PM, wrote: >> >>> Hi Laszlo, >>> >>> > Cannot reproduce with -Wall -Wpedantic -Wextra. >>> >>> Did you notice: >>> applets/usage_pod.c:74:3: error: format not a string literal and no >>> format arguments >>> [-Werror=format-security] >>> ? >>> >> >> Hmm, yes, but I got used to that -Wall -Wpedantic -Wextra would cover the >> cases that need to be checked. I am not sure this option respects the C >> language though. >> >> Anyway, I guess I learnt something new today, thanks. :-) >> > > > ___ > busybox mailing list > busybox@busybox.net > http://lists.busybox.net/mailman/listinfo/busybox > ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: Re: Re: missing format string in applets/usage_pod.c
Sorry, the list was left out at some point due to my mistake ... See below. On Thu, Aug 14, 2014 at 2:28 PM, Laszlo Papp wrote: > On Thu, Aug 14, 2014 at 2:04 PM, wrote: > >> Hi Laszlo, >> >> > Cannot reproduce with -Wall -Wpedantic -Wextra. >> >> Did you notice: >> applets/usage_pod.c:74:3: error: format not a string literal and no >> format arguments >> [-Werror=format-security] >> ? >> > > Hmm, yes, but I got used to that -Wall -Wpedantic -Wextra would cover the > cases that need to be checked. I am not sure this option respects the C > language though. > > Anyway, I guess I learnt something new today, thanks. :-) > ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: missing format string in applets/usage_pod.c
Hi Markus, On Thu, Aug 14, 2014 at 03:52:17PM +0300, Baruch Siach wrote: > On Thu, Aug 14, 2014 at 02:36:18PM +0200, wald...@gmx.de wrote: > > depending on the compilation options a build could fail here: > > > > --- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400 > > +++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400 > > @@ -71,7 +71,7 @@ > > } else { > > printf(", "); > > } > > - printf(usage_array[i].aname); > > + printf("%s", usage_array[i].aname); > > I guess puts() would make code size smaller (didn't test). Scratch that. puts() adds a trailing newline. baruch -- http://baruch.siach.name/blog/ ~. .~ Tk Open Systems =}ooO--U--Ooo{= - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il - ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: missing format string in applets/usage_pod.c
Hi Markus, On Thu, Aug 14, 2014 at 02:36:18PM +0200, wald...@gmx.de wrote: > depending on the compilation options a build could fail here: > > --- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400 > +++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400 > @@ -71,7 +71,7 @@ > } else { > printf(", "); > } > - printf(usage_array[i].aname); > + printf("%s", usage_array[i].aname); I guess puts() would make code size smaller (didn't test). baruch > col += len2; > } > printf("\n\n"); -- http://baruch.siach.name/blog/ ~. .~ Tk Open Systems =}ooO--U--Ooo{= - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il - ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
missing format string in applets/usage_pod.c
Hi, depending on the compilation options a build could fail here: --- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400 +++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400 @@ -71,7 +71,7 @@ } else { printf(", "); } - printf(usage_array[i].aname); + printf("%s", usage_array[i].aname); col += len2; } printf("\n\n"); Thanks Markus ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox