Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Isaac Dunham
On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
> On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker  wrote:
> 
> > On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
> > > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot 
> > > wrote:
> > >
> > > > its the same with bionic libc (arm)
> > > >
> > > > printf("test") is ok but not printf(buf) with char buf[] = "test";
> > > > printf("%s", buf) is ok
> > > >
> > >
> > > Yeah, I guess it is about personal preference. I personally do not like
> > the
> > > extended code just to make some "smart" option silent. IMHO, it makes the
> > > code needlessly longer by adding another small layer for the content of
> > the
> > > variable. If there is no need for formatting, why use printf in the first
> > > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On
> > the
> > > other hand, it is too bad that there is nothing between fputs and puts
> > > where you do not need to use the file descriptor and you do not get a new
> > > line either.
> >
> > In principle stdout is more expensive than "". stdout is a GOT
> > reference in PIC, and "" is just PC-relative. For non-PIC it makes no
> > difference though.
> >
> 
> Then my PC is probably lying since the practice does seem to disagree with
> you, apparently as well as this post:
> 
> http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
> 
> (and the one below, ok, it is not critical, but it performs better for
> others, too)
> 
> Raw strings are also uglier IMHO than well-defined identifiers, like
> stdout, but that is personal taste, so I do not mind that one.
 
printf() is almost always going to call vfprintf(stdout, fmt, ap);
I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
exception I've found, and it's ugly and does not have locking/retry/...
newlib calls _vfprintf_r, which is almost the same concept.
Meanwhile, printf has to do the formatting.

In other words: 
Just because printf("%s", string) doesn't name stdout, don't assume that
it doesn't use a reference to stdout.
So an ACK for Laszlo Papp's recommendation of fputs().

Thanks,
Isaac Dunham
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Harald Becker

Hi Rich,
hi all,

looks like we agree at most topics and tend to reach a point of a more 
philosophical discussion. There are only a few statements from you, I 
want to hop on:


> It's not an "ad".

Sorry Rich, forgot to add a smiley to the "ad" topic ;) ... I like your 
musl approach, except some very detailed decisions.


>>  and how want you behave in case of invalid UTF-8 sequences? My

functions just skip over stray codes of 0x80..0xBF and synchronize
on next valid UTF-8 leading byte. How would you count invalid
sequences?


In general, I would count the whole operation as a failure, returning
some value such as -1 reserved for failure, since the string is not
actually UTF-8 and thus "how many characters?" has no meaning. For
specific uses, there might be other preferred behaviors. If your goal
is display, you may want to simply replace illegal sequences with
U+FFFD in which case you'd count each such sequence as "1", but if
you're using this character-counting to allocate a buffer for the
converted string, you need to be sure your conversion function and
character-counting function agree on how illegal sequences are
counted, or you might overflow your buffer or end up having to
truncate the output.


Rich, will you ever use the result of counting the numbers of UTF-8 
characters to allocate a buffer? I don't think so. That would be very 
ill behavior. To allocate buffer space you need the number of bytes 
occupied by a string, not the number of UTF-8 characters.


Beside this I prefer having really fast (and robust) functions for 
UTF-8, which may give somewhat incorrect result if input comes from an 
error prone source, but this result shall not break the program, as long 
they are used carefully. Otherwise you need to check for errors after 
each function call, which slows down operations additionally.


I like it more the way, give the best result we can, even if things are 
broken, but continue with normal operation ... and at required points, 
it may be necessary to call utf8test() a function to test for validity, 
which neglects invalid UTF-8 strings.



"Interaction with other code" was not about mixing your own pure UTF-8
functions with the standard C multibyte functions in
possibly-non-UTF-8 locales. It was about mixing them with other code
that's processing UTF-8 but handling errors differently. One such
example would be the standard C multibyte functions when
nl_langinfo(CODESET) has already been determined to be "UTF-8" (so you
know they're processing UTF-8), but "pure UTF-8" code outside of the
standard functions might also be handling errors differently from what
you're doing, and mixing it with your handling _could_ be dangerous,
depending on what you do.


My functions are designed to be fast and robust. With error free UTF-8 
they won't produce any errors, Otherwise they just try to give best 
result, even with invalid or damaged sequences. They just keep any 
sequence it has bean given, in the original order. If you mix those 
functions with other functions, the only convention beside not mixing 
UTF-8 with other multi byte codes is, those other functions has to be 
robust too, that is they don't have to be trapped by invalid sequences. 
So if you really like, you may freely mix my simplified UTF-8 functions 
with multi byte based UTF-8 processing which check every single 
character to be valid. The only failure would be to use unchecked UTF-8 
strings for operations, which badly fail for invalid character sequences 
(not so much I know about). So I can't see where it get dangerous to mix 
my functions with others?



I'm fine with assuming all data is nominally UTF-8. What's not fine is
assuming that data which is nominally UTF-8 is actually valid UTF-8.


I never say my functions need valid UTF-8. They just assume all data is 
either ASCII or nominally UTF-8, and try their best to operate on 
invalid sequences (either skipping or not breaking those illegal sequences).


... but this all is a philosophical discussion. BB started to use full 
multi byte locale functions, which is the more accurate way and even 
adds support for other multi byte character sets (may be welcome for 
everybody who needs them), so BB shall stay on this ... at least until 
one day everybody ask what this hole none UTF-8 stuff was about 
(whenever this will be).



My only résumé is, BB shall disable all that Unicode/UTF-8 config stuff 
and always use full locale support of the library, giving information on 
how to configure/install known libs on a doc file. Either BB runs in 
full glibc environment, which has full multi byte locale functions, or 
BB may linked with a different lib which works at least for UTF-8 
correct without any additional stuff to be installed. This would 
simplify configuration and code and will just rely on usage of a correct 
configured libc environment.


So the big question is: Is there anybody who still needs the BB internal 
Unicode handling and can't use the locale func

Package to use busybox runit as PID 1 on Arch Linux

2014-08-14 Thread Neale Pickett
I've been using this for over a year now and I feel like I ought to let
people know about it since everyone seems to be interested in PID 1 a lot
these days.

https://aur.archlinux.org/packages/runit-init/

(Mediocre) essay about it: https://woozle.org/~neale/papers/arch-runit.html

I created the Arch package because I liked runit so much on a
Buildroot-based system I made for running security contests at Defcon.
Busybox runit is now PID 1 on every machine I administer.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 09:09:02PM +0200, Harald Becker wrote:
> Hi Rich!
> 
> > I think uClibc is pretty fast at this too. It's glibc that's horribly
> >slow. Rough comparison:
> 
> Pretty fast is still slower than UTF-8 optimized functions.

The standard functions certainly can be UTF-8-optimized, and they are
in at least several implementations. I think glibc still has a pretty
slow path to get to the UTF-8 decoding but hopefully that will be
fixed eventually. I'll remind myself to pursue that in the future.

> >For processing a full string buffer, musl is roughly twice as fast as
> >uClibc, and uClibc is roughly twice as fast as glibc.
> 
> You don't need to make ads for musl here, I' would like to see
> prebuild versions of BB statically linked with musl.

It's not an "ad". It's just pointing out that uClibc is probably not
significantly slower for what you care about. My interpretation was
that you trusted me that musl is fast here, but thought other more
commonly used implementations might be slow, so I stated the relative
speeds as a basis for evaluating that.

> >Presumably you would use a full string operation here (mbstowcs with
> >null output pointer) for computing length in characters.
> 
> Do you remember my question only UTF-8 or full multi byte locale? It
> is exactly this decision. The former may be optimized the later more
> accurate.

Yes I remember the question. Assuming the standard function has a fast
path for UTF-8, which it should, the only reason to expect the
standard multibyte functions to be significantly slower than your
custom ones is that they detect illegal sequences rather than blindly
assuming the input is valid.

> >>If you know you are using UTF-8 you do not need to check every
> >>string over and over again, else it's pure paranoia. It is robust,
> >>as it will not run away on anything which is valid C string.
> >
> >Well if the string comes from a source outside of your control, you
> >need to check it at least once. But you might not want to check and
> >reject it at the original point of input, e.g. if you want to be able
> >to preserve arbitrary byte sequences that might not be UTF-8, e.g. an
> >argument that's a filename in an invalid encoding which you're trying
> >to delete or rename to fix. So IMO it makes a lot more sense to do
> >your checking at the point of treating the string as a sequence of
> >characters, even if it happens multiple times. The cost is not high if
> >your implementation is efficient.
> 
>  and how want you behave in case of invalid UTF-8 sequences? My
> functions just skip over stray codes of 0x80..0xBF and synchronize
> on next valid UTF-8 leading byte. How would you count invalid
> sequences?

In general, I would count the whole operation as a failure, returning
some value such as -1 reserved for failure, since the string is not
actually UTF-8 and thus "how many characters?" has no meaning. For
specific uses, there might be other preferred behaviors. If your goal
is display, you may want to simply replace illegal sequences with
U+FFFD in which case you'd count each such sequence as "1", but if
you're using this character-counting to allocate a buffer for the
converted string, you need to be sure your conversion function and
character-counting function agree on how illegal sequences are
counted, or you might overflow your buffer or end up having to
truncate the output.

> >>>Of course it also gets tripped up badly on invalid sequences.
> >>
> >>How can it get tripped? It silently skip over invalid sequences (of
> >>0x80 to 0xBF until next leading of a sequence). It shall not get
> >>stuck in any way. Or tell me exactly how ...
> >
> >By itself it's not a problem, but the interaction with other code may
> >be a problem if the other code does not follow exactly the same
> >conventions.
> 
> Sure, you can't mix multi byte functions with pure UTF-8 functions,
> you always need to look what type of function you call in your code.
> So what's different here.

"Interaction with other code" was not about mixing your own pure UTF-8
functions with the standard C multibyte functions in
possibly-non-UTF-8 locales. It was about mixing them with other code
that's processing UTF-8 but handling errors differently. One such
example would be the standard C multibyte functions when
nl_langinfo(CODESET) has already been determined to be "UTF-8" (so you
know they're processing UTF-8), but "pure UTF-8" code outside of the
standard functions might also be handling errors differently from what
you're doing, and mixing it with your handling _could_ be dangerous,
depending on what you do.

>  and the convention is just UTF-8 (even with invalid sequences)
> not a mixture with other multi byte codes. Not so much requirement
> of a convention?
> 
> The functions have bean designed carefully to be not trapped on
> invalid sequences. I know they look extreme simple, but this is part
> of the optimization.
> 
>  remember: We are not talking about the ability

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Harald Becker

Hi Rich!

> I think uClibc is pretty fast at this too. It's glibc that's horribly

slow. Rough comparison:


Pretty fast is still slower than UTF-8 optimized functions.


For processing a full string buffer, musl is roughly twice as fast as
uClibc, and uClibc is roughly twice as fast as glibc.


You don't need to make ads for musl here, I' would like to see prebuild 
versions of BB statically linked with musl.



Presumably you would use a full string operation here (mbstowcs with
null output pointer) for computing length in characters.


Do you remember my question only UTF-8 or full multi byte locale? It is 
exactly this decision. The former may be optimized the later more accurate.




If you know you are using UTF-8 you do not need to check every
string over and over again, else it's pure paranoia. It is robust,
as it will not run away on anything which is valid C string.


Well if the string comes from a source outside of your control, you
need to check it at least once. But you might not want to check and
reject it at the original point of input, e.g. if you want to be able
to preserve arbitrary byte sequences that might not be UTF-8, e.g. an
argument that's a filename in an invalid encoding which you're trying
to delete or rename to fix. So IMO it makes a lot more sense to do
your checking at the point of treating the string as a sequence of
characters, even if it happens multiple times. The cost is not high if
your implementation is efficient.


... and how want you behave in case of invalid UTF-8 sequences? My 
functions just skip over stray codes of 0x80..0xBF and synchronize on 
next valid UTF-8 leading byte. How would you count invalid sequences?




Of course it also gets tripped up badly on invalid sequences.


How can it get tripped? It silently skip over invalid sequences (of
0x80 to 0xBF until next leading of a sequence). It shall not get
stuck in any way. Or tell me exactly how ...


By itself it's not a problem, but the interaction with other code may
be a problem if the other code does not follow exactly the same
conventions.


Sure, you can't mix multi byte functions with pure UTF-8 functions, you 
always need to look what type of function you call in your code. So 
what's different here.


... and the convention is just UTF-8 (even with invalid sequences) not a 
mixture with other multi byte codes. Not so much requirement of a 
convention?


The functions have bean designed carefully to be not trapped on invalid 
sequences. I know they look extreme simple, but this is part of the 
optimization.


... remember: We are not talking about the ability to work with other 
multi byte locales. The assumption was pure ASCII or UTF-8.


--
Harald

___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Harald Becker

Hi !

On 14.08.2014 19:16, Tanguy Pruvot wrote:


you need to test s != NULL, else *s will crash


It is like other str functions of the libc, you need to call the 
function with a valid pointer.


... else if you like add: "if (!s) retun 0;" ahead of the while

--
Harald

___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 07:14:52PM +0200, Harald Becker wrote:
> Hi Rich!
> 
> >> You say this, but libbb/unicode.c contains a unicode_strlen calling
> >>this complex mb to wc conversion function to count the number of
> >>characters. Those multi byte functions tend to be highly complex and
> >>slow (don't know if they have gone better). For just UTF-8, things
> >>can be optimized.
> >
> >This depends on your libc.
> 
>  that is, why I added "don't know if gone better" ... really
> good when musl is fast on this ... the problem is BB is more likely
> linked with glibc or uClibc ... there the results are not so great
> :(

I think uClibc is pretty fast at this too. It's glibc that's horribly
slow. Rough comparison:

For processing a full string buffer, musl is roughly twice as fast as
uClibc, and uClibc is roughly twice as fast as glibc.

For byte-by-byte processing: musl is roughly 3x as fast as uClibc and
roughly 4x as fast as glibc.

Source: my comparison at http://www.etalabs.net/compare_libcs.html

Presumably you would use a full string operation here (mbstowcs with
null output pointer) for computing length in characters.

> >>size_t utf8len( const char* s )
> >>{
> >>   size_t n = 0;
> >>   while (*s)
> >> if ((*s++ ^ 0x40) < 0xC0)
> >>   n++;
> >>   return n;
> >>}
> >
> >This function is only valid if the string is known to be valid UTF-8.
> 
> Yes, I told it's for UTF-8.

Yes, but there's a difference between "nominally UTF-8" and
"known-valid UTF-8".

> >Otherwise it hides errors, which may or may not be problematic
> >depending on what you're using it for.
> 
> If you know you are using UTF-8 you do not need to check every
> string over and over again, else it's pure paranoia. It is robust,
> as it will not run away on anything which is valid C string.

Well if the string comes from a source outside of your control, you
need to check it at least once. But you might not want to check and
reject it at the original point of input, e.g. if you want to be able
to preserve arbitrary byte sequences that might not be UTF-8, e.g. an
argument that's a filename in an invalid encoding which you're trying
to delete or rename to fix. So IMO it makes a lot more sense to do
your checking at the point of treating the string as a sequence of
characters, even if it happens multiple times. The cost is not high if
your implementation is efficient.

> >Of course it also gets tripped up badly on invalid sequences.
> 
> How can it get tripped? It silently skip over invalid sequences (of
> 0x80 to 0xBF until next leading of a sequence). It shall not get
> stuck in any way. Or tell me exactly how ...

By itself it's not a problem, but the interaction with other code may
be a problem if the other code does not follow exactly the same
conventions.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 07:16:36PM +0200, Tanguy Pruvot wrote:
> size_t utf8len( const char* s )
> {
>size_t n = 0;
>while (*s)
>  if ((*s++ ^ 0x40) < 0xC0)
>n++;
>return n;
> }
> 
> you need to test s != NULL, else *s will crash

Says who? NULL is not a valid pointer. Should you also check for
things like s != (char *)-1 ? What value would you return then,
anyway?

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Tanguy Pruvot
size_t utf8len( const char* s )
{
   size_t n = 0;
   while (*s)
 if ((*s++ ^ 0x40) < 0xC0)
   n++;
   return n;
}

you need to test s != NULL, else *s will crash


2014-08-14 19:14 GMT+02:00 Harald Becker :

> Hi Rich!
>
>
> >> You say this, but libbb/unicode.c contains a unicode_strlen calling
>
>> this complex mb to wc conversion function to count the number of
>>> characters. Those multi byte functions tend to be highly complex and
>>> slow (don't know if they have gone better). For just UTF-8, things
>>> can be optimized.
>>>
>>
>> This depends on your libc.
>>
>
> ... that is, why I added "don't know if gone better" ... really good when
> musl is fast on this ... the problem is BB is more likely linked with glibc
> or uClibc ... there the results are not so great :(
>
>
>  size_t utf8len( const char* s )
>>> {
>>>size_t n = 0;
>>>while (*s)
>>>  if ((*s++ ^ 0x40) < 0xC0)
>>>n++;
>>>return n;
>>> }
>>>
>>
>> This function is only valid if the string is known to be valid UTF-8.
>>
>
> Yes, I told it's for UTF-8.
>
>
>  Otherwise it hides errors, which may or may not be problematic
>> depending on what you're using it for.
>>
>
> If you know you are using UTF-8 you do not need to check every string over
> and over again, else it's pure paranoia. It is robust, as it will not run
> away on anything which is valid C string.
>
>
>  Another fast function I use for UTF-8 ... skip to Nth UTF-8
>>> character in a string (returns a pointer to trailing \0 if N >
>>> number of UTF-8 chars in string):
>>>
>>> char *utf8skip( char const* s, size_t n )
>>> {
>>>for ( ; n && *s; --n )
>>>  while ((*++s ^ 0x40) >= 0xC0);
>>>return (char*)s;
>>> }
>>>
>>
>> This code is invalid; it's assuming char is unsigned. In practice,
>> *++s ^ 0x40 is going to be negative on most archs. Better would be
>> doing an unsigned range check like (unsigned char)*++s-0x80<0x40U.
>>
>
> Yes, I missed the type cast ... sorry, for this, see previous mail
>
>
>  Of course it also gets tripped up badly on invalid sequences.
>>
>
> How can it get tripped? It silently skip over invalid sequences (of 0x80
> to 0xBF until next leading of a sequence). It shall not get stuck in any
> way. Or tell me exactly how ...
>
> --
> Harald
>
>
>
> ___
> busybox mailing list
> busybox@busybox.net
> http://lists.busybox.net/mailman/listinfo/busybox
>
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Harald Becker

Hi Rich!

>> You say this, but libbb/unicode.c contains a unicode_strlen calling

this complex mb to wc conversion function to count the number of
characters. Those multi byte functions tend to be highly complex and
slow (don't know if they have gone better). For just UTF-8, things
can be optimized.


This depends on your libc.


... that is, why I added "don't know if gone better" ... really good 
when musl is fast on this ... the problem is BB is more likely linked 
with glibc or uClibc ... there the results are not so great :(



size_t utf8len( const char* s )
{
   size_t n = 0;
   while (*s)
 if ((*s++ ^ 0x40) < 0xC0)
   n++;
   return n;
}


This function is only valid if the string is known to be valid UTF-8.


Yes, I told it's for UTF-8.


Otherwise it hides errors, which may or may not be problematic
depending on what you're using it for.


If you know you are using UTF-8 you do not need to check every string 
over and over again, else it's pure paranoia. It is robust, as it will 
not run away on anything which is valid C string.



Another fast function I use for UTF-8 ... skip to Nth UTF-8
character in a string (returns a pointer to trailing \0 if N >
number of UTF-8 chars in string):

char *utf8skip( char const* s, size_t n )
{
   for ( ; n && *s; --n )
 while ((*++s ^ 0x40) >= 0xC0);
   return (char*)s;
}


This code is invalid; it's assuming char is unsigned. In practice,
*++s ^ 0x40 is going to be negative on most archs. Better would be
doing an unsigned range check like (unsigned char)*++s-0x80<0x40U.


Yes, I missed the type cast ... sorry, for this, see previous mail


Of course it also gets tripped up badly on invalid sequences.


How can it get tripped? It silently skip over invalid sequences (of 0x80 
to 0xBF until next leading of a sequence). It shall not get stuck in any 
way. Or tell me exactly how ...


--
Harald


___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: AW: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Harald Becker

Hi Dietmar!

On 14.08.2014 08:21, dietmar.schind...@manroland-web.com wrote:
> These do not work if char is signed.
You are right, I missed the type casts ... sorry


size_t utf8len( const char* s )
{
   size_t n = 0;
   while (*s)
 if ((unsigned char)(*s++ ^ 0x40) < (unsigned char)0xC0)
   n++;
return n;
}

char *utf8skip( char const* s, size_t n )
{
   for ( ; n && *s; --n )
  while ((unsigned char)(*++s ^ 0x40) >= (unsigned char)0xC0);
   return (char*)s;
}


I know, most would prefer to use (unsigned char) ahead of *++s or *s++, 
but at least gcc gave better optimized x86 code for my type casts.


--
Harald
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Laszlo Papp
Here is my benchmark just to show I am not speaking out of an imaginary
world:

#include 

int main()
{
const char *buff = "foo";
int i = 0;
for (i; i < 100; ++i)
// fputs(buff, stdout);
printf("%s", buff);
return 0;
}

Fputs:

real0m1.761s
user0m0.047s
sys 0m0.007s

Printf:

real0m2.811s
user0m0.097s
sys 0m0.020s



On Thu, Aug 14, 2014 at 4:15 PM, Laszlo Papp  wrote:

> On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker  wrote:
>
>> On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
>> > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot > >
>> > wrote:
>> >
>> > > its the same with bionic libc (arm)
>> > >
>> > > printf("test") is ok but not printf(buf) with char buf[] = "test";
>> > > printf("%s", buf) is ok
>> > >
>> >
>> > Yeah, I guess it is about personal preference. I personally do not like
>> the
>> > extended code just to make some "smart" option silent. IMHO, it makes
>> the
>> > code needlessly longer by adding another small layer for the content of
>> the
>> > variable. If there is no need for formatting, why use printf in the
>> first
>> > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On
>> the
>> > other hand, it is too bad that there is nothing between fputs and puts
>> > where you do not need to use the file descriptor and you do not get a
>> new
>> > line either.
>>
>> In principle stdout is more expensive than "". stdout is a GOT
>> reference in PIC, and "" is just PC-relative. For non-PIC it makes no
>> difference though.
>>
>
> Then my PC is probably lying since the practice does seem to disagree with
> you, apparently as well as this post:
>
>
> http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
>
> (and the one below, ok, it is not critical, but it performs better for
> others, too)
>
> Raw strings are also uglier IMHO than well-defined identifiers, like
> stdout, but that is personal taste, so I do not mind that one.
>
>
>> As for the topic at hand, I don't think it's necessary to make this
>> change if the messages are DOCUMENTED as being format strings that are
>> required not to contain any format specifiers, and there's a big
>> warning to this effect in the source wherever they appear. printf
>> allows format strings chosen (or even constructed) at runtime and the
>> -W option to make this an error is bogus; at least it cannot be
>> applied to code that was not written for a specific coding style that
>> forbids such usage.
>>
>
> Well, the advantage of removing this security risk is that any string can
> be passed without constraining the string anytime.
>
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Laszlo Papp
On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker  wrote:

> On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
> > On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot 
> > wrote:
> >
> > > its the same with bionic libc (arm)
> > >
> > > printf("test") is ok but not printf(buf) with char buf[] = "test";
> > > printf("%s", buf) is ok
> > >
> >
> > Yeah, I guess it is about personal preference. I personally do not like
> the
> > extended code just to make some "smart" option silent. IMHO, it makes the
> > code needlessly longer by adding another small layer for the content of
> the
> > variable. If there is no need for formatting, why use printf in the first
> > place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On
> the
> > other hand, it is too bad that there is nothing between fputs and puts
> > where you do not need to use the file descriptor and you do not get a new
> > line either.
>
> In principle stdout is more expensive than "". stdout is a GOT
> reference in PIC, and "" is just PC-relative. For non-PIC it makes no
> difference though.
>

Then my PC is probably lying since the practice does seem to disagree with
you, apparently as well as this post:

http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840

(and the one below, ok, it is not critical, but it performs better for
others, too)

Raw strings are also uglier IMHO than well-defined identifiers, like
stdout, but that is personal taste, so I do not mind that one.


> As for the topic at hand, I don't think it's necessary to make this
> change if the messages are DOCUMENTED as being format strings that are
> required not to contain any format specifiers, and there's a big
> warning to this effect in the source wherever they appear. printf
> allows format strings chosen (or even constructed) at runtime and the
> -W option to make this an error is bogus; at least it cannot be
> applied to code that was not written for a specific coding style that
> forbids such usage.
>

Well, the advantage of removing this security risk is that any string can
be passed without constraining the string anytime.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
> On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot 
> wrote:
> 
> > its the same with bionic libc (arm)
> >
> > printf("test") is ok but not printf(buf) with char buf[] = "test";
> > printf("%s", buf) is ok
> >
> 
> Yeah, I guess it is about personal preference. I personally do not like the
> extended code just to make some "smart" option silent. IMHO, it makes the
> code needlessly longer by adding another small layer for the content of the
> variable. If there is no need for formatting, why use printf in the first
> place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On the
> other hand, it is too bad that there is nothing between fputs and puts
> where you do not need to use the file descriptor and you do not get a new
> line either.

In principle stdout is more expensive than "". stdout is a GOT
reference in PIC, and "" is just PC-relative. For non-PIC it makes no
difference though.

As for the topic at hand, I don't think it's necessary to make this
change if the messages are DOCUMENTED as being format strings that are
required not to contain any format specifiers, and there's a big
warning to this effect in the source wherever they appear. printf
allows format strings chosen (or even constructed) at runtime and the
-W option to make this an error is bogus; at least it cannot be
applied to code that was not written for a specific coding style that
forbids such usage.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Laszlo Papp
On Thu, Aug 14, 2014 at 3:36 PM,  wrote:

> Hi Laszlo,
>
> > Yeah, I guess it is about personal preference. I personally do not like
> the
> > extended code just to make some "smart" option silent. ...
>
> Based on your arguing it seems that you have NEVER heard about
> format string exploits which are around for 15 years?
>
> http://en.wikipedia.org/wiki/Uncontrolled_format_string


That is your interpretation, but what I am saying, _this code_ does not do
any exploit. Either way, please do not use printf. I have had a quick
measure, and fputs performs slightly better. The "%s" is an overhead in the
way.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Laszlo Papp
On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot 
wrote:

> its the same with bionic libc (arm)
>
> printf("test") is ok but not printf(buf) with char buf[] = "test";
> printf("%s", buf) is ok
>

Yeah, I guess it is about personal preference. I personally do not like the
extended code just to make some "smart" option silent. IMHO, it makes the
code needlessly longer by adding another small layer for the content of the
variable. If there is no need for formatting, why use printf in the first
place and not fputs with stdout? IMHO, stdout is nicer than "%s" :-) On the
other hand, it is too bad that there is nothing between fputs and puts
where you do not need to use the file descriptor and you do not get a new
line either.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Tanguy Pruvot
its the same with bionic libc (arm)

printf("test") is ok but not printf(buf) with char buf[] = "test";
printf("%s", buf) is ok


2014-08-14 15:29 GMT+02:00 Laszlo Papp :

> Sorry, the list was left out at some point due to my mistake ... See below.
>
>
> On Thu, Aug 14, 2014 at 2:28 PM, Laszlo Papp  wrote:
>
>> On Thu, Aug 14, 2014 at 2:04 PM,  wrote:
>>
>>> Hi Laszlo,
>>>
>>> > Cannot reproduce with -Wall -Wpedantic -Wextra.
>>>
>>> Did you notice:
>>> applets/usage_pod.c:74:3: error: format not a string literal and no
>>> format arguments
>>> [-Werror=format-security]
>>> ?
>>>
>>
>> Hmm, yes, but I got used to that -Wall -Wpedantic -Wextra would cover the
>> cases that need to be checked. I am not sure this option respects the C
>> language though.
>>
>> Anyway, I guess I learnt something new today, thanks. :-)
>>
>
>
> ___
> busybox mailing list
> busybox@busybox.net
> http://lists.busybox.net/mailman/listinfo/busybox
>
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Laszlo Papp
Sorry, the list was left out at some point due to my mistake ... See below.


On Thu, Aug 14, 2014 at 2:28 PM, Laszlo Papp  wrote:

> On Thu, Aug 14, 2014 at 2:04 PM,  wrote:
>
>> Hi Laszlo,
>>
>> > Cannot reproduce with -Wall -Wpedantic -Wextra.
>>
>> Did you notice:
>> applets/usage_pod.c:74:3: error: format not a string literal and no
>> format arguments
>> [-Werror=format-security]
>> ?
>>
>
> Hmm, yes, but I got used to that -Wall -Wpedantic -Wextra would cover the
> cases that need to be checked. I am not sure this option respects the C
> language though.
>
> Anyway, I guess I learnt something new today, thanks. :-)
>
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Baruch Siach
Hi Markus,

On Thu, Aug 14, 2014 at 03:52:17PM +0300, Baruch Siach wrote:
> On Thu, Aug 14, 2014 at 02:36:18PM +0200, wald...@gmx.de wrote:
> > depending on the compilation options a build could fail here:
> > 
> > --- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400
> > +++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400
> > @@ -71,7 +71,7 @@
> > } else {
> > printf(", ");
> > }
> > -   printf(usage_array[i].aname);
> > +   printf("%s", usage_array[i].aname);
> 
> I guess puts() would make code size smaller (didn't test).

Scratch that. puts() adds a trailing newline.

baruch

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Baruch Siach
Hi Markus,

On Thu, Aug 14, 2014 at 02:36:18PM +0200, wald...@gmx.de wrote:
> depending on the compilation options a build could fail here:
> 
> --- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400
> +++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400
> @@ -71,7 +71,7 @@
> } else {
> printf(", ");
> }
> -   printf(usage_array[i].aname);
> +   printf("%s", usage_array[i].aname);

I guess puts() would make code size smaller (didn't test).

baruch

> col += len2;
> }
> printf("\n\n");

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


missing format string in applets/usage_pod.c

2014-08-14 Thread waldeck
Hi,

depending on the compilation options a build could fail here:

--- applets/usage_pod.c.orig2014-08-14 08:27:47.021201252 -0400
+++ applets/usage_pod.c 2014-08-14 08:16:20.685224466 -0400
@@ -71,7 +71,7 @@
} else {
printf(", ");
}
-   printf(usage_array[i].aname);
+   printf("%s", usage_array[i].aname);
col += len2;
}
printf("\n\n");

Thanks

Markus
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox