Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-15 Thread Isaac Dunham
On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
 On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote:
 
  On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
   On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com
   wrote:
  
its the same with bionic libc (arm)
   
printf(test) is ok but not printf(buf) with char buf[] = test;
printf(%s, buf) is ok
   
  
   Yeah, I guess it is about personal preference. I personally do not like
  the
   extended code just to make some smart option silent. IMHO, it makes the
   code needlessly longer by adding another small layer for the content of
  the
   variable. If there is no need for formatting, why use printf in the first
   place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On
  the
   other hand, it is too bad that there is nothing between fputs and puts
   where you do not need to use the file descriptor and you do not get a new
   line either.
 
  In principle stdout is more expensive than . stdout is a GOT
  reference in PIC, and  is just PC-relative. For non-PIC it makes no
  difference though.
 
 
 Then my PC is probably lying since the practice does seem to disagree with
 you, apparently as well as this post:
 
 http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
 
 (and the one below, ok, it is not critical, but it performs better for
 others, too)
 
 Raw strings are also uglier IMHO than well-defined identifiers, like
 stdout, but that is personal taste, so I do not mind that one.
 
printf() is almost always going to call vfprintf(stdout, fmt, ap);
I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
exception I've found, and it's ugly and does not have locking/retry/...
newlib calls _vfprintf_r, which is almost the same concept.
Meanwhile, printf has to do the formatting.

In other words: 
Just because printf(%s, string) doesn't name stdout, don't assume that
it doesn't use a reference to stdout.
So an ACK for Laszlo Papp's recommendation of fputs().

Thanks,
Isaac Dunham
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-15 Thread Denys Vlasenko
On Wed, Aug 13, 2014 at 7:06 PM, Harald Becker ra...@gmx.de wrote:
 The world seems to be standardizing on utf-8.

 Thank God, supporting gazillion of encodings is no fun.


 You say this, but libbb/unicode.c contains a unicode_strlen calling this
 complex mb to wc conversion function to count the number of characters.
 Those multi byte functions tend to be highly complex and slow (don't know if
 they have gone better). For just UTF-8, things can be optimized.

bbox does have unicode-only implementation of mbstowc.
See unicode.c

 size_t utf8len( const char* s )
 {
   size_t n = 0;
   while (*s)
 if ((*s++ ^ 0x40)  0xC0)
   n++;
   return n;
 }

 size_t mystrlen( const char* s )
 {
   return utf8_enabled ? utf8len(s) : strlen(s);
 }

 This looks more, but avoids inclusion of mb function. Most compiler shall
 produce fast code for utf8len.

There are situations where you need to do tons of unicode_strlen()
and you can tolerate getting wrong results on broken Unicode.
Then a function similar to yours can be very useful.

-- 
vda
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-15 Thread Harald Becker

Hi Denys,

my new system gets to a usable state, so I'm able to do more then just 
writing mails ...


 4) applet printf, string formats %Ns

Does this N mean character positions or bytes. The underlying C printf used
to work with bytes for decades. The man page talks about character
positions, but printf from bash uses bytes.


Also, printf needs to support \u


Do you like getting a patch for BB_process_escape_sequence to add 
Unicode sequences \u?


--
Harald

___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-15 Thread Tanguy Pruvot
In bionic libc (android), wchar functions are stubs, mb functions also, the
MAX_MB_LEN is set to 1.

For wide chars, all chars are copied in the first byte of uint32_t wchar_t.
So, all mbtowc, wctomb and mbstombs functions are useless.

but the tools vim and lv are working perfectly with their own
convertion functions, but these tools are big (1MB on arm, bb is only 550K
in comparaison, 750K statically linked to the libc/libm/libselinux)

references :
https://github.com/tpruvot/android_external_vim
https://github.com/tpruvot/android_external_lv
https://github.com/tpruvot/android_external_busybox


2014-08-15 13:32 GMT+02:00 Harald Becker ra...@gmx.de:

 Hi Denys,

 my new system gets to a usable state, so I'm able to do more then just
 writing mails ...


  4) applet printf, string formats %Ns

 Does this N mean character positions or bytes. The underlying C printf
 used
 to work with bytes for decades. The man page talks about character
 positions, but printf from bash uses bytes.


 Also, printf needs to support \u


 Do you like getting a patch for BB_process_escape_sequence to add Unicode
 sequences \u?

 --
 Harald


 ___
 busybox mailing list
 busybox@busybox.net
 http://lists.busybox.net/mailman/listinfo/busybox

___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Fix the addgroup help output

2014-08-15 Thread Denys Vlasenko
On Wed, Aug 13, 2014 at 2:25 PM, Laszlo Papp lp...@kde.org wrote:
 On Wed, Aug 13, 2014 at 12:54 PM, tito farmat...@tiscali.it wrote:
 On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote:
  On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp lp...@kde.org wrote:
 
   commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
   Author: Laszlo Papp lp...@kde.org
   Date:   Wed Aug 13 09:48:08 2014 +0100
  
   Fix the addgroup help output
  
   Since the applet has two options, it is quite misleading to only
   mention one in
   the usage example. It should either use OPTIONS there or enumerate
   the
   possible
   options. The latter seems to be more common in applet, so I picked
   that one.
  
   diff --git a/loginutils/addgroup.c b/loginutils/addgroup.c
   index 22cd0e6..ce57459 100644
   --- a/loginutils/addgroup.c
   +++ b/loginutils/addgroup.c
   @@ -11,7 +11,7 @@
 */
  
//usage:#define addgroup_trivial_usage
   -//usage:   [-g GID]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
   GROUP
   +//usage:   [-gS]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
   GROUP
//usage:#define addgroup_full_usage \n\n
//usage:   Add a group  IF_FEATURE_ADDUSER_TO_GROUP(or add a
   user
   to a group) \n
//usage: \n   -g GID  Group id
  
 
  Hmm, actually, I got some doubts whether this change is useful because
  while it fixes one issue, the applets seem randomly inconsistent, e.g.
  this
  is not holding any consistency with any other either:
 
  Usage: crond -fbS -l N -d N -L LOGFILE -c DIR
 
  At best, it is missing the brackets, but the applets are inconsistent at
  large overall, sadly, so I am not sure it is any improvement to fix this
  one on its own, and fixing all would be more intrusive than I care.
 
 Hi,
 maybe is more easy to understand as it shows which option needs
 a argument.

 +//usage:   [-g GID][-S]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
 GROUP


 Well, it is all about personal preferences, so whatever Denys wishes as the
 maintainer, but the more important objective issue is outlined above.

I fixed help text by adding [-S].

 Why
 applets are doing this on their own is beyond my league other than
 hysterical raisins...

I didn't understand this part.
Why applets are doing /what/ on their own?
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Fix the addgroup help output

2014-08-15 Thread Laszlo Papp
On Fri, Aug 15, 2014 at 1:31 PM, Denys Vlasenko vda.li...@googlemail.com
wrote:

 On Wed, Aug 13, 2014 at 2:25 PM, Laszlo Papp lp...@kde.org wrote:
  On Wed, Aug 13, 2014 at 12:54 PM, tito farmat...@tiscali.it wrote:
  On Wednesday 13 August 2014 12:13:10 Laszlo Papp wrote:
   On Wed, Aug 13, 2014 at 9:52 AM, Laszlo Papp lp...@kde.org wrote:
  
commit 55d6582d88470078cef09f52d1bc3c9c3f7fca6a
Author: Laszlo Papp lp...@kde.org
Date:   Wed Aug 13 09:48:08 2014 +0100
   
Fix the addgroup help output
   
Since the applet has two options, it is quite misleading to only
mention one in
the usage example. It should either use OPTIONS there or
 enumerate
the
possible
options. The latter seems to be more common in applet, so I
 picked
that one.
   
diff --git a/loginutils/addgroup.c b/loginutils/addgroup.c
index 22cd0e6..ce57459 100644
--- a/loginutils/addgroup.c
+++ b/loginutils/addgroup.c
@@ -11,7 +11,7 @@
  */
   
 //usage:#define addgroup_trivial_usage
-//usage:   [-g GID]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
GROUP
+//usage:   [-gS]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
GROUP
 //usage:#define addgroup_full_usage \n\n
 //usage:   Add a group  IF_FEATURE_ADDUSER_TO_GROUP(or add a
user
to a group) \n
 //usage: \n   -g GID  Group id
   
  
   Hmm, actually, I got some doubts whether this change is useful because
   while it fixes one issue, the applets seem randomly inconsistent, e.g.
   this
   is not holding any consistency with any other either:
  
   Usage: crond -fbS -l N -d N -L LOGFILE -c DIR
  
   At best, it is missing the brackets, but the applets are inconsistent
 at
   large overall, sadly, so I am not sure it is any improvement to fix
 this
   one on its own, and fixing all would be more intrusive than I care.
  
  Hi,
  maybe is more easy to understand as it shows which option needs
  a argument.
 
  +//usage:   [-g GID][-S]  IF_FEATURE_ADDUSER_TO_GROUP([USER] )
  GROUP
 
 
  Well, it is all about personal preferences, so whatever Denys wishes as
 the
  maintainer, but the more important objective issue is outlined above.

 I fixed help text by adding [-S].

  Why
  applets are doing this on their own is beyond my league other than
  hysterical raisins...

 I didn't understand this part.
 Why applets are doing /what/ on their own?


As mentioned in the thread above, applets are defining the help display
text on their own, and it is different for different applets. There is no
centralized common solution to this problem. Every little applet is doing
its own way as they wish.

Also, it will be difficult to change at many places if you happen to need
to.. centralization would rock here imho.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Fix the addgroup help output

2014-08-15 Thread Laszlo Papp
On Fri, Aug 15, 2014 at 5:12 PM, Denys Vlasenko vda.li...@googlemail.com
wrote:

 On Fri, Aug 15, 2014 at 3:11 PM, lp...@archlinux.us wrote:

  Applets are defining the help display text on their own, and it is
 different for different applets.

  Also, it will be difficult to change at many places if you happen to need
 to..

  centralization would rock here imho...


 Centralization used to rock, when all help texts
 were in include/usage.src.h (then named include/usage.h).

 It was easy to see all help texts in one place, but it was
 a complete nightmare to keep applet and its help text in sync:
 you needed to jump between two different files
 in order to check what every option does in code,
 and what help text says.

 People were routinely forgetting to update help text,
 because it was in separate file.

 Adding new applet needed editing many files.

 You can see all help texts in one place after a build:
 it's in generated file, include/usage.h;
 or you can just run auxiliary executable: applets/usage
 to have it on stdout.


That is not what I meant. What I meant is centralizing the formatting
output. That is, to have some common functions to which you pass your
options and the functions will output a standardized format.

I did not mean to put all the custom code into one place as a dump. That
would be indeed worse than the current, I agree.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-15 Thread Rich Felker
On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote:
 On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
  On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote:
  
   On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com
wrote:
   
 its the same with bionic libc (arm)

 printf(test) is ok but not printf(buf) with char buf[] = test;
 printf(%s, buf) is ok

   
Yeah, I guess it is about personal preference. I personally do not like
   the
extended code just to make some smart option silent. IMHO, it makes 
the
code needlessly longer by adding another small layer for the content of
   the
variable. If there is no need for formatting, why use printf in the 
first
place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On
   the
other hand, it is too bad that there is nothing between fputs and puts
where you do not need to use the file descriptor and you do not get a 
new
line either.
  
   In principle stdout is more expensive than . stdout is a GOT
   reference in PIC, and  is just PC-relative. For non-PIC it makes no
   difference though.
  
  
  Then my PC is probably lying since the practice does seem to disagree with
  you, apparently as well as this post:
  
  http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
  
  (and the one below, ok, it is not critical, but it performs better for
  others, too)
  
  Raw strings are also uglier IMHO than well-defined identifiers, like
  stdout, but that is personal taste, so I do not mind that one.
  
 printf() is almost always going to call vfprintf(stdout, fmt, ap);
 I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
 exception I've found, and it's ugly and does not have locking/retry/...
 newlib calls _vfprintf_r, which is almost the same concept.
 Meanwhile, printf has to do the formatting.
 
 In other words: 
 Just because printf(%s, string) doesn't name stdout, don't assume that
 it doesn't use a reference to stdout.
 So an ACK for Laszlo Papp's recommendation of fputs().

I was talking about code size overhead, not run time overhead. Of
course there's a reference to stdout in printf. When you reference it
again yourself, that's a second reference. It's not a big deal but
apparently busybox folks care about this kind of micro-optimization...

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-15 Thread Rich Felker
On Fri, Aug 15, 2014 at 12:31:15AM +0200, Harald Becker wrote:
   and how want you behave in case of invalid UTF-8 sequences? My
 functions just skip over stray codes of 0x80..0xBF and synchronize
 on next valid UTF-8 leading byte. How would you count invalid
 sequences?
 
 In general, I would count the whole operation as a failure, returning
 some value such as -1 reserved for failure, since the string is not
 actually UTF-8 and thus how many characters? has no meaning. For
 specific uses, there might be other preferred behaviors. If your goal
 is display, you may want to simply replace illegal sequences with
 U+FFFD in which case you'd count each such sequence as 1, but if
 you're using this character-counting to allocate a buffer for the
 converted string, you need to be sure your conversion function and
 character-counting function agree on how illegal sequences are
 counted, or you might overflow your buffer or end up having to
 truncate the output.
 
 Rich, will you ever use the result of counting the numbers of UTF-8
 characters to allocate a buffer? I don't think so. That would be
 very ill behavior. To allocate buffer space you need the number of
 bytes occupied by a string, not the number of UTF-8 characters.

If your intent is to convert the string to UTF-32/wchar_t/whatever,
then yes, you use the result for allocating a buffer. In my mind
that's the main point of counting characters (since otherwise you
usually care about either bytes, for storage, or columns, for
presentation), and while I personally consider it better to work
character-at-a-time and keep the string as UTF-8, some APIs require a
string in a different format, especially ones that work with a whole
string and prepare it for visual presentation.

The main other place counting characters makes sense is for
implementing languages that do substring operations with character
indexes, which I think is the one you care about.

 So the big question is: Is there anybody who still needs the BB
 internal Unicode handling and can't use the locale functions of a
 libc. Why and for what purpose is this needed? In which environment?

I think the intent was to let uClibc users (and possibly eglibc
users?) omit locale support from the libc, which reduces libc size
quite a bit, and use the UTF-8 code in busybox instead.

 As far as I know, the beginning of those BB internal functions,
 where at times where only glibc had locale support and there where
 no alternatives for small environments. But things changed and there
 are now alternatives. So have we reached a point, where we are able
 to simplify things in BB (which means to focus on correct mb
 function usage everywhere and to strip unnecessary decisions,
 configs and helper code)?

I wouldn't object to this change.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-15 Thread Laszlo Papp
On Fri, Aug 15, 2014 at 6:31 PM, Rich Felker dal...@libc.org wrote:

 On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote:
  On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
   On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote:
  
On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
 On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot 
 tanguy.pru...@gmail.com
 wrote:

  its the same with bionic libc (arm)
 
  printf(test) is ok but not printf(buf) with char buf[] =
 test;
  printf(%s, buf) is ok
 

 Yeah, I guess it is about personal preference. I personally do not
 like
the
 extended code just to make some smart option silent. IMHO, it
 makes the
 code needlessly longer by adding another small layer for the
 content of
the
 variable. If there is no need for formatting, why use printf in
 the first
 place and not fputs with stdout? IMHO, stdout is nicer than %s
 :-) On
the
 other hand, it is too bad that there is nothing between fputs and
 puts
 where you do not need to use the file descriptor and you do not
 get a new
 line either.
   
In principle stdout is more expensive than . stdout is a GOT
reference in PIC, and  is just PC-relative. For non-PIC it makes no
difference though.
   
  
   Then my PC is probably lying since the practice does seem to disagree
 with
   you, apparently as well as this post:
  
  
 http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
  
   (and the one below, ok, it is not critical, but it performs better for
   others, too)
  
   Raw strings are also uglier IMHO than well-defined identifiers, like
   stdout, but that is personal taste, so I do not mind that one.
 
  printf() is almost always going to call vfprintf(stdout, fmt, ap);
  I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
  exception I've found, and it's ugly and does not have locking/retry/...
  newlib calls _vfprintf_r, which is almost the same concept.
  Meanwhile, printf has to do the formatting.
 
  In other words:
  Just because printf(%s, string) doesn't name stdout, don't assume that
  it doesn't use a reference to stdout.
  So an ACK for Laszlo Papp's recommendation of fputs().

 I was talking about code size overhead, not run time overhead. Of
 course there's a reference to stdout in printf. When you reference it
 again yourself, that's a second reference. It's not a big deal but
 apparently busybox folks care about this kind of micro-optimization...


Why not care if it is better and it has no disadvantage?

The size complaint does not matter since that size overhead is already in
the lib pointed out by Isaac. It will not automagically know to print to
stdout.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-15 Thread Isaac Dunham
On Fri, Aug 15, 2014 at 06:39:39PM +0100, Laszlo Papp wrote:
 On Fri, Aug 15, 2014 at 6:31 PM, Rich Felker dal...@libc.org wrote:
 
  On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote:
   On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote:
   
 On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
  If there is no need for formatting, why use printf in
  the first
  place and not fputs with stdout? IMHO, stdout is nicer than %s
  :-) On
 the
  other hand, it is too bad that there is nothing between fputs and
  puts
  where you do not need to use the file descriptor and you do not
  get a new
  line either.

 In principle stdout is more expensive than . stdout is a GOT
 reference in PIC, and  is just PC-relative. For non-PIC it makes no
 difference though.

   
Then my PC is probably lying since the practice does seem to disagree
  with
you, apparently as well as this post:
   
   
  http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
   
(and the one below, ok, it is not critical, but it performs better for
others, too)
   
Raw strings are also uglier IMHO than well-defined identifiers, like
stdout, but that is personal taste, so I do not mind that one.
  
   printf() is almost always going to call vfprintf(stdout, fmt, ap);
   I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
   exception I've found, and it's ugly and does not have locking/retry/...
   newlib calls _vfprintf_r, which is almost the same concept.
   Meanwhile, printf has to do the formatting.
  
   In other words:
   Just because printf(%s, string) doesn't name stdout, don't assume that
   it doesn't use a reference to stdout.
   So an ACK for Laszlo Papp's recommendation of fputs().
 
  I was talking about code size overhead, not run time overhead. Of
  course there's a reference to stdout in printf. When you reference it
  again yourself, that's a second reference. It's not a big deal but
  apparently busybox folks care about this kind of micro-optimization...
 
 
 Why not care if it is better and it has no disadvantage?
 
 The size complaint does not matter since that size overhead is already in
 the lib pointed out by Isaac. It will not automagically know to print to
 stdout.
If I understand correctly, Rich is saying that every use of fputs will
add more to the size of the binary.

I did a simple test using the following code:
fput.c:
#include stdio.h
int main()
{
char *test = TEST...;
fputs(test, stdout);
}

print.c:
#include stdio.h
int main()
{
char *test = TEST...;
printf(%s, test);
}

Then I built the objects:
make fput.o print.o
Per ls, fput.o is 24 bytes larger (1436 vs 1412 bytes).
Per size, print.o is 1 byte larger (182 vs 183 bytes text, 0 data/bss).
Build with CFLAGS=-Os -g0, and both grow to 1548 bytes on disk;
size of fput.o is 198, while print.o is 200.

For the dynamically linked binaries built with CFLAGS=-Os -g0, ls and size
are 8733/1969 bytes for fput and 8712/1933 for print; both are 5236 bytes
on disk after running strip.

Thanks,
Isaac Dunham
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox