Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Joshua Judson Rosen

But why is ls able to match the files when rm is not able to remove them?

Is it perhaps because ls is not actually doing any operations on the files
themselves (not even a stat?), and just reporting the dirent->d_name strings
that it got from readdir()? In which case "ls -l *" would fail on the same
files even when "ls *" doesn't?

Or is there something deeper whereby stat() succeeds but unlink() fails?
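
A minimal C sketch of the distinction asked about here, not taken from the BusyBox sources: it walks a directory using only the names returned by readdir(), then separately tries lstat() on each name. The helper name compare_listing_with_stat is hypothetical. If a name were mangled between the directory read and the later lookup, the two counts would differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: count the names readdir() reports in dir, and how
 * many of them can also be resolved again with lstat().
 * Returns 0 on success, -1 if the directory cannot be opened. */
static int compare_listing_with_stat(const char *dir, int *listed, int *statable)
{
	DIR *d;
	struct dirent *de;
	struct stat st;
	char path[4096];

	assert(dir != NULL && listed != NULL && statable != NULL);
	*listed = 0;
	*statable = 0;

	d = opendir(dir);
	if (d == NULL)
		return -1;

	while ((de = readdir(d)) != NULL) {
		/* The name comes straight from the directory entry;
		 * the file itself has not been touched yet. */
		(*listed)++;

		/* Skip names that do not fit into the path buffer. */
		if (snprintf(path, sizeof(path), "%s/%s", dir, de->d_name)
		    >= (int)sizeof(path))
			continue;

		/* Looking the name up again is the step that would fail
		 * if the name did not round-trip through the kernel. */
		if (lstat(path, &st) == 0)
			(*statable)++;
		else
			fprintf(stderr, "listed but not statable: %s\n", path);
	}
	closedir(d);
	return 0;
}
```

On a healthy file system both counts should agree; a difference would point at the lookup path rather than at ls or rm.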

On 2014-05-29 18:32, Rich Felker wrote:
> On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
> > Hi Rich !
> >
> > >> I know this problem very well. It happens about every few
> > >> month, that I get a ZIP packaged file from a Windows system.
> > >> As the maintainer is a bit stupid, he can't manage to avoid
> > >> foreign characters and I end up with unusual file names after
> > >> unzip.
> > >
> > >This sounds like a bug in the unzip utility. If it encounters
> > >byte sequences which are not UTF-8, it should convert them from
> > >whatever legacy encoding they're in to UTF-8, possibly issuing
> > >an error that the user needs to specify this encoding if it
> > >can't be determined.
> >
> > Then you need to consider all programs buggy which don't
> > mangle with the file names. There are so many programs which just
> > copy filenames through and let the kernel decide what to do. And
> > I do not mean BB unzip here, normally I'm using the upstream
> > unzip.
> >
> > ... and how can you consider all names being UTF-8 ... nowadays
> > may be, but what when using 8 bit locales with different
> > charsets? UTF-8 mangling would be wrong on those.
>
> My statement was imprecise; of course to support users still stuck on
> legacy locales, nl_langinfo(CODESET) should be consulted.
>
> > ... and not only unzip may produce such results. Think of using
> > an USB stick at an Windows machine, then carry that over to an
> > Linux machine.
>
> The filenames are stored in UCS-2. No problem.
>
> > Depending on how the file system is mounted you
> > may get unusual file names when copying names with foreign
> > characters. Now who is bad?
>
> If you mount it incorrectly, then this is user error. Note that
> correct versus incorrect does not depend on the contents of the
> storage device, only the encoding the local system where you're
> mounting it is using.
>
> > Would be nice to have them all fixed ... get them all fixed the
> > same way when doing some mapping ... but can that ever reach all
> > programs? This is a so long standing problem, nobody really
> > cares.
>
> All programs are not affected. Only programs which read filenames as
> byte strings from foreign sources (such as the directory table of a
> zip file) are affected.
>
> Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox



--
"'tis an ill wind that blows no minds."


Re: busybox nslookup slow on x86_64

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 11:30:58PM +0200, Harald Becker wrote:
> Hi Rich !
> 
> >> bbox nslookup uses libc to perform the lookup.
> 
> >However, it may be nice to have an option for bb nslookup to
> >turn off v6 lookups if such an option doesn't already exist.
> 
> The problem has been solved by placing "single-request" option
> in /etc/resolv.conf. So it was a glibc related problem.

This option is a workaround for buggy nameserver software on some
routers that hangs when you perform multiple requests at the same
time. It's far from being a complete workaround since multiple
processes/threads (or even different machines behind the router) might
make simultaneous requests in a way that hangs the router. The correct
fix is not to use the built-in nameserver on such routers but to
instead either configure a local nameserver on 127.0.0.1 or use a
third-party one (e.g. 8.8.8.8). Or replace the router's firmware with
OpenWRT if possible.

Rich


Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
> Hi Rich !
> 
> >> I know this problem very well. It happens about every few
> >> month, that I get a ZIP packaged file from a Windows system.
> >> As the maintainer is a bit stupid, he can't manage to avoid
> >> foreign characters and I end up with unusual file names after
> >> unzip.
> >
> >This sounds like a bug in the unzip utility. If it encounters
> >byte sequences which are not UTF-8, it should convert them from
> >whatever legacy encoding they're in to UTF-8, possibly issuing
> >an error that the user needs to specify this encoding if it
> >can't be determined.
> 
> Then you need to consider all programs buggy which don't
> mangle with the file names. There are so many programs which just
> copy filenames through and let the kernel decide what to do. And
> I do not mean BB unzip here, normally I'm using the upstream
> unzip.
> 
> ... and how can you consider all names being UTF-8 ... nowadays
> may be, but what when using 8 bit locales with different
> charsets? UTF-8 mangling would be wrong on those.

My statement was imprecise; of course to support users still stuck on
legacy locales, nl_langinfo(CODESET) should be consulted.
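
A minimal sketch of consulting nl_langinfo(CODESET), assuming the program wants to follow the user's environment. The helper names locale_codeset and locale_is_utf8 are hypothetical, and the spellings accepted for UTF-8 are an assumption that may need extending on some systems.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: name of the character encoding of the current locale.
 * setlocale(LC_CTYPE, "") adopts LANG / LC_ALL / LC_CTYPE from the
 * environment; without it the program stays in the "C" locale. */
static const char *locale_codeset(void)
{
	setlocale(LC_CTYPE, "");
	return nl_langinfo(CODESET);
}

/* Returns 1 if file names should be written as UTF-8 in this locale,
 * 0 if they need converting to some other (legacy) encoding. */
static int locale_is_utf8(void)
{
	const char *cs = locale_codeset();

	assert(cs != NULL);
	return strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
}
```

A tool such as unzip could use the returned codeset name as the target encoding when converting names read from an archive.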

> ... and not only unzip may produce such results. Think of using
> an USB stick at an Windows machine, then carry that over to an
> Linux machine.

The filenames are stored in UCS-2. No problem.

> Depending on how the file system is mounted you
> may get unusual file names when copying names with foreign
> characters. Now who is bad?

If you mount it incorrectly, then this is user error. Note that
correct versus incorrect does not depend on the contents of the
storage device, only the encoding the local system where you're
mounting it is using.

> Would be nice to have them all fixed ... get them all fixed the
> same way when doing some mapping ... but can that ever reach all
> programs? This is a so long standing problem, nobody really
> cares. 

All programs are not affected. Only programs which read filenames as
byte strings from foreign sources (such as the directory table of a
zip file) are affected.

Rich


Re: busybox nslookup slow on x86_64

2014-05-29 Thread Harald Becker
Hi Rich !

>> bbox nslookup uses libc to perform the lookup.

>However, it may be nice to have an option for bb nslookup to
>turn off v6 lookups if such an option doesn't already exist.

The problem has been solved by placing "single-request" option
in /etc/resolv.conf. So it was a glibc related problem.
 
--
Harald


Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Harald Becker
Hi Rich !

>> I know this problem very well. It happens about every few
>> month, that I get a ZIP packaged file from a Windows system.
>> As the maintainer is a bit stupid, he can't manage to avoid
>> foreign characters and I end up with unusual file names after
>> unzip.
>
>This sounds like a bug in the unzip utility. If it encounters
>byte sequences which are not UTF-8, it should convert them from
>whatever legacy encoding they're in to UTF-8, possibly issuing
>an error that the user needs to specify this encoding if it
>can't be determined.

Then you need to consider every program buggy that doesn't
mangle file names. There are so many programs which just
copy filenames through and let the kernel decide what to do. And
I do not mean BB unzip here; normally I'm using the upstream
unzip.

... and how can you assume all names are UTF-8? Nowadays
maybe, but what about 8-bit locales with different
charsets? UTF-8 mangling would be wrong on those.

... and it is not only unzip that may produce such results. Think of
using a USB stick on a Windows machine, then carrying it over to a
Linux machine. Depending on how the file system is mounted you
may get unusual file names when copying names with foreign
characters. Now who is at fault?

It would be nice to have them all fixed ... and fixed the
same way when doing some mapping ... but can that ever reach all
programs? This is such a long-standing problem that nobody really
cares.

>
>Rich


--
Harald


Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 09:18:26AM +0200, Harald Becker wrote:
> Hi Denys !
> 
> >> For what it's worth the users with this problem were unable to
> >> remove the files using wildcards. For example, one user had a
> >> file named:
> >>
> >>   På hjul.mkv
> >>
> >> ls P* displayed the file.
> >> rm P* returned the error "can't remove 'På Hjul.mkv': No such
> >> file or directory"
> >
> >I have hard time believing this.
> >Wildcard expansion is done by the shell, not by ls and rm.
> >
> >IOW: ls and rm see exactly the same expanded names.
> >
> >Since they don't mangle the names in any way
> >(e.g. no UTF-8 decoding) before feeding them to system calls,
> >it should work.
> 
> I know this problem very well. It happens about every few month,
> that I get a ZIP packaged file from a Windows system. As the
> maintainer is a bit stupid, he can't manage to avoid foreign
> characters and I end up with unusual file names after unzip.

This sounds like a bug in the unzip utility. If it encounters byte
sequences which are not UTF-8, it should convert them from whatever
legacy encoding they're in to UTF-8, possibly issuing an error that
the user needs to specify this encoding if it can't be determined.
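
A minimal sketch of the check described above, assuming NUL-terminated names. is_valid_utf8 and latin1_to_utf8 are hypothetical helpers; ISO-8859-1 is picked here only as an example of a legacy encoding, and a real tool would more likely hand the conversion to iconv() with an encoding chosen by the user.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if str is well-formed UTF-8, else 0. Rejects overlong forms,
 * surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;

	assert(str != NULL);
	while (*s) {
		unsigned cp;
		int extra, i;

		if (*s < 0x80) {
			s++;		/* plain ASCII byte */
			continue;
		} else if ((*s & 0xE0) == 0xC0) {
			cp = *s & 0x1F;
			extra = 1;
		} else if ((*s & 0xF0) == 0xE0) {
			cp = *s & 0x0F;
			extra = 2;
		} else if ((*s & 0xF8) == 0xF0) {
			cp = *s & 0x07;
			extra = 3;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}
		s++;
		for (i = 0; i < extra; i++, s++) {
			if ((*s & 0xC0) != 0x80)
				return 0;	/* truncated sequence, including at NUL */
			cp = (cp << 6) | (*s & 0x3F);
		}
		if ((extra == 1 && cp < 0x80) ||
		    (extra == 2 && cp < 0x800) ||
		    (extra == 3 && cp < 0x10000) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return 0;
	}
	return 1;
}

/* Converts ISO-8859-1 bytes to UTF-8. Returns the output length without
 * the terminating NUL, or (size_t)-1 if dst is too small. */
static size_t latin1_to_utf8(const char *src, char *dst, size_t dstsize)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	if (dstsize == 0)
		return (size_t)-1;
	for (; *s; s++) {
		size_t need = (*s < 0x80) ? 1 : 2;

		if (n + need >= dstsize)	/* keep room for the NUL */
			return (size_t)-1;
		if (*s < 0x80) {
			dst[n++] = (char)*s;
		} else {
			dst[n++] = (char)(0xC0 | (*s >> 6));
			dst[n++] = (char)(0x80 | (*s & 0x3F));
		}
	}
	dst[n] = '\0';
	return n;
}
```

With these helpers, a name such as "P\xe5 hjul.mkv" (Latin-1) fails the check and converts to "P\xc3\xa5 hjul.mkv", which passes it.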

Rich

Re: busybox nslookup slow on x86_64

2014-05-29 Thread Rich Felker
On Wed, May 28, 2014 at 04:34:23PM +0200, Denys Vlasenko wrote:
> On Tue, May 27, 2014 at 9:34 AM, muddyboot  wrote:
> > Hi, I found nslookup resolve very slow on x86_64 system , it cost 5 seconds 
> > or longer almost everytime.
> >
> > Tested OS: Debian 7 x86_64 with kernel 3.2.5 &  LFS x86_64 with kernel 3.12
> >
> > No IPv6 enabled in kernel config.
> > DNS server works fine
> > nslookup program from bind-9.7 works fine
> > nslookup from busybox test on i686 system OK
> >
> > target busybox version: 1.17.4、1.20.2、1.21.1、1.22.1
> >
> > Any response for this problem is great appreciated.
> 
> bbox nslookup uses libc to perform the lookup.
> 
> glibc maintainers known to be quite.. er.. stubborn
> about how DNS should work.
> 
> For example, they insist that IPv6 DNS requests must be sent
> even if the machine has no IPv6 support in kernel
> (let alone a more typical case where machine
> has no IPv6 connectivity).
> 
> Your DNS server does not respond to IPv6 requests,
> but glibc waits for them.

Unless the caller requested AI_ADDRCONFIG or requested AF_INET
explicitly as opposed to AF_INET6, it's required to do this. And I
don't think it's a bug. It may be useful to know all DNS results even
if some of them (v6) won't be used for your current client setup. The
bug is in whatever broken nameserver is ignoring AAAA requests rather
than properly looking them up and returning a result.

However, it may be nice to have an option for bb nslookup to turn off
v6 lookups if such an option doesn't already exist.
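
A minimal sketch of what restricting a lookup to IPv4 looks like at the getaddrinfo() level. This is not the bb nslookup code; resolve_first is a hypothetical helper. Passing AF_INET as the family asks only for IPv4 results, AF_UNSPEC asks for both families, and AI_ADDRCONFIG can be passed in extra_flags.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host for one address family and write the
 * first result as text into buf.
 * Returns 0 on success or a getaddrinfo() error code. */
static int resolve_first(const char *host, int family, int extra_flags,
			 char *buf, size_t buflen)
{
	struct addrinfo hints, *res;
	const void *addr;
	int rc;

	assert(host != NULL && buf != NULL && buflen > 0);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = family;	/* AF_INET: IPv4 results only */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = extra_flags;	/* e.g. AI_ADDRCONFIG */

	rc = getaddrinfo(host, NULL, &hints, &res);
	if (rc != 0)
		return rc;

	if (res->ai_family == AF_INET)
		addr = &((struct sockaddr_in *)res->ai_addr)->sin_addr;
	else
		addr = &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr;

	if (inet_ntop(res->ai_family, addr, buf, (socklen_t)buflen) == NULL)
		rc = EAI_SYSTEM;
	freeaddrinfo(res);
	return rc;
}
```

A call such as resolve_first("example.org", AF_INET, 0, buf, sizeof(buf)) restricts the request to IPv4 addresses, so the caller does not depend on how the nameserver treats IPv6 queries.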

Rich

Re: Proposition: use a hashtable instead of bsearch to locate applets

2014-05-29 Thread Laurent Bercot

> Does this provide a noticeable performance increase? Do you have
> benchmarks?


+1. I'm not convinced this is useful. Maybe for looping sh scripts that
prefer applets (i.e. do not perform an execve() every time busybox is
called), but I'm willing to bet that even in that kind of script the
performance bottleneck will be something else - typically, invocation of
external commands, or any kind of system call really. For normal
scripts, the cost of applet lookup is basically made negligible by
the cost of execve() in the first place. Plus dynamic symbol resolution
if you're not using static linking.

 O(log n) calls to strcmp is cheap, except in very tight loops; and a
full busybox applet invocation rarely happens in a tight loop. So I
wouldn't add to the code size to optimize that part without benchmarks
showing a real performance improvement.
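
To put a number on "O(log n) calls to strcmp", here is a small sketch that counts comparator calls made by bsearch() over 355 sorted names (log2(355) is about 8.5). The names are placeholders, not the real applet list, and measure_bsearch is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 355 is the defconfig applet count quoted in the thread. */
enum { N_NAMES = 355, NAME_LEN = 16 };

static char names[N_NAMES][NAME_LEN];
static unsigned long compare_calls;

/* Comparator that counts how often bsearch() calls it. */
static int counting_cmp(const void *key, const void *elem)
{
	compare_calls++;
	return strcmp(key, elem);
}

/* Fills the table with sorted placeholder names and looks each one up once.
 * Stores the worst case in *max_out and returns the average number of
 * strcmp() calls per successful lookup, or -1.0 if a lookup fails. */
static double measure_bsearch(unsigned long *max_out)
{
	unsigned long total = 0, max = 0;
	int i;

	assert(max_out != NULL);
	/* Zero padding keeps the generated names in strcmp() order. */
	for (i = 0; i < N_NAMES; i++)
		snprintf(names[i], NAME_LEN, "applet%03d", i);

	for (i = 0; i < N_NAMES; i++) {
		compare_calls = 0;
		if (bsearch(names[i], names, N_NAMES, NAME_LEN, counting_cmp) == NULL)
			return -1.0;
		total += compare_calls;
		if (compare_calls > max)
			max = compare_calls;
	}
	*max_out = max;
	return (double)total / N_NAMES;
}
```

Timing this loop against a hash-based lookup, and against the cost of a single execve(), would be one way to get the benchmark asked for above.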

--
 Laurent


[PATCH] tar: only include selinux context with -p opt

2014-05-29 Thread Tanguy Pruvot
Since there was no answer, I added -p handling: SELinux contexts are now
stored only when -p is given (this is for Android 4.3+).


0001-tar-add-selinux-context-support-on-create.patch
Description: Binary data


0002-tar-only-include-selinux-context-with-p-opt.patch
Description: Binary data

Re: Proposition: use a hashtable instead of bsearch to locate applets

2014-05-29 Thread Jason Cipriani
Does this provide a noticeable performance increase? Do you have
benchmarks?

J
On May 29, 2014 7:17 AM, "Bartosz Golaszewski"  wrote:

> Hi!
>
> Busybox uses bsearch() to locate the applet's main function in
> find_applet_by_name(). Running 'make defconfig' results in 355 applets
> being built and this in turn results in 8-9 calls to strcmp() on average
> per busybox execution.
>
> Maybe we should switch to using a simple static hashtable? The following
> patch is a simple & dirty proof of concept to show what I mean. It modifies
> applet_tables.c to generate a static hashtable containing indicies of
> fields
> in applet_nameofs. I used a simple and fast hash function taken from
> Robert Jenkins.
>
> With this patch, on each execution and after the hash computation, the
> number of calls to strcmp() has been limited to four at most, and mostly
> it's just one or two. There are no calls to applet_name_compare() too.
>
> The patch results in bigger code, but there's room for improvement as
> we could probably get rid of some of the arrays generated by applet_tables
> and unify the hashtables used in busybox applets.
>
> If there's any interest, I can prepare a better, more memory-wise optimized
> version.
>
> Best regards,
> Bartosz Golaszewski
>

Proposition: use a hashtable instead of bsearch to locate applets

2014-05-29 Thread Bartosz Golaszewski
Hi!

Busybox uses bsearch() to locate the applet's main function in
find_applet_by_name(). Running 'make defconfig' results in 355 applets
being built and this in turn results in 8-9 calls to strcmp() on average
per busybox execution.

Maybe we should switch to using a simple static hashtable? The following
patch is a simple & dirty proof of concept to show what I mean. It modifies
applet_tables.c to generate a static hashtable containing indices of fields
in applet_nameofs. I used a simple and fast hash function taken from
Robert Jenkins.

With this patch, on each execution and after the hash computation, the
number of calls to strcmp() has been limited to four at most, and mostly
it's just one or two. There are also no calls to applet_name_compare().
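
The lookup side described here can be sketched as follows. This is a self-contained illustration, not the patch itself: the name list is a placeholder, the table is built at run time rather than generated by applet_tables, build_table and find_by_name are hypothetical names, and the input byte is cast to unsigned char before hashing.

```c
#include <assert.h>
#include <string.h>

/* Placeholder applet names; the real list is generated at build time. */
static const char *const names[] = { "cat", "ls", "rm", "sh", "tar", "unzip" };
enum { N = sizeof(names) / sizeof(names[0]) };

/* bucket[h] holds indices into names[], terminated by -1. Each bucket is
 * sized for the worst case (all names colliding) plus the terminator. */
static int bucket[N][N + 1];

/* Jenkins one-at-a-time hash, the same scheme the patch uses. */
static unsigned jenkins_hash(const char *key, size_t len)
{
	unsigned hash = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		hash += (unsigned char)key[i];
		hash += hash << 10;
		hash ^= hash >> 6;
	}
	hash += hash << 3;
	hash ^= hash >> 11;
	hash += hash << 15;
	return hash;
}

/* Places every name in its bucket. Returns 0 on success, -1 on overflow. */
static int build_table(void)
{
	int i, j;

	for (i = 0; i < N; i++)
		for (j = 0; j <= N; j++)
			bucket[i][j] = -1;

	for (i = 0; i < N; i++) {
		unsigned h = jenkins_hash(names[i], strlen(names[i])) % N;

		/* Find the first free slot in this bucket. */
		for (j = 0; j < N && bucket[h][j] >= 0; j++)
			;
		if (j == N)
			return -1;
		bucket[h][j] = i;
	}
	return 0;
}

/* Returns the index of name in names[], or -1 if it is not present.
 * Only entries in one bucket are compared with strcmp(). */
static int find_by_name(const char *name)
{
	const int *b;

	assert(name != NULL);
	b = bucket[jenkins_hash(name, strlen(name)) % N];
	for (; *b >= 0; b++)
		if (strcmp(name, names[*b]) == 0)
			return *b;
	return -1;
}
```

The number of strcmp() calls per lookup equals the position of the name within its bucket, which is why it depends on how evenly the hash spreads the names.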

The patch results in bigger code, but there's room for improvement as
we could probably get rid of some of the arrays generated by applet_tables
and unify the hashtables used in busybox applets.

If there's any interest, I can prepare a better, more memory-efficient
version.

Best regards,
Bartosz Golaszewski

---
 applets/applet_tables.c | 58 -
 libbb/appletlib.c   | 45 --
 2 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/applets/applet_tables.c b/applets/applet_tables.c
index 94b974e..9656b5c 100644
--- a/applets/applet_tables.c
+++ b/applets/applet_tables.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #undef ARRAY_SIZE
 #define ARRAY_SIZE(x) ((unsigned)(sizeof(x) / sizeof((x)[0])))
@@ -42,6 +43,26 @@ enum { NUM_APPLETS = ARRAY_SIZE(applets) };
 
 static int offset[NUM_APPLETS];
 
+#define MAX_BUCKET_SIZE 16
+static int applet_hashtable[NUM_APPLETS][16];
+
+static unsigned jenkins_hash(const char *key, size_t len)
+{
+	unsigned hash, i;
+
+	for(hash = i = 0; i < len; ++i) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash;
+}
+
 static int cmp_name(const void *a, const void *b)
 {
const struct bb_applet *aa = a;
@@ -51,7 +72,7 @@ static int cmp_name(const void *a, const void *b)
 
 int main(int argc, char **argv)
 {
-   int i;
+   int i, j;
int ofs;
 // unsigned MAX_APPLET_NAME_LEN = 1;
 
@@ -129,6 +150,41 @@ int main(int argc, char **argv)
}
printf("};\n");
 #endif
+
+	// Initialize local hashtable
+	for (i = 0; i < NUM_APPLETS; i++) {
+		for (j = 0; j < MAX_BUCKET_SIZE; j++)
+			applet_hashtable[i][j] = -1;
+	}
+
+	// For each applet - place it in appropriate bucket
+	for (i = 0; i < NUM_APPLETS; i++) {
+		unsigned ind = jenkins_hash(applets[i].name,
+				strlen(applets[i].name)) % NUM_APPLETS;
+
+		for (j = 0; j < MAX_BUCKET_SIZE; j++) {
+			if (applet_hashtable[ind][j] < 0) {
+				applet_hashtable[ind][j] = i;
+				break;
+			}
+		}
+	}
+
+	// Create a static array for each bucket
+	for (i = 0; i < NUM_APPLETS; i++) {
+		printf("const int16_t bucket%d[] = { ", i);
+		for (j = 0; applet_hashtable[i][j] >= 0; j++) {
+			printf("%d, ", applet_hashtable[i][j]);
+		}
+		printf(" -1 };\n");
+	}
+
+	// Create a static array of pointers to the buckets
+	printf("\nconst int16_t *applet_hashtab[] = {\n");
+	for (i = 0; i < NUM_APPLETS; i++)
+		printf("\tbucket%d,\n", i);
+	printf("};\n");
+
//printf("#endif /* SKIP_definitions */\n");
 // printf("\n");
 // printf("#define MAX_APPLET_NAME_LEN %u\n", MAX_APPLET_NAME_LEN);
diff --git a/libbb/appletlib.c b/libbb/appletlib.c
index f7c416e..6be536d 100644
--- a/libbb/appletlib.c
+++ b/libbb/appletlib.c
@@ -52,7 +52,6 @@
 
 #include "usage_compressed.h"
 
-
 #if ENABLE_SHOW_USAGE && !ENABLE_FEATURE_COMPRESS_USAGE
 static const char usage_messages[] ALIGN1 = UNPACKED_USAGE;
 #else
@@ -140,23 +139,55 @@ void FAST_FUNC bb_show_usage(void)
 }
 
 #if NUM_APPLETS > 8
-static int applet_name_compare(const void *name, const void *idx)
+static unsigned jenkins_hash(const char *key, size_t len)
 {
-   int i = (int)(ptrdiff_t)idx - 1;
-   return strcmp(name, APPLET_NAME(i));
+	unsigned hash, i;
+
+	for(hash = i = 0; i < len; ++i) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash;
 }
+
+//static int applet_name_compare(const void *name, const void *idx)
+//{
+// int i = (int)(ptrdiff_t)idx - 1;
+/

Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Harald Becker
Hi Denys !

>> For what it's worth the users with this problem were unable to
>> remove the files using wildcards. For example, one user had a
>> file named:
>>
>>   På hjul.mkv
>>
>> ls P* displayed the file.
>> rm P* returned the error "can't remove 'På Hjul.mkv': No such
>> file or directory"
>
>I have hard time believing this.
>Wildcard expansion is done by the shell, not by ls and rm.
>
>IOW: ls and rm see exactly the same expanded names.
>
>Since they don't mangle the names in any way
>(e.g. no UTF-8 decoding) before feeding them to system calls,
>it should work.

I know this problem very well. Every few months
I get a ZIP-packaged file from a Windows system. As the
maintainer is a bit stupid, he can't manage to avoid foreign
characters and I end up with unusual file names after unzip.

Most likely they can be handled with wildcards (especially ?), but
sometimes it gets a bit tricky to access those files, as they
contain control or unprintable characters. In that case you need
to know the exact length and position to enter question marks in
the file name.

If you do a rm -i * it fails. Not due to name mangling in
Busybox, but due to name mangling in the file system drivers of the
kernel (especially on FAT file systems, such as USB sticks or
flash-based disks).

Therefore this is not a Busybox-related problem; it's a general
name handling problem when intermixing file systems and different
charsets / code pages. It does not depend on a particular Busybox
version; I had the same problem even 10 years ago (with completely
different versions of kernel/lib/programs).

--
Harald