Re: Issues removing files with certain characters in their names.
But why is ls able to match the files when rm is not able to remove them?

Is it perhaps because ls is not actually doing any operations on the files themselves (not even a stat?), and just reporting the dirent->d_name strings that it got from readdir()? In which case "ls -l *" would fail on the same files even when "ls *" doesn't?

Or is there something deeper whereby stat() succeeds but unlink() fails?

On 2014-05-29 18:32, Rich Felker wrote:
> On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
> > Hi Rich !
> >
> > >> I know this problem very well. It happens about every few month, that I get a ZIP packaged file from a Windows system. As the maintainer is a bit stupid, he can't manage to avoid foreign characters and I end up with unusual file names after unzip.
> > >
> > >This sounds like a bug in the unzip utility. If it encounters byte sequences which are not UTF-8, it should convert them from whatever legacy encoding they're in to UTF-8, possibly issuing an error that the user needs to specify this encoding if it can't be determined.
> >
> > Then you need to consider all programs buggy which don't mangle with the file names. There are so many programs which just copy filenames through and let the kernel decide what to do. And I do not mean BB unzip here, normally I'm using the upstream unzip.
> >
> > ... and how can you consider all names being UTF-8 ... nowadays may be, but what when using 8 bit locales with different charsets? UTF-8 mangling would be wrong on those.
>
> My statement was imprecise; of course to support users still stuck on legacy locales, nl_langinfo(CODESET) should be consulted.
>
> > ... and not only unzip may produce such results. Think of using an USB stick at an Windows machine, then carry that over to an Linux machine.
>
> The filenames are stored in UCS-2. No problem.
>
> > Depending on how the file system is mounted you may get unusual file names when copying names with foreign characters. Now who is bad?
>
> If you mount it incorrectly, then this is user error. Note that correct versus incorrect does not depend on the contents of the storage device, only the encoding the local system where you're mounting it is using.
>
> > Would be nice to have them all fixed ... get them all fixed the same way when doing some mapping ... but can that ever reach all programs? This is a so long standing problem, nobody really cares.
>
> All programs are not affected. Only programs which read filenames as byte strings from foreign sources (such as the directory table of a zip file) are affected.
>
> Rich

--
"'tis an ill wind that blows no minds."
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
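One way to probe the question raised above (does a name returned by readdir() still resolve when handed back to the kernel?) is to list a directory and call lstat() on every entry. The following is only a minimal sketch, assuming a POSIX system; `count_unresolvable` is a hypothetical helper name, and nothing here is taken from the BusyBox sources or claims to show what BusyBox ls actually does.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Walk a directory and check whether each name that readdir() returned
 * can be resolved again by lstat(). The raw bytes of every name are
 * printed in hex so that encoding differences become visible.
 * Returns the number of names that failed to resolve, or -1 if the
 * directory itself could not be opened.
 */
int count_unresolvable(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (!dir)
        return -1;

    int failures = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char path[4096];
        struct stat st;
        int n;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        n = snprintf(path, sizeof(path), "%s/%s", dirpath, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue; /* path too long for this sketch */

        for (const unsigned char *p = (const unsigned char *)ent->d_name; *p; p++)
            printf("%02x ", *p);

        if (lstat(path, &st) != 0) {
            printf("-> lstat failed: %s\n", strerror(errno));
            failures++;
        } else {
            printf("-> ok\n");
        }
    }
    closedir(dir);
    return failures;
}
```

If a name shows up in the listing but lstat() reports ENOENT for it, the mismatch lies below the userspace tools (for example in the file system driver's name mapping), which is the case discussed later in this thread.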
Re: busybox nslookup slow on x86_64
On Thu, May 29, 2014 at 11:30:58PM +0200, Harald Becker wrote:
> Hi Rich !
>
> >> bbox nslookup uses libc to perform the lookup.
> >
> >However, it may be nice to have an option for bb nslookup to
> >turn off v6 lookups if such an option doesn't already exist.
>
> The problem has been solved by placing "single-request" option
> in /etc/resolv.conf. So it was a glibc related problem.

This option is a workaround for buggy nameserver software on some routers that hangs when you perform multiple requests at the same time. It's far from being a complete workaround since multiple processes/threads (or even different machines behind the router) might make simultaneous requests in a way that hangs the router.

The correct fix is not to use the built-in nameserver on such routers but to instead either configure a local nameserver on 127.0.0.1 or use a third-party one (e.g. 8.8.8.8). Or replace the router's firmware with OpenWRT if possible.

Rich
Re: Issues removing files with certain characters in their names.
On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
> Hi Rich !
>
> >> I know this problem very well. It happens about every few
> >> month, that I get a ZIP packaged file from a Windows system.
> >> As the maintainer is a bit stupid, he can't manage to avoid
> >> foreign characters and I end up with unusual file names after
> >> unzip.
> >
> >This sounds like a bug in the unzip utility. If it encounters
> >byte sequences which are not UTF-8, it should convert them from
> >whatever legacy encoding they're in to UTF-8, possibly issuing
> >an error that the user needs to specify this encoding if it
> >can't be determined.
>
> Then you need to consider all programs buggy which don't
> mangle with the file names. There are so many programs which just
> copy filenames through and let the kernel decide what to do. And
> I do not mean BB unzip here, normally I'm using the upstream
> unzip.
>
> ... and how can you consider all names being UTF-8 ... nowadays
> may be, but what when using 8 bit locales with different
> charsets? UTF-8 mangling would be wrong on those.

My statement was imprecise; of course to support users still stuck on legacy locales, nl_langinfo(CODESET) should be consulted.

> ... and not only unzip may produce such results. Think of using
> an USB stick at an Windows machine, then carry that over to an
> Linux machine.

The filenames are stored in UCS-2. No problem.

> Depending on how the file system is mounted you
> may get unusual file names when copying names with foreign
> characters. Now who is bad?

If you mount it incorrectly, then this is user error. Note that correct versus incorrect does not depend on the contents of the storage device, only the encoding the local system where you're mounting it is using.

> Would be nice to have them all fixed ... get them all fixed the
> same way when doing some mapping ... but can that ever reach all
> programs? This is a so long standing problem, nobody really
> cares.

All programs are not affected. Only programs which read filenames as byte strings from foreign sources (such as the directory table of a zip file) are affected.

Rich
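The check described in this message (consult nl_langinfo(CODESET), and treat byte sequences that are not valid UTF-8 as needing conversion) might look roughly like the sketch below. The function names `current_codeset` and `is_valid_utf8` are hypothetical, the validator is a plain hand-written one, and the actual conversion step (for instance via iconv) is left out. It is not code from any unzip implementation.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Name of the character encoding of the user's locale, e.g. "UTF-8". */
const char *current_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/*
 * Return 1 if the len bytes at s form well-formed UTF-8, else 0.
 * Overlong forms, surrogates and values above U+10FFFF are rejected.
 */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t n; /* number of continuation bytes expected */

        if (c < 0x80)                { n = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0; /* stray continuation byte or invalid lead byte */

        if (n > len - i - 1)
            return 0; /* sequence truncated at end of input */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n + 1;
    }
    return 1;
}
```

With the file name from this thread, the UTF-8 spelling of "På hjul.mkv" (bytes 50 c3 a5 ...) passes the check, while the Latin-1 spelling (bytes 50 e5 ...) fails it, so a tool running in a UTF-8 locale would know that the second form needs converting.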
Re: busybox nslookup slow on x86_64
Hi Rich !

>> bbox nslookup uses libc to perform the lookup.
>
>However, it may be nice to have an option for bb nslookup to
>turn off v6 lookups if such an option doesn't already exist.

The problem has been solved by placing "single-request" option in /etc/resolv.conf. So it was a glibc related problem.

--
Harald
Re: Issues removing files with certain characters in their names.
Hi Rich !

>> I know this problem very well. It happens about every few
>> month, that I get a ZIP packaged file from a Windows system.
>> As the maintainer is a bit stupid, he can't manage to avoid
>> foreign characters and I end up with unusual file names after
>> unzip.
>
>This sounds like a bug in the unzip utility. If it encounters
>byte sequences which are not UTF-8, it should convert them from
>whatever legacy encoding they're in to UTF-8, possibly issuing
>an error that the user needs to specify this encoding if it
>can't be determined.

Then you need to consider all programs buggy which don't mangle with the file names. There are so many programs which just copy filenames through and let the kernel decide what to do. And I do not mean BB unzip here, normally I'm using the upstream unzip.

... and how can you consider all names being UTF-8 ... nowadays may be, but what when using 8 bit locales with different charsets? UTF-8 mangling would be wrong on those.

... and not only unzip may produce such results. Think of using an USB stick at an Windows machine, then carry that over to an Linux machine. Depending on how the file system is mounted you may get unusual file names when copying names with foreign characters. Now who is bad?

Would be nice to have them all fixed ... get them all fixed the same way when doing some mapping ... but can that ever reach all programs? This is a so long standing problem, nobody really cares.

>
>Rich

--
Harald
Re: Issues removing files with certain characters in their names.
On Thu, May 29, 2014 at 09:18:26AM +0200, Harald Becker wrote:
> Hi Denys !
>
> >> For what it's worth the users with this problem were unable to
> >> remove the files using wildcards. For example, one user had a
> >> file named:
> >>
> >> På hjul.mkv
> >>
> >> ls P* displayed the file.
> >> rm P* returned the error "can't remove 'På Hjul.mkv': No such
> >> file or directory"
> >
> >I have hard time believing this.
> >Wildcard expansion is done by the shell, not by ls and rm.
> >
> >IOW: ls and rm see exactly the same expanded names.
> >
> >Since they don't mangle the names in any way
> >(e.g. no UTF-8 decoding) before feeding them to system calls,
> >it should work.
>
> I know this problem very well. It happens about every few month,
> that I get a ZIP packaged file from a Windows system. As the
> maintainer is a bit stupid, he can't manage to avoid foreign
> characters and I end up with unusual file names after unzip.

This sounds like a bug in the unzip utility. If it encounters byte sequences which are not UTF-8, it should convert them from whatever legacy encoding they're in to UTF-8, possibly issuing an error that the user needs to specify this encoding if it can't be determined.

Rich
Re: busybox nslookup slow on x86_64
On Wed, May 28, 2014 at 04:34:23PM +0200, Denys Vlasenko wrote:
> On Tue, May 27, 2014 at 9:34 AM, muddyboot wrote:
> > Hi, I found nslookup resolve very slow on x86_64 system , it cost 5 seconds
> > or longer almost everytime.
> >
> > Tested OS: Debian 7 x86_64 with kernel 3.2.5 & LFS x86_64 with kernel 3.12
> >
> > No IPv6 enabled in kernel config.
> > DNS server works fine
> > nslookup program from bind-9.7 works fine
> > nslookup from busybox test on i686 system OK
> >
> > target busybox version: 1.17.4、1.20.2、1.21.1、1.22.1
> >
> > Any response for this problem is great appreciated.
>
> bbox nslookup uses libc to perform the lookup.
>
> glibc maintainers known to be quite.. er.. stubborn
> about how DNS should work.
>
> For example, they insist that IPv6 DNS requests must be sent
> even if the machine has no IPv6 support in kernel
> (let alone a more typical case where machine
> has no IPv6 connectivity).
>
> Your DNS server does not respond to IPv6 requests,
> but glibc waits for them.

Unless the caller requested AI_ADDRCONFIG or requested AF_INET explicitly as opposed to AF_INET6, it's required to do this. And I don't think it's a bug. It may be useful to know all DNS results even if some of them (v6) won't be used for your current client setup. The bug is in whatever broken nameserver is ignoring requests rather than properly looking them up and returning a result.

However, it may be nice to have an option for bb nslookup to turn off v6 lookups if such an option doesn't already exist.

Rich
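For illustration, requesting AF_INET explicitly, as mentioned in this message, comes down to setting `ai_family` in the hints passed to getaddrinfo(). The sketch below is a hedged example of that call only; `count_ipv4_addresses` is a hypothetical helper, and this is not how BusyBox nslookup is implemented.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve host while asking for IPv4 results only.
 * Returns the number of IPv4 addresses found, or -1 if
 * getaddrinfo() reports an error.
 */
int count_ipv4_addresses(const char *host)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* IPv4 only, instead of AF_UNSPEC */
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next)
        if (p->ai_family == AF_INET)
            n++;

    freeaddrinfo(res);
    return n;
}
```

A caller that wants both families but only where the machine has a configured address of that family would leave `ai_family` as AF_UNSPEC and set `AI_ADDRCONFIG` in `hints.ai_flags` instead.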
Re: Proposition: use a hashtable instead of bsearch to locate applets
> Does this provide a noticeable performance increase? Do you have benchmarks?

+1. I'm not convinced this is useful. Maybe for looping sh scripts that prefer applets (i.e. do not perform an execve() every time busybox is called), but I'm willing to bet that even in that kind of script the performance bottleneck will be something else - typically, invocation of external commands, or any kind of system call really.

For normal scripts, the cost of application lookup is basically made negligible by the cost of execve() in the first place. Plus dynamic symbol resolution if you're not using static linking.

O(log n) calls to strcmp is cheap, except in very tight loops; and a full busybox applet invocation rarely happens in a tight loop. So I wouldn't add to the code size to optimize that part without benchmarks showing a real performance improvement.

--
Laurent
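To put a number on the O(log n) figure, the following sketch counts comparator calls made by bsearch() over a sorted table of 355 names, the applet count quoted in this thread. The names are placeholders ("a000" to "a354"), not real applet names, and `build_table` / `worst_case_compares` are hypothetical helpers; the exact count may vary slightly between libc implementations.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_NAMES 355 /* applet count quoted in the thread */

static char names[NUM_NAMES][16];
static const char *sorted[NUM_NAMES];
static unsigned compare_calls;

/* bsearch() comparator that counts how often it is invoked. */
static int counting_compare(const void *key, const void *elem)
{
    compare_calls++;
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Fill the table with zero-padded names, which are already in sorted order. */
void build_table(void)
{
    for (int i = 0; i < NUM_NAMES; i++) {
        snprintf(names[i], sizeof(names[i]), "a%03d", i);
        sorted[i] = names[i];
    }
}

/* Look up every name once; return the largest number of strcmp() calls seen. */
unsigned worst_case_compares(void)
{
    unsigned worst = 0;

    for (int i = 0; i < NUM_NAMES; i++) {
        compare_calls = 0;
        if (!bsearch(names[i], sorted, NUM_NAMES, sizeof(sorted[0]), counting_compare))
            return 0; /* should not happen: every key is in the table */
        if (compare_calls > worst)
            worst = compare_calls;
    }
    return worst;
}
```

Since 2^8 = 256 < 355 < 512 = 2^9, a binary search over 355 entries needs at most 9 comparisons, which is in line with the "8-9 calls to strcmp()" mentioned in the proposal.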
[PATCH] tar: only include selinux context with -p opt
Without an answer, I added -p to store SELinux contexts (it's for Android 4.3+).

0001-tar-add-selinux-context-support-on-create.patch
Description: Binary data

0002-tar-only-include-selinux-context-with-p-opt.patch
Description: Binary data
Re: Proposition: use a hashtable instead of bsearch to locate applets
Does this provide a noticeable performance increase? Do you have benchmarks?

J

On May 29, 2014 7:17 AM, "Bartosz Golaszewski" wrote:
> Hi!
>
> Busybox uses bsearch() to locate the applet's main function in
> find_applet_by_name(). Running 'make defconfig' results in 355 applets
> being built and this in turn results in 8-9 calls to strcmp() on average
> per busybox execution.
>
> Maybe we should switch to using a simple static hashtable? The following
> patch is a simple & dirty proof of concept to show what I mean. It modifies
> applet_tables.c to generate a static hashtable containing indices of
> fields in applet_nameofs. I used a simple and fast hash function taken from
> Robert Jenkins.
>
> With this patch, on each execution and after the hash computation, the
> number of calls to strcmp() has been limited to four at most, and mostly
> it's just one or two. There are no calls to applet_name_compare() too.
>
> The patch results in bigger code, but there's room for improvement as
> we could probably get rid of some of the arrays generated by applet_tables
> and unify the hashtables used in busybox applets.
>
> If there's any interest, I can prepare a better, more memory-wise optimized
> version.
Proposition: use a hashtable instead of bsearch to locate applets
Hi!

Busybox uses bsearch() to locate the applet's main function in find_applet_by_name(). Running 'make defconfig' results in 355 applets being built and this in turn results in 8-9 calls to strcmp() on average per busybox execution.

Maybe we should switch to using a simple static hashtable? The following patch is a simple & dirty proof of concept to show what I mean. It modifies applet_tables.c to generate a static hashtable containing indices of fields in applet_nameofs. I used a simple and fast hash function taken from Robert Jenkins.

With this patch, on each execution and after the hash computation, the number of calls to strcmp() has been limited to four at most, and mostly it's just one or two. There are no calls to applet_name_compare() too.

The patch results in bigger code, but there's room for improvement as we could probably get rid of some of the arrays generated by applet_tables and unify the hashtables used in busybox applets.

If there's any interest, I can prepare a better, more memory-wise optimized version.

Best regards,
Bartosz Golaszewski

---
 applets/applet_tables.c | 58 -
 libbb/appletlib.c | 45 --
 2 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/applets/applet_tables.c b/applets/applet_tables.c
index 94b974e..9656b5c 100644
--- a/applets/applet_tables.c
+++ b/applets/applet_tables.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include

 #undef ARRAY_SIZE
 #define ARRAY_SIZE(x) ((unsigned)(sizeof(x) / sizeof((x)[0])))
@@ -42,6 +43,26 @@ enum { NUM_APPLETS = ARRAY_SIZE(applets) };

 static int offset[NUM_APPLETS];

+#define MAX_BUCKET_SIZE 16
+static int applet_hashtable[NUM_APPLETS][16];
+
+static unsigned jenkins_hash(const char *key, size_t len)
+{
+	unsigned hash, i;
+
+	for(hash = i = 0; i < len; ++i) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash;
+}
+
 static int cmp_name(const void *a, const void *b)
 {
 	const struct bb_applet *aa = a;
@@ -51,7 +72,7 @@ static int cmp_name(const void *a, const void *b)

 int main(int argc, char **argv)
 {
-	int i;
+	int i, j;
 	int ofs;
 //	unsigned MAX_APPLET_NAME_LEN = 1;

@@ -129,6 +150,41 @@ int main(int argc, char **argv)
 	}
 	printf("};\n");
 #endif
+
+	// Initialize local hashtable
+	for (i = 0; i < NUM_APPLETS; i++) {
+		for (j = 0; j < MAX_BUCKET_SIZE; j++)
+			applet_hashtable[i][j] = -1;
+	}
+
+	// For each applet - place it in appropriate bucket
+	for (i = 0; i < NUM_APPLETS; i++) {
+		unsigned ind = jenkins_hash(applets[i].name,
+				strlen(applets[i].name)) % NUM_APPLETS;
+
+		for (j = 0; j < MAX_BUCKET_SIZE; j++) {
+			if (applet_hashtable[ind][j] < 0) {
+				applet_hashtable[ind][j] = i;
+				break;
+			}
+		}
+	}
+
+	// Create a static array for each bucket
+	for (i = 0; i < NUM_APPLETS; i++) {
+		printf("const int16_t bucket%d[] = { ", i);
+		for (j = 0; applet_hashtable[i][j] >= 0; j++) {
+			printf("%d, ", applet_hashtable[i][j]);
+		}
+		printf(" -1 };\n");
+	}
+
+	// Create a static array of pointers to the buckets
+	printf("\nconst int16_t *applet_hashtab[] = {\n");
+	for (i = 0; i < NUM_APPLETS; i++)
+		printf("\tbucket%d,\n", i);
+	printf("};\n");
+
 	//printf("#endif /* SKIP_definitions */\n");
 //	printf("\n");
 //	printf("#define MAX_APPLET_NAME_LEN %u\n", MAX_APPLET_NAME_LEN);
diff --git a/libbb/appletlib.c b/libbb/appletlib.c
index f7c416e..6be536d 100644
--- a/libbb/appletlib.c
+++ b/libbb/appletlib.c
@@ -52,7 +52,6 @@

 #include "usage_compressed.h"

-
 #if ENABLE_SHOW_USAGE && !ENABLE_FEATURE_COMPRESS_USAGE
 static const char usage_messages[] ALIGN1 = UNPACKED_USAGE;
 #else
@@ -140,23 +139,55 @@ void FAST_FUNC bb_show_usage(void)
 }

 #if NUM_APPLETS > 8
-static int applet_name_compare(const void *name, const void *idx)
+static unsigned jenkins_hash(const char *key, size_t len)
 {
-	int i = (int)(ptrdiff_t)idx - 1;
-	return strcmp(name, APPLET_NAME(i));
+	unsigned hash, i;
+
+	for(hash = i = 0; i < len; ++i) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash;
 }
+
+//static int applet_name_compare(const void *name, const void *idx)
+//{
+//	int i = (int)(ptrdiff_t)idx - 1;
+/
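The lookup side of the proposal is cut off in the archived patch, so the following is only a self-contained sketch of the general idea: hash the name, pick a bucket, then strcmp() against the few candidates in that bucket. The hash uses the same mixing steps as the patch's jenkins_hash(); everything else (`demo_names`, `build_buckets`, `find_by_name`, building the buckets at run time rather than generating them at build time) is an assumption made for illustration and is not the code from the patch.

```c
#include <stddef.h>
#include <string.h>

/* One-at-a-time hash with the same mixing steps as in the patch.
 * Bytes are read as unsigned char so the result does not depend on
 * whether plain char is signed on the target. */
static unsigned jenkins_hash(const char *key, size_t len)
{
    unsigned hash = 0;

    for (size_t i = 0; i < len; i++) {
        hash += (unsigned char)key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}

/* Hypothetical stand-in for the applet name table. */
static const char *const demo_names[] = {
    "cat", "ls", "nslookup", "rm", "tar", "unzip"
};
enum {
    NUM_NAMES = sizeof(demo_names) / sizeof(demo_names[0]),
    MAX_BUCKET = 8 /* larger than NUM_NAMES, so a bucket cannot overflow */
};

/* Each bucket holds indices into demo_names and is terminated by -1. */
static int buckets[NUM_NAMES][MAX_BUCKET + 1];

void build_buckets(void)
{
    for (int b = 0; b < NUM_NAMES; b++)
        buckets[b][0] = -1;

    for (int i = 0; i < NUM_NAMES; i++) {
        unsigned b = jenkins_hash(demo_names[i], strlen(demo_names[i])) % NUM_NAMES;
        int j = 0;

        while (buckets[b][j] >= 0)
            j++;
        buckets[b][j] = i;
        buckets[b][j + 1] = -1;
    }
}

/* Return the index of name in demo_names, or -1 if it is not present. */
int find_by_name(const char *name)
{
    unsigned b = jenkins_hash(name, strlen(name)) % NUM_NAMES;

    for (int j = 0; buckets[b][j] >= 0; j++)
        if (strcmp(name, demo_names[buckets[b][j]]) == 0)
            return buckets[b][j];
    return -1;
}
```

The number of strcmp() calls per lookup equals the number of entries in the selected bucket up to the match, which is where the "four at most, and mostly one or two" figure in the proposal would come from for the real applet table.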
Re: Issues removing files with certain characters in their names.
Hi Denys !

>> For what it's worth the users with this problem were unable to
>> remove the files using wildcards. For example, one user had a
>> file named:
>>
>> På hjul.mkv
>>
>> ls P* displayed the file.
>> rm P* returned the error "can't remove 'På Hjul.mkv': No such
>> file or directory"
>
>I have hard time believing this.
>Wildcard expansion is done by the shell, not by ls and rm.
>
>IOW: ls and rm see exactly the same expanded names.
>
>Since they don't mangle the names in any way
>(e.g. no UTF-8 decoding) before feeding them to system calls,
>it should work.

I know this problem very well. It happens about every few month, that I get a ZIP packaged file from a Windows system. As the maintainer is a bit stupid, he can't manage to avoid foreign characters and I end up with unusual file names after unzip.

Most likely they can be handled with wildcards (especially ?), but sometimes it gets a bit tricky to access those files, as they contain control or unprintable characters. In that case you need to know the exact length and position to enter question marks in file name. If you do a rm -i * it fails. Not due to name mangling in Busybox, but due to name mangling in file system drivers of the kernel (especially on fat file systems - like USB sticks or flash based disks).

Therefore this is not a Busybox related problem, it's a general name handling problem when intermixing file systems and different charsets / code pages. It does not depend on a special Busybox version, I had the same problem even 10 years ago (complete different versions of kernel/lib/programs).

--
Harald