patch for replacing non-printable chars in filenames
There's a bug reported in Debian about the tty being screwed up by weird filenames, see http://bugs.debian.org/bug=242300

On the one hand, find will also do this. On the other hand, ls will replace such chars with a question mark. Upon inspection, it appears to be fairly simple to also do this in rsync (in the rwrite() function).

Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e. offer some way to turn it off?

Paul Slootman

--- log.c.orig 2004-10-04 11:51:37.0 +0200
+++ log.c 2004-11-23 17:27:29.0 +0100
@@ -180,6 +180,15 @@
 	buf[len] = 0;
 
+	if (code == FINFO) {
+		/* Replace non-printing chars in the string, most probably due to
+		 * weird filenames. Skip the first and last chars, they may be \n */
+		int i;
+		for (i=1; i<len-1; i++)
+			if (!isprint(buf[i]))
+				buf[i] = '?';
+	}
+
 	[...] >= 0) {
 		/* Pass the message to our sibling. */
 		send_msg((enum msgcode)code, buf, len);

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
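For illustration, here is a minimal standalone version of the loop in the patch above, under the hypothetical name replace_nonprint(); it is a sketch, not rsync code. It keeps the patch's rule of leaving the first and last bytes alone, but casts each byte to unsigned char before calling isprint(), because passing a negative char value (a byte >= 0x80 where plain char is signed) to isprint() is undefined behaviour.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical standalone version of the patch's loop: replace every byte
 * in buf[1] .. buf[len-2] that isprint() rejects with '?'. The first and
 * last bytes are left alone because they may be '\n'. */
static void replace_nonprint(char *buf, size_t len)
{
    size_t i;

    /* i + 1 < len skips the last byte and is also safe when len < 2. */
    for (i = 1; i + 1 < len; i++) {
        /* Cast first: isprint() needs an unsigned char value or EOF. */
        if (!isprint((unsigned char)buf[i]))
            buf[i] = '?';
    }
}
```

In the default "C" locale, a message such as "\nab\033[2Jc\n" becomes "\nab?[2Jc\n": the escape byte that would have started a terminal control sequence is neutralised, while the leading and trailing newlines survive.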
Re: patch for replacing non-printable chars in filenames
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> Here's a patch. Opinions?

I think that a better place to munge the name would be in the safe_fname() routine in util.c (which already munges newline characters into question marks). The reason I didn't change any other characters was because I feared that it would mangle foreign filenames that use high-bit characters. I'd want some feedback from such users before accepting such a patch.

..wayne..
Re: patch for replacing non-printable chars in filenames
Hi,

On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> There's a bug reported in Debian about the tty being screwed up by weird
> filenames, see http://bugs.debian.org/bug=242300
>
> On the one hand, find will also do this. On the other hand, ls will
> replace such chars with a question mark. Upon inspection, it appears to
> be fairly simple to also do this in rsync (in the rwrite() function).

1. find's output is mostly meant as another program's input, not for a tty.
2. ls does --hide-control-chars by default only if isatty(STDOUT_FILENO).

> Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e.
> offer some way to turn it off?

I'd make it like ls, i.e. filter only when the output descriptor is a tty; I'd also add an option that enforces --hide-control-chars for non-tty output as well.

-- 
ldv
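A sketch of the ls-like behaviour suggested here, assuming a hypothetical force_hide_control_chars flag that stands in for the proposed option: filtering is enabled when the output descriptor is a terminal, or unconditionally when forced. should_hide_control_chars() and filter_name() are illustrative names, not rsync functions.

```c
#include <ctype.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical switch, e.g. set by a command-line option that forces
 * filtering even when the output is not a terminal. */
static int force_hide_control_chars = 0;

/* Decide whether to filter the way ls does: only when fd is a terminal,
 * unless filtering has been forced. */
static int should_hide_control_chars(int fd)
{
    return force_hide_control_chars || isatty(fd);
}

/* Copy name into out (at most outsz-1 bytes plus a NUL), replacing bytes
 * that isprint() rejects with '?' only when hide is non-zero. */
static void filter_name(char *out, size_t outsz, const char *name, int hide)
{
    size_t n = 0;

    if (outsz == 0)
        return;
    for (; *name && n + 1 < outsz; name++) {
        unsigned char c = (unsigned char)*name;
        out[n++] = (hide && !isprint(c)) ? '?' : (char)c;
    }
    out[n] = '\0';
}
```

A caller would compute the flag once and pass it along, e.g. filter_name(buf, sizeof buf, fname, should_hide_control_chars(STDOUT_FILENO)), so that output piped into another program keeps the raw bytes.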
Re: patch for replacing non-printable chars in filenames
On Tue 23 Nov 2004, Wayne Davison wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > Here's a patch. Opinions?
>
> I think that a better place to munge the name would be in the
> safe_fname() routine in util.c (which already munges newline
> characters into question marks). The reason I didn't change
> any other characters was because I feared that it would mangle
> foreign filenames that use high-bit characters. I'd want some
> feedback from such users before accepting such a patch.

Not all filenames that are printed are passed through safe_fname() AFAICS, e.g. a random piece of code from rsync.c:166:

	if (verbose > 2) {
		if (change_uid) {
			rprintf(FINFO,
				"set uid of %s from %ld to %ld\n",
				fname, (long)st->st_uid, (long)file->uid);
		}
		if (change_gid) {
			rprintf(FINFO,
				"set gid of %s from %ld to %ld\n",
				fname, (long)st->st_gid, (long)file->gid);
		}
	}

Note that isprint() will take into account the locale in effect, i.e. when using the fr_FR locale, characters like é should be recognized as printable. At least, under Linux that would seem to be the case; from the NOTE section of isprint's manpage:

    The details of what characters belong into which class depend on the
    current locale.
    [...]

setlocale(LC_CTYPE, NULL) probably needs to be called during program startup, however...

The bug reporter (a Frenchman, I believe) was agreeable to all non-ASCII chars being replaced, however; that's preferable to having his tty messed up now and again.

Making it depend on whether stdout is a tty may also be useful.

Paul Slootman
Re: patch for replacing non-printable chars in filenames
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> +		/* Replace non-printing chars in the string, most probably due to
> +		 * weird filenames. Skip the first and last chars, they may be \n */
> +		int i;
> +		for (i=1; i<len-1; i++)
> +			if (!isprint(buf[i]))
> +				buf[i] = '?';

Is looping over strings a good idea in times of UTF-8?

cu,
Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260
Re: patch for replacing non-printable chars in filenames
On Fri 26 Nov 2004, Stefan Nehlsen wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > +		/* Replace non-printing chars in the string, most probably due to
> > +		 * weird filenames. Skip the first and last chars, they may be \n */
> > +		int i;
> > +		for (i=1; i<len-1; i++)
> > +			if (!isprint(buf[i]))
> > +				buf[i] = '?';
>
> Is looping over strings a good idea in times of UTF-8?

It is if you don't know the strings are in UTF-8, and you want to prevent "garbage" chars reaching the tty (the whole point of this exercise :-)

Paul Slootman
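To make the UTF-8 concern concrete: a byte-wise loop judges each byte of a multibyte character on its own, so the two bytes of a UTF-8 "é" are tested separately. One possible alternative, sketched here under the assumption that the program has already selected the user's locale with setlocale(LC_CTYPE, ""), decodes complete characters with mbrtowc() and tests them with iswprint(). filter_mb_name() is a hypothetical name; undecodable bytes become one '?' each, and each non-printable character becomes a single '?'.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical multibyte-aware filter. out must have room for
 * strlen(name) + 1 bytes; the output is never longer than the input. */
static void filter_mb_name(char *out, const char *name)
{
    size_t left = strlen(name);
    mbstate_t st;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, name, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            /* Invalid or incomplete sequence (n == 0 cannot really occur,
             * since no NUL lies within the first `left` bytes): replace
             * one byte and restart decoding from a clean state. */
            *out++ = '?';
            name++;
            left--;
            memset(&st, 0, sizeof st);
            continue;
        }
        if (iswprint((wint_t)wc)) {
            memcpy(out, name, n);   /* keep the character's original bytes */
            out += n;
        } else {
            *out++ = '?';           /* one '?' per non-printable character */
        }
        name += n;
        left -= n;
    }
    *out = '\0';
}
```

In the "C" locale this behaves much like the byte-wise loop; the difference only shows once a multibyte locale is active, where a printable "é" is kept whole instead of being judged byte by byte.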
Re: patch for replacing non-printable chars in filenames
On Thu, Nov 25, 2004 at 11:27:58AM +0100, Paul Slootman wrote:
> Not all filenames that are printed are passed through safe_fname()
> AFAICS, e.g. a random piece of code from rsync.c:166 :

I looked at eliminating safe_fname() in favor of putting the filtering into rwrite(), and there are a bunch of places that expect to be able to output tabs and newlines as part of the string. So I decided to try to find all the places that didn't use either safe_fname() or full_fname() (which calls safe_fname()) and fix them.

I've also checked in an improvement to safe_fname() that makes it use isprint() (instead of just looking for newlines).

..wayne..
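Filtering each filename separately, rather than the whole message in rwrite(), leaves the tabs and newlines in the surrounding message text intact. Below is a minimal sketch of that shape, not rsync's code: like the MAX_SAFE_NAMES buffers in rsync's safe_fname(), it hands back one of several rotating static buffers, so a few filtered names can appear as arguments of a single rprintf()-style call. The name safe_name_sketch() and the buffer size are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_SAFE_NAMES 4     /* number of rotating result buffers */
#define SAFE_NAME_SIZE 1024  /* hypothetical per-buffer size */

/* Return a copy of fname in one of MAX_SAFE_NAMES static buffers, with
 * every byte that isprint() rejects replaced by '?'. Over-long names are
 * truncated. Not thread-safe: the (MAX_SAFE_NAMES + 1)th call reuses the
 * buffer returned by the first. */
static const char *safe_name_sketch(const char *fname)
{
    static char fbuf[MAX_SAFE_NAMES][SAFE_NAME_SIZE];
    static int ndx = 0;
    size_t limit = SAFE_NAME_SIZE - 1;   /* leave room for the NUL */
    char *t;

    ndx = (ndx + 1) % MAX_SAFE_NAMES;
    for (t = fbuf[ndx]; *fname && limit > 0; fname++, limit--)
        *t++ = isprint((unsigned char)*fname) ? *fname : '?';
    *t = '\0';
    return fbuf[ndx];
}
```

Usage would look like rprintf(FINFO, "rename %s -> %s\n", safe_name_sketch(a), safe_name_sketch(b)): both results stay valid for that call because they live in different buffers, while the "\n" in the format string is untouched.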
Re: patch for replacing non-printable chars in filenames
Hi.

Sorry about picking up a rather ancient thread, but this didn't bite me until now (when I upgraded to 2.6.4).

Wayne wrote:
> I've also checked
> in an improvement to safe_fname() that makes it use isprint() (instead
> of just looking for newlines).

Is there a chance that this "feature" will become selectable? I have some scripts that rely on a specially formatted log (made with --log-format) to do some post-processing after (or during) the transfer, and these now fail, since several files (whose names contain non-ASCII chars) might be squashed into the same string.

Alternatively, how about escaping the chars instead of just munging them? I.e. output files like "two-line\x0afile name" or "P\xe5ske" (Norwegian for "Easter", for the curious ;), or something like that?

Vidar
Re: patch for replacing non-printable chars in filenames
Oops, I should have added that for isprint() (in safe_fname()) to be locale-aware at all, you need to add a call to setlocale(LC_CTYPE, "").

Vidar
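A small sketch of the setlocale() point: a C program starts in the "C" locale, setlocale(LC_CTYPE, "") switches character classification to the locale named by the LC_ALL, LC_CTYPE or LANG environment variables, and setlocale(LC_CTYPE, NULL) only reports the current setting without changing anything. The helper names are hypothetical.

```c
#include <ctype.h>
#include <locale.h>

/* Nonzero if byte c is printable in the LC_CTYPE locale now in effect. */
static int printable_in_current_locale(unsigned char c)
{
    return isprint(c) != 0;
}

/* Adopt the environment's locale for character classification. Returns 0
 * if that locale is not available; the previous setting then stays. */
static int use_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Only report the current LC_CTYPE name; this call changes nothing. */
static const char *current_ctype(void)
{
    return setlocale(LC_CTYPE, NULL);
}
```

Calling use_environment_ctype() once at startup is what makes later isprint() calls locale-aware; whether a particular byte >= 0x80 then counts as printable depends on the character set of the selected locale.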
Re: patch for replacing non-printable chars in filenames
On Thu, Mar 31, 2005 at 01:17:16PM +0200, Vidar Madsen wrote:
> Alternatively, how about escaping the chars instead of just munging
> them? I.e. output files like "two-line\x0afile name" or "P\xe5ske"
> (Norwegian for "Easter", for the curious ;), or something like that?

I'd be fine with that. It would mean doubling "\" characters as well, though. Anyone else have an opinion on this?

Appended is a patch that does the suggested escaping.

..wayne..

--- util.c 30 Mar 2005 19:34:20 - 1.181
+++ util.c 31 Mar 2005 16:09:16 -
@@ -877,11 +877,12 @@ int pop_dir(char *dir)
 	return 1;
 }
 
-/* Return the filename, turning any non-printable characters into '?'s.
- * This ensures that outputting it on a line of its own cannot generate an
- * empty line. This function can return only MAX_SAFE_NAMES values at a
- * time! The returned value can be longer than MAXPATHLEN (because we
- * may be trying to output an error about a too-long filename)! */
+/* Return the filename, turning any non-printable characters into escaped
+ * characters (e.g. \n -> \x0a, \ -> \\). This ensures that outputting it
+ * cannot generate an empty line nor corrupt the screen. This function can
+ * return only MAX_SAFE_NAMES values at a time! The returned value can be
+ * longer than MAXPATHLEN (because we may be trying to output an error about
+ * a too-long filename)!
+ */
 char *safe_fname(const char *fname)
 {
 #define MAX_SAFE_NAMES 4
@@ -891,13 +892,21 @@ char *safe_fname(const char *fname)
 	char *t;
 
 	ndx = (ndx + 1) % MAX_SAFE_NAMES;
-	for (t = fbuf[ndx]; *fname; fname++) {
-		if (!isprint(*(uchar*)fname))
-			*t++ = '?';
-		else
+	for (t = fbuf[ndx]; *fname && limit; fname++) {
+		if (*fname == '\\') {
+			if ((limit -= 2) < 0)
+				break;
+			*t++ = '\\';
+			*t++ = '\\';
+		} else if (!isprint(*(uchar*)fname)) {
+			if ((limit -= 3) < 0)
+				break;
+			sprintf(t, "\\%02x", *(uchar*)fname);
+			t += 3;
+		} else {
+			limit--;
 			*t++ = *fname;
-		if (--limit == 0)
-			break;
+		}
 	}
 	*t = '\0';
Re: patch for replacing non-printable chars in filenames
On Thu, Mar 31, 2005 at 08:13:52AM -0800, Wayne Davison wrote:
> Appended is a patch that does the suggested escaping.

Actually, that patch didn't put the suggested 'x' in after the '\'. After trying this a bit, I now think it would read better to use 3-digit octal escaping. That would turn a \n into \012 instead of \x0a, for instance.

The changes to the prior patch are as easy as increasing the '3's to '4's, changing the sprintf() format to "\\%03o", and fixing the function comment.

..wayne..
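A standalone sketch of the octal variant described here, under the hypothetical name escape_fname(), writing into a caller-supplied buffer instead of rsync's rotating static buffers. Backslashes are doubled and every byte that isprint() rejects becomes a four-character \ooo escape. Where the patch subtracts from a signed limit and then tests for a negative result, this version checks the remaining room before writing, so it can use an unsigned size_t and never emits half an escape.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Escape fname into out (outsz must be > 0): '\\' becomes two backslashes,
 * any byte isprint() rejects becomes a 3-digit octal escape such as "\012",
 * everything else is copied. If an item does not fit whole, copying stops,
 * so the result is never cut in the middle of an escape. */
static void escape_fname(char *out, size_t outsz, const char *fname)
{
    size_t limit = outsz - 1;   /* room left, excluding the final NUL */
    char *t = out;

    for (; *fname && limit > 0; fname++) {
        unsigned char c = (unsigned char)*fname;

        if (c == '\\') {
            if (limit < 2)
                break;
            *t++ = '\\';
            *t++ = '\\';
            limit -= 2;
        } else if (!isprint(c)) {
            if (limit < 4)
                break;
            /* At least 5 bytes remain here: 4 characters plus the NUL. */
            snprintf(t, 5, "\\%03o", (unsigned int)c);
            t += 4;
            limit -= 4;
        } else {
            *t++ = (char)c;
            limit--;
        }
    }
    *t = '\0';
}
```

For example, a name containing a newline, two-line\nfile, comes out as two-line\012file, and a name containing a backslash keeps it unambiguous by doubling it.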
Re: patch for replacing non-printable chars in filenames
Hi.

> After trying this a bit, I now think it would read better to use 3-digit
> octal escaping.

I would be perfectly fine with that. And octal is probably more in line with how escaping is traditionally done. As long as I can process the files in the log, I'm all for it.

Btw, will this change make it into a later rsync version (2.4.7?)? I would rather not depend on a custom-patched rsync, but if it becomes a standard feature at some point it feels less hacky. ;)

Anyway, thanks. :)

Vidar
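For the kind of log post-processing Vidar describes, a reader has to undo the escaping. A minimal in-place decoder, assuming exactly the format discussed in this thread (doubled backslashes and three-digit octal \ooo escapes), might look like this; unescape_fname() is a hypothetical helper, not part of rsync.

```c
#include <stddef.h>

/* Decode s in place: two backslashes -> one backslash, and a backslash
 * followed by exactly three octal digits (000-377) -> the byte they encode.
 * Anything else is copied unchanged. The result is never longer than the
 * input. (Filenames cannot contain NUL, so \000 should not appear.) */
static void unescape_fname(char *s)
{
    char *r = s, *w = s;

    while (*r) {
        if (r[0] == '\\' && r[1] == '\\') {
            *w++ = '\\';
            r += 2;
        } else if (r[0] == '\\'
                   && r[1] >= '0' && r[1] <= '3'
                   && r[2] >= '0' && r[2] <= '7'
                   && r[3] >= '0' && r[3] <= '7') {
            /* Short-circuit evaluation never reads past a NUL. */
            *w++ = (char)(((r[1] - '0') << 6) | ((r[2] - '0') << 3) | (r[3] - '0'));
            r += 4;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}
```

Checking for a doubled backslash first is what keeps a literal backslash followed by digits in a filename (logged as two backslashes plus the digits) from being misread as an octal escape.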
Re: patch for replacing non-printable chars in filenames
On Fri, Apr 01, 2005 at 10:26:18AM +0200, Vidar Madsen wrote:
> Btw, will this change make it into a later rsync version ?

Yes, I've just committed it for 2.6.5. Now I need to add configure checking for setlocale() and locale.h.

..wayne..
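One common way to wire up the configure side, assumed here rather than taken from rsync's configure.in: autoconf's AC_CHECK_HEADERS([locale.h]) and AC_CHECK_FUNCS([setlocale]) define HAVE_LOCALE_H and HAVE_SETLOCALE in the generated config header when the checks succeed, and the C code calls setlocale() only when both are present. init_ctype_locale() is a hypothetical name.

```c
/* HAVE_LOCALE_H and HAVE_SETLOCALE would come from config.h in an
 * autoconf build; without them this file still compiles. */
#ifdef HAVE_LOCALE_H
#include <locale.h>
#endif

/* Call once at startup, before any filename is passed to isprint().
 * Returns 1 if the environment's LC_CTYPE locale was applied, 0 if it was
 * unavailable or setlocale() support is missing, in which case isprint()
 * keeps its default "C" locale behaviour. */
static int init_ctype_locale(void)
{
#if defined(HAVE_LOCALE_H) && defined(HAVE_SETLOCALE)
    return setlocale(LC_CTYPE, "") != NULL;
#else
    return 0;
#endif
}
```

Guarding both the include and the call keeps systems without locale support building, at the cost of only ASCII being treated as printable there.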