patch for replacing non-printable chars in filenames

2004-11-23 Thread Paul Slootman
There's a bug reported in Debian about the tty being screwed up by weird
filenames; see http://bugs.debian.org/bug=242300

On the one hand, find will also do this. On the other hand, ls will
replace such chars with a question mark. Upon inspection, it appears to
be fairly simple to also do this in rsync (in the rwrite() function).

Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e.
offer some way to turn it off?

Paul Slootman

--- log.c.orig  2004-10-04 11:51:37.0 +0200
+++ log.c   2004-11-23 17:27:29.0 +0100
@@ -180,6 +180,15 @@
 
     buf[len] = 0;
 
+    if (code == FINFO) {
+        /* Replace non-printing chars in the string, most probably due to
+         * weird filenames. Skip the first and last chars, they may be \n */
+        int i;
+        for (i=1; i<len-1; i++)
+            if (!isprint(buf[i]))
+                buf[i] = '?';
+    }
+
     if (am_server && msg_fd_out >= 0) {
         /* Pass the message to our sibling. */
         send_msg((enum msgcode)code, buf, len);
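A minimal standalone sketch of what the loop in this patch does. The function name replace_nonprinting() is hypothetical (the patch does this inline in rwrite()), and one change is deliberate: each byte is cast to unsigned char before isprint(), because passing a negative char (any high-bit byte on platforms where char is signed) to isprint() is undefined behaviour.

```c
#include <ctype.h>

/* Sketch of the proposed rwrite() change (hypothetical function name):
 * overwrite every byte of buf that isprint() rejects with '?', leaving
 * buf[0] and buf[len-1] untouched because the patch notes they may be
 * '\n'. */
void replace_nonprinting(char *buf, int len)
{
    int i;

    for (i = 1; i < len - 1; i++) {
        /* Cast first: passing a negative char to isprint() is undefined,
         * and high-bit bytes are negative where char is signed. */
        if (!isprint((unsigned char)buf[i]))
            buf[i] = '?';
    }
}
```

With this in place, a filename containing the clear-screen sequence ESC [ 2 J reaches the terminal as "?[2J", which the terminal prints instead of obeying.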
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: patch for replacing non-printable chars in filenames

2004-11-23 Thread Wayne Davison
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> Here's a patch. Opinions?

I think that a better place to munge the name would be in the
safe_fname() routine in util.c (which already munges newline
characters into question marks).  The reason I didn't change
any other characters was because I feared that it would mangle
foreign filenames that use high-bit characters.  I'd want some
feedback from such users before accepting such a patch.

..wayne..


Re: patch for replacing non-printable chars in filenames

2004-11-23 Thread Dmitry V. Levin
Hi,

On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> There's a bug reported in Debian about the tty being screwed up by weird
> filenames; see http://bugs.debian.org/bug=242300
> 
> On the one hand, find will also do this. On the other hand, ls will
> replace such chars with a question mark. Upon inspection, it appears to
> be fairly simple to also do this in rsync (in the rwrite() function).

1. find's output is mostly meant as another program's input, not for a tty.
2. ls uses --hide-control-chars by default only if isatty(STDOUT_FILENO) is true.

> Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e.
> offer some way to turn it off?

I'd make it behave like ls, i.e. filter only when the descriptor is a tty;
I'd also add some option to enforce --hide-control-chars for non-tty output as well.
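A sketch of the policy described above: filter when the output descriptor is a terminal, or unconditionally when the user asks for it. The function name and its force parameter are hypothetical stand-ins for the proposed option; nothing in this thread defines such an rsync option.

```c
#include <stdbool.h>
#include <unistd.h>

/* Decide whether to hide control characters, following ls's default:
 * only when fd refers to a terminal, unless `force` is set (a
 * hypothetical stand-in for an option that enables it for non-tty
 * output too). */
bool want_hide_control_chars(int fd, bool force)
{
    if (force)
        return true;
    return isatty(fd) == 1;  /* isatty() returns 1 for a tty, else 0 */
}
```

A caller would typically pass STDOUT_FILENO as fd, so that output piped into another program is left untouched unless forced.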


-- 
ldv



Re: patch for replacing non-printable chars in filenames

2004-11-25 Thread Paul Slootman
On Tue 23 Nov 2004, Wayne Davison wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > Here's a patch. Opinions?
> 
> I think that a better place to munge the name would be in the
> safe_fname() routine in util.c (which already munges newline
> characters into question marks).  The reason I didn't change
> any other characters was because I feared that it would mangle
> foreign filenames that use high-bit characters.  I'd want some
> feedback from such users before accepting such a patch.

Not all filenames that are printed are passed through safe_fname()
AFAICS, e.g. a random piece of code from rsync.c:166 :

if (verbose > 2) {
    if (change_uid) {
        rprintf(FINFO,
            "set uid of %s from %ld to %ld\n",
            fname, (long)st->st_uid, (long)file->uid);
    }
    if (change_gid) {
        rprintf(FINFO,
            "set gid of %s from %ld to %ld\n",
            fname, (long)st->st_gid, (long)file->gid);
    }
}

Note that isprint() will take into account the locale in effect, i.e.
when using the fr_FR locale, characters like é should be recognized as
printable. At least, under Linux that would seem to be the case; from
the NOTE section of isprint's manpage:

The  details of what characters belong into which class depend on
the current locale. [...]

setlocale(LC_CTYPE, "") probably needs to be called during program
startup, however...
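A small sketch of the locale point, assuming glibc-like behaviour: every C program starts in the "C" locale, where a Latin-1 byte such as 0xE9 (é) is not printable, and isprint() only follows the user's locale after setlocale(LC_CTYPE, "") adopts it from the environment. Calling setlocale(LC_CTYPE, NULL) merely returns the name of the current setting without changing it. Both helper names are hypothetical.

```c
#include <ctype.h>
#include <locale.h>

/* Count the bytes of s that isprint() rejects under the current LC_CTYPE. */
int count_nonprintable(const char *s)
{
    int n = 0;

    for (; *s; s++)
        if (!isprint((unsigned char)*s))
            n++;
    return n;
}

/* Adopt the LC_CTYPE named by the environment (LC_ALL, LC_CTYPE, LANG).
 * Returns the new locale name, or NULL if it could not be set.
 * Note: setlocale(LC_CTYPE, NULL) would only *query* the setting. */
const char *adopt_user_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Under a Latin-1 French locale such as fr_FR.ISO-8859-1, count_nonprintable("caf\xe9") would be expected to drop from 1 to 0 once adopt_user_ctype() has succeeded; under a UTF-8 locale, é is two bytes that isprint() still judges one at a time.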

The bug reporter (a Frenchman, I believe) was agreeable to all non-ASCII
chars being replaced, however; that's preferable to having his tty messed
up now and again.

Making it depend on whether stdout is a tty may also be useful.


Paul Slootman


Re: patch for replacing non-printable chars in filenames

2004-11-26 Thread Stefan Nehlsen
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> +/* Replace non-printing chars in the string, most probably due to
> + * weird filenames. Skip the first and last chars, they may be \n */
> +int i;
> +for (i=1; i<len-1; i++)
> +if (!isprint(buf[i]))
> +buf[i] = '?';

Is looping over strings a good idea in times of UTF-8?


cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260


Re: patch for replacing non-printable chars in filenames

2004-11-26 Thread Paul Slootman
On Fri 26 Nov 2004, Stefan Nehlsen wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > +/* Replace non-printing chars in the string, most probably due to
> > + * weird filenames. Skip the first and last chars, they may be \n */
> > +int i;
> > +for (i=1; i<len-1; i++)
> > +if (!isprint(buf[i]))
> > +buf[i] = '?';
> 
> Is looping over strings a good idea in times of UTF-8?

It is if you don't know the strings are in UTF-8, and you want to
prevent "garbage" chars reaching the tty (the whole point of this
exercise :-)


Paul Slootman


Re: patch for replacing non-printable chars in filenames

2005-02-07 Thread Wayne Davison
On Thu, Nov 25, 2004 at 11:27:58AM +0100, Paul Slootman wrote:
> Not all filenames that are printed are passed through safe_fname()
> AFAICS, e.g. a random piece of code from rsync.c:166 :

I looked at eliminating safe_fname() in favor of putting the filtering
into rwrite(), and there are a bunch of places that expect to be able
to output tabs and newlines as a part of the string.  So, I decided to
try to find all the places that didn't use either safe_fname() or
full_fname() (which calls safe_fname()) and fix them.  I've also checked
in an improvement to safe_fname() that makes it use isprint() (instead
of just looking for newlines).
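To illustrate why the filtering is applied to the filename argument rather than to the whole formatted line in rwrite(): the format string itself carries characters, such as the trailing '\n', that must survive. This sketch is modelled on the rsync.c "set uid" snippet quoted earlier in the thread; sanitize_name() is a hypothetical, simplified stand-in for safe_fname() that writes into a caller-supplied buffer instead of rsync's rotating static buffers, and format_uid_msg() is likewise hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for safe_fname(): copy fname into
 * out, replacing every byte that isprint() rejects with '?'. */
char *sanitize_name(const char *fname, char *out, size_t outsize)
{
    size_t i = 0;

    if (outsize == 0)
        return out;
    for (; *fname && i < outsize - 1; fname++)
        out[i++] = isprint((unsigned char)*fname) ? *fname : '?';
    out[i] = '\0';
    return out;
}

/* Build a message shaped like the rsync.c "set uid" line.  Only the
 * filename is filtered, so the format string's own '\n' (and any tabs
 * or newlines a caller formats in deliberately) survive. */
int format_uid_msg(char *msg, size_t msgsize, const char *fname,
                   long from_uid, long to_uid)
{
    char name[256];

    return snprintf(msg, msgsize, "set uid of %s from %ld to %ld\n",
                    sanitize_name(fname, name, sizeof name),
                    from_uid, to_uid);
}
```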

..wayne..


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Vidar Madsen
Hi.

Sorry for picking up a rather ancient thread, but this didn't bite
me until now (when I upgraded to 2.6.4).

Wayne wrote:
> I've also checked
> in an improvement to safe_fname() that makes it use isprint() (instead
> of just looking for newlines).

Is there a chance that this "feature" will become selectable? I have
some scripts that rely on a specially formatted log (made with
--log-format) to do some post-processing after (or during) the
transfer, and these now fail, since different filenames containing
non-ASCII chars can be squashed into the same '?'-munged string.

Alternatively, how about escaping the chars instead of just munging
them? I.e. output filenames like "two-line\x0afile name" or "P\xe5ske"
(Norwegian for "Easter", for the curious ;) or something like that?

Vidar


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Vidar Madsen
Oops, I should have added that for isprint() (in safe_fname()) to be
locale-aware at all, you need to add a call to setlocale(LC_CTYPE,
"").

Vidar


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Wayne Davison
On Thu, Mar 31, 2005 at 01:17:16PM +0200, Vidar Madsen wrote:
> Alternatively, how about escaping the chars instead of just munging
> them? I.e. output filenames like "two-line\x0afile name" or "P\xe5ske"
> (Norwegian for "Easter", for the curious ;) or something like that?

I'd be fine with that.  It would mean doubling "\" characters as well,
though.  Anyone else have an opinion on this?

Appended is a patch that does the suggested escaping.

..wayne..
--- util.c  30 Mar 2005 19:34:20 -  1.181
+++ util.c  31 Mar 2005 16:09:16 -
@@ -877,11 +877,12 @@ int pop_dir(char *dir)
     return 1;
 }
 
-/* Return the filename, turning any non-printable characters into '?'s.
- * This ensures that outputting it on a line of its own cannot generate an
- * empty line.  This function can return only MAX_SAFE_NAMES values at a
- * time!  The returned value can be longer than MAXPATHLEN (because we
- * may be trying to output an error about a too-long filename)! */
+/* Return the filename, turning any non-printable characters into escaped
+ * characters (e.g. \n -> \x0a, \ -> \\).  This ensures that outputting it
+ * cannot generate an empty line nor corrupt the screen.  This function can
+ * return only MAX_SAFE_NAMES values at a time!  The returned value can be
+ * longer than MAXPATHLEN (because we may be trying to output an error about
+ * a too-long filename)! */
 char *safe_fname(const char *fname)
 {
 #define MAX_SAFE_NAMES 4
@@ -891,13 +892,21 @@ char *safe_fname(const char *fname)
     char *t;
 
     ndx = (ndx + 1) % MAX_SAFE_NAMES;
-    for (t = fbuf[ndx]; *fname; fname++) {
-        if (!isprint(*(uchar*)fname))
-            *t++ = '?';
-        else
+    for (t = fbuf[ndx]; *fname && limit; fname++) {
+        if (*fname == '\\') {
+            if ((limit -= 2) < 0)
+                break;
+            *t++ = '\\';
+            *t++ = '\\';
+        } else if (!isprint(*(uchar*)fname)) {
+            if ((limit -= 3) < 0)
+                break;
+            sprintf(t, "\\%02x", *(uchar*)fname);
+            t += 3;
+        } else {
+            limit--;
             *t++ = *fname;
-        if (--limit == 0)
-            break;
+        }
     }
     *t = '\0';
 

Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Wayne Davison
On Thu, Mar 31, 2005 at 08:13:52AM -0800, Wayne Davison wrote:
> Appended is a patch that does the suggested escaping.

Actually, that patch didn't put the suggested 'x' in after the '\'.
After trying this a bit, I now think it would read better to use 3-digit
octal escaping.  That would turn a \n into \012 instead of \x0a, for
instance.  The changes to the prior patch are as easy as increasing the
'3's to '4's, changing the sprintf() format to "\\%03o", and fixing the
function comment.
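A self-contained sketch of the octal variant described above: backslashes are doubled and every byte rejected by isprint() becomes a 4-character "\ooo" escape. The name escape_fname() is hypothetical, and unlike safe_fname() it writes into a caller-supplied buffer rather than one of MAX_SAFE_NAMES rotating static buffers; like the patch, it stops rather than emit a partial escape when the output would overflow.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Copy fname into out (size outsize), escaping '\' as "\\" and any byte
 * that isprint() rejects as a 3-digit octal escape "\ooo".  Output stops
 * early, without splitting an escape, if out would overflow. */
char *escape_fname(const char *fname, char *out, size_t outsize)
{
    char *t = out;
    size_t room;

    if (outsize == 0)
        return out;
    room = outsize - 1;  /* reserve one byte for the final NUL */
    for (; *fname; fname++) {
        unsigned char c = (unsigned char)*fname;

        if (c == '\\') {
            if (room < 2)
                break;
            *t++ = '\\';
            *t++ = '\\';
            room -= 2;
        } else if (!isprint(c)) {
            if (room < 4)
                break;
            /* Writes 4 chars plus a NUL; the NUL fits in the reserved byte. */
            snprintf(t, 5, "\\%03o", (unsigned)c);
            t += 4;
            room -= 4;
        } else {
            if (room < 1)
                break;
            *t++ = (char)c;
            room--;
        }
    }
    *t = '\0';
    return out;
}
```

In the "C" locale this turns a newline into \012 and the Latin-1 byte 0xE5 into \345, matching the examples in the thread.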

..wayne..


Re: patch for replacing non-printable chars in filenames

2005-04-01 Thread Vidar Madsen
Hi.

> After trying this a bit, I now think it would read better to use 3-digit
> octal escaping.

I would be perfectly fine with that. And octal is probably more in
line with how escaping is traditionally done. As long as I can process
the filenames in the log, I'm all for it.
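For post-processing a log like this, the octal escaping is reversible. A minimal in-place decoder sketch, assuming the format described above ("\\" for a backslash, and a backslash plus exactly three octal digits for any other escaped byte); unescape_fname() is a hypothetical name, not part of rsync. Any other backslash sequence is copied through unchanged.

```c
#include <stddef.h>

/* Reverse the escaping: "\\" -> '\', "\ooo" (three octal digits,
 * 000-377) -> that byte.  Anything else is copied as-is.  Decodes in
 * place; the result is never longer than the input. */
char *unescape_fname(char *s)
{
    char *src = s, *dst = s;

    while (*src) {
        if (src[0] == '\\' && src[1] == '\\') {
            *dst++ = '\\';
            src += 2;
        } else if (src[0] == '\\'
                   && src[1] >= '0' && src[1] <= '3'
                   && src[2] >= '0' && src[2] <= '7'
                   && src[3] >= '0' && src[3] <= '7') {
            *dst++ = (char)(((src[1] - '0') << 6)
                            | ((src[2] - '0') << 3)
                            | (src[3] - '0'));
            src += 4;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```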

Btw, will this change make it into a later rsync version (2.6.5?)? I
would rather not depend on a custom-patched rsync, but if it
will become a standard feature at some point it feels less hacky. ;)

Anyway, thanks. :)

Vidar


Re: patch for replacing non-printable chars in filenames

2005-04-01 Thread Wayne Davison
On Fri, Apr 01, 2005 at 10:26:18AM +0200, Vidar Madsen wrote:
> Btw, will this change make it into a later rsync version ?

Yes, I've just committed it for 2.6.5.  Now I need to add configure
checking for setlocale() and locale.h.

..wayne..