Re: rsync feature suggestion

2003-01-07 Thread Justin Banks
Dave Dykstra wrote
> What Hadmut wants is the oft-requested and discussed "files-from" option
> that I once offered to write but haven't been able to get to.  Andy Schor
> in http://lists.samba.org/pipermail/rsync/2001-November/005272.html posted
> a patch for something similar but it only worked when the sender was on the
> local machine and not when it was remote (among other issues).  I don't
> believe you've posted your patch, Justin; does your "files-from" directly
> contain the list of files to send and skip the recursive traversal?  If so,
> I don't see the point of having rsync have the extra regex options you
> mention because those could all be done by external greps that pre-process
> the file list.

Mine also falls victim to the "sender on the local machine" problem, although
that could be easily rectified. By default, my files-from doesn't do any
recursive processing, but you can control this on a per-file basis within
the list of filenames. Here's an example:

/a/file
/another/file
/some/directory
/some/other/directory 1

The trailing 1 marks an entry for recursion, so this would recurse on
/some/other/directory but nothing else. I also made the list of files
base-64 encoded to avoid some obvious problems, and it works with all
filenames (ASCII, UTF-N).

-justinb

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: rsync feature suggestion

2003-01-07 Thread Dave Dykstra
What Hadmut wants is the oft-requested and discussed "files-from" option
that I once offered to write but haven't been able to get to.  Andy Schor
in http://lists.samba.org/pipermail/rsync/2001-November/005272.html posted
a patch for something similar but it only worked when the sender was on the
local machine and not when it was remote (among other issues).  I don't
believe you've posted your patch, Justin; does your "files-from" directly
contain the list of files to send and skip the recursive traversal?  If so,
I don't see the point of having rsync have the extra regex options you
mention because those could all be done by external greps that pre-process
the file list.

- Dave


On Fri, Jan 03, 2003 at 11:51:05AM -0600, Justin Banks wrote:
> Max Bowsher wrote
> > Hadmut Danisch wrote:
> > > I'd like to suggest a new feature to rsync.
> > 
> > > I am mirroring a debian archive, but unfortunately,
> > > debian mixes all files of several distributions in a
> > > subtree /pool. There is no way to select only the files
> > > of a certain distribution through a simple exclude/include
> > > expression.
> > >
> > > There is a tool called debmirror, which first downloads
> > > the distribution index files, extracts all the filenames/paths
> > > of the files needed and then calls rsync for every single file.
> > > > That's certainly not useful, especially since rsync shows the
> > > > server's motd for every single file.
> > 
> > I was about to suggest:
> > $ rsync --include-from=list-file --exclude=\*
> > but of course that will exclude the parent directories of files you want,
> > causing them to be ignored.
> > 
> > This might work:
> > $ rsync --include-from=list-file --include=\*\*/ --exclude=\*
> > 
> > although it will mirror the entire directory structure (but not unspecified
> > files).
> > 
> > Probably, rsync should be taught that: "If I explicitly include a file, look
> > for it explicitly, even if I've excluded a parent directory."
> 
> Not too long ago, I modified/mangled rsync to do
> 
> rsync --files-from /some/file --include-regexes /some/regular/expressions \
>   --exclude-regexes /some/regular/expressions
> 
> 
> such that all the files in /some/file would be sent iff they matched the
> posix regexes in --include-regexes and didn't match the ones in 
> --exclude-regexes (if present).
> 
> I don't have a wide variety of platforms to test it on, but it worked okay
> on linux, solaris, and irix.
> 
> -justinb 



Re: rsync feature suggestion

2003-01-03 Thread Hadmut Danisch
Hi,

I just sent an answer to Edward's similar suggestion to the list.

regards
Hadmut




Re: rsync feature suggestion

2003-01-03 Thread Hadmut Danisch
On Fri, Jan 03, 2003 at 11:28:37AM -0600, Edward King wrote:
> Might it be possible to take the file list that you want to feed to 
> rsync and turn it into an rsync.conf file?

It might be possible, but it may be ambiguous and it is definitely not
efficient, since --include defines a pattern, not a file name/path.

As far as I know, rsync has to check every single file against all
include/exclude patterns. With one pattern per wanted file, that's a
complexity of O(n^2). I'm talking about directories with 30,000 to
1,000,000 files. With 10^6 files and 10^6 patterns, this could end up
in 10^12 file name/pattern comparisons, and that's certainly not what
you want to have.

If you read a list of plain filenames, you do not need to perform
pattern matching, but can use a simple associative/hash array and
check extremely fast whether a given filename is to be copied or
not. That's a very important difference from a list of patterns.
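
The hash lookup described above can be sketched with awk's associative
arrays. This is only an illustration of the idea, not rsync code; the helper
name filter_wanted and the sample paths are made up for the example.

```shell
#!/bin/sh
# Sketch only: keep the candidate paths that appear in a list of wanted
# paths, using one hash lookup per candidate instead of pattern matching.
# filter_wanted and the sample paths below are hypothetical.

filter_wanted() {
    # $1: file with one wanted path per line
    # $2: file with one candidate path per line
    # While reading the first file (NR == FNR) each path becomes a hash key;
    # for the second file only lines that are keys in the hash are printed.
    # Caveat: an empty first file defeats the NR == FNR test.
    awk 'NR == FNR { want[$0] = 1; next } $0 in want' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' pool/main/a/alpha.deb pool/main/b/beta.deb > "$tmpdir/wanted.txt"
printf '%s\n' pool/main/a/alpha.deb pool/main/a/old.deb pool/main/b/beta.deb \
    > "$tmpdir/candidates.txt"

# Prints the two wanted paths and skips pool/main/a/old.deb.
filter_wanted "$tmpdir/wanted.txt" "$tmpdir/candidates.txt"
rm -rf "$tmpdir"
```

Each candidate costs one lookup regardless of how long the wanted list is,
which is the difference from testing every name against every pattern.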

regards
Hadmut





Re: rsync feature suggestion

2003-01-03 Thread Justin Banks
Max Bowsher wrote
> Hadmut Danisch wrote:
> > I'd like to suggest a new feature to rsync.
> 
> > I am mirroring a debian archive, but unfortunately,
> > debian mixes all files of several distributions in a
> > subtree /pool. There is no way to select only the files
> > of a certain distribution through a simple exclude/include
> > expression.
> >
> > There is a tool called debmirror, which first downloads
> > the distribution index files, extracts all the filenames/paths
> > of the files needed and then calls rsync for every single file.
> > That's certainly not useful, especially since rsync shows the
> > server's motd for every single file.
> 
> I was about to suggest:
> $ rsync --include-from=list-file --exclude=\*
> but of course that will exclude the parent directories of files you want,
> causing them to be ignored.
> 
> This might work:
> $ rsync --include-from=list-file --include=\*\*/ --exclude=\*
> 
> although it will mirror the entire directory structure (but not unspecified
> files).
> 
> Probably, rsync should be taught that: "If I explicitly include a file, look
> for it explicitly, even if I've excluded a parent directory."

Not too long ago, I modified/mangled rsync to do

rsync --files-from /some/file --include-regexes /some/regular/expressions \
  --exclude-regexes /some/regular/expressions


such that all the files in /some/file would be sent iff they matched the
posix regexes in --include-regexes and didn't match the ones in 
--exclude-regexes (if present).

I don't have a wide variety of platforms to test it on, but it worked okay
on linux, solaris, and irix.

-justinb 



Re: rsync feature suggestion

2003-01-03 Thread Max Bowsher
Hadmut Danisch wrote:
> I'd like to suggest a new feature to rsync.

> I am mirroring a debian archive, but unfortunately,
> debian mixes all files of several distributions in a
> subtree /pool. There is no way to select only the files
> of a certain distribution through a simple exclude/include
> expression.
>
> There is a tool called debmirror, which first downloads
> the distribution index files, extracts all the filenames/paths
> of the files needed and then calls rsync for every single file.
> That's certainly not useful, especially since rsync shows the
> server's motd for every single file.

I was about to suggest:
$ rsync --include-from=list-file --exclude=\*
but of course that will exclude the parent directories of files you want,
causing them to be ignored.

This might work:
$ rsync --include-from=list-file --include=\*\*/ --exclude=\*

although it will mirror the entire directory structure (but not unspecified
files).

Probably, rsync should be taught that: "If I explicitly include a file, look
for it explicitly, even if I've excluded a parent directory."
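
One way around the excluded-parent problem, sketched here under
assumptions, is to expand the list so that it names every parent directory
as well. The helper name expand_parents is made up, and the paths are
assumed to be relative to the transfer root without a leading slash.

```shell
#!/bin/sh
# Sketch only: expand a list of file paths into include rules that also name
# every parent directory, so that a final --exclude='*' does not hide them.
# expand_parents is a hypothetical helper name.

expand_parents() {
    # Reads relative paths (one per line) on stdin. For each path it prints
    # a "/dir/" rule for every parent directory not printed before, followed
    # by a "/path" rule for the file itself.
    awk -F/ '{
        prefix = ""
        for (i = 1; i < NF; i++) {
            prefix = prefix "/" $i
            dir = prefix "/"
            if (!(dir in seen)) { seen[dir] = 1; print dir }
        }
        print "/" $0
    }'
}

printf '%s\n' pool/main/a/alpha.deb pool/main/b/beta.deb | expand_parents
```

Saved to a file, the output could be tried as
rsync -r --include-from=rules.txt --exclude='*' SRC DEST, where rules.txt,
SRC and DEST are placeholders. Only the listed directories should then be
created, rather than the entire directory structure.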

Max.





Re: rsync feature suggestion

2003-01-03 Thread Edward King
Might it be possible to take the file list that you want to feed to 
rsync and turn it into an rsync.conf file?

A simple shell script could create the config file and call rsync (with
--config= to specify the temporary config file).

Something like this (the syntax is most likely wrong; I haven't tested it):


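#!/bin/sh

# Start from the existing config, then append one include per listed file.
cat /etc/rsync.conf > rsync_command

# Read the list line by line so that names containing spaces stay intact.
while IFS= read -r EACH_FILE; do
    # printf expands the variable; the original single quotes did not.
    printf ' --include="%s"\n' "$EACH_FILE" >> rsync_command
done < file_list.txt

rsync --config=rsync_command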


- Ed King





rsync feature suggestion

2003-01-03 Thread Hadmut Danisch
Hi,

I'd like to suggest a new feature to rsync.

Problem:
Currently, rsync generates a recursive list of the files
existing at the source directory, modifies this list by
includes and excludes, and then copies these files.
That's pretty good in most, but not all cases.

I am mirroring a debian archive, but unfortunately, 
debian mixes all files of several distributions in a 
subtree /pool. There is no way to select only the files
of a certain distribution through a simple exclude/include
expression.

There is a tool called debmirror, which first downloads 
the distribution index files, extracts all the filenames/paths
of the files needed and then calls rsync for every single file.
That's certainly not useful, especially since rsync shows the
server's motd for every single file.

Therefore, I'd like to suggest a new option: Allow rsync to 
not build the list of files existing at the source directory 
by recursively walking through the source directory, but by
reading a file or stdin to get a list of files to be copied.

This would allow mirroring the distribution index files in a
first step, then building the list of files needed, and then
downloading these files in a second step.
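
The "build the list of files needed" step might look roughly like this.
It is a sketch that assumes the index is a Debian Packages file, in which
each stanza has a "Filename:" field giving the path relative to the archive
root. The helper name list_pool_files and the sample stanzas are made up.

```shell
#!/bin/sh
# Sketch only: build the list of files needed from a Debian Packages index.
# list_pool_files is a hypothetical helper name.

list_pool_files() {
    # Print the value of every "Filename:" field in the given index files,
    # sorted and with duplicates removed.
    awk '$1 == "Filename:" { print $2 }' "$@" | sort -u
}

# Tiny made-up index for demonstration; a real one comes from the mirror.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Package: alpha
Version: 1.0
Filename: pool/main/a/alpha/alpha_1.0_i386.deb

Package: beta
Version: 2.0
Filename: pool/main/b/beta/beta_2.0_i386.deb
EOF

list_pool_files "$tmp"
rm -f "$tmp"
```

With an rsync that provides a --files-from option (none did when this
message was written, so treat it as an assumption about the reader's
rsync), the resulting list could be handed over in a single invocation
instead of one call per file.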

An alternative method would be to keep the recursive method, but 
to open a pipe to an external program. Before downloading a
file, the path is printed to the pipe and an answer is read 
from the pipe. Thus, an external filter program can decide for
each single file whether to copy it or not.
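
An external filter for such a pipe could be as small as the sketch below.
The line protocol is an assumption, since none is defined here: the caller
writes one path per line and reads back "y" to copy or "n" to skip. The
helper name answer_paths is made up as well.

```shell
#!/bin/sh
# Sketch only: an external filter for the pipe idea described above.
# Assumed protocol: one path per line in, one answer line out ("y" or "n").
# answer_paths is a hypothetical helper name.

answer_paths() {
    # $1: file listing the wanted paths, one per line
    while IFS= read -r path; do
        # -F: fixed string, -x: whole line must match, -q: no output
        if grep -Fqx -- "$path" "$1"; then
            echo y
        else
            echo n
        fi
    done
}

tmp=$(mktemp)
printf '%s\n' pool/main/a/alpha.deb > "$tmp"
# Answers "y" for the wanted path and "n" for the other one.
printf '%s\n' pool/main/a/alpha.deb pool/main/z/zeta.deb | answer_paths "$tmp"
rm -f "$tmp"
```

Running grep once per path is slow for large lists; it only keeps the
example short.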

regards
Hadmut 
(Please respond directly, I'm not on your mailing list)

