Re: Excluding most and including some problems continue.

2010-10-01 Thread Ian Skinner
>>> On Friday, October 01, 2010 at 7:54 AM, in message
> + /das
> + /em
> + /enf
> + /internal
> + /itb
> + /medtox
> + /pml
> + /psb
> + /reg
> + /whs
> - /*
>
> + /*/htdocs
> - /*/*
>
> + /*/htdocs/docs
> - /*/htdocs/*

Thanks Wayne, this worked well and seems simpler and lazier than my original
version.



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Excluding most and including some problems continue.

2010-10-01 Thread Matt McCutchen
On Fri, 2010-10-01 at 16:28 -0400, Benjamin R. Haskell wrote:
> I think I sent a variant of the attached Perl script last time someone 
> was asking something similar.  What Wayne suggested is better right now 
> (that is: while your patterns are very simple -- just a few root-level 
> directories, each of which should include the /htdocs/docs/ subdir). 
> But if you start adding more, it could get more annoying to have to 
> manually fiddle with the rules.
> 
> The attached script takes as input lines of the form:
> /rooted/path/to/include
> 
> and produces what should work as a filter file.

A similar script is distributed with rsync: support/files-to-excludes .

-- 
Matt



Re: Excluding most and including some problems continue.

2010-10-01 Thread Benjamin R. Haskell

On Fri, 1 Oct 2010, Wayne Davison wrote:
> On Thu, Sep 30, 2010 at 8:27 AM, Ian Skinner wrote:
>> Unfortunately there are some subdirectories in some of these selected
>> */htdocs/docs* directories that are unintentionally being excluded by
>> these rules.  I.E. */export/home/enf/htdocs/docs/county/internal/*.
>
> It is usually best to anchor your matching terms, unless you want a
> term to float and match anywhere.  In an .rsync-filter file, terms
> that start with a slash are anchored in that file's directory.  You
> can also use a wildcard for all the subdir exclusions.  For example:
>
> + /das
> + /em
> + /enf
> + /internal
> + /itb
> + /medtox
> + /pml
> + /psb
> + /reg
> + /whs
> - /*
>
> + /*/htdocs
> - /*/*
>
> + /*/htdocs/docs
> - /*/htdocs/*
>
> Another alternative is to sprinkle .rsync-filter files throughout your
> hierarchy with localized rules for that part of the hierarchy, but the
> above should do what you want.




I think I sent a variant of the attached Perl script last time someone 
was asking something similar.  What Wayne suggested is better right now 
(that is: while your patterns are very simple -- just a few root-level 
directories, each of which should include the /htdocs/docs/ subdir). 
But if you start adding more, it could get more annoying to have to 
manually fiddle with the rules.


The attached script takes as input lines of the form:
/rooted/path/to/include

and produces what should work as a filter file. It supports 
'{one,other}' brace-style expansions (but I think that may be 
OS-dependent -- I think it works if your system's 'glob()' function 
supports them).  So, for example, I produced a working filter file for 
your situation from:


{==> input.rsync.rules <==}

/{das,em,enf,internal,itb,medtox,pml,psb,reg,whs}/htdocs/docs

{=}

$ perl ./rsync-filter-generate.pl input.rsync.rules
[produces rule file] [1]

$ perl ./rsync-filter-generate.pl input.rsync.rules | rsync --include-from=- /path/to/root/
[shows what would be transferred]

$ perl ./rsync-filter-generate.pl input.rsync.rules | rsync --include-from=- /path/to/root/ /path/to/dest/
[does it]


The general strategy:
$ echo /abc/def/ghi | perl ./rsync-filter-generate.pl
+ /abc   -- first include each path component
+ /abc/def   -- one-at-a-time, for each thing to include
+ /abc/def/ghi
- /abc/def/* -- then exclude everything else at each
- /abc/* -- level of the hierarchy
- /*

--
Best,
Ben

[1] output for your case:

+ /das
+ /das/htdocs
+ /das/htdocs/docs
+ /das/htdocs/other
+ /em
+ /em/htdocs
+ /em/htdocs/docs
+ /enf
+ /enf/htdocs
+ /enf/htdocs/docs
+ /internal
+ /internal/htdocs
+ /internal/htdocs/docs
+ /itb
+ /itb/htdocs
+ /itb/htdocs/docs
+ /medtox
+ /medtox/htdocs
+ /medtox/htdocs/docs
+ /pml
+ /pml/htdocs
+ /pml/htdocs/docs
+ /psb
+ /psb/htdocs
+ /psb/htdocs/docs
+ /reg
+ /reg/htdocs
+ /reg/htdocs/docs
+ /whs
+ /whs/htdocs
+ /whs/htdocs/docs
- /whs/htdocs/*
- /whs/*
- /reg/htdocs/*
- /reg/*
- /psb/htdocs/*
- /psb/*
- /pml/htdocs/*
- /pml/*
- /medtox/htdocs/*
- /medtox/*
- /itb/htdocs/*
- /itb/*
- /internal/htdocs/*
- /internal/*
- /enf/htdocs/*
- /enf/*
- /em/htdocs/*
- /em/*
- /das/htdocs/*
- /das/*
- /*

#!/usr/bin/perl
use strict;
use warnings;

# read in all of the paths to include
my @all;
while (<>) {
	chomp;
	push @all, glob;
}

my (%inc, %exc);
for (@all) {
	my @parts = split m{/};
	for (1..$#parts) {
		# include every path component up to the end
		# e.g.
		# /abc
		# /abc/def
		# /abc/def/ghi
		$inc{join "/", @parts[0..$_]}++;

		# exclude every other path component
		# e.g.
		# /abc/def/*
		# /abc/*
		# /*
		$exc{join "/", @parts[0..$_-1]}++;
	}
}

# include things from shortest-to-longest path
# exclude things from longest to shortest
print "+ $_\n" for sort keys %inc;
print "- $_/*\n" for reverse sort keys %exc;

Re: Excluding most and including some problems continue.

2010-10-01 Thread Wayne Davison
On Thu, Sep 30, 2010 at 8:27 AM, Ian Skinner  wrote:

> Unfortunately there are some subdirectories in some of these selected
> */htdocs/docs* directories that are unintentionally being excluded by these
> rules.  I.E. */export/home/enf/htdocs/docs/county/internal/*.


It is usually best to anchor your matching terms, unless you want a term to
float and match anywhere.  In an .rsync-filter file, terms that start with a
slash are anchored in that file's directory.  You can also use a wildcard
for all the subdir exclusions.  For example:


+ /das
+ /em
+ /enf
+ /internal
+ /itb
+ /medtox
+ /pml
+ /psb
+ /reg
+ /whs
- /*

+ /*/htdocs
- /*/*

+ /*/htdocs/docs
- /*/htdocs/*

Another alternative is to sprinkle .rsync-filter files throughout your
hierarchy with localized rules for that part of the hierarchy, but the above
should do what you want.

..wayne..
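
With ten directory names, the first group of rules in Wayne's message is easy to mistype. As a minimal sketch, assuming a POSIX shell, the same anchored rule set could be generated from a list of names. The function name `make_rules` is made up for illustration and is not part of rsync.

```shell
#!/bin/sh
# Print anchored filter rules for a list of top-level directories.
# make_rules is a hypothetical helper, not an rsync feature.
make_rules() {
    for dir in "$@"; do
        printf '+ /%s\n' "$dir"        # keep each named top-level directory
    done
    printf '%s\n' '- /*'               # drop every other top-level entry
    printf '%s\n' '+ /*/htdocs'        # inside the kept directories, keep htdocs
    printf '%s\n' '- /*/*'             # and drop its siblings
    printf '%s\n' '+ /*/htdocs/docs'   # inside htdocs, keep docs
    printf '%s\n' '- /*/htdocs/*'      # and drop its siblings
}

make_rules das em enf internal itb medtox pml psb reg whs
```

Redirecting the output into the .rsync-filter file at the top of the transfer would give the rules shown in the message, minus the blank separator lines.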

Re: Excluding most and including some problems continue.

2010-09-30 Thread Steven Levine
In <4ca4be08.2858.00a...@cdpr.ca.gov>, on 09/30/10
   at 04:42 PM, "Ian Skinner"  said:

Hi,

>> + das/htdocs/docs/*
>> + em/htdocs/docs/*
>> etc.
>> + */
>> - *

>Thanks for the suggestion, but that did not seem to produce the desired
>results.  I did not look into why in detail, but a dry run produced files
>from directories I wanted to exclude and apparently not all the files I
>wanted to include.

Did you add --prune-empty-dirs to the command line?  This filter
setup along with --prune-empty-dirs will copy only the files in the
named directories, which is my understanding of what you want.

>After a day of trial and error and internet searching I now have this
>that is really close.

Looks overly complex to me.  Taking your example layout and using this
filter set

+ das/htdocs/docs/*
+ em/htdocs/docs/*
+ enf/htdocs/docs/*
+ internal/htdocs/docs/*
+ itb/htdocs/docs/*
+ medtox/htdocs/docs/*
+ pml/htdocs/docs/*
+ psb/htdocs/docs/*
+ reg/htdocs/docs/*
+ whs/htdocs/docs/*
+ */
- *

and this command line

rsync --dry-run --prune-empty-dirs --itemize-changes -a -F export\ to\

I get

.d..t.. ./
cd+ home/
cd+ home/das/
cd+ home/das/htdocs/
cd+ home/das/htdocs/docs/
>f+ home/das/htdocs/docs/SHLNotes.txt
cd+ home/em/
cd+ home/em/htdocs/
cd+ home/em/htdocs/docs/
>f+ home/em/htdocs/docs/SHLNotes.txt

Which I think is what you want.  Every subdirectory contains a file.

Good luck,

Steven

-- 
--
"Steven Levine"   eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
--



Re: Excluding most and including some problems continue.

2010-09-30 Thread Ian Skinner
>>> Steven Levine, Thursday, September 30, 2010 3:22 PM >>>
> It's close, but you need to augment it a bit.  Try

> + das/htdocs/docs/*
> + em/htdocs/docs/*
> etc.
> + */
> - *

Thanks for the suggestion, but that did not seem to produce the desired 
results.  I did not look into why in detail, but a dry run produced files from 
directories I wanted to exclude and apparently not all the files I wanted to 
include.

After a day of trial and error and internet searching I now have this, which is
really close.  It copies all the directories I identified earlier that were
being falsely excluded.  There are still two or three individual files that are
not copying for some reason.  I am looking into those now.

+ das
+ em
+ enf
+ internal
+ itb
+ medtox
+ pml
+ psb
+ reg
+ whs

+ htdocs
+ docs

- /*

- /das/*
- /em/*
- /enf/*
- /internal/*
- /itb/*
- /medtox/*
- /pml/*
- /psb/*
- /reg/*
- /whs/*

- htdocs/*



Re: Excluding most and including some problems continue.

2010-09-30 Thread Steven Levine
In <4ca4958c.2858.00a...@cdpr.ca.gov>, on 09/30/10
   at 01:50 PM, "Ian Skinner"  said:

Hi,

>>or possibly
>>
>> + das/**htdocs/docs*
>> + em/**/htdocs/docs*
>> etc.

>I'm not sure what the difference between the first example and the second
>example is supposed to be?

That's my bad eyes.  This should have been

 + das/**htdocs/docs*
 + em/**htdocs/docs*

but it's not going to do what you really want.

>I don't see how that would address my needs, but I'm not sure what the
>double ** symbols represent?

I recommend you read the man page.  ** and *** can be very useful.

>But there are no extra directories between
>the "das" and the "htdocs" directories in my use case.

OK.  That's why I said I was not sure what you were asking.

>I want to mirror the following directories from the above example and
>exclude everything else.
>/export/home/em/htdocs/docs/*
>/export/home/enf/htdocs/docs/*
>/export/home/das/htdocs/docs/*
>(And seven more similar directories)

OK.  This is easier.

>I just tried this filter file somewhat based on your previous suggestion
>but it excluded everything.

It's close, but you need to augment it a bit.  Try

+ das/htdocs/docs/*
+ em/htdocs/docs/*
etc.
+ */
- *

and add --prune-empty-dirs to the command line.

Also, if you really only want the contents of specific directories and not
the content of any of the subdirectories, you can often avoid the recursive
scan and use the --relative option and just list the source directories on
the command line.

Steven
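
The --relative approach could be sketched as follows, assuming the /export/home layout described in this thread. The helper name `build_relative_cmd` and the destination `/path/to/dest/` are placeholders, and the function only prints the command so that it can be inspected before anything is run. The `/./` in each source path marks where the relative part begins, so only `das/htdocs/docs/` and so on would be recreated under the destination. Note that this sketch uses `-a`, which still recurses into each named directory; that differs from the non-recursive case above but fits a docs directory whose subdirectories are wanted.

```shell
#!/bin/sh
# Build (but do not run) an rsync command that names each wanted
# directory explicitly instead of filtering a scan of the whole tree.
# build_relative_cmd is a hypothetical helper for illustration only.
build_relative_cmd() {
    base=$1    # directory that holds das, em, enf, ...
    dest=$2    # placeholder destination
    shift 2
    cmd="rsync -a --relative"
    for dir in "$@"; do
        # Everything after "/./" is the path recreated at the destination.
        cmd="$cmd $base/./$dir/htdocs/docs/"
    done
    printf '%s %s\n' "$cmd" "$dest"
}

build_relative_cmd /export/home /path/to/dest/ das em enf
```

Adding `--dry-run` to the printed command before running it is a cheap way to confirm the file list.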

-- 
--
"Steven Levine"   eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
--



Re: Excluding most and including some problems continue.

2010-09-30 Thread Ian Skinner
>>> Steven Levine, Thursday, September 30, 2010 11:14 AM >>>
>I'm not sure I entirely understand what you want, but what about
>
> + das/**/htdocs/docs*
> + em/**/htdocs/docs*
> etc.
>
>or possibly
>
> + das/**htdocs/docs*
> + em/**/htdocs/docs*
> etc.

I'm not sure what the difference between the first example and the second
example is supposed to be.

I don't see how that would address my needs, but I'm not sure what the double
** symbols represent.  There are no extra directories between the "das" and
the "htdocs" directories in my use case.

Maybe this representation would be clearer.

/export
   /home
      /excluded_A
      /em
         /exclude_em-1
         /htdocs
            /exclude_em_htdocs-1
            /exclude_em_htdocs-2
            /docs
         /exclude_em-2
      /exclude_C
      /enf
         /exclude_enf-1
         /htdocs
            /exclude_enf_htdocs-1
            /exclude_enf_htdocs-2
            /docs
         /exclude_enf-2
      /exclude_E
      /das
         /exclude_das-1
         /htdocs
            /exclude_das_htdocs-1
            /exclude_das_htdocs-2
            /docs
         /exclude_das-2

I want to mirror the following directories from the above example and exclude 
everything else.
/export/home/em/htdocs/docs/*
/export/home/enf/htdocs/docs/*
/export/home/das/htdocs/docs/*
(And seven more similar directories)

I just tried this filter file somewhat based on your previous suggestion but it 
excluded everything.
+ das/htdocs/docs/*
+ em/htdocs/docs/*
+ enf/htdocs/docs/*
+ internal/htdocs/docs/*
+ itb/htdocs/docs/*
+ medtox/htdocs/docs/*
+ pml/htdocs/docs/*
+ psb/htdocs/docs/*
+ reg/htdocs/docs/*
+ whs/htdocs/docs/*

- /*

TIA
Ian




Re: Excluding most and including some problems continue.

2010-09-30 Thread Steven Levine
In <4ca449ea.2858.00a...@cdpr.ca.gov>, on 09/30/10
   at 08:27 AM, "Ian Skinner"  said:

Hi,

>Here is my rsync command as it currently stands.

>/usr/local/bin/rsync -vvv --stats -Pzrtpl --delete
>--password-file=/export/home/webuser/.appprod
>--log-file=/export/home/webuser/logs/rsync-log -F /export/home/
>webu...@appprod::dprweb_extranet/ > rsync-test

>This is doing pretty close to what I want it to do.  Which is to mirror
>only the */htdocs/docs* in each of the ten directories (das,em,enf,etc.)
>in the base path of */export/home* and exclude the rest.

I'm not sure I entirely understand what you want, but what about

 + das/**/htdocs/docs*
 + em/**/htdocs/docs*
 etc.

or possibly

 + das/**htdocs/docs*
 + em/**/htdocs/docs*
 etc.

I'm not sure if the additional slash is required without setting up a
testcase.

If you really want just the files matching */htdocs/docs/*, the above
needs to change slightly.

Steven

-- 
--
"Steven Levine"   eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
--



Excluding most and including some problems continue.

2010-09-30 Thread Ian Skinner
Here is my rsync command as it currently stands.

/usr/local/bin/rsync -vvv --stats -Pzrtpl --delete 
--password-file=/export/home/webuser/.appprod 
--log-file=/export/home/webuser/logs/rsync-log -F /export/home/ 
webu...@appprod::dprweb_extranet/ > rsync-test

Here is the current .rsync-filter file.

+ das
+ em
+ enf
+ internal
+ itb
+ medtox
+ pml
+ psb
+ reg
+ whs

+ htdocs
+ docs

- /*

- das/*
- em/*
- enf/*
- internal/*
- itb/*
- medtox/*
- pml/*
- psb/*
- reg/*
- whs/*

- htdocs/*

This does pretty close to what I want, which is to mirror only the
*/htdocs/docs* directory in each of the ten directories (das, em, enf, etc.)
in the base path of */export/home* and exclude the rest.

Unfortunately there are some subdirectories in some of these selected
*/htdocs/docs* directories that are unintentionally being excluded by these
rules, e.g. */export/home/enf/htdocs/docs/county/internal/*.

[sender] hiding file enf/htdocs/docs/county/internal/gis0402.pdf because of 
pattern internal/* [per-dir .rsync-filter]
[sender] hiding file enf/htdocs/docs/county/internal/gis1201.pdf because of 
pattern internal/* [per-dir .rsync-filter]
[sender] hiding directory enf/htdocs/docs/county/internal/gis1201 because of 
pattern internal/* [per-dir .rsync-filter]
[sender] hiding directory enf/htdocs/docs/county/internal/gis0402 because of 
pattern internal/* [per-dir .rsync-filter]

Is there an easy way to remedy this in the base .rsync-filter file and/or the
rsync command?  Some way to say "exclude only the base */export/home/internal/*
directory, not any lower 'internal' directories"?  Or is the only way to create
sub .rsync-filter files in other directories?  My concern with the latter
option is that users are in control of these directories and can add and modify
them at will.  If I find all the special cases today, there is no guarantee that
there won't be more special cases tomorrow.
