Re: TODO hardlink performance optimizations

2004-01-07 Thread John Van Essen
On Tue, 6 Jan 2004 15:11:54 -0800, jw schultz [EMAIL PROTECTED] wrote:
 On Tue, Jan 06, 2004 at 02:04:19AM -0600, John Van Essen wrote:
 On Mon, 5 Jan 2004, jw schultz [EMAIL PROTECTED] wrote:
 [snip]
 union links {
     struct idev {
         INO64_T inode;
         DEV64_T dev;
     };
     struct hlinks {
         int count;
         file_struct **link_list;
     };
 };
  
 [snip]
 
 Now that you have opened the door for re-using the inode and
 dev area in file_struct, I can make my initial suggestion that
 requires two 'additional' pointers in file_struct.  I didn't
 suggest it before because I didn't want to add to the struct.
 
 I propose to reuse the DEV and INODE areas to store two new
 file_struct pointers (the 2nd part of the union).  These would
 be set during the post-qsort phase in init_hard_links.
 
 The first pointer links together the file_structs for each
 hardlink group in a list in the same order as in the qsort
 so they can be walked by your proposed link-dest method.
 
 The second pointer points to the head (first file_struct of
 a hardlink group).
 
 For each file_struct that is modified in this way a flag bit
 needs to be set indicating so.
 
 No it doesn't.  After we walk the qsorted hlink_list, all
 file_struct->links pointers either point to a hlinks struct or
 are NULL.  If there are no links, file_struct->links is
 NULL.

If you qsort the entire list without first filtering out any
entries (e.g. !IS_REG), then yes, that would be true.

 Then, if the file_struct address equals its head pointer, you
 are at the head of a hardlink list, which can be processed
 using the new --link-dest method that you outlined earlier.
 
 So you are proposing a singly linked list:
  struct hlinks {
      file_struct *head;
      file_struct *next;
  };

Yes.
 
 That would work too.  With the pointers into the array
 you compare the first in the array with yourself instead of
 head.  I preferred knowing in advance how many links there
 were.

But there is no need to know exactly how many links, is there?

 I'm still not clear on what exactly will happen during that new
 processing method.  I assume that if the head file does not exist,
 it will find any existent file and hardlink it to the head file,
 and if not found, will transfer the head file.  Is that correct?
 
 Not exactly.  As in my earlier post, if the head (your term)
 doesn't exist (existence including compare_dest), we iterate
 over the link set and the first one that does exist is used for
 fnamecmp.  But that is phase two or three.

Let me put it another way...  Using your proposed method, after
the 'head' file is processed, will it now exist on the receiver
so that its siblings can be hardlinked to it?

 Processing of non-head files (file_struct address not equal to
 head pointer) can be skipped since they are hardlinked later.
 
 The final walk-through that creates the hardlinks for the
 non-head files can walk the qsorted list and use the head
 pointer as the target for the hardlinks of the non-head files.
 
 Actually, if we could do the hardlinks of non-head files as
 we encounter them while walking the file list and used your
 singly linked lists, we could free the hlink_list after
 traversal.  The problem is that we would need to be sure the
 head had completed processing, so the hardlink would have to
 be created in the receiver, and that gets ugly.

Right.  You've just explained the heretofore unknown reason
why that sibling hlinking was being done in a separate, final
phase.  If we keep it that way, I'd like to see a comment added
to explain what you just explained.

But here's an idea, still using my proposed head/next struct...

- Make *next a circular list (the last points to the first).
- Leave *head NULL initially.
- When a file of a hlink group is examined, and *head is NULL,
  then it is the first one of that group to be processed, and
  it becomes the head.  All the *head pointers are set to its
  file_struct address.
- Subsequent processing of siblings can immediately hardlink
  them to the head file.

The drawback is that this will invoke CopyOnWrite (as discussed
earlier in this thread).  To avoid that, *head would have to
point outside to a group head pointer, which would then be set.
So you'd need an array of pointers of the same size as the
number of distinct hardlink groups.  Say!  We already have the
sorted hlink_list.  We could just point to the first element
of the group (setting it to NULL initially) after creating
the circularly linked list.  (A sketch of this follows.)
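A minimal sketch of that idea in C (names are hypothetical;
process_linked(), transfer_file() and hard_link_to() are
illustrations, not actual rsync code):

    struct hlinks;
    struct file_struct {
        struct hlinks *links;   /* NULL for files with no hardlinks */
        /* ... name, dev/inode, etc. ... */
    };
    struct hlinks {
        struct file_struct **head_slot; /* slot in hlink_list; *head_slot
                                         * is NULL until a head is claimed */
        struct file_struct *next;       /* circular list over the group */
    };

    /* Called for each file in file-list order. */
    static void process_linked(struct file_struct *file)
    {
        struct hlinks *hl = file->links;
        if (!hl) {
            transfer_file(file);            /* not part of a group */
        } else if (*hl->head_slot == NULL) {
            *hl->head_slot = file;          /* first of its group seen: */
            transfer_file(file);            /* it becomes the head */
        } else {
            hard_link_to(*hl->head_slot, file); /* sibling: link to head */
        }
    }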

  One possibility
 would be to keep the trailing walk-through but reduce the
 hlink_list to an array of heads.

Keep the array and use just the heads, as per above.

 No extra memory required beyond that already required for the
 qsorted pointer list.
 
 No binary search required during the file processing phase.
 
 The key is getting 

Re: TODO hardlink performance optimizations

2004-01-07 Thread John Van Essen
On Tue, 6 Jan 2004 22:33:06 -0800, Wayne Davison [EMAIL PROTECTED] wrote:
 
 I'd suggest also changing the last line of the function:
 
 -return file_compare(f1, f2);
 +return file_compare(f1p, f2p);
 
 This is because the old way asks the compiler to take the address of f1
 and f2, thus forcing them to become real stack variables.  Changing the
 code to use the passed-in f1p and f2p allows the compiler to leave both
 f1 and f2 as registers (if possible).

Good point, but I have an even better suggestion, now that I
finally understand the nuts and bolts of all the hlink.c code.

The file_compare() is invoked when the dev and inode values match,
in order to produce a consistent ordering during the sort.

There is no compelling reason to have the hlink list be sorted
alphabetically.  It just has to sort consistently.  So the final
comparison can be done on the addresses of the file_structs,
since they are not moved around and will remain constant:

return ( ( f1 < f2 ) ? -1 : ( f1 > f2 ) );

(Unsure if the code is right, but you get my drift.)
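For context, the tie-break would sit at the bottom of the qsort
comparator, something like this (a sketch patterned on hlink.c's
comparator; exact field and helper names may differ):

    static int hlink_compare(struct file_struct **file1,
                             struct file_struct **file2)
    {
        struct file_struct *f1 = *file1;
        struct file_struct *f2 = *file2;

        if (f1->dev != f2->dev)
            return f1->dev > f2->dev ? 1 : -1;
        if (f1->inode != f2->inode)
            return f1->inode > f2->inode ? 1 : -1;

        /* dev and inode match; any consistent tie-break will do,
         * so compare the stable file_struct addresses rather than
         * calling file_compare() on the names. */
        return f1 < f2 ? -1 : f1 > f2 ? 1 : 0;
    }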

For filesets with many hardlinks, this will use less CPU time.
-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]




Re: TODO hardlink performance optimizations

2004-01-07 Thread jw schultz
On Wed, Jan 07, 2004 at 01:04:34AM -0600, John Van Essen wrote:
 On Tue, 6 Jan 2004 15:11:54 -0800, jw schultz [EMAIL PROTECTED] wrote:
  On Tue, Jan 06, 2004 at 02:04:19AM -0600, John Van Essen wrote:
  On Mon, 5 Jan 2004, jw schultz [EMAIL PROTECTED] wrote:
  [snip]
  union links {
      struct idev {
          INO64_T inode;
          DEV64_T dev;
      };
      struct hlinks {
          int count;
          file_struct **link_list;
      };
  };
   
  [snip]
  
  Now that you have opened the door for re-using the inode and
  dev area in file_struct, I can make my initial suggestion that
  requires two 'additional' pointers in file_struct.  I didn't
  suggest it before because I didn't want to add to the struct.
  
  I propose to reuse the DEV and INODE areas to store two new
  file_struct pointers (the 2nd part of the union).  These would
  be set during the post-qsort phase in init_hard_links.
  
  The first pointer links together the file_structs for each
  hardlink group in a list in the same order as in the qsort
  so they can be walked by your proposed link-dest method.
  
  The second pointer points to the head (first file_struct of
  a hardlink group).
  
  For each file_struct that is modified in this way a flag bit
  needs to be set indicating so.
  
  No it doesn't.  After we walk the qsorted hlink_list, all
  file_struct->links pointers either point to a hlinks struct or
  are NULL.  If there are no links, file_struct->links is
  NULL.
 
 If you qsort the entire list without first filtering out any
 entries (e.g. !IS_REG), then yes, that would be true.

Either NULL it during the hlink_list walk, or don't populate
file_struct->links for unlinkable files while building the
file list (saving a malloc).

  Then, if the file_struct address equals its head pointer, you
  are at the head of a hardlink list, which can be processed
  using the new --link-dest method that you outlined earlier.
  
  So you are proposing a singly linked list:
   struct hlinks {
       file_struct *head;
       file_struct *next;
   };
 
 Yes.
  
  That would work too.  With the pointers into the array
  you compare the first in the array with yourself instead of
  head.  I preferred knowing in advance how many links there
  were.
 
 But there is no need to know exactly how many links, is there?

Not at this time.

  I'm still not clear on what exactly will happen during that new
  processing method.  I assume that if the head file does not exist,
  it will find any existent file and hardlink it to the head file,
  and if not found, will transfer the head file.  Is that correct?
  
  Not exactly.  As in my earlier post, if the head (your term)
  doesn't exist (existence including compare_dest), we iterate
  over the link set and the first one that does exist is used for
  fnamecmp.  But that is phase two or three.
 
 Let me put it another way...  Using your proposed method, after
 the 'head' file is processed, will it now exist on the receiver
 so that its siblings can be hardlinked to it?

Yes.

  Processing of non-head files (file_struct address not equal to
  head pointer) can be skipped since they are hardlinked later.
  
  The final walk-through that creates the hardlinks for the
  non-head files can walk the qsorted list and use the head
  pointer as the target for the hardlinks of the non-head files.
  
  Actually, if we could do the hardlinks of non-head files as
  we encounter them while walking the file list and used your
  singly linked lists, we could free the hlink_list after
  traversal.  The problem is that we would need to be sure the
  head had completed processing, so the hardlink would have to
  be created in the receiver, and that gets ugly.
 
 Right.  You've just explained the heretofore unknown reason
 why that sibling hlinking was being done in a separate, final
 phase.  If we keep it that way, I'd like to see a comment added
 to explain what you just explained.
 
 But here's an idea, still using my proposed head/next struct...
 
 - Make *next a circular list (the last points to the first).
 - Leave *head NULL initially.
 - When a file of a hlink group is examined, and *head is NULL,
   then it is the first one of that group to be processed, and
   it becomes the head.  All the *head pointers are set to its
   file_struct address.
 - Subsequent processing of siblings can immediately hardlink
   them to the head file.
 
 The drawback is that this will invoke CopyOnWrite (as discussed
 earlier in this thread).  To avoid that, *head would have to
 point outside to a group head pointer, which would then be set.
 So you'd need an array of pointers of the same size as the
 number of distinct hardlink groups.  Say!  We already have the
 sorted hlink_list.  We could just point to the first element
 of the group (setting it to NULL initially) after creating
 the circularly 

Re: TODO hardlink performance optimizations

2004-01-07 Thread jw schultz
On Wed, Jan 07, 2004 at 01:33:43AM -0600, John Van Essen wrote:
 On Tue, 6 Jan 2004 22:33:06 -0800, Wayne Davison [EMAIL PROTECTED] wrote:
  
  I'd suggest also changing the last line of the function:
  
  -return file_compare(f1, f2);
  +return file_compare(f1p, f2p);
  
  This is because the old way asks the compiler to take the address of f1
  and f2, thus forcing them to become real stack variables.  Changing the
  code to use the passed-in f1p and f2p allows the compiler to leave both
  f1 and f2 as registers (if possible).
 
 Good point, but I have an even better suggestion, now that I
 finally understand the nuts and bolts of all the hlink.c code.
 
 The file_compare() is invoked when the dev and inode values match,
 in order to produce a consistent ordering during the sort.
 
 There is no compelling reason to have the hlink list be sorted
 alphabetically.  It just has to sort consistently.  So the final
 comparison can be done on the addresses of the file_structs,
 since they are not moved around and will remain constant:
 
 return ( ( f1 < f2 ) ? -1 : ( f1 > f2 ) );
 
 (Unsure if the code is right, but you get my drift.)
 
 For filesets with many hardlinks, this will use less CPU time.

There may well be good reason for having the link sets
subsorted consistently with the file list.  See my notes
regarding COW, fork, and the modification of the link info.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


2.6.0 file has vanished fails to set exit code on local client

2004-01-07 Thread John Van Essen
A new 2.6.0 feature is supposed to use a different exit code when the
only 'errors' were from files that disappeared between the building
of the file list and the actual transfer of files.

But if the client is local and the server is remote, IOERR_VANISHED
gets set on the remote server, but is never passed to the local
client (the io_error value is passed at the end of the file list,
not during or after the file transfer phase).

The old scheme used FERROR for the "send_files failed to open" message.
The new scheme uses FINFO for the "file has vanished:" message.

The client receiver sets log_got_error when it receives a FERROR
message from the sender.

The old scheme used (io_error || log_got_error) to report a
partial transfer (with no separate exit code for vanished files).

The new scheme uses the IOERR_VANISHED flag to distinguish the
two errors, but that flag never gets set on the client in an
rsync pull (nor will log_got_error get set if vanished files
are the only errors).  Hence, the exit code stays 0.
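
For reference, the intended end-of-run logic is roughly this
(a sketch; in 2.6.0 RERR_PARTIAL is exit code 23 and
RERR_VANISHED is 24):

    if (log_got_error || (io_error & ~IOERR_VANISHED))
        exit_cleanup(RERR_PARTIAL);     /* code 23: real errors */
    else if (io_error & IOERR_VANISHED)
        exit_cleanup(RERR_VANISHED);    /* code 24: only vanished files */
    /* otherwise exit 0 -- which, per the bug above, is also what a
     * pull reports, because IOERR_VANISHED never reaches the client */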

Furthermore, if the local client is pre-2.6.0 and the remote server
is 2.6.0, the same problem happens, since the only thing pre-2.6.0
keys on is an FERROR message coming from the server during the
file transfers.  So now it also (incorrectly) exits with a 0 exit
code in the case of partial transfers from a 2.6.0 server.

So this needs some work...

- On the server, if the client protocol is < 27, use FERROR instead
  of FINFO so the pre-2.6.0 client can use a RERR_PARTIAL exit code.

- On the client side, it has to somehow recognize the vanished error
  on the server.  It could examine each FINFO message that comes
  over to see if it begins with "file has vanished:" and set the
  IOERR_VANISHED flag (but that's pretty kludgy... a sketch follows).
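
That check would be tiny, something like this (hypothetical
placement, wherever the client processes an incoming FINFO
message):

    /* "file has vanished: " is 19 characters */
    if (strncmp(msg, "file has vanished: ", 19) == 0)
        io_error |= IOERR_VANISHED;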

I haven't coded anything pending review of this bug by whoever
coded the IOERR_VANISHED feature to verify my analysis.  (Wayne?)
-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]



Re: TODO hardlink performance optimizations

2004-01-07 Thread John Van Essen
On Wed, 7 Jan 2004 01:30:19 -0800, jw schultz [EMAIL PROTECTED] wrote:

 The steps i see are:
 
 - The hlink_list change to a pointer array (just committed)
 
 - Create the union and change file_struct and the routines
   that reference and populate it to use the union for dev
   and inode.  This may include not allocating the union for
   unlinkable files.
 
 - Overwrite the unions with the linked list stuff and change
   the logic to use them. Also free the unions for unlinked
   files.
   (this is the biggest step)
 
 - Reduce the hlink_list to just the heads and change
   do_hard_links.
 
 - consolidate the fnamecmp finder function for
   recv_generator() in generator.c and recv_files() in
   receiver.c
 
 - Add the list walk for heads that don't exist yet.
 
 
 Each of these is a discrete step: when each is complete, the
 code will still function correctly.
 
 Feel free to start coding.  :-)  Not that I'm lazy...  *cough*
 
 Oh! Sorry to hear that, I am.  The only thing preventing me
 from saying go ahead is my uncertainty whether we both have
 the same design.

Except for the part about allocating and freeing the union, I'm
with ya.  For this initial attempt, shall we just leave the
union in the file_struct (in place of the DEV / INODE vars)?
Maybe move it to the end where it can be conditionally allocated
(leaving a short structure when --hard-links is not used).

Shall we take the coding details discussion off-list?  I imagine
that the faithful readers of this mailing list are getting a bit
weary reading about this fairly obscure bit of code...
-- 
John Van Essen  Univ of MN Alumnus  [EMAIL PROTECTED]



RE: Problem with many files in rsync server directory ?

2004-01-07 Thread Jon Hirst
Having had a night to sleep on this, I think rsync's limit
on filename globbing needs pointing out more clearly.

I think we need:

1) An entry in the FAQ (Done)

2) A better error message from rsync when it exceeds the
   limit.  Saying:

  rsync: read error: Connection reset by peer
  rsync error: error in rsync protocol data stream (code 12) at io.c(201)

   doesn't help many users.  Not even programmers with 20 years
   experience, like me ;-)

3) How about adding a file called LIMITS to the source
   distribution that tells system administrators and users
   of the limits that are built in to rsync, and how they
   can be changed.

4) Or maybe even - horror of horrors - some comments in
   the source file rsync.h.

Jon
   
-Original Message-
From: Wayne Davison [mailto:[EMAIL PROTECTED]
Sent: 06 January 2004 17:30
To: Jon Hirst
Cc: '[EMAIL PROTECTED]'
Subject: Re: Problem with many files in rsync server directory ?


On Tue, Jan 06, 2004 at 05:05:16PM -, Jon Hirst wrote:
 $ rsync [EMAIL PROTECTED]::gsh/* .

There's a limit to how many files you can glob with a wildcard.  Just
remove the wildcard and let rsync transfer the whole directory:

rsync [EMAIL PROTECTED]::gsh/ .

While you're at it, you should probably add (at least) the -t option,
which will preserve timestamps and make future updated copies faster (or
just use -a).

..wayne..


Re: Problem with many files in rsync server directory ?

2004-01-07 Thread jw schultz
On Wed, Jan 07, 2004 at 10:26:16AM -, Jon Hirst wrote:
 Having had a night to sleep on this, I think rsync's limit
 on filename globbing needs pointing out more clearly.
 
 I think we need:
 
 1) An entry in the FAQ (Done)
 
 2) A better error message from rsync when it exceeds the
limit.  Saying:
 
   rsync: read error: Connection reset by peer
   rsync error: error in rsync protocol data stream (code 12) at io.c(201)
 
doesn't help many users.  Not even programmers with 20 years
experience, like me ;-)
 
 3) How about adding a file called LIMITS to the source
distribution that tells system administrators and users
of the limits that are built in to rsync, and how they
can be changed.
 
 4) Or maybe even - horror or horrors - some comments in
the source file rsync.h.

5)  It trumpeted from the mountain tops (and maybe in the
documentation somewhere) that using * to get all files in a
directory is stupid or ignorant.

  a) * and ? are globbed by the shell unless quoted and may
 produce unexpected behaviour.

  b) There are limits to the size of command-lines.

  c) filenames with spaces glob badly.

  d) The only time the argument globbing is done by rsync is
     on the daemon; all other times it is done by one shell or
     another.

I've lost track of the number of times someone has
complained on this list because blah/blah/* didn't behave as
he expected and the problem went away when he dropped the
unnecessary wildcard.


-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


BUG in 2.6.0: make test fails if build dir is not source dir

2004-01-07 Thread Stefan Nehlsen

There is a small bug in the build system of 2.6.0:

If the directory you build rsync in differs from the source dir,
make test fails:

$ tar -xzf ~/rsync-2.6.0.tar.gz
$ mkdir build
$ cd build
$ ../rsync-2.6.0/configure

$ make test

PASS    unsafe-byname
PASS    unsafe-links
- wildmatch log follows
Testing for symlinks using 'test -h'
+ /tmp/bla/build/wildtest
Unable to open wildtest.txt.
- wildmatch log ends
FAIL    wildmatch

- overall results:
  14 passed
  1 failed
  3 skipped

overall result is 1
make: *** [check] Fehler 1


The problem is in wildtest.c :

if ((fp = fopen("wildtest.txt", "r")) == NULL) {
    fprintf(stderr, "Unable to open wildtest.txt.\n");
    exit(1);
}
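
One possible fix (an untested sketch, assuming the Makefile is
changed to pass the source directory in via a -DWILDTEST_SRCDIR
define; name hypothetical):

#ifndef WILDTEST_SRCDIR
#define WILDTEST_SRCDIR "."     /* in-tree build default */
#endif

    char path[1024];
    snprintf(path, sizeof path, "%s/wildtest.txt", WILDTEST_SRCDIR);
    if ((fp = fopen(path, "r")) == NULL) {
        fprintf(stderr, "Unable to open %s.\n", path);
        exit(1);
    }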


cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260


Re: Problem with many files in rsync server directory ?

2004-01-07 Thread Carson Gaspar


--On Wednesday, January 07, 2004 03:10:23 -0800 jw schultz [EMAIL PROTECTED] wrote:

I've lost track of the number of times someone has
complained on this list because blah/blah/* didn't behave as
he expected and the problem went away when he dropped the
unnecessary wildcard.
Hmmm... given the following files:

foo/a
foo/b
foo/c/1

how do you do "rsync foo/* bar" without globs? Note that this is _not_
recursive. All I can think of is to replace the glob with an exclude, doing
"rsync -r --exclude='*/*' foo/ bar/", which is an absolutely terrible
construct (please recurse - whoops, just kidding!).

Hmmm... using bash, you can do "rsync --files-from=<(find foo/. -maxdepth 1
! -type d -printf '%P\n') foo bar/", but that's also wretched.

--
Carson


RE: Copying hard-linked tree structure

2004-01-07 Thread Max Kipness
 I have a tree structure on one server similar to the following:
  
 /Current
 /01-04-2003
 /01-03-2003
  
 etc...
  
 /Current holds the most recent rsynced data, and the date 
 directories are created with cp -al on a daily basis so they 
 are hard-linked. I'm going back 60 days.
  
 The question is how can I move this entire structure to a new 
 server and preserve the links from the date directories to 
 the /Current directory?

Well, I ended up rsyncing the root directory to the new server with the
-H option and it seemed to work. I have 30 directories for 30 days of
rotating backups.

However, I had a dir called /Current that had 12GB, and then all the
/date directories had 120MB, 60MB, etc... the daily changes that
occurred. Well, now the directory called /01-01-2004 has 12GB and
/Current has like 100MB. I guess /01-01-2004 went first due to sorting.

Any way to change /Current back to being the real directory? Or does
it even matter?

Thanks,
Max


RE: Copying hard-linked tree structure

2004-01-07 Thread Max Kipness
   I have a tree structure on one server similar to the following:

   /Current
   /01-04-2003
   /01-03-2003

   etc...

   /Current holds the most recent rsynced data, and the date 
   directories are created with cp -al on a daily basis so they are 
   hard-linked. I'm going back 60 days.

   The question is how can I move this entire structure to a
   new server and preserve the links from the date directories
   to the /Current directory?
  
  Well, I ended up rsyncing the root directory to the new server with
  the -H option and it seemed to work. I have 30 directories for 30
  days of rotating backups.
  
  However, I had a dir called /Current that had 12GB, and then all the
  /date directories had 120MB, 60MB, etc... the daily changes that
  occurred. Well, now the directory called /01-01-2004 has 12GB and
  /Current has like 100MB. I guess /01-01-2004 went first due to
  sorting.
 
 It has to do with the tool you are using to measure them.
 
  Any way to change /Current back to being the real directory?
  Or does it even matter?
 
 What do you mean, "real"?  With hardlinks, all links for an
 inode are equal.

I'm using du --max-depth=1 -h on the root dir.

The actual file(s) has to be stored in some directory, right? And then
the hard links point to this directory. Well they are all pointing to
/01-01-2004 instead of /Current.

Max
--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Copying hard-linked tree structure

2004-01-07 Thread jw schultz
On Wed, Jan 07, 2004 at 03:18:40PM -0600, Max Kipness wrote:
I have a tree structure on one server similar to the following:
 
/Current
/01-04-2003
/01-03-2003
 
etc...
 
/Current holds the most recent rsynced data, and the date 
directories are created with cp -al on a daily basis so they are 
hard-linked. I'm going back 60 days.
 
    The question is how can I move this entire structure to a
    new server and preserve the links from the date directories
    to the /Current directory?
   
   Well, I ended up rsyncing the root directory to the new server with
   the -H option and it seemed to work. I have 30 directories for 30
   days of rotating backups.
   
   However, I had a dir called /Current that had 12GB, and then all the
   /date directories had 120MB, 60MB, etc... the daily changes that
   occurred. Well, now the directory called /01-01-2004 has 12GB and
   /Current has like 100MB. I guess /01-01-2004 went first due to
   sorting.
  
  It has to do with the tool you are using to measure them.
  
   Any way to change /Current back to being the real directory?
   Or does it even matter?
  
  What do you mean, "real"?  With hardlinks, all links for an
  inode are equal.
 
 I'm using du --max-depth=1 -h on the root dir.
 
 The actual file(s) have to be stored in some directory, right? And then
 the hard links point to this directory. Well, they are all pointing to
 /01-01-2004 instead of /Current.

Only symlinks point to another directory entry.  All
hardlinks are equal.  The way you are using it, du is simply
using the directory order to pick which paths to descend
first.  ls -f should list the directory in the same order
that du does.  On the source system the directory order will
be semi-random if you have been creating and deleting
entries for a while.  On the destination they will be in
lexical order, because that was the creation order by rsync
and you haven't mixed that up yet.

Try this,
mv 01-01-2004 01-01-2004-a-long-name
mv 01-01-2004-a-long-name 01-01-2004

Now 01-01-2004 will likely not be the first on the list from
ls -f, and another directory will likely be held responsible
for the space by du.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: TODO hardlink performance optimizations

2004-01-07 Thread Wayne Davison
On Wed, Jan 07, 2004 at 01:30:19AM -0800, jw schultz wrote:
 On Wed, Jan 07, 2004 at 02:45:46AM -0600, John Van Essen wrote:
  The point of this exercise was to find a way to avoid unnecessary
  transfers of already existing files
 I thought the point was to reduce the memory footprint and
 then get rid of the binary search.

They are both desirable goals, and I'd like to see one other:  a
reduction in number of bytes transmitted when sending hard-link data.
If we omit the dev/inode data for items that can't be linked together,
we should be able to save a large amount of transmission size (but
this will require a protocol bump).  Of course this does not mean that
the new optimized hard-link code would require this optimized sending
in order to work.

 - Create the union and change file_struct and the routines
   that reference and populate it to use the union for dev
   and inode.  This may include not allocating the union for
   unlinkable files.

I had been considering possible ways to avoid having the extra pointer
in the file_struct, and a suggestion John made has made me think that
we can leave it out if we allow the file_struct to be of variable
length.  We'd set a flag if it has the extra trailing data, and never
refer to this data if the flag is not set.
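
A sketch of that variable-length layout (hypothetical flag and
accessor names; not committed code):

    #include <stdlib.h>

    #define FLAG_HLINK_DATA (1 << 0)    /* hypothetical flag bit */

    struct file_struct {
        unsigned flags;
        /* ... the usual fixed fields ...
         * If FLAG_HLINK_DATA is set, a struct hlinks was allocated
         * immediately after the fixed fields. */
    };

    struct hlinks {                     /* the head/next pair from above */
        struct file_struct **head_slot;
        struct file_struct *next;
    };

    /* Only valid when FLAG_HLINK_DATA is set. */
    static struct hlinks *hlink_data(struct file_struct *f)
    {
        return (struct hlinks *) (f + 1);
    }

    static struct file_struct *new_file_struct(int linkable)
    {
        size_t len = sizeof (struct file_struct)
                   + (linkable ? sizeof (struct hlinks) : 0);
        struct file_struct *f = calloc(1, len);
        if (f && linkable)
            f->flags |= FLAG_HLINK_DATA;
        return f;
    }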

 - Reduce the hlink_list to just the heads and change
   do_hard_links.

I'm not sure this is worth the cost of copying the bytes, but we'll
see.

 Each of these is a discrete step: when each is complete, the
 code will still function correctly.

Yes.  Nice plan.  If either of you have started coding the next stuff,
let me know -- I'm thinking about doing some coding.

..wayne..


Re: TODO hardlink performance optimizations

2004-01-07 Thread jw schultz
On Wed, Jan 07, 2004 at 03:25:39PM -0800, Wayne Davison wrote:
 On Wed, Jan 07, 2004 at 01:30:19AM -0800, jw schultz wrote:
  On Wed, Jan 07, 2004 at 02:45:46AM -0600, John Van Essen wrote:
   The point of this exercise was to find a way to avoid unnecessary
   transfers of already existing files
  I thought the point was to reduce the memory footprint and
  then get rid of the binary search.
 
 They are both desireable goals, and I'd like to see one other:  a
 reduction in number of bytes transmitted when sending hard-link data.
 If we omit the dev/inode data for items that can't be linked together,
 we should be able to save a large amount of transmission size (but

That would also require increasing the size of flags, so the
savings of 8-16 bytes would be offset somewhat by a 1-byte
increase.  Most likely we'd use 2 bits (SAME_DEV and HAVE_INODE).
That would give us 6 bits for future expansion.

I'd also want to send it for all !IS_DIR files, not just IS_REG.
Otherwise, fixing the failure to preserve links on symlinks,
devices, fifos and sockets would need yet another protocol
bump.
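
A sketch of the sender side of that idea (the XMIT_* bit values
and framing are hypothetical; write_byte()/write_longint() are
the existing io.c primitives):

    #define XMIT_SAME_DEV   (1 << 6)  /* hypothetical: dev same as prev */
    #define XMIT_HAVE_INODE (1 << 7)  /* hypothetical: dev/inode follows */

    if (preserve_hard_links && !S_ISDIR(file->mode)) {
        flags |= XMIT_HAVE_INODE;
        if (file->dev == prev_dev)
            flags |= XMIT_SAME_DEV;
    }
    write_byte(f, flags);               /* or a wider flag word */
    if (flags & XMIT_HAVE_INODE) {
        if (!(flags & XMIT_SAME_DEV))
            write_longint(f, file->dev);
        write_longint(f, file->inode);
        prev_dev = file->dev;
    }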

 this will require a protocol bump).  Of course this does not mean that
 the new optimized hard-link code would require this optimized sending
 in order to work.
 
  - Create the union and change file_struct and the routines
that reference and populate it to use the union for dev
and inode.  This may include not allocating the union for
unlinkable files.
 
 I had been considering possible ways to avoid having the extra pointer
 in the file_struct, and a suggestion John made has made me think that
 we can leave it out if we allow the file_struct to be of variable
 length.  We'd set a flag if it has the extra trailing data, and never
 refer to this data if the flag is not set.

Runtime variable-sized structures should be avoided.  Do you
want to make rdev, link and sum conditional also?  We are
replacing two u64s with one pointer that will often be NULL;
that should be enough.

If you wanted, I suppose you could make rdev, link and sum a
union within file_struct, since they are mutually exclusive
and dependent on IS_*(mode).  That would squeeze another 8
bytes/file with minimal impact on the code.
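
That union would look something like this (member set from the
message above; exact types approximate):

    union {
        DEV64_T rdev;   /* IS_DEVICE(mode): device numbers */
        char *link;     /* IS_LNK(mode):    symlink target */
        char *sum;      /* IS_REG(mode):    checksum data  */
    } u;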

  - Reduce the hlink_list to just the heads and change
do_hard_links.
 
 I'm not sure this is worth the cost of copying the bytes, but we'll
 see.

The cache lines are hot, it will free usable amounts of
memory, and it will simplify subsequent logic without
complicating the code that walks the hlink_list.

 
  Each of these is a discrete step: when each is complete, the
  code will still function correctly.
 
 Yes.  Nice plan.  If either of you have started coding the next stuff,
 let me know -- I'm thinking about doing some coding.

I've not started coding beyond what I've already committed.
John seemed eager to start work on this but I'm not sure of
his status.  Having gotten the design hammered out, he seemed
to wish to take implementation details off-list; I'm sure
he'll be glad to CC you.

The transmission reduction above is largely independent of
the other code.

Q for lurkers:  What is the value of dev and inode on
systems that don't have them?  0 or -1?

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


CVS update: rsync

2004-01-07 Thread Wayne Davison

Date:   Thu Jan  8 04:56:40 2004
Author: wayned

Update of /data/cvs/rsync
In directory dp.samba.org:/tmp/cvs-serv18275

Modified Files:
proto.h 
Log Message:
The latest prototypes.


Revisions:
proto.h 1.166 => 1.167
http://www.samba.org/cgi-bin/cvsweb/rsync/proto.h.diff?r1=1.166&r2=1.167