Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-28 Thread martin f krafft
also sprach Holger Parplies wb...@parplies.de [2011.04.20.2001 +0200]:
> We'd all like to be able to choose an existing *pool file* as
> reference - this would save us transfers of *any* file already
> existing in the pool (e.g. from other hosts). Unfortunately, this
> is technically not possible without a specialized BackupPC client.
[…]
> I hope that clears things up a bit.

Yes, thanks!



also sprach Jeffrey J. Kosowsky backu...@kosowsky.org [2011.04.21.0707 +0200]:
> Holger Parplies wrote at about 20:01:28 +0200 on Wednesday, April 20, 2011:
> > 4.) There *was* an attempt to write a specialized BackupPC client
> >     (BackupPCd) quite a while back. I believe this was given up for lack
> >     of human resources. I always found this matter rather interesting,
> >     but I've never gotten around to even taking a look at the code, let
> >     alone do anything with it.
>
> Interesting... it might be worthwhile to revisit this in the context
> of the new BackupPC ver 4. In particular, with ver 4 using full-file
> md5sums (and potentially other sha checksums), one could imagine an
> rsync extension that, in the absence of a matching file, would first
> compute the local md5sum and transmit it to the BackupPC server to
> look for an exact pool match...

Either that, or maybe it would be possible to work on extending the
rsync protocol, so that a future version could carry additional data
alongside the transfer.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
it is only the modern that ever becomes old-fashioned.
-- oscar wilde
 
spamtraps: madduck.bo...@madduck.net


digital_signature_gpg.asc
Description: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current)
--
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-20 Thread Holger Parplies
Hi,

martin f krafft wrote on 2011-04-17 16:43:07 +0200 [Re: [BackupPC-users]
Renaming files causes retransfer?]:
> also sprach John Rouillard rouilj-backu...@renesys.com [2011.04.17.1625
> +0200]:
> > > In terms of backuppc, this means that the files will have to be
> > > transferred again, completely, right?
> >
> > Correct.
>
> Actually, I just did a test, using iptables to count bytes between
> the two hosts, and then renamed a 33M file. backuppc, using rsync,
> only transferred 370k. Hence I think that it actually does *not*
> transfer the whole file.

it always feels strange to contradict reality, but, in theory, there is no way
to get around transferring the file.

For the rsync algorithm to work, you need a local reference copy of the
file you want to transfer. While you and I know that there *is* a local copy,
BackupPC would need to know (a) that there is and (b) where to find it. The
only available information at the point in time where this decision needs to
be made is the (new) file name. For this, there is no candidate in the
reference backup (or any other backup, for that matter). So the file needs to
be transferred in full.

We'd all like to be able to choose an existing *pool file* as reference - this
would save us transfers of *any* file already existing in the pool (e.g. from
other hosts). Unfortunately, this is technically not possible without a
specialized BackupPC client.

> (btw, I also think that what I wrote in
> http://comments.gmane.org/gmane.comp.sysutils.backup.backuppc.general/24352
> is wrong, but I shall follow up on this when I have verified my
> findings).

Is that a backuppc-users thread I somehow missed? I see where your question
is going now, so I'll go into a bit more detail (not sure if any of this was
already mentioned in that thread).

1.) BackupPC uses already existing transfer methods for the sake of not
needing to install anything non-mainstream on the clients. In your case,
that is probably ssh + rsync.
Consequently, BackupPC is limited to what the rsync protocol will
allow, which does *not* include "hey, send me the 1st and 8th 128kB
chunks of the file before I'll tell you the checksum I have on my side".
Such a request just doesn't make any sense for standalone rsync. We need
to select a candidate before we can start transferring blocks that don't
match (and skip blocks that do). It's really quite obvious, if you think
about it, and it only gets more complicated (but doesn't change) if you go
into the details of which rsync end plays which role in the file delta
exchange.

The same is basically true for tar and smb, respectively. The remote end
decides what data to transfer (which is the whole file or nothing), and you
can take it or ignore it, but you can't prevent it from being transferred.

2.) BackupPC reads the first 1MB into memory. It needs to do so to determine
the pool file name. That should not be a problem memory-wise.

3.) BackupPC cannot, obviously, read any arbitrary size file into memory. It
also wants to avoid unnecessary (possibly extremely large) writes to the
pool FS. So it does this:
- Determine pool file candidates (possibly several, in case of pool
  collisions).
- Read pool file candidates in parallel with the network transfer.
- As soon as something doesn't match, discard the respective candidate.
- If that was the last available candidate, copy everything so far (which
  *did* match) from that candidate to a new file.
  We need to get this content from somewhere, and the network stream is,
  obviously, not seekable, so we can't re-get it from there (but then, we
  don't need to and wouldn't want to, because, hopefully, our local disk
  is faster ;-).
- If the whole candidate file matched our complete network stream, we
  have a pool match and only need to link to that.

4.) There *was* an attempt to write a specialized BackupPC client (BackupPCd)
quite a while back. I believe this was given up for lack of human
resources. I always found this matter rather interesting, but I've never
gotten around to even taking a look at the code, let alone do anything
with it.
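The pool-matching steps above can be sketched in a few lines of shell.
This is an illustration only: BackupPC's actual pool hash is more
involved (it mixes in the file length, among other details), so treat
the digest below as a stand-in for the real pool key, not its format.

```shell
#!/bin/sh
# Illustration only: key a pool lookup by a digest of the file's
# first 1MB -- a stand-in for BackupPC's real (more involved) pool hash.
pool_key() {
    head -c 1048576 "$1" | md5sum | awk '{print $1}'
}

printf 'hello world\n' > /tmp/poolkey-demo.dat
pool_key /tmp/poolkey-demo.dat
```

Pool files whose keys collide are the "candidates" that then get read in
parallel with the network stream, as described in point 3.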

I hope that clears things up a bit.

Regards,
Holger



Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-20 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 20:01:28 +0200 on Wednesday, April 20, 2011:
> 4.) There *was* an attempt to write a specialized BackupPC client (BackupPCd)
>     quite a while back. I believe this was given up for lack of human
>     resources. I always found this matter rather interesting, but I've never
>     gotten around to even taking a look at the code, let alone do anything
>     with it.
  
Interesting... it might be worthwhile to revisit this in the context
of the new BackupPC ver 4. In particular, with ver 4 using full-file
md5sums (and potentially other sha checksums), one could imagine an
rsync extension that, in the absence of a matching file, would first
compute the local md5sum and transmit it to the BackupPC server to
look for an exact pool match...
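A purely hypothetical sketch of that lookup, with an invented pool
layout keyed by full-file md5sum -- nothing here is real BackupPC or
rsync protocol, and the paths are examples:

```shell
#!/bin/sh
# Hypothetical: a pool directory keyed by full-file md5sum, checked
# for an exact match before any transfer. Layout and paths invented.
pool=/tmp/demo-pool
mkdir -p "$pool"

printf 'payload\n' > /tmp/demo-file.dat
digest=$(md5sum /tmp/demo-file.dat | awk '{print $1}')
cp /tmp/demo-file.dat "$pool/$digest"   # pretend a prior backup pooled it

# Later, before transferring the (possibly renamed) file:
if [ -e "$pool/$digest" ]; then
    echo "pool hit: skip transfer, just link"
fi
```

A renamed file would hash to the same digest, so the server could link
to the existing pool file instead of receiving the content again.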



Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-18 Thread Tyler J. Wagner
On Sun, 2011-04-17 at 15:43 -0400, Jeffrey J. Kosowsky wrote:
> Well... you could write a script (or even a one-liner) to do the same
> name change (modulo f-mangling) on the last backup... this would be
> pretty easy if your name change is well-defined...

What about the file attributes stored in the attrib file in each subdir
of the pc tree?

Regards,
Tyler

-- 
A human being should be able to change a diaper, plan an invasion,
butcher a hog, conn a ship, design a building, write a sonnet, balance
accounts, build a wall, set a bone, comfort the dying, take orders, give
orders, cooperate, act alone, solve equations, analyze a new problem,
pitch manure, program a computer, cook a tasty meal, fight efficiently,
die gallantly. Specialization is for insects.
   -- Lazarus Long, Time Enough for Love, by Robert A. Heinlein




[BackupPC-users] Renaming files causes retransfer?

2011-04-17 Thread martin f krafft
Dear list,

we are facing a policy change requiring people to rename data files
in a trivial way (replace ':' with '-').

In terms of backuppc, this means that the files will have to be
transferred again, completely, right?

Or is there a way in which I can prepare the server for this change
and prevent the completely unnecessary transfer of terabytes of
data, just so backuppc can find out that the data haven't changed?

Thanks,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
when faced with a new problem, the wise algorithmist
 will first attempt to classify it as np-complete.
 this will avoid many tears and tantrums as
 algorithm after algorithm fails.
  -- g. niruta
 
spamtraps: madduck.bo...@madduck.net




Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-17 Thread John Rouillard
On Sun, Apr 17, 2011 at 09:23:07AM +0200, martin f krafft wrote:
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-').
>
> In terms of backuppc, this means that the files will have to be
> transferred again, completely, right?

Correct.
 
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of
> data, just so backuppc can find out that the data haven't changed?

I assume you are using rsync as your backup method. In that case I claim
you can prepare the server, though YMMV: not valid in months with a full
moon or on days whose English name ends in 'y', etc. It requires surgery
on your last valid backup to account for the renaming, and may make that
backup invalid for restoration.

I have had this work (I believe so because the backup time and bytes
transferred were much less than retransferring the files would have
taken) a few times. I suggest taking two full backups just before the
rename. The first captures any data that has changed and can be used for
restores. The second is the one you are going to operate on.

Let's call the top of your backuppc data dir (where the cpool, pool,
and pc directories reside) /backuppc. Let's assume the files are being
renamed on the host client1 in the share /data and the (sub)directory
/set1. You will have a directory:

  /backuppc/pc/client1/backupnumber/f%2fdata/fset1

Under there, each file or directory will be represented as ffilename or
fdirectoryname. Change into the /backuppc/pc/client1/backupnumber
directory, where backupnumber is the number of your last full backup.
Navigate to the directory containing a file that is going to be renamed
and change its (mangled) name. So

  mv f20110204_11:23.dat f20110204_11-23.dat

for example.
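Scripted over a whole directory, that surgery might look like the sketch
below. The demo directory is a stand-in for a real backup tree such as
/backuppc/pc/client1/backupnumber/f%2fdata/fset1; adapt the glob to your
own naming pattern.

```shell
#!/bin/sh
# Bulk version of the mv above: rename every mangled file containing
# ':' to its '-' form, mirroring the client-side rename.
dir=/tmp/fset1-demo     # stand-in for the real backup tree directory
mkdir -p "$dir"
touch "$dir/f20110204_11:23.dat" "$dir/f20110301_09:05.dat"

cd "$dir" || exit 1
for f in f*:*; do
    [ -e "$f" ] || continue                      # glob matched nothing
    mv "$f" "$(printf '%s' "$f" | tr ':' '-')"   # ':' -> '-'
done
ls
```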

Once you have done the surgery on the pc tree and the renames have
occurred on client1, run another full backup. What should happen is:

  - the rsync full backup will use the last backup (i.e. the full you
    operated on) as its reference backup;
  - since the file names in the reference backup match the file names
    on client1, it should do a block comparison rather than
    transferring the file(s) all over again.

Since you moved the file to the new name, it will still be linked into
the pool; if you copy the file instead, that will not be the case (and
your storage needs will grow, since this surgery won't cause the altered
backup to be linked into the cpool). Rather than move, I suppose you
could use ln if you wanted to keep both sets of names in the modified
backup.

Note that there is an attrib file in the same directory as your data. It
is a binary file that is needed to restore information like
uid/gid/mode, etc., when the files are restored. Because you renamed the
files, you won't be able to restore them cleanly from that backup tree,
since there is no entry in the attrib file for the renamed file.
(Linking should get around this issue, but I haven't tried it.)

When I did this (a rename of a bunch of hard disk images backed up
across a WAN), I tried a couple of test files first, ran a full backup,
and verified that I didn't move enough data (or take long enough) to
have copied the images again, so I claim it works.

Good luck. Use at your own risk etc.

I hope a supported mechanism for this will be in backuppc version 4,
along with some way of easily importing a copy of the data taken by
other means (e.g. in a tarball on a hard drive), as this comes up often
enough with users who have large data sets.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-17 Thread martin f krafft
also sprach John Rouillard rouilj-backu...@renesys.com [2011.04.17.1625
+0200]:
> > In terms of backuppc, this means that the files will have to be
> > transferred again, completely, right?
>
> Correct.

Actually, I just did a test, using iptables to count bytes between
the two hosts, and then renamed a 33M file. backuppc, using rsync,
only transferred 370k. Hence I think that it actually does *not*
transfer the whole file.

(btw, I also think that what I wrote in
http://comments.gmane.org/gmane.comp.sysutils.backup.backuppc.general/24352
is wrong, but I shall follow up on this when I have verified my
findings).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
beauty, brains, availability, personality; pick any two.
 
spamtraps: madduck.bo...@madduck.net




Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-17 Thread Les Mikesell
On 4/17/11 2:23 AM, martin f krafft wrote:
> Dear list,
>
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-').
>
> In terms of backuppc, this means that the files will have to be
> transferred again, completely, right?
>
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of
> data, just so backuppc can find out that the data haven't changed?

If you are on a reasonably fast local LAN with the clients, it may not
be a serious problem, since the server will discard the file as soon as
it detects the duplicate data content and a transfer isn't much slower
than a normal full's read for checksum comparisons. However, you should
probably force a full run afterward or make the change immediately ahead
of a scheduled full. If you are doing incrementals without incremental
levels, you'll transfer the changed file every run until a full.

If you are on a slower WAN connection, you might need to follow some of
the other advice about renaming the underlying backuppc archive files.
If I had to do it, I'd try making another hardlink in the last full tree
so that both the old mangled name and the expected new one appear (i.e.
link the new name to the existing one; you don't need to figure out the
pool location) - but I really don't know if that is good advice or not.
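That hardlink idea might look like the sketch below. File and directory
names are examples only, and - as above - this is untested advice:

```shell
#!/bin/sh
# Sketch of the hardlink approach: give the stored file a second
# mangled name, so the old and the expected new name share one inode
# (and thus both stay linked into the pool).
dir=/tmp/fulltree-demo      # stand-in for the last full's backup tree
mkdir -p "$dir"
printf 'image data\n' > "$dir/f20110204_11:23.dat"

# Link the new name to the existing one -- no pool lookup required.
ln "$dir/f20110204_11:23.dat" "$dir/f20110204_11-23.dat"

ls -l "$dir"    # both names now show a link count of 2
```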

-- 
Les Mikesell
 lesmikes...@gmail.com



Re: [BackupPC-users] Renaming files causes retransfer?

2011-04-17 Thread Jeffrey J. Kosowsky
martin f krafft wrote at about 09:23:07 +0200 on Sunday, April 17, 2011:
> Dear list,
>
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-').
>
> In terms of backuppc, this means that the files will have to be
> transferred again, completely, right?
>
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of
> data, just so backuppc can find out that the data haven't changed?
>
> Thanks,

Well... you could write a script (or even a one-liner) to do the same
name change (modulo f-mangling) on the last backup... this would be
pretty easy if your name change is well-defined... you might even be
able to use the 'rename' Unix function.



[BackupPC-users] Renaming files causes retransfer?

2011-02-20 Thread martin f krafft
Dear list,

we are facing a policy change requiring people to rename data files
in a trivial way (replace ':' with '-').

In terms of backuppc, this means that the files will have to be
transferred again, completely, right?

Or is there a way in which I can prepare the server for this change
and prevent the completely unnecessary transfer of terabytes of
data, just so backuppc can find out that the data haven't changed?

Thanks,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
my experience is that as soon as people are old enough to know better,
 they don't know anything at all.
-- oscar wilde
 
spamtraps: madduck.bo...@madduck.net

