[BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Tate1

Thanks to all! And sorry if I don't understand some answers properly; as I
said, my English is not very good.

OK, I'll talk with my boss and see what he lets me do with the server. If
there isn't any other solution, I will set up RAID between the disks.

Just out of curiosity: could I install two BackupPC instances, one on each
HDD? I mean, Disk1 as it is now with the OS and the current BackupPC backing
up PC1, and a second BackupPC on Disk2 backing up PC2.

It's probably nonsense... but it's an idea.






Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Skip Guenter


On Tue, 2009-06-02 at 16:36 +1000, Adam Goryachev wrote:
> -BEGIN PGP SIGNED MESSAGE-
> > With a modern filesystem capable of multiple copies of each file this
> > can be overcome. ZFS can handle multiple drive failures by selecting the
> > number of redundant copies of each file to store on different physical
> > volumes.  Simply put, a ZFS RAIDZ with 4 drives can be set to have 3
> > copies which would allow 2 drives to fail.  This is somewhat better than
> > RAID1 and RAID5  both because more storage is available yet still allows
> > up to 2 drives to fail before leaving a rebuild hole where the storage
> > is vulnerable to a single drive failure during a rebuild or resilver.
> 
> So, using 4 x 100G drives provides 133G usable storage... we can lose
> any two drives without any data loss. However, from my calculations
> (which might be wrong), RAID6 would be more efficient. On a 4 drive 100G
> system you get 200G available storage, and can lose any two drives
> without data loss.

So isn't this the same as RAID10 w/ 4 drives, 200GB and can lose 2
drives (as long as they aren't on the same mirror) and no risk of
corrupted parity blocks?
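
For what it's worth, the arithmetic behind those numbers is easy to check.
A quick back-of-envelope sketch (Python, assuming four equal 100G drives and
ignoring filesystem overhead):

    # Usable-capacity arithmetic for the layouts discussed above
    # (illustrative only; sizes in GB, four equal drives assumed).

    def raid6(n, size):
        # two drives' worth of parity; survives any 2 failures
        return (n - 2) * size

    def raid10(n, size):
        # mirrored pairs; survives 2 failures only if they hit different mirrors
        return n * size // 2

    def zfs_copies(n, size, copies):
        # rough model of ZFS copies=N: every block stored N times
        return n * size // copies

    print(raid6(4, 100))          # 200
    print(raid10(4, 100))         # 200
    print(zfs_copies(4, 100, 3))  # 133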




Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-02 Thread Tino Schwarze
On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote:

> Is the blockdevel-level rsync-like solution going to be something 
> publicly available? 

blockdev-level rsync smells like drbd. I'm not sure whether it supports
such huge amounts of unsynchronized data, but it might just be a matter
of configuration.

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



[BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
I have read with interest various threads on this list concerning 
methods of how to back up a backuppc server to a remote file system over 
the internet. My impression from reading the threads is that there is no 
*good* way - that rsync is a poor choice if you have many hardlinks, and 
methods like copying a "snapshot"  of a block-level device are 
inefficient if only a relatively small proportion of the data changes. I 
have tried both methods, and am not satisfied with the performance and 
efficiency of either. In addition, BackupPC is not compatible with 
'cloud' storage systems - at least the ones I have looked at do not seem 
to support hardlinks.

As a Linux newbie, I have only a partial understanding of the technology 
underlying Linux and BackupPC, but I get the impression that the problem 
with a rsync-like solution is that processing hardlinks is very 
expensive in terms of cpu time and memory resources. This may be a 
stupid question, but, if hardlinks are the problem, has any thought been 
given to adding to BackupPC an option to use some form of database 
(text, SQL or otherwise) to associate hashes to files, instead? It seems 
to me that using hardlinks is in fact using that feature of the file 
system *as* a database, a use that does not appear to be optimal ... if 
I have misunderstood, please educate me :-)

Peter



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Tino Schwarze
On Tue, Jun 02, 2009 at 06:27:35AM -0400, Peter Walter wrote:

> As a Linux newbie, I have only a partial understanding of the technology 
> underlying Linux and BackupPC, but I get the impression that the problem 
> with a rsync-like solution is that processing hardlinks is very 
> expensive in terms of cpu time and memory resources. This may be a 
> stupid question, but, if hardlinks are the problem, has any thought been 
> given to adding to BackupPC an option to use some form of database 
> (text, SQL or otherwise) to associate hashes to files, instead? It seems 
> to me that using hardlinks is in fact using that feature of the file 
> system *as* a database, a use that does not appear to be optimal ... if 
> I have misunderstood, please educate me :-)

An SQL approach would be rather complicated because it would have to
support a directory structure. We would end up with ... a filesystem!
The nice thing about using hardlinks is that the operating system keeps
track of the link count and we can use that link count to check for
superfluous files. This might be doable in a database as well, but
we'd have to keep a file system and a database in sync. Doable, but
error-prone. With the current design, there is only a file system.
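
For illustration, the link-count check is roughly this (a Python sketch of
the idea, not BackupPC's actual code; the pool path is an assumption):

    import os

    pool_dir = "/var/lib/backuppc/cpool"   # hypothetical pool location

    for dirpath, dirnames, filenames in os.walk(pool_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # a pool file whose link count has dropped to 1 is no longer
            # referenced by any backup tree, so a cleanup pass could delete it
            if os.lstat(path).st_nlink == 1:
                print("orphaned pool file:", path)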

Tino, not doing backups of the pool, but archiving hosts to tape.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Mirco Piccin
Hi

> Tino, not doing backups of the pool, but archiving hosts to tape.

Thanks, Tino; I also think that this is the best solution: using the
archive tool.
Regards
M



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-02 Thread Pieter Wuille
On Tue, Jun 02, 2009 at 11:44:11AM +0200, Tino Schwarze wrote:
> On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote:
> 
> > Is the blockdevel-level rsync-like solution going to be something 
> > publicly available? 

We certainly intend to, but there's no guarantee it will ever get finished.
Apart from the implementation there's not that much work left, but we do it
in our free time.

It really seems strange that something like this doesn't exist yet (and rsync
itself doesn't support block devices).

> blockdev-level rsync smells like drbd. I'm not sure whether it support
> such huge amounts of unsynchronized data, but it might just be a matter
> of configuration.

In fact, I'd say LVM should be able to do this: generate a block-level diff
between two snapshots of the same volume, and create a new snapshot/volume
based on an old one plus a diff. ZFS, for example, supports this using
send/receive. So far, I haven't read about LVM supporting such a feature.
On the other hand, I think I've read on this list that using zfs send/receive
for BackupPC pools was very slow (but that's at the filesystem level, not
the block-device level).
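
The basic operation is simple enough to sketch. Below is a very naive
illustration in Python (it reads both snapshot devices sequentially, assumes
equal sizes, and invents its own diff format purely for the example):

    CHUNK = 1024 * 1024  # 1 MiB chunks; purely illustrative

    def block_diff(old_dev, new_dev, diff_out):
        """Record (offset, data) for every chunk that differs between two
        same-sized snapshot devices. A real tool would also handle differing
        sizes, checksums, compression, etc."""
        with open(old_dev, "rb") as old, open(new_dev, "rb") as new, \
             open(diff_out, "wb") as out:
            offset = 0
            while True:
                a = old.read(CHUNK)
                b = new.read(CHUNK)
                if not b:
                    break
                if a != b:
                    out.write(offset.to_bytes(8, "big"))
                    out.write(len(b).to_bytes(4, "big"))
                    out.write(b)
                offset += CHUNK

    def apply_diff(base_dev, diff_in):
        """Replay the recorded chunks onto a copy of the old snapshot."""
        with open(diff_in, "rb") as diff, open(base_dev, "r+b") as dev:
            while True:
                hdr = diff.read(12)
                if len(hdr) < 12:
                    break
                offset = int.from_bytes(hdr[:8], "big")
                length = int.from_bytes(hdr[8:12], "big")
                dev.seek(offset)
                dev.write(diff.read(length))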

DRBD might be a solution too - I haven't looked at it closely. It seems
aimed more at high availability, but it can probably be used for offsite
backups too. It has support for recovery after disconnection/failure, so
maybe you could use it to keep older versions on a remote system by forcibly
disconnecting the nodes. I don't know how easy it would be to migrate a
non-DRBD volume either. Does anyone have experience with this in combination
with BackupPC?

-- 
Pieter



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
Tino Schwarze wrote:
> An SQL approach would be rather complicated because it would have to
> support a directory structure. We would end up with ... a filesystem!
>   

Yes, but SQL databases are not the only game in town. There are other 
database architectures that would be a good choice for supporting a 
hierarchical structure, and, although I am not a Perl programmer, I am 
told by my friends who are into Perl that Perl's superior 
text-processing capabilities make databases built solely on text files 
feasible, without requiring inclusion of or dependencies on another 
database project.

> The nice thing about using hardlinks is that the operating system keeps
> track of the link count and we can use that link count to check for
> superfluous files. This might be doable in a database as well, but
> we'd have to keep a file system and a database in sync. Doable, but
> error-prone. With the current design, there is only a file system.
>   
Yes, but there are significant tradeoffs. With the current design, you 
can't use BackupPC effectively to back up a BackupPC server - a 
surprising limitation for arguably the best open-source backup program. 
You can't use file systems that don't support hardlinks. You can't use 
'cloud' storage. I understand that keeping a file system and a database 
in sync opens the door for error - but there are many techniques that 
can be used to maximize database integrity at the expense of processing 
time and memory resources.

> Tino, not doing backups of the pool, but archiving hosts to tape.
>   
Yes, but my need, and I am sure, the need of many others, is to do 
remote backups. Tape works if you are local only. Additionally, tapes 
are not scalable (you can't just run out and get a bigger tape if you 
need one - you have to change your entire tape infrastructure).

I urge the developers to consider *adding* a solution that does not 
depend on hardlinks. Such a solution could be run in parallel with using 
hardlinks and be backward-compatible.

Peter




[BackupPC-users] why hard links?

2009-06-02 Thread Josh Rubin
I hope this question isn't too dumb, but why does backuppc need to use
hard links?
Of course, nobody wants to change a widely used program in a way that
breaks everything,
so this is really a question about some new, future program.
Isn't the pool just a form of hash table that could be implemented lots
of ways?

-- 
Josh Rubin jlru...@gmail.com



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Stephen Joyce
I'd like to see BackupPC adopt an asynchronous transactional replication 
scheme for replicating servers. Given the way BPC uses the underlying 
filesystem, it seems the most robust and flexible choice.


Cyrus Imapd (which can scale to at least hundreds of thousands of users and 
millions of mailboxes) chose this model of replication over 5 years ago and 
once set up, it works pretty well.


Details at:
http://cyrusimap.web.cmu.edu/imapd/install-replication.html
http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html

Feel free to discuss, or to move to backuppc-devel.

--
Cheers,
Stephen

On Tue, 2 Jun 2009, Tino Schwarze wrote:


On Tue, Jun 02, 2009 at 06:27:35AM -0400, Peter Walter wrote:


As a Linux newbie, I have only a partial understanding of the technology
underlying Linux and BackupPC, but I get the impression that the problem
with a rsync-like solution is that processing hardlinks is very
expensive in terms of cpu time and memory resources. This may be a
stupid question, but, if hardlinks are the problem, has any thought been
given to adding to BackupPC an option to use some form of database
(text, SQL or otherwise) to associate hashes to files, instead? It seems
to me that using hardlinks is in fact using that feature of the file
system *as* a database, a use that does not appear to be optimal ... if
I have misunderstood, please educate me :-)


An SQL approach would be rather complicated because it would have to
support a directory structure. We would end up with ... a filesystem!
The nice thing about using hardlinks is that the operating system keeps
track of the link count and we can use that link count to check for
superfluous files. This might be doable in a database as well, but
we'd have to keep a file system and a database in sync. Doable, but
error-prone. With the current design, there is only a file system.

Tino, not doing backups of the pool, but archiving hosts to tape.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Peter Walter wrote at about 06:27:35 -0400 on Tuesday, June 2, 2009:
 > I have read with interest various threads on this list concerning 
 > methods of how to back up a backuppc server to a remote file system over 
 > the internet. My impression from reading the threads is that there is no 
 > *good* way - that rsync is a poor choice if you have many hardlinks, and 
 > methods like copying a "snapshot"  of a block-level device are 
 > inefficient if only a relatively small proportion of the data changes. I 
 > have tried both methods, and am not satisfied with the performance and 
 > efficiency of either. In addition, BackupPC is not compatible with 
 > 'cloud' storage systems - at least the ones I have looked at do not seem 
 > to support hardlinks.
 > 
 > As a Linux newbie, I have only a partial understanding of the technology 
 > underlying Linux and BackupPC, but I get the impression that the problem 
 > with a rsync-like solution is that processing hardlinks is very 
 > expensive in terms of cpu time and memory resources. This may be a 
 > stupid question, but, if hardlinks are the problem, has any thought been 
 > given to adding to BackupPC an option to use some form of database 
 > (text, SQL or otherwise) to associate hashes to files, instead? It seems 
 > to me that using hardlinks is in fact using that feature of the file 
 > system *as* a database, a use that does not appear to be optimal ... if 
 > I have misunderstood, please educate me :-)
 > 
 > Peter
 > 

Indeed this has been discussed many times before ;) -- see the archives.

That being said, I agree that using a database to store both the
hardlinks along with the metadata stored in the attrib files would be
a more elegant, extensible, and platform-independent solution though
presumably it would require a major re-write of BackupPC.

I certainly understand why BackupPC uses hardlinks since it allows for
an easy way to do the pooling and in a sense as you suggest uses the
filesystem as a rudimentary database.

On the other hand as I and others have mentioned before moving to a
database would add the following advantages:

1. Platform and filesystem independence -- BackupPC would no longer
   depend on the specific hard link behaviors of linux and associated
   filesystems.

2. It would be easier to extend the attrib notion to store extended
   attributes whether for Linux (e.g., selinux attributes), Windows
   (e.g., ACL attributes) or any other OS.

3. The pool could be split among multiple disks and filesystems since
   it would no longer depend on hard-link behavior

4. Backing up BackupPC backups would be much easier and faster since
   you no longer would have hard links to worry about -- just backup
   the database and any portion of the pool that you want to.

5. The whole system would be more elegant and extensible since all
   types of metadata could be stored in the database rather than being
   stored in various files in the BackupPC tree. For example,
 - You wouldn't need the kludge of file mangling
 - Checksums could be stored in the database rather than being
   appended in a non-standard way to the end of the file
 - File level encryption could easily be added
 - Alternative file-level compression schemes could easily be
   supported.
 - The host-specific config data (and maybe even all the config
   data) could be stored in tables rather than in individual
   config files
 - The 'backups' file could also be stored as a table

6. Presumably a database architecture would also make it easier to
   have more granular control over user access and permissions at the
   feature and file level.

The challenge though is that to do this right (i.e. in a way that is
both elegant and extensible) would require a substantial if not almost
complete re-write of BackupPC and I'm not sure that Craig (or anybody
else for that matter) are willing to sign up for that...

Still, it would be awesome to combine the simplicity and pooling
structure of BackupPC with the flexibility of a database
architecture...
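
Just to make the idea concrete, here is a purely hypothetical sketch of what
such a schema might look like (SQLite via Python; none of these table or
column names come from BackupPC itself):

    import sqlite3

    # Hypothetical sketch only -- NOT BackupPC's design, just an illustration
    # of what "pool + metadata in a database" could look like.
    con = sqlite3.connect("backuppc.db")
    con.executescript("""
    CREATE TABLE IF NOT EXISTS pool (
        digest      TEXT PRIMARY KEY,   -- content hash of the file
        size        INTEGER,
        compression TEXT,
        refcount    INTEGER             -- replaces the hard-link count
    );
    CREATE TABLE IF NOT EXISTS backups (
        backup_id   INTEGER PRIMARY KEY,
        host        TEXT,
        num         INTEGER,
        type        TEXT,               -- 'full' or 'incr'
        start_time  INTEGER
    );
    CREATE TABLE IF NOT EXISTS files (
        backup_id   INTEGER REFERENCES backups(backup_id),
        path        TEXT,               -- no name mangling needed
        digest      TEXT REFERENCES pool(digest),
        mode        INTEGER,
        uid         INTEGER,
        gid         INTEGER,
        mtime       INTEGER,
        xattrs      BLOB                -- room for ACLs / selinux labels
    );
    """)
    con.commit()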




Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Les Mikesell
Adam Goryachev wrote:
>
> Is it really worthwhile considering a 3 drive RAID1 system, or even a 4
> drive RAID1 system (one hot spare). Of course, worthwhile depends on the
> cost of not having access to the data, but from a "best practice" point
> of view. ie, Looking at any of the large "online backup" companies, or
> gmail backend, etc... what level of redundancy is considered acceptable.
> (Somewhat surprising actually that google/hotmail/yahoo/etc have ever
> lost any data...)

I use a 3-member software RAID1 with one of the members created as 
'missing', then periodically connect an equal-sized drive, add it to the 
array and let it sync, then fail and remove it and rotate offsite.  I 
used to use external firewire drives as the rotated media, then switched 
to SATA in a trayless hot-swap enclosure.  This leaves a pair of 
internal drives running as mirrors all the time (and yes, I have had 
these fail, including a time when I only had one internal drive and had 
to copy back from a week-old version).  The advantage of this approach 
is that the mirroring to the rotating drives can happen with the system 
still running and the partition mounted.  I only stop backuppc and 
unmount it momentarily while failing the drive - and even if I didn't, 
it would recover like it would if the system had crashed at that point. 
  Realistically, though, the disk is pretty busy during the sync and it 
would not work well to have much other activity happening.  It takes 
about 2 hours to sync a 750 Gig partition if there are no backups or 
restores running.  If you didn't have the raid set up this way you could 
simply stop backuppc, unmount the archive partition and use dd to 
image-copy to an equal sized drive or partition.

I also have a USB cable adapter for the drive so I can mount the offsite 
copies in a laptop or elsewhere in case of emergencies.
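
For anyone wanting to script the rotation, the cycle is roughly the
following (sketched here in Python around mdadm; the array and device names
are placeholders, and the details of your own setup will differ):

    import subprocess

    ARRAY = "/dev/md0"        # placeholder array name
    ROTATED = "/dev/sdc1"     # the member that rotates offsite

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. attach the returning offsite drive and let it resync
    run("mdadm", ARRAY, "--add", ROTATED)

    # 2. wait for the rebuild to finish (poll /proc/mdstat)
    run("bash", "-c",
        "while grep -Eq 'resync|recovery' /proc/mdstat; do sleep 60; done")

    # 3. briefly stop BackupPC / unmount, then detach the synced member
    run("mdadm", ARRAY, "--fail", ROTATED)
    run("mdadm", ARRAY, "--remove", ROTATED)
    # ...then pull the drive and take it offsite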

> So, using 4 x 100G drives provides 133G usable storage... we can lose
> any two drives without any data loss. However, from my calculations
> (which might be wrong), RAID6 would be more efficient. On a 4 drive 100G
> system you get 200G available storage, and can lose any two drives
> without data loss.

The real advantage of RAID1 is that you can access the data on any 
single drive.  The disadvantage is that you are limited to the size of a 
single drive (probably 2TB now) and the speed it can work.

>> ZFS also is able to put metadata on a different volume and even have a
>> cache on a different volume which can spread out the chance of a loss. 
>> very complicated schemes can be developed to minimize data loss.
> 
> In my experience, if it is too complicated:
> 1) Very few people use it because they don't understand it

Well, mostly they don't use it because Linux doesn't include it...

> 2) Some people who use it, use it in-correctly, and then don't
> understand why they lose data (see the discussion of people who use RAID
> controller cards but don't know enough to read the logfile on the RAID
> card when recovering from failed drives).

Again, it is hard to beat software RAID1 for data recovery.  Take any 
drive that still works, plug into any computer that still works, mount 
it and go.

> Last time I heard of someone using ZFS for their backuppc pool under
> linux, they didn't seem to consider it ready for production use due to
> the significant failures. Is this still true, or did I mis-read something?

ZFS doesn't work with Linux.  The freebsd version might work well enough 
but it would make more sense to use some flavor of opensolaris if you 
want ZFS now.  It should be possible to use the ZFS incremental 
send/receive function to keep a remote system in sync but I don't think 
anyone has done a serious test with backuppc.
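
If someone does want to try it, the incremental cycle would look roughly
like this (a sketch only -- the dataset name and ssh target are made up, and
nobody seems to have benchmarked this with a real pool):

    import subprocess

    DATASET = "tank/backuppc"      # hypothetical pool filesystem
    REMOTE = "backup@offsite"      # hypothetical replication target

    def sh(cmd):
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    sh(f"zfs snapshot {DATASET}@today")
    # send only the blocks changed since the previous snapshot
    sh(f"zfs send -i {DATASET}@yesterday {DATASET}@today"
       f" | ssh {REMOTE} zfs receive -F {DATASET}")
    sh(f"zfs destroy {DATASET}@yesterday")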

> Personally, I used reiserfs for years, and once or twice had some
> problems with it (actually due to RAID hardware problems). I have
> somewhat moved to ext3 now due to the 'stigma' that seems to be attached
> to reiserfs. I don't want to move to another FS before it is very stable...

I used reiserfs for a while with no problems too. Before ext3 it was the 
best way to avoid long fscks.  However I'm not sure I'd expect an fsck 
to be reliable if you ever do need recovery.

>> On-line rebuilds and
>> filesystems aware of the disk systems are becoming more and more relevant.
> 
> I actually thought it would be better to disable these since it:
> 1) increases wear 'n' tear on the drives
> 2) what happens if you have a drive failure in the middle of the rebuild?
> 
> Mainly the 2nd one scared me the most.

I've had that with a 2-drive RAID1 and you end up with both drives bad. 
  Which is why I now use a 3-drive mirror and 4 drives, one of which is 
always offsite.

--
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] why hard links?

2009-06-02 Thread Les Mikesell
Josh Rubin wrote:
> I hope this question isn't too dumb, but why does backuppc need to use
> hard links?
> Of course, nobody wants to change a widely used program in a way that
> breaks everything,
> so this is really a question about some new, future program.
> Isn't the pool just a form of hash table that could be implemented lots
> of ways?

Creating a link is an atomic operation in the filesystem.  With anything 
else you would have to add the overhead of some locking mechanism (with 
the potential for deadlocks) to be sure that the equivalent to the link 
and the count of them stayed in sync.   And, not too surprisingly it 
turns out that filesystem operations are actually a pretty good way to 
deal with storing files - it might not be impossible to do it better but 
it wouldn't be easy.
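
To make that concrete, pooling by hard link can be sketched in a few lines
(simplified Python; real pools also deal with hash collisions, compression
and chains, and the paths here are made up):

    import hashlib
    import os

    POOL = "/var/lib/backuppc/pool"   # hypothetical pool directory

    def pool_file(new_path):
        with open(new_path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        pool_path = os.path.join(POOL, digest)
        try:
            # atomic in the filesystem: either we create the pool entry...
            os.link(new_path, pool_path)
        except FileExistsError:
            # ...or it already exists, so share its inode instead
            os.unlink(new_path)
            os.link(pool_path, new_path)
        return pool_path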

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Tino Schwarze wrote at about 13:07:29 +0200 on Tuesday, June 2, 2009:
 > On Tue, Jun 02, 2009 at 06:27:35AM -0400, Peter Walter wrote:
 > 
 > > As a Linux newbie, I have only a partial understanding of the technology 
 > > underlying Linux and BackupPC, but I get the impression that the problem 
 > > with a rsync-like solution is that processing hardlinks is very 
 > > expensive in terms of cpu time and memory resources. This may be a 
 > > stupid question, but, if hardlinks are the problem, has any thought been 
 > > given to adding to BackupPC an option to use some form of database 
 > > (text, SQL or otherwise) to associate hashes to files, instead? It seems 
 > > to me that using hardlinks is in fact using that feature of the file 
 > > system *as* a database, a use that does not appear to be optimal ... if 
 > > I have misunderstood, please educate me :-)
 > 
 > An SQL approach would be rather complicated because it would have to
 > support a directory structure. We would end up with ... a filesystem!
 > The nice thing about using hardlinks is that the operating system keeps
 > track of the link count and we can use that link count to check for
 > superfluous files. This might be doable in a database as well, but
 > we'd have to keep a file system and a database in sync. Doable, but
 > error-prone. With the current design, there is only a file system.

I agree a database architecture adds some complexity but I'm not sure I
agree with your other points.

First, the 'attrib' file (along with the attendant complexity involved
in filling-in incremental backups from previous incrementals) is also
in a sense writing a filesystem on top of a filesystem since the
attrib file encodes file types (including files, directories, soft &
hard links, etc.) and file attribs. If anything, it is kludgey in that
things like hard links are represented in a non-natural
manner. Indeed, once the initial database architecture is established,
I think that extending the database architecture to include additional
attributes is far *simpler* than trying to extend the attrib
structure. For example, backuppc still doesn't account for selinux
extended attributes let alone more general linux ACLs or even Windows
ACLs.

I also don't understand or agree with your points about the difficulty
and error prone nature of synchronizing a database with the pool. If
anything I think that sprinkling thousands of attrib files all over
the place is much more complex, error-prone, and harder to guarantee
integrity. In the database world, you just have a single database and
a pool of files. Also modern databases have a lot more tools and
safeguards for checking and maintaining integrity than older
filesystems like ext2/ext3. Finally, once a day, a process similar to
BackupPC_nightly could be run to crawl the database to delete unneeded
pool entries (or it could be done real-time whenever backups are
deleted).

Now a database architecture would probably be slower for some
operations than raw filesystem access, but for other operations such
as restoring an incremental backup, I wouldn't be surprised if a
database system would be much faster since the reconstruction and
inheritance could be optimized within the database rather than having
to open multiple trees of attrib files and to reconstruct the
inheritance logic in real-time.
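
As a rough illustration of that point, resolving an incremental view could
become a single query against the kind of hypothetical schema sketched
earlier in this thread (again, not BackupPC code; it assumes higher
backup_ids are newer and ignores deletions for brevity):

    import sqlite3

    con = sqlite3.connect("backuppc.db")
    chain = (12, 13, 14)   # made-up backup_ids: a full plus its incrementals

    rows = con.execute("""
        SELECT f.path, f.digest, f.mtime
        FROM files AS f
        JOIN (SELECT path, MAX(backup_id) AS latest
              FROM files WHERE backup_id IN (?, ?, ?)
              GROUP BY path) AS newest
          ON f.path = newest.path AND f.backup_id = newest.latest
    """, chain).fetchall()

    for path, digest, mtime in rows:
        print(path, digest)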

In my mind the only major reason not to move to a database
architecture is that it would require a substantial re-write of
BackupPC as pointed out in my earlier note.

 > 
 > Tino, not doing backups of the pool, but archiving hosts to tape.
 > 
 > -- 
 > "What we nourish flourishes." - "Was wir nähren erblüht."
 > 
 > www.lichtkreis-chemnitz.de
 > www.craniosacralzentrum.de
 > 


Re: [BackupPC-users] why hard links?

2009-06-02 Thread Jeffrey J. Kosowsky
Josh Rubin wrote at about 08:51:43 -0400 on Tuesday, June 2, 2009:
 > I hope this question isn't too dumb, but why does backuppc need to use
 > hard links?
 > Of course, nobody wants to change a widely used program in a way that
 > breaks everything,
 > so this is really a question about some new, future program.
 > Isn't the pool just a form of hash table that could be implemented lots
 > of ways?

Yes - you could use a database -- see other active thread. But
BackupPC uses hard links presumably because it was initially easier,
faster, and more elegant.



Re: [BackupPC-users] why hard links?

2009-06-02 Thread Peter Walter
Les Mikesell wrote:
> Josh Rubin wrote:
>   
>> I hope this question isn't too dumb, but why does backuppc need to use
>> hard links?
>> Of course, nobody wants to change a widely used program in a way that
>> breaks everything,
>> so this is really a question about some new, future program.
>> Isn't the pool just a form of hash table that could be implemented lots
>> of ways?
>> 
>
> Creating a link is an atomic operation in the fileystem.  With anything 
> else you would have to add the overhead of some locking mechanism (with 
> the potential for deadlocks) to be sure that the equivalent to the link 
> and the count of them stayed in sync.   And, not too surprisingly it 
> turns out that filesystem operations are actually a pretty good way to 
> deal with storing files - it might not be impossible to do it better but 
> it wouldn't be easy.
>   

Locking mechanisms aren't always necessary. One of the advantages of 
hashed database systems, such as linear hash technology, is that 
collisions are easily managed, and semaphores are usually adequate as 
simple locking mechanisms if they are needed.




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Peter Walter wrote at about 07:44:49 -0400 on Tuesday, June 2, 2009:
 > Tino Schwarze wrote:
 > > An SQL approach would be rather complicated because it would have to
 > > support a directory structure. We would end up with ... a filesystem!
 > >   
 > 
 > Yes, but SQL databases are not the only game in town. There are other 
 > database architectures that would be a good choice for supporting a 
 > hierarchical structure, and, although I am not a Perl programmer, I am 
 > told by my friends who are into Perl that Perl's superior 
 > text-processing capabilities makes databases built solely on text files 
 > feasible, without requiring inclusion of or dependencies on another 
 > database project.
 > 

I think using databases built solely on text files would be slow. In
fact, that is essentially what BackupPC does now in that the attrib
files are really individual database files constructed for each
directory. However, each time they are accessed, they need to be read
in, unpacked, parsed, edited, repacked, rewritten, etc. which is slow
and kludgy. What we really want here is a relational database - that
would allow access to all the attribs and metadata.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
Jeffrey J. Kosowsky wrote:
> Indeed this has been discussed many times before ;) -- see the archives.
>   
Yes, I have reviewed them.
> That being said, I agree that using a database to store both the
> hardlinks along with the metadata stored in the attrib files would be
> a more elegant, extensible, and platform-independent solution though
> presumably it would require a major re-write of BackupPC.
>   


I agree that the advantages you present seem unassailable. I don't think 
it would require a *major* rewrite - but, 
only the authors and maintainers would be able to size the effort 
properly. I think it could first be implemented as a new feature, along 
with the present methods, and the "old" way of doing things becomes 
deprecated over time.

> Still, it would be awesome to combine the simplicity and pooling
> structure of BackupPC with the flexibility of a database
> architecture...
>   
I, for one, would be willing to contribute financially and with my very 
limited skills if Craig, or others, were willing to undertake such an 
effort. Perhaps Craig would care to comment.

Peter




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Jeffrey J. Kosowsky wrote at about 08:57:54 -0400 on Tuesday, June 2, 2009:
 > Peter Walter wrote at about 06:27:35 -0400 on Tuesday, June 2, 2009:
 >  > I have read with interest various threads on this list concerning 
 >  > methods of how to back up a backuppc server to a remote file system over 
 >  > the internet. My impression from reading the threads is that there is no 
 >  > *good* way - that rsync is a poor choice if you have many hardlinks, and 
 >  > methods like copying a "snapshot"  of a block-level device are 
 >  > inefficient if only a relatively small proportion of the data changes. I 
 >  > have tried both methods, and am not satisfied with the performance and 
 >  > efficiency of either. In addition, BackupPC is not compatible with 
 >  > 'cloud' storage systems - at least the ones I have looked at do not seem 
 >  > to support hardlinks.
 >  > 
 >  > As a Linux newbie, I have only a partial understanding of the technology 
 >  > underlying Linux and BackupPC, but I get the impression that the problem 
 >  > with a rsync-like solution is that processing hardlinks is very 
 >  > expensive in terms of cpu time and memory resources. This may be a 
 >  > stupid question, but, if hardlinks are the problem, has any thought been 
 >  > given to adding to BackupPC an option to use some form of database 
 >  > (text, SQL or otherwise) to associate hashes to files, instead? It seems 
 >  > to me that using hardlinks is in fact using that feature of the file 
 >  > system *as* a database, a use that does not appear to be optimal ... if 
 >  > I have misunderstood, please educate me :-)
 >  > 
 >  > Peter
 >  > 
 > 
 > Indeed this has been discussed many times before ;) -- see the archives.
 > 
 > That being said, I agree that using a database to store both the
 > hardlinks along with the metadata stored in the attrib files would be
 > a more elegant, extensible, and platform-independent solution though
 > presumably it would require a major re-write of BackupPC.
 > 
 > I certainly understand why BackupPC uses hardlinks since it allows for
 > an easy way to do the pooling and in a sense as you suggest uses the
 > filesystem as a rudimentary database.
 > 
 > On the other hand as I and others have mentioned before moving to a
 > database would add the following advantages:
 > 
 > 1. Platform and filesystem independence -- BackupPC would no longer
 >depend on the specific hard link behaviors of linux and associated
 >filesytems.
 > 
 > 2. It would be easier to extend the attrib notion to store extended
 >attributes whether for Linux (e.g., selinux attributes), Windows
 >(e.g., ACL attributes) or any other OS.
 > 
 > 3. The pool could be split among multiple disks and filesystems since
 >it would no longer depend on hard-link behavior
 > 
 > 4. Backing up BackupPC backups would be much easier and faster since
 >you no longer would have hard links to worry about -- just backup
 >the database and any portion of the pool that you want to.
 > 
 > 5. The whole system would be more elegant and extensible since all
 >types of metadata could be stored in the database rather than being
 >stored in various files in the BackupPC tree. For example,
 >   - You wouldn't need the kludge of file mangling
 >   - Checksums could be stored in the database rather than being
 > appended in a non-standard way to the end of the file
 >   - File level encryption could easily be added
 >   - Alternative file-level compression schemes could easily be
 > supported.
 >   - The host-specific config data (and maybe even all the config
 > data) could be stored in tables rather than in individual
 > config files
 >   - The 'backups' file could also be stored as a table
 > 
 > 6. Presumably a database architecture would also make it easier to
 >have more granular control over user access and permissions at the
 >feature and file level.
 > 
 > The challenge though is that to do this right (i.e. in a way that is
 > both elegant and extensible) would require a substantial if not almost
 > complete re-write of BackupPC and I'm not sure that Craig (or anybody
 > else for that matter) are willing to sign up for that...
 > 
 > Still, it would be awesome to combine the simplicity and pooling
 > structure of BackupPC with the flexibility of a database
 > architecture...
 > 

One more advantage of a database architecture:

7. Reconstructing incremental backups would be simpler and faster
   since the database could point directly to the file rather than
   having to crawl through a tree of attrib files to reconstruct the
   hierarchy of which files have changed or not.


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> 
> In my mind the only major reason not to move to a database
> architecture is that it would require a substantial re-write of
> BackupPC as pointed out in my earlier note.

Do you actually have any experience with large scale databases?  I think 
most installations that come anywhere near the size and activity of a 
typical backuppc setup would require a highly experienced DBA to 
configure and would have to be spread across many disks to have adequate 
performance.  When you get down to the real issues, normal operation has 
a bottleneck with disk head motion which a database isn't going to do 
any better without someone knowing how to tune it across multiple disks. 
Also, while some databases do offer remote replication, it isn't 
magic either and keeping it working isn't a common skill.

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] why hard links?

2009-06-02 Thread Les Mikesell
Peter Walter wrote:
> Les Mikesell wrote:
>> Josh Rubin wrote:
>>   
>>> I hope this question isn't too dumb, but why does backuppc need to use
>>> hard links?
>>> Of course, nobody wants to change a widely used program in a way that
>>> breaks everything,
>>> so this is really a question about some new, future program.
>>> Isn't the pool just a form of hash table that could be implemented lots
>>> of ways?
>>> 
>> Creating a link is an atomic operation in the fileystem.  With anything 
>> else you would have to add the overhead of some locking mechanism (with 
>> the potential for deadlocks) to be sure that the equivalent to the link 
>> and the count of them stayed in sync.   And, not too surprisingly it 
>> turns out that filesystem operations are actually a pretty good way to 
>> deal with storing files - it might not be impossible to do it better but 
>> it wouldn't be easy.
>>   
> 
> Locking mechanisms aren't always necessary. One of the advantages of 
> hashed database systems, such as linear hash technology, is that 
> collisions are easily managed, and semaphores are usually adequate as 
> simple locking mechanisms if they are needed.

Collisions aren't quite the point - you have to manage that anyway.  The 
hard part is knowing that the final target you link to is the one that 
you wanted, not something created simultaneously by a different 
process doing the same computations, and knowing that the count of 
existing links always matches the actual copies.  The kernel manages 
this automatically when using links.  If you have to add an extra system 
call to lock/unlock around some other operation you'll triple the overhead.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Peter Walter wrote:
> 
>> Still, it would be awesome to combine the simplicity and pooling
>> structure of BackupPC with the flexibility of a database
>> architecture...
>>   
> I, for one, would be willing to contribute financially and with my very 
> limited skills if Craig, or others, were willing to undertake such an 
> effort. Perhaps Craig would care to comment.

The first thing needed would be to demonstrate that there would be an 
advantage to a database approach - like some benchmarks showing an 
improvement in throughput in the TB size range and measurements of the 
bandwidth needed for remote replication.

Personally I think the way to make things better would be to have a 
filesystem that does block-level de-duplication internally. Then most of 
what backuppc does won't even be necessary.   There were some 
indications that this would be added to ZFS at some point, but I don't 
know how the Oracle acquisition will affect those plans.

Meanwhile, if someone has time to kill doing benchmark measurements, 
using ZFS with incremental send/receive to maintain a remote filesystem 
snapshot would be interesting.  Or perhaps making a vmware vmdk disk 
with many small (say 1 or 2 gig) elements and running backuppc in a 
virtual machine.  Then for replication, stop the virtual machine and 
rsync the directory containing the disk image files.  This might even be 
possible without stopping if you can figure out how vmware snapshots work.

-- 
   Les Mikesell
lesmikes...@gmail.com






Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Robert J. Phillips
I like the idea of a discussion about the advantages and disadvantages
of RAID arrangements.

 

Another solution for backing up with two hard drives might be to use RAID
0 (striping).  This does not provide redundancy, but it does let you
combine the drives so the system sees them as one drive.

 

I have set up an Ubuntu 8.10 server with a 250 GB boot disk and four 1 TB
SATA hard drives in RAID 5 (using software RAID).

 

I trust RAID 5.  We have a mail server that uses RAID 5, and twice we have
lost one of the drives and been able to get a new drive, put it back
into the array and let it rebuild, with no downtime except the time to
put the new drive in (20 minutes).

 

Something else I learned about RAID 5 and 6 from my research into a
blade server and a SAN is that the more drives you have in your array,
the higher the bandwidth (throughput) of the data, letting everything
work faster.

 

I don't necessarily consider the loss of the BackupPC data a
catastrophe.  I can see how it could be for someone who really wants
to keep archive copies of backups.  If we had a total failure of
BackupPC, we would rebuild it and start backing up again.  I guess the
worst scenario would be something that would not only hurt our BackupPC
server but also damage several other servers at the same time.

 

The question comes down to each individual and how much time you can
afford to spend rebuilding the data.  Is that downtime worth the extra
money to put more redundancy in the drives, offsite solutions or
clustered servers?

 

There are so many options, but they all have positives and negatives,
and I don't think any solution, no matter how much money you have, is
100% failsafe.  The odds of not being able to recover may be small, but
they are still there.



[BackupPC-users] BackupPC 3.2.0_Beta installed on Ubuntu 8.1

2009-06-02 Thread Robert J. Phillips
Has anyone upgraded BackupPC 3.1.0 to 3.2.0_Beta on an Ubuntu server?  I
finally got my system running, and it has been operating for about a month.
It is a great product.

 

I have the 0 GB errors caused by not putting the data in the default data
folder, and I understand that this is fixed in the Beta version. Most of
my servers are using rsyncd, but I have a couple of servers that would
benefit from the FTP transfer in the new Beta.



Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Les Mikesell
Tate1 wrote:
> Thanks to all! And sorry if I don't understand some answers properly; as 
> I said, my English is not very good.
> 
> OK, I'll talk with my boss and see what he lets me do with the server. If 
> there isn't any other solution, I will set up RAID between the disks.
> 
> Just out of curiosity: could I install two BackupPC instances, one on each 
> HDD? I mean, Disk1 as it is now with the OS and the current BackupPC 
> backing up PC1, and a second BackupPC on Disk2 backing up PC2.
> 
> It's probably nonsense... but it's an idea.

That is theoretically possible but would take some code changes to avoid 
conflicts with locations and network ports.  It might be easier to run 
vmware or virtualbox so it looks like a completely different machine 
that could run a stock version of backuppc as the 2nd instance.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 09:50:25 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > > 
 > > In my mind the only major reason not to move to a database
 > > architecture is that it would require a substantial re-write of
 > > BackupPC as pointed out in my earlier note.
 > 
 > Do you actually have any experience with large scale databases?  I think 
 > most installations that come anywhere near the size and activity of a 
 > typical backuppc setup would require a highly experienced DBA to 
 > configure and would have to be spread across many disks to have adequate 
 > performance.

I am by no means a database expert, but I think you are way
overstating the complexity issues.  While the initial design would
certainly need someone with experience, I don't know why each
implementation would require a "highly experienced DBA" or why it
"would have to be spread across many disks" any more than a standard
BackupPC implementation. Modern databases are written to hide a lot of
the complexity of optimization. Plus the database is large only in the
sense of having lots of table entries but is otherwise not
particularly complex nor do you have to deal with multiple
simultaneous access queries which is usually the major bottleneck
requiring optimization and performance tuning. Similarly the queries
will in general be very simple and easily keyed relative to other
real-world databases. Remember size != difficulty or complexity.

 > When you get down to the real issues, normal operation has 
 > a bottleneck with disk head motion which a database isn't going to do 
 > any better without someone knowing how to tune it across multiple disks. 

This seems like a red herring. The disk head motion issue applies
whether the data is stored in a database or in a combination of a
filesystem + attrib files. If anything, storage in a single database
would be more efficient than having to find and individually load (and
unpack) multiple attrib files since the database storage can be
optimized to some degree automagically while even attrib files that
are logically "sequential" could be scattered all over the disk
leading to inefficient head movement. Also, the database could be
stored on one disk and the pool on another but this would be difficult
if not impossible to do on BackupPC where the pool, the links, and the
attrib files are all on the same filesystem.

 > Also, while some database do offer remote replication, it isn't 
 > magic either and keeping it working isn't a common skill.
 > 

Again a red herring. Just having the ability to temporarily "throttle"
BackupPC, leaving the database in a consistent state, would allow one to
simply copy (e.g., rsync) the database and the pool to a backup
device. This copy would be much faster than today's BackupPC because
you wouldn't have the hard link issue. Remote replication would be
even better, but it is not necessary to solve the common issue of copying
the pool raised by so many people on this list.
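
A minimal sketch of that "pause, copy, resume" idea, assuming a
hypothetical metadata database file and typical Debian-style paths
(nothing here is a shipped BackupPC feature):

    import subprocess

    POOL = "/var/lib/backuppc/"            # assumed pool location
    DB   = "/var/lib/backuppc-meta.db"     # hypothetical metadata database
    DEST = "backup-host:/srv/backuppc-copy/"

    subprocess.check_call(["/etc/init.d/backuppc", "stop"])   # or throttle/drain the queue
    try:
        # With no hard links to recreate, a plain rsync of DB plus pool suffices.
        subprocess.check_call(["rsync", "-a", "--delete", DB, POOL, DEST])
    finally:
        subprocess.check_call(["/etc/init.d/backuppc", "start"])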

 > -- 
 >Les Mikesell
 > lesmikes...@gmail.com
 > 
 > 
 > 
 > --
 > OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
 > looking to deploy the next generation of Solaris that includes the latest 
 > innovations from Sun and the OpenSource community. Download a copy and 
 > enjoy capabilities such as Networking, Storage and Virtualization. 
 > Go to: http://p.sf.net/sfu/opensolaris-get
 > ___
 > BackupPC-users mailing list
 > BackupPC-users@lists.sourceforge.net
 > List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
 > Wiki:http://backuppc.wiki.sourceforge.net
 > Project: http://backuppc.sourceforge.net/
 > 

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] why hard links?

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 08:51:38 -0500 on Tuesday, June 2, 2009:
 > Peter Walter wrote:
 > > Les Mikesell wrote:
 > >> Josh Rubin wrote:
 > >>   
 > >>> I hope this question isn't too dumb, but why does backuppc need to use
 > >>> hard links?
 > >>> Of course, nobody wants to change a widely used program in a way that
 > >>> breaks everything,
 > >>> so this is really a question about some new, future program.
 > >>> Isn't the pool just a form of hash table that could be implemented lots
 > >>> of ways?
 > >>> 
 > >> Creating a link is an atomic operation in the fileystem.  With anything 
 > >> else you would have to add the overhead of some locking mechanism (with 
 > >> the potential for deadlocks) to be sure that the equivalent to the link 
 > >> and the count of them stayed in sync.   And, not too surprisingly it 
 > >> turns out that filesystem operations are actually a pretty good way to 
 > >> deal with storing files - it might not be impossible to do it better but 
 > >> it wouldn't be easy.
 > >>   
 > > 
 > > Locking mechanisms aren't always necessary. One of the advantages of 
 > > hashed database systems, such as linear hash technology, is that 
 > > collisions are easily managed, and semaphores are usually adequate as 
 > > simple locking mechanisms if they are needed.
 > 
 > Collisions aren't quite the point - you have to manage that anyway.  The 
 > hard part is knowing that the final target you link to is the one that 
 > you wanted, not a something created simultaneously by a different 
 > process doing the same computations, and knowing that the count of 
 > existing links always matches the actual copies.  The kernel manages 
 > this automatically when using links.  If you have to add an extra system 
 > call to lock/unlock around some other operation you'll triple the overhead.

I'm not sure how you definitively get to the number "triple". Maybe
more, maybe less. Even more importantly, it all depends on what is the
critical bottleneck on a backup system. Is it network bandwidth? Is it
disk reads/writes? Is it internal cache/memory? Is it computational
overhead? For your argument to be a true obstacle, one would have to
assume that computational overhead is the bottleneck which I would
think is very unlikely since most systems are probably most limited by
either disk speed or network bandwidth.

Les - I'm really not sure why you seem so intent on picking apart a
database approach. I can understand someone arguing that it would take
too much effort to implement but I don't see the point of challenging
the workability of a database approach, particularly when most high
end enterprise backup systems do just exactly that (and for good
reason!).

 > 
 > -- 
 >Les Mikesell
 > lesmikes...@gmail.com
 > 
 > --
 > OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
 > looking to deploy the next generation of Solaris that includes the latest 
 > innovations from Sun and the OpenSource community. Download a copy and 
 > enjoy capabilities such as Networking, Storage and Virtualization. 
 > Go to: http://p.sf.net/sfu/opensolaris-get
 > ___
 > BackupPC-users mailing list
 > BackupPC-users@lists.sourceforge.net
 > List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
 > Wiki:http://backuppc.wiki.sourceforge.net
 > Project: http://backuppc.sourceforge.net/
 > 

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


[BackupPC-users] Another BackupPC Fuse filesystem

2009-06-02 Thread Pieter Wuille
Hello,

Because of a need to restore files from backuppc in a more flexible way than
through the web interface (a particular directory on a whole bunch of hosts
at the same time), and after some googling, I stumbled upon Stephen Day's
fuse system for backuppc.

It had a few shortcomings, such as not supporting share names with "/"
characters, and being very slow, so I started rewriting parts and adding
some features.

If anyone's interested at trying/looking at it:
https://svn.ulyssis.org/repos/sipa/backuppc-fuse/backuppcfs.pl

Some features:
- caches the directory structure to improve efficiency
- supports chardevs and blockdevs (and files/dirs/symlinks)
- correct linkcounts for directories
- merges all shares of a host into one directory tree structure,
  supporting '/' and '\' as separators in sharenames
- open()ed filehandles are kept and reused to prevent seeking for each and
  every read operation - even supports efficient (sequential) reading when
  files are opened more than once at the same time.
- incremental backups are shown correctly
- some command-line options

It has only been tested on one 3.1 backuppc pool on an Ubuntu 8.04 system, and
not very extensively. It only opens files/directories in read-only mode, so it
shouldn't be able to damage a working backuppc pool if something goes wrong.

I'd like to get some feedback; ideas, bugreports, ... are very welcome.

-- 
Pieter

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Tino Schwarze
On Tue, Jun 02, 2009 at 10:06:40AM -0500, Les Mikesell wrote:

> >> Still, it would be awesome to combine the simplicity and pooling
> >> structure of BackupPC with the flexibility of a database
> >> architecture...
> >>   
> > I, for one, would be willing to contribute financially and with my very 
> > limited skills if Craig, or others, were willing to undertake such an 
> > effort. Perhaps Craig would care to comment.
> 
> The first thing needed would be to demonstrate that there would be an 
> advantage to a database approach - like some benchmarks showing an 
> improvement in throughput in the TB size range and measurements of the 
> bandwidth needed for remote replication.

In my experience, BackupPC is mainly I/O bound. It produces a lot of
seeks within the block device system (for directory and hash lookup).
This might actually benefit from a relational database - you'd just do
the appropriate SELECT, have some indices in place, etc. Of course,
there's still that "how to store and query the directory hierarchies
efficiently" problem.

Maybe someone should propose a real design, then we may check how to map
BackupPC's access patterns to the database structure. It might turn out
to become really complex - I'm just wondering how to store files,
directories, attributes, the pool, a particular backup number. We
currently create the directory structure for each backup, so we may
store the attrib file (to keep track of deleted files, at least). We'd
have to do that for the database, too. There's no other solution, IMO.
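
As a starting point, here is a minimal sketch of such a schema (SQLite
via Python; purely hypothetical - table and column names are invented
and are not part of BackupPC):

    import sqlite3

    con = sqlite3.connect("backuppc-meta.db")
    con.executescript("""
    CREATE TABLE IF NOT EXISTS pool_file (
        digest  TEXT PRIMARY KEY,   -- content hash, like today's pool file name
        size    INTEGER,
        refcnt  INTEGER             -- replaces the hard-link count
    );
    CREATE TABLE IF NOT EXISTS backup (
        host    TEXT,
        num     INTEGER,            -- backup number
        level   INTEGER,            -- 0 = full, >0 = incremental
        started INTEGER,            -- epoch seconds
        PRIMARY KEY (host, num)
    );
    CREATE TABLE IF NOT EXISTS entry (
        host    TEXT,
        num     INTEGER,
        path    TEXT,               -- share + path, replaces the attrib files
        type    TEXT,               -- file, dir, symlink, deleted, ...
        mode    INTEGER, uid INTEGER, gid INTEGER, mtime INTEGER,
        digest  TEXT REFERENCES pool_file(digest),
        PRIMARY KEY (host, num, path)
    );
    CREATE INDEX IF NOT EXISTS entry_digest ON entry(digest);
    """)
    con.commit()

Deleted files in incrementals would simply become rows with type =
'deleted', which is roughly what the attrib files record today.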

I suppose, you could only benchmark something after implementing a
sufficiently complex part of the problem to solve.

Another idea: Do we have performance metrics of BackupPC? It might be
useful to check what operations take most of the time. Is it pool
lookups? File decompression? Directory traversal for incrementals?

If, for example, we figure out, that hash lookups and checksum reading
of hash files etc. are expensive, a little database (actually a
hashtable) might suffice, sort of a memcached which keeps track of pool
files, their size and checksum. This might be doable (maybe disabled by
default if it requires additional setup) and work like a cache.
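
A rough sketch of such a cache, with Python's dbm standing in for
memcached (the file layout and hash choice are assumptions, not
BackupPC internals):

    import dbm, hashlib, os

    cache = dbm.open("pool-cache", "c")      # persistent digest -> "size:path" map

    def remember(pool_path):
        h = hashlib.md5()
        with open(pool_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
        cache[digest] = "%d:%s" % (os.path.getsize(pool_path), pool_path)
        return digest

    def lookup(digest):
        try:
            val = cache[digest]
        except KeyError:
            return None                       # not pooled yet
        size, path = val.decode().split(":", 1)
        return int(size), path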

> Personally I think the way to make things better would be to have a 
> filesystem that does block-level de-duplication internally. Then most of 
> what backuppc does won't even be necessary.   There were some 
> indications that this would be added to ZFS at some point, but I don't 
> know how the Oracle acquisition will affect those plans.

I don't think that belongs into the file system. In my opinion, a file
system should be tuned for one purpose: Managing space and files. It
should not care for file contents in any way, IMO.

> Meanwhile, if someone has time to kill doing benchmark measurements, 
> using ZFS with incremental send/receive to maintain a remote filesystem 
> snapshot would be interesting.  Or perhaps making a vmware vmdk disk 
> with many small (say 1 or 2 gig) elements and running backuppc in a 
> virtual machine.  Then for replication, stop the virtual machine and 
> rsync the directory containing the disk image files.  This might even be 
> possible without stopping if you can figure out how vmware snapshots work.

You don't want heavy I/O in Vmware without direct SAN attached or
similarly expensive setups.

I'd rather propose a patch to rsync adding --treat-blockdev-as-files.
This would require block-level checksum generation on _both_ sides,
though, so it's rather I/O and CPU intensive. Then again, DRBD might be the
way to go - they already take note of changed parts of the disk (but
that's a guess).

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Les Mikesell
Robert J. Phillips wrote:
>
> I trust Raid 5.  We have a mail server that has Raid 5 and twice we have 
> lost one of the drives and been able to get the new drive, put it back 
> into the Raid and let it rebuild with no down time except the time to 
> put the new drive in (20 minutes). 

Raid5 works, but has a serious efficiency cost on writes - and can be 
extremely slow in degraded mode with a bad drive (where raid1 continues 
at full speed).

> Something else I learned about Raid 5 and 6 from my research into a 
> Blade server and a SAN is that the more drives you have in your Raid it 
> increases that bandwidth (throughput) of the data letting everything 
> work faster.

Yes, if you have about a dozen drives in the array it starts to look good.

> I don’t necessarily consider the loss of the BackupPC data a 
> catastrophe.  I could see how it could be for someone that really wants 
> to keep archive copies of backups.  If we have a total failure of 
> BackupPC we would rebuild it and start backing up again.  I guess the 
> worse scenario would be something that would not only hurt our BackupPC 
> server but also damage several other servers at the same time. 

The question here is what will happen after a site disaster.  Will you 
collect your insurance and retire or go into some other line of work or 
will you have to reconstruct your data and continue as best you can?

> The question comes down to each individual and how much time can you 
> afford to rebuild the data.  Is that downtime worth the extra money to 
> put in more redundancy in the drives, offsite solutions or clustered 
> servers.
> 
>  
> 
> There are so many options but they all have a positive and a negative 
> and I don’t think any solution, no matter how much money you have, is 
> 100% failsafe.  The odds of not being able to recover may be small but 
> it is still there.

In many cases the simple solution is to run a completely separate 
instance of backuppc in a different location that picks up the critical 
data, perhaps over a vpn connection.  This eliminates any single point 
of failure and doesn't take much extra ongoing work.  If that isn't 
practical, making image copies of the archive might work.

-- 
   Les Mikesell
 lesmikes...@gmail.com



--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 10:06:40 -0500 on Tuesday, June 2, 2009:
 > Peter Walter wrote:
 > > 
 > >> Still, it would be awesome to combine the simplicity and pooling
 > >> structure of BackupPC with the flexibility of a database
 > >> architecture...
 > >>   
 > > I, for one, would be willing to contribute financially and with my very 
 > > limited skills if Craig, or others, were willing to undertake such an 
 > > effort. Perhaps Craig would care to comment.
 > 
 > The first thing needed would be to demonstrate that there would be an 
 > advantage to a database approach - like some benchmarks showing an 
 > improvement in throughput in the TB size range and measurements of the 
 > bandwidth needed for remote replication.

No one ever claimed that the primary advantages of a database
approach is throughput. The advantages are really more about
extensibility, flexibility, and transportability. If you don't value
any of the 7 or so advantages I listed before, then I guess a database
approach is not for you.

Also, while clearly a database approach would in general have more
computational overhead (at least for backups), from my experience the
bottlenecks are network bandwidth and disk speed. In fact, some people
have implemented BackupPC to run natively on a 500MHz ARM processor
without effective slowdown. (On the other hand, restore-like
operations would likely be faster since it would be simpler to walk
down the hierarchy of incremental backups.) So, I don't think you would
find any significant slowdowns from a database approach. If anything, a
database approach could allow significantly *faster* backups since the
file transfers could be split across multiple disks, which is not
possible under BackupPC unless you use LVM.

 > Personally I think the way to make things better would be to have a 
 > filesystem that does block-level de-duplication internally. Then most of 
 > what backuppc does won't even be necessary.   There were some 
 > indications that this would be added to ZFS at some point, but I don't 
 > know how the Oracle acquisition will affect those plans.

Ideally, I don't think that the backup approach should depend on the
underlying filesystem architecture. Such a restriction limits the
transportability of the backup solution just as currently BackupPC
really only works on *nix systems with hard links. A database approach
allows one to get away from dependence on specific filesystem features.
That doesn't mean there isn't room for specialized filesystem
approaches but just that such a requirement limits the audience for
the backup solution since it will be a while before we all start
running ZFS-type filesystems and then we will have the issue of
requiring different optimizations and code for different filesystem
approaches.

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Tino Schwarze
Hi,

> Another solution to the two hard drives backing up might be to use Raid
> 0 (striping).  This does not allow redundancy but it does let you
> combine the drives so the system sees them as one drive.

One should only use RAID0 for data one doesn't care about. It might
double throughput, yes, but it doubles the failure probability as well.
RAID0 might be suitable for automated build systems or similar setups where
only temporary data is stored. It's not suitable for a backup system,
IMO.

Tino.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Another BackupPC Fuse filesystem

2009-06-02 Thread Jeffrey J. Kosowsky
Pieter Wuille wrote at about 17:57:06 +0200 on Tuesday, June 2, 2009:
 > Hello,
 > 
 > because of a need to restore files from backuppc in a more flexible way than
 > through the web-interface (a particular directory in a whole bunch of hosts
 > at the same time) and some googling, i stumbled upon Stephen Day's fuse 
 > system
 > for backuppc.
 > 
 > It had a few shortcomings, such as not supporting share-names with "/"
 > characters, and being very slow, so i started rewriting parts and adding
 > some features. 
 > 
 > If anyone's interested at trying/looking at it:
 > https://svn.ulyssis.org/repos/sipa/backuppc-fuse/backuppcfs.pl
 > 
 > Some features:
 > - caches the directory structure to improve efficiency
 > - supports chardevs and blockdevs (and files/dirs/symlinks)
 > - correct linkcounts for directories
 > - merges all shares of a host into one directory tree structure,
 >   supporting '/' and '\' as separators in sharenames
 > - open()ed filehandles are kept and reused to prevent seeking for each and
 >   every read operation - even supports efficient (sequential) reading when
 >   files are opened more than once at the same time.
 > - incremental backups are shown correctly
 > - some command-line options
 > 
 > It is only tested on one 3.1 backuppc pool on a Ubuntu 8.04 system, and not
 > very extensively. It only opens files/directories in read-only mode, thus
 > shouldn't be able to damage a working backuppc pool if something goes wrong.
 > 
 > I'd like to get some feedback; ideas, bugreports, ... are very welcome.

Sounds great!!! I look forward to playing with it when I get a chance.

By the way, if you are ever interested in implementing 'write'
functionality, you might be able to repurpose the code that I wrote for
BackupPC_deleteFile that allows you to delete files and directories
(properly) from any full or incremental backup.

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Peter Walter wrote:
> 
> As a Linux newbie, I have only a partial understanding of the technology 
> underlying Linux and BackupPC, but I get the impression that the problem 
> with a rsync-like solution is that processing hardlinks is very 
> expensive in terms of cpu time and memory resources. This may be a 
> stupid question, but, if hardlinks are the problem, has any thought been 
> given to adding to BackupPC an option to use some form of database 
> (text, SQL or otherwise) to associate hashes to files, instead? It seems 
> to me that using hardlinks is in fact using that feature of the file 
> system *as* a database, a use that does not appear to be optimal ... if 
> I have misunderstood, please educate me :-)

Using the filesystem to back up files is pretty much optimal for the 
operations that backuppc actually does.  That is, creating a new link 
will atomically succeed or fail regardless of how many other processes 
try the same thing at the same time and the link count is always 
maintained correctly in the corresponding inode.  Name lookups are 
fairly efficient operations, and the free space list is always 
maintained correctly.  Disk head motion (the usual bottleneck) isn't 
always optimal but nothing else is going to do it any better.
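
As a toy illustration of why this matters, the whole "insert into the
pool" race is resolved by the kernel in a single call (the layout and
hash below are simplified stand-ins, not BackupPC's real pool code):

    import errno, hashlib, os

    POOL = "/var/lib/backuppc/cpool-demo"     # assumed/demo location

    def pool_insert(tmp_file):
        """Link tmp_file into the pool under its content hash, atomically."""
        os.makedirs(POOL, exist_ok=True)
        with open(tmp_file, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        target = os.path.join(POOL, digest)
        try:
            os.link(tmp_file, target)         # atomic: either we created the pool file...
        except OSError as e:
            if e.errno != errno.EEXIST:       # ...or another writer beat us to it
                raise
        return target                         # the inode's link count is the reference count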

The place it isn't optimal is when you try to do other operations on the 
archive filesystem like reconstructing it with a file-oriented copy 
mechanism that has to traverse all the filenames and match up the inode 
numbers to duplicate the hard links.

-- 
   Les Mikesell
lesmikes...@gmail.com


--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Another BackupPC Fuse filesystem

2009-06-02 Thread Jeffrey J. Kosowsky
Pieter Wuille wrote at about 17:57:06 +0200 on Tuesday, June 2, 2009:
 > Hello,
 > 
 > because of a need to restore files from backuppc in a more flexible way than
 > through the web-interface (a particular directory in a whole bunch of hosts
 > at the same time) and some googling, i stumbled upon Stephen Day's fuse 
 > system
 > for backuppc.
 > 
 > It had a few shortcomings, such as not supporting share-names with "/"
 > characters, and being very slow, so i started rewriting parts and adding
 > some features. 
 > 
 > If anyone's interested at trying/looking at it:
 > https://svn.ulyssis.org/repos/sipa/backuppc-fuse/backuppcfs.pl
 > 
 > Some features:
 > - caches the directory structure to improve efficiency
 > - supports chardevs and blockdevs (and files/dirs/symlinks)
 > - correct linkcounts for directories
 > - merges all shares of a host into one directory tree structure,
 >   supporting '/' and '\' as separators in sharenames
 > - open()ed filehandles are kept and reused to prevent seeking for each and
 >   every read operation - even supports efficient (sequential) reading when
 >   files are opened more than once at the same time.
 > - incremental backups are shown correctly
 > - some command-line options
 > 
 > It is only tested on one 3.1 backuppc pool on a Ubuntu 8.04 system, and not
 > very extensively. It only opens files/directories in read-only mode, thus
 > shouldn't be able to damage a working backuppc pool if something goes wrong.
 > 
 > I'd like to get some feedback; ideas, bugreports, ... are very welcome.

Just took a quick spin on it -- looks AWESOME!
In my mind this is the way to go and *much* more useful and infinitely
faster than trying to navigate the web interface. This gives me
*exactly* what I want which is rapid access to all my backup files in
a CLI format that allows me to apply common *nix utilities like 'cp', 'less'
'grep', 'diff' etc. to examine and manipulate different backup
versions.

Couple of questions/comments:
1. Is there any way to get the root share directory to mount as '/'
   rather than as '_/'? Having root mount differently detracts a
   little from the naturalness of it all.

2. What happens when a new backup is run while the fusefs is mounted?
   Is there an easy way to get the new backup to appear automagically
   or do you need to unmount/remount?

3. What happens when a backup is expired or deleted while the fusefs
   is mounted?

4. It is definitely *much* faster than the web interface and faster
   than Stephen Day's version.

5. It might be nice to add a feature that allows you to get info on
   the backup itself - say time/day created, incremental level,
   etc. Also, it might be nice to be able to print an ascii tree of
   the backup hierarchy. I know this is not strictly speaking part of
   any fuse filesystem but it would be a nice companion CLI accessory
   since now when I start listing backups I realize that I only see
   the 'number' and don't get a good handle on the other data about
   the backup.

6. If it is too complicated to make the filesystem writable at the
   individual file level by allowing the deletion of individual files,
   an easier first step would be to allow the deletion of backups (and
   all descendant backups). This is a *much* easier problem than
   individual file deletion and only requires deleting the root
   directory trees and deleting the corresponding lines from the
   'backup' log file.

Again *nice work* -- I hope you (and others) continue to refine this
program since it is one of the most useful extensions I have seen so far!



--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Tino Schwarze wrote:
> 
>> The first thing needed would be to demonstrate that there would be an 
>> advantage to a database approach - like some benchmarks showing an 
>> improvement in throughput in the TB size range and measurements of the 
>> bandwidth needed for remote replication.
> 
> In my experience, BackupPC is mainly I/O bound. It produces a lot of
> seeks within the block device system (for directory and hash lookup).
> This might actually benefit from a relational database - you'd just do
> the appropiate SELECT, have some indices in place etc. Of course,
> there's still that "how to store and query the directory hierarchies
> efficiently" problem.

Yes, you are asking for magic that doesn't exist here.  A skilled DBA 
can work a little bit of magic by placing tables that need concurrent 
access on different physical drives, but not everyone will have either a 
large number of drives or a DBA available for the task.

> Maybe someone should propose a real design, then we may check how to map
> BackupPC's access patterns to the database structure. It might turn out
> to become really complex - I'm just wondering how to store files,
> directories, attributes, the pool, a particular backup number. We
> currently create the directory structure for each backup, so we may
> store the attrib file (to keep track of deleted files, at least). We'd
> have to do that for the database, too. There's no other solution, IMO.
> 
> I suppose, you could only benchmark something after implementing a
> sufficiently complex part of the problem to solve.

Or, benchmark some simple approximation handling the expected amount of 
data.  If it turns out to be impractically slow (as I suspect it 
will...) then you don't need to consider it any more.

> Another idea: Do we have performance metrics of BackupPC? It might be
> useful to check what operations take most of the time. Is it pool
> lookups? File decompression? Directory traversal for incrementals?

I think it is pretty well balanced most of the time.  But you have to 
consider the operation.  Worst case will probably be handling large 
files with small changes (like database dumps, mailboxes or growing 
logfiles) where rsync will end up transferring just the differences but 
the server will reconstruct the entire file copy, compress it, and make 
a new pool entry that is unlikely to be reused.

> If, for example, we figure out, that hash lookups and checksum reading
> of hash files etc. are expensive, a little database (actually a
> hashtable) might suffice, sort of a memcached which keeps track of pool
> files, their size and checksum. This might be doable (maybe disabled by
> default if it requires additional setup) and work like a cache.

I think the hashing scheme is already pretty efficient.

>> Personally I think the way to make things better would be to have a 
>> filesystem that does block-level de-duplication internally. Then most of 
>> what backuppc does won't even be necessary.   There were some 
>> indications that this would be added to ZFS at some point, but I don't 
>> know how the Oracle acquisition will affect those plans.
> 
> I don't think that belongs into the file system. In my opinion, a file
> system should be tuned for one purpose: Managing space and files. It
> should not care for file contents in any way, IMO.

From the outside it wouldn't care about the contents - it just wouldn't 
use duplicate space to store duplicate contents.  Think of it as 
copy-on-write space just like memory works (except for actively looking 
for matches).  The same sort of content hashing scheme that backuppc 
uses to match files would be used at the block level.  You might not 
want this on every filesystem because of the overhead, but consider the 
advantage in the case of backups of growing logfiles.
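
A toy illustration of what block-level de-duplication buys in the
growing-logfile case (fixed-size blocks and an in-memory store; a real
filesystem would do this below the POSIX layer):

    import hashlib

    BLOCK = 128 * 1024
    store = {}                 # digest -> block bytes (stand-in for an on-disk block store)

    def dedup_write(path):
        refs, new_bytes = [], 0
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(BLOCK), b""):
                d = hashlib.sha1(block).hexdigest()
                if d not in store:
                    store[d] = block
                    new_bytes += len(block)
                refs.append(d)    # the "file" becomes a list of block references
        return refs, new_bytes    # a grown logfile re-stores only its new tail blocks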

>> Meanwhile, if someone has time to kill doing benchmark measurements, 
>> using ZFS with incremental send/receive to maintain a remote filesystem 
>> snapshot would be interesting.  Or perhaps making a vmware vmdk disk 
>> with many small (say 1 or 2 gig) elements and running backuppc in a 
>> virtual machine.  Then for replication, stop the virtual machine and 
>> rsync the directory containing the disk image files.  This might even be 
>> possible without stopping if you can figure out how vmware snapshots work.
> 
> You don't want heavy I/O in Vmware without direct SAN attached or
> similarly expensive setups.

You can afford to waste a little CPU these days - throw something fast 
at it.

> I'd rather propose a patch to rsync adding --threat-blockdev-as-files .
> This would require block-level checksum generation on _both_ sides,
> though, so it's rather I/O and CPU intensive. 

Also, rsync normally builds a new copy, so you need twice the space at 
the remote side - or if you let it rebuild in place you have a likely 
scenario where the site disaster you were trying to protect against 
happens mid-copy, leaving you with no working versions.  But disk s

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
Les Mikesell wrote:
> The first thing needed would be to demonstrate that there would be an 
> advantage to a database approach - like some benchmarks showing an 
> improvement in throughput in the TB size range and measurements of the 
> bandwidth needed for remote replication.
>
> Personally I think the way to make things better would be to have a 
> filesystem that does block-level de-duplication internally. Then most of 
> what backuppc does won't even be necessary.   There were some 
> indications that this would be added to ZFS at some point, but I don't 
> know how the Oracle acquisition will affect those plans.
>   

I am a newbie at this, and you obviously are very experienced with Linux 
and backuppc. However, it seems to me that the features that Jeffrey 
Kosowsky mentioned would be a very acceptable tradeoff against a 
reduction in performance, at least for me. I don't think there is a 
*need* to demonstrate an advantage if you stipulate that you would be 
comfortable with some reduction in performance. Finally, as someone who 
has done some database programming with *very* large databases, I don't 
think there is much to worry about regarding the transaction performance 
of common databases, such as MySQL. Using ZFS or another specific 
filesystem simply to preserve the requirement to support hard links 
seems to me to be requiring a specific technology for the technology's 
sake, instead of  determining which of several available technologies 
would solve the problem at hand. Why shouldn't backuppc be able to run 
on an NTFS volume, for instance?

Peter


--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


[BackupPC-users] Problem: backup laptop on 2 different ip and 2 different domain

2009-06-02 Thread Andrea
Hi all.
I have a "strange" problem:

On my server i have 2 domain:
  test.lan  with 172.16.0.0/24 - for "ethernet" fixed ip address machine
  test.priv with 172.16.1.0/24 - for "wireless" dynamic address machine

So my laptop could be
1) 172.16.0.1  with name:  foo.test.lan
2) 172.16.1.1  with name:  foo.test.priv

# nmblookup foo
returns the correct IP address on both .lan and .priv

PROBLEM:

When backuppc does the DNS lookup and then the nmblookup search, it will always run

# ssh -l root $host

but $host in that case == "foo"

As you will all realize, the server's /etc/resolv.conf file has (of course)

search test.lan test.priv

so the backuppc server will always try

   # ssh -l root foo.test.lan  (172.16.0.1)

even if nmblookup returns 172.16.1.1

What have I done wrong? Have I forgotten some $Conf setting?


My apologies for my bad English (I'm Italian),
and many thanks in advance to anyone with a "trick" :P
Cya.
--
Andrea.
I amar prestar aen, Han mathon ne nen,
Han mathon ne chae, A han nostron ne wilith

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> 
>  > >> Still, it would be awesome to combine the simplicity and pooling
>  > >> structure of BackupPC with the flexibility of a database
>  > >> architecture...
>  > >>   
>  > > I, for one, would be willing to contribute financially and with my very 
>  > > limited skills if Craig, or others, were willing to undertake such an 
>  > > effort. Perhaps Craig would care to comment.
>  > 
>  > The first thing needed would be to demonstrate that there would be an 
>  > advantage to a database approach - like some benchmarks showing an 
>  > improvement in throughput in the TB size range and measurements of the 
>  > bandwidth needed for remote replication.
> 
> No one ever claimed that the primary advantages of a database
> approach is throughput. The advantages are really more about
> extensibility, flexibility, and transportability. If you don't value
> any of the 7 or so advantages I listed before, then I guess a database
> approach is not for you.

I just consider a filesystem to be a reasonable place to store backups 
of files, where a database is a stretch, and I know how to deal with 
most of the problems with filesystems and what to expect from them, whereas 
databases introduce a whole new set of issues.  What's the equivalent of 
fsck for a corrupted database, and how long does it take to fix a TB of data?

> Also, while clearly, a database approach would in general have more
> computational overhead (at least for backups), from my experience the
> bottlenecks are network bandwidth and disk speed. In fact, some people
> have implemented BackupPC to run native on a 500MHz ARM processor
> without effective slowdown. (On the other hand, restore-like
> operations would likely be faster since it would be simpler to walk
> down the hierarchy of incremental backups) So, I don't think you would
> find any significant slowdowns from a database approach. If anything a
> database approach could allow significantly *faster* backups since the
> file transfers could be split across multiple disks which is not
> possible under BackupPC unless you use LVM.

Again, I know how to deal with filesystems and I'd use a raid0/10/6 if I 
wanted to split over multiple disks.  But I don't because I want to be 
able to sync the whole thing to one disk that I can remove and I want to 
be able to access data from any single disk just by plugging it in to 
some still-working computer.

>  > Personally I think the way to make things better would be to have a 
>  > filesystem that does block-level de-duplication internally. Then most of 
>  > what backuppc does won't even be necessary.   There were some 
>  > indications that this would be added to ZFS at some point, but I don't 
>  > know how the Oracle acquisition will affect those plans.
> 
> Ideally, I don't think that the backup approach should depend on the
> underlying filesystem architecture. 

It wouldn't depend on it, it would just mean that you might be able to 
store 10x or more data for the same price where there is a lot of 
redundancy.

> Such a restriction limits the
> transportability of the backup solution just as currently BackupPC
> really only works on *nix systems with hard links.

Transportability?  I can access my backuppc disk copy using a USB 
adapter cable from a vmware instance of linux on my laptop while it is 
also still running windows.  I can do the same from a Mac, probably with 
the free virtualbox if I didn't want to pay for Vmware fusion.  You can 
boot just about anything with a CD or USB drive into linux and mount it. 
  You can't get much more portable than that - the OS itself is both 
portable and transportable.  And opensolaris under vmware/virtualbox 
would work as well if that's what it takes for a quick remote restore.

> A database approach
> allows one to get away from dependence on specific filesystem features.

Some real world examples please?  Are you thinking of replicating from 
one OS to another?

> That doesn't mean there isn't room for specialized filesystem
> approaches but just that such a requirement limits the audience for
> the backup solution since it will be a while before we all start
> running ZFS-type filesystems and then we will have the issue of
> requiring different optimizations and code for different filesystem
> approaches.

We already do have the issue of different optimizations for different 
filesytems - and databases are even worse.

-- 
   Les Mikesell
lesmikes...@gmail.com



--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sour

Re: [BackupPC-users] why hard links?

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
>
>  > Collisions aren't quite the point - you have to manage that anyway.  The 
>  > hard part is knowing that the final target you link to is the one that 
>  > you wanted, not a something created simultaneously by a different 
>  > process doing the same computations, and knowing that the count of 
>  > existing links always matches the actual copies.  The kernel manages 
>  > this automatically when using links.  If you have to add an extra system 
>  > call to lock/unlock around some other operation you'll triple the overhead.
> 
> I'm not sure how you definitively get to the number "triple". Maybe
> more maybe less. 

Ummm, link(), vs. lock(),link(),unlock() equivalents, looks like 3x the 
operations to me - and at least the lock/unlock parts have to involve 
system calls even if you convert the link operation to something else.

> Les - I'm really not sure why you seem so intent on picking apart a
> database approach.

I'm not. I'm encouraging you to show that something more than black 
magic is involved.  Databases sort-of work for some things.  They aren't 
particularly better at storing files than a filesystem.  If they were, 
we wouldn't use filesystems for anything.  You've made a bunch of claims 
about how a database might be better, but so far have not provided any 
evidence to back it up.

> I can understand someone arguing that it would take
> too much effort to implement but I don't see the point of challenging
> the workability of a database approach, particularly when most high
> end enterprise backup systems do just exactly that (and for good
> reason!).

One of the 'good reasons' is that most of those systems were designed 
for an OS that didn't have a decent filesystem at the time...

-- 
   Les Mikesell
lesmikes...@gmail.com


--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Chris Robertson
Skip Guenter wrote:
> On Tue, 2009-06-02 at 16:36 +1000, Adam Goryachev wrote:
>   
>> So, using 4 x 100G drives provides 133G usable storage... we can lose
>> any two drives without any data loss. However, from my calculations
>> (which might be wrong), RAID6 would be more efficient. On a 4 drive 100G
>> system you get 200G available storage, and can lose any two drives
>> without data loss.
>> 
>
> So isn't this the same as RAID10 w/ 4 drives, 200GB and can lose 2
> drives (as long as they aren't on the same mirror) and no risk of
> corrupted parity blocks?

With RAID 10, if you lose a drive AND its mirror, your array is 
toast.  While you CAN lose two drives from a RAID 10 array, they have 
to be specific drives.  With RAID 6 you have X data disks and 2 parity 
disks*.  You can lose ANY two disks from a RAID 6 array without data 
loss.  As you add more disks to a RAID 10 array, you have to dedicate 
half of them to mirroring.  With a RAID 6 array, you only need two 
parity disks.  Any more you add are usable to expand the capacity.

Chris

*This statement is simplified, in that neither RAID 5 nor RAID 6 
actually dedicates a spindle (or two) to parity, but interleaves it with 
the data.  RAID 3 has a dedicated parity disk, but doesn't get much 
attention.  With RAID 6, if you lose a disk, performance will not 
suffer, as no data is missing.  RAID 5 down one disk and RAID 6 down two 
disks loses either a part of the data (which is calculated from the 
parity data) or the parity data.
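
The capacity arithmetic behind this comparison, for anyone who wants to
play with the numbers (idealized - it ignores spares, metadata and
rebuild behaviour):

    def usable(n_drives, size_gb, layout):
        if layout == "raid10":
            return n_drives // 2 * size_gb     # half the drives mirror the other half
        if layout == "raid6":
            return (n_drives - 2) * size_gb    # two drives' worth of parity
        if layout == "raidz_copies3":
            return n_drives * size_gb // 3     # roughly: three copies of every block
        raise ValueError(layout)

    for layout in ("raid10", "raid6", "raidz_copies3"):
        print("4 x 100 GB as %-14s -> %3d GB usable" % (layout, usable(4, 100, layout)))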

--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Peter Walter wrote:
> 
>> The first thing needed would be to demonstrate that there would be an 
>> advantage to a database approach - like some benchmarks showing an 
>> improvement in throughput in the TB size range and measurements of the 
>> bandwidth needed for remote replication.
>>
>> Personally I think the way to make things better would be to have a 
>> filesystem that does block-level de-duplication internally. Then most of 
>> what backuppc does won't even be necessary.   There were some 
>> indications that this would be added to ZFS at some point, but I don't 
>> know how the Oracle acquisition will affect those plans.
>>   
> 
> I am a newbie at this, and you obviously are very experienced with Linux 
> and backuppc. However, it seems to me that the features that Jeffrey 
> Kosowsky mentioned would be a very acceptable tradeoff against a 
> reduction in performance, at least for me. I don't think there is a 
> *need* to demonstrate an advantage if you stipulate that you would be 
> comfortable with some reduction in performance. Finally, as someone who 
> has done some database programming with *very* large databases, I don't 
> think there is much to worry about regarding the transaction performance 
> of common databases, such as MySQL.

Would you put the file content in the database or just the equivalent of 
the  directory entries and attributes?  The latter would probably work, 
but then you need something else to manage concurrency between the file 
activity and the database items and tools to fix things if the system 
crashes or ever gets out of sync.  If you would be comfortable with the 
file contents in mysql then you've been working with a different version 
than the ones I've tried.

> Using ZFS or another specific 
> filesystem simply to preserve the requirement to support hard links 
> seems to me to be requiring a specific technology for the technology's 
> sake, instead of  determining which of several available technologies 
> would solve the problem at hand.

Any filesystems with unix/posix semantics will handle hardlinks.  This 
is an instance of using the technology at hand as the obvious solution 
to a problem.

> Why shouldn't backuppc be able to run 
> on an NTFS volume, for instance?

I don't think this is a technical problem.  NTFS claims posix compliance 
and claims to support hard links.  I'm not sure if perl knows how to 
make them or get the link count, though - or if anyone cares.  It seems 
like something that could be fixed if it doesn't already work.

-- 
   Les Mikesell
lesmikes...@gmail.com


--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> 
>  > Do you actually have any experience with large scale databases?  I think 
>  > most installations that come anywhere near the size and activity of a 
>  > typical backuppc setup would require a highly experienced DBA to 
>  > configure and would have to be spread across many disks to have adequate 
>  > performance.
> 
> I am by no means a database expert, but I think you are way
> overstating the complexity issues.

I've worked with lots of filesystems and a few databases - and had many 
more problems with the databases.  For example, they are not at all 
happy or forgiving if you run out of underlying filesystem space.  And 
it's not clear how to fix them if they are corrupted by a crash. When 
you are dealing with backups you want them to work regardless of other 
problems - the time you need them is precisely when you have a bunch of 
other problems.

> While the initial design would
> certainly need someone with experience, I don't know why each
> implementation would require a "highly experienced DBA" or why it
> "would have to be spread across many disks" any more than a standard
> BackupPC implementation. Modern databases are written to hide a lot of
> the complexity of optimization.

Modern filesystems optimize file access because they know the related 
structures (directories, inodes, free space list).  Databases don't know 
what you are going to put in them or how they relate.  They can be tuned 
to optimize them for any particular thing but that isn't inherent.

> Plus the database is large only in the
> sense of having lots of table entries but is otherwise not
> particularly complex nor do you have to deal with multiple
> simultaneous access queries which is usually the major bottleneck
> requiring optimization and performance tuning.

Multiple concurrent writes are the hard part, something backuppc will be 
doing all night long.

> Similarly the queries
> will in general be very simple and easily keyed relative to other
> real-world databases. Remember size != difficulty or complexity.

Backups are a mostly-write operation, hopefully.

>  > When you get down to the real issues, normal operation has 
>  > a bottleneck with disk head motion which a database isn't going to do 
>  > any better without someone knowing how to tune it across multiple disks. 
> 
> This seems like a red herring. The disk head motion issue applies
> whether the data is stored in a database or in a combination of a
> filesystem + attrib files.

Sort of, but the OS, filesystem and buffer cache have years of design 
optimization for their specific purpose and they are pretty good at it. 
  And unless the database uses the raw device it can only add overhead 
to the underlying filesystem access.

> If anything, storage in a single database
> would be more efficient than having to find and individually load (and
> unpack) multiple attrib files since the database storage can be
> optimized to some degree automagically while even attrib files that
> are logically "sequential" could be scattered all over the disk
> leading to inefficient head movement.

This is the sort of thing where you need to produce evidence.  I'd 
expect the attrib files to be generally optimized with respect to the 
locations of the relevant directories that you will be accessing at the 
same time because the filesystem knows about these locations when 
allocating the space, whereas a database on top of a filesystem has no 
idea of where the disk head will be going next.

 > Also, the database could be
> stored on one disk and the pool on another but this would be difficult
> if not impossible to do on BackupPC where the pool, the links, and the
> attrib files are all on the same filesystem.

Agreed - if you have a skilled DBA to arrange this.  It's not going to 
happen out of the box.

>  > Also, while some database do offer remote replication, it isn't 
>  > magic either and keeping it working isn't a common skill.
>  > 
> 
> Again a red herring. Jut having the ability to temporarily "throttle"
> BackupPC leaving the database in a consistent state would allow one to
> just simply copy (e.g., rsync) the database and the pool to a backup
> device. This copy would be much faster than today's BackupPC because
> you wouldn't have the hard link issue. Remote replication would be
> even better but not necessary to solve the common issue of copying the
> pool raised by so many people on this list.

There's only a small difference in scale here (and it's not obvious 
which direction) between rsync'ing a raw database file and rsync'ing an 
image copy of a filesystem.  There's probably not much of a practical 
difference.

-- 
   Les Mikesell
lesmikes...@gmail.com


--
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the Op

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
Les Mikesell wrote:
> Would you put the file content in the database or just the equivalent of 
> the  directory entries and attributes?  The latter would probably work, 
> but then you need something else to manage concurrency between the file 
> activity and the database items and tools to fix things if the system 
> crashes or ever gets out of sync.  If you would be comfortable with the 
> file contents in mysql then you've been working with a different version 
> than the ones I've tried.
>   
I would just put the metadata in the database - the equivalent of the 
directory structures and attributes. The pooled files themselves would 
remain directly on the filesystem and be managed by the filesystem. In 
fact, I am *not* proposing to get rid of the hardlinks, at least not 
immediately - just to add another layer that would replicate the hardlink 
structure in a database, as a secondary reference, perhaps. Then a 
backup of the backup server would simply back up the database AND the 
pooled files, but not the hardlinks; the hardlinks would be recreated by a 
restore from the database as necessary.
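
A rough sketch of what the restore side of that could look like, assuming
the link structure were kept in a SQLite table read via Perl DBI - the
"pc_links" table and its two columns are purely illustrative, nothing like
it exists in BackupPC today:

use strict;
use warnings;
use DBI;
use File::Basename qw(dirname);
use File::Path qw(make_path);

my $topdir = "/var/lib/backuppc";                 # assumed top-level data dir
my $dbh = DBI->connect("dbi:SQLite:dbname=pc_links.db", "", "",
                       { RaiseError => 1 });

# One row per file in a pc tree: where it sits under pc/, and which pool
# file holds its content.
my $sth = $dbh->prepare("SELECT pc_path, pool_path FROM pc_links");
$sth->execute;
while (my ($pc_path, $pool_path) = $sth->fetchrow_array) {
    my $target = "$topdir/pc/$pc_path";
    make_path(dirname($target));                  # recreate the directory tree
    link("$topdir/$pool_path", $target)           # recreate the hardlink
        or warn "link failed for $target: $!";
}

The point being that a copy of the pool plus a table like this would be
enough to rebuild the pc trees, without ever rsync'ing hardlinks.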

Yes, I am comfortable with MySQL. I have worked with large commercial 
databases from IBM, Oracle, and others, and MySQL is at least as stable 
as the others when correctly configured - and usually somewhat faster. 
Besides, as said above, the file content would be *outside* the 
database. I think only an idiot would store files directly in the 
database when that is what the OS is designed for. If the database 
crashed for any reason, then it could be recovered from the hardlink 
data if necessary.

In summary, I see this feature as an *addition* to the current 
environment, not a replacement (yet). If it turned out to be a stable 
addition, then we would simply include a switch to turn hardlink 
creation off if the user wanted to use backuppc on a non-posix-compliant 
file system - such as 'cloud' storage.

Peter




Re: [BackupPC-users] why hard links?

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 12:32:14 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > >
 > >  > Collisions aren't quite the point - you have to manage that anyway.  
 > > The 
 > >  > hard part is knowing that the final target you link to is the one that 
 > >  > you wanted, not a something created simultaneously by a different 
 > >  > process doing the same computations, and knowing that the count of 
 > >  > existing links always matches the actual copies.  The kernel manages 
 > >  > this automatically when using links.  If you have to add an extra 
 > > system 
 > >  > call to lock/unlock around some other operation you'll triple the 
 > > overhead.
 > > 
 > > I'm not sure how you definitively get to the number "triple". Maybe
 > > more maybe less. 
 > 
 > Ummm, link(), vs. lock(),link(),unlock() equivalents, looks like 3x the 
 > operations to me - and at least the lock/unlock parts have to involve 
 > system calls even if you convert the link operation to something else.

3x operations != 3x worse performance
Given that disk seek times and input bandwidth are the typical
bottlenecks, I'm not particularly worried about the added
computational overhead of lock/unlock.


 > > Les - I'm really not sure why you seem so intent on picking apart a
 > > database approach.
 > 
 > I'm not. I'm encouraging you to show that something more than black 
 > magic is involved.  Databases sort-of work for some things.  They aren't 
 > particularly better at storing files than a filesystem.  If they were, 
 > we wouldn't use filesystems for anything.  You've made a bunch of claims 
 > about how a database might be better, but so far have not provided any 
 > evidence to back it up.

I never claimed performance. My claims have been around flexibility,
extendability, and transportability. The use of the database is *not*
to store files -- indeed the files would still be stored as
name-hashed files. The database would be used to store all the
meta-information that now is either stored in the filesystem or stored
in the attrib files (which are a big kludge and certainly not nearly
as well optimized as a modern database can be) or not stored at all
(such as extended attributes and ACLs). Indeed, if anything, filesystem
design is moving towards being more like a database, precisely because of the
similar need to store more file-related metadata. If you can find a
filesystem that stores all the above information and gets rid of the
need for the attrib files then I would at least partially accept your
arguments.

I think all (or nearly all) of my 7 claimed advantages are
self-evident. Most are feature-function related and the ones that are
performance related (such as reconstructing a restore view) are
deficits of the current implementation that even you admit.

Again, since the bottleneck is almost always disk I/O or network
bandwidth, I don't think proving database performance is critical; besides,
I can't prove that to you until the implementation is laid out and
perhaps even implemented. Remember, BackupPC can run on 500MHz ARM
machines, so I'm not worried about additional processing overhead.

Even if it were a little slower, I would be more than willing to
sacrifice a little inefficiency for a more elegant, extendable,
flexible, and transportable implementation over the current
implementation that kludges together filesystem-specific behaviors
with the kludge of thousands of attrib files. Don't get me wrong, I
think BackupPC is a great program; I just think that its original
design is showing its limitations as people push to use it to back up
larger systems, other OSes (e.g., Windows) and more advanced
filesystem features (e.g., ACLs, SELinux).

 > > I can understand someone arguing that it would take
 > > too much effort to implement but I don't see the point of challenging
 > > the workability of a database approach, particularly when most high
 > > end enterprise backup systems do just exactly that (and for good
 > > reason!).
 > 
 > One of the 'good reasons' is that most of those systems were designed 
 > for an OS that didn't have a decent filesystem at the time...
 > 

There still is no common 'good' filesystem. And we are not all about
to switch to OpenSolaris. Plus, I don't want my backup system to be
filesystem dependent because I might have other reasons for picking
other filesystems or my OS of the future (or of today) might not even
support the filesystem features required. I think good system design
calls for abstracting the backup software from the underlying
filesystem.

 > -- 
 >Les Mikesell
 > lesmikes...@gmail.com
 > 
 > 

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 12:16:59 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > > 
 > >  > >> Still, it would be awesome to combine the simplicity and pooling
 > >  > >> structure of BackupPC with the flexibility of a database
 > >  > >> architecture...
 > >  > >>   
 > >  > > I, for one, would be willing to contribute financially and with my 
 > > very 
 > >  > > limited skills if Craig, or others, were willing to undertake such an 
 > >  > > effort. Perhaps Craig would care to comment.
 > >  > 
 > >  > The first thing needed would be to demonstrate that there would be an 
 > >  > advantage to a database approach - like some benchmarks showing an 
 > >  > improvement in throughput in the TB size range and measurements of the 
 > >  > bandwidth needed for remote replication.
 > > 
 > > No one ever claimed that the primary advantages of a database
 > > approach is throughput. The advantages are really more about
 > > extensibility, flexibility, and transportability. If you don't value
 > > any of the 7 or so advantages I listed before, then I guess a database
 > > approach is not for you.
 > 
 > I just consider a filesystem to be a reasonable place to store backups 
 > of files, where a database is a stretch, and I know how to deal with 
 > most of the problems with filesystems and what to expect from them where 
 > databases introduce a whole new set of issues.  What's the equivalent of 
 > fsck for a corrupted database and how long does it take to fix a TB of data?

Red herring. There is no 1TB of data (in most cases). Only the
metadata gets stored in the database. The files still get stored in
the pool. The size of the database would be about the same as the
combined size of the attrib files - probably even smaller, since a lot
of the information in the attrib files is repeated between backups.

 > > Also, while clearly, a database approach would in general have more
 > > computational overhead (at least for backups), from my experience the
 > > bottlenecks are network bandwidth and disk speed. In fact, some people
 > > have implemented BackupPC to run native on a 500MHz ARM processor
 > > without effective slowdown. (On the other hand, restore-like
 > > operations would likely be faster since it would be simpler to walk
 > > down the hierarchy of incremental backups) So, I don't think you would
 > > find any significant slowdowns from a database approach. If anything a
 > > database approach could allow significantly *faster* backups since the
 > > file transfers could be split across multiple disks which is not
 > > possible under BackupPC unless you use LVM.
 > 
 > Again, I know how to deal with filesystems and I'd use a raid0/10/6 if I 
 > wanted to split over multiple disks.  But I don't because I want to be 
 > able to sync the whole thing to one disk that I can remove and I want to 
 > be able to access data from any single disk just by plugging it in to 
 > some still-working computer.

Red herring again. The database would typically be much smaller than
the pool so it should more easily fit onto a single disk than the
current implementation. On the other hand, in a database
implementation, the pool files could be split across multiple disks
since there would be no need for hard links anymore, thus giving much
more storage flexibility in addition to making it easier to replicate
the backup.

 > 
 > >  > Personally I think the way to make things better would be to have a 
 > >  > filesystem that does block-level de-duplication internally. Then most 
 > > of 
 > >  > what backuppc does won't even be necessary.   There were some 
 > >  > indications that this would be added to ZFS at some point, but I don't 
 > >  > know how the Oracle acquisition will affect those plans.
 > > 
 > > Ideally, I don't think that the backup approach should depend on the
 > > underlying filesystem architecture. 
 > 
 > It wouldn't depend on it, it would just mean that you might be able to 
 > store 10x or more data for the same price where there is a lot of 
 > redundancy.

Or use a database to store the relationships and it is filesystem independent.

 > 
 > > Such a restriction limits the
 > > transportability of the backup solution just as currently BackupPC
 > > really only works on *nix systems with hard links.
 > 
 > Transportability?  I can access my backuppc disk copy using a USB 
 > adapter cable from a vmware instance of linux on my laptop while it is 
 > also still running windows.  I can do the same from a Mac, probably with 
 > the free virtualbox if I didn't want to pay for Vmware fusion.  You can 
 > boot just about anything with a CD or USB drive into linux and mount it. 
 >   You can't get much more portable than that - the OS itself is both 
 > portable and transportable.  And opensolaris under vmware/virtualbox 
 > would work as well if that's what it takes for a quick remote
 > restore.

Some people might want (horrors) to run BackupPC natively on a Windows
machine without havi

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 13:16:05 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > > 
 > >  > Do you actually have any experience with large scale databases?  I 
 > > think 
 > >  > most installations that come anywhere near the size and activity of a 
 > >  > typical backuppc setup would require a highly experienced DBA to 
 > >  > configure and would have to be spread across many disks to have 
 > > adequate 
 > >  > performance.
 > > 
 > > I am by no means a database expert, but I think you are way
 > > overstating the complexity issues.
 > 
 > I've worked with lots of filesystems and a few databases - and had many 
 > more problems with the databases.  For example, they are not at all 
 > happy or forgiving if you run out of underlying filesystem space.  And 
 > it's not clear how to fix them if they are corrupted by a crash. When 
 > you are dealing with backups you want them to work regardless of other 
 > problems - the time you need them is precisely when you have a bunch of 
 > other problems.

Only the metadata would be stored in the database.

 > 
 > > While the initial design would
 > > certainly need someone with experience, I don't know why each
 > > implementation would require a "highly experienced DBA" or why it
 > > "would have to be spread across many disks" any more than a standard
 > > BackupPC implementation. Modern databases are written to hide a lot of
 > > the complexity of optimization.
 > 
 > Modern filesystems optimize file access because they know the related 
 > structures (directories, inodes, free space list).  Databases don't know 
 > what you are going to put in them or how they relate.  They can be tuned 
 > to optimize them for any particular thing but that isn't inherent.

The filesystem would be used to store the files. The database to store
the metadata. I'm sure just about any modern database would be much
more efficient at storing metadata than a packed ascii text file which is
what the attrib files are. Any time you want to access a file you need
to unpack the attrib file, parse it into a perl structure and then
access the specific data element you want. If you are dealing with
incremental backups you may need to read several attrib files just to
access a single file. You can't tell me that is more efficient than a
well-implemented relational database lookup. Even worse, any change to
an attrib file requires reading it all in, unpacking and parsing it
all, making the change, repacking, rewriting. Again much, much, much
less efficient than a database write.
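
To make that concrete, here is a minimal sketch, using SQLite through Perl
DBI, of the kind of per-file metadata table I have in mind; the schema and
all names in it are invented for illustration and are not part of BackupPC:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=backuppc-meta.db", "", "",
                       { RaiseError => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS file_attrib (
        host      TEXT    NOT NULL,
        backupnum INTEGER NOT NULL,
        path      TEXT    NOT NULL,   -- share-relative file path
        type      INTEGER,            -- file, dir, symlink, ...
        mode      INTEGER,
        uid       INTEGER,
        gid       INTEGER,
        size      INTEGER,
        mtime     INTEGER,
        pool_hash TEXT,               -- name of the pooled content file
        PRIMARY KEY (host, backupnum, path)
    )
});

# Fetching one file's metadata touches only the matching row (via the
# primary-key index) instead of unpacking a whole per-directory attrib
# file, and updating it is a single UPDATE rather than a full rewrite.
my $row = $dbh->selectrow_hashref(
    "SELECT mode, uid, gid, size, mtime, pool_hash
       FROM file_attrib
      WHERE host = ? AND backupnum = ? AND path = ?",
    undef, "pc1", 34, "etc/passwd");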

 > 
 > > Plus the database is large only in the
 > > sense of having lots of table entries but is otherwise not
 > > particularly complex nor do you have to deal with multiple
 > > simultaneous access queries which is usually the major bottleneck
 > > requiring optimization and performance tuning.
 > 
 > Multiple concurrent writes are the hard part, something backuppc will be 
 > doing all night long.

 > > This seems like a red herring. The disk head motion issue applies
 > > whether the data is stored in a database or in a combination of a
 > > filesystem + attrib files.
 > 
 > Sort of, but the OS, filesystem and buffer cache have years of design 
 > optimization for their specific purpose and they are pretty good at it. 
 >   And unless the database uses the raw device it can only add overhead 
 > to the underlying filesystem access.

Overhead is only bad if it is significant and rate limiting.
 > 
 > > If anything, storage in a single database
 > > would be more efficient than having to find and individually load (and
 > > unpack) multiple attrib files since the database storage can be
 > > optimized to some degree automagically while even attrib files that
 > > are logically "sequential" could be scattered all over the disk
 > > leading to inefficient head movement.
 > 
 > This is the sort of thing where you need to produce evidence.  I'd 
 > expect the attrib files to be generally optimized with respect to the 
 > locations of the relevant directories that you will be accessing at the 
 > same time because the filesystem knows about these locations when 
 > allocating the space, whereas a database on top of a filesystem has no 
 > idea of where the disk head will be going next.


Well, again, when you access a file you in general need to read in
multiple attrib files across the chain of incremental backups. There
is no way that the filesystem knows of these relationships. Also,
since the files are often hard-linked to pre-existing pool files, there
is no reason to think that the attrib files are located logically near
the pool files.

 > 
 >  > Also, the database could be
 > > stored on one disk and the pool on another but this would be difficult
 > > if not impossible to do on BackupPC where the pool, the links, and the
 > > attrib files are all on the same filesystem.
 > 
 > Agreed - if you have a skilled DBA to arrange this.  It's not going to 
 > happen out of the box.

Pool is not st

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> >  > We already do have the issue of different optimizations for different 
>  > filesytems - and databases are even worse.
>  > 
> 
> Pick one or two database implementations that work on multiple
> platforms. Problem solved.

Yes, but you'll inherit the worst properties of both the database and 
the filesystem it sits on.

> Les, I understand that BackupPC as-is works perfectly for you on
> ZFS/Solaris.

It works well enough on ext3/Linux, which is what I'm actually using.  I 
think it might be even better on ZFS.

> However, you need to recognize that some of us have
> different setups and different needs. Just because you don't need an
> SUV for your transportation needs doesn't mean you can convince me
> that I don't need an SUV for my different transportation needs. Maybe
> it's even true that a database approach would measurably degrade
> performance (though I doubt it) but that doesn't mean that the
> tradeoffs of better flexibility, extendability, and transportability
> aren't worth it for other people.

I'm just being pragmatic. Backuppc generally works.  Linux is not 
difficult or expensive to obtain, nor is opensolaris.  Filesystems that 
support hardlinks aren't hard to find. There are ways to deal with 
copying filesystems.  When someone writes the database version I'll try 
it out, but right now it's all talk and not particularly convincing. 
While there are some theoretical points you can make about the attrib 
file, I can't recall anyone ever mentioning problems with it on the 
list. Nothing in my experience tells me that maintaining a database and 
a filesystem that need atomic synchronization is going to be better than 
a filesystem alone in any way that I care about - or that I would be 
able to trust files and database entries copied separately to another 
system. There are other places that could be bigger improvements in my 
opinion - for example in handling large growing files or storing 
additional attributes.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] Another BackupPC Fuse filesystem

2009-06-02 Thread Pieter Wuille
On Tue, Jun 02, 2009 at 12:31:54PM -0400, Jeffrey J. Kosowsky wrote:
> Pieter Wuille wrote at about 17:57:06 +0200 on Tuesday, June 2, 2009:

>  > If anyone's interested at trying/looking at it:
>  > https://svn.ulyssis.org/repos/sipa/backuppc-fuse/backuppcfs.pl
>  > 

>  > It is only tested on one 3.1 backuppc pool on a Ubuntu 8.04 system, and not
>  > very extensively. It only opens files/directories in read-only mode, thus
>  > shouldn't be able to damage a working backuppc pool if something goes 
> wrong.
>  > 
>  > I'd like to get some feedback; ideas, bugreports, ... are very welcome.
> 
> Just took a quick spin on it -- looks AWESOME!
> In my mind this is the way to go and *much* more useful and infinitely
> faster than trying to navigate the web interface. This gives me
> *exactly* what I want which is rapid access to all my backup files in
> a CLI format that allows me to apply common *nix utilities like 'cp', 'less'
> 'grep', 'diff' etc. to examine and manipulate different backup
> versions.
Thanks :)

> Couple of questions/comments:
> 1. Is there anyway to get the root share directory to mount as '/'
>rather than as '_/"? Having root mount differently detracts a
>little from the naturalness of it all.

It shouldn't be too hard to change that. The only problem is what to do when
you have a '/' share and a '/etc' share, and the '/' share contains an 'etc'
dir/file - but that probably won't occur, and I agree it's a more natural way.

> 2. What happens when a new backup is run while the fusefs is mounted?
>Is there an easy way to get the new backup to appear automagically
>or do you need to unmount/remount?
Refreshing of nodes in the directory cache occurs when:
- an expire happens (depending on where in the filesystem, the TTL is
  either 8s, 4s or 100s (see the source code))
- there's a request for something that doesn't exist yet, e.g. if a new backup #35
  is created while the cache only has backups up to #34 and you do a
  "cd 35" even though "ls" does not show a dir '35', it should still work
  (and 35 should appear in the listing afterwards too) - untested though
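
Roughly the idea, as a minimal sketch rather than the actual backuppcfs.pl
code - the $loader callback stands in for whatever re-reads a node from the
BackupPC libraries:

use strict;
use warnings;

# $cache is a hashref: { data => { name => node, ... }, refreshed => epoch }.
sub cached_lookup {
    my ($cache, $name, $ttl, $loader) = @_;
    my $stale = !$cache->{data}
             || time() - $cache->{refreshed} > $ttl     # TTL expired
             || !exists $cache->{data}{$name};          # asked for something unknown
    if ($stale) {
        $cache->{data}      = $loader->();              # e.g. re-list the backups
        $cache->{refreshed} = time();
    }
    return $cache->{data}{$name};
}

# e.g.: my %cache;
# my $node = cached_lookup(\%cache, "35", 8, sub { +{ 33 => {}, 34 => {}, 35 => {} } });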

> 3. What happens when a backup is expired or deleted while the fusefs
>is mounted?
Completely untested... I assume a lot of errors might occur when you try to
read files - maybe empty files, crashes, ...
Errors don't propagate upwards in the tree, so only a refresh of the parent
node in the tree would fix it - either an expire or a request for something
that doesn't exist.

> 5. It might be nice to add a feature that allows you to get info on
>the backup itself - say time/day created, incremental level,
>etc. Also, it might be nice to be able to print an ascii tree of
>the backup hierarchy. I know this is not strictly speaking part of
>any fuse filesystem but it would be a nice companion CLI accessory
>since now when I start listing backups I realize that I only see
>the 'number' and don't get a good handle on the other data about
>the backup.
I've thought about this myself too - just not sure in what format and where.
The information you can get from backuppc about a host is limited, but you
can get somewhat more about specific backups. Creating one or more "files" in
a "host"-directory is quite simple, but inside the backup#-directories is
harder (since there may be real backup data already there if you have a '/'
share).
About the CLI tool: do you mean 'tree' ?


> 6. If it is too complicated to make the filesystem writable at the
>individual file level by allowing the deletion of individual files,
>an easier first step would be to allow the deletion of backups (and
>all descendants backups). This is a *much* easier problem than
>individual file deletion and only requires deleting the root
>directory trees and deleting the corresponding lines from the
>'backup' log file.
Currently all data comes from the BackupPC libraries themselves, and I'd like to
keep it that way for now. Such modifications to the tree require some
duplication of knowledge about backuppc backups.

-- 
Pieter



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
>
>  > 
>  > There's only a small difference in scale here (and it's not obvious 
>  > which direction) between rsync'ing a raw database file and rsync'ing an 
>  > image copy of a filesystem.  There's probably not much of a practical 
>  > difference.
> 
> Except that I have a lot of other stuff on my filesystem so I don't
> want to image the whole filesystem.

That just takes some sensible planning...

> I just want to image the
> backups. Also, not all filesystems support efficient methods for
> imaging a partially filled filesystem.

Why is it that you are concerned about efficiency here, but not in your 
mythical database system?

> Again, you are assuming a tight
> integration between the functionality and setup of the filesystem and
> the backup software whereas I want to abstract away any such
> requirements as much as possible even at the expense of some extra
> overhead.

Somehow I don't see how having to install, tune, and maintain an 
otherwise unneeded database fits into the concept of abstracting away 
functionality.  You have to live with filesystems anyway so you might as 
well learn how to manage them.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 14:23:06 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > > >  > We already do have the issue of different optimizations for different 
 > >  > filesytems - and databases are even worse.
 > >  > 
 > > 
 > > Pick one or two database implementations that work on multiple
 > > platforms. Problem solved.
 > 
 > Yes, but you'll inherit the worst properties of both the database and 
 > the filesystem it sits on.
 > 
 > > Les, I understand that BackupPC as-is works perfectly for you on
 > > ZFS/Solaris.
 > 
 > It works well enough on ext3/Linux, which is what I'm actually using.  I 
 > think it might be even better on ZFS.
 > 
 > > However, you need to recognize that some of us have
 > > different setups and different needs. Just because you don't need an
 > > SUV for your transportation needs doesn't mean you can convince me
 > > that I don't need an SUV for my different transportation needs. Maybe
 > > it's even true that a database approach would measurably degrade
 > > performance (though I doubt it) but that doesn't mean that the
 > > tradeoffs of better flexibility, extendability, and transportability
 > > aren't worth it for other people.
 > 
 > I'm just being pragmatic. Backuppc generally works.  Linux is not 
 > difficult or expensive to obtain, nor is opensolaris.  Filesystems that 
 > support hardlinks aren't hard to find. There are ways to deal with 
 > copying filesystems.  When someone writes the database version I'll try 
 > it out, but right now it's all talk and not particularly convincing. 
 > While there are some theoretical points you can make about the attrib 
 > file, I can't recall anyone ever mentioning problems with it on the 
 > list. Nothing in my experience tells me that maintaining a database and 
 > a filesystem that need atomic synchronization is going to be better than 
 > a filesystem alone in any way that I care about - or that I would be 
 > able to trust files and database entries copied separately to another 
 > system. There are other places that could be bigger improvements in my 
 > opinion - for example in handling large growing files or storing 
 > additional attributes.

For me the biggest limitation is that it doesn't store extended
attributes and ACLs. I have looked (briefly) into extending that
functionality and in fact it was that effort that led me to think that
a database is the way to go. Otherwise adding additional attributes to
the attrib file just seems kludgey since there is no good general
purpose way to store/access them. Plus, the more attributes you store,
the slower it all will become since you will still in general need to
read in and unpack the entire attrib file to find just one piece of
information. Then my thinking was that once you start rewriting it all
in an abstracted, object-oriented fashion, you might as well go
the next step and store it in a database rather than having to
re-create all the database-like functionality.
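
For example, extended attributes and ACL blobs could go into a side table
keyed the same way as the rest of the metadata - again a purely hypothetical
sketch using SQLite via Perl DBI, with all names invented for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=backuppc-meta.db", "", "",
                       { RaiseError => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS file_xattr (
        host      TEXT    NOT NULL,
        backupnum INTEGER NOT NULL,
        path      TEXT    NOT NULL,
        name      TEXT    NOT NULL,   -- e.g. "system.posix_acl_access"
        value     BLOB,
        PRIMARY KEY (host, backupnum, path, name)
    )
});

# Reading one attribute of one file is a single indexed lookup, and adding
# a new kind of attribute requires no change to the storage format at all.
my ($value) = $dbh->selectrow_array(
    "SELECT value FROM file_xattr
      WHERE host = ? AND backupnum = ? AND path = ? AND name = ?",
    undef, "pc1", 34, "etc/passwd", "security.selinux");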

The second limitation that has been discussed ad nauseam is that hard
links make it difficult to move, copy, or split the pool across disks.

The other advantages that I mention then come along mostly for the
ride once you implement a database backend.



[BackupPC-users] Host Summary - Full Size: Wondering where it comes from

2009-06-02 Thread Boniforti Flavio
Hello list...

I was just playing around with occupied HDD space and wondered what the
"Full Size" values in the "Host Summary" depend on, i.e. where they originate.
Indeed, if I do "du -sh /var/lib/backuppc/pc/hostname1" I get totally
different values from the ones reported in the web interface (not to mention
the data actually stored on the remote host, but that's no problem
because I know there's pooling and compression going on).

Does anybody know why there's a difference between the above two values?

Thanks,
Flavio Boniforti.


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 14:36:24 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > >
 > >  > 
 > >  > There's only a small difference in scale here (and it's not obvious 
 > >  > which direction) between rsync'ing a raw database file and rsync'ing an 
 > >  > image copy of a filesystem.  There's probably not much of a practical 
 > >  > difference.
 > > 
 > > Except that I have a lot of other stuff on my filesystem so I don't
 > > want to image the whole filesystem.
 > 
 > That just takes some sensible planning...

Needs change, systems grow, systems get added. Not all of us can afford to
start with a lifetime of disk space... Not all of us have perfect
foresight when we set up the system the first time...

Suppose I want to change filesystems (say, move to ZFS) - exactly how do
I do that now, other than a file copy operation including hard links that
could take the better part of a day or more, assuming it doesn't crash or
slow to a crawl due to memory constraints?

 > 
 > > I just want to image the
 > > backups. Also, not all filesystems support efficient methods for
 > > imaging a partially filled filesystem.
 > 
 > Why is it that you are concerned about efficiency here, but not in your 
 > mythical database system?

I'm not talking about marginal efficiency. I'm talking about there not
being a *practical* way to replicate a large Backuppc pool  in any
reasonable amount of time given the number of hard links other than
imaging the entire filesystem.

 > 
 > > Again, you are assuming a tight
 > > integration between the functionality and setup of the filesystem and
 > > the backup software whereas I want to abstract away any such
 > > requirements as much as possible even at the expense of some extra
 > > overhead.
 > 
 > Somehow I don't see how having to install, tune, and maintain an 
 > otherwise unneeded database fits into the concept of abstracting away 
 > functionality.  You have to live with filesystems anyway so you might as 
 > well learn how to manage them.

I don't see any major requirement for tuning and maintaining a
metadata database unless you are doing huge enterprise-size backups.



[BackupPC-users] BackupPC 3.2.0 beta install for Ubuntu

2009-06-02 Thread Robert J. Phillips
I have generated a deb file and upgraded my BackupPC 3.1.0 to 3.2.0
beta.  Everything seems to be working fine.  

 

Can I submit the deb file to you?  What information do you need and how
should I submit it?

 

This program is so helpful.  I hope that I can be a contributor and help
others enjoy the benefits of BackupPC.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> 
> For me the biggest limitation is that it doesn't store extended
> attributes and ACLs. I have looked (briefly) into extending that
> functionality and in fact it was that effort that led me to think that
> a database is the way to go.

But that really has nothing to do with the way they are stored. The 
attrib mechanism could be extended or a parallel extended-attrib text 
file could be added when you have a client capable of passing the 
relevant data.

> Otherwise adding additional attributes to
> the attrib file just seems kludgey since there is no good general
> purpose way to store/access them. Plus, the more attributes you store,
> the slower it all will become since you will still in general need to
> read in and unpack the entire attrib file to find just one piece of
> information. Then my thinking was that once you start rewriting it all
> in an abstracted objected-oriented fashion then you might as well go
> the next step and store it in a database rather than having to
> re-create all the database-like functionality.

There's one big missing piece.  If you don't put the file contents in 
the database, how do you know that the name of the file that you have in 
the database actually refers to something that holds the correct 
content?  And if it doesn't, how do you fix it?  In the current scheme 
you have kernel-atomic operations arbitrating the concurrent processes 
that are writing things.  Can you match that reliability some other way? 
  With hardlinks, the pooled hash filename only exists in one place and 
if two processes try to create the same file one will fail and know it. 
  You can delete the whole pool subsequently and all of the links under 
the pc directories will still have the correct content.  With a database 
you have to create the hashed name in both the database and filesystem 
with separate things that can go wrong and a window of time where 
processes can compete, and you then must always trust that the filename 
holds that content for as long as any references exist.
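
The kernel-atomic arbitration in a nutshell - this sketch only shows how
link() itself settles the race between two writers; it is not BackupPC code
and the function name is illustrative:

use strict;
use warnings;
use Errno qw(EEXIST);

# Try to publish $tmp_file under the pooled name $pool_file.  link() is
# atomic: exactly one racing writer succeeds; the other sees EEXIST and
# knows the pool entry already has an owner.
sub claim_pool_entry {
    my ($tmp_file, $pool_file) = @_;
    return 1 if link($tmp_file, $pool_file);   # we created the pool entry
    return 0 if $! == EEXIST;                  # someone else got there first
    die "link $tmp_file -> $pool_file failed: $!";
}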

> The second limitation that has been discussed ad-nauseum is that hard
> links make it difficult to move, copy, or split the pool across disks.

Which has an assortment of easy solutions, the easiest of all being to 
just get a big disk.

> The other advantages that I mention then come along mostly for the
> ride once you implement a database backend.

But at the tradeoff of disconnecting the metadata like the actual 
filename from the content that it is supposed to represent.

-- 
Les Mikesell
 lesmikes...@gmail.com





Re: [BackupPC-users] Another BackupPC Fuse filesystem

2009-06-02 Thread Jeffrey J. Kosowsky
Pieter Wuille wrote at about 21:40:46 +0200 on Tuesday, June 2, 2009:
 > On Tue, Jun 02, 2009 at 12:31:54PM -0400, Jeffrey J. Kosowsky wrote:
 > > Pieter Wuille wrote at about 17:57:06 +0200 on Tuesday, June 2, 2009:
 > 
 > >  > If anyone's interested at trying/looking at it:
 > >  > https://svn.ulyssis.org/repos/sipa/backuppc-fuse/backuppcfs.pl
 > >  > 
 > 
 > >  > It is only tested on one 3.1 backuppc pool on a Ubuntu 8.04 system, and 
 > > not
 > >  > very extensively. It only opens files/directories in read-only mode, 
 > > thus
 > >  > shouldn't be able to damage a working backuppc pool if something goes 
 > > wrong.
 > >  > 
 > >  > I'd like to get some feedback; ideas, bugreports, ... are very welcome.
 > > 
 > > Just took a quick spin on it -- looks AWESOME!
 > > In my mind this is the way to go and *much* more useful and infinitely
 > > faster than trying to navigate the web interface. This gives me
 > > *exactly* what I want which is rapid access to all my backup files in
 > > a CLI format that allows me to apply common *nix utilities like 'cp', 
 > > 'less'
 > > 'grep', 'diff' etc. to examine and manipulate different backup
 > > versions.
 > Thanks :)
 > 
 > > Couple of questions/comments:
 > > 1. Is there anyway to get the root share directory to mount as '/'
 > >rather than as '_/"? Having root mount differently detracts a
 > >little from the naturalness of it all.
 > 
 > It shouldn't be too hard to change that. The only problem is what to do when
 > you have a '/' share and a '/etc' share, and the '/' share contains an 'etc'
 > dir/file - but that probably won't occur, and I agree it's a more natural way.

You could always add a flag that would add an (arbitrary) base name to
the root share if needed.

 > 
 > > 2. What happens when a new backup is run while the fusefs is mounted?
 > >Is there an easy way to get the new backup to appear automagically
 > >or do you need to unmount/remount?
 > Refreshing of nodes in the directory cache occurs when:
 > - an expire happens (depending on where in the filesystem, the TTL is
 >   either 8s, 4s or 100s (see the source code))
 > - there's a request for something inexisting. eg. if a new backup #35
 >   is created while the cache only has backups up to #34, and you'd do a
 >   "cd 35" even though "ls" does not show a dir '35', it should work
 >   (and the 35 should exist afterwards in the listing too) - untested though

Would a refresh of the parent node catch a newly created backup before
the timeout?

 > > 3. What happens when a backup is expired or deleted while the fusefs
 > >is mounted?
 > Completely untested... i assume a lot of errors might occur when you try to
 > read files - maybe empty files, crashes, ...
 > Errors don't propagate upwards in the tree, so only a refresh of the parent
 > node in the tree would fix it - so either an expire or the request of 
 > something
 > inexisting.

Probably should be tested at some point... and then behaviors adjusted
to minimize the nastiness of and maximize the recovery from any
resulting errors.


 > > 5. It might be nice to add a feature that allows you to get info on
 > >the backup itself - say time/day created, incremental level,
 > >etc. Also, it might be nice to be able to print an ascii tree of
 > >the backup hierarchy. I know this is not strictly speaking part of
 > >any fuse filesystem but it would be a nice companion CLI accessory
 > >since now when I start listing backups I realize that I only see
 > >the 'number' and don't get a good handle on the other data about
 > >the backup.
 > I've thought about this myself too - just not sure in what format and where.
 > The information you can get from backuppc about a host is limited, but you
 > can get somewhat more about specific backups. Creating one or more "files" in
 > a "host"-directory is quite simple, but inside the backup#-directories is
 > harder (since there may be real backup data already there if you have a '/'
 > share).
 > About the CLI tool: do you mean 'tree' ?

Yeah a tree something like the way pstree displays: e.g.,
  Machine 1:
    1
    |- 2
    |  |- 3
    |     |- 4
    |- 5
       |- 6
          |- 7
    8
    |- 9
       |- 10

  Machine 2:
    1
    |- 2
    |  |- 3
    |     |- 4
    |- 5
       |- 6
          |- 7
    8
    |- 9
       |- 10
 
 > > 6. If it is too complicated to make the filesystem writable at the
 > >individual file level by allowing the deletion of individual files,
 > >an easier first step would be to allow the deletion of backups (and
 > >all descendants backups). This is a *much* easier problem than
 > >individual file deletion and only requires deleting the root
 > >directory trees and deleting the corresponding lines from the
 > >'backup' log file.

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Peter Walter
Les Mikesell wrote:

> Somehow I don't see how having to install, tune, and maintain an 
> otherwise unneeded database fits into the concept of abstracting away 
> functionality.  You have to live with filesystems anyway so you might as 
> well learn how to manage them.
>   
Les has been a strong advocate for his position. However, backuppc as it 
is currently designed does not meet my need to remotely back up all kinds 
of computers, including other backuppc servers. I think the 
enhancements Jeffrey Kosowsky and I have outlined in this discussion 
would solve my problem as well as a number of other problems, 
significantly extend the functionality of backuppc, and also make it 
compatible with other platforms. I am therefore going to take this 
discussion over to the backuppc-devel list and ask Craig what he and others 
over there think. Hopefully, I can sucker a perl developer into 
coding it as an add-on to the current development release.

Peter



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Craig Barratt
Interesting thread!

Jeffrey writes:

> That being said, I agree that using a database to store both the
> hardlinks along with the metadata stored in the attrib files would be
> a more elegant, extensible, and platform-independent solution though
> presumably it would require a major re-write of BackupPC.
> 
> I certainly understand why BackupPC uses hardlinks since it allows for
> an easy way to do the pooling and in a sense as you suggest uses the
> filesystem as a rudimentary database.
> 
> On the other hand as I and others have mentioned before moving to a
> database would add the following advantages:

I agree on all the points.  If I was writing BackupPC today I
would seriously consider this approach.  As Les points out, the
most important open question (other than reliability) is whether
the performance is adequate as the store expands to millions of
files (and 10^8, 10^9 or more file blocks).  Of course, BackupPC
is relatively slow, so maybe the baseline expectation is already
sufficiently low.

I recently heard about lessfs, which runs on top of FUSE to provide
a file system that does block-level de-duplication.  See:

http://www.lessfs.com
https://sourceforge.net/project/showfiles.php?group_id=257120
http://tokyocabinet.sourceforge.net/index.html

The actual storage is several very large (sparse?) files on any
file system(s) of your choice.  It should provide all the benefits
you expect: no issues of local limitations on hardlink counts,
meta-data etc, and the database files can be copied or rsynced.
I'm corresponding with the author to see if some additional useful
features could be added.

Yes, taking this approach would require a very substantial rewrite.
BackupPC would become a lot simpler.  But it also creates a significant
issue of backward compatibility.  The only solution would be to provide
tools that import the old BackupPC store into a new one.  That is
possible, but would likely be very slow.

Craig



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Peter Walter wrote:
> 
>> Somehow I don't see how having to install, tune, and maintain an 
>> otherwise unneeded database fits into the concept of abstracting away 
>> functionality.  You have to live with filesystems anyway so you might as 
>> well learn how to manage them.
>>   
> Les has been a strong advocate for his position. However, backuppc as it 
> is currently designed does not meet my need to remotely back up all kinds 
> of computers, including other backuppc servers. I think the 
> enhancements Jeffrey Kosowsky and I have outlined in this discussion 
> would solve my problem as well as a number of other problems, 
> significantly extend the functionality of backuppc, and also make it 
> compatible with other platforms. I am therefore going to take this 
> discussion over to the backuppc-devel list and ask Craig what he and others 
> over there think. Hopefully, I can sucker a perl developer into 
> coding it as an add-on to the current development release.

Backing up other backuppc servers is really a special case that might 
deserve a special optimization.   But, I'm not sure that adding a 
database automatically makes it any easier - unless you are thinking of 
a common database that could arbitrate a common hashed filename that is 
unique across all instances for every piece of content.  That's an 
interesting idea but seems kind of fragile.

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Craig Barratt wrote:
> 
> Yes, taking this approach would require a very substantial rewrite.
> BackupPC would become a lot simpler.  But it also creates a significant
> issue of backward compatibility.  The only solution would be to provide
> tools that import the old BackupPC store into a new one.  That is
> possible, but would likely be very slow.

One simple thing that I've sometimes thought would be useful would be a 
way to re-create the pool links down any PC tree.  That way you could 
rsync individual pc directories to an offsite location (which usually 
works OK even with -H, though I suppose there is some limit), and then 
reconstruct the pooling to reclaim the duplicate space, repeating for 
each host or new backup without ever having to deal with the size of the 
combined pool/pc tree.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Skip Guenter
I don't know if this is a factor or not, but an implementation like this
sounds like it would cause a (perhaps very) small portion of the
BackupPC user base to go by the wayside.  I'm talking about the folks
who have a full-time job that doesn't include "SysAdmin" but are trying
to keep a small-office-like environment backed up with minimal hardware
and skills.

If the target user base is large-scale implementations then, obviously,
this isn't a factor.  If BackupPC, as a package, is intended to address
a wide range of implementation sizes, then I think this must factor in.

This list seems dominated by SysAdmin types, and that's understandable
and makes for a great source of knowledge.  However, I can't help but
wonder how many little shmucks like me are out there happily using (or
getting ready to use) this package in sub-40 or even sub-20 machine
environments.  I don't think y'all hear from them much.

Skip


On Tue, 2009-06-02 at 09:50 -0500, Les Mikesell wrote:
> Jeffrey J. Kosowsky wrote:
> > 
> > In my mind the only major reason not to move to a database
> > architecture is that it would require a substantial re-write of
> > BackupPC as pointed out in my earlier note.
> 
> Do you actually have any experience with large scale databases?  I think 
> most installations that come anywhere near the size and activity of a 
> typical backuppc setup would require a highly experienced DBA to 
> configure and would have to be spread across many disks to have adequate 
> performance.  When you get down to the real issues, normal operation has 
> a bottleneck with disk head motion which a database isn't going to do 
> any better without someone knowing how to tune it across multiple disks. 
> Also, while some database do offer remote replication, it isn't 
> magic either and keeping it working isn't a common skill.
> 




Re: [BackupPC-users] BackupPC 3.2.0 beta install for Ubuntu

2009-06-02 Thread royden yates
On Tue, 2009-06-02 at 15:10 -0500, Robert J. Phillips wrote:
> I have generated a deb file and upgraded my BackupPC 3.1.0 to 3.2.0
> beta.  Everything seems to be working fine.  
> 
>  
> 
> Can I submit the deb file to you?  What information do you need and
> how should I submit it?
> 
>  
> 
> This program is so helpful.  I hope that I can be a contributor and
> help others enjoy the benefits of BackupPC.
> 
I am unsure of policy here, but backuppc-de...@lists.sourceforge.net is
where you want to be making this offer, as SF is the official download
site and only the devs can upload.

Could you offer it as a download, and then place a link in the backuppc
wiki?

I would love to give it a spin.

Rgrds,

ryts





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 16:12:19 -0500 on Tuesday, June 2, 2009:
 > Craig Barratt wrote:
 > > 
 > > Yes, taking this approach would require a very substantial rewrite.
 > > BackupPC would become a lot simpler.  But it also creates a significant
 > > issue of backward compatibility.  The only solution would be to provide
 > > tools that import the old BackupPC store into a new one.  That is
 > > possible, but would likely be very slow.
 > 
 > One simple thing that I've sometimes thought would be useful would be a 
 > way to re-create the pool links down any PC tree.  That way you could 
 > rsync individual pc directories to an offsite location which usually 
 > works OK even with -H (I suppose there is some limit..) and then 
 > reconstruct the pooling to reclaim the duplicate space, repeating for 
 > each host or new backup without ever having to deal with the size of the 
 > combined pool/pc tree.
 > 

This would be a lot easier if the footer of the pool files included
the md5sum checksum name of each pool file. For compressed pool files,
this would just mean extending the footer by 16 bytes. Actually for
speed, it would probably be better to store this information in the
header after the initial magic byte. Even better add another couple of
bytes to identify which element of the chain is being referenced when
multiple files have the same pool hash (note: this would have to be
adjusted when BackupPC_nightly runs and re-arranges the chain
numbering).

Then you could pretty easily find the corresponding pool file from any
of the hard links without the usual reverse-lookup problem when trying
to identify the pool file from the hard-link inode in the pc file.

Backing up the BackupPC data would then be as simple as the following:
1. Shutdown BackupPC
2. Copy the pool to the new destination (no hard links)
3. Recurse through the pc directories as follows:
- Copy directory entries to the new destination (i.e. recreate
  directories using something like mkdir)
- Copy regular files with nlinks=1 to the new destination
- For hard-linked files, use the header (or footer) to find the
  cpool pathname (reconstructed from the hash and the chain
  number). Then create the corresponding link on the new
  destination.
4. Restart BackupPC

If you don't add the pool hash information to the cpool file
header/footer, then you could still do a similar process by adding an
intermediate step (say 2.5) of creating a lookup table by recursing
through the pool and associating inodes with cpool entries. Then in
step 3 you would use the inode number of each hard-linked file in the
pc directory to look up the corresponding link that needs to be
created. This would require some cleverness to make the lookup fast
for large pools where the entire table might not fit into memory. My
only concern is that this may require O(n^2) or O(nlogn) operations
vs. the O(n) for the first method.
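
In rough Perl, that "step 2.5" variant might look something like the sketch
below (untested illustration only; the paths assume the default
/var/lib/backuppc layout, and it just prints the links it would create):

use strict;
use warnings;
use File::Find;

my $cpool = '/var/lib/backuppc/cpool';
my $pc    = '/var/lib/backuppc/pc';
my %pool_by_inode;

# Step 2.5: map pool inodes to pool paths (this table can get very large).
find(sub {
    my @st = lstat($_) or return;
    return unless -f _;
    $pool_by_inode{$st[1]} = $File::Find::name;
}, $cpool);

# Step 3: for each multiply-linked pc file, report the pool file it shares
# an inode with; a real copy tool would call link() on the destination.
find(sub {
    my @st = lstat($_) or return;
    return unless -f _ && $st[3] > 1;
    if (my $src = $pool_by_inode{$st[1]}) {
        print "link $src -> $File::Find::name\n";
    }
}, $pc);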





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Max Hetrick
Skip Guenter wrote:
> I don't know if this is a factor or not, but an implementation like this
> sounds like it would cause a (perhaps very) small portion of the
> BackupPC user base to go by the wayside.  I'm talking about the folks
> who have a full time job that doesn't include "SysAdmin" but are trying
> to keep a small office like environment backed up with minimal hardware
> and skills.
> 
> If the target user base is large scale implementations then, obviously
> this isn't a factor.  If BackupPC, as a package, is intended to address
> a wide range of implementation sizes then I think this must factor in.
> 
> This list seems dominated by SysAdmin types and that's understandable
> and makes for a great source of knowledge.  However, I can't help but
> wonder how many little shmucks like me are out there happily using (or
> getting ready to use) this package in sub 40 or even sub 20 machine
> environments.  I don't think ya'll hear from them much.

I have to agree here. I came to BackupPC to replace an rsnapshot server 
we had doing backups. It was a pieced together system with custom 
scripts and all the things that make for failing backups. BackupPC 
persuaded me because of the ease of setup and the lack of complication in 
setting up new hosts, etc. The compression was a wonderful bonus as 
well, but not really the point. I liked it so much that I wrote a guide 
for installing it on CentOS, which is published on their wiki.

At any rate, I back up about 21 hosts and 500 GB of data. I bet the 
majority of people using it are like me. I consider myself a pretty 
knowledgeable person overall using Linux, and have been using 
RHEL/CentOS for about 6 or 7 years now. It's my job at my company, plus 
I write technical articles online for a publication, but I am by no 
means a filesystem and database guru.

I think that anything to make the program better is welcome, but at what 
cost? If complexity is added where people now have to have knowledgeable 
people available in the database world, or in the filesystem world that 
BackupPC is running on, then what advantage does that bring?

For my applications, BackupPC works wonderfully exactly the way it is. I'm 
not running enterprise-grade stuff here though, so my opinion is of the 
little guy type. Everyone's situation is different, though. Just my 
thoughts, which might not mean much. It's just that the more I read the 
thread, the more it seems that the typical BackupPC user isn't chiming 
in. :)

Regards,
Max



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 16:10:41 -0500 on Tuesday, June 2, 2009:
 > Peter Walter wrote:
 > > 
 > >> Somehow I don't see how having to install, tune, and maintain an 
 > >> otherwise unneeded database fits into the concept of abstracting away 
 > >> functionality.  You have to live with filesystems anyway so you might as 
 > >> well learn how to manage them.
 > >>   
 > > Les has been a strong advocate for his position. However, backuppc as it 
 > > is currently designed does not meet my need to remotely backup all kinds 
 > > of  computers, including other backuppc servers. I think the 
 > > enhancements Jeffrey Kosowsky and I have outlined in this discussion 
 > > would solve my problem, as well as a number of other problems, would 
 > > significantly extend the functionality of backuppc, and also make it 
 > > compatible with other platforms. I am therefore going to take this 
 > > discussion over to the backuppc-devel and ask Craig what he and others 
 > > over there think. Hopefully, I can sucker a perl developer to start 
 > > coding it as an add-on to the current development release.
 > 
 > Backing up other backuppc servers is really a special case that might 
 > deserve a special optimization.   But, I'm not sure that adding a 
 > database automatically makes it any easier - unless you are thinking of 
 > a common database that could arbitrate a common hashed filename that is 
 > unique across all instances for every piece of content.  That's an 
 > interesting idea but seems kind of fragile.
 > 

Once we are talking about redoing things, I would prefer to use a
full md5sum hash for the name of the pool file. You end up
calculating this anyway for free when you use the rsync method
(although with protocol <=28, you get a full file md4sum but with
protocol >=30, I believe you have the true md5sum). This would
simplify the ambiguity of having multiple indexed chain entries with
the same partial md5sum.

With this approach then you would automatically have "a common hashed
filename that is ['statistically'] unique across all instances for
every piece of content."
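
(For illustration only - this is not how BackupPC currently names pool
files - a full-content MD5 is cheap to compute in Perl with Digest::MD5;
the script below is a sketch, and the pool-naming use is hypothetical:)

use strict;
use warnings;
use Digest::MD5;

# Hypothetical: use the full-content MD5 hex digest as the pool file name.
sub full_md5_name {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;  # 32 hex chars
    close $fh;
    return $digest;
}

print full_md5_name($ARGV[0]), "\n" if @ARGV;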



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
> 
>  > > Except that I have a lot of other stuff on my filesystem so I don't
>  > > want to image the whole filesystem.
>  > 
>  > That just takes some sensible planning...
> 
> Needs change, systems grow, systems added. Not all of us can afford to
> start with a lifetime of disk space... Not all of us have perfect
> foresight when we set up the system the first time...
> 
> Suppose I want to change filesystems (say move to ZFS), exactly how do
> I do that now other than to do a file copy operation including hard
> links that could take the better part of a day or more, assuming it
> doesn't crash or slow to a crawl due to memory constraints.

The easy way is to build a new one the way you want while keeping the 
old one around for as long as its history is relevant.  Or, if you have 
been making offsite copies and are prepared to access them, just use 
that for your history or emergencies while the new one fills in.

When I've done this sort of thing I've usually used rsync (-aH) to copy 
a few of the individual pc directories that are backed up over slow 
remote links and not worried about their pool entries since the new 
drive was large enough and I'm assuming that the pool will fill in and 
the old entries will expire in a few weeks.  The local machines can all 
complete full backups over a weekend so if I start on a Friday, 
everything is on the new drive by Monday.

>  > > I just want to image the
>  > > backups. Also, not all filesystems support efficient methods for
>  > > imaging a partially filled filesystem.
>  > 
>  > Why is it that you are concerned about efficiency here, but not in your 
>  > mythical database system?
> 
> I'm not talking about marginal efficiency. I'm talking about there not
> being a *practical* way to replicate a large Backuppc pool  in any
> reasonable amount of time given the number of hard links other than
> imaging the entire filesystem.

But what's the objection to imaging the filesystem?  It works.  And 
anything up to 2TB can be imaged to a single disk that you can toss in 
your briefcase, ready to go anywhere.

>  > Somehow I don't see how having to install, tune, and maintain an 
>  > otherwise unneeded database fits into the concept of abstracting away 
>  > functionality.  You have to live with filesystems anyway so you might as 
>  > well learn how to manage them.
> 
> I don't see the any major requirement for tuning and maintaining a
> metadata database unless you are doing huge enterprise size backups.

But I don't need a database at all - or to deal with any of the things 
that can go wrong with one.  It's a lot more complexity than just 
mounting a partition in the right place.

-- 
Les Mikesell
  lesmikes...@gmail.com



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
>  > 
>  > One simple thing that I've sometimes thought would be useful would be a 
>  > way to re-create the pool links down any PC tree.  That way you could 
>  > rsync individual pc directories to an offsite location which usually 
>  > works OK even with -H (I suppose there is some limit..) and then 
>  > reconstruct the pooling to reclaim the duplicate space, repeating for 
>  > each host or new backup without ever having to deal with the size of the 
>  > combined pool/pc tree.
>  > 
> 
> This would be a lot easier if the footer of the pool files included
> the md5sum checksum name of each pool file. For compressed pool files,
> this would just mean extending the footer by 16 bytes. Actually for
> speed, it would probably be better to store this information in the
> header after the initial magic byte. Even better add another couple of
> bytes to identify which element of the chain is being referenced when
> multiple files have the same pool hash (note: this would have to be
> adjusted when BackupPC_nightly runs and re-arranges the chain
> numbering).
> 
> Then you could pretty easily find the corresponding pool file from any
> of the hard links without the usual reverse-lookup problem when trying
> to identify the pool file from the hard-link inode in the pc file.

I'm not sure that would help much.  In the scenario I mentioned, the 
pool file won't exist at all or the matching content may have been 
re-linked with a different name due to differences in collisions.  I'd 
just like to be able to go through more or less the same motions the 
original server did when adding new entries, but on a backup copy or 
after an initial copy to a replacement server.  Then it would also need 
to periodically clean the pool like BackupPC_nightly if it is just a backup.

> Backing up the BackupPC data would then be as simple as the following:
> 1. Shutdown BackupPC
> 2. Copy the pool to the new destination (no hard links)
> 3. Recurse through the pc directories as follows:
>   - Copy directory entries to the new destination (i.e. recreate
> directories using something like mkdir)
>   - Copy regular files with nlinks=1 to the new destination
>   - For hard-linked files, use the header (or footer) to find the
> cpool pathname (reconstructed from the hash and the chain
> number). Then create the corresponding link on the new
> destination.
> 4. Restart BackupPC

This can sort-of be done now with BackupPC_tarPCCopy as long as nothing 
changes between copying the pool and getting the pc tree copy with it. 
But, I'd like something that would work for only a subset of the hosts 
or would let the remote copy remove old backups at a slower or faster 
pace than the master.

> If you don't add the pool hash information to the cpool file
> header/footer, then you could still do a similar process by adding an
> intermediate step (say 2.5) of creating a lookup table by recursing
> through the pool and associating inodes with cpool entries. Then in
> step 3 you would use the inode number of each hard-linked file in the
> pc directory to look up the corresponding link that needs to be
> created. This would require some cleverness to make the lookup fast
> for large pools where the entire table might not fit into memory. My
> only concern is that this may require O(n^2) or O(nlogn) operations
> vs. the O(n) for the first method.

But this requires the pool to be in sync with the master.  I'd rather 
recompute a hash and deal with collisions like the server normally does. 
  The structure of the pool is designed to make this reasonably fast, 
although I think the hash is based on the uncompressed content and you'd 
have a compressed copy at this point.

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

first of all, could you all please write just a little bit slower? :)

Les Mikesell wrote on 2009-06-02 16:10:41 -0500 [Re: [BackupPC-users] Backing 
up a BackupPC server]:
> Peter Walter wrote:
> > Les has been a strong advocate for his position. However, backuppc as it 
> > is currently designed does not meet my need to remotely backup all kinds 
> > of  computers, including other backuppc servers.

I honestly doubt that a BackupPC pool is something you would want to backup
with BackupPC just like any other set of files. For one, you're compressing
already compressed content. Then, you are pooling that with normal content
(which is unlikely to give any hits). Finally, you can do practically nothing
sensible except restore the complete thing. Do you really want a backup
history of a BackupPC pool? Would you not prefer the ability to access
individual files *within* the pool? You *might* want to restore the complete
pool if something goes wrong, but you might just as well (need to) start off
fresh in such a case and just want to keep the history (meaning *one history*,
not a history of histories).

> > I think the 
> > enhancements Jeffrey Kosowsky and I have outlined in this discussion 
> > would solve my problem, as well as a number of other problems, would 
> > significantly extend the functionality of backuppc, and also make it 
> > compatible with other platforms.

The one basic issue that everyone has so far been ignoring is that you would
need an atomic operation spanning database and file system. The link(2) and
unlink(2) system calls are atomic for free. I'm sure there are ways to emulate
such an operation, but I somehow doubt that is going to improve performance or
reliability, or reduce complexity.

Somehow I feel much better about many hardlinks pointing to one piece of
content than only one, to which only the database knows what it represents.
Just imagine what human or software failure (including the file system!) can
do with a couple of unlink(2) system calls in either case - not to mention
what happens if you clobber your database. Sure, you can
"cat /vmlinuz > /dev/hda1" just as well, but you at least need to be root for
that, which the BackupPC software is not.

Aside from that, you can't check the consistency of (database + FS) without a
special tool just for this purpose, because there is no innate relationship
between database content and FS content.

> > I am therefore going to take this 
> > discussion over to the backuppc-devel and ask Craig what he and others 
> > over there think.

Do you think that will differ from what they think over here?


The issue Jeffrey has been adressing - attrib files - is, in my opinion, just
one little (and maybe unimportant) part of the story. I don't see much
difference between implementing this as it is or with a database. If you're so
concerned about performance here (and not about storage requirements), store
attrib files uncompressed and unpooled. You'll waste space, but so will a
database. Why introduce a single point of failure?

The important step is reference counting, and that doesn't happen in attrib
files.

Regards,
Holger

P.S.: I don't really see why you would need to access several *attrib files*
  for creating a view of one directory. The files are spread out, that
  much is true, but shouldn't a single attrib file give you the whole
  picture?



Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread David Williams
I did finally get around to hacking my drive and enabling NFS.  I can now
mount it as NFS and backuppc is now working again :)

However, there might still be something wrong with my setup.

I am running Mandriva 2009.0 and installed backuppc from the Mandriva
repository.

I updated the config.pl file and one of the things that I did change was the
TopDir parameter, which I set to /backups

My backups are currently going to /backups/pc/<host>, where <host> is the host
that I am backing up.

I was re-reading the email below about the pool files and I didn't see
anything under /backups/cpool or /backups/pool.

On further investigation I do see files/directories under
/var/lib/backuppc/cpool.  By changing the TopDir parameter in the config.pl
file have I messed anything up ?  Backups seem to be working (very slowly),
but they are working.  I am assuming that this is ok but not sure.

Under /var/lib/backuppc/cpool are directories like 0/ 1/ 2/ 3/, etc but when
I drill down into some of these directories I don't see any files.

Would appreciate some help.


David Williams


-Original Message-
From: Les Mikesell [mailto:lesmikes...@gmail.com] 
Sent: Thursday, March 26, 2009 2:01 PM
To: dwilli...@dtw-consulting.com; General list for user discussion,
questions and support
Subject: Re: [BackupPC-users] External WD Worldbook Device

David Williams wrote:
> Les,
> 
> I believe that under the Mandriva package we are allowed to change the
> TopDir and it is in the config.pl file.

That's not likely. When you install from source, the configuration 
process modifies the code, embedding this location elsewhere.  If you 
don't re-run that process it won't be right.

> When I originally started Backuppc
> it was complaining that it could not create a hardlink under
> /backups/pc/ and /backups/cpool.  It wasn't looking at
> /var/lib/backuppc.  Even with version 3.0 I had TopDir set to /backups and
> backups were being created under /backups.

The test in 3.1 was added to catch this kind of mistake.

> Perhaps it has been 'failing', but I had valid backups that I have
restored
> from.  Whether it was wasting space I don't know, perhaps it was.  Also,
> even though I cannot currently do backups I can restore from existing
> backups and the restore is coming from /backups.  Based upon that I would
> say that the Mandriva package allows the change of the TopDir.

If pooling had ever worked you should have a large tree of files with 
long hash codes for names under the pool or cpool directory.  Does 'ls 
-lR' or 'find' show them?   Backups are stored under the appropriate pc 
directories in any case, but the pooling that eliminates duplicate 
copies can only work when the hardlinks succeed.  If you do have pooled 
files (and ls -l should show link counts > 1), then there must be 
something different about the mount options on the new system.

-- 
   Les Mikesell
 lesmikes...@gmail.com









Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread Les Stott

Les Mikesell wrote:

Tate1 wrote:
  

Thanks to all ! and sorry if I don't understand some answers properly but as i 
said my english is not very good.

Ok, i'll talk with my boss and see what he let me to do with the server. If 
there isn't any other solution I will do RAID between disks.

Just for curious: could I install 2 BACKUPPC's, one in each HDD? I mean, Disk1 
like now with the O.S. and the actual backuppc with PC1, and install another 
backuppc in Disk2 with pc2.

probably be a nonsense... but it's an idea



That is theoretically possible but would take some code changes to avoid 
conflicts with locations and network ports.  It might be easier to run 
vmware or virtualbox so it looks like a completely different machine 
that could run a stock version of backuppc as the 2nd instance.


  
You can install multiple instances of BackupPC on the one machine, each 
having its own web interface and its own data store. I have done it, and it 
does work, but it isn't pretty. In my case I have BackupPC on the main 
filesystem doing backups of network PCs, and the second instance is 
installed on a USB drive backing up the localhost (but not the local 
BackupPC data store) for offsite backup, including rotating USB disks.
The BackupPC on the USB drive is slow to write backups - about 1 MB/s, which 
makes backups of 40-80 GB take 10-12 hours - and I've had issues where the 
two CGI interfaces I created get confused over which BackupPC instance 
they should be talking to, and consequently you start editing the wrong 
configuration files.


While it does work, if I had my time over again I would definitely choose 
a different approach such as VMware. All you need is plenty of memory, 
which is relatively cheap.


regards,

Les Stott


Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
>  > 
>  > Backing up other backuppc servers is really a special case that might 
>  > deserve a special optimization.   But, I'm not sure that adding a 
>  > database automatically makes it any easier - unless you are thinking of 
>  > a common database that could arbitrate a common hashed filename that is 
>  > unique across all instances for every piece of content.  That's an 
>  > interesting idea but seems kind of fragile.
>  > 
> 
> Once we are talking about redoing things, I would prefer to use a
> full md5sum hash for the name of the pool file. You end up
> calculating this anyway for free when you use the rsync method
> (although with protocol <=28, you get a full file md4sum but with
> protocol >=30, I believe you have the true md5sum). This would
> simplify the ambiguity of having multiple indexed chain entries with
> the same partial md5sum.
> 
> With this approach then you would automatically have "a common hashed
> filename that is ['statistically'] unique across all instances for
> every piece of content."

Somehow the number of possible different file contents and the number of 
possible md5sums don't seem quite statistically equivalent to me.  And 
then there's:

http://www.mscs.dal.ca/~selinger/md5collision/

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

Max Hetrick wrote on 2009-06-02 17:43:29 -0400 [Re: [BackupPC-users] Backing up 
a BackupPC server]:
> Skip Guenter wrote:
> > [...] I'm talking about the folks
> > who have a full time job that doesn't include "SysAdmin" but are trying
> > to keep a small office like environment backed up with minimal hardware
> > and skills.
> > [...]
> > This list seems dominated by SysAdmin types and that's understandable
> > and makes for a great source of knowledge.  However, I can't help but
> > wonder how many little shmucks like me are out there happily using (or
> > getting ready to use) this package in sub 40 or even sub 20 machine
> > environments.  I don't think ya'll hear from them much.

I tend to disagree. While my job *does* include system administration, it's
not that I'm desperately looking for ways to find more to do. I currently run
BackupPC at 2 client sites.

At site 1, I convinced my client that backups are better than no backups,
even if the data is, in his opinion, not crucial. With my knowledge of
BackupPC at the time, it was installed practically for free, costing just
some disk space that was available anyway.

At site 2, we've been running BackupPC alongside a tape-based backup scheme
(with some free but really crappy software) for a while, also because I was
familiar with the software and setting things up was no big deal. Since Debian
etch, the tape-based software won't run anymore. We first kept it running in a
chroot (with lots of bind mounts for the data), but have recently abandoned
it, because BackupPC does all we need, does it reliably and practically
maintenance-free. We back up one file server, one web server, one mail server
and one and a half notebooks. Oh, and one workstation at a remote location.
Not really an "enterprise type" setup.

When I find the time, I'll install it for my own machines. The issue here is
deciding which machine and which disk space to run it on, not setting it up.

> I consider myself a pretty 
> knowledgeable person overall using Linux, and have been using 
> RHEL/CentOS for about 6 or 7 years now. It's my job at my company, plus 
> I write technical articles online for a publication, but I am by no 
> means a filesystem and database guru.

And even if you were, how much time would you *want* to spend on tuning
performance and maintenance tasks?

> I think that anything to make the program better is welcome, but at what 
> cost? If complexity is added where people now have to have knowledgeable 
> people available in the database world, or in the filesystem world that 
> BackupPC is running on, then what advantage does that bring?

Especially, as has been noted by Les before, if you need that knowledge at the
precise moment in time when everything has gone wrong and you just desperately
need access to your backups?

> For my applications, BackupPC works wonderfully exactly the way it is. I'm 
> not running enterprise-grade stuff here though, so my opinion is of the 
> little guy type. Everyone's situation is different, though. Just my 
> thoughts, which might not mean much. It's just that the more I read the 
> thread, the more it seems that the typical BackupPC user isn't chiming 
> in. :)

I'm rather confident that most people will agree with this: I'd rather have
BackupPC be a piece of software that is easy to install, run and maintain,
without needing any additional knowledge, than not, if that is the only
difference.

If it is effectively feasible to implement certain features using a database,
and this speeds things up and has no downsides, then why not?

At this point, "enterprise grade users" will probably say, "we have
performance problems, and we'd be willing to overcome these by adding
complexity, even if it makes installing, running and maintaining BackupPC
more difficult". People *not* experiencing performance problems will tend
to disagree (and count me amongst them).

One thing to think about: is achieving the goal of making it easier to
replicate the pool off-site worth the risk that you may effectively *have to*
replicate your pool just to be sure a single crash won't wipe out your
backup history?

Regards,
Holger



Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread Les Mikesell
David Williams wrote:
> I did finally get around to hacking my drive and enabling NFS.  I can now
> mount it as NFS and backuppc is now working again :)
> 
> However, there might still be something wrong with my setup.
> 
> I am running Mandriva 2009.0 and installed backuppc from the Mandriva
> repository.
> 
> I updated the config.pl file and one of the things that I did change was the
> TopDir parameter, which I set to /backups
> 
> My backups are currently going to /backups/pc/<host>, where <host> is the host
> that I am backing up.
> 
> I was re-reading the email below about the pool files and I didn't see
> anything under /backups/cpool or /backups/pool.
> 
> On further investigation I do see files/directories under
> /var/lib/backuppc/cpool.  By changing the TopDir parameter in the config.pl
> file have I messed anything up ?  Backups seem to be working (very slowly),
> but they are working.  I am assuming that this is ok but not sure.
> 
> Under /var/lib/backuppc/cpool are directories like 0/ 1/ 2/ 3/, etc but when
> I drill down into some of these directories I don't see any files.
> 
> Would appreciate some help.

If you installed from scratch from the sourceforge tarball you would be 
able to change the storage location, but the location is embedded in the 
code during the install process.  If you are using a version packaged 
for a distribution, this step has already been done and you can't change 
it with just the TopDir setting.  You really need to mount the disk at 
/var/lib/backuppc (copy everything that needs to be there first). You'll 
probably also need to mount the disk async to get any performance.

-- 
   Les Mikesell
lesmikes...@gmail.com





Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

Les Mikesell wrote on 2009-06-02 17:32:24 -0500 [Re: [BackupPC-users] Backing 
up a BackupPC server]:
> Jeffrey J. Kosowsky wrote:
> > [...]
> > Once we are talking about redoing things, I would prefer to use a
> > full md5sum hash for the name of the pool file. [...]
> > With this approach then you would automatically have "a common hashed
> > filename that is ['statistically'] unique across all instances for
> > every piece of content."
> 
> Somehow the number of possible different file contents and the number of 
> possible md5sums don't seem quite statistically equivalent to me.  And 
> then there's:
> 
> http://www.mscs.dal.ca/~selinger/md5collision/

first of all, if you are *not* using rsync, you *don't* get a *full* md5sum
hash for free or even cheap. You (Jeffrey) know the code well enough to
realize that BackupPC goes to great pains to avoid writing to the pool disk
unless necessary. If you need to transfer the whole file (of arbitrary size)
before you can look up the pool entry, you *have to* write a temporary copy
(probably compressed, too, giving up the benefits you gain from only
compressing once and decompressing when matching). You have to handle
collisions just the same (meaning re-reading your temporary copy and comparing
to the pool file). Yuck.

Yes, you can special-case small files that fit into memory, but yuck just the
same.

If you use a *partial* md5sum, there's no gain from rsync, and you trivially
get collisions just like you do now.

That is not to say, if we end up using a database, that it would not be a good
idea to store the full md5sum in the database. In fact, with a database, file
names would be somewhat arbitrary, and I'd propose keeping them *short* for
the sake of rsync et al. and file lists.

Regards,
Holger



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Steve Willoughby
Max Hetrick wrote:
> Skip Guenter wrote:
>> This list seems dominated by SysAdmin types and that's understandable
>> and makes for a great source of knowledge.  However, I can't help but
>> wonder how many little shmucks like me are out there happily using (or
>> getting ready to use) this package in sub 40 or even sub 20 machine
>> environments.  I don't think ya'll hear from them much.

Yeah, it's a mixture.  I have a sysadmin background but am using
BackupPC at home to back up my little network of a half-dozen systems
and a couple of TB of data, and am happy with the way it's designed as it is.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2009-06-02 17:40:23 -0400 [Re: [BackupPC-users] 
Backing up a BackupPC server]:
> [...]
> Backing up the BackupPC data would then be as simple as the following:
> 1. Shutdown BackupPC
> 2. Copy the pool to the new destination (no hard links)
> 3. Recurse through the pc directories as follows:
>   - Copy directory entries to the new destination (i.e. recreate
> directories using something like mkdir)
>   - Copy regular files with nlinks=1 to the new destination
>   - For hard-linked files, use the header (or footer) to find the
> cpool pathname (reconstructed from the hash and the chain
> number). Then create the corresponding link on the new
> destination.
> 4. Restart BackupPC
> 
> If you don't add the pool hash information to the cpool file
> header/footer, then you could still do a similar process by adding an
> intermediate step (say 2.5) of creating a lookup table by recursing
> through the pool and associating inodes with cpool entries. Then in
> step 3 you would use the inode number of each hard-linked file in the
> pc directory to look up the corresponding link that needs to be
> created. This would require some cleverness to make the lookup fast
> for large pools where the entire table might not fit into memory. My
> only concern is that this may require O(n^2) or O(nlogn) operations
> vs. the O(n) for the first method.

you do, of course, realize that I've implemented most of that (after all, I
wrote so [1] in a reply to one of your messages [2]) - far enough to use it
myself for a local pool copy of an admittedly rather small pool (103 GB, 10
million directory entries, 4 million inodes). Nobody seemed to care. I've had
more important things to do, so I didn't continue my work on that subject.

Hope that helps.

Regards,
Holger

[1] <20081209031017.gm...@gratch.parplies.de>
[2] <18749.12572.675749.745...@consult.pretender>



Re: [BackupPC-users] How to use backuppc with TWO HDD

2009-06-02 Thread dan
On Tue, Jun 2, 2009 at 12:36 AM, Adam Goryachev <
mailingli...@websitemanagers.com.au> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> dan wrote:
> > Unfortunately there is a 'rebuild hole' in many redundant
> > configurations.  In RAID1 that is when one drive fails and just one
> > remains.  This can be eliminated by running 3 drives so that 1 drive can
> > fail and 2 would still be operational.
> >
> > There are plenty of charts online to give % of redundancy for regular
> > RAID arrays.
>
> I must admit, this is something I have never given a lot of thought
> to... Then again, I've not yet worked in an environment with large
> numbers of disks. Of course, that is no excuse, and I'm always
> interested in filling in knowledge gaps...
>
> Is it really worthwhile considering a 3 drive RAID1 system, or even a 4
> drive RAID1 system (one hot spare). Of course, worthwhile depends on the
> cost of not having access to the data, but from a "best practice" point
> of view. ie, Looking at any of the large "online backup" companies, or
> gmail backend, etc... what level of redundancy is considered acceptable.
> (Somewhat surprising actually that google/hotmail/yahoo/etc have ever
> lost any data...)
>

Redundancy is the key for these companies.  They use databases that can be
spread out among servers and replicated many times across their network.
Google, for instance, could have 20 copies of data on different servers, and a
catastrophic loss at one facility has no effect on the whole (or little
effect anyway).

I might also add that these companies have a lot of losable data.  Caches
for websites are simply rebuilt in the event data is lost.


>
> > With a modern filesystem capable of multiple copies of each file this
> > can be overcome. ZFS can handle multiple drive failures by selecting the
> > number of redundant copies of each file to store on different physical
> > volumes.  Simply put, a ZFS RAIDZ with 4 drives can be set to have 3
> > copies which would allow 2 drives to fail.  This is somewhat better than
> > RAID1 and RAID5  both because more storage is available yet still allows
> > up to 2 drives to fail before leaving a rebuild hole where the storage
> > is vulnerable to a single drive failure during a rebuild or resilver.
>
> So, using 4 x 100G drives provides 133G usable storage... we can lose
> any two drives without any data loss. However, from my calculations
> (which might be wrong), RAID6 would be more efficient. On a 4 drive 100G
> system you get 200G available storage, and can lose any two drives
> without data loss.
>
Well, really the key to filesystems with built-in volume management is that
a large array can be broken down into smaller chunks with various levels of
redundancy across different data stores.  Using 4x100 you would likely do a
RAIDZ2, which calculates 2 pieces of parity for each file, which is something
like RAID6.

The real issue with RAID6 is abysmal performance on software RAID because of
the double parity computation, and limited support in hardware cards and,
similarly, the load on the card's CPU and slower performance.

The argument is always data safety vs. access speed.  Keep in mind that the
RAID5 write hole also applies to RAID6.



>
> > Standard RAID is not going to have this capability and is going to
> > require more drives to improve though each drive also decreases
> > reliability has more drives are likely to fail.
>
> Well, doesn't RAID6 do exactly that (add an additional drive to improve
> data security)? How is ZFS better than RAID6? Not that I am suggesting
> ZFS is bad, I'm just trying to understand the differences...
>

RAID6 has a write hole during parity computation that can catch you
surprisingly often.  ZFS does not have this.  Not to be a ZFS fanboy, but btrfs
will also have such capabilities.


>
> > ZFS also is able to put metadata on a different volume and even have a
> > cache on a different volume which can spread out the chance of a loss.
> > very complicated schemes can be developed to minimize data loss.
>

> In my experience, if it is too complicated:
> 1) Very few people use it because they don't understand it
> 2) Some people who use it, use it in-correctly, and then don't
> understand why they lose data (see the discussion of people who use RAID
> controller cards but don't know enough to read the logfile on the RAID
> card when recovering from failed drives).
>
> Also, I'm not sure what the advantage of metadata on a different volume
> is? If you lose all your metadata how easily will you recover your
> files? Perhaps you should be just as concerned about protecting your
> metadata as you do for your data, thus why separate it?
>
> What is the advantage of using another volume as a cache ? Sure, you
> might be lucky enough that the data you need is still in cache when you
> lose the whole array, but that doesn't exactly sound like a scenario to
> plan for? (For performance, the cache might be a faster/more expensive
> drive, (read SSD or sim

Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Adam Goryachev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

It seems to me that all this boils down to different people having
different requirements. Might I (re)-propose the following:

1) Keep backuppc/etc exactly as-is
2) Extend backuppc-link so that after it links a file to the pool, or
replaces a file in the pc directory with a link to the pool, then it
will also add an entry to a DB (use the perl-DBI interface to make it
generic or similar). Add some code to backuppc_nightly to modify the
database when it modifies the pool as well.

Now, if $Conf{UseDB} = 0, then that code is skipped; if not, then go
read a couple of other variables and write the data to a DB.

So far, those people who want stuff in a DB get it (with a performance
cost) and those who don't want it, don't get it.

Next, write a tool that can read the database + pool, and create the
hardlinks under the pc directory. (So after you copy the pool, you can
re-create the hardlinks).

Thus, people who want to copy their entire backuppc pool/etc can
do it with a database + overhead, and everyone else can continue
as-is.

I'm sure the above doesn't handle 100% of needed cases, but with a
little effort I'm sure it could be done. Of course, it seems the
difficulty will be in finding someone with both the "itch" as well as
the skill to complete the task.
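
As a very rough, untested sketch of what item 2 might record (DBD::SQLite,
the table name and the columns are all assumptions, not existing BackupPC
code):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=/var/lib/backuppc/pool.db',
                       '', '', { RaiseError => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS pc_link
            (pc_path TEXT PRIMARY KEY, pool_path TEXT)');

# Would be called right after a successful link($poolPath, $pcPath).
sub record_link {
    my ($pc_path, $pool_path) = @_;
    $dbh->do('INSERT OR REPLACE INTO pc_link (pc_path, pool_path)
              VALUES (?, ?)', undef, $pc_path, $pool_path);
}

The re-creation tool would then just read the table back and call link() for
each row once the pool itself has been copied.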

Regards,
Adam
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkolw+kACgkQGyoxogrTyiWidQCfTBpwYsMp9HSKrO6Kt7VigFQy
6U4AoJNB/DPi7LKddbupHE7GLNQLWLxX
=OUUe
-END PGP SIGNATURE-




Re: [BackupPC-users] why hard links?

2009-06-02 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2009-06-02 14:26:44 -0400 [Re: [BackupPC-users] 
why hard links?]:
> Les Mikesell wrote at about 12:32:14 -0500 on Tuesday, June 2, 2009:
>  > Jeffrey J. Kosowsky wrote:
>  > > [...]
>  > >  > If you have to add an extra system call to lock/unlock around some
>  > >  > other operation you'll triple the overhead.
>  > > 
>  > > I'm not sure how you definitively get to the number "triple". Maybe
>  > > more maybe less. 

I agree. It's probably more.

>  > Ummm, link(), vs. lock(),link(),unlock() equivalents, looks like 3x the 
>  > operations to me - and at least the lock/unlock parts have to involve 
>  > system calls even if you convert the link operation to something else.
> 
> 3x operations != 3x worse performance
> Given that disk seek times and input bandwidth are typical
> bottlenecks, I'm not particularly worried about the added
> computational bandwidth of lock/unlock.

Since you can't lock() the file you are about to create (can you?), you'll
probably need a different file - either one big global lock file or one on the
directory level. I'm not familiar with the kernel code, but I wouldn't be
surprised if that got you the disk seeks you are worried about.

>  > > Les - I'm really not sure why you seem so intent on picking apart a
>  > > database approach.
>  > 
>  > I'm not. I'm encouraging you to show that something more than black 
>  > magic is involved. [...]
> 
> I never claimed performance. My claims have been around flexibility,
> extendability, and transportability.

And I'm worried about complexity and robustness:
1. Complexity
   What additional skills do you need to set up the BackupPC version you are
   imagining and keep it running?
2. Complexity
   Who is going to write and, more importantly, debug the code? How do you test
   all the new cases that can go wrong? How do people feel about entrusting
   vital data to a system they no longer have a basic understanding of?
3. Complexity
   When everything goes wrong, what can you still do with the data? Currently,
   you can locate a file in the file system (file mangling is not that
   complicated) or even with an FS debugging tool in an image of an
   unmountable FS and BackupPC_zcat it to get the contents. Attributes are lost
   that way, but for regaining the contents of a few crucial files, this can
   work quite well. It could be made to even restore the attributes with only
   slightly more requirements (intact attribs file). With a database, can you
   do anything at all without a completely running BackupPC system? What are
   the exact requirements? Database file? Database engine? Accessible pool
   file system?
4. Robustness, points of failure
   How do you handle losing single files, on-disk corruption of a few files?
   Losing/corrupting many files? Your database?

> I think all (or nearly all) of my 7 claimed advantages are
> self-evident.

Yes, mostly, though they were claimed in a different thread. I hope everyone
has multiple MUAs open ...

1. I don't see how "platform and filesystem independence" fits together with
   the use of a database, though. You are currently dependent on a POSIX file
   system. How is depending on one of a set of databases any better?

4. How does backing up the database and *a portion of the pool* work? Sure,
   you can make anything fault-tolerant, but are missing files faults of which
   you *want* to be tolerant?
   But yes, backing up the complete pool would be easier, though it's your
   responsibility to get it right (i.e. consistent), and there's probably no
   sane way to check.

5.1. Why is file name mangling a kludge, and in what way is storing file names
 in a database better?

5.2. What is non-standard about defining a file format any way you like? It's
 not like compressed pool files would otherwise adhere to a particular
 known file format. But yes, treating compressed and uncompressed files
 alike would be nice.

5.3. I'm not really sure encrypting files *on the server* does much, unless
 you are thinking of a remote storage pool. In particular, you need to be
 able to decrypt files not only for restoration, but also for pooling
 (unless you want an intermediate copy and an extra comparison).

5.5. Configuration stored in the database? Is that supposed to be an
 advantage?

6. If you mean access controlled by the database (different database users),
   I don't really see why you are worried about access to the *meta data* when
   the actual contents remain readable (you're not saying that it being such a
   huge amount of data is a security feature, are you?).
   If you mean that a database will make it easier to implement file level
   access control, I honestly don't see how.

7. How that? If you are less concerned about how much space you use, you can
   store things in a way that they can be accessed faster. But I still think
   you are mistaken in that multiple attrib files would need to be read. I've
   had t

Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread David Williams
>If you installed from scratch from the sourceforge tarball you would be 
>able to change the storage location, but the location is embedded in the 
>code during the install process.  If you are using a version packaged 
>for a distribution, this step has already been done and you can't change 
>it with just the TopDir setting.  You really need to mount the disk at 
>/var/lib/backuppc (copy everything that needs to be there first). You'll 
>probably also need to mount the disk async to get any performance.

Les, thanks for the info.  Can you just give me a little more info
regarding what you mean about "copy everything that needs to be there
first"?  If I mount my drive at /var/lib/backuppc instead of the current
location (/backups), then won't everything that is currently in /backups be
where /var/lib/backuppc is now?  Do you mean to copy whatever is currently
in /var/lib/backuppc to /backups first, and then mount the drive at
/var/lib/backuppc?

I currently mount the NFS drive as follows:

backupdevice1:/shares/internal/PUBLIC /backups nfs
rsize=8192,wsize=8192,nosuid,soft 0 0

I took a look at the options and there is no async option, just a sync
option.  According to the info sync does the following:  All I/O to the file
system should be done synchronously.




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

Adam Goryachev wrote on 2009-06-03 10:29:34 +1000 [Re: [BackupPC-users] Backing 
up a BackupPC server]:
> [...]
> Might I (re)-propose the following:
> [...]
> 2) Extend backuppc-link so that after it links a file to the pool, or
> replaces a file in the pc directory with a link to the pool, then it
> will also add an entry to a DB (use the perl-DBI interface to make it
> generic or similar). Add some code to backuppc_nightly to modify the
> database when it modifies the pool as well.
> [...]
> Next, write a tool that can read the database + pool, and create the
> hardlinks under the pc directory. (So after you copy the pool, you can
> re-create the hardlinks).
> 
> Thus, for people you want to copy their entire backuppc pool/etc, they
> can do it with a database + overhead, and everyone else can continue
> as-is.

I don't want to disappoint you, but that is trivial (to implement) even
without modifying BackupPC. If I'm allowed to write to a database, I can,
with a few lines of Perl code,

1. Create a table "pool" mapping inode numbers to pool files
   (that's a simple iteration over the pool adding (inode, path) pairs)

2. Create a second table "pc" mapping pc/ path names to inode numbers
   (that's a simple iteration over pc/ adding (path, inode) pairs)

Your pc-directory-re-creation tool now reads something like

SELECT   pool.path AS src, pc.path AS dst
FROM pc, pool
WHEREpc.inode = pool.inode
ORDER BY pc.path

and does "link ($src, $dest);" after creating any directories $dest might
need (would ordering by pc.inode work better?). Yes, that statement is going
to produce an *enormous* result table. Let's just hope the database is smart
enough to handle that. If not, you can always do a

SELECT pool.path
FROM   pool
WHERE  pool.inode = ?

and execute that for each inode number you find in pc/ ... you won't even need
the "pc" table then. But still: the "pool" table will be large. The size of
this table is what determines when rsync et al. will stop working. Let's hope
an index on pool.inode will make the DB work wonders. We're still accessing the
information in random inode order, so we're not likely to get (m)any benefits
from caching. No, this is probably not the way to go.
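
To make that concrete, here is a rough, untested Perl/DBI sketch of steps 1
and 2 plus the bulk-join link pass. The SQLite scratch file and the two
directory paths are just examples, and it assumes pool/ and cpool/ have
already been copied to the destination:

  #!/usr/bin/perl
  # Sketch only (untested): build the pool/pc tables, then re-create the
  # pc/ hardlinks on a copy of the pool. Paths and DB file are examples.
  use strict;
  use warnings;
  use DBI;
  use File::Find;
  use File::Basename qw(dirname);
  use File::Path qw(make_path);

  my ($topdir, $newtop) = ('/var/lib/backuppc', '/mnt/poolcopy');
  my $dbh = DBI->connect('dbi:SQLite:dbname=/tmp/poolcopy.db', '', '',
                         { RaiseError => 1, AutoCommit => 0 });
  $dbh->do('CREATE TABLE pool (inode INTEGER PRIMARY KEY, path TEXT)');
  $dbh->do('CREATE TABLE pc   (path TEXT, inode INTEGER)');

  # 1. iterate over the pool, storing (inode, path) pairs
  my $ins_pool = $dbh->prepare('INSERT OR IGNORE INTO pool VALUES (?, ?)');
  find(sub {
      my @st = lstat $_;
      return unless -f _;
      $ins_pool->execute($st[1], $File::Find::name);
  }, "$topdir/pool", "$topdir/cpool");

  # 2. iterate over pc/, storing (path, inode) for multiply-linked files
  my $ins_pc = $dbh->prepare('INSERT INTO pc VALUES (?, ?)');
  find(sub {
      my @st = lstat $_;
      return unless -f _ && $st[3] > 1;
      $ins_pc->execute($File::Find::name, $st[1]);
  }, "$topdir/pc");
  $dbh->commit;

  # 3. the re-creation pass: join the tables and link() on the copy
  my $sth = $dbh->prepare('SELECT pool.path, pc.path FROM pc, pool
                           WHERE pc.inode = pool.inode ORDER BY pc.path');
  $sth->execute;
  while (my ($src, $dst) = $sth->fetchrow_array) {
      s/^\Q$topdir\E/$newtop/ for ($src, $dst);
      make_path(dirname($dst));
      link($src, $dst) or warn "link $src -> $dst: $!\n";
  }
  $dbh->disconnect;

(That doesn't make the random-access problem described above go away; it just
shows how little code is involved.)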

Obviously, the same is true if this information is added to the database
during BackupPC_link operation and maintained by BackupPC_nightly.

> I'm sure the above doesn't handle 100% of needed cases,

Same for my ideas.

If I'm *not* allowed to write to a database, I'll need slightly more brains,
but the difficult part is handled by a "sort" invocation. Sorting 1.1 GB took
2.5 minutes - considerably less than traversing the pool and pc directories.
Actually, the same approach can be used with a database and one single table,
but then the database is really doing nothing except sorting data and
supplying it one time in the correct order.


I like your idea of separating the pool copy operation from re-establishing
the pc/ links. That sort of eliminates all RPC-type problems from my original
task. I only need to figure out how to handle non-pooled content, but that's
probably only stuff immediately in the pc/$host directories and 0-byte files
(which aren't that hard to re-create on the target system ;-). That way, I
would actually end up with something like BackupPC_tarAllPCsCopy without the
tar part (and without the overhead of calculating all pool hashes).

Regards,
Holger



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Steve
I am not a programmer, just a dumb user that thinks the whole thing is
kinda magic.  So perhaps this is a stupid question.  But since
BackupPC somehow knows where all the files are and what is linked to
what etc. why can't there be a button in the CGI interface that
instead of "restore", says something like, "as of the last inc or
whatever, make a perfect copy of what i want over here" and just let
that replicate the pc directories and pool and everything over to the
backup, offsite, or external drive?
Steve
P.S. I have been using this software for about 6 months, love it.  But
still have not succeeded in backing IT up without a crash.  Have tried
cp -dpr and also rsync with -H but both bombed out with "too many hard
links" errors.  If I could get this backed up i'd say it was the
perfect backup solution.

On Tue, Jun 2, 2009 at 8:29 PM, Adam Goryachev wrote:
> It seems to me that all this boils down to different people having
> different requirements. Might I (re)-propose the following:
>
> 1) Keep backuppc/etc exactly as-is
> 2) Extend backuppc-link so that after it links a file to the pool, or
> replaces a file in the pc directory with a link to the pool, then it
> will also add an entry to a DB (use the perl-DBI interface to make it
> generic or similar). Add some code to backuppc_nightly to modify the
> database when it modifies the pool as well.
>
> Now, if Conf{UseDB} = 0; then that code is skipped, if not, then go
> read a couple of other variables, and write the data to a DB.
>
> So far, those people who want stuff in a DB get it (with a performance
> cost) and those who don't want it, don't get it.
>
> Next, write a tool that can read the database + pool, and create the
> hardlinks under the pc directory. (So after you copy the pool, you can
> re-create the hardlinks).
>
> Thus, for people you want to copy their entire backuppc pool/etc, they
> can do it with a database + overhead, and everyone else can continue
> as-is.
>
> I'm sure the above doesn't handle 100% of needed cases, but with a
> little effort I'm sure it could be done. Of course, it seems the
> difficulty will be in finding someone with both the "itch" as well as
> the skill to complete the task.
>
> Regards,
> Adam



-- 
"It turns out there is considerable overlap between the smartest bears
and the dumbest tourists."




[BackupPC-users] SMB vs RSYNCD

2009-06-02 Thread clint woodrow


Matthias Meyer wrote:
> rsync(d) transmit only changed parts of a file (http://www.samba.org/rsync).
> e.g. a 2.6 GB mailbox.pst and receive one new mail at sunday. rsync will
> only transmit this one new mail.
> 
> And you need a client on windows side. You can use cwRsync or rsync within a
> cygwin environment.
> 
> br
> Matthias


Thanks for the details Matthias.  I was wondering if one of you could confirm 
something else for me.  We've been testing BackupPC for the last few months.  
The local backup server is up and running great and we're getting ready to put 
up an offsite backup server for redundancy and offsite security.  

Up to this point, I've assumed I wouldn't be able to roll PSTs into the backups 
because they'd be too large to send over the limited bandwidth every night and 
that they'd take too much space in the pool.  From what Matthias has said, it 
looks like as long as we have the initial backup, subsequent backups shouldn't 
take nearly as long since rsync is sending changes only.  Can I also take from 
this that BackupPC won't be storing full copies for every backup?  i.e. If the 
user has a 1 GB pst and adds one message of 2k, the pool requirements are only 
1 GB + 2k, not 2 GB + 2k?

Thanks,
Clint






Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread Les Mikesell
David Williams wrote:
>> If you installed from scratch from the sourceforge tarball you would be 
>> able to change the storage location, but the location is embedded in the 
>> code during the install process.  If you are using a version packaged 
>> for a distribution, this step has already been done and you can't change 
>> it with just the TopDir setting.  You really need to mount the disk at 
>> /var/lib/backuppc (copy everything that needs to be there first). You'll 
>> probably also need to mount the disk async to get any performance.
> 
> Les, thanks for the info.  Can you just give me a little more info in
> regards to what you mean about "copy everything that needs to be there
> first".  If mount my drive from the current location (/backups) to
> /var/lib/backuppc then won't everything that is currently in /backups be
> where /var/lib/backuppc is now ?  Do you mean to copy whatever is currently
> in /var/lib/backuppc to /backups first, and then mount the drive at
> /var/lib/backuppc ?

If you are just starting, I'd remove the backuppc package, mount the 
drive at /var/lib/backuppc, then reinstall the package so everything 
lands in the right place.  If you have things you need to save, you need 
to copy them to where they would have landed with that setup.  You could 
rename your current /var/lib/backuppc to something else and create a
new one, change the mount point and copy, or copy first.  Renaming the 
existing directory is probably the best approach since if you mount on 
top of it the contents will be hidden but still consume space.

> I current mount the NFS drive as follows:
> 
> backupdevice1:/shares/internal/PUBLIC /backups nfs
> rsize=8192,wsize=8192,nosuid,soft 0 0
> 
> I took a look at the options and there is no async option, just a sync
> option.  According to the info sync does the following:  All I/O to the file
> system should be done synchronously.

I think async is the default these days.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Holger Parplies
Hi,

Steve wrote on 2009-06-02 21:41:56 -0400 [Re: [BackupPC-users] Backing up a 
BackupPC server]:
> I am not a programmer, just a dumb user that thinks the whole thing is
> kinda magic.  So perhaps this is a stupid question.  But since
> BackupPc somehow knows where all the files are and what is linked to
> what etc. why can't there be a button in the CGI interface that
> instead of "restore", says something like, "as of the last inc or
> whatever, make a perfect copy of what i want over here" and just let
> that replicate the pc directories and pool and everything over to the
> backup, offsite, or external drive?

if a problem is hard to solve, putting a button in a CGI interface is not
going to make it any easier. It's not as if we're saying "if we only knew what
BackupPC is doing". We know precisely what it is doing. It is letting the
Linux kernel (or rather the file system) handle some details so that it can
forget about them. That is a good thing. It just turns out that sometimes we
would like to know what BackupPC decided to forget and the file system (or
Linux kernel) wasn't designed to tell us.

> P.S. I have been using this software for about 6 months, love it.  But
> still have not succeeded in backing IT up without a crash.  Have tried
> cp -dpr and also rsync with -H but both bombed out with "too many hard
> links" errors.

Can you tell me more about that (off-list, if you like)? I don't think there
is an error message "too many hard links" for either cp or rsync, so it
probably just died or even crashed the system? Can you give me some details
about your pool (size, number of inodes, how many hosts/backups, number of
directory entries, if you can figure that out)? Where do you want to copy it
to? Another disk accessible to the same computer? Is there a reason not to do
an image copy? How long can you afford to shut down BackupPC for the copy?
Are you willing to try out some code which could turn out not to work
correctly (won't destroy your pool, don't worry :)? How much memory has your
BackupPC server machine got?

> If I could get this backed up i'd say it was the perfect backup solution.

I'm sure we'll find something.

Regards,
Holger



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Les Mikesell
Steve wrote:
> I am not a programmer, just a dumb user that thinks the whole thing is
> kinda magic.  So perhaps this is a stupid question.  But since
> BackupPc somehow knows where all the files are and what is linked to
> what etc. why can't there be a button in the CGI interface that
> instead of "restore", says something like, "as of the last inc or
> whatever, make a perfect copy of what i want over here" and just let
> that replicate the pc directories and pool and everything over to the
> backup, offsite, or external drive?

The problem is that there is no good mechanism to find the "other" names 
that are hardlinked together.  Normally you don't need to know - 
except when you are trying to copy the whole thing maintaining 
consistency.  The not-so-good way to do it is to build a table of names 
and inode numbers for the whole tree and link the names with matching 
inodes as they are copied.  As you may have noticed, this doesn't scale 
very well.
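
For what it's worth, that not-so-good way is only a handful of lines of Perl.
A rough, untested sketch (the paths are made up) - and it also shows exactly
where the memory goes, namely one %seen entry per multiply-linked inode:

  #!/usr/bin/perl
  # Sketch only: copy a tree while preserving hardlinks by remembering the
  # first destination name written for each source inode. Does not handle
  # symlinks, ownership, permissions or timestamps.
  use strict;
  use warnings;
  use File::Find;
  use File::Copy qw(copy);
  use File::Path qw(make_path);

  my ($src_top, $dst_top) = ('/var/lib/backuppc', '/mnt/poolcopy');
  my %seen;    # inode -> first destination name (this is what eats RAM)

  find(sub {
      my @st = lstat $_;
      my $dst = $File::Find::name;
      $dst =~ s/^\Q$src_top\E/$dst_top/;
      if (-d _) {
          make_path($dst);
      } elsif (-f _) {
          if ($st[3] > 1 && exists $seen{$st[1]}) {
              link($seen{$st[1]}, $dst) or warn "link $dst: $!\n";
          } else {
              copy($File::Find::name, $dst) or warn "copy $dst: $!\n";
              $seen{$st[1]} = $dst if $st[3] > 1;
          }
      }
  }, $src_top);

With a pool holding tens of millions of linked inodes, %seen alone can grow
to several GB, which is the same kind of wall cp and rsync -H run into.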

> Steve
> P.S. I have been using this software for about 6 months, love it.  But
> still have not succeeded in backing IT up without a crash.  Have tried
> cp -dpr and also rsync with -H but both bombed out with "too many hard
> links" errors.  If I could get this backed up i'd say it was the
> perfect backup solution.

Rsync 3.x with plenty of RAM on both systems might have a chance. But 
assuming you put the backup archive on its own partition, you can 
unmount it and use some form of image copy of the raw device (dd to 
another matching disk, dd to a file, or over ssh to a file or device on 
another machine, etc.)   Clonezilla would probably work if you don't 
mind rebooting from a CD while the copy is made - it will offer to copy 
to another local disk or to an image with various forms of network 
access (cifs/nfs/sshfs) which you can restore on another machine the 
same way - or just save in case you need it later.

Or, use a raid mirror that you break periodically and rotate drives. 
I've described my approach several times.


-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] SMB vs RSYNCD

2009-06-02 Thread Adam Goryachev

clint woodrow wrote:
> 
> Matthias Meyer wrote:
>> rsync(d) transmit only changed parts of a file
>> (http://www.samba.org/rsync). e.g. a 2.6 GB mailbox.pst and receive
>> one new mail at sunday. rsync will only transmit this one new mail.
>> 
>> 
>> And you need a client on windows side. You can use cwRsync or rsync
>> within a cygwin environment.
>> 
>> br Matthias
> 
> 
> Thanks for the details Matthias.  I was wondering if one of you could
> confirm something else for me.  We've been testing BackupPC for the
> last few months.  The local backup server is up and running great and
> we're getting ready to put up an offsite backup server for redundancy
> and offsite security.
> 
> Up to this point, I've assumed I wouldn't be able to roll PSTs into
> the backups because they'd be too large to send over the limited
> bandwidth every night and that they'd take too much space in the
> pool.  From what Matthias has said, it looks like as long as we have
> the initial backup, subsequent backups shouldn't take nearly as long
> since rsync is sending changes only.  Can I also take from this that
> BackupPC won't be storing full copies for every backup?  i.e. If the
> user has a 1 GB pst and adds one message of 2k, the pool requirements
> are only 1 GB + 2k, not 2 GB + 2k?

No, rsync handles the transfer, so you will only transfer
(approximately) 2k for the changes. (Obviously there are additional
overheads for the non-changed sections, but they are very small).
However, once backuppc has the whole new version of the file, it will
add this 1.002G file to the pool, so the pool will consume 2G + 2k. Of
course, if you use compression, then the size of the cpool will be
smaller depending on how compressible your pst files are.

Hope this helps.

Regards,
Adam

- --
Adam Goryachev
Website Managers
www.websitemanagers.com.au



Re: [BackupPC-users] SMB vs RSYNCD

2009-06-02 Thread Holger Parplies
Hi,

clint woodrow wrote on 2009-06-02 19:50:08 -0400 [[BackupPC-users]  SMB vs 
RSYNCD]:
> Matthias Meyer wrote:
> > rsync(d) transmit only changed parts of a file (http://www.samba.org/rsync).
> > e.g. a 2.6 GB mailbox.pst and receive one new mail at sunday. rsync will
> > only transmit this one new mail.
> [...]
> Up to this point, I've assumed I wouldn't be able to roll PSTs into the
> backups because they'd be too large to send over the limited bandwidth
> every night and that they'd take too much space in the pool.  From what
> Matthias has said, it looks like as long as we have the initial backup,
> subsequent backups shouldn't take nearly as long since rsync is sending
> changes only.

this is true as long as rsync can properly detect the changes. I don't know
anything about the format of PST files. rsync usually does a good job, so if
the changes are typically small and local, it will probably work. You can
always test with command line rsync (use '-v' and '-P', see the man page for
details).

> Can I also take from this that BackupPC won't be storing full copies for
> every backup?

No, as Adam has explained. If it's any consolation, there are also Linux MUAs
(evolution) that consider mbox format such a good idea that they don't seem to
support maildir format.

Regards,
Holger



Re: [BackupPC-users] SMB vs RSYNCD

2009-06-02 Thread Adam Goryachev

Holger Parplies wrote:
> Hi,
> 
> clint woodrow wrote on 2009-06-02 19:50:08 -0400 [[BackupPC-users]  SMB vs 
> RSYNCD]:
 >> Can I also take from this that BackupPC won't be storing full copies for
>> every backup?
> No, as Adam has explained. If it's any consolation, there are also Linux MUAs
> (evolution) that consider mbox format such a good idea that they don't seem to
> support maildir format.

BTW, if it is any help, another possible solution is to not backup the
pst files, but use an IMAP compatible server, which uses maildir format,
and then you can backup the IMAP server. We do this in a few locations.

Regards,
Adam

- --
Adam Goryachev
Website Managers
www.websitemanagers.com.au



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 17:32:24 -0500 on Tuesday, June 2, 2009:
 > Jeffrey J. Kosowsky wrote:
 > >  > 
 > >  > Backing up other backuppc servers is really a special case that might 
 > >  > deserve a special optimization.   But, I'm not sure that adding a 
 > >  > database automatically makes it any easier - unless you are thinking of 
 > >  > a common database that could arbitrate a common hashed filename that is 
 > >  > unique across all instances for every piece of content.  That's an 
 > >  > interesting idea but seems kind of fragile.
 > >  > 
 > > 
 > > Once we are talking about redoing things, I would prefer to use a
 > > full md5sum hash for the name of the pool file. You end up
 > > calculating this anyway for free when you use the rsync method
 > > (although with protocol <=28, you get a full file md4sum but with
 > > protocol >=30, I believe you have the true md5sum). This would
 > > simplify the ambiguity of having multiple indexed chain entries with
 > > the same partial md5sum.
 > > 
 > > With this approach then you would automatically have "a common hashed
 > > filename that is ['statistically'] unique across all instances for
 > > every piece of content."
 > 
 > Somehow the number of possible different file contents and the number 
 > possible md5sums don't seem quite statistically equivalent to me.  And 
 > then there's:
 > 
 > http://www.mscs.dal.ca/~selinger/md5collision/
 > 

That's the whole point. md5sum collisions are exceedingly rare with
any imaginable number of files since there are 2^128 different md5sums
- so even if you have billions of files, the chance of a collision
is infinitesimal. Suppose you have 1 trillion (unique) files that is
just less than 2^40, which means that the chance of at least one
collision is approximately 1- e^(-2^40 * (2^40-1)/2^129) ~ 2^(-49)
which is less than 1 in 500 trillion [this is just a generalization of
the birthday problem]. If you have "only" 1 billion *unique* files
then the chance of at least one collision is less than 2^(-55) which
is less than 1 in 36 quadrillion.
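
(For anyone who wants to check the arithmetic, that is just the standard
birthday-problem approximation; in LaTeX notation,

  P(\mathrm{collision}) \approx 1 - e^{-n(n-1)/(2 \cdot 2^{128})}
                        \approx \frac{n(n-1)}{2^{129}},

so n = 2^40 gives about 2^80 / 2^129 = 2^-49, and n = 2^30 - roughly a
billion files - gives about 2^-69, comfortably below the 2^-55 bound quoted
above.)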

Yes there are some known examples of md5sum collisions but they are
all artificial. I don't believe anyone has ever "accidentally" come
across one in a real world situation. In fact since digital signatures
rely on statistics like this if md5sum collisions were even remotely
possible in real life, the whole electronic financial system would be
unreliable.

Hence, I stand by my statement that in any currently conceivable
BackupPC situation, the md5sums are "statistically" unique.



Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 17:57:41 -0500 on Tuesday, June 2, 2009:
 > David Williams wrote:
 > > I did finally get around to hacking my drive and enabling NFS.  I can now
 > > mount it as NFS and backuppc is now working again :)
 > > 
 > > However, there might still be something wrong with my setup.
 > > 
 > > I am running Mandriva 2009.0 and installed backuppc from the Mandriva
 > > respostiory.
 > > 
 > > I updated the config.pl file and one of the things that I did change was 
 > > the
 > > TopDir parameter, which I set to /backups
 > > 
 > > My backups are currently going to /backups/pc/<host>, where <host> is the host
 > > that I am backing up.
 > > 
 > > I was re-reading the email below about the pool files and I didn't see
 > > anything under /backups/cpool or /backups/pool.
 > > 
 > > On further investigation I do see files/directories under
 > > /var/lib/backuppc/cpool.  By changing the TopDir parameter in the config.pl
 > > file have I messed anything up ?  Backups seem to be working (very slowly),
 > > but they are working.  I am assuming that this is ok but not sure.
 > > 
 > > Under /var/lib/backuppc/cpool are directories like 0/ 1/ 2/ 3/, etc but 
 > > when
 > > I drill down into some of these directories I don't see any files.
 > > 
 > > Would appreciate some help.
 > 
 > If you installed from scratch from the sourceforge tarball you would be 
 > able to change the storage location, but the location is embedded in the 
 > code during the install process.  If you are using a version packaged 
 > for a distribution, this step has already been done and you can't change 
 > it with just the TopDir setting.  You really need to mount the disk at 
 > /var/lib/backuppc (copy everything that needs to be there first). You'll 
 > probably also need to mount the disk async to get any performance.

At the level of /var/lib/backuppc you can use a soft link. i.e., you
can mount the disk anywhere and then just create a soft link from
there to /var/lib/backuppc.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 00:57:28 +0200 on Wednesday, June 3, 2009:
 > Hi,
 > 
 > Les Mikesell wrote on 2009-06-02 17:32:24 -0500 [Re: [BackupPC-users] 
 > Backing up a BackupPC server]:
 > > Jeffrey J. Kosowsky wrote:
 > > > [...]
 > > > Once we are talking about redoing things, I would prefer to use a
 > > > full md5sum hash for the name of the pool file. [...]
 > > > With this approach then you would automatically have "a common hashed
 > > > filename that is ['statistically'] unique across all instances for
 > > > every piece of content."
 > > 
 > > Somehow the number of possible different file contents and the number 
 > > possible md5sums don't seem quite statistically equivalent to me.  And 
 > > then there's:
 > > 
 > > http://www.mscs.dal.ca/~selinger/md5collision/
 > 
 > first of all, if you are *not* using rsync, you *don't* get a *full* md5sum
 > hash for free or even cheap. You (Jeffrey) know the code well enough to
 > realize that BackupPC goes to great pains to avoid writing to the pool disk
 > unless necessary. If you need to transfer the whole file (of arbitrary size)
 > before you can look up the pool entry, you *have to* write a temporary copy
 > (probably compressed, too, giving up the benefits you gain from only
 > compressing once and decompressing when matching). You have to handle
 > collisions just the same (meaning re-reading your temporary copy and 
 > comparing
 > to the pool file). Yuck.
 > 
 > Yes, you can special-case small files that fit into memory, but yuck just the
 > same.
 > 
 > If you use a *partial* md5sum, there's no gain from rsync, and you trivially
 > get collisions just like you do now.
 > 
 > That is not to say, if we end up using a database, that it would not be a 
 > good
 > idea to store the full md5sum in the database. In fact, with a database, file
 > names would be somewhat arbitrary, and I'd propose keeping them *short* for
 > the sake of rsync et al. and file lists.
 > 
 > Regards,
 > Holger

I guess my point was as follows:
- If you use rsync, then you get the md5sums for free
- Even if you don't use rsync, given the speed of current processors,
  calculating the md5sum doesn't take any longer than a full file
  compare. (True, a compare can stop as soon as it hits a difference,
  but that is not really relevant: if a file is different you have to
  copy it over anyway, and since you are reading the whole file for
  the copy, the md5sum adds no significant overhead.)
- The md5sums for the pool only need to be calculated once and then
  appended (or prepended) to the pool file

I'm tired and I haven't looked at the code in a few months, so maybe
I'm forgetting something, but I'm having trouble remembering what the
advantage of the partial md5sum hashes is on a fast (i.e. modern)
computer where the limitation is disk speed and/or network bandwidth.
Any time you have to read or write the entire file anyway, calculating
the md5sum adds only trivial overhead compared to the disk I/O or
network transfer.
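
(For reference, computing the full-file md5sum while the data streams through
is trivial with the standard Digest::MD5 module - a minimal example:

  #!/usr/bin/perl
  # print the full-file md5sum of each argument, reading in constant memory
  use strict;
  use warnings;
  use Digest::MD5;

  for my $file (@ARGV) {
      open my $fh, '<', $file or die "open $file: $!";
      binmode $fh;
      print Digest::MD5->new->addfile($fh)->hexdigest, "  $file\n";
  }

The hashing happens as the file is read, so the extra cost on top of the read
itself is a few CPU cycles per byte.)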

I like the idea of using the full md5sum for the following reasons:
1. It allows you to check file (and hence pool) integrity at any point
2. It can be used to "uniquely" (from a statistical perspective) label
   pool files without any real chance of a collision. If you are still
   worried about a collision with 128 bit md5sums, I'm sure simple
   ways can be found to extend it that make the chance of a collision
   even more infinitesimal.
3. If the md5sum is appended/prepended to the pool file then the name
   of the pool file can be found by reading any of its hard links in
   the pc tree
4. Full-file md5sums are consistent with protocol>30 rsync and come
   for "free" when using rsync. Since they are there anyway, why use
   an alternative and less precise (and also confusing) partial md5sum
   hash when you can use the full md5sum.
5. Using md5sums would get rid of the confusion between partial
   md5sums used for pool hash names, the md4sums used in protocol 28
   rsync and the regular *nix md5sum function.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 01:50:13 +0200 on Wednesday, June 3, 2009:
 > Hi,
 > 
 > Jeffrey J. Kosowsky wrote on 2009-06-02 17:40:23 -0400 [Re: [BackupPC-users] 
 > Backing up a BackupPC server]:
 > > [...]
 > > Backing up the BackupPC data would then be as simple as the following:
 > > 1. Shutdown BackupPC
 > > 2. Copy the pool to the new destination (no hard links)
 > > 3. Recurse through the pc directories as follows:
 > >- Copy directory entries to the new destination (i.e. recreate
 > >  directories using something like mkdir)
 > >- Copy regular files with nlinks=1 to the new destination
 > >- For hard-linked files, use the header (or footer) to find the
 > >  cpool pathname (reconstructed from the hash and the chain
 > >  number). Then create the corresponding link on the new
 > >  destination.
 > > 4. Restart BackupPC
 > > 
 > > If you don't add the pool hash information to the cpool file
 > > header/footer, then you could still do a similar process by adding an
 > > intermediate step (say 2.5) of creating a lookup table by recursing
 > > through the pool and associating inodes with cpool entries. Then in
 > > step 3 you would use the inode number of each hard-linked file in the
 > > pc directory to look up the corresponding link that needs to be
 > > created. This would require some cleverness to make the lookup fast
 > > for large pools where the entire table might not fit into memory. My
 > > only concern is that this may require O(n^2) or O(nlogn) operations
 > > vs. the O(n) for the first method.
 > 
 > you do, of course, realize that I've implemented most of that (after all, I
 > wrote so [1] in a reply to one of your messages [2]) - far enough to use it
 > myself for a local pool copy of an admittedly rather small pool (103 GB, 10
 > million directory entries, 4 million inodes). Nobody seemed to care. I've had
 > more important things to do, so I didn't continue my work on that subject.
 > 

My apologies. I had forgotten (early Alzheimer's?) that you had done
that work. I do need to find the time to look at your code more
closely so I don't forget it in the future. And I do care because I
think it is an important utility both practically and also from the
theoretical perspective of "playing" with the structure.

As a partial excuse for my forgetfulness, I was more focused on the
advantages of having the full file md5sum prepended/appended to the
pool file which should significantly speed up the algorithm. The step
2.5 paragraph was added more as an afterthought...


 > Hope that helps.
 > 
 > Regards,
 > Holger
 > 
 > [1] <20081209031017.gm...@gratch.parplies.de>
 > [2] <18749.12572.675749.745...@consult.pretender>
 > 



Re: [BackupPC-users] External WD Worldbook Device

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 22:13:41 -0500 on Tuesday, June 2, 2009:
 > > I current mount the NFS drive as follows:
 > > 
 > > backupdevice1:/shares/internal/PUBLIC /backups nfs
 > > rsize=8192,wsize=8192,nosuid,soft 0 0
 > > 
 > > I took a look at the options and there is no async option, just a sync
 > > option.  According to the info sync does the following:  All I/O to the 
 > > file
 > > system should be done synchronously.
 > 
 > I think async is the default these days.
 > 

I use the following options in my nfs mount (note some options may be
default and not strictly necessary but I have included them for
clarity)

auto,noexec,nosuid,nodev,intr,_netdev,async,timeo=25,hard

Specifically async is *much* faster than sync. I believe I set the
timeo=25 to avoid problems when my remote disk had spun down.



Re: [BackupPC-users] Backing up a BackupPC server

2009-06-02 Thread Jeffrey J. Kosowsky
Les Mikesell wrote at about 22:34:04 -0500 on Tuesday, June 2, 2009:
 > Steve wrote:
 > > I am not a programmer, just a dumb user that thinks the whole thing is
 > > kinda magic.  So perhaps this is a stupid question.  But since
 > > BackupPc somehow knows where all the files are and what is linked to
 > > what etc. why can't there be a button in the CGI interface that
 > > instead of "restore", says something like, "as of the last inc or
 > > whatever, make a perfect copy of what i want over here" and just let
 > > that replicate the pc directories and pool and everything over to the
 > > backup, offsite, or external drive?
 > 
 > The problem is that there is no good mechanism to find the "other" names 
 >   that are hardlinked together.  Normally you don't need to know - 
 > except when you are trying to copy the whole thing maintaining 
 > consistency.  The not-so-good way to do it is to build a table of names 
 > and inode numbers for the whole tree and link the names with matching 
 > inodes as they are copied.  As you may have noticed, this doesn't scale 
 > very well.

This is why I like the idea of prepending/appending the pool file hash
name to the pool file. Then it does scale well since you don't need to
create and search a long table.
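
Purely to illustrate - this is the *proposed* layout, not anything current
BackupPC writes, and the exact footer format would still need to be settled -
if the 32 hex characters of the full-file md5sum were appended to every pool
file, a copy tool holding any of the pc/ names could recover the pool name
directly, along these lines (assuming the existing X/Y/Z directory fan-out
stays the same):

  #!/usr/bin/perl
  # Sketch only: read a hypothetical 32-hex-char md5sum footer from a file
  # that is hardlinked into the pool and derive the pool file name from it.
  use strict;
  use warnings;
  use Fcntl qw(SEEK_END);

  sub pool_name_for {
      my ($pc_file, $cpool_top) = @_;
      open my $fh, '<', $pc_file or die "open $pc_file: $!";
      seek $fh, -32, SEEK_END or die "seek $pc_file: $!";
      read($fh, my $md5, 32) == 32 or die "short read on $pc_file";
      close $fh;
      die "no md5 footer in $pc_file" unless $md5 =~ /^[0-9a-f]{32}$/;
      my @d = split //, $md5;                     # first hex digits pick the
      return "$cpool_top/$d[0]/$d[1]/$d[2]/$md5"; # pool subdirectory
  }

  print pool_name_for($ARGV[0], '/var/lib/backuppc/cpool'), "\n";

No table, no sort, no extra memory - just one seek and one 32-byte read per
linked file.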



Re: [BackupPC-users] SMB vs RSYNCD

2009-06-02 Thread Jeffrey J. Kosowsky
Adam Goryachev wrote at about 13:42:41 +1000 on Wednesday, June 3, 2009:
 > clint woodrow wrote:
 > > 
 > > Matthias Meyer wrote:
 > >> rsync(d) transmit only changed parts of a file
 > >> (http://www.samba.org/rsync). e.g. a 2.6 GB mailbox.pst and receive
 > >> one new mail at sunday. rsync will only transmit this one new mail.
 > >> 
 > >> 
 > >> And you need a client on windows side. You can use cwRsync or rsync
 > >> within a cygwin environment.
 > >> 
 > >> br Matthias
 > > 
 > > 
 > > Thanks for the details Matthias.  I was wondering if one of you could
 > > confirm something else for me.  We've been testing BackupPC for the
 > > last few months.  The local backup server is up and running great and
 > > we're getting ready to put up an offsite backup server for redundancy
 > > and offsite security.
 > > 
 > > Up to this point, I've assumed I wouldn't be able to roll PSTs into
 > > the backups because they'd be too large to send over the limited
 > > bandwidth every night and that they'd take too much space in the
 > > pool.  From what Matthias has said, it looks like as long as we have
 > > the initial backup, subsequent backups shouldn't take nearly as long
 > > since rsync is sending changes only.  Can I also take from this that
 > > BackupPC won't be storing full copies for every backup?  i.e. If the
 > > user has a 1 GB pst and adds one message of 2k, the pool requirements
 > > are only 1 GB + 2k, not 2 GB + 2k?
 > 
 > No, rsync handles the transfer, so you will only transfer
 > (approximately) 2k for the changes. (Obviously there are additional
 > overheads for the non-changed sections, but they are very small).
 > However, once backuppc has the whole new version of the file, it will
 > add this 1.002G file to the pool, so the pool will consume 2G + 2k. Of
 > course, if you use compression, then the size of the cpool will be
 > smaller depending on how compressible your pst files are.
 > 
 
Said another way, BackupPC only does pooling and de-duplication at the
file level. To get the "1GB + 2k" you would need to pool at the block
level. Such block-level de-duplication would be nice for files that
grow such as log files or Inboxes, but it is not possible now with BackupPC.



Re: [BackupPC-users] Host Summary - Full Size: Wondering where it comes from

2009-06-02 Thread Boniforti Flavio
Maybe it's still "too early" in the morning...

> > Anybody knows why there's difference between the above 2 values?
> 
> yes.

... but which question was this "yes" related to? It's like someone
asking "What time is it?" and getting "Yes" as the answer... :-/
