[BackupPC-users] BackupPC and MooseFS?
Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?

--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Re: [BackupPC-users] BackupPC and MooseFS?
> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?

No, but it seems like a *really* bad idea -- the concept of slow, off-box, redundant storage isn't a really good fit with the concepts of pooling and linking.

Since it's fuse-based, there's the possibility it doesn't support hard-linking at all, which would make it completely unfeasible. (I don't know, it's not obvious what it supports from the documentation.)
Re: [BackupPC-users] BackupPC and MooseFS?
On 05/17/2011 06:57 PM, Michael Stowe wrote:
>> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
>
> No, but it seems like a *really* bad idea -- the concept of slow, off-box,
> redundant storage isn't a really good fit with the concepts of pooling and
> linking.
>
> Since it's fuse-based, there's the possibility it doesn't support
> hard-linking at all, which would make it completely unfeasible. (I don't
> know, it's not obvious what it supports from the documentation.)

I use it with dirvish. It works just fine.

tamas
Re: [BackupPC-users] BackupPC and MooseFS?
On 5/17/2011 11:57 AM, Michael Stowe wrote:
>> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
>
> No, but it seems like a *really* bad idea -- the concept of slow, off-box,
> redundant storage isn't a really good fit with the concepts of pooling and
> linking.

Actually the concept looks good. It does rely on a single master node though.

> Since it's fuse-based, there's the possibility it doesn't support
> hard-linking at all, which would make it completely unfeasible. (I don't
> know, it's not obvious what it supports from the documentation.)

The docs say it does handle hardlinks - but it is hard to tell if it does it well enough for backuppc. I'd expect the fuse layer to be the bottleneck in the design - at least if you have several data servers.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] BackupPC and MooseFS?
Michael Stowe wrote at about 11:57:42 -0500 on Tuesday, May 17, 2011:
> > Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
>
> No, but it seems like a *really* bad idea -- the concept of slow, off-box,
> redundant storage isn't a really good fit with the concepts of pooling and
> linking.
>
> Since it's fuse-based, there's the possibility it doesn't support
> hard-linking at all, which would make it completely unfeasible. (I don't
> know, it's not obvious what it supports from the documentation.)

The first paragraph on the page linked by the OP says:

    Symbolic links (file names pointing to target files, not necessarily on
    MooseFS) and hard links (different names of files which refer to the
    same data on MooseFS)

So at least it supports hard links...

Your points about speed may very well be true...
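Whether hard links behave the way BackupPC needs on a particular mount can also be checked directly rather than read off the documentation. The following is only a minimal sketch: the function name `check_hard_links` and the probe file name are made up for illustration, and the example runs against a temporary directory, which you would replace with a directory on the filesystem under test. It creates one file, adds a few hard links, and reports whether all names share one inode and what link count the filesystem returns.

```python
import os
import tempfile


def check_hard_links(directory, links=3):
    """Create one file plus `links` extra hard links and report what the filesystem did."""
    original = os.path.join(directory, "hardlink_probe")  # hypothetical probe file name
    with open(original, "wb") as handle:
        handle.write(b"x" * 4096)

    names = [original]
    try:
        for i in range(links):
            name = f"{original}.{i}"
            os.link(original, name)  # raises OSError if hard links are unsupported
            names.append(name)
        stats = [os.stat(name) for name in names]
        return {
            # All names should resolve to the same (device, inode) pair.
            "same_inode": len({(s.st_dev, s.st_ino) for s in stats}) == 1,
            # Expected to be links + 1 on a filesystem with ordinary link counting.
            "link_count": stats[0].st_nlink,
        }
    finally:
        # Remove every name that was actually created.
        for name in names:
            if os.path.exists(name):
                os.remove(name)


if __name__ == "__main__":
    # Assumption: swap the temporary directory for a directory on the mount under test.
    with tempfile.TemporaryDirectory() as probe_dir:
        print(check_hard_links(probe_dir))
```

A passing result only shows that links are created and counted; it says nothing about how fast they are, which is the other concern raised above.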
Re: [BackupPC-users] BackupPC and MooseFS?
On 05/17 01:25 , Mike wrote:
> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?

Thanks for the link. That looks like a pretty cool project.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
Re: [BackupPC-users] BackupPC and MooseFS?
On 5/18/2011 3:21 PM, Carl Wilhelm Soderstrom wrote:
> On 05/17 01:25 , Mike wrote:
>> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
>
> Thanks for the link. That looks like a pretty cool project.

I've been hoping someone would write a fuse layer on top of riak (a distributed, clustered DB that doesn't need a master node). There is something called luwak that handles files as streams, but because the chunking step hashes the key from the chunk contents (and thus deduplicates with no reference counting) you can't ever delete anything.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] BackupPC and MooseFS?
On 11-05-18 05:21 PM, Carl Wilhelm Soderstrom wrote:
> On 05/17 01:25 , Mike wrote:
>> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
> Thanks for the link. That looks like a pretty cool project.

And from initial appearances / testing, it runs pretty darn well, too. We only have ~2T on our test environment so far over 3 machines, so it's certainly nothing large.

I haven't tried backuppc on it yet, but storing mail in maildir folders works well, and virtual machine images work well.

Being able to say "I want to have 2 copies of anything in this directory and 3 copies of anything in this directory" is very nice. So is being able to put half your servers in one building and half in another. Failing a disk and watching all the unmet goals (goal = min # of copies of something) get resolved is fun as well.

Now, to convert the media machine at home...
Re: [BackupPC-users] BackupPC and MooseFS?
But does moosefs basically duplicate the data, so if you have 2tb of backuppc data, you need a moosefs with 2tb of storage to duplicate the whole thing?

On Fri, May 20, 2011 at 8:05 AM, Mike wrote:
> On 11-05-18 05:21 PM, Carl Wilhelm Soderstrom wrote:
> > On 05/17 01:25 , Mike wrote:
> >> Has anyone tried using BackupPC and MooseFS (http://www.moosefs.org/)?
> > Thanks for the link. That looks like a pretty cool project.
>
> and from initial appearances / testing, it runs pretty darn well, too.
> We only have ~2T on our test environment so far over 3 machines, so it's
> certainly nothing large.
>
> I haven't tried backuppc on it yet, but storing mail in maildir folder
> works well, and virtual machine images work well.
>
> being able to say "I want to have 2 copies of anything in this directory
> and 3 copies of anything in this directory" is very nice. So is being
> able to put half your servers in one building and half in another.
> Failing a disk and watching all the unmet goals (goal = min # of copies
> of something) get resolved is fun as well.
>
> Now, to convert the media machine at home...
Re: [BackupPC-users] BackupPC and MooseFS?
On 5/21/2011 7:24 PM, Scott wrote:
> But does moosefs basically duplicate the data, so if you have 2tb of
> backuppc data, you need a moosefs with 2tb of storage to duplicate the
> whole thing?

Yes, it gives the effect of raid1 mirrors - but if I understand it correctly the contents can be distributed across several machines instead of needing space for a full copy of even a single instance of the whole filesystem on any single machine or drive.

--
Les Mikesell
lesmikes...@gmail.com
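To put rough numbers on the mirroring effect described above, here is a small arithmetic sketch. It assumes that raw usage is simply logical size times the goal (number of copies), that data spreads evenly across the storage machines, and that metadata and filesystem overhead can be ignored; none of these assumptions comes from the MooseFS documentation, and the function names are made up for illustration.

```python
def raw_capacity_needed(data_by_goal):
    """Total raw space in TB for a list of (logical size in TB, goal) pairs.

    Assumption: each copy costs the full logical size, with no overhead.
    """
    return sum(size_tb * goal for size_tb, goal in data_by_goal)


def even_share_per_server(total_raw_tb, servers):
    """Raw space each machine would hold if data were spread perfectly evenly."""
    return total_raw_tb / servers


if __name__ == "__main__":
    # 2 TB of BackupPC data kept at goal 2 (two copies of everything).
    total = raw_capacity_needed([(2.0, 2)])
    print(total)                            # raw TB across the whole cluster
    print(even_share_per_server(total, 4))  # raw TB per machine with 4 machines
```

Under these assumptions, 2 TB at goal 2 consumes 4 TB in total, but with four machines each holds about 1 TB, so no single machine needs room for a full copy.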
Re: [BackupPC-users] BackupPC and MooseFS?
Hi,

Mike wrote on 2011-05-20 09:05:11 -0300 [Re: [BackupPC-users] BackupPC and MooseFS?]:
> [...]
> being able to say "I want to have 2 copies of anything in this directory
> and 3 copies of anything in this directory" is very nice. [...]

Les Mikesell wrote on 2011-05-23 11:12:18 -0500 [Re: [BackupPC-users] BackupPC and MooseFS?]:
> On 5/21/2011 7:24 PM, Scott wrote:
> > But does moosefs basically duplicate the data, so if you have 2tb of
> > backuppc data, you need a moosefs with 2tb of storage to duplicate the
> > whole thing?
>
> Yes, it gives the effect of raid1 mirrors -

from what Mike wrote, shouldn't you be able to say "I want to have only one copy of anything in one directory and two copies of everything else"? With BackupPC, that doesn't seem to make any sense (but: see below) - why would you want to replicate only part of the pool, and why only files that happen to have a partial file md5sum starting with certain letters? You could limit log files to a single copy, but is that enough data to even worry about?

This does bring up questions, though: how does it handle hardlinks, if you determine numbers of copies by directory, i.e. how many copies do you get, if a file is in one directory where you want three copies and in another directory where you chose two copies? If the answer is "five copies", it won't work with BackupPC ;-).

Mike, can you test what it does with hard links, e.g. by creating a large file with several links? I'm just asking, because with normal UNIX file system usage patterns, you could probably get away with cheating (and creating five copies) without anyone complaining (or even noticing).

Then again, the mechanism might be totally different, like putting the number of copies in the inode (and inheriting from the parent directory on file creation; presuming it *has* its own inode and doesn't just use a different FS for local storage).
If that is the case, you could even conceivably have some hosts' data replicated X times and other hosts' data Y times (e.g. Y=1) by tagging the appropriate pc/ directories accordingly. Only problem: *shared* data (file contents appearing on hosts in both sets) would 'randomly' have X or Y copies, depending on which set of hosts happened to contain the file first (but you could probably adjust that later and "watch all the unmet goals get resolved" ;-).

> but if I understand it correctly the contents can be distributed across
> several machines instead of needing space for a full copy of even a single
> instance of the whole filesystem on any single machine or drive.

The way BackupPC works (heavily relying on fast read performance), I would expect it to be important for performance to have a full copy of the file system locally on the BackupPC server. Is there a way to enforce that?

Another consideration would be, how well does it handle a large backlog of unmet goals? If you're replicating over a comparatively slow connection, you might need to spread out updates to the "mirror(s)" over more time than your backup window contains. Does a large backlog of unmet goals deplete system memory needed for caching?

Mike wrote:
> I haven't tried backuppc on it yet, but storing mail in maildir folder
> works well, and virtual machine images work well.

Unfortunately, neither of these examples resembles BackupPC's disk usage. Virtual machine images are possibly high-bandwidth single large file operations. Maildir folders use many small files, but the bandwidth is probably severely limited by your internet connection and MTA processing (DNSBL lookups, sender verification, Spamassassin, ...). Reading mail is limited by your POP or IMAP server's processing speed (well, or NFS). And all of that only happens if there is actually incoming mail or users checking their mailbox, which you probably don't have at a sustained high rate for longer periods of time.
While BackupPC's performance may also be limited by link bandwidth or client speed, from what I read on this list, server disk performance seems to be the most important limiting factor.

So, while your results are encouraging, we still simply need to try it out, unless we can establish a reason why it won't work. For any meaningful results, it would be best to have an alternate BackupPC server with "conventional" storage (and comparable hardware) backing up the same clients (but not at the same time) to compare backup performance with.

Regards,
Holger
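Short of the side-by-side BackupPC comparison suggested above, a crude first impression of how a filesystem copes with many small files and hard links can be had from a micro-benchmark. The sketch below is only that: the function name, directory names, file count and file size are arbitrary choices made for illustration, and the workload is a loose stand-in that does not reproduce BackupPC's actual access pattern. Running it once on conventional storage and once on the candidate mount gives two numbers to compare.

```python
import os
import tempfile
import time


def time_small_files_and_links(directory, count=1000, size=1024):
    """Time creating `count` small files and hard-linking each into a second directory."""
    pool = os.path.join(directory, "pool")  # hypothetical directory names
    tree = os.path.join(directory, "tree")
    os.makedirs(pool)
    os.makedirs(tree)
    payload = b"\0" * size

    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(pool, f"{i:08d}")
        with open(path, "wb") as handle:
            handle.write(payload)
        # One extra name per file, in a different directory.
        os.link(path, os.path.join(tree, f"{i:08d}"))
    elapsed = time.perf_counter() - start
    return {"files": count, "seconds": elapsed}


if __name__ == "__main__":
    # Assumption: swap the temporary directory for an empty directory on the mount under test.
    with tempfile.TemporaryDirectory() as work_dir:
        print(time_small_files_and_links(work_dir))
```

The timing covers writes and link creation only; it leaves out the read-heavy side of BackupPC mentioned above, so a good result here is not evidence that backups will perform well.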