RE: File::Redundant
> Interesting ... not sure if implementing this in this fashion would be
> worth the overhead. If such a need exists, I would imagine one would
> have chosen a more appropriate OS-level solution. Think OpenAFS.

It is always nice to use stuff that has IBM backing and likely has at least a professor or two and some grad students helping out on it. I had never heard of OpenAFS before your email. I will have to look into it a bit.

My stuff would hopefully be nice to have if you didn't want to change your OS, or if you just wanted to make File::Redundant a small part of a much larger overall system.

The biggest overhead I have seen is having to do readlinks. Maybe I could get around them somehow. I will have to draw up some UML or something to show how my whole system works.

Earl
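Earl doesn't say where the readlinks come in, but if each thing is reached through a symlink that only moves when the good dirs are re-chosen (an assumption on my part), a per-process cache along these lines would avoid most of the calls -- a minimal sketch:

    # Hypothetical readlink cache. Assumes each "thing" is reached via a
    # symlink whose target only changes when the good dirs are re-chosen,
    # so a stale entry at worst points at an old copy until invalidated.
    my %target;
    sub resolve {
        my ($link) = @_;
        $target{$link} ||= readlink($link) || $link;
        return $target{$link};
    }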
RE: File::Redundant
> I would think it could be useful in non-mod_perl applications as well -
> you give an example of a user's mailbox. With scp it might be even more
> fun to have around :) (/me is thinking of config files and such)

mod_perl works very well with the system for keeping track of what boxes are down, sizes of partitions and the like. However, a simple daemon would do about the same thing for, say, non-web-based mail stuff. When I release I will likely have a daemon version as well as the mod_perl version, just using Net::Server.

> What's a `very large amount of data'?

We use it for tens of thousands of files, but most of those are small, and they are certainly all small compared to the 3 GB range. That is sort of the model for dirsync, I think: lots of small files in lots of different directories.

> Our NIS maps are on the order of 3 GB per file (64k users).

Man, that is one big file. Guess dropping a note to this list sorta lets you know what you have to really scale to. Sounds like dirsync could use rsync if Rob makes a couple changes. Can't believe the file couldn't be broken up into smaller files. 3 GB for 64k users doesn't scale so hot for, say, a million users, but I have no idea about NIS maps, so there you go.

Earl
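A daemon version along those lines could be little more than a Net::Server subclass; a rough sketch, where the one-line "sync" command protocol is purely invented for illustration, not anything Earl has described:

    # Minimal sync daemon sketch built on Net::Server. The wire protocol
    # ("sync /src/dir /dst/dir" per line) is hypothetical.
    package RedundantSyncd;
    use strict;
    use warnings;
    use base 'Net::Server';
    use File::DirSync;    # Rob Brown's module

    sub process_request {
        my $self = shift;
        # Net::Server ties STDIN/STDOUT to the client socket here.
        while (my $line = <STDIN>) {
            my ($cmd, $src, $dst) = split ' ', $line;
            last unless defined $cmd && $cmd eq 'sync';
            my $ds = File::DirSync->new({ nocache => 1 });
            $ds->src($src);
            $ds->dst($dst);
            $ds->dirsync();
            print "done\n";
        }
    }

    RedundantSyncd->run(port => 9000);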
Re: File::Redundant
This is OT for mod_perl, sorry...

* Cahill, Earl [EMAIL PROTECTED] [2002-04-29 13:55]:
> > Our NIS maps are on the order of 3 GB per file (64k users).
> Man, that is one big file. Guess dropping a note to this list sorta
> lets you know what you have to really scale to. Sounds like dirsync
> could use rsync if Rob makes a couple changes. Can't believe the file
> couldn't be broken up into smaller files. 3 GB for 64k users doesn't
> scale so hot for, say, a million users, but I have no idea about NIS
> maps, so there you go.

I haven't been following the conversation, for the most part, but this part caught my eye.

It is possible to split a NIS map up into many small source files, as long as when you change one of them you recreate the map in question as a whole. I've seen places with large NIS maps (although not 3 GB) split the map up into smaller files, where each letter of the alphabet has its own file in a designated subdirectory and a UID generator is used to get the next UID. When the NIS maps have to be rebuilt, the main map file is rebuilt using something like:

    (cat passwd.files/[a-z]*) > passwd; make passwd

which, of course, could be added to the Makefile as part of the passwd target.

(darren)

--
OCCAM'S ERASER: The philosophical principle that even the simplest solution is bound to have something wrong with it.
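The "UID generator" part can be as simple as scanning the split files for the highest UID in use; a hypothetical sketch, assuming the per-letter layout described above:

    #!/usr/bin/perl
    # Hypothetical next-UID generator for split passwd source files:
    # find the highest UID in use and print the next one.
    use strict;
    use warnings;

    my $max = 0;
    for my $file (glob 'passwd.files/[a-z]*') {
        open my $fh, '<', $file or die "$file: $!";
        while (my $line = <$fh>) {
            my $uid = (split /:/, $line)[2];    # passwd field 3 is the UID
            $max = $uid
                if defined $uid and $uid =~ /^\d+$/ and $uid > $max;
        }
        close $fh;
    }
    print "next uid: ", $max + 1, "\n";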
Re: File::Redundant
Cahill, Earl [EMAIL PROTECTED] wrote:
> Just putting out a little feeler about this package I started writing
> last night. Wondering about its usefulness, current availability, and
> just overall interest. Designed for mod_perl use. Doesn't make much
> sense otherwise.

I would think it could be useful in non-mod_perl applications as well - you give an example of a user's mailbox. With scp it might be even more fun to have around :) (/me is thinking of config files and such)

> There are race conditions all over the place, and I plan on
> transactionalizing where I can. The whole system depends on how long
> the dirsync takes. In my experience, dirsync is very fast. Likely I
> would have dirsync'ing daemon(s), dirsync'ing as fast as they can. In
> some best case scenario, the most data that would ever get lost would
> be the time it takes to do one dirsync (usually less than a second for
> even very large amounts of data), and the loss would only happen if
> you were making changes on a dir as the dir went down. I would try to
> deal with boxes coming back up and keeping everything clean as best I
> could.

What's a `very large amount of data'? Our NIS maps are on the order of 3 GB per file (64k users). Over a gigabit ethernet link, this still takes half a minute or so to copy to a remote system, at least (for NIS master-slave copies) -- this is just an example of a very large amount of data being sync'd over a network.

I don't see how transferring at least 3 GB of data can be avoided (even with diffs, the bits being diff'd have to be present in the same CPU at the same time). If any of the directories being considered by your module are NFS mounted, this will be an issue. Personally, I see NFS mounting as a real possibility, since that allows relatively easy maintenance of a remote copy for backup if nothing else.

--
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix
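(For scale: 3 GB is about 24 gigabits, so even at full gigabit wire speed the copy is bounded below by roughly 24 seconds -- before protocol overhead and disk I/O -- which squares with the half-minute figure above.)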
Re: File::Redundant
On Thu, 25 Apr 2002, James G Smith wrote:
> What's a `very large amount of data'? Our NIS maps are on the order of
> 3 GB per file (64k users). Over a gigabit ethernet link, this still
> takes half a minute or so to copy to a remote system, at least (for
> NIS master-slave copies) -- this is just an example of a very large
> amount of data being sync'd over a network. I don't see how
> transferring at least 3 GB of data can be avoided (even with diffs,
> the bits being diff'd have to be present in the same CPU at the same
> time).

rsync solves this problem by sending diffs between machines using a rolling checksum algorithm. It runs over rsh or ssh transport, and compresses the data in transfer. I'd be very interested to hear how well it works with a file of that size.

rsync has almost entirely replaced my use of scp. It's even replaced a fair portion of the times where I would have used cp, because of its capability to define exclusion lists when doing a recursive copy of a directory.

Andrew McNaughton
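For anyone curious about the trick: the receiver checksums fixed-size blocks of its copy, the sender slides a cheap checksum over its file a byte at a time, and only blocks with no match get sent. A minimal sketch of the weak rolling sum in Perl -- schematic only, not rsync's actual code:

    # Weak rolling checksum in the spirit of rsync's (schematic only).
    # For a window of bytes x[k..l]:
    #   a = sum of the bytes (mod 2^16)
    #   b = sum of the running values of a (mod 2^16)
    use strict;
    use warnings;

    sub block_sums {
        my ($block) = @_;
        my ($a, $b) = (0, 0);
        for my $byte (unpack 'C*', $block) {
            $a = ($a + $byte) & 0xffff;
            $b = ($b + $a) & 0xffff;
        }
        return ($a, $b);
    }

    # Slide the window one byte without rescanning it: drop $old, add
    # $new. $len is the (fixed) window length.
    sub roll {
        my ($a, $b, $old, $new, $len) = @_;
        $a = ($a - $old + $new) & 0xffff;
        $b = ($b - $len * $old + $a) & 0xffff;
        return ($a, $b);
    }

The point is that roll() is O(1) per byte, so the sender can cheaply test every offset in its file against the receiver's block checksums.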
Re: File::Redundant
Interesting ... not sure if implementing this in this fashion would be worth the overhead. If such a need exists, I would imagine one would have chosen a more appropriate OS-level solution. Think OpenAFS.

On Thu, 25 Apr 2002, Cahill, Earl wrote:
> Just putting out a little feeler about this package I started writing
> last night. Wondering about its usefulness, current availability, and
> just overall interest. Designed for mod_perl use. Doesn't make much
> sense otherwise.
>
> Don't want to go into too many details here, but File::Redundant takes
> some unique word (hopefully guaranteed unique through a database: a
> mailbox, a username, a website, etc.), which I call a thing; a pool of
> dirs; and how many $copies you would like to maintain. From the pool
> of dirs, $copies good dirs are chosen, ordered by percent full on the
> given partition.
>
> When you open a file with my open method (along with close, this is
> the only override method I have written so far), you get a file
> handle. Do what you like on the file handle. When you close the file
> handle with my close method, I CORE::close the file and use Rob
> Brown's File::DirSync to sync to all the directories. DirSync uses
> time stamps to very quickly sync changes between directory trees.
>
> When a dir can't be reached (box is down or what have you), $copies
> good dirs are re-chosen and the dirsync happens from good old data to
> the new good dirs. If too much stuff goes down, you're sorta outta
> luck, but you would have been without my system anyway. I would write
> methods for everything (within reason) you do to a file: open, close,
> unlink, rename, stat, etc.
>
> So who cares? Well, using this system would make it quite easy to keep
> track of really an arbitrarily large amount of data. The pool of dirs
> could be mounts from any number of boxes, located remotely or
> otherwise, and you could sync accordingly. If File::DirSync gets to
> the point where you can use ftp or scp, all the better.
>
> There are race conditions all over the place, and I plan on
> transactionalizing where I can. The whole system depends on how long
> the dirsync takes. In my experience, dirsync is very fast. Likely I
> would have dirsync'ing daemon(s), dirsync'ing as fast as they can. In
> some best case scenario, the most data that would ever get lost would
> be the time it takes to do one dirsync (usually less than a second for
> even very large amounts of data), and the loss would only happen if
> you were making changes on a dir as the dir went down. I would try to
> deal with boxes coming back up and keeping everything clean as best I
> could.
>
> So, it would be a work in progress, and hopefully get better as I
> went, but I would at least like to give it a shot.
>
> Earl

--
//\\
||  D. Hageman  [EMAIL PROTECTED]
\\//
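A rough sketch of what that open/close pair might look like -- the names, the df-based percent-full check, and the File::DirSync options are guesses at the design described above, not Earl's actual code:

    #!/usr/bin/perl
    # Sketch of the File::Redundant idea: pick $copies good dirs ordered
    # by percent full, write to one, dirsync to the rest on close.
    use strict;
    use warnings;
    use File::DirSync;    # Rob Brown's module

    my @pool   = qw(/mnt/a /mnt/b /mnt/c /mnt/d);    # hypothetical pool
    my $copies = 2;

    # Choose $copies good dirs from the pool, least full first.
    sub choose_dirs {
        my %pct;
        for my $dir (grep { -d } @pool) {
            my @df = `df -P $dir`;                   # assumes POSIX df(1)
            ($pct{$dir}) = (split ' ', $df[1])[4] =~ /(\d+)/;
        }
        my @good = sort { $pct{$a} <=> $pct{$b} } keys %pct;
        return @good[0 .. $copies - 1];
    }

    my ($primary, @mirrors);

    # open(): write into one good dir; remember the rest as mirrors.
    sub ropen {
        my ($thing, $file) = @_;
        ($primary, @mirrors) = choose_dirs();
        open my $fh, '>>', "$primary/$thing/$file" or die "open: $!";
        return $fh;
    }

    # close(): CORE::close, then dirsync the primary out to each mirror.
    sub rclose {
        my ($fh) = @_;
        close $fh or die "close: $!";
        for my $mirror (@mirrors) {
            my $ds = File::DirSync->new({ nocache => 1 });
            $ds->src($primary);
            $ds->dst($mirror);
            $ds->dirsync();
        }
    }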
Re: File::Redundant
Andrew McNaughton [EMAIL PROTECTED] wrote:
> On Thu, 25 Apr 2002, James G Smith wrote:
> > What's a `very large amount of data'? Our NIS maps are on the order
> > of 3 GB per file (64k users). Over a gigabit ethernet link, this
> > still takes half a minute or so to copy to a remote system, at least
> > (for NIS master-slave copies) -- this is just an example of a very
> > large amount of data being sync'd over a network. I don't see how
> > transferring at least 3 GB of data can be avoided (even with diffs,
> > the bits being diff'd have to be present in the same CPU at the same
> > time).
>
> rsync solves this problem by sending diffs between machines using a
> rolling checksum algorithm. It runs over rsh or ssh transport, and
> compresses the data in transfer.

Yes - I forgot about that - it's been a year or so since I read the rsync docs :/ but I do remember it mentioning that now.

--
James Smith [EMAIL PROTECTED], 979-862-3725
Texas A&M CIS Operating Systems Group, Unix
RE: File::Redundant (OT: AFS)
From: D. Hageman [mailto:[EMAIL PROTECTED]]
Subject: Re: File::Redundant

> Interesting ... not sure if implementing this in this fashion would be
> worth the overhead. If such a need exists, I would imagine one would
> have chosen a more appropriate OS-level solution. Think OpenAFS.

This is off-topic of course, but you often don't get unbiased opinions from the specific list. Does anyone have success or horror stories about AFS in a distributed production site? Oddly enough, the idea of using it just came up in my company a few days ago, to publish some large data sets that change once daily to several locations. I'm pushing a lot of stuff around now with rsync, which works and is very efficient, but the ability to move the source volumes around transparently and keep backup snapshots is attractive.

Les Mikesell
[EMAIL PROTECTED]
RE: File::Redundant (OT: AFS)
On Thu, 25 Apr 2002, Les Mikesell wrote:
> > Interesting ... not sure if implementing this in this fashion would
> > be worth the overhead. If such a need exists, I would imagine one
> > would have chosen a more appropriate OS-level solution. Think
> > OpenAFS.
>
> This is off-topic of course, but you often don't get unbiased opinions
> from the specific list. Does anyone have success or horror stories
> about AFS in a distributed production site? Oddly enough, the idea of
> using it just came up in my company a few days ago, to publish some
> large data sets that change once daily to several locations. I'm
> pushing a lot of stuff around now with rsync, which works and is very
> efficient, but the ability to move the source volumes around
> transparently and keep backup snapshots is attractive.

I haven't personally used AFS on a large scale. I have set up several small test beds with it to test the feasibility of using it at my job. I work for the EECS Department at the University of Kansas, so we have a fairly large heterogeneous computer environment. My tests showed that, at the time, support for Windows wasn't quite up to par yet. The *nix code base performed quite well. I say "at the time" because since then, the OpenAFS project has pushed out several more versions of the code base, so support might be better.

I did have the pleasure of talking with a guy from the University of Missouri who told me they have AFS deployed on a very large scale there and were very pleased with it (I think they were using the commercial version to support the Windows side). AFS definitely has some promise, and if it weren't for the heterogeneity issues (and a few non-technical issues) we would be using it here.

To avoid being completely off-topic - I should point out that AFS modules exist for Perl and a mod_afs exists for Apache. ;-)

--
//\\
||  D. Hageman  [EMAIL PROTECTED]
\\//