On Fri, 26 Mar 2010, Elaine Ashton wrote:

Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2 
years which is the canonical mirror for a large number of mirrors and the 
perspective of having a few terabytes spinning in storage changes quite 
dramatically when you are actually serving a few terabytes to thousands of 
clients. CPAN grew to be quite a burden on the site not only because of the 
high demand, but also because of the multitude of small files and I'm sure 
other mirrors feel similarly burdened.

Don't be such an arrogant prick.  You guys made baseless assumptions about
people's experience with storage management in an attempt to disregard their
opinions.  That's being a dick by any metric.

The sort of pruning Tim brought up has long been an idea, but with the current 
and growing size of the archive, something does need to be done to alleviate 
the burden not only on the canonical mirrors, but also on the random folks who 
want to grab a local mirror for themselves. In my present work environment, 
12GB isn't a lot of disk space, but it's a lot considering I don't need to 
install perl modules daily and the vast majority of it I'll likely never use. 
It would be a kindness to both the mirror operators and to the end-users to 
trim it down to a manageable size.

I think I was quite explicit in saying that efficiencies should be pursued
in multiple areas, but the predominant bitch I took away from your thread
dealt with the burden of synchronizing mirrors.  What's the easiest way to
address that pain?  I don't believe it's your method.  I'd look into the
size issue *after* you address the incredible inefficiencies of a simple
rsync.

As for efficiency, rsync remains a good tool for the job that works on nearly 
every platform which is a rather tall order to match with any other solution. 
Relegating the cruft to BackPAN to make the current CPAN slimmer and less 
demanding on all fronts is an idea that would be welcomed by more than just 
mirror ops.

Rsync is an excellent tool for smaller file sets.  I use it to sync my own
mirrors, which are typically ~10k files.  Am I surprised that it
doesn't scale when you're stat'ing every single file?  No.  Which is why
alternatives should be considered.  A simple FTP client playing a
transaction log forward is trivial.

I maintain several mirrors, most with rsync.  But that's with a clear
understanding of the size of the file set.  Use the right tool for the job.
And it seems apparent to me that rsync isn't the right tool for ~200k files.
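
To make the "play a transaction log forward" idea concrete, here is a minimal
sketch in Python.  Everything about the log is an assumption: PAUSE does not
publish a log in this form, and the "SERIAL OP PATH" line layout, the ADD/DEL
operation names, and the fetch/delete callbacks are hypothetical stand-ins
for whatever the master and the FTP client would really use.  The point is
only that the mirror touches the entries newer than its last sync instead of
stat'ing every file.

```python
from typing import Callable, Iterable, Tuple


def parse_entry(line: str) -> Tuple[int, str, str]:
    """Split one hypothetical log line of the form 'SERIAL OP PATH'."""
    serial, op, path = line.split(None, 2)
    return int(serial), op, path.strip()


def replay(
    log_lines: Iterable[str],
    last_serial: int,
    fetch: Callable[[str], None],
    delete: Callable[[str], None],
) -> int:
    """Apply every log entry newer than last_serial; return the newest serial seen.

    fetch and delete are supplied by the caller (for example, thin wrappers
    around an FTP client), so this function does no network I/O itself.
    """
    newest = last_serial
    for raw in log_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        serial, op, path = parse_entry(line)
        if serial <= last_serial:
            continue  # already applied during an earlier sync
        if op == "ADD":
            fetch(path)
        elif op == "DEL":
            delete(path)
        else:
            raise ValueError(f"unknown operation {op!r} at serial {serial}")
        newest = max(newest, serial)
    return newest
```

A mirror would store the returned serial after each run and pass it back in
as last_serial next time, so the work per sync is proportional to the number
of changes, not to the size of the archive.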

The only snag I can foresee in trimming back on the abundance of modules is the 
case where some modules have version requirements for other modules where it 
will barf with a mismatch/newer version of the required module (I bumped into 
this recently but can't remember exactly which module it was) but I think it's 
rare and the practise should be discouraged.

Try doing a simple cost-benefit analysis.  What you guys are proposing will
help.  But not as much as simpler alternatives.  Like replacing rsync with a
perl script and modifying PAUSE to log the transactions.

        --Arthur Corliss
          Live Free or Die
