Jeremy,

The wiki family files you linked to are very interesting. If I go the wiki family route, I'd build a much simpler, scaled-down version of that, since I only have five very similar wikis (they're essentially different language versions of the same thing, though their content is managed by different groups of editors, not by me). I'd seen the files at https://noc.wikimedia.org/conf/ before, so I had a rough idea of that approach.
As for your suggestions about the uploaded files, I should reiterate that I currently have a single storage host, and that is my primary concern. Using rsync for backups is fine, but not for keeping multiple production storage hosts synchronized. Based on some more reading today, I think keeping several storage hosts in sync would require moving to a clustered filesystem such as GFS2 <http://en.wikipedia.org/wiki/GFS2>. I am curious, though, about the server architecture, and about what MediaWiki configuration would be needed to use the file servers and image scalers shown in this diagram: <http://upload.wikimedia.org/wikipedia/commons/d/d8/Wikimedia-servers-2010-12-28.svg>.

Justin

On Tue, Oct 28, 2014 at 6:19 AM, Jeremy Baron <jer...@tuxmachine.com> wrote:
> On Tue, Oct 28, 2014 at 8:33 AM, Justin Lloyd <jclb...@gmail.com> wrote:
> > Also, unless I'm missing something or being dense (it is late here), rsync
> > simply wouldn't work since the upload directories are constantly being
> > accessed and files written through one web server could easily be
> > immediately accessed afterwards through another web server, and since there
> > are four web servers (and possibly more or even less if I were to add AWS
> > Auto Scaling into the mix), so keeping them all identical when writes could
> > go through any of them would be pretty much impossible.
>
> Well you don't have to limit yourself to having the file uploads
> visible in only one part of the filesystem. (so this could work even
> if you don't have dedicated storage hosts. mediawiki uses NFS mount
> and unrelated vhost serves static files out of rsync target)
>
> But you could also do like WMF (and it sounds like you already have
> dedicated storage boxes?):
> files are fetched from one hostname/varnish cluster/storage cluster
> and HTML/etc. comes from a completely separate hostname/varnish
> cluster/php cluster.
>
> 4 webservers mount rsync master by NFS. same as now. writes and file
> description page rendering runs over NFS.
>
> new webservers (or vhosts on existing webservers or webservers on the
> storage hosts directly) serve images read-only from the local copy of
> the files propagated by the rsync cron. (or whatever other way)
>
> you could autoscale for the 4 webservers that don't have local images
> at all (just NFS) and then either build an initial rsync into the
> scale up process for storage hosts or do that scaling manually.
>
> anyway, this is all just to address the immediate spof quickly. longer
> term maybe figure out a way to use s3 or something. (which you could
> do already actually. your rsync cron could instead be a copy to s3
> cron. but then still spof on the same things where the config
> described above would also have spof. e.g. file description pages,
> uploading, deleting, etc.)
>
> -Jeremy
>
> _______________________________________________
> MediaWiki-l mailing list
> To unsubscribe, go to:
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
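The split Jeremy describes (the existing web servers mount the upload master over NFS and handle writes and file description pages, while a separate host name is served read-only from a copy refreshed by an rsync cron) can be sketched as a small Python cron job plus the routing rule a front-end proxy might apply. This is an illustration under assumptions, not WMF's actual setup: every host name, path, and pool name below is hypothetical, and the rsync pull assumes the image host can reach the master over SSH.

```python
# Sketch of the NFS-writes / rsync-read-only-copy split, under hypothetical names:
#  - PHP web servers mount the upload master over NFS and handle all writes;
#  - a separate file host serves a read-only copy refreshed by cron.
import subprocess
from typing import List

# Hypothetical host names and paths; substitute your own.
UPLOAD_MASTER = "storage1.example.org:/srv/mediawiki/images/"  # NFS/rsync master
LOCAL_COPY = "/srv/images-ro/"  # read-only copy on an image host
FILE_HOST = "upload.example.org"  # host name that serves files only
PAGE_HOST = "wiki.example.org"  # host name that serves HTML, uploads, etc.


def build_rsync_command(source: str, dest: str, delete: bool = True) -> List[str]:
    """Return the argv that pulls the master's upload tree into dest."""
    cmd = ["rsync", "-a"]  # archive mode: recursive, keeps perms, times, symlinks
    if delete:
        # Mirror deletions so files deleted on the wiki disappear from the copy.
        cmd.append("--delete")
    # With a trailing slash on source (as in UPLOAD_MASTER), rsync copies the
    # directory's contents rather than the directory itself.
    cmd += [source, dest]
    return cmd


def sync_once(source: str = UPLOAD_MASTER, dest: str = LOCAL_COPY) -> int:
    """Run one pull (e.g. from cron) and return rsync's exit status."""
    return subprocess.run(build_rsync_command(source, dest)).returncode


def pick_backend(host: str, method: str) -> str:
    """Choose a server pool for a request, as a front-end proxy might.

    Exact host match only; a real proxy would also normalize case and ports.
    """
    if host == FILE_HOST:
        # The file tier only has a local copy, so it must never accept writes.
        if method not in ("GET", "HEAD"):
            raise ValueError("file host is read-only; send writes to " + PAGE_HOST)
        return "image-servers"
    # Everything else (page views, uploads, deletions) goes to the
    # NFS-mounting PHP servers, which see the master directly.
    return "php-servers"
```

A crontab entry calling `sync_once()` every few minutes would keep the copy fresh; the trade-off is that a file just uploaded through the PHP tier may not be visible on the file host until the next run. On the MediaWiki side, a split like this typically means `$wgUploadDirectory` stays on the NFS mount while `$wgUploadPath` points at the file host's URL.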