2000-07-05-10:47:05 Scott Sherman:
> We have recently purchased a new webserver (web5) and
> would like to migrate over to it from an older machine (web4).
> Both machines are running RedHat 6.2. If possible we would like
> to perform this migration without any downtime.
>
> Basically there is a directory called /web which I wish to copy
> from web4 to web5. To do this, I have entered the following (from
> / on web5):
>
> rsync -avz -e ssh --progress (web4's ip address):/web ./
>
> It copies all of the files over. However, after completion, the
> destination '/web' directory on web5 is about 100 MB larger than
> the source '/web' directory on web4.
Well, there are unimportant differences in detail from how I tend to
do this sorta thing, like e.g. I tend to push rather than pull for
some reason, just habit I guess. But I can suggest a few changes of
substance that should fix the problem of dst bigger than src.
(1) include --delete, so that files that are present on the dst but
not on the src get removed;
(2) include -H, so that hard links on the src are correctly
reproduced on the dst; and
(3) include -S, so that sparse files on the src have their
    sparseness properly propagated to the dst.
> Does anyone know the reason for this increase in size?
Probably one of the above 3 would account for it, would be my guess.
The only other things I can think of are different cluster sizes,
which shouldn't be an issue since both boxes are running the same OS
and you probably didn't override the defaults; and the fact that
directories grow and never shrink, but that's unlikely to be the
problem here, and it'd be pretty amazing for it to account for 100MB
in any case.
> Also, is this a sensible way to do things?
Very very. It's how I always do this kind of job these days.
> I am concerned about the fate of files that are being altered
> while rsync is being run.
That's the one worry, and basically there's no way to address it
without a deep redesign of your datastore to allow reliable realtime
replication, which a plain old filesystem ain't, and rsync can't
make it be.
But what you _can_ do, which is liable to be good enough in many
settings, would be
(1) keep the dst synced really tightly to the src, repeatedly
running syncs. They'll get _really_ fast as there's almost
nothing to copy, and all the directory structures on each end
are in the kernel's cache.
(2) Shut down whatever process is updating the content on web4 (but
not the webserver, you can leave it delivering the content
readonly).
(3) Do a final prophylactic rsync, just in case a change snuck in
between (1) and (2). If you've got fast fingers it probably
didn't, but play it safe:-).
(4) Restart the update process so it now updates the content tree on
web5.
(5) Test as quickly as possible (automated test scripts using a tool
like e.g. curl can be a great comfort here), then
(6) Switch web5 in to be the live server.
Note that if you are using a load balancer, a thing like Cisco's
LocalDirector or whatever, then you can make step (6) be perfectly
invisible to users, adding web5 to the pool then yanking web4. And
if you don't have an automatic-failover-detecting load-balancer
gizmo of some sort in front of your farm, you probably aren't really
committed to delivering so close to 100% perfect uptime as to make
the quick changeover I describe above impractical. While I find Red
Hat Linux to be a fine, wonderfully stable server platform, I don't
find _any_ single-box server hardware to approach 100% closely
enough for serious high-availability needs. So when availability is
a requirement, the design has a pair of routers in HSRP H-A config
outside the farm, provided with diverse routes to the provider[s]'
backbone[s], a pair of LocalDirectors or something like them in H-A
config, a pair of switches behind them strapped together and running
the spanning tree thing, and servers behind that, salt-n-peppered
across the switches. Make sure you plan things like the power
provisioning to ensure no single point of failure that you can
possibly avoid.
-Bennett