Hello, everyone.

TL;DR: soon, distfiles will need to be present under two paths for
the transitional period.  Would you prefer that we use hardlinks or
symlinks for that?


We're planning to start deploying a new GLEP 75-based [1] mirror layout
to our mirrors soonish.  This implies a transitional period during which
we'll be using both old and new layouts, so all file entries will be
duplicated.  The plan is roughly to:

1. Enable the new split layout in emirrordist, and start using both
layouts simultaneously for newly-mirrored files.

2. Duplicate the existing distfiles to the new layout.

3. Live with both layouts for some longish time, to support people using
old Portage versions.

4. Eventually disable the old (flat) layout and start removing files.
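
A rough sketch of what step 2 amounts to, for either link type.  This
is not the emirrordist implementation.  It assumes the split layout puts
each file into a directory named after the first two hex digits of the
BLAKE2B hash of its filename; the authoritative definition is in [1].
The function names and the use_symlinks switch are made up for
illustration.

```python
import hashlib
import os


def split_layout_path(filename: str, cutoff_bits: int = 8) -> str:
    """Relative path of a distfile in the (assumed) hash-split layout."""
    digest = hashlib.blake2b(filename.encode("utf8")).hexdigest()
    # Each hex digit carries 4 bits, so 8 bits -> 2-character directory.
    return os.path.join(digest[: cutoff_bits // 4], filename)


def duplicate_to_split_layout(distdir: str, use_symlinks: bool = False) -> None:
    """Give every flat-layout file a second name in the split layout."""
    for name in sorted(os.listdir(distdir)):
        src = os.path.join(distdir, name)
        # Skip the split-layout directories and anything that is not
        # a regular file.
        if os.path.islink(src) or not os.path.isfile(src):
            continue
        dst = os.path.join(distdir, split_layout_path(name))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if os.path.lexists(dst):
            continue  # already duplicated on an earlier run
        if use_symlinks:
            # Relative target, so the link survives the tree being moved.
            os.symlink(os.path.join(os.pardir, name), dst)
        else:
            os.link(src, dst)
```

Running it twice is harmless, since existing split-layout entries are
left alone.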


The basic problem is whether to use hardlinks or symlinks
for the duplicate files.  I've elaborated more on both solutions in [2]
but I'll summarize briefly here.

Hardlinks have the advantage that for mirrors running rsync with -H,
they avoid extra space usage and extra traffic.  However, we don't
really know how many mirrors enable that; I suspect it's around half of
them.  At initial deployment time, rsync will just hardlink files in
the new layout to existing entries, and at cleanup time it will just
unlink the old entries.

For mirrors not enabling -H, hardlinks will mean all distfiles being
transferred again at deployment time.  Furthermore, throughout the
transitional period all files will be duplicated, and so will the space
usage.  Cleanup should be lightweight, though.
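
To make the space argument concrete, here is a small sketch that counts
how many bytes a mirror ends up storing for a tree.  It only models the
effect and does not invoke rsync: the assumption is that a sync
preserving hardlinks stores each inode once, while one that doesn't
stores every path as a separate copy.  The function name is made up.

```python
import os
import stat


def bytes_to_store(root: str, preserve_hardlinks: bool) -> int:
    """Total size of regular files under root, as a mirror would store it."""
    total = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks and the like carry no file data
            inode = (st.st_dev, st.st_ino)
            if preserve_hardlinks and inode in seen:
                continue  # second name for data that is already counted
            seen.add(inode)
            total += st.st_size
    return total
```

For a tree where every distfile has exactly two hardlinked names, the
second figure is twice the first.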

Symlinks have the advantage that we know that all or almost all mirrors
enable them.  They are lightweight at deployment time since it's just
a matter of rsync copying symlinks, and they definitely won't cause
double space usage.  However, they will cause all files to be
retransferred at cleanup time, because the symlinks get replaced by
real files.

Technically, I suppose we could avoid that by splitting the cleanup into
two stages, repeated for smaller groups of files.  First, replace
symlinks with hardlinks, which will make it light for at least some of
the mirrors.  Then, remove the old files and move on to the next group.
For mirrors not using -H, this will still mean double transfer but we'd
limit double space usage to one group at a time, and only for a short
period.
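
The two stages could look roughly like this on the master side.  This is
only a sketch: the function names are made up, each group is given as
(old flat path, new split path) pairs, and the mirrors are assumed to
get a chance to sync between the two calls.

```python
import os
from typing import Iterable, Tuple

PathPair = Tuple[str, str]  # (old flat path, new split-layout path)


def relink_group(pairs: Iterable[PathPair]) -> None:
    """Stage 1: turn split-layout symlinks into hardlinks to the old file."""
    for old, new in pairs:
        if not os.path.islink(new):
            continue  # already a real file or hardlink
        tmp = new + ".tmp"
        os.link(old, tmp)
        # rename() replaces the symlink itself, so the new path is never
        # missing while this runs.
        os.replace(tmp, new)


def remove_old_group(pairs: Iterable[PathPair]) -> None:
    """Stage 2: drop the flat-layout names once stage 1 has propagated."""
    for old, new in pairs:
        # Only unlink if the new name really shares the old file's data.
        if not os.path.islink(new) and os.path.samefile(old, new):
            os.unlink(old)
```

The samefile check in stage 2 is there so that a file whose symlink was
never converted keeps its flat-layout copy instead of ending up with
a dangling link.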

If any mirrors sync over rsync without using -l (talking about private
mirrors here), they will not get the new layout at all, which is going
to suck for their users.


Which way do you prefer?


[1] https://www.gentoo.org/glep/glep-0075.html
[2] https://bugs.gentoo.org/534528#c38

-- 
Best regards,
Michał Górny
