>
>
> Missing CC to list....
>
>
> -------- Forwarded Message --------
> Subject: Re: [ceph-users] Merging CephFS data pools
> Date: Tue, 23 Aug 2016 08:59:45 +0200
> From: Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>
> To: Gregory Farnum <gfar...@redhat.com>
>
> Hi,
>
>
> On 08/22/2016 10:02 PM, Gregory Farnum wrote:
> > On Thu, Aug 18, 2016 at 12:21 AM, Burkhard Linke
> > <burkhard.li...@computational.bio.uni-giessen.de> wrote:
> >> Hi,
> >>
> >> the current setup for CephFS at our site uses two data pools due to
> >> different requirements in the past. I want to merge these two pools now,
> >> eliminating the second pool completely.
> >>
> >> I've written a small script to locate all files on the second pool using
> >> their file layout attributes and replace them with a copy on the correct
> >> pool. This works well for files, but modifies the timestamps of the
> >> directories.
> >> Do you have any idea for a better solution that does not modify timestamps
> >> and plays well with active CephFS clients (e.g. no problem with files being
> >> used)? A simple 'rados cppool' probably does not work since the pool 
> >> id/name
> > is part of a file's metadata and clients will not be aware of the moved
> >> files.....
> > Can't you just use rsync or something that will set the timestamps itself?
> The script is using 'cp -a', which also preserves the timestamps. So
> file timestamps are ok, but directory timestamps get updated by cp and
> mv. And that's ok from my point of view.
>
> The main concern is data integrity. There are 20TB left to be
> transferred from the old pool, and part of this data is currently in
> active use (including being overwritten in place). If write access to an
> opened file happens while it is being transferred, the changes to that
> file might be lost.
>
> We can coordinate the remaining transfers with the affected users, if no
> other way exists.
>
I believe the best way is to copy all the files from the old pool to the
other one, then set a service window and make a second pass that copies
only the files that changed in the meantime, deny access to the source pool
(but keep its data for a while) and open service access again. After some
time, if no data loss issues turn up, the old pool can be deleted. I think
it's the only way to guarantee data integrity.
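
For what it's worth, a minimal sketch of such a copy pass in Python could
look like the lines below. The pool name, the mount path and the
rewrite-via-temporary-file trick are assumptions about the setup, not a
tested tool; it uses the CephFS ceph.file.layout.pool xattr to find files
that still live on the old pool, much like Burkhard's script already does,
and re-running the same walk inside the service window acts as the second
pass.

#!/usr/bin/env python3
# Sketch: rewrite every file whose layout still points at the old pool.
# Assumes the directory layouts already point at the new pool, so a freshly
# created temporary file lands there. Pool name and paths are made up.
import os
import shutil

OLD_POOL = b"old_data_pool"        # assumed name of the pool being drained

def rewrite(path):
    # Re-create the file so it picks up the new pool, keeping file
    # timestamps and mode; the parent directory mtime still changes.
    tmp = path + ".poolmove"
    shutil.copy2(path, tmp)        # data plus mtime/atime/mode
    os.replace(tmp, path)          # atomic swap into place

def copy_pass(root):
    moved = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                pool = os.getxattr(path, "ceph.file.layout.pool")
            except OSError:
                continue           # file vanished or has no layout xattr
            if pool == OLD_POOL:
                rewrite(path)
                moved += 1
    return moved

# First pass while clients are active, second pass inside the service window:
# copy_pass("/cephfs")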

A few months ago one of the big service integrators was migrating from a
traditional proprietary storage solution to Ceph RADOS (they offer
S3-compatible storage services). They used the following migration path:
1. They wrote a special script for this purpose.
2. The script copied all the data from the old storage to RADOS and stored
a database record for each file with its size, timestamp, owner,
permissions and so on. On success the migration app wrote a "migrated"
status to the db; on failure it wrote an error, and that file was migrated
again on the next run. The migration took around two weeks because they
throttled the copy rate to avoid disrupting service performance.
3. After that they ran an MD5 hash comparison to verify data integrity
(a rough sketch of this bookkeeping follows below).
4. In the end they put the service into maintenance mode for a few hours,
copied and checked all changes made during the migration window and finally
opened access to the new cluster.
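
The snippet below is only an illustration of that kind of bookkeeping; the
SQLite table, the column names and the copy_to_new_storage() hook are my
assumptions, not the integrator's actual tooling. It records one row per
file with its MD5 sum and a migrated/error status, so failed files can be
retried on the next run and the stored hashes can later be compared against
the destination copies.

#!/usr/bin/env python3
# Sketch of per-file migration bookkeeping: one SQLite row per file with
# size, mtime and MD5, plus a status that drives retries on the next run.
import hashlib
import os
import sqlite3

db = sqlite3.connect("migration.db")
db.execute("""CREATE TABLE IF NOT EXISTS files (
                  path TEXT PRIMARY KEY, size INTEGER, mtime REAL,
                  md5 TEXT, status TEXT)""")

def md5sum(path, chunk=4 * 1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def migrate(path, copy_to_new_storage):
    # Copy one file and record the outcome; files marked "error" are simply
    # picked up again on the next run of the script.
    st = os.stat(path)
    try:
        copy_to_new_storage(path)              # placeholder for the real copy
        status, digest = "migrated", md5sum(path)
    except OSError:
        status, digest = "error", None
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
               (path, st.st_size, st.st_mtime, digest, status))
    db.commit()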

Best regards,
Vladimir


>
> Regards,
> Burkhard
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
