Hi,
I think you need the following functionality to support HSM (file based, not
block based):

1 implement a trigger on file creation/modification/deletion

2 store the additional HSM identifier needed for recall as a file attribute
(see the xattr sketch after this list)

3 policy-based purging of file-related blocks (LRU cache etc.)

4 implement an optional trigger to recall a purged file and block the IO (in
our experience automatic recalls are problematic for huge installations: if
the aggregation window for recall requests is short, they create inefficient
and chaotic access patterns on tape)

5 either snapshot a file before migration, take an exclusive lock, or freeze
it to avoid modifications during migration (you need a sufficiently unique
identifier for the file; inode/path + checksum works, and so does inode/path +
modification time)
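
A minimal sketch of item 2, assuming plain POSIX extended attributes as the
storage mechanism (the attribute name "user.hsm.id" and the identifier format
are made up for illustration; Linux only):

    import os

    HSM_XATTR = "user.hsm.id"  # hypothetical attribute name

    def tag_for_recall(path, hsm_id):
        # Store the identifier the HSM backend needs to recall this file.
        os.setxattr(path, HSM_XATTR, hsm_id.encode())

    def recall_id(path):
        # Return the stored identifier, or None if the file was never migrated.
        try:
            return os.getxattr(path, HSM_XATTR).decode()
        except OSError:
            return None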

The interesting part is the policy engine:

Ideally one supports time-based and volume-triggered policies with metadata
matching, e.g.:

a) time based: one needs to build an LRU list, i.e. a view of all files
matching a policy, ordered by creation and/or access time.
Example: "evict files from the filesystem when they have not been accessed
for one month"

b) volume triggered: one needs to build the same LRU list by creation and/or
access time; files are evicted from disk when a certain high watermark is
reached, until the volume drops below a low watermark (sketch below).
Example: "evict files matching size/name/... criteria if the pool volume or
subtree exceeds 95% usage, to bring it back down to 90%"
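
A minimal sketch of the volume-triggered case, assuming the LRU view is
already available as a list of (atime, size, path) tuples sorted oldest-first;
evict_blocks() is a hypothetical stand-in for the actual purge:

    def evict_blocks(path):
        # Hypothetical: purge the file's data blocks, keep metadata + HSM id.
        pass

    def apply_watermarks(lru, used, capacity, high=0.95, low=0.90):
        # Do nothing until the pool crosses the high watermark ...
        if used < high * capacity:
            return used
        # ... then evict least-recently-used files until under the low one.
        for atime, size, path in lru:
            if used <= low * capacity:
                break
            evict_blocks(path)
            used -= size
        return used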

Backup and archiving are simple compared to the above LRU policies.

You need the ability to build this LRU view from scratch (e.g. via a
full-table scan); afterwards you can keep it current with incremental updates
via triggers.
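
A sketch of that two-phase build, assuming a plain sorted list keyed by atime
(a real view would also drop a file's stale entry before re-inserting it):

    import bisect
    import os

    def full_scan(root):
        # Phase 1: build the LRU view from scratch with a full tree walk.
        view = []
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                bisect.insort(view, (st.st_atime, st.st_size, path))
        return view

    def on_access_trigger(view, path):
        # Phase 2: incremental update fired by the access trigger
        # (stale-entry removal omitted for brevity).
        st = os.stat(path)
        bisect.insort(view, (st.st_atime, st.st_size, path))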

Ideally one has a central view (on a subtree) and applies the policy there,
but that is not as scalable as the rest of CEPH. It has the same problem as
quota accounting by uid/gid on a subtree, with the added complication that you
have to maintain a possibly huge file list sorted by ctime/mtime and/or atime.
CEPHFS stores directories as objects, but you cannot apply policies at the
individual directory level, so it has to be at least at pool or subtree level.
If one trades away some flexibility in the policies, one can keep the LRU view
small. There is also no need to track every change of an atime: one could
track atime for the LRU view with a granularity of days to avoid too many
updates.
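
The day-granularity idea as a sketch: the LRU view is only touched when a
file's coarsened atime moves into a new day bucket, so repeated reads within
the same day cost nothing:

    DAY = 86400  # seconds

    def coarse_atime(atime):
        # Round the access time down to day granularity.
        return int(atime) // DAY * DAY

    def lru_update_needed(old_atime, new_atime):
        # Skip the (comparatively expensive) LRU view update unless the
        # day bucket actually changed.
        return coarse_atime(new_atime) != coarse_atime(old_atime)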

Now if you don't want to implement this LRU view, you can outsource it to an
external DB and ship the scalability and update-frequency issues to the DB :-)
and just provide the migration/recall hooks and attribute support. Maybe your
idea was to integrate with RobinHood ... currently it seems tightly integrated
with Lustre internals.

The HSM logic looks similar to the peering logic you need for erasure coding
to trigger eviction and recall. If you keep the ctime/mtime/atime information
on entries in directory objects rather than on data objects, the two
correspond quite well. With ctime/mtime only it is much more lightweight.

I actually wanted to make a BluePrint proposal for metadata searches in
subtrees, running as a method on the MDS objects, which would provide the
needed functionality for the HSM views. Although this is a full subtree scan,
it would actually be nicely distributed over the MDS backend pool rather than
concentrated on the MDS itself. The output of the search could go into
temporary objects which are then converted into HSM actions like
migration/deletion triggers etc. (rough sketch below).
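
A rough sketch of that pipeline, with everything hypothetical: matches()
stands for the metadata predicate evaluated near the directory objects, and
the returned lists stand in for the temporary result objects:

    def subtree_search(dir_entries, policy):
        # Runs distributed over the MDS backend pool: filter each directory
        # object's entries against the policy's metadata predicate.
        return [e for e in dir_entries if policy.matches(e)]

    def to_hsm_actions(results, action="migrate"):
        # Turn the collected search results into concrete HSM actions.
        return [(action, path) for atime, size, path in results]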

I would favour this approach over relying on more and more external
components, since it is easy to do in CEPH.

FYI: there was a paper about migration policy scanning performance by IBM two 
years ago: 
http://domino.watson.ibm.com/library/CyberDig.nsf/papers/4A50C2D66A1F90F7852578E3005A2034/$File/rj10484.pdf

Cheers,
Andreas.

________________________________________
From: [email protected] [[email protected]] on 
behalf of Sage Weil [[email protected]]
Sent: 09 November 2013 09:33
To: [email protected]
Subject: HSM

The latest Lustre just added HSM support:

        http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html

Here is a slide deck with some high-level detail:

        https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf

Is anyone familiar with the interfaces and requirements of the file system
itself?  I don't know much about how these systems are implemented, but I
would guess there are relatively lightweight requirements on the fs (ceph
mds in our case) to keep track of file state (online or archived
elsewhere).  And some hooks to trigger migrations?

If anyone is interested in this area, I would be happy to help figure out
how to integrate things cleanly!

sage