Hi Gregory,

On 12/11/13 10:13, Gregory Farnum wrote:
On Mon, Nov 11, 2013 at 3:04 AM, John Spray <[email protected]> wrote:
This is a really useful summary from Malcolm.

In addition to the coordinator/copytool interface, there is the question of
where the policy engine gets its data from.  Lustre has the MDS changelog,
which Robinhood uses to replicate metadata into its MySQL database with all
the indices that it wants.

On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <[email protected]> wrote:
So there aren't really any hooks, in that exports are triggered by the policy 
engine after a scan of the metadata, and recalls are triggered when caps 
are requested on offline files

Wait, is the HSM using a changelog or is it just scanning the full
filesystem tree? Scanning the whole tree seems awfully expensive.

While I can't speak at length about Lustre HSM (it may well use incremental updates to its SQL database via the metadata changelog), I do know that full filesystem scans are done regularly in other HSM solutions. I also know that such scans are multi-threaded and, when backed by decent disks, do not take an excessive amount of time.


I don't know if CephFS MDS currently has a similar interface.
Well, the MDSes each have their journal of course, but more than that
we can stick whatever we want into the metadata and expose it via
virtual xattrs or whatever else.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


John


On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <[email protected]> wrote:

Hi All,

If you are talking specifically about Lustre HSM, it's really an interface for 
adding HSM functionality by leveraging existing HSMs (DMF, for example).

So with Lustre HSM you have a policy engine that triggers migrations out of 
the filesystem. Rules are based around size, last access time, and target state 
(online, dual, and offline).
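As a rough sketch, a size/last-accessed rule that picks one of the three target states could look like this. The thresholds and function name are purely hypothetical; real policy engines express this in a rules language, not Python.

```python
import time

# Target states as described above: "online" (filesystem only),
# "dual" (copy in both the filesystem and the HSM), and
# "offline" (HSM only, with a stub left in the filesystem).
def choose_target_state(size_bytes, last_access_epoch,
                        small_file_cutoff=1 << 20,    # hypothetical: 1 MiB
                        cold_after_secs=30 * 86400,   # hypothetical: 30 days
                        evict_after_secs=180 * 86400, # hypothetical: 180 days
                        now=None):
    """Pick a target state from a file's size and last-access age."""
    now = time.time() if now is None else now
    idle = now - last_access_epoch
    if size_bytes < small_file_cutoff or idle < cold_after_secs:
        return "online"   # small or recently used: keep in the filesystem
    if idle < evict_after_secs:
        return "dual"     # archived copy exists, data stays online too
    return "offline"      # cold and large: release the filesystem copy
```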

There is a 'coordinator' process involved here as well; from what I understand, 
it runs on the MDS nodes. It handles the interaction with the copytool. 
The copytool is provided by the HSM solution you are actually using.

For recalls, when caps are acquired on the MDS for an exported file, the 
responsible MDS contacts the coordinator, which in turn uses the copytool to 
pull the required file back out of the HSM.

In Lustre HSM, the objects that make up a file are all recalled together; the 
file, not the individual objects, is what is handed to the HSM.

For Lustre, all it needs to keep track of is the current state of the file and 
the correct ID to request from the HSM. This is stored inside the normal 
metadata storage.

So there aren't really any hooks, in that exports are triggered by the policy 
engine after a scan of the metadata, and recalls are triggered when caps are 
requested on offline files. Then it's just standard POSIX blocking until the 
file is available.
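The recall path above can be modeled as a toy in-process sketch: the first caller for an offline file starts the copytool, and every caller blocks until the copy is staged back in. All names here are illustrative; the real coordinator and copytool are separate processes talking over the network, not threads.

```python
import threading

class Coordinator:
    """Toy model of the recall path: a caller (standing in for the MDS)
    blocks until a copytool worker has staged the file back in."""
    def __init__(self, copytool):
        self.copytool = copytool   # callable: hsm_id -> None (stages the file)
        self._done = {}            # hsm_id -> threading.Event
        self._lock = threading.Lock()

    def recall(self, hsm_id):
        with self._lock:
            ev = self._done.get(hsm_id)
            if ev is None:
                # First caller for this file: kick off the copytool.
                ev = self._done[hsm_id] = threading.Event()
                worker = threading.Thread(target=self._run, args=(hsm_id, ev))
                worker.start()
        ev.wait()                  # standard blocking until the file is staged

    def _run(self, hsm_id, ev):
        self.copytool(hsm_id)
        ev.set()
```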

Most of the state and ID information could be stored as xattrs in CephFS. I'm not 
as sure how to do it for other things, but as long as you can store some kind of 
extended metadata about whole objects, they could use the same interfaces as well.
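As a minimal sketch of that idea, the per-file HSM state and archive-side ID could be packed into a single user xattr value. The xattr name and encoding are assumptions of mine (CephFS reserves "ceph.*" for its own virtual xattrs, so an HSM layer would presumably pick a user.* name):

```python
import json
import os

# Hypothetical xattr name; "ceph.*" is reserved for CephFS virtual xattrs,
# so a user.* name is used here.
HSM_XATTR = "user.hsm.state"

def encode_hsm_state(state, hsm_id):
    """Pack the file's HSM state plus the archive-side ID into one value."""
    assert state in ("online", "dual", "offline")
    return json.dumps({"state": state, "id": hsm_id}).encode()

def decode_hsm_state(raw):
    doc = json.loads(raw.decode())
    return doc["state"], doc["id"]

def set_hsm_state(path, state, hsm_id):
    # os.setxattr is Linux-only and needs a filesystem with user xattrs.
    os.setxattr(path, HSM_XATTR, encode_hsm_state(state, hsm_id))

def get_hsm_state(path):
    return decode_hsm_state(os.getxattr(path, HSM_XATTR))
```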

Hope that was actually helpful and not just an obvious rehash...

Regards

Malcolm Haak