Hi All,

I am thinking about using the placement policy engine to insert custom
metadata tags when a file is created, based on the fileset it is created in.
This might facilitate Research Data Management tasks that happen later in the
data lifecycle.

I am also thinking about allowing users to specify additional custom metadata
tags (maybe through a fancy web interface) and potentially giving users
control over creating new filesets (e.g. for scientists running new
experiments). So… pretend this is a placement policy on my GPFS-driven
data-ingest platform:


/* Tag files created in this experiment's fileset at placement time */
RULE 'RDMTEST'
     SET POOL 'instruments'
     FOR FILESET ('%GPFSRDM%10.01013%RDM%0ab34906-5357-4ca0-9d19-a470943db30a%RDM%8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')
     WHERE SetXattr('user.rdm.parent','0ab34906-5357-4ca0-9d19-a470943db30a')
       AND SetXattr('user.rdm.ingestor','8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')

/* Everything else goes to the default pool */
RULE 'DEFAULT' SET POOL 'data'

The fileset name can be meaningless (as far as the user is concerned), but it
would be linked somewhere nice that they recognise, say
/gpfs/incoming/instrument1. When it is created, the fileset would also be an
AFM cache for its ‘home’ counterpart, which exists on a much larger (also
GPFS-driven) pool of storage… so that my metadata tags are preserved, you see.
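
If rules like the one above were generated by the web interface, the
generation step might look roughly like this. This is a minimal Python sketch
only: the function names and the `prefix` parameter (standing in for the
`10.01013` part of the example fileset name) are assumptions made for
illustration, not anything GPFS provides, and the rule text simply mirrors the
example policy.

```python
import uuid


def fileset_name(prefix: str, parent: str, ingestor: str) -> str:
    # Mirrors the naming pattern of the example fileset; the pattern is this
    # post's convention (an assumption), not a GPFS requirement.
    return f"%GPFSRDM%{prefix}%RDM%{parent}%RDM%{ingestor}"


def placement_rule(rule_name: str, pool: str, prefix: str,
                   parent: str, ingestor: str) -> str:
    # Round-trip the IDs through uuid.UUID so malformed input raises ValueError.
    parent = str(uuid.UUID(parent))
    ingestor = str(uuid.UUID(ingestor))
    # Refuse embedded single quotes rather than guess at an escaping rule.
    for value in (rule_name, pool, prefix):
        if "'" in value:
            raise ValueError(f"single quote not allowed in {value!r}")
    return "\n".join([
        f"RULE '{rule_name}'",
        f"     SET POOL '{pool}'",
        f"     FOR FILESET ('{fileset_name(prefix, parent, ingestor)}')",
        f"     WHERE SetXattr('user.rdm.parent','{parent}')",
        f"       AND SetXattr('user.rdm.ingestor','{ingestor}')",
    ])


if __name__ == "__main__":
    print(placement_rule(
        "RDMTEST", "instruments", "10.01013",
        "0ab34906-5357-4ca0-9d19-a470943db30a",
        "8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0",
    ))
```

Validating the IDs before they reach the policy text keeps user-supplied
strings out of the rule body, which matters if users can trigger fileset
creation themselves.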

This potentially user-driven activity might look a bit like this:


- User logs in to the web interface and creates a new experiment.

- Filesets (with system-generated names) are created on the ‘home’ and
  ‘ingest’ file systems and linked into the directory namespace wherever the
  user specifies.

- AFM relationships are set up and established for the ingest (cache) fileset
  to write back to the AFM home fileset (probably Independent Writer mode).

- A set of ‘default’ policies is defined and installed on the cache file
  system to tag data for that experiment (the user can’t change these).

- The user now specifies additional metadata tags they want added to their
  experiment data (some of this might be captured through additional
  mandatory fields in the web form, for instance).

- A policy for later execution by mmapplypolicy on the AFM home file system
  is created, which looks for the tags generated at ingest time and applies
  the extra user-defined tags.
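
The split between fixed default tags and user-chosen tags in the steps above
could be modelled along these lines. This is a hedged Python sketch: the
`Experiment` class, its fields, and the choice to put user tags under the
`user.rdm.` prefix are illustrative assumptions (only `user.rdm.parent` and
`user.rdm.ingestor` appear in the example policy).

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    junction: str        # where the user sees the fileset linked
    default_tags: dict   # applied at ingest time; not user-editable
    user_tags: dict = field(default_factory=dict)

    def add_user_tag(self, name: str, value: str) -> None:
        # Assumption: user tags share the 'user.rdm.' attribute namespace.
        key = name if name.startswith("user.rdm.") else "user.rdm." + name
        if key in self.default_tags:
            raise ValueError(f"{key} is a default tag and cannot be changed")
        self.user_tags[key] = value

    def all_tags(self) -> dict:
        # Default tags first, then user additions; keys never collide.
        return {**self.default_tags, **self.user_tags}


if __name__ == "__main__":
    exp = Experiment(
        junction="/gpfs/incoming/instrument1",
        default_tags={
            "user.rdm.parent": "0ab34906-5357-4ca0-9d19-a470943db30a",
            "user.rdm.ingestor": "8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0",
        },
    )
    exp.add_user_tag("project", "hypothetical-project-name")
    for key, value in exp.all_tags().items():
        print(key, value)
```

Rejecting collisions at this layer means the later mmapplypolicy step on the
home file system only ever adds attributes and never overwrites the ones set
at ingest.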

There’s much more that would go on later in the lifecycle to take care of
automated HSM tiering, data publishing, movement and cataloguing of data onto
external non-GPFS file systems, etc., but I won’t go into it here. My
GPFS-related questions are:

When I install a placement policy into the file system, does the file system
need to quiesce? My suspicion is yes, because the policy needs to be consistent
on all nodes performing I/O, but I may be wrong.

What is the specific reason for limiting a placement policy file to no more
than 1MB?
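
The size limit matters here because one rule per experiment could make the
installed file grow quickly, so the assembly step would probably need a guard.
This is a rough Python sketch that assumes "1MB" means 1,048,576 bytes (the
exact definition is not known to me) and uses hypothetical function and
constant names. The default rule is appended last so that the
fileset-specific rules are considered first.

```python
MAX_POLICY_BYTES = 1024 * 1024  # assumption: "1MB" read as 1 MiB

DEFAULT_RULE = "RULE 'DEFAULT' SET POOL 'data'"


def assemble_policy(experiment_rules, default_rule=DEFAULT_RULE,
                    limit=MAX_POLICY_BYTES):
    # Join the per-experiment rules, then the catch-all default rule.
    text = "\n\n".join(list(experiment_rules) + [default_rule]) + "\n"
    size = len(text.encode("utf-8"))
    if size > limit:
        raise ValueError(f"policy is {size} bytes, over the {limit}-byte limit")
    return text


if __name__ == "__main__":
    rules = ["RULE 'RDMTEST' SET POOL 'instruments' FOR FILESET ('example')"]
    print(assemble_policy(rules), end="")
```

With rules of roughly 300 bytes each, as in the example policy, a limit of
this size would be reached after a few thousand experiments.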

Cheers,
Luke.

Luke Raimbach
Senior HPC Data and Storage Systems Engineer
The Francis Crick Institute
Gibbs Building
215 Euston Road
London NW1 2BE

E: luke.raimb...@crick.ac.uk
W: www.crick.ac.uk


The Francis Crick Institute Limited is a registered charity in England and 
Wales no. 1140062 and a company registered in England and Wales no. 06885462, 
with its registered office at 215 Euston Road, London NW1 2BE.