Hi JAB, many thanks for your answer.
Ok, some more background information: we are working with real-time video applications and uncompressed files. One project is one folder plus some subfolders, and the size of one project can be more than 1TB. That is the reason why I want to move the whole folder tree. Moving old material to the slower storage is not the problem; the problem is moving the files back for working with the real-time applications. Not every file will be accessed when you open a project.

The clients get access via the GPFS client (Windows) and over Samba. Another tool on the storage side scans the files to create playlists etc. During the migration the playout of the video files must not be interrupted. So I think the best way is to find a solution with mmapplypolicy, run manually or via crontab. I must check the access time and the types of files; if I do not do this, no file will ever be moved to the slower storage, because the special tool accesses the files all the time. I will try some concepts and give feedback on which solution works for me.

Matthias

From: Jonathan Buzzard <jonathan.buzz...@strath.ac.uk>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 01.11.2017 13:18
Subject: [Newsletter] Re: [gpfsug-discuss] Combine different rules
Sent by: gpfsug-discuss-boun...@spectrumscale.org

On Wed, 2017-11-01 at 11:55 +0100, matthias.kni...@rohde-schwarz.com wrote:
> Hi at all,
>
> I configured a tiered storage with two pools.
>
> pool1 >> fast >> ssd
> pool2 >> slow >> sata
>
> First I created a fileset and a placement rule to copy the files to
> the fast storage.
>
> After a time of no access the files and folders should be moved to
> the slower storage. This could be done by a migration rule. I want to
> move the whole project folder to the slower storage.

Why move the whole project? Just wait: if the files are not being accessed they will get moved in short order. You are really making it more complicated for no useful or practical gain.
This is a basic policy to move old stuff from fast to slow disks.

define(age, (DAYS(CURRENT_TIMESTAMP)-DAYS(ACCESS_TIME)))

define(weighting,
       CASE
            WHEN age > 365 THEN age * KB_ALLOCATED
            WHEN age < 30 THEN 0
            ELSE KB_ALLOCATED
       END
)

RULE 'ilm' MIGRATE FROM POOL 'fast'
     THRESHOLD(90,70) WEIGHT(weighting) TO POOL 'slow'

RULE 'new' SET POOL 'fast' LIMIT(95)
RULE 'spillover' SET POOL 'slow'

Basically it says: when the fast pool is 90% full, flush it down to 70% full, based on a weighting of size and age, so older, bigger files go first.

The last two rules are critical. They allocate new files to the fast pool until it gets 95% full, then start using the slow pool. You have to stop allocating files to the fast pool long before it gets full, otherwise you will end up with problems. Imagine there is 100KB left in the fast pool. I create a file, which succeeds because there is space, and start writing. When I get to 100KB the write fails because there is no space left in the pool, and a file can only be in one pool at a time. Generally programs will clean up, deleting the failed write, at which point there is space left again and so the cycle goes on.

You might want to force some file types onto the slower disk. For example ISO images don't really benefit from ever being on the fast disk.

/* force ISO images onto nearline storage */
RULE 'iso' SET POOL 'slow' WHERE LOWER(NAME) LIKE '%.iso'

You also might want to punish people storing inappropriate files on your server, so:

/* force MP3's and the like onto nearline storage forever */
RULE 'mp3' SET POOL 'slow'
     WHERE LOWER(NAME) LIKE '%.mp3'
        OR LOWER(NAME) LIKE '%.m4a'
        OR LOWER(NAME) LIKE '%.wma'

Another rule I used was to migrate files over a certain size to the slow pool too.

> If a file in a project folder on the slower storage will be accessed
> this whole folder should be moved back to the faster storage.

Waste of time.
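As an aside on the THRESHOLD(90,70) rule above: a threshold rule only flushes when the policy engine actually runs, so to get the daytime protection discussed later you normally wire the policy to the file system's low-space event with a callback. The sketch below follows the pattern in IBM's documentation, but the callback name is invented here and the script path and event spellings should be verified against your Spectrum Scale release before use:

```
# Trigger the placement/migration policy automatically when a pool
# crosses its threshold (untested sketch; check paths for your release).
mmaddcallback LOWSPACE_FLUSH \
    --command /usr/lpp/mmfs/bin/mmstartpolicy \
    --event lowDiskSpace,noDiskSpace \
    --parms "%eventName %fsName"
```

With something like this in place, the nightly cron run becomes housekeeping rather than the only line of defence against a full fast pool.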
In my experience the slow disks, when not actually taking new files from a flush of the fast pool, will be doing jack all: under 10 IOPS. That's because if you have everything sized correctly, with the right rules, people rarely go back to old files. As such the penalty for being on the slower disks is almost non-existent, because there is loads of spare IO capacity on those disks.

Secondly, by the time you have spotted that the files need moving, the chances are your users have finished with them, so moving them gains nothing.

Thirdly, if the users start working with those files, any change to a file will result in a new file being written, which will automatically go to the fast disks. It's the standard dance when you save a file: create a new temporary file, write the contents, then do some renaming before deleting the old one.

If you are insistent then something like the following would be a start, but moving a whole project would be a *lot* more complicated. I disabled this rule because it was a waste of time. I suggest running a similar rule that prints the files out so you can see how pointless it is.

/* migrate recently accessed files back to the fast disks */
RULE 'restore' MIGRATE FROM POOL 'slow'
     WEIGHT(KB_ALLOCATED) TO POOL 'fast'
     WHERE age < 1

Depending on the number of "projects" you anticipate, you could allocate a project to a fileset and then move whole filesets about, but I really think the idea is one of those that looks sensible at a high level but in practice is not sensible.

> The rules must not run automatically. It is ok when this could be
> done by a cronjob over night.

I would argue strongly, very strongly, that while you might want to flush the fast pool down to a certain amount free every night, you must have it set up so that should the pool become full during the day an automatic flush is triggered. Failure to do so is guaranteed to bite you in the backside some time down the line.

> I am a beginner in writing rules.
> My idea is to write rules which
> listed files by date and by access and put the output into a file.
> After that a bash script can change the attributes of these files or
> rather folders.

Eh, you apply the policy and it does the work! More reading required on the subject, I think. A bash script would be horribly slow. IBM have put a lot of work into making the policy engine really, really fast. Messing about changing thousands if not millions of files with a bash script will be much, much slower and is a recipe for disaster.

Your users will put all sorts of random crap into file and directory names: backticks, asterisks, question marks, newlines, UTF-8 characters etc. that will invariably break your bash script unless carefully escaped, and there is no way for you to prevent this. It's the reason find/xargs have the -print0/-0 options; otherwise stuff will just mysteriously break on you. It's really better to just sidestep the whole issue and not process the files with scripts.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
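[Editor's note] The filename-escaping point above is easy to demonstrate. The snippet below creates three throwaway files (names invented for the example) in a temporary directory and counts them two ways: naive word-splitting over `ls` output miscounts as soon as a name contains a space or newline, while a NUL-delimited `find -print0` pipeline counts every file exactly once:

```shell
#!/bin/bash
# Throwaway demo: why scripts must use NUL-delimited filename handling.
tmpdir=$(mktemp -d)
touch "$tmpdir/normal.mp3" "$tmpdir/with space.mp3"
nl=$(printf '\nX'); nl=${nl%X}          # a literal newline character
touch "$tmpdir/line${nl}break.mp3"      # filename containing a newline

# Naive approach: word-splitting mangles the names, so this loop does
# NOT see 3 files (the exact miscount depends on the names involved).
count_naive=0
for f in $(ls "$tmpdir"); do count_naive=$((count_naive + 1)); done

# Robust approach: count the NUL terminators that find emits, one per
# file, which copes with any legal filename.
count_safe=$(find "$tmpdir" -type f -print0 | tr -dc '\0' | wc -c)

echo "naive: $count_naive, safe: $count_safe"
rm -rf "$tmpdir"
```

The same failure mode is what bites ad-hoc bash post-processing of policy output, which is why letting the policy engine do the work is the safer route.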
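[Editor's note] On the fileset suggestion in the thread: if each project were created as its own fileset, parking or recalling a whole project could be expressed in a single policy rule. The sketch below is untested and the fileset name is invented for illustration:

```
/* Recall one project, modelled as the fileset 'project42' (name invented), */
/* from the slow pool back to the fast pool in one pass.                    */
RULE 'recallProject' MIGRATE FROM POOL 'slow'
     TO POOL 'fast'
     FOR FILESET ('project42')
```

Running mmapplypolicy with the -I test option first would show which files such a rule selects without moving any data, which fits the "try some concepts" approach described at the top of the thread.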