Jonathan brings up a good point: you’ll only get one shot at this if 
you’re using the file system as your record of who owns what.

You might want to use the policy engine to record the existing file names and 
ownership (and then provide updates using the same policy engine for the things 
that changed after the last time you ran it). At that point, you’ve got the 
list of who should own what from before you started.

You could even do some things to see how complex your problem is, like "how 
many directories have files owned by more than one UID?"

With respect to that, it is surprising how easy the sqlite C API is to use 
(though I would still recommend Perl or Python), and equally surprising how 
*bad* the JOIN performance is. If you go with sqlite, denormalize *everything* 
as it’s collected. If that is too dirty for you, then just use MariaDB or 
something else.
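
As a rough sketch of what "denormalized" could look like, assuming Python's 
built-in sqlite3 module: one row per file with the directory repeated on every 
row, so the "more than one UID" question is a single GROUP BY with no JOIN. The 
table name, column names, and sample rows are made up for illustration; they 
are not policy engine output.

```python
import sqlite3

def dirs_with_mixed_owners(conn):
    """Return (dir, number_of_distinct_uids) for directories with >1 owner."""
    return conn.execute(
        "SELECT dir, COUNT(DISTINCT uid) FROM files "
        "GROUP BY dir HAVING COUNT(DISTINCT uid) > 1 ORDER BY dir"
    ).fetchall()

conn = sqlite3.connect(":memory:")
# Denormalized on purpose: the directory is stored with every file row.
conn.execute("CREATE TABLE files (dir TEXT, name TEXT, uid INTEGER, gid INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("/home/alice", "a.txt", 1001, 100),
        ("/home/alice", "b.txt", 1001, 100),
        ("/proj/shared", "x.dat", 1001, 200),
        ("/proj/shared", "y.dat", 1002, 200),
    ],
)
conn.commit()

mixed = dirs_with_mixed_owners(conn)
print(mixed)
```

With the sample rows above, only /proj/shared has files from two UIDs.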


-- 
Stephen



> On Jun 9, 2020, at 7:20 AM, Jonathan Buzzard <jonathan.buzz...@strath.ac.uk> 
> wrote:
> 
> On 08/06/2020 18:44, Lohit Valleru wrote:
>> Hello Everyone,
>> We are planning to migrate from LDAP to AD, and one of the best solutions was 
>> to change the uidNumber and gidNumber to what SSSD or Centrify would resolve.
>> May I know, if anyone has come across a tool/tools that can change the 
>> uidNumbers and gidNumbers of billions of files efficiently and in a reliable 
>> manner?
> 
> Not to my knowledge.
> 
>> We could spend some time to write a custom script, but wanted to know if a 
>> tool already exists.
> 
> If you can be sure that all files under a specific directory belong to a 
> specific user and you have no ACL's then a whole bunch of "chown -R" would be 
> reasonable. That is, you have a lot of user home directories, for example.
> 
> What I do in these scenarios is use a small sqlite database, say in this 
> scenario which has the directory that I want to chown on, the target UID and 
> GID and a status field. Initially I set the status field to -1 which 
> indicates they have not been processed. The script sets the status field to 
> -2 when it starts processing an entry and on completion sets the status field 
> to the exit code of the command you are running. This way when the script is 
> finished you can see any directory hierarchies that had a problem and if it 
> dies early you can see where it got up to (that -2).
> 
> You can also do things like set all non-zero status codes back to -1 and run 
> again with a simple SQL update on the database from the sqlite CLI.
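
A minimal sketch of the status-field scheme described above, assuming Python's 
sqlite3 and subprocess modules. The table layout, the names (jobs, 
process_queue, requeue_failures), and the sample directories are hypothetical. 
The demo at the bottom uses a stand-in runner instead of a real chown, so 
nothing is changed on disk.

```python
import sqlite3
import subprocess

PENDING, RUNNING = -1, -2  # status values as described in the quoted text

def process_queue(conn, run):
    """Process pending rows one at a time, storing each exit code as status."""
    while True:
        row = conn.execute(
            "SELECT id, dir, uid, gid FROM jobs "
            "WHERE status = ? ORDER BY id LIMIT 1",
            (PENDING,),
        ).fetchone()
        if row is None:
            break
        job_id, directory, uid, gid = row
        # Mark as started first, so a crash leaves a visible -2 behind.
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (RUNNING, job_id))
        conn.commit()
        rc = run(directory, uid, gid)
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (rc, job_id))
        conn.commit()

def chown_recursive(directory, uid, gid):
    """Run 'chown -R uid:gid directory' and return a non-negative exit code."""
    rc = subprocess.run(["chown", "-R", f"{uid}:{gid}", directory]).returncode
    # subprocess reports death by signal N as -N; remap to 128+N so it can
    # never collide with the -1/-2 markers.
    return rc if rc >= 0 else 128 - rc

def requeue_failures(conn):
    """Set every non-zero status back to pending so it is retried."""
    conn.execute("UPDATE jobs SET status = ? WHERE status <> 0", (PENDING,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, dir TEXT, "
    "uid INTEGER, gid INTEGER, status INTEGER DEFAULT -1)"
)
conn.executemany(
    "INSERT INTO jobs (dir, uid, gid) VALUES (?, ?, ?)",
    [("/home/alice", 50001, 50000), ("/home/bob", 50002, 50000)],
)
conn.commit()

# Dry run: a stand-in runner that pretends the second directory failed.
process_queue(conn, lambda d, u, g: 1 if d.endswith("bob") else 0)
```

For a real run, chown_recursive would be passed as the runner in place of the 
lambda; requeue_failures is the scripted equivalent of the manual SQL update.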
> 
> If you don't need to modify ACL's but have mixed ownership under directory 
> hierarchies then a script is reasonable but not a shell script. The overhead 
> of execing chown billions of times on individual files will be astronomical. 
> You need something like Perl or Python and make use of the builtin chown 
> facilities of the language to avoid all those exec's. That said I suspect you 
> will see a significant speed up from using C.
> 
> If you have ACL's to contend with then I would definitely spend some time and 
> write some C code using the GPFS library. It will be a *LOT* faster than any 
> script ever will be. Dealing with mmgetacl and mmputacl in any script is 
> horrendous and you will have billions of exec's of each command.
> 
> As I understand it GPFS stores each ACL once and each file then points to the 
> ACL. Theoretically it would be possible to just modify the stored ACL's for a 
> very speedy update of all the ACL's on the files/directories. However I would 
> imagine you need to engage IBM and bend over while they empty your wallet for 
> that option :-)
> 
> The biggest issue to take care of IMHO is do any of the input UID/GID numbers 
> exist in the output set??? If so life just got a lot harder as you don't get 
> a second chance to run the script/program if there is a problem.
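
The clash check itself is cheap to do up front. A small sketch in Python, 
assuming the planned changes are available as a dictionary of old ID to new 
ID (the function name find_clashes and the sample numbers are made up):

```python
def find_clashes(id_map):
    """Return new IDs that are also old IDs still waiting to be remapped."""
    old_ids = set(id_map)
    return sorted(
        new for old, new in id_map.items() if new != old and new in old_ids
    )

# Hypothetical mapping: 1002 moves to 1001, which is itself an old ID.
uid_map = {1001: 50001, 1002: 1001, 1003: 50003}
clashes = find_clashes(uid_map)
print(clashes)
```

Any ID this returns means a half-finished run cannot tell already-converted 
files from unconverted ones by ownership alone.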
> 
> In this case I would be very tempted to remove such clashes prior to the main 
> change. You might be able to do that incrementally before the main switch and 
> update your LDAP to match.
> 
> Finally be aware that if you are using TSM for backup you will probably need 
> to back every file up again after the change of ownership as far as I am 
> aware.
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard                         Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
