Re: [lustre-discuss] Data migration software

2023-03-22 Thread Stephane Thiell via lustre-discuss
Hi Anna,

We’re about to deploy Lustre/HSM with Phobos for a new large research data 
archival system at Stanford (200PB).

https://github.com/phobos-storage

Phobos is open source and a Lustre copytool is available. Archiving policies 
can be set up via Robinhood like with other HSMs. Robinhood is also open source 
and supports project IDs if you take the patches from GerritHub (like this one: 
https://review.gerrithub.io/c/cea-hpc/robinhood/+/541104 but more are needed, I 
can give you the list if needed). Data restore concurrency should be well 
handled with Lustre/HSM.
A Lustre userspace coordinator named “coordinatool” is required for using 
Phobos in multi-server mode, but it is also freely available on GitHub. We plan 
to have a dedicated Lustre client for the coordinatool.
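
To illustrate the kind of rule an archiving policy expresses (files idle for 
a given period, or a user over quota), here is a minimal Python sketch. It is 
not Robinhood policy syntax and not Phobos code: FileEntry, 
select_for_archive, archive_command and the thresholds are hypothetical names 
chosen for illustration. The only real interface referenced is the standard 
"lfs hsm_archive" command, and the sketch only builds its argument list 
without running it.

```python
from dataclasses import dataclass
from typing import List

DAY = 86400  # seconds per day


@dataclass
class FileEntry:
    """Hypothetical record for one file, as a policy engine might track it."""
    path: str
    last_access: float  # epoch seconds of the last real access
    size: int           # bytes
    archived: bool      # True if a copy already exists in the archive


def select_for_archive(entries: List[FileEntry], now: float,
                       max_idle_days: int,
                       quota_used: int, quota_limit: int) -> List[str]:
    """Pick files to archive: idle ones first, then oldest files while over quota."""
    cutoff = now - max_idle_days * DAY
    # Only files without an archive copy are candidates; oldest access first.
    pending = sorted((e for e in entries if not e.archived),
                     key=lambda e: e.last_access)
    excess = max(0, quota_used - quota_limit)  # bytes the user is over quota
    selected = []
    for e in pending:
        if e.last_access < cutoff or excess > 0:
            selected.append(e.path)
            excess -= e.size  # assumes the file is released after archiving
    return selected


def archive_command(path: str) -> List[str]:
    """Argument list for the Lustre CLI archive request (built, not executed)."""
    return ["lfs", "hsm_archive", path]
```

In a real deployment the policy engine would evaluate such rules against its 
own database and the copytool would move the data; the sketch only shows the 
selection logic.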

Hope that helps.

Stéphane


> On Mar 22, 2023, at 7:47 AM, Anna Fuchs via lustre-discuss wrote:
> 
> Dear all,
> 
> if you have a large Lustre storage system, a large tape archive, and maybe 
> also some in-house cloud storage, which software do you use for more or 
> less automatic data migration that scales well?
> Ideally it supports Lustre project quotas and, more importantly, a 
> synchronized catalog for locating the data.
> E.g. if the data to be read is on tape, it is transparently moved to the 
> main, faster storage (like Lustre), without the user needing to know where 
> the data was initially stored.
> If another user wants to access the same shared file, the software would 
> know it is already "buffered" on Lustre and would not read it from tape 
> again.
> Or, if the data has not been touched (really processed, not just touch) for 
> a certain period of time, or the user runs out of Lustre quota but has free 
> archive space, it would be automatically archived to tape, and so on.
> Ideally, the software should not cost a billion for a one-year license, or 
> should even be open source :)
> 
> Thank you
> Anna Fuchs
> --
> Universität Hamburg
> https://wr.informatik.uni-hamburg.de/people/anna_fuchs
> 
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



[lustre-discuss] Data migration software

2023-03-22 Thread Anna Fuchs via lustre-discuss

Dear all,

if you have a large Lustre storage system, a large tape archive, and maybe 
also some in-house cloud storage, which software do you use for more or 
less automatic data migration that scales well?
Ideally it supports Lustre project quotas and, more importantly, a 
synchronized catalog for locating the data.
E.g. if the data to be read is on tape, it is transparently moved to the 
main, faster storage (like Lustre), without the user needing to know where 
the data was initially stored.
If another user wants to access the same shared file, the software would 
know it is already "buffered" on Lustre and would not read it from tape 
again.
Or, if the data has not been touched (really processed, not just touch) 
for a certain period of time, or the user runs out of Lustre quota but 
has free archive space, it would be automatically archived to tape, and 
so on.
Ideally, the software should not cost a billion for a one-year license, or 
should even be open source :)


Thank you
Anna Fuchs
--
Universität Hamburg
https://wr.informatik.uni-hamburg.de/people/anna_fuchs


