On 16.03.2017 08:26, Youssef Eldakar wrote:
Thanks for the reply, Anthony, and I am sorry my question did not give 
sufficient background.

This is the cluster behind archive.bibalex.org. Storage nodes keep archived 
webpages as multi-member GZIP files on the disks, which are formatted using XFS 
as standalone file systems. The access system consults an index that says where 
a URL is stored, which is then fetched over HTTP from the individual storage 
node that has the URL somewhere on one of the disks. So far, we have pretty 
much been managing the storage using homegrown scripts to have each GZIP file 
stored on 2 separate nodes. This obviously has been requiring a good deal of 
manual work and as such has not been very effective.

Given that description, do you feel Ceph could be an appropriate choice?

If you adapt your scripts to something like...

"Storage nodes archives webpages as gzip files, hashes the url to use as an object name and saves the gzipfiles as an object in ceph via the S3 interface. The access system gets a request for an url, it hashes an url into a object name and fetch the gzip (object) using regular S3 get syntax"

Ceph would deal with replication; you would only put objects in and fetch them out.

You could, if you need it, store the list of URLs and hashes, simply as an inventory of what you have stored. This is just an example, though. You could also use CephFS, mounted on the nodes, and serve files as you do today.

Ceph is just a storage tool, and it could work very nicely for your needs. But accessing the files on the OSDs directly will only bring pain.


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
