Why not use Swift? The integration has been around for a while and may be a better fit.
https://hadoop.apache.org/docs/stable/hadoop-openstack/index.html

On Mon, Nov 27, 2017 at 12:55 PM, Aristeu Gil Alves Jr <aristeu...@gmail.com> wrote:
> Hi.
>
> It's my first post on the list. First of all, I have to say I'm new to
> Hadoop.
>
> We are a small lab, and we have been running CephFS for almost two
> years, loading it with large files (4GB to 4TB in size). Our cluster has
> approximately 400TB at ~75% usage, and we are planning to
> grow a lot.
>
> Until now, we have processed most of the files the "serial reading" way. But
> now we will try to implement parallel processing on these files, and we are
> looking at the Hadoop plugin as a solution for using MapReduce, or
> something like that.
>
> Does the Hadoop plugin access CephFS over the network as a normal cluster,
> or can I install Hadoop's processors on every Ceph node and process the
> data locally?
>
> Thanks and regards,
>
> --
> Aristeu
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com