Unfortunately, it seems that CephFS doesn't currently support Hadoop 2.x.
The next step will be to try Tachyon on top of Ceph.
Has anybody tried such a setup already?

-----Original Message-----
From: Lionel Bouton [mailto:lionel+c...@bouton.name] 
Sent: Tuesday, July 07, 2015 7:49 PM
To: Dmitry Meytin
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] FW: Ceph data locality

On 07/07/15 18:20, Dmitry Meytin wrote:
> Exactly because of that issue I've reduced the number of Ceph replicas to
> 2, and the number of HDFS copies is also 2 (so we're talking about 4 copies).
> I want (but haven't tried yet) to change the Ceph replication to 1 and HDFS
> back to 3.

You are stacking one distributed storage system on top of another; no wonder
you find the performance below your expectations.
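Spelling out the overhead (a back-of-the-envelope count, assuming plain
replication at both layers; the CephFS line assumes the default pool size):

    Ceph size=2 x HDFS replication=2  ->  4 physical copies of every block
    Ceph size=1 x HDFS replication=3  ->  3 physical copies, no Ceph-level redundancy
    CephFS at size=3, no HDFS layer   ->  3 physical copies, one replication layer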

You could (should?) use CephFS instead of HDFS on RBD-backed VMs, as that
stack is clearly redundant and inefficient. Note that if you instead set
size=1 for your RBD pool (which will probably still be slower than using
Hadoop with CephFS) and then lose even one disk, you will probably freeze most
or all of your VMs (as their disks are split across all physical disks of your
Ceph cluster) and almost certainly corrupt all of their filesystems.
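To see why losing a single disk with size=1 touches essentially every image,
here is a rough sketch (the cluster and image sizes below are made-up
assumptions, not figures from this thread):

    // Rough sketch: with size=1 there is exactly one copy of each RBD object.
    // An RBD image is striped into 4 MB objects spread roughly uniformly
    // across all OSDs, so an image survives losing one OSD only if none of
    // its objects happened to live there.
    public class RbdSize1Sketch {
        public static void main(String[] args) {
            int nOsds = 20;                   // assumed number of OSDs (disks)
            int imageGb = 40;                 // assumed VM disk size
            int objects = imageGb * 1024 / 4; // 4 MB objects per image
            double pIntact = Math.pow((nOsds - 1) / (double) nOsds, objects);
            System.out.printf("%d objects, P(image untouched) = %.1e%n",
                    objects, pIntact);
            // ~1e4 objects => probability effectively zero: every VM image
            // loses data, matching the warning above.
        }
    }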

See http://ceph.com/docs/master/cephfs/hadoop/
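For reference, the Hadoop side of that setup boils down to a few core-site.xml
properties. A minimal sketch of the equivalent programmatic configuration in
Java (property names as documented on the page above for the Hadoop 1.x
bindings; the monitor address and ceph.conf path are placeholders):

    import org.apache.hadoop.conf.Configuration;

    // Point Hadoop at CephFS instead of HDFS. mon1.example.com and the
    // ceph.conf path are placeholders, not values from this thread.
    public class CephFsHadoopConfig {
        public static Configuration cephConf() {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "ceph://mon1.example.com:6789/");
            conf.set("fs.ceph.impl", "org.apache.hadoop.fs.ceph.CephFileSystem");
            conf.set("ceph.conf.file", "/etc/ceph/ceph.conf");
            return conf;
        }
    }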

If this doesn't work for you, I'd suggest separating the VM system disks from
the Hadoop storage and running the Hadoop storage nodes on bare metal. The VMs
could be backed either by local disks or by RBD if needed, but in any case
they should avoid creating large IO spikes that could disturb the Hadoop
storage nodes.

Lionel