Hi All,
        I was reading more on Hadoop over Ceph. I heard from Noah that
tuning of Hadoop on Ceph is ongoing. I am just curious to know whether
there is any reason to keep the default object size at 64MB. Is it
because it becomes difficult to encode getBlockLocations if blocks are
divided into objects, and to choose the best location for tasks if no
node in the system has a complete block?
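
For concreteness, this is the kind of client-side call I have in mind. The
FileSystem API names are standard Hadoop; the scenario where one reported
block is actually striped over several smaller RADOS objects is only my
assumption about what the CephFS plugin would have to handle:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationProbe {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path(args[0]);
            FileStatus stat = fs.getFileStatus(path);
            // One BlockLocation per block; the MapReduce scheduler uses
            // getHosts() to place tasks near the data. If one block were
            // striped over several smaller objects on different OSDs,
            // the plugin would have to decide which hosts to report here.
            for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
                System.out.printf("offset=%d len=%d hosts=%s%n",
                        loc.getOffset(), loc.getLength(),
                        String.join(",", loc.getHosts()));
            }
        }
    }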

I am wondering if someone has benchmark results for various object sizes.
If you have them, it would be helpful if you could share them.

I see that Ceph doesn't place objects considering the client's location
or the distance between the client and the OSDs where the data is stored
(data locality), while data locality is the key idea behind HDFS block
placement and retrieval for maximum throughput. So how does Ceph plan to
perform better than HDFS, given that Ceph relies on pseudo-random
placement using hashing, unlike HDFS block placement? Can someone also
point me to performance results comparing Ceph's random placement with
HDFS's locality-aware placement?
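
To make the question concrete, here is a toy sketch of what I mean by
hash-based placement. It is not CRUSH (no PGs, weights, or failure
domains), just an illustration that the chosen OSDs depend only on the
object name and never on where the client runs, whereas an HDFS-style
policy would try to keep a replica on the writer's own node:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class ToyHashPlacement {
        // Rank OSDs by a hash of (object name, OSD id) and take the top
        // 'replicas' entries: deterministic, client-independent placement.
        static List<String> place(String objectName, List<String> osds, int replicas) {
            List<String> ranked = new ArrayList<>(osds);
            ranked.sort(Comparator.comparingInt(
                    (String osd) -> -(objectName + "/" + osd).hashCode()));
            return ranked.subList(0, Math.min(replicas, ranked.size()));
        }

        public static void main(String[] args) {
            List<String> osds = Arrays.asList("osd.0", "osd.1", "osd.2", "osd.3", "osd.4");
            // The same object always lands on the same OSDs, no matter
            // which client asks for it.
            System.out.println(place("10000000abc.00000001", osds, 3));
        }
    }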

Also, Sage wrote about a way to specify a node as the primary for
Hadoop-like environments
(http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/1548). Is
this done through the primary affinity configuration?
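
For reference, if I am reading the docs right (please correct me), I mean
something along the lines of:

    # assuming the primary-affinity feature described in the Firefly docs;
    # lower osd.7's chance of being chosen as the primary for its PGs
    ceph osd primary-affinity osd.7 0.25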

Thanks,
Johnu
