Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-06 Thread Zoltan Arnold Nagy
Hi,

Please keep the list on CC as I guess others might be interested as well, if 
you don’t mind.

For VMs one can use RBD-backed block devices, and for bare-metal nodes, where 
there is no abstraction layer, one can use krbd (notice the k there: it stands 
for “kernel”). krbd is the in-kernel driver: on bare metal there is no other 
layer to provide the block device, unlike in the VM case, where qemu can use 
librbd to implement the functionality.
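A dry-run sketch of the bare-metal (krbd) path, assuming the standard `rbd` CLI. It only builds the command lines and does not run them. The pool name `hadoop`, the node name `dn01` and the image naming scheme are hypothetical. On a VM, the equivalent step is attaching the volume through the hypervisor (qemu/librbd) rather than running `rbd map` inside the guest.

```python
import shlex
from typing import List


def krbd_plan(pool: str, node: str, disks: int, size_mb: int) -> List[List[str]]:
    """Build (but do not execute) the rbd commands for one bare-metal node.

    Names are hypothetical: images are called <pool>/<node>-diskNN.
    """
    commands: List[List[str]] = []
    for i in range(1, disks + 1):
        image = f"{pool}/{node}-disk{i:02d}"
        # Create the RBD image; --size is given in megabytes.
        commands.append(["rbd", "create", "--size", str(size_mb), image])
        # Map it through the in-kernel driver (krbd); udev usually adds a
        # /dev/rbd/<pool>/<image> symlink to the new /dev/rbdN device.
        commands.append(["rbd", "map", image])
    return commands


if __name__ == "__main__":
    # Two 1 TB (1024 * 1024 MB) images for a hypothetical node "dn01".
    for cmd in krbd_plan("hadoop", "dn01", disks=2, size_mb=1024 * 1024):
        print(shlex.join(cmd))
```

Each mapped device would then be formatted and mounted like a local disk before HDFS uses it.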

If it’s a VM environment anyway, then just attach the volumes from the 
underlying ceph cluster and you are good to go. Attach multiple volumes to each 
node for better performance: if you want 10TB per node, for example, I’d mount 
10x1TB.
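To illustrate the 10x1TB suggestion: with ten volumes attached and mounted on a node, each one becomes a separate entry in the DataNode's `dfs.datanode.data.dir` list, and the DataNode spreads new blocks across them. A minimal sketch that builds that `hdfs-site.xml` property, assuming the volumes are mounted at the hypothetical paths `/data/disk01` ... `/data/disk10`:

```python
from xml.sax.saxutils import escape


def data_dirs(mount_root: str, volumes: int) -> str:
    """Comma-separated DataNode directories, one per attached volume.

    Assumes a hypothetical layout <mount_root>/diskNN/dfs/dn.
    """
    return ",".join(
        f"{mount_root}/disk{i:02d}/dfs/dn" for i in range(1, volumes + 1)
    )


def hdfs_site_property(name: str, value: str) -> str:
    """Render a single <property> element for hdfs-site.xml."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # 10 x 1 TB volumes per node instead of a single 10 TB volume.
    print(hdfs_site_property("dfs.datanode.data.dir", data_dirs("/data", 10)))
```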

I’d keep the replication in HDFS, as that gives it “fake” data locality and so 
better processing in MR/Spark workloads. Just make sure you map the nodes to 
different physical zones in your ceph cluster. The way I’d do it is to split the 
nodes across the racks, say 10 VMs on rack 1, 10 VMs on rack 2 and 10 VMs on 
rack 3, and set up the CRUSH rules to keep the data of each volume within the 
rack of the VM it is attached to.

Basically, this maps your underlying failure domains from ceph to HDFS.
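On the Hadoop side, a rack layout like this is usually expressed with a topology script (configured through `net.topology.script.file.name` in `core-site.xml`): Hadoop runs it with one or more host names or IP addresses as arguments and reads back one rack path per argument. A minimal sketch, assuming hypothetical VM host names `hadoop-vm01` ... `hadoop-vm30`, ten per rack. A real script would also have to handle IP addresses, which this one simply sends to `/default-rack`. The matching CRUSH rules on the ceph side are not shown.

```python
import sys

VMS_PER_RACK = 10          # 10 VMs on rack 1, 10 on rack 2, 10 on rack 3
HOST_PREFIX = "hadoop-vm"  # hypothetical naming scheme


def rack_for(host: str) -> str:
    """Map a host name such as 'hadoop-vm07' to an HDFS rack path."""
    suffix = host[len(HOST_PREFIX):]
    if host.startswith(HOST_PREFIX) and suffix.isdigit() and int(suffix) >= 1:
        vm_number = int(suffix)  # 1-based
        return f"/rack{(vm_number - 1) // VMS_PER_RACK + 1}"
    # Anything unrecognised (including raw IP addresses) goes to the default rack.
    return "/default-rack"


if __name__ == "__main__":
    # One rack path per host argument, separated by spaces.
    print(" ".join(rack_for(host) for host in sys.argv[1:]))
```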

> On 06 Feb 2016, at 14:07, Jose M  wrote:
> 
> Hi Zoltan! Thanks for the tips :)
> 
> I suppose krbd is for bare-metal nodes? I ask because you say "for VMs" 
> for both of them ;)
> 
> A couple of questions, if you don't mind. This cloud is based on Apache 
> CloudStack, so I understand I should go the VM way, right? And why is it 
> better not to use krbd if they are VMs?
> 
> I understand I should disable replication in one of them. Would you 
> recommend doing it in Ceph or in Hadoop? Ceph seems to be more reliable, 
> but I wonder whether any Hadoop feature would misbehave if replication is 
> disabled "from the Hadoop side".
> 
> Thanks!
> From: Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com>
> Sent: Friday, 05 February 2016 02:21 p.m.
> To: Jose M
> Subject: Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)
>  
> Hi,
> 
> Are these bare metal nodes or VMs?
> 
> For VMs I suggest you just attach rbd data disks then let hdfs do its magic. 
> Just make sure you’re not replicating 9x (3x on ceph + 3x on hadoop).
> If it’s VMs, you can just do the same with krbd, just make sure to run a 
> recent enough kernel  :-)
> 
> Basically putting HDFS on RBDs.

Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-06 Thread Zoltan Arnold Nagy
Hi,

Are these bare metal nodes or VMs?

For VMs I suggest you just attach rbd data disks then let hdfs do its magic. 
Just make sure you’re not replicating 9x (3x on ceph + 3x on hadoop).
If it’s VMs, you can just do the same with krbd, just make sure to run a recent 
enough kernel  :-)

Basically putting HDFS on RBDs.
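The 9x figure is simply the product of the two replication factors: every HDFS replica lands on an RBD volume, and ceph stores each RBD object `size` times in its pool. A small sketch of that arithmetic, where the pool `size` and `dfs.replication` values are just example inputs:

```python
def raw_copies(ceph_pool_size: int, hdfs_replication: int) -> int:
    """Physical copies of each byte when HDFS runs on replicated RBD volumes."""
    # HDFS writes `hdfs_replication` block replicas, each onto an RBD volume,
    # and ceph keeps `ceph_pool_size` copies of every object behind that volume.
    return ceph_pool_size * hdfs_replication


if __name__ == "__main__":
    for ceph_size, dfs_rep in [(3, 3), (3, 1), (1, 3)]:
        copies = raw_copies(ceph_size, dfs_rep)
        print(f"pool size={ceph_size}, dfs.replication={dfs_rep}: "
              f"1 TB of data -> {copies} TB raw")
```

Reducing either factor to 1 brings the total back to 3x; which side should keep the replication is a separate design question.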

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-05 Thread Jose M
Hi Zoltan, thanks for the answer.


Because replacing hdfs:// with ceph:// and using CephFS doesn't work out of the 
box for all Hadoop components (at least in my tests): for example, I had issues 
with HBase, then with YARN, Hue, etc. (I'm using the Cloudera distribution, but 
I also tried with separate components). And besides the need to add JARs and 
bindings to each node to get them working, there are a lot of places (XML files, 
configuration) where the "hdfs for ceph" replacement needs to be made.


Given these issues, I thought that mounting ceph as a local directory and then 
using these "virtual dirs" as the Hadoop DFS dirs would be easier and would work 
better (fewer configuration problems, and changing only the DFS dirs would make 
all components work without any further changes).


Of course I may be totally wrong, and this is a core change, which is why I 
thought I should ask here first :)


Thanks!


PS: If you are asking why I'm trying to use ceph here, it's because we were 
given an infrastructure with access to a big ceph storage cluster that's working 
really, really well (but as an object store; it wasn't used with Hadoop until 
now).



From: Zoltan Arnold Nagy 
Sent: Thursday, 04 February 2016 06:07 p.m.
To: John Spray
Cc: Jose M; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

Might be totally wrong here, but it's not layering them; it's replacing hdfs:// 
URLs with ceph:// URLs, so that all the mapreduce/spark/hbase/whatever on top can 
use CephFS directly, which is not a bad thing to do (if it works) :-)



Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-04 Thread Zoltan Arnold Nagy
Might be totally wrong here, but it’s not layering them; it’s replacing hdfs:// 
URLs with ceph:// URLs, so that all the mapreduce/spark/hbase/whatever on top can 
use CephFS directly, which is not a bad thing to do (if it works) :-)
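For comparison, the hdfs:// to ceph:// replacement is mostly a `core-site.xml` change plus the plugin JARs on every node. A sketch that generates such a file, assuming the property names as they appear on the CephFS Hadoop plugin page linked in this thread (`fs.default.name`, `fs.ceph.impl`, `ceph.conf.file`); check that page for the exact set a given plugin version needs. The monitor host `mon1` is hypothetical.

```python
import xml.etree.ElementTree as ET


def ceph_core_site(mon_host: str, mon_port: int = 6789,
                   conf_file: str = "/etc/ceph/ceph.conf") -> str:
    """Build a minimal core-site.xml pointing Hadoop at CephFS."""
    properties = {
        # Make ceph:// (not hdfs://) the default file system URI.
        "fs.default.name": f"ceph://{mon_host}:{mon_port}/",
        # Class implementing the ceph:// scheme, shipped in the plugin JAR.
        "fs.ceph.impl": "org.apache.hadoop.fs.ceph.CephFileSystem",
        # Cluster configuration read by the plugin's libcephfs client.
        "ceph.conf.file": conf_file,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(ceph_core_site("mon1"))
```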



Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-03 Thread Noah Watkins
Hi Jose,

I believe what you are referring to is using Hadoop over Ceph via the
VFS implementation of the Ceph client, versus the user-space libcephfs
client library. The current Hadoop plugin for Ceph uses the client
library. You could run Hadoop over Ceph using a local Ceph mount
point, but it would take some configuration (I believe this is how
Gluster Hadoop works). To take advantage of data locality, you'd also
want to invoke an ioctl (if locality is currently exposed), or else
integrate with libcephfs for that functionality.

-Noah
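To make the locality point concrete: whichever interface is used (an ioctl on the mount or libcephfs), the underlying question is which RADOS object holds a given byte of a file, and then which OSDs hold that object. A sketch of the first step, assuming the usual CephFS object naming (`<inode in hex>.<object number as 8 hex digits>`) and the common layout of 4 MiB stripe units, stripe count 1 and 4 MiB objects; the inode number is made up. The second step is CRUSH placement, which can be inspected with `ceph osd map <data pool> <object name>`.

```python
def cephfs_object_for_offset(inode: int, offset: int,
                             stripe_unit: int = 4 << 20,
                             stripe_count: int = 1,
                             object_size: int = 4 << 20) -> str:
    """Name of the RADOS object that stores byte `offset` of a CephFS file."""
    block_no = offset // stripe_unit            # stripe unit containing the byte
    stripe_no = block_no // stripe_count        # row of units across objects
    stripe_pos = block_no % stripe_count        # object within that row
    stripes_per_object = object_size // stripe_unit
    object_set = stripe_no // stripes_per_object
    object_no = object_set * stripe_count + stripe_pos
    return f"{inode:x}.{object_no:08x}"


if __name__ == "__main__":
    ino = 0x10000000000  # hypothetical inode number
    for off in (0, 4 << 20, 9 << 20):
        print(off, cephfs_object_for_offset(ino, off))
```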



Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-02 Thread John Spray
On Tue, Feb 2, 2016 at 3:42 PM, Jose M  wrote:
> Hi,
>
>
> One simple question: the ceph docs say that to use Ceph as an HDFS
> replacement, I can use the CephFS Hadoop plugin
> (http://docs.ceph.com/docs/master/cephfs/hadoop/).
>
>
> What I would like to know is whether, instead of using the plugin, I can mount
> ceph via fstab and then point the HDFS dirs (namenode, datanode, etc.) to these
> mounted "ceph" dirs instead of native local dirs.
>
> I understand that this may involve more configuration steps (configuring
> fstab on each node), but will it work? Is there any problem with this type
> of configuration?

Without being a big HDFS expert, it seems like you would be
essentially putting one distributed filesystem on top of another
distributed filesystem.  I don't know if you're going to find anything
that breaks as such, but it's probably not a good idea.

John



[ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-02 Thread Jose M
Hi,


One simple question: the ceph docs say that to use Ceph as an HDFS 
replacement, I can use the CephFS Hadoop plugin 
(http://docs.ceph.com/docs/master/cephfs/hadoop/).


What I would like to know is whether, instead of using the plugin, I can mount 
ceph via fstab and then point the HDFS dirs (namenode, datanode, etc.) to these 
mounted "ceph" dirs instead of native local dirs.

I understand that this may involve more configuration steps (configuring fstab 
on each node), but will it work? Is there any problem with this type of 
configuration?
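To make the question concrete, this is roughly what the proposed setup would look like. A sketch, assuming the kernel CephFS client and its fstab format (monitor address and port, `:/`, mount point, type `ceph`, `name=`/`secretfile=` options); the monitor addresses, user, secret file and paths are hypothetical. Because every node mounts the same shared file system, each node would need its own subdirectory for its HDFS dirs; otherwise several DataNodes would write into the same storage directory.

```python
from typing import Dict, List


def cephfs_fstab_line(mons: List[str], mountpoint: str,
                      user: str, secretfile: str) -> str:
    """One /etc/fstab entry for a kernel CephFS mount of the file system root."""
    source = ",".join(f"{mon}:6789" for mon in mons) + ":/"
    # _netdev delays the mount until the network is up.
    options = f"name={user},secretfile={secretfile},noatime,_netdev"
    return f"{source} {mountpoint} ceph {options} 0 2"


def hdfs_dirs(mountpoint: str, hostname: str) -> Dict[str, str]:
    """Per-node HDFS directories on the shared mount (hypothetical layout)."""
    base = f"{mountpoint}/{hostname}"
    return {
        "dfs.namenode.name.dir": f"{base}/nn",
        "dfs.datanode.data.dir": f"{base}/dn",
    }


if __name__ == "__main__":
    print(cephfs_fstab_line(["10.0.0.1", "10.0.0.2"], "/mnt/ceph",
                            "admin", "/etc/ceph/admin.secret"))
    print(hdfs_dirs("/mnt/ceph", "node01"))
```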


Thanks in advance,



Re: [ceph-users] Ceph and hadoop

2014-10-27 Thread John Spray
Hi Matan,

Hadoop on CephFS is part of the regular test suites run on CephFS, so
it should work at least to some extent.   Any testing/feedback on this
will be appreciated.

As far as I know, the article you link is the best available documentation.

Cheers,
John




[ceph-users] Ceph and hadoop

2014-10-24 Thread Matan Safriel
Hi,

Given HDFS is far from ideal for small files, I am examining the
possibility of using Hadoop on top of Ceph. I found mainly one online resource
about it: https://ceph.com/docs/v0.79/cephfs/hadoop/. I am wondering whether
there is any reference implementation or blog post you are aware of about
Hadoop on top of Ceph. Likewise, I'd be happy to have any pointers about why
_not_ to attempt just that.

Thanks!