Re: Re: Spark standalone/Mesos on top of Ceph
Hi Sun,

The issue with Ceph as the underlying file system for Spark is that you lose data locality. Ceph is not designed to have Spark run directly on top of the OSDs. I know that CephFS provides data location information via a Hadoop-compatible API. The last time I researched this, the integration was experimental (just google it and you will find a lot of discussions, e.g. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/002837.html).

However, this might not be the biggest issue as long as you have GREAT network bandwidth, such as InfiniBand or 10+ Gigabit Ethernet. My guess is that the architecture and the performance will be similar to S3+Spark at best (with 10GE instances) if you guys take the networking seriously.

HTH,

Jerry
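To put the bandwidth point in rough numbers, here is a back-of-envelope sketch of how long one Spark executor would need to pull its input over the network when nothing is read locally. The 100 GB input size and the `efficiency` factor are hypothetical values chosen for illustration, not measurements from any Ceph cluster, and the function name is made up for this example.

```python
def scan_seconds(dataset_gb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move dataset_gb gigabytes over a single network link.

    efficiency is the assumed fraction of the nominal line rate that is
    actually achieved (1.0 means the full line rate).
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    link_gbyte_per_s = link_gbit_per_s / 8.0  # 8 bits per byte
    return dataset_gb / (link_gbyte_per_s * efficiency)


if __name__ == "__main__":
    # Hypothetical input size per executor, in gigabytes.
    input_gb = 100.0
    for name, gbit in [("1 GbE", 1.0), ("10 GbE", 10.0)]:
        print(f"{name}: {scan_seconds(input_gb, gbit):.0f} s to read {input_gb:.0f} GB")
```

At the full line rate, 10 Gbit/s is 1.25 GB/s, so the hypothetical 100 GB takes about 80 seconds on 10 GbE versus about 800 seconds on 1 GbE. Real throughput also depends on OSD disk speed, replication traffic, and how many executors share a link, none of which this sketch models.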
Re: Re: Spark standalone/Mesos on top of Ceph
Hi Jerry,

Yeah, we already run and use Ceph in a few of our production environments, especially with OpenStack.

The reason we want to use Ceph is that we are looking for a unified storage layer, and the design concepts of Ceph are quite appealing. I am particularly interested in work like the Hadoop CephFS plugin, and we are going to run some benchmark tests comparing HDFS and CephFS soon.

So it would be beneficial if any ongoing work between Apache Spark and Ceph could offer some thoughtful insights.

BTW, regarding the Ceph Object Gateway S3 REST API, I agree about the inconvenience and the incompatibilities. However, we have not yet researched or tested radosgw much, although we do have a few small requirements for the gateway in some use cases.

Hope for more considerations and discussion.

Best,
Sun.

fightf...@163.com
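For anyone who wants to try the radosgw route mentioned above, one possible starting point is to point Hadoop's S3A connector at the gateway instead of AWS. The sketch below only builds the Spark settings. The endpoint host, port, and keys are hypothetical placeholders, and it assumes a Hadoop build that ships the `hadoop-aws` module and supports `fs.s3a.path.style.access`, which should be checked against the Hadoop version in use. It does not address the S3 API incompatibilities discussed in this thread.

```python
def s3a_conf_for_radosgw(endpoint: str, access_key: str, secret_key: str,
                         use_ssl: bool = False) -> dict:
    """Spark properties that direct the Hadoop S3A connector at a non-AWS endpoint.

    The "spark.hadoop." prefix makes Spark copy each property into the
    Hadoop configuration used by its file system clients.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Address buckets as endpoint/bucket rather than bucket.endpoint,
        # so no wildcard DNS is needed for the gateway (assumed setup).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def as_submit_args(conf: dict) -> list:
    """Flatten the properties into --conf arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical gateway address and credentials, for illustration only.
    conf = s3a_conf_for_radosgw("rgw.example.internal:7480", "ACCESS_KEY", "SECRET_KEY")
    print(" ".join(as_submit_args(conf)))
```

With these settings, a job would read paths of the form `s3a://bucket/path`. Every byte still crosses the network through the gateway, so this gives no data locality.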
Re: Spark standalone/Mesos on top of Ceph
Do you have specific reasons to use Ceph? I used Ceph before, and I'm not too in love with it, especially when I was using the Ceph Object Gateway S3 API. There are some incompatibilities with the AWS S3 API. You really, really need to try it before making the commitment. Did you manage to install it?
Spark standalone/Mesos on top of Ceph
Hi guys,

Here is the info for Ceph: http://ceph.com/

We are investigating and using Ceph for distributed storage and monitoring, and we are specifically interested in using Ceph as the underlying file system storage for Spark. However, we have no experience achieving that. Has anybody seen any progress on this?

Best,
Sun.

fightf...@163.com