Re: [ceph-users] libvma & Ceph
On Wed, 22 Jan 2014 17:54:44 +1100 Blair Bethwaite wrote:

> Has anyone looked at - or better, actually tried - Mellanox's libvma to
> accelerate Ceph's inter-OSD traffic, client-OSD traffic, or both? Looks like
> potential for drop-in latency improvements, or am I missing something...

That's why I'm going to deploy an Infiniband-based Ceph (and client) cluster: even with just IPoIB I expect it to be quite a bit snappier than 10GigE, at least where latency is concerned. But of course a more native, low-level approach would be even better. ^o^

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
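[Since libvma works by intercepting the sockets API via LD_PRELOAD, it can in principle be trialled without rebuilding Ceph. A minimal sketch of how one might run a single OSD under it - the library path is an assumption, and this is untested with Ceph:

    # hypothetical trial of one OSD under libvma (library path is an assumption)
    LD_PRELOAD=/usr/lib64/libvma.so ceph-osd -i 0 -f

Whether Ceph's messenger behaves well under interception is exactly the open question in this thread.]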
[ceph-users] libvma & Ceph
Has anyone looked at - or better, actually tried - Mellanox's libvma to accelerate Ceph's inter-OSD traffic, client-OSD traffic, or both? Looks like potential for drop-in latency improvements, or am I missing something...

--
Cheers,
~Blairo
Re: [ceph-users] Question about CRUSH object placement
Hi Sage,

I have a similar question. I need 2 replicas (one on each rack) and I would like to know whether the following rule always saves the primary on rack1:

    rule data {
            ruleset 0
            type replicated
            min_size 2
            max_size 2
            step take rack1
            step chooseleaf firstn 1 type host
            step emit
            step take rack2
            step chooseleaf firstn 1 type host
            step emit
    }

If so, I was wondering if you could tell me whether the following rule will do the same thing while spreading the primary and replica across rack1 and rack2:

    rule data {
            ruleset 0
            type replicated
            min_size 2
            max_size 2
            step take row1
            step choose firstn 2 type rack
            step chooseleaf firstn 1 type host
            step emit
    }

Thanks in advance,
Sherry

On Tuesday, January 21, 2014 7:00 AM, Sage Weil wrote:

On Mon, 20 Jan 2014, Arnulf Heimsbakk wrote:
> Hi,
>
> I'm trying to understand the CRUSH algorithm and how it distributes data.
> Let's say I simplify a small datacenter setup and map it up
> hierarchically in the crush map as shown below.
>
>          root               datacenter
>          /  \
>         /    \
>        /      \
>       a        b            room
>     / | \    / | \
>    a1 a2 a3 b1 b2 b3        rack
>    |  |  |  |  |  |
>    h1 h2 h3 h4 h5 h6        host
>
> I want 4 copies of all data in my pool, configured on pool level: 2
> copies in each room. And I want to be sure that no 2 copies reside in the
> same rack when there are no HW failures.
>
> Will the chooseleaf rule below ensure this placement?
>
>     step take root
>     step chooseleaf firstn 0 type room
>     step emit

This won't ensure the 2 copies in each room are in different racks.

> Or do I have to specify this more, like
>
>     step take root
>     step choose firstn 2 type room
>     step chooseleaf firstn 2 type rack
>     step emit

I think this is what you want. The thing it won't do is decide to put 4 replicas in room b when room a goes down completely... but at that scale, that is generally not what you want anyway.

> Or even more, like?
>
>     step take a
>     step choose firstn 2 type rack
>     step chooseleaf firstn 1 type host
>     step emit
>     step take b
>     step choose firstn 2 type rack
>     step chooseleaf firstn 1 type host
>     step emit
>
> Is there a difference in failure behaviour between the different configurations?

This would work too, but it assumes you only have 2 rooms, and that you always want the primary copy to be in room a (which means the reads go there). The previous rule will spread the primary responsibility across both rooms.

sage
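[Rules like these can be sanity-checked offline before they touch a live cluster. A minimal sketch using crushtool, assuming a compiled map in ./crushmap and that the rule of interest is ruleset 0 - flag spellings may differ slightly between releases:

    # map a sample of inputs through rule 0 with 4 replicas and show the results
    crushtool -i crushmap --test --rule 0 --num-rep 4 --show-mappings
    # check how evenly the rule fills the OSDs
    crushtool -i crushmap --test --rule 0 --num-rep 4 --show-utilization

Each --show-mappings line lists the OSDs chosen for one input, so you can eyeball whether every mapping really lands 2 copies per room on distinct racks.]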
Re: [ceph-users] Openstack Havana release installation with ceph
On Tue, Jan 21, 2014 at 10:38 AM, Dmitry Borodaenko wrote:
> On Tue, Jan 21, 2014 at 2:23 AM, Lalitha Maruthachalam wrote:
>> Can someone please let me know whether there is any documentation for
>> installing the Havana release of OpenStack along with Ceph.
>
> These slides have some information about how this is done in Mirantis
> OpenStack 4.0, including some gotchas and troubleshooting pointers:
> http://files.meetup.com/11701852/fuel-ceph.pdf

I didn't realize you need to be a participant of the meetup to get that file; here's a link to the same slides on SlideShare:
http://www.slideshare.net/mirantis/fuel-ceph

Apologies,
--
Dmitry Borodaenko
[ceph-users] KVM guest using rbd slow and high cpu usage
Hi,

I have a cluster of two KVM hosts and three Ceph servers (OSDs + mons). I've been doing some basic performance tests, and I discovered that an FTP server running in a guest is slow compared to an FTP server on the host. The same goes for a Samba file server. For example, ncftpget against the guest reports 42.78 MB/s, and against the host machine 98.10 MB/s.

I noticed the CPU load during the transfer between the guest and the client machine: the KVM process is eating all the CPU - 60, 70, 100% during the transfer.

Is this a well-known issue? Is there something to tune or debug? I'm using libvirt to run the VMs. This is the disk config:

Regards,
Diego
--
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
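[Diego's disk XML didn't survive the list archive. For reference, a typical libvirt RBD disk stanza looks roughly like the sketch below - the pool, image, monitor host, and secret UUID are all placeholders, not his actual config. Whether cache='writeback' (which enables the RBD client cache) is appropriate depends on the workload:

    <!-- illustrative only: every name and UUID here is a placeholder -->
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='PLACEHOLDER-UUID'/>
      </auth>
      <source protocol='rbd' name='libvirt-pool/guest-disk'>
        <host name='mon1.example.com' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>
]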
Re: [ceph-users] Openstack Havana release installation with ceph
On Tue, Jan 21, 2014 at 10:38 AM, Dmitry Borodaenko wrote:
> On Tue, Jan 21, 2014 at 2:23 AM, Lalitha Maruthachalam wrote:
>> Can someone please let me know whether there is any documentation for
>> installing the Havana release of OpenStack along with Ceph.
>
> These slides have some information about how this is done in Mirantis
> OpenStack 4.0, including some gotchas and troubleshooting pointers:
> http://files.meetup.com/11701852/fuel-ceph.pdf

Here's another link that is mentioned in the slides; I thought I'd bring it up specifically because the webcast is happening as soon as tomorrow:
http://mirantis.hs-sites.com/how-to-stop-worrying-about-storage-openstack-ceph-webcast

It will be a joint webcast by Mirantis and Inktank about OpenStack integration with Ceph.

--
Dmitry Borodaenko
[ceph-users] One specific OSD process using much more CPU than all the others
Hi,

I have a cluster that contains 16 OSDs spread over 4 physical machines. Each machine runs 4 OSD processes. Among those, one is periodically using 100% of the CPU: if you aggregate the total CPU time of the process over long periods, you can clearly see it uses roughly 6x more CPU than any of the other OSDs. The numbers for the other 15 OSDs (both on the same machine and on other machines) are quite consistent with one another.

The PG distribution isn't ideal (some OSDs have more than others), but it's not bad either, so there isn't one OSD having twice as many PGs as another, for example. I also ran a full SMART self-check on all the drives hosting OSD data, but that didn't uncover anything. The logs (with the default logging level) aren't showing anything abnormal either.

The problem also seems to have been exacerbated by my recent update to Emperor (from Dumpling) this weekend. For instance, here are the CPU usage logs for the 16 OSDs over the last 6 months:

http://i.imgur.com/cno73Ea.png

The red line is osd.14, which is the problematic one. As you can see, it recently "flared up" a lot, but even before the update it was much higher than the others and rising, which is a troubling trend.

Any idea what this could be? How can I isolate it and solve it?

Cheers,
Sylvain
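[A couple of ways to narrow this down, sketched assuming osd.14's admin socket is at the default path and that perf is available on the host:

    # compare internal counters of the busy OSD against a quiet one
    ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok perf dump
    # look at the slowest recent ops it has handled
    ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok dump_historic_ops
    # profile where the process is actually burning cycles (substitute the PID)
    perf top -p <pid-of-osd.14>

If perf top shows the time concentrated in one subsystem, that usually points at which component to investigate next.]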
Re: [ceph-users] rbd client affected with only one node down
Udo,

I think you might have better luck using "ceph osd set noout" before doing maintenance, rather than "ceph osd set nodown", since you want the node to be marked down to avoid having I/O directed at it (but not out, to avoid having recovery backfill begin).

-Aaron

On Tue, Jan 21, 2014 at 10:01 AM, Udo Lembke wrote:
> Hi,
> I need a little bit of help.
> We have a 4-node ceph cluster and the clients run into trouble if one
> node is down (due to maintenance).
>
> After the node is switched on again, ceph health shows (for a little while):
> HEALTH_WARN 4 pgs incomplete; 14 pgs peering; 370 pgs stale; 12 pgs
> stuck unclean; 36 requests are blocked > 32 sec; nodown flag(s) set
>
> nodown is set due to maintenance, and in the global section of ceph.conf
> the following is defined to protect against such things:
> osd pool default min size = 1 # Allow writing one copy in a degraded state.
>
> And in the logfile I see messages like:
> 2014-01-21 18:00:18.566712 osd.46 172.20.2.14:6821/12805 17 : [WRN] 6
> slow requests, 3 included below; oldest blocked for > 180.734141 secs
> 2014-01-21 18:00:18.566717 osd.46 172.20.2.14:6821/12805 18 : [WRN] slow
> request 120.523231 seconds old, received at 2014-01-21
>
> Due to the message:
> 2014-01-21 18:00:21.126693 mon.0 172.20.2.11:6789/0 410241 : [INF] pgmap
> v8331119: 4808 pgs: 4805 active+clean, 1 active+clean+scrubbing, 2
> active+clean+scrubbing+deep; 57849 GB data, 113 TB used, 77841 GB / 189
> TB avail; 2304 B/s wr, 0 op/s
> I assume it has something to do with scrubbing and not with writes from the
> VMs?
>
> Are there any switches which protect against this behavior?
>
> regards
>
> Udo

--
Aaron Ten Clay
http://www.aarontc.com/
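[To make Aaron's suggestion concrete, a minimal maintenance sequence would look like the sketch below (standard ceph CLI, run from any admin node):

    ceph osd set noout      # keep CRUSH from marking OSDs out -> no backfill
    # ... shut the node down, do the maintenance, boot it back up ...
    ceph health             # wait for the OSDs to rejoin and PGs to go active+clean
    ceph osd unset noout    # restore normal failure handling

The difference from nodown: with noout the stopped OSDs are still marked down, so clients stop sending them I/O immediately, but they are never marked out, so no data reshuffling starts.]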
Re: [ceph-users] Openstack Havana release installation with ceph
On Tue, Jan 21, 2014 at 2:23 AM, Lalitha Maruthachalam wrote:
> Can someone please let me know whether there is any documentation for
> installing the Havana release of OpenStack along with Ceph.

These slides have some information about how this is done in Mirantis OpenStack 4.0, including some gotchas and troubleshooting pointers:
http://files.meetup.com/11701852/fuel-ceph.pdf

--
Dmitry Borodaenko
[ceph-users] rbd client affected with only one node down
Hi,
I need a little bit of help.
We have a 4-node ceph cluster and the clients run into trouble if one node is down (due to maintenance).

After the node is switched on again, ceph health shows (for a little while):

HEALTH_WARN 4 pgs incomplete; 14 pgs peering; 370 pgs stale; 12 pgs stuck unclean; 36 requests are blocked > 32 sec; nodown flag(s) set

nodown is set due to maintenance, and in the global section of ceph.conf the following is defined to protect against such things:

osd pool default min size = 1 # Allow writing one copy in a degraded state.

And in the logfile I see messages like:

2014-01-21 18:00:18.566712 osd.46 172.20.2.14:6821/12805 17 : [WRN] 6 slow requests, 3 included below; oldest blocked for > 180.734141 secs
2014-01-21 18:00:18.566717 osd.46 172.20.2.14:6821/12805 18 : [WRN] slow request 120.523231 seconds old, received at 2014-01-21

Due to the message:

2014-01-21 18:00:21.126693 mon.0 172.20.2.11:6789/0 410241 : [INF] pgmap v8331119: 4808 pgs: 4805 active+clean, 1 active+clean+scrubbing, 2 active+clean+scrubbing+deep; 57849 GB data, 113 TB used, 77841 GB / 189 TB avail; 2304 B/s wr, 0 op/s

I assume it has something to do with scrubbing and not with writes from the VMs?

Are there any switches which protect against this behavior?

regards

Udo
Re: [ceph-users] how does ceph handle object writes?
Almost! The primary OSD sends the data out to its replicas simultaneously with putting it into the journal.
-Greg

On Monday, January 20, 2014, Tim Zhang wrote:
> Hi guys,
> I wonder how Ceph stores objects. Consider the write process: IMO, the
> OSD first gets the object data from the client; the primary OSD then
> stores the data in its journal (with sync), and after that sprays the
> object to the other replica OSDs simultaneously. Each replica OSD writes
> its journal with sync and then replies to the primary OSD. After receiving
> all the replies from the replica OSDs, the primary OSD reports that the
> write op is finished. Is there anything wrong?

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
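[Greg's correction - journal locally and forward to replicas in parallel, ack once everything is durable - can be modelled with a toy sketch. This is plain Python, not Ceph code; the names and timings are made up:

    import asyncio

    async def journal_write(osd: str, obj: bytes) -> str:
        await asyncio.sleep(0.01)  # stand-in for a synced journal write
        return osd

    async def primary_write(obj: bytes, replicas: list) -> None:
        # the local journal write and the replica forwards start together,
        # not one after the other
        acks = await asyncio.gather(
            journal_write("primary", obj),
            *(journal_write(r, obj) for r in replicas),
        )
        print("write committed after acks from:", acks)

    asyncio.run(primary_write(b"data", ["replica-1", "replica-2"]))
]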
Re: [ceph-users] Making runtime config changes
Hello,

I do not know if it covers all the options, but you can use:

$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show

This will give the settings for osd.0.

Regards,
Laurent Barbe

On 21/01/2014 17:45, Kenneth Waegeman wrote:
> Hi all,
>
> Is there somewhere a list of parameters that can('t) be changed using the
> parameter injection (ceph tell ... injectargs)?
>
> Thanks!!
>
> Kind regards
> Kenneth Waegeman
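[For completeness, a typical injectargs round trip might look like the sketch below - the option name is just an example, and quoting the argument matters:

    # change a setting at runtime on osd.0
    ceph tell osd.0 injectargs '--osd_recovery_max_active 1'
    # confirm the running value through the admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_recovery_max_active

Note that some options are only read at daemon startup, so injecting them has no effect - which is presumably why a definitive can/can't list is hard to come by.]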
[ceph-users] Making runtime config changes
Hi all,

Is there somewhere a list of parameters that can('t) be changed using the parameter injection (ceph tell ... injectargs)?

Thanks!!

Kind regards
Kenneth Waegeman
Re: [ceph-users] CephFS or RadosGW
Hi Guys,

Thanks for the reply, Sage. Yes, librados seems to be the right direction for our long-term development. For the short term, I guess we will stick with CephFS for files, Solr for metadata, and Riak for thumbnails.

Ara

On 01/20/2014 08:24 PM, Sage Weil wrote:
> On Mon, 20 Jan 2014, Ara Sadoyan wrote:
>> Hi list,
>>
>> We are in the process of developing a custom storage application on top of
>> Ceph. We will store all metadata in Solr for fast search on paths, and get
>> files only by calling direct links.
>> My question is: which is the better solution for that purpose, CephFS or
>> RadosGW? Which one will perform faster and better?
>
> Very generally speaking, radosgw will probably have a lower metadata
> overhead, but involves a proxy node (http -> ceph internal protocol).
> Cephfs involves the metadata server, but the cephfs clients go direct to
> OSDs.
>
> There is a third option, though: librados. If you are using solr or some
> other data store for metadata, you may not need the metadata services of
> either the cephfs mds or radosgw. Basically, if you don't need object
> rename, and can handle striping of your large data items over smaller
> rados objects, then librados may be a good fit. It provides rich data
> types (bytes, attributes, key/value storage), single-object transactions,
> and the ability to run code directly on the storage node.
>
> Here is a recent talk about librados:
> http://mirror.linux.org.au/pub/linux.conf.au/2014/Wednesday/60-Distributed_storage_and_compute_with_Cephs_librados_-_Sage_Weil.mp4
>
> And the interface is defined here:
> http://ceph.com/docs/master/rados/api/librados/
> https://github.com/ceph/ceph/blob/master/src/include/rados/librados.hpp
> https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h
>
> sage
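[To give a feel for the librados route Sage describes, a minimal sketch with the Python bindings - the pool name, object name, and xattr are placeholders, and error handling is omitted:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('documents')            # assumes a pool named 'documents'
        try:
            ioctx.write_full('doc-123', b'file contents')  # store the blob
            ioctx.set_xattr('doc-123', 'mime', b'text/plain')
            data = ioctx.read('doc-123')                   # fetch it back
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

The Solr index would map search results to object names like 'doc-123'; files larger than a sensible object size would need to be striped over several such objects by the application.]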
[ceph-users] OSD port usage
Hi,

I noticed in the documentation that each OSD daemon should use 3 ports, so when I set up the cluster I originally opened enough ports to accommodate this (with a small margin so that restarts could proceed even if ports aren't released immediately). However, today I noticed that the OSD daemons are using 5 ports, so for some of them a port or two were blocked by the firewall. All the OSDs were still reporting as OK and the cluster didn't report anything wrong, but I was getting some weird behavior that could have been related.

So is that usage of 5 TCP ports normal? And if it is, could the doc be updated?

Cheers,
Sylvain
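[Two quick checks, sketched assuming standard Linux tooling and the documented OSD port range (6800-7100 at the time; verify against your docs version):

    # see exactly which ports each ceph-osd process is listening on
    netstat -tlnp | grep ceph-osd
    # open the whole documented range rather than counting ports per daemon
    iptables -A INPUT -p tcp --dport 6800:7100 -j ACCEPT

Opening the full range sidesteps the problem of daemons grabbing more listeners than the per-OSD count suggests (separate sockets for the public, cluster, and heartbeat messengers, for instance).]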
Re: [ceph-users] v0.75 released
On Jan 19, 2014, Sage Weil wrote:
> On Sat, 18 Jan 2014, Sage Weil wrote:
>> Which also means this will bite anybody who ran emperor, too. I think I
>> need to introduce some pool flag or something indicating whether the dirty
>> stats should be scrubbed or not, set only on new pools?
>
> Pushed wip-7184.

How about clearing the dirty_stats_invalid flag if, at the end of a scrub, the dirty stats didn't mismatch? Then old PGs that, with 0.75, were marked inconsistent and then got repaired, still with v13 stats, won't have to go through *another* round of repair before dirty mismatches are taken seriously again.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/    FSF Latin America board member
Free Software Evangelist       Red Hat Brazil Toolchain Engineer
Re: [ceph-users] CephFS posix ACLs
Hi,

Many thanks for this.

Cheers

Alex

-------- Original Message --------
Subject: Re: [ceph-users] CephFS posix ACLs
Date: Tue, 21 Jan 2014 21:47:37 +0800
From: Yan, Zheng
To: Alex Crow
CC: ceph-users@lists.ceph.com

that code is already in the test branch of ceph-client; I think it will go into the 3.14 kernel.

Regards
Yan, Zheng

On Tue, Jan 21, 2014 at 7:04 PM, Alex Crow wrote:
> Hi list,
>
> I've noticed that a patch was submitted in October last year to enable POSIX
> ACLs in cephfs, but things have gone very quiet on that front recently.
>
> We're looking to use cephfs in our organisation for resilient storage of
> documents, but without ACLs we would have some issues. Are there any plans
> to get the code into a forthcoming release?
>
> Thanks
>
> Alex
[ceph-users] sending data in multiple segments causes s3upload (jets3t) to fail
Hi,

I am trying to use hadoop distcp to copy data from HDFS to S3. Hadoop distcp divides the data into multiple chunks and sends them in parallel to achieve faster performance. However, this fails against Ceph's S3 interface, indicating a mismatch between the MD5 and the ETag returned by S3 - while the same transfer works against AWS S3.

Is there a workaround for this, apart from setting storage-service.disable-live-md5=true in jets3t.properties? I don't want to disable MD5 checking, because then the correctness of the uploaded data is no longer verified.

Thank you,
Jaseer TK
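[For reference, the workaround mentioned above is a one-line change in the JetS3t configuration, accepting the loss of client-side verification:

    # jets3t.properties -- disables JetS3t's live MD5/ETag comparison
    storage-service.disable-live-md5=true
]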
Re: [ceph-users] CephFS posix ACLs
that code is already in the test branch of ceph-client; I think it will go into the 3.14 kernel.

Regards
Yan, Zheng

On Tue, Jan 21, 2014 at 7:04 PM, Alex Crow wrote:
> Hi list,
>
> I've noticed that a patch was submitted in October last year to enable POSIX
> ACLs in cephfs, but things have gone very quiet on that front recently.
>
> We're looking to use cephfs in our organisation for resilient storage of
> documents, but without ACLs we would have some issues. Are there any plans
> to get the code into a forthcoming release?
>
> Thanks
>
> Alex
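[Once a kernel carrying those patches is in place, standard POSIX ACL tooling should work against a kernel-mounted CephFS - a minimal sketch, where the user and path are examples:

    # grant a specific user rw access to a directory on a CephFS mount
    setfacl -m u:alice:rwX /mnt/cephfs/docs
    # inspect the resulting ACL
    getfacl /mnt/cephfs/docs

Note this concerns the kernel client; ACL support in ceph-fuse would be a separate question.]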
[ceph-users] CephFS posix ACLs
Hi list,

I've noticed that a patch was submitted in October last year to enable POSIX ACLs in cephfs, but things have gone very quiet on that front recently.

We're looking to use cephfs in our organisation for resilient storage of documents, but without ACLs we would have some issues. Are there any plans to get the code into a forthcoming release?

Thanks

Alex
[ceph-users] Openstack Havana release installation with ceph
Hi,

Can someone please let me know whether there is any documentation for installing the Havana release of OpenStack along with Ceph?

Thanks,
Lalitha.M