Re: [ceph-users] Ceph Day Raleigh Cancelled
Hey Daleep, Right now we're planned through the end of the year, but if you know of a location where we would get good attendance and maybe someone with a space to host it we could certainly add that to the planning for 2016. Feel free to shoot me an email if you know of any places that might fit the bill. Typically we shoot for about 75-100 people and like to cater in lunch, snacks, and a cocktail reception. So that should give you an idea for what we look at when planning a venue. Thanks. On Wed, Aug 26, 2015 at 12:55 AM, Daleep Bais daleepb...@gmail.com wrote: Hi Patrick, is there any plans for such events to be held in India? Eagerly looking forward to it.. Thanks. Daleep Singh Bais On Wed, Aug 26, 2015 at 2:53 AM, Patrick McGarry pmcga...@redhat.com wrote: Due to low registration this event is being pushed back to next year. The Ceph Day events for Shanghai, Tokyo, and Melbourne should all still be proceeding as planned, however. Feel free to contact me if you have any questions about Ceph Days. Thanks. -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph monitoring with graphite
On Wed, Aug 26, 2015 at 3:33 PM, Dan van der Ster d...@vanderster.com wrote: Hi Wido, On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com wrote: I'm sending pool statistics to Graphite We're doing the same -- stripping invalid chars as needed -- and I would guess that lots of people have written similar json2graphite convertor scripts for Ceph monitoring in the recent months. It makes me wonder if it might be useful if Ceph had a --format mode to output df/stats/perf commands directly in graphite compatible text. Shouldn't be too difficult to write. Why would you want that instead of using e.g. diamond? I think it makes sense to have an external utility that converts a single ceph format into whatever external tool's format. The existing diamond plugin is pretty comprehensive. John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
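Until such a --format mode exists, the converter Dan describes is only a few lines of shell. A minimal sketch, assuming jq is installed, a carbon plaintext listener on port 2003, and treating the metric prefix and Graphite host as placeholders:

  #!/bin/bash
  # Push per-pool stats from 'ceph df' to Graphite's plaintext protocol
  # ("metric value timestamp"), stripping characters Graphite treats as
  # path separators.
  GRAPHITE_HOST=graphite.example.com
  PREFIX=ceph.pools.stats
  NOW=$(date +%s)
  ceph df --format json |
    jq -r '.pools[] | "\(.name) \(.stats.bytes_used) \(.stats.objects)"' |
    while read pool used objects; do
      safe=$(echo "$pool" | sed 's/^\.//; s/\./_/g')
      echo "${PREFIX}.${safe}.bytes_used ${used} ${NOW}"
      echo "${PREFIX}.${safe}.objects ${objects} ${NOW}"
    done | nc -w 1 "$GRAPHITE_HOST" 2003

A cron job around something like this covers pool-level graphs; the diamond plugin John mentions covers the per-daemon perf counters.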
Re: [ceph-users] Ceph Day Raleigh Cancelled
Yeah, we're still working on nailing down a venue for the Melbourne event (but it looks like 05 Nov is probably the date). As soon as we have a venue confirmed we'll put out a call for speakers and post the details on the /cephdays/ page. Thanks! On Tue, Aug 25, 2015 at 7:47 PM, Goncalo Borges gonc...@physics.usyd.edu.au wrote: Hey Patrick... I am interested in the Melbourne one. Under http://ceph.com/cephdays/ I do not see any reference to it. Can you give me more details on that? TIA Goncalo On 08/26/2015 07:23 AM, Patrick McGarry wrote: Due to low registration this event is being pushed back to next year. The Ceph Day events for Shanghai, Tokyo, and Melbourne should all still be proceeding as planned, however. Feel free to contact me if you have any questions about Ceph Days. Thanks. -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Samsung pm863 / sm863 SSD info request
On Tue, 25 Aug 2015 17:07:18 +0200, Jan Schermer j...@schermer.cz wrote: There's a nice whitepaper about under-provisioning; everyone using SSDs should read it: http://www.sandisk.com/assets/docs/WP004_OverProvisioning_WhyHow_FINAL.pdf BTW you can perfectly well under-provision an SSD by hand, by not allocating all the space when partitioning it. It works just as well as firmware-set under-provisioning (just buy the cheaper model, don't use up all the available space, and you get the higher-end model for cheap :) -- Emmanuel Florac | Direction technique | Intellique | eflo...@intellique.com | +33 1 78 94 84 02 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
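For illustration, the manual approach amounts to leaving part of a freshly trimmed drive unpartitioned. A minimal sketch (the device name and sizes are examples; this destroys any data on the drive):

  # Trim the whole device first so the unallocated space is really empty
  blkdiscard /dev/sdX
  # Create a single partition covering ~80% of a 240 GB drive and leave
  # the rest unallocated; the controller can use the untouched LBAs as
  # extra spare area, much like factory over-provisioning.
  sgdisk --zap-all /dev/sdX
  sgdisk -n 1:0:+192G -t 1:8300 /dev/sdX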
[ceph-users] Ceph Tech Talk Tomorrow
Hey cephers, Don't forget that tomorrow is our monthly Ceph Tech Talk. This month we're taking a look at performance measurement and tuning in Ceph. Mark Nelson, Ceph's lead performance engineer, will be giving an overview of what's new in the performance world of Ceph and sharing some recent findings. Definitely not one to miss! http://ceph.com/ceph-tech-talks/ 1p Eastern on our BlueJeans video conferencing system. Hopefully we'll see you there! -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Samsung pm863 / sm863 SSD info request
Maybe if you TRIM it first, but the correct way to do that is like this: https://www.thomas-krenn.com/en/wiki/SSD_Over-provisioning_using_hdparm Jan On 26 Aug 2015, at 18:58, Emmanuel Florac eflo...@intellique.com wrote: Le Tue, 25 Aug 2015 17:07:18 +0200 Jan Schermer j...@schermer.cz écrivait: There's a nice whitepaper about under-provisioning everyone using SSDs should read it http://www.sandisk.com/assets/docs/WP004_OverProvisioning_WhyHow_FINAL.pdf http://www.sandisk.com/assets/docs/WP004_OverProvisioning_WhyHow_FINAL.pdf BTW you can perfectly under-provision SSD by hand, by not allocating all space when partitionning them. It works just as well as firmware-set under-provisioning (just buy the cheaper model, and don't use up all available space, and you get the higher-end model for cheap :) -- Emmanuel Florac | Direction technique | Intellique | eflo...@intellique.com | +33 1 78 94 84 02 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
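For completeness, the hdparm route from that article looks roughly like this (a sketch; the device and sector count are examples, the drive must be empty and trimmed first, and the setting persists across reboots):

  # Trim the whole device so the hidden area is really empty
  blkdiscard /dev/sdX
  # Show the current and native max sector counts
  hdparm -N /dev/sdX
  # Create a Host Protected Area so only ~80% of a 240 GB drive stays
  # visible; the hidden sectors become extra spare area for the controller.
  hdparm -Np375000000 --yes-i-know-what-i-am-doing /dev/sdX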
Re: [ceph-users] Rados: Undefined symbol error
Hello Jason, I checked the version of my built packages and they are all 9.0.2. I purged the cluster and uninstalled the packages and there seems to be nothing else - no older version. Could you elaborate on the fix for this issue? Thanks, Aakanksha -Original Message- From: Jason Dillaman [mailto:dilla...@redhat.com] Sent: Friday, August 21, 2015 6:37 AM To: Aakanksha Pudipeddi-SSI Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] Rados: Undefined symbol error It sounds like you have rados CLI tool from an earlier Ceph release ( Hammer) installed and it is attempting to use the librados shared library from a newer (= Hammer) version of Ceph. Jason - Original Message - From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com To: ceph-us...@ceph.com Sent: Thursday, August 20, 2015 11:47:26 PM Subject: [ceph-users] Rados: Undefined symbol error Hello, I cloned the master branch of Ceph and after setting up the cluster, when I tried to use the rados commands, I got this error: rados: symbol lookup error: rados: undefined symbol: _ZN5MutexC1ERKSsbbbP11CephContext I saw a similar post here: http://tracker.ceph.com/issues/12563 but I am not clear on the solution for this problem. I am not performing an upgrade here but the error seems to be similar. Could anybody shed more light on the issue and how to solve it? Thanks a lot! Aakanksha ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph repository for Debian Jessie
Hi, I would like to know if there will be a new repository hosting Jessie packages in the near future. If I am not mistaken the issue for not using the existing packages is that there are a few (dependency) libraries in Jessie in newer versions and some porting may be needed. --- At the moment we are using Ceph/RADOS in quite a few deployments, one of which is a production environment hosting VMs (using a custom/in_house_developed block device layer - based on librados). We are using Firefly and we would like to go to Hammer. -- Kind Regards, Konstantinos ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Can't mount Cephfs
Hi, We have a Ceph cluster (version 0.94.2) which consists of four nodes with four disks on each node. Ceph is configured to hold two replicas (size 2). We use this cluster for the Ceph filesystem. A few days ago we had a power outage, after which I had to replace three of our cluster's OSD disks. All OSD disks are now online, but I'm unable to mount the filesystem and constantly receive 'mount error 5 = Input/output error'. Ceph status shows many 'incomplete' PGs and that the 'mds cluster is degraded'. According to 'ceph health detail' the MDS is replaying its journal.

[root@cnode0 ceph]# ceph -s
    cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
     health HEALTH_WARN
            25 pgs backfill_toofull
            10 pgs degraded
            126 pgs down
            263 pgs incomplete
            54 pgs stale
            10 pgs stuck degraded
            263 pgs stuck inactive
            54 pgs stuck stale
            289 pgs stuck unclean
            10 pgs stuck undersized
            10 pgs undersized
            4 requests are blocked > 32 sec
            recovery 27139/10407227 objects degraded (0.261%)
            recovery 168597/10407227 objects misplaced (1.620%)
            4 near full osd(s)
            too many PGs per OSD (312 > max 300)
            mds cluster is degraded
     monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
            election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
     mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
     osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
      pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
            32825 GB used, 11698 GB / 44524 GB avail
            27139/10407227 objects degraded (0.261%)
            168597/10407227 objects misplaced (1.620%)
                2153 active+clean
                 137 incomplete
                 126 down+incomplete
                  54 stale+active+clean
                  15 active+remapped+backfill_toofull
                  10 active+undersized+degraded+remapped+backfill_toofull
                   1 active+remapped
[root@cnode0 ceph]#

I wasn't able to find any solution on the Internet, and I worry I will make things even worse if I continue to troubleshoot this on my own. I'm stuck. Could you please help? Thanks. Andrzej ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Why are RGW pools all prefixed with a period (.)?
Hi, It's something which has been 'bugging' me for some time now. Why are RGW pools prefixed with a period? I tried setting the root pool to 'rgw.root', but RGW (0.94.1) refuses to start: ERROR: region root pool name must start with a period I'm sending pool statistics to Graphite and when sending a key like this you 'break' Graphite: ceph.pools.stats.pool_name.kb_read A pool like .rgw.root will break this since Graphite splits on periods. So is there any reason why this is? What's the reasoning behind it? -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
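In the meantime, the breakage on the Graphite side can be avoided by mangling the pool name before building the key, e.g. (a small sketch; the key prefix is just an example):

  pool=.rgw.root
  safe=$(echo "$pool" | sed 's/^\.//; s/\./_/g')
  echo "ceph.pools.stats.${safe}.kb_read"
  # -> ceph.pools.stats.rgw_root.kb_read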
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Great! Yes, the behaviour is exactly as I described, so that looks like the root cause :) Thank you, Sam, Ilya! 2015-08-21 21:08 GMT+03:00 Samuel Just sj...@redhat.com: I think I found the bug -- need to whiteout the snapset (or decache it) upon evict. http://tracker.ceph.com/issues/12748 -Sam On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote: Odd, did you happen to capture osd logs? No, but the reproducer is trivial to cut and paste. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount Cephfs
If you lost 3 disks with size 2 and at least 2 of those disks were in different host, that means you lost data with the default CRUSH. There's nothing you can do but either get those disks back in or recover from backup. Jan On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl wrote: Hi, We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with four disks on each node. Ceph is configured to hold two replicas (size 2). We use this cluster for ceph filesystem. Few days ago we had power outage after which I had to replace three of our cluster OSD disks. All OSD disks are now online, but I'm unable to mount filesystem and constantly receive 'mount error 5 = Input/output error'. Ceph status shows many 'incomplete' pgs and that 'mds cluster is degraded'. According to 'ceph health detail' mds is replaying journal. [root@cnode0 ceph]# ceph -s cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24 health HEALTH_WARN 25 pgs backfill_toofull 10 pgs degraded 126 pgs down 263 pgs incomplete 54 pgs stale 10 pgs stuck degraded 263 pgs stuck inactive 54 pgs stuck stale 289 pgs stuck unclean 10 pgs stuck undersized 10 pgs undersized 4 requests are blocked 32 sec recovery 27139/10407227 objects degraded (0.261%) recovery 168597/10407227 objects misplaced (1.620%) 4 near full osd(s) too many PGs per OSD (312 max 300) mds cluster is degraded monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0} election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects 32825 GB used, 11698 GB / 44524 GB avail 27139/10407227 objects degraded (0.261%) 168597/10407227 objects misplaced (1.620%) 2153 active+clean 137 incomplete 126 down+incomplete 54 stale+active+clean 15 active+remapped+backfill_toofull 10 active+undersized+degraded+remapped+backfill_toofull 1 active+remapped [root@cnode0 ceph]# I wasn't able to find any solution in the Internet and I worry I will make things even worse when continue to troubleshoot this on my own. I'm stuck. Could you please help? Thanks. Andrzej ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Unexpected AIO Error
Hello, I am experiencing an issue where OSD Services fail due to an unexpected aio error. This has happend on two different OSD servers killing two different OSD Daemons services. I am running Ceph Hammer on Debian Wheezy with a backported Kernel(3.16.0-0.bpo.4-amd64). Below is the log from one of the crashes. I am wondering if anyone else has experienced this issue and might be able to point out some troubleshooting steps? so far all i’ve found are similar issues on the ceph bug tracker. I have posted my case to that as well. 2015-08-16 08:11:54.227567 7f13d68de700 0 log_channel(cluster) log [WRN] : 3 slow requests, 3 included below; oldest blocked for 30.685081 secs 2015-08-16 08:11:54.227579 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.685081 seconds old, received at 2015-08-16 08:11:23.542417: osd_op(client.1109461.0:219374023 rbd_data.10e67e79e2a9e3.0001c201 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2592768~4096] 5.89587894 ack+ondisk+write e1804) currently waiting for subops from 1,30 2015-08-16 08:11:54.227587 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.682262 seconds old, received at 2015-08-16 08:11:23.545236: osd_repop(client.1109461.0:219374083 5.c63 d6b85c63/rbd_data.10e67e79e2a9e3.0001a800/head//5 v 1804'121436) currently started 2015-08-16 08:11:54.227592 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.641702 seconds old, received at 2015-08-16 08:11:23.585797: osd_repop(client.1935041.0:1302764 5.82a 4219482a/rbd_data.1d685c2eb141f2.3c5f/head//5 v 1804'265055) currently started 2015-08-16 08:11:55.227784 7f13d68de700 0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for 31.685317 secs 2015-08-16 08:11:55.227808 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.788521 seconds old, received at 2015-08-16 08:11:24.439213: osd_repop(client.1224667.0:34531998 5.abe 2f457abe/rbd_data.12aacc79e2a9e3.1d9d/head//5 v 1804'27936) currently started 2015-08-16 08:11:56.075649 7f13d3d89700 -1 journal aio to 7994220544~8192 wrote 18446744073709551611 2015-08-16 08:11:56.091460 7f13d3d89700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7f13d3d89700 time 2015-08-16 08:11:56.076462 os/FileJournal.cc: 1426: FAILED assert(0 == unexpected aio error) ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0xcdb572] 2: (FileJournal::write_finish_thread_entry()+0x847) [0xb9a437] 3: (FileJournal::WriteFinisher::entry()+0xd) [0xa3befd] 4: (()+0x6b50) [0x7f13de90ab50] 5: (clone()+0x6d) [0x7f13dd32695d] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. Pontus Lindgren System Engineer ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW - multiple dns names
On Mon, Feb 23, 2015 at 10:18 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: -- *From: *Shinji Nakamoto shinji.nakam...@mgo.com *To: *ceph-us...@ceph.com *Sent: *Friday, February 20, 2015 3:58:39 PM *Subject: *[ceph-users] RadosGW - multiple dns names We have multiple interfaces on our Rados gateway node, each of which is assigned to one of our many VLANs with a unique IP address. Is it possible to set multiple DNS names for a single Rados GW, so it can handle the request to each of the VLAN specific IP address DNS names? Not yet, however, the upcoming hammer release will support that (hostnames will be configured as part of the region). I tested this using Hammer ( 0.94.2) and it doesn't seem to work. I'm just adding multiple rgw dns name lines to the configuration. Did it make Hammer, or am I doing it the wrong way? I couldn't find any docs either way... Yehuda eg. rgw dns name = prd-apiceph001 rgw dns name = prd-backendceph001 etc. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount Cephfs
Thank you for answer. I lost 2 disks on 1st node and 1 disk on 2nd. I understand it is not possible to recover the data even partially? Unfortunatelly those disks are lost forever. Andrzej W dniu 2015-08-26 o 12:26, Jan Schermer pisze: If you lost 3 disks with size 2 and at least 2 of those disks were in different host, that means you lost data with the default CRUSH. There's nothing you can do but either get those disks back in or recover from backup. Jan On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl mailto:alukaw...@interia.pl wrote: Hi, We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with four disks on each node. Ceph is configured to hold two replicas (size 2). We use this cluster for ceph filesystem. Few days ago we had power outage after which I had to replace three of our cluster OSD disks. All OSD disks are now online, but I'm unable to mount filesystem and constantly receive 'mount error 5 = Input/output error'. Ceph status shows many 'incomplete' pgs and that 'mds cluster is degraded'. According to 'ceph health detail' mds is replaying journal. [root@cnode0 ceph]# ceph -s cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24 health HEALTH_WARN 25 pgs backfill_toofull 10 pgs degraded 126 pgs down 263 pgs incomplete 54 pgs stale 10 pgs stuck degraded 263 pgs stuck inactive 54 pgs stuck stale 289 pgs stuck unclean 10 pgs stuck undersized 10 pgs undersized 4 requests are blocked 32 sec recovery 27139/10407227 objects degraded (0.261%) recovery 168597/10407227 objects misplaced (1.620%) 4 near full osd(s) too many PGs per OSD (312 max 300) *mds cluster is degraded* monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0} election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m mdsmap e1236: 1/1/1 up {0=2=up:*replay*}, 2 up:standby osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects 32825 GB used, 11698 GB / 44524 GB avail 27139/10407227 objects degraded (0.261%) 168597/10407227 objects misplaced (1.620%) 2153 active+clean 137 incomplete 126 down+incomplete 54 stale+active+clean 15 active+remapped+backfill_toofull 10 active+undersized+degraded+remapped+backfill_toofull 1 active+remapped [root@cnode0 ceph]# I wasn't able to find any solution in the Internet and I worry I will make things even worse when continue to troubleshoot this on my own. I'm stuck. Could you please help? Thanks. Andrzej ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating data into a newer ceph instance
Thanks, Luis. The motivation for using the newer version is to keep up-to-date with Ceph development, since we suspect the old versioned radosgw could not be restarted possibly due to library mismatch. Do you know whether the self-healing feature of ceph is applicable between different versions or not? Fangzhe From: Luis Periquito [mailto:periqu...@gmail.com] Sent: Wednesday, August 26, 2015 10:11 AM To: Chang, Fangzhe (Fangzhe) Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Migrating data into a newer ceph instance I Would say the easiest way would be to leverage all the self-healing of ceph: add the new nodes to the old cluster, allow or force all the data to migrate between nodes, and then remove the old ones out. Well to be fair you could probably just install radosgw on another node and use it as your gateway without the need to even create a new OSD node... Or was there a reason to create a new cluster? I can tell you that one of the clusters I have has been around since bobtail, and now it's hammer... On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) fangzhe.ch...@alcatel-lucent.commailto:fangzhe.ch...@alcatel-lucent.com wrote: Hi, We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite some amount of data in it. We are only using ceph as an object store via radosgw. Last week cheph-radosgw daemon suddenly refused to start (with logs only showing “initialization timeout” error on Centos 7). This triggers me to install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new instance has a different set of key rings by default. The next step is to have all the data migrated. Does anyone know how to get the existing data out of the old ceph cluster (Giant) and into the new instance (Hammer)? Please note that in the old three-node cluster ceph osd is still running but radosgw is not. Any suggestion will be greatly appreciated. Thanks. Regards, Fangzhe Chang ___ ceph-users mailing list ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph monitoring with graphite
Hi Wido, On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com wrote: I'm sending pool statistics to Graphite We're doing the same -- stripping invalid chars as needed -- and I would guess that lots of people have written similar json2graphite convertor scripts for Ceph monitoring in the recent months. It makes me wonder if it might be useful if Ceph had a --format mode to output df/stats/perf commands directly in graphite compatible text. Shouldn't be too difficult to write. Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount Cephfs
Most of the data is still here, but you won't be able to just mount it if it's inconsistent. I don't use CephFS so someone else could tell you if it's able to repair the filesystem with some parts missing. You lost part of the data where the copies were only on the 1 disk in one node and on either of the disks on the other node since no other copy exists. How much data you lost I don't exactly know, but since you only have 16 OSDs I'm afraid it will be in the order of ~3% probably? How many files are intact is a different question - it could be that every file is missing 3% of contents which would make the loss total. Guys? I have no idea how files map to pgs and object in CephFS... Jan On 26 Aug 2015, at 14:44, Andrzej Łukawski alukaw...@interia.pl wrote: Thank you for answer. I lost 2 disks on 1st node and 1 disk on 2nd. I understand it is not possible to recover the data even partially? Unfortunatelly those disks are lost forever. Andrzej W dniu 2015-08-26 o 12:26, Jan Schermer pisze: If you lost 3 disks with size 2 and at least 2 of those disks were in different host, that means you lost data with the default CRUSH. There's nothing you can do but either get those disks back in or recover from backup. Jan On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl mailto:alukaw...@interia.pl wrote: Hi, We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with four disks on each node. Ceph is configured to hold two replicas (size 2). We use this cluster for ceph filesystem. Few days ago we had power outage after which I had to replace three of our cluster OSD disks. All OSD disks are now online, but I'm unable to mount filesystem and constantly receive 'mount error 5 = Input/output error'. Ceph status shows many 'incomplete' pgs and that 'mds cluster is degraded'. According to 'ceph health detail' mds is replaying journal. [root@cnode0 ceph]# ceph -s cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24 health HEALTH_WARN 25 pgs backfill_toofull 10 pgs degraded 126 pgs down 263 pgs incomplete 54 pgs stale 10 pgs stuck degraded 263 pgs stuck inactive 54 pgs stuck stale 289 pgs stuck unclean 10 pgs stuck undersized 10 pgs undersized 4 requests are blocked 32 sec recovery 27139/10407227 objects degraded (0.261%) recovery 168597/10407227 objects misplaced (1.620%) 4 near full osd(s) too many PGs per OSD (312 max 300) mds cluster is degraded monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0} election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects 32825 GB used, 11698 GB / 44524 GB avail 27139/10407227 objects degraded (0.261%) 168597/10407227 objects misplaced (1.620%) 2153 active+clean 137 incomplete 126 down+incomplete 54 stale+active+clean 15 active+remapped+backfill_toofull 10 active+undersized+degraded+remapped+backfill_toofull 1 active+remapped [root@cnode0 ceph]# I wasn't able to find any solution in the Internet and I worry I will make things even worse when continue to troubleshoot this on my own. I'm stuck. Could you please help? Thanks. 
Andrzej ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
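Roughly: CephFS stripes every file into 4 MB RADOS objects named <inode-in-hex>.<object-index>, so you can check which PG (and OSDs) hold a given piece of a file and whether it sits on one of the lost PGs. A sketch, assuming the data pool is called 'data' and using an example inode:

  # Get the file's inode number and convert it to hex
  ls -i /mnt/cephfs/some/file        # e.g. prints inode 1099511627776
  printf '%x\n' 1099511627776        # -> 10000000000
  # The first object of that file is <hex-inode>.00000000
  ceph osd map data 10000000000.00000000
  # The output shows the pg id and acting OSD set; if that pg is one of
  # the down/incomplete ones, this part of the file is affected.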
[ceph-users] Migrating data into a newer ceph instance
Hi, We have been running Ceph/Radosgw version 0.80.7 (Giant) and have stored quite a lot of data in it. We are only using Ceph as an object store via radosgw. Last week the ceph-radosgw daemon suddenly refused to start (with the logs only showing an initialization timeout error on CentOS 7). This prompted me to install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new instance has a different set of key rings by default. The next step is to have all the data migrated. Does anyone know how to get the existing data out of the old ceph cluster (Giant) and into the new instance (Hammer)? Please note that in the old three-node cluster ceph-osd is still running but radosgw is not. Any suggestions will be greatly appreciated. Thanks. Regards, Fangzhe Chang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
Hi , We got a good deal on 843T and we are using it in our Openstack setup ..as journals . They have been running for last six months ... No issues . When we compared with Intel SSDs I think it was 3700 they were shade slower for our workload and considerably cheaper. We did not run any synthetic benchmark since we had a specific use case. The performance was better than our old setup so it was good enough. hth On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic andrija.pa...@gmail.com wrote: We have some 850 pro 256gb ssds if anyone interested to buy:) And also there was new 850 pro firmware that broke peoples disk which was revoked later etc... I'm sticking with only vacuum cleaners from Samsung for now, maybe... :) On Aug 25, 2015 12:02 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: To be honest, Samsung 850 PRO not 24/7 series... it's something about desktop+ series, but anyway - results from this drives - very very bad in any scenario acceptable by real life... Possible 845 PRO more better, but we don't want to experiment anymore... So we choose S3500 240G. Yes, it's cheaper than S3700 (about 2x times), and no so durable for writes, but we think more better to replace 1 ssd per 1 year than to pay double price now. 2015-08-25 12:59 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: And should I mention that in another CEPH installation we had samsung 850 pro 128GB and all of 6 ssds died in 2 month period - simply disappear from the system, so not wear out... Never again we buy Samsung :) On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com wrote: First read please: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ We are getting 200 IOPS in comparison to Intels3500 18.000 iops - those are constant performance numbers, meaning avoiding drives cache and running for longer period of time... Also if checking with FIO you will get better latencies on intel s3500 (model tested in our case) along with 20X better IOPS results... We observed original issue by having high speed at begining of i.e. file transfer inside VM, which than halts to zero... We moved journals back to HDDs and performans was acceptable...no we are upgrading to intel S3500... Best any details on that ? On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic andrija.pa...@gmail.com wrote: Make sure you test what ever you decide. We just learned this the hard way with samsung 850 pro, which is total crap, more than you could imagine... Andrija On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote: I would recommend Samsung 845 DC PRO (not EVO, not just PRO). Very cheap, better than Intel 3610 for sure (and I think it beats even 3700). Jan On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de wrote: Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator: Hi, most of the times I do get the recommendation from resellers to go with the intel s3700 for the journalling. Check out the Intel s3610. 3 drive writes per day for 5 years. Plus, it is cheaper than S3700. Regards, --ck ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Mariusz Gronczewski, Administrator Efigence S. A. ul. 
Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 F: [+48] 22 380 13 14 E: mariusz.gronczew...@efigence.com mailto:mariusz.gronczew...@efigence.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
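For anyone wanting to reproduce the comparison, the journal-suitability test from the blog post linked above boils down to measuring synchronous 4k writes, since the Ceph journal writes with O_DSYNC. A sketch (the device name is an example and the test overwrites it):

  # dd variant from the blog post
  dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync
  # fio equivalent
  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Journal-grade SSDs (S3500/S3700 class) sustain thousands to tens of thousands of these IOPS, while the consumer drives discussed above collapse to a few hundred.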
Re: [ceph-users] Migrating data into a newer ceph instance
I Would say the easiest way would be to leverage all the self-healing of ceph: add the new nodes to the old cluster, allow or force all the data to migrate between nodes, and then remove the old ones out. Well to be fair you could probably just install radosgw on another node and use it as your gateway without the need to even create a new OSD node... Or was there a reason to create a new cluster? I can tell you that one of the clusters I have has been around since bobtail, and now it's hammer... On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) fangzhe.ch...@alcatel-lucent.com wrote: Hi, We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite some amount of data in it. We are only using ceph as an object store via radosgw. Last week cheph-radosgw daemon suddenly refused to start (with logs only showing “initialization timeout” error on Centos 7). This triggers me to install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new instance has a different set of key rings by default. The next step is to have all the data migrated. Does anyone know how to get the existing data out of the old ceph cluster (Giant) and into the new instance (Hammer)? Please note that in the old three-node cluster ceph osd is still running but radosgw is not. Any suggestion will be greatly appreciated. Thanks. Regards, Fangzhe Chang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph monitoring with graphite
That would certainly be something we would use. QH On Wed, Aug 26, 2015 at 8:33 AM, Dan van der Ster d...@vanderster.com wrote: Hi Wido, On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com wrote: I'm sending pool statistics to Graphite We're doing the same -- stripping invalid chars as needed -- and I would guess that lots of people have written similar json2graphite convertor scripts for Ceph monitoring in the recent months. It makes me wonder if it might be useful if Ceph had a --format mode to output df/stats/perf commands directly in graphite compatible text. Shouldn't be too difficult to write. Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating data into a newer ceph instance
I tend to not do too much each time: either upgrade or data migrate. The actual upgrade process is seamless... So you can just as easily upgrade the current cluster to hammer, and add/remove nodes on the fly. All of this is quite seamless and straightforward (other than the data migration itself). On Wed, Aug 26, 2015 at 3:17 PM, Chang, Fangzhe (Fangzhe) fangzhe.ch...@alcatel-lucent.com wrote: Thanks, Luis. The motivation for using the newer version is to keep up-to-date with Ceph development, since we suspect the old versioned radosgw could not be restarted possibly due to library mismatch. Do you know whether the self-healing feature of ceph is applicable between different versions or not? Fangzhe *From:* Luis Periquito [mailto:periqu...@gmail.com] *Sent:* Wednesday, August 26, 2015 10:11 AM *To:* Chang, Fangzhe (Fangzhe) *Cc:* ceph-users@lists.ceph.com *Subject:* Re: [ceph-users] Migrating data into a newer ceph instance I Would say the easiest way would be to leverage all the self-healing of ceph: add the new nodes to the old cluster, allow or force all the data to migrate between nodes, and then remove the old ones out. Well to be fair you could probably just install radosgw on another node and use it as your gateway without the need to even create a new OSD node... Or was there a reason to create a new cluster? I can tell you that one of the clusters I have has been around since bobtail, and now it's hammer... On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) fangzhe.ch...@alcatel-lucent.com wrote: Hi, We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite some amount of data in it. We are only using ceph as an object store via radosgw. Last week cheph-radosgw daemon suddenly refused to start (with logs only showing “initialization timeout” error on Centos 7). This triggers me to install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new instance has a different set of key rings by default. The next step is to have all the data migrated. Does anyone know how to get the existing data out of the old ceph cluster (Giant) and into the new instance (Hammer)? Please note that in the old three-node cluster ceph osd is still running but radosgw is not. Any suggestion will be greatly appreciated. Thanks. Regards, Fangzhe Chang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
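For reference, the in-place upgrade itself is short. A sketch of the usual sequence (package commands and init scripts vary by distro; run it one node at a time, monitors first, then OSDs, then radosgw):

  ceph osd set noout                 # avoid rebalancing while daemons restart
  # switch the repo from firefly/giant to hammer, then on each node:
  yum update ceph ceph-radosgw       # or apt-get, depending on the distro
  /etc/init.d/ceph restart mon       # on the monitor nodes first
  /etc/init.d/ceph restart osd       # then the OSD nodes, one by one
  service ceph-radosgw restart       # finally the gateways
  ceph osd unset noout
  ceph tell osd.* version            # confirm everything reports 0.94.x
  ceph -s                            # and that the cluster returns to HEALTH_OK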
Re: [ceph-users] Unexpected AIO Error
Did you update: http://tracker.ceph.com/issues/12100 Just question. Shinobu On Wed, Aug 26, 2015 at 8:09 PM, Pontus Lindgren pon...@oderland.se wrote: Hello, I am experiencing an issue where OSD Services fail due to an unexpected aio error. This has happend on two different OSD servers killing two different OSD Daemons services. I am running Ceph Hammer on Debian Wheezy with a backported Kernel(3.16.0-0.bpo.4-amd64). Below is the log from one of the crashes. I am wondering if anyone else has experienced this issue and might be able to point out some troubleshooting steps? so far all i’ve found are similar issues on the ceph bug tracker. I have posted my case to that as well. 2015-08-16 08:11:54.227567 7f13d68de700 0 log_channel(cluster) log [WRN] : 3 slow requests, 3 included below; oldest blocked for 30.685081 secs 2015-08-16 08:11:54.227579 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.685081 seconds old, received at 2015-08-16 08:11:23.542417: osd_op(client.1109461.0:219374023 rbd_data.10e67e79e2a9e3.0001c201 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2592768~4096] 5.89587894 ack+ondisk+write e1804) currently waiting for subops from 1,30 2015-08-16 08:11:54.227587 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.682262 seconds old, received at 2015-08-16 08:11:23.545236: osd_repop(client.1109461.0:219374083 5.c63 d6b85c63/rbd_data.10e67e79e2a9e3.0001a800/head//5 v 1804'121436) currently started 2015-08-16 08:11:54.227592 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.641702 seconds old, received at 2015-08-16 08:11:23.585797: osd_repop(client.1935041.0:1302764 5.82a 4219482a/rbd_data.1d685c2eb141f2.3c5f/head//5 v 1804'265055) currently started 2015-08-16 08:11:55.227784 7f13d68de700 0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for 31.685317 secs 2015-08-16 08:11:55.227808 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.788521 seconds old, received at 2015-08-16 08:11:24.439213: osd_repop(client.1224667.0:34531998 5.abe 2f457abe/rbd_data.12aacc79e2a9e3.1d9d/head//5 v 1804'27936) currently started 2015-08-16 08:11:56.075649 7f13d3d89700 -1 journal aio to 7994220544~8192 wrote 18446744073709551611 2015-08-16 08:11:56.091460 7f13d3d89700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7f13d3d89700 time 2015-08-16 08:11:56.076462 os/FileJournal.cc: 1426: FAILED assert(0 == unexpected aio error) ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0xcdb572] 2: (FileJournal::write_finish_thread_entry()+0x847) [0xb9a437] 3: (FileJournal::WriteFinisher::entry()+0xd) [0xa3befd] 4: (()+0x6b50) [0x7f13de90ab50] 5: (clone()+0x6d) [0x7f13dd32695d] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. Pontus Lindgren System Engineer ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Email: shin...@linux.com ski...@redhat.com Life w/ Linux http://i-shinobu.hatenablog.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Why are RGW pools all prefixed with a period (.)?
On Wed, Aug 26, 2015 at 9:36 AM, Wido den Hollander w...@42on.com wrote: Hi, It's something which has been 'bugging' me for some time now. Why are RGW pools prefixed with a period? I tried setting the root pool to 'rgw.root', but RGW (0.94.1) refuses to start: ERROR: region root pool name must start with a period I'm sending pool statistics to Graphite and when sending a key like this you 'break' Graphite: ceph.pools.stats.pool_name.kb_read A pool like .rgw.root will break this since Graphite splits on periods. So is there any reason why this is? What's the reasoning behind it? This might just be a leftover from when we were mapping buckets into RADOS pools. Yehuda, is there some more current reason? -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Can't mount Cephfs
There is a cephfs-journal-tool that I believe is present in hammer and ought to let you get your MDS through replay. Depending on which PGs were lost you will have holes and/or missing files, in addition to not being able to find parts of the directory hierarchy (and maybe getting crashes if you access them). You can explore the options there and if the documentation is sparse, feel free to ask questions... -Greg On Wed, Aug 26, 2015 at 1:44 PM, Andrzej Łukawski alukaw...@interia.pl wrote: Thank you for answer. I lost 2 disks on 1st node and 1 disk on 2nd. I understand it is not possible to recover the data even partially? Unfortunatelly those disks are lost forever. Andrzej W dniu 2015-08-26 o 12:26, Jan Schermer pisze: If you lost 3 disks with size 2 and at least 2 of those disks were in different host, that means you lost data with the default CRUSH. There's nothing you can do but either get those disks back in or recover from backup. Jan On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl wrote: Hi, We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with four disks on each node. Ceph is configured to hold two replicas (size 2). We use this cluster for ceph filesystem. Few days ago we had power outage after which I had to replace three of our cluster OSD disks. All OSD disks are now online, but I'm unable to mount filesystem and constantly receive 'mount error 5 = Input/output error'. Ceph status shows many 'incomplete' pgs and that 'mds cluster is degraded'. According to 'ceph health detail' mds is replaying journal. [root@cnode0 ceph]# ceph -s cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24 health HEALTH_WARN 25 pgs backfill_toofull 10 pgs degraded 126 pgs down 263 pgs incomplete 54 pgs stale 10 pgs stuck degraded 263 pgs stuck inactive 54 pgs stuck stale 289 pgs stuck unclean 10 pgs stuck undersized 10 pgs undersized 4 requests are blocked 32 sec recovery 27139/10407227 objects degraded (0.261%) recovery 168597/10407227 objects misplaced (1.620%) 4 near full osd(s) too many PGs per OSD (312 max 300) mds cluster is degraded monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0} election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects 32825 GB used, 11698 GB / 44524 GB avail 27139/10407227 objects degraded (0.261%) 168597/10407227 objects misplaced (1.620%) 2153 active+clean 137 incomplete 126 down+incomplete 54 stale+active+clean 15 active+remapped+backfill_toofull 10 active+undersized+degraded+remapped+backfill_toofull 1 active+remapped [root@cnode0 ceph]# I wasn't able to find any solution in the Internet and I worry I will make things even worse when continue to troubleshoot this on my own. I'm stuck. Could you please help? Thanks. Andrzej ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
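For reference, a rough outline of that recovery path with the hammer tools (a sketch; export a backup before resetting anything, and given the lost PGs expect some missing files afterwards):

  # With the MDS daemons stopped (or stuck in replay):
  cephfs-journal-tool journal inspect                  # is the journal readable at all?
  cephfs-journal-tool journal export backup.bin        # keep a copy before touching it
  cephfs-journal-tool event recover_dentries summary   # salvage metadata still in the journal
  cephfs-journal-tool journal reset                    # last resort: discard the damaged journal
  # then start the MDS again and watch 'ceph -s' / 'ceph mds dump'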
[ceph-users] [ANN] ceph-deploy 1.5.28 released
Hi everyone, A new version of ceph-deploy has been released. Version 1.5.28 includes the following: - A fix for a regression introduced in 1.5.27 that prevented importing GPG keys on CentOS 6 only. - Will prevent Ceph daemon deployment on nodes that don't have Ceph installed on them. - Makes it possible to go from 1 monitor daemon to 2 without a 5 minute hang/delay. - More systemd enablement work. Full changelog is at [1]. Updated packages have been uploaded to {rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI. Cheers, - Travis [1] http://ceph.com/ceph-deploy/docs/changelog.html#id2 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
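For example, picking up the new version and adding a second monitor now looks like this (hostnames are placeholders):

  pip install --upgrade ceph-deploy      # or update from the ceph.com repos above
  ceph-deploy --version                  # should report 1.5.28
  ceph-deploy mon add mon2               # going from 1 to 2 mons no longer hangs for ~5 minutes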
Re: [ceph-users] Rados: Undefined symbol error
- Original Message - From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com To: Jason Dillaman dilla...@redhat.com Cc: ceph-us...@ceph.com Sent: Thursday, 27 August, 2015 6:22:45 AM Subject: Re: [ceph-users] Rados: Undefined symbol error Hello Jason, I checked the version of my built packages and they are all 9.0.2. I purged the cluster and uninstalled the packages and there seems to be nothing else - no older version. Could you elaborate on the fix for this issue? Some thoughts... # c++filt _ZN5MutexC1ERKSsbbbP11CephContext Mutex::Mutex(std::basic_stringchar, std::char_traitschar, std::allocatorchar const, bool, bool, bool, CephContext*) Thats from common/Mutex.cc # nm --dynamic `which rados` 21|grep Mutex 00504da0 T _ZN5Mutex4LockEb 00504f70 T _ZN5Mutex6UnlockEv 00504a50 T _ZN5MutexC1EPKcbbbP11CephContext 00504a50 T _ZN5MutexC2EPKcbbbP11CephContext 00504d10 T _ZN5MutexD1Ev 00504d10 T _ZN5MutexD2Ev This shows my version is defined in the text section of the binary itself. What do you get when you run the above command? Like Jason says this is some sort of mis-match between your rados binary and your installed libs. HTH, Brad Thanks, Aakanksha -Original Message- From: Jason Dillaman [mailto:dilla...@redhat.com] Sent: Friday, August 21, 2015 6:37 AM To: Aakanksha Pudipeddi-SSI Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] Rados: Undefined symbol error It sounds like you have rados CLI tool from an earlier Ceph release ( Hammer) installed and it is attempting to use the librados shared library from a newer (= Hammer) version of Ceph. Jason - Original Message - From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com To: ceph-us...@ceph.com Sent: Thursday, August 20, 2015 11:47:26 PM Subject: [ceph-users] Rados: Undefined symbol error Hello, I cloned the master branch of Ceph and after setting up the cluster, when I tried to use the rados commands, I got this error: rados: symbol lookup error: rados: undefined symbol: _ZN5MutexC1ERKSsbbbP11CephContext I saw a similar post here: http://tracker.ceph.com/issues/12563 but I am not clear on the solution for this problem. I am not performing an upgrade here but the error seems to be similar. Could anybody shed more light on the issue and how to solve it? Thanks a lot! Aakanksha ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
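One more quick check worth running is to see which librados the CLI actually loads (the paths below are the common defaults; adjust for your distro or a /usr/local install):

  ldd $(which rados) | grep librados
  nm -D /usr/lib/librados.so.2 | c++filt | grep 'Mutex::Mutex'
  # If the rados binary wants Mutex(std::string, ...) but the installed
  # librados only exports Mutex(char const*, ...), a stale binary or
  # library from an older build is still on the path:
  ls -l /usr/local/bin/rados /usr/local/lib/librados* 2>/dev/null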
Re: [ceph-users] RadosGW - multiple dns names
On Wed, Aug 26, 2015 at 11:52:02AM +0100, Luis Periquito wrote: On Mon, Feb 23, 2015 at 10:18 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: -- *From: *Shinji Nakamoto shinji.nakam...@mgo.com *To: *ceph-us...@ceph.com *Sent: *Friday, February 20, 2015 3:58:39 PM *Subject: *[ceph-users] RadosGW - multiple dns names We have multiple interfaces on our Rados gateway node, each of which is assigned to one of our many VLANs with a unique IP address. Is it possible to set multiple DNS names for a single Rados GW, so it can handle the request to each of the VLAN specific IP address DNS names? Not yet, however, the upcoming hammer release will support that (hostnames will be configured as part of the region). I tested this using Hammer ( 0.94.2) and it doesn't seem to work. I'm just adding multiple rgw dns name lines to the configuration. Did it make Hammer, or am I doing it the wrong way? I couldn't find any docs either way... http://ceph.com/docs/master/radosgw/config-ref/#get-a-region Look at the hostname entry, which has a description of: hostnames: A list of all the hostnames in the region. For example, you may use multiple domain names to refer to the same region. Optional. The rgw dns name setting will automatically be included in this list. You should restart the radosgw daemon(s) after changing this setting. Then you can stop using 'rgw dns name'. What the functionality does NOT do, is allow you to require a specific hostname arrives on a specific interface. All hostnames are valid for all interfaces/IPs. If you want to restrict it, I'd suggest doing the validation in haproxy, in front of civetweb. -- Robin Hugh Johnson Gentoo Linux: Developer, Infrastructure Lead E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
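Concretely, with hammer that looks something like the following (a sketch; the hostnames are examples and the region is assumed to be the default one):

  radosgw-admin region get > region.json
  # edit region.json so it contains e.g.
  #   "hostnames": ["prd-apiceph001", "prd-backendceph001"],
  radosgw-admin region set < region.json
  radosgw-admin regionmap update
  service ceph-radosgw restart       # restart the gateway(s) so the change takes effect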
Re: [ceph-users] rados bench object not correct errors on v9.0.3
-Original Message- From: Dałek, Piotr [mailto:piotr.da...@ts.fujitsu.com] Sent: Wednesday, August 26, 2015 2:02 AM To: Sage Weil; Deneau, Tom Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com Subject: RE: rados bench object not correct errors on v9.0.3 -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil Sent: Tuesday, August 25, 2015 7:43 PM I have built rpms from the tarball http://ceph.com/download/ceph-9.0.3.tar.bz2. I have done this for Fedora 21 x86_64 and for aarch64. On both platforms, when I run a single-node cluster with a few osds and run rados bench read tests (either seq or rand), I get occasional reports like benchmark_data_myhost_20729_object73 is not correct! I never saw these with similar rpm builds on these platforms from 9.0.2 sources. Also, if I go to an x86-64 system running Ubuntu trusty, for which I am able to install prebuilt binary packages via ceph-deploy install --dev v9.0.3, I do not see the errors there. Hrm.. haven't seen it on this end, but we're running/testing master and not 9.0.2 specifically. If you can reproduce this on master, that'd be very helpful! There have been some recent changes to rados bench... Piotr, does this seem like it might be caused by your changes? Yes. My PR #4690 (https://github.com/ceph/ceph/pull/4690) made rados bench fast enough to sometimes run into a race condition between librados's AIO and objbencher processing. That was fixed in PR #5152 (https://github.com/ceph/ceph/pull/5152), which didn't make it into 9.0.3. Tom, you can confirm this by inspecting the contents of the objects in question (their contents should be perfectly fine and in line with other objects). In the meantime you can either apply the patch from PR #5152 on your own or use --no-verify. With best regards / Pozdrawiam Piotr Dałek -- Thank you. Yes, when I looked at the contents of the objects they always looked correct. And yes, a single object would sometimes report an error and sometimes not, so a race condition makes sense. A couple of questions: * Why would I not see this behavior using the pre-built 9.0.3 binaries that get installed using ceph-deploy install --dev v9.0.3? I would assume these are built from the same sources as the 9.0.3 tarball. * So I assume one should not compare pre-9.0.3 rados bench numbers with 9.0.3 and after? The pull request https://github.com/ceph/ceph/pull/4690 did not mention the effect on the final bandwidth numbers; did you notice what that effect was? -- Tom ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
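In practice that means running the read benchmarks like this until the fix lands (the pool name and durations are only examples):

  rados bench -p rbd 60 write --no-cleanup   # write objects and keep them around
  rados bench -p rbd 60 seq --no-verify      # read back without the racy content check
  rados bench -p rbd 60 rand --no-verify
  rados -p rbd cleanup                       # remove the benchmark objects afterwards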
[ceph-users] Question regarding degraded PGs
Hey guys...

1./ I have a simple question regarding the appearance of degraded PGs. First, for reference:
  a. I am working with 0.94.2.
  b. I have 32 OSDs distributed across 4 servers, meaning 8 OSDs per server.
  c. Our cluster is set with 'osd pool default size = 3' and 'osd pool default min size = 2'.

2./ I am testing the cluster in several disaster scenarios, and I deliberately powered down a storage server with its 8 OSDs. At this point everything went fine: during the night the cluster performed all the recovery I/O, and in the morning I had a 'HEALTH_OK' cluster running on only 3 servers and 24 OSDs.

3./ I have now powered up the missing server and, as expected, the cluster enters 'HEALTH_WARN' and adjusts itself to the presence of one more server and 8 more populated OSDs.

4./ However, what I do not understand is why, during this process, some PGs are reported as degraded. See the 'ceph -s' output below. As far as I understand, a degraded PG means that Ceph has not yet replicated some objects in the placement group the correct number of times. That should not be the case here, because we started from a 'HEALTH_OK' situation in which all PGs were coherent. What happens under the covers when this server (and its 8 populated OSDs) rejoins the cluster that triggers the appearance of degraded PGs?

# ceph -s
    cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
     health HEALTH_WARN
            115 pgs backfill
            121 pgs backfilling
            513 pgs degraded
            31 pgs recovering
            309 pgs recovery_wait
            513 pgs stuck degraded
            576 pgs stuck unclean
            recovery 198838/8567132 objects degraded (2.321%)
            recovery 3267325/8567132 objects misplaced (38.138%)
     monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X.34:6789/0,mon3=X.X.X.X:6789/0}
            election epoch 24, quorum 0,1,2 mon1,mon3,mon2
     mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
     osdmap e4764: 32 osds: 32 up, 32 in; 555 remapped pgs
      pgmap v1159567: 2176 pgs, 2 pools, 6515 GB data, 2240 kobjects
            22819 GB used, 66232 GB / 89051 GB avail
            198838/8567132 objects degraded (2.321%)
            3267325/8567132 objects misplaced (38.138%)
                1600 active+clean
                 292 active+recovery_wait+degraded+remapped
                 113 active+degraded+remapped+backfilling
                  60 active+degraded+remapped+wait_backfill
                  55 active+remapped+wait_backfill
                  27 active+recovering+degraded+remapped
                  17 active+recovery_wait+degraded
                   8 active+remapped+backfilling
                   4 active+recovering+degraded
recovery io 521 MB/s, 170 objects/s

Cheers Goncalo -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
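The short answer: when the 8 OSDs rejoin, they re-enter the acting sets of their old PGs but are missing every object written while they were down, so those PGs count as degraded until recovery copies the missing objects back; the rest is backfill of data that was remapped in the meantime. A few commands help to watch that happening (the pg id is an example):

  ceph pg dump_stuck degraded        # which pgs are currently degraded
  ceph pg dump_stuck unclean
  ceph pg 5.1a query                 # peering state, missing objects, recovery progress for one pg
  ceph health detail | grep -E 'degraded|backfill'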
Re: [ceph-users] Ceph Day Raleigh Cancelled
On that note, in regards to watching /cephdays/ for details, the RSS feed 404's! http://ceph.com/cephdays/feed/ Regards, Matt. On 27/08/2015 02:52, Patrick McGarry wrote: Yeah, we're still working on nailing down a venue for the Melbourne event (but it looks like 05 Nov is probably the date). As soon as we have a venue confirmed we'll put out a call for speakers and post the details on the /cephdays/ page. Thanks! On Tue, Aug 25, 2015 at 7:47 PM, Goncalo Borges gonc...@physics.usyd.edu.au wrote: Hey Patrick... I am interested in the Melbourne one. Under http://ceph.com/cephdays/ I do not see any reference to it. Can you give me more details on that? TIA Goncalo On 08/26/2015 07:23 AM, Patrick McGarry wrote: Due to low registration this event is being pushed back to next year. The Ceph Day events for Shanghai, Tokyo, and Melbourne should all still be proceeding as planned, however. Feel free to contact me if you have any questions about Ceph Days. Thanks. -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] docker distribution
looks like it only works if an nginx is in front of radosgw, it translates absolute URIs and maybe fix other issues: https://github.com/docker/distribution/pull/808#issuecomment-135286314 https://github.com/docker/distribution/pull/902 On Mon, Aug 17, 2015 at 1:37 PM, Lorieri lori...@gmail.com wrote: Hi, Docker changed the old docker-registry project to docker-distribution and its API to v2. It now uses librados instead of radosgw to save data. In some ceph installations it is easier to get access to radosgw than to the cluster, so I've made a pull request to add radosgw support, it would be great if you test it. https://hub.docker.com/r/lorieri/docker-distribution-generic-s3/ Note: if you already use the old docker-registry you must create another bucket and push the images again, the API changed to v2. There is a shellscript to help https://github.com/docker/migrator How I tested it: docker run -d -p 5000:5000 -e REGISTRY_STORAGE=s3 \ -e REGISTRY_STORAGE_S3_REGION=generic \ -e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://myradosgw.mydomain.com; \ -e REGISTRY_STORAGE_S3_BUCKET=registry \ -e REGISTRY_STORAGE_S3_ACCESSKEY=XXX \ -e REGISTRY_STORAGE_S3_SECRETKEY=XXX \ -e REGISTRY_STORAGE_S3_SECURE=false \ -e REGISTRY_STORAGE_S3_ENCRYPT=false \ -e REGISTRY_STORAGE_S3_REGIONSUPPORTSHEAD=false \ lorieri/docker-distribution-generic-s3 thanks, -lorieri ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com