Re: [ceph-users] buckets and users
Your solution of prepending the environment name to the bucket was my first choice, but at the moment I can't ask the devs to change the code to do that. For now I have to stick with the zones solution. Should I follow the federated zones docs (http://ceph.com/docs/master/radosgw/federated-config/) but skip the sync step? Thank you, Marco Garcês On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote: You could set up dedicated zones for each environment, and not replicate between them. Each zone would have its own URL, but you would be able to re-use usernames and bucket names. If different URLs are a problem, you might be able to get around that in the load balancer or the web servers. I wouldn't really recommend that, but it's possible. I have a similar requirement. I was able to prepend the environment name to the bucket in my client code, which made things much easier. On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote: Hi there, I have this situation where I'm using the same Ceph cluster (with radosgw) for two different environments, QUAL and PRE-PRODUCTION. I need different users for each environment, but I need to create the same buckets, with the same names; I understand there is no way to have 2 buckets with the same name, but how can I go around this? Perhaps creating a different pool for each user? Can you help me? Thank you in advance, my best regards, Marco Garcês ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
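(For reference, a minimal sketch of the "zones without replication" approach on the command line, assuming a zone named env2 defined in a file env2-zone.json -- both names are only examples, and the pools referenced by the JSON still have to be created separately:

radosgw-admin zone set --rgw-zone=env2 --infile env2-zone.json
radosgw-admin zone list
radosgw-admin regionmap update

As long as no radosgw-agent is configured between the zones, nothing is synchronized, so each environment keeps its own users and buckets.)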
[ceph-users] All OSDs don't restart after shutdown
Dear Users, I'm quite new to Ceph. I completed the tutorial here: http://ceph.com/docs/giant/start/quick-ceph-deploy After it, I turned off the VMs where the OSDs, Monitors and MDS were. This morning I restarted the machines, but the OSDs don't want to restart, while the other services restarted without any problems. On one node: [root@ceph-node1 ~]# service ceph status === mon.ceph-node1 === mon.ceph-node1: running {"version":"0.80.7"} === osd.2 === osd.2: not running. === mds.ceph-node1 === mds.ceph-node1: running {"version":"0.80.7"} [root@ceph-node1 ~]# /etc/init.d/ceph -a start osd.2 === osd.2 === failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.2 --keyring=/var/lib/ceph/osd/ceph-2/keyring osd crush create-or-move -- 2 0.01 host=ceph-node1 root=default' This is happening on all the machines. On the admin-node side the ceph health command or ceph -w hangs forever. The log files don't show any problem; the last line on the OSDs is from yesterday. Could anyone help me to solve this problem? Thank you. Cheers. Luca ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] All OSDs don't restart after shutdown
On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro luca.mazzafe...@rzg.mpg.de wrote: Dear Users, Hi Luca, On the admin-node side the ceph healt command or the ceph -w hangs forever. I'm not a ceph expert either, but this is usually an indication that the monitors are not running. How many MONs are you running? Are they all alive? What's in the mon logs? Also check the time on the mon nodes. cheers, Antonio -- antonio.s.mess...@gmail.com antonio.mess...@uzh.ch +41 (0)44 635 42 22 S3IT: Service and Support for Science IT http://www.s3it.uzh.ch/ University of Zurich Winterthurerstrasse 190 CH-8057 Zurich Switzerland ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
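(A quick way to answer those questions on each monitor host, assuming the default admin socket location and a monitor id of ceph-node1 -- adjust the name per node:

ceph daemon mon.ceph-node1 mon_status    # shows rank, state and quorum members
ntpq -p                                  # check for clock skew between the nodes

If mon_status reports the monitor stuck in "probing", it cannot reach enough of its peers to form a quorum.)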
[ceph-users] Typical 10GbE latency
Hello, While working at a customer I've run into 10GbE latency which seems high to me. I have access to a couple of Ceph clusters and I ran a simple ping test: $ ping -s 8192 -c 100 -n <ip> Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environments are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has higher latency compared to the other setups. You would say the switches are to blame, but we also tried with a direct TwinAx connection, and that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering: others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
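(A small helper to run the same test against several hosts in one go; the host names are placeholders:

#!/bin/sh
# 100 pings of 8192 bytes to each host, print only the rtt summary line
for host in osd1 osd2 osd3; do
    printf '%s: ' "$host"
    ping -s 8192 -c 100 -n -q "$host" | tail -n 1
done
)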
[ceph-users] PG inconsistency
Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even when there is no read error. While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community on the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens). In more detail, I have the following questions: 1. When an inconsistency is detected, what is the chance there is some hardware issue which needs to be repaired physically, and should I run some disk/filesystem tools to check further? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets? It would be great to hear your experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
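(A rough sketch of the usual first steps, not a substitute for checking the hardware; the PG id is a placeholder:

ceph health detail | grep inconsistent          # list the inconsistent PGs
grep -i 'scrub' /var/log/ceph/ceph-osd.*.log    # find which object/OSD reported the error
ceph pg deep-scrub <pgid>                       # re-check a single PG
ceph pg repair <pgid>                           # ask the primary to repair the PG; in firefly this
                                                # copies from the primary, so check which copy is
                                                # actually good before running it
)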
Re: [ceph-users] Typical 10GbE latency
Between two hosts on an HP Procurve 6600, no jumbo frames: rtt min/avg/max/mdev = 0.096/0.128/0.151/0.019 ms Cheers, Dan On Thu Nov 06 2014 at 2:19:07 PM Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG inconsistency
Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us. Cheers, Dan On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.com wrote: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
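(For reference, the usual sequence for draining and removing an OSD that way, with <id> as a placeholder:

ceph osd out <id>                 # let the data re-replicate; wait for active+clean
/etc/init.d/ceph stop osd.<id>    # on the OSD's host
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm <id>
)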
Re: [ceph-users] Typical 10GbE latency
Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS - if I remember correctly there was a lot of improvements in the latest kernels... And what about the bandwith? The values you present don't seem awfully high, and the deviation seems low. On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] All OSDs don't restart after shutdown
On 11/06/2014 12:36 PM, Antonio Messina wrote: On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro luca.mazzafe...@rzg.mpg.de wrote: Dear Users, Hi Luca, On the admin-node side the ceph healt command or the ceph -w hangs forever. I'm not a ceph expert either, but this is usually an indication that the monitors are not running. How many MONs are you running? Are they all alive? What's in the mon logs? Also check the time on the mon nodes. cheers, Antonio Ciao Antonio, thank you very much for your answer. I'm running 3 MONs and they are all alive. The logs doesn't shows any problem that I can recognize. This is a section after a restart from the initial monitor: 2014-11-06 14:31:36.795298 7fb66e4867a0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-mon, pid 28050 2014-11-06 14:31:36.860884 7fb66e4867a0 0 starting mon.ceph-node1 rank 0 at 192.168.122.21:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-node1 fsid 62e03428-0c4a-4ede-be18-c2cfed10639d 2014-11-06 14:31:36.861383 7fb66e4867a0 1 mon.ceph-node1@-1(probing) e3 preinit fsid 62e03428-0c4a-4ede-be18-c2cfed10639d 2014-11-06 14:31:36.862614 7fb66e4867a0 1 mon.ceph-node1@-1(probing).paxosservice(pgmap 1..218) refresh upgraded, format 0 - 1 2014-11-06 14:31:36.862666 7fb66e4867a0 1 mon.ceph-node1@-1(probing).pg v0 on_upgrade discarding in-core PGMap 2014-11-06 14:31:36.866958 7fb66e4867a0 0 mon.ceph-node1@-1(probing).mds e4 print_map epoch4 flags0 created2014-11-04 12:30:56.224692 modified2014-11-05 13:00:53.377356 tableserver0 root0 session_timeout60 session_autoclose300 max_file_size1099511627776 last_failure0 last_failure_osd_epoch0 compatcompat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap} max_mds1 in0 up{0=4243} failed stopped data_pools0 metadata_pool1 inline_datadisabled 4243:192.168.122.21:6805/28039 'ceph-node1' mds.0.1 up:active seq 2 2014-11-06 14:31:36.867144 7fb66e4867a0 0 mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, adjusting msgr requires 2014-11-06 14:31:36.867155 7fb66e4867a0 0 mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, adjusting msgr requires 2014-11-06 14:31:36.867157 7fb66e4867a0 0 mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, adjusting msgr requires 2014-11-06 14:31:36.867159 7fb66e4867a0 0 mon.ceph-node1@-1(probing).osd e15 crush map has features 1107558400, adjusting msgr requires 2014-11-06 14:31:36.867850 7fb66e4867a0 1 mon.ceph-node1@-1(probing).paxosservice(auth 1..37) refresh upgraded, format 0 - 1 2014-11-06 14:31:36.868898 7fb66e4867a0 0 mon.ceph-node1@-1(probing) e3 my rank is now 0 (was -1) 2014-11-06 14:31:36.869655 7fb666410700 0 -- 192.168.122.21:6789/0 192.168.122.22:6789/0 pipe(0x2b18a00 sd=22 :0 s=1 pgs=0 cs=0 l=0 c=0x2950c60).fault 2014-11-06 14:31:36.869817 7fb66630f700 0 -- 192.168.122.21:6789/0 192.168.122.23:6789/0 pipe(0x2b19680 sd=21 :0 s=1 pgs=0 cs=0 l=0 c=0x29518c0).fault 2014-11-06 14:31:52.224266 7fb66580d700 0 -- 192.168.122.21:6789/0 192.168.122.22:6789/0 pipe(0x2b1be80 sd=23 :6789 s=0 pgs=0 cs=0 l=0 c=0x2951b80).accept connect_seq 0 vs existing 0 state connecting 2014-11-06 14:31:57.987230 7fb66570c700 0 -- 192.168.122.21:6789/0 192.168.122.23:6789/0 pipe(0x2b1d280 sd=24 :6789 s=0 pgs=0 cs=0 l=0 c=0x2951ce0).accept connect_seq 0 vs existing 0 state connecting 2014-11-06 14:32:36.868421 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats 
avail 20% total 8563152 used 6364364 avail 1763796 2014-11-06 14:32:36.868739 7fb668213700 0 log [WRN] : reached concerning levels of available space on local monitor storage (20% free) 2014-11-06 14:33:36.869029 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 8563152 used 6364364 avail 1763796 2014-11-06 14:34:36.869285 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 8563152 used 6364364 avail 1763796 2014-11-06 14:35:36.869588 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 8563152 used 6364364 avail 1763796 2014-11-06 14:36:36.869910 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 8563152 used 6364364 avail 1763796 2014-11-06 14:37:36.870395 7fb668213700 0 mon.ceph-node1@0(probing).data_health(0) update_stats avail 20% total 8563152 used 6364364 avail 1763796 Instead from my admin node waiting for about 5 minutes I got this: [rzgceph@admin-node my-cluster]$ ceph -s 2014-11-06 12:18:43.723751 7f3f5d645700 0 monclient(hunting): authenticate timed out after 300 2014-11-06 12:18:43.723848 7f3f5d645700 0 librados: client.admin authentication error (110) Connection timed out Which leads me to this discussion:
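(Given the pipe ... fault lines above and the monitor sitting in "probing", one thing worth checking on freshly rebooted CentOS/RHEL VMs is whether the other monitors are reachable on port 6789 at all -- for example a firewall that came back after the reboot. The commands below are just an example from ceph-node1:

nc -zv 192.168.122.22 6789
nc -zv 192.168.122.23 6789
iptables -L -n | grep 6789

The "reached concerning levels of available space on local monitor storage (20% free)" warning is a separate issue, but worth addressing before the monitor store fills up.)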
Re: [ceph-users] PG inconsistency
Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not an filesystem expert here, but would like to understand the underlying what happened behind the EIO and does that reveal something (e.g. hardware issue). In our case, we are using 6TB drive so that there are lot of data to migrate and as backfilling/recovering bring latency increasing, we hope to avoid that as much as we can.. Thanks, Guang From: daniel.vanders...@cern.ch Date: Thu, 6 Nov 2014 13:36:46 + Subject: Re: PG inconsistency To: yguan...@outlook.com; ceph-users@lists.ceph.com Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us. Cheers, Dan On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.commailto:yguan...@outlook.com wrote: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
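(To confirm whether an EIO really came from the drive rather than the filesystem, the kernel log and the drive's own error counters are usually the quickest check; device name is a placeholder:

dmesg | grep -i 'i/o error'
smartctl -H /dev/sdX            # overall SMART health verdict
)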
Re: [ceph-users] PG inconsistency
What is your version of the ceph? 0.80.0 - 0.80.3 https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b Thu Nov 06 2014 at 16:24:21, GuangYang yguan...@outlook.com: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
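(To check which version the running daemons actually report, as opposed to the installed packages:

ceph --version            # locally installed binaries
ceph tell osd.0 version   # what a running OSD reports
)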
Re: [ceph-users] PG inconsistency
We are using v0.80.4. Just would like to ask for general suggestion here :) Thanks, Guang From: malm...@gmail.com Date: Thu, 6 Nov 2014 13:46:12 + Subject: Re: [ceph-users] PG inconsistency To: yguan...@outlook.com; ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com What is your version of the ceph? 0.80.0 - 0.80.3 https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b Thu Nov 06 2014 at 16:24:21, GuangYang yguan...@outlook.commailto:yguan...@outlook.com: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG inconsistency
Thu Nov 06 2014 at 16:44:09, GuangYang yguan...@outlook.com: Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not an filesystem expert here, but would like to understand the underlying what happened behind the EIO and does that reveal something (e.g. hardware issue). In our case, we are using 6TB drive so that there are lot of data to migrate and as backfilling/recovering bring latency increasing, we hope to avoid that as much as we can.. For example, use the following parameters: osd_recovery_delay_start = 10 osd recovery op priority = 2 osd max backfills = 1 osd recovery max active =1 osd recovery threads = 1 Thanks, Guang From: daniel.vanders...@cern.ch Date: Thu, 6 Nov 2014 13:36:46 + Subject: Re: PG inconsistency To: yguan...@outlook.com; ceph-users@lists.ceph.com Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us. Cheers, Dan On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.commailto:yguan...@outlook.com wrote: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
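(Those can also be applied to a running cluster without restarting the OSDs -- note that injectargs changes are lost on restart unless the values are also put in ceph.conf:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 2'
)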
Re: [ceph-users] Typical 10GbE latency
also, between two hosts on a NetGear SW model at 10GbE: rtt min/avg/max/mdev = 0.104/0.196/0.288/0.055 ms German Anders --- Original message --- Asunto: [ceph-users] Typical 10GbE latency De: Wido den Hollander w...@42on.com Para: ceph-us...@ceph.com Fecha: Thursday, 06/11/2014 10:18 Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG inconsistency
IIRC, the EIO we had also correlated with a SMART status that showed the disk was bad enough for a warranty replacement -- so yes, I replaced the disk in these cases. Cheers, Dan On Thu Nov 06 2014 at 2:44:08 PM GuangYang yguan...@outlook.com wrote: Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not an filesystem expert here, but would like to understand the underlying what happened behind the EIO and does that reveal something (e.g. hardware issue). In our case, we are using 6TB drive so that there are lot of data to migrate and as backfilling/recovering bring latency increasing, we hope to avoid that as much as we can.. Thanks, Guang From: daniel.vanders...@cern.ch Date: Thu, 6 Nov 2014 13:36:46 + Subject: Re: PG inconsistency To: yguan...@outlook.com; ceph-users@lists.ceph.com Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us. Cheers, Dan On Thu Nov 06 2014 at 2:24:32 PM GuangYang yguan...@outlook.commailto:yguan...@outlook.com wrote: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster, there were two major patterns leading to inconsistency as I observed: 1) EIO to read the file, 2) the digest is inconsistent (for EC) even there is no read error). While ceph has built-in tool sets to repair the inconsistencies, I also would like to check with the community in terms of what is the best ways to handle such issues (e.g. should we run fsck / xfs_repair when such issue happens). In more details, I have the following questions: 1. When there is inconsistency detected, what is the chance there is some hardware issues which need to be repaired physically, or should I run some disk/filesystem tools to further check? 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we solely relay on Ceph's repair tool sets? It would be great to hear you experience and suggestions. BTW, we are using XFS in the cluster. Thanks, Guang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
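(For anyone wanting to do the same correlation, the relevant SMART counters can be pulled with smartctl; the device name is a placeholder:

smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect'

Non-zero Reallocated_Sector_Ct, Current_Pending_Sector or Offline_Uncorrectable values on the disk behind the EIO are a fairly strong hint the drive itself is failing.)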
Re: [ceph-users] Typical 10GbE latency
On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS - if I remember correctly there was a lot of improvements in the latest kernels... The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are two 7000 units and 8 3000s spread out over 4 racks. But the test I did was with two hosts connected to the same Nexus 3000 switch using TwinAx cabling of 3m. The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but that didn't make a difference. And what about the bandwith? Just fine, no problems getting 10Gbit through the NICs. The values you present don't seem awfully high, and the deviation seems low. No, they don't seem high, but they are about 40% higher then the values I see on other environments. 40% is a lot. This Ceph cluster is SSD-only, so the lower the latency, the more IOps the system can do. Wido On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
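(One host-side factor that often accounts for differences in this range, independent of the switches, is CPU power management. A quick check and experiment, assuming the cpufreq sysfs interface is available:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# temporarily force all cores to 'performance' and re-run the ping test
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $g; done
)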
Re: [ceph-users] buckets and users
By the way, is it possible to run 2 radosgw instances on the same host? I think I have created the zone, but I'm not sure it was correct, because it used the default pool names even though I had changed them in the json file I had provided. Now I am trying to run ceph-radosgw with two different entries in the ceph.conf file, but without success. Example:

[client.radosgw.gw]
host = GATEWAY
keyring = /etc/ceph/keyring.radosgw.gw
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw print continue = false
rgw dns name = gateway.local
rgw enable ops log = false
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
rgw cache lru size = 15000
rgw thread pool size = 2048

#[client.radosgw.gw.env2]
#host = GATEWAY
#keyring = /etc/ceph/keyring.radosgw.gw
#rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock
#log file = /var/log/ceph/client.env2.radosgw.gateway.log
#rgw print continue = false
#rgw dns name = cephppr.local
#rgw enable ops log = false
#rgw enable usage log = true
#rgw usage log tick interval = 30
#rgw usage log flush threshold = 1024
#rgw usage max shards = 32
#rgw usage max user shards = 1
#rgw cache lru size = 15000
#rgw thread pool size = 2048
#rgw zone = ppr

It fails to create the socket:

2014-11-06 15:39:08.862364 7f80cc670880 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930
2014-11-06 15:39:08.870429 7f80cc670880 0 librados: client.radosgw.gw.env2 authentication error (1) Operation not permitted
2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage provider (RADOS)

What am I doing wrong? Marco Garcês #sysadmin Maputo - Mozambique [Skype] marcogarces On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote: Your solution of pre-pending the environment name to the bucket, was my first choice, but at the moment I can't ask the devs to change the code to do that. For now I have to stick with the zones solution. Should I follow the federated zones docs (http://ceph.com/docs/master/radosgw/federated-config/) but skip the sync step? Thank you, Marco Garcês On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote: You could setup dedicated zones for each environment, and not replicate between them. Each zone would have it's own URL, but you would be able to re-use usernames and bucket names. If different URLs are a problem, you might be able to get around that in the load balancer or the web servers. I wouldn't really recommend that, but it's possible. I have a similar requirement. I was able to pre-pending the environment name to the bucket in my client code, which made things much easier. On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote: Hi there, I have this situation, where I'm using the same Ceph cluster (with radosgw), for two different environments, QUAL and PRE-PRODUCTION. I need different users for each environment, but I need to create the same buckets, with the same name; I understand there is no way to have 2 buckets with the same name, but how can I go around this? Perhaps creating a different pool for each user? Can you help me? Thank you in advance, my best regards, Marco Garcês ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
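(The "librados: client.radosgw.gw.env2 authentication error (1) Operation not permitted" usually means the cluster has no cephx key for that client name. A sketch of creating one, matching the section name in ceph.conf -- the caps shown are the ones commonly used for radosgw:

ceph auth get-or-create client.radosgw.gw.env2 osd 'allow rwx' mon 'allow rwx' -o /etc/ceph/keyring.radosgw.gw.env2
ceph auth list | grep -A2 radosgw    # verify both gateway keys exist

and then point the second instance's "keyring =" line at that file.)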
Re: [ceph-users] Typical 10GbE latency
Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms Udo Am 06.11.2014 14:18, schrieb Wido den Hollander: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
What is the COPP? On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander w...@42on.com wrote: On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS - if I remember correctly there was a lot of improvements in the latest kernels... The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are two 7000 units and 8 3000s spread out over 4 racks. But the test I did was with two hosts connected to the same Nexus 3000 switch using TwinAx cabling of 3m. The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but that didn't make a difference. And what about the bandwith? Just fine, no problems getting 10Gbit through the NICs. The values you present don't seem awfully high, and the deviation seems low. No, they don't seem high, but they are about 40% higher then the values I see on other environments. 40% is a lot. This Ceph cluster is SSD-only, so the lower the latency, the more IOps the system can do. Wido On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi,Udo. Good value :) Whether an additional optimization on the host? Thanks. Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de: Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms Udo Am 06.11.2014 14:18, schrieb Wido den Hollander: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi, 2 LACP bonded Intel Corporation Ethernet 10G 2P X520 Adapters, no jumbo frames, here: rtt min/avg/max/mdev = 0.141/0.207/0.313/0.040 ms rtt min/avg/max/mdev = 0.124/0.223/0.289/0.044 ms rtt min/avg/max/mdev = 0.302/0.378/0.460/0.038 ms rtt min/avg/max/mdev = 0.282/0.389/0.473/0.035 ms All hosts on the same stacked pair of Dell N4032F switches. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] buckets and users
Update: I was able to fix the authentication error, and I have 2 radosgw running on the same host. The problem now, is, I believe I have created the zone wrong, or, I am doing something wrong, because I can login with the user I had before, and I can access his buckets. I need to have everything separated. Here are my zone info: default zone: { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} env2 zone: { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} Could you guys help me? Marco Garcês On Thu, Nov 6, 2014 at 3:56 PM, Marco Garcês ma...@garces.cc wrote: By the way, Is it possible to run 2 radosgw on the same host? I think I have created the zone, not sure if it was correct, because it used the default pool names, even though I had changed them in the json file I had provided. Now I am trying to run ceph-radosgw with two different entries in the ceph.conf file, but without sucess. Example: [client.radosgw.gw] host = GATEWAY keyring = /etc/ceph/keyring.radosgw.gw rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock log file = /var/log/ceph/client.radosgw.gateway.log rgw print continue = false rgw dns name = gateway.local rgw enable ops log = false rgw enable usage log = true rgw usage log tick interval = 30 rgw usage log flush threshold = 1024 rgw usage max shards = 32 rgw usage max user shards = 1 rgw cache lru size = 15000 rgw thread pool size = 2048 #[client.radosgw.gw.env2] #host = GATEWAY #keyring = /etc/ceph/keyring.radosgw.gw #rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock #log file = /var/log/ceph/client.env2.radosgw.gateway.log #rgw print continue = false #rgw dns name = cephppr.local #rgw enable ops log = false #rgw enable usage log = true #rgw usage log tick interval = 30 #rgw usage log flush threshold = 1024 #rgw usage max shards = 32 #rgw usage max user shards = 1 #rgw cache lru size = 15000 #rgw thread pool size = 2048 #rgw zone = ppr It fails to create the socket: 2014-11-06 15:39:08.862364 7f80cc670880 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930 2014-11-06 15:39:08.870429 7f80cc670880 0 librados: client.radosgw.gw.env2 authentication error (1) Operation not permitted 2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage provider (RADOS) What am I doing wrong? Marco Garcês #sysadmin Maputo - Mozambique [Skype] marcogarces On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote: Your solution of pre-pending the environment name to the bucket, was my first choice, but at the moment I can't ask the devs to change the code to do that. For now I have to stick with the zones solution. Should I follow the federated zones docs (http://ceph.com/docs/master/radosgw/federated-config/) but skip the sync step? 
Thank you, Marco Garcês On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote: You could setup dedicated zones for each environment, and not replicate between them. Each zone would have it's own URL, but you would be able to re-use usernames and bucket names. If different URLs are a problem, you might be able to get around that in the load balancer or the web servers. I wouldn't really recommend that, but it's possible. I have a similar requirement. I was able to pre-pending the environment name to the bucket in my client code, which made things much easier. On Wed, Nov 5, 2014 at 8:52 AM, Marco Garcês ma...@garces.cc wrote: Hi there, I have this situation, where I'm using the same Ceph cluster (with radosgw), for two different environments, QUAL and PRE-PRODUCTION. I need different users for each environment, but I need to create the same buckets, with the same name; I understand there is no way to have 2 buckets with the same name, but how can I go around this? Perhaps creating a different pool for each user? Can you help me? Thank you in advance, my best regards, Marco Garcês ___ ceph-users
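(Looking at the two dumps above: the default and env2 zones point at exactly the same pools (.rgw, .users, .rgw.buckets, ...), which is why the existing user and his buckets are visible through both gateways. For real separation the env2 zone needs its own pools, e.g. something like the following -- the .env2 prefix is only an example, and the pools have to be created before the gateway is started:

{ "domain_root": ".env2.rgw",
  "control_pool": ".env2.rgw.control",
  "gc_pool": ".env2.rgw.gc",
  "log_pool": ".env2.log",
  "intent_log_pool": ".env2.intent-log",
  "usage_log_pool": ".env2.usage",
  "user_keys_pool": ".env2.users",
  "user_email_pool": ".env2.users.email",
  "user_swift_pool": ".env2.users.swift",
  "user_uid_pool": ".env2.users.uid",
  "system_key": { "access_key": "", "secret_key": ""},
  "placement_pools": [
    { "key": "default-placement",
      "val": { "index_pool": ".env2.rgw.buckets.index",
               "data_pool": ".env2.rgw.buckets",
               "data_extra_pool": ".env2.rgw.buckets.extra"}}]}

loaded with radosgw-admin zone set --rgw-zone=env2 --infile <file>, followed by radosgw-admin regionmap update.)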
Re: [ceph-users] Typical 10GbE latency
On 11/06/2014 02:58 PM, Luis Periquito wrote: What is the COPP? Nothing special, default settings. 200 ICMP packets/second. But we also tested with a direct TwinAx cable between two hosts, so no switch involved. That did not improve the latency. So this seems to be a kernel/driver issue somewhere, but I can't think of anything. The systems I have access to have no special tuning and get much better latency. Wido On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander w...@42on.com wrote: On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS - if I remember correctly there was a lot of improvements in the latest kernels... The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are two 7000 units and 8 3000s spread out over 4 racks. But the test I did was with two hosts connected to the same Nexus 3000 switch using TwinAx cabling of 3m. The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but that didn't make a difference. And what about the bandwith? Just fine, no problems getting 10Gbit through the NICs. The values you present don't seem awfully high, and the deviation seems low. No, they don't seem high, but they are about 40% higher then the values I see on other environments. 40% is a lot. This Ceph cluster is SSD-only, so the lower the latency, the more IOps the system can do. Wido On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've ran into a 10GbE latency which seems high to me. I have access to a couple of Ceph cluster and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms Both these environment are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista. Now, on a environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing: rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms As you can see, the Cisco Nexus network has high latency compared to the other setup. You would say the switches are to blame, but we also tried with a direct TwinAx connection, but that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards. I was wondering, others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
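(Another per-host knob worth ruling out when the NICs are identical is interrupt coalescing; the ixgbe defaults can add tens of microseconds. A quick experiment on both ends -- the interface name is an example:

ethtool -c eth4                # show current coalescing settings
ethtool -C eth4 rx-usecs 0     # commonly used to disable rx coalescing; re-run the ping test afterwards

together with checking BIOS C-state settings on the hosts.)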
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Hi Sam, Sounds like you needed osd 20. You can mark osd 20 lost. -Sam Does not work: # ceph osd lost 20 --yes-i-really-mean-it osd.20 is not down or doesn't exist Also, here is an interesting post which I will follow from October: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1. That immediately caused all of my incomplete PGs to start recovering and everything seemed to come back OK. I was serving out and RBD from here and xfs_repair reported no problems. So... happy ending? What started this all was that I was altering my CRUSH map causing significant rebalancing on my cluster which had size = 2. During this process I lost an OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only lost 1 osd I was pretty sure that I hadn't lost any data I just couldn't get the PGs to recover without changing the min_size. It is good that this worked for him, but it also seems like a bug that it worked! (I.e. ceph should have been able to recover on its own without weird workarounds.) I'll let you know if this works for me! Thanks, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
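(For reference, the workaround from that thread in command form -- note that with min_size 1 the cluster will accept writes with only a single copy, so it should only be a temporary measure; the pool name is a placeholder:

ceph osd pool set <pool> min_size 1
ceph -w                              # watch the incomplete PGs go active and recover
ceph osd pool set <pool> min_size 2  # restore once the cluster is healthy again
)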
Re: [ceph-users] buckets and users
You need to tell each radosgw daemon which zone to use. In ceph.conf, I have: [client.radosgw.ceph3c] host = ceph3c rgw socket path = /var/run/ceph/radosgw.ceph3c keyring = /etc/ceph/ceph.client.radosgw.ceph3c.keyring log file = /var/log/ceph/radosgw.log admin socket = /var/run/ceph/radosgw.asok rgw dns name = us-central-1.ceph.cdlocal rgw region = us rgw region root pool = .us.rgw.root rgw zone = us-central-1 rgw zone root pool = .us-central-1.rgw.root On Thu, Nov 6, 2014 at 6:35 AM, Marco Garcês ma...@garces.cc wrote: Update: I was able to fix the authentication error, and I have 2 radosgw running on the same host. The problem now, is, I believe I have created the zone wrong, or, I am doing something wrong, because I can login with the user I had before, and I can access his buckets. I need to have everything separated. Here are my zone info: default zone: { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} env2 zone: { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} Could you guys help me? Marco Garcês On Thu, Nov 6, 2014 at 3:56 PM, Marco Garcês ma...@garces.cc wrote: By the way, Is it possible to run 2 radosgw on the same host? I think I have created the zone, not sure if it was correct, because it used the default pool names, even though I had changed them in the json file I had provided. Now I am trying to run ceph-radosgw with two different entries in the ceph.conf file, but without sucess. 
Example: [client.radosgw.gw] host = GATEWAY keyring = /etc/ceph/keyring.radosgw.gw rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock log file = /var/log/ceph/client.radosgw.gateway.log rgw print continue = false rgw dns name = gateway.local rgw enable ops log = false rgw enable usage log = true rgw usage log tick interval = 30 rgw usage log flush threshold = 1024 rgw usage max shards = 32 rgw usage max user shards = 1 rgw cache lru size = 15000 rgw thread pool size = 2048 #[client.radosgw.gw.env2] #host = GATEWAY #keyring = /etc/ceph/keyring.radosgw.gw #rgw socket path = /var/run/ceph/ceph.env2.radosgw.gateway.fastcgi.sock #log file = /var/log/ceph/client.env2.radosgw.gateway.log #rgw print continue = false #rgw dns name = cephppr.local #rgw enable ops log = false #rgw enable usage log = true #rgw usage log tick interval = 30 #rgw usage log flush threshold = 1024 #rgw usage max shards = 32 #rgw usage max user shards = 1 #rgw cache lru size = 15000 #rgw thread pool size = 2048 #rgw zone = ppr It fails to create the socket: 2014-11-06 15:39:08.862364 7f80cc670880 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 7930 2014-11-06 15:39:08.870429 7f80cc670880 0 librados: client.radosgw.gw.env2 authentication error (1) Operation not permitted 2014-11-06 15:39:08.870889 7f80cc670880 -1 Couldn't init storage provider (RADOS) What am I doing wrong? Marco Garcês #sysadmin Maputo - Mozambique [Skype] marcogarces On Thu, Nov 6, 2014 at 10:11 AM, Marco Garcês ma...@garces.cc wrote: Your solution of pre-pending the environment name to the bucket, was my first choice, but at the moment I can't ask the devs to change the code to do that. For now I have to stick with the zones solution. Should I follow the federated zones docs (http://ceph.com/docs/master/radosgw/federated-config/) but skip the sync step? Thank you, Marco Garcês On Wed, Nov 5, 2014 at 8:13 PM, Craig Lewis cle...@centraldesktop.com wrote: You could setup dedicated zones for each environment, and not replicate between them. Each zone would have it's own URL, but you would be able to re-use usernames and bucket names. If different URLs are a problem, you might be able to get around that in the load balancer or the web servers. I wouldn't really recommend that, but it's possible. I have a similar requirement. I was able to pre-pending the
[ceph-users] Red Hat/CentOS kernel-ml to get RBD module
The maintainers of the kernel-ml[1] package have graciously accepted the request to include the RBD module in the mainline kernel build[2]. This should make it easier to test new kernels with RBD if you have better things to do than build kernels yourself. Thanks, kernel-ml maintainers! Robert LeBlanc [1] http://elrepo.org/tiki/kernel-ml [2] http://elrepo.org/bugs/view.php?id=521 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
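(With the elrepo repository already configured, installing and testing it looks roughly like this -- repo and package names as documented by elrepo:

yum --enablerepo=elrepo-kernel install kernel-ml
# reboot into the new kernel, then:
modprobe rbd
lsmod | grep rbd
)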
Re: [ceph-users] Typical 10GbE latency
Hi, no special optimizations on the host. In this case the pings are from an proxmox-ve host to ceph-osds (ubuntu + debian). The pings from one osd to the others are comparable. Udo On 06.11.2014 15:00, Irek Fasikhov wrote: Hi,Udo. Good value :) Whether an additional optimization on the host? Thanks. Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de mailto:ulem...@polarzone.de: Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms IPoIB Mellanox ConnectX-3 MT27500 FDR adapter and Mellanox IS5022 QDR switch MTU set to 65520. CentOS 7.0.1406 running 3.17.2-1.el7.elrepo.x86_64 on Intel(R) Atom(TM) CPU C2750 with 32 GB of RAM. On Thu, Nov 6, 2014 at 9:46 AM, Udo Lembke ulem...@polarzone.de wrote: Hi, no special optimizations on the host. In this case the pings are from an proxmox-ve host to ceph-osds (ubuntu + debian). The pings from one osd to the others are comparable. Udo On 06.11.2014 15:00, Irek Fasikhov wrote: Hi,Udo. Good value :) Whether an additional optimization on the host? Thanks. Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de: Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms Udo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
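(For anyone comparing IPoIB numbers: the 65520 MTU implies connected mode; datagram mode caps the MTU much lower and usually shows different latency. The mode can be checked per interface, the interface name being an example:

cat /sys/class/net/ib0/mode
cat /sys/class/net/ib0/mtu
)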
Re: [ceph-users] mds isn't working anymore after osd's running full
Jasper, Thanks for this -- I've reproduced this issue in a development environment. We'll see if this is also an issue on giant, and backport a fix if appropriate. I'll update this thread soon. Cheers, John On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero jasper.si...@target-holding.nl wrote: Hello Greg, I saw that the site of the previous link of the logs uses a very short expiring time so I uploaded it to another one: http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz Thanks, Jasper Van: gregory.far...@inktank.com [gregory.far...@inktank.com] namens Gregory Farnum [gfar...@redhat.com] Verzonden: donderdag 30 oktober 2014 1:03 Aan: Jasper Siero CC: John Spray; ceph-users Onderwerp: Re: [ceph-users] mds isn't working anymore after osd's running full On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero jasper.si...@target-holding.nl wrote: Hello Greg, I added the debug options which you mentioned and started the process again: [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0 old journal was 9483323613~134233517 new journal start will be 9621733376 (4176246 bytes past old end) writing journal head writing EResetJournal entry done [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001 undump journaldumptgho-mon001 start 9483323613 len 134213311 writing header 200. writing 9483323613~1048576 writing 9484372189~1048576 writing 9485420765~1048576 writing 9486469341~1048576 writing 9487517917~1048576 writing 9488566493~1048576 writing 9489615069~1048576 writing 9490663645~1048576 writing 9491712221~1048576 writing 9492760797~1048576 writing 9493809373~1048576 writing 9494857949~1048576 writing 9495906525~1048576 writing 9496955101~1048576 writing 9498003677~1048576 writing 9499052253~1048576 writing 9500100829~1048576 writing 9501149405~1048576 writing 9502197981~1048576 writing 9503246557~1048576 writing 9504295133~1048576 writing 9505343709~1048576 writing 9506392285~1048576 writing 9507440861~1048576 writing 9508489437~1048576 writing 9509538013~1048576 writing 9510586589~1048576 writing 9511635165~1048576 writing 9512683741~1048576 writing 9513732317~1048576 writing 9514780893~1048576 writing 9515829469~1048576 writing 9516878045~1048576 writing 9517926621~1048576 writing 9518975197~1048576 writing 9520023773~1048576 writing 9521072349~1048576 writing 9522120925~1048576 writing 9523169501~1048576 writing 9524218077~1048576 writing 9525266653~1048576 writing 9526315229~1048576 writing 9527363805~1048576 writing 9528412381~1048576 writing 9529460957~1048576 writing 9530509533~1048576 writing 9531558109~1048576 writing 9532606685~1048576 writing 9533655261~1048576 writing 9534703837~1048576 writing 9535752413~1048576 writing 9536800989~1048576 writing 9537849565~1048576 writing 9538898141~1048576 writing 9539946717~1048576 writing 9540995293~1048576 writing 9542043869~1048576 writing 9543092445~1048576 writing 9544141021~1048576 writing 9545189597~1048576 writing 9546238173~1048576 writing 9547286749~1048576 writing 9548335325~1048576 writing 9549383901~1048576 writing 9550432477~1048576 writing 9551481053~1048576 writing 9552529629~1048576 writing 9553578205~1048576 writing 9554626781~1048576 writing 9555675357~1048576 writing 9556723933~1048576 writing 9557772509~1048576 writing 9558821085~1048576 writing 9559869661~1048576 writing 9560918237~1048576 writing 9561966813~1048576 writing 
9563015389~1048576 writing 9564063965~1048576 writing 9565112541~1048576 writing 9566161117~1048576 writing 9567209693~1048576 writing 9568258269~1048576 writing 9569306845~1048576 writing 9570355421~1048576 writing 9571403997~1048576 writing 9572452573~1048576 writing 9573501149~1048576 writing 9574549725~1048576 writing 9575598301~1048576 writing 9576646877~1048576 writing 9577695453~1048576 writing 9578744029~1048576 writing 9579792605~1048576 writing 9580841181~1048576 writing 9581889757~1048576 writing 9582938333~1048576 writing 9583986909~1048576 writing 9585035485~1048576 writing 9586084061~1048576 writing 9587132637~1048576 writing 9588181213~1048576 writing 9589229789~1048576 writing 9590278365~1048576 writing 9591326941~1048576 writing 9592375517~1048576 writing 9593424093~1048576 writing 9594472669~1048576 writing 9595521245~1048576 writing 9596569821~1048576 writing 9597618397~1048576 writing 9598666973~1048576 writing 9599715549~1048576 writing 9600764125~1048576 writing 9601812701~1048576 writing 9602861277~1048576 writing 9603909853~1048576 writing 9604958429~1048576 writing
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 There are pretty good reasons for why it works the way it does right now, but it certainly is unexpected. -Sam On Thu, Nov 6, 2014 at 7:18 AM, Chad William Seys cws...@physics.wisc.edu wrote: Hi Sam, Sounds like you needed osd 20. You can mark osd 20 lost. -Sam Does not work: # ceph osd lost 20 --yes-i-really-mean-it osd.20 is not down or doesn't exist Also, here is an interesting post from October which I will follow: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1. That immediately caused all of my incomplete PGs to start recovering and everything seemed to come back OK. I was serving out an RBD from here and xfs_repair reported no problems. So... happy ending? What started this all was that I was altering my CRUSH map, causing significant rebalancing on my cluster which had size = 2. During this process I lost an OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only lost 1 osd, I was pretty sure that I hadn't lost any data; I just couldn't get the PGs to recover without changing the min_size. It is good that this worked for him, but it also seems like a bug that it worked! (I.e. ceph should have been able to recover on its own without weird workarounds.) I'll let you know if this works for me! Thanks, Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
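For reference, the min_size workaround from that October post boils down to a couple of commands; the pool name is a placeholder, and as noted above, needing this at all looks more like a bug than a documented procedure:

ceph osd pool get <pool> min_size
ceph osd pool set <pool> min_size 1
# wait for the incomplete PGs to peer and recover, then put it back:
ceph osd pool set <pool> min_size 2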
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Also, are you certain that osd 20 is not up? -Sam ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Hi Sam, Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 Well, thanks for any bugfixes in advance! :) Also, are you certain that osd 20 is not up? -Sam Yep. # ceph osd metadata 20 Error ENOENT: osd.20 does not exist So part of ceph thinks osd.20 doesn't exist, but another part (the down_osds_we_would_probe) thinks the osd exists and is down? In other news, my min_size was set to 1, so the same fix might not apply to me. Instead I set the pool size from 2 to 1, then back again. Looks like the end result is merely going to be that the down+incomplete get converted to incomplete. :/ I'll let you (and future googlers) know. Thanks! Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Basic Ceph Questions
On Wed, Nov 5, 2014 at 11:57 PM, Wido den Hollander w...@42on.com wrote: On 11/05/2014 11:03 PM, Lindsay Mathieson wrote: - Geo Replication - that's done via federated gateways? looks complicated :( * The remote slave, it would be read only? That is only for the RADOS Gateway. Ceph itself (RADOS) does not support Geo Replication. That is only for the RADOS Gateway. Ceph itself (RADOS) does not support Geo Replication. The 3 services built on top of RADOS support backups, but RADOS itself does not. For RBD, you can use snapshot diffs, and ship them offsite (see various threads on the ML). For RadosGW, there is Federation. For CephFS, you can use traditional POSIX filesystem backup tools. - Disaster strikes, apart from DR backups how easy is it to recover your data off ceph OSD's? one of the things I liked about gluster was that if I totally screwed up the gluster masters, I could always just copy the data off the filesystem. Not so much with ceph. It's a bit harder with Ceph. Eventually it is doable, but that is something that would take a lot of time. In practice, not really. Out of curiosity, I attempted this for some RadosGW objects. It was easy when there was a single object less than 4MB. It very quickly became complicated with a few larger objects. You'd have to have a very deep understanding of the service to track all of the information down with the cluster offline. It's definitely possible, just not practical. - Am I abusing ceph? :) I just have a small 3 node VM server cluster with 20 Windows VMs, some servers, some VDI. The shared store is a QNAP nas which is struggling. I'm using ceph for - Shared Storage - Replication/Redundancy - Improved performance I think that 3 nodes is not sufficient, Ceph really starts performing when you go 10 nodes (excluding monitors). If it meets your needs, then it's working. :-) You're going to spend a lot more time managing the 3 node Ceph cluster than you spent on the QNAP. If it doesn't make sense for you to spend a lot of time dealing with storage, then a single shared store with more IOPS would be a better fit. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
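The RBD snapshot-diff approach mentioned above looks roughly like this in practice (pool, image and snapshot names are placeholders; the receiving cluster needs the image and the earlier snapshot to already exist):

rbd snap create rbd/vm-disk@backup-2014-11-06
rbd export-diff --from-snap backup-2014-11-05 rbd/vm-disk@backup-2014-11-06 - | ssh backup-site rbd import-diff - rbd/vm-disk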
Re: [ceph-users] mds isn't working anymore after osd's running full
This is still an issue on master, so a fix will be coming soon. Follow the ticket for updates: http://tracker.ceph.com/issues/10025 Thanks for finding the bug! John
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
On Thu, Nov 6, 2014 at 11:27 AM, Chad Seys wrote: Also, are you certain that osd 20 is not up? -Sam Yep. # ceph osd metadata 20 Error ENOENT: osd.20 does not exist So part of ceph thinks osd.20 doesn't exist, but another part (the down_osds_we_would_probe) thinks the osd exists and is down? You'll have trouble until osd.20 exists again. Ceph really does not want to lose data. Even if you tell it the osd is gone, ceph won't believe you. Once ceph can probe any osd that claims to be 20, it might let you proceed with your recovery. Then you'll probably need to use ceph pg <pgid> mark_unfound_lost. If you don't have a free bay to create a real osd.20, it's possible to fake it with some small loop-back filesystems. Bring it up and mark it OUT. It will probably cause some remapping. I would keep it around until you get things healthy. If you create a real osd.20, you might want to leave it OUT until you get things healthy again. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
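Once an osd.20 exists again and has been probed, the commands referred to above look like this (the pgid is a placeholder; whether to revert or delete depends on whether any usable copy of the unfound objects survives):

ceph pg 2.5f query | less              # the peering section lists down_osds_we_would_probe
ceph pg 2.5f mark_unfound_lost revert
# or, if no copy exists anywhere:
ceph pg 2.5f mark_unfound_lost delete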
Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem
Hi Craig, Thanks for the recovery tip! I would guess that safely removing an OSD (mark it OUT, wait for the data migration to stop, then crush osd rm) and adding it back in as osd.20 would work? New switch: --yes-i-really-REALLY-mean-it ;) Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RBD Diff based on Timestamp
I have been thinking about the implications of losing the snapshot chain on an RBD when doing export-diff/import-diff between two separate physical locations. As I understand it, in this scenario, when you take the first snapshot again on the source, you would in effect end up copying the whole RBD image across to the other site, as the diff would be based on everything from image creation to the 1st snapshot. If this was a large multi-TB RBD, even over a reasonably fast link, this could take a long time to resync. From what I understand, the RADOS objects which RBDs are striped across have last-modified timestamps. Would it be feasible to add an option to the rbd command to export a diff of modified blocks since a certain timestamp? This way you could take a new snapshot on the source RBD and then specify a timestamp from just before the previously deleted snapshot and export the blocks to bring the 2nd copy back up to date. You could then resume the normal export-diff/import-diff procedure. Please tell me if I am thinking about this in completely the wrong way, or if this is actually a possible solution. Many Thanks, Nick ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
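There is no timestamp option in rbd export-diff today, but the per-object modification times the post refers to are already visible through plain rados commands, so the idea can at least be prototyped by hand (pool and image names are placeholders):

rbd info rbd/vm-disk | grep block_name_prefix        # e.g. rb.0.1234.5678
rados -p rbd ls | grep rb.0.1234.5678 > objects.txt
while read obj; do rados -p rbd stat "$obj"; done < objects.txt    # prints size and mtime for each object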
[ceph-users] Installing CephFs via puppet
Hi Guys, I am sure many of you guys have installed cephfs using puppet. I am trying to install “firefly” using the puppet module from https://github.com/ceph/puppet-ceph.git and running into the “ceph_config” file issue where it’s unable to find the config file and I am not sure why. Here’s the error I get while running puppet on one of the mon nodes: Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could not evaluate: No ability to determine if ceph_config exists —Jiten___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] installing ceph object gateway
Is there updated documentation explaining how to install and use the object gateway? http://docs.ceph.com/docs/master/install/install-ceph-gateway/ I attempted this install and quickly ran into problems. Thanks! -M ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] osd down
I tried restarting all the osd's on that node, osd.70 was the only ceph process that did not come back online. There is nothing in the ceph-osd log for osd.70. However I do see over 13,000 of these messages in the kern.log: Nov 6 19:54:27 hqosd6 kernel: [34042786.392178] XFS (sdl1): xfs_log_force: error 5 returned. Does anyone have any suggestions on how I might be able to get this HD back in the cluster (or whether or not it is worth even trying). Thanks, Shain Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 From: Shain Miley [smi...@npr.org] Sent: Tuesday, November 04, 2014 3:55 PM To: ceph-users@lists.ceph.com Subject: osd down Hello, We are running ceph version 0.80.5 with 108 osd's. Today I noticed that one of the osd's is down: root@hqceph1:/var/log/ceph# ceph -s cluster 504b5794-34bd-44e7-a8c3-0494cf800c23 health HEALTH_WARN crush map has legacy tunables monmap e1: 3 mons at {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}, election epoch 146, quorum 0,1,2 hqceph1,hqceph2,hqceph3 osdmap e7119: 108 osds: 107 up, 107 in pgmap v6729985: 3208 pgs, 17 pools, 81193 GB data, 21631 kobjects 216 TB used, 171 TB / 388 TB avail 3204 active+clean 4 active+clean+scrubbing client io 4079 kB/s wr, 8 op/s Using osd dump I determined that it is osd number 70: osd.70 down out weight 0 up_from 2668 up_thru 6886 down_at 6913 last_clean_interval [488,2665) 10.35.1.217:6814/22440 10.35.1.217:6820/22440 10.35.1.217:6824/22440 10.35.1.217:6830/22440 autoout,exists 5dbd4a14-5045-490e-859b-15533cd67568 Looking at that node, the drive is still mounted and I did not see any errors in any of the system logs, and the raid level status shows the drive as up and healthy, etc. root@hqosd6:~# df -h |grep 70 /dev/sdl1 3.7T 1.9T 1.9T 51% /var/lib/ceph/osd/ceph-70 I was hoping that someone might be able to advise me on the next course of action (can I add the osd back in?, should I replace the drive altogether, etc) I have attached the osd log to this email. Any suggestions would be great. Thanks, Shain -- Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
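Error 5 from XFS is EIO, which usually means the drive (or its controller path) is returning I/O errors, so it is worth checking the hardware before trying to restart osd.70. If the disk really has failed, the usual removal sequence is below; these are standard steps, not advice given in this thread:

smartctl -a /dev/sdl        # look for reallocated or pending sectors
dmesg | grep -i sdl         # low-level ATA/SCSI errors
ceph osd out 70
ceph osd crush remove osd.70
ceph auth del osd.70
ceph osd rm 70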
Re: [ceph-users] Installing CephFs via puppet
Hi, At the moment puppet-ceph does not support CephFS. The error you're seeing does not ring a bell, would you have more context to help diagnose it ? Cheers -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
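One possible cause of "No ability to determine if ceph_config exists" (an assumption, not something confirmed in this thread) is that the agent never received the ceph_config type and provider shipped by the module, for example because pluginsync is disabled or a dependency module such as puppetlabs-inifile is missing. A few quick checks:

puppet config print pluginsync                         # should be true on agent and master
puppet module list                                     # puppet-ceph and its dependencies should appear
puppet agent -t --debug 2>&1 | grep -i ceph_config     # shows whether the provider was synced and loaded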
Re: [ceph-users] Installing CephFs via puppet
Thanks Loic. What is the recommended puppet module for installing cephFS ? I can send more details about puppet-ceph but basically I haven't changed anything in there except for assigning values to the required params in the yaml file. --Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Cluster with two radosgw
Any best practices available for Radosgw HA? Please suggest. On Wednesday, November 5, 2014 2:08 PM, lakshmi k s lux...@yahoo.com wrote: Hello - My ceph cluster needs to have two rados gateway nodes eventually interfacing with Openstack haproxy. I have been successful in bringing up one of them. What are the steps for an additional rados gateway node to be included in the cluster? Any help is greatly appreciated. Thanks much. Lakshmi. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
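A common pattern is to run radosgw on each gateway host, all configured against the same zone and pools so either node can serve any request, and put haproxy in front of them. A minimal sketch of the balancing part (hostnames, port and health check are placeholders, not a recommendation from this thread):

frontend rgw_frontend
    bind *:80
    mode http
    default_backend rgw_backend

backend rgw_backend
    mode http
    balance roundrobin
    option httpchk GET /
    server rgw1 gateway1.local:80 check
    server rgw2 gateway2.local:80 check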
Re: [ceph-users] installing ceph object gateway
Please share the problem/issue details (error messages, etc.) and we can check and help. Thanks Swami ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Is it normal that osd's memory exceed 1GB under stress test?
I set mon_osd_down_out_interval to two days and ran a stress test. The memory use of the osd exceeds 1GB. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest?
and took one osd down, then ran the stress test with fio. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
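Memory use per OSD grows with PG count and climbs further during peering and backfill, so holding an OSD down with a long mon_osd_down_out_interval while running fio against the cluster will push it up; around 1GB resident is commonly reported and not by itself alarming. If the tcmalloc heap is suspected, the admin socket gives a quick picture (the osd id is a placeholder):

ceph tell osd.0 heap stats      # tcmalloc allocator statistics
ceph tell osd.0 heap release    # hand freed pages back to the OS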