[ceph-users] blockdev --setro cannot set krbd to readonly
I mapped an image to a system, and used blockdev to make it read-only, but it failed:

[root@ceph0 mnt]# blockdev --setro /dev/rbd2
[root@ceph0 mnt]# blockdev --getro /dev/rbd2
0

This is on CentOS 6.4 with kernel 3.10.6, Ceph 0.61.8. Any idea?
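A possible workaround (an assumption on my part, not verified against 0.61/kernel 3.10: your rbd tool and kernel must support the --read-only flag) is to map the image read-only in the first place instead of flipping the block device afterwards:

rbd unmap /dev/rbd2
rbd map --read-only <pool>/<image>
blockdev --getro /dev/rbd2    # should report 1 if the read-only mapping took effect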
[ceph-users] System reboot hangs when umounting filesystem on rbd
CentOS 6.4, kernel 3.10.6, Ceph 0.61.8.

My ceph cluster is deployed on three nodes. One rbd image was created, mapped to one of the three nodes, formatted with ext4, and mounted. When rebooting this node, it hung while umounting the file system on the rbd.

My guess about the root cause: when the system shuts down, the services are stopped first, then the mounted file systems are umounted. For the file system on the rbd, if there are dirty pages, a flush will happen, but the ceph services have already been shut down, so it hangs. Am I right? How can I work around this? Thanks!
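A workaround sketch (the mount point and device name here are examples, not from the post above): unmount the filesystem and unmap the rbd device before the ceph services go down, either by hand before rebooting or from a shutdown script ordered ahead of the ceph init script:

umount /mnt            # wherever the ext4 filesystem is mounted
rbd unmap /dev/rbd0    # release the kernel rbd device while the cluster is still up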
[ceph-users] How to force a monitor time check?
Time skews happen frequently when the systems running monitors are restarted. With an NTP server configured, the time skew between systems will be corrected after some time. But the ceph monitors won't notice it immediately if no time-check messages are exchanged at that point, so the ceph status will still show a time skew warning, sometimes for several hours. How can I trigger a monitor time check manually?
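I am not aware of a cuttlefish command that triggers a time check directly; one pragmatic workaround (an assumption, based on time checks being restarted around monitor elections) is to restart one of the monitors, which forces a new election and a fresh round of clock-skew checks:

service ceph restart mon.ceph0    # sysvinit; use the id of one of your monitors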
Re: [ceph-users] Is it possible to change the pg number after adding new osds?
Only pgp_num is listed in the reference. Though pg_num can be changed in the same way, is there any risk in doing that?

From: andreas.fu...@swisstxt.ch
To: dachun...@outlook.com; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Is it possible to change the pg number after adding new osds?
Date: Mon, 2 Sep 2013 09:02:15 +

You can change the pg numbers on the fly with:

ceph osd pool set {pool_name} pg_num {value}
ceph osd pool set {pool_name} pgp_num {value}

Reference: http://ceph.com/docs/master/rados/operations/pools/

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Da Chun Ng
Sent: Montag, 2. September 2013 04:49
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Is it possible to change the pg number after adding new osds?

According to the doc, the pg numbers should be enlarged for better read/write balance if the osd number is increased. But it seems the pg number cannot be changed on the fly; it's fixed when the pool is created. Am I right?
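Concretely, growing a pool looks like this (pool name and target count are illustrative; raise pg_num first, then pgp_num, and expect data movement while the new placement groups rebalance):

ceph osd pool set data pg_num 256
ceph osd pool set data pgp_num 256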
[ceph-users] Is it possible to change the pg number after adding new osds?
According to the doc, the pg numbers should be enlarged for better read/write balance if the osd number is increased. But it seems the pg number cannot be changed on the fly; it's fixed when the pool is created. Am I right?
[ceph-users] The whole cluster hangs when changing MTU to 9216
CentOS 6.4, Ceph Cuttlefish 0.61.7 or 0.61.8.

I changed the MTU to 9216 (or 9000), then restarted all the cluster nodes. The whole cluster hung, with messages in the mon log as below:

4048 2013-08-26 15:52:43.028554 7fd83f131700 1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
4049 2013-08-26 15:52:46.431842 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4050 2013-08-26 15:52:46.431886 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4051 2013-08-26 15:52:46.431899 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4052 2013-08-26 15:52:46.431911 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4053 2013-08-26 15:52:46.431923 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere
4054 2013-08-26 15:52:46.431937 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere
4055 2013-08-26 15:52:46.431948 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4056 2013-08-26 15:52:48.028808 7fd83f131700 1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
4057 2013-08-26 15:52:51.432073 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere
4058 2013-08-26 15:52:51.432116 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4059 2013-08-26 15:52:51.432129 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere
4060 2013-08-26 15:52:51.432147 7fd83f131700 1 mon.ceph0@0(electing) e1 discarding message auth(proto 0 27 bytes epoch 1) v1 and sending client elsewhere
4061 2013-08-26 15:52:53.029037 7fd83f131700 1 mon.ceph0@0(electing).elector(15) init, last seen epoch 15
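Worth checking in such a situation: whether every NIC and switch port actually accepts the new frame size, since a single device still at MTU 1500 will silently drop the larger monitor messages. A quick test with fragmentation disallowed (9188 = 9216 minus 28 bytes of IP/ICMP headers; adjust for your MTU, and the address is just one of the monitors above):

ping -M do -s 9188 172.18.11.61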
Re: [ceph-users] Poor write/random read/random write performance
Mark, I tried with journal aio = true and op threads = 4, but it made little difference. Then I tried to enlarge the read-ahead value both on the osd block devices and on the cephfs client. It did improve some overall performance, especially the sequential read performance, but still didn't help much with the write/random read/random write performance. I tried to change the placement group number to (100 * osd_num)/replica_size; it does not decrease the overall performance this time, but does not improve it either.

Date: Mon, 19 Aug 2013 12:31:07 -0500
From: mark.nel...@inktank.com
To: dachun...@outlook.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 12:05 PM, Da Chun Ng wrote:

Thank you! Testing now. How about pg num? I'm using the default size 64, as I tried with (100 * osd_num)/replica_size, but it decreased the performance surprisingly.

Oh! That's odd! Typically you would want more than that. Most likely you aren't distributing PGs very evenly across OSDs with 64. More PGs shouldn't decrease performance unless the monitors are behaving badly. We saw some issues back in early cuttlefish, but you should be fine with many more PGs.

Mark

Date: Mon, 19 Aug 2013 11:33:30 -0500
From: mark.nel...@inktank.com
To: dachun...@outlook.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 08:59 AM, Da Chun Ng wrote:

Thanks very much! Mark. Yes, I put the data and journal on the same disk, no SSD in my environment. My controllers are general SATA II.

Ok, so in this case the lack of WB cache on the controller and no SSDs for journals is probably having an effect.

Some more questions below in blue.

Date: Mon, 19 Aug 2013 07:48:23 -0500
From: mark.nel...@inktank.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 06:28 AM, Da Chun Ng wrote:

I have a 3-node, 15-osd ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz for each node.
* 64G RAM for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then I mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336, runt= 60012msec

Sounds like readahead and/or caching is helping out a lot here. Btw, you might want to make sure this is actually coming from the disks with iostat or collectl or something.

I ran "sync; echo 3 | tee /proc/sys/vm/drop_caches" on all the nodes before every test. I used collectl to watch every disk IO; the numbers should match. I think readahead is helping here.

Ok, good! I suspect that readahead is indeed helping.

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413, runt= 60029msec

One thing to keep in mind is that unless you have SSDs in this system, you will be doing 2 writes for every client write to the spinning disks (since data and journals will both be on the same disk).

So let's do the math:

6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB to bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / drive

If there is no write coalescing going on, this isn't terrible. If there is, this is terrible.

How can I know if there is write coalescing going on?

Look in collectl at the average IO sizes going to the disks. I bet they will be 16KB. If you were to look further with blktrace and seekwatcher, I bet you'd see lots of seeking between OSD data writes and journal writes since there is no controller cache helping smooth things out (and your journals are on the same drives).

Have you tried buffered writes with the sync engine at the same IO size?

Do you mean as below?
fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60

Yeah, that'd work.

fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine
[ceph-users] Poor write/random read/random write performance
I have a 3-node, 15-osd ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz for each node.
* 64G RAM for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then I mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336, runt= 60012msec

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413, runt= 60029msec
fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692, runt= 60041msec
fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=361056KB, bw=6001.1KB/s, iops=375, runt= 60157msec

I am mostly surprised by the seq write performance compared to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 the performance of the raw disk. How can I tune my cluster to improve the sequential write/random read/random write performance?
Re: [ceph-users] Poor write/random read/random write performance
Sorry, forgot to tell the OS and kernel version. It's CentOS 6.4 with kernel 3.10.6, fio 2.0.13.

From: dachun...@outlook.com
To: ceph-users@lists.ceph.com
Date: Mon, 19 Aug 2013 11:28:24 +
Subject: [ceph-users] Poor write/random read/random write performance

I have a 3-node, 15-osd ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz for each node.
* 64G RAM for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then I mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336, runt= 60012msec

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413, runt= 60029msec
fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692, runt= 60041msec
fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=361056KB, bw=6001.1KB/s, iops=375, runt= 60157msec

I am mostly surprised by the seq write performance compared to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 the performance of the raw disk. How can I tune my cluster to improve the sequential write/random read/random write performance?
Re: [ceph-users] Poor write/random read/random write performance
Thanks very much! Mark. Yes, I put the data and journal on the same disk, no SSD in my environment. My controllers are general SATA II. Some more questions below in blue.

Date: Mon, 19 Aug 2013 07:48:23 -0500
From: mark.nel...@inktank.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 06:28 AM, Da Chun Ng wrote:

I have a 3-node, 15-osd ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz for each node.
* 64G RAM for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then I mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336, runt= 60012msec

Sounds like readahead and/or caching is helping out a lot here. Btw, you might want to make sure this is actually coming from the disks with iostat or collectl or something.

I ran "sync; echo 3 | tee /proc/sys/vm/drop_caches" on all the nodes before every test. I used collectl to watch every disk IO; the numbers should match. I think readahead is helping here.

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413, runt= 60029msec

One thing to keep in mind is that unless you have SSDs in this system, you will be doing 2 writes for every client write to the spinning disks (since data and journals will both be on the same disk).

So let's do the math:

6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB to bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / drive

If there is no write coalescing going on, this isn't terrible. If there is, this is terrible.

How can I know if there is write coalescing going on?

Have you tried buffered writes with the sync engine at the same IO size?

Do you mean as below?
fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60

fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692, runt= 60041msec

In this case: 11087 * 1024 (KB to bytes) / 16384 / 15 = ~46 IOPS / drive. Definitely not great! You might want to try fiddling with read ahead both on the CephFS client and on the block devices under the OSDs themselves.

Could you please tell me how to enable read ahead on the CephFS client? For the block devices under the OSDs, the read-ahead value is:
[root@ceph0 ~]# blockdev --getra /dev/sdi
256
How big is appropriate for it?

One thing I did notice back during bobtail is that increasing the number of osd op threads seemed to help small object read performance. It might be worth looking at too. http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#4kbradosread

Other than that, if you really want to dig into this, you can use tools like iostat, collectl, blktrace, and seekwatcher to try and get a feel for what the IO going to the OSDs looks like. That can help when diagnosing this sort of thing.

fio -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=361056KB, bw=6001.1KB/s, iops=375, runt= 60157msec

6001.1KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB to bytes) / 16384 (write size in bytes) / 15 drives = ~150 IOPS / drive

I am mostly surprised by the seq write performance compared to the raw sata disk performance (it can get 4127 IOPS when mounted with ext4). My cephfs only gets 1/10 performance of the raw disk
Re: [ceph-users] Poor write/random read/random write performance
Thank you! Testing now. How about pg num? I'm using the default size 64, as I tried with (100 * osd_num)/replica_size, but it decreased the performance surprisingly.

Date: Mon, 19 Aug 2013 11:33:30 -0500
From: mark.nel...@inktank.com
To: dachun...@outlook.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 08:59 AM, Da Chun Ng wrote:

Thanks very much! Mark. Yes, I put the data and journal on the same disk, no SSD in my environment. My controllers are general SATA II.

Ok, so in this case the lack of WB cache on the controller and no SSDs for journals is probably having an effect.

Some more questions below in blue.

Date: Mon, 19 Aug 2013 07:48:23 -0500
From: mark.nel...@inktank.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Poor write/random read/random write performance

On 08/19/2013 06:28 AM, Da Chun Ng wrote:

I have a 3-node, 15-osd ceph cluster setup:
* 15 7200 RPM SATA disks, 5 for each node.
* 10G network
* Intel(R) Xeon(R) CPU E5-2620 (6 cores) 2.00GHz for each node.
* 64G RAM for each node.

I deployed the cluster with ceph-deploy, and created a new data pool for cephfs. Both the data and metadata pools are set with replica size 3. Then I mounted the cephfs on one of the three nodes, and tested the performance with fio.

The sequential read performance looks good:
fio -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=16K -size=1G -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=10630MB, bw=181389KB/s, iops=11336, runt= 60012msec

Sounds like readahead and/or caching is helping out a lot here. Btw, you might want to make sure this is actually coming from the disks with iostat or collectl or something.

I ran "sync; echo 3 | tee /proc/sys/vm/drop_caches" on all the nodes before every test. I used collectl to watch every disk IO; the numbers should match. I think readahead is helping here.

Ok, good! I suspect that readahead is indeed helping.

But the sequential write/random read/random write performance is very poor:
fio -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
write: io=397280KB, bw=6618.2KB/s, iops=413, runt= 60029msec

One thing to keep in mind is that unless you have SSDs in this system, you will be doing 2 writes for every client write to the spinning disks (since data and journals will both be on the same disk).

So let's do the math:

6618.2KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB to bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS / drive

If there is no write coalescing going on, this isn't terrible. If there is, this is terrible.

How can I know if there is write coalescing going on?

Look in collectl at the average IO sizes going to the disks. I bet they will be 16KB. If you were to look further with blktrace and seekwatcher, I bet you'd see lots of seeking between OSD data writes and journal writes since there is no controller cache helping smooth things out (and your journals are on the same drives).

Have you tried buffered writes with the sync engine at the same IO size?

Do you mean as below?
fio -direct=0 -iodepth 1 -thread -rw=write -ioengine=sync -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60

Yeah, that'd work.

fio -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=16K -size=256M -numjobs=16 -group_reporting -name=mytest -runtime 60
read : io=665664KB, bw=11087KB/s, iops=692, runt= 60041msec

In this case: 11087 * 1024 (KB to bytes) / 16384 / 15 = ~46 IOPS / drive. Definitely not great! You might want to try fiddling with read ahead both on the CephFS client and on the block devices under the OSDs themselves.

Could you please tell me how to enable read ahead on the CephFS client?

It's one of the mount options: http://ceph.com/docs/master/man/8/mount.ceph/

For the block devices under the OSDs, the read-ahead value is:
[root@ceph0 ~]# blockdev --getra /dev/sdi
256
How big is appropriate for it?

To be honest I've seen different results depending on the hardware. I'd try anywhere from 32kb to 2048kb.

One thing I did notice back during bobtail is that increasing the number of osd op threads seemed to help small object read performance. It might be worth looking at too. http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#4kbradosread

Other than that, if you really want to dig into this, you can use tools like iostat, collectl, blktrace, and seekwatcher to try and get a feel for what the IO
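For reference, both read-ahead knobs in one place (values illustrative, following Mark's 32kb-2048kb suggestion above; rsize is a mount.ceph option, so check your kernel's mount.ceph man page before relying on it):

blockdev --setra 2048 /dev/sdi    # 2048 sectors = 1MB read-ahead on the OSD disk
mount -t ceph 172.18.11.60:6789:/ /mnt/cephfs -o name=admin,rsize=524288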
[ceph-users] All old pgs in stale after recreating all osds
On CentOS 6.4, Ceph 0.61.7.

I had a ceph cluster of 9 osds. Today I destroyed all of the osds, and recreated 6 new ones. Then I found all the old pgs are stale.

[root@ceph0 ceph]# ceph -s
   health HEALTH_WARN 192 pgs stale; 192 pgs stuck inactive; 192 pgs stuck stale; 192 pgs stuck unclean
   monmap e1: 3 mons at {ceph0=172.18.11.60:6789/0,ceph1=172.18.11.61:6789/0,ceph2=172.18.11.62:6789/0}, election epoch 24, quorum 0,1,2 ceph0,ceph1,ceph2
   osdmap e166: 6 osds: 6 up, 6 in
   pgmap v837: 192 pgs: 192 stale; 9526 bytes data, 221 MB used, 5586 GB / 5586 GB avail
   mdsmap e114: 0/0/1 up

[root@ceph0 ~]# ceph health detail
...
pg 2.3 is stuck stale for 10249.230667, current state stale, last acting [5]
...
[root@ceph0 ~]# ceph pg 2.3 query
i don't have pgid 2.3

How can I get all the pgs back or recreated? Thanks!
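Since the OSDs that held these PGs were destroyed, the data in them is gone for good; the remaining option is to recreate the PGs empty. A sketch (ceph pg force_create_pg is the relevant command in this era; the loop over ceph health detail is illustrative, and this should only be run if the data loss is acceptable):

ceph pg force_create_pg 2.3    # recreate a single stale pg as empty
# or sweep all stuck stale pgs:
ceph health detail | awk '/stuck stale/ {print $2}' | while read pg; do ceph pg force_create_pg $pg; done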
[ceph-users] How to set Object Size/Stripe Width/Stripe Count?
Hi list, I saw the info about data striping in http://ceph.com/docs/master/architecture/#data-striping, but couldn't find the way to set these values. Could you please tell me how to do that, or give me a link? Thanks!
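For rbd these parameters are per image, set at creation time. A sketch (pool, image name, and values are illustrative; custom striping needs --image-format 2, which kernel rbd of this era does not support, so this is for librbd/QEMU use):

rbd create mypool/myimage --size 10240 --image-format 2 --order 22 --stripe-unit 65536 --stripe-count 4

Here --order 22 means 4MB objects (2^22 bytes). For CephFS, file and directory layouts are set with the cephfs tool shipped alongside the kernel client; its exact flags vary by version, so check cephfs --help.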
[ceph-users] How to start/stop ceph daemons separately?
On Ubuntu, we can start/stop ceph daemons separately as below:

start ceph-mon id=ceph0
stop ceph-mon id=ceph0

How can we do this on CentOS or RHEL? Thanks!
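On CentOS/RHEL the packages ship a sysvinit script rather than upstart jobs, so the equivalents are (daemon ids illustrative):

service ceph start mon.ceph0
service ceph stop osd.0
/etc/init.d/ceph restart mds.ceph0    # same script, invoked directly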
Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?
Thanks! Neil. If cephfs and krbd are not used, does the default kernel work well with only QEMU/KVM/librbd? AFAIK, librbd doesn't have a dependency on the kernel version, right?

-- Original --
From: Neil Levine <neil.lev...@inktank.com>
Date: Thu, Aug 1, 2013 11:13 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

Yes, the default version should work.

Neil

On Wed, Jul 31, 2013 at 7:11 PM, Da Chun <ng...@qq.com> wrote:
Thanks! Neil and Wido. Neil, what about the libvirt version on CentOS 6.4? Just use the official release?

-- Original --
From: Neil Levine <neil.lev...@inktank.com>
Date: Thu, Aug 1, 2013 05:53 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

For CentOS 6.4, we have custom qemu packages available at http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support. You will need to install a newer kernel than the one which ships by default (2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be sufficient.

For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get the latest version of all necessary packages.

N

On Wed, Jul 31, 2013 at 7:18 AM, Da Chun <ng...@qq.com> wrote:
Hi List, I want to deploy two ceph clusters on Ubuntu 12.04 and CentOS 6.4 separately, and test cephfs, krbd, and librbd. Which kernel and QEMU/libvirt version do you recommend? Any specific patches which I should apply manually? Thanks for your time!
Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?
Neil, what's the difference between your custom qemu packages and the official ones? There are two kinds of packages in there:

qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64.rpm
qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm

What's the difference between them? Does the async version support aio flush?

-- Original --
From: Neil Levine <neil.lev...@inktank.com>
Date: Thu, Aug 1, 2013 11:13 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

Yes, the default version should work.

Neil

On Wed, Jul 31, 2013 at 7:11 PM, Da Chun <ng...@qq.com> wrote:
Thanks! Neil and Wido. Neil, what about the libvirt version on CentOS 6.4? Just use the official release?

-- Original --
From: Neil Levine <neil.lev...@inktank.com>
Date: Thu, Aug 1, 2013 05:53 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

For CentOS 6.4, we have custom qemu packages available at http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support. You will need to install a newer kernel than the one which ships by default (2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be sufficient.

For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get the latest version of all necessary packages.

N

On Wed, Jul 31, 2013 at 7:18 AM, Da Chun <ng...@qq.com> wrote:
Hi List, I want to deploy two ceph clusters on Ubuntu 12.04 and CentOS 6.4 separately, and test cephfs, krbd, and librbd. Which kernel and QEMU/libvirt version do you recommend? Any specific patches which I should apply manually? Thanks for your time!
[ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?
Hi List, I want to deploy two ceph clusters, on Ubuntu 12.04 and CentOS 6.4 separately, and test cephfs, krbd, and librbd. Which kernel and QEMU/libvirt version do you recommend? Any specific patches I should apply manually? Thanks for your time!
Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?
Sorry, forgot to mention the ceph version I want to use: the latest stable cuttlefish release, 0.61.7 currently.

-- Original --
From: Da Chun <ng...@qq.com>
Date: Wed, Jul 31, 2013 10:18 PM
To: ceph-users <ceph-users@lists.ceph.com>
Subject: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

Hi List, I want to deploy two ceph clusters, on Ubuntu 12.04 and CentOS 6.4 separately, and test cephfs, krbd, and librbd. Which kernel and QEMU/libvirt version do you recommend? Any specific patches I should apply manually? Thanks for your time!
Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?
Thanks! Neil and Wido. Neil, what about the libvirt version on CentOS 6.4? Just use the official release?

-- Original --
From: Neil Levine <neil.lev...@inktank.com>
Date: Thu, Aug 1, 2013 05:53 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Which Kernel and QEMU/libvirt version do you recommend on Ubuntu 12.04 and Centos?

For CentOS 6.4, we have custom qemu packages available at http://ceph.com/packages/ceph-extras/rpm/centos6 which will provide RBD support. You will need to install a newer kernel than the one which ships by default (2.6.32) to use the cephfs or krbd drivers. Any version above 3.x should be sufficient.

For Ubuntu 12.04, as per Wido's comments, use the Ubuntu Cloud Archive to get the latest version of all necessary packages.

N

On Wed, Jul 31, 2013 at 7:18 AM, Da Chun <ng...@qq.com> wrote:
Hi List, I want to deploy two ceph clusters on Ubuntu 12.04 and CentOS 6.4 separately, and test cephfs, krbd, and librbd. Which kernel and QEMU/libvirt version do you recommend? Any specific patches which I should apply manually? Thanks for your time!
[ceph-users] ceph fio read test hangs
On Ubuntu 13.04, ceph 0.61.4.

I was running an fio read test as below, then it hung:

root@ceph-node2:/mnt# fio -filename=/dev/rbd1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -size=50G -numjobs=16 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
...
mytest: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
2.0.8
Starting 16 threads
^Cbs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 02d:01h:34m:39s]
fio: terminating on signal 2
^Cbs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 02d:18h:36m:23s]
fio: terminating on signal 2
Jobs: 16 (f=16): [] [0.1% done] [0K/0K /s] [0 /0 iops] [eta 04d:07h:40m:55s]

The top command showed that one cpu was waiting for disk IO, and the other was idle:

top - 20:28:30 up 1 day, 6:02, 3 users, load average: 16.00, 13.91, 8.55
Tasks: 141 total, 1 running, 139 sleeping, 0 stopped, 1 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.3 us, 0.3 sy, 0.0 ni, 0.0 id, 99.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 4013924 total, 702112 used, 3311812 free, 3124 buffers
KiB Swap: 3903484 total, 184520 used, 3718964 free, 74156 cached

root@ceph-node4:~# ceph -s
   health HEALTH_OK
   monmap e5: 3 mons at {ceph-node0=172.18.11.30:6789/0,ceph-node2=172.18.11.32:6789/0,ceph-node4=172.18.11.34:6789/0}, election epoch 714, quorum 0,1,2 ceph-node0,ceph-node2,ceph-node4
   osdmap e4043: 11 osds: 11 up, 11 in
   pgmap v92429: 1192 pgs: 1192 active+clean; 530 GB data, 1090 GB used, 9041 GB / 10131 GB avail
   mdsmap e1: 0/0/1 up

No errors were found in the ceph.log. Anything else I can collect for investigation?
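One thing worth collecting (a diagnostic sketch; it assumes debugfs is available, which is where the kernel rbd client exposes its state) is the kernel client's list of in-flight OSD requests, which shows whether the hang is waiting on a particular OSD:

mount -t debugfs none /sys/kernel/debug    # if not already mounted
cat /sys/kernel/debug/ceph/*/osdc          # outstanding osd requests per mapped cluster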
Re: [ceph-users] Should the disk write cache be disabled?
Do you mean the write barrier? So all ceph disk partitions are mounted with barrier=1?

-- Original --
From: Gregory Farnum <g...@inktank.com>
Date: Wed, Jul 17, 2013 00:29 AM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Should the disk write cache be disabled?

Just old kernels, as they didn't correctly provide all the barriers and other ordering constraints necessary for the write cache to be used safely.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Jul 16, 2013 at 9:20 AM, Da Chun <ng...@qq.com> wrote:
In this doc, http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/, it says: Ceph aims for data safety, which means that when the Ceph Client receives notice that data was written to a storage drive, that data was actually written to the storage drive. For old kernels (<2.6.33), disable the write cache if the journal is on a raw drive. Newer kernels should work fine. Use hdparm to disable write caching on the hard disk:

sudo hdparm -W 0 /dev/hda

Does it mean the disk write cache should always be disabled for ceph, or just when using an old kernel (<2.6.33)? Thanks for your time!
[ceph-users] Should the disk write cache be disabled?
In this doc, http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/, it says: Ceph aims for data safety, which means that when the Ceph Client receives notice that data was written to a storage drive, that data was actually written to the storage drive. For old kernels (<2.6.33), disable the write cache if the journal is on a raw drive. Newer kernels should work fine. Use hdparm to disable write caching on the hard disk:

sudo hdparm -W 0 /dev/hda

Does it mean the disk write cache should always be disabled for ceph, or just when using an old kernel (<2.6.33)? Thanks for your time!
Re: [ceph-users] ceph iscsi questions
Kurt, do you have performance benchmark data for the tgt target? I ran a simple benchmark for the LIO iSCSI target. The ceph cluster is with default settings. The read performance is good, but the write performance is very poor from my point of view.

Performance of the mapped kernel rbd:

root@ceph-observer:/mnt/fs2# echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo sync
3
root@ceph-observer:/mnt/fs2# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.0333 s, 107 MB/s
root@ceph-observer:/mnt/fs2# echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo sync
3
root@ceph-observer:/mnt/fs2# dd if=test of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.0018 s, 107 MB/s

Performance of the LIO iSCSI target on a mapped kernel rbd:

root@ceph-observer:/mnt/fs3# echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo sync
3
root@ceph-observer:/mnt/fs3# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 21.3096 s, 50.4 MB/s
root@ceph-observer:/mnt/fs3# echo 3 | sudo tee /proc/sys/vm/drop_caches; sudo sync
3
root@ceph-observer:/mnt/fs3# dd if=test of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.70467 s, 102 MB/s

-- Original --
From: Kurt Bauer <kurt.ba...@univie.ac.at>
Date: Tue, Jun 18, 2013 08:38 PM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] ceph iscsi questions

Da Chun schrieb:
Thanks for sharing! Kurt. Yes, I have read the article you mentioned. But I also read another one: http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices. It uses LIO, which is the current standard Linux kernel SCSI target.

That has a major disadvantage, which is that you have to use the kernel rbd module, which is not feature-equivalent to the ceph userland code, at least in kernel versions which are shipped with recent distributions.

There is another doc on the ceph site: http://ceph.com/w/index.php?title=ISCSI&redirect=no

Quite outdated I think, last update nearly 3 years ago; I don't understand what the box in the middle should depict.

I don't quite understand how the multipath works here. Are the two iSCSI targets on the same system or two different ones? Has anybody tried this already?

Leen has illustrated that quite well.

-- Original --
From: Kurt Bauer <kurt.ba...@univie.ac.at>
Date: Tue, Jun 18, 2013 03:52 PM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] ceph iscsi questions

Hi,

Da Chun schrieb:
Hi List, I want to deploy a ceph cluster with the latest cuttlefish, and export it with an iscsi interface to my applications. Some questions here:
1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for testing purposes before.

For the ceph cluster or the iSCSI-GW? We use Ubuntu 12.04 LTS for the cluster and the iSCSI-GW, but tested Debian wheezy as iSCSI-GW too. Both work flawlessly.

2. Which iscsi target is better? LIO, SCST, or others?

Have you read http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ ? That's what we do, and it works without problems so far.

3. The system for the iscsi target will be a single point of failure. How to eliminate it and make good use of ceph's nature of distribution?

That's a question we asked ourselves too. In theory one can set up 2 iSCSI-GWs and use multipath, but what does that do to the cluster? Will something break if 2 iSCSI targets use the same rbd image in the cluster? Even if I use failover mode only? Has someone already tried this and is willing to share their knowledge?

Best regards,
Kurt

Thanks!
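For completeness, a minimal sketch of exporting an rbd image through tgt's rbd backing store for such a benchmark (assumes a tgt build with the rbd bstype from the stgt article above; target name, pool, and image are illustrative):

tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2013-06.com.example:rbd
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --backing-store mypool/myimage --bstype rbd
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL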
[ceph-users] Possible to bind one osd with a specific network adapter?
Hi List,

Each of my osd nodes has 5 Gb network adapters, and many osds, one disk per osd. They are all connected to a Gb switch. Currently I can get an average 100MB/s read/write speed. To improve the throughput further, the network bandwidth will be the bottleneck, right? I can't afford to replace all the adapters and the switch with 10Gb ones. How can I improve the throughput based on the current gear?

My first thought is to use bonding, as we have multiple adapters. But bonding has a performance cost and surely cannot multiplex the throughput. And it has a dependency on the switch.

My second thought is to group the adapters and osds. For example, we have three adapters called A1, A2, A3, and 6 osds called O1, O2, ..., O6. Let O1 and O2 use A1 exclusively, O3 and O4 use A2 exclusively, O5 and O6 use A3 exclusively. So they are separate groups; each group has its own disks and adapters, which are not shared. Only CPU and memory are shared between groups. Is it possible to do this with the current ceph implementation, as sketched below?

Thanks for your time and any ideas!
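For what it's worth, the second idea can be expressed in ceph.conf, since public addr and cluster addr may be set per daemon section (a sketch with illustrative addresses and osd ids; every address must remain reachable from clients and from peer OSDs):

[osd.1]
    public addr = 192.168.1.10    # adapter A1
[osd.2]
    public addr = 192.168.1.10
[osd.3]
    public addr = 192.168.1.11    # adapter A2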
Re: [ceph-users] Possible to bind one osd with a specific network adapter?
James, thank you! No, I have not separated the public and cluster networks yet. They are on the same switch. As I don't have many nodes now, the switch won't be the bottleneck currently.

-- Original --
From: James Harper <james.har...@bendigoit.com.au>
Date: Sat, Jun 22, 2013 12:41 PM
To: Da Chun <ng...@qq.com>; ceph-users <ceph-users@lists.ceph.com>
Subject: RE: [ceph-users] Possible to bind one osd with a specific network adapter?

Each of my osd nodes has 5 Gb network adapters, and many osds, one disk per osd. They are all connected to a Gb switch. Currently I can get an average 100MB/s read/write speed. To improve the throughput further, the network bandwidth will be the bottleneck, right?

Do you already have separate networks for public and cluster?

I can't afford to replace all the adapters and switch with 10Gb ones. How can I improve the throughput based on current gear? My first thought is to use bonding, as we have multiple adapters. But bonding has a performance cost, surely cannot multiplex the throughput. And it has a dependency on the switch.

LACP bonding should be okay. Each connection will only be 1gbit/second, but if you have multiple clients and multiple connections you could see improved performance. If you want to use plain round robin ordering at layer two, play with the net/ipv4/tcp_reordering value to improve things. I do this and iperf gives me 2gbit/second throughput, but with an increase in cpu of course.

My second thought is to group the adapters and osds. For example, we have three adapters called A1, A2, A3, and 6 osds called O1, O2, ..., O6. Let O1 and O2 use A1 exclusively, O3 and O4 use A2 exclusively, O5 and O6 use A3 exclusively. So they are separate groups; each group has its own disks and adapters, which are not shared. Only CPU and memory are shared between groups.

I tested something similar. Each server has two disks and 2 adapters assigned to the cluster network. Each adapter is on a different subnet. As long as each osd can reach each ip address (because it has adapters on both networks) it should be fine, and is probably better than bonding. Actual multipath would be nice for the public network, but LACP should give you an aggregate increase even if individual connections are still limited to the adapter link speed.

James
[ceph-users] How to change the journal size at run time?
Hi List,

The default journal size is 1G, which I think is too small for my Gb network. I want to extend all the journal partitions to 2 or 4G. How can I do that? The osds were all created by commands like "ceph-deploy osd create ceph-node0:/dev/sdb". The journal partition is on the same disk as the corresponding data partition.

I notice there is an attribute "osd journal size" whose value is 1024. I guess this is why the "ceph-deploy osd create" command set the journal partition size to 1G.

I plan to do this job with steps as below:
1. Change the osd journal size in ceph.conf to 4G.
2. Remove the osd.
3. Re-add the osd.
4. Repeat steps 2 and 3 for all the osds.

This needs lots of manual work and is time consuming. Are there better ways to do it? Thanks!
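A sequence that avoids removing and re-adding each OSD (a sketch; --flush-journal and --mkjournal are real ceph-osd flags, but this assumes the journal device or file can actually be grown, and the osd id is illustrative):

# in ceph.conf: osd journal size = 4096
stop ceph-osd id=0                # on sysvinit: service ceph stop osd.0
ceph-osd -i 0 --flush-journal     # safely drain the old journal into the data store
# recreate the journal file/partition at the new size here
ceph-osd -i 0 --mkjournal         # initialize the new, larger journal
start ceph-osd id=0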
[ceph-users] Error in ceph.conf when creating cluster with IP addresses as monitors
ceph@ceph-node0:~/test$ ceph-deploy new 172.18.11.30 172.18.11.32 172.18.11.34
ceph@ceph-node0:~/test$ cat ceph.conf
[global]
fsid = caf39355-bd8f-450e-b026-6001607e62cf
mon initial members = 172, 172, 172
mon host = 172.18.11.30,172.18.11.32,172.18.11.34
auth supported = cephx
osd journal size = 1024
filestore xattr use omap = true

The IP addresses have been truncated. Is it by design not to use an IP address as a name? My ceph-deploy is the latest one: ceph-deploy_1.0-1_all.deb. BTW, is there a version attribute in ceph-deploy for printing version info?
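ceph-deploy parses each argument as a hostname and uses the part before the first dot as the monitor's short name, which is why each IP collapses to 172. A workaround (assuming the names resolve, e.g. via /etc/hosts) is to pass hostnames instead of addresses:

ceph-deploy new ceph-node0 ceph-node2 ceph-node4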
Re: [ceph-users] Repository Mirroring
+1. I use apt-mirror to do this now.

-- Original --
From: John Nielsen <li...@jnielsen.net>
Date: Thu, Jun 20, 2013 00:21 AM
To: Joe Ryner <jry...@cait.org>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Repository Mirroring

On Jun 18, 2013, at 12:08 PM, Joe Ryner <jry...@cait.org> wrote:
I would like to make a local mirror of your yum repositories. Do you support any of the standard methods of syncing, aka rsync?

+1. Our Ceph boxes are firewalled from the Internet at large, and installing from a local mirror is faster and simpler than trying to go through a proxy. I'm currently using wget, but that's not very friendly since the web server hosting the repo doesn't issue last-modified headers, so I end up downloading everything every time.
Re: [ceph-users] ceph iscsi questions
Thanks for sharing! Kurt. Yes, I have read the article you mentioned. But I also read another one: http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices. It uses LIO, which is the current standard Linux kernel SCSI target.

There is another doc on the ceph site: http://ceph.com/w/index.php?title=ISCSI&redirect=no

I don't quite understand how the multipath works here. Are the two iSCSI targets on the same system or two different ones? Has anybody tried this already?

-- Original --
From: Kurt Bauer <kurt.ba...@univie.ac.at>
Date: Tue, Jun 18, 2013 03:52 PM
To: Da Chun <ng...@qq.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] ceph iscsi questions

Hi,

Da Chun schrieb:
Hi List, I want to deploy a ceph cluster with the latest cuttlefish, and export it with an iscsi interface to my applications. Some questions here:
1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for testing purposes before.

For the ceph cluster or the iSCSI-GW? We use Ubuntu 12.04 LTS for the cluster and the iSCSI-GW, but tested Debian wheezy as iSCSI-GW too. Both work flawlessly.

2. Which iscsi target is better? LIO, SCST, or others?

Have you read http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/ ? That's what we do, and it works without problems so far.

3. The system for the iscsi target will be a single point of failure. How to eliminate it and make good use of ceph's nature of distribution?

That's a question we asked ourselves too. In theory one can set up 2 iSCSI-GWs and use multipath, but what does that do to the cluster? Will something break if 2 iSCSI targets use the same rbd image in the cluster? Even if I use failover mode only? Has someone already tried this and is willing to share their knowledge?

Best regards,
Kurt

Thanks!
Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?
Thanks! Craig. umount works.

About the time skew, I saw the log said the time difference should be less than 50ms. I set up one of my nodes as the time server, and the others sync time with it. I don't know why the system time still changes frequently, especially after reboot. Maybe it's because all my nodes are VMware virtual machines, and the soft clock is not accurate enough.

-- Original --
From: Craig Lewis <cle...@centraldesktop.com>
Date: Tue, Jun 18, 2013 05:34 AM
To: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

If you followed the standard setup, each OSD is its own disk + filesystem. /var/lib/ceph/osd/ceph-2 is in use, as the mount point for the OSD.2 filesystem. Double check by examining the output of the `mount` command. I get the same error when I try to rename a directory that's used as a mount point.

Try `umount /var/lib/ceph/osd/ceph-2` instead of the mv and rm. The fuser command is telling you that the kernel has a filesystem mounted in that directory. Nothing else appears to be using it, so the umount should complete successfully.

Also, you should fix that time skew on mon.ceph-node5. The mailing list archives have several posts with good answers.

On 6/15/2013 2:14 AM, Da Chun wrote:
Hi all, on Ubuntu 13.04 with ceph 0.61.3. I want to remove osd.2 from my cluster. The following steps were performed:

root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at {ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
   osdmap e414: 6 osds: 5 up, 5 in
   pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}

2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: Device or resource busy

Everything was working OK until the last step to remove the osd.2 directory /var/lib/ceph/osd/ceph-2.

root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
                     USER   PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
                     root   kernel mount /var/lib/ceph/osd/ceph-2

// What does this mean?
root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

I restarted the system, and found that the osd.2 daemon was still running:

root@ceph-node6:~# ps aux | grep osd
root 1264 1.4 12.3 550940 125732 ? Ssl 16:41 0:20 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
root 2876 0.0 0.0 4440 628 ? Ss 16:44 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root 2877 4.9 18.2 613780 185676 ? Sl 16:44 1:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f

I had to take this workaround:

root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now
root@ceph-node6:~# ps aux | grep osd
root 1416 0.0 0.0 4440 628 ? Ss 17:10 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root 1417 8.9 5.8 468052 59868 ? Sl 17:10 0:02
Re: [ceph-users] Live Migrations with cephFS
OpenStack Grizzly VMs can be started on rbd (0.61.3) with no problem. I didn't try live migration though.

-- Original --
From: Wolfgang Hennerbichler <wolfgang.hennerbich...@risc-software.at>
Date: Mon, Jun 17, 2013 02:00 PM
To: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Live Migrations with cephFS

On 06/14/2013 08:00 PM, Ilja Maslov wrote:
Hi, Is live migration supported with RBD and KVM/OpenStack? Always wanted to know but was afraid to ask :)

totally works in my productive setup. but we don't use openstack in this installation, just KVM/RBD.

Pardon brevity and formatting, replying from the phone.
Cheers, Ilja

Robert Sander <r.san...@heinlein-support.de> wrote:
On 14.06.2013 12:55, Alvaro Izquierdo Jimeno wrote:
By default, openstack uses NFS but… other options are available….can we use cephFS instead of NFS?

Wouldn't you use qemu-rbd for your virtual guests in OpenStack? AFAIK CephFS is not needed for KVM/qemu virtual machines.

Regards
--
Robert Sander, Heinlein Support GmbH

--
DI (FH) Wolfgang Hennerbichler, Software Development Unit Advanced Computing Technologies, RISC Software GmbH
[ceph-users] ceph iscsi questions
Hi List,

I want to deploy a ceph cluster with the latest cuttlefish, and export it with an iscsi interface to my applications. Some questions:

1. Which Linux distro and release would you recommend? I used Ubuntu 13.04 for testing purposes before.
2. Which iscsi target is better? LIO, SCST, or others?
3. The system for the iscsi target will be a single point of failure. How to eliminate it and make good use of ceph's distributed nature?

Thanks!
[ceph-users] How to remove /var/lib/ceph/osd/ceph-2?
Hi all, on Ubuntu 13.04 with ceph 0.61.3.

I want to remove osd.2 from my cluster. The following steps were performed:

root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at {ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
   osdmap e414: 6 osds: 5 up, 5 in
   pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}

2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: Device or resource busy

Everything was working OK until the last step to remove the osd.2 directory /var/lib/ceph/osd/ceph-2.

root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
                     USER   PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
                     root   kernel mount /var/lib/ceph/osd/ceph-2

// What does this mean?
root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

I restarted the system, and found that the osd.2 daemon was still running:

root@ceph-node6:~# ps aux | grep osd
root 1264 1.4 12.3 550940 125732 ? Ssl 16:41 0:20 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
root 2876 0.0 0.0 4440 628 ? Ss 16:44 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root 2877 4.9 18.2 613780 185676 ? Sl 16:44 1:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f

I had to take this workaround:

root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now
root@ceph-node6:~# ps aux | grep osd
root 1416 0.0 0.0 4440 628 ? Ss 17:10 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster=${cluster:-ceph} -i $id -f /bin/sh
root 1417 8.9 5.8 468052 59868 ? Sl 17:10 0:02 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root@ceph-node6:~# rm -r /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

Any idea? HELP!
[ceph-users] Failed to stop osd.
On Ubuntu 13.04 with ceph 0.61.3. I'm trying to remove one of the osds according to this guide: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#stopping-the-osd First I took it out of the cluster: root@ceph-node4:~# ceph osd out osd.0 root@ceph-node4:~# ceph -w /// wait until the recovery finished. Then I stopped the daemon: root@ceph-node4:~# /etc/init.d/ceph stop osd.0 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines ) The directory tree /var/lib/ceph is as below: root@ceph-node4:~# tree -L 2 /var/lib/ceph/ /var/lib/ceph/ ├── bootstrap-mds │ └── ceph.keyring ├── bootstrap-osd │ └── ceph.keyring ├── mds │ └── ceph-ceph-node4 ├── mon │ └── ceph-ceph-node4 ├── osd │ ├── ceph-0 │ └── ceph-3 └── tmp It seems there is some mismatch in the naming.___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Failed to stop osd.
Finally I managed to stop it with: stop ceph-osd id=0 -- Original -- From: Da Chun ng...@qq.com; Date: Fri, Jun 14, 2013 05:00 PM To: ceph-users ceph-users@lists.ceph.com; Subject: [ceph-users] Failed to stop osd. On Ubuntu 13.04 with ceph 0.61.3. I'm trying to remove one of the osds according to this guide: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#stopping-the-osd First I took it out of the cluster: root@ceph-node4:~# ceph osd out osd.0 root@ceph-node4:~# ceph -w /// wait until the recovery finished. Then I stopped the daemon: root@ceph-node4:~# /etc/init.d/ceph stop osd.0 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines ) The directory tree /var/lib/ceph is as below: root@ceph-node4:~# tree -L 2 /var/lib/ceph/ /var/lib/ceph/ ├── bootstrap-mds │ └── ceph.keyring ├── bootstrap-osd │ └── ceph.keyring ├── mds │ └── ceph-ceph-node4 ├── mon │ └── ceph-ceph-node4 ├── osd │ ├── ceph-0 │ └── ceph-3 └── tmp It seems there is some mismatch in the naming.___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
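For reference, on Ubuntu the cuttlefish packages drive the daemons through upstart rather than the /etc/init.d/ceph sysvinit script, so the job-style commands are the ones that work. Job names here are as shipped in the Ubuntu packaging at the time; check /etc/init for the exact set on your release:

stop ceph-osd id=0       # stop one OSD
start ceph-osd id=0      # start it again
status ceph-osd id=0     # query its state
stop ceph-osd-all        # stop every OSD on this node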
[ceph-users] ceph-deploy osd create hangs
On Ubuntu 13.04 with ceph 0.61.3. It hangs when creating a new osd using ceph-deploy. ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdd ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdb ceph@ceph-node4:~/mycluster$ ceph-deploy osd create ceph-node4:sdb:sdd ^CTraceback (most recent call last): File "/home/ceph/ceph-deploy/ceph-deploy", line 9, in <module> load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')() File "/home/ceph/ceph-deploy/ceph_deploy/cli.py", line 112, in main return args.func(args) File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 425, in osd prepare(args, cfg, activate_prepared_disk=True) File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 265, in prepare dmcrypt_dir=args.dmcrypt_key_dir, File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/proxy.py", line 255, in <lambda> (conn.operator(type_, self, args, kwargs)) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py", line 66, in operator return self.send_request(type_, (object, args, kwargs)) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 315, in send_request m = self.__waitForResponse(handler) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 412, in __waitForResponse self.__processing_condition.wait() File "/usr/lib/python2.7/threading.py", line 339, in wait waiter.acquire() ps aux | grep ceph ceph 4015 0.0 1.1 118412 11404 pts/1 Sl+ 20:51 0:00 /home/ceph/ceph-deploy/virtualenv/bin/python /home/ceph/ceph-deploy/ceph-deploy osd create ceph-node4:sdb:sdd root 4043 0.0 0.0 628 pts/1 S+ 20:51 0:00 /bin/sh /usr/sbin/ceph-disk-prepare -- /dev/sdb /dev/sdd root 4049 0.1 0.9 43216 9876 pts/1 S+ 20:51 0:00 /usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdb /dev/sdd Any idea? Thanks!___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy osd create hangs
OK, I've fixed it. Previously I had run ceph-deploy osd create ceph-node4:sdb by mistake and terminated it with Ctrl-C, so the lock on /var/lib/ceph/tmp/ceph-disk.prepare.lock.lock was never released, and the next ceph-deploy osd create hung waiting for the lock. It's a user error, but not easy to locate. To avoid this problem, maybe we can catch SIGINT in the ceph-disk command:

import signal
import sys

def signal_handler(signum, frame):
    # Release the prepare lock before exiting on Ctrl-C, so an
    # interrupted run doesn't leave a stale lock file behind.
    prepare_lock.release()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

Or at least, for better problem determination, IMHO, ceph-deploy osd prepare should print a meaningful error message instead of hanging. -- Original -- From: Da Chun ng...@qq.com; Date: Fri, Jun 14, 2013 09:13 PM To: ceph-users ceph-users@lists.ceph.com; Subject: [ceph-users] ceph-deploy osd create hangs On Ubuntu 13.04 with ceph 0.61.3. It hangs when creating a new osd using ceph-deploy. ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdd ceph@ceph-node4:~/mycluster$ ceph-deploy disk zap ceph-node4:sdb ceph@ceph-node4:~/mycluster$ ceph-deploy osd create ceph-node4:sdb:sdd ^CTraceback (most recent call last): File "/home/ceph/ceph-deploy/ceph-deploy", line 9, in <module> load_entry_point('ceph-deploy==0.1', 'console_scripts', 'ceph-deploy')() File "/home/ceph/ceph-deploy/ceph_deploy/cli.py", line 112, in main return args.func(args) File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 425, in osd prepare(args, cfg, activate_prepared_disk=True) File "/home/ceph/ceph-deploy/ceph_deploy/osd.py", line 265, in prepare dmcrypt_dir=args.dmcrypt_key_dir, File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/proxy.py", line 255, in <lambda> (conn.operator(type_, self, args, kwargs)) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py", line 66, in operator return self.send_request(type_, (object, args, kwargs)) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 315, in send_request m = self.__waitForResponse(handler) File "/home/ceph/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 412, in __waitForResponse self.__processing_condition.wait() File "/usr/lib/python2.7/threading.py", line 339, in wait waiter.acquire() ps aux | grep ceph ceph 4015 0.0 1.1 118412 11404 pts/1 Sl+ 20:51 0:00 /home/ceph/ceph-deploy/virtualenv/bin/python /home/ceph/ceph-deploy/ceph-deploy osd create ceph-node4:sdb:sdd root 4043 0.0 0.0 628 pts/1 S+ 20:51 0:00 /bin/sh /usr/sbin/ceph-disk-prepare -- /dev/sdb /dev/sdd root 4049 0.1 0.9 43216 9876 pts/1 S+ 20:51 0:00 /usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdb /dev/sdd Any idea? Thanks!___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
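Until something like that is merged, a manual workaround after an interrupted run is to clear the stale lock on the target node and retry (lock path taken from the hang described above):

ls /var/lib/ceph/tmp/                                  # look for leftover ceph-disk lock files
rm /var/lib/ceph/tmp/ceph-disk.prepare.lock.lock       # remove the stale lock
ceph-deploy osd create ceph-node4:sdb:sdd              # retry the create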
Re: [ceph-users] Why does ceph need a filesystem (was Simulating Disk Failure)
I have the same question too. I know Ceph was based on a simple fs of its own years ago. I'd like to hear some more details. -- Original -- From: James Harper james.har...@bendigoit.com.au; Date: Sat, Jun 15, 2013 11:07 AM To: Gregory Farnum g...@inktank.com; Craig Lewis cle...@centraldesktop.com; Cc: ceph-us...@ceph.com; Subject: [ceph-users] Why does ceph need a filesystem (was Simulating Disk Failure) Yeah. You've picked up on some warty bits of Ceph's error handling here for sure, but it's exacerbated by the fact that you're not simulating what you think. In a real disk error situation the filesystem would be returning EIO or something, but here it's returning ENOENT. Since the OSD is authoritative for that key space and the filesystem says there is no such object, presto! It doesn't exist. If you restart the OSD it does a scan of the PGs on-disk as well as what it should have, and can pick up on the data not being there and recover. But correctly handling data that has been (from the local FS' perspective) properly deleted under a running process would require huge and expensive contortions on the part of the daemon (in any distributed system that I can think of). -Greg Why was the decision made for ceph to require an underlying filesystem, rather than direct access to disk (like drbd does)? All of my recent disk failures have been unrecoverable read errors (pending sector in SMART stats), which are easy enough to repair in the short term just by rewriting with a known good copy of the data (assuming that there isn't some other underlying cause and this was just a power-off-at-the-wrong-moment error). Unfortunately because of the disconnect between ceph and the LBA this can't be done by ceph. Just curious... Thanks James ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
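A small aside on the recovery path Greg describes: after restarting the OSD, a scrub can be forced so a missing replica is noticed and repaired without waiting for the next scheduled pass. These are standard CLI commands; the pgid below is a placeholder:

ceph pg scrub 0.1f       # compare replicas within one placement group
ceph pg repair 0.1f      # overwrite the bad/missing copy from a good replica
ceph osd scrub 2         # or scrub every PG hosted on osd.2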
Re: [ceph-users] ceph mount: Only 240 GB , should be 60TB
Sage, I have the same issue with ceph 0.61.3 on Ubuntu 13.04. ceph@ceph-node4:~/mycluster$ df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/ubuntu1304--64--vg-root 15G 1.5G 13G 11% / none 4.0K 0 4.0K 0% /sys/fs/cgroup udev 487M 4.0K 487M 1% /dev tmpfs 100M 284K 100M 1% /run none 5.0M 0 5.0M 0% /run/lock none 498M 0 498M 0% /run/shm none 100M 0 100M 0% /run/user /dev/sda1 228M 34M 183M 16% /boot /dev/sdc1 14G 4.4G 9.7G 32% /var/lib/ceph/osd/ceph-3 /dev/sdb1 9.0G 1.6G 7.5G 18% /var/lib/ceph/osd/ceph-0 172.18.46.34:6789:/ 276M 94M 183M 34% /mnt/mycephfs # which should be about 70G. ceph@ceph-node4:~/mycluster$ uname -a Linux ceph-node4 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux -- Original -- From: Sage Weil s...@inktank.com; Date: Wed, Jun 12, 2013 11:45 PM To: Markus Goldberg goldb...@uni-hildesheim.de; Cc: ceph-users ceph-users@lists.ceph.com; Subject: Re: [ceph-users] ceph mount: Only 240 GB , should be 60TB Hi Markus, What version of the kernel are you using on the client? There is an annoying compatibility issue with older glibc that makes representing large values for statfs(2) (df) difficult. We switched this behavior to hopefully do things the better/more right way for the future, but it's possible you have an odd version or combination that gives goofy results. sage On Wed, 12 Jun 2013, Markus Goldberg wrote: Hi, this is cuttlefish 0.63 on Ubuntu 13.04, underlying OSD-FS is btrfs, 3 servers, each of them 20TB (Raid6-array) When I mount at the client (or at one of the servers) the mounted filesystem is only 240GB but it should be 60TB. root@bd-0:~# cat /etc/ceph/ceph.conf [global] fsid = e0dbf70d-af59-42a5-b834-7ad739a7f89b mon_initial_members = bd-0, bd-1, bd-2 mon_host = ###.###.###.20,###.###.###.21,###.###.###.22 auth_supported = cephx public_network = ###.###.###.0/24 cluster_network = 192.168.1.0/24 osd_mkfs_type = btrfs osd_mkfs_options_btrfs = -n 32k -l 32k osd_mount_options_btrfs = rw,noatime,nodiratime,autodefrag osd_journal_size = 10240 df on one of the servers: root@bd-0:~# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 39G 4,5G 32G 13% / none 4,0K 0 4,0K 0% /sys/fs/cgroup udev 16G 12K 16G 1% /dev tmpfs 3,2G 852K 3,2G 1% /run none 5,0M 4,0K 5,0M 1% /run/lock none 16G 0 16G 0% /run/shm none 100M 0 100M 0% /run/user /dev/sdc1 20T 6,6M 20T 1% /var/lib/ceph/osd/ceph-0 root@bd-0:~# root@bd-0:~# ceph -s health HEALTH_OK monmap e1: 3 mons at {bd-0=###.###.###.20:6789/0,bd-1=###.###.###.21:6789/0,bd-2=###.###.###.22:6789/0}, election epoch 66, quorum 0,1,2 bd-0,bd-1,bd-2 osdmap e109: 3 osds: 3 up, 3 in pgmap v848: 192 pgs: 192 active+clean; 23239 bytes data, 16020 KB used, 61402 GB / 61408 GB avail mdsmap e56: 1/1/1 up {0=bd-1=up:active}, 2 up:standby root@bd-0:~# at the client: root@bs4:~# root@bs4:~# mount -t ceph ###.###.###.20:6789:/ /mnt/myceph -v -o name=admin,secretfile=/etc/ceph/admin.secret parsing options: rw,name=admin,secretfile=/etc/ceph/admin.secret root@bs4:~# df -h Dateisystem Größe Benutzt Verf. Verw% Eingehängt auf /dev/sda1 28G 3,0G 24G 12% / none 4,0K 0 4,0K 0% /sys/fs/cgroup udev 998M 4,0K 998M 1% /dev tmpfs 201M 708K 200M 1% /run none 5,0M 0 5,0M 0% /run/lock none 1002M 84K 1002M 1% /run/shm none 100M 0 100M 0% /run/user ###.###.###.20:6789:/ 240G 25M 240G 1% /mnt/myceph root@bs4:~# root@bs4:~# cd /mnt/myceph root@bs4:/mnt/myceph# mkdir Test root@bs4:/mnt/myceph# cd Test root@bs4:/mnt/myceph/Test# touch testfile root@bs4:/mnt/myceph/Test# ls -la insgesamt 0 drwxr-xr-x 1 root root 0 Jun 12 2013 . drwxr-xr-x 1 root root 0 Jun 12 10:17 .. -rw-r--r-- 1 root root 0 Jun 12 10:18 testfile root@bs4:/mnt/myceph/Test# pwd /mnt/myceph/Test root@bs4:/mnt/myceph/Test# df -h . Dateisystem Größe Benutzt Verf. Verw% Eingehängt auf ###.###.###.20:6789:/ 240G 25M 240G 1% /mnt/myceph BTW /dev/sda on the servers are 256GB-SSDs Can anyone please help ? Thank you, Markus -- MfG, Markus Goldberg Markus Goldberg | Universität
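One way to narrow this down is to look at the raw statfs(2) numbers the kernel client returns, since that is exactly what df consumes; if the block counts look right but the reported block size is small (or vice versa), the glibc/kernel representation issue Sage mentions is the likely culprit:

stat -f /mnt/myceph      # prints the block size and total/free/available block counts from statfs(2)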
Re: [ceph-users] core dump: qemu-img info -f rbd
Yes, it works with -f raw. “qemu-img convert” has the same problem: qemu-img convert -f qcow2 -O rbd cirros-0.3.0-x86_64-disk.img rbd:vm_disks/test_disk2 // core dump qemu-img convert -f qcow2 -O raw cirros-0.3.0-x86_64-disk.img rbd:vm_disks/test_disk2 // working -- Original -- From: Oliver Francke oliver.fran...@filoo.de; Date: Thu, Jun 6, 2013 07:14 PM To: ceph-users ceph-users@lists.ceph.com; Subject: Re: [ceph-users] core dump: qemu-img info -f rbd Hi, On 06/06/2013 08:12 AM, Jens Kristian Søgaard wrote: Hi, I got a core dump when executing: root@ceph-node1:~# qemu-img info -f rbd rbd:vm_disks/box1_disk1 Try leaving out -f rbd from the command - I have seen that make a difference before. ... or try -f raw. The same goes for the -drive format=raw,file=... specification. The former format=rbd does not work anymore. Basically the format _is_ raw ;) Oliver. -- Oliver Francke filoo GmbH Moltkestraße 25a 0 Gütersloh HRB4355 AG Gütersloh Geschäftsführer: S.Grewing | J.Rehpöhler | C.Kunz Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
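To summarize the combinations that behave (raw is effectively the native format for rbd-backed images; test_disk3 below is just an illustrative name):

qemu-img info rbd:vm_disks/box1_disk1                                    # let qemu-img probe the format itself
qemu-img convert -f qcow2 -O raw cirros-0.3.0-x86_64-disk.img rbd:vm_disks/test_disk2
qemu-img create -f raw rbd:vm_disks/test_disk3 10G                       # creating a fresh image works the same way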
[ceph-users] Failed to clone ceph
Failed to clone ceph. Do you have the same problem? root@ceph-node7:~/workspace# git clone --recursive https://github.com/ceph/ceph.git Cloning into 'ceph'... remote: Counting objects: 192874, done. remote: Compressing objects: 100% (41154/41154), done. remote: Total 192874 (delta 155848), reused 186400 (delta 149917) Receiving objects: 100% (192874/192874), 39.74 MiB | 8 KiB/s, done. Resolving deltas: 100% (155848/155848), done. Submodule 'ceph-object-corpus' (git://ceph.com/git/ceph-object-corpus.git) registered for path 'ceph-object-corpus' Submodule 'src/libs3' (git://github.com/ceph/libs3.git) registered for path 'src/libs3' Cloning into 'ceph-object-corpus'... remote: Counting objects: 6255, done. remote: Compressing objects: 100% (3462/3462), done. Receiving objects: 10% (626/6255), 72.00 KiB | 127 KiB/s fatal: read error: Connection reset by peer fatal: early EOF fatal: recursion detected in die handler Clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule path 'ceph-object-corpus' failed root@ceph-node7:~/workspace# git clone git://ceph.com/git/ceph-object-corpus.git Cloning into 'ceph-object-corpus'... remote: Counting objects: 6255, done. remote: Compressing objects: 100% (3462/3462), done. Receiving objects: … KiB | 102 KiB/s fatal: read error: Connection reset by peer fatal: early EOF fatal: recursion detected in die handler___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
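If the git:// port is blocked or the remote daemon keeps resetting connections, one workaround is to clone without --recursive and rewrite the submodule URLs to https before fetching them (this assumes the same repositories are reachable over https):

git clone https://github.com/ceph/ceph.git
cd ceph
git config url."https://".insteadOf git://       # rewrite git:// URLs to https:// for this repo
git submodule update --init --recursive          # now fetch ceph-object-corpus and libs3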