[ceph-users] fs as btrfs and ceph journal
Hello,

I'm using btrfs for the OSDs and would like to know whether it still helps to have the journal on a faster drive. From what I've read, I'm under the impression that with btrfs's own journaling, the OSD journal doesn't do much work anymore.

Best regards,
Cristian Falcas

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
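For reference, the journal location is a per-OSD setting in ceph.conf, so moving it to a faster device is just a configuration change; a minimal sketch (the device path and size below are hypothetical examples, not from this thread):

```ini
; Sketch only: path and size are hypothetical examples
[osd]
    osd journal size = 10240                            ; journal size in MB

[osd.0]
    osd journal = /dev/disk/by-partlabel/journal-osd0   ; journal on a faster (e.g. SSD) partition
```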
Re: [ceph-users] Forcing ceph-mon to bind to a certain IP address
On 07/23/2014 09:23 PM, fake rao wrote:
> I would like ceph-mon to bind to 0.0.0.0 since it is running on a machine that gets its IP from a DHCP server and the IP changes on every boot. Is there a way to specify this in the ceph.conf file?
> Thanks, Akshay

Not possible. The IPs of the mons have to be static and are used in the quorum-forming process, so dynamic IPs for a monitor are not an option.

Wido

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
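As background: the monitors' static addresses are normally pinned in ceph.conf, one section per mon; a sketch with hypothetical host names and addresses:

```ini
; Sketch: host names and IPs are hypothetical examples
[mon.a]
    host = mon-a
    mon addr = 192.168.0.10:6789

[mon.b]
    host = mon-b
    mon addr = 192.168.0.11:6789
```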
Re: [ceph-users] radosgw monitoring
Thanks, zhu qiang, for your response. That means the logs are the only means by which we can monitor radosgw instances for incoming user request traffic (uploading and downloading of stored data) and for monitoring other radosgw features; no external monitoring tool, such as Calamari, Nagios, collectd, Zabbix etc., provides the functionality to monitor radosgw instances. Am I right?

Thanks again,
Pragya Jain

On Friday, 25 July 2014 8:12 PM, zhu qiang zhu_qiang...@foxmail.com wrote:

Hi, maybe you can try the approaches below:
1. Set "debug rgw = 2", then view the radosgw daemon's log; you can also use sed/grep/awk to extract the info you want.
2. Periodically run the "ceph daemon client.radosgw.X perf dump" command to get the statistics from the radosgw daemon.
This is all I know; I hope it is useful for you.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of pragya jain
Sent: Friday, July 25, 2014 6:39 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] radosgw monitoring

Hi all,
Please suggest some open source monitoring tools which can monitor radosgw instances for incoming user request traffic (uploading and downloading of stored data) and for monitoring other radosgw features.

Regards,
Pragya Jain
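Since `perf dump` emits JSON, wiring it into an external tool yourself is straightforward; a minimal Python sketch (the daemon name `client.radosgw.gateway` and the counter names in the sample are assumptions for illustration, and real counter names vary by version):

```python
import json
import subprocess

def radosgw_perf_dump(raw_json=None):
    """Parse the output of `ceph daemon client.radosgw.X perf dump`.

    Pass raw_json to parse already-captured output; otherwise query the
    admin socket (the daemon name below is a hypothetical example).
    """
    if raw_json is None:
        raw_json = subprocess.check_output(
            ["ceph", "daemon", "client.radosgw.gateway", "perf", "dump"])
    return json.loads(raw_json)

# Made-up sample of the kind of counters rgw exposes
sample = '{"rgw": {"req": 1500, "failed_req": 3, "get": 900, "put": 600}}'
stats = radosgw_perf_dump(sample)
print(stats["rgw"]["req"], stats["rgw"]["failed_req"])  # 1500 3
```

A cron job or a Nagios/Zabbix check could call this and alert on the failure counters.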
Re: [ceph-users] More problems building Ceph....
On 07/25/2014 08:39 PM, Noah Watkins wrote: You can rm -rf those submodule directories and then re-run submodule init/update to put the tree in a good state without re-cloning.

FWIW, as part of my build procedure I usually go with:

  git submodule foreach 'git clean -fdx'
  git clean -fdx
  [remote update / checkout if necessary]
  git submodule update --init
  (autogen configure) || do_autogen.sh
  make [-j X]
  ??? profit

Also, Deven, I found that using ccache and distcc on my cubietruck helped the build a lot, especially having my (x86_64) server assist in the cross-compilation effort. Further, if you are compiling on some SoC with low RAM you may find it necessary at some point to change the Makefile and remove 'ceph-dencoder' from the build. If that's the case, you will soon find out that just trying to link that bad boy will deplete your RAM and the build will be OOM-killed (this was an issue with the 512MB of the Pi, for instance, not so much with the cubietruck).

-Joao

On Fri, Jul 25, 2014 at 12:10 PM, Deven Phillips deven.phill...@gmail.com wrote:
Noah, that DOES appear to have been at least part of the problem... The src/libs3/ directory was empty, and when I tried to use submodules to update it I got errors about non-empty directories... Trying to fix that now.
Thanks!
Deven

On Fri, Jul 25, 2014 at 2:51 PM, Noah Watkins noah.watk...@inktank.com wrote:
Make sure you are initializing the submodules (the autogen.sh script should probably notify users when these are missing and/or initialize them automatically):

  git submodule init
  git submodule update

or alternatively:

  git clone --recursive ...

On Fri, Jul 25, 2014 at 11:48 AM, Deven Phillips deven.phill...@gmail.com wrote:
I'm trying to build DEB packages for my armhf devices, but my most recent efforts are dying. Any suggestions would be MOST welcome!
make[5]: Entering directory `/home/cubie/Source/ceph/src/java'
jar cf libcephfs.jar -C java com/ceph/fs/CephMount.class -C java com/ceph/fs/CephStat.class -C java com/ceph/fs/CephStatVFS.class -C java com/ceph/fs/CephNativeLoader.class -C java com/ceph/fs/CephNotMountedException.class -C java com/ceph/fs/CephFileAlreadyExistsException.class -C java com/ceph/fs/CephAlreadyMountedException.class -C java com/ceph/fs/CephNotDirectoryException.class -C java com/ceph/fs/CephPoolException.class -C java com/ceph/fs/CephFileExtent.class -C java com/ceph/crush/Bucket.class
export CLASSPATH=:/usr/share/java/junit4.jar:java/:test/ ; \
javac -source 1.5 -target 1.5 -Xlint:-options test/com/ceph/fs/*.java
jar cf libcephfs-test.jar -C test com/ceph/fs/CephDoubleMountTest.class -C test com/ceph/fs/CephMountCreateTest.class -C test com/ceph/fs/CephMountTest.class -C test com/ceph/fs/CephUnmountedTest.class -C test com/ceph/fs/CephAllTests.class
make[5]: Leaving directory `/home/cubie/Source/ceph/src/java'
make[4]: Leaving directory `/home/cubie/Source/ceph/src/java'
Making all in libs3
make[4]: Entering directory `/home/cubie/Source/ceph/src/libs3'
make[4]: *** No rule to make target `all'. Stop.
make[4]: Leaving directory `/home/cubie/Source/ceph/src/libs3'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/cubie/Source/ceph/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/cubie/Source/ceph/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/cubie/Source/ceph'
make: *** [build-stamp] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status 2

Thanks in advance!
Deven

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
Re: [ceph-users] Optimal OSD Configuration for 45 drives?
On 25 Jul 2014, at 5:54 pm, Christian Balzer ch...@gol.com wrote:

On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:

Hi, I've purchased a couple of 45Drives enclosures and would like to figure out the best way to configure these for Ceph.

That's the second time within a month somebody mentions these 45-drive chassis. Would you mind elaborating on which enclosures these are, precisely? I'm wondering especially about the backplane, as 45 is such an odd number.

The chassis is from 45drives.com. It has 3 rows of 15 direct-wire SAS connectors, connected to two Highpoint Rocket 750s using 12 SFF-8087 connectors. I'm considering replacing the Highpoints with 3x LSI 9201-16i cards. The chassis are loaded up with 45 Seagate 4TB drives, and separate from the 45 large drives are the two boot drives in RAID 1.

Also, if you don't mind, specify how many and what your net storage requirements are.

Total is 3 of these 45drives.com enclosures, for 3 replicas of our data.

In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
---

Mainly I was wondering if it was better to set up multiple RAID groups and then put an OSD on each, rather than an OSD for each of the 45 drives in the chassis?

Steve already toed the conservative Ceph party line here; let me give you some alternative views and options on top of that, and recap what I wrote in the thread above. In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---

Let's go from "cheap and cheerful" to "comes with racing stripes".

1) All spinning rust, all the time. Plunk in 45 drives as JBOD behind the cheapest (and densest) controllers you can get. Having the journal on the disks will halve their performance, but you just wanted the space and are not that pressed for IOPS. The best you can expect per node with this setup is something around 2300 IOPS with normal (7200RPM) disks.
2) Same as 1), but use controllers with a large HW cache (a 4GB Areca comes to mind) in JBOD (or 45 times RAID0) mode. This will alleviate some of the thrashing problems, particularly if you're expecting high IOPS to come in short bursts.

3) Ceph classic, basically what Steve wrote: 32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of journals). This will give you a sustainable 3200 IOPS, and of course the journals on SSDs not only avoid all that thrashing about on the disk but also allow for coalescing of writes, so this is going to be the fastest solution so far. Of course you will need 3 of these at minimum for acceptable redundancy, unlike 4), which just needs a replication level of 2.

4) The anti-cephalopod. See my reply from a month ago in the link above. All the arguments apply; it very much depends upon your use case and budget. In my case the higher density, lower cost and ease of maintaining the cluster were well worth the lower IOPS.

5) We can improve upon 3) by using HW-cached controllers, of course. And hey, you did need to connect those drive bays somehow anyway. ^o^ Maybe even squeeze some more out of it by having the SSD controller separate from the HDD one(s). This is as fast (IOPS) as it comes without going to full SSD.

Thanks. "All spinning rust" will probably be fine; we're looking to just store full server backups for a long time, so there's no high IO expected or anything like that. The servers came with some pretty underpowered specs re: CPU/RAM (they support a max of 32GB each and a single socket), but at some point I plan to upgrade the motherboard to allow much, much more RAM to be fitted. Mainly the reason I ask whether it's a good idea to set up RAID groups for the OSDs is that I can't put 96GB of RAM in these and can't put enough CPU power into them. I'm imagining it'll all start to fall to pieces if I try to operate these with Ceph due to the small amount of RAM and CPU?
Networking: Either of the setups above will saturate a single 10Gb/s (aka 1GB/s) link, as Steve noted. In fact, 3) to 5) will be able to write up to 4GB/s in theory, based on the HDDs' sequential performance, but that is unlikely to be seen in real life. And of course your maximum write speed is bounded by the speed of the SSDs. So, for example, with 3) you would want those 8 SSDs to have write speeds of about 250MB/s, giving you 2GB/s max write. Which in turn means 2 10Gb/s links at least, up to 4 if you want redundancy and/or a separation of public and cluster networks.

RAM: The more, the merrier. It's relatively cheap, and avoiding having to actually read from the disks will make your write IOPS so much happier.

CPU: You'll want something like Steve recommended for 3); I'd go with 2 8-core CPUs actually, so you have some oomph to spare for the OS, IRQ handling, etc. With 4) and its 4 actual OSDs, about half of that will be fine, with the expectation of Ceph code improvements.
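Christian's figures reduce to back-of-the-envelope arithmetic; the ~100 IOPS per 7200RPM disk used below is my assumption, not a number from the thread:

```python
PER_DISK_IOPS = 100        # rough write IOPS of one 7200RPM disk (assumption)

# Option 1: journal co-located on each of the 45 disks halves effective IOPS
opt1_iops = 45 * PER_DISK_IOPS // 2
print(opt1_iops)           # 2250, i.e. "around 2300 IOPS"

# Option 3: 32 HDDs with the journals moved to 8 SSDs, so no halving
opt3_iops = 32 * PER_DISK_IOPS
print(opt3_iops)           # 3200

# The journal SSDs cap write bandwidth: 8 SSDs at ~250MB/s each
max_write_mb = 8 * 250     # 2000MB/s, i.e. 2GB/s
# One 10Gb/s link carries roughly 1250MB/s, hence "2 10Gb/s links at least"
print(max_write_mb / 1250)
```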
Re: [ceph-users] ceph.com centos7 repository ?
Just to let you know: the qemu packages from CentOS don't have rbd support compiled in. You will need to compile your own packages from the -ev version from Red Hat for this.

On Thu, Jul 10, 2014 at 4:58 PM, Erik Logtenberg e...@logtenberg.eu wrote:
Hi,
The RHEL7 repository works just as well. CentOS 7 is effectively a copy of RHEL7 anyway; packages for CentOS 7 wouldn't actually be any different.
Erik.

On 07/10/2014 06:14 AM, Alexandre DERUMIER wrote:
Hi,
I would like to know if a CentOS 7 repository will be available soon, or whether I can use the current RHEL7 one for the moment:
http://ceph.com/rpm-firefly/rhel7/x86_64/
Cheers,
Alexandre
Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)
Hi,
I don't see an improvement with tcp_window_scaling=0 in my configuration. Rather the opposite: the iperf performance is much worse:

root@ceph-03:~# iperf -c 172.20.2.14
Client connecting to 172.20.2.14, TCP port 5001
TCP window size: 96.1 KByte (default)
[  3] local 172.20.2.13 port 50429 connected with 172.20.2.14 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.94 GBytes  2.52 Gbits/sec
root@ceph-03:~# sysctl -w net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_window_scaling = 1
root@ceph-03:~# iperf -c 172.20.2.14
Client connecting to 172.20.2.14, TCP port 5001
TCP window size: 192 KByte (default)
[  3] local 172.20.2.13 port 50431 connected with 172.20.2.14 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.4 GBytes  9.77 Gbits/sec

My kernels are 3.11 and 3.14, and the VM host has a patched RHEL kernel 2.6.32; the iperf behavior is the same across all kernels. I have switched back to net.ipv4.tcp_window_scaling=1.

Udo

On 24.07.2014 22:15, Jean-Tiare LE BIGOT wrote: What is your kernel version? On kernel >= 3.11, sysctl -w net.ipv4.tcp_window_scaling=0 seems to improve the situation a lot. It also helped a lot to mitigate processes going (and sticking) in 'D' state.
Re: [ceph-users] Optimal OSD Configuration for 45 drives?
On Fri, Jul 25, 2014 at 10:30 PM, Christian Balzer ch...@gol.com wrote:
If you read section E of the manual closely and stare at the back of the case, you will see that while there are indeed 4 external SAS connectors right next to the power supply, only 2 of those are inbound (upstream, to the HBA) and the other 2 are outbound, downstream ones. So my number stands: 4 lanes at 6Gb/s times 2 = 48Gb/s. Which also means that if one were to put faster drives in there, the back side, with slightly more bandwidth, would be the preferred location.

I have many of these in production. There are multiple ways you can cable them. If you do not cascade them (which would further divide your throughput), you can use all 16 SAS lanes exposed through the back of the chassis: 8 to the front backplane and 8 to the rear, for a total of 96Gb/s. This puts 24 drives on 8 SAS lanes and 21 drives on the remaining 8. If you want to cascade to another chassis, you will have to supply your own internal cables to cascade the two backplanes. For DR disk pools I cascade them; for production systems I only direct-connect the SAS expanders to the HBAs.

-Chip
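Both bandwidth figures in this exchange are just lane counts multiplied by the per-lane SAS2 rate:

```python
LANE_GBPS = 6                    # SAS2 per-lane signaling rate in Gb/s

# Christian's reading: only 2 of the 4 external connectors are inbound,
# each carrying 4 lanes
inbound_two_connectors = 2 * 4 * LANE_GBPS
print(inbound_two_connectors)    # 48 (Gb/s)

# Chip's cabling: all 16 lanes used (8 to the front backplane, 8 to the rear)
all_sixteen_lanes = 16 * LANE_GBPS
print(all_sixteen_lanes)         # 96 (Gb/s)
```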
Re: [ceph-users] Optimal OSD Configuration for 45 drives?
On Sat, 26 Jul 2014 20:49:46 +1000 Matt Harlum wrote:

On 25 Jul 2014, at 5:54 pm, Christian Balzer ch...@gol.com wrote:

On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:

Hi, I've purchased a couple of 45Drives enclosures and would like to figure out the best way to configure these for Ceph.

That's the second time within a month somebody mentions these 45-drive chassis. Would you mind elaborating on which enclosures these are, precisely? I'm wondering especially about the backplane, as 45 is such an odd number.

The chassis is from 45drives.com. It has 3 rows of 15 direct-wire SAS connectors, connected to two Highpoint Rocket 750s using 12 SFF-8087 connectors. I'm considering replacing the Highpoints with 3x LSI 9201-16i cards. The chassis are loaded up with 45 Seagate 4TB drives, and separate from the 45 large drives are the two boot drives in RAID 1.

Oh, Backblaze inspired! I stared at the originals a couple of years ago. ^.^ And yeah, replacing the Highpoint controllers sounds like a VERY good idea. ^o^ You might want to get 2 (large and thus fast) Intel DC S3700 SSDs for the OS drives and put the journals on those (OS on MD RAID1, journals on individual partitions).

Also, if you don't mind, specify how many and what your net storage requirements are.

Total is 3 of these 45drives.com enclosures, for 3 replicas of our data.

If you're going to use RAID6, a replica of 2 will be fine.

In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
---

Mainly I was wondering if it was better to set up multiple RAID groups and then put an OSD on each, rather than an OSD for each of the 45 drives in the chassis?

Steve already toed the conservative Ceph party line here; let me give you some alternative views and options on top of that, and recap what I wrote in the thread above.
In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---
[snip: options 1) through 5), quoted in full earlier in the thread]

Thanks. "All spinning rust" will probably be fine; we're looking to just store full server backups for a long time, so there's no high IO expected or anything like that.
The servers came with some pretty underpowered specs re: CPU/RAM (they support a max of 32GB each and a single socket), but at some point I plan to upgrade the motherboard to allow much more RAM to be fitted. Mainly the reason I ask whether it's a good idea to set up RAID groups for the OSDs is that I can't put 96GB of RAM in these and can't put enough CPU power into them. I'm imagining it'll all start to fall to pieces if I try to operate these with Ceph due to the small amount of RAM and CPU?

Yeah, you would probably be in some tight spots with the default mobo and 45 individual OSDs. For your use case and this HW, RAIDed OSDs look like a good alternative to 1); heck, even MD RAID might do the trick if the CPU is beefy enough. If you can replace the mobo/CPUs/RAM with something more adequate before deployment, go for 1).

Christian

Networking: Either of the setups above will saturate a single 10Gb/s (aka 1GB/s) link, as Steve noted. In fact 3) to 5) will
[ceph-users] firefly osds stuck in state booting
Hi,

I just set up a test Ceph installation on three CentOS 6.5 nodes. Two of the nodes are used for hosting OSDs and the third acts as mon. Please note I'm using LVM, so I had to set up the OSDs using the manual install guide.

--snip--
ceph -s
    cluster 2929fa80-0841-4cb6-a133-90b2098fc802
     health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; noup,nodown,noout flag(s) set
     monmap e2: 3 mons at {ceph0=10.0.12.220:6789/0,ceph1=10.0.12.221:6789/0,ceph2=10.0.12.222:6789/0}, election epoch 46, quorum 0,1,2 ceph0,ceph1,ceph2
     osdmap e21: 2 osds: 0 up, 0 in
            flags noup,nodown,noout
      pgmap v22: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating
--snip--

osd tree
--snip--
ceph osd tree
# id    weight  type name       up/down reweight
-1      2       root default
-3      1               host ceph1
0       1                       osd.0   down    0
-2      1               host ceph2
1       1                       osd.1   down    0
--snip--

--snip--
ceph daemon osd.0 status
{ "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
  "osd_fsid": "1ad28bde-c23c-44ba-a3b7-0fd3372e",
  "whoami": 0,
  "state": "booting",
  "oldest_map": 1,
  "newest_map": 21,
  "num_pgs": 0}
--snip--

--snip--
ceph daemon osd.1 status
{ "cluster_fsid": "99babb8f-c880-4b32-a227-94aa483d4871",
  "osd_fsid": "becc3252-6977-47d6-87af-7b1337e591d8",
  "whoami": 1,
  "state": "booting",
  "oldest_map": 1,
  "newest_map": 21,
  "num_pgs": 0}
--snip--

# CPUs are idling

Does anybody know what is wrong? Thanks in advance.
Re: [ceph-users] firefly osds stuck in state booting
On Sat, 26 Jul 2014, 10 minus wrote:
> Hi, I just set up a test Ceph installation on three CentOS 6.5 nodes. Two of the nodes are used for hosting OSDs and the third acts as mon. Please note I'm using LVM, so I had to set up the OSDs using the manual install guide.
> [ceph -s, ceph osd tree and ceph daemon osd.X status output as above, showing "osdmap e21: 2 osds: 0 up, 0 in / flags noup,nodown,noout" and both OSDs in state "booting"]

Do 'ceph osd unset noup' and they should start up. You likely also want to clear nodown and noout as well.

sage
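Spelled out, Sage's fix is the following sequence, to be run against the affected cluster (the final check is my addition, not part of his reply):

```
ceph osd unset noup
ceph osd unset nodown
ceph osd unset noout
ceph osd tree        # osd.0 and osd.1 should now report as up
```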