Re: [ceph-users] Rbd map command doesn't work
EP, Try setting the crush map to use legacy tunables. I've had the same issue with the "feature mismatch" errors when using krbd that didn't support format 2 and running jewel 10.2.2 on the storage nodes. From the command line: ceph osd crush tunables legacy Bruce > On Aug 16, 2016, at 4:21 PM, Somnath Roy wrote: > > This is the usual feature mismatch stuff; the inbox krbd you are using is not > supporting Jewel. > Try googling with the error and I am sure you will get a lot of prior > discussion around that. > > From: EP Komarla [mailto:ep.koma...@flextronics.com] > Sent: Tuesday, August 16, 2016 4:15 PM > To: Somnath Roy; ceph-users@lists.ceph.com > Subject: RE: Rbd map command doesn't work > > Somnath, > > Thanks. > > I am trying your suggestion. See the commands below. Still it doesn’t seem > to go. > > I am missing something here… > > Thanks, > > - epk > > = > [test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G --image-format 1 > rbd: image format 1 is deprecated > [test@ep-c2-client-01 ~]$ rbd map rbd/test1 > rbd: sysfs write failed > In some cases useful info is found in syslog - try "dmesg | tail" or so. > rbd: map failed: (13) Permission denied > [test@ep-c2-client-01 ~]$ sudo rbd map rbd/test1 > ^C[test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ dmesg|tail -20 > [1201954.248195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201954.253365] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1201964.274082] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201964.281195] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1201974.298195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201974.305300] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204128.917562] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204128.924173] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204138.956737] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204138.964011] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204148.980701] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204148.987892] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204159.004939] libceph: mon2 172.20.60.53:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204159.012136] libceph: mon2 172.20.60.53:6789 missing required protocol > features > [1204169.028802] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204169.035992] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204476.803192] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204476.810578] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204486.821279] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > > > > From: Somnath Roy [mailto:somnath@sandisk.com] > Sent: 
Tuesday, August 16, 2016 3:59 PM > To: EP Komarla ; ceph-users@lists.ceph.com > Subject: RE: Rbd map command doesn't work > > The default format of rbd image in jewel is 2 along with a bunch of other > features enabled, so you have the following two options: > > 1. Create a format 1 image with --image-format 1 > > 2. Or, set this in the ceph.conf file under [client] or [global] before creating the > image: > rbd_default_features = 3 > > Thanks & Regards > Somnath > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP > Komarla > Sent: Tuesday, August 16, 2016 2:52 PM > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Rbd map command doesn't work > > All, > > I am creating an image and mapping it. The below commands used to work in > Hammer, now the same is not working in Jewel. I see the message about some > feature set mismatch – what features are we talking about here? Is this a > known issue in Jewel with a workaround? > > Thanks, > > - epk > > = > > > [test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G > [test@ep-c2-client-01 ~]$ rbd info test1 > rbd image 'test1': > size 1024 MB in 256 objects > order 22
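A minimal sketch of the three workarounds discussed in this thread; the commands are standard ceph/rbd CLI of that era, and the pool/image names are just examples:

  # Option 1: relax CRUSH tunables so old kernel clients can connect
  ceph osd crush tunables legacy

  # Option 2: create the image with only krbd-friendly features
  rbd create rbd/test1 --size 1G --image-format 2 --image-feature layering

  # Option 3: make that the default for new images; in ceph.conf under
  # [client] or [global]:
  #   rbd_default_features = 3

Note that option 1 changes cluster-wide placement behavior and triggers data movement, so the per-image feature options are usually the safer first step.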
Re: [ceph-users] rbd readahead settings
You'll need to set it on the monitor too. Sent from my iPhone > On Aug 15, 2016, at 2:24 PM, EP Komarla wrote: > > Team, > > I am trying to configure the rbd readahead value. Before I increase this > value, I am trying to find out the current value that it is set to. How do I > know the values of these parameters? > > rbd readahead max bytes > rbd readahead trigger requests > rbd readahead disable after bytes > > Thanks, > > - epk > > EP KOMARLA, > > Email: ep.koma...@flextronics.com > Address: 677 Gibraltor Ct, Building #2, Milpitas, CA 94035, USA > Phone: 408-674-6090 (mobile) > > > Legal Disclaimer: > The information contained in this message may be privileged and confidential. > It is intended to be read only by the individual or entity to whom it is > addressed or by their designee. If the reader of this message is not the > intended recipient, you are on notice that any distribution of this message, > in any form, is strictly prohibited. If you have received this message in > error, please immediately notify the sender and delete or destroy any copy of > this message! > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
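A sketch of two ways to read the current values; the admin-socket path is an example and only exists if the client has admin_socket enabled in its ceph.conf:

  # Show the defaults as the local ceph build sees them:
  ceph --show-config | grep rbd_readahead

  # Or query a running client through its admin socket:
  ceph daemon /var/run/ceph/ceph-client.admin.asok config get rbd_readahead_max_bytes

The readahead options are librbd client-side settings, so the values that matter are the ones in effect on the machine doing the reads, typically under [client].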
[ceph-users] systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer'
I've been asked to look at the performance of RHEL 7.1/RHCS 1.3. I keep running into these errors on 1 of my RHEL 7.1 client systems. The rbd devices are still present, but ceph-rbdnamer is not in /usr/bin on RHEL 7.1, though it is in /usr/bin on trusty. Much like the rbdmap init script that ships with RHEL 7.1 but depends on functions from trusty's /lib/lsb/init-functions (create a user-defined systemd init function to map the images if rbd devices are required at boot time), is this another example of RHEL 7.1 being not quite ready for prime time as a Ceph client? Can I ignore these messages? Or should I just return to my trusty client of choice and advise that to others? I'm going to want to know if these ceph-rbdnamer error paths are adding overhead to my performance testing on RHEL 7.1 clients so I will most likely re-run everything with trusty clients to see for myself, but I'm curious what others have seen with RHEL/Centos/Fedora systemd Ceph clients. There are 3 10TB rbd's in the cluster and 3 clients. Thanks. 14:22:43.018 Message from slave hd2_client0-0: 14:22:43.018 New messages found on /var/adm/messages. Do they belong to you? 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd2': No such file or directory 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd1': No such file or directory 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd1': No such file or directory ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
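For anyone chasing this locally, a hedged sketch for finding where the call comes from; the rule-file locations below are the usual udev directories and may differ on RHCS:

  # Which package is supposed to provide the binary?
  yum provides '*/ceph-rbdnamer'

  # Which udev rule is invoking it?
  grep -r ceph-rbdnamer /usr/lib/udev/rules.d/ /etc/udev/rules.d/

If the rule exists but the binary does not, installing the package that owns ceph-rbdnamer (or correcting the path in the rule) should silence the messages; they only affect the /dev/rbd/<pool>/<image> naming symlinks, not the /dev/rbdN devices themselves.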
Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?
Yes, the rbd's are not remapped at system boot time. I haven't run into a VM or system hang because of this; I ran into it as part of investigating using RHEL 7.1 as a client distro. Yes, remapping the rbd's in a startup script worked around the issue. -Original Message- From: Steve Dainard [mailto:sdain...@spd1.com] Sent: Friday, July 17, 2015 1:59 PM To: Bruce McFarland Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings? Other than those errors, do you find RBD's will not be unmapped on system restart/shutdown on a machine using systemd? Leaving the system hanging without network connections trying to unmap RBD's? That's been my experience thus far, so I wrote an (overly simple) systemd file to handle this on a per RBD basis. On Tue, Jul 14, 2015 at 1:15 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: When starting the rbdmap.service to provide map/unmap of rbd devices across boot/shutdown cycles the /etc/init.d/rbdmap includes /lib/lsb/init-functions. This is not a problem except that the rbdmap script is making calls to the log_daemon_* log_progress_* log_action_* functions that are included in Ubuntu 14.04 distro's, but are not in the RHEL 7.1/RHCS 1.3 distro. Are there any recommended workarounds for boot time startup in RHEL/Centos 7.1 clients? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
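A minimal per-image unit along the lines Steve describes might look like the sketch below; this is untested, and the pool, image, user, and device names are placeholders:

  # /etc/systemd/system/rbd-test1.service
  [Unit]
  Description=Map rbd/test1
  After=network-online.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/bin/rbd map rbd/test1 --id admin
  # unmap by device node; the /dev/rbd/<pool>/<image> symlink only exists
  # if ceph-rbdnamer's udev rule is working
  ExecStop=/usr/bin/rbd unmap /dev/rbd0

  [Install]
  WantedBy=multi-user.target

Type=oneshot with RemainAfterExit=yes makes systemd treat the mapping as a long-lived state, so ExecStop runs the unmap at shutdown while the network is still up.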
[ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?
When starting the rbdmap.service to provide map/unmap of rbd devices across boot/shutdown cycles the /etc/init.d/rbdmap includes /lib/lsb/init-functions. This is not a problem except that the rbdmap script is making calls to the log_daemon_* log_progress_* log_action_* functions that are included in Ubuntu 14.04 distro's, but are not in the RHEL 7.1/RHCS 1.3 distro. Are there any recommended workarounds for boot time startup in RHEL/Centos 7.1 clients? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
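One crude workaround, assuming the rbdmap script only needs the logging helpers, is to provide stubs for the missing LSB functions on the RHEL client; an untested sketch that could be appended to /lib/lsb/init-functions (or a file the script sources first):

  log_daemon_msg() { echo -n "$1: $2"; }
  log_progress_msg() { echo -n " $*"; }
  log_action_msg() { echo "$*"; }
  log_end_msg() { if [ "$1" -eq 0 ]; then echo " ok"; else echo " failed"; fi; return "$1"; }

The function names come straight from the errors above; the bodies only have to print something sensible, since rbdmap uses them purely for console logging.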
[ceph-users] Performance test matrix?
Is there a classic ceph cluster test matrix?? I'm wondering what's done for releases, i.e. sector sizes 4k, 128k, 1M, 4M? Sequential, random, 80/20 mix? # of concurrent IOs? I've seen some spreadsheets in the past, but can't find them. Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Performance test matrix?
Mark, Thank you very much. We're focusing on block performance currently. All of my object based testing has been done with rados bench so I've yet to do anything through RGW, but will need to be doing that soon. I also want to revisit COSBench. I exercised it ~ a year ago and then decided to focus on blocks so I never really got familiar with it. Bruce -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson Sent: Wednesday, July 08, 2015 1:00 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Performance test matrix? Hi Bruce, There's a google doc that previously was public but when it got moved to RH's google drive from Inktank's it got made private instead. It doesn't appear that I can make it public now. You can see the configuration in the CBT yaml files though up on github: https://github.com/ceph/ceph-tools/tree/master/regression/burnupi-available As is, these tests were running over 24 hours so we had to cut them back when we were testing previously. Once we have new high performance nodes in the community lab I'm hoping we'll revise this and start getting good nightly tests in. One thing obviously missing is RGW tests. support for civetweb+rgw was added to CBT a couple of months ago and Intel added a module for running cosbench tests, but so far no one has had time to really beta test it. Docs are here: https://github.com/ceph/cbt/blob/master/docs/cosbench.README Mark On 07/08/2015 02:55 PM, Bruce McFarland wrote: Is there a classic ceph cluster test matrix?? I'm wondering what's done for releases ie sector sizes 4k,128k,1M,4M? sequential, random, 80/20 mix? # concurrent IOs? I've seen some spreadsheets in the past, but can't find them. Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
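Until the official matrix is public again, the sweep from the original question is easy to approximate with fio against a mapped rbd device; a sketch, with the device path, run time, and queue depth as examples:

  for bs in 4k 128k 1m 4m; do
    for rw in read write randread randwrite; do
      fio --name=rbd-$rw-$bs --filename=/dev/rbd0 --direct=1 \
          --ioengine=libaio --iodepth=16 --rw=$rw --bs=$bs \
          --runtime=60 --time_based --group_reporting
    done
  done
  # 80/20 read/write mix at 4k:
  fio --name=rbd-mix --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --iodepth=16 --rw=randrw --rwmixread=80 --bs=4k --runtime=60 --time_based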
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
Using the manual method of creating an OSD on RHEL 7.1 with Ceph 94.2 turns up an issue with the ondisk fsid of the journal device. From a quick web search I've found reference to this exact same issue from earlier this year. Is there a version of Ceph that works with RHEL 7.1??? [root@ceph0 ceph]# ceph-disk-prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs /dev/sdc /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully. partx: /dev/sdc: error adding partition 1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# mkdir /var/lib/ceph/osd/ceph-0 [root@ceph0 ceph]# ll /var/lib/ceph/osd/ total 0 drwxr-xr-x. 2 root root 6 Jun 29 12:01 ceph-0 [root@ceph0 ceph]# mount -t xfs /dev/sdc1 /var/lib/ceph/osd/ceph-0/ [root@ceph0 ceph]# mount proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=57648336k,nr_inodes=14412084,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/mapper/rhel_ceph0-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) debugfs on /sys/kernel/debug type debugfs (rw,relatime) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) /dev/mapper/rhel_ceph0-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sda2 on /boot type xfs 
(rw,relatime,seclabel,attr2,inode64,noquota) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime) fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime) /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,relatime,seclabel,attr2,inode64,noquota) [root@ceph0 ceph]# ceph-osd -i=0 --mkfs 2015-06-29 12:02:47.702808 7f2fb4625880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2015-06-29 12:02:47.702851 7f2fb4625880 -1 journal check: ondisk fsid ---- doesn't match expected 7e792d5e-a5c6-40cd-a361-0457875ea92c, invalid (someone else's?) journal 2015-06-29 12:02:47.702876 7f2fb4625880 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2015-06-29 12:02:47.702890 7f2fb4625880 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22 2015-06-29 12:02:47.702928 7f2fb4625880 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument [root@ceph0 ceph]# -Original Message- From: Bruce McFarland Sent: Monday, June 29, 2015 11:39 AM To: 'Loic Dachary
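The all-zeros ondisk fsid in the transcript above usually means the journal partition still carries a stale or never-initialized header. One hedged recovery sketch, destructive to the journal partition, so double-check the device name first:

  # Wipe the old journal header, then redo the OSD mkfs:
  dd if=/dev/zero of=/dev/sdb1 bs=1M count=10
  ceph-osd -i 0 --mkfs --mkjournal --mkkey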
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
It doesn't appear to be related to using wwn's for the drive id. The verbose output shows ceph converting from wwn to sd letter. I ran with verbose on and used sd letters for the data drive and the journal and get the same failures. I'm attempting to create OSD's manually now. [root@ceph0 ceph]# ceph-disk -v prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs --zap-disk /dev/sdc /dev/sdb1 DEBUG:ceph-disk:Zapping partition table on /dev/sdc INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/sdc Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! Main and backup partition tables differ! Use the 'c' and 'e' options on the recovery transformation menu to examine the two tables. Warning! One or more CRCs don't match. You should repair the disk! Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/sdc Creating new GPT entries. The operation has completed successfully. INFO:ceph-disk:calling partx on zapped device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -d /dev/sdc partx: specified range 1:0 does not make sense INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type DEBUG:ceph-disk:Journal is file /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data DEBUG:ceph-disk:Creating osd partition on /dev/sdc INFO:ceph-disk:Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d05612e-5cc0-422c-9228-4e53ee0f27ac --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdc The operation has completed successfully. 
INFO:ceph-disk:calling partx on created device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/sdc partx: /dev/sdc: error adding partition 1 INFO:ceph-disk:Running command: /usr/bin/udevadm settle DEBUG:ceph-disk:Creating xfs fs on /dev/sdc1 INFO:ceph-disk:Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdc1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 DEBUG:ceph-disk:Mounting /dev/sdc1 on /var/lib/ceph/tmp/mnt.DQ8nOj with options noatime,inode64 INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdc1 /var/lib/ceph/tmp/mnt.DQ8nOj DEBUG:ceph-disk:Preparing osd data dir /var/lib/ceph/tmp/mnt.DQ8nOj DEBUG:ceph-disk:Creating symlink /var/lib/ceph/tmp/mnt.DQ8nOj/journal - /dev/sdb1 DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.DQ8nOj INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.DQ8nOj INFO:ceph-disk:Running command: /usr/sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdc The operation has completed successfully. INFO:ceph-disk:calling partx on prepared device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/sdc partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Saturday, June 27, 2015 1:08 AM To: Bruce McFarland; ceph
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
Do these issues occur in Centos 7 also? -Original Message- From: Bruce McFarland Sent: Monday, June 29, 2015 12:06 PM To: 'Loic Dachary'; 'ceph-users@lists.ceph.com' Subject: RE: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2 Using the manual method of creating an OSD on RHEL 7.1 with Ceph 94.2 turns up an issue with the ondisk fsid of the journal device. From a quick web search I've found reference to this exact same issue from earlier this year. Is there a version of Ceph that works with RHEL 7.1??? [root@ceph0 ceph]# ceph-disk-prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs /dev/sdc /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully. partx: /dev/sdc: error adding partition 1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# mkdir /var/lib/ceph/osd/ceph-0 [root@ceph0 ceph]# ll /var/lib/ceph/osd/ total 0 drwxr-xr-x. 2 root root 6 Jun 29 12:01 ceph-0 [root@ceph0 ceph]# mount -t xfs /dev/sdc1 /var/lib/ceph/osd/ceph-0/ [root@ceph0 ceph]# mount proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=57648336k,nr_inodes=14412084,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/sys temd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/mapper/rhel_ceph0-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) debugfs on /sys/kernel/debug type debugfs (rw,relatime) mqueue on /dev/mqueue 
type mqueue (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) /dev/mapper/rhel_ceph0-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sda2 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime) fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime) /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,relatime,seclabel,attr2,inode64,noquota) [root@ceph0 ceph]# ceph-osd -i=0 --mkfs 2015-06-29 12:02:47.702808 7f2fb4625880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2015-06-29 12:02:47.702851 7f2fb4625880 -1 journal check: ondisk fsid ---- doesn't match expected 7e792d5e-a5c6-40cd-a361-0457875ea92c, invalid (someone else's?) journal 2015-06-29 12:02:47.702876 7f2fb4625880 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2015-06-29 12:02:47.702890
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD
Loic, Thank you very much for the partprobe workaround. I rebuilt the cluster using 94.2. I've created partitions on the journal SSDs with parted and then use ceph-disk prepare as below. I'm not seeing all of the disks with the tmp mounts when I check 'mount' but I also don't see any of the mount directory mount points at /var/lib/ceph/osd. I see the following output from prepare. When I attempt to 'activate' it errors out saying the devices don't exist. ceph-disk prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs --zap-disk /dev/disk/by-id/wwn-0x53959bd02f56 /dev/disk/by-id/wwn-0x500080d91010024b-part1 Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. Creating new GPT entries. The operation has completed successfully. partx: specified range 1:0 does not make sense WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data WARNING:ceph-disk:Journal /dev/disk/by-id/wwn-0x500080d91010024b-part1 was not prepared with ceph-disk. Symlinking directly. The operation has completed successfully. partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 meta-data=/dev/sdw1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 [root@ceph0 ceph]# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) [root@ceph0 ceph]# rpm -qa | grep ceph ceph-radosgw-0.94.2-0.el7.x86_64 libcephfs1-0.94.2-0.el7.x86_64 ceph-common-0.94.2-0.el7.x86_64 python-cephfs-0.94.2-0.el7.x86_64 ceph-0.94.2-0.el7.x86_64 [root@ceph0 ceph]# -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Friday, June 26, 2015 3:29 PM To: Bruce McFarland; ceph-users@lists.ceph.com Subject: Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD Hi, Prior to firefly v0.80.8 ceph-disk zap did not call partprobe and that was causing the kind of problems you're experiencing. It was fixed by https://github.com/ceph/ceph/commit/e70a81464b906b9a304c29f474e6726762b63a7c and is described in more detail at http://tracker.ceph.com/issues/9665. Rebooting the machine ensures the partition table is up to date and that's what you probably want to do after that kind of failure. You can however avoid the failure by running: * ceph-disk zap * partprobe * ceph-disk prepare Cheers P.S. The partx: /dev/disk/by-id/wwn-0x53959ba80a4e: error adding partition 1 can be ignored, it does not actually matter. A message was added later to avoid confusion with a real error. On 26/06/2015 17:09, Bruce McFarland wrote: I have moved storage nodes to RHEL 7.1 and used the basic server install. I installed ceph-deploy and used the ceph.repo/epel.repo for installation of ceph 80.7. I have tried ceph-disk with issuing zap on the same command line as prepare and on a separate command line immediately before the ceph-disk prepare. 
I consistently run into the partition errors and am unable to create OSD's on RHEL 7.1. ceph-disk prepare --cluster ceph --cluster-uuid 373a09f7-2070-4d20-8504-c8653fb6db80 --fs-type xfs --zap-disk /dev/disk/by-id/wwn-0x53959ba80a4e /dev/disk/by-id/wwn-0x500080d9101001d6-part1 Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. The operation has completed successfully. WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully
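Spelled out with the device names from this thread, the zap / partprobe / prepare sequence Loic suggests would be:

  ceph-disk zap /dev/sdc
  partprobe /dev/sdc
  ceph-disk prepare --cluster ceph \
      --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 \
      --fs-type xfs /dev/sdc /dev/sdb1

i.e. drop --zap-disk from the prepare call and run the zap and partprobe as separate steps, so the kernel re-reads the partition table in between.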
[ceph-users] Ceph Client OS - RHEL 7.1??
I've always used Ubuntu for my Ceph client OS and found out in the lab that Centos/RHEL 6.x doesn't have the kernel rbd support. I wanted to investigate using RHEL 7.1 for the client OS. Is there a kernel rbd module that installs with RHEL 7.1?? If not are there 7.1 rpm's or src tar balls available to (relatively) easily create a RHEL 7.1 Ceph client?? Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
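A quick sanity check on any candidate client kernel, since these commands are generic:

  modinfo rbd                        # does the kernel ship a krbd module at all?
  modprobe rbd && lsmod | grep rbd   # does it load?

If modinfo finds nothing, that kernel simply was not built with krbd, and the module would have to come from elsewhere (a distro add-on channel or a newer mainline kernel).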
Re: [ceph-users] Installing calamari on centos 7
I followed the Calamari build instructions here: http://ceph.com/category/ceph-step-by-step/ I used an Ubuntu 14.04 system to build all of the Calamari client and server packages for Centos 6.5 and Ubuntu Trusty (14.04). Once the packages were built I also referenced the Calamari instructions here to make sure my storage nodes were set up: http://ceph.com/calamari/docs/development/building_packages.html My cluster uses Ubuntu 14.04 for the client(s) hosting the Calamari Master. All of the Ceph storage nodes and monitors are running Centos 6.5 and Ceph 0.80.8. The only issue I had bringing up Calamari was python related the first time I issued the Calamari initialize after the install. Python returned an ImportError: No module named _io. That was solved by copying the installed python2.7 into the calamari venv: cp /usr/bin/python2.7 /opt/calamari/venv/local/bin/python. After the initialize command was working I got a full Calamari monitor. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ignacio Bravo Sent: Tuesday, May 26, 2015 11:32 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Installing calamari on centos 7 Shailesh, I was trying to do the same, but came across so many compiling errors that I decided to deploy the Calamari Server on a Centos 6 machine. Even then I was not able to finalize the installation. See: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001543.html http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001638.html Now I feel less lonely in the deployment of Calamari since you are already in the same boat as myself. Please keep me updated on your progress. IB On 05/26/2015 11:30 AM, Desai, Shailesh wrote: All our ceph clusters are on centos 7 and I am trying to install calamari on one of the nodes. I am using instructions from http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html. They are written for centos 6. I tried using them but they did not work. Has anyone tried installing calamari on Centos 7? Thanks. Shailesh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- __ Ignacio Bravo CFO LTG Federal, Inc www.ltgfederal.com Office: (703) 951-7760 Mobile: (571) 224-6046 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
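For the ImportError mentioned above, the whole fix reduces to two commands, with paths exactly as the calamari packages lay them out:

  cp /usr/bin/python2.7 /opt/calamari/venv/local/bin/python
  calamari-ctl initialize

The venv ships its own python binary; replacing it with the system interpreter makes the missing built-in _io module available again.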
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
In my never ending saga of calamari with minions on a big endian architecture I've brought up another server from a clean install of Ubuntu 14.04. The calamari master is now essperf13. I was able to figure out how to rebuild salt-minion with zmq 3.0.5 which got salt working, so the 'salt \* ceph.get_heartbeats' from the master gets expected info. I have made a couple of attempts at rebuilding salt-minion with salt 0.17.5, but it kept rebuilding with the rc2 salt code. I'll revisit that exercise. root@essperf13:/etc/ceph# salt --versions Salt: 0.17.5 Python: 2.7.6 (default, Mar 22 2014, 22:59:56) Jinja2: 2.7.2 M2Crypto: 0.21.1 msgpack-python: 0.3.0 msgpack-pure: Not Installed pycrypto: 2.6.1 PyYAML: 3.10 PyZMQ: 14.0.1 ZMQ: 4.0.4 root@essperf13:/etc/ceph# root@KVDrive11:~# salt --versions Salt: 2015.2.0rc2 Python: 2.6.6 (r266:84292, Dec 29 2010, 00:55:07) Jinja2: 2.7.3 M2Crypto: 0.20.1 msgpack-python: 0.4.6 msgpack-pure: Not Installed pycrypto: 2.1.0 libnacl: Not Installed PyYAML: 3.09 ioflo: Not Installed PyZMQ: 14.5.0 RAET: Not Installed ZMQ: 4.0.5 Mako: Not Installed root@KVDrive11:~# -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Wednesday, May 13, 2015 3:52 PM To: Bruce McFarland Cc: Michael Kuriger; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] [ceph-users] Does anyone understand Calamari?? Wow, That must be a record. I didn’t realize that. It turns out that you’ll have the best experience if the versions of master and minion are in sync. We test and use 2014.1.5 and are still evaluating 2014.7.Z. Glad to hear things are working better. regards, Gregory On May 13, 2015, at 3:33 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Possibly my issue as well. The calamari master is salt 0.17.5 but the minions are running 2015.2.0rc2. I have to build the minions from source (big endian unsupported architecture). All of my salt issues seemed to get resolved when I got similar versions of ZMQ running on both master and minion. The calamari master is running on Ubuntu 14.04. From: Michael Kuriger [mailto:mk7...@yp.com] Sent: Wednesday, May 13, 2015 2:00 PM To: Bruce McFarland; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-users] Does anyone understand Calamari?? OK, I finally got mine working. For whatever reason, the latest version of salt was the issue for me. Leaving the latest version of salt on the calamari server is working, but had to downgrade the minions. Removed: salt.noarch 0:2014.7.5-1.el6 salt-minion.noarch 0:2014.7.5-1.el6 Installed: salt.noarch 0:2014.7.1-1.el6 salt-minion.noarch 0:2014.7.1-1.el6 This is on CentOS 6.6 -=Mike Kuriger Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 From: Bruce McFarland bruce.mcfarl...@taec.toshiba.com Date: Tuesday, May 12, 2015 at 4:34 PM To: ceph-calam...@lists.ceph.com ceph-calam...@lists.ceph.com, ceph-users ceph-us...@ceph.com, ceph-devel (ceph-de...@vger.kernel.org) ceph-de...@vger.kernel.org Subject: [ceph-users] Does anyone understand Calamari?? Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can’t get the new machine to recognize the cluster. 
The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc for the monitor and all of the osd’s. Can anyone point me to docs or code that might enlighten me to what I’m overlooking? Thanks. ___ ceph-calamari mailing list ceph-calam...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Does anyone understand Calamari??
Possibly my issue as well. The calamari master is salt 0.17.5 but the minions are running 2015.2.0rc2. I have to build the minions from source (big endian unsupported architecture). All of my salt issues seemed to get resolved when I got similar versions of ZMQ running on both master and minion. The calamari master is running on Ubuntu 14.04. From: Michael Kuriger [mailto:mk7...@yp.com] Sent: Wednesday, May 13, 2015 2:00 PM To: Bruce McFarland; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-users] Does anyone understand Calamari?? OK, I finally got mine working. For whatever reason, the latest version of salt was the issue for me. Leaving the latest version of salt on the calamari server is working, but had to downgrade the minions. Removed: salt.noarch 0:2014.7.5-1.el6 salt-minion.noarch 0:2014.7.5-1.el6 Installed: salt.noarch 0:2014.7.1-1.el6 salt-minion.noarch 0:2014.7.1-1.el6 This is on CentOS 6.6 -=Mike Kuriger Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 From: Bruce McFarland bruce.mcfarl...@taec.toshiba.com Date: Tuesday, May 12, 2015 at 4:34 PM To: ceph-calam...@lists.ceph.com, ceph-users ceph-us...@ceph.com, ceph-devel (ceph-de...@vger.kernel.org) Subject: [ceph-users] Does anyone understand Calamari?? Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc for the monitor and all of the osd's. Can anyone point me to docs or code that might enlighten me to what I'm overlooking? Thanks. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
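On RPM-based minions the downgrade Michael describes can be done with yum, assuming the older packages are still in the repo (versions from his message):

  yum downgrade salt-2014.7.1-1.el6 salt-minion-2014.7.1-1.el6
  service salt-minion restart
  salt --versions-report   # run on master and minion; the versions should now line up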
Re: [ceph-users] New Calamari server
I am having a similar issue. The cluster is up and salt is running and has accepted keys from all nodes, including the monitor. I can issue salt and salt/ceph.py commands from the Calamari including 'salt \* ceph.get_heartbeats' which returns from all nodes including the monitor with the monmap epoch etc. Calamari reports that it sees all of the Ceph servers, but not a Ceph cluster. Is there a salt event besides ceph.get_heartbeats that the Calamari master requires to recognize the cluster? -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- ow...@vger.kernel.org] On Behalf Of Michael Kuriger Sent: Tuesday, May 12, 2015 8:57 AM To: Alexandre DERUMIER Cc: ceph-users; ceph-devel Subject: Re: [ceph-users] New Calamari server In my case, I did remove all salt keys. The salt portion of my install is working. It’s just that the calamari server is not seeing the ceph cluster. Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 On 5/12/15, 1:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, when you removed salt from the nodes, did you remove the old master key /etc/salt/pki/minion/minion_master.pub ? I had the same behavior as you when reinstalling the calamari server with salt previously installed on the ceph nodes (with an explicit error about the key in /var/log/salt/minion on the ceph nodes) - Original Message - From: Michael Kuriger mk7...@yp.com To: ceph-users ceph-us...@ceph.com Cc: ceph-devel ceph-de...@vger.kernel.org Sent: Monday, May 11, 2015 23:43:34 Subject: [ceph-users] New Calamari server I had an issue with my calamari server, so I built a new one from scratch. I've been struggling trying to get the new server to start up and see my ceph cluster. I went so far as to remove salt and diamond from my ceph nodes and reinstalled again. On my calamari server, it sees the hosts connected but doesn't detect a cluster. What am I missing? I've set up many calamari servers on different ceph clusters, but this is the first time I've tried to build a new calamari server. Here's what I see on my calamari GUI: New Calamari Installation This appears to be the first time you have started Calamari and there are no clusters currently configured. 33 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster; please see the Inktank Ceph Enterprise documentation for more details. Thanks! Mike Kuriger ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
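When the calamari master is rebuilt, each minion keeps the old master's public key and silently refuses the new one; a sketch of the reset Alexandre describes:

  # On each ceph node:
  rm -f /etc/salt/pki/minion/minion_master.pub
  service salt-minion restart

  # On the new calamari master:
  salt-key -L                      # unaccepted minion keys should now show up
  salt-key -A                      # accept them
  salt '*' ceph.get_heartbeats     # calamari should see the cluster shortly after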
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
/var/log/salt/minion doesn't really look very interesting after that sequence. I issued salt octeon109 ceph.get_heartbeats from the master. The logs are much more interesting when I clear calamari and stop salt-minion. Looking at the endpoints from http://essperf2/api/v2/cluster doesn't show anything. It reports HTTP 200 OK and Vary: Accept but there is nothing in the body of the output, i.e. no update_time, id, or name is being reported. root@octeon109:/var/log/salt# tail -f /var/log/salt/minion 2015-05-13 01:31:19,066 [salt.crypt ][DEBUG ][4699] Failed to authenticate message 2015-05-13 01:31:19,068 [salt.minion ][DEBUG ][4699] Attempting to authenticate with the Salt Master at 209.243.160.35 2015-05-13 01:31:19,069 [salt.crypt ][DEBUG ][4699] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506') 2015-05-13 01:31:19,294 [salt.crypt ][DEBUG ][4699] Decrypting the current master AES key 2015-05-13 01:31:19,296 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem 2015-05-13 01:31:20,026 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem 2015-05-13 01:33:04,027 [salt.minion ][INFO ][4699] User root Executing command ceph.get_heartbeats with jid 20150512183304482562 2015-05-13 01:33:04,028 [salt.minion ][DEBUG ][4699] Command details {'tgt_type': 'glob', 'jid': '20150512183304482562', 'tgt': 'octeon109', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'ceph.get_heartbeats'} 2015-05-13 01:33:04,043 [salt.minion ][INFO ][5912] Starting a new job with PID 5912 2015-05-13 01:33:04,053 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded ceph.get_heartbeats 2015-05-13 01:33:04,209 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg.version 2015-05-13 01:33:04,212 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg_resource.version 2015-05-13 01:33:04,217 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded cmd.run_stdout 2015-05-13 01:33:04,219 [salt.loaded.int.module.cmdmod][INFO ][5912] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}\n', '-W'] in directory '/root' 2015-05-13 01:33:05,432 [salt.minion ][INFO ][5912] Returning information for job: 20150512183304482562 2015-05-13 01:33:05,434 [salt.crypt ][DEBUG ][5912] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506') -Original Message- From: Bruce McFarland Sent: Tuesday, May 12, 2015 6:11 PM To: 'Gregory Meno' Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: RE: [ceph-calamari] Does anyone understand Calamari?? Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master? I'm running on Ubuntu 14.04 and don't have an httpd service. I had been starting/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu. I've performed the calamari-ctl clear/init sequence more than twice with also stopping/starting apache2 and Cthulhu. -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Tuesday, May 12, 2015 5:58 PM To: Bruce McFarland Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] Does anyone understand Calamari?? All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. 
If you don’t we could try cleaning out calamari’s state. sudo supervisorctl shutdown sudo service httpd stop sudo calamari-ctl clear --yes-i-am-sure sudo calamari-ctl initialize then sudo service supervisord start sudo service httpd start see what the API and UI says then. regards, Gregory On May 12, 2015, at 5:18 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Master was ess68 and now it's essperf3. On all cluster nodes the following files now have 'master: essperf3' /etc/salt/minion /etc/salt/minion/calamari.conf /etc/diamond/diamond.conf The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure the output of 'salt octeon109
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master? I'm running on Ubuntu 14.04 and don't have an httpd service. I had been starting/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu. I've performed the calamari-ctl clear/init sequence more than twice with also stopping/starting apache2 and Cthulhu. -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Tuesday, May 12, 2015 5:58 PM To: Bruce McFarland Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] Does anyone understand Calamari?? All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. If you don’t we could try cleaning out calamari’s state. sudo supervisorctl shutdown sudo service httpd stop sudo calamari-ctl clear --yes-i-am-sure sudo calamari-ctl initialize then sudo service supervisord start sudo service httpd start see what the API and UI says then. regards, Gregory On May 12, 2015, at 5:18 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Master was ess68 and now it's essperf3. On all cluster nodes the following files now have 'master: essperf3' /etc/salt/minion /etc/salt/minion/calamari.conf /etc/diamond/diamond.conf The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats' since I suspect there might be a missing field in the monitor response. root@essperf3:/etc/ceph# salt \* test.ping octeon108: True octeon114: True octeon111: True octeon101: True octeon106: True octeon109: True octeon118: True root@essperf3:/etc/ceph# ceph osd tree # id weight type name up/down reweight -1 7 root default -4 1 host octeon108 0 1 osd.0 up 1 -2 1 host octeon111 1 1 osd.1 up 1 -5 1 host octeon115 2 1 osd.2 DNE -6 1 host octeon118 3 1 osd.3 up 1 -7 1 host octeon114 4 1 osd.4 up 1 -8 1 host octeon106 5 1 osd.5 up 1 -9 1 host octeon101 6 1 osd.6 up 1 root@essperf3:/etc/ceph# ceph -s cluster 868bfacc-e492-11e4-89fa-000fb70c health HEALTH_OK monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109 osdmap e80: 6 osds: 6 up, 6 in pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects 60604 MB used, 2734 GB / 2793 GB avail 728 active+clean root@essperf3:/etc/ceph# root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats octeon109: -- - boot_time: 1430784431 - ceph_version: 0.80.8-0.el6 - services: -- ceph-mon.octeon109: -- cluster: ceph fsid: 868bfacc-e492-11e4-89fa-000fb70c id: octeon109 status: -- election_epoch: 1 extra_probe_peers: monmap: -- created: 2015-04-16 23:50:52.412686 epoch: 1 fsid: 868bfacc-e492-11e4-89fa-000fb70c modified: 2015-04-16 23:50:52.412686 mons: -- - addr: 209.243.160.70:6789/0 - name: octeon109 - rank: 0 name: octeon109 outside_quorum: quorum: - 0 rank: 0 state: leader sync_provider: type: mon version: 0.86 -- - 868bfacc-e492-11e4-89fa-000fb70c: -- fsid: 868bfacc-e492-11e4-89fa-000fb70c
[ceph-users] accepter.accepter.bind unable to bind to IP on any port in range 6800-7300:
I've run into an issue starting OSD's where I'm running out of ports. I've increased the port range with ms bind port max and on the next attempt to start the osd it reports no ports in the new range. I am only running 1 osd on the node and rarely restart the osd. I've increased the debug level to 20 and the only additional information in the log file is the PID for the process that can't get a port. IPtables is not loaded. This has just recently started occurring on multiple osd's and might possibly be related to my issues with salt and debugging of the calamari master not recognizing ceph-mon even though 'salt \* ceph.get_heartbeats' returns info for all nodes, monmap etc. 2015-05-08 10:52:17.861855 773b7000 0 ceph version 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82), process ceph-osd, pid 4629 2015-05-08 10:52:17.864413 773b7000 -1 accepter.accepter.bind unable to bind to 192.168.2.102:7370 on any port in range 6800-7370: (126) Cannot assign requested address ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
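For reference, the range is controlled by two ceph.conf options, and it is worth checking what is actually holding ports on the node; a sketch, with the max value as an example:

  # ceph.conf, in [osd] (or [global]):
  #   ms bind port min = 6800
  #   ms bind port max = 7568

  # What is bound in that range right now?
  netstat -tlnp | grep ceph

One observation on the log itself: "Cannot assign requested address" (EADDRNOTAVAIL) normally means the bind IP rather than the port is the problem, so the cluster/public address configuration for 192.168.2.102 on that node is worth a second look before widening the port range further.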
Re: [ceph-users] Binding a pool to certain OSDs
You won't get a PG warning message from ceph -s unless you have fewer than 20 PGs per OSD in your cluster. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bruce McFarland Sent: Tuesday, April 14, 2015 10:00 AM To: Giuseppe Civitella; Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs I use this to quickly check pool stats: [root@ceph-mon01 ceph]# ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 pool 6 'rcvtst' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 400 pgp_num 400 last_change 10879 flags hashpspool stripe_width 0 [root@ceph-mon01 ceph]# Or to individually query a pool: ceph osd pool get rbd pg_num ceph osd pool get rbd pgp_num From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Giuseppe Civitella Sent: Tuesday, April 14, 2015 9:53 AM To: Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe 2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check on Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it. This is the final version: http://paste.openstack.org/show/203888/ When I applied the new CRUSH map, after some rebalancing, I get this health status: [- avalon1 root@controller001 Ceph -] # ceph -s cluster af09420b-4032-415e-93fc-6b60e9db064e health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002 monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003 osdmap e3092: 4 osds: 4 up, 4 in pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects 8842 MB used, 271 GB / 279 GB avail 576 active+clean and this osd tree: [- avalon1 root@controller001 Ceph -] # ceph osd tree # id weight type name up/down reweight -8 2 root sed -5 1 host ceph001-sed 2 1 osd.2 up 1 -7 1 host ceph002-sed 3 1 osd.3 up 1 -1 2 root default -4 1 host ceph001-sata 0 1 osd.0 up 1 -6 1 host ceph002-sata 1 1 osd.1 up 1 which seems not a bad situation. 
The problem arises when I try to create a new pool: the command ceph osd pool create sed 128 128 gets stuck. It never ends. And I noticed that my Cinder installation is not able to create volumes anymore. I've been looking in the logs for errors and found nothing. Any hint about how to proceed to restore my ceph cluster? Is there something wrong with the steps I take to update the CRUSH map? Is the problem related to Emperor? Regards, Giuseppe 2015-04-13 18:26 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've got a Ceph cluster which serves volumes to a Cinder installation. It runs Emperor. I'd like to be able to replace some of the disks with OPAL disks and create a new pool which uses exclusively the latter kind of disk. I'd like to have a traditional pool and a secure one coexisting on the same ceph host. I'd then use Cinder's multi-backend feature to serve them. My question is: how is it possible to realize such a setup? How can I bind a pool to certain OSDs? Thanks Giuseppe
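For the original question, the mechanism is exactly the one in the linked article: a CRUSH rule that starts from the root containing the chosen OSDs, plus a per-pool ruleset assignment. A sketch against the map above, assuming the custom rule for root sed got rule id 1:

  ceph osd pool create sed 128 128
  ceph osd pool set sed crush_ruleset 1
  ceph osd dump | grep "pool.*sed"   # confirm the pool now uses crush_ruleset 1

A pool-create that hangs is usually the monitors waiting on PG creation against a rule that maps to no OSDs, so checking the rule first with ceph osd crush rule dump is cheap insurance.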
Re: [ceph-users] Binding a pool to certain OSDs
I use this to quickly check pool stats: [root@ceph-mon01 ceph]# ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 pool 6 'rcvtst' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 400 pgp_num 400 last_change 10879 flags hashpspool stripe_width 0 [root@ceph-mon01 ceph]# Or to individually query a pool: ceph osd pool get rbd pg_num ceph osd pool get rbd pgp_num From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Giuseppe Civitella Sent: Tuesday, April 14, 2015 9:53 AM To: Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe 2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check on Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it. This is the final version: http://paste.openstack.org/show/203888/ When I applied the new CRUSH map, after some rebalancing, I get this health status: [- avalon1 root@controller001 Ceph -] # ceph -s cluster af09420b-4032-415e-93fc-6b60e9db064e health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002 monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003 osdmap e3092: 4 osds: 4 up, 4 in pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects 8842 MB used, 271 GB / 279 GB avail 576 active+clean and this osd tree: [- avalon1 root@controller001 Ceph -] # ceph osd tree # id weight type name up/down reweight -8 2 root sed -5 1 host ceph001-sed 2 1 osd.2 up 1 -7 1 host ceph002-sed 3 1 osd.3 up 1 -1 2 root default -4 1 host ceph001-sata 0 1 osd.0 up 1 -6 1 host ceph002-sata 1 1 osd.1 up 1 which seems not a bad situation. The problem arises when I try to create a new pool: the command ceph osd pool create sed 128 128 gets stuck. It never ends. And I noticed that my Cinder installation is not able to create volumes anymore. I've been looking in the logs for errors and found nothing. Any hint about how to proceed to restore my ceph cluster? Is there something wrong with the steps I take to update the CRUSH map? Is the problem related to Emperor? 
Regards,
Giuseppe

2015-04-13 18:26 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com:
Hi all,
I've got a Ceph cluster which serves volumes to a Cinder installation. It runs Emperor. I'd like to replace some of the disks with OPAL disks and create a new pool which uses exclusively the latter kind of disk, so that a traditional pool and a secure one coexist on the same ceph host. I'd then use Cinder's multi-backend feature to serve them. My question is: how is it possible to realize such a setup? How can I bind a pool to certain OSDs?
Thanks
Giuseppe
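For readers landing on this thread later: the mechanism in Sebastien's post boils down to one CRUSH rule per disk class and pointing each pool at its rule. A minimal sketch using the 'sed' root from the osd tree above (the rule name is illustrative, the rule id must be read back from the dump, and 'crush_ruleset' is the pre-Luminous name of the pool setting):

ceph osd crush rule create-simple sed_rule sed host   # replicate across hosts under root 'sed'
ceph osd crush rule dump                              # note the new rule's rule_id (say, 1)
ceph osd pool create sed 128 128
ceph osd pool set sed crush_ruleset 1                 # bind the pool to the rule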
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
Loic,
You're not mistaken; the pages are listed under the Installation (Manual) link: http://ceph.com/docs/master/install/
You'll see the first link is the Get Packages link, which takes you to: http://ceph.com/docs/master/install/get-packages/
This page contains the details on setting up your system to use APT (Ubuntu) or RPM (CentOS) and the code for the ceph.repo file. There are also package dependency lists, trusted keys, etc.
Bruce

-----Original Message-----
From: Loic Dachary [mailto:l...@dachary.org]
Sent: Tuesday, April 07, 2015 1:32 AM
To: Bruce McFarland; ceph-users
Subject: Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi Bruce,

On 07/04/2015 02:40, Bruce McFarland wrote:
> I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com which I've included below.
> Bruce

That's what I also ended up doing. But unless I'm mistaken, adding /etc/yum.repos.d/ceph.repo from ceph.com for ceph packages is not in the steps listed at http://ceph.com/docs/master/start/, starting from http://ceph.com/docs/master/start/quick-start-preflight/ and proceeding to http://ceph.com/docs/master/start/quick-ceph-deploy/.

Cheers

[root@essperf13 ceph-mon01]# ceph -v
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
[root@essperf13 ceph-mon01]# lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:        6.5
Codename:       Final
[root@essperf13 ceph-mon01]#

I'm using the ceph.repo from ceph.com:

[root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/$basearch
priority=1
gpgcheck=1
type=rpm-md

[ceph-source]
name=Ceph source packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
priority=1
gpgcheck=1
type=rpm-md

[Ceph-noarch]
name=Ceph noarch packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/noarch
priority=1
gpgcheck=1
type=rpm-md
[root@essperf13 ceph-mon01]#

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
Sent: Monday, April 06, 2015 5:33 PM
To: ceph-users
Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi,

I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
I'm not sure about CentOS 7.0, but Ceph is not part of the 6.5 distro.

Sent from my iPhone

On Apr 7, 2015, at 12:26 PM, Loic Dachary l...@dachary.org wrote:

On 07/04/2015 18:51, Bruce McFarland wrote:
> Loic,
> You're not mistaken; the pages are listed under the Installation (Manual) link: http://ceph.com/docs/master/install/
> You'll see the first link is the Get Packages link, which takes you to: http://ceph.com/docs/master/install/get-packages/
> This page contains the details on setting up your system to use APT (Ubuntu) or RPM (CentOS) and the code for the ceph.repo file. There are also package dependency lists, trusted keys, etc.

Thanks for checking. Maybe it is intended that the instructions for ceph-deploy only get packages from the distribution and not from the ceph.com repositories.

Cheers

> Bruce
>
> -----Original Message-----
> From: Loic Dachary [mailto:l...@dachary.org]
> Sent: Tuesday, April 07, 2015 1:32 AM
> To: Bruce McFarland; ceph-users
> Subject: Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
>
> Hi Bruce,
>
> On 07/04/2015 02:40, Bruce McFarland wrote:
> I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com which I've included below.
> Bruce
>
> That's what I also ended up doing. But unless I'm mistaken, adding /etc/yum.repos.d/ceph.repo from ceph.com for ceph packages is not in the steps listed at http://ceph.com/docs/master/start/, starting from http://ceph.com/docs/master/start/quick-start-preflight/ and proceeding to http://ceph.com/docs/master/start/quick-ceph-deploy/.
>
> Cheers
>
> [root@essperf13 ceph-mon01]# ceph -v
> ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
> [root@essperf13 ceph-mon01]# lsb_release -a
> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: CentOS
> Description:    CentOS release 6.5 (Final)
> Release:        6.5
> Codename:       Final
> [root@essperf13 ceph-mon01]#
>
> I'm using the ceph.repo from ceph.com:
> [root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
> [Ceph]
> name=Ceph packages for $basearch
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/$basearch
> priority=1
> gpgcheck=1
> type=rpm-md
>
> [ceph-source]
> name=Ceph source packages
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
> priority=1
> gpgcheck=1
> type=rpm-md
>
> [Ceph-noarch]
> name=Ceph noarch packages
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/noarch
> priority=1
> gpgcheck=1
> type=rpm-md
> [root@essperf13 ceph-mon01]#
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
> Sent: Monday, April 06, 2015 5:33 PM
> To: ceph-users
> Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
>
> Hi,
>
> I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com, which I've included below.
Bruce

[root@essperf13 ceph-mon01]# ceph -v
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
[root@essperf13 ceph-mon01]# lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:        6.5
Codename:       Final
[root@essperf13 ceph-mon01]#

I'm using the ceph.repo from ceph.com:

[root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/$basearch
priority=1
gpgcheck=1
type=rpm-md

[ceph-source]
name=Ceph source packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
priority=1
gpgcheck=1
type=rpm-md

[Ceph-noarch]
name=Ceph noarch packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/noarch
priority=1
gpgcheck=1
type=rpm-md
[root@essperf13 ceph-mon01]#

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
Sent: Monday, April 06, 2015 5:33 PM
To: ceph-users
Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi,

I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
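One way to avoid the distro-version surprise Loic hit is to let ceph-deploy lay the ceph.com repo down itself; the install subcommand of that era accepted a release selector. A sketch (the hostname is the monitor from this thread; the exact point release still depends on what the repo currently serves):

# tell ceph-deploy to configure the ceph.com firefly repo and install from it
ceph-deploy install --release firefly essperf13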
Re: [ceph-users] Calamari Questions
Quentin,
I got the config page to come up by exiting Calamari, deleting the salt keys on the Calamari master ('salt-key -D'), then restarting Calamari on the master and accepting the salt keys there ('salt-key -A') after restarting the salt-minion and diamond services on the ceph nodes. Once the salt keys were reaccepted by the master, Calamari goes to the "accept cluster" screen when you click on any option. The root issue was possibly that the cluster's monitor (a lab cluster with only 1 mon) didn't have the salt-minion/diamond services running and hadn't broadcast a key to the Calamari master.
Thanks,
Bruce

From: Quentin Hartman [mailto:qhart...@direwolfdigital.com]
Sent: Wednesday, April 01, 2015 1:56 PM
To: Bruce McFarland
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Calamari Questions

You should have a config page in the Calamari UI where you can accept osd nodes into the cluster as Calamari sees it. If you skipped the little first-setup window like I did, it's kind of a pain to find.
QH

On Wed, Apr 1, 2015 at 12:34 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
I've built the Calamari client, server, and diamond packages from source for trusty and centos and installed them on the trusty Master. I installed the diamond and salt packages on the storage nodes. I can connect to the Calamari master and accept salt keys from the ceph nodes, but then Calamari reports "3 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster." The 3 Ceph nodes are part of an existing Ceph cluster with 90 OSDs. I also built and installed the minion package on the Calamari master under /opt/calamari/webapp/content/calamari-minions
Any ideas what I've overlooked in my Calamari bring-up?
Thanks,
Bruce
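Condensed, the reset sequence Bruce describes looks roughly like this (a sketch assuming sysvinit/upstart-style service names on the nodes):

# on the Calamari master
sudo salt-key -D                 # delete all accepted/pending minion keys
# on each ceph node
sudo service salt-minion restart
sudo service diamond restart
# back on the master
sudo salt-key -L                 # minions should now show as unaccepted
sudo salt-key -A                 # accept them all, then restart Calamari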
[ceph-users] Calamari Questions
I've built the Calamari client, server, and diamond packages from source for trusty and centos and installed them on the trusty Master. I installed the diamond and salt packages on the storage nodes. I can connect to the Calamari master and accept salt keys from the ceph nodes, but then Calamari reports "3 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster." The 3 Ceph nodes are part of an existing Ceph cluster with 90 OSDs. I also built and installed the minion package on the Calamari master under /opt/calamari/webapp/content/calamari-minions
Any ideas what I've overlooked in my Calamari bring-up?
Thanks,
Bruce
Re: [ceph-users] RBD caching on 4K reads???
I'm still missing something. I can check on the monitor to see that the running config on the cluster has rbd cache = false:

[root@essperf13 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.essperf13.asok config show | grep rbd
debug_rbd: 0/5,
rbd_cache: false,

Since rbd caching is a client setting, I've added the following to the rbd client's /etc/ceph/ceph.conf:

[global]
log file = /var/log/ceph/rbd.log
rbd cache = false
rbd readahead max bytes = 0   # should already be disabled if rbd cache = false, but I'm paranoid

[client]
admin socket = /var/run/ceph/rbd-$pid.asok

I never see an rbd-*.asok file in /var/run/ceph. I started the rbd driver on the client without the /var/run/ceph directory and then see:

2015-02-02 14:40:30.254509 7f81888257c0 -1 asok(0x7f8189182390) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-1716.asok': (2) No such file or directory

when I attempt to map the rbd image to the client device with rbd map. Once I create /var/run/ceph these messages don't occur, so it appears that the admin sockets are being created, but only for the duration of the command. I still see the effects of rbd caching if I run fio/vdbench with 4K random reads, but I have not been able to create a persistent rbd admin socket so that I can dump the running configuration and/or change it at run time.
Any ideas on what I've overlooked? Any pointers to documentation on the [client] section of ceph.conf, or on rbd admin sockets? Nothing at ceph.com/docs on either topic.
Thanks,
Bruce

-----Original Message-----
From: Mykola Golub [mailto:to.my.troc...@gmail.com]
Sent: Sunday, February 01, 2015 1:24 PM
To: Udo Lembke
Cc: Bruce McFarland; ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

On Fri, Jan 30, 2015 at 10:09:32PM +0100, Udo Lembke wrote:
> Hi Bruce,
> you can also look on the mon, like
> ceph --admin-daemon /var/run/ceph/ceph-mon.b.asok config show | grep cache

rbd cache is a client setting, so you have to check this by connecting to the client admin socket. Its location is defined in ceph.conf, [client] section, admin socket parameter.

--
Mykola Golub
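For anyone else chasing this, a sketch of a [client] stanza that gives every librbd client its own log and admin socket ($name and $pid are metavariables expanded by Ceph itself; the grep target below is illustrative). Note that, as the kernel-rbd replies elsewhere in this thread point out, krbd never instantiates librbd, so no client socket will persist for kernel mappings:

[client]
log file = /var/log/ceph/$name.$pid.log
admin socket = /var/run/ceph/$name.$pid.asok

# then, while a librbd client (e.g. qemu) is running:
ceph --admin-daemon /var/run/ceph/client.admin.<pid>.asok config show | grep rbd_cache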
Re: [ceph-users] RBD caching on 4K reads???
I'm using Ubuntu 14.04 and the kernel rbd, which makes calls into libceph:

root@essperf3:/etc/ceph# lsmod | grep rbd
rbd                    63707  1
libceph               225026  1 rbd
root@essperf3:/etc/ceph#

I'm doing raw device IO with either fio or vdbench (preferred tool) and there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kmem pages by writing to drop_caches, and I've also allocated huge pages at the maximum allowable based on free memory. The huge page allocation should minimize any system caches. I have a relatively small storage pool, since this is a development environment: only ~4TB total, and the rbd image is 3TB. On my lab system with 320TB I don't see this problem, since the data set is orders of magnitude larger than the available system cache. Maybe I'll remove DIMMs from the client system and physically disable kernel caching.

-----Original Message-----
From: Nicheal [mailto:zay11...@gmail.com]
Sent: Monday, February 02, 2015 7:35 PM
To: Bruce McFarland
Cc: ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

It seems you use the kernel rbd, so rbd_cache does not apply; it is only designed for librbd. Kernel rbd directly uses the system page cache. You said that you have already run something like 'echo 3 > /proc/sys/vm/drop_caches' to invalidate all pages cached in the kernel. So do you test /dev/rbd1 on top of a filesystem, such as ext4 or xfs? If so, and you run a test tool like fio, first with a write test and file_size = 10G, then a 10G file is created by fio but with lots of holes in it, and your read test may read those holes, so the filesystem can tell they contain nothing and there is no need to access the physical disk to get data. You can check the fiemap of the file to see whether it contains holes, or just remove the file and recreate it with a read test.

Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
> I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
> Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
> Thanks,
> Bruce
Re: [ceph-users] RBD caching on 4K reads???
Yes, I'm using the kernel rbd in Ubuntu 14.04, which makes calls into libceph:

root@essperf3:/etc/ceph# lsmod | grep rbd
rbd                    63707  1
libceph               225026  1 rbd
root@essperf3:/etc/ceph#

I'm doing raw device IO with either fio or vdbench (preferred tool) and there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kmem pages by writing to drop_caches, and I've also allocated huge pages at the maximum allowable based on free memory. The huge page allocation should minimize any system caches. I have a relatively small storage pool, since this is a development environment: only ~4TB total, and the rbd image is 3TB. On my lab system with 320TB I don't see this problem, since the data set is orders of magnitude larger than the available system cache. Maybe I should try testing after removing DIMMs from the client system to physically disable kernel caching.

-----Original Message-----
From: Nicheal [mailto:zay11...@gmail.com]
Sent: Monday, February 02, 2015 7:35 PM
To: Bruce McFarland
Cc: ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

It seems you use the kernel rbd, so rbd_cache does not apply; it is only designed for librbd. Kernel rbd directly uses the system page cache. You said that you have already run something like 'echo 3 > /proc/sys/vm/drop_caches' to invalidate all pages cached in the kernel. So do you test /dev/rbd1 on top of a filesystem, such as ext4 or xfs? If so, and you run a test tool like fio, first with a write test and file_size = 10G, then a 10G file is created by fio but with lots of holes in it, and your read test may read those holes, so the filesystem can tell they contain nothing and there is no need to access the physical disk to get data. You can check the fiemap of the file to see whether it contains holes, or just remove the file and recreate it with a read test.

Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
> I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
> Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
> Thanks,
> Bruce
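Since the kernel rbd client goes through the Linux page cache rather than librbd, the usual way to take caching out of a raw-device test is O_DIRECT. A minimal fio invocation to that effect (a sketch; the device path matches this thread, all other values are illustrative):

# direct=1 opens /dev/rbd1 with O_DIRECT, bypassing the kernel page cache
fio --name=randread-4k --filename=/dev/rbd1 --direct=1 --rw=randread \
    --bs=4k --ioengine=libaio --iodepth=32 --runtime=60 --time_based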
[ceph-users] RBD caching on 4K reads???
I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
Thanks,
Bruce
Re: [ceph-users] RBD caching on 4K reads???
The ceph daemon isn't running on the client with the rbd device, so I can't verify if it's disabled at the librbd level on the client. If you mean on the storage nodes, I've had some issues dumping the config. Does the rbd caching occur on the storage nodes, the client, or both?

From: Udo Lembke [mailto:ulem...@polarzone.de]
Sent: Friday, January 30, 2015 1:00 PM
To: Bruce McFarland; ceph-us...@ceph.com
Cc: Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

Hi Bruce,
hmm, sounds to me like the rbd cache. Can you check whether the cache is really disabled in the running config with:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep cache

Udo

On 30.01.2015 21:51, Bruce McFarland wrote:
I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
Thanks,
Bruce
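To Bruce's question about where caching can occur: with the kernel client there is also plain block-layer readahead on the mapped device itself, independent of anything Ceph-side. Standard Linux tooling rules it out (a sketch, using the device from this thread):

blockdev --getra /dev/rbd1               # current readahead, in 512-byte sectors
blockdev --setra 0 /dev/rbd1             # disable it for the duration of the test
cat /sys/block/rbd1/queue/read_ahead_kb  # the same setting, expressed in KB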
Re: [ceph-users] Monitor/OSD report tuning question
See inline:

Ceph version:
[root@ceph2 ceph]# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

Initial testing was with 30 osd's, 10 per storage server, with the following HW:
4TB SATA disks - 1 hdd/osd - 30 hdd's/server - 6 ssd's/server forming a md raid0 virtual drive with 30 96GB partitions, 1 partition/osd journal.
Storage Server HW: 2 x Xeon e5-2630 2.6GHz, 24 cores total, with 128GB/server
Monitor HW: 2 x Xeon e5-2630 2.6GHz, 24 cores total, with 64GB - system disks are 4 x 480GB SAS ssd configured as a virtual md raid0

It seems my cluster's main issue is osd_heartbeat_grace, since I constantly see osd failures for reporting outside the 20 second grace. The cluster was configured this way from boot time (I completely tore down the original cluster and rebuilt with an increased osd_heartbeat_grace of 35). As you can see, the osd is marked down, the cluster then goes into an osdmap/pgmap rebalancing cycle, and everything is UP/IN with PG states of 'active+clean' - for a few moments - and then the osd flapping and map rebalancing restart. All of the osd's are configured with, and report, an osd_heartbeat_grace of 35. Any idea why osd's are still failing against a grace of ~20?

[root@ceph0 ceph]# sh -x ./ceph0-daemon-config.sh beat_grace
+ '[' 1 '!=' 1 ']'
+ for i in '{0..29}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show
+ grep beat_grace
mon_osd_adjust_heartbeat_grace: true,
osd_heartbeat_grace: 35,
+ for i in '{0..29}'
+ grep beat_grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
mon_osd_adjust_heartbeat_grace: true,
osd_heartbeat_grace: 35,
+ for i in '{0..29}'

2014-08-25 10:18:10.812179 mon.0 [INF] osd.26 209.243.160.83:6878/4819 failed (279 reports from 56 peers after 21.006896 >= grace 20.995963)
2014-08-25 10:18:10.812440 mon.0 [INF] osd.29 209.243.160.83:6887/7439 failed (254 reports from 51 peers after 21.007140 >= grace 20.995963)
2014-08-25 10:18:10.817675 mon.0 [INF] osd.18 209.243.160.83:6854/30165 failed (280 reports from 56 peers after 21.012978 >= grace 20.995962)
2014-08-25 10:18:10.817850 mon.0 [INF] osd.19 209.243.160.83:6857/31036 failed (245 reports from 49 peers after 21.013135 >= grace 20.995962)
2014-08-25 10:18:11.127275 mon.0 [INF] osdmap e25128: 91 osds: 82 up, 90 in
2014-08-25 10:18:11.157030 mon.0 [INF] pgmap v51553: 5760 pgs: 519 stale+active+clean, 5241 active+clean; 0 bytes data, 135 GB used, 327 TB / 327 TB avail
2014-08-25 10:18:11.924773 mon.0 [INF] osd.5 209.243.160.83:6815/19790 failed (270 reports from 54 peers after 22.120541 >= grace 21.991499)
2014-08-25 10:18:11.924858 mon.0 [INF] osd.7 209.243.160.83:6821/21303 failed (240 reports from 48 peers after 22.120345 >= grace 21.991499)
2014-08-25 10:18:11.924894 mon.0 [INF] osd.11 209.243.160.83:6833/24394 failed (260 reports from 52 peers after 22.120297 >= grace 21.991499)
2014-08-25 10:18:11.924943 mon.0 [INF] osd.16 209.243.160.83:6848/28431 failed (265 reports from 53 peers after 22.120080 >= grace 21.991499)
2014-08-25 10:18:11.924977 mon.0 [INF] osd.17 209.243.160.83:6851/29253 failed (250 reports from 50 peers after 22.120067 >= grace 21.991499)
2014-08-25 10:18:11.925012 mon.0 [INF] osd.23 209.243.160.83:6869/2073 failed (270 reports from 54 peers after 22.120020 >= grace 21.991499)
2014-08-25 10:18:11.925065 mon.0 [INF] osd.24 209.243.160.83:6872/3025 failed (260 reports from 52 peers after 22.120010 >= grace 21.991499)
2014-08-25 10:15:17.753867 osd.10 [WRN] map e25128 wrongly marked me down
2014-08-25 10:15:17.960953 osd.18 [WRN] map e25128 wrongly marked me down
2014-08-25 10:15:18.217959 osd.29 [WRN] map e25128 wrongly marked me down
2014-08-25 10:18:11.925143 mon.0 [INF] osd.28 209.243.160.83:6884/6572 failed (275 reports from 55 peers after 22.670894 >= grace 21.991288)
2014-08-25 10:18:12.204918 mon.0 [INF] pgmap v51554: 5760 pgs: 519 stale+active+clean, 5241 active+clean; 0 bytes data, 135 GB used, 327 TB / 327 TB avail

-----Original Message-----
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Monday, August 25, 2014 1:15 AM
To: ceph-us...@ceph.com
Cc: Bruce McFarland
Subject: Re: [ceph-users] Monitor/OSD report tuning question

Hello,

On Sat, 23 Aug 2014 20:23:55 +0000 Bruce McFarland wrote:

Firstly, while the runtime changes you injected into the cluster should have done something (and I hope some Ceph developer comments on that), you're asking for tuning advice, which really isn't the issue here. Your cluster should not need any tuning to become functional; what you're seeing is something massively wrong with it.

> Hello, I have a Cluster

Which version? I assume Firefly due to the single monitor, which suggests a test cluster, but if you're running a development version all bets are off.

[root@ceph2 ceph]# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

> with 30 OSDs

What disks? How connected? SSD journals?

4TB SATA disks, 1/osd - 30 hdd's/server - 6 ssd's forming a md raid0 virtual drive with 30 96GB
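A quick way to see which daemon is still sitting on the default grace, using the admin sockets already shown in this thread (paths assume the hostnames above):

# on the monitor - this is where the "grace NN" in the failure log lines comes from
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
# on a storage node, per OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_heartbeat_grace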
Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
I just added osd_heartbeat_grace to the [mon] section of ceph.conf, restarted ceph-mon, and now the monitor is reporting a 35 second osd_heartbeat_grace:

[root@ceph-mon01 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
osd_heartbeat_grace: 35,
[root@ceph-mon01 ceph]#

-----Original Message-----
From: Bruce McFarland
Sent: Monday, August 25, 2014 10:46 AM
To: 'Gregory Farnum'
Cc: ceph-us...@ceph.com
Subject: RE: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

That's something that has been puzzling me. The monitor's ceph.conf is set to 35, but its runtime config reports 20. I've restarted it after initial creation to try to get it to reload the ceph.conf settings, but it stays at 20.

[root@ceph-mon01 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
osd_heartbeat_grace: 20,
[root@ceph-mon01 ceph]#
[root@ceph-mon01 ceph]# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 209.243.160.84
mon_initial_members = ceph-mon01
fsid = 94bbb882-42e4-4a6c-bfda-125790616fcc
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_pool_default_size = 3  # Write an object 3 times - number of replicas.
osd_pool_default_min_size = 1  # Allow writing one copy in a degraded state.

[mon]
mon_osd_min_down_reporters = 2

[osd]
debug_ms = 1
debug_osd = 20
public_network = 209.243.160.0/24
cluster_network = 10.10.50.0/24
osd_journal_size = 96000
osd_heartbeat_grace = 35

[osd.0]
.
.
.

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, August 25, 2014 10:39 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

On Sat, Aug 23, 2014 at 11:06 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
> I see osd's being failed for heartbeat reporting with the default osd_heartbeat_grace of 20, but the runtime config shows that the grace is set to 30. Is there another variable for the osd or the mon I need to set for the non-default osd_heartbeat_grace of 30 to take effect?

You need to also set the osd heartbeat grace on the monitors. If I were to guess, the OSDs are actually seeing each other as slow (after 30 seconds) and reporting it in, but the monitors have a grace of 20 seconds set, so that's what they're using to generate output.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
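The same change can also be pushed into a running monitor without a restart via injectargs (a sketch; injected values are not persistent across a daemon restart, so keep the ceph.conf entry as well):

ceph tell mon.ceph-mon01 injectargs '--osd-heartbeat-grace 35'
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace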
Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
After looking a little closer, now that I have a better understanding of osd_heartbeat_grace for the monitor, all the osd failures were coming from 1 node in the cluster. Yes, your hunch was correct: that node had stale rules in iptables. After disabling iptables, the osd flapping has stopped. Now I'm going to bring the osd_heartbeat_grace value back down incrementally and see if the cluster runs without reporting issues at the default. Thank you very much for your help.

I have some default pool questions concerning cluster bring-up: I have 90 osd's (a single 4TB HDD per osd, with a 96GB journal that is a partition on a SSD raid0), 30 osd's per storage node. I have the default placement group info in the [global] section of ceph.conf:

osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096

When I bring up a cluster, I'm running short of PGs in the default pools 0-data, 1-metadata, and 2-rbd, and getting error msgs for not enough PGs per osd. Since osd's require between 20 and 32 PGs each, as soon as I've brought up the first storage node I need a minimum of 600 PGs, but the system comes up with the default of 64 per pool. After creating each node's osd's, I increased the default pool sizes with 'ceph osd pool set <pool> pg_num' and 'pgp_num' for each of the default pools. Do I need to increase all 3 pools? Is there a ceph.conf setting that handles this startup issue? What's the best-practices way to handle bringing up more osd's than the default pool PG settings can handle?

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, August 25, 2014 11:01 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

On Mon, Aug 25, 2014 at 10:56 AM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
> Thank you very much for the help. I'm moving osd_heartbeat_grace to the global section and trying to figure out what's going on between the osd's. Since increasing the osd_heartbeat_grace in the [mon] section of ceph.conf on the monitor I still see failures, but now they are 2 seconds > osd_heartbeat_grace. It seems that no matter how much I increase this value, osd's are reporting just outside of it. I've looked at netstat -s for all of the nodes and will go back and look at the network stats much closer. Would it help to put the monitor on a 10G link to the storage nodes? Everything is set up, but we chose to leave the monitor on a 1G link to the storage nodes.

No. They're being marked down because they aren't heartbeating the OSDs, and those OSDs are reporting the failures to the monitor (whose connection is apparently working fine). The most likely guess without more data is that you've got firewall rules set up blocking the ports the OSDs are using to send their heartbeats... but it could be many things in your network stack or your cpu scheduler or whatever.
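On the PG sizing question, the rule of thumb in the docs of that era was roughly (OSDs x 100) / replica count, rounded up to a power of two - which for 90 OSDs at size 3 lands near the 4096 already in this ceph.conf. Note the osd_pool_default_* settings only apply to pools created after they are in place, so pools that came up at 64 have to be bumped by hand (a sketch; whether all three initial pools need it depends on which are actually used):

# (90 OSDs * 100) / 3 replicas = 3000 -> next power of two is 4096
ceph osd pool set data pg_num 4096
ceph osd pool set data pgp_num 4096      # pgp_num must follow pg_num
# repeat for 'metadata' and 'rbd' as needed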
[ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
I see osd's being failed for heartbeat reporting default osd_heartbeat_grace of 20, but the runtime config shows that the grace is set to 30. Is there another variable for the osd or the mon I need to set for the non-default osd_heartbeat_grace of 30 to take effect?

2014-08-23 23:03:08.982590 mon.0 [INF] osd.23 209.243.160.83:6812/31567 failed (73 reports from 20 peers after 20.462129 >= grace 20.00)
2014-08-23 23:03:09.058927 mon.0 [INF] osdmap e37965: 30 osds: 29 up, 30 in
2014-08-23 23:03:09.070575 mon.0 [INF] pgmap v82213: 1920 pgs: 62 stale+active+clean, 1858 active+clean; 0 bytes data, 8193 MB used, 109 TB / 109 TB avail
2014-08-23 23:03:09.860169 mon.0 [INF] osd.20 209.243.160.83:6806/29554 failed (62 reports from 20 peers after 21.339816 >= grace 20.995899)
2014-08-23 23:03:09.860246 mon.0 [INF] osd.26 209.243.160.83:6811/1098 failed (66 reports from 20 peers after 21.339380 >= grace 20.995899)
2014-08-23 23:03:09.860307 mon.0 [INF] osd.29 209.243.160.83:6804/3217 failed (62 reports from 20 peers after 21.339341 >= grace 20.995899)
2014-08-23 23:03:10.076721 mon.0 [INF] osdmap e37966: 30 osds: 26 up, 30 in

[root@ceph1 ceph]# sh -x ./ceph1-daemon-config.sh grace
+ '[' 1 '!=' 1 ']'
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.7.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.9.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
[root@ceph1 ceph]#

[root@ceph2 ceph]# sh -x ./ceph2-daemon-config.sh grace
+ '[' 1 '!=' 1 ']'
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.11.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.13.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.15.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.16.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.17.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.18.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon
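The ceph1-daemon-config.sh being traced above is presumably just a loop over the local admin sockets; a generic sketch of the same idea that doesn't hardcode the osd id range:

#!/bin/sh
# Usage: ./daemon-config.sh <pattern>
# Grep a value out of every local OSD daemon's running config.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $sock =="
    ceph --admin-daemon "$sock" config show | grep "$1"
done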
[ceph-users] Monitor/OSD report tuning question
Hello,
I have a Cluster with 30 OSDs distributed over 3 Storage Servers connected by a 10G cluster link, and connected to the Monitor over 1G. I still have a lot to understand with Ceph. Observing the cluster messages in a 'ceph -w' window, I see a lot of osd flapping while the cluster is sitting in a configured state, and placement groups (PGs) constantly changing status. The cluster was configured and came up to 1920 'active+clean' PGs. The 3 status outputs below were issued over the course of about 2 minutes. As you can see, there is a lot of activity where I'm assuming the osd reporting occasionally falls outside the heartbeat timeout (TO), and various PGs get set to 'stale' and/or 'degraded' but still 'active'. There are osd's being marked 'out' in the osd map - I see them in the watch window reported as failures, and they very quickly report 'wrongly marked me down'. I'm assuming I need to 'tune' some of the many TO values so that the osd's and PGs can all report within the TO window. A quick look at the --admin-daemon config show cmd tells me that I might consider tuning some of these values:

[root@ceph0 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config show | grep report
mon_osd_report_timeout: 900,
mon_osd_min_down_reporters: 1,
mon_osd_min_down_reports: 3,
osd_mon_report_interval_max: 120,
osd_mon_report_interval_min: 5,
osd_pg_stat_report_interval_max: 500,
[root@ceph0 ceph]#

Which osd and/or mon settings should I increase/decrease to eliminate all this state flapping while the cluster sits configured with no data?
Thanks,
Bruce

2014-08-23 13:16:15.564932 mon.0 [INF] osd.20 209.243.160.83:6800/20604 failed (65 reports from 20 peers after 23.380808 >= grace 21.991016)
2014-08-23 13:16:15.565784 mon.0 [INF] osd.23 209.243.160.83:6810/29727 failed (79 reports from 20 peers after 23.675170 >= grace 21.990903)
2014-08-23 13:16:15.566038 mon.0 [INF] osd.25 209.243.160.83:6808/31984 failed (65 reports from 20 peers after 23.380921 >= grace 21.991016)
2014-08-23 13:16:15.566206 mon.0 [INF] osd.26 209.243.160.83:6811/518 failed (65 reports from 20 peers after 23.381043 >= grace 21.991016)
2014-08-23 13:16:15.566372 mon.0 [INF] osd.27 209.243.160.83:6822/2511 failed (65 reports from 20 peers after 23.381195 >= grace 21.991016)
.
.
.
2014-08-23 13:17:09.547684 osd.20 [WRN] map e27128 wrongly marked me down
2014-08-23 13:17:10.826541 osd.23 [WRN] map e27130 wrongly marked me down
2014-08-23 13:20:09.615826 mon.0 [INF] osdmap e27134: 30 osds: 26 up, 30 in
2014-08-23 13:17:10.954121 osd.26 [WRN] map e27130 wrongly marked me down
2014-08-23 13:17:19.125177 osd.25 [WRN] map e27135 wrongly marked me down

[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26636: 30 osds: 30 up, 30 in
      pgmap v56534: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26586 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_WARN 160 pgs degraded; 83 pgs stale
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26641: 30 osds: 30 up, 30 in
      pgmap v56545: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26558 MB used, 109 TB / 109 TB avail
                  83 stale+active+clean
                 160 active+degraded
                1677 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26657: 30 osds: 30 up, 30 in
      pgmap v56584: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26610 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]#
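For reference, the failure-reporting knobs discussed in this thread as they would sit in ceph.conf (values are illustrative, not recommendations; the key point from the later replies is that osd_heartbeat_grace must be visible to the monitors too, hence [global]):

[global]
osd_heartbeat_grace = 35          # both OSDs and mons consult this

[mon]
mon_osd_min_down_reporters = 2    # distinct OSDs required to report a peer down
mon_osd_min_down_reports = 3      # reports required before marking it down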
Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting
I have 3 storage servers, each with 30 osds. Each osd has a journal that is a partition on a virtual drive that is a raid0 of 6 ssds. I brought up a 3 osd (1 per storage server) cluster to bring up Ceph and figure out configuration etc.

From: Dan Van Der Ster [mailto:daniel.vanders...@cern.ch]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Hi,
You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...
Cheers, Dan

On 21 Aug 2014, at 02:59, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have a cluster with 1 monitor and 3 OSD Servers. Each server has multiple OSD's running on it. When I start an OSD using /etc/init.d/ceph start osd.0, I see the expected interaction between the OSD and the monitor - authenticating keys etc. - and finally the OSD starts. Watching the cluster with 'ceph -w' on the monitor, I never see the INFO messages I expect: there isn't a msg from osd.0 for the boot event, nor the expected INFO messages from osdmap and pgmap for the osd and its PGs being added to those maps. I only see the last time the monitor was booted, when it wins the monitor election and reports monmap, pgmap, and mdsmap info. The firewalls are disabled with selinux==disabled and iptables turned off. All hosts can ssh w/o passwords into each other and I've verified traffic between hosts using tcpdump captures.
Any ideas on what I'd need to add to ceph.conf or have overlooked would be greatly appreciated.
Thanks,
Bruce

[root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
=== osd.0 ===
=== osd.0 ===
Stopping Ceph osd.0 on ceph0...kill 15676...done
=== osd.0 ===
2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start
2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 -- 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570
2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971
2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 1 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570
2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570
2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.461514 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa4fc001cf0 con 0x7fa51402f570
2014-08-20 17:43:46.462824 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
2014-08-20 17:43:46.463011 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0 con 0x7fa51402f570
2014-08-20 17:43:46.463073 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x7fa4fc0025d0 con 0x7fa51402f570 2014-08-20 17:43:46.463329 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa514030490 con 0x7fa51402f570 2014-08-20 17:43:46.463363 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa5140309b0 con 0x7fa51402f570 2014-08-20 17:43:46.463564 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 5 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508001100 con 0x7fa51402f570 2014-08-20 17:43:46.463639 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570 2014-08-20 17:43:46.463707 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 7 auth_reply(proto 2 0 (0) Success) v1 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570 2014-08-20 17:43:46.468877 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_command({prefix: get_command_descriptions} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570 2014-08-20 17:43
Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting
Yes, all of the ceph-osd processes are up and running. I performed a ceph-mon restart to see if that might trigger the osdmap update, but there is no INFO msg from the osdmap or the pgmap that I expect to see when the osd's are started. All of the osd's and their hosts appear in the CRUSH map and in ceph.conf. Since I went through a bunch of issues getting the multiple-osds-per-host setup working, I'm assuming that the monitor's tables might be hosed, and I am going to purgedata and reinstall the monitor and see if it builds the proper mappings. I've stopped all of the osd's and verified that there aren't any active ceph-osd processes. Then I'll follow the procedure for bringing a new monitor online into an existing cluster so that I use the proper fsid.

2014-08-20 17:20:24.648538 7f326ebfd700 0 monclient: hunting for new mon
2014-08-20 17:20:24.648857 7f327455f700 0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Thursday, August 21, 2014 8:44 AM
To: Bruce McFarland
Cc: Dan Van Der Ster; ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Are the OSD processes still alive? What's the osdmap output of ceph -w (which was not in the output you pasted)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Aug 21, 2014 at 7:11 AM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have 3 storage servers, each with 30 osds. Each osd has a journal that is a partition on a virtual drive that is a raid0 of 6 ssds. I brought up a 3 osd (1 per storage server) cluster to bring up Ceph and figure out configuration etc.

From: Dan Van Der Ster [mailto:daniel.vanders...@cern.ch]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Hi,
You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...
Cheers, Dan

On 21 Aug 2014, at 02:59, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have a cluster with 1 monitor and 3 OSD Servers. Each server has multiple OSD's running on it. When I start an OSD using /etc/init.d/ceph start osd.0, I see the expected interaction between the OSD and the monitor - authenticating keys etc. - and finally the OSD starts. Watching the cluster with 'ceph -w' on the monitor, I never see the INFO messages I expect: there isn't a msg from osd.0 for the boot event, nor the expected INFO messages from osdmap and pgmap for the osd and its PGs being added to those maps. I only see the last time the monitor was booted, when it wins the monitor election and reports monmap, pgmap, and mdsmap info. The firewalls are disabled with selinux==disabled and iptables turned off. All hosts can ssh w/o passwords into each other and I've verified traffic between hosts using tcpdump captures.
Any ideas on what I’d need to add to ceph.conf or have overlooked would be greatly appreciated. Thanks, Bruce [root@ceph0 ceph]# /etc/init.d/ceph restart osd.0 === osd.0 === === osd.0 === Stopping Ceph osd.0 on ceph0...kill 15676...done === osd.0 === 2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start 2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 -- 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570 2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971 2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 1 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570 2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570 2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570 2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (409581826 0 0
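When an OSD process starts but never shows up in the maps, a few standard monitor-side checks narrow down whether it ever registered (a sketch; osd.0 as the example id):

ceph osd stat                  # how many OSDs the map knows, and how many are up/in
ceph osd tree                  # where they sit in the CRUSH hierarchy
ceph osd dump | grep osd.0     # the exact state and address recorded for one OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status   # on the OSD host: the daemon's own view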
[ceph-users] How to create multiple OSD's per host?
I've tried using ceph-deploy, but it wants to assign the same id to each osd, and I end up with a bunch of prepared ceph-disks and only 1 active. If I use the manual short-form method, the activate step fails and there are no xfs mount points on the ceph-disks. If I use the manual long form, it seems like I get the closest to active ceph-disks/osd's, but the monitor always shows the osds as down/in, and the ceph-disks don't persist over a boot cycle.
Is there a document anywhere that anyone knows of that explains a step-by-step process for bringing up multiple osd's per host - 1 hdd with a ssd journal partition per osd?
Thanks,
Bruce
Re: [ceph-users] How to create multiple OSD's per host?
] checking OSD status...
[ceph0][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.

From: Bruce McFarland
Sent: Thursday, August 14, 2014 11:45 AM
To: 'ceph-us...@ceph.com'
Subject: How to create multiple OSD's per host?

I've tried using ceph-deploy, but it wants to assign the same id to each osd, and I end up with a bunch of prepared ceph-disks and only 1 active. If I use the manual short-form method, the activate step fails and there are no xfs mount points on the ceph-disks. If I use the manual long form, it seems like I get the closest to active ceph-disks/osd's, but the monitor always shows the osds as down/in, and the ceph-disks don't persist over a boot cycle.
Is there a document anywhere that anyone knows of that explains a step-by-step process for bringing up multiple osd's per host - 1 hdd with a ssd journal partition per osd?
Thanks,
Bruce
Re: [ceph-users] How to create multiple OSD's per host?
I'll try the prepare/activate commands again. I spent the least amount of time with them since activate _always_ failed for me. I'll go back and check my logs, but it probably failed because I was attempting to activate the same location I used in the 'prepare' instead of partition 1 as you suggest (which is exactly how it is shown in the documentation example). I seemed to get the closest to a working cluster using the 'manual' commands below. I could try changing the XFS mount point to be on a partition of the HDD I'm using for the OSD.

mkdir /var/lib/ceph/osd/ceph-$OSD
mkfs -t xfs -f /dev/sd$i
mount -t xfs /dev/sd$i /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --osd-journal /dev/md0p$PART

What I find most confusing when using ceph-deploy with multiple OSDs on the same host is that when 'ceph-deploy osd create [data] [journal]' completes, there is no osd directory for each OSD under /var/lib/ceph/osd/:

[root@ceph0 ceph]# ll /var/lib/ceph/osd/
total 0
[root@ceph0 ceph]#

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jason King
Sent: Thursday, August 14, 2014 8:13 PM
To: ceph-us...@ceph.com
Subject: Re: [ceph-users] How to create multiple OSD's per host?

2014-08-15 7:56 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
This is an example of the output from 'ceph-deploy osd create [data] [journal]'. I've noticed that all of the 'ceph-conf' commands use the same parameter of '--name=osd.' every time ceph-deploy is called. I end up with 30 OSDs - 29 prepared and 1 active according to the 'ceph-disk list' output - and only 1 OSD that has an xfs mount point. I've tried both with all data/journal devices on the same ceph-deploy command line and issuing 1 ceph-deploy command for each OSD data/journal pair (easier to script).

+ ceph-deploy osd create ceph0:/dev/sdl:/dev/md0p17
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.10): /usr/bin/ceph-deploy osd create ceph0:/dev/sdl:/dev/md0p17
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph0:/dev/sdl:/dev/md0p17
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0
[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdl journal /dev/md0p17 activate True
[ceph0][INFO ] Running command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdl /dev/md0p17
[ceph0][DEBUG ] Information: Moved requested sector from 34 to 2048 in
[ceph0][DEBUG ] order to align on 2048-sector boundaries.
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] meta-data=/dev/sdl1    isize=2048  agcount=4, agsize=244188597 blks
[ceph0][DEBUG ]          =             sectsz=512  attr=2, projid32bit=0
[ceph0][DEBUG ] data     =             bsize=4096  blocks=976754385, imaxpct=5
[ceph0][DEBUG ]          =             sunit=0     swidth=0 blks
[ceph0][DEBUG ] naming   =version 2    bsize=4096  ascii-ci=0
[ceph0][DEBUG ] log      =internal log bsize=4096  blocks=476930, version=2
[ceph0][DEBUG ]          =             sectsz=512  sunit=0 blks, lazy-count=1
[ceph0][DEBUG ] realtime =none         extsz=4096  blocks=0, rtextents=0
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph0][WARNIN] DEBUG:ceph-disk:Journal /dev/md0p17 is a partition
[ceph0][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[ceph0][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdl
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:a96b4af4-11f4-4257-9476-64a6e4c93c28 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdl
[ceph0][WARNIN] INFO:ceph
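For the archives, the per-OSD prepare/activate sequence being discussed would look roughly like this sketch (/dev/sdl and /dev/md0p17 are the example devices from this thread; the key point is that activate takes the data partition that prepare created, not the raw disk):

    # prepare the HDD as OSD data with the journal on an SSD RAID partition
    ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdl /dev/md0p17
    # prepare creates partition 1 on /dev/sdl; activate that partition
    ceph-disk -v activate /dev/sdl1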
Re: [ceph-users] Firefly OSDs stuck in creating state forever
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 09:59:07.145776 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:59:37.146043 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 10:00:07.146288 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 10:00:37.146543 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault

209.243.160.35 - monitor
209.243.160.51 - osd.0
209.243.160.52 - osd.3
209.243.160.59 - osd.2

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Sunday, August 03, 2014 11:15 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Sun, 3 Aug 2014, Bruce McFarland wrote:
Is there a recommended way to take everything down and restart the process? I was considering starting completely from scratch, i.e. OS reinstall and then using ceph-deploy as before.

If you're using ceph-deploy, then
ceph-deploy purge HOST
ceph-deploy purgedata HOST
will do it. Then remove the ceph.* (config and keyring) files from the current directory.

I've learned a lot and want to figure out a foolproof way I can document for others in our lab to bring up a cluster on new HW. I learn a lot more when I break things and have to figure out what went wrong, so it's a little frustrating, but I've found out a lot about verifying the configuration and debug options so far. My intent is to investigate rbd usage, perf, and configuration options. The endless loop I'm referring to is a constant stream of fault messages that I'm not yet familiar enough with to interpret. I have let them run to see if the cluster recovers, but ceph-mon always crashed. I'll look for the crash dump and save it, since kdump should be enabled on the monitor box.

Do you have one of the messages handy? I'm curious whether it is an OSD or a mon.
Thanks!
sage

Thanks for the feedback.

On Aug 3, 2014, at 8:30 AM, Sage Weil sw...@redhat.com wrote:
Hi Bruce,
On Sun, 3 Aug 2014, Bruce McFarland wrote:
Yes, I looked at tcpdump on each of the OSDs and saw communications between all 3 OSDs before I sent my first question to this list. When I disabled selinux on the one offending server based on your feedback (typically we have this disabled on lab systems that are only on the lab net), the 10 PGs in my test pool all went to 'active+clean' almost immediately. Unfortunately the 3 default pools still remain in the creating states and are not health_ok. The OSDs all stayed UP/IN after the selinux change for the rest of the day, until I made the mistake of creating an RBD image on demo-pool and its 10 'active+clean' PGs. I created the rbd, but when I attempted to look at it with 'rbd info'
the cluster went into an endless loop trying to read a placement group, a loop that I left running overnight. This morning

What do you mean by went into an endless loop?

ceph-mon was crashed again. I'll probably start all over from scratch once again on Monday.

Was there a stack dump in the mon log? It is possible that there is a bug with pool creation that surfaced by having selinux in place for so long, but otherwise this scenario doesn't make much sense to me. :/ Very interested in hearing more, and/or whether you can reproduce it.
Thanks!
sage

I deleted ceph-mds and got rid of the 'laggy' comments from 'ceph health'. The 'official' online Ceph docs on that are 'coming soon', and most references I could find were pre-firefly, so it took a little trial and error to figure out that I had to use the pool number and not its name to get the removal to work. Same with 'ceph mds newfs' to get rid of the laggy-ness in the 'ceph health' output.

[root@essperf3 Ceph]# ceph mds rm 0 mds.essperf3
mds gid 0 dne
[root@essperf3 Ceph]# ceph health
HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192 pgs stuck unclean; mds essperf3 is laggy
[root@essperf3 Ceph]# ceph mds newfs 1 0 --yes-i-really-mean-it
new fs with metadata pool 1 and data pool 0
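Since the pool-number-versus-name point above trips people up: the numeric pool ids that 'ceph mds rm' and 'ceph mds newfs' expect can be read straight out of lspools. A quick illustration using the default pools from this thread (output shape is from this era of Ceph; treat it as illustrative):

    [root@essperf3 Ceph]# ceph osd lspools
    0 data,1 metadata,2 rbd,
    # metadata pool id is 1, data pool id is 0, hence:
    ceph mds newfs 1 0 --yes-i-really-mean-it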
Re: [ceph-users] Firefly OSDs stuck in creating state forever
Is there a header or first line that appears in all ceph-mon stack dumps that I can search for? The couple of ceph-mon stack dumps I've seen in web searches appear to all begin with "ceph version 0.xx", but those are from over a year ago. Is that still the case with 0.81 firefly code?

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 10:09 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Firefly OSDs stuck in creating state forever

Okay, looks like the mon went down then. Was there a stack trace in the log after the daemon crashed? (Or did the daemon stay up but go unresponsive or something?)
Thanks!
sage

On Mon, 4 Aug 2014, Bruce McFarland wrote:
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 09:59:07.145776 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:59:37.146043 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 10:00:07.146288 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 10:00:37.146543 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault

209.243.160.35 - monitor
209.243.160.51 - osd.0
209.243.160.52 - osd.3
209.243.160.59 - osd.2

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Sunday, August 03, 2014 11:15 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Sun, 3 Aug 2014, Bruce McFarland wrote:
Is there a recommended way to take everything down and restart the process? I was considering starting completely from scratch, i.e. OS reinstall and then using ceph-deploy as before.

If you're using ceph-deploy, then
ceph-deploy purge HOST
ceph-deploy purgedata HOST
will do it. Then remove the ceph.* (config and keyring) files from the current directory.

I've learned a lot and want to figure out a foolproof way I can document for others in our lab to bring up a cluster on new HW. I learn a lot more when I break things and have to figure out what went wrong, so it's a little frustrating, but I've found out a lot about verifying the configuration and debug options so far. My intent is to investigate rbd usage, perf, and configuration options. The endless loop I'm referring to is a constant stream of fault messages that I'm not yet familiar enough with to interpret. I have let them run to see if the cluster recovers, but ceph-mon always crashed. I'll look for the crash dump and save it, since kdump should be enabled on the monitor box.

Do you have one of the messages handy? I'm curious whether it is an OSD or a mon.
Thanks!
sage

Thanks for the feedback.
On Aug 3, 2014, at 8:30 AM, Sage Weil sw...@redhat.com wrote:
Hi Bruce,
On Sun, 3 Aug 2014, Bruce McFarland wrote:
Yes, I looked at tcpdump on each of the OSDs and saw communications between all 3 OSDs before I sent my first question to this list. When I disabled selinux on the one offending server based on your feedback (typically we have this disabled on lab systems that are only on the lab net), the 10 PGs in my test pool all went to 'active+clean' almost immediately. Unfortunately the 3 default pools still remain in the creating states and are not health_ok. The OSDs all stayed UP/IN after the selinux change for the rest of the day, until I made the mistake of creating an RBD image on demo-pool and its 10 'active+clean' PGs. I created the rbd, but when I attempted to look at it with 'rbd info' the cluster went into an endless loop trying to read a placement group, a loop that I left running overnight. This morning

What do you mean by went into an endless loop?

ceph-mon was crashed again. I'll probably start all over from scratch once again on Monday.

Was there a stack dump in the mon log? It is possible that there is a bug with pool creation that surfaced by having selinux in place for so long, but otherwise this scenario doesn't make much sense to me. :/ Very interested in hearing more, and/or whether you can reproduce it.
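To answer the "what do I search for" question for readers of the archive: as Sage confirms later in the thread, crash dumps in the mon log still begin with the "ceph version ..." banner, so a grep with trailing context finds them. A sketch, assuming the default log location (adjust the path to your setup):

    # print the version banner plus the 30 lines that follow it
    grep -n -A 30 'ceph version' /var/log/ceph/ceph-mon.*.log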
Re: [ceph-users] Firefly OSDs stuck in creating state forever
I couldn't find the ceph-mon stack dump in the log; all greps for 'ceph version' weren't followed by a stack trace.

Executed ceph-deploy purge/purgedata on the monitor and OSDs. NOTE: I had to manually go to the individual OSD shells and remove /var/lib/ceph after umounting the ceph/xfs device. That running purgedata from the monitor always failed for OSDs whose daemons were still running initially confused me, though a still-mounted filesystem wouldn't have. Executing 'ceph-deploy purge' from the monitor succeeded on all of the OSDs.

Ran ceph-deploy new/install/mon create/gatherkeys/osd create on the cluster (I haven't tried using create-initial yet for the monitor, but will use it on my next install). Modified ceph.conf with:
- a private cluster network for each OSD
- osd pool default pg/pgp num
- osd pool default size / default min size
- osd min down reporters
AND, because it's not costing me anything (that I know of yet) and seems to be the first thing requested on problems:
- debug osd = 20
- debug ms = 1
(a sketch of the resulting ceph.conf follows at the end of this thread)

Started ceph-osd on all 3 OSD servers and restarted ceph-mon (service ceph restart) on the monitor. As experienced and reported by Brian, my cluster came up in the HEALTH_OK state immediately, with all 192 PGs in the default pools 'active+clean'. It took a week or 2 longer than I would have liked, but I am now quite comfortable with install/reinstall and how to inspect all components of the system state. XFS is mounted on each OSD data device; using 'ceph-disk list' I get the partition # for the journal on the SSD, which I can then check/dump with sgdisk and observe the 'ceph journal' partition name.

[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_OK
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e8: 3 osds: 3 up, 3 in
      pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
            10106 MB used, 1148 GB / 1158 GB avail
                 192 active+clean
[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   up      1
[root@essperf3 Ceph]#

I'm now moving on to creating RBD image(s) and looking at 'rbd bench-write'. I have some quick questions:
- Are there any other benchmarks in wide use for Ceph clusters?
- Our next lab deployment is going to be more real-world and involve many HDDs (~24) per OSD chassis (2 or 3 chassis). What is the general recommendation on the number of HDDs per OSD? 1 drive per OSD, where the drive can be an LVM or MD virtual drive spanning multiple HDDs (SW RAID 0)?
- Partitioning of the journal SSDs for multiple OSDs: we can use 1 SSD per OSD for the journal and have 4-HDD RAID 0 devices (~13 TB per OSD), or smaller OSDs and multiple journals on each SSD. What is the recommended configuration? (This will most likely be further investigated as we move forward with benchmarking, but I would like the RH/Ceph recommended best practices.)
- As long as I maintain 1 GB RAM per 1 TB of rotational storage, can we have many OSDs per physical chassis? Limits?

Thank you very much for all of your help.
Bruce

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 12:25 PM
To: Bruce McFarland
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Mon, 4 Aug 2014, Bruce McFarland wrote:
Is there a header or first line that appears in all ceph-mon stack dumps I can search for?
The couple of ceph-mon stack dumps I've seen in web searches appear to all begin with "ceph version 0.xx", but those are from over a year ago. Is that still the case with 0.81 firefly code?

Yep! Here's a recentish dump: http://tracker.ceph.com/issues/8880
sage

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 10:09 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Firefly OSDs stuck in creating state forever

Okay, looks like the mon went down then. Was there a stack trace in the log after the daemon crashed? (Or did the daemon stay up but go unresponsive or something?)
Thanks!
sage

On Mon, 4 Aug 2014, Bruce McFarland wrote:
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0
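The ceph.conf changes Bruce describes above would look something like the sketch below (section placement follows the firefly-era docs; the subnets are the example networks from this thread, and all values are illustrative rather than recommendations):

    [global]
    public network = 209.243.160.0/24
    cluster network = 10.10.50.0/24
    osd pool default size = 2
    osd pool default min size = 1
    osd pool default pg num = 64
    osd pool default pgp num = 64

    [mon]
    mon osd min down reporters = 2

    [osd]
    debug osd = 20
    debug ms = 1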
[ceph-users] OSD daemon code in /var/lib/ceph/osd/ceph-2/ dissapears after creating pool/rbd -
This is going to sound odd, and if I hadn't been issuing all commands on the monitor I would swear I had issued 'rm -rf' from the shell of the OSD in the /var/lib/ceph/osd/ceph-2/ directory. After creating the pool/rbd and getting an error from 'rbd info' I saw an OSD down/out, so I went to its shell, and the contents of the OSD's data directory are gone. I'll assume I erased it, but how do I recover this cluster without doing a purge/purgedata reinstall?

I brought up a new cluster. All PGs are 'active+clean' and all 3 OSDs are UP/IN.

[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_OK
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e8: 3 osds: 3 up, 3 in
      pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
            10106 MB used, 1148 GB / 1158 GB avail
                 192 active+clean
[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   up      1
[root@essperf3 Ceph]#

Next I created a test pool and a 1 GB rbd and listed it:

[root@essperf3 Ceph]# ceph osd pool create testpool 75 75
pool 'testpool' created
[root@essperf3 Ceph]# ceph osd lspools
0 data,1 metadata,2 rbd,3 testpool,
[root@essperf3 Ceph]# rbd create testimage --size 1024 --pool testpool
[root@essperf3 Ceph]# rbd ls testpool
testimage
[root@essperf3 Ceph]#

When I look at the 'info' output I start seeing problems.

[root@essperf3 Ceph]# rbd --image testimage info
rbd: error opening image testimage: (2) No such file or directory
2014-08-04 18:39:33.602263 7fc4b9e80760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
[root@essperf3 Ceph]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    693G     683G      10073M       1.42
POOLS:
    NAME         ID     USED     %USED     OBJECTS
    data         0      0        0         0
    metadata     1      0        0         0
    rbd          2      0        0         0
    testpool     3      137      0         2
[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_WARN 267 pgs degraded; 100 pgs stuck unclean; recovery 2/6 objects degraded (33.333%)
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e21: 3 osds: 2 up, 2 in
      pgmap v48: 267 pgs, 4 pools, 137 bytes data, 2 objects
            10073 MB used, 683 GB / 693 GB avail
            2/6 objects degraded (33.333%)
                 267 active+degraded
  client io 17 B/s rd, 0 op/s
[root@essperf3 Ceph]#

Check to see which OSD is down:

[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   down    0
[root@essperf3 Ceph]#

Then I go to the shell on ess59 and restart the OSD. (This is where it gets rather odd.) My ceph.conf has debug osd = 20 and debug ms = 1, so I expect to see output from '/etc/init.d/ceph restart osd', but I see nothing. With a little digging I see that the /var/lib/ceph/osd/ceph-2/ directory is EMPTY. There is no ceph-osd data there at all. It's almost like I did a 'rm -rf' on that directory from the shell of ess59/osd.2, yet all commands have been executed on the monitor.
[root@ess59 ceph]# ip addr | grep .59
    inet 10.10.40.59/24 brd 10.10.40.255 scope global em1
    inet6 fe80::92b1:1cff:fe18:659f/64 scope link
    inet 209.243.160.59/24 brd 209.243.160.255 scope global em2
    inet 10.10.50.59/24 brd 10.10.50.255 scope global p6p2
[root@ess59 ceph]# ll /var/lib/ceph/osd/
total 4
drwxr-xr-x 2 root root 4096 Aug  4 14:46 ceph-2
[root@ess59 ceph]# ll /var/lib/ceph/
total 24
drwxr-xr-x 2 root root 4096 Jul 29 18:36 bootstrap-mds
drwxr-xr-x 2 root root 4096 Aug  4 14:23 bootstrap-osd
drwxr-xr-x 2 root root 4096 Jul 29 18:36 mds
drwxr-xr-x 2 root root 4096 Jul 29 18:36 mon
drwxr-xr-x 3 root root 4096 Aug  4 14:46 osd
drwxr-xr-x 2 root root 4096 Aug  4 18:14 tmp
[root@ess59 ceph]# ll /var/lib/ceph/osd/ceph-2/
total 0
[root@ess59 ceph]#

Looking at the monitor logs I see osd.2 boot and even see where osd.2 leaves the cluster,
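Two notes for readers of the archive. First, the 'rbd info' failure above may simply be a missing --pool argument: rbd defaults to the 'rbd' pool, so an image created in testpool won't be found without it, e.g.:

    rbd --pool testpool --image testimage info

Second, recovering a single lost OSD shouldn't require a full cluster purge. A rough sketch using the standard remove/re-add procedure (ids are from this thread; the data/journal devices are placeholders to fill in for your host):

    ceph osd out 2
    ceph osd crush remove osd.2
    ceph auth del osd.2
    ceph osd rm 2
    # then re-create the OSD from the admin node
    ceph-deploy osd create ess59:/dev/sdX:/dev/md0pY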
Re: [ceph-users] Firefly OSDs stuck in creating state forever
MDS: I assumed that I'd need to bring up a ceph-mds for my cluster at initial bringup. We also intended to modify the CRUSH map so that its pool is resident on SSD(s). It is one of the areas of the online docs where there doesn't seem to be a lot of info, and I haven't spent a lot of time researching it. I'll stop it.

OSD connectivity: The connectivity is good for both 1GE and 10GE. I thought moving to 10GE with nothing else on that net might help with group placement etc. and bring up the PGs quicker. I've checked 'tcpdump' output on all boxes.

Firewall: Thanks for that one - it's the kind of basic thing I overlooked in my ceph learning curve. One of the OSDs had selinux=enforcing; all others were disabled. After changing that box, the 10 PGs in my demo-pool (I kept the PG count very small for sanity) are now 'active+clean'. The PGs for the default pools - data, metadata, rbd - are still stuck in creating+peering or creating+incomplete. I did have to manually set 'osd pool default min size = 1' from its default of 2 for these 3 pools to eliminate a bunch of warnings in the 'ceph health detail' output. I'm adding the [mon] setting you suggested below, stopping ceph-mds, and bringing everything up now.

[root@essperf3 Ceph]# ceph -s
    cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
     health HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192 pgs stuck unclean; 28 requests are blocked 32 sec; nodown,noscrub flag(s) set
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
     osdmap e752: 3 osds: 3 up, 3 in
            flags nodown,noscrub
      pgmap v1483: 202 pgs, 4 pools, 0 bytes data, 0 objects
            134 MB used, 1158 GB / 1158 GB avail
                  96 creating+peering
                  10 active+clean
                  96 creating+incomplete
[root@essperf3 Ceph]#

From: Brian Rak [mailto:b...@gameservers.com]
Sent: Friday, August 01, 2014 2:54 PM
To: Bruce McFarland; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

Why do you have an MDS active? I'd suggest getting rid of that, at least until you have everything else working.

I see you've set nodown on the OSDs; did you have problems with the OSDs flapping? Do the OSDs have broken connectivity between themselves? Do you have some kind of firewall interfering here? I've seen odd issues when the OSDs have broken private networking; you'll get one OSD marking all the other ones down. Adding this to my config helped:

[mon]
mon osd min down reporters = 2

On 8/1/2014 5:41 PM, Bruce McFarland wrote:
Hello,
I've run out of ideas and assume I've overlooked something very basic. I've created 2 ceph clusters in the last 2 weeks with different OSD HW and private network fabrics - 1GE and 10GE. I have never been able to get the OSDs to come up to the 'active+clean' state. I have followed your online documentation, and at this point the only thing I don't think I've done is modify the CRUSH map (although I have been looking into that). These are new clusters with no data and only 1 HDD and 1 SSD per OSD (24 2.5 GHz cores with 64 GB RAM). Since the disks are being recycled, is there something I need to flag to let ceph just create its mappings but not scrub for data compatibility? I've tried setting the noscrub flag to no effect. I also have constant OSD flapping. I've set nodown, but assume that is just masking a problem that is still occurring. Besides never reaching the 'active+clean' state, ceph-mon always crashes after being left running overnight.
The OSDs all eventually fill /root with ceph logs, so I regularly have to bring everything down, delete logs, and restart. I have all sorts of output from ceph.conf: OSD boot output with 'debug osd = 20' and 'debug ms = 1', 'ceph -w' output, and pretty much all of the debug/monitoring suggestions from the online docs and 2 weeks of google searches through blogs, mailing lists, etc.

[root@essperf3 Ceph]# ceph -v
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
[root@essperf3 Ceph]# ceph -s
    cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
     health HEALTH_WARN 96 pgs incomplete; 106 pgs peering; 202 pgs stuck inactive; 202 pgs stuck unclean; nodown,noscrub flag(s) set
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
     osdmap e752: 3 osds: 3 up, 3 in
            flags nodown,noscrub
      pgmap v1476: 202 pgs, 4 pools, 0 bytes data, 0 objects
            134 MB used, 1158 GB / 1158 GB avail
                 106 creating+peering
                  96 creating+incomplete
[root@essperf3 Ceph]#

Suggestions?
Thanks,
Bruce
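Since selinux and iptables caused most of the grief in this thread, here is a quick checklist for CentOS 6-era mon/OSD hosts (standard commands, but verify against your distro):

    # selinux: check and disable on every node
    getenforce          # should print Permissive or Disabled
    setenforce 0        # immediate but not persistent
    # for persistence, set SELINUX=disabled in /etc/selinux/config and reboot

    # iptables: check and stop
    service iptables status
    service iptables stop
    chkconfig iptables off

    # if you keep a firewall, open the mon port (6789/tcp) and the
    # OSD port range (6800-7300/tcp) between all cluster hosts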