Re: [ceph-users] Rbd map command doesn't work
EP, Try setting the crush map to use legacy tunables. I've had the same issue with the "feature mismatch" errors when using krbd that didn't support format 2 and running jewel 10.2.2 on the storage nodes. From the command line: ceph osd crush tunables legacy Bruce > On Aug 16, 2016, at 4:21 PM, Somnath Roy wrote: > > This is the usual feature mismatch stuff; the inbox krbd you are using is not > supporting Jewel. > Try googling with the error and I am sure you will get a lot of prior > discussion around that. > > From: EP Komarla [mailto:ep.koma...@flextronics.com] > Sent: Tuesday, August 16, 2016 4:15 PM > To: Somnath Roy; ceph-users@lists.ceph.com > Subject: RE: Rbd map command doesn't work > > Somnath, > > Thanks. > > I am trying your suggestion. See the commands below. Still it doesn’t seem > to go. > > I am missing something here… > > Thanks, > > - epk > > = > [test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G --image-format 1 > rbd: image format 1 is deprecated > [test@ep-c2-client-01 ~]$ rbd map rbd/test1 > rbd: sysfs write failed > In some cases useful info is found in syslog - try "dmesg | tail" or so. > rbd: map failed: (13) Permission denied > [test@ep-c2-client-01 ~]$ sudo rbd map rbd/test1 > ^C[test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ > [test@ep-c2-client-01 ~]$ dmesg|tail -20 > [1201954.248195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201954.253365] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1201964.274082] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201964.281195] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1201974.298195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1201974.305300] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204128.917562] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204128.924173] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204138.956737] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204138.964011] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204148.980701] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204148.987892] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204159.004939] libceph: mon2 172.20.60.53:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204159.012136] libceph: mon2 172.20.60.53:6789 missing required protocol > features > [1204169.028802] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204169.035992] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204476.803192] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > [1204476.810578] libceph: mon0 172.20.60.51:6789 missing required protocol > features > [1204486.821279] libceph: mon0 172.20.60.51:6789 feature set mismatch, my > 102b84a842a42 < server's 40102b84a842a42, missing 400 > > > > From: Somnath Roy [mailto:somnath@sandisk.com] > Sent: 
Tuesday, August 16, 2016 3:59 PM > To: EP Komarla ; ceph-users@lists.ceph.com > Subject: RE: Rbd map command doesn't work > > The default format of rbd image in jewel is 2 along with a bunch of other > features enabled, so you have the following two options: > > 1. Create a format 1 image with --image-format 1 > > 2. Or, set this in the ceph.conf file under [client] or [global] before creating the > image: > rbd_default_features = 3 > > Thanks & Regards > Somnath > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP > Komarla > Sent: Tuesday, August 16, 2016 2:52 PM > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Rbd map command doesn't work > > All, > > I am creating an image and mapping it. The below commands used to work in > Hammer, now the same is not working in Jewel. I see the message about some > feature set mismatch – what features are we talking about here? Is this a > known issue in Jewel with a workaround? > > Thanks, > > - epk > > = > > > [test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G > [test@ep-c2-client-01 ~]$ rbd info test1 > rbd image 'test1': > size 1024 MB in 256 objects > order 22
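A minimal sketch of the three workarounds discussed in this thread; the commands are standard ceph/rbd CLI of that era, and the pool/image names are just examples:

  # Option 1: relax CRUSH tunables so old kernel clients can connect
  ceph osd crush tunables legacy

  # Option 2: create the image with only krbd-friendly features
  rbd create rbd/test1 --size 1G --image-format 2 --image-feature layering

  # Option 3: make that the default for new images; in ceph.conf under
  # [client] or [global]:
  #   rbd_default_features = 3

Note that option 1 changes cluster-wide placement behavior and triggers data movement, so the per-image feature options are usually the safer first step.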
Re: [ceph-users] rbd readahead settings
You'll need to set it on the monitor too. Sent from my iPhone > On Aug 15, 2016, at 2:24 PM, EP Komarla wrote: > > Team, > > I am trying to configure the rbd readahead value. Before I increase this > value, I am trying to find out the current value that it is set to. How do I > know the values of these parameters? > > rbd readahead max bytes > rbd readahead trigger requests > rbd readahead disable after bytes > > Thanks, > > - epk > > EP KOMARLA, > > Email: ep.koma...@flextronics.com > Address: 677 Gibraltor Ct, Building #2, Milpitas, CA 94035, USA > Phone: 408-674-6090 (mobile) > > > Legal Disclaimer: > The information contained in this message may be privileged and confidential. > It is intended to be read only by the individual or entity to whom it is > addressed or by their designee. If the reader of this message is not the > intended recipient, you are on notice that any distribution of this message, > in any form, is strictly prohibited. If you have received this message in > error, please immediately notify the sender and delete or destroy any copy of > this message! > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
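A sketch of two ways to read the current values; the admin-socket path is an example and only exists if the client has admin_socket enabled in its ceph.conf:

  # Show the defaults as the local ceph build sees them:
  ceph --show-config | grep rbd_readahead

  # Or query a running client through its admin socket:
  ceph daemon /var/run/ceph/ceph-client.admin.asok config get rbd_readahead_max_bytes

The readahead options are librbd client-side settings, so the values that matter are the ones in effect on the machine doing the reads, typically under [client].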
[ceph-users] systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer'
I've been asked to look at the performance of RHEL 7.1/RHCS 1.3. I keep running into these errors on 1 of my RHEL 7.1 client systems. The rbd devices are still present, but ceph-rbdnamer is not in /usr/bin on RHEL 7.1, though it is in /usr/bin on trusty. Much like the rbdmap init script that ships with RHEL 7.1 but depends on functions from trusty's /lib/lsb/init-functions (create a user-defined systemd init function to map the images if rbd devices are required at boot time), is this another example of RHEL 7.1 being not quite ready for prime time as a Ceph client? Can I ignore these messages? Or should I just return to my trusty client of choice and advise that to others? I'm going to want to know if these ceph-rbdnamer error paths are adding overhead to my performance testing on RHEL 7.1 clients so I will most likely re-run everything with trusty clients to see for myself, but I'm curious what others have seen with RHEL/Centos/Fedora systemd Ceph clients. There are 3 10TB rbd's in the cluster and 3 clients. Thanks. 14:22:43.018 Message from slave hd2_client0-0: 14:22:43.018 New messages found on /var/adm/messages. Do they belong to you? 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd2': No such file or directory 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd1': No such file or directory 14:22:43.018 /var/log/messages: Aug 5 15:22:39 essperf8 systemd-udevd: failed to execute '/usr/bin/ceph-rbdnamer' '/usr/bin/ceph-rbdnamer rbd1': No such file or directory ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
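For anyone chasing this locally, a hedged sketch for finding where the call comes from; the rule-file locations below are the usual udev directories and may differ on RHCS:

  # Which package is supposed to provide the binary?
  yum provides '*/ceph-rbdnamer'

  # Which udev rule is invoking it?
  grep -r ceph-rbdnamer /usr/lib/udev/rules.d/ /etc/udev/rules.d/

If the rule exists but the binary does not, installing the package that owns ceph-rbdnamer (or correcting the path in the rule) should silence the messages; they only affect the /dev/rbd/<pool>/<image> naming symlinks, not the /dev/rbdN devices themselves.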
Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?
Yes, the rbd's are not remapped at system boot time. I haven't run into a VM or system hang because of this; I ran into it as part of investigating using RHEL 7.1 as a client distro. Yes, remapping the rbd's in a startup script worked around the issue. -Original Message- From: Steve Dainard [mailto:sdain...@spd1.com] Sent: Friday, July 17, 2015 1:59 PM To: Bruce McFarland Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings? Other than those errors, do you find RBD's will not be unmapped on system restart/shutdown on a machine using systemd? Leaving the system hanging without network connections trying to unmap RBD's? That's been my experience thus far, so I wrote an (overly simple) systemd file to handle this on a per RBD basis. On Tue, Jul 14, 2015 at 1:15 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: When starting the rbdmap.service to provide map/unmap of rbd devices across boot/shutdown cycles the /etc/init.d/rbdmap includes /lib/lsb/init-functions. This is not a problem except that the rbdmap script is making calls to the log_daemon_* log_progress_* log_action_* functions that are included in Ubuntu 14.04 distro's, but are not in the RHEL 7.1/RHCS 1.3 distro. Are there any recommended workarounds for boot time startup in RHEL/Centos 7.1 clients? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
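A minimal per-image unit along the lines Steve describes might look like the sketch below; this is untested, and the pool, image, user, and device names are placeholders:

  # /etc/systemd/system/rbd-test1.service
  [Unit]
  Description=Map rbd/test1
  After=network-online.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/bin/rbd map rbd/test1 --id admin
  # unmap by device node; the /dev/rbd/<pool>/<image> symlink only exists
  # if ceph-rbdnamer's udev rule is working
  ExecStop=/usr/bin/rbd unmap /dev/rbd0

  [Install]
  WantedBy=multi-user.target

Type=oneshot with RemainAfterExit=yes makes systemd treat the mapping as a long-lived state, so ExecStop runs the unmap at shutdown while the network is still up.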
[ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?
When starting the rbdmap.service to provide map/unmap of rbd devices across boot/shutdown cycles the /etc/init.d/rbdmap includes /lib/lsb/init-functions. This is not a problem except that the rbdmap script is making calls to the log_daemon_* log_progress_* log_action_* functions that are included in Ubuntu 14.04 distro's, but are not in the RHEL 7.1/RHCS 1.3 distro. Are there any recommended workarounds for boot time startup in RHEL/Centos 7.1 clients? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
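One crude workaround, assuming the rbdmap script only needs the logging helpers, is to provide stubs for the missing LSB functions on the RHEL client; an untested sketch that could be appended to /lib/lsb/init-functions (or a file the script sources first):

  log_daemon_msg() { echo -n "$1: $2"; }
  log_progress_msg() { echo -n " $*"; }
  log_action_msg() { echo "$*"; }
  log_end_msg() { if [ "$1" -eq 0 ]; then echo " ok"; else echo " failed"; fi; return "$1"; }

The function names come straight from the errors above; the bodies only have to print something sensible, since rbdmap uses them purely for console logging.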
[ceph-users] Performance test matrix?
Is there a classic ceph cluster test matrix?? I'm wondering what's done for releases, i.e. sector sizes 4k, 128k, 1M, 4M? Sequential, random, 80/20 mix? # of concurrent IOs? I've seen some spreadsheets in the past, but can't find them. Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Performance test matrix?
Mark, Thank you very much. We're focusing on block performance currently. All of my object based testing has been done with rados bench so I've yet to do anything through RGW, but will need to be doing that soon. I also want to revisit COSBench. I exercised it ~ a year ago and then decided to focus on blocks so I never really got familiar with it. Bruce -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson Sent: Wednesday, July 08, 2015 1:00 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Performance test matrix? Hi Bruce, There's a google doc that previously was public but when it got moved to RH's google drive from Inktank's it got made private instead. It doesn't appear that I can make it public now. You can see the configuration in the CBT yaml files though up on github: https://github.com/ceph/ceph-tools/tree/master/regression/burnupi-available As is, these tests were running over 24 hours so we had to cut them back when we were testing previously. Once we have new high performance nodes in the community lab I'm hoping we'll revise this and start getting good nightly tests in. One thing obviously missing is RGW tests. support for civetweb+rgw was added to CBT a couple of months ago and Intel added a module for running cosbench tests, but so far no one has had time to really beta test it. Docs are here: https://github.com/ceph/cbt/blob/master/docs/cosbench.README Mark On 07/08/2015 02:55 PM, Bruce McFarland wrote: Is there a classic ceph cluster test matrix?? I'm wondering what's done for releases ie sector sizes 4k,128k,1M,4M? sequential, random, 80/20 mix? # concurrent IOs? I've seen some spreadsheets in the past, but can't find them. Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
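Until the official matrix is public again, the sweep from the original question is easy to approximate with fio against a mapped rbd device; a sketch, with the device path, run time, and queue depth as examples:

  for bs in 4k 128k 1m 4m; do
    for rw in read write randread randwrite; do
      fio --name=rbd-$rw-$bs --filename=/dev/rbd0 --direct=1 \
          --ioengine=libaio --iodepth=16 --rw=$rw --bs=$bs \
          --runtime=60 --time_based --group_reporting
    done
  done
  # 80/20 read/write mix at 4k:
  fio --name=rbd-mix --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --iodepth=16 --rw=randrw --rwmixread=80 --bs=4k --runtime=60 --time_based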
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
Using the manual method of creating an OSD on RHEL 7.1 with Ceph 94.2 turns up an issue with the ondisk fsid of the journal device. From a quick web search I've found reference to this exact same issue from earlier this year. Is there a version of Ceph that works with RHEL 7.1??? [root@ceph0 ceph]# ceph-disk-prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs /dev/sdc /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully. partx: /dev/sdc: error adding partition 1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# mkdir /var/lib/ceph/osd/ceph-0 [root@ceph0 ceph]# ll /var/lib/ceph/osd/ total 0 drwxr-xr-x. 2 root root 6 Jun 29 12:01 ceph-0 [root@ceph0 ceph]# mount -t xfs /dev/sdc1 /var/lib/ceph/osd/ceph-0/ [root@ceph0 ceph]# mount proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=57648336k,nr_inodes=14412084,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/mapper/rhel_ceph0-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) debugfs on /sys/kernel/debug type debugfs (rw,relatime) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) /dev/mapper/rhel_ceph0-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sda2 on /boot type xfs 
(rw,relatime,seclabel,attr2,inode64,noquota) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime) fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime) /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,relatime,seclabel,attr2,inode64,noquota) [root@ceph0 ceph]# ceph-osd -i=0 --mkfs 2015-06-29 12:02:47.702808 7f2fb4625880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2015-06-29 12:02:47.702851 7f2fb4625880 -1 journal check: ondisk fsid ---- doesn't match expected 7e792d5e-a5c6-40cd-a361-0457875ea92c, invalid (someone else's?) journal 2015-06-29 12:02:47.702876 7f2fb4625880 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2015-06-29 12:02:47.702890 7f2fb4625880 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22 2015-06-29 12:02:47.702928 7f2fb4625880 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument [root@ceph0 ceph]# -Original Message- From: Bruce McFarland Sent: Monday, June 29, 2015 11:39 AM To: 'Loic Dachary
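The all-zeros ondisk fsid in the transcript above usually means the journal partition still carries a stale or never-initialized header. One hedged recovery sketch, destructive to the journal partition, so double-check the device name first:

  # Wipe the old journal header, then redo the OSD mkfs:
  dd if=/dev/zero of=/dev/sdb1 bs=1M count=10
  ceph-osd -i 0 --mkfs --mkjournal --mkkey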
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
It doesn't appear to be related to using wwn's for the drive id. The verbose output shows ceph converting from wwn to sd letter. I ran with verbose on and used sd letters for the data drive and the journal and get the same failures. I'm attempting to create OSD's manually now. [root@ceph0 ceph]# ceph-disk -v prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs --zap-disk /dev/sdc /dev/sdb1 DEBUG:ceph-disk:Zapping partition table on /dev/sdc INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/sdc Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! Main and backup partition tables differ! Use the 'c' and 'e' options on the recovery transformation menu to examine the two tables. Warning! One or more CRCs don't match. You should repair the disk! Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/sdc Creating new GPT entries. The operation has completed successfully. INFO:ceph-disk:calling partx on zapped device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -d /dev/sdc partx: specified range 1:0 does not make sense INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type DEBUG:ceph-disk:Journal is file /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data DEBUG:ceph-disk:Creating osd partition on /dev/sdc INFO:ceph-disk:Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d05612e-5cc0-422c-9228-4e53ee0f27ac --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdc The operation has completed successfully. 
INFO:ceph-disk:calling partx on created device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/sdc partx: /dev/sdc: error adding partition 1 INFO:ceph-disk:Running command: /usr/bin/udevadm settle DEBUG:ceph-disk:Creating xfs fs on /dev/sdc1 INFO:ceph-disk:Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdc1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 DEBUG:ceph-disk:Mounting /dev/sdc1 on /var/lib/ceph/tmp/mnt.DQ8nOj with options noatime,inode64 INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdc1 /var/lib/ceph/tmp/mnt.DQ8nOj DEBUG:ceph-disk:Preparing osd data dir /var/lib/ceph/tmp/mnt.DQ8nOj DEBUG:ceph-disk:Creating symlink /var/lib/ceph/tmp/mnt.DQ8nOj/journal - /dev/sdb1 DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.DQ8nOj INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.DQ8nOj INFO:ceph-disk:Running command: /usr/sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdc The operation has completed successfully. INFO:ceph-disk:calling partx on prepared device /dev/sdc INFO:ceph-disk:re-reading known partitions will display errors INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/sdc partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Saturday, June 27, 2015 1:08 AM To: Bruce McFarland; ceph
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2
Do these issues occur in Centos 7 also? -Original Message- From: Bruce McFarland Sent: Monday, June 29, 2015 12:06 PM To: 'Loic Dachary'; 'ceph-users@lists.ceph.com' Subject: RE: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD with ver 0.94.2 Using the manual method of creating an OSD on RHEL 7.1 with Ceph 94.2 turns up an issue with the ondisk fsid of the journal device. From a quick web search I've found reference to this exact same issue from earlier this year. Is there a version of Ceph that works with RHEL 7.1??? [root@ceph0 ceph]# ceph-disk-prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs /dev/sdc /dev/sdb1 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully. partx: /dev/sdc: error adding partition 1 meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/sdc: error adding partition 1 [root@ceph0 ceph]# mkdir /var/lib/ceph/osd/ceph-0 [root@ceph0 ceph]# ll /var/lib/ceph/osd/ total 0 drwxr-xr-x. 2 root root 6 Jun 29 12:01 ceph-0 [root@ceph0 ceph]# mount -t xfs /dev/sdc1 /var/lib/ceph/osd/ceph-0/ [root@ceph0 ceph]# mount proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=57648336k,nr_inodes=14412084,mode=755) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/sys temd-cgroups-agent,name=systemd) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) configfs on /sys/kernel/config type configfs (rw,relatime) /dev/mapper/rhel_ceph0-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) debugfs on /sys/kernel/debug type debugfs (rw,relatime) mqueue on /dev/mqueue 
type mqueue (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) /dev/mapper/rhel_ceph0-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota) /dev/sda2 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime) fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime) /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,relatime,seclabel,attr2,inode64,noquota) [root@ceph0 ceph]# ceph-osd -i=0 --mkfs 2015-06-29 12:02:47.702808 7f2fb4625880 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2015-06-29 12:02:47.702851 7f2fb4625880 -1 journal check: ondisk fsid ---- doesn't match expected 7e792d5e-a5c6-40cd-a361-0457875ea92c, invalid (someone else's?) journal 2015-06-29 12:02:47.702876 7f2fb4625880 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2015-06-29 12:02:47.702890
Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD
Loic, Thank you very much for the partprobe workaround. I rebuilt the cluster using 94.2. I've created partitions on the journal SSDs with parted and then use ceph-disk prepare as below. I'm not seeing all of the disks with the tmp mounts when I check 'mount' but I also don't see any of the mount directory mount points at /var/lib/ceph/osd. I see the following output from prepare. When I attempt to 'activate' it errors out saying the devices don't exist. ceph-disk prepare --cluster ceph --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 --fs-type xfs --zap-disk /dev/disk/by-id/wwn-0x53959bd02f56 /dev/disk/by-id/wwn-0x500080d91010024b-part1 Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. Creating new GPT entries. The operation has completed successfully. partx: specified range 1:0 does not make sense WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data WARNING:ceph-disk:Journal /dev/disk/by-id/wwn-0x500080d91010024b-part1 was not prepared with ceph-disk. Symlinking directly. The operation has completed successfully. partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 meta-data=/dev/sdw1 isize=2048 agcount=4, agsize=244188597 blks = sectsz=512 attr=2, projid32bit=1 = crc=0 finobt=0 data = bsize=4096 blocks=976754385, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=0 log =internal log bsize=4096 blocks=476930, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 The operation has completed successfully. partx: /dev/disk/by-id/wwn-0x53959bd02f56: error adding partition 1 [root@ceph0 ceph]# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) [root@ceph0 ceph]# rpm -qa | grep ceph ceph-radosgw-0.94.2-0.el7.x86_64 libcephfs1-0.94.2-0.el7.x86_64 ceph-common-0.94.2-0.el7.x86_64 python-cephfs-0.94.2-0.el7.x86_64 ceph-0.94.2-0.el7.x86_64 [root@ceph0 ceph]# -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Friday, June 26, 2015 3:29 PM To: Bruce McFarland; ceph-users@lists.ceph.com Subject: Re: [ceph-users] RHEL 7.1 ceph-disk failures creating OSD Hi, Prior to firefly v0.80.8 ceph-disk zap did not call partprobe and that was causing the kind of problems you're experiencing. It was fixed by https://github.com/ceph/ceph/commit/e70a81464b906b9a304c29f474e6726762b63a7c and is described in more detail at http://tracker.ceph.com/issues/9665. Rebooting the machine ensures the partition table is up to date and that's what you probably want to do after that kind of failure. You can however avoid the failure by running: * ceph-disk zap * partprobe * ceph-disk prepare Cheers P.S. The partx: /dev/disk/by-id/wwn-0x53959ba80a4e: error adding partition 1 can be ignored, it does not actually matter. A message was added later to avoid confusion with a real error. On 26/06/2015 17:09, Bruce McFarland wrote: I have moved storage nodes to RHEL 7.1 and used the basic server install. I installed ceph-deploy and used the ceph.repo/epel.repo for installation of ceph 80.7. I have tried ceph-disk with issuing zap on the same command line as prepare and on a separate command line immediately before the ceph-disk prepare. 
I consistently run into the partition errors and am unable to create OSD's on RHEL 7.1. ceph-disk prepare --cluster ceph --cluster-uuid 373a09f7-2070-4d20-8504-c8653fb6db80 --fs-type xfs --zap-disk /dev/disk/by-id/wwn-0x53959ba80a4e /dev/disk/by-id/wwn-0x500080d9101001d6-part1 Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. The operation has completed successfully. WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data The operation has completed successfully
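Spelled out with the device names from this thread, the zap / partprobe / prepare sequence Loic suggests would be:

  ceph-disk zap /dev/sdc
  partprobe /dev/sdc
  ceph-disk prepare --cluster ceph \
      --cluster-uuid b2c2e866-ab61-4f80-b116-20fa2ea2ca94 \
      --fs-type xfs /dev/sdc /dev/sdb1

i.e. drop --zap-disk from the prepare call and run the zap and partprobe as separate steps, so the kernel re-reads the partition table in between.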
[ceph-users] Ceph Client OS - RHEL 7.1??
I've always used Ubuntu for my Ceph client OS and found out in the lab that Centos/RHEL 6.x doesn't have the kernel rbd support. I wanted to investigate using RHEL 7.1 for the client OS. Is there a kernel rbd module that installs with RHEL 7.1?? If not are there 7.1 rpm's or src tar balls available to (relatively) easily create a RHEL 7.1 Ceph client?? Thanks, Bruce ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
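A quick sanity check on any candidate client kernel, since these commands are generic:

  modinfo rbd                        # does the kernel ship a krbd module at all?
  modprobe rbd && lsmod | grep rbd   # does it load?

If modinfo finds nothing, that kernel simply was not built with krbd, and the module would have to come from elsewhere (a distro add-on channel or a newer mainline kernel).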
Re: [ceph-users] Installing calamari on centos 7
I followed the Calamari build instructions here: http://ceph.com/category/ceph-step-by-step/ I used an Ubuntu 14.04 system to build all of the Calamari client and server packages for Centos 6.5 and Ubuntu Trusty (14.04). Once the packages were built I also referenced the Calamari instructions here to make sure my storage nodes were set up: http://ceph.com/calamari/docs/development/building_packages.html My cluster uses Ubuntu 14.04 for the client(s) hosting the Calamari Master. All of the Ceph storage nodes and monitors are running Centos 6.5 and Ceph 0.80.8. The only issue I had bringing up Calamari was python related the first time I issued the Calamari initialize after the install. Python returned an ImportError: No module named _io. That was solved by copying the installed python2.7 into the calamari venv: cp /usr/bin/python2.7 /opt/calamari/venv/local/bin/python. After the initialize command was working I got a full Calamari monitor. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ignacio Bravo Sent: Tuesday, May 26, 2015 11:32 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Installing calamari on centos 7 Shailesh, I was trying to do the same, but came across so many compiling errors that I decided to deploy the Calamari Server on a Centos 6 machine. Even then I was not able to finalize the installation. See: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001543.html http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001638.html Now I feel less lonely in the deployment of Calamari since you are already in the same boat as myself. Please keep me updated on your progress. IB On 05/26/2015 11:30 AM, Desai, Shailesh wrote: All our ceph clusters are on centos 7 and I am trying to install calamari on one of the nodes. I am using instructions from http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html. They are written for centos 6. I tried using them but they did not work. Has anyone tried installing calamari on Centos 7? Thanks. Shailesh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- __ Ignacio Bravo CFO LTG Federal, Inc www.ltgfederal.com Office: (703) 951-7760 Mobile: (571) 224-6046 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
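For the ImportError mentioned above, the whole fix reduces to two commands, with paths exactly as the calamari packages lay them out:

  cp /usr/bin/python2.7 /opt/calamari/venv/local/bin/python
  calamari-ctl initialize

The venv ships its own python binary; replacing it with the system interpreter makes the missing built-in _io module available again.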
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
In my never ending saga of calamari with minions on a big endian architecture I've brought up another server from a clean install of Ubuntu 14.04. The calamari master is now essperf13. I was able to figure out how to rebuild salt-minion with zmq 3.0.5 which got salt working, so the 'salt \* ceph.get_heartbeats' from the master gets expected info. I have made a couple of attempts at rebuilding salt-minion with salt 0.17.5, but it kept rebuilding with the rc2 salt code. I'll revisit that exercise. root@essperf13:/etc/ceph# salt --versions Salt: 0.17.5 Python: 2.7.6 (default, Mar 22 2014, 22:59:56) Jinja2: 2.7.2 M2Crypto: 0.21.1 msgpack-python: 0.3.0 msgpack-pure: Not Installed pycrypto: 2.6.1 PyYAML: 3.10 PyZMQ: 14.0.1 ZMQ: 4.0.4 root@essperf13:/etc/ceph# root@KVDrive11:~# salt --versions Salt: 2015.2.0rc2 Python: 2.6.6 (r266:84292, Dec 29 2010, 00:55:07) Jinja2: 2.7.3 M2Crypto: 0.20.1 msgpack-python: 0.4.6 msgpack-pure: Not Installed pycrypto: 2.1.0 libnacl: Not Installed PyYAML: 3.09 ioflo: Not Installed PyZMQ: 14.5.0 RAET: Not Installed ZMQ: 4.0.5 Mako: Not Installed root@KVDrive11:~# -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Wednesday, May 13, 2015 3:52 PM To: Bruce McFarland Cc: Michael Kuriger; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] [ceph-users] Does anyone understand Calamari?? Wow, That must be a record. I didn’t realize that. It turns out that you’ll have the best experience if the versions of master and minion are in sync. We test and use 2014.1.5 and are still evaluating 2014.7.Z. Glad to hear things are working better. regards, Gregory On May 13, 2015, at 3:33 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Possibly my issue as well. The calamari master is salt 0.17.5 but the minions are running 2015.2.0rc2. I have to build the minions from source (big endian unsupported architecture). All of my salt issues seemed to get resolved when I got similar versions of ZMQ running on both master and minion. The calamari master is running on Ubuntu 14.04. From: Michael Kuriger [mailto:mk7...@yp.com] Sent: Wednesday, May 13, 2015 2:00 PM To: Bruce McFarland; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-users] Does anyone understand Calamari?? OK, I finally got mine working. For whatever reason, the latest version of salt was the issue for me. Leaving the latest version of salt on the calamari server is working, but had to downgrade the minions. Removed: salt.noarch 0:2014.7.5-1.el6 salt-minion.noarch 0:2014.7.5-1.el6 Installed: salt.noarch 0:2014.7.1-1.el6 salt-minion.noarch 0:2014.7.1-1.el6 This is on CentOS 6.6 -=Mike Kuriger Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 From: Bruce McFarland bruce.mcfarl...@taec.toshiba.com Date: Tuesday, May 12, 2015 at 4:34 PM To: ceph-calam...@lists.ceph.com ceph-calam...@lists.ceph.com, ceph-users ceph-us...@ceph.com, ceph-devel (ceph-de...@vger.kernel.org) ceph-de...@vger.kernel.org Subject: [ceph-users] Does anyone understand Calamari?? Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can’t get the new machine to recognize the cluster. 
The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc for the monitor and all of the osd’s. Can anyone point me to docs or code that might enlighten me to what I’m overlooking? Thanks. ___ ceph-calamari mailing list ceph-calam...@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Does anyone understand Calamari??
Possibly my issue as well. The calamari master is salt 0.17.5 but the minions are running 2015.2.0rc2. I have to build the minions from source (big endian unsupported architecture). All of my salt issues seemed to get resolved when I got similar versions of ZMQ running on both master and minion. The calamari master is running on Ubuntu 14.04. From: Michael Kuriger [mailto:mk7...@yp.com] Sent: Wednesday, May 13, 2015 2:00 PM To: Bruce McFarland; ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-users] Does anyone understand Calamari?? OK, I finally got mine working. For whatever reason, the latest version of salt was the issue for me. Leaving the latest version of salt on the calamari server is working, but had to downgrade the minions. Removed: salt.noarch 0:2014.7.5-1.el6 salt-minion.noarch 0:2014.7.5-1.el6 Installed: salt.noarch 0:2014.7.1-1.el6 salt-minion.noarch 0:2014.7.1-1.el6 This is on CentOS 6.6 -=Mike Kuriger Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 From: Bruce McFarland bruce.mcfarl...@taec.toshiba.com Date: Tuesday, May 12, 2015 at 4:34 PM To: ceph-calam...@lists.ceph.com, ceph-users ceph-us...@ceph.com, ceph-devel (ceph-de...@vger.kernel.org) Subject: [ceph-users] Does anyone understand Calamari?? Increasing the audience since ceph-calamari is not responsive. What salt event/info does the Calamari Master expect to see from the ceph-mon to determine there is a working cluster? I had to change servers hosting the calamari master and can't get the new machine to recognize the cluster. The 'salt \* ceph.get_heartbeats' returns monmap, fsid, ver, epoch, etc for the monitor and all of the osd's. Can anyone point me to docs or code that might enlighten me to what I'm overlooking? Thanks. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
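On RPM-based minions the downgrade Michael describes can be done with yum, assuming the older packages are still in the repo (versions from his message):

  yum downgrade salt-2014.7.1-1.el6 salt-minion-2014.7.1-1.el6
  service salt-minion restart
  salt --versions-report   # run on master and minion; the versions should now line up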
Re: [ceph-users] New Calamari server
I am having a similar issue. The cluster is up and salt is running and has accepted keys from all nodes, including the monitor. I can issue salt and salt/ceph.py commands from the Calamari including 'salt \* ceph.get_heartbeats' which returns from all nodes including the monitor with the monmap epoch etc. Calamari reports that it sees all of the Ceph servers, but not a Ceph cluster. Is there a salt event besides ceph.get_heartbeats that the Calamari master requires to recognize the cluster? -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- ow...@vger.kernel.org] On Behalf Of Michael Kuriger Sent: Tuesday, May 12, 2015 8:57 AM To: Alexandre DERUMIER Cc: ceph-users; ceph-devel Subject: Re: [ceph-users] New Calamari server In my case, I did remove all salt keys. The salt portion of my install is working. It’s just that the calamari server is not seeing the ceph cluster. Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 On 5/12/15, 1:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, when you removed salt from the nodes, did you remove the old master key /etc/salt/pki/minion/minion_master.pub ? I had the same behavior as you when reinstalling the calamari server with salt previously installed on the ceph nodes (with an explicit error about the key in /var/log/salt/minion on the ceph nodes) - Original Message - From: Michael Kuriger mk7...@yp.com To: ceph-users ceph-us...@ceph.com Cc: ceph-devel ceph-de...@vger.kernel.org Sent: Monday, May 11, 2015 23:43:34 Subject: [ceph-users] New Calamari server I had an issue with my calamari server, so I built a new one from scratch. I've been struggling trying to get the new server to start up and see my ceph cluster. I went so far as to remove salt and diamond from my ceph nodes and reinstalled again. On my calamari server, it sees the hosts connected but doesn't detect a cluster. What am I missing? I've set up many calamari servers on different ceph clusters, but this is the first time I've tried to build a new calamari server. Here's what I see on my calamari GUI: New Calamari Installation This appears to be the first time you have started Calamari and there are no clusters currently configured. 33 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster; please see the Inktank Ceph Enterprise documentation for more details. Thanks! Mike Kuriger ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
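When the calamari master is rebuilt, each minion keeps the old master's public key and silently refuses the new one; a sketch of the reset Alexandre describes:

  # On each ceph node:
  rm -f /etc/salt/pki/minion/minion_master.pub
  service salt-minion restart

  # On the new calamari master:
  salt-key -L                      # unaccepted minion keys should now show up
  salt-key -A                      # accept them
  salt '*' ceph.get_heartbeats     # calamari should see the cluster shortly after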
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
/var/log/salt/minion doesn't really look very interesting after that sequence. I issued salt octeon109 ceph.get_heartbeats from the master. The logs are much more interesting when I clear calamari and stop salt-minion. Looking at the endpoints from http://essperf2/api/v2/cluster doesn't show anything. It reports HTTP 200 OK and Vary: Accept but there is nothing in the body of the output, i.e. no update_time, id, or name is being reported. root@octeon109:/var/log/salt# tail -f /var/log/salt/minion 2015-05-13 01:31:19,066 [salt.crypt ][DEBUG ][4699] Failed to authenticate message 2015-05-13 01:31:19,068 [salt.minion ][DEBUG ][4699] Attempting to authenticate with the Salt Master at 209.243.160.35 2015-05-13 01:31:19,069 [salt.crypt ][DEBUG ][4699] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506') 2015-05-13 01:31:19,294 [salt.crypt ][DEBUG ][4699] Decrypting the current master AES key 2015-05-13 01:31:19,296 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem 2015-05-13 01:31:20,026 [salt.crypt ][DEBUG ][4699] Loaded minion key: /etc/salt/pki/minion/minion.pem 2015-05-13 01:33:04,027 [salt.minion ][INFO ][4699] User root Executing command ceph.get_heartbeats with jid 20150512183304482562 2015-05-13 01:33:04,028 [salt.minion ][DEBUG ][4699] Command details {'tgt_type': 'glob', 'jid': '20150512183304482562', 'tgt': 'octeon109', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'ceph.get_heartbeats'} 2015-05-13 01:33:04,043 [salt.minion ][INFO ][5912] Starting a new job with PID 5912 2015-05-13 01:33:04,053 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded ceph.get_heartbeats 2015-05-13 01:33:04,209 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg.version 2015-05-13 01:33:04,212 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded pkg_resource.version 2015-05-13 01:33:04,217 [salt.utils.lazy ][DEBUG ][5912] LazyLoaded cmd.run_stdout 2015-05-13 01:33:04,219 [salt.loaded.int.module.cmdmod][INFO ][5912] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}\n', '-W'] in directory '/root' 2015-05-13 01:33:05,432 [salt.minion ][INFO ][5912] Returning information for job: 20150512183304482562 2015-05-13 01:33:05,434 [salt.crypt ][DEBUG ][5912] Re-using SAuth for ('/etc/salt/pki/minion', 'octeon109', 'tcp://209.243.160.35:4506') -Original Message- From: Bruce McFarland Sent: Tuesday, May 12, 2015 6:11 PM To: 'Gregory Meno' Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: RE: [ceph-calamari] Does anyone understand Calamari?? Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master? I'm running on Ubuntu 14.04 and don't have an httpd service. I had been starting/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu. I've performed the calamari-ctl clear/init sequence more than twice with also stopping/starting apache2 and Cthulhu. -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Tuesday, May 12, 2015 5:58 PM To: Bruce McFarland Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] Does anyone understand Calamari?? All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. 
If you don’t we could try cleaning out calamari’s state. sudo supervisorctl shutdown sudo service httpd stop sudo calamari-ctl clear --yes-i-am-sure sudo calamari-ctl initialize then sudo service supervisord start sudo service httpd start see what the API and UI says then. regards, Gregory On May 12, 2015, at 5:18 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Master was ess68 and now it's essperf3. On all cluster nodes the following files now have 'master: essperf3' /etc/salt/minion /etc/salt/minion/calamari.conf /etc/diamond/diamond.conf The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure the output of 'salt octeon109
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
Which logs? I'm assuming /var/log/salt/minion since the rest on the minions are relatively empty. Possibly Cthulhu from the master? I'm running on Ubuntu 14.04 and don't have an httpd service. I had been starting/stopping apache2. Likewise there is no supervisord service and I've been using supervisorctl to start/stop Cthulhu. I've performed the calamari-ctl clear/init sequence more than twice with also stopping/starting apache2 and Cthulhu. -Original Message- From: Gregory Meno [mailto:gm...@redhat.com] Sent: Tuesday, May 12, 2015 5:58 PM To: Bruce McFarland Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel (ceph-de...@vger.kernel.org) Subject: Re: [ceph-calamari] Does anyone understand Calamari?? All that looks fine. There must be some state where the cluster is known to calamari and it is failing to actually show it. If you have time to debug I would love to see the logs at debug level. If you don’t we could try cleaning out calamari’s state. sudo supervisorctl shutdown sudo service httpd stop sudo calamari-ctl clear --yes-i-am-sure sudo calamari-ctl initialize then sudo service supervisord start sudo service httpd start see what the API and UI says then. regards, Gregory On May 12, 2015, at 5:18 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: Master was ess68 and now it's essperf3. On all cluster nodes the following files now have 'master: essperf3' /etc/salt/minion /etc/salt/minion/calamari.conf /etc/diamond/diamond.conf The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* test.ping' from essperf3 Calamari Master to the cluster. I've also included a quick cluster sanity test with the output of ceph -s and ceph osd tree. And for your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats' since I suspect there might be a missing field in the monitor response. root@essperf3:/etc/ceph# salt \* test.ping octeon108: True octeon114: True octeon111: True octeon101: True octeon106: True octeon109: True octeon118: True root@essperf3:/etc/ceph# ceph osd tree # id weight type name up/down reweight -1 7 root default -4 1 host octeon108 0 1 osd.0 up 1 -2 1 host octeon111 1 1 osd.1 up 1 -5 1 host octeon115 2 1 osd.2 DNE -6 1 host octeon118 3 1 osd.3 up 1 -7 1 host octeon114 4 1 osd.4 up 1 -8 1 host octeon106 5 1 osd.5 up 1 -9 1 host octeon101 6 1 osd.6 up 1 root@essperf3:/etc/ceph# ceph -s cluster 868bfacc-e492-11e4-89fa-000fb70c health HEALTH_OK monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, quorum 0 octeon109 osdmap e80: 6 osds: 6 up, 6 in pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects 60604 MB used, 2734 GB / 2793 GB avail 728 active+clean root@essperf3:/etc/ceph# root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats octeon109: -- - boot_time: 1430784431 - ceph_version: 0.80.8-0.el6 - services: -- ceph-mon.octeon109: -- cluster: ceph fsid: 868bfacc-e492-11e4-89fa-000fb70c id: octeon109 status: -- election_epoch: 1 extra_probe_peers: monmap: -- created: 2015-04-16 23:50:52.412686 epoch: 1 fsid: 868bfacc-e492-11e4-89fa-000fb70c modified: 2015-04-16 23:50:52.412686 mons: -- - addr: 209.243.160.70:6789/0 - name: octeon109 - rank: 0 name: octeon109 outside_quorum: quorum: - 0 rank: 0 state: leader sync_provider: type: mon version: 0.86 -- - 868bfacc-e492-11e4-89fa-000fb70c: -- fsid: 868bfacc-e492-11e4-89fa-000fb70c
[ceph-users] accepter.accepter.bind unable to bind to IP on any port in range 6800-7300:
I've run into an issue starting OSD's where I'm running out of ports. I've increased the port range with ms bind port max and on the next attempt to start the osd it reports no ports in the new range. I am only running 1 osd on the node and rarely restart the osd. I've increased the debug level to 20 and the only additional information in the log file is the PID for the process that can't get a port. IPtables is not loaded. This has just recently started occurring on multiple osd's and might possibly be related to my issues with salt and debugging of the calamari master not recognizing ceph-mon even though 'salt \* ceph.get_heartbeats' returns info for all nodes, monmap etc. 2015-05-08 10:52:17.861855 773b7000 0 ceph version 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82), process ceph-osd, pid 4629 2015-05-08 10:52:17.864413 773b7000 -1 accepter.accepter.bind unable to bind to 192.168.2.102:7370 on any port in range 6800-7370: (126) Cannot assign requested address ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
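For reference, the range is controlled by two ceph.conf options, and it is worth checking what is actually holding ports on the node; a sketch, with the max value as an example:

  # ceph.conf, in [osd] (or [global]):
  #   ms bind port min = 6800
  #   ms bind port max = 7568

  # What is bound in that range right now?
  netstat -tlnp | grep ceph

One observation on the log itself: "Cannot assign requested address" (EADDRNOTAVAIL) normally means the bind IP rather than the port is the problem, so the cluster/public address configuration for 192.168.2.102 on that node is worth a second look before widening the port range further.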
Re: [ceph-users] Binding a pool to certain OSDs
You won't get a PG warning message from ceph -s unless you have fewer than 20 PGs per OSD in your cluster. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bruce McFarland Sent: Tuesday, April 14, 2015 10:00 AM To: Giuseppe Civitella; Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs I use this to quickly check pool stats: [root@ceph-mon01 ceph]# ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 pool 6 'rcvtst' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 400 pgp_num 400 last_change 10879 flags hashpspool stripe_width 0 [root@ceph-mon01 ceph]# Or to individually query a pool: ceph osd pool get rbd pg_num ceph osd pool get rbd pgp_num From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Giuseppe Civitella Sent: Tuesday, April 14, 2015 9:53 AM To: Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe 2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check on Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it. This is the final version: http://paste.openstack.org/show/203888/ When I applied the new CRUSH map, after some rebalancing, I get this health status: [- avalon1 root@controller001 Ceph -] # ceph -s cluster af09420b-4032-415e-93fc-6b60e9db064e health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002 monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003 osdmap e3092: 4 osds: 4 up, 4 in pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects 8842 MB used, 271 GB / 279 GB avail 576 active+clean and this osd tree: [- avalon1 root@controller001 Ceph -] # ceph osd tree # id weight type name up/down reweight -8 2 root sed -5 1 host ceph001-sed 2 1 osd.2 up 1 -7 1 host ceph002-sed 3 1 osd.3 up 1 -1 2 root default -4 1 host ceph001-sata 0 1 osd.0 up 1 -6 1 host ceph002-sata 1 1 osd.1 up 1 which seems not a bad situation. 
The problem arises when I try to create a new pool: the command ceph osd pool create sed 128 128 gets stuck. It never ends. And I noticed that my Cinder installation is not able to create volumes anymore. I've been looking in the logs for errors and found nothing. Any hint about how to proceed to restore my ceph cluster? Is there something wrong with the steps I take to update the CRUSH map? Is the problem related to Emperor? Regards, Giuseppe 2015-04-13 18:26 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've got a Ceph cluster which serves volumes to a Cinder installation. It runs Emperor. I'd like to be able to replace some of the disks with OPAL disks and create a new pool which uses exclusively the latter kind of disk. I'd like to have a traditional pool and a secure one coexisting on the same ceph host. I'd then use Cinder's multi-backend feature to serve them. My question is: how is it possible to realize such a setup? How can I bind a pool to certain OSDs? Thanks Giuseppe
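For the original question, the mechanism is exactly the one in the linked article: a CRUSH rule that starts from the root containing the chosen OSDs, plus a per-pool ruleset assignment. A sketch against the map above, assuming the custom rule for root sed got rule id 1:

  ceph osd pool create sed 128 128
  ceph osd pool set sed crush_ruleset 1
  ceph osd dump | grep "pool.*sed"   # confirm the pool now uses crush_ruleset 1

A pool-create that hangs is usually the monitors waiting on PG creation against a rule that maps to no OSDs, so checking the rule first with ceph osd crush rule dump is cheap insurance.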
Re: [ceph-users] Binding a pool to certain OSDs
I use this to quickly check pool stats: [root@ceph-mon01 ceph]# ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 pool 6 'rcvtst' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 400 pgp_num 400 last_change 10879 flags hashpspool stripe_width 0 [root@ceph-mon01 ceph]# Or to individually query a pool: ceph osd pool get rbd pg_num ceph osd pool get rbd pgp_num From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Giuseppe Civitella Sent: Tuesday, April 14, 2015 9:53 AM To: Saverio Proto Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Binding a pool to certain OSDs Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe 2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check on Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it. This is the final version: http://paste.openstack.org/show/203888/ When I applied the new CRUSH map, after some rebalancing, I get this health status: [- avalon1 root@controller001 Ceph -] # ceph -s cluster af09420b-4032-415e-93fc-6b60e9db064e health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002 monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003 osdmap e3092: 4 osds: 4 up, 4 in pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects 8842 MB used, 271 GB / 279 GB avail 576 active+clean and this osd tree: [- avalon1 root@controller001 Ceph -] # ceph osd tree # id weight type name up/down reweight -8 2 root sed -5 1 host ceph001-sed 2 1 osd.2 up 1 -7 1 host ceph002-sed 3 1 osd.3 up 1 -1 2 root default -4 1 host ceph001-sata 0 1 osd.0 up 1 -6 1 host ceph002-sata 1 1 osd.1 up 1 which seems not a bad situation. The problem arises when I try to create a new pool: the command ceph osd pool create sed 128 128 gets stuck. It never ends. And I noticed that my Cinder installation is not able to create volumes anymore. I've been looking in the logs for errors and found nothing. Any hint about how to proceed to restore my ceph cluster? Is there something wrong with the steps I take to update the CRUSH map? Is the problem related to Emperor? 
Regards,
Giuseppe

2015-04-13 18:26 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com:
Hi all,
I've got a Ceph cluster which serves volumes to a Cinder installation. It runs Emperor. I'd like to replace some of the disks with OPAL disks and create a new pool which uses exclusively the latter kind of disk, so that a traditional pool and a secure one coexist on the same ceph host. I'd then use Cinder's multi-backend feature to serve them. My question is: how is it possible to realize such a setup? How can I bind a pool to certain OSDs?
Thanks
Giuseppe
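For readers landing on this thread later: the mechanism in Sebastien's post boils down to one CRUSH rule per disk class and pointing each pool at its rule. A minimal sketch using the 'sed' root from the osd tree above (the rule name is illustrative, the rule id must be read back from the dump, and 'crush_ruleset' is the pre-Luminous name of the pool setting):

ceph osd crush rule create-simple sed_rule sed host   # replicate across hosts under root 'sed'
ceph osd crush rule dump                              # note the new rule's rule_id (say, 1)
ceph osd pool create sed 128 128
ceph osd pool set sed crush_ruleset 1                 # bind the pool to the rule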
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
Loic,
You're not mistaken; the pages are listed under the Installation (Manual) link: http://ceph.com/docs/master/install/
You'll see the first link is the Get Packages link, which takes you to: http://ceph.com/docs/master/install/get-packages/
This page contains the details on setting up your system to use APT (Ubuntu) or RPM (CentOS) and the code for the ceph.repo file. There are also package dependency lists, trusted keys, etc.
Bruce

-----Original Message-----
From: Loic Dachary [mailto:l...@dachary.org]
Sent: Tuesday, April 07, 2015 1:32 AM
To: Bruce McFarland; ceph-users
Subject: Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi Bruce,

On 07/04/2015 02:40, Bruce McFarland wrote:
> I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com which I've included below.
> Bruce

That's what I also ended up doing. But unless I'm mistaken, adding /etc/yum.repos.d/ceph.repo from ceph.com for ceph packages is not in the steps listed at http://ceph.com/docs/master/start/, starting from http://ceph.com/docs/master/start/quick-start-preflight/ and proceeding to http://ceph.com/docs/master/start/quick-ceph-deploy/.

Cheers

[root@essperf13 ceph-mon01]# ceph -v
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
[root@essperf13 ceph-mon01]# lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:        6.5
Codename:       Final
[root@essperf13 ceph-mon01]#

I'm using the ceph.repo from ceph.com:

[root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/$basearch
priority=1
gpgcheck=1
type=rpm-md

[ceph-source]
name=Ceph source packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
priority=1
gpgcheck=1
type=rpm-md

[Ceph-noarch]
name=Ceph noarch packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/noarch
priority=1
gpgcheck=1
type=rpm-md
[root@essperf13 ceph-mon01]#

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
Sent: Monday, April 06, 2015 5:33 PM
To: ceph-users
Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi,

I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
I'm not sure about CentOS 7.0, but Ceph is not part of the 6.5 distro.

Sent from my iPhone

On Apr 7, 2015, at 12:26 PM, Loic Dachary l...@dachary.org wrote:

On 07/04/2015 18:51, Bruce McFarland wrote:
> Loic,
> You're not mistaken; the pages are listed under the Installation (Manual) link: http://ceph.com/docs/master/install/
> You'll see the first link is the Get Packages link, which takes you to: http://ceph.com/docs/master/install/get-packages/
> This page contains the details on setting up your system to use APT (Ubuntu) or RPM (CentOS) and the code for the ceph.repo file. There are also package dependency lists, trusted keys, etc.

Thanks for checking. Maybe it is intended that the instructions for ceph-deploy only get packages from the distribution and not from the ceph.com repositories.

Cheers

> Bruce
>
> -----Original Message-----
> From: Loic Dachary [mailto:l...@dachary.org]
> Sent: Tuesday, April 07, 2015 1:32 AM
> To: Bruce McFarland; ceph-users
> Subject: Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
>
> Hi Bruce,
>
> On 07/04/2015 02:40, Bruce McFarland wrote:
> I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com which I've included below.
> Bruce
>
> That's what I also ended up doing. But unless I'm mistaken, adding /etc/yum.repos.d/ceph.repo from ceph.com for ceph packages is not in the steps listed at http://ceph.com/docs/master/start/, starting from http://ceph.com/docs/master/start/quick-start-preflight/ and proceeding to http://ceph.com/docs/master/start/quick-ceph-deploy/.
>
> Cheers
>
> [root@essperf13 ceph-mon01]# ceph -v
> ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
> [root@essperf13 ceph-mon01]# lsb_release -a
> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: CentOS
> Description:    CentOS release 6.5 (Final)
> Release:        6.5
> Codename:       Final
> [root@essperf13 ceph-mon01]#
>
> I'm using the ceph.repo from ceph.com:
> [root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
> [Ceph]
> name=Ceph packages for $basearch
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/$basearch
> priority=1
> gpgcheck=1
> type=rpm-md
>
> [ceph-source]
> name=Ceph source packages
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
> priority=1
> gpgcheck=1
> type=rpm-md
>
> [Ceph-noarch]
> name=Ceph noarch packages
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> enabled=1
> baseurl=http://ceph.com/rpm-firefly/el6/noarch
> priority=1
> gpgcheck=1
> type=rpm-md
> [root@essperf13 ceph-mon01]#
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
> Sent: Monday, April 06, 2015 5:33 PM
> To: ceph-users
> Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
>
> Hi,
>
> I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5
I'm not sure exactly what your steps were, but I reinstalled a monitor yesterday on CentOS 6.5 using ceph-deploy with the /etc/yum.repos.d/ceph.repo from ceph.com, which I've included below.
Bruce

[root@essperf13 ceph-mon01]# ceph -v
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
[root@essperf13 ceph-mon01]# lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:        6.5
Codename:       Final
[root@essperf13 ceph-mon01]#

I'm using the ceph.repo from ceph.com:

[root@essperf13 ceph-mon01]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/$basearch
priority=1
gpgcheck=1
type=rpm-md

[ceph-source]
name=Ceph source packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/SRPMS
priority=1
gpgcheck=1
type=rpm-md

[Ceph-noarch]
name=Ceph noarch packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/el6/noarch
priority=1
gpgcheck=1
type=rpm-md
[root@essperf13 ceph-mon01]#

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary
Sent: Monday, April 06, 2015 5:33 PM
To: ceph-users
Subject: [ceph-users] Installing firefly v0.80.9 on RHEL 6.5

Hi,

I tried to install firefly v0.80.9 on a freshly installed RHEL 6.5 by following http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster but it installed v0.80.5 instead. Is that really what we want by default? Or am I misreading the instructions somehow?

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
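One way to avoid the distro-version surprise Loic hit is to let ceph-deploy lay the ceph.com repo down itself; the install subcommand of that era accepted a release selector. A sketch (the hostname is the monitor from this thread; the exact point release still depends on what the repo currently serves):

# tell ceph-deploy to configure the ceph.com firefly repo and install from it
ceph-deploy install --release firefly essperf13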
Re: [ceph-users] Calamari Questions
Quentin,
I got the config page to come up by exiting Calamari, deleting the salt keys on the Calamari master ('salt-key -D'), then restarting Calamari on the master and accepting the salt keys there ('salt-key -A') after restarting the salt-minion and diamond services on the ceph nodes. Once the salt keys were reaccepted by the master, Calamari goes to the "accept cluster" screen when you click on any option. The root issue was possibly that the cluster's monitor (a lab cluster with only 1 mon) didn't have the salt-minion/diamond services running and hadn't broadcast a key to the Calamari master.
Thanks,
Bruce

From: Quentin Hartman [mailto:qhart...@direwolfdigital.com]
Sent: Wednesday, April 01, 2015 1:56 PM
To: Bruce McFarland
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Calamari Questions

You should have a config page in the Calamari UI where you can accept osd nodes into the cluster as Calamari sees it. If you skipped the little first-setup window like I did, it's kind of a pain to find.
QH

On Wed, Apr 1, 2015 at 12:34 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
I've built the Calamari client, server, and diamond packages from source for trusty and centos and installed them on the trusty Master. I installed the diamond and salt packages on the storage nodes. I can connect to the Calamari master and accept salt keys from the ceph nodes, but then Calamari reports "3 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster." The 3 Ceph nodes are part of an existing Ceph cluster with 90 OSDs. I also built and installed the minion package on the Calamari master under /opt/calamari/webapp/content/calamari-minions
Any ideas what I've overlooked in my Calamari bring-up?
Thanks,
Bruce
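Condensed, the reset sequence Bruce describes looks roughly like this (a sketch assuming sysvinit/upstart-style service names on the nodes):

# on the Calamari master
sudo salt-key -D                 # delete all accepted/pending minion keys
# on each ceph node
sudo service salt-minion restart
sudo service diamond restart
# back on the master
sudo salt-key -L                 # minions should now show as unaccepted
sudo salt-key -A                 # accept them all, then restart Calamari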
[ceph-users] Calamari Questions
I've built the Calamari client, server, and diamond packages from source for trusty and centos and installed them on the trusty Master. I installed the diamond and salt packages on the storage nodes. I can connect to the Calamari master and accept salt keys from the ceph nodes, but then Calamari reports "3 Ceph servers are connected to Calamari, but no Ceph cluster has been created yet. Please use ceph-deploy to create a cluster." The 3 Ceph nodes are part of an existing Ceph cluster with 90 OSDs. I also built and installed the minion package on the Calamari master under /opt/calamari/webapp/content/calamari-minions
Any ideas what I've overlooked in my Calamari bring-up?
Thanks,
Bruce
Re: [ceph-users] RBD caching on 4K reads???
I'm still missing something. I can check on the monitor to see that the running config on the cluster has rbd cache = false:

[root@essperf13 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.essperf13.asok config show | grep rbd
debug_rbd: 0/5,
rbd_cache: false,

Since rbd caching is a client setting, I've added the following to the rbd client's /etc/ceph/ceph.conf:

[global]
log file = /var/log/ceph/rbd.log
rbd cache = false
rbd readahead max bytes = 0   # should already be disabled if rbd cache = false, but I'm paranoid

[client]
admin socket = /var/run/ceph/rbd-$pid.asok

I never see an rbd-*.asok file in /var/run/ceph. I started the rbd driver on the client without the /var/run/ceph directory and then see:

2015-02-02 14:40:30.254509 7f81888257c0 -1 asok(0x7f8189182390) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-1716.asok': (2) No such file or directory

when I attempt to map the rbd image to the client device with rbd map. Once I create /var/run/ceph these messages don't occur, so it appears that the admin sockets are being created, but only for the duration of the command. I still see the effects of rbd caching if I run fio/vdbench with 4K random reads, but I have not been able to create a persistent rbd admin socket so that I can dump the running configuration and/or change it at run time.
Any ideas on what I've overlooked? Any pointers to documentation on the [client] section of ceph.conf, or on rbd admin sockets? Nothing at ceph.com/docs on either topic.
Thanks,
Bruce

-----Original Message-----
From: Mykola Golub [mailto:to.my.troc...@gmail.com]
Sent: Sunday, February 01, 2015 1:24 PM
To: Udo Lembke
Cc: Bruce McFarland; ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

On Fri, Jan 30, 2015 at 10:09:32PM +0100, Udo Lembke wrote:
> Hi Bruce,
> you can also look on the mon, like
> ceph --admin-daemon /var/run/ceph/ceph-mon.b.asok config show | grep cache

rbd cache is a client setting, so you have to check this by connecting to the client admin socket. Its location is defined in ceph.conf, [client] section, admin socket parameter.

--
Mykola Golub
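For anyone else chasing this, a sketch of a [client] stanza that gives every librbd client its own log and admin socket ($name and $pid are metavariables expanded by Ceph itself; the grep target below is illustrative). Note that, as the kernel-rbd replies elsewhere in this thread point out, krbd never instantiates librbd, so no client socket will persist for kernel mappings:

[client]
log file = /var/log/ceph/$name.$pid.log
admin socket = /var/run/ceph/$name.$pid.asok

# then, while a librbd client (e.g. qemu) is running:
ceph --admin-daemon /var/run/ceph/client.admin.<pid>.asok config show | grep rbd_cache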
Re: [ceph-users] RBD caching on 4K reads???
I'm using Ubuntu 14.04 and the kernel rbd, which makes calls into libceph:

root@essperf3:/etc/ceph# lsmod | grep rbd
rbd                    63707  1
libceph               225026  1 rbd
root@essperf3:/etc/ceph#

I'm doing raw device IO with either fio or vdbench (preferred tool) and there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kmem pages by writing to drop_caches, and I've also allocated huge pages at the maximum allowable based on free memory. The huge page allocation should minimize any system caches. I have a relatively small storage pool, since this is a development environment: only ~4TB total, and the rbd image is 3TB. On my lab system with 320TB I don't see this problem, since the data set is orders of magnitude larger than the available system cache. Maybe I'll remove DIMMs from the client system and physically disable kernel caching.

-----Original Message-----
From: Nicheal [mailto:zay11...@gmail.com]
Sent: Monday, February 02, 2015 7:35 PM
To: Bruce McFarland
Cc: ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

It seems you use the kernel rbd, so rbd_cache does not apply; it is only designed for librbd. Kernel rbd directly uses the system page cache. You said that you have already run something like 'echo 3 > /proc/sys/vm/drop_caches' to invalidate all pages cached in the kernel. So do you test /dev/rbd1 on top of a filesystem, such as ext4 or xfs? If so, and you run a test tool like fio, first with a write test and file_size = 10G, then a 10G file is created by fio but with lots of holes in it, and your read test may read those holes, so the filesystem can tell they contain nothing and there is no need to access the physical disk to get data. You can check the fiemap of the file to see whether it contains holes, or just remove the file and recreate it with a read test.

Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
> I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
> Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
> Thanks,
> Bruce
Re: [ceph-users] RBD caching on 4K reads???
Yes, I'm using the kernel rbd in Ubuntu 14.04, which makes calls into libceph:

root@essperf3:/etc/ceph# lsmod | grep rbd
rbd                    63707  1
libceph               225026  1 rbd
root@essperf3:/etc/ceph#

I'm doing raw device IO with either fio or vdbench (preferred tool) and there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kmem pages by writing to drop_caches, and I've also allocated huge pages at the maximum allowable based on free memory. The huge page allocation should minimize any system caches. I have a relatively small storage pool, since this is a development environment: only ~4TB total, and the rbd image is 3TB. On my lab system with 320TB I don't see this problem, since the data set is orders of magnitude larger than the available system cache. Maybe I should try testing after removing DIMMs from the client system to physically disable kernel caching.

-----Original Message-----
From: Nicheal [mailto:zay11...@gmail.com]
Sent: Monday, February 02, 2015 7:35 PM
To: Bruce McFarland
Cc: ceph-us...@ceph.com; Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

It seems you use the kernel rbd, so rbd_cache does not apply; it is only designed for librbd. Kernel rbd directly uses the system page cache. You said that you have already run something like 'echo 3 > /proc/sys/vm/drop_caches' to invalidate all pages cached in the kernel. So do you test /dev/rbd1 on top of a filesystem, such as ext4 or xfs? If so, and you run a test tool like fio, first with a write test and file_size = 10G, then a 10G file is created by fio but with lots of holes in it, and your read test may read those holes, so the filesystem can tell they contain nothing and there is no need to access the physical disk to get data. You can check the fiemap of the file to see whether it contains holes, or just remove the file and recreate it with a read test.

Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
> I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
> Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
> Thanks,
> Bruce
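Since the kernel rbd client goes through the Linux page cache rather than librbd, the usual way to take caching out of a raw-device test is O_DIRECT. A minimal fio invocation to that effect (a sketch; the device path matches this thread, all other values are illustrative):

# direct=1 opens /dev/rbd1 with O_DIRECT, bypassing the kernel page cache
fio --name=randread-4k --filename=/dev/rbd1 --direct=1 --rw=randread \
    --bs=4k --ioengine=libaio --iodepth=32 --runtime=60 --time_based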
[ceph-users] RBD caching on 4K reads???
I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
Thanks,
Bruce
Re: [ceph-users] RBD caching on 4K reads???
The ceph daemon isn't running on the client with the rbd device, so I can't verify if it's disabled at the librbd level on the client. If you mean on the storage nodes, I've had some issues dumping the config. Does the rbd caching occur on the storage nodes, the client, or both?

From: Udo Lembke [mailto:ulem...@polarzone.de]
Sent: Friday, January 30, 2015 1:00 PM
To: Bruce McFarland; ceph-us...@ceph.com
Cc: Prashanth Nednoor
Subject: Re: [ceph-users] RBD caching on 4K reads???

Hi Bruce,
hmm, sounds to me like the rbd cache. Can you check whether the cache is really disabled in the running config with:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep cache

Udo

On 30.01.2015 21:51, Bruce McFarland wrote:
I have a cluster and have created an rbd device - /dev/rbd1. It shows up as expected with 'rbd --image test info' and rbd showmapped. I have been looking at cluster performance with the usual Linux block device tools - fio and vdbench. When I look at writes and large block sequential reads I'm seeing what I'd expect, with performance limited by either my cluster interconnect bandwidth or the backend device throughput speeds - 1 GE frontend and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K random reads. There is caching occurring somewhere in my system that I haven't been able to detect and suppress - yet. I've set 'rbd_cache=false' in the [client] section of ceph.conf on the client, monitor, and storage nodes. I've flushed the system caches on the client and storage nodes before each test run, i.e. vm.drop_caches=3, and set the huge pages to the maximum available to consume free system memory so that it can't be used for system cache. I've also disabled read-ahead on all of the HDD/OSDs. When I run a 4K random read workload on the client, the most I could expect would be ~100 iops/osd x the number of osds - I'm seeing an order of magnitude greater than that, AND running IOSTAT on the storage nodes shows no read activity on the OSD disks.
Any ideas on what I've overlooked? There appears to be some read-ahead caching that I've missed.
Thanks,
Bruce
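To Bruce's question about where caching can occur: with the kernel client there is also plain block-layer readahead on the mapped device itself, independent of anything Ceph-side. Standard Linux tooling rules it out (a sketch, using the device from this thread):

blockdev --getra /dev/rbd1               # current readahead, in 512-byte sectors
blockdev --setra 0 /dev/rbd1             # disable it for the duration of the test
cat /sys/block/rbd1/queue/read_ahead_kb  # the same setting, expressed in KB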
Re: [ceph-users] Monitor/OSD report tuning question
See inline:

Ceph version:
[root@ceph2 ceph]# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

Initial testing was with 30 osd's, 10 per storage server, with the following HW:
4TB SATA disks - 1 hdd/osd - 30 hdd's/server - 6 ssd's/server forming a md raid0 virtual drive with 30 96GB partitions, 1 partition/osd journal.
Storage Server HW: 2 x Xeon e5-2630 2.6GHz, 24 cores total, with 128GB/server
Monitor HW: 2 x Xeon e5-2630 2.6GHz, 24 cores total, with 64GB - system disks are 4 x 480GB SAS ssd configured as a virtual md raid0

It seems my cluster's main issue is osd_heartbeat_grace, since I constantly see osd failures for reporting outside the 20 second grace. The cluster was configured this way from boot time (I completely tore down the original cluster and rebuilt with an increased osd_heartbeat_grace of 35). As you can see, the osd is marked down, the cluster then goes into an osdmap/pgmap rebalancing cycle, and everything is UP/IN with PG states of 'active+clean' - for a few moments - and then the osd flapping and map rebalancing restart. All of the osd's are configured with, and report, an osd_heartbeat_grace of 35. Any idea why osd's are still failing against a grace of ~20?

[root@ceph0 ceph]# sh -x ./ceph0-daemon-config.sh beat_grace
+ '[' 1 '!=' 1 ']'
+ for i in '{0..29}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show
+ grep beat_grace
mon_osd_adjust_heartbeat_grace: true,
osd_heartbeat_grace: 35,
+ for i in '{0..29}'
+ grep beat_grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
mon_osd_adjust_heartbeat_grace: true,
osd_heartbeat_grace: 35,
+ for i in '{0..29}'

2014-08-25 10:18:10.812179 mon.0 [INF] osd.26 209.243.160.83:6878/4819 failed (279 reports from 56 peers after 21.006896 >= grace 20.995963)
2014-08-25 10:18:10.812440 mon.0 [INF] osd.29 209.243.160.83:6887/7439 failed (254 reports from 51 peers after 21.007140 >= grace 20.995963)
2014-08-25 10:18:10.817675 mon.0 [INF] osd.18 209.243.160.83:6854/30165 failed (280 reports from 56 peers after 21.012978 >= grace 20.995962)
2014-08-25 10:18:10.817850 mon.0 [INF] osd.19 209.243.160.83:6857/31036 failed (245 reports from 49 peers after 21.013135 >= grace 20.995962)
2014-08-25 10:18:11.127275 mon.0 [INF] osdmap e25128: 91 osds: 82 up, 90 in
2014-08-25 10:18:11.157030 mon.0 [INF] pgmap v51553: 5760 pgs: 519 stale+active+clean, 5241 active+clean; 0 bytes data, 135 GB used, 327 TB / 327 TB avail
2014-08-25 10:18:11.924773 mon.0 [INF] osd.5 209.243.160.83:6815/19790 failed (270 reports from 54 peers after 22.120541 >= grace 21.991499)
2014-08-25 10:18:11.924858 mon.0 [INF] osd.7 209.243.160.83:6821/21303 failed (240 reports from 48 peers after 22.120345 >= grace 21.991499)
2014-08-25 10:18:11.924894 mon.0 [INF] osd.11 209.243.160.83:6833/24394 failed (260 reports from 52 peers after 22.120297 >= grace 21.991499)
2014-08-25 10:18:11.924943 mon.0 [INF] osd.16 209.243.160.83:6848/28431 failed (265 reports from 53 peers after 22.120080 >= grace 21.991499)
2014-08-25 10:18:11.924977 mon.0 [INF] osd.17 209.243.160.83:6851/29253 failed (250 reports from 50 peers after 22.120067 >= grace 21.991499)
2014-08-25 10:18:11.925012 mon.0 [INF] osd.23 209.243.160.83:6869/2073 failed (270 reports from 54 peers after 22.120020 >= grace 21.991499)
2014-08-25 10:18:11.925065 mon.0 [INF] osd.24 209.243.160.83:6872/3025 failed (260 reports from 52 peers after 22.120010 >= grace 21.991499)
2014-08-25 10:15:17.753867 osd.10 [WRN] map e25128 wrongly marked me down
2014-08-25 10:15:17.960953 osd.18 [WRN] map e25128 wrongly marked me down
2014-08-25 10:15:18.217959 osd.29 [WRN] map e25128 wrongly marked me down
2014-08-25 10:18:11.925143 mon.0 [INF] osd.28 209.243.160.83:6884/6572 failed (275 reports from 55 peers after 22.670894 >= grace 21.991288)
2014-08-25 10:18:12.204918 mon.0 [INF] pgmap v51554: 5760 pgs: 519 stale+active+clean, 5241 active+clean; 0 bytes data, 135 GB used, 327 TB / 327 TB avail

-----Original Message-----
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Monday, August 25, 2014 1:15 AM
To: ceph-us...@ceph.com
Cc: Bruce McFarland
Subject: Re: [ceph-users] Monitor/OSD report tuning question

Hello,

On Sat, 23 Aug 2014 20:23:55 +0000 Bruce McFarland wrote:

Firstly, while the runtime changes you injected into the cluster should have done something (and I hope some Ceph developer comments on that), you're asking for tuning advice, which really isn't the issue here. Your cluster should not need any tuning to become functional; what you're seeing is something massively wrong with it.

> Hello, I have a Cluster

Which version? I assume Firefly due to the single monitor, which suggests a test cluster, but if you're running a development version all bets are off.

[root@ceph2 ceph]# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)

> with 30 OSDs

What disks? How connected? SSD journals?

4TB SATA disks, 1/osd - 30 hdd's/server - 6 ssd's forming a md raid0 virtual drive with 30 96GB
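A quick way to see which daemon is still sitting on the default grace, using the admin sockets already shown in this thread (paths assume the hostnames above):

# on the monitor - this is where the "grace NN" in the failure log lines comes from
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
# on a storage node, per OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_heartbeat_grace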
Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
I just added osd_heartbeat_grace to the [mon] section of ceph.conf, restarted ceph-mon, and now the monitor is reporting a 35 second osd_heartbeat_grace:

[root@ceph-mon01 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
osd_heartbeat_grace: 35,
[root@ceph-mon01 ceph]#

-----Original Message-----
From: Bruce McFarland
Sent: Monday, August 25, 2014 10:46 AM
To: 'Gregory Farnum'
Cc: ceph-us...@ceph.com
Subject: RE: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

That's something that has been puzzling me. The monitor's ceph.conf is set to 35, but its runtime config reports 20. I've restarted it after initial creation to try to get it to reload the ceph.conf settings, but it stays at 20.

[root@ceph-mon01 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
osd_heartbeat_grace: 20,
[root@ceph-mon01 ceph]#
[root@ceph-mon01 ceph]# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 209.243.160.84
mon_initial_members = ceph-mon01
fsid = 94bbb882-42e4-4a6c-bfda-125790616fcc
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_pool_default_size = 3  # Write an object 3 times - number of replicas.
osd_pool_default_min_size = 1  # Allow writing one copy in a degraded state.

[mon]
mon_osd_min_down_reporters = 2

[osd]
debug_ms = 1
debug_osd = 20
public_network = 209.243.160.0/24
cluster_network = 10.10.50.0/24
osd_journal_size = 96000
osd_heartbeat_grace = 35

[osd.0]
.
.
.

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, August 25, 2014 10:39 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

On Sat, Aug 23, 2014 at 11:06 PM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
> I see osd's being failed for heartbeat reporting with the default osd_heartbeat_grace of 20, but the runtime config shows that the grace is set to 30. Is there another variable for the osd or the mon I need to set for the non-default osd_heartbeat_grace of 30 to take effect?

You need to also set the osd heartbeat grace on the monitors. If I were to guess, the OSDs are actually seeing each other as slow (after 30 seconds) and reporting it in, but the monitors have a grace of 20 seconds set, so that's what they're using to generate output.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
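The same change can also be pushed into a running monitor without a restart via injectargs (a sketch; injected values are not persistent across a daemon restart, so keep the ceph.conf entry as well):

ceph tell mon.ceph-mon01 injectargs '--osd-heartbeat-grace 35'
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace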
Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
After looking a little closer, now that I have a better understanding of osd_heartbeat_grace for the monitor, all the osd failures were coming from 1 node in the cluster. Yes, your hunch was correct: that node had stale rules in iptables. After disabling iptables, the osd flapping has stopped. Now I'm going to bring the osd_heartbeat_grace value back down incrementally and see if the cluster runs without reporting issues at the default. Thank you very much for your help.

I have some default pool questions concerning cluster bring-up: I have 90 osd's (a single 4TB HDD per osd, with a 96GB journal that is a partition on a SSD raid0), 30 osd's per storage node. I have the default placement group info in the [global] section of ceph.conf:

osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096

When I bring up a cluster, I'm running short of PGs in the default pools 0-data, 1-metadata, and 2-rbd, and getting error msgs for not enough PGs per osd. Since osd's require between 20 and 32 PGs each, as soon as I've brought up the first storage node I need a minimum of 600 PGs, but the system comes up with the default of 64 per pool. After creating each node's osd's, I increased the default pool sizes with 'ceph osd pool set <pool> pg_num' and 'pgp_num' for each of the default pools. Do I need to increase all 3 pools? Is there a ceph.conf setting that handles this startup issue? What's the best-practices way to handle bringing up more osd's than the default pool PG settings can handle?

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, August 25, 2014 11:01 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20

On Mon, Aug 25, 2014 at 10:56 AM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:
> Thank you very much for the help. I'm moving osd_heartbeat_grace to the global section and trying to figure out what's going on between the osd's. Since increasing the osd_heartbeat_grace in the [mon] section of ceph.conf on the monitor I still see failures, but now they are 2 seconds > osd_heartbeat_grace. It seems that no matter how much I increase this value, osd's are reporting just outside of it. I've looked at netstat -s for all of the nodes and will go back and look at the network stats much closer. Would it help to put the monitor on a 10G link to the storage nodes? Everything is set up, but we chose to leave the monitor on a 1G link to the storage nodes.

No. They're being marked down because they aren't heartbeating the OSDs, and those OSDs are reporting the failures to the monitor (whose connection is apparently working fine). The most likely guess without more data is that you've got firewall rules set up blocking the ports the OSDs are using to send their heartbeats... but it could be many things in your network stack or your cpu scheduler or whatever.
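On the PG sizing question, the rule of thumb in the docs of that era was roughly (OSDs x 100) / replica count, rounded up to a power of two - which for 90 OSDs at size 3 lands near the 4096 already in this ceph.conf. Note the osd_pool_default_* settings only apply to pools created after they are in place, so pools that came up at 64 have to be bumped by hand (a sketch; whether all three initial pools need it depends on which are actually used):

# (90 OSDs * 100) / 3 replicas = 3000 -> next power of two is 4096
ceph osd pool set data pg_num 4096
ceph osd pool set data pgp_num 4096      # pgp_num must follow pg_num
# repeat for 'metadata' and 'rbd' as needed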
[ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace 20
I see osd's being failed for heartbeat reporting default osd_heartbeat_grace of 20, but the runtime config shows that the grace is set to 30. Is there another variable for the osd or the mon I need to set for the non-default osd_heartbeat_grace of 30 to take effect?

2014-08-23 23:03:08.982590 mon.0 [INF] osd.23 209.243.160.83:6812/31567 failed (73 reports from 20 peers after 20.462129 >= grace 20.00)
2014-08-23 23:03:09.058927 mon.0 [INF] osdmap e37965: 30 osds: 29 up, 30 in
2014-08-23 23:03:09.070575 mon.0 [INF] pgmap v82213: 1920 pgs: 62 stale+active+clean, 1858 active+clean; 0 bytes data, 8193 MB used, 109 TB / 109 TB avail
2014-08-23 23:03:09.860169 mon.0 [INF] osd.20 209.243.160.83:6806/29554 failed (62 reports from 20 peers after 21.339816 >= grace 20.995899)
2014-08-23 23:03:09.860246 mon.0 [INF] osd.26 209.243.160.83:6811/1098 failed (66 reports from 20 peers after 21.339380 >= grace 20.995899)
2014-08-23 23:03:09.860307 mon.0 [INF] osd.29 209.243.160.83:6804/3217 failed (62 reports from 20 peers after 21.339341 >= grace 20.995899)
2014-08-23 23:03:10.076721 mon.0 [INF] osdmap e37966: 30 osds: 26 up, 30 in

[root@ceph1 ceph]# sh -x ./ceph1-daemon-config.sh grace
+ '[' 1 '!=' 1 ']'
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.7.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{0..9}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.9.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
[root@ceph1 ceph]#

[root@ceph2 ceph]# sh -x ./ceph2-daemon-config.sh grace
+ '[' 1 '!=' 1 ']'
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.10.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.11.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.13.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.15.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon /var/run/ceph/ceph-osd.16.asok config show
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.17.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ ceph --admin-daemon /var/run/ceph/ceph-osd.18.asok config show
+ grep grace
mon_osd_adjust_heartbeat_grace: true,
mds_beacon_grace: 15,
osd_heartbeat_grace: 30,
+ for i in '{10..19}'
+ grep grace
+ ceph --admin-daemon
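The ceph1-daemon-config.sh being traced above is presumably just a loop over the local admin sockets; a generic sketch of the same idea that doesn't hardcode the osd id range:

#!/bin/sh
# Usage: ./daemon-config.sh <pattern>
# Grep a value out of every local OSD daemon's running config.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $sock =="
    ceph --admin-daemon "$sock" config show | grep "$1"
done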
[ceph-users] Monitor/OSD report tuning question
Hello,
I have a Cluster with 30 OSDs distributed over 3 Storage Servers connected by a 10G cluster link, and connected to the Monitor over 1G. I still have a lot to understand with Ceph. Observing the cluster messages in a 'ceph -w' window, I see a lot of osd flapping while the cluster is sitting in a configured state, and placement groups (PGs) constantly changing status. The cluster was configured and came up to 1920 'active+clean' PGs. The 3 status outputs below were issued over the course of about 2 minutes. As you can see, there is a lot of activity where I'm assuming the osd reporting occasionally falls outside the heartbeat timeout (TO), and various PGs get set to 'stale' and/or 'degraded' but still 'active'. There are osd's being marked 'out' in the osd map - I see them in the watch window reported as failures, and they very quickly report 'wrongly marked me down'. I'm assuming I need to 'tune' some of the many TO values so that the osd's and PGs can all report within the TO window. A quick look at the --admin-daemon config show cmd tells me that I might consider tuning some of these values:

[root@ceph0 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config show | grep report
mon_osd_report_timeout: 900,
mon_osd_min_down_reporters: 1,
mon_osd_min_down_reports: 3,
osd_mon_report_interval_max: 120,
osd_mon_report_interval_min: 5,
osd_pg_stat_report_interval_max: 500,
[root@ceph0 ceph]#

Which osd and/or mon settings should I increase/decrease to eliminate all this state flapping while the cluster sits configured with no data?
Thanks,
Bruce

2014-08-23 13:16:15.564932 mon.0 [INF] osd.20 209.243.160.83:6800/20604 failed (65 reports from 20 peers after 23.380808 >= grace 21.991016)
2014-08-23 13:16:15.565784 mon.0 [INF] osd.23 209.243.160.83:6810/29727 failed (79 reports from 20 peers after 23.675170 >= grace 21.990903)
2014-08-23 13:16:15.566038 mon.0 [INF] osd.25 209.243.160.83:6808/31984 failed (65 reports from 20 peers after 23.380921 >= grace 21.991016)
2014-08-23 13:16:15.566206 mon.0 [INF] osd.26 209.243.160.83:6811/518 failed (65 reports from 20 peers after 23.381043 >= grace 21.991016)
2014-08-23 13:16:15.566372 mon.0 [INF] osd.27 209.243.160.83:6822/2511 failed (65 reports from 20 peers after 23.381195 >= grace 21.991016)
.
.
.
2014-08-23 13:17:09.547684 osd.20 [WRN] map e27128 wrongly marked me down
2014-08-23 13:17:10.826541 osd.23 [WRN] map e27130 wrongly marked me down
2014-08-23 13:20:09.615826 mon.0 [INF] osdmap e27134: 30 osds: 26 up, 30 in
2014-08-23 13:17:10.954121 osd.26 [WRN] map e27130 wrongly marked me down
2014-08-23 13:17:19.125177 osd.25 [WRN] map e27135 wrongly marked me down

[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26636: 30 osds: 30 up, 30 in
      pgmap v56534: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26586 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_WARN 160 pgs degraded; 83 pgs stale
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26641: 30 osds: 30 up, 30 in
      pgmap v56545: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26558 MB used, 109 TB / 109 TB avail
                  83 stale+active+clean
                 160 active+degraded
                1677 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26657: 30 osds: 30 up, 30 in
      pgmap v56584: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26610 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]#
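For reference, the failure-reporting knobs discussed in this thread as they would sit in ceph.conf (values are illustrative, not recommendations; the key point from the later replies is that osd_heartbeat_grace must be visible to the monitors too, hence [global]):

[global]
osd_heartbeat_grace = 35          # both OSDs and mons consult this

[mon]
mon_osd_min_down_reporters = 2    # distinct OSDs required to report a peer down
mon_osd_min_down_reports = 3      # reports required before marking it down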
Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting
I have 3 storage servers, each with 30 osds. Each osd has a journal that is a partition on a virtual drive that is a raid0 of 6 ssds. I brought up a 3 osd (1 per storage server) cluster to bring up Ceph and figure out configuration etc.

From: Dan Van Der Ster [mailto:daniel.vanders...@cern.ch]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Hi,
You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...
Cheers, Dan

On 21 Aug 2014, at 02:59, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have a cluster with 1 monitor and 3 OSD Servers. Each server has multiple OSD's running on it. When I start an OSD using /etc/init.d/ceph start osd.0, I see the expected interaction between the OSD and the monitor - authenticating keys etc. - and finally the OSD starts. Watching the cluster with 'ceph -w' on the monitor, I never see the INFO messages I expect: there isn't a msg from osd.0 for the boot event, nor the expected INFO messages from osdmap and pgmap for the osd and its PGs being added to those maps. I only see the last time the monitor was booted, when it wins the monitor election and reports monmap, pgmap, and mdsmap info. The firewalls are disabled with selinux==disabled and iptables turned off. All hosts can ssh w/o passwords into each other and I've verified traffic between hosts using tcpdump captures.
Any ideas on what I'd need to add to ceph.conf or have overlooked would be greatly appreciated.
Thanks,
Bruce

[root@ceph0 ceph]# /etc/init.d/ceph restart osd.0
=== osd.0 ===
=== osd.0 ===
Stopping Ceph osd.0 on ceph0...kill 15676...done
=== osd.0 ===
2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start
2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 -- 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570
2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971
2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 1 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570
2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570
2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (409581826 0 0) 0x7fa508000f60 con 0x7fa51402f570
2014-08-20 17:43:46.461514 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa4fc001cf0 con 0x7fa51402f570
2014-08-20 17:43:46.462824 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (2134012784 0 0) 0x7fa5080011d0 con 0x7fa51402f570
2014-08-20 17:43:46.463011 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7fa51402bbc0 con 0x7fa51402f570
2014-08-20 17:43:46.463073 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x7fa4fc0025d0 con 0x7fa51402f570 2014-08-20 17:43:46.463329 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa514030490 con 0x7fa51402f570 2014-08-20 17:43:46.463363 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7fa5140309b0 con 0x7fa51402f570 2014-08-20 17:43:46.463564 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 5 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508001100 con 0x7fa51402f570 2014-08-20 17:43:46.463639 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (540052875 0 0) 0x7fa5080013e0 con 0x7fa51402f570 2014-08-20 17:43:46.463707 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 7 auth_reply(proto 2 0 (0) Success) v1 194+0+0 (1040860857 0 0) 0x7fa5080015d0 con 0x7fa51402f570 2014-08-20 17:43:46.468877 7fa51a034700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- mon_command({prefix: get_command_descriptions} v 0) v1 -- ?+0 0x7fa514030e20 con 0x7fa51402f570 2014-08-20 17:43
Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting
Yes, all of the ceph-osd processes are up and running. I performed a ceph-mon restart to see if that might trigger the osdmap update, but there is no INFO msg from the osdmap or the pgmap that I expect to see when the osd's are started. All of the osd's and their hosts appear in the CRUSH map and in ceph.conf. Since I went through a bunch of issues getting the multiple-osds-per-host setup working, I'm assuming that the monitor's tables might be hosed, and I am going to purgedata and reinstall the monitor and see if it builds the proper mappings. I've stopped all of the osd's and verified that there aren't any active ceph-osd processes. Then I'll follow the procedure for bringing a new monitor online into an existing cluster so that I use the proper fsid.

2014-08-20 17:20:24.648538 7f326ebfd700 0 monclient: hunting for new mon
2014-08-20 17:20:24.648857 7f327455f700 0 -- 209.243.160.84:0/1005462 >> 209.243.160.84:6789/0 pipe(0x7f3264020300 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3264020570).fault
2014-08-20 17:20:26.077687 mon.0 [INF] mon.ceph-mon01@0 won leader election with quorum 0
2014-08-20 17:20:26.077810 mon.0 [INF] monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}
2014-08-20 17:20:26.077931 mon.0 [INF] pgmap v555: 192 pgs: 192 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2014-08-20 17:20:26.078032 mon.0 [INF] mdsmap e1: 0/0/1 up

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Thursday, August 21, 2014 8:44 AM
To: Bruce McFarland
Cc: Dan Van Der Ster; ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Are the OSD processes still alive? What's the osdmap output of ceph -w (which was not in the output you pasted)?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Aug 21, 2014 at 7:11 AM, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have 3 storage servers, each with 30 osds. Each osd has a journal that is a partition on a virtual drive that is a raid0 of 6 ssds. I brought up a 3 osd (1 per storage server) cluster to bring up Ceph and figure out configuration etc.

From: Dan Van Der Ster [mailto:daniel.vanders...@cern.ch]
Sent: Thursday, August 21, 2014 1:17 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] MON running 'ceph -w' doesn't see OSD's booting

Hi,
You only have one OSD? I've seen similar strange things in test pools having only one OSD - and I kinda explained it by assuming that OSDs need peers (other OSDs sharing the same PG) to behave correctly. Install a second OSD and see how it goes...
Cheers, Dan

On 21 Aug 2014, at 02:59, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote:

I have a cluster with 1 monitor and 3 OSD Servers. Each server has multiple OSD's running on it. When I start an OSD using /etc/init.d/ceph start osd.0, I see the expected interaction between the OSD and the monitor - authenticating keys etc. - and finally the OSD starts. Watching the cluster with 'ceph -w' on the monitor, I never see the INFO messages I expect: there isn't a msg from osd.0 for the boot event, nor the expected INFO messages from osdmap and pgmap for the osd and its PGs being added to those maps. I only see the last time the monitor was booted, when it wins the monitor election and reports monmap, pgmap, and mdsmap info. The firewalls are disabled with selinux==disabled and iptables turned off. All hosts can ssh w/o passwords into each other and I've verified traffic between hosts using tcpdump captures.
Any ideas on what I’d need to add to ceph.conf or have overlooked would be greatly appreciated. Thanks, Bruce [root@ceph0 ceph]# /etc/init.d/ceph restart osd.0 === osd.0 === === osd.0 === Stopping Ceph osd.0 on ceph0...kill 15676...done === osd.0 === 2014-08-20 17:43:46.456592 7fa51a034700 1 -- :/0 messenger.start 2014-08-20 17:43:46.457363 7fa51a034700 1 -- :/1025971 -- 209.243.160.84:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7fa51402f9e0 con 0x7fa51402f570 2014-08-20 17:43:46.458229 7fa5189f0700 1 -- 209.243.160.83:0/1025971 learned my addr 209.243.160.83:0/1025971 2014-08-20 17:43:46.459664 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 1 mon_map v1 200+0+0 (3445960796 0 0) 0x7fa508000ab0 con 0x7fa51402f570 2014-08-20 17:43:46.459849 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (536914167 0 0) 0x7fa508000f60 con 0x7fa51402f570 2014-08-20 17:43:46.460180 7fa5135fe700 1 -- 209.243.160.83:0/1025971 -- 209.243.160.84:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa4fc0012d0 con 0x7fa51402f570 2014-08-20 17:43:46.461341 7fa5135fe700 1 -- 209.243.160.83:0/1025971 == mon.0 209.243.160.84:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (409581826 0 0
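When an OSD process starts but never shows up in the maps, a few standard monitor-side checks narrow down whether it ever registered (a sketch; osd.0 as the example id):

ceph osd stat                  # how many OSDs the map knows, and how many are up/in
ceph osd tree                  # where they sit in the CRUSH hierarchy
ceph osd dump | grep osd.0     # the exact state and address recorded for one OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status   # on the OSD host: the daemon's own view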
[ceph-users] How to create multiple OSD's per host?
I've tried using ceph-deploy, but it wants to assign the same id to each osd, and I end up with a bunch of prepared ceph-disks and only 1 active. If I use the manual short-form method, the activate step fails and there are no xfs mount points on the ceph-disks. If I use the manual long form, it seems like I get the closest to active ceph-disks/osd's, but the monitor always shows the osds as down/in, and the ceph-disks don't persist over a boot cycle.
Is there a document anywhere that anyone knows of that explains a step-by-step process for bringing up multiple osd's per host - 1 hdd with a ssd journal partition per osd?
Thanks,
Bruce
Re: [ceph-users] How to create multiple OSD's per host?
] checking OSD status...
[ceph0][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph0 is now ready for osd use.

From: Bruce McFarland
Sent: Thursday, August 14, 2014 11:45 AM
To: 'ceph-us...@ceph.com'
Subject: How to create multiple OSD's per host?

I've tried using ceph-deploy, but it wants to assign the same id to each osd, and I end up with a bunch of prepared ceph-disks and only 1 active. If I use the manual short-form method, the activate step fails and there are no xfs mount points on the ceph-disks. If I use the manual long form, it seems like I get the closest to active ceph-disks/osd's, but the monitor always shows the osds as down/in, and the ceph-disks don't persist over a boot cycle.
Is there a document anywhere that anyone knows of that explains a step-by-step process for bringing up multiple osd's per host - 1 hdd with a ssd journal partition per osd?
Thanks,
Bruce
Re: [ceph-users] How to create multiple OSD's per host?
I'll try the prepare/activate commands again. I spent the least amount of time with them since activate _always_ failed for me. I'll go back and check my logs, but it probably failed because I was attempting to activate the same location I used in the 'prepare' instead of partition 1 as you suggest (which is exactly how it is shown in the documentation example). I seemed to get the closest to a working cluster using the 'manual' commands below. I could try changing the XFS mount point to be on a partition of the HDD I'm using for the OSD.

mkdir /var/lib/ceph/osd/ceph-$OSD
mkfs -t xfs -f /dev/sd$i
mount -t xfs /dev/sd$i /var/lib/ceph/osd/ceph-$OSD
ceph-osd -i $OSD --mkfs --mkkey --osd-journal /dev/md0p$PART

What I find most confusing when using ceph-deploy with multiple OSDs on the same host is that when 'ceph-deploy osd create [data] [journal]' completes, there is no osd directory for each OSD under /var/lib/ceph/osd/:

[root@ceph0 ceph]# ll /var/lib/ceph/osd/
total 0
[root@ceph0 ceph]#

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jason King
Sent: Thursday, August 14, 2014 8:13 PM
To: ceph-us...@ceph.com
Subject: Re: [ceph-users] How to create multiple OSD's per host?

2014-08-15 7:56 GMT+08:00 Bruce McFarland bruce.mcfarl...@taec.toshiba.com:
This is an example of the output from 'ceph-deploy osd create [data] [journal]'. I've noticed that all of the 'ceph-conf' commands use the same parameter of '--name=osd.' every time ceph-deploy is called. I end up with 30 OSDs - 29 prepared and 1 active according to the 'ceph-disk list' output - and only 1 OSD that has an xfs mount point. I've tried both with all data/journal devices on the same ceph-deploy command line and issuing 1 ceph-deploy command for each OSD data/journal pair (easier to script).

+ ceph-deploy osd create ceph0:/dev/sdl:/dev/md0p17
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.10): /usr/bin/ceph-deploy osd create ceph0:/dev/sdl:/dev/md0p17
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph0:/dev/sdl:/dev/md0p17
[ceph0][DEBUG ] connected to host: ceph0
[ceph0][DEBUG ] detect platform information from remote host
[ceph0][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph0
[ceph0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph0 disk /dev/sdl journal /dev/md0p17 activate True
[ceph0][INFO ] Running command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdl /dev/md0p17
[ceph0][DEBUG ] Information: Moved requested sector from 34 to 2048 in
[ceph0][DEBUG ] order to align on 2048-sector boundaries.
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][DEBUG ] meta-data=/dev/sdl1    isize=2048  agcount=4, agsize=244188597 blks
[ceph0][DEBUG ]          =             sectsz=512  attr=2, projid32bit=0
[ceph0][DEBUG ] data     =             bsize=4096  blocks=976754385, imaxpct=5
[ceph0][DEBUG ]          =             sunit=0     swidth=0 blks
[ceph0][DEBUG ] naming   =version 2    bsize=4096  ascii-ci=0
[ceph0][DEBUG ] log      =internal log bsize=4096  blocks=476930, version=2
[ceph0][DEBUG ]          =             sectsz=512  sunit=0 blks, lazy-count=1
[ceph0][DEBUG ] realtime =none         extsz=4096  blocks=0, rtextents=0
[ceph0][DEBUG ] The operation has completed successfully.
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph0][WARNIN] DEBUG:ceph-disk:Journal /dev/md0p17 is a partition
[ceph0][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[ceph0][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdl
[ceph0][WARNIN] INFO:ceph-disk:Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:a96b4af4-11f4-4257-9476-64a6e4c93c28 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdl
[ceph0][WARNIN] INFO:ceph
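For the archives, the per-OSD prepare/activate sequence being discussed would look roughly like this sketch (/dev/sdl and /dev/md0p17 are the example devices from this thread; the key point is that activate takes the data partition that prepare created, not the raw disk):

    # prepare the HDD as OSD data with the journal on an SSD RAID partition
    ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdl /dev/md0p17
    # prepare creates partition 1 on /dev/sdl; activate that partition
    ceph-disk -v activate /dev/sdl1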
Re: [ceph-users] Firefly OSDs stuck in creating state forever
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 09:59:07.145776 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:59:37.146043 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 10:00:07.146288 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 10:00:37.146543 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault

209.243.160.35 - monitor
209.243.160.51 - osd.0
209.243.160.52 - osd.3
209.243.160.59 - osd.2

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Sunday, August 03, 2014 11:15 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Sun, 3 Aug 2014, Bruce McFarland wrote:
Is there a recommended way to take everything down and restart the process? I was considering starting completely from scratch, i.e. OS reinstall and then using ceph-deploy as before.

If you're using ceph-deploy, then
ceph-deploy purge HOST
ceph-deploy purgedata HOST
will do it. Then remove the ceph.* (config and keyring) files from the current directory.

I've learned a lot and want to figure out a foolproof way I can document for others in our lab to bring up a cluster on new HW. I learn a lot more when I break things and have to figure out what went wrong, so it's a little frustrating, but I've found out a lot about verifying the configuration and debug options so far. My intent is to investigate rbd usage, perf, and configuration options. The endless loop I'm referring to is a constant stream of fault messages that I'm not yet familiar enough with to interpret. I have let them run to see if the cluster recovers, but ceph-mon always crashed. I'll look for the crash dump and save it, since kdump should be enabled on the monitor box.

Do you have one of the messages handy? I'm curious whether it is an OSD or a mon.
Thanks!
sage

Thanks for the feedback.

On Aug 3, 2014, at 8:30 AM, Sage Weil sw...@redhat.com wrote:
Hi Bruce,
On Sun, 3 Aug 2014, Bruce McFarland wrote:
Yes, I looked at tcpdump on each of the OSDs and saw communications between all 3 OSDs before I sent my first question to this list. When I disabled selinux on the one offending server based on your feedback (typically we have this disabled on lab systems that are only on the lab net), the 10 PGs in my test pool all went to 'active+clean' almost immediately. Unfortunately the 3 default pools still remain in the creating states and are not health_ok. The OSDs all stayed UP/IN after the selinux change for the rest of the day, until I made the mistake of creating an RBD image on demo-pool and its 10 'active+clean' PGs. I created the rbd, but when I attempted to look at it with 'rbd info'
the cluster went into an endless loop trying to read a placement group, a loop that I left running overnight. This morning

What do you mean by went into an endless loop?

ceph-mon was crashed again. I'll probably start all over from scratch once again on Monday.

Was there a stack dump in the mon log? It is possible that there is a bug with pool creation that surfaced by having selinux in place for so long, but otherwise this scenario doesn't make much sense to me. :/ Very interested in hearing more, and/or whether you can reproduce it.
Thanks!
sage

I deleted ceph-mds and got rid of the 'laggy' comments from 'ceph health'. The 'official' online Ceph docs on that are 'coming soon', and most references I could find were pre-firefly, so it took a little trial and error to figure out that I had to use the pool number and not its name to get the removal to work. Same with 'ceph mds newfs' to get rid of the laggy-ness in the 'ceph health' output.

[root@essperf3 Ceph]# ceph mds rm 0 mds.essperf3
mds gid 0 dne
[root@essperf3 Ceph]# ceph health
HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192 pgs stuck unclean; mds essperf3 is laggy
[root@essperf3 Ceph]# ceph mds newfs 1 0 --yes-i-really-mean-it
new fs with metadata pool 1 and data pool 0
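Since the pool-number-versus-name point above trips people up: the numeric pool ids that 'ceph mds rm' and 'ceph mds newfs' expect can be read straight out of lspools. A quick illustration using the default pools from this thread (output shape is from this era of Ceph; treat it as illustrative):

    [root@essperf3 Ceph]# ceph osd lspools
    0 data,1 metadata,2 rbd,
    # metadata pool id is 1, data pool id is 0, hence:
    ceph mds newfs 1 0 --yes-i-really-mean-it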
Re: [ceph-users] Firefly OSDs stuck in creating state forever
Is there a header or first line that appears in all ceph-mon stack dumps that I can search for? The couple of ceph-mon stack dumps I've seen in web searches appear to all begin with "ceph version 0.xx", but those are from over a year ago. Is that still the case with 0.81 firefly code?

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 10:09 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Firefly OSDs stuck in creating state forever

Okay, looks like the mon went down then. Was there a stack trace in the log after the daemon crashed? (Or did the daemon stay up but go unresponsive or something?)
Thanks!
sage

On Mon, 4 Aug 2014, Bruce McFarland wrote:
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 09:59:07.145776 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:59:37.146043 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 10:00:07.146288 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 10:00:37.146543 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault

209.243.160.35 - monitor
209.243.160.51 - osd.0
209.243.160.52 - osd.3
209.243.160.59 - osd.2

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Sunday, August 03, 2014 11:15 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Sun, 3 Aug 2014, Bruce McFarland wrote:
Is there a recommended way to take everything down and restart the process? I was considering starting completely from scratch, i.e. OS reinstall and then using ceph-deploy as before.

If you're using ceph-deploy, then
ceph-deploy purge HOST
ceph-deploy purgedata HOST
will do it. Then remove the ceph.* (config and keyring) files from the current directory.

I've learned a lot and want to figure out a foolproof way I can document for others in our lab to bring up a cluster on new HW. I learn a lot more when I break things and have to figure out what went wrong, so it's a little frustrating, but I've found out a lot about verifying the configuration and debug options so far. My intent is to investigate rbd usage, perf, and configuration options. The endless loop I'm referring to is a constant stream of fault messages that I'm not yet familiar enough with to interpret. I have let them run to see if the cluster recovers, but ceph-mon always crashed. I'll look for the crash dump and save it, since kdump should be enabled on the monitor box.

Do you have one of the messages handy? I'm curious whether it is an OSD or a mon.
Thanks!
sage

Thanks for the feedback.
On Aug 3, 2014, at 8:30 AM, Sage Weil sw...@redhat.com wrote:
Hi Bruce,
On Sun, 3 Aug 2014, Bruce McFarland wrote:
Yes, I looked at tcpdump on each of the OSDs and saw communications between all 3 OSDs before I sent my first question to this list. When I disabled selinux on the one offending server based on your feedback (typically we have this disabled on lab systems that are only on the lab net), the 10 PGs in my test pool all went to 'active+clean' almost immediately. Unfortunately the 3 default pools still remain in the creating states and are not health_ok. The OSDs all stayed UP/IN after the selinux change for the rest of the day, until I made the mistake of creating an RBD image on demo-pool and its 10 'active+clean' PGs. I created the rbd, but when I attempted to look at it with 'rbd info' the cluster went into an endless loop trying to read a placement group, a loop that I left running overnight. This morning

What do you mean by went into an endless loop?

ceph-mon was crashed again. I'll probably start all over from scratch once again on Monday.

Was there a stack dump in the mon log? It is possible that there is a bug with pool creation that surfaced by having selinux in place for so long, but otherwise this scenario doesn't make much sense to me. :/ Very interested in hearing more, and/or whether you can reproduce it.
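To answer the "what do I search for" question for readers of the archive: as Sage confirms later in the thread, crash dumps in the mon log still begin with the "ceph version ..." banner, so a grep with trailing context finds them. A sketch, assuming the default log location (adjust the path to your setup):

    # print the version banner plus the 30 lines that follow it
    grep -n -A 30 'ceph version' /var/log/ceph/ceph-mon.*.log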
Re: [ceph-users] Firefly OSDs stuck in creating state forever
I couldn't find the ceph-mon stack dump in the log; all greps for 'ceph version' weren't followed by a stack trace.

Executed ceph-deploy purge/purgedata on the monitor and OSDs. NOTE: I had to manually go to the individual OSD shells and remove /var/lib/ceph after umounting the ceph/xfs device. That running purgedata from the monitor always failed for OSDs whose daemons were still running initially confused me, though a still-mounted filesystem wouldn't have. Executing 'ceph-deploy purge' from the monitor succeeded on all of the OSDs.

Ran ceph-deploy new/install/mon create/gatherkeys/osd create on the cluster (I haven't tried using create-initial yet for the monitor, but will use it on my next install). Modified ceph.conf with:
- a private cluster network for each OSD
- osd pool default pg/pgp num
- osd pool default size / default min size
- osd min down reporters
AND, because it's not costing me anything (that I know of yet) and seems to be the first thing requested on problems:
- debug osd = 20
- debug ms = 1
(a sketch of the resulting ceph.conf follows at the end of this thread)

Started ceph-osd on all 3 OSD servers and restarted ceph-mon (service ceph restart) on the monitor. As experienced and reported by Brian, my cluster came up in the HEALTH_OK state immediately, with all 192 PGs in the default pools 'active+clean'. It took a week or 2 longer than I would have liked, but I am now quite comfortable with install/reinstall and how to inspect all components of the system state. XFS is mounted on each OSD data device; using 'ceph-disk list' I get the partition # for the journal on the SSD, which I can then check/dump with sgdisk and observe the 'ceph journal' partition name.

[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_OK
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e8: 3 osds: 3 up, 3 in
      pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
            10106 MB used, 1148 GB / 1158 GB avail
                 192 active+clean
[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   up      1
[root@essperf3 Ceph]#

I'm now moving on to creating RBD image(s) and looking at 'rbd bench-write'. I have some quick questions:
- Are there any other benchmarks in wide use for Ceph clusters?
- Our next lab deployment is going to be more real-world and involve many HDDs (~24) per OSD chassis (2 or 3 chassis). What is the general recommendation on the number of HDDs per OSD? 1 drive per OSD, where the drive can be an LVM or MD virtual drive spanning multiple HDDs (SW RAID 0)?
- Partitioning of the journal SSDs for multiple OSDs: we can use 1 SSD per OSD for the journal and have 4-HDD RAID 0 devices (~13 TB per OSD), or smaller OSDs and multiple journals on each SSD. What is the recommended configuration? (This will most likely be further investigated as we move forward with benchmarking, but I would like the RH/Ceph recommended best practices.)
- As long as I maintain 1 GB RAM per 1 TB of rotational storage, can we have many OSDs per physical chassis? Limits?

Thank you very much for all of your help.
Bruce

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 12:25 PM
To: Bruce McFarland
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

On Mon, 4 Aug 2014, Bruce McFarland wrote:
Is there a header or first line that appears in all ceph-mon stack dumps I can search for?
The couple of ceph-mon stack dumps I've seen in web searches appear to all begin with "ceph version 0.xx", but those are from over a year ago. Is that still the case with 0.81 firefly code?

Yep! Here's a recentish dump: http://tracker.ceph.com/issues/8880
sage

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Monday, August 04, 2014 10:09 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Firefly OSDs stuck in creating state forever

Okay, looks like the mon went down then. Was there a stack trace in the log after the daemon crashed? (Or did the daemon stay up but go unresponsive or something?)
Thanks!
sage

On Mon, 4 Aug 2014, Bruce McFarland wrote:
2014-08-04 09:57:37.144649 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700 0 -- 209.243.160.35:0/1032499 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0
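The ceph.conf changes Bruce describes above would look something like the sketch below (section placement follows the firefly-era docs; the subnets are the example networks from this thread, and all values are illustrative rather than recommendations):

    [global]
    public network = 209.243.160.0/24
    cluster network = 10.10.50.0/24
    osd pool default size = 2
    osd pool default min size = 1
    osd pool default pg num = 64
    osd pool default pgp num = 64

    [mon]
    mon osd min down reporters = 2

    [osd]
    debug osd = 20
    debug ms = 1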
[ceph-users] OSD daemon code in /var/lib/ceph/osd/ceph-2/ dissapears after creating pool/rbd -
This is going to sound odd, and if I hadn't been issuing all commands on the monitor I would swear I had issued 'rm -rf' from the shell of the OSD in the /var/lib/ceph/osd/ceph-2/ directory. After creating the pool/rbd and getting an error from 'rbd info' I saw an OSD down/out, so I went to its shell, and the contents of the OSD's data directory are gone. I'll assume I erased it, but how do I recover this cluster without doing a purge/purgedata reinstall?

I brought up a new cluster. All PGs are 'active+clean' and all 3 OSDs are UP/IN.

[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_OK
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e8: 3 osds: 3 up, 3 in
      pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
            10106 MB used, 1148 GB / 1158 GB avail
                 192 active+clean
[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   up      1
[root@essperf3 Ceph]#

Next I created a test pool and a 1 GB rbd and listed it:

[root@essperf3 Ceph]# ceph osd pool create testpool 75 75
pool 'testpool' created
[root@essperf3 Ceph]# ceph osd lspools
0 data,1 metadata,2 rbd,3 testpool,
[root@essperf3 Ceph]# rbd create testimage --size 1024 --pool testpool
[root@essperf3 Ceph]# rbd ls testpool
testimage
[root@essperf3 Ceph]#

When I look at the 'info' output I start seeing problems.

[root@essperf3 Ceph]# rbd --image testimage info
rbd: error opening image testimage: (2) No such file or directory
2014-08-04 18:39:33.602263 7fc4b9e80760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
[root@essperf3 Ceph]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    693G     683G      10073M       1.42
POOLS:
    NAME         ID     USED     %USED     OBJECTS
    data         0      0        0         0
    metadata     1      0        0         0
    rbd          2      0        0         0
    testpool     3      137      0         2
[root@essperf3 Ceph]# ceph -s
    cluster 32c48975-bb57-47f6-8138-e152452e3bbe
     health HEALTH_WARN 267 pgs degraded; 100 pgs stuck unclean; recovery 2/6 objects degraded (33.333%)
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     osdmap e21: 3 osds: 2 up, 2 in
      pgmap v48: 267 pgs, 4 pools, 137 bytes data, 2 objects
            10073 MB used, 683 GB / 693 GB avail
            2/6 objects degraded (33.333%)
                 267 active+degraded
  client io 17 B/s rd, 0 op/s
[root@essperf3 Ceph]#

Check to see which OSD is down:

[root@essperf3 Ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      1.13    root default
-2      0.45            host ess51
0       0.45                    osd.0   up      1
-3      0.23            host ess52
1       0.23                    osd.1   up      1
-4      0.45            host ess59
2       0.45                    osd.2   down    0
[root@essperf3 Ceph]#

Then I go to the shell on ess59 and restart the OSD. (This is where it gets rather odd.) My ceph.conf has debug osd = 20 and debug ms = 1, so I expect to see output from '/etc/init.d/ceph restart osd', but I see nothing. With a little digging I see that the /var/lib/ceph/osd/ceph-2/ directory is EMPTY. There is no ceph-osd data there at all. It's almost like I did a 'rm -rf' on that directory from the shell of ess59/osd.2, yet all commands have been executed on the monitor.
[root@ess59 ceph]# ip addr | grep .59
    inet 10.10.40.59/24 brd 10.10.40.255 scope global em1
    inet6 fe80::92b1:1cff:fe18:659f/64 scope link
    inet 209.243.160.59/24 brd 209.243.160.255 scope global em2
    inet 10.10.50.59/24 brd 10.10.50.255 scope global p6p2
[root@ess59 ceph]# ll /var/lib/ceph/osd/
total 4
drwxr-xr-x 2 root root 4096 Aug  4 14:46 ceph-2
[root@ess59 ceph]# ll /var/lib/ceph/
total 24
drwxr-xr-x 2 root root 4096 Jul 29 18:36 bootstrap-mds
drwxr-xr-x 2 root root 4096 Aug  4 14:23 bootstrap-osd
drwxr-xr-x 2 root root 4096 Jul 29 18:36 mds
drwxr-xr-x 2 root root 4096 Jul 29 18:36 mon
drwxr-xr-x 3 root root 4096 Aug  4 14:46 osd
drwxr-xr-x 2 root root 4096 Aug  4 18:14 tmp
[root@ess59 ceph]# ll /var/lib/ceph/osd/ceph-2/
total 0
[root@ess59 ceph]#

Looking at the monitor logs I see osd.2 boot and even see where osd.2 leaves the cluster,
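Two notes for readers of the archive. First, the 'rbd info' failure above may simply be a missing --pool argument: rbd defaults to the 'rbd' pool, so an image created in testpool won't be found without it, e.g.:

    rbd --pool testpool --image testimage info

Second, recovering a single lost OSD shouldn't require a full cluster purge. A rough sketch using the standard remove/re-add procedure (ids are from this thread; the data/journal devices are placeholders to fill in for your host):

    ceph osd out 2
    ceph osd crush remove osd.2
    ceph auth del osd.2
    ceph osd rm 2
    # then re-create the OSD from the admin node
    ceph-deploy osd create ess59:/dev/sdX:/dev/md0pY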
Re: [ceph-users] Firefly OSDs stuck in creating state forever
MDS: I assumed that I'd need to bring up a ceph-mds for my cluster at initial bringup. We also intended to modify the CRUSH map so that its pool is resident on SSD(s). It is one of the areas of the online docs where there doesn't seem to be a lot of info, and I haven't spent a lot of time researching it. I'll stop it.

OSD connectivity: The connectivity is good for both 1GE and 10GE. I thought moving to 10GE with nothing else on that net might help with group placement etc. and bring up the PGs quicker. I've checked 'tcpdump' output on all boxes.

Firewall: Thanks for that one - it's the kind of basic thing I overlooked in my ceph learning curve. One of the OSDs had selinux=enforcing; all others were disabled. After changing that box, the 10 PGs in my demo-pool (I kept the PG count very small for sanity) are now 'active+clean'. The PGs for the default pools - data, metadata, rbd - are still stuck in creating+peering or creating+incomplete. I did have to manually set 'osd pool default min size = 1' from its default of 2 for these 3 pools to eliminate a bunch of warnings in the 'ceph health detail' output. I'm adding the [mon] setting you suggested below, stopping ceph-mds, and bringing everything up now.

[root@essperf3 Ceph]# ceph -s
    cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
     health HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192 pgs stuck unclean; 28 requests are blocked 32 sec; nodown,noscrub flag(s) set
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
     osdmap e752: 3 osds: 3 up, 3 in
            flags nodown,noscrub
      pgmap v1483: 202 pgs, 4 pools, 0 bytes data, 0 objects
            134 MB used, 1158 GB / 1158 GB avail
                  96 creating+peering
                  10 active+clean
                  96 creating+incomplete
[root@essperf3 Ceph]#

From: Brian Rak [mailto:b...@gameservers.com]
Sent: Friday, August 01, 2014 2:54 PM
To: Bruce McFarland; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever

Why do you have an MDS active? I'd suggest getting rid of that, at least until you have everything else working.

I see you've set nodown on the OSDs; did you have problems with the OSDs flapping? Do the OSDs have broken connectivity between themselves? Do you have some kind of firewall interfering here? I've seen odd issues when the OSDs have broken private networking; you'll get one OSD marking all the other ones down. Adding this to my config helped:

[mon]
mon osd min down reporters = 2

On 8/1/2014 5:41 PM, Bruce McFarland wrote:
Hello,
I've run out of ideas and assume I've overlooked something very basic. I've created 2 ceph clusters in the last 2 weeks with different OSD HW and private network fabrics - 1GE and 10GE. I have never been able to get the OSDs to come up to the 'active+clean' state. I have followed your online documentation, and at this point the only thing I don't think I've done is modify the CRUSH map (although I have been looking into that). These are new clusters with no data and only 1 HDD and 1 SSD per OSD (24 2.5 GHz cores with 64 GB RAM). Since the disks are being recycled, is there something I need to flag to let ceph just create its mappings but not scrub for data compatibility? I've tried setting the noscrub flag to no effect. I also have constant OSD flapping. I've set nodown, but assume that is just masking a problem that is still occurring. Besides never reaching the 'active+clean' state, ceph-mon always crashes after being left running overnight.
The OSDs all eventually fill /root with ceph logs, so I regularly have to bring everything down, delete logs, and restart. I have all sorts of output from ceph.conf: OSD boot output with 'debug osd = 20' and 'debug ms = 1', 'ceph -w' output, and pretty much all of the debug/monitoring suggestions from the online docs and 2 weeks of google searches through blogs, mailing lists, etc.

[root@essperf3 Ceph]# ceph -v
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
[root@essperf3 Ceph]# ceph -s
    cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
     health HEALTH_WARN 96 pgs incomplete; 106 pgs peering; 202 pgs stuck inactive; 202 pgs stuck unclean; nodown,noscrub flag(s) set
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
     osdmap e752: 3 osds: 3 up, 3 in
            flags nodown,noscrub
      pgmap v1476: 202 pgs, 4 pools, 0 bytes data, 0 objects
            134 MB used, 1158 GB / 1158 GB avail
                 106 creating+peering
                  96 creating+incomplete
[root@essperf3 Ceph]#

Suggestions?
Thanks,
Bruce
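Since selinux and iptables caused most of the grief in this thread, here is a quick checklist for CentOS 6-era mon/OSD hosts (standard commands, but verify against your distro):

    # selinux: check and disable on every node
    getenforce          # should print Permissive or Disabled
    setenforce 0        # immediate but not persistent
    # for persistence, set SELINUX=disabled in /etc/selinux/config and reboot

    # iptables: check and stop
    service iptables status
    service iptables stop
    chkconfig iptables off

    # if you keep a firewall, open the mon port (6789/tcp) and the
    # OSD port range (6800-7300/tcp) between all cluster hosts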