Hello,
We at Nokia are validating bluestore on a 3-node cluster with EC 2+1. While upgrading the cluster from Kraken 11.0.2 to 11.1.1 with bluestore, more than half of the OSDs went down.

~~~
$ ceph -s
    cluster cb55baa8-d5a5-442e-9aae-3fd83553824e
     health HEALTH_ERR
            792 pgs are stuck inactive for more than 300 seconds
            792 pgs stale
            792 pgs stuck stale
            8/12 in osds are down
     monmap e2: 3 mons at {PL0-CN1=10.50.5.16:6789/0,PL0-CN2=10.50.5.17:6789/0,PL0-CN3=10.50.5.18:6789/0}
            election epoch 28, quorum 0,1,2 PL0-CN1,PL0-CN2,PL0-CN3
        mgr active: PL0-CN2 standbys: PL0-CN1, PL0-CN3
     osdmap e191: 15 osds: 4 up, 12 in; 856 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v508: 1088 pgs, 2 pools, 0 bytes data, 0 objects
            157 MB used, 33531 GB / 33531 GB avail
                 792 stale+active+clean
                 296 active+clean
~~~

OSD logs:
~~~
2017-01-11 12:03:38.740504 7f7741b13940  0 pidfile_write: ignore empty --pid-file
2017-01-11 12:03:38.758541 7f7741b13940 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-01-11 12:03:38.767324 7f7741b13940  0 load: jerasure load: lrc load: isa
2017-01-11 12:03:38.767791 7f7741b13940  1 bluestore(/var/lib/ceph/osd/ceph-5) mount path /var/lib/ceph/osd/ceph-5
2017-01-11 12:03:38.769697 7f7741b13940  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-5/block.db size 65536 kB
2017-01-11 12:03:38.770443 7f7741b13940  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-5/block size 2794 GB
2017-01-11 12:03:38.770961 7f7741b13940  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-5/block.wal size 128 MB
2017-01-11 12:03:38.771176 7f7741b13940  1 bluefs mount
2017-01-11 12:03:38.790311 7f7741b13940  0 set rocksdb option compression = kNoCompression
2017-01-11 12:03:38.790320 7f7741b13940  0 set rocksdb option max_write_buffer_number = 4
2017-01-11 12:03:38.790323 7f7741b13940  0 set rocksdb option min_write_buffer_number_to_merge = 1
2017-01-11 12:03:38.790328 7f7741b13940  0 set rocksdb option recycle_log_file_num = 4
2017-01-11 12:03:38.790332 7f7741b13940  0 set rocksdb option write_buffer_size = 268435456
2017-01-11 12:03:38.790354 7f7741b13940  0 set rocksdb option compression = kNoCompression
2017-01-11 12:03:38.790356 7f7741b13940  0 set rocksdb option max_write_buffer_number = 4
2017-01-11 12:03:38.790358 7f7741b13940  0 set rocksdb option min_write_buffer_number_to_merge = 1
2017-01-11 12:03:38.790360 7f7741b13940  0 set rocksdb option recycle_log_file_num = 4
2017-01-11 12:03:38.790362 7f7741b13940  0 set rocksdb option write_buffer_size = 268435456
2017-01-11 12:03:38.790493 7f7741b13940  4 rocksdb: RocksDB version: 5.0.0
<snip>
2017-01-11 12:03:38.839442 7f7741b13940  4 rocksdb: DB pointer 0x7f774cef0b00
2017-01-11 12:03:38.839470 7f7741b13940  1 bluestore(/var/lib/ceph/osd/ceph-5) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456
2017-01-11 12:03:38.840407 7f7741b13940 -1 bluestore(/var/lib/ceph/osd/ceph-5) warning: bluestore_min_alloc_size 65536 > min_min_alloc_size 4096, may impact performance.
2017-01-11 12:03:38.840429 7f7741b13940  1 freelist init
2017-01-11 12:03:39.339645 7f7741b13940 -1 osd.5 0 OSD::init() : unable to read osd superblock
2017-01-11 12:03:39.339659 7f7741b13940  1 bluestore(/var/lib/ceph/osd/ceph-5) umount
2017-01-11 12:03:39.454719 7f7741b13940  1 freelist shutdown
2017-01-11 12:03:39.454960 7f7741b13940  1 bluefs umount
2017-01-11 12:03:40.278242 7f7741b13940 -1 ** ERROR: osd init failed: (22) Invalid argument
2017-01-11 12:04:00.487006 7f9c3bebc940 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-01-11 12:04:00.487011 7f9c3bebc940  0 set uid:gid to 167:167 (ceph:ceph)
2017-01-11 12:04:00.487021 7f9c3bebc940  0 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755), process ceph-osd, pid 11983
2017-01-11 12:04:00.487058 7f9c3bebc940 -1 WARNING: experimental feature 'bluestore' is enabled
~~~

Our findings:
1) A cluster installed from scratch with 11.1.1 works fine; the problem appears only on upgrade.
2) The OSDs are not getting activated after the upgrade, which causes the ceph-osd process to fail to read the osd superblock.

Please provide suggestions/feedback to unblock this issue.

Thanks,
Jayaram
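P.S. If more verbose logs would help with triage, we can bump debug levels on one of the failing OSDs and retry the start. A minimal sketch of the ceph.conf overrides we would use (the levels below are only examples, not tuned values):

~~~
# On the affected node, add to /etc/ceph/ceph.conf, then restart one failing OSD
# (e.g. "systemctl restart ceph-osd@5") and collect /var/log/ceph/ceph-osd.5.log
[osd]
    debug osd = 20
    debug bluestore = 30
    debug bluefs = 20
    debug rocksdb = 10
    debug bdev = 20
~~~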