[ceph-users] slow 4k writes, Luminous with bluestore backend
Hi All,
I upgraded my cluster from Hammer to Jewel and then to Luminous, and changed from the filestore to the bluestore backend. On a KVM VM with 4 CPUs / 2 GB RAM I attached a 20 GB rbd volume as vdc and ran the following test:

dd if=/dev/zero of=/dev/vdc bs=4k count=1000 oflag=direct
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 3.08965 s, *1.3 MB/s*

It consistently gives 1.3 MB/s, which I feel is too low. I have 3 ceph OSD nodes, each with 24 x 15k RPM disks, a replication of 2, connected over 2x10G LACP-bonded NICs with an MTU of 9100.

Rados bench results:

rados bench -p volumes 4 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 4 seconds or 0 objects
Object prefix: benchmark_data_ceph3.sapiennetworks.com_820994
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -          0
    1      16       276       260   1039.98      1040   0.0165053  0.0381299
    2      16       545       529   1057.92      1076    0.043151  0.0580376
    3      16       847       831   1107.91      1208   0.0394811  0.0567684
    4      16      1160      1144    1143.9      1252     0.63265  0.0541888
Total time run:         4.099801
Total writes made:      1161
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     1132.74
Stddev Bandwidth:       101.98
Max bandwidth (MB/sec): 1252
Min bandwidth (MB/sec): 1040
Average IOPS:           283
Stddev IOPS:            25
Max IOPS:               313
Min IOPS:               260
Average Latency(s):     0.0560897
Stddev Latency(s):      0.107352
Max latency(s):         1.02123
Min latency(s):         0.00920514
Cleaning up (deleting benchmark objects)
Removed 1161 objects
Clean up completed and total clean up time: 0.079850

After upgrading to Luminous I executed:

ceph osd crush tunables optimal

ceph.conf:

[global]
fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1
max_open_files = 131072
debug_default = False

[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100
rgw_keystone_url = http://192.168.1.3:35357
rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
rgw_init_timeout = 36
host = controller2
rgw_dns_name = *.sapiennetworks.com
rgw_print_continue = True
rgw_keystone_token_cache_size = 10
rgw_data = /var/lib/ceph/radosgw
user = www-data

[osd]
journal_queue_max_ops = 3000
objecter_inflight_ops = 10240
journal_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
osd_mkfs_type = xfs
osd_mount_options_xfs = rw,relatime,inode64,logbsize=256k,allocsize=4M
osd_op_threads = 20
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
objecter_infilght_op_bytes = 1048576000
filestore_queue_max_bytes = 1048576000
filestore_max_sync_interval = 10
journal_max_write_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000

ceph -s
  cluster:
    id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 3 daemons, quorum controller3,controller2,controller1
    mgr: controller1(active)
    osd: 72 osds: 72 up, 72 in
    rgw: 1 daemon active

  data:
    pools:   5 pools, 6240 pgs
    objects: 12732 objects, 72319 MB
    usage:   229 GB used, 39965 GB / 40195 GB avail
    pgs:     6240 active+clean

Can someone suggest a way to improve this?
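[Editor's note: the arithmetic below is not from the post.] The dd figure is latency-bound rather than bandwidth-bound: with oflag=direct and a single thread, each 4k write must round-trip through the primary OSD and its replica before dd issues the next one, so 1.3 MB/s translates directly into a per-write latency:

```python
# Back-of-envelope check: 1.3 MB/s of single-threaded 4k direct writes
# is a latency limit, not a bandwidth limit.
block_size = 4096                 # bytes per dd write
throughput = 1.3 * 1000 * 1000    # ~1.3 MB/s as reported by dd

iops = throughput / block_size    # single-threaded write IOPS
latency_ms = 1000.0 / iops        # average time per 4k sync write

print(f"~{iops:.0f} IOPS, ~{latency_ms:.1f} ms per 4k sync write")
```

Roughly 3 ms per replicated sync write is plausible for a cluster like this, which is why rados bench (16 concurrent 4 MB writes, a bandwidth test) can show over 1100 MB/s while single-threaded 4k dd shows 1.3 MB/s.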
Thanks,
Kevin
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph status doesn't show available and used disk space after upgrade
It was a firewall issue on the controller nodes. After allowing the ceph-mgr port in iptables, everything is displaying correctly. Thanks to people on IRC.

Thanks a lot,
Kevin

On Thu, Dec 21, 2017 at 5:24 PM, kevin parrikar wrote:
> accidentally removed mailing list email
>
> ++ceph-users
>
> Thanks a lot JC for looking into this issue. I am really out of ideas.
>
> ceph.conf on the mgr node, which is also a monitor node:
>
> [global]
> fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
> mon_initial_members = node-16 node-30 node-31
> mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.16.1.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.16.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> mon allow pool delete = true
>
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 100
> rgw_keystone_url = http://192.168.1.3:35357
> rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
> rgw_init_timeout = 36
> host = controller3
> rgw_dns_name = *.sapiennetworks.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
>
> ceph auth list:
>
> osd.100
>         key: AQAtZjpaVZOFBxAAwl0yFLdUOidLzPFjv+HnjA==
>         caps: [mgr] allow profile osd
>         caps: [mon] allow profile osd
>         caps: [osd] allow *
> osd.101
>         key: AQA4ZjpaS4wwGBAABwgoXQRc1J8sav4MUkWceQ==
>         caps: [mgr] allow profile osd
>         caps: [mon] allow profile osd
>         caps: [osd] allow *
> osd.102
>         key: AQBDZjpaBS2tEBAAtFiPKBzh8JGi8Nh3PtAGCg==
>         caps: [mgr] allow profile osd
>         caps: [mon] allow profile osd
>         caps: [osd] allow *
> client.admin
>         key: AQD0yXFYflnYFxAAEz/2XLHO/6RiRXQ5HXRAnw==
>         caps: [mds] allow *
>         caps: [mgr] allow *
>         caps: [mon] allow *
>         caps: [osd] allow *
> client.backups
>         key: AQC0y3FY4YQNNhAAs5fludq0yvtp/JJt7RT4HA==
>         caps: [mgr] allow r
>         caps: [mon] allow r
>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=backups, allow rwx pool=volumes
> client.bootstrap-mds
>         key: AQD5yXFYyIxiFxAAyoqLPnxxqWmUr+zz7S+qVQ==
>         caps: [mgr] allow r
>         caps: [mon] allow profile bootstrap-mds
> client.bootstrap-mgr
>         key: AQBmOTpaXqHQDhAAyDXoxlPmG9QovfmmUd8gIg==
>         caps: [mon] allow profile bootstrap-mgr
> client.bootstrap-osd
>         key: AQD0yXFYuGkSIhAAelSb3TCPuXRFoFJTBh7Vdg==
>         caps: [mgr] allow r
>         caps: [mon] allow profile bootstrap-osd
> client.bootstrap-rbd
>         key: AQBnOTpafDS/IRAAnKzuI9AYEF81/6mDVv0QgQ==
>         caps: [mon] allow profile bootstrap-rbd
> client.bootstrap-rgw
>         key: AQD3yXFYxt1mLRAArxOgRvWmmzT9pmsqTLpXKw==
>         caps: [mgr] allow r
>         caps: [mon] allow profile bootstrap-rgw
> client.compute
>         key: AQCbynFYRcNWOBAAPzdAKfP21GvGz1VoHBimGQ==
>         caps: [mgr] allow r
>         caps: [mon] allow r
>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images, allow rwx pool=compute
> client.images
>         key: AQCyy3FYSMtlJRAAbJ8/U/R82NXvWBC5LmkPGw==
>         caps: [mgr] allow r
>         caps: [mon] allow r
>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images
> client.radosgw.gateway
>         key: AQA3ynFYAYMSAxAApvfe/booa9KhigpKpLpUOA==
>         caps: [mgr] allow r
>         caps: [mon] allow rw
>         caps: [osd] allow rwx
> client.volumes
>         key: AQCzy3FYa3paKBAA9BlYpQ1PTeR770ghVv1jKQ==
>         caps: [mgr] allow r
>         caps: [mon] allow r
>         caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images
> mgr.controller2
>         key: AQAmVTpaA+9vBhAApD3rMs//Qri+SawjUF4U4Q==
>         caps: [mds] allow *
>         caps: [mgr] allow *
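[Editor's note: a sketch of the firewall fix described above, not the poster's exact commands.] ceph-mgr, like the OSDs, binds inside Ceph's default daemon port range (6800-7300), so the fix amounts to opening that range on the controller nodes; verify the range against your own ms_bind_port_min/max settings before applying:

```shell
# Allow the Ceph daemon port range (ceph-mgr binds in 6800-7300 by
# default) from the cluster/public network on the controller nodes.
iptables -I INPUT -p tcp -s 172.16.1.0/24 --dport 6800:7300 -j ACCEPT
# Monitors additionally listen on 6789/tcp:
iptables -I INPUT -p tcp -s 172.16.1.0/24 --dport 6789 -j ACCEPT
# Persist the rules (path varies by distribution):
iptables-save > /etc/iptables/rules.v4
```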
Re: [ceph-users] ceph status doesn't show available and used disk space after upgrade
accidentally removed mailing list email

++ceph-users

Thanks a lot JC for looking into this issue. I am really out of ideas.

ceph.conf on the mgr node, which is also a monitor node:

[global]
fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1
mon allow pool delete = true

[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100
rgw_keystone_url = http://192.168.1.3:35357
rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
rgw_init_timeout = 36
host = controller3
rgw_dns_name = *.sapiennetworks.com
rgw_print_continue = True
rgw_keystone_token_cache_size = 10
rgw_data = /var/lib/ceph/radosgw
user = www-data

ceph auth list:

osd.100
        key: AQAtZjpaVZOFBxAAwl0yFLdUOidLzPFjv+HnjA==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.101
        key: AQA4ZjpaS4wwGBAABwgoXQRc1J8sav4MUkWceQ==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.102
        key: AQBDZjpaBS2tEBAAtFiPKBzh8JGi8Nh3PtAGCg==
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQD0yXFYflnYFxAAEz/2XLHO/6RiRXQ5HXRAnw==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.backups
        key: AQC0y3FY4YQNNhAAs5fludq0yvtp/JJt7RT4HA==
        caps: [mgr] allow r
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=backups, allow rwx pool=volumes
client.bootstrap-mds
        key: AQD5yXFYyIxiFxAAyoqLPnxxqWmUr+zz7S+qVQ==
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: AQBmOTpaXqHQDhAAyDXoxlPmG9QovfmmUd8gIg==
        caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
        key: AQD0yXFYuGkSIhAAelSb3TCPuXRFoFJTBh7Vdg==
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
        key: AQBnOTpafDS/IRAAnKzuI9AYEF81/6mDVv0QgQ==
        caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rgw
        key: AQD3yXFYxt1mLRAArxOgRvWmmzT9pmsqTLpXKw==
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-rgw
client.compute
        key: AQCbynFYRcNWOBAAPzdAKfP21GvGz1VoHBimGQ==
        caps: [mgr] allow r
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images, allow rwx pool=compute
client.images
        key: AQCyy3FYSMtlJRAAbJ8/U/R82NXvWBC5LmkPGw==
        caps: [mgr] allow r
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=images
client.radosgw.gateway
        key: AQA3ynFYAYMSAxAApvfe/booa9KhigpKpLpUOA==
        caps: [mgr] allow r
        caps: [mon] allow rw
        caps: [osd] allow rwx
client.volumes
        key: AQCzy3FYa3paKBAA9BlYpQ1PTeR770ghVv1jKQ==
        caps: [mgr] allow r
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images
mgr.controller2
        key: AQAmVTpaA+9vBhAApD3rMs//Qri+SawjUF4U4Q==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
mgr.controller3
        key: AQByfDparprIEBAAj7Pxdr/87/v0kmJV49aKpQ==
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

Regards,
Kevin

On Thu, Dec 21, 2017 at 8:10 AM, kevin parrikar wrote:
> Thanks JC,
> I tried
>
> ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr 'allow *'
>
> but the status is still the same; also mgr.log is being flooded with the errors below:
>
> 2017-12-21 02:39:10.622834 7fb40a22b700  0 Cannot get stat of OSD 140
> 2017-12-21 02:39:10.622835 7fb40a22b700  0 Cannot get stat of OSD 141
>
> Not sure what's wrong in my setup.
>
> Regards,
> Kevin
>
> On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopez wrote:
>> Hi,
Re: [ceph-users] ceph status doesn't show available and used disk space after upgrade
Thanks JC,
I tried

ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr 'allow *'

but the status is still the same; also mgr.log is being flooded with the errors below:

2017-12-21 02:39:10.622834 7fb40a22b700  0 Cannot get stat of OSD 140
2017-12-21 02:39:10.622835 7fb40a22b700  0 Cannot get stat of OSD 141

Not sure what's wrong in my setup.

Regards,
Kevin

On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopez wrote:
> Hi,
>
> make sure the client.admin user has an MGR cap using ceph auth list. At some
> point there was a glitch with the update process that was not adding the
> MGR cap to the client.admin user.
>
> JC
>
> On Dec 20, 2017, at 10:02, kevin parrikar wrote:
>
> hi All,
> I have upgraded the cluster from Hammer to Jewel and then to Luminous.
>
> I am able to upload/download glance images, but ceph -s shows 0 kB used and
> available, and probably because of that cinder create is failing.
>
> ceph -s
>   cluster:
>     id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
>     health: HEALTH_WARN
>             Reduced data availability: 6176 pgs inactive
>             Degraded data redundancy: 6176 pgs unclean
>
>   services:
>     mon: 3 daemons, quorum controller3,controller2,controller1
>     mgr: controller3(active)
>     osd: 71 osds: 71 up, 71 in
>     rgw: 1 daemon active
>
>   data:
>     pools:   4 pools, 6176 pgs
>     objects: 0 objects, 0 bytes
>     usage:   0 kB used, 0 kB / 0 kB avail
>     pgs:     100.000% pgs unknown
>              6176 unknown
>
> I deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr
> create; it was successful, but for some reason ceph -s is not showing
> correct values.
> Can someone help me here please?
>
> Regards,
> Kevin
[ceph-users] ceph status doesn't show available and used disk space after upgrade
hi All,
I have upgraded the cluster from Hammer to Jewel and then to Luminous.

I am able to upload/download glance images, but ceph -s shows 0 kB used and available, and probably because of that cinder create is failing.

ceph -s
  cluster:
    id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
    health: HEALTH_WARN
            Reduced data availability: 6176 pgs inactive
            Degraded data redundancy: 6176 pgs unclean

  services:
    mon: 3 daemons, quorum controller3,controller2,controller1
    mgr: controller3(active)
    osd: 71 osds: 71 up, 71 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 6176 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             6176 unknown

I deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create; it was successful, but for some reason ceph -s is not showing correct values.
Can someone help me here please?

Regards,
Kevin
Re: [ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start
Thank you all for your suggestions. This is what I followed for the upgrade:

Hammer to Jewel:
apt-get dist-upgrade on each node separately.
stopped monitor process; stopped osd; changed ownership to ceph:ceph recursively for /var/lib/ceph/
restarted monitor process; restarted osd;
*ceph osd set require_jewel_osds;*
*ceph osd set sortbitwise;*
verified with ceph -s and rados bench

Jewel to Luminous:
apt-get dist-upgrade on each node.
stopped monitor process; stopped osd process;
restarted monitor; restarted osd process;
Result: osd is not coming up.

Steps tried to resolve: rebooted all nodes. The upgrade from Hammer to Jewel was almost smooth, but after Jewel to Luminous the OSD is not coming up. Any suggestions on where to check for a clue?

Regards,
Kev

On Wed, Sep 13, 2017 at 1:17 AM, Lincoln Bryant wrote:
> Did you set the sortbitwise flag, fix OSD ownership (or use the "setuser
> match path" option) and such after upgrading from Hammer to Jewel? I am not
> sure if that matters here, but it might help if you elaborate on your
> upgrade process a bit.
>
> --Lincoln
>
>> On Sep 12, 2017, at 2:22 PM, kevin parrikar wrote:
>>
>> Can someone please help me on this? I have no idea how to bring up the
>> cluster to an operational state.
>>
>> Thanks,
>> Kev
>>
>> On Tue, Sep 12, 2017 at 11:12 AM, kevin parrikar wrote:
>> hello All,
>> I am trying to upgrade a small test setup having one monitor and one osd
>> node which is on the hammer release.
>>
>> I updated from hammer to jewel using package update commands and things
>> were working.
>> However, after updating from Jewel to Luminous, I am facing issues with
>> the osd failing to start.
>>
>> upgraded packages on both nodes, and I can see "ceph mon versions" is
>> successful:
>>
>> ceph mon versions
>> {
>>     "ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)": 1
>> }
>>
>> but ceph osd versions returns an empty string:
>>
>> ceph osd versions
>> {}
>>
>> dpkg --list|grep ceph
>> ii  ceph           12.2.0-1trusty  amd64  distributed storage and file system
>> ii  ceph-base      12.2.0-1trusty  amd64  common ceph daemon libraries and management tools
>> ii  ceph-common    12.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
>> ii  ceph-deploy    1.5.38          all    Ceph-deploy is an easy to use configuration tool
>> ii  ceph-mgr       12.2.0-1trusty  amd64  manager for the ceph distributed storage system
>> ii  ceph-mon       12.2.0-1trusty  amd64  monitor server for the ceph storage system
>> ii  ceph-osd       12.2.0-1trusty  amd64  OSD server for the ceph storage system
>> ii  libcephfs1     10.2.9-1trusty  amd64  Ceph distributed file system client library
>> ii  libcephfs2     12.2.0-1trusty  amd64  Ceph distributed file system client library
>> ii  python-cephfs  12.2.0-1trusty  amd64  Python 2 libraries for the Ceph libcephfs library
>>
>> from OSD log:
>>
>> 2017-09-12 05:38:10.618023 7fc307a10d00  0 set uid:gid to 64045:64045 (ceph:ceph)
>> 2017-09-12 05:38:10.618618 7fc307a10d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 21513
>> 2017-09-12 05:38:10.624473 7fc307a10d00  0 pidfile_write: ignore empty --pid-file
>> 2017-09-12 05:38:10.633099 7fc307a10d00  0 load: jerasure load: lrc load: isa
>> 2017-09-12 05:38:10.633657 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
>> 2017-09-12 05:38:10.635164 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
>> 2017-09-12 05:38:10.637503 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2017-09-12 05:38:10.637833 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2017-09-12 05:38:10.637923 7fc307a10d00
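[Editor's note: a sketch, not from the thread.] The "setuser match path" option Lincoln mentions is a ceph.conf setting from the Infernalis/Jewel upgrade notes; it lets a daemon keep running as whatever user owns its data directory instead of failing the switch to ceph:ceph, as an alternative to the recursive chown:

```ini
; ceph.conf fragment: if the daemon's data dir is still owned by the
; old user (e.g. root), run the daemon as that owner rather than as
; ceph:ceph. Remove this once ownership has been migrated.
[osd]
setuser match path = /var/lib/ceph/$type/$cluster-$id
```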
Re: [ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start
Can someone please help me on this? I have no idea how to bring up the cluster to an operational state.

Thanks,
Kev

On Tue, Sep 12, 2017 at 11:12 AM, kevin parrikar wrote:
> hello All,
> I am trying to upgrade a small test setup having one monitor and one osd
> node which is on the hammer release.
>
> I updated from hammer to jewel using package update commands and things
> were working.
> However, after updating from Jewel to Luminous, I am facing issues with
> the osd failing to start.
>
> upgraded packages on both nodes, and I can see "ceph mon versions" is
> successful:
>
> *ceph mon versions*
> *{*
> *    "ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)": 1*
> *}*
>
> but ceph osd versions returns an empty string:
>
> *ceph osd versions*
> *{}*
>
> *dpkg --list|grep ceph*
> ii  ceph           12.2.0-1trusty  amd64  distributed storage and file system
> ii  ceph-base      12.2.0-1trusty  amd64  common ceph daemon libraries and management tools
> ii  ceph-common    12.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy    1.5.38          all    Ceph-deploy is an easy to use configuration tool
> ii  ceph-mgr       12.2.0-1trusty  amd64  manager for the ceph distributed storage system
> ii  ceph-mon       12.2.0-1trusty  amd64  monitor server for the ceph storage system
> ii  ceph-osd       12.2.0-1trusty  amd64  OSD server for the ceph storage system
> ii  libcephfs1     10.2.9-1trusty  amd64  Ceph distributed file system client library
> ii  libcephfs2     12.2.0-1trusty  amd64  Ceph distributed file system client library
> ii  python-cephfs  12.2.0-1trusty  amd64  Python 2 libraries for the Ceph libcephfs library
>
> *from OSD log:*
> 2017-09-12 05:38:10.618023 7fc307a10d00  0 set uid:gid to 64045:64045 (ceph:ceph)
> 2017-09-12 05:38:10.618618 7fc307a10d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 21513
> 2017-09-12 05:38:10.624473 7fc307a10d00  0 pidfile_write: ignore empty --pid-file
> 2017-09-12 05:38:10.633099 7fc307a10d00  0 load: jerasure load: lrc load: isa
> 2017-09-12 05:38:10.633657 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2017-09-12 05:38:10.635164 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2017-09-12 05:38:10.637503 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-09-12 05:38:10.637833 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2017-09-12 05:38:10.637923 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
> 2017-09-12 05:38:10.639047 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-09-12 05:38:10.639501 7fc307a10d00  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
> 2017-09-12 05:38:10.640417 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
> 2017-09-12 05:38:10.640842 7fc307a10d00  1 leveldb: Recovering log #102
> 2017-09-12 05:38:10.642690 7fc307a10d00  1 leveldb: Delete type=0 #102
> 2017-09-12 05:38:10.643128 7fc307a10d00  1 leveldb: Delete type=3 #101
> 2017-09-12 05:38:10.649616 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) mount(1758): enabling WRITEAHEAD journal mode: checkpoint is not enabled
> 2017-09-12 05:38:10.654071 7fc307a10d00 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2017-09-12 05:38:10.654590 7fc307a10d00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2017-09-12 05:38:10.655353 7fc307a10d00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2017-09-12 05:38:10.656985 7fc307a10d00  1 filestore(/var/lib/ceph/osd/ceph-0) upgrade(1365)
> 2017-09-12 05:38:10.657798 7fc30
[ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start
hello All,
I am trying to upgrade a small test setup having one monitor and one osd node which is on the hammer release.

I updated from hammer to jewel using package update commands and things were working.
However, after updating from Jewel to Luminous, I am facing issues with the osd failing to start.

upgraded packages on both nodes, and I can see "ceph mon versions" is successful:

*ceph mon versions*
*{*
*    "ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)": 1*
*}*

but ceph osd versions returns an empty string:

*ceph osd versions*
*{}*

*dpkg --list|grep ceph*
ii  ceph           12.2.0-1trusty  amd64  distributed storage and file system
ii  ceph-base      12.2.0-1trusty  amd64  common ceph daemon libraries and management tools
ii  ceph-common    12.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-deploy    1.5.38          all    Ceph-deploy is an easy to use configuration tool
ii  ceph-mgr       12.2.0-1trusty  amd64  manager for the ceph distributed storage system
ii  ceph-mon       12.2.0-1trusty  amd64  monitor server for the ceph storage system
ii  ceph-osd       12.2.0-1trusty  amd64  OSD server for the ceph storage system
ii  libcephfs1     10.2.9-1trusty  amd64  Ceph distributed file system client library
ii  libcephfs2     12.2.0-1trusty  amd64  Ceph distributed file system client library
ii  python-cephfs  12.2.0-1trusty  amd64  Python 2 libraries for the Ceph libcephfs library

*from OSD log:*
2017-09-12 05:38:10.618023 7fc307a10d00  0 set uid:gid to 64045:64045 (ceph:ceph)
2017-09-12 05:38:10.618618 7fc307a10d00  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 21513
2017-09-12 05:38:10.624473 7fc307a10d00  0 pidfile_write: ignore empty --pid-file
2017-09-12 05:38:10.633099 7fc307a10d00  0 load: jerasure load: lrc load: isa
2017-09-12 05:38:10.633657 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2017-09-12 05:38:10.635164 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2017-09-12 05:38:10.637503 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-09-12 05:38:10.637833 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-09-12 05:38:10.637923 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
2017-09-12 05:38:10.639047 7fc307a10d00  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-09-12 05:38:10.639501 7fc307a10d00  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2017-09-12 05:38:10.640417 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
2017-09-12 05:38:10.640842 7fc307a10d00  1 leveldb: Recovering log #102
2017-09-12 05:38:10.642690 7fc307a10d00  1 leveldb: Delete type=0 #102
2017-09-12 05:38:10.643128 7fc307a10d00  1 leveldb: Delete type=3 #101
2017-09-12 05:38:10.649616 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) mount(1758): enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-09-12 05:38:10.654071 7fc307a10d00 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2017-09-12 05:38:10.654590 7fc307a10d00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
2017-09-12 05:38:10.655353 7fc307a10d00  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 28: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
2017-09-12 05:38:10.656985 7fc307a10d00  1 filestore(/var/lib/ceph/osd/ceph-0) upgrade(1365)
2017-09-12 05:38:10.657798 7fc307a10d00  0 _get_class not permitted to load sdk
2017-09-12 05:38:10.658675 7fc307a10d00  0 _get_class not permitted to load lua
2017-09-12 05:38:10.658931 7fc307a10d00  0 /build/ceph-12.2.0/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2017-09-12 05:38:10.659320 7fc307a10d00  0 /build/ceph-12.2.0/src/cls/hello/cls_hello.cc:296: loading cls_hello
2017-09-12 05:38:10.662854 7fc307a10d00  0 _get_class not permitted to load kvs
2017-09-12 05:38:10.663621 7fc307a10d00 -1 osd.0 0 failed to load OSD map for epoch 32, got 0 bytes
2017-09-12 05:38:10.70 7fc307a10d00 -1 /build/ceph-12.2.0/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fc307a10d00 time 2017-09-12 05:38:10
[ceph-users] 50 OSD on 10 nodes vs 50 osd on 50 nodes
hello All,
I have 50 compute nodes in my environment which are running virtual machines. I can add one more 10k RPM SAS disk and 1x10G interface to each server, and thus there would be 50 OSDs running on 50 compute nodes. It's not easy to obtain more servers for running Ceph, nor to take away servers from the existing pool.

*Question:*
Will there be a performance impact running 1 OSD process each on 50 servers *vs* 10 OSD processes each on 5 servers?

I asked some of the guys who have worked on ceph at large scale before, and they all say I should run more OSD processes per node to get better performance than with 1 OSD per node, but they are not able to justify it. Can someone please clarify the performance difference between these two?

Regards,
Kev
[ceph-users] ceph-jewel on docker+Kubernetes - crashing
hello All,

I am trying Ceph Jewel on Ubuntu 16.04 with Kubernetes 1.6.2 and Docker 1.11.2, but for some unknown reason the cluster never comes up: the pods keep crashing and all ceph commands fail.

From ceph-mon-check:

kubectl logs -n ceph ceph-mon-check-3190136794-21xg4 -f
subprocess.CalledProcessError: Command 'ceph --cluster=${CLUSTER} mon getmap > /tmp/monmap && monmaptool -f /tmp/monmap --print' returned non-zero exit status 1
2017-05-01 15:45:52  /entrypoint.sh: sleep 30 sec
2017-05-01 15:46:22  /entrypoint.sh: checking for zombie mons
2017-05-01 15:51:22.613476 7f0d3ea8c700  0 monclient(hunting): authenticate timed out after 300
2017-05-01 15:51:22.613561 7f0d3ea8c700  0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
Traceback (most recent call last):
  File "/check_zombie_mons.py", line 30, in
    current_mons = extract_mons_from_monmap()
  File "/check_zombie_mons.py", line 18, in extract_mons_from_monmap
    monmap = subprocess.check_output(monmap_command, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
    raise CalledProcessError(retcode, cmd, output=output)

All pods and nodes are able to resolve the service name "ceph-mon", and the ceph keys are present in all pods:

kubectl exec -n ceph ceph-mon-0 -- ls /etc/ceph/
ceph.client.admin.keyring  ceph.conf  ceph.mon.keyring

kubectl logs -n ceph ceph-mon-0 --tail=20
2017-05-01 16:08:44.081462 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:45.158398 7fcdf1595700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d603f1980).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.158328 7fcdf0f8f700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d602eac00).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.745314 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:46.081824 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:47.745473 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:48.081962 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:49.745526 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:50.081979 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:51.746027 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:52.082151 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:53.745586 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:54.082630 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:55.158549 7fcdf0b8b700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608ff900).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.158621 7fcdf1191700  0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608fd500).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.745867 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:56.082868 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:57.686779 7fcdf3e9b700  0 mon.ceph-mon-0@-1(probing).data_health(0) update_stats avail 93% total 237 GB, used 4398 MB, avail 221 GB
2017-05-01 16:08:57.746175 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:58.083616 7fcdf369a700  1 mon.ceph-mon-0@-1(probing) e0  adding peer 192.168.14.239:6789/0 to list of hints

kubectl get po -n ceph
NAME                              READY  STATUS            RESTARTS  AGE
ceph-mds-722237312-35l5k          0/1    CrashLoopBackOff  324       1d
ceph-mon-0                        1/1    Running           0         1d
ceph-mon-1                        1/1    Running           0         1d
ceph-mon-2                        1/1    Running           0         1d
ceph-mon-check-3190136794-21xg4   1/1    Running           0         1d
ceph-osd-bvz3h                    0/1    CrashLoopBackOff  409       1d
ceph-osd-hq50d                    0/1    Running           408       1d
ceph-osd-ljdwh                    0/1    CrashLoopBackOff  409       1d

kubectl logs -n ceph
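The traceback above comes from check_zombie_mons.py calling subprocess.check_output with shell=True, which raises a bare CalledProcessError as soon as the pipeline exits non-zero, hiding the actual ceph error. A minimal sketch of a wrapper that captures the output instead (the ceph/monmaptool pipeline shown in the comment is taken from the log above; the stand-in echo command is only there so the snippet runs without a cluster):

```python
import subprocess

def run_pipeline(cmd):
    """Run a shell pipeline, returning (exit_code, combined_output).

    Unlike subprocess.check_output, this never raises, so a mon-check loop
    can log what actually went wrong instead of dying on CalledProcessError.
    """
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode(errors="replace")

# The real check (from the log above) would be:
#   run_pipeline("ceph --cluster=${CLUSTER} mon getmap > /tmp/monmap"
#                " && monmaptool -f /tmp/monmap --print")
# Stand-in so this sketch is runnable anywhere:
code, out = run_pipeline("echo monmap-ok")
print(code, out.strip())
```

Running the same pipeline by hand inside a mon pod (with kubectl exec) should show whether it is the cephx authentication timeout above, rather than the monmap parsing, that makes the check fail.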
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Wow, that's a lot of good information. I wish I had known all this before investing in these devices. Since I don't have any other option, I will get better SSDs and faster HDDs.

I have one more generic question about Ceph: to increase the throughput of a cluster, what is the standard practice, more OSDs "per" node or more OSD "nodes"?

Thanks a lot for all your help. Learned so many new things, thanks again.

Kevin

On Sat, Jan 7, 2017 at 7:33 PM, Lionel Bouton <lionel-subscript...@bouton.name> wrote:
> Le 07/01/2017 à 14:11, kevin parrikar a écrit :
> > Thanks for your valuable input.
> > We were using these SSDs in our NAS box (Synology) and they were giving
> > 13k iops for our fileserver in RAID1. We had a few spare disks which we
> > added to our ceph nodes hoping they would give good performance, same as
> > the NAS box. (I am not comparing NAS with Ceph, just explaining why we
> > decided to use these SSDs.)
> >
> > We don't have S3520 or S3610 at the moment but can order one of these to
> > see how it performs in Ceph. We have 4x S3500 80GB handy.
> > If I create a 2-node cluster with 2x S3500 each and a replica count of 2,
> > do you think it can deliver 24MB/s of 4k writes?
>
> Probably not. See http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
> According to the page above the DC S3500 reaches 39MB/s. Its capacity
> isn't specified; yours are 80GB only, which is the lowest capacity I'm
> aware of, and for all DC models I know of the speed goes down with the
> capacity, so you will probably get less than that.
> If you put both data and journal on the same device you cut your bandwidth
> in half: this gives you an average of <20MB/s per OSD (with occasional
> peaks above that if you don't have a sustained 20MB/s). With 4 OSDs and
> size=2, your total write bandwidth is <40MB/s.
> For a single stream of data you will only get <20MB/s though (you won't
> benefit from parallel writes to the 4 OSDs and will only write to 2 at a
> time).
>
> Note that by comparison the 250GB 840 EVO only reaches 1.9MB/s.
>
> But even if you reach the 40MB/s, these models are not designed for heavy
> writes; you will probably kill them long before their warranty expires
> (IIRC these are rated for ~24GB of writes per day over the warranty
> period). In your configuration you only have to write 24GB each day (as
> you have 4 of them, write both to data and journal, and size=2) to be in
> this situation (this is an average of only 0.28 MB/s, compared to your
> 24 MB/s target).
>
> > We bought S3500 because last time when we tried ceph, people were
> > suggesting this model :) :)
>
> The 3500 series might be enough with the higher capacities in some rare
> cases, but the 80GB model is almost useless.
>
> You have to do the math considering:
> - how much you will write to the cluster (guess high if you have to guess),
> - whether you will use the SSDs for both journals and data (which means
>   writing twice to them),
> - your replication level (which means you will write the same data
>   multiple times),
> - when you expect to replace the hardware,
> - the amount of writes per day they support under warranty (if the
>   manufacturer doesn't present this number prominently they are probably
>   trying to sell you a fast car headed for a brick wall).
>
> If your hardware can't handle the amount of writes you expect to put
> through it, then you are screwed. There have been reports of new Ceph
> users unaware of this whose cheap SSDs failed in a matter of months, all
> at the same time. You definitely don't want to be in their position.
> In fact as problems happen (hardware failure leading to cluster storage > rebalancing for example) you should probably get a system able to handle > 10x the amount of writes you expect it to handle and then monitor the SSD > SMART attributes to be alerted long before they die and replace them before > problems happen. You definitely want a controller allowing access to this > information. If you can't you will have to monitor the writes and guess > this value which is risky as write amplification inside SSDs is not easy to > guess... > > Lionel > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
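Lionel's endurance arithmetic above can be made concrete with a quick back-of-the-envelope sketch. The 80GB capacity, ~0.3 DWPD rating, size=2 and co-located journals are figures from this thread; the functions themselves are only illustrative:

```python
def daily_write_budget_gb(capacity_gb, dwpd):
    """Endurance budget per device in GB/day (DWPD = drive writes per day)."""
    return capacity_gb * dwpd

def writes_per_device_gb(client_writes_gb, replica_size, journal_multiplier, n_devices):
    """How much each device actually writes for a given amount of client data.

    Co-locating journal and data doubles every write (journal_multiplier=2),
    replication multiplies it again, and the load spreads over n_devices.
    """
    return client_writes_gb * replica_size * journal_multiplier / n_devices

# 80GB drive rated ~0.3 DWPD (the figure quoted in the thread):
budget = daily_write_budget_gb(80, 0.3)
# Writing 24GB/day of client data with size=2, co-located journals, 4 OSDs:
actual = writes_per_device_gb(24, 2, 2, 4)
print(budget, actual)  # 24GB/day of client data already exhausts the budget
```

This is why Lionel's 24GB/day figure (a mere 0.28 MB/s average) is enough to burn through the warranty rating.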
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Thanks for your valuable input.

We were using these SSDs in our NAS box (Synology) and they were giving 13k iops for our fileserver in RAID1. We had a few spare disks which we added to our ceph nodes hoping they would give good performance, same as the NAS box. (I am not comparing NAS with Ceph, just explaining why we decided to use these SSDs.)

We don't have S3520 or S3610 at the moment but can order one of these to see how it performs in Ceph. We have 4x S3500 80GB handy. If I create a 2-node cluster with 2x S3500 each and a replica count of 2, do you think it can deliver 24MB/s of 4k writes?

We bought S3500 because last time when we tried ceph, people were suggesting this model :) :)

Thanks a lot for your help

On Sat, Jan 7, 2017 at 6:01 PM, Lionel Bouton <lionel-subscript...@bouton.name> wrote:
> Hi,
>
> Le 07/01/2017 à 04:48, kevin parrikar a écrit :
> > i really need some help here :(
> >
> > replaced all 7.2k rpm SAS disks with new Samsung 840 EVO 512GB SSDs with
> > no separate journal disk. Now both OSD nodes have 2 SSD disks with a
> > replica of *2*.
> > Total number of OSD processes in the cluster is *4*, all SSD.
>
> These SSDs are not designed for the kind of usage you are putting them
> through. The EVO and even the Pro line from Samsung can't write both fast
> and securely (ie: you can write fast and lose data if you get a power
> outage, or you can write slow and keep your data; Ceph always makes sure
> your data is recoverable before completing a write: it is slow with these
> SSDs).
>
> Christian already warned you about endurance and reliability; you just
> discovered the third problem: speed.
>
> Lionel
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Thanks Maged for your suggestion. I have executed rbd bench; here is the result, please have a look at it:

rbd bench-write image01 --pool=rbd --io-threads=32 --io-size 4096 --io-pattern rand --rbd_cache=false
bench-write  io_size 4096  io_threads 32  bytes 1073741824  pattern rand
  SEC       OPS   OPS/SEC     BYTES/SEC
    1      4750   4750.19   19456758.28
    2      7152   3068.49   12568516.09
    4      7220   1564.41    6407837.20
    5      8941   1794.35    7349666.74
    6     11938   1994.94    8171294.61
    7     12932   1365.21    5591891.85
^C

Not sure why it skipped "3" in the SEC column. I suppose this also shows slow performance. Any idea where the issue could be?

I use an LSI 9260-4i controller (firmware 12.13.0-0154) on both nodes with write-back enabled. I am not sure if this controller is suitable for ceph.

Regards,
Kevin

On Sat, Jan 7, 2017 at 1:23 PM, Maged Mokhtar wrote:
> The numbers are very low. I would first benchmark the system without the
> vm client using an rbd 4k test such as:
>
> rbd bench-write image01 --pool=rbd --io-threads=32 --io-size 4096
> --io-pattern rand --rbd_cache=false
>
> ---- Original message ----
> From: kevin parrikar
> Date: 07/01/2017 05:48 (GMT+02:00)
> To: Christian Balzer
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Analysing ceph performance with SSD journal,
> 10gbe NIC and 2 replicas -Hammer release
>
> i really need some help here :(
>
> replaced all 7.2k rpm SAS disks with new Samsung 840 EVO 512GB SSDs with no
> separate journal disk. Now both OSD nodes have 2 SSD disks with a replica
> of *2*.
> Total number of OSD processes in the cluster is *4*, all SSD.
>
> But throughput has gone down from 1.4 MB/s to 1.3 MB/s for 4k writes, and
> for 4M it has gone down from 140MB/s to 126MB/s.
>
> now atop no longer shows the OSD device as 100% busy..
>
> However i can see both ceph-osd processes in atop with 53% and 47% disk
> utilization.
> PID    RDDSK  WRDSK   WCANCL  DSK  CMD
> 20771  0K     648.8M  0K      53%  ceph-osd
> 19547  0K     576.7M  0K      47%  ceph-osd
>
> OSD disk (SSD) utilization from atop:
>
> DSK | sdc | busy 6% | read 0 | write 517 | KiB/r 0 | KiB/w 293 | MBr/s 0.00 | MBw/s 148.18 | avq 9.44 | avio 0.12 ms |
> DSK | sdd | busy 5% | read 0 | write 336 | KiB/r 0 | KiB/w 292 | MBr/s 0.00 | MBw/s 96.12 | avq 7.62 | avio 0.15 ms |
>
> Queue depth of OSD disks:
> cat /sys/block/sdd/device/queue_depth
> 256
>
> atop inside the virtual machine [4 CPU / 3GB RAM]:
> DSK | vdc | busy 96% | read 0 | write 256 | KiB/r 0 | KiB/w 512 | MBr/s 0.00 | MBw/s 128.00 | avq 7.96 | avio 3.77 ms |
>
> Both guest and host are using the deadline I/O scheduler.
>
> Virtual machine configuration:
> [the libvirt domain XML was stripped by the list archive; only the UUID
> 449da0e7-6223-457c-b2c6-b5e112099212 and a function='0x0' fragment survived]
>
> ceph.conf:
>
> cat /etc/ceph/ceph.conf
>
> [global]
> fsid = c4e1a523-9017-492e-9c30-8350eba1bd51
> mon_initial_members = node-16 node-30 node-31
> mon_host = 172.16.1.11 172.16.1.12 172.16.1.8
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.16.1.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.16.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
>
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 100
>
> Any guidance on where to look for issues?
>
> Regards,
> Kevin
>
> On Fri, Jan 6, 2017 at 4:42 PM, kevin parrikar wrote:
>> Thanks Christian for your valuable comments, each comment is a new
>> learning for me.
>> Please see inline
>>
>> On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer wrote:
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
i really need some help here :(

replaced all 7.2k rpm SAS disks with new Samsung 840 EVO 512GB SSDs with no separate journal disk. Now both OSD nodes have 2 SSD disks with a replica of *2*.
Total number of OSD processes in the cluster is *4*, all SSD.

But throughput has gone down from 1.4 MB/s to 1.3 MB/s for 4k writes, and for 4M it has gone down from 140MB/s to 126MB/s.

now atop no longer shows the OSD device as 100% busy..

However i can see both ceph-osd processes in atop with 53% and 47% disk utilization.

PID    RDDSK  WRDSK   WCANCL  DSK  CMD
20771  0K     648.8M  0K      53%  ceph-osd
19547  0K     576.7M  0K      47%  ceph-osd

OSD disk (SSD) utilization from atop:

DSK | sdc | busy 6% | read 0 | write 517 | KiB/r 0 | KiB/w 293 | MBr/s 0.00 | MBw/s 148.18 | avq 9.44 | avio 0.12 ms |
DSK | sdd | busy 5% | read 0 | write 336 | KiB/r 0 | KiB/w 292 | MBr/s 0.00 | MBw/s 96.12 | avq 7.62 | avio 0.15 ms |

Queue depth of OSD disks:
cat /sys/block/sdd/device/queue_depth
256

atop inside the virtual machine [4 CPU / 3GB RAM]:
DSK | vdc | busy 96% | read 0 | write 256 | KiB/r 0 | KiB/w 512 | MBr/s 0.00 | MBw/s 128.00 | avq 7.96 | avio 3.77 ms |

Both guest and host are using the deadline I/O scheduler.

Virtual machine configuration:
[the libvirt domain XML was stripped by the list archive; only the UUID 449da0e7-6223-457c-b2c6-b5e112099212 survived]

ceph.conf:

cat /etc/ceph/ceph.conf

[global]
fsid = c4e1a523-9017-492e-9c30-8350eba1bd51
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.11 172.16.1.12 172.16.1.8
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1

[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100

Any guidance on where to look for issues?

Regards,
Kevin

On Fri, Jan 6, 2017 at 4:42 PM, kevin parrikar wrote:
> Thanks Christian for your valuable comments, each comment is a new
> learning for me.
> Please see inline
>
> On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer wrote:
>
>> Hello,
>>
>> On Fri, 6 Jan 2017 08:40:36 +0530 kevin parrikar wrote:
>>
>> > Hello All,
>> >
>> > I have setup a ceph cluster based on 0.94.6 release in 2 servers each
>> > with 80Gb intel s3510 and 2x3 Tb 7.2k SATA disks, 16 CPU, 24G RAM,
>> > which is connected to a 10G switch with a replica of 2 [i will add 3
>> > more servers to the cluster] and 3 separate monitor nodes which are vms.
>>
>> I'd go to the latest hammer; this version has a lethal cache-tier bug if
>> you should decide to try that.
>>
>> 80Gb Intel DC S3510 are a) slow and b) have only 0.3 DWPD.
>> You're going to wear those out quickly and, if not replaced in time, lose
>> data.
>>
>> 2 HDDs give you a theoretical speed of something like 300MB/s sustained;
>> when used as OSDs I'd expect the usual 50-60MB/s per OSD due to
>> seeks, journal (file system) and leveldb overheads.
>> Which perfectly matches your results.
>
> Hmm, that makes sense: it's hitting the 7.2k rpm OSDs' peak write speed. I
> was under the assumption that the SSD journal would flush to the OSDs
> slowly at a later time, and hence i could use slower and cheaper disks for
> the OSDs. But in practice, many articles on the internet that talk about
> faster journals and slower OSDs don't seem to be correct.
>
> Will adding more OSD disks per node improve the overall performance?
> i can add 4 more disks to each node, but all are 7.2k rpm disks. I am
> expecting some kind of parallel writes on these disks that magically
> improves performance :D
>
> This is my second experiment with Ceph; last time i gave up and purchased
> another costly solution from a vendor. But this time i am determined to
> fix all issues and bring up a solid cluster.
> Last time the cluster was giving a throughput of around 900kbps for 1G
> writes from virtual
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Thanks Christian for your valuable comments, each comment is a new learning for me.
Please see inline

On Fri, Jan 6, 2017 at 9:32 AM, Christian Balzer wrote:
>
> Hello,
>
> On Fri, 6 Jan 2017 08:40:36 +0530 kevin parrikar wrote:
>
> > Hello All,
> >
> > I have setup a ceph cluster based on 0.94.6 release in 2 servers each
> > with 80Gb intel s3510 and 2x3 Tb 7.2k SATA disks, 16 CPU, 24G RAM,
> > which is connected to a 10G switch with a replica of 2 [i will add 3
> > more servers to the cluster] and 3 separate monitor nodes which are vms.
>
> I'd go to the latest hammer; this version has a lethal cache-tier bug if
> you should decide to try that.
>
> 80Gb Intel DC S3510 are a) slow and b) have only 0.3 DWPD.
> You're going to wear those out quickly and, if not replaced in time, lose
> data.
>
> 2 HDDs give you a theoretical speed of something like 300MB/s sustained;
> when used as OSDs I'd expect the usual 50-60MB/s per OSD due to
> seeks, journal (file system) and leveldb overheads.
> Which perfectly matches your results.

Hmm, that makes sense: it's hitting the 7.2k rpm OSDs' peak write speed. I was under the assumption that the SSD journal would flush to the OSDs slowly at a later time, and hence i could use slower and cheaper disks for the OSDs. But in practice, many articles on the internet that talk about faster journals and slower OSDs don't seem to be correct.

Will adding more OSD disks per node improve the overall performance?

i can add 4 more disks to each node, but all are 7.2k rpm disks. I am expecting some kind of parallel writes on these disks that magically improves performance :D

This is my second experiment with Ceph; last time i gave up and purchased another costly solution from a vendor. But this time i am determined to fix all issues and bring up a solid cluster.

Last time the cluster was giving a throughput of around 900kbps for 1G writes from a virtual machine, and now things have improved: it's giving 1.4 MB/s, but still far slower than the target of 24MB/s.
Expecting to make some progress with the help of experts here :)

> > rbd_cache is enabled in the configuration, XFS filesystem, LSI 92465-4i
> > raid card with 512Mb cache [ssd is in writeback mode with BBU]
> >
> > Before installing ceph, i tried to check the max throughput of the intel
> > 3500 80G SSD using a block size of 4M [i read somewhere that ceph uses
> > 4M objects] and it was giving 220mbps {dd if=/dev/zero of=/dev/sdb bs=4M
> > count=1000 oflag=direct}
>
> Irrelevant, sustained sequential writes will be limited by what your OSDs
> (HDDs) can sustain.
>
> > *Observation:*
> > Now the cluster is up and running and from the vm i am trying to write a
> > 4g file to its volume using dd if=/dev/zero of=/dev/sdb bs=4M count=1000
> > oflag=direct. It takes around 39 seconds to write.
> >
> > during this time the ssd journal was showing a disk write of 104M on
> > both ceph servers (dstat sdb) and the compute node a network transfer
> > rate of ~110M on its 10G storage interface (dstat -nN eth2)
>
> As I said, sounds about right.
>
> > my questions are:
> >
> > - Is this the best throughput ceph can offer or can anything in my
> >   environment be optimised to get more performance? [iperf shows a max
> >   throughput of 9.8Gbits/s]
>
> Not your network.
> Watch your nodes with atop and you will note that your HDDs are maxed out.
>
> > - I guess Network/SSD is under-utilized and it can handle more writes;
> >   how can this be improved to send more data over the network to the ssd?
>
> As jiajia wrote, a cache-tier might give you some speed boosts.
> But with those SSDs I'd advise against it, both too small and too low
> endurance.
>
> > - rbd kernel module wasn't loaded on the compute node; i loaded it
> >   manually using "modprobe" and later destroyed/re-created vms, but this
> >   does not give any performance boost. So librbd and RBD are equally
> >   fast?
>
> Irrelevant and confusing.
> Your VMs will use one or the other depending on how they are configured.
>
> > - Samsung evo 840 512Gb shows a throughput of 500Mbps for 4M writes [dd
> >   if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct] and for 4Kb it
> >   was equally fast as that of intel S3500 80gb. Does changing my SSD
> >   from intel s3500 100Gb to Samsung 840 500Gb make any performance
> >   difference here just because for 4M writes samsung 840 evo is faster?
> >   Can Ceph utilize this extra speed, since samsung evo 840 is faster
> >   in 4M
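The arithmetic Christian and Lionel use in this thread (co-locating journal and data halves the usable media bandwidth, replication divides the remainder) can be sketched as a crude ceiling estimate. The 39MB/s sync-write figure is the one quoted earlier for the 80GB S3500; everything else is illustrative:

```python
def expected_client_bandwidth(per_device_mb_s, n_osds, replica_size,
                              colocated_journal=True):
    """Rough upper bound on aggregate client write bandwidth (filestore).

    Journal and data on the same device means every byte is written twice,
    halving usable media bandwidth; replication multiplies writes again.
    Ignores latency, seeks and network effects, so real numbers are lower.
    """
    per_osd = per_device_mb_s / (2 if colocated_journal else 1)
    return per_osd * n_osds / replica_size

# 4 OSDs on 80GB S3500s (~39MB/s sustained sync writes per the thread), size=2:
print(expected_client_bandwidth(39, 4, 2))  # ~39 MB/s aggregate, i.e. "<40MB/s"
```

A single sequential stream only touches one placement group's OSDs at a time, which is why Lionel's per-stream figure is roughly one per-OSD share (<20MB/s) rather than the aggregate.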
Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Thanks Zhong.

We got 5 servers for testing; two are already configured as OSD nodes, and as per the storage requirement we need at least 5 OSD nodes. Let me try to get more servers to try a cache tier, but i am not hopeful though :(

Will try bcache and see how it improves performance. Thanks for your suggestion.

Regards,
Kevin

On Fri, Jan 6, 2017 at 8:56 AM, jiajia zhong wrote:
>
> 2017-01-06 11:10 GMT+08:00 kevin parrikar :
>
>> Hello All,
>>
>> I have setup a ceph cluster based on 0.94.6 release in 2 servers each
>> with 80Gb intel s3510 and 2x3 Tb 7.2k SATA disks, 16 CPU, 24G RAM,
>> which is connected to a 10G switch with a replica of 2 [i will add 3
>> more servers to the cluster] and 3 separate monitor nodes which are vms.
>>
>> rbd_cache is enabled in the configuration, XFS filesystem, LSI 92465-4i
>> raid card with 512Mb cache [ssd is in writeback mode with BBU]
>>
>> Before installing ceph, i tried to check the max throughput of the intel
>> 3500 80G SSD using a block size of 4M [i read somewhere that ceph uses 4M
>> objects] and it was giving 220mbps {dd if=/dev/zero of=/dev/sdb bs=4M
>> count=1000 oflag=direct}
>>
>> *Observation:*
>> Now the cluster is up and running and from the vm i am trying to write a
>> 4g file to its volume using dd if=/dev/zero of=/dev/sdb bs=4M count=1000
>> oflag=direct. It takes around 39 seconds to write.
>>
>> during this time the ssd journal was showing a disk write of 104M on both
>> ceph servers (dstat sdb) and the compute node a network transfer rate of
>> ~110M on its 10G storage interface (dstat -nN eth2)
>>
>> my questions are:
>>
>> - Is this the best throughput ceph can offer or can anything in my
>>   environment be optimised to get more performance? [iperf shows a max
>>   throughput of 9.8Gbits/s]
>>
>> - I guess Network/SSD is under-utilized and it can handle more writes;
>>   how can this be improved to send more data over the network to the ssd?
>
> cache tiering?
> http://docs.ceph.com/docs/hammer/rados/operations/cache-tiering/
> or try bcache in kernel.
>
>> - rbd kernel module wasn't loaded on the compute node; i loaded it
>>   manually using "modprobe" and later destroyed/re-created vms, but this
>>   does not give any performance boost. So librbd and RBD are equally fast?
>>
>> - Samsung evo 840 512Gb shows a throughput of 500Mbps for 4M writes [dd
>>   if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct] and for 4Kb it
>>   was equally fast as that of intel S3500 80gb. Does changing my SSD from
>>   intel s3500 100Gb to Samsung 840 500Gb make any performance difference
>>   here just because for 4M writes samsung 840 evo is faster? Can Ceph
>>   utilize this extra speed, since samsung evo 840 is faster in 4M writes.
>>
>> Can somebody help me understand this better.
>>
>> Regards,
>> Kevin
[ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release
Hello All,

I have setup a ceph cluster based on 0.94.6 release in 2 servers, each with an 80Gb intel s3510 and 2x3 Tb 7.2k SATA disks, 16 CPU, 24G RAM, connected to a 10G switch with a replica of 2 [i will add 3 more servers to the cluster], and 3 separate monitor nodes which are vms.

rbd_cache is enabled in the configuration, XFS filesystem, LSI 92465-4i raid card with 512Mb cache [ssd is in writeback mode with BBU].

Before installing ceph, i tried to check the max throughput of the intel 3500 80G SSD using a block size of 4M [i read somewhere that ceph uses 4M objects] and it was giving 220mbps {dd if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct}

*Observation:*
Now the cluster is up and running and from the vm i am trying to write a 4g file to its volume using dd if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct. It takes around 39 seconds to write.

during this time the ssd journal was showing a disk write of 104M on both ceph servers (dstat sdb) and the compute node a network transfer rate of ~110M on its 10G storage interface (dstat -nN eth2)

my questions are:

- Is this the best throughput ceph can offer, or can anything in my environment be optimised to get more performance? [iperf shows a max throughput of 9.8Gbits/s]

- I guess Network/SSD is under-utilized and it can handle more writes; how can this be improved to send more data over the network to the ssd?

- rbd kernel module wasn't loaded on the compute node; i loaded it manually using "modprobe" and later destroyed/re-created vms, but this does not give any performance boost. So librbd and RBD are equally fast?

- Samsung evo 840 512Gb shows a throughput of 500Mbps for 4M writes [dd if=/dev/zero of=/dev/sdb bs=4M count=1000 oflag=direct] and for 4Kb it was equally fast as that of intel S3500 80gb. Does changing my SSD from intel s3500 100Gb to Samsung 840 500Gb make any performance difference here, just because for 4M writes samsung 840 evo is faster? Can Ceph utilize this extra speed, since samsung evo 840 is faster in 4M writes.

Can somebody help me understand this better.

Regards,
Kevin
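The dd tests above use oflag=direct only; a journal also forces a sync on every write, which is the case the sebastien-han article cited later in this thread tests with oflag=direct,dsync. A rough Python approximation of that sync-write test (it writes to an ordinary file rather than a raw device, so absolute numbers will differ; block size and count here are arbitrary choices, not from the thread):

```python
import os
import tempfile
import time

def sync_write_mb_s(path, block_size=1024 * 1024, count=8):
    """Measure sequential write throughput with O_DSYNC, as a journal sees it.

    Each write must reach stable storage before the next one starts, which is
    what separates journal-grade SSDs from drives that only look fast with dd.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, flags, 0o600)
    buf = b"\0" * block_size
    t0 = time.monotonic()
    try:
        for _ in range(count):
            os.write(fd, buf)
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - t0, 1e-9)
    return (block_size * count) / (1024 * 1024) / elapsed

with tempfile.NamedTemporaryFile() as f:
    print("%.1f MB/s" % sync_write_mb_s(f.name))
```

Running this (or the equivalent dd with oflag=direct,dsync) against the journal SSD would show whether the S3500's sync-write ceiling, rather than the network, is the limiting factor.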
[ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss
I have a 4-node cluster, each node with 5 disks (4 OSD disks and 1 operating-system disk; the cluster also hosts 3 monitor processes), with the default replica count of 3.

Total OSD disks: 16
Total nodes: 4

How can i calculate:

- the maximum number of disk failures my cluster can handle without any impact on current data and new writes;

- the maximum number of node failures my cluster can handle without any impact on current data and new writes.

Thanks for any help
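A sketch of the usual reasoning, under two assumptions that are not stated in the question: the CRUSH rule places each replica on a different host (the stock behaviour for a multi-host cluster), and min_size is 2, the common default for a size=3 pool:

```python
def failure_tolerance(replica_size, min_size):
    """Simultaneous host failures a replicated pool can absorb, assuming the
    CRUSH rule spreads replicas across distinct hosts.

    - data survives as long as at least one replica remains;
    - writes continue only while at least min_size replicas are up.
    """
    return {
        "node_failures_without_data_loss": replica_size - 1,
        "node_failures_with_writes_still_accepted": replica_size - min_size,
    }

# 4 nodes, 4 OSDs each, size=3, min_size=2 (assumed defaults):
print(failure_tolerance(3, 2))
# data survives 2 simultaneous node losses; writes keep flowing after 1
```

Disk failures follow the same bound in the worst case: disks failing within one host count as a single failure domain, while disks failing in different hosts can each hit a replica of the same placement group, so the conservative answer is again replica_size - 1 across distinct hosts. After any failure, Ceph re-replicates onto the surviving OSDs, so the tolerance recovers once backfill completes and capacity allows.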
Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93
Thanks, i will follow this workaround.

On Thu, Mar 12, 2015 at 12:18 AM, Somnath Roy wrote:
> Kevin,
>
> This is a known issue and should be fixed in the latest krbd. The problem
> is, it is not backported to 14.04 krbd yet. You need to build it from the
> latest krbd source if you want to stick with 14.04.
>
> The workaround is: you need to unmap your clients before restarting osds.
>
> Thanks & Regards
> Somnath
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of* kevin parrikar
> *Sent:* Wednesday, March 11, 2015 11:44 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] client crashed when osd gets restarted - hammer
> 0.93
>
> Hi,
>
> I am trying hammer 0.93 on Ubuntu 14.04.
>
> rbd is mapped in the client, which is also ubuntu 14.04.
>
> When i did a stop ceph-osd-all and then a start, the client machine
> crashed and the attached pic was in the console. Not sure if it's related
> to ceph.
>
> Thanks
[ceph-users] client crashed when osd gets restarted - hammer 0.93
Hi,

I am trying hammer 0.93 on Ubuntu 14.04. rbd is mapped in the client, which is also ubuntu 14.04.

When i did a stop ceph-osd-all and then a start, the client machine crashed and the attached pic was in the console. Not sure if it's related to ceph.

Thanks
[ceph-users] how to improve seek time using hammer-test release
hello All,

I just setup a single-node ceph cluster with no replication to familiarize myself with ceph, using 2 intel S3500 800Gb SSDs, 8Gb RAM and a 16-core CPU. The OS is ubuntu 14.04 64-bit, and the rbd kernel module is loaded (modprobe rbd).

When running bonnie++ against /dev/rbd0 it shows a seek rate of 892.2/s. How can the seek time be improved? If i run 5 bonnie++ instances on /mnt, where /dev/rbd0 is mounted as ext4, seeks/s reduce to 500/s. I am trying to achieve over 1000 seeks/s for each thread. What can i do to improve performance?

*Tried following:*

- scheduler set to noop
- filesystem set to btrfs
- debugging set to 0/0 (all parameters found from the mailing list) - this showed some noticeable difference

Will configuring the SSDs in RAID0, with a single OSD on the RAID0 device, improve this?

Regards,
Kevin
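Bonnie++'s seek figure is essentially small random reads per second, and the shape of that test can be reproduced with a short loop. This sketch runs against a scratch file instead of /dev/rbd0, so the numbers are only comparable in shape (page-cache hits will inflate them hugely); pointing the same function at the mapped block device would give the figure that matters here:

```python
import os
import random
import tempfile
import time

def random_read_iops(path, io_size=4096, duration=0.5):
    """Issue random 4k reads for `duration` seconds and report reads/s.

    Roughly what bonnie++'s seek test does; against an RBD device each read
    becomes a round trip to an OSD, so latency dominates the result.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    ops = 0
    t0 = time.monotonic()
    try:
        while time.monotonic() - t0 < duration:
            os.lseek(fd, random.randrange(0, size - io_size), os.SEEK_SET)
            os.read(fd, io_size)
            ops += 1
    finally:
        os.close(fd)
    return ops / duration

with tempfile.NamedTemporaryFile() as f:
    f.write(os.urandom(8 * 1024 * 1024))  # 8MB scratch file, purely illustrative
    f.flush()
    print("%.0f seeks/s" % random_read_iops(f.name))
```

Because each RBD read is a network round trip, per-thread seeks/s are bounded by latency rather than SSD speed, which is also why RAID0 under a single OSD is unlikely to move this particular number much.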