Re: [ceph-users] New install error

2017-08-08 Thread Brad Hubbard
On ceph01 if you login as ceph-deploy and run the following command
what output do you get?

$ sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon.
--keyring=/var/lib/ceph/mon/ceph-ceph01/keyring auth get client.admin

On Tue, Aug 8, 2017 at 11:41 PM, Timothy Wolgemuth
 wrote:
> I have a new installation and following the quick start guide at:
>
> http://docs.ceph.com/docs/master/start/quick-ceph-deploy/
>
> Running into the following error in the create-initial step.  See below:
>
>
>
> $ ceph-deploy --username ceph-deploy mon create-initial
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/ceph-deploy/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.37): /bin/ceph-deploy --username
> ceph-deploy mon create-initial
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : ceph-deploy
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  subcommand: create-initial
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   :
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  func  : <function mon at 0x275e320>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.cli][INFO  ]  keyrings  : None
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph01
> [ceph_deploy.mon][DEBUG ] detecting platform for host ceph01 ...
> [ceph01][DEBUG ] connection detected need for sudo
> [ceph01][DEBUG ] connected to host: ceph-deploy@ceph01
> [ceph01][DEBUG ] detect platform information from remote host
> [ceph01][DEBUG ] detect machine type
> [ceph01][DEBUG ] find the location of an executable
> [ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.3.1611 Core
> [ceph01][DEBUG ] determining if provided host has same hostname in remote
> [ceph01][DEBUG ] get remote short hostname
> [ceph01][DEBUG ] deploying mon to ceph01
> [ceph01][DEBUG ] get remote short hostname
> [ceph01][DEBUG ] remote hostname: ceph01
> [ceph01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [ceph01][DEBUG ] create the mon path if it does not exist
> [ceph01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph01/done
> [ceph01][DEBUG ] create a done file to avoid re-doing the mon deployment
> [ceph01][DEBUG ] create the init path if it does not exist
> [ceph01][INFO  ] Running command: sudo systemctl enable ceph.target
> [ceph01][INFO  ] Running command: sudo systemctl enable ceph-mon@ceph01
> [ceph01][INFO  ] Running command: sudo systemctl start ceph-mon@ceph01
> [ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
> /var/run/ceph/ceph-mon.ceph01.asok mon_status
> [ceph01][DEBUG ]
> 
> [ceph01][DEBUG ] status for monitor: mon.ceph01
> [ceph01][DEBUG ] {
> [ceph01][DEBUG ]   "election_epoch": 3,
> [ceph01][DEBUG ]   "extra_probe_peers": [
> [ceph01][DEBUG ] "192.168.100.11:6789/0"
> [ceph01][DEBUG ]   ],
> [ceph01][DEBUG ]   "monmap": {
> [ceph01][DEBUG ] "created": "2017-08-08 09:00:47.536389",
> [ceph01][DEBUG ] "epoch": 1,
> [ceph01][DEBUG ] "fsid": "89935cd7-d056-4dcd-80b2-925257811fd6",
> [ceph01][DEBUG ] "modified": "2017-08-08 09:00:47.536389",
> [ceph01][DEBUG ] "mons": [
> [ceph01][DEBUG ]   {
> [ceph01][DEBUG ] "addr": "10.135.130.95:6789/0",
> [ceph01][DEBUG ] "name": "ceph01",
> [ceph01][DEBUG ] "rank": 0
> [ceph01][DEBUG ]   }
> [ceph01][DEBUG ] ]
> [ceph01][DEBUG ]   },
> [ceph01][DEBUG ]   "name": "ceph01",
> [ceph01][DEBUG ]   "outside_quorum": [],
> [ceph01][DEBUG ]   "quorum": [
> [ceph01][DEBUG ] 0
> [ceph01][DEBUG ]   ],
> [ceph01][DEBUG ]   "rank": 0,
> [ceph01][DEBUG ]   "state": "leader",
> [ceph01][DEBUG ]   "sync_provider": []
> [ceph01][DEBUG ] }
> [ceph01][DEBUG ]
> 
> [ceph01][INFO  ] monitor: mon.ceph01 is running
> [ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
> /var/run/ceph/ceph-mon.ceph01.asok mon_status
> [ceph_deploy.mon][INFO  ] processing monitor mon.ceph01
> [ceph01][DEBUG ] connection detected need for sudo
> [ceph01][DEBUG ] connected to host: ceph-deploy@ceph01
> [ceph01][DEBUG ] detect platform information from remote host
> [ceph01][DEBUG ] detect machine type
> [ceph01][DEBUG ] find the location of an executable
> [ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
> /var/run/ceph/ceph-mon.ceph01.asok mon_status
> [ceph_deploy.mon][INFO  ] mon.ceph01 monitor has reached quorum!
> [ceph_deploy.mon][INFO  ] all initial monitors are running and have formed quorum

Re: [ceph-users] Pg inconsistent / export_files error -5

2017-08-08 Thread Sage Weil
On Wed, 9 Aug 2017, Brad Hubbard wrote:
> Wee
> 
> On Wed, Aug 9, 2017 at 12:41 AM, Marc Roos  wrote:
> >
> >
> >
> > The --debug indeed comes up with something
> > bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device
> > location [0x15a017~1000], logical extent 0x0~1000,
> >  bluestore(/var/lib/ceph/osd/ceph-9) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x0, got 0xb40b26a7, expected 0x90407f75, device
> > location [0x2daea~1000], logical extent 0x0~1000,

What about the 3rd OSD?

It would be interesting to capture the fsck output for one of these.  
Stop the OSD, and then run

 ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12 --log-file out \
--debug-bluestore 30 --no-log-to-stderr

That'll generate a pretty huge log, but should include dumps of onode 
metadata and will hopefully include something else with the checksum of 
0x100ac314 so we can get some clue as to where the bad data came from.
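Once that completes, a quick way to pull the relevant lines out of the log is a
plain grep for the bad checksum shown in the error above (the checksum value is
taken from your output; adjust the log file name if you used a different one):

 grep -n 100ac314 out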

Thanks!
sage


> >
> > I don't know how to interpret this, but am I correct to understand that
> > data has been written across the cluster to these 3 osd's and all 3 have
> > somehow received something different?
> 
> Did you run this command on OSD 0? What was the output in that case?
> 
> Possibly. All we currently know for sure is that the crc32c checksums for the
> object on OSDs 12 and 9 do not match the expected checksum according to the
> code when we attempt to read the object
> #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4#. There seems to be
> some history behind this based on your previous emails regarding these OSDs
> (12, 9, 0, and possibly 13); could you give us as much detail as possible
> about how this issue came about and what you have done in the interim to try
> to resolve it?
> 
> When was the first indication there was a problem with pg 17.36? Did this
> correspond with any significant event?
> 
> Are these OSDs all on separate hosts?
> 
> It's possible ceph-bluestore-tool may help here but I would hold off on that
> option until we understand the issue better.
> 
> 
> >
> >
> > size=4194304 object_info:
> > 17:6ca10b29:::rbd_data.1fff61238e1f29.9923:head(5387'35157
> > client.2096993.0:78941 dirty|data_digest|omap_digest s 4194304 uv 35356 dd
> > f53dff2e od  alloc_hint [4194304 4194304 0]) data section offset=0
> > len=1048576 data section offset=1048576 len=1048576 data section
> > offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs 
> > size
> > 2 omap map size 0 Read
> > #17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head# size=4194304
> > object_info:
> > 17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head(5163'7136
> > client.2074638.1:483264 dirty|data_digest|omap_digest s 4194304 uv 7418 dd
> > 43d61c5d od  alloc_hint [4194304 4194304 0]) data section offset=0
> > len=1048576 data section offset=1048576 len=1048576 data section
> > offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs 
> > size
> > 2 omap map size 0 Read
> > #17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head# size=4194304
> > object_info:
> > 17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head(5236'7640
> > client.2074638.1:704364 dirty|data_digest|omap_digest s 4194304 uv 7922 dd
> > 3bcff64d od  alloc_hint [4194304 4194304 0]) data section offset=0
> > len=1048576 data section offset=1048576 len=1048576 data section
> > offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs 
> > size
> > 2 omap map size 0 Read
> > #17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head# size=4194304
> > object_info:
> > 17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head(5387'35553
> > client.2096993.0:123721 dirty|data_digest|omap_digest s 4194304 uv 35752 dd
> > f9bc0fbd od  alloc_hint [4194304 4194304 0]) data section offset=0
> > len=1048576 data section offset=1048576 len=1048576 data section
> > offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs 
> > size
> > 2 omap map size 0 Read
> > #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4# size=4194304
> > object_info:
> > 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4(5390'56613
> > client.2096907.1:3222443 dirty|omap_digest s 4194304 uv 55477 od 
> > alloc_hint [0 0 0]) 2017-08-08 15:57:45.078348 7fad08fa4100 -1
> > bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 checksum
> > at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device location
> > [0x15a017~1000], logical extent 0x0~1000, object
> > #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4# export_files 
> > error
> > -5 2017-08-08 15:57:45.081279 7fad08fa4100  1
> > bluestore(/var/lib/ceph/osd/ceph-12) umount 2017-08-08 15:57:45.150210
> > 7fad08fa4100  1 freelist shutdown 2017-08-08 15:57:45.150307 7fad08fa4100  4
> > rocksdb:
> > [/home/jenkin

Re: [ceph-users] Pg inconsistent / export_files error -5

2017-08-08 Thread Brad Hubbard
Wee

On Wed, Aug 9, 2017 at 12:41 AM, Marc Roos  wrote:
>
>
>
> The --debug indeed comes up with something
> bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device
> location [0x15a017~1000], logical extent 0x0~1000,
>  bluestore(/var/lib/ceph/osd/ceph-9) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xb40b26a7, expected 0x90407f75, device
> location [0x2daea~1000], logical extent 0x0~1000,
>
> I don't know how to interpret this, but am I correct to understand that
> data has been written across the cluster to these 3 osd's and all 3 have
> somehow received something different?

Did you run this command on OSD 0? What was the output in that case?

Possibly. All we currently know for sure is that the crc32c checksums for the
object on OSDs 12 and 9 do not match the expected checksum according to the code
when we attempt to read the object
#17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4#. There seems to be
some history behind this based on your previous emails regarding these OSDs
(12, 9, 0, and possibly 13); could you give us as much detail as possible about
how this issue came about and what you have done in the interim to try to
resolve it?

When was the first indication there was a problem with pg 17.36? Did this
correspond with any significant event?

Are these OSDs all on separate hosts?

It's possible ceph-bluestore-tool may help here but I would hold off on that
option until we understand the issue better.


>
>
> size=4194304 object_info:
> 17:6ca10b29:::rbd_data.1fff61238e1f29.9923:head(5387'35157
> client.2096993.0:78941 dirty|data_digest|omap_digest s 4194304 uv 35356 dd
> f53dff2e od  alloc_hint [4194304 4194304 0]) data section offset=0
> len=1048576 data section offset=1048576 len=1048576 data section
> offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs size
> 2 omap map size 0 Read
> #17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head# size=4194304
> object_info:
> 17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head(5163'7136
> client.2074638.1:483264 dirty|data_digest|omap_digest s 4194304 uv 7418 dd
> 43d61c5d od  alloc_hint [4194304 4194304 0]) data section offset=0
> len=1048576 data section offset=1048576 len=1048576 data section
> offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs size
> 2 omap map size 0 Read
> #17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head# size=4194304
> object_info:
> 17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head(5236'7640
> client.2074638.1:704364 dirty|data_digest|omap_digest s 4194304 uv 7922 dd
> 3bcff64d od  alloc_hint [4194304 4194304 0]) data section offset=0
> len=1048576 data section offset=1048576 len=1048576 data section
> offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs size
> 2 omap map size 0 Read
> #17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head# size=4194304
> object_info:
> 17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head(5387'35553
> client.2096993.0:123721 dirty|data_digest|omap_digest s 4194304 uv 35752 dd
> f9bc0fbd od  alloc_hint [4194304 4194304 0]) data section offset=0
> len=1048576 data section offset=1048576 len=1048576 data section
> offset=2097152 len=1048576 data section offset=3145728 len=1048576 attrs size
> 2 omap map size 0 Read
> #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4# size=4194304
> object_info:
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4(5390'56613
> client.2096907.1:3222443 dirty|omap_digest s 4194304 uv 55477 od 
> alloc_hint [0 0 0]) 2017-08-08 15:57:45.078348 7fad08fa4100 -1
> bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 checksum
> at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device location
> [0x15a017~1000], logical extent 0x0~1000, object
> #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4# export_files error
> -5 2017-08-08 15:57:45.081279 7fad08fa4100  1
> bluestore(/var/lib/ceph/osd/ceph-12) umount 2017-08-08 15:57:45.150210
> 7fad08fa4100  1 freelist shutdown 2017-08-08 15:57:45.150307 7fad08fa4100  4
> rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] Shutdown:
> canceling all background work 2017-08-08 15:57:45.152099 7fad08fa4100  4
> rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
> CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
> 12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown
> complete 2017-08-08 15:57:45.184742 7fad08fa4100  1 bluefs umount 2017-08-08
> 15:57:45.203674 7fad08fa4100  1 bdev(0x7fad0b260e00
> /var/lib/ceph/osd/ceph-12/block) close 2017-08-08 15:57:45.442499

Re: [ceph-users] Iscsi configuration

2017-08-08 Thread Jason Dillaman
We are working hard to formalize active/passive iSCSI configuration
across Linux/Windows/ESX via LIO. We have integrated librbd into LIO's
tcmu-runner and have developed a set of support applications for
managing the clustered configuration of your iSCSI targets. There is
some preliminary documentation here [1] that will be merged once we
can finish our testing.

[1] https://github.com/ceph/ceph/pull/16182

On Tue, Aug 8, 2017 at 4:45 PM, Samuel Soulard  wrote:
> Hi all,
>
> Platform : Centos 7 Luminous 12.1.2
>
> First time here, but are there any guides or guidelines out there on how to
> configure ISCSI gateways in HA so that if one gateway fails, IO can continue
> on the passive node?
>
> What I've done so far
> -ISCSI node with Ceph client map rbd on boot
> -Rbd has exclusive-lock feature enabled and layering
> -Targetd service dependent on rbdmap.service
> -rbd exported through LUN ISCSI
> -Windows ISCSI initiator can map the lun and format / write to it (awesome)
>
> Now I have no idea where to start to have an active /passive scenario for
> luns exported with LIO.  Any ideas?
>
> Also, the web dashboard seems to hint that it can get stats for various
> clients made on ISCSI gateways; I'm not sure where it pulls that
> information from. Is Luminous now shipping an ISCSI daemon of some sort?
>
> Thanks all!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Iscsi configuration

2017-08-08 Thread Adrian Saul
Hi Sam,
  We use SCST for iSCSI with Ceph, and a pacemaker cluster to orchestrate the 
management of active/passive presentation using ALUA through SCST device groups. 
 In our case we ended up writing our own pacemaker resources to support our 
particular model and preferences, but I believe there are a few resources out 
there for setting this up that you could make use of.

For us it consists of resources for the RBD devices, the iSCSI targets, the 
device groups and hostgroups for presentation.  The resources are cloned across 
all the cluster nodes, except for the device group resources which are 
master/slave, with the master becoming the active ALUA member and the others 
becoming standby or non-optimised.

The iSCSI clients see the ALUA presentation and manage it with their own 
multipathing stacks.

There may be ways to do it with LIO now, but at the time I looked, the ALUA
support in SCST was a lot better.

HTH.

Cheers,
 Adrian



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Samuel 
Soulard
Sent: Wednesday, 9 August 2017 6:45 AM
To: ceph-us...@ceph.com
Subject: [ceph-users] Iscsi configuration

Hi all,

Platform : Centos 7 Luminous 12.1.2

First time here, but are there any guides or guidelines out there on how to 
configure ISCSI gateways in HA so that if one gateway fails, IO can continue on 
the passive node?

What I've done so far
-ISCSI node with Ceph client map rbd on boot
-Rbd has exclusive-lock feature enabled and layering
-Targetd service dependent on rbdmap.service
-rbd exported through LUN ISCSI
-Windows ISCSI initiator can map the lun and format / write to it (awesome)

Now I have no idea where to start to have an active /passive scenario for luns 
exported with LIO.  Any ideas?

Also, the web dashboard seems to hint that it can get stats for various clients 
made on ISCSI gateways; I'm not sure where it pulls that information from. Is 
Luminous now shipping an ISCSI daemon of some sort?

Thanks all!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One Monitor filling the logs

2017-08-08 Thread Mehmet
I guess this is related to
"debug_mgr": "1/5"
but I'm not sure. Give it a try.
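A hedged example of how to try that at runtime (mon name as in your log; if the
injection is rejected, set it under [mon] in ceph.conf and restart the mon):

 ceph tell mon.1 injectargs '--debug_mgr 0/5'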

Hth
Mehmet 


Am 8. August 2017 16:28:21 MESZ schrieb Konrad Riedel :
>Hi Ceph users,
>
>my luminous (ceph version 12.1.1) test cluster is doing fine, except that
>one Monitor is filling the logs
>
>  -rw-r--r-- 1 ceph ceph 119M Aug  8 15:27 ceph-mon.1.log
>
>ceph-mon.1.log:
>
>2017-08-08 15:57:49.509176 7ff4573c4700  0 log_channel(cluster) log 
>[DBG] : Standby manager daemon felix started
>2017-08-08 15:57:49.646006 7ff4573c4700  0 log_channel(cluster) log 
>[DBG] : Standby manager daemon daniel started
>2017-08-08 15:57:49.830046 7ff45d13a700  0 log_channel(cluster) log 
>[DBG] : mgrmap e256330: udo(active)
>2017-08-08 15:57:51.509410 7ff4573c4700  0 log_channel(cluster) log 
>[DBG] : Standby manager daemon felix started
>2017-08-08 15:57:51.646269 7ff4573c4700  0 log_channel(cluster) log 
>[DBG] : Standby manager daemon daniel started
>2017-08-08 15:57:52.054987 7ff45d13a700  0 log_channel(cluster) log 
>[DBG] : mgrmap e256331: udo(active)
>
>I've tried to reduce the debug settings ( "debug_mon": "0/1", 
>"debug_monc": "0/1"), but I still get 3 messages per
>second. Does anybody know how to mute this?
>
>All log settings (defaults):
>
>{
> "name": "mon.1",
> "cluster": "ceph",
> "debug_none": "0/5",
> "debug_lockdep": "0/1",
> "debug_context": "0/1",
> "debug_crush": "1/1",
> "debug_mds": "1/5",
> "debug_mds_balancer": "1/5",
> "debug_mds_locker": "1/5",
> "debug_mds_log": "1/5",
> "debug_mds_log_expire": "1/5",
> "debug_mds_migrator": "1/5",
> "debug_buffer": "0/1",
> "debug_timer": "0/1",
> "debug_filer": "0/1",
> "debug_striper": "0/1",
> "debug_objecter": "0/1",
> "debug_rados": "0/5",
> "debug_rbd": "0/5",
> "debug_rbd_mirror": "0/5",
> "debug_rbd_replay": "0/5",
> "debug_journaler": "0/5",
> "debug_objectcacher": "0/5",
> "debug_client": "0/5",
> "debug_osd": "1/5",
> "debug_optracker": "0/5",
> "debug_objclass": "0/5",
> "debug_filestore": "1/3",
> "debug_journal": "1/3",
> "debug_ms": "0/5",
> "debug_mon": "0/1",
> "debug_monc": "0/1",
> "debug_paxos": "1/5",
> "debug_tp": "0/5",
> "debug_auth": "1/5",
> "debug_crypto": "1/5",
> "debug_finisher": "1/1",
> "debug_heartbeatmap": "1/5",
> "debug_perfcounter": "1/5",
> "debug_rgw": "1/5",
> "debug_civetweb": "1/10",
> "debug_javaclient": "1/5",
> "debug_asok": "1/5",
> "debug_throttle": "1/1",
> "debug_refs": "0/0",
> "debug_xio": "1/5",
> "debug_compressor": "1/5",
> "debug_bluestore": "1/5",
> "debug_bluefs": "1/5",
> "debug_bdev": "1/3",
> "debug_kstore": "1/5",
> "debug_rocksdb": "4/5",
> "debug_leveldb": "4/5",
> "debug_memdb": "4/5",
> "debug_kinetic": "1/5",
> "debug_fuse": "1/5",
> "debug_mgr": "1/5",
> "debug_mgrc": "1/5",
> "debug_dpdk": "1/5",
> "debug_eventtrace": "1/5",
> "host": "felix",
>
>Thanks & regards
>
>Konrad Riedel
>
>--
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mclean, Patrick
> Sent: 08 August 2017 20:13
> To: David Turner ; ceph-us...@ceph.com
> Cc: Colenbrander, Roelof ; Payno,
> Victor ; Yip, Rae 
> Subject: Re: [ceph-users] ceph cluster experiencing major performance issues
> 
> On 08/08/17 10:50 AM, David Turner wrote:
> > Are you also seeing osds marking themselves down for a little bit and
> > then coming back up?  There are 2 very likely problems
> > causing/contributing to this.  The first is if you are using a lot of
> > snapshots.  Deleting snapshots is a very expensive operation for your
> > cluster and can cause a lot of slowness.  The second is PG subfolder
> > splitting.  This will show as blocked requests and osds marking
> > themselves down and coming back up a little later without any errors
> > in the log.  I linked a previous thread where someone was having these
> > problems where both causes were investigated.
> >
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg36923.html
> 
> We are not seeing OSDs marking themselves down a little bit and coming
> back as far as we can tell. We will do some more investigation in to this.
> 
> We are creating and deleting quite a few snapshots, is there anything we
> can do to make this less expensive? We are going to attempt to create less
> snapshots in our systems, but unfortunately we have to create a fair
> number due to our use case.

That's most likely your problem. Upgrade to 10.2.9 and set the snap trim sleep
option on your OSDs to somewhere around 0.1; it has a massive effect on
snapshot removal.
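For reference, a hedged sketch of applying that, assuming the option name
osd_snap_trim_sleep (in seconds); if the running OSDs refuse the injection,
persist it in ceph.conf and restart them instead:

 ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

 # and to persist it across restarts:
 [osd]
 osd snap trim sleep = 0.1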

> 
> Is slow snapshot deletion likely to cause a slow backlog of purged snaps?
> In some cases we are seeing ~40k snaps still in cached_removed_snaps.
> 
> > If you have 0.94.9 or 10.2.5 or later, then you can split your PG
> > subfolders sanely while your osds are temporarily turned off using the
> > 'ceph-objectstore-tool apply-layout-settings'.  There are a lot of
> > ways to skin the cat of snap trimming, but it depends greatly on your use
> > case.
> 
> We are currently running 10.2.5, and are planning to update to 10.2.9 at
> some point soon. Our clients are using the 4.9 kernel RBD driver (which
> sort of forces us to keep our snapshot count down below 510), we are
> currently testing the possibility of using the nbd-rbd driver as an
> alternative.
> 
> > On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick
> > mailto:patrick.mcl...@sony.com>> wrote:
> >
> > High CPU utilization and inexplicably slow I/O requests
> >
> > We have been having similar performance issues across several ceph
> > clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> > for a while, but eventually performance worsens and becomes (at first
> > intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> > request blocked for longer than 32 sec. These slow requests are
> > accompanied by "currently waiting for rw locks", but we have not found
> > any network issue that normally is responsible for this warning.
> >
> > Examining the individual slow OSDs from `ceph health detail` has been
> > unproductive; there don't seem to be any slow disks and if we stop the
> > OSD the problem just moves somewhere else.
> >
> > We also think this trends with increased number of RBDs on the clusters,
> > but not necessarily a ton of Ceph I/O. At the same time, user %CPU time
> > spikes up to 95-100%, at first frequently and then consistently,
> > simultaneously across all cores. We are running 12 OSDs on a 2.2 GHz CPU
> > with 6 cores and 64GiB RAM per node.
> >
> > ceph1 ~ $ sudo ceph status
> > cluster ----
> >  health HEALTH_WARN
> > 547 requests are blocked > 32 sec
> >  monmap e1: 3 mons at
> >
> {cephmon1.XXX=XXX.XXX.XXX.XXX:/0,cephmon1.
> XXX=XXX.XXX.XXX.XX:/0,cephmon1.XXX
> =XXX.XXX.XXX.XXX:/0}
> > election epoch 16, quorum 0,1,2
> >
> cephmon1.XXX,cephmon1.X
> XX,cephmon1.XXX
> >  osdmap e577122: 72 osds: 68 up, 68 in
> > flags sortbitwise,require_jewel_osds
> >   pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091
kobjects
> > 126 TB used, 368 TB / 494 TB avail
> > 4084 active+clean
> >   12 active+clean+scrubbing+deep
> >   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
> >
> > ceph1 ~ $ vmstat 5 5
> > procs ---memory-- ---swap-- -io -system--
> > --cpu-
> >  r  b   swpd   free   buff  cache   si   sobibo   in   cs us
sy
> > id wa st
> > 27  1  0 3112660 165544 3626169200   472  127401
22
> >

[ceph-users] Iscsi configuration

2017-08-08 Thread Samuel Soulard
Hi all,

Platform : Centos 7 Luminous 12.1.2

First time here, but are there any guides or guidelines out there on how to
configure ISCSI gateways in HA so that if one gateway fails, IO can
continue on the passive node?

What I've done so far
-ISCSI node with Ceph client map rbd on boot
-Rbd has exclusive-lock feature enabled and layering
-Targetd service dependent on rbdmap.service
-rbd exported through LUN ISCSI
-Windows ISCSI initiator can map the lun and format / write to it (awesome)

Now I have no idea where to start to have an active /passive scenario for
luns exported with LIO.  Any ideas?

Also, the web dashboard seems to hint that it can get stats for various
clients made on ISCSI gateways; I'm not sure where it pulls that
information from. Is Luminous now shipping an ISCSI daemon of some sort?

Thanks all!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running commands on Mon or OSD nodes

2017-08-08 Thread David Turner
Regardless of which node you run that command on, the command is talking to
the mons.  If you are getting different values between different nodes,
double check their configs and make sure your mon quorum isn't somehow in a
split-brain scenario.  Which version of Ceph are you running?
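A quick, hedged way to compare what each node and the mons themselves report
(this assumes an admin keyring on every node, and that each mon id matches the
short hostname, which may not hold in your setup):

 ceph --version
 ceph quorum_status --format json-pretty
 ceph daemon mon.$(hostname -s) mon_status   # run on each mon host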

On Tue, Aug 8, 2017 at 4:13 AM Osama Hasebou  wrote:

> Hi Everyone,
>
> I was trying to run the ceph osd crush reweight command to move data out
> of one node that has hardware failures and I noticed that as I set the
> crush reweight to 0, some nodes would reflect it when I do ceph osd tree
> and some wouldn't.
>
> What is the proper way to run commands across the cluster? Does one need to run
> the same *ceph osd crush reweight* command from all mon nodes so that it gets
> pushed down to the whole osd tree and updates the crush map, or is it also ok
> to run it once on an osd node and have it propagate across the other nodes and
> update the crush map?
>
> Thank you!
>
> Regards,
> Ossi
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread Mclean, Patrick
On 08/08/17 10:50 AM, David Turner wrote:
> Are you also seeing osds marking themselves down for a little bit and
> then coming back up?  There are 2 very likely problems
> causing/contributing to this.  The first is if you are using a lot of
> snapshots.  Deleting snapshots is a very expensive operation for your
> cluster and can cause a lot of slowness.  The second is PG subfolder
> splitting.  This will show as blocked requests and osds marking
> themselves down and coming back up a little later without any errors in
> the log.  I linked a previous thread where someone was having these
> problems where both causes were investigated.
> 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg36923.html  

We are not seeing OSDs marking themselves down a little bit and coming
back as far as we can tell. We will do some more investigation in to this.

We are creating and deleting quite a few snapshots, is there anything we
can do to make this less expensive? We are going to attempt to create
less snapshots in our systems, but unfortunately we have to create a
fair number due to our use case.

Is slow snapshot deletion likely to cause a slow backlog of purged
snaps? In some cases we are seeing ~40k snaps still in cached_removed_snaps.

> If you have 0.94.9 or 10.2.5 or later, then you can split your PG
> subfolders sanely while your osds are temporarily turned off using the
> 'ceph-objectstore-tool apply-layout-settings'.  There are a lot of ways
> to skin the cat of snap trimming, but it depends greatly on your use case.

We are currently running 10.2.5, and are planning to update to 10.2.9 at
some point soon. Our clients are using the 4.9 kernel RBD driver (which
sort of forces us to keep our snapshot count down below 510), we are
currently testing the possibility of using the nbd-rbd driver as an
alternative.
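For reference, a minimal sketch of mapping an image through the userspace NBD
client (the tool is usually packaged as rbd-nbd; the pool and image names below
are placeholders):

 rbd-nbd map mypool/myimage    # prints the /dev/nbdX device it creates
 rbd-nbd unmap /dev/nbd0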

> On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick  > wrote:
> 
> High CPU utilization and inexplicably slow I/O requests
> 
> We have been having similar performance issues across several ceph
> clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> for a while, but eventually performance worsens and becomes (at first
> intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> request blocked for longer than 32 sec. These slow requests are
> accompanied by "currently waiting for rw locks", but we have not found
> any network issue that normally is responsible for this warning.
> 
> Examining the individual slow OSDs from `ceph health detail` has been
> unproductive; there don't seem to be any slow disks and if we stop the
> OSD the problem just moves somewhere else.
> 
> We also think this trends with increased number of RBDs on the clusters,
> but not necessarily a ton of Ceph I/O. At the same time, user %CPU time
> spikes up to 95-100%, at first frequently and then consistently,
> simultaneously across all cores. We are running 12 OSDs on a 2.2 GHz CPU
> with 6 cores and 64GiB RAM per node.
> 
> ceph1 ~ $ sudo ceph status
> cluster ----
>  health HEALTH_WARN
> 547 requests are blocked > 32 sec
>  monmap e1: 3 mons at
> 
> {cephmon1.XXX=XXX.XXX.XXX.XXX:/0,cephmon1.XXX=XXX.XXX.XXX.XX:/0,cephmon1.XXX=XXX.XXX.XXX.XXX:/0}
> election epoch 16, quorum 0,1,2
> 
> cephmon1.XXX,cephmon1.XXX,cephmon1.XXX
>  osdmap e577122: 72 osds: 68 up, 68 in
> flags sortbitwise,require_jewel_osds
>   pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
> 126 TB used, 368 TB / 494 TB avail
> 4084 active+clean
>   12 active+clean+scrubbing+deep
>   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
> 
> ceph1 ~ $ vmstat 5 5
> procs ---memory-- ---swap-- -io -system--
> --cpu-
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy
> id wa st
> 27  1  0 3112660 165544 3626169200   472  127401 22
> 1 76  1  0
> 25  0  0 3126176 165544 3624650800   858 12692 12122 110478
> 97  2  1  0  0
> 22  0  0 3114284 165544 3625813600 1  6118 9586 118625
> 97  2  1  0  0
> 11  0  0 3096508 165544 3627624400 8  6762 10047 188618
> 89  3  8  0  0
> 18  0  0 2990452 165544 3638404800  1209 21170 11179 179878
> 85  4 11  0  0
> 
> There is no apparent memory shortage, and none of the HDDs or SSDs show
> consistently high utilization, slow service times, or any other form of
> hardware saturation, other than user CPU utilization. Can CPU starvation
> be responsible for "w

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-08-08 Thread Lincoln Bryant
Hi all, 

Apologies for necromancing an old thread, but I was wondering if anyone had any 
more thoughts on this. We're running v10.2.9 now and still have 3 PGs 
exhibiting this behavior in our cache pool after scrubs, deep-scrubs, and 
repair attempts. Some more information below.

Thanks much,
Lincoln


[1]

# rados list-inconsistent-obj 36.14f0 | jq
{
  "epoch": 820795,
  "inconsistents": [
{
  "object": {
"name": "1002378e2a6.0001",
"nspace": "",
"locator": "",
"snap": "head",
"version": 2251698
  },
  "errors": [],
  "union_shard_errors": [
"size_mismatch_oi"
  ],
  "selected_object_info": 
"36:0f29a1d4:::1002378e2a6.0001:head(737930'2208087 
client.36346283.1:5757188 dirty
 s 4136960 uv 2251698 alloc_hint [0 0])",
  "shards": [
{
  "osd": 173,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 242,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
},
{
  "osd": 295,
  "errors": [
"size_mismatch_oi"
  ],
  "size": 0
}
  ]
}
  ]
}

2017-08-08 13:26:23.243245 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 212 missing 36:a13626c6:::1002378e9a9.0001:head
2017-08-08 13:26:23.243250 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 295: soid 36:a13626c6:::1002378e9a9.0001:head size 0 != size 
4173824 from auth oi 36:a13626c6:::1002378e9a9.0001:head(737930'2123468 
client.36346283.1:5782375 dirty s 4173824 uv 2164627 alloc_hint [0 0])
2017-08-08 13:26:23.243253 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 shard 353 missing 36:a13626c6:::1002378e9a9.0001:head
2017-08-08 13:26:23.243255 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 soid 36:a13626c6:::1002378e9a9.0001:head: failed to pick suitable 
auth object
2017-08-08 13:26:23.243362 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
scrub 36.2c85 36:a13626c6:::1002378e9a9.0001:head on disk size (0) does not 
match object info size (4173824) adjusted for ondisk to (4173824)
2017-08-08 13:26:34.310237 7fafac78a700 -1 log_channel(cluster) log [ERR] : 
36.2c85 scrub 4 errors

> On May 15, 2017, at 5:28 PM, Gregory Farnum  wrote:
> 
> 
> 
> On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant  wrote:
> Hi Greg,
> 
> Curiously, some of these scrub errors went away on their own. The example pg 
> in the original post is now active+clean, and nothing interesting in the logs:
> 
> # zgrep "36.277b" ceph-osd.244*gz
> ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170510.gz:2017-05-09 06:58:01.872484 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170511.gz:2017-05-10 20:40:47.536974 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170511.gz:2017-05-10 20:41:38.399614 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170514.gz:2017-05-13 20:49:47.063789 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700  0 
> log_channel(cluster) log [INF] : 36.277b scrub starts
> ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186e28700  0 
> log_channel(cluster) log [INF] : 36.277b scrub ok
> 
> (No matches in the logs for osd 175 and osd 297  — perhaps already rotated 
> away?)
> 
> Other PGs still exhibit this behavior though:
> 
> # rados list-inconsistent-obj 36.2953 | jq .
> {
>   "epoch": 737940,
>   "inconsistents": [
> {
>   "object": {
> "name": "1002378da6c.0001",
> "nspace": "",
> "locator": "",
> "snap": "head",
> "version": 2213621
>   },
>   "errors": [],
>   "union_shard_errors": [
> "size_mismatch_oi"
>   ],
>   "selected_object_info": 
> "36:ca95a23b:::1002378da6c.0001:head(737930'2177823 
> client.36346283.1:5635626 dirty s 4067328 uv 2213621)",
>   "shards": [
> {
>   "osd": 113,
>   "errors": [
> "size_mismatch_oi"
>   ],
>   "size": 0
> },
> {
>   "osd": 123,
>   "errors": [
> "size_mismatch_oi"
>   ],
>   "size": 0
> },
> {
>   "osd": 173,
>   "errors": [
> "size_mismatch_oi"
>   ],
>   "size": 0
> }
>   ]
> }
>   ]
> }
> 
> Perhaps new data being written to this pg cleared things up?
> 
> Hmm, somebody else did report the same thing (and the symptoms disappearing) 
> recently as well.

[ceph-users] Two clusters on same hosts - mirroring

2017-08-08 Thread Oscar Segarra
Hi,

I'd like to use the mirroring feature

http://docs.ceph.com/docs/master/rbd/rbd-mirroring/

In my environment I have just one host (at the moment for testing purposes
before production deployment).

I want to set up:

/dev/sdb for standard operation
/dev/sdc for mirror

Of course, I'd like to create two clusters, each cluster with a pool
"mypool" and enable mirror.

The final idea is to use CephFS to export my VMs to tape in a consistent
state without affecting the production OSD /dev/sdb.

http://docs.ceph.com/docs/master/cephfs/createfs/

Has anybody tried something similar? Can anybody share their experience?

Thanks a lot.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread bstillw...@godaddy.com
Dan,

I set norebalance, do a bunch of reweights, then unset norebalance.  Degraded 
PGs will still recover as long as they're not waiting on one of the PGs that is 
marked as backfilling (which does happen).
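For reference, a sketch of that sequence (the OSD id and weight below are
placeholders, not values from this thread):

 ceph osd set norebalance
 ceph osd crush reweight osd.42 3.64   # repeat for each OSD being adjusted
 ceph osd unset norebalance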

What I believe is happening is that when you change CRUSH weights while PGs are 
actively backfilling, sometimes the backfilling PGs will be remapped again and 
the peering process takes a bit longer (which blocks I/O on those PGs).  
However, when 'norebalance' is set, I believe the peering process is much 
faster which prevents the slow requests.  This is just a guess, so I would love 
for a developer to chime in to confirm whether or not that's the case.

Bryan

From: Dan van der Ster 
Date: Tuesday, August 8, 2017 at 2:06 AM
To: Bryan Stillwell 
Cc: Laszlo Budai , ceph-users 

Subject: Re: [ceph-users] expanding cluster with minimal impact

Hi Bryan,

How does the norebalance procedure work? You set the flag, increase
the weight, then I expect the PGs to stay in remapped unless they're
degraded ... why would a PG be degraded just because of a weight
change? And then what happens when you unset norebalance?

Cheers, Dan


On Mon, Aug 7, 2017 at 6:07 PM, Bryan Stillwell 
mailto:bstillw...@godaddy.com>> wrote:
Dan,

We recently went through an expansion of an RGW cluster and found that we 
needed 'norebalance' set whenever making CRUSH weight changes to avoid slow 
requests.  We were also increasing the CRUSH weight by 1.0 each time which 
seemed to reduce the extra data movement we were seeing with smaller weight 
increases.  Maybe something to try out next time?

Bryan

From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
on behalf of Dan van der Ster mailto:d...@vanderster.com>>
Date: Friday, August 4, 2017 at 1:59 AM
To: Laszlo Budai mailto:las...@componentsoft.eu>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] expanding cluster with minimal impact

Hi Laszlo,

The script defaults are what we used to do a large intervention (the
default delta weight is 0.01). For our clusters going any faster
becomes disruptive, but this really depends on your cluster size and
activity.

BTW, in case it wasn't clear, to use this script for adding capacity
you need to create the new OSDs to your cluster with initial crush
weight = 0.0

osd crush initial weight = 0
osd crush update on start = true

-- Dan



On Thu, Aug 3, 2017 at 8:12 PM, Laszlo Budai 
mailto:las...@componentsoft.eu>> wrote:
Dear all,

I need to expand a ceph cluster with minimal impact. Reading previous
threads on this topic from the list I've found the ceph-gentle-reweight
script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster (Thank you Dan for sharing the script with us!).

I've done some experiments, and it looks promising, but it is needed to
properly set the parameters. Did any of you tested this script before? what
is the recommended delta_weight to be used? From the default parameters of
the script I can see that the default delta weight is .5% of the target
weight, which means 200 reweighting cycles. I have experimented with a
reweight ratio of 5% while running a fio test on a client. The results were
OK (I mean no slow requests), but my  test cluster was a very small one.

If any of you has done some larger experiments with this script I would be
really interested to read about your results.

Thank you!
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] jewel - radosgw-admin bucket limit check broken?

2017-08-08 Thread Sam Wouters
Hi,

I wanted to test the new feature to check the present buckets for
optimal index sharding.
According to the docs this should be as simple as "radosgw-admin -n
client.xxx bucket limit check" with an optional param for printing only
buckets over or nearing the limit.

When I invoke this, however, I simply get the error output:

unrecognized arg limit
usage: radosgw-admin  [options...]

followed by help output.

Tested this with 10.2.8 and 10.2.9; other radosgw-admin commands work fine.
I've looked into the open issues but don't seem to find this in the tracker.

Simple bug or am I completely missing something?

r,
Sam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mon time to form quorum

2017-08-08 Thread Travis Nielsen
At cluster creation I'm seeing that the mons are taking a long time to form 
quorum. It seems like I'm hitting a timeout of 60s somewhere. Am I missing a 
config setting that would help paxos establish quorum sooner? When initializing 
with the monmap I would have expected the mons to initialize very quickly.

The scenario is:

  *   Luminous RC 2
  *   The mons are initialized with a monmap
  *   Running in Kubernetes (Rook)

The symptoms are:

  *   When all three mons start in parallel, they appear to determine their 
rank immediately. I assume this means they establish communication. A log 
message is seen such as this in each of the mon logs:
 *   2017-08-08 17:03:16.383599 7f8da7c85f40  0 
mon.rook-ceph-mon1@-1(probing) e0  my rank is now 0 (was –1)
  *   Now paxos enters a loop that times out every two seconds and lasts about 
60s, trying to probe the other monitors. During this wait, I am able to curl 
the mon endpoints successfully.
 *   2017-08-08 17:03:17.345877 7f02b779af40 10 
mon.rook-ceph-mon0@1(probing) e0 probing other monitors
 *   2017-08-08 17:03:19.346032 7f02ae568700  4 
mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00
  *   After about 60 seconds the probe succeeds and the mons start responding
 *   2017-08-08 17:04:17.356928 7f02ae568700 10 
mon.rook-ceph-mon0@1(probing) e0 probing other monitors
 *   2017-08-08 17:04:17.366587 7f02a855c700 10 
mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon 
protocol 2

The relevant settings in the config are:
mon initial members  = rook-ceph-mon0 rook-ceph-mon1 rook-ceph-mon2
mon host  = 10.0.0.24:6790,10.0.0.163:6790,10.0.0.139:6790
public addr   = 10.0.0.24
cluster addr  = 172.17.0.5
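A hedged way to inspect the probing state while this is happening, assuming the
default admin socket path inside each mon container (daemon names as above);
the 2s interval in the log looks like the default mon_probe_timeout:

 ceph daemon mon.rook-ceph-mon0 mon_status
 ceph daemon mon.rook-ceph-mon0 config get mon_probe_timeout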

The full log for this mon at debug log level 20 can be found here:
https://gist.github.com/travisn/2c2641a6b80a7479b3b22accb41a5193

Any ideas?

Thanks,
Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster experiencing major performance issues

2017-08-08 Thread David Turner
Are you also seeing osds marking themselves down for a little bit and then
coming back up?  There are 2 very likely problems causing/contributing to
this.  The first is if you are using a lot of snapshots.  Deleting
snapshots is a very expensive operation for your cluster and can cause a
lot of slowness.  The second is PG subfolder splitting.  This will show as
blocked requests and osds marking themselves down and coming back up a
little later without any errors in the log.  I linked a previous thread
where someone was having these problems where both causes were investigated.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg36923.html

If you have 0.94.9 or 10.2.5 or later, then you can split your PG
subfolders sanely while your osds are temporarily turned off using the
'ceph-objectstore-tool apply-layout-settings'.  There are a lot of ways to
skin the cat of snap trimming, but it depends greatly on your use case.
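A rough, hedged sketch of that operation (OSD id and pool name are
placeholders; the split/merge behaviour comes from the filestore settings in
the [osd] section of ceph.conf, and the OSD must be stopped first):

 systemctl stop ceph-osd@12
 ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
 --op apply-layout-settings --pool rbd
 systemctl start ceph-osd@12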

On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick 
wrote:

> High CPU utilization and inexplicably slow I/O requests
>
> We have been having similar performance issues across several ceph
> clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> for a while, but eventually performance worsens and becomes (at first
> intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> request blocked for longer than 32 sec. These slow requests are
> accompanied by "currently waiting for rw locks", but we have not found
> any network issue that normally is responsible for this warning.
>
> Examining the individual slow OSDs from `ceph health detail` has been
> unproductive; there don't seem to be any slow disks and if we stop the
> OSD the problem just moves somewhere else.
>
> We also think this trends with increased number of RBDs on the clusters,
> but not necessarily a ton of Ceph I/O. At the same time, user %CPU time
> spikes up to 95-100%, at first frequently and then consistently,
> simultaneously across all cores. We are running 12 OSDs on a 2.2 GHz CPU
> with 6 cores and 64GiB RAM per node.
>
> ceph1 ~ $ sudo ceph status
> cluster ----
>  health HEALTH_WARN
> 547 requests are blocked > 32 sec
>  monmap e1: 3 mons at
>
> {cephmon1.XXX=XXX.XXX.XXX.XXX:/0,cephmon1.XXX=XXX.XXX.XXX.XX:/0,cephmon1.XXX=XXX.XXX.XXX.XXX:/0}
> election epoch 16, quorum 0,1,2
>
> cephmon1.XXX,cephmon1.XXX,cephmon1.XXX
>  osdmap e577122: 72 osds: 68 up, 68 in
> flags sortbitwise,require_jewel_osds
>   pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
> 126 TB used, 368 TB / 494 TB avail
> 4084 active+clean
>   12 active+clean+scrubbing+deep
>   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
>
> ceph1 ~ $ vmstat 5 5
> procs ---memory-- ---swap-- -io -system--
> --cpu-
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy
> id wa st
> 27  1  0 3112660 165544 3626169200   472  127401 22
> 1 76  1  0
> 25  0  0 3126176 165544 3624650800   858 12692 12122 110478
> 97  2  1  0  0
> 22  0  0 3114284 165544 3625813600 1  6118 9586 118625
> 97  2  1  0  0
> 11  0  0 3096508 165544 3627624400 8  6762 10047 188618
> 89  3  8  0  0
> 18  0  0 2990452 165544 3638404800  1209 21170 11179 179878
> 85  4 11  0  0
>
> There is no apparent memory shortage, and none of the HDDs or SSDs show
> consistently high utilization, slow service times, or any other form of
> hardware saturation, other than user CPU utilization. Can CPU starvation
> be responsible for "waiting for rw locks"?
>
> Our main pool (the one with all the data) currently has 1024 PGs,
> leaving us room to add more PGs if needed, but we're concerned if we do
> so that we'd consume even more CPU.
>
> We have moved to running Ceph + jemalloc instead of tcmalloc, and that
> has helped with CPU utilization somewhat, but we still see occurences of
> 95-100% CPU with not terribly high Ceph workload.
>
> Any suggestions of what else to look at? We have a peculiar use case
> where we have many RBDs but only about 1-5% of them are active at the
> same time, and we're constantly making and expiring RBD snapshots. Could
> this lead to aberrant performance? For instance, is it normal to have
> ~40k snaps still in cached_removed_snaps?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH bluestore space consumption with small objects

2017-08-08 Thread Marcus Haarmann
Hi, 

I can check if this would change anything, but we are currently trying to find 
a different solution. 
The issue we ran into when using rados as a backend with a bluestore OSD was
that every object seems to be cached in the OSD, and the memory consumption of
the OSD kept increasing.
This is not very useful for us, since the objects are accessed rarely and have
to exist for a very long period of time.
So we are now looking at rbd with a database on top or a filesystem on top,
which will handle the huge amount of small objects. This has the drawback that
a filesystem or a database can become inconsistent more easily than a
rados-only approach.
Even cephfs was not the right approach since the space consumption would be the 
same as with 
rados directly. 

Thanks to everybody, 

Marcus Haarmann 


Von: "Pavel Shub"  
An: "Gregory Farnum"  
CC: "Wido den Hollander" , "ceph-users" 
, "Marcus Haarmann"  
Gesendet: Dienstag, 8. August 2017 17:50:44 
Betreff: Re: [ceph-users] CEPH bluestore space consumption with small objects 

Marcus, 

You may want to look at the bluestore_min_alloc_size setting as well 
as the respective bluestore_min_alloc_size_ssd and 
bluestore_min_alloc_size_hdd. By default bluestore sets a 64k block 
size for ssds. I'm also using ceph for small objects and I've seen my 
OSD usage go down from 80% to 20% after setting the min alloc size to 
4k. 

Thanks, 
Pavel 

On Thu, Aug 3, 2017 at 3:59 PM, Gregory Farnum  wrote: 
> Don't forget that at those sizes the internal journals and rocksdb size 
> tunings are likely to be a significant fixed cost. 
> 
> On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander  wrote: 
>> 
>> 
>> > Op 2 augustus 2017 om 17:55 schreef Marcus Haarmann 
>> > : 
>> > 
>> > 
>> > Hi, 
>> > we are doing some tests here with a Kraken setup using bluestore backend 
>> > (on Ubuntu 64 bit). 
>> > We are trying to store > 10 mio very small objects using RADOS. 
>> > (no fs, no rdb, only osd and monitors) 
>> > 
>> > The setup was done with ceph-deploy, using the standard bluestore 
>> > option, no separate devices 
>> > for wal. The test cluster spreads over 3 virtual machines, each with 
>> > 100GB storage für osd. 
>> > 
>> > We are now in the following situation (used pool is "test"): 
>> > rados df 
>> > POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRAED 
>> > RD_OPS RD WR_OPS WR 
>> > rbd 0 2 0 6 0 0 0 49452 39618k 855 12358k 
>> > test 17983M 595427 0 1786281 0 0 0 29 77824 596426 17985M 
>> > 
>> > total_objects 595429 
>> > total_used 141G 
>> > total_avail 158G 
>> > total_space 299G 
>> > 
>> > ceph osd df 
>> > ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 
>> > 0 0.09760 1.0 102298M 50763M 51535M 49.62 1.00 72 
>> > 1 0.09760 1.0 102298M 50799M 51499M 49.66 1.00 72 
>> > 2 0.09760 1.0 102298M 50814M 51484M 49.67 1.00 72 
>> > TOTAL 299G 148G 150G 49.65 
>> > MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02 
>> > 
>> > As you can see, there are about 18GB data stored in ~595000 objects now. 
>> > The actual space consumption is about 150GB, which fills about half of 
>> > the storage. 
>> > 
>> 
>> Not really. Each OSD uses 50GB, but since you replicate 3 times (default) 
>> it's storing 150GB spread out over 3 OSDs. 
>> 
>> So your data is 18GB, but consumes 50GB. That's still ~2.5x which is a 
>> lot, but a lot less then 150GB. 
>> 
>> > Objects have been added with a test script using the rados command line 
>> > (put). 
>> > 
>> > Obviously, the stored objects are counted byte by byte in the rados df 
>> > command, 
>> > but the real space allocation is about factor 8. 
>> > 
>> 
>> As written above, it's ~2.5x, not 8x. 
>> 
>> > The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects. 
>> > 
>> > Is there any recommended way to configure bluestore with a better 
>> > suitable 
>> > block size for those small objects ? I cannot find any configuration 
>> > option 
>> > which would allow modification of the internal block handling of 
>> > bluestore. 
>> > Is luminous an option which allows more specific configuration ? 
>> > 
>> 
>> Could you try this with the Luminous RC as well? I don't know the answer 
>> here, but since Kraken a LOT has been improved to BlueStore. 
>> 
>> Wido 
>> 
>> > Thank you all in advance for support. 
>> > 
>> > Marcus Haarmann 
>> > 
>> > ___ 
>> > ceph-users mailing list 
>> > ceph-users@lists.ceph.com 
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> ___ 
>> ceph-users mailing list 
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] bluestore on luminous using ramdisk?

2017-08-08 Thread Gregory Farnum
I've no idea how the setup would go, but there's also a "Memstore" backend.
It's used exclusively for testing, may or may not scale well, and doesn't
have integration with the tooling, but it's got very limited setup (I think
you just start an OSD with the appropriate config options set). You might
want to look at that.
-Greg
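For reference, a minimal sketch of what such a config might look like,
untested here; the memstore device bytes option name and sizing are assumptions
worth checking against your Ceph version:

 [osd]
 osd objectstore = memstore
 # assumed option capping the size of the in-RAM store, in bytes
 memstore device bytes = 4294967296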

On Tue, Aug 8, 2017 at 8:23 AM  wrote:

> Hi
> I’m coming at this with not a lot of ceph experience but some enthusiasm
> so forgive me if this is an inappropriate question but is there any reason
> why it’s not possible, in theory, to setup bluestore using ramdisk?
>
>  In my application I can afford to risk losing all data on system
> failure/reboot/whatever,  but I’m looking at trying to optimise
> performance.  If it is possible what would be the best way to do this? The
> ceph-disk prepare —bluestore works really well on standard spinning drives
> but fails with /dev/ram* devices at the stage where partitions are
> created.  Is this a non-starter or should I pursue further and dig on down
> in the documentation?  I’m using rhel 7 btw which come with sgdisk  0.8.6
> and parted 3.1
>
> Thanks
>
> Matt
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH bluestore space consumption with small objects

2017-08-08 Thread Pavel Shub
Marcus,

You may want to look at the bluestore_min_alloc_size setting as well
as the respective bluestore_min_alloc_size_ssd and
bluestore_min_alloc_size_hdd. By default bluestore sets a 64k block
size for ssds. I'm also using ceph for small objects and I've seen my
OSD usage go down from 80% to 20% after setting the min alloc size to
4k.
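For reference, a sketch of the relevant ceph.conf bits (note that the min
alloc size is baked in when an OSD is created, so it has to be set before
the OSDs are deployed -- existing OSDs would need to be rebuilt):

    [osd]
        # 4k allocation unit for small-object workloads
        bluestore min alloc size = 4096
        # or per device type:
        bluestore min alloc size ssd = 4096
        bluestore min alloc size hdd = 4096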

Thanks,
Pavel

On Thu, Aug 3, 2017 at 3:59 PM, Gregory Farnum  wrote:
> Don't forget that at those sizes the internal journals and rocksdb size
> tunings are likely to be a significant fixed cost.
>
> On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander  wrote:
>>
>>
>> > Op 2 augustus 2017 om 17:55 schreef Marcus Haarmann
>> > :
>> >
>> >
>> > Hi,
>> > we are doing some tests here with a Kraken setup using bluestore backend
>> > (on Ubuntu 64 bit).
>> > We are trying to store > 10 million very small objects using RADOS. 
>> > (no fs, no rdb, only osd and monitors)
>> >
>> > The setup was done with ceph-deploy, using the standard bluestore
>> > option, no separate devices
>> > for wal. The test cluster spreads over 3 virtual machines, each with
>> > 100GB storage for the osd. 
>> >
>> > We are now in the following situation (used pool is "test"):
>> > rados df
>> > POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED 
>> > RD_OPS RD WR_OPS WR
>> > rbd 0 2 0 6 0 0 0 49452 39618k 855 12358k
>> > test 17983M 595427 0 1786281 0 0 0 29 77824 596426 17985M
>> >
>> > total_objects 595429
>> > total_used 141G
>> > total_avail 158G
>> > total_space 299G
>> >
>> > ceph osd df
>> > ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
>> > 0 0.09760 1.0 102298M 50763M 51535M 49.62 1.00 72
>> > 1 0.09760 1.0 102298M 50799M 51499M 49.66 1.00 72
>> > 2 0.09760 1.0 102298M 50814M 51484M 49.67 1.00 72
>> > TOTAL 299G 148G 150G 49.65
>> > MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02
>> >
>> > As you can see, there are about 18GB data stored in ~595000 objects now.
>> > The actual space consumption is about 150GB, which fills about half of
>> > the storage.
>> >
>>
>> Not really. Each OSD uses 50GB, but since you replicate 3 times (default)
>> it's storing 150GB spread out over 3 OSDs.
>>
>> So your data is 18GB, but consumes 50GB. That's still ~2.5x, which is a
>> lot, but a lot less than 150GB.
>>
>> > Objects have been added with a test script using the rados command line
>> > (put).
>> >
>> > Obviously, the stored objects are counted byte by byte in the rados df
>> > command,
>> > but the real space allocation is about a factor of 8.
>> >
>>
>> As written above, it's ~2.5x, not 8x.
>>
>> > The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects.
>> >
>> > Is there any recommended way to configure bluestore with a more
>> > suitable
>> > block size for those small objects ? I cannot find any configuration
>> > option
>> > which would allow modification of the internal block handling of
>> > bluestore.
>> > Is luminous an option which allows more specific configuration ?
>> >
>>
>> Could you try this with the Luminous RC as well? I don't know the answer
>> here, but since Kraken a LOT has been improved in BlueStore.
>>
>> Wido
>>
>> > Thank you all in advance for support.
>> >
>> > Marcus Haarmann
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bluestore on luminous using ramdisk?

2017-08-08 Thread matthew.wells
Hi
I’m coming at this with not a lot of ceph experience but some enthusiasm, so 
forgive me if this is an inappropriate question, but is there any reason why 
it’s not possible, in theory, to set up bluestore using a ramdisk?

In my application I can afford to risk losing all data on system 
failure/reboot/whatever, but I’m looking at trying to optimise performance. 
If it is possible, what would be the best way to do this? The ceph-disk prepare 
--bluestore works really well on standard spinning drives but fails with 
/dev/ram* devices at the stage where partitions are created. Is this a 
non-starter, or should I pursue further and dig on down in the documentation? 
I’m using RHEL 7, btw, which comes with sgdisk 0.8.6 and parted 3.1.
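(One workaround I have been wondering about, completely untested so treat it
as a sketch: back a loop device with a file on tmpfs, so that sgdisk/parted
see an ordinary block device instead of /dev/ram*:

    mkdir -p /mnt/ramdisk
    mount -t tmpfs -o size=20G tmpfs /mnt/ramdisk
    truncate -s 19G /mnt/ramdisk/osd0.img
    losetup /dev/loop0 /mnt/ramdisk/osd0.img
    ceph-disk prepare --bluestore /dev/loop0

No idea yet whether ceph-disk and udev are happy with loop devices, hence the
question.)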

Thanks

Matt




-- 
This e-mail and any attachments may contain confidential, copyright and or 
privileged material, and are for the use of the intended addressee only. If you 
are not the intended addressee or an authorised recipient of the addressee 
please notify us of receipt by returning the e-mail and do not use, copy, 
retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not 
necessarily of Diamond Light Source Ltd. 
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments 
are free from viruses and we cannot accept liability for any damage which you 
may sustain as a result of software viruses which may be transmitted in or with 
the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and 
Wales with its registered office at Diamond House, Harwell Science and 
Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pg inconsistent / export_files error -5

2017-08-08 Thread Marc Roos
 


The --debug indeed comes up with something 
bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device 
location [0x15a017~1000], logical extent 0x0~1000,
 bluestore(/var/lib/ceph/osd/ceph-9) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0xb40b26a7, expected 0x90407f75, device 
location [0x2daea~1000], logical extent 0x0~1000,

I don't know how to interpret this, but am I correct in understanding that 
data has been written across the cluster to these 3 OSDs and all 3 have 
somehow received something different?
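(For what it's worth, the way I would double-check which OSDs that object maps 
to is something like the following -- <pool> being a placeholder for the name 
of pool 17 here:

    ceph osd map <pool> rbd_data.1f114174b0dc51.0974

which prints the placement group and its acting set.)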


size=4194304
object_info: 
17:6ca10b29:::rbd_data.1fff61238e1f29.9923:head(5387'35157 
client.2096993.0:78941 dirty|data_digest|omap_digest s 4194304 uv 35356 
dd f53dff2e od  alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head#
size=4194304
object_info: 
17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.11b4:head(5163'7136 
client.2074638.1:483264 dirty|data_digest|omap_digest s 4194304 uv 7418 
dd 43d61c5d od  alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head#
size=4194304
object_info: 
17:6ca13bed:::rbd_data.1f114174b0dc51.02c6:head(5236'7640 
client.2074638.1:704364 dirty|data_digest|omap_digest s 4194304 uv 7922 
dd 3bcff64d od  alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head#
size=4194304
object_info: 
17:6ca1a791:::rbd_data.1fff61238e1f29.f101:head(5387'35553 
client.2096993.0:123721 dirty|data_digest|omap_digest s 4194304 uv 35752 
dd f9bc0fbd od  alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4#
size=4194304
object_info: 
17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4(5390'56613 
client.2096907.1:3222443 dirty|omap_digest s 4194304 uv 55477 od 
 alloc_hint [0 0 0])
2017-08-08 15:57:45.078348 7fad08fa4100 -1 
bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device 
location [0x15a017~1000], logical extent 0x0~1000, object 
#17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4#
export_files error -5
2017-08-08 15:57:45.081279 7fad08fa4100  1 
bluestore(/var/lib/ceph/osd/ceph-12) umount
2017-08-08 15:57:45.150210 7fad08fa4100  1 freelist shutdown
2017-08-08 15:57:45.150307 7fad08fa4100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] 
Shutdown: canceling all background work
2017-08-08 15:57:45.152099 7fad08fa4100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown 
complete
2017-08-08 15:57:45.184742 7fad08fa4100  1 bluefs umount
2017-08-08 15:57:45.203674 7fad08fa4100  1 bdev(0x7fad0b260e00 
/var/lib/ceph/osd/ceph-12/block) close
2017-08-08 15:57:45.442499 7fad08fa4100  1 bdev(0x7fad0b0a5a00 
/var/lib/ceph/osd/ceph-12/block) close

grep -i export_files strace.out -C 10

814  16:08:19.261144 futex(0x7fffea9378c0, FUTEX_WAKE_PRIVATE, 1) = 0 
<0.10>
6814  16:08:19.261242 futex(0x7f4832bb60bc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 
0x7f4832bb60b8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.12>
6814  16:08:19.261281 madvise(0x7f4843bf, 524288, MADV_DONTNEED 

6815  16:08:19.261382 <... futex resumed> ) = 0 <14.990766>
6814  16:08:19.261412 <... madvise resumed> ) = 0 <0.000123>
6814  16:08:19.261446 madvise(0x7f4843b7, 1048576, MADV_DONTNEED 

6815  16:08:19.261474 futex(0x7f4832bb6038, FUTEX_WAKE_PRIVATE, 1 

6814  16:08:19.261535 <... madvise resumed> ) = 0 <0.67>
6815  16:08:19.261557 <... futex resumed> ) = 0 <0.69>
6815  16:08:19.261647 futex(0x7f4832bb60bc, FUTEX_WAIT_PRIVATE, 45, NULL 

6814  16:08:19.261700 write(2, "export_files error ", 19) = 
19 <0.24>
6814  16:08:19.261774 write(2, "-5", 2) = 2 <0.18>
6814  16:08:19.26184

[ceph-users] One Monitor filling the logs

2017-08-08 Thread Konrad Riedel

Hi Ceph users,

my luminous (ceph version 12.1.1) test cluster is doing fine, except that 
one Monitor is filling the logs:


 -rw-r--r-- 1 ceph ceph 119M Aug  8 15:27 ceph-mon.1.log

ceph-mon.1.log:

2017-08-08 15:57:49.509176 7ff4573c4700  0 log_channel(cluster) log 
[DBG] : Standby manager daemon felix started
2017-08-08 15:57:49.646006 7ff4573c4700  0 log_channel(cluster) log 
[DBG] : Standby manager daemon daniel started
2017-08-08 15:57:49.830046 7ff45d13a700  0 log_channel(cluster) log 
[DBG] : mgrmap e256330: udo(active)
2017-08-08 15:57:51.509410 7ff4573c4700  0 log_channel(cluster) log 
[DBG] : Standby manager daemon felix started
2017-08-08 15:57:51.646269 7ff4573c4700  0 log_channel(cluster) log 
[DBG] : Standby manager daemon daniel started
2017-08-08 15:57:52.054987 7ff45d13a700  0 log_channel(cluster) log 
[DBG] : mgrmap e256331: udo(active)


I've tried to reduce the debug settings ("debug_mon": "0/1",
"debug_monc": "0/1"), but I still get 3 messages per second.
Does anybody know how to mute this?
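(One thing I have not tried yet, so this is only a guess from the config
reference and may be wrong: since the lines above are cluster log [DBG]
entries rather than mon debug output, the cluster log level might be the
relevant knob, e.g.

    ceph daemon mon.1 config set mon_cluster_log_file_level info

Corrections welcome.)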

All log settings (defaults):

{
"name": "mon.1",
"cluster": "ceph",
"debug_none": "0/5",
"debug_lockdep": "0/1",
"debug_context": "0/1",
"debug_crush": "1/1",
"debug_mds": "1/5",
"debug_mds_balancer": "1/5",
"debug_mds_locker": "1/5",
"debug_mds_log": "1/5",
"debug_mds_log_expire": "1/5",
"debug_mds_migrator": "1/5",
"debug_buffer": "0/1",
"debug_timer": "0/1",
"debug_filer": "0/1",
"debug_striper": "0/1",
"debug_objecter": "0/1",
"debug_rados": "0/5",
"debug_rbd": "0/5",
"debug_rbd_mirror": "0/5",
"debug_rbd_replay": "0/5",
"debug_journaler": "0/5",
"debug_objectcacher": "0/5",
"debug_client": "0/5",
"debug_osd": "1/5",
"debug_optracker": "0/5",
"debug_objclass": "0/5",
"debug_filestore": "1/3",
"debug_journal": "1/3",
"debug_ms": "0/5",
"debug_mon": "0/1",
"debug_monc": "0/1",
"debug_paxos": "1/5",
"debug_tp": "0/5",
"debug_auth": "1/5",
"debug_crypto": "1/5",
"debug_finisher": "1/1",
"debug_heartbeatmap": "1/5",
"debug_perfcounter": "1/5",
"debug_rgw": "1/5",
"debug_civetweb": "1/10",
"debug_javaclient": "1/5",
"debug_asok": "1/5",
"debug_throttle": "1/1",
"debug_refs": "0/0",
"debug_xio": "1/5",
"debug_compressor": "1/5",
"debug_bluestore": "1/5",
"debug_bluefs": "1/5",
"debug_bdev": "1/3",
"debug_kstore": "1/5",
"debug_rocksdb": "4/5",
"debug_leveldb": "4/5",
"debug_memdb": "4/5",
"debug_kinetic": "1/5",
"debug_fuse": "1/5",
"debug_mgr": "1/5",
"debug_mgrc": "1/5",
"debug_dpdk": "1/5",
"debug_eventtrace": "1/5",
"host": "felix",

Thanks & regards

Konrad Riedel

--


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All flash ceph witch NVMe and SPDK

2017-08-08 Thread Mike A

> On 7 Aug 2017, at 9:54, Wido den Hollander wrote:
> 
> 
>> Op 3 augustus 2017 om 15:28 schreef Mike A :
>> 
>> 
>> Hello
>> 
>> Our goal is to make the storage as fast as possible. 
>> By now our configuration of 6 servers look like that:
>> * 2 x CPU Intel Gold 6150 20 core 2.4Ghz
>> * 2 x 16 Gb NVDIMM DDR4 DIMM
>> * 6 x 16 Gb RAM DDR4
>> * 6 x Intel DC P4500 4Tb NVMe 2.5"
>> * 2 x Mellanox ConnectX-4 EN Lx 25Gb dualport
>> 
> 
> To get the maximum out of your NVMe you will need higher clocked CPUs. 3.5Ghz 
> or something.
> 
> However, I'm still not convinced you will get the maximum out of your NVMe 
> with Ceph.
> 
> Although you are looking into 'partitioning' your NVMe with SPDK, I would look 
> at fewer cores which are clocked higher.
> 
> Wido
> 

Thanks for the reply.

The server description has an error: it is not 20 cores at 2.4GHz per core, 
but 18 cores at 2.7GHz per core.

Yes, I also think that any CPU I can use will not be enough to get the maximum 
out of these disks.
I think maybe in the future ceph will get better on the CPU consumption front, 
and these CPUs will be able to serve more IOPS than they do now.

For now my idea is to decrease CPU consumption by using SPDK and RDMA. 
I hope that SPDK and RDMA are now ready for production use.
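As far as I understand from the docs (not tested on our hardware yet, so only
a sketch), SPDK is enabled per OSD by pointing bluestore directly at the NVMe
device in ceph.conf, and the ceph binaries have to be built with SPDK support:

    [osd]
        # the string after "spdk:" is the NVMe device serial number (placeholder here)
        bluestore block path = spdk:55cd2e404bd73932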

>> What is the status in ceph of RDMA, NVDIMM access using libpmem, and SPDK?
>> How mature are these technologies in Ceph? Ready for production use?
>> 
>> Mike
>> 
>> 

— 
Mike, runs

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to fix X is an unexpected clone

2017-08-08 Thread Steve Taylor
I encountered this same issue on two different clusters running Hammer 0.94.9 
last week. In both cases I was able to resolve it by deleting (moving) all 
replicas of the unexpected clone manually and issuing a pg repair. Which 
version did you see this on? A call stack for the resulting crash would also be 
interesting, although troubleshooting further is probably less valid and less 
valuable now that you've resolved the problem. It's just a matter of curiosity 
at this point.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Tue, 2017-08-08 at 12:02 +0200, Stefan Priebe - Profihost AG wrote:

Hello Greg,

Am 08.08.2017 um 11:56 schrieb Gregory Farnum:


On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG
<s.pri...@profihost.ag> wrote:

Hello,

how can i fix this one:

2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
3:58654d3d:::rbd_data.106dd406b8b4567.018c:9d455 is an
unexpected clone
2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
pgs repair; 1 scrub errors
2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
scrub errors

If i just delete manually the relevant files ceph is crashing. rados
does not list those at all?

How can i fix this?


You've sent quite a few emails that have this story spread out, and I
think you've tried several different steps to repair it that have been a
bit difficult to track.

It would be helpful if you could put the whole story in one place and
explain very carefully exactly what you saw and how you responded. Stuff
like manually copying around the wrong files, or files without a
matching object info, could have done some very strange things.
Also, basic debugging stuff like what version you're running will help. :)

Also note that since you've said elsewhere you don't need this image, I
don't think it's going to hurt you to leave it like this for a bit
(though it will definitely mess up your monitoring).
-Greg



i'm sorry about that. You're correct.

I was able to fix this just a few minutes ago by using the
ceph-objectstore-tool and the remove operation to remove all the leftover files.

I did this on all OSDs with the problematic pg. After that ceph was able
to fix itself.

A better approach might be for ceph to recover from an unexpected clone by
itself, by just deleting it.

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New install error

2017-08-08 Thread Timothy Wolgemuth
I have a new installation and following the quick start guide at:

http://docs.ceph.com/docs/master/start/quick-ceph-deploy/

Running into the following error in the create-initial step.  See below:



$ ceph-deploy --username ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph-deploy/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /bin/ceph-deploy --username
ceph-deploy mon create-initial
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : ceph-deploy
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create-initial
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph01
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph01 ...
[ceph01][DEBUG ] connection detected need for sudo
[ceph01][DEBUG ] connected to host: ceph-deploy@ceph01
[ceph01][DEBUG ] detect platform information from remote host
[ceph01][DEBUG ] detect machine type
[ceph01][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.3.1611 Core
[ceph01][DEBUG ] determining if provided host has same hostname in remote
[ceph01][DEBUG ] get remote short hostname
[ceph01][DEBUG ] deploying mon to ceph01
[ceph01][DEBUG ] get remote short hostname
[ceph01][DEBUG ] remote hostname: ceph01
[ceph01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph01][DEBUG ] create the mon path if it does not exist
[ceph01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph01/done
[ceph01][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph01][DEBUG ] create the init path if it does not exist
[ceph01][INFO  ] Running command: sudo systemctl enable ceph.target
[ceph01][INFO  ] Running command: sudo systemctl enable ceph-mon@ceph01
[ceph01][INFO  ] Running command: sudo systemctl start ceph-mon@ceph01
[ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
/var/run/ceph/ceph-mon.ceph01.asok mon_status
[ceph01][DEBUG ]

[ceph01][DEBUG ] status for monitor: mon.ceph01
[ceph01][DEBUG ] {
[ceph01][DEBUG ]   "election_epoch": 3,
[ceph01][DEBUG ]   "extra_probe_peers": [
[ceph01][DEBUG ] "192.168.100.11:6789/0"
[ceph01][DEBUG ]   ],
[ceph01][DEBUG ]   "monmap": {
[ceph01][DEBUG ] "created": "2017-08-08 09:00:47.536389",
[ceph01][DEBUG ] "epoch": 1,
[ceph01][DEBUG ] "fsid": "89935cd7-d056-4dcd-80b2-925257811fd6",
[ceph01][DEBUG ] "modified": "2017-08-08 09:00:47.536389",
[ceph01][DEBUG ] "mons": [
[ceph01][DEBUG ]   {
[ceph01][DEBUG ] "addr": "10.135.130.95:6789/0",
[ceph01][DEBUG ] "name": "ceph01",
[ceph01][DEBUG ] "rank": 0
[ceph01][DEBUG ]   }
[ceph01][DEBUG ] ]
[ceph01][DEBUG ]   },
[ceph01][DEBUG ]   "name": "ceph01",
[ceph01][DEBUG ]   "outside_quorum": [],
[ceph01][DEBUG ]   "quorum": [
[ceph01][DEBUG ] 0
[ceph01][DEBUG ]   ],
[ceph01][DEBUG ]   "rank": 0,
[ceph01][DEBUG ]   "state": "leader",
[ceph01][DEBUG ]   "sync_provider": []
[ceph01][DEBUG ] }
[ceph01][DEBUG ]

[ceph01][INFO  ] monitor: mon.ceph01 is running
[ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
/var/run/ceph/ceph-mon.ceph01.asok mon_status
[ceph_deploy.mon][INFO  ] processing monitor mon.ceph01
[ceph01][DEBUG ] connection detected need for sudo
[ceph01][DEBUG ] connected to host: ceph-deploy@ceph01
[ceph01][DEBUG ] detect platform information from remote host
[ceph01][DEBUG ] detect machine type
[ceph01][DEBUG ] find the location of an executable
[ceph01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
/var/run/ceph/ceph-mon.ceph01.asok mon_status
[ceph_deploy.mon][INFO  ] mon.ceph01 monitor has reached quorum!
[ceph_deploy.mon][INFO  ] all initial monitors are running and have formed
quorum
[ceph_deploy.mon][INFO  ] Running gatherkeys...
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory
/tmp/tmpmn4Gzd
[ceph01][DEBUG ] connection detected need for sudo
[ceph01][DEBUG ] connected to host: ceph-deploy@ceph01
[ceph01][DEBUG ] detect platform information from remote host
[ceph01][DEBUG ] detect machine type
[ceph01][DEBUG ] get remote short hostname
[ceph01][DEBUG ] fetch remote file
[ceph01][INFO  ] Running command: sudo /usr/bin

Re: [ceph-users] Re: hammer(0.94.5) librbd dead lock, I want to know how to resolve it

2017-08-08 Thread Jason Dillaman
The hammer release is nearly end-of-life pending the release of
luminous. I wouldn't say it's a bug so much as a consequence of timing
out RADOS operations -- as I stated before, you most likely have
another thread stuck waiting on the cluster while that lock is held,
but you only provided the backtrace for a single thread.

On Tue, Aug 8, 2017 at 2:34 AM, Shilu  wrote:
> rbd_data.259fe1073f804.0929 925696~4096 should_complete: r = -110 
> This is the timeout log; I've included only a few lines of the logfile.
>
> I stopped ceph with "ceph osd pause", then "ceph osd unpause". I use librbd 
> through tgt; it causes a tgt thread to hang, and finally tgt can no longer 
> write data to ceph.
>
>
> I tested this on ceph 10.2.5 and it works well, so I think librbd has a bug 
> on ceph 0.94.5.
>
> My ceph.conf sets rados_mon_op_timeout = 75
>               rados_osd_op_timeout = 75
>               client_mount_timeout = 75
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: 8 August 2017 7:58
> To: shilu 09816 (RD)
> Cc: ceph-users
> Subject: Re: hammer(0.94.5) librbd dead lock, I want to know how to resolve it
>
> I am not sure what you mean by "I stop ceph" (stopped all the OSDs?)
> -- and I am not sure how you are seeing ETIMEDOUT errors on a "rbd_write" 
> call since it should just block assuming you are referring to stopping the 
> OSDs. What is your use-case? Are you developing your own application on top 
> of librbd?
>
> Regardless, I can only assume there is another thread that is blocked while 
> it owns the librbd::ImageCtx::owner_lock.
>
> On Mon, Aug 7, 2017 at 8:35 AM, Shilu  wrote:
>> I write data by rbd_write,when I stop ceph, rbd_write timeout and
>> return
>> -110
>>
>>
>>
>> Then I call rbd_write again, it will deadlock, the code stack is
>> showed below
>>
>>
>>
>>
>>
>>
>>
>> #0  pthread_rwlock_rdlock () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
>>
>> #1  0x7fafbf9f75a0 in RWLock::get_read (this=0x7fafc48e1198) at
>> ./common/RWLock.h:76
>>
>> #2  0x7fafbfa31de0 in RLocker (lock=..., this=)
>> at
>> ./common/RWLock.h:130
>>
>> #3  librbd::aio_write (ictx=0x7fafc48e1000, off=71516229632, len=4096,
>>
>> buf=0x7fafc499e000 "\235?[\257\367n\255\263?\200\034\061\341\r",
>> c=0x7fafab44ef80, op_flags=0) at librbd/internal.cc:3320
>>
>> #4  0x7fafbf9eff19 in Context::complete (this=0x7fafab4174c0,
>> r=) at ./include/Context.h:65
>>
>> #5  0x7fafbfb00016 in ThreadPool::worker (this=0x7fafc4852c40,
>> wt=0x7fafc4948550) at common/WorkQueue.cc:128
>>
>> #6  0x7fafbfb010b0 in ThreadPool::WorkThread::entry
>> (this=> out>) at common/WorkQueue.h:408
>>
>> #7  0x7fafc59b6184 in start_thread (arg=0x7fafadbed700) at
>> pthread_create.c:312
>>
>> #8  0x7fafc52aaffd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> --
>> ---
>> This e-mail and its attachments contain confidential information from
>> New H3C, which is intended only for the person or entity whose address
>> is listed above. Any use of the information contained herein in any
>> way (including, but not limited to, total or partial disclosure,
>> reproduction, or dissemination) by persons other than the intended
>> recipient(s) is prohibited. If you receive this e-mail in error,
>> please notify the sender by phone or email immediately and delete it!
>
>
>
> --
> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] implications of losing the MDS map

2017-08-08 Thread John Spray
On Tue, Aug 8, 2017 at 1:51 AM, Daniel K  wrote:
> I finally figured out how to get the ceph-monstore-tool (compiled from
> source) and am ready to attemp to recover my cluster.
>
> I have one question -- in the instructions,
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
> under Recovery from OSDs, Known limitations:
>
> ->
>
> MDS Maps: the MDS maps are lost.
>
>
> What are the implications of this? Do I just need to rebuild this, or is
> there a data loss component to it? -- Is my data stored in CephFS still
> safe?

It depends.  If you just had a single active MDS, then you can
probably get back to a working state by just doing an "fs new"
pointing at your existing pools, followed by an "fs reset" to make it
skip the "creating" phase.  Make sure you do not have any MDS daemons
running until after you have done the fs reset.
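Roughly something like this (only a sketch -- substitute your own filesystem
and pool names, and double-check the flags against your release):

    ceph fs new cephfs cephfs_metadata cephfs_data --force
    ceph fs reset cephfs --yes-i-really-mean-it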

If you had multiple active MDS daemons, then you would need to use the
disaster recovery tools to try and salvage their metadata before
resetting the mds map.

John

>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW - Unable to delete bucket with radosgw-admin

2017-08-08 Thread Andreas Calminder
Hi,
I'm running into a weird issue while trying to delete a bucket with
radosgw-admin

#  radosgw-admin --cluster ceph bucket rm --bucket=12856/weird_bucket
--purge-objects

This returns almost instantly even though the bucket contains +1M
objects, and the bucket isn't removed. Running the above command with debug
flags (--debug-rgw=20 --debug-ms 20), I notice the session closing down
after encountering:
2017-08-08 10:51:52.032946 7f8a9caf4700 10 -- CLIENT_IP:0/482026554 >>
ENDPOINT_IP:6800/5740 pipe(0x7f8ac2acc8c0 sd=7 :3482 s=2 pgs=7856733
cs=1 l=1 c=0x7f8ac2acb3a0).reader got message 8 0x7f8a64001640
osd_op_reply(218
be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_a_weird_object
[getxattrs,stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7
2017-08-08 10:51:52.032970 7f8a9caf4700  1 -- CLIENT_IP:0/482026554
<== osd.47 ENDPOINT_IP:6800/5740 8  osd_op_reply(218
be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_a_weird_object
[getxattrs,stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7
 317+0+0 (3298345941 0 0) 0x7f8a64001640 con 0x7f8ac2acb3a0

If I understand the output correctly, the file wasn't found and the
session was closed down. The radosgw-admin command doesn't hint that
anything bad has happened though.

Has anyone seen this behaviour or anything similar? Any pointers on how to
fix it? I just want to get rid of the bucket since it's both
over-sized and unused.
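(For completeness, two checks I am planning to run next -- not done yet, so
treat them as a sketch -- in case an inconsistent bucket index is what makes
the rm return early:

    radosgw-admin --cluster ceph bucket stats --bucket=12856/weird_bucket
    radosgw-admin --cluster ceph bucket check --fix --bucket=12856/weird_bucket
)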

Best regards,
Andreas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to fix X is an unexpected clone

2017-08-08 Thread Stefan Priebe - Profihost AG
Hello Greg,

Am 08.08.2017 um 11:56 schrieb Gregory Farnum:
> On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG
> <s.pri...@profihost.ag> wrote:
> 
> Hello,
> 
> how can i fix this one:
> 
> 2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
> 3:58654d3d:::rbd_data.106dd406b8b4567.018c:9d455 is an
> unexpected clone
> 2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
> pgs repair; 1 scrub errors
> 2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
> 2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
> scrub errors
> 
> If i just delete manually the relevant files ceph is crashing. rados
> does not list those at all?
> 
> How can i fix this?
> 
> 
> You've sent quite a few emails that have this story spread out, and I
> think you've tried several different steps to repair it that have been a
> bit difficult to track.
> 
> It would be helpful if you could put the whole story in one place and
> explain very carefully exactly what you saw and how you responded. Stuff
> like manually copying around the wrong files, or files without a
> matching object info, could have done some very strange things.
> Also, basic debugging stuff like what version you're running will help. :)
> 
> Also note that since you've said elsewhere you don't need this image, I
> don't think it's going to hurt you to leave it like this for a bit
> (though it will definitely mess up your monitoring).
> -Greg

i'm sorry about that. You're correct.

I was able to fix this just a few minutes ago by using the
ceph-objectstore-tool and the remove operation to remove all the leftover files.

I did this on all OSDs with the problematic pg. After that ceph was able
to fix itself.
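For the record, the commands involved were roughly of this shape on each
affected OSD (a sketch only -- the object name comes from the scrub error,
the OSD must be stopped first, and filestore OSDs may also need --journal-path):

    systemctl stop ceph-osd@20
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 --pgid 3.61a \
        --op list <object-name-from-the-scrub-error>
    # copy the JSON line describing the unexpected clone from the output, then:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 '<json-from-list>' remove
    systemctl start ceph-osd@20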

A better approach might be for ceph to recover from an unexpected clone by
itself, by just deleting it.

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to fix X is an unexpected clone

2017-08-08 Thread Gregory Farnum
On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> Hello,
>
> how can i fix this one:
>
> 2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
> 3:58654d3d:::rbd_data.106dd406b8b4567.018c:9d455 is an
> unexpected clone
> 2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
> pgs repair; 1 scrub errors
> 2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
> 2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
> scrub errors
>
> If i just delete manually the relevant files ceph is crashing. rados
> does not list those at all?
>
> How can i fix this?
>

You've sent quite a few emails that have this story spread out, and I think
you've tried several different steps to repair it that have been a bit
difficult to track.

It would be helpful if you could put the whole story in one place and
explain very carefully exactly what you saw and how you responded. Stuff
like manually copying around the wrong files, or files without a matching
object info, could have done some very strange things.
Also, basic debugging stuff like what version you're running will help. :)

Also note that since you've said elsewhere you don't need this image, I
don't think it's going to hurt you to leave it like this for a bit (though
it will definitely mess up your monitoring).
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Running commands on Mon or OSD nodes

2017-08-08 Thread Osama Hasebou
Hi Everyone, 

I was trying to run the ceph osd crush reweight command to move data out of one 
node that has hardware failures and I noticed that as I set the crush reweight 
to 0, some nodes would reflect it when I do ceph osd tree and some wouldn't. 

What is the proper way to run commands across the cluster? Does one need to run 
the same *ceph osd crush reweight* command from all mon nodes so that it gets 
pushed down to the whole osd tree and updates the crush map, or is it also ok to 
run it once on an osd node and have it copied across the other nodes to update 
the crush map? 

Thank you! 

Regards, 
Ossi 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread Dan van der Ster
Hi Bryan,

How does the norebalance procedure work? You set the flag, increase
the weight, then I expect the PGs to stay in remapped unless they're
degraded ... why would a PG be degraded just because of a weight
change? And then what happens when you unset norebalance?
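(For reference, the sequence I am imagining is something like the sketch
below -- please correct me if that is not what you did:

    ceph osd set norebalance
    ceph osd crush reweight osd.<id> <new-weight>   # repeat for each new OSD
    # wait for peering to settle, then
    ceph osd unset norebalance
)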

Cheers, Dan


On Mon, Aug 7, 2017 at 6:07 PM, Bryan Stillwell  wrote:
> Dan,
>
> We recently went through an expansion of an RGW cluster and found that we 
> needed 'norebalance' set whenever making CRUSH weight changes to avoid slow 
> requests.  We were also increasing the CRUSH weight by 1.0 each time which 
> seemed to reduce the extra data movement we were seeing with smaller weight 
> increases.  Maybe something to try out next time?
>
> Bryan
>
> From: ceph-users  on behalf of Dan van der 
> Ster 
> Date: Friday, August 4, 2017 at 1:59 AM
> To: Laszlo Budai 
> Cc: ceph-users 
> Subject: Re: [ceph-users] expanding cluster with minimal impact
>
> Hi Laszlo,
>
> The script defaults are what we used to do a large intervention (the
> default delta weight is 0.01). For our clusters going any faster
> becomes disruptive, but this really depends on your cluster size and
> activity.
>
> BTW, in case it wasn't clear, to use this script for adding capacity
> you need to create the new OSDs to your cluster with initial crush
> weight = 0.0
>
> osd crush initial weight = 0
> osd crush update on start = true
>
> -- Dan
>
>
>
> On Thu, Aug 3, 2017 at 8:12 PM, Laszlo Budai  wrote:
> Dear all,
>
> I need to expand a ceph cluster with minimal impact. Reading previous
> threads on this topic from the list I've found the ceph-gentle-reweight
> script
> (https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
> created by Dan van der Ster (Thank you Dan for sharing the script with us!).
>
> I've done some experiments, and it looks promising, but the parameters need to
> be set properly. Did any of you test this script before? What
> is the recommended delta_weight to be used? From the default parameters of
> the script I can see that the default delta weight is .5% of the target
> weight that means 200 reweighting cycles. I have experimented with a
> reweight ratio of 5% while running a fio test on a client. The results were
> OK (I mean no slow requests), but my  test cluster was a very small one.
>
> If any of you has done some larger experiments with this script I would be
> really interested to read about your results.
>
> Thank you!
> Laszlo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread Laszlo Budai

Hi Dan,

Thank you for your answer. Yes, I understood that I need to have the initial 
crush weight at 0. That's how I tested when manually adding OSDs in my test 
cluster. I see that with the settings you mentioned, OSDs added using the 
ceph-disk tool will also have a crush weight of 0, so I could use it with the 
chef cookbook. Thank you!!!

Kind regards,
Laszlo


On 04.08.2017 10:58, Dan van der Ster wrote:

Hi Laszlo,

The script defaults are what we used to do a large intervention (the
default delta weight is 0.01). For our clusters going any faster
becomes disruptive, but this really depends on your cluster size and
activity.

BTW, in case it wasn't clear, to use this script for adding capacity
you need to create the new OSDs to your cluster with initial crush
weight = 0.0

osd crush initial weight = 0
osd crush update on start = true

-- Dan



On Thu, Aug 3, 2017 at 8:12 PM, Laszlo Budai  wrote:

Dear all,

I need to expand a ceph cluster with minimal impact. Reading previous
threads on this topic from the list I've found the ceph-gentle-reweight
script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster (Thank you Dan for sharing the script with us!).

I've done some experiments, and it looks promising, but the parameters need to
be set properly. Did any of you test this script before? What
is the recommended delta_weight to be used? From the default parameters of
the script I can see that the default delta weight is .5% of the target
weight that means 200 reweighting cycles. I have experimented with a
reweight ratio of 5% while running a fio test on a client. The results were
OK (I mean no slow requests), but my  test cluster was a very small one.

If any of you has done some larger experiments with this script I would be
really interested to read about your results.

Thank you!
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] expanding cluster with minimal impact

2017-08-08 Thread Laszlo Budai


Hello,
Thank you all for sharing your experiences and thoughts. One more question: 
regarding the pool used for the measurement (the -p option of the script), is 
it recommended to create a new pool for this, or could I use one of our already 
existing pools in the cluster?

Thank you,
Laszlo


On 07.08.2017 19:07, Bryan Stillwell wrote:

Dan,

We recently went through an expansion of an RGW cluster and found that we 
needed 'norebalance' set whenever making CRUSH weight changes to avoid slow 
requests.  We were also increasing the CRUSH weight by 1.0 each time which 
seemed to reduce the extra data movement we were seeing with smaller weight 
increases.  Maybe something to try out next time?

Bryan

From: ceph-users  on behalf of Dan van der Ster 

Date: Friday, August 4, 2017 at 1:59 AM
To: Laszlo Budai 
Cc: ceph-users 
Subject: Re: [ceph-users] expanding cluster with minimal impact

Hi Laszlo,

The script defaults are what we used to do a large intervention (the
default delta weight is 0.01). For our clusters going any faster
becomes disruptive, but this really depends on your cluster size and
activity.

BTW, in case it wasn't clear, to use this script for adding capacity
you need to create the new OSDs to your cluster with initial crush
weight = 0.0

osd crush initial weight = 0
osd crush update on start = true

-- Dan



On Thu, Aug 3, 2017 at 8:12 PM, Laszlo Budai  wrote:
Dear all,

I need to expand a ceph cluster with minimal impact. Reading previous
threads on this topic from the list I've found the ceph-gentle-reweight
script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster (Thank you Dan for sharing the script with us!).

I've done some experiments, and it looks promising, but the parameters need to
be set properly. Did any of you test this script before? What
is the recommended delta_weight to be used? From the default parameters of
the script I can see that the default delta weight is .5% of the target
weight that means 200 reweighting cycles. I have experimented with a
reweight ratio of 5% while running a fio test on a client. The results were
OK (I mean no slow requests), but my  test cluster was a very small one.

If any of you has done some larger experiments with this script I would be
really interested to read about your results.

Thank you!
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com