[ceph-users] BUG 14154 on erasure coded PG

2016-09-09 Thread Gerd Jakobovitsch

Dear all,

I am using an erasure coded pool, and I have run into a situation where I'm
not able to recover a PG. The OSDs that contain this PG keep crashing, with
the same behavior reported at http://tracker.ceph.com/issues/14154.


I'm using ceph 0.94.9 (the problem first appeared on 0.94.7; the upgrade
didn't solve it) on CentOS 7.2, kernel 3.10.0-327.18.2.el7.x86_64.


My EC profile:

directory=/usr/lib64/ceph/erasure-code
k=3
m=2
plugin=isa

Is this issue being handled? Is there any hint on how to handle it?
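
(For anyone hitting the same crash, a minimal set of commands to capture the
relevant state before touching anything -- the PG id 7.1f, the profile name
and the OSD paths below are placeholders, not values from this cluster:)

  ceph health detail | grep -i down
  ceph pg 7.1f query > pg-7.1f-query.json
  ceph osd erasure-code-profile get myprofile
  # with the OSD stopped, export the PG shard before any destructive step
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --journal-path /var/lib/ceph/osd/ceph-12/journal \
      --pgid 7.1fs0 --op export --file /tmp/pg-7.1fs0.export
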
--




--

The information contained in this message is CONFIDENTIAL, protected by legal
privilege and by copyright. Disclosure, distribution, reproduction, or any
other use of the contents of this document requires authorization from the
sender; violators are subject to legal penalties. If you have received this
communication in error, please notify us immediately by replying to this
message.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ubuntu latest ceph-deploy fails to install hammer

2016-09-09 Thread Alex Gorbachev
Confirmed - older version of ceph-deploy is working fine.  Odd as
there is a large number of Hammer users out there.  Thank you for the
explanation and fix.
--
Alex Gorbachev
Storcium


On Fri, Sep 9, 2016 at 12:15 PM, Vasu Kulkarni  wrote:
> There is a known issue with the latest ceph-deploy and *hammer*; the
> package split introduced in releases after *hammer* is the root cause.
> If you use ceph-deploy 1.5.25 (the older version) it will work. You can
> get 1.5.25 from PyPI.
>
> http://tracker.ceph.com/issues/17128
>
> On Fri, Sep 9, 2016 at 8:28 AM, Shain Miley  wrote:
>> Alex,
>> I ran into this issue yesterday as well.
>>
>> I ended up just installing ceph via apt-get locally on the new server.
>>
>> I have not been able to get an actual osd added to the cluster at this point 
>> though (see my emails over the last 2 days or so).
>>
>> Please let me know if you end up able to add an osd properly with 1.5.35.
>>
>> Thanks,
>>
>> Shain
>>
>>> On Sep 9, 2016, at 11:12 AM, Alex Gorbachev  
>>> wrote:
>>>
>>> This problem seems to occur with the latest ceph-deploy version 1.5.35
>>>
>>> [lab2-mon3][DEBUG ] Fetched 5,382 kB in 4s (1,093 kB/s)
>>> [lab2-mon3][DEBUG ] Reading package lists...
>>> [lab2-mon3][INFO  ] Running command: env
>>> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
>>> --assume-yes -q --no-install-recommends install -o
>>> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
>>> [lab2-mon3][DEBUG ] Reading package lists...
>>> [lab2-mon3][DEBUG ] Building dependency tree...
>>> [lab2-mon3][DEBUG ] Reading state information...
>>> [lab2-mon3][WARNIN] E: Unable to locate package ceph-osd
>>> [lab2-mon3][WARNIN] E: Unable to locate package ceph-mon
>>> [lab2-mon3][ERROR ] RuntimeError: command returned non-zero exit status: 100
>>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
>>> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
>>> --assume-yes -q --no-install-recommends install -o
>>> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
>>>
>>> --
>>> Alex Gorbachev
>>> Storcium
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw meta pool

2016-09-09 Thread Casey Bodley

Hi,

My (limited) understanding of this metadata heap pool is that it's an 
archive of metadata entries and their versions. According to Yehuda, 
this was intended to support recovery operations by reverting specific 
metadata objects to a previous version. But nothing has been implemented 
so far, and I'm not aware of any plans to do so. So these objects are 
being created, but never read or deleted.


This was discussed in the rgw standup this morning, and we agreed that 
this archival should be made optional (and default to off), most likely 
by assigning an empty pool name to the zone's 'metadata_heap' field. 
I've created a ticket at http://tracker.ceph.com/issues/17256 to track 
this issue.
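
For reference, if/when that change lands, the workflow for an existing zone
would presumably look something like the following (untested sketch; the zone
name "default" is an assumption):

  radosgw-admin zone get --rgw-zone=default > zone.json
  # edit zone.json and set "metadata_heap": "" to disable the archive
  radosgw-admin zone set --rgw-zone=default --infile zone.json
  radosgw-admin period update --commit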


Casey


On 09/09/2016 11:01 AM, Warren Wang - ISD wrote:

A little extra context here. Currently the metadata pool looks like it is
on track to exceed the number of objects in the data pool, over time. In a
brand new cluster, we're already up to almost 2 million in each pool.

 NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
 default.rgw.buckets.data  17  3092G  0.86   345T       2013585
 default.rgw.meta          25  743M   0      172T       1975937

We're concerned this will be unmanageable over time.

Warren Wang


On 9/9/16, 10:54 AM, "ceph-users on behalf of Pavan Rallabhandi"
 wrote:


Any help on this is much appreciated; I am considering fixing this myself,
given it's a confirmed issue, unless I am missing something obvious.

Thanks,
-Pavan.

On 9/8/16, 5:04 PM, "ceph-users on behalf of Pavan Rallabhandi"
 wrote:

Trying it one more time on the users list.

In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool
running into a large number of objects, potentially in the same range as the
number of objects contained in the data pool.

I understand that the immutable metadata entries are now stored in
this heap pool, but I couldn't reason out why the metadata objects are
left in this pool even after the actual bucket/object/user deletions.

put_entry() seems to promptly store them in the heap pool
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880),
but I never see them being reaped. Are they left there for some reason?

Thanks,

-Pavan.


___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] help on keystone v3 ceph.conf in Jewel

2016-09-09 Thread LOPEZ Jean-Charles
Hi,

From the log file it looks like librbd.so doesn't contain a specific entry
point that needs to be called. See my comment inline.

Have you upgraded the ceph client packages on the cinder node and on the nova
compute node? Or did you only upgrade the ceph nodes?
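
A quick way to check for that kind of client-side version mismatch on the
cinder and nova nodes (generic Debian/Ubuntu commands; library paths may
differ on your install):

  dpkg -l | grep -E 'librados2|librbd1|python-rbd|python-rados'
  # check whether the installed librados actually exports the symbol the
  # error message complains about
  nm -D /usr/lib/librados.so.2 | grep aio_watch_flush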

JC

> On Sep 9, 2016, at 09:37, Robert Duncan  wrote:
> 
> Hi,
> 
> I have deployed the Mirantis distribution of OpenStack Mitaka, which comes
> with Ceph Hammer. Since I want to use keystone v3 with radosgw, I added the
> Ubuntu cloud archive for Mitaka on Trusty,
> and then followed the upgrade instructions (had to remove the mos sources
> from sources.list).
> 
> Anyway, the upgrade looks to have gone okay and I am now on jewel, but rbd and
> rgw have stopped working in the cloud - is this down to my ceph.conf?
> 
> There are no clues in the keystone logs
> 
> 
> 
> [global]
> fsid = 5d587e15-5904-4fd2-84db-b4038c18e327
> mon_initial_members = node-10
> mon_host = 172.25.80.4
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.25.80.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.25.80.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> setuser match path = /var/lib/ceph/$type/$cluster-$id
> 
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
> 
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 100
> rgw_keystone_url = http://172.25.90.5:35357
> rgw_keystone_admin_token = iaUKRVcU6dSa8xuJvJiZYkEZ
> host = node-10
> rgw_dns_name = *.domain.local
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
> 
> Cinder throws the following error:
> 
> 9 16:01:26 node-10 cinder-volume: 2016-09-09 16:01:26.026 3759 ERROR 
> oslo_messaging.rpc.dispatcher [req-c88086a3-3d6b-42a3-9670-c4c92909423c 
> 9f4bf81c57214f88bced5e233061e71e 1cb2488ad03541df8f122b6f4907c820 - - -] 
> Exception during message handling: /usr/lib/librbd.so.1: undefined symbol: 
> _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher Traceback 
> (most recent call last):
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
> 138, in _dispatch_and_reply
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher 
> incoming.message))
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
> 185, in _dispatch
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher return 
> self._do_dispatch(endpoint, method, ctxt, args)
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
> 127, in _do_dispatch
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher result = 
> func(ctxt, **new_args)
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 631, in 
> create_volume
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher 
> _run_flow()
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 618, in 
> _run_flow
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher 
> flow_engine.run()
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
> "/usr/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", 
> line 224, in run
> 2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher for 
> _state in self.run_iter():
> <155>Sep  9 16:01:26 node-10 cinder-scheduler: 2016-09-09 16:01:26.167 4008 
> ERROR cinder.scheduler.filter_scheduler 
> [req-c88086a3-3d6b-42a3-9670-c4c92909423c 9f4bf81c57214f88bced5e233061e71e 
> 1cb2488ad03541df8f122b6f4907c820 - - -] Error scheduling None from last 
> vol-service: rbd:volumes@RBD-backend#RBD-backend : [u'Traceback (most recent 
> call last):\n', u'  File 
> "/usr/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py",
>  line 82, in _execute_task\nresult = task.execute(**arguments)\n', u'  
> File 
> 

[ceph-users] help on keystone v3 ceph.conf in Jewel

2016-09-09 Thread Robert Duncan
Hi,

I have deployed the Mirantis distribution of OpenStack Mitaka, which comes with
Ceph Hammer. Since I want to use keystone v3 with radosgw, I added the Ubuntu
cloud archive for Mitaka on Trusty,
and then followed the upgrade instructions (had to remove the mos sources from
sources.list).

Anyway, the upgrade looks to have gone okay and I am now on jewel, but rbd and
rgw have stopped working in the cloud - is this down to my ceph.conf?

There are no clues in the keystone logs



[global]
fsid = 5d587e15-5904-4fd2-84db-b4038c18e327
mon_initial_members = node-10
mon_host = 172.25.80.4
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.25.80.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.25.80.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1
setuser match path = /var/lib/ceph/$type/$cluster-$id

[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100
rgw_keystone_url = http://172.25.90.5:35357
rgw_keystone_admin_token = iaUKRVcU6dSa8xuJvJiZYkEZ
host = node-10
rgw_dns_name = *.domain.local
rgw_print_continue = True
rgw_keystone_token_cache_size = 10
rgw_data = /var/lib/ceph/radosgw
user = www-data

Cinder throws the following error:

9 16:01:26 node-10 cinder-volume: 2016-09-09 16:01:26.026 3759 ERROR 
oslo_messaging.rpc.dispatcher [req-c88086a3-3d6b-42a3-9670-c4c92909423c 
9f4bf81c57214f88bced5e233061e71e 1cb2488ad03541df8f122b6f4907c820 - - -] 
Exception during message handling: /usr/lib/librbd.so.1: undefined symbol: 
_ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher Traceback 
(most recent call last):
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, 
in _dispatch_and_reply
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher 
incoming.message))
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, 
in _dispatch
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher return 
self._do_dispatch(endpoint, method, ctxt, args)
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, 
in _do_dispatch
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher result = 
func(ctxt, **new_args)
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 631, in 
create_volume
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher _run_flow()
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 618, in 
_run_flow
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher 
flow_engine.run()
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher   File 
"/usr/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", 
line 224, in run
2016-09-09 16:01:26.026 3759 ERROR oslo_messaging.rpc.dispatcher for _state 
in self.run_iter():
<155>Sep  9 16:01:26 node-10 cinder-scheduler: 2016-09-09 16:01:26.167 4008 
ERROR cinder.scheduler.filter_scheduler 
[req-c88086a3-3d6b-42a3-9670-c4c92909423c 9f4bf81c57214f88bced5e233061e71e 
1cb2488ad03541df8f122b6f4907c820 - - -] Error scheduling None from last 
vol-service: rbd:volumes@RBD-backend#RBD-backend : [u'Traceback (most recent 
call last):\n', u'  File 
"/usr/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", 
line 82, in _execute_task\nresult = task.execute(**arguments)\n', u'  File 
"/usr/lib/python2.7/dist-packages/cinder/volume/flows/manager/create_volume.py",
 line 819, in execute\n**volume_spec)\n', u'  File 
"/usr/lib/python2.7/dist-packages/cinder/volume/flows/manager/create_volume.py",
 line 797, in _create_raw_volume\nreturn 
self.driver.create_volume(volume_ref)\n', u'  File 
"/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 551, in 
create_volume\nself.RBDProxy().create(client.ioctx,\n', u'  File "/usr/lib
 /python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 305, in 
RBDProxy\nreturn tpool.Proxy(self.rbd.RBD())\n', u'  File 

Re: [ceph-users] Ubuntu latest ceph-deploy fails to install hammer

2016-09-09 Thread Vasu Kulkarni
There is a known issue with the latest ceph-deploy and *hammer*; the
package split introduced in releases after *hammer* is the root cause.
If you use ceph-deploy 1.5.25 (the older version) it will work. You can
get 1.5.25 from PyPI.

http://tracker.ceph.com/issues/17128
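
For reference, one way to pin the older release from PyPI without touching
the distro package (the virtualenv is just a suggestion):

  virtualenv ~/ceph-deploy-1.5.25 && . ~/ceph-deploy-1.5.25/bin/activate
  pip install ceph-deploy==1.5.25
  ceph-deploy --version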

On Fri, Sep 9, 2016 at 8:28 AM, Shain Miley  wrote:
> Alex,
> I ran into this issue yesterday as well.
>
> I ended up just installing ceph via apt-get locally on the new server.
>
> I have not been able to get an actual osd added to the cluster at this point 
> though (see my emails over the last 2 days or so).
>
> Please let me know if you end up able to add an osd properly with 1.5.35.
>
> Thanks,
>
> Shain
>
>> On Sep 9, 2016, at 11:12 AM, Alex Gorbachev  wrote:
>>
>> This problem seems to occur with the latest ceph-deploy version 1.5.35
>>
>> [lab2-mon3][DEBUG ] Fetched 5,382 kB in 4s (1,093 kB/s)
>> [lab2-mon3][DEBUG ] Reading package lists...
>> [lab2-mon3][INFO  ] Running command: env
>> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
>> --assume-yes -q --no-install-recommends install -o
>> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
>> [lab2-mon3][DEBUG ] Reading package lists...
>> [lab2-mon3][DEBUG ] Building dependency tree...
>> [lab2-mon3][DEBUG ] Reading state information...
>> [lab2-mon3][WARNIN] E: Unable to locate package ceph-osd
>> [lab2-mon3][WARNIN] E: Unable to locate package ceph-mon
>> [lab2-mon3][ERROR ] RuntimeError: command returned non-zero exit status: 100
>> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
>> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
>> --assume-yes -q --no-install-recommends install -o
>> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
>>
>> --
>> Alex Gorbachev
>> Storcium
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] virtio-blk multi-queue support and RBD devices?

2016-09-09 Thread Jason Dillaman
On Fri, Sep 9, 2016 at 10:33 AM, Alexandre DERUMIER  wrote:
> The main bottleneck with rbd currently is cpu usage (limited to 1 iothread
> per disk)

Yes, definitely a bottleneck -- but you can bypass the librbd IO
dispatch thread by setting "rbd_non_blocking_aio = false" in your Ceph
client config.
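
For anyone who wants to try that, a sketch of the client-side setting -- it
belongs in ceph.conf on the hypervisor/librbd client, not on the OSD nodes,
and is worth benchmarking before relying on it:

  [client]
  rbd_non_blocking_aio = false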

Our long term librbd goal is to remove that IO dispatch thread once we
can get a non-blocking IO path from end-to-end.

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ubuntu latest ceph-deploy fails to install hammer

2016-09-09 Thread Shain Miley
Alex,
I ran into this issue yesterday as well.

I ended up just installing ceph via apt-get locally on the new server.

I have not been able to get an actual osd added to the cluster at this point 
though (see my emails over the last 2 days or so).

Please let me know if you end up able to add an osd properly with 1.5.35.

Thanks,

Shain

> On Sep 9, 2016, at 11:12 AM, Alex Gorbachev  wrote:
> 
> This problem seems to occur with the latest ceph-deploy version 1.5.35
> 
> [lab2-mon3][DEBUG ] Fetched 5,382 kB in 4s (1,093 kB/s)
> [lab2-mon3][DEBUG ] Reading package lists...
> [lab2-mon3][INFO  ] Running command: env
> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
> --assume-yes -q --no-install-recommends install -o
> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
> [lab2-mon3][DEBUG ] Reading package lists...
> [lab2-mon3][DEBUG ] Building dependency tree...
> [lab2-mon3][DEBUG ] Reading state information...
> [lab2-mon3][WARNIN] E: Unable to locate package ceph-osd
> [lab2-mon3][WARNIN] E: Unable to locate package ceph-mon
> [lab2-mon3][ERROR ] RuntimeError: command returned non-zero exit status: 100
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
> DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
> --assume-yes -q --no-install-recommends install -o
> Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
> 
> --
> Alex Gorbachev
> Storcium
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] virtio-blk multi-queue support and RBD devices?

2016-09-09 Thread Dzianis Kahanovich
Does "rbd op threads = N" solve bottleneck? IMHO it is possible to make this
value automated by QEMU from num-queues. If now not.

Alexandre DERUMIER writes:
> Hi,
> 
> I'll test it next week to integrate it in proxmox.
> 
> But I'm not sure it will improve performance too much,
> 
> until qemu is able to use multiple iothreads with multiple queues.
> (I think that Paolo Bonzini is still working on this currently).
> 
> The main bottleneck with rbd currently is cpu usage (limited to 1 iothread
> per disk)
> 
> I'll send a benchmark report to the ceph mailing next week.
> 
> 
> 
> - Original Message -
> From: "Simon Leinen" 
> To: "ceph-users" 
> Sent: Saturday, 3 September 2016 19:37:50
> Subject: [ceph-users] virtio-blk multi-queue support and RBD devices?
> 
> One of the new features in Qemu 2.7[1] is 
> 
> * virtio-blk now supports multiqueue through a "num-queues" device 
> property. 
> 
> We use virtio-blk in our OpenStack cluster to expose RBD volumes to 
> Qemu/KVM VMs. Can RBD-backed virtio-blk benefit from multiple queues? 
> 
> (I'm hopeful because virtio-scsi had multi-queue support for a while, 
> and someone reported increased IOPS even with RBD devices behind those.) 
> 


-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ubuntu latest ceph-deploy fails to install hammer

2016-09-09 Thread Alex Gorbachev
This problem seems to occur with the latest ceph-deploy version 1.5.35

[lab2-mon3][DEBUG ] Fetched 5,382 kB in 4s (1,093 kB/s)
[lab2-mon3][DEBUG ] Reading package lists...
[lab2-mon3][INFO  ] Running command: env
DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
--assume-yes -q --no-install-recommends install -o
Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw
[lab2-mon3][DEBUG ] Reading package lists...
[lab2-mon3][DEBUG ] Building dependency tree...
[lab2-mon3][DEBUG ] Reading state information...
[lab2-mon3][WARNIN] E: Unable to locate package ceph-osd
[lab2-mon3][WARNIN] E: Unable to locate package ceph-mon
[lab2-mon3][ERROR ] RuntimeError: command returned non-zero exit status: 100
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
--assume-yes -q --no-install-recommends install -o
Dpkg::Options::=--force-confnew ceph-osd ceph-mds ceph-mon radosgw

--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw meta pool

2016-09-09 Thread Warren Wang - ISD
A little extra context here. Currently the metadata pool looks like it is
on track to exceed the number of objects in the data pool, over time. In a
brand new cluster, we're already up to almost 2 million in each pool.

NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
default.rgw.buckets.data  17  3092G  0.86   345T       2013585
default.rgw.meta          25  743M   0      172T       1975937

We're concerned this will be unmanageable over time.

Warren Wang


On 9/9/16, 10:54 AM, "ceph-users on behalf of Pavan Rallabhandi"
 wrote:

>Any help on this is much appreciated; I am considering fixing this myself,
>given it's a confirmed issue, unless I am missing something obvious.
>
>Thanks,
>-Pavan.
>
>On 9/8/16, 5:04 PM, "ceph-users on behalf of Pavan Rallabhandi"
>prallabha...@walmartlabs.com> wrote:
>
>Trying it one more time on the users list.
>
>In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool
>running into a large number of objects, potentially in the same range as the
>number of objects contained in the data pool.
>
>I understand that the immutable metadata entries are now stored in
>this heap pool, but I couldn't reason out why the metadata objects are
>left in this pool even after the actual bucket/object/user deletions.
>
>put_entry() seems to promptly store them in the heap
>pool
>(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880),
>but I never see them being reaped. Are they left there for some
>reason?
>
>Thanks,
>-Pavan.
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw meta pool

2016-09-09 Thread Pavan Rallabhandi
Any help on this is much appreciated; I am considering fixing this myself, given
it's a confirmed issue, unless I am missing something obvious.

Thanks,
-Pavan.

On 9/8/16, 5:04 PM, "ceph-users on behalf of Pavan Rallabhandi" 
 
wrote:

Trying it one more time on the users list.

In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool running
into a large number of objects, potentially in the same range as the number of
objects contained in the data pool.

I understand that the immutable metadata entries are now stored in this 
heap pool, but I couldn’t reason out why the metadata objects are left in this 
pool even after the actual bucket/object/user deletions.

put_entry() seems to promptly store them in the heap pool
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880, but I
never see them being reaped. Are they left there for some reason?

Thanks,
-Pavan.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] virtio-blk multi-queue support and RBD devices?

2016-09-09 Thread Alexandre DERUMIER
Hi,

I'll test it next week to integrate it in proxmox.

But I'm not sure it will improve performance too much,

until qemu is able to use multiple iothreads with multiple queues.
(I think that Paolo Bonzini is still working on this currently).

The main bottleneck with rbd currently is cpu usage (limited to 1 iothread per
disk).

I'll send a benchmark report to the ceph mailing next week.



- Original Message -
From: "Simon Leinen" 
To: "ceph-users" 
Sent: Saturday, 3 September 2016 19:37:50
Subject: [ceph-users] virtio-blk multi-queue support and RBD devices?

One of the new features in Qemu 2.7[1] is 

* virtio-blk now supports multiqueue through a "num-queues" device 
property. 

We use virtio-blk in our OpenStack cluster to expose RBD volumes to 
Qemu/KVM VMs. Can RBD-backed virtio-blk benefit from multiple queues? 

(I'm hopeful because virtio-scsi had multi-queue support for a while, 
and someone reported increased IOPS even with RBD devices behind those.) 
-- 
Simon. 

[1] http://wiki.qemu.org/ChangeLog/2.7 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy not creating osd's

2016-09-09 Thread Shain Miley
Can someone please suggest a course of action moving forward?

I don't feel comfortable making changes to the crush map without a better
understanding of what exactly is going on here.

The new osd appears in the 'osd tree' but not in the current crush map. The
server that hosts the osd is not present in either the current crush map or the
'osd tree'.
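
(In case it helps someone searching the archives later, the usual shape of
the manual fix is below -- a hedged sketch only; "root=default" and the 1.0
weight are placeholders that have to match the actual crush root and the disk
size in TB:)

  ceph osd crush add-bucket hqosd10 host
  ceph osd crush move hqosd10 root=default
  ceph osd crush create-or-move osd.108 1.0 root=default host=hqosd10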

Thanks,

Shain

> On Sep 8, 2016, at 10:27 PM, Shain Miley  wrote:
> 
> I ended up starting from scratch and doing a purge and purgedata on that host
> using ceph-deploy; after that things seemed to go better.
> The osd is up and in at this point, however when the osd was added to the 
> cluster...no data was being moved to the new osd.
> 
> Here is a copy of my current crush map:
> 
> http://pastebin.com/PMk3xZ0a
> 
> as you can see from the entry for osd number 108 (the last osd to be added to 
> the cluster)...the crush map does not contain a host entry for 
> hqosd10...which is the host for osd #108.
> 
> Any ideas on how to resolve this?
> 
> Thanks,
> Shain
> 
> 
>> On 9/8/16 2:20 PM, Shain Miley wrote:
>> Hello,
>> 
>> I am trying to use ceph-deploy to add some new osd's to our cluster.  I have 
>> used this method over the last few years to add all of our 107 osd's and 
>> things have seemed to work quite well.
>> 
>> One difference this time is that we are going to use a pci nvme card to 
>> journal the 16 disks in this server (Dell R730xd).
>> 
>> As you can see below it appears as though things complete successfully, 
>> however the osd count never increases, and when I look at hqosd10, there are 
>> no osd's mounted, and nothing in '/var/lib/ceph/osd', no ceph daemons 
>> running, etc.
>> 
>> I created the partitions on the nvme card by hand using parted (I was not 
>> sure if I ceph-deploy should take care of this part or not).
>> 
>> I have zapped the disk and re-run this command several times, and I have 
>> gotten the same result every time.
>> 
>> We are running Ceph version 0.94.9  on Ubuntu 14.04.5
>> 
>> Here is the output from my attempt:
>> 
>> root@hqceph1:/usr/local/ceph-deploy# ceph-deploy --verbose osd create 
>> hqosd10:sdb:/dev/nvme0n1p1
>> [ceph_deploy.conf][DEBUG ] found configuration file at: 
>> /root/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.36): /usr/local/bin/ceph-deploy 
>> --verbose osd create hqosd10:sdb:/dev/nvme0n1p1
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username  : None
>> [ceph_deploy.cli][INFO  ]  disk  : [('hqosd10', 
>> '/dev/sdb', '/dev/nvme0n1p1')]
>> [ceph_deploy.cli][INFO  ]  dmcrypt   : False
>> [ceph_deploy.cli][INFO  ]  verbose   : True
>> [ceph_deploy.cli][INFO  ]  bluestore : None
>> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>> [ceph_deploy.cli][INFO  ]  subcommand: create
>> [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : 
>> /etc/ceph/dmcrypt-keys
>> [ceph_deploy.cli][INFO  ]  quiet : False
>> [ceph_deploy.cli][INFO  ]  cd_conf   : 
>> 
>> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>> [ceph_deploy.cli][INFO  ]  fs_type   : xfs
>> [ceph_deploy.cli][INFO  ]  func  : <function osd at 0x7f6ba750cc80>
>> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>> [ceph_deploy.cli][INFO  ]  default_release   : False
>> [ceph_deploy.cli][INFO  ]  zap_disk  : False
>> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
>> hqosd10:/dev/sdb:/dev/nvme0n1p1
>> [hqosd10][DEBUG ] connected to host: hqosd10
>> [hqosd10][DEBUG ] detect platform information from remote host
>> [hqosd10][DEBUG ] detect machine type
>> [hqosd10][DEBUG ] find the location of an executable
>> [hqosd10][INFO  ] Running command: /sbin/initctl version
>> [hqosd10][DEBUG ] find the location of an executable
>> [ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
>> [ceph_deploy.osd][DEBUG ] Deploying osd to hqosd10
>> [hqosd10][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> [ceph_deploy.osd][DEBUG ] Preparing host hqosd10 disk /dev/sdb journal 
>> /dev/nvme0n1p1 activate True
>> [hqosd10][DEBUG ] find the location of an executable
>> [hqosd10][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare --cluster 
>> ceph --fs-type xfs -- /dev/sdb /dev/nvme0n1p1
>> [hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
>> --cluster=ceph --show-config-value=fsid
>> [hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
>> --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
>> [hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
>> --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
>> [hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
>> --cluster=ceph 

[ceph-users] osd reweight vs osd crush reweight

2016-09-09 Thread Simone Spinelli

Hi all,

we are running a 144-OSD ceph cluster and a couple of OSDs are >80% full.

This is the general situation:

 osdmap e29344: 144 osds: 144 up, 144 in
  pgmap v48302229: 42064 pgs, 18 pools, 60132 GB data, 15483 kobjects
173 TB used, 90238 GB / 261 TB avail

We are currently mitigating the problem using osd reweight, but the more
we read about this problem, the more our doubts about using osd crush
reweight increase.
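
(For clarity, the two knobs we are comparing look like this -- the OSD id and
the weights below are placeholders:)

  # temporary override between 0 and 1; reset to 1 if the OSD is marked out
  # and then back in
  ceph osd reweight 42 0.85
  # batch form of the same temporary override (the threshold is an example)
  ceph osd reweight-by-utilization 120
  # persistent CRUSH weight, normally the disk size in TB; survives restarts
  ceph osd crush reweight osd.42 1.64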

Actually, we do not have plans to buy new hardware.

Our main question is: if a re-weighted OSD restarts and gets back its
original weight, will the data move back?


What is the correct way to handle this kind of situation?

Many thanks

Simone



--
Simone Spinelli 
Università di Pisa
Settore Rete, Telecomunicazioni e Fonia - Serra
Direzione Edilizia e Telecomunicazioni
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unauthorized to list radosgw swift container objects

2016-09-09 Thread B, Naga Venkata
Hi all,

After upgrading from firefly (0.80.7) to hammer (0.94.7), I am unable to list
objects in containers for a radosgw swift user, although I am able to list
containers for the same user.

I have created the user using
radosgw-admin user create --subuser=s3User:swiftUser --display-name="First 
User" --key-type=swift --access=full

stack@m1-mgmt:~$ swift -V 1.0 -A http://192.17.16.5:8079/auth -U 
"s3User:swiftUser" -K "VG+OJoRtloR7AhsD3xsQRi5ug2V5SidqgTpbZe0x" list
test3
test4
test5

stack@m1-mgmt:~$ swift -V 1.0 -A http://192.17.16.5:8079/auth -U 
"s3User:swiftUser" -K "VG+OJoRtloR7AhsD3xsQRi5ug2V5SidqgTpbZe0x" list test3
Container GET failed: http://192.17.16.5:8079/swift/v1/test3?format=json 401 
Unauthorized   {"Code":"AccessDenied"}

I am seeing the below logs in radosgw.log

2016-09-09 08:02:24.594110 7f89a67e4700  1 == starting new request 
req=0x7f89b803b430 =
2016-09-09 08:02:24.594128 7f89a67e4700  2 req 91413:0.18::GET 
/swift/v1/test3::initializing for trans_id = 
tx16515-0057d26c90-5f24-default
2016-09-09 08:02:24.594133 7f89a67e4700 10 host=192.17.16.5
2016-09-09 08:02:24.594134 7f89a67e4700 20 subdomain= domain= in_hosted_domain=0
2016-09-09 08:02:24.594169 7f89a67e4700 10 ver=v1 first=test3 req=
2016-09-09 08:02:24.594173 7f89a67e4700 10 s->object= s->bucket=test3
2016-09-09 08:02:24.594178 7f89a67e4700  2 req 91413:0.68:swift:GET 
/swift/v1/test3::getting op
2016-09-09 08:02:24.594182 7f89a67e4700  2 req 91413:0.72:swift:GET 
/swift/v1/test3:list_bucket:authorizing
2016-09-09 08:02:24.594243 7f89a67e4700 10 swift_user=s3User:swiftUser
2016-09-09 08:02:24.594271 7f89a67e4700 20 build_token 
token=10007333557365723a737769667455736572ea8442e3530b7e4f10bed3573a113723
2016-09-09 08:02:24.594372 7f89a67e4700  2 req 91413:0.000262:swift:GET 
/swift/v1/test3:list_bucket:reading permissions
2016-09-09 08:02:24.594418 7f89a67e4700 15 Read
AccessControlPolicy<AccessControlPolicy
xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>s3User</ID><DisplayName>First
User</DisplayName></Owner><AccessControlList><Grant><Grantee
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="CanonicalUser"><ID>s3User</ID><DisplayName>First
User</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2016-09-09 08:02:24.594437 7f89a67e4700  2 req 91413:0.000327:swift:GET 
/swift/v1/test3:list_bucket:init op
2016-09-09 08:02:24.594441 7f89a67e4700  2 req 91413:0.000331:swift:GET 
/swift/v1/test3:list_bucket:verifying op mask
2016-09-09 08:02:24.594449 7f89a67e4700 20 required_mask= 1 user.op_mask=7
2016-09-09 08:02:24.594451 7f89a67e4700  2 req 91413:0.000341:swift:GET 
/swift/v1/test3:list_bucket:verifying op permissions
2016-09-09 08:02:24.594492 7f89a67e4700  2 req 91413:0.000382:swift:GET 
/swift/v1/test3:list_bucket:http status=401
2016-09-09 08:02:24.594496 7f89a67e4700  1 == req done req=0x7f89b803b430 
http_status=401 ==
2016-09-09 08:02:24.594515 7f89a67e4700 20 process_request() returned -13

Can someone help me on this issue?
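
(For context, a hedged first check would be whether the swift subuser kept
full permissions across the upgrade -- the commands below reuse the user
names already shown above and are a diagnostic sketch, not a confirmed fix:)

  radosgw-admin user info --uid=s3User    # inspect the "subusers" permissions field
  # re-grant full access to the subuser if it shows anything less
  radosgw-admin subuser modify --uid=s3User --subuser=s3User:swiftUser --access=full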

Thanks & Regards,
Naga Venkata

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel 10.2.2 - Error when flushing journal

2016-09-09 Thread Mehmet

Hello Alexey,

thank you for your mail - my answers inline :)

Am 2016-09-08 16:24, schrieb Alexey Sheplyakov:

Hi,


root@:~# ceph-osd -i 12 --flush-journal

 > SG_IO: questionable sense data, results may be incorrect
 > SG_IO: questionable sense data, results may be incorrect

As far as I understand these lines are an hdparm warning (the OSD uses the
hdparm command to query the journal device write cache state).

The message means hdparm is unable to reliably figure out if the drive
write cache is enabled. This might indicate a hardware problem.


I guess this has to do with the NVMe device (Intel DC P3700 NVMe)
which is used for journaling.

And so... is this normal behavior?


ceph-osd -i 12 --flush-journal


I think it's a good idea to
a) check the journal drive (smartctl),


The disks are all fine - checked 2-3 weeks ago.


b) capture a more verbose log,

i.e. add this to ceph.conf

[osd]
debug filestore = 20/20
debug journal = 20/20

and try flushing the journal once more (note: this won't fix the
problem, the point is to get a useful log)


I flushed the journal at ~09:55:26 today and got these lines:

root@:~# ceph-osd -i 10 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7f38a2ecf700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x560356296dde]
 2: (()+0x113d0) [0x7f38a81b03d0]
 3: [0x560360f79f00]
2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7f38a2ecf700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x560356296dde]
 2: (()+0x113d0) [0x7f38a81b03d0]
 3: [0x560360f79f00]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


 0> 2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7f38a2ecf700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x560356296dde]
 2: (()+0x113d0) [0x7f38a81b03d0]
 3: [0x560360f79f00]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


Segmentation fault


This is the actual logfile for osd.10
- http://slexy.org/view/s21lhpkLGQ

By the way:
I have done "ceph osd set noout" before stop and flushing.

Hope this is useful for you!
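
(In case a symbolized backtrace would help, a rough sketch of how one could
be captured on Ubuntu -- the dbg package name is an assumption and the core
file location depends on kernel.core_pattern:)

  apt-get install ceph-dbg gdb      # debug symbols for the ceph binaries
  ulimit -c unlimited               # allow a core dump in this shell
  ceph-osd -i 10 --flush-journal    # reproduce the segfault
  gdb /usr/bin/ceph-osd core        # then type "bt" for the backtrace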

- Mehmet


Best regards,
  Alexey

On Wed, Sep 7, 2016 at 6:48 PM, Mehmet  wrote:


Hey again,

now i have stopped my osd.12 via

root@:~# systemctl stop ceph-osd@12

and when i am flush the journal...

root@:~# ceph-osd -i 12 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7f421d49d700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f421d49d700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

     0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught
signal (Segmentation fault) **
 in thread 7f421d49d700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

Segmentation fault

The logfile with further information
- http://slexy.org/view/s2T8AohMfU [4]

I guess i will get same message when i flush the other journals.

- Mehmet

Am 2016-09-07 13:23, schrieb Mehmet:


Hello ceph people,

yesterday i stopped one of my OSDs via

root@:~# systemctl stop ceph-osd@10

and tried to flush the journal for this osd via

root@:~# ceph-osd -i 10 --flush-journal

but getting this output on the screen:

SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
 NOTE: a copy of the executable, or `objdump -rdS `
is
needed to interpret this.

     0> 

Re: [ceph-users] non-effective new deep scrub interval

2016-09-09 Thread David DELON

Hi,
this is good for me:

ceph tell osd.* injectargs --osd_scrub_end_hour 7
ceph tell osd.* injectargs --osd_scrub_load_threshold 0.1

About the "(unchangeable)" warning, it seems to be a bug according:
http://tracker.ceph.com/issues/16054
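
To make the same values survive OSD restarts, they can also be persisted in
ceph.conf (a sketch; osd_scrub_begin_hour 0 is just the default shown for
completeness, adjust the window to your own quiet hours):

  [osd]
  osd_scrub_begin_hour = 0
  osd_scrub_end_hour = 7
  osd_scrub_load_threshold = 0.1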

Have a nice day.
D.


- On 9 Sep 16, at 3:42, Christian Balzer ch...@gol.com wrote:

> Hello,
> 
> On Thu, 8 Sep 2016 17:09:27 +0200 (CEST) David DELON wrote:
> 
>> 
>> First, thanks for your answer Christian.
>>
> It's nothing.
> 
>> - On 8 Sep 16, at 13:30, Christian Balzer ch...@gol.com wrote:
>> 
>> > Hello,
>> > 
>> > On Thu, 8 Sep 2016 09:48:46 +0200 (CEST) David DELON wrote:
>> > 
>> >> 
>> >> Hello,
>> >> 
>> >> i'm using ceph jewel.
>> >> I would like to schedule the deep scrub operations on my own.
>> > 
>> > Welcome to the club, alas the ride isn't for the faint of heart.
>> > 
>> > You will want to (re-)search the ML archive (google) and in particular the
>> > recent "Spreading deep-scrubbing load" thread.
>> 
>> It is not exactly what i would like to do. That's why i have posted.
>> I wanted to trigger on my own the deep scrubbing on sundays with a cron
>> script...
>>
> If you look at that thread (and others) that's what I do, too.
> And ideally, not even needing a cron script after the first time,
> provided your scrubs can fit into the time frame permitted.
> 
>> 
>> >> First of all, i have tried to change the interval value for 30 days:
>> >> In each /etc/ceph/ceph.conf, i have added:
>> >> 
>> >> [osd]
>> >> #30*24*3600
>> >> osd deep scrub interval = 2592000
>> >> I have restarted all the OSD daemons.
>> > 
>> > This could have been avoided by an "inject" for all OSDs.
>> > Restarting (busy) OSDs isn't particular nice for a cluster.
>> 
>> I first did the inject of the new value. But as it did not do the trick after
>> some hours and the "injectargs" command returned
>> "(unchangeable)",
>> I thought OSD restarts were needed...
>>
> I keep forgetting about that, annoying.
> 
>> 
>> >> The new value has been taken into account as for each OSD:
>> >> 
>> >> ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep
>> >> deep_scrub_interval
>> >> "osd_deep_scrub_interval": "2.592e+06",
>> >> 
>> >> 
>> >> I have checked the last_deep_scrub value for each pg with
>> >> ceph pg dump
>> >> And each pg has been deep scrubbed during the last 7 days (which is the 
>> >> default
>> >> behavior).
>> >> 
>> > See the above thread.
>> > 
>> >> Since i have made the changes 2 days ago, it keeps on deep scrubbing.
>> >> Do i miss something?
>> >> 
>> > At least 2 things, maybe more.
>> > 
>> > Unless you changed the "osd_scrub_max_interval" as well, that will enforce
>> > things, by default after a week.
>> 
>> Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve.
>> 
> 
> osd_scrub_min_interval has no impact on deep scrubs,
> osd_scrub_max_interval interestingly and unexpectedly does.
> 
> Meaning it's the next one:
> 
>> > And with Jewel you get that well meaning, but turned on by default and
>> > ill-documented "osd_scrub_interval_randomize_ratio", which will spread
>> > things out happily and not when you want them.
>> > 
> 
> If you set osd_scrub_interval_randomize_ratio to 0, scrubs should
> become fixed-interval and deterministic again.
> 
> Christian
> 
>> > Again, read the above thread.
>> > 
>> > Also your cluster _should_ be able to endure deep scrubs even when busy,
>> > otherwise you're looking at trouble when you loose an OSD and the
>> > resulting balancing as well.
>> > 
>> > Setting these to something sensible:
>> >"osd_scrub_begin_hour": "0",
>> >"osd_scrub_end_hour": "6",
>> > 
>> > and especially this:
>> >"osd_scrub_sleep": "0.1",
>> 
>> 
>> OK, i will consider this solution.
>> 
>> > will minimize the impact of scrub as well.
>> > 
>> > Christian
>> > --
>> > Christian BalzerNetwork/Systems Engineer
>> > ch...@gol.com  Global OnLine Japan/Rakuten Communications
>> > http://www.gol.com/
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs toofull

2016-09-09 Thread gjprabu
Hi Gregory,



My doubt has been cleared: by default cephfs will allow about 82% of the
capacity to be filled with data, and we can increase this value using
osd_backfill_full_ratio.
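
For the record, a sketch of how that threshold can be raised at runtime --
the 0.90 value is only an example, real relief still needs added or
rebalanced OSDs, and if the option reports "(unchangeable)" an OSD restart
with the value in ceph.conf may be needed:

  ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.90'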



Regards

Prabu GJ




 On Tue, 30 Aug 2016 17:05:34 +0530 gjprabu gjpr...@zohocorp.com wrote:




Hi Gregory,



   In our case we have 6TB of data and replica 2, so around 12TB is
occupied; I still have 4TB remaining, even though it reports this error.



 51 active+undersized+degraded+remapped+backfill_toofull 



Regards

Prabu GJ





 On Mon, 29 Aug 2016 23:44:12 +0530 Gregory Farnum gfar...@redhat.com wrote:











On Mon, Aug 29, 2016 at 12:53 AM, Christian Balzer ch...@gol.com wrote:

 On Mon, 29 Aug 2016 12:51:55 +0530 gjprabu wrote:



 Hi Chrishtian,







 Sorry for subject and thanks for your reply,







 > That's incredibly small in terms of OSD numbers, how many hosts? What
replication size?



 Total host 5.



 Replicated size : 2



 At this replication size you need to act and replace/add OSDs NOW.

 The next OSD failure will result in data loss.



 So your RAW space is about 16TB, leaving you with 8TB of usable space.



 Which doesn't mesh with your "df", showing the ceph FS with 11TB used...



When you run df against a CephFS mount, it generally reports the same

data as you get out of RADOS — so if you have replica 2 and 4 TB of

data, it will report as 8TB used (since, after all, you have used

8TB!). There are exceptions in a few cases; you can have it based off

of your quotas for subtree mounts for one.
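
(Applying that to the numbers quoted earlier in this thread -- the same
figures, just worked through:)

  6 TB of file data x replica 2  = ~12 TB reported as used
  16 TB raw total - 12 TB used   = ~4 TB raw left, i.e. roughly 2 TB of new data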

-Greg

___

ceph-users mailing list

ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com