[ceph-users] object store backup tool recommendations

2017-03-02 Thread Blair Bethwaite
Hi all,

Does anyone have any recommendations for good tools to perform
file-system/tree backups and restores to/from an RGW object store (Swift or
S3 APIs)? Happy to hear about both FOSS and commercial options, please.

I'm interested in:
1) tools known to work or not work at all for a basic file-based data backup

Plus these extras:
2) preserves/restores correct file metadata (e.g. owner, group, acls etc)
3) preserves/restores xattrs
4) backs up empty directories and files
5) supports some sort of snapshot/versioning/differential functionality,
i.e., will keep a copy or diff or last N versions of a file or whole backup
set, e.g., so that one can restore yesterday's file/s or last week's but
not have to keep two full copies to achieve it
6) is readily able to restore individual files
7) can encrypt/decrypt client side
8) anything else I should be considering
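
For illustration, one FOSS option that may be worth evaluating against points
5-7 above is restic, which can talk to an S3-compatible endpoint such as RGW
and encrypts client side. A minimal sketch, assuming a hypothetical RGW
endpoint at https://rgw.example.com and a bucket named "backups":

$ export AWS_ACCESS_KEY_ID=...            # RGW S3 credentials
$ export AWS_SECRET_ACCESS_KEY=...
$ export RESTIC_PASSWORD=...              # key used for client-side encryption
$ restic -r s3:https://rgw.example.com/backups init         # one-time repository setup
$ restic -r s3:https://rgw.example.com/backups backup /srv/data
$ restic -r s3:https://rgw.example.com/backups snapshots    # list point-in-time versions
$ restic -r s3:https://rgw.example.com/backups restore latest \
      --target /srv/restore --include /srv/data/some/file

Whether it covers points 2-4 (ownership, ACLs, xattrs, empty directories) should
be verified against a test tree before trusting it.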

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OpenStack Talks

2017-03-02 Thread Patrick McGarry
Hey cephers,

This is just a reminder that we have 10x 40-minute talk slots
available at OpenStack Boston (and 10 free passes to go with them). If
you are interested in giving a Ceph-related talk, please contact me as
soon as possible with the following:

* Presenter Name
* Presenter Org
* Talk title
* Talk abstract

These slots will go fast, so the sooner you get something submitted
for review, the better your chances will be. Thanks!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-02 Thread Ilya Dryomov
On Thu, Mar 2, 2017 at 5:01 PM, Xiaoxi Chen  wrote:
> 2017-03-02 23:25 GMT+08:00 Ilya Dryomov :
>> On Thu, Mar 2, 2017 at 1:06 AM, Sage Weil  wrote:
>>> On Thu, 2 Mar 2017, Xiaoxi Chen wrote:
>>>> >Still applies. Just create a Round Robin DNS record. The clients will
>>>> >obtain a new monmap while they are connected to the cluster.
>>>> It works to some extent, but it causes issues for "mount -a". We have such
>>>> a deployment today: a GTM (a kind of DNS) record created with all MDS IPs,
>>>> and it works fine in terms of failover / mount.
>>>>
>>>> But users usually automate such mounts via fstab, and "mount -a" may even
>>>> be called periodically. With the DNS approach above, they get a "mount
>>>> point busy" message every time, simply because mount.ceph resolves the DNS
>>>> name to another IP, and the kernel client thinks you are trying to attach
>>>> another fs...
>>>
>>> The kernel client is (should be!) smart enough to tell that it is the same
>>> mount point and will share the superblock.  If you see a problem here it's
>>> a bug.
>>
>> I think -EBUSY actually points out that the sharing code is working.
>>
>> The DNS name in fstab doesn't match the IPs it resolves to, so "mount
>> -a" attempts to mount.  The kernel client tells that it's the same fs
>> and returns the existing super to the VFS.  The VFS refuses the same
>> super on the same mount point...
>
> True,
> root@lvspuppetmaster-ng2-1209253:/mnt# mount -a
> mount error 16 = Device or resource busy
>
> Do we have any chance to make this work dynamically (i.e., suppress the -EBUSY
> in this case) on old kernels?

No, probably not.  mount.ceph resolves DNS names, so you end up with
IPs in /proc/mounts which trick "mount -a" into attempting the mount.
Currently there is no way to tell mount.ceph to not resolve, and even
if there was, the in-kernel DNS resolver is disabled -- you'd need to
rebuild libceph and ceph kernel modules to enable it.

In your case -EBUSY most likely means that the filesystem is already
mounted, so it should be safe to ignore.
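
To make the failure mode concrete, a rough illustration (hostnames and IPs
below are made up):

$ grep ceph /etc/fstab
cephmon.example.com:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,_netdev  0 0
$ grep ceph /proc/mounts
192.168.1.11:6789,192.168.1.12:6789,192.168.1.13:6789:/ /mnt/cephfs ceph rw,name=admin 0 0
$ mount -a      # compares the two strings, thinks the fs is not mounted, retries
mount error 16 = Device or resource busy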

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-02 Thread Heller, Chris
Success! There was an issue related to my operating system install procedure 
that was causing the journals to become corrupt, but it was not caused by ceph! 
With that bug fixed, the shutdown procedure described in this thread has been verified
to work as expected. Thanks for all the help.

-Chris

> On Mar 1, 2017, at 9:39 AM, Peter Maloney 
>  wrote:
> 
> On 03/01/17 15:36, Heller, Chris wrote:
>> I see. My journal is specified in ceph.conf. I'm not removing it from the 
>> OSD so sounds like flushing isn't needed in my case.
>> 
> Okay, but something seems wrong if it's saying it's a non-block journal
> (meaning a file, not a block device).
> 
> Double check your ceph.conf... make sure the path works, and somehow make 
> sure the [osd.x] actually matches that osd (no idea how to test that, esp. if 
> the osd doesn't start ... maybe just increase logging).
> 
> Or just make a symlink for now, just to see if it solves the problem, which 
> would imply the ceph.conf is wrong.
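
For reference, a rough way to check the journal wiring on a stopped OSD (the
OSD id, paths and partition below are placeholders):

$ ls -l /var/lib/ceph/osd/ceph-12/journal     # should point at the journal device
$ grep -A3 '\[osd.12\]' /etc/ceph/ceph.conf   # or an "osd journal" path set here
$ ln -s /dev/disk/by-partuuid/<journal-partition> /var/lib/ceph/osd/ceph-12/journal
$ ceph-osd -i 12 --flush-journal              # only needed when moving/replacing the journal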
> 
> 
>> -Chris
>>> On Mar 1, 2017, at 9:31 AM, Peter Maloney wrote:
>>> 
>>> On 03/01/17 14:41, Heller, Chris wrote:
>>>> That is a good question, and I'm not sure how to answer. The journal is on
>>>> its own volume, and is not a symlink. Also how does one flush the journal?
>>>> That seems like an important step when bringing down a cluster safely.
>>>>
>>> You only need to flush the journal if you are removing it from the osd, 
>>> replacing it with a different journal.
>>> 
>>> So since your journal is on its own, then you need either a symlink in the 
>>> osd directory named "journal" which points to the device (ideally not 
>>> /dev/sdx but /dev/disk/by-.../), or you put it in the ceph.conf.
>>> 
>>> And since it said you have a non-block journal now, it probably means there 
>>> is a file... you should remove that (rename it to journal.junk until you're 
>>> sure it's not an important file, and delete it later).
 
>> This is where I've stopped. All but one OSD came back online. One has 
>> this backtrace:
>> 
>> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: 
>> disabling aio for non-block journal.  Use journal_force_aio to force use 
>> of aio anyway
> Are the journals inline? or separate? If they're separate, the above 
> means the journal symlink/config is missing, so it would possibly make a 
> new journal, which would be bad if you didn't flush the old journal 
> before.
> 
> And also just one osd is easy enough to replace (which I wouldn't do 
> until the cluster settled down and recovered). So it's lame for it to be 
> broken, but it's still recoverable if that's the only issue.
 
>>> 
>>> 
>> 
> 
> 
> -- 
> 
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de 
> 
> Internet: http://www.brockmann-consult.de 
> 
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-02 Thread Xiaoxi Chen
2017-03-02 23:25 GMT+08:00 Ilya Dryomov :
> On Thu, Mar 2, 2017 at 1:06 AM, Sage Weil  wrote:
>> On Thu, 2 Mar 2017, Xiaoxi Chen wrote:
>>> >Still applies. Just create a Round Robin DNS record. The clients will
>>> >obtain a new monmap while they are connected to the cluster.
>>> It works to some extent, but it causes issues for "mount -a". We have such
>>> a deployment today: a GTM (a kind of DNS) record created with all MDS IPs, and
>>> it works fine in terms of failover / mount.
>>>
>>> But users usually automate such mounts via fstab, and "mount -a" may even be
>>> called periodically. With the DNS approach above, they get a "mount point
>>> busy" message every time, simply because mount.ceph resolves the DNS name to
>>> another IP, and the kernel client thinks you are trying to attach
>>> another fs...
>>
>> The kernel client is (should be!) smart enough to tell that it is the same
>> mount point and will share the superblock.  If you see a problem here it's
>> a bug.
>
> I think -EBUSY actually points out that the sharing code is working.
>
> The DNS name in fstab doesn't match the IPs it resolves to, so "mount
> -a" attempts to mount.  The kernel client tells that it's the same fs
> and returns the existing super to the VFS.  The VFS refuses the same
> super on the same mount point...

True,
root@lvspuppetmaster-ng2-1209253:/mnt# mount -a
mount error 16 = Device or resource busy

Do we have any chance to make this work dynamically (i.e., suppress the -EBUSY
in this case) on old kernels?
>
> We should look into enabling the in-kernel DNS resolver.

Thanks for the explanation, looking forward :)
>
> Thanks,
>
> Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Jewel] upgrade 10.2.3 => 10.2.5 KO : first OSD server freeze every two days :)

2017-03-02 Thread pascal.pu...@pci-conseil.net

Erratum: sorry for the bad screenshot links:

1st : https://supervision.pci-conseil.net/screenshot_LOAD.png

2nd : https://supervision.pci-conseil.net/screenshot_OSD_IO.png

:)

On 02/03/2017 at 15:34, pascal.pu...@pci-conseil.net wrote:


Hello,

So, I may need some advice: 1 week ago (on 19 Feb), I upgraded
my stable Ceph Jewel cluster from 10.2.3 to 10.2.5 (YES, it was maybe a bad idea).


I never had a problem with Ceph 10.2.3 since the previous upgrade, on
23 September.


Since that upgrade (10.2.5), every 2 days the first OSD server
totally freezes. Load goes > 500 and comes back after a few minutes… I lose
all OSDs on this server (12/36) during the issue.


It's very strange. So, some information:

Infrastructure:

3 x OSD servers with 12 OSD disks each and SSD journals + 3 MON servers
+ 3 Ceph RBD clients.


10G dedicated network for clients and 10G dedicated network for OSDs.

So 36 OSDs in total. Each server has 16 CPU cores (2x E5-2630v3) and 32 GB RAM.
No problem with resources.


Performance is good for 36 x 4 TB NL-SAS disks + 1 write-intensive SSD
per OSD server.


Issue:

This morning (last issue was 2 days ago):

See screenshot:
http://www.performance-conseil-informatique.net/wp-content/uploads/2017/03/screenshot_LOAD-1.png


As you can see, there is little IO (just 2 clients, sometimes writing
150 MB/s for a few minutes) – it's a big NAS for cold data.

So during the issue there was no IO: it's strange. Same for the other occurrences.

See screenshot:
http://www.performance-conseil-informatique.net/wp-content/uploads/2017/03/screenshot_OSD_IO.png


Before the issue: no activity. You can see all OSD reads, OSD writes,
journal (SSD), IO wait.


07:07 => 07:09: 2 minutes with 12/36 OSDs totally lost. They came back
afterwards, but I need to fix that.


During the issue, scrubbing was stopped as well, and the nightly trim was
finished… no IO.


No other cron jobs on the servers, nothing. All servers have the same configuration.

LOGS:

A lot of these:

ceph-osd.3.log:2017-03-02 07:09:32.061754 7f6d501e4700 -1 osd.3 14557 
heartbeat_check: no reply from 0x7f6dadb48c10 osd.19 since back 
2017-03-02 07:07:53.286880 front 2017-03-02 07:07:53.286880 (cutoff 
2017-03-02 07:09:12.061690)


Sometimes:

common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x85) [0x7fc38a5e9425]


2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char 
const*, long)+0x2e1) [0x7fc38a528de1]


3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7fc38a52963e]

4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7fc38a529e1c]

5: (CephContextServiceThread::entry()+0x15b) [0x7fc38a6011ab]

6: (()+0x7dc5) [0x7fc388304dc5]

7: (clone()+0x6d) [0x7fc38698f73d]

NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.



Questions: Why does only the first OSD server freeze? The 3 servers are
strictly identical. What is freezing the server and driving the load up…?


Already 4 freezes since the upgrade. Today I will raise the log level
and restart everything to get more logs.


Any idea how to troubleshoot? (I already use sar statistics to look for
something…)

Maybe some change with heartbeats?

Should I consider downgrading to 10.2.3? Upgrading to Kraken?

Thanks for your help,

Regards,

Other things:

rpm -qa|grep ceph

libcephfs1-10.2.5-0.el7.x86_64

ceph-common-10.2.5-0.el7.x86_64

ceph-mon-10.2.5-0.el7.x86_64

ceph-release-1-1.el7.noarch

ceph-10.2.5-0.el7.x86_64

ceph-radosgw-10.2.5-0.el7.x86_64

ceph-selinux-10.2.5-0.el7.x86_64

ceph-mds-10.2.5-0.el7.x86_64

python-cephfs-10.2.5-0.el7.x86_64

ceph-base-10.2.5-0.el7.x86_64

ceph-osd-10.2.5-0.el7.x86_64

uname -a

Linux ceph-osd-03 3.10.0-514.6.2.el7.x86_64 #1 SMP Thu Feb 23 03:04:39 
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Ceph conf :

[global]

fsid = d26f269b-852f-4181-821d-756f213ae155

mon_initial_members = ceph-mon-01, ceph-mon-02, ceph-mon-03

mon_host = 192.168.43.147,192.168.43.148,192.168.43.149

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

max_open_files = 131072

public_network = 192.168.43.0/24

cluster_network = 192.168.44.0/24

osd_journal_size = 13000

osd_pool_default_size = 2 # Write an object n times.

osd_pool_default_min_size = 2 # Allow writing n copy in a degraded state.

osd_pool_default_pg_num = 512

osd_pool_default_pgp_num = 512

osd_crush_chooseleaf_type = 8

cephx_cluster_require_signatures = true

cephx_service_require_signatures = false

mon_pg_warn_max_object_skew = 0

mon_pg_warn_max_per_osd = 0

[mon]

[osd]

osd_max_backfills = 1

osd_recovery_priority = 3

osd_recovery_max_active = 3

osd_recovery_max_start = 3

filestore merge threshold = 40

filestore split multiple = 8

filestore xattr use omap = true

osd op threads = 8

osd disk threads = 4

osd op num threads per shard = 3

osd op num shards = 10

osd map cache size = 1024

osd_enable_op_tracker = false

osd_scrub_begin_hour = 20


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-02 Thread Ilya Dryomov
On Thu, Mar 2, 2017 at 1:06 AM, Sage Weil  wrote:
> On Thu, 2 Mar 2017, Xiaoxi Chen wrote:
>> >Still applies. Just create a Round Robin DNS record. The clients will
>> >obtain a new monmap while they are connected to the cluster.
>> It works to some extent, but it causes issues for "mount -a". We have such
>> a deployment today: a GTM (a kind of DNS) record created with all MDS IPs, and
>> it works fine in terms of failover / mount.
>>
>> But users usually automate such mounts via fstab, and "mount -a" may even be
>> called periodically. With the DNS approach above, they get a "mount point
>> busy" message every time, simply because mount.ceph resolves the DNS name to
>> another IP, and the kernel client thinks you are trying to attach
>> another fs...
>
> The kernel client is (should be!) smart enough to tell that it is the same
> mount point and will share the superblock.  If you see a problem here it's
> a bug.

I think -EBUSY actually points out that the sharing code is working.

The DNS name in fstab doesn't match the IPs it resolves to, so "mount
-a" attempts to mount.  The kernel client tells that it's the same fs
and returns the existing super to the VFS.  The VFS refuses the same
super on the same mount point...

We should look into enabling the in-kernel DNS resolver.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph PG repair

2017-03-02 Thread Reed Dier
Over the weekend, two inconsistent PGs popped up in my cluster. This came
after having scrubs disabled for close to 6 weeks during a very long rebalance
after adding 33% more OSDs, an OSD failing, increasing PGs, etc.

It appears we came out the other end with 2 inconsistent PGs, and I'm trying to
resolve them without much luck so far.
Ubuntu 16.04, Jewel 10.2.5, 3x replicated pool for reference.

> $ ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 3 scrub errors; 
> noout,sortbitwise,require_jewel_osds flag(s) set
> pg 10.7bd is active+clean+inconsistent, acting [8,23,17]
> pg 10.2d8 is active+clean+inconsistent, acting [18,17,22]
> 3 scrub errors

> $ rados list-inconsistent-pg objects
> ["10.2d8","10.7bd”]

Pretty straightforward: 2 PGs with inconsistent copies. Let's dig deeper.

> $ rados list-inconsistent-obj 10.2d8 --format=json-pretty
> {
> "epoch": 21094,
> "inconsistents": [
> {
> "object": {
> "name": “object.name",
> "nspace": “namespace.name",
> "locator": "",
> "snap": "head"
> },
> "errors": [],
> "shards": [
> {
> "osd": 17,
> "size": 15913,
> "omap_digest": "0x",
> "data_digest": "0xa6798e03",
> "errors": []
> },
> {
> "osd": 18,
> "size": 15913,
> "omap_digest": "0x",
> "data_digest": "0xa6798e03",
> "errors": []
> },
> {
> "osd": 22,
> "size": 15913,
> "omap_digest": "0x",
> "data_digest": "0xa6798e03",
> "errors": [
> "data_digest_mismatch_oi"
> ]
> }
> ]
> }
> ]
> }

> $ rados list-inconsistent-obj 10.7bd --format=json-pretty
> {
> "epoch": 21070,
> "inconsistents": [
> {
> "object": {
> "name": “object2.name",
> "nspace": “namespace.name",
> "locator": "",
> "snap": "head"
> },
> "errors": [
> "read_error"
> ],
> "shards": [
> {
> "osd": 8,
> "size": 27691,
> "omap_digest": "0x",
> "data_digest": "0x9ce36903",
> "errors": []
> },
> {
> "osd": 17,
> "size": 27691,
> "omap_digest": "0x",
> "data_digest": "0x9ce36903",
> "errors": []
> },
> {
> "osd": 23,
> "size": 27691,
> "errors": [
> "read_error"
> ]
> }
> ]
> }
> ]
> }


So we have one PG (10.7bd) with a read error on osd.23, which is known and 
scheduled for replacement.
We also have a data digest mismatch on PG 10.2d8 on osd.22, which I have been 
attempting to repair with no real tangible results.

> $ ceph pg repair 10.2d8
> instructing pg 10.2d8 on osd.18 to repair

I've run the ceph pg repair command multiple times, and each time it instructs
osd.18 to repair the PG.
Is this to say that osd.18 is the acting primary for the copies, and that it is
being told to write the known-good copy over the agreed-upon wrong
version on osd.22?

> $ zgrep 'ERR' /var/log/ceph/*
> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561164 7fc8dfeb8700 -1 
> log_channel(cluster) log [ERR] : 10.2d8 recorded data digest 0x7fa9879c != on 
> disk 0xa6798e03 on 10:1b42251f:{object.name}:head
> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561225 7fc8dfeb8700 -1 
> log_channel(cluster) log [ERR] : deep-scrub 10.2d8 
> 10:1b42251f:{object.name}:head on disk size (15913) does not match object 
> info size (10280) adjusted for ondisk to (10280)
> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 21:05:59.935815 7fc8dfeb8700 -1 
> log_channel(cluster) log [ERR] : 10.2d8 deep-scrub 2 errors


> $ ceph pg 10.2d8 query
> {
> "state": "active+clean+inconsistent",
> "snap_trimq": "[]",
> "epoch": 21746,
> "up": [
> 18,
> 17,
> 22
> ],
> "acting": [
> 18,
> 17,
> 22
> ],
> "actingbackfill": [
> "17",
> "18",
> "22"
> ],

However, no recovery IO ever occurs, and the PG never goes active+clean. Not
seeing anything exciting in the logs of the OSDs nor the mons.
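
For what it's worth, one way to cross-check the replicas by hand before forcing
another repair is to compare the on-disk copies directly (filestore layout
assumed; the object name pattern here is a placeholder):

# on each of osd.17, osd.18 and osd.22 in turn
$ cd /var/lib/ceph/osd/ceph-22/current/10.2d8_head
$ find . -name '*object*' -exec ls -l {} \; -exec md5sum {} \;
# compare sizes and checksums across the three OSDs before deciding which copy to trust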

I’ve found a few articles and mailing list entries that 

Re: [ceph-users] Hammer update

2017-03-02 Thread Sasha Litvak
I run CentOS 6.8, so no 0.94.10 packages for el6.

On Mar 2, 2017 8:47 AM, "Abhishek L"  wrote:


Sasha Litvak writes:

> Hello everyone,
>
> Hammer 0.94.10 update was announced in the blog a week ago. However,
> there are no packages available for either version of Red Hat. Can someone
> tell me what is going on?

I see the packages at http://download.ceph.com/rpm-hammer/el7/x86_64/.
Are you able to see the packages after following the instructions at
http://docs.ceph.com/docs/master/install/get-packages/ ?

Best,
Abhishek
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB
21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Massimiliano Cuttini

Ah ...


On 02/03/2017 15:56, Jason Dillaman wrote:

I'll refer you to the man page for blkdiscard [1]. Since it operates
on the block device, it doesn't know about filesystem holes and
instead will discard all data specified (i.e. it will delete all your
data).

[1] http://man7.org/linux/man-pages/man8/blkdiscard.8.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Jason Dillaman
I'll refer you to the man page for blkdiscard [1]. Since it operates
on the block device, it doesn't know about filesystem holes and
instead will discard all data specified (i.e. it will delete all your
data).

[1] http://man7.org/linux/man-pages/man8/blkdiscard.8.html
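
As a rough illustration of the guest-side flow (pool and image names are
placeholders, and the virtual disk must be attached with discard support, e.g.
virtio-scsi with discard=unmap):

# inside the guest VM, against the mounted filesystem
$ sudo fstrim -v /
/: 12.5 GiB (13421772800 bytes) trimmed
# afterwards, from a Ceph admin node, check what the image actually consumes
$ rbd du rbd/vm-disk-1
NAME       PROVISIONED  USED
vm-disk-1         100G   38G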

On Thu, Mar 2, 2017 at 9:54 AM, Massimiliano Cuttini  wrote:
>
>
> On 02/03/2017 14:11, Jason Dillaman wrote:
>>
>> On Thu, Mar 2, 2017 at 8:09 AM, Massimiliano Cuttini 
>> wrote:
>>>
>>> Ok,
>>>
>>> then, if the command comes from the hypervisor that holds the image, is it
>>> safe?
>>
>> No, it needs to be issued from the guest VM -- not the hypervisor that
>> is running the guest VM. The reason is that it's a black box to the
>> hypervisor and it won't know what sectors can be safely discarded.
>
> This is true if we are talking about the filesystem.
> So for the command
>
> fstrim
>
> that would certainly be the case.
> But if we are talking about the block device,
> the command
>
> blkdiscard
>
> could not be run on a VM which sees the images as local disks without any thin
> provisioning.
> That command should be issued by the hypervisor, not the guest.
>
> ... or not?
>
>
>>> But if the guest VM on the same hypervisor tries to use the image at the same
>>> time, what happens?
>>
>> If you trim from outside the guest, I would expect you to potentially
>> corrupt the image (if the fstrim tool doesn't stop you first since the
>> filesystem isn't mounted).
>
>
> OK, that makes sense for fstrim, but what about blkdiscard?
>
>
>>> Are these safe tools? (i.e., do they safely exit with an error instead of
>>> attempting the command and ruining the image?)
>>> Should I consider a snapshot before I go ahead?
>>>
>> As I mentioned, the only safe way to proceed would be to run the trim
>> from within the guest VM or wait until Ceph adds the rbd CLI tooling
>> to safely sparsify an image.



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Massimiliano Cuttini



On 02/03/2017 14:11, Jason Dillaman wrote:

On Thu, Mar 2, 2017 at 8:09 AM, Massimiliano Cuttini  wrote:

Ok,

then, if the command comes from the hypervisor that holds the image, is it
safe?

No, it needs to be issued from the guest VM -- not the hypervisor that
is running the guest VM. The reason is that it's a black box to the
hypervisor and it won't know what sectors can be safely discarded.

This is true if we are talking about the filesystem.
So for the command

fstrim

that would certainly be the case.
But if we are talking about the block device,
the command

blkdiscard

could not be run on a VM which sees the images as local disks without any thin
provisioning.

That command should be issued by the hypervisor, not the guest.

... or not?



But if the guest VM on the same hypervisor tries to use the image at the same
time, what happens?

If you trim from outside the guest, I would expect you to potentially
corrupt the image (if the fstrim tool doesn't stop you first since the
filesystem isn't mounted).


OK, that makes sense for fstrim, but what about blkdiscard?


Are these safe tools? (i.e., do they safely exit with an error instead of
attempting the command and ruining the image?)
Should I consider a snapshot before I go ahead?


As I mentioned, the only safe way to proceed would be to run the trim
from within the guest VM or wait until Ceph adds the rbd CLI tooling
to safely sparsify an image.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer update

2017-03-02 Thread Abhishek L

Sasha Litvak writes:

> Hello everyone,
>
> Hammer 0.94.10 update was announced in the blog a week ago. However, there
> are no packages available for either version of Red Hat. Can someone tell me
> what is going on?

I see the packages at http://download.ceph.com/rpm-hammer/el7/x86_64/.
Are you able to see the packages after following the instructions at
http://docs.ceph.com/docs/master/install/get-packages/ ?
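
For example, a minimal repo file along these lines (el7 shown; the gpg key URL
is the one published on download.ceph.com):

$ cat > /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph]
name=Ceph packages for x86_64
baseurl=http://download.ceph.com/rpm-hammer/el7/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF
$ yum clean all && yum install ceph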

Best,
Abhishek
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [Jewel] upgrade 10.2.3 => 10.2.5 KO : first OSD server freeze every two days :)

2017-03-02 Thread pascal.pu...@pci-conseil.net

Hello,

So, I may need some advice: 1 week ago (on 19 Feb), I upgraded my
stable Ceph Jewel cluster from 10.2.3 to 10.2.5 (YES, it was maybe a bad idea).


I never had a problem with Ceph 10.2.3 since the previous upgrade, on 23 September.

Since that upgrade (10.2.5), every 2 days the first OSD server totally
freezes. Load goes > 500 and comes back after a few minutes… I lose all OSDs
on this server (12/36) during the issue.


It's very strange. So, some information:

Infrastructure:

3 x OSD servers with 12 OSD disks each and SSD journals + 3 MON servers +
3 Ceph RBD clients.


10G dedicated network for clients and 10G dedicated network for OSDs.

So 36 OSDs in total. Each server has 16 CPU cores (2x E5-2630v3) and 32 GB RAM.
No problem with resources.


Performance is good for 36 x 4 TB NL-SAS disks + 1 write-intensive SSD per
OSD server.


Issue:

This morning (last issue was 2 days ago):

See screenshot:
http://www.performance-conseil-informatique.net/wp-content/uploads/2017/03/screenshot_LOAD-1.png


As you can see, there is little IO (just 2 clients, sometimes writing
150 MB/s for a few minutes) – it's a big NAS for cold data.

So during the issue there was no IO: it's strange. Same for the other occurrences.

See screenshot:
http://www.performance-conseil-informatique.net/wp-content/uploads/2017/03/screenshot_OSD_IO.png


Before the issue: no activity. You can see all OSD reads, OSD writes,
journal (SSD), IO wait.


07:07 => 07:09: 2 minutes with 12/36 OSDs totally lost. They came back
afterwards, but I need to fix that.


During the issue, scrubbing was stopped as well, and the nightly trim was
finished… no IO.


No other cron jobs on the servers, nothing. All servers have the same configuration.

LOGS:

A lot of these:

ceph-osd.3.log:2017-03-02 07:09:32.061754 7f6d501e4700 -1 osd.3 14557 
heartbeat_check: no reply from 0x7f6dadb48c10 osd.19 since back 
2017-03-02 07:07:53.286880 front 2017-03-02 07:07:53.286880 (cutoff 
2017-03-02 07:09:12.061690)


Sometimes:

common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x85) [0x7fc38a5e9425]


2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char 
const*, long)+0x2e1) [0x7fc38a528de1]


3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7fc38a52963e]

4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7fc38a529e1c]

5: (CephContextServiceThread::entry()+0x15b) [0x7fc38a6011ab]

6: (()+0x7dc5) [0x7fc388304dc5]

7: (clone()+0x6d) [0x7fc38698f73d]

NOTE: a copy of the executable, or `objdump -rdS ` is needed 
to interpret this.



Questions: Why does only the first OSD server freeze? The 3 servers are strictly
identical. What is freezing the server and driving the load up…?


Already 4 freezes since the upgrade. Today I will raise the log level
and restart everything to get more logs.


Any idea how to troubleshoot? (I already use sar statistics to look for
something…)
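
For the record, a few things that can be checked live on the affected node the
next time it happens (osd.3 is just the example from the log above; ceph daemon
must be run on the node hosting that OSD):

$ ceph daemon osd.3 dump_ops_in_flight | head   # any ops stuck on that OSD?
$ ceph daemon osd.3 dump_historic_ops | less    # slowest recent ops and where they waited
$ ceph tell osd.* injectargs '--debug_osd 10 --debug_ms 1'   # temporarily raise logging
$ sar -d -p 1 5                                 # per-disk latency during the freeze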


Maybe some change with heartbeats?

Should I consider downgrading to 10.2.3? Upgrading to Kraken?

Thanks for your help,

Regards,

Other things:

rpm -qa|grep ceph

libcephfs1-10.2.5-0.el7.x86_64

ceph-common-10.2.5-0.el7.x86_64

ceph-mon-10.2.5-0.el7.x86_64

ceph-release-1-1.el7.noarch

ceph-10.2.5-0.el7.x86_64

ceph-radosgw-10.2.5-0.el7.x86_64

ceph-selinux-10.2.5-0.el7.x86_64

ceph-mds-10.2.5-0.el7.x86_64

python-cephfs-10.2.5-0.el7.x86_64

ceph-base-10.2.5-0.el7.x86_64

ceph-osd-10.2.5-0.el7.x86_64

uname -a

Linux ceph-osd-03 3.10.0-514.6.2.el7.x86_64 #1 SMP Thu Feb 23 03:04:39 
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Ceph conf :

[global]

fsid = d26f269b-852f-4181-821d-756f213ae155

mon_initial_members = ceph-mon-01, ceph-mon-02, ceph-mon-03

mon_host = 192.168.43.147,192.168.43.148,192.168.43.149

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

max_open_files = 131072

public_network = 192.168.43.0/24

cluster_network = 192.168.44.0/24

osd_journal_size = 13000

osd_pool_default_size = 2 # Write an object n times.

osd_pool_default_min_size = 2 # Allow writing n copy in a degraded state.

osd_pool_default_pg_num = 512

osd_pool_default_pgp_num = 512

osd_crush_chooseleaf_type = 8

cephx_cluster_require_signatures = true

cephx_service_require_signatures = false

mon_pg_warn_max_object_skew = 0

mon_pg_warn_max_per_osd = 0

[mon]

[osd]

osd_max_backfills = 1

osd_recovery_priority = 3

osd_recovery_max_active = 3

osd_recovery_max_start = 3

filestore merge threshold = 40

filestore split multiple = 8

filestore xattr use omap = true

osd op threads = 8

osd disk threads = 4

osd op num threads per shard = 3

osd op num shards = 10

osd map cache size = 1024

osd_enable_op_tracker = false

osd_scrub_begin_hour = 20

osd_scrub_end_hour = 6

[client]

rbd_cache = true

rbd cache size = 67108864

rbd cache max dirty = 50331648

rbd cache target dirty = 33554432

rbd cache max dirty age = 2

rbd cache writethrough until flush = true

rbd readahead trigger requests = 10 # 

Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Jason Dillaman
On Thu, Mar 2, 2017 at 8:09 AM, Massimiliano Cuttini  wrote:
> Ok,
>
> then, if the command comes from the hypervisor that holds the image, is it
> safe?

No, it needs to be issued from the guest VM -- not the hypervisor that
is running the guest VM. The reason is that it's a black box to the
hypervisor and it won't know what sectors can be safely discarded.

> But if the guest VM on the same hypervisor tries to use the image at the same
> time, what happens?

If you trim from outside the guest, I would expect you to potentially
corrupt the image (if the fstrim tool doesn't stop you first since the
filesystem isn't mounted).

> Are these safe tools? (i.e., do they safely exit with an error instead of
> attempting the command and ruining the image?)
> Should I consider a snapshot before I go ahead?
>

As I mentioned, the only safe way to proceed would be to run the trim
from within the guest VM or wait until Ceph adds the rbd CLI tooling
to safely sparsify an image.

>
>
>
> On 02/03/2017 13:53, Jason Dillaman wrote:
>>
>> In that case, the trim/discard requests would need to come directly
>> from the guest virtual machines to avoid damaging the filesystems. We
>> do have a backlog feature ticket [1] to allow an administrator to
>> transparently sparsify a in-use image via the rbd CLI, but no work has
>> been started on it yet.
>>
>> [1] http://tracker.ceph.com/issues/13706
>>
>> On Thu, Mar 2, 2017 at 5:16 AM, Massimiliano Cuttini 
>> wrote:
>>>
>>> Thanks Jason,
>>>
>>> I need some further info, because I'm really worried about ruining my data.
>>> On this pool I have only XEN virtual disks.
>>> Do I have to run the command directly on the "pool" or on the "virtual
>>> disks"?
>>>
>>> I guess that I have to run it on the pool.
>>> As admin I don't have access to the local filesystem of the customers'
>>> virtual disks, and neither can I temporarily mount them to trim them.
>>> Are my assumptions right?
>>>
>>> One more question: do I need to unmount the image from every device that is
>>> actually using it while I'm trimming it?
>>>
>>> Thanks,
>>> Max
>>>
>>>
>>>
>>> On 01/03/2017 20:11, Jason Dillaman wrote:

 You should be able to issue an fstrim against the filesystem on top of
 the nbd device or run blkdiscard against the raw device if you don't
 have a filesystem.

 On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini 
 wrote:
>
> Dear all,
>
> i use the rbd-nbd connector.
> Is there a way to reclaim free space from rbd image using this
> component
> or
> not?
>
>
> Thanks,
> Max
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



>>
>>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Massimiliano Cuttini

Ok,

then, if the command comes from the hypervisor that holds the image, is it
safe?
But if the guest VM on the same hypervisor tries to use the image at the same
time, what happens?
Are these safe tools? (i.e., do they safely exit with an error instead of
attempting the command and ruining the image?)

Should I consider a snapshot before I go ahead?




On 02/03/2017 13:53, Jason Dillaman wrote:

In that case, the trim/discard requests would need to come directly
from the guest virtual machines to avoid damaging the filesystems. We
do have a backlog feature ticket [1] to allow an administrator to
transparently sparsify a in-use image via the rbd CLI, but no work has
been started on it yet.

[1] http://tracker.ceph.com/issues/13706

On Thu, Mar 2, 2017 at 5:16 AM, Massimiliano Cuttini  wrote:

Thanks Jason,

I need some further info, because I'm really worried about ruining my data.
On this pool I have only XEN virtual disks.
Do I have to run the command directly on the "pool" or on the "virtual
disks"?

I guess that I have to run it on the pool.
As admin I don't have access to the local filesystem of the customers'
virtual disks, and neither can I temporarily mount them to trim them.
Are my assumptions right?

One more question: do I need to unmount the image from every device that is
actually using it while I'm trimming it?

Thanks,
Max



On 01/03/2017 20:11, Jason Dillaman wrote:

You should be able to issue an fstrim against the filesystem on top of
the nbd device or run blkdiscard against the raw device if you don't
have a filesystem.

On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini 
wrote:

Dear all,

i use the rbd-nbd connector.
Is there a way to reclaim free space from rbd image using this component
or
not?


Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Jason Dillaman
In that case, the trim/discard requests would need to come directly
from the guest virtual machines to avoid damaging the filesystems. We
do have a backlog feature ticket [1] to allow an administrator to
transparently sparsify a in-use image via the rbd CLI, but no work has
been started on it yet.

[1] http://tracker.ceph.com/issues/13706

On Thu, Mar 2, 2017 at 5:16 AM, Massimiliano Cuttini  wrote:
> Thanks Jason,
>
> I need some further info, because I'm really worried about ruining my data.
> On this pool I have only XEN virtual disks.
> Do I have to run the command directly on the "pool" or on the "virtual
> disks"?
>
> I guess that I have to run it on the pool.
> As admin I don't have access to the local filesystem of the customers'
> virtual disks, and neither can I temporarily mount them to trim them.
> Are my assumptions right?
>
> One more question: do I need to unmount the image from every device that is
> actually using it while I'm trimming it?
>
> Thanks,
> Max
>
>
>
> On 01/03/2017 20:11, Jason Dillaman wrote:
>>
>> You should be able to issue an fstrim against the filesystem on top of
>> the nbd device or run blkdiscard against the raw device if you don't
>> have a filesystem.
>>
>> On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini 
>> wrote:
>>>
>>> Dear all,
>>>
>>> i use the rbd-nbd connector.
>>> Is there a way to reclaim free space from rbd image using this component
>>> or
>>> not?
>>>
>>>
>>> Thanks,
>>> Max
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CrushMap Rule Change

2017-03-02 Thread Maxime Guyot
Hi Ashley,

The rule you indicated, with “step choose indep 0 type osd”, should select 13
different OSDs, but not necessarily on 13 different servers. So you should be able
to test that on, say, 4 servers if you have ~4 OSDs per server.

To split the selected OSDs across 4 hosts, I think you would do something like:
“step take fourtb
step choose indep 4 type host
step choose indep 4 type osd
step emit”
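
If it helps, a rule change like this can be sanity-checked offline with
crushtool before injecting it (file names are arbitrary):

$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt     # edit the rule in crush.txt
$ crushtool -c crush.txt -o crush.new
$ crushtool -i crush.new --test --rule 2 --num-rep 13 --show-mappings | head
$ ceph osd setcrushmap -i crush.new       # only once the mappings look right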

Cheers,
Maxime


From: ceph-users  on behalf of Ashley 
Merrick 
Date: Thursday 2 March 2017 11:34
To: "ceph-us...@ceph.com" 
Subject: [ceph-users] CrushMap Rule Change

Hello,

I am currently doing some erasure code tests in a dev environment.

I have set the following by “default”

rule sas {
ruleset 2
type erasure
min_size 3
max_size 13
step set_chooseleaf_tries 5
step set_choose_tries 100
step take fourtb
step choose indep 0 type osd
step emit
}

As I am splitting the file into 13 chunks, it is placing these across 13
different OSDs.

In the DEV environment I do not have 13 hosts to do full host replication;
however, I am sure I can change the crush map rule to try to split evenly
across the 4 HOSTs I have.

I think I will need to tell it to pick 4 HOSTs, and then in the second line to
pick OSDs; however, as 13 does not divide by 4 exactly, what would be the best
way to lay out this crushmap rule?

Thanks,
Ashley
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CrushMap Rule Change

2017-03-02 Thread Ashley Merrick
Hello,

I am currently doing some erasure code tests in a dev environment.

I have set the following by "default"

rule sas {
ruleset 2
type erasure
min_size 3
max_size 13
step set_chooseleaf_tries 5
step set_choose_tries 100
step take fourtb
step choose indep 0 type osd
step emit
}

As I am splitting the file into 13 chunks, it is placing these across 13
different OSDs.

In the DEV environment I do not have 13 hosts to do full host replication;
however, I am sure I can change the crush map rule to try to split evenly
across the 4 HOSTs I have.

I think I will need to tell it to pick 4 HOSTs, and then in the second line to
pick OSDs; however, as 13 does not divide by 4 exactly, what would be the best
way to lay out this crushmap rule?

Thanks,
Ashley
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph - reclaim free space - aka trimrbd image

2017-03-02 Thread Massimiliano Cuttini

Thanks Jason,

I need some further info, because I'm really worried about ruining my data.
On this pool I have only XEN virtual disks.
Do I have to run the command directly on the "pool" or on the "virtual
disks"?


I guess that I have to run it on the pool.
As admin I don't have access to the local filesystem of the customers'
virtual disks, and neither can I temporarily mount them to trim them.

Are my assumptions right?

One more question: do I need to unmount the image from every device that is
actually using it while I'm trimming it?


Thanks,
Max


On 01/03/2017 20:11, Jason Dillaman wrote:

You should be able to issue an fstrim against the filesystem on top of
the nbd device or run blkdiscard against the raw device if you don't
have a filesystem.

On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini  wrote:

Dear all,

i use the rbd-nbd connector.
Is there a way to reclaim free space from rbd image using this component or
not?


Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 'defect PG' caused heartbeat_map is_healthy timeout and recurring OSD breakdowns

2017-03-02 Thread Daniel Marks
Hi all,

yesterday we encountered a problem within our ceph cluster. After a long day we 
were able to fix it, but we are very unsatisfied with the fix and assume that 
it is also only temporary. Any hint or help is very appreciated.

We have a production ceph cluster (ceph version 10.2.5 on ubuntu 16.04, 
deployed with ceph-ansible) serving our private OpenStack cloud with object
and block storage. We went live a few days ago with a few friendly users /
projects on it. Yesterday we woke up to a few alarms about failed VM creations 
and a few down OSDs. The troubleshooting took us the whole day and this is what 
we have found:

Starting situation:
Some OSDs were marked down, and we had 100 blocked requests on one OSD (+ a few 
on other OSDs), ceph was recovering and backfilling. We also saw OSDs rejoining 
the cluster from time to time, others left it as 'down'. Causing recovery 
traffic.

Analysis:
- The OSD process with 100 blocked requests was using significantly more CPU
than the other OSDs on the node. Let's call this OSD osd.113 (although this
changed later on). Most of the time it was >120% - it looked like one
thread was using its CPU up to 100%.

- In the logs of the OSDs that recently left the cluster as down we found 
several of the following entries:

2017-03-01 10:22:29.258982 7f840d258700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:29.258988 7f840d157700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:31.080447 7f8447ea6700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:32.75 7f840d258700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:32.760010 7f840d157700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:36.080545 7f8447ea6700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had timed out after 15
2017-03-01 10:22:36.080568 7f8447ea6700 1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f842a0bf700' had suicide timed out after 150

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x80) [0x55c490e24520]
2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char 
const*, long)+0x259) [0x55c490d61209]
3: (ceph::HeartbeatMap::is_healthy()+0xe6) [0x55c490d61b36]
4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x55c490d6238c]
5: (CephContextServiceThread::entry()+0x167) [0x55c490e3cda7]
6: (()+0x76ba) [0x7f4f028096ba]
7: (clone()+0x6d) [0x7f4f0088282d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

  On the monitor we also saw several of those entries for different OSDs:

2017-03-01 08:34:23.321262 osd.77 10.97.162.16:6843/61369 62043 : cluster 
[WRN] map e357 wrongly marked me down

  Something was preventing OSDs from properly sending or answering heartbeats...

- We then tried to increase the osd_heartbeat_grace parameter with 'ceph tell 
osd --injectargs' to break the chain of leaving and rejoining OSDs. We were 
able to set it on all OSDs except osd.113 (the one using >120% CPU and having 
100 blocked requests), because it did not answer. What we got was a bit more 
stability. The cluster recovered to an almost stable state:

# ceph pg dump_stuck | column -t
ok
pg_stat  state                                   up             up_primary  acting        acting_primary
4.33     active+recovering+degraded+remapped     [113,104,115]  113         [113,77,87]   113
8.7      active+recovery_wait+degraded           [113,75,94]    113         [113,75,94]   113
3.40     active+recovery_wait+degraded+remapped  [113,92,78]    113         [113,116,84]  113

  Although it was saying '4.33 active+recovering', there was no actual
recovery traffic visible in 'ceph -s' over quite some time. We tried 'ceph pg
4.33 query' but the command did not return.
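
In a situation like this, the admin socket on the suspect OSD can still be worth
querying directly, since it bypasses the cluster path that appears stuck (run on
the node hosting osd.113; output will vary):

$ ceph daemon osd.113 dump_ops_in_flight   # what the busy thread is working on
$ ceph daemon osd.113 dump_historic_ops    # recent slow ops with per-step timestamps
$ ceph daemon osd.113 status               # osdmap epoch, state, number of PGs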

- As we now had a clear indication that something was wrong with osd.113, we
decided to take it down and let ceph recover. As soon as we took osd.113 down,
its PGs got a new primary OSD and ... the CPU usage >120% and the 100 blocked
requests disappeared from osd.113, only to reappear on the new primary node for
PG 4.33. Also, we now had a lot of additional recovery traffic, because osd.113
was taken down and the new primary OSD also went down with a suicide timeout (as
seen above), switching the primary to a new OSD, *leaving us exactly with our
starting situation*, just with a different primary for the 'defect PG' 4.33. In
this vicious circle the new primary OSDs also died from heartbeat_map timeouts
from time to time, causing PG 4.33 to 'wander' through our OSDs, leaving behind
a mess of down, 

[ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-02 Thread nokia ceph
Hello,

Env:- v11.2.0 - bluestore - EC 3 + 1

We are getting the below entries in both /var/log/messages and the OSD logs. May
I know what the impact of these messages is? They are flooding the OSD and
system logs.

~~~

2017-03-01 13:00:59.938839 7f6c96915700 -1
bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit retries 2

2017-03-01 13:00:59.940939 7f6c96915700 -1
bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit retries 4

2017-03-01 13:00:59.941126 7f6c96915700 -1
bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit retries 1

~~~


I found these messages during the activation of the ceph-osd daemons:


~~~ From /var/log/messages

Feb 27 13:59:46 PL6-CN2 sh: command_check_call: Running command:
/usr/bin/systemctl start ceph-osd@39

Feb 27 13:59:46 PL6-CN2 systemd: Started Ceph disk activation: /dev/sdk2.

Feb 27 13:59:46 PL6-CN2 ceph-osd: 2017-02-27 13:59:46.540781 7f1f3f016700
-1 bdev(/var/lib/ceph/osd/ceph-39/block) aio_submit retries 9

Feb 27 13:59:46 PL6-CN2 ceph-osd: 2017-02-27 13:59:46.544670 7f1f3f016700
-1 bdev(/var/lib/ceph/osd/ceph-39/block) aio_submit retries 2

Feb 27 13:59:46 PL6-CN2 ceph-osd: 2017-02-27 13:59:46.544854 7f1f3f016700
-1 bdev(/var/lib/ceph/osd/ceph-39/block) aio_submit retries 1

Feb 27 13:59:47 PL6-CN2 kernel: sdl: sdl1

Feb 27 13:59:47 PL6-CN2 kernel: sdl: sdl1

Feb 27 13:59:48 PL6-CN2 kernel: sdl: sdl1 sdl2

Feb 27 13:59:48 PL6-CN2 systemd: Cannot add dependency job for unit
microcode.service, ignoring: Unit is not loaded properly: Invalid argument.

Feb 27 13:59:48 PL6-CN2 systemd: Starting Ceph disk activation: /dev/sdl2...
~~~

At the same time on OSD logs:-- /var/log/ceph/ceph-osd.43.log-20170228.gz

2017-02-27 14:00:17.121460 7f147351e940  0 osd.43 0 crush map has features
2199057072128, adjusting msgr requires for clients
2017-02-27 14:00:17.121466 7f147351e940  0 osd.43 0 crush map has features
2199057072128 was 8705, adjusting msgr requires for mons
2017-02-27 14:00:17.121468 7f147351e940  0 osd.43 0 crush map has features
2199057072128, adjusting msgr requires for osds
2017-02-27 14:00:17.121511 7f147351e940  0 osd.43 0 load_pgs
2017-02-27 14:00:17.121514 7f147351e940  0 osd.43 0 load_pgs opened 0 pgs
2017-02-27 14:00:17.121517 7f147351e940  0 osd.43 0 using 1 op queue with
priority op cut off at 64.
2017-02-27 14:00:17.122364 7f147351e940 -1 osd.43 0 log_to_monitors
{default=true}
2017-02-27 14:00:18.371762 7f147351e940  0 osd.43 0 done with init,
starting boot process
2017-02-27 14:00:18.486559 7f1459952700 -1
bdev(/var/lib/ceph/osd/ceph-43/block) aio_submit retries 7
2017-02-27 14:00:18.488770 7f1459952700 -1
bdev(/var/lib/ceph/osd/ceph-43/block) aio_submit retries 4
2017-02-27 14:00:18.489306 7f1459952700 -1
bdev(/var/lib/ceph/osd/ceph-43/block) aio_submit retries 2
2017-02-27 14:00:18.489826 7f1459952700 -1
bdev(/var/lib/ceph/osd/ceph-43/block) aio_submit retries 2

..

2017-02-27 14:00:18.583234 7f145814f700  0 osd.43 93 crush map has features
2200130813952, adjusting msgr requires for clients
2017-02-27 14:00:18.583257 7f145814f700  0 osd.43 93 crush map has features
2200130813952 was 2199057080833, adjusting msgr requires for mons
2017-02-27 14:00:18.583271 7f145814f700  0 osd.43 93 crush map has features
2200130813952, adjusting msgr requires for osds


As per my understanding, on a bluestore device we can write both with
O_DIRECT and via aio. For some reason, bdev is not always able to submit
via aio on the first attempt.

~~~
"ms_type": "async"
~~~

I need your suggestions on how to avoid these messages.
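
One knob that is sometimes associated with these retries is the bluestore aio
queue depth; treat this only as a hedged pointer to investigate, since the exact
option name and default can vary by release:

$ ceph daemon osd.0 config show | grep bdev_aio      # check whether the option exists and its value
$ ceph tell osd.* injectargs '--bdev_aio_max_queue_depth 1024'   # may need an OSD restart to take effect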

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com