[ceph-users] failed to connect to the RADOS monitor on: IP:6789, : Connection timed out

2017-03-21 Thread Vince

Hi,

We have set up a Ceph cluster, and while adding it as primary storage in 
CloudStack I am getting the below error on the hypervisor server. The 
error says the hypervisor server timed out while connecting to the Ceph 
monitor.


We disabled the firewall and made sure the ports are open. This is the final 
step. Please help.


==

2017-03-22 02:26:48,842 INFO [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-2:null) Didn't find an existing storage pool 
2a8446a8-cd7b-33a8-b5c7-7cda92f0 by UUID, checking for pools with 
duplicate paths
2017-03-22 02:27:18,999 ERROR [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-2:null) Failed to create RBD storage pool: 
org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: 
123.345.56.7:6789,: Connection timed out
2017-03-22 02:27:18,999 ERROR [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-2:null) Failed to create the RBD storage pool, 
cleaning up the libvirt secret


==
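
For reference, this can usually be narrowed down from the hypervisor itself with something like the following (a rough sketch; substitute the real monitor IP and the cephx user/keyring that CloudStack was given):

nc -vz -w 5 123.345.56.7 6789        # raw TCP reachability to the monitor port
ceph -s -m 123.345.56.7 --id admin   # plain RADOS connection using the same credentials libvirt uses
ss -tlnp | grep 6789                 # run on the monitor host: is ceph-mon listening on the public IP?

If the plain TCP connection also times out, the problem is in the network or firewall path between the hypervisor and the monitor rather than in libvirt or CloudStack.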


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Vince

Hi,

I have checked and confirmed that the monitor daemon is running and that the 
socket file /var/run/ceph/ceph-mon.mon1.asok has been created, but the 
server's messages log is still showing the error.



Mar 22 00:47:38 mon1 ceph-create-keys: admin_socket: exception getting 
command descriptions: [Errno 2] No such file or directory
Mar 22 00:47:38 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon 
admin socket not ready yet.




[root@mon1 ~]# ll /var/run/ceph/ceph-mon.mon1.asok
srwxr-xr-x. 1 ceph ceph 0 Mar 21 04:13 /var/run/ceph/ceph-mon.mon1.asok


=
[root@mon1 ~]# systemctl status ceph-mon@mon1.service
● ceph-mon@mon1.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; 
vendor preset: disabled)

   Active: active (running) since Tue 2017-03-21 04:13:20 PDT; 17h ago
 Main PID: 29746 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@mon1.service
   └─29746 /usr/bin/ceph-mon -f --cluster ceph --id mon1 
--setuser ceph --setgroup ceph


Mar 21 04:13:20 mon1.ihnetworks.com systemd[1]: Started Ceph cluster 
monitor daemon.
Mar 21 04:13:20 mon1.ihnetworks.com systemd[1]: Starting Ceph cluster 
monitor daemon...
Mar 21 04:13:20 mon1.ihnetworks.com ceph-mon[29746]: starting mon.mon1 
rank 0 at 10.10.48.7:6789/0 mon_data /var/lib/ceph/mon/ceph-mon1 fsid 
ebac75fc-e631...cbcdd1d25
Mar 21 04:21:23 mon1.ihnetworks.com systemd[1]: 
[/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' 
in section 'Service'
Mar 21 04:35:47 mon1.ihnetworks.com systemd[1]: 
[/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' 
in section 'Service'
Mar 21 04:40:25 mon1.ihnetworks.com systemd[1]: 
[/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' 
in section 'Service'
Mar 21 04:43:01 mon1.ihnetworks.com systemd[1]: 
[/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' 
in section 'Service'
Mar 21 05:39:56 mon1.ihnetworks.com systemd[1]: 
[/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' 
in section 'Service'

Hint: Some lines were ellipsized, use -l to show in full.
=
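
For reference, whether that socket actually responds can be checked directly, using the paths shown above (a quick sketch):

ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok mon_status
ceph daemon mon.mon1 mon_status             # equivalent shorthand
ceph-create-keys --cluster ceph --id mon1   # re-run by hand to see whether it can connect now

If mon_status answers over the socket but ceph-create-keys still logs the error, the messages are likely stale entries from before the monitor finished starting.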


On 03/21/2017 11:34 PM, Wes Dillingham wrote:
Generally this means the monitor daemon is not running. Is the monitor 
daemon running? The monitor daemon creates the admin socket in 
/var/run/ceph/$socket


Elaborate on how you are attempting to deploy ceph.

On Tue, Mar 21, 2017 at 9:01 AM, Vince wrote:


Hi,

I am getting the below error in messages after setting up ceph
monitor.

===
Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception
getting command descriptions: [Errno 2] No such file or directory
Mar 21 08:48:23 mon1 ceph-create-keys:
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception
getting command descriptions: [Errno 2] No such file or directory
Mar 21 08:48:23 mon1 ceph-create-keys:
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
===

On checking the ceph-create-keys service status, getting the below
error.

===
[root@mon1 ~]# systemctl status ceph-create-keys@mon1.service

● ceph-create-keys@mon1.service
 - Ceph cluster key creator task
Loaded: loaded (/usr/lib/systemd/system/ceph-create-keys@.service;
static; vendor preset: disabled)
Active: inactive (dead) since Thu 2017-02-16 10:47:14 PST; 1
months 2 days ago
Condition: start condition failed at Tue 2017-03-21 05:47:42 PDT;
2s ago
ConditionPathExists=!/var/lib/ceph/bootstrap-mds/ceph.keyring was
not met
Main PID: 2576 (code=exited, status=0/SUCCESS)
===

Have anyone faced this error before ?

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu 
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Thanks Brad

--
Deepak

> On Mar 21, 2017, at 9:31 PM, Brad Hubbard  wrote:
> 
>> On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu  wrote:
>> Do we know which version of the ceph client has a fix for this bug? Bug:
>> http://tracker.ceph.com/issues/17191
>> 
>> 
>> 
>> I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common-
>> 10.2.6-1(Ubuntu 14.04.5)
> 
> ceph-client is the repository for the ceph kernel client (kernel modules).
> 
> The commits referenced in the tracker above went into upstream kernel 4.9-rc1.
> 
> https://lkml.org/lkml/2016/10/8/110
> 
> I doubt these are available in any CentOS 7.x kernel yet but you could
> check the source.
> 
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> 
> -- 
> Cheers,
> Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Brad Hubbard
On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu  wrote:
> Do we know which version of the ceph client has a fix for this bug? Bug:
> http://tracker.ceph.com/issues/17191
>
>
>
> I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common-
> 10.2.6-1(Ubuntu 14.04.5)

ceph-client is the repository for the ceph kernel client (kernel modules).

The commits referenced in the tracker above went into upstream kernel 4.9-rc1.

https://lkml.org/lkml/2016/10/8/110

I doubt these are available in any CentOS 7.x kernel yet but you could
check the source.
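
For what it's worth, a rough way to check whether a given CentOS kernel carries a backport (hypothetical commands, run on the client):

uname -r                                          # kernel the client is actually running
rpm -q --changelog kernel | grep -i ceph | head   # look for backported ceph fixes
# or install the matching kernel source/debuginfo package and grep the
# fs/ceph and net/ceph sources for the functions touched by those commits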

>
>
>
> --
>
> Deepak
>
> 
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Do we know which version of the ceph client has a fix for this bug? Bug: 
http://tracker.ceph.com/issues/17191

I have ceph-common-10.2.6-0 (on CentOS 7.3.1611) & ceph-fs-common-10.2.6-1 
(Ubuntu 14.04.5)

--
Deepak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-21 Thread Brad Hubbard
Based solely on the information given the only rpms with this specific commit in
them would be here
https://shaman.ceph.com/builds/ceph/wip-prune-past-intervals-kraken/
(specifically
https://4.chacra.ceph.com/r/ceph/wip-prune-past-intervals-kraken/8263140fe539f9c3241c1c0f6ee9cfadde9178c0/centos/7/flavors/default/x86_64/).
These are test rpms, not official releases.

Note that the branch "wip-prune-past-intervals-kraken" exists only in the
ceph-ci repo and *not* the main ceph repo and that the particular commit above
does not seem to have made it into the "ceph" repo.

$ git log -S _simplify_past_intervals
$ git log --grep="_simplify_past_intervals"
$

Given this commit is not in the ceph repo I would suggest we have never shipped
an official rpm that contains this commit.

It's not totally clear to me exactly what you are trying to achieve, maybe you
could have another go at describing your objective?
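
For reference, a quick way to map a source file to the rpm that ships its build product is to ask rpm which package owns the installed binary, e.g. on an OSD node (a sketch):

rpm -qf /usr/bin/ceph-osd      # should report ceph-osd-11.2.0-0.el7.x86_64 on kraken
rpm -ql ceph-osd | head        # list everything the ceph-osd package ships

src/osd/PG.cc and PG.h are compiled into the ceph-osd binary, so rebuilding them means rebuilding at least the ceph-osd package.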

On Wed, Mar 22, 2017 at 12:26 AM, nokia ceph  wrote:
> Hello,
>
> I made some changes to the below files in the ceph kraken v11.2.0 source code
> as per this article:
>
> https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken
>
> ..src/osd/PG.cc
> ..src/osd/PG.h
>
> Is there any way to find which rpm is affected by these two files? I
> believe it should be ceph-osd-11.2.0-0.el7.x86_64.rpm. Can you please
> confirm?
>
> I failed to find it from the ceph.spec file. 
>
> Could anyone please guide me the right procedure to check this.
>
> The main intention is that if we find the exact rpm affected by these files,
> we can simply overwrite it with the old rpm.  
>
> Awaiting your comments.
>
> Thanks
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Shinobu Kinjo
> I am sure I remember having to reduce min_size to 1 temporarily in the past 
> to allow recovery from having two drives irrecoverably die at the same time 
> in one of my clusters.

What was the situation in which you had to do that?
Thanks in advance for sharing your experience.

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-21 Thread Shain Miley
Hi,

Thank you for providing me this level of detail.

I ended up just failing the drive since it is still under support and we had in 
fact gotten emails about the health of this drive in the past.

I will, however, use this in the future if we have an issue with a pg and it is 
the first time we have had an issue with the drive and/or it's no longer under 
support.

Thanks again.

Shain

> On Mar 19, 2017, at 11:19 AM, Mehmet  wrote:
> 
> Hi Shain,
> 
> what i would do:
> take the osd.32 out
> 
> # systemctl stop ceph-osd@32
> # ceph osd out osd.32
> 
> this will cause rebalancing.
> 
> to repair/reuse the drive you can do:
> 
> # smartctl -t long /dev/sdX
> This will start a long self-test on the drive and - I bet - abort this after 
> a while with something like:
> 
> # smartctl -a /dev/sdX
> [...]
> SMART Self-test log
> 
> Num  Test              Status                      segment  LifeTime  LBA_first_err [SK ASC ASQ]
>      Description                                   number   (hours)
> 
> # 1  Background long   Failed in segment -->       -        4378      35494670 [0x3 0x11 0x0]
> [...]
> 
> 
> Now mark the segment as "malfunction" - my system was Ubuntu
> 
> # apt install sg3-utils/xenial
> # sg_verify --lba=35494670 /dev/sdX1
> # sg_reassign --address=35494670 /dev/sdX
> # sg_reassign --grown /dev/sdX
> 
> the next long test should hopefully work fine:
> # smartctl -t long /dev/sdX
> 
> If not, repeat the above with the newly found defect LBA.
> 
> I've done this three times successfully - but not with an error on a primary pg.
> 
> After that you can start the osd with
> 
> # systemctl start ceph-osd@32
> # ceph osd in osd.32
> 
> HTH
> - Mehmet
> 
> 
> Am 2017-03-17 20:08, schrieb Shain Miley:
>> Brian,
>> Thank you for the detailed information.  I was able to compare the 3
>> hexdump files and it looks like the primary pg is the odd man out.
>> I stopped the OSD and then I attempted to move the object:
>> root@hqosd3:/var/lib/ceph/osd/ceph-32/current/3.2b8_head/DIR_8/DIR_B/DIR_2/DIR_A/DIR_0#
>> mv rb.0.fe307e.238e1f29.0076024c__head_4650A2B8__3 /root
>> mv: error reading
>> ‘rb.0.fe307e.238e1f29.0076024c__head_4650A2B8__3’:
>> Input/output error
>> mv: failed to extend
>> ‘/root/rb.0.fe307e.238e1f29.0076024c__head_4650A2B8__3’:
>> Input/output error
>> However I got a nice Input/output error instead.
>> I assume that this is not the case normally.
>> Any ideas on how I should proceed at this point... should I fail out
>> this OSD and replace the drive (I have had no indication, other than
>> the IO error, that there is an issue with this disk), or is there
>> something I can try first?
>> Thanks again,
>> Shain
>>> On 03/17/2017 11:38 AM, Brian Andrus wrote:
>>> We went through a period of time where we were experiencing these
>>> daily...
>>> cd to the PG directory on each OSD and do a find for
>>> "238e1f29.0076024c" (mentioned in your error message). This will
>>> likely return a file that has a slash in the name, something like
>>> rbdudata.238e1f29.0076024c_head_blah_1f...
>>> hexdump -C the object (tab completing the name helps) and pipe the
>>> output to a different location. Once you obtain the hexdumps, do a
>>> diff or cmp against them and find which one is not like the others.
>>> If the primary is not the outlier, perform the PG repair without
>>> worry. If the primary is the outlier, you will need to stop the OSD,
>>> move the object out of place, start it back up and then it will be
>>> okay to issue a PG repair.
>>> Other less common inconsistent PGs we see are differing object sizes
>>> (easy to detect with a simple list of file size) and differing
>>> attributes ("attr -l", but the error logs are usually precise in
>>> identifying the problematic PG copy).
 On Fri, Mar 17, 2017 at 8:16 AM, Shain Miley  wrote:
 Hello,
 Ceph status is showing:
 1 pgs inconsistent
 1 scrub errors
 1 active+clean+inconsistent
 I located the error messages in the logfile after querying the pg
 in question:
 root@hqosd3:/var/log/ceph# zgrep -Hn 'ERR' ceph-osd.32.log.1.gz
 ceph-osd.32.log.1.gz:846:2017-03-17 02:25:20.281608 7f7744d7f700
 -1 log_channel(cluster) log [ERR] : 3.2b8 shard 32: soid
 3/4650a2b8/rb.0.fe307e.238e1f29.0076024c/head candidate had a
 read error, data_digest 0x84c33490 != known data_digest 0x974a24a7
 from auth shard 62
 ceph-osd.32.log.1.gz:847:2017-03-17 02:30:40.264219 7f7744d7f700
 -1 log_channel(cluster) log [ERR] : 3.2b8 deep-scrub 0 missing, 1
 inconsistent objects
 ceph-osd.32.log.1.gz:848:2017-03-17 02:30:40.264307 7f7744d7f700
 -1 log_channel(cluster) log [ERR] : 3.2b8 deep-scrub 1 errors
 Is this a case where it would be safe to use 'ceph pg repair'? The
 documentation indicates there are times where running this command
 is

Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Anthony D'Atri
I’m fairly sure I saw it as recently as Hammer, definitely Firefly. YMMV.


> On Mar 21, 2017, at 4:09 PM, Gregory Farnum  wrote:
> 
> You shouldn't need to set min_size to 1 in order to heal any more. That was 
> the case a long time ago but it's been several major LTS releases now. :)
> So: just don't ever set min_size to 1.
> -Greg
> On Tue, Mar 21, 2017 at 6:04 PM Anthony D'Atri  wrote:
> >> a min_size of 1 is dangerous though because it means you are 1 hard disk 
> >> failure away from losing the objects within that placement group entirely. 
> >> a min_size of 2 is generally considered the minimum you want but many 
> >> people ignore that advice, some wish they hadn't.
> >
> > I admit I am having difficulty following why this is the case
> 
> I think we have a case of fervently agreeing.
> 
> Setting min_size on a specific pool to 1 to allow PG’s to heal is absolutely 
> a normal thing in certain circumstances, but it’s important to
> 
> 1) Know _exactly_ what you’re doing, to which pool, and why
> 2) Do it very carefully, changing ‘size’ instead of ‘min_size’ on a busy pool 
> with a bunch of PG’s and data can be quite the rude awakening.
> 3) Most importantly, _only_ set it for the minimum time needed, with eyes 
> watching the healing, and set it back immediately after all affected PG’s 
> have peered and healed.
> 
> The danger, which I think is what Wes was getting at, is in leaving it set to 
> 1 all the time, or forgetting to revert it.  THAT is, as we used to say, 
> begging to lose.
> 
> — aad
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados gateway

2017-03-21 Thread Garg, Pankaj
Hi,
I'm installing Rados Gateway, using Jewel 10.2.5, and can't seem to find the 
correct documentation.
I used ceph-deploy to start the gateway, but can't seem to restart the process 
correctly.

Can someone point me to the correct steps?
Also, how do I start my rados gateway back up?

This is what I was following :

http://docs.ceph.com/docs/jewel/install/install-ceph-gateway/

I'm on Ubuntu 16.04.

Thanks
Pankaj
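
For reference, on Ubuntu 16.04 with Jewel the gateway runs as a systemd instance; a rough sketch, assuming ceph-deploy created the usual rgw.<hostname> instance (check the exact name under /var/lib/ceph/radosgw/):

ls /var/lib/ceph/radosgw/                        # e.g. ceph-rgw.myhost
sudo systemctl status ceph-radosgw@rgw.myhost
sudo systemctl restart ceph-radosgw@rgw.myhost
sudo systemctl enable ceph-radosgw@rgw.myhost    # start automatically on boot
sudo systemctl restart ceph-radosgw.target       # or restart all local rgw instances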
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Gregory Farnum
You shouldn't need to set min_size to 1 in order to heal any more. That was
the case a long time ago but it's been several major LTS releases now. :)
So: just don't ever set min_size to 1.
-Greg
On Tue, Mar 21, 2017 at 6:04 PM Anthony D'Atri  wrote:

> >> a min_size of 1 is dangerous though because it means you are 1 hard
> disk failure away from losing the objects within that placement group
> entirely. a min_size of 2 is generally considered the minimum you want but
> many people ignore that advice, some wish they hadn't.
> >
> > I admit I am having difficulty following why this is the case
>
> I think we have a case of fervently agreeing.
>
> Setting min_size on a specific pool to 1 to allow PG’s to heal is
> absolutely a normal thing in certain circumstances, but it’s important to
>
> 1) Know _exactly_ what you’re doing, to which pool, and why
> 2) Do it very carefully, changing ‘size’ instead of ‘min_size’ on a busy
> pool with a bunch of PG’s and data can be quite the rude awakening.
> 3) Most importantly, _only_ set it for the minimum time needed, with eyes
> watching the healing, and set it back immediately after all affected PG’s
> have peered and healed.
>
> The danger, which I think is what Wes was getting at, is in leaving it set
> to 1 all the time, or forgetting to revert it.  THAT is, as we used to say,
> begging to lose.
>
> — aad
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Preconditioning an RBD image

2017-03-21 Thread Alex Gorbachev
I wanted to share a recent experience in which a few RBD volumes,
formatted as XFS and exported via the Ubuntu NFS kernel server, performed
poorly and even generated "out of space" warnings on a nearly empty
filesystem.  I tried a variety of hacks and fixes to no effect, until
things started magically working just after some dd write testing.

The only explanation I can come up with is that preconditioning, or
thickening, the images with this benchmarking is what caused the
improvement.

Ceph is Hammer 0.94.7 running on Ubuntu 14.04, kernel 4.10 on OSD nodes and
4.4 on NFS nodes.
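
The dd write testing that seems to have thickened the images was just a large sequential write; roughly something like the following (mount point and size are made up):

dd if=/dev/zero of=/srv/nfs/precondition.img bs=4M count=25600 oflag=direct   # ~100 GiB sequential fill
rm /srv/nfs/precondition.img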

Regards,
Alex
Storcium
-- 
--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Anthony D'Atri
Deploying or removing OSD’s in parallel for sure can save elapsed time and 
avoid moving data more than once.  There are certain pitfalls, though, and the 
strategy needs careful planning.

- Deploying a new OSD at full weight means a lot of write operations.  Running 
multiple whole-OSD backfills to a single host can — depending on your situation 
— saturate the HBA, resulting in slow requests. 
- Judicious setting of norebalance/norecover can help somewhat, to give the 
affected OSD’s/ PG’s time to peer and become ready before shoving data at them
- Deploying at 0 CRUSH weight and incrementally ratcheting up the weight as 
PG’s peer can spread that out
- I’ve recently seen the idea of temporarily setting primary-affinity to 0 on 
the affected OSD’s to deflect some competing traffic as well
- One workaround is that if you have OSD’s to deploy on more than one server, 
you could deploy them in batches of say 1-2 on each server, striping them if 
you will.  That diffuses the impact and results in faster elapsed recovery

As for how many is safe to do in parallel, there are multiple variables there.  
HDD vs SSD, client workload.  And especially how many other OSD’s are in the 
same logical rack/host.  On a cluster of 450 OSD’s, with 150 in each logical 
rack, each OSD is less than 1% of a rack, so deploying 4 of them at once would 
not be a massive change.  However in a smaller cluster with say 45 OSD’s, 15 in 
each rack, that would tickle a much larger fraction of the cluster and be more 
disruptive.

If the numbers below are totals, i.e. you would be expanding your cluster from a 
total of 4 OSD's to a total of 8, that is something I wouldn't do, having 
experienced under Dumpling what it was like to triple the size of a certain 
cluster in one swoop.  

So one approach is trial and error to see how many you can get away with before 
you get slow requests, then backing off. In production of course this is 
playing with fire. Depending on which release you're running, cranking down a 
common set of backfill/recovery tunables can help mitigate the thundering herd 
effect as well.
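
As a concrete sketch of the incremental approach (osd.42 and the final CRUSH weight of 1.82 are made-up examples):

ceph osd set norebalance
ceph osd set norecover                 # optional, while the new OSDs peer
# new OSDs can be created at zero weight with 'osd crush initial weight = 0' in ceph.conf
ceph osd crush reweight osd.42 0.2     # ratchet the weight up in steps...
ceph osd crush reweight osd.42 1.82    # ...waiting for backfill to settle between steps
ceph osd unset norecover
ceph osd unset norebalance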

— aad

> This morning I tried the careful approach, and added one OSD to server1. 
> It all went fine, everything rebuilt and I have a HEALTH_OK again now. 
> It took around 7 hours.
> 
> But now I started thinking... (and that's when things go wrong, 
> therefore hoping for feedback here)
> 
> The question: was I being stupid to add only ONE osd to the server1? Is 
> it not smarter to add all four OSDs at the same time?
> 
> I mean: things will rebuild anyway...and I have the feeling that 
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than 
> rebuilding from 4 -> 5 OSDs. Right?
> 
> So better add all new OSDs together on a specific server?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Anthony D'Atri
>> a min_size of 1 is dangerous though because it means you are 1 hard disk 
>> failure away from losing the objects within that placement group entirely. a 
>> min_size of 2 is generally considered the minimum you want but many people 
>> ignore that advice, some wish they hadn't. 
> 
> I admit I am having difficulty following why this is the case

I think we have a case of fervently agreeing.

Setting min_size on a specific pool to 1 to allow PG’s to heal is absolutely a 
normal thing in certain circumstances, but it’s important to

1) Know _exactly_ what you’re doing, to which pool, and why
2) Do it very carefully, changing ‘size’ instead of ‘min_size’ on a busy pool 
with a bunch of PG’s and data can be quite the rude awakening.
3) Most importantly, _only_ set it for the minimum time needed, with eyes 
watching the healing, and set it back immediately after all affected PG’s have 
peered and healed.
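
Concretely, for a pool named rbd (used here only as an example), that sequence is something like:

ceph osd pool set rbd min_size 1
watch ceph -s                      # watch the affected PGs peer and recover
ceph osd pool set rbd min_size 2   # revert as soon as they have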

The danger, which I think is what Wes was getting at, is in leaving it set to 1 
all the time, or forgetting to revert it.  THAT is, as we used to say, begging 
to lose.

— aad

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-21 Thread Deepak Naidu
Greetings,

I have the below two cephFS "volumes/filesystems" created on my ceph cluster. Yes, I 
used the "enable_multiple" flag to enable the multiple-filesystem cephFS feature. My 
questions:


1)  How do I mention the fs name, i.e. dataX or data1, during a cephFS mount, 
either using a kernel mount or a ceph-fuse mount?

2)  When using the kernel client / ceph-fuse, how do I specify dataX or data1 during 
the fuse mount or kernel mount?


[root@Admin ~]# ceph fs ls
name: dataX, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_cepfsData ]
name: data1, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_cepfsData ]
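
For reference, on Jewel the filesystem is selected with the mds namespace option; roughly as follows, assuming a monitor at mon1:6789 and the client.admin key (and, for the kernel client, a kernel recent enough to support mds_namespace):

mount -t ceph mon1:6789:/ /mnt/dataX -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=dataX
ceph-fuse /mnt/data1 --client_mds_namespace=data1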


--
Deepak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Correcting inconsistent pg in EC pool

2017-03-21 Thread Graham Allan
I came across an inconsistent pg in our 4+2 EC storage pool (ceph 
10.2.5). Since "ceph pg repair" wasn't able to correct it, I followed 
the general outline given in this thread


http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003965.html


# zgrep -Hn ERR /var/log/ceph/ceph-osd.368.log.*
/var/log/ceph/ceph-osd.368.log.1.gz:525:2017-03-19 23:41:11.736066 7f7a649d9700 
-1 log_channel(cluster) log [ERR] : 70.319s0 shard 63(2): soid 
70:98cb99a5:::default.539464.38__multipart_140411_SN261_0546_AC49YEACXX%2fsam%2fF10216D_CG_L004_001.sorted.dedup.realigned.recal.gvcf.2~LgDQTFVEBK6TSp2Kaw2Z3aylGsP_cRa.156:head
 candidate had a read error
/var/log/ceph/ceph-osd.368.log.1.gz:529:2017-03-19 23:47:47.160589 7f7a671de700 
-1 log_channel(cluster) log [ERR] : 70.319s0 deep-scrub 0 missing, 1 
inconsistent objects
/var/log/ceph/ceph-osd.368.log.1.gz:530:2017-03-19 23:47:47.160624 7f7a671de700 
-1 log_channel(cluster) log [ERR] : 70.319 deep-scrub 1 errors


shows where the error lies, and on that osd:


/var/log/ceph/ceph-osd.63.log.1.gz:811:2017-03-19 23:41:11.657532 7f8d67f77700  
0 osd.63 pg_epoch: 474876 pg[70.319s2( v 474876'387130 
(474876'384063,474876'387130] local-les=474678 n=38859 ec=21494 les/c/f 
474678/474682/0 474662/474673/474565) [368,151,63,313,432,272] r=2 lpr=474673 
pi=135288-474672/1939 luod=0'0 crt=474876'387128 active NIBBLEWISE] _scan_list  
70:98cb99a5:::default.539464.38__multipart_140411_SN261_0546_AC49YEACXX%2fsam%2fF10216D_CG_L004_001.sorted.dedup.realigned.recal.gvcf.2~LgDQTFVEBK6TSp2Kaw2Z3aylGsP_cRa.156:head
 got -5 on read, read_error


and indeed the file has a read error.

So I set the osd down, and used ceph-objectstore-tool to export then 
remove the affected pg (actually it couldn't export without first 
deleting the bad file).
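
For anyone following along, that export/remove step looks roughly like the following, run with the OSD stopped; osd.63 and the pg shard 70.319s2 are taken from the logs above, and paths may differ:

systemctl stop ceph-osd@63
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-63 \
    --journal-path /var/lib/ceph/osd/ceph-63/journal \
    --pgid 70.319s2 --op export --file /root/70.319s2.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-63 \
    --journal-path /var/lib/ceph/osd/ceph-63/journal \
    --pgid 70.319s2 --op remove
systemctl start ceph-osd@63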


After restarting the osd and waiting for recovery, the pg directory 
and contents all appear to have been recreated, but the pg is still 
active+clean+inconsistent.


Am I missing something? "ceph pg repair" and "ceph pg scrub" also don't 
clear the inconsistency.


Thanks for any suggestions,

G.
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd exclusive-lock

2017-03-21 Thread Jason Dillaman
The exclusive-lock feature does, by default, automatically transition
the lock between clients that are attempting to use the image. Only
one client will be able to issue writes to the image at a time. If you
ran "dd" against both mappings concurrently, I'd expect you'd see a
vastly decreased throughput due to this lock exchange.

There is a pending change to optionally disable this automatic lock
transition that is expected to be included in a future kernel release.

On Tue, Mar 21, 2017 at 6:54 AM, Mikaël Cluseau  wrote:
> Hi,
>
> There's something I don't understand about the exclusive-lock feature.
>
> I created an image:
>
> $ ssh host-3
> Container Linux by CoreOS stable (1298.6.0)
> Update Strategy: No Reboots
> host-3 ~ # uname -a
> Linux host-3 4.9.9-coreos-r1 #1 SMP Tue Mar 14 21:09:42 UTC 2017 x86_64 
> Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux
> host-3 ~ # rbd create test-xlock --size=1024 --image-feature exclusive-lock
> host-3 ~ # rbd feature enable test-xlock exclusive-lock
> rbd: failed to update image features: (22) Invalid argument
> 2017-03-21 10:16:50.911598 7f6975ff0100 -1 librbd: one or more requested features are 
> already enabled
>
> I mapped it
>
> host-3 ~ # rbd map --options lock_on_read test-xlock
> /dev/rbd0
>
> I also could map it from another host (disappointment started here):
>
> host-2 ~ # rbd map --options lock_on_read test-xlock
> /dev/rbd9
>
> I can read from both host:
>
> host-2 ~ # dd if=/dev/rbd9 of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.61844 s, 232 MB/s
>
> host-3 ~ # dd if=/dev/rbd0 of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.33527 s, 322 MB/s
>
> And also write:
>
> host-3 ~ # dd if=/dev/urandom of=/dev/rbd0 bs=1M count=1
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0153388 s, 68.4 MB/s
>
> host-2 ~ # dd if=/dev/urandom of=/dev/rbd9 bs=1M count=1
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0312396 s, 33.6 MB/s
>
> Isn't exclusive-lock supposed to forbid at least concurrent writes?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw global quotas

2017-03-21 Thread Graham Allan

On 03/17/2017 11:47 AM, Casey Bodley wrote:


On 03/16/2017 03:47 PM, Graham Allan wrote:

This might be a dumb question, but I'm not at all sure what the
"global quotas" in the radosgw region map actually do.

Is it like a default quota which is applied to all users or buckets,
without having to set them individually, or is it a blanket/aggregate
quota applied across all users and buckets in the region/zonegroup?

Graham


They're defaults that are applied in the absence of quota settings on
specific users/buckets, not aggregate quotas. I agree that the
documentation in http://docs.ceph.com/docs/master/radosgw/admin/ is not
clear about the relationship between 'default quotas' and 'global
quotas' - they're basically the same thing, except for their scope.


Thanks, that's great to know, and exactly what I hoped it would do. It 
seemed most likely but not 100% obvious!


My next question is how to set/enable the master quota, since I'm not 
sure that the documented procedure still works for jewel. Although 
radosgw-admin doesn't acknowledge the "region-map" command in its help 
output any more, it does accept it; however, "region-map set" appears 
to have no effect.


I think I should be using the radosgw-admin period commands, but it's 
not clear to me how I can update the quotas within the period_config.


G.
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Kjetil Jørgensen
Hi,

On Tue, Mar 21, 2017 at 11:59 AM, Adam Carheden  wrote:

> Let's see if I got this. 4 host cluster. size=3, min_size=2. 2 hosts
> fail. Are all of the following accurate?
>
> a. An rdb is split into lots of objects, parts of which will probably
> exist on all 4 hosts.
>

Correct.


>
> b. Some objects will have 2 of their 3 replicas on 2 of the offline OSDs.
>
> Likely correct.


> c. Reads can continue from the single online OSD even in pgs that
> happened to have two of 3 osds offline.
>
>
Hypothetically (This is partially informed guessing on my part):
If the survivor happens to be the acting primary and it were up-to-date at
the time,
it can in theory serve reads. (Only the primary serves reads).

If the survivor weren't the acting primary - you don't have any guarantees
as to
whether or not it had the most up-to-date version of any objects. I don't
know
if enough state is tracked outside of the osds to make this determination,
but
I doubt it (it feels costly to maintain).

Regardless of scenario - I'd guess - the PG is marked as down, and will stay
that way until you revive either of the deceased OSDs or you essentially tell
ceph that they're a lost cause and incur potential data loss over that.
(See: ceph osd lost).
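
That looks like the following, with a made-up OSD id; it is destructive and should only be used when the OSD really is gone for good:

ceph osd lost 7 --yes-i-really-mean-it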

d. Writes hang for pgs that have 2 offline OSDs because CRUSH can't meet
> the min_size=2 constraint.
>

Correct.


> e. Rebalancing does not occur because with only two hosts online there
> is no way for CRUSH to meet the size=3 constraint even if it were to
> rebalance.
>

Partially correct, see c)

f. I/O can been restored by setting min_size=1.
>

See c)


> g. Alternatively, I/O can be restored by setting size=2, which would
> kick off rebalancing and restored I/O as the pgs come into compliance
> with the size=2 constraint.
>

See c)


> h. If I instead have a cluster with 10 hosts, size=3 and min_size=2 and
> two hosts fail, some pgs would have only 1 OSD online, but rebalancing
> would start immediately since CRUSH can honor the size=3 constraint by
> rebalancing. This means more nodes makes for a more reliable cluster.
>

See c)

Side note: this is where you start using crush to enumerate what you'd consider
the likely failure domains for concurrent failures. E.g. if you have racks
with distinct power circuits and TOR switches, your more likely large-scale
failure will be a rack, so you tell crush to maintain replicas in distinct
racks.

i. If I wanted to force CRUSH to bring I/O back online with size=3 and
> min_size=2 but only 2 hosts online, I could remove the host bucket from
> the crushmap. CRUSH would then rebalance, but some PGs would likely end
> up with 3 OSDs all on the same host. (This is theory. I promise not to
> do any such thing to a production system ;)
>

Partially correct, see c).



> Thanks
> --
> Adam Carheden
>
>
> On 03/21/2017 11:48 AM, Wes Dillingham wrote:
> > If you had set min_size to 1 you would not have seen the writes pause. a
> > min_size of 1 is dangerous though because it means you are 1 hard disk
> > failure away from losing the objects within that placement group
> > entirely. a min_size of 2 is generally considered the minimum you want
> > but many people ignore that advice, some wish they hadn't.
> >
> > On Tue, Mar 21, 2017 at 11:46 AM, Adam Carheden  wrote:
> >
> > Thanks everyone for the replies. Very informative. However, should I
> > have expected writes to pause if I'd had min_size set to 1 instead
> of 2?
> >
> > And yes, I was under the false impression that my rdb devices was a
> > single object. That explains what all those other things are on a
> test
> > cluster where I only created a single object!
> >
> >
> > --
> > Adam Carheden
> >
> > On 03/20/2017 08:24 PM, Wes Dillingham wrote:
> > > This is because of the min_size specification. I would bet you
> have it
> > > set at 2 (which is good).
> > >
> > > ceph osd pool get rbd min_size
> > >
> > > With 4 hosts, and a size of 3, removing 2 of the hosts (or 2
> drives 1
> > > from each hosts) results in some of the objects only having 1
> replica
> > > min_size dictates that IO freezes for those objects until min_size
> is
> > > achieved. http://docs.ceph.com/docs/jewel/rados/operations/pools/#
> set-the-number-of-object-replicas
> >  set-the-number-of-object-replicas>
> > >
> > > I cant tell if your under the impression that your RBD device is a
> > > single object. It is not. It is chunked up into many objects and
> spread
> > > throughout the cluster, as Kjeti mentioned earlier.
> > >
> > > On Mon, Mar 20, 2017 at 8:48 PM, Kjetil Jørgensen  wrote:
> > >
> > > Hi,
> > >
> > > rbd_id.vm-100-disk-1 is only a "meta object", IIRC, it's
> contents
> > > 

Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Jonathan Proulx



If it took 7hr for one drive you've probably already done this (or the
defaults are for low-impact recovery), but before doing anything you
want to be sure your OSD settings (max backfills, max recovery active,
recovery sleep, perhaps others?) are set such that recovery and
backfilling don't overwhelm production use.

look through the recovery section of
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/ 

This is important because if you do have a failure, and thus unplanned
recovery, you want to have this tuned to your preferred balance of
quick performance or quick return to full redundancy. 
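
For example, the usual knobs can be turned down at runtime and made persistent in ceph.conf; the values here are only illustrative:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1 --osd-recovery-sleep 0.1'
# and the same settings under [osd] in ceph.conf:
#   osd max backfills = 1
#   osd recovery max active = 1
#   osd recovery op priority = 1
#   osd recovery sleep = 0.1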

That said my theory is to add things in as balanced a way as possible to
minimize moves.

What that means depends on your crush map.

For me I have 3 "racks" and all (most) of my pools are 3x replication
so each object should have one copy in each rack.

I've only expanded once, but what I did was to add three servers, one
to each 'rack'.  I set them all 'in' at the same time, which should
have minimized movement between racks and moved objects from other
servers' osds in the same rack onto the osds in the new server.  This
seemed to work well for me.

In your case this would mean adding drives to all servers at once in a
balanced way.  That would prevent copy across servers since the
balance amoung servers wouldn't change.

You could do one disk on each server or load them all up and trust
recovery settings to keep the thundering herd in check.

As I said, I've only gone through one expansion round, and while this
theory seemed to work out for me, hopefully someone with deeper
knowledge can confirm or deny its general applicability.

-Jon

On Tue, Mar 21, 2017 at 07:56:57PM +0100, mj wrote:
:Hi,
:
:Just a quick question about adding OSDs, since most of the docs I can find
:talk about adding ONE OSD, and I'd like to add four per server on my
:three-node cluster.
:
:This morning I tried the careful approach, and added one OSD to server1. It
:all went fine, everything rebuilt and I have a HEALTH_OK again now. It took
:around 7 hours.
:
:But now I started thinking... (and that's when things go wrong, therefore
:hoping for feedback here)
:
:The question: was I being stupid to add only ONE osd to the server1? Is it
:not smarter to add all four OSDs at the same time?
:
:I mean: things will rebuild anyway...and I have the feeling that rebuilding
:from 4 -> 8 OSDs is not going to be much heavier than rebuilding from 4 -> 5
:OSDs. Right?
:
:So better add all new OSDs together on a specific server?
:
:Or not? :-)
:
:MJ
:___
:ceph-users mailing list
:ceph-users@lists.ceph.com
:http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Steve Taylor
Generally speaking, you are correct. Adding more OSDs at once is more
efficient than adding fewer at a time.

That being said, do so carefully. We typically add OSDs to our clusters
either 32 or 64 at once, and we have had issues on occasion with bad
drives. It's common for us to have a drive or two go bad within 24
hours or so of adding them to Ceph, and if multiple drives fail in
multiple failure domains within a short amount of time, bad things can
happen. The efficient, safe approach is to add as many drives as
possible within a single failure domain, wait for recovery, and repeat.

On Tue, 2017-03-21 at 19:56 +0100, mj wrote:
> Hi,
>
> Just a quick question about adding OSDs, since most of the docs I
> can
> find talk about adding ONE OSD, and I'd like to add four per server
> on
> my three-node cluster.
>
> This morning I tried the careful approach, and added one OSD to
> server1.
> It all went fine, everything rebuilt and I have a HEALTH_OK again
> now.
> It took around 7 hours.
>
> But now I started thinking... (and that's when things go wrong,
> therefore hoping for feedback here)
>
> The question: was I being stupid to add only ONE osd to the server1?
> Is
> it not smarter to add all four OSDs at the same time?
>
> I mean: things will rebuild anyway...and I have the feeling that
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than
> rebuilding from 4 -> 5 OSDs. Right?
>
> So better add all new OSDs together on a specific server?
>
> Or not? :-)
>
> MJ
>



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |






___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Adam Carheden
Let's see if I got this. 4 host cluster. size=3, min_size=2. 2 hosts
fail. Are all of the following accurate?

a. An rbd image is split into lots of objects, parts of which will probably
exist on all 4 hosts.

b. Some objects will have 2 of their 3 replicas on 2 of the offline OSDs.

c. Reads can continue from the single online OSD even in pgs that
happened to have two of 3 osds offline.

d. Writes hang for pgs that have 2 offline OSDs because CRUSH can't meet
the min_size=2 constraint.

e. Rebalancing does not occur because with only two hosts online there
is no way for CRUSH to meet the size=3 constraint even if it were to
rebalance.

f. I/O can been restored by setting min_size=1.

g. Alternatively, I/O can be restored by setting size=2, which would
kick off rebalancing and restored I/O as the pgs come into compliance
with the size=2 constraint.

h. If I instead have a cluster with 10 hosts, size=3 and min_size=2 and
two hosts fail, some pgs would have only 1 OSD online, but rebalancing
would start immediately since CRUSH can honor the size=3 constraint by
rebalancing. This means more nodes makes for a more reliable cluster.

i. If I wanted to force CRUSH to bring I/O back online with size=3 and
min_size=2 but only 2 hosts online, I could remove the host bucket from
the crushmap. CRUSH would then rebalance, but some PGs would likely end
up with 3 OSDs all on the same host. (This is theory. I promise not to
do any such thing to a production system ;)

Thanks
-- 
Adam Carheden


On 03/21/2017 11:48 AM, Wes Dillingham wrote:
> If you had set min_size to 1 you would not have seen the writes pause. a
> min_size of 1 is dangerous though because it means you are 1 hard disk
> failure away from losing the objects within that placement group
> entirely. a min_size of 2 is generally considered the minimum you want
> but many people ignore that advice, some wish they hadn't. 
> 
> On Tue, Mar 21, 2017 at 11:46 AM, Adam Carheden  wrote:
> 
> Thanks everyone for the replies. Very informative. However, should I
> have expected writes to pause if I'd had min_size set to 1 instead of 2?
> 
> And yes, I was under the false impression that my rdb devices was a
> single object. That explains what all those other things are on a test
> cluster where I only created a single object!
> 
> 
> --
> Adam Carheden
> 
> On 03/20/2017 08:24 PM, Wes Dillingham wrote:
> > This is because of the min_size specification. I would bet you have it
> > set at 2 (which is good).
> >
> > ceph osd pool get rbd min_size
> >
> > With 4 hosts, and a size of 3, removing 2 of the hosts (or 2 drives 1
> > from each hosts) results in some of the objects only having 1 replica
> > min_size dictates that IO freezes for those objects until min_size is
> > achieved. 
> http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas
> 
> 
> >
> > I cant tell if your under the impression that your RBD device is a
> > single object. It is not. It is chunked up into many objects and spread
> > throughout the cluster, as Kjeti mentioned earlier.
> >
> > On Mon, Mar 20, 2017 at 8:48 PM, Kjetil Jørgensen  
> > >> wrote:
> >
> > Hi,
> >
> > rbd_id.vm-100-disk-1 is only a "meta object", IIRC, it's contents
> > will get you a "prefix", which then gets you on to
> > rbd_header., rbd_header.prefix contains block size,
> > striping, etc. The actual data bearing objects will be named
> > something like rbd_data.prefix.%-016x.
> >
> > Example - vm-100-disk-1 has the prefix 86ce2ae8944a, the first
> >  of that image will be named rbd_data.
> > 86ce2ae8944a., the second  will be
> > 86ce2ae8944a.0001, and so on, chances are that one of these
> > objects are mapped to a pg which has both host3 and host4 among it's
> > replicas.
> >
> > An rbd image will end up scattered across most/all osds of the pool
> > it's in.
> >
> > Cheers,
> > -KJ
> >
> > On Fri, Mar 17, 2017 at 12:30 PM, Adam Carheden  wrote:
> >
> > I have a 4 node cluster shown by `ceph osd tree` below.
> Monitors are
> > running on hosts 1, 2 and 3. It has a single replicated
> pool of size
> > 3. I have a VM with its hard drive replicated to OSDs
> 11(host3),
> > 5(host1) and 3(host2).
> >
> > I can 'fail' any one host by disabling the SAN network
> interface and
> > the VM keeps running with a simple slowdown in I/O performance
>

[ceph-users] add multiple OSDs to cluster

2017-03-21 Thread mj

Hi,

Just a quick question about adding OSDs, since most of the docs I can 
find talk about adding ONE OSD, and I'd like to add four per server on 
my three-node cluster.


This morning I tried the careful approach, and added one OSD to server1. 
It all went fine, everything rebuilt and I have a HEALTH_OK again now. 
It took around 7 hours.


But now I started thinking... (and that's when things go wrong, 
therefore hoping for feedback here)


The question: was I being stupid to add only ONE osd to the server1? Is 
it not smarter to add all four OSDs at the same time?


I mean: things will rebuild anyway...and I have the feeling that 
rebuilding from 4 -> 8 OSDs is not going to be much heavier than 
rebuilding from 4 -> 5 OSDs. Right?


So better add all new OSDs together on a specific server?

Or not? :-)

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Maxime Guyot
Hi Vincent,

There is no buffering until the object reaches 8MB. When the object is written, 
it has a given size. RADOS just splits the object in K chunks, padding occurs 
if the object size is not a multiple of K.

See also: 
http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/developer_notes/
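
A concrete way to see this for a k=2, m=1 profile (pool and object names here are just examples):

ceph osd erasure-code-profile set myec k=2 m=1
ceph osd pool create ecpool 64 64 erasure myec
ceph osd map ecpool someobject      # prints the up/acting set: the 3 OSDs holding the 2 data chunks + 1 coding chunk

So a 4 MB object written with this profile is stored as two 2 MB data chunks plus one 2 MB coding chunk, one per shard/OSD; nothing waits for 8 MB to accumulate.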

Cheers,
Maxime

From: ceph-users  on behalf of Vincent Godin 

Date: Tuesday 21 March 2017 17:16
To: ceph-users 
Subject: [ceph-users] Need erasure coding, pg and block size explanation

When we use a replicated pool of size 3 for example, each data, a block of 4MB 
is written on one PG which is distributed on 3 hosts (by default). The osd 
holding the primary will copy the block to OSDs holding the secondary and third 
PG.
With erasure code, let's take a raid5 schema like k=2 and m=1. Does Ceph buffer 
the data till it reach a amount of 8 MB which it can then divide into two 
blocks of 4MB and a parity control of 4MB  ? Does it just divide the data in 
two chunks whatever the size ? Will it use then PG1 on OSD.A  to store the 
first block, PG1 on OSD.X to store the second block of data and PG1 on OSD.z to 
store the parity ?
Thanks for your explanation because i didn't found any clear explanation on how 
data chunk and parity
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Richard Hesketh
On 21/03/17 17:48, Wes Dillingham wrote:
> a min_size of 1 is dangerous though because it means you are 1 hard disk 
> failure away from losing the objects within that placement group entirely. a 
> min_size of 2 is generally considered the minimum you want but many people 
> ignore that advice, some wish they hadn't. 

I admit I am having difficulty following why this is the case. From searching 
about I understand that the min_size parameter prevents I/O to a PG which does 
not have the required number of replicas, but the justification confuses me - 
if your min_size is one, and you have a PG which now only exists on one OSD, 
surely you are one OSD failure away from losing that PG entirely regardless of 
whether or not you are doing any I/O to it, as that's the last copy of your 
data? And the OSD itself likely serves many other placement groups which are 
above the min_size, so it is not as if freezing I/O on that PG prevents the 
actual disk from doing any activity which could possibly exacerbate a failure. 
Is the assumption that the other lost OSDs could be coming back with their old 
copy of the PG so any newer writes to the PG would be lost if you're unlucky 
enough that the last remaining OSD went down before the others came back? Which 
is not the same thing as losing the objects in that PG entirely, though 
obviously it's not at all ideal, and is also completely irrelevant if you know 
the other OSDs will not be coming back. I am sure I remember having to reduce 
min_size to 1 temporarily in the past to allow recovery from having two drives 
irrecoverably die at the same time in one of my clusters.

Rich



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Wes Dillingham
Generally this means the monitor daemon is not running. Is the monitor
daemon running? The monitor daemon creates the admin socket in
/var/run/ceph/$socket

Elaborate on how you are attempting to deploy ceph.

On Tue, Mar 21, 2017 at 9:01 AM, Vince  wrote:

> Hi,
>
> I am getting the below error in messages after setting up ceph monitor.
>
> ===
> Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception getting
> command descriptions: [Errno 2] No such file or directory
> Mar 21 08:48:23 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon
> admin socket not ready yet.
> Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception getting
> command descriptions: [Errno 2] No such file or directory
> Mar 21 08:48:23 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon
> admin socket not ready yet.
> ===
>
> On checking the ceph-create-keys service status, getting the below error.
>
> ===
> [root@mon1 ~]# systemctl status ceph-create-keys@mon1.service
> ● ceph-create-keys@mon1.service - Ceph cluster key creator task
> Loaded: loaded (/usr/lib/systemd/system/ceph-create-keys@.service;
> static; vendor preset: disabled)
> Active: inactive (dead) since Thu 2017-02-16 10:47:14 PST; 1 months 2 days
> ago
> Condition: start condition failed at Tue 2017-03-21 05:47:42 PDT; 2s ago
> ConditionPathExists=!/var/lib/ceph/bootstrap-mds/ceph.keyring was not met
> Main PID: 2576 (code=exited, status=0/SUCCESS)
> ===
>
> Have anyone faced this error before ?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Wes Dillingham
If you had set min_size to 1 you would not have seen the writes pause. a
min_size of 1 is dangerous though because it means you are 1 hard disk
failure away from losing the objects within that placement group entirely.
a min_size of 2 is generally considered the minimum you want but many
people ignore that advice, some wish they hadn't.

On Tue, Mar 21, 2017 at 11:46 AM, Adam Carheden  wrote:

> Thanks everyone for the replies. Very informative. However, should I
> have expected writes to pause if I'd had min_size set to 1 instead of 2?
>
> And yes, I was under the false impression that my rdb devices was a
> single object. That explains what all those other things are on a test
> cluster where I only created a single object!
>
>
> --
> Adam Carheden
>
> On 03/20/2017 08:24 PM, Wes Dillingham wrote:
> > This is because of the min_size specification. I would bet you have it
> > set at 2 (which is good).
> >
> > ceph osd pool get rbd min_size
> >
> > With 4 hosts, and a size of 3, removing 2 of the hosts (or 2 drives 1
> > from each hosts) results in some of the objects only having 1 replica
> > min_size dictates that IO freezes for those objects until min_size is
> > achieved. http://docs.ceph.com/docs/jewel/rados/operations/pools/#
> set-the-number-of-object-replicas
> >
> > I can't tell if you're under the impression that your RBD device is a
> > single object. It is not. It is chunked up into many objects and spread
> > throughout the cluster, as Kjetil mentioned earlier.
> >
> > On Mon, Mar 20, 2017 at 8:48 PM, Kjetil Jørgensen  wrote:
> >
> > Hi,
> >
> > rbd_id.vm-100-disk-1 is only a "meta object", IIRC, it's contents
> > will get you a "prefix", which then gets you on to
> > rbd_header., rbd_header.prefix contains block size,
> > striping, etc. The actual data bearing objects will be named
> > something like rbd_data.prefix.%-016x.
> >
> > Example - vm-100-disk-1 has the prefix 86ce2ae8944a, the first
> >  of that image will be named rbd_data.
> > 86ce2ae8944a., the second  will be
> > 86ce2ae8944a.0001, and so on, chances are that one of these
> > objects are mapped to a pg which has both host3 and host4 among it's
> > replicas.
> >
> > An rbd image will end up scattered across most/all osds of the pool
> > it's in.
> >
> > Cheers,
> > -KJ
> >
> > On Fri, Mar 17, 2017 at 12:30 PM, Adam Carheden  wrote:
> >
> > I have a 4 node cluster shown by `ceph osd tree` below. Monitors
> are
> > running on hosts 1, 2 and 3. It has a single replicated pool of
> size
> > 3. I have a VM with its hard drive replicated to OSDs 11(host3),
> > 5(host1) and 3(host2).
> >
> > I can 'fail' any one host by disabling the SAN network interface
> and
> > the VM keeps running with a simple slowdown in I/O performance
> > just as
> > expected. However, if 'fail' both nodes 3 and 4, I/O hangs on
> > the VM.
> > (i.e. `df` never completes, etc.) The monitors on hosts 1 and 2
> > still
> > have quorum, so that shouldn't be an issue. The placement group
> > still
> > has 2 of its 3 replicas online.
> >
> > Why does I/O hang even though host4 isn't running a monitor and
> > doesn't have anything to do with my VM's hard drive.
> >
> >
> > Size?
> > # ceph osd pool get rbd size
> > size: 3
> >
> > Where's rbd_id.vm-100-disk-1?
> > # ceph osd getmap -o /tmp/map && osdmaptool --pool 0
> > --test-map-object
> > rbd_id.vm-100-disk-1 /tmp/map
> > got osdmap epoch 1043
> > osdmaptool: osdmap file '/tmp/map'
> >  object 'rbd_id.vm-100-disk-1' -> 0.1ea -> [11,5,3]
> >
> > # ceph osd tree
> > ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> > -1 8.06160 root default
> > -7 5.50308 room A
> > -3 1.88754 host host1
> >  4 0.40369 osd.4   up  1.0  1.0
> >  5 0.40369 osd.5   up  1.0  1.0
> >  6 0.54008 osd.6   up  1.0  1.0
> >  7 0.54008 osd.7   up  1.0  1.0
> > -2 3.61554 host host2
> >  0 0.90388 osd.0   up  1.0  1.0
> >  1 0.90388 osd.1   up  1.0  1.0
> >  2 0.90388 osd.2   up  1.0  1.0
> >  3 0.90388 osd.3   up  1.0  1.0
> > -6 2.55852 room B
> > -4 1.75114 host host3
> >  8 0.40369 osd.8   up  1.0  1.0
> >  9 0.40369 osd.9   up  1.0  1.0
> > 10 0.40369 osd.10  up 

[ceph-users] Linux Fest NW CFP

2017-03-21 Thread Federico Lucifredi
Hello Ceph team,
  Linux Fest NorthWest's CFP is out. It is a bit too far for me to do
it as a day trip from Boston, but it would be nice if someone on the
Pacific coast felt like giving a technical overview / architecture
session.

https://www.linuxfestnorthwest.org/2017/news/2017-call-presentations-open

Best -F

-- "'Problem' is a bleak word for challenge" - Richard Fish
(Federico L. Lucifredi) - federico at redhat.com - GnuPG 0x4A73884C
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph rebalancing problem!!

2017-03-21 Thread Arturo N. Diaz Crespo
Hello, I have a small Ceph cluster installed; I followed the manual
installation instructions since I do not have internet access.
I have configured the system with two network interfaces, one for the
client network and one for the cluster network.
The problem is that when the cluster begins to rebalance over the cluster
network, the traffic on my client network drops to zero, that is, there is
no transfer on it at all.
What could be causing this?

I attach the configuration of my ceph.conf and my network interfaces ..
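
For reference, a split client/cluster setup is normally declared in ceph.conf
roughly like this (the subnets below are placeholders, not the poster's actual
values):

[global]
public network = 192.168.100.0/24     # client traffic
cluster network = 192.168.200.0/24    # replication / rebalance traffic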

Thank you

PD: ceph version  10.2.5
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Vincent Godin
When we use a replicated pool of size 3, for example, each piece of data is
written as a 4 MB block to one PG, which is distributed across 3 hosts (by
default). The OSD holding the primary copy will copy the block to the OSDs
holding the second and third copies.

With erasure code, let's take a RAID5-like schema such as k=2 and m=1. Does
Ceph buffer the data until it reaches 8 MB, which it can then divide into two
4 MB data blocks plus a 4 MB parity chunk? Or does it just divide the data
into two chunks, whatever the size? Will it then use PG1 on OSD.A to store
the first block, PG1 on OSD.X to store the second block of data, and PG1 on
OSD.Z to store the parity?

Thanks for your explanation, because I did not find any clear explanation of
how data and parity chunks are handled.
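
For anyone who wants to experiment with exactly that layout, a k=2/m=1
profile and pool can be created like this (profile name, pool name and PG
count are illustrative):

ceph osd erasure-code-profile set ec21 k=2 m=1
ceph osd erasure-code-profile get ec21
ceph osd pool create ecpool 64 64 erasure ec21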
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephalocon 2017 CFP Open!

2017-03-21 Thread Patrick McGarry
Hey cephers,

For those of you that are interested in presenting, sponsoring, or
attending Cephalocon, all of those options are now available on the
Ceph site.

http://ceph.com/cephalocon2017/

If you have any questions, comments, or difficulties, feel free to let
me know. Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Loic Dachary
Hi Logan,

On 03/21/2017 03:27 PM, Logan Kuhn wrote:
> I like the idea
> 
> Being able to play around with different configuration options and using this 
> tool as a sanity checker or showing what will change as well as whether or 
> not the changes could cause health warn or health err.

The tool is offline and can only approximate what would happen. It only knows
about the crushmap, which misses a few things that influence placement, such
as the osd reweight values stored in the osdmap.
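
For anyone following along, the inputs involved can be pulled from a live
cluster for offline inspection, for example (paths are arbitrary):

ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
ceph osd df    # shows the per-OSD reweight values the crushmap alone lacks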

> For example, if I were to change the replication level of a pool, how much 
> space would be left as well as an estimate for how long it would take to 
> rebalance.

It would be easy to display the percentage by which the usage will increase 
(although it's fairly straightforward to guess). The duration of the 
rebalancing depends on the throughput between the OSDs involved and that's 
unfortunately not available to Ceph right now.

> Benchmark capabilities, replication, crush changes, osd add/drop, node 
> add/drop, iops, read/write performance

It occurs to me that the estimated bandwidth between OSDs in the same machine,
hosts in the same rack, etc. would be nice information to have in the crushmap.

Cheers

> Regards,
> Logan
> 
> - On Mar 21, 2017, at 6:58 AM, Xavier Villaneau 
>  wrote:
> 
> Hello all,
> 
> A few weeks ago Loïc Dachary presented his work on python-crush to the 
> ceph-devel list, but I don't think it's been done here yet. In a few words, 
> python-crush is a new Python 2 and 3 library / API for the CRUSH algorithm. 
> It also provides a CLI executable with a few built-in tools related to CRUSH 
> maps. If you want to try it, follow the instructions from its documentation 
> page:
> http://crush.readthedocs.io/en/latest/
> 
> Currently the crush CLI has two features:
>  - analyze: Get a estimation of how (un)evenly the objects will be placed 
> into your cluster
>  - compare: Get a summary of how much data would be moved around if the 
> map was changed
> Both these tools are very basic and have a few known caveats. But nothing 
> that cannot be fixed, the project is still young and open to suggestions and 
> contributions.
> 
> This is where we'd like to hear from the users' community feedback, given 
> everyone's experience in operating (or just messing around with) Ceph 
> clusters. What kind of CRUSH / data placement tools would be interesting to 
> have? Are there some very common architectural / technical questions related 
> to CRUSH that such tools would help answering? Any specific cases where such 
> a thing could have spared you some pain?
> 
> Here a few ideas on top of my head, to help with starting the discussion:
>  - Static analysis of the failure domains, with detection of potential 
> SPOFs
>  - Help to capacity planning, estimations of how much data could 
> practically be stored in a cluster
>  - Built-in basic scenarios for "compare" such as adding a node or 
> removing an OSD.
>  
> Please share your ideas, those will eventually help making a better tool!
> Regards,
> -- 
> Xavier Villaneau
> Software Engineer, working with Ceph during day and sometimes at night 
> too.
> Storage R&D at Concurrent Computer Corporation, Atlanta USA
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Adam Carheden
Thanks everyone for the replies. Very informative. However, should I
have expected writes to pause if I'd had min_size set to 1 instead of 2?

And yes, I was under the false impression that my rbd device was a
single object. That explains what all those other things are on a test
cluster where I only created a single object!
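
For reference, a quick way to see the objects backing an image (the pool,
image name and prefix below mirror the example quoted further down; they
will differ on another cluster):

rbd info rbd/vm-100-disk-1     # prints block_name_prefix, e.g. rbd_data.86ce2ae8944a
rados -p rbd ls | grep rbd_data.86ce2ae8944a | head
ceph osd map rbd rbd_data.86ce2ae8944a.0000000000000000    # PG and OSDs for the first data object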


-- 
Adam Carheden

On 03/20/2017 08:24 PM, Wes Dillingham wrote:
> This is because of the min_size specification. I would bet you have it
> set at 2 (which is good). 
> 
> ceph osd pool get rbd min_size
> 
> With 4 hosts, and a size of 3, removing 2 of the hosts (or 2 drives 1
> from each hosts) results in some of the objects only having 1 replica
> min_size dictates that IO freezes for those objects until min_size is
> achieved. 
> http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas
> 
> I cant tell if your under the impression that your RBD device is a
> single object. It is not. It is chunked up into many objects and spread
> throughout the cluster, as Kjeti mentioned earlier.
> 
> On Mon, Mar 20, 2017 at 8:48 PM, Kjetil Jørgensen  > wrote:
> 
> Hi,
> 
> rbd_id.vm-100-disk-1 is only a "meta object"; IIRC, its contents
> will get you a "prefix", which then gets you on to
> rbd_header.<prefix>; rbd_header.<prefix> contains block size,
> striping, etc. The actual data-bearing objects will be named
> something like rbd_data.<prefix>.<offset formatted as %016x>.
> 
> Example - vm-100-disk-1 has the prefix 86ce2ae8944a, so the first
> object of that image will be named
> rbd_data.86ce2ae8944a.0000000000000000, the second will be
> rbd_data.86ce2ae8944a.0000000000000001, and so on; chances are that
> one of these objects is mapped to a pg which has both host3 and
> host4 among its replicas.
> 
> An rbd image will end up scattered across most/all osds of the pool
> it's in.
> 
> Cheers,
> -KJ
> 
> On Fri, Mar 17, 2017 at 12:30 PM, Adam Carheden  > wrote:
> 
> I have a 4 node cluster shown by `ceph osd tree` below. Monitors are
> running on hosts 1, 2 and 3. It has a single replicated pool of size
> 3. I have a VM with its hard drive replicated to OSDs 11(host3),
> 5(host1) and 3(host2).
> 
> I can 'fail' any one host by disabling the SAN network interface and
> the VM keeps running with a simple slowdown in I/O performance
> just as
> expected. However, if 'fail' both nodes 3 and 4, I/O hangs on
> the VM.
> (i.e. `df` never completes, etc.) The monitors on hosts 1 and 2
> still
> have quorum, so that shouldn't be an issue. The placement group
> still
> has 2 of its 3 replicas online.
> 
> Why does I/O hang even though host4 isn't running a monitor and
> doesn't have anything to do with my VM's hard drive.
> 
> 
> Size?
> # ceph osd pool get rbd size
> size: 3
> 
> Where's rbd_id.vm-100-disk-1?
> # ceph osd getmap -o /tmp/map && osdmaptool --pool 0
> --test-map-object
> rbd_id.vm-100-disk-1 /tmp/map
> got osdmap epoch 1043
> osdmaptool: osdmap file '/tmp/map'
>  object 'rbd_id.vm-100-disk-1' -> 0.1ea -> [11,5,3]
> 
> # ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 8.06160 root default
> -7 5.50308 room A
> -3 1.88754 host host1
>  4 0.40369 osd.4   up  1.0  1.0
>  5 0.40369 osd.5   up  1.0  1.0
>  6 0.54008 osd.6   up  1.0  1.0
>  7 0.54008 osd.7   up  1.0  1.0
> -2 3.61554 host host2
>  0 0.90388 osd.0   up  1.0  1.0
>  1 0.90388 osd.1   up  1.0  1.0
>  2 0.90388 osd.2   up  1.0  1.0
>  3 0.90388 osd.3   up  1.0  1.0
> -6 2.55852 room B
> -4 1.75114 host host3
>  8 0.40369 osd.8   up  1.0  1.0
>  9 0.40369 osd.9   up  1.0  1.0
> 10 0.40369 osd.10  up  1.0  1.0
> 11 0.54008 osd.11  up  1.0  1.0
> -5 0.80737 host host4
> 12 0.40369 osd.12  up  1.0  1.0
> 13 0.40369 osd.13  up  1.0  1.0
> 
> 
> --
> Adam Carheden
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

[ceph-users] idea about optimize an osd rebuild

2017-03-21 Thread Vincent Godin
When you replace a failed OSD, it has to recover all of its PGs and so it
is pretty busy. Is it possible to tell the OSD not to become primary for
any of its already synchronized PGs until every PG on the OSD has
recovered? That should accelerate the rebuild process, because the OSD
would not have to serve clients' read requests but only apply the writes
coming from the primary PGs and keep recovering. Is this stupid, or is
there something wrong with this idea?
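
For what it's worth, something close to this already exists via primary
affinity; a rough sketch (osd.12 is just an example id, and on Hammer-era
clusters the mon option below has to be enabled first):

# ceph.conf, [mon] section, on older releases:
#   mon osd allow primary affinity = true

ceph osd primary-affinity osd.12 0    # never choose osd.12 as primary
ceph osd primary-affinity osd.12 1    # restore normal behaviour once recovered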
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Logan Kuhn
I like the idea 

Being able to play around with different configuration options and use this
tool as a sanity checker, showing what will change as well as whether or not
the changes could push the cluster into HEALTH_WARN or HEALTH_ERR, would be
very useful.

For example, if I were to change the replication level of a pool, it could
show how much space would be left, as well as an estimate of how long it
would take to rebalance.

Benchmark capabilities would also be welcome: replication, crush changes,
OSD add/drop, node add/drop, IOPS, read/write performance.
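
On the raw-throughput side, the built-in rados bench already gives a rough
baseline, for example (pool name and duration are arbitrary):

rados bench -p rbd 30 write --no-cleanup
rados bench -p rbd 30 seq
rados -p rbd cleanup    # remove the benchmark objects afterwards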

Regards, 
Logan 

- On Mar 21, 2017, at 6:58 AM, Xavier Villaneau  
wrote: 

| Hello all,

| A few weeks ago Loïc Dachary presented his work on python-crush to the
| ceph-devel list, but I don't think it's been done here yet. In a few words,
| python-crush is a new Python 2 and 3 library / API for the CRUSH algorithm. It
| also provides a CLI executable with a few built-in tools related to CRUSH 
maps.
| If you want to try it, follow the instructions from its documentation page:
| http://crush.readthedocs.io/en/latest/

| Currently the crush CLI has two features:
| - analyze: Get a estimation of how (un)evenly the objects will be placed into
| your cluster
| - compare: Get a summary of how much data would be moved around if the map was
| changed
| Both these tools are very basic and have a few known caveats. But nothing that
| cannot be fixed, the project is still young and open to suggestions and
| contributions.

| This is where we'd like to hear from the users' community feedback, given
| everyone's experience in operating (or just messing around with) Ceph 
clusters.
| What kind of CRUSH / data placement tools would be interesting to have? Are
| there some very common architectural / technical questions related to CRUSH
| that such tools would help answering? Any specific cases where such a thing
| could have spared you some pain?

| Here a few ideas on top of my head, to help with starting the discussion:
| - Static analysis of the failure domains, with detection of potential SPOFs
| - Help to capacity planning, estimations of how much data could practically be
| stored in a cluster
| - Built-in basic scenarios for "compare" such as adding a node or removing an
| OSD.

| Please share your ideas, those will eventually help making a better tool!
| Regards,
| --
| Xavier Villaneau
| Software Engineer, working with Ceph during day and sometimes at night too.
| Storage R&D at Concurrent Computer Corporation, Atlanta USA

| ___
| ceph-users mailing list
| ceph-users@lists.ceph.com
| http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recompiling source code - to find exact RPM

2017-03-21 Thread nokia ceph
Hello,

I made some changes to the files below in the Ceph Kraken v11.2.0 source
code, as per this article:

https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken

..src/osd/PG.cc
..src/osd/PG.h

Is there any way to find out which RPM is affected by these two files? I
believe it should be ceph-osd-11.2.0-0.el7.x86_64.rpm. Can you please
confirm?

I could not work this out from the ceph.spec file.

Could anyone please point me to the right procedure for checking this?

The main intention is that, once we know the exact RPM affected by these
files, we can simply overwrite it with the old RPM.
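
A couple of stock rpm queries can confirm which installed package ships the
OSD binary built from those sources (this assumes the package layout of the
upstream el7 builds):

rpm -qf /usr/bin/ceph-osd       # which package owns the ceph-osd binary
rpm -ql ceph-osd | grep bin     # binaries shipped by the ceph-osd package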

Awaiting your comments.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Day Warsaw (25 Apr)

2017-03-21 Thread Patrick McGarry
Hey cephers,

We have now finalized the details for Ceph Day Warsaw (see
http://ceph.com/cephdays) and, as a result, we need speakers!

If you would be interested in sharing some of your experiences or work
around Ceph please let me know as soon as possible. Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Alexey Sheplyakov
> i read from an OpenStack/Ceph tuning document and when you set this parameter 
> to true you could get better performance for block level storage

Only if you copy sparse rbd images a lot; in practice, however, sparse
rbd images are rare.
On the other hand, fiemap is somewhat counter-intuitive and tricky to
use correctly, hence quite a number of fiemap-related bugs have crept
into the code (not only in Ceph).
That's why Ceph no longer uses fiemap by default.

> Is it completely safe to directly change it from true to false and restart 
> Ceph deamons in order.

Disabling fiemap and restarting the ceph-osd daemons should be pretty safe
(the usual precautions still apply: it's wise to set the noout flag before
restarting OSDs, and to restart them one by one, giving each OSD enough
time to perform peering and recovery).
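
A rough sketch of that sequence (the OSD id and the upstart restart syntax
for Ubuntu 14.04 are examples; adjust for your own init system):

ceph osd set noout
# in ceph.conf, [osd] section:  filestore fiemap = false
restart ceph-osd id=2      # upstart on Trusty
ceph -s                    # wait until peering/recovery settles before the next OSD
ceph osd unset noout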

Best regards,
  Alexey


On Tue, Mar 21, 2017 at 5:00 PM, Özhan Rüzgar Karaman
 wrote:
> Hi Alexey;
> 1 year ago, i read from an OpenStack/Ceph tuning document and when you set
> this parameter to true you could get better performance for block level
> storage.
>
> Is it completely safe to directly change it from true to false and restart
> Ceph deamons in order.
>
> Thanks for all your support.
>
> Özhan
>
>
> On Tue, Mar 21, 2017 at 3:27 PM, Alexey Sheplyakov
>  wrote:
>>
>> Hi,
>>
>> This looks like a bug [1]. You can work it around by disabling the
>> fiemap feature, like this:
>>
>> [osd]
>> filestore fiemap = false
>>
>> Fiemap should have been disabled by default, perhaps you've explicitly
>> enabled it?
>>
>> [1] http://tracker.ceph.com/issues/19323
>>
>> Best regards,
>>   Alexey
>>
>> On Tue, Mar 21, 2017 at 12:21 PM, Özhan Rüzgar Karaman
>>  wrote:
>> > Hi Wido;
>> > After 30 minutes osd id 3 crashed also with segmentation fault, i
>> > uploaded
>> > logs again to the same location as ceph.log.wido.20170321-3.tgz. So now
>> > all
>> > OSD deamons on that server is crashed.
>> >
>> > Thanks
>> > Özhan
>> >
>> > On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman
>> >  wrote:
>> >>
>> >> Hi Wido;
>> >> At weekend i roll back all servers to 0.94.9-1 version and all worked
>> >> fine
>> >> with old release.
>> >>
>> >> Today i upgraded all monitor servers and 1 osd server to 0.94.10-1
>> >> version. All OSD servers has 2 osds. I update the ceph.conf on the osd
>> >> server removed debug lines and restart osd daemons.
>> >>
>> >> This time osd id 3 started and operated successfully but osd id 2
>> >> failed
>> >> again with same segmentation fault.
>> >>
>> >> I have uploaded new logs as to the same destination as
>> >> ceph.log.wido.20170321-2.tgz and its link is below again.
>> >>
>> >>
>> >>
>> >> https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
>> >>
>> >> Thanks for all your help.
>> >>
>> >> Özhan
>> >>
>> >>
>> >> On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander 
>> >> wrote:
>> >>>
>> >>>
>> >>> > Op 17 maart 2017 om 8:39 schreef Özhan Rüzgar Karaman
>> >>> > :
>> >>> >
>> >>> >
>> >>> > Hi;
>> >>> > Yesterday i started to upgrade my Ceph environment from 0.94.9 to
>> >>> > 0.94.10.
>> >>> > All monitor servers upgraded successfully but i experience problems
>> >>> > on
>> >>> > starting upgraded OSD daemons.
>> >>> >
>> >>> > When i try to start an Ceph OSD Daemon(/usr/bin/ceph-osd) receives
>> >>> > Segmentation Fault and it kills after 2-3 minutes. To clarify the
>> >>> > issue
>> >>> > i
>> >>> > have role backed Ceph packages on that OSD Server  back to 0.94.9
>> >>> > and
>> >>> > problematic servers could rejoin to the 0.94.10 cluster.
>> >>> >
>> >>> > My environment is standard 14.04.5 Ubuntu Trusty server with 4.4.x
>> >>> > kernel
>> >>> > and i am using standard packages from
>> >>> > http://eu.ceph.com/debian-hammer
>> >>> > nothing special on my environment.
>> >>> >
>> >>> > I have uploaded the Ceph OSD Logs to the

[ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Vince

Hi,

I am getting the below error in the messages log after setting up the ceph monitor.

===
Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception getting 
command descriptions: [Errno 2] No such file or directory
Mar 21 08:48:23 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon 
admin socket not ready yet.
Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception getting 
command descriptions: [Errno 2] No such file or directory
Mar 21 08:48:23 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon 
admin socket not ready yet.

===

On checking the ceph-create-keys service status, getting the below error.

===
[root@mon1 ~]# systemctl status ceph-create-keys@mon1.service 

● ceph-create-keys@mon1.service  - 
Ceph cluster key creator task
Loaded: loaded (/usr/lib/systemd/system/ceph-create-keys@.service; 
static; vendor preset: disabled)
Active: inactive (dead) since Thu 2017-02-16 10:47:14 PST; 1 months 2 
days ago

Condition: start condition failed at Tue 2017-03-21 05:47:42 PDT; 2s ago
ConditionPathExists=!/var/lib/ceph/bootstrap-mds/ceph.keyring was not met
Main PID: 2576 (code=exited, status=0/SUCCESS)
===

Has anyone faced this error before?
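
Once the monitor daemon is actually up, its admin socket can be queried
directly, e.g. (the mon id "mon1" matches the hostname used above):

ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok mon_status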
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Özhan Rüzgar Karaman
Hi Alexey;
A year ago I read in an OpenStack/Ceph tuning document that setting this
parameter to true could give better performance for block-level storage.

Is it completely safe to simply change it from true to false and restart the
Ceph daemons in order?

Thanks for all your support.

Özhan


On Tue, Mar 21, 2017 at 3:27 PM, Alexey Sheplyakov  wrote:

> Hi,
>
> This looks like a bug [1]. You can work it around by disabling the
> fiemap feature, like this:
>
> [osd]
> filestore fiemap = false
>
> Fiemap should have been disabled by default, perhaps you've explicitly
> enabled it?
>
> [1] http://tracker.ceph.com/issues/19323
>
> Best regards,
>   Alexey
>
> On Tue, Mar 21, 2017 at 12:21 PM, Özhan Rüzgar Karaman
>  wrote:
> > Hi Wido;
> > After 30 minutes osd id 3 crashed also with segmentation fault, i
> uploaded
> > logs again to the same location as ceph.log.wido.20170321-3.tgz. So now
> all
> > OSD deamons on that server is crashed.
> >
> > Thanks
> > Özhan
> >
> > On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman
> >  wrote:
> >>
> >> Hi Wido;
> >> At weekend i roll back all servers to 0.94.9-1 version and all worked
> fine
> >> with old release.
> >>
> >> Today i upgraded all monitor servers and 1 osd server to 0.94.10-1
> >> version. All OSD servers has 2 osds. I update the ceph.conf on the osd
> >> server removed debug lines and restart osd daemons.
> >>
> >> This time osd id 3 started and operated successfully but osd id 2 failed
> >> again with same segmentation fault.
> >>
> >> I have uploaded new logs as to the same destination as
> >> ceph.log.wido.20170321-2.tgz and its link is below again.
> >>
> >>
> >> https://drive.google.com/drive/folders/0B_
> hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
> >>
> >> Thanks for all your help.
> >>
> >> Özhan
> >>
> >>
> >> On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander 
> wrote:
> >>>
> >>>
> >>> > Op 17 maart 2017 om 8:39 schreef Özhan Rüzgar Karaman
> >>> > :
> >>> >
> >>> >
> >>> > Hi;
> >>> > Yesterday i started to upgrade my Ceph environment from 0.94.9 to
> >>> > 0.94.10.
> >>> > All monitor servers upgraded successfully but i experience problems
> on
> >>> > starting upgraded OSD daemons.
> >>> >
> >>> > When i try to start an Ceph OSD Daemon(/usr/bin/ceph-osd) receives
> >>> > Segmentation Fault and it kills after 2-3 minutes. To clarify the
> issue
> >>> > i
> >>> > have role backed Ceph packages on that OSD Server  back to 0.94.9 and
> >>> > problematic servers could rejoin to the 0.94.10 cluster.
> >>> >
> >>> > My environment is standard 14.04.5 Ubuntu Trusty server with 4.4.x
> >>> > kernel
> >>> > and i am using standard packages from http://eu.ceph.com/debian-
> hammer
> >>> > nothing special on my environment.
> >>> >
> >>> > I have uploaded the Ceph OSD Logs to the link below.
> >>> >
> >>> >
> >>> > https://drive.google.com/drive/folders/0B_
> hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
> >>> >
> >>> > And my ceph.conf is below
> >>> >
> >>> > [global]
> >>> > fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
> >>> > mon_initial_members = mont1, mont2, mont3
> >>> > mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
> >>> > auth_cluster_required = cephx
> >>> > auth_service_required = cephx
> >>> > auth_client_required = cephx
> >>> > filestore_xattr_use_omap = true
> >>> > public_network = 172.16.51.0/24
> >>> > cluster_network = 172.16.51.0/24
> >>> > debug_ms = 0/0
> >>> > debug_auth = 0/0
> >>> >
> >>> > [mon]
> >>> > mon_allow_pool_delete = false
> >>> > mon_osd_down_out_interval = 300
> >>> > osd_pool_default_flag_nodelete = true
> >>> >
> >>> > [osd]
> >>> > filestore_max_sync_interval = 15
> >>> > filestore_fiemap = true
> >>> > osd_max_backfills = 1
> >>> > osd_backfill_scan_min = 16
> >>> > osd_backfill_scan_max = 128
> >>> > osd_max_scrubs = 1
> >>> > osd_scrub_sleep = 1
> >>> > osd_scrub_chunk_min = 2
> >>> > osd_scrub_chunk_max = 16
> >>> > debug_osd = 0/0
> >>> > debug_filestore = 0/0
> >>> > debug_rbd = 0/0
> >>> > debug_rados = 0/0
> >>> > debug_journal = 0/0
> >>> > debug_journaler = 0/0
> >>>
> >>> Can you try without all the debug_* lines and see what the log then
> >>> yields?
> >>>
> >>> It's crashing on something which isn't logged now.
> >>>
> >>> Wido
> >>>
> >>> >
> >>> > Thanks for all help.
> >>> >
> >>> > Özhan
> >>> > ___
> >>> > ceph-users mailing list
> >>> > ceph-users@lists.ceph.com
> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Alexey Sheplyakov
Hi,

This looks like a bug [1]. You can work around it by disabling the
fiemap feature, like this:

[osd]
filestore fiemap = false

Fiemap should have been disabled by default, perhaps you've explicitly
enabled it?

[1] http://tracker.ceph.com/issues/19323
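
To double-check what a running OSD actually has in effect, the value can be
read over the admin socket, e.g. (osd.2 is just the id from this thread):

ceph daemon osd.2 config get filestore_fiemap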

Best regards,
  Alexey

On Tue, Mar 21, 2017 at 12:21 PM, Özhan Rüzgar Karaman
 wrote:
> Hi Wido;
> After 30 minutes osd id 3 crashed also with segmentation fault, i uploaded
> logs again to the same location as ceph.log.wido.20170321-3.tgz. So now all
> OSD deamons on that server is crashed.
>
> Thanks
> Özhan
>
> On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman
>  wrote:
>>
>> Hi Wido;
>> At weekend i roll back all servers to 0.94.9-1 version and all worked fine
>> with old release.
>>
>> Today i upgraded all monitor servers and 1 osd server to 0.94.10-1
>> version. All OSD servers has 2 osds. I update the ceph.conf on the osd
>> server removed debug lines and restart osd daemons.
>>
>> This time osd id 3 started and operated successfully but osd id 2 failed
>> again with same segmentation fault.
>>
>> I have uploaded new logs as to the same destination as
>> ceph.log.wido.20170321-2.tgz and its link is below again.
>>
>>
>> https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
>>
>> Thanks for all your help.
>>
>> Özhan
>>
>>
>> On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander  wrote:
>>>
>>>
>>> > Op 17 maart 2017 om 8:39 schreef Özhan Rüzgar Karaman
>>> > :
>>> >
>>> >
>>> > Hi;
>>> > Yesterday i started to upgrade my Ceph environment from 0.94.9 to
>>> > 0.94.10.
>>> > All monitor servers upgraded successfully but i experience problems on
>>> > starting upgraded OSD daemons.
>>> >
>>> > When i try to start an Ceph OSD Daemon(/usr/bin/ceph-osd) receives
>>> > Segmentation Fault and it kills after 2-3 minutes. To clarify the issue
>>> > i
>>> > have role backed Ceph packages on that OSD Server  back to 0.94.9 and
>>> > problematic servers could rejoin to the 0.94.10 cluster.
>>> >
>>> > My environment is standard 14.04.5 Ubuntu Trusty server with 4.4.x
>>> > kernel
>>> > and i am using standard packages from http://eu.ceph.com/debian-hammer
>>> > nothing special on my environment.
>>> >
>>> > I have uploaded the Ceph OSD Logs to the link below.
>>> >
>>> >
>>> > https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing
>>> >
>>> > And my ceph.conf is below
>>> >
>>> > [global]
>>> > fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
>>> > mon_initial_members = mont1, mont2, mont3
>>> > mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
>>> > auth_cluster_required = cephx
>>> > auth_service_required = cephx
>>> > auth_client_required = cephx
>>> > filestore_xattr_use_omap = true
>>> > public_network = 172.16.51.0/24
>>> > cluster_network = 172.16.51.0/24
>>> > debug_ms = 0/0
>>> > debug_auth = 0/0
>>> >
>>> > [mon]
>>> > mon_allow_pool_delete = false
>>> > mon_osd_down_out_interval = 300
>>> > osd_pool_default_flag_nodelete = true
>>> >
>>> > [osd]
>>> > filestore_max_sync_interval = 15
>>> > filestore_fiemap = true
>>> > osd_max_backfills = 1
>>> > osd_backfill_scan_min = 16
>>> > osd_backfill_scan_max = 128
>>> > osd_max_scrubs = 1
>>> > osd_scrub_sleep = 1
>>> > osd_scrub_chunk_min = 2
>>> > osd_scrub_chunk_max = 16
>>> > debug_osd = 0/0
>>> > debug_filestore = 0/0
>>> > debug_rbd = 0/0
>>> > debug_rados = 0/0
>>> > debug_journal = 0/0
>>> > debug_journaler = 0/0
>>>
>>> Can you try without all the debug_* lines and see what the log then
>>> yields?
>>>
>>> It's crashing on something which isn't logged now.
>>>
>>> Wido
>>>
>>> >
>>> > Thanks for all help.
>>> >
>>> > Özhan
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Xavier Villaneau
Hello all,

A few weeks ago Loïc Dachary presented his work on python-crush to the
ceph-devel list, but I don't think it's been done here yet. In a few words,
python-crush is a new Python 2 and 3 library / API for the CRUSH algorithm.
It also provides a CLI executable with a few built-in tools related to
CRUSH maps. If you want to try it, follow the instructions from its
documentation page:
http://crush.readthedocs.io/en/latest/

Currently the crush CLI has two features:
 - analyze: Get an estimation of how (un)evenly the objects will be placed
into your cluster
 - compare: Get a summary of how much data would be moved around if the map
was changed
Both these tools are very basic and have a few known caveats. But nothing
that cannot be fixed, the project is still young and open to suggestions
and contributions.
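
As a point of comparison, the in-tree crushtool can already run offline
placement tests against a compiled map (cm.bin below would come from
"ceph osd getcrushmap -o cm.bin"; rule number and replica count are
illustrative):

crushtool -i cm.bin --test --rule 0 --num-rep 3 --show-utilization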

This is where we'd like to hear the user community's feedback, given
everyone's experience in operating (or just messing around with) Ceph
clusters. What kind of CRUSH / data placement tools would be interesting to
have? Are there some very common architectural / technical questions
related to CRUSH that such tools would help answering? Any specific cases
where such a thing could have spared you some pain?

Here are a few ideas off the top of my head, to help start the discussion:
 - Static analysis of the failure domains, with detection of potential SPOFs
 - Help with capacity planning, estimating how much data could practically
be stored in a cluster
 - Built-in basic scenarios for "compare" such as adding a node or removing
an OSD.

Please share your ideas, those will eventually help making a better tool!
Regards,
-- 
Xavier Villaneau
Software Engineer, working with Ceph during day and sometimes at night too.
Storage R&D at Concurrent Computer Corporation, Atlanta USA
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] krbd exclusive-lock

2017-03-21 Thread Mikaël Cluseau
Hi,

There's something I don't understand about the exclusive-lock feature.

I created an image:

$ ssh host-3
Container Linux by CoreOS stable (1298.6.0)
Update Strategy: No Reboots
host-3 ~ # uname -a
Linux host-3 4.9.9-coreos-r1 #1 SMP Tue Mar 14 21:09:42 UTC 2017 x86_64 
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux
host-3 ~ # rbd create test-xlock --size=1024 --image-feature exclusive-lock
host-3 ~ # rbd feature enable test-xlock exclusive-lock
rbd: failed to update image features: (22) Invalid argument
2017-03-21 10:16:50.911598 7f6975ff0100 -1 librbd: one or more requested
features are already enabled

I mapped it

host-3 ~ # rbd map --options lock_on_read test-xlock
/dev/rbd0

I could also map it from another host (this is where the disappointment started):

host-2 ~ # rbd map --options lock_on_read test-xlock
/dev/rbd9

I can read from both host:

host-2 ~ # dd if=/dev/rbd9 of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.61844 s, 232 MB/s

host-3 ~ # dd if=/dev/rbd0 of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.33527 s, 322 MB/s

And also write:

host-3 ~ # dd if=/dev/urandom of=/dev/rbd0 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0153388 s, 68.4 MB/s

host-2 ~ # dd if=/dev/urandom of=/dev/rbd9 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0312396 s, 33.6 MB/s

Isn't exclusive-lock supposed to forbid at least concurrent writes?
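
To see which features are actually enabled on the image and whether any
locks are held, these may help (note that "rbd lock list" shows advisory
locks, which are a separate mechanism from the exclusive-lock feature):

rbd info test-xlock        # lists the enabled image features
rbd lock list test-xlock   # advisory locks, if any were taken explicitly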

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs stale during patching

2017-03-21 Thread Laszlo Budai

Hello,

we have been patching our ceph cluster from 0.94.7 to 0.94.10. We updated one
node at a time, and after each OSD node had been rebooted we waited for the
cluster health status to return to OK.
The docs say: "stale - The placement group status has not been updated by a
ceph-osd, indicating that all nodes storing this placement group may be down."
That wasn't the case here, because only one node was down at a time.

Is it normal to see stale PGs during this procedure?
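
For the record, the affected PGs can be listed while this is happening with,
for example:

ceph health detail
ceph pg dump_stuck stale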

Kind regards,
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Özhan Rüzgar Karaman
Hi Wido;
After 30 minutes osd.3 also crashed with a segmentation fault; I uploaded
the logs again to the same location as ceph.log.wido.20170321-3.tgz. So now
all OSD daemons on that server have crashed.

Thanks
Özhan

On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman <
oruzgarkara...@gmail.com> wrote:

> Hi Wido;
> At weekend i roll back all servers to 0.94.9-1 version and all worked fine
> with old release.
>
> Today i upgraded all monitor servers and 1 osd server to 0.94.10-1
> version. All OSD servers has 2 osds. I update the ceph.conf on the osd
> server removed debug lines and restart osd daemons.
>
> This time osd id 3 started and operated successfully but osd id 2 failed
> again with same segmentation fault.
>
> I have uploaded new logs as to the same destination
> as ceph.log.wido.20170321-2.tgz and its link is below again.
>
> https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5Y
> Unh6UE0?usp=sharing
>
> Thanks for all your help.
>
> Özhan
>
>
> On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander  wrote:
>
>>
>> > Op 17 maart 2017 om 8:39 schreef Özhan Rüzgar Karaman <
>> oruzgarkara...@gmail.com>:
>> >
>> >
>> > Hi;
>> > Yesterday i started to upgrade my Ceph environment from 0.94.9 to
>> 0.94.10.
>> > All monitor servers upgraded successfully but i experience problems on
>> > starting upgraded OSD daemons.
>> >
>> > When i try to start an Ceph OSD Daemon(/usr/bin/ceph-osd) receives
>> > Segmentation Fault and it kills after 2-3 minutes. To clarify the issue
>> i
>> > have role backed Ceph packages on that OSD Server  back to 0.94.9 and
>> > problematic servers could rejoin to the 0.94.10 cluster.
>> >
>> > My environment is standard 14.04.5 Ubuntu Trusty server with 4.4.x
>> kernel
>> > and i am using standard packages from http://eu.ceph.com/debian-hammer
>> > nothing special on my environment.
>> >
>> > I have uploaded the Ceph OSD Logs to the link below.
>> >
>> > https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5Y
>> Unh6UE0?usp=sharing
>> >
>> > And my ceph.conf is below
>> >
>> > [global]
>> > fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
>> > mon_initial_members = mont1, mont2, mont3
>> > mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
>> > auth_cluster_required = cephx
>> > auth_service_required = cephx
>> > auth_client_required = cephx
>> > filestore_xattr_use_omap = true
>> > public_network = 172.16.51.0/24
>> > cluster_network = 172.16.51.0/24
>> > debug_ms = 0/0
>> > debug_auth = 0/0
>> >
>> > [mon]
>> > mon_allow_pool_delete = false
>> > mon_osd_down_out_interval = 300
>> > osd_pool_default_flag_nodelete = true
>> >
>> > [osd]
>> > filestore_max_sync_interval = 15
>> > filestore_fiemap = true
>> > osd_max_backfills = 1
>> > osd_backfill_scan_min = 16
>> > osd_backfill_scan_max = 128
>> > osd_max_scrubs = 1
>> > osd_scrub_sleep = 1
>> > osd_scrub_chunk_min = 2
>> > osd_scrub_chunk_max = 16
>> > debug_osd = 0/0
>> > debug_filestore = 0/0
>> > debug_rbd = 0/0
>> > debug_rados = 0/0
>> > debug_journal = 0/0
>> > debug_journaler = 0/0
>>
>> Can you try without all the debug_* lines and see what the log then
>> yields?
>>
>> It's crashing on something which isn't logged now.
>>
>> Wido
>>
>> >
>> > Thanks for all help.
>> >
>> > Özhan
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Özhan Rüzgar Karaman
Hi Wido;
Over the weekend I rolled back all servers to 0.94.9-1 and everything worked
fine with the old release.

Today I upgraded all monitor servers and one OSD server to 0.94.10-1. Each
OSD server has 2 OSDs. I updated the ceph.conf on the OSD server, removed
the debug lines, and restarted the OSD daemons.

This time osd.3 started and operated successfully, but osd.2 failed again
with the same segmentation fault.

I have uploaded new logs to the same destination, as
ceph.log.wido.20170321-2.tgz; the link is below again.

https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing

Thanks for all your help.

Özhan


On Sun, Mar 19, 2017 at 8:47 PM, Wido den Hollander  wrote:

>
> > Op 17 maart 2017 om 8:39 schreef Özhan Rüzgar Karaman <
> oruzgarkara...@gmail.com>:
> >
> >
> > Hi;
> > Yesterday i started to upgrade my Ceph environment from 0.94.9 to
> 0.94.10.
> > All monitor servers upgraded successfully but i experience problems on
> > starting upgraded OSD daemons.
> >
> > When i try to start an Ceph OSD Daemon(/usr/bin/ceph-osd) receives
> > Segmentation Fault and it kills after 2-3 minutes. To clarify the issue i
> > have role backed Ceph packages on that OSD Server  back to 0.94.9 and
> > problematic servers could rejoin to the 0.94.10 cluster.
> >
> > My environment is standard 14.04.5 Ubuntu Trusty server with 4.4.x kernel
> > and i am using standard packages from http://eu.ceph.com/debian-hammer
> > nothing special on my environment.
> >
> > I have uploaded the Ceph OSD Logs to the link below.
> >
> > https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=
> sharing
> >
> > And my ceph.conf is below
> >
> > [global]
> > fsid = a3742d34-9b51-4a36-bf56-4defb62b2b8e
> > mon_initial_members = mont1, mont2, mont3
> > mon_host = 172.16.51.101,172.16.51.102,172.16.51.103
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > public_network = 172.16.51.0/24
> > cluster_network = 172.16.51.0/24
> > debug_ms = 0/0
> > debug_auth = 0/0
> >
> > [mon]
> > mon_allow_pool_delete = false
> > mon_osd_down_out_interval = 300
> > osd_pool_default_flag_nodelete = true
> >
> > [osd]
> > filestore_max_sync_interval = 15
> > filestore_fiemap = true
> > osd_max_backfills = 1
> > osd_backfill_scan_min = 16
> > osd_backfill_scan_max = 128
> > osd_max_scrubs = 1
> > osd_scrub_sleep = 1
> > osd_scrub_chunk_min = 2
> > osd_scrub_chunk_max = 16
> > debug_osd = 0/0
> > debug_filestore = 0/0
> > debug_rbd = 0/0
> > debug_rados = 0/0
> > debug_journal = 0/0
> > debug_journaler = 0/0
>
> Can you try without all the debug_* lines and see what the log then yields?
>
> It's crashing on something which isn't logged now.
>
> Wido
>
> >
> > Thanks for all help.
> >
> > Özhan
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com