[ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
Hi all,

I am trying to remove several rbd images from the cluster.
Unfortunately, that doesn't work:

$ rbd info foo
rbd image 'foo':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.919443.238e1f29
format: 1


$ rbd rm foo
2015-07-29 10:25:01.438296 7f868d330760 -1 librbd: image has watchers -
not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try
again after closing/unmapping it or waiting 30s for the crashed client
to timeout.

$ rados -p rbd listwatchers foo
error listing watchers rbd/foo: (2) No such file or directory

Well, that is quite frustrating. The image was mapped on one host, where
I was unmapping it. What do I have to do to get rid of it?

We are using ceph version 0.87.2

Regards,
Christian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] small cluster reboot fail

2015-07-29 Thread pixelfairy
I have a small test cluster (VMware Fusion, 3 mon+osd nodes), all running Ubuntu
Trusty. I tried rebooting all 3 nodes and this happened.

root@ubuntu:~# ceph --version
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

root@ubuntu:~# ceph health
2015-07-29 02:08:31.360516 7f5bd711a700 -1 asok(0x7f5bdbf0) AdminSocketConfigObs::init: failed:
AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to
'/var/run/ceph/rbd-clients/ceph-client.admin.3282.140032308415712.asok':
(2) No such file or directory
HEALTH_WARN 64 pgs stuck unclean; recovery 512/1024 objects misplaced (50.000%); too few PGs per OSD (21 < min 30)

the osd disks are only 50gigs but they seemed to work fine before the
reboot.


Re: [ceph-users] Remove RBD Image

2015-07-29 Thread Ilya Dryomov
On Wed, Jul 29, 2015 at 11:30 AM, Christian Eichelmann
 wrote:
> Hi all,
>
> I am trying to remove several rbd images from the cluster.
> Unfortunately, that doesn't work:
>
> $ rbd info foo
> rbd image 'foo':
> size 1024 GB in 262144 objects
> order 22 (4096 kB objects)
> block_name_prefix: rb.0.919443.238e1f29
> format: 1
>
>
> $ rbd rm foo
> 2015-07-29 10:25:01.438296 7f868d330760 -1 librbd: image has watchers -
> not removing
> Removing image: 0% complete...failed.
> rbd: error: image still has watchers
> This means the image is still open or the client using it crashed. Try
> again after closing/unmapping it or waiting 30s for the crashed client
> to timeout.
>
> $ rados -p rbd listwatchers foo
> error listing watchers rbd/foo: (2) No such file or directory

For a format 1 image, you need to do

$ rados -p rbd listwatchers foo.rbd

The "rbd status" command was recently introduced to abstract this, but it's
not in 0.87.
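
For reference, a hedged sketch of both lookups (for a format 2 image the header
object name is built from the image id, i.e. the suffix of the block_name_prefix
reported by "rbd info"; <id> below is a placeholder):

$ rados -p rbd listwatchers foo.rbd          # format 1 header object
$ rados -p rbd listwatchers rbd_header.<id>  # format 2 header object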

>
> Well, that is quite frustrating. The image was mapped on one host, where
> I was unmapping it. What do I have to do to get rid of it?

Did you unmap the image?  What is the output of "rbd showmapped" on the
host you had it mapped?  Is there anything rbd or ceph related in dmesg on
that host?
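
A minimal check sequence on that host might look like this (a sketch; the
device path is whatever "rbd showmapped" reports):

$ rbd showmapped
$ sudo rbd unmap /dev/rbd0        # device taken from the showmapped output
$ dmesg | grep -iE 'rbd|libceph'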

Thanks,

Ilya


Re: [ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
Hi Ilya,

that worked for me and actually pointed out that one of my colleagues
currently had the rbd pool locally mounted via fuse-rbd, which obviously
locks all images in this pool. Problem solved! Thanks!

Regards,
Christian

Am 29.07.2015 um 11:48 schrieb Ilya Dryomov:
> On Wed, Jul 29, 2015 at 11:30 AM, Christian Eichelmann
>  wrote:
>> Hi all,
>>
>> I am trying to remove several rbd images from the cluster.
>> Unfortunately, that doesn't work:
>>
>> $ rbd info foo
>> rbd image 'foo':
>> size 1024 GB in 262144 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rb.0.919443.238e1f29
>> format: 1
>>
>>
>> $ rbd rm foo
>> 2015-07-29 10:25:01.438296 7f868d330760 -1 librbd: image has watchers -
>> not removing
>> Removing image: 0% complete...failed.
>> rbd: error: image still has watchers
>> This means the image is still open or the client using it crashed. Try
>> again after closing/unmapping it or waiting 30s for the crashed client
>> to timeout.
>>
>> $ rados -p rbd listwatchers foo
>> error listing watchers rbd/foo: (2) No such file or directory
> 
> For a format 1 image, you need to do
> 
> $ rados -p rbd listwatchers foo.rbd
> 
> "rbd status" command was recently introduced to abstract this, but it's
> not in 0.87.
> 
>>
>> Well, that is quite frustrating. The image was mapped on one host, where
>> I was unmapping it. What do I have to do to get rid of it?
> 
> Did you unmap the image?  What is the output of "rbd showmapped" on the
> host you had it mapped?  Is there anything rbd or ceph related in dmesg on
> that host?
> 
> Thanks,
> 
> Ilya
> 


-- 
Christian Eichelmann
Systemadministrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren


Re: [ceph-users] OSD RAM usage values

2015-07-29 Thread Kenneth Waegeman



On 07/28/2015 04:04 PM, Dan van der Ster wrote:

On Tue, Jul 28, 2015 at 12:07 PM, Gregory Farnum  wrote:

On Tue, Jul 28, 2015 at 11:00 AM, Kenneth Waegeman
 wrote:



On 07/17/2015 02:50 PM, Gregory Farnum wrote:


On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
 wrote:


Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)
Now, our OSD's are all using around 2GB of RAM memory while the cluster
is
healthy.


PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55
ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55
ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08
ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53
ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29
ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72
ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48
ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01
ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87
ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53
ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15
ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01
ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47
ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
  cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
   health HEALTH_OK
   monmap e1: 3 mons at

{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
  election epoch 58, quorum 0,1,2 mds01,mds02,mds03
   mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
   osdmap e25542: 258 osds: 258 up, 258 in
pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
  270 TB used, 549 TB / 819 TB avail
  4152 active+clean
 8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a reason.
But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of
RAM.
Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
(2*14 OSDS) has a pg_num of 1024.

Are these normal values for this configuration, and is the documentation
a
bit outdated, or should we look into something else?



2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.



Which data is actually in memory of the OSDS?
Is this mostly cached data?
We are short on memory on these servers, can we have influence on this?


Mmm, we've discussed this a few times on the mailing list. The CERN
guys published a document on experimenting with a very large cluster
and not enough RAM, but there's nothing I would really recommend
changing for a production system, especially an EC one, if you aren't
intimately familiar with what's going on.


In that CERN test the obvious large memory consumer was the osdmap
cache, which was so large because (a) the maps were getting quite
large (7200 OSDs creates a 4MB map, IIRC) and (b) so much osdmap churn
was leading each OSD to cache 500 of the maps. Once the cluster was
fully deployed and healthy, we could restart an OSD and it would then
only use ~300MB (because now the osdmap cache was ~empty).

Kenneth: does the memory usage shrink if you restart an osd? If so, it
could be a similar issue.


Thanks!
I tried restarting some OSDs when the cluster was healthy. Sometimes the 
OSDs grow right back to the memory level they had before. On a second try 
they take about 1GB of memory, so about half. We do not see them go below 
that level, but that may be because of EC.
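
A hedged way to compare an OSD's heap before and after such a restart (tcmalloc
statistics via the admin socket; osd.12 is just an example id):

$ ceph tell osd.12 heap stats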


Kenneth


Cheers, Dan




Re: [ceph-users] OSD RAM usage values

2015-07-29 Thread Kenneth Waegeman



On 07/28/2015 04:21 PM, Mark Nelson wrote:



On 07/17/2015 07:50 AM, Gregory Farnum wrote:

On Fri, Jul 17, 2015 at 1:13 PM, Kenneth Waegeman
 wrote:

Hi all,

I've read in the documentation that OSDs use around 512MB on a healthy
cluster.(http://ceph.com/docs/master/start/hardware-recommendations/#ram)

Now, our OSD's are all using around 2GB of RAM memory while the
cluster is
healthy.


   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND
29784 root  20   0 6081276 2.535g   4740 S   0.7  8.1   1346:55
ceph-osd
32818 root  20   0 5417212 2.164g  24780 S  16.2  6.9   1238:55
ceph-osd
25053 root  20   0 5386604 2.159g  27864 S   0.7  6.9   1192:08
ceph-osd
33875 root  20   0 5345288 2.092g   3544 S   0.7  6.7   1188:53
ceph-osd
30779 root  20   0 5474832 2.090g  28892 S   1.0  6.7   1142:29
ceph-osd
22068 root  20   0 5191516 2.000g  28664 S   0.7  6.4  31:56.72
ceph-osd
34932 root  20   0 5242656 1.994g   4536 S   0.3  6.4   1144:48
ceph-osd
26883 root  20   0 5178164 1.938g   6164 S   0.3  6.2   1173:01
ceph-osd
31796 root  20   0 5193308 1.916g  27000 S  16.2  6.1 923:14.87
ceph-osd
25958 root  20   0 5193436 1.901g   2900 S   0.7  6.1   1039:53
ceph-osd
27826 root  20   0 5225764 1.845g   5576 S   1.0  5.9   1031:15
ceph-osd
36011 root  20   0 5111660 1.823g  20512 S  15.9  5.8   1093:01
ceph-osd
19736 root  20   0 2134680 0.994g  0 S   0.3  3.2  46:13.47
ceph-osd



[root@osd003 ~]# ceph status
2015-07-17 14:03:13.865063 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
2015-07-17 14:03:13.887087 7f1fde5f0700 -1 WARNING: the following
dangerous
and experimental features are enabled: keyvaluestore
 cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
  health HEALTH_OK
  monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}

 election epoch 58, quorum 0,1,2 mds01,mds02,mds03
  mdsmap e17218: 1/1/1 up {0=mds03=up:active}, 1 up:standby
  osdmap e25542: 258 osds: 258 up, 258 in
   pgmap v2460163: 4160 pgs, 4 pools, 228 TB data, 154 Mobjects
 270 TB used, 549 TB / 819 TB avail
 4152 active+clean
8 active+clean+scrubbing+deep


We are using erasure code on most of our OSDs, so maybe that is a
reason.
But also the cache-pool filestore OSDS on 200GB SSDs are using 2GB of
RAM.
Our erasure code pool (16*14 osds) have a pg_num of 2048; our cache pool
(2*14 OSDS) has a pg_num of 1024.

Are these normal values for this configuration, and is the
documentation a
bit outdated, or should we look into something else?


2GB of RSS is larger than I would have expected, but not unreasonable.
In particular I don't think we've gathered numbers on either EC pools
or on the effects of the caching processes.


FWIW, here's statistics for ~36 ceph-osds on the wip-promote-prob branch
after several hours of cache tiering tests (30 OSD base, 6 OSD cache
tier) using an EC6+2 pool.  At the time of this test, 4K random
read/writes were being performed.  The cache tier OSDs specifically use
quite a bit more memory than the base tier.  Interestingly in this test
major pagefaults are showing up for the cache tier OSDs which is
annoying. I may need to tweak kernel VM settings on this box.


Ah, we see the same here with our cache OSDs: those small OSDs are 
taking the most memory; on some servers they are taking 3G of RAM.

Even if I restart them, they take up the same amount again.



# PROCESS SUMMARY (counters are /sec)
#Time  PID  User PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT
Pct  AccuTime  RKB  WKB MajF MinF Command
09:58:48   715  root 20 1  424 S1G  271M  8  0.19  0.43
6  30:12.64000 2502 /usr/local/bin/ceph-osd
09:58:48  1363  root 20 1  424 S1G  325M  8  0.14  0.33
4  26:50.54000   68 /usr/local/bin/ceph-osd
09:58:48  2080  root 20 1  420 S1G  276M  1  0.21  0.49
7  23:49.36000 2848 /usr/local/bin/ceph-osd
09:58:48  2747  root 20 1  424 S1G  283M  8  0.25  0.68
9  25:16.63000 1391 /usr/local/bin/ceph-osd
09:58:48  3451  root 20 1  424 S1G  331M  6  0.13  0.14
2  27:36.71000  148 /usr/local/bin/ceph-osd
09:58:48  4172  root 20 1  424 S1G  301M  6  0.19  0.43
6  29:44.56000 2165 /usr/local/bin/ceph-osd
09:58:48  4935  root 20 1  420 S1G  310M  9  0.18  0.28
4  29:09.78000 2042 /usr/local/bin/ceph-osd
09:58:48  5750  root 20 1  420 S1G  267M  2  0.11  0.14
2  26:55.31000  866 /usr/local/bin/ceph-osd
09:58:48  6544  root 20 1  424 S1G  299M  7  0.22  0.62
8  26:46.35000 3468 /usr/local/bin/ceph-osd
09:58:48  7379  root 20 1  424 S1G  283M  8  0.16  0.47
6  25:47.86000  538 /usr/local/bin/ceph-osd
09:58:48  8183  root 20 1  424 S

[ceph-users] Unable to mount Format 2 striped RBD image

2015-07-29 Thread Daleep Bais
Hi,

I have created a format 2 striped image; however, I am not able to mount it
on the client machine.

# rbd -p foo info strpimg
rbd image 'strpimg':
size 2048 MB in 513 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.20c942ae8944a
format: 2
features: striping
flags:
stripe unit: 65536 bytes
stripe count: 3


I get the below error when trying to mount using echo --- > /sys/bus/rbd/add:

write error: invalid argument

If I use a format 1 image, I am able to mount the RBD on the client and use it.

Please suggest...

Thanks.


Re: [ceph-users] Unable to mount Format 2 striped RBD image

2015-07-29 Thread Ilya Dryomov
On Wed, Jul 29, 2015 at 1:45 PM, Daleep Bais  wrote:
> Hi,
>
> I have created a format 2 striped image, however, I am not able to mount it
> on client machine..
>
> # rbd -p foo info strpimg
> rbd image 'strpimg':
> size 2048 MB in 513 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.20c942ae8944a
> format: 2
> features: striping
> flags:
> stripe unit: 65536 bytes
> stripe count: 3
>
>
> Getting below error  when trying to mount using echo --- > /sys/bus/rbd/add
>
> write error : invalid arguement
>
> if i use a format 1 image, I am able to mount the RBD to client and use it.

Custom striping settings (i.e. non-default stripe_unit and
stripe_count) are not yet supported by the kernel client.

Unrelated: why are you using sysfs directly instead of the rbd CLI tool?
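
For comparison, mapping through the rbd CLI instead of raw sysfs might look like
this (a sketch; the kernel client still requires default striping):

$ sudo rbd map foo/strpimg
$ rbd showmapped
$ sudo rbd unmap /dev/rbd0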

Thanks,

Ilya


Re: [ceph-users] small cluster reboot fail

2015-07-29 Thread pixelfairy
Disregard. I did this on a cluster of test VMs and didn't bother setting
different hostnames, thus confusing Ceph.

On Wed, Jul 29, 2015 at 2:24 AM pixelfairy  wrote:

> have a small test cluster (vmware fusion, 3 mon+osd nodes) all run ubuntu
> trusty. tried rebooting all 3 nodes and this happend.
>
> root@ubuntu:~# ceph --version ceph version 0.94.2
> (5fb85614ca8f354284c713a2f9c610860720bbf3)
>
> root@ubuntu:~# ceph health 2015-07-29 02:08:31.360516 7f5bd711a700 -1
> asok(0x7f5bdbf0) AdminSocketConfigObs::init: failed:
> AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to
> '/var/run/ceph/rbd-clients/ceph-client.admin.3282.140032308415712.asok':
> (2) No such file or directory HEALTH_WARN 64 pgs stuck unclean; recovery
> 512/1024 objects misplaced (50.000%); too few PGs per OSD (21 < min 30)
>
> the osd disks are only 50gigs but they seemed to work fine before the
> reboot.
>


[ceph-users] rbd-fuse Transport endpoint is not connected

2015-07-29 Thread pixelfairy
Client is Debian Wheezy, server is Ubuntu Trusty; both running ceph 0.94.2.

rbd-fuse seems to work, but I can't access the mount, getting "Transport
endpoint is not connected" when I try to ls the mount point.

On the ceph server (a virtual machine, as it's a test cluster):

root@c3:/etc/ceph# ceph -s
cluster 35ef596a-1e16-4b0a-bad9-0c262db31a3e
 health HEALTH_OK
 monmap e1: 3 mons at
{c1=192.168.113.41:6789/0,c2=192.168.113.42:6789/0,c3=192.168.113.43:6789/0}
election epoch 24, quorum 0,1,2 c1,c2,c3
 osdmap e25: 3 osds: 3 up, 3 in
  pgmap v749: 64 pgs, 1 pools, 3961 MB data, 1023 objects
7987 MB used, 139 GB / 146 GB avail
  64 active+clean
root@c3:/etc/ceph# ceph osd tree
ID WEIGHT  TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.14996 root default
-2 0.04999 host c1
 0 0.04999 osd.0up  1.0  1.0
-3 0.04999 host c2
 1 0.04999 osd.1up  1.0  1.0
-4 0.04999 host c3
 2 0.04999 osd.2up  1.0  1.0
-5   0 host ubuntu
root@c3:/etc/ceph# ceph --version
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
root@c3:/etc/ceph#


Re: [ceph-users] rbd-fuse Transport endpoint is not connected

2015-07-29 Thread Ilya Dryomov
On Wed, Jul 29, 2015 at 2:52 PM, pixelfairy  wrote:
> client debian wheezy, server ubuntu trusty. both running ceph 0.94.2
>
> rbd-fuse seems to work, but cant access, saying "Transport endpoint is
> not connected" when i try to ls the mount point.
>
> on the ceph server, (a virtual machine, as its a test cluster)

rbd-fuse is pretty much a prototype and could use a lot of improvement.
In particular, it doesn't use the ceph CLI arguments infrastructure, and is
therefore very picky about ceph.conf.  Make sure you are specifying the full
path to your ceph.conf explicitly:

$ sudo rbd-fuse -c /full/path/to/ceph.conf /mnt

Thanks,

Ilya


[ceph-users] Migrate OSDs to different backend

2015-07-29 Thread Kenneth Waegeman

Hi all,

We are considering migrating all the OSDs of our EC pool from KeyValue 
to Filestore. Does anyone have experience with this? What would be a 
good procedure?


We have Erasure Code using k+m: 10+3, with host-level failure domain on 
14 servers. Our pool is 30% filled.


I was thinking:
We set the weight of 1/2 of the OSDs on each host to 0 and let the 
cluster migrate the data

We then remove these OSDs and re-add them
We then do the same with the other OSDs
(Or we do it in 3 steps with 1/3 of the OSDs)

Another option:
We repeatedly:
Remove all KV OSDs of 2 servers (m=3)
Re-add all those OSDs with filestore
Wait for data to rebalance

Does anyone know what would be the best way? Are there things we should 
not forget or be careful with?


Thank you very much!

Kenneth


Re: [ceph-users] Migrate OSDs to different backend

2015-07-29 Thread Haomai Wang
I think option 2 should be reliable.
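
A rough per-server cycle for that option might look like this (a hedged sketch,
not a tested procedure; <id> stands for each KeyValue OSD on the servers being
converted):

$ ceph osd out <id>
$ ceph osd crush remove osd.<id>
$ ceph auth del osd.<id>
$ ceph osd rm <id>
# re-create the OSD as filestore, e.g. via ceph-disk prepare/activate,
# then wait for HEALTH_OK before moving on to the next pair of servers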

On Wed, Jul 29, 2015 at 9:00 PM, Kenneth Waegeman
 wrote:
> Hi all,
>
> We are considering to migrate all our OSDs of our EC pool from KeyValue to
> Filestore. Does someone has experience with this? What would be a good
> procedure?
>
> We have Erasure Code using k+m: 10+3, with host-level failure domain on 14
> servers. Our pool is 30% filled.
>
> I was thinking:
> We set the weight of 1/2 of the OSDS on each host to 0 and let the cluster
> migrate the data
> We then remove these OSDS, and re-add them
> We do then the same with the other OSDS
> (Or we do it in 3 times with 1/3 of the OSDS)
>
> Another option:
> We repeatedly;
> Remove all KV OSDS of 2 servers (m=3)
> Re-add all those OSDS with filestore
> Wait for data to rebalance
>
> Does someone know what would be the best way? Are there things we should not
> forget or be careful with ?
>
> Thank you very much!
>
> Kenneth
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat


Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-29 Thread Jake Young
On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic 
wrote:
>
> Hi again,
>
> So I have tried
> - changing the cpus frequency : either 1.6GHZ, or 2.4GHZ on all cores
> - changing the memory configuration, from "advanced ecc mode" to
"performance mode", boosting the memory bandwidth from 35GB/s to 40GB/s
> - plugged a second 10GB/s link and setup a ceph internal network
> - tried various "tuned-adm profile" such as "throughput-performance"
>
> This changed about nothing.
>
> If
> - the CPUs are not maxed out, and lowering the frequency doesn't change a
thing
> - the network is not maxed out
> - the memory doesn't seem to have an impact
> - network interrupts are spread across all 8 cpu cores and receive queues
are OK
> - disks are not used at their maximum potential (iostat shows my dd
commands produce much more tps than the 4MB ceph transfers...)
>
> Where can I possibly find a bottleneck ?
>
> I'm /(almost) out of ideas/ ... :'(
>
> Regards
>
>
Frederic,

I was trying to optimize my ceph cluster as well and I looked at all of the
same things you described, which didn't help my performance noticeably.

The following network kernel tuning settings did help me significantly.

This is my /etc/sysctl.conf file on all of  my hosts: ceph mons, ceph osds
and any client that connects to my ceph cluster.

# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for
10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
#net.core.rmem_max = 56623104
#net.core.wmem_max = 56623104
# Use 128M buffers
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.optmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
# Also increase the max packet backlog
net.core.somaxconn = 1024
# Increase the length of the processor input queue
net.core.netdev_max_backlog = 25
net.ipv4.tcp_max_syn_backlog = 3
net.ipv4.tcp_max_tw_buckets = 200
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192

# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0

# Recommended when jumbo frames are enabled
net.ipv4.tcp_mtu_probing = 1
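
These settings can be loaded without a reboot using the stock sysctl tool:

$ sudo sysctl -p /etc/sysctl.conf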

I have 40 Gbps links on my osd nodes, and 10 Gbps links on everything else.

Let me know if that helps.

Jake


Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-29 Thread Mark Nelson

On 07/29/2015 10:13 AM, Jake Young wrote:

On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic
mailto:frederic.sch...@cea.fr>> wrote:
 >
 > Hi again,
 >
 > So I have tried
 > - changing the cpus frequency : either 1.6GHZ, or 2.4GHZ on all cores
 > - changing the memory configuration, from "advanced ecc mode" to
"performance mode", boosting the memory bandwidth from 35GB/s to 40GB/s
 > - plugged a second 10GB/s link and setup a ceph internal network
 > - tried various "tuned-adm profile" such as "throughput-performance"
 >
 > This changed about nothing.
 >
 > If
 > - the CPUs are not maxed out, and lowering the frequency doesn't
change a thing
 > - the network is not maxed out
 > - the memory doesn't seem to have an impact
 > - network interrupts are spread across all 8 cpu cores and receive
queues are OK
 > - disks are not used at their maximum potential (iostat shows my dd
commands produce much more tps than the 4MB ceph transfers...)
 >
 > Where can I possibly find a bottleneck ?
 >
 > I'm /(almost) out of ideas/ ... :'(
 >
 > Regards
 >
 >
Frederic,

I was trying to optimize my ceph cluster as well and I looked at all of
the same things you described, which didn't help my performance noticeably.

The following network kernel tuning settings did help me significantly.

This is my /etc/sysctl.conf file on all of  my hosts: ceph mons, ceph
osds and any client that connects to my ceph cluster.

 # Increase Linux autotuning TCP buffer limits
 # Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104)
for 10GE
 # Don't set tcp_mem itself! Let the kernel scale it based on RAM.
 #net.core.rmem_max = 56623104
 #net.core.wmem_max = 56623104
 # Use 128M buffers
 net.core.rmem_max = 134217728
 net.core.wmem_max = 134217728
 net.core.rmem_default = 67108864
 net.core.wmem_default = 67108864
 net.core.optmem_max = 134217728
 net.ipv4.tcp_rmem = 4096 87380 67108864
 net.ipv4.tcp_wmem = 4096 65536 67108864

 # Make room for more TIME_WAIT sockets due to more clients,
 # and allow them to be reused if we run out of sockets
 # Also increase the max packet backlog
 net.core.somaxconn = 1024
 # Increase the length of the processor input queue
 net.core.netdev_max_backlog = 25
 net.ipv4.tcp_max_syn_backlog = 3
 net.ipv4.tcp_max_tw_buckets = 200
 net.ipv4.tcp_tw_reuse = 1
 net.ipv4.tcp_tw_recycle = 1
 net.ipv4.tcp_fin_timeout = 10

 # Disable TCP slow start on idle connections
 net.ipv4.tcp_slow_start_after_idle = 0

 # If your servers talk UDP, also up these limits
 net.ipv4.udp_rmem_min = 8192
 net.ipv4.udp_wmem_min = 8192

 # Disable source routing and redirects
 net.ipv4.conf.all.send_redirects = 0
 net.ipv4.conf.all.accept_redirects = 0
 net.ipv4.conf.all.accept_source_route = 0

 # Recommended when jumbo frames are enabled
 net.ipv4.tcp_mtu_probing = 1

I have 40 Gbps links on my osd nodes, and 10 Gbps links on everything else.

Let me know if that helps.


Hi Jake,

Could you talk a little bit about what scenarios you've seen tuning this 
help?  I noticed improvement in RGW performance in some cases with 
similar TCP tunings, but it would be good to understand what other folks 
are seeing and in what situations.




Jake




[ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
I've got a situation that seems on the surface like it should be 
recoverable, but I'm struggling to understand how to do it.


I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After 
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds 
and am attempting to bring them back up again on new hardware in a new 
cluster.  I see plenty of documentation on how to zap and initialize and 
add "new" osds, but I don't see anything on rebuilding with existing osd 
disks.


Could somebody provide guidance on how to do this?  I'm running 94.2 on 
all machines.


Thanks,

--
Peter Hinman




Re: [ceph-users] Ceph 0.94 (and lower) performance on >1 hosts ??

2015-07-29 Thread Jake Young
On Wed, Jul 29, 2015 at 11:23 AM, Mark Nelson  wrote:

> On 07/29/2015 10:13 AM, Jake Young wrote:
>
>> On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic
>> mailto:frederic.sch...@cea.fr>> wrote:
>>  >
>>  > Hi again,
>>  >
>>  > So I have tried
>>  > - changing the cpus frequency : either 1.6GHZ, or 2.4GHZ on all cores
>>  > - changing the memory configuration, from "advanced ecc mode" to
>> "performance mode", boosting the memory bandwidth from 35GB/s to 40GB/s
>>  > - plugged a second 10GB/s link and setup a ceph internal network
>>  > - tried various "tuned-adm profile" such as "throughput-performance"
>>  >
>>  > This changed about nothing.
>>  >
>>  > If
>>  > - the CPUs are not maxed out, and lowering the frequency doesn't
>> change a thing
>>  > - the network is not maxed out
>>  > - the memory doesn't seem to have an impact
>>  > - network interrupts are spread across all 8 cpu cores and receive
>> queues are OK
>>  > - disks are not used at their maximum potential (iostat shows my dd
>> commands produce much more tps than the 4MB ceph transfers...)
>>  >
>>  > Where can I possibly find a bottleneck ?
>>  >
>>  > I'm /(almost) out of ideas/ ... :'(
>>  >
>>  > Regards
>>  >
>>  >
>> Frederic,
>>
>> I was trying to optimize my ceph cluster as well and I looked at all of
>> the same things you described, which didn't help my performance
>> noticeably.
>>
>> The following network kernel tuning settings did help me significantly.
>>
>> This is my /etc/sysctl.conf file on all of  my hosts: ceph mons, ceph
>> osds and any client that connects to my ceph cluster.
>>
>>  # Increase Linux autotuning TCP buffer limits
>>  # Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104)
>> for 10GE
>>  # Don't set tcp_mem itself! Let the kernel scale it based on RAM.
>>  #net.core.rmem_max = 56623104
>>  #net.core.wmem_max = 56623104
>>  # Use 128M buffers
>>  net.core.rmem_max = 134217728
>>  net.core.wmem_max = 134217728
>>  net.core.rmem_default = 67108864
>>  net.core.wmem_default = 67108864
>>  net.core.optmem_max = 134217728
>>  net.ipv4.tcp_rmem = 4096 87380 67108864
>>  net.ipv4.tcp_wmem = 4096 65536 67108864
>>
>>  # Make room for more TIME_WAIT sockets due to more clients,
>>  # and allow them to be reused if we run out of sockets
>>  # Also increase the max packet backlog
>>  net.core.somaxconn = 1024
>>  # Increase the length of the processor input queue
>>  net.core.netdev_max_backlog = 25
>>  net.ipv4.tcp_max_syn_backlog = 3
>>  net.ipv4.tcp_max_tw_buckets = 200
>>  net.ipv4.tcp_tw_reuse = 1
>>  net.ipv4.tcp_tw_recycle = 1
>>  net.ipv4.tcp_fin_timeout = 10
>>
>>  # Disable TCP slow start on idle connections
>>  net.ipv4.tcp_slow_start_after_idle = 0
>>
>>  # If your servers talk UDP, also up these limits
>>  net.ipv4.udp_rmem_min = 8192
>>  net.ipv4.udp_wmem_min = 8192
>>
>>  # Disable source routing and redirects
>>  net.ipv4.conf.all.send_redirects = 0
>>  net.ipv4.conf.all.accept_redirects = 0
>>  net.ipv4.conf.all.accept_source_route = 0
>>
>>  # Recommended when jumbo frames are enabled
>>  net.ipv4.tcp_mtu_probing = 1
>>
>> I have 40 Gbps links on my osd nodes, and 10 Gbps links on everything
>> else.
>>
>> Let me know if that helps.
>>
>
> Hi Jake,
>
> Could you talk a little bit about what scenarios you've seen tuning this
> help?  I noticed improvement in RGW performance in some cases with similar
> TCP tunings, but it would be good to understand what other folks are seeing
> and in what situations.
>
>
>> Jake
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>  ___
> ceph-users mailing list
>
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

Hey Mark,

I'm only using RBD.  My clients are all VMware, so I have a few iSCSI proxy
VMs (using rbd enabled tgt).  My workload is typically light random
read/write, except for the periodic eager zeroing of multi terabyte
volumes.  Since there is no VAAI in tgt, this turns into heavy sequential
writing.

I found the network tuning above helped to "open up" the connection from a
single iSCSI proxy VM to the cluster.

Note that my osd nodes have both a public network interface as well as a
dedicated private network interface, which are both 40G.  I believe the
network tuning also has another effect of improving the performance of the
cluster network (where the replication data is sent across), because
initially I had only applied the kernel tuning to the osd nodes and saw a
performance improvement before I implemented it on the iSCSI proxy VMs.

I should m

Re: [ceph-users] Configuring MemStore in Ceph

2015-07-29 Thread Aakanksha Pudipeddi-SSI
Hello Haomai,

Thanks for your response. Yes, I cannot write more than 1GB of data to it. I am 
using the latest version deployed by ceph-deploy so I am assuming the fix must 
be a part of it. Also, while creating osds using ceph-deploy, I just use a 
local directory such as /var/local/osd0. Is there any particular step to change 
while creating the osd with memstore? Thanks a lot for your help!

Aakanksha

-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com] 
Sent: Tuesday, July 28, 2015 7:36 PM
To: Aakanksha Pudipeddi-SSI
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Configuring MemStore in Ceph

On Wed, Jul 29, 2015 at 10:21 AM, Aakanksha Pudipeddi-SSI 
 wrote:
> Hello Haomai,
>
> I am using v0.94.2.
>
> Thanks,
> Aakanksha
>
> -Original Message-
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Tuesday, July 28, 2015 7:20 PM
> To: Aakanksha Pudipeddi-SSI
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>
> Which version do you use?
>
> https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2fc
> aab41a
> should fix your problem
>
> On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI 
>  wrote:
>> Hello,
>>
>>
>>
>> I am trying to setup a ceph cluster with a memstore backend. The 
>> problem is, it is always created with a fixed size (1GB). I made 
>> changes to the ceph.conf file as follows:
>>
>>
>>
>> osd_objectstore = memstore
>>
>> memstore_device_bytes = 5*1024*1024*1024
>>
>>
>>
>> The resultant cluster still has 1GB allocated to it. Could anybody 
>> point out what I am doing wrong here?

What do you mean by "The resultant cluster still has 1GB allocated to it"?

Does it mean that you can't write more than 1GB of data?

>>
>>
>>
>> Thanks,
>>
>> Aakanksha
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat


Re: [ceph-users] Configuring MemStore in Ceph

2015-07-29 Thread Haomai Wang
On Thu, Jul 30, 2015 at 1:09 AM, Aakanksha Pudipeddi-SSI
 wrote:
> Hello Haomai,
>
> Thanks for your response. Yes, I cannot write more than 1GB of data to it. I 
> am using the latest version deployed by ceph-deploy so I am assuming the fix 
> must be a part of it. Also, while creating osds using ceph-deploy, I just use 
> a local directory such as /var/local/osd0. Is there any particular step to 
> change while creating the osd with memstore? Thanks a lot for your help!
>

Hmm, I can't think of other ideas. Maybe you could verify your osd
config value via "ceph daemon osd.0 config show | grep memstore"

> Aakanksha
>
> -Original Message-
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Tuesday, July 28, 2015 7:36 PM
> To: Aakanksha Pudipeddi-SSI
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>
> On Wed, Jul 29, 2015 at 10:21 AM, Aakanksha Pudipeddi-SSI 
>  wrote:
>> Hello Haomai,
>>
>> I am using v0.94.2.
>>
>> Thanks,
>> Aakanksha
>>
>> -Original Message-
>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>> Sent: Tuesday, July 28, 2015 7:20 PM
>> To: Aakanksha Pudipeddi-SSI
>> Cc: ceph-us...@ceph.com
>> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>>
>> Which version do you use?
>>
>> https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2fc
>> aab41a
>> should fix your problem
>>
>> On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI 
>>  wrote:
>>> Hello,
>>>
>>>
>>>
>>> I am trying to setup a ceph cluster with a memstore backend. The
>>> problem is, it is always created with a fixed size (1GB). I made
>>> changes to the ceph.conf file as follows:
>>>
>>>
>>>
>>> osd_objectstore = memstore
>>>
>>> memstore_device_bytes = 5*1024*1024*1024
>>>
>>>
>>>
>>> The resultant cluster still has 1GB allocated to it. Could anybody
>>> point out what I am doing wrong here?
>
> What's the mean of "The resultant cluster still has 1GB allocated to it"?
>
> Is it mean that you can't write data more than 1GB?
>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Aakanksha
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat


Re: [ceph-users] Configuring MemStore in Ceph

2015-07-29 Thread Aakanksha Pudipeddi-SSI
Hello Haomai,

The issue was that I did not specify the value correctly in ceph.conf. I 
mistakenly used the expression 5*1024*1024*1024 instead of the literal value 
5368709120. Thanks for your help! :)
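
For anyone hitting the same thing, the working form in ceph.conf is the literal
byte count, since expressions are not evaluated (a sketch using the values above):

[osd]
osd_objectstore = memstore
# 5 GiB written out; 5*1024*1024*1024 is not evaluated
memstore_device_bytes = 5368709120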

Aakanksha

-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com] 
Sent: Wednesday, July 29, 2015 10:15 AM
To: Aakanksha Pudipeddi-SSI
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Configuring MemStore in Ceph

On Thu, Jul 30, 2015 at 1:09 AM, Aakanksha Pudipeddi-SSI 
 wrote:
> Hello Haomai,
>
> Thanks for your response. Yes, I cannot write more than 1GB of data to it. I 
> am using the latest version deployed by ceph-deploy so I am assuming the fix 
> must be a part of it. Also, while creating osds using ceph-deploy, I just use 
> a local directory such as /var/local/osd0. Is there any particular step to 
> change while creating the osd with memstore? Thanks a lot for your help!
>

Hmm, I can't think of other ideas. Maybe you could verify your osd config value 
via "ceph daemon osd.0 config show | grep memstore"

> Aakanksha
>
> -Original Message-
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Tuesday, July 28, 2015 7:36 PM
> To: Aakanksha Pudipeddi-SSI
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>
> On Wed, Jul 29, 2015 at 10:21 AM, Aakanksha Pudipeddi-SSI 
>  wrote:
>> Hello Haomai,
>>
>> I am using v0.94.2.
>>
>> Thanks,
>> Aakanksha
>>
>> -Original Message-
>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>> Sent: Tuesday, July 28, 2015 7:20 PM
>> To: Aakanksha Pudipeddi-SSI
>> Cc: ceph-us...@ceph.com
>> Subject: Re: [ceph-users] Configuring MemStore in Ceph
>>
>> Which version do you use?
>>
>> https://github.com/ceph/ceph/commit/c60f88ba8a6624099f576eaa5f1225c2f
>> c
>> aab41a
>> should fix your problem
>>
>> On Wed, Jul 29, 2015 at 5:44 AM, Aakanksha Pudipeddi-SSI 
>>  wrote:
>>> Hello,
>>>
>>>
>>>
>>> I am trying to setup a ceph cluster with a memstore backend. The 
>>> problem is, it is always created with a fixed size (1GB). I made 
>>> changes to the ceph.conf file as follows:
>>>
>>>
>>>
>>> osd_objectstore = memstore
>>>
>>> memstore_device_bytes = 5*1024*1024*1024
>>>
>>>
>>>
>>> The resultant cluster still has 1GB allocated to it. Could anybody 
>>> point out what I am doing wrong here?
>
> What's the mean of "The resultant cluster still has 1GB allocated to it"?
>
> Is it mean that you can't write data more than 1GB?
>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Aakanksha
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
should use udev to start the OSDs. In that case, a new host that has
the correct ceph.conf and osd-bootstrap key should be able to bring up
the OSDs into the cluster automatically. Just make sure you have the
correct journal in the same host as the matching OSD disk; udev
should do the magic.

The OSD logs are your friend if they don't start properly.
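
If udev doesn't bring them up on its own, manually activating the existing
partitions is a reasonable sanity check (a sketch, assuming hammer-era ceph-disk):

$ sudo ceph-disk activate-all
# or per data partition:
$ sudo ceph-disk activate /dev/sdb1
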
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
> I've got a situation that seems on the surface like it should be
> recoverable, but I'm struggling to understand how to do it.
>
> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds and
> am attempting to bring them back up again on new hardware in a new cluster.
> I see plenty of documentation on how to zap and initialize and add "new"
> osds, but I don't see anything on rebuilding with existing osd disks.
>
> Could somebody provide guidance on how to do this?  I'm running 94.2 on all
> machines.
>
> Thanks,
>
> --
> Peter Hinman
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVuSA/CRDmVDuy+mK58QAAfGAQAMq62W7QvCAo2RSDWLli
13AJTpAWhk+ilBwcmxFr/gP/Aa9hMN5bV8idDqI56YWBjGO2WPQIUT8CXH5v
ocBUZZJ0X08gOgHqFQ8x3rSSe6QINy1bQONMql3Jgpy8He/ctLnXROhNT9SU
l30CI4qKwG48AZU5E4PoWgwQmdbFv0WIuFwCzPOVIU6GvO0umirerw3C7tZQ
I34+OINURzCjKzLY/OEF4hRvRq3PV0KZAoolQTeBJtEdlyNgAQ/bHOgpfJ/h
diGwQZyhSzqTvFYOEHWUuh5ZnhZAMNtaLBulwreUEKoI0IcXGxpH6KsC7ag4
KJ1kD8U0I18eP4iyTOIXg+DxafUU4wrITlKdomW12XqmlHadi2vYYBCqataI
uc4KeXHP4/SrA1qoEDtXroAV2iuV6UUNIwsY4HPBJ/CNKXFU5QSdGOey3Kjs
Mz2zuCpMkTf6fj8B4XJfenfFulRVJwrKJml7JebPFpLTRPFMbsuZ5htUMASn
UWyCA9IfxLYsC5tPlii79Kkb93mvN3cCdvchkH2CQ38jxkVRZRUqeJlzvtVp
2mwinvqPD0irTvr+LvmlKOdtvFSOKJM0XmRSVk1LgLlpoyIZ9BqI02ul01fE
7nZ892/17zdv0Nguxr8F8bps0jA7NLFpgRhEsakdmTVTJQLMwSv7z6c9fdP0
7AWQ
=VJV0
-END PGP SIGNATURE-


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
Thanks for the guidance.  I'm working on building a valid ceph.conf 
right now.  I'm not familiar with the osd-bootstrap key. Is that the 
standard filename for it?  Is it the keyring that is stored on the osd?


I'll see if the logs turn up anything I can decipher after I rebuild the 
ceph.conf file.


--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Did you use ceph-depoy or ceph-disk to create the OSDs? If so, it
should use udev to start he OSDs. In that case, a new host that has
the correct ceph.conf and osd-bootstrap key should be able to bring up
the OSDs into the cluster automatically. Just make sure you have the
correct journal in the same host with the matching OSD disk, udev
should do the magic.

The OSD logs are your friend if they don't start properly.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds and
am attempting to bring them back up again on new hardware in a new cluster.
I see plenty of documentation on how to zap and initialize and add "new"
osds, but I don't see anything on rebuilding with existing osd disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on all
machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVuSA/CRDmVDuy+mK58QAAfGAQAMq62W7QvCAo2RSDWLli
13AJTpAWhk+ilBwcmxFr/gP/Aa9hMN5bV8idDqI56YWBjGO2WPQIUT8CXH5v
ocBUZZJ0X08gOgHqFQ8x3rSSe6QINy1bQONMql3Jgpy8He/ctLnXROhNT9SU
l30CI4qKwG48AZU5E4PoWgwQmdbFv0WIuFwCzPOVIU6GvO0umirerw3C7tZQ
I34+OINURzCjKzLY/OEF4hRvRq3PV0KZAoolQTeBJtEdlyNgAQ/bHOgpfJ/h
diGwQZyhSzqTvFYOEHWUuh5ZnhZAMNtaLBulwreUEKoI0IcXGxpH6KsC7ag4
KJ1kD8U0I18eP4iyTOIXg+DxafUU4wrITlKdomW12XqmlHadi2vYYBCqataI
uc4KeXHP4/SrA1qoEDtXroAV2iuV6UUNIwsY4HPBJ/CNKXFU5QSdGOey3Kjs
Mz2zuCpMkTf6fj8B4XJfenfFulRVJwrKJml7JebPFpLTRPFMbsuZ5htUMASn
UWyCA9IfxLYsC5tPlii79Kkb93mvN3cCdvchkH2CQ38jxkVRZRUqeJlzvtVp
2mwinvqPD0irTvr+LvmlKOdtvFSOKJM0XmRSVk1LgLlpoyIZ9BqI02ul01fE
7nZ892/17zdv0Nguxr8F8bps0jA7NLFpgRhEsakdmTVTJQLMwSv7z6c9fdP0
7AWQ
=VJV0
-END PGP SIGNATURE-





Re: [ceph-users] Recovery question

2015-07-29 Thread Steve Taylor
I recently migrated 240 OSDs to new servers this way in a single cluster, and 
it worked great. There are two additional items I would note based on my 
experience though.

First, if you're using dmcrypt then of course you need to copy the dmcrypt keys 
for the OSDs to the new host(s). I had to do this in my case, but it was very 
straightforward.
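
In case it helps anyone else, with the default ceph-disk layout that copy might 
look roughly like this (an assumption; check osd_dmcrypt_key_dir if you changed it):

scp /etc/ceph/dmcrypt-keys/* newhost:/etc/ceph/dmcrypt-keys/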

Second was an issue I didn't expect, probably just because of my ignorance. I 
was not able to migrate existing OSDs from different failure domains into a 
new, single failure domain without waiting for full recovery to HEALTH_OK in 
between. The very first server I put OSD disks from two different failure 
domains into had issues. The OSDs came up and in just fine, but immediately 
started flapping and failed to make progress toward recovery. I removed the 
disks from one failure domain and left the others, and recovery progressed as 
expected. As soon as I saw HEALTH_OK I re-migrated the OSDs from the other 
failure domain and again the cluster recovered as expected. Proceeding via this 
method allowed me to migrate all 240 OSDs without any further problems. I was 
also able to migrate as many OSDs as I wanted to simultaneously as long as I 
didn't mix OSDs from different, old failure domains in a new failure domain 
without recovering in between. I understand mixing failure domains like 
this is risky, but I sort of expected it to work anyway. Maybe it was 
better in the end that Ceph forced me to do it more safely.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter 
Hinman
Sent: Wednesday, July 29, 2015 12:58 PM
To: Robert LeBlanc 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Recovery question

Thanks for the guidance.  I'm working on building a valid ceph.conf right now.  
I'm not familiar with the osd-bootstrap key. Is that the standard filename for 
it?  Is it the keyring that is stored on the osd?

I'll see if the logs turn up anything I can decipher after I rebuild the 
ceph.conf file.

--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Did you use ceph-depoy or ceph-disk to create the OSDs? If so, it 
> should use udev to start he OSDs. In that case, a new host that has 
> the correct ceph.conf and osd-bootstrap key should be able to bring up 
> the OSDs into the cluster automatically. Just make sure you have the 
> correct journal in the same host with the matching OSD disk, udev 
> should do the magic.
>
> The OSD logs are your friend if they don't start properly.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
>> I've got a situation that seems on the surface like it should be 
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After 
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal 
>> ssds and am attempting to bring them back up again on new hardware in a new 
>> cluster.
>> I see plenty of documentation on how to zap and initialize and add "new"
>> osds, but I don't see anything on rebuilding with existing osd disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 
>> on all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v0.13.1
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJVuSA/CRDmVDuy+mK58QAAfGAQAMq62W7QvCAo2RSDWLli
> 13AJTpAWhk+ilBwcmxFr/gP/Aa9hMN5bV8idDqI56YWBjGO2WPQIUT8CXH5v
> ocBUZZJ0X08gOgHqFQ8x3rSSe6QINy1bQONMql3Jgpy8He/ctLnXROhNT9SU
> l30CI4qKwG48AZU5E4PoWgwQmdbFv0WIuFwCzPOVIU6GvO0umirerw3C7tZQ
> I34+OINURzCjKzLY/OEF4hRvRq3PV0KZAoolQTeBJtEdlyNgAQ/bHOgpfJ/h
> diGwQZyhSzqTvFYOEHWUuh5ZnhZAMNtaLBulwreUEKoI0IcXGxpH6KsC7ag4
> KJ1kD8U0I18eP4iyTOIXg+DxafUU4wrITlKdomW12XqmlHadi2vYYBCqataI
> uc4KeXHP4/SrA1qoEDtXroAV2iuV6UUNIwsY4HPBJ/CNKXFU5QSdGOey3Kjs
> Mz2zuCpMkTf6fj8B4XJfenfFulRVJwrKJml7JebPFpLTRPFMbsuZ5htUMASn
> UWyCA9IfxLYsC5tPlii79Kkb93mvN3cCdvchkH2CQ38jxkVRZRUqeJlzvtVp
> 2mwinvqPD0irTvr+LvmlKOdtvFSOKJM0XmRSVk1LgLlpoyIZ9BqI02ul01fE
> 7nZ892/17zdv0Nguxr8F8bps0jA7NLFpgRhEsakdmTVTJQLMwSv7z6c9fdP0
> 7AWQ
> =VJV0
> -END PGP SIGNATURE-



Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds odd. Can you create a ticket in the tracker with all the
details you can remember or reconstruct?
-Greg

On Wed, Jul 29, 2015 at 8:34 PM Steve Taylor 
wrote:

> I recently migrated 240 OSDs to new servers this way in a single cluster,
> and it worked great. There are two additional items I would note based on
> my experience though.
>
> First, if you're using dmcrypt then of course you need to copy the dmcrypt
> keys for the OSDs to the new host(s). I had to do this in my case, but it
> was very straightforward.
>
> Second was an issue I didn't expect, probably just because of my
> ignorance. I was not able to migrate existing OSDs from different failure
> domains into a new, single failure domain without waiting for full recovery
> to HEALTH_OK in between. The very first server I put OSD disks from two
> different failure domains into had issues. The OSDs came up and in just
> fine, but immediately started flapping and failed to make progress toward
> recovery. I removed the disks from one failure domain and left the others,
> and recovery progressed as expected. As soon as I saw HEALTH_OK I
> re-migrated the OSDs from the other failure domain and again the cluster
> recovered as expected. Proceeding via this method allowed me to migrate all
> 240 OSDs without any further problems. I was also able to migrate as many
> OSDs as I wanted to simultaneously as long as I didn't mix OSDs from
> different, old failure domains in a new failure domain without recovering
> in between. I understand mixing failure domains li
>  ke this is risky, but I sort of expected it to work anyway. Maybe it was
> better in the end that Ceph forced me to do it more safely.
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 | Fax: 801.545.4705
>
> If you are not the intended recipient of this message, be advised that any
> dissemination or copying of this message is prohibited.
> If you received this message erroneously, please notify the sender and
> delete it, together with any attachments.
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Peter Hinman
> Sent: Wednesday, July 29, 2015 12:58 PM
> To: Robert LeBlanc 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Recovery question
>
> Thanks for the guidance.  I'm working on building a valid ceph.conf right
> now.  I'm not familiar with the osd-bootstrap key. Is that the standard
> filename for it?  Is it the keyring that is stored on the osd?
>
> I'll see if the logs turn up anything I can decipher after I rebuild the
> ceph.conf file.
>
> --
> Peter Hinman
>
> On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> >
> > Did you use ceph-depoy or ceph-disk to create the OSDs? If so, it
> > should use udev to start he OSDs. In that case, a new host that has
> > the correct ceph.conf and osd-bootstrap key should be able to bring up
> > the OSDs into the cluster automatically. Just make sure you have the
> > correct journal in the same host with the matching OSD disk, udev
> > should do the magic.
> >
> > The OSD logs are your friend if they don't start properly.
> > - 
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >
> >
> > On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
> >> I've got a situation that seems on the surface like it should be
> >> recoverable, but I'm struggling to understand how to do it.
> >>
> >> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> >> multiple hardware failures, I pulled the 3 osd disks and 3 journal
> >> ssds and am attempting to bring them back up again on new hardware in a
> new cluster.
> >> I see plenty of documentation on how to zap and initialize and add "new"
> >> osds, but I don't see anything on rebuilding with existing osd disks.
> >>
> >> Could somebody provide guidance on how to do this?  I'm running 94.2
> >> on all machines.
> >>
> >> Thanks,
> >>
> >> --
> >> Peter Hinman
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > -BEGIN PGP SIGNATURE-
> > Version: Mailvelope v0.13.1
> > Comment: https://www.mailvelope.com
> >
> > wsFcBAEBCAAQBQJVuSA/CRDmVDuy+mK58QAAfGAQAMq62W7QvCAo2RSDWLli
> > 13AJTpAWhk+ilBwcmxFr/gP/Aa9hMN5bV8idDqI56YWBjGO2WPQIUT8CXH5v
> > ocBUZZJ0X08gOgHqFQ8x3rSSe6QINy1bQONMql3Jgpy8He/ctLnXROhNT9SU
> > l30CI4qKwG48AZU5E4PoWgwQmdbFv0WIuFwCzPOVIU6GvO0umirerw3C7tZQ
> > I34+OINURzCjKzLY/OEF4hRvRq3PV0KZAoolQTeBJtEdlyNgAQ/bHOgpfJ/h
> > diGwQZyhSzqTvFYOEHWUuh5ZnhZAMNtaLBulwreUEKoI0IcXGxpH6KsC7ag4
> > KJ1kD8U0I18eP4iyTOIXg+DxafUU4wrITlKdomW12XqmlHadi2vYYBCqataI
> > uc4KeXHP4/SrA1qoEDtXroAV2iuV6UUNIwsY4HPBJ/CNKXFU5QSdGOey3Kjs
> > Mz2zuCpMkTf6fj8B4XJfenfFulRVJwrKJml7JebPF

Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
This sounds like you're trying to reconstruct a cluster after destroying
the monitors. That is...not going to work well. The monitors define the
cluster and you can't move OSDs into different clusters. We have ideas for
how to reconstruct monitors and it can be done manually with a lot of
hassle, but the process isn't written down and there aren't really tools to
help with it. :/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

> I've got a situation that seems on the surface like it should be
> recoverable, but I'm struggling to understand how to do it.
>
> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
> and am attempting to bring them back up again on new hardware in a new
> cluster.  I see plenty of documentation on how to zap and initialize and
> add "new" osds, but I don't see anything on rebuilding with existing osd
> disks.
>
> Could somebody provide guidance on how to do this?  I'm running 94.2 on
> all machines.
>
> Thanks,
>
> --
> Peter Hinman
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-fuse Transport endpoint is not connected

2015-07-29 Thread pixelfairy
Copied ceph.conf from the servers; hope this helps. Should this be
considered an unsupported feature?

# rbd-fuse /cmnt -c /etc/ceph/ceph.conf -d
FUSE library version: 2.9.2
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 1,
opcode: INIT (26), nodeid: 0, insize: 56, pid: 0
INIT: 7.22 flags=0xf7fb max_readahead=0x0002 Error connecting to
cluster: No such file or directory
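
If I understand Ilya's earlier suggestion right, rbd-fuse wants everything
spelled out in the conf file it is given. A minimal sketch of what I *think*
it needs (the mon addresses and keyring path below are placeholders, not my
real values) would be:

[global]
    mon host = 192.168.122.11,192.168.122.12,192.168.122.13
    keyring = /etc/ceph/ceph.client.admin.keyring

Is that right, or does it need more than that?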



On Wed, Jul 29, 2015 at 5:33 AM Ilya Dryomov  wrote:

> On Wed, Jul 29, 2015 at 2:52 PM, pixelfairy  wrote:
> > client debian wheezy, server ubuntu trusty. both running ceph 0.94.2
> >
> > rbd-fuse seems to work, but cant access, saying "Transport endpoint is
> > not connected" when i try to ls the mount point.
> >
> > on the ceph server, (a virtual machine, as its a test cluster)
>
> rbd-fuse is pretty much a prototype and could use a lot of improvement.
> In particular, it doesn't use ceph cli arguments infrastructure, and is
> therefore very picky about ceph.conf.  Make sure you are specifying the full
> path to your ceph.conf explicitly:
>
> $ sudo rbd-fuse -c /full/path/to/ceph.conf /mnt
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

 === osd.3 ===
 Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
 2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 
authentication error (1) Operation not permitted

 Error connecting to cluster: PermissionError
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3 
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 
3 3.64 host=stor-2 root=default'
 ceph-disk: Error: ceph osd start failed: Command 
'['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' 
returned non-zero exit status 1

 ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've 
copied the client.bootstrap-osd key from the output of ceph auth list, 
and pasted it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that 
has not resolved the error.


But it sounds like you are saying that even once I get this resolved, I 
have no hope of recovering the data?


--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:
This sounds like you're trying to reconstruct a cluster after 
destroying the monitors. That is...not going to work well. The 
monitors define the cluster and you can't move OSDs into different 
clusters. We have ideas for how to reconstruct monitors and it can be 
done manually with a lot of hassle, but the process isn't written down 
and there aren't really fools I help with it. :/

-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman > wrote:


I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal
ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and
initialize and
add "new" osds, but I don't see anything on rebuilding with
existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running
94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Gregory Farnum
On Wednesday, July 29, 2015, Peter Hinman  wrote:

>  Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>  === osd.3 ===
>  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
> error (1) Operation not permitted
>  Error connecting to cluster: PermissionError
>  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
> 3.64 host=stor-2 root=default'
>  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>  ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've copied
> the client.bootstrap-osd key from the output of ceph auth list, and pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolve
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I
> have no hope of recovering the data?
>

Well, I think you'd need to buy help to assemble a working cluster with
these OSDs. But if you have rbd images you want to get out, you might be
able to string together the tools to make that happen. I'd have to defer to
David (for OSD object extraction options) or Josh/Jason (for rbd
export/import) for that, though.

ceph-objectstore-tool will I think be part of your solution, but I'm not
sure how much it can do on its own. What's your end goal?
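
As a very rough sketch (from memory, so check the flags against --help; the
OSD also has to be stopped while you run it), pulling PGs out of an OSD data
directory looks something like this (the pgid and output file below are just
examples):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --journal-path /var/lib/ceph/osd/ceph-3/journal --op list-pgs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --journal-path /var/lib/ceph/osd/ceph-3/journal \
      --op export --pgid 2.1f --file /backup/2.1f.export

You would still need a working cluster to import the exports into, which is
why the monitor question matters so much.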


>
> --
> Peter Hinman
>
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after destroying
> the monitors. That is...not going to work well. The monitors define the
> cluster and you can't move OSDs into different clusters. We have ideas for
> how to reconstruct monitors and it can be done manually with a lot of
> hassle, but the process isn't written down and there aren't really fools I
> help with it. :/
>
> *tools to help with it. Sorry for the unfortunate autocorrect!




>
>
>
>  On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman <peter.hin...@myib.com> wrote:
>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
>> and am attempting to bring them back up again on new hardware in a new
>> cluster.  I see plenty of documentation on how to zap and initialize and
>> add "new" osds, but I don't see anything on rebuilding with existing osd
>> disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 on
>> all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
The end goal is to recover the data.  I don't need to re-implement the 
cluster as it was - that just appeared to be the natural way to recover
the data.


What monitor data would be required to re-implement the cluster?

--
Peter Hinman
International Bridge / ParcelPool.com

On 7/29/2015 2:55 PM, Gregory Farnum wrote:



On Wednesday, July 29, 2015, Peter Hinman > wrote:


Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

 === osd.3 ===
 Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
 2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
authentication error (1) Operation not permitted
 Error connecting to cluster: PermissionError
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
--name=osd.3 --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush
create-or-move -- 3 3.64 host=stor-2 root=default'
 ceph-disk: Error: ceph osd start failed: Command
'['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start',
'osd.3']' returned non-zero exit status 1
 ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError? I've
copied the client.bootstrap-osd key from the output of ceph auth
list, and pasted it into /var/lib/ceph/bootstrap-osd/ceph.keyring,
but that has not resolve the error.

But it sounds like you are saying that even once I get this
resolved, I have no hope of recovering the data?


Well, I think you'd need to buy help to assemble a working cluster 
with these OSDs. But if you have rbd images you want to get out, you 
might be able to string together the tools to make that happen. I'd 
have to defer to David (for OSD object extraction options) or 
Josh/Jason (for rbd export/import) for that, though.


ceph-objectstore-tool will I think be part of your solution, but I'm 
not sure how much it can do on its own. What's your end goal?



-- 
Peter Hinman


On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after
destroying the monitors. That is...not going to work well. The
monitors define the cluster and you can't move OSDs into
different clusters. We have ideas for how to reconstruct monitors
and it can be done manually with a lot of hassle, but the process
isn't written down and there aren't really fools I help with it. :/


*tools to help with it. Sorry for the unfortunate autocorrect!




On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman
> wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal
ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3
journal ssds
and am attempting to bring them back up again on new hardware
in a new
cluster.  I see plenty of documentation on how to zap and
initialize and
add "new" osds, but I don't see anything on rebuilding with
existing osd
disks.

Could somebody provide guidance on how to do this?  I'm
running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
> Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>  === osd.3 ===
>  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
> error (1) Operation not permitted
>  Error connecting to cluster: PermissionError
>  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
> 3.64 host=stor-2 root=default'
>  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>  ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've copied
> the client.bootstrap-osd key from the output of ceph auth list, and pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolve
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I have
> no hope of recovering the data?
>
> --
> Peter Hinman
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after destroying the
> monitors. That is...not going to work well. The monitors define the cluster
> and you can't move OSDs into different clusters. We have ideas for how to
> reconstruct monitors and it can be done manually with a lot of hassle, but
> the process isn't written down and there aren't really fools I help with it.
> :/
> -Greg
>
> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:
>>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
>> and am attempting to bring them back up again on new hardware in a new
>> cluster.  I see plenty of documentation on how to zap and initialize and
>> add "new" osds, but I don't see anything on rebuilding with existing osd
>> disks.
>>
>> Could somebody provide guidance on how to do this?  I'm running 94.2 on
>> all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVuUfwCRDmVDuy+mK58QAASJIP/0kEBx+h7LZfpQkgUvPG
hKzIlSzbWIkig9O5cYzXKh03jPFz1hj38YVQ+cdYuRA1VhrNdTkwyNnzVDFk
7R98PUF4eljNNnSdQ0nIIVCS8rtGWfSUU4ECo1/4Gm8ebIMmY/g6umE87oqy
fBmXW9luFZ3HQyoaqfALKWsesNJ9EJT/EgMH3+XisJZYPtpEbVDr0DiV2sbt
st1xtsQwkKOGAOr+7sGe7g9dED7zCERLWsNOpeHkeJaArbKDzGY1abpoiyUt
BQ5lCHGKZCBqXINaVTmwPGMTdKpED5eBxIXQ+QeEXwBONQuei4zkDz8TWRKO
zaNcogcaQilSg3KyjyHzovPzVoS0OGLmEK1FVtveUfMPfMQ9XXyGnhWiZ6u7
+grlQoe4E5AZTqEMtCzKyrqldWdzL8A+S9ZidtvSi1dCZpJutEkFbI/m8A5j
dA6Q7zijNJDPVMMsXXA08z6Pu7611mShXjW0fLu871++JsE/eS8GCfc9Cgyu
aUgcSaWCuRVa2laXak3BI+44AexsU3ZKyveDeuFdm7y3F+DS5FKZK2V8OfJn
/mbolRFyGCaBEj83FQJGCBrsSOzYDhas8aEDa4W9kKLbKeBaeRUE0mXQYfvu
12lZxpzn0UasrH/mcgu8ij9ElLN5Fq0wSp1SNKbg/RczcYVt/DjjGbCRDTgO
b23I
=1NQh
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The default is /var/lib/ceph/mon/<cluster>-<id> (/var/lib/ceph/mon/ceph-mon1 for
me). You will also need the information from /etc/ceph/ to reconstruct
the data. I *think* you should be able to just copy this to a new box
with the same name and IP address and start it up.

I haven't actually done this, so there still may be some bumps.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
> Thanks Robert -
>
> Where would that monitor data (database) be found?
>
> --
> Peter Hinman
>
>
> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> If you built new monitors, this will not work. You would have to
>> recover the monitor data (database) from at least one monitor and
>> rebuild the monitor. The new monitors would not have any information
>> about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>>>
>>> Hi Greg -
>>>
>>> So at the moment, I seem to be trying to resolve a permission error.
>>>
>>>   === osd.3 ===
>>>   Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>>>   2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
>>> authentication
>>> error (1) Operation not permitted
>>>   Error connecting to cluster: PermissionError
>>>   failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
>>> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
>>> 3.64 host=stor-2 root=default'
>>>   ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
>>> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
>>> status 1
>>>   ceph-disk: Error: One or more partitions failed to activate
>>>
>>>
>>> Is there a way to identify the cause of this PermissionError?  I've
>>> copied
>>> the client.bootstrap-osd key from the output of ceph auth list, and
>>> pasted
>>> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
>>> resolve
>>> the error.
>>>
>>> But it sounds like you are saying that even once I get this resolved, I
>>> have
>>> no hope of recovering the data?
>>>
>>> --
>>> Peter Hinman
>>>
>>> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>>>
>>> This sounds like you're trying to reconstruct a cluster after destroying
>>> the
>>> monitors. That is...not going to work well. The monitors define the
>>> cluster
>>> and you can't move OSDs into different clusters. We have ideas for how to
>>> reconstruct monitors and it can be done manually with a lot of hassle,
>>> but
>>> the process isn't written down and there aren't really fools I help with
>>> it.
>>> :/
>>> -Greg
>>>
>>> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

 I've got a situation that seems on the surface like it should be
 recoverable, but I'm struggling to understand how to do it.

 I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
 multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
 and am attempting to bring them back up again on new hardware in a new
 cluster.  I see plenty of documentation on how to zap and initialize and
 add "new" osds, but I don't see anything on rebuilding with existing osd
 disks.

 Could somebody provide guidance on how to do this?  I'm running 94.2 on
 all machines.

 Thanks,

 --
 Peter Hinman


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v0.13.1
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJVuUfwCRDmVDuy+mK58QAASJIP/0kEBx+h7LZfpQkgUvPG
>> hKzIlSzbWIkig9O5cYzXKh03jPFz1hj38YVQ+cdYuRA1VhrNdTkwyNnzVDFk
>> 7R98PUF4eljNNnSdQ0nIIVCS8rtGWfSUU4ECo1/4Gm8ebIMmY/g6umE87oqy
>> fBmXW9luFZ3HQyoaqfALKWsesNJ9EJT/EgMH3+XisJZYPtpEbVDr0DiV2sbt
>> st1xtsQwkKOGAOr+7sGe7g9dED7zCERLWsNOpeHkeJaArbKDzGY1abpoiyUt
>> BQ5lCHGKZCBqXINaVTmwPGMTdKpED5eBxIXQ+QeEXwBONQuei4zkDz8TWRKO
>> zaNcogcaQilSg3KyjyHzovPzVoS0OGLmEK1FVtveUfMPfMQ9XXyGnhWiZ6u7
>> +grlQoe4E5AZTqEMtCzKyrqldWdzL8A+S9ZidtvSi1dCZpJutEkFbI/m8A5j
>> dA6Q7zijNJDPVMMsXXA08z6Pu7611mShXjW0fLu871++JsE/eS8GCfc9Cgyu
>> aUgcSaWCuRVa2laXak3BI+44AexsU3ZKyveDeuFdm7y3F+DS5FKZK2V8OfJn
>> /mbolRFyGCaBEj83FQJGCBrsSOzYDhas8aEDa4W9kKLbKeBaeRUE0mXQYfvu
>> 12lZxpzn0UasrH/mcgu8ij9ElLN5Fq0wSp1SNKbg/RczcYVt/DjjGbCRDTgO
>> b23I
>> =1NQh
>> -END PGP SIGNATURE-
>
>
>


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman

Thanks Robert -

Where would that monitor data (database) be found?

--
Peter Hinman

On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

  === osd.3 ===
  Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
  2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication
error (1) Operation not permitted
  Error connecting to cluster: PermissionError
  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
3.64 host=stor-2 root=default'
  ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
status 1
  ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've copied
the client.bootstrap-osd key from the output of ceph auth list, and pasted
it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not resolve
the error.

But it sounds like you are saying that even once I get this resolved, I have
no hope of recovering the data?

--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after destroying the
monitors. That is...not going to work well. The monitors define the cluster
and you can't move OSDs into different clusters. We have ideas for how to
reconstruct monitors and it can be done manually with a lot of hassle, but
the process isn't written down and there aren't really fools I help with it.
:/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and initialize and
add "new" osds, but I don't see anything on rebuilding with existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJVuUfwCRDmVDuy+mK58QAASJIP/0kEBx+h7LZfpQkgUvPG
hKzIlSzbWIkig9O5cYzXKh03jPFz1hj38YVQ+cdYuRA1VhrNdTkwyNnzVDFk
7R98PUF4eljNNnSdQ0nIIVCS8rtGWfSUU4ECo1/4Gm8ebIMmY/g6umE87oqy
fBmXW9luFZ3HQyoaqfALKWsesNJ9EJT/EgMH3+XisJZYPtpEbVDr0DiV2sbt
st1xtsQwkKOGAOr+7sGe7g9dED7zCERLWsNOpeHkeJaArbKDzGY1abpoiyUt
BQ5lCHGKZCBqXINaVTmwPGMTdKpED5eBxIXQ+QeEXwBONQuei4zkDz8TWRKO
zaNcogcaQilSg3KyjyHzovPzVoS0OGLmEK1FVtveUfMPfMQ9XXyGnhWiZ6u7
+grlQoe4E5AZTqEMtCzKyrqldWdzL8A+S9ZidtvSi1dCZpJutEkFbI/m8A5j
dA6Q7zijNJDPVMMsXXA08z6Pu7611mShXjW0fLu871++JsE/eS8GCfc9Cgyu
aUgcSaWCuRVa2laXak3BI+44AexsU3ZKyveDeuFdm7y3F+DS5FKZK2V8OfJn
/mbolRFyGCaBEj83FQJGCBrsSOzYDhas8aEDa4W9kKLbKeBaeRUE0mXQYfvu
12lZxpzn0UasrH/mcgu8ij9ElLN5Fq0wSp1SNKbg/RczcYVt/DjjGbCRDTgO
b23I
=1NQh
-END PGP SIGNATURE-



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Peter Hinman
Ok - that is encouraging.  I believe I've got data from a previous 
monitor. I see files in a store.db dated yesterday, with a 
MANIFEST- file that is significantly greater than the 
MANIFEST-07 file listed for the current monitors.


I've actually found data for two previous monitors.  Any idea which one 
I should select? The one with the highest manifest number? The most 
recent time stamp?


What files should I be looking for in /etc/ceph?  Just the keyring and 
rbdmap files?  How important is it to use the same keyring file?


--
Peter Hinman
International Bridge / ParcelPool.com

On 7/29/2015 3:47 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The default is /var/lib/ceph/mon/- (/var/lib/ceph/mon/ceph-mon1 for
me). You will also need the information from /etc/ceph/ to reconstruct
the data. I *think* you should be able to just copy this to a new box
with the same name and IP address and start it up.

I haven't actually done this, so there still may be some bumps.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:

Thanks Robert -

Where would that monitor data (database) be found?

--
Peter Hinman


On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you built new monitors, this will not work. You would have to
recover the monitor data (database) from at least one monitor and
rebuild the monitor. The new monitors would not have any information
about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:

Hi Greg -

So at the moment, I seem to be trying to resolve a permission error.

   === osd.3 ===
   Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
   2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
authentication
error (1) Operation not permitted
   Error connecting to cluster: PermissionError
   failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3
--keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3
3.64 host=stor-2 root=default'
   ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service',
'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
status 1
   ceph-disk: Error: One or more partitions failed to activate


Is there a way to identify the cause of this PermissionError?  I've
copied
the client.bootstrap-osd key from the output of ceph auth list, and
pasted
it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
resolve
the error.

But it sounds like you are saying that even once I get this resolved, I
have
no hope of recovering the data?

--
Peter Hinman

On 7/29/2015 1:57 PM, Gregory Farnum wrote:

This sounds like you're trying to reconstruct a cluster after destroying
the
monitors. That is...not going to work well. The monitors define the
cluster
and you can't move OSDs into different clusters. We have ideas for how to
reconstruct monitors and it can be done manually with a lot of hassle,
but
the process isn't written down and there aren't really fools I help with
it.
:/
-Greg

On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:

I've got a situation that seems on the surface like it should be
recoverable, but I'm struggling to understand how to do it.

I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
multiple hardware failures, I pulled the 3 osd disks and 3 journal ssds
and am attempting to bring them back up again on new hardware in a new
cluster.  I see plenty of documentation on how to zap and initialize and
add "new" osds, but I don't see anything on rebuilding with existing osd
disks.

Could somebody provide guidance on how to do this?  I'm running 94.2 on
all machines.

Thanks,

--
Peter Hinman


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Recovery question

2015-07-29 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you had multiple monitors, you should, if possible, recover more than
50% of them (they will need to form a quorum). If you can't, it is
messy, but you can manually remove enough monitors to start a quorum.
From /etc/ceph/ you will want the keyring and the ceph.conf at a
minimum. The keys for the monitor are, I think, in the store.db, which
will let the monitors start, but the keyring has the admin key which
lets you manage the cluster once you get it up. rbdmap is not needed
for recovery (it only automatically mounts RBDs at boot time); we can
deal with that later if needed.
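
Very roughly, the monmap surgery looks like this (an untested sketch; the mon
ids and paths are placeholders, and you should copy the mon data directory
somewhere safe before touching anything):

  # with the recovered mon stopped, pull out its monmap
  ceph-mon -i mon1 --extract-monmap /tmp/monmap
  # drop the monitors you cannot recover
  monmaptool --rm mon2 --rm mon3 /tmp/monmap
  # push the trimmed map back in and start the mon
  ceph-mon -i mon1 --inject-monmap /tmp/monmap
  service ceph start mon.mon1

Once that single mon has quorum on its own, you can add fresh monitors back
the normal way.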
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jul 29, 2015 at 4:40 PM, Peter Hinman  wrote:
> Ok - that is encouraging.  I've believe I've got data from a previous
> monitor. I see files in a store.db dated yesterday, with a MANIFEST-
> file that is significantly greater than the MANIFEST-07 file listed for
> the current monitors.
>
> I've actually found data for two previous monitors.  Any idea which one I
> should select? The one with the highest manifest number? The most recent
> time stamp?
>
> What files should I be looking for in /etc/conf?  Just the keyring and
> rbdmap files?  How important is it to use the same keyring file?
>
> --
> Peter Hinman
> International Bridge / ParcelPool.com
>
> On 7/29/2015 3:47 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> The default is /var/lib/ceph/mon/- (/var/lib/ceph/mon/ceph-mon1 for
>> me). You will also need the information from /etc/ceph/ to reconstruct
>> the data. I *think* you should be able to just copy this to a new box
>> with the same name and IP address and start it up.
>>
>> I haven't actually done this, so there still may be some bumps.
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
>>>
>>> Thanks Robert -
>>>
>>> Where would that monitor data (database) be found?
>>>
>>> --
>>> Peter Hinman
>>>
>>>
>>> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 If you built new monitors, this will not work. You would have to
 recover the monitor data (database) from at least one monitor and
 rebuild the monitor. The new monitors would not have any information
 about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>
> Hi Greg -
>
> So at the moment, I seem to be trying to resolve a permission error.
>
>=== osd.3 ===
>Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3
> authentication
> error (1) Operation not permitted
>Error connecting to cluster: PermissionError
>failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
> --name=osd.3
> --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move --
> 3
> 3.64 host=stor-2 root=default'
>ceph-disk: Error: ceph osd start failed: Command
> '['/usr/sbin/service',
> 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit
> status 1
>ceph-disk: Error: One or more partitions failed to activate
>
>
> Is there a way to identify the cause of this PermissionError?  I've
> copied
> the client.bootstrap-osd key from the output of ceph auth list, and
> pasted
> it into /var/lib/ceph/bootstrap-osd/ceph.keyring, but that has not
> resolve
> the error.
>
> But it sounds like you are saying that even once I get this resolved, I
> have
> no hope of recovering the data?
>
> --
> Peter Hinman
>
> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>
> This sounds like you're trying to reconstruct a cluster after
> destroying
> the
> monitors. That is...not going to work well. The monitors define the
> cluster
> and you can't move OSDs into different clusters. We have ideas for how
> to
> reconstruct monitors and it can be done manually with a lot of hassle,
> but
> the process isn't written down and there aren't really fools I help
> with
> it.
> :/
> -Greg
>
> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:
>>
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal
>> ssds
>> and am attempting to bring them back up again on new hardwa

[ceph-users] injectargs not working?

2015-07-29 Thread Quentin Hartman
I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be working:

# ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
 failed to parse arguments: --osd-scrub-begin-hour,1


I've also tried the daemon config set variant and it also fails:

# ceph daemon osd.0 config set osd_scrub_begin_hour 1
{ "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such file
or directory"}

I'm guessing I have something goofed in my admin socket client config:

[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/$cluster-$type.$id.asok

but that seems to correlate with the structure that exists:

# ls
ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
# pwd
/var/run/ceph

I can show my configs all over the place, but changing them seems to always
fail. It behaves the same if I'm working on a local daemon, or on my config
node trying to make changes globally.

Thanks in advance for any ideas

QH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] injectargs not working?

2015-07-29 Thread Travis Rhoden
Hi Quentin,

It may be the specific option you are trying to tweak.
osd-scrub-begin-hour was first introduced in development release
v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
0.87.1 (Giant).
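
One quick way to confirm what a running daemon actually supports (assuming
the admin socket is working, which it appears to be) is to dump its config
and grep for the option:

  ceph daemon osd.0 config show | grep scrub

If osd_scrub_begin_hour does not show up there, the daemon simply does not
know about it, which is why both injectargs and config set reject it.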

Cheers,

 - Travis

On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
 wrote:
> I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be working:
>
> # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
>  failed to parse arguments: --osd-scrub-begin-hour,1
>
>
> I've also tried the daemon config set variant and it also fails:
>
> # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such file or
> directory"}
>
> I'm guessing I have something goofed in my admin socket client config:
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> admin socket = /var/run/ceph/$cluster-$type.$id.asok
>
> but that seems to correlate with the structure that exists:
>
> # ls
> ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
> # pwd
> /var/run/ceph
>
> I can show my configs all over the place, but changing them seems to always
> fail. It behaves the same if I'm working on a local daemon, or on my config
> node trying to make changes globally.
>
> Thanks in advance for any ideas
>
> QH
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] injectargs not working?

2015-07-29 Thread Quentin Hartman
well, that would certainly do it. I _always_ forget to twiddle the little
thing on the web page that changes the version of the docs I'm looking at.

So I guess then my question becomes, "How do I prevent deep scrubs from
happening in the middle of the day and ruining everything?"

QH


On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden  wrote:

> Hi Quentin,
>
> It may be the specific option you are trying to tweak.
> osd-scrub-begin-hour was first introduced in development release
> v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
> 0.87.1 (Giant).
>
> Cheers,
>
>  - Travis
>
> On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
>  wrote:
> > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be working:
> >
> > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> >  failed to parse arguments: --osd-scrub-begin-hour,1
> >
> >
> > I've also tried the daemon config set variant and it also fails:
> >
> > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
> file or
> > directory"}
> >
> > I'm guessing I have something goofed in my admin socket client config:
> >
> > [client]
> > rbd cache = true
> > rbd cache writethrough until flush = true
> > admin socket = /var/run/ceph/$cluster-$type.$id.asok
> >
> > but that seems to correlate with the structure that exists:
> >
> > # ls
> > ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
> > # pwd
> > /var/run/ceph
> >
> > I can show my configs all over the place, but changing them seems to
> always
> > fail. It behaves the same if I'm working on a local daemon, or on my
> config
> > node trying to make changes globally.
> >
> > Thanks in advance for any ideas
> >
> > QH
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon cpu usage

2015-07-29 Thread Quentin Hartman
I just had my ceph cluster, which is running 0.87.1, exhibit this behavior
(two of three mons eat all the CPU and the cluster becomes unusably slow).

It seems to be tied to deep scrubbing, as the behavior almost immediately
surfaces if that is turned on, but if it is off the behavior eventually
seems to return to normal and stays that way while scrubbing is off. I have
not yet found anything in the cluster to indicate a hardware problem.

Any thoughts or further insights on this subject would be appreciated.

QH

On Sat, Jul 25, 2015 at 12:31 AM, Luis Periquito 
wrote:

> I think I figured it out! All 4 of the OSDs on one host (OSD 107-110) were
> sending massive amounts of auth requests to the monitors, seeming to
> overwhelm them.
>
> Weird bit is that I removed them (osd crush remove, auth del, osd rm), dd
> the box and all of the disks, reinstalled and guess what? They are still
> doing a lot of requests to the MONs... this will require some further
> investigations.
>
> As this is happening during my holidays, I just disabled them, and will
> investigate further when I get back.
>
>
> On Fri, Jul 24, 2015 at 11:11 PM, Kjetil Jørgensen 
> wrote:
>
>> It sounds slightly similar to what I just experienced.
>>
>> I had one monitor out of three, which seemed to essentially run one core
>> at full tilt continuously, and had it's virtual address space allocated at
>> the point where top started calling it Tb. Requests hitting this monitor
>> did not get very timely responses (although; I don't know if this were
>> happening consistently or arbitrarily).
>>
>> I ended up re-building the monitor from the two healthy ones I had, which
>> made the problem go away for me.
>>
>> After the fact inspection of the monitor I ripped out, clocked it in at
>> 1.3Gb compared to the 250Mb of the other two, after rebuild they're all
>> comparable in size.
>>
>> In my case; this started out for me on firefly, and persisted after
>> upgrading to hammer. Which prompted the rebuild, suspecting that in my case
>> it were related to "something" persistent for this monitor.
>>
>> I do not have that much more useful to contribute to this discussion,
>> since I've more-or-less destroyed any evidence by re-building the monitor.
>>
>> Cheers,
>> KJ
>>
>> On Fri, Jul 24, 2015 at 1:55 PM, Luis Periquito 
>> wrote:
>>
>>> The leveldb is smallish: around 70mb.
>>>
>>> I ran debug mon = 10 for a while,  but couldn't find any interesting
>>> information. I would run out of space quite quickly though as the log
>>> partition only has 10g.
>>> On 24 Jul 2015 21:13, "Mark Nelson"  wrote:
>>>
 On 07/24/2015 02:31 PM, Luis Periquito wrote:

> Now it's official,  I have a weird one!
>
> Restarted one of the ceph-mons with jemalloc and it didn't make any
> difference. It's still using a lot of cpu and still not freeing up
> memory...
>
> The issue is that the cluster almost stops responding to requests, and
> if I restart the primary mon (that had almost no memory usage nor cpu)
> the cluster goes back to its merry way responding to requests.
>
> Does anyone have any idea what may be going on? The worst bit is that I
> have several clusters just like this (well they are smaller), and as we
> do everything with puppet, they should be very similar... and all the
> other clusters are just working fine, without any issues whatsoever...
>

 We've seen cases where leveldb can't compact fast enough and memory
 balloons, but it's usually associated with extreme CPU usage as well. It
 would be showing up in perf though if that were the case...


> On 24 Jul 2015 10:11, "Jan Schermer"  > wrote:
>
> You don’t (shouldn’t) need to rebuild the binary to use jemalloc.
> It
> should be possible to do something like
>
> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>
> The last time we tried it segfaulted after a few minutes, so YMMV
> and be careful.
>
> Jan
>
>  On 23 Jul 2015, at 18:18, Luis Periquito > > wrote:
>>
>> Hi Greg,
>>
>> I've been looking at the tcmalloc issues, but did seem to affect
>> osd's, and I do notice it in heavy read workloads (even after the
>> patch and
>> increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This
>> is affecting the mon process though.
>>
>> looking at perf top I'm getting most of the CPU usage in mutex
>> lock/unlock
>>   5.02% libpthread-2.19.so [.]
>> pthread_mutex_unlock
>>   3.82%  libsoftokn3.so[.] 0x0001e7cb
>>   3.46% libpthread-2.19.so [.]
>> pthread_mutex_lock
>>
>> I could try to use jemalloc, are you aware of any built binaries?
>> Can I mix a cluste

Re: [ceph-users] injectargs not working?

2015-07-29 Thread 池信泽
Hi, "ceph osd set noscrub" (or nodeep-scrub) stops scrubbing from being
scheduled until you unset it, and "ceph osd unset noscrub" lets scrubs be
scheduled again.

So maybe you could use these two commands in a crontab to control when
scrubbing happens.
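
For example, something like this in /etc/cron.d (just a sketch; the hours
are arbitrary and the flags apply cluster-wide):

  # allow (deep-)scrubbing only between 01:00 and 06:00
  0 1 * * * root ceph osd unset noscrub; ceph osd unset nodeep-scrub
  0 6 * * * root ceph osd set noscrub; ceph osd set nodeep-scrub

A scrub that is already running when the flag is set will still finish.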

2015-07-30 7:59 GMT+08:00 Quentin Hartman :
> well, that would certainly do it. I _always_ forget to twiddle the little
> thing on the web page that changes the version of the docs I'm looking at.
>
> So I guess then my question becomes, "How do i prevent deep scrubs from
> happening in the middle of the day and ruining everything?"
>
> QH
>
>
> On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden  wrote:
>>
>> Hi Quentin,
>>
>> It may be the specific option you are trying to tweak.
>> osd-scrub-begin-hour was first introduced in development release
>> v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
>> 0.87.1 (Giant).
>>
>> Cheers,
>>
>>  - Travis
>>
>> On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
>>  wrote:
>> > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be
>> > working:
>> >
>> > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
>> >  failed to parse arguments: --osd-scrub-begin-hour,1
>> >
>> >
>> > I've also tried the daemon config set variant and it also fails:
>> >
>> > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
>> > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
>> > file or
>> > directory"}
>> >
>> > I'm guessing I have something goofed in my admin socket client config:
>> >
>> > [client]
>> > rbd cache = true
>> > rbd cache writethrough until flush = true
>> > admin socket = /var/run/ceph/$cluster-$type.$id.asok
>> >
>> > but that seems to correlate with the structure that exists:
>> >
>> > # ls
>> > ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
>> > # pwd
>> > /var/run/ceph
>> >
>> > I can show my configs all over the place, but changing them seems to
>> > always
>> > fail. It behaves the same if I'm working on a local daemon, or on my
>> > config
>> > node trying to make changes globally.
>> >
>> > Thanks in advance for any ideas
>> >
>> > QH
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Regards,
xinze
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] injectargs not working?

2015-07-29 Thread Christian Balzer

Hello,

On Wed, 29 Jul 2015 17:59:10 -0600 Quentin Hartman wrote:

> well, that would certainly do it. I _always_ forget to twiddle the little
> thing on the web page that changes the version of the docs I'm looking
> at.
> 
> So I guess then my question becomes, "How do i prevent deep scrubs from
> happening in the middle of the day and ruining everything?"
> 

Firstly a qualification and quantification of "ruining everything" would
be interesting, but I'll assume it's bad.

I have (had) clusters where even simple scrubs would be detrimental, so I
can relate.

That being said, if your cluster goes catatonic when being scrubbed, you
might want to improve it (more, faster OSDs, etc) because a deep scrub
isn't all that different from the load you'll experience when losing an
OSD or node even, something your cluster should survive w/o becoming
totally unusable in regards to client I/O. 

The most effective way to keep scrubs from starving client
I/O is setting "osd_scrub_sleep = 0.1" (the recommended value in
documentation seems to be far too small to have any beneficial effect for most 
people).

To scrub at a specific time, and given that your cluster can deep-scrub
itself completely during the night, consider issuing a 
"ceph osd deep-scrub \*" 
late on a weekend evening.

My largest cluster can deep scrub itself in 4 hours, so once I kicked that
off at midnight on a Saturday all scrubs (daily) and deep scrubs
(weekly) happen in that time frame.
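
For reference, the sleep setting can either live in ceph.conf or be injected
at runtime; the value below is simply what works for me, not a universal
recommendation:

  # ceph.conf on the OSD nodes
  [osd]
  osd scrub sleep = 0.1

  # or, without restarting the OSDs
  ceph tell osd.\* injectargs '--osd_scrub_sleep 0.1'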

Christian

> QH
> 
> 
> On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden  wrote:
> 
> > Hi Quentin,
> >
> > It may be the specific option you are trying to tweak.
> > osd-scrub-begin-hour was first introduced in development release
> > v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
> > 0.87.1 (Giant).
> >
> > Cheers,
> >
> >  - Travis
> >
> > On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
> >  wrote:
> > > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be
> > > working:
> > >
> > > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> > >  failed to parse arguments: --osd-scrub-begin-hour,1
> > >
> > >
> > > I've also tried the daemon config set variant and it also fails:
> > >
> > > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> > > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
> > file or
> > > directory"}
> > >
> > > I'm guessing I have something goofed in my admin socket client
> > > config:
> > >
> > > [client]
> > > rbd cache = true
> > > rbd cache writethrough until flush = true
> > > admin socket = /var/run/ceph/$cluster-$type.$id.asok
> > >
> > > but that seems to correlate with the structure that exists:
> > >
> > > # ls
> > > ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
> > > # pwd
> > > /var/run/ceph
> > >
> > > I can show my configs all over the place, but changing them seems to
> > always
> > > fail. It behaves the same if I'm working on a local daemon, or on my
> > config
> > > node trying to make changes globally.
> > >
> > > Thanks in advance for any ideas
> > >
> > > QH
> > >
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-29 Thread van

> On Jul 29, 2015, at 12:40 AM, Ilya Dryomov  wrote:
> 
> On Tue, Jul 28, 2015 at 7:20 PM, van  > wrote:
>> 
>>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov  wrote:
>>> 
>>> On Tue, Jul 28, 2015 at 2:46 PM, van  wrote:
 Hi, Ilya,
 
 In the dmesg, there is also a lot of libceph socket error, which I think
 may be caused by my stopping ceph service without unmap rbd.
>>> 
>>> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
>>> of rbd device will get stuck.
>> 
>> Sure it will get stuck if osds are stopped. And since rados requests have 
>> retry policy, the stucked requests will recover after I start the daemon 
>> again.
>> 
>> But in my case, the osds are running in normal state and librbd API can 
>> read/write normally.
>> Meanwhile, heavy fio test for the filesystem mounted on top of rbd device 
>> will get stuck.
>> 
>> I wonder if this phenomenon is triggered by running rbd kernel client on 
>> machines have ceph daemons, i.e. the annoying loopback mount deadlock issue.
>> 
>> In my opinion, if it’s due to the loopback mount deadlock, the OSDs will 
>> become unresponsive.
>> No matter the requests are from user space requests (like API) or from 
>> kernel client.
>> Am I right?
> 
> Not necessarily.
> 
>> 
>> If so, my case seems to be triggered by another bug.
>> 
>> Anyway, it seems that I should separate client and daemons at least.
> 
> Try 3.18.19 if you can.  I'd be interested in your results.

It’s strange: after I dropped the page cache and restarted my OSDs, the same
heavy IO tests on the rbd-backed filesystem now work fine.
The deadlock seems not that easy to trigger. Maybe I need longer tests.

I’ll try 3.18.19 LTS, thanks.

> 
> Thanks,
> 
>Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] injectargs not working?

2015-07-29 Thread Quentin Hartman
So it looks like the scrub was not actually the root of the problem. It
seems I have some failing hardware that I'm now trying to run down.

QH

On Wed, Jul 29, 2015 at 8:22 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Wed, 29 Jul 2015 17:59:10 -0600 Quentin Hartman wrote:
>
> > well, that would certainly do it. I _always_ forget to twiddle the little
> > thing on the web page that changes the version of the docs I'm looking
> > at.
> >
> > So I guess then my question becomes, "How do i prevent deep scrubs from
> > happening in the middle of the day and ruining everything?"
> >
>
> Firstly a qualification and quantification of "ruining everything" would
> be interesting, but I'll assume it's bad.
>
> I have (had) clusters where even simple scrubs would be detrimental, so I
> can relate.
>
> That being said, if your cluster goes catatonic when being scrubbed, you
> might want to improve it (more, faster OSDs, etc) because a deep scrub
> isn't all that different from the load you'll experience when loosing an
> OSD or node even, something your cluster should survive w/o becoming
> totally unusable in regards to client I/O.
>
> The most effective way to keep scrubs from starving client
> I/O is setting "osd_scrub_sleep = 0.1" (the recommended value in
> documentation seems to be far too small to have any beneficial effect for
> most people).
>
> To scrub at a specific time and given that your cluster can deep- scrub
> itself completely during the night, consider issuing a
> "ceph osd deep-scrub \*"
> late on a weekend evening.
>
> My largest cluster can deep scrub itself in 4 hours, so once I kicked that
> off at midnight on a Saturday all scrubs (daily) and deep scrubs
> (weekly) happen in that time frame.
>
> Christian
>
> > QH
> >
> >
> > On Wed, Jul 29, 2015 at 5:55 PM, Travis Rhoden 
> wrote:
> >
> > > Hi Quentin,
> > >
> > > It may be the specific option you are trying to tweak.
> > > osd-scrub-begin-hour was first introduced in development release
> > > v0.93, which means it would be in 0.94.x (Hammer), but your cluster is
> > > 0.87.1 (Giant).
> > >
> > > Cheers,
> > >
> > >  - Travis
> > >
> > > On Wed, Jul 29, 2015 at 4:28 PM, Quentin Hartman
> > >  wrote:
> > > > I'm running a 0.87.1 cluster, and my "ceph tell" seems to not be
> > > > working:
> > > >
> > > > # ceph tell osd.0 injectargs '--osd-scrub-begin-hour 1'
> > > >  failed to parse arguments: --osd-scrub-begin-hour,1
> > > >
> > > >
> > > > I've also tried the daemon config set variant and it also fails:
> > > >
> > > > # ceph daemon osd.0 config set osd_scrub_begin_hour 1
> > > > { "error": "error setting 'osd_scrub_begin_hour' to '1': (2) No such
> > > file or
> > > > directory"}
> > > >
> > > > I'm guessing I have something goofed in my admin socket client
> > > > config:
> > > >
> > > > [client]
> > > > rbd cache = true
> > > > rbd cache writethrough until flush = true
> > > > admin socket = /var/run/ceph/$cluster-$type.$id.asok
> > > >
> > > > but that seems to correlate with the structure that exists:
> > > >
> > > > # ls
> > > > ceph-osd.24.asok  ceph-osd.25.asok  ceph-osd.26.asok
> > > > # pwd
> > > > /var/run/ceph
> > > >
> > > > I can show my configs all over the place, but changing them seems to
> > > always
> > > > fail. It behaves the same if I'm working on a local daemon, or on my
> > > config
> > > > node trying to make changes globally.
> > > >
> > > > Thanks in advance for any ideas
> > > >
> > > > QH
> > > >
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] questions on editing crushmap for ceph cache tier

2015-07-29 Thread van
Hi, list,
  
The Ceph cache tier seems very promising for performance.
According to
http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds
I need to create a new pool based on SSD OSDs.

Currently, I have two servers with several HDD-based OSDs. I plan to add one
SSD-based OSD to each server and then use those two OSDs to build a cache
pool.
But I have run into problems editing the crushmap.
The example in the link uses two new hosts for the SSD OSDs and then creates a
new ruleset that takes those new hosts.
In my environment, however, I do not have new servers to use.
Can I create a ruleset that chooses only some of the OSDs in a host?
For example, in the crushmap shown below, osd.2 and osd.5 are the newly added
SSD-based OSDs. How can I create a ruleset that chooses only these two OSDs,
and how can I keep the default ruleset from choosing osd.2 and osd.5?
Is this possible, or do I have to add new servers to deploy the cache tier?
Thanks.

host node0 {
  id -2
  alg straw
  hash 0
  item osd.0 weight 1.0 # HDD
  item osd.1 weight 1.0 # HDD
  item osd.2 weight 0.5 # SSD
}

host node1 {
  id -3
  alg straw
  hash 0
  item osd.3 weight 1.0 # HDD
  item osd.4 weight 1.0 # HDD
  item osd.5 weight 0.5 # SSD
}

root default {
id -1   # do not change unnecessarily
# weight 1.560
alg straw
hash 0  # rjenkins1
item node0 weight 2.5
item node1 weight 2.5
}

 # typical ruleset
 rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default 
step chooseleaf firstn 0 type host
step emit
}
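
What I have in mind (only a sketch; the bucket names, ids and weights are made
up, and I am not sure this is the right approach) is to pull the SSD OSDs out
of node0/node1 into dedicated buckets under a new root, so that the default
ruleset never sees them:

host node0-ssd {
        id -4
        alg straw
        hash 0
        item osd.2 weight 0.5 # SSD
}

host node1-ssd {
        id -5
        alg straw
        hash 0
        item osd.5 weight 0.5 # SSD
}

root ssd {
        id -6
        alg straw
        hash 0
        item node0-ssd weight 0.5
        item node1-ssd weight 0.5
}

rule ssd_ruleset {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}

osd.2 and osd.5 would then be removed from the original node0/node1 buckets
(with the node weights under root default reduced accordingly), and the cache
pool would be pointed at the new rule with something like
"ceph osd pool set cachepool crush_ruleset 1".
Does that look sane, or is there a better way to do it?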



van
chaofa...@owtware.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-29 Thread Z Zhang
We also hit a similar issue from time to time on CentOS with a 3.10.x kernel.
In iostat we can see the kernel rbd client's util at 100% but no r/w IO, and we
can't umount/unmap the rbd device. After restarting the OSDs, it becomes
normal again.
@Ilya, could you please point us to the possible fixes in 3.18.19 for this issue?
Then we can try to back-port them to our old kernel, because we can't jump to a
new major kernel version.

Thanks.
David Zhang

From: chaofa...@owtware.com
Date: Thu, 30 Jul 2015 10:30:12 +0800
To: idryo...@gmail.com
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] which kernel version can help avoid kernel client 
deadlock


On Jul 29, 2015, at 12:40 AM, Ilya Dryomov wrote:
On Tue, Jul 28, 2015 at 7:20 PM, van wrote:
On Jul 28, 2015, at 7:57 PM, Ilya Dryomov wrote:
On Tue, Jul 28, 2015 at 2:46 PM, van wrote:

Hi, Ilya,

In the dmesg, there is also a lot of libceph socket errors, which I think
may be caused by my stopping the ceph service without unmapping rbd.

Well, sure enough, if you kill all OSDs, the filesystem mounted on top
of the rbd device will get stuck.

Sure it will get stuck if the osds are stopped. And since rados requests have a retry
policy, the stuck requests will recover after I start the daemons again.

But in my case, the osds are running in a normal state and the librbd API can
read/write normally.
Meanwhile, a heavy fio test on the filesystem mounted on top of the rbd device will
get stuck.

I wonder if this phenomenon is triggered by running the rbd kernel client on
machines that have ceph daemons, i.e. the annoying loopback mount deadlock issue.

In my opinion, if it’s due to the loopback mount deadlock, the OSDs will become
unresponsive, no matter whether the requests come from user space (like the API)
or from the kernel client.
Am I right?

Not necessarily.

If so, my case seems to be triggered by another bug.

Anyway, it seems that I should separate clients and daemons at least.

Try 3.18.19 if you can.  I'd be interested in your results.

It’s strange: after I drop the page cache and restart my OSDs, the same heavy IO
tests on the rbd folder now work fine. The deadlock seems not that easy to trigger.
Maybe I need longer tests.

I’ll try 3.18.19 LTS, thanks.

Thanks,

   Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
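
One hedged way to enumerate the krbd/libceph changes between two releases,
assuming a checkout of the mainline or stable kernel tree, is to ask git
directly:

# commits touching the rbd driver and the ceph messenger between 3.10 and 3.18.19
$ git log --oneline --no-merges v3.10..v3.18.19 -- drivers/block/rbd.c net/ceph include/linux/ceph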


Re: [ceph-users] which kernel version can help avoid kernel client deadlock

2015-07-29 Thread van
> 
> On Jul 30, 2015, at 12:48 PM, Z Zhang  wrote:
> 
> We also hit the similar issue from time to time on centos with 3.10.x kernel. 
> By iostat, we can see kernel rbd client's util is 100%, but no r/w io, and we 
> can't umount/unmap this rbd client. After restarting OSDs, it will become 
> normal.

Are your rbd kernel client and ceph OSDs running on the same machine?
Or have you encountered this problem even when the kernel client and the ceph 
OSDs are separated?

> 
> @Ilya, could you pls point us the possible fixes on 3.18.19 towards this 
> issue? Then we can try to back-port them to our old kernel because we can't 
> jump to a major kernel version.  
> 
> Thanks.
> 
> David Zhang
> 
> 
> From: chaofa...@owtware.com
> Date: Thu, 30 Jul 2015 10:30:12 +0800
> To: idryo...@gmail.com
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client   
> deadlock
> 
> 
> On Jul 29, 2015, at 12:40 AM, Ilya Dryomov wrote:
> 
> On Tue, Jul 28, 2015 at 7:20 PM, van wrote:
> 
> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov wrote:
> 
> On Tue, Jul 28, 2015 at 2:46 PM, van wrote:
> Hi, Ilya,
> 
> In the dmesg, there is also a lot of libceph socket error, which I think
> may be caused by my stopping ceph service without unmap rbd.
> 
> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
> of rbd device will get stuck.
> 
> Sure it will get stuck if osds are stopped. And since rados requests have 
> retry policy, the stuck requests will recover after I start the daemon 
> again.
> 
> But in my case, the osds are running in normal state and librbd API can 
> read/write normally.
> Meanwhile, heavy fio test for the filesystem mounted on top of rbd device 
> will get stuck.
> 
> I wonder if this phenomenon is triggered by running rbd kernel client on 
> machines that have ceph daemons, i.e. the annoying loopback mount deadlock issue.
> 
> In my opinion, if it’s due to the loopback mount deadlock, the OSDs will 
> become unresponsive.
> No matter the requests are from user space requests (like API) or from kernel 
> client.
> Am I right?
> 
> Not necessarily.
> 
> 
> If so, my case seems to be triggered by another bug.
> 
> Anyway, it seems that I should separate client and daemons at least.
> 
> Try 3.18.19 if you can.  I'd be interested in your results.
> 
> It’s strange, after I drop the page cache and restart my OSDs, same heavy IO 
> tests on rbd folder now works fine.
> The deadlock seems not that easy to trigger. Maybe I need longer tests.
> 
> I’ll try 3.18.19 LTS, thanks.
> 
> 
> Thanks,
> 
>Ilya
> 
> 
> ___ ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
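
When a mapped rbd device hangs like this, a hedged way to see whether the
kernel client is still waiting on OSD replies (assuming debugfs is mounted at
the usual place):

# in-flight requests per libceph client instance
$ cat /sys/kernel/debug/ceph/*/osdc

# processes stuck in uninterruptible sleep, typically blocked on that I/O
$ ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'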


[ceph-users] Elastic-sized RBD planned?

2015-07-29 Thread Shneur Zalman Mattern
Hi to all!


Perhaps somebody has already thought about this, but my Googling turned up no results.


How can I create an RBD image that grows on demand as the VM/client uses disk space?

Does Ceph have some options for this?

Is it planned?

Is it a utopian idea?

Or does such a client simply need CephFS?


Thanks,

Shneur








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
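
For context, RBD images are already thin-provisioned: space is only consumed as
the client writes data, and an image can be grown later. A minimal sketch (the
pool and image names are made up, and the filesystem inside the guest still has
to be grown separately, e.g. with resize2fs or xfs_growfs):

# create a 1 TB image; no space is used until data is actually written
$ rbd create rbd/elastic-test --size 1048576

# later, grow it to 2 TB (size is given in MB on this release)
$ rbd resize rbd/elastic-test --size 2097152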


Re: [ceph-users] fuse mount in fstab

2015-07-29 Thread Alvaro Simon Garcia
Hi

More info about this issue: we have opened a ticket with Red Hat; here is
the feedback:

https://bugzilla.redhat.com/show_bug.cgi?id=1248003

Cheers
Alvaro

On 16/07/15 15:19, Alvaro Simon Garcia wrote:
> Hi
>
> I have tested this a bit with different ceph-fuse versions and Linux
> distros, and it seems to be a mount issue in CentOS7. The problem is that mount
> first tries to look up the = from the fstab fs_spec field among
> the blkid block device attributes; of course this flag is not there,
> and you always get an error like this:
>
> mount: can't find =
>
> and it stops there; the mount values are never parsed by the
> /sbin/mount.fuse.ceph helper...
>
> The only workaround that I found without changing the mount version is to
> replace the "spurious" = with another special character, like a colon for
> example:
>
> id:admin  /mnt/ceph fuse.ceph defaults 0 0
>
> but you also have to change the /sbin/mount.fuse.ceph parser:
>
> ...
> # convert device string to options
> fs_spec=`echo $1 | sed 's/:/=/g'`
> cephargs='--'`echo $fs_spec | sed 's/,/ --/g'`
> ...
>
> but this is a bit annoying...
>
> Has someone else seen the same mount fuse issue in RHEL7 or CentOS?
>
> Cheers
> Alvaro
>
> On 09/07/15 12:22, Kenneth Waegeman wrote:
>> Hmm, it looks like a version issue..
>>
>> I am testing with these versions on centos7:
>>  ~]# mount -V
>> mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
>>  ~]# ceph-fuse -v
>> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>>
>> This does not work.
>>
>>
>> On my fedora box, with these versions from repo:
>> # mount -V
>> mount from util-linux 2.24.2 (libmount 2.24.0: selinux, debug, assert)
>> # ceph-fuse -v
>> ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>>
>> This works.
>>
>>
>> Which versions are you running?
>> And does someone know from which versions, or which version
>> combinations, this works?
>>
>> Thanks a lot!
>> K
>>
>> On 07/09/2015 11:53 AM, Thomas Lemarchand wrote:
>>> Hello Kenneth,
>>>
>>> I have a working ceph fuse in fstab. Only difference I see it that I
>>> don't use "conf", your configuration file is at the default path
>>> anyway.
>> I tried it with and without conf, but it always complains about id
>>> id=recette-files-rw,client_mountpoint=/recette-files/files
>>>   /mnt/wimi/ceph-files  fuse.ceph noatime,_netdev 0 0
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
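
As a hedged sanity check while debugging the fstab parsing, ceph-fuse can also
be invoked directly, bypassing the mount.fuse.ceph helper entirely (the monitor
address and client id below are placeholders):

$ ceph-fuse --id admin -c /etc/ceph/ceph.conf -m 192.168.0.1:6789 /mnt/ceph
$ fusermount -u /mnt/ceph    # unmount when done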


[ceph-users] ceph osd mounting issue with ocfs2

2015-07-29 Thread gjprabu
Hi All,



   We are using ceph with two OSDs and three clients. The clients try to mount the rbd 
image with the OCFS2 file system. When I start mounting, only two clients are able to 
mount properly; the third client gives the errors below. Sometimes I am able to 
mount the third client, but data does not sync to it.





mount /dev/rbd/rbd/integdownloads /soho/build/downloads



mount.ocfs2: Invalid argument while mounting /dev/rbd0 on 
/soho/build/downloads. Check 'dmesg' for more information on this error.



dmesg



[1280548.676688] (mount.ocfs2,1807,4):dlm_send_nodeinfo:1294 ERROR: node 
mismatch -22, node 0

[1280548.676766] (mount.ocfs2,1807,4):dlm_try_to_join_domain:1681 ERROR: status 
= -22

[1280548.677278] (mount.ocfs2,1807,8):dlm_join_domain:1950 ERROR: status = -22

[1280548.677443] (mount.ocfs2,1807,8):dlm_register_domain:2210 ERROR: status = 
-22

[1280548.677541] (mount.ocfs2,1807,8):o2cb_cluster_connect:368 ERROR: status = 
-22

[1280548.677602] (mount.ocfs2,1807,8):ocfs2_dlm_init:2988 ERROR: status = -22

[1280548.677703] (mount.ocfs2,1807,8):ocfs2_mount_volume:1864 ERROR: status = 
-22

[1280548.677800] ocfs2: Unmounting device (252,0) on (node 0)

[1280548.677808] (mount.ocfs2,1807,8):ocfs2_fill_super:1238 ERROR: status = -22







OCFS2 configuration



cluster:

   node_count=3

   heartbeat_mode = local

   name=ocfs2



node:

ip_port = 

ip_address = 192.168.112.192

number = 0

name = integ-hm5

cluster = ocfs2

node:

ip_port = 

ip_address = 192.168.113.42

number = 1

name = integ-soho

cluster = ocfs2

node:

ip_port = 7778

ip_address = 192.168.112.115

number = 2

name = integ-hm2

cluster = ocfs2



Ceph configuration

# ceph -s

cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f

 health HEALTH_OK

 monmap e2: 3 mons at 
{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}

election epoch 54, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7

 osdmap e10: 2 osds: 2 up, 2 in

  pgmap v32626: 64 pgs, 1 pools, 10293 MB data, 8689 objects

14575 MB used, 23651 GB / 24921 GB avail

  64 active+clean

  client io 2047 B/s rd, 1023 B/s wr, 2 op/s





Regards

GJ









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
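
The "dlm_send_nodeinfo ... node mismatch" error usually indicates that the node
trying to join sees a different cluster layout than the nodes that are already
mounted. A hedged checklist sketch (hostnames taken from the config above; the
exact service names depend on the distribution):

# 1. make sure /etc/ocfs2/cluster.conf is identical on all three nodes
$ md5sum /etc/ocfs2/cluster.conf        # compare on integ-hm5, integ-soho and integ-hm2

# 2. after changing cluster.conf, restart the o2cb stack on the node that fails to join
$ service o2cb restart
$ service ocfs2 restart

# 3. retry the mount and re-check dmesg
$ mount /dev/rbd/rbd/integdownloads /soho/build/downloads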