Re: [ceph-users] slow ops for mon slowly increasing

2019-09-20 Thread Kevin Olbrich
OK, looks like clock skew was the problem. I thought it was caused by the
reboot, but it did not fix itself after a few minutes (mon3 was 6 seconds
ahead).
After forcing time sync from the same server, it seems to be solved now.
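
For anyone hitting the same issue, forcing the sync and verifying looks
roughly like this (assuming chrony is used - adjust for ntpd accordingly):

chronyc makestep        # force an immediate clock correction on the skewed mon host
ceph time-sync-status   # check the skew the monitors report afterwards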

Kevin

On Fri, Sep 20, 2019 at 07:33, Kevin Olbrich wrote:

> Hi!
>
> Today some OSDs went down, a temporary problem that was solved easily.
> The mimic cluster is working and all OSDs are complete, all active+clean.
>
> Completely new for me is this:
> > 25 slow ops, oldest one blocked for 219 sec, mon.mon03 has slow ops
>
> The cluster itself looks fine; monitoring for the VMs that use RBD is fine.
>
> I thought this might be https://tracker.ceph.com/issues/24531, but I've
> restarted the mon service (and rebooted the node as a whole) and neither
> helped. The slow ops slowly increase.
>
> Example:
>
> {
> "description": "auth(proto 0 30 bytes epoch 0)",
> "initiated_at": "2019-09-20 05:31:52.295858",
> "age": 7.851164,
> "duration": 7.900068,
> "type_data": {
> "events": [
> {
> "time": "2019-09-20 05:31:52.295858",
> "event": "initiated"
> },
> {
> "time": "2019-09-20 05:31:52.295858",
> "event": "header_read"
> },
> {
> "time": "2019-09-20 05:31:52.295864",
> "event": "throttled"
> },
> {
> "time": "2019-09-20 05:31:52.295875",
> "event": "all_read"
> },
> {
> "time": "2019-09-20 05:31:52.296075",
> "event": "dispatched"
> },
> {
> "time": "2019-09-20 05:31:52.296089",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2019-09-20 05:31:52.296097",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2019-09-20 05:31:52.296098",
> "event": "psvc:dispatch"
> },
> {
> "time": "2019-09-20 05:31:52.296172",
> "event": "auth:wait_for_readable"
> },
> {
> "time": "2019-09-20 05:31:52.296177",
> "event": "auth:wait_for_readable/paxos"
> },
> {
> "time": "2019-09-20 05:31:52.296232",
> "event": "paxos:wait_for_readable"
> }
> ],
> "info": {
> "seq": 1708,
> "src_is_mon": false,
> "source": "client.?
> [fd91:462b:4243:47e::1:3]:0/2365414961",
> "forwarded_to_leader": false
> }
> }
> },
> {
> "description": "auth(proto 0 30 bytes epoch 0)",
> "initiated_at": "2019-09-20 05:31:52.314892",
> "age": 7.832131,
> "duration": 7.881230,
> "type_data": {
> "events": [
> {
> "time": "2019-09-20 05:31:52.314892",
> "event": "initiated"
> },
> {
> "time": "2019-09-20 05:31:52.314892",
> "event": "header_read"
> },
> {
> "time": "2019-09-20 05:31:52.3

[ceph-users] slow ops for mon slowly increasing

2019-09-19 Thread Kevin Olbrich
Hi!

Today some OSDs went down, a temporary problem that was solved easily.
The mimic cluster is working and all OSDs are complete, all active+clean.

Completely new for me is this:
> 25 slow ops, oldest one blocked for 219 sec, mon.mon03 has slow ops

The cluster itself looks fine; monitoring for the VMs that use RBD is fine.

I thought this might be https://tracker.ceph.com/issues/24531, but I've
restarted the mon service (and rebooted the node as a whole) and neither
helped. The slow ops slowly increase.

Example:

{
"description": "auth(proto 0 30 bytes epoch 0)",
"initiated_at": "2019-09-20 05:31:52.295858",
"age": 7.851164,
"duration": 7.900068,
"type_data": {
"events": [
{
"time": "2019-09-20 05:31:52.295858",
"event": "initiated"
},
{
"time": "2019-09-20 05:31:52.295858",
"event": "header_read"
},
{
"time": "2019-09-20 05:31:52.295864",
"event": "throttled"
},
{
"time": "2019-09-20 05:31:52.295875",
"event": "all_read"
},
{
"time": "2019-09-20 05:31:52.296075",
"event": "dispatched"
},
{
"time": "2019-09-20 05:31:52.296089",
"event": "mon:_ms_dispatch"
},
{
"time": "2019-09-20 05:31:52.296097",
"event": "mon:dispatch_op"
},
{
"time": "2019-09-20 05:31:52.296098",
"event": "psvc:dispatch"
},
{
"time": "2019-09-20 05:31:52.296172",
"event": "auth:wait_for_readable"
},
{
"time": "2019-09-20 05:31:52.296177",
"event": "auth:wait_for_readable/paxos"
},
{
"time": "2019-09-20 05:31:52.296232",
"event": "paxos:wait_for_readable"
}
],
"info": {
"seq": 1708,
"src_is_mon": false,
"source": "client.?
[fd91:462b:4243:47e::1:3]:0/2365414961",
"forwarded_to_leader": false
}
}
},
{
"description": "auth(proto 0 30 bytes epoch 0)",
"initiated_at": "2019-09-20 05:31:52.314892",
"age": 7.832131,
"duration": 7.881230,
"type_data": {
"events": [
{
"time": "2019-09-20 05:31:52.314892",
"event": "initiated"
},
{
"time": "2019-09-20 05:31:52.314892",
"event": "header_read"
},
{
"time": "2019-09-20 05:31:52.314897",
"event": "throttled"
},
{
"time": "2019-09-20 05:31:52.314907",
"event": "all_read"
},
{
"time": "2019-09-20 05:31:52.315057",
"event": "dispatched"
},
{
"time": "2019-09-20 05:31:52.315072",
"event": "mon:_ms_dispatch"
},
{
"time": "2019-09-20 05:31:52.315082",
"event": "mon:dispatch_op"
},
{
"time": "2019-09-20 05:31:52.315083",
"event": "psvc:dispatch"
},
{
"time": "2019-09-20 05:31:52.315161",
"event": "auth:wait_for_readable"
},
{
"time": "2019-09-20 05:31:52.315167",
"event": "auth:wait_for_readable/paxos"
},
{
"time": "2019-09-20 05:31:52.315230",
"event": "paxos:wait_for_readable"
}
],
"info": {
"seq": 1709,
"src_is_mon": false,
"source": "client.?
[fd91:462b:4243:47e::1:3]:0/997594187",

Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
On Tue, May 28, 2019 at 10:20, Wido den Hollander wrote:

>
>
> On 5/28/19 10:04 AM, Kevin Olbrich wrote:
> > Hi Wido,
> >
> > thanks for your reply!
> >
> > For CentOS 7, this means I can switch over to the "rpm-nautilus/el7"
> > repository and Qemu will use a nautilus-compatible client?
> > I just want to make sure I understand correctly.
> >
>
> Yes, that is correct. Keep in mind though that you will need to
> Stop/Start the VMs or (Live) Migrate them to a different hypervisor for
> the new packages to be loaded.
>
>
Actually the hosts are Fedora 29 which I need to re-deploy with Fedora 30
to get nautilus on the clients.
I just wanted to understand how this works. I always reboot the whole
machine after such a large change to make sure it works.
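
In case it helps others: on an RPM-based hypervisor you can check which Ceph
client libraries Qemu will actually use with something like this (the qemu
binary path is just an example and differs per distribution):

rpm -q librbd1 librados2
ldd /usr/bin/qemu-system-x86_64 | grep -E 'librbd|librados'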

Thank you for your time!


> Wido
>
> > Thank you very much!
> >
> > Kevin
> >
> > On Tue, May 28, 2019 at 09:46, Wido den Hollander wrote:
> >
> >
> >
> > On 5/28/19 7:52 AM, Kevin Olbrich wrote:
> > > Hi!
> > >
> > > How can I determine which client compatibility level (luminous,
> mimic,
> > > nautilus, etc.) is supported in Qemu/KVM?
> > > Does it depend on the version of ceph packages on the system? Or
> do I
> > > need a recent version Qemu/KVM?
> >
> > This is mainly related to librados and librbd on your system. Qemu
> talks
> > to librbd which then talks to librados.
> >
> > Qemu -> librbd -> librados -> Ceph cluster
> >
> > So make sure you keep the librbd and librados packages updated on
> your
> > hypervisor.
> >
> > When upgrading them make sure you either Stop/Start or Live Migrate
> the
> > VMs to a different hypervisor so the VMs are initiated with the new
> > code.
> >
> > Wido
> >
> > > Which component defines, which client level will be supported?
> > >
> > > Thank you very much!
> > >
> > > Kind regards
> > > Kevin
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
Hi Wido,

thanks for your reply!

For CentOS 7, this means I can switch over to the "rpm-nautilus/el7"
repository and Qemu will use a nautilus-compatible client?
I just want to make sure I understand correctly.

Thank you very much!

Kevin

On Tue, May 28, 2019 at 09:46, Wido den Hollander wrote:

>
>
> On 5/28/19 7:52 AM, Kevin Olbrich wrote:
> > Hi!
> >
> > How can I determine which client compatibility level (luminous, mimic,
> > nautilus, etc.) is supported in Qemu/KVM?
> > Does it depend on the version of ceph packages on the system? Or do I
> > need a recent version Qemu/KVM?
>
> This is mainly related to librados and librbd on your system. Qemu talks
> to librbd which then talks to librados.
>
> Qemu -> librbd -> librados -> Ceph cluster
>
> So make sure you keep the librbd and librados packages updated on your
> hypervisor.
>
> When upgrading them make sure you either Stop/Start or Live Migrate the
> VMs to a different hypervisor so the VMs are initiated with the new code.
>
> Wido
>
> > Which component defines, which client level will be supported?
> >
> > Thank you very much!
> >
> > Kind regards
> > Kevin
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] QEMU/KVM client compatibility

2019-05-27 Thread Kevin Olbrich
Hi!

How can I determine which client compatibility level (luminous, mimic,
nautilus, etc.) is supported in Qemu/KVM?
Does it depend on the version of ceph packages on the system? Or do I need
a recent version Qemu/KVM?
Which component defines, which client level will be supported?

Thank you very much!

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster is not stable

2019-03-12 Thread Kevin Olbrich
Are you sure that firewalld is stopped and disabled?
This looks exactly like what I saw when I missed one host in a test cluster.
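
A quick way to verify on each node (assuming a systemd-based distro):

systemctl status firewalld
systemctl disable --now firewalld   # if you want it permanently off

Alternatively, keep firewalld and open the Ceph ports (6789/tcp for the mons,
6800-7300/tcp for the OSDs) on every host.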

Kevin


On Tue, Mar 12, 2019 at 09:31, Zhenshi Zhou wrote:

> Hi,
>
> I deployed a ceph cluster with good performance. But the logs
> indicate that the cluster is not as stable as I think it should be.
>
> The log shows the monitors mark some osd as down periodly:
> [image: image.png]
>
> I didn't find any useful information in osd logs.
>
> ceph version 13.2.4 mimic (stable)
> OS version CentOS 7.6.1810
> kernel version 5.0.0-2.el7
>
> Thanks.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-26 Thread Kevin Olbrich
dd 0.90999  1.0  932GiB  335GiB  597GiB 35.96 0.79  91
12   hdd 0.90999  1.0  932GiB  357GiB  575GiB 38.28 0.84  96
35   hdd 0.90970  1.0  932GiB  318GiB  614GiB 34.14 0.75  86
 6   ssd 0.43700  1.0  447GiB  278GiB  170GiB 62.08 1.36  63
 7   ssd 0.43700  1.0  447GiB  256GiB  191GiB 57.17 1.25  60
 8   ssd 0.43700  1.0  447GiB  291GiB  156GiB 65.01 1.42  57
31   ssd 0.43660  1.0  447GiB  246GiB  201GiB 54.96 1.20  51
34   ssd 0.43660  1.0  447GiB  189GiB  258GiB 42.22 0.92  46
36   ssd 0.87329  1.0  894GiB  389GiB  506GiB 43.45 0.95  91
37   ssd 0.87329  1.0  894GiB  390GiB  504GiB 43.63 0.96  85
42   ssd 0.87329  1.0  894GiB  401GiB  493GiB 44.88 0.98  92
43   ssd 0.87329  1.0  894GiB  455GiB  439GiB 50.89 1.11  89
17   hdd 0.90999  1.0  932GiB  368GiB  563GiB 39.55 0.87 100
18   hdd 0.90999  1.0  932GiB  350GiB  582GiB 37.56 0.82  95
24   hdd 0.90999  1.0  932GiB  359GiB  572GiB 38.58 0.84  97
26   hdd 0.90999  1.0  932GiB  388GiB  544GiB 41.62 0.91 105
13   ssd 0.43700  1.0  447GiB  322GiB  125GiB 72.12 1.58  80
14   ssd 0.43700  1.0  447GiB  291GiB  156GiB 65.16 1.43  70
15   ssd 0.43700  1.0  447GiB  350GiB 96.9GiB 78.33 1.72  78 <--
16   ssd 0.43700  1.0  447GiB  268GiB  179GiB 60.05 1.31  71
23   hdd 0.90999  1.0  932GiB  364GiB  567GiB 39.08 0.86  98
25   hdd 0.90999  1.0  932GiB  391GiB  541GiB 41.92 0.92 106
27   hdd 0.90999  1.0  932GiB  393GiB  538GiB 42.21 0.92 106
28   hdd 0.90970  1.0  932GiB  467GiB  464GiB 50.14 1.10 126
19   ssd 0.43700  1.0  447GiB  310GiB  137GiB 69.36 1.52  76
20   ssd 0.43700  1.0  447GiB  316GiB  131GiB 70.66 1.55  76
21   ssd 0.43700  1.0  447GiB  323GiB  125GiB 72.13 1.58  80
22   ssd 0.43700  1.0  447GiB  283GiB  164GiB 63.39 1.39  69
38   ssd 0.43660  1.0  447GiB  146GiB  302GiB 32.55 0.71  46
39   ssd 0.43660  1.0  447GiB  142GiB  305GiB 31.84 0.70  43
40   ssd 0.87329  1.0  894GiB  407GiB  487GiB 45.53 1.00  98
41   ssd 0.87329  1.0  894GiB  353GiB  541GiB 39.51 0.87 102
TOTAL 29.9TiB 13.7TiB 16.3TiB 45.66
MIN/MAX VAR: 0.63/1.72  STDDEV: 13.59




Kevin

On Sun, Jan 6, 2019 at 07:34, Konstantin Shalygin wrote:
>
> On 1/5/19 4:17 PM, Kevin Olbrich wrote:
> > root@adminnode:~# ceph osd tree
> > ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
> >   -1   30.82903 root default
> > -16   30.82903 datacenter dc01
> > -19   30.82903 pod dc01-agg01
> > -10   17.43365 rack dc01-rack02
> >   -47.20665 host node1001
> >0   hdd  0.90999 osd.0 up  1.0 1.0
> >1   hdd  0.90999 osd.1 up  1.0 1.0
> >5   hdd  0.90999 osd.5 up  1.0 1.0
> >   29   hdd  0.90970 osd.29up  1.0 1.0
> >   32   hdd  0.90970 osd.32  down0 1.0
> >   33   hdd  0.90970 osd.33up  1.0 1.0
> >2   ssd  0.43700 osd.2 up  1.0 1.0
> >3   ssd  0.43700 osd.3 up  1.0 1.0
> >4   ssd  0.43700 osd.4 up  1.0 1.0
> >   30   ssd  0.43660 osd.30up  1.0 1.0
> >   -76.29724 host node1002
> >9   hdd  0.90999 osd.9 up  1.0 1.0
> >   10   hdd  0.90999 osd.10up  1.0 1.0
> >   11   hdd  0.90999 osd.11up  1.0 1.0
> >   12   hdd  0.90999 osd.12up  1.0 1.0
> >   35   hdd  0.90970 osd.35up  1.0 1.0
> >6   ssd  0.43700 osd.6 up  1.0 1.0
> >7   ssd  0.43700 osd.7 up  1.0 1.0
> >8   ssd  0.43700 osd.8 up  1.0 1.0
> >   31   ssd  0.43660 osd.31up  1.0 1.0
> > -282.18318 host node1005
> >   34   ssd  0.43660 osd.34up  1.0 1.0
> >   36   ssd  0.87329 osd.36up  1.0 1.0
> >   37   ssd  0.87329 osd.37up  1.0 1.0
> > -291.74658 host node1006
> >   42   ssd  0.87329 osd.42up  1.0 1.0
> >   43   ssd  0.87329 osd.43up  1.0 1.0
> > -11   13.39537 rack dc01-rack03
> > -225.38794 host node100

Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Kevin Olbrich
On Sat, Jan 26, 2019 at 13:43, Götz Reinicke wrote:
>
> Hi,
>
> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>
> I grow that rbd and ext4 starting with an 2TB rbd that way:
>
> rbd resize testpool/disk01--size 4194304
>
> resize2fs /dev/rbd0
>
> Today I wanted to extend that ext4 to 8 TB and did:
>
> rbd resize testpool/disk01--size 8388608
>
> resize2fs /dev/rbd0
>
> => which gives an error: The filesystem is already 1073741824 blocks. Nothing 
> to do.
>
>
> I bet I missed something very simple. Any hint? Thanks and regards . 
> Götz

Try "partprobe" to read device metrics again.

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck in creating+peering state

2019-01-17 Thread Kevin Olbrich
Are you sure no service like firewalld is running?
Did you check that all machines have the same MTU and jumbo frames are
enabled if needed?

I had this problem when I first started with ceph and forgot to
disable firewalld.
Replication worked perfectly fine but the OSD was kicked out every few seconds.
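
To rule both out quickly (a 9000 byte MTU is assumed for the jumbo frame test):

ip link show | grep mtu              # MTU must match on every node
ping -M do -s 8972 <other-node-ip>   # must not fragment: 9000 - 28 header bytes
systemctl status firewalld           # should be inactive/disabled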

Kevin

On Thu, Jan 17, 2019 at 11:57, Johan Thomsen wrote:
>
> Hi,
>
> I have a sad ceph cluster.
> All my osds complain about failed reply on heartbeat, like so:
>
> osd.10 635 heartbeat_check: no reply from 192.168.160.237:6810 osd.42
> ever on either front or back, first ping sent 2019-01-16
> 22:26:07.724336 (cutoff 2019-01-16 22:26:08.225353)
>
> .. I've checked the network sanity all I can, and all ceph ports are
> open between nodes both on the public network and the cluster network,
> and I have no problems sending traffic back and forth between nodes.
> I've tried tcpdump'ing and traffic is passing in both directions
> between the nodes, but unfortunately I don't natively speak the ceph
> protocol, so I can't figure out what's going wrong in the heartbeat
> conversation.
>
> Still:
>
> # ceph health detail
>
> HEALTH_WARN nodown,noout flag(s) set; Reduced data availability: 1072
> pgs inactive, 1072 pgs peering
> OSDMAP_FLAGS nodown,noout flag(s) set
> PG_AVAILABILITY Reduced data availability: 1072 pgs inactive, 1072 pgs peering
> pg 7.3cd is stuck inactive for 245901.560813, current state
> creating+peering, last acting [13,41,1]
> pg 7.3ce is stuck peering for 245901.560813, current state
> creating+peering, last acting [1,40,7]
> pg 7.3cf is stuck peering for 245901.560813, current state
> creating+peering, last acting [0,42,9]
> pg 7.3d0 is stuck peering for 245901.560813, current state
> creating+peering, last acting [20,8,38]
> pg 7.3d1 is stuck peering for 245901.560813, current state
> creating+peering, last acting [10,20,42]
>()
>
>
> I've set "noout" and "nodown" to prevent all osd's from being removed
> from the cluster. They are all running and marked as "up".
>
> # ceph osd tree
>
> ID  CLASS WEIGHTTYPE NAME  STATUS REWEIGHT PRI-AFF
>  -1   249.73434 root default
> -25   166.48956 datacenter m1
> -2483.24478 pod kube1
> -3541.62239 rack 10
> -3441.62239 host ceph-sto-p102
>  40   hdd   7.27689 osd.40 up  1.0 1.0
>  41   hdd   7.27689 osd.41 up  1.0 1.0
>  42   hdd   7.27689 osd.42 up  1.0 1.0
>()
>
> I'm at a point where I don't know which options and what logs to check 
> anymore?
>
> Any debug hint would be very much appreciated.
>
> btw. I have no important data in the cluster (yet), so if the solution
> is to drop all osd and recreate them, it's ok for now. But I'd really
> like to know how the cluster ended in this state.
>
> /Johan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
It would, but you should not:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
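
For completeness, the change you are asking about would be (using your
cephfs_data pool as the example - again, size 2 is not recommended for data
you care about):

ceph osd pool set cephfs_data size 2

Adding the missing capacity to pf-us1-dfs3, as suggested in the quoted reply
below, is the better fix.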

Kevin

On Tue, Jan 8, 2019 at 15:35, Rodrigo Embeita wrote:
>
> Thanks again Kevin.
> If I reduce the size flag to a value of 2, that should fix the problem?
>
> Regards
>
> On Tue, Jan 8, 2019 at 11:28 AM Kevin Olbrich  wrote:
>>
>> You use replication size 3 with failure domain "host".
>> OSDs 2 and 4 are full, that's why your pool is also full.
>> You need to add two disks to pf-us1-dfs3 or swap one from the larger
>> nodes to this one.
>>
>> Kevin
>>
>> On Tue, Jan 8, 2019 at 15:20, Rodrigo Embeita wrote:
>> >
>> > Hi Yoann, thanks for your response.
>> > Here are the results of the commands.
>> >
>> > root@pf-us1-dfs2:/var/log/ceph# ceph osd df
>> > ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
>> > 0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310
>> > 5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271
>> > 6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49
>> > 8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.030  42
>> > 1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285
>> > 3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296
>> > 7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53
>> > 9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38
>> > 2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321
>> > 4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351
>> >TOTAL  73 TiB  39 TiB  34 TiB 53.13
>> > MIN/MAX VAR: 0/1.79  STDDEV: 41.15
>> >
>> >
>> > root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
>> > pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash 
>> > rjenkins pg_num 128 pgp_num 128 last_change 471 fla
>> > gs hashpspool,full stripe_width 0
>> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash 
>> > rjenkins pg_num 256 pgp_num 256 last_change 471 lf
>> > or 0/439 flags hashpspool,full stripe_width 0 application cephfs
>> > pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
>> > object_hash rjenkins pg_num 256 pgp_num 256 last_change 47
>> > 1 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
>> > pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash 
>> > rjenkins pg_num 8 pgp_num 8 last_change 471 flags ha
>> > shpspool,full stripe_width 0 application rgw
>> > pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 
>> > object_hash rjenkins pg_num 8 pgp_num 8 last_change 47
>> > 1 flags hashpspool,full stripe_width 0 application rgw
>> > pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 
>> > object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 f
>> > lags hashpspool,full stripe_width 0 application rgw
>> > pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 
>> > object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 fl
>> > ags hashpspool,full stripe_width 0 application rgw
>> >
>> >
>> > root@pf-us1-dfs2:/var/log/ceph# ceph osd tree
>> > ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
>> > -1   72.77390 root default
>> > -3   29.10956 host pf-us1-dfs1
>> > 0   hdd  7.27739 osd.0up  1.0 1.0
>> > 5   hdd  7.27739 osd.5up  1.0 1.0
>> > 6   hdd  7.27739 osd.6up  1.0 1.0
>> > 8   hdd  7.27739 osd.8up  1.0 1.0
>> > -5   29.10956 host pf-us1-dfs2
>> > 1   hdd  7.27739 osd.1up  1.0 1.0
>> > 3   hdd  7.27739 osd.3up  1.0 1.0
>> > 7   hdd  7.27739 osd.7up  1.0 1.0
>> > 9   hdd  7.27739 osd.9up  1.0 1.0
>> > -7   14.55478 host pf-us1-dfs3
>> > 2   hdd  7.27739 osd.2up  1.0 1.0
>> > 4   hdd  7.27739 osd.4up  1.0 1.0
>> >
>> >
>> > Thanks for your help guys.
>> >
>> >
>> > On Tue, Jan 8, 2019 at 10:36 AM Yoann Moulin  wrote:
>> >>
>> >> Hello,
>> >>
>> >> > Hi guys, I need your help.
>> >> > I'm new with Cephfs and we started using it 

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
You use replication size 3 with failure domain "host".
OSDs 2 and 4 are full, that's why your pool is also full.
You need to add two disks to pf-us1-dfs3 or swap one from the larger
nodes to this one.
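
To see the imbalance per host at a glance (assuming a reasonably recent
release):

ceph osd df tree

pf-us1-dfs3 has only two OSDs, and with failure domain "host" every object
needs one copy there, so those two disks fill up first.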

Kevin

On Tue, Jan 8, 2019 at 15:20, Rodrigo Embeita wrote:
>
> Hi Yoann, thanks for your response.
> Here are the results of the commands.
>
> root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
> 0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310
> 5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271
> 6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49
> 8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.030  42
> 1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285
> 3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296
> 7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53
> 9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38
> 2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321
> 4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351
>TOTAL  73 TiB  39 TiB  34 TiB 53.13
> MIN/MAX VAR: 0/1.79  STDDEV: 41.15
>
>
> root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 128 pgp_num 128 last_change 471 fla
> gs hashpspool,full stripe_width 0
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 256 pgp_num 256 last_change 471 lf
> or 0/439 flags hashpspool,full stripe_width 0 application cephfs
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 47
> 1 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 8 pgp_num 8 last_change 471 flags ha
> shpspool,full stripe_width 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 47
> 1 flags hashpspool,full stripe_width 0 application rgw
> pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 f
> lags hashpspool,full stripe_width 0 application rgw
> pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 fl
> ags hashpspool,full stripe_width 0 application rgw
>
>
> root@pf-us1-dfs2:/var/log/ceph# ceph osd tree
> ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
> -1   72.77390 root default
> -3   29.10956 host pf-us1-dfs1
> 0   hdd  7.27739 osd.0up  1.0 1.0
> 5   hdd  7.27739 osd.5up  1.0 1.0
> 6   hdd  7.27739 osd.6up  1.0 1.0
> 8   hdd  7.27739 osd.8up  1.0 1.0
> -5   29.10956 host pf-us1-dfs2
> 1   hdd  7.27739 osd.1up  1.0 1.0
> 3   hdd  7.27739 osd.3up  1.0 1.0
> 7   hdd  7.27739 osd.7up  1.0 1.0
> 9   hdd  7.27739 osd.9up  1.0 1.0
> -7   14.55478 host pf-us1-dfs3
> 2   hdd  7.27739 osd.2up  1.0 1.0
> 4   hdd  7.27739 osd.4up  1.0 1.0
>
>
> Thanks for your help guys.
>
>
> On Tue, Jan 8, 2019 at 10:36 AM Yoann Moulin  wrote:
>>
>> Hello,
>>
>> > Hi guys, I need your help.
>> > I'm new with Cephfs and we started using it as file storage.
>> > Today we are getting no space left on device but I'm seeing that we have 
>> > plenty space on the filesystem.
>> > Filesystem  Size  Used Avail Use% Mounted on
>> > 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts   73T   
>> > 39T   35T  54% /mnt/cephfs
>> >
>> > We have 35TB of disk space. I've added 2 additional OSD disks with 7TB 
>> > each but I'm getting the error "No space left on device" every time that
>> > I want to add a new file.
>> > After adding the 2 additional OSD disks I'm seeing that the load is beign 
>> > distributed among the cluster.
>> > Please I need your help.
>>
>> Could you give us the output of
>>
>> ceph osd df
>> ceph osd pool ls detail
>> ceph osd tree
>>
>> Best regards,
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
Looks like the same problem as mine:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032054.html

The free space shown is the cluster total, while Ceph is limited by the OSD
with the least free space (the fullest OSD).
Please check your (re-)weights.
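
The commands involved would be something like this (OSD id and weight are
only placeholders):

ceph osd df                        # look at the REWEIGHT and %USE columns
ceph osd reweight <osd-id> 0.95    # nudge the fullest OSD down a bit

or let Ceph choose: ceph osd reweight-by-utilization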

Kevin

On Tue, Jan 8, 2019 at 14:32, Rodrigo Embeita wrote:
>
> Hi guys, I need your help.
> I'm new with Cephfs and we started using it as file storage.
> Today we are getting no space left on device but I'm seeing that we have 
> plenty space on the filesystem.
> Filesystem  Size  Used Avail Use% Mounted on
> 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts   73T   
> 39T   35T  54% /mnt/cephfs
>
> We have 35TB of disk space. I've added 2 additional OSD disks with 7TB each 
> but I'm getting the error "No space left on device" every time that I want to 
> add a new file.
> After adding the 2 additional OSD disks I'm seeing that the load is beign 
> distributed among the cluster.
> Please I need your help.
>
> root@pf-us1-dfs1:/etc/ceph# ceph -s
>  cluster:
>id: 609e9313-bdd3-449e-a23f-3db8382e71fb
>health: HEALTH_ERR
>2 backfillfull osd(s)
>1 full osd(s)
>7 pool(s) full
>197313040/508449063 objects misplaced (38.807%)
>Degraded data redundancy: 2/508449063 objects degraded (0.000%), 2 
> pgs degraded
>Degraded data redundancy (low space): 16 pgs backfill_toofull, 3 
> pgs recovery_toofull
>
>  services:
>mon: 3 daemons, quorum pf-us1-dfs2,pf-us1-dfs1,pf-us1-dfs3
>mgr: pf-us1-dfs3(active), standbys: pf-us1-dfs2
>mds: pagefs-2/2/2 up  {0=pf-us1-dfs3=up:active,1=pf-us1-dfs1=up:active}, 1 
> up:standby
>osd: 10 osds: 10 up, 10 in; 189 remapped pgs
>rgw: 1 daemon active
>
>  data:
>pools:   7 pools, 416 pgs
>objects: 169.5 M objects, 3.6 TiB
>usage:   39 TiB used, 34 TiB / 73 TiB avail
>pgs: 2/508449063 objects degraded (0.000%)
> 197313040/508449063 objects misplaced (38.807%)
> 224 active+clean
> 168 active+remapped+backfill_wait
> 16  active+remapped+backfill_wait+backfill_toofull
> 5   active+remapped+backfilling
> 2   active+recovery_toofull+degraded
> 1   active+recovery_toofull
>
>  io:
>recovery: 1.1 MiB/s, 31 objects/s
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Balancer=on with crush-compat mode

2019-01-05 Thread Kevin Olbrich
If I understand the balancer correctly, it balances PGs, not data.
That worked perfectly fine in your case.

I prefer a PG count of ~100 per OSD; you are at about 30. Maybe it would
help to bump the PG count.
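
For the rbd.ssd pool from your output that would be, for example:

ceph osd pool set rbd.ssd pg_num 16
ceph osd pool set rbd.ssd pgp_num 16

On Luminous both steps are needed; wait for the resulting remapping to finish
(ceph -s) before increasing further.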

Kevin

On Sat, Jan 5, 2019 at 14:39, Marc Roos wrote:
>
>
> I have straw2, balancer=on, crush-compat and it gives worst spread over
> my ssd drives (4 only) being used by only 2 pools. One of these pools
> has pg 8. Should I increase this to 16 to create a better result, or
> will it never be any better.
>
> For now I like to stick to crush-compat, so I can use a default centos7
> kernel.
>
> Luminous 12.2.8, 3.10.0-862.14.4.el7.x86_64, CentOS Linux release
> 7.5.1804 (Core)
>
>
>
> [@c01 ~]# cat balancer-1-before.txt | egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.0  447GiB  164GiB  283GiB 36.79 0.93  31
> 20   ssd 0.48000  1.0  447GiB  136GiB  311GiB 30.49 0.77  32
> 21   ssd 0.48000  1.0  447GiB  215GiB  232GiB 48.02 1.22  30
> 30   ssd 0.48000  1.0  447GiB  151GiB  296GiB 33.72 0.86  27
>
> [@c01 ~]# ceph osd df | egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.0  447GiB  157GiB  290GiB 35.18 0.87  30
> 20   ssd 0.48000  1.0  447GiB  125GiB  322GiB 28.00 0.69  30
> 21   ssd 0.48000  1.0  447GiB  245GiB  202GiB 54.71 1.35  30
> 30   ssd 0.48000  1.0  447GiB  217GiB  230GiB 48.46 1.20  30
>
> [@c01 ~]# ceph osd pool ls detail | egrep 'fs_meta|rbd.ssd'
> pool 19 'fs_meta' replicated size 3 min_size 2 crush_rule 5 object_hash
> rjenkins pg_num 16 pgp_num 16 last_change 22425 lfor 0/9035 flags
> hashpspool stripe_width 0 application cephfs
> pool 54 'rbd.ssd' replicated size 3 min_size 2 crush_rule 5 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 24666 flags hashpspool
> stripe_width 0 application rbd
>
> [@c01 ~]# ceph df |egrep 'ssd|fs_meta'
> fs_meta   19  170MiB  0.07
> 240GiB 2451382
> fs_data.ssd   33  0B 0
> 240GiB   0
> rbd.ssd   54  266GiB 52.57
> 240GiB   75902
> fs_data.ec21.ssd  55  0B 0
> 480GiB   0
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-05 Thread Kevin Olbrich
osd.33
  2   ssd  0.43700  1.0  447GiB  271GiB  176GiB 60.67 1.30  50
osd.2
  3   ssd  0.43700  1.0  447GiB  249GiB  198GiB 55.62 1.19  58
osd.3
  4   ssd  0.43700  1.0  447GiB  297GiB  150GiB 66.39 1.42  56
osd.4
 30   ssd  0.43660  1.0  447GiB  236GiB  211GiB 52.85 1.13  48
osd.30
 -76.29724- 6.29TiB 2.74TiB 3.55TiB 43.53 0.93   -
host node1002
  9   hdd  0.90999  1.0  932GiB  354GiB  578GiB 37.96 0.81  95
osd.9
 10   hdd  0.90999  1.0  932GiB  357GiB  575GiB 38.28 0.82  96
osd.10
 11   hdd  0.90999  1.0  932GiB  318GiB  613GiB 34.18 0.73  86
osd.11
 12   hdd  0.90999  1.0  932GiB  373GiB  558GiB 40.09 0.86 100
osd.12
 35   hdd  0.90970  1.0  932GiB  343GiB  588GiB 36.83 0.79  92
osd.35
  6   ssd  0.43700  1.0  447GiB  269GiB  178GiB 60.20 1.29  60
osd.6
  7   ssd  0.43700  1.0  447GiB  249GiB  198GiB 55.69 1.19  56
osd.7
  8   ssd  0.43700  1.0  447GiB  286GiB  161GiB 63.95 1.37  56
osd.8
 31   ssd  0.43660  1.0  447GiB  257GiB  190GiB 57.47 1.23  55
osd.31
-282.18318- 2.18TiB  968GiB 1.24TiB 43.29 0.93   -
host node1005
 34   ssd  0.43660  1.0  447GiB  202GiB  245GiB 45.14 0.97  47
osd.34
 36   ssd  0.87329  1.0  894GiB  405GiB  489GiB 45.28 0.97  91
osd.36
 37   ssd  0.87329  1.0  894GiB  361GiB  533GiB 40.38 0.87  79
osd.37
-291.74658- 1.75TiB  888GiB  900GiB 49.65 1.06   -
host node1006
 42   ssd  0.87329  1.0  894GiB  417GiB  477GiB 46.68 1.00  92
osd.42
 43   ssd  0.87329  1.0  894GiB  471GiB  424GiB 52.63 1.13  90
osd.43
-11   13.39537- 13.4TiB 6.64TiB 6.75TiB 49.60 1.06   -
rack dc01-rack03
-225.38794- 5.39TiB 2.70TiB 2.69TiB 50.14 1.07   -
host node1003
 17   hdd  0.90999  1.0  932GiB  371GiB  560GiB 39.83 0.85 100
osd.17
 18   hdd  0.90999  1.0  932GiB  390GiB  542GiB 41.82 0.90 105
osd.18
 24   hdd  0.90999  1.0  932GiB  352GiB  580GiB 37.77 0.81  94
osd.24
 26   hdd  0.90999  1.0  932GiB  387GiB  545GiB 41.54 0.89 104
osd.26
 13   ssd  0.43700  1.0  447GiB  319GiB  128GiB 71.32 1.53  77
osd.13
 14   ssd  0.43700  1.0  447GiB  303GiB  144GiB 67.76 1.45  70
osd.14
 15   ssd  0.43700  1.0  447GiB  361GiB 86.4GiB 80.67 1.73  77
osd.15
 16   ssd  0.43700  1.0  447GiB  283GiB  164GiB 63.29 1.36  71
osd.16
-255.38765- 5.39TiB 2.83TiB 2.56TiB 52.55 1.13   -
host node1004
 23   hdd  0.90999  1.0  932GiB  382GiB  549GiB 41.05 0.88 102
osd.23
 25   hdd  0.90999  1.0  932GiB  412GiB  520GiB 44.20 0.95 111
osd.25
 27   hdd  0.90999  1.0  932GiB  385GiB  546GiB 41.36 0.89 103
osd.27
 28   hdd  0.90970  1.0  932GiB  462GiB  469GiB 49.64 1.06 124
osd.28
 19   ssd  0.43700  1.0  447GiB  314GiB  133GiB 70.22 1.51  75
osd.19
 20   ssd  0.43700  1.0  447GiB  327GiB  120GiB 73.06 1.57  76
osd.20
 21   ssd  0.43700  1.0  447GiB  324GiB  123GiB 72.45 1.55  77
osd.21
 22   ssd  0.43700  1.0  447GiB  292GiB  156GiB 65.21 1.40  68
osd.22
-302.61978- 2.62TiB 1.11TiB 1.51TiB 42.43 0.91   -
host node1007
 38   ssd  0.43660  1.0  447GiB  165GiB  283GiB 36.82 0.79  46
osd.38
 39   ssd  0.43660  1.0  447GiB  156GiB  292GiB 34.79 0.75  42
osd.39
 40   ssd  0.87329  1.0  894GiB  429GiB  466GiB 47.94 1.03  98
osd.40
 41   ssd  0.87329  1.0  894GiB  389GiB  505GiB 43.55 0.93 103
osd.41
  TOTAL 29.9TiB 14.0TiB 16.0TiB 46.65
MIN/MAX VAR: 0.65/1.73  STDDEV: 13.30

=
root@adminnode:~# ceph df && ceph -v
GLOBAL:
SIZEAVAIL   RAW USED %RAW USED
29.9TiB 16.0TiB  14.0TiB 46.65
POOLS:
NAME  ID USED%USED MAX AVAIL OBJECTS
rbd_vms_ssd   2   986GiB 49.83993GiB  262606
rbd_vms_hdd   3  3.76TiB 48.94   3.92TiB  992255
rbd_vms_ssd_014   372KiB 0662GiB 148
rbd_vms_ssd_01_ec 6  2.85TiB 68.81   1.29TiB  770506

ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)

Kevin

On Sat, Jan 5, 2019 at 05:12, Konstantin Shalygin wrote:
>
> On 1/5/19 1:51 AM, Kevin Olbrich wrote:
> &

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
degraded, acting [9,31]
> pg 14.8f9 is activating+degraded, acting [27,21]
> pg 14.901 is activating+degraded, acting [22,8]
> pg 14.910 is activating+degraded, acting [17,2]
> pg 20.808 is activating+degraded, acting [20,12]
> pg 20.825 is activating+degraded, acting [25,35]
> pg 20.827 is activating+degraded, acting [23,16]
> pg 20.829 is activating+degraded, acting [20,31]
> pg 20.837 is activating+degraded, acting [31,6]
> pg 20.83c is activating+degraded, acting [26,17]
> pg 20.85e is activating+degraded, acting [4,27]
> pg 20.85f is activating+degraded, acting [1,25]
> pg 20.865 is activating+degraded, acting [8,33]
> pg 20.88b is activating+degraded, acting [6,32]
> pg 20.895 is stale+activating+degraded, acting [37,27]
> pg 20.89c is activating+degraded, acting [1,24]
> pg 20.8a3 is activating+degraded, acting [30,1]
> pg 20.8ad is activating+degraded, acting [1,20]
> pg 20.8af is activating+degraded, acting [33,31]
> pg 20.8b4 is activating+degraded, acting [9,1]
> pg 20.8b7 is activating+degraded, acting [0,33]
> pg 20.8b9 is activating+degraded, acting [20,24]
> pg 20.8c5 is activating+degraded, acting [27,14]
> pg 20.8d1 is activating+degraded, acting [10,7]
> pg 20.8d4 is activating+degraded, acting [28,21]
> pg 20.8d5 is activating+degraded, acting [24,15]
> pg 20.8e0 is activating+degraded, acting [18,0]
> pg 20.8e2 is activating+degraded, acting [25,7]
> pg 20.8ea is activating+degraded, acting [17,21]
> pg 20.8f1 is activating+degraded, acting [15,11]
> pg 20.8fb is activating+degraded, acting [10,24]
> pg 20.8fc is activating+degraded, acting [20,15]
> pg 20.8ff is activating+degraded, acting [18,25]
> pg 20.913 is activating+degraded, acting [11,0]
> pg 20.91d is activating+degraded, acting [10,16]
> REQUEST_SLOW 99059 slow requests are blocked > 32 sec
> 24235 ops are blocked > 2097.15 sec
> 17029 ops are blocked > 1048.58 sec
> 54122 ops are blocked > 524.288 sec
> 2311 ops are blocked > 262.144 sec
> 767 ops are blocked > 131.072 sec
> 396 ops are blocked > 65.536 sec
> 199 ops are blocked > 32.768 sec
> osd.32 has blocked requests > 262.144 sec
> osds 5,8,12,26,28 have blocked requests > 524.288 sec
> osds 1,3,9,10 have blocked requests > 1048.58 sec
> osds 2,14,18,19,20,23,24,25,27,29,30,31,33,34,35 have blocked requests > 
> 2097.15 sec
> REQUEST_STUCK 4834 stuck requests are blocked > 4096 sec
> 4834 ops are blocked > 4194.3 sec
> osds 0,4,11,13,17,21,22 have stuck requests > 4194.3 sec
> TOO_MANY_PGS too many PGs per OSD (3003 > max 200)
> [root@fre101 ~]#
>
> [root@fre101 ~]# ceph -s
> 2019-01-04 15:18:53.398950 7fc372c94700 -1 asok(0x7fc36c0017a0) 
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
> bind the UNIX domain socket to 
> '/var/run/ceph-guests/ceph-client.admin.130425.140477307296080.asok': (2) No 
> such file or directory
>   cluster:
> id: adb9ad8e-f458-4124-bf58-7963a8d1391f
> health: HEALTH_ERR
> 3 pools have many more objects per pg than average
> 523656/12393978 objects misplaced (4.225%)
> 6523 PGs pending on creation
> Reduced data availability: 6584 pgs inactive, 1267 pgs down, 2 
> pgs peering, 2696 pgs stale
> Degraded data redundancy: 86858/12393978 objects degraded 
> (0.701%), 717 pgs degraded, 21 pgs undersized
> 107622 slow requests are blocked > 32 sec
> 4957 stuck requests are blocked > 4096 sec
> too many PGs per OSD (3003 > max 200)
>
>   services:
> mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
> mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
> osd: 39 osds: 39 up, 36 in; 85 remapped pgs
> rgw: 1 daemon active
>
>   data:
> pools:   18 pools, 54656 pgs
> objects: 6051k objects, 10947 GB
> usage:   21971 GB used, 50650 GB / 72622 GB avail
> pgs: 0.002% pgs unknown
>  12.046% pgs not active
>  86858/12393978 objects degraded (0.701%)
>  523656/12393978 objects misplaced (4.225%)
>  46743 active+clean
>  4342  activating
>  1317  stale+active+clean
>  1151  stale+down
>  667   activating+degraded
>  159   stale+activating
>      116   down
>  77activating+remapped
>  34stale+activating+degraded
>  21stale+activating+remapped
>  9 stale+active+undersiz

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
I don't think this will help you. "Unfound" means the cluster is unable
to find the data anywhere (it is lost).
It would be sufficient to shut down the new host - its OSDs will then be out.
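
Instead of powering the host off, you could also mark its OSDs out by hand
(use the ids of the OSDs on the new server):

ceph osd out <osd-id> [<osd-id> ...]

The effect on data placement is the same.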

You can also force-heal the cluster, something like "do your best possible":

ceph pg 2.5 mark_unfound_lost revert|delete

Src: http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/

Kevin

On Fri, Jan 4, 2019 at 20:47, Arun POONIA wrote:
>
> Hi Kevin,
>
> Can I remove newly added server from Cluster and see if it heals cluster ?
>
> When I check Hard Disk Iops on new server which are very low compared to 
> existing cluster server.
>
> Indeed this is a critical cluster but I don't have expertise to make it 
> flawless.
>
> Thanks
> Arun
>
> On Fri, Jan 4, 2019 at 11:35 AM Kevin Olbrich  wrote:
>>
>> If you really created and destroyed OSDs before the cluster healed
>> itself, this data will be permanently lost (not found / inactive).
>> Also, your PG count is so heavily oversized that the calculation for peering
>> will most likely break, because this was never tested.
>>
>> If this is a critical cluster, I would start a new one and bring back
>> the backups (using a better PG count).
>>
>> Kevin
>>
>> On Fri, Jan 4, 2019 at 20:25, Arun POONIA wrote:
>> >
>> > Can anyone comment on this issue please, I can't seem to bring my cluster 
>> > healthy.
>> >
>> > On Fri, Jan 4, 2019 at 6:26 AM Arun POONIA  
>> > wrote:
>> >>
>> >> Hi Caspar,
>> >>
>> >> Number of IOPs are also quite low. It used be around 1K Plus on one of 
>> >> Pool (VMs) now its like close to 10-30 .
>> >>
>> >> Thansk
>> >> Arun
>> >>
>> >> On Fri, Jan 4, 2019 at 5:41 AM Arun POONIA 
>> >>  wrote:
>> >>>
>> >>> Hi Caspar,
>> >>>
>> >>> Yes and No, numbers are going up and down. If I run ceph -s command I 
>> >>> can see it decreases one time and later it increases again. I see there 
>> >>> are so many blocked/slow requests. Almost all the OSDs have slow 
>> >>> requests. Around 12% PGs are inactive not sure how to activate them 
>> >>> again.
>> >>>
>> >>>
>> >>> [root@fre101 ~]# ceph health detail
>> >>> 2019-01-04 05:39:23.860142 7fc37a3a0700 -1 asok(0x7fc3740017a0) 
>> >>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed 
>> >>> to bind the UNIX domain socket to 
>> >>> '/var/run/ceph-guests/ceph-client.admin.1066526.140477441513808.asok': 
>> >>> (2) No such file or directory
>> >>> HEALTH_ERR 1 osds down; 3 pools have many more objects per pg than 
>> >>> average; 472812/12392654 objects misplaced (3.815%); 3610 PGs pending on 
>> >>> creation; Reduced data availability: 6578 pgs inactive, 1882 pgs down, 
>> >>> 86 pgs peering, 850 pgs stale; Degraded data redundancy: 216694/12392654 
>> >>> objects degraded (1.749%), 866 pgs degraded, 16 pgs undersized; 116082 
>> >>> slow requests are blocked > 32 sec; 551 stuck requests are blocked > 
>> >>> 4096 sec; too many PGs per OSD (2709 > max 200)
>> >>> OSD_DOWN 1 osds down
>> >>> osd.28 (root=default,host=fre119) is down
>> >>> MANY_OBJECTS_PER_PG 3 pools have many more objects per pg than average
>> >>> pool glance-images objects per pg (10478) is more than 92.7257 times 
>> >>> cluster average (113)
>> >>> pool vms objects per pg (4717) is more than 41.7434 times cluster 
>> >>> average (113)
>> >>> pool volumes objects per pg (1220) is more than 10.7965 times 
>> >>> cluster average (113)
>> >>> OBJECT_MISPLACED 472812/12392654 objects misplaced (3.815%)
>> >>> PENDING_CREATING_PGS 3610 PGs pending on creation
>> >>> osds 
>> >>> [osd.0,osd.1,osd.10,osd.11,osd.14,osd.15,osd.17,osd.18,osd.19,osd.20,osd.21,osd.22,osd.23,osd.25,osd.26,osd.27,osd.28,osd.3,osd.30,osd.32,osd.33,osd.35,osd.36,osd.37,osd.38,osd.4,osd.5,osd.6,osd.7,osd.9]
>> >>>  have pending PGs.
>> >>> PG_AVAILABILITY Reduced data availability: 6578 pgs inactive, 1882 pgs 
>> >>> down, 86 pgs peering, 850 pgs stale
>> >>> pg 10.900 is down, acting [18]
>> >>> pg 10.90e is stuck inactive for 60266.030164, current state 
>> >&g

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
If you really created and destroyed OSDs before the cluster healed
itself, this data will be permanently lost (not found / inactive).
Also, your PG count is so heavily oversized that the calculation for peering
will most likely break, because this was never tested.

If this is a critical cluster, I would start a new one and bring back
the backups (using a better PG count).

Kevin

On Fri, Jan 4, 2019 at 20:25, Arun POONIA wrote:
>
> Can anyone comment on this issue please, I can't seem to bring my cluster 
> healthy.
>
> On Fri, Jan 4, 2019 at 6:26 AM Arun POONIA  
> wrote:
>>
>> Hi Caspar,
>>
>> Number of IOPs are also quite low. It used be around 1K Plus on one of Pool 
>> (VMs) now its like close to 10-30 .
>>
>> Thansk
>> Arun
>>
>> On Fri, Jan 4, 2019 at 5:41 AM Arun POONIA  
>> wrote:
>>>
>>> Hi Caspar,
>>>
>>> Yes and No, numbers are going up and down. If I run ceph -s command I can 
>>> see it decreases one time and later it increases again. I see there are so 
>>> many blocked/slow requests. Almost all the OSDs have slow requests. Around 
>>> 12% PGs are inactive not sure how to activate them again.
>>>
>>>
>>> [root@fre101 ~]# ceph health detail
>>> 2019-01-04 05:39:23.860142 7fc37a3a0700 -1 asok(0x7fc3740017a0) 
>>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
>>> bind the UNIX domain socket to 
>>> '/var/run/ceph-guests/ceph-client.admin.1066526.140477441513808.asok': (2) 
>>> No such file or directory
>>> HEALTH_ERR 1 osds down; 3 pools have many more objects per pg than average; 
>>> 472812/12392654 objects misplaced (3.815%); 3610 PGs pending on creation; 
>>> Reduced data availability: 6578 pgs inactive, 1882 pgs down, 86 pgs 
>>> peering, 850 pgs stale; Degraded data redundancy: 216694/12392654 objects 
>>> degraded (1.749%), 866 pgs degraded, 16 pgs undersized; 116082 slow 
>>> requests are blocked > 32 sec; 551 stuck requests are blocked > 4096 sec; 
>>> too many PGs per OSD (2709 > max 200)
>>> OSD_DOWN 1 osds down
>>> osd.28 (root=default,host=fre119) is down
>>> MANY_OBJECTS_PER_PG 3 pools have many more objects per pg than average
>>> pool glance-images objects per pg (10478) is more than 92.7257 times 
>>> cluster average (113)
>>> pool vms objects per pg (4717) is more than 41.7434 times cluster 
>>> average (113)
>>> pool volumes objects per pg (1220) is more than 10.7965 times cluster 
>>> average (113)
>>> OBJECT_MISPLACED 472812/12392654 objects misplaced (3.815%)
>>> PENDING_CREATING_PGS 3610 PGs pending on creation
>>> osds 
>>> [osd.0,osd.1,osd.10,osd.11,osd.14,osd.15,osd.17,osd.18,osd.19,osd.20,osd.21,osd.22,osd.23,osd.25,osd.26,osd.27,osd.28,osd.3,osd.30,osd.32,osd.33,osd.35,osd.36,osd.37,osd.38,osd.4,osd.5,osd.6,osd.7,osd.9]
>>>  have pending PGs.
>>> PG_AVAILABILITY Reduced data availability: 6578 pgs inactive, 1882 pgs 
>>> down, 86 pgs peering, 850 pgs stale
>>> pg 10.900 is down, acting [18]
>>> pg 10.90e is stuck inactive for 60266.030164, current state activating, 
>>> last acting [2,38]
>>> pg 10.913 is stuck stale for 1887.552862, current state stale+down, 
>>> last acting [9]
>>> pg 10.915 is stuck inactive for 60266.215231, current state activating, 
>>> last acting [30,38]
>>> pg 11.903 is stuck inactive for 59294.465961, current state activating, 
>>> last acting [11,38]
>>> pg 11.910 is down, acting [21]
>>> pg 11.919 is down, acting [25]
>>> pg 12.902 is stuck inactive for 57118.544590, current state activating, 
>>> last acting [36,14]
>>> pg 13.8f8 is stuck inactive for 60707.167787, current state activating, 
>>> last acting [29,37]
>>> pg 13.901 is stuck stale for 60226.543289, current state 
>>> stale+active+clean, last acting [1,31]
>>> pg 13.905 is stuck inactive for 60266.050940, current state activating, 
>>> last acting [2,36]
>>> pg 13.909 is stuck inactive for 60707.160714, current state activating, 
>>> last acting [34,36]
>>> pg 13.90e is stuck inactive for 60707.410749, current state activating, 
>>> last acting [21,36]
>>> pg 13.911 is down, acting [25]
>>> pg 13.914 is stale+down, acting [29]
>>> pg 13.917 is stuck stale for 580.224688, current state stale+down, last 
>>> acting [16]
>>> pg 14.901 is stuck inactive for 60266.037762, current state 
>>> activating+degraded, last acting [22,37]
>>> pg 14.90f is stuck inactive for 60296.996447, current state activating, 
>>> last acting [30,36]
>>> pg 14.910 is stuck inactive for 60266.077310, current state 
>>> activating+degraded, last acting [17,37]
>>> pg 14.915 is stuck inactive for 60266.032445, current state activating, 
>>> last acting [34,36]
>>> pg 15.8fa is stuck stale for 560.223249, current state stale+down, last 
>>> acting [8]
>>> pg 15.90c is stuck inactive for 59294.402388, current state activating, 
>>> last acting [29,38]
>>> pg 15.90d is stuck inactive for 60266.176492, current state activating, 
>>> last acting [5,36]
>>> pg 15.915 

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
PS: Could be http://tracker.ceph.com/issues/36361
There is one HDD OSD that is out (which will not be replaced because
the SSD pool will get the images and the hdd pool will be deleted).

Kevin

On Fri, Jan 4, 2019 at 19:46, Kevin Olbrich wrote:
>
> Hi!
>
> I did what you wrote but my MGRs started to crash again:
> root@adminnode:~# ceph -s
>   cluster:
> id: 086d9f80-6249-4594-92d0-e31b6a9c
> health: HEALTH_WARN
> no active mgr
> 105498/6277782 objects misplaced (1.680%)
>
>   services:
> mon: 3 daemons, quorum mon01,mon02,mon03
> mgr: no daemons active
> osd: 44 osds: 43 up, 43 in
>
>   data:
> pools:   4 pools, 1616 pgs
> objects: 1.88M objects, 7.07TiB
> usage:   13.2TiB used, 16.7TiB / 29.9TiB avail
> pgs: 105498/6277782 objects misplaced (1.680%)
>  1606 active+clean
>  8active+remapped+backfill_wait
>  2active+remapped+backfilling
>
>   io:
> client:   5.51MiB/s rd, 3.38MiB/s wr, 33op/s rd, 317op/s wr
> recovery: 60.3MiB/s, 15objects/s
>
>
> MON 1 log:
>-13> 2019-01-04 14:05:04.432186 7fec56a93700  4 mgr ms_dispatch
> active mgrdigest v1
>-12> 2019-01-04 14:05:04.432194 7fec56a93700  4 mgr ms_dispatch mgrdigest 
> v1
>-11> 2019-01-04 14:05:04.822041 7fec434e1700  4 mgr[balancer]
> Optimize plan auto_2019-01-04_14:05:04
>-10> 2019-01-04 14:05:04.822170 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/mode
> -9> 2019-01-04 14:05:04.822231 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/max_misplaced
> -8> 2019-01-04 14:05:04.822268 7fec434e1700  4 ceph_config_get
> max_misplaced not found
> -7> 2019-01-04 14:05:04.822444 7fec434e1700  4 mgr[balancer] Mode
> upmap, max misplaced 0.05
> -6> 2019-01-04 14:05:04.822849 7fec434e1700  4 mgr[balancer] do_upmap
> -5> 2019-01-04 14:05:04.822923 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/upmap_max_iterations
> -4> 2019-01-04 14:05:04.822964 7fec434e1700  4 ceph_config_get
> upmap_max_iterations not found
> -3> 2019-01-04 14:05:04.823013 7fec434e1700  4 mgr get_config
> get_configkey: mgr/balancer/upmap_max_deviation
> -2> 2019-01-04 14:05:04.823048 7fec434e1700  4 ceph_config_get
> upmap_max_deviation not found
> -1> 2019-01-04 14:05:04.823265 7fec434e1700  4 mgr[balancer] pools
> ['rbd_vms_hdd', 'rbd_vms_ssd', 'rbd_vms_ssd_01', 'rbd_vms_ssd_01_ec']
>  0> 2019-01-04 14:05:04.836124 7fec434e1700 -1
> /build/ceph-12.2.8/src/osd/OSDMap.cc: In function 'int
> OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set<long int>&, OSDMap::Incremental*)' thread 7fec434e1700 time 2019-01-04
> 14:05:04.832885
> /build/ceph-12.2.8/src/osd/OSDMap.cc: 4102: FAILED assert(target > 0)
>
>  ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x558c3c0bb572]
>  2: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long, std::less<long>, std::allocator<long> > const&,
> OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
>  3: (()+0x2f3020) [0x558c3bf5d020]
>  4: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
>  5: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  6: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
>  7: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  8: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
>  9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
>  10: (()+0x13e370) [0x7fec5e8be370]
>  11: (PyObject_Call()+0x43) [0x7fec5e891273]
>  12: (()+0x1853ac) [0x7fec5e9053ac]
>  13: (PyObject_Call()+0x43) [0x7fec5e891273]
>  14: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
>  15: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
>  16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
>  17: (()+0x76ba) [0x7fec5d74c6ba]
>  18: (clone()+0x6d) [0x7fec5c7b841d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 rbd_mirror
>0/ 5 rbd_replay
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>1/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>   

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
Hi!

I did what you wrote but my MGRs started to crash again:
root@adminnode:~# ceph -s
  cluster:
id: 086d9f80-6249-4594-92d0-e31b6a9c
health: HEALTH_WARN
no active mgr
105498/6277782 objects misplaced (1.680%)

  services:
mon: 3 daemons, quorum mon01,mon02,mon03
mgr: no daemons active
osd: 44 osds: 43 up, 43 in

  data:
pools:   4 pools, 1616 pgs
objects: 1.88M objects, 7.07TiB
usage:   13.2TiB used, 16.7TiB / 29.9TiB avail
pgs: 105498/6277782 objects misplaced (1.680%)
 1606 active+clean
 8active+remapped+backfill_wait
 2active+remapped+backfilling

  io:
client:   5.51MiB/s rd, 3.38MiB/s wr, 33op/s rd, 317op/s wr
recovery: 60.3MiB/s, 15objects/s


MON 1 log:
   -13> 2019-01-04 14:05:04.432186 7fec56a93700  4 mgr ms_dispatch
active mgrdigest v1
   -12> 2019-01-04 14:05:04.432194 7fec56a93700  4 mgr ms_dispatch mgrdigest v1
   -11> 2019-01-04 14:05:04.822041 7fec434e1700  4 mgr[balancer]
Optimize plan auto_2019-01-04_14:05:04
   -10> 2019-01-04 14:05:04.822170 7fec434e1700  4 mgr get_config
get_configkey: mgr/balancer/mode
-9> 2019-01-04 14:05:04.822231 7fec434e1700  4 mgr get_config
get_configkey: mgr/balancer/max_misplaced
-8> 2019-01-04 14:05:04.822268 7fec434e1700  4 ceph_config_get
max_misplaced not found
-7> 2019-01-04 14:05:04.822444 7fec434e1700  4 mgr[balancer] Mode
upmap, max misplaced 0.05
-6> 2019-01-04 14:05:04.822849 7fec434e1700  4 mgr[balancer] do_upmap
-5> 2019-01-04 14:05:04.822923 7fec434e1700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_iterations
-4> 2019-01-04 14:05:04.822964 7fec434e1700  4 ceph_config_get
upmap_max_iterations not found
-3> 2019-01-04 14:05:04.823013 7fec434e1700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_deviation
-2> 2019-01-04 14:05:04.823048 7fec434e1700  4 ceph_config_get
upmap_max_deviation not found
-1> 2019-01-04 14:05:04.823265 7fec434e1700  4 mgr[balancer] pools
['rbd_vms_hdd', 'rbd_vms_ssd', 'rbd_vms_ssd_01', 'rbd_vms_ssd_01_ec']
 0> 2019-01-04 14:05:04.836124 7fec434e1700 -1
/build/ceph-12.2.8/src/osd/OSDMap.cc: In function 'int
OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set<long int>&, OSDMap::Incremental*)' thread 7fec434e1700 time 2019-01-04
14:05:04.832885
/build/ceph-12.2.8/src/osd/OSDMap.cc: 4102: FAILED assert(target > 0)

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x558c3c0bb572]
 2: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long, std::less<long>, std::allocator<long> > const&,
OSDMap::Incremental*)+0x2801) [0x558c3c1c0ee1]
 3: (()+0x2f3020) [0x558c3bf5d020]
 4: (PyEval_EvalFrameEx()+0x8a51) [0x7fec5e832971]
 5: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 6: (PyEval_EvalFrameEx()+0x6ffd) [0x7fec5e830f1d]
 7: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 8: (PyEval_EvalFrameEx()+0x7124) [0x7fec5e831044]
 9: (PyEval_EvalCodeEx()+0x85c) [0x7fec5e96805c]
 10: (()+0x13e370) [0x7fec5e8be370]
 11: (PyObject_Call()+0x43) [0x7fec5e891273]
 12: (()+0x1853ac) [0x7fec5e9053ac]
 13: (PyObject_Call()+0x43) [0x7fec5e891273]
 14: (PyObject_CallMethod()+0xf4) [0x7fec5e892444]
 15: (PyModuleRunner::serve()+0x5c) [0x558c3bf5a18c]
 16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x1b8) [0x558c3bf5a998]
 17: (()+0x76ba) [0x7fec5d74c6ba]
 18: (clone()+0x6d) [0x7fec5c7b841d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log
--- end dump of recent events ---
2019-01-04 14:05:05.032479 7fec434e1700 -1 *** Caught signal (Aborted) **
 in thread 7fec434e1700 thread_name:balancer

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
luminous (stable)
 1: (()+0x4105b4) [0x558c3c07a5b4]
 2: (()+0x11390) [0x7fec5d756390]
 3: 

[ceph-users] TCP qdisc + congestion control / BBR

2019-01-02 Thread Kevin Olbrich
Hi!

I wonder if changing qdisc and congestion_control (for example fq with
Google BBR) on Ceph servers / clients has positive effects during high
load.
Google BBR: 
https://cloud.google.com/blog/products/gcp/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster

I am running a lot of VMs with BBR but the hypervisors run fq_codel +
cubic (OSDs run Ubuntu defaults).
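
For reference, a minimal sketch of how fq + BBR can be switched on (kernel
4.9+ with the tcp_bbr module assumed; values are illustrative, not a tested
Ceph tuning):

# load BBR and switch qdisc + congestion control
modprobe tcp_bbr
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
# verify
sysctl net.ipv4.tcp_congestion_control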

Did someone test qdisc and congestion control settings?

Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Usage of devices in SSD pool vary very much

2019-01-02 Thread Kevin Olbrich
Hi!

On a medium sized cluster with device-classes, I am experiencing a
problem with the SSD pool:

root@adminnode:~# ceph osd df | grep ssd
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 2   ssd 0.43700  1.0  447GiB  254GiB  193GiB 56.77 1.28  50
 3   ssd 0.43700  1.0  447GiB  208GiB  240GiB 46.41 1.04  58
 4   ssd 0.43700  1.0  447GiB  266GiB  181GiB 59.44 1.34  55
30   ssd 0.43660  1.0  447GiB  222GiB  225GiB 49.68 1.12  49
 6   ssd 0.43700  1.0  447GiB  238GiB  209GiB 53.28 1.20  59
 7   ssd 0.43700  1.0  447GiB  228GiB  220GiB 50.88 1.14  56
 8   ssd 0.43700  1.0  447GiB  269GiB  178GiB 60.16 1.35  57
31   ssd 0.43660  1.0  447GiB  231GiB  217GiB 51.58 1.16  56
34   ssd 0.43660  1.0  447GiB  186GiB  261GiB 41.65 0.94  49
36   ssd 0.87329  1.0  894GiB  364GiB  530GiB 40.68 0.92  91
37   ssd 0.87329  1.0  894GiB  321GiB  573GiB 35.95 0.81  78
42   ssd 0.87329  1.0  894GiB  375GiB  519GiB 41.91 0.94  92
43   ssd 0.87329  1.0  894GiB  438GiB  456GiB 49.00 1.10  92
13   ssd 0.43700  1.0  447GiB  249GiB  198GiB 55.78 1.25  72
14   ssd 0.43700  1.0  447GiB  290GiB  158GiB 64.76 1.46  71
15   ssd 0.43700  1.0  447GiB  368GiB 78.6GiB 82.41 1.85  78 <
16   ssd 0.43700  1.0  447GiB  253GiB  194GiB 56.66 1.27  70
19   ssd 0.43700  1.0  447GiB  269GiB  178GiB 60.21 1.35  70
20   ssd 0.43700  1.0  447GiB  312GiB  135GiB 69.81 1.57  77
21   ssd 0.43700  1.0  447GiB  312GiB  135GiB 69.77 1.57  77
22   ssd 0.43700  1.0  447GiB  269GiB  178GiB 60.10 1.35  67
38   ssd 0.43660  1.0  447GiB  153GiB  295GiB 34.11 0.77  46
39   ssd 0.43660  1.0  447GiB  127GiB  320GiB 28.37 0.64  38
40   ssd 0.87329  1.0  894GiB  386GiB  508GiB 43.17 0.97  97
41   ssd 0.87329  1.0  894GiB  375GiB  520GiB 41.88 0.94 113

This leaves just 1.2TB of free space (only a few GB away from the NEAR_FULL threshold).
Currently, the balancer plugin is off because it immediately crashed
the MGR in the past (on 12.2.5).
Since then I upgraded to 12.2.8 but did not re-enable the balancer. [I
am unable to find the bugtracker ID]

Would the balancer plugin correct this situation?
What happens if all MGRs die like they did on 12.2.5 because of the plugin?
Will the balancer take data from the most-unbalanced OSDs first?
Otherwise an OSD may fill up beyond FULL, which would cause the
whole pool to freeze (because the smallest OSD is taken into account
for the free space calculation).
This would be the worst case as over 100 VMs would freeze, causing a lot
of trouble. This is also the reason I did not try to enable the
balancer again.
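
If I ever try it again, my plan would be a dry run first, roughly like this
(command names as I understand them from the Luminous docs, please
double-check before executing anything on a production cluster):

# upmap mode needs luminous-or-newer clients
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
# limit how much data may be misplaced at once (config-key read by the module)
ceph config-key set mgr/balancer/max_misplaced 0.01
# build and inspect a plan without executing it
ceph balancer eval
ceph balancer optimize myplan
ceph balancer show myplan
ceph balancer eval myplan
# only after reviewing the plan: ceph balancer execute myplan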

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
> > Assuming everything is on LVM including the root filesystem, only moving
> > the boot partition will have to be done outside of LVM.
>
> Since the OP mentioned MS Exchange, I assume the VM is running windows.
> You can do the same LVM-like trick in Windows Server via Disk Manager
> though; add the new ceph RBD disk to the existing data volume as a
> mirror; wait for it to sync, then break the mirror and remove the
> original disk.

Mirrors only work on dynamic disks, which are a pain to revert and
cause lots of problems with backup solutions.
I will keep this in mind as this is still better than shutting down
the whole VM.

@all
Thank you very much for your inputs. I will try some less important
VMs and then start migration of the big one.

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
Hi!

Currently I am planning the migration of a large VM (MS Exchange, 300 mailboxes
and a 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The server has access to both local and cluster storage; I only need
to live-migrate the storage, not the machine.

I have never used live migration as it can cause more issues, and the
VMs that were already migrated had planned downtime.
Taking the VM offline and converting/importing using qemu-img would take
some hours, but I would like to keep serving clients, even if it is
slower.
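
In case it helps: a rough, untested sketch of how a storage-only live
migration could look with libvirt's blockcopy (domain name, device and the
destination XML are made up for illustration; RBD auth/secret handling is
omitted):

# rbd-dest.xml describes the copy target, e.g.:
#   <disk type='network' device='disk'>
#     <driver name='qemu' type='raw'/>
#     <source protocol='rbd' name='rbd_vms_ssd_01/exchange-disk0'>
#       <host name='mon01' port='6789'/>
#     </source>
#   </disk>
virsh blockcopy exchange-vm vda --xml rbd-dest.xml --wait --verbose --pivot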

The VM is I/O-heavy in terms of the old storage (LSI/Adaptec with
BBU). There are two HDDs bound as RAID1 which are constantly under 30%
- 60% load (this goes up to 100% during reboot, updates or login
prime-time).

What happens when either the local compute node or the ceph cluster
fails (degraded)? Or the network is unavailable?
Are all writes performed to both locations? Is this fail-safe? Or does
the VM crash in the worst case, which can lead to a dirty shutdown for MS-EX
DBs?

The node currently has 4GB free RAM and 29GB listed as cache /
available. These numbers should be taken with caution because we have "tuned"
enabled, which causes de-duplication of RAM, and this host runs about 10 Windows
VMs.
During reboots or updates, RAM can get full again.

Maybe I am too cautious about live storage migration, maybe I am not.

What are your experiences or advices?

Thank you very much!

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-11-15 Thread Kevin Olbrich
I now had the time to test and after installing this package, uploads to
rbd are working perfectly.
Thank you very much for sharing this!

Kevin

Am Mi., 7. Nov. 2018 um 15:36 Uhr schrieb Kevin Olbrich :

> Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard <
> nhuill...@dolomede.fr>:
>
>>
>> > It lists rbd but still fails with the exact same error.
>>
>> I stumbled upon the exact same error, and since there was no answer
>> anywhere, I figured it was a very simple problem: don't forget to
>> install the qemu-block-extra package (Debian stretch) along with qemu-
>> utils which contains the qemu-img command.
>> This command is actually compiled with rbd support (hence the output
>> above), but need this extra package to pull actual support-code and
>> dependencies...
>>
>
> I have not been able to test this yet but this package was indeed missing
> on my system!
> Thank you for this hint!
>
>
>> --
>> Nicolas Huillard
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Kevin Olbrich
I read the whole thread and it looks like the write cache should always be
disabled, as in the worst case the performance is the same(?).
This is based on this discussion.
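
For the record, a minimal sketch of how I would toggle the volatile write
cache on a single test drive (sdX is a placeholder; the setting is not
guaranteed to persist across reboots):

# query the current setting, then disable the on-disk write cache
hdparm -W /dev/sdX
hdparm -W 0 /dev/sdX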

I will test some WD4002FYYZ which don't mention "media cache".

Kevin

Am Di., 13. Nov. 2018 um 09:27 Uhr schrieb Виталий Филиппов <
vita...@yourcmc.ru>:

> This may be the explanation:
>
>
> https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and
>
> Other manufacturers may have started to do the same, I suppose.
> --
> With best regards,
> Vitaliy Filippov___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
Hi Dan,

ZFS without sync would be very much like ext2/ext4 without journals
or XFS with barriers disabled.
The ARC cache in ZFS is awesome, but disabling sync on ZFS is a very high
risk (using ext4 with kvm cache mode "unsafe" would be similar, I think).

Also, ZFS only works as expected with the scheduler set to noop, as it is
optimized to consume whole, non-shared devices.
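
As a quick illustration (sdX is a placeholder for a whole device given to
ZFS; on blk-mq kernels the value is "none" instead of "noop"):

cat /sys/block/sdX/queue/scheduler
echo noop > /sys/block/sdX/queue/scheduler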

Just my 2 cents ;-)

Kevin


Am Mo., 12. Nov. 2018 um 15:08 Uhr schrieb Dan van der Ster <
d...@vanderster.com>:

> We've done ZFS on RBD in a VM, exported via NFS, for a couple years.
> It's very stable and if your use-case permits you can set zfs
> sync=disabled to get very fast write performance that's tough to beat.
>
> But if you're building something new today and have *only* the NAS
> use-case then it would make better sense to try CephFS first and see
> if it works for you.
>
> -- Dan
>
> On Mon, Nov 12, 2018 at 3:01 PM Kevin Olbrich  wrote:
> >
> > Hi!
> >
> > ZFS won't play nice on ceph. Best would be to mount CephFS directly with
> the ceph-fuse driver on the endpoint.
> > If you definitely want to put a storage gateway between the data and the
> compute nodes, then go with nfs-ganesha which can export CephFS directly
> without local ("proxy") mount.
> >
> > I had such a setup with nfs and switched to mount CephFS directly. If
> using NFS with the same data, you must make sure your HA works well to
> avoid data corruption.
> > With ceph-fuse you directly connect to the cluster, one component less
> that breaks.
> >
> > Kevin
> >
> > Am Mo., 12. Nov. 2018 um 12:44 Uhr schrieb Premysl Kouril <
> premysl.kou...@gmail.com>:
> >>
> >> Hi,
> >>
> >>
> >> We are planning to build NAS solution which will be primarily used via
> NFS and CIFS and workloads ranging from various archival application to
> more “real-time processing”. The NAS will not be used as a block storage
> for virtual machines, so the access really will always be file oriented.
> >>
> >>
> >> We are considering primarily two designs and I’d like to kindly ask for
> any thoughts, views, insights, experiences.
> >>
> >>
> >> Both designs utilize “distributed storage software at some level”. Both
> designs would be built from commodity servers and should scale as we grow.
> Both designs involve virtualization for instantiating "access virtual
> machines" which will be serving the NFS and CIFS protocol - so in this
> sense the access layer is decoupled from the data layer itself.
> >>
> >>
> >> First design is based on a distributed filesystem like Gluster or
> CephFS. We would deploy this software on those commodity servers and mount
> the resultant filesystem on the “access virtual machines” and they would be
> serving the mounted filesystem via NFS/CIFS.
> >>
> >>
> >> Second design is based on distributed block storage using CEPH. So we
> would build distributed block storage on those commodity servers, and then,
> via virtualization (like OpenStack Cinder) we would allocate the block
> storage into the access VM. Inside the access VM we would deploy ZFS which
> would aggregate block storage into a single filesystem. And this filesystem
> would be served via NFS/CIFS from the very same VM.
> >>
> >>
> >> Any advices and insights highly appreciated
> >>
> >>
> >> Cheers,
> >>
> >> Prema
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
Hi!

ZFS won't play nice on ceph. Best would be to mount CephFS directly with
the ceph-fuse driver on the endpoint.
If you definitely want to put a storage gateway between the data and the
compute nodes, then go with nfs-ganesha which can export CephFS directly
without local ("proxy") mount.

I had such a setup with NFS and switched to mounting CephFS directly. If using
NFS with the same data, you must make sure your HA works well to avoid data
corruption.
With ceph-fuse you connect directly to the cluster: one component less that
can break.

Kevin

Am Mo., 12. Nov. 2018 um 12:44 Uhr schrieb Premysl Kouril <
premysl.kou...@gmail.com>:

> Hi,
>
> We are planning to build NAS solution which will be primarily used via NFS
> and CIFS and workloads ranging from various archival application to more
> “real-time processing”. The NAS will not be used as a block storage for
> virtual machines, so the access really will always be file oriented.
>
> We are considering primarily two designs and I’d like to kindly ask for
> any thoughts, views, insights, experiences.
>
> Both designs utilize “distributed storage software at some level”. Both
> designs would be built from commodity servers and should scale as we grow.
> Both designs involve virtualization for instantiating "access virtual
> machines" which will be serving the NFS and CIFS protocol - so in this
> sense the access layer is decoupled from the data layer itself.
>
> First design is based on a distributed filesystem like Gluster or CephFS.
> We would deploy this software on those commodity servers and mount the
> resultant filesystem on the “access virtual machines” and they would be
> serving the mounted filesystem via NFS/CIFS.
>
> Second design is based on distributed block storage using CEPH. So we
> would build distributed block storage on those commodity servers, and then,
> via virtualization (like OpenStack Cinder) we would allocate the block
> storage into the access VM. Inside the access VM we would deploy ZFS which
> would aggregate block storage into a single filesystem. And this filesystem
> would be served via NFS/CIFS from the very same VM.
>
>
> Any advices and insights highly appreciated
>
>
> Cheers,
>
> Prema
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Kevin Olbrich
Am Mi., 7. Nov. 2018 um 16:40 Uhr schrieb Gregory Farnum :

> On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside 
> wrote:
>
>>
>>
>> On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> >> I wonder if there is any release announcement for ceph 12.2.9 that I
>> missed.
>> >> I just found the new packages on download.ceph.com, is this an
>> official
>> >> release?
>> >
>> > This is because 12.2.9 have a several bugs. You should avoid to use
>> this
>> > release and wait for 12.2.10
>>
>> Argh! What's it doing in the repos then?? I've just upgraded to it!
>> What are the bugs? Is there a thread about them?
>
>
> If you’ve already upgraded and have no issues then you won’t have any
> trouble going forward — except perhaps on the next upgrade, if you do it
> while the cluster is unhealthy.
>
> I agree that it’s annoying when these issues make it out. We’ve had
> ongoing discussions to try and improve the release process so it’s less
> drawn-out and to prevent these upgrade issues from making it through
> testing, but nobody has resolved it yet. If anybody has experience working
> with deb repositories and handling releases, the Ceph upstream could use
> some help... ;)
> -Greg
>
>>
>>
We solve this problem by hosting two repos: one for staging and QA and one
for production.
Every release goes to staging first (for example directly after building an SCM
tag).

If QA passes, the staging repo is turned into the prod one.
Using symlinks, it would be possible to switch back if problems occur.
Example: https://incoming.debian.org/

Currently I would be unable to deploy new nodes if I use the official
mirrors, as apt is unable to use older versions (which does work on yum/dnf).
That's why we are implementing "mirror-sync" / rsync with a copy of the repo
and the desired packages until such a solution is available.
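
A rough sketch of what I mean (paths and the rsync source are assumptions on
my side, not a tested setup):

# pull a snapshot of the release we validated
rsync -av rsync://download.ceph.com/ceph/debian-luminous/ /srv/mirror/ceph-12.2.8/
# staging points at the new snapshot; prod is only flipped after QA passed
ln -sfn /srv/mirror/ceph-12.2.8 /srv/mirror/staging
ln -sfn /srv/mirror/ceph-12.2.8 /srv/mirror/prod   # or back to the old snapshot to roll back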

Kevin


>> Simon
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-11-07 Thread Kevin Olbrich
Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard <
nhuill...@dolomede.fr>:

>
> > It lists rbd but still fails with the exact same error.
>
> I stumbled upon the exact same error, and since there was no answer
> anywhere, I figured it was a very simple problem: don't forget to
> install the qemu-block-extra package (Debian stretch) along with qemu-
> utils which contains the qemu-img command.
> This command is actually compiled with rbd support (hence the output
> above), but need this extra package to pull actual support-code and
> dependencies...
>

I have not been able to test this yet but this package was indeed missing
on my system!
Thank you for this hint!


> --
> Nicolas Huillard
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy osd creation failed with multipath and dmcrypt

2018-11-06 Thread Kevin Olbrich
I met the same problem. I had to create a GPT table for each disk, create a
first partition over the full space and then feed these to ceph-volume (this
should be similar for ceph-deploy).
Also I am not sure if you can combine fs-type btrfs with bluestore (afaik
that option is for filestore).
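
Roughly what I did, as an untested-as-written sketch (how the partition of a
multipath device is named depends on the multipath/kpartx setup, so treat the
device names as placeholders):

sgdisk --zap-all /dev/mapper/mpathr
sgdisk --largest-new=1 /dev/mapper/mpathr      # one partition over the full space
kpartx -a /dev/mapper/mpathr                   # create the partition mapping
ceph-volume lvm create --bluestore --dmcrypt --data /dev/mapper/mpathr1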

Kevin


Am Di., 6. Nov. 2018 um 14:41 Uhr schrieb Pavan, Krish <
krish.pa...@nuance.com>:

> Trying to created OSD with multipath with dmcrypt and it failed . Any
> suggestion please?.
>
> ceph-deploy --overwrite-conf osd create ceph-store1:/dev/mapper/mpathr
> --bluestore --dmcrypt  -- failed
>
> ceph-deploy --overwrite-conf osd create ceph-store1:/dev/mapper/mpathr
> --bluestore – worked
>
>
>
> the logs for fail
>
> [ceph-store12][WARNIN] command: Running command: /usr/sbin/restorecon -R
> /var/lib/ceph/osd-lockbox/e15f1adc-feff-4890-a617-adc473e7331e/magic.68428.tmp
>
> [ceph-store12][WARNIN] command: Running command: /usr/bin/chown -R
> ceph:ceph
> /var/lib/ceph/osd-lockbox/e15f1adc-feff-4890-a617-adc473e7331e/magic.68428.tmp
>
> [ceph-store12][WARNIN] Traceback (most recent call last):
>
> [ceph-store12][WARNIN]   File "/usr/sbin/ceph-disk", line 9, in 
>
> [ceph-store12][WARNIN] load_entry_point('ceph-disk==1.0.0',
> 'console_scripts', 'ceph-disk')()
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5736, in run
>
> [ceph-store12][WARNIN] main(sys.argv[1:])
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5687, in main
>
> [ceph-store12][WARNIN] args.func(args)
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2108, in main
>
> [ceph-store12][WARNIN] Prepare.factory(args).prepare()
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2097, in prepare
>
> [ceph-store12][WARNIN] self._prepare()
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2171, in _prepare
>
> [ceph-store12][WARNIN] self.lockbox.prepare()
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2861, in prepare
>
> [ceph-store12][WARNIN] self.populate()
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2818, in populate
>
> [ceph-store12][WARNIN] get_partition_base(self.partition.get_dev()),
>
> [ceph-store12][WARNIN]   File
> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 844, in
> get_partition_base
>
> [ceph-store12][WARNIN] raise Error('not a partition', dev)
>
> [ceph-store12][WARNIN] ceph_disk.main.Error: Error: not a partition:
> /dev/dm-215
>
> [ceph-store12][ERROR ] RuntimeError: command returned non-zero exit
> status: 1
>
> [ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-disk
> -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --bluestore
> --cluster ceph --fs-type btrfs -- /dev/mapper/mpathr
>
> [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Hi!

Proxmox has support for rbd as they ship additional packages as well as
ceph via their own repo.

I ran your command and got this:

> qemu-img version 2.8.1(Debian 1:2.8+dfsg-6+deb9u4)
> Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
> Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
> ftps gluster host_cdrom host_device http https iscsi iser luks nbd nfs
> null-aio null-co parallels qcow qcow2 qed quorum raw rbd replication
> sheepdog ssh vdi vhdx vmdk vpc vvfat


It lists rbd but still fails with the exact same error.

Kevin


Am Di., 30. Okt. 2018 um 17:14 Uhr schrieb David Turner <
drakonst...@gmail.com>:

> What version of qemu-img are you using?  I found [1] this when poking
> around on my qemu server when checking for rbd support.  This version (note
> it's proxmox) has rbd listed as a supported format.
>
> [1]
> # qemu-img -V; qemu-img --help|grep rbd
> qemu-img version 2.11.2pve-qemu-kvm_2.11.2-1
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> Supported formats: blkdebug blkreplay blkverify bochs cloop dmg file ftp
> ftps gluster host_cdrom host_device http https iscsi iser luks nbd null-aio
> null-co parallels qcow qcow2 qed quorum raw rbd replication sheepdog
> throttle vdi vhdx vmdk vpc vvfat zeroinit
> On Tue, Oct 30, 2018 at 12:08 PM Kevin Olbrich  wrote:
>
>> Is it possible to use qemu-img with rbd support on Debian Stretch?
>> I am on Luminous and try to connect my image-buildserver to load images
>> into a ceph pool.
>>
>> root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
>>> rbd:rbd_vms_ssd_01/test_vm
>>> qemu-img: Unknown protocol 'rbd'
>>
>>
>> Kevin
>>
>> Am Mo., 3. Sep. 2018 um 12:07 Uhr schrieb Abhishek Lekshmanan <
>> abhis...@suse.com>:
>>
>>> arad...@tma-0.net writes:
>>>
>>> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain
>>> packages for
>>> > Debian? I'm not seeing any, but maybe I'm missing something...
>>> >
>>> > I'm seeing ceph-deploy install an older version of ceph on the nodes
>>> (from the
>>> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
>>> ceph-
>>> > volume doesn't exist on the nodes.
>>> >
>>> The newer versions of Ceph (from mimic onwards) requires compiler
>>> toolchains supporting c++17 which we unfortunately do not have for
>>> stretch/jessie yet.
>>>
>>> -
>>> Abhishek
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Is it possible to use qemu-img with rbd support on Debian Stretch?
I am on Luminous and am trying to connect my image-buildserver to load images
into a ceph pool.

root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2
> rbd:rbd_vms_ssd_01/test_vm
> qemu-img: Unknown protocol 'rbd'


Kevin

Am Mo., 3. Sep. 2018 um 12:07 Uhr schrieb Abhishek Lekshmanan <
abhis...@suse.com>:

> arad...@tma-0.net writes:
>
> > Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages
> for
> > Debian? I'm not seeing any, but maybe I'm missing something...
> >
> > I'm seeing ceph-deploy install an older version of ceph on the nodes
> (from the
> > Debian repo) and then failing when I run "ceph-deploy osd ..." because
> ceph-
> > volume doesn't exist on the nodes.
> >
> The newer versions of Ceph (from mimic onwards) requires compiler
> toolchains supporting c++17 which we unfortunately do not have for
> stretch/jessie yet.
>
> -
> Abhishek
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Command to check last change to rbd image?

2018-10-28 Thread Kevin Olbrich
Hi!

Is there an easy way to check when an image was last modified?
I want to make sure that the images I want to clean up have not been used for
a long time.

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] nfs-ganesha version in Ceph repos

2018-10-09 Thread Kevin Olbrich
I had a similar problem:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/029698.html

But even the recent 2.6.x releases were not working well for me (many, many
segfaults). I am on the master branch (2.7.x) and that works well with fewer
crashes.
The cluster is 13.2.1/.2 with nfs-ganesha as a standalone VM.

Kevin


Am Di., 9. Okt. 2018 um 19:39 Uhr schrieb Erik McCormick <
emccorm...@cirrusseven.com>:

> On Tue, Oct 9, 2018 at 1:27 PM Erik McCormick
>  wrote:
> >
> > Hello,
> >
> > I'm trying to set up an nfs-ganesha server with the Ceph FSAL, and
> > running into difficulties getting the current stable release running.
> > The versions in the Luminous repo is stuck at 2.6.1, whereas the
> > current stable version is 2.6.3. I've seen a couple of HA issues in
> > pre 2.6.3 versions that I'd like to avoid.
> >
>
> I should have been more specific that the ones I am looking for are for
> Centos 7
>
> > I've also been attempting to build my own from source, but banging my
> > head against a wall as far as dependencies and config options are
> > concerned.
> >
> > If anyone reading this has the ability to kick off a fresh build of
> > the V2.6-stable branch with all the knobs turned properly for Ceph, or
> > can point me to a set of cmake configs and scripts that might help me
> > do it myself, I would be eternally grateful.
> >
> > Thanks,
> > Erik
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi Jakub,

"ceph osd metadata X" this is perfect! This also lists multipath devices
which I was looking for!
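
For the archives, roughly what I use now (field names as they appear on my
Mimic cluster, so treat them as an assumption):

# raw device(s) and the bluestore block path for one OSD
ceph osd metadata 46 | grep -E '"devices"|"bluestore_bdev_partition_path"'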

Kevin


Am Mo., 8. Okt. 2018 um 21:16 Uhr schrieb Jakub Jaszewski <
jaszewski.ja...@gmail.com>:

> Hi Kevin,
> Have you tried ceph osd metadata OSDid ?
>
> Jakub
>
> pon., 8 paź 2018, 19:32 użytkownik Alfredo Deza 
> napisał:
>
>> On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich  wrote:
>> >
>> > Hi!
>> >
>> > Yes, thank you. At least on one node this works, the other node just
>> freezes but this might by caused by a bad disk that I try to find.
>>
>> If it is freezing, you could maybe try running the command where it
>> freezes? (ceph-volume will log it to the terminal)
>>
>>
>> >
>> > Kevin
>> >
>> > Am Mo., 8. Okt. 2018 um 12:07 Uhr schrieb Wido den Hollander <
>> w...@42on.com>:
>> >>
>> >> Hi,
>> >>
>> >> $ ceph-volume lvm list
>> >>
>> >> Does that work for you?
>> >>
>> >> Wido
>> >>
>> >> On 10/08/2018 12:01 PM, Kevin Olbrich wrote:
>> >> > Hi!
>> >> >
>> >> > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id?
>> >> > Before I migrated from filestore with simple-mode to bluestore with
>> lvm,
>> >> > I was able to find the raw disk with "df".
>> >> > Now, I need to go from LVM LV to PV to disk every time I need to
>> >> > check/smartctl a disk.
>> >> >
>> >> > Kevin
>> >> >
>> >> >
>> >> > ___
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi!

Yes, thank you. At least on one node this works; the other node just
freezes, but this might be caused by a bad disk that I am trying to find.

Kevin

Am Mo., 8. Okt. 2018 um 12:07 Uhr schrieb Wido den Hollander :

> Hi,
>
> $ ceph-volume lvm list
>
> Does that work for you?
>
> Wido
>
> On 10/08/2018 12:01 PM, Kevin Olbrich wrote:
> > Hi!
> >
> > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id?
> > Before I migrated from filestore with simple-mode to bluestore with lvm,
> > I was able to find the raw disk with "df".
> > Now, I need to go from LVM LV to PV to disk every time I need to
> > check/smartctl a disk.
> >
> > Kevin
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi!

Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id?
Before I migrated from filestore with simple-mode to bluestore with lvm, I
was able to find the raw disk with "df".
Now, I need to go from LVM LV to PV to disk every time I need to
check/smartctl a disk.
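
Spelled out, the manual way currently looks something like this for me (a
sketch; the tag name is what ceph-volume sets on my systems, please verify):

# ceph-volume tags its LVs, so the OSD id can be matched to the backing PV/disk
lvs -o lv_name,vg_name,devices,lv_tags | grep 'ceph.osd_id=29'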

Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-08 Thread Kevin Olbrich
nt: (5)
Input/output error
2018-10-08 10:32:17.434 7f6af518e1c0 20 bdev aio_wait 0x55a3a1edb8c0 done
2018-10-08 10:32:17.434 7f6af518e1c0  1 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) close
2018-10-08 10:32:17.434 7f6af518e1c0 10 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _aio_stop
2018-10-08 10:32:17.568 7f6add7d3700 10 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _aio_thread end
2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _discard_stop
2018-10-08 10:32:17.573 7f6adcfd2700 20 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _discard_thread wake
2018-10-08 10:32:17.573 7f6adcfd2700 10 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _discard_thread finish
2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62a80
/var/lib/ceph/osd/ceph-46/block) _discard_stop stopped
2018-10-08 10:32:17.573 7f6af518e1c0  1 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) close
2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _aio_stop
2018-10-08 10:32:17.817 7f6ade7d5700 10 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _aio_thread end
2018-10-08 10:32:17.822 7f6af518e1c0 10 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _discard_stop
2018-10-08 10:32:17.822 7f6addfd4700 20 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _discard_thread wake
2018-10-08 10:32:17.822 7f6addfd4700 10 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _discard_thread finish
2018-10-08 10:32:17.822 7f6af518e1c0 10 bdev(0x55a3a1d62000
/var/lib/ceph/osd/ceph-46/block) _discard_stop stopped
2018-10-08 10:32:17.823 7f6af518e1c0 -1 osd.46 0 OSD:init: unable to mount
object store
2018-10-08 10:32:17.823 7f6af518e1c0 -1  ** ERROR: osd init failed: (5)
Input/output error


Anything interesting here?

I will try to export the down PGs from the disks. I got a bunch of new
disks to replace them all. Most of the current disks are of the same age.
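
The export itself would be something like this (a sketch only; the PG id and
target file are placeholders, the OSD has to be stopped and the object store
still has to mount for this to work):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-46 \
    --op export --pgid 13.2as0 --file /root/13.2as0.export
# later, on a healthy (stopped) OSD:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 --op import --file /root/13.2as0.export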

Kevin

Am Mi., 3. Okt. 2018 um 13:52 Uhr schrieb Paul Emmerich <
paul.emmer...@croit.io>:

> There's "ceph-bluestore-tool repair/fsck"
>
> In your scenario, a few more log files would be interesting: try
> setting debug bluefs to 20/20. And if that's not enough log try also
> setting debug osd, debug bluestore, and debug bdev to 20/20.
>
>
>
> Paul
> Am Mi., 3. Okt. 2018 um 13:48 Uhr schrieb Kevin Olbrich :
> >
> > The disks were deployed with ceph-deploy / ceph-volume using the default
> style (lvm) and not simple-mode.
> >
> > The disks were provisioned as a whole, no resizing. I never touched the
> disks after deployment.
> >
> > It is very strange that this first happened after the update, never met
> such an error before.
> >
> > I found a BUG in the tracker, that also shows such an error with count
> 0. That was closed with „can’t reproduce“ (don’t have the link ready). For
> me this seems like the data itself is fine and I just hit a bad transaction
> in the replay (which maybe caused the crash in the first place).
> >
> > I need one of three disks back. Object corruption would not be a problem
> (regarding drop of a journal), as this cluster hosts backups which will
> fail validation and regenerate. Just marking the OSD lost does not seem to
> be an option.
> >
> > Is there some sort of fsck for BlueFS?
> >
> > Kevin
> >
> >
> > Igor Fedotov  schrieb am Mi. 3. Okt. 2018 um 13:01:
> >>
> >> I've seen somewhat similar behavior in a log from Sergey Malinin in
> another thread ("mimic: 3/4 OSDs crashed...")
> >>
> >> He claimed it happened after LVM volume expansion. Isn't this the case
> for you?
> >>
> >> Am I right that you use LVM volumes?
> >>
> >>
> >> On 10/3/2018 11:22 AM, Kevin Olbrich wrote:
> >>
> >> Small addition: the failing disks are in the same host.
> >> This is a two-host, failure-domain OSD cluster.
> >>
> >>
> >> Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich :
> >>>
> >>> Hi!
> >>>
> >>> Yesterday one of our (non-priority) clusters failed when 3 OSDs went
> down (EC 8+2) together.
> >>> This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two
> hours before.
> >>> They failed exactly at the same moment, rendering the cluster unusable
> (CephFS).
> >>> We are using CentOS 7 with latest updates and ceph repo. No cache
> SSDs, no external journal / wal / db.
> >>>
> >>> OSD 29 (no disk failure in dmesg):
> >>> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 set uid:gid to 167:167
> (ceph:ceph)
> >>> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 ceph version 13.2.2
> (02899bfda8141

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
The disks were deployed with ceph-deploy / ceph-volume using the default
style (lvm) and not simple-mode.

The disks were provisioned as a whole, no resizing. I never touched the
disks after deployment.

It is very strange that this first happened after the update; I have never met
such an error before.

I found a bug in the tracker that also shows such an error with count 0.
It was closed with "can't reproduce" (I don't have the link ready). To me
this seems like the data itself is fine and I just hit a bad transaction in
the replay (which maybe caused the crash in the first place).

I need one of three disks back. Object corruption would not be a problem
(regarding drop of a journal), as this cluster hosts backups which will
fail validation and regenerate. Just marking the OSD lost does not seem to
be an option.

Is there some sort of fsck for BlueFS?

Kevin


Igor Fedotov  schrieb am Mi. 3. Okt. 2018 um 13:01:

> I've seen somewhat similar behavior in a log from Sergey Malinin in
> another thread ("mimic: 3/4 OSDs crashed...")
>
> He claimed it happened after LVM volume expansion. Isn't this the case for
> you?
>
> Am I right that you use LVM volumes?
>
> On 10/3/2018 11:22 AM, Kevin Olbrich wrote:
>
> Small addition: the failing disks are in the same host.
> This is a two-host, failure-domain OSD cluster.
>
>
> Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich :
>
>> Hi!
>>
>> Yesterday one of our (non-priority) clusters failed when 3 OSDs went down
>> (EC 8+2) together.
>> *This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two
>> hours before.*
>> They failed exactly at the same moment, rendering the cluster unusable
>> (CephFS).
>> We are using CentOS 7 with latest updates and ceph repo. No cache SSDs,
>> no external journal / wal / db.
>>
>> *OSD 29 (no disk failure in dmesg):*
>> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 set uid:gid to 167:167 (ceph:ceph)
>> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 ceph version 13.2.2
>> (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process
>> ceph-osd, pid 20899
>> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 pidfile_write: ignore empty
>> --pid-file
>> 2018-10-03 09:47:15.100 7fb8835ce1c0  0 load: jerasure load: lrc load:
>> isa
>> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev create path
>> /var/lib/ceph/osd/ceph-29/block type kernel
>> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
>> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
>> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
>> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
>> GiB) block_size 4096 (4 KiB) rotational
>> 2018-10-03 09:47:15.101 7fb8835ce1c0  1
>> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
>> kv_ratio 0.5
>> 2018-10-03 09:47:15.101 7fb8835ce1c0  1
>> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
>> meta 0 kv 1 data 0
>> 2018-10-03 09:47:15.101 7fb8835ce1c0  1 bdev(0x561250a2
>> /var/lib/ceph/osd/ceph-29/block) close
>> 2018-10-03 09:47:15.358 7fb8835ce1c0  1
>> bluestore(/var/lib/ceph/osd/ceph-29) _mount path /var/lib/ceph/osd/ceph-29
>> 2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev create path
>> /var/lib/ceph/osd/ceph-29/block type kernel
>> 2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev(0x561250a2
>> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
>> 2018-10-03 09:47:15.359 7fb8835ce1c0  1 bdev(0x561250a2
>> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
>> GiB) block_size 4096 (4 KiB) rotational
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1
>> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
>> kv_ratio 0.5
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1
>> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
>> meta 0 kv 1 data 0
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev create path
>> /var/lib/ceph/osd/ceph-29/block type kernel
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
>> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
>> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
>> GiB) block_size 4096 (4 KiB) rotational
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs add_block_device bdev 1
>> path /var/lib/ceph/osd/ceph-29/block size 932 GiB
>> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs mount
>> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs _replay file wi

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Small addition: the failing disks are in the same host.
This is a two-host cluster with the failure domain set to OSD.


Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich :

> Hi!
>
> Yesterday one of our (non-priority) clusters failed when 3 OSDs went down
> (EC 8+2) together.
> *This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two
> hours before.*
> They failed exactly at the same moment, rendering the cluster unusable
> (CephFS).
> We are using CentOS 7 with latest updates and ceph repo. No cache SSDs, no
> external journal / wal / db.
>
> *OSD 29 (no disk failure in dmesg):*
> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 ceph version 13.2.2
> (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process
> ceph-osd, pid 20899
> 2018-10-03 09:47:15.074 7fb8835ce1c0  0 pidfile_write: ignore empty
> --pid-file
> 2018-10-03 09:47:15.100 7fb8835ce1c0  0 load: jerasure load: lrc load: isa
> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev create path
> /var/lib/ceph/osd/ceph-29/block type kernel
> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
> 2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
> GiB) block_size 4096 (4 KiB) rotational
> 2018-10-03 09:47:15.101 7fb8835ce1c0  1
> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
> kv_ratio 0.5
> 2018-10-03 09:47:15.101 7fb8835ce1c0  1
> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
> meta 0 kv 1 data 0
> 2018-10-03 09:47:15.101 7fb8835ce1c0  1 bdev(0x561250a2
> /var/lib/ceph/osd/ceph-29/block) close
> 2018-10-03 09:47:15.358 7fb8835ce1c0  1
> bluestore(/var/lib/ceph/osd/ceph-29) _mount path /var/lib/ceph/osd/ceph-29
> 2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev create path
> /var/lib/ceph/osd/ceph-29/block type kernel
> 2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev(0x561250a2
> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
> 2018-10-03 09:47:15.359 7fb8835ce1c0  1 bdev(0x561250a2
> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
> GiB) block_size 4096 (4 KiB) rotational
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1
> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
> kv_ratio 0.5
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1
> bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
> meta 0 kv 1 data 0
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev create path
> /var/lib/ceph/osd/ceph-29/block type kernel
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
> /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
> /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
> GiB) block_size 4096 (4 KiB) rotational
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs add_block_device bdev 1
> path /var/lib/ceph/osd/ceph-29/block size 932 GiB
> 2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs mount
> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs _replay file with link
> count 0: file(ino 519 size 0x31e2f42 mtime 2018-10-02 12:24:22.632397 bdev
> 1 allocated 320 extents
> [1:0x700820+10,1:0x700900+10,1:0x700910+10,1:0x700920+10,1:0x700930+10,1:0x700940+10,1:0x700950+10,1:0x700960+10,1:0x700970+10,1:0x700980+10,1:0x700990+10,1:0x7009a0+10,1:0x7009b0+10,1:0x7009c0+10,1:0x7009d0+10,1:0x7009e0+10,1:0x7009f0+10,1:0x700a00+10,1:0x700a10+10,1:0x700a20+10,1:0x700a30+10,1:0x700a40+10,1:0x700a50+10,1:0x700a60+10,1:0x700a70+10,1:0x700a80+10,1:0x700a90+10,1:0x700aa0+10,1:0x700ab0+10,1:0x700ac0+10,1:0x700ad0+10,1:0x700ae0+10,1:0x700af0+10,1:0x700b00+10,1:0x700b10+10,1:0x700b20+10,1:0x700b30+10,1:0x700b40+10,1:0x700b50+10,1:0x700b60+10,1:0x700b70+10,1:0x700b80+10,1:0x700b90+10,1:0x700ba0+10,1:0x700bb0+10,1:0x700bc0+10,1:0x700bd0+10,1:0x700be0+10,1:0x700bf0+10,1:0x700c00+10])
> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs mount failed to replay log:
> (5) Input/output error
> 2018-10-03 09:47:15.538 7fb8835ce1c0  1 stupidalloc 0x0x561250b8d030
> shutdown
> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1
> bluestore(/var/lib/ceph/osd/ceph-29) _open_db failed bluefs mount: (

[ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Hi!

Yesterday one of our (non-priority) clusters failed when 3 OSDs went down
(EC 8+2) together.
*This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two
hours before.*
They failed exactly at the same moment, rendering the cluster unusable
(CephFS).
We are using CentOS 7 with latest updates and ceph repo. No cache SSDs, no
external journal / wal / db.

*OSD 29 (no disk failure in dmesg):*
2018-10-03 09:47:15.074 7fb8835ce1c0  0 set uid:gid to 167:167 (ceph:ceph)
2018-10-03 09:47:15.074 7fb8835ce1c0  0 ceph version 13.2.2
(02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process
ceph-osd, pid 20899
2018-10-03 09:47:15.074 7fb8835ce1c0  0 pidfile_write: ignore empty
--pid-file
2018-10-03 09:47:15.100 7fb8835ce1c0  0 load: jerasure load: lrc load: isa
2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev create path
/var/lib/ceph/osd/ceph-29/block type kernel
2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
2018-10-03 09:47:15.100 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
GiB) block_size 4096 (4 KiB) rotational
2018-10-03 09:47:15.101 7fb8835ce1c0  1
bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
kv_ratio 0.5
2018-10-03 09:47:15.101 7fb8835ce1c0  1
bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
meta 0 kv 1 data 0
2018-10-03 09:47:15.101 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) close
2018-10-03 09:47:15.358 7fb8835ce1c0  1
bluestore(/var/lib/ceph/osd/ceph-29) _mount path /var/lib/ceph/osd/ceph-29
2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev create path
/var/lib/ceph/osd/ceph-29/block type kernel
2018-10-03 09:47:15.358 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
2018-10-03 09:47:15.359 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
GiB) block_size 4096 (4 KiB) rotational
2018-10-03 09:47:15.360 7fb8835ce1c0  1
bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 >
kv_ratio 0.5
2018-10-03 09:47:15.360 7fb8835ce1c0  1
bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912
meta 0 kv 1 data 0
2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev create path
/var/lib/ceph/osd/ceph-29/block type kernel
2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
/var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block
2018-10-03 09:47:15.360 7fb8835ce1c0  1 bdev(0x561250a20a80
/var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e080, 932
GiB) block_size 4096 (4 KiB) rotational
2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs add_block_device bdev 1 path
/var/lib/ceph/osd/ceph-29/block size 932 GiB
2018-10-03 09:47:15.360 7fb8835ce1c0  1 bluefs mount
2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs _replay file with link count
0: file(ino 519 size 0x31e2f42 mtime 2018-10-02 12:24:22.632397 bdev 1
allocated 320 extents
[1:0x700820+10,1:0x700900+10,1:0x700910+10,1:0x700920+10,1:0x700930+10,1:0x700940+10,1:0x700950+10,1:0x700960+10,1:0x700970+10,1:0x700980+10,1:0x700990+10,1:0x7009a0+10,1:0x7009b0+10,1:0x7009c0+10,1:0x7009d0+10,1:0x7009e0+10,1:0x7009f0+10,1:0x700a00+10,1:0x700a10+10,1:0x700a20+10,1:0x700a30+10,1:0x700a40+10,1:0x700a50+10,1:0x700a60+10,1:0x700a70+10,1:0x700a80+10,1:0x700a90+10,1:0x700aa0+10,1:0x700ab0+10,1:0x700ac0+10,1:0x700ad0+10,1:0x700ae0+10,1:0x700af0+10,1:0x700b00+10,1:0x700b10+10,1:0x700b20+10,1:0x700b30+10,1:0x700b40+10,1:0x700b50+10,1:0x700b60+10,1:0x700b70+10,1:0x700b80+10,1:0x700b90+10,1:0x700ba0+10,1:0x700bb0+10,1:0x700bc0+10,1:0x700bd0+10,1:0x700be0+10,1:0x700bf0+10,1:0x700c00+10])
2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs mount failed to replay log:
(5) Input/output error
2018-10-03 09:47:15.538 7fb8835ce1c0  1 stupidalloc 0x0x561250b8d030
shutdown
2018-10-03 09:47:15.538 7fb8835ce1c0 -1
bluestore(/var/lib/ceph/osd/ceph-29) _open_db failed bluefs mount: (5)
Input/output error
2018-10-03 09:47:15.538 7fb8835ce1c0  1 bdev(0x561250a20a80
/var/lib/ceph/osd/ceph-29/block) close
2018-10-03 09:47:15.616 7fb8835ce1c0  1 bdev(0x561250a2
/var/lib/ceph/osd/ceph-29/block) close
2018-10-03 09:47:15.870 7fb8835ce1c0 -1 osd.29 0 OSD:init: unable to mount
object store
2018-10-03 09:47:15.870 7fb8835ce1c0 -1  ** ERROR: osd init failed: (5)
Input/output error

*OSD 42:*
disk is found by lvm, tmpfs is created but service immediately dies on
start without log...
This might be 

Re: [ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
Hi Paul,

thanks for the hint, I just checked and it works perfectly.

I found this guide:
https://www.reddit.com/r/ceph/comments/72yc9m/ceph_openstack_with_ec/

That works well with one meta/data setup but not with multiple (like
device-class based pools).

The link above uses client auth; is there a better way?
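
For the archives, the client-side ceph.conf snippet I ended up with looks
roughly like this (pool name from my setup, adjust as needed):

[client]
    rbd default data pool = rbd_vms_ssd_01_ec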

Kevin

Am So., 23. Sep. 2018 um 18:08 Uhr schrieb Paul Emmerich
:
>
> The usual trick for clients not supporting this natively is the option
> "rbd_default_data_pool" in ceph.conf which should also work here.
>
>
>   Paul
> Am So., 23. Sep. 2018 um 18:03 Uhr schrieb Kevin Olbrich :
> >
> > Hi!
> >
> > Is it possible to set data-pool for ec-pools on qemu-img?
> > For repl-pools I used "qemu-img convert" to convert from e.g. vmdk to raw 
> > and write to rbd/ceph directly.
> >
> > The rbd utility is able to do this for raw or empty images but without 
> > convert (converting 800G and writing it again would now take at least twice 
> > the time).
> >
> > Do I miss a parameter for qemu-kvm?
> >
> > Kind regards
> > Kevin
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
Hi!

Is it possible to set data-pool for ec-pools on qemu-img?
For repl-pools I used "qemu-img convert" to convert from e.g. vmdk to raw
and write to rbd/ceph directly.

The rbd utility is able to do this for raw or empty images but without
convert (converting 800G and writing it again would now take at least twice
the time).

Do I miss a parameter for qemu-kvm?
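
My current workaround idea would be to pre-create the image and let qemu-img
only fill it, roughly like this (untested; pool/image names are from my
setup, and -n should make qemu-img skip creating the target):

# create the image up front with the erasure-coded data pool
rbd create --size 800G --data-pool rbd_vms_ssd_01_ec rbd_vms_ssd_01/test_vm
# convert into the existing image without re-creating it
qemu-img convert -n -p -O raw /target/test-vm.vmdk rbd:rbd_vms_ssd_01/test_vm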

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Thank you very much Paul.

Kevin


Am Do., 20. Sep. 2018 um 15:19 Uhr schrieb Paul Emmerich <
paul.emmer...@croit.io>:

> Hi,
>
> device classes are internally represented as completely independent
> trees/roots; showing them in one tree is just syntactic sugar.
>
> For example, if you have a hierarchy like root --> host1, host2, host3
> --> nvme/ssd/sata OSDs, then you'll actually have 3 trees:
>
> root~ssd -> host1~ssd, host2~ssd ...
> root~sata -> host~sata, ...
>
>
> Paul
>
> 2018-09-20 14:54 GMT+02:00 Kevin Olbrich :
> > Hi!
> >
> > Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host.
> > I also have replication rules to distinguish between HDD and SSD (and
> > failure-domain set to rack) which are mapped to pools.
> >
> > What happens if I add a heterogeneous host with 1x SSD and 1x NVMe (where
> > NVMe will be a new device-class based rule)?
> >
> > Will the crush weight be calculated from the OSDs up to the
> failure-domain
> > based on the crush rule?
> > The only crush-weights I know and see are those shown by "ceph osd tree".
> >
> > Kind regards
> > Kevin
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
To answer my own question:

ceph osd crush tree --show-shadow

Sorry for the noise...

Am Do., 20. Sep. 2018 um 14:54 Uhr schrieb Kevin Olbrich :

> Hi!
>
> Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host.
> I also have replication rules to distinguish between HDD and SSD (and
> failure-domain set to rack) which are mapped to pools.
>
> What happens if I add a heterogeneous host with 1x SSD and 1x NVMe (where
> NVMe will be a new device-class based rule)?
>
> Will the crush weight be calculated from the OSDs up to the failure-domain
> based on the crush rule?
> The only crush-weights I know and see are those shown by "ceph osd tree".
>
> Kind regards
> Kevin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Hi!

Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host.
I also have replication rules to distinguish between HDD and SSD (and
failure-domain set to rack) which are mapped to pools.

What happens if I add a heterogeneous host with 1x SSD and 1x NVMe (where
NVMe will be a new device-class based rule)?

Will the crush weight be calculated from the OSDs up to the failure-domain
based on the crush rule?
The only crush-weights I know and see are those shown by "ceph osd tree".

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] (no subject)

2018-09-18 Thread Kevin Olbrich
Hi!

Is the compressible / incompressible hint supported on qemu+kvm?

http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/

If not, only aggressive would work in this case for rbd, right?
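
For context, the pool-level settings I would fall back to in that case (a
sketch; pool name and algorithm are just examples):

ceph osd pool set rbd_vms_ssd_01 compression_algorithm snappy
ceph osd pool set rbd_vms_ssd_01 compression_mode aggressive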

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] nfs-ganesha FSAL CephFS: nfs_health :DBUS :WARN :Health status is unhealthy

2018-09-10 Thread Kevin Olbrich
Hi!

Today one of our nfs-ganesha gateways experienced an outage and has since
crashed every time the client behind it tries to access the data.
This is a Ceph Mimic cluster with nfs-ganesha from the ceph repos:

nfs-ganesha-2.6.2-0.1.el7.x86_64
nfs-ganesha-ceph-2.6.2-0.1.el7.x86_64

There were fixes for this problem in 2.6.3:
https://github.com/nfs-ganesha/nfs-ganesha/issues/339

Can the build in the repos be compiled against this bugfix release?

Thank you very much.

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SPDK/DPDK with Intel P3700 NVMe pool

2018-08-30 Thread Kevin Olbrich
Hi!

During our move from filestore to bluestore, we removed several Intel P3700
NVMe drives from the nodes.

Is someone running a SPDK/DPDK NVMe-only EC pool? Is it working well?
The docs are very short about the setup:
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#spdk-usage

I would like to re-use these cards for high-end (max IO) for database VMs.

Some notes or feedback about the setup (ceph-volume etc.) would be
appreciated.
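
From my reading of the SPDK section of the docs, the ceph.conf side would
look roughly like this (untested on my side; the NVMe serial number is a
placeholder, and hugepages plus user-space NVMe binding have to be set up
first):

[osd]
    bluestore_block_path = spdk:55cd2e404bd73932   # "spdk:" + NVMe serial number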

Thank you.

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HDD-only CephFS cluster with EC and without SSD/NVMe

2018-08-22 Thread Kevin Olbrich
Hi!

I am in the process of moving a local ("large", 24x1TB) ZFS RAIDZ2 to
CephFS.
This storage is used for backup images (large sequential reads and writes).

To save space and have a RAIDZ2 (RAID6) like setup, I am planning the
following profile:

ceph osd erasure-code-profile set myprofile \
   k=3 \
   m=2 \
   crush-failure-domain=rack
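
The data pool would then be created from that profile roughly like this (sketch;
pool name and PG count are placeholders) - as far as I know, CephFS on EC also
needs overwrites enabled on bluestore:

ceph osd pool create cephfs_data 512 512 erasure myprofile
ceph osd pool set cephfs_data allow_ec_overwrites true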

Performance is not the first priority, which is why I do not plan to put the
WAL/DB on a separate device (a broken NVMe taking out several OSDs means more
administrative overhead than losing single OSDs).
Disks are attached via SAS multipath; throughput in general is no problem,
but I have not tested with Ceph yet.

Is anyone using CephFS + bluestore + ec 3/2 + without WAL/DB-dev and is it
working well?

Thank you.

Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running 12.2.5 without problems, should I upgrade to 12.2.7 or wait for 12.2.8?

2018-08-10 Thread Kevin Olbrich
Am Fr., 10. Aug. 2018 um 19:29 Uhr schrieb :

>
>
> Am 30. Juli 2018 09:51:23 MESZ schrieb Micha Krause :
> >Hi,
>
> Hi Micha,
>
> >
> >I'm Running 12.2.5 and I have no Problems at the moment.
> >
> >However my servers reporting daily that they want to upgrade to 12.2.7,
> >is this save or should I wait for 12.2.8?
> >
> I guess you should Upgrade to 12.2.7 as soon as you can, specialy when
>

Why? As far as I understood, replicated pools for RBD are not at risk -
12.2.6 and 12.2.7 were mostly fixes for the known cases.
We are not planning any upgrade from 12.2.5 at the moment. Please correct me if I
am wrong.

Kevin


> Quote:
> The v12.2.5 release has a potential data corruption issue with erasure
> coded pools. If you ran v12.2.5 with erasure coding, please see below.
>
> See: https://ceph.com/releases/12-2-7-luminous-released/
>
> Hth
> - Mehmet
> >Are there any predictions when the 12.2.8 release will be available?
> >
> >
> >Micha Krause
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.7 Luminous released

2018-07-19 Thread Kevin Olbrich
Hi,

on upgrade from 12.2.4 to 12.2.5 the balancer module broke (the mgr crashed
minutes after the service started).
The only solution was to disable the balancer (the service has been running fine since).
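
For reference, disabling it was nothing more than the following (from memory, so
treat it as a sketch):

ceph balancer off
ceph mgr module disable balancer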

Is this fixed in 12.2.7?
I was unable to locate the bug in bugtracker.

Kevin

2018-07-17 18:28 GMT+02:00 Abhishek Lekshmanan :

>
> This is the seventh bugfix release of Luminous v12.2.x long term
> stable release series. This release contains several fixes for
> regressions in the v12.2.6 and v12.2.5 releases.  We recommend that
> all users upgrade.
>
> *NOTE* The v12.2.6 release has serious known regressions, while 12.2.6
> wasn't formally announced in the mailing lists or blog, the packages
> were built and available on download.ceph.com since last week. If you
> installed this release, please see the upgrade procedure below.
>
> *NOTE* The v12.2.5 release has a potential data corruption issue with
> erasure coded pools. If you ran v12.2.5 with erasure coding, please see
> below.
>
> The full blog post alongwith the complete changelog is published at the
> official ceph blog at https://ceph.com/releases/12-2-7-luminous-released/
>
> Upgrading from v12.2.6
> --
>
> v12.2.6 included an incomplete backport of an optimization for
> BlueStore OSDs that avoids maintaining both the per-object checksum
> and the internal BlueStore checksum.  Due to the accidental omission
> of a critical follow-on patch, v12.2.6 corrupts (fails to update) the
> stored per-object checksum value for some objects.  This can result in
> an EIO error when trying to read those objects.
>
> #. If your cluster uses FileStore only, no special action is required.
>This problem only affects clusters with BlueStore.
>
> #. If your cluster has only BlueStore OSDs (no FileStore), then you
>should enable the following OSD option::
>
>  osd skip data digest = true
>
>This will avoid setting and start ignoring the full-object digests
>whenever the primary for a PG is BlueStore.
>
> #. If you have a mix of BlueStore and FileStore OSDs, then you should
>enable the following OSD option::
>
>  osd distrust data digest = true
>
>This will avoid setting and start ignoring the full-object digests
>in all cases.  This weakens the data integrity checks for
>FileStore (although those checks were always only opportunistic).
>
> If your cluster includes BlueStore OSDs and was affected, deep scrubs
> will generate errors about mismatched CRCs for affected objects.
> Currently the repair operation does not know how to correct them
> (since all replicas do not match the expected checksum it does not
> know how to proceed).  These warnings are harmless in the sense that
> IO is not affected and the replicas are all still in sync.  The number
> of affected objects is likely to drop (possibly to zero) on their own
> over time as those objects are modified.  We expect to include a scrub
> improvement in v12.2.8 to clean up any remaining objects.
>
> Additionally, see the notes below, which apply to both v12.2.5 and v12.2.6.
>
> Upgrading from v12.2.5 or v12.2.6
> -
>
> If you used v12.2.5 or v12.2.6 in combination with erasure coded
> pools, there is a small risk of corruption under certain workloads.
> Specifically, when:
>
> * An erasure coded pool is in use
> * The pool is busy with successful writes
> * The pool is also busy with updates that result in an error result to
>   the librados user.  RGW garbage collection is the most common
>   example of this (it sends delete operations on objects that don't
>   always exist.)
> * Some OSDs are reasonably busy.  One known example of such load is
>   FileStore splitting, although in principle any load on the cluster
>   could also trigger the behavior.
> * One or more OSDs restarts.
>
> This combination can trigger an OSD crash and possibly leave PGs in a state
> where they fail to peer.
>
> Notably, upgrading a cluster involves OSD restarts and as such may
> increase the risk of encountering this bug.  For this reason, for
> clusters with erasure coded pools, we recommend the following upgrade
> procedure to minimize risk:
>
> 1. Install the v12.2.7 packages.
> 2. Temporarily quiesce IO to cluster::
>
>  ceph osd pause
>
> 3. Restart all OSDs and wait for all PGs to become active.
> 4. Resume IO::
>
>  ceph osd unpause
>
> This will cause an availability outage for the duration of the OSD
> restarts.  If this in unacceptable, an *more risky* alternative is to
> disable RGW garbage collection (the primary known cause of these rados
> operations) for the duration of the upgrade::
>
> 1. Set ``rgw_enable_gc_threads = false`` in ceph.conf
> 2. Restart all radosgw daemons
> 3. Upgrade and restart all OSDs
> 4. Remove ``rgw_enable_gc_threads = false`` from ceph.conf
> 5. Restart all radosgw daemons
>
> Upgrading from other versions
> -
>
> If your cluster did not run v12.2.5 or v12.2.6 then none of the above
> 

Re: [ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
PS: It's luminous 12.2.5!


Mit freundlichen Grüßen / best regards,
Kevin Olbrich.

2018-07-14 15:19 GMT+02:00 Kevin Olbrich :

> Hi,
>
> why do I see activating followed by peering during OSD add (refill)?
> I did not change pg(p)_num.
>
> Is this normal? From my other clusters, I don't think that happend...
>
> Kevin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
Hi,

why do I see activating followed by peering during OSD add (refill)?
I did not change pg(p)_num.

Is this normal? From my other clusters, I don't think that happened...

Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore and number of devices

2018-07-13 Thread Kevin Olbrich
You can keep the same layout as before. Most people place the DB and WAL combined
in one partition (similar to the journal on filestore).
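
A minimal sketch of how that looks with ceph-volume (device paths are only
examples):

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

If only --block.db is given, the WAL ends up on the same partition automatically.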

Kevin

2018-07-13 12:37 GMT+02:00 Robert Stanford :

>
>  I'm using filestore now, with 4 data devices per journal device.
>
>  I'm confused by this: "BlueStore manages either one, two, or (in certain
> cases) three storage devices."
> (http://docs.ceph.com/docs/luminous/rados/configuration/
> bluestore-config-ref/)
>
>  When I convert my journals to bluestore, will they still be four data
> devices (osds) per journal, or will they each require a dedicated journal
> drive now?
>
>  Regards
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Kevin Olbrich
Sounds a little bit like the problem I had on OSDs:

[ceph-users] Blocked requests activating+remapped after extending pg(p)_num
<http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html>
*Kevin Olbrich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026681.html>
     *Burkhard Linke*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026682.html>
     *Kevin Olbrich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026683.html>
     *Kevin Olbrich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026685.html>
     *Kevin Olbrich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026689.html>
     *Kevin Olbrich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026692.html>
     *Paul Emmerich*
   - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
     <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026695.html>
     *Kevin Olbrich*

I ended up restarting the OSDs which were stuck in that state and they
immediately fixed themselves.
It should also work to just "out" the problem OSDs and immediately bring them
back "in" to fix it.
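
Per stuck OSD that was something along these lines (sketch; replace the id):

systemctl restart ceph-osd@17
# or, alternatively:
ceph osd out 17 && ceph osd in 17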

- Kevin

2018-07-11 20:30 GMT+02:00 Magnus Grönlund :

> Hi,
>
> Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous (12.2.6)
>
> After upgrading and restarting the mons everything looked OK, the mons had
> quorum, all OSDs where up and in and all the PGs where active+clean.
> But before I had time to start upgrading the OSDs it became obvious that
> something had gone terribly wrong.
> All of a sudden 1600 out of 4100 PGs where inactive and 40% of the data
> was misplaced!
>
> The mons appears OK and all OSDs are still up and in, but a few hours
> later there was still 1483 pgs stuck inactive, essentially all of them in
> peering!
> Investigating one of the stuck PGs it appears to be looping between
> “inactive”, “remapped+peering” and “peering” and the epoch number is rising
> fast, see the attached pg query outputs.
>
> We really can’t afford to loose the cluster or the data so any help or
> suggestions on how to debug or fix this issue would be very, very
> appreciated!
>
>
> health: HEALTH_ERR
> 1483 pgs are stuck inactive for more than 60 seconds
> 542 pgs backfill_wait
> 14 pgs backfilling
> 11 pgs degraded
> 1402 pgs peering
> 3 pgs recovery_wait
> 11 pgs stuck degraded
> 1483 pgs stuck inactive
> 2042 pgs stuck unclean
> 7 pgs stuck undersized
> 7 pgs undersized
> 111 requests are blocked > 32 sec
> 10586 requests are blocked > 4096 sec
> recovery 9472/11120724 objects degraded (0.085%)
> recovery 1181567/11120724 objects misplaced (10.625%)
> noout flag(s) set
> mon.eselde02u32 low disk space
>
>   services:
> mon: 3 daemons, quorum eselde02u32,eselde02u33,eselde02u34
> mgr: eselde02u32(active), standbys: eselde02u33, eselde02u34
> osd: 111 osds: 111 up, 111 in; 800 remapped pgs
>  flags noout
>
>   data:
> pools:   18 pools, 4104 pgs
> objects: 3620k objects, 13875 GB
> usage:   42254 GB used, 160 TB / 201 TB avail
> pgs: 1.876% pgs unknown
>  34.259% pgs not active
>  9472/11120724 objects degraded (0.085%)
>  1181567/11120724 objects misplaced (10.625%)
>  2062 active+clean
> 1221 peering
>  535  active+remapped+backfill_wait
>  181  remapped+peering
>  77   unknown
>  13   active+remapped+backfilling
>  7active+undersized+degraded+remapped+backfill_wait
>  4remapped
>  3active+recovery_wait+degraded+remapped
>  1active+degraded+remapped+backfilling
>
>   io:
> recovery: 298 MB/s, 77 objects/s
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
2018-07-10 14:37 GMT+02:00 Jason Dillaman :

> On Tue, Jul 10, 2018 at 2:37 AM Kevin Olbrich  wrote:
>
>> 2018-07-10 0:35 GMT+02:00 Jason Dillaman :
>>
>>> Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least
>>> present on the client computer you used? I would have expected the OSD to
>>> determine the client address, so it's odd that it was able to get a
>>> link-local address.
>>>
>>
>> Yes, it is. eth0 is part of bond0 which is a vlan trunk. Bond0.X is
>> attached to brX which has an ULA-prefix for the ceph cluster.
>> Eth0 has no address itself. In this case this must mean, the address has
>> been carried down to the hardware interface.
>>
>> I am wondering why it uses link local when there is an ULA-prefix
>> available.
>>
>> The address is available on brX on this client node.
>>
>
> I'll open a tracker ticker to get that issue fixed, but in the meantime,
> you can run "rados -p  rmxattr rbd_header.
> lock.rbd_lock" to remove the lock.
>

Worked perfectly, thank you very much!


>
>> - Kevin
>>
>>
>>> On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich  wrote:
>>>
>>>> 2018-07-09 21:25 GMT+02:00 Jason Dillaman :
>>>>
>>>>> BTW -- are you running Ceph on a one-node computer? I thought IPv6
>>>>> addresses starting w/ fe80 were link-local addresses which would probably
>>>>> explain why an interface scope id was appended. The current IPv6 address
>>>>> parser stops reading after it encounters a non hex, colon character [1].
>>>>>
>>>>
>>>> No, this is a compute machine attached to the storage vlan where I
>>>> previously had also local disks.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman 
>>>>> wrote:
>>>>>
>>>>>> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses
>>>>>> since it is failing to parse the address as valid. Perhaps it's barfing 
>>>>>> on
>>>>>> the "%eth0" scope id suffix within the address.
>>>>>>
>>>>>> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>>>>>>> Immediately I stopped the transfer but the image is stuck locked:
>>>>>>>
>>>>>>> Previusly when that happened, I was able to remove the image after
>>>>>>> 30 secs.
>>>>>>>
>>>>>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>>>>>>> There is 1 exclusive lock on this image.
>>>>>>> Locker ID  Address
>>>>>>>
>>>>>>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>>>>>>> eth0]:0/1200385089
>>>>>>>
>>>>>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02
>>>>>>> "auto 93921602220416" client.1195723
>>>>>>> rbd: releasing lock failed: (22) Invalid argument
>>>>>>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>>>>>>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>>>>>>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to
>>>>>>> blacklist client: (22) Invalid argument
>>>>>>>
>>>>>>> The image is not in use anywhere!
>>>>>>>
>>>>>>> How can I force removal of all locks for this image?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Kevin
>>>>>>> ___
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@lists.ceph.com
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jason
>>>>>>
>>>>>
>>>>> [1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108
>>>>>
>>>>> --
>>>>> Jason
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Jason
>>>
>>
>>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
2018-07-10 0:35 GMT+02:00 Jason Dillaman :

> Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least
> present on the client computer you used? I would have expected the OSD to
> determine the client address, so it's odd that it was able to get a
> link-local address.
>

Yes, it is. eth0 is part of bond0, which is a VLAN trunk. bond0.X is
attached to brX, which has a ULA prefix for the Ceph cluster.
eth0 has no address itself, so in this case the address must have been carried
down to the hardware interface.

I am wondering why it uses the link-local address when a ULA prefix is available.

The address is available on brX on this client node.

- Kevin


> On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich  wrote:
>
>> 2018-07-09 21:25 GMT+02:00 Jason Dillaman :
>>
>>> BTW -- are you running Ceph on a one-node computer? I thought IPv6
>>> addresses starting w/ fe80 were link-local addresses which would probably
>>> explain why an interface scope id was appended. The current IPv6 address
>>> parser stops reading after it encounters a non hex, colon character [1].
>>>
>>
>> No, this is a compute machine attached to the storage vlan where I
>> previously had also local disks.
>>
>>
>>>
>>>
>>> On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman 
>>> wrote:
>>>
>>>> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses
>>>> since it is failing to parse the address as valid. Perhaps it's barfing on
>>>> the "%eth0" scope id suffix within the address.
>>>>
>>>> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>>>>> Immediately I stopped the transfer but the image is stuck locked:
>>>>>
>>>>> Previusly when that happened, I was able to remove the image after 30
>>>>> secs.
>>>>>
>>>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>>>>> There is 1 exclusive lock on this image.
>>>>> Locker ID  Address
>>>>>
>>>>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>>>>> eth0]:0/1200385089
>>>>>
>>>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>>>>> 93921602220416" client.1195723
>>>>> rbd: releasing lock failed: (22) Invalid argument
>>>>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>>>>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>>>>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>>>>> client: (22) Invalid argument
>>>>>
>>>>> The image is not in use anywhere!
>>>>>
>>>>> How can I force removal of all locks for this image?
>>>>>
>>>>> Kind regards,
>>>>> Kevin
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>>
>>>> --
>>>> Jason
>>>>
>>>
>>> [1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108
>>>
>>> --
>>> Jason
>>>
>>
>>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
2018-07-09 21:25 GMT+02:00 Jason Dillaman :

> BTW -- are you running Ceph on a one-node computer? I thought IPv6
> addresses starting w/ fe80 were link-local addresses which would probably
> explain why an interface scope id was appended. The current IPv6 address
> parser stops reading after it encounters a non hex, colon character [1].
>

No, this is a compute machine attached to the storage VLAN where I
previously also had local disks.


>
>
> On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman  wrote:
>
>> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses
>> since it is failing to parse the address as valid. Perhaps it's barfing on
>> the "%eth0" scope id suffix within the address.
>>
>> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>>
>>> Hi!
>>>
>>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>>> Immediately I stopped the transfer but the image is stuck locked:
>>>
>>> Previusly when that happened, I was able to remove the image after 30
>>> secs.
>>>
>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>>> There is 1 exclusive lock on this image.
>>> Locker ID  Address
>>>
>>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>>> eth0]:0/1200385089
>>>
>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>>> 93921602220416" client.1195723
>>> rbd: releasing lock failed: (22) Invalid argument
>>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>>> client: (22) Invalid argument
>>>
>>> The image is not in use anywhere!
>>>
>>> How can I force removal of all locks for this image?
>>>
>>> Kind regards,
>>> Kevin
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> Jason
>>
>
> [1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
Is it possible to force-remove the lock or the image?

Kevin

2018-07-09 21:14 GMT+02:00 Jason Dillaman :

> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses since
> it is failing to parse the address as valid. Perhaps it's barfing on the
> "%eth0" scope id suffix within the address.
>
> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>
>> Hi!
>>
>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>> Immediately I stopped the transfer but the image is stuck locked:
>>
>> Previusly when that happened, I was able to remove the image after 30
>> secs.
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>> There is 1 exclusive lock on this image.
>> Locker ID  Address
>>
>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>> eth0]:0/1200385089
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>> 93921602220416" client.1195723
>> rbd: releasing lock failed: (22) Invalid argument
>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>> client: (22) Invalid argument
>>
>> The image is not in use anywhere!
>>
>> How can I force removal of all locks for this image?
>>
>> Kind regards,
>> Kevin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
Hi!

I tried to convert a qcow2 file to RBD and set the wrong pool.
I immediately stopped the transfer, but the image is stuck locked:

Previously when that happened, I was able to remove the image after 30 seconds.

[root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
There is 1 exclusive lock on this image.
Locker ID  Address

client.1195723 auto 93921602220416
[fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089

[root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
93921602220416" client.1195723
rbd: releasing lock failed: (22) Invalid argument
2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
client: (22) Invalid argument

The image is not in use anywhere!

How can I force removal of all locks for this image?

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] GFS2 as RBD on ceph?

2018-06-12 Thread Kevin Olbrich
Hi!

*Is it safe to run GFS2 on Ceph as RBD and mount it on approx. 3 to 5 VMs?*
The idea is to consolidate 3 webservers which are located behind proxies. The
old infrastructure is neither HA nor capable of load balancing.
I would like to set up a webserver, clone the image and mount the GFS2 disk
as shared storage. This would also allow FTP load balancing.

Redundancy would be taken care of by ceph while the VMs share up-to-date
data on all nodes.
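
The rough idea for the shared disk would be something like this (untested sketch,
pool and image names are placeholders; exclusive-lock is left out on purpose so
several clients can write at the same time):

rbd create rbd_vms_hdd/gfs2-web --size 500G --image-feature layering
rbd map rbd_vms_hdd/gfs2-web    # on each VM, then mkfs.gfs2 once and mount everywhere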

*I don't think CephFS is an option, as most files are very small and
thousands of files will be opened simultaneously.*

Anyone using such an approach?

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding cluster network to running cluster

2018-06-07 Thread Kevin Olbrich
Really?

I always thought that splitting the replication network is best practice.
Keeping everything in the same IPv6 network is much easier.

Thank you.

Kevin

2018-06-07 10:44 GMT+02:00 Wido den Hollander :

>
>
> On 06/07/2018 09:46 AM, Kevin Olbrich wrote:
> > Hi!
> >
> > When we installed our new luminous cluster, we had issues with the
> > cluster network (setup of mon's failed).
> > We moved on with a single network setup.
> >
> > Now I would like to set the cluster network again but the cluster is in
> > use (4 nodes, 2 pools, VMs).
>
> Why? What is the benefit from having the cluster network? Back in the
> old days when 10Gb was expensive you would run public on 1G and cluster
> on 10G.
>
> Now with 2x10Gb going into each machine, why still bother with managing
> two networks?
>
> I really do not see the benefit.
>
> I manage multiple 1000 ~ 2500 OSD clusters all running with all their
> nodes on IPv6 and 2x10Gb in a single network. That works just fine.
>
> Try to keep the network simple and do not overcomplicate it.
>
> Wido
>
> > What happens if I set the cluster network on one of the nodes and reboot
> > (maintenance, updates, etc.)?
> > Will the node use both networks as the other three nodes are not
> > reachable there?
> >
> > Both the MONs and OSDs have IPs in both networks, routing is not needed.
> > This cluster is dualstack but we set ms_bind_ipv6 = true.
> >
> > Thank you.
> >
> > Kind regards
> > Kevin
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi!

@Paul
Thanks! I know, I read the whole thread about size 2 some months ago. But
this was not my decision; I had to set it up like that.

In the meantime, I did a reboot of node1001 and node1002 with the "noout" flag
set, and now peering has finished and only 0.0x% is left to rebalance.
IO is flowing again. This happened as soon as the OSD was down (not out).
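
For completeness, the reboot was simply wrapped like this:

ceph osd set noout
# reboot node1001 / node1002 and wait for the OSDs to come back up
ceph osd unset noout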

This looks very much like a bug to me, doesn't it? Restarting an OSD to
"repair" CRUSH?
I also queried the PG but it did not show any error. It just lists stats
and that the PG has been active since 8:40 this morning.
There are row(s) with "blocked by" but no value - is that supposed to be
filled with data?

Kind regards,
Kevin



2018-05-17 16:45 GMT+02:00 Paul Emmerich <paul.emmer...@croit.io>:

> Check ceph pg query, it will (usually) tell you why something is stuck
> inactive.
>
> Also: never do min_size 1.
>
>
> Paul
>
>
> 2018-05-17 15:48 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>
>> I was able to obtain another NVMe to get the HDDs in node1004 into the
>> cluster.
>> The number of disks (all 1TB) is now balanced between racks, still some
>> inactive PGs:
>>
>>   data:
>> pools:   2 pools, 1536 pgs
>> objects: 639k objects, 2554 GB
>> usage:   5167 GB used, 14133 GB / 19300 GB avail
>> pgs: 1.562% pgs not active
>>  1183/1309952 objects degraded (0.090%)
>>  199660/1309952 objects misplaced (15.242%)
>>  1072 active+clean
>>  405  active+remapped+backfill_wait
>>  35   active+remapped+backfilling
>>  21   activating+remapped
>>  3activating+undersized+degraded+remapped
>>
>>
>>
>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>  -1   18.85289 root default
>> -16   18.85289 datacenter dc01
>> -19   18.85289 pod dc01-agg01
>> -108.98700 rack dc01-rack02
>>  -44.03899 host node1001
>>   0   hdd  0.90999 osd.0 up  1.0 1.0
>>   1   hdd  0.90999 osd.1 up  1.0 1.0
>>   5   hdd  0.90999 osd.5 up  1.0 1.0
>>   2   ssd  0.43700 osd.2 up  1.0 1.0
>>   3   ssd  0.43700 osd.3 up  1.0 1.0
>>   4   ssd  0.43700 osd.4 up  1.0 1.0
>>  -74.94899 host node1002
>>   9   hdd  0.90999 osd.9 up  1.0 1.0
>>  10   hdd  0.90999 osd.10up  1.0 1.0
>>  11   hdd  0.90999 osd.11up  1.0 1.0
>>  12   hdd  0.90999 osd.12up  1.0 1.0
>>   6   ssd  0.43700 osd.6 up  1.0 1.0
>>   7   ssd  0.43700 osd.7 up  1.0 1.0
>>   8   ssd  0.43700 osd.8 up  1.0 1.0
>> -119.86589 rack dc01-rack03
>> -225.38794 host node1003
>>  17   hdd  0.90999 osd.17up  1.0 1.0
>>  18   hdd  0.90999 osd.18up  1.0 1.0
>>  24   hdd  0.90999 osd.24up  1.0 1.0
>>  26   hdd  0.90999 osd.26up  1.0 1.0
>>  13   ssd  0.43700 osd.13up  1.0 1.0
>>  14   ssd  0.43700 osd.14up  1.0 1.0
>>  15   ssd  0.43700 osd.15up  1.0 1.0
>>  16   ssd  0.43700 osd.16up  1.0 1.0
>> -254.47795 host node1004
>>  23   hdd  0.90999 osd.23up  1.0 1.0
>>  25   hdd  0.90999 osd.25up  1.0 1.0
>>  27   hdd  0.90999 osd.27up  1.0 1.0
>>  19   ssd  0.43700 osd.19up  1.0 1.0
>>  20   ssd  0.43700 osd.20up  1.0 1.0
>>  21   ssd  0.43700 osd.21up  1.0 1.0
>>  22   ssd  0.43700 osd.22up  1.0 1.0
>>
>>
>> Pools are size 2, min_size 1 during setup.
>>
>> The count of PGs in activate state are related to the weight of OSDs but
>> why are they failing to proceed to active+clean or active+remapped?
>>
>> Kind regards,
>> Kevin

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
I was able to obtain another NVMe to get the HDDs in node1004 into the
cluster.
The number of disks (all 1TB) is now balanced between racks, still some
inactive PGs:

  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5167 GB used, 14133 GB / 19300 GB avail
pgs: 1.562% pgs not active
 1183/1309952 objects degraded (0.090%)
 199660/1309952 objects misplaced (15.242%)
 1072 active+clean
 405  active+remapped+backfill_wait
 35   active+remapped+backfilling
 21   activating+remapped
 3activating+undersized+degraded+remapped



ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
 -1   18.85289 root default
-16   18.85289 datacenter dc01
-19   18.85289 pod dc01-agg01
-108.98700 rack dc01-rack02
 -44.03899 host node1001
  0   hdd  0.90999 osd.0 up  1.0 1.0
  1   hdd  0.90999 osd.1 up  1.0 1.0
  5   hdd  0.90999 osd.5 up  1.0 1.0
  2   ssd  0.43700 osd.2 up  1.0 1.0
  3   ssd  0.43700 osd.3 up  1.0 1.0
  4   ssd  0.43700 osd.4 up  1.0 1.0
 -74.94899 host node1002
  9   hdd  0.90999 osd.9 up  1.0 1.0
 10   hdd  0.90999 osd.10up  1.0 1.0
 11   hdd  0.90999 osd.11up  1.0 1.0
 12   hdd  0.90999 osd.12up  1.0 1.0
  6   ssd  0.43700 osd.6 up  1.0 1.0
  7   ssd  0.43700 osd.7 up  1.0 1.0
  8   ssd  0.43700 osd.8 up  1.0 1.0
-119.86589 rack dc01-rack03
-225.38794 host node1003
 17   hdd  0.90999 osd.17up  1.0 1.0
 18   hdd  0.90999 osd.18up  1.0 1.0
 24   hdd  0.90999 osd.24up  1.0 1.0
 26   hdd  0.90999 osd.26up  1.0 1.0
 13   ssd  0.43700 osd.13up  1.0 1.0
 14   ssd  0.43700 osd.14up  1.0 1.0
 15   ssd  0.43700 osd.15up  1.0 1.0
 16   ssd  0.43700 osd.16up  1.0 1.0
-254.47795 host node1004
 23   hdd  0.90999 osd.23up  1.0 1.0
 25   hdd  0.90999 osd.25up  1.0 1.0
 27   hdd  0.90999 osd.27up  1.0 1.0
 19   ssd  0.43700 osd.19up  1.0 1.0
 20   ssd  0.43700 osd.20up  1.0 1.0
 21   ssd  0.43700 osd.21up  1.0 1.0
 22   ssd  0.43700 osd.22up  1.0 1.0


Pools are size 2, min_size 1 during setup.

The count of PGs in the activating state is related to the weight of the OSDs,
but why do they fail to proceed to active+clean or active+remapped?

Kind regards,
Kevin

2018-05-17 14:05 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> Ok, I just waited some time but I still got some "activating" issues:
>
>   data:
> pools:   2 pools, 1536 pgs
> objects: 639k objects, 2554 GB
> usage:   5194 GB used, 11312 GB / 16506 GB avail
> pgs: 7.943% pgs not active
>  5567/1309948 objects degraded (0.425%)
>  195386/1309948 objects misplaced (14.916%)
>  1147 active+clean
>  235  active+remapped+backfill_wait
> * 107  activating+remapped*
>  32   active+remapped+backfilling
> * 15   activating+undersized+degraded+remapped*
>
> I set these settings during runtime:
> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>
> Sure, mon_max_pg_per_osd is oversized but this is just temporary.
> Calculated PGs per OSD is 200.
>
> I searched the net and the bugtracker but most posts suggest
> osd_max_pg_per_osd_hard_ratio = 32 to fix this issue but this time, I got
> more stuck PGs.
>
> Any more hints?
>
> Kind regards.
> Kevin
>
> 2018-05-17 13:37 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>
>> PS: Cluster currently is size 2, I used PGCalc on Ceph website which, by
>> default, will place 200 PGs on each OSD.
>> I read about the protection in

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Ok, I just waited some time but I still got some "activating" issues:

  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5194 GB used, 11312 GB / 16506 GB avail
pgs: 7.943% pgs not active
 5567/1309948 objects degraded (0.425%)
 195386/1309948 objects misplaced (14.916%)
 1147 active+clean
 235  active+remapped+backfill_wait
* 107  activating+remapped*
 32   active+remapped+backfilling
* 15   activating+undersized+degraded+remapped*

I set these settings during runtime:
ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Sure, mon_max_pg_per_osd is oversized but this is just temporary.
Calculated PGs per OSD is 200.
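
If the values were meant to stay, the equivalent ceph.conf entries would presumably
look like this (sketch only, not applied here since it is temporary):

[mon]
mon_max_pg_per_osd = 800

[osd]
osd_max_backfills = 16
osd_recovery_max_active = 4
osd_max_pg_per_osd_hard_ratio = 32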

I searched the net and the bug tracker; most posts suggest
osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got
more stuck PGs.

Any more hints?

Kind regards.
Kevin

2018-05-17 13:37 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> PS: Cluster currently is size 2, I used PGCalc on Ceph website which, by
> default, will place 200 PGs on each OSD.
> I read about the protection in the docs and later noticed that I better
> had only placed 100 PGs.
>
>
> 2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>
>> Hi!
>>
>> Thanks for your quick reply.
>> Before I read your mail, i applied the following conf to my OSDs:
>> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>
>> Status is now:
>>   data:
>> pools:   2 pools, 1536 pgs
>> objects: 639k objects, 2554 GB
>> usage:   5211 GB used, 11295 GB / 16506 GB avail
>> pgs: 7.943% pgs not active
>>  5567/1309948 objects degraded (0.425%)
>>  252327/1309948 objects misplaced (19.262%)
>>  1030 active+clean
>>  351  active+remapped+backfill_wait
>>  107  activating+remapped
>>  33   active+remapped+backfilling
>>  15   activating+undersized+degraded+remapped
>>
>> A little bit better but still some non-active PGs.
>> I will investigate your other hints!
>>
>> Thanks
>> Kevin
>>
>> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <Burkhard.Linke@computational.
>> bio.uni-giessen.de>:
>>
>>> Hi,
>>>
>>>
>>>
>>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>>
>>>> Hi!
>>>>
>>>> Today I added some new OSDs (nearly doubled) to my luminous cluster.
>>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>>>> complaining about to few PGs. (I noticed that should better have been
>>>> small
>>>> changes).
>>>>
>>>> This is the current status:
>>>>
>>>>  health: HEALTH_ERR
>>>>  336568/1307562 objects misplaced (25.740%)
>>>>  Reduced data availability: 128 pgs inactive, 3 pgs
>>>> peering, 1
>>>> pg stale
>>>>  Degraded data redundancy: 6985/1307562 objects degraded
>>>> (0.534%), 19 pgs degraded, 19 pgs undersized
>>>>  107 slow requests are blocked > 32 sec
>>>>  218 stuck requests are blocked > 4096 sec
>>>>
>>>>data:
>>>>  pools:   2 pools, 1536 pgs
>>>>  objects: 638k objects, 2549 GB
>>>>  usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>>  pgs: 0.195% pgs unknown
>>>>   8.138% pgs not active
>>>>   6985/1307562 objects degraded (0.534%)
>>>>   336568/1307562 objects misplaced (25.740%)
>>>>   855 active+clean
>>>>   517 active+remapped+backfill_wait
>>>>   107 activating+remapped
>>>>   31  active+remapped+backfilling
>>>>   15  activating+undersized+degraded+remapped
>>>>   4   active+undersized+degraded+remapped+backfilling
>>>>   3   unknown
>>>>   3   peering
>>>>   1   stale+active+clean
>>>>
>>>
>>> You need to resolve the unknown/peering/activating pgs first. You have
>>> 1536 PGs, assuming replication size 3 this make 4608 PG copies. Given 25
>>> OSDs and the heterogenous host sizes,

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
PS: The cluster is currently size 2. I used PGCalc on the Ceph website which, by
default, will place 200 PGs on each OSD.
I read about the protection in the docs and later noticed that I should have
placed only 100 PGs.


2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> Hi!
>
> Thanks for your quick reply.
> Before I read your mail, i applied the following conf to my OSDs:
> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>
> Status is now:
>   data:
> pools:   2 pools, 1536 pgs
> objects: 639k objects, 2554 GB
> usage:   5211 GB used, 11295 GB / 16506 GB avail
> pgs: 7.943% pgs not active
>  5567/1309948 objects degraded (0.425%)
>  252327/1309948 objects misplaced (19.262%)
>  1030 active+clean
>  351  active+remapped+backfill_wait
>  107  activating+remapped
>  33   active+remapped+backfilling
>  15   activating+undersized+degraded+remapped
>
> A little bit better but still some non-active PGs.
> I will investigate your other hints!
>
> Thanks
> Kevin
>
> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <Burkhard.Linke@computational.
> bio.uni-giessen.de>:
>
>> Hi,
>>
>>
>>
>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>
>>> Hi!
>>>
>>> Today I added some new OSDs (nearly doubled) to my luminous cluster.
>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>>> complaining about to few PGs. (I noticed that should better have been
>>> small
>>> changes).
>>>
>>> This is the current status:
>>>
>>>  health: HEALTH_ERR
>>>  336568/1307562 objects misplaced (25.740%)
>>>  Reduced data availability: 128 pgs inactive, 3 pgs peering,
>>> 1
>>> pg stale
>>>  Degraded data redundancy: 6985/1307562 objects degraded
>>> (0.534%), 19 pgs degraded, 19 pgs undersized
>>>  107 slow requests are blocked > 32 sec
>>>  218 stuck requests are blocked > 4096 sec
>>>
>>>data:
>>>  pools:   2 pools, 1536 pgs
>>>  objects: 638k objects, 2549 GB
>>>  usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>  pgs: 0.195% pgs unknown
>>>   8.138% pgs not active
>>>   6985/1307562 objects degraded (0.534%)
>>>   336568/1307562 objects misplaced (25.740%)
>>>   855 active+clean
>>>   517 active+remapped+backfill_wait
>>>   107 activating+remapped
>>>   31  active+remapped+backfilling
>>>   15  activating+undersized+degraded+remapped
>>>   4   active+undersized+degraded+remapped+backfilling
>>>   3   unknown
>>>   3   peering
>>>   1   stale+active+clean
>>>
>>
>> You need to resolve the unknown/peering/activating pgs first. You have
>> 1536 PGs, assuming replication size 3 this make 4608 PG copies. Given 25
>> OSDs and the heterogenous host sizes, I assume that some OSDs hold more
>> than 200 PGs. There's a threshold for the number of PGs; reaching this
>> threshold keeps the OSDs from accepting new PGs.
>>
>> Try to increase the threshold  (mon_max_pg_per_osd /
>> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
>> the exact one, consult the documentation) to allow more PGs on the OSDs. If
>> this is the cause of the problem, the peering and activating states should
>> be resolved within a short time.
>>
>> You can also check the number of PGs per OSD with 'ceph osd df'; the last
>> column is the current number of PGs.
>>
>>
>>>
>>> OSD tree:
>>>
>>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>>   -1   16.12177 root default
>>> -16   16.12177 datacenter dc01
>>> -19   16.12177 pod dc01-agg01
>>> -108.98700 rack dc01-rack02
>>>   -44.03899 host node1001
>>>0   hdd  0.90999 osd.0 up  1.0 1.0
>>>1   hdd  0.90999 osd.1 up  1.0 1.0
>>>5   hdd  0.90999 osd.5 up  1.0 1.0
>>>2   ssd  0.43700 osd.2 up  1.0 1.0
>>>3   ssd  0.43700 osd.3 up  1.0 1.0
>>>

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi!

Thanks for your quick reply.
Before I read your mail, I applied the following config to my OSDs:
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Status is now:
  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5211 GB used, 11295 GB / 16506 GB avail
pgs: 7.943% pgs not active
 5567/1309948 objects degraded (0.425%)
 252327/1309948 objects misplaced (19.262%)
 1030 active+clean
 351  active+remapped+backfill_wait
 107  activating+remapped
 33   active+remapped+backfilling
 15   activating+undersized+degraded+remapped

A little bit better but still some non-active PGs.
I will investigate your other hints!

Thanks
Kevin

2018-05-17 13:30 GMT+02:00 Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
>
>
> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>
>> Hi!
>>
>> Today I added some new OSDs (nearly doubled) to my luminous cluster.
>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>> complaining about to few PGs. (I noticed that should better have been
>> small
>> changes).
>>
>> This is the current status:
>>
>>  health: HEALTH_ERR
>>  336568/1307562 objects misplaced (25.740%)
>>  Reduced data availability: 128 pgs inactive, 3 pgs peering, 1
>> pg stale
>>  Degraded data redundancy: 6985/1307562 objects degraded
>> (0.534%), 19 pgs degraded, 19 pgs undersized
>>  107 slow requests are blocked > 32 sec
>>  218 stuck requests are blocked > 4096 sec
>>
>>data:
>>  pools:   2 pools, 1536 pgs
>>  objects: 638k objects, 2549 GB
>>  usage:   5210 GB used, 11295 GB / 16506 GB avail
>>  pgs: 0.195% pgs unknown
>>   8.138% pgs not active
>>   6985/1307562 objects degraded (0.534%)
>>   336568/1307562 objects misplaced (25.740%)
>>   855 active+clean
>>   517 active+remapped+backfill_wait
>>   107 activating+remapped
>>   31  active+remapped+backfilling
>>   15  activating+undersized+degraded+remapped
>>   4   active+undersized+degraded+remapped+backfilling
>>   3   unknown
>>   3   peering
>>   1   stale+active+clean
>>
>
> You need to resolve the unknown/peering/activating pgs first. You have
> 1536 PGs, assuming replication size 3 this make 4608 PG copies. Given 25
> OSDs and the heterogenous host sizes, I assume that some OSDs hold more
> than 200 PGs. There's a threshold for the number of PGs; reaching this
> threshold keeps the OSDs from accepting new PGs.
>
> Try to increase the threshold  (mon_max_pg_per_osd /
> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
> the exact one, consult the documentation) to allow more PGs on the OSDs. If
> this is the cause of the problem, the peering and activating states should
> be resolved within a short time.
>
> You can also check the number of PGs per OSD with 'ceph osd df'; the last
> column is the current number of PGs.
>
>
>>
>> OSD tree:
>>
>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>   -1   16.12177 root default
>> -16   16.12177 datacenter dc01
>> -19   16.12177 pod dc01-agg01
>> -108.98700 rack dc01-rack02
>>   -44.03899 host node1001
>>0   hdd  0.90999 osd.0 up  1.0 1.0
>>1   hdd  0.90999 osd.1 up  1.0 1.0
>>5   hdd  0.90999 osd.5 up  1.0 1.0
>>2   ssd  0.43700 osd.2 up  1.0 1.0
>>3   ssd  0.43700 osd.3 up  1.0 1.0
>>4   ssd  0.43700 osd.4 up  1.0 1.0
>>   -74.94899 host node1002
>>9   hdd  0.90999 osd.9 up  1.0 1.0
>>   10   hdd  0.90999 osd.10up  1.0 1.0
>>   11   hdd  0.90999 osd.11up  1.0 1.0
>>   12   hdd  0.90999 osd.12up  1.0 1.0
>>6   ssd  0.43700 osd.6 up  1.0 1.0
>>7   ssd  0.43700 osd.7 up  1.0 1.0
>>8   ssd  0.43700 osd.8 up  1.0 1.0
>> -11

[ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi!

Today I added some new OSDs (nearly doubling the count) to my Luminous cluster.
I then changed pg(p)_num from 256 to 1024 for that pool because it was
complaining about too few PGs. (I have since noticed that this should have been
done in smaller steps.)

This is the current status:

health: HEALTH_ERR
336568/1307562 objects misplaced (25.740%)
Reduced data availability: 128 pgs inactive, 3 pgs peering, 1
pg stale
Degraded data redundancy: 6985/1307562 objects degraded
(0.534%), 19 pgs degraded, 19 pgs undersized
107 slow requests are blocked > 32 sec
218 stuck requests are blocked > 4096 sec

  data:
pools:   2 pools, 1536 pgs
objects: 638k objects, 2549 GB
usage:   5210 GB used, 11295 GB / 16506 GB avail
pgs: 0.195% pgs unknown
 8.138% pgs not active
 6985/1307562 objects degraded (0.534%)
 336568/1307562 objects misplaced (25.740%)
 855 active+clean
 517 active+remapped+backfill_wait
 107 activating+remapped
 31  active+remapped+backfilling
 15  activating+undersized+degraded+remapped
 4   active+undersized+degraded+remapped+backfilling
 3   unknown
 3   peering
 1   stale+active+clean


OSD tree:

ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
 -1   16.12177 root default
-16   16.12177 datacenter dc01
-19   16.12177 pod dc01-agg01
-108.98700 rack dc01-rack02
 -44.03899 host node1001
  0   hdd  0.90999 osd.0 up  1.0 1.0
  1   hdd  0.90999 osd.1 up  1.0 1.0
  5   hdd  0.90999 osd.5 up  1.0 1.0
  2   ssd  0.43700 osd.2 up  1.0 1.0
  3   ssd  0.43700 osd.3 up  1.0 1.0
  4   ssd  0.43700 osd.4 up  1.0 1.0
 -74.94899 host node1002
  9   hdd  0.90999 osd.9 up  1.0 1.0
 10   hdd  0.90999 osd.10up  1.0 1.0
 11   hdd  0.90999 osd.11up  1.0 1.0
 12   hdd  0.90999 osd.12up  1.0 1.0
  6   ssd  0.43700 osd.6 up  1.0 1.0
  7   ssd  0.43700 osd.7 up  1.0 1.0
  8   ssd  0.43700 osd.8 up  1.0 1.0
-117.13477 rack dc01-rack03
-225.38678 host node1003
 17   hdd  0.90970 osd.17up  1.0 1.0
 18   hdd  0.90970 osd.18up  1.0 1.0
 24   hdd  0.90970 osd.24up  1.0 1.0
 26   hdd  0.90970 osd.26up  1.0 1.0
 13   ssd  0.43700 osd.13up  1.0 1.0
 14   ssd  0.43700 osd.14up  1.0 1.0
 15   ssd  0.43700 osd.15up  1.0 1.0
 16   ssd  0.43700 osd.16up  1.0 1.0
-251.74799 host node1004
 19   ssd  0.43700 osd.19up  1.0 1.0
 20   ssd  0.43700 osd.20up  1.0 1.0
 21   ssd  0.43700 osd.21up  1.0 1.0
 22   ssd  0.43700 osd.22up  1.0 1.0


The CRUSH rule is set to chooseleaf rack and the pool (temporarily!) to size 2.
Why are PGs stuck in peering and activating?
"ceph df" shows that only 1.5 TB is used on the pool, residing on the HDDs -
which should fit the CRUSH rule perfectly(?)

Is this only a problem during recovery, so the cluster moves to OK after the
rebalance, or can I take any action to unblock IO on the HDD pool?
This is a pre-prod cluster, so it does not have the highest priority, but I would
appreciate being able to use it before rebalancing is completed.

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] read_fsid unparsable uuid

2018-04-26 Thread Kevin Olbrich
Hi!

Yesterday I deployed 3x SSDs as OSDs without problems, but today I get this error
when deploying an HDD with a separate WAL/DB:
stderr: 2018-04-26 11:58:19.531966 7fe57e5f5e00 -1
bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid

Command:
ceph-deploy --overwrite-conf osd create --dmcrypt --bluestore --data
/dev/sde --block-db /dev/nvme0n1p1 --block-wal /dev/nvme0n1p1
node1001.ceph01.example.com

Seems related to:
http://tracker.ceph.com/issues/15386

I am using an Intel P3700 NVMe.

Any ideas?

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Where to place Block-DB?

2018-04-26 Thread Kevin Olbrich
>>What happens im the NVMe dies?

>You lost OSDs backed by that NVMe and need to re-add them to cluster.

With the data still located on the OSD (recovery) or as a freshly formatted OSD?
Thank you.

- Kevin


2018-04-26 12:36 GMT+02:00 Serkan Çoban <cobanser...@gmail.com>:

> >On bluestore, is it safe to move both Block-DB and WAL to this journal
> NVMe?
> Yes, just specify block-db with ceph-volume and wal also use that
> partition. You can put 12-18 HDDs per NVMe
>
> >What happens im the NVMe dies?
> You lost OSDs backed by that NVMe and need to re-add them to cluster.
>
> On Thu, Apr 26, 2018 at 12:58 PM, Kevin Olbrich <k...@sv01.de> wrote:
> > Hi!
> >
> > On a small cluster I have an Intel P3700 as the journaling device for 4
> > HDDs.
> > While using filestore, I used it as journal.
> >
> > On bluestore, is it safe to move both Block-DB and WAL to this journal
> NVMe?
> > Easy maintenance is first priority (on filestore we just had to flush and
> > replace the SSD).
> >
> > What happens im the NVMe dies?
> >
> > Thank you.
> >
> > - Kevin
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Where to place Block-DB?

2018-04-26 Thread Kevin Olbrich
Hi!

On a small cluster I have an Intel P3700 as the journaling device for 4
HDDs.
While using filestore, I used it as the journal.

On bluestore, is it safe to move both Block-DB and WAL to this journal NVMe?
Easy maintenance is the first priority (on filestore we just had to flush and
replace the SSD).

What happens if the NVMe dies?

Thank you.

- Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Backup LUKS/Dmcrypt keys

2018-04-25 Thread Kevin Olbrich
Hi,

how can I back up the dmcrypt keys on Luminous?
The folder under /etc/ceph does not exist anymore.

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
I found a fix: it is *mandatory* to set the public network to the same
network the mons use.
If this is skipped while the mon has another network interface, garbage ends up
in the monmap.
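
In my case that means the relevant part of ceph.conf should end up roughly like
this (sketch based on the mon addresses quoted further down in this message):

[global]
public network = fd91:462b:4243:47e::/64
mon_host = [fd91:462b:4243:47e::1:1],[fd91:462b:4243:47e::1:2],[fd91:462b:4243:47e::1:3]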

- Kevin

2018-02-23 11:38 GMT+01:00 Kevin Olbrich <k...@sv01.de>:

> I always see this:
>
> [mon01][DEBUG ] "mons": [
> [mon01][DEBUG ]   {
> [mon01][DEBUG ] "addr": "[fd91:462b:4243:47e::1:1]:6789/0",
> [mon01][DEBUG ] "name": "mon01",
> [mon01][DEBUG ] "public_addr": "[fd91:462b:4243:47e::1:1]:6789/0",
> [mon01][DEBUG ] "rank": 0
> [mon01][DEBUG ]   },
> [mon01][DEBUG ]   {
> [mon01][DEBUG ] "addr": "0.0.0.0:0/1",
> [mon01][DEBUG ] "name": "mon02",
> [mon01][DEBUG ] "public_addr": "0.0.0.0:0/1",
> [mon01][DEBUG ] "rank": 1
> [mon01][DEBUG ]   },
> [mon01][DEBUG ]   {
> [mon01][DEBUG ] "addr": "0.0.0.0:0/2",
> [mon01][DEBUG ] "name": "mon03",
> [mon01][DEBUG ] "public_addr": "0.0.0.0:0/2",
> [mon01][DEBUG ] "rank": 2
> [mon01][DEBUG ]   }
> [mon01][DEBUG ] ]
>
>
> DNS is working fine and the hostnames are also listed in /etc/hosts.
> I already purged the mon but still the same problem.
>
> - Kevin
>
>
> 2018-02-23 10:26 GMT+01:00 Kevin Olbrich <k...@sv01.de>:
>
>> Hi!
>>
>> On a new cluster, I get the following error. All 3x mons are connected to
>> the same switch and ping between them works (firewalls disabled).
>> Mon-nodes are Ubuntu 16.04 LTS on Cep Luminous.
>>
>>
>> [ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
>> [ceph_deploy.mon][ERROR ] mon03
>> [ceph_deploy.mon][ERROR ] mon02
>> [ceph_deploy.mon][ERROR ] mon01
>>
>>
>> root@adminnode:~# cat ceph.conf
>> [global]
>> fsid = 2689defb-8715-47bb-8d78-e862089adf7a
>> ms_bind_ipv6 = true
>> mon_initial_members = mon01, mon02, mon03
>> mon_host = [fd91:462b:4243:47e::1:1],[fd91:462b:4243:47e::1:2],[fd91:
>> 462b:4243:47e::1:3]
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> public network = fdd1:ecbd:731f:ee8e::/64
>> cluster network = fd91:462b:4243:47e::/64
>>
>>
>> root@mon01:~# ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
>> default qlen 1000
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8 scope host lo
>>valid_lft forever preferred_lft forever
>> inet6 ::1/128 scope host
>>valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast
>> state UP group default qlen 1000
>> link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
>> inet 172.17.1.1/16 brd 172.17.255.255 scope global eth0
>>valid_lft forever preferred_lft forever
>> inet6 fd91:462b:4243:47e::1:1/64 scope global
>>valid_lft forever preferred_lft forever
>> inet6 fe80::baae:edff:fee9:b661/64 scope link
>>valid_lft forever preferred_lft forever
>> 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
>> default qlen 1000
>> link/ether 00:db:df:64:34:d5 brd ff:ff:ff:ff:ff:ff
>> 4: eth0.22@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
>> noqueue state UP group default qlen 1000
>> link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
>> inet6 fdd1:ecbd:731f:ee8e::1:1/64 scope global
>>valid_lft forever preferred_lft forever
>> inet6 fe80::baae:edff:fee9:b661/64 scope link
>>valid_lft forever preferred_lft forever
>>
>>
>> Don't mind wlan0, thats because this node is built from an Intel NUC.
>>
>> Any idea?
>>
>> Kind regards
>> Kevin
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
I always see this:

[mon01][DEBUG ] "mons": [
[mon01][DEBUG ]   {
[mon01][DEBUG ] "addr": "[fd91:462b:4243:47e::1:1]:6789/0",
[mon01][DEBUG ] "name": "mon01",
[mon01][DEBUG ] "public_addr": "[fd91:462b:4243:47e::1:1]:6789/0",
[mon01][DEBUG ] "rank": 0
[mon01][DEBUG ]   },
[mon01][DEBUG ]   {
[mon01][DEBUG ] "addr": "0.0.0.0:0/1",
[mon01][DEBUG ] "name": "mon02",
[mon01][DEBUG ] "public_addr": "0.0.0.0:0/1",
[mon01][DEBUG ] "rank": 1
[mon01][DEBUG ]   },
[mon01][DEBUG ]   {
[mon01][DEBUG ] "addr": "0.0.0.0:0/2",
[mon01][DEBUG ] "name": "mon03",
[mon01][DEBUG ] "public_addr": "0.0.0.0:0/2",
[mon01][DEBUG ] "rank": 2
[mon01][DEBUG ]   }
[mon01][DEBUG ] ]


DNS is working fine and the hostnames are also listed in /etc/hosts.
I already purged the mon but the problem is still the same.

- Kevin


2018-02-23 10:26 GMT+01:00 Kevin Olbrich <k...@sv01.de>:

> Hi!
>
> On a new cluster, I get the following error. All 3x mons are connected to
> the same switch and ping between them works (firewalls disabled).
> Mon-nodes are Ubuntu 16.04 LTS on Cep Luminous.
>
>
> [ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
> [ceph_deploy.mon][ERROR ] mon03
> [ceph_deploy.mon][ERROR ] mon02
> [ceph_deploy.mon][ERROR ] mon01
>
>
> root@adminnode:~# cat ceph.conf
> [global]
> fsid = 2689defb-8715-47bb-8d78-e862089adf7a
> ms_bind_ipv6 = true
> mon_initial_members = mon01, mon02, mon03
> mon_host = [fd91:462b:4243:47e::1:1],[fd91:462b:4243:47e::1:2],[
> fd91:462b:4243:47e::1:3]
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = fdd1:ecbd:731f:ee8e::/64
> cluster network = fd91:462b:4243:47e::/64
>
>
> root@mon01:~# ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
>valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast
> state UP group default qlen 1000
> link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
> inet 172.17.1.1/16 brd 172.17.255.255 scope global eth0
>valid_lft forever preferred_lft forever
> inet6 fd91:462b:4243:47e::1:1/64 scope global
>valid_lft forever preferred_lft forever
> inet6 fe80::baae:edff:fee9:b661/64 scope link
>valid_lft forever preferred_lft forever
> 3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> default qlen 1000
> link/ether 00:db:df:64:34:d5 brd ff:ff:ff:ff:ff:ff
> 4: eth0.22@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue
> state UP group default qlen 1000
> link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
> inet6 fdd1:ecbd:731f:ee8e::1:1/64 scope global
>valid_lft forever preferred_lft forever
> inet6 fe80::baae:edff:fee9:b661/64 scope link
>valid_lft forever preferred_lft forever
>
>
> Don't mind wlan0, that's because this node is built from an Intel NUC.
>
> Any idea?
>
> Kind regards
> Kevin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
 Hi!

On a new cluster, I get the following error. All 3x mons are connected to
the same switch and ping between them works (firewalls disabled).
Mon-nodes are Ubuntu 16.04 LTS on Ceph Luminous.


[ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
[ceph_deploy.mon][ERROR ] mon03
[ceph_deploy.mon][ERROR ] mon02
[ceph_deploy.mon][ERROR ] mon01


root@adminnode:~# cat ceph.conf
[global]
fsid = 2689defb-8715-47bb-8d78-e862089adf7a
ms_bind_ipv6 = true
mon_initial_members = mon01, mon02, mon03
mon_host =
[fd91:462b:4243:47e::1:1],[fd91:462b:4243:47e::1:2],[fd91:462b:4243:47e::1:3]
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = fdd1:ecbd:731f:ee8e::/64
cluster network = fd91:462b:4243:47e::/64


root@mon01:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state
UP group default qlen 1000
link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
inet 172.17.1.1/16 brd 172.17.255.255 scope global eth0
   valid_lft forever preferred_lft forever
inet6 fd91:462b:4243:47e::1:1/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::baae:edff:fee9:b661/64 scope link
   valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
default qlen 1000
link/ether 00:db:df:64:34:d5 brd ff:ff:ff:ff:ff:ff
4: eth0.22@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue
state UP group default qlen 1000
link/ether b8:ae:ed:e9:b6:61 brd ff:ff:ff:ff:ff:ff
inet6 fdd1:ecbd:731f:ee8e::1:1/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::baae:edff:fee9:b661/64 scope link
   valid_lft forever preferred_lft forever


Don't mind wlan0, that's because this node is built from an Intel NUC.

Any idea?

Kind regards
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Kevin Olbrich
2018-02-08 11:20 GMT+01:00 Martin Emrich :

> I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally,
> running linux-generic-hwe-16.04 (4.13.0-32-generic).
>
> Works fine, except that it does not support the latest features: I had to
> disable exclusive-lock,fast-diff,object-map,deep-flatten on the image.
> Otherwise it runs well.
>

I always thought the latest features were built into newer kernels; are
they available on non-HWE 4.4, HWE 4.8 or HWE 4.10?
I am also researching this for the OSD server side.
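
For reference, disabling that feature set on an existing image looks
roughly like this (pool and image names are placeholders; depending on the
rbd version the features may need to be disabled one at a time, dependent
features first):

rbd info rbd/vm-disk-1
rbd feature disable rbd/vm-disk-1 fast-diff object-map exclusive-lock deep-flatten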

- Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-07 Thread Kevin Olbrich
Would be interested as well.

- Kevin

2018-02-04 19:00 GMT+01:00 Yoann Moulin :

> Hello,
>
> What is the best kernel for Luminous on Ubuntu 16.04 ?
>
> Is linux-image-virtual-lts-xenial still the best one ? Or
> linux-virtual-hwe-16.04 will offer some improvement ?
>
> Thanks,
>
> --
> Yoann Moulin
> EPFL IC-IT
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
Running the following after prepare and a reboot, "solves" this problem.

[root@osd01 ~]# partx -v -a /dev/mapper/mpatha
partition: none, disk: /dev/mapper/mpatha, lower: 0, upper: 0
/dev/mapper/mpatha: partition table type 'gpt' detected
partx: /dev/mapper/mpatha: adding partition #1 failed: Invalid argument
partx: /dev/mapper/mpatha: adding partition #2 failed: Invalid argument
partx: /dev/mapper/mpatha: error adding partitions 1-2


The disk is then activated and the OSD comes up and in. It seems like the
partition UUIDs were not correctly imported into the kernel.
Even though partx states that partitions 1-2 were not added, they are in
fact there (this disk has only two partitions).
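
If anyone wants to trigger the same thing without a reboot: for
dm-multipath devices the partition mappings can usually also be refreshed
with kpartx (device name as above; whether it behaves better than partx
here is an assumption on my side):

kpartx -av /dev/mapper/mpatha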

Should I open a bug?

Kind regards,
Kevin

2018-02-04 19:05 GMT+01:00 Kevin Olbrich <k...@sv01.de>:

> I also noticed there are no folders under /var/lib/ceph/osd/ ...
>
>
> Mit freundlichen Grüßen / best regards,
> Kevin Olbrich.
>
> 2018-02-04 19:01 GMT+01:00 Kevin Olbrich <k...@sv01.de>:
>
>> Hi!
>>
>> Currently I try to re-deploy a cluster from filestore to bluestore.
>> I zapped all disks (multiple times) but I fail adding a disk array:
>>
>> Prepare:
>>
>>> ceph-deploy --overwrite-conf osd prepare --bluestore --block-wal
>>> /dev/sdb --block-db /dev/sdb osd01.cloud.example.local:/dev
>>> /mapper/mpatha
>>
>>
>> Activate:
>>
>>> ceph-deploy --overwrite-conf osd activate osd01.cloud.example.local:/dev
>>> /mapper/mpatha1
>>
>>
>> Error on activate:
>>
>>> [osd01.cloud.example.local][WARNIN] got monmap epoch 2
>>> [osd01.cloud.example.local][WARNIN] command_check_call: Running
>>> command: /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap
>>> /var/lib/ceph/tmp/mnt.pAfCl4/activate.monmap --osd-data
>>> /var/lib/ceph/tmp/mnt.pAfCl4 --osd-uuid d5b6ab85-9437-4cb2-a34d-16a29067ba27
>>> --setuser ceph --setgroup ceph
>>>
>>> *[osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900368
>>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block)
>>> _read_bdev_label failed to open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No
>>> such file or directory[osd01.cloud.example.local][WARNIN] 2018-02-04
>>> 18:52:43.900405 7f00d6359d00 -1
>>> bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block) _read_bdev_label failed to
>>> open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No such file or directory*
>>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900462
>>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4)
>>> _setup_block_symlink_or_file failed to open block file: (13) Permission
>>> denied
>>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900480
>>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4) mkfs failed,
>>> (13) Permission denied
>>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900485
>>> 7f00d6359d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (13)
>>> Permission denied
>>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900662
>>> 7f00d6359d00 -1  ** ERROR: error creating empty object store in
>>> /var/lib/ceph/tmp/mnt.pAfCl4: (13) Permission denied
>>> [osd01.cloud.example.local][WARNIN] mount_activate: Failed to activate
>>> [osd01.cloud.example.local][WARNIN] unmount: Unmounting
>>> /var/lib/ceph/tmp/mnt.pAfCl4
>>>
>>
>>
>> Same problem on 2x 14 disks. I was unable to get this cluster up.
>>
>> Any ideas?
>>
>> Kind regards,
>> Kevin
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
I also noticed there are no folders under /var/lib/ceph/osd/ ...


Mit freundlichen Grüßen / best regards,
Kevin Olbrich.

2018-02-04 19:01 GMT+01:00 Kevin Olbrich <k...@sv01.de>:

> Hi!
>
> Currently I try to re-deploy a cluster from filestore to bluestore.
> I zapped all disks (multiple times) but I fail adding a disk array:
>
> Prepare:
>
>> ceph-deploy --overwrite-conf osd prepare --bluestore --block-wal /dev/sdb
>> --block-db /dev/sdb osd01.cloud.example.local:/dev/mapper/mpatha
>
>
> Activate:
>
>> ceph-deploy --overwrite-conf osd activate osd01.cloud.example.local:/
>> dev/mapper/mpatha1
>
>
> Error on activate:
>
>> [osd01.cloud.example.local][WARNIN] got monmap epoch 2
>> [osd01.cloud.example.local][WARNIN] command_check_call: Running command:
>> /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap
>> /var/lib/ceph/tmp/mnt.pAfCl4/activate.monmap --osd-data
>> /var/lib/ceph/tmp/mnt.pAfCl4 --osd-uuid d5b6ab85-9437-4cb2-a34d-16a29067ba27
>> --setuser ceph --setgroup ceph
>>
>> *[osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900368
>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block)
>> _read_bdev_label failed to open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No
>> such file or directory[osd01.cloud.example.local][WARNIN] 2018-02-04
>> 18:52:43.900405 7f00d6359d00 -1
>> bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block) _read_bdev_label failed to
>> open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No such file or directory*
>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900462
>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4)
>> _setup_block_symlink_or_file failed to open block file: (13) Permission
>> denied
>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900480
>> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4) mkfs failed,
>> (13) Permission denied
>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900485
>> 7f00d6359d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (13)
>> Permission denied
>> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900662
>> 7f00d6359d00 -1  ** ERROR: error creating empty object store in
>> /var/lib/ceph/tmp/mnt.pAfCl4: (13) Permission denied
>> [osd01.cloud.example.local][WARNIN] mount_activate: Failed to activate
>> [osd01.cloud.example.local][WARNIN] unmount: Unmounting
>> /var/lib/ceph/tmp/mnt.pAfCl4
>>
>
>
> Same problem on 2x 14 disks. I was unable to get this cluster up.
>
> Any ideas?
>
> Kind regards,
> Kevin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
Hi!

Currently I try to re-deploy a cluster from filestore to bluestore.
I zapped all disks (multiple times) but I fail adding a disk array:

Prepare:

> ceph-deploy --overwrite-conf osd prepare --bluestore --block-wal /dev/sdb
> --block-db /dev/sdb osd01.cloud.example.local:/dev/mapper/mpatha


Activate:

> ceph-deploy --overwrite-conf osd activate osd01.cloud.example
> .local:/dev/mapper/mpatha1


Error on activate:

> [osd01.cloud.example.local][WARNIN] got monmap epoch 2
> [osd01.cloud.example.local][WARNIN] command_check_call: Running command:
> /usr/bin/ceph-osd --cluster ceph --mkfs -i 0 --monmap
> /var/lib/ceph/tmp/mnt.pAfCl4/activate.monmap --osd-data
> /var/lib/ceph/tmp/mnt.pAfCl4 --osd-uuid
> d5b6ab85-9437-4cb2-a34d-16a29067ba27 --setuser ceph --setgroup ceph
>
> *[osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900368
> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block)
> _read_bdev_label failed to open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No
> such file or directory[osd01.cloud.example.local][WARNIN] 2018-02-04
> 18:52:43.900405 7f00d6359d00 -1
> bluestore(/var/lib/ceph/tmp/mnt.pAfCl4/block) _read_bdev_label failed to
> open /var/lib/ceph/tmp/mnt.pAfCl4/block: (2) No such file or directory*
> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900462
> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4)
> _setup_block_symlink_or_file failed to open block file: (13) Permission
> denied
> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900480
> 7f00d6359d00 -1 bluestore(/var/lib/ceph/tmp/mnt.pAfCl4) mkfs failed, (13)
> Permission denied
> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900485
> 7f00d6359d00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (13)
> Permission denied
> [osd01.cloud.example.local][WARNIN] 2018-02-04 18:52:43.900662
> 7f00d6359d00 -1  ** ERROR: error creating empty object store in
> /var/lib/ceph/tmp/mnt.pAfCl4: (13) Permission denied
> [osd01.cloud.example.local][WARNIN] mount_activate: Failed to activate
> [osd01.cloud.example.local][WARNIN] unmount: Unmounting
> /var/lib/ceph/tmp/mnt.pAfCl4
>


Same problem on 2x 14 disks. I was unable to get this cluster up.

Any ideas?
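
One thing I still want to rule out myself (just a guess, not a confirmed
diagnosis): since the OSD daemon runs as the ceph user, the "Permission
denied" on the block file could simply mean the device nodes are still
owned by root; the stock udev rules do not always cover dm-multipath
partitions. Roughly:

ls -l /dev/mapper/mpatha* /dev/sdb*
chown ceph:ceph /dev/mapper/mpatha1 /dev/mapper/mpatha2 /dev/sdb*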

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
2018-02-02 12:44 GMT+01:00 Richard Hesketh <richard.hesk...@rd.bbc.co.uk>:

> On 02/02/18 08:33, Kevin Olbrich wrote:
> > Hi!
> >
> > I am planning a new Flash-based cluster. In the past we used SAMSUNG
> PM863a 480G as journal drives in our HDD cluster.
> > After a lot of tests with luminous and bluestore on HDD clusters, we
> plan to re-deploy our whole RBD pool (OpenNebula cloud) using these disks.
> >
> > As far as I understand, it would be best to skip journaling / WAL and
> just deploy every OSD 1-by-1. This would have the following pro's (correct
> me, if I am wrong):
> > - maximum performance as the journal is spread across all devices
> > - a lost drive does not affect any other drive
> >
> > Currently we are on CentOS 7 with elrepo 4.4.x-kernel. We plan to
> migrate to Ubuntu 16.04.3 with HWE (kernel 4.10).
> > Clients will be Fedora 27 + OpenNebula.
> >
> > Any comments?
> >
> > Thank you.
> >
> > Kind regards,
> > Kevin
>
> There is only a real advantage to separating the DB/WAL from the main data
> if they're going to be hosted on a device which is appreciably faster than
> the main storage. Since you're going all SSD, it makes sense to deploy each
> OSD all-in-one; as you say, you don't bottleneck on any one disk, and it
> also offers you more maintenance flexibility as you will be able to easily
> move OSDs between hosts if required. If you wanted to start pushing
> performance more, you'd be looking at putting NVMe disks in your hosts for
> DB/WAL.
>

We got some Intel P3700 NVMe (PCIe) disks, but each host will be serving 10
OSDs, and the combined sync speed of the Samsungs was better than this
single NVMe (we only ran some short fio benchmarks, not a real Ceph test,
so this could look different now).
If performance is only slightly better, sticking to a single-OSD failure
domain is better for maintenance, as this new cluster will not be monitored
24/7 by our staff while the migration is in progress.
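
For the record, a collocated all-in-one bluestore OSD (DB and WAL on the
same device) would be created roughly like this with ceph-deploy on
Luminous (host and device names below are just placeholders):

ceph-deploy --overwrite-conf osd prepare --bluestore osd01.example.local:/dev/sdc
ceph-deploy --overwrite-conf osd activate osd01.example.local:/dev/sdc1

Simply leaving out --block-db/--block-wal should keep everything on the one
device.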


> FYI, the 16.04 HWE kernel has currently rolled on over to 4.13.
>

Has anyone tested this kernel branch with Ceph? Any performance impact? If I
understood the docs correctly, Ubuntu is a well-tested platform for Ceph, so
this should already have been tested (?).


>   May I ask why are you using EL repo with centos?
> AFAIK, Redhat is backporting all ceph features to 3.10 kernels. Am I
> wrong?
>

Before we moved from OpenStack to OpenNebula in early 2017, we had some
problems with krbd / fuse (missing features, etc.).
We then decided to move from 3.10 to 4.4, which solved all problems, and we
noticed a small performance improvement.
Maybe these problems are solved by now; we hit them when we rolled out
Mitaka.
We have not changed our deployment scripts since then, that's why we are
still on kernel-ml.

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
Hi!

I am planning a new Flash-based cluster. In the past we used SAMSUNG PM863a
480G as journal drives in our HDD cluster.
After a lot of tests with luminous and bluestore on HDD clusters, we plan
to re-deploy our whole RBD pool (OpenNebula cloud) using these disks.

As far as I understand, it would be best to skip journaling / WAL and just
deploy every OSD 1-by-1. This would have the following pro's (correct me,
if I am wrong):
- maximum performance as the journal is spread across all devices
- a lost drive does not affect any other drive

Currently we are on CentOS 7 with elrepo 4.4.x-kernel. We plan to migrate
to Ubuntu 16.04.3 with HWE (kernel 4.10).
Clients will be Fedora 27 + OpenNebula.

Any comments?

Thank you.

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding a new node to a small cluster (size = 2)

2017-05-31 Thread Kevin Olbrich
Hi!

A customer is running a small two node ceph cluster with 14 disks each.
He has min_size 1 and size 2 and it is only used for backups.

If we add a third member with 14 identical disks and keep size = 2,
replicas should be distributed evenly, right?
Or is an uneven count of hosts inadvisable or simply not workable?
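
If it helps, whether replicas end up on different hosts can be checked
directly from the CRUSH rule (read-only, just a sketch of what to look
for):

ceph osd tree
ceph osd crush rule dump

With the default rule the "chooseleaf" step uses type "host", so with three
hosts and size = 2 the two copies should always land on two different
hosts.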

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
Hi,

It seems that I found the cause. The disk array was used for ZFS before and
was not wiped.
I zapped the disks with sgdisk and via ceph, but a "zfs_member" signature
was still somewhere on the disk.
Wiping the disk (wipefs -a -f /dev/mapper/mpatha), running "ceph osd create
--zap-disk" twice until an entry showed up in "df", and a reboot fixed it.

Then OSDs were failing again. Cause: IPv6 DAD on bond-interface. Disabled
via sysctl.
Reboot and voila, cluster immediately online.
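
For reference, the DAD change was along these lines (the interface name
here is just an example for the bond in question):

sysctl -w net.ipv6.conf.bond0.accept_dad=0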

Kind regards,
Kevin.

2017-05-16 16:59 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> HI!
>
> Currently I am deploying a small cluster with two nodes. I installed ceph
> jewel on all nodes and made a basic deployment.
> After "ceph osd create..." I am now getting "Failed to start Ceph disk
> activation: /dev/dm-18" on boot. All 28 OSDs were never active.
> This server has a 14 disk JBOD with 4x fiber using multipath (4x active
> multibus). We have two servers.
>
> OS: Latest CentOS 7
>
> [root@osd01 ~]# ceph -v
>> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>
>
> Command run:
>
>> ceph-deploy osd create osd01.example.local:/dev/mappe
>> r/mpatha:/dev/disk/by-partlabel/journal01
>
>
> There is no error in journalctl, just that the unit failed:
>
>> May 16 16:47:33 osd01.example.local systemd[1]: Failed to start Ceph disk
>> activation: /dev/dm-27.
>> May 16 16:47:33 osd01.example.local systemd[1]: ceph-disk@dev-dm
>> \x2d27.service: main process exited, code=exited, status=124/n/a
>> May 16 16:47:33 osd01.example.local systemd[1]: 
>> ceph-disk@dev-dm\x2d24.service
>> failed.
>> May 16 16:47:33 osd01.example.local systemd[1]: Unit 
>> ceph-disk@dev-dm\x2d24.service
>> entered failed state.
>
>
> [root@osd01 ~]# gdisk -l /dev/mapper/mpatha
>> GPT fdisk (gdisk) version 0.8.6
>> Partition table scan:
>>   MBR: protective
>>   BSD: not present
>>   APM: not present
>>   GPT: present
>> Found valid GPT with protective MBR; using GPT.
>> Disk /dev/mapper/mpatha: 976642095 sectors, 465.7 GiB
>> Logical sector size: 512 bytes
>> Disk identifier (GUID): DEF0B782-3B7F-4AF5-A0CB-9E2B96C40B13
>> Partition table holds up to 128 entries
>> First usable sector is 34, last usable sector is 976642061
>> Partitions will be aligned on 2048-sector boundaries
>> Total free space is 2014 sectors (1007.0 KiB)
>> Number  Start (sector)End (sector)  Size   Code  Name
>>12048   976642061   465.7 GiB     ceph data
>
>
> I had problems with multipath in the past when running ceph but this time
> I was unable to solve the problem.
> Any ideas?
>
> Kind regards,
> Kevin.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
HI!

Currently I am deploying a small cluster with two nodes. I installed ceph
jewel on all nodes and made a basic deployment.
After "ceph osd create..." I am now getting "Failed to start Ceph disk
activation: /dev/dm-18" on boot. None of the 28 OSDs ever became active.
This server has a 14 disk JBOD with 4x fiber using multipath (4x active
multibus). We have two servers.

OS: Latest CentOS 7

[root@osd01 ~]# ceph -v
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)


Command run:

> ceph-deploy osd create
> osd01.example.local:/dev/mapper/mpatha:/dev/disk/by-partlabel/journal01


There is no error in journalctl, just that the unit failed:

> May 16 16:47:33 osd01.example.local systemd[1]: Failed to start Ceph disk
> activation: /dev/dm-27.
> May 16 16:47:33 osd01.example.local systemd[1]: 
> ceph-disk@dev-dm\x2d27.service:
> main process exited, code=exited, status=124/n/a
> May 16 16:47:33 osd01.example.local systemd[1]: ceph-disk@dev-dm\x2d24.service
> failed.
> May 16 16:47:33 osd01.example.local systemd[1]: Unit 
> ceph-disk@dev-dm\x2d24.service
> entered failed state.


[root@osd01 ~]# gdisk -l /dev/mapper/mpatha
> GPT fdisk (gdisk) version 0.8.6
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
> Found valid GPT with protective MBR; using GPT.
> Disk /dev/mapper/mpatha: 976642095 sectors, 465.7 GiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): DEF0B782-3B7F-4AF5-A0CB-9E2B96C40B13
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 976642061
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 2014 sectors (1007.0 KiB)
> Number  Start (sector)End (sector)  Size   Code  Name
>12048   976642061   465.7 GiB     ceph data


I had problems with multipath in the past when running ceph but this time I
was unable to solve the problem.
Any ideas?

Kind regards,
Kevin.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-22 Thread Kevin Olbrich
2017-03-22 5:30 GMT+01:00 Brad Hubbard <bhubb...@redhat.com>:

> On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu <dna...@nvidia.com> wrote:
> > Do we know which version of ceph client does this bug has a fix. Bug:
> > http://tracker.ceph.com/issues/17191
> >
> >
> >
> > I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common-
> > 10.2.6-1(Ubuntu 14.04.5)
>
> ceph-client is the repository for the ceph kernel client (kernel modules).
>
> The commits referenced in the tracker above went into upstream kernel
> 4.9-rc1.
>
> https://lkml.org/lkml/2016/10/8/110
>
> I doubt these are available in any CentOS 7.x kernel yet but you could
> check the source.
>

If it is in 4.9-rc1, it could also be in 4.10.4.
We are using kernel-lt (4.4.x) for our clusters but there is also mainline
in elrepo:
https://elrepo.org/linux/kernel/el7/x86_64/RPMS/

I did not test 4.10.x with ceph but 4.4.x with rbd and kvm works well for
us.


>
> >
> >
> >
> > --
> >
> > Deepak
> >
> > 
> > This email message is for the sole use of the intended recipient(s) and
> may
> > contain confidential information.  Any unauthorized review, use,
> disclosure
> > or distribution is prohibited.  If you are not the intended recipient,
> > please contact the sender by reply email and destroy all copies of the
> > original message.
> > 
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Cheers,
> Brad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Kind regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-08 Thread Kevin Olbrich
Hi!

Currently I have a cluster with 6 OSDs (5 hosts, 7TB RAID6 each).
We want to shut down the cluster but it holds some semi-productive VMs we
might or might not need in the future.
To keep them, we would like to shrink our cluster from 6 to 2 OSDs (we use
size 2 and min_size 1).

Should I set the OSDs out one by one, or all at once with the nobackfill
and norecover flags set?
If the latter is the case, which other flags should be set as well?
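
For context, the removal sequence I have in mind per OSD is roughly this
(OSD id 5 is just an example):

ceph osd out 5
# wait until recovery is done and the cluster is HEALTH_OK, then:
ceph osd crush remove osd.5
ceph auth del osd.5
ceph osd rm 5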

Thanks!

Kind regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Kevin Olbrich
In all cases, VMs were
fully functional.

Currently we are migrating most VMs out of the cluster to shut it down (we
had some semi-productive VMs on it to get real world usage stats).

I just wanted to let you know which problems we had with Ceph on ZFS. No
doubt we made a lot of mistakes (this was our first Ceph cluster) but we
had a lot of tests running on it and would not recommand to use ZFS as the
backend.

And for those interested in monitoring this type of cluster: do not use
Munin. Because the disks were spinning at 100% and each disk is seen three
times (2 paths combined into one mpath device), I caused a deadlock
resulting in 3/4 of the nodes going offline (one of the disasters after
which we had Ceph repair everything).

I hope this helps all Ceph users who are interested in the idea of running
Ceph on ZFS.

Kind regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What happens if all replica OSDs journals are broken?

2016-12-14 Thread Kevin Olbrich
2016-12-14 2:37 GMT+01:00 Christian Balzer <ch...@gol.com>:

>
> Hello,
>

Hi!


>
> On Wed, 14 Dec 2016 00:06:14 +0100 Kevin Olbrich wrote:
>
> > Ok, thanks for your explanation!
> > I read those warnings about size 2 + min_size 1 (we are using ZFS as
> RAID6,
> > called zraid2) as OSDs.
> >
> This is similar to my RAID6 or RAID10 backed OSDs with regards to having
> very resilient, extremely unlikely to fail OSDs.
>

This was our intention (unlikely to fail, data security > performance).
We use Ceph for OpenStack (Cinder RBD).


> As such a Ceph replication of 2 with min_size is a calculated risk,
> acceptable for me on others in certain use cases.
> This is also with very few (2-3) journals per SSD.
>

We are running 14x 500G RAID6 ZFS-RAID per Host (1x journal, 1x OSD, 32GB
RAM).
The ZFS pools use L2ARC-Cache on Samsung 850 PRO's 128GB.
Hint: that was a bad idea, we would have been better off splitting the ZFS
pools. (ZFS performance was very good, but double parity with 4k random
sync writes under Ceph takes very long, resulting in "XXX requests blocked
more than 32 seconds".)
Currently I am waiting for a lab cluster to test "osd op threads" for these
single OSD hosts.


> If:
>
> 1. Your journal SSDs are well trusted and monitored (Intel DC S36xx, 37xx)
>

Indeed, Intel DC P3700 400GB for Ceph. We had Samsung 850 PROs before I
learned that 4k random writes with DSYNC are a very bad idea... ;-)
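
For anyone who wants to reproduce this, a minimal single-threaded sync
write test looks roughly like this (destructive on the target device;
/dev/sdX is a placeholder):

fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based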

> 2. Your failure domain represented by a journal SSD is small enough
> (meaning that replicating the lost OSDs can be done quickly)
>

OSDs are rather large but we are "just" using 8 TB (size 2) in the whole
cluster (OSD is 24% full).
Before we moved from infernalis to jewel, a recovery from an OSD which was
offline for 8 hours took approx. one hour to be back in sync.

> it may be an acceptable risk for you as well.


We got reliable backups in the past but downtime is a greater problem.


>
>
> Time to raise replication!
> >
> If you can afford that (money, space, latency), definitely go for it.
>

It's more the double journal failure which scares me compared to the OSD
itself (as ZFS was very reliable in the past).


Kevin


> Christian
> > Kevin
> >
> > 2016-12-13 0:00 GMT+01:00 Christian Balzer <ch...@gol.com>:
> >
> > > On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
> > >
> > > > Hi,
> > > >
> > > > just in case: What happens when all replica journal SSDs are broken
> at
> > > once?
> > > >
> > > That would be bad, as in BAD.
> > >
> > > In theory you just "lost" all the associated OSDs and their data.
> > >
> > > In practice everything but in the in-flight data at the time is still
> on
> > > the actual OSDs (HDDs), but it's inconsistent and inaccessible as far
> as
> > > Ceph is concerned.
> > >
> > > So with some trickery and an experienced data-recovery Ceph consultant
> you
> > > _may_ get things running with limited data loss/corruption, but that's
> > > speculation and may be wishful thinking on my part.
> > >
> > > Another data point to deploy only well known/monitored/trusted SSDs and
> > > have a 3x replication.
> > >
> > > > The PGs most likely will be stuck inactive but as I read, the
> journals
> > > just
> > > > need to be replaced (http://ceph.com/planet/ceph-
> recover-osds-after-ssd-
> > > > journal-failure/).
> > > >
> > > > Does this also work in this case?
> > > >
> > > Not really, no.
> > >
> > > The above works by having still a valid state and operational OSDs from
> > > which the "broken" one can recover.
> > >
> > > Christian
> > > --
> > > Christian BalzerNetwork/Systems Engineer
> > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
> > >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What happens if all replica OSDs journals are broken?

2016-12-13 Thread Kevin Olbrich
Ok, thanks for your explanation!
I read those warnings about size 2 + min_size 1 (we are using ZFS
double-parity raidz2, the ZFS equivalent of RAID6, as OSDs).
Time to raise replication!

Kevin

2016-12-13 0:00 GMT+01:00 Christian Balzer <ch...@gol.com>:

> On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
>
> > Hi,
> >
> > just in case: What happens when all replica journal SSDs are broken at
> once?
> >
> That would be bad, as in BAD.
>
> In theory you just "lost" all the associated OSDs and their data.
>
> In practice everything but in the in-flight data at the time is still on
> the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
> Ceph is concerned.
>
> So with some trickery and an experienced data-recovery Ceph consultant you
> _may_ get things running with limited data loss/corruption, but that's
> speculation and may be wishful thinking on my part.
>
> Another data point to deploy only well known/monitored/trusted SSDs and
> have a 3x replication.
>
> > The PGs most likely will be stuck inactive but as I read, the journals
> just
> > need to be replaced (http://ceph.com/planet/ceph-recover-osds-after-ssd-
> > journal-failure/).
> >
> > Does this also work in this case?
> >
> Not really, no.
>
> The above works by having still a valid state and operational OSDs from
> which the "broken" one can recover.
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What happens if all replica OSDs journals are broken?

2016-12-12 Thread Kevin Olbrich
Hi,

just in case: What happens when all replica journal SSDs are broken at once?
The PGs most likely will be stuck inactive but as I read, the journals just
need to be replaced (http://ceph.com/planet/ceph-recover-osds-after-ssd-
journal-failure/).

Does this also work in this case?

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Kevin Olbrich
Does Ceph accept this OSD if the other (newer) replica is down?
In that case I would assume that my cluster is instantly broken when rack
_after_ rack fails (power outage) and I start them again in random order.
We have at least one MON on a stand-alone UPS to resolve such an issue - I
just assumed this is safe regardless of a full outage.
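
If I read Wido's advice correctly, the practical handling would be
something like the following, with the second command reserved for a real
emergency and reverted afterwards (the pool name is a placeholder):

ceph osd pool set rbd min_size 2
# emergency only, revert as soon as possible:
ceph osd pool set rbd min_size 1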


Mit freundlichen Grüßen / best regards,
Kevin Olbrich.

2016-12-07 21:10 GMT+01:00 Wido den Hollander <w...@42on.com>:

>
> > Op 7 december 2016 om 21:04 schreef "Will.Boege" <will.bo...@target.com
> >:
> >
> >
> > Hi Wido,
> >
> > Just curious how blocking IO to the final replica provides protection
> from data loss?  I’ve never really understood why this is a Ceph best
> practice.  In my head all 3 replicas would be on devices that have roughly
> the same odds of physically failing or getting logically corrupted in any
> given minute.  Not sure how blocking IO prevents this.
> >
>
> Say, disk #1 fails and you have #2 and #3 left. Now #2 fails leaving only
> #3 left.
>
> By block you know that #2 and #3 still have the same data. Although #2
> failed it could be that it is the host which went down but the disk itself
> is just fine. Maybe the SATA cable broke, you never know.
>
> If disk #3 now fails you can still continue your operation if you bring #2
> back. It has the same data on disk as #3 had before it failed. Since you
> didn't allow for any I/O on #3 when #2 went down earlier.
>
> If you would have accepted writes on #3 while #1 and #2 were gone you have
> invalid/old data on #2 by the time it comes back.
>
> Writes were made on #3 but that one really broke down. You managed to get
> #2 back, but it doesn't have the changes which #3 had.
>
> The result is corrupted data.
>
> Does this make sense?
>
> Wido
>
> > On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER" <
> ceph-users-boun...@lists.ceph.com on behalf of loic.devul...@mpsa.com>
> wrote:
> >
> > > -Message d'origine-
> > > De : Wido den Hollander [mailto:w...@42on.com]
> > > Envoyé : mercredi 7 décembre 2016 16:01
> > > À : ceph-us...@ceph.com; LOIC DEVULDER - U329683 <
> loic.devul...@mpsa.com>
> > > Objet : RE: [ceph-users] 2x replication: A BIG warning
> > >
> > >
> > > > Op 7 december 2016 om 15:54 schreef LOIC DEVULDER
> > > <loic.devul...@mpsa.com>:
> > > >
> > > >
> > > > Hi Wido,
> > > >
> > > > > As a Ceph consultant I get numerous calls throughout the year
> to
> > > > > help people with getting their broken Ceph clusters back
> online.
> > > > >
> > > > > The causes of downtime vary vastly, but one of the biggest
> causes is
> > > > > that people use replication 2x. size = 2, min_size = 1.
> > > >
> > > > We are building a Ceph cluster for our OpenStack and for data
> integrity
> > > reasons we have chosen to set size=3. But we want to continue to
> access
> > > data if 2 of our 3 osd server are dead, so we decided to set
> min_size=1.
> > > >
> > > > Is it a (very) bad idea?
> > > >
> > >
> > > I would say so. Yes, downtime is annoying on your cloud, but data
> loss if
> > > even worse, much more worse.
> > >
> > > I would always run with min_size = 2 and manually switch to
> min_size = 1
> > > if the situation really requires it at that moment.
> > >
> > > Losing two disks at the same time is something which doesn't
> happen that
> > > much, but if it happens you don't want to modify any data on the
> only copy
> > > which you still have left.
> > >
> > > Setting min_size to 1 should be a manual action imho when size = 3
> and you
> > > loose two copies. In that case YOU decide at that moment if it is
> the
> > > right course of action.
> > >
> > > Wido
> >
> > Thanks for your quick response!
> >
> > That makes sense, I will try to convince my colleagues :-)
> >
> > Loic
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying new OSDs in parallel or one after another

2016-11-28 Thread Kevin Olbrich
I need to note that I already have 5 hosts with one OSD each.


Mit freundlichen Grüßen / best regards,
Kevin Olbrich.

2016-11-28 10:02 GMT+01:00 Kevin Olbrich <k...@sv01.de>:

> Hi!
>
> I want to deploy two nodes with 4 OSDs each. I already prepared OSDs and
> only need to activate them.
> What is better? One by one or all at once?
>
> Kind regards,
> Kevin.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Deploying new OSDs in parallel or one after another

2016-11-28 Thread Kevin Olbrich
Hi!

I want to deploy two nodes with 4 OSDs each. I already prepared OSDs and
only need to activate them.
What is better? One by one or all at once?
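
In case it matters for the answer: my current plan (an assumption on my
side, not something I have verified as best practice) is to set the
rebalance/backfill flags, activate all OSDs, and then unset the flags:

ceph osd set norebalance
ceph osd set nobackfill
# activate the new OSDs, then:
ceph osd unset nobackfill
ceph osd unset norebalance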

Kind regards,
Kevin.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread Kevin Olbrich
Hi,

we are running 80 VMs using KVM in OpenStack via RBD in Ceph Jewel on a
total of 53 disks (RAID parity already excluded).
Our nodes are using Intel P3700 DC-SSDs for journaling.

Most VMs are linux based and load is low to medium. There are also about 10
VMs running Windows 2012R2, two of them run remote services (terminal).

My question is: are 80 VMs hosted on 53 disks (mostly 7.2k SATA) too much?
We sometimes experience lags where nearly all servers suffer from "blocked
IO > 32 seconds".
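
When it happens, these are the checks I would run to find the affected OSDs
(osd.12 is just an example id; the daemon command runs on that OSD's host):

ceph health detail
ceph osd perf
ceph daemon osd.12 dump_historic_ops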

What are your experiences?

Mit freundlichen Grüßen / best regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] degraded objects after osd add

2016-11-23 Thread Kevin Olbrich
Hi,

what happens when size = 2 and some objects are in a degraded state?
This sounds like an easy way to lose data if the old but still active OSD
fails while recovery is in progress.

It would make more sense to have the PG replicate first and only then
remove the PG from the old OSD.

Mit freundlichen Grüßen / best regards,
Kevin Olbrich.

>
>  Original Message 
> Subject: Re: [ceph-users] degraded objects after osd add (17-Nov-2016 9:14)
> From:Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>
> To:  c...@dolphin-it.de
>
> Hi,
>
>
> On 11/17/2016 08:07 AM, Steffen Weißgerber wrote:
> > Hello,
> >
> > just for understanding:
> >
> > When starting to fill osd's with data due to setting the weight from 0
> to the normal value
> > the ceph status displays degraded objects (>0.05%).
> >
> > I don't understand the reason for this because there's no storage
> revoked from the cluster,
> > only added. Therefore only the displayed object displacement makes sense.
> If you just added a new OSD, a number of PGs will be backfilling or
> waiting for backfilling (the remapped ones). I/O to these PGs is not
> blocked, and thus object may be modified. AFAIK these objects show up as
> degraded.
>
> I'm not sure how ceph handles these objects, e.g. whether it writes them
> to the old OSDs assigned to the PG, or whether they are put on the new OSD
> already, even if the corresponding PG is waiting for backfilling.
>
> Nonetheless the degraded objects will be cleaned up during backfilling.
>
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How are replicas spread in default crush configuration?

2016-11-23 Thread Kevin Olbrich
Hi,

just to make sure, as I did not find a reference in the docs:
Are replicas spread across hosts or "just" OSDs?

I am using a 5 OSD cluster (4 pools, 128 pgs each) with size = 2. Currently
each OSD is a ZFS backed storage array.
Now I installed a server which is planned to host 4x OSDs (and setting size
to 3).

I want to make sure we can resist two offline hosts (in terms of hardware).
Is my assumption correct?
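
For completeness, this is how I would check it myself, though I am not sure
I am reading the rule correctly (hence the question):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

In crushmap.txt the default replicated rule should contain a line like
"step chooseleaf firstn 0 type host", which as far as I understand means
replicas are spread across hosts, not just OSDs.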

Mit freundlichen Grüßen / best regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com