[ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Philippe D'Anjou
Hi,
the docs say the upmap mode tries to achieve a perfect distribution, i.e. an equal number of PGs per OSD. This is what I get (v14.2.4):
 ID CLASS  WEIGHT REWEIGHT  SIZE  RAW USE  DATA    OMAP   META   AVAIL  %USE  VAR PGS STATUS
  0   ssd 3.49219  1.0 3.5 TiB 794 GiB 753 GiB  38 GiB 3.4 GiB 2.7 TiB 22.20 0.32  82 up
  1   ssd 3.49219  1.0 3.5 TiB 800 GiB 751 GiB  45 GiB 3.7 GiB 2.7 TiB 22.37 0.33  84 up
  2   ssd 3.49219  1.0 3.5 TiB 846 GiB 792 GiB  50 GiB 3.6 GiB 2.7 TiB 23.66 0.35  88 up
  3   ssd 3.49219  1.0 3.5 TiB 812 GiB 776 GiB  33 GiB 3.3 GiB 2.7 TiB 22.71 0.33  85 up
  4   ssd 3.49219  1.0 3.5 TiB 768 GiB 730 GiB  34 GiB 4.1 GiB 2.7 TiB 21.47 0.31  83 up
  6   ssd 3.49219  1.0 3.5 TiB 765 GiB 731 GiB  31 GiB 3.3 GiB 2.7 TiB 21.40 0.31  82 up
  8   ssd 3.49219  1.0 3.5 TiB 872 GiB 828 GiB  41 GiB 3.2 GiB 2.6 TiB 24.40 0.36  85 up
 10   ssd 3.49219  1.0 3.5 TiB 789 GiB 743 GiB  42 GiB 3.3 GiB 2.7 TiB 22.05 0.32  82 up
  5   ssd 3.49219  1.0 3.5 TiB 719 GiB 683 GiB  32 GiB 3.9 GiB 2.8 TiB 20.12 0.29  78 up
  7   ssd 3.49219  1.0 3.5 TiB 741 GiB 698 GiB  39 GiB 3.8 GiB 2.8 TiB 20.73 0.30  79 up
  9   ssd 3.49219  1.0 3.5 TiB 709 GiB 664 GiB  41 GiB 3.5 GiB 2.8 TiB 19.82 0.29  78 up
 11   ssd 3.49219  1.0 3.5 TiB 858 GiB 834 GiB  22 GiB 2.4 GiB 2.7 TiB 23.99 0.35  82 up
101   ssd 3.49219  1.0 3.5 TiB 815 GiB 774 GiB  38 GiB 3.5 GiB 2.7 TiB 22.80 0.33  80 up
103   ssd 3.49219  1.0 3.5 TiB 827 GiB 783 GiB  40 GiB 3.3 GiB 2.7 TiB 23.11 0.34  81 up
105   ssd 3.49219  1.0 3.5 TiB 797 GiB 759 GiB  36 GiB 2.5 GiB 2.7 TiB 22.30 0.33  81 up
107   ssd 3.49219  1.0 3.5 TiB 840 GiB 788 GiB  50 GiB 2.8 GiB 2.7 TiB 23.50 0.34  83 up
100   ssd 3.49219  1.0 3.5 TiB 728 GiB 678 GiB  47 GiB 2.4 GiB 2.8 TiB 20.36 0.30  78 up
102   ssd 3.49219  1.0 3.5 TiB 764 GiB 750 GiB  12 GiB 2.2 GiB 2.7 TiB 21.37 0.31  76 up
104   ssd 3.49219  1.0 3.5 TiB 795 GiB 761 GiB  31 GiB 2.5 GiB 2.7 TiB 22.22 0.33  78 up
106   ssd 3.49219  1.0 3.5 TiB 730 GiB 665 GiB  62 GiB 2.8 GiB 2.8 TiB 20.41 0.30  78 up
108   ssd 3.49219  1.0 3.5 TiB 849 GiB 808 GiB  38 GiB 2.5 GiB 2.7 TiB 23.73 0.35  92 up
109   ssd 3.49219  1.0 3.5 TiB 798 GiB 754 GiB  41 GiB 2.7 GiB 2.7 TiB 22.30 0.33  83 up
110   ssd 3.49219  1.0 3.5 TiB 840 GiB 810 GiB  28 GiB 2.4 GiB 2.7 TiB 23.49 0.34  85 up
111   ssd 3.49219  1.0 3.5 TiB 788 GiB 741 GiB  45 GiB 2.5 GiB 2.7 TiB 22.04 0.32  85 up
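[Editor's note] The spread in the PGS column can be quantified directly; a quick sketch, with the values copied verbatim from the output above:

```python
import statistics

# PGS column from the `ceph osd df` output above, in listed order
pgs = [82, 84, 88, 85, 83, 82, 85, 82, 78, 79, 78, 82,
       80, 81, 81, 83, 78, 76, 78, 78, 92, 83, 85, 85]

print(min(pgs), max(pgs))                # spread of PGs per OSD
print(statistics.mean(pgs))              # average PGs per OSD
print(round(statistics.pstdev(pgs), 2))  # population std deviation
```

The spread is 76 to 92 PGs around a mean of 82, which is the complaint made below.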

PGs are badly distributed.

ceph balancer status
{
    "active": true, 
    "plans": [], 
    "mode": "upmap"
}
Is it because of this?

    health: HEALTH_WARN
    Failed to send data to Zabbix
    1 subtrees have overcommitted pool target_size_bytes
    1 subtrees have overcommitted pool target_size_ratio


Any ideas why it's not working?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Wido den Hollander


On 12/7/19 11:42 AM, Philippe D'Anjou wrote:
> Hi,
> the docs say the upmap mode is trying to achieve perfect distribution as
> to have equal amount of PGs/OSD.
> This is what I got(v14.2.4):
> 
> [ceph osd df output quoted from the previous message snipped]
> 
> PGs are badly distributed.

From what information do you draw that conclusion? You use about 22% on
all OSDs.

I suggest that you increase your PG count to at least 100 PGs per OSD;
that will make the distribution even better.

Wido

> ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
> 
> It is because of this?
>     health: HEALTH_WARN
>     Failed to send data to Zabbix
>     1 subtrees have overcommitted pool target_size_bytes
>     1 subtrees have overcommitted pool target_size_ratio
> 
> 
> Any ideas why its not working?
> 
> 


[ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Philippe D'Anjou
@Wido Den Hollander 
That doesn't explain why it's between 76 and 92 PGs; that's far from equal.
Raising PGs to 100 is an old recommendation anyway; anything above 60 should be
fine. That's no excuse for the distribution failure in this case. I am
expecting a more or less equal number of PGs per OSD.


Re: [ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Wido den Hollander


On 12/7/19 1:42 PM, Philippe D'Anjou wrote:
> @Wido Den Hollander 
> 
> That doesn't explain why it's between 76 and 92 PGs, that's major not equal.

The balancer balances the PGs so that all OSDs end up with almost equal
data usage. It does not aim for an equal number of PGs on every OSD.

The end goal is to make sure all OSDs are filled equally.

> Raising PGs to 100 is an old statement anyway, anything 60+ should be
> fine. Not an excuse for distribution failure in this case.
> I am expecting more or less equal PGs/OSD

That will not happen. Objects are distributed over PGs based on the hash
of their name, not their size. The distribution of objects over PGs is
almost perfect.

However, objects vary in size. If you are low on the number of PGs,
you'll have PGs which are bigger than others. This causes the data
distribution to be off.

Therefore it can help to increase the number of PGs, as that will result
in a better distribution. PGs will vary less in size, which makes them
easier to balance.

In addition, you gain more parallel performance and suffer less from PG
contention.

As disks (or SSDs) become bigger and bigger, you can benefit from having
more PGs.

Hope this explains it :-)
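[Editor's note] The hashing point above can be made concrete with a toy sketch. This is not Ceph's actual placement code: Ceph hashes the object name with rjenkins and applies a stable modulo; here md5 stands in, and the object names are invented. The point survives: object *counts* per PG come out nearly uniform, yet nothing in the mapping looks at object *sizes*, so bytes per PG can still differ.

```python
import hashlib
import statistics

def pg_for_object(name: str, pg_num: int) -> int:
    # Sketch only: md5 stands in for Ceph's rjenkins name hash.
    h = int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "little")
    return h % pg_num

pg_num = 64
counts = [0] * pg_num
for i in range(100_000):                         # invented object names
    counts[pg_for_object(f"obj_{i}", pg_num)] += 1

# Relative spread of object counts per PG: a few percent at most,
# even though no step above considered object sizes.
rel_spread = statistics.pstdev(counts) / statistics.mean(counts)
print(rel_spread)
```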

Wido


[ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Philippe D'Anjou
@Wido Den Hollander

First of all, the docs say: "In most cases, this distribution is
“perfect,” which an equal number of PGs on each OSD (+/-1 PG, since they
might not divide evenly)."
Either this is just false information or it is very badly stated.

I increased PGs and see no difference.

I pointed out MULTIPLE times that Nautilus has major flaws in the data
distribution but nobody seems to listen to me. Not sure how much more
evidence I have to show.
 ID CLASS  WEIGHT REWEIGHT  SIZE  RAW USE  DATA    OMAP   META   AVAIL  %USE  VAR PGS STATUS
  0   ssd 3.49219  1.0 3.5 TiB 715 GiB 674 GiB  37 GiB 3.9 GiB 2.8 TiB 19.99 0.29 147 up
  1   ssd 3.49219  1.0 3.5 TiB 724 GiB 672 GiB  49 GiB 3.8 GiB 2.8 TiB 20.25 0.30 146 up
  2   ssd 3.49219  1.0 3.5 TiB 736 GiB 681 GiB  50 GiB 4.4 GiB 2.8 TiB 20.57 0.30 150 up
  3   ssd 3.49219  1.0 3.5 TiB 712 GiB 676 GiB  33 GiB 3.5 GiB 2.8 TiB 19.92 0.29 146 up
  4   ssd 3.49219  1.0 3.5 TiB 752 GiB 714 GiB  34 GiB 4.6 GiB 2.8 TiB 21.03 0.31 156 up
  6   ssd 3.49219  1.0 3.5 TiB 710 GiB 671 GiB  35 GiB 3.8 GiB 2.8 TiB 19.85 0.29 146 up
  8   ssd 3.49219  1.0 3.5 TiB 781 GiB 738 GiB  40 GiB 3.7 GiB 2.7 TiB 21.85 0.32 156 up
 10   ssd 3.49219  1.0 3.5 TiB 728 GiB 682 GiB  42 GiB 4.0 GiB 2.8 TiB 20.35 0.30 146 up
  5   ssd 3.49219  1.0 3.5 TiB 664 GiB 628 GiB  32 GiB 4.3 GiB 2.8 TiB 18.58 0.27 141 up
  7   ssd 3.49219  1.0 3.5 TiB 656 GiB 613 GiB  39 GiB 4.0 GiB 2.9 TiB 18.35 0.27 136 up
  9   ssd 3.49219  1.0 3.5 TiB 632 GiB 586 GiB  41 GiB 4.4 GiB 2.9 TiB 17.67 0.26 131 up
 11   ssd 3.49219  1.0 3.5 TiB 725 GiB 701 GiB  22 GiB 2.6 GiB 2.8 TiB 20.28 0.30 138 up
101   ssd 3.49219  1.0 3.5 TiB 755 GiB 713 GiB  38 GiB 3.9 GiB 2.8 TiB 21.11 0.31 146 up
103   ssd 3.49219  1.0 3.5 TiB 761 GiB 718 GiB  40 GiB 3.6 GiB 2.7 TiB 21.29 0.31 150 up
105   ssd 3.49219  1.0 3.5 TiB 715 GiB 676 GiB  36 GiB 2.6 GiB 2.8 TiB 19.99 0.29 148 up
107   ssd 3.49219  1.0 3.5 TiB 760 GiB 706 GiB  50 GiB 3.2 GiB 2.8 TiB 21.24 0.31 147 up
100   ssd 3.49219  1.0 3.5 TiB 724 GiB 674 GiB  47 GiB 2.5 GiB 2.8 TiB 20.25 0.30 144 up
102   ssd 3.49219  1.0 3.5 TiB 669 GiB 654 GiB  12 GiB 2.3 GiB 2.8 TiB 18.71 0.27 141 up
104   ssd 3.49219  1.0 3.5 TiB 721 GiB 687 GiB  31 GiB 3.0 GiB 2.8 TiB 20.16 0.30 144 up
106   ssd 3.49219  1.0 3.5 TiB 715 GiB 646 GiB  65 GiB 3.8 GiB 2.8 TiB 19.99 0.29 143 up
108   ssd 3.49219  1.0 3.5 TiB 729 GiB 691 GiB  36 GiB 2.6 GiB 2.8 TiB 20.38 0.30 156 up
109   ssd 3.49219  1.0 3.5 TiB 732 GiB 684 GiB  45 GiB 3.0 GiB 2.8 TiB 20.47 0.30 146 up
110   ssd 3.49219  1.0 3.5 TiB 773 GiB 743 GiB  28 GiB 2.7 GiB 2.7 TiB 21.63 0.32 154 up
111   ssd 3.49219  1.0 3.5 TiB 708 GiB 660 GiB  45 GiB 2.7 GiB 2.8 TiB 19.78 0.29 146 up
The % fill rate is no different than before; it still fluctuates hard.
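[Editor's note] The %USE column from the output above can be quantified the same way (values copied verbatim); a quick sketch:

```python
import statistics

# %USE column from the `ceph osd df` output above, in listed order
use = [19.99, 20.25, 20.57, 19.92, 21.03, 19.85, 21.85, 20.35,
       18.58, 18.35, 17.67, 20.28, 21.11, 21.29, 19.99, 21.24,
       20.25, 18.71, 20.16, 19.99, 20.38, 20.47, 21.63, 19.78]

print(min(use), max(use))             # extremes of per-OSD utilization
print(round(max(use) - min(use), 2))  # spread in percentage points
print(round(statistics.pstdev(use), 2))
```

The utilization runs from 17.67% to 21.85%, a spread of about 4.2 percentage points, which is what the two sides of this thread are arguing over.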


Re: [ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Wido den Hollander


On 12/7/19 3:39 PM, Philippe D'Anjou wrote:
> @Wido Den Hollander 
> 
> First of all the docs say: "In most cases, this distribution is
> “perfect,” which an equal number of PGs on each OSD (+/-1 PG, since they
> might not divide evenly)."
> Either this is just false information or very badly stated.

Might be both. But what are you trying to achieve? PGs will never be
equally sized because objects vary in size.

The end result, I assume, is that you have equally filled OSDs.

> 
> I increased PGs and see no difference.
> 
> I pointed out MULTIPLE times that Nautilus has major flaws in the data
> distribution but nobody seems to listen to me. Not sure how much more
> evidence I have to show.
> 

What has changed? This can only change if Nautilus had a CRUSH algorithm
change, which it didn't. Neither upgrading from Mimic nor from Luminous
causes a major shift in data.

> [ceph osd df output quoted from the previous message snipped]
> 
> The % fillrate is no different than before, fluctuates hard.

All OSDs are very close to 20%, that's very good.

What is the real problem here?

Wido


[ceph-users] PG Balancer Upmap mode not working

2019-12-07 Thread Philippe D'Anjou
I never had these issues with Luminous, not once; since Nautilus this is
a constant headache.
My issue is that I have OSDs that are over 85% full whilst others are at
63%. My issue is that every time I rebalance or add new disks, Ceph
moves PGs onto near-full OSDs and almost causes pool failures.
My STDDEV: 21.31 ... it's a joke.
It's simply not acceptable to deal with nearfull OSDs whilst others are
half empty.
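[Editor's note] For readers hitting the same problem on Nautilus: the upmap balancer's tolerance can be tightened via the mgr config. This is a sketch of the usual steps, not a guaranteed fix; the option's default and availability should be checked against your exact release.

```shell
# Upmap requires all clients to speak Luminous or newer
ceph osd set-require-min-compat-client luminous

# Make sure the balancer is active in upmap mode
ceph balancer mode upmap
ceph balancer on
ceph balancer status

# Tighten how many PGs an OSD may deviate from the mean
# (the default has varied across releases; 1 requests the
# +/-1 PG spread the docs describe)
ceph config set mgr mgr/balancer/upmap_max_deviation 1

# Inspect the result per OSD
ceph osd df tree
```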


[ceph-users] Cephfs metadata fix tool

2019-12-07 Thread Robert LeBlanc
Our Jewel cluster is exhibiting some issues similar to those in this
thread [0], and it was indicated that a tool would need to be written to
fix that kind of corruption. Has the tool been written? How would I go
about repairing these 16EB directories that won't delete?

Thank you,
Robert LeBlanc

[0] https://www.spinics.net/lists/ceph-users/msg31598.html

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1