Hi Yoann, thanks a lot for your help.

root@pf-us1-dfs3:/home/rodrigo# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME
-1       72.77390 root default
-3       29.10956     host pf-us1-dfs1
0   hdd  7.27739         osd.0
5   hdd  7.27739         osd.5
6   hdd  7.27739         osd.6
8   hdd  7.27739         osd.8
-5       29.10956     host pf-us1-dfs2
1   hdd  7.27739         osd.1
3   hdd  7.27739         osd.3
7   hdd  7.27739         osd.7
9   hdd  7.27739         osd.9
-7       14.55478     host pf-us1-dfs3
2   hdd  7.27739         osd.2
4   hdd  7.27739         osd.4

root@pf-us1-dfs3:/home/rodrigo# ceph osd crush rule ls
replicated_rule

root@pf-us1-dfs3:/home/rodrigo# ceph osd crush rule dump
[
   {
       "rule_id": 0,
       "rule_name": "replicated_rule",
       "ruleset": 0,
       "type": 1,
       "min_size": 1,
       "max_size": 10,
       "steps": [
           {
               "op": "take",
               "item": -1,
               "item_name": "default"
           },
           {
               "op": "chooseleaf_firstn",
               "num": 0,
               "type": "host"
           },
           {
               "op": "emit"
           }
       ]
   }
]
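
If I'm reading the rule dump right, the failure domain is 'host' (the
chooseleaf_firstn step is on type "host"). In case it's useful, a quick way
to double-check this from the decompiled CRUSH map (the file names below are
just placeholders I picked):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
grep -A1 "step chooseleaf" crushmap.txt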


On Tue, Jan 8, 2019 at 11:35 AM Yoann Moulin <yoann.mou...@epfl.ch> wrote:

> Hello,
>
> > Hi Yoann, thanks for your response.
> > Here are the results of the commands.
> >
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
> > 0   hdd 7.27739  1.00000 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310
> > 5   hdd 7.27739  1.00000 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271
> > 6   hdd 7.27739  1.00000 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49
> > 8   hdd 7.27739  1.00000 7.3 TiB 2.5 GiB 7.3 TiB  0.03    0  42
> > 1   hdd 7.27739  1.00000 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285
> > 3   hdd 7.27739  1.00000 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296
> > 7   hdd 7.27739  1.00000 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53
> > 9   hdd 7.27739  1.00000 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38
> > 2   hdd 7.27739  1.00000 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321
> > 4   hdd 7.27739  1.00000 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351
> >                    TOTAL  73 TiB  39 TiB  34 TiB 53.13
> > MIN/MAX VAR: 0/1.79  STDDEV: 41.15
>
> It looks like the data is not well balanced between your OSDs. What is
> your failure domain?
>
> Could you provide your CRUSH map?
> http://docs.ceph.com/docs/luminous/rados/operations/crush-map/
>
> ceph osd crush tree
> ceph osd crush rule ls
> ceph osd crush rule dump
>
>
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> > pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 471 flags hashpspool,full stripe_width 0
> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/439 flags hashpspool,full stripe_width 0 application cephfs
> > pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> > pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
>
> You may need to increase pg_num for the cephfs_data pool, but before you
> do, you must understand the impact: https://ceph.com/pgcalc/
> You can't decrease pg_num, so if it is set too high you may have trouble
> in your cluster.
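
Just to check my understanding, increasing that pool would be something like
the following, right? (512 is only an illustrative value here; I still need
to work out the real target with pgcalc.)

ceph osd pool set cephfs_data pg_num 512
ceph osd pool set cephfs_data pgp_num 512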
>
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd tree
> > ID CLASS WEIGHT   TYPE NAME            STATUS REWEIGHT PRI-AFF
> > -1       72.77390 root default
> > -3       29.10956     host pf-us1-dfs1
> > 0   hdd  7.27739         osd.0            up  1.00000 1.00000
> > 5   hdd  7.27739         osd.5            up  1.00000 1.00000
> > 6   hdd  7.27739         osd.6            up  1.00000 1.00000
> > 8   hdd  7.27739         osd.8            up  1.00000 1.00000
> > -5       29.10956     host pf-us1-dfs2
> > 1   hdd  7.27739         osd.1            up  1.00000 1.00000
> > 3   hdd  7.27739         osd.3            up  1.00000 1.00000
> > 7   hdd  7.27739         osd.7            up  1.00000 1.00000
> > 9   hdd  7.27739         osd.9            up  1.00000 1.00000
> > -7       14.55478     host pf-us1-dfs3
> > 2   hdd  7.27739         osd.2            up  1.00000 1.00000
> > 4   hdd  7.27739         osd.4            up  1.00000 1.00000
>
> You really should add 2 disks to pf-us1-dfs3. Currently, the cluster tries
> to balance data between the 3 hosts (replica 3, failure domain set to
> 'host' I guess), so each host will store 1/3 of the data (1 replica).
> Since pf-us1-dfs3 only has half the capacity of the other 2, you won't be
> able to store more than 3x (osd.2 + osd.4) even though there is free space
> on the other OSDs.
>
> Best regards,
>
> Yoann
>
> > On Tue, Jan 8, 2019 at 10:36 AM Yoann Moulin <yoann.mou...@epfl.ch> wrote:
> >
> >     Hello,
> >
> >     > Hi guys, I need your help.
> >     > I'm new to CephFS and we started using it as file storage.
> >     > Today we are getting "no space left on device" but I'm seeing that
> >     > we have plenty of space on the filesystem.
> >     > Filesystem              Size  Used Avail Use% Mounted on
> >     > 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts   73T   39T   35T  54% /mnt/cephfs
> >     >
> >     > We have 35TB of disk space. I've added 2 additional OSD disks with
> >     > 7TB each, but I'm getting the error "No space left on device" every
> >     > time I want to add a new file.
> >     > After adding the 2 additional OSD disks, I'm seeing that the load is
> >     > being distributed among the cluster.
> >     > Please, I need your help.
> >
> >     Could you give us the output of
> >
> >     ceph osd df
> >     ceph osd pool ls detail
> >     ceph osd tree
> >
> >     Best regards,
> >
> >     --
> >     Yoann Moulin
> >     EPFL IC-IT
> >
>
>
> --
> Yoann Moulin
> EPFL IC-IT
>
