Re: [ceph-users] Full OSD with 29% free

2013-10-31 Thread Bryan Stillwell
Shain,

I investigated the segfault a little more since I sent this message
and found this email thread:

http://oss.sgi.com/archives/xfs/2012-06/msg00066.html

After reading that I did the following:

[root@den2ceph001 ~]# xfs_db -r "-c freesp -s" /dev/sdb1
Segmentation fault (core dumped)
[root@den2ceph001 ~]# service ceph stop osd.0
=== osd.0 ===
Stopping Ceph osd.0 on den2ceph001...kill 2407...kill 2407...done
[root@den2ceph001 ~]# umount /dev/sdb1
[root@den2ceph001 ~]# xfs_db -r "-c freesp -s" /dev/sdb1
   from      to  extents    blocks    pct
      1       1    44510     44510   0.05
      2       3    60341    142274   0.16
      4       7    68836    355735   0.39
      8      15   274122   3212122   3.50
     16      31  1429274  37611619  41.02
     32      63    43225   1945740   2.12
     64     127    39480   3585579   3.91
    128     255    36046   6544005   7.14
    256     511    30946  10899979  11.89
    512    1023    14119   9907129  10.80
   1024    2047     5727   7998938   8.72
   2048    4095     2647   6811258   7.43
   4096    8191      362   1940622   2.12
   8192   16383       59    603690   0.66
  16384   32767        5     90464   0.10
total free extents 2049699
total free blocks 91693664
average free extent size 44.7352


That gives me a little more confidence in using 2K block sizes now.  :)
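
For anyone else who wants to check an OSD the same way, this is roughly the
sequence I used (the OSD number, device, and mount point are from my setup
and will differ on yours; the filesystem needs to be cleanly unmounted first
or xfs_db can segfault like it did above, and the mount options are the ones
from my ceph.conf):

service ceph stop osd.0
umount /var/lib/ceph/osd/ceph-0        # or umount the device directly
xfs_db -r "-c freesp -s" /dev/sdb1
mount -o rw,noatime,inode64 /dev/sdb1 /var/lib/ceph/osd/ceph-0
service ceph start osd.0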

Bryan

Re: [ceph-users] Full OSD with 29% free

2013-10-31 Thread Bryan Stillwell
Shain,

After getting the segfaults when running 'xfs_db -r "-c freesp -s"' on
a couple of partitions, I'm concerned that 2K block sizes aren't nearly
as well tested as 4K block sizes.  This could just be a problem with
RHEL/CentOS 6.4 though, so if you're using a newer kernel the problem
might already be fixed.  There also appears to be more overhead with
2K block sizes, which I believe manifests as high CPU usage by the
xfsalloc processes.  However, my cluster has been running in a clean
state for over 24 hours and none of the scrubs have found a problem
yet.

According to 'ceph -s' my cluster has the following stats:

 osdmap e16882: 40 osds: 40 up, 40 in
  pgmap v3520420: 2808 pgs, 13 pools, 5694 GB data, 72705 kobjects
        18095 GB used, 13499 GB / 31595 GB avail

That's about 78k per object on average, so if your files aren't that
small I would stay with 4K block sizes to avoid headaches.
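
Back-of-the-envelope, that's just the data size divided by the object count
from the pgmap line above (treating GB and kobjects as decimal units):

echo '5694 * 10^9 / (72705 * 10^3)' | bc    # ~78316 bytes, i.e. roughly 78 KB per object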

Bryan


Re: [ceph-users] Full OSD with 29% free

2013-10-31 Thread Shain Miley
Bryan,

We are setting up a cluster using XFS and have been a bit concerned about
running into similar issues to the ones you described below.

I am just wondering if you came across any potential downsides to using a 2K
block size with XFS on your OSDs.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649


Re: [ceph-users] Full OSD with 29% free

2013-10-30 Thread Bryan Stillwell
I wanted to report back on this since I've made some progress on
fixing this issue.

After converting every OSD on a single server to use a 2K block size,
I've been able to cross 90% utilization without running into the 'No
space left on device' problem.  They're currently between 51% and 75%,
but I hit 90% over the weekend after a couple OSDs died during
recovery.

This conversion was pretty rough though, with OSDs randomly dying
multiple times during the process (the logs point at suicide timeouts).
When looking at top I would frequently see xfsalloc pegging multiple
cores, so I wonder if that has something to do with it.  I also had
the 'xfs_db -r "-c freesp -s"' command segfault on me a few times,
which was fixed by running xfs_repair on those partitions.  This has
me wondering how well XFS is tested with non-default block sizes on
CentOS 6.4...
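
In case it saves anyone a step, the repair itself was just the usual routine
(the filesystem has to be unmounted, and the OSD/device names here are from
my setup):

service ceph stop osd.0
umount /var/lib/ceph/osd/ceph-0
xfs_repair /dev/sdb1      # after this the freesp command stopped segfaulting for me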

Anyway, after about a week I was finally able to get the cluster to
fully recover today.  Now I need to repeat the process on 7 more
servers before I can finish populating my cluster...

In case anyone is wondering how I switched to a 2K block size, this is
what I added to my ceph.conf:

[osd]
osd_mount_options_xfs = "rw,noatime,inode64"
osd_mkfs_options_xfs = "-f -b size=2048"
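
If you want to double-check that a rebuilt OSD actually picked up the 2K
block size, xfs_info on its mount point (adjust the path for the OSD you
rebuilt) should show bsize=2048 on the data line:

xfs_info /var/lib/ceph/osd/ceph-0 | grep bsize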


The cluster is currently running the 0.71 release.

Bryan


Re: [ceph-users] Full OSD with 29% free

2013-10-21 Thread Bryan Stillwell
So I'm running into this issue again and after spending a bit of time
reading the XFS mailing lists, I believe the free space is too
fragmented:

[root@den2ceph001 ceph-0]# xfs_db -r "-c freesp -s" /dev/sdb1
   from      to  extents    blocks    pct
      1       1    85773     85773   0.24
      2       3   176891    444356   1.27
      4       7   430854   2410929   6.87
      8      15  2327527  30337352  86.46
     16      31    75871   1809577   5.16
total free extents 3096916
total free blocks 35087987
average free extent size 11.33


Compared to a drive which isn't reporting 'No space left on device':

[root@den2ceph008 ~]# xfs_db -r "-c freesp -s" /dev/sdc1
   from      to  extents    blocks    pct
      1       1   133148    133148   0.15
      2       3   320737    808506   0.94
      4       7   809748   4532573   5.27
      8      15  4536681  59305608  68.96
     16      31    31531    751285   0.87
     32      63      364     16367   0.02
     64     127       90      9174   0.01
    128     255        9      2072   0.00
    256     511       48     18018   0.02
    512    1023      128    102422   0.12
   1024    2047      290    451017   0.52
   2048    4095      538   1649408   1.92
   4096    8191      851   5066070   5.89
   8192   16383      746   8436029   9.81
  16384   32767      194   4042573   4.70
  32768   65535       15    614301   0.71
  65536  131071        1     66630   0.08
total free extents 5835119
total free blocks 86005201
average free extent size 14.7392


What I'm wondering is if reducing the block size from 4K to 2K (or 1K)
would help?  I'm pretty sure this would require re-running mkfs.xfs on
every OSD to fix if that's the case...
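
If I do go down that road, I'm guessing the reformat for each OSD would look
roughly like this (untested at this point, and the OSD would obviously have
to be drained and rebuilt through ceph first since mkfs wipes it):

mkfs.xfs -f -b size=2048 /dev/sdb1    # 2K blocks; device name is just an example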

Thanks,
Bryan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD with 29% free

2013-10-14 Thread Bryan Stillwell
The filesystem isn't as full now, but the fragmentation is pretty low:

[root@den2ceph001 ~]# df /dev/sdc1
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sdc1       486562672 270845628 215717044  56% /var/lib/ceph/osd/ceph-1
[root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
actual 3481543, ideal 3447443, fragmentation factor 0.98%

Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full OSD with 29% free

2013-10-14 Thread Michael Lowe
How fragmented is that file system?

Sent from my iPad

> On Oct 14, 2013, at 5:44 PM, Bryan Stillwell  wrote:
> 
> This appears to be more of an XFS issue than a ceph issue, but I've
> run into a problem where some of my OSDs failed because the filesystem
> was reported as full even though there was 29% free:
> 
> [root@den2ceph001 ceph-1]# touch blah
> touch: cannot touch `blah': No space left on device
> [root@den2ceph001 ceph-1]# df .
> Filesystem   1K-blocks  Used Available Use% Mounted on
> /dev/sdc1       486562672 342139340 144423332  71% /var/lib/ceph/osd/ceph-1
> [root@den2ceph001 ceph-1]# df -i .
> Filesystem        Inodes   IUsed    IFree IUse% Mounted on
> /dev/sdc1       60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
> [root@den2ceph001 ceph-1]#
> 
> I've tried remounting the filesystem with the inode64 option like a
> few people recommended, but that didn't help (probably because it
> doesn't appear to be running out of inodes).
> 
> This happened while I was on vacation and I'm pretty sure it was
> caused by another OSD failing on the same node.  I've been able to
> recover from the situation by bringing the failed OSD back online, but
> it's only a matter of time until I'll be running into this issue again
> since my cluster is still being populated.
> 
> Any ideas on things I can try the next time this happens?
> 
> Thanks,
> Bryan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com