Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Eric Eastman
Under Jewel 10.2.2 I have also had to delete PG directories to get very
full OSDs to restart. I first use "du -sh *" under the "current" directory
to find which PG directories are the largest on the full OSD disk, and
pick one of the largest.  I then look at the PG map and verify the PG is
replicated and present on another running OSD.  I also reweight the full OSD
before deleting the selected PG tree on the full OSD, using:

rm -rf <PG>_head
  where <PG> is the PG ID

After the directory is removed, I restart the OSD.  Once the OSD is back up
and running and has some free space, I do a deep-scrub on the PG.  So far I
have had no issues using this procedure.
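
Roughly, the sequence looks like this (just a sketch: the OSD id 12, PG 3.1a7,
the weight, and the default FileStore path are placeholders to adapt):

  # find the largest PG directories on the full OSD (default FileStore layout)
  cd /var/lib/ceph/osd/ceph-12/current
  du -sh * | sort -h | tail

  # check that the PG has copies on other running OSDs
  ceph pg map 3.1a7

  # reweight the full OSD so CRUSH moves data off it
  ceph osd reweight 12 0.85

  # remove the selected PG directory, then restart the OSD
  rm -rf 3.1a7_head
  systemctl start ceph-osd@12

  # once the OSD is back up, deep-scrub the PG whose directory was removed
  ceph pg deep-scrub 3.1a7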

Eric



On Mon, Aug 8, 2016 at 7:32 AM, Gerd Jakobovitsch 
wrote:

> I have run into this situation several times, due to strange behavior in
> the XFS filesystem. I initially ran on Debian and afterwards reinstalled the
> nodes with CentOS 7 (kernel 3.10.0-229.14.1.el7.x86_64, package
> xfsprogs-3.2.1-6.el7.x86_64). At around 75-80% usage as shown by df, the
> disk is already full.
>
> To decide which PGs to delete so the OSD could restart, I first lowered the
> weight of the affected OSD and observed which PGs started backfilling
> elsewhere. Then I deleted some of those backfilling PGs before trying to
> restart the OSD. It worked without data loss.
>
> On 08-08-2016 08:19, Mykola Dvornik wrote:
>
> @Shinobu
>
> According to
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> "If you cannot start an OSD because it is full, you may delete some data
> by deleting some placement group directories in the full OSD."
>
>
> On 8 August 2016 at 13:16, Shinobu Kinjo  wrote:
>
>> On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik 
>> wrote:
>> > Dear ceph community,
>> >
>> > One of the OSDs in my cluster cannot start due to the
>> >
>> > ERROR: osd init failed: (28) No space left on device
>> >
>> > A while ago it was recommended to manually delete PGs on the OSD to let
>> > it start.
>>
>> Who recommended that?
>>
>> >
>> > So I am wondering was is the recommended way to fix this issue for the
>> > cluster running Jewel release (10.2.2)?
>> >
>> > Regards,
>> >
>> > --
>> >  Mykola
>> >
>>
>>
>>
>> --
>> Email:
>> shin...@linux.com
>> shin...@redhat.com
>>
>
>
>
> --
>  Mykola
>
>


Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Gerd Jakobovitsch
I have run into this situation several times, due to strange behavior in
the XFS filesystem. I initially ran on Debian and afterwards reinstalled the
nodes with CentOS 7 (kernel 3.10.0-229.14.1.el7.x86_64, package
xfsprogs-3.2.1-6.el7.x86_64). At around 75-80% usage as shown by df, the
disk is already full.
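
For anyone hitting the same thing, a few read-only checks can show the mismatch
between what df reports and what XFS can actually allocate (the device /dev/sdb1
and the mount point below are placeholders, not my actual layout):

  df -h /var/lib/ceph/osd/ceph-12     # block usage as df sees it
  df -i /var/lib/ceph/osd/ceph-12     # inode usage, which can run out first
  xfs_info /var/lib/ceph/osd/ceph-12  # filesystem geometry (AG count/size, inode settings)
  xfs_db -r -c freesp /dev/sdb1       # free-space fragmentation by extent size
  xfs_db -r -c frag /dev/sdb1         # file fragmentation factor
  # note: xfs_db output on a mounted filesystem is only approximate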


To decide which PGs to delete so the OSD could restart, I first lowered the
weight of the affected OSD and observed which PGs started backfilling
elsewhere. Then I deleted some of those backfilling PGs before trying to
restart the OSD. It worked without data loss.
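
In case it is useful to others, roughly the sequence I mean (just a sketch: the
OSD id 12, the weight, and the PG path are placeholders, and it assumes the
chosen PGs already have healthy copies elsewhere):

  # lower the weight of the full OSD so CRUSH starts moving PGs off it
  ceph osd reweight 12 0.8

  # watch which PGs go into a backfilling state
  ceph -s
  ceph pg dump pgs_brief | grep -i backfill

  # for PGs from that list that also live on the full OSD, remove their
  # directories under .../current, then restart the OSD
  rm -rf /var/lib/ceph/osd/ceph-12/current/3.1a7_head
  systemctl start ceph-osd@12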



On 08-08-2016 08:19, Mykola Dvornik wrote:

@Shinobu

According to
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

"If you cannot start an OSD because it is full, you may delete some 
data by deleting some placement group directories in the full OSD."



On 8 August 2016 at 13:16, Shinobu Kinjo wrote:


On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik wrote:
> Dear ceph community,
>
> One of the OSDs in my cluster cannot start due to the
>
> ERROR: osd init failed: (28) No space left on device
>
> A while ago it was recommended to manually delete PGs on the OSD to let it
> start.

Who recommended that?

>
> So I am wondering was is the recommended way to fix this issue for the
> cluster running Jewel release (10.2.2)?
>
> Regards,
>
> --
>  Mykola
>



--
Email:
shin...@linux.com 
shin...@redhat.com 




--
 Mykola




Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Shinobu Kinjo
So I am wondering ``was`` is the recommended way to fix this issue for
the cluster running Jewel release (10.2.2)?

So I am wondering ``what`` is the recommended way to fix this issue
for the cluster running Jewel release (10.2.2)?

typo?


On Mon, Aug 8, 2016 at 8:19 PM, Mykola Dvornik  wrote:
> @Shinobu
>
> According to
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> "If you cannot start an OSD because it is full, you may delete some data by
> deleting some placement group directories in the full OSD."
>
>
> On 8 August 2016 at 13:16, Shinobu Kinjo  wrote:
>>
>> On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik 
>> wrote:
>> > Dear ceph community,
>> >
>> > One of the OSDs in my cluster cannot start due to the
>> >
>> > ERROR: osd init failed: (28) No space left on device
>> >
>> > A while ago it was recommended to manually delete PGs on the OSD to let
>> > it
>> > start.
>>
>> Who recommended that?
>>
>> >
>> > So I am wondering was is the recommended way to fix this issue for the
>> > cluster running Jewel release (10.2.2)?
>> >
>> > Regards,
>> >
>> > --
>> >  Mykola
>> >
>>
>>
>>
>> --
>> Email:
>> shin...@linux.com
>> shin...@redhat.com
>
>
>
>
> --
>  Mykola
>



-- 
Email:
shin...@linux.com
shin...@redhat.com


Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Mykola Dvornik
@Shinobu

According to
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

"If you cannot start an OSD because it is full, you may delete some data by
deleting some placement group directories in the full OSD."


On 8 August 2016 at 13:16, Shinobu Kinjo  wrote:

> On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik 
> wrote:
> > Dear ceph community,
> >
> > One of the OSDs in my cluster cannot start due to the
> >
> > ERROR: osd init failed: (28) No space left on device
> >
> > A while ago it was recommended to manually delete PGs on the OSD to let
> > it start.
>
> Who recommended that?
>
> >
> > So I am wondering was is the recommended way to fix this issue for the
> > cluster running Jewel release (10.2.2)?
> >
> > Regards,
> >
> > --
> >  Mykola
> >
>
>
>
> --
> Email:
> shin...@linux.com
> shin...@redhat.com
>



-- 
 Mykola


Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Shinobu Kinjo
On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik  wrote:
> Dear ceph community,
>
> One of the OSDs in my cluster cannot start due to the
>
> ERROR: osd init failed: (28) No space left on device
>
> A while ago it was recommended to manually delete PGs on the OSD to let it
> start.

Who recommended that?

>
> So I am wondering was is the recommended way to fix this issue for the
> cluster running Jewel release (10.2.2)?
>
> Regards,
>
> --
>  Mykola
>



-- 
Email:
shin...@linux.com
shin...@redhat.com


[ceph-users] Recovering full OSD

2016-08-08 Thread Mykola Dvornik
Dear ceph community,

One of the OSDs in my cluster cannot start due to the

*ERROR: osd init failed: (28) No space left on device*

A while ago it was recommended to manually delete PGs on the OSD to let it
start.
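
For completeness, these are the kinds of checks I use to confirm which OSD is
full and how full it is (the OSD id 12 and the path are placeholders):

  ceph health detail               # reports full / near-full OSDs
  ceph osd df                      # per-OSD utilization, weight and PG count
  df -h /var/lib/ceph/osd/ceph-12  # usage on the affected OSD's host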

So I am wondering was is the recommended way to fix this issue for the
cluster running Jewel release (10.2.2)?

Regards,

-- 
 Mykola