Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-03 Thread Andrija Panic
I really advise removing the bastards before they die... there is no rebalancing
happening, just a temporary OSD-down while you replace the journals...
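
For what it's worth, the swap can look roughly like this per OSD - a sketch only,
with osd.10 standing in for whichever OSDs journal on the SSD being replaced:

ceph osd set noout              # keep CRUSH from rebalancing while the OSD is briefly down
service ceph stop osd.10
ceph-osd -i 10 --flush-journal  # drain the journal while the old SSD is still readable
# ...swap the SSD and recreate the journal partition...
ceph-osd -i 10 --mkjournal
service ceph start osd.10
ceph osd unset noout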

What size and model are your Samsungs?
On Sep 3, 2015 7:10 PM, "Quentin Hartman" 
wrote:

> We also just started having our 850 Pros die one after the other after
> about 9 months of service. 3 down, 11 to go... No warning at all, the drive
> is fine, and then it's not even visible to the machine. According to the
> stats in hdparm and the calcs I did they should have had years of life
> left, so it seems that ceph journals definitely do something they do not
> like, which is not reflected in their stats.
>
> QH
>
> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus  wrote:
>
>> Hi ,
>>
>> We got a good deal on 843T and we are using it in our Openstack setup
>> ..as journals .
>> They have been running for last six months ... No issues .
>> When we compared with  Intel SSDs I think it was 3700 they  were shade
>> slower for our workload and considerably cheaper.
>> We did not run any synthetic benchmark since we had a specific use case.
>> The performance was better than our old setup so it was good enough.
>>
>> hth
>>
>>
>>
>> On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic 
>> wrote:
>>
>>> We have some 850 pro 256gb ssds if anyone interested to buy:)
>>>
>>> And also there was new 850 pro firmware that broke peoples disk which
>>> was revoked later etc... I'm sticking with only vacuum cleaners from
>>> Samsung for now, maybe... :)
>>> On Aug 25, 2015 12:02 PM, "Voloshanenko Igor" <
>>> igor.voloshane...@gmail.com> wrote:
>>>
>>>> To be honest, Samsung 850 PRO not 24/7 series... it's something about
>>>> desktop+ series, but anyway - results from this drives - very very bad in
>>>> any scenario acceptable by real life...
>>>>
>>>> Possible 845 PRO more better, but we don't want to experiment
>>>> anymore... So we choose S3500 240G. Yes, it's cheaper than S3700 (about 2x
>>>> times), and no so durable for writes, but we think more better to replace 1
>>>> ssd per 1 year than to pay double price now.
>>>>
>>>> 2015-08-25 12:59 GMT+03:00 Andrija Panic :
>>>>
>>>>> And should I mention that in another CEPH installation we had samsung
>>>>> 850 pro 128GB and all of 6 ssds died in 2 month period - simply disappear
>>>>> from the system, so not wear out...
>>>>>
>>>>> Never again we buy Samsung :)
>>>>> On Aug 25, 2015 11:57 AM, "Andrija Panic" 
>>>>> wrote:
>>>>>
>>>>>> First read please:
>>>>>>
>>>>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>>>>
>>>>>> We are getting 200 IOPS in comparison to Intels3500 18.000 iops -
>>>>>> those are  constant performance numbers, meaning avoiding drives cache 
>>>>>> and
>>>>>> running for longer period of time...
>>>>>> Also if checking with FIO you will get better latencies on intel
>>>>>> s3500 (model tested in our case) along with 20X better IOPS results...
>>>>>>
>>>>>> We observed original issue by having high speed at begining of i.e.
>>>>>> file transfer inside VM, which than halts to zero... We moved journals 
>>>>>> back
>>>>>> to HDDs and performans was acceptable...no we are upgrading to intel
>>>>>> S3500...
>>>>>>
>>>>>> Best
>>>>>> any details on that ?
>>>>>>
>>>>>> On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
>>>>>>  wrote:
>>>>>>
>>>>>> > Make sure you test what ever you decide. We just learned this the
>>>>>> hard way
>>>>>> > with samsung 850 pro, which is total crap, more than you could
>>>>>> imagine...
>>>>>> >
>>>>>> > Andrija
>>>>>> > On Aug 25, 2015 11:25 AM, "Jan Schermer"  wrote:
>>>>>> >
>>>>>> > > I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
>>>>>> > > Very cheap, better than Intel 3610 for sure (and I think it beats
>>>>>> even
>>>>>> > > 3700).
>>>

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Hi James,

I had 3 CEPH nodes configured as follows: 12 OSDs (HDD) and 2 SSDs per node (6 journal
partitions on each SSD) - the SSDs just vanished with no warning, no smartctl
errors, nothing... so the 2 SSDs in each of the 3 servers vanished within 2-3 weeks,
after 3-4 months of being in production (VMs/KVM/CloudStack)

Mine were also Samsung 850 PRO 128GB.

Best,
Andrija

On 4 September 2015 at 19:27, James (Fei) Liu-SSI  wrote:

> Hi Quentin and Andrija,
>
> Thanks so much for reporting the problems with Samsung.
>
>
>
> Would be possible to get to know your configuration of your system?  What
> kind of workload are you running?  Do you use Samsung SSD as separate
> journaling disk, right?
>
>
>
> Thanks so much.
>
>
>
> James
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Quentin Hartman
> *Sent:* Thursday, September 03, 2015 1:06 PM
> *To:* Andrija Panic
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Yeah, we've ordered some S3700's to replace them already. Should be here
> early next week. Hopefully they arrive before we have multiple nodes die at
> once and can no longer rebalance successfully.
>
>
>
> Most of the drives I have are the 850 Pro 128GB (specifically
> MZ7KE128HMGA)
>
> There are a couple 120GB 850 EVOs in there too, but ironically, none of
> them have pooped out yet.
>
>
>
> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic 
> wrote:
>
> I really advise removing the bastards becore they die...no rebalancing
> hapening just temp osd down while replacing journals...
>
> What size and model are yours Samsungs?
>
> On Sep 3, 2015 7:10 PM, "Quentin Hartman" 
> wrote:
>
> We also just started having our 850 Pros die one after the other after
> about 9 months of service. 3 down, 11 to go... No warning at all, the drive
> is fine, and then it's not even visible to the machine. According to the
> stats in hdparm and the calcs I did they should have had years of life
> left, so it seems that ceph journals definitely do something they do not
> like, which is not reflected in their stats.
>
>
>
> QH
>
>
>
> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus  wrote:
>
> Hi ,
>
> We got a good deal on 843T and we are using it in our Openstack setup ..as
> journals .
> They have been running for last six months ... No issues .
>
> When we compared with  Intel SSDs I think it was 3700 they  were shade
> slower for our workload and considerably cheaper.
>
> We did not run any synthetic benchmark since we had a specific use case.
>
> The performance was better than our old setup so it was good enough.
>
> hth
>
>
>
> On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic 
> wrote:
>
> We have some 850 pro 256gb ssds if anyone interested to buy:)
>
> And also there was new 850 pro firmware that broke peoples disk which was
> revoked later etc... I'm sticking with only vacuum cleaners from Samsung
> for now, maybe... :)
>
> On Aug 25, 2015 12:02 PM, "Voloshanenko Igor" 
> wrote:
>
> To be honest, Samsung 850 PRO not 24/7 series... it's something about
> desktop+ series, but anyway - results from this drives - very very bad in
> any scenario acceptable by real life...
>
>
>
> Possible 845 PRO more better, but we don't want to experiment anymore...
> So we choose S3500 240G. Yes, it's cheaper than S3700 (about 2x times), and
> no so durable for writes, but we think more better to replace 1 ssd per 1
> year than to pay double price now.
>
>
>
> 2015-08-25 12:59 GMT+03:00 Andrija Panic :
>
> And should I mention that in another CEPH installation we had samsung 850
> pro 128GB and all of 6 ssds died in 2 month period - simply disappear from
> the system, so not wear out...
>
> Never again we buy Samsung :)
>
> On Aug 25, 2015 11:57 AM, "Andrija Panic"  wrote:
>
> First read please:
>
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
> We are getting 200 IOPS in comparison to Intels3500 18.000 iops - those
> are  constant performance numbers, meaning avoiding drives cache and
> running for longer period of time...
> Also if checking with FIO you will get better latencies on intel s3500
> (model tested in our case) along with 20X better IOPS results...
>
> We observed original issue by having high speed at begining of i.e. file
> transfer inside VM, which than halts to zero... We moved journals back to
> HDDs and performans was acceptable...no we are upgrading to intel S3500...
>
> Best
>
> any details on that ?
>
> On 

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Quentin,

try fio or dd with the O_DIRECT and D_SYNC flags and you will see less than
1 MB/s - that is common for most "home" drives - check the post below to
understand why.

We removed all the Samsung 850 Pro 256GB drives from our new CEPH installation and
replaced them with Intel S3500s (18,000 4 KB IOPS of sustained write speed with
O_DIRECT and D_SYNC, compared to 200 IOPS for the Samsung 850 Pro - you can
imagine the difference...):

http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

Best

On 4 September 2015 at 21:09, Quentin Hartman 
wrote:

> Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in
> there just because I couldn't find 14 pros at the time we were ordering
> hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves
> as the boot drive  and the journal for 3 OSDs. And similarly, mine just
> started disappearing a few weeks ago. I've now had four fail (three 850
> Pro, one 840 Pro). I expect the rest to fail any day.
>
> As it turns out I had a phone conversation with the support rep who has
> been helping me with RMA's today and he's putting together a report with my
> pertinent information in it to forward on to someone.
>
> FWIW, I tried to get your 845's for this deploy, but couldn't find them
> anywhere, and since the 850's looked about as durable on paper I figured
> they would do ok. Seems not to be the case.
>
> QH
>
> On Fri, Sep 4, 2015 at 12:53 PM, Andrija Panic 
> wrote:
>
>> Hi James,
>>
>> I had 3 CEPH nodes as folowing: 12 OSDs(HDD) and 2 SSDs (2x 6 Journals
>> partitions on each SSD) - SSDs just vanished with no warning, no smartctl
>> errors nothing... so 2 SSDs in 3 servers vanished in...2-3 weeks, after a
>> 3-4 months of being in production (VMs/KVM/CloudStack)
>>
>> Mine were also Samsung 850 PRO 128GB.
>>
>> Best,
>> Andrija
>>
>> On 4 September 2015 at 19:27, James (Fei) Liu-SSI <
>> james@ssi.samsung.com> wrote:
>>
>>> Hi Quentin and Andrija,
>>>
>>> Thanks so much for reporting the problems with Samsung.
>>>
>>>
>>>
>>> Would be possible to get to know your configuration of your system?
>>> What kind of workload are you running?  Do you use Samsung SSD as separate
>>> journaling disk, right?
>>>
>>>
>>>
>>> Thanks so much.
>>>
>>>
>>>
>>> James
>>>
>>>
>>>
>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>> Behalf Of *Quentin Hartman
>>> *Sent:* Thursday, September 03, 2015 1:06 PM
>>> *To:* Andrija Panic
>>> *Cc:* ceph-users
>>> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T
>>> vs. Intel s3700
>>>
>>>
>>>
>>> Yeah, we've ordered some S3700's to replace them already. Should be here
>>> early next week. Hopefully they arrive before we have multiple nodes die at
>>> once and can no longer rebalance successfully.
>>>
>>>
>>>
>>> Most of the drives I have are the 850 Pro 128GB (specifically
>>> MZ7KE128HMGA)
>>>
>>> There are a couple 120GB 850 EVOs in there too, but ironically, none of
>>> them have pooped out yet.
>>>
>>>
>>>
>>> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic 
>>> wrote:
>>>
>>> I really advise removing the bastards becore they die...no rebalancing
>>> hapening just temp osd down while replacing journals...
>>>
>>> What size and model are yours Samsungs?
>>>
>>> On Sep 3, 2015 7:10 PM, "Quentin Hartman" 
>>> wrote:
>>>
>>> We also just started having our 850 Pros die one after the other after
>>> about 9 months of service. 3 down, 11 to go... No warning at all, the drive
>>> is fine, and then it's not even visible to the machine. According to the
>>> stats in hdparm and the calcs I did they should have had years of life
>>> left, so it seems that ceph journals definitely do something they do not
>>> like, which is not reflected in their stats.
>>>
>>>
>>>
>>> QH
>>>
>>>
>>>
>>> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus  wrote:
>>>
>>> Hi ,
>>>
>>> We got a good deal on 843T and we are using it in our Openstack setup
>>> ..as journals .
>>> They have been running for last six months ... No issues .
>>>
>>> When we compared with  Intel S

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Hi James,

Yes, CEPH with CloudStack. All 6 SSDs (2 SSDs in each of 3 nodes) vanished
within 2-3 weeks in total, and yes, they were brand new Samsung 850 Pro 128GB drives - I also
checked the wear_level attribute via smartctl before the drives died - no
indication that the wear level was anywhere near exhausted, and all the other parameters seemed
fine as well...
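
(For reference, the wear check was nothing fancy - SMART attribute names differ per
vendor, so treat this only as a sketch:

smartctl -A /dev/sdX | egrep -i 'wear|lbas_written'

which on these Samsungs typically surfaces Wear_Leveling_Count and Total_LBAs_Written.)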

I can't reproduce the setup; we returned all the 850 Pros...

Hardware configuration: server model (
http://www.quantaqct.com/Product/Servers/Rackmount-Servers/2U/STRATOS-S210-X22RQ-p7c77c70c83c118?search=S210-X22RQ)
with 64GB RAM and 2 x Intel 2620 v2 CPUs - 12 HDDs connected from the front of
the server to the main disk backplane (12 OSDs) and 2 SSDs connected to the embedded
Intel C601 controller at the back of the servers (6 partitions on each SSD for
journals + 1 partition used for the OS)...

As for the workload, I don't think we had a very heavy workload at all, since not
too many VMs were running there, and it was mostly Linux web servers...

Best,
Andrija

On 4 September 2015 at 21:15, James (Fei) Liu-SSI  wrote:

> Hi Andrija,
>
> Thanks for your promptly response. Would be possible to have any change to
> know your hardware configuration including your server information?
> Secondly, Is there anyway to duplicate your workload with fio-rbd, rbd
> bench or rados bench?
>
>
>
>   “so 2 SSDs in 3 servers vanished in...2-3 weeks, after a 3-4 months of
> being in production (VMs/KVM/CloudStack)”
>
>
>
>What you mean over here is that you deploy Ceph with CloudStack , am I
> correct? The 2 SSDs vanished in 2~3 weeks is brand new Samsung 850 Pro
> 128GB, right?
>
>
>
> Thanks,
>
> James
>
>
>
> *From:* Andrija Panic [mailto:andrija.pa...@gmail.com]
> *Sent:* Friday, September 04, 2015 11:53 AM
> *To:* James (Fei) Liu-SSI
> *Cc:* Quentin Hartman; ceph-users
>
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Hi James,
>
>
>
> I had 3 CEPH nodes as folowing: 12 OSDs(HDD) and 2 SSDs (2x 6 Journals
> partitions on each SSD) - SSDs just vanished with no warning, no smartctl
> errors nothing... so 2 SSDs in 3 servers vanished in...2-3 weeks, after a
> 3-4 months of being in production (VMs/KVM/CloudStack)
>
> Mine were also Samsung 850 PRO 128GB.
>
>
>
> Best,
>
> Andrija
>
>
>
> On 4 September 2015 at 19:27, James (Fei) Liu-SSI <
> james@ssi.samsung.com> wrote:
>
> Hi Quentin and Andrija,
>
> Thanks so much for reporting the problems with Samsung.
>
>
>
> Would be possible to get to know your configuration of your system?  What
> kind of workload are you running?  Do you use Samsung SSD as separate
> journaling disk, right?
>
>
>
> Thanks so much.
>
>
>
> James
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Quentin Hartman
> *Sent:* Thursday, September 03, 2015 1:06 PM
> *To:* Andrija Panic
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Yeah, we've ordered some S3700's to replace them already. Should be here
> early next week. Hopefully they arrive before we have multiple nodes die at
> once and can no longer rebalance successfully.
>
>
>
> Most of the drives I have are the 850 Pro 128GB (specifically
> MZ7KE128HMGA)
>
> There are a couple 120GB 850 EVOs in there too, but ironically, none of
> them have pooped out yet.
>
>
>
> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic 
> wrote:
>
> I really advise removing the bastards becore they die...no rebalancing
> hapening just temp osd down while replacing journals...
>
> What size and model are yours Samsungs?
>
> On Sep 3, 2015 7:10 PM, "Quentin Hartman" 
> wrote:
>
> We also just started having our 850 Pros die one after the other after
> about 9 months of service. 3 down, 11 to go... No warning at all, the drive
> is fine, and then it's not even visible to the machine. According to the
> stats in hdparm and the calcs I did they should have had years of life
> left, so it seems that ceph journals definitely do something they do not
> like, which is not reflected in their stats.
>
>
>
> QH
>
>
>
> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus  wrote:
>
> Hi ,
>
> We got a good deal on 843T and we are using it in our Openstack setup ..as
> journals .
> They have been running for last six months ... No issues .
>
> When we compared with  Intel SSDs I think it was 3700 they  were shade
> slower for our workload and considerably cheaper.
>
> We did not run any synthetic benchmark since we had a specific use case.
>
> The performance was better than our old setup so it was good enough.
>
>

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
James,

there are simple FIO or even DD tests on Linux which you can run to
see how well an SSD will perform as a CEPH journal device (CEPH writes to the
journal SSDs with the O_DIRECT and D_SYNC flags) - the Samsung 850 performs extremely
badly here, as do many, many other vendors' drives (D_SYNC kills performance for them...)

If you are not using the D_SYNC flag, then the Samsungs can achieve some nice
numbers...

dd if=/dev/zero of=/dev/sda bs=4k count=10 oflag=direct,dsync (where /dev/sda is
the raw drive - careful, that overwrites it - or replace it with a file on a mounted
filesystem, e.g. /root/ddfile)
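
A roughly equivalent fio run (just a sketch - adjust the test file and runtime;
--direct=1/--sync=1 mimic the O_DIRECT/D_SYNC journal writes):

fio --name=journal-test --filename=/root/fio.test --size=1G \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --direct=1 --sync=1 --runtime=60 --time_based --group_reporting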

Check post for more info please:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
Thanks

On 4 September 2015 at 21:31, James (Fei) Liu-SSI  wrote:

> Andrija,
>
> In your email thread, (18.000 (4Kb) IOPS constant write speed stands for
> 18K iops with 4k block size, right? However, you can only achieve 200IOPS
> with Samsung 850Pro, right?
>
>
>
> Theoretically, Samsung 850 Pro can get up to 100,000 IOPS with 4k Random
> Read with certain workload.  It is a little bit strange over here.
>
>
>
> Regards,
>
> James
>
>
>
>
>
> *From:* Andrija Panic [mailto:andrija.pa...@gmail.com]
> *Sent:* Friday, September 04, 2015 12:21 PM
> *To:* Quentin Hartman
> *Cc:* James (Fei) Liu-SSI; ceph-users
>
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Quentin,
>
>
>
> try fio or dd with O_DIRECT and D_SYNC flags, and you will see less than
> 1MB/s - that is common for most "home" drives - check the post down to
> understand
>
> We removed all Samsung 850 pro 256GB from our new CEPH installation and
> replaced with Intel S3500 (18.000 (4Kb) IOPS constant write speed with
> O_DIRECT, D_SYNC, in comparison to 200 IOPS for Samsun 850pro - you can
> imagine the difference...):
>
>
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
>
>
> Best
>
>
>
> On 4 September 2015 at 21:09, Quentin Hartman <
> qhart...@direwolfdigital.com> wrote:
>
> Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in
> there just because I couldn't find 14 pros at the time we were ordering
> hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves
> as the boot drive  and the journal for 3 OSDs. And similarly, mine just
> started disappearing a few weeks ago. I've now had four fail (three 850
> Pro, one 840 Pro). I expect the rest to fail any day.
>
>
>
> As it turns out I had a phone conversation with the support rep who has
> been helping me with RMA's today and he's putting together a report with my
> pertinent information in it to forward on to someone.
>
>
>
> FWIW, I tried to get your 845's for this deploy, but couldn't find them
> anywhere, and since the 850's looked about as durable on paper I figured
> they would do ok. Seems not to be the case.
>
>
>
> QH
>
>
>
> On Fri, Sep 4, 2015 at 12:53 PM, Andrija Panic 
> wrote:
>
> Hi James,
>
>
>
> I had 3 CEPH nodes as folowing: 12 OSDs(HDD) and 2 SSDs (2x 6 Journals
> partitions on each SSD) - SSDs just vanished with no warning, no smartctl
> errors nothing... so 2 SSDs in 3 servers vanished in...2-3 weeks, after a
> 3-4 months of being in production (VMs/KVM/CloudStack)
>
> Mine were also Samsung 850 PRO 128GB.
>
>
>
> Best,
>
> Andrija
>
>
>
> On 4 September 2015 at 19:27, James (Fei) Liu-SSI <
> james@ssi.samsung.com> wrote:
>
> Hi Quentin and Andrija,
>
> Thanks so much for reporting the problems with Samsung.
>
>
>
> Would be possible to get to know your configuration of your system?  What
> kind of workload are you running?  Do you use Samsung SSD as separate
> journaling disk, right?
>
>
>
> Thanks so much.
>
>
>
> James
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Quentin Hartman
> *Sent:* Thursday, September 03, 2015 1:06 PM
> *To:* Andrija Panic
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Yeah, we've ordered some S3700's to replace them already. Should be here
> early next week. Hopefully they arrive before we have multiple nodes die at
> once and can no longer rebalance successfully.
>
>
>
> Most of the drives I have are the 850 Pro 128GB (specifically
> MZ7KE128HMGA)
>
> There are a couple 120GB 850 EVOs in there too, but ironically, none of
> them have pooped out yet.
>
>
>
> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic 
> wro

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-07 Thread Andrija Panic
There is
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

On the other hand, I'm not sure SSD vendors would be happy to see their
devices listed as performing like total crap (for journaling)... but yes, I vote for
having some official page if possible!

On 7 September 2015 at 11:12, Eino Tuominen  wrote:

> Hello,
>
> Should we (somebody, please?) gather up a comprehensive list of suitable
> SSD devices to use as ceph journals? This seems to be a FAQ, and it would
> be nice if all the knowledge and user experiences from several different
> threads could be referenced easily in the future. I took a look at
> wiki.ceph.org and there was nothing on this.
>
> --
>   Eino Tuominen
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jan Schermer
> Sent: 7. syyskuuta 2015 11:44
> To: Christian Balzer
> Cc: ceph-users; Межов Игорь Александрович
> Subject: Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
> Re: Samsungs - I feel some of you are mixing and confusing different
> Samsung drives.
>
> There is a DC line of Samsung drives meant for DataCenter use. Those have
> EVO (write once read many) and PRO (write mostly) variants.
> You don't want to go anywhere near the EVO line with Ceph.
> Then there are "regular" EVO and PRO drives - they are not meant for
> server use so don't use them.
>
> The main difference is that the "DC" line should provide reliable and
> stable performance over time, no surprises, while the desktop drives can
> just pause and perform garbage collection and have completely different
> cache setup. If you torture desktop drive hard enough it will protect
> itself (slow down to a crawl).
>
> So the only usable drivess for us are "DC PRO" and nothing else.
>
> Jan
>
> > On 05 Sep 2015, at 04:36, Christian Balzer  wrote:
> >
> >
> > Hello,
> >
> > On Fri, 4 Sep 2015 22:37:06 + Межов Игорь Александрович wrote:
> >
> >> Hi!
> >>
> >>
> >> Have worked with Intel DC S3700 200Gb. Due to budget restrictions, one
> >>
> >> ssd hosts a system volume and 1:12 OSD journals. 6 nodes, 120Tb raw
> >> space.
> >>
> > Meaning you're limited to 360MB/s writes per node at best.
> > But yes, I do understand budget constraints. ^o^
> >
> >> Cluster serves as RBD storage for ~100VM.
> >>
> >>
> >> Not a  single failure per year - all devices are healthy.
> >>
> >> The remainig resource (by smart) is ~92%.
> >>
> > I use 1:2 or 1:3 journals and haven't made any dent into my 200GB S3700
> > yet.
> >
> >>
> >> Now we're try to use DC S3710 for journals.
> >
> > As I wrote a few days ago, unless you go for the 400GB version the the
> > 200GB S3710 is actually slower (for journal purposes) than the 3700, as
> > sequential write speed is the key factor here.
> >
> > Christian
> > --
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] higher read iop/s for single thread

2015-09-10 Thread Andrija Panic
"enough 4k read iop/s for multithreaded apps (around 23 000) with qemu
2.2.1."

That is a very nice number, if I'm allowed to comment - may I know what
your setup is (in 2 lines: hardware, number of OSDs)?

Thanks

On 10 September 2015 at 15:39, Jan Schermer  wrote:

> Get faster CPUs (sorry, nothing else comes to mind).
> What type of application is that and what exactly does it do?
>
> Basically you would have to cache it in rbd cache or pagecache in the VM
> but that only works if the reads repeat.
>
> Jan
>
> > On 10 Sep 2015, at 15:34, Stefan Priebe - Profihost AG <
> s.pri...@profihost.ag> wrote:
> >
> > Hi,
> >
> > while we're happy running ceph firefly in production and also reach
> > enough 4k read iop/s for multithreaded apps (around 23 000) with qemu
> 2.2.1.
> >
> > We've now a customer having a single threaded application needing around
> > 2000 iop/s but we don't go above 600 iop/s in this case.
> >
> > Any tuning hints for this case?
> >
> > Stefan
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] higher read iop/s for single thread

2015-09-10 Thread Andrija Panic
We also get 2 ms for writes, with Intel S3500 journals (5 journals on 1 SSD) and
4 TB OSDs...

On 10 September 2015 at 16:41, Jan Schermer  wrote:

> What did you tune? Did you have to make a human sacrifice? :) Which
> release?
> The last proper benchmark numbers I saw were from hammer and the latencies
> were basically still the same, about 2ms for write.
>
> Jan
>
>
> On 10 Sep 2015, at 16:38, Haomai Wang  wrote:
>
>
>
> On Thu, Sep 10, 2015 at 10:36 PM, Jan Schermer  wrote:
>
>>
>> On 10 Sep 2015, at 16:26, Haomai Wang  wrote:
>>
>> Actually we can reach 700us per 4k write IO for single io depth(2 copy,
>> E52650, 10Gib, intel s3700). So I think 400 read iops shouldn't be a
>> unbridgeable problem.
>>
>>
>> Flushed to disk?
>>
>
> of course
>
>
>>
>>
>> CPU is critical for ssd backend, so what's your cpu model?
>>
>> On Thu, Sep 10, 2015 at 9:48 PM, Jan Schermer  wrote:
>>
>>> It's certainly not a problem with DRBD (yeah, it's something completely
>>> different but it's used for all kinds of workloads including things like
>>> replicated tablespaces for databases).
>>> It won't be a problem with VSAN (again, a bit different, but most people
>>> just want something like that)
>>> It surely won't be a problem with e.g. ScaleIO which should be
>>> comparable to Ceph.
>>>
>>> Latency on the network can be very low (0.05ms on my 10GbE). Latency on
>>> good SSDs is  2 orders of magnitute lower (as low as 0.5 ms). Linux is
>>> pretty good nowadays at waking up threads and pushing the work. Multiply
>>> those numbers by whatever factor and it's still just a fraction of the
>>> 0.5ms needed.
>>> The problem is quite frankly slow OSD code and the only solution now is
>>> to keep the data closer to the VM.
>>>
>>> Jan
>>>
>>> > On 10 Sep 2015, at 15:38, Gregory Farnum  wrote:
>>> >
>>> > On Thu, Sep 10, 2015 at 2:34 PM, Stefan Priebe - Profihost AG
>>> >  wrote:
>>> >> Hi,
>>> >>
>>> >> while we're happy running ceph firefly in production and also reach
>>> >> enough 4k read iop/s for multithreaded apps (around 23 000) with qemu
>>> 2.2.1.
>>> >>
>>> >> We've now a customer having a single threaded application needing
>>> around
>>> >> 2000 iop/s but we don't go above 600 iop/s in this case.
>>> >>
>>> >> Any tuning hints for this case?
>>> >
>>> > If the application really wants 2000 sync IOPS to disk without any
>>> > parallelism, I don't think any network storage system is likely to
>>> > satisfy him — that's only half a millisecond per IO. 600 IOPS is about
>>> > the limit of what the OSD can do right now (in terms of per-op
>>> > speeds), and although there is some work being done to improve that
>>> > it's not going to be in a released codebase for a while.
>>> >
>>> > Or perhaps I misunderstood the question?
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>>
>> Best Regards,
>>
>> Wheat
>>
>>
>>
>
>
> --
>
> Best Regards,
>
> Wheat
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Andrija Panic
Another one bites the dust...

This is a Samsung 850 PRO 256GB... (the 6 journals on this SSD just died...)

[root@cs23 ~]# smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.66-1.el6.elrepo.x86_64]
(local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:   /1:0:0:0
Product:
User Capacity:600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T
permissive' options

On 8 September 2015 at 18:01, Quentin Hartman 
wrote:

> On Tue, Sep 8, 2015 at 9:05 AM, Mark Nelson  wrote:
>
>> A list of hardware that is known to work well would be incredibly
>>> valuable to people getting started. It doesn't have to be exhaustive,
>>> nor does it have to provide all the guidance someone could want. A
>>> simple "these things have worked for others" would be sufficient. If
>>> nothing else, it will help people justify more expensive gear when their
>>> approval people say "X seems just as good and is cheaper, why can't we
>>> get that?".
>>>
>>
>> So I have my opinions on different drives, but I think we do need to be
>> really careful not to appear to endorse or pick on specific vendors. The
>> more we can stick to high-level statements like:
>>
>> - Drives should have high write endurance
>> - Drives should perform well with O_DSYNC writes
>> - Drives should support power loss protection for data in motion
>>
>> The better I think.  Once those are established, I think it's reasonable
>> to point out that certain drives meet (or do not meet) those criteria and
>> get feedback from the community as to whether or not vendor's marketing
>> actually reflects reality.  It'd also be really nice to see more
>> information available like the actual hardware (capacitors, flash cells,
>> etc) used in the drives.  I've had to show photos of the innards of
>> specific drives to vendors to get them to give me accurate information
>> regarding certain drive capabilities.  Having a database of such things
>> available to the community would be really helpful.
>>
>>
> That's probably a very good approach. I think it would be pretty simple to
> avoid the appearance of endorsement if the data is presented correctly.
>
>
>>
>>> To that point, I think perhaps though something more important than a
>>> list of known "good" hardware would be a list of known "bad" hardware,
>>>
>>
>> I'm rather hesitant to do this unless it's been specifically confirmed by
>> the vendor.  It's too easy to point fingers (see the recent kernel trim bug
>> situation).
>
>
> I disagree. I think that only comes into play if you claim to know why the
> hardware has problems. In this case, if you simply state "people who have
> used this drive have experienced a large number of seemingly premature
> failures when using them as journals" that provides sufficient warning to
> users, and if the vendor wants to engage the community and potentially pin
> down why and help us find a way to make the device work or confirm that
> it's just not suited, then that's on them. Samsung seems to be doing
> exactly that. It would be great to have them help provide that level of
> detail, but again, I don't think it's necessary. We're not saying
> "ceph/redhat/$whatever says this hardware sucks" we're saying "The
> community has found that using this hardware with ceph has exhibited these
> negative behaviors...". At that point you're just relaying experiences and
> collecting them in a central location. It's up to the reader to draw
> conclusions from it.
>
> But again, I think more important than either of these would be a
> collection of use cases with actual journal write volumes that have
> occurred in those use cases so that people can make more informed
> purchasing decisions. The fact that my small openstack cluster created 3.6T
> of writes per month on my journal drives (3 OSD each) is somewhat
> mind-blowing. That's almost four times the amount of writes my best guess
> estimates indicated we'd be doing. Clearly there's more going on than we
> are used to paying attention to. Someone coming to ceph and seeing the cost
> of DC-class SSDs versus consumer-class SSDs will almost certainly suffer
> from some amount of sticker shock, and even if they don't their purchasing
> approval people almost certainly will. This is especially true for people
> in smaller organizations where SSDs are still somewhat exotic. And when
> they come back with the "Why won't cheaper thing X be OK?" they need to
> have sufficient information to answer that. Without a test environment to
> generate data with, they will need to rely on the experiences of others,
> and right now those experiences don't seem to be documented anywhere, and
> if they are, they are not very discoverable.
>
> QH
>
> ___
> ceph-users mailing list
> cep

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Andrija Panic
"  came to the conclusion they we put to an "unintended use".   "
wtf ? : Best to install them inside shutdown workstation... :)

On 18 September 2015 at 01:04, Quentin Hartman  wrote:

> I ended up having 7 total die. 5 while in service, 2 more when I hooked
> them up to a test machine to collect information from them. To Samsung's
> credit, they've been great to deal with and are replacing the failed
> drives, on the condition that I don't use them for ceph again. Apparently
> they sent some of my failed drives to an engineer in Korea and they did a
> failure analysis on them and came to the conclusion they we put to an
> "unintended use". I have seven left I'm not sure what to do with.
>
> I've honestly always really liked Samsung, and I'm disappointed that I
> wasn't able to find anyone with their DC-class drives actually in stock so
> I ended up switching the to Intel S3700s. My users will be happy to have
> some SSDs to put in their workstations though!
>
> QH
>
> On Thu, Sep 17, 2015 at 4:49 PM, Andrija Panic 
> wrote:
>
>> Another one bites the dust...
>>
>> This is Samsung 850 PRO 256GB... (6 journals on this SSDs just died...)
>>
>> [root@cs23 ~]# smartctl -a /dev/sda
>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.66-1.el6.elrepo.x86_64]
>> (local build)
>> Copyright (C) 2002-12 by Bruce Allen,
>> http://smartmontools.sourceforge.net
>>
>> Vendor:   /1:0:0:0
>> Product:
>> User Capacity:600,332,565,813,390,450 bytes [600 PB]
>> Logical block size:   774843950 bytes
>> >> Terminate command early due to bad response to IEC mode page
>> A mandatory SMART command failed: exiting. To continue, add one or more
>> '-T permissive' options
>>
>> On 8 September 2015 at 18:01, Quentin Hartman <
>> qhart...@direwolfdigital.com> wrote:
>>
>>> On Tue, Sep 8, 2015 at 9:05 AM, Mark Nelson  wrote:
>>>
>>>> A list of hardware that is known to work well would be incredibly
>>>>> valuable to people getting started. It doesn't have to be exhaustive,
>>>>> nor does it have to provide all the guidance someone could want. A
>>>>> simple "these things have worked for others" would be sufficient. If
>>>>> nothing else, it will help people justify more expensive gear when
>>>>> their
>>>>> approval people say "X seems just as good and is cheaper, why can't we
>>>>> get that?".
>>>>>
>>>>
>>>> So I have my opinions on different drives, but I think we do need to be
>>>> really careful not to appear to endorse or pick on specific vendors. The
>>>> more we can stick to high-level statements like:
>>>>
>>>> - Drives should have high write endurance
>>>> - Drives should perform well with O_DSYNC writes
>>>> - Drives should support power loss protection for data in motion
>>>>
>>>> The better I think.  Once those are established, I think it's
>>>> reasonable to point out that certain drives meet (or do not meet) those
>>>> criteria and get feedback from the community as to whether or not vendor's
>>>> marketing actually reflects reality.  It'd also be really nice to see more
>>>> information available like the actual hardware (capacitors, flash cells,
>>>> etc) used in the drives.  I've had to show photos of the innards of
>>>> specific drives to vendors to get them to give me accurate information
>>>> regarding certain drive capabilities.  Having a database of such things
>>>> available to the community would be really helpful.
>>>>
>>>>
>>> That's probably a very good approach. I think it would be pretty simple
>>> to avoid the appearance of endorsement if the data is presented correctly.
>>>
>>>
>>>>
>>>>> To that point, I think perhaps though something more important than a
>>>>> list of known "good" hardware would be a list of known "bad" hardware,
>>>>>
>>>>
>>>> I'm rather hesitant to do this unless it's been specifically confirmed
>>>> by the vendor.  It's too easy to point fingers (see the recent kernel trim
>>>> bug situation).
>>>
>>>
>>> I disagree. I think that only comes into play if you claim to know why
>>> the hardware has problems. In this case, if you simply state &qu

Re: [ceph-users] snapshot failed after enable cache tier

2015-09-21 Thread Andrija Panic
Hi,

depending on the cache mode etc. - from what we have also experienced (using
CloudStack) - CEPH snapshot functionality simply stops working in some
cache configurations.
This means we were also unable to deploy new VMs (the base/gold snapshot is
created on CEPH and the new data disk is a child of that snapshot, etc.).

Inktank:
https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

Mail-list:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
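
If you want to see what the tier is currently doing, something like this shows and
changes the mode (the pool name "glance-cache" below is just an example):

ceph osd dump | grep -E 'cache_mode|tier'          # per-pool tier / cache_mode settings
ceph osd tier cache-mode glance-cache writeback    # switch the cache mode if needed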


On 21 September 2015 at 13:36, Xiangyu (Raijin, BP&IT Dept) <
xiang...@huawei.com> wrote:

> Openstack Kilo use ceph as the backend storage (nova,cinder and
> glance),after enable cache tier for glance pool, take snapshot for instance
> failed (it seems generate the snapshot then delete it automatically soon)
>
>
>
> If cache tier not suitable for glance ?
>
>
>
> *Best Regards!*
>
> *
>
>
>
> *向毓 (Raijin.Xiang)*
>
> Computing and Storage Dept
>
> Huawei Technologies Co., Ltd.
>
> Mobile: +86 186 2032 2562
>
> Mail: xiang...@huawei.com
>
> Address: Bldg1-B, Cloud Park, Huancheng Road, Bantian Str., Longgang District,
> 518129 Shenzhen, P. R. China
>
> ***
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Andrija Panic
Jiri,

if you colocate more Journals on 1 SSD (we do...), make sure to understand
the following:

- if the SSD dies, all OSDs that had their journals on it are lost...
- the more journals you put on a single SSD (1 journal being 1 partition),
the worse the performance, since the SSD's total performance is no longer
dedicated/available to a single journal; colocate e.g. 6 journals on 1 SSD
and each journal effectively gets 1/6 of it...

Latency will go up and bandwidth will go down the more journals you colocate...
XFS recommended...

I suggest finding a balance between the performance you want and the $$$ for SSDs...
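
As a ballpark of the partitioning side (a sketch - /dev/sdf and the 10GB journal
size are made-up values, adjust to your SSD and OSD count):

parted -s /dev/sdf mklabel gpt
for i in 1 2 3 4 5 6; do
    # six 10GB GPT journal partitions, back to back
    parted -s /dev/sdf mkpart journal-$i $(( (i-1)*10 ))GB $(( i*10 ))GB
done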

best

On 29 September 2015 at 13:32, Jiri Kanicky  wrote:

> Hi Lionel.
>
> Thank you for your reply. In this case I am considering to create separate
> partitions for each disk on the SSD drive. Would be good to know what is
> the performance difference, because creating partitions is kind of waste of
> space.
>
> One more question, is it a good idea to move journal for 3 OSDs to a
> single SSD considering if SSD fails the whole node with 3 HDDs will be
> down? Thinking of it, leaving journal on each OSD might be safer, because
> journal on one disk does not affect other disks (OSDs). Or do you think
> that having the journal on SSD is better trade off?
>
> Thank you
> Jiri
>
>
> On 29/09/2015 21:10, Lionel Bouton wrote:
>
>> Le 29/09/2015 07:29, Jiri Kanicky a écrit :
>>
>>> Hi,
>>>
>>> Is it possible to create journal in directory as explained here:
>>>
>>> http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster
>>>
>> Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
>> start) is valid for moving your journal wherever you want.
>> That said it probably won't perform as well on a filesystem (LVM as
>> lower overhead than a filesystem).
>>
>> 1. Create BTRFS over /dev/sda6 (assuming this is SSD partition alocate
>>> for journal) and mount it to /srv/ceph/journal
>>>
>> BTRFS is probably the worst idea for hosting journals. If you must use
>> BTRFS, you'll have to make sure that the journals are created NoCoW
>> before the first byte is ever written to them.
>>
>> 2. Add OSD: ceph-deploy osd create --fs-type btrfs
>>> ceph1:sdb:/srv/ceph/journal/osd$id/journal
>>>
>> I've no experience with ceph-deploy...
>>
>> Best regards,
>>
>> Lionel
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-30 Thread Andrija Panic
Make sure to check this blog page
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

I suggest it since I'm not sure whether you are just playing around with CEPH or
planning it for production and good performance.
My experience with SSDs as journals: a Samsung 850 PRO = 200 IOPS sustained
writes vs. an Intel S3500 = 18,000 IOPS sustained writes - so you understand the
difference...

regards

On 30 September 2015 at 11:17, Jiri Kanicky  wrote:

> Thanks to all for responses. Great thread with a lot of info.
>
> I will go with the 3 partitions on Kingstone SDD for 3 OSDs on each node.
>
> Thanks
> Jiri
>
> On 30/09/2015 00:38, Lionel Bouton wrote:
>
>> Hi,
>>
>> Le 29/09/2015 13:32, Jiri Kanicky a écrit :
>>
>>> Hi Lionel.
>>>
>>> Thank you for your reply. In this case I am considering to create
>>> separate partitions for each disk on the SSD drive. Would be good to
>>> know what is the performance difference, because creating partitions
>>> is kind of waste of space.
>>>
>> The difference is hard to guess : filesystems need more CPU power than
>> raw block devices for example, so if you don't have much CPU power this
>> can make a significant difference. Filesystems might put more load on
>> our storage too (for example ext3/4 with data=journal will at least
>> double the disk writes). So there's a lot to consider and nothing will
>> be faster for journals than a raw partition. LVM logical volumes come a
>> close second behind because usually (if you simply use LVM to create
>> your logical volumes and don't try to use anything else like snapshots)
>> they don't change access patterns and almost don't need any CPU power.
>>
>> One more question, is it a good idea to move journal for 3 OSDs to a
>>> single SSD considering if SSD fails the whole node with 3 HDDs will be
>>> down?
>>>
>> If your SSDs are working well with Ceph and aren't cheap models dying
>> under heavy writes, yes. I use one 200GB DC3710 SSD for 6 7200rpm SATA
>> OSDs (using 60GB of it for the 6 journals) and it works very well (they
>> were a huge performance boost compared to our previous use of internal
>> journals).
>> Some SSDs are slower than HDDs for Ceph journals though (there has been
>> a lot of discussions on this subject on this mailing list).
>>
>> Thinking of it, leaving journal on each OSD might be safer, because
>>> journal on one disk does not affect other disks (OSDs). Or do you
>>> think that having the journal on SSD is better trade off?
>>>
>> You will put significantly more stress on your HDD leaving journal on
>> them and good SSDs are far more robust than HDDs so if you pick Intel DC
>> or equivalent SSD for journal your infrastructure might even be more
>> robust than one using internal journals (HDDs are dropping like flies
>> when you have hundreds of them). There are other components able to take
>> down all your OSDs : the disk controller, the CPU, the memory, the power
>> supply, ... So adding one robust SSD shouldn't change the overall
>> availabilty much (you must check their wear level and choose the models
>> according to the amount of writes you want them to support over their
>> lifetime though).
>>
>> The main reason for journals on SSD is performance anyway. If your setup
>> is already fast enough without them, I wouldn't try to add SSDs.
>> Otherwise, if you can't reach the level of performance needed by adding
>> the OSDs already needed for your storage capacity objectives, go SSD.
>>
>> Best regards,
>>
>> Lionel
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Hi,

I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
6.5) complains:
- internal error missing backend for pool type 8

Is it possible that libvirt 0.10.2 (as shipped with CentOS 6.5) was not
compiled with RBD support?
I can't find how to check this...
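
(One rough heuristic, not authoritative: if the RBD storage backend was compiled in,
libvirtd - or its storage driver module - links against librbd, so something like

ldd /usr/sbin/libvirtd | grep -i rbd

coming back empty is a hint, only a hint, that the RBD pool backend is missing.)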

I'm able to use qemu-img to create rbd images etc...

Here is cloudstack-agent DEBUG output, all seems fine...


[storage pool definition XML - the tags were stripped by the list archive; what
remains is the pool name/uuid value 1e119e4c-20d1-3fbc-a525-a5771944046d and the
RBD source pool name "cloudstack"]


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thank you very much Wido.
Any suggestions on compiling libvirt with RBD support (I already found a way), or
perhaps some prebuilt packages that you would recommend?

Best


On 28 April 2014 13:25, Wido den Hollander  wrote:

> On 04/28/2014 12:49 PM, Andrija Panic wrote:
>
>> Hi,
>>
>> I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS
>> 6.5) does some complaints:
>> -  internal error missing backend for pool type 8
>>
>> Is it possible that the libvirt 0.10.2 (shipped with CentOS 6.5) was not
>> compiled with RBD support ?
>> Can't find how to check this...
>>
>>
> No, it's probably not compiled with RBD storage pool support.
>
> As far as I know CentOS doesn't compile libvirt with that support yet.
>
>
>  I'm able to use qemu-img to create rbd images etc...
>>
>> Here is cloudstack-agent DEBUG output, all seems fine...
>>
>> 
>> 1e119e4c-20d1-3fbc-a525-a5771944046d
>> 1e119e4c-20d1-3fbc-a525-a5771944046d
>> 
>> 
>>
>
> I recommend creating a Round Robin DNS record which points to all your
> monitors.
>
>  cloudstack
>> 
>> 
>> 
>> 
>> 
>>
>> --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Thanks Dan :)


On 28 April 2014 15:02, Dan van der Ster  wrote:

>
> On 28/04/14 14:54, Wido den Hollander wrote:
>
>> On 04/28/2014 02:15 PM, Andrija Panic wrote:
>>
>>> Thank you very much Wido,
>>> any suggestion on compiling libvirt with support (I already found a way)
>>> or perhaps use some prebuilt , that you would recommend ?
>>>
>>>
>> No special suggestions, just make sure you use at least Ceph 0.67.7
>>
>> I'm not aware of any pre-build packages for CentOS.
>>
>
> Look for qemu-kvm-rhev ... el6 ...
> That's the Redhat built version of kvm which supports RBD.
>
> Cheers, Dan
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type

2014-04-28 Thread Andrija Panic
Dan, is this maybe just RBD support for the KVM package (I already have RBD-enabled
qemu, qemu-img etc. from the ceph.com site)?
I need just libvirt with RBD support?

Thanks


On 28 April 2014 15:05, Andrija Panic  wrote:

> Thanks Dan :)
>
>
> On 28 April 2014 15:02, Dan van der Ster wrote:
>
>>
>> On 28/04/14 14:54, Wido den Hollander wrote:
>>
>>> On 04/28/2014 02:15 PM, Andrija Panic wrote:
>>>
>>>> Thank you very much Wido,
>>>> any suggestion on compiling libvirt with support (I already found a way)
>>>> or perhaps use some prebuilt , that you would recommend ?
>>>>
>>>>
>>> No special suggestions, just make sure you use at least Ceph 0.67.7
>>>
>>> I'm not aware of any pre-build packages for CentOS.
>>>
>>
>> Look for qemu-kvm-rhev ... el6 ...
>> That's the Redhat built version of kvm which supports RBD.
>>
>> Cheers, Dan
>>
>
>
>
> --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD not starting at boot time

2014-04-30 Thread Andrija Panic
Hi,

I was wondering why the OSDs would not start at boot time; it happens on 1
server (2 OSDs).

If I check with chkconfig ceph --list, I can see that the service should start;
and indeed the MON on this server does start, but the OSDs do not.

I can start them manually with: service ceph start osd.X

This is CentOS 6.5 and CEPH 0.72.2, deployed with the ceph-deploy tool.

I did not forget the ceph osd activate... for sure.
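
In case it matters, what I'll be checking - assuming ceph-disk/ceph-deploy prepared
OSDs under sysvinit; the OSD id and device below are only examples:

mount | grep /var/lib/ceph/osd          # are the OSD data partitions mounted after boot?
ls /var/lib/ceph/osd/ceph-0/sysvinit    # the marker file the sysvinit script scans for, if I remember right
ceph-disk activate /dev/sdb1            # manually re-activate a prepared data partition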

Thanks
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Migrate system VMs from local storage to CEPH

2014-05-02 Thread Andrija Panic
Hi.

I was wondering what the correct way would be to migrate the system VMs
(storage, console, VR) from local storage to CEPH.

I'm on CS 4.2.1 and will be updating to 4.3 soon...

Is it enough to just change the global setting system.vm.use.local.storage from
true to false and then destroy the system VMs (CloudStack will recreate them
in 1-2 minutes)?

Also, how can I make sure that the system VMs will NOT end up on NFS storage?

Thanks for any input...

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Thank you very much Wido, that's exactly what I was looking for.
Thanks


On 4 May 2014 18:30, Wido den Hollander  wrote:

> On 05/02/2014 04:06 PM, Andrija Panic wrote:
>
>> Hi.
>>
>> I was wondering what would be correct way to migrate system VMs
>> (storage,console,VR) from local storage to CEPH.
>>
>> I'm on CS 4.2.1 and will be soon updating to 4.3...
>>
>> Is it enough to just change global setting system.vm.use.local.storage =
>> true, to FALSE, and then destroy system VMs (cloudstack will recreate
>> them in 1-2 minutes)
>>
>>
> Yes, that would be sufficient. CloudStack will then deploy the SSVMs on
> your RBD storage.
>
>
>  Also how to make sure that system VMs will NOT end up on NFS storage ?
>>
>>
> Make use of the tagging. Tag the RBD pools with 'rbd' and change the
> Service Offering for the SSVMs where they require 'rbd' as a storage tag.
>
>  Thanks for any input...
>>
>> --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Will try creating the tag inside the CS database, since GUI/cloudmonkey editing of
an existing offering is NOT possible...



On 5 May 2014 16:04, Brian Rak  wrote:

>  This would be a better question for the Cloudstack community.
>
>
> On 5/2/2014 10:06 AM, Andrija Panic wrote:
>
> Hi.
>
>  I was wondering what would be correct way to migrate system VMs
> (storage,console,VR) from local storage to CEPH.
>
>  I'm on CS 4.2.1 and will be soon updating to 4.3...
>
>  Is it enough to just change global setting system.vm.use.local.storage =
> true, to FALSE, and then destroy system VMs (cloudstack will recreate them
> in 1-2 minutes)
>
>  Also how to make sure that system VMs will NOT end up on NFS storage ?
>
>  Thanks for any input...
>
>  --
>
> Andrija Panić
>
>
> ___
> ceph-users mailing 
> listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Andrija Panic
Hi Wido,

thanks again for inputs.

Everything is fine, except for the Software Router - it doesn't seem to get
created on CEPH, no matter what I try.

I created a new offering for the CPVM and SSVM and used the guide here:
https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html-single/Admin_Guide/index.html#sys-offering-sysvm
to start using these new system offerings, and that is all fine. I did the same
for the Software Router, but it keeps using the original system offering instead
of the one I created.

CS keeps creating the VR on NFS storage, chosen randomly among the 3 NFS storage
nodes...

Any suggestion, please ?

Thanks,
Andrija


On 5 May 2014 16:11, Andrija Panic  wrote:

> Will try creating tag inside CS database, since GUI/cloudmoneky editing of
> existing offer is NOT possible...
>
>
>
> On 5 May 2014 16:04, Brian Rak  wrote:
>
>>  This would be a better question for the Cloudstack community.
>>
>>
>> On 5/2/2014 10:06 AM, Andrija Panic wrote:
>>
>> Hi.
>>
>>  I was wondering what would be correct way to migrate system VMs
>> (storage,console,VR) from local storage to CEPH.
>>
>>  I'm on CS 4.2.1 and will be soon updating to 4.3...
>>
>>  Is it enough to just change global setting system.vm.use.local.storage
>> = true, to FALSE, and then destroy system VMs (cloudstack will recreate
>> them in 1-2 minutes)
>>
>>  Also how to make sure that system VMs will NOT end up on NFS storage ?
>>
>>  Thanks for any input...
>>
>>  --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
> --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replace journals disk

2014-05-06 Thread Andrija Panic
Good question - I'm also interested. Do you want to move the journal to a
dedicated disk/partition, i.e. on an SSD, or just replace a (failed) disk with a
new/bigger one?

I was thinking (for moving the journal to a dedicated disk) about changing
symbolic links or similar, on /var/lib/ceph/osd/osd-x/journal... ?
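Something like the following is what I had in mind - an untested sketch, assuming
the sysvinit ceph script, the default /var/lib/ceph/osd/ceph-0 layout, and a
pre-created SSD journal partition (osd.0 and the partition are placeholders):

ceph osd set noout
service ceph stop osd.0
ceph-osd -i 0 --flush-journal
ln -sf /dev/disk/by-partuuid/<ssd-journal-partition> /var/lib/ceph/osd/ceph-0/journal
ceph-osd -i 0 --mkjournal
service ceph start osd.0
ceph osd unset noout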

Regards,
Andrija


On 6 May 2014 12:34, Gandalf Corvotempesta
wrote:

> Hi to all,
> I would like to replace a disk used as journal (one partition for each OSD)
>
> Which is the safest method to do so?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replace journals disk

2014-05-06 Thread Andrija Panic
If you have a dedicated disk for the journal that you want to replace - consider
(this may not be optimal, but it crosses my mind...) stopping the OSD (if that is
possible), maybe with "noout" etc., then dd the old disk to the new one, and just
resize the file system and partitions if needed...

I guess there is a more elegant way than these manual steps...
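Roughly, and untested - /dev/sdX is the old journal disk, /dev/sdY the new one,
and osd.0 stands for every OSD journaling on that disk; double-check device names
before running dd:

ceph osd set noout
service ceph stop osd.0
dd if=/dev/sdX of=/dev/sdY bs=1M
# grow the partition/filesystem on /dev/sdY here if the new disk is bigger
service ceph start osd.0
ceph osd unset noout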

Cheers


On 6 May 2014 12:52, Gandalf Corvotempesta
wrote:

> 2014-05-06 12:39 GMT+02:00 Andrija Panic :
> > Good question - I'm also interested. Do you want to movejournal to
> dedicated
> > disk/partition i.e. on SSD or just replace (failed) disk with new/bigger
> one
> > ?
>
> I would like to replace the disk with a bigger one (in fact, my new
> disk is smaller, but this should not change the workflow)
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-06 Thread Andrija Panic
I apologize, I posted to the wrong mailing list - too many emails these days :)
@Wido, yes I did check and there is a separate offering, but you can't change
it the same way you change it for the CPVM and SSVM...
Will post to the CS mailing list, sorry for this...



On 6 May 2014 17:52, Wido den Hollander  wrote:

> On 05/05/2014 11:40 PM, Andrija Panic wrote:
>
>> Hi Wido,
>>
>> thanks again for inputs.
>>
>> Everything is fine, except for the Software Router - it doesn't seem to
>> get created on CEPH, no matter what I try.
>>
>>
> There is a separate offering for the VR, have you checked that?
>
> But this is more something for the CloudStack users list as it's not
> related to Ceph.
>
> Wido
>
>  I created new offering for CPVV and SSVM and used the guide here:
>> https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/
>> 4.2.0/html-single/Admin_Guide/index.html#sys-offering-sysvm
>> to start using these new system offerings and it is all fine. Did the
>> same for Software Router, but it keeps using original system offering,
>> instead of the one I created.
>>
>> CS keeps creating VR on NFS storage, choosen randomly among 3 NFS
>> storage nodes...
>>
>> Any suggestion, please ?
>>
>> Thanks,
>> Andrija
>>
>>
>> On 5 May 2014 16:11, Andrija Panic > <mailto:andrija.pa...@gmail.com>> wrote:
>>
>> Will try creating tag inside CS database, since GUI/cloudmoneky
>> editing of existing offer is NOT possible...
>>
>>
>>
>> On 5 May 2014 16:04, Brian Rak > <mailto:b...@gameservers.com>> wrote:
>>
>> This would be a better question for the Cloudstack community.
>>
>>
>> On 5/2/2014 10:06 AM, Andrija Panic wrote:
>>
>>> Hi.
>>>
>>> I was wondering what would be correct way to migrate system
>>> VMs (storage,console,VR) from local storage to CEPH.
>>>
>>> I'm on CS 4.2.1 and will be soon updating to 4.3...
>>>
>>> Is it enough to just change global
>>> setting system.vm.use.local.storage = true, to FALSE, and then
>>> destroy system VMs (cloudstack will recreate them in 1-2 minutes)
>>>
>>> Also how to make sure that system VMs will NOT end up on NFS
>>> storage ?
>>>
>>> Thanks for any input...
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com  <mailto:ceph-users@lists.ceph.com>
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>> --
>> http://admintweets.com
>> --
>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>> --
>> http://admintweets.com
>> --
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS over CEPH - best practice

2014-05-07 Thread Andrija Panic
Mapping an RBD image to 2 or more servers is the same as having a shared storage
device (SAN) - so from there on, you could do any clustering you want,
based on what Wido said...
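A rough sketch of what Wido describes below (the image name and size are
placeholders, the image lands in the default 'rbd' pool, and you want a recent
kernel for krbd):

rbd create vmimages --size 1048576        # 1 TB image in the default 'rbd' pool
rbd map vmimages                          # appears as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /export/vmimages && mount /dev/rbd0 /export/vmimages
echo "/export/vmimages *(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra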



On 7 May 2014 12:43, Andrei Mikhailovsky  wrote:

>
> Wido, would this work if I were to run nfs over two or more servers with
> virtual IP?
>
> I can see what you've suggested working in a one server setup. What about
> if you want to have two nfs servers in an active/backup or active/active
> setup?
>
> Thanks
>
> Andrei
>
>
> --
> *From: *"Wido den Hollander" 
> *To: *ceph-users@lists.ceph.com
> *Sent: *Wednesday, 7 May, 2014 11:15:39 AM
> *Subject: *Re: [ceph-users] NFS over CEPH - best practice
>
> On 05/07/2014 11:46 AM, Andrei Mikhailovsky wrote:
> > Hello guys,
> >
> > I would like to offer NFS service to the XenServer and VMWare
> > hypervisors for storing vm images. I am currently running ceph rbd with
> > kvm, which is working reasonably well.
> >
> > What would be the best way of running NFS services over CEPH, so that
> > the XenServer and VMWare's vm disk images are stored in ceph storage
> > over NFS?
> >
>
> Use kernel RBD, put XFS on it an re-export that with NFS? Would that be
> something that works?
>
> I'd however suggest that you use a recent kernel so that you have a new
> version of krbd. For example Ubuntu 14.04 LTS.
>
> > Many thanks
> >
> > Andrei
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] qemu-img break cloudstack snapshot

2014-05-10 Thread Andrija Panic
Hi,

just to share my issue with the qemu-img provided by CEPH (Red Hat caused the
problem, not CEPH):

The newest qemu-img - qemu-img-0.12.1.2-2.415.el6.3ceph.x86_64.rpm - was built
from RHEL 6.5 source code, where Red Hat removed the "-s" parameter, so
snapshotting in CloudStack up to 4.2.1 does not work; I guess there are
also problems with OpenStack...

The older CEPH RPM for qemu-img that I have, which works fine (I suppose
it was built from RHEL 6.4 source),
is qemu-img-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm

Raised a ticket, although this is a problem caused by Red Hat, not by CEPH.
The ticket was raised in the hope that CEPH's developers will provide the older
qemu-img that works fine (the one that I have) - or possibly compile a new one
based on the RHEL 6.4 source.
http://tracker.ceph.com/issues/8329
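For anyone checking their own hosts, a rough way to spot the problem and roll
back - this assumes the older cuttlefish RPM is still available in a configured
repo (otherwise install it directly with rpm and --oldpackage):

rpm -q qemu-img
qemu-img --help 2>&1 | grep snapshot_name     # empty output suggests -s was stripped
yum downgrade qemu-img-0.12.1.2-2.355.el6.2.cuttlefish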

Best,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client: centos6.4 no rbd.ko

2014-05-14 Thread Andrija Panic
Try a 3.x kernel from the ELRepo repo... works for me, CloudStack/Ceph...
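Roughly what I did on CentOS 6 - note that the elrepo-release package version
changes over time, so check elrepo.org for the current URL:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-lt      # or kernel-ml for mainline
# set the new kernel as default in /boot/grub/grub.conf, then reboot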

Sent from Google Nexus 4
On May 14, 2014 11:56 AM, "maoqi1982"  wrote:

> Hi list
> our ceph(0.72) cluster use ubuntu12.04  is ok . client server run
> openstack install "CentOS6.4 final", the kernel is up to
> kernel-2.6.32-358.123.2.openstack.el6.x86_64.
> the question is the kernel does not support the rbd.ko ceph.ko. can anyone
>  help me to add the rbd.ko ceph.ko in 
> kernel-2.6.32-358.123.2.openstack.el6.x86_64
> or other way except up kernel
>
> thanks.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi,

I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much data,
network also fine:
Ceph 0.72.2.

When I issue the "ceph status" command, I randomly get HEALTH_OK, and
immediately afterwards, when repeating the command, I get HEALTH_WARN.

Example given below - these commands were issued less than 1 second apart.
There are NO occurrences of the word "warn" in the logs (grep -ir "warn"
/var/log/ceph) on any of the servers...
I get false alerts from my status monitoring script because of this.
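For reference, a sketch of a less trigger-happy check I may switch to - it only
alerts if the WARN state persists across a few samples (the retry count and the
sleep interval are arbitrary):

#!/bin/bash
for i in 1 2 3; do
    state=$(ceph health)
    [ "$state" = "HEALTH_OK" ] && exit 0
    sleep 20
done
echo "ceph health: $state"
exit 1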

Any help would be greatly appreciated.

Thanks,

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_OK
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 17331 kB/s rd, 113 kB/s wr, 176 op/s

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_WARN
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 28383 kB/s rd, 566 kB/s wr, 321 op/s

[root@cs3 ~]# ceph status
cluster cab20370-bf6a-4589-8010-8d5fc8682eab
 health HEALTH_OK
 monmap e2: 3 mons at
{cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
election epoch 122, quorum 0,1,2 cs1,cs2,cs3
 osdmap e890: 6 osds: 6 up, 6 in
  pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
2576 GB used, 19732 GB / 22309 GB avail
 448 active+clean
  client io 21632 kB/s rd, 49354 B/s wr, 283 op/s

-- 

Andrija Panić
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi Christian,

that seems true, thanks.

But again, there are only occurrences in the GZ log files (which were logrotated),
not in the current log files:
Example:

[root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches

Thanks,
Andrija


On 17 June 2014 10:48, Christian Balzer  wrote:

>
> Hello,
>
> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>
> > Hi,
> >
> > I have 3 node (2 OSD per node) CEPH cluster, running fine, not much data,
> > network also fine:
> > Ceph ceph-0.72.2.
> >
> > When I issue "ceph status" command, I get randomly HEALTH_OK, and
> > imidiately after that when repeating command, I get HEALTH_WARN
> >
> > Examle given down - these commands were issues within less than 1 sec
> > between them
> > There are NO occuring of word "warn" in the logs (grep -ir "warn"
> > /var/log/ceph) on any of the servers...
> > I get false alerts with my status monitoring script, for this reason...
> >
> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
> WRN.
>
> Regards,
>
> Christian
>
> > Any help would be greatly appriciated.
> >
> > Thanks,
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_WARN
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
> >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-17 Thread Andrija Panic
Hi,

thanks for that, but it is not a space issue:

The OSD drives are only 12% full,
and the /var drive on which the MON lives is over 70% full only on the CS3 server,
but I have increased the alert threshold in ceph.conf (mon data avail warn = 15,
mon data avail crit = 5), and since I increased them those alerts are gone
(anyway, the alerts for /var being over 70% full can normally be seen in the logs
and in the ceph -w output).

Here I get no normal/visible warning in either the logs or the ceph -w output...

Thanks,
Andrija




On 17 June 2014 11:00, Stanislav Yanchev  wrote:

> Try grep in cs1 and cs3 could be a disk space issue.
>
>
>
>
>
> Regards,
>
> *Stanislav Yanchev*
> Core System Administrator
>
> [image: MAX TELECOM]
>
> Mobile: +359 882 549 441
> s.yanc...@maxtelecom.bg
> www.maxtelecom.bg
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Andrija Panic
> *Sent:* Tuesday, June 17, 2014 11:57 AM
> *To:* Christian Balzer
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
> HEALTH_WARN
>
>
>
> Hi Christian,
>
>
>
> that seems true, thanks.
>
>
>
> But again, there are only occurence in GZ logs files (that were
> logrotated, not in current log files):
>
> Example:
>
>
>
> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>
> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>
> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>
>
>
> Thanks,
>
> Andrija
>
>
>
> On 17 June 2014 10:48, Christian Balzer  wrote:
>
>
> Hello,
>
>
> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>
> > Hi,
> >
> > I have 3 node (2 OSD per node) CEPH cluster, running fine, not much data,
> > network also fine:
> > Ceph ceph-0.72.2.
> >
> > When I issue "ceph status" command, I get randomly HEALTH_OK, and
> > imidiately after that when repeating command, I get HEALTH_WARN
> >
> > Examle given down - these commands were issues within less than 1 sec
> > between them
> > There are NO occuring of word "warn" in the logs (grep -ir "warn"
> > /var/log/ceph) on any of the servers...
> > I get false alerts with my status monitoring script, for this reason...
> >
>
> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
> WRN.
>
> Regards,
>
> Christian
>
>
> > Any help would be greatly appriciated.
> >
> > Thanks,
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 17331 kB/s rd, 113 kB/s wr, 176 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_WARN
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 28383 kB/s rd, 566 kB/s wr, 321 op/s
> >
> > [root@cs3 ~]# ceph status
> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
> >  health HEALTH_OK
> >  monmap e2: 3 mons at
> >
> {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0},
> > election epoch 122, quorum 0,1,2 cs1,cs2,cs3
> >  osdmap e890: 6 osds: 6 up, 6 in
> >   pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
> > 2576 GB used, 19732 GB / 22309 GB avail
> >  448 active+clean
> >   client io 21632 kB/s rd, 49354 B/s wr, 283 op/s
> >
>
>
> --
>
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
>
>
>
>
> --
>
>
>
> Andrija Panić
>
> --
>
>

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Hi Gregory,

indeed - I still have warnings about 20% free space on the CS3 server, where the
MON lives... what is strange is that I don't get these warnings in prolonged
"ceph -w" output...
[root@cs2 ~]# ceph health detail
HEALTH_WARN
mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!

I don't understand how it is possible to get warnings - I have the following
in each ceph.conf file, under the general section:

mon data avail warn = 15
mon data avail crit = 5

I found these settings on the ceph mailing list...

Thanks a lot,
Andrija


On 17 June 2014 19:22, Gregory Farnum  wrote:

> Try running "ceph health detail" on each of the monitors. Your disk space
> thresholds probably aren't configured correctly or something.
> -Greg
>
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
> wrote:
>
>> Hi,
>>
>> thanks for that, but is not space issue:
>>
>> OSD drives are only 12% full.
>> and /var drive on which MON lives is over 70% only on CS3 server, but I
>> have increased alert treshold in ceph.conf (mon data avail warn = 15, mon
>> data avail crit = 5), and since I increased them those alerts are gone
>> (anyway, these alerts for /var full over 70% can be normally seen in logs
>> and in ceph -w output).
>>
>> Here I get no normal/visible warning in eather logs or ceph -w output...
>>
>> Thanks,
>> Andrija
>>
>>
>>
>>
>> On 17 June 2014 11:00, Stanislav Yanchev  wrote:
>>
>>> Try grep in cs1 and cs3 could be a disk space issue.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> *Stanislav Yanchev*
>>> Core System Administrator
>>>
>>> [image: MAX TELECOM]
>>>
>>> Mobile: +359 882 549 441
>>> s.yanc...@maxtelecom.bg
>>> www.maxtelecom.bg
>>>
>>>
>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>> Behalf Of *Andrija Panic
>>> *Sent:* Tuesday, June 17, 2014 11:57 AM
>>> *To:* Christian Balzer
>>> *Cc:* ceph-users@lists.ceph.com
>>> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
>>> HEALTH_WARN
>>>
>>>
>>>
>>> Hi Christian,
>>>
>>>
>>>
>>> that seems true, thanks.
>>>
>>>
>>>
>>> But again, there are only occurence in GZ logs files (that were
>>> logrotated, not in current log files):
>>>
>>> Example:
>>>
>>>
>>>
>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>>
>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>>
>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Andrija
>>>
>>>
>>>
>>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>>
>>>
>>> Hello,
>>>
>>>
>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>>
>>> > Hi,
>>> >
>>> > I have 3 node (2 OSD per node) CEPH cluster, running fine, not much
>>> data,
>>> > network also fine:
>>> > Ceph ceph-0.72.2.
>>> >
>>> > When I issue "ceph status" command, I get randomly HEALTH_OK, and
>>> > imidiately after that when repeating command, I get HEALTH_WARN
>>> >
>>> > Examle given down - these commands were issues within less than 1 sec
>>> > between them
>>> > There are NO occuring of word "warn" in the logs (grep -ir "warn"
>>> > /var/log/ceph) on any of the servers...
>>> > I get false alerts with my status monitoring script, for this reason...
>>> >
>>>
>>> If I recall correctly, the logs will show INF, WRN and ERR, so grep for
>>> WRN.
>>>
>>> Regards,
>>>
>>> Christian
>>>
>>>
>>> > Any help would be greatly appriciated.
>>> >
>>> > Thanks,
>>> >
>>> > [root@cs3 ~]# ceph status
>>> > cluster cab20370-bf6a-4589-8010-8d5fc8682eab
>>> >  health HEALTH_OK
>>> >  monmap e2: 3 mons at
>

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
As stupid as it gets...
After lowering the "mon data avail warn" threshold from 20% to 15%, it seems I
forgot to restart the MON service on this one node...
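For the record, the setting can also be pushed into a running mon without a
restart, and the live value checked on the node itself - a sketch, using the
mon names from this cluster:

ceph tell mon.cs3 injectargs '--mon_data_avail_warn 15 --mon_data_avail_crit 5'
ceph daemon mon.cs3 config show | grep mon_data_avail    # run on the cs3 host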

I apologize for bugging you, and thanks again everybody.

Andrija


On 18 June 2014 09:49, Andrija Panic  wrote:

> Hi Gregory,
>
> indeed - I still have warnings about 20% free space on CS3 server, where
> MON lives...strange is that I don't get these warnings with prolonged "ceph
> -w" output...
> [root@cs2 ~]# ceph health detail
> HEALTH_WARN
> mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
> space!
>
> I don't understand, how is this possible to get warnings - I have folowing
> in each ceph.conf file, under the general section:
>
> mon data avail warn = 15
> mon data avail crit = 5
>
> I found this settings on ceph mailing list...
>
> Thanks a lot,
> Andrija
>
>
> On 17 June 2014 19:22, Gregory Farnum  wrote:
>
>> Try running "ceph health detail" on each of the monitors. Your disk space
>> thresholds probably aren't configured correctly or something.
>> -Greg
>>
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
>> wrote:
>>
>>> Hi,
>>>
>>> thanks for that, but is not space issue:
>>>
>>> OSD drives are only 12% full.
>>> and /var drive on which MON lives is over 70% only on CS3 server, but I
>>> have increased alert treshold in ceph.conf (mon data avail warn = 15, mon
>>> data avail crit = 5), and since I increased them those alerts are gone
>>> (anyway, these alerts for /var full over 70% can be normally seen in logs
>>> and in ceph -w output).
>>>
>>> Here I get no normal/visible warning in eather logs or ceph -w output...
>>>
>>> Thanks,
>>> Andrija
>>>
>>>
>>>
>>>
>>> On 17 June 2014 11:00, Stanislav Yanchev 
>>> wrote:
>>>
>>>> Try grep in cs1 and cs3 could be a disk space issue.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> *Stanislav Yanchev*
>>>> Core System Administrator
>>>>
>>>> [image: MAX TELECOM]
>>>>
>>>> Mobile: +359 882 549 441
>>>> s.yanc...@maxtelecom.bg
>>>> www.maxtelecom.bg
>>>>
>>>>
>>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>>> Behalf Of *Andrija Panic
>>>> *Sent:* Tuesday, June 17, 2014 11:57 AM
>>>> *To:* Christian Balzer
>>>> *Cc:* ceph-users@lists.ceph.com
>>>> *Subject:* Re: [ceph-users] Cluster status reported wrongly as
>>>> HEALTH_WARN
>>>>
>>>>
>>>>
>>>> Hi Christian,
>>>>
>>>>
>>>>
>>>> that seems true, thanks.
>>>>
>>>>
>>>>
>>>> But again, there are only occurence in GZ logs files (that were
>>>> logrotated, not in current log files):
>>>>
>>>> Example:
>>>>
>>>>
>>>>
>>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
>>>>
>>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
>>>>
>>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Andrija
>>>>
>>>>
>>>>
>>>> On 17 June 2014 10:48, Christian Balzer  wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I have 3 node (2 OSD per node) CEPH cluster, running fine, not much
>>>> data,
>>>> > network also fine:
>>>> > Ceph ceph-0.72.2.
>>>> >
>>>> > When I issue "ceph status" command, I get randomly HEALTH_OK, and
>>>> > imidiately after that when repeating command, I get HEALTH_WARN
>>>> >
>>>> > Examle given down - these commands were issues within less than

Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN

2014-06-18 Thread Andrija Panic
Thanks Greg, seems like I'm going to update soon...

Thanks again,
Andrija


On 18 June 2014 14:06, Gregory Farnum  wrote:

> The lack of warnings in ceph -w for this issue is a bug in Emperor.
> It's resolved in Firefly.
> -Greg
>
> On Wed, Jun 18, 2014 at 3:49 AM, Andrija Panic 
> wrote:
> >
> > Hi Gregory,
> >
> > indeed - I still have warnings about 20% free space on CS3 server, where
> MON lives...strange is that I don't get these warnings with prolonged "ceph
> -w" output...
> > [root@cs2 ~]# ceph health detail
> > HEALTH_WARN
> > mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk
> space!
> >
> > I don't understand, how is this possible to get warnings - I have
> folowing in each ceph.conf file, under the general section:
> >
> > mon data avail warn = 15
> > mon data avail crit = 5
> >
> > I found this settings on ceph mailing list...
> >
> > Thanks a lot,
> > Andrija
> >
> >
> > On 17 June 2014 19:22, Gregory Farnum  wrote:
> >>
> >> Try running "ceph health detail" on each of the monitors. Your disk
> space thresholds probably aren't configured correctly or something.
> >> -Greg
> >>
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Tue, Jun 17, 2014 at 2:09 AM, Andrija Panic 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> thanks for that, but is not space issue:
> >>>
> >>> OSD drives are only 12% full.
> >>> and /var drive on which MON lives is over 70% only on CS3 server, but
> I have increased alert treshold in ceph.conf (mon data avail warn = 15, mon
> data avail crit = 5), and since I increased them those alerts are gone
> (anyway, these alerts for /var full over 70% can be normally seen in logs
> and in ceph -w output).
> >>>
> >>> Here I get no normal/visible warning in eather logs or ceph -w
> output...
> >>>
> >>> Thanks,
> >>> Andrija
> >>>
> >>>
> >>>
> >>>
> >>> On 17 June 2014 11:00, Stanislav Yanchev 
> wrote:
> >>>>
> >>>> Try grep in cs1 and cs3 could be a disk space issue.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Stanislav Yanchev
> >>>> Core System Administrator
> >>>>
> >>>>
> >>>>
> >>>> Mobile: +359 882 549 441
> >>>> s.yanc...@maxtelecom.bg
> >>>> www.maxtelecom.bg
> >>>>
> >>>>
> >>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> Behalf Of Andrija Panic
> >>>> Sent: Tuesday, June 17, 2014 11:57 AM
> >>>> To: Christian Balzer
> >>>> Cc: ceph-users@lists.ceph.com
> >>>> Subject: Re: [ceph-users] Cluster status reported wrongly as
> HEALTH_WARN
> >>>>
> >>>>
> >>>>
> >>>> Hi Christian,
> >>>>
> >>>>
> >>>>
> >>>> that seems true, thanks.
> >>>>
> >>>>
> >>>>
> >>>> But again, there are only occurence in GZ logs files (that were
> logrotated, not in current log files):
> >>>>
> >>>> Example:
> >>>>
> >>>>
> >>>>
> >>>> [root@cs2 ~]# grep -ir "WRN" /var/log/ceph/
> >>>>
> >>>> Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140614.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140611.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140612.gz matches
> >>>>
> >>>> Binary file /var/log/ceph/ceph.log-20140613.gz matches
> >>>>
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Andrija
> >>>>
> >>>>
> >>>>
> >>>> On 17 June 2014 10:48, Christian Balzer  wrote:
> >>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>>
> >>>> On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote:
> >>>>
> >>

[ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-02 Thread Andrija Panic
Hi,

I have an existing CEPH cluster of 3 nodes, version 0.72.2.

I'm in the process of installing CEPH on a 4th node, but now the CEPH version is
0.80.1.

Will running mixed CEPH versions cause problems?

I intend to upgrade CEPH on the existing 3 nodes anyway.
Recommended steps?

Thanks

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Hi Wido, thanks for the answers - I have MONs and OSDs on each host... server1:
mon + 2 OSDs, same for server2 and server3.

Any proposed upgrade path, or should I just start with 1 server and move along
to the others?

Thanks again.
Andrija


On 2 July 2014 16:34, Wido den Hollander  wrote:

> On 07/02/2014 04:08 PM, Andrija Panic wrote:
>
>> Hi,
>>
>> I have existing CEPH cluster of 3 nodes, versions 0.72.2
>>
>> I'm in a process of installing CEPH on 4th node, but now CEPH version is
>> 0.80.1
>>
>> Will this make problems running mixed CEPH versions ?
>>
>>
> No, but the recommendation is not to have this running for a very long
> period. Try to upgrade all nodes to the same version within a reasonable
> amount of time.
>
>
>  I intend to upgrade CEPH on exsiting 3 nodes anyway ?
>> Recommended steps ?
>>
>>
> Always upgrade the monitors first! Then to the OSDs one by one.
>
>  Thanks
>>
>> --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Thanks a lot Wido, will do...
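For my own notes, the sequence as I understand it - a sketch for CentOS 6 with
the sysvinit ceph script; the mon names and OSD ids are from this cluster, and
"leader first" means restarting whichever mon is currently the leader:

yum update ceph                    # upgrade packages on every node first
service ceph restart mon.cs1       # restart the mon leader first...
service ceph restart mon.cs2       # ...then the other mons
service ceph restart mon.cs3
service ceph restart osd.0         # then OSDs one by one, waiting for
ceph health                        # HEALTH_OK before moving to the next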

Andrija


On 3 July 2014 13:12, Wido den Hollander  wrote:

> On 07/03/2014 10:59 AM, Andrija Panic wrote:
>
>> Hi Wido, thanks for answers - I have mons and OSD on each host...
>> server1: mon + 2 OSDs, same for server2 and server3.
>>
>> Any Proposed upgrade path, or just start with 1 server and move along to
>> others ?
>>
>>
> Upgrade the packages, but don't restart the daemons yet, then:
>
> 1. Restart the mon leader
> 2. Restart the two other mons
> 3. Restart all the OSDs one by one
>
> I suggest that you wait for the cluster to become fully healthy again
> before restarting the next OSD.
>
> Wido
>
>  Thanks again.
>> Andrija
>>
>>
>> On 2 July 2014 16:34, Wido den Hollander > <mailto:w...@42on.com>> wrote:
>>
>> On 07/02/2014 04:08 PM, Andrija Panic wrote:
>>
>> Hi,
>>
>> I have existing CEPH cluster of 3 nodes, versions 0.72.2
>>
>> I'm in a process of installing CEPH on 4th node, but now CEPH
>> version is
>> 0.80.1
>>
>> Will this make problems running mixed CEPH versions ?
>>
>>
>> No, but the recommendation is not to have this running for a very
>> long period. Try to upgrade all nodes to the same version within a
>> reasonable amount of time.
>>
>>
>> I intend to upgrade CEPH on exsiting 3 nodes anyway ?
>> Recommended steps ?
>>
>>
>> Always upgrade the monitors first! Then to the OSDs one by one.
>>
>> Thanks
>>
>> --
>>
>> Andrija Panić
>>
>>
>> _
>> ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>
>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902 
>> Skype: contact42on
>> _
>> ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
>
> --
> Wido den Hollander
> Ceph consultant and trainer
> 42on B.V.
>
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Wido,
one final question:
since I compiled libvirt 1.2.3 using ceph-devel 0.72 - do I need to
recompile libvirt again now with ceph-devel 0.80?

Perhaps not a smart question, but I need to make sure I don't screw something up...
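What I can check on my side in the meantime - a sketch just to confirm what the
binaries actually link against at runtime (paths assume the stock CentOS
locations; a self-compiled libvirtd may live elsewhere, and the rbd bits may sit
in a loadable storage module rather than the main binary):

ldd /usr/sbin/libvirtd | grep -E 'librbd|librados'
ldd /usr/libexec/qemu-kvm | grep -E 'librbd|librados'
rpm -q librbd1 librados2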
Thanks for your time,
Andrija


On 3 July 2014 14:27, Andrija Panic  wrote:

> Thanks a lot Wido, will do...
>
> Andrija
>
>
> On 3 July 2014 13:12, Wido den Hollander  wrote:
>
>> On 07/03/2014 10:59 AM, Andrija Panic wrote:
>>
>>> Hi Wido, thanks for answers - I have mons and OSD on each host...
>>> server1: mon + 2 OSDs, same for server2 and server3.
>>>
>>> Any Proposed upgrade path, or just start with 1 server and move along to
>>> others ?
>>>
>>>
>> Upgrade the packages, but don't restart the daemons yet, then:
>>
>> 1. Restart the mon leader
>> 2. Restart the two other mons
>> 3. Restart all the OSDs one by one
>>
>> I suggest that you wait for the cluster to become fully healthy again
>> before restarting the next OSD.
>>
>> Wido
>>
>>  Thanks again.
>>> Andrija
>>>
>>>
>>> On 2 July 2014 16:34, Wido den Hollander >> <mailto:w...@42on.com>> wrote:
>>>
>>> On 07/02/2014 04:08 PM, Andrija Panic wrote:
>>>
>>> Hi,
>>>
>>> I have existing CEPH cluster of 3 nodes, versions 0.72.2
>>>
>>> I'm in a process of installing CEPH on 4th node, but now CEPH
>>> version is
>>> 0.80.1
>>>
>>> Will this make problems running mixed CEPH versions ?
>>>
>>>
>>> No, but the recommendation is not to have this running for a very
>>> long period. Try to upgrade all nodes to the same version within a
>>> reasonable amount of time.
>>>
>>>
>>> I intend to upgrade CEPH on exsiting 3 nodes anyway ?
>>> Recommended steps ?
>>>
>>>
>>> Always upgrade the monitors first! Then to the OSDs one by one.
>>>
>>> Thanks
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>>
>>> _
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>>
>>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>>
>>>
>>>
>>> --
>>> Wido den Hollander
>>> 42on B.V.
>>> Ceph trainer and consultant
>>>
>>> Phone: +31 (0)20 700 9902 
>>> Skype: contact42on
>>> _
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>>
>>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>
>>
>> --
>> Wido den Hollander
>> Ceph consultant and trainer
>> 42on B.V.
>>
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>>
>
>
>
> --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-03 Thread Andrija Panic
Thanks again a lot.


On 3 July 2014 15:20, Wido den Hollander  wrote:

> On 07/03/2014 03:07 PM, Andrija Panic wrote:
>
>> Wido,
>> one final question:
>> since I compiled libvirt1.2.3 usinfg ceph-devel 0.72 - do I need to
>> recompile libvirt again now with ceph-devel 0.80 ?
>>
>> Perhaps not smart question, but need to make sure I don't screw
>> something...
>>
>
> No, no need to. The librados API didn't change in case you are using RBD
> storage pool support.
>
> Otherwise it just talks to Qemu and that talks to librbd/librados.
>
> Wido
>
>  Thanks for your time,
>> Andrija
>>
>>
>> On 3 July 2014 14:27, Andrija Panic > <mailto:andrija.pa...@gmail.com>> wrote:
>>
>> Thanks a lot Wido, will do...
>>
>> Andrija
>>
>>
>> On 3 July 2014 13:12, Wido den Hollander > <mailto:w...@42on.com>> wrote:
>>
>> On 07/03/2014 10:59 AM, Andrija Panic wrote:
>>
>> Hi Wido, thanks for answers - I have mons and OSD on each
>> host...
>> server1: mon + 2 OSDs, same for server2 and server3.
>>
>> Any Proposed upgrade path, or just start with 1 server and
>> move along to
>> others ?
>>
>>
>> Upgrade the packages, but don't restart the daemons yet, then:
>>
>> 1. Restart the mon leader
>> 2. Restart the two other mons
>> 3. Restart all the OSDs one by one
>>
>> I suggest that you wait for the cluster to become fully healthy
>> again before restarting the next OSD.
>>
>> Wido
>>
>> Thanks again.
>> Andrija
>>
>>
>> On 2 July 2014 16:34, Wido den Hollander > <mailto:w...@42on.com>
>> <mailto:w...@42on.com <mailto:w...@42on.com>>> wrote:
>>
>>  On 07/02/2014 04:08 PM, Andrija Panic wrote:
>>
>>  Hi,
>>
>>  I have existing CEPH cluster of 3 nodes, versions
>> 0.72.2
>>
>>  I'm in a process of installing CEPH on 4th node,
>> but now CEPH
>>  version is
>>  0.80.1
>>
>>  Will this make problems running mixed CEPH versions ?
>>
>>
>>  No, but the recommendation is not to have this running
>> for a very
>>  long period. Try to upgrade all nodes to the same
>> version within a
>>  reasonable amount of time.
>>
>>
>>  I intend to upgrade CEPH on exsiting 3 nodes anyway ?
>>  Recommended steps ?
>>
>>
>>  Always upgrade the monitors first! Then to the OSDs one
>> by one.
>>
>>  Thanks
>>
>>  --
>>
>>  Andrija Panić
>>
>>
>>  ___
>>
>>  ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> <mailto:ceph-us...@lists.ceph.__com
>> <mailto:ceph-users@lists.ceph.com>>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph._
>> ___com
>> <http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com>
>>
>>
>>
>> <http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>>
>>
>>
>>
>>  --
>>  Wido den Hollander
>>  42on B.V.
>>  Ceph trainer and consultant
>>
>>  Phone: +31 (0)20 700 9902
>> 
>> 
>>  Skype: contact42on
>>  ___
>>
>>  ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> <mailto:ceph-us...@lists.ceph.__com
>> <mailto:ceph-users@lists.ceph.com>>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph._
>> ___com
>> <http://lists.ceph.com/__listinfo

[ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-12 Thread Andrija Panic
Hi,

Sorry to bother you, but I have an urgent situation: I upgraded CEPH from 0.72 to
0.80 (CentOS 6.5), and now none of my CloudStack HOSTS can connect.

I did basic "yum update ceph" on the first MON leader, and all CEPH
services on that HOST, have been restarted - done same on other CEPH nodes
(I have 1MON + 2 OSD per physical host), then I have set variables to
optimal with "ceph osd crush tunables optimal" and after some rebalancing,
ceph shows HEALTH_OK.

Also, I can create new images with qemu-img -f rbd rbd:/cloudstack

Libvirt 1.2.3 was compiled while ceph was 0.72, but I got instructions from
Wido that I don't need to recompile it now with ceph 0.80...

Libvirt logs:

libvirt: Storage Driver error : Storage pool not found: no storage pool
with matching uuid ‡Îhyš
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-12 Thread Andrija Panic
Hi Mark,
actually, CEPH is running fine, and I have deployed a NEW host (freshly compiled
libvirt with ceph 0.80 devel, and a newer kernel) - and it works... so I am
migrating some VMs to this new host...

I have 3 physical hosts, each running a MON and 2 OSDs; on all 3,
cloudstack/libvirt doesn't work...

Any suggestion on whether libvirt needs to be recompiled? I got info from Wido
that libvirt does NOT need to be recompiled.


Best


On 13 July 2014 08:35, Mark Kirkwood  wrote:

> On 13/07/14 17:07, Andrija Panic wrote:
>
>> Hi,
>>
>> Sorry to bother, but I have urgent situation: upgraded CEPH from 0.72 to
>> 0.80 (centos 6.5), and now all my CloudStack HOSTS can not connect.
>>
>> I did basic "yum update ceph" on the first MON leader, and all CEPH
>> services on that HOST, have been restarted - done same on other CEPH
>> nodes (I have 1MON + 2 OSD per physical host), then I have set variables
>> to optimal with "ceph osd crush tunables optimal" and after some
>> rebalancing, ceph shows HEALTH_OK.
>>
>> Also, I can create new images with qemu-img -f rbd rbd:/cloudstack
>>
>> Libvirt 1.2.3 was compiled while ceph was 0.72, but I got instructions
>> from Wido that I don't need to REcompile now with ceph 0.80...
>>
>> Libvirt logs:
>>
>> libvirt: Storage Driver error : Storage pool not found: no storage pool
>> with matching uuid ‡Îhyš>
>> Note there are some strange "uuid" - not sure what is happening ?
>>
>> Did I forget to do something after CEPH upgrade ?
>>
>
> Have you got any ceph logs to examine on the host running libvirt? When I
> try to connect a v0.72 client to v0.81 cluster I get:
>
> 2014-07-13 18:21:23.860898 7fc3bd2ca700  0 -- 192.168.122.41:0/1002012 >>
> 192.168.122.21:6789/0 pipe(0x7fc3c00241f0 sd=3 :49451 s=1 pgs=0 cs=0 l=1
> c=0x7fc3c0024450).connect protocol feature mismatch, my f < peer
> 5f missing 50
>
> Regards
>
> Mark
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80

2014-07-13 Thread Andrija Panic
Hi Mark,

update:

after restarting libvirtd, the cloudstack-agent and the management server God
knows how many times - it WORKS now!

Not sure what is happening here, but it works again... I know for sure it
was not the CEPH cluster, since it was fine and accessible via qemu-img, etc...

Thanks Mark for your time on my issue...
Best.
Andrija




On 13 July 2014 10:20, Mark Kirkwood  wrote:

> On 13/07/14 19:15, Mark Kirkwood wrote:
>
>> On 13/07/14 18:38, Andrija Panic wrote:
>>
>
>  Any suggestion on need to recompile libvirt ? I got info from Wido, that
>>> libvirt does NOT need to be recompiled
>>>
>>>
> Thinking about this a bit more - Wido *may* have meant:
>
> - *libvirt* does not need to be rebuild
> - ...but you need to get/build a later ceph client i.e - 0.80
>
> Of course depending on how your libvirt build was set up (e.g static
> linkage), this *might* have meant you needed to rebuild it too.
>
> Regards
>
> Mark
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-13 Thread Andrija Panic
Hi,

after the ceph upgrade (0.72.2 to 0.80.3) I issued "ceph osd crush
tunables optimal", and after only a few minutes I added 2 more OSDs to
the CEPH cluster...

So these 2 changes were more or less done at the same time - rebalancing
because of tunables optimal, and rebalancing because of adding new OSDs...

Result - all VMs living on CEPH storage went mad, effectively no disk
access, blocked so to speak.

Since this rebalancing took 5-6 hours, I had a bunch of VMs down for that long...

Did I do wrong by causing 2 rebalancings to happen at the same time?
Is this behaviour normal - causing great load on all VMs because they are
unable to access CEPH storage effectively?
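In hindsight, a sketch of what I could have tried to keep client I/O breathing
during the reshuffle - these options exist in firefly, and the values are only a
starting point:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'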

Thanks for any input...
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mixing CEPH versions on new ceph nodes...

2014-07-13 Thread Andrija Panic
Hi Wido,

you said previously:
  Upgrade the packages, but don't restart the daemons yet, then:
  1. Restart the mon leader
  2. Restart the two other mons
  3. Restart all the OSDs one by one

But in reality (with yum update, or by using ceph-deploy install nodename) -
the package manager restarts ALL ceph services on that node on its
own...
So I upgraded - the MON leader and the 2 OSDs on this 1st upgraded host were
restarted, followed by doing the same on the other 2 servers (1 MON peon and 2
OSDs per host).

Is this perhaps a package (RPM) bug - restarting daemons automatically?
It makes sense to have all MONs updated first, and then the OSDs (and
perhaps after that the MDS, if using it...).
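One way to see whether the restart really comes from the package itself is to
look at its scriptlets (just a check, nothing is changed):

rpm -q --scripts ceph | grep -iE 'restart|condrestart'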

Upgraded to 0.80.3 release btw.

Thanks for your help again.
Andrija



On 3 July 2014 15:21, Andrija Panic  wrote:

> Thanks again a lot.
>
>
> On 3 July 2014 15:20, Wido den Hollander  wrote:
>
>> On 07/03/2014 03:07 PM, Andrija Panic wrote:
>>
>>> Wido,
>>> one final question:
>>> since I compiled libvirt1.2.3 usinfg ceph-devel 0.72 - do I need to
>>> recompile libvirt again now with ceph-devel 0.80 ?
>>>
>>> Perhaps not smart question, but need to make sure I don't screw
>>> something...
>>>
>>
>> No, no need to. The librados API didn't change in case you are using RBD
>> storage pool support.
>>
>> Otherwise it just talks to Qemu and that talks to librbd/librados.
>>
>> Wido
>>
>>  Thanks for your time,
>>> Andrija
>>>
>>>
>>> On 3 July 2014 14:27, Andrija Panic >> <mailto:andrija.pa...@gmail.com>> wrote:
>>>
>>> Thanks a lot Wido, will do...
>>>
>>> Andrija
>>>
>>>
>>> On 3 July 2014 13:12, Wido den Hollander >> <mailto:w...@42on.com>> wrote:
>>>
>>> On 07/03/2014 10:59 AM, Andrija Panic wrote:
>>>
>>> Hi Wido, thanks for answers - I have mons and OSD on each
>>> host...
>>> server1: mon + 2 OSDs, same for server2 and server3.
>>>
>>> Any Proposed upgrade path, or just start with 1 server and
>>> move along to
>>> others ?
>>>
>>>
>>> Upgrade the packages, but don't restart the daemons yet, then:
>>>
>>> 1. Restart the mon leader
>>> 2. Restart the two other mons
>>> 3. Restart all the OSDs one by one
>>>
>>> I suggest that you wait for the cluster to become fully healthy
>>> again before restarting the next OSD.
>>>
>>> Wido
>>>
>>> Thanks again.
>>> Andrija
>>>
>>>
>>> On 2 July 2014 16:34, Wido den Hollander >> <mailto:w...@42on.com>
>>> <mailto:w...@42on.com <mailto:w...@42on.com>>> wrote:
>>>
>>>  On 07/02/2014 04:08 PM, Andrija Panic wrote:
>>>
>>>  Hi,
>>>
>>>  I have existing CEPH cluster of 3 nodes, versions
>>> 0.72.2
>>>
>>>  I'm in a process of installing CEPH on 4th node,
>>> but now CEPH
>>>  version is
>>>  0.80.1
>>>
>>>  Will this make problems running mixed CEPH versions
>>> ?
>>>
>>>
>>>  No, but the recommendation is not to have this running
>>> for a very
>>>  long period. Try to upgrade all nodes to the same
>>> version within a
>>>  reasonable amount of time.
>>>
>>>
>>>  I intend to upgrade CEPH on exsiting 3 nodes anyway
>>> ?
>>>  Recommended steps ?
>>>
>>>
>>>  Always upgrade the monitors first! Then to the OSDs one
>>> by one.
>>>
>>>  Thanks
>>>
>>>  --
>>>
>>>  Andrija Panić
>>>
>>>
>>>  ___
>>>
>>>  ceph-users mailing list
>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>>

Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Hi Andrei, nice to meet you again ;)

Thanks for sharing this info with me - I thought it was my mistake, introducing
the new OSDs at the same time - I thought that since it's already rebalancing,
let's add those new OSDs so it all rebalances at once, and I don't have to cause
2 data rebalancings - but during a normal OSD restart and data rebalancing
(I did not set osd noout etc...) I did have somewhat lower VM performance,
but everything was UP and fine.

Also, about 30% of the data moved during my upgrade/tunables change... although
the documents say 10%, as you said.

I did not lose any data, but finding all the VMs that use CEPH as storage is
somewhat of a PITA...

So, any input from the CEPH developers would be greatly appreciated...

Thanks again for such detailed info,
Andrija





On 14 July 2014 10:52, Andrei Mikhailovsky  wrote:

> Hi Andrija,
>
> I've got at least two more stories of similar nature. One is my friend
> running a ceph cluster and one is from me. Both of our clusters are pretty
> small. My cluster has only two osd servers with 8 osds each, 3 mons. I have
> an ssd journal per 4 osds. My friend has a cluster of 3 mons and 3 osd
> servers with 4 osds each and an ssd per 4 osds as well. Both clusters are
> connected with 40gbit/s IP over Infiniband links.
>
> We had the same issue while upgrading to firefly. However, we did not add
> any new disks, just ran the "ceph osd crush tunables optimal" command after
> following an upgrade.
>
> Both of our clusters were "down" as far as the virtual machines are
> concerned. All vms have crashed because of the lack of IO. It was a bit
> problematic, taking into account that ceph is typically so great at staying
> alive during failures and upgrades. So, there seems to be a problem with
> the upgrade. I wish devs would have added a big note in red letters that if
> you run this command it will likely affect your cluster performance and
> most likely all your vms will die. So, please shutdown your vms if you do
> not want to have data loss.
>
> I've changed the default values to reduce the load during recovery and
> also to tune a few things performance wise. My settings were:
>
> osd recovery max chunk = 8388608
>
> osd recovery op priority = 2
>
> osd max backfills = 1
>
> osd recovery max active = 1
>
> osd recovery threads = 1
>
> osd disk threads = 2
>
> filestore max sync interval = 10
>
> filestore op threads = 20
>
> filestore_flusher = false
>
> However, this didn't help much and i've noticed that shortly after running
> the tunnables command my guest vms iowait has quickly jumped to 50% and a
> to 99% a minute after. This has happened on all vms at once. During the
> recovery phase I ran the "rbd -p  ls -l" command several times
> and it took between 20-40 minutes to complete. It typically takes less than
> 2 seconds when the cluster is not in recovery mode.
>
> My mate's cluster had the same tunables apart from the last three. He had
> exactly the same behaviour.
>
> One other thing that i've noticed is that somewhere in the docs I've read
> that running the tunnable optimal command should move not more than 10% of
> your data. However, in both of our cases our status was just over 30%
> degraded and it took a good part of 9 hours to complete the data
> reshuffling.
>
>
> Any comments from the ceph team or other ceph gurus on:
>
> 1. What have we done wrong in our upgrade  process
> 2. What options should we have used to keep our vms alive
>
>
> Cheers
>
> Andrei
>
>
>
>
> --
> *From: *"Andrija Panic" 
> *To: *ceph-users@lists.ceph.com
> *Sent: *Sunday, 13 July, 2014 9:54:17 PM
> *Subject: *[ceph-users] ceph osd crush tunables optimal AND add new OSD
> at thesame time
>
>
> Hi,
>
> after seting ceph upgrade (0.72.2 to 0.80.3) I have issued "ceph osd crush
> tunables optimal" and after only few minutes I have added 2 more OSDs to
> the CEPH cluster...
>
> So these 2 changes were more or a less done at the same time - rebalancing
> because of tunables optimal, and rebalancing because of adding new OSD...
>
> Result - all VMs living on CEPH storage have gone mad, no disk access
> efectively, blocked so to speak.
>
> Since this rebalancing took 5h-6h, I had bunch of VMs down for that long...
>
> Did I do wrong by causing "2 rebalancing" to happen at the same time ?
> Is this behaviour normal, to cause great load on all VMs because they are
> unable to access CEPH storage efectively ?
>
> Thanks for any input...
> --
>
> Andrija Panić
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Perhaps here: http://ceph.com/releases/v0-80-firefly-released/
Thanks


On 14 July 2014 18:18, Sage Weil  wrote:

> I've added some additional notes/warnings to the upgrade and release
> notes:
>
>
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>
> If there is somewhere else where you think a warning flag would be useful,
> let me know!
>
> Generally speaking, we want to be able to cope with huge data rebalances
> without interrupting service.  It's an ongoing process of improving the
> recovery vs client prioritization, though, and removing sources of
> overhead related to rebalancing... and it's clearly not perfect yet. :/
>
> sage
>
>
> On Sun, 13 Jul 2014, Andrija Panic wrote:
>
> > Hi,
> > after seting ceph upgrade (0.72.2 to 0.80.3) I have issued "ceph osd
> crush
> > tunables optimal" and after only few minutes I have added 2 more OSDs to
> the
> > CEPH cluster...
> >
> > So these 2 changes were more or a less done at the same time -
> rebalancing
> > because of tunables optimal, and rebalancing because of adding new OSD...
> >
> > Result - all VMs living on CEPH storage have gone mad, no disk access
> > efectively, blocked so to speak.
> >
> > Since this rebalancing took 5h-6h, I had bunch of VMs down for that
> long...
> >
> > Did I do wrong by causing "2 rebalancing" to happen at the same time ?
> > Is this behaviour normal, to cause great load on all VMs because they are
> > unable to access CEPH storage efectively ?
> >
> > Thanks for any input...
> > --
> >
> > Andrija Pani?
> >
> >




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-14 Thread Andrija Panic
Udo, I had all VMs completely non-operational - so don't set "optimal" for
now...
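For anyone wanting to see what they are currently running before flipping the
switch, something like this should work on firefly (if the first command is not
available, decompiling the crushmap shows the same values):

ceph osd crush show-tunables
ceph osd getcrushmap -o /tmp/crushmap && crushtool -d /tmp/crushmap | grep tunable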


On 14 July 2014 20:48, Udo Lembke  wrote:

> Hi,
> which values are all changed with "ceph osd crush tunables optimal"?
>
> Is it perhaps possible to change some parameter the weekends before the
> upgrade is running, to have more time?
> (depends if the parameter are available in 0.72...).
>
> The warning told, it's can take days... we have an cluster with 5
> storage node and 12 4TB-osd-disk each (60 osd), replica 2. The cluster
> is 60% filled.
> Networkconnection 10Gb.
> Takes tunables optimal in such an configuration one, two or more days?
>
> Udo
>
> On 14.07.2014 18:18, Sage Weil wrote:
> > I've added some additional notes/warnings to the upgrade and release
> > notes:
> >
> >
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
> >
> > If there is somewhere else where you think a warning flag would be
> useful,
> > let me know!
> >
> > Generally speaking, we want to be able to cope with huge data rebalances
> > without interrupting service.  It's an ongoing process of improving the
> > recovery vs client prioritization, though, and removing sources of
> > overhead related to rebalancing... and it's clearly not perfect yet. :/
> >
> > sage
> >
> >
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-15 Thread Andrija Panic
Hi Sage,

since this problem is tunables-related, do we need to expect the same
behavior when we do regular data rebalancing caused by adding or removing an
OSD? I guess not, but I would like your confirmation.
I'm already on optimal tunables, but I'm afraid to test this by, e.g.,
shutting down 1 OSD.
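
(For the archive, a hedged sketch of how such a test could be done without
risking a rebalance - the noout flag is a standard Ceph flag, the service
commands are sysvinit-style and depend on your distro and OSD id:)

ceph osd set noout          # stop the cluster from marking down OSDs "out" and rebalancing
service ceph stop osd.0     # adjust to your init system / OSD id
# ...observe client IO while the OSD is down...
service ceph start osd.0
ceph osd unset noout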

Thanks,
Andrija


On 14 July 2014 18:18, Sage Weil  wrote:

> I've added some additional notes/warnings to the upgrade and release
> notes:
>
>
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>
> If there is somewhere else where you think a warning flag would be useful,
> let me know!
>
> Generally speaking, we want to be able to cope with huge data rebalances
> without interrupting service.  It's an ongoing process of improving the
> recovery vs client prioritization, though, and removing sources of
> overhead related to rebalancing... and it's clearly not perfect yet. :/
>
> sage
>
>
> On Sun, 13 Jul 2014, Andrija Panic wrote:
>
> > Hi,
> > after seting ceph upgrade (0.72.2 to 0.80.3) I have issued "ceph osd
> crush
> > tunables optimal" and after only few minutes I have added 2 more OSDs to
> the
> > CEPH cluster...
> >
> > So these 2 changes were more or a less done at the same time -
> rebalancing
> > because of tunables optimal, and rebalancing because of adding new OSD...
> >
> > Result - all VMs living on CEPH storage have gone mad, no disk access
> > efectively, blocked so to speak.
> >
> > Since this rebalancing took 5h-6h, I had bunch of VMs down for that
> long...
> >
> > Did I do wrong by causing "2 rebalancing" to happen at the same time ?
> > Is this behaviour normal, to cause great load on all VMs because they are
> > unable to access CEPH storage efectively ?
> >
> > Thanks for any input...
> > --
> >
> > Andrija Panić
> >
> >




-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.4 Firefly released

2014-07-16 Thread Andrija Panic
Hi Sage,

can anyone confirm whether the "bug" is still present in the RPMs that causes
an automatic CEPH service restart after updating packages ?

We are instructed to first update/restart the MONs, and after that the OSDs -
but that is impossible if we have MON+OSDs on the same host... since ceph is
automatically restarted by YUM/RPM, but NOT automatically restarted on
Ubuntu/Debian (as reported by another list member...)
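
(If the packages do not restart the daemons for you, a hedged sketch of the
manual order on a colocated MON+OSD host - sysvinit-style commands from that
era, and the mon id / OSD numbers are only examples:)

service ceph restart mon.cs1    # restart the monitor first; the mon id is host-specific
ceph -s                         # wait until the mon is back in quorum and health settles
service ceph restart osd.0      # then restart the OSDs on that host, one by one
service ceph restart osd.1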

Thanks


On 16 July 2014 01:45, Sage Weil  wrote:

> This Firefly point release fixes a potential data corruption problem
> when ceph-osd daemons run on top of XFS and service Firefly librbd
> clients.  A recently added allocation hint that RBD utilizes triggers
> an XFS bug on some kernels (Linux 3.2, and likely others) that leads
> to data corruption and deep-scrub errors (and inconsistent PGs).  This
> release avoids the situation by disabling the allocation hint until we
> can validate which kernels are affected and/or are known to be safe to
> use the hint on.
>
> We recommend that all v0.80.x Firefly users urgently upgrade,
> especially if they are using RBD.
>
> Notable Changes
> ---
>
> * osd: disable XFS extsize hint by default (#8830, Samuel Just)
> * rgw: fix extra data pool default name (Yehuda Sadeh)
>
> For more detailed information, see:
>
>   http://ceph.com/docs/master/_downloads/v0.80.4.txt
>
> Getting Ceph
> 
>
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://ceph.com/download/ceph-0.80.4.tar.gz
> * For packages, see http://ceph.com/docs/master/install/get-packages
> * For ceph-deploy, see
> http://ceph.com/docs/master/install/install-ceph-deploy
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time

2014-07-16 Thread Andrija Panic
For me: 3 nodes, 1 MON + 2x2TB OSDs on each node... no MDS used...
I went through the pain of waiting for the data rebalancing and now I'm on
"optimal" tunables...
Cheers


On 16 July 2014 14:29, Andrei Mikhailovsky  wrote:

> Quenten,
>
> We've got two monitors sitting on the osd servers and one on a different
> server.
>
> Andrei
>
> --
> Andrei Mikhailovsky
> Director
> Arhont Information Security
>
> Web: http://www.arhont.com
> http://www.wi-foo.com
> Tel: +44 (0)870 4431337
> Fax: +44 (0)208 429 3111
> PGP: Key ID - 0x2B3438DE
> PGP: Server - keyserver.pgp.com
>
> DISCLAIMER
>
> The information contained in this email is intended only for the use of
> the person(s) to whom it is addressed and may be confidential or contain
> legally privileged information. If you are not the intended recipient you
> are hereby notified that any perusal, use, distribution, copying or
> disclosure is strictly prohibited. If you have received this email in error
> please immediately advise us by return email at and...@arhont.com and
> delete and purge the email and any attachments without making a copy.
>
>
> --
> *From: *"Quenten Grasso" 
> *To: *"Andrija Panic" , "Sage Weil" <
> sw...@redhat.com>
> *Cc: *ceph-users@lists.ceph.com
> *Sent: *Wednesday, 16 July, 2014 1:20:19 PM
>
> *Subject: *Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
> Hi Sage, Andrija & List
>
>
>
> I have seen the tuneables issue on our cluster when I upgraded to firefly.
>
>
>
> I ended up going back to legacy settings after about an hour as my cluster
> is of 55 3TB OSD’s over 5 nodes and it decided it needed to move around 32%
> of our data, which after an hour all of our vm’s were frozen and I had to
> revert the change back to legacy settings and wait about the same time
> again until our cluster had recovered and reboot our vms. (wasn’t really
> expecting that one from the patch notes)
>
>
>
> Also our CPU usage went through the roof as well on our nodes, do you per
> chance have your metadata servers co-located on your osd nodes as we do?
>  I’ve been thinking about trying to move these to dedicated nodes as it may
> resolve our issues.
>
>
>
> Regards,
>
> Quenten
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Andrija Panic
> *Sent:* Tuesday, 15 July 2014 8:38 PM
> *To:* Sage Weil
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
>
>
> Hi Sage,
>
>
>
> since this problem is tunables-related, do we need to expect same behavior
> or not  when we do regular data rebalancing caused by adding new/removing
> OSD? I guess not, but would like your confirmation.
>
> I'm already on optimal tunables, but I'm afraid to test this by i.e.
> shuting down 1 OSD.
>
>
>
> Thanks,
> Andrija
>
>
>
> On 14 July 2014 18:18, Sage Weil  wrote:
>
> I've added some additional notes/warnings to the upgrade and release
> notes:
>
>
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>
> If there is somewhere else where you think a warning flag would be useful,
> let me know!
>
> Generally speaking, we want to be able to cope with huge data rebalances
> without interrupting service.  It's an ongoing process of improving the
> recovery vs client prioritization, though, and removing sources of
> overhead related to rebalancing... and it's clearly not perfect yet. :/
>
> sage
>
>
>
> On Sun, 13 Jul 2014, Andrija Panic wrote:
>
> > Hi,
> > after seting ceph upgrade (0.72.2 to 0.80.3) I have issued "ceph osd
> crush
> > tunables optimal" and after only few minutes I have added 2 more OSDs to
> the
> > CEPH cluster...
> >
> > So these 2 changes were more or a less done at the same time -
> rebalancing
> > because of tunables optimal, and rebalancing because of adding new OSD...
> >
> > Result - all VMs living on CEPH storage have gone mad, no disk access
> > efectively, blocked so to speak.
> >
> > Since this rebalancing took 5h-6h, I had bunch of VMs down for that
> long...
> >
> > Did I do wrong by causing "2 rebalancing" to happen at the same time ?
> > Is this behaviour normal, to cause great load on all VMs because they are
> > unable to access CEPH storage efectively ?
> >
> > Thanks for any input...
> > --
> >
>
> > Andrija Panić
> >
> >
>
>
>
>
>
> --
>
>
>
> Andrija Panić
>
> --
>
>   http://admintweets.com
>
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hi,

we just got some new clients, and have suffered a very big degradation in
CEPH performance for some reason (we are using CloudStack).

I'm wondering if there is a way to monitor op/s or similar usage per connected
client, so we can isolate the heavy client ?

Also, what is the general best practice for monitoring these kinds of changes
in CEPH ? I'm talking about R/W or op/s changes or similar...
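
(For later readers: Ceph itself only exposes pool-level rates, so per-VM
numbers are easiest to get on the hypervisor - a hedged sketch, with made-up
domain/device names:)

ceph osd pool stats              # per-pool client read/write throughput and op/s
rados df                         # cumulative per-pool read/write counters
virsh domblkstat i-2-34-VM vda   # per-VM block stats on the KVM host (names are examples)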

Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer
some precise op/s per Ceph image at least...
Will check CloudStack then...

Thx


On 8 August 2014 13:53, Wido den Hollander  wrote:

> On 08/08/2014 01:51 PM, Andrija Panic wrote:
>
>> Hi,
>>
>> we just had some new clients, and have suffered very big degradation in
>> CEPH performance for some reasons (we are using CloudStack).
>>
>> I'm wondering if there is way to monitor OP/s or similar usage by client
>> connected, so we can isolate the heavy client ?
>>
>>
> This is not very easy to do with Ceph, but CloudStack keeps track of this
> in the usage database.
>
> With never versions of CloudStack you can also limit the IOps of Instances
> to prevent such situations.
>
>  Also, what is the general best practice to monitor these kind of changes
>> in CEPH ? I'm talking about R/W or OP/s change or similar...
>>
>> Thanks,
>> --
>>
>> Andrija Panić
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hm, true...
One final question, I might be a noob...
13923 B/s rd, 4744 kB/s wr, 1172 op/s
What does this op/s represent - is it classic IOPS (4k reads/writes) or
something else ? How much is too much :) - I'm familiar with SATA/SSD IOPS
specs/tests, etc., but not sure what CEPH means by op/s - could not find
anything with Google...

Thanks again Wido.
Andrija


On 8 August 2014 14:07, Wido den Hollander  wrote:

> On 08/08/2014 02:02 PM, Andrija Panic wrote:
>
>> Thanks Wido, yes I'm aware of CloudStack in that sense, but would prefer
>> some precise OP/s per ceph Image at least...
>> Will check CloudStack then...
>>
>>
> Ceph doesn't really know that since RBD is just a layer on top of RADOS.
> In the end the CloudStack hypervisors are doing I/O towards RADOS objects,
> so giving exact stats of how many IOps you are seeing per image is hard to
> figure out.
>
> The hypervisor knows this best since it sees all the I/O going through.
>
> Wido
>
>  Thx
>>
>>
>> On 8 August 2014 13:53, Wido den Hollander > <mailto:w...@42on.com>> wrote:
>>
>> On 08/08/2014 01:51 PM, Andrija Panic wrote:
>>
>> Hi,
>>
>> we just had some new clients, and have suffered very big
>> degradation in
>> CEPH performance for some reasons (we are using CloudStack).
>>
>> I'm wondering if there is way to monitor OP/s or similar usage
>> by client
>> connected, so we can isolate the heavy client ?
>>
>>
>> This is not very easy to do with Ceph, but CloudStack keeps track of
>> this in the usage database.
>>
>> With never versions of CloudStack you can also limit the IOps of
>> Instances to prevent such situations.
>>
>> Also, what is the general best practice to monitor these kind of
>> changes
>> in CEPH ? I'm talking about R/W or OP/s change or similar...
>>
>> Thanks,
>> --
>>
>> Andrija Panić
>>
>>
>>
>> _
>> ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>
>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902 
>> Skype: contact42on
>> _
>> ceph-users mailing list
>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>> http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com
>>
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>> --
>> http://admintweets.com
>> --
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Hi Dan,

thank you very much for the script, I will check it out... no throttling so
far, but I guess it will have to be done...

This seems to read only gzipped logs? And since it is read-only, I guess it is
safe to run it on the production cluster now... ?
The script will also check multiple OSDs as far as I can understand, not just
the osd.0 given in the script comment ?
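
(The way I read Dan's suggestion, roughly the following - treat it as a
sketch, and keep an eye on the amount of logging it generates:)

ceph tell osd.* injectargs '--debug-filestore 10'   # one log line per filestore write
# ...capture the workload for a while, then turn it back down to your previous level...
ceph tell osd.* injectargs '--debug-filestore 1'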

Thanks a lot.
Andrija




On 8 August 2014 15:44, Dan Van Der Ster  wrote:

>  Hi,
> Here’s what we do to identify our top RBD users.
>
>  First, enable log level 10 for the filestore so you can see all the IOs
> coming from the VMs. Then use a script like this (used on a dumpling
> cluster):
>
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>
>  to summarize the osd logs and identify the top clients.
>
>  Then its just a matter of scripting to figure out the ops/sec per
> volume, but for us at least the main use-case has been to identify who is
> responsible for a new peak in overall ops — and daily-granular statistics
> from the above script tends to suffice.
>
>  BTW, do you throttle your clients? We found that its absolutely
> necessary, since without a throttle just a few active VMs can eat up the
> entire iops capacity of the cluster.
>
>  Cheers, Dan
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
>
>  On 08 Aug 2014, at 13:51, Andrija Panic  wrote:
>
>  Hi,
>
>  we just had some new clients, and have suffered very big degradation in
> CEPH performance for some reasons (we are using CloudStack).
>
>  I'm wondering if there is way to monitor OP/s or similar usage by client
> connected, so we can isolate the heavy client ?
>
>  Also, what is the general best practice to monitor these kind of changes
> in CEPH ? I'm talking about R/W or OP/s change or similar...
>
>  Thanks,
> --
>
> Andrija Panić
>
>   ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Thanks again, and btw, besides it being Friday I'm also on vacation - so
double the joy of troubleshooting performance problems :)))

Thx :)


On 8 August 2014 16:01, Dan Van Der Ster  wrote:

>  Hi,
>
>  On 08 Aug 2014, at 15:55, Andrija Panic  wrote:
>
>  Hi Dan,
>
>  thank you very much for the script, will check it out...no thortling so
> far, but I guess it will have to be done...
>
>  This seems to read only gziped logs?
>
>
>  Well it’s pretty simple, and it zcat’s each input file. So yes, only gz
> files in the current script. But you can change that pretty trivially ;)
>
>  so since read only I guess it is safe to run it on proudction cluster
> now… ?
>
>
>  I personally don’t do anything new on a Friday just before leaving ;)
>
>  But its just grepping the log files, so start with one, then two, then...
>
>   The script will also check for mulitply OSDs as far as I can
> understadn, not just osd.0 given in script comment ?
>
>
>  Yup, what I do is gather all of the OSD logs for a single day in a
> single directory (in CephFS ;), then run that script on all of the OSDs. It
> takes awhile, but it will give you the overall daily totals for the whole
> cluster.
>
>  If you are only trying to find the top users, then it is sufficient to
> check a subset of OSDs, since by their nature the client IOs are spread
> across most/all OSDs.
>
>  Cheers, Dan
>
>  Thanks a lot.
> Andrija
>
>
>
>
> On 8 August 2014 15:44, Dan Van Der Ster 
> wrote:
>
>> Hi,
>> Here’s what we do to identify our top RBD users.
>>
>>  First, enable log level 10 for the filestore so you can see all the IOs
>> coming from the VMs. Then use a script like this (used on a dumpling
>> cluster):
>>
>>
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>
>>  to summarize the osd logs and identify the top clients.
>>
>>  Then its just a matter of scripting to figure out the ops/sec per
>> volume, but for us at least the main use-case has been to identify who is
>> responsible for a new peak in overall ops — and daily-granular statistics
>> from the above script tends to suffice.
>>
>>  BTW, do you throttle your clients? We found that its absolutely
>> necessary, since without a throttle just a few active VMs can eat up the
>> entire iops capacity of the cluster.
>>
>>  Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>>   On 08 Aug 2014, at 13:51, Andrija Panic 
>> wrote:
>>
>>Hi,
>>
>>  we just had some new clients, and have suffered very big degradation in
>> CEPH performance for some reasons (we are using CloudStack).
>>
>>  I'm wondering if there is way to monitor OP/s or similar usage by
>> client connected, so we can isolate the heavy client ?
>>
>>  Also, what is the general best practice to monitor these kind of
>> changes in CEPH ? I'm talking about R/W or OP/s change or similar...
>>
>>  Thanks,
>> --
>>
>> Andrija Panić
>>
>>___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
>  --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-08 Thread Andrija Panic
Will definitely do so, thanks Wido and Dan...
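
(A hedged illustration of what such a per-instance limit looks like at the
libvirt level on a KVM host - CloudStack's disk offerings set this for you,
the commands are only for checking/experimenting, and the domain/device names
are made up:)

virsh blkdeviotune i-2-34-VM vda --write-iops-sec 750 --read-iops-sec 750 --live
virsh blkdeviotune i-2-34-VM vda   # with no options it just prints the current limits
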
Cheers guys


On 8 August 2014 16:13, Wido den Hollander  wrote:

> On 08/08/2014 03:44 PM, Dan Van Der Ster wrote:
>
>> Hi,
>> Here’s what we do to identify our top RBD users.
>>
>> First, enable log level 10 for the filestore so you can see all the IOs
>> coming from the VMs. Then use a script like this (used on a dumpling
>> cluster):
>>
>> https://github.com/cernceph/ceph-scripts/blob/master/
>> tools/rbd-io-stats.pl
>>
>> to summarize the osd logs and identify the top clients.
>>
>> Then its just a matter of scripting to figure out the ops/sec per
>> volume, but for us at least the main use-case has been to identify who
>> is responsible for a new peak in overall ops — and daily-granular
>> statistics from the above script tends to suffice.
>>
>> BTW, do you throttle your clients? We found that its absolutely
>> necessary, since without a throttle just a few active VMs can eat up the
>> entire iops capacity of the cluster.
>>
>
> +1
>
> I'd strongly advise to set I/O limits for Instances. I've had multiple
> occasions where a runaway script inside a VM was hammering on the
> underlying storage killing all I/O.
>
> Not only with Ceph, but over the many years I've worked with storage. I/O
> == expensive
>
> CloudStack supports I/O limiting, so I recommend you set a limit. Set it
> to 750 write IOps for example. That way one Instance can't kill the whole
> cluster, but it still has enough I/O to run. (usually).
>
> Wido
>
>
>> Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>> On 08 Aug 2014, at 13:51, Andrija Panic > <mailto:andrija.pa...@gmail.com>> wrote:
>>
>>  Hi,
>>>
>>> we just had some new clients, and have suffered very big degradation
>>> in CEPH performance for some reasons (we are using CloudStack).
>>>
>>> I'm wondering if there is way to monitor OP/s or similar usage by
>>> client connected, so we can isolate the heavy client ?
>>>
>>> Also, what is the general best practice to monitor these kind of
>>> changes in CEPH ? I'm talking about R/W or OP/s change or similar...
>>>
>>> Thanks,
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
Hi Dan,

the script provided does not seem to work on my ceph cluster :(
This is ceph version 0.80.3

I get empty results, on both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
.
.
.




On 8 August 2014 16:01, Dan Van Der Ster  wrote:

>  Hi,
>
>  On 08 Aug 2014, at 15:55, Andrija Panic  wrote:
>
>  Hi Dan,
>
>  thank you very much for the script, will check it out...no thortling so
> far, but I guess it will have to be done...
>
>  This seems to read only gziped logs?
>
>
>  Well it’s pretty simple, and it zcat’s each input file. So yes, only gz
> files in the current script. But you can change that pretty trivially ;)
>
>  so since read only I guess it is safe to run it on proudction cluster
> now… ?
>
>
>  I personally don’t do anything new on a Friday just before leaving ;)
>
>  But its just grepping the log files, so start with one, then two, then...
>
>   The script will also check for mulitply OSDs as far as I can
> understadn, not just osd.0 given in script comment ?
>
>
>  Yup, what I do is gather all of the OSD logs for a single day in a
> single directory (in CephFS ;), then run that script on all of the OSDs. It
> takes awhile, but it will give you the overall daily totals for the whole
> cluster.
>
>  If you are only trying to find the top users, then it is sufficient to
> check a subset of OSDs, since by their nature the client IOs are spread
> across most/all OSDs.
>
>  Cheers, Dan
>
>  Thanks a lot.
> Andrija
>
>
>
>
> On 8 August 2014 15:44, Dan Van Der Ster 
> wrote:
>
>> Hi,
>> Here’s what we do to identify our top RBD users.
>>
>>  First, enable log level 10 for the filestore so you can see all the IOs
>> coming from the VMs. Then use a script like this (used on a dumpling
>> cluster):
>>
>>
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>
>>  to summarize the osd logs and identify the top clients.
>>
>>  Then its just a matter of scripting to figure out the ops/sec per
>> volume, but for us at least the main use-case has been to identify who is
>> responsible for a new peak in overall ops — and daily-granular statistics
>> from the above script tends to suffice.
>>
>>  BTW, do you throttle your clients? We found that its absolutely
>> necessary, since without a throttle just a few active VMs can eat up the
>> entire iops capacity of the cluster.
>>
>>  Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>>   On 08 Aug 2014, at 13:51, Andrija Panic 
>> wrote:
>>
>>Hi,
>>
>>  we just had some new clients, and have suffered very big degradation in
>> CEPH performance for some reasons (we are using CloudStack).
>>
>>  I'm wondering if there is way to monitor OP/s or similar usage by
>> client connected, so we can isolate the heavy client ?
>>
>>  Also, what is the general best practice to monitor these kind of
>> changes in CEPH ? I'm talking about R/W or OP/s change or similar...
>>
>>  Thanks,
>> --
>>
>> Andrija Panić
>>
>>___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
>  --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>
>
>


-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
I apologize, I clicked the Send button too fast...

Anyway, I can see there are lines like this in the log file:
2014-08-11 12:43:25.477693 7f022d257700 10
filestore(/var/lib/ceph/osd/ceph-0) write
3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3
3641344~4608 = 4608
Not sure if I can do anything to fix this... ?
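
(In case it helps someone else: those write lines can be summed per image
prefix with a quick one-liner - a rough sketch that assumes the exact log
format above; "rbd info <pool>/<image>" then maps the block_name_prefix back
to an image name:)

zcat /var/log/ceph/ceph-osd.0.log-20140811.gz | awk '
  / write / && /rbd_data\./ {
    s = substr($0, index($0, "rbd_data."))   # rbd_data.<prefix>.<object>/...
    split(s, p, ".")                         # p[2] is the image prefix
    c["rbd_data." p[2]]++
  }
  END { for (k in c) print c[k], k }' | sort -rn | head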

Thanks,
Andrija



On 11 August 2014 12:46, Andrija Panic  wrote:

> Hi Dan,
>
> the script provided seems to not work on my ceph cluster :(
> This is ceph version 0.80.3
>
> I get empty results, on both debug level 10 and the maximum level of 20...
>
> [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
> Writes per OSD:
> Writes per pool:
> Writes per PG:
> Writes per RBD:
> Writes per object:
> Writes per length:
> .
> .
> .
>
>
>
>
> On 8 August 2014 16:01, Dan Van Der Ster 
> wrote:
>
>>  Hi,
>>
>>  On 08 Aug 2014, at 15:55, Andrija Panic  wrote:
>>
>>  Hi Dan,
>>
>>  thank you very much for the script, will check it out...no thortling so
>> far, but I guess it will have to be done...
>>
>>  This seems to read only gziped logs?
>>
>>
>>  Well it’s pretty simple, and it zcat’s each input file. So yes, only gz
>> files in the current script. But you can change that pretty trivially ;)
>>
>>  so since read only I guess it is safe to run it on proudction cluster
>> now… ?
>>
>>
>>  I personally don’t do anything new on a Friday just before leaving ;)
>>
>>  But its just grepping the log files, so start with one, then two,
>> then...
>>
>>   The script will also check for mulitply OSDs as far as I can
>> understadn, not just osd.0 given in script comment ?
>>
>>
>>  Yup, what I do is gather all of the OSD logs for a single day in a
>> single directory (in CephFS ;), then run that script on all of the OSDs. It
>> takes awhile, but it will give you the overall daily totals for the whole
>> cluster.
>>
>>  If you are only trying to find the top users, then it is sufficient to
>> check a subset of OSDs, since by their nature the client IOs are spread
>> across most/all OSDs.
>>
>>  Cheers, Dan
>>
>>  Thanks a lot.
>> Andrija
>>
>>
>>
>>
>> On 8 August 2014 15:44, Dan Van Der Ster 
>> wrote:
>>
>>> Hi,
>>> Here’s what we do to identify our top RBD users.
>>>
>>>  First, enable log level 10 for the filestore so you can see all the
>>> IOs coming from the VMs. Then use a script like this (used on a dumpling
>>> cluster):
>>>
>>>
>>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>>
>>>  to summarize the osd logs and identify the top clients.
>>>
>>>  Then its just a matter of scripting to figure out the ops/sec per
>>> volume, but for us at least the main use-case has been to identify who is
>>> responsible for a new peak in overall ops — and daily-granular statistics
>>> from the above script tends to suffice.
>>>
>>>  BTW, do you throttle your clients? We found that its absolutely
>>> necessary, since without a throttle just a few active VMs can eat up the
>>> entire iops capacity of the cluster.
>>>
>>>  Cheers, Dan
>>>
>>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>>
>>>
>>>   On 08 Aug 2014, at 13:51, Andrija Panic 
>>> wrote:
>>>
>>>Hi,
>>>
>>>  we just had some new clients, and have suffered very big degradation
>>> in CEPH performance for some reasons (we are using CloudStack).
>>>
>>>  I'm wondering if there is way to monitor OP/s or similar usage by
>>> client connected, so we can isolate the heavy client ?
>>>
>>>  Also, what is the general best practice to monitor these kind of
>>> changes in CEPH ? I'm talking about R/W or OP/s change or similar...
>>>
>>>  Thanks,
>>> --
>>>
>>> Andrija Panić
>>>
>>>___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>
>>
>>  --
>>
>> Andrija Panić
>> --
>>   http://admintweets.com
>> --
>>
>>
>>
>
>
> --
>
> Andrija Panić
> --
>   http://admintweets.com
> --
>



-- 

Andrija Panić
--
  http://admintweets.com
--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show IOps per VM/client to find heavy users...

2014-08-11 Thread Andrija Panic
That's better :D

Thanks a lot, now I will be able to troubleshoot my problem :)

Thanks Dan,
Andrija


On 11 August 2014 13:21, Dan Van Der Ster  wrote:

>  Hi,
> I changed the script to be a bit more flexible with the osd path. Give
> this a try again:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
> Cheers, Dan
>
> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>
>
>  On 11 Aug 2014, at 12:48, Andrija Panic  wrote:
>
>  I appologize, clicked the Send button to fast...
>
>  Anyway, I can see there are lines in log file:
> 2014-08-11 12:43:25.477693 7f022d257700 10
> filestore(/var/lib/ceph/osd/ceph-0) write
> 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3
> 3641344~4608 = 4608
>  Not sure if I can do anything to fix this... ?
>
>  Thanks,
> Andrija
>
>
>
> On 11 August 2014 12:46, Andrija Panic  wrote:
>
>> Hi Dan,
>>
>>  the script provided seems to not work on my ceph cluster :(
>> This is ceph version 0.80.3
>>
>>  I get empty results, on both debug level 10 and the maximum level of
>> 20...
>>
>>  [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
>> Writes per OSD:
>> Writes per pool:
>>  Writes per PG:
>>  Writes per RBD:
>>  Writes per object:
>>  Writes per length:
>>  .
>>  .
>> .
>>
>>
>>
>>
>> On 8 August 2014 16:01, Dan Van Der Ster 
>> wrote:
>>
>>> Hi,
>>>
>>>  On 08 Aug 2014, at 15:55, Andrija Panic 
>>> wrote:
>>>
>>>  Hi Dan,
>>>
>>>  thank you very much for the script, will check it out...no thortling
>>> so far, but I guess it will have to be done...
>>>
>>>  This seems to read only gziped logs?
>>>
>>>
>>>  Well it’s pretty simple, and it zcat’s each input file. So yes, only
>>> gz files in the current script. But you can change that pretty trivially ;)
>>>
>>>  so since read only I guess it is safe to run it on proudction cluster
>>> now… ?
>>>
>>>
>>>  I personally don’t do anything new on a Friday just before leaving ;)
>>>
>>>  But its just grepping the log files, so start with one, then two,
>>> then...
>>>
>>>   The script will also check for mulitply OSDs as far as I can
>>> understadn, not just osd.0 given in script comment ?
>>>
>>>
>>>  Yup, what I do is gather all of the OSD logs for a single day in a
>>> single directory (in CephFS ;), then run that script on all of the OSDs. It
>>> takes awhile, but it will give you the overall daily totals for the whole
>>> cluster.
>>>
>>>  If you are only trying to find the top users, then it is sufficient to
>>> check a subset of OSDs, since by their nature the client IOs are spread
>>> across most/all OSDs.
>>>
>>>  Cheers, Dan
>>>
>>>  Thanks a lot.
>>> Andrija
>>>
>>>
>>>
>>>
>>> On 8 August 2014 15:44, Dan Van Der Ster 
>>> wrote:
>>>
>>>> Hi,
>>>> Here’s what we do to identify our top RBD users.
>>>>
>>>>  First, enable log level 10 for the filestore so you can see all the
>>>> IOs coming from the VMs. Then use a script like this (used on a dumpling
>>>> cluster):
>>>>
>>>>
>>>> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>>>>
>>>>  to summarize the osd logs and identify the top clients.
>>>>
>>>>  Then its just a matter of scripting to figure out the ops/sec per
>>>> volume, but for us at least the main use-case has been to identify who is
>>>> responsible for a new peak in overall ops — and daily-granular statistics
>>>> from the above script tends to suffice.
>>>>
>>>>  BTW, do you throttle your clients? We found that its absolutely
>>>> necessary, since without a throttle just a few active VMs can eat up the
>>>> entire iops capacity of the cluster.
>>>>
>>>>  Cheers, Dan
>>>>
>>>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>>>
>>>>
>>>>   On 08 Aug 2014, at 13:51, Andrija Panic 
>>>> wrote:
>>>>
>>>>Hi,
>>>>
>>>>  we just had some new clients, and have suffered very big degradation
>>>

Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Andrija Panic
Hi,

small update:

in 3 months we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days
between each SSD death) - can't believe it - NOT due to wearing out... I
really hope we just got a defective series from the supplier...

Regards

On 18 April 2015 at 14:24, Andrija Panic  wrote:

> yes I know, but too late now, I'm afraid :)
>
> On 18 April 2015 at 14:18, Josef Johansson  wrote:
>
>> Have you looked into the samsung 845 dc? They are not that expensive last
>> time I checked.
>>
>> /Josef
>> On 18 Apr 2015 13:15, "Andrija Panic"  wrote:
>>
>>> might be true, yes - we had Intel 128GB (intel S3500 or S3700) - but
>>> these have horrible random/sequetial speeds - Samsun 850 PROs are 3 times
>>> at least faster on sequential, and more than 3 times faser on random/IOPS
>>> measures.
>>> And ofcourse modern enterprise drives = ...
>>>
>>> On 18 April 2015 at 12:42, Mark Kirkwood 
>>> wrote:
>>>
>>>> Yes, it sure is - my experience with 'consumer' SSD is that they die
>>>> with obscure firmware bugs (wrong capacity, zero capacity, not detected in
>>>> bios anymore) rather than flash wearout. It seems that the 'enterprise'
>>>> tagged drives are less inclined to suffer this fate.
>>>>
>>>> Regards
>>>>
>>>> Mark
>>>>
>>>> On 18/04/15 22:23, Andrija Panic wrote:
>>>>
>>>>> these 2 drives, are on the regular SATA (on board)controler, and beside
>>>>> this, there is 12 x 4TB on the fron of the servers - normal backplane
>>>>> on
>>>>> the front.
>>>>>
>>>>> Anyway, we are going to check those dead SSDs on a pc/laptop or so,just
>>>>> to confirm they are really dead - but this is the way they die, not
>>>>> wear
>>>>> out, but simply show different space instead of real one - thse were 3
>>>>> months old only when they died...
>>>>>
>>>>> On 18 April 2015 at 11:55, Josef Johansson >>>> <mailto:jose...@gmail.com>> wrote:
>>>>>
>>>>> If the same chassi/chip/backplane is behind both drives and maybe
>>>>> other drives in the chassi have troubles,it may be a defect there
>>>>> as
>>>>> well.
>>>>>
>>>>> On 18 Apr 2015 09:42, "Steffen W Sørensen" >>>> <mailto:ste...@me.com>> wrote:
>>>>>
>>>>>
>>>>>  > On 17/04/2015, at 21.07, Andrija Panic
>>>>> mailto:andrija.pa...@gmail.com>>
>>>>> wrote:
>>>>>  >
>>>>>  > nahSamsun 850 PRO 128GB - dead after 3months - 2 of
>>>>> these
>>>>> died... wearing level is 96%, so only 4% wasted... (yes I know
>>>>> these are not enterprise,etc… )
>>>>> Damn… but maybe your surname says it all - Don’t Panic :) But
>>>>> making sure same type of SSD devices ain’t of near same age and
>>>>> doing preventive replacement rotation might be good practice I
>>>>> guess.
>>>>>
>>>>> /Steffen
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Andrija Panić
>>>>>
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replace dead SSD journal

2015-05-06 Thread Andrija Panic
Well, seems like they are on satellite :)

On 6 May 2015 at 02:58, Matthew Monaco  wrote:

> On 05/05/2015 08:55 AM, Andrija Panic wrote:
> > Hi,
> >
> > small update:
> >
> > in 3 months - we lost 5 out of 6 Samsung 128Gb 850 PROs (just few days in
> > between of each SSD death) - cant believe it - NOT due to wearing out...
> I
> > really hope we got efective series from suplier...
> >
>
> That's ridiculous. Are these drives mounted un-shielded on a satellite? I
> didn't
> know the ISS had a ceph cluster.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Andrija Panic
Guys,

I'm Igor's colleague, working a bit on CEPH together with Igor.

This is a production cluster, and we are becoming more desperate as time goes
by.

I'm not sure if this is the appropriate place to seek commercial support, but
anyhow, here it is...

If anyone has experience with this particular kind of PG troubleshooting, we
are also ready to pay for commercial support to solve our issue - company or
individual, it doesn't matter.


Thanks,
Andrija

On 20 August 2015 at 19:07, Voloshanenko Igor 
wrote:

> Inktank:
>
> https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
>
> Mail-list:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
>
> 2015-08-20 20:06 GMT+03:00 Samuel Just :
>
>> Which docs?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
>>  wrote:
>> > Not yet. I will create.
>> > But according to mail lists and Inktank docs - it's expected behaviour
>> when
>> > cache enable
>> >
>> > 2015-08-20 19:56 GMT+03:00 Samuel Just :
>> >>
>> >> Is there a bug for this in the tracker?
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
>> >>  wrote:
>> >> > Issue, that in forward mode, fstrim doesn't work proper, and when we
>> >> > take
>> >> > snapshot - data not proper update in cache layer, and client (ceph)
>> see
>> >> > damaged snap.. As headers requested from cache layer.
>> >> >
>> >> > 2015-08-20 19:53 GMT+03:00 Samuel Just :
>> >> >>
>> >> >> What was the issue?
>> >> >> -Sam
>> >> >>
>> >> >> On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
>> >> >>  wrote:
>> >> >> > Samuel, we turned off cache layer few hours ago...
>> >> >> > I will post ceph.log in few minutes
>> >> >> >
>> >> >> > For snap - we found issue, was connected with cache tier..
>> >> >> >
>> >> >> > 2015-08-20 19:23 GMT+03:00 Samuel Just :
>> >> >> >>
>> >> >> >> Ok, you appear to be using a replicated cache tier in front of a
>> >> >> >> replicated base tier.  Please scrub both inconsistent pgs and
>> post
>> >> >> >> the
>> >> >> >> ceph.log from before when you started the scrub until after.
>> Also,
>> >> >> >> what command are you using to take snapshots?
>> >> >> >> -Sam
>> >> >> >>
>> >> >> >> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
>> >> >> >>  wrote:
>> >> >> >> > Hi Samuel, we try to fix it in trick way.
>> >> >> >> >
>> >> >> >> > we check all rbd_data chunks from logs (OSD) which are
>> affected,
>> >> >> >> > then
>> >> >> >> > query
>> >> >> >> > rbd info to compare which rbd consist bad rbd_data, after that
>> we
>> >> >> >> > mount
>> >> >> >> > this
>> >> >> >> > rbd as rbd0, create empty rbd, and DD all info from bad volume
>> to
>> >> >> >> > new
>> >> >> >> > one.
>> >> >> >> >
>> >> >> >> > But after that - scrub errors growing... Was 15 errors.. .Now
>> >> >> >> > 35...
>> >> >> >> > We
>> >> >> >> > laos
>> >> >> >> > try to out OSD which was lead, but after rebalancing this 2 pgs
>> >> >> >> > still
>> >> >> >> > have
>> >> >> >> > 35 scrub errors...
>> >> >> >> >
>> >> >> >> > ceph osd getmap -o  - attached
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2015-08-18 18:48 GMT+03:00 Samuel Just :
>> >> >> >> >>
>> >> >> >> >> Is the number of inconsistent objects growing?  Can you attach
>> >> >> >> >> the
>> >> >> >> >> whole ceph.log from the 6 hours before and after the snippet
>> you
>> >> >> >> >> linked above?  Are you using cache/tiering?  Can you attach
>> the
>> >> >> >> >> osdmap
>> >> >> >> >> (ceph osd getmap -o )?
>> >> >> >> >> -Sam
>> >> >> >> >>
>> >> >> >> >> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>> >> >> >> >>  wrote:
>> >> >> >> >> > ceph - 0.94.2
>> >> >> >> >> > Its happen during rebalancing
>> >> >> >> >> >
>> >> >> >> >> > I thought too, that some OSD miss copy, but looks like all
>> >> >> >> >> > miss...
>> >> >> >> >> > So any advice in which direction i need to go
>> >> >> >> >> >
>> >> >> >> >> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <
>> gfar...@redhat.com>:
>> >> >> >> >> >>
>> >> >> >> >> >> From a quick peek it looks like some of the OSDs are
>> missing
>> >> >> >> >> >> clones
>> >> >> >> >> >> of
>> >> >> >> >> >> objects. I'm not sure how that could happen and I'd expect
>> the
>> >> >> >> >> >> pg
>> >> >> >> >> >> repair to handle that but if it's not there's probably
>> >> >> >> >> >> something
>> >> >> >> >> >> wrong; what version of Ceph are you running? Sam, is this
>> >> >> >> >> >> something
>> >> >> >> >> >> you've seen, a new bug, or some kind of config issue?
>> >> >> >> >> >> -Greg
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>> >> >> >> >> >>  wrote:
>> >> >> >> >> >> > Hi all, at our production cluster, due high rebalancing
>> (((
>> >> >> >> >> >> > we
>> >> >> >> >> >> > have 2
>> >> >> >> >> >> > pgs in
>> >> >> >> >> >> > inconsistent state...
>> >> >> >> >> >> >
>> >> >> >> >> >> > root@temp:~# ceph health detail | grep inc
>> >> >> >> >> >> > HEALTH_ERR 2 pgs inconsistent; 18

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Andrija Panic
For the sake of closing the thread: this was related to the caching layer,
which doesn't support snapshotting, per the docs...
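
(For the archive, a typical sequence for draining and detaching a cache tier -
not taken from this thread, just a hedged sketch; "cold-storage" is the base
pool from the thread and "hot-cache" stands in for whatever the cache pool is
actually called:)

ceph osd tier cache-mode hot-cache forward     # stop caching new writes
rados -p hot-cache cache-flush-evict-all       # flush/evict everything down to the base tier
ceph osd tier remove-overlay cold-storage      # stop routing client IO through the cache
ceph osd tier remove cold-storage hot-cache    # detach the cache pool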

On 17 August 2015 at 21:15, Voloshanenko Igor 
wrote:

> Hi all, can you please help me with unexplained situation...
>
> All snapshot inside ceph broken...
>
> So, as example, we have VM template, as rbd inside ceph.
> We can map it and mount to check that all ok with it
>
> root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
> /dev/rbd0
> root@test:~# parted /dev/rbd0 print
> Model: Unknown (unknown)
> Disk /dev/rbd0: 10.7GB
> Sector size (logical/physical): 512B/512B
> Partition Table: msdos
>
> Number  Start   End SizeType File system  Flags
>  1  1049kB  525MB   524MB   primary  ext4 boot
>  2  525MB   10.7GB  10.2GB  primary   lvm
>
> Than i want to create snap, so i do:
> root@test:~# rbd snap create
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>
> And now i want to map it:
>
> root@test:~# rbd map
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> /dev/rbd1
> root@test:~# parted /dev/rbd1 print
> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>  /dev/rbd1 has been opened read-only.
> Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
>  /dev/rbd1 has been opened read-only.
> Error: /dev/rbd1: unrecognised disk label
>
> Even md5 different...
> root@ix-s2:~# md5sum /dev/rbd0
> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
> root@ix-s2:~# md5sum /dev/rbd1
> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>
>
> Ok, now i protect snap and create clone... but same thing...
> md5 for clone same as for snap,,
>
> root@test:~# rbd unmap /dev/rbd1
> root@test:~# rbd snap protect
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> root@test:~# rbd clone
> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
> cold-storage/test-image
> root@test:~# rbd map cold-storage/test-image
> /dev/rbd1
> root@test:~# md5sum /dev/rbd1
> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>
>  but it's broken...
> root@test:~# parted /dev/rbd1 print
> Error: /dev/rbd1: unrecognised disk label
>
>
> =
>
> tech details:
>
> root@test:~# ceph -v
> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>
> We have 2 inconstistent pgs, but all images not placed on this pgs...
>
> root@test:~# ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
> 18 scrub errors
>
> 
>
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
> ([37,15,14], p37) acting ([37,15,14], p37)
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) ->
> up ([12,23,17], p12) acting ([12,23,17], p12)
> root@test:~# ceph osd map cold-storage
> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
> osdmap e16770 pool 'cold-storage' (2) object
> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9
> (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12)
>
>
> Also we use cache layer, which in current moment - in forward mode...
>
> Can you please help me with this.. As my brain stop to understand what is
> going on...
>
> Thank in advance!
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
Make sure you test what ever you decide. We just learned this the hard way
with samsung 850 pro, which is total crap, more than you could imagine...

Andrija
On Aug 25, 2015 11:25 AM, "Jan Schermer"  wrote:

> I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
> Very cheap, better than Intel 3610 for sure (and I think it beats even
> 3700).
>
> Jan
>
> > On 25 Aug 2015, at 11:23, Christopher Kunz 
> wrote:
> >
> > Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator:
> >> Hi,
> >>
> >> most of the times I do get the recommendation from resellers to go with
> >> the intel s3700 for the journalling.
> >>
> > Check out the Intel s3610. 3 drive writes per day for 5 years. Plus, it
> > is cheaper than S3700.
> >
> > Regards,
> >
> > --ck
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
First read please:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

We are getting 200 IOPS, in comparison to the Intel S3500's 18,000 IOPS -
those are constant performance numbers, meaning avoiding the drive's cache
and running for a longer period of time...
Also, if checking with fio you will get better latencies on the Intel S3500
(the model tested in our case) along with 20x better IOPS results...

We observed the original issue as high speed at the beginning of, e.g., a file
transfer inside a VM, which then halts to zero... We moved the journals back
to HDDs and performance was acceptable... now we are upgrading to Intel S3500s...
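
(From memory, the test in Sebastien's post boils down to a single-job O_DSYNC
4k write run, roughly as below - double-check against the post, and note that
it writes straight to the device, destroying whatever is on /dev/sdX:)

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
    --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test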

Best
any details on that ?

On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
 wrote:

> Make sure you test what ever you decide. We just learned this the hard way
> with samsung 850 pro, which is total crap, more than you could imagine...
>
> Andrija
> On Aug 25, 2015 11:25 AM, "Jan Schermer"  wrote:
>
> > I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
> > Very cheap, better than Intel 3610 for sure (and I think it beats even
> > 3700).
> >
> > Jan
> >
> > > On 25 Aug 2015, at 11:23, Christopher Kunz 
> > wrote:
> > >
> > > Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator:
> > >> Hi,
> > >>
> > >> most of the times I do get the recommendation from resellers to go
with
> > >> the intel s3700 for the journalling.
> > >>
> > > Check out the Intel s3610. 3 drive writes per day for 5 years. Plus,
it
> > > is cheaper than S3700.
> > >
> > > Regards,
> > >
> > > --ck
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >



--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com
<mailto:mariusz.gronczew...@efigence.com>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
And should I mention that in another CEPH installation we had samsung 850
pro 128GB and all of 6 ssds died in 2 month period - simply disappear from
the system, so not wear out...

Never again we buy Samsung :)
On Aug 25, 2015 11:57 AM, "Andrija Panic"  wrote:

> First read please:
>
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
> We are getting 200 IOPS in comparison to Intels3500 18.000 iops - those
> are  constant performance numbers, meaning avoiding drives cache and
> running for longer period of time...
> Also if checking with FIO you will get better latencies on intel s3500
> (model tested in our case) along with 20X better IOPS results...
>
> We observed original issue by having high speed at begining of i.e. file
> transfer inside VM, which than halts to zero... We moved journals back to
> HDDs and performans was acceptable...no we are upgrading to intel S3500...
>
> Best
> any details on that ?
>
> On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
>  wrote:
>
> > Make sure you test what ever you decide. We just learned this the hard
> way
> > with samsung 850 pro, which is total crap, more than you could imagine...
> >
> > Andrija
> > On Aug 25, 2015 11:25 AM, "Jan Schermer"  wrote:
> >
> > > I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
> > > Very cheap, better than Intel 3610 for sure (and I think it beats even
> > > 3700).
> > >
> > > Jan
> > >
> > > > On 25 Aug 2015, at 11:23, Christopher Kunz 
> > > wrote:
> > > >
> > > > Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator:
> > > >> Hi,
> > > >>
> > > >> most of the times I do get the recommendation from resellers to go
> with
> > > >> the intel s3700 for the journalling.
> > > >>
> > > > Check out the Intel s3610. 3 drive writes per day for 5 years. Plus,
> it
> > > > is cheaper than S3700.
> > > >
> > > > Regards,
> > > >
> > > > --ck
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
>
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczew...@efigence.com
> <mailto:mariusz.gronczew...@efigence.com>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Andrija Panic
We have some 850 pro 256gb ssds if anyone interested to buy:)

And also there was new 850 pro firmware that broke peoples disk which was
revoked later etc... I'm sticking with only vacuum cleaners from Samsung
for now, maybe... :)
On Aug 25, 2015 12:02 PM, "Voloshanenko Igor" 
wrote:

> To be honest, Samsung 850 PRO not 24/7 series... it's something about
> desktop+ series, but anyway - results from this drives - very very bad in
> any scenario acceptable by real life...
>
> Possible 845 PRO more better, but we don't want to experiment anymore...
> So we choose S3500 240G. Yes, it's cheaper than S3700 (about 2x times), and
> no so durable for writes, but we think more better to replace 1 ssd per 1
> year than to pay double price now.
>
> 2015-08-25 12:59 GMT+03:00 Andrija Panic :
>
>> And should I mention that in another CEPH installation we had samsung 850
>> pro 128GB and all of 6 ssds died in 2 month period - simply disappear from
>> the system, so not wear out...
>>
>> Never again we buy Samsung :)
>> On Aug 25, 2015 11:57 AM, "Andrija Panic" 
>> wrote:
>>
>>> First read please:
>>>
>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>
>>> We are getting 200 IOPS in comparison to Intels3500 18.000 iops - those
>>> are  constant performance numbers, meaning avoiding drives cache and
>>> running for longer period of time...
>>> Also if checking with FIO you will get better latencies on intel s3500
>>> (model tested in our case) along with 20X better IOPS results...
>>>
>>> We observed original issue by having high speed at begining of i.e. file
>>> transfer inside VM, which than halts to zero... We moved journals back to
>>> HDDs and performans was acceptable...no we are upgrading to intel S3500...
>>>
>>> Best
>>> any details on that ?
>>>
>>> On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
>>>  wrote:
>>>
>>> > Make sure you test what ever you decide. We just learned this the hard
>>> way
>>> > with samsung 850 pro, which is total crap, more than you could
>>> imagine...
>>> >
>>> > Andrija
>>> > On Aug 25, 2015 11:25 AM, "Jan Schermer"  wrote:
>>> >
>>> > > I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
>>> > > Very cheap, better than Intel 3610 for sure (and I think it beats
>>> even
>>> > > 3700).
>>> > >
>>> > > Jan
>>> > >
>>> > > > On 25 Aug 2015, at 11:23, Christopher Kunz 
>>> > > wrote:
>>> > > >
>>> > > > Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator:
>>> > > >> Hi,
>>> > > >>
>>> > > >> most of the times I do get the recommendation from resellers to
>>> go with
>>> > > >> the intel s3700 for the journalling.
>>> > > >>
>>> > > > Check out the Intel s3610. 3 drive writes per day for 5 years.
>>> Plus, it
>>> > > > is cheaper than S3700.
>>> > > >
>>> > > > Regards,
>>> > > >
>>> > > > --ck
>>> > > > ___
>>> > > > ceph-users mailing list
>>> > > > ceph-users@lists.ceph.com
>>> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> > >
>>> > > ___
>>> > > ceph-users mailing list
>>> > > ceph-users@lists.ceph.com
>>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> > >
>>>
>>>
>>>
>>> --
>>> Mariusz Gronczewski, Administrator
>>>
>>> Efigence S. A.
>>> ul. Wołoska 9a, 02-583 Warszawa
>>> T: [+48] 22 380 13 13
>>> F: [+48] 22 380 13 14
>>> E: mariusz.gronczew...@efigence.com
>>> <mailto:mariusz.gronczew...@efigence.com>
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
Hi people,

I had one OSD crash, so the rebalancing happened - all fine (some 3% of the
data was moved around and rebalanced), my previous recovery/backfill
throttling was applied fine, and we didn't have an unusable cluster.

Now I used the procedure to remove this crashed OSD completely from CEPH
(
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
)
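
(i.e., with the dead OSD already stopped and marked out, roughly these
steps from that page:

ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0
)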

and when I used the "ceph osd crush remove osd.0" command, all of a sudden
CEPH started to rebalance once again, this time with 37% of the objects
"misplaced" - and based on the experience inside the VMs and the recovery
rate in MB/s, I can tell that my throttling of backfilling and recovery is
not being taken into consideration.

Why is this - 37% of all objects being moved around again? Any help, hint
or explanation greatly appreciated.

This is CEPH 0.87.0 from the CEPH repo, of course. 42 OSDs total after the
crash etc.

The throttling that I have applied from before is the following:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
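
(I assume the quickest way to double-check that these injected values
actually took effect on an OSD is something like the following, run on the
OSD node - <id> is a placeholder:

ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok config show | grep -E 'max_backfills|recovery_max_active|recovery_op_priority'
)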

Please advise...
Thanks

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
OK,
thx Wido.

Then can we at least update the documentation so that it says MAJOR data
rebalancing will happen AGAIN - not 3%, but 37% in my case.
Because I would never run this during work hours, while clients are
hammering the VMs...

This reminds me of those tunable changes a couple of months ago, when my
cluster completely collapsed during data rebalancing...

I don't see any option to contribute to the documentation?

Best




On 2 March 2015 at 16:07, Wido den Hollander  wrote:

> On 03/02/2015 03:56 PM, Andrija Panic wrote:
> > Hi people,
> >
> > I had one OSD crash, so the rebalancing happened - all fine (some 3% of
> the
> > data has been moved arround, and rebalanced) and my previous
> > recovery/backfill throtling was applied fine and we didnt have a unusable
> > cluster.
> >
> > Now I used the procedure to remove this crashed OSD comletely from the
> CEPH
> > (
> >
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
> > )
> >
> > and when I used the "ceph osd crush remove osd.0" command, all of a
> sudden,
> > CEPH started to rebalance once again, this time with 37% of the object
> that
> > are "missplaced" and based on the eperience inside VMs, and the Recovery
> > RAte in MB/s - I can tell that my throtling of backfilling and recovery
> is
> > not taken into consideration.
> >
> > Why is this, 37% of all objects again being moved arround, any help,
> hint,
> > explanation greatly appreciated.
> >
>
> This has been discussed a couple of times on the list. If you remove a
> item from the CRUSHMap, although it has a weight of 0, a rebalance still
> happens since the CRUSHMap changes.
>
> > This is CEPH 0.87.0 from CEPH repo of course. 42 OSD total after the
> crash
> > etc.
> >
> > The throtling that I have applied from before is like folowing:
> >
> > ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> > ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
> > ceph tell osd.* injectargs '--osd_max_backfills 1'
> >
> > Please advise...
> > Thanks
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi guys,

Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused
over 37% of the data to rebalance - let's say this is fine (this happened
when I removed it from the CRUSH map).

I'm wondering - I have previously set some throttling mechanisms, but
during the first 1h of rebalancing my recovery rate was going up to
1500 MB/s and the VMs were completely unusable; during the last 4h of the
recovery the rate went down to, say, 100-200 MB/s, and VM performance was
still pretty impacted, but at least I could work more or less.

So my question: is this behaviour expected, and is the throttling working
as expected here? During the first 1h almost no throttling seemed to be
applied, judging by the 1500 MB/s recovery rate and the impact on the VMs,
while the last 4h seemed pretty fine (although still a lot of impact in
general).

I changed these throttling settings on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
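
(If I decide to keep these values, I assume the persistent equivalent in
ceph.conf on the OSD nodes would be roughly:

[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
)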

My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and
6 journals on another SSD) - I have 3 of these hosts.

Any thoughts are welcome.
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thanks Irek.

Does this mean that after peering for each PG there will be a delay of
10 sec, meaning that every once in a while I will have 10 sec of the
cluster NOT being stressed/overloaded, then the recovery takes place for
that PG, then for another 10 sec the cluster is fine, and then it is
stressed again?

I'm trying to understand the process before actually doing stuff (the
config reference is there on ceph.com but I don't fully understand the
process).
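
(If I understood you correctly, I would set it with something along these
lines - on the fly and persistently in ceph.conf:

ceph tell osd.* injectargs '--osd_recovery_delay_start 10'

[osd]
osd recovery delay start = 10
)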

Thanks,
Andrija

On 3 March 2015 at 11:32, Irek Fasikhov  wrote:

> Hi.
>
> Use value "osd_recovery_delay_start"
> example:
> [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
> config show  | grep osd_recovery_delay_start
>   "osd_recovery_delay_start": "10"
>
> 2015-03-03 13:13 GMT+03:00 Andrija Panic :
>
>> HI Guys,
>>
>> I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
>> over 37% od the data to rebalance - let's say this is fine (this is when I
>> removed it frm Crush Map).
>>
>> I'm wondering - I have previously set some throtling mechanism, but
>> during first 1h of rebalancing, my rate of recovery was going up to 1500
>> MB/s - and VMs were unusable completely, and then last 4h of the duration
>> of recover this recovery rate went down to, say, 100-200 MB.s and during
>> this VM performance was still pretty impacted, but at least I could work
>> more or a less
>>
>> So my question, is this behaviour expected, is throtling here working as
>> expected, since first 1h was almoust no throtling applied if I check the
>> recovery rate 1500MB/s and the impact on Vms.
>> And last 4h seemed pretty fine (although still lot of impact in general)
>>
>> I changed these throtling on the fly with:
>>
>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>> ceph tell osd.* injectargs '--osd_max_backfills 1'
>>
>> My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
>> SSD, 6 journals on another SSD)  - I have 3 of these hosts.
>>
>> Any thought are welcome.
>> --
>>
>> Andrija Panić
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> С уважением, Фасихов Ирек Нургаязович
> Моб.: +79229045757
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Another question - I mentioned here 37% of objects being moved around -
these are MISPLACED objects (degraded objects were 0.001%) after I removed
1 OSD from the CRUSH map (out of 44 OSDs or so).

Can anybody confirm this is normal behaviour - and are there any
workarounds?

I understand this is because of CEPH's object placement algorithm, but
still, 37% of objects misplaced just by removing 1 OSD out of 44 from the
CRUSH map makes me wonder why the percentage is this large.

It seems not good to me, and I have to remove another 7 OSDs (we are
demoting some old hardware nodes). This means I could potentially end up
with 7 x the same number of misplaced objects...?

Any thoughts ?

Thanks

On 3 March 2015 at 12:14, Andrija Panic  wrote:

> Thanks Irek.
>
> Does this mean, that after peering for each PG, there will be delay of
> 10sec, meaning that every once in a while, I will have 10sec od the cluster
> NOT being stressed/overloaded, and then the recovery takes place for that
> PG, and then another 10sec cluster is fine, and then stressed again ?
>
> I'm trying to understand process before actually doing stuff (config
> reference is there on ceph.com but I don't fully understand the process)
>
> Thanks,
> Andrija
>
> On 3 March 2015 at 11:32, Irek Fasikhov  wrote:
>
>> Hi.
>>
>> Use value "osd_recovery_delay_start"
>> example:
>> [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
>> config show  | grep osd_recovery_delay_start
>>   "osd_recovery_delay_start": "10"
>>
>> 2015-03-03 13:13 GMT+03:00 Andrija Panic :
>>
>>> HI Guys,
>>>
>>> I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
>>> over 37% od the data to rebalance - let's say this is fine (this is when I
>>> removed it frm Crush Map).
>>>
>>> I'm wondering - I have previously set some throtling mechanism, but
>>> during first 1h of rebalancing, my rate of recovery was going up to 1500
>>> MB/s - and VMs were unusable completely, and then last 4h of the duration
>>> of recover this recovery rate went down to, say, 100-200 MB.s and during
>>> this VM performance was still pretty impacted, but at least I could work
>>> more or a less
>>>
>>> So my question, is this behaviour expected, is throtling here working as
>>> expected, since first 1h was almoust no throtling applied if I check the
>>> recovery rate 1500MB/s and the impact on Vms.
>>> And last 4h seemed pretty fine (although still lot of impact in general)
>>>
>>> I changed these throtling on the fly with:
>>>
>>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>>> ceph tell osd.* injectargs '--osd_max_backfills 1'
>>>
>>> My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
>>> SSD, 6 journals on another SSD)  - I have 3 of these hosts.
>>>
>>> Any thought are welcome.
>>> --
>>>
>>> Andrija Panić
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> С уважением, Фасихов Ирек Нургаязович
>> Моб.: +79229045757
>>
>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi Irek,

yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data
degraded and moved/recovered.
When I afterwards removed it from the CRUSH map with "ceph osd crush rm id",
that's when the stuff with 37% happened.

And thanks Irek for the help - could you kindly just let me know the
preferred steps when removing a whole node?
Do you mean I should first stop all OSDs again, or just remove each OSD
from the CRUSH map, or perhaps just decompile the CRUSH map, delete the
node completely, compile it back in, and let it heal/recover?
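
(By decompile/compile I mean the usual crushtool round-trip, something
like this - file names are just placeholders:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: delete the host bucket and its OSD entries
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
)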

Do you think this would result in less data being misplaced and moved
around?

Sorry for bugging you, I really appreciate your help.

Thanks

On 3 March 2015 at 12:58, Irek Fasikhov  wrote:

> A large percentage of the rebuild of the cluster map (But low percentage
> degradation). If you had not made "ceph osd crush rm id", the percentage
> would be low.
> In your case, the correct option is to remove the entire node, rather than
> each disk individually
>
> 2015-03-03 14:27 GMT+03:00 Andrija Panic :
>
>> Another question - I mentioned here 37% of objects being moved arround -
>> this is MISPLACED object (degraded objects were 0.001%, after I removed 1
>> OSD from cursh map (out of 44 OSD or so).
>>
>> Can anybody confirm this is normal behaviour - and are there any
>> workarrounds ?
>>
>> I understand this is because of the object placement algorithm of CEPH,
>> but still 37% of object missplaces just by removing 1 OSD from crush maps
>> out of 44 make me wonder why this large percentage ?
>>
>> Seems not good to me, and I have to remove another 7 OSDs (we are
>> demoting some old hardware nodes). This means I can potentialy go with 7 x
>> the same number of missplaced objects...?
>>
>> Any thoughts ?
>>
>> Thanks
>>
>> On 3 March 2015 at 12:14, Andrija Panic  wrote:
>>
>>> Thanks Irek.
>>>
>>> Does this mean, that after peering for each PG, there will be delay of
>>> 10sec, meaning that every once in a while, I will have 10sec od the cluster
>>> NOT being stressed/overloaded, and then the recovery takes place for that
>>> PG, and then another 10sec cluster is fine, and then stressed again ?
>>>
>>> I'm trying to understand process before actually doing stuff (config
>>> reference is there on ceph.com but I don't fully understand the process)
>>>
>>> Thanks,
>>> Andrija
>>>
>>> On 3 March 2015 at 11:32, Irek Fasikhov  wrote:
>>>
>>>> Hi.
>>>>
>>>> Use value "osd_recovery_delay_start"
>>>> example:
>>>> [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
>>>> config show  | grep osd_recovery_delay_start
>>>>   "osd_recovery_delay_start": "10"
>>>>
>>>> 2015-03-03 13:13 GMT+03:00 Andrija Panic :
>>>>
>>>>> HI Guys,
>>>>>
>>>>> I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
>>>>> over 37% od the data to rebalance - let's say this is fine (this is when I
>>>>> removed it frm Crush Map).
>>>>>
>>>>> I'm wondering - I have previously set some throtling mechanism, but
>>>>> during first 1h of rebalancing, my rate of recovery was going up to 1500
>>>>> MB/s - and VMs were unusable completely, and then last 4h of the duration
>>>>> of recover this recovery rate went down to, say, 100-200 MB.s and during
>>>>> this VM performance was still pretty impacted, but at least I could work
>>>>> more or a less
>>>>>
>>>>> So my question, is this behaviour expected, is throtling here working
>>>>> as expected, since first 1h was almoust no throtling applied if I check 
>>>>> the
>>>>> recovery rate 1500MB/s and the impact on Vms.
>>>>> And last 4h seemed pretty fine (although still lot of impact in
>>>>> general)
>>>>>
>>>>> I changed these throtling on the fly with:
>>>>>
>>>>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>>>>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>>>>> ceph tell osd.* injectargs '--osd_max_backfills 1'
>>>>>
>>>>> My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
>>>>> SSD, 6 journals on another SSD)  - I have 3 of these hosts.
>>>>>
>>>>> Any thought are welcome.
>>>>> --
>>>>>
>>>>> Andrija Panić
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> С уважением, Фасихов Ирек Нургаязович
>>>> Моб.: +79229045757
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
>
>
> --
> С уважением, Фасихов Ирек Нургаязович
> Моб.: +79229045757
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thx Irek. Number of replicas is 3.

I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already
decommissioned), which is further connected to a new 10G switch/network
with 3 servers on it with 12 OSDs each.
I'm decommissioning the old 3 nodes on the 1G network...

So you suggest removing the whole node with its 2 OSDs manually from the
CRUSH map? To my knowledge, Ceph never places 2 replicas on 1 node; all 3
replicas were originally distributed over all 3 nodes. So it should anyway
be safe to remove 2 OSDs at once together with the node itself... since
the replica count is 3...
?
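
(For what it's worth, I assume the replica count and the failure domain
can be double-checked with something like the following - the grep pattern
may differ per version:

ceph osd dump | grep 'replicated size'
ceph osd crush rule dump
)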

Thx again for your time
On Mar 3, 2015 1:35 PM, "Irek Fasikhov"  wrote:

> Once you have only three nodes in the cluster.
> I recommend you add new nodes to the cluster, and then delete the old.
>
> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov :
>
>> You have a number of replication?
>>
>> 2015-03-03 15:14 GMT+03:00 Andrija Panic :
>>
>>> Hi Irek,
>>>
>>> yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
>>> degraded and moved/recovered.
>>> When I after that removed it from Crush map "ceph osd crush rm id",
>>> that's when the stuff with 37% happened.
>>>
>>> And thanks Irek for help - could you kindly just let me know of the
>>> prefered steps when removing whole node?
>>> Do you mean I first stop all OSDs again, or just remove each OSD from
>>> crush map, or perhaps, just decompile cursh map, delete the node
>>> completely, compile back in, and let it heal/recover ?
>>>
>>> Do you think this would result in less data missplaces and moved arround
>>> ?
>>>
>>> Sorry for bugging you, I really appreaciate your help.
>>>
>>> Thanks
>>>
>>> On 3 March 2015 at 12:58, Irek Fasikhov  wrote:
>>>
>>>> A large percentage of the rebuild of the cluster map (But low
>>>> percentage degradation). If you had not made "ceph osd crush rm id", the
>>>> percentage would be low.
>>>> In your case, the correct option is to remove the entire node, rather
>>>> than each disk individually
>>>>
>>>> 2015-03-03 14:27 GMT+03:00 Andrija Panic :
>>>>
>>>>> Another question - I mentioned here 37% of objects being moved arround
>>>>> - this is MISPLACED object (degraded objects were 0.001%, after I removed 
>>>>> 1
>>>>> OSD from cursh map (out of 44 OSD or so).
>>>>>
>>>>> Can anybody confirm this is normal behaviour - and are there any
>>>>> workarrounds ?
>>>>>
>>>>> I understand this is because of the object placement algorithm of
>>>>> CEPH, but still 37% of object missplaces just by removing 1 OSD from crush
>>>>> maps out of 44 make me wonder why this large percentage ?
>>>>>
>>>>> Seems not good to me, and I have to remove another 7 OSDs (we are
>>>>> demoting some old hardware nodes). This means I can potentialy go with 7 x
>>>>> the same number of missplaced objects...?
>>>>>
>>>>> Any thoughts ?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 3 March 2015 at 12:14, Andrija Panic 
>>>>> wrote:
>>>>>
>>>>>> Thanks Irek.
>>>>>>
>>>>>> Does this mean, that after peering for each PG, there will be delay
>>>>>> of 10sec, meaning that every once in a while, I will have 10sec od the
>>>>>> cluster NOT being stressed/overloaded, and then the recovery takes place
>>>>>> for that PG, and then another 10sec cluster is fine, and then stressed
>>>>>> again ?
>>>>>>
>>>>>> I'm trying to understand process before actually doing stuff (config
>>>>>> reference is there on ceph.com but I don't fully understand the
>>>>>> process)
>>>>>>
>>>>>> Thanks,
>>>>>> Andrija
>>>>>>
>>>>>> On 3 March 2015 at 11:32, Irek Fasikhov  wrote:
>>>>>>
>>>>>>> Hi.
>>>>>>>
>>>>>>> Use value "osd_recovery_delay_start"
>>>>>>> example:
>>>>>>> [root@ceph08 ceph]# ceph --admin-daemon
>>>>>>> /var/run/ceph/ceph-osd.94.asok config show  | grep 
>>>>>>> osd_recovery_delay_start
>>>>>>>   "osd_recovery_delay_

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the
CRUSH map, whether that will cause more than 37% of the data to be moved
(80% or whatever).

I'm also wondering if the throttling that I applied is fine or not - I will
introduce osd_recovery_delay_start = 10 sec as Irek said.

I'm just wondering how big the performance impact will be, because:
- when stopping an OSD, the impact while backfilling was fine more or less -
I can live with this
- when I removed an OSD from the CRUSH map - for the first 1h or so the
impact was tremendous, and later on during the recovery process the impact
was much less but still noticeable...

Thanks for the tip of course !
Andrija

On 3 March 2015 at 18:34, Robert LeBlanc  wrote:

> I would be inclined to shut down both OSDs in a node, let the cluster
> recover. Once it is recovered, shut down the next two, let it recover.
> Repeat until all the OSDs are taken out of the cluster. Then I would
> set nobackfill and norecover. Then remove the hosts/disks from the
> CRUSH then unset nobackfill and norecover.
>
> That should give you a few small changes (when you shut down OSDs) and
> then one big one to get everything in the final place. If you are
> still adding new nodes, when nobackfill and norecover is set, you can
> add them in so that the one big relocate fills the new drives too.
>
> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic 
> wrote:
> > Thx Irek. Number of replicas is 3.
> >
> > I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
> > decommissioned), which is further connected to a new 10G switch/network
> with
> > 3 servers on it with 12 OSDs each.
> > I'm decommissioning old 3 nodes on 1G network...
> >
> > So you suggest removing whole node with 2 OSDs manually from crush map?
> > Per my knowledge, ceph never places 2 replicas on 1 node, all 3 replicas
> > were originally been distributed over all 3 nodes. So anyway It could be
> > safe to remove 2 OSDs at once together with the node itself...since
> replica
> > count is 3...
> > ?
> >
> > Thx again for your time
> >
> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov"  wrote:
> >>
> >> Once you have only three nodes in the cluster.
> >> I recommend you add new nodes to the cluster, and then delete the old.
> >>
> >> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov :
> >>>
> >>> You have a number of replication?
> >>>
> >>> 2015-03-03 15:14 GMT+03:00 Andrija Panic :
> >>>>
> >>>> Hi Irek,
> >>>>
> >>>> yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
> >>>> degraded and moved/recovered.
> >>>> When I after that removed it from Crush map "ceph osd crush rm id",
> >>>> that's when the stuff with 37% happened.
> >>>>
> >>>> And thanks Irek for help - could you kindly just let me know of the
> >>>> prefered steps when removing whole node?
> >>>> Do you mean I first stop all OSDs again, or just remove each OSD from
> >>>> crush map, or perhaps, just decompile cursh map, delete the node
> completely,
> >>>> compile back in, and let it heal/recover ?
> >>>>
> >>>> Do you think this would result in less data missplaces and moved
> arround
> >>>> ?
> >>>>
> >>>> Sorry for bugging you, I really appreaciate your help.
> >>>>
> >>>> Thanks
> >>>>
> >>>> On 3 March 2015 at 12:58, Irek Fasikhov  wrote:
> >>>>>
> >>>>> A large percentage of the rebuild of the cluster map (But low
> >>>>> percentage degradation). If you had not made "ceph osd crush rm id",
> the
> >>>>> percentage would be low.
> >>>>> In your case, the correct option is to remove the entire node, rather
> >>>>> than each disk individually
> >>>>>
> >>>>> 2015-03-03 14:27 GMT+03:00 Andrija Panic :
> >>>>>>
> >>>>>> Another question - I mentioned here 37% of objects being moved
> arround
> >>>>>> - this is MISPLACED object (degraded objects were 0.001%, after I
> removed 1
> >>>>>> OSD from cursh map (out of 44 OSD or so).
> >>>>>>
> >>>>>> Can anybody confirm this is normal behaviour - and are there any
> >>>>>> workarrounds ?
> >>>>>>
> >>>>>> I understand this is because of the object placement algorit

[ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Hi,

I have a live cluster with only a public network (so no explicit network
configuration in the ceph.conf file).

I'm wondering what the procedure is to implement dedicated
replication/private and public networks.
I've read the manual and know how to do it in ceph.conf, but since this is
an already running cluster, I'm wondering what I should do after I change
ceph.conf on all nodes.
Restart OSDs one by one, or...? Is any downtime expected before the
replication network is actually implemented completely?
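
(What I have in mind adding is along these lines - the subnets are
placeholders:

[global]
public network = 192.168.10.0/24
cluster network = 10.10.10.0/24
)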


Another related question:

Also, I'm demoting some old OSDs on old servers; I will have them all
stopped, but I would like to implement the replication network before
actually removing the old OSDs from the CRUSH map - since a lot of data
will be moved around.

My old nodes/OSDs (which will be stopped before I implement the replication
network) do NOT have a dedicated NIC for the replication network, in
contrast to the new nodes/OSDs. So there will still be references to these
old OSDs in the CRUSH map.
Will it be a problem that I change/implement a replication network that
WILL work on the new nodes/OSDs, but not on the old ones since they don't
have a dedicated NIC? I guess not, since the old OSDs are stopped anyway,
but I would like an opinion.

Or perhaps I might remove the OSDs from the CRUSH map after first setting
nobackfill and norecover (so no rebalancing happens) and then implement the
replication network?


Sorry for the long post, but...

Thanks,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
That was my thought, yes - I found this blog that confirms what you are
saying, I guess:
http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
I will do that... Thx

I guess it doesn't matter, since my CRUSH map will still reference old
OSDs, which are stopped (and the cluster resynced after that)?

Thx again for the help

On 4 March 2015 at 17:44, Robert LeBlanc  wrote:

> If I remember right, someone has done this on a live cluster without
> any issues. I seem to remember that it had a fallback mechanism if the
> OSDs couldn't be reached on the cluster network to contact them on the
> public network. You could test it pretty easily without much impact.
> Take one OSD that has both networks and configure it and restart the
> process. If all the nodes (specifically the old ones with only one
> network) is able to connect to it, then you are good to go by
> restarting one OSD at a time.
>
> On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic 
> wrote:
> > Hi,
> >
> > I'm having a live cluster with only public network (so no explicit
> network
> > configuraion in the ceph.conf file)
> >
> > I'm wondering what is the procedure to implement dedicated
> > Replication/Private and Public network.
> > I've read the manual, know how to do it in ceph.conf, but I'm wondering
> > since this is already running cluster - what should I do after I change
> > ceph.conf on all nodes ?
> > Restarting OSDs one by one, or... ? Is there any downtime expected ? -
> for
> > the replication network to actually imlemented completely.
> >
> >
> > Another related quetion:
> >
> > Also, I'm demoting some old OSDs, on old servers, I will have them all
> > stoped, but would like to implement replication network before actually
> > removing old OSDs from crush map - since lot of data will be moved
> arround.
> >
> > My old nodes/OSDs (that will be stoped before I implement replication
> > network) - do NOT have dedicated NIC for replication network, in
> contrast to
> > new nodes/OSDs. So there will be still reference to these old OSD in the
> > crush map.
> > Will this be a problem - me changing/implementing replication network
> that
> > WILL work on new nodes/OSDs, but not on old ones since they don't have
> > dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
> like
> > opinion.
> >
> > Or perhaps i might remove OSD from crush map with prior seting of
> > nobackfill and   norecover (so no rebalancing happens) and then implement
> > replication netwotk?
> >
> >
> > Sorry for old post, but...
> >
> > Thanks,
> > --
> >
> > Andrija Panić
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx Wido, I needed this confirmation - thanks!
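
(So per OSD it boils down to something like the following, one OSD at a
time:

service ceph restart osd.0
ceph -s        # wait for HEALTH_OK before moving on to the next one
)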

On 4 March 2015 at 17:49, Wido den Hollander  wrote:

> On 03/04/2015 05:44 PM, Robert LeBlanc wrote:
> > If I remember right, someone has done this on a live cluster without
> > any issues. I seem to remember that it had a fallback mechanism if the
> > OSDs couldn't be reached on the cluster network to contact them on the
> > public network. You could test it pretty easily without much impact.
> > Take one OSD that has both networks and configure it and restart the
> > process. If all the nodes (specifically the old ones with only one
> > network) is able to connect to it, then you are good to go by
> > restarting one OSD at a time.
> >
>
> In the OSDMap each OSD has a public and cluster network address. If the
> cluster network address is not set, replication to that OSD will be done
> over the public network.
>
> So you can push a new configuration to all OSDs and restart them one by
> one.
>
> Make sure the network is ofcourse up and running and it should work.
>
> > On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic 
> wrote:
> >> Hi,
> >>
> >> I'm having a live cluster with only public network (so no explicit
> network
> >> configuraion in the ceph.conf file)
> >>
> >> I'm wondering what is the procedure to implement dedicated
> >> Replication/Private and Public network.
> >> I've read the manual, know how to do it in ceph.conf, but I'm wondering
> >> since this is already running cluster - what should I do after I change
> >> ceph.conf on all nodes ?
> >> Restarting OSDs one by one, or... ? Is there any downtime expected ? -
> for
> >> the replication network to actually imlemented completely.
> >>
> >>
> >> Another related quetion:
> >>
> >> Also, I'm demoting some old OSDs, on old servers, I will have them all
> >> stoped, but would like to implement replication network before actually
> >> removing old OSDs from crush map - since lot of data will be moved
> arround.
> >>
> >> My old nodes/OSDs (that will be stoped before I implement replication
> >> network) - do NOT have dedicated NIC for replication network, in
> contrast to
> >> new nodes/OSDs. So there will be still reference to these old OSD in the
> >> crush map.
> >> Will this be a problem - me changing/implementing replication network
> that
> >> WILL work on new nodes/OSDs, but not on old ones since they don't have
> >> dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
> like
> >> opinion.
> >>
> >> Or perhaps i might remove OSD from crush map with prior seting of
> >> nobackfill and   norecover (so no rebalancing happens) and then
> implement
> >> replication netwotk?
> >>
> >>
> >> Sorry for old post, but...
> >>
> >> Thanks,
> >> --
> >>
> >> Andrija Panić
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
"I guess it doesnt matter, since my Crush Map will still refernce old OSDs,
that are stoped (and cluster resynced after that) ?"

I wanted to say: it doesnt matter (I guess?) that my Crush map is still
referencing old OSD nodes that are already stoped. Tired, sorry...

On 4 March 2015 at 17:48, Andrija Panic  wrote:

> That was my thought, yes - I found this blog that confirms what you are
> saying I guess:
> http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
> I will do that... Thx
>
> I guess it doesnt matter, since my Crush Map will still refernce old OSDs,
> that are stoped (and cluster resynced after that) ?
>
> Thx again for the help
>
> On 4 March 2015 at 17:44, Robert LeBlanc  wrote:
>
>> If I remember right, someone has done this on a live cluster without
>> any issues. I seem to remember that it had a fallback mechanism if the
>> OSDs couldn't be reached on the cluster network to contact them on the
>> public network. You could test it pretty easily without much impact.
>> Take one OSD that has both networks and configure it and restart the
>> process. If all the nodes (specifically the old ones with only one
>> network) is able to connect to it, then you are good to go by
>> restarting one OSD at a time.
>>
>> On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic 
>> wrote:
>> > Hi,
>> >
>> > I'm having a live cluster with only public network (so no explicit
>> network
>> > configuraion in the ceph.conf file)
>> >
>> > I'm wondering what is the procedure to implement dedicated
>> > Replication/Private and Public network.
>> > I've read the manual, know how to do it in ceph.conf, but I'm wondering
>> > since this is already running cluster - what should I do after I change
>> > ceph.conf on all nodes ?
>> > Restarting OSDs one by one, or... ? Is there any downtime expected ? -
>> for
>> > the replication network to actually imlemented completely.
>> >
>> >
>> > Another related quetion:
>> >
>> > Also, I'm demoting some old OSDs, on old servers, I will have them all
>> > stoped, but would like to implement replication network before actually
>> > removing old OSDs from crush map - since lot of data will be moved
>> arround.
>> >
>> > My old nodes/OSDs (that will be stoped before I implement replication
>> > network) - do NOT have dedicated NIC for replication network, in
>> contrast to
>> > new nodes/OSDs. So there will be still reference to these old OSD in the
>> > crush map.
>> > Will this be a problem - me changing/implementing replication network
>> that
>> > WILL work on new nodes/OSDs, but not on old ones since they don't have
>> > dedicated NIC ? I guess not since old OSDs are stoped anyway, but would
>> like
>> > opinion.
>> >
>> > Or perhaps i might remove OSD from crush map with prior seting of
>> > nobackfill and   norecover (so no rebalancing happens) and then
>> implement
>> > replication netwotk?
>> >
>> >
>> > Sorry for old post, but...
>> >
>> > Thanks,
>> > --
>> >
>> > Andrija Panić
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Hi Robert,

I already have this stuff set. Ceph is 0.87.0 now...

Thanks, I will schedule this for the weekend. 10G network and 36 OSDs - it
should move the data in less than 8h; per my last experience it was around
8h, but some 1G OSDs were included...

Thx!

On 4 March 2015 at 17:49, Robert LeBlanc  wrote:

> You will most likely have a very high relocation percentage. Backfills
> always are more impactful on smaller clusters, but "osd max backfills"
> should be what you need to help reduce the impact. The default is 10,
> you will want to use 1.
>
> I didn't catch which version of Ceph you are running, but I think
> there was some priority work done in firefly to help make backfills
> lower priority. I think it has gotten better in later versions.
>
> On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic 
> wrote:
> > Thank you Rober - I'm wondering when I do remove total of 7 OSDs from
> crush
> > map - weather that will cause more than 37% of data moved (80% or
> whatever)
> >
> > I'm also wondering if the thortling that I applied is fine or not - I
> will
> > introduce the osd_recovery_delay_start 10sec as Irek said.
> >
> > I'm just wondering hom much will be the performance impact, because:
> > - when stoping OSD, the impact while backfilling was fine more or a less
> - I
> > can leave with this
> > - when I removed OSD from cursh map - first 1h or so, impact was
> tremendous,
> > and later on during recovery process impact was much less but still
> > noticable...
> >
> > Thanks for the tip of course !
> > Andrija
> >
> > On 3 March 2015 at 18:34, Robert LeBlanc  wrote:
> >>
> >> I would be inclined to shut down both OSDs in a node, let the cluster
> >> recover. Once it is recovered, shut down the next two, let it recover.
> >> Repeat until all the OSDs are taken out of the cluster. Then I would
> >> set nobackfill and norecover. Then remove the hosts/disks from the
> >> CRUSH then unset nobackfill and norecover.
> >>
> >> That should give you a few small changes (when you shut down OSDs) and
> >> then one big one to get everything in the final place. If you are
> >> still adding new nodes, when nobackfill and norecover is set, you can
> >> add them in so that the one big relocate fills the new drives too.
> >>
> >> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic 
> >> wrote:
> >> > Thx Irek. Number of replicas is 3.
> >> >
> >> > I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
> >> > decommissioned), which is further connected to a new 10G
> switch/network
> >> > with
> >> > 3 servers on it with 12 OSDs each.
> >> > I'm decommissioning old 3 nodes on 1G network...
> >> >
> >> > So you suggest removing whole node with 2 OSDs manually from crush
> map?
> >> > Per my knowledge, ceph never places 2 replicas on 1 node, all 3
> replicas
> >> > were originally been distributed over all 3 nodes. So anyway It could
> be
> >> > safe to remove 2 OSDs at once together with the node itself...since
> >> > replica
> >> > count is 3...
> >> > ?
> >> >
> >> > Thx again for your time
> >> >
> >> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov"  wrote:
> >> >>
> >> >> Once you have only three nodes in the cluster.
> >> >> I recommend you add new nodes to the cluster, and then delete the
> old.
> >> >>
> >> >> 2015-03-03 15:28 GMT+03:00 Irek Fasikhov :
> >> >>>
> >> >>> You have a number of replication?
> >> >>>
> >> >>> 2015-03-03 15:14 GMT+03:00 Andrija Panic :
> >> >>>>
> >> >>>> Hi Irek,
> >> >>>>
> >> >>>> yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
> >> >>>> degraded and moved/recovered.
> >> >>>> When I after that removed it from Crush map "ceph osd crush rm id",
> >> >>>> that's when the stuff with 37% happened.
> >> >>>>
> >> >>>> And thanks Irek for help - could you kindly just let me know of the
> >> >>>> prefered steps when removing whole node?
> >> >>>> Do you mean I first stop all OSDs again, or just remove each OSD
> from
> >> >>>> crush map, or perhaps, just decompile cursh map, delete the node
> >> >>>> 

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx again - I really appreciate the help, guys!

On 4 March 2015 at 17:51, Robert LeBlanc  wrote:

> If the data have been replicated to new OSDs, it will be able to
> function properly even them them down or only on the public network.
>
> On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic 
> wrote:
> > "I guess it doesnt matter, since my Crush Map will still refernce old
> OSDs,
> > that are stoped (and cluster resynced after that) ?"
> >
> > I wanted to say: it doesnt matter (I guess?) that my Crush map is still
> > referencing old OSD nodes that are already stoped. Tired, sorry...
> >
> > On 4 March 2015 at 17:48, Andrija Panic  wrote:
> >>
> >> That was my thought, yes - I found this blog that confirms what you are
> >> saying I guess:
> >>
> http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/
> >> I will do that... Thx
> >>
> >> I guess it doesnt matter, since my Crush Map will still refernce old
> OSDs,
> >> that are stoped (and cluster resynced after that) ?
> >>
> >> Thx again for the help
> >>
> >> On 4 March 2015 at 17:44, Robert LeBlanc  wrote:
> >>>
> >>> If I remember right, someone has done this on a live cluster without
> >>> any issues. I seem to remember that it had a fallback mechanism if the
> >>> OSDs couldn't be reached on the cluster network to contact them on the
> >>> public network. You could test it pretty easily without much impact.
> >>> Take one OSD that has both networks and configure it and restart the
> >>> process. If all the nodes (specifically the old ones with only one
> >>> network) is able to connect to it, then you are good to go by
> >>> restarting one OSD at a time.
> >>>
> >>> On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic  >
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > I'm having a live cluster with only public network (so no explicit
> >>> > network
> >>> > configuraion in the ceph.conf file)
> >>> >
> >>> > I'm wondering what is the procedure to implement dedicated
> >>> > Replication/Private and Public network.
> >>> > I've read the manual, know how to do it in ceph.conf, but I'm
> wondering
> >>> > since this is already running cluster - what should I do after I
> change
> >>> > ceph.conf on all nodes ?
> >>> > Restarting OSDs one by one, or... ? Is there any downtime expected ?
> -
> >>> > for
> >>> > the replication network to actually imlemented completely.
> >>> >
> >>> >
> >>> > Another related quetion:
> >>> >
> >>> > Also, I'm demoting some old OSDs, on old servers, I will have them
> all
> >>> > stoped, but would like to implement replication network before
> actually
> >>> > removing old OSDs from crush map - since lot of data will be moved
> >>> > arround.
> >>> >
> >>> > My old nodes/OSDs (that will be stoped before I implement replication
> >>> > network) - do NOT have dedicated NIC for replication network, in
> >>> > contrast to
> >>> > new nodes/OSDs. So there will be still reference to these old OSD in
> >>> > the
> >>> > crush map.
> >>> > Will this be a problem - me changing/implementing replication network
> >>> > that
> >>> > WILL work on new nodes/OSDs, but not on old ones since they don't
> have
> >>> > dedicated NIC ? I guess not since old OSDs are stoped anyway, but
> would
> >>> > like
> >>> > opinion.
> >>> >
> >>> > Or perhaps i might remove OSD from crush map with prior seting of
> >>> > nobackfill and   norecover (so no rebalancing happens) and then
> >>> > implement
> >>> > replication netwotk?
> >>> >
> >>> >
> >>> > Sorry for old post, but...
> >>> >
> >>> > Thanks,
> >>> > --
> >>> >
> >>> > Andrija Panić
> >>> >
> >>> > ___
> >>> > ceph-users mailing list
> >>> > ceph-users@lists.ceph.com
> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>> >
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Andrija Panić
> >
> >
> >
> >
> > --
> >
> > Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Hi Robert,

it seems I have not listened well to your advice - I set the OSD to out
instead of stopping it - and now, instead of some ~3% of degraded objects,
there is 0.000% degraded and around 6% misplaced - and rebalancing is
happening again, but this is a small percentage...

Do you know if later, when I remove this OSD from the CRUSH map, no more
data will be rebalanced (as per the official CEPH documentation) - since
the already misplaced objects are getting distributed away to all the other
nodes?

(after "service ceph stop osd.0" there was 2.45% degraded data, but no
backfilling was happening for some reason... it just stayed degraded... so
this is the reason why I started the OSD back up and then set it to out...)
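
(To be explicit, the two different actions were:

ceph osd out 0             # mark the OSD out; the daemon keeps running and helps with recovery
service ceph stop osd.0    # stop the daemon entirely (what I tried first)
)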

Thanks

On 4 March 2015 at 17:54, Andrija Panic  wrote:

> Hi Robert,
>
> I already have this stuff set. CEph is 0.87.0 now...
>
> Thanks, will schedule this for weekend, 10G network and 36 OSDs - should
> move data in less than 8h per my last experineced that was arround8h, but
> some 1G OSDs were included...
>
> Thx!
>
> On 4 March 2015 at 17:49, Robert LeBlanc  wrote:
>
>> You will most likely have a very high relocation percentage. Backfills
>> always are more impactful on smaller clusters, but "osd max backfills"
>> should be what you need to help reduce the impact. The default is 10,
>> you will want to use 1.
>>
>> I didn't catch which version of Ceph you are running, but I think
>> there was some priority work done in firefly to help make backfills
>> lower priority. I think it has gotten better in later versions.
>>
>> On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic 
>> wrote:
>> > Thank you Rober - I'm wondering when I do remove total of 7 OSDs from
>> crush
>> > map - weather that will cause more than 37% of data moved (80% or
>> whatever)
>> >
>> > I'm also wondering if the thortling that I applied is fine or not - I
>> will
>> > introduce the osd_recovery_delay_start 10sec as Irek said.
>> >
>> > I'm just wondering hom much will be the performance impact, because:
>> > - when stoping OSD, the impact while backfilling was fine more or a
>> less - I
>> > can leave with this
>> > - when I removed OSD from cursh map - first 1h or so, impact was
>> tremendous,
>> > and later on during recovery process impact was much less but still
>> > noticable...
>> >
>> > Thanks for the tip of course !
>> > Andrija
>> >
>> > On 3 March 2015 at 18:34, Robert LeBlanc  wrote:
>> >>
>> >> I would be inclined to shut down both OSDs in a node, let the cluster
>> >> recover. Once it is recovered, shut down the next two, let it recover.
>> >> Repeat until all the OSDs are taken out of the cluster. Then I would
>> >> set nobackfill and norecover. Then remove the hosts/disks from the
>> >> CRUSH then unset nobackfill and norecover.
>> >>
>> >> That should give you a few small changes (when you shut down OSDs) and
>> >> then one big one to get everything in the final place. If you are
>> >> still adding new nodes, when nobackfill and norecover is set, you can
>> >> add them in so that the one big relocate fills the new drives too.
>> >>
>> >> On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic > >
>> >> wrote:
>> >> > Thx Irek. Number of replicas is 3.
>> >> >
>> >> > I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
>> >> > decommissioned), which is further connected to a new 10G
>> switch/network
>> >> > with
>> >> > 3 servers on it with 12 OSDs each.
>> >> > I'm decommissioning old 3 nodes on 1G network...
>> >> >
>> >> > So you suggest removing whole node with 2 OSDs manually from crush
>> map?
>> >> > Per my knowledge, ceph never places 2 replicas on 1 node, all 3
>> replicas
>> >> > were originally been distributed over all 3 nodes. So anyway It
>> could be
>> >> > safe to remove 2 OSDs at once together with the node itself...since
>> >> > replica
>> >> > count is 3...
>> >> > ?
>> >> >
>> >> > Thx again for your time
>> >> >
>> >> > On Mar 3, 2015 1:35 PM, "Irek Fasikhov"  wrote:
>> >> >>
>> >> >> Once you have only three nodes in the cluster.
>> >> >> I recommend you add new nodes to the cluster, and then delete the
>> old.
>> >> &

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Thanks a lot Robert.

I have actually already tried the following:

a) set one OSD to out (6% of data misplaced, CEPH recovered fine), stopped
the OSD, removed the OSD from the crush map (again 36% of data misplaced
!!!) - then inserted the OSD back into the crushmap, and those 36%
misplaced objects disappeared, of course - I've undone the crush remove...
so damage undone - the OSD is just "out" and the cluster is healthy again.


b) set norecover, nobackfill, and then (exact commands sketched below):
- Removed one OSD from crush (the running OSD, not the one from point a)
- only 18% of data misplaced !!! (no recovery was happening though, because
of norecover, nobackfill)
- Removed another OSD from the same node - a total of only 20% of objects
misplaced (with 2 OSDs on the same node removed from the crush map)
- So these 2 OSDs were still running UP and IN, and I just removed them
from the crush map, per the advice to avoid calculating the Crush map twice
- from:
http://image.slidesharecdn.com/scalingcephatcern-140311134847-phpapp01/95/scaling-ceph-at-cern-ceph-day-frankfurt-19-638.jpg?cb=1394564547
- And I added these 2 OSDs back to the crush map, this was just a test...
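
For reference, the flag/CRUSH commands involved in b) were along these
lines (osd ids, weight and host name are placeholders):

ceph osd set norecover
ceph osd set nobackfill
ceph osd crush remove osd.X
ceph osd crush add osd.X 3.64 host=cephXX    # added back afterwards, since this was just a test
ceph osd unset nobackfill
ceph osd unset norecover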

So the algorithm is very funny in some aspects... but it's all
pseudo-random stuff, so I kind of understand...

I will share my findings during the rest of the OSD demotion, after I
demote them...

Thanks for your detailed inputs !
Andrija


On 5 March 2015 at 22:51, Robert LeBlanc  wrote:

> Setting an OSD out will start the rebalance with the degraded object
> count. The OSD is still alive and can participate in the relocation of the
> objects. This is preferable so that you don't happen to get less the
> min_size because a disk fails during the rebalance then I/O stops on the
> cluster.
>
> Because CRUSH is an algorithm, anything that changes algorithm will cause
> a change in the output (location). When you set/fail an OSD, it changes the
> CRUSH, but the host and weight of the host are still in effect. When you
> remove the host or change the weight of the host (by removing a single
> OSD), it makes a change to the algorithm which will also cause some changes
> in how it computes the locations.
>
> Disclaimer - I have not tried this
>
> It may be possible to minimize the data movement by doing the following:
>
>1. set norecover and nobackfill on the cluster
>2. Set the OSDs to be removed to "out"
>3. Adjust the weight of the hosts in the CRUSH (if removing all OSDs
>for the host, set it to zero)
>4. If you have new OSDs to add, add them into the cluster now
>5. Once all OSDs changes have been entered, unset norecover and
>nobackfill
>6. This will migrate the data off the old OSDs and onto the new OSDs
>in one swoop.
>7. Once the data migration is complete, set norecover and nobackfill
>on the cluster again.
>8. Remove the old OSDs
>9. Unset norecover and nobackfill
>
> The theory is that by setting the host weights to 0, removing the
> OSDs/hosts later should minimize the data movement afterwards because the
> algorithm should have already dropped it out as a candidate for placement.
>
> If this works right, then you basically queue up a bunch of small changes,
> do one data movement, always keep all copies of your objects online and
> minimize the impact of the data movement by leveraging both your old and
> new hardware at the same time.
>
> If you try this, please report back on your experience. I'm might try it
> in my lab, but I'm really busy at the moment so I don't know if I'll get to
> it real soon.
>
> On Thu, Mar 5, 2015 at 12:53 PM, Andrija Panic 
> wrote:
>
>> Hi Robert,
>>
>> it seems I have not listened well on your advice - I set osd to out,
>> instead of stoping it - and now instead of some ~ 3% of degraded objects,
>> now there is 0.000% of degraded, and arround 6% misplaced - and rebalancing
>> is happening again, but this is small percentage..
>>
>> Do you know if later when I remove this OSD from crush map - no more data
>> will be rebalanced (as per CEPH official documentation) - since already
>> missplaced objects are geting distributed away to all other nodes ?
>>
>> (after service ceph stop osd.0 - there was 2.45% degraded data - but no
>> backfilling was happening for some reason...it just stayed degraded... so
>> this is a reason why I started back the OSD, and then set it to out...)
>>
>> Thanks
>>
>> On 4 March 2015 at 17:54, Andrija Panic  wrote:
>>
>>> Hi Robert,
>>>
>>> I already have this stuff set. CEph is 0.87.0 now...
>>>
>>> Thanks, will schedule this for weekend, 10G network and 36 OSDs - should
>>> move data in less than 8h per my last experineced that was arround8h, bu

[ceph-users] [rbd cache experience - given]

2015-03-07 Thread Andrija Panic
Hi there,

just wanted to share some benchmark experience with RBD caching, which I
have just (partially) implemented. These are not nicely formatted results,
just raw numbers to understand the difference.

*** INFRASTRUCTURE:
- 3 hosts with: 12 x 4TB drives, 6 journals on one SSD, 6 journals on a
second SSD
- 10GB NICs on both Compute and Storage nodes
- 10GB dedicated replication/private CEPH network
- Libvirt 1.2.3
- Qemu 0.12.1.2
- qemu drive-cache=none (set by CloudStack)

*** CEPH SETTINGS (ceph.conf on KVM hosts):
[client]
rbd cache = true
rbd cache size = 67108864 # (64MB)
rbd cache max dirty = 50331648 # (48MB)
rbd cache target dirty = 33554432 # (32MB)
rbd cache max dirty age = 2
rbd cache writethrough until flush = true # For safety reasons


*** NUMBERS (CentOS 6.6 VM - FIO/sysbench tools):

Random write, 16k IO size (yes, I know this is not "iops" because "true"
IOPS is considered to be 4K size - but it is good enough for comparison):

Random write, NO RBD cache: 170 IOPS 
Random write, RBD cache 64MB:  6500 IOPS.

Sequential writes improved from ~ 40 MB/s to 800 MB/s
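
(Something like the following fio invocation reproduces this kind of 16k
random-write test inside a guest - the parameters here are just an
illustrative sketch:

fio --name=randwrite-16k --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=16k --size=2G --iodepth=32 --runtime=60 --time_based \
    --group_reporting
)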

Will check latency also...and let you know

*** IMPORTANT:
Make sure to have the latest VirtIO drivers, because:
- CentOS 6.6, kernel 2.6.32.x - *RBD caching does not work* (the 2.6.32
VirtIO driver does not send flushes properly)
- CentOS 6.6, kernel 3.10 (ELRepo) - *RBD caching works fine* (the newer
VirtIO drivers send flushes fine)

I don't know about Windows yet, but I will give you "before" and "after"
numbers very soon.

Best,
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Doesn't Support Qcow2 Disk images

2015-03-12 Thread Andrija Panic
Ceph (RBD) stores images in RAW format - so it should be all fine... the VM
will be using that RAW format.
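
(If you have existing qcow2 images, converting them on import is usually
all that's needed - a rough sketch, file/pool/image names are placeholders:

qemu-img convert -f qcow2 -O raw vm-disk.qcow2 vm-disk.raw
rbd import vm-disk.raw images/vm-disk
)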

On 12 March 2015 at 09:03, Azad Aliyar  wrote:

> Community please explain the 2nd warning on this page:
>
> http://ceph.com/docs/master/rbd/rbd-openstack/
>
> "Important Ceph doesn’t support QCOW2 for hosting a virtual machine disk.
> Thus if you want to boot virtual machines in Ceph (ephemeral backend or
> boot from volume), the Glance image format must be RAW."
>
>
> --
>Warm Regards,  Azad Aliyar
>  Linux Server Engineer
>  *Email* :  azad.ali...@sparksupport.com   *|*   *Skype* :   spark.azad
>  
> 
> 
> 3rd Floor, Leela Infopark, Phase
> -2,Kakanad, Kochi-30, Kerala, India  *Phone*:+91 484 6561696 , 
> *Mobile*:91-8129270421.
>   *Confidentiality Notice:* Information in this e-mail is proprietary to
> SparkSupport. and is intended for use only by the addressed, and may
> contain information that is privileged, confidential or exempt from
> disclosure. If you are not the intended recipient, you are notified that
> any use of this information in any manner is strictly prohibited. Please
> delete this mail & notify us immediately at i...@sparksupport.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Hi all,

I have set nodeep-scrub and noscrub while I had small/slow hardware for the
cluster.
It has been off for a while now.

Now we are upgraded with hardware/networking/SSDs and I would like to
activate - or unset these flags.

Since I now have 3 servers with 12 OSDs each (SSD-based journals) - I was
wondering what is the best way to unset the flags - meaning, if I just unset the
flags, should I expect that the SCRUB will start all of a sudden on all
disks - or is there a way to let the SCRUB do the drives one by one...

In other words - should I expect a BIG performance impact or not?

Any experience is very appreciated...

Thanks,

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Thanks Wido - I will do that.

On 13 March 2015 at 09:46, Wido den Hollander  wrote:

>
>
> On 13-03-15 09:42, Andrija Panic wrote:
> > Hi all,
> >
> > I have set nodeep-scrub and noscrub while I had small/slow hardware for
> > the cluster.
> > It has been off for a while now.
> >
> > Now we are upgraded with hardware/networking/SSDs and I would like to
> > activate - or unset these flags.
> >
> > Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
> > was wondering what is the best way to unset flags - meaning if I just
> > unset the flags, should I expect that the SCRUB will start all of the
> > sudden on all disks - or is there way to let the SCRUB do drives one by
> > one...
> >
>
> So, I *think* that unsetting these flags will trigger a big scrub, since
> all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
>
> You can verify this with:
>
> $ ceph pg <pgid> query
>
> A solution would be to scrub each PG manually first in a timely fashion.
>
> $ ceph pg scrub <pgid>
>
> That way you set the timestamps and slowly scrub each PG.
>
> When that's done, unset the flags.
>
> Wido
>
> > In other words - should I expect BIG performance impact ornot ?
> >
> > Any experience is very appreciated...
> >
> > Thanks,
> >
> > --
> >
> > Andrija Panić
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Nice - so I just realized I need to manually scrub 1216 placement groups :)


On 13 March 2015 at 10:16, Andrija Panic  wrote:

> Thanks Wido - I will do that.
>
> On 13 March 2015 at 09:46, Wido den Hollander  wrote:
>
>>
>>
>> On 13-03-15 09:42, Andrija Panic wrote:
>> > Hi all,
>> >
>> > I have set nodeep-scrub and noscrub while I had small/slow hardware for
>> > the cluster.
>> > It has been off for a while now.
>> >
>> > Now we are upgraded with hardware/networking/SSDs and I would like to
>> > activate - or unset these flags.
>> >
>> > Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
>> > was wondering what is the best way to unset flags - meaning if I just
>> > unset the flags, should I expect that the SCRUB will start all of the
>> > sudden on all disks - or is there way to let the SCRUB do drives one by
>> > one...
>> >
>>
>> So, I *think* that unsetting these flags will trigger a big scrub, since
>> all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
>>
>> You can verify this with:
>>
>> $ ceph pg <pgid> query
>>
>> A solution would be to scrub each PG manually first in a timely fashion.
>>
>> $ ceph pg scrub <pgid>
>>
>> That way you set the timestamps and slowly scrub each PG.
>>
>> When that's done, unset the flags.
>>
>> Wido
>>
>> > In other words - should I expect BIG performance impact ornot ?
>> >
>> > Any experience is very appreciated...
>> >
>> > Thanks,
>> >
>> > --
>> >
>> > Andrija Panić
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Will do, of course :)

Thx Wido for the quick help, as always!
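
(For reference - a rough, untested sketch of such a scrub loop, assuming the PG ids
are in the first column of "ceph pg dump pgs_brief" output; adjust the sleep to your
cluster's speed:)

for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
    echo "scrubbing $pg"
    ceph pg scrub "$pg"
    sleep 30
done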

On 13 March 2015 at 12:04, Wido den Hollander  wrote:

>
>
> On 13-03-15 12:00, Andrija Panic wrote:
> > Nice - so I just realized I need to manually scrub 1216 placements
> groups :)
> >
>
> With manual I meant using a script.
>
> Loop through 'ceph pg dump', get the PGid, issue a scrub, sleep for X
> seconds and issue the next scrub.
>
> Wido
>
> >
> > On 13 March 2015 at 10:16, Andrija Panic  > <mailto:andrija.pa...@gmail.com>> wrote:
> >
> > Thanks Wido - I will do that.
> >
> > On 13 March 2015 at 09:46, Wido den Hollander  > <mailto:w...@42on.com>> wrote:
> >
> >
> >
> > On 13-03-15 09:42, Andrija Panic wrote:
> > > Hi all,
> > >
> > > I have set nodeep-scrub and noscrub while I had small/slow
> hardware for
> > > the cluster.
> > > It has been off for a while now.
> > >
> > > Now we are upgraded with hardware/networking/SSDs and I would
> like to
> > > activate - or unset these flags.
> > >
> > > Since I now have 3 servers with 12 OSDs each (SSD based
> Journals) - I
> > > was wondering what is the best way to unset flags - meaning if
> I just
> > > unset the flags, should I expect that the SCRUB will start all
> of the
> > > sudden on all disks - or is there way to let the SCRUB do
> drives one by
> > > one...
> > >
> >
> > So, I *think* that unsetting these flags will trigger a big
> > scrub, since
> > all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
> >
> > You can verify this with:
> >
> > $ ceph pg <pgid> query
> >
> > A solution would be to scrub each PG manually first in a timely
> > fashion.
> >
> > $ ceph pg scrub <pgid>
> >
> > That way you set the timestamps and slowly scrub each PG.
> >
> > When that's done, unset the flags.
> >
> > Wido
> >
> > > In other words - should I expect BIG performance impact
> ornot ?
> > >
> > > Any experience is very appreciated...
> > >
> > > Thanks,
> > >
> > > --
> > >
> > > Andrija Panić
> > >
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > --
> >
> > Andrija Panić
> >
> >
> >
> >
> > --
> >
> > Andrija Panić
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Interesting... thx for that Henrik.

BTW, my placement groups are around 1800 objects each (ceph pg dump) - meaning
a max of ~7GB of data at the moment.

A regular scrub just took 5-10 sec to finish. A deep scrub would, I guess, take
some minutes for sure.

What about deep scrub - its timestamp is still from some months ago, but the
regular scrub timestamp is fresh now...?

I don't see separate max deep scrub settings - or are these settings applied in
general to both kinds of scrubs?

Thanks
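
(To see which scrub limits and intervals an OSD is actually running with, the admin
socket is handy - e.g., assuming osd.0 is local to the node you run this on:)

ceph daemon osd.0 config show | grep -E 'osd_max_scrubs|scrub_min_interval|scrub_max_interval|deep_scrub_interval'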



On 13 March 2015 at 12:22, Henrik Korkuc  wrote:

>  I think that there will be no big scrub, as there are limits of maximum
> scrubs at a time.
> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
>
> If we take "osd max scrubs" which is 1 by default, then you will not get
> more than 1 scrub per OSD.
>
> I couldn't quickly find if there are cluster wide limits.
>
>
> On 3/13/15 10:46, Wido den Hollander wrote:
>
>
> On 13-03-15 09:42, Andrija Panic wrote:
>
>  Hi all,
>
> I have set nodeep-scrub and noscrub while I had small/slow hardware for
> the cluster.
> It has been off for a while now.
>
> Now we are upgraded with hardware/networking/SSDs and I would like to
> activate - or unset these flags.
>
> Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
> was wondering what is the best way to unset flags - meaning if I just
> unset the flags, should I expect that the SCRUB will start all of the
> sudden on all disks - or is there way to let the SCRUB do drives one by
> one...
>
>
>  So, I *think* that unsetting these flags will trigger a big scrub, since
> all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
>
> You can verify this with:
>
> $ ceph pg <pgid> query
>
> A solution would be to scrub each PG manually first in a timely fashion.
>
> $ ceph pg scrub <pgid>
>
> That way you set the timestamps and slowly scrub each PG.
>
> When that's done, unset the flags.
>
> Wido
>
>
>  In other words - should I expect BIG performance impact ornot ?
>
> Any experience is very appreciated...
>
> Thanks,
>
> --
>
> Andrija Panić
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Andrija Panic
Hmm... nice. Thx guys


On 13 March 2015 at 12:33, Henrik Korkuc  wrote:

>  I think settings apply to both kinds of scrubs
>
>
> On 3/13/15 13:31, Andrija Panic wrote:
>
> Interestingthx for that Henrik.
>
>  BTW, my placements groups are arround 1800 objects (ceph pg dump) -
> meainng max of 7GB od data at the moment,
>
>  regular scrub just took 5-10sec to finish. Deep scrub would I guess take
> some minutes for sure
>
>  What about deepscrub - timestamp is still some months ago, but regular
> scrub is fine now with fresh timestamp...?
>
>  I don't see max deep scrub setings - or are these settings applied in
> general for both kind on scrubs ?
>
>  Thanks
>
>
>
> On 13 March 2015 at 12:22, Henrik Korkuc  wrote:
>
>>  I think that there will be no big scrub, as there are limits of maximum
>> scrubs at a time.
>> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
>>
>> If we take "osd max scrubs" which is 1 by default, then you will not get
>> more than 1 scrub per OSD.
>>
>> I couldn't quickly find if there are cluster wide limits.
>>
>>
>> On 3/13/15 10:46, Wido den Hollander wrote:
>>
>> On 13-03-15 09:42, Andrija Panic wrote:
>>
>>  Hi all,
>>
>> I have set nodeep-scrub and noscrub while I had small/slow hardware for
>> the cluster.
>> It has been off for a while now.
>>
>> Now we are upgraded with hardware/networking/SSDs and I would like to
>> activate - or unset these flags.
>>
>> Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
>> was wondering what is the best way to unset flags - meaning if I just
>> unset the flags, should I expect that the SCRUB will start all of the
>> sudden on all disks - or is there way to let the SCRUB do drives one by
>> one...
>>
>>
>>  So, I *think* that unsetting these flags will trigger a big scrub, since
>> all PGs have a very old last_scrub_stamp and last_deepscrub_stamp
>>
>> You can verify this with:
>>
>> $ ceph pg <pgid> query
>>
>> A solution would be to scrub each PG manually first in a timely fashion.
>>
>> $ ceph pg scrub <pgid>
>>
>> That way you set the timestamps and slowly scrub each PG.
>>
>> When that's done, unset the flags.
>>
>> Wido
>>
>>
>>  In other words - should I expect BIG performance impact ornot ?
>>
>> Any experience is very appreciated...
>>
>> Thanks,
>>
>> --
>>
>> Andrija Panić
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>  ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
>  --
>
> Andrija Panić
>
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Andrija Panic
Check firewall - I hit this issue over and over again...
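
(On an iptables-based node the usual culprits are the monitor and OSD ports being
blocked - a sketch assuming the default ports and CentOS-style iptables:)

iptables -A INPUT -p tcp --dport 6789 -j ACCEPT          # monitor
iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT     # OSD/MDS daemons
service iptables save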

On 13 March 2015 at 22:25, Georgios Dimitrakakis 
wrote:

> On an already available cluster I 've tried to add a new monitor!
>
> I have used ceph-deploy mon create {NODE}
>
> where {NODE}=the name of the node
>
> and then I restarted the /etc/init.d/ceph service with a success at the
> node
> where it showed that the monitor is running like:
>
> # /etc/init.d/ceph restart
> === mon.jin ===
> === mon.jin ===
> Stopping Ceph mon.jin on jin...kill 36388...done
> === mon.jin ===
> Starting Ceph mon.jin on jin...
> Starting ceph-create-keys on jin...
>
>
>
> But checking the quorum it doesn't show the newly added monitor!
>
> Plus ceph mon stat gives out only 1 monitor!!!
>
> # ceph mon stat
> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu
>
>
> Any ideas on what have I done wrong???
>
>
> Regards,
>
> George
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Andrija Panic
:D
On Mar 13, 2015 10:30 PM, "Jesus Chavez (jeschave)" 
wrote:

>  So was I - the firewall was the reason
>
>
> * Jesus Chavez*
> SYSTEMS ENGINEER-C.SALES
>
> jesch...@cisco.com
> Phone: *+52 55 5267 3146*
> Mobile: *+51 1 5538883255*
>
> CCIE - 44433
>
> On Mar 13, 2015, at 3:30 PM, Andrija Panic 
> wrote:
>
>   Check firewall - I hit this issue over and over again...
>
> On 13 March 2015 at 22:25, Georgios Dimitrakakis 
> wrote:
>
>> On an already available cluster I 've tried to add a new monitor!
>>
>> I have used ceph-deploy mon create {NODE}
>>
>> where {NODE}=the name of the node
>>
>> and then I restarted the /etc/init.d/ceph service with a success at the
>> node
>> where it showed that the monitor is running like:
>>
>> # /etc/init.d/ceph restart
>> === mon.jin ===
>> === mon.jin ===
>> Stopping Ceph mon.jin on jin...kill 36388...done
>> === mon.jin ===
>> Starting Ceph mon.jin on jin...
>> Starting ceph-create-keys on jin...
>>
>>
>>
>> But checking the quorum it doesn't show the newly added monitor!
>>
>> Plus ceph mon stat gives out only 1 monitor!!!
>>
>> # ceph mon stat
>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu
>>
>>
>> Any ideas on what have I done wrong???
>>
>>
>> Regards,
>>
>> George
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
>  --
>
> Andrija Panić
>
>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Andrija Panic
Georgios,

you need to be on the "deployment server" and cd into the folder that you used
originally while deploying CEPH - in this folder you should already have
ceph.conf, the client.admin keyring and other stuff - which is required to
connect to the cluster... and to provision new MONs or OSDs, etc.

The message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to
create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy from a folder
that is NOT the original one...
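
(In other words, something along these lines - the directory path here is just a
placeholder for wherever you originally ran ceph-deploy:)

cd /path/to/original/ceph-deploy/dir    # contains ceph.conf and the generated keyrings
ceph-deploy mon add jin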


On 13 March 2015 at 23:03, Georgios Dimitrakakis 
wrote:

> Not a firewall problem!! Firewall is disabled ...
>
> Loic, I've tried mon create because of this:
> http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors
>
>
> Should I first create and then add?? What is the proper order??? Should I
> do it from the already existing monitor node or can I run it from the new
> one?
>
> If I try add from the beginning I am getting this:
>
> ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy mon add
> jin
> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to
> create a new cluster
>
>
>
> Regards,
>
>
> George
>
>
>
>  Hi,
>>
>> I think ceph-deploy mon add (instead of create) is what you should be
>> using.
>>
>> Cheers
>>
>> On 13/03/2015 22:25, Georgios Dimitrakakis wrote:
>>
>>> On an already available cluster I 've tried to add a new monitor!
>>>
>>> I have used ceph-deploy mon create {NODE}
>>>
>>> where {NODE}=the name of the node
>>>
>>> and then I restarted the /etc/init.d/ceph service with a success at the
>>> node
>>> where it showed that the monitor is running like:
>>>
>>> # /etc/init.d/ceph restart
>>> === mon.jin ===
>>> === mon.jin ===
>>> Stopping Ceph mon.jin on jin...kill 36388...done
>>> === mon.jin ===
>>> Starting Ceph mon.jin on jin...
>>> Starting ceph-create-keys on jin...
>>>
>>>
>>>
>>> But checking the quorum it doesn't show the newly added monitor!
>>>
>>> Plus ceph mon stat gives out only 1 monitor!!!
>>>
>>> # ceph mon stat
>>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu
>>>
>>>
>>> Any ideas on what have I done wrong???
>>>
>>>
>>> Regards,
>>>
>>> George
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Public Network Meaning

2015-03-14 Thread Andrija Panic
The public network carries client-to-OSD traffic - and if you have NOT explicitly
defined a cluster network, then OSD-to-OSD replication also takes place over the
same network.

Otherwise, you can define a public and a cluster (private) network - so OSD
replication will happen over dedicated NICs (the cluster network) and thus
speed things up.

If, for example, the replica count on a pool is 3, then each 1GB of data written by
a client generates roughly another 2 x 1GB of replication writes to the other
replicas... which ideally should travel over separate NICs to speed things up...

On 14 March 2015 at 17:43, Georgios Dimitrakakis 
wrote:

>
> Hi all!!
>
> What is the meaning of public_network in ceph.conf?
>
> Is it the network that OSDs are talking and transferring data?
>
> I have two nodes with two IP addresses each. One for internal network
> 192.168.1.0/24
> and one external 15.12.6.*
>
> I see the following in my logs:
>
> osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094
> osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463
> osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921
> osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208
> osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610
> osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948
> osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720
> osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354
>
>
> I 've managed to add a second node and during rebalancing I see that data
> is transfered through
> the internal 192.* but the external link is also saturated!
>
> What is being transferred from that?
>
>
> Any help much appreciated!
>
> Regards,
>
> George
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !

2015-03-14 Thread Andrija Panic
Changing the PG number causes a LOT of data rebalancing (in my case it was 80%),
which I learned the hard way...
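
(If you do have to grow a pool, it is gentler to bump pg_num/pgp_num in smaller
steps and wait for the cluster to settle in between - a sketch with a hypothetical
pool name and step size:)

ceph osd pool set rbd pg_num 640
ceph osd pool set rbd pgp_num 640
# watch "ceph -s" and wait for HEALTH_OK before the next increment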

On 14 March 2015 at 18:49, Gabri Mate 
wrote:

> I had the same issue a few days ago. I was increasing the pg_num of one
> pool from 512 to 1024 and all the VMs in that pool stopped. I came to
> the conclusion that doubling the pg_num caused such a high load in ceph
> that the VMs were blocked. The next time I will test with small
> increments.
>
>
> On 12:38 Sat 14 Mar , Florent B wrote:
> > Hi all,
> >
> > I have a Giant cluster in production.
> >
> > Today one of my RBD pools had the "too few pgs" warning. So I changed
> > pg_num & pgp_num.
> >
> > And at this moment, some of the VM stored on this pool were stopped (on
> > some hosts, not all, it depends, no logic)
> >
> > All was running fine for months...
> >
> > Have you ever seen this ?
> > What could have caused this ?
> >
> > Thank you.
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Public Network Meaning

2015-03-14 Thread Andrija Panic
This is how I did it; then restart each OSD one by one, but monitor with
"ceph -s" - when the cluster is healthy again, proceed with the next OSD restart...
Make sure the networks are fine on the physical nodes, i.e. that you can ping
between them...

[global]
x
x
x
x
x
x

#
### REPLICATION NETWORK ON SEPARATE 10G NICs

# replication network
cluster network = 10.44.251.0/24

# public/client network
public network = 10.44.253.0/16

#

[mon.xx]
mon_addr = x.x.x.x:6789
host = xx

[mon.yy]
mon_addr = x.x.x.x:6789
host = yy

[mon.zz]
mon_addr = x.x.x.x:6789
host = zz

On 14 March 2015 at 19:14, Georgios Dimitrakakis 
wrote:

> I thought that it was easy but apparently it's not!
>
> I have the following in my conf file
>
>
> mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
> public_network = 192.168.1.0/24
> mon_initial_members = fu,rai,jin
>
>
> but still the 15.12.6.21 link is being saturated
>
> Any ideas why???
>
> Should I put cluster network as well??
>
> Should I put each OSD in the CONF file???
>
>
> Regards,
>
>
> George
>
>
>
>
>
>  Andrija,
>>
>> thanks a lot for the useful info!
>>
>> I would also like to thank "Kingrat" at the IRC channel for his
>> useful advice!
>>
>>
>> I was under the wrong impression that public is the one used for RADOS.
>>
>> So I thought that public=external=internet and therefore I used that
>> one in my conf.
>>
>> I understand now that I should have specified in CEPH Public's
>> Network what I call
>> "internal" and which is the one that all machines are talking
>> directly to each other.
>>
>>
>> Thanks you all for the feedback!
>>
>>
>> Regards,
>>
>>
>> George
>>
>>
>>  Public network is clients-to-OSD traffic - and if you have NOT
>>> explicitely defined cluster network, than also OSD-to-OSD replication
>>> takes place over same network.
>>>
>>> Otherwise, you can define public and cluster(private) network - so OSD
>>> replication will happen over dedicated NICs (cluster network) and thus
>>> speed up.
>>>
>>> If i.e. replica count on pool is 3, that means, each 1GB of data
>>> writen to some particualr OSD, will generate 3 x 1GB of more writes,
>>> to the replicas... - which ideally will take place over separate NICs
>>> to speed up things...
>>>
>>> On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:
>>>
>>>  Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring data?

 I have two nodes with two IP addresses each. One for internal
 network 192.168.1.0/24 [1]
 and one external 15.12.6.*

 I see the following in my logs:

 osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2]
 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3]
 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4]
 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5]
 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6]
 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 [7]
 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 [8]
 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 [9]

 I ve managed to add a second node and during rebalancing I see that
 data is transfered through
 the internal 192.* but the external link is also saturated!

 What is being transferred from that?

 Any help much appreciated!

 Regards,

 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [10]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]

>>>
>>> --
>>>
>>> Andrija Panić
>>>
>>> Links:
>>> --
>>> [1] http://192.168.1.0/24
>>> [2] http://15.12.6.21:6826/33094
>>> [3] http://15.12.6.21:6817/32463
>>> [4] http://15.12.6.21:6843/34921
>>> [5] http://15.12.6.21:6838/34208
>>> [6] http://15.12.6.21:6831/33610
>>> [7] http://15.12.6.21:6858/35948
>>> [8] http://15.12.6.21:6871/36720
>>> [9] http://15.12.6.21:6855/35354
>>> [10] mailto:ceph-users@lists.ceph.com
>>> [11] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> [12] mailto:gior...@acmac.uoc.gr
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http:

Re: [ceph-users] {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Andrija Panic
Georgios,

no need to put ANYTHING if you don't plan to split client-to-OSD vs
OSD-to-OSD replication onto 2 different network cards/networks - for performance
reasons.

if you have only 1 network - simply DON'T configure networks at all inside
your ceph.conf file...

if you have 2 x 1G cards in the servers, then you may use the first 1G for client
traffic, and the second 1G for OSD-to-OSD replication...

best

On 14 March 2015 at 19:33, Georgios Dimitrakakis 
wrote:

> Andrija,
>
> Thanks for you help!
>
> In my case I just have one 192.* network, so should I put that for both?
>
> Besides monitors do I have to list OSDs as well?
>
> Thanks again!
>
> Best,
>
> George
>
>  This is how I did it, and then retart each OSD one by one, but
>> monritor with ceph -s, when ceph is healthy, proceed with next OSD
>> restart...
>> Make sure the networks are fine on physical nodes, that you can ping
>> in between...
>>
>> [global]
>> x
>> x
>> x
>> x
>> x
>> x
>>
>> #
>> ### REPLICATION NETWORK ON SEPARATE 10G NICs
>>
>> # replication network
>> cluster network = 10.44.251.0/24 [29]
>>
>> # public/client network
>> public network = 10.44.253.0/16 [30]
>>
>> #
>>
>> [mon.xx]
>> mon_addr = x.x.x.x:6789
>> host = xx
>>
>> [mon.yy]
>> mon_addr = x.x.x.x:6789
>> host = yy
>>
>> [mon.zz]
>> mon_addr = x.x.x.x:6789
>> host = zz
>>
>> On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:
>>
>>  I thought that it was easy but apparently its not!
>>>
>>> I have the following in my conf file
>>>
>>> mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
>>> public_network = 192.168.1.0/24 [26]
>>> mon_initial_members = fu,rai,jin
>>>
>>> but still the 15.12.6.21 link is being saturated
>>>
>>> Any ideas why???
>>>
>>> Should I put cluster network as well??
>>>
>>> Should I put each OSD in the CONF file???
>>>
>>> Regards,
>>>
>>> George
>>>
>>>  Andrija,

 thanks a lot for the useful info!

 I would also like to thank "Kingrat" at the IRC channel for his
 useful advice!

 I was under the wrong impression that public is the one used for
 RADOS.

 So I thought that public=external=internet and therefore I used
 that
 one in my conf.

 I understand now that I should have specified in CEPH Publics
 Network what I call
 "internal" and which is the one that all machines are talking
 directly to each other.

 Thanks you all for the feedback!

 Regards,

 George

  Public network is clients-to-OSD traffic - and if you have NOT
> explicitely defined cluster network, than also OSD-to-OSD
> replication
> takes place over same network.
>
> Otherwise, you can define public and cluster(private) network -
> so OSD
> replication will happen over dedicated NICs (cluster network)
> and thus
> speed up.
>
> If i.e. replica count on pool is 3, that means, each 1GB of
> data
> writen to some particualr OSD, will generate 3 x 1GB of more
> writes,
> to the replicas... - which ideally will take place over
> separate NICs
> to speed up things...
>
> On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:
>
>  Hi all!!
>>
>> What is the meaning of public_network in ceph.conf?
>>
>> Is it the network that OSDs are talking and transferring
>> data?
>>
>> I have two nodes with two IP addresses each. One for internal
>> network 192.168.1.0/24 [1]
>> and one external 15.12.6.*
>>
>> I see the following in my logs:
>>
>> osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2]
>> osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3]
>> osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4]
>> osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5]
>> osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6]

Re: [ceph-users] {Disarmed} Re: {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Andrija Panic
In that case - yes... put everything on 1 card - or, if both cards are 1G (or
the same speed, for that matter...) - then you might want to block all external
traffic except e.g. SSH and WEB, but allow ALL traffic between all CEPH
OSDs... so you can still use that network for "public/client" traffic - not
sure how you connect to/use CEPH - from the internet??? or do you have some more
VMs/servers/clients on the 192.* network...?



On 14 March 2015 at 19:38, Georgios Dimitrakakis 
wrote:

> Andrija,
>
> I have two cards!
>
> One on 15.12.* and one on 192.*
>
> Obviously the 15.12.* is the external network (real public IP address e.g
> used to access the node via SSH)
>
> That's why I am telling that my public network for CEPH is the 192. and
> should I use the cluster network for that as well?
>
> Best,
>
> George
>
>
>  Georgios,
>>
>> no need to put ANYTHING if you dont plan to split client-to-OSD vs
>> OSD-OSD-replication on 2 different Network Cards/Networks - for
>> pefromance reasons.
>>
>> if you have only 1 network - simply DONT configure networks at all
>> inside your CEPH.conf file...
>>
>> if you have 2 x 1G cards in servers, then you may use first 1G for
>> client traffic, and second 1G for OSD-to-OSD replication...
>>
>> best
>>
>> On 14 March 2015 at 19:33, Georgios Dimitrakakis  wrote:
>>
>>  Andrija,
>>>
>>> Thanks for you help!
>>>
>>> In my case I just have one 192.* network, so should I put that for
>>> both?
>>>
>>> Besides monitors do I have to list OSDs as well?
>>>
>>> Thanks again!
>>>
>>> Best,
>>>
>>> George
>>>
>>>  This is how I did it, and then retart each OSD one by one, but
 monritor with ceph -s, when ceph is healthy, proceed with next
 OSD
 restart...
 Make sure the networks are fine on physical nodes, that you can
 ping
 in between...

 [global]
 x
 x
 x
 x
 x
 x

 #
 ### REPLICATION NETWORK ON SEPARATE 10G NICs

 # replication network
 cluster network = 10.44.251.0/24 [29]

 # public/client network
 public network = 10.44.253.0/16 [30]

 #

 [mon.xx]
 mon_addr = x.x.x.x:6789
 host = xx

 [mon.yy]
 mon_addr = x.x.x.x:6789
 host = yy

 [mon.zz]
 mon_addr = x.x.x.x:6789
 host = zz

 On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:

  I thought that it was easy but apparently its not!
>
> I have the following in my conf file
>
> mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
> public_network = 192.168.1.0/24 [26]
> mon_initial_members = fu,rai,jin
>
> but still the 15.12.6.21 link is being saturated
>
> Any ideas why???
>
> Should I put cluster network as well??
>
> Should I put each OSD in the CONF file???
>
> Regards,
>
> George
>
>  Andrija,
>>
>> thanks a lot for the useful info!
>>
>> I would also like to thank "Kingrat" at the IRC channel for
>> his
>> useful advice!
>>
>> I was under the wrong impression that public is the one used
>> for
>> RADOS.
>>
>> So I thought that public=external=internet and therefore I
>> used
>> that
>> one in my conf.
>>
>> I understand now that I should have specified in CEPH Publics
>> Network what I call
>> "internal" and which is the one that all machines are talking
>> directly to each other.
>>
>> Thanks you all for the feedback!
>>
>> Regards,
>>
>> George
>>
>>  Public network is clients-to-OSD traffic - and if you have
>>> NOT
>>> explicitely defined cluster network, than also OSD-to-OSD
>>> replication
>>> takes place over same network.
>>>
>>> Otherwise, you can define public and cluster(private)
>>> network -
>>> so OSD
>>> replication will happen over dedicated NICs (cluster
>>> network)
>>> and thus
>>> speed up.
>>>
>>> If i.e. replica count on pool is 3, that means, each 1GB of
>>> data
>>> writen to some particualr OSD, will generate 3 x 1GB of
>>> more
>>> writes,
>>> to the replicas... - which ideally will take place over
>>> separate NICs
>>> to speed up things...
>>>
>>> On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:
>>>
>>>  Hi all!!

 What is the meaning of public_network in ceph.conf?

 Is it the network that OSDs are talking and transferring
 data?

Re: [ceph-users] RBD read-ahead not working in 0.87.1

2015-03-18 Thread Andrija Panic
Actually, good question - is RBD caching possible at all with Windows
guests, if they are using the latest VirtIO drivers?
Linux caching (write caching, writeback) is working fine with the newer VirtIO
drivers...
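
(One way to check whether librbd caching is actually active for a given guest,
regardless of the guest OS, is the client admin socket - a sketch, assuming an
admin_socket path is configured in the [client] section and substituting the real
.asok name:)

ls /var/run/ceph/                                        # find the qemu client socket
ceph daemon /var/run/ceph/<client>.asok config show | grep rbd_cache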

Thanks

On 18 March 2015 at 10:39, Alexandre DERUMIER  wrote:

> Hi,
>
> I don't known how rbd read-ahead is working,
>
> but with qemu virtio-scsi, you can have read merge request (for sequential
> reads), so it's doing bigger ops to ceph cluster and improve throughput.
> virtio-blk merge request will be supported in coming qemu 2.3.
>
>
> (I'm not sure of virtio-win drivers support of theses features)
>
>
> - Mail original -
> De: "Stephen Taylor" 
> À: "ceph-users" 
> Envoyé: Mardi 17 Mars 2015 21:22:59
> Objet: Re: [ceph-users] RBD read-ahead not working in 0.87.1
>
>
>
> Never mind. After digging through the history on Github it looks like the
> docs are wrong. The code for the RBD read-ahead feature appears in 0.88,
> not 0.86, which explains why I can’t get it to work in 0.87.1.
>
>
>
> Steve
>
>
>
>
> From: Stephen Taylor
> Sent: Tuesday, March 17, 2015 11:32 AM
> To: 'ceph-us...@ceph.com'
> Subject: RBD read-ahead not working in 0.87.1
>
>
>
>
> Hello, fellow Ceph users,
>
>
>
> I’m trying to utilize RBD read-ahead settings with 0.87.1 (documented as
> new in 0.86) to convince the Windows boot loader to boot a Windows RBD in a
> reasonable amount of time using QEMU on Ubuntu 14.04.2. Below is the output
> of “ceph -w” during the Windows VM boot process. During the boot loader
> phase it’s almost a perfect correspondence of kB/s rd and op/s, which I
> interpret as the boot loader doing LOTS of non-cached, 1kB reads. This is
> what the [client] section of my ceph.conf looks like:
>
>
>
> [client]
>
> rbd_cache = true
>
> rbd_cache_size = 268435456
>
> rbd_cache_max_dirty = 201326592
>
> rbd_cache_target_dirty = 134217728
>
> rbd_readahead_trigger_requests = 1
>
> rbd_readahead_max_bytes = 524288
>
> rbd_readahead_disable_after_bytes = 0
>
> rbd_cache_writethrough_until_flush = true
>
> admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
>
>
>
> Some of those values are not what I would use in production. This is just
> a test environment to try to prove that the RBD read-ahead caching works as
> I expect.
>
>
>
> Another interesting note is that “sudo ceph daemon /var/run/ceph/<admin socket> config show | grep rbd_readahead” yields nothing. The “config show”
> lists all of the config settings with the values I expect, but the
> rbd_readahead_* settings are absent. I have tried all kinds of different
> values in my ceph.conf file with the same result.
>
>
>
> The reason I’m convinced that read-ahead caching is my problem here is
> that I can mount my RBD via rbd-fuse and use the same QEMU command with the
> -drive parameter changed to use the rbd-fuse mount as a raw file instead of
> direct librbd, and the same Windows VM boots in a fraction of the time with
> much lower op/s numbers in the Ceph status output. I assume this is due to
> the Linux page cache helping me out with the rbd-fuse mount.
>
>
>
> Are the RBD read-ahead settings simply not working? That’s what it looks
> like, but I figure I must be doing something wrong. Thanks for any help.
>
>
>
> Steve Taylor
>
>
>
> 2015-03-17 09:50:19.209721 mon.0 [INF] pgmap v20871: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 3 B/s rd,
> 0 op/s
>
> 2015-03-17 09:50:24.199327 mon.0 [INF] pgmap v20872: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 7 B/s rd,
> 0 op/s
>
> 2015-03-17 10:02:03.471846 mon.0 [INF] pgmap v20873: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 1 B/s rd,
> 0 op/s
>
> 2015-03-17 10:02:05.739547 mon.0 [INF] pgmap v20874: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 754 B/s
> rd, 0 op/s
>
> 2015-03-17 10:02:08.008245 mon.0 [INF] pgmap v20875: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 144 kB/s
> rd, 156 op/s
>
> 2015-03-17 10:02:09.286862 mon.0 [INF] pgmap v20876: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 130 kB/s
> rd, 147 op/s
>
> 2015-03-17 10:02:10.543695 mon.0 [INF] pgmap v20877: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 614 kB/s
> rd, 614 op/s
>
> 2015-03-17 10:02:11.832906 mon.0 [INF] pgmap v20878: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 828 kB/s
> rd, 828 op/s
>
> 2015-03-17 10:02:12.998471 mon.0 [INF] pgmap v20879: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 387 kB/s
> rd, 387 op/s
>
> 2015-03-17 10:02:14.378462 mon.0 [INF] pgmap v20880: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 76889 B/s
> rd, 75 op/s
>
> 2015-03-17 10:02:15.656530 mon.0 [INF] pgmap v20881: 8192 pgs: 8192
> active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB 
