Re: [ceph-users] old osds take much longer to start than newer osd

2015-03-02 Thread Stephan Hohn
Try checking the XFS fragmentation factor on your "old" OSDs:

$ xfs_db -c frag -r /dev/sdX

and see if it's incredibly high.
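
To sweep a whole node at once, something like this should do it (a rough sketch; it assumes the OSD data partitions are mounted under /var/lib/ceph/osd/ceph-N and that xfsprogs is installed):

# print the fragmentation factor of every mounted OSD data partition
for dev in $(awk '$2 ~ /\/ceph-[0-9]+$/ {print $1}' /proc/mounts); do
    printf '%s: ' "$dev"
    xfs_db -c frag -r "$dev"
done
# if the factor really is high, xfs_fsr can defragment the filesystem in place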

> On 27 Feb 2015, at 14:02, Corin Langosch  wrote:
> 
> Hi guys,
> 
> I'm using ceph for a long time now, since bobtail. I always upgraded every 
> few weeks/ months to the latest stable
> release. Of course I also removed some osds and added new ones. Now during 
> the last few upgrades (I just upgraded from
> 80.6 to 80.8) I noticed that old osds take much longer to startup than equal 
> newer osds (same amount of data/ disk
> usage, same kind of storage+journal backing device (ssd), same weight, same 
> number of pgs, ...). I know I observed the
> same behavior earlier but just didn't really care about it. Here are the 
> relevant log entries (host of osd.0 and osd.15
> has less cpu power than the others):
> 
> old osds (average pgs load time: 1.5 minutes)
> 
> 2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
> 2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 pgs
> 
> 2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
> 2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776 pgs
> 
> 2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
> 2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806 pgs
> 
> 2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
> 2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795 pgs
> 
> new osds (average pgs load time: 3 seconds)
> 
> 2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
> 2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened 873 
> pgs
> 
> 2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
> 2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened 848 
> pgs
> 
> 2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
> 2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened 821 
> pgs
> 
> 2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
> 2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 
> pgs
> 
> The old osds also take more memory, here's an example:
> 
> root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55 
> /usr/bin/ceph-osd -i 9 --pid-file
> /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
> root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20 
> /usr/bin/ceph-osd -i 18 --pid-file
> /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph
> 
> 
> It seems to me there is still some old data around for the old osds which was 
> not properly migrated/ cleaned up during
> the upgrades. The cluster is healthy, no problems at all the last few weeks. 
> Is there any way to clean this up?
> 
> Thanks
> Corin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] question about rgw create bucket

2015-03-02 Thread ghislain.chevalier
Hi all,
I think this question may be linked to the mail I sent (Feb 25) related to
"unconsistency between bucket and bucket.instance".
Best regards
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
baijia...@126.com
Sent: Monday, 2 March 2015 08:00
To: ceph-users; Ceph Development
Subject: [ceph-users] question about rgw create bucket

When I create a bucket, why does rgw create two objects in the domain root
pool: one storing struct RGWBucketInfo and the other storing struct
RGWBucketEntryPoint?

And when I delete the bucket, why does rgw delete only one of them?
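
A quick way to look at the two objects in question is to list the domain root pool directly. A rough sketch, assuming the default .rgw domain root pool and a bucket named mybucket (the exact object names may differ between releases):

rados -p .rgw ls | grep mybucket
# typically shows something like:
#   mybucket                              <- the RGWBucketEntryPoint object
#   .bucket.meta.mybucket:<instance_id>   <- the RGWBucketInfo (bucket instance) object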


baijia...@126.com

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] old osds take much longer to start than newer osd

2015-03-02 Thread Corin Langosch
It's a little worse, but not much:

root@r-ch106:~# xfs_db -c frag -r /dev/sda1
actual 397955, ideal 324744, fragmentation factor 18.40%
root@r-ch106:~# xfs_db -c frag -r /dev/sdb2
actual 378729, ideal 324349, fragmentation factor 14.36%

root@r-ch105:~# xfs_db -c frag -r /dev/sdb2
actual 382831, ideal 328632, fragmentation factor 14.16%
root@r-ch105:~# xfs_db -c frag -r /dev/sdc1
actual 363590, ideal 318793, fragmentation factor 12.32%

...

On 02.03.2015 09:23, Stephan Hohn wrote:
> Try and check the xfs fragmentation factor on your „old“ osds.
> 
> $ xfs_db -c frag -r /dev/sdX
> and see if it’s incredible high.
> 
>> On 27 Feb 2015, at 14:02, Corin Langosch wrote:
>>
>> Hi guys,
>>
>> I'm using ceph for a long time now, since bobtail. I always upgraded every 
>> few weeks/ months to the latest stable
>> release. Of course I also removed some osds and added new ones. Now during 
>> the last few upgrades (I just upgraded from
>> 80.6 to 80.8) I noticed that old osds take much longer to startup than equal 
>> newer osds (same amount of data/ disk
>> usage, same kind of storage+journal backing device (ssd), same weight, same 
>> number of pgs, ...). I know I observed the
>> same behavior earlier but just didn't really care about it. Here are the 
>> relevant log entries (host of osd.0 and osd.15
>> has less cpu power than the others):
>>
>> old osds (average pgs load time: 1.5 minutes)
>>
>> 2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
>> 2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824 
>> pgs
>>
>> 2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
>> 2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776 
>> pgs
>>
>> 2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
>> 2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806 
>> pgs
>>
>> 2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
>> 2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795 
>> pgs
>>
>> new osds (average pgs load time: 3 seconds)
>>
>> 2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
>> 2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened 873 
>> pgs
>>
>> 2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
>> 2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened 848 
>> pgs
>>
>> 2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
>> 2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened 821 
>> pgs
>>
>> 2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
>> 2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened 815 
>> pgs
>>
>> The old osds also take more memory, here's an example:
>>
>> root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55 
>> /usr/bin/ceph-osd -i 9 --pid-file
>> /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
>> root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20 
>> /usr/bin/ceph-osd -i 18 --pid-file
>> /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph
>>
>>
>> It seems to me there is still some old data around for the old osds which 
>> was not properly migrated/ cleaned up during
>> the upgrades. The cluster is healthy, no problems at all the last few weeks. 
>> Is there any way to clean this up?
>>
>> Thanks
>> Corin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Permanente Mount RBD blocs device RHEL7

2015-03-02 Thread Jesus Chavez (jeschave)
Hi all! I have been trying to make permanent the filesystem I created on the
mapped rbd device on RHEL 7 by modifying /etc/fstab, but every time I reboot
the server I lose the mapping to the pool, so the server gets stuck since it
can't find the /dev/rbd0 device. Does anybody know of a procedure to keep the
mapping, or otherwise make the filesystem mount permanent?

Thanks!


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone : +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Permanente Mount RBD blocs device RHEL7

2015-03-02 Thread Alexandre DERUMIER
Hi,

maybe this can help you:

http://www.sebastien-han.fr/blog/2013/11/22/map-slash-unmap-rbd-device-on-boot-slash-shutdown/
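
If I remember right, the approach there is to map the image from an init/rbdmap-style script before the mount happens. A rough sketch of an equivalent setup, assuming the rbdmap helper and its config file are shipped with your ceph packages (pool, image, mount point and keyring are placeholders):

# /etc/ceph/rbdmap
rbd/myimage   id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# /etc/fstab -- noauto/_netdev so boot does not hang before /dev/rbd0 exists
/dev/rbd/rbd/myimage   /mnt/myimage   xfs   noauto,_netdev   0  0

# enable the mapping script at boot
chkconfig rbdmap on    # or: systemctl enable rbdmap, if a unit file is provided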


Regards,

Alexandre

- Mail original -
De: "Jesus Chavez (jeschave)" 
À: "ceph-users" 
Envoyé: Lundi 2 Mars 2015 11:14:49
Objet: [ceph-users] Permanente Mount RBD blocs device RHEL7

Hi all! I have been trying to get permanent my fs maked by the rbd device 
mapping on rhel7 modifying /etc/fstab but everytime I reboot the server I lose 
the mapping to the pool so the server gets stuck since It didnt find the 
/dev/rbd0 device, does anybody know if there any procedure to not lose the 
mapping or make the filesystem permanent? 

Thanks! 


Jesus Chavez 
SYSTEMS ENGINEER-C.SALES 

jesch...@cisco.com 
Phone : +52 55 5267 3146 
Mobile: +51 1 5538883255 

CCIE - 44433 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Permanente Mount RBD blocs device RHEL7

2015-03-02 Thread Jesus Chavez (jeschave)
Thank you so much Alexandre! :)


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

On Mar 2, 2015, at 4:26 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:

Hi,

maybe this can help you:

http://www.sebastien-han.fr/blog/2013/11/22/map-slash-unmap-rbd-device-on-boot-slash-shutdown/


Regards,

Alexandre

- Mail original -
De: "Jesus Chavez (jeschave)" mailto:jesch...@cisco.com>>
À: "ceph-users" mailto:ceph-users@lists.ceph.com>>
Envoyé: Lundi 2 Mars 2015 11:14:49
Objet: [ceph-users] Permanente Mount RBD blocs device RHEL7

Hi all! I have been trying to get permanent my fs maked by the rbd device 
mapping on rhel7 modifying /etc/fstab but everytime I reboot the server I lose 
the mapping to the pool so the server gets stuck since It didnt find the 
/dev/rbd0 device, does anybody know if there any procedure to not lose the 
mapping or make the filesystem permanent?

Thanks!


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone : +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-02 Thread Tony Harris
On Sun, Mar 1, 2015 at 11:19 PM, Christian Balzer  wrote:

>
> > >
> > I'll be honest, the pricing on Intel's website is far from reality.  I
> > haven't been able to find any OEMs, and retail pricing on the 200GB 3610
> > is ~231 (the $300 must have been a different model in the line).
> > Although $231 does add up real quick if I need to get 6 of them :(
> >
> >
> Using the google shopping (which isn't ideal, but for simplicities sake)
> search I see the 100GB DC S3700 from 170USD and the 160GB DC S3500 from
> 150USD, which are a pretty good match to the OEM price on the Intel site
> of 180 and 160 respectively.
>
>
If I have to buy them personally, that'll work well.  If I can get work to
buy them, then I kinda have to limit myself to the companies we have marked
as suppliers, as it's a pain to get a new company into the mix.



> > > You really wouldn't want less than 200MB/s, even in your setup which I
> > > take to be 2Gb/s from what you wrote below.
> >
> >
> >
> > > Note that the 100GB 3700 is going to perform way better and last
> > > immensely longer than the 160GB 3500 while being moderately more
> > > expensive, while the the 200GB 3610 is faster (IOPS), lasting 10 times
> > > long AND cheaper than the 240GB 3500.
> > >
> > > It is pretty much those numbers that made me use 4 100GB 3700s instead
> > > of 3500s (240GB), much more bang for the buck and it still did fit my
> > > budget and could deal with 80% of the network bandwidth.
> > >
> >
> > So the 3710's would be an ok solution?
>
> No, because they start from 200GB and with a 300USD price tag. The 3710s
> do not replace the 3700s, they extend the selection upwards (in size
> mostly).
>

I thought I had corrected that - I was thinking the 3700's and typed 3710 :)


>
> >I have seen the 3700s for right
> > about $200, which although doesn't seem a lot cheaper, when getting 6,
> > that does shave about $200 after shipping costs as well...
> >
> See above, google shopping. The lowballer is Walmart, of all places:
>
> http://www.walmart.com/ip/26972768?wmlspartner=wlpa&selectedSellerId=0
>
>
> >
> > >
> > > >
> > > > >
> > > > > Guestimate the amount of data written to your cluster per day,
> > > > > break that down to the load a journal SSD will see and then
> > > > > multiply by at least 5 to be on the safe side. Then see which SSD
> > > > > will fit your expected usage pattern.
> > > > >
> > > >
> > > > Luckily I don't think there will be a ton of data per day written.
> > > > The majority of servers whose VHDs will be stored in our cluster
> > > > don't have a lot of frequent activity - aside from a few windows
> > > > servers that have DBs servers in them (and even they don't write a
> > > > ton of data per day really).
> > > >
> > >
> > > Being able to put even a coarse number on this will tell you if you can
> > > skim on the endurance and have your cluster last like 5 years or if
> > > getting a higher endurance SSD is going to be cheaper.
> > >
> >
> > Any suggestions on how I can get a really accurate number on this?  I
> > mean, I could probably get some good numbers from the database servers
> > in terms of their writes in a given day, but when it comes to other
> > processes running in the background I'm not sure how much these  might
> > really affect this number.
> >
>
> If you have existing servers that run linux and have been up for
> reasonably long time (months), iostat will give you a very good idea.
> No ideas about Windows, but I bet those stats exist someplace, too.
>

I can't say months, but at least a month, maybe two - trying to remember
when our last extended power outage was - I can find out later.


>
> For example a Ceph storage node, up 74 days with OS and journals on the
> first 4 drives and OSD HDDs on the other 8:
>
> Device:            tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
> sda               9.82        29.88       187.87    191341125   1203171718
> sdb               9.79        29.57       194.22    189367432   1243850846
> sdc               9.77        29.83       188.89    191061000   1209676622
> sdd               8.77        29.57       175.40    189399240   1123294410
> sde               5.24       354.19        55.68   2268306443    356604748
> sdi               5.02       335.61        63.60   2149338787    407307544
> sdj               4.96       350.33        52.43   2243590803    335751320
> sdl               5.04       374.62        48.49   2399170183    310559488
> sdf               4.85       354.52        50.43   2270401571    322947192
> sdh               4.77       332.38        50.60   2128622471    324065888
> sdg               6.26       403.97        65.42   2587109283    418931316
> sdk               5.86       385.36        55.61   2467921295    356120140
>

I do have some Linux VMs that have been up for a while; I can't say off hand
how many months it's been since the last extended power outage (though I'll
know once I look at the uptime), but hopefully it will at least give me an
idea.
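
Something along these lines should turn the cumulative iostat counters into a rough per-day figure (a sketch; it assumes "iostat -k" output like the above, with counters covering the whole uptime):

up_days=$(awk '{print $1/86400}' /proc/uptime)
# kB_wrtn is the 6th column of "iostat -k"; convert to GB and divide by days of uptime
iostat -k | awk -v d="$up_days" '$1 ~ /^sd/ {printf "%-5s %.1f GB written/day\n", $1, $6/1048576/d}'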


> >
> > >
> > >
> > > >
> > > So it's 2x1Gb/s then?
> > >

[ceph-users] qemu-kvm and cloned rbd image

2015-03-02 Thread koukou73gr


Hello,

Today I thought I'd experiment with snapshots and cloning. So I did:

rbd import --image-format=2 vm-proto.raw rbd/vm-proto
rbd snap create rbd/vm-proto@s1
rbd snap protect rbd/vm-proto@s1
rbd clone rbd/vm-proto@s1 rbd/server

And then proceeded to create a qemu-kvm guest with rbd/server as its
backing store. The guest booted but as soon as it got to mount the root
fs, things got weird:

[...]
scsi2 : Virtio SCSI HBA
scsi 2:0:0:0: Direct-Access QEMU QEMU HARDDISK1.5. PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
dracut: Scanning devices sda2  for LVM logical volumes vg_main/lv_swap 
vg_main/lv_root
dracut: inactive '/dev/vg_main/lv_swap' [1.00 GiB] inherit
dracut: inactive '/dev/vg_main/lv_root' [6.50 GiB] inherit
EXT4-fs (dm-1): INFO: recovery required on readonly filesystem
EXT4-fs (dm-1): write access will be enabled during recovery
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 b0 e0 d8 00 00 08 00
Buffer I/O error on device dm-1, logical block 1058331
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 6f ba c8 00 00 08 00
[ ... snip ... snip ... more or less the same messages ]
end_request: I/O error, dev sda, sector 3129880
end_request: I/O error, dev sda, sector 11518432
end_request: I/O error, dev sda, sector 3194664
end_request: I/O error, dev sda, sector 3129824
end_request: I/O error, dev sda, sector 3194376
end_request: I/O error, dev sda, sector 11579664
end_request: I/O error, dev sda, sector 3129448
end_request: I/O error, dev sda, sector 3197856
end_request: I/O error, dev sda, sector 3129400
end_request: I/O error, dev sda, sector 7385360
end_request: I/O error, dev sda, sector 11515912
end_request: I/O error, dev sda, sector 11514112
__ratelimit: 12 callbacks suppressed
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 af b0 80 00 00 10 00
__ratelimit: 12 callbacks suppressed
__ratelimit: 13 callbacks suppressed
Buffer I/O error on device dm-1, logical block 1048592
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 1048593
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f bf 00 00 00 08 00
Buffer I/O error on device dm-1, logical block 480
lost page write due to I/O error on dm-1
[... snip... more of the same ... ]
Buffer I/O error on device dm-1, logical block 475
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 476
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 477
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 30 00 00 10 00
Buffer I/O error on device dm-1, logical block 454
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 10 00 00 18 00
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 08 00 00 08 00
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f bd 88 00 00 08 00
__ratelimit: 5 callbacks suppressed
Buffer I/O error on device dm-1, logical block 433
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00

Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-02 Thread Mark Nelson

Hi Alex,

I see I even responded in the same thread!  This would be a good thing 
to bring up in the meeting on Wednesday.  Those are far faster single 
OSD results than I've been able to muster with simplemessenger.  I 
wonder how much effect flow-control and header/data crc had.  He did 
have quite a bit more CPU (Intel specs say 14 cores @ 2.6GHz, 28 if you 
count hyperthreading).  Depending on whether there were 1 or 2 CPUs in 
that node, that might be around 3x the CPU power I have here.


Some other thoughts:  Were the simplemessenger tests on IPoIB or native? 
 How big was the RBD volume that was created (could some data be 
locally cached)?  Did network data transfer statistics match the 
benchmark result numbers?


I also did some tests on fdcache, though just glancing at the results it 
doesn't look like tweaking those parameters had much effect.


Mark

On 03/01/2015 08:38 AM, Alexandre DERUMIER wrote:

Hi Mark,

I found a previous bench from Vu Pham (it was about simplemessenger vs
xiomessenger):

http://www.spinics.net/lists/ceph-devel/msg22414.html

and with 1 osd, he was able to reach 105k iops with simple messenger

. ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32)

This was with more powerful nodes, but the difference seems to be quite huge.



- Mail original -
De: "aderumier" 
À: "Mark Nelson" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Vendredi 27 Février 2015 07:10:42
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Thanks Mark for the results,
default values seem to be quite reasonable indeed.


I also wonder if cpu frequency can have an impact on latency or not.
I'm going to benchmark on dual Xeon 10-core 3.1GHz nodes in the coming weeks;
I'll try to replay your benchmark to compare.



- Mail original -
De: "Mark Nelson" 
À: "ceph-devel" , "ceph-users" 

Envoyé: Jeudi 26 Février 2015 05:44:15
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Everyone,

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison
thread, Alexandre DERUMIER wondered if changing the default shard and
threads per shard OSD settings might have a positive effect on
performance in our tests. I went back and used one of the PCIe SSDs
from our previous tests to experiment with a recent master pull. I
wanted to know how performance was affected by changing these parameters
and also to validate that the default settings still appear to be correct.

I plan to conduct more tests (potentially across multiple SATA SSDs in
the same box), but these initial results seem to show that the default
settings that were chosen are quite reasonable.

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
Hi people,

I had one OSD crash, so the rebalancing happened - all fine (some 3% of the
data was moved around and rebalanced), my previous recovery/backfill
throttling was applied fine and we didn't have an unusable cluster.

Then I used the procedure to remove this crashed OSD completely from Ceph
(http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd)

and when I ran the "ceph osd crush remove osd.0" command, all of a sudden,
Ceph started to rebalance once again, this time with 37% of the objects
reported as "misplaced". Based on the experience inside the VMs and the
recovery rate in MB/s, I can tell that my throttling of backfill and recovery
is not being taken into consideration.

Why are 37% of all objects being moved around again? Any help, hint or
explanation is greatly appreciated.

This is Ceph 0.87.0 from the Ceph repo, of course; 42 OSDs in total after the
crash etc.

The throttling that I applied before is the following:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
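
(For reference, one way to double-check on an OSD node that the injected values actually took effect is the admin socket; a sketch, adjust the id:)

ceph daemon osd.1 config get osd_max_backfills
ceph daemon osd.1 config get osd_recovery_max_active
ceph daemon osd.1 config get osd_recovery_op_priority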

Please advise...
Thanks

-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Wido den Hollander
On 03/02/2015 03:56 PM, Andrija Panic wrote:
> Hi people,
> 
> I had one OSD crash, so the rebalancing happened - all fine (some 3% of the
> data has been moved arround, and rebalanced) and my previous
> recovery/backfill throtling was applied fine and we didnt have a unusable
> cluster.
> 
> Now I used the procedure to remove this crashed OSD comletely from the CEPH
> (
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
> )
> 
> and when I used the "ceph osd crush remove osd.0" command, all of a sudden,
> CEPH started to rebalance once again, this time with 37% of the object that
> are "missplaced" and based on the eperience inside VMs, and the Recovery
> RAte in MB/s - I can tell that my throtling of backfilling and recovery is
> not taken into consideration.
> 
> Why is this, 37% of all objects again being moved arround, any help, hint,
> explanation greatly appreciated.
> 

This has been discussed a couple of times on the list. If you remove an
item from the CRUSH map, even though it has a weight of 0, a rebalance still
happens because the CRUSH map changes.
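
A sequence that is often suggested on the list to avoid moving the data twice is to drain the OSD's CRUSH weight first and only then remove it. Roughly (an untested sketch; adjust the OSD id):

ceph osd crush reweight osd.0 0   # triggers the one big rebalance; the recovery throttles apply
# wait until all PGs are active+clean again
ceph osd out 0
# stop the (already dead) daemon if needed, then:
ceph osd crush remove osd.0       # no further data movement expected at weight 0
ceph auth del osd.0
ceph osd rm 0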

> This is CEPH 0.87.0 from CEPH repo of course. 42 OSD total after the crash
> etc.
> 
> The throtling that I have applied from before is like folowing:
> 
> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
> ceph tell osd.* injectargs '--osd_max_backfills 1'
> 
> Please advise...
> Thanks
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph breizh meetup

2015-03-02 Thread eric mourgaya
Hi cephers,

The next Ceph Breizhcamp is scheduled for 12 March 2015 in Nantes, more
precisely at Suravenir Assurance, 2 rue Vasco de Gama, Saint-Herblain,
France.
It will begin at 10:00 AM.

Join us and fill in the doodle: http://doodle.com/hvb99f2am7qucd5q
-- 
Eric Mourgaya,


Respectons la planete!
Luttons contre la mediocrite!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] XFS recovery on boot : rogue mounts ?

2015-03-02 Thread SCHAER Frederic
Hi,

I rebooted a failed server, which is now showing a rogue filesystem mount.
Actually, there were also several disks missing in the node, all reported as 
"prepared" by ceph-disk, but not activated.

[root@ceph2 ~]# grep /var/lib/ceph/tmp /etc/mtab
/dev/sdo1 /var/lib/ceph/tmp/mnt.usVRe8 xfs rw,noatime,attr2,inode64,noquota 0 0

This path does not exist, and after having to run "ceph-disk activate-all",
I can now see the OSD under its correct path (and the missing ones got
mounted too):

[root@ceph2 ~]# grep sdo1 /etc/mtab
/dev/sdo1 /var/lib/ceph/tmp/mnt.usVRe8 xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdo1 /var/lib/ceph/osd/ceph-53 xfs rw,noatime,attr2,inode64,noquota 0 0
[root@ceph2 ~]# ll /var/lib/ceph/tmp/mnt.usVRe8
ls: cannot access /var/lib/ceph/tmp/mnt.usVRe8: No such file or directory

I just looked at the logs, and it appears that this sdo disk performed an XFS 
recovery at boot :

Mar  2 11:33:45 ceph2 kernel: [   21.479747] XFS (sdo1): Mounting Filesystem
Mar  2 11:33:45 ceph2 kernel: [   21.641263] XFS (sdo1): Starting recovery 
(logdev: internal)
Mar  2 11:33:45 ceph2 kernel: [   21.674451] XFS (sdo1): Ending recovery 
(logdev: internal)

I do not see any "Ending clean mount" line for this disk.
If I check the syslogs, I can see OSDs are usually mounted twice, but not
always, and sometimes they aren't mounted at all:

[root@ceph2 ~]# zegrep 'XFS.*Ending clean' /var/log/messages.1.gz |sed -e 
s/.*XFS/XFS/|sort |uniq -c
  2 XFS (sdb1): Ending clean mount
  2 XFS (sdd1): Ending clean mount
  1 XFS (sde1): Ending clean mount
  2 XFS (sdg1): Ending clean mount
  1 XFS (sdh1): Ending clean mount
  1 XFS (sdi1): Ending clean mount
  2 XFS (sdj1): Ending clean mount
  3 XFS (sdk1): Ending clean mount
  3 XFS (sdl1): Ending clean mount
  4 XFS (sdm1): Ending clean mount

So: would there be an issue with disks that perform an XFS recovery at boot?
I know that a reboot will clean things up, but rebooting isn't the cleanest
thing to do...
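
If /proc/mounts (and not just /etc/mtab) still lists the temporary mountpoint, a lazy unmount should release it without a reboot; a rough sketch:

# detach any lingering ceph-disk temp mounts
awk '$2 ~ /^\/var\/lib\/ceph\/tmp\/mnt\./ {print $2}' /proc/mounts | while read -r m; do
    umount -l "$m"
done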

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-02 Thread Alexandre DERUMIER
>> This would be a good thing to bring up in the meeting on Wednesday.
Yes!

>> I wonder how much effect flow-control and header/data crc had.
Yes. I know that Somnath also disabled CRC for his benchmarks.

>>Were the simplemessenger tests on IPoIB or native? 

I think it's native, as the Vu Pham benchmark was done on Mellanox SX1012
switches (Ethernet), and the XIO messenger was on RoCE (RDMA over Converged
Ethernet).

>>How big was the RBD volume that was created (could some data be 
>>locally cached)? Did network data transfer statistics match the 
>>benchmark result numbers? 



I've CC'd Vu Pham on this mail; maybe he'll be able to give us answers.


Note that I'll have the same Mellanox switches (SX1012) for my production
cluster in a few weeks, so I'll be able to reproduce the benchmark (with
2x10-core 3.1GHz nodes and clients).





- Mail original -
De: "Mark Nelson" 
À: "aderumier" 
Cc: "ceph-devel" , "ceph-users" 

Envoyé: Lundi 2 Mars 2015 15:39:24
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Alex, 

I see I even responded in the same thread! This would be a good thing 
to bring up in the meeting on Wednesday. Those are far faster single 
OSD results than I've been able to muster with simplemessenger. I 
wonder how much effect flow-control and header/data crc had. He did 
have quite a bit more CPU (Intel specs say 14 cores @ 2.6GHz, 28 if you 
count hyperthreading). Depending on whether there were 1 or 2 CPUs in 
that node, that might be around 3x the CPU power I have here. 

Some other thoughts: Were the simplemessenger tests on IPoIB or native? 
How big was the RBD volume that was created (could some data be 
locally cached)? Did network data transfer statistics match the 
benchmark result numbers? 

I also did some tests on fdcache, though just glancing at the results it 
doesn't look like tweaking those parameters had much effect. 

Mark 

On 03/01/2015 08:38 AM, Alexandre DERUMIER wrote: 
> Hi Mark, 
> 
> I found an previous bench from Vu Pham (it's was about simplemessenger vs 
> xiomessenger) 
> 
> http://www.spinics.net/lists/ceph-devel/msg22414.html 
> 
> and with 1 osd, he was able to reach 105k iops with simple messenger 
> 
> . ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32) 
> 
> this was with more powerfull nodes, but the difference seem to be quite huge 
> 
> 
> 
> - Mail original - 
> De: "aderumier"  
> À: "Mark Nelson"  
> Cc: "ceph-devel" , "ceph-users" 
>  
> Envoyé: Vendredi 27 Février 2015 07:10:42 
> Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 
> 
> Thanks Mark for the results, 
> default values seem to be quite resonable indeed. 
> 
> 
> I also wonder is cpu frequency can have an impact on latency or not. 
> I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks, 
> I'll try replay your benchmark to compare 
> 
> 
> 
> - Mail original - 
> De: "Mark Nelson"  
> À: "ceph-devel" , "ceph-users" 
>  
> Envoyé: Jeudi 26 Février 2015 05:44:15 
> Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 
> 
> Hi Everyone, 
> 
> In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
> thread, Alexandre DERUMIER wondered if changing the default shard and 
> threads per shard OSD settings might have a positive effect on 
> performance in our tests. I went back and used one of the PCIe SSDs 
> from our previous tests to experiment with a recent master pull. I 
> wanted to know how performance was affected by changing these parameters 
> and also to validate that the default settings still appear to be correct. 
> 
> I plan to conduct more tests (potentially across multiple SATA SSDs in 
> the same box), but these initial results seem to show that the default 
> settings that were chosen are quite reasonable. 
> 
> Mark 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
> the body of a message to majord...@vger.kernel.org 
> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Hi all, especially devs,

We have recently pinpointed one of the causes of slow requests in our
cluster. It seems deep-scrubs on pg's that contain the index file for a
large radosgw bucket lock the osds. Increasing op threads and/or disk
threads helps a little bit, but we would need to increase them beyond reason
to completely get rid of the problem. A somewhat similar (and more severe)
version of the issue occurs when we call listomapkeys on the index object,
and since the deep-scrub logs were much harder to read, this inspection is
based on listomapkeys.

In this example osd.121 is the primary of pg 10.c91 which contains file
.dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
~500k objects. Standard listomapkeys call take about 3 seconds.

time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
real 0m2.983s
user 0m0.760s
sys 0m0.148s

In order to lock the osd we request 2 of them simultaneously with something
like:

rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
sleep 1
rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &

'debug_osd=30' logs show the flow like:

At t0 some thread enqueue_op's my omap-get-keys request.
Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
keys.
Op-Thread B responds to several other requests during that 1 second sleep.
They're generally extremely fast subops on other pgs.
At t1 (about a second later) my second omap-get-keys request gets
enqueue_op'ed. But it does not start probably because of the lock held by
Thread A.
After that point other threads enqueue_op other requests on other pgs too
but none of them starts processing, in which i consider the osd is locked.
At t2 (about another second later) my first omap-get-keys request is
finished.
Op-Thread B locks pg 10.c91 and dequeue_op's my second request and starts
reading ~500k keys again.
Op-Thread A continues to process the requests enqueued in t1-t2.

It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
can process other requests for other pg's just fine.
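
For anyone reproducing this, the queued request should also be visible through the admin socket while the first call is still running; a sketch (assuming these asok commands are available in 0.72.2):

ceph --admin-daemon /var/run/ceph/ceph-osd.121.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.121.asok dump_historic_ops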

My guess is that a somewhat larger version of this scenario happens during
deep-scrubbing, e.g. on the pg containing the index for a bucket of >20M
objects. A disk/op thread starts reading through the omap, which will take,
say, 60 seconds. During the first seconds, other requests for other pgs pass
just fine. But within those 60 seconds there are bound to be other requests
for the same pg, especially since it holds the index object. Each of these
requests locks another disk/op thread, to the point where there are no free
threads left to process requests for any pg, causing slow requests.

So first of all, thanks if you made it this far, and sorry for the involved
mail; I'm exploring the problem as I go.
Now, is that deep-scrubbing situation I tried to theorize even possible? If
not, can you point us to where to look further?
We are currently running 0.72.2 and know about the newer ioprio settings in
Firefly and such. We are planning to upgrade in a few weeks, but I don't
think those options will help us in any way. Am I correct?
Are there any other improvements that we are not aware of?
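
For reference, the Firefly-era knobs referred to above are presumably along these lines (an untested ceph.conf sketch; the option names should be double-checked against the deployed release, and as far as I know they only take effect with the CFQ disk scheduler):

[osd]
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 7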

Regards,


-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map

2015-03-02 Thread Andrija Panic
OK
thx Wido.

Then can we at least update the documentation to say that MAJOR data
rebalancing will happen AGAIN: not 3%, but 37% in my case. I would never
run this during work hours while clients are hammering the VMs...

This reminds me of the tunables change a couple of months ago, when my
cluster completely collapsed during data rebalancing...

I don't see any option to contribute to the documentation?

Best




On 2 March 2015 at 16:07, Wido den Hollander  wrote:

> On 03/02/2015 03:56 PM, Andrija Panic wrote:
> > Hi people,
> >
> > I had one OSD crash, so the rebalancing happened - all fine (some 3% of
> the
> > data has been moved arround, and rebalanced) and my previous
> > recovery/backfill throtling was applied fine and we didnt have a unusable
> > cluster.
> >
> > Now I used the procedure to remove this crashed OSD comletely from the
> CEPH
> > (
> >
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
> > )
> >
> > and when I used the "ceph osd crush remove osd.0" command, all of a
> sudden,
> > CEPH started to rebalance once again, this time with 37% of the object
> that
> > are "missplaced" and based on the eperience inside VMs, and the Recovery
> > RAte in MB/s - I can tell that my throtling of backfilling and recovery
> is
> > not taken into consideration.
> >
> > Why is this, 37% of all objects again being moved arround, any help,
> hint,
> > explanation greatly appreciated.
> >
>
> This has been discussed a couple of times on the list. If you remove a
> item from the CRUSHMap, although it has a weight of 0, a rebalance still
> happens since the CRUSHMap changes.
>
> > This is CEPH 0.87.0 from CEPH repo of course. 42 OSD total after the
> crash
> > etc.
> >
> > The throtling that I have applied from before is like folowing:
> >
> > ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> > ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
> > ceph tell osd.* injectargs '--osd_max_backfills 1'
> >
> > Please advise...
> > Thanks
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shutting down a cluster fully and powering it back up

2015-03-02 Thread Daniel Schneller

On 2015-02-28 20:46:15 +, Gregory Farnum said:


Sounds good!
-Greg
On Sat, Feb 28, 2015 at 10:55 AM David 
 wrote:

Hi!



We did that a few weeks ago and it mostly worked fine.
However, on startup of one of the 4 machines, it got stuck
while starting OSDs (at least that's what the console
output indicated), while the others started up just
fine.

After waiting for more than 20 minutes with the other
3 machines already back up we hit ctrl-alt-del via
the server console. The signal got caught, the OS restarted
and came up without problems the next time.

Unfortunately, as this was in the middle of the night
after a very long day of moving hardware around in the
datacenter we did not manage to save the logs before
they were rotated...

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Daniel Schneller

On our Ubuntu 14.04 / Firefly 0.80.8 cluster we are seeing a problem with
log file rotation for the rados gateway.

The /etc/logrotate.d/radosgw script gets called, but
it does not work correctly. It spits out this message,
coming from the postrotate portion:

   /etc/cron.daily/logrotate:
   reload: Unknown parameter: id
   invoke-rc.d: initscript radosgw, action "reload" failed.

A new log file actually gets created, but due to the
failure in the post-rotate script, the daemon actually
continues writing into the now deleted previous file:

   [B|root@node01]  /etc/init ➜  ps aux | grep radosgw
   root 13077  0.9  0.1 13710396 203256 ? Ssl  Feb14 212:27 
/usr/bin/radosgw -n client.radosgw.node01


   [B|root@node01]  /etc/init ➜  ls -l /proc/13077/fd/
   total 0
   lr-x-- 1 root root 64 Mar  2 15:53 0 -> /dev/null
   lr-x-- 1 root root 64 Mar  2 15:53 1 -> /dev/null
   lr-x-- 1 root root 64 Mar  2 15:53 2 -> /dev/null
   l-wx-- 1 root root 64 Mar  2 15:53 3 -> 
/var/log/radosgw/radosgw.log.1 (deleted)

   ...

Trying manually with   service radosgw reload  fails with
the same message. Running the non-upstart
/etc/init.d/radosgw reload   works. It will, kind of crudely,
just send a SIGHUP to any running radosgw process.

To figure out the cause I compared the OSDs and RadosGW with regard to
Upstart and got this:

   [B|root@node01]  /etc/init ➜  initctl list | grep osd
   ceph-osd-all start/running
   ceph-osd-all-starter stop/waiting
   ceph-osd (ceph/8) start/running, process 12473
   ceph-osd (ceph/9) start/running, process 12503
   ...

   [B|root@node01]  /etc/init ➜  initctl reload radosgw cluster="ceph" 
id="radosgw.node01"

   initctl: Unknown instance: ceph/radosgw.node01

   [B|root@node01]  /etc/init ➜  initctl list | grep rados
   radosgw-instance stop/waiting
   radosgw stop/waiting
   radosgw-all-starter stop/waiting
   radosgw-all start/running

Apart from me not being totally clear about what the difference
between radosgw-instance and radosgw is, obviously Upstart
has no idea about which PID to send the SIGHUP to when I ask
it to reload.

I can, of course, replace the logrotate config and use the
/etc/init.d/radosgw reload  approach, but I would like to
understand if this is something unique to our system, or if
this is a bug in the scripts.
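
For completeness, the workaround would boil down to a postrotate that simply HUPs the daemons, roughly like this (a sketch; the rotation options are just examples):

/var/log/radosgw/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        # plain SIGHUP, since Upstart has no registered instance to reload
        killall -q -HUP radosgw || true
    endscript
}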

FWIW here's an excerpt from /etc/ceph.conf:

   [client.radosgw.node01]
   host = node01
   rgw print continue = false
   keyring = /etc/ceph/keyring.radosgw.gateway
   rgw socket path = /tmp/radosgw.sock
   log file = /var/log/radosgw/radosgw.log
   rgw enable ops log = false
   rgw gc max objs = 31


Thanks!
Daniel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] old osds take much longer to start than newer osd

2015-03-02 Thread Gregory Farnum
This is probably LevelDB being slow. The monitor has some options to
"compact" the store on startup and I thought the osd handled it
automatically, but you could try looking for something like that and see if
it helps.
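
Something like this shows which compaction knobs the running build actually exposes, plus the monitor-side option for reference (a sketch; the OSD-side option name varies between releases, if it exists at all):

ceph daemon osd.9 config show | grep -i compact

# monitor-side equivalent, in ceph.conf:
# [mon]
#     mon compact on start = true
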
-Greg
On Fri, Feb 27, 2015 at 5:02 AM Corin Langosch 
wrote:

> Hi guys,
>
> I'm using ceph for a long time now, since bobtail. I always upgraded every
> few weeks/ months to the latest stable
> release. Of course I also removed some osds and added new ones. Now during
> the last few upgrades (I just upgraded from
> 80.6 to 80.8) I noticed that old osds take much longer to startup than
> equal newer osds (same amount of data/ disk
> usage, same kind of storage+journal backing device (ssd), same weight,
> same number of pgs, ...). I know I observed the
> same behavior earlier but just didn't really care about it. Here are the
> relevant log entries (host of osd.0 and osd.15
> has less cpu power than the others):
>
> old osds (average pgs load time: 1.5 minutes)
>
> 2015-02-27 13:44:23.134086 7ffbfdcbe780  0 osd.0 19323 load_pgs
> 2015-02-27 13:49:21.453186 7ffbfdcbe780  0 osd.0 19323 load_pgs opened 824
> pgs
>
> 2015-02-27 13:41:32.219503 7f197b0dd780  0 osd.3 19317 load_pgs
> 2015-02-27 13:42:56.310874 7f197b0dd780  0 osd.3 19317 load_pgs opened 776
> pgs
>
> 2015-02-27 13:38:43.909464 7f450ac90780  0 osd.6 19309 load_pgs
> 2015-02-27 13:40:40.080390 7f450ac90780  0 osd.6 19309 load_pgs opened 806
> pgs
>
> 2015-02-27 13:36:14.451275 7f3c41d33780  0 osd.9 19301 load_pgs
> 2015-02-27 13:37:22.446285 7f3c41d33780  0 osd.9 19301 load_pgs opened 795
> pgs
>
> new osds (average pgs load time: 3 seconds)
>
> 2015-02-27 13:44:25.529743 7f2004617780  0 osd.15 19325 load_pgs
> 2015-02-27 13:44:36.197221 7f2004617780  0 osd.15 19325 load_pgs opened
> 873 pgs
>
> 2015-02-27 13:41:29.176647 7fb147fb3780  0 osd.16 19315 load_pgs
> 2015-02-27 13:41:31.681722 7fb147fb3780  0 osd.16 19315 load_pgs opened
> 848 pgs
>
> 2015-02-27 13:38:41.470761 7f9c404be780  0 osd.17 19307 load_pgs
> 2015-02-27 13:38:43.737473 7f9c404be780  0 osd.17 19307 load_pgs opened
> 821 pgs
>
> 2015-02-27 13:36:10.997766 7f7315e99780  0 osd.18 19299 load_pgs
> 2015-02-27 13:36:13.511898 7f7315e99780  0 osd.18 19299 load_pgs opened
> 815 pgs
>
> The old osds also take more memory, here's an example:
>
> root 15700 22.8  0.7 1423816 485552 ?  Ssl  13:36   4:55
> /usr/bin/ceph-osd -i 9 --pid-file
> /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
> root 15270 15.4  0.4 1227140 297032 ?  Ssl  13:36   3:20
> /usr/bin/ceph-osd -i 18 --pid-file
> /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph
>
>
> It seems to me there is still some old data around for the old osds which
> was not properly migrated/ cleaned up during
> the upgrades. The cluster is healthy, no problems at all the last few
> weeks. Is there any way to clean this up?
>
> Thanks
> Corin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What does the parameter journal_align_min_size mean?

2015-03-02 Thread Gregory Farnum
On Fri, Feb 27, 2015 at 5:03 AM, Mark Wu  wrote:
>
> I am wondering how the value of journal_align_min_size impacts journal
> padding. Is there any document describing the on-disk layout of the
> journal?

Not much, unfortunately. Just looking at the code, the journal will
align any writes which are at least as large as that parameter,
apparently based on the page size and the target offset within the
destination object. I think this is so that it's more conveniently
aligned for transfer into the filesystem later on, whereas smaller
writes can just get copied?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-02 Thread Gregory Farnum
You can turn the filestore up to 20 instead of 1. ;) You might also
explore what information you can get out of the admin socket.

You are correct that those numbers are the OSD epochs, although note
that when the system is running you'll get output both for the OSD as
a whole and for individual PGs within it (which can be lagging
behind). I'm still pretty convinced the OSDs are simply stuck trying
to bring their PGs up to date and are thrashing the maps on disk, but
we're well past what I can personally diagnose without log diving.
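
Concretely, something like this on the affected node (a sketch; the "status" admin socket command may not be available in every release):

[osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1

ceph daemon osd.1 status   # if available, shows oldest_map / newest_map catching up
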
-Greg

On Sat, Feb 28, 2015 at 11:51 AM, Chris Murray  wrote:
> After noticing that the number increases by 101 on each attempt to start
> osd.11, I figured I was only 7 iterations away from the output being
> within 101 of 63675. So, I killed the osd process, started it again,
> lather, rinse, repeat. I then did the same for other OSDs. Some created
> very small logs, and some created logs into the gigabytes. Grepping the
> latter for "update_osd_stat" showed me where the maps were up to, and
> therefore which OSDs needed some special attention. Some of the epoch
> numbers appeared to increase by themselves to a point and then plateaux,
> after which I'd kill then start the osd again, and this number would
> start to increase again.
>
> After all either showed 63675, or nothing at all, I turned debugging
> back off, deleted logs, and tried to bring the cluster back by unsetting
> noup, nobackfill, norecovery etc. It hasn't got very far before
> appearing stuck again, with nothing progressing in ceph status. It
> appears that 11/15 OSDs are now properly up, but four still aren't. A
> lot of placement groups are stale, so I guess I really need the
> remaining four to come up.
>
> The OSDs in question are 1, 7, 10 & 12. All have a line similar to this
> as the last in their log:
>
> 2015-02-28 10:35:04.240822 7f375ef40780  1 journal _open
> /var/lib/ceph/osd/ceph-1/journal fd 21: 5367660544 bytes, block size
> 4096 bytes, directio = 1, aio = 1
>
> Even with the following in ceph.conf, I'm not seeing anything after that
> last line in the log.
>
>  debug osd = 20
>  debug filestore = 1
>
> CPU is still being consumed by the ceph-osd process though, but not much
> memory is being used compared to the other two OSDs which are up on that
> node.
>
> Is there perhaps even further logging that I can use to see why the logs
> aren't progressing past this point?
> Osd.1 is on /dev/sdb. iostat still shows some activity as the minutes go
> on, but not much:
>
> (60 second intervals)
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> sdb   5.45 0.00   807.33  0  48440
> sdb   5.75 0.00   807.33  0  48440
> sdb   5.43 0.00   807.20  0  48440
>
> Thanks,
> Chris
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Chris Murray
> Sent: 27 February 2015 10:32
> To: Gregory Farnum
> Cc: ceph-users
> Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy;will
> the cluster recover without help?
>
> A little further logging:
>
> 2015-02-27 10:27:15.745585 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:15.745619 7fe8e3f2f700  5 osd.11 62839 heartbeat:
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:23.530913 7fe8e8536700  1 -- 192.168.12.25:6800/673078
> --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
> v2 -- ?+0 0xe5f26380 con 0xe1f0cc60
> 2015-02-27 10:27:30.645902 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:30.645938 7fe8e3f2f700  5 osd.11 62839 heartbeat:
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:33.531142 7fe8e8536700  1 -- 192.168.12.25:6800/673078
> --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
> v2 -- ?+0 0xe5f26540 con 0xe1f0cc60
> 2015-02-27 10:27:43.531333 7fe8e8536700  1 -- 192.168.12.25:6800/673078
> --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
> v2 -- ?+0 0xe5f26700 con 0xe1f0cc60
> 2015-02-27 10:27:45.546275 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:45.546311 7fe8e3f2f700  5 osd.11 62839 heartbeat:
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers []/[] op hist
> [])
> 2015-02-27 10:27:53.531564 7fe8e8536700  1 -- 192.168.12.25:6800/673078
> --> 192.168.12.25:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0})
> v2 -- ?+0 0xe5f268c0 con 0xe1f0cc60
> 2015-02-27 10:27:56.846593 7fe8e3f2f700 20 osd.11 62839 update_osd_stat
> osd_stat(1305 GB used, 1431 GB avail, 2789 GB total, peers 

Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu  wrote:
> Hi all, especially devs,
>
> We have recently pinpointed one of the causes of slow requests in our
> cluster. It seems deep-scrubs on pg's that contain the index file for a
> large radosgw bucket lock the osds. Incresing op threads and/or disk threads
> helps a little bit, but we need to increase them beyond reason in order to
> completely get rid of the problem. A somewhat similar (and more severe)
> version of the issue occurs when we call listomapkeys for the index file,
> and since the logs for deep-scrubbing was much harder read, this inspection
> was based on listomapkeys.
>
> In this example osd.121 is the primary of pg 10.c91 which contains file
> .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
> ~500k objects. Standard listomapkeys call take about 3 seconds.
>
> time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
> real 0m2.983s
> user 0m0.760s
> sys 0m0.148s
>
> In order to lock the osd we request 2 of them simultaneously with something
> like:
>
> rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
> sleep 1
> rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
>
> 'debug_osd=30' logs show the flow like:
>
> At t0 some thread enqueue_op's my omap-get-keys request.
> Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
> keys.
> Op-Thread B responds to several other requests during that 1 second sleep.
> They're generally extremely fast subops on other pgs.
> At t1 (about a second later) my second omap-get-keys request gets
> enqueue_op'ed. But it does not start probably because of the lock held by
> Thread A.
> After that point other threads enqueue_op other requests on other pgs too
> but none of them starts processing, in which i consider the osd is locked.
> At t2 (about another second later) my first omap-get-keys request is
> finished.
> Op-Thread B locks pg 10.c91 and dequeue_op's my second request and starts
> reading ~500k keys again.
> Op-Thread A continues to process the requests enqueued in t1-t2.
>
> It seems Op-Thread B is waiting on the lock held by Op-Thread A while it can
> process other requests for other pg's just fine.
>
> My guess is a somewhat larger scenario happens in deep-scrubbing, like on
> the pg containing index for the bucket of >20M objects. A disk/op thread
> starts reading through the omap which will take say 60 seconds. During the
> first seconds, other requests for other pgs pass just fine. But in 60
> seconds there are bound to be other requests for the same pg, especially
> since it holds the index file. Each of these requests lock another disk/op
> thread to the point where there are no free threads left to process any
> requests for any pg. Causing slow-requests.
>
> So first of all thanks if you can make it here, and sorry for the involved
> mail, i'm exploring the problem as i go.
> Now, is that deep-scrubbing situation i tried to theorize even possible? If
> not can you point us where to look further.
> We are currently running 0.72.2 and know about newer ioprio settings in
> Firefly and such. We are planning to upgrade in a few weeks, but i
> don't think those options will help us in any way. Am i correct?
> Are there any other improvements that we are not aware?

This is all basically correct; it's one of the reasons you don't want
to let individual buckets get too large.

That said, I'm a little confused about why you're running listomapkeys
that way. RGW throttles itself by getting only a certain number of
entries at a time (1000?) and any system you're also building should
do the same. That would reduce the frequency of any issues, and I
*think* that scrubbing has some mitigating factors to help (although
maybe not; it's been a while since I looked at any of that stuff).

Although I just realized that my vague memory of deep scrubbing
working better might be based on improvements that only got in for
firefly...not sure.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand, it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-

--
The information contained in this transmission may be confidential. Any 
disclosure, copying, or further distribution of confidential information is not 
permitted unless such privilege is explicitly granted in writing by Quantum. 
Quantum reserves the right to have electronic communications, including email 
and attachments, sent across its networks filtered through anti virus and spam 
software programs and retain such messages in order to comply with applicable 
data security and retention requirements. Quantum is not responsible for the 
proper and complete transmission of the substance of this communication or for 
any delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 8:44 AM, Daniel Schneller
 wrote:
> On our Ubuntu 14.04/Firefly 0.80.8 cluster we are seeing
> problem with log file rotation for the rados gateway.
>
> The /etc/logrotate.d/radosgw script gets called, but
> it does not work correctly. It spits out this message,
> coming from the postrotate portion:
>
>/etc/cron.daily/logrotate:
>reload: Unknown parameter: id
>invoke-rc.d: initscript radosgw, action "reload" failed.
>
> A new log file actually gets created, but due to the
> failure in the post-rotate script, the daemon actually
> continues writing into the now deleted previous file:
>
>[B|root@node01]  /etc/init ➜  ps aux | grep radosgw
>root 13077  0.9  0.1 13710396 203256 ? Ssl  Feb14 212:27
> /usr/bin/radosgw -n client.radosgw.node01
>
>[B|root@node01]  /etc/init ➜  ls -l /proc/13077/fd/
>total 0
>lr-x-- 1 root root 64 Mar  2 15:53 0 -> /dev/null
>lr-x-- 1 root root 64 Mar  2 15:53 1 -> /dev/null
>lr-x-- 1 root root 64 Mar  2 15:53 2 -> /dev/null
>l-wx-- 1 root root 64 Mar  2 15:53 3 ->
> /var/log/radosgw/radosgw.log.1 (deleted)
>...
>
> Trying manually with   service radosgw reload  fails with
> the same message. Running the non-upstart
> /etc/init.d/radosgw reload   works. It will, kind of crudely,
> just send a SIGHUP to any running radosgw process.
>
> To figure out the cause I compared OSDs and RadosGW wrt
> to upstart and got this:
>
>[B|root@node01]  /etc/init ➜  initctl list | grep osd
>ceph-osd-all start/running
>ceph-osd-all-starter stop/waiting
>ceph-osd (ceph/8) start/running, process 12473
>ceph-osd (ceph/9) start/running, process 12503
>...
>
>[B|root@node01]  /etc/init ➜  initctl reload radosgw cluster="ceph"
> id="radosgw.node01"
>initctl: Unknown instance: ceph/radosgw.node01
>
>[B|root@node01]  /etc/init ➜  initctl list | grep rados
>radosgw-instance stop/waiting
>radosgw stop/waiting
>radosgw-all-starter stop/waiting
>radosgw-all start/running
>
> Apart from me not being totally clear about what the difference
> between radosgw-instance and radosgw is, obviously Upstart
> has no idea about which PID to send the SIGHUP to when I ask
> it to reload.
>
> I can, of course, replace the logrotate config and use the
> /etc/init.d/radosgw reload  approach, but I would like to
> understand if this is something unique to our system, or if
> this is a bug in the scripts.
>
> FWIW here's an excerpt from /etc/ceph.conf:
>
>[client.radosgw.node01]
>host = node01
>rgw print continue = false
>keyring = /etc/ceph/keyring.radosgw.gateway
>rgw socket path = /tmp/radosgw.sock
>log file = /var/log/radosgw/radosgw.log
>rgw enable ops log = false
>rgw gc max objs = 31

I'm not very (well, at all, for rgw) familiar with these scripts, but
how are you starting up your RGW daemon? There's some way to have
Apache handle the process instead of Upstart, but Yehuda says "you
don't want to do it".
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
Oops, typo...  should say "Using ceph-deploy, I see a failure to install ceph 
on a RHEL7 node"...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 10:17
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fresh install of GIANT failing?
Sensitivity: Personal

All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand , it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-


The information contained in this transmission may be confidential. Any 
disclosure, copying, or further distribution of confidential information is not 
permitted unless such privilege is explicitly granted in writing by Quantum. 
Quantum reserves the right to have electronic communications, including email 
and attachments, sent across its networks filtered through anti virus and spam 
software programs and retain such messages in order to comply with applicable 
data security and retention requirements. Quantum is not responsible for the 
proper and complete transmission of the substance of this communication or for 
any delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Daniel Schneller

On 2015-03-02 18:17:00 +, Gregory Farnum said:



I'm not very (well, at all, for rgw) familiar with these scripts, but
how are you starting up your RGW daemon? There's some way to have
Apache handle the process instead of Upstart, but Yehuda says "you
don't want to do it".
-Greg


Well, we installed the packages via APT. That places the upstart
scripts into /etc/init. Nothing special. That will make Upstart
launch them in boot.

In the meantime I just placed

   /var/log/radosgw/*.log {
   rotate 7
   daily
   compress
   sharedscripts
   postrotate
start-stop-daemon --stop --signal HUP -x /usr/bin/radosgw --oknodo
   endscript
   missingok
   notifempty
   }

into the logrotate script, removing the more complicated (and not working :))
logic with the core piece from the regular init.d script.

Because the daemons were already running and writing to an already deleted
log file, logrotate wouldn't see the need to rotate the (visible) ones, because
they had not changed. So I needed to manually execute the above start-stop-daemon
on all relevant nodes once to force the gateway to start a new, non-deleted
logfile.
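
For reference, the one-off command on each node is just the postrotate line
by itself, and the effect can be checked via /proc (a quick sketch):

    start-stop-daemon --stop --signal HUP -x /usr/bin/radosgw --oknodo
    ls -l /proc/$(pidof radosgw)/fd | grep radosgw.log  # "(deleted)" should be gone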

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Hi Gregory,

We are not using listomapkeys that way or in any way to be precise. I used
it here just to reproduce the behavior/issue.

What i am really interested in is if scrubbing-deep actually mitigates the
problem and/or is there something that can be further improved.

Or i guess we should go upgrade now and hope for the best :)
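
For reference, the ioprio settings mentioned look roughly like this on Firefly
and later (only a sketch; they only take effect when the OSD data disks use the
CFQ I/O scheduler, and option availability depends on the exact release):

    # check the scheduler in use on an OSD data disk
    cat /sys/block/sdb/queue/scheduler

    # lower the priority of the OSD disk threads (scrubbing, snap trimming)
    ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'

    # or persistently in ceph.conf under [osd]:
    #   osd disk thread ioprio class = idle
    #   osd disk thread ioprio priority = 7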

On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum  wrote:

> On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
> wrote:
> > Hi all, especially devs,
> >
> > We have recently pinpointed one of the causes of slow requests in our
> > cluster. It seems deep-scrubs on pg's that contain the index file for a
> > large radosgw bucket lock the osds. Incresing op threads and/or disk
> threads
> > helps a little bit, but we need to increase them beyond reason in order
> to
> > completely get rid of the problem. A somewhat similar (and more severe)
> > version of the issue occurs when we call listomapkeys for the index file,
> > and since the logs for deep-scrubbing was much harder read, this
> inspection
> > was based on listomapkeys.
> >
> > In this example osd.121 is the primary of pg 10.c91 which contains file
> > .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
> > ~500k objects. Standard listomapkeys call take about 3 seconds.
> >
> > time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
> > real 0m2.983s
> > user 0m0.760s
> > sys 0m0.148s
> >
> > In order to lock the osd we request 2 of them simultaneously with
> something
> > like:
> >
> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
> > sleep 1
> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
> >
> > 'debug_osd=30' logs show the flow like:
> >
> > At t0 some thread enqueue_op's my omap-get-keys request.
> > Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
> > keys.
> > Op-Thread B responds to several other requests during that 1 second
> sleep.
> > They're generally extremely fast subops on other pgs.
> > At t1 (about a second later) my second omap-get-keys request gets
> > enqueue_op'ed. But it does not start probably because of the lock held by
> > Thread A.
> > After that point other threads enqueue_op other requests on other pgs too
> > but none of them starts processing, in which i consider the osd is
> locked.
> > At t2 (about another second later) my first omap-get-keys request is
> > finished.
> > Op-Thread B locks pg 10.c91 and dequeue_op's my second request and starts
> > reading ~500k keys again.
> > Op-Thread A continues to process the requests enqueued in t1-t2.
> >
> > It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
> can
> > process other requests for other pg's just fine.
> >
> > My guess is a somewhat larger scenario happens in deep-scrubbing, like on
> > the pg containing index for the bucket of >20M objects. A disk/op thread
> > starts reading through the omap which will take say 60 seconds. During
> the
> > first seconds, other requests for other pgs pass just fine. But in 60
> > seconds there are bound to be other requests for the same pg, especially
> > since it holds the index file. Each of these requests lock another
> disk/op
> > thread to the point where there are no free threads left to process any
> > requests for any pg. Causing slow-requests.
> >
> > So first of all thanks if you can make it here, and sorry for the
> involved
> > mail, i'm exploring the problem as i go.
> > Now, is that deep-scrubbing situation i tried to theorize even possible?
> If
> > not can you point us where to look further.
> > We are currently running 0.72.2 and know about newer ioprio settings in
> > Firefly and such. While we are planning to upgrade in a few weeks but i
> > don't think those options will help us in any way. Am i correct?
> > Are there any other improvements that we are not aware?
>
> This is all basically correct; it's one of the reasons you don't want
> to let individual buckets get too large.
>
> That said, I'm a little confused about why you're running listomapkeys
> that way. RGW throttles itself by getting only a certain number of
> entries at a time (1000?) and any system you're also building should
> do the same. That would reduce the frequency of any issues, and I
> *think* that scrubbing has some mitigating factors to help (although
> maybe not; it's been a while since I looked at any of that stuff).
>
> Although I just realized that my vague memory of deep scrubbing
> working better might be based on improvements that only got in for
> firefly...not sure.
> -Greg
>



-- 
erdem agaoglu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph binary missing from ceph-0.87.1-0.el6.x86_64

2015-03-02 Thread Michael Kuriger
Hi all,
When doing a fresh install on a new cluster, and using the latest rpm (0.87.1) 
ceph-deploy fails right away.  I checked the files inside the rpm, and 
/usr/bin/ceph is not there.  Upgrading from the previous rpm seems to work, but 
ceph-deploy is pulling the latest rpm automatically.



[ceph201][DEBUG ] connected to host: ceph201

[ceph201][DEBUG ] detect platform information from remote host

[ceph201][DEBUG ] detect machine type

[ceph_deploy.install][INFO  ] Distro info: CentOS 6.5 Final

[ceph201][INFO  ] installing ceph on ceph201

[ceph201][INFO  ] Running command: yum clean all

[ceph201][DEBUG ] Loaded plugins: fastestmirror, security

[ceph201][DEBUG ] Cleaning repos: base updates-released ceph-released

[ceph201][DEBUG ] Cleaning up Everything

[ceph201][DEBUG ] Cleaning up list of fastest mirrors

[ceph201][INFO  ] Running command: rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph201][INFO  ] Running command: rpm -Uvh --replacepkgs 
http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm

[ceph201][DEBUG ] Retrieving 
http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm

[ceph201][DEBUG ] Preparing...
##

[ceph201][DEBUG ] ceph-release
##

[ceph201][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a high 
priority

[ceph201][WARNIN] altered ceph.repo priorities to contain: priority=1

[ceph201][INFO  ] Running command: yum -y install ceph

[ceph201][DEBUG ] Loaded plugins: fastestmirror, security

[ceph201][DEBUG ] Determining fastest mirrors

[ceph201][DEBUG ] Setting up Install Process

[ceph201][DEBUG ] Resolving Dependencies

[ceph201][DEBUG ] --> Running transaction check

[ceph201][DEBUG ] ---> Package ceph.x86_64 1:0.87.1-0.el6 will be installed

[ceph201][DEBUG ] --> Finished Dependency Resolution

[ceph201][DEBUG ]

[ceph201][DEBUG ] Dependencies Resolved

[ceph201][DEBUG ]

[ceph201][DEBUG ] 


[ceph201][DEBUG ]  Package  Arch   Version 
RepositorySize

[ceph201][DEBUG ] 


[ceph201][DEBUG ] Installing:

[ceph201][DEBUG ]  ceph x86_64 1:0.87.1-0.el6  
ceph-released  13 M

[ceph201][DEBUG ]

[ceph201][DEBUG ] Transaction Summary

[ceph201][DEBUG ] 


[ceph201][DEBUG ] Install   1 Package(s)

[ceph201][DEBUG ]

[ceph201][DEBUG ] Total download size: 13 M

[ceph201][DEBUG ] Installed size: 50 M

[ceph201][DEBUG ] Downloading Packages:

[ceph201][DEBUG ] Running rpm_check_debug

[ceph201][DEBUG ] Running Transaction Test

[ceph201][DEBUG ] Transaction Test Succeeded

[ceph201][DEBUG ] Running Transaction

  Installing : 1:ceph-0.87.1-0.el6.x86_64   1/1

  Verifying  : 1:ceph-0.87.1-0.el6.x86_64   1/1

[ceph201][DEBUG ]

[ceph201][DEBUG ] Installed:

[ceph201][DEBUG ]   ceph.x86_64 1:0.87.1-0.el6

[ceph201][DEBUG ]

[ceph201][DEBUG ] Complete!

[ceph201][INFO  ] Running command: ceph --version

[ceph201][ERROR ] Traceback (most recent call last):

[ceph201][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/process.py", 
line 87, in run

[ceph201][ERROR ] reporting(conn, result, timeout)

[ceph201][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/log.py", line 
13, in reporting

[ceph201][ERROR ] received = result.receive(timeout)

[ceph201][ERROR ]   File 
"/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py",
 line 704, in receive

[ceph201][ERROR ] raise self._getremoteerror() or EOFError()

[ceph201][ERROR ] RemoteError: Traceback (most recent call last):

[ceph201][ERROR ]   File "", line 1036, in executetask

[ceph201][ERROR ]   File "", line 11, in _remote_run

[ceph201][ERROR ]   File "/usr/lib64/python2.6/subprocess.py", line 642, in 
__init__

[ceph201][ERROR ] errread, errwrite)

[ceph201][ERROR ]   File "/usr/lib64/python2.6/subprocess.py", line 1234, in 
_execute_child

[ceph201][ERROR ] raise child_exception

[ceph201][ERROR ] OSError: [Errno 2] No such file or directory

[ceph201][ERROR ]

[ceph201][ERROR ]


Michael Kuriger



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fresh install of GIANT failing?

2015-03-02 Thread Don Doerner
Problem solved, I've been pointed at a repository problem and an existing Ceph 
issue (http://tracker.ceph.com/issues/10476) by a couple of helpful folks.
Thanks,

-don-
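
For anyone else who lands on this: the errors above are the Firefly python-*
packages in EPEL pulling against the Giant libraries from ceph.com, and the
usual workaround is to let the ceph.com repo win via yum priorities (a sketch;
repo file names may differ):

    sudo yum install yum-plugin-priorities

    # in /etc/yum.repos.d/ceph.repo, add to each [Ceph*] section:
    #   priority=1
    # (repos without an explicit priority default to 99, so ceph.com wins)

    sudo yum clean all
    sudo yum -y install ceph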

From: Don Doerner
Sent: 02 March, 2015 10:20
To: Don Doerner; ceph-users@lists.ceph.com
Subject: RE: Fresh install of GIANT failing?
Sensitivity: Personal

Oops, typo...  should say "Using ceph-deploy, I see a failure to install ceph 
on a RHEL7 node"...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 10:17
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Fresh install of GIANT failing?
Sensitivity: Personal

All,
Using ceph-deploy, I see a failure to install ceph on a node.  At the beginning 
of the ceph-deploy output, it says it is installing "stable version giant".

The last few lines are...
[192.168.167.192][DEBUG ] --> Finished Dependency Resolution
[192.168.167.192][WARNIN] Error: Package: 1:python-flask-0.10.1-3.el7.noarch 
(Ceph-noarch)
[192.168.167.192][WARNIN]Requires: python-jinja2
[192.168.167.192][WARNIN] Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: libcephfs1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:libcephfs1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][DEBUG ]  You could try using --skip-broken to work around the 
problem
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:libcephfs1-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]libcephfs1 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librados2 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librados2-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librados2-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 
1:librados2-0.87.1-0.el7.x86_64 (Ceph)
[192.168.167.192][WARNIN]librados2 = 1:0.87.1-0.el7
[192.168.167.192][WARNIN] Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 
(epel)
[192.168.167.192][WARNIN]Requires: librbd1 = 1:0.80.7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.86-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.86-0.el7
[192.168.167.192][WARNIN]Available: 1:librbd1-0.87-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87-0.el7
[192.168.167.192][WARNIN]Installing: 1:librbd1-0.87.1-0.el7.x86_64 
(Ceph)
[192.168.167.192][WARNIN]librbd1 = 1:0.87.1-0.el7
[192.168.167.192][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[192.168.167.192][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install 
ceph

Offhand , it looks like there is some FIREFLY stuff in the GIANT repo?  Also, 
what source is everyone using for "python-jinja2"?

Prior to the recent GIANT update, this all worked (except for "python-jinja2" 
which I got from an EU repo somewhere...).

Any idea what I did wrong?

Regards,

-don-


The information contained in this transmission may be confidential. Any 
disclosure, copying, or further distribution of confidential information is not 
permitted unless such privilege is explicitly granted in writing by Quantum. 
Quantum reserves the right to have electronic communications, including email 
and attachments, sent across its networks filtered through anti virus and spam 
software programs and retain such messages in order to comply with applicable 
data security and retention requirements. Quantum is not responsible for the 
proper and complete transmission of the substance of this communication or for 
any delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Calamari Reconfiguration

2015-03-02 Thread Garg, Pankaj
Hi,
I had a cluster that was working correctly with Calamari and I was able to see 
and manage from the Dashboard.
I had to reinstall the cluster and change IP Addresses etc. so I built my 
cluster back up, with same name, but mainly network changes.
When I went to calamari, it shows some stale information about the old cluster.
I cleaned the server side by running calamari-ctl clear and then the calamari-ctl 
initialize command. I also deleted all salt keys, and restarted salt and 
diamond on all client machines.
Accepted the new keys on the servers and thought it would clean everything.

It did, but now I basically get a message that "This appears to be the first 
time you have started Calamari and there are no clusters currently configured."
I have rebooted the server many times and restarted services. The server says 
that a number of clients are connected to it, but "No cluster has been created".

What do I need to clean or reinstall for it to see my cluster information 
again? Clearly they are talking to each other, but somehow the server doesn't 
pick up the new cluster info.
Any help is appreciated.
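
For what it's worth, the checks that usually narrow this down are at the salt
layer (a sketch; service names and log paths are assumptions and vary by
distro):

    # on the calamari server: are the minions registered and answering?
    sudo salt-key -L
    sudo salt '*' test.ping

    # on each ceph node: restart the reporting agents
    sudo service salt-minion restart
    sudo service diamond restart

    # back on the server: watch the backend log while it (re)detects the cluster
    sudo tail -f /var/log/calamari/cthulhu.log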

Thanks
Pankaj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Tues/Wed CDS Schedule Posted

2015-03-02 Thread Patrick McGarry
Hey cephers,

The basic schedule has been posted for CDS tomorrow and Wednesday. If
you are a blueprint owner and are unable to make the slot you have
been assigned please let me know. We're working with some pretty tight
time constraints to make this all work, but there is a little wiggle
room if needed. Thanks.

https://wiki.ceph.com/Planning/CDS/Infernalis_(Mar_2015)

Expect the blueprint / pad / video link details to be filled in this
afternoon. Shout if you have any other questions. Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Georgios Dimitrakakis

Daniel,

on CentOS the logrotate script was not working correctly because the service 
was called everywhere as "radosgw":


e.g.
 service radosgw reload >/dev/null or
 initctl reload radosgw cluster="$cluster" id="$id" 2>/dev/null || :

but there isn't any radosgw service!

I had to change it into "ceph-radosgw" to make it work properly!
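
Concretely, the postrotate section ends up looking like this after the change
(service name as shipped by the RHEL/CentOS packages):

    postrotate
        service ceph-radosgw reload >/dev/null 2>&1 || :
    endscript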

Since you are using APT I guess that you are on Ubuntu/Debian but you 
may experience a relevant issue.


I was going to submit a bug for CentOS but have forgotten about it for some time 
now! I think now is the time... Does anyone have a different opinion on that?



Regards,


G.




On 2015-03-02 18:17:00 +, Gregory Farnum said:

I'm not very (well, at all, for rgw) familiar with these scripts, 
but

how are you starting up your RGW daemon? There's some way to have
Apache handle the process instead of Upstart, but Yehuda says "you
don't want to do it".
-Greg


Well, we installed the packages via APT. That places the upstart
scripts into /etc/init. Nothing special. That will make Upstart
launch them in boot.

In the meantime I just placed

   /var/log/radosgw/*.log {
   rotate 7
   daily
   compress
   sharedscripts
   postrotate
   	start-stop-daemon --stop --signal HUP -x /usr/bin/radosgw 
--oknodo

   endscript
   missingok
   notifempty
   }

into the logrotate script, removing the more complicated (and not 
working :))

logic with the core piece from the regular init.d script.

Because the daemons were already running and using an already deleted 
script,
logrotate wouldn't see the need to rotate the (visible) ones, because 
they
had not changed. So I needed to manually execute the above 
start-stop-daemon
on all relevant nodes ones to force the gateway to start a new, 
non-deleted

logfile.

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New SSD Question

2015-03-02 Thread Tony Harris
Hi all,

After the previous thread, I'm doing my SSD shopping and I came across
an SSD called an Edge Boost Pro w/ Power Fail. It seems to have some
impressive specs - in most places decent user reviews, in one place a poor
one - so I was wondering if anyone has had any experience with these drives
with Ceph?  Does it work well?  Reliability issues?  etc.  Right now I'm
looking at getting Intel DC S3700's, but the price on these Edge drives is
pretty good for the 240G model, but almost TGTBT for the speed and power
fail caps, so I didn't want to take a chance if they were really
problematic, as I'd rather just use a drive I know people have had quality
success with.

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph binary missing from ceph-0.87.1-0.el6.x86_64

2015-03-02 Thread Gregory Farnum
The ceph tool got moved into ceph-common at some point, so it
shouldn't be in the ceph rpm. I'm not sure what step in the
installation process should have handled that, but I imagine it's your
problem.
-Greg
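
A quick way to confirm that on the affected node (a sketch):

    rpm -qf /usr/bin/ceph        # which installed package owns it, if any
    yum provides '*/bin/ceph'    # which available package provides it
    yum -y install ceph-common   # the package that now ships the CLI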

On Mon, Mar 2, 2015 at 11:24 AM, Michael Kuriger  wrote:
> Hi all,
> When doing a fresh install on a new cluster, and using the latest rpm
> (0.87.1) ceph-deploy fails right away.  I checked the files inside the rpm,
> and /usr/bin/ceph is not there.  Upgrading from the previous rpm seems to
> work, but ceph-deploy is pulling the latest rpm automatically.
>
>
> [ceph201][DEBUG ] connected to host: ceph201
>
> [ceph201][DEBUG ] detect platform information from remote host
>
> [ceph201][DEBUG ] detect machine type
>
> [ceph_deploy.install][INFO  ] Distro info: CentOS 6.5 Final
>
> [ceph201][INFO  ] installing ceph on ceph201
>
> [ceph201][INFO  ] Running command: yum clean all
>
> [ceph201][DEBUG ] Loaded plugins: fastestmirror, security
>
> [ceph201][DEBUG ] Cleaning repos: base updates-released ceph-released
>
> [ceph201][DEBUG ] Cleaning up Everything
>
> [ceph201][DEBUG ] Cleaning up list of fastest mirrors
>
> [ceph201][INFO  ] Running command: rpm --import
> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>
> [ceph201][INFO  ] Running command: rpm -Uvh --replacepkgs
> http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm
>
> [ceph201][DEBUG ] Retrieving
> http://ceph.com/rpm-firefly/el6/noarch/ceph-release-1-0.el6.noarch.rpm
>
> [ceph201][DEBUG ] Preparing...
> ##
>
> [ceph201][DEBUG ] ceph-release
> ##
>
> [ceph201][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a high
> priority
>
> [ceph201][WARNIN] altered ceph.repo priorities to contain: priority=1
>
> [ceph201][INFO  ] Running command: yum -y install ceph
>
> [ceph201][DEBUG ] Loaded plugins: fastestmirror, security
>
> [ceph201][DEBUG ] Determining fastest mirrors
>
> [ceph201][DEBUG ] Setting up Install Process
>
> [ceph201][DEBUG ] Resolving Dependencies
>
> [ceph201][DEBUG ] --> Running transaction check
>
> [ceph201][DEBUG ] ---> Package ceph.x86_64 1:0.87.1-0.el6 will be installed
>
> [ceph201][DEBUG ] --> Finished Dependency Resolution
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ] Dependencies Resolved
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ]
> 
>
> [ceph201][DEBUG ]  Package  Arch   Version
> RepositorySize
>
> [ceph201][DEBUG ]
> 
>
> [ceph201][DEBUG ] Installing:
>
> [ceph201][DEBUG ]  ceph x86_64 1:0.87.1-0.el6
> ceph-released  13 M
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ] Transaction Summary
>
> [ceph201][DEBUG ]
> 
>
> [ceph201][DEBUG ] Install   1 Package(s)
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ] Total download size: 13 M
>
> [ceph201][DEBUG ] Installed size: 50 M
>
> [ceph201][DEBUG ] Downloading Packages:
>
> [ceph201][DEBUG ] Running rpm_check_debug
>
> [ceph201][DEBUG ] Running Transaction Test
>
> [ceph201][DEBUG ] Transaction Test Succeeded
>
> [ceph201][DEBUG ] Running Transaction
>
>   Installing : 1:ceph-0.87.1-0.el6.x86_64
> 1/1
>
>   Verifying  : 1:ceph-0.87.1-0.el6.x86_64
> 1/1
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ] Installed:
>
> [ceph201][DEBUG ]   ceph.x86_64 1:0.87.1-0.el6
>
> [ceph201][DEBUG ]
>
> [ceph201][DEBUG ] Complete!
>
> [ceph201][INFO  ] Running command: ceph --version
>
> [ceph201][ERROR ] Traceback (most recent call last):
>
> [ceph201][ERROR ]   File
> "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/process.py",
> line 87, in run
>
> [ceph201][ERROR ] reporting(conn, result, timeout)
>
> [ceph201][ERROR ]   File
> "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/log.py",
> line 13, in reporting
>
> [ceph201][ERROR ] received = result.receive(timeout)
>
> [ceph201][ERROR ]   File
> "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py",
> line 704, in receive
>
> [ceph201][ERROR ] raise self._getremoteerror() or EOFError()
>
> [ceph201][ERROR ] RemoteError: Traceback (most recent call last):
>
> [ceph201][ERROR ]   File "", line 1036, in executetask
>
> [ceph201][ERROR ]   File "", line 11, in _remote_run
>
> [ceph201][ERROR ]   File "/usr/lib64/python2.6/subprocess.py", line 642, in
> __init__
>
> [ceph201][ERROR ] errread, errwrite)
>
> [ceph201][ERROR ]   File "/usr/lib64/python2.6/subprocess.py", line 1234, in
> _execute_child
>
> [ceph201][ERROR ] raise child_exception
>
> [ceph201][ERROR ] OSError: [Errno 2] No such file or directory
>
> [ceph201][ERROR ]
>
> [ceph201][ERROR ]
>
>
>
> Michael Kuriger
>
>
>
>

[ceph-users] RadosGW do not populate "log file"

2015-03-02 Thread Italo Santos
Hello everyone,

I have a radosgw configured with the below ceph.conf file, but this instance 
isn't generating any log entries at the "log file" path; the log is always empty.
However, if I take a look at the apache access.log there are a lot of entries.

Anyone know why?
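
In case it is just configuration or permissions, a minimal sketch of what
normally has to be in place (the section name, paths and the user radosgw runs
as are assumptions here):

    # /etc/ceph/ceph.conf
    [client.radosgw.gateway]
        log file = /var/log/radosgw/radosgw.log
        debug rgw = 1            # raise temporarily while debugging

    # the directory must be writable by the user the radosgw process runs as,
    # and the daemon has to be restarted so it (re)opens the file
    ls -ld /var/log/radosgw
    service radosgw restart      # "ceph-radosgw" on RHEL/CentOS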

Regards.

Italo Santos
http://italosantos.com.br/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Inter-zone replication and High Availability

2015-03-02 Thread Brian Button

Hi,

I'm trying to understand object storage replication between two ceph 
clusters in two zones in a single region. Setting up replication itself 
isn't the issue, it's how to ensure high availability and data safety 
between the clusters when failing over.


The simplest case is flipping the primary cluster between the two zones, 
just to test that we *can* do that. We expect that there is data that 
was written to the original primary that has not been replicated to the 
backup cluster yet, just because people are constantly writing to it. In 
that case, is there any way to ensure that data that was written to the 
original primary, but not replicated yet to the other zone, will be 
copied? Or is that data only in one place until we flip back over to the 
original primary?


The same situation would happen upon flipping primary responsibilities 
back to the original, where there would be data that had just been 
written that hadn't been replicated yet. Now do we have data on both 
sides that exists only in one place with no way to ensure that it is 
made consistent between the two zones?


Sorry if that was confusing, but it's hard to describe without seeing 
the gestures I'm making as I'm explaining it :)


bab

--
Brian Button
bbut...@agilestl.com | @brianbuttonxp | 636.399.3146
http://www.agileprogrammer.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Scottix
We have a file system running CephFS, and for a while we have had this issue:
when doing an ls -la we get question marks in the response.

-rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
data.2015-02-08_00-00-00.csv.bz2
-? ? ?  ?   ??
data.2015-02-09_00-00-00.csv.bz2

If we do another directory listing it shows up fine.

-rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
data.2015-02-08_00-00-00.csv.bz2
-rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
data.2015-02-09_00-00-00.csv.bz2

It hasn't been a problem, but we just wanted to see if this is an issue: could
the attributes be timing out? We do have a lot of files in the filesystem,
so that could be a possible bottleneck.

We are using the ceph-fuse mount.
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
We are planning to do the update soon to 87.1

Thanks
Scottie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 3:39 PM, Scottix  wrote:
> We have a file system running CephFS and for a while we had this issue when
> doing an ls -la we get question marks in the response.
>
> -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> data.2015-02-08_00-00-00.csv.bz2
> -? ? ?  ?   ??
> data.2015-02-09_00-00-00.csv.bz2
>
> If we do another directory listing it show up fine.
>
> -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> data.2015-02-08_00-00-00.csv.bz2
> -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
> data.2015-02-09_00-00-00.csv.bz2
>
> It hasn't been a problem but just wanted to see if this is an issue, could
> the attributes be timing out? We do have a lot of files in the filesystem so
> that could be a possible bottleneck.

Huh, that's not something I've seen before. Are the systems you're
doing this on the same? What distro and kernel version? Is it reliably
one of them showing the question marks, or does it jump between
systems?
-Greg

>
> We are using the ceph-fuse mount.
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> We are planning to do the update soon to 87.1
>
> Thanks
> Scottie
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Bill Sanders
Forgive me if this is unhelpful, but could it be something to do with
permissions of the directory and not Ceph at all?

http://superuser.com/a/528467

Bill

On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum  wrote:

> On Mon, Mar 2, 2015 at 3:39 PM, Scottix  wrote:
> > We have a file system running CephFS and for a while we had this issue
> when
> > doing an ls -la we get question marks in the response.
> >
> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> > data.2015-02-08_00-00-00.csv.bz2
> > -? ? ?  ?   ??
> > data.2015-02-09_00-00-00.csv.bz2
> >
> > If we do another directory listing it show up fine.
> >
> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> > data.2015-02-08_00-00-00.csv.bz2
> > -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
> > data.2015-02-09_00-00-00.csv.bz2
> >
> > It hasn't been a problem but just wanted to see if this is an issue,
> could
> > the attributes be timing out? We do have a lot of files in the
> filesystem so
> > that could be a possible bottleneck.
>
> Huh, that's not something I've seen before. Are the systems you're
> doing this on the same? What distro and kernel version? Is it reliably
> one of them showing the question marks, or does it jump between
> systems?
> -Greg
>
> >
> > We are using the ceph-fuse mount.
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> > We are planning to do the update soon to 87.1
> >
> > Thanks
> > Scottie
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Scottix
3 Ceph servers on Ubuntu 12.04.5 - kernel 3.13.0-29-generic

We have an old server that we compiled the ceph-fuse client on
Suse11.4 - kernel 2.6.37.6-0.11
This is the only mount we have right now.

We don't have any problems reading the files and the directory shows full
775 permissions and doing a second ls fixes the problem.

On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders  wrote:

> Forgive me if this is unhelpful, but could it be something to do with
> permissions of the directory and not Ceph at all?
>
> http://superuser.com/a/528467
>
> Bill
>
> On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum  wrote:
>
>> On Mon, Mar 2, 2015 at 3:39 PM, Scottix  wrote:
>> > We have a file system running CephFS and for a while we had this issue
>> when
>> > doing an ls -la we get question marks in the response.
>> >
>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
>> > data.2015-02-08_00-00-00.csv.bz2
>> > -? ? ?  ?   ??
>> > data.2015-02-09_00-00-00.csv.bz2
>> >
>> > If we do another directory listing it show up fine.
>> >
>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
>> > data.2015-02-08_00-00-00.csv.bz2
>> > -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
>> > data.2015-02-09_00-00-00.csv.bz2
>> >
>> > It hasn't been a problem but just wanted to see if this is an issue,
>> could
>> > the attributes be timing out? We do have a lot of files in the
>> filesystem so
>> > that could be a possible bottleneck.
>>
>> Huh, that's not something I've seen before. Are the systems you're
>> doing this on the same? What distro and kernel version? Is it reliably
>> one of them showing the question marks, or does it jump between
>> systems?
>> -Greg
>>
>> >
>> > We are using the ceph-fuse mount.
>> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>> > We are planning to do the update soon to 87.1
>> >
>> > Thanks
>> > Scottie
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Gregory Farnum
I bet it's that permission issue combined with a minor bug in FUSE on
that kernel, or maybe in the ceph-fuse code (but I've not seen it
reported before, so I kind of doubt it). If you run ceph-fuse with
"debug client = 20" it will output (a whole lot of) logging to the
client's log file and you could see what requests are getting
processed by the Ceph code and how it's responding. That might let you
narrow things down. It's certainly not any kind of timeout.
-Greg
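
For the record, turning that on for a single mount looks something like this
(a sketch; mount point and monitor address are placeholders):

    # remount with client debugging via the command line ...
    ceph-fuse -m mon-host:6789 /mnt/cephfs --debug-client=20 --log-file=/var/log/ceph/ceph-fuse.log

    # ... or add to the [client] section of ceph.conf before mounting:
    #   debug client = 20
    #   log file = /var/log/ceph/ceph-client.$name.log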

On Mon, Mar 2, 2015 at 3:57 PM, Scottix  wrote:
> 3 Ceph servers on Ubuntu 12.04.5 - kernel 3.13.0-29-generic
>
> We have an old server that we compiled the ceph-fuse client on
> Suse11.4 - kernel 2.6.37.6-0.11
> This is the only mount we have right now.
>
> We don't have any problems reading the files and the directory shows full
> 775 permissions and doing a second ls fixes the problem.
>
> On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders  wrote:
>>
>> Forgive me if this is unhelpful, but could it be something to do with
>> permissions of the directory and not Ceph at all?
>>
>> http://superuser.com/a/528467
>>
>> Bill
>>
>> On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum  wrote:
>>>
>>> On Mon, Mar 2, 2015 at 3:39 PM, Scottix  wrote:
>>> > We have a file system running CephFS and for a while we had this issue
>>> > when
>>> > doing an ls -la we get question marks in the response.
>>> >
>>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
>>> > data.2015-02-08_00-00-00.csv.bz2
>>> > -? ? ?  ?   ??
>>> > data.2015-02-09_00-00-00.csv.bz2
>>> >
>>> > If we do another directory listing it show up fine.
>>> >
>>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
>>> > data.2015-02-08_00-00-00.csv.bz2
>>> > -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
>>> > data.2015-02-09_00-00-00.csv.bz2
>>> >
>>> > It hasn't been a problem but just wanted to see if this is an issue,
>>> > could
>>> > the attributes be timing out? We do have a lot of files in the
>>> > filesystem so
>>> > that could be a possible bottleneck.
>>>
>>> Huh, that's not something I've seen before. Are the systems you're
>>> doing this on the same? What distro and kernel version? Is it reliably
>>> one of them showing the question marks, or does it jump between
>>> systems?
>>> -Greg
>>>
>>> >
>>> > We are using the ceph-fuse mount.
>>> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>>> > We are planning to do the update soon to 87.1
>>> >
>>> > Thanks
>>> > Scottie
>>> >
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-02 Thread Scottix
I'll try the following things and report back to you.

1. I can get a new kernel on another machine and mount to the CephFS and
see if I get the following errors.
2. I'll run the debug and see if anything comes up.

I'll report back to you when I can do these things.

Thanks,
Scottie

On Mon, Mar 2, 2015 at 4:04 PM Gregory Farnum  wrote:

> I bet it's that permission issue combined with a minor bug in FUSE on
> that kernel, or maybe in the ceph-fuse code (but I've not seen it
> reported before, so I kind of doubt it). If you run ceph-fuse with
> "debug client = 20" it will output (a whole lot of) logging to the
> client's log file and you could see what requests are getting
> processed by the Ceph code and how it's responding. That might let you
> narrow things down. It's certainly not any kind of timeout.
> -Greg
>
> On Mon, Mar 2, 2015 at 3:57 PM, Scottix  wrote:
> > 3 Ceph servers on Ubuntu 12.04.5 - kernel 3.13.0-29-generic
> >
> > We have an old server that we compiled the ceph-fuse client on
> > Suse11.4 - kernel 2.6.37.6-0.11
> > This is the only mount we have right now.
> >
> > We don't have any problems reading the files and the directory shows full
> > 775 permissions and doing a second ls fixes the problem.
> >
> > On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders 
> wrote:
> >>
> >> Forgive me if this is unhelpful, but could it be something to do with
> >> permissions of the directory and not Ceph at all?
> >>
> >> http://superuser.com/a/528467
> >>
> >> Bill
> >>
> >> On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum 
> wrote:
> >>>
> >>> On Mon, Mar 2, 2015 at 3:39 PM, Scottix  wrote:
> >>> > We have a file system running CephFS and for a while we had this
> issue
> >>> > when
> >>> > doing an ls -la we get question marks in the response.
> >>> >
> >>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> >>> > data.2015-02-08_00-00-00.csv.bz2
> >>> > -? ? ?  ?   ??
> >>> > data.2015-02-09_00-00-00.csv.bz2
> >>> >
> >>> > If we do another directory listing it show up fine.
> >>> >
> >>> > -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
> >>> > data.2015-02-08_00-00-00.csv.bz2
> >>> > -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
> >>> > data.2015-02-09_00-00-00.csv.bz2
> >>> >
> >>> > It hasn't been a problem but just wanted to see if this is an issue,
> >>> > could
> >>> > the attributes be timing out? We do have a lot of files in the
> >>> > filesystem so
> >>> > that could be a possible bottleneck.
> >>>
> >>> Huh, that's not something I've seen before. Are the systems you're
> >>> doing this on the same? What distro and kernel version? Is it reliably
> >>> one of them showing the question marks, or does it jump between
> >>> systems?
> >>> -Greg
> >>>
> >>> >
> >>> > We are using the ceph-fuse mount.
> >>> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >>> > We are planning to do the update soon to 87.1
> >>> >
> >>> > Thanks
> >>> > Scottie
> >>> >
> >>> >
> >>> > ___
> >>> > ceph-users mailing list
> >>> > ceph-users@lists.ceph.com
> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>> >
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] EC configuration questions...

2015-03-02 Thread Don Doerner
Hello,

I am trying to set up to measure erasure coding performance and overhead.  My 
Ceph "cluster-of-one" has 27 disks, hence 27 OSDs, all empty.  I have ots of 
memory, and I am using "osd crush chooseleaf type = 0" in my config file, so my 
OSDs should be able to peer with others on the same host, right?

I look at the EC profiles defined, and see only "default" which has k=2,m=1.  
Wanting to set up a more realistic test, I defined a new profile "k8m3", 
similar to default, but with k=8,m=3.

Checked with "ceph osd erasure-code-profile get k8m3", all looks good.

I then go to define my pool: "ceph osd pool create ecpool 256 256 erasure k8m3" 
apparently succeeds.

*Sidebar: my math on the pgnum stuff was (27 OSDs * 100)/11 = ~246, 
round up to 256.

Now I ask "ceph health", and get:
HEALTH_WARN 256 pgs incomplete; 256 pgs stuck inactive; 256 pgs stuck unclean; 
too few pgs per osd (9 < min 20)

Digging in to this a bit ("ceph health detail"), I see the magic OSD number 
(2147483647) that says that there weren't enough OSDs to assign to a placement 
group, for all placement groups.  And at the same time, it is warning me that I 
have too few PGs per OSD.

At the moment, I am defining a traditional replicated pool (3X) to see if that 
will work...  Anyone have any guess as to what I may be doing incorrectly with 
my erasure coded pool?  Or what I should do next to get a clue?

Regards,

-don-

--
The information contained in this transmission may be confidential. Any 
disclosure, copying, or further distribution of confidential information is not 
permitted unless such privilege is explicitly granted in writing by Quantum. 
Quantum reserves the right to have electronic communications, including email 
and attachments, sent across its networks filtered through anti virus and spam 
software programs and retain such messages in order to comply with applicable 
data security and retention requirements. Quantum is not responsible for the 
proper and complete transmission of the substance of this communication or for 
any delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC configuration questions...

2015-03-02 Thread Don Doerner
Update: the attempt to define a traditional replicated pool was  successful; 
it's online and ready to go.  So the cluster basics appear sound...

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 02 March, 2015 16:18
To: ceph-users@lists.ceph.com
Subject: [ceph-users] EC configuration questions...
Sensitivity: Personal

Hello,

I am trying to set up to measure erasure coding performance and overhead.  My 
Ceph "cluster-of-one" has 27 disks, hence 27 OSDs, all empty.  I have ots of 
memory, and I am using "osd crush chooseleaf type = 0" in my config file, so my 
OSDs should be able to peer with others on the same host, right?

I look at the EC profiles defined, and see only "default" which has k=2,m=1.  
Wanting to set up a more realistic test, I defined a new profile "k8m3", 
similar to default, but with k=8,m=3.

Checked with "ceph osd erasure-code-profile get k8m3", all looks good.

I then go to define my pool: "ceph osd pool create ecpool 256 256 erasure k8m3" 
apparently succeeds.

*Sidebar: my math on the pgnum stuff was (27 pools * 100)/11 = ~246, 
round up to 256.

Now I ask "ceph health", and get:
HEALTH_WARN 256 pgs incomplete; 256 pgs stuck inactive; 256 pgs stuck unclean; 
too few pgs per osd (9 < min 20)

Digging in to this a bit ("ceph health detail"), I see the magic OSD number 
(2147483647) that says that there weren't enough OSDs to assign to a placement 
group, for all placement groups.  And at the same time, it is warning me that I 
have too few PGs per OSD.

At the moment, I am defining a traditional replicated pool (3X) to see if that 
will work...  Anyone have any guess as to what I may be doing incorrectly with 
my erasure coded pool?  Or what I should do next to get a clue?

Regards,

-don-


The information contained in this transmission may be confidential. Any 
disclosure, copying, or further distribution of confidential information is not 
permitted unless such privilege is explicitly granted in writing by Quantum. 
Quantum reserves the right to have electronic communications, including email 
and attachments, sent across its networks filtered through anti virus and spam 
software programs and retain such messages in order to comply with applicable 
data security and retention requirements. Quantum is not responsible for the 
proper and complete transmission of the substance of this communication or for 
any delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC configuration questions...

2015-03-02 Thread Loic Dachary
Hi Don,

On 03/03/2015 01:18, Don Doerner wrote:> Hello,
> 
>  
> 
> I am trying to set up to measure erasure coding performance and overhead.  My 
> Ceph “cluster-of-one” has 27 disks, hence 27 OSDs, all empty.  I have ots of 
> memory, and I am using “osd crush chooseleaf type = 0” in my config file, so 
> my OSDs should be able to peer with others on the same host, right?
> 
>  
> 
> I look at the EC profiles defined, and see only “default” which has k=2,m=1.  
> Wanting to set up a more realistic test, I defined a new profile “k8m3”, 
> similar to default, but with k=8,m=3. 
> 
>  
> 
> Checked with “ceph osd erasure-code-profile get k8m3”, all looks good.

When you create the erasure-code-profile you also need to set the failure 
domain (see ruleset-failure-domain in 
http://ceph.com/docs/master/rados/operations/erasure-code-jerasure/). It will 
not use the "osd crush chooseleaf type = 0" from your configuration file. You 
can verify the details of the ruleset used by the erasure coded pool with the 
command ./ceph osd crush rule dump

Cheers
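
Concretely, recreating the profile and pool with the failure domain set to osd
would look roughly like this (a sketch, reusing the names from the original
mail; profiles are only read at pool-creation time, hence the recreate):

    ceph osd pool delete ecpool ecpool --yes-i-really-really-mean-it
    ceph osd erasure-code-profile set k8m3 k=8 m=3 ruleset-failure-domain=osd --force
    ceph osd pool create ecpool 256 256 erasure k8m3
    ceph osd crush rule dump    # the new rule should choose type "osd", not "host"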

> 
>  
> 
> I then go to define my pool: “ceph osd pool create ecpool 256 256 erasure 
> k8m3” apparently succeeds.
> 
> ·Sidebar: my math on the pgnum stuff was (27 OSDs * 100)/11 = ~246, 
> round up to 256.
> 
>  
> 
> Now I ask “ceph health”, and get:
> 
> HEALTH_WARN 256 pgs incomplete; 256 pgs stuck inactive; 256 pgs stuck unclean; 
> too few pgs per osd (9 < min 20)
> 
>  
> 
> Digging in to this a bit (“ceph health detail”), I see the magic OSD number 
> (2147483647) that says that there weren’t enough OSDs to assign to a 
> placement group, /for all placement groups/.  And at the same time, it is 
> warning me that I have too few PGs per OSD.
> 
>  
> 
> At the moment, I am defining a traditional replicated pool (3X) to see if 
> that will work…  Anyone have any guess as to what I may be doing incorrectly 
> with my erasure coded pool?  Or what I should do next to get a clue?
> 
>  
> 
> Regards,
> 
>  
> 
> -don-
> 
>  
> 
> --
> The information contained in this transmission may be confidential. Any 
> disclosure, copying, or further distribution of confidential information is 
> not permitted unless such privilege is explicitly granted in writing by 
> Quantum. Quantum reserves the right to have electronic communications, 
> including email and attachments, sent across its networks filtered through 
> anti virus and spam software programs and retain such messages in order to 
> comply with applicable data security and retention requirements. Quantum is 
> not responsible for the proper and complete transmission of the substance of 
> this communication or for any delay in its receipt.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New SSD Question

2015-03-02 Thread Christian Balzer
On Mon, 2 Mar 2015 16:12:59 -0600 Tony Harris wrote:

> Hi all,
> 
> After the previous thread, I'm doing my SSD shopping for  and I came
> across an SSD called an Edge Boost Pro w/ Power Fail. It seems to have
> some impressive specs - in most places decent user reviews, in one
> place a poor one - I was wondering if anyone has had any experience with
> these drives with Ceph?  Does it work well?  Reliability issues?  etc.
> Right now I'm looking at getting Intel DC S3700's, but the price on
> these Edge drives are pretty good for the 240G model, but almost TGTBT
> for the speed and power fail caps, so I didn't want to take a chance if
> they were really problematical as I'd rather just use a drive I know
> people have had quality success with.
> 
> -Tony

Read this, especially the OLTP and Email section:

http://www.tweaktown.com/reviews/6337/edge-boost-pro-300gb-enterprise-ssd-review/index.html

It trails far behind other SSDs (and surely the 3700 which is about the
same price). 
Though typically the journal is actually NEVER read from, so write
performance is the main point here. 
This SSD looks good in the random write section.

However the Sandforce controller is known to slow down things during
garbage collection.

There is also the question of how much this drive will degrade over time; at
least the review above did some preconditioning, unlike others I saw.
With journals, you can't really do a TRIM to clean things up.
OTOH, with a 240GB drive you can leave most of it empty to prevent, or at
least forestall, these issues.
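
For example (a sketch; the device name and sizes are only placeholders), partition just a couple of journals and leave the rest of the drive untouched:

sgdisk --new=1:0:+10G --change-name=1:'ceph journal' /dev/sdX
sgdisk --new=2:0:+10G --change-name=2:'ceph journal' /dev/sdX
# the remaining ~200GB stays unpartitioned as extra spare area for the controller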

Finally, there is NO specification on the maker's homepage at all. So when it
comes to durability, the buzzwords plus "Toshiba eMLC" mentioned in the
reviews do not make for solid numbers.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
We're seeing a lot of this as well (as I mentioned to Sage at
SCALE). Is there a rule of thumb at all for how big it is safe to let an
RGW bucket get?
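
For reference, a quick way to see how big a given bucket has grown (the bucket name is just a placeholder):

radosgw-admin bucket stats --bucket=mybucket | grep num_objects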

Also, is this theoretically resolved by the new bucket-sharding
feature in the latest dev release?

-Ben

On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu  wrote:
> Hi Gregory,
>
> We are not using listomapkeys that way or in any way to be precise. I used
> it here just to reproduce the behavior/issue.
>
> What i am really interested in is if scrubbing-deep actually mitigates the
> problem and/or is there something that can be further improved.
>
> Or i guess we should go upgrade now and hope for the best :)
>
> On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum  wrote:
>>
>> On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
>> wrote:
>> > Hi all, especially devs,
>> >
>> > We have recently pinpointed one of the causes of slow requests in our
>> > cluster. It seems deep-scrubs on pg's that contain the index file for a
>> > large radosgw bucket lock the osds. Incresing op threads and/or disk
>> > threads
>> > helps a little bit, but we need to increase them beyond reason in order
>> > to
>> > completely get rid of the problem. A somewhat similar (and more severe)
>> > version of the issue occurs when we call listomapkeys for the index
>> > file,
>> > and since the logs for deep-scrubbing was much harder read, this
>> > inspection
>> > was based on listomapkeys.
>> >
>> > In this example osd.121 is the primary of pg 10.c91 which contains file
>> > .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
>> > ~500k objects. Standard listomapkeys call take about 3 seconds.
>> >
>> > time rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null
>> > real 0m2.983s
>> > user 0m0.760s
>> > sys 0m0.148s
>> >
>> > In order to lock the osd we request 2 of them simultaneously with
>> > something
>> > like:
>> >
>> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
>> > sleep 1
>> > rados -p .rgw.buckets listomapkeys .dir.5926.3 > /dev/null &
>> >
>> > 'debug_osd=30' logs show the flow like:
>> >
>> > At t0 some thread enqueue_op's my omap-get-keys request.
>> > Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
>> > keys.
>> > Op-Thread B responds to several other requests during that 1 second
>> > sleep.
>> > They're generally extremely fast subops on other pgs.
>> > At t1 (about a second later) my second omap-get-keys request gets
>> > enqueue_op'ed. But it does not start probably because of the lock held
>> > by
>> > Thread A.
>> > After that point other threads enqueue_op other requests on other pgs
>> > too
>> > but none of them starts processing, in which i consider the osd is
>> > locked.
>> > At t2 (about another second later) my first omap-get-keys request is
>> > finished.
>> > Op-Thread B locks pg 10.c91 and dequeue_op's my second request and
>> > starts
>> > reading ~500k keys again.
>> > Op-Thread A continues to process the requests enqueued in t1-t2.
>> >
>> > It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
>> > can
>> > process other requests for other pg's just fine.
>> >
>> > My guess is a somewhat larger scenario happens in deep-scrubbing, like
>> > on
>> > the pg containing index for the bucket of >20M objects. A disk/op thread
>> > starts reading through the omap which will take say 60 seconds. During
>> > the
>> > first seconds, other requests for other pgs pass just fine. But in 60
>> > seconds there are bound to be other requests for the same pg, especially
>> > since it holds the index file. Each of these requests lock another
>> > disk/op
>> > thread to the point where there are no free threads left to process any
>> > requests for any pg. Causing slow-requests.
>> >
>> > So first of all thanks if you can make it here, and sorry for the
>> > involved
>> > mail, i'm exploring the problem as i go.
>> > Now, is that deep-scrubbing situation i tried to theorize even possible?
>> > If
>> > not can you point us where to look further.
>> > We are currently running 0.72.2 and know about newer ioprio settings in
>> > Firefly and such. While we are planning to upgrade in a few weeks but i
>> > don't think those options will help us in any way. Am i correct?
>> > Are there any other improvements that we are not aware?
>>
>> This is all basically correct; it's one of the reasons you don't want
>> to let individual buckets get too large.
>>
>> That said, I'm a little confused about why you're running listomapkeys
>> that way. RGW throttles itself by getting only a certain number of
>> entries at a time (1000?) and any system you're also building should
>> do the same. That would reduce the frequency of any issues, and I
>> *think* that scrubbing has some mitigating factors to help (although
>> maybe not; it's been a while since I looked at any of that stuff).
>>
>> Although I just realized that my vague memory of deep scrubbing
>> working better might be based on improvements that only got in 

Re: [ceph-users] Update 0.80.5 to 0.80.8 --the VM's read request become too slow

2015-03-02 Thread Nathan O'Sullivan


On 11/02/2015 1:46 PM, 杨万元 wrote:

Hello!
We use Ceph+Openstack in our private cloud. Recently we upgraded our 
centos6.5 based cluster from Ceph Emperor to Ceph Firefly.
At first we used the redhat epel yum repo to upgrade; that Ceph version is 
0.80.5. We upgraded the monitors first, then the osds, and the clients last. 
When we completed this upgrade, we booted a VM on the cluster, then used fio 
to test the io performance. The io performance was as good as before. 
Everything was ok!
Then we upgraded the cluster from 0.80.5 to 0.80.8. When we completed that, 
we rebooted the VM to load the newest librbd. After that we again used fio to 
test the io performance. We found that randwrite and write are as good as 
before, but randread and read became worse: randread's iops dropped from 
4000-5000 to 300-400, and the latency got worse. The read bandwidth dropped 
from 400MB/s to 115MB/s. I then downgraded the ceph client version from 
0.80.8 to 0.80.5, and the result became normal again.
So I think it may be something in librbd. I compared the 0.80.8 
release notes with 0.80.5 
(http://ceph.com/docs/master/release-notes/#v0-80-8-firefly) and the only 
read-related change I found in 0.80.8 is: "librbd: cap memory utilization 
for read requests (Jason Dillaman)". Can anyone explain this?




FWIW we are seeing the same thing when switching librbd from 0.80.7 to 
0.80.8 - there is a massive performance regression in random reads. In 
our case, from ~10,000 4k read iops down to less than 1,000.


We also tested librbd 0.87.1, and found it does not have this problem - 
it appears to be isolated to 0.80.8 only.


Regards
Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread GuangYang
We have had good experience so far keeping each bucket below 0.5 million 
objects, by client-side sharding. But I think it would be worth testing at 
your scale, with your hardware configuration, as well as against your 
expectations for tail latency.

Generally the bucket sharding should help, both for write throughput and *stalls 
during recovering/scrubbing*, but it comes with a price - with X shards per 
bucket, the listing/trimming becomes X times as heavy from the OSDs' load point 
of view. There was discussion to implement: 1) blind buckets (for use cases 
where bucket listing is not needed), and 2) un-ordered listing, which could 
mitigate the problem I mentioned above. They are on the roadmap...
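
If the sharding feature lands as currently proposed, enabling it for newly created buckets should look roughly like the snippet below (a sketch; the option name may still change, it only affects buckets created after it is set, and the section name and shard count are just examples):

[client.radosgw.gateway]
    rgw override bucket index max shards = 8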

Thanks,
Guang



> From: bhi...@gmail.com
> Date: Mon, 2 Mar 2015 18:13:25 -0800
> To: erdem.agao...@gmail.com
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Some long running ops may lock osd
>
> We're seeing a lot of this as well. (as i mentioned to sage at
> SCALE..) Is there a rule of thumb at all for how big is safe to let a
> RGW bucket get?
>
> Also, is this theoretically resolved by the new bucket-sharding
> feature in the latest dev release?
>
> -Ben
>
> On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu  
> wrote:
>> Hi Gregory,
>>
>> We are not using listomapkeys that way or in any way to be precise. I used
>> it here just to reproduce the behavior/issue.
>>
>> What i am really interested in is if scrubbing-deep actually mitigates the
>> problem and/or is there something that can be further improved.
>>
>> Or i guess we should go upgrade now and hope for the best :)
>>
>> On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum  wrote:
>>>
>>> On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
>>> wrote:
 Hi all, especially devs,

 We have recently pinpointed one of the causes of slow requests in our
 cluster. It seems deep-scrubs on pg's that contain the index file for a
 large radosgw bucket lock the osds. Incresing op threads and/or disk
 threads
 helps a little bit, but we need to increase them beyond reason in order
 to
 completely get rid of the problem. A somewhat similar (and more severe)
 version of the issue occurs when we call listomapkeys for the index
 file,
 and since the logs for deep-scrubbing was much harder read, this
 inspection
 was based on listomapkeys.

 In this example osd.121 is the primary of pg 10.c91 which contains file
 .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
 ~500k objects. Standard listomapkeys call take about 3 seconds.

 time rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null
 real 0m2.983s
 user 0m0.760s
 sys 0m0.148s

 In order to lock the osd we request 2 of them simultaneously with
 something
 like:

 rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &
 sleep 1
 rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &

 'debug_osd=30' logs show the flow like:

 At t0 some thread enqueue_op's my omap-get-keys request.
 Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
 keys.
 Op-Thread B responds to several other requests during that 1 second
 sleep.
 They're generally extremely fast subops on other pgs.
 At t1 (about a second later) my second omap-get-keys request gets
 enqueue_op'ed. But it does not start probably because of the lock held
 by
 Thread A.
 After that point other threads enqueue_op other requests on other pgs
 too
 but none of them starts processing, in which i consider the osd is
 locked.
 At t2 (about another second later) my first omap-get-keys request is
 finished.
 Op-Thread B locks pg 10.c91 and dequeue_op's my second request and
 starts
 reading ~500k keys again.
 Op-Thread A continues to process the requests enqueued in t1-t2.

 It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
 can
 process other requests for other pg's just fine.

 My guess is a somewhat larger scenario happens in deep-scrubbing, like
 on
 the pg containing index for the bucket of>20M objects. A disk/op thread
 starts reading through the omap which will take say 60 seconds. During
 the
 first seconds, other requests for other pgs pass just fine. But in 60
 seconds there are bound to be other requests for the same pg, especially
 since it holds the index file. Each of these requests lock another
 disk/op
 thread to the point where there are no free threads left to process any
 requests for any pg. Causing slow-requests.

 So first of all thanks if you can make it here, and sorry for the
 involved
 mail, i'm exploring the problem as i go.
 Now, is that deep-scrubbing situation i tried to theorize even possible?
 If
 not can you point 

Re: [ceph-users] RadosGW do not populate "log file"

2015-03-02 Thread zhangdongmao

I have run into this before.
Because I use Apache with RGW, radosgw is executed by the user 
'apache', so you have to make sure the apache user has permission to 
write to the log file.
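
For example (a sketch; the path is whatever your "log file" setting points at, the one below is only a common default):

ls -l /var/log/ceph/radosgw.log
chown apache:apache /var/log/ceph/radosgw.log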


On 2015-03-03 07:06, Italo Santos wrote:

Hello everyone,

I have a radosgw configured with the ceph.conf file below, but this 
instance isn't generating any log entries at the "log file" path; the log is 
always empty. However, if I take a look at the apache access.log there are a 
lot of entries.


Anyone knows why?

Regards.

*Italo Santos*
http://italosantos.com.br/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
Blind-bucket would be perfect for us, as we don't need to list the objects.

We only need to list the bucket when doing a bucket deletion. If we
could clean out/delete all objects in a bucket (without
iterating/listing them) that would be ideal..
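
(For reference, the closest thing today is probably the one-shot purge below, though it presumably still iterates the index internally; the bucket name is just a placeholder:)

radosgw-admin bucket rm --bucket=mybucket --purge-objects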

On Mon, Mar 2, 2015 at 7:34 PM, GuangYang  wrote:
> We have had good experience so far keeping each bucket less than 0.5 million 
> objects, by client side sharding. But I think it would be nice you can test 
> at your scale, with your hardware configuration, as well as your expectation 
> over the tail latency.
>
> Generally the bucket sharding should help, both for Write throughput and 
> *stall with recovering/scrubbing*, but it comes with a prices -  The X shards 
> you have for each bucket, the listing/trimming would be X times weighted, 
> from OSD's load's point of view. There was discussion to implement: 1) blind 
> bucket (for use cases bucket listing is not needed). 2) Un-ordered listing, 
> which could improve the problem I mentioned above. They are on the roadmap...
>
> Thanks,
> Guang
>
>
> 
>> From: bhi...@gmail.com
>> Date: Mon, 2 Mar 2015 18:13:25 -0800
>> To: erdem.agao...@gmail.com
>> CC: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Some long running ops may lock osd
>>
>> We're seeing a lot of this as well. (as i mentioned to sage at
>> SCALE..) Is there a rule of thumb at all for how big is safe to let a
>> RGW bucket get?
>>
>> Also, is this theoretically resolved by the new bucket-sharding
>> feature in the latest dev release?
>>
>> -Ben
>>
>> On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu  
>> wrote:
>>> Hi Gregory,
>>>
>>> We are not using listomapkeys that way or in any way to be precise. I used
>>> it here just to reproduce the behavior/issue.
>>>
>>> What i am really interested in is if scrubbing-deep actually mitigates the
>>> problem and/or is there something that can be further improved.
>>>
>>> Or i guess we should go upgrade now and hope for the best :)
>>>
>>> On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum  wrote:

 On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
 wrote:
> Hi all, especially devs,
>
> We have recently pinpointed one of the causes of slow requests in our
> cluster. It seems deep-scrubs on pg's that contain the index file for a
> large radosgw bucket lock the osds. Incresing op threads and/or disk
> threads
> helps a little bit, but we need to increase them beyond reason in order
> to
> completely get rid of the problem. A somewhat similar (and more severe)
> version of the issue occurs when we call listomapkeys for the index
> file,
> and since the logs for deep-scrubbing was much harder read, this
> inspection
> was based on listomapkeys.
>
> In this example osd.121 is the primary of pg 10.c91 which contains file
> .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
> ~500k objects. Standard listomapkeys call take about 3 seconds.
>
> time rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null
> real 0m2.983s
> user 0m0.760s
> sys 0m0.148s
>
> In order to lock the osd we request 2 of them simultaneously with
> something
> like:
>
> rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &
> sleep 1
> rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &
>
> 'debug_osd=30' logs show the flow like:
>
> At t0 some thread enqueue_op's my omap-get-keys request.
> Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
> keys.
> Op-Thread B responds to several other requests during that 1 second
> sleep.
> They're generally extremely fast subops on other pgs.
> At t1 (about a second later) my second omap-get-keys request gets
> enqueue_op'ed. But it does not start probably because of the lock held
> by
> Thread A.
> After that point other threads enqueue_op other requests on other pgs
> too
> but none of them starts processing, in which i consider the osd is
> locked.
> At t2 (about another second later) my first omap-get-keys request is
> finished.
> Op-Thread B locks pg 10.c91 and dequeue_op's my second request and
> starts
> reading ~500k keys again.
> Op-Thread A continues to process the requests enqueued in t1-t2.
>
> It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
> can
> process other requests for other pg's just fine.
>
> My guess is a somewhat larger scenario happens in deep-scrubbing, like
> on
> the pg containing index for the bucket of>20M objects. A disk/op thread
> starts reading through the omap which will take say 60 seconds. During
> the
> first seconds, other requests for other pgs pass just fine. But in 60
> seconds there are bound to be other requests for the same pg, especially
> since it holds the index 

Re: [ceph-users] v0.93 Hammer release candidate released

2015-03-02 Thread Sage Weil
I forgot to mention a very important note for those running the v0.92 
development release and upgrading:

On Fri, 27 Feb 2015, Sage Weil wrote:
> Upgrading
> -

* If you are upgrading from v0.92, you must stop all OSD daemons and flush their
  journals (``ceph-osd -i NNN --flush-journal'') before upgrading.  There was
  a transaction encoding bug in v0.92 that broke compatibility.  Upgrading from
  v0.91 or anything earlier is safe.

> * No special restrictions when upgrading from firefly or giant

I know this has bitten a few of you already; I'm very sorry!  Following 
the procedure now should resolve it.

Thanks-
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.93 Hammer release candidate released

2015-03-02 Thread Sage Weil
On Mon, 2 Mar 2015, Sage Weil wrote:
> I forgot to mention a very important note for those running the v0.92 
> development release and upgrading:
> 
> On Fri, 27 Feb 2015, Sage Weil wrote:
> > Upgrading
> > -
> 
> * If you are upgrading from v0.92, you must stop all OSD daemons and flush their
>   journals (``ceph-osd -i NNN --flush-journal'') before upgrading.  There was
>   a transaction encoding bug in v0.92 that broke compatibility.  Upgrading from
>   v0.91 or anything earlier is safe.
> 
> > * No special restrictions when upgrading from firefly or giant
> 
> I know this has bitten a few of you already; I'm very sorry!  Following 
> the procedure now should resolve it.

Sorry, to be more clear: reinstalling v0.92, flushing journals, upgrading 
to v0.93, and then starting the OSDs should resolve it.
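
Per OSD that looks roughly like this (a sketch; the osd id is an example, and the init commands depend on your distro):

service ceph stop osd.3            # or: stop ceph-osd id=3 on upstart systems
ceph-osd -i 3 --flush-journal      # run with the v0.92 binary still installed
# upgrade the packages to v0.93, then
service ceph start osd.3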

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update 0.80.5 to 0.80.8 --the VM's read request become too slow

2015-03-02 Thread Gregory Farnum
On Mon, Mar 2, 2015 at 7:15 PM, Nathan O'Sullivan  wrote:
>
> On 11/02/2015 1:46 PM, 杨万元 wrote:
>
> Hello!
> We use Ceph+Openstack in our private cloud. Recently we upgrade our
> centos6.5 based cluster from Ceph Emperor to Ceph Firefly.
> At first,we use redhat yum repo epel to upgrade, this Ceph's version is
> 0.80.5. First upgrade monitor,then osd,last client. when we complete this
> upgrade, we boot a VM on the cluster,then use fio to test the io
> performance. The io performance is as better as before. Everything is ok!
> Then we upgrade the cluster from 0.80.5 to 0.80.8,when we  completed ,
> we reboot the VM to load the newest librbd. after that we also use fio to
> test the io performance.then we find the randwrite and write is as good as
> before.but the randread and read is become worse, randwrite's iops from
> 4000-5000 to 300-400 ,and the latency is worse. the write's bw from 400MB/s
> to 115MB/s. then I downgrade the ceph client version from 0.80.8 to 0.80.5,
> then the reslut become  normal.
>  So I think maybe something cause about librbd.  I compare the 0.80.8
> release notes with 0.80.5
> (http://ceph.com/docs/master/release-notes/#v0-80-8-firefly ), I just find
> this change in  0.80.8 is something about read request  :  librbd: cap
> memory utilization for read requests (Jason Dillaman)  .  Who can  explain
> this?
>
>
> FWIW we are seeing the same thing when switching librbd from 0.80.7 to
> 0.80.8 - there is a massive performance regression in random reads.   In our
> case, from ~10,000 4k read iops down to less than 1,000.
>
> We also tested librbd 0.87.1 , and found it does not have this problem - it
> appears to be isolated to 0.80.8 only.

I'm not familiar with the details of the issue, but we're putting out
0.80.9 as soon as we can, which should resolve this. There was an
incomplete backport or something similar that is causing the slowness.
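
(Until then, a possible stopgap for the CentOS clients mentioned above is to pin librbd/librados back to 0.80.7 - an untested sketch, package versions depend on what your repo provides:)

yum downgrade librbd1-0.80.7 librados2-0.80.7
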
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update 0.80.5 to 0.80.8 --the VM's read request become too slow

2015-03-02 Thread Alexandre DERUMIER
I think this will be fixed in the next firefly point release.

Tracker for the firefly 0.80.8 speed decrease:
http://tracker.ceph.com/issues/10956

Jason Dillaman linked it to the known object_cacher bug:

http://tracker.ceph.com/issues/9854

- Original Message -
From: "Gregory Farnum" 
To: "Nathan O'Sullivan" 
Cc: "ceph-users" 
Sent: Tuesday, 3 March 2015 07:23:54
Subject: Re: [ceph-users] Update 0.80.5 to 0.80.8 --the VM's read request become 
too slow

On Mon, Mar 2, 2015 at 7:15 PM, Nathan O'Sullivan  
wrote: 
> 
> On 11/02/2015 1:46 PM, 杨万元 wrote: 
> 
> Hello! 
> We use Ceph+Openstack in our private cloud. Recently we upgrade our 
> centos6.5 based cluster from Ceph Emperor to Ceph Firefly. 
> At first,we use redhat yum repo epel to upgrade, this Ceph's version is 
> 0.80.5. First upgrade monitor,then osd,last client. when we complete this 
> upgrade, we boot a VM on the cluster,then use fio to test the io 
> performance. The io performance is as better as before. Everything is ok! 
> Then we upgrade the cluster from 0.80.5 to 0.80.8,when we completed , 
> we reboot the VM to load the newest librbd. after that we also use fio to 
> test the io performance.then we find the randwrite and write is as good as 
> before.but the randread and read is become worse, randwrite's iops from 
> 4000-5000 to 300-400 ,and the latency is worse. the write's bw from 400MB/s 
> to 115MB/s. then I downgrade the ceph client version from 0.80.8 to 0.80.5, 
> then the reslut become normal. 
> So I think maybe something cause about librbd. I compare the 0.80.8 
> release notes with 0.80.5 
> (http://ceph.com/docs/master/release-notes/#v0-80-8-firefly ), I just find 
> this change in 0.80.8 is something about read request : librbd: cap 
> memory utilization for read requests (Jason Dillaman) . Who can explain 
> this? 
> 
> 
> FWIW we are seeing the same thing when switching librbd from 0.80.7 to 
> 0.80.8 - there is a massive performance regression in random reads. In our 
> case, from ~10,000 4k read iops down to less than 1,000. 
> 
> We also tested librbd 0.87.1 , and found it does not have this problem - it 
> appears to be isolated to 0.80.8 only. 

I'm not familiar with the details of the issue, but we're putting out 
0.80.9 as soon as we can and should resolve this. There was an 
incomplete backport or something that is causing the slowness. 
-Greg 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Erdem Agaoglu
Thank you folks for bringing that up. I had some questions about sharding.
We'd like blind buckets too; at least it's on the roadmap. For the current
sharded implementation, what are the final details? Is the number of shards
defined per bucket or globally? Is there a way to split existing indexes
into shards?

On the other hand, what I'd like to point out here is not necessarily
specific to large bucket indexes. The problem is the mechanism around the
thread pools. Any request may require a lock on a pg, and that should not block
requests for other pgs. I'm no expert, but the threads might be able to
requeue requests destined for a locked pg and process others for other pgs in
the meantime. Or maybe a thread-per-pg design would be possible. Because, you
know, it is somewhat OK not to be able to do anything for a locked resource -
then you can go and improve your processing or your locks. But it's a whole
different problem when a locked pg blocks requests for a few hundred other
pgs in other pools for no good reason.
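
(As a stopgap for the locking described above, bumping op/disk threads at runtime looks roughly like the line below - a sketch, the values are only examples, and as noted earlier it only helps a little:)

ceph tell osd.* injectargs '--osd-op-threads 4 --osd-disk-threads 2'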

On Tue, Mar 3, 2015 at 5:43 AM, Ben Hines  wrote:

> Blind-bucket would be perfect for us, as we don't need to list the objects.
>
> We only need to list the bucket when doing a bucket deletion. If we
> could clean out/delete all objects in a bucket (without
> iterating/listing them) that would be ideal..
>
> On Mon, Mar 2, 2015 at 7:34 PM, GuangYang  wrote:
> > We have had good experience so far keeping each bucket less than 0.5
> million objects, by client side sharding. But I think it would be nice you
> can test at your scale, with your hardware configuration, as well as your
> expectation over the tail latency.
> >
> > Generally the bucket sharding should help, both for Write throughput and
> *stall with recovering/scrubbing*, but it comes with a prices -  The X
> shards you have for each bucket, the listing/trimming would be X times
> weighted, from OSD's load's point of view. There was discussion to
> implement: 1) blind bucket (for use cases bucket listing is not needed). 2)
> Un-ordered listing, which could improve the problem I mentioned above. They
> are on the roadmap...
> >
> > Thanks,
> > Guang
> >
> >
> > 
> >> From: bhi...@gmail.com
> >> Date: Mon, 2 Mar 2015 18:13:25 -0800
> >> To: erdem.agao...@gmail.com
> >> CC: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] Some long running ops may lock osd
> >>
> >> We're seeing a lot of this as well. (as i mentioned to sage at
> >> SCALE..) Is there a rule of thumb at all for how big is safe to let a
> >> RGW bucket get?
> >>
> >> Also, is this theoretically resolved by the new bucket-sharding
> >> feature in the latest dev release?
> >>
> >> -Ben
> >>
> >> On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu 
> wrote:
> >>> Hi Gregory,
> >>>
> >>> We are not using listomapkeys that way or in any way to be precise. I
> used
> >>> it here just to reproduce the behavior/issue.
> >>>
> >>> What i am really interested in is if scrubbing-deep actually mitigates
> the
> >>> problem and/or is there something that can be further improved.
> >>>
> >>> Or i guess we should go upgrade now and hope for the best :)
> >>>
> >>> On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum 
> wrote:
> 
>  On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu <
> erdem.agao...@gmail.com>
>  wrote:
> > Hi all, especially devs,
> >
> > We have recently pinpointed one of the causes of slow requests in our
> > cluster. It seems deep-scrubs on pg's that contain the index file
> for a
> > large radosgw bucket lock the osds. Incresing op threads and/or disk
> > threads
> > helps a little bit, but we need to increase them beyond reason in
> order
> > to
> > completely get rid of the problem. A somewhat similar (and more
> severe)
> > version of the issue occurs when we call listomapkeys for the index
> > file,
> > and since the logs for deep-scrubbing was much harder read, this
> > inspection
> > was based on listomapkeys.
> >
> > In this example osd.121 is the primary of pg 10.c91 which contains
> file
> > .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket
> contains
> > ~500k objects. Standard listomapkeys call take about 3 seconds.
> >
> > time rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null
> > real 0m2.983s
> > user 0m0.760s
> > sys 0m0.148s
> >
> > In order to lock the osd we request 2 of them simultaneously with
> > something
> > like:
> >
> > rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &
> > sleep 1
> > rados -p .rgw.buckets listomapkeys .dir.5926.3> /dev/null &
> >
> > 'debug_osd=30' logs show the flow like:
> >
> > At t0 some thread enqueue_op's my omap-get-keys request.
> > Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading
> ~500k
> > keys.
> > Op-Thread B responds to several other requests during that 1 second
> > sleep.
> > They're generally extremely f