Re: [ceph-users] SPAM in the ceph-users list
On another note, the ceph lists might consider munging the From address and
implementing SPF/DKIM/DMARC for themselves, while checking other senders for
DMARC compliance at the MTA level. I see a lot of ceph-users listserv emails
landing in my spam box.

On Tue, Nov 12, 2019 at 7:28 PM Christian Balzer wrote:

> On Tue, 12 Nov 2019 12:42:23 -0600 Sasha Litvak wrote:
>
> > I am seeing more and more spam on this list. Recently a strain of
> > messages announcing services and businesses in Bangalore, for example.
>
> Aside from more stringent checks on subscribing and maybe initial
> moderation (all rather labor intensive and a nuisance for real users), as
> well as harsher ingress and egress controls (aka spam filtering), you will
> find that all of the domains being spamvertised are now in the Spamhaus DBL.
>
> "host abbssm.edu.in.dbl.spamhaus.org"
>
> Pro tip for spammers:
> Don't get my attention, ever.
>
> Christian
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com       Rakuten Mobile Inc.

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
ivers...@rushville.k12.in.us
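For anyone poking at the MTA side of this: the SPF and DMARC policies a
receiving MTA evaluates are just DNS TXT records, so they are easy to inspect
by hand. A minimal sketch assuming dig is available; example.org stands in
for whichever sending domain you want to check:

# SPF record published by the sending domain
dig +short TXT example.org | grep spf1

# DMARC policy lives at the _dmarc subdomain of the From domain
dig +short TXT _dmarc.example.org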
[ceph-users] Node failure -- corrupt memory
Hello Cephers!

I had a node go nuts over the weekend from what appears to have been
failed/bad memory modules and/or a bad motherboard. This resulted in several
OSDs blocking IO for > 128s (indefinitely). I was not watching my alerts too
closely over the weekend, or else I might have caught it early. Every server
in the cluster reliant on Ceph stalled from the blocked IO on the failing
node and had to be restarted after taking the faulty node offline.

So, my question is: is there a way to tell Ceph to start marking OSDs out
when an IO blockage exceeds a certain limit, or are there risks in doing so
that would make me better off dealing with a stalled Ceph cluster?

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
ivers...@rushville.k12.in.us
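For what it's worth, the mons decide down/out from peer heartbeat reports
rather than from blocked-request counters, so the closest knobs I know of are
the heartbeat grace and down-out timers. A rough sketch of tightening them at
runtime; the values are illustrative only, and overly aggressive settings can
cause flapping during transient network hiccups:

# seconds of missed heartbeats before peers report an OSD down (default 20)
ceph tell osd.* injectargs '--osd_heartbeat_grace 20'

# seconds a down OSD stays "in" before the mons mark it out (default 600)
ceph tell mon.* injectargs '--mon_osd_down_out_interval 300'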
[ceph-users] Observation of bluestore db/wal performance
Just wanted to post an observation here. Perhaps someone with the resources
to perform some performance tests is interested in comparing, or has some
insight into why I observed this.

Background:
- 12 node ceph cluster, 3 chassis groups, 4 nodes per chassis
- 3-way replicated by chassis group
- running Luminous (up to date)
- heavy use of block storage for kvm virtual machines (proxmox)
- some cephfs usage (<10%)
- ~100 OSDs, ~100 pgs/osd, 500GB average OSD capacity

I recently attempted to do away with my ssd cache tier on Luminous and
replace it with bluestore with the db/wal on ssd, as this seemed to be a
better practice, or so I thought.

Sadly, after 2 weeks of rebuilding OSDs and placing the db/wal on ssd, I was
sorely disappointed with performance. My cluster performed poorly; the
db/wal on ssd did not give the boost I was used to having. I used 60GB for
the db size. Unfortunately, I did not have enough ssd capacity to make it
any larger for my OSDs.

Despite the words of caution in the Ceph docs in regard to a replicated base
tier with a replicated cache tier, I returned to cache tiering. Performance
has returned to expectations.

It would be interesting if someone had the spare iron and resources to
benchmark bluestore OSDs with SSD db/wal against cache tiering and provide
some statistics.

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us
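In case anyone wants to reproduce this kind of setup, a minimal sketch of
building a bluestore OSD with its db on an SSD partition under Luminous.
The device names and the ~60GB partition are placeholders; without a
separate --block.wal, the WAL lives inside block.db:

# one pre-created ~60GB SSD partition per OSD for block.db
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1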
Re: [ceph-users] typical snapmapper size
17838

ID CLASS WEIGHT REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
24   hdd    1.0      1.0 419GiB 185GiB 234GiB 44.06 1.46  85

Light snapshot use.

On Thu, Jun 6, 2019 at 2:00 PM Sage Weil wrote:

> Hello RBD users,
>
> Would you mind running this command on a random OSD on your RBD-oriented
> cluster?
>
> ceph-objectstore-tool \
>   --data-path /var/lib/ceph/osd/ceph-NNN \
>   '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]' \
>   list-omap | wc -l
>
> ...and share the number of lines along with the overall size and
> utilization % of the OSD? The OSD needs to be stopped, then run that
> command, then start it up again.
>
> I'm trying to gauge how much snapmapper metadata there is in a "typical"
> RBD environment. If you have some sense of whether your users make
> relatively heavy or light use of snapshots, that would be helpful too!
>
> Thanks!
> sage

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us
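For anyone else who wants to contribute a data point, the sequence boils down
to stopping the OSD, running Sage's command against its data path, and
starting it again. A sketch using osd.24 from the output above; the OSD ID
and data path are placeholders for your own OSD:

systemctl stop ceph-osd@24
ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-24 \
  '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]' \
  list-omap | wc -l
systemctl start ceph-osd@24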
Re: [ceph-users] Ceph expansion/deploy via ansible
The cephfs_metadata pool makes sense on ssd, but it won't need a lot of
space. Chances are that you'll have plenty of ssd storage to spare for other
uses.

Personally, I'm migrating away from a cache tier and rebuilding my OSDs. I
am finding that Bluestore OSDs with the block.db on SSD perform better in
most cases than a cache tier, and it is a simpler design.

There are some good notes on good and bad use cases for cache tiering here:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

On Mon, Jun 3, 2019 at 3:35 PM Daniele Riccucci wrote:

> Hello,
> sorry to jump in.
> I'm looking to expand an HDD cluster with SSDs.
> I'm thinking about moving cephfs_metadata to the SSDs (maybe with device
> class?) or using them as a cache layer in front of the cluster.
> Any tips on how to do it with ceph-ansible?
> I can share the config I currently have if necessary.
> Thank you.
>
> Daniele
>
> On 17/04/19 17:01, Sinan Polat wrote:
> > I have deployed, expanded and upgraded multiple Ceph clusters using
> > ceph-ansible. Works great.
> >
> > What information are you looking for?
> >
> > --
> > Sinan
> >
> >> On 17 Apr. 2019 at 16:24, Francois Lafont <
> >> francois.lafont.1...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> +1 for ceph-ansible too. ;)
> >>
> >> --
> >> François (flaf)

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us
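If device classes are the route you take, moving cephfs_metadata onto the
SSDs is only a couple of commands once the SSD OSDs are in. A minimal sketch
assuming Luminous device classes and the default pool name; the rule name
replicated_ssd is arbitrary:

# replicated CRUSH rule that only chooses OSDs carrying the ssd device class
ceph osd crush rule create-replicated replicated_ssd default host ssd

# point the metadata pool at it; its PGs will migrate onto the SSD OSDs
ceph osd pool set cephfs_metadata crush_rule replicated_ssd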
Re: [ceph-users] list admin issues
I thought it was just me. Guess not...

On Sat, Oct 6, 2018 at 1:28 PM Janne Johansson wrote:

> On Sat, 6 Oct 2018 at 15:06, Elias Abacioglu wrote:
> >
> > Hi,
> >
> > I'm bumping this old thread because it's getting annoying. My membership
> > gets disabled twice a month. Between my two Gmail accounts I'm in more
> > than 25 mailing lists, and I see this behavior only here. Why is only
> > ceph-users affected? Maybe Christian was on to something; is this
> > intentional? The reality is that there are a lot of ceph-users members
> > with Gmail accounts; perhaps it wouldn't be so bad to actually try to
> > figure this one out?
> >
> > So can the maintainers of this list please investigate what actually
> > gets bounced? Look at my address if you want.
> > I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most
> > recently.
> > Please help!
>
> Same here.
>
> --
> May the most significant bit of your life be positive.

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks
Glen,

Correction: I was looking at the wrong column for the weights, my bad. You
have varying weights, but the process is still the same. Balance your
buckets (hosts) in your crush map, and balance your osds in each bucket
(host).

On Sat, Jul 21, 2018 at 9:14 AM, Shawn Iverson wrote:

> Glen,
>
> It appears you have 447G, 931G, and 558G disks in your cluster, all with a
> weight of 1.0. This means that although the new disks are bigger, they are
> not going to be utilized by pgs any more than any other disk.
>
> I would suggest reweighting your other (smaller) disks so that you balance
> your cluster. You should do this gradually over time, preferably during
> off-peak times, when remapping will not affect operations.
>
> I do a little math, first by taking total cluster capacity and dividing it
> by the total capacity of each bucket. I then do the same thing within each
> bucket, until everything is proportioned appropriately down to the osds.
>
> On Fri, Jul 20, 2018 at 8:43 PM, Glen Baars wrote:
>
>> Hello Ceph Users,
>>
>> We have added more ssd storage to our ceph cluster last night. We added
>> 4 x 1TB drives and the available space went from 1.6TB to 0.6TB (in
>> `ceph df` for the SSD pool).
>>
>> I would assume that the weight needs to be changed, but I didn't think I
>> would need to? Should I change them to 0.75 from 0.9 and hopefully it
>> will rebalance correctly?
>>
>> # ceph osd tree | grep -v hdd
>> ID  CLASS WEIGHT    TYPE NAME                 STATUS REWEIGHT PRI-AFF
>>  -1       534.60309 root default
>> -19        62.90637     host NAS-AUBUN-RK2-CEPH06
>> 115   ssd   0.43660         osd.115               up      1.0     1.0
>> 116   ssd   0.43660         osd.116               up      1.0     1.0
>> 117   ssd   0.43660         osd.117               up      1.0     1.0
>> 118   ssd   0.43660         osd.118               up      1.0     1.0
>> -22       105.51169     host NAS-AUBUN-RK2-CEPH07
>> 138   ssd   0.90970         osd.138               up      1.0     1.0  Added
>> 139   ssd   0.90970         osd.139               up      1.0     1.0  Added
>> -25       105.51169     host NAS-AUBUN-RK2-CEPH08
>> 140   ssd   0.90970         osd.140               up      1.0     1.0  Added
>> 141   ssd   0.90970         osd.141               up      1.0     1.0  Added
>>  -3        56.32617     host NAS-AUBUN-RK3-CEPH01
>>  60   ssd   0.43660         osd.60                up      1.0     1.0
>>  61   ssd   0.43660         osd.61                up      1.0     1.0
>>  62   ssd   0.43660         osd.62                up      1.0     1.0
>>  63   ssd   0.43660         osd.63                up      1.0     1.0
>>  -5        56.32617     host NAS-AUBUN-RK3-CEPH02
>>  64   ssd   0.43660         osd.64                up      1.0     1.0
>>  65   ssd   0.43660         osd.65                up      1.0     1.0
>>  66   ssd   0.43660         osd.66                up      1.0     1.0
>>  67   ssd   0.43660         osd.67                up      1.0     1.0
>>  -7        56.32617     host NAS-AUBUN-RK3-CEPH03
>>  68   ssd   0.43660         osd.68                up      1.0     1.0
>>  69   ssd   0.43660         osd.69                up      1.0     1.0
>>  70   ssd   0.43660         osd.70                up      1.0     1.0
>>  71   ssd   0.43660         osd.71                up      1.0     1.0
>> -13        45.84741     host NAS-AUBUN-RK3-CEPH04
>>  72   ssd   0.54579         osd.72                up      1.0     1.0
>>  73   ssd   0.54579         osd.73                up      1.0     1.0
>>  76   ssd   0.54579         osd.76                up      1.0     1.0
>>  77   ssd   0.54579         osd.77                up      1.0     1.0
>> -16        45.84741     host NAS-AUBUN-RK3-CEPH05
>>  74   ssd   0.54579         osd.74                up      1.0     1.0
>>  75   ssd   0.54579         osd.75                up      1.0     1.0
>>  78   ssd   0.54579         osd.78                up      1.0     1.0
>>  79   ssd   0.54579         osd.79                up      1.0     1.0
>>
>> # c
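To put that proportioning into command form: check per-bucket and per-OSD
utilization, then nudge CRUSH weights a little at a time and let recovery
settle between steps. A hedged sketch; the osd ID and target weight below are
placeholders, not recommendations for this particular cluster:

# raw use rolled up per host bucket and per OSD
ceph osd df tree

# adjust one OSD's crush weight slightly, then watch recovery before the next step
ceph osd crush reweight osd.62 0.40
ceph -s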
Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks
> 62 ssd 0.43660 1.0 447G 275G 171G 61.58 1.89  95
> 63 ssd 0.43660 1.0 447G 260G 187G 58.15 1.79  97
> 64 ssd 0.43660 1.0 447G 232G 214G 52.08 1.60  83
> 65 ssd 0.43660 1.0 447G 207G 239G 46.36 1.42  75
> 66 ssd 0.43660 1.0 447G 217G 230G 48.54 1.49  84
> 67 ssd 0.43660 1.0 447G 252G 195G 56.36 1.73  92
> 68 ssd 0.43660 1.0 447G 248G 198G 55.56 1.71  94
> 69 ssd 0.43660 1.0 447G 229G 217G 51.25 1.57  84
> 70 ssd 0.43660 1.0 447G 259G 187G 58.01 1.78  87
> 71 ssd 0.43660 1.0 447G 267G 179G 59.83 1.84  97
> 72 ssd 0.54579 1.0 558G 217G 341G 38.96 1.20 100
> 73 ssd 0.54579 1.0 558G 283G 275G 50.75 1.56 121
> 76 ssd 0.54579 1.0 558G 286G 272G 51.33 1.58 129
> 77 ssd 0.54579 1.0 558G 246G 312G 44.07 1.35 104
> 74 ssd 0.54579 1.0 558G 273G 285G 48.91 1.50 122
> 75 ssd 0.54579 1.0 558G 281G 276G 50.45 1.55 114
> 78 ssd 0.54579 1.0 558G 289G 269G 51.80 1.59 133
> 79 ssd 0.54579 1.0 558G 276G 282G 49.39 1.52 119
>
> Kind regards,
>
> Glen Baars
> BackOnline Manager
>
> This e-mail is intended solely for the benefit of the addressee(s) and any
> other named recipient. It is confidential and may contain legally
> privileged or confidential information. If you are not the recipient, any
> use, distribution, disclosure or copying of this e-mail is prohibited. The
> confidentiality and legal privilege attached to this communication is not
> waived or lost by reason of the mistaken transmission or delivery to you.
> If you have received this e-mail in error, please notify us immediately.

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
[ceph-users] OSDs stalling on Intel SSDs
Hi everybody,

I have a situation that occurs under moderate I/O load on Ceph Luminous:

2018-07-10 10:27:01.257916 mon.node4 mon.0 172.16.0.4:6789/0 15590 : cluster [INF] mon.node4 is new leader, mons node4,node5,node6,node7,node8 in quorum (ranks 0,1,2,3,4)
2018-07-10 10:27:01.306329 mon.node4 mon.0 172.16.0.4:6789/0 15595 : cluster [INF] Health check cleared: MON_DOWN (was: 1/5 mons down, quorum node4,node6,node7,node8)
2018-07-10 10:27:01.386124 mon.node4 mon.0 172.16.0.4:6789/0 15596 : cluster [WRN] overall HEALTH_WARN 1 osds down; Reduced data availability: 1 pg peering; Degraded data redundancy: 58774/10188798 objects degraded (0.577%), 13 pgs degraded; 412 slow requests are blocked > 32 sec
2018-07-10 10:27:02.598175 mon.node4 mon.0 172.16.0.4:6789/0 15597 : cluster [WRN] Health check update: Degraded data redundancy: 77153/10188798 objects degraded (0.757%), 17 pgs degraded (PG_DEGRADED)
2018-07-10 10:27:02.598225 mon.node4 mon.0 172.16.0.4:6789/0 15598 : cluster [WRN] Health check update: 381 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-07-10 10:27:02.598264 mon.node4 mon.0 172.16.0.4:6789/0 15599 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg peering)
2018-07-10 10:27:02.608006 mon.node4 mon.0 172.16.0.4:6789/0 15600 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-07-10 10:27:02.701029 mon.node4 mon.0 172.16.0.4:6789/0 15601 : cluster [INF] osd.36 172.16.0.5:6800/3087 boot
2018-07-10 10:27:01.184334 osd.36 osd.36 172.16.0.5:6800/3087 23 : cluster [WRN] Monitor daemon marked osd.36 down, but it is still running
2018-07-10 10:27:04.861372 mon.node4 mon.0 172.16.0.4:6789/0 15604 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 381 slow requests are blocked > 32 sec)

The OSDs that seem to be affected are Intel SSDs, specifically model
SSDSC2BX480G4L. I have throttled backups to try to lessen the situation, but
it seems to affect the same OSDs when it happens. It has the added side
effect of taking down the mon on the same node for a few seconds and
triggering a monitor election.

I am wondering if this may be a firmware issue on this drive, and whether
anyone has any insight or additional troubleshooting steps I should try to
get a deeper look at this behavior. I am going to upgrade the firmware on
these drives and see if it helps.

--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
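Before flashing, it is worth recording which firmware revision the affected
drives are currently running so you can tell whether the update actually
changed anything. A quick sketch using smartmontools; the device path is a
placeholder:

# model, serial, and firmware revision are in the identity section
smartctl -i /dev/sdc

# wear and error attributes can also hint at a drive going bad
smartctl -A /dev/sdc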