Re: [ceph-users] SPAM in the ceph-users list

2019-11-13 Thread Shawn Iverson
On another note, the ceph lists might consider munging the From address and
implementing SPF/DKIM/DMARC for themselves, while checking inbound mail for
DMARC compliance at the MTA level.

I see a lot of ceph-users listserv emails landing in my spam folder.
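
For illustration, the kind of checks and records I mean look roughly like this
(the record values below are placeholders for whatever the list operators would
actually publish, not the current lists.ceph.com setup):

  # Check what a domain currently publishes:
  dig +short TXT lists.ceph.com
  dig +short TXT _dmarc.lists.ceph.com

  # Example zone entries a list operator might publish (placeholder values):
  #   lists.ceph.com.        IN TXT "v=spf1 ip4:203.0.113.0/24 ~all"
  #   _dmarc.lists.ceph.com. IN TXT "v=DMARC1; p=quarantine; rua=mailto:postmaster@lists.ceph.com"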


On Tue, Nov 12, 2019 at 7:28 PM Christian Balzer  wrote:

> On Tue, 12 Nov 2019 12:42:23 -0600 Sasha Litvak wrote:
>
> > I am seeing more and more spam on this list.  Recently a strain of
> messages
> > announcing services and businesses in Bangalore for example.
>
> Aside from more stringent checks on subscribing and maybe initial
> moderation (all rather labor intensive and a nuisance for real users), as
> well as harsher ingress and egress controls (aka spam filtering), you will
> find that all the spamvertized domains are now in the Spamhaus DBL.
>
> "host abbssm.edu.in.dbl.spamhaus.org"
>
> Pro tip for spammers:
> Don't get my attention, ever.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Mobile Inc.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Node failure -- corrupt memory

2019-11-11 Thread Shawn Iverson
Hello Cephers!

Over the weekend one of my nodes went haywire, apparently due to failed/bad
memory modules and/or a failing motherboard.

This resulted in several OSDs blocking IO for > 128s (indefinitely).

I was not watching my alerts closely over the weekend, or I might have caught
it early. Every server in the cluster relying on Ceph stalled on the blocked
IO from the failing node and had to be restarted after the faulty node was
taken offline.

So my question is: is there a way to tell Ceph to start marking OSDs out when
blocked IO exceeds a certain threshold, or are there risks in doing so that
would make dealing with a stalled cluster the better option?
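
For context, the kind of stopgap I'm picturing is a small watchdog along the
lines of the sketch below (the health-detail parsing and the 5-minute interval
are assumptions on my part, and it only prints suggestions rather than marking
anything out automatically):

  #!/bin/bash
  # Rough sketch: report OSDs implicated in blocked/slow request warnings.
  while true; do
      ceph health detail 2>/dev/null \
        | awk '/blocked/ { for (i = 1; i <= NF; i++) if ($i ~ /^osd\.[0-9]+/) print $i }' \
        | sort -u \
        | while read -r osd; do
            echo "$(date): ${osd} has blocked IO; candidate for 'ceph osd out ${osd#osd.}'"
          done
      sleep 300
  done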

-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Observation of bluestore db/wal performance

2019-07-21 Thread Shawn Iverson
Just wanted to post an observation here. Perhaps someone with the resources
to run proper performance tests is interested in comparing, or has some
insight into why I observed this.

Background:

12 node ceph cluster
3-way replicated by chassis group
3 chassis groups
4 nodes per chassis
running Luminous (up to date)
heavy use of block storage for kvm virtual machines (proxmox)
some cephfs usage (<10%)
~100 OSDs
~100 pgs/osd
500GB average OSD capacity

I recently attempted to do away with my ssd cache tier on Luminous and
replace it with bluestore OSDs carrying their db/wal on ssd, as this seemed
to be the better practice, or so I thought.

Sadly, after 2 weeks of rebuilding OSDs and placing the db/wal on ssd, I was
sorely disappointed: my cluster performed poorly. The db/wal on ssd did not
give the performance boost I was used to having. I used 60GB for the db
size; unfortunately, I did not have enough ssd capacity to make it any
larger for my OSDs.
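
For reference, each rebuild was along these lines (a sketch only; the device
names are examples, not my actual layout):

  # One bluestore OSD with its block.db on a ~60GB SSD partition;
  # with no separate --block.wal, the WAL lives alongside the db.
  ceph-volume lvm create --bluestore \
      --data /dev/sdb \
      --block.db /dev/sdm1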

Despite the words of caution in the Ceph docs regarding a replicated base
tier under a replicated cache tier, I returned to cache tiering.

Performance has returned to expectations.

It would be interesting if someone had the spare iron and resources to
benchmark bluestore OSDs with SSD db/wal against cache tiering and provide
some statistics.
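
If someone does take this up, a simple starting point might be rados bench
runs against a test pool on each configuration (the pool name here is just an
example):

  rados bench -p testpool 60 write --no-cleanup
  rados bench -p testpool 60 seq
  rados bench -p testpool 60 rand
  rados -p testpool cleanup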

-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] typical snapmapper size

2019-06-06 Thread Shawn Iverson
17838 lines

ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
24   hdd  1.0     1.0     419GiB  185GiB  234GiB  44.06 1.46  85

Light snapshot use here.


On Thu, Jun 6, 2019 at 2:00 PM Sage Weil  wrote:

> Hello RBD users,
>
> Would you mind running this command on a random OSD on your RBD-oriented
> cluster?
>
> ceph-objectstore-tool \
>   --data-path /var/lib/ceph/osd/ceph-NNN \
>   '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]' \
>   list-omap | wc -l
>
> ...and share the number of lines along with the overall size and
> utilization % of the OSD?  The OSD needs to be stopped, then run that
> command, then start it up again.
>
> I'm trying to guage how much snapmapper metadata there is in a "typical"
> RBD environment.  If you have some sense of whether your users make
> relatively heavy or light use of snapshots, that would be helpful too!
>
> Thanks!
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph expansion/deploy via ansible

2019-06-03 Thread Shawn Iverson
The cephfs_metadata pool makes sense on ssd, but it won't need a lot of
space. Chances are you'll have plenty of ssd storage to spare for other uses.

Personally, I'm migrating away from a cache tier and rebuilding my OSDs. I am
finding that Bluestore OSDs with the block.db on SSD perform better in most
cases than a cache tier, and it is a simpler design. There are some good
notes here on good and bad use cases for cache tiering:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
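
If you go the device-class route you mentioned, the gist (outside of
ceph-ansible) is a crush rule restricted to the ssd class, then pointing the
metadata pool at it. The rule name below is arbitrary; Luminous and later
assign the ssd/hdd device classes automatically:

  ceph osd crush rule create-replicated ssd-only default host ssd
  ceph osd pool set cephfs_metadata crush_rule ssd-only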


On Mon, Jun 3, 2019 at 3:35 PM Daniele Riccucci  wrote:

> Hello,
> sorry to jump in.
> I'm looking to expand with SSDs on an HDD cluster.
> I'm thinking about moving cephfs_metadata to the SSDs (maybe with device
> class?) or to use them as cache layer in front of the cluster.
> Any tips on how to do it with ceph-ansible?
> I can share the config I currently have if necessary.
> Thank you.
>
> Daniele
>
> On 17/04/19 17:01, Sinan Polat wrote:
> > I have deployed, expanded and upgraded multiple Ceph clusters using
> ceph-ansible. Works great.
> >
> > What information are you looking for?
> >
> > --
> > Sinan
> >
> >> Op 17 apr. 2019 om 16:24 heeft Francois Lafont <
> francois.lafont.1...@gmail.com> het volgende geschreven:
> >>
> >> Hi,
> >>
> >> +1 for ceph-ansible too. ;)
> >>
> >> --
> >> François (flaf)
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2018-10-06 Thread Shawn Iverson
I thought it was just me, guess not...

On Sat, Oct 6, 2018 at 1:28 PM Janne Johansson  wrote:

> Den lör 6 okt. 2018 kl 15:06 skrev Elias Abacioglu
> :
> >
> > Hi,
> >
> > I'm bumping this old thread because it's getting annoying. My membership
> gets disabled twice a month.
> > Between my two Gmail accounts I'm in more than 25 mailing lists and I
> see this behavior only here. Why is only ceph-users affected? Maybe
> Christian was on to something; is this intentional?
> > The reality is that there are a lot of ceph-users with Gmail accounts, so
> perhaps it wouldn't be so bad to actually try to figure this one out?
> >
> > So can the maintainers of this list please investigate what actually
> gets bounced? Look at my address if you want.
> > I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most
> recently.
> > Please help!
>
> Same here.
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-21 Thread Shawn Iverson
Glen,

Correction: I was looking at the wrong column for weights, my bad. You do
have varying weights, but the process is still the same: balance your buckets
(hosts) in your crush map, and balance the OSDs within each bucket (host).
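
The mechanics are the usual reweight commands (the OSD id and values below
are only examples):

  # Long-term: adjust an OSD's CRUSH weight (conventionally its capacity in TiB)
  ceph osd crush reweight osd.60 0.43660

  # Short-term: apply an override reweight (0.0-1.0) to push pgs off a fuller OSD
  ceph osd reweight osd.60 0.95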

On Sat, Jul 21, 2018 at 9:14 AM, Shawn Iverson  wrote:

> Glen,
>
> It appears you have 447G, 931G, and 558G disks in your cluster, all with a
> weight of 1.0.  This means that although the new disks are bigger, they are
> not going to be utilized by pgs any more than any other disk.
>
> I would suggest reweighting your other disks (they are smaller), so that
> you balance your cluster.  You should do this gradually over time,
> preferably during off-peak times, when remapping will not affect operations.
>
> I do a little math, first by taking total cluster capacity and dividing it
> by total capacity of each bucket.  I then do the same thing in each bucket,
> until everything is proportioned appropriately down to the osds.
>
> On Fri, Jul 20, 2018 at 8:43 PM, Glen Baars 
> wrote:
>
>> Hello Ceph Users,
>>
>>
>>
>> We have added more ssd storage to our ceph cluster last night. We added 4
>> x 1TB drives and the available space went from 1.6TB to 0.6TB ( in `ceph
>> df` for the SSD pool ).
>>
>>
>>
>> I would assume that the weight needs to be changed but I didn’t think I
>> would need to? Should I change them to 0.75 from 0.9 and hopefully it will
>> rebalance correctly?
>>
>>
>>
>> #ceph osd tree | grep -v hdd
>>
>> ID  CLASS WEIGHT    TYPE NAME                 STATUS REWEIGHT PRI-AFF
>>
>> -1   534.60309 root default
>>
>> -19    62.90637 host NAS-AUBUN-RK2-CEPH06
>>
>> 115   ssd   0.43660 osd.115   up  1.0 1.0
>>
>> 116   ssd   0.43660 osd.116   up  1.0 1.0
>>
>> 117   ssd   0.43660 osd.117   up  1.0 1.0
>>
>> 118   ssd   0.43660 osd.118   up  1.0 1.0
>>
>> -22   105.51169 host NAS-AUBUN-RK2-CEPH07
>>
>> 138   ssd   0.90970 osd.138   up  1.0 1.0
>> Added
>>
>> 139   ssd   0.90970 osd.139   up  1.0 1.0
>> Added
>>
>> -25   105.51169 host NAS-AUBUN-RK2-CEPH08
>>
>> 140   ssd   0.90970 osd.140   up  1.0 1.0
>> Added
>>
>> 141   ssd   0.90970 osd.141   up  1.0 1.0
>> Added
>>
>> -3     56.32617 host NAS-AUBUN-RK3-CEPH01
>>
>> 60   ssd   0.43660 osd.60up  1.0 1.0
>>
>> 61   ssd   0.43660 osd.61up  1.0 1.0
>>
>> 62   ssd   0.43660 osd.62up  1.0 1.0
>>
>> 63   ssd   0.43660 osd.63up  1.0 1.0
>>
>> -5     56.32617 host NAS-AUBUN-RK3-CEPH02
>>
>> 64   ssd   0.43660 osd.64up  1.0 1.0
>>
>> 65   ssd   0.43660 osd.65up  1.0 1.0
>>
>> 66   ssd   0.43660 osd.66up  1.0 1.0
>>
>> 67   ssd   0.43660 osd.67up  1.0 1.0
>>
>> -7     56.32617 host NAS-AUBUN-RK3-CEPH03
>>
>> 68   ssd   0.43660 osd.68up  1.0 1.0
>>
>> 69   ssd   0.43660 osd.69up  1.0 1.0
>>
>> 70   ssd   0.43660 osd.70up  1.0 1.0
>>
>> 71   ssd   0.43660 osd.71up  1.0 1.0
>>
>> -13    45.84741 host NAS-AUBUN-RK3-CEPH04
>>
>> 72   ssd   0.54579 osd.72up  1.0 1.0
>>
>> 73   ssd   0.54579 osd.73up  1.0 1.0
>>
>> 76   ssd   0.54579 osd.76up  1.0 1.0
>>
>> 77   ssd   0.54579 osd.77up  1.0 1.0
>>
>> -16    45.84741 host NAS-AUBUN-RK3-CEPH05
>>
>> 74   ssd   0.54579 osd.74up  1.0 1.0
>>
>> 75   ssd   0.54579 osd.75up  1.0 1.0
>>
>> 78   ssd   0.54579 osd.78up  1.0 1.0
>>
>> 79   ssd   0.54579 osd.79up  1.0 1.0
>>
>>
>>
>> # c

Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-21 Thread Shawn Iverson
>
> 62   ssd 0.43660  1.0  447G  275G  171G 61.58 1.89  95
>
> 63   ssd 0.43660  1.0  447G  260G  187G 58.15 1.79  97
>
> 64   ssd 0.43660  1.0  447G  232G  214G 52.08 1.60  83
>
> 65   ssd 0.43660  1.0  447G  207G  239G 46.36 1.42  75
>
> 66   ssd 0.43660  1.0  447G  217G  230G 48.54 1.49  84
>
> 67   ssd 0.43660  1.0  447G  252G  195G 56.36 1.73  92
>
> 68   ssd 0.43660  1.0  447G  248G  198G 55.56 1.71  94
>
> 69   ssd 0.43660  1.0  447G  229G  217G 51.25 1.57  84
>
> 70   ssd 0.43660  1.0  447G  259G  187G 58.01 1.78  87
>
> 71   ssd 0.43660  1.0  447G  267G  179G 59.83 1.84  97
>
> 72   ssd 0.54579  1.0  558G  217G  341G 38.96 1.20 100
>
> 73   ssd 0.54579  1.0  558G  283G  275G 50.75 1.56 121
>
> 76   ssd 0.54579  1.0  558G  286G  272G 51.33 1.58 129
>
> 77   ssd 0.54579  1.0  558G  246G  312G 44.07 1.35 104
>
> 74   ssd 0.54579  1.0  558G  273G  285G 48.91 1.50 122
>
> 75   ssd 0.54579  1.0  558G  281G  276G 50.45 1.55 114
>
> 78   ssd 0.54579  1.0  558G  289G  269G 51.80 1.59 133
>
> 79   ssd 0.54579  1.0  558G  276G  282G 49.39 1.52 119
>
> Kind regards,
>
> *Glen Baars*
>
> BackOnline Manager
>
>
> This e-mail is intended solely for the benefit of the addressee(s) and any
> other named recipient. It is confidential and may contain legally
> privileged or confidential information. If you are not the recipient, any
> use, distribution, disclosure or copying of this e-mail is prohibited. The
> confidentiality and legal privilege attached to this communication is not
> waived or lost by reason of the mistaken transmission or delivery to you.
> If you have received this e-mail in error, please notify us immediately.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs stalling on Intel SSDs

2018-07-10 Thread Shawn Iverson
Hi everybody,

I have a situation that occurs under moderate I/O load on Ceph Luminous:

2018-07-10 10:27:01.257916 mon.node4 mon.0 172.16.0.4:6789/0 15590 :
cluster [INF] mon.node4 is new leader, mons node4,node5,node6,node7,node8
in quorum (ranks 0,1,2,3,4)
2018-07-10 10:27:01.306329 mon.node4 mon.0 172.16.0.4:6789/0 15595 :
cluster [INF] Health check cleared: MON_DOWN (was: 1/5 mons down, quorum
node4,node6,node7,node8)
2018-07-10 10:27:01.386124 mon.node4 mon.0 172.16.0.4:6789/0 15596 :
cluster [WRN] overall HEALTH_WARN 1 osds down; Reduced data availability: 1
pg peering; Degraded data redundancy: 58774/10188798 objects degraded
(0.577%), 13 pgs degraded; 412 slow requests are blocked > 32 sec
2018-07-10 10:27:02.598175 mon.node4 mon.0 172.16.0.4:6789/0 15597 :
cluster [WRN] Health check update: Degraded data redundancy: 77153/10188798
objects degraded (0.757%), 17 pgs degraded (PG_DEGRADED)
2018-07-10 10:27:02.598225 mon.node4 mon.0 172.16.0.4:6789/0 15598 :
cluster [WRN] Health check update: 381 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-07-10 10:27:02.598264 mon.node4 mon.0 172.16.0.4:6789/0 15599 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 1 pg peering)
2018-07-10 10:27:02.608006 mon.node4 mon.0 172.16.0.4:6789/0 15600 :
cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-07-10 10:27:02.701029 mon.node4 mon.0 172.16.0.4:6789/0 15601 :
cluster [INF] osd.36 172.16.0.5:6800/3087 boot
2018-07-10 10:27:01.184334 osd.36 osd.36 172.16.0.5:6800/3087 23 : cluster
[WRN] Monitor daemon marked osd.36 down, but it is still running
2018-07-10 10:27:04.861372 mon.node4 mon.0 172.16.0.4:6789/0 15604 :
cluster [INF] Health check cleared: REQUEST_SLOW (was: 381 slow requests
are blocked > 32 sec)

The OSDs that seem to be affected are Intel SSDs, specifically model
SSDSC2BX480G4L.

I have throttled backups to try to lessen the load, but when it happens it
seems to hit the same OSDs. It also has the side effect of taking down the
mon on the same node for a few seconds and triggering a monitor election.

I am wondering if this may be a firmware issue on this drive model. Does
anyone have insight, or additional troubleshooting steps I could use to get
a deeper look at this behavior?

I am going to upgrade firmware on these drives and see if it helps.
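
For anyone following along, this is roughly how I am capturing firmware
revisions before and after the update (device paths are examples):

  for dev in /dev/sd[a-f]; do
      echo "== ${dev} =="
      smartctl -i "${dev}" | egrep 'Device Model|Serial Number|Firmware Version'
  done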

-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 x1171
ivers...@rushville.k12.in.us
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com