Re: [ceph-users] Usage of devices in SSD pool vary very much
On 1/26/19 10:24 PM, Kevin Olbrich wrote:
> I just had the time to check again: even after removing the broken OSD, mgr still crashes. All OSDs are up and in. If I run "ceph balancer on" on a HEALTH_OK cluster, an optimization plan is generated and started. After some minutes all MGRs die. This is a major problem for me, as I still have that SSD OSD that is unbalanced and limiting the whole pool's space.

Try to run the mgr with `debug mgr = 4/5` and look at the mgr log file.

k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
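In ceph.conf terms, the suggested debug level is a sketch like the one below; on a running mgr, something like `ceph daemon mgr.<id> config set debug_mgr 4/5` via the admin socket should apply it without a restart (daemon id is a placeholder):

```ini
[mgr]
    ; verbosity 4 to the log, 5 kept in memory for crash dumps
    debug mgr = 4/5
```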
Re: [ceph-users] How To Properly Failover a HA Setup
I tried setting noout and that did provide a bit better result. Basically I could stop the OSD on the inactive server and everything still worked (after a 2-3 second pause), but then when I rebooted the inactive server everything hung again until it came back online and resynced with the cluster.

This is what I saw in ceph -s:

    cluster eb2003cf-b16d-4551-adb7-892469447f89
     health HEALTH_WARN
            128 pgs degraded
            124 pgs stuck unclean
            128 pgs undersized
            recovery 805252/1610504 objects degraded (50.000%)
            mds cluster is degraded
            1/2 in osds are down
            noout flag(s) set
     monmap e1: 3 mons at {FILE1=10.1.1.201:6789/0,FILE2=10.1.1.202:6789/0,MON1=10.1.1.90:6789/0}
            election epoch 216, quorum 0,1,2 FILE1,FILE2,MON1
     fsmap e796: 1/1/1 up {0=FILE2=up:rejoin}
     osdmap e360: 2 osds: 1 up, 2 in; 128 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
     pgmap v7056802: 128 pgs, 3 pools, 164 GB data, 786 kobjects
            349 GB used, 550 GB / 899 GB avail
            805252/1610504 objects degraded (50.000%)
            128 active+undersized+degraded
     client io 1379 B/s rd, 1 op/s rd, 0 op/s wr

These are the commands I ran and the results:

    ceph osd set noout
    systemctl stop ceph-mds@FILE2.service
    # Everything still works on the clients...
    systemctl stop ceph-osd@0.service   # This was on FILE2 while FILE1 was the active fsmap
    # Fails over quickly, can still read content on the clients...
    # Rebooted FILE2
    # File access on the clients locked up until FILE2 rejoined

This is on Ubuntu 16 with kernel 4.4.0-141, so I'm not sure if that qualifies for David's warning about old kernels... Is there a command or a logfile I can look at that will better help to diagnose this issue?

Is three servers (with only 2 OSDs) enough to run an HA cluster on Ceph, or does it just die when it doesn't have 3 active servers for a quorum? Would installing MDS and MON on a 4th box (but sticking with 2 OSDs) be what's required to resolve this? I really don't want to do that, but if I have to I guess I can look into finding another box.
On 2019-01-21 5:01 p.m., ceph-users-requ...@lists.ceph.com wrote:

Message: 14
Date: Mon, 21 Jan 2019 10:05:15 +0100
From: Robert Sander
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How To Properly Failover a HA Setup
Message-ID: <587dac75-96bc-8719-ee62-38e71491c...@heinlein-support.de>
Content-Type: text/plain; charset="utf-8"

On 21.01.19 09:22, Charles Tassell wrote:
> Hello Everyone,
>
> I've got a 3 node Jewel cluster setup, and I think I'm missing something. When I want to take one of my nodes down for maintenance (kernel upgrades or the like) all of my clients (running the kernel module for the cephfs filesystem) hang for a couple of minutes before the redundant servers kick in.

Have you set the noout flag before doing cluster maintenance?

ceph osd set noout

and afterwards

ceph osd unset noout

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
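Robert's advice, spelled out as a command sketch (the target unit names are examples for the node being serviced; the flag must be unset afterwards, or down OSDs will never be marked out):

```shell
ceph osd set noout        # keep CRUSH from marking the node's OSDs "out" and rebalancing
# ... on the node: stop daemons, reboot, upgrade the kernel, etc. ...
# systemctl stop ceph-osd.target
# once the node is back and its OSDs have rejoined:
ceph osd unset noout
ceph -s                   # wait for the cluster to converge back to HEALTH_OK
```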
[ceph-users] Questions about using existing HW for PoC cluster
Hi all,

Kind of new to Ceph (have been using 10.2.11 on a 3-node Proxmox 4.x cluster [hyperconverged], works great!) and now I'm thinking of perhaps using it for a bigger data storage project at work, a PoC at first, but built as correctly as possible for performance and availability. I have the following server equipment available to use for the PoC; if it all goes well, I'd think new hardware for an actual production installation would be in order :)

For the OSD servers, I have (5) Intel R2312GL4GS 2U servers (c. 2013) with the following specs --
- (2) Intel Xeon E5-2660 CPUs (8-core, dual-threaded)
- 64GB memory
- (1) dual-port 10Gbase-T NIC (Intel X540-AT2)
- (1) dual-port Infiniband HBA (Mellanox MT27500 ConnectX-3) (probably won't use, and would remove)
- (4) Intel 1Gbase-T NICs (on mobo)
- (1) Intel 240GB SATA SSD (OS)
- (8) Hitachi 2TB SATA drives

I am not bound to using the existing disks in these servers, but also want to keep the price down, as this is only a PoC. Was thinking of either putting an Intel Optane 900P PCIe SSD (480G) in for journal, or else some sort of SATA SSD in one of the available front bays (it's a 12 hotswap-bay machine, + two internal SSD mounts.) I also could get some higher capacity (and newer!) SATA drives, so as to keep the number of OSDs down for a given capacity (shooting for 25-50TB to start.) However, I'd love it if I didn't have to ask for any money ;)

For monitor machines, I have available three Supermicro (c. 2011) 1U servers with:
- (2) Intel Xeon X5680 CPUs
- 48GB memory
- (2) 1Gbase-T NICs (on mobo)
- (1) WD 2TB SATA drive

I am also considering rack placement; the 5 servers I'd use for OSDs all currently live in one rack, and the mon servers in another. I could move them if necessary.

So, a few questions to start ;)

- Is the above an acceptable collection of useful equipment for a PoC of modern Ceph? (thinking of installing Mimic with Bluestore)
- Is putting the journal on a partition of the SATA drives a real I/O killer? (this is how my Proxmox boxes are set up)
- If YES to the above, then is a SATA SSD acceptable as a journal device, or should I definitely consider PCIe SSD? (I'd have to limit to one per server, which I know isn't optimal, but price prevents otherwise...)
- Should I spread the servers out over racks, which would probably force me to use 3 out of the 5 available OSD servers and put bigger disks in them to get the desired capacity (I only have three racks to work with), or is it OK for a PoC to keep all OSD servers in one rack?
- Are the platforms I'm proposing to use for monitor servers acceptable as-is, or do they need more memory, SSD drives, or 10GbE NICs?

OK, enough q's for now - thanks for helping a new Ceph'r out :)

Best,
Will
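As a rough sanity check on the stated 25-50TB target (my own back-of-the-envelope arithmetic, not from the thread): with the existing drives and the usual 3x replication, the five OSD nodes land right at the low end of that range, before any full-ratio headroom is considered.

```shell
# Back-of-the-envelope usable capacity, assuming 3x replication
nodes=5
drives_per_node=8
drive_tb=2
replicas=3
raw_tb=$((nodes * drives_per_node * drive_tb))
usable_tb=$((raw_tb / replicas))
echo "raw: ${raw_tb} TB, usable at ${replicas}x replication: ~${usable_tb} TB"
# raw: 80 TB, usable at 3x replication: ~26 TB
```

In practice you would want to stay well below full (e.g. 70-80% of that ~26 TB) so a failed node's data can be re-replicated.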
Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online
Hello,

this is where (depending on your topology) something like:

---
mon_osd_down_out_subtree_limit = host
---

can come in very handy. Provided you have correct monitoring, alerting and operations, a down node can often be restored long before any recovery would be finished, and you also avoid the data movement back and forth. And if you see that recovering the node will take a long time, just manually set things out for the time being.

Christian

On Sun, 27 Jan 2019 00:02:54 +0100 Götz Reinicke wrote:
> Dear Chris,
>
> Thanks for your feedback. The node/OSDs in question are part of an erasure coded pool, and during the weekend the workload should be close to none.
>
> But anyway, I could get a look at the console and at the server; the power is up, but I can't use any console: the login prompt is shown, but no key is accepted.
>
> I'll have to reboot the server and check what it is complaining about tomorrow morning, as soon as I can access the server again.
>
> Fingers crossed and regards. Götz
>
> > Am 26.01.2019 um 23:41 schrieb Chris :
> >
> > It sort of depends on your workload/use case. Recovery operations can be computationally expensive. If your load is light because it's the weekend, you should be able to turn that host back on as soon as you resolve whatever the issue is, with minimal impact. You can also increase the priority of the recovery operation to make it go faster if you feel you can spare additional IO and it won't affect clients.
> >
> > We do this in our cluster regularly and have yet to see an issue (given that we take care to do it during periods of lower client io)
> >
> > On January 26, 2019 17:16:38 Götz Reinicke wrote:
> >
> >> Hi,
> >>
> >> one host out of 10 is down for yet unknown reasons. I guess a power failure. I could not yet see the server.
> >>
> >> The Cluster is recovering and remapping fine, but still has some objects to process.
> >> My question: May I just switch the server back on, and in the best case the 24 OSDs get back online and recovery will do the job without problems?
> >>
> >> Or what might be a good way to handle that host? Should I first wait till the recovery is finished?
> >>
> >> Thanks for feedback and suggestions - Happy Saturday Night :) . Regards . Götz

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
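For reference, the option Christian mentions would live in ceph.conf on the mons; a sketch (with `host`, a whole down host is never automatically marked "out", while individual down OSDs still get marked out after the usual `mon osd down out interval`, 600s by default):

```ini
[global]
    ; never auto-mark "out" a down CRUSH subtree of size >= host,
    ; so a rebooted/crashed node can come back without triggering
    ; a full re-replication of all its OSDs
    mon osd down out subtree limit = host
```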
Re: [ceph-users] Bucket logging howto
From the owner account of the bucket I am trying to enable logging, but I don't get how this should work. I see that s3:PutBucketLogging is listed as supported, so I guess this should work. How do you enable it? And how do you access the log?

[@ ~]$ s3cmd -c .s3cfg accesslog s3://archive
Access logging for: s3://archive/
   Logging Enabled: False

[@ ~]$ s3cmd -c .s3cfg.archive accesslog s3://archive --access-logging-target-prefix=s3://archive/xx
ERROR: S3 error: 405 (MethodNotAllowed)
[ceph-users] Bucket logging howto
From the owner account of the bucket I am trying to enable logging, but I don't get how this should work. I see that s3:PutBucketLogging is listed as supported, so I guess this should work. How do you enable it? And how do you access the log?

[@ ~]$ s3cmd -c .s3cfg accesslog s3://archive
Access logging for: s3://archive/
   Logging Enabled: False
Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online
I went through this last weekend as I reformatted all the OSDs in a much smaller cluster. When turning nodes back on, PGs would sometimes move, only to move back, prolonging the operation and the system stress. What I took away is that it's least overall system stress to get the OSD tree back to its target state as quickly as is safe and practical. Replication will happen as replication will, but if the strategy changes midway, it just means the same speed of movement over a longer time.

> On Jan 26, 2019, at 15:41, Chris wrote:
>
> It sort of depends on your workload/use case. Recovery operations can be computationally expensive. If your load is light because it's the weekend, you should be able to turn that host back on as soon as you resolve whatever the issue is, with minimal impact. You can also increase the priority of the recovery operation to make it go faster if you feel you can spare additional IO and it won't affect clients.
>
> We do this in our cluster regularly and have yet to see an issue (given that we take care to do it during periods of lower client io)
>
>> On January 26, 2019 17:16:38 Götz Reinicke wrote:
>>
>> Hi,
>>
>> one host out of 10 is down for yet unknown reasons. I guess a power failure. I could not yet see the server.
>>
>> The Cluster is recovering and remapping fine, but still has some objects to process.
>>
>> My question: May I just switch the server back on, and in the best case the 24 OSDs get back online and recovery will do the job without problems?
>>
>> Or what might be a good way to handle that host? Should I first wait till the recovery is finished?
>>
>> Thanks for feedback and suggestions - Happy Saturday Night :) . Regards .
>> Götz
Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online
Dear Chris,

Thanks for your feedback. The node/OSDs in question are part of an erasure coded pool, and during the weekend the workload should be close to none.

But anyway, I could get a look at the console and at the server; the power is up, but I can't use any console: the login prompt is shown, but no key is accepted.

I'll have to reboot the server and check what it is complaining about tomorrow morning, as soon as I can access the server again.

Fingers crossed and regards. Götz

> Am 26.01.2019 um 23:41 schrieb Chris :
>
> It sort of depends on your workload/use case. Recovery operations can be computationally expensive. If your load is light because it's the weekend, you should be able to turn that host back on as soon as you resolve whatever the issue is, with minimal impact. You can also increase the priority of the recovery operation to make it go faster if you feel you can spare additional IO and it won't affect clients.
>
> We do this in our cluster regularly and have yet to see an issue (given that we take care to do it during periods of lower client io)
>
> On January 26, 2019 17:16:38 Götz Reinicke wrote:
>
>> Hi,
>>
>> one host out of 10 is down for yet unknown reasons. I guess a power failure. I could not yet see the server.
>>
>> The Cluster is recovering and remapping fine, but still has some objects to process.
>>
>> My question: May I just switch the server back on, and in the best case the 24 OSDs get back online and recovery will do the job without problems?
>>
>> Or what might be a good way to handle that host? Should I first wait till the recovery is finished?
>>
>> Thanks for feedback and suggestions - Happy Saturday Night :) . Regards . Götz
Re: [ceph-users] One host with 24 OSDs is offline - best way to get it back online
It sort of depends on your workload/use case. Recovery operations can be computationally expensive. If your load is light because it's the weekend, you should be able to turn that host back on as soon as you resolve whatever the issue is, with minimal impact. You can also increase the priority of the recovery operation to make it go faster if you feel you can spare additional IO and it won't affect clients.

We do this in our cluster regularly and have yet to see an issue (given that we take care to do it during periods of lower client io).

On January 26, 2019 17:16:38 Götz Reinicke wrote:

> Hi,
>
> one host out of 10 is down for yet unknown reasons. I guess a power failure. I could not yet see the server.
>
> The Cluster is recovering and remapping fine, but still has some objects to process.
>
> My question: May I just switch the server back on, and in the best case the 24 OSDs get back online and recovery will do the job without problems?
>
> Or what might be a good way to handle that host? Should I first wait till the recovery is finished?
>
> Thanks for feedback and suggestions - Happy Saturday Night :) . Regards . Götz
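"Increase the priority of the recovery operation" usually means raising the backfill/recovery throttles; a hedged sketch with example values (injectargs changes last only until the daemons restart, and the Luminous-era defaults are 1 and 3):

```shell
# temporarily allow more parallel backfill/recovery per OSD
ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
# ...and back down to the defaults once recovery is done:
ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'
```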
[ceph-users] One host with 24 OSDs is offline - best way to get it back online
Hi,

one host out of 10 is down for yet unknown reasons. I guess a power failure. I could not yet see the server.

The Cluster is recovering and remapping fine, but still has some objects to process.

My question: May I just switch the server back on, and in the best case the 24 OSDs get back online and recovery will do the job without problems?

Or what might be a good way to handle that host? Should I first wait till the recovery is finished?

Thanks for feedback and suggestions - Happy Saturday Night :) . Regards . Götz
Re: [ceph-users] repair do not work for inconsistent pg which three replica are the same
Am 10. Januar 2019 08:43:30 MEZ schrieb Wido den Hollander :
>
> On 1/10/19 8:36 AM, hnuzhoulin2 wrote:
>>
>> Hi cephers,
>>
>> I have two inconsistent PGs. I tried listing the inconsistent objects and got nothing:
>>
>> rados list-inconsistent-obj 388.c29
>> No scrub information available for pg 388.c29
>> error 2: (2) No such file or directory
>
> Have you tried to run a deep-scrub on this PG and see what that does?
>
> Wido
>
>> So I searched the log to find the object name, and I checked this object in all three replicas. Yes, all three replicas are the same (the md5 is the same).
>> The error log is:
>>
>> 388.c29 shard 295: soid 388:9430fef2:::c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4__multipart_dumbo%2f180888654%2f20181221%2fxtrabackup_full_x19_30044_20181221025000%2fx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595:head candidate had a read error

In addition I would check the underlying disk... perhaps something in dmesg?

- Mehmet

>> The object name is:
>> DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>
>> All md5s are: 73281ed56c92a56da078b1ae52e888e0
>>
>> The stat info is:
>>
>> root@cld-osd3-48:/home/ceph/var/lib/osd/ceph-33/current/388.c29_head# stat DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200   IO Block: 4096   regular file
>> Device: 891h/2193d   Inode: 4300403471   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)  Gid: (999/ceph)
>> Access: 2018-12-21 14:17:12.945132144 +0800
>> Modify: 2018-12-21 14:17:12.965132073 +0800
>> Change: 2018-12-21 14:17:13.761129235 +0800
>> Birth: -
>>
>> root@cld-osd24-48:/home/ceph/var/lib/osd/ceph-279/current/388.c29_head# stat DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200   IO Block: 4096   regular file
>> Device: 831h/2097d   Inode: 8646464869   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)  Gid: (999/ceph)
>> Access: 2019-01-07 10:54:23.010293026 +0800
>> Modify: 2019-01-07 10:54:23.010293026 +0800
>> Change: 2019-01-07 10:54:23.014293004 +0800
>> Birth: -
>>
>> root@cld-osd31-48:/home/ceph/var/lib/osd/ceph-363/current/388.c29_head# stat DIR_9/DIR_2/DIR_C/DIR_0/DIR_F/c2e226a9-b855-45c5-a17f-b1c697755072.1813469.4\\u\\umultipart\\udumbo\\s180888654\\s20181221\\sxtrabackup\\ufull\\ux19\\u30044\\u20181221025000\\sx19.xbstream.2~ntwW9vwutbmOJ4bDZYehERT2AokbtAi.3595__head_4F7F0C29__184
>>   Size: 4194304   Blocks: 8200   IO Block: 4096   regular file
>> Device: 831h/2097d   Inode: 13141445890   Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (999/ceph)  Gid: (999/ceph)
>> Access: 2018-12-21 14:17:12.946862160 +0800
>> Modify: 2018-12-21 14:17:12.966862262 +0800
>> Change: 2018-12-21 14:17:13.762866312 +0800
>> Birth: -
>>
>> The other PG is the same. I tried running deep-scrub and repair; they do not work.
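The manual replica comparison done above can be scripted; a small sketch (the arguments are whatever paths the three `stat` commands used) that just reports whether all given replica files carry the same md5:

```shell
# Compare checksums of the same object file as found on each replica OSD.
# Usage: compare_replicas /osd-a/path/to/obj /osd-b/path/to/obj /osd-c/path/to/obj
compare_replicas() {
    local distinct
    # one md5 per file -> keep only the hash column -> count distinct hashes
    distinct=$(md5sum "$@" | awk '{print $1}' | sort -u | wc -l)
    if [ "$distinct" -eq 1 ]; then
        echo "all replicas identical"
    else
        echo "replicas differ ($distinct distinct checksums)"
    fi
}
```

Note that matching checksums across replicas do not rule out a transient read error on one disk, which is why checking dmesg (as Mehmet suggests) is still worthwhile.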
Re: [ceph-users] Usage of devices in SSD pool vary very much
Hi!

I just had the time to check again: even after removing the broken OSD, mgr still crashes. All OSDs are up and in. If I run "ceph balancer on" on a HEALTH_OK cluster, an optimization plan is generated and started. After some minutes all MGRs die. This is a major problem for me, as I still have that SSD OSD that is unbalanced and limiting the whole pool's space.

root@adminnode:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                      STATUS REWEIGHT PRI-AFF
 -1       29.91933 root default
-16       29.91933     datacenter dc01
-19       29.91933         pod dc01-agg01
-10       16.52396             rack dc01-rack02
 -4        6.29695                 host node1001
  0   hdd  0.90999                     osd.0          up  1.0     1.0
  1   hdd  0.90999                     osd.1          up  1.0     1.0
  5   hdd  0.90999                     osd.5          up  1.0     1.0
 29   hdd  0.90970                     osd.29         up  1.0     1.0
 33   hdd  0.90970                     osd.33         up  1.0     1.0
  2   ssd  0.43700                     osd.2          up  1.0     1.0
  3   ssd  0.43700                     osd.3          up  1.0     1.0
  4   ssd  0.43700                     osd.4          up  1.0     1.0
 30   ssd  0.43660                     osd.30         up  1.0     1.0
 -7        6.29724                 host node1002
  9   hdd  0.90999                     osd.9          up  1.0     1.0
 10   hdd  0.90999                     osd.10         up  1.0     1.0
 11   hdd  0.90999                     osd.11         up  1.0     1.0
 12   hdd  0.90999                     osd.12         up  1.0     1.0
 35   hdd  0.90970                     osd.35         up  1.0     1.0
  6   ssd  0.43700                     osd.6          up  1.0     1.0
  7   ssd  0.43700                     osd.7          up  1.0     1.0
  8   ssd  0.43700                     osd.8          up  1.0     1.0
 31   ssd  0.43660                     osd.31         up  1.0     1.0
-28        2.18318                 host node1005
 34   ssd  0.43660                     osd.34         up  1.0     1.0
 36   ssd  0.87329                     osd.36         up  1.0     1.0
 37   ssd  0.87329                     osd.37         up  1.0     1.0
-29        1.74658                 host node1006
 42   ssd  0.87329                     osd.42         up  1.0     1.0
 43   ssd  0.87329                     osd.43         up  1.0     1.0
-11       13.39537             rack dc01-rack03
-22        5.38794                 host node1003
 17   hdd  0.90999                     osd.17         up  1.0     1.0
 18   hdd  0.90999                     osd.18         up  1.0     1.0
 24   hdd  0.90999                     osd.24         up  1.0     1.0
 26   hdd  0.90999                     osd.26         up  1.0     1.0
 13   ssd  0.43700                     osd.13         up  1.0     1.0
 14   ssd  0.43700                     osd.14         up  1.0     1.0
 15   ssd  0.43700                     osd.15         up  1.0     1.0
 16   ssd  0.43700                     osd.16         up  1.0     1.0
-25        5.38765                 host node1004
 23   hdd  0.90999                     osd.23         up  1.0     1.0
 25   hdd  0.90999                     osd.25         up  1.0     1.0
 27   hdd  0.90999                     osd.27         up  1.0     1.0
 28   hdd  0.90970                     osd.28         up  1.0     1.0
 19   ssd  0.43700                     osd.19         up  1.0     1.0
 20   ssd  0.43700                     osd.20         up  1.0     1.0
 21   ssd  0.43700                     osd.21         up  1.0     1.0
 22   ssd  0.43700                     osd.22         up  1.0     1.0
-30        2.61978                 host node1007
 38   ssd  0.43660                     osd.38         up  1.0     1.0
 39   ssd  0.43660                     osd.39         up  1.0     1.0
 40   ssd  0.87329                     osd.40         up  1.0     1.0
 41   ssd  0.87329                     osd.41         up  1.0     1.0

root@adminnode:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 0   hdd 0.90999  1.0     932GiB 353GiB 579GiB 37.87 0.83  95
 1   hdd 0.90999  1.0     932GiB 400GiB 531GiB 42.98 0.94 108
 5   hdd 0.90999  1.0     932GiB 267GiB 664GiB 28.70 0.63  72
29   hdd 0.90970  1.0     932GiB 356GiB 576GiB 38.19 0.84  96
33   hdd 0.90970  1.0     932GiB 344GiB 587GiB 36.94 0.81  93
 2   ssd 0.43700  1.0     447GiB 273GiB 174GiB 61.09 1.34  52
 3   ssd 0.43700  1.0     447GiB 252GiB 195GiB 56.38 1.23  61
 4   ssd 0.43700  1.0     447GiB 308GiB 140GiB 68.78 1.51  59
30   ssd 0.43660  1.0     447GiB 231GiB 216GiB 51.77 1.13  48
 9   hdd 0.90999  1.0     932GiB 358GiB 573GiB 38.48 0.84  97
10   hdd 0.90999  1.0     932GiB 347GiB 585GiB 37.25 0.82  94
11   hdd 0.90999
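To narrow down where the mgr dies, the balancer can also be driven one plan at a time instead of "on" mode; a sketch (plan name is a placeholder; subcommands as in Luminous/Mimic):

```shell
ceph balancer mode upmap          # or crush-compat, depending on client support
ceph balancer eval                # score the current distribution
ceph balancer optimize myplan     # build a plan without executing it
ceph balancer show myplan         # inspect the proposed changes
ceph balancer execute myplan      # apply only after review
```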
Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed
> Am 26.01.2019 um 14:16 schrieb Kevin Olbrich :
>
> Am Sa., 26. Jan. 2019 um 13:43 Uhr schrieb Götz Reinicke :
>>
>> Hi,
>>
>> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>>
>> I grew that rbd and ext4, starting with a 2TB rbd, this way:
>>
>> rbd resize testpool/disk01 --size 4194304
>> resize2fs /dev/rbd0
>>
>> Today I wanted to extend that ext4 to 8 TB and did:
>>
>> rbd resize testpool/disk01 --size 8388608
>> resize2fs /dev/rbd0
>>
>> => which gives an error: The filesystem is already 1073741824 blocks. Nothing to do.
>>
>> I bet I missed something very simple. Any hint? Thanks and regards . Götz
>
> Try "partprobe" to read the device metrics again.

Did not change anything and did not give any output/log messages.

/Götz
Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed
Am Sa., 26. Jan. 2019 um 13:43 Uhr schrieb Götz Reinicke :
>
> Hi,
>
> I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.
>
> I grew that rbd and ext4, starting with a 2TB rbd, this way:
>
> rbd resize testpool/disk01 --size 4194304
> resize2fs /dev/rbd0
>
> Today I wanted to extend that ext4 to 8 TB and did:
>
> rbd resize testpool/disk01 --size 8388608
> resize2fs /dev/rbd0
>
> => which gives an error: The filesystem is already 1073741824 blocks. Nothing to do.
>
> I bet I missed something very simple. Any hint? Thanks and regards . Götz

Try "partprobe" to read the device metrics again.
[ceph-users] Rezising an online mounted ext4 on a rbd - failed
Hi,

I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.

I grew that rbd and ext4, starting with a 2TB rbd, this way:

rbd resize testpool/disk01 --size 4194304
resize2fs /dev/rbd0

Today I wanted to extend that ext4 to 8 TB and did:

rbd resize testpool/disk01 --size 8388608
resize2fs /dev/rbd0

=> which gives an error: The filesystem is already 1073741824 blocks. Nothing to do.

I bet I missed something very simple. Any hint? Thanks and regards . Götz
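The numbers in the error line up with the old size; my own arithmetic (`rbd resize --size` takes MiB, and the ext4 here uses 4 KiB blocks) suggests resize2fs still sees a 4 TiB device:

```shell
# rbd --size is in MiB; resize2fs reports 4 KiB ext4 blocks
old_mib=4194304
new_mib=8388608
reported_blocks=1073741824
echo "old rbd size:  $((old_mib / 1024 / 1024)) TiB"
echo "new rbd size:  $((new_mib / 1024 / 1024)) TiB"
echo "device seen by resize2fs: $((reported_blocks * 4 / 1024 / 1024 / 1024)) TiB"
# old rbd size:  4 TiB
# new rbd size:  8 TiB
# device seen by resize2fs: 4 TiB
```

In other words, the kernel block device apparently never picked up the grown rbd; `blockdev --getsize64 /dev/rbd0` on the fileserver would confirm what size the kernel currently believes.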
Re: [ceph-users] Migrating to a dedicated cluster network
Paul Emmerich writes:

> Split networks is rarely worth it. One fast network is usually better. And since you mentioned having only two interfaces: one bond is way better than two independent interfaces.
>
> IPv4/6 dual stack setups will be supported in Nautilus; you currently have to use either IPv4 or IPv6.
>
> Jumbo frames: often mentioned but usually not worth it. (Yes, I know that this is somewhat controversial and increasing the MTU is often a standard trick for performance tuning, but I have yet to see a benchmark that actually shows a significant performance improvement. Some quick tests show that I can save around 5-10% CPU load on a system doing ~50 gbit/s of IO traffic, which is almost nothing given the total system load.)

Agree with everything Paul said. (I know this is lame, but I think all of this bears repeating :-)

To address another question in Jan's original post: I would not consider using link-local IPv6 addressing. Not just because I doubt that this would work (Ceph would always need to know/tell the OS which interface it should use with such an address), but mainly because even if it does work, it will only work as long as everything is on a single logical IPv6 network. This will artificially limit your options for the evolution of your cluster. Routable addresses are cheap in IPv6 - use them!

--
Simon.
Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools
cmonty14 writes:

> due to performance issues RGW is not an option. This statement may be wrong, but there's the following aspect to consider.
> If I write a backup that is typically a large file, this is normally a single IO stream.
> This causes massive performance issues on Ceph because this single IO stream is sequentially written in small pieces on OSDs.
> To overcome this issue multiple IO streams should be used when writing large files, and this means the application writing the backup must support multiple IO streams.

RGW (and the S3 protocol in general) supports multi-stream uploads nicely, via the "multipart upload" feature: you split your file into many pieces, which can be uploaded in parallel.

RGW with multipart uploads seems like a good fit for your application. It could solve your naming and permission issues, has low overhead, and could give you good performance as long as you use multipart uploads with parallel threads. You just need to make sure that your RGW gateways have enough throughput, but this capacity is relatively easy and inexpensive to provide.

> Considering this the following question comes up: If I write a backup into an RBD (that could be considered as a network share), will Ceph use a single IO stream or multiple IO streams on the storage side?

Ceph should be able to handle multiple parallel streams of I/O to an RBD device (in general, writes will go to different "chunks" of the RBD, and those chunk objects will be on different OSDs). But it's another question whether your RBD client will be able to issue parallel streams of requests. Usually you have some kind of file system and kernel block I/O layer on the client side, and it's possible that those will serialize I/O, which will make it hard to get high throughput.

--
Simon.

> THX
> Am Di., 22. Jan. 2019 um 23:20 Uhr schrieb Christian Wuerdig :
>>
>> If you use librados directly it's up to you to ensure you can identify your objects. Generally RADOS stores objects and not files, so when you provide your object ids you need to come up with a convention so you can correctly identify them. If you need to provide meta data (i.e. a list of all existing backups, when they were taken etc.) then again you need to manage that yourself (probably in dedicated meta-data objects). Using RADOS namespaces (like one per database) is probably a good idea.
>> Also keep in mind that for example Bluestore has a maximum object size of 4GB, so mapping files 1:1 to objects is probably not a wise approach and you should break up your files into smaller chunks when storing them. There is libradosstriper which handles the striping of large objects transparently, but I am not sure if that has support for RADOS namespaces.
>>
>> Using RGW instead might be an easier route to go down.
>>
>> On Wed, 23 Jan 2019 at 10:10, cmonty14 <74cmo...@gmail.com> wrote:
>>>
>>> My backup client is using librados.
>>> I understand that defining a pool for the same application is recommended.
>>>
>>> However this would not answer my other questions:
>>> How can I identify a backup created by client A that I want to restore on another client Z?
>>> I mean typically client A would write a backup file identified by the filename.
>>> Would it be possible on client Z to identify this backup file by filename? If yes, how?
>>>
>>> Am Di., 22. Jan. 2019 um 15:07 Uhr schrieb :
>>> >
>>> > Hi,
>>> >
>>> > Ceph's pools are meant to let you define specific engineering rules and/or an application (rbd, cephfs, rgw).
>>> > They are not designed to be created in a massive fashion (see pgs etc), so create a pool for each engineering ruleset and store your data in them.
>>> > For what is left of your project, I believe you have to implement that on top of Ceph.
>>> >
>>> > For instance, let's say you simply create a pool, with an rbd volume in it.
>>> > You then create a filesystem on that, and map it on some server.
>>> > Finally, you can push your files to that mountpoint, using various Linux users, ACLs or whatever: beyond that point, there is nothing more specific to Ceph, it is "just" a mounted filesystem.
>>> >
>>> > Regards,
>>> >
>>> > On 01/22/2019 02:16 PM, cmonty14 wrote:
>>> > > Hi,
>>> > >
>>> > > my use case for Ceph is providing central backup storage.
>>> > > This means I will back up multiple databases in the Ceph storage cluster.
>>> > >
>>> > > This is my question:
>>> > > What is the best practice for creating pools & images?
>>> > > Should I create multiple pools, i.e. one pool per database?
>>> > > Or should I create a single pool "backup" and use a namespace when writing data to the pool?
>>> > >
>>> > > This is the security demand that should be considered:
>>> > > DB-owner A can only modify the files that belong to A; other files (owned by B, C or D) are not accessible for A.
>>> > >
>>> > > And there's another issue:
>>> > > How can I identify a backup created by client A
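Simon's multipart suggestion can be sketched with s3cmd, which switches to multipart automatically above its chunk threshold (the bucket path, file name and 64 MiB chunk size below are made-up example values):

```shell
# Hypothetical 8 GiB backup uploaded in 64 MiB parts:
# s3cmd put --multipart-chunk-size-mb=64 db_backup.tar s3://backups/clientA/db_backup.tar
file_mib=$((8 * 1024))
chunk_mib=64
# ceiling division: number of independently uploadable (and retryable) parts
parts=$(( (file_mib + chunk_mib - 1) / chunk_mib ))
echo "parts: $parts"
# parts: 128
```

Each part is a separate PUT, so a client with several upload threads gets the multiple parallel IO streams discussed above without the backup application itself having to know about Ceph.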