Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
Thanks David. Thanks again Cary. If I have 682 GB used, 12998 GB / 13680 GB avail, then I still need to divide 13680/3 (my replication setting) to get what my total storage really is, right? Thanks! James Okken Lab Manager Dialogic Research Inc. 4 Gatehall Drive Parsippany NJ 07054 USA Tel: 973 967 5179 Email: james.ok...@dialogic.com Web: www.dialogic.com – The Network Fuel Company This e-mail is intended only for the named recipient(s) and may contain information that is privileged, confidential and/or exempt from disclosure under applicable law. No waiver of privilege, confidence or otherwise is intended by virtue of communication via the internet. Any unauthorized use, dissemination or copying is strictly prohibited. If you have received this e-mail in error, or are not named as a recipient, please immediately notify the sender and destroy all copies of this e-mail. -Original Message- From: Cary [mailto:dynamic.c...@gmail.com] Sent: Friday, December 15, 2017 5:56 PM To: David Turner Cc: James Okken; ceph-users@lists.ceph.com Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster) James, You can set these values in ceph.conf. [global] ... osd pool default size = 3 osd pool default min size = 2 ... New pools that are created will use those values. If you run a "ceph -s" and look at the "usage" line, it shows how much space is: 1 used, 2 available, 3 total. ie. usage: 19465 GB used, 60113 GB / 79578 GB avail We choose to use Openstack with Ceph in this decade and do the other things, not because they are easy, but because they are hard...;-p Cary -Dynamic On Fri, Dec 15, 2017 at 10:12 PM, David Turner <drakonst...@gmail.com> wrote: > In conjunction with increasing the pool size to 3, also increase the > pool min_size to 2. `ceph df` and `ceph osd df` will eventually show > the full size in use in your cluster. In particular the output of > `ceph df` with available size in a pool takes into account the pools > replication size. > Continue watching ceph -s or ceph -w to see when the backfilling for > your change to replication size finishes. > > On Fri, Dec 15, 2017 at 5:06 PM James Okken <james.ok...@dialogic.com> > wrote: >> >> This whole effort went extremely well, thanks to Cary, and Im not >> used to that with CEPH so far. (And openstack ever) Thank you >> Cary. >> >> Ive upped the replication factor and now I see "replicated size 3" in >> each of my pools. Is this the only place to check replication level? >> Is there a Global setting or only a setting per Pool? >> >> ceph osd pool ls detail >> pool 0 'rbd' replicated size 3.. >> pool 1 'images' replicated size 3... >> ... >> >> One last question! >> At this replication level how can I tell how much total space I >> actually have now? >> Do I just 1/3 the Global size? >> >> ceph df >> GLOBAL: >> SIZE AVAIL RAW USED %RAW USED >> 13680G 12998G 682G 4.99 >> POOLS: >> NAMEID USED %USED MAX AVAIL OBJECTS >> rbd 0 0 0 6448G 0 >> images 1 216G 3.24 6448G 27745 >> backups 2 0 0 6448G 0 >> volumes 3 117G 1.79 6448G 30441 >> compute 4 0 0 6448G 0 >> >> ceph osd df >> ID WEIGHT REWEIGHT SIZE USEAVAIL %USE VAR PGS >> 0 0.81689 1.0 836G 36549M 800G 4.27 0.86 67 >> 4 3.7 1.0 3723G 170G 3553G 4.58 0.92 270 >> 1 0.81689 1.0 836G 49612M 788G 5.79 1.16 56 >> 5 3.7 1.0 3723G 192G 3531G 5.17 1.04 282 >> 2 0.81689 1.0 836G 33639M 803G 3.93 0.79 58 >> 3 3.7 1.0 3723G 202G 3521G 5.43 1.09 291 >> TOTAL 13680G 682G 12998G 4.99 >> MIN/MAX VAR: 0.79/1.16 STDDEV: 0.67 >> >> Thanks! 
>> >> -Original Message- >> From: Cary [mailto:dynamic.c...@gmail.com] >> Sent: Friday, December 15, 2017 4:05 PM >> To: James Okken >> Cc: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server >> cluster) >> >> James, >> >> Those errors are normal. Ceph creates the missing files. You can >> check "/var/lib/ceph/osd/ceph-6", before and after you run those >> commands to see what files are added there. >> >> Make sure you get the replication factor set. >> >> >> Cary >> -Dynamic >> >> On Fri, Dec 15, 2017 at 6:11 PM, Jam
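For what it's worth, the division James asks about is the right instinct. A rough, untested sanity check against the numbers quoted above:

# With size=3 every byte is stored three times, so usable capacity is
# roughly the raw numbers from "ceph df" divided by 3:
#   13680G raw total / 3  ~= 4560G usable overall
#   12998G raw avail / 3  ~= 4332G still writable (before nearfull limits)
ceph df detail     # per-pool USED vs MAX AVAIL; MAX AVAIL already factors in replication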
Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
This whole effort went extremely well, thanks to Cary, and I'm not used to that with CEPH so far (or with OpenStack, ever). Thank you, Cary.

I've upped the replication factor and now I see "replicated size 3" in each of my pools. Is this the only place to check the replication level? Is there a global setting, or only a setting per pool?

ceph osd pool ls detail
pool 0 'rbd' replicated size 3..
pool 1 'images' replicated size 3...
...

One last question!
At this replication level, how can I tell how much total space I actually have now?
Do I just take 1/3 of the Global size?

ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    13680G     12998G     682G         4.99
POOLS:
    NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd         0      0        0         6448G         0
    images      1      216G     3.24      6448G         27745
    backups     2      0        0         6448G         0
    volumes     3      117G     1.79      6448G         30441
    compute     4      0        0         6448G         0

ceph osd df
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
 0 0.81689 1.0       836G 36549M  800G 4.27 0.86  67
 4 3.7     1.0      3723G   170G 3553G 4.58 0.92 270
 1 0.81689 1.0       836G 49612M  788G 5.79 1.16  56
 5 3.7     1.0      3723G   192G 3531G 5.17 1.04 282
 2 0.81689 1.0       836G 33639M  803G 3.93 0.79  58
 3 3.7     1.0      3723G   202G 3521G 5.43 1.09 291
           TOTAL   13680G   682G 12998G 4.99
MIN/MAX VAR: 0.79/1.16 STDDEV: 0.67

Thanks!

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com]
Sent: Friday, December 15, 2017 4:05 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

James,

Those errors are normal. Ceph creates the missing files. You can check "/var/lib/ceph/osd/ceph-6" before and after you run those commands to see what files are added there.

Make sure you get the replication factor set.

Cary
-Dynamic

On Fri, Dec 15, 2017 at 6:11 PM, James Okken <james.ok...@dialogic.com> wrote:
> Thanks again Cary,
>
> Yes, once all the backfilling was done I was back to a healthy cluster.
> I moved on to the same steps for the next server in the cluster; it is backfilling now.
> Once that is done I will do the last server in the cluster, and then I think I am done!
>
> Just checking on one thing. I get these messages when running this command. I assume this is OK, right?
> root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07
> 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
> 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f
> 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
> 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-4/keyring
>
> thanks
>
> -Original Message-
> From: Cary [mailto:dynamic.c...@gmail.com]
> Sent: Thursday, December 14, 2017 7:13 PM
> To: James Okken
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
>
> James,
>
> Usually once the misplaced data has balanced out, the cluster should reach a healthy state.
If you run a "ceph health detail" Ceph will show you some more > detail about what is happening. Is Ceph still recovering, or has it stalled? > has the "objects misplaced (62.511%" > changed to a lower %? > > Cary > -Dynamic > > On Thu, Dec 14, 2017 at 10:52 PM, James Okken <james.ok...@dialogic.com> > wrote: >> Thanks Cary! >> >> Your directions worked on my first sever. (once I found the missing carriage >> return in your list of commands, the email musta messed it up. >> >> For anyone else: >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd >> 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is >> 2 commands: >> chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 and ceph auth add osd.4 >> osd 'allow
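As an aside, replication is a per-pool property; the ceph.conf values quoted elsewhere in this thread only set the default for pools created afterwards. An untested, hedged way to check both on a live cluster (using the images pool as the example):

ceph osd pool get images size        # replica count for one existing pool
ceph osd pool get images min_size    # how many copies must be up for IO to continue
ceph --show-config | grep osd_pool_default_size   # the default applied to *new* pools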
Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
Thanks again Cary, Yes, once all the backfilling was done I was back to a Healthy cluster. I moved on to the same steps for the next server in the cluster, it is backfilling now. Once that is done I will do the last server in the cluster, and then I think I am done! Just checking on one thing. I get these messages when running this command. I assume this is OK, right? root@node-54:~# ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07 2017-12-15 17:28:22.849534 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2017-12-15 17:28:22.855838 7fd2f9e928c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2017-12-15 17:28:22.856444 7fd2f9e928c0 -1 filestore(/var/lib/ceph/osd/ceph-4) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory 2017-12-15 17:28:22.893443 7fd2f9e928c0 -1 created object store /var/lib/ceph/osd/ceph-4 for osd.4 fsid 2b9f7957-d0db-481e-923e-89972f6c594f 2017-12-15 17:28:22.893484 7fd2f9e928c0 -1 auth: error reading file: /var/lib/ceph/osd/ceph-4/keyring: can't open /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory 2017-12-15 17:28:22.893662 7fd2f9e928c0 -1 created new key in keyring /var/lib/ceph/osd/ceph-4/keyring thanks -Original Message- From: Cary [mailto:dynamic.c...@gmail.com] Sent: Thursday, December 14, 2017 7:13 PM To: James Okken Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster) James, Usually once the misplaced data has balanced out the cluster should reach a healthy state. If you run a "ceph health detail" Ceph will show you some more detail about what is happening. Is Ceph still recovering, or has it stalled? has the "objects misplaced (62.511%" changed to a lower %? Cary -Dynamic On Thu, Dec 14, 2017 at 10:52 PM, James Okken <james.ok...@dialogic.com> wrote: > Thanks Cary! > > Your directions worked on my first sever. (once I found the missing carriage > return in your list of commands, the email musta messed it up. > > For anyone else: > chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd > 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring really is 2 > commands: > chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 and ceph auth add osd.4 > osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring > > Cary, what am I looking for in ceph -w and ceph -s to show the status of the > data moving? > Seems like the data is moving and that I have some issue... 
> > root@node-53:~# ceph -w > cluster 2b9f7957-d0db-481e-923e-89972f6c594f > health HEALTH_WARN > 176 pgs backfill_wait > 1 pgs backfilling > 27 pgs degraded > 1 pgs recovering > 26 pgs recovery_wait > 27 pgs stuck degraded > 204 pgs stuck unclean > recovery 10322/84644 objects degraded (12.195%) > recovery 52912/84644 objects misplaced (62.511%) > monmap e3: 3 mons at > {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0} > election epoch 138, quorum 0,1,2 node-45,node-44,node-43 > osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs > flags sortbitwise,require_jewel_osds > pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects > 370 GB used, 5862 GB / 6233 GB avail > 10322/84644 objects degraded (12.195%) > 52912/84644 objects misplaced (62.511%) > 308 active+clean > 176 active+remapped+wait_backfill > 26 active+recovery_wait+degraded >1 active+remapped+backfilling >1 active+recovering+degraded recovery io 100605 > kB/s, 14 objects/s > client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr > > 2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 > activating, 1 active+recovering+degraded, 26 > active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 > active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB > used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; > 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced > (62.511%); 12224 kB/s, 2 objects/s recovering > 2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 > active+recovering+degraded, 26 active+recovery_wait+degraded, 1 > active+remapped+backfilling, 308 active+clean, 176 > active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / > 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects > de
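In case it is useful, a small untested wrapper around the same commands for watching the backfill drain down:

watch -n 30 "ceph -s | egrep 'health|degraded|misplaced|backfill'"
ceph health detail | head -20    # per-PG detail if the misplaced % stops dropping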
Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
Thanks Cary!

Your directions worked on my first server. (Once I found the missing carriage return in your list of commands; the email must have messed it up.)

For anyone else:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4 ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
really is 2 commands:
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
and
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring

Cary, what am I looking for in ceph -w and ceph -s to show the status of the data moving?
Seems like the data is moving, and that I have some issue...

root@node-53:~# ceph -w
    cluster 2b9f7957-d0db-481e-923e-89972f6c594f
     health HEALTH_WARN
            176 pgs backfill_wait
            1 pgs backfilling
            27 pgs degraded
            1 pgs recovering
            26 pgs recovery_wait
            27 pgs stuck degraded
            204 pgs stuck unclean
            recovery 10322/84644 objects degraded (12.195%)
            recovery 52912/84644 objects misplaced (62.511%)
     monmap e3: 3 mons at {node-43=192.168.1.7:6789/0,node-44=192.168.1.5:6789/0,node-45=192.168.1.3:6789/0}
            election epoch 138, quorum 0,1,2 node-45,node-44,node-43
     osdmap e206: 4 osds: 4 up, 4 in; 177 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v3936175: 512 pgs, 5 pools, 333 GB data, 58184 objects
            370 GB used, 5862 GB / 6233 GB avail
            10322/84644 objects degraded (12.195%)
            52912/84644 objects misplaced (62.511%)
                 308 active+clean
                 176 active+remapped+wait_backfill
                  26 active+recovery_wait+degraded
                   1 active+remapped+backfilling
                   1 active+recovering+degraded
recovery io 100605 kB/s, 14 objects/s
  client io 0 B/s rd, 92788 B/s wr, 50 op/s rd, 11 op/s wr

2017-12-14 22:45:57.459846 mon.0 [INF] pgmap v3936174: 512 pgs: 1 activating, 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 307 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 369 GB used, 5863 GB / 6233 GB avail; 0 B/s rd, 101107 B/s wr, 19 op/s; 10354/84644 objects degraded (12.232%); 52912/84644 objects misplaced (62.511%); 12224 kB/s, 2 objects/s recovering
2017-12-14 22:45:58.466736 mon.0 [INF] pgmap v3936175: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 92788 B/s wr, 61 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 100605 kB/s, 14 objects/s recovering
2017-12-14 22:46:00.474335 mon.0 [INF] pgmap v3936176: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 434 kB/s wr, 45 op/s; 10322/84644 objects degraded (12.195%); 52912/84644 objects misplaced (62.511%); 84234 kB/s, 10 objects/s recovering
2017-12-14 22:46:02.482228 mon.0 [INF] pgmap v3936177: 512 pgs: 1 active+recovering+degraded, 26 active+recovery_wait+degraded, 1 active+remapped+backfilling, 308 active+clean, 176 active+remapped+wait_backfill; 333 GB data, 370 GB used, 5862 GB / 6233 GB avail; 0 B/s rd, 334 kB/s wr

-Original Message-
From: Cary [mailto:dynamic.c...@gmail.com]
Sent: Thursday, December 14, 2017 4:21 PM
To: James Okken
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] add hard drives to 3 CEPH servers (3 server cluster)

Jim,

I am not an expert, but I believe I can assist. Normally you will only have 1 OSD per drive. I have heard discussions about using multiple OSDs per disk when using SSDs, though.
Once your drives have been installed you will have to format them, unless you are using Bluestore. My steps for formatting are below. Replace the sXX with your drive name.

parted -a optimal /dev/sXX
# then, at the (parted) prompt:
print
mklabel gpt
unit mib
mkpart OSD4sdd1 1 -1
quit

mkfs.xfs -f /dev/sXX1

# Run blkid, and copy the UUID for the newly formatted drive.
blkid

# Add the mount point/UUID to fstab. The mount point will be created later.
vi /etc/fstab
# For example:
UUID=6386bac4-7fef-3cd2-7d64-13db51d83b12 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64,logbufs=8 0 0

# You can then add the OSD to the cluster.
uuidgen

# Replace the UUID below with the UUID that was created with uuidgen.
ceph osd create 23e734d7-96d8-4327-a2b9-0fbdc72ed8f1
# Notice what OSD number it creates; it is usually the lowest OSD number available.

# Add osd.4 to ceph.conf on all Ceph nodes.
vi /etc/ceph/ceph.conf
...
[osd.4]
public addr = 172.1.3.1
cluster addr = 10.1.3.1
...

# Now add the mount point.
mkdir -p /var/lib/ceph/osd/ceph-4
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4

# The command below mounts everything in fstab.
mount
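Cary's message is cut off by the digest at that final mount step. Pieced together from the commands quoted elsewhere in this thread (the ceph-osd --mkfs/--mkkey run and the chown/auth-add pair), plus the usual CRUSH add and service start, the remaining steps would look roughly like this; the CRUSH weight and hostname here are examples, not Cary's exact text:

mount -a        # mount the new /var/lib/ceph/osd/ceph-4 from the fstab entry above
ceph-osd -i 4 --mkfs --mkkey --osd-uuid 25c21708-f756-4593-bc9e-c5506622cf07
chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
# the auth-add quoted later in the thread reads the key from /etc/ceph/ceph.osd.4.keyring,
# presumably a copy of the key --mkkey just wrote to /var/lib/ceph/osd/ceph-4/keyring
ceph auth add osd.4 osd 'allow *' mon 'allow profile osd' -i /etc/ceph/ceph.osd.4.keyring
# place the OSD in the CRUSH map (weight roughly = size in TB) and start it
ceph osd crush add osd.4 3.63 host=node-54
systemctl start ceph-osd@4
ceph osd tree    # confirm the new OSD comes up and in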
[ceph-users] add hard drives to 3 CEPH servers (3 server cluster)
Hi all,

Please let me know if I am missing steps or am using the wrong steps.

I'm hoping to expand my small CEPH cluster by adding 4TB hard drives to each of the 3 servers in the cluster. I also need to change my replication factor from 1 to 3. This is part of an Openstack environment deployed by Fuel, and I had foolishly set my replication factor to 1 in the Fuel settings before deploy. I know this would have been done better at the beginning. I do want to keep the current cluster and not start over. I know this is going to thrash my cluster for a while while it replicates, but there isn't too much data on it yet.

To start I need to safely turn off each CEPH server and add in the 4TB drive. To do that I am going to run:

ceph osd set noout
systemctl stop ceph-osd@1    (or 2 or 3 on the other servers)
ceph osd tree                (to verify it is down)
poweroff, install the 4TB drive, boot up again
ceph osd unset noout

Next step would be to get CEPH to use the 4TB drives. Each CEPH server already has an 836GB OSD.

ceph> osd df
ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR  PGS
 0 0.81689 1.0      836G 101G 734G  12.16 0.90 167
 1 0.81689 1.0      836G 115G 721G  13.76 1.02 166
 2 0.81689 1.0      836G 121G 715G  14.49 1.08 179
          TOTAL    2509G 338G 2171G 13.47
MIN/MAX VAR: 0.90/1.08 STDDEV: 0.97

ceph> df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    2509G     2171G     338G         13.47
POOLS:
    NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd         0      0        0         2145G         0
    images      1      216G     9.15      2145G         27745
    backups     2      0        0         2145G         0
    volumes     3      114G     5.07      2145G         29717
    compute     4      0        0         2145G         0

Once I get the 4TB drive into each CEPH server, should I look to increasing the current OSD (ie: to 4836GB)? Or create a second 4000GB OSD on each CEPH server?
If I am going to create a second OSD on each CEPH server I hope to use this doc:
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

As far as changing the replication factor from 1 to 3, here are my pools now:

ceph osd pool ls detail
pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 116 flags hashpspool stripe_width 0 removed_snaps [1~3,b~6,12~8,20~2,24~6,2b~8,34~2,37~20]
pool 2 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 7 flags hashpspool stripe_width 0
pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 flags hashpspool stripe_width 0 removed_snaps [1~3]
pool 4 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 flags hashpspool stripe_width 0

I plan on using these steps I saw online:

ceph osd pool set rbd size 3
ceph -s   (verify that replication completes successfully)
ceph osd pool set images size 3
ceph -s
ceph osd pool set backups size 3
ceph -s
ceph osd pool set volumes size 3
ceph -s

Please let me know any advice or better methods... thanks

--Jim

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
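For reference, the same per-pool plan can be written as one loop, and the min_size bump recommended elsewhere in this thread can be folded in at the same time (this also covers the compute pool, which the list above skips). Untested sketch:

for pool in rbd images backups volumes compute; do
    ceph osd pool set $pool size 3
    ceph osd pool set $pool min_size 2
done
ceph -s                                    # watch the backfill that the size change triggers
ceph osd pool ls detail | grep 'replicated size'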
Re: [ceph-users] access ceph filesystem at storage level and not via ethernet
Thanks again Ronny, Ocfs2 is working well so far. I have 3 nodes sharing the same 7TB MSA FC lun. Hoping to add 3 more... James Okken Lab Manager Dialogic Research Inc. 4 Gatehall Drive Parsippany NJ 07054 USA Tel: 973 967 5179 Email: james.ok...@dialogic.com Web: www.dialogic.com - The Network Fuel Company This e-mail is intended only for the named recipient(s) and may contain information that is privileged, confidential and/or exempt from disclosure under applicable law. No waiver of privilege, confidence or otherwise is intended by virtue of communication via the internet. Any unauthorized use, dissemination or copying is strictly prohibited. If you have received this e-mail in error, or are not named as a recipient, please immediately notify the sender and destroy all copies of this e-mail. -Original Message- From: Ronny Aasen [mailto:ronny+ceph-us...@aasen.cx] Sent: Thursday, September 14, 2017 4:18 AM To: James Okken; ceph-users@lists.ceph.com Subject: Re: [ceph-users] access ceph filesystem at storage level and not via ethernet On 14. sep. 2017 00:34, James Okken wrote: > Thanks Ronny! Exactly the info I need. And kinda of what I thought the answer > would be as I was typing and thinking clearer about what I was asking. I just > was hoping CEPH would work like this since the openstack fuel tools deploy > CEPH storage nodes easily. > I agree I would not be using CEPH for its strengths. > > I am interested further in what you've said in this paragraph though: > > "if you want to have FC SAN attached storage on servers, shareable > between servers in a usable fashion I would rather mount the same SAN > lun on multiple servers and use a cluster filesystem like ocfs or gfs > that is made for this kind of solution." > > Please allow me to ask you a few questions regarding that even though it > isn't CEPH specific. > > Do you mean gfs/gfs2 global file system? > > Does ocfs and/or gfs require some sort of management/clustering server > to maintain and manage? (akin to a CEPH OSD) I'd love to find a > distributed/cluster filesystem where I can just partition and format. And > then be able to mount and use that same SAN datastore from multiple servers > without a management server. > If ocfs or gfs do need a server of this sort does it needed to be involved in > the I/O? or will I be able to mount the datastore, similar to any other disk > and the IO goes across the fiberchannel? i only have experience with ocfs. but i think gfs works similarish. There are quite a few cluster filesystems to choose from. https://en.wikipedia.org/wiki/Clustered_file_system servers that are mounting ocfs shared filesystems must have ocfs2-tools installed. have access to the common shared FC lun via FC. they need to be aware of the other ocfs servers of the same lun, that you define in a /etc/ocfs/cluster.conf configfile and the ocfs daemon must be running. then it is just a matter of making the ocfs (on one server) and adding it to fstab (of all servers) and mount. > One final question, if you don't mind, do you think I could use ext4or xfs > and "mount the same SAN lun on multiple servers" if I can guarantee each > server will only right to its own specific directory and never anywhere the > other servers will be writing? 
> (I even have the SAN mapped to each server using different LUNs.)

Mounting the same (non-cluster) filesystem on multiple servers is guaranteed to destroy the filesystem: you will have multiple servers writing in the same metadata area and the same journal area, and generally trampling over each other's data.
Luckily, I think most modern filesystems would detect that the FS is mounted somewhere else and prevent you from mounting it again without big fat warnings.

kind regards
Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
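For anyone wanting to try the same thing, a minimal sketch of the pieces Ronny describes. The node names, IPs, label and mount point are made up; the config path is /etc/ocfs2/cluster.conf on most distributions (Ronny's /etc/ocfs/cluster.conf may vary by distro), and the key = value lines must be tab-indented in the real file:

# /etc/ocfs2/cluster.conf (identical copy on every node; node names must match hostnames)
cluster:
        node_count = 3
        name = labcluster

node:
        ip_port = 7777
        ip_address = 192.168.1.11
        number = 0
        name = node1
        cluster = labcluster

node:
        ip_port = 7777
        ip_address = 192.168.1.12
        number = 1
        name = node2
        cluster = labcluster

# (third node stanza follows the same pattern)

# with ocfs2-tools installed and the o2cb service running on every node:
mkfs.ocfs2 -L shared01 /dev/sdX1                               # run once, on one node only
echo '/dev/sdX1 /srv/shared ocfs2 _netdev,defaults 0 0' >> /etc/fstab
mkdir -p /srv/shared && mount -a                               # repeat on each node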
Re: [ceph-users] access ceph filesystem at storage level and not via ethernet
Thanks Ric, thanks again Ronny. I have a lot of good info now! I am going to try ocfs2. Thanks -- Jim -Original Message- From: Ric Wheeler [mailto:rwhee...@redhat.com] Sent: Thursday, September 14, 2017 4:35 AM To: Ronny Aasen; James Okken; ceph-users@lists.ceph.com Subject: Re: [ceph-users] access ceph filesystem at storage level and not via ethernet On 09/14/2017 11:17 AM, Ronny Aasen wrote: > On 14. sep. 2017 00:34, James Okken wrote: >> Thanks Ronny! Exactly the info I need. And kinda of what I thought >> the answer would be as I was typing and thinking clearer about what I >> was asking. I just was hoping CEPH would work like this since the >> openstack fuel tools deploy CEPH storage nodes easily. >> I agree I would not be using CEPH for its strengths. >> >> I am interested further in what you've said in this paragraph though: >> >> "if you want to have FC SAN attached storage on servers, shareable >> between servers in a usable fashion I would rather mount the same SAN >> lun on multiple servers and use a cluster filesystem like ocfs or gfs >> that is made for this kind of solution." >> >> Please allow me to ask you a few questions regarding that even though >> it isn't CEPH specific. >> >> Do you mean gfs/gfs2 global file system? >> >> Does ocfs and/or gfs require some sort of management/clustering >> server to maintain and manage? (akin to a CEPH OSD) I'd love to find >> a distributed/cluster filesystem where I can just partition and >> format. And then be able to mount and use that same SAN datastore >> from multiple servers without a management server. >> If ocfs or gfs do need a server of this sort does it needed to be >> involved in the I/O? or will I be able to mount the datastore, >> similar to any other disk and the IO goes across the fiberchannel? > > i only have experience with ocfs. but i think gfs works similarish. > There are quite a few cluster filesystems to choose from. > https://en.wikipedia.org/wiki/Clustered_file_system > > servers that are mounting ocfs shared filesystems must have > ocfs2-tools installed. have access to the common shared FC lun via FC. > they need to be aware of the other ocfs servers of the same lun, that > you define in a /etc/ocfs/cluster.conf configfile and the ocfs daemon must be > running. > > then it is just a matter of making the ocfs (on one server) and adding > it to fstab (of all servers) and mount. > > >> One final question, if you don't mind, do you think I could use >> ext4or xfs and "mount the same SAN lun on multiple servers" if I can >> guarantee each server will only right to its own specific directory >> and never anywhere the other servers will be writing? (I even have >> the SAN mapped to each server using different lun's) > > mounting the same (non cluster) filesystem on multiple servers is > guaranteed to destroy the filesystem, you will have multiple servers > writing in the same metadata area, the same journal area and generaly > shitting over each other. > luckily i think most modern filesystems would detect that the FS is > mounted somewhere else and prevent you from mounting it again without big fat > warnings. > > kind regards > Ronny Aasen In general, you can get shared file systems (i.e., the clients can all see the same files and directories) with lots of different approaches: * use a shared disk file system like GFS2, OCFS2 - all of the "clients" where the applications run are part of the cluster and each server attaches to the shared storage (through iSCSI, FC, whatever). 
They do require HA cluster infrastructure for things like fencing.

* use a distributed file system like cephfs, glusterfs, etc - your clients access through a file system specific protocol, they don't see raw storage

* take any file system (local or other) and re-export it as a client/server type of file system by using an NFS server or Samba server

Ric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] access ceph filesystem at storage level and not via ethernet
Thanks Ronny! Exactly the info I need, and kind of what I thought the answer would be as I was typing and thinking more clearly about what I was asking. I was just hoping CEPH would work like this, since the openstack fuel tools deploy CEPH storage nodes easily. I agree I would not be using CEPH for its strengths.

I am interested further in what you've said in this paragraph though:

"if you want to have FC SAN attached storage on servers, shareable between servers in a usable fashion I would rather mount the same SAN lun on multiple servers and use a cluster filesystem like ocfs or gfs that is made for this kind of solution."

Please allow me to ask you a few questions regarding that, even though it isn't CEPH specific.

Do you mean gfs/gfs2, the global file system?

Does ocfs and/or gfs require some sort of management/clustering server to maintain and manage (akin to a CEPH OSD)? I'd love to find a distributed/cluster filesystem where I can just partition and format, and then be able to mount and use that same SAN datastore from multiple servers without a management server.
If ocfs or gfs do need a server of this sort, does it need to be involved in the I/O? Or will I be able to mount the datastore, similar to any other disk, with the IO going across the fiber channel?

One final question, if you don't mind: do you think I could use ext4 or xfs and "mount the same SAN lun on multiple servers" if I can guarantee each server will only write to its own specific directory and never anywhere the other servers will be writing? (I even have the SAN mapped to each server using different LUNs.)

Thanks for your expertise!

--
Jim

------------------------------

Message: 27
Date: Wed, 13 Sep 2017 19:56:07 +0200
From: Ronny Aasen <ronny+ceph-us...@aasen.cx>
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] access ceph filesystem at storage level and not via ethernet

A bit crazy :) Whether the disks are directly attached to an OSD node or attached over fiber channel does not make a difference: you cannot shortcut the ceph cluster and talk to the osd disks directly without eventually destroying the ceph cluster. Even if you did, ceph is an object store on disk, so you would not find a filesystem or RBD disk images there, only objects on your FC-attached osd node disks with filestore, and with bluestore not even readable objects.

That being said, I think a FC SAN attached ceph osd node sounds a bit strange. Ceph's strength is the distributed, scalable solution, and having the osd nodes collected on a SAN array would neuter ceph's strengths and amplify ceph's weakness of high latency. I would only consider such a solution for testing, learning or playing around without having actual hardware for a distributed system, and in that case use 1 lun for each osd disk and give 8-10 VMs some luns/osd's each, just to learn how to work with ceph.

If you want to have FC SAN attached storage on servers, shareable between servers in a usable fashion, I would rather mount the same SAN lun on multiple servers and use a cluster filesystem like ocfs or gfs that is made for this kind of solution.

kind regards
Ronny Aasen

On 13.09.2017 19:03, James Okken wrote:
>
> Hi,
>
> Novice question here:
>
> The way I understand CEPH is that it distributes data in OSDs in a
> cluster. The reads and writes come across the ethernet as RBD requests
> and the actual data IO then also goes across the ethernet.
>
> I have a CEPH environment being set up on a fiber channel disk array
> (via an openstack fuel deploy). The servers using the CEPH storage
> also have access to the same fiber channel disk array.
>
> From what I understand those servers would need to make the RBD
> requests and do the IO across ethernet, is that correct? Even though
> with this infrastructure setup there is a "shorter" and faster path to
> those disks, via the fiber channel.
>
> Is there a way to access storage on a CEPH cluster when one has this
> "better" access to the disks in the cluster? (How about if it were to
> be only a single OSD with replication set to 1?)
>
> Sorry if this question is crazy...
>
> thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] access ceph filesystem at storage level and not via ethernet
Hi,

Novice question here:

The way I understand CEPH is that it distributes data in OSDs in a cluster. The reads and writes come across the ethernet as RBD requests, and the actual data IO then also goes across the ethernet.

I have a CEPH environment being set up on a fiber channel disk array (via an openstack fuel deploy). The servers using the CEPH storage also have access to the same fiber channel disk array.

From what I understand those servers would need to make the RBD requests and do the IO across ethernet, is that correct? Even though with this infrastructure setup there is a "shorter" and faster path to those disks, via the fiber channel.

Is there a way to access storage on a CEPH cluster when one has this "better" access to the disks in the cluster? (How about if it were to be only a single OSD with replication set to 1?)

Sorry if this question is crazy...

thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] total storage size available in my CEPH setup?
Thanks, gentlemen.

I hope to add more OSDs, since we will need a good deal more than 2.3TB, and I do want to leave free space / margins.

I am also thinking of reducing the replication to 2. I am sure I can google how to do that, but I am sure most of my results are going to be people telling me not to do it. Can you direct me to a good tutorial on how to do so?
And, you're right, I am a beginner.

James Okken
Lab Manager
Dialogic Research Inc.
4 Gatehall Drive
Parsippany NJ 07054 USA
Tel: 973 967 5179
Email: james.ok...@dialogic.com
Web: www.dialogic.com – The Network Fuel Company

This e-mail is intended only for the named recipient(s) and may contain information that is privileged, confidential and/or exempt from disclosure under applicable law. No waiver of privilege, confidence or otherwise is intended by virtue of communication via the internet. Any unauthorized use, dissemination or copying is strictly prohibited. If you have received this e-mail in error, or are not named as a recipient, please immediately notify the sender and destroy all copies of this e-mail.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Maxime Guyot
Sent: Tuesday, March 14, 2017 7:29 AM
To: Christian Balzer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] total storage size available in my CEPH setup?

Hi,

>> My question is how much total CEPH storage does this allow me? Only 2.3TB?
>> or does the way CEPH duplicates data enable more than 1/3 of the storage?
> 3 means 3, so 2.3TB. Note that Ceph is sparse, so that can help quite a bit.

To expand on this, you probably want to keep some margins and not run your cluster at 100% :) (especially if you are running RBD with thin provisioning). By default, "ceph status" will issue a warning at 85% full (osd nearfull ratio). You should also consider that you need some free space for auto healing to work (if you plan to use more than 3 OSDs on a size=3 pool).

Cheers,
Maxime

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
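Not a tutorial, but mechanically dropping the replication level uses the same per-pool command as raising it, and the space is reclaimed as the extra replicas are removed. A hedged, untested sketch (whether size=2 is a good idea is exactly the caution people will raise):

for pool in $(rados lspools); do
    ceph osd pool set $pool size 2
    ceph osd pool set $pool min_size 1   # min_size must be <= size; 1 means IO continues, at real risk, with a single surviving copy
done
ceph df    # the usable-space estimate becomes raw/2 instead of raw/3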
[ceph-users] total storage size available in my CEPH setup?
Hi all, I have a 3 storage node openstack setup using CEPH. I believe that means I have 3 OSDs, as each storage node has a one of 3 fiber channel storage locations mounted. The storage media behind each node is actually single 7TB HP fiber channel MSA array. The best performance configuration for the hard drives in the MSA just happened to be 3x 2.3TB RAID10's. And that matched nicely to the 3xStorageNode/OSD of the CEPH setup. I believe my replication factor is 3. My question is how much total CEPH storage does this allow me? Only 2.3TB? or does the way CEPH duplicates data enable more than 1/3 of the storage? A follow up question would be what is the best way to tell, thru CEPH, the space used and space free? Thanks!! root@node-1:/var/log# ceph osd tree ID WEIGHT TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY -1 6.53998 root default -5 2.17999 host node-28 3 2.17999 osd.3 up 1.0 1.0 -6 2.17999 host node-30 4 2.17999 osd.4 up 1.0 1.0 -7 2.17999 host node-31 5 2.17999 osd.5 up 1.0 1.0 0 0 osd.0 down0 1.0 1 0 osd.1 down0 1.0 2 0 osd.2 down0 1.0 ## root@node-1:/var/log# ceph osd lspools 0 rbd,2 volumes,3 backups,4 .rgw.root,5 .rgw.control,6 .rgw,7 .rgw.gc,8 .users.uid,9 .users,10 compute,11 images, ## root@node-1:/var/log# ceph osd dump epoch 216 fsid d06d61b0-1cd0-4e1a-ac20-67972d0e1fde created 2016-10-11 14:15:05.638099 modified 2017-03-09 14:45:01.030678 flags pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0 pool 2 'volumes' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 130 flags hashpspool stripe_width 0 removed_snaps [1~5] pool 3 'backups' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 14 flags hashpspool stripe_width 0 pool 4 '.rgw.root' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 16 flags hashpspool stripe_width 0 pool 5 '.rgw.control' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 18 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 6 '.rgw' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 20 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 7 '.rgw.gc' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 21 flags hashpspool stripe_width 0 pool 8 '.users.uid' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 22 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 9 '.users' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 24 flags hashpspool stripe_width 0 pool 10 'compute' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 216 flags hashpspool stripe_width 0 removed_snaps [1~37] pool 11 'images' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 189 flags hashpspool stripe_width 0 removed_snaps [1~3,5~8,f~4,14~2,18~2,1c~1,1e~1] max_osd 6 osd.0 down out weight 0 up_from 48 up_thru 50 down_at 52 last_clean_interval [44,45) 192.168.0.9:6800/4485 192.168.1.4:6800/4485 192.168.1.4:6801/4485 192.168.0.9:6801/4485 exists,new osd.1 down out weight 0 up_from 10 up_thru 48 down_at 50 last_clean_interval [5,8) 192.168.0.7:6800/60912 192.168.1.6:6801/60912 
192.168.1.6:6802/60912 192.168.0.7:6801/60912 exists,new osd.2 down out weight 0 up_from 10 up_thru 48 down_at 50 last_clean_interval [5,8) 192.168.0.6:6800/61013 192.168.1.7:6800/61013 192.168.1.7:6801/61013 192.168.0.6:6801/61013 exists,new osd.3 up in weight 1 up_from 192 up_thru 201 down_at 190 last_clean_interval [83,191) 192.168.0.9:6800/2634194 192.168.1.7:6802/3634194 192.168.1.7:6803/3634194 192.168.0.9:6802/3634194 exists,up 28b02052-3196-4203-bec8-ac83a69fcbc5 osd.4 up in weight 1 up_from 196 up_thru 201 down_at 194 last_clean_interval [80,195) 192.168.0.7:6800/2629319 192.168.1.6:6802/3629319 192.168.1.6:6803/3629319 192.168.0.7:6802/3629319 exists,up 124b58e6-1e38-4246-8838-cfc3b88e8a5a osd.5 up in weight 1 up_from 201 up_thru 201 down_at 199 last_clean_interval [134,200) 192.168.0.6:6800/5494 192.168.1.4:6802/1005494 192.168.1.4:6803/1005494 192.168.0.6:6802/1005494 exists,up ddfca14e-e6f6-4c48-aa8f-0ebfc765d32f root@node-1:/var/log# James Okken Lab Manager Dialogic Research Inc. 4 Gatehall Drive Parsippany NJ 07054 USA Tel: 973 967 5179 Email: james.ok...@dialogic.com<mailto:james