Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.
Hi Manuel,

Yes, I already tried that option but the result is extremely noisy and not really usable due to the lack of some fields; besides, forget about parsing those logs to print some stats. Also, I'm not sure the debug log is a good indicator of rgw performance. I think I'm going to stick with nginx and run some tests.

Thanks anyway! :)

On Tue, Aug 6, 2019 at 18:06, EDH - Manuel Rios Fernandez (<mrios...@easydatahost.com>) wrote:

> Hi Felix,
>
> You can increase the debug output with the "debug rgw" option on your rgw nodes.
>
> We set it to 10.
>
> But at least in our case we switched back to civetweb because beast doesn't provide a clear log without a lot of verbosity.
>
> Regards
>
> Manuel
>
> From: ceph-users, on behalf of Félix Barbeira
> Sent: Tuesday, August 6, 2019 17:43
> To: Ceph Users
> Subject: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.
>
> Hi,
>
> I'm testing radosgw with the beast backend and I have not found a way to get more information into the logfile. This is an example:
>
> 2019-08-06 16:59:14.488 7fc808234700 1 == starting new request req=0x5608245646f0 =
> 2019-08-06 16:59:14.496 7fc808234700 1 == req done req=0x5608245646f0 op status=0 http_status=204 latency=0.00800043s ==
>
> I would be interested in the typical fields that a regular webserver has: origin, request, user-agent, etc. I checked the official docs but I didn't find anything related:
>
> https://docs.ceph.com/docs/nautilus/radosgw/frontends/
>
> The only way I found is to put an nginx server in front running as a proxy, or haproxy, but I really don't like that solution because it would be an extra component used only to log requests. Anyone in the same situation?
>
> Thanks in advance.
>
> --
> Félix Barbeira.

--
Félix Barbeira.
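For anyone following along, raising the rgw debug level as Manuel suggests can be done in ceph.conf or at runtime; a minimal sketch, assuming a daemon named client.rgw.node01 (that name is hypothetical, substitute your own):

```
# Hedged sketch: raise radosgw debug logging to 10 (daemon name is illustrative).
# Persistent, in ceph.conf on the rgw node:
#   [client.rgw.node01]
#   debug rgw = 10
# Or at runtime via the admin socket:
ceph daemon client.rgw.node01 config set debug_rgw 10
```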
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.
Hi,

I'm testing radosgw with the beast backend and I have not found a way to get more information into the logfile. This is an example:

2019-08-06 16:59:14.488 7fc808234700 1 == starting new request req=0x5608245646f0 =
2019-08-06 16:59:14.496 7fc808234700 1 == req done req=0x5608245646f0 op status=0 http_status=204 latency=0.00800043s ==

I would be interested in the typical fields that a regular webserver has: origin, request, user-agent, etc. I checked the official docs but I didn't find anything related:

https://docs.ceph.com/docs/nautilus/radosgw/frontends/

The only way I found is to put an nginx server in front running as a proxy, or haproxy, but I really don't like that solution because it would be an extra component used only to log requests. Anyone in the same situation?

Thanks in advance.

--
Félix Barbeira.
Re: [ceph-users] bluestore block.db on SSD, where block.wal?
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#devices

"The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while *also* allowing additional metadata to be stored there (if it will fit)."

So I guess if you only specify block.db (on the faster device), the block.wal will go into that LVM volume/partition.

On Sun, Jun 2, 2019 at 18:43, M Ranga Swami Reddy () wrote:

> Hello - I planned to use bluestore's block.db on SSD (with data on HDD), sized at 4% of the HDD. I have not specified the block.wal... in this case, where is block.wal placed?
> Is it on the HDD (i.e. with the data) or in the block.db on the SSD?
>
> Thanks
> Swami

--
Félix Barbeira.
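A minimal ceph-volume sketch of that layout (the device paths are assumptions for illustration); with no --block.wal argument, the WAL ends up on the same, faster device as the DB:

```
# Hedged sketch: HDD for data, SSD/NVMe partition for DB, no separate WAL
# device (paths are hypothetical).
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
# Without --block.wal, BlueStore keeps the WAL alongside the DB on /dev/nvme0n1p1.
```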
Re: [ceph-users] Fwd: Planning all flash cluster
> Is there anything that obviously stands out as severely unbalanced? The R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a different HBA might be a better idea, any recommendations please?

> Don't know that HBA. Does it support pass through mode or HBA mode?

The H710 card does not support pass-through. With an R720 I would recommend a JBOD card, for example the LSI 9207-8i. Dell's next generation servers (R730XD) carry the H730, which already supports pass-through.

On Wed, Jun 20, 2018 at 15:00, Luis Periquito () wrote:

> adding back in the list :)
>
> ---------- Forwarded message ----------
> From: Luis Periquito
> Date: Wed, Jun 20, 2018 at 1:54 PM
> Subject: Re: [ceph-users] Planning all flash cluster
> To:
>
> On Wed, Jun 20, 2018 at 1:35 PM Nick A wrote:
> >
> > Thank you, I was under the impression that 4GB RAM per 1TB was quite generous, or is that not the case with all-flash clusters? What's the recommended RAM per OSD currently? Happy to throw more at it for a performance boost. The important thing is that I'd like all nodes to be absolutely identical.
>
> I'm doing 8G per OSD, though I use 1.9T SSDs.
>
> >
> > Based on replies so far, it looks like 5 nodes might be a better idea, maybe each with 14 OSDs (960GB SSDs)? Plenty of 16-slot 2U chassis around to make it a no-brainer if that's what you'd recommend!
>
> I tend to add more nodes: 1U with 4-8 SSDs per chassis to start with, and using a single CPU with high frequency. For IOPS/latency, CPU frequency is really important.
> I have started a cluster that only has 2 SSDs (which I share with the OS) for data, but has 8 nodes. Those servers can take up to 10 drives.
>
> I'm using the Fujitsu RX1330, I believe Dell's equivalent would be the R330, with an Intel E3-1230v6 CPU and 64G of RAM, dual 10G and PSAS (passthrough controller).
>
> >
> > The H710 doesn't do JBOD or passthrough, hence looking for an alternative HBA. It would be nice to do the boot drives as hardware RAID 1 though, so a card that can do both at the same time (like the H730 found in R630's etc.) would be ideal.
> >
> > Regards,
> > Nick
> >
> > On 20 June 2018 at 13:18, Luis Periquito wrote:
> >>
> >> Adding more nodes from the beginning would probably be a good idea.
> >>
> >> On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
> >> >
> >> > Hello Everyone,
> >> >
> >> > We're planning a small cluster on a budget, and I'd like to request any feedback or tips.
> >> >
> >> > 3x Dell R720XD with:
> >> > 2x Xeon E5-2680v2 or very similar
> >>
> >> The CPUs look good and sufficiently fast for IOPS.
> >>
> >> > 96GB RAM
> >>
> >> 4GB per OSD looks a bit on the short side. Probably 192G would help.
> >>
> >> > 2x Samsung SM863 240GB boot/OS drives
> >> > 4x Samsung SM863 960GB OSD drives
> >> > Dual 40/56Gbit Infiniband using IPoIB.
> >> >
> >> > 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
> >> >
> >> > We'll probably add another 2 OSD drives per month per node until full (24 SSDs per node), at which point, more nodes. We've got a few SM863's in production on other systems and are seriously impressed with them, so would like to use them for Ceph too.
> >> >
> >> > We're hoping this is going to provide a decent amount of IOPS, 20k would be ideal. I'd like to avoid NVMe journals unless it's going to make a truly massive difference. Same with carving up the SSDs, would rather not, and just keep it as simple as possible.
> >>
> >> I agree: those SSDs shouldn't really require a journal device. Not sure about the 20k IOPS, especially without any further information. Doing 20k IOPS at 1kB block is totally different than at 1MB block...
> >>
> >> > Is there anything that obviously stands out as severely unbalanced? The R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a different HBA might be a better idea, any recommendations please?
> >>
> >> Don't know that HBA. Does it support pass through mode or HBA mode?
> >> >
> >> > Regards,
> >> > Nick

--
Félix Barbeira.
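On the RAM-per-OSD question above: on BlueStore the per-OSD memory budget is tunable. A minimal ceph.conf sketch, with the caveat that the 8 GiB figure just mirrors Luis's 8G-per-OSD practice rather than any official recommendation, and that the option only exists on recent releases:

```
# Hedged sketch: cap each BlueStore OSD daemon's memory use.
[osd]
osd_memory_target = 8589934592   # 8 GiB per OSD
```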
[ceph-users] Mix hardware on object storage cluster
Hi Cephers,

We are managing a cluster where all machines have the same hardware. The cluster is used only for object storage. We are planning to increase the number of nodes. Those new nodes have better hardware than the old ones.

If we only add those nodes as regular nodes to the cluster, we won't be using their full power, right? What could be the best way to take advantage of this new and better hardware? After reading the docs these are the possible options:

- Change primary affinity.
- Cache tiering: I don't really like this comment in the docs: "Cache tiering will degrade performance for most workloads".
- Change osd weight: I think this is more oriented to the disk space on every node.

Do I have some other options?

--
Félix Barbeira.
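For the primary-affinity option, a minimal CLI sketch (the OSD IDs and weights are purely illustrative): lowering the affinity of OSDs on the old nodes makes the faster OSDs more likely to be chosen as primaries, which serve the reads.

```
# Hedged sketch: prefer the new, faster OSDs as primaries (IDs are hypothetical).
ceph osd primary-affinity osd.3 0.5    # old node: less likely to be chosen as primary
ceph osd primary-affinity osd.42 1.0   # new node: full primary weight
```

Note that on older releases the monitors may first need `mon osd allow primary affinity = true` before the setting takes effect.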
Re: [ceph-users] How to reduce min_size of an EC pool?
Ok, lesson learned the hard way. Thank goodness it was a test cluster.

Thanks a lot Bryan!

On Thu, Jan 17, 2019 at 21:46, Bryan Stillwell () wrote:

> When you use 3+2 EC that means you have 3 data chunks and 2 erasure chunks for your data. So you can handle two failures, but not three. The min_size setting is preventing you from going below 3 because that's the number of data chunks you specified for the pool. I'm sorry to say this, but since the data was wiped off the other 3 nodes there isn't anything that can be done to recover it.
>
> Bryan
>
> From: ceph-users on behalf of Félix Barbeira
> Date: Thursday, January 17, 2019 at 1:27 PM
> To: Ceph Users
> Subject: [ceph-users] How to reduce min_size of an EC pool?
>
> I want to bring my cluster back to a HEALTHY state because right now I have no access to the data.
>
> I have a 3+2 EC pool on a 5 node cluster. 3 nodes were lost, all data wiped. They were reinstalled and added to the cluster again.
> The "ceph health detail" command says to reduce the min_size number to a value lower than 3, but:
>
> root@ceph-monitor02:~# ceph osd pool set default.rgw.buckets.data min_size 2
> Error EINVAL: pool min_size must be between 3 and 5
> root@ceph-monitor02:~#
>
> This is the situation:
>
> root@ceph-monitor01:~# ceph -s
>   cluster:
>     id:     ce78b02d-03df-4f9e-a35a-31b5f05c4c63
>     health: HEALTH_WARN
>             Reduced data availability: 515 pgs inactive, 512 pgs incomplete
>
>   services:
>     mon: 3 daemons, quorum ceph-monitor01,ceph-monitor03,ceph-monitor02
>     mgr: ceph-monitor02(active), standbys: ceph-monitor01, ceph-monitor03
>     osd: 57 osds: 57 up, 57 in
>
>   data:
>     pools:   8 pools, 568 pgs
>     objects: 4.48 M objects, 10 TiB
>     usage:   24 TiB used, 395 TiB / 419 TiB avail
>     pgs:     0.528% pgs unknown
>              90.141% pgs not active
>              512 incomplete
>              53  active+clean
>              3   unknown
>
> root@ceph-monitor01:~#
>
> And this is the output of health detail:
>
> root@ceph-monitor01:~# ceph health detail
> HEALTH_WARN Reduced data availability: 515 pgs inactive, 512 pgs incomplete
> PG_AVAILABILITY Reduced data availability: 515 pgs inactive, 512 pgs incomplete
>     pg 10.1cd is stuck inactive since forever, current state incomplete, last acting [9,48,41,58,17] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1ce is incomplete, acting [3,13,14,42,21] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1cf is incomplete, acting [36,27,3,39,51] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d0 is incomplete, acting [29,9,38,4,56] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d1 is incomplete, acting [2,34,17,7,30] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d2 is incomplete, acting [41,45,53,13,32] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d3 is incomplete, acting [7,28,15,20,3] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d4 is incomplete, acting [11,40,25,23,0] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d5 is incomplete, acting [32,51,20,57,28] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d6 is incomplete, acting [2,53,8,16,15] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d7 is incomplete, acting [1,2,33,43,42] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d8 is incomplete, acting [27,49,9,48,20] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
>     pg 10.1d9 is incomplete, acting [37,8,7,11,20] (reducing pool default.rgw.buckets.data min_size from 3 may help; searc
[ceph-users] How to reduce min_size of an EC pool?
ucing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e4 is incomplete, acting [16,23,37,18,20] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e5 is incomplete, acting [21,38,6,23,57] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e6 is incomplete, acting [44,32,11,15,41] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e7 is incomplete, acting [35,20,42,48,26] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e8 is incomplete, acting [49,41,16,19,5] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1e9 is incomplete, acting [26,17,58,20,24] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ea is incomplete, acting [57,23,25,26,12] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1eb is incomplete, acting [39,30,61,18,10] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ec is incomplete, acting [21,20,11,38,4] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ed is incomplete, acting [56,34,45,42,33] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ee is incomplete, acting [40,53,2,27,33] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ef is incomplete, acting [21,56,3,39,42] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f0 is incomplete, acting [32,49,45,19,2] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f1 is incomplete, acting [46,34,45,8,47] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f2 is incomplete, acting [43,39,20,30,16] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f3 is incomplete, acting [30,43,23,25,32] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f4 is incomplete, acting [30,16,29,2,8] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f5 is incomplete, acting [15,28,6,11,7] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f6 is incomplete, acting [61,25,45,34,33] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f7 is incomplete, acting [33,27,6,11,15] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f8 is incomplete, acting [47,8,30,19,7] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1f9 is incomplete, acting [11,44,58,26,20] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1fa is incomplete, acting [32,51,19,39,2] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1fb is incomplete, acting [14,19,61,35,30] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1fc is incomplete, acting [37,0,47,17,18] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1fd is incomplete, acting [49,20,34,62,15] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1fe is incomplete, acting [46,52,33,34,9] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 10.1ff is incomplete, acting [33,21,7,19,52] (reducing pool default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for 'incomplete')

root@ceph-monitor02:~#

Somebody has an idea of how to fix this?? Maybe copying the data to a replicated pool with min_size=1? Is all the data hopelessly lost?

Thanks in advance.

--
Félix Barbeira.
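Bryan's explanation can be checked against the pool itself; a hedged CLI sketch (the erasure-code profile name is an assumption for illustration) showing why Ceph refuses a min_size below k=3:

```
# Hedged sketch: for a k=3, m=2 EC pool, size = k+m = 5 and min_size >= k = 3.
ceph osd erasure-code-profile get myprofile            # expect k=3 m=2 (profile name hypothetical)
ceph osd pool get default.rgw.buckets.data size        # 5 (k+m)
ceph osd pool get default.rgw.buckets.data min_size    # 3 -- cannot go lower than k
```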
Re: [ceph-users] Boot volume on OSD device
If you have the chance, maybe the best choice is to try booting the OS from the network, so you don't need an extra disk for the OS at all. I'm currently building a squashfs image which is booted over the LAN via iPXE. This is a very good example: https://croit.io/features/efficiency-diskless

On Sat, Jan 12, 2019 at 7:15, Brian Topping () wrote:

> Question about OSD sizes: I have two cluster nodes, each with 4x 800GiB SLC SSD using BlueStore. They boot from SATADOM so the OSDs are data-only, but the MLC SATADOM have terrible reliability and the SLC are way overpriced for this application.
>
> Can I carve off 64GiB from one of the four drives on a node without causing problems? If I understand the strategy properly, this will cause mild extra load on the other three drives as the weight goes down on the partitioned drive, but it probably won't be a big deal.
>
> Assuming the correct procedure is documented at http://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/: first remove the OSD as documented, zap it, carve off the partition of the freed drive, then add the remaining space back in.
>
> I'm a little nervous that BlueStore assumes it owns the partition table and will not be happy that a couple of primary partitions have been used. Will this be a problem?
>
> Thanks, Brian

--
Félix Barbeira.
Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?
Oh BTW, I had to change the MTU back to 1500 on the ceph-monitors because they didn't work with 9000. This is the output of the ansible-playbook:

TASK [ceph-mon : put initial mon keyring in mon kv store] **
fatal: [ceph-monitor01]: FAILED! => {"changed": false, "cmd": ["ceph", "--cluster", "ceph", "config-key", "put", "initial_mon_keyring", "xx=="], "delta": "0:05:00.159094", "end": "2017-10-30 09:48:10.425012", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-10-30 09:43:10.265918", "stderr": "2017-10-30 09:48:10.395156 7fd314408700 0 monclient(hunting): authenticate timed out after 300\n2017-10-30 09:48:10.395197 7fd314408700 0 librados: client.admin authentication error (110) Connection timed out\n[errno 110] error connecting to the cluster", "stderr_lines": ["2017-10-30 09:48:10.395156 7fd314408700 0 monclient(hunting): authenticate timed out after 300", "2017-10-30 09:48:10.395197 7fd314408700 0 librados: client.admin authentication error (110) Connection timed out", "[errno 110] error connecting to the cluster"], "stdout": "", "stdout_lines": []}

Summing up: gateways and OSDs run with jumbo frames, the monitors do not. Maybe this isn't a problem because the servers that handle most of the traffic are the OSDs and the gateways.

2017-10-30 10:50 GMT+01:00 Félix Barbeira <fbarbe...@gmail.com>:

> Thanks Wido, it's fixed. I'm going to post the explanation in case somebody runs into the same error.
>
> The MTU was defined on the client side and it was 9000. 'ifconfig' shows the value as established, but if I ask the /proc filesystem directly it shows the following:
>
> root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
> 1500
> root@ceph-node03:~#
>
> If I restart the interface it shows 9000 for a while and then it changes back to 1500. After some research it turns out that the router advertises an MTU of 1500 in the SLAAC parameters, so when the session is 'refreshed', the client applies the wrong value (1500).
>
> The network guys changed the MTU parameter offered via SLAAC and now it's working:
>
> root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
> 9000
> root@ceph-node03:~# ping6 -c 3 -M do -s 8952 ceph-node01
> PING ceph-node01(2a02:x:x:x:x:x:x:x) 8952 data bytes
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=1 ttl=64 time=0.271 ms
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=2 ttl=64 time=0.216 ms
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=3 ttl=64 time=0.280 ms
>
> --- ceph-node01 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2002ms
> rtt min/avg/max/mdev = 0.216/0.255/0.280/0.033 ms
> root@ceph-node03:~#
>
> 2017-10-27 16:02 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
>> On 27 October 2017 at 14:22, Félix Barbeira <fbarbe...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I'm trying to configure a ceph cluster using IPv6 only but I can't enable jumbo frames. I made the definition in the 'interfaces' file and it seems like the value is applied, but when I test it, it looks like it only works on IPv4, not IPv6.
>> >
>> > It works on IPv4:
>> >
>> > root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
>> >
>> > PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
>>
>> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
>>
>> > --- ceph-node02 ping statistics ---
>> > 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
>> > rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
>> >
>> > root@ceph-node01:~#
>> >
>> > But *not* in IPv6:
>> >
>> > root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
>> > PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
>> > ping: local error: Message too long, mtu=1500
>> > ping: local error: Message too long, mtu=1500
>> > ping: local error: Message too long, mtu=1500
>>
>> Li
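If changing the router's advertisement is not an option, the kernel can instead be told to ignore the MTU option in router advertisements. A hedged sketch (the interface name eno1 is taken from the thread; the accept_ra_mtu knob requires a reasonably recent kernel):

```
# Hedged sketch: ignore the MTU advertised via SLAAC on eno1.
sysctl -w net.ipv6.conf.eno1.accept_ra_mtu=0
# Make it persistent across reboots:
echo 'net.ipv6.conf.eno1.accept_ra_mtu = 0' >> /etc/sysctl.d/99-jumbo.conf
```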
Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?
Thanks Wido, it's fixed. I'm going to post the explanation in case somebody runs into the same error.

The MTU was defined on the client side and it was 9000. 'ifconfig' shows the value as established, but if I ask the /proc filesystem directly it shows the following:

root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
1500
root@ceph-node03:~#

If I restart the interface it shows 9000 for a while and then it changes back to 1500. After some research it turns out that the router advertises an MTU of 1500 in the SLAAC parameters, so when the session is 'refreshed', the client applies the wrong value (1500).

The network guys changed the MTU parameter offered via SLAAC and now it's working:

root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
9000
root@ceph-node03:~# ping6 -c 3 -M do -s 8952 ceph-node01
PING ceph-node01(2a02:x:x:x:x:x:x:x) 8952 data bytes
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=1 ttl=64 time=0.271 ms
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=2 ttl=64 time=0.216 ms
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=3 ttl=64 time=0.280 ms

--- ceph-node01 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.216/0.255/0.280/0.033 ms
root@ceph-node03:~#

2017-10-27 16:02 GMT+02:00 Wido den Hollander <w...@42on.com>:

> On 27 October 2017 at 14:22, Félix Barbeira <fbarbe...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm trying to configure a ceph cluster using IPv6 only but I can't enable jumbo frames. I made the definition in the 'interfaces' file and it seems like the value is applied, but when I test it, it looks like it only works on IPv4, not IPv6.
> >
> > It works on IPv4:
> >
> > root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
> >
> > PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
>
> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
>
> > --- ceph-node02 ping statistics ---
> > 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> > rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
> >
> > root@ceph-node01:~#
> >
> > But *not* in IPv6:
> >
> > root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
> > PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
> > ping: local error: Message too long, mtu=1500
> > ping: local error: Message too long, mtu=1500
> > ping: local error: Message too long, mtu=1500
>
> Like Ronny already mentioned, check the switches and the receiver. There is a 1500 MTU configured somewhere.
>
> Wido
>
> > --- ceph-node02 ping statistics ---
> > 4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms
> >
> > root@ceph-node01:~#
> >
> > root@ceph-node01:~# ifconfig
> > eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
> >       inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
> >       inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
> >       UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
> >       RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
> >       TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
> >       collisions:0 txqueuelen:1000
> >       RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)
> >
> > lo    Link encap:Local Loopback
> >       inet addr:127.0.0.1  Mask:255.0.0.0
> >       inet6 addr: ::1/128 Scope:Host
> >       UP LOOPBACK RUNNING  MTU:65536  Metric:1
> >       RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
> >       TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
> >       collisions:0 txqueuelen:1
> >       RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)
> >
> > root@ceph-node01:~#
> >
> > root@ceph-node01:~# cat /etc/network/interfaces
> > # This file describes network interfaces available on your system
> > # and how to activate them. For more information, see interfaces(5).
> >
> > source /etc/network/interfaces.d/*
> >
> > # The loopback network interface
> > auto lo
> > iface lo inet loopback
> >
> > # The primary network interface
> > auto eno1
> > iface eno1 inet6 auto
> >     post-up ifconfig eno1 mtu 9000
> > root@ceph-node01:#
> >
> > Please help!
> >
> > --
> > Félix Barbeira.

--
Félix Barbeira.
[ceph-users] How to enable jumbo frames on IPv6 only cluster?
Hi,

I'm trying to configure a ceph cluster using IPv6 only but I can't enable jumbo frames. I made the definition in the 'interfaces' file and it seems like the value is applied, but when I test it, it looks like it only works on IPv4, not IPv6.

It works on IPv4:

root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms

--- ceph-node02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms

root@ceph-node01:~#

But *not* in IPv6:

root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

--- ceph-node02 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms

root@ceph-node01:~#

root@ceph-node01:~# ifconfig
eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
      inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
      inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
      RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
      TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)

lo    Link encap:Local Loopback
      inet addr:127.0.0.1  Mask:255.0.0.0
      inet6 addr: ::1/128 Scope:Host
      UP LOOPBACK RUNNING  MTU:65536  Metric:1
      RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
      TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1
      RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)

root@ceph-node01:~#

root@ceph-node01:~# cat /etc/network/interfaces
# This file describes network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet6 auto
    post-up ifconfig eno1 mtu 9000
root@ceph-node01:#

Please help!

--
Félix Barbeira.
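A side note on the ping payload sizes in this thread: for a 9000-byte MTU, the largest unfragmented ICMP echo payload differs between IPv4 and IPv6 because of header sizes, which is why the working IPv6 test elsewhere in this thread uses -s 8952 rather than 8972. A quick sketch of the arithmetic:

```shell
# Max ICMP echo payload that fits a 9000-byte MTU without fragmentation:
# IPv4: 20-byte IP header + 8-byte ICMP header; IPv6: 40-byte header + 8-byte ICMPv6 header.
mtu=9000
ipv4_payload=$((mtu - 20 - 8))
ipv6_payload=$((mtu - 40 - 8))
echo "IPv4: $ipv4_payload, IPv6: $ipv6_payload"   # IPv4: 8972, IPv6: 8952
```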
Re: [ceph-users] Grafana Dasboard
Hi,

You can check the official site: https://grafana.com/dashboards?search=ceph

2017-08-29 3:08 GMT+02:00 Shravana Kumar.S <shravanakum...@gmail.com>:

> All,
> I am looking for a Grafana dashboard to monitor Ceph. I am using telegraf to collect the metrics and InfluxDB to store the values.
>
> Does anyone have the dashboard json file?
>
> Thanks,
> Saravans

--
Félix Barbeira.
Re: [ceph-users] RGW lifecycle not expiring objects
I recently checked the repo and a new version of s3cmd was released 3 days ago, including lifecycle commands: https://github.com/s3tools/s3cmd/releases

These are the lifecycle options: https://github.com/s3tools/s3cmd/blob/master/s3cmd#L2444-L2448

2017-06-29 17:51 GMT+02:00 Daniel Gryniewicz <d...@redhat.com>:
> On 06/28/2017 02:30 PM, Graham Allan wrote:
>
>> That seems to be it! I couldn't see a way to specify the auth version
>> with aws cli (is there a way?). However it did work with s3cmd and v2 auth:
>>
>> % s3cmd --signature-v2 setlifecycle lifecycle.xml s3://testgta
>> s3://testgta/: Lifecycle Policy updated
>
> Good stuff.
>
>> (I believe that with Kraken, this threw an error and failed to set the
>> policy, but I'm not certain at this point... besides which radosgw
>> didn't then have access to the default.rgw.lc pool, which may have
>> caused further issues)
>>
>> No way to read the lifecycle policy back with s3cmd, so:
>
> I submitted a patch a while ago to add getlifecycle to s3cmd, and it was
> accepted, but I don't know about releases or distro packaging. It will be
> there eventually.
>
>> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api \
>>     get-bucket-lifecycle-configuration --bucket=testgta
>> {
>>     "Rules": [
>>         {
>>             "Status": "Enabled",
>>             "Prefix": "",
>>             "Expiration": {
>>                 "Days": 1
>>             },
>>             "ID": "test"
>>         }
>>     ]
>> }
>>
>> and it looks encouraging on the server side:
>>
>> # radosgw-admin lc list
>> [
>>     {
>>         "bucket": ":gta:default.6985397.1",
>>         "status": "UNINITIAL"
>>     },
>>     {
>>         "bucket": ":testgta:default.6790451.1",
>>         "status": "UNINITIAL"
>>     }
>> ]
>>
>> then:
>> # radosgw-admin lc process
>>
>> and all the (very old) objects disappeared from the test bucket.
>
> Good to know.
>
> Daniel

--
Félix Barbeira.
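[Editor's note: for reference, a minimal lifecycle.xml that would produce the policy echoed back by the get-bucket-lifecycle-configuration call above (rule ID "test", empty prefix, expire after 1 day) could look like the following. This is a sketch of the standard S3 lifecycle configuration format, not a file taken from the thread:]

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>test</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>1</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```

This is the file passed to `s3cmd --signature-v2 setlifecycle lifecycle.xml s3://testgta` in the quoted exchange.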
Re: [ceph-users] handling different disk sizes
Hi,

Thanks to your answers, now I understand this part of ceph better. I made the change to the crushmap that Maxime suggested; after that, the results are what I expected from the beginning:

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
 0 7.27100  1.0     7445G  1830G  5614G 24.59 0.98 238
 3 7.27100  1.0     7445G  1700G  5744G 22.84 0.91 229
 4 7.27100  1.0     7445G  1731G  5713G 23.26 0.93 233
 1 1.81299  1.0     1856G   661G  1195G 35.63 1.43  87
 5 1.81299  1.0     1856G   544G  1311G 29.34 1.17  73
 6 1.81299  1.0     1856G   519G  1337G 27.98 1.12  71
 2 2.72198  1.0     2787G   766G  2021G 27.50 1.10 116
 7 2.72198  1.0     2787G   651G  2136G 23.36 0.93 103
 8 2.72198  1.0     2787G   661G  2126G 23.72 0.95  98
            TOTAL   36267G  9067G 27200G 25.00
MIN/MAX VAR: 0.91/1.43  STDDEV: 4.20
#

I understand that the ceph default of "type host" is safer than "type osd", but as I said before, this cluster is for testing purposes only.

Thanks for all your answers :)

2017-06-06 9:20 GMT+02:00 Maxime Guyot <max...@root314.com>:
> Hi Félix,
>
> Changing the failure domain to OSD is probably the easiest option if this
> is a test cluster. I think the commands would go like:
> - ceph osd getcrushmap -o map.bin
> - crushtool -d map.bin -o map.txt
> - sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0 type osd/' map.txt
> - crushtool -c map.txt -o map.bin
> - ceph osd setcrushmap -i map.bin
>
> Moving HDDs into ~8TB/server would be a good option if this is a
> capacity-focused use case. It will allow you to reboot 1 server at a time
> without radosgw downtime. You would target 26/3 = 8.66TB/node, so:
> - node1: 1x8TB
> - node2: 1x8TB + 1x2TB
> - node3: 2x6 TB + 1x2TB
>
> If you are more concerned about performance, then set the weights to 1 on
> all HDDs and forget about the wasted capacity.
>
> Cheers,
> Maxime
>
> On Tue, 6 Jun 2017 at 00:44 Christian Wuerdig <christian.wuer...@gmail.com> wrote:
>
>> Yet another option is to change the failure domain to OSD instead of host
>> (this avoids having to move disks around and will probably meet your
>> initial expectations).
>> It means your cluster will become unavailable when you lose a host until
>> you fix it, though. OTOH you probably don't have too much leeway anyway
>> with just 3 hosts, so it might be an acceptable trade-off. It also means
>> you can just add new OSDs to the servers wherever they fit.
>>
>> On Tue, Jun 6, 2017 at 1:51 AM, David Turner <drakonst...@gmail.com> wrote:
>>
>>> If you want to resolve your issue without purchasing another node, you
>>> should move one disk of each size into each server. This process will be
>>> quite painful, as you'll need to actually move the disks in the crush map
>>> to be under a different host and then all of your data will move around,
>>> but then your weights will be able to distribute the data between the
>>> 2TB, 3TB, and 8TB drives much more evenly.
>>>
>>> On Mon, Jun 5, 2017 at 9:21 AM Loic Dachary <l...@dachary.org> wrote:
>>>
>>>> On 06/05/2017 02:48 PM, Christian Balzer wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > On Mon, 5 Jun 2017 13:54:02 +0200 Félix Barbeira wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> We have a small cluster for radosgw use only. It has three nodes, witch 3
>>>> >                                                        ^                ^
>>>> >> osds each. Each node has different disk sizes:
>>>> >
>>>> > There's your answer, staring you right in the face.
>>>> >
>>>> > Your default replication size is 3, your default failure domain is host.
>>>> >
>>>> > Ceph can not distribute data according to the weight, since it needs to
>>>> > be on a different node (one replica per node) to comply with the replica
>>>> > size.
>>>>
>>>> Another way to look at it is to imagine a situation where 10TB worth of
>>>> data is stored on node01, which has 3x8TB = 24TB. Since you asked for 3
>>>> replicas, this data must be replicated to node02 but ... there only is
>>>> 3x2TB = 6TB available. So the maximum you can store is 6TB, and the
>>>> remaining disk space on node01 and node03 will never be used.
>>>>
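[Editor's note: the arithmetic in Loic's explanation can be sketched quickly. Disk sizes are taken from this thread; the variable names are illustrative:]

```python
# Sketch of the capacity limit Loic describes: with size=3 replication
# and a host failure domain, every host in a 3-host cluster must hold
# one full copy of the data, so the smallest host is the bottleneck.
hosts = {
    "node01": [8, 8, 8],  # TB per OSD, from the thread
    "node02": [2, 2, 2],
    "node03": [3, 3, 3],
}
replicas = 3

per_host = {name: sum(osds) for name, osds in hosts.items()}
usable = min(per_host.values())   # smallest host caps usable data
raw = sum(per_host.values())      # total raw capacity
idle = raw - replicas * usable    # raw space that can never be used

print(per_host)  # {'node01': 24, 'node02': 6, 'node03': 9}
print(usable)    # 6, matching "the maximum you can store is 6TB"
print(idle)      # 21 TB of raw capacity left stranded
```

This is why the thread's suggestions are either to equalize per-host capacity by moving disks around, or to change the failure domain to OSD so placement follows the OSD weights instead.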
[ceph-users] handling different disk sizes
Hi,

We have a small cluster for radosgw use only. It has three nodes, with 3 osds each. Each node has different disk sizes:

node01: 3x8TB
node02: 3x2TB
node03: 3x3TB

I thought that the weight handles the amount of data that every osd receives. In this case, for example, the node with the 8TB disks should receive more than the rest, right? But all of them receive the same amount of data, and the smallest disks (2TB) reach 100% before the bigger ones. Am I doing something wrong? The cluster is jewel LTS 10.2.7.

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
 0 7.27060  1.0     7445G  1012G  6432G 13.60 0.57 133
 3 7.27060  1.0     7445G  1081G  6363G 14.52 0.61 163
 4 7.27060  1.0     7445G   787G  6657G 10.58 0.44 120
 1 1.81310  1.0     1856G  1047G   809G 56.41 2.37 143
 5 1.81310  1.0     1856G   956G   899G 51.53 2.16 143
 6 1.81310  1.0     1856G   877G   979G 47.24 1.98 130
 2 2.72229  1.0     2787G  1010G  1776G 36.25 1.52 140
 7 2.72229  1.0     2787G   831G  1955G 29.83 1.25 130
 8 2.72229  1.0     2787G  1038G  1748G 37.27 1.56 146
            TOTAL   36267G  8643G 27624G 23.83
MIN/MAX VAR: 0.44/2.37  STDDEV: 18.60
#

# ceph osd tree
ID WEIGHT   TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 35.41795 root default
-2 21.81180     host node01
 0  7.27060         osd.0         up  1.0          1.0
 3  7.27060         osd.3         up  1.0          1.0
 4  7.27060         osd.4         up  1.0          1.0
-3  5.43929     host node02
 1  1.81310         osd.1         up  1.0          1.0
 5  1.81310         osd.5         up  1.0          1.0
 6  1.81310         osd.6         up  1.0          1.0
-4  8.16687     host node03
 2  2.72229         osd.2         up  1.0          1.0
 7  2.72229         osd.7         up  1.0          1.0
 8  2.72229         osd.8         up  1.0          1.0
#

# ceph -s
    cluster 49ba9695-7199-4c21-9199-ac321e60065e
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon01=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon02=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon03=[x:x:x:x:x:x:x:x]:6789/0}
            election epoch 48, quorum 0,1,2 ceph-mon01,ceph-mon03,ceph-mon02
     osdmap e265: 9 osds: 9 up, 9 in
            flags sortbitwise,require_jewel_osds
      pgmap v95701: 416 pgs, 11 pools, 2879 GB data, 729 kobjects
            8643 GB used, 27624 GB / 36267 GB avail
                 416 active+clean
#

# ceph osd pool ls
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.users.uid
default.rgw.users.keys
default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data
default.rgw.users.email
#

# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED %RAW USED
    36267G 27624G 8643G    23.83
POOLS:
    NAME                       ID USED  %USED MAX AVAIL OBJECTS
    .rgw.root                  1  1588  0     5269G     4
    default.rgw.control        2  0     0     5269G     8
    default.rgw.data.root      3  8761  0     5269G     28
    default.rgw.gc             4  0     0     5269G     32
    default.rgw.log            5  0     0     5269G     127
    default.rgw.users.uid      6  4887  0     5269G     28
    default.rgw.users.keys     7  144   0     5269G     16
    default.rgw.buckets.index  9  0     0     5269G     14
    default.rgw.buckets.non-ec 10 0     0     5269G     3
    default.rgw.buckets.data   11 2879G 35.34 5269G     746848
    default.rgw.users.email    12 13    0     5269G     1
#

--
Félix Barbeira.
Re: [ceph-users] Ceph OSD network with IPv6 SLAAC networks?
We are implementing an IPv6-native ceph cluster using SLAAC. We have some legacy machines that are not capable of using IPv6, only IPv4, due to some reasons (yeah, I know). I'm wondering what could happen if I add an IPv4 address on the radosgw in addition to the IPv6 address that is already running. The rest of the ceph cluster components would only have IPv6; the radosgw would be the only one with IPv4. Do you think that this would be good practice, or should I stick to IPv6 only?

2017-03-31 17:36 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
> > Op 30 maart 2017 om 20:13 schreef Richard Hesse <richard.he...@weebly.com>:
> >
> > Thanks for the reply Wido! How do you handle IPv6 routes and routing with
> > IPv6 on public and cluster networks? You mentioned that your cluster
> > network is routed, so they will need routes to reach the other racks. But
> > you can't have more than 1 default gateway. Are you running a routing
> > protocol to handle that?
>
> I don't. These clusters run without a public or cluster network. Each
> host has 1 IP address.
>
> I rarely use public/cluster networks as they don't add anything for most
> systems. 20Gbit of bandwidth per node is more than enough in most cases,
> and my opinion is that multiple IPs per machine only add complexity.
>
> Wido
>
> > We're using classless static routes via DHCP on v4 to solve this problem,
> > and I'm curious what the v6 SLAAC equivalent was.
> >
> > Thanks,
> > -richard
> >
> > On Tue, Mar 28, 2017 at 8:30 AM, Wido den Hollander <w...@42on.com> wrote:
> > >
> > > > Op 27 maart 2017 om 21:49 schreef Richard Hesse <richard.he...@weebly.com>:
> > > >
> > > > Has anyone run their Ceph OSD cluster network on IPv6 using SLAAC? I know
> > > > that ceph supports IPv6, but I'm not sure how it would deal with the
> > > > address rotation in SLAAC, permanent vs outgoing address, etc. It would
> > > > be very nice for me, as I wouldn't have to run any kind of DHCP server
> > > > or use static addressing -- just configure RA's and go.
> > >
> > > Yes, I do in many clusters. Works fine! SLAAC doesn't generate random
> > > addresses which change over time. That's a feature called 'Privacy
> > > Extensions' and is controlled on Linux by:
> > >
> > > - net.ipv6.conf.all.use_tempaddr
> > > - net.ipv6.conf.default.use_tempaddr
> > > - net.ipv6.conf.X.use_tempaddr
> > >
> > > Set this to 0 and the kernel will generate one address based on the
> > > MAC-Address (EUI64) of the interface. This address is stable and will
> > > not change.
> > >
> > > I like this very much as I don't have any static or complex network
> > > configurations on the hosts. It moves the whole responsibility of
> > > networking and addresses to the network. A host just boots and obtains
> > > an IP.
> > >
> > > The OSDs contact the MONs on boot and they will tell them their address.
> > > OSDs do not need a fixed address for Ceph.
> > >
> > > However, using SLAAC without Privacy Extensions means that in practice
> > > the address of a machine will not change, so you don't need to worry
> > > about it that much.
> > >
> > > The biggest system I have running this way is 400 nodes running
> > > IPv6-only: 10 racks, 40 nodes per rack. Each rack has a Top-of-Rack
> > > switch running in Layer 3 and a /64 is assigned per rack.
> > >
> > > Layer 3 routing is used between the racks, so based on the IPv6 address
> > > we can even determine in which rack the host/OSD is.
> > >
> > > Layer 2 domains don't extend over racks, which makes a rack a true
> > > failure domain in our case.
> > >
> > > Wido
> > >
> > > > On that note, does anyone have any experience with running ceph in a
> > > > mixed v4 and v6 environment?
> > > >
> > > > Thanks,
> > > > -richard

--
Félix Barbeira.
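[Editor's note: the three sysctl knobs Wido lists can be made persistent with a small configuration fragment. A sketch; the file path and the interface name eno1 are placeholders, not taken from the thread:]

```ini
# /etc/sysctl.d/90-ipv6-slaac.conf (illustrative path)
# Disable IPv6 Privacy Extensions so SLAAC derives one stable
# EUI-64 address from the interface MAC, as described above.
net.ipv6.conf.all.use_tempaddr = 0
net.ipv6.conf.default.use_tempaddr = 0
net.ipv6.conf.eno1.use_tempaddr = 0
```

After dropping the file in place, `sysctl --system` (or a reboot) applies it; the resulting EUI-64 address stays stable across reboots, which is what makes DHCP-free, static-config-free OSD nodes practical.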
[ceph-users] radosgw bucket name performance
Hi,

According to the Amazon S3 documentation, it is advised to insert a few random characters into the bucket name in order to gain performance. This is related to how Amazon stores key names: it looks like they store an index of object key names in each region.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html#workloads-with-mix-request-types

My question is: is this also a good practice in a ceph cluster where all the nodes are in the same datacenter? Is the bucket name relevant in ceph for gaining more performance? I think it's not, because all the data is spread across the placement groups on all the osd nodes, no matter what bucket name it has. Can anyone confirm this?

Thanks in advance.

--
Félix Barbeira.
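[Editor's note: the intuition in the question can be illustrated with a small sketch. RADOS hashes the full object name onto a placement group, so lexically adjacent keys scatter instead of landing on one index partition; the real hash is rjenkins, and md5 below is used purely for illustration, not as Ceph's algorithm:]

```python
import hashlib
from collections import Counter

def pg_for(key: str, pg_num: int = 8) -> int:
    # Illustrative stand-in for Ceph's name-to-PG hashing (not rjenkins):
    # a uniform hash of the object name, reduced modulo the PG count.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % pg_num

# Sequential, same-prefix keys: the pattern Amazon's advice targets.
keys = ["2017-06-30-photo-%05d.jpg" % i for i in range(1000)]
spread = Counter(pg_for(k) for k in keys)

# Every PG gets a share; no single PG receives the whole prefix run,
# so a shared key prefix does not create a placement hot spot.
print(dict(sorted(spread.items())))
```

Note this only covers data placement; the bucket index itself lives in the default.rgw.buckets.index pool and has its own scaling behavior, which is a separate question from the bucket's name.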
Re: [ceph-users] what happen to the OSDs if the OS disk dies?
> >>>> Should I use 2 disks for the OS making a RAID1? In this case I'm
> >>>> "wasting" 8TB only for the ~10GB that the OS needs.
> >>>>
> >>>> In all the docs that I've been reading it says ceph has no unique
> >>>> single point of failure, so I think that this scenario must have an
> >>>> optimal solution; maybe somebody could help me.
> >>>>
> >>>> Thanks in advance.
> >>>>
> >>>> --
> >>>> Félix Barbeira.
> >>>
> >>> If you do not have dedicated slots on the back for OS disks, then I
> >>> would recommend using SATADOM flash modules directly in an internal
> >>> SATA port in the machine. That saves you 2 slots for osds, and they
> >>> are quite reliable. You could even use 2 SD cards if your machine has
> >>> the internal SD slot:
> >>>
> >>> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
> >>>
> >>> kind regards
> >>> Ronny Aasen
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

--
Félix Barbeira.
[ceph-users] what happen to the OSDs if the OS disk dies?
Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At this moment we have ~10 DELL R730xd servers with 12x4TB SATA disks each. The official ceph docs say:

"We recommend using a dedicated drive for the operating system and software, and one drive for each Ceph OSD Daemon you run on the host."

I could use, for example, 1 disk for the OS and 11 for OSD data. In the operating system I would run 11 daemons to control the OSDs. But... what happens to the cluster if the disk with the OS fails? Maybe the cluster thinks that 11 OSDs failed and tries to replicate all that data over the cluster... that sounds no good.

Should I use 2 disks for the OS making a RAID1? In this case I'm "wasting" 8TB only for the ~10GB that the OS needs.

In all the docs that I've been reading it says ceph has no unique single point of failure, so I think that this scenario must have an optimal solution; maybe somebody could help me.

Thanks in advance.

--
Félix Barbeira.
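[Editor's note: when an OS disk dies, the OSDs on that host are marked down and, after a timeout, out, at which point recovery starts. One standard way to buy time for an OS reinstall is to raise the down-out interval or set the noout flag before planned work. A sketch using real ceph options and commands, though not discussed in this thread; the 3600-second value is an arbitrary example:]

```ini
# ceph.conf on the monitors: wait longer (here 1 hour, in seconds)
# before marking a "down" OSD "out" and triggering rebalancing.
[mon]
mon osd down out interval = 3600

# Or, for planned maintenance, temporarily prevent OSDs from being
# marked out at all:
#   ceph osd set noout
#   ... reinstall the OS, bring the 11 OSDs back up ...
#   ceph osd unset noout
```

The OSD data disks themselves survive an OS disk failure, so once the OS is reinstalled and the OSD daemons restart against the existing data partitions, only the writes that happened in the interim need to be recovered, not the full 44TB.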
[ceph-users] rgw (infernalis docker) with hammer cluster
I want to use the ceph object gateway. The docker container has version 9.2.1 (infernalis) and my cluster is a hammer LTS version (0.94.6). Is it possible to use the rgw docker container (ceph/daemon rgw) with this hammer cluster, or might something break due to the newer version?

--
Félix Barbeira.