[ceph-users] BUG 14154 on erasure coded PG
Dear all,

I am using an erasure-coded pool, and I have reached a situation where I am not able to recover a PG. The OSDs that contain this PG keep crashing, with the same behavior registered at http://tracker.ceph.com/issues/14154. I am using ceph 0.94.9 (the problem first appeared on 0.94.7; an upgrade did not solve it) on CentOS 7.2, kernel 3.10.0-327.18.2.el7.x86_64.

My EC profile:

directory=/usr/lib64/ceph/erasure-code
k=3
m=2
plugin=isa

Is this issue being handled? Is there any hint on how to deal with it?
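For readers reproducing this setup: a profile like the one above would normally be created and attached to a pool with commands along these lines. This is only a sketch; the profile name, pool name and PG count are illustrative and not taken from the message:

    # define an ISA-based k=3/m=2 erasure-code profile
    ceph osd erasure-code-profile set isa-k3m2 plugin=isa k=3 m=2 directory=/usr/lib64/ceph/erasure-code
    ceph osd erasure-code-profile get isa-k3m2

    # create an erasure-coded pool that uses the profile
    ceph osd pool create ecpool 256 256 erasure isa-k3m2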
Re: [ceph-users] Recovering full OSD
I have run into this situation several times, due to a strange behavior of the xfs filesystem - I initially ran on Debian and afterwards reinstalled the nodes with CentOS 7, kernel 3.10.0-229.14.1.el7.x86_64, package xfsprogs-3.2.1-6.el7.x86_64. At around 75-80% of usage as shown by df, the disk is already effectively full.

To delete PGs in order to restart the OSD, I first lowered the weight of the affected OSD and observed which PGs started backfilling elsewhere. Then I deleted some of these backfilling PGs before trying to restart the OSD. It worked without data loss. (A rough command sketch follows at the end of this message.)

On 08-08-2016 08:19, Mykola Dvornik wrote:
> @Shinobu According to http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
> "If you cannot start an OSD because it is full, you may delete some data by deleting some placement group directories in the full OSD."
>
> On 8 August 2016 at 13:16, Shinobu Kinjo <shinobu...@gmail.com> wrote:
>> On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik <mykola.dvor...@gmail.com> wrote:
>>> Dear ceph community,
>>>
>>> One of the OSDs in my cluster cannot start due to the
>>>
>>> ERROR: osd init failed: (28) No space left on device
>>>
>>> A while ago it was recommended to manually delete PGs on the OSD to let it start.
>>
>> Who recommended that?
>>
>>> So I am wondering what is the recommended way to fix this issue for a
>>> cluster running the Jewel release (10.2.2)?
>>>
>>> Regards,
>>> Mykola
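As a very rough sketch of the procedure described above (OSD id, weight and PG id are illustrative; on Jewel the OSD must be stopped first, and removing the wrong PG copy can lose data, so this is not something to run blindly):

    # lower the weight so that some PGs start backfilling to other OSDs
    ceph osd reweight 12 0.85

    # watch which PGs hosted on osd.12 are now backfilling elsewhere
    ceph pg dump pgs_brief | grep backfill

    # with the OSD stopped, remove one of those PGs from its local store;
    # ceph-objectstore-tool is the safer alternative to an rm -rf of the PG directory
    systemctl stop ceph-osd@12
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 34.1f --op remove
    systemctl start ceph-osd@12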
Re: [ceph-users] Lost access when removing cache pool overlay
Thank you for the response. It seems to be a transient situation: at this moment I have regained access to most, but not all, buckets/index objects. However, the overall performance has dropped once again - I already have serious performance issues.

Regards.

On 29-01-2016 14:41, Robert LeBlanc wrote:
> Does the client key have access to the base pool? Something similar bit us when adding a caching tier. Since the cache tier may be proxying all the I/O, the client may not have had access to the base pool and it still worked ok. Once you removed the cache tier, it could no longer access the pool.
>
> Robert LeBlanc
>
> On Fri, Jan 29, 2016 at 8:47 AM, Gerd Jakobovitsch wrote:
>> Dear all,
>>
>> I had to move the .rgw.buckets.index pool to another structure; therefore, I created a new pool .rgw.buckets.index.new, added the old pool as a cache pool, and flushed the data. Up to this moment everything was ok. With rados df I saw the objects moving to the new pool; the moved objects were ok, I could list omap keys and so on. When everything had moved, I removed the overlay cache pool. But at that moment the objects became unresponsive:
>>
>> [(13:39:20) ceph@spchaog1 ~]$ rados -p .rgw.buckets.index listomapkeys .dir.default.198764998.1
>> error getting omap key set .rgw.buckets.index/.dir.default.198764998.1: (5) Input/output error
>>
>> That happens to all objects. When trying to access the bucket through radosgw, I also get problems:
>>
>> [(13:16:01) root@spcogp1 ~]# radosgw-admin bucket stats --bucket="mybucket"
>> error getting bucket stats ret=-2
>>
>> Looking at the disk, the data seems to be there:
>>
>> [(13:47:10) root@spcsnp1 ~]# ls /var/lib/ceph/osd/ceph-23/current/34.1f_head/ | grep 198764998.1
>> \.dir.default.198764998.1__head_8A7482FF__22
>>
>> Does anyone have a hint? Could I have lost ownership of the objects?
>>
>> Regards.
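Following up on Robert's point about client capabilities: a quick way to check whether a key is restricted to particular pools is along these lines (the client name is illustrative - whatever key radosgw or the affected client actually uses is what matters):

    # show the caps currently granted to the key
    ceph auth get client.radosgw.gateway

    # if the osd caps list specific pools, the base pool has to be included;
    # caps can be broadened with something like:
    ceph auth caps client.radosgw.gateway mon 'allow rwx' osd 'allow rwx'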
[ceph-users] Lost access when removing cache pool overlay
Dear all,

I had to move the .rgw.buckets.index pool to another structure; therefore, I created a new pool .rgw.buckets.index.new, added the old pool as a cache pool, and flushed the data. Up to this moment everything was ok. With rados df I saw the objects moving to the new pool; the moved objects were ok, I could list omap keys and so on. When everything had moved, I removed the overlay cache pool. But at that moment, the objects became unresponsive:

[(13:39:20) ceph@spchaog1 ~]$ rados -p .rgw.buckets.index listomapkeys .dir.default.198764998.1
error getting omap key set .rgw.buckets.index/.dir.default.198764998.1: (5) Input/output error

That happens to all objects. When trying to access the bucket through radosgw, I also get problems:

[(13:16:01) root@spcogp1 ~]# radosgw-admin bucket stats --bucket="mybucket"
error getting bucket stats ret=-2

Looking at the disk, the data seems to be there:

[(13:47:10) root@spcsnp1 ~]# ls /var/lib/ceph/osd/ceph-23/current/34.1f_head/ | grep 198764998.1
\.dir.default.198764998.1__head_8A7482FF__22

Does anyone have a hint? Could I have lost ownership of the objects?

Regards.
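For context, the migration described above roughly corresponds to a command sequence like the following. This is a reconstruction from the description, not the exact commands that were run; pool names are taken from the message, and whether cache-flush-evict-all had really drained everything before the overlay was removed is the key question:

    # make the old index pool a cache tier of the new pool and route I/O through it
    ceph osd tier add .rgw.buckets.index.new .rgw.buckets.index
    ceph osd tier cache-mode .rgw.buckets.index forward
    ceph osd tier set-overlay .rgw.buckets.index.new .rgw.buckets.index

    # flush/evict every object down to the new base pool
    rados -p .rgw.buckets.index cache-flush-evict-all

    # detach the tier again
    ceph osd tier remove-overlay .rgw.buckets.index.new
    ceph osd tier remove .rgw.buckets.index.new .rgw.buckets.index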
[ceph-users] leveldb on OSD with missing file after hard boot
Hello all,

I had a hard reset on a ceph node, and one of the OSDs is not starting due to a leveldb error. At that moment the node was starting up, but no new data was actually being written:

2016-01-27 12:00:37.068431 7f367f654880 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 24734
2016-01-27 12:00:37.115800 7f367f654880 0 filestore(/var/lib/ceph/osd/ceph-26) backend xfs (magic 0x58465342)
2016-01-27 12:00:37.133031 7f367f654880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-26) detect_features: FIEMAP ioctl is supported and appears to work
2016-01-27 12:00:37.133042 7f367f654880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-26) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-01-27 12:00:37.136538 7f367f654880 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-26) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-01-27 12:00:37.137584 7f367f654880 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-26) detect_feature: extsize is supported and kernel 3.10.0-123.el7.x86_64 >= 3.5
2016-01-27 12:00:37.176226 7f367f654880 -1 filestore(/var/lib/ceph/osd/ceph-26) Error initializing leveldb : Corruption: 1 missing files; e.g.: /var/lib/ceph/osd/ceph-26/current/omap/075074.sst
2016-01-27 12:00:37.176286 7f367f654880 -1 osd.26 0 OSD:init: unable to mount object store
2016-01-27 12:00:37.176315 7f367f654880 -1 ** ERROR: osd init failed: (1) Operation not permitted

The file 075074.sst is indeed missing. Since I was not able to restart the OSD, and I could not find information on recovering the leveldb, I marked the OSD as lost, but then I ended up with 3 incomplete PGs. I tried to follow the recovery howto at https://ceph.com/community/incomplete-pgs-oh-my/, but it hit the same leveldb error, since it has the same dependency.

Is there any way to recover from this situation? To check and repair the leveldb as far as possible? Or, alternatively, to get rid of the incomplete status, even with the penalty of losing some objects?

Regards.
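One thing that is sometimes attempted in this situation is leveldb's own repair routine, run against a copy of the omap directory while the OSD is stopped. This is only a sketch under the assumption that the py-leveldb bindings are available, and repair can silently discard records, so it should never be run on the only copy:

    # OSD stopped; work on a copy of the omap directory
    cp -a /var/lib/ceph/osd/ceph-26/current/omap /root/omap-repair-test

    # let leveldb rebuild whatever it can salvage into a consistent database
    python -c "import leveldb; leveldb.RepairDB('/root/omap-repair-test')"

If the repaired copy opens cleanly, it can be swapped into place (keeping the original aside) to see whether the OSD then mounts.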
[ceph-users] One object in .rgw.buckets.index causes systemic instability
Dear all,

I have a cluster running hammer (0.94.5), with 5 nodes. The main usage is S3-compatible object storage. I am facing a very troublesome problem on this cluster: a single object in .rgw.buckets.index does not respond to requests and takes a very long time to recover after an OSD restart. During this time, the OSDs this object is mapped to get heavily loaded, with high CPU as well as memory usage. At the same time, the directory /var/lib/ceph/osd/ceph-XX/current/omap accumulates a large number of entries ( > 1) that won't decrease.

Very frequently, I get >100 blocked requests for this object, and the main OSD that stores it ends up accepting no other requests. Very frequently the OSD ends up crashing due to the filestore timeout, and getting it up again is very troublesome - it usually has to run alone on the node for a long time, until the object somehow gets recovered.

In the OSD logs, there are several entries like these:

-7051> 2015-11-03 10:46:08.339283 7f776974f700 10 log_client logged 2015-11-03 10:46:02.942023 osd.63 10.17.0.9:6857/2002 41 : cluster [WRN] slow request 120.003081 seconds old, received at 2015-11-03 10:43:56.472825: osd_repop(osd.53.236531:7 34.7 8a7482ff/.dir.default.198764998.1/head//34 v 236984'22) currently commit_sent
2015-11-03 10:28:32.405265 7f0035982700 0 log_channel(cluster) log [WRN] : 97 slow requests, 1 included below; oldest blocked for > 2046.502848 secs
2015-11-03 10:28:32.405269 7f0035982700 0 log_channel(cluster) log [WRN] : slow request 1920.676998 seconds old, received at 2015-11-03 09:56:31.728224: osd_op(client.210508702.0:14696798 .dir.default.198764998.1 [call rgw.bucket_prepare_op] 15.8a7482ff ondisk+write+known_if_redirected e236956) currently waiting for blocked object

Is there any way to dig deeper into this problem, or to rebuild the .rgw index without losing data? I currently have 30 TB of data in the cluster - most of it concentrated in a handful of buckets - that I cannot lose.

Regards.
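Two things that usually come up for oversized or hot bucket indexes, offered here as hedged suggestions rather than a verified fix for this particular object (the bucket name is illustrative, and in hammer the sharding option only affects newly created buckets):

    # check, and optionally rebuild, the index of the suspect bucket
    radosgw-admin bucket check --bucket=mybucket
    radosgw-admin bucket check --bucket=mybucket --fix --check-objects

    # ceph.conf, radosgw section: shard the index of buckets created from now on
    rgw override bucket index max shards = 16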
[ceph-users] ISA erasure code plugin in debian
Dear all,

I have a ceph cluster deployed on Debian; I am trying to test ISA erasure-coded pools, but there is no plugin (libec_isa.so) included in the library. Looking at the packages in the Debian Ceph repository, I found a "trusty" package that includes the plugin. Is it intended for use on Debian? In which version? Is there any documentation for it? Otherwise, is there any other way to get the plugin working? Are there any kernel requirements?

Regards
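To check which erasure-code plugins a given package actually ships, listing the plugin directory on an OSD node is usually enough; the paths below are the usual locations on each family of distributions, not verified against this specific package:

    # Debian/Ubuntu packages
    ls /usr/lib/ceph/erasure-code/ /usr/lib/x86_64-linux-gnu/ceph/erasure-code/ 2>/dev/null

    # RHEL/CentOS packages
    ls /usr/lib64/ceph/erasure-code/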
Re: [ceph-users] PGs stuck stale during data migration and OSD restart
I tried pg query, but it does not return; it hangs forever. As I understand it, when the PG is stale there is no OSD to answer the query. Am I right?

I did the tunables change in 2 steps, but did not wait for all the data to be moved before doing the second step. I rolled back to intermediate tunables - undefining the optimization below:

chooseleaf_descend_once: Whether a recursive chooseleaf attempt will retry, or only try once and allow the original placement to retry. Legacy default is 0, optimal value is 1.

Doing so, the stale PGs immediately disappeared. Since I rolled back, I can't give you the outcome of ceph -s.

I believe part of the issue is related to under-dimensioned hardware. The OSDs are being killed by the watchdog, and my memory is swapping. But even so, I did not expect to lose the data mapping.

Regards.

On 31-08-2015 05:48, Gregory Farnum wrote:
> On Sat, Aug 29, 2015 at 11:50 AM, Gerd Jakobovitsch wrote:
>> Dear all,
>>
>> During a cluster reconfiguration (change of crush tunables from legacy to TUNABLES2) with large data replacement, several OSDs got overloaded and had to be restarted; when the OSDs stabilized, I got a number of PGs marked stale, even though all OSDs where this data used to be located show up again. When I look at the OSDs' current directory for the last placement, there is still some data. But it never shows up again.
>>
>> Is there any way to force these OSDs to resume being used?
>
> This sounds very strange. Can you provide the output of "ceph -s" and run pg query against one of the stuck PGs?
> -Greg
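When walking tunables forward step by step, the resulting data movement can be estimated offline before anything is applied. A sketch along these lines (file names are arbitrary; the rule number and replica count have to match the pools in question):

    # export the current crush map and build a candidate with the next tunable enabled
    ceph osd getcrushmap -o crushmap.current
    crushtool -i crushmap.current --set-chooseleaf-descend-once 1 -o crushmap.next

    # compare the mappings the two maps produce to estimate how much data would move
    crushtool -i crushmap.current --test --show-mappings --rule 0 --num-rep 3 > mappings.before
    crushtool -i crushmap.next    --test --show-mappings --rule 0 --num-rep 3 > mappings.after
    diff mappings.before mappings.after | wc -l

    # apply only once the movement (and the cluster's headroom) looks acceptable
    ceph osd setcrushmap -i crushmap.next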
[ceph-users] PGs stuck stale during data migration and OSD restart
Dear all,

During a cluster reconfiguration (change of crush tunables from legacy to TUNABLES2) with large data replacement, several OSDs got overloaded and had to be restarted; when the OSDs stabilized, I got a number of PGs marked stale, even though all OSDs where this data used to be located show up again. When I look at the OSDs' current directory for the last placement, there is still some data. But it never shows up again.

Is there any way to force these OSDs to resume being used?

Regards.
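The commands typically used to inspect this state are roughly the following (the PG id is illustrative):

    ceph -s
    ceph pg dump_stuck stale
    ceph pg map 34.1f        # which OSDs the PG currently maps to
    ceph pg 34.1f query      # tends to hang if no running OSD currently claims the PG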
Re: [ceph-users] Fwd: OSD crashes after upgrade to 0.80.10
An update: it seems that I am running into memory shortage. Even with 32 GB for 20 OSDs and 2 GB of swap, ceph-osd uses all the available memory. I created another swap device with 10 GB, and I managed to get the failed OSD running without crashing, but it consumed an extra 5 GB. Are there known issues regarding memory on the ceph OSD?

But I still have the problem of the incomplete+inactive PG.

Regards.
Gerd

On 12-08-2015 10:11, Gerd Jakobovitsch wrote:
> I tried it; the error propagates to whichever OSD gets the errored PG.
>
> For the moment, this is my worst problem. I have one PG incomplete+inactive, and the OSD with the highest priority for it gets 100 blocked requests (I guess that is the maximum) and, although running, does not accept other requests - not even, for example, ceph tell osd.21 injectargs '--osd-max-backfills 1'. After some time it crashes, and the blocked requests move to the second OSD for the errored PG. I can't get rid of these slow requests.
>
> I suspected a problem with leveldb; I checked, and had the default version for Debian wheezy (0+20120530.gitdd0d562-1). I updated it to wheezy-backports (1.17-1~bpo70+1), but the error was the same. I use the regular wheezy kernel (3.2+46).
>
> On 11-08-2015 23:52, Haomai Wang wrote:
>> It seems like a leveldb problem. Could you just kick it out and add a new osd to make the cluster healthy first?
>>
>> On Wed, Aug 12, 2015 at 1:31 AM, Gerd Jakobovitsch wrote:
>>> Dear all,
>>>
>>> I run a ceph system with 4 nodes and ~80 OSDs using xfs, with currently 75% usage, running firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and since then I have had several OSDs crashing and never recovering: trying to run one ends up crashing at the same point shown in the original message below. Is this problem known? Is there any configuration that should be checked? Any way to try to recover these OSDs without losing all data?
>>>
>>> After that, setting the OSD to lost, I got one incomplete, inactive PG. Is there any way to recover it? The data still exists on the crashed OSDs.
>>>
>>> Regards.
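Regarding the memory pressure mentioned above, two things that are commonly checked are the tcmalloc heap statistics and an explicit heap release; these only apply when the OSDs are linked against tcmalloc, which is the default for these packages:

    # per-OSD heap usage as seen by tcmalloc
    ceph tell osd.7 heap stats

    # ask all OSDs to return freed memory to the operating system
    ceph tell osd.* heap release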
Re: [ceph-users] Fwd: OSD crashes after upgrade to 0.80.10
I tried it; the error propagates to whichever OSD gets the errored PG.

For the moment, this is my worst problem. I have one PG incomplete+inactive, and the OSD with the highest priority for it gets 100 blocked requests (I guess that is the maximum) and, although running, does not accept other requests - not even, for example, ceph tell osd.21 injectargs '--osd-max-backfills 1'. After some time it crashes, and the blocked requests move to the second OSD for the errored PG. I can't get rid of these slow requests.

I suspected a problem with leveldb; I checked, and had the default version for Debian wheezy (0+20120530.gitdd0d562-1). I updated it to wheezy-backports (1.17-1~bpo70+1), but the error was the same. I use the regular wheezy kernel (3.2+46).

On 11-08-2015 23:52, Haomai Wang wrote:
> It seems like a leveldb problem. Could you just kick it out and add a new osd to make the cluster healthy first?
>
> On Wed, Aug 12, 2015 at 1:31 AM, Gerd Jakobovitsch wrote:
>> Dear all,
>>
>> I run a ceph system with 4 nodes and ~80 OSDs using xfs, with currently 75% usage, running firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and since then I have had several OSDs crashing and never recovering: trying to run one ends up crashing at the same point shown in the original message below. Is this problem known? Is there any configuration that should be checked? Any way to try to recover these OSDs without losing all data?
>>
>> After that, setting the OSD to lost, I got one incomplete, inactive PG. Is there any way to recover it? The data still exists on the crashed OSDs.
>>
>> Regards.
[ceph-users] Fwd: OSD crashes after upgrade to 0.80.10
Dear all,

I run a ceph system with 4 nodes and ~80 OSDs using xfs, with currently 75% usage, running firefly. On Friday I upgraded it from 0.80.8 to 0.80.10, and since then I have had several OSDs crashing and never recovering: trying to run one ends up crashing as follows. Is this problem known? Is there any configuration that should be checked? Any way to try to recover these OSDs without losing all data?

After that, setting the OSD to lost, I got one incomplete, inactive PG. Is there any way to recover it? The data still exists on the crashed OSDs.

Regards.

[(12:58:13) root@spcsnp3 ~]# service ceph start osd.7
=== osd.7 ===
2015-08-11 12:58:21.003876 7f17ed52b700 1 monclient(hunting): found mon.spcsmp2
2015-08-11 12:58:21.003915 7f17ef493700 5 monclient: authenticate success, global_id 206010466
create-or-move updated item name 'osd.7' weight 3.64 at location {host=spcsnp3,root=default} to crush map
Starting Ceph osd.7 on spcsnp3...
2015-08-11 12:58:21.279878 7f200fa8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 31918
starting osd.7 at :/0 osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
[(12:58:21) root@spcsnp3 ~]#
2015-08-11 12:58:21.348094 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) dump_stop
2015-08-11 12:58:21.348291 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) basedir /var/lib/ceph/osd/ceph-7 journal /var/lib/ceph/osd/ceph-7/journal
2015-08-11 12:58:21.348326 7f200fa8f780 10 filestore(/var/lib/ceph/osd/ceph-7) mount fsid is 54c136da-c51c-4799-b2dc-b7988982ee00
2015-08-11 12:58:21.349010 7f200fa8f780 0 filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
2015-08-11 12:58:21.349026 7f200fa8f780 1 filestore(/var/lib/ceph/osd/ceph-7) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-08-11 12:58:21.353277 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is supported and appears to work
2015-08-11 12:58:21.353302 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-08-11 12:58:21.362106 7f200fa8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-08-11 12:58:21.362195 7f200fa8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is disabled by conf
2015-08-11 12:58:21.362701 7f200fa8f780 5 filestore(/var/lib/ceph/osd/ceph-7) mount op_seq is 35490995
2015-08-11 12:58:59.383179 7f200fa8f780 -1 *** Caught signal (Aborted) **
 in thread 7f200fa8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf0a0) [0x7f200efcd0a0]
 3: (gsignal()+0x35) [0x7f200db3f165]
 4: (abort()+0x180) [0x7f200db423e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f200e39589d]
 6: (()+0x63996) [0x7f200e393996]
 7: (()+0x639c3) [0x7f200e3939c3]
 8: (()+0x63bee) [0x7f200e393bee]
 9: (tc_new()+0x48e) [0x7f200f213aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&)+0x59) [0x7f200e3ef999]
 11: (std::string::_Rep::_M_clone(std::allocator const&, unsigned long)+0x28) [0x7f200e3f0708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7f200e3f07f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7f200e3f0ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7f200f46ffa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7f200f468360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7f200f46adf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7f200f46b11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7f200db2bead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -66> 2015-08-11 12:58:21.277524 7f200fa8f780 5 asok(0x2800230) register_command perfcounters_dump hook 0x27f0010
 -65> 2015-08-11 12:58:21.277552 7f200fa8f780 5 asok(0x2800230) register_command 1 hook 0x27f0010
 -64> 2015-08-11 12:58:21.277556 7f200fa8f780 5 asok(0x2800230) register_command perf dump hook 0x27f0010
 -63> 2015-08-11 12:58:21.277561 7f200fa8f780 5 asok(0x2800230) register_command perfcounters_schema hook 0x27f0010
 -62> 2015-08-11 12:58:21.277564 7f200fa8f780 5 asok(0x2800230) register_command 2 hook 0x27f0010
 -61> 2015-08-11 12:58:21.277566 7f200fa8f780 5 asok(0x2800230) register_command perf schema hook 0x27f0010
 -60> 2015-08-11 12:58:21.277569 7f200fa8f780 5 asok(0x2800230) r
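For the incomplete PG, the usual approach when the data still exists on a crashed OSD is to export the PG from that OSD's store and import it into a running one. This is only a sketch: the tool is named ceph_objectstore_tool on firefly and ceph-objectstore-tool on later releases, the PG id and OSD ids are illustrative, and in this particular case the export may fail because the tool has to open the same corrupted leveldb:

    # on the node with the crashed OSD (daemon stopped)
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --op export --pgid 15.8a --file /root/pg.15.8a.export

    # on a healthy OSD chosen to receive the PG (daemon stopped as well)
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-21 \
        --journal-path /var/lib/ceph/osd/ceph-21/journal \
        --op import --file /root/pg.15.8a.export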
[ceph-users] OSD crashes when starting
Dear all,

I have hit an unrecoverable crash on one specific OSD, every time I try to restart it. It happened first on firefly 0.80.8; I updated to 0.80.10, but it continues to happen. Due to this failure I have several PGs down+peering that won't recover, even after marking the OSD out.

Could someone help me? Is it possible to edit/rebuild the leveldb-based log that seems to be causing the problem?

Here is what the logfile shows:

[(12:54:45) root@spcsnp2 ~]# service ceph start osd.31
=== osd.31 ===
create-or-move updated item name 'osd.31' weight 2.73 at location {host=spcsnp2,root=default} to crush map
Starting Ceph osd.31 on spcsnp2...
starting osd.31 at :/0 osd_data /var/lib/ceph/osd/ceph-31 /var/lib/ceph/osd/ceph-31/journal
2015-08-07 12:55:12.916880 7fd614c8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
[(12:55:12) root@spcsnp2 ~]#
2015-08-07 12:55:12.928614 7fd614c8f780 0 filestore(/var/lib/ceph/osd/ceph-31) mount detected xfs (libxfs)
2015-08-07 12:55:12.928622 7fd614c8f780 1 filestore(/var/lib/ceph/osd/ceph-31) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-08-07 12:55:12.931410 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is supported and appears to work
2015-08-07 12:55:12.931419 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-08-07 12:55:12.939290 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-08-07 12:55:12.939326 7fd614c8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_feature: extsize is disabled by conf
2015-08-07 12:55:45.587019 7fd614c8f780 -1 *** Caught signal (Aborted) **
 in thread 7fd614c8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf030) [0x7fd6141ce030]
 3: (gsignal()+0x35) [0x7fd612d41475]
 4: (abort()+0x180) [0x7fd612d446f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd61359689d]
 6: (()+0x63996) [0x7fd613594996]
 7: (()+0x639c3) [0x7fd6135949c3]
 8: (()+0x63bee) [0x7fd613594bee]
 9: (tc_new()+0x48e) [0x7fd614414aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&)+0x59) [0x7fd6135f0999]
 11: (std::string::_Rep::_M_clone(std::allocator const&, unsigned long)+0x28) [0x7fd6135f1708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7fd6135f17f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7fd6135f1ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7fd614670fa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7fd614669360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7fd61466bdf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7fd61466c11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7fd612d2dead]
 23: /usr/bin/ceph-osd() [0x736e99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
 -56> 2015-08-07 12:55:12.915675 7fd614c8f780 5 asok(0x1a20230) register_command perfcounters_dump hook 0x1a10010
 -55> 2015-08-07 12:55:12.915697 7fd614c8f780 5 asok(0x1a20230) register_command 1 hook 0x1a10010
 -54> 2015-08-07 12:55:12.915700 7fd614c8f780 5 asok(0x1a20230) register_command perf dump hook 0x1a10010
 -53> 2015-08-07 12:55:12.915704 7fd614c8f780 5 asok(0x1a20230) register_command perfcounters_schema hook 0x1a10010
 -52> 2015-08-07 12:55:12.915706 7fd614c8f780 5 asok(0x1a20230) register_command 2 hook 0x1a10010
 -51> 2015-08-07 12:55:12.915709 7fd614c8f780 5 asok(0x1a20230) register_command perf schema hook 0x1a10010
 -50> 2015-08-07 12:55:12.915711 7fd614c8f780 5 asok(0x1a20230) register_command config show hook 0x1a10010
 -49> 2015-08-07 12:55:12.915714 7fd614c8f780 5 asok(0x1a20230) register_command config set hook 0x1a10010
 -48> 2015-08-07 12:55:12.915716 7fd614c8f780 5 asok(0x1a20230) register_command config get hook 0x1a10010
 -47> 2015-08-07 12:55:12.915718 7fd614c8f780 5 asok(0x1a20230) register_command log flush hook 0x1a10010
 -46> 2015-08-07 12:55:12.915721 7fd614c8f780 5 asok(0x1a20230) register_command log dump hook 0x1a10010
 -45> 2015-08-07 12:55:12.915723 7fd614c8f780 5 asok(0x1a20230) register_command log reopen hook 0x1a10010
 -44> 2015-08-07 12:55:12.916880 7fd614c8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
 -43> 2015-08-07 12:55:12.91815
[ceph-users] Problem setting tunables for ceph firefly
Dear all,

I have a ceph cluster running on 3 nodes, 240 TB of space with 60% usage, used by rbd and radosgw clients. Recently I upgraded from emperor to firefly, and I got the message about legacy tunables described at http://ceph.com/docs/master/rados/operations/crush-map/#tunables. After some data rearrangement to minimize risks, I tried to apply the optimal settings. This resulted in 28% object degradation, much more than I expected, and worse, I lost communication with the rbd clients, which run kernels 3.10 or 3.11.

Searching for a solution, I got to this proposed fix: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11199.html. Applying it (before all the data had moved), I got an additional 2% of object degradation, but the rbd clients started working again. But then I got a large number of degraded or stale PGs that are not backfilling.

Looking for the definition of chooseleaf_vary_r, I found it at http://ceph.com/docs/master/rados/operations/crush-map/:

"chooseleaf_vary_r: Whether a recursive chooseleaf attempt will start with a non-zero value of r, based on how many attempts the parent has already made. Legacy default is 0, but with this value CRUSH is sometimes unable to find a mapping. The optimal value (in terms of computational cost and correctness) is 1. However, for legacy clusters that have lots of existing data, changing from 0 to 1 will cause a lot of data to move; a value of 4 or 5 will allow CRUSH to find a valid mapping but will make less data move."

Is there any suggestion on how to handle this? Do I have to set chooseleaf_vary_r to some other value? Will I lose communication with my rbd clients? Or should I return to legacy tunables?

Regards,
Gerd Jakobovitsch
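For reference, an intermediate chooseleaf_vary_r value such as the 4 or 5 mentioned in the documentation can be set by rewriting the crush map rather than switching the whole tunables profile; a sketch only, with illustrative file names:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --set-chooseleaf-vary-r 4 -o crushmap.vary-r4
    ceph osd setcrushmap -i crushmap.vary-r4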
Re: [ceph-users] Uploading large files to swift interface on radosgw
Thank you very much, now it worked, with the value you suggested.

Regards.

On 09/19/2013 12:10 PM, Yehuda Sadeh wrote:
> Now you're hitting issue #6336 (it's a regression in dumpling that we'll fix soon). The current workaround is setting the following in your osd config:
>
> osd max attr size = ...
>
> Try a value of 10485760 (10M), which I think is large enough.
>
> Yehuda
>
> On Thu, Sep 19, 2013 at 7:30 AM, Gerd Jakobovitsch wrote:
>> Hello Yehuda, thank you for your help.
>>
>> On 09/17/2013 08:35 PM, Yehuda Sadeh wrote:
>>> On Tue, Sep 17, 2013 at 3:21 PM, Gerd Jakobovitsch wrote:
>>>> Hi all,
>>>>
>>>> I am testing a ceph environment installed on Debian wheezy and, when testing uploads of files larger than 1 GB, I am getting errors. For files larger than 5 GB, I get a "400 Bad Request EntityTooLarge" response; looking at
>>>
>>> The EntityTooLarge is expected, as there's a 5GB limit on objects. Bigger objects need to be uploaded using the large object api.
>>>
>>>> the radosgw server, I notice that only the apache process is consuming cpu time, and I only have traffic on the external interface used by apache. For files between 2 GB and 5 GB, I get stuck for a very long time, and I see relatively high processing for both apache and radosgw. Finally, I get a response "500 Internal Server Error UnknownError". The object is created on rados, but is empty.
>>>>
>>>> I am wondering whether there is any configuration I should change on apache, fastcgi or rgw, or if there are hardware limitations. Apache and fastCGI were installed from the distro. My ceph configuration:
>>>>
>>>> [global]
>>>> mon_initial_members = spcsmp1, spcsmp2, spcsmp3
>>>> mon_host = 10.17.0.2,10.17.0.3,10.17.0.4
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>> osd_journal_size = 1024
>>>> filestore_xattr_use_omap = true
>>>> public_network = 10.17.0.0/24
>>>> cluster_network = 10.18.0.0/24
>>>>
>>>> [osd]
>>>> osd_journal_size = 1024
>>>>
>>>> [client.radosgw.gateway]
>>>> host = mss.mandic.com.br
>>>> keyring = /etc/ceph/keyring.radosgw.gateway
>>>> rgw_socket_path = /tmp/radosgw.sock
>>>> log_file = /var/log/ceph/radosgw.log
>>>> rgw_enable_ops_log = false
>>>
>>> Are you by any chance using the fcgi module rather than the fastcgi module? It had a problem with caching the entire object before sending it to the backend, which would result in the same symptoms as you just described.
>>>
>>> Yehuda
>>
>> Well, I followed the installation instructions, which explicitly refer to fastcgi. Now I disabled the cgid module and repeated the test: I got the same problem.
>>
>> Apache and fastcgi versions:
>> apache2: Installed: 2.2.22-13
>> libapache2-mod-fastcgi: Installed: 2.4.7~0910052141-1
>>
>> I enabled radosgw logging; please find the log file attached. There is a lot of information listed, but I couldn't figure out the problem.
>>
>> Regards.
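For reference, the workaround discussed above would look like this in ceph.conf on the OSD nodes (the value is the one Yehuda suggested; a restart of the OSDs is needed for it to take effect - the injectargs form below is my assumption for applying it at runtime, not something quoted from the thread):

    [osd]
        osd max attr size = 10485760

    # or, at runtime, without a restart:
    ceph tell osd.* injectargs '--osd_max_attr_size 10485760'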
[ceph-users] Uploading large files to swift interface on radosgw
Hi all,

I am testing a ceph environment installed on Debian wheezy and, when testing uploads of files larger than 1 GB, I am getting errors.

For files larger than 5 GB, I get a "400 Bad Request EntityTooLarge" response; looking at the radosgw server, I notice that only the apache process is consuming cpu time, and I only have traffic on the external interface used by apache.

For files between 2 GB and 5 GB, the upload gets stuck for a very long time, and I see relatively high processing for both apache and radosgw. Finally, I get a response "500 Internal Server Error UnknownError". The object is created on rados, but it is empty.

I am wondering whether there is any configuration I should change on apache, fastcgi or rgw, or if there are hardware limitations. Apache and fastCGI were installed from the distro. My ceph configuration:

[global]
mon_initial_members = spcsmp1, spcsmp2, spcsmp3
mon_host = 10.17.0.2,10.17.0.3,10.17.0.4
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true
public_network = 10.17.0.0/24
cluster_network = 10.18.0.0/24

[osd]
osd_journal_size = 1024

[client.radosgw.gateway]
host = mss.mandic.com.br
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log
rgw_enable_ops_log = false
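On the over-5-GB case specifically: the swift API expects large objects to be uploaded in segments below the 5 GB per-object limit. Assuming the standard python-swiftclient CLI is being used for the tests, that looks roughly like this (container and file names are illustrative):

    # split the upload into 1 GB segments; the client creates the manifest object automatically
    swift upload mycontainer big-file.bin --segment-size 1073741824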
[ceph-users] Issues setting up ceph object storage
Hello all,

I am trying to install a ceph environment for testing, focused on block devices for virtualization and on object storage, and I am facing some issues. My environment consists of one server running ceph-deploy and mon; 3 cluster nodes with 3 disks each running a total of 3 OSD's; and one object gateway. All servers are virtualized, running Ubuntu 13.04 on XenServer hosts. When installing, I created a cluster with a non-default name.

The issues:

1) I could not create the OSDs with a non-default filesystem. I tried using "--fs-type btrfs", as stated here. The option was not accepted. How can I use btrfs with ceph-deploy? (See the sketch at the end of this message.)

2) When trying to restart an OSD - in my case, to apply a changed configuration for the cluster network - the command "sudo /etc/init.d/ceph -c <cluster>.conf stop osd.<id>" is accepted, but does not shut down the OSD. I had to kill the process to achieve it, and run the command line request to start it again. By the way, I still have a number of connections between OSDs across the public network, even after configuring the cluster network.

3) I am also facing problems running radosgw - when running /etc/init.d/radosgw start, the request is accepted, but no process exists; I am still gathering more information and double-checking the object gateway installation.

Any help would be welcome.

Regards
Gerd Jakobovitsch
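On point 1, the ceph-deploy releases of that era do accept a filesystem flag on the osd subcommands, although the flag generally has to come right after the subcommand; a sketch under that assumption (host and disk names are illustrative, and btrfs was not the recommended production choice at the time):

    # one-step create, or prepare + activate, with an explicit filesystem
    ceph-deploy --cluster mycluster osd create --fs-type btrfs node1:/dev/xvdb
    # or:
    ceph-deploy --cluster mycluster osd prepare --fs-type btrfs node1:/dev/xvdb
    ceph-deploy --cluster mycluster osd activate node1:/dev/xvdb1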