Re: [ceph-users] rebalancing taking very long time
I can say exactly the same thing. I have been using Ceph since 0.38 and I have never seen OSDs as laggy as with 0.94. The rebalancing/rebuild algorithm in 0.94 is seriously bad. I have 2 OSDs serving 2 discs of 2 TB with 4 GB of RAM, and each OSD takes 1.6 GB. Seriously! That snowballs into an avalanche.

Let me be straight and explain what changed. In 0.38 you could ALWAYS stop the Ceph cluster and then start it up again: it would evaluate whether everyone was back and whether there were enough replicas, and only then start rebuilding/rebalancing what was needed. Of course it took around 10 minutes to bring the cluster up, but then the rebuilding/rebalancing process was smooth.

With 0.94, first you have 2 OSDs too full at 95% and 4 OSDs at 63%, out of 20 OSDs. Then you get a disc crash, so Ceph automatically starts to rebuild and rebalance, and the OSDs start to lag and then crash. You stop the cluster, change the drive, restart the cluster, stop all rebuild activity by setting nobackfill, norecover, noscrub and nodeep-scrub, remove the old OSD, create a new one, wait for all OSDs to be in and up, and then the rebuild/rebalance starts lagging again; since it is automated there is not much of a choice there. And again all OSDs get stuck in an endless lag/down/recovery cycle... It is seriously a pain. Five days after changing the faulty disc it is still locked in the lag/down/recovery cycle.

Sure, it can be argued that my machines are really resource limited and that I should buy servers worth at least three thousand dollars each. But until 0.72 that rebalancing/rebuilding process worked smoothly on the same hardware. It seems to me that the rebalancing/rebuilding algorithm is stricter now than it was in the past; previously only what really needed to be rebuilt or rebalanced was rebuilt or rebalanced. I can still delete everything and go back to 0.72... As if I should buy a Cray T-90 to stop having problems and have Ceph run smoothly. But this will not help make Ceph a better product. For me, Ceph 0.94 is like Windows Vista...

Alphe Salas
I.T. engineer

On 09/08/2015 10:20 AM, Gregory Farnum wrote:
On Wed, Sep 2, 2015 at 9:34 PM, Bob Ababurko wrote:

When I lose a disk OR replace an OSD in my POC Ceph cluster, it takes a very long time to rebalance. I should note that my cluster is slightly unique in that I am using CephFS (shouldn't matter?) and it currently contains about 310 million objects. The last time I replaced a disk/OSD was 2.5 days ago and it is still rebalancing. This is on a cluster with no client load. The configuration is 5 hosts with 6 x 1 TB 7200 rpm SATA OSDs and one 850 Pro SSD holding the journals for those OSDs, which means 30 OSDs in total. The system disk is on its own disk. I'm also using a backend network with a single Gb NIC. The rebalancing rate (objects/s) seems to be very slow when it is close to finishing, say <1% objects misplaced. It doesn't seem right that it would take 2+ days to rebalance a 1 TB disk with no load on the cluster. Are my expectations off?

Possibly... Ceph basically needs to treat each object as a single IO. If you're recovering from a failed disk then you've got to replicate roughly 310 million * 3 / 30 = 31 million objects. If it's perfectly balanced across 30 disks that each get 80 IOPS, that's 12916 seconds (~3.5 hours) worth of work just to read each file, and in reality it's likely to take more than one IO to read the file, and then you have to spend a bunch more to write it as well.

I'm not sure if my pg_num/pgp_num needs to be changed OR the rebalance time is dependent on the number of objects in the pool.
These are thoughts I've had but am not certain are relevant here.

Rebalance time is dependent on the number of objects in the pool. You *might* see an improvement by increasing "osd max push objects" from its default of 10... or you might not. That many small files isn't something I've explored.
-Greg

$ sudo ceph -v
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)

$ sudo ceph -s
[sudo] password for bababurko:
    cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
     health HEALTH_WARN
            5 pgs backfilling
            5 pgs stuck unclean
            recovery 3046506/676638611 objects misplaced (0.450%)
     monmap e1: 3 mons at {cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
            election epoch 20, quorum 0,1,2 cephmon01,cephmon02,cephmon03
     mdsmap e6070: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
     osdmap e4395: 30 osds: 30 up, 30 in; 5 remapped pgs
      pgmap v3100039: 2112 pgs, 3 pools, 6454 GB data, 321 Mobjects
            18319 GB used, 9612 GB / 27931 GB avail
            3046506/676638611 objects misplaced (0.450%)
                2095 active+clean
                  12 active+clean+scrubbing+deep
                   5 active+remapped+backfilling
  recovery io 2294 kB/s, 147 objects/s

$ sudo rados df
pool name
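For reference, the knobs mentioned in this thread look roughly like this on Hammer (0.94.x). The cluster flags are the real flag names; the push-objects value below is only an illustration of raising the default of 10, not a recommendation:

    # pause recovery/backfill and scrubbing before swapping a failed drive
    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # ... replace the disk, recreate the OSD, wait for it to be up and in ...

    # let recovery resume
    ceph osd unset nobackfill
    ceph osd unset norecover
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # optionally try pushing more objects per recovery op (default is 10);
    # the value 64 here is only an example
    ceph tell osd.* injectargs '--osd-max-push-objects 64'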
[ceph-users] installing ceph giant on ubuntu 15.04
Hello everyone,

I recently had to install ceph giant on Ubuntu 15.04 and had to solve some problems, so here is the best way to do it.

1) On your fresh Ubuntu 15.04 install, replace systemd with upstart:
   apt-get update
   apt-get install upstart
   apt-get install upstart-sysv
   (this removes systemd and replaces it with the whole upstart stack)
2) Install dpkg-dev:
   apt-get install dpkg-dev
3) Download the ceph giant 0.94.1 sources.
4) Untar the sources package.
5) From the sources directory, run the script that downloads all the necessary build dependencies:
   ./install-deps.sh
6) Make sure we have all dependencies:
   dpkg-checkbuilddeps
7) Compile the sources and create the deb packages:
   dpkg-buildpackage

Once you have all the .deb files covering every part of ceph, you can deploy them on all the nodes of your ceph cluster. Install the needed packages with:
   dpkg --install ceph-0.94.1.deb
Once your packages are installed, pull in their dependencies simply by running:
   apt-get install -f

And you will be ready to use ceph-deploy to deploy your nice ceph cluster.

Regards,
--
Alphe Salas
I.T. engineer
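For reference, the same sequence as a single shell session; the tarball name and the .deb filenames below are illustrative and will differ depending on the exact release you download:

    # 1) swap systemd for upstart on the fresh 15.04 install
    apt-get update
    apt-get install -y upstart upstart-sysv

    # 2) build tooling
    apt-get install -y dpkg-dev

    # 3-4) fetch and unpack the ceph sources (tarball name is illustrative)
    tar xzf ceph-0.94.1.tar.gz
    cd ceph-0.94.1

    # 5-6) pull the build dependencies, then check nothing is missing
    ./install-deps.sh
    dpkg-checkbuilddeps

    # 7) build the .deb packages (-us -uc skips package signing); they land in ..
    dpkg-buildpackage -us -uc

    # on each cluster node: install the packages, then let apt fix dependencies
    dpkg --install ../*.deb
    apt-get install -f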
Re: [ceph-users] the state of cephfs in giant
For the humble Ceph user that I am, it is really hard to follow which version of which product will get the changes I require. Let me explain.

I use Ceph at my company, which specialises in disk recovery. We need a flexible, easy-to-maintain, trustworthy way to store the data from our clients' disks. We tried the usual approach: JBOD boxes connected to a single server through a SAS RAID card, with ZFS mirrors handling replicas and merging the disks into one big volume. The result is really slow. (We used to run ZFS on Solaris 11 on x86 servers; with OpenZFS on Ubuntu 14.04 the performance is much better, but still nowhere near Ceph. On a gigabit Ethernet LAN you can get around 80 MB/s between a client and the Ceph cluster, while client to OpenZFS/Ubuntu is around 25 MB/s.)

Along my path with Ceph I first used CephFS, which worked fine... until I noticed that parts of the folder tree would suddenly and randomly disappear, forcing constant periodic remounts of the partitions. Then I decided to forget about CephFS and use RBD images, which also worked fine... until I noticed that RBD replicas were never freed or overwritten. With replicas set to 2 (the data plus one replica) and a 13 TB image, after some time of write/erase cycles on the same RBD image I ended up with an overall data usage of 34 TB out of the 36 TB available on my cluster, so there is a real problem with "space management". The data part of the RBD image was properly managed, with old deleted data being overwritten at the OS level, so the only logical explanation for the growth in overall usage was that the replicas were never freed.

All along I have been following Ceph's bugs, features and progress. But these issues are not really on the Ceph side; they live in the kernel modules used by the "Ceph clients". So part of the feature additions and bug fixes arrive in the ceph-common package (for the server-side mechanics) and the other part has to be delivered at the kernel level. For convenience I use Ubuntu, which is not exactly top-notch at shipping the very latest kernel with all the fixed modules. So when I see the great news about Giant and how much work has gone into solving most of the problems we have all faced with Ceph, I also realise it will be around a year before those fixes are available for production use on Ubuntu. There is an inertia there that does not match the pace of the work on Ceph.

People can argue with me, "why do you use Ubuntu?", and the answer is simple: I have a cluster of 10 machines and 1 proxy. If I need to compile the latest Ceph and the latest kernel from source, my maintenance time grows considerably, and I am more likely to end up with something that is not properly built and a machine that does not reboot. I know what I am talking about: for several months I ran Ceph on Arch Linux, compiling the kernel and Ceph from source, until the gcc installed on my test server was too new, a compile option had been removed, and Ceph no longer compiled. That approach was discarded because it was not stable enough for production-level quality.

So, as far as I understand things, I will get the CephFS improvements and RBD discard support at the same time, by combining Ceph Giant with Linux kernel 3.18 and up?

Regards, and thank you again for your hard work. I wish I could do more to help.
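(As a side note, a rough way to see how much data an image is actually holding on the cluster, independent of what the filesystem inside it reports; the pool and image names here are just placeholders:)

    # sum the extents that rbd still considers allocated for the image
    rbd diff rbd/myimage | awk '{ used += $2 } END { print used/1024/1024 " MB" }'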
---
Alphe Salas
I.T. engineer

On 10/15/2014 11:58 AM, Sage Weil wrote:
On Wed, 15 Oct 2014, Amon Ott wrote:
On 15.10.2014 14:11, Ric Wheeler wrote:
On 10/15/2014 08:43 AM, Amon Ott wrote:
On 14.10.2014 16:23, Sage Weil wrote:
On Tue, 14 Oct 2014, Amon Ott wrote:
On 13.10.2014 20:16, Sage Weil wrote:

We've been doing a lot of work on CephFS over the past few months. This is an update on the current state of things as of Giant.
...
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse or libcephfs) clients are in good working order.

Thanks for all the work, and especially for concentrating on CephFS! We have been watching and testing for years by now and really hope to change our clusters to CephFS soon. For kernel maintenance reasons, we only want to run longterm stable kernels, and for performance reasons and because of severe known problems we want to avoid FUSE. How good are our chances of a stable system with the kernel client in the latest longterm kernel, 3.14? Will there be further bugfixes or feature backports?

There are important bug fixes missing from 3.14. IIRC, the EC, cache tiering, and firefly CRUSH changes aren't there yet either (they landed in 3.15), and that is not appropriate for a stable series. They can be backported, but no commitment yet on that :)

If the bugfixes are easily identified in one of your Ceph git branches, I would even try to backport them myself. Still, I would rather see someone from the Ceph team with deeper knowledge of the code port them. IMHO, it would be good for
Re: [ceph-users] the state of cephfs in giant
Hello Sage,

The last time I used CephFS it showed a strange behaviour when used in conjunction with an NFS re-export of the CephFS mount point: I experienced a partial, random disappearance of folders from the tree. According to people on the mailing list it was a kernel module bug (I was not using ceph-fuse). Do you know whether any work has been done on that topic recently?

Best regards,
Alphe Salas
I.T. engineer

On 10/14/2014 11:23 AM, Sage Weil wrote:
On Tue, 14 Oct 2014, Amon Ott wrote:
On 13.10.2014 20:16, Sage Weil wrote:

We've been doing a lot of work on CephFS over the past few months. This is an update on the current state of things as of Giant.
...
* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse or libcephfs) clients are in good working order.

Thanks for all the work, and especially for concentrating on CephFS! We have been watching and testing for years by now and really hope to change our clusters to CephFS soon. For kernel maintenance reasons, we only want to run longterm stable kernels, and for performance reasons and because of severe known problems we want to avoid FUSE. How good are our chances of a stable system with the kernel client in the latest longterm kernel, 3.14? Will there be further bugfixes or feature backports?

There are important bug fixes missing from 3.14. IIRC, the EC, cache tiering, and firefly CRUSH changes aren't there yet either (they landed in 3.15), and that is not appropriate for a stable series. They can be backported, but no commitment yet on that :)

sage
Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?
Hello,

The best practice is simply to shut down the whole cluster, starting with the clients, then the monitors, the MDS and the OSDs. You do your maintenance, then you bring everything back, starting with the monitors, then MDS, OSDs and finally the clients.

Otherwise the missing OSDs will trigger a reconstruction of your cluster that will not end with the return of the "faulty" OSD(s). If you turn off everything related to the Ceph cluster, the downtime is transparent to the monitors, and they will not have to deal with partial reconstruction, clean-up and re-scrubbing of the returned OSD(s).

Best regards,
Alphe Salas
I.T. engineer

On 06/13/2014 04:56 AM, David wrote:
Hi,

We're going to take down one OSD node for maintenance (add cpu + ram) which might take 10-20 minutes. What's the best practice here in a production cluster running dumpling 0.67.7-1~bpo70+1?

Kind Regards,
David Majchrzak
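As a rough sketch with the dumpling-era sysvinit service script, following the order described above; run it on each node, or with -a from an admin node that has ssh access to all of them (host and daemon layout here is an assumption, adapt to your ceph.conf):

    # stop client access first: unmount CephFS / unmap RBD on the clients,
    # then stop the ceph daemons across the cluster
    service ceph -a stop

    # ... do the hardware maintenance ...

    # bring the daemons back, then remount/remap the clients last
    service ceph -a start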
[ceph-users] umount gets stuck when umounting a cloned rbd image
Hello,

I am writing about an issue I noticed with ceph 0.72.2 on Ubuntu 13.10 and with 0.80.1 on Ubuntu 14.04. Here is what I do:

1) I create a 4 TB rbd image and format it to ext4 or xfs. The image has --order 25 and --image-format 2.
2) I create a snapshot of that rbd image.
3) I protect that snapshot.
4) I create a clone of that initial rbd image, using the protected snapshot as its parent.
5) I add the corresponding line to /etc/ceph/rbdmap, map the new image, and mount it on my ceph client server.

Up to here everything is fine, cool and dandy.

6) I umount /dev/rbd1, which is the previously mounted rbd clone image, and umount gets stuck.

On the client server, with umount stuck, I have this message in /var/log/syslog:

Jun 11 12:26:10 tesla kernel: [63365.178657] libceph: osd8 20.10.10.105:6803 socket error on read

Since the problem seems to be somehow related to osd8 on my 20.10.10.105 ceph node, I went there to get more information. In /var/log/ceph/ceph-osd.8.log this message appears endlessly:

2014-06-11 12:31:51.692031 7fa26085c700 0 -- 20.10.10.105:6805/23321 >> 20.10.10.12:0/2563935849 pipe(0x9dd6780 sd=231 :6805 s=0 pgs=0 cs=0 l=0 c=0x7ed6840).accept peer addr is really 20.10.10.12:0/2563935849 (socket is 20.10.10.12:33056/0)

Can anyone help me solve this issue?

--
Alphe Salas
I.T. engineer
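For clarity, here are the reproduction steps as commands; the pool, image and snapshot names (and the mount point) are only placeholders for what I described above:

    # 1) create and format the parent image (4 TB, object size 2^25, format 2)
    rbd create rbd/parent --size 4194304 --order 25 --image-format 2
    rbd map rbd/parent
    mkfs.ext4 /dev/rbd0

    # 2-4) snapshot it, protect the snapshot, clone it
    rbd snap create rbd/parent@base
    rbd snap protect rbd/parent@base
    rbd clone rbd/parent@base rbd/clone

    # 5) map and mount the clone on the client
    rbd map rbd/clone
    mount /dev/rbd1 /mnt/clone

    # 6) this is where the hang shows up
    umount /mnt/clone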
[ceph-users] apple support automated mail ? WTF!?
Hello,

Each time I send a mail to the ceph-users mailing list I receive an email from Apple support?! Is that a joke?

Alphe Salas
[ceph-users] rbd always expanding data space problem.
Hello all,

I have recently come to the conclusion that, out of 40 TB of physical space, I can only use about 16 TB before seeing PGs stuck because an OSD is too full. The data space used seems to grow forever.

Running ceph osd reweight-by-utilization 103 seems at first to rebalance PG usage across the OSDs, and the problem is solved for a while. But then it appears again with more PGs stuck too full, and it keeps getting worse.

Sure, the solution should be to add more disk space, but for that to make a significant difference it would have to be at least 25%, which means growing the ceph cluster by 10 TB (5 disks of 2 TB or 3 disks of 4 TB). That has a cost, and the problem would only be solved for a while, until the replicas that are never freed fill the added space again.

In the end I can really only count on using a 16 TB rbd image out of 37 TB of global ceph cluster disk. That means I can really use about 40%, and over time that ratio will keep dropping. So what is required is that replicas and data can be overwritten so that the hidden data stops growing, or that I can clean them up when I need to.

Alphe Salas.
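For reference, this is roughly what I run to check and temporarily rebalance usage; the 103 argument means "reweight any OSD above 103% of the average utilization":

    # overall and per-pool usage
    ceph df

    # which PGs/OSDs are in trouble
    ceph health detail

    # temporarily reweight the OSDs sitting above 103% of average utilization
    ceph osd reweight-by-utilization 103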
[ceph-users] rbd Problem with disk full
Hello,

I need the rbd kernel module to really delete data on the OSD disks. An ever-growing amount of "hidden data" is not a great solution; at the very least we should be able to manually strip out the "hidden" data, i.e. the replicas.

Say I use an rbd image of 10 TB on an overall available space of 25 TB. What real-world experience shows me is that if I write 8 TB of my 10 TB in a row, the overall used data is around 18 TB. Then if I delete 4 TB from the rbd image and write another 4 TB, the overall usage grows by 4 TB: the PGs used by the rbd image are reused and overwritten, but the corresponding replicas are not. So at the end of round two of writing, the overall used space is 22 TB, and at that moment I get stuff like this:

2034 active+clean
7 active+remapped+wait_backfill+backfill_toofull
7 active+remapped+backfilling

I tried to use ceph osd reweight-by-utilization, but that did not solve the problem. And even if it did, it would only be temporary, because after deleting another 4 TB and writing 4 TB I will reach the full ratio and get my OSDs stuck, until I spend 12,000 dollars to grow my ceph cluster. When you operate a 40 TB ceph cluster, adding 4 TB does not make much of a difference.

In the end, for 40 TB of raw space (20 disks of 2 TB), after initial formatting I get a 37 TB cluster of available space. Then I create an 18 TB rbd image, and I cannot use much more than 16 TB of it before my OSDs start showing stuck PGs. So 37 TB for 16 TB of usable disk space is not a great deal at all, because I lose 60% of my storage.

On how to delete the data, I honestly don't know. The "easiest" way I can see is to at least be able to manually tell the rbd kernel module to clean "released" data from the OSDs when we see fit, at maintenance time, if doing it automatically has too bad an impact on overall performance. I would be glad to be able to pick an appropriate moment to force such a cleaning task; that would be better than nothing and better than the ever-growing "hidden" data situation.

Regards,
--
Alphe Salas
I.T. engineer
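For what it is worth, with a kernel recent enough to have krbd discard support (3.18 and up, as mentioned in the Giant thread), space released by deleted files can be handed back to the cluster with fstrim or the discard mount option. This is only a sketch; the device and mount point are placeholders:

    # one-off reclaim of blocks the filesystem has released (run at a quiet time)
    fstrim -v /mnt/rbdimage

    # or let the filesystem issue discards continuously (has a performance cost)
    mount -o discard /dev/rbd1 /mnt/rbdimage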
Re: [ceph-users] packages for Trusty
Hello all,

To begin with, there is no Emperor package for saucy. Emperor for saucy is only rolled out through git, and based on my experience, having the test builds rolling in constantly can break a ceph cluster. I don't know why there is that gap in the ceph.com/download section, but the fact that Inktank considers Dumpling the stable production version of ceph should explain a lot (that is what they sell). Why care about today's Ubuntu and today's "stable" when the real product sold is last year's ceph, which works great on last year's Ubuntu?

Alphe Salas.

On 04/25/2014 06:03 PM, Craig Lewis wrote:
Using the Emperor builds for Precise seems to work on Trusty. I just put a hold on all of the ceph, rados, and apache packages before the release upgrade.

It makes me nervous though. I haven't stressed it much, and I don't really want to roll it out to production. I would like to see Emperor builds for Trusty, so I can get started rolling out Trusty independently of Firefly. Changing one thing at a time is invaluable when bad things start happening.

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
Central Desktop. Work together in ways you never thought possible.

On 4/25/14 12:10, Sebastien wrote:
Well, as far as I know Trusty has 0.79 and will get Firefly as soon as it's ready, so I'm not sure it's that urgent. The Precise repo should work fine.

My 2 cents,
Sébastien Han
Cloud Engineer
"Always give 100%. Unless you're giving blood."
Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address: 11 bis, rue Roquépine 75008 Paris
Web: www.enovance.com - Twitter: @enovance

On Fri, Apr 25, 2014 at 9:05 PM, Travis Rhoden wrote:
Are there packages for Trusty being built yet? I don't see it listed at http://ceph.com/debian-emperor/dists/

Thanks,
- Travis
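For reference, pinning the Ceph packages before a release upgrade, as Craig describes, can be done along these lines; the package list is only illustrative and should match whatever is actually installed on the node:

    # freeze the ceph-related packages before running the release upgrade
    apt-mark hold ceph ceph-common librados2 librbd1 radosgw

    # ... do-release-upgrade, verify the cluster ...

    # release the hold once ready to upgrade them deliberately
    apt-mark unhold ceph ceph-common librados2 librbd1 radosgw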
[ceph-users] Firefly distribution
Hello,

My question is: will we get a ceph.com/debian-firefly/ directory with packages for Ubuntu 13.10 and 14.04? Or will raring be the last supported Ubuntu distro, as with Emperor, with saucy/trusty only available through GitHub?

Regards
Re: [ceph-users] CephFS filesystem disapear!
Hello all,

Bad news: the problem showed up again today! I had bindfs mounted and an NFS server running on the same "proxy" server, and when I ran a massive chmod on a directory with a large amount of data, part of the directories disappeared! The problem was not showing up as fast as it did with kernel 3.11, but it was still showing. I don't know the origin, nor whether bindfs/NFS are related.

Alphe Salas
I.T. engineer

On 11/22/13 10:15, Alphe Salas Michels wrote:
Hello Yan,

Good guess! Thank you for your advice. This morning I updated my cephfs-proxy (Ubuntu 13.10) to the recommended 3.12 kernel, and after the first preliminary tests the issue is not showing anymore.

Regards,
Alphé Salas
Ingeniero T.I

On 11/21/13 23:06, Yan, Zheng wrote:
On Fri, Nov 22, 2013 at 9:19 AM, Alphe Salas Michels <asa...@kepler.cl> wrote:

Hello all!

I experience a strange issue since the last update to Ubuntu 13.10 (saucy) and ceph emperor 0.72.1, kernel version 3.11.0-13-generic #20-Ubuntu; the ceph packages installed are the ones for RARING.

When I mount my ceph cluster using cephfs and I upload tons of data, or do a directory listing (find . -printf "%d %k"), or do a chown -R user:user *, at some point the filesystem disappears! I don't know how to solve this issue. There is no entry in any log; the only thing that seems to be affected is ceph-watch-notice, which gets stuck and prevents the unmount (I have to kill -9 that process to umount/mount the ceph cluster on the client proxy and start over). In the chown, if I add --changes to slow it down just enough, the problem seems to disappear.

sounds like the d_prune_aliases() bug. please try updating to the 3.12 kernel or using ceph-fuse

Yan, Zheng

Any suggestions are welcome.

Atte,
--
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl
Re: [ceph-users] CephFS filesystem disapear!
Hello Yan,

Good guess! Thank you for your advice. This morning I updated my cephfs-proxy (Ubuntu 13.10) to the recommended 3.12 kernel, and after the first preliminary tests the issue is not showing anymore.

Regards,
Alphé Salas
Ingeniero T.I

On 11/21/13 23:06, Yan, Zheng wrote:
On Fri, Nov 22, 2013 at 9:19 AM, Alphe Salas Michels <asa...@kepler.cl> wrote:

Hello all!

I experience a strange issue since the last update to Ubuntu 13.10 (saucy) and ceph emperor 0.72.1, kernel version 3.11.0-13-generic #20-Ubuntu; the ceph packages installed are the ones for RARING.

When I mount my ceph cluster using cephfs and I upload tons of data, or do a directory listing (find . -printf "%d %k"), or do a chown -R user:user *, at some point the filesystem disappears! I don't know how to solve this issue. There is no entry in any log; the only thing that seems to be affected is ceph-watch-notice, which gets stuck and prevents the unmount (I have to kill -9 that process to umount/mount the ceph cluster on the client proxy and start over). In the chown, if I add --changes to slow it down just enough, the problem seems to disappear.

sounds like the d_prune_aliases() bug. please try updating to the 3.12 kernel or using ceph-fuse

Yan, Zheng

Any suggestions are welcome.

Atte,
--
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl
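For anyone wanting to try the ceph-fuse alternative Yan mentions instead of the kernel client, the mount looks roughly like this; the monitor address and mount point are placeholders:

    # mount CephFS through FUSE instead of the kernel client
    ceph-fuse -m 10.0.0.1:6789 /mnt/cephfs

    # unmount it again
    fusermount -u /mnt/cephfs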
[ceph-users] CephFS filesystem disapear!
Hello all!

I experience a strange issue since the last update to Ubuntu 13.10 (saucy) and ceph emperor 0.72.1, kernel version 3.11.0-13-generic #20-Ubuntu; the ceph packages installed are the ones for RARING.

When I mount my ceph cluster using cephfs and I upload tons of data, or do a directory listing (find . -printf "%d %k"), or do a chown -R user:user *, at some point the filesystem disappears! I don't know how to solve this issue. There is no entry in any log; the only thing that seems to be affected is ceph-watch-notice, which gets stuck and prevents the unmount (I have to kill -9 that process to umount/mount the ceph cluster on the client proxy and start over). In the chown, if I add --changes to slow it down just enough, the problem seems to disappear.

Any suggestions are welcome.

Atte,
--
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl
Re: [ceph-users] [ceph-deploy] problem creating mds after a full cluster wipe
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl

On 09/04/13 23:56, Sage Weil wrote:
On Wed, 4 Sep 2013, Alphe Salas Michels wrote:

Hi again,

As I was doomed to fully wipe my cluster once again, I upgraded to ceph-deploy 1.2.3. Everything went smoothly along my ceph-deploy process until I created the mds: ceph-deploy mds create myhost first provoked a

File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 645, in __handle raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/bootstrap-mds'

Doing a mkdir -p /var/lib/ceph/bootstrap-mds solved that one. Then I got a:

pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/mds/ceph-mds01'

Doing a mkdir -p /var/lib/ceph/mds/ceph-mds01 solved that one too.

What distro was this? And what version of ceph did you install?

Thanks!
sage

Sorry Sage and all for the late reply, I missed your comments.

distro: ubuntu 13.04, kept about as up to date as it can be
ceph: 0.67.2-1 raring
ceph-deploy: 1.2.3

After that everything was running nicely...

health HEALTH_OK
etc ../..
mdsmap e4: 1/1/1 up {0=mds01=up:active}

Hope that can help.

--
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl
[ceph-users] [ceph-deploy] problem creating mds after a full cluster wipe
Hi again,

As I was doomed to fully wipe my cluster once again, I upgraded to ceph-deploy 1.2.3. Everything went smoothly along my ceph-deploy process until I created the mds: ceph-deploy mds create myhost first provoked a

File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 645, in __handle raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/bootstrap-mds'

Doing a mkdir -p /var/lib/ceph/bootstrap-mds solved that one. Then I got a:

pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/mds/ceph-mds01'

Doing a mkdir -p /var/lib/ceph/mds/ceph-mds01 solved that one too.

After that everything was running nicely...

health HEALTH_OK
etc ../..
mdsmap e4: 1/1/1 up {0=mds01=up:active}

Hope that can help.

--
Alphé Salas
Ingeniero T.I
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl
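In short, the workaround until ceph-deploy creates these directories itself; the hostname and the ceph-mds01 directory name are simply the ones from my setup:

    # create the directories ceph-deploy expects but does not create
    mkdir -p /var/lib/ceph/bootstrap-mds
    mkdir -p /var/lib/ceph/mds/ceph-mds01

    # then re-run the mds creation
    ceph-deploy mds create myhost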