Re: [ceph-users] rebalancing taking very long time

2015-09-08 Thread Alphe Salas
I can say exactly the same: I have been using ceph since 0.38 and I have never 
seen OSDs as laggy as with 0.94. The rebalancing/rebuild algorithm in 0.94 is 
crap, seriously. I have 2 OSDs serving 2 discs of 2 TB with 4 GB of RAM, and 
each OSD takes 1.6 GB! Seriously! That snowballs into an avalanche.


Let me be straight and explain what changed.

In 0.38 you could ALWAYS stop the ceph cluster and then start it back up; it 
would check whether everyone was back and, if there were enough replicas, 
start rebuilding/rebalancing what was needed. Of course it took about 10 
minutes to bring the ceph cluster up, but then the rebuilding/rebalancing 
process was smooth.
With 0.94, first you have 2 OSDs too full at 95% and 4 OSDs at 63%, out of 20 
OSDs. Then a disc crashes, so ceph automatically starts to rebuild and 
rebalance, and the OSDs start to lag and then crash. You stop the ceph 
cluster, change the drive, restart the cluster, stop all rebuild activity by 
setting the nobackfill, norecover, noscrub and nodeep-scrub flags, rm the old 
OSD, create a new one, wait for all OSDs to be in and up, and then the 
rebuilding/rebalancing (and the lag) starts again; since it is automated there 
is not much choice there.
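
Freezing the recovery like that boils down to a handful of cluster flags, 
roughly (on 0.94):

ceph osd set nobackfill
ceph osd set norecover
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... swap the disc, remove/recreate the osd, wait for everything to be up and in ...
ceph osd unset nobackfill
ceph osd unset norecover
ceph osd unset noscrub
ceph osd unset nodeep-scrub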


And again all OSDs are stuck in an endless lag/down/recovery cycle...

It is seriously a pain. Five days after changing the faulty disc, the cluster 
is still locked in the lag/down/recovery cycle.


Sure, it can be argued that my machines are really resource-limited and that 
I should buy servers worth at least three thousand dollars. But up to 0.72 the 
rebalancing/rebuilding process worked smoothly on the same hardware.


It seems to me that the rebalancing/rebuilding algorithm is stricter now than 
it was in the past. Previously, only what really, really needed to be rebuilt 
or rebalanced was rebuilt or rebalanced.


I can still delete everything and go back to 0.72... or buy a Cray T-90 so I 
never have problems again and ceph runs smoothly. But that will not help make 
ceph a better product.


For me, ceph 0.94 is like Windows Vista...

Alphe Salas
I.T. engineer

On 09/08/2015 10:20 AM, Gregory Farnum wrote:

On Wed, Sep 2, 2015 at 9:34 PM, Bob Ababurko  wrote:

When I lose a disk OR replace a OSD in my POC ceph cluster, it takes a very
long time to rebalance.  I should note that my cluster is slightly unique in
that I am using cephfs (shouldn't matter?) and it currently contains about
310 million objects.

The last time I replaced a disk/OSD was 2.5 days ago and it is still
rebalancing.  This is on a cluster with no client load.

The configuration is 5 hosts with 6 x 1TB 7200rpm SATA OSDs & 1 850 Pro
SSD which contains the journals for said OSDs. That means 30 OSDs in
total. The system disk is on its own disk. I'm also using a backend network
with a single Gb NIC. The rebalancing rate (objects/s) seems to be very slow
when it is close to finishing... say <1% objects misplaced.

It doesn't seem right that it would take 2+ days to rebalance a 1TB disk
with no load on the cluster.  Are my expectations off?


Possibly...Ceph basically needs to treat each object as a single IO.
If you're recovering from a failed disk then you've got to replicate
roughly 310 million * 3 / 30 = 31 million objects. If it's perfectly
balanced across 30 disks that get 80 IOPS that's 12916 seconds (~3.5
hours) worth of work just to read each file — and in reality it's
likely to take more than one IO to read the file, and then you have to
spend a bunch to write it as well.
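
Spelled out with the numbers above, the back-of-the-envelope estimate is 
simply (assuming one IO per object, which is optimistic):

# 310M objects * 3 replicas / 30 disks = objects to move; 30 disks * 80 IOPS = aggregate read rate
echo $(( 310000000 * 3 / 30 / (30 * 80) ))   # ~12916 seconds, i.e. roughly 3.5 hours of pure reads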



I'm not sure whether my pg_num/pgp_num needs to be changed, or whether the
rebalance time depends on the number of objects in the pool. These are
thoughts I've had but am not certain are relevant here.


Rebalance time is dependent on the number of objects in the pool. You
*might* see an improvement by increasing "osd max push objects" from
its default of 10...or you might not. That many small files isn't
something I've explored.
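
If you want to try it, the option can be injected at runtime or set in
ceph.conf; the value of 20 below is just an example, not a recommendation:

ceph tell osd.\* injectargs '--osd_max_push_objects 20'
# or persistently, on the OSD hosts:
# [osd]
#     osd max push objects = 20
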
-Greg



$ sudo ceph -v
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)

$ sudo ceph -s
 cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
  health HEALTH_WARN
 5 pgs backfilling
 5 pgs stuck unclean
 recovery 3046506/676638611 objects misplaced (0.450%)
  monmap e1: 3 mons at
{cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
 election epoch 20, quorum 0,1,2 cephmon01,cephmon02,cephmon03
  mdsmap e6070: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
  osdmap e4395: 30 osds: 30 up, 30 in; 5 remapped pgs
   pgmap v3100039: 2112 pgs, 3 pools, 6454 GB data, 321 Mobjects
 18319 GB used, 9612 GB / 27931 GB avail
 3046506/676638611 objects misplaced (0.450%)
 2095 active+clean
   12 active+clean+scrubbing+deep
5 active+remapped+backfilling
recovery io 2294 kB/s, 147 objects/s

$ sudo rados df
pool name

[ceph-users] installing ceph giant on ubuntu 15.04

2015-05-05 Thread Alphe Salas

Hello everyone,
I recently had to install ceph giant on ubuntu 15.04 and had to solve 
some problems, so here is the best way to do it.



1) Replace systemd with upstart on your fresh Ubuntu 15.04 install:
apt-get update
apt-get install upstart
apt-get install upstart-sysv (removes systemd and replaces it with the whole 
upstart stack)


2) Install dpkg-dev:
apt-get install dpkg-dev

3) Download the ceph 0.94.1 source package
4) Untar the source package
5) From the source directory, run the script that downloads all the 
necessary build dependencies: ./install-deps.sh

6) Make sure all build dependencies are satisfied:
dpkg-checkbuilddeps
7) Compile the sources and create the .deb packages:
dpkg-buildpackage

Once you have all the .deb files covering every component of ceph, you can 
deploy them on all the nodes of your ceph cluster.

Install the needed packages with dpkg --install ceph-0.94.1.deb. Once your 
packages are installed, pull in all their dependencies by simply running:

apt-get install -f

And you will be ready to use ceph-deploy to deploy your nice ceph cluster.
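
If you have more than a couple of nodes, a small loop saves some typing (the 
hostnames below are just placeholders):

for host in ceph-node1 ceph-node2 ceph-node3; do
    scp ../*.deb ${host}:/tmp/     # dpkg-buildpackage leaves the .deb files in the parent directory
    ssh ${host} 'sudo dpkg -i /tmp/*.deb; sudo apt-get install -f -y'
done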

Regards
--
Alphe Salas
I.T. engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] the state of cephfs in giant

2014-10-15 Thread Alphe Salas
For the humble ceph user that I am, it is really hard to follow which version 
of which product will get the changes I require.


Let me explain myself. I use ceph in my company, which specialises in disk 
recovery; we need a flexible, easy-to-maintain, trustworthy way to store the 
data from our clients' disks.


We tried the usual way: JBOD boxes connected to a single server with a SAS 
RAID card, and ZFS mirrors to handle replicas and merge the disks into one 
big disk. The result is really slow. (We used to use ZFS and Solaris 11 on 
x86 servers; with OpenZFS and Ubuntu 14.04 the performance is way better, but 
nowhere near comparable with ceph: on a gigabit Ethernet LAN you get data 
transfer between client and ceph cluster of around 80 MB/s, while client to 
OpenZFS/Ubuntu is around 25 MB/s.)


Along my path with ceph I first used cephfs, which worked fine! Until I 
noticed that parts of the folder tree suddenly, randomly disappeared, forcing 
constant, periodic remounts of the partitions.


Then I chose to forget about cephfs and use rbd images, which worked fine!
Until I noticed that rbd replicas were never freed or overwritten. With 
replicas set to 2 (data and 1 replica) and an image of 13 TB, after some time 
of write/erase cycles on the same rbd image I reached an overall data usage 
of 34 TB out of the 36 TB available on my cluster, and I realised there was a 
real problem with "space management". The data part of the rbd image was 
properly managed, with new writes overwriting old deleted data at the OS 
level, so the only logical explanation for the growth in overall data usage 
was that the replicas were never freed.


All along that time I was following ceph's bugs, features and advances.
But those issues are not really in ceph itself; they live in the kernel 
modules that act as "ceph clients", so feature additions and bug fixes are 
partly delivered in the ceph-common package (for the server-side mechanics) 
and partly at the kernel level.


For convenience I use Ubuntu, which is not really top-notch at shipping the 
very latest kernel and all the bug-fixed modules.


So when I see this great news about giant, and the fact that a lot of work 
has been done to solve most of the problems we all faced with ceph, I also 
notice that it will be around a year or so before those fixes are 
production-available in Ubuntu. There is an inertia there that does not match 
the pace of the work on ceph.


Then people can argue with me: "why do you use Ubuntu?"
The answers are simple: I have a cluster of 10 machines and 1 proxy. If I 
need to compile the latest brew of ceph and the latest brew of the kernel 
from source, my maintenance time gets much bigger, and I am more likely to 
end up with something that isn't properly done and a machine that doesn't 
reboot.
I know what I am talking about: for several months I ran ceph on Arch Linux, 
compiling the kernel and ceph from source, until the gcc installed on my test 
server was too new, a compile option had been removed, and ceph wasn't 
compiling anymore. That way of proceeding was discarded because it was not 
stable enough to deliver production-level quality.


So, as far as I understand things, I will get the cephfs enhancements and the 
rbd discard ability at the same time, by using the combination of ceph giant 
and Linux kernel 3.18 and up?


Regards, and thank you again for your hard work; I wish I could do more to 
help.



---
Alphe Salas
I.T. engineer

On 10/15/2014 11:58 AM, Sage Weil wrote:

On Wed, 15 Oct 2014, Amon Ott wrote:

Am 15.10.2014 14:11, schrieb Ric Wheeler:

On 10/15/2014 08:43 AM, Amon Ott wrote:

Am 14.10.2014 16:23, schrieb Sage Weil:

On Tue, 14 Oct 2014, Amon Ott wrote:

Am 13.10.2014 20:16, schrieb Sage Weil:

We've been doing a lot of work on CephFS over the past few months.
This
is an update on the current state of things as of Giant.

...

* Either the kernel client (kernel 3.17 or later) or userspace
(ceph-fuse
or libcephfs) clients are in good working order.

Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.

For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?

There are important bug fixes missing from 3.14.  IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they
landed in
3.15), and that is not appropriate for a stable series.

They can be backported, but no commitment yet on that :)

If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.

IMHO, it would be good for

Re: [ceph-users] the state of cephfs in giant

2014-10-14 Thread Alphe Salas
Hello Sage, the last time I used CephFS it had a strange behaviour: when used 
in conjunction with an NFS re-export of the cephfs mount point, I experienced 
a partial, random disappearance of the folder tree.


According to people on the mailing list it was a kernel module bug (we were 
not using ceph-fuse). Do you know if any work has been done recently on that 
topic?


best regards

Alphe Salas
I.T. engineer

On 10/14/2014 11:23 AM, Sage Weil wrote:

On Tue, 14 Oct 2014, Amon Ott wrote:

Am 13.10.2014 20:16, schrieb Sage Weil:

We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.

...

* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
   or libcephfs) clients are in good working order.


Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.

For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?


There are important bug fixes missing from 3.14.  IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.

They can be backported, but no commitment yet on that :)

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Taking down one OSD node (10 OSDs) for maintenance - best practice?

2014-06-19 Thread Alphe Salas Michels
Hello, the best practice is to simply shut down the whole cluster, starting 
with the clients, then the monitors, the mds and the osds. You do your 
maintenance, then you bring everything back, starting with the monitors, mds 
and osds, and finally the clients.


Otherwise, the missing osds will trigger a reconstruction of your cluster 
that will not stop with the return of the "faulty" osd(s). If you turn off 
everything related to the ceph cluster, the outage is transparent to the 
monitors, and they will not have to deal with partial reconstruction, cleanup 
and re-scrubbing of the returned OSD(s).
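
With the upstart packages that looks roughly like this on each host, in the 
order described above (adjust to your init system; with sysvinit it is 
"service ceph stop" / "service ceph start"):

# clients first: unmount cephfs / unmap rbd images
# then on every ceph node:
sudo stop ceph-all
# ... do the maintenance ...
# bring it back: monitors first, then mds, then osds, then remount the clients
sudo start ceph-all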


best regards.

Alphe Salas
I.T. engineer.


On 06/13/2014 04:56 AM, David wrote:

Hi,

We’re going to take down one OSD node for maintenance (add cpu + ram) which 
might take 10-20 minutes.
What’s the best practice here in a production cluster running dumpling 
0.67.7-1~bpo70+1?

Kind Regards,
David Majchrzak



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] umount gets stuck when umounting a cloned rbd image

2014-06-11 Thread Alphe Salas
Hello, I am writing about an issue I noticed with ceph 0.72.2 on Ubuntu 
13.10, and again with 0.80.1 on Ubuntu 14.04.

Here is what I do:
1) I create and format to ext4 or xfs an rbd image of 4 TB. The image has 
--order 25 and --image-format 2.

2) I create a snapshot of that rbd image.
3) I protect that snapshot.
4) I create a clone image of that initial rbd image, using the protected 
snapshot as reference.
5) I add the line to /etc/ceph/rbdmap, map the new image, and mount it on my 
ceph client server.


Up to here everything is fine, cool and dandy.
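
For reference, steps 1) to 5) above are roughly the following (pool and image 
names are just examples):

rbd create --image-format 2 --order 25 --size 4194304 rbd/base    # 4 TB, 32 MB objects
rbd snap create rbd/base@snap1
rbd snap protect rbd/base@snap1
rbd clone rbd/base@snap1 rbd/clone1
# then add the entry to /etc/ceph/rbdmap, rbd map rbd/clone1, and mount it on the client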

6) I umount /dev/rbd1, which is the previously mounted rbd clone image, and 
umount gets stuck.


On the client server, with the umount stuck, I have this message in 
/var/log/syslog:


Jun 11 12:26:10 tesla kernel: [63365.178657] libceph: osd8 
20.10.10.105:6803 socket error on read


Since the problem seems to be somehow related to osd8 on my 20.10.10.105 ceph 
node, I went there to get more information from the logs.


In /var/log/ceph-osd.8.log this message keeps coming in endlessly:

2014-06-11 12:31:51.692031 7fa26085c700  0 -- 20.10.10.105:6805/23321 >> 
20.10.10.12:0/2563935849 pipe(0x9dd6780 sd=231 :6805 s=0 pgs=0 cs=0 l=0 
c=0x7ed6840).accept peer addr is really 20.10.10.12:0/2563935849 (socket 
is 20.10.10.12:33056/0)
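
For what it is worth, these are easy things to check on both sides (admin 
socket path assumed to be the default):

# on the osd node: see whether osd.8 is sitting on slow/blocked requests
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok dump_ops_in_flight
# on the client: see what is still mapped
sudo rbd showmapped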




Can anyone help me solve this issue?

--
Alphe Salas
I.T. engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] apple support automated mail ? WTF!?

2014-04-28 Thread Alphe Salas Michels

Hello,
each time I send a mail to the ceph-users mailing list I receive an email 
from Apple support?!

Is that a joke?


Alphe Salas

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd always expanding data space problem.

2014-04-28 Thread Alphe Salas Michels

Hello all,
recently I came to the conclusion that out of 40 TB of physical space I could 
use only 16 TB before seeing PGs get stuck because an OSD was too full.

The data space used seems to be forever growing.

Using ceph osd reweight-by-utilization 103 seems at first to rebalance the PG 
usage across OSDs, and the problem is solved for a while. But then it appears 
again with more PGs stuck too full, and it keeps growing. Sure, the solution 
should be to add more disk space, but for that to be significant and actually 
solve the problem it would have to be at least 25%, which means growing the 
ceph cluster by 10 TB (5 disks of 2 TB or 3 disks of 4 TB). That has a cost, 
and the problem would only be solved for a moment, until the replicas that 
are never freed fill up the added space again.



In the end I can really only count on using an rbd image of 16 TB out of 
37 TB of global ceph cluster disk, which means I can really use about 40%, 
and over time that ratio drops constantly.


So what is required is that the replicas and data can be overwritten, so that 
the hidden data does not keep growing, or that I can clean them up when I 
need to.


Alphe Salas.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd Problem with disk full

2014-04-28 Thread Alphe Salas

Hello,
I need the rbd kernel module to really delete data on the OSD disks. 
Ever-growing "hidden data" is not a great solution.


So first of all we should at least be able to manually strip out the "hidden" 
data, i.e. the replicas.


I use an rbd image, let's say 10 TB, on an overall available space of 25 TB. 
What real-world experience shows me is that if I write 8 TB of my 10 TB in a 
row, the overall used data is around 18 TB. Then I delete 4 TB from the rbd 
image and write another 4 TB, and the overall data grows by 4 TB: of course 
the PGs used by the rbd image are reused and overwritten, but the 
corresponding replicas are not.

In the end, after round 2 of writing, the overall used space is 22 TB, and at 
that moment I get stuff like this:

2034 active+clean
   7 active+remapped+wait_backfill+backfill_toofull
   7 active+remapped+backfilling

I tried to use ceph osd reweight-by-utilization but that didn't solve the 
problem. And even if it did, it would only be momentary, because after 
deleting another 4 TB and writing 4 TB I will reach the full ratio and get my 
OSDs stuck, until I spend 12,000 dollars to grow my ceph cluster. Because 
when you operate a 40 TB ceph cluster, adding 4 TB isn't much of a difference.

In the end, for 40 TB of real space (20 disks of 2 TB), after initial 
formatting I get a 37 TB cluster of available space. Then I create an 18 TB 
rbd image, and I can't use much more than 16 TB before my OSDs start showing 
stuck PGs.

In the end, 37 TB for 16 TB of usable disk space (and that only for a while) 
is not a great solution at all, because I lose 60% of my data storage.


On how to delete the data: I really don't know. The "easiest" way I can see 
is at least to be able to manually tell the rbd kernel module to clean 
"released" data from the OSDs when we see fit, at "maintenance time", if 
doing it automatically has too bad an impact on overall performance. I would 
be glad to be able to pick an appropriate moment to force a cleaning task; 
that would be better than nothing and better than an ever-growing "hidden" 
data situation.
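
What I am really asking for is the equivalent of a manual trim pass that I 
could run at a moment of my choosing, something along these lines, assuming 
the rbd client honours discard (which, as far as I know, the kernel module 
does not do yet):

sudo fstrim -v /mnt/rbd-image        # tell the block device which blocks the filesystem has freed
# or, accepting some overhead, mount with online discard:
sudo mount -o discard /dev/rbd1 /mnt/rbd-image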


Regards,

--
Alphe Salas
I.T. engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] packages for Trusty

2014-04-28 Thread Alphe Salas Michels

hello all,
To begin with, there is no Emperor package for saucy. Emperor for saucy is 
only rolled out through git, and in my experience having the test builds 
rolling in constantly can break a ceph cluster.


I don't know why there is a gap in the ceph.com/download section. But the 
fact that Inktank considers the stable production version of ceph to be 
dumpling should explain much of it (that is what they sell). Why care about 
today's Ubuntu and today's "stable" when the real product being sold is last 
year's ceph, which works great on last year's Ubuntu?


Alphe Salas.

On 04/25/2014 06:03 PM, Craig Lewis wrote:
Using the Emperor builds for Precise seems to work on Trusty.  I just 
put a hold on all of the ceph, rados, and apache packages before the 
release upgrade.
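
For anyone wanting to do the same, the hold is roughly this (package list 
from memory; adjust to what dpkg -l shows on your box):

sudo apt-mark hold ceph ceph-common librados2 librbd1 radosgw libapache2-mod-fastcgi
# do the release upgrade, then release the holds afterwards:
sudo apt-mark unhold ceph ceph-common librados2 librbd1 radosgw libapache2-mod-fastcgi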


It makes me nervous though.  I haven't stressed it much, and I don't 
really want to roll it out to production.


I would like to see Emperor builds for Trusty, so I can get started 
rolling out Trusty independently of Firefly.  Changing one thing at a 
time is invaluable when bad things start happening.





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/>  | Twitter 
<http://www.twitter.com/centraldesktop>  | Facebook 
<http://www.facebook.com/CentralDesktop>  | LinkedIn 
<http://www.linkedin.com/groups?gid=147417>  | Blog 
<http://cdblog.centraldesktop.com/>


On 4/25/14 12:10 , Sebastien wrote:


Well as far as I know trusty has 0.79 and will get firefly as soon as 
it's ready so I'm not sure if it's that urgent. Precise repo should 
work fine.


My 2 cents


Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine 75008 Paris
Web : www.enovance.com - Twitter : @enovance


On Fri, Apr 25, 2014 at 9:05 PM, Travis Rhoden <trho...@gmail.com> wrote:


Are there packages for Trusty being built yet?

I don't see it listed at http://ceph.com/debian-emperor/dists/

Thanks,

- Travis



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Firefly distribution

2014-03-21 Thread Alphe Salas Michels

hello,
my ask is will we get a ceph.com/debian-firefly/ directory with pakages 
for ubuntu 13.10 and 14.04  ?
or will we have raring as last supported ubuntu distro like for emperor. 
And the saucy / trusty  through github ?


regards


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS filesystem disappears!

2013-11-26 Thread Alphe Salas

Hello all,
bad news: the problem showed up again today! I had a bindfs mount and an nfs 
server running on the same "proxy" server, and when I ran a massive chmod on 
a directory with a large amount of data, part of the directories disappeared!


The problem was not showing up as fast as it did with kernel 3.11, but it was 
still showing up.

I don't know the origin, or whether bindfs / nfs are related.


Alphe Salas
I.T. engineer

On 11/22/13 10:15, Alphe Salas Michels wrote:

Hello Yan,
Good guess! Thank you for your advice. This morning I updated my cephfs-proxy 
Ubuntu 13.10 box to the recommended 3.12 kernel, and after the first 
preliminary tests the issue isn't showing up anymore.


Regards,

Alphé Salas
I.T. Engineer
www.kepler.cl

On 11/21/13 23:06, Yan, Zheng wrote:

On Fri, Nov 22, 2013 at 9:19 AM, Alphe Salas Michels <asa...@kepler.cl> wrote:

Hello all!

I experience a strange issue since last update to ubuntu 13.10
(saucy) and ceph emperor 0.72.1

kernel version  3.11.0-13-generic #20-Ubuntu

ceph packages installed are the ones for RARING

when I mount my ceph cluster using cephfs and I upload tons of
data, or do a directory listing (find . -printf "%d %k"), or do
a chown -R user:user *, at some point the filesystem disappears!

I don't know how to solve this issue; there is no entry in any log.
The only thing that seems to be affected is ceph-watch-notice, which
gets stuck and prevents the unmount (I have to kill -9 that process to
umount / mount the ceph cluster on the client proxy to start over).
In the chown, if I add --changes to slow it down just enough, the
problem seems to disappear.


sounds like the d_prune_aliases() bug. Please try updating to a 3.12 kernel
or using ceph-fuse.

Yan, Zheng

Any suggestions are welcome
Atte,

--
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS filesystem disappears!

2013-11-22 Thread Alphe Salas Michels

Hello Yan,
Good guess! Thank you for your advice. This morning I updated my cephfs-proxy 
Ubuntu 13.10 box to the recommended 3.12 kernel, and after the first 
preliminary tests the issue isn't showing up anymore.


Regards,

Alphé Salas
I.T. Engineer
www.kepler.cl

On 11/21/13 23:06, Yan, Zheng wrote:
On Fri, Nov 22, 2013 at 9:19 AM, Alphe Salas Michels <asa...@kepler.cl> wrote:


Hello all!

I experience a strange issue since last update to ubuntu 13.10
(saucy) and ceph emperor 0.72.1

kernel version  3.11.0-13-generic #20-Ubuntu

ceph packages installed are the ones for RARING

when I mount my ceph cluster using cephfs and I upload tons of
data, or do a directory listing (find . -printf "%d %k"), or do
a chown -R user:user *, at some point the filesystem disappears!

I don't know how to solve this issue; there is no entry in any log.
The only thing that seems to be affected is ceph-watch-notice, which
gets stuck and prevents the unmount (I have to kill -9 that process to
umount / mount the ceph cluster on the client proxy to start over).
In the chown, if I add --changes to slow it down just enough, the
problem seems to disappear.


sounds like the d_prune_aliases() bug. Please try updating to a 3.12 kernel
or using ceph-fuse.


Yan, Zheng

Any suggestions are welcome
Atte,

--
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS filesystem disappears!

2013-11-21 Thread Alphe Salas Michels
Hello all!

I have experienced a strange issue since the last update to ubuntu 13.10
(saucy) and ceph emperor 0.72.1.

kernel version: 3.11.0-13-generic #20-Ubuntu

The ceph packages installed are the ones for RARING.

When I mount my ceph cluster using cephfs and I upload tons of
data, or do a directory listing (find . -printf "%d %k"), or do
a chown -R user:user *, at some point the filesystem disappears!

I don't know how to solve this issue; there is no entry in any log.
The only thing that seems to be affected is ceph-watch-notice, which
gets stuck and prevents the unmount (I have to kill -9 that process to
umount / mount the ceph cluster on the client proxy to start over).
In the chown, if I add --changes to slow it down just enough, the
problem seems to disappear.


Any suggestions are welcome.
Regards,

--
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [ceph-deploy] problem creating mds after a full cluster wipe

2013-09-11 Thread Alphe Salas Michels
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl

  On 09/04/13 23:56, Sage Weil wrote:


  On Wed, 4 Sep 2013, Alphe Salas Michels wrote:

  
Hi again,
since I was doomed to fully wipe my cluster once again, I upgraded to
ceph-deploy 1.2.3, and everything went smoothly along my ceph-deploy
process, until I created the mds:

ceph-deploy mds create myhost first provoked a

  File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py",
line 645, in __handle
raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory:
'/var/lib/ceph/bootstrap-mds'

doing a mkdir -p /var/lib/ceph/bootstrap-mds solved that one

then I got a:

pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory:
'/var/lib/ceph/mds/ceph-mds01'

doing a mkdir -p /var/lib/ceph/mds/ceph-mds01 solved that one too

  
  
What distro was this?  And what version of ceph did you install?

Thanks!
sage


Sorry, Sage and all, for the late reply; I missed your comments.
distro: ubuntu 13.04 (main), as up to date as it could be
ceph: 0.67.2-1 raring
ceph-deploy: 1.2.3


  


  


After that all was running nicely ...
  health HEALTH_OK
etc ../..
 mdsmap e4: 1/1/1 up {0=mds01=up:active}

Hope that can help.

--
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl



  
  



  

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [ceph-deploy] problem creating mds after a full cluster wipe

2013-09-04 Thread Alphe Salas Michels
Hi again,
since I was doomed to fully wipe my cluster once again, I upgraded to
ceph-deploy 1.2.3, and everything went smoothly along my ceph-deploy
process, until I created the mds:

ceph-deploy mds create myhost first provoked a

  File
"/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py",
line 645, in __handle
    raise e
pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or
directory: '/var/lib/ceph/bootstrap-mds'

doing a mkdir -p /var/lib/ceph/bootstrap-mds solved that one 

then I got a:

pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or
directory: '/var/lib/ceph/mds/ceph-mds01'

doing a mkdir -p /var/lib/ceph/mds/ceph-mds01 solved that one too


After that all was running nicely ...
  health HEALTH_OK
etc ../..
 mdsmap e4: 1/1/1 up {0=mds01=up:active}
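
In short, the workaround boils down to creating the missing directories by 
hand before re-running ceph-deploy (paths taken from the tracebacks above; 
"mds01" is just my mds host name):

# on the mds host
sudo mkdir -p /var/lib/ceph/bootstrap-mds
sudo mkdir -p /var/lib/ceph/mds/ceph-mds01
# then from the admin node
ceph-deploy mds create mds01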

Hope that can help.

--
Alphé Salas
I.T. Engineer
Kepler Data Recovery
Asturias 97, Las Condes
Santiago - Chile
(56 2) 2362 7504
asa...@kepler.cl
www.kepler.cl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com