Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-03 Thread Jasper Siero
Hello Greg,

I saw that the site hosting the previous log link uses a very short expiration
time, so I uploaded the logs to another one:

http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

Thanks,

Jasper


From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
Farnum [gfar...@redhat.com]
Sent: Thursday, 30 October 2014 1:03
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
 --cluster ceph --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  writing 9484372189~1048576
  writing 9485420765~1048576
  writing 9486469341~1048576
  writing 9487517917~1048576
  writing 9488566493~1048576
  writing 9489615069~1048576
  writing 9490663645~1048576
  writing 9491712221~1048576
  writing 9492760797~1048576
  writing 9493809373~1048576
  writing 9494857949~1048576
  writing 9495906525~1048576
  writing 9496955101~1048576
  writing 9498003677~1048576
  writing 9499052253~1048576
  writing 9500100829~1048576
  writing 9501149405~1048576
  writing 9502197981~1048576
  writing 9503246557~1048576
  writing 9504295133~1048576
  writing 9505343709~1048576
  writing 9506392285~1048576
  writing 9507440861~1048576
  writing 9508489437~1048576
  writing 9509538013~1048576
  writing 9510586589~1048576
  writing 9511635165~1048576
  writing 9512683741~1048576
  writing 9513732317~1048576
  writing 9514780893~1048576
  writing 9515829469~1048576
  writing 9516878045~1048576
  writing 9517926621~1048576
  writing 9518975197~1048576
  writing 9520023773~1048576
  writing 9521072349~1048576
  writing 9522120925~1048576
  writing 9523169501~1048576
  writing 9524218077~1048576
  writing 9525266653~1048576
  writing 9526315229~1048576
  writing 9527363805~1048576
  writing 9528412381~1048576
  writing 9529460957~1048576
  writing 9530509533~1048576
  writing 9531558109~1048576
  writing 9532606685~1048576
  writing 9533655261~1048576
  writing 9534703837~1048576
  writing 9535752413~1048576
  writing 9536800989~1048576
  writing 9537849565~1048576
  writing 9538898141~1048576
  writing 9539946717~1048576
  writing 9540995293~1048576
  writing 9542043869~1048576
  writing 9543092445~1048576
  writing 9544141021~1048576
  writing 9545189597~1048576
  writing 9546238173~1048576
  writing 9547286749~1048576
  writing 9548335325~1048576
  writing 9549383901~1048576
  writing 9550432477~1048576
  writing 9551481053~1048576
  writing 9552529629~1048576
  writing 9553578205~1048576
  writing 9554626781~1048576
  writing 9555675357~1048576
  writing 9556723933~1048576
  writing 9557772509~1048576
  writing 9558821085~1048576
  writing 9559869661~1048576
  writing 9560918237~1048576
  writing 9561966813~1048576
  writing 9563015389~1048576
  writing 9564063965~1048576
  writing 9565112541~1048576
  writing 9566161117~1048576
  writing 9567209693~1048576
  writing 9568258269~1048576
  writing 9569306845~1048576
  writing 9570355421~1048576
  writing 9571403997~1048576
  writing 9572452573~1048576
  writing 9573501149~1048576
  writing 9574549725~1048576
  writing 9575598301~1048576
  writing 9576646877~1048576
  writing 9577695453~1048576
  writing 9578744029~1048576
  writing 9579792605~1048576
  writing 9580841181~1048576
  writing 9581889757~1048576
  writing 9582938333~1048576
  writing 9583986909~1048576
  writing 9585035485~1048576
  writing 9586084061~1048576
  writing 9587132637~1048576
  writing 9588181213~1048576
  writing 9589229789~1048576
  writing 9590278365~1048576
  writing 9591326941~1048576
  writing 9592375517~1048576
  writing 9593424093~1048576
  writing 9594472669~1048576
  writing 9595521245~1048576
  writing 9596569821~1048576
  writing 9597618397~1048576
  writing 9598666973~1048576
  writing 9599715549~1048576
  writing 9600764125~1048576
  writing 9601812701~1048576
  writing 9602861277~1048576
  writing 9603909853~1048576
  writing 9604958429~1048576
  writing 9606007005~1048576
  writing 9607055581~1048576
  writing 9608104157~1048576
  writing 9609152733~1048576
  writing 9610201309~1048576
  writing 9611249885~1048576
  writing 9612298461~1048576
  writing 9613347037~1048576
  writing 9614395613~1048576
  writing 9615444189~1048576
  writing 9616492765~1044159
 done.
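
For reference, the dump/reset/undump cycle being run above maps onto the
pre-Jewel ceph-mds journal flags. A minimal sketch of the whole sequence,
assuming MDS rank 0 and an arbitrary dump file name (the dump step is the
one not shown in the output above):

  # 1) dump the existing journal of rank 0 to a local file
  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --dump-journal 0 journaldump.bin

  # 2) reset the in-RADOS journal so the MDS can restart cleanly
  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --reset-journal 0

  # 3) write the dumped journal back into RADOS
  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --undump-journal 0 journaldump.bin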
 

Re: [ceph-users] ceph version 0.79, rbd flatten report Segmentation fault (core dumped)

2014-11-03 Thread Ilya Dryomov
On Mon, Nov 3, 2014 at 9:31 AM,  duan.xuf...@zte.com.cn wrote:

 root@CONTROLLER-4F:~# rbd -p volumes flatten
 f3e81ea3-1d5b-487a-a55e-53efff604d54_disk
 *** Caught signal (Segmentation fault) **
  in thread 7fe99984f700
  ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
  1: (()+0x22a4f) [0x7fe9a1745a4f]
  2: (()+0x10340) [0x7fe9a00f2340]
 3: (librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned
 long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned
 long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*)+0x24)
 [0x7fe9a125daf4]
 4: (librbd::AioRequest::read_from_parent(std::vector<std::pair<unsigned
 long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned
 long> > >&)+0x85) [0x7fe9a1242745]
  5: (librbd::AioRead::should_complete(int)+0x352) [0x7fe9a1242ca2]
  6: (librbd::rados_req_cb(void*, void*)+0x1b) [0x7fe9a124cd7b]
  7: (librados::C_AioComplete::finish(int)+0x1d) [0x7fe9a04a355d]
  8: (Context::complete(int)+0x9) [0x7fe9a0480579]
  9: (Finisher::finisher_thread_entry()+0x1b8) [0x7fe9a0531758]
  10: (()+0x8182) [0x7fe9a00ea182]
  11: (clone()+0x6d) [0x7fe99f2ce30d]
 2014-11-03 14:21:02.413259 7fe99984f700 -1 *** Caught signal (Segmentation
 fault) **
  in thread 7fe99984f700

  ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
  1: (()+0x22a4f) [0x7fe9a1745a4f]
  2: (()+0x10340) [0x7fe9a00f2340]
 3: (librbd::aio_read(librbd::ImageCtx*, std::vector<std::pair<unsigned
 long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned
 long> > > const&, char*, ceph::buffer::list*, librbd::AioCompletion*)+0x24)
 [0x7fe9a125daf4]
 4: (librbd::AioRequest::read_from_parent(std::vector<std::pair<unsigned
 long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned
 long> > >&)+0x85) [0x7fe9a1242745]
  5: (librbd::AioRead::should_complete(int)+0x352) [0x7fe9a1242ca2]
  6: (librbd::rados_req_cb(void*, void*)+0x1b) [0x7fe9a124cd7b]
  7: (librados::C_AioComplete::finish(int)+0x1d) [0x7fe9a04a355d]
  8: (Context::complete(int)+0x9) [0x7fe9a0480579]
  9: (Finisher::finisher_thread_entry()+0x1b8) [0x7fe9a0531758]
  10: (()+0x8182) [0x7fe9a00ea182]
  11: (clone()+0x6d) [0x7fe99f2ce30d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to
 interpret this.

 --- begin dump of recent events ---
   -113 2014-11-03 14:21:01.948799 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command perfcounters_dump hook 0x7fe9a3349ee0
   -112 2014-11-03 14:21:01.948850 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command 1 hook 0x7fe9a3349ee0
   -111 2014-11-03 14:21:01.948856 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command perf dump hook 0x7fe9a3349ee0
   -110 2014-11-03 14:21:01.948894 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command perfcounters_schema hook 0x7fe9a3349ee0
   -109 2014-11-03 14:21:01.948906 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command 2 hook 0x7fe9a3349ee0
   -108 2014-11-03 14:21:01.948915 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command perf schema hook 0x7fe9a3349ee0
   -107 2014-11-03 14:21:01.948919 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command config show hook 0x7fe9a3349ee0
   -106 2014-11-03 14:21:01.948931 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command config set hook 0x7fe9a3349ee0
   -105 2014-11-03 14:21:01.948936 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command config get hook 0x7fe9a3349ee0
   -104 2014-11-03 14:21:01.948944 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command log flush hook 0x7fe9a3349ee0
   -103 2014-11-03 14:21:01.948947 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command log dump hook 0x7fe9a3349ee0
   -102 2014-11-03 14:21:01.948954 7fe9a170c7c0  5 asok(0x7fe9a33483f0)
 register_command log reopen hook 0x7fe9a3349ee0
   -101 2014-11-03 14:21:01.955080 7fe9a170c7c0 10 monclient(hunting):
 build_initial_monmap
   -100 2014-11-03 14:21:01.955154 7fe9a170c7c0  1 librados: starting msgr
 at :/0
-99 2014-11-03 14:21:01.955169 7fe9a170c7c0  1 librados: starting
 objecter
-98 2014-11-03 14:21:01.955227 7fe9a170c7c0  1 -- :/0 messenger.start
-97 2014-11-03 14:21:01.955271 7fe9a170c7c0  1 librados: setting wanted
 keys
-96 2014-11-03 14:21:01.955279 7fe9a170c7c0  1 librados: calling
 monclient init
-95 2014-11-03 14:21:01.955280 7fe9a170c7c0 10 monclient(hunting): init
-94 2014-11-03 14:21:01.955295 7fe9a170c7c0  5 adding auth protocol:
 cephx
-93 2014-11-03 14:21:01.955304 7fe9a170c7c0 10 monclient(hunting):
 auth_supported 2 method cephx
-92 2014-11-03 14:21:01.955521 7fe9a170c7c0  2 auth: KeyRing::load:
 loaded key file /etc/ceph/ceph.client.admin.keyring
-91 2014-11-03 14:21:01.955627 7fe9a170c7c0 10 monclient(hunting):
 _reopen_session rank -1 name
-90 2014-11-03 14:21:01.955718 7fe9a170c7c0 10 monclient(hunting):
 picked mon.noname-a con 0x7fe9a336b660 addr 192.129.0.230:6789/0
-89 2014-11-03 14:21:01.955769 7fe9a170c7c0 10 monclient(hunting):
 _send_mon_message to mon.noname-a at 

Re: [ceph-users] rhel7 krbd backported module repo ?

2014-11-03 Thread Ilya Dryomov
On Mon, Nov 3, 2014 at 7:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
 Hi,

 I would like to know if a repository is available for rhel7/centos7 with 
 the latest krbd module backported ?


 I know that such a module is available in the ceph enterprise repos, but is it 
 available for non-subscribers ?

Not that I know of.  krbd *fixes* are getting backported to stable
kernels regularly though.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD MTBF

2014-11-03 Thread Emmanuel Lacour
On Mon, Sep 29, 2014 at 10:31:03AM +0200, Emmanuel Lacour wrote:
 
 Dear ceph users,
 
 
 we have been managing ceph clusters for a year now. Our setup is typically
 made of Supermicro servers with OSD SATA drives and journals on SSD.
 
 Those SSDs are all failing one after the other after one year :(
 
 We used Samsung 850 pro (120GB) with two setups (small nodes with 2 SSDs,
 2 HDs in 1U):
 

s/850/840

A quick update on this: those SSDs continue to fail. We replaced each
with an Intel S3700 and are rebuilding nodes with a different partition
table (RAID only for the OS, one journal on each SSD, over-provisioning).

We sent the Samsung SSDs back for warranty; it's very easy, and one week later
we received SSDs with the same S/N and SMART OK, but ... we tried to reuse
two of those and they failed one day later. So sorry, Samsung, but I
definitely do not recommend using the 840 Pro in ceph clusters!
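
Journal SSD wear can at least be watched for, so the replacement happens on
your schedule rather than the drive's. A minimal sketch using smartmontools;
the attribute names are vendor-specific assumptions (Samsung exposes
Wear_Leveling_Count, Intel exposes Media_Wearout_Indicator):

  # full SMART dump for the journal SSD
  smartctl -a /dev/sdb

  # just the wear-related attributes
  smartctl -A /dev/sdb | egrep -i 'wear|lbas_written'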


-- 
Easter-eggs  Spécialiste GNU/Linux
44-46 rue de l'Ouest  -  75014 Paris  -  France -  Métro Gaité
Phone: +33 (0) 1 43 35 00 37-   Fax: +33 (0) 1 43 35 00 76
mailto:elac...@easter-eggs.com  -   http://www.easter-eggs.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] giant release osd down

2014-11-03 Thread Christian Balzer

Hello,

On Mon, 3 Nov 2014 01:01:32 -0500 (EST) Ian Colle wrote:

 Christian,
 
 Why are you not fond of ceph-deploy?
 
In short, this very thread.

Ceph-deploy hides a number of things from the user that are pretty vital
for a working ceph cluster and that are insufficiently documented, or not
documented at all, in the manual-deployment documentation.
Specifically the GPT magic, which isn't documented at all (and no,
dissecting python code or some blurb on GIT is not the same as
documentation on the Ceph homepage), and flag files like sysvinit.
There are numerous cases in this ML where people wound up with OSDs that
didn't start (at least at boot time) due to this omission and the
dependence on ceph-deploy.

That GPT magic also makes things a lot less flexible (can't use a full
device, have to partition it first) and leads to hilarious things like
ceph-deploy preparing an OSD and udev happily starting it up even though
that wasn't requested.

So when people fail to do a manual deploy, the answer tends to be "use
ceph-deploy" (and "go from there" in my particular reply) instead of "Did
you follow the docs in section blah?".

Then there are problems with ceph-deploy itself, like correctly picking up
formatting parameters from the config, but NOT defaulting to the
filesystem type specified there.
And since its role is supposed to be helping people with quick deployment
(and teardown) of test clusters, the lack of remove functionality for
OSDs isn't particularly helpful either.
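
For what it's worth, the manual removal sequence ceph-deploy lacks is short
enough to script by hand. A minimal sketch, assuming the documented
firefly-era CLI, sysvinit, and an example OSD id of 12:

  ceph osd out 12                # stop mapping data to it, wait for rebalance
  service ceph stop osd.12       # stop the daemon on its host
  ceph osd crush remove osd.12   # remove it from the CRUSH map
  ceph auth del osd.12           # delete its cephx key
  ceph osd rm 12                 # finally drop the OSD id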

Christian

 Ian R. Colle
 Global Director
 of Software Engineering
 Red Hat (Inktank is now part of Red Hat!)
 http://www.linkedin.com/in/ircolle
 http://www.twitter.com/ircolle
 Cell: +1.303.601.7713
 Email: ico...@redhat.com
 
 - Original Message -
 From: Christian Balzer ch...@gol.com
 To: ceph-us...@ceph.com
 Cc: Shiv Raj Singh virk.s...@gmail.com
 Sent: Sunday, November 2, 2014 8:37:18 AM
 Subject: Re: [ceph-users] giant release osd down
 
 
 Hello,
 
 On Mon, 3 Nov 2014 00:48:20 +1300 Shiv Raj Singh wrote:
 
  Hi All
  
  I am new to ceph and I have been trying to configure a 3 node ceph
  cluster with 1 monitor and 2 OSD nodes. I have reinstalled and recreated
  the cluster three times and I am stuck against the wall. My monitor is
  working as desired (I guess) but the status of the OSDs is down. I am
  following this link
  http://docs.ceph.com/docs/v0.80.5/install/manual-deployment/ for
  configuring the OSDs. The reason why I am not using ceph-deploy is
  because I want to understand the technology.
  
  can someone please help me understand what I'm doing wrong !! :-) !!
  
 a) You're using OSS. Caveat emptor and so forth.
 In particular you seem to be following documentation for Firefly while
 the 64 PGs below indicate that you're actually installing Giant.
 
 b) Since Firefly Ceph defaults to a replication size of 3, so 2 OSD won't
 do.
 
 c) But wait, you specified a pool size of 2 in your OSD section! Tough
 luck, because since Firefly there is a bug that at the very least
 prevents OSD and RGW parameters from being parsed outside the global
 section (which incidentally is what the documentation you cited
 suggests...)
 
 d) Your OSDs are down, so all of the above is (kinda) pointless.
 
 So without further info (log files, etc) we won't be able to help you
 much.
 
 My suggestion would be to take the above to heart, try with ceph-deploy
 (which I'm not fond of) and if that works try again manually and see
 where it fails.
 
 Regards,
 
 Christian
 
  *Some useful diagnostic information *
  ceph2:~$ ceph osd tree
  # id    weight  type name   up/down reweight
  -1  2   root default
  -3  1   host ceph2
  0   1   osd.0   down0
  -2  1   host ceph3
  1   1   osd.1   down0
  
  ceph health detail
  HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
  pg 0.22 is stuck inactive since forever, current state creating, last
  acting []
  pg 0.21 is stuck inactive since forever, current state creating, last
  acting []
  pg 0.20 is stuck inactive since forever, current state creating, last
  acting []
  
  
  ceph -s
  cluster a04ee359-82f8-44c4-89b5-60811bef3f19
   health HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
   monmap e1: 1 mons at {ceph1=192.168.101.41:6789/0}, election epoch
  1, quorum 0 ceph1
   osdmap e9: 2 osds: 0 up, 0 in
pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
  0 kB used, 0 kB / 0 kB avail
64 creating
  
  
  My configurations are as below:
  
  sudo nano /etc/ceph/ceph.conf
  
  [global]
  
  fsid = a04ee359-82f8-44c4-89b5-60811bef3f19
  mon initial members = ceph1
  mon host = 192.168.101.41
  public network = 192.168.101.0/24
  
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  
  
  
  [osd]
  osd journal size = 1024
  filestore xattr use omap 

Re: [ceph-users] rhel7 krbd backported module repo ?

2014-11-03 Thread Dan van der Ster
There's this one:

http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/

But that hasn't been updated since July.

Cheers, Dan

On Mon Nov 03 2014 at 5:35:23 AM Alexandre DERUMIER aderum...@odiso.com
wrote:

 Hi,

 I would like to know if a repository is available for rhel7/centos7 with
 the latest krbd module backported ?


 I know that such a module is available in the ceph enterprise repos, but is it
 available for non-subscribers ?

 Regards,

 Alexandre
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rhel7 krbd backported module repo ?

2014-11-03 Thread Alexandre DERUMIER
Not that I know of.  krbd *fixes* are getting backported to stable
kernels regularly though.

Thanks. (I was thinking more about new feature support, like the discard 
support coming in 3.18, for example.)

- Original Message - 

From: Ilya Dryomov ilya.dryo...@inktank.com 
To: Alexandre DERUMIER aderum...@odiso.com 
Cc: ceph-users ceph-users@lists.ceph.com 
Sent: Monday, 3 November 2014 10:09:14 
Subject: Re: [ceph-users] rhel7 krbd backported module repo ? 

On Mon, Nov 3, 2014 at 7:35 AM, Alexandre DERUMIER aderum...@odiso.com wrote: 
 Hi, 
 
 I would like to know if a repository is available for rhel7/centos7 with 
 the latest krbd module backported ? 
 
 
 I know that such a module is available in the ceph enterprise repos, but is it 
 available for non-subscribers ? 

Not that I know of. krbd *fixes* are getting backported to stable 
kernels regularly though. 

Thanks, 

Ilya 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rhel7 krbd backported module repo ?

2014-11-03 Thread Alexandre DERUMIER
http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/ 


But that hasn't been updated since July. 

Great ! Thanks! 

(I think it's built from https://github.com/ceph/ceph-client/tree/rhel7 ?)

- Original Message - 

From: Dan van der Ster daniel.vanders...@cern.ch 
To: Alexandre DERUMIER aderum...@odiso.com, ceph-users 
ceph-users@lists.ceph.com 
Sent: Monday, 3 November 2014 10:17:51 
Subject: Re: [ceph-users] rhel7 krbd backported module repo ? 

There's this one: 


http://gitbuilder.ceph.com/kmod-rpm-rhel7beta-x86_64-basic/ref/rhel7/x86_64/ 


But that hasn't been updated since July. 


Cheers, Dan 


On Mon Nov 03 2014 at 5:35:23 AM Alexandre DERUMIER  aderum...@odiso.com  
wrote: 


Hi, 

I would like to know if a repository is available for rhel7/centos7 with the 
latest krbd module backported ? 


I know that such a module is available in the ceph enterprise repos, but is it 
available for non-subscribers ? 

Regards, 

Alexandre 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Hello all,

I upgraded my cluster to Giant. Everything is working well, but on one
mon I get a strange error when I do rados df :

root@a-mon:~# rados df
2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >> 
10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0xe37b20).fault
pool name   category KB  objects   clones
degraded  unfound   rdrd KB   wrwr
KB
data-  0  88057910
0   0   434686   434686  90533620
metadata-  63991517680
0   0  1852535   1746370585 15900570178050318
wimi-files  - 8893618079  99833970
0   0   296284  2747513 18874311   8951883370
wimi-recette-files - 978453   235134
00   0   272389  1321262   498429
1042175
  total used 27056765864 19076090
  total avail78381176704
  total space   105437942568

root@a-mon:~# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)


In the same cluster, on another mon, no problem :

root@c-mon:/etc/ceph# rados df
pool name   category KB  objects   clones
degraded  unfound   rdrd KB   wrwr
KB
data-  0  88056340
0   0   434686   434686  90532050
metadata-  63626517680
0   0  1852535   1746370585 15900450178049886
wimi-files  - 8893618079  99833970
0   0   296284  2747513 18874311   8951883370
wimi-recette-files - 978449   235100
00   0   272352  1321225   498232
1042138
  total used 27056761472 19075899
  total avail78381181096
  total space   105437942568

root@c-mon:/etc/ceph# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Is it a known error ?
I can file a formal bug report if needed. This problem is not important,
but I fear implications outside of rados df.
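
The pipe fault line suggests the client could not reach the mon at
10.94.67.202. A minimal sketch of how to check mon health directly, using
standard commands (the mon id "a-mon" is taken from the prompt above and may
differ from the actual mon name):

  # which mons are currently in quorum, from any client
  ceph quorum_status --format json-pretty

  # ask one mon directly over its local admin socket
  ceph daemon mon.a-mon mon_status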

Regards,
-- 
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Error creating monitors

2014-11-03 Thread Sakhi Hadebe

Can someone please help out. I am stuck.


Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN)Competency Area, 
Meraka, CSIR

Tel:   +27 12 841 2308 
Fax:   +27 12 841 4223 
Cell:  +27 71 331 9622 
Email: shad...@csir.co.za


 Sakhi Hadebe 10/31/2014 1:28 PM 

Hi Support, 



I am attempting to test a ceph storage cluster on 3 nodes. I have installed 
Ubuntu 12.04 LTS on all 3 nodes.  


While attempting to create the monitors for node2 and node3, I am getting the 
error below: 


[ceph-node3][ERROR ] admin_socket: exception getting command descriptions: 
[Errno 2] No such file or directory 


But mon.ceph1 gets created with no errors. What could I be doing wrong? 


These commands are executed on the primary node, node1. 


Please help. 

Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN)Competency Area, 
Meraka, CSIR

Tel:   +27 12 841 2308 
Fax:   +27 12 841 4223 
Cell:  +27 71 331 9622 
Email: shad...@csir.co.za




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Update: this error is linked to a crashed mon. It crashed during the
weekend. I am trying to understand why. I never had a mon crash before Giant.

-- 
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information



On lun., 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
 Hello all,
 
 I upgraded my cluster to Giant. Everything is working well, but on one
 mon I get a strange error when I do rados df :
 
 root@a-mon:~# rados df
 2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >> 
 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0xe37b20).fault
 pool name   category KB  objects   clones
 degraded  unfound   rdrd KB   wrwr
 KB
 data-  0  88057910
 0   0   434686   434686  90533620
 metadata-  63991517680
 0   0  1852535   1746370585 15900570178050318
 wimi-files  - 8893618079  99833970
 0   0   296284  2747513 18874311   8951883370
 wimi-recette-files - 978453   235134
 00   0   272389  1321262   498429
 1042175
   total used 27056765864 19076090
   total avail78381176704
   total space   105437942568
 
 root@a-mon:~# ceph -v
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 
 
 In the same cluster, on another mon, no problem :
 
 root@c-mon:/etc/ceph# rados df
 pool name   category KB  objects   clones
 degraded  unfound   rdrd KB   wrwr
 KB
 data-  0  88056340
 0   0   434686   434686  90532050
 metadata-  63626517680
 0   0  1852535   1746370585 15900450178049886
 wimi-files  - 8893618079  99833970
 0   0   296284  2747513 18874311   8951883370
 wimi-recette-files - 978449   235100
 00   0   272352  1321225   498232
 1042138
   total used 27056761472 19075899
   total avail78381181096
   total space   105437942568
 
 root@c-mon:/etc/ceph# ceph -v
 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 
 Is it a known error ?
 I can file a formal bug report if needed. This problem is not important,
 but I fear implications outside of rados df.
 
 Regards,
 -- 
 Thomas Lemarchand
 Cloud Solutions SAS - Responsable des systèmes d'information
 
 
 
 
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Update : 

/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
[21787] 0 21780   492110   185044 920   240143 0
ceph-mon
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
[13136] 0 1313652172 1753  590 0
ceph
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out
of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262]
Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB,
file-rss:0kB

OOM kill.
I have 1GB memory on my mons, and 1GB swap.
It's the only mon that crashed. Is there a change in memory requirements
from Firefly?

Regards,
-- 
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information



On lun., 2014-11-03 at 11:47 +0100, Thomas Lemarchand wrote:
 Update: this error is linked to a crashed mon. It crashed during the
 weekend. I am trying to understand why. I never had a mon crash before Giant.
 
 -- 
 Thomas Lemarchand
 Cloud Solutions SAS - Responsable des systèmes d'information
 
 
 
 On lun., 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
  Hello all,
  
  I upgraded my cluster to Giant. Everything is working well, but on one
  mon I get a strange error when I do rados df :
  
  root@a-mon:~# rados df
  2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >> 
  10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
  c=0xe37b20).fault
  pool name   category KB  objects   clones
  degraded  unfound   rdrd KB   wrwr
  KB
  data-  0  88057910
  0   0   434686   434686  90533620
  metadata-  63991517680
  0   0  1852535   1746370585 15900570178050318
  wimi-files  - 8893618079  99833970
  0   0   296284  2747513 18874311   8951883370
  wimi-recette-files - 978453   235134
  00   0   272389  1321262   498429
  1042175
total used 27056765864 19076090
total avail78381176704
total space   105437942568
  
  root@a-mon:~# ceph -v
  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
  
  
  In the same cluster, on another mon, no problem :
  
  root@c-mon:/etc/ceph# rados df
  pool name   category KB  objects   clones
  degraded  unfound   rdrd KB   wrwr
  KB
  data-  0  88056340
  0   0   434686   434686  90532050
  metadata-  63626517680
  0   0  1852535   1746370585 15900450178049886
  wimi-files  - 8893618079  99833970
  0   0   296284  2747513 18874311   8951883370
  wimi-recette-files - 978449   235100
  00   0   272352  1321225   498232
  1042138
total used 27056761472 19075899
total avail78381181096
total space   105437942568
  
  root@c-mon:/etc/ceph# ceph -v
  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
  
  Is it a known error ?
  I can file a formal bug report if needed. This problem is not important,
  but I fear implications outside of rados df.
  
  Regards,
  -- 
  Thomas Lemarchand
  Cloud Solutions SAS - Responsable des systèmes d'information
  
  
  
  
  
 
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where to download 0.87 RPMS?

2014-11-03 Thread Tim Serong
On 11/01/2014 05:10 AM, Patrick McGarry wrote:
 As I understand it SUSE does their own builds of things. Just on
 cursory examination it looks like the following repo uses Firefly:
 
 https://susestudio.com/a/HVbCUu/master-ceph

This is Jan Kalcic's ceph appliance, using packages from:

  http://download.opensuse.org/repositories/home:/jkalcic:/ceph/

These are built from
https://build.opensuse.org/project/show/home:jkalcic:ceph (firefly
0.80.1), which is building against SLE 11 SP3.

We've got a slightly newer Firefly (0.80.5) in
https://build.opensuse.org/package/show/filesystems/ceph which is
building for several versions of openSUSE, but SLES is not presently
enabled (I'm not sure why offhand :-/)

IIRC there was some discussion among a few of us about having a specific
subproject (filesystems:ceph) on build.opensuse.org, where we could
offer builds of ceph for various different SUSE Linuxen without
implicitly pulling in the 100-odd non-ceph-related packages from the
filesystems repo.  I'll see about chasing this up.

 
 and there is some Calamari work going in here:
 https://susestudio.com/a/eEqfPk/calamari-opensuse-13-1

This is a Calamari appliance I was experimenting with, using packages from:

  http://download.opensuse.org/repositories/systemsmanagement:/calamari/

These are quite current, and are built from
https://build.opensuse.org/project/show/systemsmanagement:calamari (the
calamari stuff is only building for openSUSE, but salt and diamond here
are building for SLE 11 SP3, so as to allow SLE 11 SP3 ceph clusters to
hook up to a calamari server running on openSUSE).

 My guess is that the master-ceph repo will be updated to Giant once
 they have a chance to get to it, but I'm guessing Tim Serong from SUSE
 could probably shed more light on that if he is available.

Yeah, someone needs to get filesystems/ceph on build.opensuse.org
updated to Giant (or moved to filesystems:ceph/ceph then updated to
Giant), but nobody has had a chance yet.

Regards,

Tim

 
 
 Best Regards,
 
 Patrick McGarry
 Director Ceph Community || Red Hat
 http://ceph.com  ||  http://community.redhat.com
 @scuttlemonkey || @ceph
 
 
 On Fri, Oct 31, 2014 at 1:55 PM, Sanders, Bill
 bill.sand...@teradata.com wrote:
 No SLES RPMs this release or for Firefly.  Is there an issue with building
 for SLES, or is it just no longer targeted?

 Bill
 
 From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Patrick
 McGarry [patr...@inktank.com]
 Sent: Friday, October 31, 2014 4:46 AM
 To: Kenneth Waegeman
 Cc: ceph-users@lists.ceph.com

 Subject: Re: [ceph-users] where to download 0.87 RPMS?

 Might be worth looking at the new download infrastructure. If you always
 want the latest you can try:

 http://download.ceph.com/ceph/latest/

 On Oct 31, 2014 6:17 AM, Kenneth Waegeman kenneth.waege...@ugent.be
 wrote:



 Thanks. It would be nice though to have a repo where all the packages are.
 We lock our packages ourselves, so we would just need to bump the version
 instead of adding a repo for each major version :)


 - Message from Irek Fasikhov malm...@gmail.com -
Date: Thu, 30 Oct 2014 13:37:34 +0400
From: Irek Fasikhov malm...@gmail.com
 Subject: Re: [ceph-users] where to download 0.87 RPMS?
  To: Kenneth Waegeman kenneth.waege...@ugent.be
  Cc: Patrick McGarry patr...@inktank.com, ceph-users
 ceph-users@lists.ceph.com


 Hi.

 Use http://ceph.com/rpm-giant/

 2014-10-30 12:34 GMT+03:00 Kenneth Waegeman kenneth.waege...@ugent.be:

 Hi,

 Will http://ceph.com/rpm/ also be updated to have the giant packages?

 Thanks

 Kenneth




 - Message from Patrick McGarry patr...@inktank.com -
Date: Wed, 29 Oct 2014 22:13:50 -0400
From: Patrick McGarry patr...@inktank.com
 Subject: Re: [ceph-users] where to download 0.87 RPMS?
  To: 廖建锋 de...@f-club.cn
  Cc: ceph-users ceph-users@lists.ceph.com



  I have updated the http://ceph.com/get page to reflect a more generic

 approach to linking.  It's also worth noting that the new
 http://download.ceph.com/ infrastructure is available now.

 To get to the rpms specifically you can either crawl the
 download.ceph.com tree or use the symlink at
 http://ceph.com/rpm-giant/

 Hope that (and the updated linkage on ceph.com/get) helps.  Thanks!


 Best Regards,

 Patrick McGarry
 Director Ceph Community || Red Hat
 http://ceph.com  ||  http://community.redhat.com
 @scuttlemonkey || @ceph


 On Wed, Oct 29, 2014 at 9:15 PM, 廖建锋 de...@f-club.cn wrote:




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

  ___

 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 - End message from Patrick McGarry patr...@inktank.com -

 --

 Met vriendelijke groeten,
 Kenneth Waegeman


 

Re: [ceph-users] giant release osd down

2014-11-03 Thread Sage Weil
On Mon, 3 Nov 2014, Mark Kirkwood wrote:
 On 03/11/14 14:56, Christian Balzer wrote:
  On Sun, 2 Nov 2014 14:07:23 -0800 (PST) Sage Weil wrote:
  
   On Mon, 3 Nov 2014, Christian Balzer wrote:
c) But wait, you specified a pool size of 2 in your OSD section! Tough
luck, because since Firefly there is a bug that at the very least
prevents OSD and RGW parameters from being parsed outside the global
section (which incidentally is what the documentation you cited
suggests...)
   
   It just needs to be in the [mon] or [global] section.
   
  While that is true for the pool default values and even documented
  (not the [mon] bit from a quick glance though) wouldn't you agree that
  having osd* parameters that don't work inside the [osd] section to be at
  the very least very non-intuitive?
  
  Also as per the below thread, clearly something more systemic is going on
  with config parsing:
  https://www.mail-archive.com/ceph-users@lists.ceph.com/msg13859.html

Ah, I missed that thread.  Sounds like three separate bugs:

- pool defaults not used for initial pools
- osd_mkfs_type not respected by ceph-disk
- osd_* settings not working

The last one is a real shock; I would expect all kinds of things to break 
very badly if the [osd] section config behavior was not working.  I take 
it you mean these options:

osd_op_threads = 10
osd_scrub_load_threshold = 2.5

How did you determine that they weren't taking effect?  You can do 'ceph 
daemon osd.NNN config show | grep osd_op_threads' to see the value in the 
running process.
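
A single option can also be queried directly over the admin socket, and
changed at runtime to confirm the daemon reacts. A minimal sketch; osd.3 is
an arbitrary id, and some options only take effect after a restart:

  # read one option from the running daemon
  ceph daemon osd.3 config get osd_op_threads

  # try changing it at runtime
  ceph tell osd.3 injectargs '--osd-op-threads 10'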

If you have a moment, can you also open tickets in the tracker for the 
other two?

Thanks!
sage


  
 +1
 
 I'd like to see clear(er) descriptions (and perhaps enforcement?) of which
 parameters go in which section.
 
 I'm with Christian on this - osd* params that don't work inside the [osd]
 section is just a foot gun for new (ahem - and not so new) users!
 
 regards
 
 Mark
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs survey results

2014-11-03 Thread Sage Weil
In the Ceph session at the OpenStack summit someone asked what the CephFS 
survey results looked like.  Here's the link:

https://www.surveymonkey.com/results/SM-L5JV7WXL/

In short, people want

fsck
multimds
snapshots
quotas

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] giant release osd down

2014-11-03 Thread Christian Balzer
On Mon, 3 Nov 2014 06:02:08 -0800 (PST) Sage Weil wrote:

 On Mon, 3 Nov 2014, Mark Kirkwood wrote:
  On 03/11/14 14:56, Christian Balzer wrote:
   On Sun, 2 Nov 2014 14:07:23 -0800 (PST) Sage Weil wrote:
   
On Mon, 3 Nov 2014, Christian Balzer wrote:
 c) But wait, you specified a pool size of 2 in your OSD section!
 Tough luck, because since Firefly there is a bug that at the
 very least prevents OSD and RGW parameters from being parsed
 outside the global section (which incidentally is what the
 documentation you cited suggests...)

It just needs to be in the [mon] or [global] section.

   While that is true for the pool default values and even documented
   (not the [mon] bit from a quick glance though) wouldn't you agree
   that having osd* parameters that don't work inside the [osd] section
   to be at the very least very non-intuitive?
   
   Also as per the below thread, clearly something more systemic is
   going on with config parsing:
   https://www.mail-archive.com/ceph-users@lists.ceph.com/msg13859.html
 
 Ah, I missed that thread.  Sounds like three separate bugs:
 
 - pool defaults not used for initial pools
Precisely. Not as much of a biggy with Giant, as only RBD gets created by
default and that is easily deleted and re-created.
But counter-intuitive nevertheless. 
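
For anyone following along, a minimal ceph.conf sketch of where the pool
defaults are actually honored (values illustrative):

  [global]
  # parsed here (or under [mon]) -- putting these under [osd] is
  # what silently fails per the bug discussed in this thread
  osd pool default size = 2
  osd pool default min size = 1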

 - osd_mkfs_type not respected by ceph-disk
If that is what ceph-deploy uses (and doesn't overwrite internally), yes.

 - osd_* settings not working

The * I can not be sure of, but for the options below, yes.

Also read the entire thread, this at the very least also affects radosgw
settings.  

 The last one is a real shock; I would expect all kinds of things to
 break very badly if the [osd] section config behavior was not working.
Most of those not working will actually have little, immediately
noticeable impact. 

 I take it you mean these options:
 
 osd_op_threads = 10
 osd_scrub_load_threshold = 2.5
 
 How did you determine that they weren't taking effect?  You can do 'ceph 
 daemon osd.NNN config show | grep osd_op_threads' to see the value in
 teh running process.
 
I did exactly that (back in emperor times) and again now (see the last
mail by me in that thread).

 If you have a moment, can you also open tickets in teh tracker for the 
 other two?

I will, probably not before Wednesday though. I was basically waiting for
somebody from the dev team to pipe up, like in this mail.

It's probably bothersome to monitor a ML for stuff like this, but on the
other hand, if the official stance is "all bug reports to the tracker", then
expect to comb through a lot of brainfarts in there (more than already).

Christian

 
 
   
  +1
  
  I'd like to see clear(er) descriptions (and perhaps enforcement?) of
  which parameters go in which section.
  
  I'm with Christian on this - osd* params that don't work inside the
  [osd] section is just a foot gun for new (ahem - and not so new) users!
  
  regards
  
  Mark
  
  
 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
Hi All,
   I upgraded from emperor to firefly.  The initial upgrade went smoothly and all 
placement groups were active+clean.
  Next I executed
'ceph osd crush tunables optimal'
  to upgrade CRUSH mapping.
  Now I keep having OSDs go down or have requests blocked for long periods of 
time.
  I start back up the down OSDs and recovery eventually stops, but with 100s 
of incomplete and down+incomplete pgs remaining.
  The ceph web page says "If you see this state [incomplete], report a bug, 
and try to start any failed OSDs that may contain the needed information." 
Well, all the OSDs are up, though some have blocked requests.
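
For readers hitting the same state: a minimal sketch of the standard
firefly-era queries that narrow down which PGs and OSDs are involved (the pg
id is an example taken from 'ceph health detail' output):

  ceph health detail            # lists the stuck PGs and blocked requests
  ceph pg dump_stuck inactive   # PGs stuck inactive/incomplete
  ceph pg 0.22 query            # per-PG peering state and which OSDs block it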

Also, the logs of the OSDs which go down have this message:
2014-11-02 21:46:33.615829 7ffcf0421700  0 -- 192.168.164.192:6810/31314 >> 
192.168.164.186:6804/20934 pipe(0x2faa0280 sd=261 :6810 s=2 pgs=9
19 cs=25 l=0 c=0x2ed022c0).fault with nothing to send, going to standby
2014-11-02 21:49:11.440142 7ffce4cf3700  0 -- 192.168.164.192:6810/31314 >> 
192.168.164.186:6804/20934 pipe(0xe512a00 sd=249 :6810 s=0 pgs=0 
cs=0 l=0 c=0x2a308b00).accept connect_seq 26 vs existing 25 state standby
2014-11-02 21:51:20.085676 7ffcf6e3e700 -1 osd/PG.cc: In function 
'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 
7ffcf6e3e700 time 2014-11-02 21:51:20.052242
osd/PG.cc: 5424: FAILED assert(0 == "we got a bad state machine event")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: 
(PG::RecoveryState::Crashed::Crashed(boost::statechart::statePG::RecoveryState::Crashed,
 
PG::RecoveryState::RecoveryMachine, boost::mpl:
:listmpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_:
:na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
(boost::statechart::history_mode)0::my_context)+0x12f) [0x87c6ef]
 2: /usr/bin/ceph-osd() [0x8aeae9]
 3: (boost::statechart::detail::reaction_result 
boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachin
e, PG::RecoveryState::Start, 
(boost::statechart::history_mode)0::local_react_impl_non_empty::local_react_implboost::mpl::list2boost::state
chart::custom_reactionPG::IntervalFlush, 
boost::statechart::transitionboost::statechart::event_base, 
PG::RecoveryState::Crashed, boost::st
atechart::detail::no_contextboost::statechart::event_base, 
boost::statechart::detail::no_contextboost::statechart::event_base::no_functi
on , boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::stat
echart::history_mode)0 
(boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachine, PG::RecoveryState::
Start, (boost::statechart::history_mode)0, boost::statechart::event_base 
const, void const*)+0xbf) [0x8dd3ff]
 4: (boost::statechart::detail::reaction_result 
boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachin
e, PG::RecoveryState::Start, 
(boost::statechart::history_mode)0::local_react_impl_non_empty::local_react_implboost::mpl::list3boost::state
chart::custom_reactionPG::FlushedEvt, 
boost::statechart::custom_reactionPG::IntervalFlush, 
boost::statechart::transitionboost::statechar
t::event_base, PG::RecoveryState::Crashed, 
boost::statechart::detail::no_contextboost::statechart::event_base, 
boost::statechart::detail::
no_contextboost::statechart::event_base::no_function , 
boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::Rec
overyMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0 
(boost::statechart::simple_statePG::RecoveryState::Started, PG:
:RecoveryState::RecoveryMachine, PG::RecoveryState::Start, 
(boost::statechart::history_mode)0, boost::statechart::event_base const, 
void c
onst*)+0x57) [0x8dd4e7]
 5: (boost::statechart::detail::reaction_result 
boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachin
e, PG::RecoveryState::Start, 
(boost::statechart::history_mode)0::local_react_impl_non_empty::local_react_implboost::mpl::list5boost::state
chart::custom_reactionPG::AdvMap, 
boost::statechart::custom_reactionPG::NullEvt, 
boost::statechart::custom_reactionPG::FlushedEvt, boos
t::statechart::custom_reactionPG::IntervalFlush, 
boost::statechart::transitionboost::statechart::event_base, 
PG::RecoveryState::Crashed, b
oost::statechart::detail::no_contextboost::statechart::event_base, 
boost::statechart::detail::no_contextboost::statechart::event_base::n
o_function , boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boo
st::statechart::history_mode)0 
(boost::statechart::simple_statePG::RecoveryState::Started, 
PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, 
(boost::statechart::history_mode)0, boost::statechart::event_base const, 

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
P.S.  The OSDs interacted with some 3.14 krbd clients before I realized that 
kernel version was too old for the firefly CRUSH map.

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?

2014-11-03 Thread Narendra Trivedi (natrived)
Thanks. I think the limit is 100 by default and it can be disabled. As far as 
I understand, there is no object size limit on the radosgw side of things, only 
from the Swift end (i.e. 5GB), right? In short, if someone tries to upload a 1TB 
object onto Swift + RadosGW, it has to be split at the Swift API layer 
using --segment-size of 5GB, but there's no hard limitation imposed by radosgw... 
correct?

--Narendra

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daniel 
Schneller
Sent: Saturday, November 01, 2014 7:15 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Swift + radosgw: How do I find 
accounts/containers/objects limitation?


To remove the max_bucket limit I used

radosgw-admin user modify --uid=username --max-buckets=0

Off the top of my head, I think

radosgw-admin user info --uid=username

will show you the current values without changing anything.

See also this thread I started about this topic a few weeks ago.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12840.html

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 7:46 AM, Chad Seys cws...@physics.wisc.edu wrote:
 Hi All,
I upgraded from emperor to firefly.  Initial upgrade went smoothly and all
 placement groups were active+clean .
   Next I executed
 'ceph osd crush tunables optimal'
   to upgrade CRUSH mapping.

Okay...you know that's a data movement command, right? So you should
expect it to impact operations. (Although not the crashes you're
witnessing.)

   Now I keep having OSDs go down or have requests blocked for long periods of
 time.
   I start back up the down OSDs and recovery eventually stops, but with 100s
 of incomplete and down+incomplete pgs remaining.
   The ceph web page says If you see this state [incomplete], report a bug,
 and try to start any failed OSDs that may contain the needed information.
 Well, all the OSDs are up, though some have blocked requests.

 Also, the logs of the OSDs which go down have this message:
 2014-11-02 21:46:33.615829 7ffcf0421700  0 -- 192.168.164.192:6810/31314 >>
 192.168.164.186:6804/20934 pipe(0x2faa0280 sd=261 :6810 s=2 pgs=9
 19 cs=25 l=0 c=0x2ed022c0).fault with nothing to send, going to standby
 2014-11-02 21:49:11.440142 7ffce4cf3700  0 -- 192.168.164.192:6810/31314 >>
 192.168.164.186:6804/20934 pipe(0xe512a00 sd=249 :6810 s=0 pgs=0
 cs=0 l=0 c=0x2a308b00).accept connect_seq 26 vs existing 25 state standby
 2014-11-02 21:51:20.085676 7ffcf6e3e700 -1 osd/PG.cc: In function
 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread
 7ffcf6e3e700 time 2014-11-02 21:51:20.052242
 osd/PG.cc: 5424: FAILED assert(0 == "we got a bad state machine event")

These failures are usually the result of adjusting tunables without
having upgraded all the machines in the cluster — although they should
also be fixed in v0.80.7. Are you still seeing crashes, or just the PG
state issues?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand
thomas.lemarch...@cloud-solutions.fr wrote:
 Update :

 /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
 [21787] 0 21780   492110   185044 920   240143 0
 ceph-mon
 /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
 [13136] 0 1313652172 1753  590 0
 ceph
 /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out
 of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
 /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262]
 Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB,
 file-rss:0kB

 OOM kill.
 I have 1GB memory on my mons, and 1GB swap.
 It's the only mon that crashed. Is there a change in memory requirement
 from Firefly ?

There generally shouldn't be, but I don't think it's something we
monitored closely.
More likely your monitor was running near its memory limit already and
restarting all the OSDs (and servicing the resulting changes) pushed
it over the edge.
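
A quick way to see how close a mon is running to the OOM edge; a minimal
sketch, plain procps, nothing ceph-specific:

  # resident and virtual size (KB) of every running ceph-mon
  ps -C ceph-mon -o pid,rss,vsz,cmd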
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] giant release osd down

2014-11-03 Thread Shiv Raj Singh
Thanks for the comments guys

I'm going to deploy it from scratch and this time ill capture every price
of debug information. Hopefully this will give me the reasons why ,,

thanks

On Mon, Nov 3, 2014 at 7:01 PM, Ian Colle ico...@redhat.com wrote:

 Christian,

 Why are you not fond of ceph-deploy?

 Ian R. Colle
 Global Director
 of Software Engineering
 Red Hat (Inktank is now part of Red Hat!)
 http://www.linkedin.com/in/ircolle
 http://www.twitter.com/ircolle
 Cell: +1.303.601.7713
 Email: ico...@redhat.com

 - Original Message -
 From: Christian Balzer ch...@gol.com
 To: ceph-us...@ceph.com
 Cc: Shiv Raj Singh virk.s...@gmail.com
 Sent: Sunday, November 2, 2014 8:37:18 AM
 Subject: Re: [ceph-users] giant release osd down


 Hello,

 On Mon, 3 Nov 2014 00:48:20 +1300 Shiv Raj Singh wrote:

  Hi All
 
  I am new to ceph and I have been trying to configure a 3 node ceph cluster
  with 1 monitor and 2 OSD nodes. I have reinstalled and recreated the
  cluster three times and I am stuck against the wall. My monitor is
  working as desired (I guess) but the status of the OSDs is down. I am
  following this link
  http://docs.ceph.com/docs/v0.80.5/install/manual-deployment/ for
  configuring the OSDs. The reason why I am not using ceph-deploy is because
  I want to understand the technology.
 
  can someone please help me understand what I'm doing wrong !! :-) !!
 
 a) You're using OSS. Caveat emptor and so forth.
 In particular you seem to be following documentation for Firefly while the
 64 PGs below indicate that you're actually installing Giant.

 b) Since Firefly Ceph defaults to a replication size of 3, so 2 OSD won't
 do.

 c) But wait, you specified a pool size of 2 in your OSD section! Tough
 luck, because since Firefly there is a bug that at the very least prevents
 OSD and RGW parameters from being parsed outside the global section (which
 incidentally is what the documentation you cited suggests...)

 d) Your OSDs are down, so all of the above is (kinda) pointless.

 So without further info (log files, etc) we won't be able to help you much.

 My suggestion would be to take the above to heart, try with ceph-deploy
 (which I'm not fond of) and if that works try again manually and see where
 it fails.

 Regards,

 Christian

  *Some useful diagnostic information *
  ceph2:~$ ceph osd tree
  # id    weight  type name   up/down reweight
  -1  2   root default
  -3  1   host ceph2
  0   1   osd.0   down0
  -2  1   host ceph3
  1   1   osd.1   down0
 
  ceph health detail
  HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
  pg 0.22 is stuck inactive since forever, current state creating, last
  acting []
  pg 0.21 is stuck inactive since forever, current state creating, last
  acting []
  pg 0.20 is stuck inactive since forever, current state creating, last
  acting []
 
 
  ceph -s
  cluster a04ee359-82f8-44c4-89b5-60811bef3f19
   health HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
   monmap e1: 1 mons at {ceph1=192.168.101.41:6789/0}, election epoch
  1, quorum 0 ceph1
   osdmap e9: 2 osds: 0 up, 0 in
pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
  0 kB used, 0 kB / 0 kB avail
64 creating
 
 
  My configurations are as below:
 
  sudo nano /etc/ceph/ceph.conf
 
  [global]
 
  fsid = a04ee359-82f8-44c4-89b5-60811bef3f19
  mon initial members = ceph1
  mon host = 192.168.101.41
  public network = 192.168.101.0/24
 
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
 
 
 
  [osd]
  osd journal size = 1024
  filestore xattr use omap = true
 
  osd pool default size = 2
  osd pool default min size = 1
  osd pool default pg num = 333
  osd pool default pgp num = 333
  osd crush chooseleaf type = 1
 
  [mon.ceph1]
  host = ceph1
  mon addr = 192.168.101.41:6789
 
 
  [osd.0]
  host = ceph2
  #devs = {path-to-device}
 
  [osd.1]
  host = ceph3
  #devs = {path-to-device}
 
 
  ..
 
  OSD mount location
 
  On ceph2
  /dev/sdb1  5.0G  1.1G  4.0G  21%
  /var/lib/ceph/osd/ceph-0
 
  on Ceph3
  /dev/sdb1  5.0G  1.1G  4.0G  21%
  /var/lib/ceph/osd/ceph-1
 
  My Linux OS
 
  lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04 LTS
  Release:14.04
  Codename:   trusty
 
  Regards
 
  Shiv


 --
 Christian BalzerNetwork/Systems Engineer
 ch...@gol.com   Global OnLine Japan/Fusion Communications
 http://www.gol.com/
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift + radosgw: How do I find accounts/containers/objects limitation?

2014-11-03 Thread Yehuda Sadeh
On Mon, Nov 3, 2014 at 9:37 AM, Narendra Trivedi (natrived)
natri...@cisco.com wrote:
 Thanks. I think the limit is 100 by default and it can be disabled. As far
 as I understand, there is no object size limit on the radosgw side of things, only
 from the Swift end (i.e. 5GB), right? In short, if someone tries to upload a
 1TB object onto Swift + RadosGW, it has to be split at the Swift API
 layer using --segment-size of 5GB, but there's no hard limitation imposed by
 radosgw... correct?

radosgw won't allow a segment that is larger than 5GB either.

Yehuda
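
So in practice a large object is uploaded segmented from the client side.
A minimal sketch with the python-swiftclient CLI; the 1GB segment size and
the container/object names are arbitrary, anything up to the 5GB cap works:

  # upload a big image as 1GB segments plus a manifest object
  swift upload --segment-size 1073741824 mycontainer big-volume.img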
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
[ Re-adding the list. ]

On Mon, Nov 3, 2014 at 10:49 AM, Chad Seys cws...@physics.wisc.edu wrote:

Next I executed
 
  'ceph osd crush tunables optimal'
 
to upgrade CRUSH mapping.

 Okay...you know that's a data movement command, right?

 Yes.

 So you should expect it to impact operations.


 These failures are usually the result of adjusting tunables without
 having upgraded all the machines in the cluster — although they should
 also be fixed in v0.80.7. Are you still seeing crashes, or just the PG
 state issues?

 Still getting crashes. I believe all nodes are running 0.80.7 .  Does ceph
 have a command to check this?  (Otherwise I'll do an ssh-many to check.)

There's a 'ceph osd metadata' command, but I don't recall if it's in
Firefly or only Giant. :)
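
Either way works on firefly for checking versions across the cluster; a
minimal sketch, the osd ids are examples:

  ceph tell osd.0 version                    # ask one daemon over the wire
  ceph osd metadata 0 | grep ceph_version    # from the mon's metadata store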


 Thanks!
 C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys

 There's a ceph osd metadata command, but i don't recall if it's in
 Firefly or only giant. :)

It's in firefly.  Thanks, very handy.

All the OSDs are running 0.80.7 at the moment.

What next?

Thanks again,
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
Okay, assuming this is semi-predictable, can you start up one of the
OSDs that is going to fail with "debug osd = 20", "debug filestore =
20", and "debug ms = 1" in the config file and then put the OSD log
somewhere accessible after it's crashed?

Can you also verify that all of your monitors are running firefly, and
then issue the command "ceph scrub" and report the output?
-Greg
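
A hedged sketch of both checks (daemon ids are illustrative): the debug settings can also be injected into a running OSD instead of editing the config file, and each monitor can report its version directly.

  # Raise log verbosity on a live daemon; revert by injecting the defaults back.
  ceph tell osd.12 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'

  # Confirm a monitor is on firefly ('a' is an illustrative mon id):
  ceph tell mon.a version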

On Mon, Nov 3, 2014 at 11:07 AM, Chad Seys cws...@physics.wisc.edu wrote:

 There's a ceph osd metadata command, but i don't recall if it's in
 Firefly or only giant. :)

 It's in firefly.  Thanks, very handy.

 All the OSDs are running 0.80.7 at the moment.

 What next?

 Thanks again,
 Chad.


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
On Monday, November 03, 2014 13:22:47 you wrote:
 Okay, assuming this is semi-predictable, can you start up one of the
 OSDs that is going to fail with debug osd = 20, debug filestore =
 20, and debug ms = 1 in the config file and then put the OSD log
 somewhere accessible after it's crashed?

Alas, I have not yet noticed a pattern.  The only thing I think is true is that
they go down when I first make CRUSH changes.  Then after restarting, they run
without going down again.
All the OSDs are running at the moment.

What I've been doing is marking OUT the OSDs on which a request is blocked,
letting the PGs recover (draining the OSD of PGs completely), then removing and
re-adding the OSD.

So far OSDs treated this way no longer have blocked requests.

Also, it seems as though that slowly decreases the number of incomplete and
down+incomplete PGs.
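
A hedged sketch of that drain-and-readd cycle (the osd id is illustrative, and the re-creation step depends on how the OSD was originally deployed):

  ceph osd out 12              # mark out; PGs start migrating off
  ceph -s                      # wait until recovery finishes draining it
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12
  # ...then re-create the OSD (e.g. ceph-disk prepare/activate) and let it backfill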

 
 Can you also verify that all of your monitors are running firefly, and
 then issue the command ceph scrub and report the output?

Sure, should I wait until the current rebalancing is finished?

Thanks,
Chad.


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys cws...@physics.wisc.edu wrote:
 On Monday, November 03, 2014 13:22:47 you wrote:
 Okay, assuming this is semi-predictable, can you start up one of the
 OSDs that is going to fail with debug osd = 20, debug filestore =
 20, and debug ms = 1 in the config file and then put the OSD log
 somewhere accessible after it's crashed?

 Alas, I have not yet noticed a pattern.  The only thing I think is true is that
 they go down when I first make CRUSH changes.  Then after restarting, they run
 without going down again.
 All the OSDs are running at the moment.

Oh, interesting. What CRUSH changes exactly are you making that are
spawning errors?

 What I've been doing is marking OUT the OSDs on which a request is blocked,
 letting the PGs recover (draining the OSD of PGs completely), then removing and
 re-adding the OSD.

 So far OSDs treated this way no longer have blocked requests.

 Also, it seems as though that slowly decreases the number of incomplete and
 down+incomplete PGs.


 Can you also verify that all of your monitors are running firefly, and
 then issue the command ceph scrub and report the output?

 Sure, should I wait until the current rebalancing is finished?

I don't think it should matter, although I confess I'm not sure how
much monitor load the scrubbing adds. (It's a monitor check; doesn't
hit the OSDs at all.)


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
On Monday, November 03, 2014 13:50:05 you wrote:
 On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys cws...@physics.wisc.edu wrote:
  On Monday, November 03, 2014 13:22:47 you wrote:
  Okay, assuming this is semi-predictable, can you start up one of the
  OSDs that is going to fail with debug osd = 20, debug filestore =
  20, and debug ms = 1 in the config file and then put the OSD log
  somewhere accessible after it's crashed?
  
  Alas, I have not yet noticed a pattern.  The only thing I think is true is
  that they go down when I first make CRUSH changes.  Then after
  restarting, they run without going down again.
  All the OSDs are running at the moment.
 
 Oh, interesting. What CRUSH changes exactly are you making that are
 spawning errors?

Maybe I miswrote:  I've been marking OUT OSDs with blocked requests.  Then if
an OSD becomes too_full I use 'ceph osd reweight' to squeeze blocks off of the
too_full OSD.  (Maybe that is not technically a CRUSH map change?)
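
A hedged sketch of that override (the id and weight are illustrative; reweight takes a value between 0.0 and 1.0 and is distinct from the CRUSH weight):

  # Shift data off a too-full OSD by lowering its reweight:
  ceph osd reweight 12 0.85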


 I don't think it should matter, although I confess I'm not sure how
 much monitor load the scrubbing adds. (It's a monitor check; doesn't
 hit the OSDs at all.)

$ ceph scrub
No output.

Chad.


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 12:28 PM, Chad Seys cws...@physics.wisc.edu wrote:
 On Monday, November 03, 2014 13:50:05 you wrote:
 On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys cws...@physics.wisc.edu wrote:
  On Monday, November 03, 2014 13:22:47 you wrote:
  Okay, assuming this is semi-predictable, can you start up one of the
  OSDs that is going to fail with debug osd = 20, debug filestore =
  20, and debug ms = 1 in the config file and then put the OSD log
  somewhere accessible after it's crashed?
 
  Alas, I have not yet noticed a pattern.  The only thing I think is true is
  that they go down when I first make CRUSH changes.  Then after
  restarting, they run without going down again.
  All the OSDs are running at the moment.

 Oh, interesting. What CRUSH changes exactly are you making that are
 spawning errors?

 Maybe I miswrote:  I've been marking OUT OSDs with blocked requests.  Then if
 an OSD becomes too_full I use 'ceph osd reweight' to squeeze blocks off of the
 too_full OSD.  (Maybe that is not technically a CRUSH map change?)

No, it is a change, I just want to make sure I understand the
scenario. So you're reducing CRUSH weights on full OSDs, and then
*other* OSDs are crashing on these bad state machine events?



 I don't think it should matter, although I confess I'm not sure how
 much monitor load the scrubbing adds. (It's a monitor check; doesn't
 hit the OSDs at all.)

 $ ceph scrub
 No output.

Oh, yeah, I think that output goes to the central log at a later time.
(Will show up in ceph -w if you're watching, or can be accessed from
the monitor nodes; in their data directory I think?)
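
A hedged sketch of where to look for the scrub result (the log path is the common default, not guaranteed):

  # Watch the cluster log live:
  ceph -w

  # Or read the central log on a monitor host, typically:
  #   /var/log/ceph/ceph.log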


Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?

2014-11-03 Thread Samuel Just
Can you reproduce with

debug osd = 20
debug filestore = 20
debug ms = 1

In the [osd] section of that OSD's ceph.conf?
-Sam

On Sun, Nov 2, 2014 at 9:10 PM, Ta Ba Tuan tua...@vccloud.vn wrote:
 Hi Sage, Samuel, and all,

 I upgraded to Giant, but those errors are still appearing :|
 I'm trying to delete the related objects/volumes, but it's very hard to verify
 the missing objects :(.

 Please guide me in resolving this! (I've attached the detailed log.)

 2014-11-03 11:37:57.730820 7f28fb812700  0 osd.21 105950 do_command r=0
 2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal (Segmentation
 fault) **
  in thread 7f28fc013700

  ceph version 0.87-6-gdba7def (dba7defc623474ad17263c9fccfec60fe7a439f0)
  1: /usr/bin/ceph-osd() [0x9b6725]
  2: (()+0xfcb0) [0x7f291fc2acb0]
  3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811b55]
  4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim
 const&)+0x43e) [0x82b9be]
  5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects,
 ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
 mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
 mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
 mpl_::na, mpl_::na, mpl_::na>,
 (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
 const&, void const*)+0xc0) [0x870ce0]
  6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer,
 ReplicatedPG::NotTrimming, std::allocator<void>,
 boost::statechart::null_exception_translator>::process_queued_events()+0xfb)
 [0x85618b]
  7: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer,
 ReplicatedPG::NotTrimming, std::allocator<void>,
 boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
 const&)+0x1e) [0x85633e]
  8: (ReplicatedPG::snap_trimmer()+0x4f8) [0x7d5ef8]
  9: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x673ab4]
  10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa8fade]
  11: (ThreadPool::WorkThread::entry()+0x10) [0xa92870]
  12: (()+0x7e9a) [0x7f291fc22e9a]
  13: (clone()+0x6d) [0x7f291e5ed31d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
 interpret this.

  -9993 2014-11-03 11:37:47.689335 7f28fc814700  1 -- 172.30.5.2:6803/7606
 -- 172.30.5.1:6886/3511 -- MOSDPGPull(6.58e 105950
 [PullOp(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6,
 recovery_info:
 ObjectRecoveryInfo(87f82d8e/rbd_data.45e62779c99cf1.22b5/head//6@105938'11622009,
 copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress:
 ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false,
 omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x26c59000 con
 0x22fbc420
 
 -2 2014-11-03 11:37:57.853585 7f2902820700  5 osd.21 pg_epoch: 105950
 pg[24.9e4( v 105946'113392 lc 105946'113391 (103622'109598,105946'113392]
 local-les=105948 n=88 ec=25000 les/c 105948/105943 105947/105947/105947)
 [21,112,33] r=0 lpr=105947 pi=105933-105946/4 crt=105946'113392 lcod 0'0
 mlcod 0'0 active+recovery_wait+degraded m=1 snaptrimq=[303~3,307~1]] enter
 Started/Primary/Active/Recovering
 -1 2014-11-03 11:37:57.853735 7f28fc814700  1 -- 172.30.5.2:6803/7606
 -- 172.30.5.9:6806/24552 -- MOSDPGPull(24.9e4 105950
 [PullOp(5abb99e4/rbd_data.5dd32f2ae8944a.0165/head//24, recovery_info:
 ObjectRecoveryInfo(5abb99e4/rbd_data.5dd32f2ae8944a.0165/head//24@105946'113392,
 copy_subset: [0~18446744073709551615], clone_subset: {}), recovery_progress:
 ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false,
 omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x229e7e00 con 0x22fb7000
  0 2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal
 (Segmentation fault) **
  0 2014-11-03 11:37:57.856578 7f28fc013700 -1 *** Caught signal
 (Segmentation fault) **

 Thanks!
 --
 Tuan
 HaNoi-VietNam




 On 11/01/2014 09:21 AM, Ta Ba Tuan wrote:

 Hi Samuel and Sage,

 I will upgrade to Giant soon. Thank you so much.

 --
 Tuan
 HaNoi-VietNam

 On 11/01/2014 01:10 AM, Samuel Just wrote:

 You should start by upgrading to giant, many many bug fixes went in
 between .86 and giant.
 -Sam

 On Fri, Oct 31, 2014 at 8:54 AM, Ta Ba Tuan tua...@vccloud.vn wrote:

 Hi Sage Weil

 Thanks for your reply. Yes, I'm using Ceph v0.86.
 I'm reporting some related bugs; I hope you can help me.

 2014-10-31 15:34:52.927965 7f85efb6b700  0 osd.21 104744 do_command r=0
 2014-10-31 15:34:53.105533 7f85f036c700 -1 *** Caught signal (Segmentation
 fault) **
   in thread 7f85f036c700
   ceph version 0.86-106-g6f8524e (6f8524ef7673ab4448de2e0ff76638deaf03cae8)
   1: /usr/bin/ceph-osd() [0x9b6655]
   2: (()+0xfcb0) [0x7f8615726cb0]
   3: (ReplicatedPG::trim_object(hobject_t const&)+0x395) [0x811c25]
   4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim
 const&)+0x43e) [0x82baae]
   5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects,
 ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
 mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
 
 No, it is a change, I just want to make sure I understand the
 scenario. So you're reducing CRUSH weights on full OSDs, and then
 *other* OSDs are crashing on these bad state machine events?

That is right.  The other OSDs shut down sometime later.  (Not immediately.)

I really haven't tested whether the OSDs will stay up if there are no
manipulations.  I'd need to let the PGs settle for a while, which I
haven't done yet.

 
  I don't think it should matter, although I confess I'm not sure how
  much monitor load the scrubbing adds. (It's a monitor check; doesn't
  hit the OSDs at all.)
  
  $ ceph scrub
  No output.
 
 Oh, yeah, I think that output goes to the central log at a later time.
 (Will show up in ceph -w if you're watching, or can be accessed from
 the monitor nodes; in their data directory I think?)

OK.  Will doing ceph scrub again result in the same output? If so, I'll run it 
again and look for output in ceph -w when the migrations have stopped.

Thanks!
Chad.


Re: [ceph-users] cephfs survey results

2014-11-03 Thread Blair Bethwaite
On 4 November 2014 01:50, Sage Weil s...@newdream.net wrote:
 In the Ceph session at the OpenStack summit someone asked what the CephFS
 survey results looked like.

Thanks Sage, that was me!

  Here's the link:

 https://www.surveymonkey.com/results/SM-L5JV7WXL/

 In short, people want

 fsck
 multimds
 snapshots
 quotas

TBH I'm a bit surprised by a couple of these and hope maybe you guys
will apply a certain amount of filtering on this...

fsck and quotas were there for me, but multimds and snapshots are what
I'd consider icing features - they're nice to have but not on the
critical path to using cephfs instead of e.g. nfs in a production
setting. I'd have thought stuff like small file performance and
gateway support was much more relevant to uptake and
positive/pain-free UX. Interested to hear others' rationale here.

-- 
Cheers,
~Blairo


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Samuel Just
If you have OSDs that are close to full, you may be hitting bug 9626.  I
pushed a branch based on v0.80.7 with the fix, wip-v0.80.7-9626.
-Sam
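
A hedged sketch of spotting OSDs approaching the full thresholds (standard commands; output formats vary by release):

  # Near-full and full OSDs show up in the health detail:
  ceph health detail | grep -i full

  # Overall and per-pool utilization:
  ceph df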

On Mon, Nov 3, 2014 at 2:09 PM, Chad Seys cws...@physics.wisc.edu wrote:

 No, it is a change, I just want to make sure I understand the
 scenario. So you're reducing CRUSH weights on full OSDs, and then
 *other* OSDs are crashing on these bad state machine events?

 That is right.  The other OSDs shut down sometime later.  (Not immediately.)

 I really haven't tested whether the OSDs will stay up if there are no
 manipulations.  I'd need to let the PGs settle for a while, which I
 haven't done yet.


  I don't think it should matter, although I confess I'm not sure how
  much monitor load the scrubbing adds. (It's a monitor check; doesn't
  hit the OSDs at all.)
 
  $ ceph scrub
  No output.

 Oh, yeah, I think that output goes to the central log at a later time.
 (Will show up in ceph -w if you're watching, or can be accessed from
 the monitor nodes; in their data directory I think?)

 OK.  Will doing ceph scrub again result in the same output? If so, I'll run it
 again and look for output in ceph -w when the migrations have stopped.

 Thanks!
 Chad.


Re: [ceph-users] giant release osd down

2014-11-03 Thread Mark Kirkwood

On 04/11/14 03:02, Sage Weil wrote:

On Mon, 3 Nov 2014, Mark Kirkwood wrote:



Ah, I missed that thread.  Sounds like three separate bugs:

- pool defaults not used for initial pools
- osd_mkfs_type not respected by ceph-disk
- osd_* settings not working

The last one is a real shock; I would expect all kinds of things to break
very badly if the [osd] section config behavior was not working.


I wonder if this sort of thing has escaped notice because ceph-deploy
seems to plonk stuff into [global] only; I guess this acts as an
implicit encouragement to have everything in there (e.g. I note that in our
production setup we have the rbd_cache* settings in [global]
instead of [client]).
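
A hedged sketch of the split being described, plus a way to check what a running daemon actually picked up (section names are standard; the ids and values are illustrative, not recommendations):

  # ceph.conf: client-only options such as the rbd cache settings belong
  # under [client] rather than [global]:
  [client]
  rbd cache = true
  rbd cache size = 33554432

  # Verify a daemon's effective value via its admin socket:
  ceph daemon osd.0 config show | grep rbd_cache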


regards

Mark






[ceph-users] osd down question

2014-11-03 Thread ??
Hello, I have been running Ceph v0.87 for one week. This week
many OSDs have been marked down, but when I run ps -ef | grep osd I can see
the OSD processes, so the OSDs are not really down. When I check the OSD logs,
I see many messages like "osd.XX from dead osd.YY, marking down".
Does 0.87 have OSDs check on other OSD processes? If some OSD is down, will
the mon then mark the current one down as well?
This could cause a chain reaction, leading to failure of the entire cluster.
Is it a bug?
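
A hedged sketch of first checks when OSDs are flagged down while their processes are still alive (the osd id is illustrative):

  # Which OSDs the cluster currently considers down:
  ceph osd tree | grep down

  # Ask a daemon directly via its admin socket to confirm it is responsive:
  ceph daemon osd.3 version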