Helper for state replication machine

2014-05-20 Thread Andrey Korolyov
Hello,

I do not know how many of you are aware of this work by Michael Hines
[0], but it looks like it could be extremely useful for critical applications
using qemu and, of course, Ceph at the block level. My thought was that
if the qemu rbd driver could provide some kind of metadata interface to mark
each atomic write, it could easily be used to check and replay machine
states on the acceptor side independently. Since Ceph replication is
asynchronous, there is currently no good way to tell when it's time to
replay a certain memory state on the acceptor side, even if we push all
writes in a synchronous manner. I'd be happy to hear any suggestions on
this, because the result would probably be widely adopted by enterprise
users whose needs include state replication and who are bound to
VMware for now. Of course, I am assuming the worst case above, where the
primary replica shifts during a disaster and there are at least two sites
holding primary and non-primary replica sets, with 100% distinction of
the primary role (>= 0.80). There are of course a lot of points to discuss,
like 'fallback' primary affinity and so on, but I'd like to ask first about
the possibility of implementing such a mechanism at the driver level.
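To make the ordering question above concrete, here is a purely hypothetical
sketch (none of these names exist in qemu or librbd; they only illustrate the
kind of metadata I mean): every block write would carry the epoch of the
current micro-checkpoint, and the acceptor would replay the memory state for
epoch E only once no write tagged with an epoch <= E is still outstanding.

#include <cstdint>
#include <map>

// Hypothetical per-write tag: the micro-checkpoint epoch the write belongs to.
struct TaggedWrite {
  uint64_t epoch;
  uint64_t offset;
  uint64_t length;
};

// Acceptor-side gate: memory checkpoint E is safe to replay only after every
// write tagged with epoch <= E has been seen as replicated; otherwise the
// disk state and the memory state could diverge after a failover.
class ReplayGate {
  std::map<uint64_t, uint64_t> outstanding;  // epoch -> writes still in flight
public:
  void write_submitted(uint64_t epoch)  { ++outstanding[epoch]; }
  void write_replicated(uint64_t epoch) {
    if (--outstanding[epoch] == 0)
      outstanding.erase(epoch);
  }
  bool can_replay(uint64_t epoch) const {
    // safe when no writes from this or any earlier epoch remain outstanding
    return outstanding.empty() || outstanding.begin()->first > epoch;
  }
};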

Thanks!

0. http://wiki.qemu.org/Features/MicroCheckpointing


Re: Signed-off-by and aliases

2014-05-20 Thread Richard Fontana
On Tue, May 20, 2014 at 07:31:59AM +0200, Loic Dachary wrote:

 However, I know of at least one other instance where finding a way
 to handle aliases would allow contributors to participate in the
 Ceph project. OVH is a large hosting company employing a number of
 developers, and management explicitly forbids participation in Free
 Software projects, the primary reason being that they could be
 contacted by companies looking for talent. If their contributions
 were clustered under the OVH li...@ovh.com alias, they may have
 permission to publish their code.

On the assumption that OVH is the copyright holder of all such
contributions, and would knowingly permit employees contributing using
this alias, this seems okay to me.

- Richard




Re: [ceph-users] 70+ OSD are DOWN and not coming up

2014-05-20 Thread Sage Weil
On Tue, 20 May 2014, Karan Singh wrote:
 Hello Cephers, I need your suggestions for troubleshooting.
 
 My cluster is struggling terribly: 70+ OSDs are down out of 165.
 
 Problem: OSDs are getting marked out of the cluster and go down. The cluster is
 degraded. On checking the logs of the failed OSDs, we see weird entries that
 are continuously being generated.

Tracking this at http://tracker.ceph.com/issues/8387

The most recent bits you posted in the ticket don't quite make sense: the 
OSD is trying to connect to an address for an OSD that is currently marked 
down.  I suspect this is just a timing gap between when the logs were captured 
and when the ceph osd dump was captured.  To get a complete picture, please 
do the following (a consolidated example is sketched after the list):

1) add

 debug osd = 20
 debug ms = 1

in [osd] and restart all osds

2) ceph osd set nodown

(to prevent flapping)

3) find some OSD that is showing these messages

4) capture a 'ceph osd dump' output.
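
For reference, a consolidated sketch of the whole sequence (assuming ceph.conf 
is edited on the hosts with the failing OSDs and the commands are run from a 
node with the admin keyring):

  # 1) in ceph.conf on the affected hosts, then restart those OSDs
  [osd]
      debug osd = 20
      debug ms = 1

  # 2) keep flapping OSDs from being marked down while logs are collected
  ceph osd set nodown

  # 3) watch /var/log/ceph/ceph-osd.<id>.log on an OSD that logs 'wrong node!'

  # 4) capture the osd map at the same time
  ceph osd dump > osd-dump.txt

  # afterwards, clear the flag and drop the debug levels again
  ceph osd unset nodown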

Also happy to debug this interactively over IRC; that will likely be 
faster!

Thanks-
sage



 
 OSD debug logs: http://pastebin.com/agTKh6zB
 
 
  1. 2014-05-20 10:19:03.699886 7f2328e237a0  0 osd.158 357532 done with
 init, starting boot process
  2. 2014-05-20 10:19:03.700093 7f22ff621700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0
 l=0 c=0x83018c0).connect claims to be 192.168.1.109:6802/63896 not
 192.168.1.109:6802/910005982 - wrong node!
  3. 2014-05-20 10:19:03.700152 7f22ff621700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6802/910005982 pipe(0x8698500 sd=35 :33500 s=1 pgs=0 cs=0
 l=0 c=0x83018c0).fault with nothing to send, going to standby
  4. 2014-05-20 10:19:09.551269 7f22fdd12700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0
 l=0 c=0x533fd20).connect claims to be 192.168.1.109:6803/63896 not
 192.168.1.109:6803/1176009454 - wrong node!
  5. 2014-05-20 10:19:09.551347 7f22fdd12700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6803/1176009454 pipe(0x56aee00 sd=53 :40060 s=1 pgs=0 cs=0
 l=0 c=0x533fd20).fault with nothing to send, going to standby
  6. 2014-05-20 10:19:09.703901 7f22fd80d700  0 -- 192.168.1.112:6802/3807 
 192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0
 c=0x8302aa0).connect claims to be 192.168.1.113:6802/24612 not
 192.168.1.113:6802/13870 - wrong node!
  7. 2014-05-20 10:19:09.704039 7f22fd80d700  0 -- 192.168.1.112:6802/3807 
 192.168.1.113:6802/13870 pipe(0x56adf00 sd=137 :42889 s=1 pgs=0 cs=0 l=0
 c=0x8302aa0).fault with nothing to send, going to standby
  8. 2014-05-20 10:19:10.243139 7f22fd005700  0 -- 192.168.1.112:6802/3807 
 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
 c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not
 192.168.1.112:6800/14114 - wrong node!
  9. 2014-05-20 10:19:10.243190 7f22fd005700  0 -- 192.168.1.112:6802/3807 
 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
 c=0x8304780).fault with nothing to send, going to standby
 10. 2014-05-20 10:19:10.349693 7f22fc7fd700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0
 c=0x83070c0).fault with nothing to send, going to standby
 
 
 # ceph -v
 ceph version 0.80-469-g991f7f1 (991f7f15a6e107b33a24bbef1169f21eb7fcce2c)
 # ceph osd stat
 osdmap e357073: 165 osds: 91 up, 165 in
 flags noout
 
 I have tried the following:
 
 1. Restarting the problematic OSDs, but no luck.
 2. Restarting the entire host, but no luck; the OSDs are still down and keep
 logging the same message:
 
  1. 2014-05-20 10:19:10.243139 7f22fd005700  0 -- 192.168.1.112:6802/3807 
 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
 c=0x8304780).connect claims to be 192.168.1.112:6800/2852 not
 192.168.1.112:6800/14114 - wrong node!
  2. 2014-05-20 10:19:10.243190 7f22fd005700  0 -- 192.168.1.112:6802/3807 
 192.168.1.112:6800/14114 pipe(0x56a8f00 sd=146 :43726 s=1 pgs=0 cs=0 l=0
 c=0x8304780).fault with nothing to send, going to standby
  3. 2014-05-20 10:19:10.349693 7f22fc7fd700  0 -- 192.168.1.112:6802/3807 
 192.168.1.109:6800/13492 pipe(0x8698c80 sd=156 :0 s=1 pgs=0 cs=0 l=0
 c=0x83070c0).fault with nothing to send, going to standby
  4. 2014-05-20 10:22:23.312473 7f2307e61700  0 osd.158 357781 do_command r=0
  5. 2014-05-20 10:22:23.326110 7f2307e61700  0 osd.158 357781 do_command r=0
 debug_osd=0/5
  6. 2014-05-20 10:22:23.326123 7f2307e61700  0 log [INF] : debug_osd=0/5
  7. 2014-05-20 10:34:08.161864 7f230224d700  0 -- 192.168.1.112:6802/3807 
 192.168.1.102:6808/13276 pipe(0x8698280 sd=22 :41078 s=2 pgs=603 cs=1
 l=0 c=0x8301600).fault with nothing to send, going to standby
 
 3. The disks do not have errors; there are no messages in dmesg or /var/log/messages.
 
 4. there was a bug in the past 

New Defects reported by Coverity Scan for ceph (fwd)

2014-05-20 Thread Sage Weil
---BeginMessage---


Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)


** CID 1214678:  Unchecked return value  (CHECKED_RETURN)
/osd/OSD.cc: 318 in OSDService::_maybe_split_pgid(std::tr1::shared_ptr<const OSDMap>, std::tr1::shared_ptr<const OSDMap>, spg_t)()



*** CID 1214678:  Unchecked return value  (CHECKED_RETURN)
/osd/OSD.cc: 318 in OSDService::_maybe_split_pgid(std::tr1::shared_ptr<const OSDMap>, std::tr1::shared_ptr<const OSDMap>, spg_t)()
312           OSDMapRef new_map,
313           spg_t pgid)
314     {
315       assert(old_map->have_pg_pool(pgid.pool()));
316       if (pgid.ps() < static_cast<unsigned>(old_map->get_pg_num(pgid.pool()))) {
317         set<spg_t> children;
>>>     CID 1214678:  Unchecked return value  (CHECKED_RETURN)
>>>     No check of the return value of pgid.is_split(old_map->get_pg_num(pgid.pool()),
>>>     new_map->get_pg_num(pgid.pool()), children).
318         pgid.is_split(old_map->get_pg_num(pgid.pool()),
319                       new_map->get_pg_num(pgid.pool()), children);
320         _start_split(pgid, children);
321       } else {
322         assert(pgid.ps() < static_cast<unsigned>(new_map->get_pg_num(pgid.pool())));
323       }
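
For what it's worth, a minimal sketch of the kind of change that addresses this
class of warning, keeping the call exactly as shown above and assuming
is_split() returns a bool saying whether any children were produced; this is
only an illustration, not necessarily the fix that will land:

    set<spg_t> children;
    if (pgid.is_split(old_map->get_pg_num(pgid.pool()),
                      new_map->get_pg_num(pgid.pool()), children)) {
      // only start a split when the pg actually has children in the new map
      _start_split(pgid, children);
    }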



To view the defects in Coverity Scan, visit 
http://scan.coverity.com/projects/25?tab=overview

To unsubscribe from the email notification for new defects, 
http://scan5.coverity.com/cgi-bin/unsubscribe.py



---End Message---


Re: [RFC] add rocksdb support

2014-05-20 Thread Sage Weil
Hi Xinxin,

I've pushed an updated wip-rocksdb to github/liewegas/ceph.git that 
includes the latest set of patches with the groundwork and your rocksdb 
patch.  There is also a commit that adds rocksdb as a git submodule.  I'm 
thinking that, since there aren't any distro packages for rocksdb at this 
point, this is going to be the easiest way to make this usable for people.

If you can wire the submodule into the makefile, we can merge this in so 
that rocksdb support is in the packages on ceph.com.  I suspect that the 
distros will prefer to turn this off in favor of separate shared libs, 
but they can do that at their option if/when they include rocksdb in the 
distro.  I think the key is just to have both --with-librocksdb and 
--with-librocksdb-static (or similar) options so that you can use either 
the statically or the dynamically linked one.
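
For illustration only, the eventual flow for someone building from the tree 
might look something like this (the option name is just the one proposed above 
and the submodule layout is assumed, not final):

  git submodule update --init             # pull in the bundled rocksdb sources
  ./autogen.sh
  ./configure --with-librocksdb-static    # proposed option: link the submodule statically
  make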

Has your group done further testing with rocksdb?  Anything interesting to 
share?

Thanks!
sage
