Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
 On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.94
   
   
  On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
   
   On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
 On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
 (mailto:s...@inktank.com) wrote:
  On Fri, 13 Jul 2012, Gregory Farnum wrote:
   On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
   (mailto:and...@xdel.ru) wrote:
Hi,
 
Recently I've reduced my test cluster from 6 to 4 osds (at ~60% usage) on
six nodes, and I removed a bunch of rbd objects during recovery to avoid
overfilling. Right now I'm constantly receiving a warning about a nearfull
state on a non-existent osd:
 
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}
 
HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%
 
Needless to say, osd.4 remains only in ceph.conf, not in the crushmap.
The reduction was done online, i.e. without restarting the entire cluster.







   Whoops! It looks like Sage has written some patches to fix this, 
   but
   for now you should be good if you just update your ratios to a 
   larger
   number, and then bring them back down again. :)
   
   
   
   
   
   
   
  Restarting ceph-mon should also do the trick.
   
  Thanks for the bug report!
  sage
  
  
  
  
  
  
  
 Should I restart mons simultaneously?
I don't think restarting will actually do the trick for you — you 
actually will need to set the ratios again.
 
 Restarting them one by one has no effect, and neither does filling the
 data pool up to ~95 percent (btw, when I deleted a 50 GB file on cephfs,
 the mds was stuck permanently and usage remained the same until I dropped
 and recreated the data pool - I hope that's one of the known POSIX-layer
 bugs). I also deleted the entry from the config and then restarted the
 mons, with no effect. Any suggestions?
 
 
 
 
 
I'm not sure what you're asking about here?
-Greg





    Oh, sorry, I misread and thought you were suggesting filling up the
    osds. How can I set the full/nearfull ratios correctly?

    $ ceph injectargs '--mon_osd_full_ratio 96'
    parsed options
    $ ceph injectargs '--mon_osd_near_full_ratio 94'
    parsed options

    $ ceph pg dump | grep 'full'
    full_ratio 0.95
    nearfull_ratio 0.85

   Setting parameters in the ceph.conf and then restarting mons does not
   affect ratios either.
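
[Note: the ceph pg set_full_ratio / set_nearfull_ratio commands quoted at
the top of this message take the ratios as fractions of 1, not percentages,
which is presumably why the injectargs calls above with 96 and 94 left
pg dump unchanged. A minimal sketch of the sequence that worked, with a
verification step:

  $ ceph pg set_full_ratio 0.95
  $ ceph pg set_nearfull_ratio 0.94
  $ ceph pg dump | grep full_ratio    # should now show 0.95 / 0.94
]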
   
  
  
  
 Thanks, it worked, but setting the values back brings the warning back.
Hrm. That shouldn't be possible if the OSD has been removed. How did you take 
it out? It sounds like maybe you just marked it in the OUT state (and turned it 
off quite quickly) without actually taking it out of the cluster?  
-Greg



Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com wrote:
 On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
 On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.94
 
 
  On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
 
   On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
 On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
 (mailto:s...@inktank.com) wrote:
  On Fri, 13 Jul 2012, Gregory Farnum wrote:
   On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
   (mailto:and...@xdel.ru) wrote:
Hi,
   
Recently I`ve reduced my test suite from 6 to 4 osds at ~60% 
usage on
six-node,
and I have removed a bunch of rbd objects during recovery to 
avoid
overfill.
Right now I`m constantly receiving a warn about nearfull state 
on
non-existing osd:
   
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}
   
HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%
   
Needless to say, osd.4 remains only in ceph.conf, but not at 
crushmap.
Reducing has been done 'on-line', e.g. without restart entire 
cluster.
  
  
  
  
  
  
  
   Whoops! It looks like Sage has written some patches to fix this, 
   but
   for now you should be good if you just update your ratios to a 
   larger
   number, and then bring them back down again. :)
 
 
 
 
 
 
 
  Restarting ceph-mon should also do the trick.
 
  Thanks for the bug report!
  sage







 Should I restart mons simultaneously?
I don't think restarting will actually do the trick for you — you 
actually will need to set the ratios again.
   
 Restarting one by one has no
 effect, same as filling up data pool up to ~95 percent(btw, when I
 deleted this 50Gb file on cephfs, mds was stuck permanently and usage
 remained same until I dropped and recreated data pool - hope it`s one
 of known posix layer bugs). I also deleted entry from config, and 
 then
 restarted mons, with no effect. Any suggestions?
   
   
   
   
   
I'm not sure what you're asking about here?
-Greg
  
  
  
  
  
   Oh, sorry, I have mislooked and thought that you suggested filling up
   osds. How do I can set full/nearfull ratios correctly?
  
   $ceph injectargs '--mon_osd_full_ratio 96'
   parsed options
   $ ceph injectargs '--mon_osd_near_full_ratio 94'
   parsed options
  
   ceph pg dump | grep 'full'
   full_ratio 0.95
   nearfull_ratio 0.85
  
   Setting parameters in the ceph.conf and then restarting mons does not
   affect ratios either.
 



 Thanks, it worked, but setting values back result to turn warning back.
 Hrm. That shouldn't be possible if the OSD has been removed. How did you take 
 it out? It sounds like maybe you just marked it in the OUT state (and turned 
 it off quite quickly) without actually taking it out of the cluster?
 -Greg


The way I did the removal, it was definitely not like that - in the first
place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
crushmap, and then killed the osd processes. As I mentioned before, osd.4
does not exist in the crushmap and therefore it shouldn't be reported at
all (theoretically).


Re: How to compile Java-Rados.

2012-07-18 Thread ramu
Hi Noah,

After reinstalling java-rados, when I run ant test I now get the following
error in the terminal:
Buildfile: /home/vutp/java-rados/build.xml

makedir:

compile-rados:

compile-tests:
[javac] Compiling 1 source file to /home/vutp/java-rados/build/test

jar:

test:
[junit] Running ClusterStatsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec
[junit] Running ClusterTest
[junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec

BUILD FAILED
/home/vutp/java-rados/build.xml:134: Test ClusterTest failed

Total time: 10 seconds

Also, two .txt files were generated in the java-rados directory. One is
TEST-ClusterStatsTest.txt, which contains:
Testsuite: ClusterStatsTest
Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec
Testcase: test_ClusterStats took 0.027 sec

The other one is TEST-ClusterTest.txt, which contains:

Testsuite: ClusterTest
Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec

Testcase: test_ConfigOption took 0.026 sec
FAILED

junit.framework.AssertionFailedError:
at ClusterTest.test_ConfigOption(Unknown Source)

Testcase: test_getClusterStats took 0.005 sec
Testcase: test_getInstancePointer took 0.004 sec
Testcase: test_getVersion took 0.005 sec
Testcase: test_PoolOperations took 1.821 sec
Testcase: test_openIOContext took 2.134 sec
Testcase: test_PoolList took 2.543 sec

Thanks,
Ramu.







Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
   On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94
 
 
On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
 
 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov 
 and...@xdel.ru (mailto:and...@xdel.ru) wrote:
  Hi,
   
  Recently I`ve reduced my test suite from 6 to 4 osds at 
  ~60% usage on
  six-node,
  and I have removed a bunch of rbd objects during recovery 
  to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull 
  state on
  non-existing osd:
   
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 
  181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
   
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
   
  Needless to say, osd.4 remains only in ceph.conf, but not 
  at crushmap.
  Reducing has been done 'on-line', e.g. without restart 
  entire cluster.
  
  
  
  
  
  
  
  
  
 Whoops! It looks like Sage has written some patches to fix 
 this, but
 for now you should be good if you just update your ratios to 
 a larger
 number, and then bring them back down again. :)
 
 
 
 
 
 
 
 
 
Restarting ceph-mon should also do the trick.
 
Thanks for the bug report!
sage









   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you 
  actually will need to set the ratios again.
   
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when I
   deleted this 50Gb file on cephfs, mds was stuck permanently and 
   usage
   remained same until I dropped and recreated data pool - hope it`s 
   one
   of known posix layer bugs). I also deleted entry from config, and 
   then
   restarted mons, with no effect. Any suggestions?
   
   
   
   
   
   
   
  I'm not sure what you're asking about here?
  -Greg
  
  
  
  
  
  
  
 Oh, sorry, I have mislooked and thought that you suggested filling up
 osds. How do I can set full/nearfull ratios correctly?
  
 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options
  
 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85
  
 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.
 





   Thanks, it worked, but setting values back result to turn warning back.
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state (and 
  turned it off quite quickly) without actually taking it out of the cluster?
  -Greg
  
  
  
 As I have did removal, it was definitely not like that - at first
 place, I have marked osds(4 and 5 on same host) out, then rebuilt
 crushmap and then kill osd processes. As I mentioned before, osd.4
 doest not exist in crushmap and therefore it shouldn`t be reported at
 all(theoretically).

Okay, that's what happened — marking an OSD out in the CRUSH map means all the 
data gets moved off it, but that doesn't remove it from all the places where 
it's registered in the monitor and in the map, for a couple reasons:  
1) You might want to mark an OSD out before taking it down, to allow for more 
orderly data movement.
2) OSDs can get marked out automatically, but the system shouldn't be able to 
forget about them on its own.
3) You might want to remove an OSD from the CRUSH map in the process of 
placing it somewhere else (perhaps you moved the physical machine to a new 
location).

Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
   On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94
   
   
On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
   
 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov 
 and...@xdel.ru (mailto:and...@xdel.ru) wrote:
  Hi,
 
  Recently I`ve reduced my test suite from 6 to 4 osds at 
  ~60% usage on
  six-node,
  and I have removed a bunch of rbd objects during recovery 
  to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull 
  state on
  non-existing osd:
 
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 
  181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
 
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
 
  Needless to say, osd.4 remains only in ceph.conf, but not 
  at crushmap.
  Reducing has been done 'on-line', e.g. without restart 
  entire cluster.









 Whoops! It looks like Sage has written some patches to fix 
 this, but
 for now you should be good if you just update your ratios to 
 a larger
 number, and then bring them back down again. :)
   
   
   
   
   
   
   
   
   
Restarting ceph-mon should also do the trick.
   
Thanks for the bug report!
sage
  
  
  
  
  
  
  
  
  
   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you 
  actually will need to set the ratios again.
 
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when 
   I
   deleted this 50Gb file on cephfs, mds was stuck permanently and 
   usage
   remained same until I dropped and recreated data pool - hope 
   it`s one
   of known posix layer bugs). I also deleted entry from config, 
   and then
   restarted mons, with no effect. Any suggestions?
 
 
 
 
 
 
 
  I'm not sure what you're asking about here?
  -Greg







 Oh, sorry, I have mislooked and thought that you suggested filling up
 osds. How do I can set full/nearfull ratios correctly?

 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options

 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85

 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.
   
  
  
  
  
  
   Thanks, it worked, but setting values back result to turn warning back.
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state (and 
  turned it off quite quickly) without actually taking it out of the cluster?
  -Greg



 As I have did removal, it was definitely not like that - at first
 place, I have marked osds(4 and 5 on same host) out, then rebuilt
 crushmap and then kill osd processes. As I mentioned before, osd.4
 doest not exist in crushmap and therefore it shouldn`t be reported at
 all(theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for more 
 orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able to 
 forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).

Puppet modules for Ceph

2012-07-18 Thread François Charlier
Hi,

I'm currently working on writing a Puppet module for Ceph.

Since my research turned up no existing module, I'll start from scratch,
but I would be glad to hear from people who have already started working
on this, or who have any ideas or pointers regarding this subject.

Thanks,

[ By the way, I'm fc on #ceph ! ]
-- 
François Charlier Software Engineer
// eNovance labs   http://labs.enovance.com
// ✉ francois.charl...@enovance.com ☎ +33 1 49 70 99 81


Re: Puppet modules for Ceph

2012-07-18 Thread Mark Nelson

On 7/18/12 8:58 AM, François Charlier wrote:

Hi,

I'm currently working on writing a Puppet module for Ceph.

As after some research I found no existing module, I'll start from
scratch but I would be glad to hear from people who would already have
started working or this or having any idea or pointers regarding this
subject.

Thanks,

[ By the way, I'm fc on #ceph ! ]



Hi Francois,

That's great!  You might want to look at the chef work that has been 
done as a base to start from.  I'm not very familiar with what is in 
place, but Tommi or Dan may chime in later with more details.  Some of 
the folks from Mediawiki were actually just talking about puppet modules 
yesterday on the IRC channel so they may be interested in collaborating too.


Thanks,
Mark


Re: Poor read performance in KVM

2012-07-18 Thread Josh Durgin

On 07/17/2012 10:46 PM, Vladimir Bashkirtsev wrote:

On 16/07/12 15:46, Josh Durgin wrote:

On 07/15/2012 06:13 AM, Vladimir Bashkirtsev wrote:

Hello,

Lately I was trying to get KVM to perform well on RBD. But it still
appears elusive.

[root@alpha etc]# rados -p rbd bench 120 seq -t 8

Total time run:16.873277
Total reads made: 302
Read size:4194304
Bandwidth (MB/sec):71.592

Average Latency:   0.437984
Max latency:   3.26817
Min latency:   0.015786

Fairly good performance. But when I run in KVM:

[root@mail ~]# hdparm -tT /dev/vda

/dev/vda:
  Timing cached reads:   8808 MB in  2.00 seconds = 4411.49 MB/sec


This is just the guest page cache - it's reading the first two
megabytes of the device repeatedly.

Just to make sure there no issue with VM itself.



  Timing buffered disk reads:  10 MB in 6.21 seconds =   1.61 MB/sec


This is a sequential read, so readahead in the guest should help here.

Should but obviously does not.
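
[Note: one way to take the guest page cache and readahead out of the
picture when re-testing - a generic sanity check, not something suggested
in this thread - is a direct-I/O sequential read against the same device:

  $ dd if=/dev/vda of=/dev/null bs=4M count=256 iflag=direct
]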



Not even close to what rados bench show! I even seen 900KB/sec
performance. Such slow read performance of course affecting guests.

Any ideas where to start to look for performance boost?


Do you have rbd caching enabled?

rbd_cache=true:rbd_cache_size=134217728:rbd_cache_max_dirty=125829120

It would also be interesting to see
how the guest reads are translating to rados reads. hdparm is doing
2MiB sequential reads of the block device. If you add
admin_socket=/var/run/ceph/kvm.asok to the rbd device on the qemu
command line) you can see number of requests, latency, and
request size info while the guest is running via:

ceph --admin-daemon /var/run/ceph/kvm.asok perf dump
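
[Note: for illustration, a drive spec along these lines would wire that up;
the pool/image name rbd/kvm1 and the socket path are placeholders taken
from this thread, and the surrounding drive options are an assumption
rather than a recommendation from this message:

  $ qemu-system-x86_64 ... \
      -drive format=raw,file=rbd:rbd/kvm1:rbd_cache=true:admin_socket=/var/run/ceph/kvm.asok,cache=writeback,if=virtio
]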

Done that. Waited for the VM to fully boot, then got a perf dump. It would
be nice to get the output in a human-readable format instead of JSON - I
remember some other part of ceph had a relevant command-line switch. Does
it exist for perf dump?

{librbd-rbd/kvm1:{rd:0,rd_bytes:0,rd_latency:{avgcount:0,sum:0},wr:0,wr_bytes:0,wr_latency:{avgcount:0,sum:0},discard:0,discard_bytes:0,discard_latency:{avgcount:0,sum:0},flush:0,aio_rd:3971,aio_rd_bytes:64750592,aio_rd_latency:{avgcount:3971,sum:803.656},aio_wr:91,aio_wr_bytes:652288,aio_wr_latency:{avgcount:91,sum:0.002977},aio_discard:0,aio_discard_bytes:0,aio_discard_latency:{avgcount:0,sum:0},snap_create:0,snap_remove:0,snap_rollback:0,notify:0,resize:0},objectcacher-librbd-rbd/kvm1:{cache_ops_hit:786,cache_ops_miss:3189,cache_bytes_hit:72186880,cache_bytes_miss:61276672,data_read:64750592,data_written:652288,data_flushed:648192,data_overwritten_while_flushing:8192,write_ops_blocked:0,write_bytes_blocked:0,write_time_blocked:0},objecter:{op_active:0,op_laggy:0,op_send:3271,op_send_bytes:0,op_resend:0,op_ack:3270,op_commit:78,op:3271,op_r:3194,op_w:77,op_rmw:

0,op_pg:0,osdop_stat:1,osdop_create:0,osdop_read:3191,osdop_write:77,osdop_writefull:0,osdop_append:0,osdop_zero:0,osdop_truncate:0,osdop_delete:0,osdop_mapext:0,osdop_sparse_read:0,osdop_clonerange:0,osdop_getxattr:0,osdop_setxattr:0,osdop_cmpxattr:0,osdop_rmxattr:0,osdop_resetxattrs:0,osdop_tmap_up:0,osdop_tmap_put:0,osdop_tmap_get:0,osdop_call:1,osdop_watch:1,osdop_notify:0,osdop_src_cmpxattr:0,osdop_pgls:0,osdop_pgls_filter:0,osdop_other:0,linger_active:1,linger_send:1,linger_resend:0,poolop_active:0,poolop_send:0,poolop_resend:0,poolstat_active:0,poolstat_send:0,poolstat_resend:0,statfs_active:0,statfs_send:0,statfs_resend:0,map_epoch:0,map_full:0,map_inc:0,osd_sessions:10,osd_session_open:4,osd_session_close:0,osd_laggy:1},throttle-msgr_dispatch_throttler-radosclient:{val:0,max:104857600,get:3292,get_sum:61673502,get_or_fail_fail:0,get_or_fail_success:0,take:0,take_sum
:0,put:3292,put_sum:61673502,wait:{avgcount:0,sum:0}},throttle-objecter_bytes:{val:0,max:104857600,get:3271,get_sum:61928960,get_or_fail_fail:0,get_or_fail_success:3271,take:0,take_sum:0,put:3268,put_sum:61928960,wait:{avgcount:0,sum:0}},throttle-objecter_ops:{val:0,max:1024,get:3271,get_sum:3271,get_or_fail_fail:0,get_or_fail_success:3271,take:0,take_sum:0,put:3271,put_sum:3271,wait:{avgcount:0,sum:0}}}



If my understanding is correct, aio_rd is an asynchronous read, with latency
in milliseconds? An average read latency of 800 ms is quite high! I remember
in 1991 my 80 MB HDD had similar read times - surely we are in 2012! :)


It's actually the sum of the latencies of all 3971 asynchronous reads,
in seconds, so the average latency was ~200ms, which is still pretty
high.


Write latency appears to be excellent. Is the latency measured between KVM
and librbd, between librbd and the OSDs, or between KVM and the OSDs?
Something tells me it is the latter, and thus it does not shed any light on
where the problem is. Notably, rados has a max latency of just over 3 ms.
Does that mean the 800 ms latency comes from the qemu-rbd driver?!


That's latency between KVM and the OSDs. The extra latency could be from
the callback to qemu or an artifact of this workload on the osds.
You can use the admin socket on the osds for 'perf dump' as well, and

Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))

2012-07-18 Thread Jim Schutt

On 07/17/2012 06:03 PM, Samuel Just wrote:

master should now have a fix for that, let me know how it goes.  I opened
bug #2798 for this issue.



Hmmm, it seems handle_osd_ping() now runs into a case
where for the first ping it gets, service.osdmap can be empty?

 0 2012-07-18 09:17:23.977497 7fffe6ec6700 -1 *** Caught signal 
(Segmentation fault) **
 in thread 7fffe6ec6700

 ceph version 0.48argonaut-419-g4e1d973 
(commit:4e1d973e466cd45138f004e84ab8631d9b2a60fa)
 1: /usr/bin/ceph-osd() [0x723c39]
 2: (()+0xf4a0) [0x776584a0]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x7d4) [0x5d7894]
 4: (OSD::heartbeat_dispatch(Message*)+0x71) [0x5d8111]
 5: (SimpleMessenger::DispatchQueue::entry()+0x583) [0x7d5103]
 6: (SimpleMessenger::dispatch_entry()+0x15) [0x7d6485]
 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x79523d]
 8: (()+0x77f1) [0x776507f1]
 9: (clone()+0x6d) [0x76aa1ccd]

gdb has this to say:

(gdb) bt
#0  0x7765836b in raise (sig=11) at 
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00724067 in reraise_fatal (signum=11) at 
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  signal handler called
#4  get_epoch (this=0x15d, m=0x1587000) at ./osd/OSDMap.h:210
#5  OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711
#6  0x005d8111 in OSD::heartbeat_dispatch (this=0x15d, m=0x1587000) 
at osd/OSD.cc:2769
#7  0x007d5103 in ms_deliver_dispatch (this=0x1472960) at 
msg/Messenger.h:504
#8  SimpleMessenger::DispatchQueue::entry (this=0x1472960) at 
msg/SimpleMessenger.cc:367
#9  0x007d6485 in SimpleMessenger::dispatch_entry (this=0x1472880) at 
msg/SimpleMessenger.cc:384
#10 0x0079523d in SimpleMessenger::DispatchThread::entry (this=value 
optimized out) at ./msg/SimpleMessenger.h:807
#11 0x776507f1 in start_thread (arg=0x7fffe6ec6700) at 
pthread_create.c:301
#12 0x76aa1ccd in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 5
#5  OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711
1711m-stamp);
(gdb) l
1706}
1707  }
1708  Message *r = new MOSDPing(monc-get_fsid(),
1709curmap-get_epoch(),
1710MOSDPing::PING_REPLY,
1711m-stamp);
1712  hbserver_messenger-send_message(r, m-get_connection());
1713
1714  if (curmap-is_up(from)) {
1715note_peer_epoch(from, m-map_epoch);
(gdb) p curmap
$1 = std::tr1::shared_ptr (empty) 0x0

-- Jim


Thanks for the info!
-Sam

On Tue, Jul 17, 2012 at 2:54 PM, Jim Schuttjasc...@sandia.gov  wrote:

On 07/17/2012 03:44 PM, Samuel Just wrote:


Not quite.  OSDService::get_osdmap() returns the most recently
published osdmap.  Generally, OSD::osdmap is safe to use when you are
holding the osd lock.  Otherwise, OSDService::get_osdmap() should be
used.  There are a few other things that should be fixed surrounding
this issue as well, I'll put some time into it today.  The map_lock
should probably be removed all together.



Thanks for taking a look.  Let me know when
you get something, and I'll take it for a spin.

Thanks -- Jim


-Sam












Re: Poor read performance in KVM

2012-07-18 Thread Josh Durgin

On 07/17/2012 10:46 PM, Vladimir Bashkirtsev wrote:

ceph --admin-daemon /var/run/ceph/kvm.asok perf dump

Done that. Waited for VM to fully boot then got perf dump. It would be
nice to get output in human readable format instead of JSON - I remember
some other part of ceph had relevant command line switch. Does it exist
for perf dump?


I forgot to mention you can pipe that to 'python -mjson.tool' for more
readable output. It's intended to be used by monitoring tools, hence
json, but some kind of more plain output could be added.
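
[Note: i.e. a one-liner such as the following, with the socket path from
the earlier message:

  $ ceph --admin-daemon /var/run/ceph/kvm.asok perf dump | python -mjson.tool
]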

Josh


Re: How to compile Java-Rados.

2012-07-18 Thread Noah Watkins
Please 'git pull' to grab the following change, which solved the same
problem in my local tree:

diff --git a/src/test/ClusterTest.java b/src/test/ClusterTest.java
index 9b6bcb6..8b83bdd 100644
--- a/src/test/ClusterTest.java
+++ b/src/test/ClusterTest.java
@@ -25,13 +25,13 @@ public class ClusterTest {
 String val1, val2;

 /* set option to 2 and check that it set */
-        val1 = "2";
+        val1 = "true";
 cluster.setConfigOption(opt, val1);
 val2 = cluster.getConfigOption(opt);
 assertTrue(val1.compareTo(val2) == 0);

 /* make sure the option wasn't already 2 */
-        val1 = "1";
+        val1 = "false";
 cluster.setConfigOption(opt, val1);
 val2 = cluster.getConfigOption(opt);
 assertTrue(val1.compareTo(val2) == 0);
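
[Note: with that change pulled in, re-running the suite from the
java-rados checkout is the same command as before, after pulling:

  $ git pull
  $ ant test
]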


On Tue, Jul 17, 2012 at 11:24 PM, ramu ramu.freesyst...@gmail.com wrote:
 [junit] Running ClusterStatsTest
 [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec
 [junit] Running ClusterTest
 [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec

good, no errors.


 BUILD FAILED
 /home/vutp/java-rados/build.xml:134: Test ClusterTest failed

 Total time: 10 seconds

 --And also two txt files generated in Java-Rados directory one is TEST-
 ClusterStatsTest.txt in this file the text is ,
 Testsuite: ClusterStatsTest
 Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.044 sec

no errors here either.

 Testcase: test_ClusterStats took 0.027 sec

 --and one more txt file is TEST-ClusterTest.txt in this file the text is ,

 Testsuite: ClusterTest
 Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 6.569 sec

 Testcase: test_ConfigOption took 0.026 sec
 FAILED

 junit.framework.AssertionFailedError:
 at ClusterTest.test_ConfigOption(Unknown Source)

So, it looks like we are down to one error?


 Testcase: test_getClusterStats took 0.005 sec
 Testcase: test_getInstancePointer took 0.004 sec
 Testcase: test_getVersion took 0.005 sec
 Testcase: test_PoolOperations took 1.821 sec
 Testcase: test_openIOContext took 2.134 sec
 Testcase: test_PoolList took 2.543 sec

 Thanks,
 Ramu.







Re: Puppet modules for Ceph

2012-07-18 Thread Tommi Virtanen
On Wed, Jul 18, 2012 at 6:58 AM, François Charlier
francois.charl...@enovance.com wrote:
 I'm currently working on writing a Puppet module for Ceph.

 As after some research I found no existing module, I'll start from
 scratch but I would be glad to hear from people who would already have
 started working or this or having any idea or pointers regarding this
 subject.

Hi. I don't remember anyone actively working on puppet modules for
Ceph. A quick search gives me just this:

http://git.sans.ethz.ch/?p=puppet-modules/ceph;a=summary

The Chef cookbook at https://github.com/ceph/ceph-cookbooks is
starting to get into a pretty good stage. It radically changes how we
do deployment and management, so I'd recommend you look at it in
detail, and don't imitate mkcephfs. We've been actively changing core
Ceph to make deployment and management simpler; I think the best proof
of that is that the cookbook is already shorter than the mkcephfs
shell script, and will probably just become a thinner layer in the
future. The Juju charms for Ceph are also adopting a model quite close
to what the Chef cookbook does.


Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))

2012-07-18 Thread Samuel Just
Sorry, master has a fix now for that also.
76efd9772c60b93bbf632e3ecc3b9117dc081427
-Sam

On Wed, Jul 18, 2012 at 8:29 AM, Jim Schutt jasc...@sandia.gov wrote:
 On 07/17/2012 06:03 PM, Samuel Just wrote:

 master should now have a fix for that, let me know how it goes.  I opened
 bug #2798 for this issue.


 Hmmm, it seems handle_osd_ping() now runs into a case
 where for the first ping it gets, service.osdmap can be empty?

  0 2012-07-18 09:17:23.977497 7fffe6ec6700 -1 *** Caught signal
 (Segmentation fault) **
  in thread 7fffe6ec6700

  ceph version 0.48argonaut-419-g4e1d973
 (commit:4e1d973e466cd45138f004e84ab8631d9b2a60fa)
  1: /usr/bin/ceph-osd() [0x723c39]
  2: (()+0xf4a0) [0x776584a0]
  3: (OSD::handle_osd_ping(MOSDPing*)+0x7d4) [0x5d7894]
  4: (OSD::heartbeat_dispatch(Message*)+0x71) [0x5d8111]
  5: (SimpleMessenger::DispatchQueue::entry()+0x583) [0x7d5103]
  6: (SimpleMessenger::dispatch_entry()+0x15) [0x7d6485]
  7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x79523d]
  8: (()+0x77f1) [0x776507f1]
  9: (clone()+0x6d) [0x76aa1ccd]

 gdb has this to say:

 (gdb) bt
 #0  0x7765836b in raise (sig=11) at
 ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
 #1  0x00724067 in reraise_fatal (signum=11) at
 global/signal_handler.cc:58
 #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
 #3  signal handler called
 #4  get_epoch (this=0x15d, m=0x1587000) at ./osd/OSDMap.h:210
 #5  OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711
 #6  0x005d8111 in OSD::heartbeat_dispatch (this=0x15d,
 m=0x1587000) at osd/OSD.cc:2769
 #7  0x007d5103 in ms_deliver_dispatch (this=0x1472960) at
 msg/Messenger.h:504
 #8  SimpleMessenger::DispatchQueue::entry (this=0x1472960) at
 msg/SimpleMessenger.cc:367
 #9  0x007d6485 in SimpleMessenger::dispatch_entry (this=0x1472880)
 at msg/SimpleMessenger.cc:384
 #10 0x0079523d in SimpleMessenger::DispatchThread::entry
 (this=value optimized out) at ./msg/SimpleMessenger.h:807
 #11 0x776507f1 in start_thread (arg=0x7fffe6ec6700) at
 pthread_create.c:301
 #12 0x76aa1ccd in clone () at
 ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
 (gdb) f 5
 #5  OSD::handle_osd_ping (this=0x15d, m=0x1587000) at osd/OSD.cc:1711
 1711m-stamp);
 (gdb) l
 1706}
 1707  }
 1708  Message *r = new MOSDPing(monc-get_fsid(),
 1709curmap-get_epoch(),
 1710MOSDPing::PING_REPLY,
 1711m-stamp);

 1712  hbserver_messenger-send_message(r, m-get_connection());
 1713
 1714  if (curmap-is_up(from)) {
 1715note_peer_epoch(from, m-map_epoch);
 (gdb) p curmap
 $1 = std::tr1::shared_ptr (empty) 0x0

 -- Jim


 Thanks for the info!
 -Sam

 On Tue, Jul 17, 2012 at 2:54 PM, Jim Schuttjasc...@sandia.gov  wrote:

 On 07/17/2012 03:44 PM, Samuel Just wrote:


 Not quite.  OSDService::get_osdmap() returns the most recently
 published osdmap.  Generally, OSD::osdmap is safe to use when you are
 holding the osd lock.  Otherwise, OSDService::get_osdmap() should be
 used.  There are a few other things that should be fixed surrounding
 this issue as well, I'll put some time into it today.  The map_lock
 should probably be removed all together.



 Thanks for taking a look.  Let me know when
 you get something, and I'll take it for a spin.

 Thanks -- Jim

 -Sam










Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state 
  (and turned it off quite quickly) without actually taking it out of the 
  cluster?
  -Greg



 As I have did removal, it was definitely not like that - at first
 place, I have marked osds(4 and 5 on same host) out, then rebuilt
 crushmap and then kill osd processes. As I mentioned before, osd.4
 doest not exist in crushmap and therefore it shouldn`t be reported at
 all(theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and all 
 will be well.
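
[Note: putting the advice above together, the usual removal sequence looks
roughly like the following. The crush/auth cleanup commands are standard
ones rather than quotes from this thread, so the exact syntax on 0.48 may
differ slightly:

  $ ceph osd out 4                # migrate data off first (already done here)
    ... stop the ceph-osd process on its host ...
  $ ceph osd crush remove osd.4   # drop it from the CRUSH map
  $ ceph auth del osd.4           # remove its authentication key
  $ ceph osd rm 4                 # remove it from the osd map
]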


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

Hrm. How did you create your new crush map? All the normal avenues of
removing an OSD from the map set a flag which the PGMap uses to delete
its records (which would prevent it reappearing in the full list), and
I can't see how setcrushmap would remove an OSD from the map (although
there might be a code path I haven't found).


Re: osd/OSDMap.h: 330: FAILED assert(is_up(osd))

2012-07-18 Thread Jim Schutt

On 07/18/2012 12:03 PM, Samuel Just wrote:

Sorry, master has a fix now for that also.
76efd9772c60b93bbf632e3ecc3b9117dc081427
-Sam


That got things running for me.

Thanks for the quick reply.

-- Jim



Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state 
  (and turned it off quite quickly) without actually taking it out of the 
  cluster?
  -Greg



 As I have did removal, it was definitely not like that - at first
 place, I have marked osds(4 and 5 on same host) out, then rebuilt
 crushmap and then kill osd processes. As I mentioned before, osd.4
 doest not exist in crushmap and therefore it shouldn`t be reported at
 all(theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and 
 all will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

 Hrm. How did you create your new crush map? All the normal avenues of
 removing an OSD from the map set a flag which the PGMap uses to delete
 its records (which would prevent it reappearing in the full list), and
 I can't see how setcrushmap would remove an OSD from the map (although
 there might be a code path I haven't found).

Manually - by deleting the osd.4 and osd.5 entries and reweighting the
remaining nodes.


Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did 
  you take it out? It sounds like maybe you just marked it in the OUT 
  state (and turned it off quite quickly) without actually taking it out 
  of the cluster?
  -Greg



 As I have did removal, it was definitely not like that - at first
 place, I have marked osds(4 and 5 on same host) out, then rebuilt
 crushmap and then kill osd processes. As I mentioned before, osd.4
 doest not exist in crushmap and therefore it shouldn`t be reported at
 all(theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and 
 all will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

 Hrm. How did you create your new crush map? All the normal avenues of
 removing an OSD from the map set a flag which the PGMap uses to delete
 its records (which would prevent it reappearing in the full list), and
 I can't see how setcrushmap would remove an OSD from the map (although
 there might be a code path I haven't found).

 Manually, by deleting osd4|5 entries and reweighing remaining nodes.

So you extracted the CRUSH map, edited it, and injected it using ceph
osd setcrushmap?
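
[Note: for reference, that extract/edit/inject cycle is typically something
like the following; the file names are arbitrary:

  $ ceph osd getcrushmap -o crush.bin
  $ crushtool -d crush.bin -o crush.txt
    ... edit crush.txt: delete the osd.4/osd.5 devices, adjust weights ...
  $ crushtool -c crush.txt -o crush.new
  $ ceph osd setcrushmap -i crush.new
]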


Re: Puppet modules for Ceph

2012-07-18 Thread Tommi Virtanen
On Wed, Jul 18, 2012 at 2:59 PM, Teyo Tyree t...@puppetlabs.com wrote:
 As you probably know, Puppet Labs is based in Portland. Are you attending
 OScon? It might be a good opportunity for us to have some face to face
 hacking time on a Puppet module. Let me know if you would like for us to
 arrange sometime to get together if you are in town.

Sorry, I'm not at OScon, I'm intentionally limiting my travel right
now. A large chunk of Inktank is based in Los Angeles, so we're not
far away even outside of conferences.

Frankly, we still have a bit of cleanup work to do on the chef
cookbook side, and you'd probably be most productive writing puppet
modules once that stuff is all flushed out. Soon, we'll start to
de-emphasize mkcephfs in favor of other, more flexible, deployment
mechanisms; I think bringing together some Puppet, Juju and Chef
experts at that point would be most beneficial.


Re: Puppet modules for Ceph

2012-07-18 Thread Tommi Virtanen
On Wed, Jul 18, 2012 at 3:26 PM, Teyo Tyree t...@puppetlabs.com wrote:
 Ha, that would be an interesting experiment indeed. I think Francois would
 like to have the Puppet module done sooner rather than later. Are the
 current Chef cookbooks functional enough for us to get started with them as
 a reference?

I think so. They Work For Me(tm). The ugly stuff is mostly things like
needing to wait a few rounds due to Chef's asynchronous data store,
and it's missing just about all internal documentation; the
user-visible aspects have a decent write-up, but nothing explains e.g.
what exactly /var/lib/ceph/bootstrap-osd/ is about.

I'd love to help you work through that though, so please keep talking
to me and make me explain everything in enough detail. It's just that
I don't have anything except source to give you right now.

The end user Chef deployment docs are at
http://ceph.com/docs/master/install/chef/
http://ceph.com/docs/master/config-cluster/chef/

The cookbook is at https://github.com/ceph/ceph-cookbooks and
currently it assumes Ubuntu 12.04.


Re: How to compile Java-Rados.

2012-07-18 Thread ramu
Hi Noah,

Thank you for your reply. It is working fine now, but I am getting one more
error in the terminal:

BUILD FAILED
/home/vu/java-rados/build.xml:134: Test IOContextTest failed

Total time: 43 seconds

and in TEST-IOContextTest.txt file the error is,

Testsuite: IOContextTest
Tests run: 11, Failures: 1, Errors: 0, Time elapsed: 32.302 sec

Testcase: test_toString took 1.791 sec
Testcase: test_getCluster took 2.116 sec
Testcase: test_getPoolStats took 2.364 sec
Testcase: test_setLocatorKey took 2.046 sec
Testcase: test_write took 3.109 sec
Testcase: test_writeFull took 3.188 sec
Testcase: test_getLastVersion took 2.46 sec
Testcase: test_append took 3.505 sec
Testcase: test_truncate took 3.423 sec
Testcase: test_getsetAttribute took 3.161 sec
Testcase: test_getObjects took 5.101 sec
FAILED

junit.framework.AssertionFailedError:
at IOContextTest.test_getObjects(Unknown Source)

Thanks,
Ramu.
