Re: Ceph version 0.56.1, data loss on power failure
Hi Marcin, Not sure if anyone asked, but are your OSD journals on actual disk or are you using tmpfs? Dino

On Wed, Jan 16, 2013 at 4:53 AM, Wido den Hollander w...@widodh.nl wrote:
On 01/16/2013 11:50 AM, Marcin Szukala wrote:
Hi all, any ideas how I can resolve my issue, or where the problem is? Let me describe the issue:
- Host boots up and maps an RBD image with XFS filesystems
- Host mounts the filesystems from the RBD image
- Host starts to write data to the mounted filesystems
- Host experiences power failure
- Host comes up and maps the RBD image
- Host mounts the filesystems from the RBD image
- All data from all filesystems is lost
- Host is able to use the filesystems with no problems
Filesystem is XFS, no errors on the filesystem.

That simply does not make sense to me. How can all the data be gone and the FS still mount cleanly? Can you try to format the RBD with EXT4 and see if that makes any difference. Could you also try to run a sync prior to pulling the power from the host to see if that makes any difference. Wido

Kernel 3.5.0-19-generic
root@openstack-1:/etc/init# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=10.3.82.102:6789/0,b=10.3.82.103:6789/0,d=10.3.82.105:6789/0}, election epoch 10, quorum 0,1,2 a,b,d
   osdmap e132: 56 osds: 56 up, 56 in
   pgmap v87165: 13744 pgs: 13744 active+clean; 52727 MB data, 102 GB used, 52028 GB / 52131 GB avail
   mdsmap e1: 0/0/1 up

Regards, Marcin

--
__
Dino Yancey
2GNT.com Admin
Re: Ceph version 0.56.1, data loss on power failure
On 16/01/2013 11:53, Wido den Hollander wrote:
On 01/16/2013 11:50 AM, Marcin Szukala wrote:
[...] Host starts to write data to the mounted filesystems. Host experiences power failure. [...]

You are not doing a sync there, right?

[...] All data from all filesystems is lost. Host is able to use the filesystems with no problems. Filesystem is XFS, no errors on filesystem.

You MAY have hit an XFS issue. Please follow the XFS list, in particular this thread: http://oss.sgi.com/pipermail/xfs/2012-December/023021.html If I remember correctly, this one appeared after the 3.4 kernel, and I think the fix isn't in the current Ubuntu kernel.

Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr
Re: Ceph version 0.56.1, data loss on power failure
Hi Dino, the journals are on a dedicated SSD. Regards, Marcin

2013/1/16 Dino Yancey dino2...@gmail.com:
Hi Marcin, Not sure if anyone asked, but are your OSD journals on actual disk or are you using tmpfs? Dino
[remainder of the quoted thread snipped; see the messages above]
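For anyone reproducing this setup: a dedicated journal device is configured per OSD in ceph.conf. A minimal sketch follows; the device path and size are illustrative assumptions, not Marcin's actual values:

    [osd.2]
        ; journal on a partition of the dedicated SSD (path is an example)
        osd journal = /dev/ssd0p2
        ; journal size in MB
        osd journal size = 1000

With journals on a real SSD rather than tmpfs, journaled writes should survive a power cut, which is why Dino asked.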
Re: Ceph version 0.56.1, data loss on power failure
2013/1/16 Yann Dupont yann.dup...@univ-nantes.fr:
On 16/01/2013 11:53, Wido den Hollander wrote: [...]
You are not doing a sync there, right?

Nope, no sync.

You MAY have hit an XFS issue. Please follow the XFS list, in particular this thread: http://oss.sgi.com/pipermail/xfs/2012-December/023021.html [...]

It looks like it; with ext4 I have no issue. Also, if I do a sync the data is not lost. Thank you all for the help. Regards, Marcin
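For anyone who wants to reproduce the comparison, the test cycle amounts to writing data and optionally flushing it before cutting power. A rough sketch, with the pool, image and mountpoint names made up for illustration:

    # map the image and mount the XFS filesystem
    rbd map rbd/testimg
    mount /dev/rbd0 /mnt/rbdtest

    # write some data
    dd if=/dev/zero of=/mnt/rbdtest/file bs=1M count=100

    # flush dirty pages and the fs log; skipping this step is what
    # loses the data on XFS in the affected kernels
    sync

    # now cut power to the host and check the file after reboot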
Re: REMINDER: all argonaut users should upgrade to v0.48.3argonaut
Can we use this doc as a reference for the upgrade? https://github.com/ceph/ceph/blob/eb02eaede53c03579d015ca00a888a48dbab739a/doc/install/upgrading-ceph.rst Thanks. -- Regards, Sébastien Han.

On Tue, Jan 15, 2013 at 10:49 PM, Sage Weil s...@inktank.com wrote: That there are some critical bugs that are fixed in v0.48.3, including one that can lead to data loss in power loss or kernel panic situations. Please upgrade if you have not already done so! sage
Re: [PATCH] libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed
Hi Sage, On 01/15/2013 07:55 PM, Sage Weil wrote: Hi Jim- I just realized this didn't make it into our tree. It's now in testing, and will get merged in the next window. D'oh!

That's great news - thanks for the update. -- Jim

sage
OSDs don't start after upgrade from 0.47.2 to 0.56.1
Hi list, I tried to upgrade my Ceph cluster from 0.47.2 (openSUSE build service for SLES 11 SP2) to 0.56.1 (ceph.com/rpm/sles11/). At first I updated only one server (mon.b / osd.2) and restarted ceph on this server. After a short time, /etc/init.d/ceph -a status showed "not running" for most OSDs. At that point I tried stopping ceph on all hosts, but some OSD processes were hanging in diskwait. I updated the others, and after the processes were still not responsive I rebooted the systems. After restarting ceph, the OSDs updated the filesystem but stopped shortly afterwards. hpb020102 had the following log entries:

--
2013-01-16 15:40:25.297036 7fd348387760 0 filestore(/srv/osd.2) mount FIEMAP ioctl is supported and appears to work
2013-01-16 15:40:25.297049 7fd348387760 0 filestore(/srv/osd.2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-16 15:40:25.297392 7fd348387760 0 filestore(/srv/osd.2) mount did NOT detect btrfs
2013-01-16 15:40:25.297402 7fd348387760 0 filestore(/srv/osd.2) mount syncfs(2) syscall not supported
2013-01-16 15:40:25.297405 7fd348387760 0 filestore(/srv/osd.2) mount no syncfs(2), must use sync(2).
2013-01-16 15:40:25.297407 7fd348387760 0 filestore(/srv/osd.2) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-01-16 15:40:25.297480 7fd348387760 0 filestore(/srv/osd.2) mount found snaps
2013-01-16 15:40:25.364304 7fd348387760 0 filestore(/srv/osd.2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-16 15:40:25.373353 7fd348387760 1 journal _open /srv/osd.2.journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:40:25.373431 7fd348387760 1 journal _open /srv/osd.2.journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:40:25.374388 7fd348387760 1 journal close /srv/osd.2.journal
2013-01-16 15:40:25.430719 7fd348387760 0 filestore(/srv/osd.2) mount FIEMAP ioctl is supported and appears to work
2013-01-16 15:40:25.430731 7fd348387760 0 filestore(/srv/osd.2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-16 15:40:25.431011 7fd348387760 0 filestore(/srv/osd.2) mount did NOT detect btrfs
2013-01-16 15:40:25.431017 7fd348387760 0 filestore(/srv/osd.2) mount syncfs(2) syscall not supported
2013-01-16 15:40:25.431018 7fd348387760 0 filestore(/srv/osd.2) mount no syncfs(2), must use sync(2).
2013-01-16 15:40:25.431019 7fd348387760 0 filestore(/srv/osd.2) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-01-16 15:40:25.431041 7fd348387760 0 filestore(/srv/osd.2) mount found snaps
2013-01-16 15:40:25.489620 7fd348387760 0 filestore(/srv/osd.2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-16 15:40:25.494361 7fd348387760 1 journal _open /srv/osd.2.journal fd 29: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:40:25.494417 7fd348387760 1 journal _open /srv/osd.2.journal fd 29: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:40:25.494679 7fd348387760 -1 filestore(/srv/osd.2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-01-16 15:40:25.494694 7fd348387760 -1 osd.2 0 OSD::init() : unable to read osd superblock
2013-01-16 15:40:25.495001 7fd348387760 1 journal close /srv/osd.2.journal
2013-01-16 15:40:25.495665 7fd348387760 -1 ** ERROR: osd init failed: (22) Invalid argument
---

hpb020103-hpb020106 showed:

2013-01-16 15:47:56.886005 7f504e1e9760 0 filestore(/srv/osd.5) mount FIEMAP ioctl is supported and appears to work
2013-01-16 15:47:56.886017 7f504e1e9760 0 filestore(/srv/osd.5) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-16 15:47:56.886291 7f504e1e9760 0 filestore(/srv/osd.5) mount did NOT detect btrfs
2013-01-16 15:47:56.886298 7f504e1e9760 0 filestore(/srv/osd.5) mount syncfs(2) syscall not supported
2013-01-16 15:47:56.886300 7f504e1e9760 0 filestore(/srv/osd.5) mount no syncfs(2), must use sync(2).
2013-01-16 15:47:56.886301 7f504e1e9760 0 filestore(/srv/osd.5) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-01-16 15:47:56.886351 7f504e1e9760 0 filestore(/srv/osd.5) mount found snaps
2013-01-16 15:47:56.945149 7f504e1e9760 0 filestore(/srv/osd.5) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-16 15:47:56.953456 7f504e1e9760 1 journal _open /srv/osd.5.journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:47:56.953545 7f504e1e9760 1 journal _open /srv/osd.5.journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-16 15:47:56.955011 7f504e1e9760 1 journal close /srv/osd.5.journal
2013-01-16
8 out of 12 OSDs died after expansion on 0.56.1 (void OSD::do_waiters())
Hi, I'm testing a small Ceph cluster with Asus C60M1-1 mainboards. The setup is:
- AMD Fusion C60 CPU
- 8GB DDR3
- 1x Intel 520 120GB SSD (OS + journaling)
- 4x 1TB disk

I had two of these systems running, but yesterday I wanted to add a third one. So I had 8 OSDs (one per disk) running on 0.56.1 and I added one host, bringing the total to 12. The cluster came into a degraded state (about 50%) and it started to recover until it reached somewhere around 48%. In a matter of about 5 minutes all of the original 8 OSDs had crashed with the same backtrace:

-1 2013-01-15 17:20:29.058426 7f95a0fd8700 10 -- [2a00:f10:113:0:6051:e06c:df3:f374]:6803/4913 reaper done
0 2013-01-15 17:20:29.061054 7f959cfd0700 -1 osd/OSD.cc: In function 'void OSD::do_waiters()' thread 7f959cfd0700 time 2013-01-15 17:20:29.057714
osd/OSD.cc: 3318: FAILED assert(osd_lock.is_locked())
ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
1: (OSD::do_waiters()+0x2c3) [0x6251f3]
2: (OSD::ms_dispatch(Message*)+0x1c4) [0x62d714]
3: (DispatchQueue::entry()+0x349) [0x8ba289]
4: (DispatchQueue::DispatchThread::entry()+0xd) [0x8137cd]
5: (()+0x7e9a) [0x7f95a95dae9a]
6: (clone()+0x6d) [0x7f95a805ecbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

So osd.0 - osd.7 were down and osd.8 - osd.11 (the new ones) were still running happily. I have to note that during this recovery the load on the first two machines spiked to 10 and the CPUs were 0% idle. This morning I started all the OSDs again with the default log level, since I don't want to stress the CPUs even more. I know the C60 CPU is kind of limited, but it's a test case! The recovery started again and it showed about 90MB/sec (Gbit network) coming into the new node. After about 4 hours the recovery completed successfully:

1736 pgs: 1736 active+clean; 837 GB data, 1671 GB used, 9501 GB / 11172 GB avail

Now, there was no high logging level on the OSDs prior to their crash; I only have the default logs. And nothing happened after I started them again, all 12 are up now. Is this a known one? If not, I'll file a bug in the tracker. Wido
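If it happens again, higher OSD debug levels captured before the crash would make the tracker report much more useful. A sketch of the usual knobs; 20/1 are common debugging values, not production settings:

    [osd]
        debug osd = 20
        debug ms = 1
        debug filestore = 20

The same can usually be injected into a running daemon without a restart, e.g. ceph osd tell 0 injectargs '--debug-osd 20' (exact syntax assumed for this release).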
Re: REMINDER: all argonaut users should upgrade to v0.48.3argonaut
On Wed, 16 Jan 2013, Sébastien Han wrote:
Can we use this doc as a reference for the upgrade? https://github.com/ceph/ceph/blob/eb02eaede53c03579d015ca00a888a48dbab739a/doc/install/upgrading-ceph.rst

Yeah. It's pretty simple in this case (since it's a point release upgrade):
- install the new packages everywhere
- restart daemons (at any rate, in any order)

The most important ones to upgrade in this case are the ceph-osds. sage
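In shell terms, Sage's procedure is roughly the following per node. The package manager and init commands are assumptions (Debian/Ubuntu shown); adapt to your distro:

    # 1) upgrade the packages on every node
    apt-get update && apt-get install --only-upgrade ceph

    # 2) restart the daemons, in any order; the OSDs matter most here
    /etc/init.d/ceph restart osd
    /etc/init.d/ceph restart mon
    /etc/init.d/ceph restart mds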
Re: Ceph version 0.56.1, data loss on power failure
FWIW, my ceph data dirs (for e.g. mons) are all on XFS. I've experienced a lot of corruption on these on power loss to the node -- and in some cases even when power wasn't lost and the box was simply rebooted. This is on Ubuntu 12.04 with the ceph-provided 3.6.3 kernel (as I'm using RBD on these). It's pretty much to the point where I'm thinking of changing them all over to ext4 for these data dirs, as the hassle of rebuilding mons constantly is just not worth the trouble. --Jeff

On Wed, Jan 16, 2013 at 9:32 AM, Marcin Szukala szukala.mar...@gmail.com wrote:
[XFS-vs-ext4 exchange quoted in full; snipped, see the messages above]
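Moving a mon's data dir onto ext4 is mostly a copy job while the daemon is stopped. A rough sketch; the device, mon id and the default data path are assumptions:

    # stop the mon, copy its data dir onto a freshly formatted ext4 partition
    service ceph stop mon.a
    mkfs.ext4 /dev/sdb3
    mount /dev/sdb3 /mnt/newmon
    cp -a /var/lib/ceph/mon/ceph-a/. /mnt/newmon/
    # remount the new partition at the original path (e.g. via fstab),
    # then start the mon again
    service ceph start mon.a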
Re: Ceph slow request unstable issue
Hi, On Wed, 16 Jan 2013, Andrey Korolyov wrote: On Wed, Jan 16, 2013 at 4:58 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:

Hi list, we are suffering from OSD or OS down when there is continuing high pressure on the Ceph rack. Basically we are on Ubuntu 12.04 + Ceph 0.56.1, 6 nodes, each node with 20 spindles + 4 SSDs as journals (120 spindles in total). We create a lot of RBD volumes (say 240), map them to 16 different client machines (15 RBD volumes/client) and run dd concurrently on top of each RBD. The issues are:
1. Slow requests. From the list archive it seems solved in 0.56.1, but we still notice such warnings.
2. OSD down or even host down, like the message below. Some OSDs seem to have been blocking there for quite a long time. Suggestions are highly appreciated. Thanks, Xiaoxi

Bad news: I have moved all my Ceph machines' OS back to kernel 3.2.0-23, which Ubuntu 12.04 uses. I ran the dd command (dd if=/dev/zero bs=1M count=6 of=/dev/rbd${i}) on the Ceph clients as a data-preparation test last night. Now I have one machine down (can't be reached by ping), another two machines have all OSD daemons down, while the three left have some daemons down. I have many warnings in the OSD log like this:

no flag points reached
2013-01-15 19:14:22.769898 7f20a2d57700 0 log [WRN] : slow request 52.218106 seconds old, received at 2013-01-15 19:13:30.551718: osd_op(client.10674.1:1002417 rb.0.27a8.6b8b4567.0eba [write 3145728~524288] 2.c61810ee RETRY) currently waiting for sub ops
2013-01-15 19:14:23.770077 7f20a2d57700 0 log [WRN] : 21 slow requests, 6 included below; oldest blocked for 1132.138983 secs
2013-01-15 19:14:23.770086 7f20a2d57700 0 log [WRN] : slow request 53.216404 seconds old, received at 2013-01-15 19:13:30.553616: osd_op(client.10671.1:1066860 rb.0.282c.6b8b4567.1057 [write 2621440~524288] 2.ea7acebc) currently waiting for sub ops
2013-01-15 19:14:23.770096 7f20a2d57700 0 log [WRN] : slow request 51.442032 seconds old, received at 2013-01-15 19:13:32.327988: osd_op(client.10674.1:1002418

Similar info in dmesg we have seen previously:

[21199.036476] INFO: task ceph-osd:7788 blocked for more than 120 seconds.
[21199.037493] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[21199.038841] ceph-osd D 0006 0 7788 1 0x
[21199.038844] 880fefdafcc8 0086 ffe0
[21199.038848] 880fefdaffd8 880fefdaffd8 880fefdaffd8 00013780
[21199.038852] 88081aa58000 880f68f52de0 880f68f52de0 882017556200
[21199.038856] Call Trace:
[21199.038858] [8165a55f] schedule+0x3f/0x60
[21199.038861] [8106b7e5] exit_mm+0x85/0x130
[21199.038864] [8106b9fe] do_exit+0x16e/0x420
[21199.038866] [8109d88f] ? __unqueue_futex+0x3f/0x80
[21199.038869] [8107a19a] ? __dequeue_signal+0x6a/0xb0
[21199.038872] [8106be54] do_group_exit+0x44/0xa0
[21199.038874] [8107ccdc] get_signal_to_deliver+0x21c/0x420
[21199.038877] [81013865] do_signal+0x45/0x130
[21199.038880] [810a091c] ? do_futex+0x7c/0x1b0
[21199.038882] [810a0b5a] ? sys_futex+0x10a/0x1a0
[21199.038885] [81013b15] do_notify_resume+0x65/0x80
[21199.038887] [81664d50] int_signal+0x12/0x17

We have seen this stack trace several times over the past 6 months, but are not sure what the trigger is. In principle, the ceph server-side daemons shouldn't be capable of locking up like this, but clearly something is amiss between what they are doing in userland and how the kernel is tolerating that. Low memory, perhaps? In each case where we tried to track it down, the problem seemed to go away on its own. Is this easily reproducible in your case?

my 0.02$: http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg11531.html and a kernel panic on two different hosts yesterday during ceph startup (on 3.8-rc3; images from the console are available at http://imgur.com/wIRVn,k0QCS#0) lead to the suggestion that Ceph may have introduced lockup-like behavior not long ago, causing, in my case, an excessive number of context switches on the host, leading to OSD flaps and a panic in the IP-over-IB stack due to the same issue.

For the stack trace my first guess would be a problem with the IB driver that is triggered by memory pressure. Can you characterize what the system utilization (CPU, memory) looks like leading up to the lockup? sage
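To answer the utilization question reproducibly, something like this captures CPU, memory and context-switch rates leading up to a lockup. vmstat is in procps; pidstat comes from the sysstat package (an assumption that it is installed):

    # system-wide: runnable tasks, free memory, context switches, iowait
    vmstat 1 | tee vmstat.log &

    # per-process CPU for the ceph-osd daemons
    pidstat -u -C ceph-osd 1 | tee osd-cpu.log &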
Re: Ceph version 0.56.1, data loss on power failure
On Wed, 16 Jan 2013, Wido den Hollander wrote: On 01/16/2013 11:50 AM, Marcin Szukala wrote:
[failure scenario quoted in full; snipped, see the start of the thread]
That simply does not make sense to me. How can all the data be gone and the FS just mount cleanly. Can you try to format the RBD with EXT4 and see if that makes any difference. Could you also try to run a sync prior to pulling the power from the host to see if that makes any difference.

A few other quick questions:
- What version of qemu and librbd are you using?
- What is the command line that is used to start the VM?

This could be a problem with the qemu and librbd caching configuration. Thanks! sage
Re: REMINDER: all argonaut users should upgrade to v0.48.3argonaut
Thanks Sage! -- Regards, Sébastien Han.

On Wed, Jan 16, 2013 at 5:39 PM, Sage Weil s...@inktank.com wrote: On Wed, 16 Jan 2013, Sébastien Han wrote:
[upgrade question snipped]
Yeah. It's pretty simple in this case (since it's a point release upgrade): install the new packages everywhere, restart daemons (at any rate, in any order). The most important ones to upgrade in this case are the ceph-osds. sage
Re: Ceph version 0.56.1, data loss on power failure
On 16 Jan 2013, at 18:00, Sage Weil s...@inktank.com wrote: On Wed, 16 Jan 2013, Wido den Hollander wrote:
[failure scenario and earlier replies quoted in full; snipped]
A few other quick questions: What version of qemu and librbd are you using? What is the command line that is used to start the VM? This could be a problem with the qemu and librbd caching configuration.

I don't think he uses Qemu. From what I understand he uses kernel RBD, since he uses the words 'map' and 'unmap'.

Wido
Re: flashcache
On 01/16/2013 03:46 PM, Sage Weil wrote: On Wed, 16 Jan 2013, Gandalf Corvotempesta wrote: 2013/1/16 Sage Weil s...@inktank.com:
This sort of configuration effectively bundles the disk and SSD into a single unit, where the failure of either results in the loss of both. From Ceph's perspective, it doesn't matter if the thing it is sitting on is a single disk, an SSD+disk flashcache thing, or a big RAID array. All that changes is the probability of failure.

Ok, it will fail, but this should not be an issue in a cluster like Ceph, right? With or without flashcache or SSD, Ceph should be able to handle disk/node/OSD failures on its own by replicating in real time to multiple servers.

Exactly.

Should I worry about losing data in case of failure? It should rebalance automatically in case of failure with no data loss.

You should not worry, except to the extent that 2 might fail simultaneously, and failures in general are not good things. I would worry that there is a lot of stuff piling onto the SSD and it may become your bottleneck. My guess is that another 1-2 SSDs will be a better 'balance', but only experimentation will really tell us that. Otherwise, those seem to all be good things to put on the SSD!

I can't add more than 2 SSDs, I don't have enough space. I can move the OS to the first 2 spinning disks in software RAID1, if this will improve performance of the SSDs. What about swap? I'm thinking of using no swap at all and starting with 16/32GB RAM.

You could use the first (single) disk for OS and logs. You might not even bother with RAID1, since you will presumably be replicating across hosts. When the OSD disk dies, you can re-run your chef/juju/puppet rule or whatever provisioning tool is at work to reinstall/configure the OS disk. The data on the SSDs and data disks will all be intact. Other options might be network boot or even USB stick boot. sage
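One empirical check on the bottleneck question: watch the SSD's utilization against the spindles while the cluster is under load. iostat is from the sysstat package, and the device names are assumptions:

    # %util pinned near 100 on the SSD while the spindles idle means the
    # shared SSD (journal + flashcache + OS/logs) is the choke point
    iostat -x 1 /dev/sda /dev/sdb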
Re: Ceph slow request unstable issue
On Wed, Jan 16, 2013 at 10:35 PM, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jan 16, 2013 at 8:58 PM, Sage Weil s...@inktank.com wrote:
[the preceding messages in this thread, quoted verbatim; snipped]
Re: [PATCH REPOST 0/2] libceph: embed r_trail struct in ceph_osd_request()
On 01/03/2013 03:34 PM, Alex Elder wrote: This series simplifies some of the osd client message handling by using an initialized ceph_pagelist structure to refer to the trail portion of a ceph_osd_request, rather than using a null pointer to represent "not there". -Alex

[PATCH REPOST 1/2] libceph: always allow trail in osd request
[PATCH REPOST 2/2] libceph: kill op_needs_trail()

These look good. Reviewed-by: Josh Durgin josh.dur...@inktank.com
Re: [PATCH REPOST] rbd: separate layout init
On 01/03/2013 02:55 PM, Alex Elder wrote: Pull a block of code that initializes the layout structure in an osd request into its own function so it can be reused. Signed-off-by: Alex Elder el...@inktank.com ---

Reviewed-by: Josh Durgin josh.dur...@inktank.com

 drivers/block/rbd.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fd6a708..8e030d1 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -54,6 +54,7 @@

 /* It might be useful to have this defined elsewhere too */

+#define	U32_MAX	((u32) (~0U))
 #define	U64_MAX	((u64) (~0ULL))

 #define RBD_DRV_NAME "rbd"
@@ -1096,6 +1097,16 @@ static void rbd_coll_end_req(struct rbd_request *rbd_req,
 			ret, len);
 }

+static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id)
+{
+	memset(layout, 0, sizeof (*layout));
+	layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+	layout->fl_stripe_count = cpu_to_le32(1);
+	layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+	rbd_assert(pool_id <= (u64) U32_MAX);
+	layout->fl_pg_pool = cpu_to_le32((u32) pool_id);
+}
+
 /*
  * Send ceph osd request
  */
@@ -1117,7 +1128,6 @@ static int rbd_do_request(struct request *rq,
 			  u64 *ver)
 {
 	struct ceph_osd_request *osd_req;
-	struct ceph_file_layout *layout;
 	int ret;
 	u64 bno;
 	struct timespec mtime = CURRENT_TIME;
@@ -1161,14 +1171,9 @@ static int rbd_do_request(struct request *rq,
 	strncpy(osd_req->r_oid, object_name, sizeof(osd_req->r_oid));
 	osd_req->r_oid_len = strlen(osd_req->r_oid);

-	layout = &osd_req->r_file_layout;
-	memset(layout, 0, sizeof(*layout));
-	layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-	layout->fl_stripe_count = cpu_to_le32(1);
-	layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-	layout->fl_pg_pool = cpu_to_le32((int) rbd_dev->spec->pool_id);
-	ret = ceph_calc_raw_layout(osdc, layout, snapid, ofs, len, &bno,
-				   osd_req, ops);
+	rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
+	ret = ceph_calc_raw_layout(osdc, &osd_req->r_file_layout,
+				   snapid, ofs, len, &bno, osd_req, ops);

 	rbd_assert(ret == 0);

 	ceph_osdc_build_request(osd_req, ofs, len,
Re: [PATCH REPOST 0/6] libceph: parameter cleanup
On 01/04/2013 06:31 AM, Alex Elder wrote: This series mostly cleans up parameters used by functions in libceph, in the osd client code. -Alex

[PATCH REPOST 1/6] libceph: pass length to ceph_osdc_build_request()
[PATCH REPOST 2/6] libceph: pass length to ceph_calc_file_object_mapping()
[PATCH REPOST 3/6] libceph: drop snapid in ceph_calc_raw_layout()
[PATCH REPOST 4/6] libceph: drop osdc from ceph_calc_raw_layout()
[PATCH REPOST 5/6] libceph: don't set flags in ceph_osdc_alloc_request()
[PATCH REPOST 6/6] libceph: don't set pages or bio in ceph_osdc_alloc_request()

These all look good. Reviewed-by: Josh Durgin josh.dur...@inktank.com
Re: mds: first stab at lookup-by-ino problem/soln description
On Thu, Jan 17, 2013 at 5:52 AM, Gregory Farnum g...@inktank.com wrote:
My biggest concern with this was how it worked on clusters with multiple data pools, and Sage's initial response was to either 1) create an object for each inode that lives in the metadata pool, and holds the backtraces (rather than putting them as attributes on the first object in the file), or 2) use a more sophisticated data structure, perhaps built on Eleanor's b-tree project from last summer (http://ceph.com/community/summer-adventures-with-ceph-building-a-b-tree/)

I had thought that we could just query each data pool for the object, but Sage points out that 100-pool clusters aren't exactly unreasonable and that would take quite a lot of query time. And having the backtraces in the data pools significantly complicates things with our rules about setting layouts on new files. So this is going to need some kind of revision; please suggest alternatives! -Greg

How about using a DHT to map regular files to their parent directories, then use backtraces to find the parent directory's path?

Regards,
Yan, Zheng

On Tue, Jan 15, 2013 at 3:35 PM, Sage Weil s...@inktank.com wrote:
One of the first things we need to fix in the MDS is how we support lookup-by-ino. It's important for fsck, NFS reexport, and (insofar as there are limitations to the current anchor table design) hard links and snapshots. Below is a description of the problem and a rough sketch of my proposed solution. This is the first time I thought about the lookup algorithm in any detail, so I've probably missed something, and the 'ghost entries' bit is what came to mind on the plane. Hopefully we can think of something a bit lighter weight. Anyway, poke holes if anything isn't clear, if you have any better ideas, or if it's time to refine further. This is just a starting point for the conversation.

The problem
-----------

The MDS stores all fs metadata (files, inodes) in a hierarchy, allowing it to distribute responsibility among ceph-mds daemons by partitioning the namespace hierarchically. This is also a huge win for inode prefetching: loading the directory gets you both the names and the inodes in a single IO.

One consequence of this is that we do not have a flat inode table that lets us look up files by inode number. We *can* find directories by ino simply because they are stored in an object named after the ino. However, we can't populate the cache this way because the metadata in cache must be fully attached to the root to avoid various forms of MDS anarchy.

Lookup-by-ino is currently needed for hard links. The first link to a file is deemed the primary link, and that is where the inode is stored. Any additional links are internally remote links, and reference the inode by ino. However, there are other uses for lookup-by-ino, including NFS reexport and fsck.

Anchor table
------------

The anchor table is currently used to locate inodes that have hard links. Inodes in the anchor table are said to be anchored, and can be found by ino alone with no knowledge of their path. Normally, only inodes that have hard links need to be anchored. There are a few other cases, but they are not relevant here.

The anchor table is a flat table of records like:

  ino -> (parent ino, hash(name), refcount)

All parent inos referenced in the table also have records. The refcount includes both other records listing a given ino as parent and the anchor itself (i.e., the inode). To anchor an inode, we insert records for the ino and all ancestors (if they are not already present). An anchor removal means decrementing the ino record. Once a refcount hits 0 it can be removed, and the parent ino's refcount can be decremented. A directory rename involves changing the parent ino value for an existing record, populating the new ancestors into the table (as needed), and decrementing the old parent's refcount.

This all works great if there are a small number of anchors, but it does not scale. The entire table is managed by a single MDS, and is currently kept in memory. We do not want to anchor every inode in the system; that would be impractical. But we want lookup-by-ino for NFS reexport, and something similar/related for fsck.

Current lookup by ino procedure
-------------------------------

::

  lookup_ino(ino)
    send message mds.N -> mds.0 anchor lookup $ino
    get reply message mds.0 -> mds.N
      reply contains record for $ino and all ancestors (an anchor trace)
    parent = deepest ancestor in trace that we have in our cache
    while parent != ino
      child = parent.lookup(hash(name))
      if not found
        restart from the top
      parent = child

Directory backpointers
----------------------

There is partial infrastructure for supporting fsck that is already maintained for directories. Each directory object (the first object for the directory, if there are multiple
Re: mds: first stab at lookup-by-ino problem/soln description
On Wed, Jan 16, 2013 at 3:54 PM, Sam Lang sam.l...@inktank.com wrote: On Wed, Jan 16, 2013 at 3:52 PM, Gregory Farnum g...@inktank.com wrote:
[multiple-data-pool concern quoted in full; snipped, see the message above]

Correct me if I'm wrong, but this seems like it's only an issue in the NFS reexport case, as fsck can walk through the data objects in each pool (in parallel?) and verify back/forward consistency, so we won't have to guess which pool an ino is in. Given that, if we could stuff the pool id in the ino for the file returned through the client interfaces, then we wouldn't have to guess. -sam

I'm not familiar with the interfaces at work there. Do we have a free 32 bits we can steal in order to do that stuffing? (I *think* it would go in the NFS filehandle structure rather than the ino, right?) We would need to also store that information in order to eventually replace the anchor table, but of course that's much easier to deal with.

If we can just do it this way, that still leaves handling files which don't have any data written yet — under our current system, users can apply a data layout to any inode which has not had data written to it yet. Unfortunately that gets hard to deal with if a user touches a bunch of files and then comes back to place them the next day. :/ I suppose un-touched files could have the special property that their lookup data is stored in the metadata pool and it gets moved as soon as they have data — in the typical case files are written right away, and so this wouldn't be any more writes, just a bit more logic.
Re: [PATCH 00/29] Various fixes for MDS
Hi Yan, I reviewed these on the plane last night and they look good. There was one small cleanup I pushed on top of wip-mds (in ceph.git). I'll run this through our (still limited) fs suite and then merge into master. Thanks! sage

On Fri, 4 Jan 2013, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com
This patch series fixes various issues I encountered when running 3 MDSes. I tested this patch series by running fsstress on two clients, using the same test directory. The MDSes and clients could survive the overnight test at times. This patch series is also in: git://github.com/ukernel/ceph.git wip-mds
RE: Ceph slow request unstable issue
On Thu, 17 Jan 2013, Chen, Xiaoxi wrote:
Hi Sage, both CPU and memory utilization are very low. CPU is ~20% (with 60% iowait); memory utilization is even lower. I have 32 Sandy Bridge CPU cores (64 with HT), together with 128GB RAM per node.

Hmm!

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: January 17, 2013 0:59
To: Andrey Korolyov
Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
Subject: Re: Ceph slow request unstable issue

Hi, On Wed, 16 Jan 2013, Andrey Korolyov wrote: On Wed, Jan 16, 2013 at 4:58 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
[problem description quoted in full; snipped, see the start of the thread]

There is still an issue with throttling recovery/migration traffic leading to the slow requests; that should be fixed shortly.

[...] I have moved all my Ceph machines' OS back to kernel 3.2.0-23, which Ubuntu 12.04 uses. I ran the dd command (dd if=/dev/zero bs=1M count=6 of=/dev/rbd${i}) on the Ceph clients as a data-preparation test last night. [...]

Oooh, you are running the kernel RBD client on a 3.2 kernel. There have been a long series of fixes since then, but we've only backported as far back as 3.4. Can you try a newer kernel version for the client? Something recent in the 3.4 or 3.7 series, like 3.7.2 or 3.4.25...

Thanks!

[remaining quoted slow-request logs and dmesg output snipped; see earlier in the thread]
Re: [PATCH REPOST 0/4] rbd: explicitly support only one osd op
On 01/04/2013 06:43 AM, Alex Elder wrote: An osd request can be made up of multiple ops, all of which are completed (or not) transactionally. There is partial support for multiple ops in an rbd request in the rbd code, but it's incomplete and not even supported by the osd client or the messenger right now. I see three problems with this partial implementation: it gives a false impression of how things work; it complicates some code in cases where it's not necessary; and it may constrain how one might pursue fully implementing multiple ops in a request in ways that don't fit well with how we want to do things. So this series just simplifies things, making it explicit that there is only one op in a kernel osd client request right now. -Alex

[PATCH REPOST 1/4] rbd: pass num_op with ops array
[PATCH REPOST 2/4] libceph: pass num_op with ops
[PATCH REPOST 3/4] rbd: there is really only one op
[PATCH REPOST 4/4] rbd: assume single op in a request

These look good. Reviewed-by: Josh Durgin josh.dur...@inktank.com
Re: [PATCH REPOST 0/3] rbd: no need for file mapping calculation
On 01/04/2013 06:51 AM, Alex Elder wrote: Currently every osd request submitted by the rbd code undergoes a file mapping operation, which is common with what the ceph file system uses. But some analysis shows that there is no need to do this for rbd, because it already takes care of its own blocking of image data into distinct objects. Removing this simplifies things. I especially think removing this improves things conceptually, removing a complex mapping operation from the I/O path. -Alex

[PATCH REPOST 1/3] rbd: pull in ceph_calc_raw_layout()
[PATCH REPOST 2/3] rbd: open code rbd_calc_raw_layout()
[PATCH REPOST 3/3] rbd: don't bother calculating file mapping

We'll want to use similar methods later for fancier rbd striping with format 2 images, but that'll take more restructuring later anyway. This is fine for now. Reviewed-by: Josh Durgin josh.dur...@inktank.com
Re: [PATCH REPOST] rbd: kill ceph_osd_req_op-flags
On 01/04/2013 06:46 AM, Alex Elder wrote: The flags field of struct ceph_osd_req_op is never used, so just get rid of it. Signed-off-by: Alex Elder el...@inktank.com ---

Reviewed-by: Josh Durgin josh.dur...@inktank.com

 include/linux/ceph/osd_client.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 2b04d05..69287cc 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -157,7 +157,6 @@ struct ceph_osd_client {

 struct ceph_osd_req_op {
 	u16 op;           /* CEPH_OSD_OP_* */
-	u32 flags;        /* CEPH_OSD_FLAG_* */
 	union {
 		struct {
 			u64 offset, length;
Re: [PATCH REPOST] rbd: use a common layout for each device
On 01/04/2013 06:54 AM, Alex Elder wrote: Each osd message includes a layout structure, and for rbd it is always the same (at least for osds in a given pool). Initialize a layout structure when an rbd_dev gets created and just copy that into osd requests for the rbd image. Replace an assertion that was done when initializing the layout structures with code that catches and handles anything that would trigger the assertion as soon as it is identified. This precludes that (bad) condition from ever occurring. Signed-off-by: Alex Elder el...@inktank.com ---

Reviewed-by: Josh Durgin josh.dur...@inktank.com

 drivers/block/rbd.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 072608e..7c35608 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -235,6 +235,8 @@ struct rbd_device {

 	char			*header_name;

+	struct ceph_file_layout	layout;
+
 	struct ceph_osd_event   *watch_event;
 	struct ceph_osd_request *watch_request;

@@ -1091,16 +1093,6 @@ static void rbd_coll_end_req(struct rbd_request *rbd_req,
 			ret, len);
 }

-static void rbd_layout_init(struct ceph_file_layout *layout, u64 pool_id)
-{
-	memset(layout, 0, sizeof (*layout));
-	layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-	layout->fl_stripe_count = cpu_to_le32(1);
-	layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
-	rbd_assert(pool_id <= (u64) U32_MAX);
-	layout->fl_pg_pool = cpu_to_le32((u32) pool_id);
-}
-
 /*
  * Send ceph osd request
  */
@@ -1165,7 +1157,7 @@ static int rbd_do_request(struct request *rq,
 	strncpy(osd_req->r_oid, object_name, sizeof(osd_req->r_oid));
 	osd_req->r_oid_len = strlen(osd_req->r_oid);

-	rbd_layout_init(&osd_req->r_file_layout, rbd_dev->spec->pool_id);
+	osd_req->r_file_layout = rbd_dev->layout;	/* struct */

 	if (op->op == CEPH_OSD_OP_READ || op->op == CEPH_OSD_OP_WRITE) {
 		op->extent.offset = ofs;
@@ -2295,6 +2287,13 @@ struct rbd_device *rbd_dev_create(struct rbd_client *rbdc,
 	rbd_dev->spec = spec;
 	rbd_dev->rbd_client = rbdc;

+	/* Initialize the layout used for all rbd requests */
+
+	rbd_dev->layout.fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+	rbd_dev->layout.fl_stripe_count = cpu_to_le32(1);
+	rbd_dev->layout.fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+	rbd_dev->layout.fl_pg_pool = cpu_to_le32((u32) spec->pool_id);
+
 	return rbd_dev;
 }

@@ -2549,6 +2548,12 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev)
 	if (parent_spec->pool_id == CEPH_NOPOOL)
 		goto out;	/* No parent?  No problem. */

+	/* The ceph file layout needs to fit pool id in 32 bits */
+
+	ret = -EIO;
+	if (WARN_ON(parent_spec->pool_id > (u64) U32_MAX))
+		goto out;
+
 	image_id = ceph_extract_encoded_string(&p, end, NULL, GFP_KERNEL);
 	if (IS_ERR(image_id)) {
 		ret = PTR_ERR(image_id);
@@ -3678,6 +3683,13 @@ static ssize_t rbd_add(struct bus_type *bus,
 		goto err_out_client;
 	spec->pool_id = (u64) rc;

+	/* The ceph file layout needs to fit pool id in 32 bits */
+
+	if (WARN_ON(spec->pool_id > (u64) U32_MAX)) {
+		rc = -EIO;
+		goto err_out_client;
+	}
+
 	rbd_dev = rbd_dev_create(rbdc, spec);
 	if (!rbd_dev)
 		goto err_out_client;
Re: [PATCH REPOST] rbd: combine rbd sync watch/unwatch functions
On 01/04/2013 06:55 AM, Alex Elder wrote:
The rbd_req_sync_watch() and rbd_req_sync_unwatch() functions are
nearly identical. Combine them into a single function with a flag
indicating whether a watch is to be initiated or torn down.

Signed-off-by: Alex Elder <el...@inktank.com>
---

Reviewed-by: Josh Durgin <josh.dur...@inktank.com>

 drivers/block/rbd.c | 81 +--
 1 file changed, 27 insertions(+), 54 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7c35608..c1e5f24 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1429,74 +1429,48 @@ static void rbd_watch_cb(u64 ver, u64 notify_id, u8 opcode, void *data)
 }

 /*
- * Request sync osd watch
+ * Request sync osd watch/unwatch. The value of "start" determines
+ * whether a watch request is being initiated or torn down.
  */
-static int rbd_req_sync_watch(struct rbd_device *rbd_dev)
+static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 {
 	struct ceph_osd_req_op *op;
-	struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc;
+	struct ceph_osd_request **linger_req = NULL;
+	__le64 version = 0;
 	int ret;

 	op = rbd_create_rw_op(CEPH_OSD_OP_WATCH, 0);
 	if (!op)
 		return -ENOMEM;

-	ret = ceph_osdc_create_event(osdc, rbd_watch_cb, 0,
-				     (void *)rbd_dev, &rbd_dev->watch_event);
-	if (ret < 0)
-		goto fail;
-
-	op->watch.ver = cpu_to_le64(rbd_dev->header.obj_version);
-	op->watch.cookie = cpu_to_le64(rbd_dev->watch_event->cookie);
-	op->watch.flag = 1;
-
-	ret = rbd_req_sync_op(rbd_dev,
-			      CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
-			      op,
-			      rbd_dev->header_name,
-			      0, 0, NULL,
-			      &rbd_dev->watch_request, NULL);
-
-	if (ret < 0)
-		goto fail_event;
-
-	rbd_destroy_op(op);
-	return 0;
-
-fail_event:
-	ceph_osdc_cancel_event(rbd_dev->watch_event);
-	rbd_dev->watch_event = NULL;
-fail:
-	rbd_destroy_op(op);
-	return ret;
-}
-
-/*
- * Request sync osd unwatch
- */
-static int rbd_req_sync_unwatch(struct rbd_device *rbd_dev)
-{
-	struct ceph_osd_req_op *op;
-	int ret;
+	if (start) {
+		struct ceph_osd_client *osdc;

-	op = rbd_create_rw_op(CEPH_OSD_OP_WATCH, 0);
-	if (!op)
-		return -ENOMEM;
+		osdc = &rbd_dev->rbd_client->client->osdc;
+		ret = ceph_osdc_create_event(osdc, rbd_watch_cb, 0, rbd_dev,
+					     &rbd_dev->watch_event);
+		if (ret < 0)
+			goto done;
+		version = cpu_to_le64(rbd_dev->header.obj_version);
+		linger_req = &rbd_dev->watch_request;
+	}

-	op->watch.ver = 0;
+	op->watch.ver = version;
 	op->watch.cookie = cpu_to_le64(rbd_dev->watch_event->cookie);
-	op->watch.flag = 0;
+	op->watch.flag = (u8) start ? 1 : 0;

 	ret = rbd_req_sync_op(rbd_dev,
 			      CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
-			      op,
-			      rbd_dev->header_name,
-			      0, 0, NULL, NULL, NULL);
-
+			      op, rbd_dev->header_name,
+			      0, 0, NULL, linger_req, NULL);
+	if (!start || ret < 0) {
+		ceph_osdc_cancel_event(rbd_dev->watch_event);
+		rbd_dev->watch_event = NULL;
+	}
+done:
 	rbd_destroy_op(op);
-	ceph_osdc_cancel_event(rbd_dev->watch_event);
-	rbd_dev->watch_event = NULL;
+	return ret;
 }

@@ -3031,7 +3005,7 @@ static int rbd_init_watch_dev(struct rbd_device *rbd_dev)
 	int ret, rc;

 	do {
-		ret = rbd_req_sync_watch(rbd_dev);
+		ret = rbd_req_sync_watch(rbd_dev, 1);
 		if (ret == -ERANGE) {
 			rc = rbd_dev_refresh(rbd_dev, NULL);
 			if (rc < 0)
@@ -3750,8 +3724,7 @@ static void rbd_dev_release(struct device *dev)
 						    rbd_dev->watch_request);
 	}
 	if (rbd_dev->watch_event)
-		rbd_req_sync_unwatch(rbd_dev);
-
+		rbd_req_sync_watch(rbd_dev, 0);
 	/* clean up and free blkdev */
 	rbd_free_disk(rbd_dev);
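At the call sites, the combined function then reads like this (a usage sketch taken from the two hunks at the end of the patch):

	/* In rbd_init_watch_dev(): set up the watch. */
	ret = rbd_req_sync_watch(rbd_dev, 1);

	/* In rbd_dev_release(): tear it down again. */
	rbd_req_sync_watch(rbd_dev, 0);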
Re: [PATCH REPOST 6/6] rbd: move remaining osd op setup into rbd_osd_req_op_create()
On 01/04/2013 07:07 AM, Alex Elder wrote:
The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and
CEPH_OSD_OP_NOTIFY_ACK. Move the setup of those operations into
rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and
rbd_destroy_op().

Signed-off-by: Alex Elder <el...@inktank.com>
---
 drivers/block/rbd.c | 68 ---
 1 file changed, 27 insertions(+), 41 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 9f41c32..21fbf82 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1027,24 +1027,6 @@ out_err:
 	return NULL;
 }

-static struct ceph_osd_req_op *rbd_create_rw_op(int opcode, u64 ofs, u64 len)
-{
-	struct ceph_osd_req_op *op;
-
-	op = kzalloc(sizeof (*op), GFP_NOIO);
-	if (!op)
-		return NULL;
-
-	op->op = opcode;
-
-	return op;
-}
-
-static void rbd_destroy_op(struct ceph_osd_req_op *op)
-{
-	kfree(op);
-}
-
 struct ceph_osd_req_op *rbd_osd_req_op_create(u16 opcode, ...)
 {
 	struct ceph_osd_req_op *op;
@@ -1087,6 +1069,16 @@ struct ceph_osd_req_op *rbd_osd_req_op_create(u16 opcode, ...)
 		op->cls.indata_len = (u32) size;
 		op->payload_len += size;
 		break;
+	case CEPH_OSD_OP_NOTIFY_ACK:
+	case CEPH_OSD_OP_WATCH:
+		/* rbd_osd_req_op_create(NOTIFY_ACK, cookie, version) */
+		/* rbd_osd_req_op_create(WATCH, cookie, version, flag) */
+		op->watch.cookie = va_arg(args, u64);
+		op->watch.ver = va_arg(args, u64);
+		op->watch.ver = cpu_to_le64(op->watch.ver);	/* XXX */

why the /* XXX */ comment?

+		if (opcode == CEPH_OSD_OP_WATCH && va_arg(args, int))
+			op->watch.flag = (u8) 1;
+		break;
 	default:
 		rbd_warn(NULL, "unsupported opcode %hu\n", opcode);
 		kfree(op);
@@ -1434,14 +1426,10 @@ static int rbd_req_sync_notify_ack(struct rbd_device *rbd_dev,
 	struct ceph_osd_req_op *op;
 	int ret;

-	op = rbd_create_rw_op(CEPH_OSD_OP_NOTIFY_ACK, 0, 0);
+	op = rbd_osd_req_op_create(CEPH_OSD_OP_NOTIFY_ACK, notify_id, ver);
 	if (!op)
 		return -ENOMEM;

-	op->watch.ver = cpu_to_le64(ver);
-	op->watch.cookie = notify_id;
-	op->watch.flag = 0;
-
 	ret = rbd_do_request(NULL, rbd_dev, NULL, CEPH_NOSNAP,
 			     rbd_dev->header_name, 0, 0, NULL,
 			     NULL, 0,
@@ -1450,7 +1438,8 @@ static int rbd_req_sync_notify_ack(struct rbd_device *rbd_dev,
 			     NULL, 0,
 			     rbd_simple_req_cb, 0, NULL);

-	rbd_destroy_op(op);
+	rbd_osd_req_op_destroy(op);
+
 	return ret;
 }

@@ -1480,14 +1469,9 @@ static void rbd_watch_cb(u64 ver, u64 notify_id, u8 opcode, void *data)
  */
 static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 {
-	struct ceph_osd_req_op *op;
 	struct ceph_osd_request **linger_req = NULL;
-	__le64 version = 0;
-	int ret;
-
-	op = rbd_create_rw_op(CEPH_OSD_OP_WATCH, 0, 0);
-	if (!op)
-		return -ENOMEM;
+	struct ceph_osd_req_op *op;
+	int ret = 0;

 	if (start) {
 		struct ceph_osd_client *osdc;
@@ -1496,26 +1480,28 @@ static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 		ret = ceph_osdc_create_event(osdc, rbd_watch_cb, 0, rbd_dev,
 					     &rbd_dev->watch_event);
 		if (ret < 0)
-			goto done;
-		version = cpu_to_le64(rbd_dev->header.obj_version);
+			return ret;
 		linger_req = &rbd_dev->watch_request;
+	} else {
+		rbd_assert(rbd_dev->watch_request != NULL);
 	}

-	op->watch.ver = version;
-	op->watch.cookie = cpu_to_le64(rbd_dev->watch_event->cookie);
-	op->watch.flag = (u8) start ? 1 : 0;
-
-	ret = rbd_req_sync_op(rbd_dev,
+	op = rbd_osd_req_op_create(CEPH_OSD_OP_WATCH,
+				rbd_dev->watch_event->cookie,
+				rbd_dev->header.obj_version, start);
+	if (op)
+		ret = rbd_req_sync_op(rbd_dev,
 			      CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
 			      op, rbd_dev->header_name,
 			      0, 0, NULL, linger_req, NULL);

-	if (!start || ret < 0) {
+	/* Cancel the event if we're tearing down, or on error */
+
+	if (!start || !op || ret < 0) {
 		ceph_osdc_cancel_event(rbd_dev->watch_event);
 		rbd_dev->watch_event = NULL;
 	}
-done:
-	rbd_destroy_op(op);
+	rbd_osd_req_op_destroy(op);

 	return ret;
 }
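As a usage sketch, the two calling conventions documented in the case block work out like this at the call sites (taken from the hunks above):

	/* NOTIFY_ACK: cookie (the notify id) and version, both passed as u64 */
	op = rbd_osd_req_op_create(CEPH_OSD_OP_NOTIFY_ACK, notify_id, ver);

	/* WATCH: cookie and version, plus an int flag (nonzero starts a watch) */
	op = rbd_osd_req_op_create(CEPH_OSD_OP_WATCH,
				   rbd_dev->watch_event->cookie,
				   rbd_dev->header.obj_version, start);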
Re: [PATCH REPOST 0/6] rbd: consolidate osd request setup
On 01/04/2013 07:03 AM, Alex Elder wrote:
This series consolidates and encapsulates the setup of all osd
requests into a single function which takes variable arguments
appropriate for the type of request. The result groups together
common code idioms and I think makes the spots that build these
messages a little easier to read.

					-Alex

[PATCH REPOST 1/6] rbd: don't assign extent info in rbd_do_request()
[PATCH REPOST 2/6] rbd: don't assign extent info in rbd_req_sync_op()
[PATCH REPOST 3/6] rbd: initialize off and len in rbd_create_rw_op()
[PATCH REPOST 4/6] rbd: define generalized osd request op routines
[PATCH REPOST 5/6] rbd: move call osd op setup into rbd_osd_req_op_create()
[PATCH REPOST 6/6] rbd: move remaining osd op setup into rbd_osd_req_op_create()

I'm not sure about the varargs approach. It makes it easy to
accidentally use the wrong parameters. What do you think about
replacing calls to rbd_osd_req_create_op() with helpers for the
various kinds of requests that just call rbd_osd_req_create_op()
themselves, so that the arguments can be checked at compile time?
This will probably be more of an issue with multi-op osd requests
in the future.

Eventually I think all this osd-request-related stuff should go into
libceph, but that's a cleanup for another day.

In any case, the new structure looks good to me.

Reviewed-by: Josh Durgin <josh.dur...@inktank.com>
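Josh's suggestion would look something like this, a sketch with hypothetical helper names (none of these wrappers exist in the series):

	static inline struct ceph_osd_req_op *
	rbd_osd_req_op_create_notify_ack(u64 cookie, u64 ver)
	{
		return rbd_osd_req_op_create(CEPH_OSD_OP_NOTIFY_ACK, cookie, ver);
	}

	static inline struct ceph_osd_req_op *
	rbd_osd_req_op_create_watch(u64 cookie, u64 ver, int flag)
	{
		return rbd_osd_req_op_create(CEPH_OSD_OP_WATCH, cookie, ver, flag);
	}

Each wrapper fixes the argument list, so passing the wrong number or type of arguments becomes a compile-time error instead of silently misparsed varargs.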
Re: [PATCH REPOST] rbd: assign watch request more directly
On 01/04/2013 07:07 AM, Alex Elder wrote:
Both rbd_req_sync_op() and rbd_do_request() have a "linger"
parameter, which is the address of a pointer that should refer to
the osd request structure used to issue a request to an osd. Only
one case ever supplies a non-null linger argument: an
CEPH_OSD_OP_WATCH start. And in that one case it is assigned
&rbd_dev->watch_request.

Within rbd_do_request() (where the assignment ultimately gets made)
we know the rbd_dev and therefore its watch_request field. We also
know whether the op being sent is CEPH_OSD_OP_WATCH start.

Stop opaquely passing down the linger pointer, and instead just
assign the value directly inside rbd_do_request() when it's needed.

This makes it unnecessary for rbd_req_sync_watch() to make
arrangements to hold a value that's not available until a bit later.
This more clearly separates setting up a watch request from
submitting it.

Signed-off-by: Alex Elder <el...@inktank.com>
---

Reviewed-by: Josh Durgin <josh.dur...@inktank.com>

 drivers/block/rbd.c | 20
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 21fbf82..02002b1 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1158,7 +1158,6 @@ static int rbd_do_request(struct request *rq,
 			  int coll_index,
 			  void (*rbd_cb)(struct ceph_osd_request *,
 					 struct ceph_msg *),
-			  struct ceph_osd_request **linger_req,
 			  u64 *ver)
 {
 	struct ceph_osd_client *osdc;
@@ -1210,9 +1209,9 @@ static int rbd_do_request(struct request *rq,
 	ceph_osdc_build_request(osd_req, ofs, &len, 1, op,
 				snapc, snapid, &mtime);

-	if (linger_req) {
+	if (op->op == CEPH_OSD_OP_WATCH && op->watch.flag) {
 		ceph_osdc_set_request_linger(osdc, osd_req);
-		*linger_req = osd_req;
+		rbd_dev->watch_request = osd_req;
 	}

 	ret = ceph_osdc_start_request(osdc, osd_req, false);
@@ -1296,7 +1295,6 @@ static int rbd_req_sync_op(struct rbd_device *rbd_dev,
 			   const char *object_name,
 			   u64 ofs, u64 inbound_size,
 			   char *inbound,
-			   struct ceph_osd_request **linger_req,
 			   u64 *ver)
 {
 	int ret;
@@ -1317,7 +1315,7 @@ static int rbd_req_sync_op(struct rbd_device *rbd_dev,
 			     op,
 			     NULL, 0,
 			     NULL,
-			     linger_req, ver);
+			     ver);
 	if (ret < 0)
 		goto done;
@@ -1383,7 +1381,7 @@ static int rbd_do_op(struct request *rq,
 			     flags,
 			     op,
 			     coll, coll_index,
-			     rbd_req_cb, 0, NULL);
+			     rbd_req_cb, NULL);
 	if (ret < 0)
 		rbd_coll_end_req_index(rq, coll, coll_index,
 					(s32) ret, seg_len);
@@ -1410,7 +1408,7 @@ static int rbd_req_sync_read(struct rbd_device *rbd_dev,
 		return -ENOMEM;

 	ret = rbd_req_sync_op(rbd_dev, CEPH_OSD_FLAG_READ,
-			      op, object_name, ofs, len, buf, NULL, ver);
+			      op, object_name, ofs, len, buf, ver);
 	rbd_osd_req_op_destroy(op);

 	return ret;
@@ -1436,7 +1434,7 @@ static int rbd_req_sync_notify_ack(struct rbd_device *rbd_dev,
 			     CEPH_OSD_FLAG_READ,
 			     op,
 			     NULL, 0,
-			     rbd_simple_req_cb, 0, NULL);
+			     rbd_simple_req_cb, NULL);

 	rbd_osd_req_op_destroy(op);

@@ -1469,7 +1467,6 @@ static void rbd_watch_cb(u64 ver, u64 notify_id, u8 opcode, void *data)
  */
 static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 {
-	struct ceph_osd_request **linger_req = NULL;
 	struct ceph_osd_req_op *op;
 	int ret = 0;

@@ -1481,7 +1478,6 @@ static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 					     &rbd_dev->watch_event);
 		if (ret < 0)
 			return ret;
-		linger_req = &rbd_dev->watch_request;
 	} else {
 		rbd_assert(rbd_dev->watch_request != NULL);
 	}
@@ -1493,7 +1489,7 @@ static int rbd_req_sync_watch(struct rbd_device *rbd_dev, int start)
 		ret = rbd_req_sync_op(rbd_dev,
 			      CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
 			      op, rbd_dev->header_name,
-			      0, 0, NULL, linger_req, NULL);
+			      0, 0, NULL, NULL);

 	/* Cancel the event if we're tearing down, or on error */

 @@
Re: [PATCH REPOST 6/6] rbd: move remaining osd op setup into rbd_osd_req_op_create()
On 01/16/2013 10:23 PM, Josh Durgin wrote:
On 01/04/2013 07:07 AM, Alex Elder wrote:
The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and
CEPH_OSD_OP_NOTIFY_ACK. Move the setup of those operations into
rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and
rbd_destroy_op().

[...]

+	case CEPH_OSD_OP_NOTIFY_ACK:
+	case CEPH_OSD_OP_WATCH:
+		/* rbd_osd_req_op_create(NOTIFY_ACK, cookie, version) */
+		/* rbd_osd_req_op_create(WATCH, cookie, version, flag) */
+		op->watch.cookie = va_arg(args, u64);
+		op->watch.ver = va_arg(args, u64);
+		op->watch.ver = cpu_to_le64(op->watch.ver);	/* XXX */

why the /* XXX */ comment?

Because it's the only value here that is converted from cpu byte
order. It was added in this commit:

    a71b891bc7d77a070e723c8c53d1dd73cf931555
    rbd: send header version when notifying

And I think was done without full understanding that it was being
done different from all the others. I think it may be wrong but I
haven't really looked at it yet. Pulling them all into this function
made this difference more obvious.

It was a note to self that I wanted to fix that. I normally try to
resolve anything like that before I post for review but I guess I
forgot. There may be others.
					-Alex
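The byte-order wrinkle Alex describes is the kind of mistake sparse's __bitwise annotations catch mechanically. A minimal illustration (a hypothetical struct, not the rbd code):

	#include <linux/types.h>

	/* Marking an on-wire field __le64 lets sparse ("make C=1") flag
	 * any assignment that skips the byte-order conversion.
	 */
	struct wire_watch_op {
		__le64 ver;		/* little-endian on the wire */
		__le64 cookie;
	};

	static void fill_watch_op(struct wire_watch_op *w, u64 ver, u64 cookie)
	{
		w->ver = cpu_to_le64(ver);	/* converted exactly once, here */
		w->cookie = cpu_to_le64(cookie);
		/* "w->ver = ver;" would draw a sparse warning: u64 vs __le64 */
	}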
Re: flashcache
Hi Mark,

On 16.01.2013 at 22:53, Mark wrote:
With only 2 SSDs for 12 spinning disks, you'll need to make sure the
SSDs are really fast. I use Intel 520s for testing, which are great,
but I wouldn't use them in production.

Why not? I use them for an SSD-only Ceph cluster.

Stefan
RE: Ceph slow request unstable issue
A summary of the cases tested so far (Ceph v0.56.1):

1. RBD: Ubuntu 13.04 + 3.7 kernel; OSD: Ubuntu 13.04 + 3.7 kernel, XFS.
   Result: kernel panic on both RBD and OSD sides.
2. RBD: Ubuntu 13.04 + 3.2 kernel; OSD: Ubuntu 13.04 + 3.2 kernel, XFS.
   Result: kernel panic on RBD (~15 mins).
3. RBD: Ubuntu 13.04 + 3.6.7 kernel (suggested by Ceph.com); OSD: Ubuntu 13.04 + 3.2 kernel, XFS.
   Result: auto-reset on OSD (~30 mins after the test started).
4. RBD: Ubuntu 13.04 + 3.6.7 kernel (suggested by Ceph.com); OSD: Ubuntu 12.04 + 3.2.0-36 kernel (suggested by Ceph.com), XFS.
   Result: auto-reset on OSD (~30 mins after the test started).
5. RBD: Ubuntu 13.04 + 3.6.7 kernel (suggested by Ceph.com); OSD: Ubuntu 13.04 + 3.6.7 (suggested by Sage), XFS.
   Result: seems stable for the last hour, still running now.

Tests 3 & 4 are repeatable.

My test setup:

OSD side: 3 nodes, 60 disks (20 per node, 1 per OSD), 10GbE, 4 * Intel 520 SSDs per node as journal, XFS. Each node uses 2 * Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz + 128GB RAM.

RBD side: 8 nodes; each node has 10GbE, 2 * Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 128GB RAM.

Method: create 240 RBDs, mount them across the 8 nodes (30 RBDs per node), and run dd concurrently on all 240 RBDs. After ~30 minutes, one of the OSD nodes is likely to reset.

Ceph OSD logs, syslog and dmesg from the reset node are available if you need them. (It looks to me like they hold no valuable information except a lot of slow-request warnings in the OSD's log.)

							Xiaoxi

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: 2013年1月17日 10:35
To: Chen, Xiaoxi
Subject: RE: Ceph slow request unstable issue

On Thu, 17 Jan 2013, Chen, Xiaoxi wrote:
No, on the OSD node, not the same node. The OSD nodes run a 3.2 kernel while the client nodes run a 3.6 kernel. We did suffer kernel panics on the rbd client nodes, but after upgrading the client kernel to 3.6.6 that seems solved.

Is it easy to try the 3.6 kernel on the osd nodes too?

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: 2013年1月17日 10:17
To: Chen, Xiaoxi
Subject: RE: Ceph slow request unstable issue

On Thu, 17 Jan 2013, Chen, Xiaoxi wrote:
It is easy to reproduce in my setup. Once I have enough high load on it and wait for tens of minutes, I can see such logs. As a forecast, slow requests of more than 30~60s are frequently present in the ceph osd's log.

Just replied to your other email. Do I understand correctly that you are seeing this problem on the *rbd client* nodes? Or also on the OSDs? Are they the same nodes?

sage

-----Original Message-----
From: Sage Weil [mailto:s...@inktank.com]
Sent: 2013年1月17日 0:59
To: Andrey Korolyov
Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
Subject: Re: Ceph slow request unstable issue

Hi,

On Wed, 16 Jan 2013, Andrey Korolyov wrote:
On Wed, Jan 16, 2013 at 4:58 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
Hi list,

We are suffering from OSDs or the OS going down when there is continuous high pressure on the Ceph rack. Basically we are on Ubuntu 12.04 + Ceph 0.56.1, 6 nodes, each node with 20 spindles + 4 SSDs as journal (120 spindles in total). We create a lot of RBD volumes (say 240), mount them on 16 different client machines (15 RBD volumes per client), and run dd concurrently on top of each RBD.

The issues are:
1. Slow requests. From the list archive this seems solved in 0.56.1, but we still notice such warnings.
2. OSD down or even host down, like the message below. It seems some OSD has been blocking there for quite a long time.

Suggestions are highly appreciated. Thanks.

							Xiaoxi
_________________________________

Bad news: I have rolled all my Ceph machines' OS back to kernel 3.2.0-23, which Ubuntu 12.04 uses. I ran the dd command (dd if=/dev/zero bs=1M count=6 of=/dev/rbd${i}) on the Ceph clients to prepare test data last night. Now I have one machine down (can't be reached by ping), another two machines have all OSD daemons down, while the remaining three have some daemons down. I have many warnings in the OSD log like this:

no flag points reached
2013-01-15 19:14:22.769898 7f20a2d57700 0 log [WRN] : slow request 52.218106 seconds old, received at 2013-01-15 19:13:30.551718: osd_op(client.10674.1:1002417 rb.0.27a8.6b8b4567.0eba
Re: code coverage and teuthology
On 01/15/2013 06:21 PM, Josh Durgin wrote:
On 01/15/2013 02:10 AM, Loic Dachary wrote:
On 01/14/2013 06:26 PM, Josh Durgin wrote:
Looking at how it's run automatically might help:
https://github.com/ceph/teuthology/blob/master/teuthology/coverage.py#L88

You should also add 'coverage: true' for the ceph task overrides. This way daemons are killed with SIGTERM, and the atexit function that outputs coverage information will run. Then you don't need your patch changing the flavor either.

For each task X, the docstring for teuthology.task.X.task documents example usage and extra options like this.

Hi,

That helped a lot, thanks :-) I think I'm almost there. After running:

./virtualenv/bin/teuthology --archive /tmp/a1 /srv/3node_rgw.yaml
wget -O /tmp/build/tmp.tgz http://gitbuilder.ceph.com/ceph-tarball-precise-x86_64-gcov/sha1/$(cat /tmp/a1/ceph-sha1)/ceph.x86_64.tgz
echo "ceph_build_output_dir: /tmp/build" >> ~/.teuthology.yaml
./virtualenv/bin/teuthology-coverage -v --html-output /tmp/html --lcov-output /tmp/lcov --cov-tools-dir /srv/teuthology/coverage /tmp

I get:

INFO:teuthology.coverage:initializing coverage data...
Retrieving source and .gcno files...
Initializing lcov files...
Deleting all .da files in /tmp/lcov/ceph/src and subdirectories
Done.
Capturing coverage data from /tmp/lcov/ceph/src
Found gcov version: 4.7.2
Scanning /tmp/lcov/ceph/src for .gcno files ...
Found 692 graph files in /tmp/lcov/ceph/src
Processing src/test_libhadoopcephfs_build-AuthMethodList.gcno
geninfo: ERROR: /tmp/lcov/ceph/src/test_libhadoopcephfs_build-AuthMethodList.gcno: reached unexpected end of file

root@ceph:/srv/teuthology# ls -l /tmp/lcov/ceph/src/test_libhadoopcephfs_build-AuthMethodList.gcno
-rw-r--r-- 1 root root 41088 Jan 15 09:49 /tmp/lcov/ceph/src/test_libhadoopcephfs_build-AuthMethodList.gcno

I'm using lcov: LCOV version 1.9

The only problem I can think of is that the machine I'm running lcov on is Debian GNU/Linux Wheezy, trying to analyze coverage for binaries created for Ubuntu Precise. They are both amd64, but .gcno files may have dependencies on the toolchain. Did you ever run into similar problems?

I think I did when I built and ran on debian, and it was fixed with a later version of lcov (I think 1.9-2). I didn't try doing the coverage analysis on a different distribution from where ceph was built and run, though, so that may also cause some issues.

It was indeed a compatibility problem: running lcov on precise works fine. Thanks :-)

Josh
HOWTO: teuthology and code coverage
Hi,

I'm happy to report that running teuthology to get an lcov code coverage report worked for me.

http://dachary.org/wp-uploads/2013/01/teuthology/total/mon/Monitor.cc.gcov.html

It took me a while to figure out the logic (thanks Josh for the help :-). I wrote a HOWTO explaining the steps in detail. It should be straightforward to run on an OpenStack tenant, using virtual machines instead of bare metal.

http://dachary.org/?p=1788

Cheers