Re: [ceph-users] cuttlefish countdown -- OSD doesn't get marked out

2013-04-26 Thread Martin Mailand
Hi David,

did you test it with more than one rack as well? In my first problem report I
used two racks with a custom crushmap, so that the replicas end up in the
two racks (replication level = 2). Then I took one OSD down and expected
that the remaining OSDs in that rack would receive the now-missing replicas
from the OSDs in the other rack.
But nothing happened; the cluster stayed degraded.
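
For reference, the crushmap itself isn't quoted in this thread. A rule that places one replica in each rack typically looks roughly like the sketch below (a generic reconstruction, not the exact map from this cluster; rule name and numbers are made up). With such a rule, CRUSH should pick a replacement OSD inside the same rack once the failed OSD is marked out:

 rule replicated_per_rack {
         ruleset 1
         type replicated
         min_size 1
         max_size 10
         step take default
         step chooseleaf firstn 0 type rack
         step emit
 }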

-martin


On 26.04.2013 02:22, David Zafman wrote:
 
 I filed tracker bug 4822 and have wip-4822 with a fix.  My manual testing 
 shows that it works.  I'm building a teuthology test.
 
 Given your osd tree has a single rack, it should always mark OSDs out after 5 
 minutes by default.
 
 David Zafman
 Senior Developer
 http://www.inktank.com
 
 
 
 
 On Apr 25, 2013, at 9:38 AM, Martin Mailand mar...@tuxadero.com wrote:
 
 Hi Sage,

 On 25.04.2013 18:17, Sage Weil wrote:
 What is the output from 'ceph osd tree' and the contents of your 
 [mon*] sections of ceph.conf?

 Thanks!
 sage


 root@store1:~# ceph osd tree

 # id weight  type name       up/down reweight
 -1   24      root default
 -3   24      rack unknownrack
 -2   4       host store1
 0    1       osd.0   up      1
 1    1       osd.1   down    1
 2    1       osd.2   up      1
 3    1       osd.3   up      1
 -4   4       host store3
 10   1       osd.10  up      1
 11   1       osd.11  up      1
 8    1       osd.8   up      1
 9    1       osd.9   up      1
 -5   4       host store4
 12   1       osd.12  up      1
 13   1       osd.13  up      1
 14   1       osd.14  up      1
 15   1       osd.15  up      1
 -6   4       host store5
 16   1       osd.16  up      1
 17   1       osd.17  up      1
 18   1       osd.18  up      1
 19   1       osd.19  up      1
 -7   4       host store6
 20   1       osd.20  up      1
 21   1       osd.21  up      1
 22   1       osd.22  up      1
 23   1       osd.23  up      1
 -8   4       host store2
 4    1       osd.4   up      1
 5    1       osd.5   up      1
 6    1       osd.6   up      1
 7    1       osd.7   up      1



 [global]
	auth cluster required = none
auth service required = none
auth client required = none
 #   log file = 
log_max_recent=100
log_max_new=100

 [mon]
mon data = /data/mon.$id
 [mon.a]
mon host = store1
mon addr = 192.168.195.31:6789
 [mon.b]
mon host = store3
mon addr = 192.168.195.33:6789
 [mon.c]
mon host = store5
mon addr = 192.168.195.35:6789


Re: cuttlefish countdown -- OSD doesn't get marked out

2013-04-25 Thread Martin Mailand
Hi,

if I shut down an OSD, it gets marked down after 20 seconds, and after
300 seconds it should get marked out and the cluster should resync.
But that doesn't happen: the OSD stays in the state down/in forever,
so the cluster stays degraded forever.
I can reproduce it on a freshly installed cluster.

If I manually set the osd out (ceph osd out 1), the cluster resync
starts immediately.
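
For reference, the timer that controls this is mon osd down out interval, which defaults to 300 seconds. A quick way to check it and the cluster flags (the socket path and monitor name assume the default layout used on these hosts):

 # mark-out timer on the monitor, default 300 seconds
 ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_down_out_interval
 # make sure no flag such as 'noout' is suppressing the mark-out
 ceph osd dump | grep flags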

I think that's a release-critical bug, because the cluster health does not
recover automatically.

I already reported this behavior a while ago:
http://article.gmane.org/gmane.comp.file-systems.ceph.user/603/

-martin


Log:


root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB
used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# ceph --version
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
root@store1:~# /etc/init.d/ceph stop osd.1
=== osd.1 ===
Stopping Ceph osd.1 on store1...bash: warning: setlocale: LC_ALL: cannot
change locale (en_GB.utf8)
kill 5492...done
root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 82, quorum 0,1,2 a,b,c
   osdmap e204: 24 osds: 24 up, 24 in
   pgmap v106709: 5056 pgs: 5056 active+clean; 526 GB data, 1068 GB
used, 173 TB / 174 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# date -R
Thu, 25 Apr 2013 13:09:54 +0200



root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery
10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106715: 5056 pgs: 4633 active+clean, 423 active+degraded; 526
GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up

Thu, 25 Apr 2013 13:10:14 +0200


root@store1:~# ceph -s && date -R
   health HEALTH_WARN 423 pgs degraded; 423 pgs stuck unclean; recovery
10999/269486 degraded (4.081%); 1/24 in osds are down
   monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 82, quorum 0,1,2 a,b,c
   osdmap e206: 24 osds: 23 up, 24 in
   pgmap v106719: 5056 pgs: 4633 active+clean, 423 active+degraded; 526
GB data, 1068 GB used, 173 TB / 174 TB avail; 10999/269486 degraded (4.081%)
   mdsmap e1: 0/0/1 up

Thu, 25 Apr 2013 13:23:01 +0200

On 25.04.2013 01:46, Sage Weil wrote:
 Hi everyone-
 
 We are down to a handful of urgent bugs (3!) and a cuttlefish release date 
 that is less than a week away.  Thank you to everyone who has been 
 involved in coding, testing, and stabilizing this release.  We are close!
 
 If you would like to test the current release candidate, your efforts 
 would be much appreciated!  For deb systems, you can do
 
  wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' | sudo apt-key add -
  echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/next $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
 
 For rpm users you can find packages at
 
  http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/next/
  http://gitbuilder.ceph.com/ceph-rpm-fc17-x86_64-basic/ref/next/
  http://gitbuilder.ceph.com/ceph-rpm-fc18-x86_64-basic/ref/next/
 
 A draft of the release notes is up at
 
  http://ceph.com/docs/master/release-notes/#v0-61
 
 Let me know if I've missed anything!
 
 sage
 


Re: cuttlefish countdown -- OSD doesn't get marked out

2013-04-25 Thread Martin Mailand
Hi Sage,

On 25.04.2013 18:17, Sage Weil wrote:
 What is the output from 'ceph osd tree' and the contents of your 
 [mon*] sections of ceph.conf?
 
 Thanks!
 sage


root@store1:~# ceph osd tree

# id    weight  type name   up/down reweight
-1  24  root default
-3  24  rack unknownrack
-2  4   host store1
0   1   osd.0   up  1   
1   1   osd.1   down1   
2   1   osd.2   up  1   
3   1   osd.3   up  1   
-4  4   host store3
10  1   osd.10  up  1   
11  1   osd.11  up  1   
8   1   osd.8   up  1   
9   1   osd.9   up  1   
-5  4   host store4
12  1   osd.12  up  1   
13  1   osd.13  up  1   
14  1   osd.14  up  1   
15  1   osd.15  up  1   
-6  4   host store5
16  1   osd.16  up  1   
17  1   osd.17  up  1   
18  1   osd.18  up  1   
19  1   osd.19  up  1   
-7  4   host store6
20  1   osd.20  up  1   
21  1   osd.21  up  1   
22  1   osd.22  up  1   
23  1   osd.23  up  1   
-8  4   host store2
4   1   osd.4   up  1   
5   1   osd.5   up  1   
6   1   osd.6   up  1   
7   1   osd.7   up  1   



[global]
auth cluster required = none
auth service required = none
auth client required = none
#   log file = 
log_max_recent=100
log_max_new=100

[mon]
mon data = /data/mon.$id
[mon.a]
mon host = store1
mon addr = 192.168.195.31:6789
[mon.b]
mon host = store3
mon addr = 192.168.195.33:6789
[mon.c]
mon host = store5
mon addr = 192.168.195.35:6789


Re: [ceph-users] Cluster Map Problems

2013-04-03 Thread Martin Mailand
Hi,

I still have this problem in v0.60.
If I stop one OSD, it gets marked down after 20 seconds. But after 300
seconds the OSD does not get marked out, so the cluster stays degraded forever.
I can reproduce it with a freshly created cluster.

root@store1:~# ceph -s
   health HEALTH_WARN 405 pgs degraded; 405 pgs stuck unclean; recovery
10603/259576 degraded (4.085%); 1/24 in osds are down
   monmap e1: 3 mons at
{a=192.168.195.31:6789/0,b=192.168.195.33:6789/0,c=192.168.195.35:6789/0},
election epoch 10, quorum 0,1,2 a,b,c
   osdmap e150: 24 osds: 23 up, 24 in
   pgmap v12028: 4800 pgs: 4395 active+clean, 405 active+degraded; 505
GB data, 1017 GB used, 173 TB / 174 TB avail; 0B/s rd, 6303B/s wr,
2op/s; 10603/259576 degraded (4.085%)
   mdsmap e1: 0/0/1 up


-martin


On 28.03.2013 23:45, John Wilkins wrote:
 Martin,
 
 I'm just speculating: since I just rewrote the networking section and
 there is an empty mon_host value, and I do recall a chat last week
 where mon_host was considered a different setting now, maybe you might
 try specifying:
 
 [mon.a]
 mon host = store1
 mon addr = 192.168.195.31:6789
 
 etc. for monitors. I'm assuming that's not the case, but I want to
 make sure my docs are right on this point.
 
 
 On Thu, Mar 28, 2013 at 3:24 PM, Martin Mailand mar...@tuxadero.com wrote:
 Hi John,

 my ceph.conf is a bit further down in this email.

 -martin

 On 28.03.2013 23:21, John Wilkins wrote:

 Martin,

 Would you mind posting your Ceph configuration file too?  I don't see
 any value set for mon_host: 

 On Thu, Mar 28, 2013 at 1:04 PM, Martin Mailand mar...@tuxadero.com
 wrote:

 Hi Greg,

 the dump from mon.a is attached.

 -martin

 On 28.03.2013 20:55, Gregory Farnum wrote:

 Hmm. The monitor code for checking this all looks good to me. Can you
 go to one of your monitor nodes and dump the config?

 (http://ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=admin%20socket#viewing-a-configuration-at-runtime)
 -Greg

 On Thu, Mar 28, 2013 at 12:33 PM, Martin Mailand mar...@tuxadero.com
 wrote:

 Hi,

 I get the same behavior on a newly created cluster as well, with no changes to
 the cluster config at all.
 I stopped osd.1; after 20 seconds it got marked down, but it never got
 marked out.

 ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)

 -martin

 On 28.03.2013 19:48, John Wilkins wrote:

 Martin,

 Greg is talking about noout. With Ceph, you can specifically preclude
 OSDs from being marked out when down to prevent rebalancing--e.g.,
 during upgrades, short-term maintenance, etc.


 http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing
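
The flag John refers to is set and cleared with the commands below (generic commands, not taken from Martin's cluster):

 ceph osd set noout        # down OSDs are not marked out while this is set
 # ... maintenance ...
 ceph osd unset noout      # normal mark-out behaviour resumes
 ceph osd dump | grep flags    # shows whether 'noout' is currently set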

 On Thu, Mar 28, 2013 at 11:12 AM, Martin Mailand mar...@tuxadero.com
 wrote:

 Hi Greg,

 Setting the OSD out manually triggered the recovery.
 But now the question is: why is the OSD not marked out after 300
 seconds? That's a default cluster; I use the 0.59 build from your
 site, and I didn't change any value except for the crushmap.

 That's my ceph.conf.

 -martin

 [global]
  auth cluster required = none
  auth service required = none
  auth client required = none
 #   log file = 
  log_max_recent=100
  log_max_new=100

 [mon]
  mon data = /data/mon.$id
 [mon.a]
  host = store1
  mon addr = 192.168.195.31:6789
 [mon.b]
  host = store3
  mon addr = 192.168.195.33:6789
 [mon.c]
  host = store5
  mon addr = 192.168.195.35:6789
 [osd]
  journal aio = true
  osd data = /data/osd.$id
  osd mount options btrfs = rw,noatime,nodiratime,autodefrag
  osd mkfs options btrfs = -n 32k -l 32k

 [osd.0]
  host = store1
  osd journal = /dev/sdg1
  btrfs devs = /dev/sdc
 [osd.1]
  host = store1
  osd journal = /dev/sdh1
  btrfs devs = /dev/sdd
 [osd.2]
  host = store1
  osd journal = /dev/sdi1
  btrfs devs = /dev/sde
 [osd.3]
  host = store1
  osd journal = /dev/sdj1
  btrfs devs = /dev/sdf
 [osd.4]
  host = store2
  osd journal = /dev/sdg1
  btrfs devs = /dev/sdc
 [osd.5]
  host = store2
  osd journal = /dev/sdh1
  btrfs devs = /dev/sdd
 [osd.6]
  host = store2
  osd journal = /dev/sdi1
  btrfs devs = /dev/sde
 [osd.7]
  host = store2
  osd journal = /dev/sdj1
  btrfs devs = /dev/sdf
 [osd.8]
  host = store3
  osd journal = /dev/sdg1
  btrfs devs = /dev/sdc
 [osd.9]
  host = store3
  osd journal = /dev/sdh1
  btrfs devs = /dev/sdd
 [osd.10]
  host = store3
  osd journal = /dev/sdi1
  btrfs devs = /dev/sde
 [osd.11]
  host = store3
  osd journal = /dev/sdj1
  btrfs devs = /dev/sdf
 [osd.12]
  host = store4
  osd journal = /dev/sdg1
  btrfs devs = /dev/sdc

Re: [ceph-users] Mon crash

2013-03-28 Thread Martin Mailand
Hi Joao,

thanks for catching that up.

-martin

On 28.03.2013 20:03, Joao Eduardo Luis wrote:
 
 Hi Martin,
 
 As John said in his reply, these should be reported to ceph-devel (CC'ing).
 
 Anyway, this is bug #4519 [1].  It was introduced after 0.58, released
 under 0.59 and is already fixed in master.  As far as we can tell, only
 when using auth none will anyone using 0.59 stumble upon it.
 
   -Joao
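
Since the crash only triggers with auth none, one stopgap until moving past 0.59 (an assumption here: cephx keys are already provisioned, which this thread does not cover) is to switch the cluster back to cephx:

 [global]
         auth cluster required = cephx
         auth service required = cephx
         auth client required = cephx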


osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)

2013-02-14 Thread Martin Mailand
Hi List,

I can reproduce this assertion reliably; how can I help debug it?


-martin

(Reading database ... 52246 files and directories currently
installed.)
Preparing to replace linux-firmware 1.79 (using
.../linux-firmware_1.79.1_all.deb) ...
Unpacking replacement linux-firmware ...
osdc/ObjectCacher.cc: In function 'void
ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
tid_t, int)' thread 7f72b7fff700 time 2013-02-14 16:04:48.867285
osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
unsigned long, int)+0xd68) [0x7f72d4050848]
 2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f72d405742b]
 3: (Context::complete(int)+0xa) [0x7f72d400f9ba]
 4: (librbd::C_Request::finish(int)+0x85) [0x7f72d403f145]
 5: (Context::complete(int)+0xa) [0x7f72d400f9ba]
 6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f72d40241b7]
 7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f72d33db16d]
 8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f72d3444e50]
 9: (()+0x7e9a) [0x7f72d03c7e9a]
 10: (clone()+0x6d) [0x7f72d00f4cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted


Re: osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)

2013-02-14 Thread Martin Mailand
Hi Sage,

everything is on 0.56.2 and the cluster is healthy.
I can reproduce it with an apt-get upgrade inside the VM; the VM OS is
Ubuntu 12.04. Most of the time the assertion fires when the linux-firmware
.deb is updated; see the log in my first email.
But I use a custom-built qemu version (1.4-rc1), which was built against
0.56.2.
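
Because the assertion fires inside librbd's ObjectCacher, i.e. the RBD writeback cache path, one possible stopgap while tracker #2947 is open (an assumption, not something suggested in this thread) is to run the guests without the RBD cache, for example in the [client] section of ceph.conf on the qemu host:

 [client]
         rbd cache = false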


root@store1:~# ceph -s
   health HEALTH_OK
   monmap e1: 1 mons at {a=192.168.195.33:6789/0}, election epoch 1,
quorum 0 a
   osdmap e160: 20 osds: 20 up, 20 in
   pgmap v28314: 3264 pgs: 3264 active+clean; 437 GB data, 1027 GB
used, 144 TB / 145 TB avail
   mdsmap e1: 0/0/1 up

root@store1:~# ceph --version
ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)


root@compute4:~# dpkg -l|grep 'rbd\|rados\|qemu'
ii  librados20.56.2-1precise
RADOS distributed object store client library
ii  librbd1  0.56.2-1precise
RADOS block device client library
ii  qemu-common  1.4.0-rc1-vdsp1.0
qemu common functionality (bios, documentation, etc)
ii  qemu-kvm 1.4.0-rc1-vdsp1.0
Full virtualization on i386 and amd64 hardware
ii  qemu-utils   1.4.0-rc1-vdsp1.0
qemu utilities


-martin

On 14.02.2013 18:18, Sage Weil wrote:
 Hi Martin-
 
 On Thu, 14 Feb 2013, Martin Mailand wrote:
 Hi List,

 I can reproduce this assertion reliably; how can I help debug it?
 
 Can you describe the workload?  Are the OSDs also running 0.56.2(+)?  Any 
 other activity on the server side (data migration, OSD failure, etc.) that 
 may have contributed?
 
 We just reopened http://tracker.ceph.com/issues/2947 to track this.  I'm 
 working on reproducing it now as well.
 
 Thanks!
 sage
 
 
 


 -martin

 (Reading database ... 52246 files and directories currently
 installed.)
 Preparing to replace linux-firmware 1.79 (using
 .../linux-firmware_1.79.1_all.deb) ...
 Unpacking replacement linux-firmware ...
 osdc/ObjectCacher.cc: In function 'void
 ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
 tid_t, int)' thread 7f72b7fff700 time 2013-02-14 16:04:48.867285
 osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
  ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
  1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
 unsigned long, int)+0xd68) [0x7f72d4050848]
  2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f72d405742b]
  3: (Context::complete(int)+0xa) [0x7f72d400f9ba]
  4: (librbd::C_Request::finish(int)+0x85) [0x7f72d403f145]
  5: (Context::complete(int)+0xa) [0x7f72d400f9ba]
  6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f72d40241b7]
  7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f72d33db16d]
  8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f72d3444e50]
  9: (()+0x7e9a) [0x7f72d03c7e9a]
  10: (clone()+0x6d) [0x7f72d00f4cbd]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
 needed to interpret this.
 terminate called after throwing an instance of 'ceph::FailedAssertion'
 Aborted


Re: SSD journal suggestion

2012-11-07 Thread Martin Mailand

Hi,

I have 16 SAS disks on an LSI 9266-8i and 4 Intel 520 SSDs on an HBA; the 
node has dual 10G Ethernet. The clients are 4 nodes with dual 10GbE, and as 
a test I run rados bench on each client. The aggregated write speed is 
around 1.6 GB/s with single replication.


In the first configuration I had the SSDs on the RAID controller as 
well, but then I saturated the PCIe 2.0 x8 interface of the 
RAID controller, so I now use a second controller for the SSDs.
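
As a rough sanity check on that (rounded numbers, not from the original mail): PCIe 2.0 carries about 500 MB/s per lane per direction, so an x8 slot tops out around 4 GB/s raw, or roughly 3.2 GB/s after protocol overhead. With 1.6 GB/s of client writes and both journals and data disks behind the same controller, every byte crosses it twice, i.e. about 3.2 GB/s, which is right at that limit and explains why moving the journal SSDs to a separate HBA helped.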



-martin


On 07.11.2012 17:41, Mark Nelson wrote:

Well, local, but still over tcp.  Right now I'm focusing on pushing the
osds/filestores as far as I can, and after that I'm going to setup a
bonded 10GbE network to see what kind of messenger bottlenecks I run
into.  Sadly the testing is going slower than I would like.




Re: SSD journal suggestion

2012-11-07 Thread Martin Mailand

Hi,

I tested an Arista 7150S-24 and an HP 5900, and in a few weeks I will get a 
Mellanox MSX1016. At the moment the Arista is my favourite.
For the dual 10GbE NICs I tested the Intel X520-DA2 and the Mellanox 
ConnectX-3. My favourite is the Intel X520-DA2.


-martin

On 07.11.2012 22:14, Gandalf Corvotempesta wrote:

2012/11/7 Martin Mailand mar...@tuxadero.com:

I have 16 SAS disks on an LSI 9266-8i and 4 Intel 520 SSDs on an HBA; the node
has dual 10G Ethernet. The clients are 4 nodes with dual 10GbE, and as a test I
run rados bench on each client. The aggregated write speed is around 1.6 GB/s
with single replication.


Just out of curiosity, which switches do you have?




Re: SSD journal suggestion

2012-11-07 Thread Martin Mailand

Hi Stefan,

deep buffers mean latency spikes; you should go for low switching 
latency instead. The HP 5900 has a latency of 1 ms, the Arista and Mellanox around 250 ns.

And you should keep the price in mind: the HP 5900 costs three times as much as the Mellanox.

-martin

On 07.11.2012 22:44, Stefan Priebe wrote:

On 07.11.2012 22:35, Martin Mailand wrote:

Hi,

I tested an Arista 7150S-24 and an HP 5900, and in a few weeks I will get a
Mellanox MSX1016. At the moment the Arista is my favourite.
For the dual 10GbE NICs I tested the Intel X520-DA2 and the Mellanox
ConnectX-3. My favourite is the Intel X520-DA2.


That's pretty interesting; I'll get the HP 5900 and HP 5920 in a few weeks.
HP told me the deep packet buffers of the HP 5920 will boost the
performance and that it should be used for storage-related workloads.

Greets,
Stefan


Re: SSD journal suggestion

2012-11-07 Thread Martin Mailand

Hi,

I *think* the HP is Broadcom-based and the Arista is Fulcrum-based; I 
don't know which chips Mellanox is using.


Our NOC tested both of them, and the Arista was the clear winner, at 
least for our workload.


-martin

On 07.11.2012 22:59, Stefan Priebe wrote:

HP told me they all use the same chips, and that Arista measures latency while
only one port is in use, whereas HP guarantees the latency when all ports are in
use. Whether this is correct or just something HP told me, I don't know. They
told me the Arista is slower and the statistics are not comparable...



Re: SSD journal suggestion

2012-11-07 Thread Martin Mailand

Good question; we probably just don't have enough experience with IPoIB.
But it looks good on paper, so it's definitely worth a try.

-martin

On 07.11.2012 23:28, Gandalf Corvotempesta wrote:

2012/11/7 Martin Mailand mar...@tuxadero.com:

I tested an Arista 7150S-24 and an HP 5900, and in a few weeks I will get a
Mellanox MSX1016. At the moment the Arista is my favourite.


Why not infiniband?




Ceph benchmark high wait on journal device

2012-10-15 Thread Martin Mailand

Hi,

inspired by the performance tests Mark did, I tried to put together my own.
I have four OSD processes on one node; each process has an Intel 710 SSD 
for its journal and 4 SAS disks in RAID 0 behind an LSI 9266-8i.
If I test the SSDs with fio they are quite fast and the w_await time is 
quite low.
But if I run rados bench on the cluster, the w_await times for the 
journal devices are quite high (around 20-40 ms).

I thought the SSDs would do better; any idea what happened here?

 -martin

Logs:

/dev/sd{c,d,e,f}
Intel SSD 710 200G

/dev/sd{g,h,i,j}
each 4 x SAS on LSI 9266-8i Raid 0

fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2 
-ioengine libaio -direct 1 -bs 256k
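
One difference between this fio job and the rados bench load is the number of outstanding writes: the iostat samples below show avgqu-sz of roughly 7-15 on the journal SSDs, while the fio job above runs at iodepth 1. A variant closer to that load would be something like the following (a sketch; job name, runtime and queue depth are arbitrary guesses, not tuned values):

 fio --name=journal-sim --filename=/dev/sdc2 --rw=write --bs=256k \
     --ioengine=libaio --direct=1 --iodepth=16 --size=10G \
     --runtime=60 --time_based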


Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

- snip -
sdc   0,00 0,000,00  809,20 0,00   202,30 
512,00 0,961,190,001,19   1,18  95,84

- snap -



rados bench -p rbd 300 write -t 16

2012-10-15 17:53:17.058383 min lat: 0.035382 max lat: 0.469604 avg lat: 
0.189553

   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   300      16     25329     25313   337.443       324  0.274815  0.189553
 Total time run: 300.169843
Total writes made:  25329
Write size: 4194304
Bandwidth (MB/sec): 337.529

Stddev Bandwidth:   25.1568
Max bandwidth (MB/sec): 372
Min bandwidth (MB/sec): 0
Average Latency:0.189597
Stddev Latency: 0.0641609
Max latency:0.469604
Min latency:0.035382


during the rados bench test.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  20,380,00   16,208,870,00   54,55

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0,0041,200,00   12,40 0,00 0,35 
57,42 0,000,310,000,31   0,31   0,38
sdb   0,00 0,000,000,00 0,00 0,00 
0,00 0,000,000,000,00   0,00   0,00
sdc   0,00 0,000,00  332,80 0,00   139,67 
859,53 7,36   22,090,00   22,09   2,12  70,42
sdd   0,00 0,000,00  391,60 0,00   175,84 
919,6215,59   39,620,00   39,62   2,40  93,80
sde   0,00 0,000,00  342,00 0,00   147,39 
882,59 8,54   24,890,00   24,89   2,18  74,58
sdf   0,00 0,000,00  362,20 0,00   162,72 
920,0515,35   42,500,00   42,50   2,60  94,20
sdg   0,00 0,000,00  522,00 0,00   139,20 
546,13 0,280,540,000,54   0,10   5,26
sdh   0,00 0,000,00  672,00 0,00   179,20 
546,13 9,67   14,420,00   14,42   0,61  41,18
sdi   0,00 0,000,00  555,00 0,00   148,00 
546,13 0,320,570,000,57   0,10   5,46
sdj   0,00 0,000,00  582,00 0,00   155,20 
546,13 0,510,870,000,87   0,12   6,96


100 seconds later

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  22,920,00   19,579,250,00   48,25

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0,0040,800,00   15,60 0,00 0,36 
47,08 0,000,220,000,22   0,22   0,34
sdb   0,00 0,000,000,00 0,00 0,00 
0,00 0,000,000,000,00   0,00   0,00
sdc   0,00 0,000,00  386,60 0,00   168,33 
891,7012,11   31,080,00   31,08   2,25  86,86
sdd   0,00 0,000,00  405,00 0,00   183,06 
925,6815,68   38,700,00   38,70   2,34  94,90
sde   0,00 0,000,00  411,00 0,00   185,06 
922,1515,58   38,090,00   38,09   2,33  95,92
sdf   0,00 0,000,00  387,00 0,00   168,33 
890,7912,19   31,480,00   31,48   2,26  87,48
sdg   0,00 0,000,00  646,20 0,00   171,22 
542,64 0,420,650,000,65   0,10   6,70
sdh   0,0085,600,40  797,00 0,01   192,97 
495,6510,95   13,73   32,50   13,72   0,55  44,22
sdi   0,00 0,000,00  678,20 0,00   180,01 
543,59 0,450,670,000,67   0,10   6,76
sdj   0,00 0,000,00  639,00 0,00   169,61 
543,61 0,360,570,000,57   0,10   6,32


 --admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump

Re: Ceph benchmark high wait on journal device

2012-10-15 Thread Martin Mailand

Hi Mark,

I think there is no differences between the 9266-8i and the 9265-8i, 
except for the cache vault and the angel of the SAS connectors.
In the last test, which I posted, the SSDs where connected to the 
onboard SATA ports. Further test showed if I reduce the the object size 
(the -b option) to 1M, 512k, 256k the latency almost vanished.

With 256k the w_wait was around 1ms.
So my observation shows almost the different of yours.

I use a singel controller with a dual expander backplane.

That's the baby.

http://85.214.49.87/ceph/testlab/IMAG0018.jpg

BTW, is there a nice way to format the output of ceph --admin-daemon 
ceph-osd.0.asok perf dump?
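
One simple option, assuming Python is installed on the OSD node, is to pipe the admin-socket output, which is plain JSON, through a pretty-printer:

 ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | python -m json.tool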



-martin

On 15.10.2012 21:50, Mark Nelson wrote:

Hi Martin,

I haven't tested the 9266-8i specifically, but it may behave similarly
to the 9265-8i.  This is just a theory, but I get the impression that
the controller itself introduces some latency getting data to disk, and
that it may get worse as more data is pushed across the controller.
That seems to be the case even if the data is not going to the disk in
question.  Are you using a single controller with expanders?  On some of
our nodes that use a single controller with lots of expanders, I've
noticed high IO wait times, especially when doing lots of small writes.

Mark

On 10/15/2012 11:12 AM, Martin Mailand wrote:

Hi,

inspired by the performance tests Mark did, I tried to put together my
own.
I have four OSD processes on one node; each process has an Intel 710 SSD
for its journal and 4 SAS disks in RAID 0 behind an LSI 9266-8i.
If I test the SSDs with fio they are quite fast and the w_await time is
quite low.
But if I run rados bench on the cluster, the w_await times for the
journal devices are quite high (around 20-40 ms).
I thought the SSDs would do better; any idea what happened here?

-martin

Logs:

/dev/sd{c,d,e,f}
Intel SSD 710 200G

/dev/sd{g,h,i,j}
each 4 x SAS on LSI 9266-8i Raid 0

fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2
-ioengine libaio -direct 1 -bs 256k

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
- snip -
sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18
95,84
- snap -



rados bench -p rbd 300 write -t 16

2012-10-15 17:53:17.058383min lat: 0.035382 max lat: 0.469604 avg lat:
0.189553
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
300 16 25329 25313 337.443 324 0.274815 0.189553
Total time run: 300.169843
Total writes made: 25329
Write size: 4194304
Bandwidth (MB/sec): 337.529

Stddev Bandwidth: 25.1568
Max bandwidth (MB/sec): 372
Min bandwidth (MB/sec): 0
Average Latency: 0.189597
Stddev Latency: 0.0641609
Max latency: 0.469604
Min latency: 0.035382


during the rados bench test.

avg-cpu: %user %nice %system %iowait %steal %idle
20,38 0,00 16,20 8,87 0,00 54,55

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12
70,42
sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40
93,80
sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18
74,58
sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60
94,20
sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10
5,26
sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61
41,18
sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10
5,46
sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12
6,96

100 seconds later

avg-cpu: %user %nice %system %iowait %steal %idle
22,92 0,00 19,57 9,25 0,00 48,25

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25
86,86
sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34
94,90
sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33
95,92
sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79 12,19 31,48 0,00 31,48 2,26
87,48
sdg 0,00 0,00 0,00 646,20 0,00 171,22 542,64 0,42 0,65 0,00 0,65 0,10
6,70
sdh 0,00 85,60 0,40 797,00 0,01 192,97 495,65 10,95 13,73 32,50 13,72
0,55 44,22
sdi 0,00 0,00 0,00 678,20 0,00 180,01 543,59 0,45 0,67 0,00 0,67 0,10
6,76
sdj 0,00 0,00 0,00 639,00 0,00 169,61 543,61 0,36 0,57 0,00 0,57 0,10
6,32

--admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump
{filestore:{journal_queue_max_ops:500,journal_queue_ops:0,journal_ops:34653,journal_queue_max_bytes:104857600,journal_queue_bytes:0,journal_bytes:86821481160,journal_latency:{avgcount:34653,sum:3458.68},journal_wr:19372,journal_wr_bytes:{avgcount:19372,sum:87026655232

rbd map error with new rbd format

2012-09-12 Thread Martin Mailand

Hi,

whilst testing the new rbd layering feature I found a problem with rbd 
map. It seems rbd map doesn't support the new format.


-martin


ceph -v
ceph version 0.51-265-gc7d11cd 
(commit:c7d11cd7b813a47167108c160358f70ec1aab7d6)



rbd create --size 10 --new-format new
rbd map new
add failed: (2) No such file or directory


rbd create --size 10 old
rbd map old
rbd showmapped
id  poolimage   snapdevice
1   rbd old -   /dev/rbd1


rbd info new
rbd image 'new':
size 10 MB in 25000 objects
order 22 (4096 KB objects)
block_name_prefix: rbd_data.101e1a89b511
old format: False
features: layering
rbd info old
rbd image 'old':
size 10 MB in 25000 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.1021.23697452
old format: True
features:
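
For what it's worth (my reading, not confirmed in this thread): the kernel rbd client at this point only understands format-1 images, so anything that has to be mapped with rbd map needs to stay in the old format, while new-format images are only usable through librbd (e.g. qemu). The image names below are placeholders:

 # old-format image: can be mapped with the kernel client
 rbd create --size 1024 mappable-image
 rbd map mappable-image

 # new-format image: gets layering/cloning, but is librbd-only for now
 rbd create --size 1024 --new-format clonable-image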


Re: v0.51 released

2012-08-27 Thread Martin Mailand

Hi Sage,

is the rbd layering/cloning already testable in this release?
Do you have a link to the docs on how to use it?

Best Regards,
 martin


On 26.08.2012 17:58, Sage Weil wrote:

The latest development release v0.51 is ready. Notable changes include:

  * crush: tunables documented; feature bit now present and enforced
  * osd: various fixes for out-of-order op replies
  * osd: several rare peering cases fixed
  * osd: fixed detection of EIO errors from fs on read
  * osd: new 'lock' rados class for generic object locking
  * librbd: fixed memory leak on discard
  * librbd: image layering/cloning
  * radosgw: fix range header for large objects, ETag quoting, GMT dates,
other compatibility fixes
  * mkcephfs: fix for default keyring, osd data/journal locations
  * wireshark: ceph protocol dissector patch updated
  * ceph.spec: fixed packaging problem with crush headers

Full RBD cloning support will be in place in v0.52, as will a refactor of
the messenger code with many bug fixes in the socket failure handling.
This is available for testing now in 'next' for the adventurous. Improved
OSD scrubbing is also coming soon. We should (finally) be building some
release RPMs for v0.52 as well.

You can get v0.51 from the usual locations:

  * Git at git://github.com/ceph/ceph.git
  * Tarball at http://ceph.newdream.net/download/ceph-0.51.tar.gz
  * For Debian/Ubuntu packages, see
http://ceph.newdream.net/docs/master/install/debian


Re: RBD layering design draft

2012-06-17 Thread Martin Mailand

Hi,
what about 'locked', 'unlocked', 'unlocking'?

-martin

On 16.06.2012 17:11, Sage Weil wrote:

On Fri, 15 Jun 2012, Yehuda Sadeh wrote:

On Fri, Jun 15, 2012 at 5:46 PM, Sage Weils...@inktank.com  wrote:

Looks good!  Couple small things:


 $ rbd unpreserve pool/image@snap


Is 'preserve' and 'unpreserve' the verbiage we want to use here?  Not sure
I have a better suggestion, but preserve is unusual.



freeze, thaw/unfreeze?


Freeze/thaw usually mean something like quiesce I/O or read-only, usually
temporarily.  What we actually mean is you can't delete this.  Maybe
pin/unpin?  preserve/unpreserve may be fine, too!

sage



Re: Unmountable btrfs filesystems

2012-06-17 Thread Martin Mailand

Hi Wido,
until recently there were still a few bugs in btrfs which could be hit 
quite easily with ceph. The last big one was fixed here 
http://www.spinics.net/lists/ceph-devel/msg06270.html


I am running a ceph cluster with btrfs on a 3.5-rc2 without a problem, 
even under heavy test load.


Hope that helps.

-martin


On 16.06.2012 20:46, Wido den Hollander wrote:

I tried various kernels, the most recent 3.3.0 from kernel.ubuntu.com,
but I'm still seeing this.

Is anyone seeing the same or did everybody migrate away to ext4 or XFS?

I still prefer btrfs due to the snapshotting, but losing all these
OSDs all the time is getting kind of frustrating.

Any thoughts or comments?



Re: Ceph on btrfs 3.4rc

2012-05-24 Thread Martin Mailand

Hi,
the ceph cluster has been running under heavy load for the last 13 hours 
without a problem; dmesg is empty and the performance is good.


-martin

On 23.05.2012 21:12, Martin Mailand wrote:

this patch has been running for 3 hours without a BUG and without the WARNING.
I will let it run overnight and report tomorrow.
It looks very good ;-)



Re: Ceph on btrfs 3.4rc

2012-05-23 Thread Martin Mailand

Hi Josef,

this patch has been running for 3 hours without a BUG and without the WARNING.
I will let it run overnight and report tomorrow.
It looks very good ;-)

-martin

On 23.05.2012 17:02, Josef Bacik wrote:

Ok give this a shot, it should do it.  Thanks,



Re: Ceph on btrfs 3.4rc

2012-05-18 Thread Martin Mailand

Hi Josef,
there was one line before the bug.

[  995.725105] couldn't find orphan item for 524


On 18.05.2012 16:48, Josef Bacik wrote:

Ok hopefully this will print something out that makes sense.  Thanks,


-martin

[  241.754693] Btrfs loaded
[  241.755148] device fsid 43c4ebd9-3824-4b07-a710-3ec39b012759 devid 1 
transid 4 /dev/sdc

[  241.755750] btrfs: setting nodatacow
[  241.755753] btrfs: enabling auto defrag
[  241.755754] btrfs: disk space caching is enabled
[  241.755755] btrfs flagging fs with big metadata feature
[  241.768683] device fsid e7e7f2df-6a4e-45b1-85cc-860cda849953 devid 1 
transid 4 /dev/sdd

[  241.769028] btrfs: setting nodatacow
[  241.769030] btrfs: enabling auto defrag
[  241.769031] btrfs: disk space caching is enabled
[  241.769032] btrfs flagging fs with big metadata feature
[  241.781360] device fsid 203fdd4c-baac-49f8-bfdb-08486c937989 devid 1 
transid 4 /dev/sde

[  241.781854] btrfs: setting nodatacow
[  241.781859] btrfs: enabling auto defrag
[  241.781861] btrfs: disk space caching is enabled
[  241.781864] btrfs flagging fs with big metadata feature
[  242.713741] device fsid 95c36e12-0098-48d7-a08d-9d54a299206b devid 1 
transid 4 /dev/sdf

[  242.714110] btrfs: setting nodatacow
[  242.714118] btrfs: enabling auto defrag
[  242.714121] btrfs: disk space caching is enabled
[  242.714125] btrfs flagging fs with big metadata feature
[  995.725105] couldn't find orphan item for 524
[  995.725126] [ cut here ]
[  995.725134] kernel BUG at fs/btrfs/inode.c:2227!
[  995.725143] invalid opcode:  [#1] SMP
[  995.725158] CPU 0
[  995.725162] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
coretemp ghash_clmulni_intel aesni_intel bonding cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport ixgbe usbhid hid isci libsas megaraid_sas 
scsi_transport_sas igb dca mdio

[  995.725285]
[  995.725290] Pid: 2972, comm: ceph-osd Tainted: G C 
3.4.0-rc7.2012051800+ #14 Supermicro X9SRi/X9SRi
[  995.725324] RIP: 0010:[a028535f]  [a028535f] 
btrfs_orphan_del+0x14f/0x160 [btrfs]

[  995.725354] RSP: 0018:881016ed9d18  EFLAGS: 00010292
[  995.725364] RAX: 0037 RBX: 88101485fdb0 RCX: 

[  995.725378] RDX:  RSI: 0082 RDI: 
0246
[  995.725392] RBP: 881016ed9d58 R08:  R09: 

[  995.725405] R10:  R11: 00b6 R12: 
88101efe9f90
[  995.725419] R13: 88101efe9c00 R14: 0001 R15: 
0001
[  995.725433] FS:  7f58e5dbc700() GS:88107fc0() 
knlGS:

[  995.725466] CS:  0010 DS:  ES:  CR0: 80050033
[  995.725492] CR2: 03f28000 CR3: 00101acac000 CR4: 
000407f0
[  995.725522] DR0:  DR1:  DR2: 

[  995.725551] DR3:  DR6: 0ff0 DR7: 
0400
[  995.725581] Process ceph-osd (pid: 2972, threadinfo 881016ed8000, 
task 88101618)

[  995.725626] Stack:
[  995.725646]  0c02 88101deaf550 881016ed9d38 
88101deaf550
[  995.725700]   88101efe9c00 88101485fdb0 
880be890c1e0
[  995.725757]  881016ed9e08 a02897a8 88101485fdb0 


[  995.725807] Call Trace:
[  995.725835]  [a02897a8] btrfs_truncate+0x5e8/0x6d0 [btrfs]
[  995.725869]  [a028b121] btrfs_setattr+0xc1/0x1b0 [btrfs]
[  995.725898]  [811955c3] notify_change+0x183/0x320
[  995.725925]  [8117889e] do_truncate+0x5e/0xa0
[  995.725951]  [81178a24] sys_truncate+0x144/0x1b0
[  995.725979]  [8165fd29] system_call_fastpath+0x16/0x1b
[  995.726006] Code: 45 31 ff e9 3c ff ff ff 48 8b b3 58 fe ff ff 48 85 
f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 08 48 2e a0 31 c0 e8 09 7c 
3c e1 0f 0b 48 8b 73 40 eb ea 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
[  995.726221] RIP  [a028535f] btrfs_orphan_del+0x14f/0x160 
[btrfs]

[  995.726258]  RSP 881016ed9d18
[  995.726574] ---[ end trace 4bde8f513a6d106d ]---



Re: Ceph on btrfs 3.4rc

2012-05-18 Thread Martin Mailand

Hi Josef,
now I get
[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, 
root being deleted no


-martin

On 18.05.2012 21:01, Josef Bacik wrote:

*sigh*  ok try this, hopefully it will point me in the right direction.  Thanks,



[  126.389847] Btrfs loaded
[  126.390284] device fsid 0c9d8c6d-2982-4604-b32a-fc443c4e2c50 devid 1 
transid 4 /dev/sdc

[  126.391246] btrfs: setting nodatacow
[  126.391252] btrfs: enabling auto defrag
[  126.391254] btrfs: disk space caching is enabled
[  126.391257] btrfs flagging fs with big metadata feature
[  126.405700] device fsid e8a0dc27-8714-49bd-a14f-ac37525febb1 devid 1 
transid 4 /dev/sdd

[  126.406162] btrfs: setting nodatacow
[  126.406167] btrfs: enabling auto defrag
[  126.406170] btrfs: disk space caching is enabled
[  126.406172] btrfs flagging fs with big metadata feature
[  126.419819] device fsid f67cd977-ebf4-41f2-9821-f2989e985954 devid 1 
transid 4 /dev/sde

[  126.420198] btrfs: setting nodatacow
[  126.420206] btrfs: enabling auto defrag
[  126.420210] btrfs: disk space caching is enabled
[  126.420214] btrfs flagging fs with big metadata feature
[  127.274555] device fsid 3001355e-c2e2-46c7-9eba-dfecb441d6a6 devid 1 
transid 4 /dev/sdf

[  127.274980] btrfs: setting nodatacow
[  127.274986] btrfs: enabling auto defrag
[  127.274989] btrfs: disk space caching is enabled
[  127.274992] btrfs flagging fs with big metadata feature
[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, 
root being deleted no

[ 2081.142735] [ cut here ]
[ 2081.142750] kernel BUG at fs/btrfs/inode.c:2228!
[ 2081.142766] invalid opcode:  [#1] SMP
[ 2081.142786] CPU 10
[ 2081.142794] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ioatdma ses 
enclosure mac_hid lp parport usbhid hid megaraid_sas isci libsas 
scsi_transport_sas igb ixgbe dca mdio

[ 2081.142974]
[ 2081.142985] Pid: 2966, comm: ceph-osd Tainted: G C 
3.4.0-rc7.2012051802+ #16 Supermicro X9SRi/X9SRi
[ 2081.143020] RIP: 0010:[a0269383]  [a0269383] 
btrfs_orphan_del+0x173/0x180 [btrfs]

[ 2081.143080] RSP: 0018:881016d83d18  EFLAGS: 00010292
[ 2081.143096] RAX: 0062 RBX: 881017ad4770 RCX: 

[ 2081.143115] RDX:  RSI: 0082 RDI: 
0246
[ 2081.143134] RBP: 881016d83d58 R08:  R09: 

[ 2081.143154] R10:  R11: 0116 R12: 
88101e7baf90
[ 2081.143173] R13: 88101e7bac00 R14: 0001 R15: 
0001
[ 2081.143193] FS:  7fcc1e736700() GS:88107fd4() 
knlGS:

[ 2081.143243] CS:  0010 DS:  ES:  CR0: 80050033
[ 2081.143274] CR2: 09269000 CR3: 00101ba87000 CR4: 
000407e0
[ 2081.143308] DR0:  DR1:  DR2: 

[ 2081.143341] DR3:  DR6: 0ff0 DR7: 
0400
[ 2081.143376] Process ceph-osd (pid: 2966, threadinfo 881016d82000, 
task 881023c744a0)

[ 2081.143424] Stack:
[ 2081.143447]  0c07 88101e1dac30 881016d83d38 
88101e1dac30
[ 2081.143510]   88101e7bac00 881017ad4770 
88101f0f7d60
[ 2081.143572]  881016d83e08 a026d7c8 881017ad4770 


[ 2081.143634] Call Trace:
[ 2081.143684]  [a026d7c8] btrfs_truncate+0x5e8/0x6d0 [btrfs]
[ 2081.143737]  [a026f141] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 2081.143773]  [811955c3] notify_change+0x183/0x320
[ 2081.143807]  [8117889e] do_truncate+0x5e/0xa0
[ 2081.143839]  [81178a24] sys_truncate+0x144/0x1b0
[ 2081.143873]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 2081.143903] Code: a0 49 8b 8d f0 02 00 00 8b 53 48 4c 0f 44 c0 48 85 
f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 10 88 2c a0 31 c0 e8 e5 3b 
3e e1 0f 0b 48 8b 73 40 eb ea 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10
[ 2081.144199] RIP  [a0269383] btrfs_orphan_del+0x173/0x180 
[btrfs]

[ 2081.144258]  RSP 881016d83d18
[ 2081.144614] ---[ end trace 8d0829d100639242 ]---



Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Martin Mailand

Hi Josef,

somehow I still get the kernel BUG messages; I used your patch from the 
16th against rc7.


-martin

On 16.05.2012 21:20, Josef Bacik wrote:

Hrm ok so I finally got some time to try and debug it and let the test run a
good long while (5 hours almost) and I couldn't hit either the original bug or
the one you guys were hitting.  So either my extra little bit of locking did the
trick or I get to keep my Worst reproducer ever award.  Can you guys give this
one a whirl, and if it panics send the entire dmesg, since it should spit out a
WARN_ON() to let me know whether what I thought was the problem actually was it.  Thanks,


[ 2868.813236] [ cut here ]
[ 2868.813297] kernel BUG at fs/btrfs/inode.c:2220!
[ 2868.813355] invalid opcode:  [#2] SMP
[ 2868.813479] CPU 2
[ 2868.813516] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb megaraid_sas dca mdio

[ 2868.814871]
[ 2868.814925] Pid: 5325, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 2868.815108] RIP: 0010:[a02212f2]  [a02212f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 2868.815236] RSP: 0018:880296e89d18  EFLAGS: 00010282
[ 2868.815294] RAX: fffe RBX: 88101ef3c390 RCX: 
00562497
[ 2868.815355] RDX: 00562496 RSI: 88101ef1 RDI: 
ea00407bc400
[ 2868.815416] RBP: 880296e89d58 R08: 60ef8fd0 R09: 
a01f8c6a
[ 2868.815476] R10:  R11: 011d R12: 
880fdf602790
[ 2868.815537] R13: 880fdf602400 R14: 0001 R15: 
0001
[ 2868.815598] FS:  7f07d5512700() GS:88107fc4() 
knlGS:

[ 2868.815675] CS:  0010 DS:  ES:  CR0: 80050033
[ 2868.815734] CR2: 0ab16000 CR3: 00082a6b2000 CR4: 
000407e0
[ 2868.815796] DR0:  DR1:  DR2: 

[ 2868.815858] DR3:  DR6: 0ff0 DR7: 
0400
[ 2868.815920] Process ceph-osd (pid: 5325, threadinfo 880296e88000, 
task 8810170616e0)

[ 2868.815997] Stack:
[ 2868.816049]  0c07 88101ef12960 880296e89d38 
88101ef12960
[ 2868.816262]   880fdf602400 88101ef3c390 
880b4ce2f260
[ 2868.816485]  880296e89e08 a0225628 88101ef3c390 


[ 2868.816694] Call Trace:
[ 2868.816755]  [a0225628] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 2868.816817]  [81188afd] ? path_lookupat+0x6d/0x750
[ 2868.816880]  [a0227021] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 2868.816940]  [811955c3] notify_change+0x183/0x320
[ 2868.816998]  [8117889e] do_truncate+0x5e/0xa0
[ 2868.817056]  [81178a24] sys_truncate+0x144/0x1b0
[ 2868.817115]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 2868.817173] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 
80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff 
eb b8 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 2868.819501] RIP  [a02212f2] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 2868.819602]  RSP 880296e89d18
[ 2868.819703] ---[ end trace 94d17b770b376c84 ]---
[ 3249.857453] [ cut here ]
[ 3249.857481] kernel BUG at fs/btrfs/inode.c:2220!
[ 3249.857506] invalid opcode:  [#3] SMP
[ 3249.857534] CPU 0
[ 3249.857538] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb megaraid_sas dca mdio

[ 3249.857721]
[ 3249.857740] Pid: 5384, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 3249.857791] RIP: 0010:[a02212f2]  [a02212f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 3249.857847] RSP: 0018:880abe8b5d18  EFLAGS: 00010282
[ 3249.857873] RAX: fffe RBX: 8807eb8b6670 RCX: 
0077a084
[ 3249.857902] RDX: 0077a083 RSI: 88101ee497e0 RDI: 
ea00407b9240
[ 3249.857931] RBP: 880abe8b5d58 R08: 60ef8fd0 R09: 
a01f8c6a
[ 3249.857959] R10:  R11: 0153 R12: 
880d56825390
[ 3249.857988] R13: 880d56825000 R14: 0001 R15: 
0001
[ 3249.858017] FS:  7f06bd13b700() GS:88107fc0() 
knlGS:

[ 3249.858062] CS:  0010 DS:  ES:  CR0: 80050033
[ 3249.858088] CR2: 043d2000 CR3: 000e7ebe5000 CR4: 
000407f0
[ 3249.858117] DR0:  DR1:  DR2: 

[ 3249.858146] DR3:  DR6: 0ff0 DR7: 

Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Martin Mailand

Hi Josef,
no, there was nothing above. Here is another dmesg output.


Was there anything above those messages?  There should have been a WARN_ON() or
something.  If not, that's fine; I just need to know one way or the other so I can
figure out what to do next.  Thanks,

Josef


-martin

[   63.027277] Btrfs loaded
[   63.027485] device fsid 266726e1-439f-4d89-a374-7ef92d355daf devid 1 
transid 4 /dev/sdc

[   63.027750] btrfs: setting nodatacow
[   63.027752] btrfs: enabling auto defrag
[   63.027753] btrfs: disk space caching is enabled
[   63.027754] btrfs flagging fs with big metadata feature
[   63.036347] device fsid 070e2c6c-2ea5-478d-bc07-7ce3a954e2e4 devid 1 
transid 4 /dev/sdd

[   63.036624] btrfs: setting nodatacow
[   63.036626] btrfs: enabling auto defrag
[   63.036627] btrfs: disk space caching is enabled
[   63.036628] btrfs flagging fs with big metadata feature
[   63.045628] device fsid 6f7b82a9-a1b7-40c6-8b00-2c2a44481066 devid 1 
transid 4 /dev/sde

[   63.045910] btrfs: setting nodatacow
[   63.045912] btrfs: enabling auto defrag
[   63.045913] btrfs: disk space caching is enabled
[   63.045914] btrfs flagging fs with big metadata feature
[   63.831278] device fsid 46890b76-45c2-4ea2-96ee-2ea88e29628b devid 1 
transid 4 /dev/sdf

[   63.831577] btrfs: setting nodatacow
[   63.831579] btrfs: enabling auto defrag
[   63.831579] btrfs: disk space caching is enabled
[   63.831580] btrfs flagging fs with big metadata feature
[ 1521.820412] [ cut here ]
[ 1521.820424] kernel BUG at fs/btrfs/inode.c:2220!
[ 1521.820433] invalid opcode:  [#1] SMP
[ 1521.820448] CPU 4
[ 1521.820452] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses 
enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd 
aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev 
ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb dca megaraid_sas mdio

[ 1521.820562]
[ 1521.820567] Pid: 3095, comm: ceph-osd Tainted: G C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 1521.820591] RIP: 0010:[a02532f2]  [a02532f2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 1521.820616] RSP: 0018:881013da9d18  EFLAGS: 00010282
[ 1521.820626] RAX: fffe RBX: 881013a3b7f0 RCX: 
00395dcf
[ 1521.820640] RDX: 00395dce RSI: 88101df77480 RDI: 
ea004077ddc0
[ 1521.820654] RBP: 881013da9d58 R08: 60ef800010d0 R09: 
a022ac6a
[ 1521.820667] R10:  R11: 010a R12: 
88101e378790
[ 1521.820681] R13: 88101e378400 R14: 0001 R15: 
0001
[ 1521.820695] FS:  7faa45d30700() GS:88107fc8() 
knlGS:

[ 1521.820710] CS:  0010 DS:  ES:  CR0: 80050033
[ 1521.820738] CR2: 7fe0efba6010 CR3: 001016fec000 CR4: 
000407e0
[ 1521.820767] DR0:  DR1:  DR2: 

[ 1521.820796] DR3:  DR6: 0ff0 DR7: 
0400
[ 1521.820825] Process ceph-osd (pid: 3095, threadinfo 881013da8000, 
task 881013da44a0)

[ 1521.820870] Stack:
[ 1521.820889]  0c05 88101df9c230 881013da9d38 
88101df9c230
[ 1521.820939]   88101e378400 881013a3b7f0 
880c6880f840
[ 1521.820988]  881013da9e08 a0257628 881013a3b7f0 


[ 1521.821038] Call Trace:
[ 1521.821066]  [a0257628] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1521.821096]  [81188afd] ? path_lookupat+0x6d/0x750
[ 1521.821128]  [a0259021] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1521.821156]  [811955c3] notify_change+0x183/0x320
[ 1521.821183]  [8117889e] do_truncate+0x5e/0xa0
[ 1521.821209]  [81178a24] sys_truncate+0x144/0x1b0
[ 1521.821237]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 1521.821265] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 
80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff 
eb b8 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 1521.821458] RIP  [a02532f2] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 1521.821492]  RSP 881013da9d18
[ 1521.821758] ---[ end trace aee4c5fe92ee2a67 ]---
[ 6888.637508] btrfs: truncated 1 orphans
[ 7641.701736] [ cut here ]
[ 7641.701764] kernel BUG at fs/btrfs/inode.c:2220!
[ 7641.701789] invalid opcode:  [#2] SMP
[ 7641.701816] CPU 3
[ 7641.701819] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses 
enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd 
aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev 
ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
ixgbe igb dca megaraid_sas mdio

[ 7641.702000]
[ 7641.702030] Pid: 3064, comm: ceph-osd Tainted: G  D  C 
3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 7641.702081] RIP: 0010:[a02532f2]  [a02532f2] 

Re: Ceph on btrfs 3.4rc

2012-05-14 Thread Martin Mailand

Hi Josef,

On 11.05.2012 21:16, Josef Bacik wrote:

Heh duh, sorry, try this one instead.  Thanks,


With this patch I got this Bug:

[ 8233.828722] [ cut here ]
[ 8233.828737] kernel BUG at fs/btrfs/inode.c:2217!
[ 8233.828746] invalid opcode:  [#1] SMP
[ 8233.828761] CPU 1
[ 8233.828766] Modules linked in: btrfs zlib_deflate libcrc32c ses 
enclosure bonding coretemp ghash_clmulni_intel psmouse aesni_intel 
sb_edac cryptd a es_x86_64 ext2 microcode serio_raw edac_core mei(C) 
joydev ioatdma mac_hid lp parport usbhid hid isci libsas ixgbe 
scsi_transport_sas megaraid_sas igb  dca mdio

[ 8233.828885]
[ 8233.828891] Pid: , comm: ceph-osd Tainted: GWC 
3.4.0-rc6+ #6 Supermicro X9SRi/X9SRi
[ 8233.828915] RIP: 0010:[a02492d2]  [a02492d2] 
btrfs_orphan_del+0xe2/0xf0 [btrfs]

[ 8233.828947] RSP: 0018:88101ce53d18  EFLAGS: 00010282
[ 8233.828957] RAX: fffe RBX: 880d194e2c50 RCX: 
00d0a3be
[ 8233.828971] RDX: 00d0a3bd RSI: 88101de2a000 RDI: 
ea0040778a80
[ 8233.828985] RBP: 88101ce53d58 R08: 60ef8f00 R09: 
a0220c6a
[ 8233.828999] R10:  R11: 00f0 R12: 
88071bb1e790
[ 8233.829029] R13: 88071bb1e400 R14: 0001 R15: 
0001
[ 8233.829059] FS:  7fdfa179b700() GS:88107fc2() 
knlGS:

[ 8233.829104] CS:  0010 DS:  ES:  CR0: 80050033
[ 8233.829131] CR2: 0c614000 CR3: 0001df9d2000 CR4: 
000407e0
[ 8233.829160] DR0:  DR1:  DR2: 

[ 8233.829190] DR3:  DR6: 0ff0 DR7: 
0400
[ 8233.829220] Process ceph-osd (pid: , threadinfo 88101ce52000, 
task 88101b7b96e0)

[ 8233.829265] Stack:
[ 8233.829286]  0c02 88101de14cd0 88101ce53d38 
88101de14cd0
[ 8233.829336]   88071bb1e400 880d194e2c50 
881024680620
[ 8233.829386]  88101ce53e08 a024d608 880d194e2c50 


[ 8233.829436] Call Trace:
[ 8233.829472]  [a024d608] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 8233.829503]  [81188afd] ? path_lookupat+0x6d/0x750
[ 8233.829537]  [a024efc1] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 8233.829567]  [811955c3] notify_change+0x183/0x320
[ 8233.829595]  [8117889e] do_truncate+0x5e/0xa0
[ 8233.829621]  [81178a24] sys_truncate+0x144/0x1b0
[ 8233.829649]  [8165fd69] system_call_fastpath+0x16/0x1b
[ 8233.829676] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 
80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 70 73 fe ff 
eb b8  0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 8233.829875] RIP  [a02492d2] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 8233.829914]  RSP 88101ce53d18
[ 8233.830187] ---[ end trace 46dd4a711bf2979d ]---


-martin

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Martin Mailand

Hi Josef,

Am 11.05.2012 15:31, schrieb Josef Bacik:

That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you
are on mainline.  Thanks,


I tried your patch against mainline, after a few minutes I hit this bug.

[ 1078.523655] [ cut here ]
[ 1078.523667] kernel BUG at fs/btrfs/inode.c:2211!
[ 1078.523676] invalid opcode:  [#1] SMP
[ 1078.523692] CPU 5
[ 1078.523696] Modules linked in: btrfs zlib_deflate libcrc32c mlx4_en 
bonding ext2 coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core mei(C) joydev ses ioatdma 
enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid 
igb megaraid_sas mlx4_core dca

[ 1078.523813]
[ 1078.523818] Pid: 4108, comm: ceph-osd Tainted: G C 
3.4.0-rc6+ #5 Supermicro X9SRi/X9SRi
[ 1078.523841] RIP: 0010:[a022b2a2]  [a022b2a2] 
btrfs_orphan_del+0xb2/0xc0 [btrfs]

[ 1078.523867] RSP: 0018:880ff14a5d38  EFLAGS: 00010282
[ 1078.523877] RAX: fffe RBX: 880ff004d6f0 RCX: 
00117400
[ 1078.523891] RDX: 001173ff RSI: 8810279f6ea0 RDI: 
ea00409e7d80
[ 1078.523905] RBP: 880ff14a5d58 R08: 60ef80001400 R09: 
a0202c6a
[ 1078.523918] R10:  R11: 00ba R12: 
0001
[ 1078.523932] R13: 881017663c00 R14: 0001 R15: 
88101776f5a0
[ 1078.523946] FS:  7f1d2c03c700() GS:88107fca() 
knlGS:

[ 1078.523961] CS:  0010 DS:  ES:  CR0: 80050033
[ 1078.523990] CR2: 050f4000 CR3: 000ff2a57000 CR4: 
000407e0
[ 1078.524019] DR0:  DR1:  DR2: 

[ 1078.524048] DR3:  DR6: 0ff0 DR7: 
0400
[ 1078.524077] Process ceph-osd (pid: 4108, threadinfo 880ff14a4000, 
task 880ff2aa44a0)

[ 1078.524121] Stack:
[ 1078.524141]  8810279f7460  881017663c00 
880ff004d6f0
[ 1078.524190]  880ff14a5e08 a022f5d8 880ff004d6f0 

[ 1078.524240]  880ff14a5e18 81188afd 8000 
80001000

[ 1078.524289] Call Trace:
[ 1078.524317]  [a022f5d8] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1078.524348]  [81188afd] ? path_lookupat+0x6d/0x750
[ 1078.524380]  [a0230f91] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1078.524408]  [811955c3] notify_change+0x183/0x320
[ 1078.524435]  [8117889e] do_truncate+0x5e/0xa0
[ 1078.524461]  [81178a24] sys_truncate+0x144/0x1b0
[ 1078.524489]  [8165fd69] system_call_fastpath+0x16/0x1b
[ 1078.524516] Code: 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 0f 1f 40 00 
80 bb 60 fe ff ff 84 75 c1 eb bb 0f 1f 44 00 00 48 89 df e8 a0 73 fe ff 
eb c1 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec

[ 1078.524710] RIP  [a022b2a2] btrfs_orphan_del+0xb2/0xc0 [btrfs]
[ 1078.524744]  RSP 880ff14a5d38
[ 1078.525013] ---[ end trace 88c92720204f7aa4 ]---


That's the drive with the broken btrfs.

[  212.843776] device fsid 28492275-01d3-4e89-9f1c-bd86057194bf devid 1 
transid 4 /dev/sdc

[  212.844630] btrfs: setting nodatacow
[  212.844637] btrfs: enabling auto defrag
[  212.844640] btrfs: disk space caching is enabled
[  212.844643] btrfs flagging fs with big metadata feature



-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Strange write behavior on an osd

2012-04-24 Thread Martin Mailand

Hi,
I have a strange behavior on the osd. The cluster is a two node system: 
on one machine 50 qemu/rbd vm's are running (idling), the other machine 
is an osd host with four osd processes and one mon process.


The osd disk are as follow

sda is root
sdb is the journal, with four partitions
sd{c,d,e,f} are each three disks behind a raid controller.

/dev/sdc on /data/osd.0 type btrfs 
(rw,noatime,nodiratime,nodatacow,autodefrag)
/dev/sdd on /data/osd.1 type btrfs 
(rw,noatime,nodiratime,nodatacow,autodefrag)
/dev/sde on /data/osd.2 type btrfs 
(rw,noatime,nodiratime,nodatacow,autodefrag)
/dev/sdf on /data/osd.3 type btrfs 
(rw,noatime,nodiratime,nodatacow,autodefrag)



There is almost no network traffic, but the osd writes a huge amount to 
the disk for around 90 sec and then it is almost idle for 30 sec; the 
writes always go to sde.


Why is it so bursty?


-martin


## Busy Log ##


total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  89   8   0   0|1632k   88M|   0 0 | 475B 1234B|1553  10k
  0   0  88  12   0   0|   0   147M| 856B 2974B|   0 0 |2056  1789
  0   0  88  12   0   0|   0   164M|  85k 6771B|   0 0 |2227  3104
  0   1  84  15   0   0|   0   152M| 193k   17k|   0 0 |2805  6116
  1   2  83  14   0   0|2704k  183M| 314k   23k|   0 0 |3184  7942
  0   1  84  15   0   0|2072k  183M| 213k   16k|   0 0 |3142  6798
  0   0  88  12   0   0|   0   167M|  27k 5571B|   0 0 |2418  2608
  1   1  80  18   0   0|  96k  207M| 443k   26k|   0 0 |3267  9278
  1   2  81  15   0   0|   0   180M| 682k   43k|   0 0 |3941  13k
  1   1  80  17   0   0|2736k  153M| 573k   35k|   0 0 |3229  11k
  1   1  84  14   0   0|9564k  163M| 242k   22k|   0 0 |2988  7054
  0   1  75  24   0   0| 160k  166M|  40k 5331B|   0 0 |2187  2759
  0   1  85  14   0   0|  32k  176M|  85k 6730B|   0 0 |2244  3198
  0   1  83  16   0   0|   0   183M| 137k   12k|   0 0 |2590  5254
  0   1  84  15   0   0|2688k  170M| 179k   15k|   0 0 |2780  5461
  0   1  86  13   0   0|2692k  166M| 185k   17k|   0 0 |2638  6242
  1   1  83  15   0   0|   0   179M| 149k   17k|   0 0 |3165  5695
  1   2  81  17   0   0|   0   186M| 484k   33k|   0 0 |3512  11k
  0   1  82  16   0   0|   0   177M| 523k   33k|   0 0 |3177  11k
  1   1  82  16   0   0|  36k  179M| 603k   39k|   0 0 |3006  11k
  1   1  79  19   0   0|3332k  210M| 332k   28k|   0 0 |3555  8813
  0   0  89  11   0   0|   0   167M|  53k 7553B|   0 0 |2423  3136
  0   0  87  12   0   0|   0   139M| 129k   11k|   0 0 |2073  3888
  0   2  80  18   0   0|  32k  170M| 293k   26k|   0 0 |2950  8825
  0   0  88  12   0   0| 772k  175M|  95k 8765B|   0 0 |2512  3640
  0   2  86  12   0   0|  28k  197M| 199k   12k|   0 0 |2435  5194
  0   0  87  13   0   0|  20k  179M| 111k 7843B|   0 0 |2310  3064


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.77    0.00    1.44   15.81    0.00   81.99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    71.80    0.00   17.80     0.00     0.35    40.27     0.03    1.57    0.00    1.57   0.54   0.96
sdb               0.00     0.00    0.00  188.40     0.00     1.51    16.37     0.49    2.59    0.00    2.59   0.70  13.20
sdc               0.00     0.00    4.00   61.00     0.53     2.34    90.34     0.36    5.61   46.20    2.95   1.00   6.48
sde               0.00  1542.00    0.40 2172.00     0.01   165.82   156.34   143.39   65.76  214.00   65.73   0.46 100.00
sdd               0.00     0.00    3.40   59.60     0.53     1.25    57.85     0.20    3.19   32.47    1.52   0.88   5.52
sdf               0.00     0.00    8.40   75.40     1.35     1.75    75.59     0.51    6.13   42.10    2.12   1.37  11.44


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.23    0.00    0.77   15.96    0.00   83.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    72.20    0.00   18.20     0.00     0.35    39.74     0.02    1.32    0.00    1.32   0.57   1.04
sdb               0.00     0.00    0.00   72.00     0.00     0.52    14.80     0.19    2.58    0.00    2.58   0.73   5.28
sdc               0.00     0.00    0.20   38.00     0.00     1.64    88.17     0.36    9.36   16.00    9.33   1.09   4.16
sde               0.00  1554.80    1.20 2058.20     0.04   163.24   162.37   143.50   69.50  296.67   69.37   0.49 100.00
sdd               0.00     0.00    3.40   39.00     0.53     2.90   165.51     0.36    8.49   67.53    3.34   1.04   4.40
sdf               0.00     0.00    3.20   53.40     0.53     4.16   169.41     0.83   14.66   53.00   12.36   1.58   8.96


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.65    0.00    1.36   16.56    0.00   81.43

Device:

Re: Strange write behavior on an osd

2012-04-24 Thread Martin Mailand

Hi,

Am 24.04.2012 17:23, schrieb João Eduardo Luís:

Any chance you could run iotop during the busy periods and tell us which
processes are issuing the io?


sure,
http://85.214.49.87/ceph/iotop.txt

-martin

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange write behavior on an osd

2012-04-24 Thread Martin Mailand

Hi,

Am 24.04.2012 18:31, schrieb João Eduardo Luís:

What kernel and btrfs versions are you using?


Kernel:3.4.0-rc3

btrfs-tools 0.19+20100601-3ubuntu3

That's how I created the fs.
mkfs.btrfs -n 32k -l 32k /dev/sd{c,d,e,f}

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snapshot in qemu and libvirt

2012-04-19 Thread Martin Mailand

Hi List,

is it possible to quiesce the disk before a snapshot? Or does it make no 
sense with rbd?

How about the new rbd_cache, does it get flushed before the snapshot?

I would like to use it like this.

virsh snapshot-create --quiesce $DOMAIN

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wip-librbd-caching

2012-04-18 Thread Martin Mailand

Am 12.04.2012 21:45, schrieb Sage Weil:

The config options you'll want to look at are client_oc_* (in case you
didn't see that already :).  oc is short for objectcacher, and it isn't
only used for client (libcephfs), so it might be worth renaming these
options before people start using them.


Hi,

I changed the values and the performance is still very good and the 
memory footprint is much smaller.


OPTION(client_oc_size, OPT_INT, 1024*1024* 50)// MB * n
OPTION(client_oc_max_dirty, OPT_INT, 1024*1024* 25)// MB * n  (dirty 
OR tx.. bigish)
OPTION(client_oc_target_dirty, OPT_INT, 1024*1024* 8) // target dirty 
(keep this smallish)

// note: the max amount of in flight dirty data is roughly (max - target)

But I am not quite sure about the meaning of the values.
client_oc_size Max size of the cache?
client_oc_max_dirty max dirty value before the writeback starts?
client_oc_target_dirty ???
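
For reference, a minimal sketch of how the smaller values above might be
set in ceph.conf for a qemu/librbd client -- my assumption is that the
[client] section is the right place, and the values are just the ones
from my test, not recommendations:

[client]
        # 50 MB total object cache
        client oc size = 52428800
        # 25 MB dirty before writers start to block
        client oc max dirty = 26214400
        # 8 MB; presumably the level writeback tries to flush down to
        client oc target dirty = 8388608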


-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rbd snapshot in qemu and libvirt

2012-04-18 Thread Martin Mailand

Hi List,

does anyone know the actual progress of the rbd snapshot feature 
integration into qemu and libvirt?


-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snapshot in qemu and libvirt

2012-04-18 Thread Martin Mailand

Hi Wido,

I am looking to do the snapshots via libvirt: create, delete, 
rollback and list of the snapshots.


-martin

Am 18.04.2012 15:10, schrieb Wido den Hollander:

I tested this about a year ago and that worked fine.

Anything in particular you are looking for?


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snapshot in qemu and libvirt

2012-04-18 Thread Martin Mailand

Hi Andrey,

if I try it I get this error.

virsh snapshot-create linux1
error: Requested operation is not valid: Disk 
'rbd/vm1:rbd_cache_enabled=1' does not support snapshotting


maybe the rbd_cache option is the problem?


-martin


Am 18.04.2012 16:39, schrieb Andrey Korolyov:

I have tested all of them about a week ago, all works fine. Also it
will be very nice if rbd can list an actual allocated size of every
image or snapshot in future.

On Wed, Apr 18, 2012 at 5:22 PM, Martin Mailand <mar...@tuxadero.com> wrote:

Hi Wido,

I am looking for doing the snapshots via libvirt, create, delete, rollback
and list of the snapshot.

-martin

Am 18.04.2012 15:10, schrieb Wido den Hollander:


I tested this about a year ago and that worked fine.

Anything in particular you are looking for?



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snapshot in qemu and libvirt

2012-04-18 Thread Martin Mailand

Hi,

Am 18.04.2012 17:52, schrieb Andrey Korolyov:

Oh, I forgot to say about a patch:


perfect, now it works.

Thanks.

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


wip-librbd-caching

2012-04-12 Thread Martin Mailand

Hi,

today I tried the wip-librbd-caching branch. The performance improvement 
is very good, particularly for small writes.

I tested from within a vm with fio:

rbd_cache_enabled=1

fio -name iops -rw=write -size=10G -iodepth 1 -filename /tmp/bigfile 
-ioengine libaio -direct 1 -bs 4k


I get over 10k iops

With an iodepth 4 I get over 30k iops

In comparison with the rbd_writebackwindow I get around 5k iops with an 
iodepth of 1.
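
For completeness, the same run written as a fio job file, as I read the
command line above (my transcription, assuming the usual long-form
option spellings; bump iodepth to 4 for the second case):

[iops]
rw=write
size=10G
filename=/tmp/bigfile
ioengine=libaio
iodepth=1
direct=1
bs=4k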


So far the whole cluster is running stable for over 12 hours.

But there is also a downside.
My typical vms are 1GB in size, the default cache size is 200MB, which is 
20% more memory usage. Maybe 50MB or less would be enough?

I am going to test that.

The other point is that the cache is not KSM enabled, therefore 
identical pages will not be merged. Could that be changed, and what would be 
the downside?


So maybe we could reduce the memory footprint of the cache, but keep 
its performance.


-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Crowbar] barclamp-ceph and crowbar

2012-02-24 Thread Martin Mailand

Hi John,
I tried them a few weeks ago; they are developed for crowbar version 1.1 
and don't seem to work with 1.2. If I try to create a proposal, the 
next page is white and an error is logged.
The barclamp installs one ceph-mon node and several ceph-store nodes. 
The glue to connect your virtual machines to the ceph-store is not 
included in the barclamp.


-martin

On 24.02.2012 15:46, John Alberts wrote:

Does anyone know what I can do with barclamp-ceph?
https://github.com/NewDreamNetwork/barclamp-ceph
The code hasn't been touched since it's initial import 4 months ago.
Does it allow me to easily use ceph for /var/lib/instances on compute
hosts so I can use features like live migration easily?

Thanks
John

___
Crowbar mailing list
crow...@dell.com
https://lists.us.dell.com/mailman/listinfo/crowbar
For more information: https://github.com/dellcloudedge/crowbar/wiki


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd crash during resync

2012-01-26 Thread Martin Mailand

Hi Sage,
I uploaded the osd.0 log as well.

http://85.214.49.87/ceph/20120124/osd.0.log.bz2

-martin

Am 25.01.2012 23:08, schrieb Sage Weil:

Hi Martin,

On Tue, 24 Jan 2012, Martin Mailand wrote:

Hi,
today I tried the btrfs patch mentioned on the btrfs ml. Therefore I rebooted
osd.0 with a new kernel and created a new btrfs on the osd.0, then I took the
osd.0 into the cluster. During the resync of osd.0, osd.2 and osd.3
crashed.
I am not sure, if the crashes happened because I played with osd.0, or if they
are bugs.


osd.2
-rw---  1 root root 1.1G 2012-01-24 12:19
core-ceph-osd-1000-1327403927-s-brick-002

log:
2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
osd.0 from (185'113859,185'113859] 0//0 to 196'114038
osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
in thread '7f1fdab26700'
osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)

-rw---  1 root root 758M 2012-01-24 15:58
core-ceph-osd-20755-1327417128-s-brick-002


Can you post the log for osd.0 too?

Thanks!
sage





log:
2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi-watcher: client.4478
cookie=1
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
'7fe26fdca700'
osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
'7fe26fdca700'

http://85.214.49.87/ceph/20120124/osd.2.log.bz2



osd.3
-rw---  1 root root 986M 2012-01-24 12:24
core-ceph-osd-962-1327404263-s-brick-003

log:
2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'

http://85.214.49.87/ceph/20120124/osd.3.log.bz2



-martin


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-24 Thread Martin Mailand

Hi
I tried the branch on one of my ceph osds, and there is a big difference 
in the performance.
The average request size stayed high, but after around an hour the kernel 
crashed.


IOstat
http://pastebin.com/xjuriJ6J

Kernel trace
http://pastebin.com/SYE95GgH

-martin

Am 23.01.2012 19:50, schrieb Chris Mason:

On Mon, Jan 23, 2012 at 01:19:29PM -0500, Josef Bacik wrote:

On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:

As you might know, I have been seeing btrfs slowdowns in our ceph
cluster for quite some time. Even with the latest btrfs code for 3.3
I'm still seeing these problems. To make things reproducible, I've now
written a small test, that imitates ceph's behavior:

On a freshly created btrfs filesystem (2 TB size, mounted with
noatime,nodiratime,compress=lzo,space_cache,inode_cache) I'm opening
100 files. After that I'm doing random writes on these files with a
sync_file_range after each write (each write has a size of 100 bytes)
and ioctl(BTRFS_IOC_SYNC) after every 100 writes.

After approximately 20 minutes, write activity suddenly increases
fourfold and the average request size decreases (see chart in the
attachment).

You can find IOstat output here: http://pastebin.com/Smbfg1aG

I hope that you are able to trace down the problem with the test
program in the attachment.
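
The attachment itself is not reproduced here, but from the description
the core loop presumably looks something like this minimal C sketch
(file count and write size as described above; file names, per-file
offset range and open flags are my assumptions, the real test program
may differ):

#define _GNU_SOURCE                 /* for sync_file_range() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOC_SYNC _IO(0x94, 8) /* full btrfs commit */
#endif

#define NFILES     100
#define WRITE_SIZE 100
#define FILE_SPAN  (4 * 1024 * 1024) /* assumed offset range per file */

int main(void)
{
        int fd[NFILES];
        char buf[WRITE_SIZE];
        char name[64];
        unsigned long writes = 0;
        int i;

        memset(buf, 'x', sizeof(buf));

        /* open the 100 test files in the current directory */
        for (i = 0; i < NFILES; i++) {
                snprintf(name, sizeof(name), "testfile.%d", i);
                fd[i] = open(name, O_RDWR | O_CREAT, 0644);
                if (fd[i] < 0) {
                        perror("open");
                        return 1;
                }
        }

        for (;;) {
                int f = rand() % NFILES;
                off_t off = (off_t)(rand() % (FILE_SPAN / WRITE_SIZE)) * WRITE_SIZE;

                /* 100 byte write at a random offset */
                if (pwrite(fd[f], buf, WRITE_SIZE, off) != WRITE_SIZE) {
                        perror("pwrite");
                        return 1;
                }
                /* flush only the range just written */
                sync_file_range(fd[f], off, WRITE_SIZE,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER);

                /* full filesystem commit every 100 writes */
                if (++writes % 100 == 0)
                        ioctl(fd[f], BTRFS_IOC_SYNC);
        }
        return 0;
}

Compile with something like gcc -O2 -o btrfs-imitate test.c and run it
inside the mounted filesystem.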


Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's tree and
formatted the fs with 64k node and leaf sizes and the problem appeared to go
away.  So surprise surprise fragmentation is biting us in the ass.  If you can
try running that branch with 64k node and leaf sizes with your ceph cluster and
see how that works out.  Course you should only do that if you dont mind if you
lose everything :).  Thanks,



Please keep in mind this branch is only out there for development, and
it really might have huge flaws.  scrub doesn't work with it correctly
right now, and the IO error recovery code is probably broken too.

Long term though, I think the bigger block sizes are going to make a
huge difference in these workloads.

If you use the very dangerous code:

mkfs.btrfs -l 64k -n 64k /dev/xxx

(-l is leaf size, -n is node size).

64K is the max right now, 32K may help just as much at a lower CPU cost.

-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs slowdown with ceph (how to reproduce)

2012-01-24 Thread Martin Mailand

Hi Chris,
great to hear that, could you give me a ping once you have fixed it, then 
I can retry it?


-martin

Am 24.01.2012 20:40, schrieb Chris Mason:

On Tue, Jan 24, 2012 at 08:15:58PM +0100, Martin Mailand wrote:

Hi
I tried the branch on one of my ceph osd, and there is a big
difference in the performance.
The average request size stayed high, but after around a hour the
kernel crashed.

IOstat
http://pastebin.com/xjuriJ6J

Kernel trace
http://pastebin.com/SYE95GgH


Aha, this I know how to fix.  Thanks for trying it out.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd crash during resync

2012-01-24 Thread Martin Mailand

Hi Greg,
ok, do you guys still need the core files, or could I delete them?

-martin

Am 24.01.2012 22:13, schrieb Gregory Farnum:

On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand <mar...@tuxadero.com> wrote:

Hi,
today I tried the btrfs patch mentioned on the btrfs ml. Therefore I
rebooted osd.0 with a new kernel and created a new btrfs on the osd.0, then
I took the osd.0 into the cluster. During the resync of osd.0, osd.2 and
osd.3 crashed.
I am not sure, if the crashes happened because I played with osd.0, or if
they are bugs.


These are OSD-level issues not caused by btrfs, so your new kernel
definitely didn't do it. It's probably fallout from the backfill
changes that got merged in last week. I created new bugs to track
them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
Josh are going wild on some other issues that we've turned up and
these have been added to the queue as soon as somebody qualified can
get to them. :)
-Greg
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rbd snap ls does not list more than 200 snapshots

2012-01-24 Thread Martin Mailand

Hi,
I created quite a few snapshots of an rbd image. After around 200 
snapshots the command rbd snap ls vm10 does not return; instead it uses 
all of the memory of a 32G machine and then the oom killer gets kicked in.

Are 200 snapshots a known limit?

How to reproduce:

for i in $(seq 500); do rbd snap create --snap=a$i vm10; echo $i ; done

rbd snap ls vm10
doesn't return

top:
25381 root  20   0 5425m 5.2g 5436 S   29 16.4   1:10.21 rbd

rbd -v
ceph version 0.40-206-g6c275c8 
(commit:6c275c8195a8ae04e8a492d043fa6dfd60cecd82)



-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Assertion in v0.40 - os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)

2012-01-15 Thread Martin Mailand

Hi Sage,
that's exactly what I did, the first two crashes are in this log, 
unfortunately there was no debug level set.


http://85.214.49.87/ceph/osd.0.full.log.bz2

-martin



Am 15.01.2012 03:45, schrieb Sage Weil:

Hi Martin-

On Sat, 14 Jan 2012, Martin Mailand wrote:


Hi
one of four OSD died during the update to v0.40 with an Assertion
os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)
Even after a complete shutdown of the cluster and a new start with all OSDs at
the same version, this osd did not start.

The OSD Log it attached.


It's trying to replay a transaction that appears to be invalid because the
.2 clone is smaller than it thinks.  Is this the first time the OSD
crashed, or did it crash once, and you cranked up logs and generated
this one?  If you have the previous log, that would be helpful... it
should have a similar tranasction dump but a different stack trace.

Also, are any of the 6 patches on top of 0.40 related to the filestore or
osd?

Thanks!
sage


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Assertion in v0.40 - os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)

2012-01-15 Thread Martin Mailand

Hi Sage,

here is the requested dump file.

http://85.214.49.87/ceph/foo.txt.bz2

-martin


Am 15.01.2012 06:52, schrieb Sage Weil:

Hi Martin-

On Sat, 14 Jan 2012, Sage Weil wrote:

Hi Martin-

On Sat, 14 Jan 2012, Martin Mailand wrote:


Hi
one of four OSD died during the update to v0.40 with an Assertion
os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)
Even after a complete shutdown of the cluster and a new start with all OSDs at
the same version, this osd did not start.

The OSD Log it attached.


It's trying to replay a transaction that appears to be invalid because the
.2 clone is smaller than it thinks.  Is this the first time the OSD
crashed, or did it crash once, and you cranked up logs and generated
this one?  If you have the previous log, that would be helpful... it
should have a similar tranasction dump but a different stack trace.


I pushed a wip-osd-dump-journal branch to git that will make

ceph-osd -i <whatever> --dump-journal > /tmp/foo.txt

dump the contents of your entire osd journal (sans data) to a text file.
Do you mind sending that along as well?  I'd like to see what is in the
journal _after_ the event that is failing (if anything).

Thanks!
sage




Also, are any of the 6 patches on top of 0.40 related to the filestore or
osd?

Thanks!
sage

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: v0.40 released

2012-01-15 Thread Martin Mailand

Hi,
is there an example of how to use it, because there is no ceph plugin for 
collectd?


-martin


Am 14.01.2012 06:30, schrieb Sage Weil:

* mon: expose cluster stats via admin socket (accessible via collectd
plugin)

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Assertion in v0.40 - os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)

2012-01-14 Thread Martin Mailand

Hi
one of four OSDs died during the update to v0.40 with an assertion 
os/FileStore.cc: 2438: FAILED assert(0 == unexpected error)
Even after a complete shutdown of the cluster and a new start with all 
OSDs at the same version, this osd did not start.


The OSD Log it attached.

-martin


osd.0.log.bz2
Description: application/bzip


Assertion: ./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)

2011-12-22 Thread Martin Mailand

Hi
today 2 of my osds (osd.4 and osd.7) crashed with the same error.

2011-12-21 14:41:18.896008 7fae9f3a5700 journal check_for_full at 
80625664 : JOURNAL FULL 80625664 >= 368639 (max_size 107372544 start 
80994304)
2011-12-21 14:41:23.205993 7fae9fba6700 journal  FULL_FULL -> FULL_WAIT. 
 last commit epoch committed, waiting for a new one to start.
2011-12-21 14:41:24.075990 7fae9fba6700 journal  FULL_WAIT -> 
FULL_NOTFULL.  journal now active, setting completion plug.
./messages/MOSDRepScrub.h: In function 'virtual void 
MOSDRepScrub::decode_payload(CephContext*)', in thread '7fae93977700'

./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)
 ceph version 0.39-171-gdcedda8 
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)

 1: /usr/bin/ceph-osd() [0x685e77]
 2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer, 
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2) 
[0x6a7202]

 3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
 4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
 5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
 6: (()+0x6d8c) [0x7faea6873d8c]
 7: (clone()+0x6d) [0x7faea4eb004d]
 ceph version 0.39-171-gdcedda8 
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)

 1: /usr/bin/ceph-osd() [0x685e77]
 2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer, 
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2) 
[0x6a7202]

 3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
 4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
 5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
 6: (()+0x6d8c) [0x7faea6873d8c]
 7: (clone()+0x6d) [0x7faea4eb004d]
*** Caught signal (Aborted) **
 in thread 7fae93977700
 ceph version 0.39-171-gdcedda8 
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)

 1: /usr/bin/ceph-osd() [0x645172]
 2: (()+0xfc60) [0x7faea687cc60]
 3: (gsignal()+0x35) [0x7faea4dfdd05]
 4: (abort()+0x186) [0x7faea4e01ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faea56b46dd]
 6: (()+0xb9926) [0x7faea56b2926]
 7: (()+0xb9953) [0x7faea56b2953]
 8: (()+0xb9a5e) [0x7faea56b2a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x396) [0x6193d6]

 10: /usr/bin/ceph-osd() [0x685e77]
 11: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer, 
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2) 
[0x6a7202]

 12: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
 13: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
 14: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
 15: (()+0x6d8c) [0x7faea6873d8c]
 16: (clone()+0x6d) [0x7faea4eb004d]


(gdb) thread apply all bt

snip

Thread 1 (Thread 2400):
#0  0x7faea687cb3b in raise () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00644dc2 in reraise_fatal (signum=6) at 
global/signal_handler.cc:59
#2  0x006453ba in handle_fatal_signal (signum=6) at 
global/signal_handler.cc:106

#3  signal handler called
---Type <return> to continue, or q <return> to quit---
#4  0x7faea4dfdd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7faea4e01ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7faea56b46dd in __gnu_cxx::__verbose_terminate_handler() () 
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x7faea56b2926 in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x7faea56b2953 in std::terminate() () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7faea56b2a5e in __cxa_throw () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x006193d6 in ceph::__ceph_assert_fail (assertion=value 
optimized out, file=value optimized out, line=value optimized out,

func=value optimized out) at common/assert.cc:70
#11 0x00685e77 in MOSDRepScrub::decode_payload (this=0x33c0c40, 
cct=value optimized out) at ./messages/MOSDRepScrub.h:64
#12 0x006a7202 in decode_message (cct=0x2722000, header=..., 
footer=value optimized out, front=value optimized out, middle=value 
optimized out,

data=...) at msg/Message.cc:551
#13 0x0062c9cd in SimpleMessenger::Pipe::read_message 
(this=0x2ed3780, pm=0x7fae93976d88) at msg/SimpleMessenger.cc:1987
#14 0x006357d9 in SimpleMessenger::Pipe::reader (this=0x2ed3780) 
at msg/SimpleMessenger.cc:1601
#15 0x004c244d in SimpleMessenger::Pipe::Reader::entry 
(this=value optimized out) at msg/SimpleMessenger.h:208
#16 0x7faea6873d8c in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0

#17 0x7faea4eb004d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 2400)]#0  0x7faea687cb3b in raise () 
from /lib/x86_64-linux-gnu/libpthread.so.0

(gdb) frame 11
#11 0x00685e77 in MOSDRepScrub::decode_payload (this=0x33c0c40, 
cct=value optimized out) at ./messages/MOSDRepScrub.h:64

64  ./messages/MOSDRepScrub.h: No such file or directory.

Re: Random blocks when accessing rbd images

2011-12-22 Thread Martin Mailand

Hi Samuel
I think I am seeing it now.

root@s-brick-003:~# ceph pg dump|grep -i scrub
pg_stat objects mip  degr unf  kb  bytes  log  disklog  state                   v    reported  up     acting  last_scrub
0.6     0       0    0    0    0   0      0    0        active+clean+scrubbing  0'0  60'156    [6,2]  [6,2]   0'0  2011-12-20 14:44:55.787529

root@s-brick-003:~# ceph -v
ceph version 0.39-171-gdcedda8 
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)

root@s-brick-003:~#


I also had an osd crash and hit this (Assertion: 
./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)), see my other 
email for more information.


-martin



Am 16.12.2011 22:17, schrieb Samuel Just:

In master, 061e7619aacf60a828e0ce84a108d5a0bea247c6 may fix the
problem.  If not, 5274e88d2cb8c0449a4ecd1ff0cf8bb0af2cfc97 includes
some asserts that may give us a clue as to how this is happening.
-Sam


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Assertion: ./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)

2011-12-22 Thread Martin Mailand

Hi Greg,
ok, I also have at the moment one pg which stays in scrubbing, is that 
also a result of the different versions I am running?
Do you know if Sam needs the cluster in this state to debug the 
scrubbing problem? Or is it unusable for that due to the different versions?



-martin

Am 22.12.2011 21:24, schrieb Gregory Farnum:

I see you're following master! :) You got bit by a wire-incompatible
change in one of the OSD messages that Sam made, although I think he's
actually going to be walking it back after a conversation we just had.
In any case, restarting all of your OSDs so they're running the same
code will fix it. :)
-Greg

On Thu, Dec 22, 2011 at 5:48 AM, Martin Mailand <mar...@tuxadero.com> wrote:

Hi
today 2 of my osds (osd.4 and osd.7) crashed with the same error.

2011-12-21 14:41:18.896008 7fae9f3a5700 journal check_for_full at 80625664 :
JOURNAL FULL 80625664 >= 368639 (max_size 107372544 start 80994304)
2011-12-21 14:41:23.205993 7fae9fba6700 journal  FULL_FULL ->  FULL_WAIT.
  last commit epoch committed, waiting for a new one to start.
2011-12-21 14:41:24.075990 7fae9fba6700 journal  FULL_WAIT ->  FULL_NOTFULL.
  journal now active, setting completion plug.
./messages/MOSDRepScrub.h: In function 'virtual void
MOSDRepScrub::decode_payload(CephContext*)', in thread '7fae93977700'
./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x685e77]
  2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  6: (()+0x6d8c) [0x7faea6873d8c]
  7: (clone()+0x6d) [0x7faea4eb004d]
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x685e77]
  2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  6: (()+0x6d8c) [0x7faea6873d8c]
  7: (clone()+0x6d) [0x7faea4eb004d]
*** Caught signal (Aborted) **
  in thread 7fae93977700
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x645172]
  2: (()+0xfc60) [0x7faea687cc60]
  3: (gsignal()+0x35) [0x7faea4dfdd05]
  4: (abort()+0x186) [0x7faea4e01ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faea56b46dd]
  6: (()+0xb9926) [0x7faea56b2926]
  7: (()+0xb9953) [0x7faea56b2953]
  8: (()+0xb9a5e) [0x7faea56b2a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x6193d6]
  10: /usr/bin/ceph-osd() [0x685e77]
  11: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  12: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  13: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  14: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  15: (()+0x6d8c) [0x7faea6873d8c]
  16: (clone()+0x6d) [0x7faea4eb004d]


(gdb) thread apply all bt

snip

Thread 1 (Thread 2400):
#0  0x7faea687cb3b in raise () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00644dc2 in reraise_fatal (signum=6) at
global/signal_handler.cc:59
#2  0x006453ba in handle_fatal_signal (signum=6) at
global/signal_handler.cc:106
#3signal handler called
---Type <return> to continue, or q <return> to quit---
#4  0x7faea4dfdd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7faea4e01ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7faea56b46dd in __gnu_cxx::__verbose_terminate_handler() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x7faea56b2926 in ?? () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x7faea56b2953 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7faea56b2a5e in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x006193d6 in ceph::__ceph_assert_fail (assertion=value
optimized out, file=value optimized out, line=value optimized out,
func=value optimized out) at common/assert.cc:70
#11 0x00685e77 in MOSDRepScrub::decode_payload (this=0x33c0c40,
cct=value optimized out) at ./messages/MOSDRepScrub.h:64
#12 0x006a7202 in decode_message (cct=0x2722000, header=...,
footer=value optimized out, front=value optimized out, middle=value
optimized out,
data=...) at msg/Message.cc:551
#13 0x0062c9cd in SimpleMessenger::Pipe::read_message
(this=0x2ed3780, pm=0x7fae93976d88) at msg/SimpleMessenger.cc:1987
#14 0x006357d9 in 

Re: Assertion: ./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)

2011-12-22 Thread Martin Mailand

Hi Sam,
okay, after I upgraded the whole cluster, the stuck pg went away.

-martin

Am 22.12.2011 22:08, schrieb Samuel Just:

Martin, that bug should actually be fixed in current master.  You'll
need to upgrade the whole cluster, though.
-Sam

On Thu, Dec 22, 2011 at 12:40 PM, Martin Mailand <mar...@tuxadero.com> wrote:

Hi Greg,
ok, I also have at the moment one pg which stays in scrubbing, is that also
a result of the different versions I am running?
Do you know if Sam needs the cluster in this state to debug the scrubbing
problem? Or is it unusable for that due to the different versions?


-martin

Am 22.12.2011 21:24, schrieb Gregory Farnum:


I see you're following master! :) You got bit by a wire-incompatible
change in one of the OSD messages that Sam made, although I think he's
actually going to be walking it back after a conversation we just had.
In any case, restarting all of your OSDs so they're running the same
code will fix it. :)
-Greg

On Thu, Dec 22, 2011 at 5:48 AM, Martin Mailand <mar...@tuxadero.com>
  wrote:


Hi
today 2 of my osds (osd.4 and osd.7) crashed with the same error.

2011-12-21 14:41:18.896008 7fae9f3a5700 journal check_for_full at
80625664 :
JOURNAL FULL 80625664 >= 368639 (max_size 107372544 start 80994304)
2011-12-21 14:41:23.205993 7fae9fba6700 journal  FULL_FULL -> FULL_WAIT.
  last commit epoch committed, waiting for a new one to start.
2011-12-21 14:41:24.075990 7fae9fba6700 journal  FULL_WAIT ->
  FULL_NOTFULL.
  journal now active, setting completion plug.
./messages/MOSDRepScrub.h: In function 'virtual void
MOSDRepScrub::decode_payload(CephContext*)', in thread '7fae93977700'
./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x685e77]
  2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  6: (()+0x6d8c) [0x7faea6873d8c]
  7: (clone()+0x6d) [0x7faea4eb004d]
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x685e77]
  2: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  3: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  4: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  5: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  6: (()+0x6d8c) [0x7faea6873d8c]
  7: (clone()+0x6d) [0x7faea4eb004d]
*** Caught signal (Aborted) **
  in thread 7fae93977700
  ceph version 0.39-171-gdcedda8
(commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
  1: /usr/bin/ceph-osd() [0x645172]
  2: (()+0xfc60) [0x7faea687cc60]
  3: (gsignal()+0x35) [0x7faea4dfdd05]
  4: (abort()+0x186) [0x7faea4e01ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faea56b46dd]
  6: (()+0xb9926) [0x7faea56b2926]
  7: (()+0xb9953) [0x7faea56b2953]
  8: (()+0xb9a5e) [0x7faea56b2a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x6193d6]
  10: /usr/bin/ceph-osd() [0x685e77]
  11: (decode_message(CephContext*, ceph_msg_header, ceph_msg_footer,
ceph::buffer::list, ceph::buffer::list, ceph::buffer::list)+0xcd2)
[0x6a7202]
  12: (SimpleMessenger::Pipe::read_message(Message**)+0x136d) [0x62c9cd]
  13: (SimpleMessenger::Pipe::reader()+0xb99) [0x6357d9]
  14: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4c244d]
  15: (()+0x6d8c) [0x7faea6873d8c]
  16: (clone()+0x6d) [0x7faea4eb004d]


(gdb) thread apply all bt

snip

Thread 1 (Thread 2400):
#0  0x7faea687cb3b in raise () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00644dc2 in reraise_fatal (signum=6) at
global/signal_handler.cc:59
#2  0x006453ba in handle_fatal_signal (signum=6) at
global/signal_handler.cc:106
#3signal handler called
---Type <return> to continue, or q <return> to quit---
#4  0x7faea4dfdd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7faea4e01ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7faea56b46dd in __gnu_cxx::__verbose_terminate_handler() ()
from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x7faea56b2926 in ?? () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x7faea56b2953 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7faea56b2a5e in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x006193d6 in ceph::__ceph_assert_fail (assertion=value
optimized out, file=value optimized out, line=value optimized out,
func=value optimized out) at common/assert.cc:70
#11 0x00685e77 in MOSDRepScrub::decode_payload (this=0x33c0c40,
cct=value optimized out) at ./messages/MOSDRepScrub.h:64
#12 0x006a7202 in 

Re: Random blocks when accessing rbd images

2011-12-15 Thread Martin Mailand

Hi Guido,
I am running ceph version 0.39-37-g54758ab 
(commit:54758abccf429122c1bc3bce6d01bc33f1cfe238) on my cluster and I do 
not see this problem. Do you use the qemu rbd block driver or the kernel 
mount?

How did you install ceph, via the packages?

-martin


Am 15.12.2011 16:45, schrieb Guido Winkelmann:

Am Donnerstag, 15. Dezember 2011, 17:32:25 schrieben Sie:

On 12/15/2011 05:07 PM, Guido Winkelmann wrote:

Hi,

I've got a small ceph cluster with one mon, one mds and two osds (all on
the same machine, for now), that I want to use as a block- and file
storage backend for qemu machine virtualisation.

I found that read access to some of the rbd images, or parts of some of
them sometimes blocks indefinitely, usually after the image has been
sitting around untouched for a while, for example over night. This has
the effect that virtual machines that try to access their disks as well
as rbd commands like rbd cp will just hang indefinitely.

  I found that these blocks can usually be fixed by restarting one of
  the
osds.

The last time this happened, ceph -s reported one of the osds to be in
state active+clean+scrubbing. (I'm afraid I don't have the complete
output from ceph -s anymore.)

Does anybody have any idea what could be going wrong here?


I think it's fixed in v0.39


I'm already using 0.39, so, no. (Should have mentioned that to start with...)

 Guido
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Random blocks when accessing rbd images

2011-12-15 Thread Martin Mailand

Hi Wido,
but wasn't that fixed a few weeks ago?

-martin

Am 15.12.2011 17:33, schrieb Wido den Hollander:

Yes, from what I've seen it will block indefinitely until you restart
one of the OSDs who are member of the PG.

Wido


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Random blocks when accessing rbd images

2011-12-15 Thread Martin Mailand

Hi,
at least there is a patch that should have fixed it.

http://marc.info/?l=ceph-develm=131955913203561w=2

Am 15.12.2011 17:38, schrieb Martin Mailand:

Hi Wido,
but wasn't that fixed a few weeks ago?

-martin

Am 15.12.2011 17:33, schrieb Wido den Hollander:

Yes, from what I've seen it will block indefinitely until you restart
one of the OSDs who are member of the PG.

Wido


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)

2011-12-05 Thread Martin Mailand

Hi Sage,
it happened again, this time I have the log, it's attached.

(gdb) thread 1
[Switching to thread 1 (Thread 24077)]#0  0x7f7995b83b3b in raise () 
from /lib/x86_64-linux-gnu/libpthread.so.0

(gdb) frame 11
#11 0x0072ee8d in FileJournal::committed_thru (this=0x1ebc000, 
seq=16833973) at os/FileJournal.cc:1011

1011os/FileJournal.cc: No such file or directory.
in os/FileJournal.cc
(gdb) p seq
$1 = 16833973
(gdb) p last_committed_seq
$2 = 16834010
(gdb)

Is this all info you need, or should I leave the osd in this state for 
further debugging?


-martin

Am 29.11.2011 17:07, schrieb Sage Weil:

On Tue, 29 Nov 2011, Martin Mailand wrote:

Hi,
with a build from today, I have the same prob.

os/FileJournal.cc: In function 'virtual void
FileJournal::committed_thru(uint64_t)', in thread '7fc55c85f700'
os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
  ceph version 0.38-250-gc2889fe
(commit:c2889fef420611df3dd0de4064c91f6aa9f86625)


Can you post a log of the failed ceph-osd restart with 'debug journal =
20' and 'debug filestore = 20'?

Thanks!
sage




osd.0.log.debug.bz2
Description: BZip2 compressed data


Re: Cluster sync doesn't finish

2011-12-05 Thread Martin Mailand

Hi Sam,
is there anything new on this issue that I could test?

-martin


Am 19.11.2011 02:05, schrieb Samuel Just:

I've inserted this bug as #1738.  Unfortunately, this will take a bit
of effort to fix.  In the short term, you could switch to a crushmap
where each node at the bottom level of the hierarchy contains more
than one device.  (i.e., remove the node level and stop at the rack
level).
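
For illustration, the bottom of such a crushmap could look roughly like
this in the decompiled text form, with the osds sitting directly in the
rack buckets (ids, names and weights here are placeholders, not taken
from the real map):

rack rack0 {
        id -2           # placeholder bucket id
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
}
rack rack1 {
        id -3
        alg straw
        hash 0
        item osd.2 weight 1.000
        item osd.3 weight 1.000
}
root default {
        id -1
        alg straw
        hash 0
        item rack0 weight 2.000
        item rack1 weight 2.000
}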

Thanks for the help!
-Sam

On Fri, Nov 18, 2011 at 12:17 PM, Martin Mailand <mar...@tuxadero.com> wrote:

Hi Sam,

here the crushmap

http://85.214.49.87/ceph/crushmap.txt
http://85.214.49.87/ceph/crushmap

-martin

Samuel Just schrieb:


It looks like a crushmap related problem.  Could you send us the crushmap?

ceph osd getcrushmap

Thanks
-Sam

On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum
<gregory.far...@dreamhost.com> wrote:


On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen
<tommi.virta...@dreamhost.com> wrote:


On Thu, Nov 17, 2011 at 12:48, Martin Mailand <mar...@tuxadero.com>
wrote:


Hi,
I am doing a cluster failure test, where I shut down one OSD and wait for
the
cluster to sync. But the sync never finished, at around 4-5% it stops. I
stopped osd2.


...


2011-11-17 16:42:45.520740pg v1337: 600 pgs: 547 active+clean, 53
active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB
avail;
4025/82404 degraded (4.884%)


...


The osd log, the ceph.conf, pg dump, osd dump could be found here.

http://85.214.49.87/ceph/


This looks a bit worrying:

2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424 >>
192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
wrong node!

So osd.0 is basically refusing to talk to one of the other OSDs. I
don't understand the messenger well enough to know why this would be,
but it wouldn't surprise me if this problem kept the objects degraded
-- it looks like a breakage in the osd-osd communication.

Now if this was the reason, I'd expect a restart of all the OSDs to
get it back in shape; messenger state is ephemeral. Can you confirm
that?


Probably not — that wrong node thing can occur for a lot of different
reasons, some of which matter and most of which don't. Sam's looking
into the problem; there's something going wrong with the CRUSH
calculations or the monitor PG placement overrides or something...
-Greg
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)

2011-12-05 Thread Martin Mailand

Hi Sage,
I just updated the crashed osd, and it did not work very well.

os/FileJournal.cc: 1173: FAILED assert(h->seq >= last_committed_seq)

1173os/FileJournal.cc: No such file or directory.
in os/FileJournal.cc
(gdb) p h->seq
value has been optimized out
(gdb) p last_committed_seq
$1 = 16834095


-martin

Am 05.12.2011 18:44, schrieb Sage Weil:

dc167bac7800c75df971bded4b54e0de48f7b18f (wip-journal branch) should fix
this.  Can you give it a test before I push to stable?

Thanks!
sage


On Mon, 5 Dec 2011, Martin Mailand wrote:


Hi Sage,
it happened again, this time I have the log, it's attached.

(gdb) thread 1
[Switching to thread 1 (Thread 24077)]#0  0x7f7995b83b3b in raise () from
/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) frame 11
#11 0x0072ee8d in FileJournal::committed_thru (this=0x1ebc000,
seq=16833973) at os/FileJournal.cc:1011
1011os/FileJournal.cc: No such file or directory.
 in os/FileJournal.cc
(gdb) p seq
$1 = 16833973
(gdb) p last_committed_seq
$2 = 16834010
(gdb)

Is this all info you need, or should I leave the osd in this state for further
debugging?

-martin

Am 29.11.2011 17:07, schrieb Sage Weil:

On Tue, 29 Nov 2011, Martin Mailand wrote:

Hi,
with a build from today, I have the same prob.

os/FileJournal.cc: In function 'virtual void
FileJournal::committed_thru(uint64_t)', in thread '7fc55c85f700'
os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
   ceph version 0.38-250-gc2889fe
(commit:c2889fef420611df3dd0de4064c91f6aa9f86625)


Can you post a log of the failed ceph-osd restart with 'debug journal =
20' and 'debug filestore = 20'?

Thanks!
sage





--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


osd.log.bz2
Description: application/bzip


os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)

2011-11-29 Thread Martin Mailand

Hi
I hit this assertion a few times. I use ext4 as the osd fs, so I think 
we have to replay the whole journal, maybe that triggers it.



-martin

2011-11-29 11:37:55.393296 7fab45dbc7a0 FileStore is up to date.
os/FileJournal.cc: In function 'virtual void 
FileJournal::committed_thru(uint64_t)', in thread '7fab434cf700'

os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
 ceph version 0.38-244-g30def38 
(commit:30def38d21b217f244db74e6c469598d794fa8a1)

 1: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
 2: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
 3: (FileStore::sync_entry()+0xec7) [0x70aae7]
 4: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
 5: (()+0x6d8c) [0x7fab45993d8c]
 6: (clone()+0x6d) [0x7fab43fd004d]
 ceph version 0.38-244-g30def38 
(commit:30def38d21b217f244db74e6c469598d794fa8a1)

 1: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
 2: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
 3: (FileStore::sync_entry()+0xec7) [0x70aae7]
 4: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
 5: (()+0x6d8c) [0x7fab45993d8c]
 6: (clone()+0x6d) [0x7fab43fd004d]
*** Caught signal (Aborted) **
 in thread 7fab434cf700
 ceph version 0.38-244-g30def38 
(commit:30def38d21b217f244db74e6c469598d794fa8a1)

 1: /usr/bin/ceph-osd() [0x5a7ba2]
 2: (()+0xfc60) [0x7fab4599cc60]
 3: (gsignal()+0x35) [0x7fab43f1dd05]
 4: (abort()+0x186) [0x7fab43f21ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fab447d46dd]
 6: (()+0xb9926) [0x7fab447d2926]
 7: (()+0xb9953) [0x7fab447d2953]
 8: (()+0xb9a5e) [0x7fab447d2a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x396) [0x5cd9e6]

 10: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
 11: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
 12: (FileStore::sync_entry()+0xec7) [0x70aae7]
 13: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
 14: (()+0x6d8c) [0x7fab45993d8c]
 15: (clone()+0x6d) [0x7fab43fd004d]


Thread 1 (Thread 2491):
#0  0x7fab4599cb3b in raise () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005a77f2 in reraise_fatal (signum=6) at 
global/signal_handler.cc:59
#2  0x005a7dea in handle_fatal_signal (signum=6) at 
global/signal_handler.cc:106

#3  signal handler called
#4  0x7fab43f1dd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7fab43f21ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7fab447d46dd in __gnu_cxx::__verbose_terminate_handler() () 
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x7fab447d2926 in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6

---Type <return> to continue, or q <return> to quit---
#8  0x7fab447d2953 in std::terminate() () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7fab447d2a5e in __cxa_throw () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x005cd9e6 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>)
    at common/assert.cc:70
#11 0x0072e7cd in FileJournal::committed_thru (this=0x141, 
seq=4145693) at os/FileJournal.cc:1011
#12 0x00714d79 in JournalingObjectStore::commit_finish 
(this=0x1401000) at os/JournalingObjectStore.cc:260
#13 0x0070aae7 in FileStore::sync_entry (this=0x1401000) at 
os/FileStore.cc:3079
#14 0x007139bd in FileStore::SyncThread::entry (this=<value optimized out>) at os/FileStore.h:101
#15 0x7fab45993d8c in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0

#16 0x7fab43fd004d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x in ?? ()
(gdb)
(gdb) thread 1
[Switching to thread 1 (Thread 2491)]#0  0x7fab4599cb3b in raise () 
from /lib/x86_64-linux-gnu/libpthread.so.0

(gdb) thread 11
Thread ID 11 not known.
(gdb) frame 11
#11 0x0072e7cd in FileJournal::committed_thru (this=0x141, 
seq=4145693) at os/FileJournal.cc:1011

1011    os/FileJournal.cc: No such file or directory.
in os/FileJournal.cc
(gdb) p seq
$1 = 4145693
(gdb) p last_committed_seq
$2 = 4145768
(gdb)
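
The values printed above show seq (4145693) already behind last_committed_seq (4145768), which is exactly the invariant the assertion protects. In simplified form (a sketch of the idea, not the actual FileJournal code):

// Commits must only move forward: being told that a sequence number older
// than one the journal already considers committed has just been committed
// means replay or trimming got out of order.
void FileJournal::committed_thru(uint64_t seq)
{
  assert(seq >= last_committed_seq);   // the check failing at FileJournal.cc:1011
  last_committed_seq = seq;
  // ... free the journal space occupied by entries up to seq ...
}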

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)

2011-11-29 Thread Martin Mailand

Hi,
with a build from today, I have the same prob.

os/FileJournal.cc: In function 'virtual void 
FileJournal::committed_thru(uint64_t)', in thread '7fc55c85f700'

os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
 ceph version 0.38-250-gc2889fe 
(commit:c2889fef420611df3dd0de4064c91f6aa9f86625)


-martin

On 29.11.2011 13:14, Martin Mailand wrote:

Hi Stratos,
ok, my build was from the 23.11, I'll retest with master.

-martin

On 29.11.2011 12:56, Stratos Psomadakis wrote:

On 11/29/2011 01:48 PM, Martin Mailand wrote:

Hi
I hit this assertion a few times. I use ext4 as the osd fs, so I think
we have to replay the whole journal, maybe that triggers it.


I've hit that too with v0.38 (with OSD on ext4), but when I built ceph
from the master branch, the issue seemed to be resolved.



-martin

2011-11-29 11:37:55.393296 7fab45dbc7a0 FileStore is up to date.
os/FileJournal.cc: In function 'virtual void
FileJournal::committed_thru(uint64_t)', in thread '7fab434cf700'
os/FileJournal.cc: 1011: FAILED assert(seq >= last_committed_seq)
ceph version 0.38-244-g30def38
(commit:30def38d21b217f244db74e6c469598d794fa8a1)
1: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
2: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
3: (FileStore::sync_entry()+0xec7) [0x70aae7]
4: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
5: (()+0x6d8c) [0x7fab45993d8c]
6: (clone()+0x6d) [0x7fab43fd004d]
ceph version 0.38-244-g30def38
(commit:30def38d21b217f244db74e6c469598d794fa8a1)
1: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
2: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
3: (FileStore::sync_entry()+0xec7) [0x70aae7]
4: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
5: (()+0x6d8c) [0x7fab45993d8c]
6: (clone()+0x6d) [0x7fab43fd004d]
*** Caught signal (Aborted) **
in thread 7fab434cf700
ceph version 0.38-244-g30def38
(commit:30def38d21b217f244db74e6c469598d794fa8a1)
1: /usr/bin/ceph-osd() [0x5a7ba2]
2: (()+0xfc60) [0x7fab4599cc60]
3: (gsignal()+0x35) [0x7fab43f1dd05]
4: (abort()+0x186) [0x7fab43f21ab6]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fab447d46dd]
6: (()+0xb9926) [0x7fab447d2926]
7: (()+0xb9953) [0x7fab447d2953]
8: (()+0xb9a5e) [0x7fab447d2a5e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5cd9e6]
10: (FileJournal::committed_thru(unsigned long)+0xcd) [0x72e7cd]
11: (JournalingObjectStore::commit_finish()+0xb9) [0x714d79]
12: (FileStore::sync_entry()+0xec7) [0x70aae7]
13: (FileStore::SyncThread::entry()+0xd) [0x7139bd]
14: (()+0x6d8c) [0x7fab45993d8c]
15: (clone()+0x6d) [0x7fab43fd004d]


Thread 1 (Thread 2491):
#0 0x7fab4599cb3b in raise () from
/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x005a77f2 in reraise_fatal (signum=6) at
global/signal_handler.cc:59
#2 0x005a7dea in handle_fatal_signal (signum=6) at
global/signal_handler.cc:106
#3  <signal handler called>
#4 0x7fab43f1dd05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x7fab43f21ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x7fab447d46dd in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x7fab447d2926 in ?? () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
---Type <return> to continue, or q <return> to quit---
#8 0x7fab447d2953 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x7fab447d2a5e in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x005cd9e6 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>)
at common/assert.cc:70
#11 0x0072e7cd in FileJournal::committed_thru (this=0x141,
seq=4145693) at os/FileJournal.cc:1011
#12 0x00714d79 in JournalingObjectStore::commit_finish
(this=0x1401000) at os/JournalingObjectStore.cc:260
#13 0x0070aae7 in FileStore::sync_entry (this=0x1401000) at
os/FileStore.cc:3079
#14 0x007139bd in FileStore::SyncThread::entry (this=<value optimized out>) at os/FileStore.h:101
#15 0x7fab45993d8c in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x7fab43fd004d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x in ?? ()
(gdb)
(gdb) thread 1
[Switching to thread 1 (Thread 2491)]#0 0x7fab4599cb3b in raise
() from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) thread 11
Thread ID 11 not known.
(gdb) frame 11
#11 0x0072e7cd in FileJournal::committed_thru (this=0x141,
seq=4145693) at os/FileJournal.cc:1011
1011 os/FileJournal.cc: No such file or directory.
in os/FileJournal.cc
(gdb) p seq
$1 = 4145693
(gdb) p last_committed_seq
$2 = 4145768
(gdb)

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html






Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-24 Thread Martin Mailand

Hi Sage,
I hit it again, this time on another osd

ceph version 0.38-181-g2e19550 
(commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)


Thread 1 (Thread 2951):
#0  0x7f36bbb41b3b in raise () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x005f5852 in reraise_fatal (signum=6) at 
global/signal_handler.cc:59
#2  0x005f5e4a in handle_fatal_signal (signum=6) at 
global/signal_handler.cc:106

#3  signal handler called
#4  0x7f36ba0c2d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7f36ba0c6ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7f36ba9796dd in __gnu_cxx::__verbose_terminate_handler() () 
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

---Type <return> to continue, or q <return> to quit---
#7  0x7f36ba977926 in ?? () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x7f36ba977953 in std::terminate() () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7f36ba977a5e in __cxa_throw () from 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x005f6956 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>,
func=<value optimized out>) at common/assert.cc:70
#11 0x0056616a in OSD::dequeue_op (this=0x25b, pg=<value optimized out>) at osd/OSD.cc:5518
#12 0x005d4406 in ThreadPool::worker (this=0x25b0408) at 
common/WorkQueue.cc:54
#13 0x005822dd in ThreadPool::WorkThread::entry (this=<value optimized out>) at ./common/WorkQueue.h:120
#14 0x7f36bbb38d8c in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0

#15 0x7f36ba17504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 2951)]#0  0x7f36bbb41b3b in raise () 
from /lib/x86_64-linux-gnu/libpthread.so.0

(gdb) frame 11
#11 0x0056616a in OSD::dequeue_op (this=0x25b, pg=<value optimized out>) at osd/OSD.cc:5518

5518    osd/OSD.cc: No such file or directory.
in osd/OSD.cc
(gdb) p pending_ops
$1 = 0
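
For anyone hitting the same abort, the interesting bits can be pulled out of a core dump along these lines (binary and core paths are placeholders; the frame number comes from the backtrace):

gdb /usr/bin/ceph-osd core
(gdb) thread apply all bt       # the full set of backtraces Sage asked for
(gdb) thread 1                  # switch to the thread that caught the signal
(gdb) frame 11                  # the OSD::dequeue_op frame
(gdb) print pending_ops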



-martin


On 16.11.2011 22:12, Sage Weil wrote:

Hi Martin,

I've reread the code twice now and it's really not clear to me how
pending_ops could get out of sync with the actual queue size.  I've pushed
a couple of patches that remove surrounding dead code and add an
additional assert sanity check to master.  Have you seen this again, or
just that once?

Opened http://tracker.newdream.net/issues/1727

Thanks-
sage


On Wed, 16 Nov 2011, Martin Mailand wrote:


Hi,
so after a little help from greg.

(gdb) print pending_ops
$1 = 0

-martin

Sage Weil wrote:

On Mon, 14 Nov 2011, Gregory Farnum wrote:

It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!

On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailandmar...@tuxadero.com
wrote:

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always
on,
a core dump would be the best solution?


I'm mainly interested in whether pending_ops is 0 or > 0.  A 'thread apply
all bt' may also be useful.

Thanks!
sage



-martin

Gregory Farnum wrote:

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost track of.
:)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailandmar...@tuxadero.com
wrote:

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
'7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
  in thread 7faeb6139700
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: /usr/bin/ceph-osd() [0x5b8b52]
  2: (()+0xfc60) [0x7faec4d1bc60]
  3: (gsignal()+0x35) [0x7faec34a1d05]
  4: (abort()+0x186) [0x7faec34a5ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
[0x7faec3d586dd]
  6: (()+0xb9926) [0x7faec3d56926]
  7: (()+0xb9953) [0x7faec3d56953]
  8: (()+0xb9a5e) [0x7faec3d56a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5bddb6]
  10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  13: (()+0x6d8c) [0x7faec4d12d8c]
  14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin

Re: Cluster sync doesn't finish

2011-11-18 Thread Martin Mailand

Hi Sam,

here the crushmap

http://85.214.49.87/ceph/crushmap.txt
http://85.214.49.87/ceph/crushmap

-martin

Samuel Just wrote:

It looks like a crushmap related problem.  Could you send us the crushmap?

ceph osd getcrushmap

Thanks
-Sam
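
For completeness, fetching and decompiling the map goes roughly like this (file names are placeholders):

ceph osd getcrushmap -o crushmap
crushtool -d crushmap -o crushmap.txt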

On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum
gregory.far...@dreamhost.com wrote:

On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen
tommi.virta...@dreamhost.com wrote:

On Thu, Nov 17, 2011 at 12:48, Martin Mailand mar...@tuxadero.com wrote:

Hi,
I am doing a cluster failure test, where I shut down one OSD and wait for the
cluster to sync. But the sync never finished; at around 4-5% it stops. I
stopped osd2.

...

2011-11-17 16:42:45.520740pg v1337: 600 pgs: 547 active+clean, 53
active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB avail;
4025/82404 degraded (4.884%)

...

The osd log, the ceph.conf, pg dump, osd dump could be found here.

http://85.214.49.87/ceph/

This looks a bit worrying:

2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424 >> 
192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
wrong node!

So osd.0 is basically refusing to talk to one of the other OSDs. I
don't understand the messenger well enough to know why this would be,
but it wouldn't surprise me if this problem kept the objects degraded
-- it looks like a breakage in the osd-osd communication.

Now if this was the reason, I'd expect a restart of all the OSDs to
get it back in shape; messenger state is ephemeral. Can you confirm
that?

Probably not — that wrong node thing can occur for a lot of different
reasons, some of which matter and most of which don't. Sam's looking
into the problem; there's something going wrong with the CRUSH
calculations or the monitor PG placement overrides or something...
-Greg
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-17 Thread Martin Mailand

Hi Sage,
I saw it once, but the osd node seems a bit dodgy. I re-imaged the  node 
today, I try again to reproduce it.


-martin

On 16.11.2011 22:12, Sage Weil wrote:

Hi Martin,

I've reread the code twice now and it's really not clear to me how
pending_ops could get out of sync with the actual queue size.  I've pushed
a couple of patches that remove surrounding dead code and add an
additional assert sanity check to master.  Have you seen this again, or
just that once?

Opened http://tracker.newdream.net/issues/1727

Thanks-
sage


On Wed, 16 Nov 2011, Martin Mailand wrote:


Hi,
so after a little help from greg.

(gdb) print pending_ops
$1 = 0

-martin

Sage Weil wrote:

On Mon, 14 Nov 2011, Gregory Farnum wrote:

It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!

On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailandmar...@tuxadero.com
wrote:

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always
on,
a core dump would be the best solution?


I'm mainly interested in whether pending_ops is 0 or > 0.  A 'thread apply
all bt' may also be useful.

Thanks!
sage



-martin

Gregory Farnum wrote:

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost track of.
:)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailandmar...@tuxadero.com
wrote:

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
'7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  4: (()+0x6d8c) [0x7faec4d12d8c]
  5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
  in thread 7faeb6139700
  ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
  1: /usr/bin/ceph-osd() [0x5b8b52]
  2: (()+0xfc60) [0x7faec4d1bc60]
  3: (gsignal()+0x35) [0x7faec34a1d05]
  4: (abort()+0x186) [0x7faec34a5ab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d)
[0x7faec3d586dd]
  6: (()+0xb9926) [0x7faec3d56926]
  7: (()+0xb9953) [0x7faec3d56953]
  8: (()+0xb9a5e) [0x7faec3d56a5e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5bddb6]
  10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
  11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
  12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
  13: (()+0x6d8c) [0x7faec4d12d8c]
  14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
--
To unsubscribe from this list: send the line unsubscribe
ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cluster sync doesn't finish

2011-11-17 Thread Martin Mailand

Hi,
I am doing a cluster failure test, where I shut down one OSD and wait for 
the cluster to sync. But the sync never finished; at around 4-5% it 
stops. I stopped osd2.


2011-11-17 16:40:48.015370pg v1333: 600 pgs: 1 active, 546 
active+clean, 53 active+clean+degraded; 113 GB data, 183 GB used, 1142 
GB / 1395 GB avail; 4200/82404 degraded (5.097%)
2011-11-17 16:40:53.109391pg v1334: 600 pgs: 1 active, 546 
active+clean, 53 active+clean+degraded; 113 GB data, 183 GB used, 1142 
GB / 1395 GB avail; 4117/82404 degraded (4.996%)
2011-11-17 16:40:58.228525pg v1335: 600 pgs: 1 active, 546 
active+clean, 53 active+clean+degraded; 113 GB data, 183 GB used, 1142 
GB / 1395 GB avail; 4037/82404 degraded (4.899%)
2011-11-17 16:41:03.223778pg v1336: 600 pgs: 547 active+clean, 53 
active+clean+degraded; 113 GB data, 183 GB used, 1142 GB / 1395 GB 
avail; 4025/82404 degraded (4.884%)
2011-11-17 16:42:45.520740pg v1337: 600 pgs: 547 active+clean, 53 
active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB 
avail; 4025/82404 degraded (4.884%)


^C
root@m-brick-000:~# date -R
Thu, 17 Nov 2011 17:56:08 +0100
root@m-brick-000:~#

So for the last hour nothing happened; there is no load on the cluster.

The osd log, the ceph.conf, pg dump, osd dump could be found here.

http://85.214.49.87/ceph/

ceph version 0.38-181-g2e19550 
(commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)


-martin



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-15 Thread Martin Mailand

Hi,
I have a bt.
http://pastebin.com/QNcja2QK

-martin

Sage Weil wrote:

On Mon, 14 Nov 2011, Gregory Farnum wrote:

It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!

On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always on,
a core dump would be the best solution?


I'm mainly interested in whether pending_ops is 0 or > 0.  A 'thread apply 
all bt' may also be useful.


Thanks!
sage



-martin

Gregory Farnum wrote:

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost track of. :)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailand mar...@tuxadero.com
wrote:

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
'7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
 in thread 7faeb6139700
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: /usr/bin/ceph-osd() [0x5b8b52]
 2: (()+0xfc60) [0x7faec4d1bc60]
 3: (gsignal()+0x35) [0x7faec34a1d05]
 4: (abort()+0x186) [0x7faec34a5ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
 6: (()+0xb9926) [0x7faec3d56926]
 7: (()+0xb9953) [0x7faec3d56953]
 8: (()+0xb9a5e) [0x7faec3d56a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5bddb6]
 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 13: (()+0x6d8c) [0x7faec4d12d8c]
 14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-15 Thread Martin Mailand

Hi,
so after a little help from greg.

(gdb) print pending_ops
$1 = 0

-martin

Sage Weil wrote:

On Mon, 14 Nov 2011, Gregory Farnum wrote:

It's not a big deal; logging is expensive. :) Just a backtrace isn't a
lot to go on, but it's better than nothing!

On Mon, Nov 14, 2011 at 11:45 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always on,
a core dump would be the best solution?


I'm mainly interested in whether pending_ops is 0 or > 0.  A 'thread apply 
all bt' may also be useful.


Thanks!
sage



-martin

Gregory Farnum wrote:

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost track of. :)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailand mar...@tuxadero.com
wrote:

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread
'7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
 in thread 7faeb6139700
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: /usr/bin/ceph-osd() [0x5b8b52]
 2: (()+0xfc60) [0x7faec4d1bc60]
 3: (gsignal()+0x35) [0x7faec34a1d05]
 4: (abort()+0x186) [0x7faec34a5ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
 6: (()+0xb9926) [0x7faec3d56926]
 7: (()+0xb9953) [0x7faec3d56953]
 8: (()+0xb9a5e) [0x7faec3d56a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5bddb6]
 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 13: (()+0x6d8c) [0x7faec4d12d8c]
 14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ceph and ext4

2011-11-14 Thread Martin Mailand

Hi Christian,
I am not sure if you noticed, but your ext4 bug is fixed in mainline. I 
am running a ceph cluster with 40+ vms for over a week by now, without 
any problems. An fsck.ext4 shows the ext4 is clean.
The performance of ext4 is much better than btrfs, no rise in the load 
of the osd's.


-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph and ext4

2011-11-14 Thread Martin Mailand

Hi Tomasz,
as far as I know it still has this limit.
But it should be relatively safe to use it.

http://marc.info/?l=ceph-devel&m=131942130322957&w=2

If we hit the 4KB limit of xattrs in ext4 how does it show up in the rbd 
layer?


How does it show up in the fs layer, would the fs still be clean?

-martin


On 14.11.2011 14:09, Tomasz Paszkowski wrote:

what about limit on xattr size ? Is it still limited to 4KB ?



On Mon, Nov 14, 2011 at 1:15 PM, Martin Mailandmar...@tuxadero.com  wrote:

Hi Christian,
I am not sure if you noticed, but your ext4 bug is fixed in mainline. I am
running a ceph cluster with 40+ vms for over a week by now, without any
problems. An fsck.ext4 shows the ext4 is clean.
The performance of ext4 is much better than btrfs, no rise in the load of
the osd's.

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html







--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-14 Thread Martin Mailand

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread '7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
 in thread 7faeb6139700
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: /usr/bin/ceph-osd() [0x5b8b52]
 2: (()+0xfc60) [0x7faec4d1bc60]
 3: (gsignal()+0x35) [0x7faec34a1d05]
 4: (abort()+0x186) [0x7faec34a5ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
 6: (()+0xb9926) [0x7faec3d56926]
 7: (()+0xb9953) [0x7faec3d56953]
 8: (()+0xb9a5e) [0x7faec3d56a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x396) [0x5bddb6]

 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 13: (()+0x6d8c) [0x7faec4d12d8c]
 14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crushmap errors

2011-11-14 Thread Martin Mailand

Hi Sage,
1. The crushtool grammar fix is working for me. Thanks.

2. I think if an admin puts the extra rack info into the ceph.conf file, 
then it should do what is expected. I understand your worries, but on the 
other hand ceph is not an end user tool, and people should know what they 
do and balance their racks evenly.
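
For an admin who wants the rack separation regardless of where the default threshold ends up, it can also be written out by hand in the crush map; a minimal sketch (bucket names, ids and weights are placeholders, not the map from this cluster):

rack rack1 {
	id -3
	alg straw
	hash 0	# rjenkins1
	item store1 weight 4.000
	item store2 weight 4.000
}
# rack2 defined the same way
root default {
	id -1
	alg straw
	hash 0	# rjenkins1
	item rack1 weight 8.000
	item rack2 weight 8.000
}
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type rack
	step emit
}

The "chooseleaf firstn 0 type rack" step is what forces each replica into a different rack, independent of the threshold in the generated map.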

Just my two cents.

-martin


On 11.11.2011 23:51, Sage Weil wrote:

On Fri, 11 Nov 2011, Martin Mailand wrote:

Hi,
I used in ceph v0.38 the host and rack feature in the conf during an mkcephfs.
Now I have two problems with the crushmap

1. I cannot compile a ceph generated crushmap.
crushtool -c file.txt -o file
file.txt:4 error: parse error at '.0'


Whoops, will push a patch to stable shortly.  The grammar wasn't
recognizing '.' as a legal character.


2. Why are 2 racks not enough for 2 failure domains?
 From the commit:
If there are >2 racks, separate across racks.


Well, technically they are.  My worry is that it's more likely that racks
will have significantly varying capacity (i.e. crush weight) due to, say, 1
full rack and a second 1/2 rack.  If the policy forces replicas to be placed
across racks things won't balance well.

I suppose there should be an argument like --min-racks that controls that
threshold?

sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph and ext4

2011-11-14 Thread Martin Mailand

Hi Gregory,
this is quite bad, so ext4 is still no alternative as a backend fs.

-martin

Gregory Farnum wrote:

On Mon, Nov 14, 2011 at 5:33 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi Tomasz,
as far as I know it still has this limit.
But it should be relatively safe to use it.

http://marc.info/?l=ceph-devel&m=131942130322957&w=2

If we hit the 4KB limit of xattrs in ext4 how does it show up in the rbd
layer?

How does it show up in the fs layer, would the fs still be clean?


Right now it would show up very badly, unfortunately. (And yes, the
limit is still there.) You'd notice, though you might manage to
corrupt some of your data first. :/

However, if you're not taking snapshots and you're not using xattrs
yourself, you won't hit it with rbd or the Ceph FS.
-Greg


-martin


On 14.11.2011 14:09, Tomasz Paszkowski wrote:

what about limit on xattr size ? Is it still limited to 4KB ?



On Mon, Nov 14, 2011 at 1:15 PM, Martin Mailandmar...@tuxadero.com
 wrote:

Hi Christian,
I am not sure if you noticed, but your ext4 bug is fixed in mainline. I
am
running a ceph cluster with 40+ vms for over a week by now, without any
problems. An fsck.ext4 shows the ext4 is clean.
The performance of ext4 is much better than btrfs, no rise in the load of
the osd's.

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)

2011-11-14 Thread Martin Mailand

Hi Gregory,
I do not have more at the moment. As I cannot have the debug log always 
on, a core dump would be the best solution?


-martin

Gregory Farnum wrote:

Do you have any other system state? (More logs, core dumps.)

Make a bug in the tracker either way so it doesn't get lost track of. :)
-Greg

On Mon, Nov 14, 2011 at 6:04 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi,
today one of my osds died, the log is.

osd/OSD.cc: In function 'void OSD::dequeue_op(PG*)', in thread '7faeb6139700'
osd/OSD.cc: 5534: FAILED assert(pending_ops > 0)
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 2: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 3: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 4: (()+0x6d8c) [0x7faec4d12d8c]
 5: (clone()+0x6d) [0x7faec355404d]
*** Caught signal (Aborted) **
 in thread 7faeb6139700
 ceph version 0.38 (commit:b600ec2ac7c0f2e508720f8e8bb87c3db15509b9)
 1: /usr/bin/ceph-osd() [0x5b8b52]
 2: (()+0xfc60) [0x7faec4d1bc60]
 3: (gsignal()+0x35) [0x7faec34a1d05]
 4: (abort()+0x186) [0x7faec34a5ab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7faec3d586dd]
 6: (()+0xb9926) [0x7faec3d56926]
 7: (()+0xb9953) [0x7faec3d56953]
 8: (()+0xb9a5e) [0x7faec3d56a5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x396) [0x5bddb6]
 10: (OSD::dequeue_op(PG*)+0x4bb) [0x55a4db]
 11: (ThreadPool::worker()+0x6e6) [0x5b7b16]
 12: (ThreadPool::WorkThread::entry()+0xd) [0x57398d]
 13: (()+0x6d8c) [0x7faec4d12d8c]
 14: (clone()+0x6d) [0x7faec355404d]

Anything else needed to debug this?

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph and ext4

2011-11-14 Thread Martin Mailand

so snapshots of rbd images would be safe? or would they hit the limit?

Sage Weil wrote:

On Mon, 14 Nov 2011, Gregory Farnum wrote:

On Mon, Nov 14, 2011 at 5:33 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi Tomasz,
as far as I know it still has this limit.
But it should be relatively safe to use it.

http://marc.info/?l=ceph-devel&m=131942130322957&w=2

If we hit the 4KB limit of xattrs in ext4 how does it show up in the rbd
layer?

How does it show up in the fs layer, would the fs still be clean?

Right now it would show up very badly, unfortunately. (And yes, the
limit is still there.) You'd notice, though you might manage to
corrupt some of your data first. :/


Well, the osd's are now more careful about being fail-stop, so if they hit 
the xattr limit they crash.  So there won't be data corruption per se, 
except that you won't be able to start the OSD up again because the 
journal replay will keep hitting the limit.



However, if you're not taking snapshots and you're not using xattrs
yourself, you won't hit it with rbd or the Ceph FS.


Right.  Nothing sets large xattrs on objects in rbd.  For the file system, 
this would only happen on extremely (!) deeply nested directories (ceph 
dfs xattrs are managed by the MDS, not as object attrs).


sage
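
For anyone who wants to see how close their objects actually get to that limit on an ext4 store, the xattrs can be dumped straight off a filestore object (the path is a placeholder for whatever object you pick under the osd data dir):

getfattr -d -m - -e hex /data/osd2/current/<pgid>_head/<object>

Large values in that dump are what can push an inode past the roughly 4KB ext4 limit.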



-Greg


-martin


On 14.11.2011 14:09, Tomasz Paszkowski wrote:

what about limit on xattr size ? Is it still limited to 4KB ?



On Mon, Nov 14, 2011 at 1:15 PM, Martin Mailandmar...@tuxadero.com
 wrote:

Hi Christian,
I am not sure if you noticed, but your ext4 bug is fixed in mainline. I
am
running a ceph cluster with 40+ vms for over a week by now, without any
problems. An fsck.ext4 shows the ext4 is clean.
The performance of ext4 is much better than btrfs, no rise in the load of
the osd's.

-martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


crushmap errors

2011-11-11 Thread Martin Mailand

Hi,
I used in ceph v0.38 the host and rack feature in the conf during an 
mkcephfs. Now I have two problems with the crushmap


1. I cannot compile a ceph generated crushmap.
crushtool -c file.txt -o file
file.txt:4 error: parse error at '.0'

# begin crush map

# devices
device 0 osd.0


2. Why are 2 racks not enough for 2 failure domains?
From the commit:
If there are >2 racks, separate across racks.

and in the src/osd/OSDMap.cc

   if (racks.size() < 3) {
  // spread replicas across hosts
  crush_rule_set_step(rule, 1, CRUSH_RULE_CHOOSE_LEAF_FIRSTN, 
CRUSH_CHOOSE_N, 2);


shouldn't that be

   if (racks.size() > 1) {
  // spread replicas across racks
  crush_rule_set_step(rule, 1, CRUSH_RULE_CHOOSE_LEAF_FIRSTN, 
CRUSH_CHOOSE_N, 2);


Best Regards,
 martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]

2011-10-27 Thread Martin Mailand

Hi
resend without the perf attachment, which could be found here:
http://tuxadero.com/multistorage/perf.report.txt.bz2

Best Regards,
 martin

 Original Message 
Subject: Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
Date: Wed, 26 Oct 2011 22:38:47 +0200
From: Martin Mailand mar...@tuxadero.com
Reply-To: mar...@tuxadero.com
To: Sage Weil s...@newdream.net
CC: Christian Brunner c...@muc.de, ceph-devel@vger.kernel.org, 
 linux-bt...@vger.kernel.org


Hi,
I have more or less the same setup as Christian and I suffer the same
problems.
But as far as I can see the output of latencytop and perf differs from
Christian's; both are attached.
I was wondering about the high latency from btrfs-submit.

Process btrfs-submit-0 (970) Total: 2123.5 msec

I have as well the high IO rate and high IO wait.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.600.002.20   82.400.00   14.80

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.000.008.40 0.0074.40
17.71 0.033.810.003.81   3.81   3.20
sdb   0.00 7.000.00  269.80 0.00  1224.80
9.08   107.19  398.690.00  398.69   3.15  85.00

top - 21:57:41 up  8:41,  1 user,  load average: 0.65, 0.79, 0.76
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  2.4%sy,  0.0%ni, 70.8%id, 25.8%wa,  0.0%hi,  0.3%si,
0.0%st
Mem:   4018276k total,  1577728k used,  2440548k free,10496k buffers
Swap:  1998844k total,0k used,  1998844k free,  1316696k cached

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

  1399 root  20   0  548m 103m 3428 S  0.0  2.6   2:01.85 ceph-osd

  1401 root  20   0  548m 103m 3428 S  0.0  2.6   1:51.71 ceph-osd

  1400 root  20   0  548m 103m 3428 S  0.0  2.6   1:50.30 ceph-osd

  1391 root  20   0 000 S  0.0  0.0   1:18.39
btrfs-endio-wri

   976 root  20   0 000 S  0.0  0.0   1:18.11
btrfs-endio-wri

  1367 root  20   0 000 S  0.0  0.0   1:05.60
btrfs-worker-1

   968 root  20   0 000 S  0.0  0.0   1:05.45
btrfs-worker-0

  1163 root  20   0  141m 1636 1100 S  0.0  0.0   1:00.56 collectd

   970 root  20   0 000 S  0.0  0.0   0:47.73
btrfs-submit-0

  1402 root  20   0  548m 103m 3428 S  0.0  2.6   0:34.86 ceph-osd

  1392 root  20   0 000 S  0.0  0.0   0:33.70
btrfs-endio-met

   975 root  20   0 000 S  0.0  0.0   0:32.70
btrfs-endio-met

  1415 root  20   0  548m 103m 3428 S  0.0  2.6   0:28.29 ceph-osd

  1414 root  20   0  548m 103m 3428 S  0.0  2.6   0:28.24 ceph-osd

  1397 root  20   0  548m 103m 3428 S  0.0  2.6   0:24.60 ceph-osd

  1436 root  20   0  548m 103m 3428 S  0.0  2.6   0:13.31 ceph-osd


Here is my setup.
Kernel v3.1 + Josef

The config for this osd (ceph version 0.37
(commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)) is:
[osd.1]
 host = s-brick-003
 osd journal = /dev/sda7
 btrfs devs = /dev/sdb
btrfs options = noatime
filestore_btrfs_snap = false

I hope this helps to pin point the problem.

Best Regards,
martin


Sage Weil wrote:

On Wed, 26 Oct 2011, Christian Brunner wrote:

2011/10/26 Sage Weil s...@newdream.net:

On Wed, 26 Oct 2011, Christian Brunner wrote:

Christian, have you tweaked those settings in your ceph.conf?  It would be
something like 'journal dio = false'.  If not, can you verify that
directio shows true when the journal is initialized from your osd log?
E.g.,

 2011-10-21 15:21:02.026789 7ff7e5c54720 journal _open dev/osd0.journal fd 14: 
104857600 bytes, block size 4096 bytes, directio = 1

If directio = 1 for you, something else funky is causing those
blkdev_fsync's...

I've looked it up in the logs - directio is 1:

Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open
/dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096
bytes, directio = 1

Do you mind capturing an strace?  I'd like to see where that blkdev_fsync
is coming from.

Here is an strace. I can see a lot of sync_file_range operations.

Yeah, these all look like the flusher thread, and shouldn't be hitting
blkdev_fsync.  Can you confirm that with

   filestore flusher = false
   filestore sync flush = false

you get no sync_file_range at all?  I wonder if this is also perf lying
about the call chain.

Yes, setting this makes the sync_file_range calls go away.
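
For reference, the two switches being toggled in this exchange sit in the OSD section of ceph.conf (a sketch; roughly, they disable the pre-emptive flushing of written data, leaving it to the periodic sync):

[osd]
	filestore flusher = false
	filestore sync flush = false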


Okay.  That means either sync_file_range on a regular btrfs file is
triggering blkdev_fsync somewhere in btrfs, there is an extremely sneaky
bug that is mixing up file descriptors, or latencytop is lying.  I'm
guessing the latter, given the other weirdness Josef and Chris were
seeing.  :)


Is it safe to use these settings with filestore btrfs snap = 0?


Yeah.  They're purely

Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]

2011-10-27 Thread Martin Mailand

Hi Stefan,
I think the machine has enough ram.

root@s-brick-003:~# free -m
             total       used       free     shared    buffers     cached
Mem:          3924       2401       1522          0         42       2115
-/+ buffers/cache:        243       3680
Swap:         1951          0       1951

There is no swap usage at all.

-martin


On 27.10.2011 12:59, Stefan Majer wrote:

Hi Martin,

a quick dig into your perf report shows a large amount of swapper work.
If this is the case, I would suspect latency. So do you not have
enough physical ram in your machine?

Greetings

Stefan Majer

On Thu, Oct 27, 2011 at 12:53 PM, Martin Mailandmar...@tuxadero.com  wrote:

Hi
resend without the perf attachment, which could be found here:
http://tuxadero.com/multistorage/perf.report.txt.bz2

Best Regards,
  martin

 Original Message 
Subject: Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
Date: Wed, 26 Oct 2011 22:38:47 +0200
From: Martin Mailand mar...@tuxadero.com
Reply-To: mar...@tuxadero.com
To: Sage Weil s...@newdream.net
CC: Christian Brunner c...@muc.de, ceph-devel@vger.kernel.org,
  linux-bt...@vger.kernel.org

Hi,
I have more or less the same setup as Christian and I suffer the same
problems.
But as far as I can see the output of latencytop and perf differs from
Christian's; both are attached.
I was wondering about the high latency from btrfs-submit.

Process btrfs-submit-0 (970) Total: 2123.5 msec

I have as well the high IO rate and high IO wait.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.600.002.20   82.400.00   14.80

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.000.008.40 0.0074.40
17.71 0.033.810.003.81   3.81   3.20
sdb   0.00 7.000.00  269.80 0.00  1224.80
9.08   107.19  398.690.00  398.69   3.15  85.00

top - 21:57:41 up  8:41,  1 user,  load average: 0.65, 0.79, 0.76
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  2.4%sy,  0.0%ni, 70.8%id, 25.8%wa,  0.0%hi,  0.3%si,
0.0%st
Mem:   4018276k total,  1577728k used,  2440548k free,10496k buffers
Swap:  1998844k total,0k used,  1998844k free,  1316696k cached

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

  1399 root  20   0  548m 103m 3428 S  0.0  2.6   2:01.85 ceph-osd

  1401 root  20   0  548m 103m 3428 S  0.0  2.6   1:51.71 ceph-osd

  1400 root  20   0  548m 103m 3428 S  0.0  2.6   1:50.30 ceph-osd

  1391 root  20   0 000 S  0.0  0.0   1:18.39
btrfs-endio-wri

   976 root  20   0 000 S  0.0  0.0   1:18.11
btrfs-endio-wri

  1367 root  20   0 000 S  0.0  0.0   1:05.60
btrfs-worker-1

   968 root  20   0 000 S  0.0  0.0   1:05.45
btrfs-worker-0

  1163 root  20   0  141m 1636 1100 S  0.0  0.0   1:00.56 collectd

   970 root  20   0 000 S  0.0  0.0   0:47.73
btrfs-submit-0

  1402 root  20   0  548m 103m 3428 S  0.0  2.6   0:34.86 ceph-osd

  1392 root  20   0 000 S  0.0  0.0   0:33.70
btrfs-endio-met

   975 root  20   0 000 S  0.0  0.0   0:32.70
btrfs-endio-met

  1415 root  20   0  548m 103m 3428 S  0.0  2.6   0:28.29 ceph-osd

  1414 root  20   0  548m 103m 3428 S  0.0  2.6   0:28.24 ceph-osd

  1397 root  20   0  548m 103m 3428 S  0.0  2.6   0:24.60 ceph-osd

  1436 root  20   0  548m 103m 3428 S  0.0  2.6   0:13.31 ceph-osd


Here is my setup.
Kernel v3.1 + Josef

The config for this osd (ceph version 0.37
(commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)) is:
[osd.1]
 host = s-brick-003
 osd journal = /dev/sda7
 btrfs devs = /dev/sdb
btrfs options = noatime
filestore_btrfs_snap = false

I hope this helps to pin point the problem.

Best Regards,
martin


Sage Weil wrote:


On Wed, 26 Oct 2011, Christian Brunner wrote:


2011/10/26 Sage Weils...@newdream.net:


On Wed, 26 Oct 2011, Christian Brunner wrote:


Christian, have you tweaked those settings in your ceph.conf?  It
would be
something like 'journal dio = false'.  If not, can you verify that
directio shows true when the journal is initialized from your osd
log?
E.g.,

  2011-10-21 15:21:02.026789 7ff7e5c54720 journal _open
dev/osd0.journal fd 14: 104857600 bytes, block size 4096 bytes, directio = 1

If directio = 1 for you, something else funky is causing those
blkdev_fsync's...


I've looked it up in the logs - directio is 1:

Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open
/dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096
bytes, directio = 1


Do you mind capturing an strace?  I'd like to see where that
blkdev_fsync
is coming from.


Here is an strace. I can see a lot of sync_file_range operations.


Yeah, these all look like the flusher 

Re: kernel BUG at fs/btrfs/inode.c:1163

2011-10-20 Thread Martin Mailand

Hi Anand,
I changed the replication level of the rbd pool, from one to two.
ceph osd pool set rbd size 2

And then during the sync the bug happened, but today I could not 
reproduce it.


So I do not have a testcase for you.

Best Regards,
 martin

On 19.10.2011 17:02, Anand Jain wrote:

I tried to play with ceph here and not a complete success yet.

  any idea what was done on the system at the time of the problem ?
  and any specific command that could trigger this again ?
  Thanks.
anand


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


0.37 crash

2011-10-20 Thread Martin Mailand

Hi,
today I tried the version 0.37 and it did not work very well, see below.
It was an update from 0.36.

Best Regards,
 Martin


2011-10-20 17:33:34.350502 7f0ada6f4760 ceph version 0.37 
(commit:a6f3bbb744a6faea95ae48317f0b838edb16a896), process ceph-osd, pid 
21707
2011-10-20 17:33:34.353543 7f0ada6f4760 filestore(/data/osd2) mount 
FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.353628 7f0ada6f4760 filestore(/data/osd2) mount 
detected btrfs
2011-10-20 17:33:34.353656 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.425059 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.544564 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.544873 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.544966 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC is supported (transid 149)
2011-10-20 17:33:34.624965 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.636719 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.636754 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:34.644876 7f0ada6f4760 filestore(/data/osd2) mount 
found snaps 
2011-10-20 17:33:34.644983 7f0ada6f4760 filestore(/data/osd2) mount: 
enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not enabled

2011-10-20 17:33:34.678324 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.678737 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:34.688215 7f0ada6f4760 journal read_entry 39366656 : 
seq 4653 710 bytes
2011-10-20 17:33:34.688420 7f0ada6f4760 journal read_entry 39374848 : 
seq 4654 33 bytes

2011-10-20 17:33:34.695110 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.695496 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1

2011-10-20 17:33:34.696359 7f0ada6f4760 FileStore is up to date.
2011-10-20 17:33:34.696683 7f0ada6f4760 journal close /dev/sda7
2011-10-20 17:33:34.697970 7f0ada6f4760 filestore(/data/osd2) mount 
FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.698013 7f0ada6f4760 filestore(/data/osd2) mount 
detected btrfs
2011-10-20 17:33:34.698031 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.774980 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.904538 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.904945 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.904995 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC is supported (transid 152)
2011-10-20 17:33:34.991585 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.996636 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.996664 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:35.004813 7f0ada6f4760 filestore(/data/osd2) mount 
found snaps 
2011-10-20 17:33:35.004902 7f0ada6f4760 filestore(/data/osd2) mount: 
enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not enabled

2011-10-20 17:33:35.023071 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.023353 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:35.029846 7f0ada6f4760 journal read_entry 39366656 : 
seq 4653 710 bytes
2011-10-20 17:33:35.030077 7f0ada6f4760 journal read_entry 39374848 : 
seq 4654 33 bytes

2011-10-20 17:33:35.036728 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.037142 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1

*** Caught signal (Aborted) **
 in thread 0x7f0ace7f9700
 ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
 1: /usr/bin/ceph-osd() [0x5bd012]
 2: (()+0xfc60) [0x7f0ada2d4c60]
 3: (gsignal()+0x35) [0x7f0ad8a5ad05]
 4: (abort()+0x186) [0x7f0ad8a5eab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f0ad93116dd]
 6: (()+0xb9926) [0x7f0ad930f926]
 7: (()+0xb9953) [0x7f0ad930f953]
 8: (()+0xb9a5e) [0x7f0ad930fa5e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x129) 
[0x5a7e99]

 10: (OSDMap::decode(ceph::buffer::list)+0x81) [0x58f9f1]
 11: (OSD::get_map(unsigned int)+0x242) [0x53f6d2]
 12: (OSD::handle_osd_map(MOSDMap*)+0x1f82) [0x56ae72]
 13: (OSD::_dispatch(Message*)+0x36b) [0x56d11b]
 14: (OSD::ms_dispatch(Message*)+0xf6) [0x56e1c6]
 15: (SimpleMessenger::dispatch_entry()+0x88b) [0x5fff2b]
 16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4bd55c]
 17: (()+0x6d8c) [0x7f0ada2cbd8c]
 18: 

Re: 0.37 crash

2011-10-20 Thread Martin Mailand

Hi Stefan,
in my case the osd process was just terminated, no IO wait. Could you 
have a look in your dmesg, if there is any btrfs entry?

Because the IO wait sounds like a btrfs problem.

Best Regards,
 martin

Stefan Kleijkers wrote:

Hello,

I got the exact same problem. Upgraded from 0.36 to 0.37 and one of the 
two osds wouldn't start. In the log of the osd I also found the same 
error as below. The ceph-osd had status D (with ps, which is 
uninterruptible sleep) and I see a high IO wait with top. Also I noticed 
a lot of disk io on the disks.


Stefan

On 10/20/2011 05:39 PM, Martin Mailand wrote:

Hi,
today I tried the version 0.37 and it did not work very well, see below.
It was an update from 0.36.

Best Regards,
 Martin


2011-10-20 17:33:34.350502 7f0ada6f4760 ceph version 0.37 
(commit:a6f3bbb744a6faea95ae48317f0b838edb16a896), process ceph-osd, 
pid 21707
2011-10-20 17:33:34.353543 7f0ada6f4760 filestore(/data/osd2) mount 
FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.353628 7f0ada6f4760 filestore(/data/osd2) mount 
detected btrfs
2011-10-20 17:33:34.353656 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.425059 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.544564 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.544873 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.544966 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC is supported (transid 149)
2011-10-20 17:33:34.624965 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.636719 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.636754 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:34.644876 7f0ada6f4760 filestore(/data/osd2) mount 
found snaps 
2011-10-20 17:33:34.644983 7f0ada6f4760 filestore(/data/osd2) mount: 
enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not 
enabled

2011-10-20 17:33:34.678324 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.678737 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:34.688215 7f0ada6f4760 journal read_entry 39366656 : 
seq 4653 710 bytes
2011-10-20 17:33:34.688420 7f0ada6f4760 journal read_entry 39374848 : 
seq 4654 33 bytes

2011-10-20 17:33:34.695110 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.695496 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1

2011-10-20 17:33:34.696359 7f0ada6f4760 FileStore is up to date.
2011-10-20 17:33:34.696683 7f0ada6f4760 journal close /dev/sda7
2011-10-20 17:33:34.697970 7f0ada6f4760 filestore(/data/osd2) mount 
FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.698013 7f0ada6f4760 filestore(/data/osd2) mount 
detected btrfs
2011-10-20 17:33:34.698031 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.774980 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.904538 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.904945 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.904995 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs START_SYNC is supported (transid 152)
2011-10-20 17:33:34.991585 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.996636 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.996664 7f0ada6f4760 filestore(/data/osd2) mount 
btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:35.004813 7f0ada6f4760 filestore(/data/osd2) mount 
found snaps 
2011-10-20 17:33:35.004902 7f0ada6f4760 filestore(/data/osd2) mount: 
enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not 
enabled

2011-10-20 17:33:35.023071 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.023353 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:35.029846 7f0ada6f4760 journal read_entry 39366656 : 
seq 4653 710 bytes
2011-10-20 17:33:35.030077 7f0ada6f4760 journal read_entry 39374848 : 
seq 4654 33 bytes

2011-10-20 17:33:35.036728 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.037142 7f0ada6f4760 journal _open /dev/sda7 fd 14: 
476500201472 bytes, block size 4096 bytes, directio = 1

*** Caught signal (Aborted) **
 in thread 0x7f0ace7f9700
 ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
 1: /usr/bin/ceph-osd() [0x5bd012]
 2: (()+0xfc60) [0x7f0ada2d4c60]
 3: (gsignal()+0x35) [0x7f0ad8a5ad05]
 4: (abort()+0x186) [0x7f0ad8a5eab6]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f0ad93116dd]
 6

Re: kernel BUG at fs/btrfs/inode.c:1163

2011-10-19 Thread Martin Mailand

On 19.10.2011 11:49, David Sterba wrote:

On Tue, Oct 18, 2011 at 10:04:01PM +0200, Martin Mailand wrote:

[28997.273289] [ cut here ]
[28997.282916] kernel BUG at fs/btrfs/inode.c:1163!


1119 fi = btrfs_item_ptr(leaf, path->slots[0],
1120 struct btrfs_file_extent_item);
1121 extent_type = btrfs_file_extent_type(leaf, fi);
1122
1123 if (extent_type == BTRFS_FILE_EXTENT_REG ||
1124 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
...
1158 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
1159 extent_end = found_key.offset +
1160 btrfs_file_extent_inline_len(leaf, fi);
1161 extent_end = ALIGN(extent_end, root->sectorsize);
1162 } else {
1163 BUG_ON(1);
1164 }

rc10 kernel sources point to this; can you please verify it in your
sources? If it's really this one, that means it's an unhandled
extent_type read from the b-tree leaf and could be a corruption (the
value is obtained directly from the file extent type item, line 1121).


yep, that's the same in my source


It would be interesting to know the value of 'extent_type' at the time of
the crash; if it's e.g. -1, that could point to a real bug, some unhandled
corner case in truncate, for example.



How can I do that?
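One way to capture it would be a debugging printk right before the BUG_ON in the
quoted code above (just a sketch, not something applied in this thread -- it
assumes rebuilding btrfs.ko with the change):

	} else {
		/* debugging aid: report the unexpected value before dying */
		printk(KERN_ERR "btrfs: unexpected extent_type %d at offset %llu\n",
		       extent_type,
		       (unsigned long long)found_key.offset);
		BUG_ON(1);
	}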



[28997.507960] Call Trace:
[28997.507960]  [a00903e0] ? acls_after_inode_item+0xc0/0xc0 [btrfs]


... a corruption caused by overflow of xattrs/acls into inode item bytes?

As ceph stresses xattrs quite heavily, I wouldn't be surprised by that.


david


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel BUG at fs/btrfs/inode.c:1163

2011-10-18 Thread Martin Mailand

Hi
today I hit this bug; the kernel is v3.1-rc10 + Josef's tree from today, the workload 
is a ceph osd.


Best Regards,
 Martin

[28997.273289] [ cut here ]
[28997.282916] kernel BUG at fs/btrfs/inode.c:1163!
[28997.290863] invalid opcode:  [#1] SMP
[28997.290863] CPU 0
[28997.290863] Modules linked in: radeon ttm drm_kms_helper drm psmouse 
sp5100_tco i2c_piix4 i2c_algo_bit serio_raw edac_core k8temp 
edac_mce_amd shpchp lp parport pata_atiixp btrfs zlib_deflate e1000e 
libcrc32c ahci libahci

[28997.290863]
[28997.290863] Pid: 1220, comm: ceph-osd Tainted: GW 
3.1.0-rc10+ #2 MICRO-STAR INTERNATIONAL CO., LTD MS-96B3/MS-96B3
[28997.290863] RIP: 0010:[a0094f17]  [a0094f17] 
run_delalloc_nocow+0x7a7/0x7c0 [btrfs]

[28997.290863] RSP: 0018:880117357a78  EFLAGS: 00010206
[28997.290863] RAX: 002f RBX: 880116b12a20 RCX: 
880117357a38
[28997.290863] RDX: 8800 RSI: 0496 RDI: 
8801003851e0
[28997.290863] RBP: 880117357b78 R08: 0497 R09: 
880117357a28
[28997.290863] R10: 0030 R11:  R12: 
00011d3b
[28997.290863] R13: 00011d3b R14: 8801003851e0 R15: 
0030
[28997.290863] FS:  7ff45ae7b700() GS:88011fc0() 
knlGS:

[28997.507960] CS:  0010 DS:  ES:  CR0: 8005003b
[28997.507960] CR2: 7ff450a2 CR3: 000114b75000 CR4: 
06f0
[28997.507960] DR0:  DR1:  DR2: 

[28997.507960] DR3:  DR6: 0ff0 DR7: 
0400
[28997.507960] Process ceph-osd (pid: 1220, threadinfo 880117356000, 
task 88011526)

[28997.507960] Stack:
[28997.507960]  880117357aa8 81156e90 880104413af0 
880104413af0
[28997.507960]  880117550030 880117357bf0 880117550028 
880117550020
[28997.507960]  0040 00010040 880117357d14 
00ffa00a973e

[28997.507960] Call Trace:
[28997.507960]  [81156e90] ? kmem_cache_free+0x20/0x100
[28997.507960]  [a0095264] run_delalloc_range+0x334/0x380 [btrfs]
[28997.507960]  [a00abc85] __extent_writepage+0x5b5/0x6f0 [btrfs]
[28997.507960]  [812e526d] ? 
radix_tree_gang_lookup_tag_slot+0x8d/0xd0
[28997.507960]  [a00abfea] 
extent_write_cache_pages.clone.19.clone.26+0x22a/0x3a0 [btrfs]

[28997.507960]  [a00ac3a5] extent_writepages+0x45/0x60 [btrfs]
[28997.507960]  [a00903e0] ? acls_after_inode_item+0xc0/0xc0 
[btrfs]

[28997.507960]  [81182ade] ? vfsmount_lock_local_unlock+0x1e/0x30
[28997.507960]  [a008fa27] btrfs_writepages+0x27/0x30 [btrfs]
[28997.507960]  [81118161] do_writepages+0x21/0x40
[28997.507960]  [8110e2cb] __filemap_fdatawrite_range+0x5b/0x60
[28997.507960]  [8110f1d3] filemap_fdatawrite_range+0x13/0x20
[28997.507960]  [81192c99] sys_sync_file_range+0x149/0x180
[28997.835220]  [815f05c2] system_call_fastpath+0x16/0x1b
[28997.835220] Code: 8b 7d 80 e8 dc 9e 00 00 41 b9 04 00 00 00 e9 3d fe 
ff ff 4d 89 ef 41 bc 01 00 00 00 48 c7 45 a8 ff ff ff ff e9 5c fb ff ff 
0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 0f 1f 84 00
[28997.835220] RIP  [a0094f17] run_delalloc_nocow+0x7a7/0x7c0 
[btrfs]

[28997.835220]  RSP 880117357a78
[28997.927402] ---[ end trace a0a1c4a13d975229 ]---

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD blocked for more than 120 seconds

2011-10-17 Thread Martin Mailand

On 17.10.2011 11:40, Christian Brunner wrote:

2011/10/15 Martin Mailand mar...@tuxadero.com:

Hi Christian,
I have a very similar experience. I also used Josef's tree and btrfs snaps =
0; the next problem I had then was excessive fragmentation, so I used this
patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2, and changed the
btrfs options to (btrfs options = noatime,nodatacow,autodefrag), which kept the
fragmentation under control.
But even with this setup, after a few days the load on the osd is unbearable.


How did you find out about our fragmentation issues? Was it just a
performance problem?



I used filefrag to show the number of extents; after the patch I have 
on average 1.14 extents per 4MB ceph object on the osd.
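For reference, one way to compute such an average over an OSD data directory
(the path is only an example, taken from the mount logs above):

find /data/osd2 -type f -exec filefrag {} + 2>/dev/null | \
  awk -F': ' '{n++; s+=$2+0} END {if (n) print s/n, "extents per file"}'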



As far as I understood the documentation, if you disable the btrfs snapshot
functionality the writeahead journal is activated.
http://ceph.newdream.net/wiki/Ceph.conf
And I get this in the logs:
mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not
enabled

May I ask what kind of problems you had with ext4? Because I am looking
in this direction as well.


You can read about our ext4 problems here:

http://marc.info/?l=ceph-devel&m=131201869703245&w=2


I can still reproduce the bug with v3.1-rc9.



Our bug report with RedHat didn't make any progress for a long time,
but last week RedHat made two suggestions:

- If you configure ceph with 'filestore flusher = false', do you see
any different behavior?
- If you mount with -o noauto_da_alloc does it change anything?

Since I have just migrated to btrfs, it's hard for me to check this,
but I'll try to do so as soon as I can get hold of some extra
hardware.


I can check this, I have a spare cluster at the moment.


Regards,
Christian


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD blocked for more than 120 seconds

2011-10-17 Thread Martin Mailand

On 17.10.2011 14:05, Tomasz Paszkowski wrote:

Hi,

It seems that ext4 and btrfs are not to be considered stable for
now. Could anyone confirm that
ext3 is the best choice at the moment?


Hi,
I did a quick test with ext3, and it did not look very good.
After a few minutes one of the osds failed with this message.

[315274.737204] kjournald starting.  Commit interval 5 seconds
[315274.737919] EXT3-fs (sdb): using internal journal
[315274.737929] EXT3-fs (sdb): mounted filesystem with ordered data mode
[317040.890148] INFO: task ceph-osd:18032 blocked for more than 120 seconds.
[317040.905855] echo 0 > /proc/sys/kernel/hung_task_timeout_secs 
disables this message.
[317040.923801] ceph-osdD 880114c8b1a0 0 18032  1 
0x
[317040.923812]  88010f2e3cb8 0086 88010f2e3cb8 
88010f2e3cb8
[317040.923821]  88011ffdff08 88010f2e3fd8 88010f2e2000 
88010f2e3fd8
[317040.923830]  880116dadbc0 880114c8ade0 88010f2e3cd8 
8110d500

[317040.923847] Call Trace:
[317040.923865]  [8110d500] ? find_get_pages_tag+0x40/0x130
[317040.923876]  [815d93df] schedule+0x3f/0x60
[317040.923884]  [815d99ed] schedule_timeout+0x26d/0x2e0
[317040.923893]  [8101a725] ? native_sched_clock+0x15/0x70
[317040.923899]  [8101a789] ? sched_clock+0x9/0x10
[317040.923908]  [8108d465] ? sched_clock_local+0x25/0x90
[317040.923916]  [815d9219] wait_for_common+0xd9/0x180
[317040.923924]  [8105bbc0] ? try_to_wake_up+0x2b0/0x2b0
[317040.923932]  [815d939d] wait_for_completion+0x1d/0x20
[317040.923941]  [8118d652] sync_inodes_sb+0x92/0x1c0
[317040.923949]  [81192440] ? __sync_filesystem+0x90/0x90
[317040.923956]  [81192430] __sync_filesystem+0x80/0x90
[317040.923963]  [8119245f] sync_one_sb+0x1f/0x30
[317040.923972]  [81169268] iterate_supers+0xa8/0x100
[317040.923979]  [81192360] sync_filesystems+0x20/0x30
[317040.923985]  [81192501] sys_sync+0x21/0x40
[317040.923995]  [815e37c2] system_call_fastpath+0x16/0x1b

Best Regards,
 martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD blocked for more than 120 seconds

2011-10-17 Thread Martin Mailand

On 17.10.2011 11:40, Christian Brunner wrote:

Our bug report with RedHat didn't make any progress for a long time,
but last week RedHat made two suggestions:

- If you configure ceph with 'filestore flusher = false', do you see
any different behavior?
- If you mount with -o noauto_da_alloc does it change anything?


Hi,
after a quick test I think 'filestore flusher = false' did the trick.
What does it do?

Best Regards,
 martin

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD blocked for more than 120 seconds

2011-10-17 Thread Martin Mailand

Hi Sage,
the hang was on a btrfs, I do not have a fix for that.

The 'filestore flusher = false' does fix the ext4 problems, which where 
reported from Christian, but this option has quite an impact of the osd 
performance.

The '-o noauto_da_alloc' option did not solve the fsck problem.

Best Regards,
 Martin


Sage Weil wrote:

On Mon, 17 Oct 2011, Martin Mailand wrote:

On 17.10.2011 11:40, Christian Brunner wrote:

Our bug report with RedHat didn't make any progress for a long time,
but last week RedHat made two suggestions:

- If you configure ceph with 'filestore flusher = false', do you see
any different behavior?
- If you mount with -o noauto_da_alloc does it change anything?

Hi,
after a quick test I think 'filestore flusher = false' did the trick.
What does it do?


Does it fix your hang (previous email), or the subsequent fsck errors?

When filestore flusher = true (default), after every write the fd is 
handed off to another thread that uses sync_file_range() to push the data 
out to disk quickly before closing the file.  The purpose is to limit the 
latency for the eventual snapshot or sync.  Eric suspected the handoff 
between threads may be what was triggering the bug in ext4.
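For illustration, a minimal sketch of that kind of flusher handoff -- this is not
the actual FileStore code; the names and the queueing are made up:

/* flusher_sketch.c -- illustrative only */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* The write path hands the fd plus the just-written range to a flusher
 * thread; the flusher starts writeback early and then closes the fd. */
void flusher_flush_and_close(int fd, off_t offset, off_t len)
{
	/* kick off writeback for this range without waiting for it */
	sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WRITE);
	close(fd);
}

int main(void)
{
	int fd = open("/tmp/flusher-demo", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return 1;
	if (write(fd, "hello", 5) != 5) {
		close(fd);
		return 1;
	}
	flusher_flush_and_close(fd, 0, 5);
	return 0;
}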


sage

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD blocked for more than 120 seconds

2011-10-15 Thread Martin Mailand

Hi Christian,
I have a very similar experience. I also used Josef's tree and btrfs 
snaps = 0; the next problem I had then was excessive fragmentation, so I 
used this patch http://marc.info/?l=linux-btrfs&m=131495014823121&w=2, 
and changed the btrfs options to (btrfs options = 
noatime,nodatacow,autodefrag), which kept the fragmentation under control.

But even with this setup, after a few days the load on the osd is unbearable.

As far as I understood the documentation, if you disable the btrfs snapshot 
functionality the writeahead journal is activated.

http://ceph.newdream.net/wiki/Ceph.conf
And I get this in the logs.
mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is 
not enabled


May I ask what kind of problems you had with ext4? Because I am 
looking in this direction as well.


Best Regards,
 martin

Christian Brunner wrote:

I'm not seeing the same problem, but I've experienced something similar:

As you might know, I had serious performance problems with btrfs some
months ago; after that, I switched to ext4 and had other problems
there. Last Saturday I decided to give Josef's current btrfs git repo
a try in our ceph cluster.

Everything performed well at first, but after a day I noticed that
btrfs-cleaner was wasting more and more time in
btrfs_clean_old_snapshots. When we reached load 20 on the OSDs I
rebooted the nodes, everything was back to normal then. But again
after a few hours the load started to rise.

My solution to fix this for the moment was to turn off the btrfs
snapshot feature in ceph with:

filestore btrfs snaps = 0

Now I have good performance, low waitio values on the disks and I
haven't seen our btrfs warning until now as well.

I don't know what the implications are (does this enable writeahead
journaling in ceph?), but to me it's the only setup that does the job
at the moment.

Regards,
Christian



2011/10/14 Wido den Hollander w...@widodh.nl:

Hi,

On Thu, 2011-10-13 at 22:39 +0200, Martin Mailand wrote:

Hi,
on one of my OSDs the ceph-osd task hung for more than 120 sec. The OSD
had almost no load, therefore it cannot be an overload problem. I think
it is a btrfs problem, could someone clarify it?

This was in the dmesg.

[29280.890040] INFO: task btrfs-cleaner:1708 blocked for more than 120

Judging by the fact that I see btrfs-cleaner and btrfs-transaction
blocking, I guess this is a btrfs bug/hangup.

Which kernel are you using?

Wido


seconds.
[29280.905659] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[29280.922916] btrfs-cleaner   D 8801153bdf80 0  1708  2
0x
[29280.922931]  88011698bbd0 0046 88011698bb90
81090d7d
[29280.922960]  8801 88011698bfd8 88011698a000
88011698bfd8
[29280.922988]  81a0d020 8801153bdbc0 88011698bbd0
000181090d7d
[29280.923018] Call Trace:
[29280.923043]  [81090d7d] ? ktime_get_ts+0xad/0xe0
[29280.923062]  [8110cf10] ? __lock_page+0x70/0x70
[29280.923082]  [815d93df] schedule+0x3f/0x60
[29280.923098]  [815d948c] io_schedule+0x8c/0xd0
[29280.923114]  [8110cf1e] sleep_on_page+0xe/0x20
[29280.923130]  [815d9c6f] __wait_on_bit+0x5f/0x90
[29280.923147]  [8110d168] wait_on_page_bit+0x78/0x80
[29280.923165]  [81086bd0] ? autoremove_wake_function+0x40/0x40
[29280.923227]  [a0065ecb] btrfs_defrag_file+0x4fb/0xc10 [btrfs]
[29280.923246]  [8117f6ac] ? find_inode+0xac/0xb0
[29280.923281]  [a003a2d0] ?
btrfs_clean_old_snapshots+0x160/0x160 [btrfs]
[29280.923302]  [812e369b] ? radix_tree_lookup+0xb/0x10
[29280.923337]  [a0034f62] ?
btrfs_read_fs_root_no_name+0x1c2/0x2e0 [btrfs]
[29280.923375]  [a004897e] btrfs_run_defrag_inodes+0x15e/0x210
[btrfs]
[29280.923410]  [a003278f] cleaner_kthread+0x17f/0x1a0 [btrfs]
[29280.923443]  [a0032610] ? btrfs_congested_fn+0xb0/0xb0 [btrfs]
[29280.923460]  [81086436] kthread+0x96/0xa0
[29280.923477]  [815e5934] kernel_thread_helper+0x4/0x10
[29280.923493]  [810863a0] ? flush_kthread_worker+0xb0/0xb0
[29280.923510]  [815e5930] ? gs_change+0x13/0x13
[29280.923521] INFO: task btrfs-transacti:1709 blocked for more than 120
seconds.
[29280.939551] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[29280.956782] btrfs-transacti D 880115745f80 0  1709  2
0x
[29280.956792]  880115e6fd50 0046 880115e6fd20
880111a5a3e0
[29280.956800]  8801 880115e6ffd8 880115e6e000
880115e6ffd8
[29280.956809]  81a0d020 880115745bc0 0282
000116758450
[29280.956817] Call Trace:
[29280.956827]  [815d93df] schedule+0x3f/0x60
[29280.956855]  [a0037de5] wait_for_commit.clone.16+0x55/0x90
[btrfs]
[29280.956864]  [81086b90] ? wake_up_bit+0x40/0x40
[29280.956891]  [a0039726]
btrfs_commit_transaction+0x776/0x860 [btrfs

Btrfs High IO-Wait

2011-10-09 Thread Martin Mailand

Hi,
I have high IO-wait on the osds (ceph); the osds are running a v3.1-rc9 
kernel.

I also experience high IO rates, around 500 IO/s reported via iostat.

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.000.006.80 0.0062.40 
18.35 0.045.290.005.29   5.29   3.60
sdb   0.00   249.800.40  669.60 1.60  4118.40 
12.3087.47  130.56   15.00  130.63   1.01  67.40


In comparison, the same workload, but the osd uses ext4 as a backing fs.

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.000.00   10.00 0.00   128.00 
25.60 0.033.400.003.40   3.40   3.40
sdb   0.0027.800.00   48.20 0.00   318.40 
13.21 0.438.840.008.84   1.99   9.60


iodump shows similar results, where sdb is the data disk, sda7 the 
journal and sda5 the root.


btrfs

root@s-brick-003:~# echo 1 > /proc/sys/vm/block_dump
root@s-brick-003:~# while true; do sleep 1; dmesg -c; done | perl 
/usr/local/bin/iodump

^C# Caught SIGINT.
TASK   PID  TOTAL   READ  WRITE  DIRTY 
DEVICES

btrfs-submit-08321  28040  0  28040  0 sdb
ceph-osd  8514158  0158  0 sda7
kswapd0 46 81  0 81  0 sda1
bash 10709 35 35  0  0 sda1
flush-8:0  962 12  0 12  0 sda5
kworker/0:1   8897  6  0  6  0 sdb
kworker/1:1  10354  3  0  3  0 sdb
kjournald  266  3  0  3  0 sda5
ceph-osd  8523  2  2  0  0 sda1
ceph-osd  8531  1  1  0  0 sda1
dmesg10712  1  1  0  0 sda5


ext4

root@s-brick-002:~# echo 1 > /proc/sys/vm/block_dump
root@s-brick-002:~# while true; do sleep 1; dmesg -c; done | perl 
/usr/local/bin/iodump

^C# Caught SIGINT.
TASK   PID  TOTAL   READ  WRITE  DIRTY 
DEVICES

ceph-osd  3115847  0847  0 sdb
jbd2/sdb-82897784  0784  0 sdb
ceph-osd  3112728  0728  0 
sda5, sdb

ceph-osd  3110191  0191  0 sda7
perl  3628 13 13  0  0 sda5
flush-8:162901  8  0  8  0 sdb
kjournald  272  3  0  3  0 sda5
dmesg 3630  1  1  0  0 sda5
sleep 3629  1  1  0  0 sda5


I think that is the same problem as in 
http://marc.info/?l=ceph-devel&m=131158049117139&w=2


I also did a latencytop as Chris recommended in the above thread.

Best Regards,
 martin








latencytop.out_long_uptime.bz2
Description: application/bzip


latencytop.out_short_uptime.bz2
Description: application/bzip


Re: OSD::disk_tp timeout

2011-10-08 Thread Martin Mailand

Hi Christian,
if I remember correctly you are using ceph with a qemu-kvm setup?

After the last update of ceph, the load average on the osds doubled, and
the performance of the kvm machines became bad.

The really weird thing is that the cluster needs around 30 mins to get 
into this state. After I restart the osds everything is fine; then 
after a while the load of the osd nodes builds up again. Most of the load 
is produced by btrfs kernel processes in the deferred state.


Not sure if I have the same problem as you, as I do not get any timeouts.

Best Regards,
 martin

Christian Brunner wrote:

Hi,

I upgraded ceph from 0.32 to 0.36 yesterday. Now I have a totally
screwed ceph cluster. :(

What bugs me most is the fact, that OSDs become unresponsive
frequently. The process is eating a lot of cpu and I can see the
following messages in the log:

Oct  8 22:30:05 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:10 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:15 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:20 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:25 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:30 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60

Do you have any idea, what to do about that?

Regards,
Christian
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: OSD::disk_tp timeout

2011-10-08 Thread Martin Mailand

Hi,
I am using v3.1-rc9, so the fix is in there. Maybe I can nail it down a bit 
more specifically.


Best Regards,
 martin

Sage Weil wrote:

Hi Christian,

On Sat, 8 Oct 2011, Christian Brunner wrote:

Hi,

I upgraded ceph from 0.32 to 0.36 yesterday. Now I have a totally
screwed ceph cluster. :(

What bugs me most is the fact, that OSDs become unresponsive
frequently. The process is eating a lot of cpu and I can see the


What version of btrfs are you running?  This sounds a bit like the bug 
fixed by this patch:


http://www.spinics.net/lists/linux-btrfs/msg12627.html

(That was just merged into mainline this week.)


following messages in the log:

Oct  8 22:30:05 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:10 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:15 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:20 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:25 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60
Oct  8 22:30:30 os00 osd.000[31688]: 7fe0f3b9c700 heartbeat_map
is_healthy 'OSD::disk_tp thread 0x7fe0e527e700' had timed out after 60

Do you have any idea, what to do about that?


Those messages just mean that a thread in the disk threadpool (which is 
doing all the writes to btrfs) is blocked/stopped.


sage

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snap

2011-09-23 Thread Martin Mailand

Hi,
is it possible to access snapshots without rolling back the head of the 
rbd volume?
Because I want to take a snapshot of a vm running via librbd and qemu, and 
use the snapshot to make an offsite backup of the vm.


Best Regards,
 Martin


Martin Mailand wrote:
Okay, with the btrfs patch and the right command line, snapshotting works 
like a charm.


best regards,
 martin

Josh Durgin wrote:

On 09/16/2011 02:32 PM, Martin Mailand wrote:

root@c-brick-001:~# rbd rm --snap=2011091601 test
*** Caught signal (Segmentation fault) **
in thread 0x7f203d749740
ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
1: rbd() [0x457062]
2: (()+0xfc60) [0x7f203ccf6c60]
3: (librbd::snap_set(librbd::ImageCtx*, char const*)+0x10) 
[0x7f203d32ecd0]

4: (main()+0x59f) [0x4518ff]
5: (__libc_start_main()+0xff) [0x7f203b6cdeff]
6: rbd() [0x44d569]
Segmentation fault


I added a bug to the tracker for this 
(http://tracker.newdream.net/issues/1545). It shouldn't crash the way 
you ran it, but if you're trying to remove a snapshot you need to use 
the 'snap rm' command, i.e.:

$ rbd snap rm --snap=2011091601 test
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: rbd snap

2011-09-23 Thread Martin Mailand

Hi,
that's great, and is it safe to start different vms with different 
snapshots of the same image at the same time?


Best Regards,
 Martin

Sage Weil wrote:

On Fri, 23 Sep 2011, Martin Mailand wrote:

Hi,
is it possible to access snapshots without rolling back the head of the rbd
volume?
Because I want to take a snapshot of a vm running via librbd and qemu, and use
the snapshot to make an offsite backup of the vm.


$ rbd export foo --snap=mysnap /path/to/foo.dump

You can also map the snapshot via qemu with a string like 
rbd:rbd/foo@mysnap.


sage



Best Regards,
 Martin


Martin Mailand wrote:

Okay, with the btrfs patch and the right commandline snapshotting works like
a charm.

best regards,
 martin

Josh Durgin wrote:

On 09/16/2011 02:32 PM, Martin Mailand wrote:

root@c-brick-001:~# rbd rm --snap=2011091601 test
*** Caught signal (Segmentation fault) **
in thread 0x7f203d749740
ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
1: rbd() [0x457062]
2: (()+0xfc60) [0x7f203ccf6c60]
3: (librbd::snap_set(librbd::ImageCtx*, char const*)+0x10)
[0x7f203d32ecd0]
4: (main()+0x59f) [0x4518ff]
5: (__libc_start_main()+0xff) [0x7f203b6cdeff]
6: rbd() [0x44d569]
Segmentation fault

I added a bug to the tracker for this
(http://tracker.newdream.net/issues/1545). It shouldn't crash the way you
ran it, but if you're trying to remove a snapshot you need to use the
'snap rm' command, i.e.:
$ rbd snap rm --snap=2011091601 test
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



RBD Performance

2011-09-21 Thread Martin Mailand

hi,
I have a few questions about the rbd performance. I have a small ceph 
installation: three osd servers, one monitor server and one compute node 
which maps a rbd image to a block device; all servers are connected via a 
dedicated 1Gb/s network.

Each osd is capable of doing around 90MB/s tested with osd bench.
But if I test the write speed of the rbd block device the performance 
is quite poor.
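(For reference, that per-OSD figure can be obtained with the osd bench admin 
command of that era, e.g. something like 'ceph osd tell 0 bench', with the 
result showing up in 'ceph -w'; the exact syntax may differ between versions.)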


I do the test with
dd if=/dev/zero of=/dev/rbd0 bs=1M count=1 oflag=direct,
and I get a throughput of around 25MB/s.
I used wireshark to graph the network throughput, the image is
http://tuxadero.com/multistorage/ceph.jpg
As you can see, the throughput is not smooth.

The graph for the test without the oflag=direct is
http://tuxadero.com/multistorage/ceph2.jpg
which is much better, but the compute node uses around 4-5G of its 
RAM as a writeback cache, which is not acceptable for my application.


For comparison the graph for a scp transfer.
http://tuxadero.com/multistorage/scp.jpg

I read in the ceph documentation that every package has to be committed to the 
disk on the osd before it is acknowledged to the client; could you 
please explain what a package is? Probably not a TCP packet.


And on the mailing list there was a discussion about a writeback window; to my 
understanding it says how many bytes can be
unacknowledged in transit, is that right?

How could I activate it?

Thanks for your time.

Best Regards,
 martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RBD Performance

2011-09-21 Thread Martin Mailand

Hi Sage,
good to hear that you are working on this issue. I tried qemu-kvm with 
the rbd block device patch, which I think uses librbd, but I couldn't 
measure any performance improvements.
Which versions do I have to use, and do I have to activate the writeback 
window or is it on by default?


Best Regards,
 Martin


Sage Weil wrote:

On Wed, 21 Sep 2011, Martin Mailand wrote:

hi,
I have a few questions about the rbd performance. I have a small ceph
installation: three osd servers, one monitor server and one compute node which
maps a rbd image to a block device; all servers are connected via a dedicated
1Gb/s network.
Each osd is capable of doing around 90MB/s tested with osd bench.
But if I test the write speed of the rbd block device the performance is
quite poor.

I do the test with
dd if=/dev/zero of=/dev/rbd0 bs=1M count=1 oflag=direct,
I get a throughput around 25MB/s.
I used wireshark to graph the network throughput, the image is
http://tuxadero.com/multistorage/ceph.jpg
as you can see the throughput is not smooth.

The graph for the test without the oflag=direct is
http://tuxadero.com/multistorage/ceph2.jpg
which is much better, but the compute node uses around 4-5G of its RAM as a
writeback cache, which is not acceptable for my application.

For comparison the graph for a scp transfer.
http://tuxadero.com/multistorage/scp.jpg

I read in the ceph documentation that every package has to be committed to the disk on
the osd before it is acknowledged to the client; could you please explain
what a package is? Probably not a TCP packet.


You probably mean object.. each write has to be on disk before it is 
acknowledged.


And on the mailing list there was a discussion about a writeback window; to my 
understanding it says how many bytes can be unacknowledged in transit, is 
that right?


Right.


How could I activate it?


So far it's currently only implemented in librbd (the userland 
implementation).  The problem is that your dd is doing synchronous writes 
to the block device, which are synchronously written to the OSD.  That 
means a lot of time waiting around for the last write to complete before 
starting to send the next one.


Normal hard disks have a cache that absorbs this.  They acknowledge the 
write immediately, and only promise that the data will actually be durable 
when you issue a flush command later.


In librbd, we just added a write window that gives you similar 
performance.  We acknowledge writes immediately and do the write 
asynchronously, with a cap on the amount of outstanding bytes.  This 
doesn't coalesce small writes into big ones like a real cache, but usually 
the filesystem does most of that, so we should get similar performance.
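As an illustration only (this is not librbd's actual code), such a window can be
as small as a byte counter that the submit path blocks on:

/* writeback_window_sketch.c -- illustrative only */
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  room = PTHREAD_COND_INITIALIZER;
static size_t outstanding;                 /* bytes currently in flight */
static const size_t window = 8192;         /* cap on outstanding bytes  */

/* Submit path: the caller is acked as soon as this returns, but it
 * blocks first if the window of unfinished bytes is already full.
 * (A single write larger than the window would need extra handling.) */
void writeback_submit(size_t len)
{
	pthread_mutex_lock(&lock);
	while (outstanding + len > window)
		pthread_cond_wait(&room, &lock);
	outstanding += len;
	pthread_mutex_unlock(&lock);
	/* ...now queue the actual write to the OSD asynchronously... */
}

/* Called from the write completion callback. */
void writeback_complete(size_t len)
{
	pthread_mutex_lock(&lock);
	outstanding -= len;
	pthread_cond_broadcast(&room);
	pthread_mutex_unlock(&lock);
}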


Anyway, the kernel implementation doesn't do that yet.  It's on the todo 
list for the next 2 weeks...


sage

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RBD Performance

2011-09-21 Thread Martin Mailand

Hi Sage,
the performance improvement is quite impressive. Now I get around 90MB/s 
from within the vm.

Thanks.

Best Regards,
 martin

Sage Weil wrote:

On Wed, 21 Sep 2011, Martin Mailand wrote:

Hi Sage,
good to hear that you are working on this issue. I tried qemu-kvm with the rbd
block device patch, which I think uses librbd, but I couldn't measure any
performance improvements.

Which versions do I have to use, and do I have to activate the writeback
window or is it default on?


In the qemu rbd: line, include an option like :rbd_writeback_window=8192,
where the size of the window is specified in bytes.  (It's off by 
default.)
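For example, with the rbd-enabled qemu of that time the drive string might look
something like this (pool/image name and the rest of the command line are only
illustrative, and the exact -drive syntax depends on the qemu build):

qemu-system-x86_64 ... -drive format=rbd,file=rbd:rbd/lenny1:rbd_writeback_window=8192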


Also, keep in mind that unless you're using the latest qemu upstream (or our 
repo), the flushes aren't being passed down properly, and your data won't 
quite be safe.  (That's the main reason why we're leaving it off by 
default for the time being.)


sage


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph kernel bug

2011-09-16 Thread Martin Mailand

Hi Sage,
I reran the test and I think I triggered the first bug again.
http://pastebin.com/ydNm0pff

I did also the dumps for you.
http://tuxadero.com/multistorage/ceph.ko_dump
http://tuxadero.com/multistorage/libceph.ko_dump

Best Regards,
 martin
On 16.09.2011 00:54, Sage Weil wrote:

On Thu, 15 Sep 2011, Martin Mailand wrote:

Hi Sage,
that's quite a bit of output, I put it in a pastebin.
http://pastebin.com/9CNJk0Pw.

Any chance you can include the output of 'objdump -rdS libceph.ko'?
ceph.ko too, for good measure.

This looks like a slightly different crash than the one on that bug!

Thanks!
sage



Best Regards,
  martin

Sage Weil wrote:

On Thu, 15 Sep 2011, Martin Mailand wrote:

Hi Sage,
I am still hitting this in -rc6. It happens every time I stop an OSD.
Do you need more information to reproduce it?

Oh, great to hear it's easy to reproduce!  I was trying (in my uml
environment) and failing.

Can you run the script below right before stopping the osd, and send the dmesg
output along?  (Or attach to http://tracker.newdream.net/issues/1382)

Thanks!
sage


#!/bin/sh -x

p() {
  echo $* > /sys/kernel/debug/dynamic_debug/control
}

p 'module ceph +p'
p 'module libceph +p'
p 'module rbd +p'
p 'file net/ceph/messenger.c -p'
p 'file' `grep -- --- /sys/kernel/debug/dynamic_debug/control | grep ceph \
| awk '{print $1}' | sed 's/:/ line /'` '+p'
p 'file' `grep -- === /sys/kernel/debug/dynamic_debug/control | grep ceph \
| awk '{print $1}' | sed 's/:/ line /'` '+p'


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





Re: WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

2011-09-16 Thread Martin Mailand

Hi Josef,
after a quick test it seems that I do not hit this Warning any longer.
But I got a new one.

[ 5241.839951] [ cut here ]
[ 5241.839974] WARNING: at fs/btrfs/extent-tree.c:5715 
btrfs_alloc_free_block+0xac/0x330 [btrfs]()

[ 5241.839979] Hardware name: MS-96B3
[ 5241.839982] Modules linked in: radeon ttm drm_kms_helper drm 
i2c_algo_bit psmouse k8temp sp5100_tco edac_core edac_mce_amd serio_raw 
shpchp i2c_piix4 lp parport ahci pata_atiixp libahci btrfs e1000e 
zlib_deflate libcrc32c
[ 5241.840068] Pid: 1568, comm: kworker/0:1 Tainted: GW 
3.1.0-rc6 #1

[ 5241.840072] Call Trace:
[ 5241.840084]  [81063d0f] warn_slowpath_common+0x7f/0xc0
[ 5241.840101]  [81063d6a] warn_slowpath_null+0x1a/0x20
[ 5241.840133]  [a002a9cc] btrfs_alloc_free_block+0xac/0x330 
[btrfs]

[ 5241.840152]  [8110d35a] ? unlock_page+0x2a/0x40
[ 5241.840188]  [a0059268] ? read_extent_buffer+0xa8/0x180 [btrfs]
[ 5241.840222]  [a0031c00] ? verify_parent_transid+0x160/0x160 
[btrfs]

[ 5241.840252]  [a001a0d2] __btrfs_cow_block+0x122/0x4b0 [btrfs]
[ 5241.840283]  [a001a552] btrfs_cow_block+0xf2/0x1f0 [btrfs]
[ 5241.840314]  [a001cb88] push_leaf_left+0x108/0x180 [btrfs]
[ 5241.840344]  [a001fb78] btrfs_del_items+0x2b8/0x440 [btrfs]
[ 5241.840379]  [a00300c2] btrfs_del_csums+0x2d2/0x310 [btrfs]
[ 5241.840415]  [a00677a8] ? btrfs_tree_unlock+0x28/0xb0 [btrfs]
[ 5241.840447]  [a002597a] __btrfs_free_extent+0x48a/0x6f0 [btrfs]
[ 5241.840480]  [a0028c8d] run_clustered_refs+0x21d/0x840 [btrfs]
[ 5241.840514]  [a002937a] btrfs_run_delayed_refs+0xca/0x220 
[btrfs]
[ 5241.840551]  [a0053576] ? 
btrfs_run_ordered_operations+0x1d6/0x200 [btrfs]
[ 5241.840587]  [a0038fa3] btrfs_commit_transaction+0x83/0x870 
[btrfs]

[ 5241.840605]  [81012871] ? __switch_to+0x261/0x2f0
[ 5241.840622]  [81086d70] ? wake_up_bit+0x40/0x40
[ 5241.840656]  [a0039790] ? 
btrfs_commit_transaction+0x870/0x870 [btrfs]

[ 5241.840691]  [a00397af] do_async_commit+0x1f/0x30 [btrfs]
[ 5241.840708]  [8108110d] process_one_work+0x11d/0x430
[ 5241.840724]  [81081dd9] worker_thread+0x169/0x360
[ 5241.840741]  [81081c70] ? manage_workers.clone.21+0x240/0x240
[ 5241.840758]  [81086616] kthread+0x96/0xa0
[ 5241.840775]  [815f2434] kernel_thread_helper+0x4/0x10
[ 5241.840792]  [81086580] ? flush_kthread_worker+0xb0/0xb0
[ 5241.840808]  [815f2430] ? gs_change+0x13/0x13
[ 5241.840819] ---[ end trace c8a580615cad6cb5 ]---


Best Regards,
 Martin

On 15.09.2011 21:50, Josef Bacik wrote:

On Thu, Sep 15, 2011 at 11:44:09AM -0700, Sage Weil wrote:

On Tue, 13 Sep 2011, Liu Bo wrote:

On 09/11/2011 05:47 AM, Martin Mailand wrote:

Hi
I am hitting this warning reproducibly; the workload is a ceph osd,
kernel is 3.1.0-rc5.



Have posted a patch for this:

http://marc.info/?l=linux-btrfs&m=131547325515336&w=2


We're still seeing this with -rc6, which includes 98c9942 and 65450aa.

I haven't looked at the reservation code in much detail.  Is there
anything I can do to help track this down?



This should be taken care of with all my enospc changes.  You can pull them down
from my btrfs-work tree as soon as kernel.org comes back from the dead :).
Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rbd snap

2011-09-16 Thread Martin Mailand

Hi,
should rbd snap work? I created a snapshot and then I want to list it.
rbd snap create --snap=2011091601 lenny1
rbd snap ls lenny1

But the ls command does not come back and I get a kernel bug on 2 OSDs.
It is reproducible.


[ 7658.115729] [ cut here ]
[ 7658.115779] kernel BUG at fs/btrfs/delayed-inode.c:1693!
[ 7658.115812] invalid opcode:  [#1] SMP
[ 7658.115846] CPU 1
[ 7658.115861] Modules linked in: radeon ttm drm_kms_helper drm 
i2c_algo_bit psmouse k8temp sp5100_tco edac_core edac_mce_amd serio_raw 
shpchp i2c_piix4 lp parport ahci pata_atiixp libahci btrfs e1000e 
zlib_deflate libcrc32c

[ 7658.116080]
[ 7658.116095] Pid: 1418, comm: cosd Tainted: GW   3.1.0-rc6 #1 
MICRO-STAR INTERNATIONAL CO., LTD MS-96B3/MS-96B3
[ 7658.116167] RIP: 0010:[a007ffd0]  [a007ffd0] 
btrfs_delayed_update_inode+0x2a0/0x2b0 [btrfs]

[ 7658.116278] RSP: 0018:8801160efbc8  EFLAGS: 00010286
[ 7658.116311] RAX: ffe4 RBX: 8800777c0120 RCX: 
00018000
[ 7658.116351] RDX: f7e5 RSI: 00018000 RDI: 
880116b10160
[ 7658.116389] RBP: 8801160efc08 R08: e8c81a40 R09: 
8800886826a0
[ 7658.116428] R10:  R11:  R12: 
880001e4af50
[ 7658.116467] R13: 8800777c0168 R14: 880117122ea0 R15: 
880115113000
[ 7658.116507] FS:  7f80e30eb700() GS:88011fc8() 
knlGS:

[ 7658.116550] CS:  0010 DS:  ES:  CR0: 80050033
[ 7658.116583] CR2: ff600400 CR3: 000116015000 CR4: 
06e0
[ 7658.116623] DR0:  DR1:  DR2: 

[ 7658.116662] DR3:  DR6: 0ff0 DR7: 
0400
[ 7658.116700] Process cosd (pid: 1418, threadinfo 8801160ee000, 
task 880116aeade0)

[ 7658.116744] Stack:
[ 7658.116759]  0282 00018000 8801160efc18 
880001e4af50
[ 7658.116817]  880117122ea0 8800391b01e0 8801171113f0 

[ 7658.116875]  8801160efc58 a003f353 8801160efc38 
a00677f8

[ 7658.116933] Call Trace:
[ 7658.116978]  [a003f353] btrfs_update_inode+0x53/0x160 [btrfs]
[ 7658.117039]  [a00677f8] ? btrfs_tree_unlock+0x78/0xb0 [btrfs]
[ 7658.117099]  [a0063184] btrfs_ioctl_clone+0x9b4/0xd20 [btrfs]
[ 7658.117164]  [a00666f6] btrfs_ioctl+0x306/0xe20 [btrfs]
[ 7658.117204]  [81175f32] ? do_filp_open+0x42/0xa0
[ 7658.117240]  [81178048] do_vfs_ioctl+0x98/0x540
[ 7658.117277]  [81156f40] ? kmem_cache_free+0x20/0x100
[ 7658.117313]  [81178581] sys_ioctl+0x91/0xa0
[ 7658.117347]  [815f02c2] system_call_fastpath+0x16/0x1b
[ 7658.117381] Code: 00 03 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 f8 7d 
fa ff 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 cc fd ff ff 0f 1f 80 00 00 
00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
[ 7658.117814] RIP  [a007ffd0] 
btrfs_delayed_update_inode+0x2a0/0x2b0 [btrfs]

[ 7658.117883]  RSP 8801160efbc8
[ 7658.122364] ---[ end trace c8a580615cad6cbe ]---

Best Regards,
 Martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

2011-09-16 Thread Martin Mailand

Hi Josef,
the commit is not in there, but the code looks like your post.

if (--trans->use_count) {
	trans->block_rsv = trans->orig_rsv;
	return 0;
}

trans->block_rsv = NULL;
while (count < 4) {
	unsigned long cur = trans->delayed_ref_updates;
	trans->delayed_ref_updates = 0;

But on the other hand I am quite new to git, how could I get your latest 
commit?
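For what it's worth, a generic way to pull the tree and check for that commit
once it is reachable again (the remote URL below is just a placeholder):

git remote add josef <url-of-josefs-btrfs-work-tree>
git fetch josef
git log --oneline --all | grep 57f499e1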


Best Regards,
 Martin

On 16.09.2011 16:37, Josef Bacik wrote:

On 09/16/2011 10:09 AM, Martin Mailand wrote:

Hi Josef,
after a quick test it seems that I do not hit this Warning any longer.
But I got a new one.



Hmm looks like that may not be my newest stuff, is commit

57f499e1bb76ba3ebeb09cd12e9dac84baa5812b

in there?  Specifically look at __btrfs_end_transaction in transaction.c
and see if the line

trans->block_rsv = NULL;

is before the first while() loop.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ceph kernel bug

2011-09-16 Thread Martin Mailand

Hi Sage,
yes it fixes things for me as well.

Best Regards,
 martin

Sage Weil wrote:

Hi Martin,

Thanks, this was enough to help me reproduce it, and I believe I have a 
correct fix (it's working for me).  Can you try commit 935b639 'libceph: 
fix linger request requeuing' (for-linus branch of 
git://github.com/NewDreamNetwork/ceph-client.git) and confirm that it 
fixes things for you as well?


Thanks!
sage



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd snap

2011-09-16 Thread Martin Mailand

Hi Sage,
yes, that fixes the btrfs problem. But now I have a new bug.

root@c-brick-001:~# rbd rm --snap=2011091601 test
*** Caught signal (Segmentation fault) **
 in thread 0x7f203d749740
 ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
 1: rbd() [0x457062]
 2: (()+0xfc60) [0x7f203ccf6c60]
 3: (librbd::snap_set(librbd::ImageCtx*, char const*)+0x10) 
[0x7f203d32ecd0]

 4: (main()+0x59f) [0x4518ff]
 5: (__libc_start_main()+0xff) [0x7f203b6cdeff]
 6: rbd() [0x44d569]
Segmentation fault

I use the ceph Ubuntu build from your site.

Best Regards,
 martin



Sage Weil wrote:
There is a patch for a btrfs bug in the clone ioctl reservation that 
hasn't made it upstream yet.  See


http://marc.info/?l=linux-btrfs&m=131291225105499&w=2

That should sort you out!

sage


On Fri, 16 Sep 2011, Martin Mailand wrote:


Hi,
should rbd snap work? I created a snapshot and then I want to list it.
rbd snap create --snap=2011091601 lenny1
rbd snap ls lenny1

But the ls command does not come back and I get a kernel bug on 2 OSDs.
It is reproducible.


[ 7658.115729] [ cut here ]
[ 7658.115779] kernel BUG at fs/btrfs/delayed-inode.c:1693!
[ 7658.115812] invalid opcode:  [#1] SMP
[ 7658.115846] CPU 1
[ 7658.115861] Modules linked in: radeon ttm drm_kms_helper drm i2c_algo_bit
psmouse k8temp sp5100_tco edac_core edac_mce_amd serio_raw shpchp i2c_piix4 lp
parport ahci pata_atiixp libahci btrfs e1000e zlib_deflate libcrc32c
[ 7658.116080]
[ 7658.116095] Pid: 1418, comm: cosd Tainted: GW   3.1.0-rc6 #1
MICRO-STAR INTERNATIONAL CO., LTD MS-96B3/MS-96B3
[ 7658.116167] RIP: 0010:[a007ffd0]  [a007ffd0]
btrfs_delayed_update_inode+0x2a0/0x2b0 [btrfs]
[ 7658.116278] RSP: 0018:8801160efbc8  EFLAGS: 00010286
[ 7658.116311] RAX: ffe4 RBX: 8800777c0120 RCX:
00018000
[ 7658.116351] RDX: f7e5 RSI: 00018000 RDI:
880116b10160
[ 7658.116389] RBP: 8801160efc08 R08: e8c81a40 R09:
8800886826a0
[ 7658.116428] R10:  R11:  R12:
880001e4af50
[ 7658.116467] R13: 8800777c0168 R14: 880117122ea0 R15:
880115113000
[ 7658.116507] FS:  7f80e30eb700() GS:88011fc8()
knlGS:
[ 7658.116550] CS:  0010 DS:  ES:  CR0: 80050033
[ 7658.116583] CR2: ff600400 CR3: 000116015000 CR4:
06e0
[ 7658.116623] DR0:  DR1:  DR2:

[ 7658.116662] DR3:  DR6: 0ff0 DR7:
0400
[ 7658.116700] Process cosd (pid: 1418, threadinfo 8801160ee000, task
880116aeade0)
[ 7658.116744] Stack:
[ 7658.116759]  0282 00018000 8801160efc18
880001e4af50
[ 7658.116817]  880117122ea0 8800391b01e0 8801171113f0

[ 7658.116875]  8801160efc58 a003f353 8801160efc38
a00677f8
[ 7658.116933] Call Trace:
[ 7658.116978]  [a003f353] btrfs_update_inode+0x53/0x160 [btrfs]
[ 7658.117039]  [a00677f8] ? btrfs_tree_unlock+0x78/0xb0 [btrfs]
[ 7658.117099]  [a0063184] btrfs_ioctl_clone+0x9b4/0xd20 [btrfs]
[ 7658.117164]  [a00666f6] btrfs_ioctl+0x306/0xe20 [btrfs]
[ 7658.117204]  [81175f32] ? do_filp_open+0x42/0xa0
[ 7658.117240]  [81178048] do_vfs_ioctl+0x98/0x540
[ 7658.117277]  [81156f40] ? kmem_cache_free+0x20/0x100
[ 7658.117313]  [81178581] sys_ioctl+0x91/0xa0
[ 7658.117347]  [815f02c2] system_call_fastpath+0x16/0x1b
[ 7658.117381] Code: 00 03 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 f8 7d fa ff
85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 cc fd ff ff 0f 1f 80 00 00 00 00 0f
0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
[ 7658.117814] RIP  [a007ffd0]
btrfs_delayed_update_inode+0x2a0/0x2b0 [btrfs]
[ 7658.117883]  RSP 8801160efbc8
[ 7658.122364] ---[ end trace c8a580615cad6cbe ]---

Best Regards,
 Martin
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





Re: rbd snap

2011-09-16 Thread Martin Mailand

Hi Josh,
right, that's my mistake; I will try it with the right command line tomorrow.

Best Regards,
 martin

Josh Durgin wrote:

On 09/16/2011 02:32 PM, Martin Mailand wrote:

root@c-brick-001:~# rbd rm --snap=2011091601 test
*** Caught signal (Segmentation fault) **
in thread 0x7f203d749740
ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
1: rbd() [0x457062]
2: (()+0xfc60) [0x7f203ccf6c60]
3: (librbd::snap_set(librbd::ImageCtx*, char const*)+0x10) 
[0x7f203d32ecd0]

4: (main()+0x59f) [0x4518ff]
5: (__libc_start_main()+0xff) [0x7f203b6cdeff]
6: rbd() [0x44d569]
Segmentation fault


I added a bug to the tracker for this 
(http://tracker.newdream.net/issues/1545). It shouldn't crash the way 
you ran it, but if you're trying to remove a snapshot you need to use 
the 'snap rm' command, i.e.:

$ rbd snap rm --snap=2011091601 test
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: ceph kernel bug

2011-09-15 Thread Martin Mailand

Hi Sage,
I am still hitting this in -rc6. It happens every time I stop an OSD.
Do you need more information to reproduce it?

Best Regards,
 martin

[103159.164630] libceph: osd0 192.168.42.113:6800 socket closed
[103169.153484] [ cut here ]
[103169.162935] kernel BUG at net/ceph/messenger.c:2193!
[103169.163332] invalid opcode:  [#1] SMP
[103169.163332] CPU 0
[103169.163332] Modules linked in: btrfs zlib_deflate rbd libceph 
libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
kvm_amd kvm bridge nv_tco stp radeon ttm drm_kms_helper drm lp parport 
i2c_algo_bit amd64_edac_mod i2c_nforce2 edac_core edac_mce_amd k10temp 
shpchp psmouse serio_raw ses enclosure aacraid forcedeth

[103169.163332]
[103169.163332] Pid: 4405, comm: kworker/0:1 Not tainted 3.1.0-rc6 #1 
Supermicro H8DM8-2/H8DM8-2
[103169.163332] RIP: 0010:[a02b73f1]  [a02b73f1] 
ceph_con_send+0x111/0x120 [libceph]

[103169.163332] RSP: 0018:88031c5b3bd0  EFLAGS: 00010202
[103169.163332] RAX: 88040502c678 RBX: 88040452b030 RCX: 
88031c8a9e50
[103169.163332] RDX: 88031c5b3fd8 RSI: 88040502c600 RDI: 
88040452b1a8
[103169.163332] RBP: 88031c5b3bf0 R08: 88040fc0de40 R09: 
0002
[103169.163332] R10: 0002 R11: 0072 R12: 
88040452b1a8
[103169.163332] R13: 88040502c600 R14: 88031c8a9e60 R15: 
88031c8a9e50
[103169.163332] FS:  7f6d43dd2700() GS:88040fc0() 
knlGS:

[103169.163332] CS:  0010 DS:  ES:  CR0: 8005003b
[103169.163332] CR2: ff600400 CR3: 000403fb1000 CR4: 
06f0
[103169.163332] DR0:  DR1:  DR2: 

[103169.163332] DR3:  DR6: 0ff0 DR7: 
0400
[103169.163332] Process kworker/0:1 (pid: 4405, threadinfo 
88031c5b2000, task 880405cd5bc0)

[103169.163332] Stack:
[103169.163332]  88031c5b3bf0 880404632a00 88031c8a9e30 
88031c8a9da8
[103169.163332]  88031c5b3c40 a02bc8ad 88031c8a9c80 
88031c8a9e00
[103169.163332]  88031c5b3c40 8804045b7151 88031c8a9da8 


[103169.163332] Call Trace:
[103169.163332]  [a02bc8ad] send_queued+0xed/0x130 [libceph]
[103169.163332]  [a02bed81] ceph_osdc_handle_map+0x261/0x3b0 
[libceph]

[103169.163332]  [a02bb31f] dispatch+0x10f/0x580 [libceph]
[103169.163332]  [a02b954f] con_work+0x214f/0x21d0 [libceph]
[103169.163332]  [a02b7400] ? ceph_con_send+0x120/0x120 [libceph]
[103169.163332]  [8108110d] process_one_work+0x11d/0x430
[103169.163332]  [81081c69] worker_thread+0x169/0x360
[103169.163332]  [81081b00] ? manage_workers.clone.21+0x240/0x240
[103169.163332]  [81086496] kthread+0x96/0xa0
[103169.163332]  [815e5c34] kernel_thread_helper+0x4/0x10
[103169.163332]  [81086400] ? flush_kthread_worker+0xb0/0xb0
[103169.163332]  [815e5c30] ? gs_change+0x13/0x13
[103169.163332] Code: 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8d be 88 00 00 00 
48 c7 c6 70 98 2b a0 e8 1d ad 02 e1 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 
c9 c3 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[103169.163332] RIP  [a02b73f1] ceph_con_send+0x111/0x120 
[libceph]

[103169.163332]  RSP 88031c5b3bd0
[103169.805672] ---[ end trace 49d197af1dff5a93 ]---
[103169.818910] BUG: unable to handle kernel paging request at 
fff8

[103169.828781] IP: [810868f0] kthread_data+0x10/0x20
[103169.828781] PGD 1a07067 PUD 1a08067 PMD 0
[103169.828781] Oops:  [#2] SMP
[103169.828781] CPU 0
[103169.828781] Modules linked in: btrfs zlib_deflate rbd libceph 
libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
kvm_amd kvm bridge nv_tco stp radeon ttm drm_kms_helper drm lp parport 
i2c_algo_bit amd64_edac_mod i2c_nforce2 edac_core edac_mce_amd k10temp 
shpchp psmouse serio_raw ses enclosure aacraid forcedeth

[103169.828781]
[103169.828781] Pid: 4405, comm: kworker/0:1 Tainted: G  D 
3.1.0-rc6 #1 Supermicro H8DM8-2/H8DM8-2
[103169.828781] RIP: 0010:[810868f0]  [810868f0] 
kthread_data+0x10/0x20

[103169.828781] RSP: 0018:88031c5b3878  EFLAGS: 00010096
[103169.828781] RAX:  RBX:  RCX: 

[103169.828781] RDX: 880405cd5bc0 RSI:  RDI: 
880405cd5bc0
[103169.828781] RBP: 88031c5b3878 R08: 00989680 R09: 

[103169.828781] R10: 0400 R11: 0005 R12: 
880405cd5f88
[103169.828781] R13:  R14:  R15: 
880405cd5e90
[103169.828781] FS:  7f6d43dd2700() GS:88040fc0() 
knlGS:

[103169.828781] CS:  0010 DS:  ES:  CR0: 8005003b
[103169.828781] CR2: fff8 CR3: 000403fb1000 CR4: 
06f0
[103169.828781] DR0: 
