Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-12-03 Thread Sylvain Munaut
Hi,

 What sort of memory are your instances using?

I just had a look. Around 120 MB, which indeed is a bit higher than I'd like.


 I haven't turned on any caching so I assume it's disabled.

Yes.


Cheers,

Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-30 Thread James Harper
 
 Hi James,
 
  Are you still working on this in any way?
 
 Well I'm using it, but I haven't worked on it. I never was able to
 reproduce any issue with it locally ...
 In prod, I do run it with cache disabled though, since I never took the
 time to check that using the cache was safe in the various failure modes.
 
 Is 300 MB normal ? Well, that probably depends on your settings (cache
 enabled / size / ...). But in any case I'd guess the memory comes from
 librbd itself. It's not like I do much allocation myself :p
 

What sort of memory are your instances using? I haven't turned on any caching 
so I assume it's disabled.

I increased the stack size to 8M to work around the crash I was having, but 
lowering that to 2MB doesn't have any significant impact on memory usage.

James


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread James Harper
Sylvain,

Are you still working on this in any way?

It's been working great for me but seems to use an excessive amount of memory, 
like 300MB per process. Is that expected?

Thanks

James

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Sylvain Munaut
 Sent: Saturday, 20 April 2013 12:41 AM
 To: Pasi Kärkkäinen
 Cc: ceph-devel@vger.kernel.org; xen-de...@lists.xen.org
 Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
 test ? :p
 
  If you have time to write up some lines about steps required to test this,
  that'd be nice, it'll help people to test this stuff.
 
 To quickly test, I compiled the package and just replaced the tapdisk
 binary from my normal blktap install with the newly compiled one.
 
 Then you need to set up an RBD image named 'test' in the default 'rbd'
 pool. You also need to set up a proper ceph.conf and keyring file on
 the client (since librbd will use those for the parameters). The
 keyring must contain the 'client.admin' key.
 
 Then in the config file, use something like
 tap2:tapdisk:rbd:xxx,xvda1,w  the 'xxx' part is currently ignored
 ...
 
 
 Cheers,
 
 Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread Sylvain Munaut
Hi James,

 Are you still working on this in any way?

Well I'm using it, but I haven't worked on it. I never was able to
reproduce any issue with it locally ...
In prod, I do run it with cache disabled though, since I never took the
time to check that using the cache was safe in the various failure modes.

Is 300 MB normal ? Well, that probably depends on your settings (cache
enabled / size / ...). But in any case I'd guess the memory comes from
librbd itself. It's not like I do much allocation myself :p

Cheers,

   Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-16 Thread Frederik Thuysbaert

Hi Sylvain,


I'm not quite sure what you mean, can you give some more information on how I do
this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
is what you meant.


Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

...

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash happened.



I did 2 runs, with a cold reboot in between just to be sure. I don't 
think I'm getting a lot of valuable information, but I will post it 
anyway. The reason for the cold reboot was a 'Cannot access memory at 
address ...' in gdb after the first frame, I thought it could help.


Here's what I got:

try 1:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

(gdb) bt
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

Cannot access memory at address 0x7fb42f081c38
(gdb) frame 0
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

(gdb) list
77  }
78  
79  int
80  main(int argc, char *argv[])
81  {
82  char *control;
83  int c, err, nodaemon;
84  FILE *out;
85  
86  control  = NULL;
(gdb) info locals
No symbol table info available.

try 2:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
Cannot access memory at address 0x7fe05c2ba518
(gdb) frame 0
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) list
77  }
78  
79  int
80  main(int argc, char *argv[])
81  {
82  char *control;
83  int c, err, nodaemon;
84  FILE *out;
85  
86  control  = NULL;
(gdb) info locals
No symbol table info available.

Regards,

- Frederik



RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper
 
  Hi,
 
  I just tested with tap2:aio and that worked (had an old image of the VM on
  lvm still so just tested with that). Switching back to rbd and it crashes
  every time, just as postgres is starting in the vm. Booting into single user
  mode, waiting 30 seconds, then letting the boot continue it still crashes at
  the same point so I think it's not a timing thing - maybe postgres has a disk
  access pattern that is triggering the bug?
 
  Mmm, that's really interesting.
 
  Could you try to disable request merging ? Just give option
  max_merge_size=0 in the tap2 disk description. Something like
  'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
 
 
 Just as suddenly the problem went away and I can no longer reproduce the
 crash on startup. Very frustrating. Most likely it still crashed during heavy
 use but that can take days.
 
 I've just upgraded librbd to dumpling (from cuttlefish) on that one server and
 will see what it's doing by morning. I'll disable merging when I can reproduce
 it next.
 

I just had a crash since upgrading to dumpling, and will disable merging 
tonight.

James


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper

 
 I just had a crash since upgrading to dumpling, and will disable merging
 tonight.
 

Still crashes with merging disabled.

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Frederik Thuysbaert

On 13-08-13 17:39, Sylvain Munaut wrote:


It's actually strange that it changes anything at all.

Can you try adding an ERROR("HERE\n"); in that error path processing
and check syslog to see if it's triggered at all ?

A traceback would be great if you can get a core file. And possibly
compile tapdisk with debug symbols.

When halting the domU after the errors, I get the following in dom0 syslog:

Aug 14 10:43:57 xen-001 kernel: [ 5041.338756] INFO: task tapdisk:9690 
blocked for more than 120 seconds.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338817] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338903] tapdisk D 
8800bf213780 0  9690  1 0x
Aug 14 10:43:57 xen-001 kernel: [ 5041.338908]  8800b4b0e730 
0246 8800 8160d020
Aug 14 10:43:57 xen-001 kernel: [ 5041.338912]  00013780 
8800b4ebffd8 8800b4ebffd8 8800b4b0e730
Aug 14 10:43:57 xen-001 kernel: [ 5041.338916]  8800b4d36190 
000181199c37 8800b5798c00 8800b5798c00

Aug 14 10:43:57 xen-001 kernel: [ 5041.338921] Call Trace:
Aug 14 10:43:57 xen-001 kernel: [ 5041.338929] [a0308411] ? 
blktap_device_destroy_sync+0x85/0x9b [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338936] [8105fadf] ? 
add_wait_queue+0x3c/0x3c
Aug 14 10:43:57 xen-001 kernel: [ 5041.338940] [a0307444] ? 
blktap_ring_release+0x10/0x2d [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338945] [810fb141] ? 
fput+0xf9/0x1a1
Aug 14 10:43:57 xen-001 kernel: [ 5041.338949] [810f8e6c] ? 
filp_close+0x62/0x6a
Aug 14 10:43:57 xen-001 kernel: [ 5041.338954] [81049831] ? 
put_files_struct+0x60/0xad
Aug 14 10:43:57 xen-001 kernel: [ 5041.338958] [81049e38] ? 
do_exit+0x292/0x713
Aug 14 10:43:57 xen-001 kernel: [ 5041.338961] [8104a539] ? 
do_group_exit+0x74/0x9e
Aug 14 10:43:57 xen-001 kernel: [ 5041.338965] [81055f94] ? 
get_signal_to_deliver+0x46d/0x48f
Aug 14 10:43:57 xen-001 kernel: [ 5041.338970] [81347759] ? 
force_sig_info_fault+0x5b/0x63
Aug 14 10:43:57 xen-001 kernel: [ 5041.338975] [8100de27] ? 
do_signal+0x38/0x610
Aug 14 10:43:57 xen-001 kernel: [ 5041.338979] [81070deb] ? 
arch_local_irq_restore+0x7/0x8
Aug 14 10:43:57 xen-001 kernel: [ 5041.338983] [8134eb77] ? 
_raw_spin_unlock_irqrestore+0xe/0xf
Aug 14 10:43:57 xen-001 kernel: [ 5041.338987] [8103f944] ? 
wake_up_new_task+0xb9/0xc2
Aug 14 10:43:57 xen-001 kernel: [ 5041.338992] [8106f987] ? 
sys_futex+0x120/0x151
Aug 14 10:43:57 xen-001 kernel: [ 5041.338995] [8100e435] ? 
do_notify_resume+0x25/0x68
Aug 14 10:43:57 xen-001 kernel: [ 5041.338999] [8134ef3c] ? 
retint_signal+0x48/0x8c

...
Aug 14 10:44:17 xen-001 tap-ctl: tap-err:tap_ctl_connect: couldn't 
connect to /var/run/blktap-control/ctl9478: 111





Cheers,

 Sylvain

Regards

- Frederik


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi,

 I just tested with tap2:aio and that worked (had an old image of the VM on 
 lvm still so just tested with that). Switching back to rbd and it crashes 
 every time, just as postgres is starting in the vm. Booting into single user 
 mode, waiting 30 seconds, then letting the boot continue it still crashes at 
 the same point so I think it's not a timing thing - maybe postgres has a disk 
 access pattern that is triggering the bug?

Mmm, that's really interesting.

Could you try to disable request merging ? Just give option
max_merge_size=0 in the tap2 disk description. Something like
'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'

Cheers,

 Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread James Harper
 
 Hi,
 
 I just tested with tap2:aio and that worked (had an old image of the VM on
 lvm still so just tested with that). Switching back to rbd and it crashes
 every time, just as postgres is starting in the vm. Booting into single user
 mode, waiting 30 seconds, then letting the boot continue it still crashes at
 the same point so I think it's not a timing thing - maybe postgres has a disk
 access pattern that is triggering the bug?
 
 Mmm, that's really interesting.
 
 Could you try to disable request merging ? Just give option
 max_merge_size=0 in the tap2 disk description. Something like
 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
 

Just as suddenly the problem went away and I can no longer reproduce the crash 
on startup. Very frustrating. Most likely it still crashed during heavy use but 
that can take days.

I've just upgraded librbd to dumpling (from cuttlefish) on that one server and 
will see what it's doing by morning. I'll disable merging when I can reproduce 
it next.

Thanks

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi Frederik,

 A traceback would be great if you can get a core file. And possibly
 compile tapdisk with debug symbols.

 I'm not quite sure what you mean, can you give some more information on how I do
 this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
 is what you meant.

Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

Then when it crashes, it will leave a 'core' file somewhere (not sure
where, maybe in / or in /tmp).
If it doesn't you may have to enable it. When the process is running,
use this on the tapdisk PID :

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash
happened:

See for ex http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
for how to do it.
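
For example, the whole sequence could look roughly like this (PID, binary
path and core location are illustrative; prlimit needs a reasonably recent
util-linux, otherwise use the gdb trick from the link above):

./configure CFLAGS=-g LDFLAGS=-g && make       # rebuild tapdisk with debug symbols

prlimit --pid 9690 --core=unlimited            # allow core dumps for the running tapdisk

# after the crash, point gdb at the binary and wherever the core landed
gdb /usr/bin/tapdisk /path/to/core
(gdb) bt full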


 When halting the domU after the errors, I get the following in dom0 syslog:

It's not really unexpected. If tapdisk crashes the IO ring is going to
be left hanging and god knows what weird behaviour will happen ...


Cheers,

Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
 FWIW, I can confirm via printf's that this error path is never hit in at 
 least some of the crashes I'm seeing.

Ok thanks.

Are you using cache btw ?

Cheers,

Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
  FWIW, I can confirm via printf's that this error path is never hit in at 
  least
 some of the crashes I'm seeing.
 
 Ok thanks.
 
 Are you using cache btw ?
 

I hope not. How could I tell? It's not something I've explicitly enabled.

Thanks

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi,

 I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf  or directly in
the device path in the xen config. (option is 'rbd cache',
http://ceph.com/docs/next/rbd/rbd-config-ref/ )
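
Concretely, if it had been enabled it would look like one of these
(illustrative only; the option can be spelled with a space or an underscore,
and the device-path form assumes the same colon-separated option syntax used
elsewhere in this thread for max_merge_size):

# /etc/ceph/ceph.conf on the dom0
[client]
rbd cache = true

# or directly on the disk line in the domU config
disk = [ 'tap2:tapdisk:rbd:rbd/test:rbd_cache=true,xvda2,w' ]

If neither is present, the cache stays off.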

Cheers,

Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Frederik Thuysbaert


Hi,

I have been testing this a while now, and just finished testing your 
untested patch. The rbd caching problem still persists.


The system I am testing on has the following characteristics:

Dom0:
- Linux xen-001 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64
- Most recent git checkout of blktap rbd branch

DomU:
- Same kernel as dom0
- Root (xvda1) is a logical volume on dom0
- xvda2 is a Rados Block Device format 1

Let me start by saying that the errors only occur with RBD client 
caching ON.
I will give the error messages of both dom0 and domU before and after I 
applied the patch.


Actions in domU to trigger errors:

~# mkfs.xfs -f /dev/xvda2
~# mount /dev/xvda2 /mnt
~# bonnie -u 0 -g 0 /mnt


Error messages:

BEFORE patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0: no errors

domU:
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960475] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960488] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960501] lost page write due 
to I/O error on xvda2

...
Aug 13 18:18:52 debian-vm-101 kernel: [   56.394645] XFS (xvda2): 
xfs_do_force_shutdown(0x2) called from line 1007 of file 
/build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_log.c.  Return address = 
0xa013ced5
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941539] XFS (xvda2): 
xfs_log_force: error 5 returned.
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941565] XFS (xvda2): 
xfs_log_force: error 5 returned.

...

AFTER patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0:
Aug 13 16:40:49 xen-001 kernel: [   94.954734] tapdisk[3075]: segfault 
at 7f749ee86da0 ip 7f749d060776 sp 7f748ea7a460 error 7 in 
libpthread-2.13.so[7f749d059000+17000]



domU:
Same as before patch.



I would like to add that I have the time to test this, we are happy to 
help you in any way possible. However, since I am no C developer, I 
won't be able to do much more than testing.



Regards

Frederik


On 13-08-13 11:20, Sylvain Munaut wrote:

Hi,


I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf  or directly in
the device path in the xen config. (option is 'rbd cache',
http://ceph.com/docs/next/rbd/rbd-config-ref/ )

Cheers,

 Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi,

 I have been testing this a while now, and just finished testing your
 untested patch. The rbd caching problem still persists.

Yes, I wouldn't expect it to change anything for caching. But I still
don't understand why caching would change anything at all ... all of
it should be handled within the librbd lib.


Note that I would recommend against caching anyway. The blktap layer
doesn't pass through the FLUSH commands and so this makes it completely
unsafe, because the VM will think things are committed to disk durably
even though they are not ...



 I will give the error messages of both dom0 and domU before and after I
 applied the patch.

It's actually strange that it changes anything at all.

Can you try adding an ERROR("HERE\n"); in that error path processing
and check syslog to see if it's triggered at all ?

A traceback would be great if you can get a core file. And possibly
compile tapdisk with debug symbols.
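
i.e. something along these lines in the err: path of drivers/block-rbd.c
(sketch only; ERROR() being whatever logging macro the driver already uses,
as suggested above, and the placement being the error path shown in the patch
quoted further down this thread):

err:
        if (c)
                rbd_aio_release(c);

        /* temporary instrumentation: if this error path is ever taken,
           the message should show up in the logs */
        ERROR("HERE\n");

        return rv;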


Cheers,

Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
Just noticed email subject "qemu-1.4.0 and onwards, linux kernel 3.2.x,
ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and
unresponsive qemu-process" ([Qemu-devel] [Bug 1207686]) where Sage noted that he 
has seen a completion called twice in the logs the OP posted. If that is 
actually happening (and not just an artefact of logging ring buffer overflowing 
or something) then I think that could easily cause a segfault in tapdisk rbd.

I'll try and see if I can log when that happens.

James

 -Original Message-
 From: Sylvain Munaut [mailto:s.mun...@whatever-company.com]
 Sent: Tuesday, 13 August 2013 7:20 PM
 To: James Harper
 Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-de...@lists.xen.org
 Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
 test ? :p
 
 Hi,
 
  I hope not. How could I tell? It's not something I've explicitly enabled.
 
 It's disabled by default.
 
 So you'd have to have enabled it either in ceph.conf  or directly in
 the device path in the xen config. (option is 'rbd cache',
 http://ceph.com/docs/next/rbd/rbd-config-ref/ )
 
 Cheers,
 
 Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
I think I have a separate problem too - tapdisk will segfault almost 
immediately upon starting but seemingly only for Linux PV DomU's. Once it has 
started doing this I have to wait a few hours to a day before it starts working 
again. My Windows DomU's appear to be able to start normally though.

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
On Wed, Aug 14, 2013 at 1:39 AM, James Harper
james.har...@bendigoit.com.au wrote:
 I think I have a separate problem too - tapdisk will segfault almost 
 immediately upon starting but seemingly only for Linux PV DomU's. Once it has 
 started doing this I have to wait a few hours to a day before it starts 
 working again. My Windows DomU's appear to be able to start normally though.

What about other blktap driver ? like using blktap raw driver, does
that work without issue ?

Cheers,

Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
 On Wed, Aug 14, 2013 at 1:39 AM, James Harper
 james.har...@bendigoit.com.au wrote:
  I think I have a separate problem too - tapdisk will segfault almost
 immediately upon starting but seemingly only for Linux PV DomU's. Once it
 has started doing this I have to wait a few hours to a day before it starts
 working again. My Windows DomU's appear to be able to start normally
 though.
 
 What about other blktap driver ? like using blktap raw driver, does
 that work without issue ?
 

What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know 
how to specify raw and anything I try just says it doesn't understand

Thanks

James


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
 
  On Wed, Aug 14, 2013 at 1:39 AM, James Harper
  james.har...@bendigoit.com.au wrote:
   I think I have a separate problem too - tapdisk will segfault almost
  immediately upon starting but seemingly only for Linux PV DomU's. Once it
  has started doing this I have to wait a few hours to a day before it starts
  working again. My Windows DomU's appear to be able to start normally
  though.
 
  What about other blktap driver ? like using blktap raw driver, does
  that work without issue ?
 
 
 What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know
 how to specify raw and anything I try just says it doesn't understand
 

I just tested with tap2:aio and that worked (had an old image of the VM on lvm 
still so just tested with that). Switching back to rbd and it crashes every 
time, just as postgres is starting in the vm. Booting into single user mode, 
waiting 30 seconds, then letting the boot continue it still crashes at the same 
point so I think it's not a timing thing - maybe postgres has a disk access 
pattern that is triggering the bug?

Putting printf's in seems to make the problem go away sometimes, so it's hard 
to debug.

James



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread Sylvain Munaut
Hi,

   tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
  7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
   tapdisk:9180 blocked for more than 120 seconds.
   tapdisk D 88043fc13540 0  9180  1 0x

You can try generating a core file by changing the ulimit on the running process

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

A backtrace would be useful :)


 Actually maybe not. What I was reading only applies for large number of bytes 
 written to the pipe, and even then I got confused by the double negatives. 
 Sorry for the noise.

Yes, as you discovered, for size < PIPE_BUF they should be atomic even
in non-blocking mode. But I could still add assert() there to make
sure it is.
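
As a self-contained illustration of that guarantee (not the driver code,
just the idea): a write of a single pointer is far below PIPE_BUF, so it is
all-or-nothing, and an assert() on the return value would catch any short
write:

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fds[2];
        void *req = (void *)&fds;      /* stand-in for a request pointer */
        void *out = NULL;

        if (pipe(fds) != 0)
                return 1;

        /* sizeof(req) is 4 or 8 bytes, far below PIPE_BUF, so this write
           is atomic even on a pipe in non-blocking mode */
        ssize_t w = write(fds[1], &req, sizeof(req));
        assert(w == (ssize_t)sizeof(req));

        ssize_t r = read(fds[0], &out, sizeof(out));
        assert(r == (ssize_t)sizeof(out) && out == req);

        printf("PIPE_BUF = %d, pointer passed through intact\n", PIPE_BUF);
        return 0;
}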


I did find a bug where it could leak requests which may lead to
hang. But it shouldn't crash ...

Here's an (untested yet) patch in the rbd error path:


diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@ err:
        if (c)
                rbd_aio_release(c);
 
+       list_move(&req->queue, &prv->reqs_free);
+       prv->reqs_free_count++;
+
        return rv;
 }


Cheers,

 Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread James Harper
tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
   7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk D 88043fc13540 0  9180  1 0x
 
 You can try generating a core file by changing the ulimit on the running
 process
 
 A backtrace would be useful :)
 

I found it was actually dumping core in /, but gdb doesn't seem to work nicely 
and all I get is this:

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1.
Cannot find new threads: generic error
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:163
163 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such 
file or directory.

Even when I attach to a running process.

One VM segfaults on startup, pretty much every time except never when I attach 
strace to it, meaning it's probably a race condition and may not actually be in 
your code...

 
  Actually maybe not. What I was reading only applies for large number of
  bytes written to the pipe, and even then I got confused by the double
  negatives. Sorry for the noise.
 
 Yes, as you discovered, for size < PIPE_BUF they should be atomic even
 in non-blocking mode. But I could still add assert() there to make
 sure it is.

Nah I got that completely backwards. I see now you are only passing a pointer 
so yes it should never be non-atomic.

 I did find a bug where it could leak requests which may lead to
 hang. But it shouldn't crash ...
 
 Here's an (untested yet) patch in the rbd error path:
 

I'll try that later this morning when I get a minute.

I've done the poor-mans-debugger thing and riddled the code with printf's but 
as far as I can determine every routine starts and ends. My thinking at the 
moment is that it's either a race (the VM's most likely to crash have multiple 
disks), or a buffer overflow that trips it up either immediately, or later.

I have definitely observed multiple VMs crash when something in ceph hiccups 
(e.g. I bring a mon up or down), if that helps.

I also followed through the rbd_aio_release idea on the weekend - I can see 
that if the read returns failure it means the callback was never called so the 
release is then the responsibility of the caller.

Thanks

James



RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread James Harper
 Here's an (untested yet) patch in the rbd error path:
 
 diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
 index 68fbed7..ab2d2c5 100644
 --- a/drivers/block-rbd.c
 +++ b/drivers/block-rbd.c
 @@ -560,6 +560,9 @@ err:
 if (c)
 rbd_aio_release(c);
 
 +   list_move(&req->queue, &prv->reqs_free);
 +   prv->reqs_free_count++;
 +
 return rv;
  }
 

FWIW, I can confirm via printf's that this error path is never hit in at least 
some of the crashes I'm seeing.

James


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-10 Thread James Harper
 
 Hi,
 
  I've had a few occasions where tapdisk has segfaulted:
 
  tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
  tapdisk:9180 blocked for more than 120 seconds.
  tapdisk D 88043fc13540 0  9180  1 0x
 
  and then like:
 
  end_request: I/O error, dev tdc, sector 472008
 
  I can't be sure but I suspect that when this happened either one OSD was
  offline, or the cluster lost quorum briefly.
 
 Interesting. There might be an issue if a request ends in error, I'll
 have to check that.
 I'll have a look on monday.
 

You say in tdrbd_finish_aiocb:

while (1) {
/* POSIX says write will be atomic or blocking */
rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));

but from what I've read in man 7 pipe, the statement about being atomic only 
applies if the pipe is open in non-blocking mode, and you open it with a call 
to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This would 
be consistent with the random crashes I'm seeing - I thought they were related 
to transient errors but my ceph cluster has been perfectly stable for a few 
days now and it's still happening.

What do you think?

Thanks

James



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-09 Thread Sylvain Munaut
Hi,

 I've had a few occasions where tapdisk has segfaulted:

 tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 
 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
 tapdisk:9180 blocked for more than 120 seconds.
 tapdisk D 88043fc13540 0  9180  1 0x

 and then like:

 end_request: I/O error, dev tdc, sector 472008

 I can't be sure but I suspect that when this happened either one OSD was 
 offline, or the cluster lost quorum briefly.

Interesting. There might be an issue if a request ends in error, I'll
have to check that.
I'll have a look on monday.

Cheers,

Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-08 Thread James Harper
 
 Yes the procedure didn't change.
 
 If you're on debian I could also send you prebuilt .debs for blktap
 and for a patched xen version that includes userspace RBD support.
 
 If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
 

I've had a few occasions where tapdisk has segfaulted:

tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 7f7e3a5c8c10 
error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk D 88043fc13540 0  9180  1 0x

and then like:

end_request: I/O error, dev tdc, sector 472008

I can't be sure but I suspect that when this happened either one OSD was 
offline, or the cluster lost quorum briefly.

James




Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi,


Yes the procedure didn't change.

If you're on debian I could also send you prebuilt .debs for blktap
and for a patched xen version that includes userspace RBD support.

If you have any issue, I can be found on ceph's IRC under 'tnt' nick.


Cheers,

   Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread James Harper
 
 Yes the procedure didn't change.
 
 If you're on debian I could also send you prebuilt .debs for blktap
 and for a patched xen version that includes userspace RBD support.
 

It's working great so far. I just pulled the source and built it then copied 
blktap in.

For some reason I already had a tapdisk in /usr/sbin, as well as the one in 
/usr/bin, which confused the issue for a while. I must have installed something 
manually but I don't remember what.

Xen also includes tap-ctl:

blktap-utils: /usr/sbin/tap-ctl
xen-utils-4.1: /usr/lib/xen-4.1/bin/tap-ctl

and I removed the one from xen and linked it to the one in /usr/sbin. I did 
that before I found the other tapdisk in /usr/sbin so I'm not sure if that step 
was necessary.
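
A quick way to see which copies are actually in play (paths as above; the
dpkg/md5sum checks are just one way to do it):

which -a tapdisk tap-ctl
dpkg -S /usr/bin/tapdisk /usr/sbin/tapdisk /usr/sbin/tap-ctl
md5sum /usr/bin/tapdisk /usr/sbin/tapdisk    # confirm which binary is the freshly built one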

Any chance this will be rolled into the main blktap sources?

 If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
 

Even though I have been on the internet since 94, I never got the hang of 
IRC... always found the stream of information a little overwhelming.

Thanks

James



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi,


 It's working great so far. I just pulled the source and built it then copied 
 blktap in.

Good to hear :)

I've been using it more and more recently and it's been good for me
too, even with live migrations.


 For some reason I already had a tapdisk in /usr/sbin, as well as the one in 
 /usr/bin, which confused the issue for a while. I must have installed 
 something manually but I don't remember what.

What distribution are you using ?


 Any chance this will be rolled into the main blktap sources?

I'd like to ... but I have no idea how or even who to contact for that
... blktap is so fragmented ...

You have blktap2 which is in the main Xen tree. But that's not what's
used in debian (it's not installed / compiled)

You have the so called blktap2.5 which is what's on github and what I
have based my stuff on. It's also what's shipped with debian as
blktap-utils I think.
I also think Citrix have their own version based off blktap2.5 as well.

And soon there will be blktap3 in the official Xen tree.

I want to at least get it merged in blktap3 but since that code is not
ready (or even merged) yet, it's a bit early for that. That's also
probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.


Cheers,

   Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread James Harper
 
  For some reason I already had a tapdisk in /usr/sbin, as well as the one in
  /usr/bin, which confused the issue for a while. I must have installed
  something manually but I don't remember what.
 
 What distribution are you using ?
 

Debian Wheezy

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Pasi Kärkkäinen
On Mon, Aug 05, 2013 at 01:01:35PM +0200, Sylvain Munaut wrote:
 
  Any chance this will be rolled into the main blktap sources?
 
  I'd like to ... but I have no idea how or even who to contact for that
  ... blktap is so fragmented ...

  You have blktap2 which is in the main Xen tree. But that's not what's
 used in debian (it's not installed / compiled)
 
 You have the so called blktap2.5 which is what's on github and what I
 have based my stuff on. It's also what's shipped with debian as
 blktap-utils I think.
 I also think Citrix have their own version based off blktap2.5 as well.


Yep, XenServer is using blktap2.5. 

Also the Centos-6 Xen packages have blktap2.5 patched in.
 
 And soon there will be blktap3 in the official Xen tree.
 
 I want to at least get it merged in blktap3 but since that code is not
 ready (or even merged) yet, it's a bit early for that. That's also
 probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.
 

I think I saw an announcement recently on xen-devel that blktap3 development 
has been stopped.. 


-- Pasi



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
 I think I saw an announcement recently on xen-devel that blktap3 development 
 has been stopped..

Oh :(

In the mail it speaks about QEMU but is it possible to use the QEMU
driver model when booting PV domains ? (and not PVHVM).

Cheers,

Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread George Dunlap
On Mon, Aug 5, 2013 at 1:03 PM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
 I think I saw an announcement recently on xen-devel that blktap3 development 
 has been stopped..

 Oh :(

 In the mail it speaks about QEMU but is it possible to use the QEMU
 driver model when booting PV domains ? (and not PVHVM).

Yes; qemu knows how to be a Xen PV block back-end.

One of the reasons for stopping work on blktap3 (AIUI) was that it
should in theory have performance characteristics similar to blktap3,
and tends to get newer protocols like ceph for free (i.e.,
implemented by someone else).
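
For reference, with a qemu that was built against librbd, the qdisk route
would be configured with something along these lines in an xl domain config
(illustrative; exact syntax and rbd support depend on the Xen and qemu
versions in use):

disk = [ 'vdev=xvda, access=rw, format=raw, backendtype=qdisk, target=rbd:rbd/test' ]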

 -George


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Sylvain Munaut
Hi George,


 Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like what Xen version support this. And with which features (
indirect descriptors, persistent grants, discard, flush, ...) and/or
which limitation.


 One of the reasons for stopping work on blktap3 (AIUI) was that it
 should in theory have performance characteristics similar to blktap3,

And did anyone check the theory currently ? :)


 and tends to get newer protocols like ceph for free (i.e.,
 implemented by someone else).

Yes I can definitely see the appeal.


Cheers,

Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread George Dunlap

On 05/08/13 14:55, Sylvain Munaut wrote:

Hi George,



Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like what Xen version support this. And with which features (
indirect descriptors, persistent grants, discard, flush, ...) and/or
which limitation.


I don't think this is documented anywhere; you'll need to ask the 
experts.  Stefano? Roger? Wei?






One of the reasons for stopping work on blktap3 (AIUI) was that it
should in theory have performance characteristics similar to blktap3,

And did anyone check the theory currently ? :)


I say in theory because they are using the same basic architecture: a 
normal process running in dom0, with no special kernel support.  If 
there were a performance difference, it would be something that should 
(in theory) be able to be optimized.


I don't think we have comparisons between qdisk (which is what we call 
qemu-as-pv-backend in Xen) and blktap3 (and since blktap3 wasn't 
finished they wouldn't mean much anyway); but I think qdisk compares 
reasonably with blkback.


 -George


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Wei Liu
On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
 On 05/08/13 14:55, Sylvain Munaut wrote:
 Hi George,
 
 
 Yes; qemu knows how to be a Xen PV block back-end.
 Very interesting. Is there documentation about this somewhere ?
 I had a look some time ago and it was really not very clear.
 
 Things like what Xen version support this. And with which features (
 indirect descriptors, persistent grants, discard, flush, ...) and/or
 which limitation.
 
 I don't think this is documented anywhere; you'll need to ask the
 experts.  Stefano? Roger? Wei?
 

These are Linux features not Xen ones AFAICT. In theory they are not
bound to specific Xen versions.

For the network part I don't think new features depend on any specific
hypercall. However for block Roger and Stefano seem to introduce
new hypercalls for certain features (I might be wrong though).


Wei.

 
 
 One of the reasons for stopping work on blktap3 (AIUI) was that it
 should in theory have performance characteristics similar to blktap3,
 And did anyone check the theory currently ? :)
 
 I say in theory because they are using the same basic
 architecture: a normal process running in dom0, with no special
 kernel support.  If there were a performance difference, it would be
 something that should (in theory) be able to be optimized.
 
 I don't think we have comparisons between qdisk (which is what we
 call qemu-as-pv-backend in Xen) and blktap3 (and since blktap3
 wasn't finished they wouldn't mean much anyway); but I think qdisk
 compares reasonably with blkback.
 
  -George


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread George Dunlap

On 05/08/13 16:18, Wei Liu wrote:

On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:

On 05/08/13 14:55, Sylvain Munaut wrote:

Hi George,



Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like what Xen version support this. And with which features (
indirect descriptors, persistent grants, discard, flush, ...) and/or
which limitation.

I don't think this is documented anywhere; you'll need to ask the
experts.  Stefano? Roger? Wei?


These are Linux features not Xen ones AFAICT. In theory they are not
bound to specific Xen versions.

For the network part I don't think new features depend on any specific
hypercall. However for block Roger and Stefano seem to introduce
new hypercalls for certain features (I might be wrong though).


We're talking about qemu; so the toolstack needs to know how to set up 
qdisk, and I think qdisk would need to be programmed to use, for 
example, persistent grants, yes?


 -G



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread Wei Liu
On Mon, Aug 05, 2013 at 04:20:20PM +0100, George Dunlap wrote:
 On 05/08/13 16:18, Wei Liu wrote:
 On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
 On 05/08/13 14:55, Sylvain Munaut wrote:
 Hi George,
 
 
 Yes; qemu knows how to be a Xen PV block back-end.
 Very interesting. Is there documentation about this somewhere ?
 I had a look some time ago and it was really not very clear.
 
 Things like what Xen version support this. And with which features (
 indirect descriptors, persistent grants, discard, flush, ...) and/or
 which limitation.
 I don't think this is documented anywhere; you'll need to ask the
 experts.  Stefano? Roger? Wei?
 
 These are Linux features not Xen ones AFAICT. In theory they are not
 bound to specific Xen versions.
 
 For the network part I don't think new features depend on any specific
 hypercall. However for block Roger and Stefano seem to introduce
 new hypercalls for certain features (I might be wrong though).
 
 We're talking about qemu; so the toolstack needs to know how to set
 up qdisk, and I think qdisk would need to be programmed to use, for
 example, persistent grants, yes?
 

I don't think the toolstack needs to be involved in this. At least for the
network part FE and BE negotiate what features to use. The general idea
is that new features will always be of benefit to enable, so we make use
of them whenever possible. Certain features do have sysfs entries to
configure but that's not coded into libxl.

I cannot speak for block drivers, but grepping the source code I don't
think you can configure persistent grants via libxl either.
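
One way to see what a given backend actually negotiated is to look at its
xenstore directory, e.g. for a blkback device of domain 5 (the device number
and the exact set of feature-* keys vary with the backend and kernel
version):

xenstore-ls /local/domain/0/backend/vbd/5/51712 | grep feature
# typically lists keys such as feature-flush-cache, feature-discard or
# feature-persistent when the backend supports them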


Wei.

  -G


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-07-31 Thread James Harper
I'm about to start trying this out. Has anything changed since this email 
http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg13984.html ?

Thanks

James


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-07-01 Thread Sylvain Munaut
Hi again,

 However when rbd cache is enabled with:
 [client]
 rbd_cache = true

 the tapdisk process crashes if I do this in the domU:
 dd if=/dev/xvda bs=1M > /dev/null

I tested this locally and couldn't reproduce the issue.

Doing reads doesn't do anything bad AFAICT.
Doing writes OTOH seems to leak memory (or at least use much more
memory than the configured cache size).

I also rechecked the code and I don't see anything wrong with it.
AFAICT with or without cache shouldn't change anything so the issue
might be in librbd itself.

Cheers,

Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-07-01 Thread Nathan O'Sullivan
I've installed debug symbols, perhaps that will give a better idea what 
is going on?


#0  __GI___libc_free (mem=0x7f516065) at malloc.c:2970
#1  0x7f515f3ac84b in ~raw_posix_aligned (this=0x7f513c418f20, 
__in_chrg=optimised out) at common/buffer.cc:152
#2  ceph::buffer::raw_posix_aligned::~raw_posix_aligned (this=optimised 
out, __in_chrg=optimised out) at common/buffer.cc:155
#3  0x7f515f3a7f6e in ceph::buffer::ptr::release 
(this=0x7f513801d600) at common/buffer.cc:328
#4  0x7f515ee721c7 in ~ptr (this=0x7f513801d600, 
__in_chrg=optimised out) at ./include/buffer.h:159
#5  destroy (__p=0x7f513801d600, this=optimised out) at 
/usr/include/c++/4.6/ext/new_allocator.h:118
#6  std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear (this=0x15e3908) at /usr/include/c++/4.6/bits/list.tcc:78
#7  0x7f515eea9ffe in ~_List_base (this=0x15e3908, 
__in_chrg=optimised out) at /usr/include/c++/4.6/bits/stl_list.h:372
#8  ~list (this=0x15e3908, __in_chrg=optimised out) at 
/usr/include/c++/4.6/bits/stl_list.h:429
#9  ~list (this=0x15e3908, __in_chrg=optimised out) at 
./include/buffer.h:304
#10 ~BufferHead (this=0x15e38c0, __in_chrg=optimised out) at 
osdc/ObjectCacher.h:84
#11 ObjectCacher::trim (this=0x1594a00, max_bytes=33554432, max_ob=42) 
at osdc/ObjectCacher.cc:949
#12 0x7f515eeb8e60 in ObjectCacher::_readx (this=optimised out, 
rd=0x15f1f70, oset=0x1595110, onfinish=0x1591280, external_call=false) 
at osdc/ObjectCacher.cc:1240
#13 0x7f515eebe620 in ObjectCacher::C_RetryRead::finish 
(this=0x15c3c30, r=optimised out) at osdc/ObjectCacher.h:554
#14 0x7f515ee7381a in Context::complete (this=0x15c3c30, 
r=optimised out) at ./include/Context.h:41
#15 0x7f515eeb9f65 in finish_contexts (cct=0x155cc30, finished=..., 
result=0) at ./include/Context.h:78
#16 0x7f515eeaf705 in ObjectCacher::bh_read_finish (this=optimised 
out, poolid=optimised out, oid=..., start=983040, length=131072, 
bl=..., r=0, trust_enoent=true)

at osdc/ObjectCacher.cc:773
#17 0x7f515eebd32f in ObjectCacher::C_ReadFinish::finish 
(this=0x15ced30, r=0) at osdc/ObjectCacher.h:478
#18 0x7f515ee7381a in Context::complete (this=0x15ced30, 
r=optimised out) at ./include/Context.h:41
#19 0x7f515eea41f5 in librbd::C_Request::finish (this=0x159dfd0, 
r=0) at librbd/LibrbdWriteback.cc:55
#20 0x7f515eea2c14 in librbd::context_cb (c=optimised out, 
arg=optimised out) at librbd/LibrbdWriteback.cc:35
#21 0x7f515f21056d in librados::C_AioComplete::finish 
(this=optimised out, r=optimised out) at 
./librados/AioCompletionImpl.h:171
#22 0x7f515f27cb00 in Finisher::finisher_thread_entry 
(this=0x1576d98) at common/Finisher.cc:56
#23 0x7f515e113e9a in start_thread (arg=0x7f5158a87700) at 
pthread_create.c:308
#24 0x7f515eb89ccd in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:112

#25 0x in ?? ()



On 1/07/2013 7:57 PM, Sylvain Munaut wrote:

Hi again,


However when rbd cache is enabled with:
[client]
rbd_cache = true

the tapdisk process crashes if I do this in the domU:
dd if=/dev/xvda bs=1M > /dev/null

I tested this locally and couldn't reproduce the issue.

Doing reads doesn't do anything bad AFAICT.
Doing writes OTOH seems to leak memory (or at least use much more
memory than the configured cache size).

I also rechecked the code and I don't see anything wrong with it.
AFAICT with or without cache shouldn't change anything so the issue
might be in librbd itself.

Cheers,

 Sylvain




Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-06-21 Thread Nathan O'Sullivan
I've been testing this on Ubuntu 12.04.02 64-bit with kernel 3.2.0-48 
and ceph 0.61.4


With rbd cache disabled, it works well enough in initial testing.

However when rbd cache is enabled with:
[client]
rbd_cache = true

the tapdisk process crashes if I do this in the domU:
dd if=/dev/xvda bs=1M > /dev/null


I grabbed the tapdisk stacktrace with gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1677186700 (LWP 6507)]
0x7f167d21857c in free () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7f167d21857c in free () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f167daab84b in 
ceph::buffer::raw_posix_aligned::~raw_posix_aligned() () from 
/usr/lib/librados.so.2
#2  0x7f167daa6f6e in ceph::buffer::ptr::release() () from 
/usr/lib/librados.so.2
#3  0x7f167d5711c7 in std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear() () from /usr/lib/librbd.so.1
#4  0x7f167d5a8ffe in ObjectCacher::trim(long, long) () from 
/usr/lib/librbd.so.1
#5  0x7f167d5b7e60 in ObjectCacher::_readx(ObjectCacher::OSDRead*, 
ObjectCacher::ObjectSet*, Context*, bool) () from /usr/lib/librbd.so.1
#6  0x7f167d5bd620 in ObjectCacher::C_RetryRead::finish(int) () from 
/usr/lib/librbd.so.1
#7  0x7f167d57281a in Context::complete(int) () from 
/usr/lib/librbd.so.1
#8  0x7f167d5b8f65 in finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >, int) () from /usr/lib/librbd.so.1
#9  0x7f167d5ae705 in ObjectCacher::bh_read_finish(long, sobject_t, 
long, unsigned long, ceph::buffer::list, int, bool) ()

   from /usr/lib/librbd.so.1
#10 0x7f167d5bc32f in ObjectCacher::C_ReadFinish::finish(int) () 
from /usr/lib/librbd.so.1
#11 0x7f167d57281a in Context::complete(int) () from 
/usr/lib/librbd.so.1
#12 0x7f167d5a31f5 in librbd::C_Request::finish(int) () from 
/usr/lib/librbd.so.1
#13 0x7f167d5a1c14 in librbd::context_cb(void*, void*) () from 
/usr/lib/librbd.so.1
#14 0x7f167d90f56d in librados::C_AioComplete::finish(int) () from 
/usr/lib/librados.so.2
#15 0x7f167d97bb00 in Finisher::finisher_thread_entry() () from 
/usr/lib/librados.so.2
#16 0x7f167c812e9a in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0

#17 0x7f167d288ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x in ?? ()


Regards
Nathan


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-06-21 Thread Sylvain Munaut
Hi,

 I've been testing this on Ubuntu 12.04.02 64-bit with kernel 3.2.0-48 and
 ceph 0.61.4

Thanks for testing :)

 However when rbd cache is enabled with:
 [client]
 rbd_cache = true

 the tapdisk process crashes if I do this in the domU:
 dd if=/dev/xvda bs=1M > /dev/null

Interesting. I'm currently away, but I'll try to set up a test and see
if I can reproduce the issue locally.

I never really tried with the cache enabled.

Cheers,

   Sylvain


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-19 Thread Pasi Kärkkäinen
On Thu, Apr 18, 2013 at 05:05:29PM +0200, Sylvain Munaut wrote:
 Hi,


Hi,
 
 I've been working on getting a working blktap driver allowing to
 access ceph RBD block devices without relying on the RBD kernel driver
 and it finally got to a point where it works and is testable.
 

Great! Ceph distributed block storage is cool.

 Some of the advantages are:
  - Easier to update to newer RBD version
  - Allows functionality only available in the userspace RBD library
 (write cache, layering, ...)
  - Less issue when you have OSD as domU on the same dom0
  - Contains crash to user space :p (they shouldn't happen, but ...)
 
 It's still an early prototype, but if you want to give it a shot and
 give feedback.
 
 You can find the code there https://github.com/smunaut/blktap/tree/rbd
  (rbd branch).
 
 Currently the username, poolname and image name are hardcoded ...
 (look for FIXME in the code). I'll get to that next, once I figured
 the best format for arguments.
 

If you have time to write up some lines about steps required to test this,
that'd be nice, it'll help people to test this stuff.

Thanks,

-- Pasi



Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-19 Thread Sylvain Munaut
 If you have time to write up some lines about steps required to test this,
 that'd be nice, it'll help people to test this stuff.

To quickly test, I compiled the package and just replaced the tapdisk
binary from my normal blktap install with the newly compiled one.

Then you need to set up an RBD image named 'test' in the default 'rbd'
pool. You also need to set up a proper ceph.conf and keyring file on
the client (since librbd will use those for the parameters). The
keyring must contain the 'client.admin' key.

Then in the config file, use something like
tap2:tapdisk:rbd:xxx,xvda1,w  the 'xxx' part is currently ignored
...
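
Spelled out, the quick test might look like this on the dom0 (image size,
keyring location and the guest device are illustrative):

rbd create test --size 4096        # 4 GB image 'test' in the default 'rbd' pool
rbd ls                             # should list 'test'

# librbd reads the usual client config; make sure these exist on the dom0
# and that the keyring holds the client.admin key
ls /etc/ceph/ceph.conf /etc/ceph/keyring

# then in the domU config file:
disk = [ 'tap2:tapdisk:rbd:xxx,xvda1,w' ]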


Cheers,

Sylvain