[sheepdog] [PATCH] func/test: change functional test output for __vdi_list
Change output of functional test using __vdi_list in assosiation with adding block_size_shift information to vdi list command. Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- tests/functional/016.out |2 +- tests/functional/029.out | 18 +++--- tests/functional/030.out | 158 +++--- tests/functional/031.out | 20 +++--- tests/functional/039.out | 42 ++-- tests/functional/040.out |8 +- tests/functional/041.out | 70 ++-- tests/functional/043.out | 24 tests/functional/044.out |2 +- tests/functional/046.out | 24 tests/functional/047.out |4 +- tests/functional/048.out |6 +- tests/functional/052.out | 64 +- tests/functional/059.out | 24 tests/functional/060.out | 80 tests/functional/062.out |2 +- tests/functional/068.out | 12 ++-- tests/functional/072.out | 12 ++-- tests/functional/076.out |8 +- tests/functional/077.out |4 +- tests/functional/078.out | 18 +++--- tests/functional/079.out | 16 +++--- tests/functional/080.out |8 +- tests/functional/083.out |4 +- tests/functional/088.out |8 +- tests/functional/091.out |8 +- tests/functional/092.out | 20 +++--- tests/functional/096.out | 66 ++-- 28 files changed, 366 insertions(+), 366 deletions(-) diff --git a/tests/functional/016.out b/tests/functional/016.out index ab648d4..50e23d7 100644 --- a/tests/functional/016.out +++ b/tests/functional/016.out @@ -2,7 +2,7 @@ QA output created by 016 using backend plain store Failed to create snapshot for base, maybe snapshot id (0) or tag (tag) is existed there should be no vdi - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift there should be no object STORE DATAVDI VMSTATE ATTRLEDGER STALE 0 0 3 0 0 0 0 diff --git a/tests/functional/029.out b/tests/functional/029.out index 23117f7..7c50653 100644 --- a/tests/functional/029.out +++ b/tests/functional/029.out @@ -6,15 +6,15 @@ To create replicated vdi, set -c x To create erasure coded vdi, set -c x:y x(2,4,8,16) - number of data strips y(1 to 15) - number of parity strips - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test50 20 MB 20 MB 0.0 MB DATE fd2c304:2 - test40 20 MB 20 MB 0.0 MB DATE fd2de3 4 - test70 20 MB 20 MB 0.0 MB DATE fd2f964:4 - test60 20 MB 20 MB 0.0 MB DATE fd31494:3 - test30 20 MB 20 MB 0.0 MB DATE fd3662 3 - test20 20 MB 0.0 MB 20 MB DATE fd3816 2 - test90 20 MB 20 MB 0.0 MB DATE fd4094 16:7 - test80 20 MB 20 MB 0.0 MB DATE fd42474:5 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test50 20 MB 20 MB 0.0 MB DATE fd2c304:222 + test40 20 MB 20 MB 0.0 MB DATE fd2de3 422 + test70 20 MB 20 MB 0.0 MB DATE fd2f964:422 + test60 20 MB 20 MB 0.0 MB DATE fd31494:322 + test30 20 MB 20 MB 0.0 MB DATE fd3662 322 + test20 20 MB 0.0 MB 20 MB DATE fd3816 222 + test90 20 MB 20 MB 0.0 MB DATE fd4094 16:722 + test80 20 MB 20 MB 0.0 MB DATE fd42474:522 Looking for the object 0xfd38150001 (vid 0xfd3816 idx 1, 2 copies) with 23 nodes 127.0.0.1:7000 doesn't have the object diff --git a/tests/functional/030.out b/tests/functional/030.out index 5b386ab..00f50a9 100644 --- a/tests/functional/030.out +++ b/tests/functional/030.out @@ -5,36 +5,36 @@ Index Tag Snapshot Time Index Tag Snapshot Time 1 s1 DATE 2 s2 DATE - NameIdSizeUsed SharedCreation time VDI id Copies Tag -s test11 10 MB 12 MB 0.0 MB DATE fd32fc 6 -s test12 10 MB 12 MB 0.0 MB DATE fd32fd 6 - test10 10 MB 0.0 MB 12 MB DATE fd32fe 6 -s test21 10 MB 12 MB 0.0 MB DATE fd3815 3 -s test22 10 MB 12 MB 0.0 MB DATE fd3816 3 - test20 10 MB 0.0 MB 12 MB DATE fd3817 3 + NameIdSizeUsed 
SharedCreation time VDI id Copies Tag Block Size Shift +s test11 10 MB 12
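For readers new to the column being added here: the Block Size Shift value printed per VDI is the exponent of the object size, i.e. an object is 1 << block_size_shift bytes, so the 22 shown in these expected outputs corresponds to the traditional 4 MB sheepdog object. A minimal, illustrative C sketch of that relationship (the function name is made up for illustration and is not a sheepdog API):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only: the object size is derived from block_size_shift.
 * A shift of 22 gives 1 << 22 = 4194304 bytes (4 MB), the value the new
 * "Block Size Shift" column shows for the default-formatted VDIs above. */
static uint64_t object_size(uint8_t block_size_shift)
{
    return UINT64_C(1) << block_size_shift;
}

int main(void)
{
    printf("block_size_shift 22 -> %" PRIu64 " bytes\n", object_size(22));
    return 0;
}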
[sheepdog] [PATCH] sheep: fix bug for not saving block_size_shift to cluster config
This patch fixes bugs that block_size_shift info was forgotten after cluster shutdown and start sheepdog. Add block_size_shift info to cluster config file. Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- sheep/config.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/sheep/config.c b/sheep/config.c index 383a1ed..dfad5fd 100644 --- a/sheep/config.c +++ b/sheep/config.c @@ -11,7 +11,7 @@ #include sheep_priv.h -#define SD_FORMAT_VERSION 0x0005 +#define SD_FORMAT_VERSION 0x0006 #define SD_CONFIG_SIZE 40 static struct sheepdog_config { @@ -21,7 +21,7 @@ static struct sheepdog_config { uint8_t store[STORE_LEN]; uint8_t shutdown; uint8_t copy_policy; - uint8_t __pad; + uint8_t block_size_shift; uint16_t version; uint64_t space; } config; @@ -64,6 +64,7 @@ static int get_cluster_config(struct cluster_info *cinfo) cinfo-nr_copies = config.copies; cinfo-flags = config.flags; cinfo-copy_policy = config.copy_policy; + cinfo-block_size_shift = config.block_size_shift; memcpy(cinfo-store, config.store, sizeof(config.store)); return SD_RES_SUCCESS; @@ -155,6 +156,7 @@ int set_cluster_config(const struct cluster_info *cinfo) config.copies = cinfo-nr_copies; config.copy_policy = cinfo-copy_policy; config.flags = cinfo-flags; + config.block_size_shift = cinfo-block_size_shift; memset(config.store, 0, sizeof(config.store)); pstrcpy((char *)config.store, sizeof(config.store), (char *)cinfo-store); -- 1.7.1 -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
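The reason a one-byte change suffices here: the on-disk layout already reserved a byte (__pad) at that position, so storing block_size_shift there keeps SD_CONFIG_SIZE at 40 while the SD_FORMAT_VERSION bump distinguishes new config files from old ones. The self-contained C below only illustrates the round-trip idea with a simplified stand-in struct; it is not sheep/config.c and omits fields such as store[]:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct sheepdog_config: the point is only that
 * the former padding byte now carries block_size_shift, so a value stored
 * at format time survives a shutdown/restart cycle. */
struct demo_config {
    uint64_t ctime;
    uint16_t flags;
    uint8_t copies;
    uint8_t shutdown;
    uint8_t copy_policy;
    uint8_t block_size_shift;   /* was "__pad" before this fix */
    uint16_t version;
    uint64_t space;
};

int main(void)
{
    struct demo_config saved = { .block_size_shift = 22, .version = 0x0006 };
    struct demo_config loaded;
    FILE *f = fopen("config.demo", "w+b");

    if (!f)
        return 1;
    fwrite(&saved, sizeof(saved), 1, f);
    rewind(f);
    if (fread(&loaded, sizeof(loaded), 1, f) != 1) {
        fclose(f);
        return 1;
    }
    fclose(f);
    remove("config.demo");

    printf("restored block_size_shift = %d\n", loaded.block_size_shift);
    return 0;
}

Under that reading, whatever block_size_shift was chosen at "dog cluster format" time is read back by get_cluster_config() after a restart, which is exactly what the bug report says was previously lost.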
Re: [sheepdog] [sheepdog-users] sheepdog 0.9 vs live migration
Hi Hitoshi, sorry, second try to send to the list...

On 2014-12-16 10:51, Hitoshi Mitake wrote:
if I remove the VDI lock the live migration works correctly: $ dog vdi lock unlock test-vm-disk but after the live migration I can't relock the VDI.
Thanks for your report. As you say, live migration and vdi locking seem to conflict. I'll work on it later. But I'm not familiar with qemu's live migration feature, so it will take time. Could you add an issue to our launchpad tracker as a reminder?

You have two qemu instances temporarily running which access the same vdi on different hosts. A similar problem exists with drbd (a kind of RAID 1 over the network on two nodes), which by default lets only one node (the primary node) access the drbd device. But for live migration both nodes (or rather both qemu instances) need access to the drbd device. For this, drbd has a dual-primary mode which can be enabled temporarily with a command-line switch of the drbd admin utility. In my environment I leave that task to libvirt by means of a simple hook script, which enables dual-primary mode when a live migration starts and disables it when it finishes.

For sheepdog it would be nice (at least for me) if it were possible to unlock the vdi, migrate the guest to a new node and lock the vdi again. (I don't know whether this is possible to implement in sheepdog; allow a second lock on an already locked vdi and clear the old lock automatically after the old qemu is destroyed?)

Cheers Bastian -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
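For what it's worth, the handover Bastian sketches at the end could be modelled as a small lock state machine: a second lock request on an already locked VDI moves the lock into a handover state, and the old holder's claim is cleared when its qemu goes away. The C below is purely hypothetical, it is not sheepdog's locking code, just an illustration of that proposal:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical VDI lock states for the migration case discussed above.
 * "owner" stands for whatever identifies a qemu/gateway connection. */
enum lock_state { VDI_UNLOCKED, VDI_LOCKED, VDI_HANDOVER };

struct vdi_lock {
    enum lock_state state;
    int owner;          /* current holder */
    int next_owner;     /* incoming holder during migration */
};

/* A second lock request while locked starts a handover instead of failing. */
static bool lock_request(struct vdi_lock *l, int requester)
{
    switch (l->state) {
    case VDI_UNLOCKED:
        l->state = VDI_LOCKED;
        l->owner = requester;
        return true;
    case VDI_LOCKED:
        l->state = VDI_HANDOVER;
        l->next_owner = requester;
        return true;
    case VDI_HANDOVER:
        return false;   /* only one migration at a time */
    }
    return false;
}

/* Called when a holder disconnects (e.g. the source qemu is destroyed). */
static void holder_gone(struct vdi_lock *l, int who)
{
    if (l->state == VDI_HANDOVER && who == l->owner) {
        l->owner = l->next_owner;   /* destination keeps the lock */
        l->state = VDI_LOCKED;
    } else if (who == l->owner) {
        l->state = VDI_UNLOCKED;
    }
}

int main(void)
{
    struct vdi_lock l = { VDI_UNLOCKED, -1, -1 };

    lock_request(&l, 1);    /* source qemu locks the VDI */
    lock_request(&l, 2);    /* destination qemu locks during migration */
    holder_gone(&l, 1);     /* source qemu is destroyed after migration */
    printf("state=%d owner=%d\n", l.state, l.owner);    /* LOCKED, owner 2 */
    return 0;
}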
Re: [sheepdog] [PATCH RFT 0/4] garbage collect needless VIDs and inode objects
2014-12-15 10:36 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp:
Current sheepdog never recycles VIDs. But it will cause problems e.g. VID space exhaustion, too much garbage inode objects.

I've been testing this branch and it seems to work. I use a script that creates 3 vdi and 3 snapshots for each (writing 10M of data), then removes them and looks for objects whose names start with 80*.

With all snapshots active:
/mnt/sheep/1/80fd3663
/mnt/sheep/0/80fd3818
/mnt/sheep/0/80fd32fc
/mnt/sheep/0/80fd32fd
/mnt/sheep/0/80fd32fe

After removing all snapshots:
/mnt/sheep/1/80fd3663
/mnt/sheep/0/80fd3818
/mnt/sheep/0/80fd32fc
/mnt/sheep/0/80fd32fd
/mnt/sheep/0/80fd32fe

After removing all vdi: empty.

sheep -v
Sheepdog daemon version 0.9.0_25_g24ef77f

But I found a repeatable sheepdog crash! I noticed it happening when I ran the script a second time. The crash occurs after I recreate a vdi with the same name and then take a snapshot of it.

Dec 16 12:12:42 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40067, op=DEL_VDI, result=00
Dec 16 12:12:47 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40069, op=DEL_VDI, data=(not string)
Dec 16 12:12:47 INFO [main] run_vid_gc(2106) all members of the family (root: fd3662) are deleted
Dec 16 12:12:47 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40069, op=DEL_VDI, result=00
Dec 16 12:13:57 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40072, op=NEW_VDI, data=(not string)
Dec 16 12:13:57 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd32fc
Dec 16 12:13:57 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40072, op=NEW_VDI, result=00
Dec 16 12:14:12 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40074, op=NEW_VDI, data=(not string)
Dec 16 12:14:13 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd3815
Dec 16 12:14:13 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40074, op=NEW_VDI, result=00
Dec 16 12:14:23 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40076, op=NEW_VDI, data=(not string)
Dec 16 12:14:23 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd3662
Dec 16 12:14:23 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40076, op=NEW_VDI, result=00
Dec 16 12:14:34 INFO [main] rx_main(830) req=0x7f314400d310, fd=26, client=127.0.0.1:40078, op=NEW_VDI, data=(not string)
Dec 16 12:14:34 EMERG [main] crash_handler(268) sheep exits unexpectedly (Segmentation fault).
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) sheep.c:270: crash_handler
Dec 16 12:14:34 EMERG [main] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f31515cc02f]
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:64: lookup_vdi_family_member
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:109: update_vdi_family
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:396: add_vdi_state
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) ops.c:674: cluster_notify_vdi_add
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) group.c:948: sd_notify_handler
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) zookeeper.c:1252: zk_event_handler
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) event.c:210: do_event_loop
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) sheep.c:963: main
Dec 16 12:14:34 EMERG [main] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc) [0x7f3150badeac]
Dec 16 12:14:34 EMERG [main] sd_backtrace(847) sheep() [0x405fa8]

How to reproduce:
dog cluster format -c 2
dog vdi create -P test 1G
dog vdi snapshot test
dd if=/dev/urandom bs=1M count=10 | dog vdi write test
dog vdi delete -s 1 test
dog vdi delete test
echo 'Recreating vdi test'
dog vdi create -P test 1G
dog vdi snapshot test   -- at this point, sheep crashes
dog vdi list

-- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] Fwd: [PATCH 2/2] sheep: forbid revival of orphan objects
2014-12-11 8:00 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp: Current recovery process can cause revival of orphan objects. This patch solves this problem. sheep -v Sheepdog daemon version 0.9.0_18_g7215788 It works fine! Dec 16 15:00:20 INFO [main] main(966) shutdown Dec 16 15:00:20 INFO [main] zk_leave(989) leaving from cluster Dec 16 15:00:37 INFO [main] md_add_disk(343) /mnt/sheep/0, vdisk nr 206, total disk 1 Dec 16 15:00:37 INFO [main] md_add_disk(343) /mnt/sheep/1, vdisk nr 279, total disk 2 Dec 16 15:00:37 NOTICE [main] get_local_addr(522) found IPv4 address Dec 16 15:00:37 INFO [main] send_join_request(1016) IPv4 ip:192.168.10.7 port:7000 going to join the cluster Dec 16 15:00:37 NOTICE [main] nfs_init(611) nfs server service is not compiled Dec 16 15:00:37 INFO [main] main(958) sheepdog daemon (version 0.9.0_18_g7215788) started Dec 16 15:00:38 INFO [rw 30221] prepare_object_list(1100) skipping object list reading from IPv4 ip:192.168.10.7 port:7000 becauseit is marked as excluded node Dec 16 15:00:38 INFO [main] recover_object_main(930) object recovery progress 2% Dec 16 15:00:38 INFO [main] recover_object_main(930) object recovery progress 3% There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster Dec 16 14:55:09 INFO [main] zk_leave(989) leaving from cluster Dec 16 14:55:40 INFO [main] md_add_disk(343) /mnt/sheep/0, vdisk nr 206, total disk 1 Dec 16 14:55:40 INFO [main] md_add_disk(343) /mnt/sheep/1, vdisk nr 279, total disk 2 Dec 16 14:55:40 NOTICE [main] get_local_addr(522) found IPv4 address Dec 16 14:55:40 INFO [main] send_join_request(1016) IPv4 ip:192.168.10.7 port:7000 going to join the cluster Dec 16 14:55:40 NOTICE [main] nfs_init(611) nfs server service is not compiled Dec 16 14:55:40 INFO [main] main(958) sheepdog daemon (version 0.9.0_18_g7215788) started Dec 16 14:55:41 INFO [rw 30049] prepare_object_list(1100) skipping object list reading from IPv4 ip:192.168.10.5 port:7000 becauseit is marked as excluded node Dec 16 14:55:41 ERROR [rw 30049] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30068] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30068] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.6:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30072] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 INFO [main] recover_object_main(930) object recovery progress 1% cut -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH 2/2] sheep: forbid revival of orphan objects
2014-12-16 15:07 GMT+01:00 Valerio Pachera siri...@gmail.com: It works fine! ... There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster Please, notice that the same logic should apply to multi device: create some vdi unplug a disk remove some vdi plug back the disk This still causes Dec 16 15:10:16 INFO [main] recover_object_main(930) object recovery progress 74% Dec 16 15:10:16 ERROR [rw 30554] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30553] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30500] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30500] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30554] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ALERT [rw 30500] recover_replication_object(419) cannot access any replicas of fd32fc0013 at epoch 2 Dec 16 15:10:16 ALERT [rw 30500] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30500] recover_replication_object(427) can not recover oid fd32fc0013 Dec 16 15:10:16 ERROR [rw 30500] recover_object_work(600) failed to recover object fd32fc0013 Dec 16 15:10:16 ALERT [rw 30554] recover_replication_object(419) cannot access any replicas of fd32fc000b at epoch 2 Dec 16 15:10:16 ALERT [rw 30554] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30554] recover_replication_object(427) can not recover oid fd32fc000b Dec 16 15:10:16 ERROR [rw 30554] recover_object_work(600) failed to recover object fd32fc000b Dec 16 15:10:16 ERROR [rw 30553] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30552] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ALERT [rw 30553] recover_replication_object(419) cannot access any replicas of fd32fc0012 at epoch 2 Dec 16 15:10:16 ALERT [rw 30553] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30553] recover_replication_object(427) can not recover oid fd32fc0012 Dec 16 15:10:16 ERROR [rw 30553] recover_object_work(600) failed to recover object fd32fc0012 Notice I'm not using the option --enable-diskvnodes. Thank you. -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] Build failed in Jenkins: sheepdog-build #574
See http://jenkins.sheepdog-project.org:8080/job/sheepdog-build/574/changes Changes: [mitake.hitoshi] sheep, dog: add block_size_shift option to cluster format command [mitake.hitoshi] sheep, dog: add selectable object_size support of VDI operation [mitake.hitoshi] dog: revert the change for output of dog vdi list manually -- [...truncated 51 lines...] checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking for size_t... yes checking for working alloca.h... yes checking for alloca... yes checking for dirent.h that defines DIR... yes checking for library containing opendir... none required checking for ANSI C header files... (cached) yes checking for sys/wait.h that is POSIX.1 compatible... yes checking arpa/inet.h usability... yes checking arpa/inet.h presence... yes checking for arpa/inet.h... yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking limits.h usability... yes checking limits.h presence... yes checking for limits.h... yes checking netdb.h usability... yes checking netdb.h presence... yes checking for netdb.h... yes checking netinet/in.h usability... yes checking netinet/in.h presence... yes checking for netinet/in.h... yes checking for stdint.h... (cached) yes checking for stdlib.h... (cached) yes checking for string.h... (cached) yes checking sys/ioctl.h usability... yes checking sys/ioctl.h presence... yes checking for sys/ioctl.h... yes checking sys/param.h usability... yes checking sys/param.h presence... yes checking for sys/param.h... yes checking sys/socket.h usability... yes checking sys/socket.h presence... yes checking for sys/socket.h... yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking syslog.h usability... yes checking syslog.h presence... yes checking for syslog.h... yes checking for unistd.h... (cached) yes checking for sys/types.h... (cached) yes checking getopt.h usability... yes checking getopt.h presence... yes checking for getopt.h... yes checking malloc.h usability... yes checking malloc.h presence... yes checking for malloc.h... yes checking sys/sockio.h usability... no checking sys/sockio.h presence... no checking for sys/sockio.h... no checking utmpx.h usability... yes checking utmpx.h presence... yes checking for utmpx.h... yes checking urcu.h usability... yes checking urcu.h presence... yes checking for urcu.h... yes checking urcu/uatomic.h usability... yes checking urcu/uatomic.h presence... yes checking for urcu/uatomic.h... yes checking for an ANSI C-conforming const... yes checking for uid_t in sys/types.h... yes checking for inline... inline checking for size_t... (cached) yes checking whether time.h and sys/time.h may both be included... yes checking for working volatile... yes checking size of short... 2 checking size of int... 4 checking size of long... 8 checking size of long long... 8 checking sys/eventfd.h usability... yes checking sys/eventfd.h presence... yes checking for sys/eventfd.h... yes checking sys/signalfd.h usability... yes checking sys/signalfd.h presence... yes checking for sys/signalfd.h... yes checking sys/timerfd.h usability... 
yes checking sys/timerfd.h presence... yes checking for sys/timerfd.h... yes checking whether closedir returns void... no checking for error_at_line... yes checking for mbstate_t... yes checking for working POSIX fnmatch... yes checking for pid_t... yes checking vfork.h usability... no checking vfork.h presence... no checking for vfork.h... no checking for fork... yes checking for vfork... yes checking for working fork... yes checking for working vfork... (cached) yes checking whether gcc needs -traditional... no checking for stdlib.h... (cached) yes checking for GNU libc compatible malloc... yes checking for working memcmp... yes checking for stdlib.h... (cached) yes checking for GNU libc compatible realloc... yes checking sys/select.h usability... yes checking sys/select.h presence... yes checking for sys/select.h... yes checking for sys/socket.h... (cached) yes checking types of arguments for select... int,fd_set *,struct timeval * checking return type of signal handlers... void checking for vprintf... yes checking for _doprnt... no checking for alarm... yes checking for alphasort... yes checking for atexit... yes checking for bzero... yes checking for dup2... yes checking for endgrent... yes checking for endpwent... yes checking for fcntl... yes checking for getcwd... yes checking for getpeerucred... no checking for getpeereid... no checking for gettimeofday... yes checking for inet_ntoa... yes
Re: [sheepdog] [PATCH] func/test: change functional test output for __vdi_list (1/2)
From: Hitoshi Mitake mitake.hitoshi@lab.ntt.co.jp To: Teruaki Ishizaki ishizaki.teruaki@lab.ntt.co.jp Cc: sheepdog@lists.wpkg.org Subject: Re: [sheepdog] [PATCH] func/test: change functional test output for __vdi_list In-Reply-To: 1418725227-20464-1-git-send-email-ishizaki.teruaki@lab.ntt.co.jp References: 1418725227-20464-1-git-send-email-ishizaki.teruaki@lab.ntt.co.jp User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?ISO-2022-JP-2?B?R29qGyQoRCtXGyhC?=) APEL/10.8 Emacs/23.4 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - Maruoka) Content-Type: text/plain; charset=US-ASCII At Tue, 16 Dec 2014 19:20:27 +0900, Teruaki Ishizaki wrote: Change output of functional test using __vdi_list in assosiation with adding block_size_shift information to vdi list command. Signed-off-by: Teruaki Ishizaki ishizaki.teruaki@lab.ntt.co.jp --- tests/functional/016.out |2 +- tests/functional/029.out | 18 +++--- tests/functional/030.out | 158 +++--- tests/functional/031.out | 20 +++--- tests/functional/039.out | 42 ++-- tests/functional/040.out |8 +- tests/functional/041.out | 70 ++-- tests/functional/043.out | 24 tests/functional/044.out |2 +- tests/functional/046.out | 24 tests/functional/047.out |4 +- tests/functional/048.out |6 +- tests/functional/052.out | 64 +- tests/functional/059.out | 24 tests/functional/060.out | 80 tests/functional/062.out |2 +- tests/functional/068.out | 12 ++-- tests/functional/072.out | 12 ++-- tests/functional/076.out |8 +- tests/functional/077.out |4 +- tests/functional/078.out | 18 +++--- tests/functional/079.out | 16 +++--- tests/functional/080.out |8 +- tests/functional/083.out |4 +- tests/functional/088.out |8 +- tests/functional/091.out |8 +- tests/functional/092.out | 20 +++--- tests/functional/096.out | 66 ++-- 28 files changed, 366 insertions(+), 366 deletions(-) Applied, thanks. 
Hitoshi diff --git a/tests/functional/016.out b/tests/functional/016.out index ab648d4..50e23d7 100644 --- a/tests/functional/016.out +++ b/tests/functional/016.out @@ -2,7 +2,7 @@ QA output created by 016 using backend plain store Failed to create snapshot for base, maybe snapshot id (0) or tag (tag) is existed there should be no vdi - NameIdSizeUsed SharedCreation time VDI id Copies Tag + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift there should be no object STORE DATA VDI VMSTATE ATTR LEDGER STALE 0 0 3 0 0 0 0 diff --git a/tests/functional/029.out b/tests/functional/029.out index 23117f7..7c50653 100644 --- a/tests/functional/029.out +++ b/tests/functional/029.out @@ -6,15 +6,15 @@ To create replicated vdi, set -c x To create erasure coded vdi, set -c x:y x(2,4,8,16) - number of data strips y(1 to 15) - number of parity strips - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test50 20 MB 20 MB 0.0 MB DATE fd2c304:2 - test40 20 MB 20 MB 0.0 MB DATE fd2de3 4 - test70 20 MB 20 MB 0.0 MB DATE fd2f964:4 - test60 20 MB 20 MB 0.0 MB DATE fd31494:3 - test30 20 MB 20 MB 0.0 MB DATE fd3662 3 - test20 20 MB 0.0 MB 20 MB DATE fd3816 2 - test90 20 MB 20 MB 0.0 MB DATE fd4094 16:7 - test80 20 MB 20 MB 0.0 MB DATE fd42474:5 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test50 20 MB 20 MB 0.0 MB DATE fd2c304:222 + test40 20 MB 20 MB 0.0 MB DATE fd2de3 422 + test70 20 MB 20 MB 0.0 MB DATE fd2f964:422 + test60 20 MB 20 MB 0.0 MB DATE fd31494:322 + test30 20 MB 20 MB 0.0 MB DATE fd3662 322 + test20 20 MB 0.0 MB 20 MB DATE fd3816 222 + test90 20 MB 20 MB 0.0 MB DATE fd4094 16:722 + test80 20 MB 20 MB 0.0 MB DATE fd42474:522 Looking for the object 0xfd38150001 (vid 0xfd3816 idx 1, 2 copies) with 23 nodes 127.0.0.1:7000 doesn't have the object diff --git a/tests/functional/030.out b/tests/functional/030.out index 5b386ab..00f50a9 100644 --- a/tests/functional/030.out +++ b/tests/functional/030.out @@ -5,36 +5,36 @@ Index Tag Snapshot Time Index Tag Snapshot Time 1
Re: [sheepdog] [PATCH] func/test: change functional test output for __vdi_list (2/2)
- NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 1 - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 1 - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 1 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 122 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 122 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 8.0 MB 8.0 MB 0.0 MB DATE 7c2b25 122 finish checkrepair test diff --git a/tests/functional/076.out b/tests/functional/076.out index b179a24..c21daeb 100644 --- a/tests/functional/076.out +++ b/tests/functional/076.out @@ -1,7 +1,7 @@ QA output created by 076 using backend plain store - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 40 MB 0.0 MB 0.0 MB DATE 7c2b25 16:15 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 40 MB 0.0 MB 0.0 MB DATE 7c2b25 16:1522 using backend plain store - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 40 MB 0.0 MB 0.0 MB DATE 7c2b252:1 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 40 MB 0.0 MB 0.0 MB DATE 7c2b252:122 diff --git a/tests/functional/077.out b/tests/functional/077.out index 191d39f..0657acc 100644 --- a/tests/functional/077.out +++ b/tests/functional/077.out @@ -1,7 +1,7 @@ QA output created by 077 using backend plain store - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 12 MB 0.0 MB 0.0 MB DATE 7c2b25 3 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test 0 12 MB 0.0 MB 0.0 MB DATE 7c2b25 322 [127.0.0.1:7000] oid 007c2b25 is missing. test lost 1 object(s). fixed missing 7c2b25 diff --git a/tests/functional/078.out b/tests/functional/078.out index e98c1d2..4ee8002 100644 --- a/tests/functional/078.out +++ b/tests/functional/078.out @@ -1,11 +1,11 @@ QA output created by 078 using backend plain store - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test10 20 MB 0.0 MB 0.0 MB DATE fd32fc4:2 - test30 20 MB 0.0 MB 0.0 MB DATE fd36622:1 - test20 20 MB 0.0 MB 0.0 MB DATE fd3815 2 - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test40 20 MB 0.0 MB 0.0 MB DATE fd2de34:2 - test10 20 MB 0.0 MB 0.0 MB DATE fd32fc4:2 - test30 20 MB 0.0 MB 0.0 MB DATE fd36622:1 - test20 20 MB 0.0 MB 0.0 MB DATE fd3815 2 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test10 20 MB 0.0 MB 0.0 MB DATE fd32fc4:222 + test30 20 MB 0.0 MB 0.0 MB DATE fd36622:122 + test20 20 MB 0.0 MB 0.0 MB DATE fd3815 222 + NameIdSizeUsed SharedCreation time VDI id Copies Tag Block Size Shift + test40 20 MB 0.0 MB 0.0 MB DATE fd2de34:222 + test10 20 MB 0.0 MB 0.0 MB DATE fd32fc4:222 + test30 20 MB 0.0 MB 0.0 MB DATE fd36622:122 + test20 20 MB 0.0 MB 0.0 MB DATE fd3815 222 diff --git a/tests/functional/079.out b/tests/functional/079.out index 7f0949d..021ccfe 100644 --- a/tests/functional/079.out +++ b/tests/functional/079.out @@ -1,11 +1,11 @@ QA output created by 079 using backend plain store - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 16 PB 0.0 MB 0.0 MB DATE 7c2b25 3 - NameIdSizeUsed SharedCreation time VDI id Copies Tag - test 0 16 PB 64 MB 0.0 MB DATE 7c2b25 3 + NameIdSizeUsed
Re: [sheepdog] [PATCH] sheep: fix bug for not saving block_size_shift to cluster config
At Tue, 16 Dec 2014 19:32:17 +0900, Teruaki Ishizaki wrote: This patch fixes bugs that block_size_shift info was forgotten after cluster shutdown and start sheepdog. Add block_size_shift info to cluster config file. Signed-off-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp --- sheep/config.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) Applied, thanks. Hitoshi diff --git a/sheep/config.c b/sheep/config.c index 383a1ed..dfad5fd 100644 --- a/sheep/config.c +++ b/sheep/config.c @@ -11,7 +11,7 @@ #include sheep_priv.h -#define SD_FORMAT_VERSION 0x0005 +#define SD_FORMAT_VERSION 0x0006 #define SD_CONFIG_SIZE 40 static struct sheepdog_config { @@ -21,7 +21,7 @@ static struct sheepdog_config { uint8_t store[STORE_LEN]; uint8_t shutdown; uint8_t copy_policy; - uint8_t __pad; + uint8_t block_size_shift; uint16_t version; uint64_t space; } config; @@ -64,6 +64,7 @@ static int get_cluster_config(struct cluster_info *cinfo) cinfo-nr_copies = config.copies; cinfo-flags = config.flags; cinfo-copy_policy = config.copy_policy; + cinfo-block_size_shift = config.block_size_shift; memcpy(cinfo-store, config.store, sizeof(config.store)); return SD_RES_SUCCESS; @@ -155,6 +156,7 @@ int set_cluster_config(const struct cluster_info *cinfo) config.copies = cinfo-nr_copies; config.copy_policy = cinfo-copy_policy; config.flags = cinfo-flags; + config.block_size_shift = cinfo-block_size_shift; memset(config.store, 0, sizeof(config.store)); pstrcpy((char *)config.store, sizeof(config.store), (char *)cinfo-store); -- 1.7.1 -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH 2/2] sheep: forbid revival of orphan objects
At Tue, 16 Dec 2014 15:18:18 +0100, Valerio Pachera wrote: 2014-12-16 15:07 GMT+01:00 Valerio Pachera siri...@gmail.com: It works fine! ... There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster Please, notice that the same logic should apply to multi device: create some vdi unplug a disk remove some vdi plug back the disk This still causes Dec 16 15:10:16 INFO [main] recover_object_main(930) object recovery progress 74% Dec 16 15:10:16 ERROR [rw 30554] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30553] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30500] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30500] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30554] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ALERT [rw 30500] recover_replication_object(419) cannot access any replicas of fd32fc0013 at epoch 2 Dec 16 15:10:16 ALERT [rw 30500] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30500] recover_replication_object(427) can not recover oid fd32fc0013 Dec 16 15:10:16 ERROR [rw 30500] recover_object_work(600) failed to recover object fd32fc0013 Dec 16 15:10:16 ALERT [rw 30554] recover_replication_object(419) cannot access any replicas of fd32fc000b at epoch 2 Dec 16 15:10:16 ALERT [rw 30554] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30554] recover_replication_object(427) can not recover oid fd32fc000b Dec 16 15:10:16 ERROR [rw 30554] recover_object_work(600) failed to recover object fd32fc000b Dec 16 15:10:16 ERROR [rw 30553] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.4:7000, op name: READ_PEER Dec 16 15:10:16 ERROR [rw 30552] sheep_exec_req(1170) failed Network error between sheep, remote address: 192.168.10.5:7000, op name: READ_PEER Dec 16 15:10:16 ALERT [rw 30553] recover_replication_object(419) cannot access any replicas of fd32fc0012 at epoch 2 Dec 16 15:10:16 ALERT [rw 30553] recover_replication_object(420) clients may see old data Dec 16 15:10:16 ERROR [rw 30553] recover_replication_object(427) can not recover oid fd32fc0012 Dec 16 15:10:16 ERROR [rw 30553] recover_object_work(600) failed to recover object fd32fc0012 Notice I'm not using the option --enable-diskvnodes. Thank you. To be honest, the design of current md should be refined completely. I'll work on the issue related to md in the future. Thanks, Hitoshi -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
[sheepdog] [PATCH] sheep, dog: check cluster is formatted or not during vdi creation
Current dog prints an odd error message in a case of vdi creation before cluster formatting like below: $ dog/dog vdi create test 16M VDI size is larger than 1.0 MB bytes, please use '-y' to create a hyper volume with size up to 16 PB bytes or use '-z' to create larger object size volume This patch revives previous behavior. Cc: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- dog/vdi.c | 7 +++ sheep/ops.c | 3 +++ 2 files changed, 10 insertions(+) diff --git a/dog/vdi.c b/dog/vdi.c index 22d6c83..effed17 100644 --- a/dog/vdi.c +++ b/dog/vdi.c @@ -478,6 +478,13 @@ static int vdi_create(int argc, char **argv) ret = EXIT_FAILURE; goto out; } + + if (rsp-result == SD_RES_WAIT_FOR_FORMAT) { + sd_err(Failed to create VDI %s: %s, vdiname, + sd_strerror(rsp-result)); + return EXIT_FAILURE; + } + if (rsp-result != SD_RES_SUCCESS) { sd_err(%s, sd_strerror(rsp-result)); ret = EXIT_FAILURE; diff --git a/sheep/ops.c b/sheep/ops.c index 448fd8e..3fb34aa 100644 --- a/sheep/ops.c +++ b/sheep/ops.c @@ -1125,6 +1125,9 @@ static int local_oids_exist(const struct sd_req *req, struct sd_rsp *rsp, static int local_cluster_info(const struct sd_req *req, struct sd_rsp *rsp, void *data, const struct sd_node *sender) { + if (sys-cinfo.ctime == 0) + return SD_RES_WAIT_FOR_FORMAT; + memcpy(data, sys-cinfo, sizeof(sys-cinfo)); rsp-data_length = sizeof(sys-cinfo); return SD_RES_SUCCESS; -- 1.8.3.2 -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
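One plausible reading of where the odd "1.0 MB" figure comes from (this is an inference, not something the patch states): with selectable object sizes the VDI size ceiling is roughly the number of data-object slots in an inode times 1 << block_size_shift, and before "dog cluster format" the client has no block_size_shift from the cluster, so the computed ceiling collapses and the size check fires before any "cluster is not formatted" check. A back-of-the-envelope sketch with an assumed slot count (not the real sheepdog macro):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption for illustration: an inode can address about 2^20 data
 * objects.  With block_size_shift unknown (0), the ceiling degenerates to
 * roughly 1 MB, which matches the "larger than 1.0 MB bytes" message dog
 * printed before this patch instead of a proper "not formatted" error. */
#define DEMO_MAX_DATA_OBJS  (UINT64_C(1) << 20)

static uint64_t max_vdi_size(uint8_t block_size_shift)
{
    return DEMO_MAX_DATA_OBJS << block_size_shift;
}

int main(void)
{
    printf("unformatted (shift 0):  %" PRIu64 " bytes\n", max_vdi_size(0));
    printf("formatted (shift 22):   %" PRIu64 " bytes\n", max_vdi_size(22));
    return 0;
}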
Re: [sheepdog] Fwd: [PATCH 2/2] sheep: forbid revival of orphan objects
At Tue, 16 Dec 2014 15:11:49 +0100, Valerio Pachera wrote: 2014-12-11 8:00 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp: Current recovery process can cause revival of orphan objects. This patch solves this problem. sheep -v Sheepdog daemon version 0.9.0_18_g7215788 It works fine! Dec 16 15:00:20 INFO [main] main(966) shutdown Dec 16 15:00:20 INFO [main] zk_leave(989) leaving from cluster Dec 16 15:00:37 INFO [main] md_add_disk(343) /mnt/sheep/0, vdisk nr 206, total disk 1 Dec 16 15:00:37 INFO [main] md_add_disk(343) /mnt/sheep/1, vdisk nr 279, total disk 2 Dec 16 15:00:37 NOTICE [main] get_local_addr(522) found IPv4 address Dec 16 15:00:37 INFO [main] send_join_request(1016) IPv4 ip:192.168.10.7 port:7000 going to join the cluster Dec 16 15:00:37 NOTICE [main] nfs_init(611) nfs server service is not compiled Dec 16 15:00:37 INFO [main] main(958) sheepdog daemon (version 0.9.0_18_g7215788) started Dec 16 15:00:38 INFO [rw 30221] prepare_object_list(1100) skipping object list reading from IPv4 ip:192.168.10.7 port:7000 becauseit is marked as excluded node Dec 16 15:00:38 INFO [main] recover_object_main(930) object recovery progress 2% Dec 16 15:00:38 INFO [main] recover_object_main(930) object recovery progress 3% Thanks for your checking, applied this series. There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster Do you mean the problem is the below error messages? Thanks, Hitoshi Dec 16 14:55:09 INFO [main] zk_leave(989) leaving from cluster Dec 16 14:55:40 INFO [main] md_add_disk(343) /mnt/sheep/0, vdisk nr 206, total disk 1 Dec 16 14:55:40 INFO [main] md_add_disk(343) /mnt/sheep/1, vdisk nr 279, total disk 2 Dec 16 14:55:40 NOTICE [main] get_local_addr(522) found IPv4 address Dec 16 14:55:40 INFO [main] send_join_request(1016) IPv4 ip:192.168.10.7 port:7000 going to join the cluster Dec 16 14:55:40 NOTICE [main] nfs_init(611) nfs server service is not compiled Dec 16 14:55:40 INFO [main] main(958) sheepdog daemon (version 0.9.0_18_g7215788) started Dec 16 14:55:41 INFO [rw 30049] prepare_object_list(1100) skipping object list reading from IPv4 ip:192.168.10.5 port:7000 becauseit is marked as excluded node Dec 16 14:55:41 ERROR [rw 30049] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30068] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30068] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.6:7000, op name: GET_HASH Dec 16 14:55:41 ERROR [rw 30072] sheep_exec_req(1170) failed No object found, remote address: 192.168.10.5:7000, op name: GET_HASH Dec 16 14:55:41 INFO [main] recover_object_main(930) object recovery progress 1% cut -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH] sheep, dog: check cluster is formatted or not during vdi creation
(2014/12/17 10:48), Hitoshi Mitake wrote: Current dog prints an odd error message in a case of vdi creation before cluster formatting like below: $ dog/dog vdi create test 16M VDI size is larger than 1.0 MB bytes, please use '-y' to create a hyper volume with size up to 16 PB bytes or use '-z' to create larger object size volume This patch revives previous behavior. Cc: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- dog/vdi.c | 7 +++ sheep/ops.c | 3 +++ 2 files changed, 10 insertions(+) I've tested and it looks good to me. Reviewed-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp Best Regards, Teruaki diff --git a/dog/vdi.c b/dog/vdi.c index 22d6c83..effed17 100644 --- a/dog/vdi.c +++ b/dog/vdi.c @@ -478,6 +478,13 @@ static int vdi_create(int argc, char **argv) ret = EXIT_FAILURE; goto out; } + + if (rsp-result == SD_RES_WAIT_FOR_FORMAT) { + sd_err(Failed to create VDI %s: %s, vdiname, +sd_strerror(rsp-result)); + return EXIT_FAILURE; + } + if (rsp-result != SD_RES_SUCCESS) { sd_err(%s, sd_strerror(rsp-result)); ret = EXIT_FAILURE; diff --git a/sheep/ops.c b/sheep/ops.c index 448fd8e..3fb34aa 100644 --- a/sheep/ops.c +++ b/sheep/ops.c @@ -1125,6 +1125,9 @@ static int local_oids_exist(const struct sd_req *req, struct sd_rsp *rsp, static int local_cluster_info(const struct sd_req *req, struct sd_rsp *rsp, void *data, const struct sd_node *sender) { + if (sys-cinfo.ctime == 0) + return SD_RES_WAIT_FOR_FORMAT; + memcpy(data, sys-cinfo, sizeof(sys-cinfo)); rsp-data_length = sizeof(sys-cinfo); return SD_RES_SUCCESS; -- NTT ソフトウェアイノベーションセンタ 分散処理基盤技術P(I分P) 石崎 晃朗 Tel: 0422-59-3488 Fax: 0422-59-2965 Email: ishizaki.teru...@lab.ntt.co.jp -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH] sheep, dog: check cluster is formatted or not during vdi creation
At Wed, 17 Dec 2014 12:32:08 +0900, Teruaki Ishizaki wrote: (2014/12/17 10:48), Hitoshi Mitake wrote: Current dog prints an odd error message in a case of vdi creation before cluster formatting like below: $ dog/dog vdi create test 16M VDI size is larger than 1.0 MB bytes, please use '-y' to create a hyper volume with size up to 16 PB bytes or use '-z' to create larger object size volume This patch revives previous behavior. Cc: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- dog/vdi.c | 7 +++ sheep/ops.c | 3 +++ 2 files changed, 10 insertions(+) I've tested and it looks good to me. Reviewed-by: Teruaki Ishizaki ishizaki.teru...@lab.ntt.co.jp Best Regards, Teruaki Applied. Thanks, Hitoshi diff --git a/dog/vdi.c b/dog/vdi.c index 22d6c83..effed17 100644 --- a/dog/vdi.c +++ b/dog/vdi.c @@ -478,6 +478,13 @@ static int vdi_create(int argc, char **argv) ret = EXIT_FAILURE; goto out; } + + if (rsp-result == SD_RES_WAIT_FOR_FORMAT) { + sd_err(Failed to create VDI %s: %s, vdiname, + sd_strerror(rsp-result)); + return EXIT_FAILURE; + } + if (rsp-result != SD_RES_SUCCESS) { sd_err(%s, sd_strerror(rsp-result)); ret = EXIT_FAILURE; diff --git a/sheep/ops.c b/sheep/ops.c index 448fd8e..3fb34aa 100644 --- a/sheep/ops.c +++ b/sheep/ops.c @@ -1125,6 +1125,9 @@ static int local_oids_exist(const struct sd_req *req, struct sd_rsp *rsp, static int local_cluster_info(const struct sd_req *req, struct sd_rsp *rsp, void *data, const struct sd_node *sender) { + if (sys-cinfo.ctime == 0) + return SD_RES_WAIT_FOR_FORMAT; + memcpy(data, sys-cinfo, sizeof(sys-cinfo)); rsp-data_length = sizeof(sys-cinfo); return SD_RES_SUCCESS; -- NTT ソフトウェアイノベーションセンタ 分散処理基盤技術P(I分P) 石崎 晃朗 Tel: 0422-59-3488 Fax: 0422-59-2965 Email: ishizaki.teru...@lab.ntt.co.jp -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v2] sheep: let gateway node exit in a case of gateway only cluster
At Mon, 15 Dec 2014 23:14:55 +0900, Hitoshi Mitake wrote: When a cluster has gateway nodes only, it means the gateway nodes doesn't contribute to I/O of VMs. So this patch simply let them exit and avoid the below recovery issue. Related issue: https://bugs.launchpad.net/sheepdog-project/+bug/1327037 Cc: duron...@qq.com Cc: Yang Zhang 3100100...@zju.edu.cn Cc: long nxtxiaol...@gmail.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- sheep/group.c | 17 + 1 file changed, 17 insertions(+) Yang, long, when you have time, could you test this patch? Thanks, Hitoshi v2: remove needless logging diff --git a/sheep/group.c b/sheep/group.c index 095b7c5..5dc3284 100644 --- a/sheep/group.c +++ b/sheep/group.c @@ -1151,6 +1151,18 @@ main_fn void sd_accept_handler(const struct sd_node *joined, } } +static bool is_gateway_only_cluster(const struct rb_root *nroot) +{ + struct sd_node *n; + + rb_for_each_entry(n, nroot, rb) { + if (n-space) + return false; + } + + return true; +} + main_fn void sd_leave_handler(const struct sd_node *left, const struct rb_root *nroot, size_t nr_nodes) { @@ -1177,6 +1189,11 @@ main_fn void sd_leave_handler(const struct sd_node *left, old_vnode_info = main_thread_get(current_vnode_info); main_thread_set(current_vnode_info, alloc_vnode_info(nroot)); if (sys-cinfo.status == SD_STATUS_OK) { + if (is_gateway_only_cluster(nroot)) { + sd_info(only gateway nodes are remaining, exiting); + exit(0); + } + ret = inc_and_log_epoch(); if (ret != 0) panic(cannot log current epoch %d, sys-cinfo.epoch); -- 1.9.1 -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
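The predicate in the patch relies on the fact that a pure gateway node advertises no storage space, so "every remaining node has space == 0" is the same as "only gateways are left". A tiny stand-alone rendering of the same check over a plain array (the real code iterates an rb-tree; this is only an illustration):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same idea as is_gateway_only_cluster(), but over a plain array: a node
 * exposing zero space is a gateway, and the cluster is "gateway only"
 * when no remaining node has any space. */
struct demo_node {
    uint64_t space;
};

static bool gateway_only(const struct demo_node *nodes, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (nodes[i].space)
            return false;
    return true;
}

int main(void)
{
    struct demo_node remaining[] = { { 0 }, { 0 } };    /* two gateways */

    if (gateway_only(remaining, 2))
        printf("only gateway nodes are remaining, exiting\n");
    return 0;
}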
Re: [sheepdog] [PATCH RFT 0/4] garbage collect needless VIDs and inode objects
At Tue, 16 Dec 2014 12:28:29 +0100, Valerio Pachera wrote: 2014-12-15 10:36 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp: Current sheepdog never recycles VIDs. But it will cause problems e.g. VID space exhaustion, too much garbage inode objects. I've been testing this branch and it seem to work. I use a script that creates 3 vdi, 3 snapshot for each (writing 10M of data), then removes them and look for objects with name starting with 80*. With all snap active /mnt/sheep/1/80fd3663 /mnt/sheep/0/80fd3818 /mnt/sheep/0/80fd32fc /mnt/sheep/0/80fd32fd /mnt/sheep/0/80fd32fe After removing all snap /mnt/sheep/1/80fd3663 /mnt/sheep/0/80fd3818 /mnt/sheep/0/80fd32fc /mnt/sheep/0/80fd32fd /mnt/sheep/0/80fd32fe After removing all vdi empty sheep -v Sheepdog daemon version 0.9.0_25_g24ef77f But I found a repeatable sheepdog crash! I notice that happening if I was running the script a second time. The crash occur after when I recreate a vdi with the same name and then I take a snapshot of it. Dec 16 12:12:42 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40067, op=DEL_VDI, result=00 Dec 16 12:12:47 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40069, op=DEL_VDI, data=(not string) Dec 16 12:12:47 INFO [main] run_vid_gc(2106) all members of the family (root: fd3662) are deleted Dec 16 12:12:47 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40069, op=DEL_VDI, result=00 Dec 16 12:13:57 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40072, op=NEW_VDI, data=(not string) Dec 16 12:13:57 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd32fc Dec 16 12:13:57 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40072, op=NEW_VDI, result=00 Dec 16 12:14:12 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40074, op=NEW_VDI, data=(not string) Dec 16 12:14:13 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd3815 Dec 16 12:14:13 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40074, op=NEW_VDI, result=00 Dec 16 12:14:23 INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40076, op=NEW_VDI, data=(not string) Dec 16 12:14:23 INFO [main] post_cluster_new_vdi(133) req-vdi.base_vdi_id: 0, rsp-vdi.vdi_id: fd3662 Dec 16 12:14:23 INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26, client=127.0.0.1:40076, op=NEW_VDI, result=00 Dec 16 12:14:34 INFO [main] rx_main(830) req=0x7f314400d310, fd=26, client=127.0.0.1:40078, op=NEW_VDI, data=(not string) Dec 16 12:14:34 EMERG [main] crash_handler(268) sheep exits unexpectedly (Segmentation fault). 
Dec 16 12:14:34 EMERG [main] sd_backtrace(833) sheep.c:270: crash_handler Dec 16 12:14:34 EMERG [main] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f31515cc02f] Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:64: lookup_vdi_family_member Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:109: update_vdi_family Dec 16 12:14:34 EMERG [main] sd_backtrace(833) vdi.c:396: add_vdi_state Dec 16 12:14:34 EMERG [main] sd_backtrace(833) ops.c:674: cluster_notify_vdi_add Dec 16 12:14:34 EMERG [main] sd_backtrace(833) group.c:948: sd_notify_handler Dec 16 12:14:34 EMERG [main] sd_backtrace(833) zookeeper.c:1252: zk_event_handler Dec 16 12:14:34 EMERG [main] sd_backtrace(833) event.c:210: do_event_loop Dec 16 12:14:34 EMERG [main] sd_backtrace(833) sheep.c:963: main Dec 16 12:14:34 EMERG [main] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc) [0x7f3150badeac] Dec 16 12:14:34 EMERG [main] sd_backtrace(847) sheep() [0x405fa8] How to reproduce: dog cluster format -c 2 dog vdi create -P test 1G dog vdi snapshot test dd if=/dev/urandom bs=1M count=10 | dog vdi write test dog vdi delete -s 1 test dog vdi delete test echo 'Recreating vdi test' dog vdi create -P test 1G dog vdi snapshot test -- at this point, sheep crashes dog vdi list Thanks for your report, I've fixed the problem and updated the gc-vid branch. Thanks, Hitoshi -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v2] sheep: let gateway node exit in a case of gateway only cluster
hi,Hitoshi we've tested the patch. Our test method is: We attached a 20G sheepdog VDI to a VM holded by openstack. And we created a 2G file which we have it's md5 in hand in the VDI. We killed the non-gateway nodes in the middle of the process, then restarted the cluster. The process resumed and the content of the file is right(same md5) Thanks, Yang,Long On Wed, Dec 17, 2014 at 1:09 PM, Hitoshi Mitake mitake.hito...@lab.ntt.co.jp wrote: At Mon, 15 Dec 2014 23:14:55 +0900, Hitoshi Mitake wrote: When a cluster has gateway nodes only, it means the gateway nodes doesn't contribute to I/O of VMs. So this patch simply let them exit and avoid the below recovery issue. Related issue: https://bugs.launchpad.net/sheepdog-project/+bug/1327037 Cc: duron...@qq.com Cc: Yang Zhang 3100100...@zju.edu.cn Cc: long nxtxiaol...@gmail.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- sheep/group.c | 17 + 1 file changed, 17 insertions(+) Yang, long, when you have time, could you test this patch? Thanks, Hitoshi v2: remove needless logging diff --git a/sheep/group.c b/sheep/group.c index 095b7c5..5dc3284 100644 --- a/sheep/group.c +++ b/sheep/group.c @@ -1151,6 +1151,18 @@ main_fn void sd_accept_handler(const struct sd_node *joined, } } +static bool is_gateway_only_cluster(const struct rb_root *nroot) +{ + struct sd_node *n; + + rb_for_each_entry(n, nroot, rb) { + if (n-space) + return false; + } + + return true; +} + main_fn void sd_leave_handler(const struct sd_node *left, const struct rb_root *nroot, size_t nr_nodes) { @@ -1177,6 +1189,11 @@ main_fn void sd_leave_handler(const struct sd_node *left, old_vnode_info = main_thread_get(current_vnode_info); main_thread_set(current_vnode_info, alloc_vnode_info(nroot)); if (sys-cinfo.status == SD_STATUS_OK) { + if (is_gateway_only_cluster(nroot)) { + sd_info(only gateway nodes are remaining, exiting); + exit(0); + } + ret = inc_and_log_epoch(); if (ret != 0) panic(cannot log current epoch %d, sys-cinfo.epoch); -- 1.9.1 -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v2] sheep: let gateway node exit in a case of gateway only cluster
At Wed, 17 Dec 2014 15:40:35 +0800, $B=y.$(AAz(B wrote: [1 text/plain; UTF-8 (7bit)] hi,Hitoshi we've tested the patch. Our test method is: We attached a 20G sheepdog VDI to a VM holded by openstack. And we created a 2G file which we have it's md5 in hand in the VDI. We killed the non-gateway nodes in the middle of the process, then restarted the cluster. The process resumed and the content of the file is right(same md5) Thanks, Yang,Long Thanks a lot for testing! Could you give me your Tested-by: tags? (e.g. Tested-by: Yang Zhang 3100100...@zju.edu.cn, Tested-by: Long nxtxiaol...@gmail.com) Thanks, Hitoshi On Wed, Dec 17, 2014 at 1:09 PM, Hitoshi Mitake mitake.hito...@lab.ntt.co.jp wrote: At Mon, 15 Dec 2014 23:14:55 +0900, Hitoshi Mitake wrote: When a cluster has gateway nodes only, it means the gateway nodes doesn't contribute to I/O of VMs. So this patch simply let them exit and avoid the below recovery issue. Related issue: https://bugs.launchpad.net/sheepdog-project/+bug/1327037 Cc: duron...@qq.com Cc: Yang Zhang 3100100...@zju.edu.cn Cc: long nxtxiaol...@gmail.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- sheep/group.c | 17 + 1 file changed, 17 insertions(+) Yang, long, when you have time, could you test this patch? Thanks, Hitoshi v2: remove needless logging diff --git a/sheep/group.c b/sheep/group.c index 095b7c5..5dc3284 100644 --- a/sheep/group.c +++ b/sheep/group.c @@ -1151,6 +1151,18 @@ main_fn void sd_accept_handler(const struct sd_node *joined, } } +static bool is_gateway_only_cluster(const struct rb_root *nroot) +{ + struct sd_node *n; + + rb_for_each_entry(n, nroot, rb) { + if (n-space) + return false; + } + + return true; +} + main_fn void sd_leave_handler(const struct sd_node *left, const struct rb_root *nroot, size_t nr_nodes) { @@ -1177,6 +1189,11 @@ main_fn void sd_leave_handler(const struct sd_node *left, old_vnode_info = main_thread_get(current_vnode_info); main_thread_set(current_vnode_info, alloc_vnode_info(nroot)); if (sys-cinfo.status == SD_STATUS_OK) { + if (is_gateway_only_cluster(nroot)) { + sd_info(only gateway nodes are remaining, exiting); + exit(0); + } + ret = inc_and_log_epoch(); if (ret != 0) panic(cannot log current epoch %d, sys-cinfo.epoch); -- 1.9.1 [2 text/html; UTF-8 (quoted-printable)] -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] Fwd: [PATCH 2/2] sheep: forbid revival of orphan objects
2014-12-17 3:48 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp:
There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster
Do you mean the problem is the below error messages?

The problem is that:
node 1, 2, 3, 4
create vdi
disconnect node 4
remove *all* vdi
reconnect node 4

Nodes 1, 2 and 3 are empty, but node 4 doesn't remove the objects once it has rejoined the cluster. And it prints the below error messages. -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] [PATCH v2] sheep: let gateway node exit in a case of gateway only cluster
At Wed, 17 Dec 2014 16:42:27 +0900, Hitoshi Mitake wrote: At Wed, 17 Dec 2014 15:40:35 +0800, $B=y.$(AAz(B wrote: [1 text/plain; UTF-8 (7bit)] hi,Hitoshi we've tested the patch. Our test method is: We attached a 20G sheepdog VDI to a VM holded by openstack. And we created a 2G file which we have it's md5 in hand in the VDI. We killed the non-gateway nodes in the middle of the process, then restarted the cluster. The process resumed and the content of the file is right(same md5) Thanks, Yang,Long Thanks a lot for testing! Could you give me your Tested-by: tags? (e.g. Tested-by: Yang Zhang 3100100...@zju.edu.cn, Tested-by: Long nxtxiaol...@gmail.com) Applied this one. Thanks, Hitoshi Thanks, Hitoshi On Wed, Dec 17, 2014 at 1:09 PM, Hitoshi Mitake mitake.hito...@lab.ntt.co.jp wrote: At Mon, 15 Dec 2014 23:14:55 +0900, Hitoshi Mitake wrote: When a cluster has gateway nodes only, it means the gateway nodes doesn't contribute to I/O of VMs. So this patch simply let them exit and avoid the below recovery issue. Related issue: https://bugs.launchpad.net/sheepdog-project/+bug/1327037 Cc: duron...@qq.com Cc: Yang Zhang 3100100...@zju.edu.cn Cc: long nxtxiaol...@gmail.com Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp --- sheep/group.c | 17 + 1 file changed, 17 insertions(+) Yang, long, when you have time, could you test this patch? Thanks, Hitoshi v2: remove needless logging diff --git a/sheep/group.c b/sheep/group.c index 095b7c5..5dc3284 100644 --- a/sheep/group.c +++ b/sheep/group.c @@ -1151,6 +1151,18 @@ main_fn void sd_accept_handler(const struct sd_node *joined, } } +static bool is_gateway_only_cluster(const struct rb_root *nroot) +{ + struct sd_node *n; + + rb_for_each_entry(n, nroot, rb) { + if (n-space) + return false; + } + + return true; +} + main_fn void sd_leave_handler(const struct sd_node *left, const struct rb_root *nroot, size_t nr_nodes) { @@ -1177,6 +1189,11 @@ main_fn void sd_leave_handler(const struct sd_node *left, old_vnode_info = main_thread_get(current_vnode_info); main_thread_set(current_vnode_info, alloc_vnode_info(nroot)); if (sys-cinfo.status == SD_STATUS_OK) { + if (is_gateway_only_cluster(nroot)) { + sd_info(only gateway nodes are remaining, exiting); + exit(0); + } + ret = inc_and_log_epoch(); if (ret != 0) panic(cannot log current epoch %d, sys-cinfo.epoch); -- 1.9.1 [2 text/html; UTF-8 (quoted-printable)] -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog
Re: [sheepdog] Fwd: [PATCH 2/2] sheep: forbid revival of orphan objects
At Wed, 17 Dec 2014 08:50:35 +0100, Valerio Pachera wrote: 2014-12-17 3:48 GMT+01:00 Hitoshi Mitake mitake.hito...@lab.ntt.co.jp: There's only this corner case to fix: all vdi are removed then and the disconnected node joins back the cluster Do you mean the problem is the below error messages? The problem is that: node 1, 2, 3, 4 create vdi disconnect node 4 remove *all* vdi reconnect node 4 Node 1,2 and 3 are empty but node 4 doesn't remove the objects once rejoined the cluster. And it prints the below error messages. Ah, I see. I'll work on it later. Thanks, Hitoshi -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog -- sheepdog mailing list sheepdog@lists.wpkg.org http://lists.wpkg.org/mailman/listinfo/sheepdog