Hello,
We are running Robinhood v3.0 against a Lustre 2.7 filesystem, using
the LHSM policy to archive the filesystem.
I am currently testing directory restores with the rbh-undelete
command, and I am hitting a segmentation fault when restoring a
directory that has been deleted.
Notably, the command reliably restores two files from the directory and
then segfaults while restoring the third, every time. If I run it
again, it restores another two files and segfaults on the third.
I've installed the debuginfo packages, and below is a gdb session with
a stack trace from a crash. I am trying to restore a directory that
contained 5 files, all in state 'synchro', and the segfault happens
after the first two files are successfully restored.
Here is the rebind_cmd we are using:
lhsm_config {
    # used for "undelete": command to change the fid of an entry in archive
    rebind_cmd = "/usr/sbin/lhsmtool_posix --hsm_root=/mnt/qstar/rds-d1/lhsm --archive {archive_id} --rebind {oldfid} {newfid} {fsroot}";

    # for UUID-based mapping
    uuid {
        xattr = "trusted.lhsm_uuid";
    }
}
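To make the rebind_cmd template concrete, here is a minimal sketch of how its placeholders might expand for one of the files listed below. It is only an illustration of simple textual substitution, not how Robinhood itself builds the command: the helper name expand_rebind_cmd is hypothetical, the new FID is a made-up value, and whether the FIDs are passed with or without square brackets is an assumption.

```python
def expand_rebind_cmd(template, values):
    """Replace each {name} placeholder in the template with its value."""
    cmd = template
    for name, value in values.items():
        cmd = cmd.replace("{" + name + "}", str(value))
    return cmd


# The rebind_cmd string from the lhsm_config block above.
template = ("/usr/sbin/lhsmtool_posix --hsm_root=/mnt/qstar/rds-d1/lhsm "
            "--archive {archive_id} --rebind {oldfid} {newfid} {fsroot}")

values = {
    "archive_id": 1,                      # archive=1 in the llapi_hsm_import frame of the backtrace
    "oldfid": "0x200000ddb:0xe45d:0x0",   # old FID of zd_rf_uw2453.c from the -L listing
    "newfid": "0x200000ddc:0x1:0x0",      # hypothetical: the real new FID is assigned at import time
    "fsroot": "/rds-d1",                  # mount point reported by CheckFS
}

print(expand_rebind_cmd(template, values))
```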
[root@rbh-rds-data robinhood-src]# gdb
rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete...done.
(gdb) run -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:36:35 [15513/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15517)]
[Thread 0x7ffff3ae1700 (LWP 15517) exited]
rm_time, id, type, user,
group, size, last_mod, lhsm.status,
path
2016/12/26 21:07:32, [0x200000ddb:0xe45b:0x0], file, wjt27,
wjt27, 13.00 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_al7230b.c
2016/12/26 21:07:32, [0x200000ddb:0xe45c:0x0], file, wjt27,
wjt27, 8.62 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c
2016/12/26 21:07:32, [0x200000ddb:0xe45d:0x0], file, wjt27,
wjt27, 15.34 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c
2016/12/26 21:07:32, [0x200000ddb:0xe45e:0x0], file, wjt27,
wjt27, 50.44 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
2016/12/26 21:07:32, [0x200000ddb:0xe45f:0x0], file, wjt27,
wjt27, 7.30 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15513) exited normally]
Missing separate debuginfos, use: debuginfo-install
libuuid-2.23.2-33.el7.x86_64 mariadb-libs-5.5.52-1.el7.x86_64
pcre-8.32-15.el7_2.1.x86_64
(gdb) run -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:36:48 [15519/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15520)]
[Thread 0x7ffff3ae1700 (LWP 15520) exited]
Restoring
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_al7230b.c'...
restore OK (file)
Entry successfully updated in the dabatase
Restoring
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c'...
restore OK (file)
Entry successfully updated in the dabatase
Program received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at malloc.c:4146
4146 size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
Missing separate debuginfos, use: debuginfo-install
sssd-client-1.14.0-43.el7_3.4.x86_64
(gdb) bt
#0 malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at
malloc.c:4146
#1 0x00007ffff5eb6385 in _int_malloc (av=av@entry=0x7ffff61f3760 <main_arena>,
bytes=bytes@entry=4096) at malloc.c:3436
#2 0x00007ffff5eb8fbc in __GI___libc_malloc (bytes=4096) at malloc.c:2893
#3 0x00007ffff5e7b60c in __realpath (name=0x7fffffffbd24
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
resolved=0x0) at canonicalize.c:78
#4 0x00007ffff6b6f98a in llapi_search_fsname
(pathname=pathname@entry=0x7fffffffbd24
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
fsname=fsname@entry=0x7fffffff6b70 "")
at liblustreapi.c:1173
#5 0x00007ffff6b6fb0e in llapi_file_open_param (name=name@entry=0x7fffffffbd24
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
flags=flags@entry=65, mode=436,
param=param@entry=0x7fffffff6cd0) at liblustreapi.c:685
#6 0x00007ffff6b6ff75 in llapi_file_open_pool (name=name@entry=0x7fffffffbd24
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
flags=flags@entry=65, mode=<optimized out>,
stripe_size=stripe_size@entry=0, stripe_offset=stripe_offset@entry=-1,
stripe_count=stripe_count@entry=0,
stripe_pattern=stripe_pattern@entry=-2147483647, pool_name=pool_name@entry=0x0)
at liblustreapi.c:849
#7 0x00007ffff6b749f5 in llapi_hsm_import (dst=dst@entry=0x7fffffffbd24
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
archive=archive@entry=1, st=st@entry=0x7fffffff6e00,
stripe_size=stripe_size@entry=0, stripe_offset=stripe_offset@entry=-1,
stripe_count=stripe_count@entry=0, stripe_pattern=<optimized out>,
stripe_pattern@entry=0, pool_name=pool_name@entry=0x0,
newfid=newfid@entry=0x7fffffff6fd0) at liblustreapi_hsm.c:1333
#8 0x00007ffff42f7209 in lhsm_undelete (smi=0x6a0fb0, p_old_id=0x7fffffff9720,
p_attrs_old_in=0x7fffffffbc00, p_new_id=0x7fffffff6fd0,
p_attrs_new=0x7fffffff6fe0, already_recovered=<optimized out>) at lhsm.c:915
#9 0x000000000040c289 in undelete_helper (id=id@entry=0x7fffffff9720,
attrs=attrs@entry=0x7fffffffbc00) at rbh_undelete.c:329
#10 0x000000000040bd37 in undelete () at rbh_undelete.c:440
#11 main (argc=<optimized out>, argv=<optimized out>) at rbh_undelete.c:712
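As an aside for anyone reading the backtrace above, the raw integer arguments in frames #5 to #7 can be decoded with a short sketch like the one below. It assumes Linux x86_64 values for the open(2) flags (hard-coded here so the result does not depend on the host OS) and the Lustre striping pattern bits as I understand them from lustre_user.h. The helper names are my own.

```python
# Linux x86_64 open(2) constants (assumed, hard-coded on purpose).
O_ACCMODE = 0o3
ACCESS_MODES = {0o0: "O_RDONLY", 0o1: "O_WRONLY", 0o2: "O_RDWR"}
OPEN_FLAGS = {0o100: "O_CREAT", 0o200: "O_EXCL", 0o1000: "O_TRUNC", 0o2000: "O_APPEND"}

# Lustre striping pattern bits (assumed from lustre_user.h).
LOV_PATTERN_RAID0 = 0x001
LOV_PATTERN_F_RELEASED = 0x80000000


def decode_open_flags(flags):
    """Return the access mode plus any known flag bits set in flags."""
    names = [ACCESS_MODES.get(flags & O_ACCMODE, "UNKNOWN_ACCESS_MODE")]
    names += [name for bit, name in sorted(OPEN_FLAGS.items()) if flags & bit]
    return names


def decode_stripe_pattern(pattern):
    """Reinterpret a signed 32-bit value as unsigned and name the known bits."""
    unsigned = pattern & 0xFFFFFFFF
    names = []
    if unsigned & LOV_PATTERN_RAID0:
        names.append("LOV_PATTERN_RAID0")
    if unsigned & LOV_PATTERN_F_RELEASED:
        names.append("LOV_PATTERN_F_RELEASED")
    return hex(unsigned), names


print(decode_open_flags(65))               # flags=65 in frames #5 and #6
print(oct(436))                            # mode=436 in frame #5
print(decode_stripe_pattern(-2147483647))  # stripe_pattern in frame #6
```

Under those assumptions, flags=65 is O_WRONLY|O_CREAT, mode=436 is 0664, and the stripe pattern is 0x80000001, i.e. RAID0 with the "released" bit set. That would be consistent with llapi_hsm_import creating a released file, so the arguments themselves look sane and the crash inside malloc is probably not caused by them.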
I can then run it again on the same directory, and it again restores
two more files before segfaulting.
(gdb) run -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:00 [15521/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15522)]
[Thread 0x7ffff3ae1700 (LWP 15522) exited]
rm_time, id, type, user,
group, size, last_mod, lhsm.status,
path
2016/12/26 21:07:32, [0x200000ddb:0xe45d:0x0], file, wjt27,
wjt27, 15.34 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c
2016/12/26 21:07:32, [0x200000ddb:0xe45e:0x0], file, wjt27,
wjt27, 50.44 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
2016/12/26 21:07:32, [0x200000ddb:0xe45f:0x0], file, wjt27,
wjt27, 7.30 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15521) exited normally]
(gdb) run -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:33 [15627/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15628)]
[Thread 0x7ffff3ae1700 (LWP 15628) exited]
Restoring
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c'...
restore OK (file)
Entry successfully updated in the dabatase
Restoring
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c'...
restore OK (file)
Entry successfully updated in the dabatase
Program received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at malloc.c:4146
4146 size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
Finally, running it once more restores the one remaining file and then
exits cleanly.
(gdb) run -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:38 [15629/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15630)]
[Thread 0x7ffff3ae1700 (LWP 15630) exited]
rm_time, id, type, user,
group, size, last_mod, lhsm.status,
path
2016/12/26 21:07:32, [0x200000ddb:0xe45f:0x0], file, wjt27,
wjt27, 7.30 KB, 2016/10/28 08:51:31, synchro,
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15629) exited normally]
(gdb) run -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program:
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:48 [15633/1] CheckFS | '/rds-d1' matches mount point
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15634)]
[Thread 0x7ffff3ae1700 (LWP 15634) exited]
Restoring
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h'...
restore OK (file)
Entry successfully updated in the dabatase
undelete summary:
1 files
0 old version
0 empty files
0 non-files
0 no backup
0 errors
0 DB errors
[Inferior 1 (process 15633) exited normally]
Has anyone else using HSM seen this crash, or can anyone reproduce it?
My C is very rusty, but I am trying to understand the code to see where
it is failing. Any pointers would be most welcome!
Kind regards,
--
Matt Rásó-Barnett
Research Computing Platforms
University Information Services
High Performance Computing Service
University of Cambridge
Email: [email protected]