I'm not sure yet how you are getting into this state, but I am able to
recreate it on my laptop with a single server and the pvfs2 admin
utilities. The following shows the creation of 4 files. I then get the
metadata handle for one of them (via pvfs2-stat) and the fsid from the
conf file. Then pvfs2-remove-object is used to manually delete that
particular object, leaving the directory entry in place. At that point
pvfs2-lsplus starts showing the "Invalid type..." error and pvfs2-stat
fails with "No such file or directory".
-------------------------------------------------------------
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/a.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/b.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/d.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/c.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-lsplus -alh /mnt/pvfs2
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 10:53 .
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 10:53 .. (faked)
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 a.dat
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 b.dat
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 c.dat
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 d.dat
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 10:52 lost+found
[pca...@pcarns-laptop admin]$ ./pvfs2-stat /mnt/pvfs2/b.dat
-------------------------------------------------------
File Name : /mnt/pvfs2/b.dat
Relative Name : /b.dat
fs ID : 513116601
Handle : 1048574
Mask : 704000177
Permissions : 644
Type : Regular File
Size : 16777216
Owner : 1000 (pcarns)
Group : 1000 (pcarns)
atime : 1286808818 (Mon Oct 11 10:53:38 2010)
mtime : 1286808818 (Mon Oct 11 10:53:38 2010)
ctime : 1286808818 (Mon Oct 11 10:53:38 2010)
datafiles : 1
flags : none
[pca...@pcarns-laptop server]$ cat simple.conf |grep ID
ID 513116601
[pca...@pcarns-laptop admin]$ ./pvfs2-remove-object -f 513116601 -o 1048574
Attempting to remove object 1048574,513116601
[pca...@pcarns-laptop admin]$ ./pvfs2-lsplus /mnt/pvfs2
[E 10:54:54.724633] Invalid type 2 in readdirplus
a.dat
b.dat
c.dat
d.dat
lost+found
[pca...@pcarns-laptop admin]$ ./pvfs2-stat /mnt/pvfs2/b.dat
PVFS_sys_lookup: No such file or directory (error class: 0)
Error stating [/mnt/pvfs2/b.dat]
----------------------------------------------------------------------
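For anyone repeating this, the handle and fsid lookup can be scripted.
The following is a minimal sketch, not a tested tool: it assumes the
"Label : value" layout that pvfs2-stat prints above, and stat_field and
remove_cmd_for are hypothetical helper names. It only prints the
pvfs2-remove-object command (with the same -f and -o options used
above) instead of running it.

```shell
#!/bin/sh
# Sketch only: pull one field out of pvfs2-stat output read on stdin.
# Assumes the "Label : value" layout shown in the transcript.
stat_field() {
    # $1 is the field label, e.g. "Handle" or "fs ID"
    awk -F' : ' -v want="$1" '
        {
            label = $1
            sub(/^[ \t]+/, "", label)   # tolerate leading padding
            sub(/[ \t]+$/, "", label)   # tolerate trailing padding
        }
        label == want { print $2; exit }
    '
}

# Hypothetical helper: print (do not run) the pvfs2-remove-object
# command for the metadata object described by pvfs2-stat output on stdin.
remove_cmd_for() {
    out=$(cat)
    handle=$(printf '%s\n' "$out" | stat_field "Handle")
    fsid=$(printf '%s\n' "$out" | stat_field "fs ID")
    if [ -z "$handle" ] || [ -z "$fsid" ]; then
        echo "could not find Handle and fs ID in input" >&2
        return 1
    fi
    echo "pvfs2-remove-object -f $fsid -o $handle"
}
```

For example, "pvfs2-stat /mnt/pvfs2/b.dat | remove_cmd_for" would print
the command for that file's metadata object. Printing rather than
executing leaves a chance to double-check the handle first.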
From there I can run pvfs2-rm, but it fails and does not clean up the
leftover directory entry. I think you might get different errors if
you tried to do this via the kernel module, FWIW:
[pca...@pcarns-laptop admin]$ ./pvfs2-rm /mnt/pvfs2/b.dat
Error: An error occurred while removing /mnt/pvfs2/b.dat
PVFS_sys_remove: No such file or directory (error class: 0)
[pca...@pcarns-laptop admin]$ ./pvfs2-ls -alh /mnt/pvfs2/
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 11:00 .
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 11:00 .. (faked)
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 a.dat
Failed to get attributes on handle 1048574,513116601
Getattr failure: No such file or directory (error class: 0)
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 c.dat
-rw-r--r-- 1 pcarns pcarns 16.0M 2010-10-11 10:53 d.dat
drwxrwxrwx 1 pcarns pcarns 4.0K 2010-10-11 10:52 lost+found
Maybe the example above can make it easier to reproduce in a
development environment? We can tackle the question of how the
metadata object disappeared in the first place later, but for starters
it would probably be good to at least fix these two things:
- how to make pvfs2-rm safely remove what it can (even if via a "force"
option)
- how to get pvfs2-lsplus (and probably other utilities and/or the kernel
module as well) to report a sane error message instead of the "Invalid
type" message
....
FYI, the 2.6 tree is pretty cranky to build on my machine at this
point. I attached a patch that sets more BDB flags that are necessary
for recent BDB releases, in case anyone else needs to build 2.6 on a
modern box. I also found that I had to use the "DBCacheType mmap"
option in the server config file. I can't get the kernel module
working, so I stuck to command line utilities.
-Phil
On 10/11/2010 09:23 AM, Phil Carns wrote:
Oh, Ok. The metadata object must be missing then. If you do a normal
pvfs2-ls (no -al options) does the file show up in the listing without
errors?
Maybe there is an initial problem (the metadata object missing for
some reason), followed by a secondary problem that a getattr on that
object is returning an empty attribute structure rather than
indicating that there is an error.
I'm going to set up a small 2.6 installation here and try destroying a
metafile (using remove-object, probably) to see if I can recreate the
symptoms that you are seeing.
-Phil
On 10/08/2010 04:05 PM, Bart Taylor wrote:
Hey Phil,
The pvfs2-stat call never actually made it past sys_lookup, so there
was no ref.handle to print out. Here is the output:
#pvfs2-stat /mnt/pvfs2/file1.txt
PVFS_sys_lookup: No such file or directory (error class: 0)
Error stating [/mnt/pvfs2/file1.txt]
Bart.
On Thu, Oct 7, 2010 at 3:13 PM, Phil Carns <ca...@mcs.anl.gov> wrote:
Hi Bart,
Can you run pvfs2-stat on one of the files, and also send along
the fs.conf file? pvfs2-stat might be helpful because it shows
the metadata handle value. We can compare that value to the
handle ranges in the conf file to narrow down whether it is
hitting a metadata object that has just been corrupted somehow,
or whether it really is hitting a datafile handle.
If pvfs2-stat fails to show any output, then maybe you can modify
pvfs2-stat.c to print the value of ref.handle right before the
sys_getattr() call.
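The range comparison described above can be sketched in a few lines of
shell. This is only an illustration: handle_in_range and
classify_handle are hypothetical helper names, and the ranges are
passed in by hand as "LOW-HIGH" strings copied from the conf file
rather than parsed out of it.

```shell
#!/bin/sh
# Sketch only: succeed if a handle falls inside an inclusive "LOW-HIGH" range.
handle_in_range() {
    handle=$1
    low=${2%-*}     # text before the dash
    high=${2#*-}    # text after the dash
    # Shell integer tests are signed; very large handles may need another tool.
    [ "$handle" -ge "$low" ] && [ "$handle" -le "$high" ]
}

# Hypothetical helper: $1 handle, $2 metadata range, $3 datafile range.
classify_handle() {
    if handle_in_range "$1" "$2"; then
        echo "meta"
    elif handle_in_range "$1" "$3"; then
        echo "data"
    else
        echo "unknown"
    fi
}
```

Usage would look like "classify_handle 1048574 4-2000000 2000001-4000000",
where both ranges are made-up values standing in for whatever the conf
file actually lists.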
thanks,
-Phil
On 10/07/2010 01:08 PM, Bart Taylor wrote:
Hey guys,
We are having an increasing number of files that cannot be
removed on our 2.6 file systems. When we run the pvfs2-lsplus
tool, the output on these files looks like this:
[E 15:14:05.798568] Invalid type 2 in readdirplus
---------- 1 root root 0 1969-12-31 18:00 File1
[E 15:14:17.712553] Invalid type 2 in readdirplus
---------- 1 root root 0 1969-12-31 18:00 File2
[E 15:14:24.799221] Invalid type 2 in readdirplus
[E 15:14:24.799257] Invalid type 2 in readdirplus
[E 15:14:24.799269] Invalid type 2 in readdirplus
---------- 1 root root 0 1969-12-31 18:00 File3.txt
---------- 1 root root 0 1969-12-31 18:00 File5.txt
---------- 1 root root 0 1969-12-31 18:00 File6.txt
The "Invalid type 2" message indicates that readdirplus is
returning datafile attributes mixed in with the directory
entries. That might explain why all of the attributes look like
default values, but I am not sure why those files are having
problems in the first place.
This gets noticed when someone tries to update, create, append,
etc. the file. Most operations seem to return "No such file or
directory" when trying to access those files. A standard /bin/ls
will return normally, but long listings fail. pvfs2-ls,
pvfs2-viewdist and pvfs2-validate return getattr failures.
pvfs2-rm returns output like this:
[E 09:42:30.669413] Error: failed removing one or more datafiles associated with the meta handle 238502937
[E 09:42:30.669599] WARNING: PVFS_sys_remove() encountered an error which may lead to inconsistent state: No such file or directory
[E 09:42:30.669614] WARNING: PVFS2 fsck (if available) may be needed.
Error: An error occurred while removing /mnt/pvfs2/file1.txt
PVFS_sys_remove: No such file or directory (error class: 0)
Removing these files is a manual process. These are the steps we
follow:
- Track down the file(s) that are causing the problems
- pvfs2-stat on the directory where the file resides
- Grab the FSID and handle from the output
- pvfs2-remove-object using the file name, directory handle, and
FSID
As more of these files start appearing, this process is becoming
slow and painful. It would be great if we could sort out why
these files are showing up like they are, but right now I think
a utility that could efficiently remove these files without the
legwork would be really helpful. Any idea what might work based
on what we are seeing?
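As a starting point for the "track down the file(s)" step, the broken
entries can be picked out of a long listing by their default-looking
attributes. The following is a rough heuristic sketch, assuming the
long-listing column layout shown above; suspect_entries is a
hypothetical helper name, and a legitimate empty file with mode 000
would also be flagged.

```shell
#!/bin/sh
# Sketch only: print names whose long-listing attributes are all defaults.
# Reads long-listing output (as shown above) on stdin.
suspect_entries() {
    awk '
        # Skip error lines such as "[E 15:14:05.798568] Invalid type 2 in readdirplus"
        /^\[E / { next }
        # Columns: perms links owner group size date time name...
        $1 == "----------" && $5 == "0" && NF >= 8 {
            name = $8
            # Rejoin names containing spaces (runs of spaces are collapsed)
            for (i = 9; i <= NF; i++) name = name " " $i
            print name
        }
    '
}
```

Piping a long listing of the affected directory through
suspect_entries yields one candidate name per line, which could then
feed the manual removal steps above.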
I am not sure if the problem also exists in 2.8, but it may be
related to this issue mailed in by Jim in September:
http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-September/003186.html
We are experiencing this issue as well.
Thanks,
Bart.
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-dspace.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c 2008-11-04 17:40:04.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-dspace.c 2010-10-11 10:47:59.000000000 -0400
@@ -250,6 +250,7 @@ static int dbpf_dspace_create_op_svc(str
memset(&key, 0, sizeof(key));
key.data = &new_handle;
key.size = key.ulen = sizeof(new_handle);
+ key.flags = DB_DBT_USERMEM;
memset(&data, 0, sizeof(data));
data.data = &s_attr;
@@ -862,6 +863,7 @@ static int dbpf_dspace_verify_op_svc(str
memset(&key, 0, sizeof(key));
key.data = &op_p->handle;
key.size = key.ulen = sizeof(TROVE_handle);
+ key.flags = DB_DBT_USERMEM;
memset(&data, 0, sizeof(data));
data.data = &s_attr;
@@ -1177,6 +1179,7 @@ static int dbpf_dspace_getattr_op_svc(st
memset(&key, 0, sizeof(key));
key.data = &op_p->handle;
key.size = key.ulen = sizeof(TROVE_handle);
+ key.flags = DB_DBT_USERMEM;
memset(&data, 0, sizeof(data));
memset(&s_attr, 0, sizeof(TROVE_ds_storedattr_s));
@@ -1266,6 +1269,7 @@ static int dbpf_dspace_getattr_list_op_s
memset(&key, 0, sizeof(key));
key.data = &op_p->u.d_getattr_list.handle_array[i];
key.size = key.ulen = sizeof(TROVE_handle);
+ key.flags = DB_DBT_USERMEM;
memset(&data, 0, sizeof(data));
memset(&s_attr, 0, sizeof(TROVE_ds_storedattr_s));
diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-keyval.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c 2009-01-23 09:23:53.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-keyval.c 2010-10-11 10:52:09.000000000 -0400
@@ -243,6 +243,7 @@ static int dbpf_keyval_read_op_svc(struc
key.size = key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(
op_p->u.k_read.key->buffer_sz);
+ key.flags = DB_DBT_USERMEM;
memset(&data, 0, sizeof(data));
data.data = op_p->u.k_read.val->buffer;
data.ulen = op_p->u.k_read.val->buffer_sz;
@@ -378,6 +379,7 @@ static int dbpf_keyval_write_op_svc(stru
key.data = &key_entry;
key.size = key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(
op_p->u.k_write.key.buffer_sz);
+ key.flags = DB_DBT_USERMEM;
data.data = op_p->u.k_write.val.buffer;
data.size = op_p->u.k_write.val.buffer_sz;
@@ -1376,6 +1378,7 @@ static int dbpf_keyval_do_remove(
memset(&db_key, 0, sizeof(db_key));
db_key.data = &key_entry;
db_key.size = db_key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(key->buffer_sz);
+ db_key.flags = DB_DBT_USERMEM;
gossip_debug(GOSSIP_DBPF_KEYVAL_DEBUG,
"keyval_db->del(handle= %llu, key= %*s (%d)) size=%d\n",
diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-mgmt.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-mgmt.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-mgmt.c 2008-01-31 12:53:45.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-mgmt.c 2010-10-11 10:49:52.000000000 -0400
@@ -575,6 +575,7 @@ int dbpf_collection_geteattr(TROVE_coll_
memset(&db_data, 0, sizeof(db_data));
db_key.data = key_p->buffer;
db_key.size = key_p->buffer_sz;
+ db_key.flags = DB_DBT_USERMEM;
db_data.data = val_p->buffer;
db_data.ulen = val_p->buffer_sz;
@@ -796,6 +797,7 @@ int dbpf_collection_create(char *collnam
key.data = collname;
key.size = strlen(collname)+1;
+ key.flags = DB_DBT_USERMEM;
data.data = &db_data;
data.ulen = sizeof(db_data);
data.flags = DB_DBT_USERMEM;
@@ -1020,6 +1022,7 @@ int dbpf_collection_remove(char *collnam
key.data = collname;
key.size = strlen(collname) + 1;
+ key.flags = DB_DBT_USERMEM;
data.data = &db_data;
data.ulen = sizeof(db_data);
data.flags = DB_DBT_USERMEM;
@@ -1242,8 +1245,8 @@ int dbpf_collection_iterate(TROVE_ds_pos
}
*(TROVE_ds_position *) key.data = *inout_position_p;
key.flags |= DB_DBT_USERMEM;
-
memset(&data, 0, sizeof(data));
+
data.data = &db_entry;
data.size = data.ulen = sizeof(db_entry);
data.flags |= DB_DBT_USERMEM;
@@ -1384,6 +1387,7 @@ int dbpf_collection_lookup(char *collnam
memset(&data, 0, sizeof(data));
key.data = collname;
key.size = strlen(collname)+1;
+ key.flags = DB_DBT_USERMEM;
data.data = &db_data;
data.ulen = sizeof(db_data);
data.flags = DB_DBT_USERMEM;
@@ -1469,6 +1473,7 @@ int dbpf_collection_lookup(char *collnam
memset(&data, 0, sizeof(data));
key.data = TROVE_DBPF_VERSION_KEY;
key.size = strlen(TROVE_DBPF_VERSION_KEY);
+ key.flags = DB_DBT_USERMEM;
data.data = &trove_dbpf_version;
data.ulen = 32;
data.flags = DB_DBT_USERMEM;
diff -Naupr pvfs2_src/src/server/pvfs2-server.c pvfs2_src_buildfix/src/server/pvfs2-server.c
--- pvfs2_src/src/server/pvfs2-server.c 2009-04-02 09:51:45.000000000 -0400
+++ pvfs2_src_buildfix/src/server/pvfs2-server.c 2010-10-11 09:16:48.000000000 -0400
@@ -1421,7 +1421,7 @@ static int server_setup_signal_handlers(
struct sigaction new_action;
struct sigaction ign_action;
struct sigaction hup_action;
- hup_action.sa_sigaction = (void *)hup_sighandler;
+ //hup_action.sa_sigaction = (void *)hup_sighandler;
sigemptyset (&hup_action.sa_mask);
hup_action.sa_flags = SA_RESTART | SA_SIGINFO;
#ifdef __PVFS2_SEGV_BACKTRACE__