I'm not sure how you are getting there yet, but I am able to recreate this state on my laptop with a single server and the pvfs2 admin utilities.

The following shows the creation of 4 files. I then get the metadata handle for one of them (via pvfs2-stat) and the fsid from the conf file. Then pvfs2-remove-object is used to manually delete that particular object, leaving the directory entry in place. At that point pvfs2-lsplus starts showing the "Invalid type..." error and pvfs2-stat fails with "No such file or directory".

-------------------------------------------------------------
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/a.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/b.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/d.dat
[pca...@pcarns-laptop admin]$ ./pvfs2-cp /tmp/16mb.dat /mnt/pvfs2/c.dat

[pca...@pcarns-laptop admin]$ ./pvfs2-lsplus -alh /mnt/pvfs2
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 10:53 .
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 10:53 .. (faked)
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 a.dat
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 b.dat
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 c.dat
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 d.dat
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 10:52 lost+found

[pca...@pcarns-laptop admin]$ ./pvfs2-stat /mnt/pvfs2/b.dat
-------------------------------------------------------
  File Name     : /mnt/pvfs2/b.dat
  Relative Name : /b.dat
  fs ID         : 513116601
  Handle        : 1048574
  Mask          : 704000177
  Permissions   : 644
  Type          : Regular File
  Size          : 16777216
  Owner         : 1000 (pcarns)
  Group         : 1000 (pcarns)
  atime         : 1286808818 (Mon Oct 11 10:53:38 2010)
  mtime         : 1286808818 (Mon Oct 11 10:53:38 2010)
  ctime         : 1286808818 (Mon Oct 11 10:53:38 2010)
  datafiles     : 1
  flags         : none

[pca...@pcarns-laptop server]$ cat simple.conf |grep ID
    ID 513116601

[pca...@pcarns-laptop admin]$ ./pvfs2-remove-object -f 513116601 -o 1048574
Attempting to remove object 1048574,513116601

[pca...@pcarns-laptop admin]$ ./pvfs2-lsplus /mnt/pvfs2
[E 10:54:54.724633] Invalid type 2 in readdirplus
a.dat
b.dat
c.dat
d.dat
lost+found

[pca...@pcarns-laptop admin]$ ./pvfs2-stat /mnt/pvfs2/b.dat
PVFS_sys_lookup: No such file or directory (error class: 0)
Error stating [/mnt/pvfs2/b.dat]

----------------------------------------------------------------------
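
The manual steps in the transcript above (read the fs ID and handle from pvfs2-stat, then pass them to pvfs2-remove-object) could be scripted roughly as follows. This is a minimal sketch in Python and not part of the PVFS2 tools: the function names are hypothetical, the field labels are copied from the pvfs2-stat output above, and the only pvfs2-remove-object options used are the -f and -o options that appear in the transcript. It only prints the command and does not run it.

```python
import re

# Abbreviated copy of the pvfs2-stat output from the transcript.
SAMPLE_STAT_OUTPUT = """\
-------------------------------------------------------
  File Name     : /mnt/pvfs2/b.dat
  Relative Name : /b.dat
  fs ID         : 513116601
  Handle        : 1048574
  Type          : Regular File
"""


def parse_pvfs2_stat(text):
    """Return (fsid, handle) as integers from pvfs2-stat output."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "  fs ID         : 513116601"; split on the first colon.
        match = re.match(r"\s*([^:]+?)\s*:\s*(.*)$", line)
        if match:
            fields.setdefault(match.group(1), match.group(2).strip())
    return int(fields["fs ID"]), int(fields["Handle"])


def remove_object_command(fsid, handle):
    """Build the argument list used in the transcript (-f fsid, -o handle)."""
    return ["pvfs2-remove-object", "-f", str(fsid), "-o", str(handle)]


if __name__ == "__main__":
    fsid, handle = parse_pvfs2_stat(SAMPLE_STAT_OUTPUT)
    print(" ".join(remove_object_command(fsid, handle)))
```

Keeping the command as a list makes it easy to hand to subprocess.run later, once the output has been checked by eye.
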
From there I can do a pvfs2-rm, but it doesn't really clean it up properly. I think you might get different errors if you tried to do this via the kernel module, FWIW:

[pca...@pcarns-laptop admin]$ ./pvfs2-rm /mnt/pvfs2/b.dat
Error: An error occurred while removing /mnt/pvfs2/b.dat
PVFS_sys_remove: No such file or directory (error class: 0)

[pca...@pcarns-laptop admin]$ ./pvfs2-ls -alh /mnt/pvfs2/
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 11:00 .
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 11:00 .. (faked)
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 a.dat
Failed to get attributes on handle 1048574,513116601
Getattr failure: No such file or directory (error class: 0)
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 c.dat
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 d.dat
drwxrwxrwx    1 pcarns   pcarns          4.0K 2010-10-11 10:52 lost+found

Maybe that example above can make it easier to reproduce in a development environment? We can tackle the problem later about how the metadata object disappeared in the first place, but for starters it would probably be good to at least fix these two things:

- how to make pvfs2-rm safely remove what it can (even if via a "force" option)
- how to get pvfs2-lsplus (and probably other utilities and/or kernel module as well) to report a sane error message instead of the "Invalid object" message

....

FYI, the 2.6 tree is pretty cranky to build on my machine at this point. I attached a patch that sets more BDB flags that are necessary for recent BDB releases, in case anyone else needs to build 2.6 on a modern box. I also found that I had to use the "DBCacheType mmap" option in the server config file. I can't get the kernel module working, so I stuck to command line utilities.

-Phil

On 10/11/2010 09:23 AM, Phil Carns wrote:
Oh, Ok. The metadata object must be missing then. If you do a normal pvfs2-ls (no -al options) does the file show up in the listing without errors?

Maybe there is an initial problem (the metadata object missing for some reason), followed by a secondary problem that a getattr on that object is returning an empty attribute structure rather than indicating that there is an error.

I'm going to set up a small 2.6 installation here and try destroying a metafile (using remove-object, probably) to see if I can recreate the symptoms that you are seeing.

-Phil


On 10/08/2010 04:05 PM, Bart Taylor wrote:
Hey Phil,

The pvfs2-stat call never actually made it past sys_lookup, so there was no ref.handle to print out. Here is the output:

#pvfs2-stat /mnt/pvfs2/file1.txt
PVFS_sys_lookup: No such file or directory (error class: 0)
Error stating [/mnt/pvfs2/file1.txt]

Bart.



On Thu, Oct 7, 2010 at 3:13 PM, Phil Carns <ca...@mcs.anl.gov> wrote:

    Hi Bart,

    Can you run pvfs2-stat on one of the files, and also send along
    the fs.conf file?  pvfs2-stat might be helpful because it shows
    the metadata handle value.  We can compare that value to the
    handle ranges in the conf file to narrow down whether it is
    hitting a metadata object that has just been corrupted somehow,
    or whether it really is hitting a datafile handle.
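
The comparison described above could be sketched like this. It is a minimal sketch in Python under stated assumptions: it assumes the conf file lists handle ranges as "Range <alias> <low>-<high>" lines inside <MetaHandleRanges> and <DataHandleRanges> sections, and the ranges, alias, and handle values below are made up for illustration. The function names are hypothetical.

```python
# Hypothetical conf fragment; the real ranges come from your own fs.conf.
SAMPLE_CONF = """\
<FileSystem>
    Name pvfs2-fs
    <MetaHandleRanges>
        Range localhost 4-2147483650
    </MetaHandleRanges>
    <DataHandleRanges>
        Range localhost 2147483651-4294967297
    </DataHandleRanges>
</FileSystem>
"""


def parse_handle_ranges(conf_text):
    """Collect (alias, low, high) tuples for the meta and data sections."""
    ranges = {"meta": [], "data": []}
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line == "<MetaHandleRanges>":
            section = "meta"
        elif line == "<DataHandleRanges>":
            section = "data"
        elif line.startswith("</"):
            section = None  # any closing tag ends the current section
        elif section and line.startswith("Range "):
            _, alias, spec = line.split(None, 2)
            # Tolerate comma-separated sub-ranges such as "4-10,20-30".
            for part in spec.split(","):
                low, high = part.strip().split("-")
                ranges[section].append((alias, int(low), int(high)))
    return ranges


def classify_handle(handle, ranges):
    """Return ("meta" | "data", alias) for the range holding handle, else None."""
    for kind, entries in ranges.items():
        for alias, low, high in entries:
            if low <= handle <= high:
                return kind, alias
    return None


if __name__ == "__main__":
    ranges = parse_handle_ranges(SAMPLE_CONF)
    print(classify_handle(1048574, ranges))
```

A result of "data" for a handle that a directory entry points at would suggest the entry really is hitting a datafile handle; "meta" would point toward a damaged or missing metadata object instead.
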

    If pvfs2-stat fails to show any output, then maybe you can modify
    pvfs2-stat.c to print the value of ref.handle right before the
    sys_getattr() call.

    thanks,
    -Phil




    On 10/07/2010 01:08 PM, Bart Taylor wrote:
    Hey guys,

    We are having an increasing number of files that cannot be
    removed on our 2.6 file systems. When we run the pvfs2-lsplus
    tool, the output on these files looks like this:

    [E 15:14:05.798568] Invalid type 2 in readdirplus
    ----------    1 root     root               0 1969-12-31 18:00 File1
    [E 15:14:17.712553] Invalid type 2 in readdirplus
    ----------    1 root     root               0 1969-12-31 18:00 File2
    [E 15:14:24.799221] Invalid type 2 in readdirplus
    [E 15:14:24.799257] Invalid type 2 in readdirplus
    [E 15:14:24.799269] Invalid type 2 in readdirplus
    ----------    1 root     root               0 1969-12-31 18:00 File3.txt
    ----------    1 root     root               0 1969-12-31 18:00 File5.txt
    ----------    1 root     root               0 1969-12-31 18:00 File6.txt


    The "Invalid type 2" message indicates that readdirplus is
    returning datafile attributes mixed in with the directory
    entries. That might explain why all of the attributes look like
    default values, but I am not sure why those files are having
    problems in the first place.
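
One way to check that these attributes really are defaults: a zeroed timestamp (epoch 0) renders as 1969-12-31 18:00 on a machine six hours behind UTC. The short Python sketch below assumes that offset, since the listing does not state the time zone, and uses a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def format_ls_time(epoch_seconds, utc_offset_hours):
    """Format an epoch timestamp the way the long listing prints it."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    # A zeroed mtime seen from UTC-6 (assumed offset for the listing above).
    print(format_ls_time(0, -6))  # prints 1969-12-31 18:00
```

So the 1969-12-31 dates are consistent with attribute fields that were never filled in, rather than with real (if corrupted) timestamps.
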

    This gets noticed when someone tries to update, create, append,
    etc. the file. Most operations seem to return "No such file or
    directory" when trying to access those files. A standard /bin/ls
    will return normally, but long listings fail. pvfs2-ls,
    pvfs2-viewdist and pvfs2-validate return getattr failures.
    pvfs2-rm returns output like this:

    [E 09:42:30.669413] Error: failed removing one or more datafiles associated with the meta handle 238502937
    [E 09:42:30.669599] WARNING: PVFS_sys_remove() encountered an error which may lead
    to inconsistent state: No such file or directory
    [E 09:42:30.669614] WARNING: PVFS2 fsck (if available) may be needed.
    Error: An error occurred while removing /mnt/pvfs2/file1.txt
    PVFS_sys_remove: No such file or directory (error class: 0)

    Removing these files is a manual process. These are the steps we
    follow:
    - Track down the file(s) that are causing the problems
    - pvfs2-stat on the directory where the file resides
      - Grab the FSID and handle from the output
    - pvfs2-remove-object using the file name, directory handle, and
    FSID

    As more of these files start appearing, this process is becoming
    slow and painful. It would be great if we could sort out why
    these files are showing up like they are, but right now I think
    a utility that could efficiently remove these files without the
    legwork would be really helpful. Any idea what might work based
    on what we are seeing?
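
As a starting point for the "track down the file(s)" step, the affected entries could be picked out of a long listing automatically. This is a rough sketch in Python, assuming the listing format shown above and treating an all-dash mode string with size 0 as the marker of a broken entry. That heuristic and the function name are assumptions, not behavior of any existing PVFS2 tool.

```python
# Sample long-listing output: two broken entries, one error line, one healthy file.
SAMPLE_LISTING = """\
[E 15:14:05.798568] Invalid type 2 in readdirplus
----------    1 root     root               0 1969-12-31 18:00 File1
-rw-r--r--    1 pcarns   pcarns         16.0M 2010-10-11 10:53 a.dat
----------    1 root     root               0 1969-12-31 18:00 File3.txt
"""


def find_suspect_entries(listing_text):
    """Return names of entries whose attributes look like zeroed defaults."""
    suspects = []
    for line in listing_text.splitlines():
        # Skip gossip error lines such as "[E 15:14:05...] Invalid type 2 ...".
        if line.lstrip().startswith("["):
            continue
        # mode, links, owner, group, size, date, time, name (name may hold spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        mode, size, name = fields[0], fields[4], fields[7]
        if mode == "----------" and size == "0":
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    for name in find_suspect_entries(SAMPLE_LISTING):
        print(name)
```

The date column is deliberately not checked, because a zeroed timestamp prints differently depending on the local time zone.
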

    I am not sure if the problem also exists in 2.8, but it may be
    related to this issue mailed in by Jim in September:
    http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-September/003186.html
    We are experiencing this issue as well.

    Thanks,
    Bart.





    _______________________________________________
    Pvfs2-developers mailing list
    Pvfs2-developers@beowulf-underground.org
    http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-dspace.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c	2008-11-04 17:40:04.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-dspace.c	2010-10-11 10:47:59.000000000 -0400
@@ -250,6 +250,7 @@ static int dbpf_dspace_create_op_svc(str
     memset(&key, 0, sizeof(key));
     key.data = &new_handle;
     key.size = key.ulen = sizeof(new_handle);
+    key.flags = DB_DBT_USERMEM;
 
     memset(&data, 0, sizeof(data));
     data.data = &s_attr;
@@ -862,6 +863,7 @@ static int dbpf_dspace_verify_op_svc(str
     memset(&key, 0, sizeof(key));
     key.data = &op_p->handle;
     key.size = key.ulen = sizeof(TROVE_handle);
+    key.flags = DB_DBT_USERMEM;
 
     memset(&data, 0, sizeof(data));
     data.data = &s_attr;
@@ -1177,6 +1179,7 @@ static int dbpf_dspace_getattr_op_svc(st
     memset(&key, 0, sizeof(key));
     key.data = &op_p->handle;
     key.size = key.ulen = sizeof(TROVE_handle);
+    key.flags = DB_DBT_USERMEM;
 
     memset(&data, 0, sizeof(data));
     memset(&s_attr, 0, sizeof(TROVE_ds_storedattr_s));
@@ -1266,6 +1269,7 @@ static int dbpf_dspace_getattr_list_op_s
         memset(&key, 0, sizeof(key));
         key.data = &op_p->u.d_getattr_list.handle_array[i];
         key.size = key.ulen = sizeof(TROVE_handle);
+        key.flags = DB_DBT_USERMEM;
 
         memset(&data, 0, sizeof(data));
         memset(&s_attr, 0, sizeof(TROVE_ds_storedattr_s));
diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-keyval.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c	2009-01-23 09:23:53.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-keyval.c	2010-10-11 10:52:09.000000000 -0400
@@ -243,6 +243,7 @@ static int dbpf_keyval_read_op_svc(struc
     key.size = key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(
         op_p->u.k_read.key->buffer_sz);
 
+    key.flags = DB_DBT_USERMEM;
     memset(&data, 0, sizeof(data));
     data.data = op_p->u.k_read.val->buffer;
     data.ulen = op_p->u.k_read.val->buffer_sz;
@@ -378,6 +379,7 @@ static int dbpf_keyval_write_op_svc(stru
     key.data = &key_entry;
     key.size = key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(
         op_p->u.k_write.key.buffer_sz);
+    key.flags = DB_DBT_USERMEM;
     data.data = op_p->u.k_write.val.buffer;
     data.size = op_p->u.k_write.val.buffer_sz;
 
@@ -1376,6 +1378,7 @@ static int dbpf_keyval_do_remove(
     memset(&db_key, 0, sizeof(db_key));
     db_key.data = &key_entry;
     db_key.size = db_key.ulen = DBPF_KEYVAL_DB_ENTRY_TOTAL_SIZE(key->buffer_sz);
+    db_key.flags = DB_DBT_USERMEM;
 
     gossip_debug(GOSSIP_DBPF_KEYVAL_DEBUG,
                  "keyval_db->del(handle= %llu, key= %*s (%d)) size=%d\n",
diff -Naupr pvfs2_src/src/io/trove/trove-dbpf/dbpf-mgmt.c pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-mgmt.c
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-mgmt.c	2008-01-31 12:53:45.000000000 -0500
+++ pvfs2_src_buildfix/src/io/trove/trove-dbpf/dbpf-mgmt.c	2010-10-11 10:49:52.000000000 -0400
@@ -575,6 +575,7 @@ int dbpf_collection_geteattr(TROVE_coll_
     memset(&db_data, 0, sizeof(db_data));
     db_key.data = key_p->buffer;
     db_key.size = key_p->buffer_sz;
+    db_key.flags = DB_DBT_USERMEM;
 
     db_data.data  = val_p->buffer;
     db_data.ulen  = val_p->buffer_sz;
@@ -796,6 +797,7 @@ int dbpf_collection_create(char *collnam
 
     key.data = collname;
     key.size = strlen(collname)+1;
+    key.flags = DB_DBT_USERMEM;
     data.data = &db_data;
     data.ulen = sizeof(db_data);
     data.flags = DB_DBT_USERMEM;
@@ -1020,6 +1022,7 @@ int dbpf_collection_remove(char *collnam
 
     key.data = collname;
     key.size = strlen(collname) + 1;
+    key.flags = DB_DBT_USERMEM;
     data.data = &db_data;
     data.ulen = sizeof(db_data);
     data.flags = DB_DBT_USERMEM;
@@ -1242,8 +1245,8 @@ int dbpf_collection_iterate(TROVE_ds_pos
         }
         *(TROVE_ds_position *) key.data = *inout_position_p;
         key.flags |= DB_DBT_USERMEM;
-
         memset(&data, 0, sizeof(data));
+        
         data.data = &db_entry;
         data.size = data.ulen = sizeof(db_entry);
         data.flags |= DB_DBT_USERMEM;
@@ -1384,6 +1387,7 @@ int dbpf_collection_lookup(char *collnam
     memset(&data, 0, sizeof(data));
     key.data = collname;
     key.size = strlen(collname)+1;
+    key.flags = DB_DBT_USERMEM;
     data.data = &db_data;
     data.ulen = sizeof(db_data);
     data.flags = DB_DBT_USERMEM;
@@ -1469,6 +1473,7 @@ int dbpf_collection_lookup(char *collnam
     memset(&data, 0, sizeof(data));
     key.data = TROVE_DBPF_VERSION_KEY;
     key.size = strlen(TROVE_DBPF_VERSION_KEY);
+    key.flags = DB_DBT_USERMEM;
     data.data = &trove_dbpf_version;
     data.ulen = 32;
     data.flags = DB_DBT_USERMEM;
diff -Naupr pvfs2_src/src/server/pvfs2-server.c pvfs2_src_buildfix/src/server/pvfs2-server.c
--- pvfs2_src/src/server/pvfs2-server.c	2009-04-02 09:51:45.000000000 -0400
+++ pvfs2_src_buildfix/src/server/pvfs2-server.c	2010-10-11 09:16:48.000000000 -0400
@@ -1421,7 +1421,7 @@ static int server_setup_signal_handlers(
     struct sigaction new_action;
     struct sigaction ign_action;
     struct sigaction hup_action;
-    hup_action.sa_sigaction = (void *)hup_sighandler;
+    //hup_action.sa_sigaction = (void *)hup_sighandler;
     sigemptyset (&hup_action.sa_mask);
     hup_action.sa_flags = SA_RESTART | SA_SIGINFO;
 #ifdef __PVFS2_SEGV_BACKTRACE__