Ok, I think I was able to reproduce the same thing that you were seeing. Can you try the attached patch and let me know if it fixes things on your end too?
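
For context on what the patch changes: the completion callback is what
decrements trove_pending_count and trove_pending, but the old code only
incremented them after the trove post returned, so a callback that fired
quickly could run first and drive the counts negative. The patch moves the
increments to before the post and adds asserts to catch the counts going
negative. Here is a stripped-down illustration of the pattern -- not PVFS
code, the names are made up, and the real code also holds
precreate_pool_mutex around the counter updates:

/* illustration only; compile with: cc -pthread race.c */
#include <assert.h>
#include <pthread.h>

static int pending = 0;  /* a real version would guard this with a mutex */

static void *on_complete(void *arg)
{
    (void)arg;
    pending--;               /* callback decrements the counter */
    assert(pending >= 0);    /* fires if the increment lost the race */
    return NULL;
}

/* post an async op; the callback may run before this even returns */
static int post_op(void)
{
    pthread_t t;
    return pthread_create(&t, NULL, on_complete, NULL);
}

static void buggy(void)
{
    if (post_op() == 0)
        pending++;           /* too late: on_complete may already have run */
}

static void fixed(void)
{
    pending++;               /* count it before the callback can fire */
    if (post_op() != 0)
        pending--;           /* post failed: undo the increment */
}

int main(int argc, char **argv)
{
    (void)argv;
    if (argc > 1)
        buggy();             /* may trip the assert -- that's the bug */
    else
        fixed();
    pthread_exit(NULL);      /* let any outstanding callback finish */
}

(The patch should apply from the top of the source tree with patch -p0.)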

thanks!
-Phil

On 05/06/2010 02:34 PM, Phil Carns wrote:
Actually, it might be good if you could just send me the fs.conf file for your test setup.

thanks,
-Phil

On 05/06/2010 02:26 PM, Phil Carns wrote:
Thanks, Bart. In your example, what are the names and ports of each of the servers involved? Are they all on the same node (with different ports) by any chance?

thanks,
-Phil

On 05/04/2010 09:50 AM, Bart Taylor wrote:
The log file is attached.

I upgraded, let the file system start responding to pvfs2-ping, gave both servers a sighup to pick up the logging update, cleared the log file, and gave it the dd command listed below. The second server never logged anything.
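
(sighup here = kill -HUP `pidof pvfs2-server` on each server node, to
make the daemons pick up the logging update without a restart)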

$ dd if=/dev/zero of=/mnt/pvfs2/10M.zeros.3 bs=1M count=10
dd: writing `/mnt/pvfs2/10M.zeros.3': Connection timed out
1+0 records in
0+0 records out

Bart.



On Mon, May 3, 2010 at 11:22 AM, Phil Carns <ca...@mcs.anl.gov> wrote:

    Can you get a server into this state (where everything works
    except for files larger than the strip size), turn on verbose
    logging, and then try to create a big file?
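
    (For verbose logging, I believe the knob is "EventLogging all" in
    the server's fs.conf -- a SIGHUP should make the daemons pick it
    up without a restart.)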

    I'd like to see the log file from the metadata server for the
    file in question.  That server is the one that has to come up
    with the pre-created file handles at that point and must be
    having a problem.  Even if the pre-create requests had failed up
    until then, it is supposed to eventually sort things out.

    thanks,
    -Phil


    On 04/29/2010 04:51 PM, Bart Taylor wrote:
    Yes, it does finish the Trove Migration and print similar
    messages. The file system responds to requests; I just can't
    create files larger than one strip size.  Once I restart the
    file system I can, but on first start, they fail.

    Bart.



    On Thu, Apr 29, 2010 at 1:50 PM, Kevin Harms <ha...@alcf.anl.gov> wrote:

        Bart,

         I think the server should print out when conversion starts
        and ends.

         examples:
         Trove Migration Started: Ver=2.6.3
         Trove Migration Complete: Ver=2.6.3
         Trove Migration Set: 2.8.1

         Does it get that far?

        kevin

        On Apr 29, 2010, at 1:55 PM, Bart Taylor wrote:

        > Thanks for the information and suggestion, Phil.
        Unfortunately, I didn't get a different result after moving
        that BMI init block. I also managed to reproduce this once
        while leaving the trove method set to alt-aio, although that
        doesn't seem directly related to the direction you were going.
        >
        > Another thing I noticed is that I can create files
        successfully after the upgrade as long as the size is
        within 64k, which is the value of my strip_size distribution
        param. Once the size exceeds that value, I start running
        into this problem again.
        >
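        > (For reference, the strip size comes from the distribution
        settings in fs.conf; I believe the stanza looks roughly like
        this, though exact syntax may vary by version:)
        >
        > <Distribution>
        >     Name simple_stripe
        >     Param strip_size
        >     Value 65536
        > </Distribution>
        >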
        > Does that help shed any more light on my situation?
        >
        > Bart.
        >
        >
        > On Fri, Apr 16, 2010 at 1:39 PM, Phil Carns <ca...@mcs.anl.gov> wrote:
        > Sadly none of my test boxes will run 2.6 any more, but I
        have a theory about what the problem might be here.
        >
        > For some background, the pvfs2-server daemon does these
        steps in order (among others): initializes BMI
        (networking), initializes Trove (storage), and then finally
        starts processing requests.
        >
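        > In rough pseudocode, that ordering looks like this
        (paraphrased sketch; trove_initialize() stands in for the
        real calls in pvfs2-server.c):
        >
        > ret = BMI_initialize(...);    /* network up; peers can connect */
        > *server_status_flag |= SERVER_BMI_INIT;
        >
        > ret = trove_initialize(...);  /* storage up; may run the slow */
        >                               /* 2.6 -> 2.8 migration here    */
        > *server_status_flag |= SERVER_TROVE_INIT;
        >
        > /* only after both does the main loop start servicing requests */
        >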
        > In your case, two extra things are going on:
        >
        > - the trove initialization may take a while, because it
        has to do a conversion of the format for all objects from
        v. 2.6 to 2.8, especially if it is also switching to
        o_direct format at the same time.
        >
        > - whichever server gets done first is going to
        immediately contact the other servers in order to precreate
        handles for new files (a new feature in 2.8)
        >
        > I'm guessing that one server finished the trove
        conversion before the others and started its pre-create
        requests.  The other servers can't answer yet (because they
        are still busy with trove), but since BMI is already
        running the incoming precreate requests just get queued up
        on the socket.  When the slow server finally does try to
        service them, the requests are way out of date and have
        since been retried by the fast server.
        >
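        > In timeline form (hypothetical two-server illustration):
        >
        >   t0  server A: trove conversion done; posts precreate
        >       requests to server B
        >   t1  server B: still converting; the requests just queue
        >       on its already-open BMI socket
        >   t2  server A: times out and retries; stale requests keep
        >       piling up at B
        >   t3  server B: conversion done; finally services a backlog
        >       of long-expired requests
        >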
        > I'm not sure exactly what goes wrong from there, but if
        that's the cause, the solution might be relatively simple.
         If you look in pvfs2-server.c, you can take the block of
        code from "BMI_initialize(...)" to "*server_status_flag |=
        SERVER_BMI_INIT;" and try moving that whole block to
        _after_ the "*server_status_flag |= SERVER_TROVE_INIT;"
        line that indicates that trove is done.
        >
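        > In other words, the reordered flow would look roughly like
        this (sketch only, not the literal source):
        >
        > /* trove first, so migration finishes before we are reachable */
        > ret = trove_initialize(...);
        > ...
        > *server_status_flag |= SERVER_TROVE_INIT;
        >
        > /* moved block: only now start accepting connections */
        > ret = BMI_initialize(...);
        > ...
        > *server_status_flag |= SERVER_BMI_INIT;
        >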
        > -Phil
        >
        >
        > On 03/30/2010 06:23 PM, Bart Taylor wrote:
        >>
        >> I am having some problems upgrading existing file
        systems to 2.8. After I finish the upgrade and start the
        file system, I cannot create files. Simple commands like dd
        and cp stall until they timeout and leave partial dirents
        like this:
        >>
        >> [bat...@client t]$ dd if=/dev/zero
        of=/mnt/pvfs28/10MB.dat.6 bs=1M count=10
        >> dd: writing `/mnt/pvfs28/10MB.dat.6': Connection timed out
        >> 1+0 records in
        >> 0+0 records out
        >> 0 bytes (0 B) copied, 180.839 seconds, 0.0 kB/s
        >>
        >>
        >> [r...@client ~]# ls -alh /mnt/pvfs28/
        >> total 31M
        >> drwxrwxrwt 1 root   root   4.0K Mar 30 11:24 .
        >> drwxr-xr-x 4 root   root   4.0K Mar 23 13:38 ..
        >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.1
        >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.2
        >> -rw-rw-r-- 1 batayl batayl  10M Mar 30 08:44 10MB.dat.3
        >> ?--------- ? ?      ?         ?            ? 10MB.dat.5
        >> drwxrwxrwx 1 root   root   4.0K Mar 29 14:06 lost+found
        >>
        >>
        >> This happens both on local disk and on network storage,
        but it only happens if the upgraded file system starts up
        the first time using directio. If it is started with
        alt-aio as the TroveMethod, everything works as expected.
        It also only happens the first time the file system is
        started; if I stop the server daemons and restart them,
        everything operates as expected. I do have to kill -9 the
        server daemons, since they will not exit gracefully.
        >>
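        >> (By "started with alt-aio" I mean the TroveMethod setting
        in fs.conf; I believe the relevant lines are something like
        the following, though placement may vary by version:)
        >>
        >> <Defaults>
        >>     # TroveMethod directio   <-- the failing case
        >>     TroveMethod alt-aio
        >> </Defaults>
        >>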
        >> My test is running on RHEL4 U8 i386 with kernel version
        2.6.9-89.ELsmp with two server nodes and one client. I was
        unable to recreate the problem with a single server.
        >>
        >> I attached verbose server logs from the time the daemon
        was started after the upgrade until the client failed as
        well as client logs from mount until the returned error.
        The Cliffs Notes version is that one of the servers logs as
        many unstuff requests as we have client retries configured.
        The client fails at the end of the allotted retries. The
        other server doesn't log anything after starting.
        >>
        >> Has anyone seen anything similar or know what might be
        going on?
        >>
        >> Bart.
        >>



Index: src/io/job/job.c
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/io/job/job.c,v
retrieving revision 1.190
diff -a -u -p -r1.190 job.c
--- src/io/job/job.c	6 Nov 2009 18:04:54 -0000	1.190
+++ src/io/job/job.c	6 May 2010 19:39:17 -0000
@@ -4426,7 +4426,10 @@ static void precreate_pool_get_thread_mg
     }
 
     trove_pending_count--;
+    assert(trove_pending_count >= 0);
+
     tmp_trove->jd->u.precreate_pool.trove_pending--;
+    assert(tmp_trove->jd->u.precreate_pool.trove_pending >= 0);
 
     /* don't overwrite error codes from other trove ops */
     if(tmp_trove->jd->u.precreate_pool.error_code == 0)
@@ -5897,6 +5900,10 @@ static void precreate_pool_get_handles_t
             }
         }
 
+        /* pre-increment pending count before posting trove operation */
+        trove_pending_count++;
+        jd->u.precreate_pool.trove_pending++;
+
         /* post trove operation to pull out a handle */
         ret = trove_keyval_iterate_keys(
             fs->fsid, 
@@ -5926,8 +5933,6 @@ static void precreate_pool_get_handles_t
         else
         {
             /* callback will be triggered later */
-            trove_pending_count++;
-            jd->u.precreate_pool.trove_pending++;
         }
     }
     gen_mutex_unlock(&precreate_pool_mutex);
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
