Hi Michael,

I just tried your test case on a clean trunk build here and was able to reproduce the pvfs2-client-core segfault 100% of the time on my box.

The problem in a nutshell is that pvfs2-client-core was trying to cancel a small-io operation using logic that is only appropriate for a normal I/O operation, which in turn caused memory corruption.

Can you try out the fix and see if it solves the problem for you? The patch is attached, or you can pull it from CVS trunk.

You might want to try that change by itself (without the op_state_purged change) first and go from there. Some of the other issues you ran into may have been after-effects of the cancel problem.

-Phil


Michael Moore wrote:
On Thu, Jan 07, 2010 at 06:48:19PM -0500, Michael Moore wrote:
On Thu, Jan 07, 2010 at 03:07:17PM -0600, Sam Lang wrote:
On Jan 7, 2010, at 2:29 PM, Michael Moore wrote:

On Thu, Jan 07, 2010 at 12:30:00PM -0600, Sam Lang wrote:
On Jan 7, 2010, at 11:32 AM, Michael Moore wrote:

In continuing to look at the 100% CPU usage (kernel loop) Randy had written about previously, I've narrowed the issue down a little. It appears related to cancellation of operations when a write() call is blocking and I/O has been retried. While on our cluster the retries were caused by congestion, I am re-creating the situation by killing an I/O server. The test C program I'm using just loops around 4k writes to a PVFS file. If, while the program is executing, I kill a PVFS I/O server, the write hangs (as expected). About 30% of the time when I try to kill the process doing the writing, it spikes to 100% CPU usage and is not killable. Also, every time I try to kill the writing process, pvfs2-client-core segfaults with something similar to:

[E 11:58:09.724121] PVFS2 client: signal 11, faulty address is 0x41ec, from 0x8050b51
[E 11:58:09.725403] [bt] pvfs2-client-core [0x8050b51]
[E 11:58:09.725427] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
[E 11:58:09.725436] [bt] /lib/libc.so.6(__libc_start_main+0xdc) [0x75ee9c]
[E 11:58:09.725444] [bt] pvfs2-client-core [0x804a381]
[E 11:58:09.740133] Child process with pid 2555 was killed by an uncaught signal 6

In the cases where CPU usage hits 100% (and the process can't be terminated), the for() loop in PINT_client_io_cancel strangely segfaults on exactly iteration 31. The value of sm_p->u.io.context_count is in the hundreds, so there are a significant number of jobs left to cancel.
Hi Michael,

Are you guys using infiniband by chance?  Do you have a stack trace with 
debugging symbols where the pvfs2-client-core segfault occurs?  That might be 
useful for narrowing things down.

No InfiniBand in use, vanilla Ethernet.

Stack trace with symbols:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7fe06d0 (LWP 2069)]
PINT_client_io_cancel (id=5594) at src/client/sysint/client-state-machine.c:524
524             if (cur_ctx->msg_send_in_progress)
(gdb) bt
#0  PINT_client_io_cancel (id=5594)
   at src/client/sysint/client-state-machine.c:524
#1  0x0804b926 in service_operation_cancellation (vfs_request=0xa073d98)
   at src/apps/kernel/linux/pvfs2-client-core.c:407
#2  0x0804f141 in handle_unexp_vfs_request (vfs_request=0xa073d98)
   at src/apps/kernel/linux/pvfs2-client-core.c:2959
#3  0x08050c7b in process_vfs_requests ()
   at src/apps/kernel/linux/pvfs2-client-core.c:3148
#4  0x08052498 in main (argc=12, argv=0xbfe0d224)
   at src/apps/kernel/linux/pvfs2-client-core.c:3561
(gdb)
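
For context, line 524 sits inside the per-context cancellation loop in PINT_client_io_cancel(). Roughly, the loop has the following shape (a paraphrased sketch based on the backtrace and debug logging, not the exact source; aside from context_count and msg_send_in_progress, the field names are guesses):

    /* Paraphrased sketch of the loop around client-state-machine.c:524;
     * field names other than context_count and msg_send_in_progress are
     * guesses, not the exact source. */
    for (i = 0; i < sm_p->u.io.context_count; i++)
    {
        cur_ctx = &sm_p->u.io.contexts[i];

        if (cur_ctx->msg_send_in_progress)       /* line 524: faulting access */
        {
            /* post cancellation: BMI Send (Request) */
        }
        if (cur_ctx->msg_recv_in_progress)
        {
            /* post cancellation: BMI Recv (Response) */
        }
        if (cur_ctx->flow_in_progress)
        {
            /* post cancellation: FLOW */
        }
        if (cur_ctx->write_ack_in_progress)
        {
            /* post cancellation: BMI Recv (Write Ack) */
        }
    }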

In addition to the above segfault, I'm also seeing one on vfs_request in process_vfs_requests. The vfs_request segfault is the one shown in the attached logs; it occurs at the first point where the function accesses vfs_request:
Program received signal SIGSEGV, Segmentation fault.
0x08050cd7 in process_vfs_requests ()
    at src/apps/kernel/linux/pvfs2-client-core.c:3125
3125                if (vfs_request->num_ops == 1 &&

Again, these are Orange-Branch CVS line numbers. As a side note, I get a configure error when trying to build the head version:
"config.status: error: cannot find input file: src/io/bmi/bmi_zoid/module.mk.in"

It's obvious that vfs_request is not a valid address, but I haven't figured out where it's getting mucked with or thrown away. Note that these logs are from after making the op_state_purged(op) change recommended below. That change seems to have prevented the kernel infinite-loop issue, as you indicated it would.

I don't think I'm doing anything special from the application side, but I'm also attaching the application I'm using to reproduce this issue.
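
For reference, it does little more than the following (a minimal sketch of the write loop, not the exact attached program; the mount path is a placeholder):

    /* Minimal sketch: loop writing 4k buffers to a file on a PVFS2 mount.
     * Not the exact attached test program; /mnt/pvfs2 is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/mnt/pvfs2/testfile";
        char buf[4096];
        int fd;

        memset(buf, '?', sizeof(buf));

        fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        /* write 4k at a time until killed or an error occurs */
        for (;;)
        {
            if (write(fd, buf, sizeof(buf)) < 0)
            {
                perror("write");
                break;
            }
        }

        close(fd);
        return 0;
    }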

Thanks,
Michael

What version of PVFS is this?  I'm having trouble matching up the lines to 
anything interesting.
-sam

Sorry, Orange-Branch out of CVS.

Michael


The real issue is the 30% of the time when the process gets stuck in the kernel waiting for a downcall. With some additional debugging, I can see that the process's write() call is stuck in the while(1) loop in wait_for_cancellation_downcall(). The function's assumption is that the request will either time out or be serviced after one iteration of the loop. However, in this situation neither occurs: the schedule_timeout() call returns immediately because a signal is pending, but the op is never serviced, so it spins indefinitely.
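
Schematically, the loop looks something like this (a simplified sketch, not the exact source; the timeout value is a stand-in):

    /* Simplified sketch of the wait loop in wait_for_cancellation_downcall(). */
    while (1)
    {
        set_current_state(TASK_INTERRUPTIBLE);

        if (op_state_serviced(op))
        {
            /* never true here: pvfs2-client-core segfaulted, so nothing
             * will ever service this op */
            ret = 0;
            break;
        }

        /* A signal is pending (the kill), so schedule_timeout() returns
         * immediately with jiffies still remaining; the timeout branch
         * below is never taken and the loop spins at 100% CPU. */
        if (!schedule_timeout(op_timeout))   /* op_timeout: stand-in name */
        {
            ret = -ETIMEDOUT;
            break;
        }
    }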
Those infinite looping conditionals have bugged me for a while now (there's one 
in wait_for_matching_downcall too).  We should probably go through and 
systematically replace all of them.  Let's try to fix your bug first, though.

I'm actually surprised that the process goes to 100% CPU in this case.  You're 
right that the op is not getting serviced, so the first if conditional won't 
break out of the while loop.  But schedule_timeout should only return 
non-zero if the task gets woken up, and it only gets woken up in 
purge_waiting_ops() when pvfs2-client-core segfaults.  And that should only 
happen once.  One thing you might try is to change the first if conditional 
from:

       if (op_state_serviced(op))

to:

       if (op_state_serviced(op) || op_state_purged(op))

That will allow purged ops to exit the while loop.  Could you share your 
debugging output and modified code?
No special code other than some gossip_debug calls. The kernel debugging 
output right before and including the cancel is:

Jan  7 15:15:25 vm04 kernel: Alloced OP (d0c48020: 875 OP_FILE_IO)
Jan  7 15:15:25 vm04 kernel: GET op d0c48020 -> buffer_index 0
Jan  7 15:15:25 vm04 kernel: pvfs2_file_write: copy_to_user 1 nr_segs 1, offset: 3129344 total_size: 4096
Jan  7 15:15:25 vm04 kernel: pvfs_bufmap_copy_iovec_from_user: index 0, size 4096
Jan  7 15:15:25 vm04 kernel: First character (integer value) in pvfs_bufmap_copy_from_user: 63
Jan  7 15:15:25 vm04 kernel: pvfs2: service_operation: pvfs2_file_write d0c48020
Jan  7 15:15:25 vm04 kernel: pvfs2: operation posted by process: pvfs_rand_write, pid: 2076
Jan  7 15:15:25 vm04 kernel: Releasing OP (d0c98060: 874)
Jan  7 15:15:25 vm04 kernel: client-core: reading op tag 875 OP_FILE_IO
Jan  7 15:15:55 vm04 kernel: *** operation interrupted by a signal (tag 875, op d0c48020)
Jan  7 15:15:55 vm04 kernel: Interrupted: Removed op d0c48020 from htable_ops_in_progress
Jan  7 15:15:55 vm04 kernel: pvfs2: service_operation pvfs2_file_write returning: -4 for d0c48020.
Jan  7 15:15:55 vm04 kernel: pvfs2_cancel_op_in_progress called on tag 875
Jan  7 15:15:55 vm04 kernel: Alloced OP (da5000a0: 876 OP_CANCEL)
Jan  7 15:15:55 vm04 kernel: Attempting PVFS2 operation cancellation of tag 875
Jan  7 15:15:55 vm04 kernel: pvfs2: service_operation: pvfs2_cancel da5000a0
Jan  7 15:15:55 vm04 kernel: pvfs2: operation posted by process: pvfs_rand_write, pid: 2076
Jan  7 15:15:55 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 20000
Jan  7 15:15:57 vm04 last message repeated 3 times
Jan  7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 19999
Jan  7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 20000
Jan  7 15:15:58 vm04 last message repeated 59 times
............


The pvfs2-client debug log (from an earlier failure) proves my earlier statement about segfaulting in the 31st iteration false. This one goes into the 70s.

[D 14:35:53.231333] cancel_op_in_progress called
[D 14:35:53.231342] cancelling I/O req 0x88c7f98 from tag 1128
[D 14:35:53.231357] PINT_client_io_cancel id 7350
[D 14:35:53.231368] job iteration [0]
[D 14:35:53.231377] [0] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231386] [0] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231396] [0] Posting cancellation of type: FLOW
[D 14:35:53.231431] [0] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231444] job iteration [1]
[D 14:35:53.231452] [1] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231461] [1] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231469] [1] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231502] job iteration [2]
[D 14:35:53.231510] [2] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231519] [2] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231528] [2] Posting cancellation of type: FLOW
[D 14:35:53.231536] [2] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231545] job iteration [3]
[D 14:35:53.231553] [3] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231562] [3] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231571] [3] Posting cancellation of type: FLOW
[D 14:35:53.231579] [3] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231588] job iteration [4]
[D 14:35:53.231596] [4] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231605] [4] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231614] [4] Posting cancellation of type: FLOW
[D 14:35:53.231622] [4] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231628] job iteration [5]
[D 14:35:53.231666] [5] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231675] [5] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231684] [5] Posting cancellation of type: FLOW
[D 14:35:53.231692] [5] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.231701] job iteration [6]
[D 14:35:53.231709] [6] Posting cancellation of type: BMI Send (Request)
[D 14:35:53.231718] [6] Posting cancellation of type: BMI Recv (Response)
[D 14:35:53.231727] [6] Posting cancellation of type: FLOW
[D 14:35:53.231735] [6] Posting cancellation of type: BMI Recv (Write Ack)
............
<snip>
............
[D 14:35:53.235128] job iteration [69]
[D 14:35:53.235136] [69] Posting cancellation of type: FLOW
[D 14:35:53.235145] [69] Posting cancellation of type: BMI Recv (Write Ack)
[D 14:35:53.235153] job iteration [70]
[D 14:35:53.235161] job iteration [71]
[D 14:35:53.235169] job iteration [72]
[D 14:35:53.235177] job iteration [73]
[E 14:35:53.235212] PVFS2 client: signal 11, faulty address is 0xb8c0c8, from 0xb26900
[E 14:35:53.235810] [bt] /usr/local/pvfs/lib/libpvfs2.so(PINT_client_io_cancel+0xe0) [0xb26900]
[E 14:35:53.235828] [bt] pvfs2-client-core [0x804b926]
[E 14:35:53.235836] [bt] pvfs2-client-core [0x804f141]
[E 14:35:53.235844] [bt] pvfs2-client-core [0x8050c7b]
[E 14:35:53.235852] [bt] pvfs2-client-core(main+0xe48) [0x8052498]


I'll give the pending_ops change a try and post the results.

Michael

Thanks,

-sam

Has anyone else seen the issue with client-core segfaulting on every cancel op? Should the kernel wait_for_cancellation_downcall() be changed to not allow indefinite looping?
Thanks,
Michael

Index: src/client/sysint/client-state-machine.c
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/client/sysint/client-state-machine.c,v
retrieving revision 1.103
diff -a -u -p -r1.103 client-state-machine.c
--- src/client/sysint/client-state-machine.c	7 Oct 2009 13:48:49 -0000	1.103
+++ src/client/sysint/client-state-machine.c	15 Jan 2010 20:15:16 -0000
@@ -483,6 +483,7 @@ PVFS_error PINT_client_io_cancel(PVFS_sy
     PVFS_error ret = -PVFS_EINVAL;
     PINT_smcb *smcb = NULL;
     PINT_client_sm *sm_p = NULL;
+    PINT_client_sm *sm_base_p = NULL;
 
     gossip_debug(GOSSIP_CLIENT_DEBUG,
             "PINT_client_io_cancel id %lld\n",lld(id));
@@ -507,6 +508,18 @@ PVFS_error PINT_client_io_cancel(PVFS_sy
     {
 	/* op already completed; nothing to cancel. */
         return 0;
+    }
+    
+    /* We also don't cancel small I/O operations as posted by
+     * sys-small-io.sm.  Check the corresponding flag.  We have 
+     * to jump to the base frame rather than the current frame for this
+     * information because small-io may have pushed a msgpairarray.
+     */ 
+    sm_base_p = PINT_sm_frame(smcb, (-(smcb->frame_count -1)));
+    if(sm_base_p->u.io.small_io)
+    {
+        gossip_debug(GOSSIP_CANCEL_DEBUG,  "skipping cancellation of small I/O operation.\n");
+        return(0);
     }
 
     /* if we fall to here, the I/O operation is still in flight */
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
