>On 23/01/17(Mon) 17:03, Ted Unangst wrote:
>> Martin Pieuchot wrote:
>> > On 17/01/17(Tue) 00:24, Alexander Bluhm wrote:
>> > > On Mon, Jan 16, 2017 at 08:36:46PM +0100, cheek...@gmx.com wrote:
>> > > > kernel: protection fault trap, code=0
>> > > > Stopped at fd_getfile+0x20: testb $0x2,mptramp_gdt32_desc+0x1e(%r
>> > > > ax)
>> > > > ddb{3}> fd_getfile() at fd_getfile+0x20
>> > > > sys_fstat() at sys_fstat+0x43
>> > > > syscall() at syscall+0x27b
>> > > 
>> > > It crashes in fd_getfile() FILE_IS_USABLE(fp) as fdp->fd_ofiles has
>> > > been freed.
>> > 
>> > Are you sure?  The faulting instruction is:
>> > 
>> > /sys/kern/kern_descrip.c:190
>> >    80:       f6 40 40 02             testb  $0x2,0x40(%rax)
>> >    84:       75 e7                   jne    6d <fd_getfile+0xd>
>> > 
>> > So %rax contains an incorrect value which is not NULL, are you
>> > suggesting that it is garbage due to free(9) poisoning the memory? 
>> > 
>> > If that's the case, the easiest fix in would be to do the allocations
>> > upfront.  An alternative solution discussed here at a2k17 would be to
>> > use a SRP to guarantee that fd_ofiles is not freed while another CPU is
>> > still referencing it.  That would also help for MP work.
>> > 
>> > Or could it be another race that your lock is preventing?
>> > 
>> > What about the diff below?  Cheeky does it help?
>> 
>> In that case, I think it's better to move the free down. This is more
>> idiomatic, free the old value and immediately replace.
>
>Sure I can do that, I still want to add a comment since it might bite us
>later.
>
>So a counter diff is like an ok?

My theory is memory pressure causes one or two of these waits, and then
a threading bug enters this concurrently.

It seems better to put all sleeping malloc's together, so that no sleeping
operations occur during the manipulation.  So I favor mpi's diff.

Are there any occurances during which free sleeps?

Reply via email to