Re: problem for the VM gurus

1999-06-14 Thread Matthew Dillon

:  VM lookup the page again.  Always.  vm_fault already does this, 
:  in fact.   We would clean up the code and document it to this effect.
:
:  This change would allow us to immediately fix the self-referential
:  deadlocks and I think it would also allow me to fix a similar bug
:  in NFS trivially.
:
:   I should point out here that the process of looking up the pages is a
:significant amount of the overhead of the routines involved. Although
:doing this for just one page is probably sufficiently in the noise as to
:not be a concern.

It would be for only one page and, besides, vm_fault *already* looks the
page up again ( to see if the page was ripped out from under the 
caller ), so the overhead of the change would be very near zero.

:  The easiest interim solution is to break write atomicity.  That is,
:  unlock the vnode if the backing store of the uio being written is
:  (A) vnode-pager-backed and (B) not all in-core. 
:
:   Uh, I don't think you can safely do that. I thought one of the reasons
:for locking a vnode for writes is so that the file metadata doesn't change
:underneath you while the write is in progress, but perhaps I'm wrong about
:that.
:
:-DG
:
:David Greenman

The problem can be distilled into the fact that we currently hold an 
exclusive lock *through* a uiomove that might possibly incur read I/O
due to pages not being entirely in core.   The problem does *not* occur
when we are blocked on meta-data I/O ( such as a BMAP operation ) since
meta-data cannot be mmapped.   Under current circumstances we already
lose read atomicity on the source during the write(), but do not lose
write() atomicity.

The simple solution is to give up or downgrade the lock on the 
destination when blocked within the uiomove.  We can pre-fault
the first two pages of the uio to guarantee a minimum write-atomicity
I/O size.  I suppose this could be extended to pre-faulting the
first N pages of the uio, where N is chosen to be reasonably large - like
64K - but we could not guarantee arbitrary write atomicity because the user
might decide to write a very large mmap'd buffer ( e.g. megabytes or
gigabytes ), and obviously wiring that many pages just won't work.
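The pre-fault idea above can be sketched in userland C (this is an illustrative model only, not the kernel code; a real kernel version would wire the pages via vm_fault/vm_page_wire rather than merely touch them, and `prefault_pages` is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Touch at most the first max_pages pages of the buffer backing the uio
 * so that a later copy cannot fault mid-operation.  Hypothetical
 * userland sketch; the kernel would wire the pages instead.
 */
static size_t
prefault_pages(volatile const char *buf, size_t len, size_t max_pages)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	size_t n = 0;
	size_t off;

	for (off = 0; off < len && n < max_pages; off += pgsz) {
		(void)buf[off];		/* the read faults the page in */
		n++;
	}
	return (n);			/* number of pages actually touched */
}
```

The cap on max_pages is exactly the compromise described above: a guaranteed minimum atomic size without wiring an arbitrarily large mmap'd buffer.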

The more complex solution is to implement a separate range lock for
I/O that is independent of the vnode lock.  This solution would also
require deadlock detection and restart handling.  Atomicity would be 
maintained from the point of view of the processes running on the machine,
but not from the point of view of the physical storage.  Since write
atomicity is already not maintained from the point of view of the physical
storage, I don't think this would present a problem.  Due to the
complexity, however, it could not be used as an interim solution.  It
would have to be a permanent solution for the programming time to be
worth it.  Doing range-based deadlock detection and restart handling
properly is not trivial.  It is something that usually only databases 
need to do.
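A toy sketch of the range-lock idea, with all names invented for illustration (a real version also needs the sleep/wakeup, deadlock detection, and restart handling discussed above, which are omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file byte-range I/O lock, independent of the vnode lock. */
#define MAXRANGES 16

struct riolock {
	uint64_t start, end;	/* locked byte range [start, end) */
	bool held;
};

static struct riolock rltable[MAXRANGES];

static bool
ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return (s1 < e2 && s2 < e1);
}

/* Returns a slot id, or -1 on conflict (where a real caller would sleep,
 * after checking for deadlock). */
static int
rangelock_try(uint64_t start, uint64_t end)
{
	int i;

	for (i = 0; i < MAXRANGES; i++)
		if (rltable[i].held &&
		    ranges_overlap(start, end, rltable[i].start, rltable[i].end))
			return (-1);
	for (i = 0; i < MAXRANGES; i++)
		if (!rltable[i].held) {
			rltable[i].start = start;
			rltable[i].end = end;
			rltable[i].held = true;
			return (i);
		}
	return (-1);		/* table full */
}

static void
rangelock_release(int slot)
{
	rltable[slot].held = false;
}
```

Two writers to non-overlapping ranges proceed concurrently; overlapping ranges serialize, which preserves atomicity as seen by processes without holding the vnode lock across the I/O.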

-Matt
Matthew Dillon 
dil...@backplane.com



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: problem for the VM gurus

1999-06-14 Thread Matthew Dillon
:  A permanent vnode locking fix is many months away because core
:  decided to ask Kirk to fix it, which was news to me at the time.
:  However, I agree with the idea of having Kirk fix VNode locking.
:
:   Actually, core did no such thing. Kirk told me a month or so ago that he
:intended to fix the vnode locking. Not that this is particularly important,
:but people shouldn't get the idea that Kirk's involvement had anything to
:do with core since it did not.
:
:-DG
:
:David Greenman

Let me put it this way:  You didn't bother to inform anyone else who
might have reason to be interested until it came up as an offhand
comment at USENIX.  Perhaps you should consider not keeping such important
events to yourself, eh?  Frankly, I am rather miffed -- if I had known
that Kirk had expressed an interest a month ago I would have been able
to pool our interests earlier.  Instead I've been working in a vacuum
for a month because I didn't know that someone else was considering trying
to solve the problem.   This does not fill me with rosy feelings.

-Matt
Matthew Dillon 
dil...@backplane.com

:Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
:Creator of high-performance Internet servers - http://www.terasolutions.com
:






Re: problem for the VM gurus

1999-06-14 Thread Dag-Erling Smorgrav
Matthew Dillon dil...@apollo.backplane.com writes:
 :A permanent vnode locking fix is many months away because core
 :decided to ask Kirk to fix it, which was news to me at the time.
 :However, I agree with the idea of having Kirk fix VNode locking.
 :
 :   Actually, core did no such thing. Kirk told me a month or so ago that he
 :intended to fix the vnode locking. Not that this is particularly important,
 :but people shouldn't get the idea that Kirk's involvement had anything to
 :do with core since it did not.
 
 Let me put it this way:  You didn't bother to inform anyone else who
 might have reason to be interested until it came up as an offhand
 comment at USENIX.  Perhaps you should consider not keeping such important
 events to yourself, eh?  Frankly, I am rather miffed -- if I had known
 that Kirk had expressed an interest a month ago I would have been able
 to pool our interests earlier.  Instead I've been working in a vacuum
 for a month because I didn't know that someone else was considering trying
 to solve the problem.   This does not fill me with rosy feelings.

Eivind Eklund has also been working on this. It is my understanding
that he has a working Perl version of vnode_if.sh, and is about
halfway through adding invariants to the locking code to track down
locking errors. He stopped working on it about a month or two ago for
lack of time; I seem to recall that he had managed to get the kernel
to boot and was working on panics (from violated invariants) which
occurred during fsck.

DES
-- 
Dag-Erling Smorgrav - d...@flood.ping.uio.no





Re: problem for the VM gurus

1999-06-13 Thread John S. Dyson
 
 * We hack a fix to deal with the mmap/write case.
 
   A permanent vnode locking fix is many months away because core
   decided to ask Kirk to fix it, which was news to me at the time.
   However, I agree with the idea of having Kirk fix VNode locking.
 
   But since this sort of permanent fix is months away, we really need
   an interim solution to the mmap/write deadlock case.
 
   The easiest interim solution is to break write atomicity.  That is,
   unlock the vnode if the backing store of the uio being written is
   (A) vnode-pager-backed and (B) not all in-core. 
 
   This will generally fix all known deadlock situations but at the
   cost of write atomicity in certain cases.  We can use the same hack
   that pipe code uses and only guarantee write atomicity for small 
   block sizes.  We would do this by wiring ( and faulting, if 
   necessary ) the first N pages of the uio prior to locking the vnode.
 
   We cannot wire all the pages of the uio since the user may specify
   a very large buffer - megabytes or gigabytes.
 
 * Stage 3:  Permanent fix is committed by generally fixing vnode locks
   and VFS layering.
 
   ... which may be 6 months if Kirk agrees to do a complete rewrite
   of the vnode locking algorithms.
 
Regarding atomicity:

Remember that you cannot assume that the mappings stay the same during
almost any I/O mechanism anymore.  The issue of wiring pages and assuming
constant mappings has to be resolved.  We need a careful definition of
whether one is doing I/O to an address or I/O to a specific piece of
memory.  I know that this is an end condition, but it has consequences
for the design.  (I suspect that a punt to do I/O to a virtual address
is correct, but those change, and also disappear.)

John





Re: problem for the VM gurus

1999-06-13 Thread David Greenman
   A permanent vnode locking fix is many months away because core
   decided to ask Kirk to fix it, which was news to me at the time.
   However, I agree with the idea of having Kirk fix VNode locking.

   Actually, core did no such thing. Kirk told me a month or so ago that he
intended to fix the vnode locking. Not that this is particularly important,
but people shouldn't get the idea that Kirk's involvement had anything to
do with core since it did not.

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
Creator of high-performance Internet servers - http://www.terasolutions.com





Re: problem for the VM gurus

1999-06-13 Thread Julian Elischer


On Sun, 13 Jun 1999, John S. Dyson wrote:
 
 Remember that you cannot assume that the mappings stay the same during
 almost any I/O mechanism anymore.  The issue of wiring pages and assuming
 constant mapping has to be resolved.  A careful definition of whether
 or not one is doing I/O to an address or I/O to a specific piece of
 memory.  I know that this is an end condition, but it has consequences
 as to the effects on the design.  (I suspect that a punt to do I/O
 to a virtual address is correct, but those change, and also disappear.)

Which brings up the fact that some of us have been talking about making
all IO operations refer to PHYSICAL pages at the strategy layer.

Consider
 
for raw I/O:
read()     ... user address
physio()   ... user pages are faulted to ensure they are present,
               then physical addresses are extracted and remapped to
               KV addresses.
strategy() ... for DMA devices (most of the ones we really care about),
               KV addresses are converted to physical addresses again.


If we changed the interface so that the UIO passed from physio to the
strategy routine held an array of physical addresses, we could save quite a
bit of work.  Also, it wouldn't matter whether the pages were
mapped, as long as they are locked in RAM.  For dumb devices that don't do
DMA, the pages would be mapped by some other special scheme.


For pages coming from the buffer cache/VM system, the physical page
addresses should already be known somewhere, and the physical UIO
addresses should be pretty trivially collected for the strategy routine.

This sounds like a project that could be bitten off and completed pretty
quickly.

1/ redefine UIO correctly to include UIO_PHYSSPACE and an appropriate 
change in iovec to allow physical addresses ( they may be different from
virtual addresses on some architectures ).. maybe define a phys_iovec[]
and make the pointer in UIO a pointer to a union. (?)

2/ change drivers to be able to handle getting a UIO_PHYSSPACE
request.  This would require adding a routine to map such requests into 
KV space, for use by the dumb drivers.  (All drivers would still know how
to handle old-style requests.)

3/ Change the callers at our leisure (e.g. physio, buffer cache, etc.)
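Step 1/ might look something like the following sketch.  Every name here is illustrative only (none of these are the real kernel definitions, and the kernel's actual iovec/uio layout differs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>	/* struct iovec */

/* Physical addresses may differ in width from virtual ones. */
typedef uint64_t paddr_sketch_t;

struct phys_iovec {
	paddr_sketch_t	piov_base;	/* physical address of a page */
	size_t		piov_len;
};

enum uio_seg_sketch { SEG_USERSPACE, SEG_SYSSPACE, SEG_PHYSSPACE };

struct uio_sketch {
	enum uio_seg_sketch	uio_segflg;	/* SEG_PHYSSPACE selects piov */
	int			uio_iovcnt;
	union {
		struct iovec		*uio_iov;	/* virtual requests */
		struct phys_iovec	*uio_piov;	/* physical requests */
	} uio_u;
};
```

A strategy routine would switch on uio_segflg: DMA drivers consume the physical addresses directly, while dumb drivers call the proposed mapping routine to get a KV window first.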

anyone have comments? I know I discussed this with the NetBSD guys
(e.g. chuq, chuck and jason) and they said they were looking at similar
things.

possible gotchas:
you would have to be careful about blocking when allocating the iovec of
physical addresses.  maybe they would be allocated as part of allocating
the UIO.  (maybe you'd have to specify how many pages the UIO
should hold when you allocate it, to optimize the allocation.)

How this fits into the proposed rewrite of the I/O request (struct buf)
structure that's been rumbling in the background needs evaluation.

julian

p.s. David, you didn't comment on Matt's submission of a plan for
attacking the deadlock problem.  It sounded reasonable to me, but I'm only
marginal in this area.









Re: problem for the VM gurus

1999-06-13 Thread David Greenman
Interesting.  It's an overlapping same-process deadlock with mmap/write.
This bug also hits NFS, though in a slightly different way, and also
occurs with mmap/write when two processes are mmap'ing two files and
write()ing the other descriptor using the map as a buffer.

I see a three-stage solution:

* We change the API for the VM pager *getpages() code.

   At the moment the caller busies all pages being passed to getpages()
   and expects the primary page (but not any of the others) to be 
   returned busied.  I also believe that some of the code assumes that
   the page will not be unbusied at all for the duration of the
   operation ( though vm_fault was hacked to handle the situation where
   it might have been ). 

   This API is screwing up NFS and would also make it very difficult for
   general VFS deadlock avoidance to be implemented properly and for
   a fix to the specific case being discussed in this thread to be 
   implemented properly.

   I recommend changing the API such that *ALL* passed pages are 
   unbusied prior to return.  The caller of getpages() must then 
   VM lookup the page again.  Always.  vm_fault already does this, 
   in fact.   We would clean up the code and document it to this effect.

   This change would allow us to immediately fix the self-referential
   deadlocks and I think it would also allow me to fix a similar bug
   in NFS trivially.

   I should point out here that the process of looking up the pages is a
significant amount of the overhead of the routines involved. Although
doing this for just one page is probably sufficiently in the noise as to
not be a concern.

   The easiest interim solution is to break write atomicity.  That is,
   unlock the vnode if the backing store of the uio being written is
   (A) vnode-pager-backed and (B) not all in-core. 

   Uh, I don't think you can safely do that. I thought one of the reasons
for locking a vnode for writes is so that the file metadata doesn't change
underneath you while the write is in progress, but perhaps I'm wrong about
that.

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
Creator of high-performance Internet servers - http://www.terasolutions.com





Re: problem for the VM gurus

1999-06-12 Thread Matthew Dillon
Interesting.  It's an overlapping same-process deadlock with mmap/write.
This bug also hits NFS, though in a slightly different way, and also
occurs with mmap/write when two processes are mmap'ing two files and
write()ing the other descriptor using the map as a buffer.

I see a three-stage solution:

* We change the API for the VM pager *getpages() code.

At the moment the caller busies all pages being passed to getpages()
and expects the primary page (but not any of the others) to be 
returned busied.  I also believe that some of the code assumes that
the page will not be unbusied at all for the duration of the
operation ( though vm_fault was hacked to handle the situation where
it might have been ). 

This API is screwing up NFS and would also make it very difficult for
general VFS deadlock avoidance to be implemented properly and for
a fix to the specific case being discussed in this thread to be 
implemented properly.

I recommend changing the API such that *ALL* passed pages are 
unbusied prior to return.  The caller of getpages() must then 
VM lookup the page again.  Always.  vm_fault already does this, 
in fact.   We would clean up the code and document it to this effect.

This change would allow us to immediately fix the self-referential
deadlocks and I think it would also allow me to fix a similar bug
in NFS trivially.
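The proposed contract can be modeled in a few lines of C.  This is a toy stand-in, not the real vm_object/vm_page code; the names are invented, but the shape is the point: getpages() unbusies everything, and the caller must always look the page up again and tolerate its disappearance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tpage {
	int pindex;
	bool busy;
	bool valid;
};

#define NPAGES 4
static struct tpage tobject[NPAGES];	/* toy "vm_object" */

static struct tpage *
tpage_lookup(int pindex)
{
	int i;

	for (i = 0; i < NPAGES; i++)
		if (tobject[i].valid && tobject[i].pindex == pindex)
			return (&tobject[i]);
	return (NULL);
}

/* Proposed API: *all* pages are unbusied before returning. */
static void
tgetpages(int pindex)
{
	struct tpage *p = tpage_lookup(pindex);

	if (p == NULL)
		return;
	p->busy = true;		/* ... I/O would happen here ... */
	p->busy = false;	/* unbusy everything before return */
}

/* Caller side: always re-lookup; the page may have been ripped out. */
static struct tpage *
tfault_page(int pindex)
{
	tgetpages(pindex);
	return (tpage_lookup(pindex));	/* NULL means retry the fault */
}
```

Since no page comes back busied, no caller can deadlock waiting on a page it itself holds busy, which is the self-referential case this thread is about.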

* We hack a fix to deal with the mmap/write case.

A permanent vnode locking fix is many months away because core
decided to ask Kirk to fix it, which was news to me at the time.
However, I agree with the idea of having Kirk fix VNode locking.

But since this sort of permanent fix is months away, we really need
an interim solution to the mmap/write deadlock case.

The easiest interim solution is to break write atomicity.  That is,
unlock the vnode if the backing store of the uio being written is
(A) vnode-pager-backed and (B) not all in-core. 

This will generally fix all known deadlock situations but at the
cost of write atomicity in certain cases.  We can use the same hack
that pipe code uses and only guarantee write atomicity for small 
block sizes.  We would do this by wiring ( and faulting, if 
necessary ) the first N pages of the uio prior to locking the vnode.

We cannot wire all the pages of the uio since the user may specify
a very large buffer - megabytes or gigabytes.

* Stage 3:  Permanent fix is committed by generally fixing vnode locks
  and VFS layering.

... which may be 6 months if Kirk agrees to do a complete rewrite
of the vnode locking algorithms.

-Matt
Matthew Dillon 
dil...@backplane.com






Re: problem for the VM gurus

1999-06-09 Thread John S. Dyson
Howard Goldstein said:
 On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman gr...@unixhelp.org 
 wrote:
  : On Mon, 7 Jun 1999, Matthew Dillon wrote:
  :  ... what version of the operating system?
  : 4.0-CURRENT
 
 3.2R too...
 
I just checked the source (CVS) tree, and something bad happened
between 1.27 and 1.29 of ufs_readwrite.c.  Unless other things
had been changed to make the problem go away, the recursive vnode
thing was broken then.  I am surprised that it was changed that long
ago.  (The breakage is an example of someone making a change and
either not understanding why the code was there or forgetting to
put the alternative into the code.)

-- 
John  | Never try to teach a pig to sing,
dy...@iquest.net  | it makes one look stupid
jdy...@nc.com | and it irritates the pig.





Re: problem for the VM gurus

1999-06-09 Thread Brian Feldman
On Wed, 9 Jun 1999, John S. Dyson wrote:

 Howard Goldstein said:
  On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman gr...@unixhelp.org 
  wrote:
   : On Mon, 7 Jun 1999, Matthew Dillon wrote:
   :  ... what version of the operating system?
   : 4.0-CURRENT
  
  3.2R too...
  
 I just checked the source (CVS) tree, and something bad happened
 between 1.27 and 1.29 of ufs_readwrite.c.  Unless other things
 had been changed to make the problem go away, the recursive vnode
 thing was broken then.  I am surprised that it was changed that long
 ago.  (The breakage is an example of someone making a change and
 either not understanding why the code was there or forgetting to
 put the alternative into the code.)

Is that the limit to Bruce's fu*kup, or did he break it elsewhere, too? It'd be
nice to get this reversed since it's been found. And FWIW, semenu seems to
be the only one to have anything to handle IN_RECURSE, probably because his
NTFS code was recently committed and not mangled.

 
 -- 
 John  | Never try to teach a pig to sing,
 dy...@iquest.net  | it makes one look stupid
 jdy...@nc.com | and it irritates the pig.
 
 
 

 Brian Feldman_ __ ___   ___ ___ ___  
 gr...@unixhelp.org_ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!  _ __ | _ \._ \ |) |
 http://www.freebsd.org   _ |___)___/___/ 






Re: problem for the VM gurus

1999-06-09 Thread John S. Dyson
 On Wed, 9 Jun 1999, John S. Dyson wrote:
 
  Howard Goldstein said:
   On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman 
   gr...@unixhelp.org wrote:
: On Mon, 7 Jun 1999, Matthew Dillon wrote:
:  ... what version of the operating system?
: 4.0-CURRENT
   
   3.2R too...
   
  I just checked the source (CVS) tree, and something bad happened
  between 1.27 and 1.29 on ufs_readwrite.c.  Unless other things
  had been changed to make the problem go away, the recursive vnode
  thing was broken then.  I am surprised that was changed that long
  ago.  (The breakage is an example of someone making a change, and
  not either understanding why the code was there, or forgetting to
  put the alternative into the code.)
 
 Is that the limit to Bruce's fu*kup, or did he break it elsewhere, too? It'd be
 nice to get this reversed since it's been found. And FWIW, semenu seems to
 be the only one to have anything to handle IN_RECURSE, probably because his
 NTFS code was recently committed and not mangled.
 
I think that I had most of the filesystems fixed somewhere (in my private
tree or in the standard one.)  It is easy to make mistakes, but he was
also right that there is probably a better way to do it.  I suggest putting
the recurse stuff back in for a quick fix, and working the problem in
more detail in the future.

(I could even be wrong if this is where the problem came in -- so much has
 happened since then :-)).

John





Re: problem for the VM gurus

1999-06-09 Thread Howard Goldstein
John S. Dyson writes:
  Howard Goldstein said:
   On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman 
   gr...@unixhelp.org wrote:
: 4.0-CURRENT
   
   3.2R too...
   
  I just checked the source (CVS) tree, and something bad happened
  between 1.27 and 1.29 on ufs_readwrite.c.  Unless other things
  had been changed to make the problem go away, the recursive vnode
  thing was broken then.  

I can pretty easily test patches and try other stuff out on a couple
of dozen brand new, architecturally stressed-out (memory-wise
(zero swap, 16 MB RAM, mfsroot) and CPU-bandwidth-wise (386SX/40)) 3.1-R
(switchable to 3.2R) systems, if it'd be helpful.  Should it bring out
clues leading to the fix for 'the' golden page-not-present instability,
it'd be awesome karma.  This very limited environment is especially
fragile and highly susceptible to consistently reproducing the popular
>= 3.1R page-not-present panics.






Re: problem for the VM gurus

1999-06-09 Thread John S. Dyson
 John S. Dyson writes:
   Howard Goldstein said:
On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman 
 gr...@unixhelp.org wrote:
 : 4.0-CURRENT

3.2R too...

   I just checked the source (CVS) tree, and something bad happened
   between 1.27 and 1.29 on ufs_readwrite.c.  Unless other things
   had been changed to make the problem go away, the recursive vnode
   thing was broken then.  
 
 I can pretty easily test patches and try other stuff out on a couple
 of dozen brand new, architecturally (sp) stressed out (memorywise
 (zero swap, 16mb RAM, mfsroot) and cpu bandwidth wise (386sx40)) 3.1-R
 (switchable to 3.2R) systems, if it'd be helpful.  Should it bring out
 clues leading to the fix for 'the' golden page-not-present instability
 it'd be awesome karma.  This very limited environment is especially
 fragile and highly susceptible to consistently reproducing the popular
 >= 3.1R page-not-present panics.
 
BTW, one more thing that is useful for testing limited-memory situations
is setting the MAXMEM config variable.  Last time I looked, it allows
you to set the number of KB of available memory.  If you try to run with
less than MAXMEM=4096 or MAXMEM=5120, you'll have trouble, though. 
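For reference, a hypothetical kernel config fragment setting this might look like the following (the value is in kilobytes; exact quoting follows the config(8) conventions of the era, so check your local LINT):

```
# pretend the machine has only 8 MB of RAM
options         "MAXMEM=(8*1024)"
```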

John





Re: problem for the VM gurus

1999-06-07 Thread John S. Dyson
Arun Sharma said:
 bread
 ffs_read
 ffs_getpages
 vnode_pager_getpages
 vm_fault
 ---
 slow_copyin
 ffs_write
 vn_write
 dofilewrite
 write
 syscall 
 
 getblk finds that the buffer is marked B_BUSY and sleeps on it. But I
 can't figure out who marked it busy.
 
This looks like the historical problem of doing I/O to an mmap'ed region.
There are two facets to the problem:  One where the I/O is to the same
vn, and the other is where the I/O is to a different vn.  The case where
the I/O is to the same vn had a (short term) fix previously in the code, 
by allowing for recursive usage of a vn under certain circumstances.  The
problem of different vn's can be fixed by proper resource handling in
vfs_bio (and perhaps other places).  (My memory isn't 100% clear on the
code anymore, but you have shown a lot of info with your backtrace.)

-- 
John  | Never try to teach a pig to sing,
dy...@iquest.net  | it makes one look stupid
jdy...@nc.com | and it irritates the pig.





Re: problem for the VM gurus

1999-06-07 Thread Brian Feldman
Is this one of the problems that would make it sensible to do a complete
rewrite of vfs_bio.c?

 Brian Feldman_ __ ___   ___ ___ ___  
 gr...@unixhelp.org_ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!  _ __ | _ \._ \ |) |
 http://www.freebsd.org   _ |___)___/___/ 






Re: problem for the VM gurus

1999-06-07 Thread John S. Dyson

 Is this one of the problems that would make it sensible to do a complete
 rewrite of vfs_bio.c?
 
Specifically for that reason, probably not.  However, if the effort
was taken as an entire and encompassing effort, with the understanding
of what is really happening in the code regarding policy (and there
is a lot more than the original vfs_bio type things), then it would
certainly be best.  Note that some of the policy might even be
marginalized given a restructuring by eliminating the conventional
struct buf's for everything except for I/O.  In the case of I/O,
it would be good to talk to those who work on block drivers, and
collect info on what they need.  The new definition could replace
the struct bufs for the block I/O subsystems, but in many ways could
be similar to struct bufs (for backwards compat.)

In the current vfs_bio, the continual remapping is problematical,
and was one of the very negative side-effects of the backwards
compatibility choice.  The original vfs_bio merged cache design
actually (mostly) eliminated the struct bufs for the buffer cache
interfacing, and the temporary mappings thrashed much less often.
It would also be good to design in the ability to use physical
addressing (for those architectures that don't incur significant
additional cost for physically mapping all of memory.)  Along
with proper design, the fully mapped physical memory would
eliminate the need for remapping entirely.  Uiomove in this
case  wouldn't need virtually mapped I/O buffers, and this
would be ideal.  However, it is unlikely that X86 machines
would ever support this option.  PPC's, R(X) and Alpha
can support mapping all of memory by their various means 
though.

In a sense, the deadlock issue is an example of the initially
unforeseen problems when hacking on that part of the code.  I suggest
a carefully orchestrated and organized migration towards the more
efficient and perhaps architecturally cleaner approach.  The
deadlock was an after the fact bug that we found very early on,
and there was a temporary fix for part of it, and a mitigation
of the other part.  Issues like that can be very, very nasty to
deal with.

John





Re: problem for the VM gurus

1999-06-07 Thread John S. Dyson
Brian Feldman said:
   In the long-standing tradition of deadlocks, I present to you all a new one.
 This one locks in getblk, and causes other processes to lock in inode. It's
 easy to induce, but I have no idea how I'd go about fixing it myself
 (being very new to that part of the kernel.)
   Here's the program which induces the deadlock:
 
 
   tmp = mmap(NULL, psize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
   if (tmp == MAP_FAILED) {
   perror("mmap");
   exit(1);
   }
   printf("write retval == %lld\n", write(fd, tmp, psize));
 
I responded earlier to a reply to this message :-).  This did
work about the time that I left, and it appears that it is likely
that code has been removed that mitigated this as a problem.

It is important either to modify the way that read or write
operations occur ( perhaps prefault before letting the uiomove
operation occur, which is yucky and still doesn't close all 
windows ) or to reinstate the handling of recursive operations
on a vnode by the same process.  Handling the vnode locking
in a more sophisticated way would be better, but reinstating
(or fixing) the already-existent code that used to handle
this would be a good fix that will mitigate the problem
for now.

-- 
John  | Never try to teach a pig to sing,
dy...@iquest.net  | it makes one look stupid
jdy...@nc.com | and it irritates the pig.





Re: problem for the VM gurus

1999-06-07 Thread Matthew Dillon
... what version of the operating system?

-Matt

:  In the long-standing tradition of deadlocks, I present to you all a new one.
:This one locks in getblk, and causes other processes to lock in inode. It's
:easy to induce, but I have no idea how I'd go about fixing it myself
:(being very new to that part of the kernel.)
:  Here's the program which induces the deadlock:
:
:#include <sys/types.h>
:#include <sys/mman.h>
:...





Re: problem for the VM gurus

1999-06-07 Thread Brian Feldman
On Mon, 7 Jun 1999, Matthew Dillon wrote:

 ... what version of the operating system?
 
   -Matt

4.0-CURRENT

 
 :  In the long-standing tradition of deadlocks, I present to you all a new 
 one.
 :This one locks in getblk, and causes other processes to lock in inode. It's
 :easy to induce, but I have no idea how I'd go about fixing it myself
 :(being very new to that part of the kernel.)
 :  Here's the program which induces the deadlock:
 :
 :#include <sys/types.h>
 :#include <sys/mman.h>
 :...
 

 Brian Feldman_ __ ___   ___ ___ ___  
 gr...@unixhelp.org_ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!  _ __ | _ \._ \ |) |
 http://www.freebsd.org   _ |___)___/___/ 






Re: problem for the VM gurus

1999-06-07 Thread Howard Goldstein
On Mon, 7 Jun 1999 18:38:51 -0400 (EDT), Brian Feldman gr...@unixhelp.org 
wrote:
 : On Mon, 7 Jun 1999, Matthew Dillon wrote:
 :  ... what version of the operating system?
 : 4.0-CURRENT

3.2R too...





problem for the VM gurus

1999-06-06 Thread Brian Feldman
  In the long-standing tradition of deadlocks, I present to you all a new one.
This one locks in getblk, and causes other processes to lock in inode. It's
easy to induce, but I have no idea how I'd go about fixing it myself
(being very new to that part of the kernel.)
  Here's the program which induces the deadlock:

#include <sys/types.h>
#include <sys/mman.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv) {
int psize = getpagesize() * 2;
void *tmp;
char name[] = "m.XXXXXX";
int fd = mkstemp(name);

if (fd == -1) {
perror("open");
exit(1);
}

tmp = mmap(NULL, psize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (tmp == MAP_FAILED) {
perror("mmap");
exit(1);
}
printf("write retval == %lld\n", (long long)write(fd, tmp, psize));
unlink(name);
exit(0);
}

 Brian Feldman_ __ ___   ___ ___ ___  
 gr...@unixhelp.org_ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!  _ __ | _ \._ \ |) |
 http://www.freebsd.org   _ |___)___/___/ 






Re: problem for the VM gurus

1999-06-06 Thread Arun Sharma
Brian Feldman gr...@unixhelp.org writes:

   In the long-standing tradition of deadlocks, I present to you all
   a new one. This one locks in getblk, and causes other processes to
   lock in inode. It's easy to induce, but I have no idea how I'd go
   about fixing it myself (being very new to that part of the
   kernel.)  Here's the program which induces the deadlock:

I could reproduce it with 4.0-current. The stack trace was:

tsleep
getblk
bread
ffs_read
ffs_getpages
vnode_pager_getpages
vm_fault
---
slow_copyin
ffs_write
vn_write
dofilewrite
write
syscall 

getblk finds that the buffer is marked B_BUSY and sleeps on it. But I
can't figure out who marked it busy.

-Arun

PS: Does anyone know how to get the stack trace by pid in ddb?  I can
manually type trace p->p_addr, but is there an easier way?

